Introduction to Automatic Text Recognition

Tobias Hodel, Staatsarchiv des Kantons Zürich (tobias.hodel@uzh.ch)

The workshop will give an introduction to the technologies of text recognition. We will use Transkribus, a free and open platform for recognizing text by means of Optical Character Recognition (OCR) as well as Handwritten Text Recognition (HTR).

Transkribus is developed as part of the READ project:

Archives, libraries and universities are increasingly investing in the digitisation of their collections, but digital images of handwritten material are still only available to those people who have the time to study each page in some depth. The next step is to use computers to process and search images of historical papers. This is the rationale of the READ project: it aims to revolutionise access to historical collections by supporting cutting-edge research in Automatic Text Recognition and associated technologies. These technological innovations are made available in the Transkribus research platform.
The workshop will show how the Transkribus transcription platform can be used to perform the automated transcription and searching of documents. It will give an overview of the technology and explain how accurate automatic recognition can be.
Workshop participants will use their own laptops to experiment with Transkribus during the session.

Please download the software (available for Linux, Mac and Windows) before the workshop starts: https://transkribus.eu/Transkribus/

You can also bring your own images and documents to the workshop.

Information and news regarding READ and Transkribus can be found online: https://read.transkribus.eu/

First steps and how-to guides can be found on the website:
https://read.transkribus.eu/2016/06/28/transkribus-how-to-guides/

Our wiki provides you with more in-depth information: https://transkribus.eu/wiki/index.php/Main_Page

READ has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 674943.
