University Library

Automatic text recognition for prints and manuscripts

Work involving not only historic but also modern prints and handwriting can be considerably simplified if there is a machine-readable and searchable full text. This can be produced using OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition).

One program that can be used for this is the transcription platform Transkribus. Good to very good results can often be achieved, particularly with printed materials, but also with some forms of handwriting such as the German cursive known as Kurrent.

Transkribus.ai

For an initial test, you can simply upload image files to Transkribus.ai, then select the language and whether the text is handwritten or printed.

  • No need to register for the service
  • Use is generally free
  • The page count is limited
  • No choice of text recognition package

TranskribusLite

If you have a larger document to process or want to select the most appropriate text recognition package, TranskribusLite offers an easy way to get started. In this browser version of the program you can edit the layout, choose between 125 public text recognition packages for different languages and scripts, transcribe text yourself or correct automatic transcriptions, tag text and structural elements, and train your own text recognition package. You can organize your documents in collections and work on them together with other users.

  • You need to register for this
  • There are charges for automatic text recognition; all other features are free (each account receives a free quota on registration)
  • TranskribusLite and Expert Client are interoperable
  • There is an introductory video on YouTube

Transkribus Expert Client

The desktop version of Transkribus has the widest range of features. In addition to the features of TranskribusLite, Expert Client includes, among other things, keyword spotting (a search feature), a language model for text recognition, advanced layout processing features (e.g. tables and structural model training) and additional import/export formats.

  • You need to register for this
  • You need to download a program for this
  • There are charges for automatic text recognition; all other features are free (each account receives a free quota on registration)
  • TranskribusLite and Expert Client are interoperable
  • There is an introductory course on YouTube

Text recognition

Costs

Each account receives 100 free credits per month. This is enough to run text recognition on approx. 100 pages of manuscript or approx. 600 pages of printed text.
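To check whether a mixed document fits within the free quota, the approximate figures above can be turned into a rough estimate. The following Python sketch assumes per-page rates inferred only from those figures (about 1 credit per manuscript page and about 1/6 credit per printed page); these are not official Transkribus prices, and the function names are purely illustrative.

```python
from fractions import Fraction

# Assumption: rates are inferred from the approximate figures on this page
# (100 credits ~ 100 manuscript pages ~ 600 printed pages).
# They are not official pricing and may differ from actual charges.
MONTHLY_FREE_CREDITS = 100
CREDITS_PER_PAGE = {
    "handwritten": Fraction(100, 100),  # approx. 1 credit per page
    "printed": Fraction(100, 600),      # approx. 1/6 credit per page
}


def credits_needed(handwritten_pages=0, printed_pages=0):
    """Estimate the credits required for a mix of page types."""
    return (
        handwritten_pages * CREDITS_PER_PAGE["handwritten"]
        + printed_pages * CREDITS_PER_PAGE["printed"]
    )


def fits_free_quota(handwritten_pages=0, printed_pages=0):
    """Return True if the estimate stays within the monthly free credits."""
    return credits_needed(handwritten_pages, printed_pages) <= MONTHLY_FREE_CREDITS


if __name__ == "__main__":
    # Hypothetical project: 40 manuscript pages and 300 printed pages.
    needed = credits_needed(handwritten_pages=40, printed_pages=300)
    print(f"Estimated credits: {float(needed):.1f}")
    print("Within free quota:", fits_free_quota(40, 300))
```

Under these assumed rates, the hypothetical project above would need roughly 90 credits and would fit within one month's free quota. Actual consumption may differ, so treat the result as a planning aid only.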

Bachelor's, Master's and PhD students and course organizers may apply to have costs covered by a grant.

Various subscription plans are also offered. Please contact us for more information, support with funding for projects, etc.

Selecting a text recognition package

When selecting a public text recognition package, the following criteria are relevant: language, period and type of script.

If you do not know what type of script it is, you can find some examples here, with suggestions for potentially suitable text recognition packages. Results vary depending on how similar the script in the individual document is to the material on which the text recognition package has been trained.

Support from the University Library

Further information on automatic text recognition and other software can be found on the homepage of the OCR competence center operated jointly by the University Libraries of Tübingen and Mannheim. The OCR-Recommender will generate a recommendation for the best OCR method for you. Alternatively, simply send us some sample pages.

If you have any questions about using the software, or if you are interested in integrating automatic text recognition methods into your studies, your academic work or a project, please take part in our open OCR consultation hours or contact Dorothee Huff.