University Library

Automatic text recognition for prints and manuscripts

Work involving not only historic but also modern prints and handwriting can be considerably simplified if there is a machine-readable and searchable full text. This can be produced using OCR (Optical Character Recognition) and HTR (Handwritten Text Recognition).

One program that can be used for this is the transcription platform Transkribus. It often achieves good to very good results, particularly with printed materials but also with some forms of handwriting, such as the German cursive known as Kurrent.

Transkribus.ai

For an initial test, you can simply upload image files to Transkribus.ai, then select the language and whether the text is handwritten or printed.

  • No need to register for the service
  • Use is generally free
  • The page count is limited
  • No choice of text recognition package

TranskribusLite

If you have a larger document or want to select the most appropriate text recognition package, TranskribusLite offers an easy introduction. In this browser version of the program you can process the layout, choose between 125 public text recognition packages for different languages and scripts, transcribe texts yourself or correct automatic transcriptions, tag text and structural elements, and train your own text recognition package. You can organize your documents in collections and work on them together with other users.

  • You need to register for this
  • Automatic text recognition is subject to charges; all other features are free (each account receives a free quota on registration)
  • TranskribusLite and Expert Client are interoperable
  • There is an introductory course on YouTube

Transkribus Expert Client

The desktop version of Transkribus has the widest range of features. In addition to the features of TranskribusLite, Expert Client includes, among other things, keyword spotting as a search feature, a language model for text recognition, advanced layout processing features (e.g. tables and structural model training), and additional import/export formats.

  • You need to register for this
  • You need to download a program for this
  • Automatic text recognition is subject to charges; all other features are free (each account receives a free quota on registration)
  • TranskribusLite and Expert Client are interoperable
  • There is an introductory course on YouTube

Text recognition

Costs

Initially, each account receives 500 credits. This is enough for text recognition on approx. 400-500 pages of manuscript or approx. 2500-3000 pages of printed text.
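To judge whether a planned document fits within the starting quota, the approximate figures above can be turned into a rough estimate. The following Python sketch is only an illustration: the function names are hypothetical, the per-page rates are derived solely from the ranges quoted here, and actual credit consumption is determined by Transkribus and may differ.

```python
# Approximate figures from the text: 500 starting credits cover roughly
# 400-500 handwritten pages or 2500-3000 printed pages.
START_CREDITS = 500
PAGES_PER_START_QUOTA = {
    "handwriting": (400, 500),   # (fewest, most) pages per 500 credits
    "print": (2500, 3000),
}


def credits_needed(pages: int, kind: str) -> tuple[float, float]:
    """Return a (low, high) estimate of the credits needed for `pages` pages."""
    min_pages, max_pages = PAGES_PER_START_QUOTA[kind]
    low = pages * START_CREDITS / max_pages   # best case: most pages per credit
    high = pages * START_CREDITS / min_pages  # worst case: fewest pages per credit
    return low, high


def covered_by_free_quota(pages: int, kind: str) -> bool:
    """True if even the high estimate stays within the starting quota."""
    _, high = credits_needed(pages, kind)
    return high <= START_CREDITS


if __name__ == "__main__":
    for pages, kind in [(200, "handwriting"), (3000, "print")]:
        low, high = credits_needed(pages, kind)
        print(f"{pages} pages of {kind}: approx. {low:.0f}-{high:.0f} credits, "
              f"within free quota: {covered_by_free_quota(pages, kind)}")
```

Under these assumptions, a handwritten page costs roughly 1 to 1.25 credits and a printed page roughly 0.17 to 0.2 credits, so the same quota stretches about five to six times further for print.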

Bachelor's, Master's and PhD students as well as course organizers may apply to have their costs covered by a grant.

Please contact us for more information, support with funding for projects, etc.

Selecting a text recognition package

When selecting a public text recognition package the following criteria are relevant: language, period and type of script.

If you do not know what type of script you are dealing with, here you can find some examples with suggestions for potentially suitable text recognition packages. Results vary depending on how similar the script in the individual document is to the material on which the text recognition package was trained.

Support from the University Library

If you are interested in automatic text recognition or have further questions about using it for your studies, dissertation or a project, please contact Dorothee Huff at Project OCR-BW (OCR center of excellence at Mannheim and Tübingen university libraries): dorothee.huff@uni-tuebingen.de.