Seminar für Sprachwissenschaft

TüBa-D/DP release 5

TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to offer high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z UD annotation guidelines (Çöltekin et al., 2017) as closely as possible.

Each text in TüBa-D/DP is annotated with the following layers:

  • Universal part-of-speech tags
  • STTS part-of-speech tags
  • Inflectional morphology (UD and TüBa-D/Z)
  • Lemmas
  • Topological fields
  • UD dependency relations

A more detailed description of the annotation guidelines can be found in the stylebook.
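As a rough illustration of how the layers listed above might be accessed, the following sketch reads one sentence in a CoNLL-U-style, tab-separated layout. This page does not state the distribution format, so the column layout is an assumption. The `TopoField` key in the last column is a hypothetical placeholder for the topological field layer, and the sample sentence with its annotations is invented for the example. Consult the stylebook for the actual encoding.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    lemma: str      # lemma layer
    upos: str       # universal part-of-speech tag
    xpos: str       # STTS part-of-speech tag
    feats: dict     # inflectional morphology
    head: int       # index of the syntactic head (0 = root)
    deprel: str     # UD dependency relation
    misc: dict      # further layers; "TopoField" is a hypothetical key


def parse_key_values(field):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means empty."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def parse_sentence(block):
    """Parse one sentence block with ten tab-separated columns per token."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tokens.append(Token(
            form=cols[1], lemma=cols[2], upos=cols[3], xpos=cols[4],
            feats=parse_key_values(cols[5]), head=int(cols[6]),
            deprel=cols[7], misc=parse_key_values(cols[9]),
        ))
    return tokens


# Invented sample; annotations are illustrative, not taken from the treebank.
SAMPLE_ROWS = [
    ["1", "Das", "der", "PRON", "PDS", "Case=Nom|Number=Sing", "3", "nsubj", "_", "TopoField=VF"],
    ["2", "ist", "sein", "AUX", "VAFIN", "Number=Sing|Person=3", "3", "cop", "_", "TopoField=LK"],
    ["3", "gut", "gut", "ADJ", "ADJD", "Degree=Pos", "0", "root", "_", "TopoField=MF"],
    ["4", ".", ".", "PUNCT", "$.", "_", "3", "punct", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

if __name__ == "__main__":
    for tok in parse_sentence(SAMPLE):
        print(tok.form, tok.upos, tok.xpos, tok.deprel, tok.misc.get("TopoField"))
```

Keeping each layer in its own field makes it straightforward to query them in combination, for example all tokens with a given STTS tag in a given topological field.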

Subcorpora

Subcorpus               Genre                        Sentences  Tokens   Access
Europarl                Parliamentary proceedings    2.2M       55M      Download
Political speeches      Speeches held by officials   619,152    12.8M    Download
Die Tageszeitung (taz)  Newspaper                    29.9M      393.7M   Contact us
Wikipedia               Encyclopedia                 45.5M      917.5M   Download
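To put the subcorpus sizes in relation to each other, the short sketch below derives the overall size and the mean sentence length per subcorpus from the figures in the table. Most of those figures are rounded, so the results are approximate. The dictionary and variable names are chosen for this example only.

```python
# (sentences, tokens) per subcorpus, copied from the table above.
# Values given as "2.2M" etc. are rounded, so derived numbers are approximate.
SUBCORPORA = {
    "Europarl": (2_200_000, 55_000_000),
    "Political speeches": (619_152, 12_800_000),
    "Die Tageszeitung (taz)": (29_900_000, 393_700_000),
    "Wikipedia": (45_500_000, 917_500_000),
}

total_sentences = sum(s for s, _ in SUBCORPORA.values())
total_tokens = sum(t for _, t in SUBCORPORA.values())

# Mean sentence length in tokens for each subcorpus.
mean_length = {name: t / s for name, (s, t) in SUBCORPORA.items()}

if __name__ == "__main__":
    print(f"Total: {total_sentences:,} sentences, {total_tokens:,} tokens")
    for name, length in mean_length.items():
        print(f"{name}: about {length:.1f} tokens per sentence")
```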

View and search

Add links to the treebanks in TüNDRA here.

Licensing

Citation

Please cite the following reference if you use this treebank in your work:

TüBa-D/DP stylebook, Daniël de Kok and Sebastian Pütz, 2019, Seminar für Sprachwissenschaft, University of Tübingen