Seminar für Sprachwissenschaft

TüBa-D/DP release 5

TüBa-D/DP is a machine-annotated dependency treebank of German. The goal of TüBa-D/DP is to offer high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z UD annotation guidelines (Çöltekin et al., 2017) as closely as possible.

Each text of the TüBa-D/DP is annotated with the following layers:

  • Universal part-of-speech tags
  • STTS part-of-speech tags
  • Inflectional morphology (UD and TüBa-D/Z)
  • Lemmas
  • Topological fields
  • UD dependency relations

A more detailed description of the annotation guidelines can be found in the stylebook.
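
To make the layers above concrete, the following Python sketch shows one way a single annotated token could be represented and read from a tab-separated line. It is an illustration only: the 10-column CoNLL-U-style layout, the hypothetical `TopoField` key in the last column, and the example token are assumptions made for this sketch and are not taken from the treebank's actual distribution format. Consult the stylebook and the downloaded files for the real encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_key_values(column: str) -> Dict[str, str]:
    """Parse a 'Key=Value|Key=Value' column; '_' or '' means no values."""
    if column in ("_", ""):
        return {}
    pairs = (item.split("=", 1) for item in column.split("|") if "=" in item)
    return {key: value for key, value in pairs}


@dataclass
class Token:
    """One token with the annotation layers listed above."""
    index: int
    form: str
    lemma: str                   # lemma layer
    upos: str                    # universal part-of-speech tag
    stts: str                    # STTS part-of-speech tag
    morphology: Dict[str, str]   # inflectional morphology features
    head: int                    # index of the syntactic head (0 = root)
    deprel: str                  # UD dependency relation
    topo_field: Optional[str]    # topological field, if present


def parse_token(line: str) -> Token:
    """Parse one line, ASSUMING a 10-column CoNLL-U-style layout."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    # ASSUMPTION: the topological field is stored under a hypothetical
    # 'TopoField' key in the last column.
    misc = parse_key_values(cols[9])
    return Token(
        index=int(cols[0]),
        form=cols[1],
        lemma=cols[2],
        upos=cols[3],
        stts=cols[4],
        morphology=parse_key_values(cols[5]),
        head=int(cols[6]),
        deprel=cols[7],
        topo_field=misc.get("TopoField"),
    )


# Made-up example token, not taken from the treebank.
example = (
    "1\tHäuser\tHaus\tNOUN\tNN\t"
    "Case=Nom|Gender=Neut|Number=Plur\t2\tnsubj\t_\tTopoField=VF"
)
token = parse_token(example)
print(token.lemma, token.upos, token.stts, token.deprel, token.topo_field)
```

Keeping each layer in its own field makes it straightforward to query one layer (for example, all tokens with a given STTS tag) without touching the others.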

Subcorpora

Subcorpus | Genre | Sentences | Tokens | Download | View / Search
Europarl | Parliamentary proceedings | 2.2M | 55M | Download | TüNDRA
Political speeches | Speeches held by officials | 619,152 | 12.8M | Download | TüNDRA
Die Tageszeitung (taz) | Newspaper | 29.9M | 393.7M | Contact us | -
Wikipedia | Encyclopedia | 45.5M | 917.5M | Download | TüNDRA

View and search

Several subcorpora of the TüBa-D/DP treebank (Europarl, Wikipedia, political speeches) can be viewed and searched with the TüNDRA web application.

Licensing

Citation

Please cite the following reference if you use this treebank in your work:

TüBa-D/DP stylebook, Daniël de Kok and Sebastian Pütz, 2019, Seminar für Sprachwissenschaft, University of Tübingen