TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to offer high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z UD annotation guidelines (Çöltekin et al., 2017) as closely as possible.
Each text in TüBa-D/DP is annotated with the following layers:
- Universal part-of-speech tags
- STTS part-of-speech tags
- Inflectional morphology (UD and TüBa-D/Z)
- Topological fields
- UD dependency relations
A more detailed description of the annotation guidelines can be found in the stylebook.
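To illustrate how the annotation layers listed above might be accessed per token, here is a minimal Python sketch. It assumes a CoNLL-U-style, tab-separated file with ten columns, which this page does not confirm; check the stylebook for the actual distribution format. The sample sentence, the function names `parse_feats` and `read_tokens`, and the `TopoField=...` key used for topological fields are hypothetical and chosen only for illustration.

```python
# Sketch only: assumes CoNLL-U-style input (10 tab-separated columns).
# The sample data and the "TopoField" key are hypothetical, not taken
# from the actual TüBa-D/DP release.
SAMPLE = """\
# text = Das Haus steht
1\tDas\tder\tDET\tART\tCase=Nom|Gender=Neut|Number=Sing\t2\tdet\t_\tTopoField=VF
2\tHaus\tHaus\tNOUN\tNN\tCase=Nom|Gender=Neut|Number=Sing\t3\tnsubj\t_\tTopoField=VF
3\tsteht\tstehen\tVERB\tVVFIN\tNumber=Sing|Person=3\t0\troot\t_\tTopoField=LK
"""


def parse_feats(field):
    """Turn 'A=x|B=y' into {'A': 'x', 'B': 'y'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def read_tokens(block):
    """Return one dict per token, keyed by annotation layer."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        # Skip multiword-token ranges (1-2) and empty nodes (1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({
            "form": cols[1],
            "upos": cols[3],               # universal part-of-speech tag
            "xpos": cols[4],               # STTS part-of-speech tag
            "feats": parse_feats(cols[5]),  # inflectional morphology
            "head": int(cols[6]),          # index of the governing token
            "deprel": cols[7],             # UD dependency relation
            "misc": parse_feats(cols[9]),  # assumed home of topological fields
        })
    return tokens


for tok in read_tokens(SAMPLE):
    print(tok["form"], tok["upos"], tok["xpos"], tok["deprel"],
          tok["misc"].get("TopoField"))
```

Keeping each layer under its own key makes it straightforward to filter tokens by one layer (for example, all tokens in a given topological field) without re-parsing the file.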
|Subcorpus||Genre||Sentences||Tokens||Download||View / Search|
|Political speeches||Speeches held by officials||619,152||12.8M||Download||TüNDRA|
|Die Tageszeitung (taz)||Newspaper||29.9M||393.7M||Contact us|
Several subcorpora of the TüBa-D/DP treebank (Europarl, Wikipedia, political speeches) can be viewed and searched with the TüNDRA web application.
- The political speeches corpus is provided by Adrien Barbaresi under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The Wikipedia subcorpus is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
- The raw text of 'die tageszeitung' used in the corpus is copyrighted by contrapress media GmbH, Berlin. Licenses are granted on a case-by-case basis at the discretion of the copyright holder and may include charges or restrictions on data use. Please contact tuebadz-info for more information.
Please cite the following reference if you use this treebank in your work:
TüBa-D/DP stylebook, Daniël de Kok and Sebastian Pütz, 2019, Seminar für Sprachwissenschaft, University of Tübingen