TüBa-D/DP release 5
TüBa-D/DP is a machine-annotated dependency treebank of German. Its goal is to provide high-quality syntactic annotations for a large amount of contemporary German text. The annotations follow the TüBa-D/Z UD annotation guidelines (Çöltekin et al., 2017) as closely as possible.
Each text of the TüBa-D/DP is annotated with the following layers:
- Universal part-of-speech tags
- STTS part-of-speech tags
- Inflectional morphology (UD and TüBa-D/Z)
- Lemmas
- Topological fields
- UD dependency relations
A more detailed description of the annotation guidelines can be found in the stylebook.
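This page does not state the file format of the downloads. The Python sketch below shows one possible way the layers listed above could map onto per-token fields. It assumes the files use the tab-separated CoNLL-U format common to UD treebanks, with the STTS tag in the XPOS column. Both points are assumptions. The sketch reads only the single FEATS morphology column. It makes no assumption about where topological fields are stored. The example sentence is invented for illustration and is not taken from the treebank.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Token:
    form: str
    lemma: str
    upos: str              # Universal part-of-speech tag
    xpos: str              # ASSUMPTION: holds the STTS tag
    feats: Dict[str, str]  # inflectional morphology as feature=value pairs
    head: int              # index of the governing token (0 = root)
    deprel: str            # UD dependency relation
    misc: str              # free-form column; location of topological fields NOT assumed


def parse_feats(column: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per blank-line-separated sentence (assumed CoNLL-U)."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments such as '# text = ...'
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:  # skip multiword-token ranges and empty nodes
            continue
        sentence.append(Token(form=cols[1], lemma=cols[2], upos=cols[3], xpos=cols[4],
                              feats=parse_feats(cols[5]), head=int(cols[6]),
                              deprel=cols[7], misc=cols[9]))
    if sentence:  # the input may not end with a blank line
        yield sentence


# Hypothetical example sentence, invented for illustration (not from TüBa-D/DP).
ROWS = [
    ["1", "Das", "der", "DET", "ART", "Case=Nom|Gender=Neut|Number=Sing", "2", "det", "_", "_"],
    ["2", "Haus", "Haus", "NOUN", "NN", "Case=Nom|Gender=Neut|Number=Sing", "4", "nsubj", "_", "_"],
    ["3", "ist", "sein", "AUX", "VAFIN", "Mood=Ind|Number=Sing|Person=3|Tense=Pres", "4", "cop", "_", "_"],
    ["4", "groß", "groß", "ADJ", "ADJD", "Degree=Pos", "0", "root", "_", "_"],
    ["5", ".", ".", "PUNCT", "$.", "_", "4", "punct", "_", "_"],
]
SAMPLE = ["# text = Das Haus ist groß."] + ["\t".join(row) for row in ROWS] + [""]

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        for tok in sent:
            print(tok.form, tok.lemma, tok.upos, tok.xpos, tok.deprel, tok.head)
```

For a real file, `read_sentences(open(path, encoding="utf-8"))` would stream sentences one at a time. That matters for the larger subcorpora, because a full subcorpus cannot reasonably be held in memory at once.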
Subcorpora
Subcorpus | Genre | Sentences | Tokens | Download | View / Search |
Europarl | Parliamentary proceedings | 2.2M | 55M | Download | TüNDRA |
Political speeches | Speeches held by officials | 619,152 | 12.8M | Download | TüNDRA |
Die Tageszeitung (taz) | Newspaper | 29.9M | 393.7M | Contact us | |
Wikipedia | Encyclopedia | 45.5M | 917.5M | Download | TüNDRA |
View and search
Three subcorpora of the TüBa-D/DP treebank (Europarl, Wikipedia, and political speeches) can be viewed and searched with the TüNDRA web application.
Licensing
- The Europarl corpus is provided by Philipp Koehn and is also distributed as part of OPUS. Terms of use are described on the Europarl website.
- The political speeches corpus is provided by Adrien Barbaresi under the Creative Commons Attribution-ShareAlike 4.0 International License.
- The Wikipedia subcorpus is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
- The raw text of 'die tageszeitung' used in the corpus is copyright of contrapress media GmbH, Berlin. Licenses will be granted on a case-by-case basis at the discretion of the copyright holder, and may include charges or restrictions on the data use. Please contact tuebadz-info for more information.
Citation
Please cite the following reference if you use this treebank in your work:
TüBa-D/DP stylebook, Daniël de Kok and Sebastian Pütz, 2019, Seminar für Sprachwissenschaft, University of Tübingen