The TüBa-D/Z treebank has been manually annotated with senses from the German wordnet GermaNet, with the goal of providing a gold standard for word sense disambiguation. The underlying textual resource, the TüBa-D/Z treebank, is a German newspaper corpus already enriched with high-quality manual annotations at various levels of grammar. The sense inventory used for tagging word senses is taken from GermaNet, the German counterpart of the Princeton WordNet for English. With sense annotations for a selected set of 109 words (30 nouns and 79 verbs) that together occur 17 910 times in the TüBa-D/Z, the treebank currently represents the largest manually sense-annotated corpus available for GermaNet.

More information on the sense annotation, the annotation process, and the annotated lemmas can be found in the papers listed below.

The sense annotations are freely available as part of release 9.1 of the treebank.



If you use the TüBa-D/Z sense annotations in scientific or research work, please cite the following papers:


Verena Henrich and Erhard Hinrichs: Consistency of Manual Sense Annotation and Integration into the TüBa-D/Z Treebank. In Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13), Tübingen, Germany, December 2014, pp. 62-74.

Verena Henrich and Erhard Hinrichs: Extending the TüBa-D/Z Treebank with GermaNet Sense Annotation. In Iryna Gurevych, Chris Biemann, and Torsten Zesch (eds.): Language Processing and Knowledge in the Web, Lecture Notes in Computer Science, Vol. 8105, 2013, pp. 89-96.

