Seminar für Sprachwissenschaft

Sense-Annotated Corpora


Sense-Annotated TüBa-D/Z Treebank

The TüBa-D/Z treebank is a syntactically annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung' (taz). The treebank has been manually annotated with senses from GermaNet to provide a gold standard for word sense disambiguation. The sense annotations are freely available as part of release 9.1 of the treebank.

You can find more information on the sense annotations here.

To obtain the treebank data (including the sense annotations), please follow the steps described here.

Sense-Annotated WebCAGe

WebCAGe (short for Web-Harvested Corpus Annotated with GermaNet Senses) is a domain-independent, web-harvested corpus that has been semi-automatically annotated with senses from GermaNet. To ensure annotation quality, all automatic annotations have been manually verified.

You can download WebCAGe here.