The Stuttgart-Tübingen Tagset consists of a set of 54 part-of-speech tags for annotating German text corpora with word class information. It has become a standard for part-of-speech annotation of German.
Tübingen Treebank of Written German - TüBa-D/Z
The Tübingen Treebank of Written German (TüBa-D/Z) is a syntactically annotated newspaper corpus based on data from the daily newspaper "die tageszeitung". The syntactic annotation was performed manually.
Tübingen's Partially Parsed Corpus of Written German - TüPP-D/Z
TüPP-D/Z is a collection of articles from the daily newspaper "die tageszeitung" that have been automatically annotated with clause structure, topological fields, and chunks, in addition to lower-level annotation including parts of speech and morphological ambiguity classes.
The TüPP-D/Z data of the current release is taken from the 1999 HTML distribution (scientific edition) of the "tageszeitung", which includes newspaper articles from September 2, 1986 up to May 7, 1999 and which amounts to more than 200 million word tokens of text.
Web-Harvested Corpus Annotated with GermaNet Senses - WebCAGe
WebCAGe (short for: Web-Harvested Corpus Annotated with GermaNet Senses) is a domain-independent web-harvested corpus that has been semi-automatically annotated with senses from the German wordnet GermaNet. To ensure quality, all automatic annotations have been manually verified.
The Index Thomisticus Treebank is a syntactically annotated corpus of works by Thomas Aquinas. It is a dependency treebank of Latin texts containing 170,030 tokens in a total of 9,497 syntactically parsed and tagged sentences from three of Thomas Aquinas' works.