The TüBa-D/S treebank was annotated as part of the Verbmobil project, a long-term machine translation project for spontaneous speech funded by the German Federal Ministry for Education, Science, Research, and Technology (BMBF).
The Tübingen Treebank of Spoken German, TüBa-D/S, is a syntactically annotated corpus based on spontaneous dialogues, which were manually transcribed. The treebank comprises approximately 38 000 sentences (ca. 360 000 words). The syntactic annotation was performed manually.
The syntactic annotation is based on assumptions that are uncontroversial among major syntactic theories. The annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The primary ordering principle of a clause is the inventory of topological fields, which characterize the word order regularities among different clause types of German and are widely accepted among descriptive linguists of German. In addition to constituent structure, annotated trees contain edge labels between nodes. These edge labels encode grammatical functions (as relations between phrases) and the distinction between heads and non-heads (as phrase-internal relations).
The annotation scheme is surface-oriented in that it relies on a context-free backbone and uses neither crossing branches nor traces. Instead, it describes long-distance relations by specific functional labels.
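The four levels of constituency and the edge labels described above can be pictured as an ordinary ordered tree. The following is a minimal sketch in Python, not the treebank's actual data model: the `Node` class, the toy clause "ich komme morgen", and every category and edge label used here (`SIMPX`, `VF`, `LK`, `MF`, `NX`, `ON`, `HD`, `MOD`, and so on) are illustrative assumptions. The stylebook remains the authoritative source for the real label inventory.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    category: str                      # POS tag, phrase, field, or clause label
    edge: str = "-"                    # label on the edge to the parent node
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None         # set only on lexical (terminal) nodes

    def words(self) -> List[str]:
        """Return the words covered by this node, in surface order."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Hypothetical toy clause; all labels below are assumptions for illustration.
tree = Node("SIMPX", children=[                       # clausal level
    Node("VF", children=[                             # topological field
        Node("NX", edge="ON", children=[              # phrasal level
            Node("PPER", edge="HD", word="ich")])]),  # lexical level
    Node("LK", children=[
        Node("VXFIN", edge="HD", children=[
            Node("VVFIN", edge="HD", word="komme")])]),
    Node("MF", children=[
        Node("ADVX", edge="MOD", children=[
            Node("ADV", edge="HD", word="morgen")])]),
])

# Grammatical functions are read off the edge labels, not off tree positions.
subjects = [n.words() for n in tree.walk() if n.edge == "ON"]
```

Because each node holds an ordered list of children and `words()` simply concatenates them, every constituent covers a contiguous stretch of the sentence. That is the sense in which such a representation has a context-free backbone without crossing branches or traces.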
An extensive description of the complete annotation scheme can be found in the stylebook:
- Stylebook (pdf) (ca. 1.3 MB)
The treebank is available in 3 different formats:
The NEGRA export format can be used with the annotation tool Annotate (no longer maintained), which was developed in the NEGRA project at the Computational Linguistics Department at the University of the Saarland, or with the TIGERSearch tool, developed in the TIGER project at the Institute for Natural Language Processing, University of Stuttgart. The XML data can be viewed with any XML viewer.
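For users who prefer to process the export files with their own scripts, the sketch below reads a file in the NEGRA export format into plain Python records. It assumes the five-column layout of export format version 3 (word form or `#`-prefixed node id, tag, morphological tag, edge label, parent id), tab-separated columns, `#BOS`/`#EOS` sentence delimiters, and `%%` comment lines. Whether the TüBa-D/S files follow exactly this layout is an assumption to check against the data. The names `Row` and `read_export` and the sample sentence are made up for illustration.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    form: str    # word form, or an id such as "#500" for a nonterminal node
    tag: str     # POS tag (terminals) or node category (nonterminals)
    morph: str   # morphological tag, "--" if empty
    edge: str    # label on the edge to the parent
    parent: int  # id of the parent node, 0 for the root


def read_export(lines: Iterable[str]) -> Dict[int, List[Row]]:
    """Group the rows of an export file by sentence number."""
    sentences: Dict[int, List[Row]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#BOS"):
            current = sentences.setdefault(int(line.split()[1]), [])
        elif line.startswith("#EOS"):
            current = None
        elif current is not None and line and not line.startswith("%%"):
            # Columns may be padded with several tabs for alignment.
            form, tag, morph, edge, parent = re.split(r"\t+", line)[:5]
            current.append(Row(form, tag, morph, edge, int(parent)))
    return sentences


# Made-up sample in the assumed layout; it is not taken from the treebank.
SAMPLE = """\
#BOS 1 0 0 0
ich\tPPER\t--\tHD\t500
komme\tVVFIN\t--\tHD\t501
#500\tNX\t--\tON\t502
#501\tVXFIN\t--\tHD\t503
#502\tVF\t--\t-\t504
#503\tLK\t--\t-\t504
#504\tSIMPX\t--\t--\t0
#EOS 1
"""

sents = read_export(SAMPLE.splitlines())
terminals = [row.form for row in sents[1] if not row.form.startswith("#")]
```

Lines outside a `#BOS`/`#EOS` pair, such as header or table declarations, are skipped. A real file would be passed in as an open file object, with the encoding chosen to match the distribution.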
How to Obtain a License for TüBa-D/S:
For academic research, the license is provided free of charge. For all other uses, please contact Erhard Hinrichs for further details.
Please note that we do not give licenses to individuals.
Students who are interested in using TüBa-D/S for a research project or a thesis project should contact their advisors to obtain a license for their academic institutions. The license agreement must be signed by a duly authorized person.
For an academic research license, follow these steps:
- Print the License agreement for TüBa-D/S (PDF).
- Fill out the license agreement and send it back via post, fax or scan to tuebadz-info. Please include a short description of the intended academic research use.
- After processing the license, we will send you a password for the download webpage.
- Download the treebank.
Eberhard Karls Universität Tübingen
Department of Computational Linguistics
D-72074 Tübingen, Germany
Fax: +49 - 7071 - 29 5214