SFB 441 Project C2
Heads of the project
Prof. Dr. Marga Reis
Deutsches Seminar
Universität Tübingen
Wilhelmstraße 50
72074 Tübingen
Tel. +49/7071/29-76741; Fax +49/7071/29-5321
email: mer <[at]> uni-tuebingen.de
Prof. Dr. Erhard Hinrichs
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstr. 19
72074 Tübingen
Tel. +49/7071/29-75446
Fax +49/7071/29-5214
Former staff members
SFB 441 – Universität Tübingen
Dr. Georg Rehm
Dipl.-Inform. Oliver Schonefeld
Dr. Andreas Witt
SFB 538 – Universität Hamburg
Timm Lehmberg
SFB 632 – Universität Potsdam
Christian Chiarcos
Summary
Project C2 prepares language resources to ensure the accessible dissemination and sustainable storage of linguistic corpora. One of its main goals is a practical one: resources acquired in long-term projects at three Collaborative Research Centres have to be converted into one or more formats so that they remain usable by researchers and applications in the long term. Furthermore, the project will provide unified methods of access to the heterogeneous data acquired in these projects. In addition to preparing existing language corpora, the project will develop general methodologies and rules of best practice.
The linguistic resources dealt with by C2 are highly heterogeneous:
- the primary data itself is heterogeneous:
  - size (e.g., single sentences vs. entire articles)
  - text types / data types (e.g., newspaper texts, diachronic texts, dialogues, treebanks, ...)
  - modality (monologue vs. dialogue)
  - categories of information covered by the annotation / annotation levels (e.g., layout, textual structure, morpho-syntax, syntax, ...)
  - underlying linguistic theories
  - language
- the annotations require data structures of various types (attribute-value pairs, trees, pointers, etc.)
- data is annotated by means of different, task-specific annotation tools
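
To illustrate the kinds of data structures mentioned above, the following is a minimal sketch, not the project's actual format. It assumes a stand-off style in which each annotation points into the primary text by character offsets, carries attribute-value pairs, and may have child annotations forming a tree. All names (Span, Annotation, covered_text, leaves) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Pointer into the primary text: character offsets [start, end)."""
    start: int
    end: int


@dataclass
class Annotation:
    """One annotation: a pointer, attribute-value pairs, and child nodes."""
    span: Span
    features: Dict[str, str]
    children: List["Annotation"] = field(default_factory=list)


def covered_text(text: str, ann: Annotation) -> str:
    """Resolve the annotation's pointer against the primary text."""
    return text[ann.span.start:ann.span.end]


def leaves(ann: Annotation) -> List[Annotation]:
    """Collect the leaf annotations of a tree in left-to-right order."""
    if not ann.children:
        return [ann]
    result: List[Annotation] = []
    for child in ann.children:
        result.extend(leaves(child))
    return result


# Hypothetical example: a two-word sentence with a flat syntax tree.
text = "Dogs bark"
noun = Annotation(Span(0, 4), {"pos": "NN"})
verb = Annotation(Span(5, 9), {"pos": "VB"})
sentence = Annotation(Span(0, 9), {"cat": "S"}, [noun, verb])
```

Keeping the pointers separate from the primary text means that several annotation levels, possibly produced by different tools, can refer to the same text without modifying it.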