Seminar für Sprachwissenschaft

SFB 441 Project C2

Heads of the project

Prof. Dr. Marga Reis
Deutsches Seminar
Universität Tübingen
Wilhelmstraße 50
72074 Tübingen
Tel. +49/7071/29-76741; Fax +49/7071/29-5321
email: mer <[at]> uni-tuebingen.de
 
Prof. Dr. Erhard Hinrichs
Seminar für Sprachwissenschaft
Universität Tübingen
Wilhelmstr. 19
72074 Tübingen
Tel. +49/7071/29-75446
Fax +49/7071/29-5214
 

Former staff members

SFB 441 – Universität Tübingen

Dr. Georg Rehm
Dipl.-Inform. Oliver Schonefeld
Dr. Andreas Witt

SFB 538 – Universität Hamburg

Timm Lehmberg

SFB 632 – Universität Potsdam

Christian Chiarcos

Summary

Project C2 aims to prepare language resources so as to ensure the accessible dissemination and sustainable storage of linguistic corpora. One of the project's main goals is a practical one: resources acquired in long-term projects at three Collaborative Research Centres have to be converted into one or more formats so that researchers and applications can use them sustainably. Furthermore, the project will provide unified methods of access to the heterogeneous data acquired in these projects. In addition to preparing existing language corpora, the project will develop general methodologies and best-practice guidelines.

The linguistic resources dealt with by C2 are highly heterogeneous:

  • the primary data itself is heterogeneous with respect to:
    • size (e.g., single sentences vs. entire articles)
    • text types / data types (e.g., newspaper texts, diachronic texts, dialogues, treebanks, ...)
    • modality (monologue vs. dialogue)
    • categories of information covered by the annotation / annotation levels (e.g., layout, textual structure, morpho-syntax, syntax, ...)
    • underlying linguistic theories
    • language
  • the annotations require data structures of various types (attribute-value pairs, trees, pointers, etc.)
  • data is annotated by means of different, task-specific annotation tools
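To make the data-structure point more concrete, here is a minimal sketch. It assumes a simple stand-off model in which every annotation node is anchored to token positions of the primary data. The class `Node`, the layer names and the helper functions are hypothetical illustrations, not the format used by the project. The sketch only shows how attribute-value pairs, trees and pointers can coexist in one model and be queried in a uniform way.

```python
from dataclasses import dataclass, field

# Hypothetical stand-off model (illustration only, not the project's format):
# every node points to token indices instead of embedding the text itself.
@dataclass
class Node:
    id: str
    layer: str                                               # annotation level
    tokens: list[int]                                        # anchors in the primary data
    features: dict[str, str] = field(default_factory=dict)  # attribute-value pairs
    children: list[str] = field(default_factory=list)        # tree edges
    pointers: list[str] = field(default_factory=list)        # non-tree links

TOKENS = ["Mary", "said", "she", "left"]

NODES = {n.id: n for n in [
    # Layer "pos": attribute-value pairs on single tokens
    Node("w0", "pos", [0], {"pos": "NNP"}),
    Node("w1", "pos", [1], {"pos": "VBD"}),
    Node("w2", "pos", [2], {"pos": "PRP"}),
    Node("w3", "pos", [3], {"pos": "VBD"}),
    # Layer "syntax": a (deliberately flat) tree built from children edges
    Node("np", "syntax", [0], {"cat": "NP"}, children=["w0"]),
    Node("vp", "syntax", [1, 2, 3], {"cat": "VP"}, children=["w1", "w2", "w3"]),
    Node("s", "syntax", [0, 1, 2, 3], {"cat": "S"}, children=["np", "vp"]),
    # Layer "coref": a pointer from the pronoun to its antecedent
    Node("m1", "coref", [2], pointers=["w0"]),
]}

def text(node_id: str) -> str:
    """Return the primary-data string that a node covers."""
    return " ".join(TOKENS[i] for i in NODES[node_id].tokens)

def layer(name: str) -> list[Node]:
    """Uniform access: all nodes belonging to one annotation level."""
    return [n for n in NODES.values() if n.layer == name]

def leaves(node_id: str) -> list[str]:
    """Walk tree edges down to the nodes that have no children."""
    node = NODES[node_id]
    if not node.children:
        return [node_id]
    return [leaf for child in node.children for leaf in leaves(child)]

if __name__ == "__main__":
    print([n.features["pos"] for n in layer("pos")])  # ['NNP', 'VBD', 'PRP', 'VBD']
    print(leaves("s"))                                 # ['w0', 'w1', 'w2', 'w3']
    m1 = NODES["m1"]
    print(text(m1.id), "->", text(m1.pointers[0]))     # she -> Mary
```

Because all three layers refer to the same token indices rather than to each other's markup, a new annotation level produced by a different tool can be added without touching the existing ones; this is one common motivation for stand-off representations.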