Seminar für Sprachwissenschaft

Description

GermaNet is a lexical-semantic net that relates German nouns, verbs, and adjectives. It defines semantic relations between distinct concepts, where each concept is represented as a set of synonyms (synset). A synset consists of one or more lexical units, each of which represents a specific sense of a word in its base form. For example, the word Stück has several senses, each of which belongs to a different synset, e.g. {Stück} (piece), {Stück, Musikstück, Komposition} (piece of music), and {Stück, Bühnenstück, Theaterstück} (stage play).
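
The synset and lexical unit structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names LexUnit and Synset, the helper make_synset, and the function synsets_for are hypothetical and are not part of any GermaNet distribution or API. The data is the Stück example from the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LexUnit:
    """One sense of a word, cited in its base form (hypothetical class)."""
    orth_form: str


@dataclass(frozen=True)
class Synset:
    """A concept, represented as a set of synonymous lexical units."""
    gloss: str                       # English gloss, for readability only
    lex_units: Tuple[LexUnit, ...]


def make_synset(gloss: str, *words: str) -> Synset:
    """Build a synset from a gloss and the base forms of its members."""
    return Synset(gloss, tuple(LexUnit(w) for w in words))


# The three senses of "Stück" given in the description.
SYNSETS: List[Synset] = [
    make_synset("piece", "Stück"),
    make_synset("piece of music", "Stück", "Musikstück", "Komposition"),
    make_synset("stage play", "Stück", "Bühnenstück", "Theaterstück"),
]


def synsets_for(word: str, synsets: List[Synset]) -> List[Synset]:
    """Return every synset that contains a lexical unit with this base form."""
    return [s for s in synsets
            if any(lu.orth_form == word for lu in s.lex_units)]


if __name__ == "__main__":
    for synset in synsets_for("Stück", SYNSETS):
        print(synset.gloss, [lu.orth_form for lu in synset.lex_units])
```

Looking up Stück returns all three synsets, while looking up Komposition returns only the "piece of music" synset, which mirrors the one-word, several-senses situation described above.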

Synsets are linked via conceptual relations, which hold for entire concepts. The main conceptual relations in GermaNet are those which organize synsets based on the generality of their concepts (hypernymy and its inverse hyponymy). Synsets are placed into the network in such a way as to form a progression from general to specific concepts using the hyponymy relation (e.g. {Kunstwerk} -> {Musikstück} -> {Trio}). Relations between individual words (lexical relations) also exist, the most important being synonymy (Stück, Musikstück, and Komposition are synonyms).
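
The general-to-specific progression above can be sketched as a small directed graph. The sketch below is a simplification and not GermaNet's storage format: each synset is labelled by a single member word, the dictionary HYPERNYMS and the functions invert and ancestors are hypothetical names, and the only data used is the Kunstwerk, Musikstück, Trio chain from the text.

```python
from collections import deque
from typing import Dict, List

# Each synset (labelled by one member word) maps to its direct hypernyms.
HYPERNYMS: Dict[str, List[str]] = {
    "Trio": ["Musikstück"],
    "Musikstück": ["Kunstwerk"],
    "Kunstwerk": [],
}


def invert(relation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Derive the inverse relation (hyponymy from hypernymy)."""
    inverse: Dict[str, List[str]] = {node: [] for node in relation}
    for child, parents in relation.items():
        for parent in parents:
            inverse.setdefault(parent, []).append(child)
    return inverse


def ancestors(node: str, hypernyms: Dict[str, List[str]]) -> List[str]:
    """Collect all more general concepts, nearest first (breadth-first)."""
    seen: List[str] = []
    queue = deque(hypernyms.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(hypernyms.get(current, []))
    return seen


HYPONYMS = invert(HYPERNYMS)

if __name__ == "__main__":
    print(ancestors("Trio", HYPERNYMS))
    print(HYPONYMS["Kunstwerk"])
```

Storing only one direction and deriving the other keeps the two relations consistent by construction, which is one possible way to model a relation and its inverse.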

For each of the word classes (noun, verb, adjective), the semantic space is divided into a number of semantic fields. Each synset is assigned a semantic field and a word class. However, connectivity is not restricted to words in the same semantic field or word class. For example, the relation 'causes' relates verbs to adjectives.
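
As a rough illustration of the point above, the following sketch attaches a word class and a semantic field to each synset and then filters for relations that cross word classes. Everything here is assumed for illustration: the class and function names are hypothetical, the semantic field labels are placeholders rather than GermaNet's actual field names, and the pair öffnen/offen is an example chosen for this sketch, not taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Synset:
    name: str            # one member word used as a label
    word_class: str      # "noun", "verb", or "adjective"
    semantic_field: str  # placeholder label, not a real GermaNet field name


# Hypothetical entries, chosen only to show a verb-to-adjective link.
OEFFNEN = Synset("öffnen", "verb", "field_A")
OFFEN = Synset("offen", "adjective", "field_B")

# Relations are (source, relation name, target) triples.
RELATIONS: List[Tuple[Synset, str, Synset]] = [
    (OEFFNEN, "causes", OFFEN),
]


def cross_class_relations(
    relations: List[Tuple[Synset, str, Synset]],
) -> List[Tuple[str, str, str]]:
    """Return the relations whose source and target differ in word class."""
    return [(src.name, rel, tgt.name)
            for src, rel, tgt in relations
            if src.word_class != tgt.word_class]


if __name__ == "__main__":
    print(cross_class_relations(RELATIONS))
```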

GermaNet contains only base forms of words. Nouns are cited in their nominative singular form, verbs are cited in their infinitive form, and adjectives are cited without endings for gender.
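
Because only base forms are stored, a lookup has to reduce an inflected form to its base form first. The sketch below shows that two-step lookup under loud assumptions: TOY_LEMMAS stands in for a real morphological analyzer, and the names toy_analyzer, LEXICON, and lookup are hypothetical. No particular analyzer or GermaNet API is implied.

```python
from typing import Callable, Dict, Optional, Set

# Toy stand-in for an external morphological analyzer (assumption).
TOY_LEMMAS: Dict[str, str] = {
    "Stücke": "Stück",       # plural noun -> nominative singular
    "abgebaut": "abbauen",   # participle -> infinitive
}


def toy_analyzer(token: str) -> str:
    """Map an inflected form to its base form; unknown tokens pass through."""
    return TOY_LEMMAS.get(token, token)


# Base forms assumed to be present in the lexicon for this sketch.
LEXICON: Set[str] = {"Stück", "abbauen", "Kosten"}


def lookup(token: str, lexicon: Set[str],
           analyzer: Callable[[str], str]) -> Optional[str]:
    """Return the base form under which the token is listed, or None."""
    base = analyzer(token)
    return base if base in lexicon else None


if __name__ == "__main__":
    for token in ("Stücke", "abgebaut", "Kosten", "Häuser"):
        print(token, "->", lookup(token, LEXICON, toy_analyzer))
```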

The design of GermaNet is based on the Princeton WordNet, but differs from it in several significant aspects:

  • Adjectives are structured hierarchically
  • GermaNet represents a fully connected graph
  • The causation relation can occur between words of different classes
  • The meronymy relation is treated uniformly

More detailed information about GermaNet's semantic fields, word classes, and relations can be found via the "Description Navigation" links. There are also sections on compounds, which require special care when being inserted into the graph, and on GermaNet's connectivity to other data sources.

Lexicographer Guidelines

The following guidelines are used by the GermaNet lexicographers:

  • GermaNet contains only base forms of words. It is assumed that inflected forms are mapped to base forms by an external morphological analyzer.
    • Nouns: Ordinary nouns are cited by their nominative singular form.
      Plurale tantum are cited by their nominative plural form, e.g.: Kosten.
      For nouns derived from adjectives or verbs, the indefinite nominative singular form is generally used, e.g.: (ein) Angestellter, (eine) Angestellte.
    • Verbs are cited by their infinitive form.
    • Adjectives are cited without endings for gender.
  • The amount of polysemy is kept to a minimum. Additional senses are introduced only if the sense conflicts with the coordinates of other senses of the word in the network. When in doubt, GermaNet refers to the degree of polysemy given in standard monolingual print dictionaries.
  • Abbreviations are covered if they form part of everyday language and are used in speech instead of the equivalent full form (e.g.: AIDS, SPD, EDV, LSD, etc.).
  • Multi-word expressions are covered if they are commonly used and if they function as lexical units due to the strong collocational relation between their parts (e.g. Hab und Gut, Erste Hilfe, instand setzen).
  • Concepts referring to human beings and thus indicating natural gender are treated as follows:
    Two separate synsets if the difference in gender is lexicalized (Mann/Frau).
    Otherwise, one synset with two lexical units, listing the masculine and the feminine form (Lehrer, Lehrerin).
  • Orthography: The new German orthography will be used.
    Additional citation forms may be listed as variants:
              Orth Form: Fantasie, Orth Var: Phantasie
              Orth Form: Selbstständigkeit, Orth Var: Selbständigkeit
              Orth Form: Cousine, Orth Var: Kusine

    The old spelling forms and their variants are listed as well:
              Old Orth Form: Schiffahrt
              Old Orth Form: Fluß
              Old Orth Form: Schwarz-Weiß-Photo, Old Orth Var: Schwarzweißphoto
  • Lexical Gaps/Artificial Concepts: Concepts which do not exist in German but which are required in order to build a proper hierarchy are marked as artificial. We refer to such concepts as lexical gaps (e.g. natürliches Phänomen). Note that attributive adjectives are cited in lower case unless they are lexicalized (no longer a lexical gap), as in Erste Hilfe.
  • Named Entities: Proper names are only covered if they refer to a single non-linguistic item in the real world. Therefore, geographical names, organizations, etc. (for example Deutschland, Bündnis für Arbeit) are marked as named entities, whereas nationalities are not. Proper names that refer to persons are not included.
  • Style Marking: Stylistic variants are marked by a special feature:
              schnipsen, stylistic variant: schnippen
              Po, stylistic variant: Arsch
              arbeiten, stylistic variant: schaffen
  • Definitions (Paraphrases): We provide a relatively small number of textual definitions for senses in GermaNet. Lexicographers add definitions when they feel that a particular sense is not adequately defined by its synonyms and/or its immediate neighbor nodes in the network.
    The definitions are non-formalized textual descriptions of the concepts:
              Horizont: Linie, an der sich Himmel und Erde bzw. Meer scheinbar berühren
    Example sentences may be given instead of, or in addition to, free text descriptions:
              abbauen(1): Sie haben das Gerüst schon wieder abgebaut.
              abbauen(2): Hier wird Kohle abgebaut.
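
The orthography guideline above (a main form plus optional variant and old-spelling forms) can be sketched as a lookup index that maps every listed spelling to its main form. This is an illustration only: the field names orth_form, orth_var, old_orth_form, and old_orth_var and the functions build_index and canonical are hypothetical and do not describe GermaNet's data format. The guideline lists Schiffahrt and Fluß only as old forms; the current spellings Schifffahrt and Fluss are filled in here as an assumption of this sketch.

```python
from typing import Dict, List, Optional

# One record per lexical unit; field names are hypothetical.
ENTRIES: List[Dict[str, str]] = [
    {"orth_form": "Fantasie", "orth_var": "Phantasie"},
    {"orth_form": "Selbstständigkeit", "orth_var": "Selbständigkeit"},
    {"orth_form": "Cousine", "orth_var": "Kusine"},
    # Current spellings below are assumed; the guideline gives only old forms.
    {"orth_form": "Schifffahrt", "old_orth_form": "Schiffahrt"},
    {"orth_form": "Fluss", "old_orth_form": "Fluß"},
]

SPELLING_FIELDS = ("orth_form", "orth_var", "old_orth_form", "old_orth_var")


def build_index(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Map every listed spelling of an entry to that entry's main form."""
    index: Dict[str, str] = {}
    for entry in entries:
        for field_name in SPELLING_FIELDS:
            spelling = entry.get(field_name)
            if spelling is not None:
                index[spelling] = entry["orth_form"]
    return index


def canonical(spelling: str, index: Dict[str, str]) -> Optional[str]:
    """Return the main form for a spelling, or None if it is not listed."""
    return index.get(spelling)


INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    for word in ("Phantasie", "Kusine", "Fluß", "Schiffahrt"):
        print(word, "->", canonical(word, INDEX))
```

With such an index, text written in either the old or the new orthography resolves to the same entry.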