Seminar für Sprachwissenschaft

Compounds in GermaNet

Decomposition

Composition is a very productive word formation process in German. For many applications, it is helpful to have information about the parts of a compound, as its semantic interpretation is usually based on the meaning of its parts. In GermaNet, nominal compounds are therefore split into their constituent parts, i.e., modifier and head. This splitting identifies the immediate constituents at each level of analysis and thus reflects the recursive nature of compounds that have more than two constituent parts, such as Autobahnanschlussstelle (‘motorway junction’). The immediate constituents of this compound are Autobahn and Anschlussstelle; the first constituent is then split further into Auto and Bahn, and the second into Anschluss and Stelle (see Figure 1).
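The recursive analysis described above can be pictured as a binary tree. The following is a minimal sketch in Python, not GermaNet's own data model: the class name Node and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One level of analysis: a word form and, if it is a compound,
    its two immediate constituents (hypothetical structure)."""
    form: str
    modifier: Optional["Node"] = None
    head: Optional["Node"] = None

    def leaves(self) -> List[str]:
        """Return the simplex constituents from left to right."""
        if self.modifier is None or self.head is None:
            return [self.form]
        return self.modifier.leaves() + self.head.leaves()


# The example from the text: two immediate constituents, each split once more.
tree = Node(
    "Autobahnanschlussstelle",
    modifier=Node("Autobahn", Node("Auto"), Node("Bahn")),
    head=Node("Anschlussstelle", Node("Anschluss"), Node("Stelle")),
)

print(tree.leaves())
```

Storing only the immediate constituents at each node keeps every level of analysis accessible, while the full list of simplex parts can still be derived by walking the tree.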

What makes compound splitting for German a challenging task is the fact that compounding is not always simple string concatenation, but often involves the presence of intervening linking elements or the elision of word-final characters in the modifier constituent of a compound (Henrich & Hinrichs, 2011). In GermaNet, all modifiers are lemmatized and if a modifier is ambiguous with respect to its word class (due to conversion), both possibilities are specified:

  • Laufschuhe: lauf- (en) [verb] and (der) Lauf [noun]
  • Baustelle: bau- (en) [verb] and (der) Bau [noun]
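To illustrate why compounding is not always simple string concatenation, the following sketch classifies how a lemmatized modifier attaches to its head. It is not the splitting algorithm used for GermaNet. The function name attachment and the inventory of linking elements are assumptions made for this example only.

```python
# Assumed inventory of German linking elements (not taken from GermaNet).
LINKING_ELEMENTS = ("s", "es", "n", "en", "er", "e", "ens")


def attachment(compound, modifier, head):
    """Classify how the modifier lemma is joined to the head.

    Returns a pair (kind, material), where material is the linking
    element that was inserted or the characters that were elided.
    """
    c, m, h = compound.lower(), modifier.lower(), head.lower()
    if not c.endswith(h) or len(c) <= len(h):
        return ("unknown", "")
    stem = c[: len(c) - len(h)]  # what precedes the head in the compound
    if stem == m:
        return ("concatenation", "")
    if stem.startswith(m) and stem[len(m):] in LINKING_ELEMENTS:
        return ("linking element", stem[len(m):])
    if m.startswith(stem):
        return ("elision", m[len(stem):])
    return ("unknown", "")


print(attachment("Apfelbaum", "Apfel", "Baum"))
print(attachment("Hiobsbotschaft", "Hiob", "Botschaft"))
print(attachment("Farbgebung", "Farbe", "Gebung"))
```

The three calls cover plain concatenation, an intervening linking element, and the elision of a word-final character in the modifier.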

Compound splitting in GermaNet is supported by an automatic algorithm, which combines several individual compound splitters. Please see the referenced paper below for more information on the automatic splitting. All automatically split compounds are manually post-corrected and enriched with relevant properties before they are inserted into GermaNet.

Properties

The following properties are specified for modifiers and/or heads:

Abbreviation

If one part of the compound is an abbreviation, it is labelled as Abkürzung.

Examples:

Compound    Modifier              Head
SIM-Karte   SIM (abbreviation)    Karte
ISO-Norm    ISO (abbreviation)    Norm
Bonus-CD    Bonus                 CD (abbreviation)

Affixoid

Affixoids are morphemes with a special status between bound and free morphemes. As they have a clearly assigned meaning, it makes sense to split the respective words. The bound morpheme is labelled as Affixoid.

Examples:

Compound           Modifier              Head
Grundfrage         grund (affixoid)      Frage
Riesenchance       riesen (affixoid)     Chance
Hauptsaison        haupt (affixoid)      Saison
Generalschlüssel   general (affixoid)    Schlüssel

Foreign Word

If one part (or more) of the compound is not a German word, it is labelled as Fremdwort. Note that borrowed constituents that are now established loanwords listed in a standard German dictionary (such as Duden) are not considered foreign words in GermaNet (e.g. Drink and Pool in the examples below).

Examples:

Compound       Modifier                   Head
Longdrink      long (foreign word)        Drink
Swimmingpool   swimming (foreign word)    Pool
Logdatei       log (foreign word)         Datei

Konfix

The label Konfix refers to a word which is borrowed from a foreign language, in many cases from Latin or Greek, and whose meaning stems from that particular language. Konfixes are bound morphemes, but unlike affixes, two Konfixes can be combined to form a so-called Konfixkompositum. Such Konfixkomposita are not split in GermaNet, whereas compounds consisting of a Konfix and a native word are split.

Examples:

Compound     Modifier          Head
Milligramm   milli (Konfix)    Gramm
Zentimeter   zenti (Konfix)    Meter
Monokultur   mono (Konfix)     Kultur

Opaque Morpheme

Modifiers whose meaning is not transparent any more without considering the etymology of the word are labelled with the property opaques Morphem.

Examples:

Compound     Modifier                   Head
Himbeere     Him (opaque morpheme)      Beere
Karfreitag   Kar (opaque morpheme)      Freitag
Sintflut     Sint (opaque morpheme)     Flut
Lebkuchen    Leb (opaque morpheme)      Kuchen
Elfenbein    Elfen (opaque morpheme)    Bein

Proper Name

If the whole compound is a named entity, it is not split in GermaNet. If only the modifier is a proper name, the compound is split and the label Eigenname is added to the modifier.

Examples:

Compound          Modifier                 Head
Hubbleteleskop    Hubble (proper name)     Teleskop
Wertherstimmung   Werther (proper name)    Stimmung
Hiobsbotschaft    Hiob (proper name)       Botschaft

Virtual Word Form

Virtual word forms, labelled as Virtuelle Bildung, are regularly built according to existing word formation rules. However, they do not exist in isolation, but only as part of a compound.

Examples:

Compound        Modifier    Head
Einflussnahme   Einfluss    Nahme (virtual word form)
Fragesteller    Frage       Steller (virtual word form)
Farbgebung      Farbe       Gebung (virtual word form)

Word Group

Modifiers consisting of a phrase are marked as Wortgruppe and the parts of the phrase are annotated as the modifier.

Examples:

Compound                 Modifier                       Head
Dreiwege-Katalysator     drei Weg (word group)          Katalysator
Nacht-und-Nebel-Aktion   Nacht und Nebel (word group)   Aktion
Pro-Kopf-Einkommen       pro Kopf (word group)          Einkommen

The following table gives an overview of the constituent parts of a compound (i.e. modifier and head) and the corresponding properties that are annotated for each constituent in GermaNet:

Property            Modifier   Head
Abbreviation        x          x
Affixoid            x          x
Foreign Word        x          x
Konfix              x
Opaque Morpheme     x          x
Proper Name         x
Virtual Word Form              x
Word Group          x
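The overview table above can also be expressed as a small lookup structure, for instance to check annotations programmatically. This is a sketch under the assumption that the table is read as shown (Konfix, Proper Name and Word Group on modifiers only, Virtual Word Form on heads only). The names ALLOWED_POSITIONS and is_valid are hypothetical and not part of any GermaNet API.

```python
# Which constituent each property may be annotated on (read off the table).
ALLOWED_POSITIONS = {
    "Abbreviation": {"modifier", "head"},
    "Affixoid": {"modifier", "head"},
    "Foreign Word": {"modifier", "head"},
    "Konfix": {"modifier"},
    "Opaque Morpheme": {"modifier", "head"},
    "Proper Name": {"modifier"},
    "Virtual Word Form": {"head"},
    "Word Group": {"modifier"},
}


def is_valid(prop, position):
    """Return True if the property may be annotated on this constituent."""
    return position in ALLOWED_POSITIONS.get(prop, set())


print(is_valid("Konfix", "modifier"))
print(is_valid("Virtual Word Form", "modifier"))
```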

Disambiguation of Compound Components

Since the semantic interpretation of compounds typically depends on the meanings of their constituent elements, detailed information about these components is valuable for many applications. However, the components of compounds are often polysemous, which makes adequate computational analysis difficult unless the relevant sense of each component is taken into account. For this reason, the polysemous constituents of the compounds contained in GermaNet have been systematically disambiguated, and all compound components have been annotated with the IDs of the corresponding lexical units.

Modifier Disambiguation

The link between a modifier and its respective sense is established through the specification of the corresponding ID. The semantic relationship between a compound and its modifier allows for a wide range of possible interpretations.

Examples:

Compound    Modifier                      Head
Eiswürfel   Eis ID_01 (frozen water)      Würfel
Eisbecher   Eis ID_02 (ice cream)         Becher
Süßwasser   süß ID_03 (taste-specific)    Wasser

If a compound contains two different modifiers, disambiguation is carried out for both components.

Example:

Compound    Modifier                                     Head
Laufschuh   Lauf ID_04 (running motion)                  Schuh
            lauf- (en) ID_05 (moving quickly on foot)    Schuh

A modifier can sometimes be interpreted in multiple ways; in such cases, all potential meaning variants are recorded by specifying the relevant IDs.

Examples:

Compound        Modifier                                Head
Glaubensfrage   Glaube ID_06 (an unproven conviction)   Frage
                Glaube ID_07 (religious belief)
Spielvariante   Spiel ID_08 (sport competition)         Variante
                Spiel ID_09 (activity done for fun)
                Spiel ID_10 (artistic performance)

If the modifier is an affixoid, a Konfix, a foreign word, a semantically opaque morpheme, a word of a word class not included in GermaNet, or a complex word group, a semantic assignment is not possible.
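The modifier cases described above (one sense, two modifier readings, several possible senses, and no sense assignment) can be sketched as a small record type. This is an illustration only and not GermaNet's storage format: the class ModifierReading and the set NO_SENSE_PROPERTIES are hypothetical, and the ID_xx values are the placeholders used in the examples of this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Properties for which, according to the text, no sense can be assigned.
NO_SENSE_PROPERTIES = {
    "Affixoid", "Konfix", "Foreign Word", "Opaque Morpheme", "Word Group",
}


@dataclass
class ModifierReading:
    """One reading of a modifier with the lexical unit IDs assigned to it."""
    lemma: str
    sense_ids: List[str] = field(default_factory=list)
    prop: Optional[str] = None

    def is_consistent(self) -> bool:
        """A reading with a non-disambiguable property must carry no IDs."""
        return not (self.prop in NO_SENSE_PROPERTIES and self.sense_ids)


# Two modifier readings, each disambiguated separately.
laufschuh = [ModifierReading("Lauf", ["ID_04"]), ModifierReading("laufen", ["ID_05"])]
# One modifier with several admissible senses.
glaubensfrage = [ModifierReading("Glaube", ["ID_06", "ID_07"])]
# An opaque morpheme receives no sense ID.
himbeere = [ModifierReading("Him", prop="Opaque Morpheme")]

for readings in (laufschuh, glaubensfrage, himbeere):
    print([(r.lemma, r.sense_ids, r.is_consistent()) for r in readings])
```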

Head Disambiguation

In most cases, a compound appears within the conceptual hierarchy as a direct or indirect hyponym of a higher-level hypernym. Accordingly, the head constituent is assigned the ID of the hypernym, since the compound semantically represents a subcategory of that hypernym.

Examples:

Compound    Modifier        Head
Hausschuh   Haus ID_11      Schuh ID_12 (footwear)
Fahrkarte   fahren ID_13    Karte ID_14 (ticket, receipt)
Landkarte   Land ID_15      Karte ID_16 (map)
Chipkarte   Chip ID_17      Karte ID_18 (data carrier)

If the compound has a different hypernym than the head constituent, the head’s ID is assigned when the compound can semantically be interpreted as a type of that head.

Examples:

Compound    Modifier        Head
Backform    backen ID_19    Form ID_20 (artifact)
Surfbrett   surfen ID_21    Brett ID_22 (board)

A Backform is a kind of Form (as an artifact), and a Surfbrett is a kind of Brett; however, their respective hypernyms in GermaNet are kitchenware for Backform and winter sports equipment for Surfbrett.

If the compound cannot be semantically interpreted as a kind of its head, no ID is assigned to the head constituent. For example, a Nichtraucher (“non-smoker”) is not a type of Raucher (“smoker”), and Acrylglas (“acrylic glass”) is not a type of Glas (“glass”).

Similarly, no ID is assigned if there is a part–whole relation between the compound and the head constituent. In such cases, the relation to the head is recorded explicitly as a part–whole relation. For example, Viertelliter (“quarter liter”) is not a type of Liter (“liter”), but part of a liter, so no head ID is given. Instead, the part–whole relation is recorded as:

Viertelliter – has_portion_holonym – Liter
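The head rules above amount to a short decision procedure. The sketch below assumes that the two judgments (whether the compound is a kind of its head, and whether it is a part of its head) have already been made by an annotator. The function head_annotation and its parameters are hypothetical. The relation name has_portion_holonym is taken from the single example above, and other part–whole cases may call for a different relation.

```python
def head_annotation(compound, head, head_sense_id, *, kind_of_head,
                    part_of_head=False, relation="has_portion_holonym"):
    """Decide what is recorded for the head constituent.

    kind_of_head and part_of_head are manual judgments supplied by the caller.
    """
    if part_of_head:
        # No head ID; the part-whole relation is recorded instead.
        return {"head": head, "head_id": None,
                "relation": (compound, relation, head)}
    if kind_of_head:
        return {"head": head, "head_id": head_sense_id, "relation": None}
    # Neither a kind of the head nor a part of it: no ID is assigned.
    return {"head": head, "head_id": None, "relation": None}


print(head_annotation("Hausschuh", "Schuh", "ID_12", kind_of_head=True))
print(head_annotation("Nichtraucher", "Raucher", None, kind_of_head=False))
print(head_annotation("Viertelliter", "Liter", None,
                      kind_of_head=False, part_of_head=True))
```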

Figurative Meanings

If a compound is used idiomatically or metaphorically in its overall meaning, no semantic sense is assigned to either constituent.

Examples:
Frauenschuh (as an orchid species), Eselsbrücke (“mnemonic”), Fettnäpfchen (“social blunder”)

If only the head constituent is used metaphorically, no ID is assigned to the head, but the modifier is still disambiguated.

Examples:

Compound     Modifier       Head
Baulöwe      Bau ID_23      Löwe
             bauen ID_24
Glückspilz   Glück ID_25    Pilz
Zaunkönig    Zaun ID_26     König

Download

In addition to the information described above that is included in GermaNet (since release 8.0), a list of split compounds with their modifier(s) and head is freely available for download here:

The list of compound data is free for academic research as defined in GermaNet's academic research licence agreement. For any other intended purposes, please contact us.

The file contains one compound per line: first the compound itself, then a tab character, then the modifier (if there are two modifiers, they are separated by the pipe symbol |), then another tab character, and finally the head. For example:

Apfelbaum      Apfel   Baum
Goldmünze     Gold   Münze
Laufband       laufen|Lauf     Band
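A line in this format can be read with a few lines of Python. The following is a minimal sketch, assuming that the fields are separated by single tab characters as described above. The names SplitCompound and parse_line are hypothetical and not part of any GermaNet tool.

```python
from typing import List, NamedTuple


class SplitCompound(NamedTuple):
    compound: str
    modifiers: List[str]  # one entry, or two if the modifier is ambiguous
    head: str


def parse_line(line: str) -> SplitCompound:
    """Parse one tab-separated line of the split-compound list."""
    compound, modifiers, head = line.rstrip("\n").split("\t")
    return SplitCompound(compound, modifiers.split("|"), head)


print(parse_line("Apfelbaum\tApfel\tBaum\n"))
print(parse_line("Laufband\tlaufen|Lauf\tBand\n"))
```

A line that does not contain exactly three tab-separated fields raises a ValueError during unpacking, which makes malformed input easy to detect.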

Reference

The following paper describes the automatic compound splitting that is performed before the manual post-correction. If you want to use the split compounds in the context of scientific or research work, please refer to the paper:

Verena Henrich and Erhard Hinrichs: Determining Immediate Constituents of Compounds in GermaNet. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2011), Hissar, Bulgaria, September 2011, pp. 420-426.