Corpus-based Semantic Composition Models for Phrases

A3 studied semantic composition models for German and English phrases, focusing on adjective-noun phrases and prepositional phrases. The computational modelling used distributional word representations and deep learning techniques, in particular recurrent neural networks (RNNs).
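
As a rough illustration of how an RNN can compose word vectors into a phrase vector, the following Python sketch folds the vectors of an adjective-noun phrase through a single Elman-style recurrent layer. The function name rnn_compose, the toy 4-dimensional embeddings and the random, untrained weights are assumptions made for this example only; the architecture and training setup actually used in A3 are not specified here.

```python
import math
import random


def rnn_compose(word_vectors, w_in, w_rec, bias):
    """Fold a sequence of word vectors into one phrase vector.

    Each step computes h_t = tanh(W_in x_t + W_rec h_{t-1} + b);
    the final hidden state serves as the phrase representation.
    """
    dim = len(bias)
    hidden = [0.0] * dim
    for x in word_vectors:
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(dim))
                + bias[i]
            )
            for i in range(dim)
        ]
    return hidden


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 4  # toy dimensionality, assumed for illustration

# Hypothetical distributional vectors; real ones would come from a corpus.
embeddings = {
    word: [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    for word in ("red", "car")
}

w_in = random_matrix(DIM, DIM, rng)
w_rec = random_matrix(DIM, DIM, rng)
bias = [0.0] * DIM

# Phrase vector for the adjective-noun phrase "red car".
phrase = rnn_compose([embeddings["red"], embeddings["car"]], w_in, w_rec, bias)
print(phrase)
```

Because the hidden state is updated word by word, the result depends on word order, which distinguishes this kind of model from simple additive composition.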

Of special interest was the relation between composition and parsing. Existing composition models integrated into parsers are usually trained jointly with the parser in a supervised manner, typically only on treebank data. In contrast, our approach used unsupervised learning to train stand-alone composition models on large parsed corpora. The semantic phrase representations built by the pre-trained composition models could then be integrated into a parser to improve parsing accuracy.

The composition models were evaluated on several tasks, including semantic relation classification, PP-attachment disambiguation, recognising textual entailment and text-to-image retrieval.