A3

Corpus-based semantic composition models for phrases

The A3 project investigated models of semantic composition for German and English phrases, focusing on adjective-noun phrases and prepositional phrases. The computational modeling used distributional word representations and deep learning methods, in particular recurrent neural networks (RNNs).

The relationship between composition and parsing was of particular interest. Existing composition models that have been integrated into parsers are usually trained jointly with the parser by supervised learning, typically on treebank data. In contrast, our approach trained composition models by unsupervised learning on large parsed corpora. The semantic phrase representations produced by these pretrained composition models could then be incorporated into a parser to improve its parsing accuracy.

The composition models were evaluated on a range of tasks, including the classification of semantic relations, PP attachment disambiguation, recognizing textual entailment, and text-image retrieval.

Publications

de Kok, D. & Beer, P. (2021). Fast and accurate dependency parsing for Dutch and German. Computational Linguistics in the Netherlands.
Falk, N., Strakatova, Y., Huber, E. & Hinrichs, E. (2021). Automatic classification of attributes in German adjective-noun phrases. Proceedings of the 14th International Conference on Computational Semantics (pp. 239-249).
Hinrichs, E., Fischer, P. & Strakatova, Y. (2021). Rover und TüNDRA: Such- und Visualisierungsplattformen für Wortnetze und Baumbanken. In H. Lobin, A. Witt & A. Wöllstein (Eds.), Deutsch in Europa (pp. 323-328). Berlin/Boston, MA: De Gruyter.
de Kok, D. & Pütz, T. (2020). Self-distillation for German and Dutch Dependency Parsing. Computational Linguistics in the Netherlands (CLIN).
de Kok, D., Falk, N. & Pütz, T. (2020). sticker2: A Neural Syntax Annotator for Dutch and German. CLARIN Jahrestagung 2020.
de Kok, D. & Falk, N. (2020). Reproducible annotation services for WebLicht. CLARIN Jahrestagung 2020.
Fischer, P., de Kok, D. & Hinrichs, E. (2020). When beards start shaving men: A Subject-object resolution test suite for morpho-syntactic and semantic model introspection. Proceedings of the 28th International Conference on Computational Linguistics (COLING) (pp. 3019-3035). Barcelona, Spain.
Strakatova, Y., Falk, N., Fuhrmann, I., Hinrichs, E. & Rossmann, D. (2020). All that glitters is not gold: A gold standard of adjective-noun collocations for German. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC) (pp. 4368-4378). Marseille, France.
Dima, C., de Kok, D., Witte, N. & Hinrichs, E. (2019). No word is an island — A transformation weighting model for semantic composition. Transactions of the Association for Computational Linguistics 2019 (pp. 437-451).
Fischer, P., Pütz, P. & de Kok, D. (2019). Association metrics in neural transition-based dependency parsing. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019) (pp. 181-189). Paris, France, July 2019.
de Kok, D., Fischer, P., Dima, C. & Hinrichs, E. (2018). Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16) (pp. 1-9). Prague, Czech Republic.
Pütz, T., de Kok, D., Pütz, S. & Hinrichs, E. (2018). Sequence2Sequence or perceptrons for lemmatization. An empirical examination. Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT17). Oslo, Norway, December 2018.
de Kok, D., Dima, C., Ma, J. & Hinrichs, E. (2017). Extracting a PP attachment data set from a German dependency treebank using topological fields. Proceedings of the 15th International Workshop on Treebanks and Linguistic Theories (TLT15) (pp. 89-98). Bloomington, IN, USA.
de Kok, D., Ma, J., Dima, C. & Hinrichs, E. (2017). PP attachment: Where do we stand? Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL) (pp. 311-317). Valencia, Spain, April 2017.
Dima, C., Ma, J., Bücking, S., Buscher, F., Herdtfelder, J., Lukassek, J., Prysłopska, A., Hinrichs, E., de Kok, D. & Maienborn, C. (2017). A corpus-based model of semantic plausibility for German bracketing paradoxes. Corpora in the Digital Humanities (CDH) (pp. 64-70). Bloomington, IN, USA, January 2017.
de Kok, D. & Hinrichs, E. (2016). Transition-based dependency parsing with topological fields. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL, Volume 2: Short Papers) (pp. 1-7). Berlin, Germany, August 2016. ACL 2016 Outstanding Paper.
Dima, C. (2016). On the compositionality and semantic interpretation of English noun compounds. Proceedings of 1st Workshop on Representation Learning for NLP (RepL4NLP) @ ACL 2016 (pp. 27-39). August 2016, Berlin, Germany.
Ma, J., Çöltekin, Ç. & Hinrichs, E. (2016). Learning phone embeddings for word segmentation of child-directed speech. Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning @ACL 2016 (pp. 53-63). Berlin, Germany, August 2016.
Ma, J., Dima, C., Barkey, R. & Hinrichs, E. (2016). Predicting compositionality of compounds using word vectors. Pre-Proceedings of Linguistic Evidence 2016 (pp. 107-111). Tübingen, Germany, February 2016.
Ma, J., Henrich, V. & Hinrichs, E. (2016). Letter sequence labeling for compound splitting. Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology @ ACL 2016 (pp. 76-81). Berlin, Germany, August 2016.
de Kok, D. (2015). A poor man’s morphology for German transition-based dependency parsing. Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (pp. 50-60). Warsaw, Poland, December 2015.
Dima, C. (2015). Reverse-engineering language: A study on the semantic compositionality of German compounds. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015) (pp. 1637-1642). Lisbon, Portugal, September 2015.
Dima, C. & Hinrichs, E. (2015). Automatic noun compound interpretation using deep neural networks and word embeddings. Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015) (pp. 173-183). London, UK, April 2015.
Henrich, V. & Hinrichs, E. (Eds.) (2015). Special issue of the Journal of Cognitive Science on Computational, Cognitive, and Linguistic Approaches to the Analysis of Compounds and Collocations 16(3).
Ma, J. & Hinrichs, E. (2015). Accurate linear-time Chinese word segmentation via embedding matching. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL, Volume 1: Long Papers) (pp. 1733-1743). Beijing, China, July 2015.
Sorokin, D., Dima, C. & Hinrichs, E. (2015). Classifying semantic relations in German nominal compounds using a hybrid annotation scheme. Special issue of the Journal of Cognitive Science on Computational, Cognitive, and Linguistic Approaches to the Analysis of Compounds and Collocations 16(3), 261-286.
Dima, C., Henrich, V., Hinrichs, E. & Hoppermann, C. (2014). How to tell a Schneemann from a Milchmann: An annotation scheme for compound-internal relations. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014) (pp. 1194-1201). Reykjavik, Iceland, May 2014.
Dima, C., Henrich, V., Hinrichs, E., Hoppermann, C. & Versley, Y. (2014). Annotating semantic relations in German noun-noun compounds. Pre-Proceedings of Linguistic Evidence 2014 (pp. 130-135). Tübingen, Germany, February 2014.
Henrich, V., Hinrichs, E., de Kok, D., Osenova, P. & Przepiórkowski, A. (Eds.) (2014). Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13). ISBN 978-3-9809183-9-8, Universität Tübingen, Seminar für Sprachwissenschaft, Tübingen, Germany, December 2014.
Henrich, V. & Hinrichs, E. (2014). Consistency of manual sense annotation and integration into the TüBa-D/Z Treebank. Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13) (pp. 62-74). Tübingen, Germany, December 2014.
Henrich, V. & Hinrichs, E. (Eds.) (2014). Proceedings of the Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations. 26th European Summer School in Logic, Language and Information (ESSLLI 2014). Universität Tübingen, Seminar für Sprachwissenschaft, Tübingen, Germany, August 2014.
Sorokin, D., Dima, C. & Hinrichs, E. (2014). Multi-label classification of semantic relations in German nominal compounds using SVMs. Proceedings of the ESSLLI 2014 Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (pp. 57-63). Tübingen, Germany, August 2014.
Henrich, V. & Hinrichs, E. (2013). Extending the TüBa-D/Z treebank with GermaNet sense annotation. In I. Gurevych, C. Biemann & T. Zesch (Eds.), Language Processing and Knowledge in the Web, Lecture Notes in Computer Science, Vol. 8105, (pp. 89-96).
Hinrichs, E., Henrich, V. & Barkey, C. (2013). Using part-whole relations for automatic deduction of compound-internal relations in GermaNet. Language Resources and Evaluation, special issue on "Wordnets and Relations" 47(3), Springer Netherlands, September 2013, 839-858.
Versley, Y. (2013). SFS-TUE: Compound paraphrasing with a language model and discriminative reranking. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, GA, USA.
Versley, Y. (2013). Subgraph-based classification of explicit and implicit discourse relations. 10th International Conference on Computational Semantics (IWCS 2013), Potsdam, Germany.
Versley, Y. & Gastel, A. (2013). Linguistic tests for discourse relations in the TüBa-D/Z corpus of written German. In S. Dipper, B. Webber & H. Zinsmeister (Eds.), Beyond Semantics: The Challenges of Annotating Pragmatic and Discourse Phenomena. Dialogue & Discourse 4(2), 142-173.
Brock, A., Henrich, V., Hinrichs, E. & Versley, Y. (2012). Automatic mining of valence compounds for German: A corpus-based approach. In J. C. Meister (Ed.), Digital Humanities 2012 Conference Abstracts (DH 2012) (pp. 126-129). Hamburg: Hamburg University Press.
Henrich, V., Hinrichs, E. & Vodolazova, T. (2012). An automatic method for creating a sense-annotated corpus harvested from the web. International Journal of Computational Linguistics and Applications (IJCLA) 3(2), 35-50.
Henrich, V., Hinrichs, E. & Vodolazova, T. (2012). WebCAGe – A web-harvested corpus annotated with GermaNet senses. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012) (pp. 387-396). Avignon, France.
Versley, Y. (2012). Supervised learning of German qualia relations. In M. Apidianaki, I. Dagan, K. Erk, J. Foster, Y. Marton, I. Rehbein, D. Seddah, R. Tsarfaty & P. Turney (Eds.), Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-SEM-MRL 2012) (pp. 12-23). Jeju, Korea.
Versley, Y., Brock, A., Henrich, V. & Hinrichs, E. (2012). Three approaches to finding valence compounds. Proceedings of the 11th Conference on Natural Language Processing (KONVENS 2012) (pp. 208-212). Vienna, Austria.
Versley, Y. & Henrich, V. (2012). Using nominal compounds for word sense discrimination. Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-SEM-MRL 2012) (pp. 36-41). Jeju, Korea.
Versley, Y. & Panchenko, Y. (2012). Not just bigger: Towards better-quality web corpora. In S. Sharoff & A. Kilgarriff (Eds.), Proceedings of the 7th Web as Corpus Workshop at WWW2012 (WAC7) (pp. 45-52). Lyon, France.
Henrich, V. & Hinrichs, E. (2011). Determining immediate constituents of compounds in GermaNet. In G. Angelova, K. Bontcheva, R. Mitkov & N. Nikolov (Eds.), Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2011) (pp. 420-426). Hissar, Bulgaria.
Versley, Y. (2011). Multilabel tagging of discourse relations in ambiguous temporal connectives. In G. Angelova, K. Bontcheva, R. Mitkov & N. Nikolov (Eds.), Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2011) (pp. 154-161). Hissar, Bulgaria.
Versley, Y. (2011). Towards finer-grained tagging of discourse connectives. DGfS AG Beyond Semantics. Göttingen, Germany.
Versley, Y. (2010). Discovery of ambiguous and unambiguous discourse connectives via annotation projection. In L. Ahrenberg, J. Tiedemann & M. Volk (Eds.), Workshop on the Annotation and Exploitation of Parallel Corpora (AEPC) (pp. 83-92). Tartu, Estonia.
Versley, Y., Beck, K., Hinrichs, E. & Telljohann, H. (2010). A syntax-first approach to high-quality morphological analysis and lemma disambiguation for the TüBa-D/Z treebank. Proceedings of the 9th Conference on Treebanks and Linguistic Theories (TLT9) (pp. 233-244). Tartu, Estonia.
Versley, Y. & Rehbein, I. (2009). Scalable discriminative parsing for German. International Conference on Parsing Technology (IWPT'09) (pp. 134-137). Paris, France.
Hinrichs, E. & Lau, M. (2008). In contrast - A complex discourse connective. Proceedings of the Sixth Conference on International Language Resources and Evaluation (LREC'08) (pp. 1433-1436). Marrakech, Morocco.
Versley, Y. (2008). Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs. Proceedings of the ESSLLI 2008 Workshop on Distributional Lexical Semantics.