A4

Comparing Meaning in Context

Generalizing Information Structure and Reference Text

Motivated by the general question of how meaning can be compared in realistic situations, where ill-formed language or differences in situational or world knowledge make a complete analysis difficult or impossible, the project investigated which linguistic representations can be used effectively and robustly for a computational-linguistic comparison of the meaning of sentences and text fragments.

Specifically, project A4 investigated the role of context in meaning composition using an authentic language task, the assessment of answers to reading comprehension questions. The project i) integrated a range of factors from the given task context into the meaning comparison that serves as the basis for assessing an answer, ii) extended the meaning comparison so that it also provided specific feedback for exercises that were deployed in real learning contexts as part of an online workbook in the transfer project T1, and iii) generalized the information-structural analysis developed in the project so that it became applicable in a wide variety of contexts.

Publications

De Kuthy, K., Kannan, M., Santhi Ponnusamy, H. & Meurers, D. (2022). Exploring neural question generation for formal pragmatics: Data set and model evaluation. Frontiers in Artificial Intelligence 5:966013. https://doi.org/10.3389/frai.2022.966013
Brunetti, L., De Kuthy, K. & Riester, A. (2021). The information-structural status of adjuncts: A QUD-based approach. Discours: Revue de linguistique, psycholinguistique et informatique. DOI: https://doi.org/10.4000/discours.11454
De Kuthy, K. (2021). Information structure. In S. Müller, A. Abeillé, R. D. Borsley & J.-P. Koenig (Eds.), Head-Driven Phrase Structure Grammar: The Handbook (Empirically Oriented Theoretical Morphology and Syntax). Berlin: Language Science Press.
De Kuthy, K., Kannan, M., Santhi Ponnusamy, H. & Meurers, D. (2020). Towards automatically generating Questions under Discussion to link information and discourse structure. Proceedings of the 28th International Conference on Computational Linguistics (COLING) (pp. 5786-5798). http://dx.doi.org/10.18653/v1/2020.coling-main.509
De Kuthy, K., Brunetti, L. & Berardi, M. (2019). Annotating information structure in Italian: Characteristics and cross-linguistic applicability of a QUD-based approach. Proceedings of the 13th Linguistic Annotation Workshop (pp. 113-123). Florence, Italy.
De Kuthy, K. & Konietzko, A. (2019). Information structural constraints on PP topicalization from NPs. In V. Molnár, V. Egerland & S. Winkler (Eds.), Architecture of Topic (pp. 203-222). Studies in Generative Grammar 136. Berlin/New York, NY: de Gruyter.
De Kuthy, K. & Stolterfoht, B. (2019). Focus projection revisited: Pitch accent perception in German. In S. Featherston, R. Hörnig, S. von Wietersheim & S. Winkler (Eds.), Information Structure and Semantic Processing (pp. 57-70). Linguistische Arbeiten 571. Berlin: de Gruyter.
De Kuthy, K., Reiter, N. & Riester, A. (2018). QUD-based annotation of discourse structure and information structure: Tool and evaluation. In N. Calzolari et al. (Eds.), Proceedings of the 11th Language Resources and Evaluation Conference (LREC) (pp. 1932-1938). Miyazaki, Japan.
Riester, A., Brunetti, L. & De Kuthy, K. (2018). Annotation guidelines for Questions under Discussion and information structure. In E. Adamou, K. Haude & M. Vanhove (Eds.), Information Structure in Lesser-Described Languages: Studies in Prosody and Syntax. Studies in Language Companion Series. John Benjamins.
Ziai, R. & Meurers, D. (2018). Automatic focus annotation: Bringing formal pragmatics alive in analyzing the information structure of authentic data. Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) (pp. 117-128). New Orleans, LA: ACL.
De Kuthy, K., Ziai, R. & Meurers, D. (2016). Focus annotation of task-based data: A comparison of expert and crowd-sourced annotation in a reading comprehension corpus. Proceedings of the 10th Edition of the Language Resources and Evaluation Conference (LREC) (pp. 3928-3935). Portorož, Slovenia. http://www.lrec-conf.org/proceedings/lrec2016/pdf/1083_Paper.pdf
De Kuthy, K., Ziai, R. & Meurers, D. (2016). Focus annotation of task-based data: Establishing the quality of crowd annotation. Proceedings of the 10th Linguistic Annotation Workshop (LAW) (pp. 110-119). Berlin, Germany. https://aclweb.org/anthology/W16-1713.pdf
Rudzewitz, B. (2016). Exploring the intersection of short answer assessment, authorship attribution, and plagiarism detection. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications. San Diego, CA.
Ziai, R., De Kuthy, K. & Meurers, D. (2016). Approximating givenness in content assessment through distributional semantics. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (*SEM) (pp. 209-218). Berlin, Germany. https://aclweb.org/anthology/S16-2026.pdf
De Kuthy, K., Ziai, R. & Meurers, D. (2015). Learning what the crowd can do: A case study on focus annotation. Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics. Universität Tübingen.
Rudzewitz, B. & Ziai, R. (2015). CoMiC: Adapting a short answer assessment system for answer selection. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 247-251). Association for Computational Linguistics.
Ziai, R. & Rudzewitz, B. (2015). CoMiC: Exploring text segmentation and similarity in the English entrance exams task. In L. Cappellato, N. Ferro, G. J. F. Jones & E. San Juan (Eds.), Working Notes of CLEF 2015 -- Conference and Labs of the Evaluation forum, volume 1391 of CEUR Workshop Proceedings.
Wall, A., Ott, N., Ziai, R. & Rudzewitz, B. (2014). Macunaíma as a data source for the distribution and interpretation of bare NPs in Brazilian Portuguese: Corpus preparation and first annotation agreement results. Poster at the Third International Conference on Ibero-Romance Historical Corpora (CODILI 3). Zurich, Switzerland.
Ziai, R. & Meurers, D. (2014). Focus annotation in reading comprehension data. Proceedings of the 8th Linguistic Annotation Workshop (LAW VIII). Dublin, Ireland.
Ott, N., Ziai, R., Hahn, M. & Meurers, D. (2013). CoMeT: Integrating different levels of linguistic modeling for meaning assessment. Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval). Atlanta, GA. Association for Computational Linguistics. (Poster)
Boyd, A. (2012). Detecting and Diagnosing Grammatical Errors for Beginning Learners of German: From Learner Corpus Annotation to Constraint Satisfaction Problems. Ph.D. thesis, The Ohio State University.
Hahn, M. & Meurers, D. (2012). Evaluating the meaning of answers to reading comprehension questions: A semantics-based approach. Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7). Montreal, Canada. Association for Computational Linguistics.
Ott, N., Ziai, R. & Meurers, D. (2012). Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in context. In T. Schmidt & K. Wörner (Eds.), Multilingual Corpora and Multilingual Corpus Analysis. Benjamins.
Quixal, M. (2012). Application-Driven Natural Language Processing. Shaping NLP to the Needs of Foreign Language Teaching and Learning. Ph.D. thesis, Universitat Pompeu Fabra Barcelona/Eberhard Karls Universität Tübingen.
Ziai, R., Ott, N. & Meurers, D. (2012). Short answer assessment: Establishing links between research strands. Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7). Montreal, Canada. Association for Computational Linguistics.
Hahn, M. & Meurers, D. (2011). On deriving semantic representations from dependencies: A practical approach for evaluating meaning in learner corpora. Proceedings of the Int. Conference on Dependency Linguistics (Depling 2011). Barcelona, Spain.
Krivanek, J. & Meurers, D. (2011). Comparing rule-based and data-driven dependency parsing of learner language. Proceedings of the Int. Conference on Dependency Linguistics (Depling 2011). Barcelona, Spain.
Meurers, D., Ziai, R., Ott, N. & Bailey, S. (2011). Integrating parallel analysis modules to evaluate the meaning of answers to reading comprehension questions. Special Issue on Free-text Automatic Evaluation. International Journal of Continuing Engineering Education and Life-Long Learning (IJCEELL) 21(4), 355-369.
Meurers, D., Ziai, R., Ott, N. & Kopp, J. (2011). Evaluating answers to reading comprehension questions in context: Results for German and the role of information structure. Proceedings of the TextInfer 2011 Workshop on Textual Entailment at EMNLP. Edinburgh, UK.
Meurers, D., Ott, N. & Ziai, R. (2010). Compiling a task-based corpus for the analysis of learner language in context. Pre-Proceedings of Linguistic Evidence 2010.
Ott, N. & Ziai, R. (2010). Evaluating dependency parsing performance on German learner language. Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT9). Tartu, Estonia, December 3-4, 2010.
Bailey, S. (2008). Content Assessment in Intelligent Computer-Aided Language Learning: Meaning Error Diagnosis for English as a Second Language. Ph.D. thesis, The Ohio State University.
Bailey, S. & Meurers, D. (2008). Diagnosing meaning errors in short answers to reading comprehension questions. Proceedings of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications (pp. 107-115). Association for Computational Linguistics.

Software and Corpora

Software

RFTagger Java Interface

Corpora

The following releases of the corpora created in the project are available on request under a CC-BY-NC-SA license (associated publications in parentheses):

Corpus of Reading Comprehension Exercises in German:
- CREG-17k (Ott, Ziai, and Meurers, 2012)
- CREG-1032 (Meurers, Ziai, Ott, and Kopp, 2011)
- CREG-1006 (improved version of CREG-1032)
- CREG-109 (Ott and Ziai, 2010)
- CREG-225 (extension of CREG-109: balanced distribution of correct/incorrect answers, with questions and target answers included)
- CREG-5K (larger, "clean" subcorpus with restrictive data selection and a balanced distribution of correct/incorrect answers)
- CREG-23K (very large subcorpus with two ratings per answer, but without perfect agreement and without a balanced distribution of correct/incorrect answers)
- CREG-TUE (3546 learner answers from a control group of 100 native speakers of German, annotated by the American project partners)
Corpus of Reading Comprehension Exercises in English:
- CREE (Meurers, Ziai, Ott, and Bailey 2011; Bailey 2008)

Further CREG corpus resources were created by the CoALLa project:

Condensed Context CREG-5k: This XML version of the CREG-5k corpus contains the condensed context of the CREG-5k corpus, i.e., the question, the target answers, the correct student answers, and the short and long reading context extracted from the reading comprehension texts.
CREG-MeanT: This corpus contains 2574 annotated student answers from CREG-5k, with form-meaning target hypothesis annotations for the student answers by two annotators. The annotation manual is available on the CoALLa project page, as is the article discussing the annotation experiment (Boyd, 2018a).

If you would also like to use these corpora, please simply send an e-mail to a4@sfs.uni-t[...]en.de.

In addition, the following resources are available for direct download:

DepLeSdeWaC, a dependency-parsed version of sDeWaC with automatically annotated lemmas and automatically annotated morphology.