A4

Comparing Meaning in Context

Generalization of Information Structure and Reference Text

How can meaning be compared in realistic situations, in which ill-formed language or individual differences in situational or world knowledge complicate or even preclude a complete linguistic analysis? This question was the motivation for project A4, investigating which linguistic representations can be used effectively and robustly for comparing the meaning of sentences and text fragments computationally.

In particular, project A4 investigated the nature and interaction of context and sentential meaning in an authentic language-based task: teacher assessment of answers to reading comprehension questions. The project i) extended the meaning comparison underlying the content assessment so that it integrated the breadth of characteristics of the task context, ii) advanced the content assessment so that it supported feedback for the real-life educational application developed in the transfer project T1, and iii) generalized the information structural analysis successfully developed for explicit question-answer contexts to be applicable in a broader range of contexts.

Publications

De Kuthy, K., Kannan, M., Santhi Ponnusamy, H. & Meurers, D. (2022). Exploring neural question generation for formal pragmatics: Data set and model evaluation. Frontiers in Artificial Intelligence 5:966013. DOI: https://doi.org/10.3389/frai.2022.966013
Brunetti, L., De Kuthy, K. & Riester, A. (2021). The information-structural status of adjuncts: A QUD-based approach. Discours: Revue de linguistique, psycholinguistique et informatique. DOI: https://doi.org/10.4000/discours.11454
De Kuthy, K. (2021). Information structure. In S. Müller, A. Abeillé, R. D. Borsley & J.-P. Koenig (Eds.), Head-Driven Phrase Structure Grammar: The Handbook (Empirically Oriented Theoretical Morphology and Syntax). Berlin: Language Science Press.
De Kuthy, K., Kannan, M., Santhi Ponnusamy, H. & Meurers, D. (2020). Towards automatically generating Questions under Discussion to link information and discourse structure. Proceedings of the 28th International Conference on Computational Linguistics (COLING) (pp. 5786-5798). http://dx.doi.org/10.18653/v1/2020.coling-main.509
De Kuthy, K., Brunetti, L. & Berardi, M. (2019). Annotating information structure in Italian: Characteristics and cross-linguistic applicability of a QUD-based approach. Proceedings of the 13th Linguistic Annotation Workshop (pp. 113-123). Florence, Italy.
De Kuthy, K. & Konietzko, A. (2019). Information structural constraints on PP topicalization from NPs. In V. Molnár, V. Egerland & S. Winkler (Eds.), Architecture of Topic (pp. 203-222). Studies in Generative Grammar 136. Berlin/New York, NY: de Gruyter.
De Kuthy, K. & Stolterfoht, B. (2019). Focus projection revisited: Pitch accent perception in German. In S. Featherston, R. Hörnig, S. von Wietersheim & S. Winkler (Eds.), Information Structure and Semantic Processing (pp. 57-70). Linguistische Arbeiten 571. Berlin: de Gruyter.
De Kuthy, K., Reiter, N. & Riester, A. (2018). QUD-based annotation of discourse structure and information structure: Tool and evaluation. In N. Calzolari et al. (Eds.), Proceedings of the 11th Language Resources and Evaluation Conference (LREC) (pp. 1932-1938). Miyazaki, Japan.
Riester, A., Brunetti, L. & De Kuthy, K. (2018). Annotation guidelines for Questions under Discussion and information structure. In E. Adamou, K. Haude & M. Vanhove (Eds.), Information Structure in Lesser-Described Languages: Studies in Prosody and Syntax. Studies in Language Companion Series. John Benjamins.
Ziai, R. & Meurers, D. (2018). Automatic focus annotation: Bringing formal pragmatics alive in analyzing the information structure of authentic data. Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) (pp. 117-128). New Orleans, LA: ACL.
De Kuthy, K., Ziai, R. & Meurers, D. (2016). Focus annotation of task-based data: A comparison of expert and crowd-sourced annotation in a reading comprehension corpus. Proceedings of the 10th Edition of the Language Resources and Evaluation Conference (LREC) (pp. 3928-3935). Portorož, Slovenia. http://www.lrec-conf.org/proceedings/lrec2016/pdf/1083_Paper.pdf
De Kuthy, K., Ziai, R. & Meurers, D. (2016). Focus annotation of task-based data: Establishing the quality of crowd annotation. Proceedings of the 10th Linguistic Annotation Workshop (LAW) (pp. 110-119). Berlin, Germany. https://aclweb.org/anthology/W16-1713.pdf
Rudzewitz, B. (2016). Exploring the intersection of short answer assessment, authorship attribution, and plagiarism detection. Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications. San Diego, CA.
Ziai, R., De Kuthy, K. & Meurers, D. (2016). Approximating givenness in content assessment through distributional semantics. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics (*SEM) (pp. 209-218). Berlin, Germany. https://aclweb.org/anthology/S16-2026.pdf
De Kuthy, K., Ziai, R. & Meurers, D. (2015). Learning what the crowd can do: A case study on focus annotation. Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics. University of Tübingen.
Rudzewitz, B. & Ziai, R. (2015). CoMiC: Adapting a short answer assessment system for answer selection. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) (pp. 247-251). Association for Computational Linguistics.
Ziai, R. & Rudzewitz, B. (2015). CoMiC: Exploring text segmentation and similarity in the English entrance exams task. In L. Cappellato, N. Ferro, G. J. F. Jones & E. San Juan (Eds.), Working Notes of CLEF 2015 -- Conference and Labs of the Evaluation forum, volume 1391 of CEUR Workshop Proceedings.
Wall, A., Ott, N., Ziai, R. & Rudzewitz, B. (2014). Macunaíma as a data source for the distribution and interpretation of bare NPs in Brazilian Portuguese: Corpus preparation and first annotation agreement results. Poster at the Third International Conference on Ibero-Romance Historical Corpora (CODILI 3). Zurich, Switzerland.
Ziai, R. & Meurers, D. (2014). Focus annotation in reading comprehension data. Proceedings of the 8th Linguistic Annotation Workshop (LAW VIII). Dublin, Ireland.
Ott, N., Ziai, R., Hahn, M. & Meurers, D. (2013). CoMeT: Integrating different levels of linguistic modeling for meaning assessment. Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval). Atlanta, GA. Association for Computational Linguistics. (Poster)
Boyd, A. (2012). Detecting and Diagnosing Grammatical Errors for Beginning Learners of German: From Learner Corpus Annotation to Constraint Satisfaction Problems. Ph.D. thesis, The Ohio State University.
Hahn, M. & Meurers, D. (2012). Evaluating the meaning of answers to reading comprehension questions: A semantics-based approach. Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7). Montreal, Canada. Association for Computational Linguistics.
Ott, N., Ziai, R. & Meurers, D. (2012). Creation and analysis of a reading comprehension exercise corpus: Towards evaluating meaning in context. In T. Schmidt & K. Wörner (Eds.), Multilingual Corpora and Multilingual Corpus Analysis. Benjamins.
Quixal, M. (2012). Application-Driven Natural Language Processing. Shaping NLP to the Needs of Foreign Language Teaching and Learning. Ph.D. thesis, Universitat Pompeu Fabra Barcelona/Eberhard Karls Universität Tübingen.
Ziai, R., Ott, N. & Meurers, D. (2012). Short answer assessment: Establishing links between research strands. Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA7). Montreal, Canada. Association for Computational Linguistics.
Hahn, M. & Meurers, D. (2011). On deriving semantic representations from dependencies: A practical approach for evaluating meaning in learner corpora. Proceedings of the International Conference on Dependency Linguistics (Depling 2011). Barcelona, Spain.
Krivanek, J. & Meurers, D. (2011). Comparing rule-based and data-driven dependency parsing of learner language. Proceedings of the International Conference on Dependency Linguistics (Depling 2011). Barcelona, Spain.
Meurers, D., Ziai, R., Ott, N. & Bailey, S. (2011). Integrating parallel analysis modules to evaluate the meaning of answers to reading comprehension questions. Special Issue on Free-text Automatic Evaluation. International Journal of Continuing Engineering Education and Life-Long Learning (IJCEELL) 21(4), 355-369.
Meurers, D., Ziai, R., Ott, N. & Kopp, J. (2011). Evaluating answers to reading comprehension questions in context: Results for German and the role of information structure. Proceedings of the TextInfer 2011 Workshop on Textual Entailment at EMNLP. Edinburgh, UK.
Meurers, D., Ott, N. & Ziai, R. (2010). Compiling a task-based corpus for the analysis of learner language in context. Pre-Proceedings of Linguistic Evidence 2010.
Ott, N. & Ziai, R. (2010). Evaluating dependency parsing performance on German learner language. Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT9). Tartu, Estonia, December 3-4, 2010.
Bailey, S. (2008). Content Assessment in Intelligent Computer-Aided Language Learning: Meaning Error Diagnosis for English as a Second Language. Ph.D. thesis, The Ohio State University.
Bailey, S. & Meurers, D. (2008). Diagnosing meaning errors in short answers to reading comprehension questions. Proceedings of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications (pp. 107-115). Association for Computational Linguistics.

Software Resources and Corpora

Software Resources

RFTagger Java Interface

Corpora

The following corpus releases that emerged from this project are available on request under a CC-BY-NC-SA license (related publications in parentheses):

Corpus of Reading Comprehension Exercises in German:
- CREG-17k (Ott, Ziai, and Meurers, 2012)
- CREG-1032 (Meurers, Ziai, Ott, and Kopp, 2011)
- CREG-1006 (Refined version of CREG-1032)
- CREG-109 (Ott and Ziai, 2010)
- CREG-225 (Extension of CREG-109: balanced distribution of correct/incorrect answers, added questions and target answers)
- CREG-5K (a larger, "clean" subcorpus with restrictive data selection criteria and a balanced distribution of correct/incorrect answers)
- CREG-23K (very large subcorpus with two ratings per answer, but without perfect agreement and without balanced distribution of correct/incorrect answers)
- CREG-TUE (3546 learner answers from a control group consisting of 100 native speakers of German, annotated by our partners in the US.)
Corpus of Reading Comprehension Exercises in English:
- CREE (Meurers, Ziai, Ott, and Bailey, 2011; Bailey, 2008)

Further CREG-related corpus resources were contributed by the CoALLa project:

Condensed Context CREG-5K: This XML version of the CREG-5K corpus contains the condensed context of each item, namely the question, the target answers, the correct student answers, and the short and long reading context extracted from the reading comprehension texts.
CREG-MeanT: This corpus contains 2574 student answers from CREG-5K, each annotated by two annotators with Form-Meaning Target Hypothesis annotations. The annotation manual is listed on the CoALLa project page, as is the publication discussing the inter-annotator experiments (Boyd, 2018a).

If you would like to use these corpora, please send us a message at a4@sfs.uni-t[...]en.de.

Furthermore, we have the following resources available for direct download:

DepLeSdeWaC, a dependency-parsed version of sDeWaC with automatically annotated lemmas and morphology.