Comparing Meaning in Context

Generalization of Information Structure and Reference Text

How can meaning be compared in realistic situations, in which ill-formed language or individual differences in situative or world knowledge complicate or even preclude a complete linguistic analysis? This question motivates project A4, which investigates the linguistic representations that can be used effectively and robustly to compare the meaning of sentences and text fragments computationally.

In particular, project A4 investigates the nature and interaction of context and sentential meaning in an authentic language-based task: teacher assessment of answers to reading comprehension questions. The project i) extends the meaning comparison underlying the content assessment so that it integrates the breadth of characteristics of the task context, ii) advances the content assessment so that it supports feedback for the real-life educational application developed in the transfer project T1, and iii) generalizes the information structural analysis successfully developed for explicit question-answer contexts to be applicable in a broader range of contexts.


Software Resources and Corpora

Software Resources

The following releases of corpora that emerged from this project are available upon request under a CC-BY-NC-SA license (related publications in parentheses):

  • Corpus of Reading Comprehension Exercises in German:
    • CREG-17k (Ott, Ziai, and Meurers, 2012)
    • CREG-1032 (Meurers, Ziai, Ott, and Kopp, 2011)
    • CREG-1006 (Refined version of CREG-1032)
    • CREG-109 (Ott and Ziai, 2010)
    • CREG-225 (Extension of CREG-109: balanced distribution of correct/incorrect answers, added questions and target answers)
    • CREG-5K (a larger, "clean" subcorpus with restrictive data selection criteria and a balanced distribution of correct/incorrect answers)
    • CREG-23K (very large subcorpus with two ratings per answer, but without perfect agreement and without balanced distribution of correct/incorrect answers)
    • CREG-TUE (3546 learner answers from a control group consisting of 100 native speakers of German, annotated by our partners in the US)
  • Corpus of Reading Comprehension Exercises in English:
    • CREE (Meurers, Ziai, Ott, and Bailey, 2011; Bailey, 2008)

Further CREG-related corpus resources were contributed by the CoALLa project:

  • Condensed Context CREG-5K: This XML version of the CREG-5K corpus contains the condensed context for each item, namely the question, the target answers, the correct student answers, and the short and long reading context extracted from the reading comprehension texts.
  • CREG-MeanT: This corpus contains 2574 student answers from CREG-5K, annotated with Form-Meaning Target Hypotheses by two annotators. The annotation manual is listed on the CoALLa project page, as is the publication discussing the inter-annotator experiments (Boyd, 2018a).

If you would like to use these corpora, please send us a message at a4@sfs.uni-t[...]en.de.

Furthermore, we have the following resources available for direct download:

  • DepLeSdeWaC, a dependency-parsed version of sDeWaC with automatically annotated lemmas and morphology.