Seminar für Sprachwissenschaft

The Quantitative Linguistics group is currently working on several projects. A short description of each project associated with our group follows. For a more detailed view of the research ideas we are investigating, see Harald Baayen's website. Projects that have been completed or whose funding has ended can be found in Previous Projects.



Wide Incremental learning with Discrimination nEtworks

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)




Project aims

Central to this research project is the observation that there are regularities and systematicities in the spoken language that escape our awareness, that are shielded from us by linguistic traditions and cultural conventions embodied in writing systems, but that nevertheless are detected by our brains, albeit subliminally, and used to optimize lexical processing.

Philosophers such as Immanuel Kant, Edmund Husserl, and Maurice Merleau-Ponty, and more recently the cognitive scientist Hoffman, have called attention to how our perception of reality is shaped by and filtered through our minds and bodies. According to Hoffman, mathematically, fitness beats truth: our perceptions of the world are tuned to our survival. Writing systems are culturally evolved technologies that also hide from our eyes and ears the truth about what we really hear and say. Obviously, in order to work, writing systems must abstract away from the full richness of the spoken word. However, many features of our speech that are masked by writing systems are nevertheless exploited by our cognitive system when we listen or speak. For native speakers, mismatches between speech and writing are relatively unproblematic. For second language learners, however, such mismatches can make learning unnecessarily difficult.

The research programme addresses this issue for Mandarin Chinese. Two kinds of mismatches will be investigated, using state-of-the-art methods in computational modeling, distributional semantics, and statistical analysis: subliminal mismatches between what written words are supposed to sound like, and how they are actually spoken, and subliminal mismatches between how the writing system is supposed to work, and how it actually functions and, as a semiotic system of its own, influences thought. These investigations will inform the applied goal of this project: developing ways to enhance vocabulary learning of Mandarin Chinese as a second language.


Yang, Y., Measure words in Mandarin, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 7, 2023

Jin, X., Retroflex realization in the ShangHai dialect, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 7, 2023

Tseng, Y.-H., Lian, D.-C., and Watty, D., Modeling diachronic semantic change of (Pre-Modern) Mandarin Chinese with contextualized embeddings & Word2Vec, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 7, 2023

Chuang, Y.-Y., Baayen, R. H., and Bell, M., Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English, 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, August 7, 2023 (poster presentation).

Baayen, R. H., Chuang, Y.-Y., and Heitmeier, M., Discriminative learning and the lexicon: NDL and LDL, STEP2023 – CCP Spring Training in Experimental Psycholinguistics, Edmonton, Canada, June 14, 16, 2023 (virtual).


  • R. Harald Baayen (Professor, Principal Investigator)

  • Yu-Ying Chuang (Postdoc)

  • Xiaoyun Jin (Postdoc)

  • Yuxin Lu (Postdoc)

  • Kun Sun (Postdoc)

  • Yu Hsiang Tseng (Postdoc)

  • Weiting Wang (Research assistant)

  • Yi Yang (Postdoc)

  • Runzhi Zhang (Research assistant)


Spoken Morphology: Phonetics and phonology of complex words

DFG Research Unit FOR 2373 (Director: Prof. Dr. Ingo Plag)



Details on DFG-ART

Sub-project ART: The articulation of morphologically complex words

ART is a sub-project of the research unit "Spoken Morphology: Phonetics and phonology of complex words", which is funded by the Deutsche Forschungsgemeinschaft (DFG). The sub-project investigates the articulation of morphologically complex words using electromagnetic articulography.


  • R. Harald Baayen (Principal Investigator)
  • Benjamin V. Tucker (Mercator Fellow)
  • Fabian Tomaschek (Postdoc)
  • Motoki Saito (Research assistant)


Machine Learning for Science

Cluster of Excellence - Machine Learning for Science (Cluster speakers: Philipp Berens and Ulrike von Luxburg)


Details on DFG-EML

Innovation Fund Project 1 in research area A - Beyond Prediction, Towards Understanding

In research area A, we will design algorithms that reveal complex structure and causal relationships from data in order to integrate machine learning into the scientific discovery process. Project 1 investigates "Enhancing Machine Learning of Lexical Semantics with Image Mining".


  • Hendrik Lensch (Principal Investigator)
  • R. Harald Baayen (Principal Investigator)
  • Zohreh Ghaderi (PhD student)
  • Hassan Shahmohammadi (PhD student)