Seminar für Sprachwissenschaft

The Quantitative Linguistics group is currently working on several projects. Short descriptions of the projects associated with our group follow. For a more detailed view of the research ideas we are investigating, see Harald Baayen's website. Projects that have been completed or whose funding has ended are listed under Previous Projects.

 

DFG-Cwic

Complex words in context

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)

Website

 

Details on DFG-Cwic

Project Cwic: Complex words in context

Recent years have seen impressive advances in the fields of natural language processing (NLP) and artificial intelligence (AI). State-of-the-art language technologies have been made possible by advances in machine learning utilising many-layered 'deep' learning artificial neural networks. However, what deep learning networks detect in language use, and what probabilistic information they exploit to generate predictions for computational language tasks, often remains unclear (but see Linzen & Baroni, 2021, for recent advances). For engineering purposes, this is not a problem, but for understanding language and the cognition of language processing, this state of affairs is highly unsatisfactory. The discriminative lexicon model (DLM) (Baayen et al., 2019; Chuang & Baayen, 2021) is an attempt to combine the strengths of the mathematics of error-driven learning with the new possibilities offered by word embeddings for the computational modeling of the mental lexicon and lexical processing. Word embeddings, which we will also refer to as 'semantic vectors', represent word meanings as points in a high-dimensional space calculated from word usage in large text corpora.
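
The idea of word meanings as points in a high-dimensional space can be illustrated with a minimal sketch. The three words and their four-dimensional vectors below are invented toy values, not embeddings derived from any corpus and not part of the DLM itself; the function name `cosine_similarity` is likewise only illustrative. Cosine similarity is one common way to compare such vectors, assumed here for the sake of the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy 'semantic vectors' (real embeddings typically have
# hundreds of dimensions and are estimated from large text corpora).
semantic_vectors = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}

sim_cat_dog = cosine_similarity(semantic_vectors["cat"], semantic_vectors["dog"])
sim_cat_car = cosine_similarity(semantic_vectors["cat"], semantic_vectors["car"])

print(f"cat vs. dog: {sim_cat_dog:.3f}")
print(f"cat vs. car: {sim_cat_car:.3f}")
```

In this toy setup, the vectors for "cat" and "dog" point in nearly the same direction, whereas "cat" and "car" do not, which mirrors the intuition that words used in similar contexts end up close together in the space.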

Members

  • R. Harald Baayen (Principal Investigator)
  • Konstantin Sering (Postdoctoral researcher)

ERC-SUBLIMINAL

Subliminal learning in the Mandarin lexicon

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)

Website

 

Details on ERC-SUBLIMINAL

Project aims

Central to this research project is the observation that there are regularities and systematicities in the spoken language that escape our awareness, that are shielded from us by linguistic traditions and cultural conventions embodied in writing systems, but that nevertheless are detected by our brains, albeit subliminally, and used to optimize lexical processing.

Philosophers such as Immanuel Kant, Edmund Husserl, and Maurice Merleau-Ponty, and more recently the cognitive scientist Donald Hoffman, have called attention to how our perception of reality is shaped by and filtered through our minds and bodies. According to Hoffman, mathematically, fitness beats truth: our perceptions of the world are tuned to our survival. Writing systems are culturally evolved technologies that also hide from our eyes and ears the truth about what we really hear and say. Obviously, in order to work, writing systems must abstract away from the full richness of the spoken word. However, many features of our speech that are masked by writing systems are nevertheless exploited by our cognitive system when we listen or speak. For native speakers, mismatches between speech and writing are relatively unproblematic. For second language acquisition, however, mismatches can render learning unnecessarily difficult.

The research programme addresses this issue for Mandarin Chinese. Two kinds of mismatches will be investigated, using state-of-the-art methods in computational modeling, distributional semantics, and statistical analysis: subliminal mismatches between what written words are supposed to sound like, and how they are actually spoken, and subliminal mismatches between how the writing system is supposed to work, and how it actually functions and, as a semiotic system of its own, influences thought. These investigations will inform the applied goal of this project: developing ways to enhance vocabulary learning of Mandarin Chinese as a second language.

Publications

Tseng, Y.-H., Chen, P.-E., Lian, D.-C., and Hsieh, S.-K. (2024). The Semantic Relations in LLMs: An Information-theoretic Compression Approach. In Dong, T., Hinrichs, E., Han, Z., Liu, K., Song, Y., Cao, Y., Hempelmann, C. F., Sifa, R. (Eds.), Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING-2024, Italy, 8-21. Torino, Italy: ELRA and ICCL.

Chuang, Y.-Y., Baayen, R. H., and Bell, M. (2023). Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English. In Skarnitzl, R., and Volín, J. (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences, Czech Republic, 1603-1607. Prague, Czech Republic: Guarant International.

Presentations

Tseng, Y.-H., Chen, P.-E., Lian, D.-C., and Hsieh, S.-K., The Semantic Relations in LLMs: An Information-theoretic Compression Approach, Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge), Torino, Italy, May 21, 2024.

Baayen, R. H., Modeling Mandarin tones on two-word compounds, Colloquium English Language and Linguistics, Düsseldorf, Germany, January 19, 2024.

Baayen, R. H., Frequency-Informed Learning, Colloquium Out of Our Minds, Birmingham, United Kingdom, October 11, 2023.

Yang, Y., Measure words in Mandarin, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.

Tseng, Y.-H., Lian, D.-C., and Watty, D., Modeling diachronic semantic change of (Pre-Modern) Mandarin Chinese with contextualized embeddings & Word2Vec, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.

Yang, Y., and Baayen, R. H., Exploring semantic organization across mental lexicons: Perception verbs in Mandarin and English, International Cognitive Linguistics Conference (ICLC16), Düsseldorf, Germany, August 8, 2023 (poster presentation).

Chuang, Y.-Y., Baayen, R. H., and Bell, M., Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English, 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, August 7, 2023 (poster presentation).

Members

  • R. Harald Baayen (Professor, Principal Investigator)

  • Xiaoyun Jin (Doctoral researcher)

  • Zhexuan Li (Research assistant)

  • Yuxin Lu (Doctoral researcher)

  • Motoki Saito (Postdoctoral researcher)

  • Yu Hsiang Tseng (Postdoctoral researcher)

  • Yi Yang (Postdoctoral researcher)

Former members

  • Yu-Ying Chuang (Postdoctoral researcher)

  • Kun Sun (Postdoctoral researcher)

  • Weiting Wang (Research assistant)

  • Kai-Hui Yang (Research assistant)

  • Runzhi Zhang (Research assistant)

DFG-EML

Machine Learning for Science

Cluster of Excellence - Machine Learning for Science (Cluster speakers: Philipp Berens and Ulrike von Luxburg)

Website

Details on DFG-EML

Innovation Fund Project 1 in research area A - Beyond Prediction, Towards Understanding

In research area A, we will design algorithms that reveal complex structure and causal relationships from data in order to integrate machine learning into the scientific discovery process. Project 1 investigates "Enhancing Machine Learning of Lexical Semantics with Image Mining".

Members

  • Hendrik Lensch (Principal investigator)
  • R. Harald Baayen (Principal investigator)
  • Zohreh Ghaderi (PhD student)
  • Hassan Shahmohammadi (PhD student)