The following projects are currently based in the Quantitative Linguistics research group and are being carried out by us. A description of the research ideas currently being pursued is available in English on Harald Baayen's homepage. Projects that the group worked on in the past but that are not currently funded are listed under Completed Projects.
Principal investigator: R. Harald Baayen (Professor of Quantitative Linguistics)
Recent years have seen impressive advances in the fields of natural language processing (NLP) and artificial intelligence (AI). State-of-the-art language technologies have been made possible by advances in machine learning that use many-layered 'deep' artificial neural networks. However, what deep learning networks detect in language use, and what probabilistic information they exploit to generate predictions for computational language tasks, often remains unclear (but see Linzen & Baroni, 2021, for recent advances). For engineering purposes this is not a problem, but for understanding language and the cognition of language processing, this state of affairs is highly unsatisfactory. The discriminative lexicon model (DLM) (Baayen et al., 2019; Chuang & Baayen, 2021) is an attempt to combine the strengths of the mathematics of error-driven learning with the new possibilities offered by word embeddings for the computational modeling of the mental lexicon and lexical processing. Word embeddings, which we will also refer to as 'semantic vectors', represent word meanings as points in a high-dimensional space calculated from word usage in large text corpora.
Principal investigator: R. Harald Baayen (Professor of Quantitative Linguistics)
At the heart of this research project is the observation that spoken language contains subtle regularities that escape our awareness but play an important role in language acquisition and language use.
Philosophers such as Immanuel Kant, Edmund Husserl, and Maurice Merleau-Ponty, as well as the cognitive scientist Donald Hoffman, hold that our perception of reality is shaped and filtered by our mind and body. According to the view outlined in this project, the same is true of our perception of language, which is filtered through our writing systems. Discrepancies between writing conventions and everyday spoken language are generally unproblematic for native speakers. Native speakers of English, for example, have no difficulty when the word "probably" is pronounced "prolly" in conversation. When learning a new language, however, the project proposes that such discrepancies may make second language acquisition unnecessarily difficult.
The research project is concerned with learning Mandarin Chinese, a language in which different words can consist of the same sounds but are pronounced with different tonal melodies depending on their meaning. The project examines in detail how Mandarin speakers actually pronounce words, focusing on how they use tonal melodies. It will also investigate how the unique writing system of Chinese creates multiple layers of meaning. Using state-of-the-art methods from computational modeling, distributional semantics, and statistical analysis, it will investigate how form and meaning fit together, and use the results to improve methods of vocabulary learning for Mandarin Chinese as a second language.
Tseng, Y.-H., Chen, P.-E., Lian, D.-C., and Hsieh, S.-K. (2024). The Semantic Relations in LLMs: An Information-theoretic Compression Approach. In Dong, T., Hinrichs, E., Han, Z., Liu, K., Song, Y., Cao, Y., Hempelmann, C. F., Sifa, R. (Eds.), Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING-2024, Italy, 8-21. Torino, Italy: ELRA and ICCL.
Chuang, Y.-Y., Baayen, R. H., and Bell, M. (2023). Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English. In Skarnitzl, R., and Volín, J. (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences, Czech Republic, 1603-1607. Prague, Czech Republic: Guarant International.
Tseng, Y.-H., and Baayen, R. H., Investigating forgetting curves with learning rule-derived interferences, The 31st Annual ACT-R Workshop, Tilburg, the Netherlands, July 23, 2024.
Baayen, R. H., and Heitmeier, M., Linear Discriminative Learning, Workshop at the International Word Processing Conference (WoProc 2024), Belgrade, Serbia, July 6, 2024.
Chuang, Y.-Y., Bell, M. J., Tseng, Y.-H., and Baayen, R. H., Word-specific tonal realizations in Mandarin. International Word Processing Conference (WoProc 2024), Belgrade, Serbia, July 5, 2024.
Tseng, Y.-H., Chen, P.-E., Lian, D.-C., and Hsieh, S.-K., The Semantic Relations in LLMs: An Information-theoretic Compression Approach, Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge), Torino, Italy, May 21, 2024.
Baayen, R. H., Modeling Mandarin tones on two-word compounds, Colloquium English Language and Linguistics, Düsseldorf, Germany, January 19, 2024.
Baayen, R. H., Frequency-Informed Learning, Colloquium Out of Our Minds, Birmingham, United Kingdom, October 11, 2023.
Yang, Y., Measure words in Mandarin, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.
Tseng, Y.-H., Lian, D.-C., and Watty, D., Modeling diachronic semantic change of (Pre-Modern) Mandarin Chinese with contextualized embeddings & Word2Vec, 2nd Joint Workshop on Chinese Lexical Semantic Change, Tübingen, Germany, September 6, 2023.
Yang, Y., and Baayen, R. H., Exploring semantic organization across mental lexicons: Perception verbs in Mandarin and English, International Cognitive Linguistics Conference (ICLC16), Düsseldorf, Germany, August 8, 2023 (poster presentation).
Chuang, Y.-Y., Baayen, R. H., and Bell, M., Do words sing their own tunes? Word-specific pitch realizations in Mandarin and English, 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, August 7, 2023 (poster presentation).
R. Harald Baayen (Professor, principal investigator)
Xiaoyun Jin (Doctoral researcher)
Yuxin Lu (Doctoral researcher)
Maziyah Mohamed (Postdoctoral researcher)
Motoki Saito (Postdoctoral researcher)
Yu-Hsiang Tseng (Postdoctoral researcher)
Yi Yang (Postdoctoral researcher)
Yu-Ying Chuang (Postdoctoral researcher)
Kun Sun (Postdoctoral researcher)