The Center hosts several research projects headed by external and local members. Research projects are either hosted directly at the Center or are associated with it.

Emmy Noether Junior Research Group “The Language Dynamics of the ancient Central Andes”

Just as the Inca are often considered the Andean civilization par excellence, the Quechuan language family is frequently portrayed as the prime representative of the Andean languages. In reality, however, the ancient Central Andean region was characterized by cultural and linguistic diversity and complex interregional relations.

Without neglecting the well-studied and widely distributed Central Andean language families such as Quechuan and Aymaran, Dr. Matthias Urban’s independent DFG-funded Emmy Noether Junior Research Group “The language dynamics of the ancient Central Andes” shifts the empirical focus towards the many ‘minor’ languages that were once spoken in the region (see map). The group explores how the study of language contact and language shift, taking into account the full original linguistic diversity of the Central Andes, can contribute to accounts and theories of the region’s prehistory.

This involves studying a representative range of contact and shift situations within the Central Andes. Andean geography and prehistoric sociocultural practices have given rise to a multitude of such situations, which the group explores both in their broadest scope and through detailed case studies. To do so, the group relies on a combination of methods from contact linguistics, historical linguistics, and anthropology.

More generally, the group embraces and aims to foster a view of linguistics as embedded in a concert of disciplines devoted to the study of human history and prehistory.

Automatic Identification of Language Contact and Borrowing (Marisa Köllner)

Language Contact and Borrowing
One of the most fascinating things about language is that it is among the largest and fastest-changing systems in human culture. Language change is driven not only by internal changes in sounds, words, or meanings, but also by language contact and the borrowing of words. Language diversification is traditionally represented by a tree model. However, it is well known in historical linguistics that the tree model of language diversification does not fully capture reality. To model language evolution more completely, network models seem to offer a solution: while the underlying tree represents the diversification of the languages, reticulations can be added to display language contact and lexical flow.

During language contact, words can be exchanged between languages. Both the factors that drive borrowing and the motivations that lead speakers to adopt borrowed words depend on the particular contact situation. Because there are no universal principles governing which words are borrowed, and because each loanword is adapted to the recipient language in its own way, identifying loanwords is a difficult task. Historical linguists have developed methods to compare languages, reconstruct ancestral states, and identify sound changes, and the comparative method can also shed light on language contact and loanwords.

The aim of my work is to identify contact scenarios and borrowing processes between languages in order to shed light on the relationships between languages and their evolution. The main challenges are collecting gold-standard data and identifying contact situations.

Computational models
Mathematical models and computational tools developed in phylogenetics, population genetics, and informatics can be adapted to historical linguistics to study linguistic prehistory. Several computational methods (e.g., automatic tree reconstruction and cognate detection) can be considered state-of-the-art methods for reconstructing language evolution and diversity. Models for loanword detection and contact inference, by contrast, are still in their infancy. The main obstacle to applying computational models to the study of language contact is the relative sparseness of language data and gold-standard data. Alongside collecting gold-standard data, my work adapts computational models to historical linguistics in order to automate the identification of loanwords and contact situations. In addition to methods from phylogenetics and population genetics, Bayesian models, machine learning, and deep learning can be adopted for this purpose.
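To make automated loanword identification a little more concrete, here is a minimal Python sketch of one very simple baseline. It is an illustrative assumption, not the method used in this project: for each concept, compare word forms across languages that are not known to be related, and flag pairs whose normalized Levenshtein distance falls below a threshold as borrowing candidates. The languages, word forms, the `related` set, and the 0.35 threshold are all hypothetical; a realistic pipeline would work on phonetic transcriptions and use sound-correspondence models and cognate detection to rule out inherited similarity.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer form."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def borrowing_candidates(wordlist, related, threshold=0.35):
    """Flag similar forms for the same concept in languages not known to be related.

    wordlist: {concept: {language: form}}
    related:  set of frozensets naming language pairs assumed to be related
    """
    candidates = []
    for concept, forms in wordlist.items():
        for (l1, w1), (l2, w2) in combinations(sorted(forms.items()), 2):
            if frozenset((l1, l2)) in related:
                continue  # similarity here may be inherited rather than borrowed
            d = normalized_distance(w1, w2)
            if d <= threshold:
                candidates.append((concept, l1, l2, round(d, 2)))
    return candidates


# Hypothetical toy data: languages A and B are assumed related; C is not.
wordlist = {
    "dog": {"A": "tavo", "B": "tavu", "C": "kiri"},
    "tea": {"A": "tsai", "B": "te", "C": "tsaj"},
}
related = {frozenset(("A", "B"))}
print(borrowing_candidates(wordlist, related))  # → [('tea', 'A', 'C', 0.25)]
```

A fixed similarity threshold is only a starting point: chance resemblances and inherited cognates both produce false positives, which is exactly why gold-standard data are needed to evaluate such detectors.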

Language change across time and space (Dr. Igor Yanovich)

The English progressive
Alina Ladygina and Igor Yanovich trace fine-grained temporal trajectories of the rise of the English progressive, as in “The girl is reading the book”. The construction is known to have risen significantly in prominence during the 19th century, but its rise had not been systematically studied at the level of individual words. Using a much larger dataset than previous studies, Ladygina and Yanovich find that verbs have radically different histories with regard to the progressive. These different trajectories cannot be explained by obvious linguistic features of the verbs, yet for most verbs they are directional. This raises important evolutionary questions: the differentiation suggests drift, whereas the directionality suggests a systematic force.
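What a per-verb trajectory means can be illustrated with a minimal Python sketch: given counts of progressive and total uses of each verb per period, compute the progressive share over time and fit a least-squares slope as a crude indicator of direction. The verbs, periods, and counts are invented for illustration and are not the authors' data. A slope alone cannot separate drift from directional change; that requires dedicated statistical tests.

```python
def progressive_shares(counts):
    """counts: list of (progressive_uses, all_uses) per period; totals must be > 0."""
    return [p / t for p, t in counts]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical counts (progressive uses, all uses) for three periods.
periods = [1800, 1850, 1900]
data = {
    "verb_a": [(5, 100), (15, 100), (30, 100)],   # steady rise
    "verb_b": [(20, 100), (18, 100), (21, 100)],  # fluctuates with little net change
}
for verb, counts in data.items():
    shares = progressive_shares(counts)
    print(verb, [round(s, 2) for s in shares], ols_slope(periods, shares))
```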

The spatial distributions of linguistic dialects
Dialectal data mostly come in categorical form. To study the spatial distributions of linguistic dialects, student assistant Mei-Shin Wu and long-term DFG Center fellow Igor Yanovich developed a simple extension of the familiar Moran’s I measure and the corresponding correlogram to categorical data. They are currently investigating whether taking population density into account leads to a more adequate notion of distance in spatial analyses.
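For a numeric variable x observed at n locations with spatial weights w_ij, Moran's I is I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The categorical extension developed by Wu and Yanovich is not spelled out here, so the Python sketch below shows only one common workaround, stated as an assumption: compute Moran's I separately on a 0/1 indicator for each category (e.g., whether a location uses variant A). The coordinates, inverse-distance weights, and variant labels are hypothetical. Repeating the computation with weights restricted to successive distance bands would yield a correlogram.

```python
import math


def morans_i(values, weights):
    """Moran's I for numeric values and a square weight matrix with zero diagonal."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    den = sum(d * d for d in dev)
    if den == 0:
        return float("nan")  # undefined when the variable is constant
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)


def inverse_distance_weights(coords):
    """Weights w_ij = 1 / distance(i, j) for i != j, 0 on the diagonal."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return w


def categorical_morans_i(labels, weights):
    """One Moran's I per category, computed on 0/1 indicator variables."""
    return {
        cat: morans_i([1.0 if lab == cat else 0.0 for lab in labels], weights)
        for cat in sorted(set(labels))
    }


# Hypothetical dialect locations and the variant used at each one.
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(categorical_morans_i(labels, inverse_distance_weights(coords)))
```

Here the two variants form spatial clusters, so both indicators show positive autocorrelation; with only two categories the two indicators are complements and give identical values of I.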

Language phylogenetic inference and temporal predictions
Igor Yanovich investigates the robustness of the temporal predictions produced by state-of-the-art phylogenetic inference methods for language families. To estimate the time of language origins and divergence in the absence of independent knowledge about linguistic rates of change, one needs to specify calibration points: algorithms use the known timing of particular language splits to estimate the actual rates of change. Naturally, calibrations based on historical events are much easier to obtain than prehistoric ones, but are historical calibrations enough? Yanovich finds that they often are not. In phylogenetic analyses, temporal estimates for the root of the tree can shift significantly when one adds just one or two high-probability constraints based on archaeological and historical-linguistic sources. For Indo-European, for example, differences on the order of a thousand years can arise. This underscores that only by combining multiple lines of evidence from different sciences of human (pre)history can one obtain meaningful predictions about the past.
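The role of calibrations can be seen in a deliberately oversimplified strict-clock model. This is an illustrative assumption, not the relaxed-clock Bayesian inference used in state-of-the-art studies. If a calibrated split has known age t and the two descendant lineages have accumulated a total divergence d along both branches, the rate per lineage is r = d / (2t), and the root age follows as t_root = d_root / (2r). The sketch averages the rate over all calibrations, so adding one older constraint moves the root estimate. All numbers are hypothetical.

```python
def clock_rate(divergence, age):
    """Rate per lineage per year under a strict clock, from d = 2 * r * t."""
    return divergence / (2.0 * age)


def root_age(root_divergence, calibrations):
    """Root age from the mean rate implied by (divergence, age) calibration pairs."""
    rates = [clock_rate(d, t) for d, t in calibrations]
    mean_rate = sum(rates) / len(rates)
    return root_divergence / (2.0 * mean_rate)


# Hypothetical calibrations: (observed divergence, known split age in years).
historical = [(0.30, 1500.0), (0.20, 1000.0)]
prehistoric = [(0.50, 4000.0)]

d_root = 0.9  # hypothetical divergence between the two deepest lineages
print(round(root_age(d_root, historical)))                # → 4500
print(round(root_age(d_root, historical + prehistoric)))  # → 5143
```

The size of the shift here is an artifact of the invented numbers; the point is only that the root estimate depends directly on which calibrations are included.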

Evolutionary game theory framework for understanding language change
Igor Yanovich extends Ashwini Deo’s recently proposed evolutionary-game-theoretic framework for analyzing the cross-linguistically robust progressive-to-imperfective cycle of language change. Deo’s original framework adopts an infinite-population assumption, which simplifies the analysis. However, Yanovich shows that some important features of the cycle cannot be reproduced in the infinite-population model, whereas a finite-population model fares much better, though it is analytically harder to work with. With his modification of Deo’s model, Yanovich obtains novel, empirically testable predictions about the relative length and stability of the different stages of the progressive-to-imperfective cycle.
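The contrast between infinite- and finite-population models can be sketched in Python. The fitness values are hypothetical and this is not Deo's actual game: two competing variants simply have fixed fitnesses. In an infinite population, the discrete-time replicator dynamics update the share x of variant 1 deterministically as x' = x f1 / (x f1 + (1 - x) f2). The finite population of n speakers is modeled here, as an assumption, by Wright-Fisher sampling: each generation draws n speakers binomially from the replicator-updated share, so random drift can fix or eliminate a variant, which the deterministic model never does.

```python
import random


def replicator_step(x, f1, f2):
    """Discrete-time replicator update for the share x of variant 1."""
    mean_fitness = x * f1 + (1 - x) * f2
    return x * f1 / mean_fitness


def infinite_population(x0, f1, f2, generations):
    """Deterministic trajectory with no drift."""
    xs = [x0]
    for _ in range(generations):
        xs.append(replicator_step(xs[-1], f1, f2))
    return xs


def finite_population(x0, f1, f2, generations, n, rng):
    """Wright-Fisher trajectory: selection, then binomial sampling of n speakers."""
    xs = [x0]
    for _ in range(generations):
        p = replicator_step(xs[-1], f1, f2)
        count = sum(rng.random() < p for _ in range(n))
        xs.append(count / n)
    return xs


rng = random.Random(1)
# Hypothetical fitnesses: variant 1 has a slight advantage.
print(infinite_population(0.1, 1.05, 1.0, 5))
print(finite_population(0.1, 1.05, 1.0, 5, n=20, rng=rng))
```

In the finite model, shares of 0 and 1 are absorbing states, so the time a variant spends at each stage becomes a random quantity; this is the kind of property that makes finite-population predictions differ from deterministic ones.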

SignBase (Dr. Christian Bentz)

Website: www.signbase.org

SignBase is an open-access database of geometric motifs on mobile objects from prehistory. It focuses on finds from the Eurasian Paleolithic and the African Middle Stone Age. In these periods, geometric motifs (also referred to as signs, patterns, or marks) are abundant in parietal art as well as on mobile objects. The term “geometric” denotes simple non-figurative forms such as dots, lines, and crosses, as well as more complex patterns. This includes frequent semi-abstract depictions such as vulvae, but excludes figurative depictions of animals, humans, etc. Decorated mobile objects are mostly made of osseous materials such as ivory, bone, or antler, although other organic and inorganic materials also occur.

Quantitative and phylogenetic accounts of linguistic diversity and change (Dr. Christian Bentz)

Languages differ vastly with regard to the range of phonemes, morphemes, words, and constructions used to encode information. Bentz applies information-theoretic methods in order to measure this world-wide linguistic diversity and to model and explain its evolution. In this context, a particular focus is on so-called “external” factors – such as geographical spread, population contact, and cultural exchange – and how these shape languages over historical and evolutionary time. To disentangle the complex interplay of these factors he uses cross-linguistic data, large-scale statistical analyses, and phylogenetic modelling tools.
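As a minimal example of an information-theoretic measure, the Python sketch below computes the plug-in (maximum-likelihood) Shannon entropy of a word-frequency distribution, H = -sum_w p(w) log2 p(w), in bits per word. This is an illustrative choice, not necessarily the estimator Bentz uses; plug-in estimates are biased downward for small samples, which is why more refined estimators exist. The toy texts are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in Shannon entropy (bits per token) of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy texts: one reuses a few word types, the other has no repeats.
text_a = "the dog saw the cat and the cat saw the dog".split()
text_b = "a quick brown fox jumps over one lazy sleeping dog".split()
print(round(unigram_entropy(text_a), 3))
print(round(unigram_entropy(text_b), 3))
```

A text whose ten tokens are all distinct reaches the maximum of log2(10) bits, while the repetitive text scores lower.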

ERC CrossLingference (Prof. Dr. Gerhard Jäger)

The "CrossLingference – Cross-linguistic statistical inference using hierarchical Bayesian models" project by Prof. Gerhard Jäger is based at the Institute of Linguistics at the University of Tübingen. CrossLingference is funded for five years by the European Research Council (ERC). The project aims to combine typological research with research in computational historical linguistics.
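Hierarchical Bayesian models are attractive for typology partly because languages in the same family are not independent data points. A minimal Python illustration of the underlying idea of partial pooling (not the CrossLingference model itself) uses a Beta-Binomial setup: each family's probability of having some feature shares a Beta(a, b) prior, so the posterior mean (a + y) / (a + b + n) pulls small families toward the shared prior mean while large families stay close to their own observed rate. The shared prior is fixed by hand here, which is an assumption; a full hierarchical model would also infer a and b from the data. Family names and counts are hypothetical.

```python
def posterior_mean(y, n, a, b):
    """Posterior mean of a Binomial rate under a Beta(a, b) prior."""
    return (a + y) / (a + b + n)


# Hypothetical counts: (languages with the feature, languages sampled) per family.
families = {"Family_X": (2, 2), "Family_Y": (40, 100), "Family_Z": (0, 1)}
a, b = 2.0, 3.0  # shared prior with mean a / (a + b) = 0.4, chosen by hand

for name, (y, n) in families.items():
    raw = y / n
    pooled = posterior_mean(y, n, a, b)
    print(f"{name}: raw={raw:.2f} pooled={pooled:.2f}")
```

The large family barely moves, while the two tiny families, whose raw rates of 1.00 and 0.00 rest on one or two languages, are shrunk strongly toward 0.4.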

Together with the completed ERC project "EVOLAEMP - Language Evolution: The Empirical Turn" and the DFG Center for Advanced Studies "Words, Bones, Genes, Tools: Tracking Linguistic, Cultural and Biological Trajectories of the Human Past", the CrossLingference project complements Tübingen's research in computational historical linguistics and phylogenetic typology.

CrossLingference webpage (https://uni-tuebingen.de/fakultaeten/philosophische-fakultaet/fachbereiche/neuphilologie/seminar-fuer-sprachwissenschaft/arbeitsbereiche/allg-sprachwissenschaft/projekte/crosslingference/)

ERC WIDE (Prof. Dr. Harald Baayen)

ERC advanced grant: Wide Incremental learning with Discrimination nEtworks

In this project, we are developing a computational model of the mental lexicon with the aim of providing a functional characterization of the cognitive skills that allow us to express our thoughts in words. In psychology and cognitive science, the organization of words in paper dictionaries has long served as a model for the organization of our lexical knowledge. Accordingly, listeners are assumed to first locate a word's form in a list of word forms; the entry in this list is then taken to provide access to the word's meaning. Our model, by contrast, generates words' forms and meanings on the fly from acoustic, visual, or semantic input. The generative processes underlying comprehension and production are implemented with the mathematics of multivariate multiple regression. The resulting machine learning model of lexical cognition combines linguistic and mathematical interpretability with high prediction accuracy. The model, which is being developed for both auditory and visual comprehension as well as for speech production, is tested against descriptive and experimental data from a wide range of typologically different languages.
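The regression-based mappings described above can be sketched with NumPy. In the linear discriminative learning framework associated with this line of research, as we understand it, a cue matrix C (rows are words, columns are form features) and a semantic matrix S (rows are words, columns are meaning dimensions) are linked by linear mappings: comprehension estimates F with C F ≈ S, and production estimates G with S G ≈ C, both by least squares. The tiny matrices are invented for illustration and are far smaller than realistic form and meaning spaces; this is not the project's implementation.

```python
import numpy as np

# Hypothetical toy lexicon of three words.
# C: form cues (rows = words, columns = presence of a form feature).
C = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

# S: semantic vectors (rows = words, columns = meaning dimensions).
S = np.array([
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.1],
    [0.3, 0.0, 1.0],
])

# Comprehension: F minimizing ||C F - S|| (multivariate multiple regression).
F, *_ = np.linalg.lstsq(C, S, rcond=None)
# Production: G minimizing ||S G - C||.
G, *_ = np.linalg.lstsq(S, C, rcond=None)

S_hat = C @ F  # predicted meanings from forms
C_hat = S @ G  # predicted form cues from meanings


def recognized(pred, target):
    """Index of the target row whose vector correlates best with each prediction."""
    return [int(np.argmax([np.corrcoef(p, t)[0, 1] for t in target])) for p in pred]


print(recognized(S_hat, S))  # → [0, 1, 2]
```

In this toy both mappings are exact because the system is so small; with realistic lexicons they are only approximate, and the recognition step then tests how well each predicted vector picks out the intended word.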

WIDE webpage (https://www.quantling.org/ERC-WIDE/)

ERC REVIVE (Dr. Sireen El Zaatari)

Dr. Sireen El Zaatari received an ERC Consolidator Grant. ‘REVIVE’ is the third ERC grant hosted by the Paleoanthropology department here in Tübingen. The University’s press release, which links to the official ERC announcement, provides further details.


OCSEAN Consortium

OCSEAN is funded by the European Commission (H2020-MSCA-RISE-2019 Marie Sklodowska-Curie Research and Innovation Staff Exchange, Project Number 873207) and was scheduled to officially launch in 2020 and operate for four years. Led by a consortium of nine European universities, including the University of Tübingen, the project works in collaboration with several universities and institutes abroad. It unites researchers from across the world to re-evaluate our understanding of the Austronesian expansion using new high-density data from archaeology, biological anthropology, linguistics, and genomics. In line with the mission of the DFG Center “Words, Bones, Genes, Tools,” OCSEAN aims to address questions about the Austronesian expansion by analyzing distinct datasets within a common statistical framework.

OCSEAN will achieve a synthesis of disciplines by using state-of-the-art computational and statistical methods to interrogate the structural relationships between joint datasets. The research contextualizes the expansion of the Austronesian language family within the growing evidence for social and political complexity across Island Southeast Asia and coastal mainland regions prior to the arrival of rice and millet agriculture in Taiwan during the mid-5th millennium BP. It also takes into account the rich history of interaction since the spread of the Malayo-Polynesian branch of Austronesian outside of Taiwan. OCSEAN brings together leading researchers from the humanities and sciences and combines the resources of multiple laboratories to tackle questions that can only be addressed through this extensive network of cooperation.