The “Global Basic Lexicon” (GloBasLex) project aims to compile an openly available basic vocabulary database of parallel wordlists covering 1,000 basic concepts across most of the world's languages that are sufficiently documented in published dictionaries. According to the initial plan, the database will include 1,494 languages from all continents, making it seven times larger than the largest comparable databases. Every dictionary form will be accompanied not only by a consistent phonetic transcription but also by basic information about its morphological structure, in simple machine-readable formats.
The planned duration of the project is 15 years. Starting in 2026, it will be led by Prof. Gerhard Jäger and coordinated by Dr. Johannes Dellert from the Department of Linguistics at the University of Tübingen. The project is funded by the joint research program of the German academies of sciences (Akademienprogramm) and administered by the Heidelberg Academy of Sciences and Humanities. The team will consist of the principal investigator, the coordinator, two postdoctoral researchers, one doctoral student, and around twenty student researchers.
GloBasLex is organized into two main modules, one dealing with data collection and the other with accompanying research. Data collection is planned to proceed in three phases. In Phase I, scheduled to last until 2028, a comprehensive basic vocabulary of 3,200 concepts is to be compiled across a global sample of languages, following the methodology established by Dellert and Buch (2018). Based on these data, the project will then decide which concepts to include in the set of 1,000 concepts to be collected worldwide. Near-global coverage for this concept list will be achieved in two further phases, in which first the small and then the medium-sized language families are completed, leaving only the six largest language families incomplete. As parts of the data become available, they will be used in a series of five accompanying research projects.
The data are intended not only to provide a more reliable basis for research in lexical, morphological, and phonological typology, but also to form an important foundation for data-driven historical linguistics. Potential uses of GloBasLex data extend to a broad range of other disciplines within and beyond linguistics, such as phonology, lexical semantics, lexicography, cognitive science, and archeology. Some of these promising additional applications will be supported or pursued in collaboration with national and international partners.
For more detailed information about the project, please contact the coordinator Johannes Dellert.