Links and Do-Files for Research Internship and Thesis

This list collects step-by-step tutorials, video tutorials and brief discussions of new and interesting research topics. It is intended for Bachelor and Master students interested in a research internship or thesis at the chair.


How to calculate the ABCC index - Video Tutorial

How to calculate the ABCC index - aggregated data - Stata Do File

How to calculate the ABCC index - aggregated data - Stata Sample Data Set

How to calculate the ABCC index - individual data - Stata Do File

Resources for learning Stata
For Stata beginners we recommend these tutorials. You can also find answers to advanced questions here.

Resources for learning Stata

General hints

How to create a residual scattergram using Stata

How to do a figure with two axes using Excel

How to combine Clio-Infra files

How to do a spatial regression - Presentation (with audio)

How to do a spatial regression - Stata Do File

Thematic maps

How to do thematic maps using Stata

How to do thematic maps using Stata - Template

Here is a do-file that assigns world regions to countries (in two-letter ISO format)

Here is a do-file that assigns two-letter ISO country abbreviations to most spelling variants of country names

Abbreviations of job titles
When entering individual-level data with occupations, it is often faster to type abbreviations and replace them with the original text later. This pays off for datasets that have 100 or more occupational cells.

Abbreviations of job titles (German)

Abbreviations of job titles (Spanish)
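
The replacement step described above can be sketched in a few lines. This is only an illustration in Python, not the chair's own procedure: the function name `expand_abbreviations` and the sample abbreviations ("fa", "lab") are hypothetical and are not taken from the German or Spanish lists linked above.

```python
def expand_abbreviations(entries, lookup):
    """Replace typed abbreviations with the full occupation text.

    `lookup` maps an abbreviation to its full text. Entries without a
    match are kept unchanged and reported, so that typos can be checked.
    """
    expanded = []
    unknown = set()
    for entry in entries:
        key = entry.strip().lower()  # tolerate stray spaces and capitals
        if key in lookup:
            expanded.append(lookup[key])
        else:
            expanded.append(entry)
            unknown.add(entry)
    return expanded, sorted(unknown)


# Hypothetical abbreviation list, for illustration only.
sample_lookup = {"fa": "farmer", "lab": "labourer"}
sample_entries = ["fa", "Lab ", "weaver", "fa"]
full_text, unmatched = expand_abbreviations(sample_entries, sample_lookup)
```

Reporting the unmatched entries separately makes it easy to spot mistyped abbreviations before the analysis starts.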

Research Internship

On this page we explain how human capital and health can be estimated for almost any country and region in the world over the past two centuries or more. To measure human capital, we use the age-heaping-based numeracy indicator, which is easy to calculate. A few things to keep in mind when using this approach are explained below. We also explain how health, and the inequality of health and income, can be estimated from datasets on height, which can likewise be constructed with very little effort.

In addition, we show where sources on ages and heights can be found on the internet. These are the raw material for the numeracy component of human capital and for health inequality, respectively, so that you can write interesting studies on these topics.

The background of this research is explained in the book on global economic development “A History of the Global Economy” (Baten 2016). The reference can be found below. The book can be bought at a low price: the e-book is available for 17 € and the original paperback for 25 €.


Baten, J. (Ed.) (2016). A History of the Global Economy: 1500 to the Present. Cambridge: Cambridge University Press.

Further downloads can be found on the website of the institute.

Human Capital and Numeracy

Why are some countries rich and continue to grow while others don’t? New Growth Economics highlights human capital as an important explanation for the economic development of countries. The research team at the University of Tübingen was the first to examine this relationship in the long run for countries worldwide by estimating numeracy skills.

Data on ages make it possible to estimate numerical skills, and hence a part of human capital, for early periods for which few other data exist. This technique has a lot of potential for most parts of the world during the 19th century and for many developing countries until today. UNESCO recently included this age-heaping-based numeracy approach in its Global Education Monitoring Report, indicating the relevance of the indicator for contemporary studies.

How can numeracy be measured?

A proxy for numeracy, the ABCC index, can be easily computed from the ages reported in census data. The method considers the share of individuals who are able to state their exact age in years, in contrast to those who report an age rounded to a multiple of five (stating, for example, ‘I am about 35’ when they might be 34 in reality). Crayen and Baten (2010) showed that this proxy reflects human capital well, since it is closely correlated with other measures of human capital, such as literacy or schooling.
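
The calculation can be sketched as follows. This is a minimal Python illustration, not the chair's Stata do-file. It assumes the definition commonly used in the age-heaping literature: the Whipple index compares the number of ages ending in 0 or 5 with one fifth of all reported ages, and the ABCC index rescales it to a 0-100 share of people who report their age exactly. The age range 23-62 and the function names are assumptions made for this sketch.

```python
from collections import Counter


def whipple_index(ages, low=23, high=62):
    """Whipple index for reported ages between `low` and `high`.

    100 means no heaping on multiples of five, 500 means every age is
    reported as a multiple of five. The default range 23-62 is an
    assumption of this sketch.
    """
    counts = Counter(a for a in ages if low <= a <= high)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ages in the requested range")
    heaped = sum(n for age, n in counts.items() if age % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


def abcc_index(ages, low=23, high=62):
    """ABCC index: estimated share (0-100) reporting an exact age."""
    w = whipple_index(ages, low, high)
    if w < 100.0:
        return 100.0  # fewer multiples of five than expected: no heaping
    return (1.0 - (w - 100.0) / 400.0) * 100.0


# Every age from 23 to 62 reported once: no heaping at all.
no_heaping = list(range(23, 63))
# Everybody reports a rounded age.
full_heaping = [25, 30, 35, 40, 45, 50, 55, 60] * 3
```

With these two artificial samples, `abcc_index(no_heaping)` is 100 and `abcc_index(full_heaping)` is 0. Real census data lie in between.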

Usually we have census data available, which are collected for one year. These data are divided into ten-year age groups (e.g. 23-32, 33-42, ...), so that each group can be assigned to the period in which its members were born. This makes it possible to assess the educational environment during the first ten years of life, i.e. early childhood and early adolescence, which are the most relevant for very basic numeracy formation.

The video and the do-files listed above explain how to calculate the ABCC index in Stata.
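
The grouping step can be sketched as follows, again as a Python illustration rather than the chair's Stata do-file. Each ten-year age group contains exactly two multiples of five, so the ABCC index can be computed within each group. The sketch assumes age groups starting at 23 and ending at 72, and it assigns each group to the decade containing its approximate central birth year. Both choices, and the function name, are assumptions made for illustration.

```python
from collections import defaultdict


def abcc_by_age_group(ages, census_year, first_age=23, last_age=72):
    """Return {(group_start, group_end): (birth_decade, abcc)}.

    Age groups are 23-32, 33-42, ... (bounds are assumptions of this
    sketch). The birth decade is the decade that contains the group's
    approximate central birth year.
    """
    groups = defaultdict(list)
    for age in ages:
        if first_age <= age <= last_age:
            start = first_age + 10 * ((age - first_age) // 10)
            groups[start].append(age)

    result = {}
    for start, group_ages in sorted(groups.items()):
        heaped = sum(1 for a in group_ages if a % 5 == 0)
        whipple = 100.0 * heaped / (len(group_ages) / 5.0)
        if whipple < 100.0:
            abcc = 100.0
        else:
            abcc = (1.0 - (whipple - 100.0) / 400.0) * 100.0
        # Central age of the group is about start + 5.
        birth_decade = ((census_year - start - 5) // 10) * 10
        result[(start, start + 9)] = (birth_decade, abcc)
    return result


# Artificial census of 1900: the younger group reports exact ages,
# the older group reports only rounded ages.
sample_ages = list(range(23, 33)) + [35] * 5 + [40] * 5
by_group = abcc_by_age_group(sample_ages, census_year=1900)
```

In this artificial example the group aged 23-32 is assigned to the birth decade 1870 with an ABCC of 100, and the group aged 33-42 to the birth decade 1860 with an ABCC of 0.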

Which data can be used?

Computations of numeracy are mostly based on census data, but in principle any source that contains age data can be used to obtain information on numeracy.

A suggested data source is the website, which contains data for a large number of countries. Example data sets are the Argentina National Census 1895 and the Mexico National Census 1930. A further suggestion is the “Census Mosaic”, especially the data on Romania, Denmark and France.

For an overview and explanation of typical numeracy values in different world regions and eras, see the book “A History of the Global Economy” (Baten 2016). Further estimates of numeracy obtained in the DFG project "Numeracy in Africa and the Middle East" are collected in a Data Hub.

Interactive teaching: Example of a research internship

Research Internship Human Capital and Numeracy.pdf

Heights and Biological Standard of Living

What was the standard of living of people 2000 years ago? Adult stature reflects childhood health and nutrition and thereby provides evidence for living standards. Hence, in recent decades, anthropometric research has created the well-being indicator "human stature," which facilitates the measurement of health development by social and regional groups, as well as by gender.

This proxy of the biological standard of living is often, though not always, related to GDP per capita or real wages. Furthermore, it has important implications for labor productivity, demographics, health systems, and less developed countries.

How can biological living standards be measured by heights?

The body height of an individual is largely determined by their genes. To use heights as a proxy for the biological standard of living, one therefore computes average heights over, for example, a social group in a given period. This is because, in contrast to individual heights, current evidence suggests that differences in the heights of populations are not caused by genetic differences but by environmental influences such as health or nutrition.

Which data can be used?

In convict, conscription or school records, for example, data on heights can be found for rather early periods. Adult height observations are grouped by year of birth because early childhood health and nutrition largely influence adult stature. The heights of children and adolescents reflect the recent biological standard of living and should therefore be grouped by the year of observation.

Very early evidence on heights can be obtained from skeletons found in archaeological excavations.

Suggested data sources for heights are, for example, the Dutch Army Records, the US Army Records or the Albanian Census.
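
The grouping rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record fields (`height_cm`, `age`, `year_observed`), the function name and the adult threshold of 23 years are hypothetical choices for this sketch and are not taken from the chair's materials. Adults are averaged by birth decade, children and adolescents by year of observation.

```python
from collections import defaultdict
from statistics import mean


def mean_heights(records, adult_age=23):
    """Average heights, grouped as described in the text.

    Adults (age >= adult_age, an assumed threshold) are grouped by
    birth decade; younger individuals by year of observation.
    Returns {(group_kind, period): mean height in cm}.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec["age"] >= adult_age:
            birth_year = rec["year_observed"] - rec["age"]
            key = ("birth_decade", (birth_year // 10) * 10)
        else:
            key = ("observation_year", rec["year_observed"])
        groups[key].append(rec["height_cm"])
    return {key: mean(values) for key, values in groups.items()}


# Artificial records, for illustration only.
sample_records = [
    {"height_cm": 170.0, "age": 30, "year_observed": 1900},
    {"height_cm": 172.0, "age": 25, "year_observed": 1900},
    {"height_cm": 140.0, "age": 12, "year_observed": 1900},
]
averages = mean_heights(sample_records)
```

Here the two adults were born in 1870 and 1875 and are averaged together for the birth decade 1870, while the twelve-year-old is assigned to the observation year 1900.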

The International Economic History Association has taken the initiative to set up a network of scholars working with data on heights and to establish a moderated list of data files of historical heights. These files are collected in this Data Hub.


For an example, look at the book "A History of the Global Economy" (Baten 2016). There you will find an analysis of African heights in chapter 8.