Tübinger Forum für Wissenschaftskulturen

 

Evidence & Uncertainty in Science: 

Methodological, Philosophical and Meta-Scientific Issues

 

When: Tuesday/Wednesday, 10 & 11 June 2025

Where: 1st floor lecture hall, Tübinger Forum für Wissenschaftskulturen, Doblerstr. 33, 72074 Tübingen

Topic

Scientific knowledge always comes with uncertainty. Yet there is broad agreement, inside and outside academia, that scientific research is suffering from a so-called replication crisis. This crisis renders empirical findings unreliable, hinders scientific progress and ultimately undermines the epistemic authority of science. 

To understand this phenomenon and overcome this alarming situation, we need a multifaceted perspective on evidence, scientific error and uncertainty in the sciences. For this reason, we offer this cross-disciplinary workshop and bring together scientists — especially early-career researchers — from diverse backgrounds: Statistics, Philosophy of Science, Epidemiology, Social & Political Sciences, Evolutionary Ecology, and Machine Learning. 

The following key topics will be addressed:

  • Epistemic perspectives on the foundations and limitations of statistical and
    probabilistic reasoning
  • Developing a non-monolithic, nuanced understanding of reproducibility
  • Open access to large-scale databases as a boon and bane for research
  • What is good meta-research and what should replication research look like?
    What can we learn from meta-scientific projects studying methodological research itself?
  • Methodological and epistemic challenges in measuring the progress of Machine Learning prediction methods
  • The Open Science reform movement and its quest to change misaligned incentive structures in science
  • Public trust in science in an open society and the role of science communication

Program

Tuesday, June 10th

10:00 - 10:45
Sophia Crüwell (Philosophy of Science, University of Cambridge)
Open Science Practices as a Crutch in a Hostile Epistemic Environment

11:15 - 12:00
Anne-Laure Boulesteix (Biometry in Molecular Medicine, LMU Munich)
On the empirical comparison of data analysis methods

12:30 - 13:15
Christian Hennig (Statistical Sciences, University of Bologna)
Understanding statistical inference based on models that are not true

13:15 - 15:15
Lunch break

15:15 - 16:00
Sven Ulpts (Political Science, Danish Centre for Studies in Research and Research Policy, Aarhus University) & Sheena Bartscherer (Social & Political Sciences, Robert K. Merton Center for Science Studies, Humboldt-University, Berlin)
Stories from the Imagined Utopia of an Open Science: Contextualizing Reformers’ Conceptions of “Good Science”

16:30 - 17:15
Timo Freiesleben (Theoretical and Philosophical Foundations of Machine Learning, Munich Center for Mathematical Philosophy, LMU Munich)
Are Performance Gains on Benchmarks a Good Proxy for Progress in Machine Learning?

18:00 (Kupferbau HS 22)
John Ioannidis (Medicine, Epidemiology and Population Health, and Biomedical Data Science, Stanford University)
Public talk & discussion: Are rigorous methods, transparency, reproducibility, innovation, and/or usefulness features of good science?

Wednesday, June 11th

10:00 - 10:45
Bret Beheim (Evolutionary Ecology, Max Planck Institute for Evolutionary Anthropology)
The Problem of Missingness in Historical and Evolutionary Datasets

11:15 - 12:00
Stephan Guttinger (Philosophy of Science, University of Exeter)
Why biologists don’t perform replication studies

12:00 - 14:00
Lunch break

14:00 - 14:45
Friederike Hendriks (Communication Science & Educational Psychology, Technische Universität Braunschweig)
Evidence and Uncertainty in Science Communication: What Matters for Public Trust in Science?

15:15 - 16:15
Concluding discussion

Abstracts

Open Science Practices in a Hostile Epistemic Environment
Sophia Crüwell, University of Cambridge

This talk investigates the move towards open and transparent science, particularly in response to the replication crisis in the behavioural, social, and biomedical sciences. Can opening up science help solve the problems associated with the replication crisis, namely low replicability and reproducibility, questionable credibility, and ill-adjusted inferences? I will argue that the answer to this question depends on our diagnosis of the underlying epistemic landscape. 
Following an overview of OS practices and the issues they are meant to help solve, I will argue that in an epistemically ideal scenario, we do not need these practices. However, there are indications that we are not in fact in an epistemically ideal scenario. Instead, either vice epistemology or hostile epistemology may be fitting frameworks for the underlying problems to be solved by OS practices: either many or even a majority of scientists have epistemic vices that keep them from the pursuit of knowledge, and/or a majority of epistemically virtuous, or at least neutral, scientists have to operate in a hostile epistemic environment.
I will then argue that OS practices can better solve our problem if we assume a hostile epistemic environment at the core of the problem than if we assume widespread epistemic vice. If we assume widespread epistemic vice in scientists, then adopting OS practices would mean fighting a losing battle, as these practices cannot safeguard against widespread vice. For example, transparency indicators can be gamed, preregistration can be done vaguely, and data can be shared opportunistically. But OS practices can help if we assume that a vast majority of epistemically virtuous or neutral researchers are operating in an epistemically hostile environment. In this case, OS practices can a) enable these researchers to do better research in that environment or b) help to change the environment to be closer to ideal. I will then consider metascientific evidence to argue that a hostile epistemic environment is indeed a better explanation for our situation than widespread epistemic vice.
Finally, I will consider what this means for OS practices and their implementation.

 

On the empirical comparison of data analysis methods
Anne-Laure Boulesteix, Moritz Herrmann, Julian Lange, Maximilian Mandl, Christina Sauer, Munich Center for Machine Learning

When developing and evaluating statistical methods and data analysis tools, do statisticians and data scientists adhere to the good practice principles they promote in fields which apply statistics and data science? I argue that methodological researchers should make substantial efforts to address what may be called the replication crisis in methodological research in statistics and data science, and that the field of empirical methodological research can learn a lot about study design and good scientific practice from other experimental sciences. This talk gives an overview of recent work addressing the design and interpretation of comparison studies towards more reliable and less biased empirical evaluations of data analysis methods. The focus will be on three issues that have drawn ample attention in fields of application of statistics such as psychology or epidemiology, but are surprisingly often ignored in research addressing methodological research questions: (i) cherry-picking or selective reporting, (ii) the distinction between confirmatory and exploratory research, and (iii) the storytelling fallacy in the context of illustrating methods through applications to real data.
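The cherry-picking issue in (i) can be illustrated with a minimal simulation sketch (not taken from the talk; all numbers are invented): if a new method is tuned over many variants and only the best-performing variant is reported, based on the same benchmark datasets used for the selection, its apparent advantage over a baseline is inflated even when all variants are in truth equivalent.

    import numpy as np

    rng = np.random.default_rng(0)

    n_datasets = 20       # benchmark datasets in the comparison study
    n_variants = 15       # tuned variants of the "new" method
    n_repetitions = 2000  # repeated simulated comparison studies

    reported_gaps = []
    for _ in range(n_repetitions):
        # True performance difference to the baseline is zero for every variant;
        # observed score differences are pure dataset-level noise.
        scores = rng.normal(loc=0.0, scale=1.0, size=(n_variants, n_datasets))
        reported_gaps.append(scores.mean(axis=1).max())  # only the best variant is reported

    print(f"Average reported 'improvement' over the baseline: {np.mean(reported_gaps):.3f}")
    print("True improvement of every variant:                0.000")

Selecting and reporting only the maximum yields a sizeable apparent gain although none exists, which is one reason the confirmatory/exploratory distinction in (ii) matters for comparison studies as well.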


Why biologists don’t perform replication studies
Stephan Guttinger, Egenis Centre for the Study of the Life Sciences, University of Exeter

Empirical studies have shown that a significant proportion of published results in the experimental life sciences cannot be replicated by other researchers; the failure rate for direct replications is reported to be between 40% and 80%. Given these numbers, it would be reasonable to expect researchers to rely only on data that has been replicated by other researchers; they should look back before they move forward. However, such case-by-case corroboration is not happening. Apart from a few large-scale studies that tried to establish the extent of the so-called “replication crisis”, biology journals are not suddenly filled with replication studies. 
This absence of dedicated replications is often explained by a lack of resources and incentives – performing replications is an expensive and time-consuming process, and the resulting studies are difficult to publish. Whilst these factors are certainly part of the overall problem, I want to add a further explanation of why researchers don’t invest in dedicated replication studies: in everyday research practice trust does not have to be established through a linear two-step process (“Replicate first, then build on the verified data”). Using a case study from the life sciences, I will show how researchers use a discipline-specific set of trust-establishing practices (TEPs) to create an intricate fusion between new and published research within one experimental setup. This integrated approach allows them to look forward and backwards at the same time, thus increasing their trust in the published and the newly produced data in one move. This alternative path to trust does not mean that dedicated replication studies are useless, but it might explain why they are used less.

 

Stories from the Imagined Utopia of an Open Science: Contextualizing Reformers’ Conceptions of “Good Science”
Sheena Bartscherer, Social & Political Sciences, Robert K. Merton Center for Science Studies, Humboldt-University, Berlin & Sven Ulpts, Political Science, Danish Centre for Studies in Research and Research Policy, Aarhus University

The Open Science reform movement is to a large degree built on diagnoses of science in crisis and of science as fundamentally broken. Beyond these dystopian attestations about the past or present state of science, its narratives also include optimistic promises of Open Science as the redeemer, able to revolutionize and improve science overall. This redemption is thought to be achieved mainly by opening up access for scientists and society at large. Within the movement and to the public, Open Science is usually presented as a force for good (science) and a sure path to creating a better society. Its proponents paint a picture of a utopian science that is more inclusive, more accessible, and more democratic. However, at this point, over ten years into the movement, it largely remains an open question how the narratives and promises of a ‘better science’ fit the external realities and experiences of actual researchers today.
Therefore, in this talk we will present common examples of Open Science narratives and highlight how the movement conceptualizes ‘good science’. We will contextualize these wider narratives by contrasting them with researchers’ reports of their lived experiences in the lab and explore their personal accounts of Open Science practices and their practicality. We will discuss whether the promises of a utopian Open Science might in fact turn out to be dystopian for those who do not fit into the narrow conceptions of ‘good science’ propagated by the Open Science movement. Such a misalignment may be due to epistemic, social, or other practical factors that determine the conditions and values of academic knowledge production. We will end our talk with the question of what the potential consequences of such a misalignment between Open Science narratives and the existing external realities of researchers might be.

 

Understanding statistical inference based on models that aren't true
Christian Hennig, Statistical Sciences, University of Bologna

Statistical inference is based on probability models, and most of the theory behind it assumes these models to be true. But models are idealisations, and it makes little sense to postulate that they are literally true in reality. Models are however required to analyse the behaviour of statistical methods in any generality. In order to explore the implications of running statistical inference based on models that aren't true, it is helpful to look at more general supermodels that allow for violation of the supposedly assumed models. I will present a framework for how to think about statistical tests based on models that aren't true, conditions under which such inference can be useful or misleading, the concept of "errors", and what impact this has on the interpretation of the results in practical settings.
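As a purely illustrative sketch (not part of the talk; the distribution and sample size are arbitrary choices), one can simulate what happens to a standard test when the assumed model is violated: the error rate actually attained under the data-generating process need not match the nominal level derived under the assumed model.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    n = 20          # sample size per simulated study
    n_sim = 20000   # number of simulated studies
    alpha = 0.05    # nominal significance level

    rejections = 0
    for _ in range(n_sim):
        # The one-sample t-test is derived under a normal model; here the data
        # come from a strongly skewed distribution whose true mean equals the
        # hypothesized value, so every rejection is an "error".
        x = rng.exponential(scale=1.0, size=n)          # true mean = 1
        _, p = stats.ttest_1samp(x, popmean=1.0)
        rejections += (p < alpha)

    print(f"Nominal type-I error rate:                   {alpha:.3f}")
    print(f"Attained rejection rate under the violation: {rejections / n_sim:.3f}")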

 

Are Performance Gains on Benchmarks a Good Proxy for Progress in Machine Learning?
Timo Freiesleben, Mathematics and Philosophy of Machine Learning, Munich Center for Mathematical Philosophy (MCMP), LMU Munich

At the core of practical machine learning methodology stand benchmarks. They transform abstract prediction tasks (e.g., image classification) into concrete learning problems (e.g., the ImageNet challenge) by specifying four key components: an operationalization of task-relevant features, datasets for training and testing, metrics for evaluating predictive success, and leaderboards that rank current models. Success on benchmark leaderboards has become the primary way of signaling scientific progress within the field. But does outperforming a benchmark truly reflect progress in machine learning—or are we merely optimizing for metrics in ways that echo Goodhart’s law, gaming our own artificial evaluation criteria? In this talk, I argue that the scientific value of benchmarks depends on their construct validity: the strength of the theoretical connection between the real-world task of interest and its operationalization through the benchmark. Consequently, benchmark performance alone is not sufficient to support substantive scientific claims. I conclude by outlining additional forms of evidence required to draw robust conclusions from benchmark results and propose best practices for improving the epistemic quality of machine learning benchmarks.
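One mechanism behind this concern can be made concrete with a small simulation sketch (an illustration in the spirit of Goodhart’s law, not an example from the talk; all numbers are invented): when many models compete on the same finite benchmark test set, the leaderboard winner’s score partly reflects noise in that particular test set rather than genuine progress on the underlying task.

    import numpy as np

    rng = np.random.default_rng(2)

    n_benchmark = 1000   # size of the fixed, publicly reused benchmark test set
    n_fresh = 100000     # large fresh sample standing in for the real task
    n_models = 200       # candidate models competing on the leaderboard

    # Every candidate has the same true accuracy on the task; benchmark scores
    # differ only through sampling noise in the shared test set.
    true_accuracy = 0.80
    benchmark_scores = rng.binomial(n_benchmark, true_accuracy, size=n_models) / n_benchmark

    leader = benchmark_scores.argmax()
    fresh_score = rng.binomial(n_fresh, true_accuracy) / n_fresh

    print(f"Leaderboard winner's benchmark score:      {benchmark_scores[leader]:.3f}")
    print(f"The same model's score on fresh task data: {fresh_score:.3f}")
    print(f"True accuracy of every candidate:          {true_accuracy:.3f}")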

 

The Problem of Missingness in Historical and Evolutionary Datasets
Bret Beheim, Evolutionary Ecology, Max Planck Institute for Evolutionary Anthropology, Leipzig

The role of missing data in defining the limits of scientific inference is complex. In the past, scientific inference was often guided by maxims such as "Absence of evidence is not evidence of absence", reflected in the practice of complete-case analysis. Simulation-based inference, careful development of statistical theories about the origins of missingness, and explicit measurement models have greatly enhanced our ability to address missingness in empirical datasets. This is especially true in the historical and evolutionary sciences, which must often work with highly fragmentary records subject to severe taphonomic filtering. In this talk, I will review three projects in which missing data play a central role: as a potentially insurmountable confound, as a key source of posterior inference, and as a starting place for ruling out specific historical processes. I hope to demonstrate that, as measurement models become more sophisticated, so too does our ability to manipulate, overcome, and fully exploit the information implied by missing data.
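As a minimal sketch of the general point (an invented toy example, not one of the speaker’s case studies): when the probability that a record survives depends on its properties, a complete-case analysis is biased, whereas an explicit model of the missingness mechanism, here known by construction and used for inverse-probability weighting, recovers the quantity of interest.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50000

    # Simulated records: older items are less likely to be preserved.
    age = rng.uniform(0, 1, size=n)                  # fully observed covariate
    trait = 2.0 + 3.0 * age + rng.normal(0, 1, n)    # quantity of interest (true mean 3.5)

    # Preservation probability declines with age (missing at random given age).
    p_preserved = 0.9 - 0.7 * age
    preserved = rng.random(n) < p_preserved

    complete_case = trait[preserved].mean()

    # Inverse-probability weighting based on the (here, known) preservation model.
    weights = 1.0 / p_preserved[preserved]
    ipw_estimate = np.average(trait[preserved], weights=weights)

    print(f"True mean trait value:          {trait.mean():.3f}")
    print(f"Complete-case estimate:         {complete_case:.3f}")
    print(f"Missingness-aware IPW estimate: {ipw_estimate:.3f}")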

 

Evidence and Uncertainty in Science Communication: What Matters for Public Trust in Science?
Friederike Hendriks, TU Braunschweig

Public trust in science is subject to a fundamental tension: On the one hand, scientific knowledge is of fundamental relevance to people’s lives, in their everyday decision-making, their use of technology, and their civic participation. On the other hand, scientific knowledge is rapidly evolving. As a consequence, science communication draws on sometimes uncertain, sometimes conflicting evidence. This is especially prevalent when scientific knowledge is produced to inform personal or public decision-making during evolving crises such as the COVID-19 pandemic. How do members of the public decide whether to trust scientific experts and rely on scientific knowledge while it is still uncertain? The talk will introduce a notion of epistemic trust and provide empirical evidence on people’s ability to weigh their trust in scientific experts in the face of uncertainty.

To participate, please register by e-mail to k.holzhey@tfw.uni-tuebingen.de.

Participation is free of charge, but registration is required. 

We have reached our capacity limit. Therefore, unfortunately, registration is closed.