“Replications are essential for trust in economics”

Florian Neubauer on his experiences with Open Science

Photo of Florian Neubauer

Copyright: RWI / Reinaldo Coddou H

The three key learnings:

  • Replications are a great way to deepen methodological skills. By working with external code and data, young researchers and students in particular learn a great deal – not only in terms of content, but also technically.
  • Even though replications rarely bring much visibility, they offer young researchers in particular a genuine opportunity to actively shape science and contribute to quality assurance.
  • Replications are not a vote of no confidence, but rather a tool for testing the robustness of results and thereby building trust in research.

What are the different types of reproducibility and replicability? Let’s start with the terms.

FN: I can speak for economics, because other social sciences sometimes use different definitions. In our Robustness and Replicability in Economics (R2E) project, led by Jörg Ankel-Peters, we at RWI are working together with the Institute for Replication and its chair, Abel Brodeur. Over time, a framework has become established that distinguishes two main groups: reproducibility and replicability. Reproducibility means that we use the same data as the original study; replicability means that we test the same research question with new data. Reproducibility includes computational reproducibility, in which the original code is executed on the original data and the output is compared with the published results. It also includes robustness reproducibility, in which we vary analytical decisions on the same data to test whether the original results remain stable against plausible alternatives. Within replicability, a distinction is made between direct replications, in which new data is analysed using the same analytical methods, and conceptual replications, in which new data is combined with modified methods or research designs to test whether the underlying hypothesis or research question can be supported again. The rule of thumb: same data means reproducibility, new data means replicability.
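To make that rule of thumb concrete, here is a minimal illustrative sketch – not part of the R2E framework's own tooling – that encodes the four categories as answers to two questions: same data, and same analysis?

```python
def classify(same_data: bool, same_analysis: bool) -> str:
    """Rule of thumb: same data -> reproducibility, new data -> replicability."""
    if same_data:
        # Same data: some form of reproducibility.
        if same_analysis:
            return "computational reproducibility"  # original code on original data
        return "robustness reproducibility"         # varied analytical decisions
    # New data: some form of replicability.
    if same_analysis:
        return "direct replication"                 # same analytical methods
    return "conceptual replication"                 # modified methods or design

print(classify(same_data=True,  same_analysis=True))   # computational reproducibility
print(classify(same_data=True,  same_analysis=False))  # robustness reproducibility
print(classify(same_data=False, same_analysis=True))   # direct replication
print(classify(same_data=False, same_analysis=False))  # conceptual replication
```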

Which of these dimensions do you think poses the greatest challenge for empirical economics?

FN: The biggest challenge is certainly replicability, i.e. the collection of new data. This often involves considerable effort, is difficult to implement in many contexts and is sometimes not possible at all. In my view, however, robustness reproductions are the most immediately relevant: they examine whether published results remain stable when alternative but plausible analytical decisions are applied to the same data. This is central to research because it reveals whether published results are actually reliable. Such checks are important for the further development of the scientific literature. In recent years, there have been repeated intense debates on this topic, which show how sensitive and at the same time necessary this area is.
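As an illustration of what a robustness reproduction involves, here is a minimal hypothetical sketch on synthetic data – not the project's actual protocol: the same effect is re-estimated under alternative but plausible analytical decisions, and the coefficient of interest is compared across specifications.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for an original study's data set.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({"treat": rng.integers(0, 2, n)})
df["age"] = rng.normal(40, 10, n)
df["income"] = 2.0 * df["treat"] + 0.1 * df["age"] + rng.normal(0, 1, n)

specs = {
    "original":       "income ~ treat + age",  # the published specification
    "drop_control":   "income ~ treat",        # plausible alternative: no covariate
    "trimmed_sample": "income ~ treat + age",  # same model on a trimmed sample
}

for name, formula in specs.items():
    data = df
    if name == "trimmed_sample":
        lo, hi = df["income"].quantile([0.01, 0.99])
        data = df[df["income"].between(lo, hi)]
    fit = smf.ols(formula, data=data).fit()
    # If the estimate stays close to 2.0 across specifications, the
    # result is robust to these particular analytical decisions.
    print(f"{name:>14}: beta_treat = {fit.params['treat']:.3f} "
          f"(se = {fit.bse['treat']:.3f})")
```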

Does this mean, conversely, that individual studies are becoming less important and that ultimately only large meta-analyses are decisive in ensuring robustness? Where do you see the development when it comes to creating trust in scientific results – i.e. results that apply not only to a small sample but to an entire country, for example?

FN: The question of sample size and transferability is a separate issue. What you are referring to primarily concerns the importance of robustness reproductions. Ideally, every study would be reproduced or replicated at least once so that another research team could verify the results. In practice, however, this is hardly feasible. At the same time, trust in science is regularly under pressure: on the one hand, replications are indispensable because they reveal weaknesses and advance research; on the other hand, some people interpret this critical process as meaning that scientific results are unreliable overall. Replications should not be seen as a weakness, however, but as a necessary part of scientific progress. They help to continuously improve the quality of research and ultimately strengthen confidence in its results.

I think the discussion about quality assurance is primarily an internal scientific issue. Much can be clarified within a working group, a department or a sub-discipline without having to be immediately communicated to the outside world. There are also established quality assurance processes that take effect before results are communicated so widely that they become politically relevant. So you could say that we are operating at different levels here. Against this background, I wonder whether, in light of the replication debate, science should treat individual studies more like preprints: as preliminary results that only gain weight through reproducibility tests – just as preprints are only validated through peer review.

FN: Yes, and even peer review is not infallible – which is precisely why replications are so important. I agree with you: individual studies should be viewed with caution. Political decisions have been made on the basis of individual studies that turned out to be problematic in retrospect. These are isolated cases, though. In this respect, I share your assessment that individual results should initially be regarded as preliminary. At the same time, there are practical hurdles: collecting new data is often time-consuming, sometimes impossible, or the data is not freely accessible for legal reasons. Ideally, though, published studies would be replicated in order to verify whether the results are actually reliable.

Let’s move on to your own work. You are part of a research team that deals with replicability and reproducibility in economics. What is your main area of interest? And what have you been able to find out so far?

FN: I came to this topic through my doctoral studies at the University of Connecticut. My doctoral supervisor involved me in a project in which we examined replications published as so-called comments in the American Economic Review between 2010 and 2020. Our initial interest was not so much in carrying out replications ourselves, but in finding out whether such work actually influences the subsequent literature – in other words, whether it has a corrective element. We presented our findings in a paper entitled “Is Economics Self-Correcting?”. As a preliminary step, we systematically recorded how many replications actually appear in the 50 leading economics journals and published these findings separately under the title “Do Economists Replicate?”. The results showed that published replications are very rare: less than one per cent of all published articles are replications – even though, as we also found in an editor survey, most journals state that they publish replications as a matter of principle.

Okay.

FN: Building on this initial investigation, we looked at whether replications have an impact on the citations of the original paper. Our hypothesis was that if a comment expresses substantial criticism, such as pointing out errors or questioning results, citations of the original article should decline over time. In fact, however, we found that replications themselves are rarely cited and have no measurable impact on the citation frequency of the original studies.

That’s sobering – especially since citations are often used as a measure of scientific influence.

FN: Exactly. For researchers, this makes replications a rather thankless task, as careers depend heavily on citations. In our current work at RWI, we are going one step further together with the Institute for Replication. As part of a big team science project, we are examining 66 studies from the field of development economics. The main focus is on robustness reproductions, i.e. the question of how stable results remain when alternative but plausible analytical decisions are made.

And how do you proceed methodologically?

FN: We have developed a protocol and commissioned 66 replicators to reproduce published studies. The aim is to standardise the process as much as possible in order to ensure comparability. This includes pre-registrations, an internal review by the core team and peer reviews by other replicators. In this way, we want to ensure the quality of the replications.

Are there any initial findings?

FN: We have now seen almost all 66 reports, and the majority have undergone our internal project review. It’s a mammoth task, but we are working flat out on the final stages. We hope to be able to publish the results next year.

Research on this topic has been going on for about ten years. So could one say that little has changed in the meantime?

FN: Not quite, something has definitely changed.

For example?

FN: One important advance concerns awareness of reproducibility and data availability. A few journals now even have data editors who ensure that data sets are uploaded and made accessible. However, this is by no means a universal standard; so far, it has been established only at certain specialist journals. Most journals now have at least a data-sharing policy – but there are still problems with implementation and enforcement.

Returning to the 66 studies: does that mean you have 66 people, each working on one paper according to a fixed protocol? Or does one replicator have to take on several papers?

FN: No, there are 66 papers and 66 replicators, i.e. one person per study. Each person receives the corresponding replication package together with our protocol and carries out the reproduction. The aim is to publish the individual results and at the same time combine them in an overarching meta-paper in order to gain general insights into the robustness of this field of research. The replicators receive a fee for their work and become co-authors of the meta-paper.

How did you select the 66 studies? You mentioned that you searched the top journals – can you briefly outline the selection process?

FN: We set several criteria. First, it had to be a paper from the field of development economics. We also only considered studies that estimated a causal effect. We excluded purely descriptive studies, but these are rare anyway. The studies also had to be empirical microeconomic research, not macroeconomic analyses. The selection was limited to three journals: the Journal of Development Economics and two journals of the American Economic Association. In the case of the Journal of Development Economics, it was clear that all papers fell within the field of development economics, but in the case of the others, we had to examine and classify each paper individually. In order to obtain a sufficient number of studies, we expanded the publication years step by step – first to 2021/2022, then back to 2019 or 2018.

And how did you then narrow down this larger pool to the 66 papers?

FN: In the end, we had about 250 eligible studies. Of these, we only pursued those for which data and code were available and the code was actually executable. At this point, we did not yet check whether the results matched the original paper, but only whether reproduction was technically possible. Even this reduced pool was more work than we could handle, so we finally selected 66 studies at random.
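The interview does not describe the actual sampling code; as a hypothetical illustration, drawing 66 studies from a filtered pool could look like the following sketch, with a fixed seed so that the draw itself is reproducible.

```python
import random

# Hypothetical records; the real pool held roughly 250 eligible studies.
candidates = [
    {"id": f"study_{i:03d}", "package_runs": i % 5 != 0} for i in range(300)
]

# Keep only studies whose replication package actually executes.
eligible = [c["id"] for c in candidates if c["package_runs"]]

random.seed(2023)  # hypothetical seed, fixed here only to make the draw verifiable
selected = random.sample(eligible, k=66)
print(f"{len(eligible)} eligible, {len(selected)} selected:", selected[:3])
```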

Did you have a template for the protocol that you could build on, or did you develop it from scratch?

FN: We didn’t start from scratch. The experienced members of our team had already carried out numerous individual replications and brought the relevant experience with them. We also developed our protocol on the basis of two pilot replications. Taking the whole exercise to the meta level, however, had not been done before in the social sciences.

And how did you select the 66 replicators? Did they have to apply and prove that they had experience with replications and working with such a protocol?

FN: Exactly, we published a call on our website and through various channels. Applicants had to submit a CV and a short letter of motivation. The response was very positive: we received applications from professors as well as postdocs and doctoral students. Some of our replicators had also already participated in meta-science studies or even many-analysts projects.

When you look at your international meta-project, what do you personally take away from it? What do you enjoy about it?

FN: I enjoy being part of a community that has gained enormous momentum in recent years. I have been working on this topic for four or five years, and even in this relatively short time a lot has happened, especially in economics. Replications and questions of quality assurance are clearly becoming important to more and more people, including colleagues who do not work on this area day to day. Many now participate in many-analysts studies or talk to me about it in the office. And finally, I see the social added value: it is important that politicians and the public can base decisions on reliable research, and projects like ours contribute to this. I also enjoy being active in two roles: on the one hand I take on coordination tasks, and on the other I work on the project myself, reading reports, discussing with replicators and ensuring quality. I find this mixture of organisational and content-related work particularly enriching.

In your view, how does meta-research influence disciplinary research? Does it have something of a self-cleansing effect?

FN: In my own work, I notice this above all in the fact that I pay much more attention to clean documentation. A complete replication package with traceable code and clearly described steps is much more important to me today than it used to be. I have often seen how this is neglected – and I consciously try to do better. Whether it has a self-cleansing effect remains to be seen.

Imagine you are talking to young researchers or students who still have little experience: what three tips would you give them for getting started with reproducibility and replicability?

FN: First, simply replicate something yourself. This does not have to be part of a compulsory course; you can independently pick a paper that interests you and check whether a replication package is available. You can then first attempt a computational reproduction and later add robustness checks – a minimal sketch of that first step follows below. You learn an incredible amount in the process, including how to work with unfamiliar code. It’s also clear, though, that plausible robustness decisions require a great deal of knowledge and often experience. Second, participate in replication games. These are organised by the Institute for Replication and take place several times a year worldwide. They are a low-threshold introduction and a great way to become part of a global community. And third, network. Exchanging ideas with others who already have experience with replication is very valuable: you learn a lot and encounter an open, collegial community. That’s important. And in my experience, the people in this “meta-science bubble” are very nice and helpful.
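As a hypothetical illustration of that first step – with made-up numbers, not taken from any real replication package – a computational reproduction ultimately comes down to a comparison like this: re-run the package, then check whether the re-estimated figures match the published ones within a tolerance.

```python
import math

# Hypothetical values: what the paper's table reports vs. what re-running
# the replication package produces.
published   = {"beta_treat": 2.01, "se_treat": 0.06}
reestimated = {"beta_treat": 2.01, "se_treat": 0.06}

TOL = 1e-3  # allow for rounding in the published table
for key, pub in published.items():
    ok = math.isclose(pub, reestimated[key], abs_tol=TOL)
    print(f"{key}: published={pub}, re-run={reestimated[key]} -> "
          f"{'match' if ok else 'MISMATCH'}")
```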

Finally, a bigger question: how can your work – regardless of the specific results – strengthen trust in research? And how could it inspire economic research as a whole?

FN: Trust is not created by scandalising individual cases of scientific misconduct. Of course there are black sheep, but they exist everywhere, and sensationalising such cases ultimately creates mistrust. It is more important to focus on robustness: how stable are results when minor assumptions are changed or alternative analyses are performed? Replications show exactly that – not with the intention of proving anyone wrong, but to advance science as a whole. In my view, replications are therefore indispensable for ensuring and further developing credibility and trust in economic research.

Thank you very much!

*The interview was conducted on 15 September 2025 by Dr Doreen Siegfried.
This text was translated on 7 October 2025 using DeepL Pro.

About Florian Neubauer, PhD:

Florian Neubauer is a researcher in the Climate and Development Policy department at RWI and, since 2024, a research associate at the Expert Commission for Research and Innovation, which advises the German federal government. Florian Neubauer holds a PhD in Agricultural and Resource Economics from the University of Connecticut, USA, a Master’s degree in Development Economics from Georg August University in Göttingen, and a Bachelor’s degree in Economics and Political Science from Leuphana University in Lüneburg. His research focuses on development economics and research transparency.

Contact: https://www.rwi-essen.de/rwi/team/person/florian-neubauer

LinkedIn: https://www.linkedin.com/in/florianneubauer/

ResearchGate: https://www.researchgate.net/profile/Florian-Neubauer



