Errors are normal – the code should still be published

Jan Marcus on his experiences with open science

Photo: Professor Jan Marcus (photo credit: Bernd Wannenmacher/FU Berlin)

The three key lessons:

  • Reproducibility is a key requirement for scientific studies. Open data and accessible code are essential for research to be verifiable and reliable. Without them, much remains vague.
  • Errors in code happen. What matters is not perfection, but the willingness to deal with errors openly so that they can be improved.
  • Open science is practical and can be learned. Formats such as replication games or simple documentation in your own workflow make it easy to get started. Those who start early will benefit in the long term – both professionally and organisationally.

Where do you currently see specific challenges in your field of economic research with regard to replicability and reproducibility – also against the backdrop that many journals are increasingly requiring the provision of so-called replication packages?

JM: Reproducibility is a fundamental requirement for any scientific study. Nevertheless, many studies do not even meet this minimum requirement. Reproducibility means that identical results are achieved when using the same data and the same code. In practice, however, this is often not the case. One key reason is that either the data, the code or both are not publicly accessible. Even when both are available, problems regularly arise: for example, the code does not run on another computer, or uncontrolled random processes mean that the results cannot be reproduced exactly. These aspects make traceability considerably more difficult and show that there is still significant room for improvement in practice.
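
Two of the pitfalls mentioned here – code that only runs on the original machine and uncontrolled random processes – can often be avoided with small habits. A minimal sketch, assuming a Python-based workflow (file name and seed are purely illustrative):

    from pathlib import Path
    import numpy as np

    # Resolve paths relative to the script rather than to one person's hard drive,
    # so the code also runs on another computer.
    ROOT = Path(__file__).resolve().parent
    DATA_FILE = ROOT / "data" / "analysis_sample.csv"   # hypothetical input file

    # Fix the seed so that bootstrap draws, simulations etc. give identical results on every run.
    rng = np.random.default_rng(seed=20250619)

    # ... load DATA_FILE and pass `rng` to every step that involves randomness ...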

And why, in your experience, is data not shared? What reasons do you see for this?

JM: I think it is important to distinguish between sharing data and sharing code. When it comes to sharing data, there are various reasons why this is not done in a transparent manner. A common and understandable reason is that researchers often do not have the rights to pass on the data – especially in the case of secondary data analysis. In such cases, you are not the owner of the data and therefore not allowed to publish it.

However, it is possible to transparently document how other researchers can access the data themselves, provided that it is accessible in principle. Another reason is personal reservations: those who have invested a lot of time and effort in collecting data are sometimes reluctant to make it available without restrictions. There are practical solutions for this too, such as a time-limited embargo period after which the data is released.

The situation is different when it comes to sharing code, i.e. analysis and processing scripts in statistical software. Here, I see no compelling reasons not to disclose the code. The effort involved is relatively low, as the code was created during the analysis anyway. Overall, I believe that the benefits of transparent sharing clearly outweigh the disadvantages for science.

However, there are various reasons why researchers do not make their code available. Some researchers are simply unaware of how important it is to publish their code. There is sometimes an assumption that a detailed description in the paper is sufficient to make the analysis comprehensible. In practice, however, this is rarely the case. Most work is methodologically complex, and many details cannot be fully documented in an article without compromising readability. In the past, there were also technical challenges, such as the question of where to store the code. This hurdle hardly exists today. Most journals offer the option of linking to repositories, and personal websites or institutional platforms are also available. In my view, technical reasons are therefore no longer valid.

Do you think that some researchers may be concerned that errors could be discovered in the code or that it might not appear “clean” enough?

JM: Yes, that also plays a role. Some researchers are concerned that errors will be discovered when the code is published or that the structure of the code will be considered confusing. It is very important to me that we establish a constructive culture of error management. Errors occur regularly, especially in complex evaluations. I am involved in a larger meta-study in which numerous studies were reproduced and also checked for errors. In about a quarter of the cases, errors were actually found in the code. This does not mean that a study is unusable overall, but rather shows that minor errors occur frequently in practice – even among very experienced researchers. Therefore, errors are normal – the code should still be published. Only when the code is disclosed is it possible to identify and correct errors, which is much better than leaving them undetected. However, a constructive error culture also means that we must allow others to make small mistakes – and not tear them apart for it. If the code is not published, the potential for quality assurance remains untapped.

What role could AI-supported tools such as ChatGPT play in preparing code for publication? Could this be an incentive or a tool to publish code in the first place?

JM: I think such tools can certainly be helpful – but we shouldn’t have unrealistic expectations. They are useful, for example, when the code has little or no commentary. AI-supported programmes can help add comments or check the code for possible errors or inconsistencies. Although this is no guarantee of error-free code, it can improve quality and reduce uncertainty. You can also ask the tools if they have any suggestions for improving structure or efficiency. All of this can help researchers feel less inhibited about disclosing their code. Ultimately, the responsibility remains with the researcher, but the technical possibilities can make it easier to get started.

What role does publicly accessible code play in reproducibility?

JM: The gold standard is that both the data and the code are available. If only the data is available, that is already progress – but without the associated code, much remains unclear. In the past, it was sometimes assumed that it was sufficient to provide the data – or that it was not necessary at all, for example in randomised controlled experiments that were thought to be easy to evaluate. It has since become apparent that even these experiments are often much more complex than expected: control variables are included, or adjustments are made for multiple testing. These details can hardly be described completely and unambiguously in text.

Code offers much greater precision here – similar to how a mathematical formula is often clearer than a verbal description. An example: I work a lot with the Socio-Economic Panel. If a study states that “income was controlled,” this is not very meaningful. The panel contains around 40 to 50 different income variables – depending on whether it is gross or net income, individual or household level, monthly or annual data. Such decisions cannot be meaningfully reflected in the body text without making the text confusing or unreadable. The code creates transparency here and enables others to follow the analysis exactly.
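
To illustrate the precision gain: a single line of code pins down exactly which income measure is used, where the sentence "income was controlled" does not. A minimal sketch with hypothetical variable names (not actual SOEP identifiers):

    # Hypothetical variable names – not actual SOEP identifiers.
    outcome = "life_satisfaction"
    treatment = "reform_exposure"
    controls = ["hh_net_income_month"]   # household level, net income, monthly – unambiguous

    formula = f"{outcome} ~ {treatment} + " + " + ".join(controls)
    print(formula)   # life_satisfaction ~ reform_exposure + hh_net_income_month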

Would you say that specialist journals and funding institutions have already exhausted all possibilities? Or to put it another way: if you had to draw up a wish list, what more could be asked of researchers?

JM: Quite clearly, specialist journals should make the provision of replication code mandatory – and not just formally, but with binding controls. In addition, it would be desirable for journals themselves to check whether the results presented in the manuscript can actually be reproduced using the code submitted. This form of quality assurance is very time-consuming and therefore costly – it can usually only be implemented by financially well-equipped journals. There is, however, a positive development in economics: some of the top journals already carry out such reproducibility checks. This means that the discipline is playing a pioneering role in this area compared to other disciplines.

In addition to academic journals, research funding bodies should also take on greater responsibility. Institutions such as the DFG could, for example, make it a condition that the analysis code is made publicly available at the end of a project. Such requirements would make an important contribution to the transparency and traceability of scientific work.

If someone is unsure whether their own code is “good enough” for publication, are there criteria or a kind of checklist that can be used as a guide?

JM: From a reproducibility perspective, the most important criterion is that the code runs on another system and delivers identical results. A simple test is to give the code and the data to a colleague. If they can reproduce the results on their computer, that’s a good sign.

If you want a piece of code to still be reproducible in ten years’ time, what technical measures must the person who first publishes it take to ensure long-term reproducibility?

JM: I used to simply publish my replication packages – i.e. code and data – on my personal homepage. That seemed sufficient to me, as they were publicly accessible. In practice, however, this has proven to be unsustainable: many people cannot find the data there, and if the site goes offline at some point, the materials are no longer available. Today, I use specialised repositories designed for long-term archiving instead.

Which ones are they?

JM: There are various reliable repositories, such as Harvard Dataverse and the Open Science Framework. These platforms are designed to keep data and code accessible in the long term – much more stable than a personal homepage, for example. From a technical point of view, long-term accessibility is a key component of sustainable reproducibility. Another advantage for researchers is that they receive a permanent identifier, such as a DOI. This allows the materials to be clearly referenced. Ideally, this link should be provided both in the article itself and on the journal website so that interested parties can find the replication code without any detours. It is also important that the code documents exactly which software was used, including version information for programmes and packages. Even if the file is available, replication may fail if these technical details are missing.

Are earlier versions of software or packages still accessible in the long term? Or is it like with old cassettes – you don’t know how to play them anymore?

JM: With most common statistical programmes, access to earlier versions is generally possible. In economics, for example, Stata is frequently used. The system is backward compatible: in newer versions, you can specify that a particular command or an entire script should be executed under an older version. This function is supported directly by Stata. It is more complicated with user-written add-on packages. These are not archived centrally, which means that older versions may no longer be available. It therefore makes sense to provide such external packages together with the replication package.

For other platforms such as R or Python, the same applies: here too, you should document which versions of the programmes and packages used were employed. There are also technical solutions such as Docker, which allow you to save an entire computing environment. This enables subsequent replication to be carried out under exactly the same technical conditions. Overall, the options for ensuring reproducibility have become much more sophisticated in recent years.
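
As a minimal sketch of such version documentation, assuming a Python-based analysis (the package list and output file name are illustrative); R users would record the output of sessionInfo() in the same spirit:

    import sys
    from importlib import metadata

    # Record the interpreter and package versions next to the replication package,
    # so that a later replicator can rebuild the same computing environment.
    ANALYSIS_PACKAGES = ["numpy", "pandas", "statsmodels"]   # the packages your scripts actually use

    with open("environment_info.txt", "w") as f:
        f.write(f"Python {sys.version}\n")
        for pkg in ANALYSIS_PACKAGES:
            try:
                f.write(f"{pkg}=={metadata.version(pkg)}\n")
            except metadata.PackageNotFoundError:
                f.write(f"{pkg}: not installed\n")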

How do you make your students and doctoral candidates aware of reproducibility and the careful handling of code? What role does this topic play in education?

JM: I think it is essential that the topic of reproducibility is anchored early on in education. The master’s and doctoral phases in particular offer a good opportunity to establish a transparent scientific workflow from the outset – in other words, one in which data and code are traceable and reusable. These principles should be an integral part of education – not only for doctoral candidates, but also for master’s students. Even if the latter go on to work outside academia, they will benefit from this: structured handling of code and the ability to document analyses in a reproducible manner are also in demand in many non-academic professions. In this respect, it also strengthens career prospects and employability. Anyone who works on software or data projects in a team-oriented manner must be able to ensure that their own work steps remain comprehensible to others.

You are also active at Lab². Based on your experience, what measures would be appropriate to reduce disincentives in the academic system and integrate replication more strongly into everyday research? How can colleagues be motivated to share more and replicate more often?

JM: Part of it is pressure – for example, from journals that make the publication of data and code mandatory. Supervisors of doctoral projects can also demand such standards at an early stage. At the same time, it helps to firmly anchor the topic in scientific training so that reproducibility is perceived as a matter of course.

For a long time, a key problem was the relationship between original authors and replicators. Replications often only had a chance of being published if they refuted the original results. This led to a kind of antagonism. Yet it is just as scientifically relevant when results are confirmed.

There therefore needs to be more opportunities for replicating studies to be published, regardless of the outcome. Journals such as the Journal of Comments and Replications in Economics (JCRE) show that this is possible: there, confirming replications are also considered valuable contributions. This promotes reliability and at the same time reduces incentives for overdramatisation.

Open science is often linked to the argument that it strengthens trust in science – even outside the specialist community. But is it realistic to expect politicians or business leaders to actually understand code and data? Or is this an argument that only applies within the scientific community?

JM: In my view, reproducibility is primarily aimed at the scientific community. That is where it is decided which results are considered reliable and passed on to policy advisors. Politicians and business leaders rarely check code and data themselves, but they rely on assessments from the scientific community. When studies are openly documented and successfully replicated, their credibility increases – and so does the trust that policy advisors place in them.

Policy advice is an important area of application, but reproducibility also plays a central role in the development of theory in science. Many theoretical models are based on empirical findings. If these findings are not reliable, this has direct consequences for the further development of theories. That is why there first needs to be a critical debate within science about which results can be considered reliable.

You have conducted your own research on replication code. What are your key findings?

JM: We looked at over 2,500 publications based on data from the Socio-Economic Panel and checked whether replication code was available. To do this, we evaluated the articles ourselves, searched journal websites, looked at authors’ websites and checked relevant repositories. The result was sobering: only in about 6 percent of cases was code publicly available.

Only 6 percent?

JM: Yes, based on all publications since the mid-1980s. At first glance, that’s a sobering figure. But the average masks an exciting dynamic: in recent years, the proportion of publications with accessible code has risen significantly and now stands at over 20 percent.

So we are seeing a clear trend towards greater openness. This is partly because more and more journals are requiring code to be made available. On the other hand, it is now much easier to provide replication material. In the early years, this was hardly possible – the first code we found was still published as an appendix to a printed paper. In addition, the discussion about the replication crisis has raised awareness of the importance of openness. New technologies, such as AI-supported tools, can also help to further lower barriers. I am therefore confident that this positive trend will continue.

I think it is relatively easy for specialist journals to make the provision of code mandatory. Technically, it is often sufficient to tick a box and provide a link to an external repository – they do not need to set up their own infrastructure. I therefore assume that this practice will increasingly become standard. It is also interesting to note that this can give rise to new forms of research: if sufficient code is available, it is possible to analyse which variables are frequently used or how methodological approaches change over time. The code itself thus becomes the subject of research.

You also deal with coding errors. What are the most common types of errors? And how do they affect replicability?

JM: There are different types of errors. A common, more technical error is that the code does not run on another computer – for example, due to incorrectly set paths, missing files or unclear software specifications. A somewhat more serious error concerns the handling of missing values. In Stata, for example, missing values are treated internally as very large numbers. If this is not taken into account, it can significantly distort the results. For example, if you create a variable that indicates whether a household is above or below the average income and do not handle missing values correctly, cases with missing income will incorrectly end up in the above-average group. Depending on the proportion of missing values, this can have a significant impact on the results.
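
The mechanism can be illustrated outside Stata as well. The following sketch mimics the described behaviour in Python/pandas by standing in a very large sentinel value for a missing income (data and sentinel are purely illustrative):

    import numpy as np
    import pandas as pd

    # Toy data: three observed household incomes and one missing value.
    income = pd.Series([1500.0, 2800.0, 4200.0, np.nan])

    # Mimic the Stata pitfall: internally, a missing value behaves like a number
    # larger than any regular value.
    SENTINEL = 8.99e307          # illustrative stand-in, not an exact Stata constant
    naive = income.fillna(SENTINEL) > income.mean()      # missing case wrongly ends up "above average"

    # Correct handling: keep the indicator missing where income is missing.
    correct = (income > income.mean()).where(income.notna())

    print(pd.DataFrame({"income": income, "naive": naive, "correct": correct}))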

Such errors are usually unintentional and technical in nature. There are also cases of deliberate manipulation, such as when key figures are changed to make results appear more appealing visually. However, such cases are rare. This is also shown by meta-studies in which many replications were carried out: the vast majority of studies are reproducible – provided that code and data are available. This has also strengthened my own confidence in scientific practice. Individual cases of misconduct often receive a lot of attention, but they are not representative of the breadth of research.

Final question: When you meet young researchers who have had little contact with open science or replication but are interested in it, what practical recommendations would you give them to get started?

JM: I would recommend taking part in a replication game. I’ve done this several times. It’s a focused, one-day event where you work with others to try to reproduce a study. Most formats are now hybrid, which makes it easier to participate. In addition to the professional exchange, you learn a lot about data preparation, coding and writing scientific texts. And I enjoyed it so much that I am now organising a Replication Game at the FU in Berlin on 30 September together with the Institute for Replication. A little advertisement here.

I think the concept of the Institute for Replication is very successful – especially because it does not focus on confrontation between original authors and replicators. Instead, both sides are involved. One of my own papers was also replicated there once, which helped me to better understand the perspective of the original authors. It’s very rewarding to be treated with respect – not in a condescending or humiliating way, but as part of a joint process to ensure and improve scientific quality.

Do you have a second tip? Something that is also low-threshold, fun and doesn’t take too much time?

JM: My second tip would be to start with open science practices as early as possible and integrate them directly into your own workflow. It is during the doctoral phase in particular that many people develop their basic working style – and this often becomes ingrained. Those who learn early on to document transparently, structure code cleanly and present results in a comprehensible manner will benefit in the long term, both professionally and organisationally. At the same time, you should see the benefits not only for the discipline, but also for your own work. These open science practices not only facilitate reproducibility, but also make it easier to understand your own research work over longer periods of time.

Thank you very much!

*The interview was conducted on 19 June 2025 by Dr Doreen Siegfried.
This text was translated on 14 July 2025 using DeepL Pro.

About Prof. Dr. Jan Marcus

Dr. Jan Marcus holds the Chair of Applied Statistics in the Department of Economics at the FU Berlin. His research combines politically relevant questions with the application of the latest statistical methods for identifying causal effects. Another focus of his research is on improving scientific standards, particularly in the area of replication and reproducibility. He is committed to the transparent handling of data and code and emphasises the central role of replication material for comprehensible research.

His work has led to numerous publications in renowned international journals, including the American Economic Journal: Economic Policy, Journal of Human Resources, Journal of Public Economics and Journal of Health Economics. Among other honours, he has been awarded the German Business Award by the Joachim Herz Foundation and a dissertation prize by the German National Academic Foundation.


*Registration for the Replication Game on 30 September 2025 at the FU Berlin: https://www.wiwiss.fu-berlin.de/forschung/laborscarcity/Dates/Replication-Game.html


Contact: https://www.wiwiss.fu-berlin.de/fachbereich/vwl/angewandte-statistik/Team/professor_innen/marcus/index.html

LinkedIn: https://www.linkedin.com/in/jan-marcus-1a819724b/

BlueSky: https://bsky.app/profile/janmarcus.bsky.social

ResearchGate: https://www.researchgate.net/profile/Jan-Marcus

OSF: https://osf.io/96vyp/


