Working reproducibly: for yourself, for others

PRACTICAL GUIDE 7

What are we talking about?

Research is considered reproducible if using the same data, the same methodology and possibly the same environment leads to findings that are identical to the original findings. Providing data, software code, algorithms or experimental setups is an important prerequisite for this.

How to establish reproducibility varies, depending on discipline and methodology. In practice, it can mean that an experiment is repeated identically, that the results of a statistical analysis of quantitative data are reproduced, or that the analysis and results from evaluating image and text corpora are reconstructed.

The benefits of a reproducible approach

Good documentation enables you to understand, even after some time has passed, exactly what you did in your computations and why you did it. This is helpful, for example, after an article has been reviewed or if you cannot work continuously on a project. You monitor and log how your data and/or code evolve from the beginning of the project and with every change. Reconstructing these changes a posteriori is much more difficult and far less reliable.

It is easier to explain and justify the results you obtain to fellow scientists. If you submit an article for publication you will find it easier to respond to questions from your reviewers.

Future papers are made more reliable. You’re giving yourself a chance to reuse data, code, documents etc. in the future.

How to apply this approach in practice

Document the work and analysis stages of your research. If you use statistical software to analyse your data, comment in detail on what you are doing in the individual analytical stages and why you are doing it. This may be absolutely clear while you are doing it, but two months later it can be less so, even if you are the author.

Manage your bibliographic references with a tool such as Zotero. Working with a reliable bibliographic standard is a common requirement for all disciplines.

Organise your data, files and folders: Apply file-naming conventions, build folder trees with a consistent, scalable structure, separate raw data from analysed data, etc.
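One possible way to put this advice into practice is sketched below in Python. The folder names, the naming pattern (date, project, description, version) and the function names create_project_tree and make_file_name are assumptions chosen for illustration, not a prescribed standard; adapt them to the conventions of your discipline.

```python
import re
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical layout: raw data is kept apart from processed (analysed) data.
FOLDERS = ["data/raw", "data/processed", "scripts", "results", "docs"]


def create_project_tree(root):
    """Create the folder tree under root and return the created folders."""
    created = []
    for name in FOLDERS:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def make_file_name(project, description, day, version, suffix):
    """Build a name such as 2021-03-15_survey_cleaned-responses_v02.csv."""

    def slug(text):
        # Lower-case the text and replace anything but letters and digits with "-".
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{day.isoformat()}_{slug(project)}_{slug(description)}"
        f"_v{version:02d}.{suffix.lstrip('.')}"
    )


if __name__ == "__main__":
    # The demo runs in a temporary folder so that nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_project_tree(tmp):
            print(folder.relative_to(tmp))
    print(make_file_name("Survey", "Cleaned responses", date(2021, 3, 15), 2, ".csv"))
```

Putting the date first in ISO format (year-month-day) means that files sort chronologically in any file browser, and the two-digit version number keeps v02 ahead of v10.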

Learn the basics of version control, even if your research does not require coding skills. Being able to restore a specific version of a document written over a period of several years can be invaluable.

Automate specific recurrent tasks. This can increase the reliability of your findings and facilitate the writing of scholarly papers because you can vary parameters much more easily.
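As a minimal sketch of what varying parameters can look like, the Python example below runs the same small analysis once for each value of a single parameter and records every run. The analysis itself (a mean that ignores values far from the median), the function names and the measurements are made up for illustration; they stand in for whatever recurrent task you want to automate.

```python
import statistics


def mean_without_outliers(values, max_deviation):
    """Return the mean of the values within max_deviation of the median,
    together with the number of values that were kept."""
    centre = statistics.median(values)
    kept = [v for v in values if abs(v - centre) <= max_deviation]
    if not kept:
        # No value is close enough to the median for this parameter setting.
        return float("nan"), 0
    return statistics.mean(kept), len(kept)


def run_parameter_sweep(values, deviations):
    """Run the same analysis once per parameter value and log each run."""
    log = []
    for max_deviation in deviations:
        result, n_kept = mean_without_outliers(values, max_deviation)
        log.append({"max_deviation": max_deviation, "n_kept": n_kept, "mean": result})
    return log


if __name__ == "__main__":
    measurements = [4.0, 5.0, 6.0, 5.0, 50.0]  # made-up illustrative values
    for entry in run_parameter_sweep(measurements, [1.0, 2.0, 100.0]):
        print(entry)
```

Because each run is logged together with the parameter that produced it, you can later state exactly which setting led to which figure in your paper.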

Are your resources limited? Consider using collaborative approaches! Train yourself in collaborative working methods; participate in research projects with other colleagues; use public datasets if they exist.

Automate your workflows: Write scripts to process your data and to manage your work stages. Avoid using spreadsheets for the processing of large datasets.
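A processing script does not need to be long. The Python sketch below reads a raw CSV file, drops rows whose value is missing or not a number, and writes the result to a separate file, so the raw data is never modified. The column names "id" and "value" and the function names clean_rows and process_file are assumptions for illustration only.

```python
import csv
from pathlib import Path


def clean_rows(rows):
    """Keep rows with a numeric 'value' field and convert it to float."""
    cleaned = []
    for row in rows:
        try:
            value = float(row["value"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric value
        cleaned.append({"id": row.get("id", ""), "value": value})
    return cleaned


def process_file(raw_path, processed_path):
    """Read the raw file, write a cleaned copy, return the number of rows kept."""
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    with raw_path.open(newline="", encoding="utf-8") as handle:
        cleaned = clean_rows(csv.DictReader(handle))
    # The raw file is only read; the output goes to a separate location.
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "value"])
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)
```

Unlike manual edits in a spreadsheet, every cleaning decision is written down in the script, so the step can be rerun and checked by you or by others at any time.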

Opt for Open Source solutions to ensure more transparency and guaranteed access.

Good luck with your research!

Date: March 2021
Questions, comments and notes are welcome at open-science@zbw.eu
