Don’t be afraid of public code
Hans-Martin von Gaudecker talks about his experience with Open Science
Three key learnings:
- Economists looking for templates for economic research projects that come with a preconfigured folder structure can find them at https://open-econ.org/.
- It is well worth investing a few days at the beginning of research work to be clear about one’s tools.
- pytask is a tool that automates the execution of all steps in a research project, from data processing to the preparation of the report.
What is Open Source Economics about?
HMvG: Open Source Economics is a group driven by students, PhD candidates and postdocs who code and are strongly committed to the idea of Open Source. The platform is collaborative: members exchange information about code, for example, and occasionally write things together. Open Source Economics is open to everyone who is interested. The code is available on GitHub and anyone can contribute to it. For our project Estimagic, for instance, we have collaborators in Barcelona. Open Source Economics is organised around projects. An important one that we are pushing strongly is a set of templates for economic research projects. The templates provide the setup for later work right away, i.e. the folder structure. They even include an example, so you only have to add or modify code where necessary. Users thus have the complete structure on their computer and don’t need to think about it any longer. Work that would normally take a whole day can be done in half an hour. The template also includes pytask. Guided by the folder structure, you divide the project into steps for data processing, analysis, graphs etc. These steps are interdependent and need to be kept in sync. For example, models must be re-estimated if the data change. Remembering all these dependencies is too demanding and error-prone. With pytask, researchers can automate the synchronisation of the steps and save themselves a lot of mental effort.
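To make the idea of interdependent steps concrete, here is a minimal sketch of the bookkeeping a tool like pytask takes off your hands. It is not pytask's interface: the step names and the helpers `execution_order` and `downstream_of` are hypothetical. It records which step depends on which, derives an order in which every step runs after its inputs, and finds every step that has to be redone when, say, the data change.

```python
from graphlib import TopologicalSorter

# Hypothetical project steps: each step maps to the set of steps it depends on.
# The names are illustrative, not taken from the Open Source Economics templates.
DEPENDENCIES = {
    "clean_data": set(),
    "estimate_model": {"clean_data"},
    "make_figures": {"estimate_model"},
    "make_tables": {"estimate_model"},
    "compile_paper": {"make_figures", "make_tables"},
}


def execution_order(dependencies):
    """Return an order in which every step runs after all of its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(changed, dependencies):
    """Return the changed step plus every step that (indirectly) depends on it."""
    affected = {changed}
    grew = True
    # Keep adding steps whose inputs include an already affected step.
    while grew:
        grew = False
        for step, inputs in dependencies.items():
            if step not in affected and inputs & affected:
                affected.add(step)
                grew = True
    return affected


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
    # If the raw data change, everything downstream must be rerun.
    print(sorted(downstream_of("clean_data", DEPENDENCIES)))
```

Writing the dependencies down once, instead of keeping them in your head, is exactly what removes the "accident-prone" part: the order and the set of steps to redo follow mechanically from the graph.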
How many of these templates are there on Open Source Economics?
HMvG: At the moment it’s a prototype, a generic template that will support four different programming languages. Once we have finished the next revision, these will be Python, R, Stata and Julia.
What are the benefits of working with pytask to you?
HMvG: I believe it’s the only way to make things reasonably reproducible. As soon as I need to think about several intermediate steps (saving, running the various scripts one after the other, copying, pasting into the Word document etc.), there’s no chance I can do this reproducibly for semi-complex papers. Add to this the half year between submission to the journal and revision and resubmission, and there’s no way you can remember the correct order of the steps, unless you have painstakingly documented the program. In my opinion there are few decent tools for truly running the entire analysis from start to finish in a way that rules out certain errors. Tools that let you rerun only those parts that need to be rerun after you have changed individual steps are rare as well. In the end I save a lot of time, also because pytask can execute steps in parallel.
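Running "only those parts that need to be run" usually rests on change detection: a step is skipped if its inputs are identical to those of its last successful run. The sketch below illustrates that idea with content hashes. It is a hypothetical `MiniRunner`, not pytask's actual mechanism, and it keeps "files" in an in-memory dictionary so it runs without touching the disk. Real build tools typically also track outputs and the step's own source code.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Files = Dict[str, bytes]  # in-memory stand-in for files on disk (assumption for this sketch)


def fingerprint(contents: bytes) -> str:
    """Hash file contents so that changes can be detected cheaply."""
    return hashlib.sha256(contents).hexdigest()


class MiniRunner:
    """Hypothetical runner: executes a step only if its inputs changed since its last run."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, Tuple[str, ...]] = {}  # step name -> hashes of its inputs
        self.log: List[str] = []

    def run(self, name: str, func: Callable[[Files], None], inputs: List[str], files: Files) -> bool:
        current = tuple(fingerprint(files[path]) for path in inputs)
        if self.last_seen.get(name) == current:
            self.log.append(f"skip {name}")
            return False
        func(files)
        self.last_seen[name] = current  # remember the inputs this run was based on
        self.log.append(f"run {name}")
        return True


def clean_data(files: Files) -> None:
    """Hypothetical step: turn comma-separated raw data into semicolon-separated data."""
    files["clean.csv"] = files["data.csv"].replace(b",", b";")


if __name__ == "__main__":
    files: Files = {"data.csv": b"1,2,3"}
    runner = MiniRunner()
    runner.run("clean", clean_data, ["data.csv"], files)  # first run: executes
    runner.run("clean", clean_data, ["data.csv"], files)  # nothing changed: skipped
    files["data.csv"] = b"4,5,6"
    runner.run("clean", clean_data, ["data.csv"], files)  # input changed: executes again
    print(runner.log)  # → ['run clean', 'skip clean', 'run clean']
```

Combined with a dependency graph, a rerun of one step would in turn change the inputs of the steps downstream of it, so only the affected part of the project is recomputed.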
Are all the people involved in Open Source Economics on the active side or are they mostly users?
HMvG: Both. Within this project, everyone can look at the code and use it. It helps if people have already seen what I teach in my course “Effective programming practices for economists”. But it’s not as if everyone who attends that course later becomes active in the group. There are three or four people who actively collaborate on the code for the templates.
What is the overall objective you’re pursuing with Open Source Economics?
HMvG: With Open Source Economics we look in two directions. The first is to exchange code and know-how within the group and locally, and to make use of the synergies and complementarities that arise from the group. The other, when we think about outreach, is to provide high-quality software that other people can use. It’s about pushing reproducible research. Another of our projects is Estimagic, a collection of numerical optimisers and other tools that help estimate the parameters of scientific models. Estimagic tries to solve two problems. The first is that there are many excellent, freely available algorithms, but each must be called in a different way, which makes it difficult to switch between them. Yet switching is exactly what is required to find out which algorithm best suits a model. The second is that many steps needed to estimate parameters have to be implemented anew in each project, which often leads to errors, even though these steps are not actually specific to any model. We and others use Estimagic in many projects. It saves time and helps make our results more transparent and less error-prone.
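The first problem, algorithms that are each called differently, is usually solved with a single entry point that dispatches to many back ends. The sketch below shows that design under stated assumptions: the two toy optimisers and the `minimize(fun, x0, algorithm)` function are hypothetical stand-ins written in plain Python. They are not Estimagic's API, whose real algorithms and options differ.

```python
from typing import Callable, Dict, List

Objective = Callable[[List[float]], float]


def gradient_descent(fun: Objective, x0: List[float], step: float = 0.1, iters: int = 500) -> List[float]:
    """Toy gradient descent using a central finite-difference gradient."""
    x = list(x0)
    h = 1e-6
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            up, down = x.copy(), x.copy()
            up[i] += h
            down[i] -= h
            grad.append((fun(up) - fun(down)) / (2 * h))
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def compass_search(fun: Objective, x0: List[float], step: float = 1.0, tol: float = 1e-8) -> List[float]:
    """Toy derivative-free search: move along one axis at a time, halve the step if nothing improves."""
    x = list(x0)
    fx = fun(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for direction in (step, -step):
                candidate = x.copy()
                candidate[i] += direction
                fc = fun(candidate)
                if fc < fx:
                    x, fx, improved = candidate, fc, True
                    break
        if not improved:
            step /= 2
    return x


# Registry mapping algorithm names to implementations that share one signature.
ALGORITHMS: Dict[str, Callable[[Objective, List[float]], List[float]]] = {
    "gradient_descent": gradient_descent,
    "compass_search": compass_search,
}


def minimize(fun: Objective, x0: List[float], algorithm: str) -> List[float]:
    """Single entry point: switching optimisers only means changing a string."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return ALGORITHMS[algorithm](fun, x0)


if __name__ == "__main__":
    def criterion(p: List[float]) -> float:
        return (p[0] - 1) ** 2 + (p[1] + 2) ** 2  # minimum at (1, -2)

    for name in ALGORITHMS:
        print(name, minimize(criterion, [0.0, 0.0], algorithm=name))
```

Because every back end is hidden behind the same call, comparing algorithms on a model becomes a loop over names rather than a rewrite of the estimation code.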
You teach the course “Effective programming practices for economists” for master’s students and PhD candidates at the University of Bonn. Do you see higher downloads or visitor numbers on the website after the course?
HMvG: I must admit I haven’t looked at the website’s usage statistics for quite some time. But I do see definite qualitative effects on the work students do for their master’s and PhD theses, at least among those who use it from the beginning. Of course that varies. Nowadays I take great care to ensure that my own PhD candidates use it from the beginning. It’s an offer that more and more people take up, many more than ten or twelve years ago when I first started with this. Reproducibility still matters very much to me, but more as a pleasant side effect of using decent software development techniques. And right now, acquiring such skills is very attractive.
Can you say to what extent Open Source Economics has entered the teaching of students outside Bonn?
HMvG: I regularly teach programming in block courses to economics students in Berlin, Munich, Zurich and other places. And soon there will be teaching videos.
Do you have any tips for scientists who have had nothing to do with Open Science so far? What should they bear in mind when they want to get started?
HMvG: Yes, I would say: don’t be afraid of putting your code online. Nobody is going to take anything away from you. Take your code seriously and use decent tools. You can ruin everything by using the wrong tools. You should really invest a few days at the beginning of your research work to be clear about your tools.
How do you see the future significance of Open Science in economic research?
HMvG: I think it will become ever more important, but without being overly dogmatic in practice. Sometimes I get the impression that you can never do Open Science right, because you will never achieve the perfect ideal. The fundamental idea behind it is right and important. But you can’t publish everything all the time, and you can’t establish full transparency. I think we’re on the right track in economics. Over the last ten years, the availability of code and data has changed from lip service to serious commitment, at least at some of the best journals. That is real added value. If a paper is published in the American Economic Review, I know I can download the code and use it. That wasn’t the case ten years ago. The more all of us use decent automated tools, the sooner research will become reproducible.
The questions were asked by Dr Doreen Siegfried.
The interview was conducted on March 11, 2022.
About Professor Hans-Martin von Gaudecker
Professor Hans-Martin von Gaudecker is Professor of Applied Microeconomics at the Department of Economics at the University of Bonn. He is also a team leader at IZA and a research fellow at the Reinhard Selten Institute, CESifo, Netspar and the Munich Center for the Economics of Aging. His research interests revolve around modelling the life-cycle behaviour of households and informing public policy aimed at reducing inequality. His methods make heavy use of modern computing infrastructure, and maintaining the reproducibility of results is a challenge in this and other settings. He has developed software that aims to make it easier for researchers to meet this challenge. His teaching focuses on conveying the economics alongside econometric and computational techniques. Professor von Gaudecker is Associate Editor of the Journal of Comments and Replications in Economics (JCRE).