Make a positive impression with preregistration
Dr Florian Pethig talks about his experience with Open Science
Three key learnings:
- The preregistration platform AsPredicted.org is easy to use and offers a simple path to your first preregistration.
- With preregistrations you can make a positive impression in peer review.
- Research data can be published on Open Science Framework. OSF offers an integration with GitHub.
When did Open Science become important for you?
FP: As a Business Information Systems Researcher I have always been interested in Open Source. My view of things changed when I looked at how to set up experiments in the course of an algorithm study (How do algorithms affect the chances of disadvantaged groups?). In the field of acceptance of algorithms there were a few psychologists who practise Open Science. I found that pretty interesting, especially as I had no personal background with experiments, yet I could easily follow the work of those psychologists and use it as best practice for my own work. That was the first time I conducted an online experiment with Amazon Mechanical Turk. Thanks to the psychologists’ published data I could retrace all the requirements much better. I had no previous knowledge of attention checks or comprehension checks. But if all materials are available, I can carry out high-quality experiments as an Information Systems Researcher. I can reconstruct the experiments at the practical level and learn a lot about the craft. That was a key moment for me, so I decided that I wanted to pursue Open Science.
You have made preliminary studies of preregistration. How do you see the change of attitudes towards preregistration in your field of research?
FP: I have studied the prevalence of preregistrations in the field of Information Systems in collaboration with a student. We found only one preregistered experiment in the top eight journals, and that was only published in September 2021. So in principle this is fairly new. We looked a little deeper into this. Preregistrations are coming to the surface. Researchers are addressing these questions in advance: Which changes do we need to initiate? What is our position on transparency? And that’s where we found quite a few activities. For example, there is now a journal dedicated solely to replication research. There is one big replication study in the field of Information Systems. And many editorials are quite new. MIS Quarterly for instance, which is a flagship journal in our field, published new transparency guidelines in June 2021. They now also have a Transparency Editor. Things are happening, but published studies are still rare.
So there are outlets that only need to be filled with content?
FP: That’s my perception, at any rate. We also looked at the fields of Management and Marketing, which are most similar to our own discipline, and there we found more preregistrations. 90 papers since 2016. The tendency towards more preregistrations was quite obvious here. We also see that more and more journals that didn’t publish preregistrations before are now starting to do so.
How do you explain these changes?
FP: If important journals like MIS Quarterly are starting with transparency guidelines, the other journals will follow suit bit by bit. And that’s a good start. But it will take some time of course, because of the review cycles. And of course it also helps researchers to have a point of reference if the journals require it.
Why is AsPredicted.org more popular than Open Science Framework?
FP: It’s a very simple platform with a short questionnaire that can be completed quickly. I think the barrier to entry is very low here. Because it’s so easy to use, you quickly lose your fear of preregistration. That’s how I experienced it: it was so easy that there wasn’t much I could have done wrong.
What reactions to your preregistrations did you get from the community?
FP: The reaction we got from the peer review was pretty good. One reviewer in particular gave very positive feedback. We did the data analysis in Jupyter Notebook and this, in combination with the preregistration, got us very positive responses. I could show the output of the code directly, and readers could just walk through it. It was displayed in the correct format directly on OSF. I believe you can make a very good impression with this because it is not yet standard procedure.
You have also looked at data version control. Why is that a topic?
FP: Covid-19 research has shown that people often publish their data in different waves: a first publication with the dataset, then a second publication with an enlarged dataset. But in practice this is the exception. I think it is important for the reproducibility of my own project in my own environment, and that’s what I wanted to study. Big datasets produce new challenges. If you have several gigabytes’ worth of datasets and run them through certain pre-processing stages, for example if you have big text data and you want to run a text analysis and modify the text in certain ways, then you create a new version of the text corpus. These pre-processing stages need a lot of computing power and are time-intensive. So you don’t always run the complete process from start to finish, but save intermediate datasets along the way. If you don’t document carefully which source code led to which dataset, you’ll end up with a final dataset but no longer know how you got there from the original dataset. For source code you often use GitHub, where you can track all changes across time. But it’s not so easy with large datasets, because GitHub isn’t suited for that. So I looked at which tools are available to track changes for such gigabyte-sized datasets.
Can you recommend a tool?
FP: I have done test runs with all of them and in theory they worked. Data Version Control (DVC) made a positive impression: you save metadata about the dataset itself. The metadata are stored on GitHub and changes can be tracked, so my co-author, for example, can run dvc pull to get the latest version of the dataset. But this is very complex and cumbersome, and I cannot recommend it for routine work.
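Editorial illustration: the idea described above, keeping a small metadata record under version control while the large dataset itself stays outside the repository, can be sketched roughly as follows. This is a minimal sketch and not DVC's actual file format or API. The function names (file_sha256, write_pointer, matches_pointer), the ".meta.json" suffix, and the "produced_by" field are hypothetical choices made for this example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so a gigabyte-sized dataset never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path, produced_by):
    """Write a small JSON record next to the dataset and return its path.

    The record is tiny, so it can be committed to Git alongside the code.
    'produced_by' is a hypothetical field noting which script made this version.
    """
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": file_sha256(data_path),
        "size_bytes": data_path.stat().st_size,
        "produced_by": produced_by,
    }
    pointer_path = data_path.with_name(data_path.name + ".meta.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def matches_pointer(data_path, pointer_path):
    """Return True if the dataset on disk still has the recorded hash."""
    pointer = json.loads(Path(pointer_path).read_text())
    return file_sha256(data_path) == pointer["sha256"]
```

After each pre-processing stage, one would call write_pointer on the intermediate dataset and commit the resulting record together with the script that produced it. A co-author can then use matches_pointer to check whether a local copy corresponds to the committed version.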
Do you publish posters or conference slides online?
FP: I used to be more hesitant about that; I rarely shared things with the community proactively. But this has changed with Covid-19, since every presentation is recorded as a matter of course and often made publicly available afterwards, at conferences for example. I uploaded my presentation for Open Science Day directly to my GitHub repository and made it openly available. So I share more now than I did before, because I can see the benefit. And thanks to the Open Science community I have lost some of my fear that you must always provide perfect work. The information can always be helpful even if it is not yet final. I am gradually losing my perfectionism, although it is not always easy.
Which benefit do you see?
FP: I have recently gained more followers because my presentation for Open Science Day was reported on Twitter, and I can present my own research findings to a larger audience. I also feel my work is appreciated when something is tweeted about it. It’s nice to see people are interested in it.
You collect your own research data, where do you publish them?
FP: For my new project, where everything is openly accessible, I publish them on Open Science Framework. There’s an integration with GitHub, so you can link to your repository there and you can even upload the data yourself. That’s very convenient. In my paper I include a link to the repository, including sub-repositories for data, code, survey materials etc., and I also link to certain Jupyter notebooks, so people who are interested can access the data much more easily.
Can you give interested people tips on what they should bear in mind?
FP: I found the fellowship programme very helpful, if only because of the many different technical terms which I fully comprehended for the first time. Now I have contact persons for different areas which I find helpful. My mentor, Tamara Heck, works a lot with OER. I can ask her advice now whenever Open Educational Resources are the subject. I believe it’s important to find a community. And as for the rest, just do it. Preregister something, let it run, submit it to a journal and wait for the response. I think that’s another way to figure this out.
What is needed to convince more economists of the benefits of Open Science?
FP: You need people to talk to on the ground at the institution. At University of Mannheim we have the Open Science Office and an Open Science Officer, who supports and advises you if you have questions. That’s really important. Before I publish my data openly, I can consult them. Advisors are essential. The other thing is incentives. The Open Science Office of University of Mannheim has advertised Open Science Grants, i.e. funding for projects. I received almost 5,000 Euros in grants that will help me implement my field studies.
Dr Doreen Siegfried conducted this interview.
The interview was conducted on September 24, 2021.
About Dr Florian Pethig
Dr Florian Pethig is a Business Information Systems Researcher in the area of Information Systems at the Chair for Enterprise Systems at University of Mannheim. In his research he studies the societal effects of IT, data analytics, and technology acceptance. He has received the Open Science Grant of the University of Mannheim and has been accepted for the Fellow Programme Free Knowledge of Wikimedia Deutschland.