Qualitative business data opened up for subsequent use: A pilot test

Open Science experiences of Maximilian Heimstädt and Lukas Daniel Klausner

Photo collage: ZBW

The three key learnings:

Include the costs for reusability in the research proposal, including the anonymisation work by a student assistant, for example. Allow sufficient time for this process and obtain the consent of the interviewees in good time.
Making data available for secondary use offers a very good opportunity to build a committed research community that is interested in your own research topic and to lay the foundation for fruitful collaborations. Sharing data also gives students the opportunity to use fresh data on current topics to learn coding or categorisation.
An efficient approach to data collection is recommended. For example, less relevant parts of the interviews or biographical information could be removed from the transcripts and documented in the contextual information instead.

Study report: https://doi.org/10.26092/elib/2555
Dataset: https://doi.org/10.1594/PANGAEA.961744
Research data centre: https://www.qualiservice.org/en/

Can you please briefly summarise and present the research project?

Maximilian Heimstädt: Our research project, funded by the Hans Böckler Foundation, investigated an up-and-coming industry: providers of algorithmic strike prediction, known as “predictive risk intelligence”. These service providers offer large companies support in managing complex supply chains by predicting future disruptions from data available online. Of particular interest to us is that they aim to predict not only natural disasters, but also social events such as strikes and protests. However, this can have potentially problematic implications for workers if decisions are made based on such predictions, which could put them under pressure or even put them out of work.

What was your specific research question?

Maximilian Heimstädt: It was more of an exploratory project. Our aim was to understand how this new industry works and to find out to what extent various interest groups, in particular works councils and trade unions, already have knowledge of these processes and are thinking about how they can position themselves in this regard. After all, these technologies can certainly be used in the interests of employees if they are involved in the implementation processes at an early stage.

Lukas Daniel Klausner: We also wondered how this could possibly be turned around: how could trade union organisations and other employee interest groups, at least in theory, use these technologies themselves? In the course of the project, it turned out that the companies’ actual technical capabilities may not match their claims and promotional materials. As a result, the focus of our research shifted somewhat towards the aspects that are actually changing as a result of these technologies, and we looked more closely at the transparency practices that arise as a result.

Maximilian Heimstädt: The focus also shifted somewhat because the Supply Chain Due Diligence Act became much more important during our research project. It came into force in Germany on 1 January 2023, and in particular required companies to make their supply chains transparent. As a result, many of these software providers have adapted their “framing” and no longer talk primarily about predictions, but emphasise that they can now use algorithmic processes to create transparency about working conditions in supply chains. They have recognised that this is a convincing argument. They are now positioning themselves as service providers that help companies to fulfil the requirements of the Supply Chain Due Diligence Act.

What specific data did you collect exactly? Could you give us an overview of the data you worked with?

Lukas Daniel Klausner: Firstly, we analysed freely available text materials and promotional materials provided by these companies, such as white papers, policy reports and their own presentations on websites or other platforms. In addition, we attended workshops and seminars organised by these companies, mainly for promotional purposes. The core of our dataset consists of 31 interviews we conducted with representatives of different stakeholder groups, including the providers of these services, potential or actual customers, as well as representatives of the critical public, such as NGOs, labour and advocacy groups. People with legal expertise in the field of supply chain due diligence were also interviewed, as well as academics who have worked on the topic. The interviews were mainly conducted by a student assistant, Sandrine Faißt, and myself in tandem.

What was your approach in this regard? Did you conduct the interviews first and then ask for permission to publish them afterwards, or did you communicate openly from the outset that the interviews were to be published?

Lukas Daniel Klausner: We clearly communicated right from the start that we needed a form of consent, as without this we would not be able to use the interviews for our research. We also indicated in advance that we would like to make the data available for subsequent use and asked the interviewees whether they could imagine agreeing to publication. None of the interviewees refused to cooperate from the outset. However, in the end, only 18 out of 31 interviewees signed the agreement for subsequent use. We might have obtained a few more signatures if we had asked more persistently. However, some people, especially from business and industry, had strong concerns about the possible reuse of the data, even if it was anonymised.

Maximilian Heimstädt: In any case, this negotiation of re-use is something that could have been approached differently. I suspect that other researchers who have already dealt with the re-use of interview data had the agreement for this signed before the interview. For our research project, it seemed best to conduct the interviews first and then ask for authorisation for re-use later. In this way, we were able to ensure that the interviews would take place either way.

Will both the audio or video file and the transcript be published, or will only one part be published?

Lukas Daniel Klausner: Only the transcripts are made available for subsequent use. The anonymisation process was quite complex. It was not just about removing names and explicit company references, but also about meta-information, such as a person’s professional experience. In the first anonymisation pass, we overlooked a few places where, for example, specific periods of time were still mentioned. Qualiservice at the University of Bremen, with whom we worked on this, then pointed out to us that this information could still allow conclusions to be drawn about a person’s identity. Therefore, this information also had to be removed.

Maximilian Heimstädt: We have inserted square brackets in the places where sensitive information was included. These contain a standardised marking for anonymisation, followed by the relevant information. The institutions are anonymised, but it is still clear which area is represented. For example, instead of a specific role in a specific NGO, “work in the civil society sector” is indicated.

Lukas Daniel Klausner: An anonymised passage then reads, for example: “Through our cooperation with [eco-social alliance 1], we have made further contacts and got involved in [eco-social alliance 2].”
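Editorial illustration: the bracket convention described above can be sketched in a few lines of Python. This is not the software tool provided by Qualiservice, nor the interviewees' actual workflow. The function name `pseudonymise`, the mapping, and the organisation names in the sample sentence are all hypothetical. The sketch only replaces names that are already known. It would not catch indirect identifiers such as time periods or professional experience, which, as described above, still had to be found and removed by hand.

```python
import re


def pseudonymise(text, entities):
    """Replace known identifying names with numbered category placeholders.

    `entities` maps a real name (hypothetical in this sketch) to a category
    label. Names sharing a category are numbered in order of first appearance.
    """
    if not entities:
        return text

    counters = {}      # category -> number of placeholders issued so far
    placeholders = {}  # real name -> placeholder already assigned to it

    # Try longer names first so a short name cannot match inside a longer one.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def substitute(match):
        name = match.group(0)
        if name not in placeholders:
            category = entities[name]
            counters[category] = counters.get(category, 0) + 1
            placeholders[name] = f"[{category} {counters[category]}]"
        return placeholders[name]

    return pattern.sub(substitute, text)


# Invented organisation names, used purely for illustration.
sample = (
    "Through our cooperation with Green Labour Network, we have made "
    "further contacts and got involved in Fair Chains Forum."
)
entities = {
    "Green Labour Network": "eco-social alliance",
    "Fair Chains Forum": "eco-social alliance",
}
anonymised = pseudonymise(sample, entities)
print(anonymised)
```

Keeping the category inside the placeholder preserves the information that matters for secondary analysis (which kind of organisation is speaking) while dropping the identifying name.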

Maximilian Heimstädt: As it was our first attempt to make qualitative data available for reuse, we would not have been able to achieve this anonymisation on our own. The Qualiservice research data centre supported us in this. We hired a student assistant to carry out the anonymisation. The research data centre provided us with a software tool and offered instruction to ensure that the anonymisation met the high quality standards.

How would you describe your data?

Lukas Daniel Klausner: Semi-structured qualitative interviews.

Maximilian Heimstädt: You could probably describe it at the level of a specialism or an industry. We didn’t just talk to employees of one company or one particular professional group, but rather tried to interact with people who are either already working in this field or are otherwise involved in this emerging industry.

How did you have to prepare the data? Did you create the transcripts, anonymise the names and edit the contextual information, as you have already explained? Are these now stored as text files at Qualiservice in the Research Data Centre?

Maximilian Heimstädt: Exactly!

You mentioned that you had good cooperation with the Qualiservice research data centre in Bremen and that you did not carry out the anonymisation yourself. Can you estimate how extensive and labour-intensive this process was?

Maximilian Heimstädt: We already budgeted for the costs of subsequent utilisation in the research proposal. It was clear that one of the aims of this project was to make qualitative data accessible via the research data centre – pioneering methodological work in business administration. We therefore obtained a quote from the Research Data Centre beforehand, which was around EUR 6,000. Prices can be negotiable, depending on the budget. These costs essentially covered the research data centre’s working hours. However, the actual anonymisation work was not included, only their support and guidance. In addition, we had planned for a student assistant in the research proposal to support us with the actual anonymisation. This was a time-consuming task, as none of us had any previous experience with anonymisation. We first had to familiarise ourselves with the anonymisation system and then obtain consent from the interviewees. This process can be quite lengthy. One hour of already transcribed audio material required an additional two to three hours of work time for anonymisation by the student assistant.
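Editorial illustration: the rule of thumb mentioned above (two to three additional hours of anonymisation per hour of transcribed audio) can be turned into a rough planning estimate for a research proposal. The function name and the figure of 10 audio hours are hypothetical; the interview does not state the total length of the recordings.

```python
def anonymisation_hours(audio_hours, low_factor=2.0, high_factor=3.0):
    """Return a (low, high) estimate of anonymisation working hours.

    The default factors reflect the two to three hours per hour of
    transcribed audio reported in the interview; adjust them for your
    own material and level of experience.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must not be negative")
    return audio_hours * low_factor, audio_hours * high_factor


# Hypothetical project with 10 hours of transcribed interviews.
low, high = anonymisation_hours(10)
print(f"Plan for roughly {low:.0f} to {high:.0f} hours of anonymisation work.")
```

The estimate covers only the anonymisation itself. The research data centre's support and the time needed to obtain consent would have to be budgeted separately.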

This could be seen as a tip for others in the field of business administration: Apply for funding for counselling services and the employment of a student assistant in your third-party funding application.

Maximilian Heimstädt: The application for research funding from the Hans Böckler Foundation explicitly asks about the plans for the subsequent usability of the data. Presumably, projects that collect quantitative data have been the main target group so far. However, we wanted to try out whether this would also work with qualitative data. That’s why we explicitly applied for funding for this. It is important to note that the cost structure for qualitative research has changed, particularly with regard to transcriptions, which are now much more affordable thanks to free tools. Qualitative research projects therefore do not necessarily have to become more expensive, even if reutilisation adds a new cost block.

How did you decide on the Research Data Centre in Bremen?

Maximilian Heimstädt: The German Data Forum (RatSWD) has a list of certified research data centres. I think there are around 40 certified centres. These centres usually have a thematic focus so that you can find out which one is best suited to your own research. In the end, there were two centres that were shortlisted for us. There are only a few centres that deal with qualitative data. A colleague from sociology, Isabel Steinhardt, had already had experience with Qualiservice, so we followed her example.

What do you hope to gain from this reutilisation?

Maximilian Heimstädt: In an ideal world, people would actually use our data. It would be fantastic if someone was interested in the same topic, used our data set, possibly extended it in new directions and a small but dedicated research community emerged around the topic of algorithmic prediction and co-determination. I think this could be the foundation for wonderful collaborations if people would simply use our data. I would also like our data to be used for studying and teaching. Especially for training in qualitative research, it would be very useful if students, who often don’t have the resources to collect their own primary data, could use fresh data on current topics for learning coding or categorisation. So I think it would be great if students or project groups could request and use the data.

How exactly does this work? Do you have the option of specifying that the data may be used for scientific purposes, but not for teaching purposes? Is there something like a selection process where you can indicate your preferences?

Maximilian Heimstädt: That’s a good question. The current procedure is that interested parties visit the Qualiservice website and can view a list of around 40 datasets. After reading a brief description, they can contact Qualiservice and explain their intentions, for example by saying: “I am a researcher at institution X and am working on project Y. Can I please have access to the data?” Once this declaration has been made, that is essentially all that needs to be done. A user agreement is then signed, and the data is released. In our case, the data is made available directly for download. Qualiservice also offers solutions for more sensitive data where access may be restricted, for example by only allowing it to be viewed at a specific computer workstation in Bremen.

Lukas Daniel Klausner: Communication between primary and secondary data users only takes place indirectly, for example when we realise that our work is being cited and we see that colleagues in Frankfurt, Zurich or Erfurt are using our data for their research. This may lead to an exchange. With regard to my motivation, I would like to add that the philosophical idea of open science, open access and open data also plays a major role for me. I believe that research should be accessible and comprehensible, just like I teach my students. I explain to them how research should be, and then I show them how it often really is, unfortunately. However, at least where I have influence and with the resources from my project, I try to fulfil these ideals to the best of my ability.

Maximilian Heimstädt: Yes, I agree. We wanted to show that it is possible not to hide behind closed doors. At the same time, I’m against demonising qualitative research just because it doesn’t make its data public, whereas quantitative researchers do. It’s not right to lump everything together. Research is very diverse and you can’t apply the same standards and principles to everything. Nevertheless, I think it is important to test how openness can be implemented individually for each research practice and culture. We wanted to set an example and simply try out what openness could look like for us. It is important to note that our data does not necessarily meet the strict criteria of an openness definition. According to the common definition on the web, which is often seen as the gold standard, there is still a barrier to accessing the data, as interested parties have to declare themselves and sign a user agreement. According to this understanding, it is not really open, but “only” ready for re-use. Nevertheless, I would say that this is a practicable form of openness in the field of qualitative research.

Lukas Daniel Klausner: It’s a step up from the usual approach, where data is only available on request from the original researchers themselves. My experience shows that “data available on request” often means that it is simply not available at all. However, by involving a research data centre, there is a certain reliability that the data is actually available when requested.

What are your experiences with data protection and research ethics?

Maximilian Heimstädt: Qualiservice specifies the degree of anonymisation required for the data to be made available for subsequent use. Personally, I am in favour of social science research projects in Germany not having to be assessed by an ethics committee. Researchers should be able to approach ethics committees if they need advice, but it should not be obligatory to have an ethics check before every study or data collection, as is the case in the UK or the USA.

What potential do you see for re-utilising or analysing your business data?

Lukas Daniel Klausner: I think we have systematically asked some questions that we will no longer analyse specifically, as our research interests have shifted somewhat over the course of the project. Particularly in the area of labour relations or, in general, change and transformation processes in companies or industries, we believe there may be findings that we are not currently researching or pursuing.

Maximilian Heimstädt: We come from organisational and technology research and normally have little to do with the field of sustainability management. Nevertheless, our data can provide insights into issues such as sustainability in supply chains and the associated governance mechanisms. Although we do not currently utilise these aspects, we have decided to release the dataset now in advance of publishing our own research. We think it is unlikely that someone will randomly request the data and use the exact parts we are currently working on.

There are indeed studies that show that researchers often hesitate to publish their data until they have analysed every last bit of it. They wait until they really don’t know what else to do with the data or what other questions they could investigate. There is a great fear that someone else will be faster than you.

Maximilian Heimstädt: Yes, personally I think this fear is somewhat irrational, especially when it comes to sharing qualitative data. The advantages of showing what topic you are working on and possibly coming into contact with other people far outweigh the minimal risk of someone selecting a specific sentence that you would like to quote yourself. The danger of “scooping”, which may exist in experimental research, does not exist in this sense in qualitative research. However, I believe that many qualitative researchers feel this fear, although in my opinion it is not justified.

You are pioneers in two respects – firstly because you dared to take the step at all and secondly because you did it so early. At least from my point of view, that is quite remarkable.

Maximilian Heimstädt: That’s right, and we can also say that we are in a comfortable position. We are working together with Leonhard Dobusch, who applied for the project, even though he was not directly involved in the data collection and is therefore not mentioned by name in the report on the research data. We all already have PhDs or even full-time positions and have several projects running in parallel. Nor is it the central dissertation project of any of us, which could give rise to the feeling that a lot depends on it and nothing must go wrong. We were of the opinion that we could take the risk.

Last question: Would you do it again?

Maximilian Heimstädt: Yes, but next time I would approach it a little differently with the knowledge gained from this pilot project. For example, we could remove parts of the interviews that are less relevant to the topic or contain primarily biographical information from the transcripts and only note this in the contextual information. This would save us a lot of time and work in the anonymisation process. I believe we could organise our work processes more efficiently as a result.

Lukas Daniel Klausner: Yes, and it’s not just a question of efficiency. From an ethical and data protection perspective, too, it would actually have been necessary to act more in the spirit of data minimisation. During research, we should consider whether certain information is really necessary, and if it is not, we should not record or transcribe it in the first place. On the positive side, there is a clear synergy between the principle of data minimisation and the ease of subsequent use.

Maximilian Heimstädt: Basically, what we have described is the same as with any other qualitative research method. The first time you do ethnography as a doctoral student, you tend to write everything down and are on the verge of drowning in data. But you have to learn to be selective. It’s similar here. We thought we had to record everything in order to be transparent and allow our data to be reused. But you have to be more pragmatic. Provided these considerations are built into the research process, we would do it again.

I think “pragmatism” would be an appropriate heading. It’s about observing data minimisation, finding cooperation partners and using outsourcing where possible. This makes it easier to share qualitative data.

Thank you very much!

The interview was conducted on 1 February 2024 by Dr Doreen Siegfried.

About Maximilian Heimstädt:

Maximilian Heimstädt is an organisational researcher and currently works as a senior academic advisor at Bielefeld University. He also heads the research group “Reorganisation of Knowledge Practices” at the Weizenbaum Institute in Berlin. From April 2024, he will take up a professorship in business administration, in particular digital governance and service design, at Helmut Schmidt University/University of the Federal Armed Forces Hamburg.

Contact: https://heimstaedt.org/

ORCID: https://orcid.org/0000-0003-2786-8187

LinkedIn: https://www.linkedin.com/in/maximilian-heimst%C3%A4dt-66263427/

About Lukas Daniel Klausner:

Lukas Daniel Klausner is a mathematician and critical computer scientist. He received his doctorate in set theory from the Vienna University of Technology. As a researcher at the St. Pölten University of Applied Sciences, he investigates the interactions between society and technology in an interdisciplinary manner. His work combines precise mathematical and algorithmic thinking with a holistic approach and an understanding of the social and ethical issues raised by technological innovation. His research interests include predictive technologies, marginalised communities, critical perspectives on correctness, media consumption and production, game studies, and the social embeddedness of machine learning and artificial intelligence. Together with Paola Lopez, he founded the AK MatriX, the working group for trans- and interdisciplinarity in mathematics.

Contact: https://l17r.eu/

ORCID: https://orcid.org/0000-0003-3650-9733

LinkedIn: https://www.linkedin.com/in/lukas-daniel-klausner-4aa484249/
