When research data suddenly disappears

“Data sovereignty is first and foremost a question of decision-making rights” – Interview with ZBW Director Prof. Dr Klaus Tochtermann

Photo: Sven Wied

Scientific work today increasingly depends on the availability and accessibility of data, which rarely stops at institutional or national borders. This creates a new vulnerability. Geopolitical tensions, domestic policy shifts in third countries, and the logic of sanctions and export controls can, depending on the situation, restrict access to data, services or repositories. What was long considered reliable thus becomes political and subject to negotiation.

At the same time, the infrastructure landscape is changing. Centralised databases, indexing services, and cloud and platform ecosystems are unevenly distributed internationally. Europe has invested in its own infrastructure but remains partly dependent on infrastructures that are operated or controlled outside Europe. Governance is crucial here: who sets the rules for access and use, and who decides on re-use, changes to metadata or depublication?

AI exacerbates this situation. Data is not merely research output, but also training material. The literature becomes input for automated analysis. This increases incentives for commercialisation and creates vulnerabilities, for example through manipulation, ‘poisoning’ or a flood of low-quality content.

Against this backdrop, data sovereignty has become a political priority. One of the key questions, alongside the aforementioned governance, is: how can science in Europe remain reproducible, verifiable and capable of taking action without abandoning trust-based international cooperation?

A prominent venue for this debate was a panel discussion on data sovereignty in research, held at the European Parliament on 1 October 2025 (video recording: https://zbw.to/SUeCA). Prof. Dr Klaus Tochtermann attended in his honorary capacity as President of the European Open Science Cloud Association (EOSC-A). As Director of the ZBW, he combines practical experience of running infrastructure with questions of governance, standards and integrity. We spoke to him about the issue from this dual perspective.


What is the current state of play in the European debate on data sovereignty in research?

KT: We are currently experiencing a situation in which scientific work depends more heavily on external conditions than many have long assumed. Data access, repositories, search and indexing services, cloud and platform infrastructures are not merely technical building blocks, but part of an international framework of law, politics and economics. If conditions change in the US or China, this has immediate consequences for research practice here in Europe and in Germany. These consequences range from restrictions on availability to limitations on re-use and replication.

In this context, people often speak of ‘geopolitical vulnerability’ as the new normal. What is new about this? Haven’t we long been dependent on commercial providers?

KT: What is new is the combination of speed, scope and uncertainty. Decisions in third countries can alter access at short notice, whether through administrative measures, new legal interpretations or shifts in political priorities. For researchers and infrastructure operators, this means we can be less certain that a service, a dataset or an interface will still be available tomorrow under the same conditions as today. Planning becomes more difficult, even though research and infrastructure rely on continuity.

In concrete terms, what does Europe’s dependence on data infrastructures consist of?

KT: Central data infrastructures are unevenly distributed internationally. We are talking about large platforms, computing capacity, indexing and reference services, but also about certain specialised repositories. Europe has high-performance facilities. However, critical work processes often depend on services that are operated or managed outside Europe. This is not automatically a problem. An international division of labour is normal. But it poses a risk if there are no alternatives, no fallbacks and no clear rules in place.

What role does governance play here?

KT: Governance encompasses key questions such as: Who defines access rules? Who sets the terms of use? Under what conditions is re-use permitted, including automated re-use? Who can depublish content or amend metadata? What are the priorities for further development? These questions determine whether research remains reproducible. Dependencies often arise ‘invisibly’ – that is, not as technical disruptions, but through contractual terms, API restrictions, licensing models or proprietary formats.

At present, the international scientific community is exposed to several risk areas simultaneously, which also reinforce one another. Which interactions do you consider particularly relevant?

KT: Geopolitics shapes the framework conditions, economic interests shape access and incentives, and AI changes the speed and the risk profile. As data becomes scarcer, the pressure to monetise it increases. When AI accesses this data, both its value and its vulnerability increase. At the same time, the system becomes more sensitive: even minor disruptions or manipulations can have major effects because automated analysis propagates them at scale.

If political intervention goes as far as the deletion, restriction or reinterpretation of data, what is the key damage caused?

KT: Apart from the immediate loss or short-term restriction, the central damage is to trust in science. If researchers have to expect that datasets depend on geopolitical or economic circumstances or on political decisions, then the reliability of the entire scientific chain – from data collection, archiving and re-analysis right through to replication – declines. Yet scientific work depends on stable references and traceable versions. If these become fragile, work processes suffer, and with them quality assurance. We must therefore build greater resilience into our infrastructures – that is, redundancy, mirroring or federated nodes, as exemplified by the EOSC. Furthermore, we need clear lines of responsibility and documented processes for crisis situations.

PubMed is a particularly well-known example of concentration risks. What is systemically critical about such cases?

KT: PubMed is a key infrastructural hub for the life sciences. If such a dominant hub were to disappear or be restricted, it would first create a painful gap and then, relatively quickly, give rise to a market for inferior alternatives, aggressive commercialisation or, in the worst case, fraud. It is like a company that is heavily dependent on a single major client. As long as everything remains stable, this concentration appears efficient. However, if this major client changes its terms or withdraws, a risk immediately arises that cannot be offset in the short term, because alternatives first need to be established and integrated. Resilience in business and science is achieved through diversification and robust fallbacks. To ensure that we do not end up building duplicate structures in an uncoordinated manner, priorities must, of course, be set. The aim is not parallelism at any cost, but rather a secure capacity to act.

Let’s talk about AI. With the introduction of AI into the working routines of the academic world, data also becomes a target for attack. What is the crux of the so-called ‘poisoning’ problem?

KT: If training or reference data is deliberately manipulated, AI systems can systematically produce incorrect results. This is particularly critical when AI is used in sensitive areas, such as medical diagnostics. In addition, there is the contamination of scientific literature and data repositories through the mass production of low-quality or fraudulent content. This is not just a quality issue, but an integrity issue. It can undermine trust and validity.

What does this mean for infrastructure?

KT: Security-by-design is becoming central. We are talking here about risk analyses along the data pipeline, mechanisms for detecting manipulation, versioning and provenance, as well as incident response plans. This is not ‘IT as a secondary task’, but IT as a core task of scientific practice. If data forms the basis for decisions and models, integrity and traceability must be safeguarded both technically and organisationally.

At the ZBW, the quality of metadata for scientific purposes plays a major role. Why is quality so important – including in the context of data sovereignty?

KT: Metadata – which should always be machine-readable – provides the context for research. How was the data generated? Under what conditions? What adjustments were made? Without clear provenance and contextual description, reliable re-use is difficult. This applies to both replication and AI training. The FAIR principles provide guidance, but their implementation requires standards, resources and binding commitments. Otherwise, FAIR remains an aspiration that does not hold up in practice.

It seems we need a whole package of measures to achieve greater data sovereignty in Europe. What do you see as the common thread?

KT: First of all, science is a global endeavour, and we do not wish to abandon international cooperation in principle. Isolation is not the answer. But we need greater resilience through diversification. We require our own governance capabilities and reliable infrastructure chains. Europe must make dependencies transparent, create alternatives and fallbacks, and design rules in such a way that controlled openness remains possible. In this context, controlled openness means that, on the one hand, we protect the data in the EOSC – which is primarily a European infrastructure for researchers from the Member States – and, on the other hand, make it internationally interoperable through negotiations with friendly nations, so that we can reach fair and binding agreements on data use. This does not correspond to the ideal of complete openness, but it is understandable and pragmatic from a science policy perspective.

What does data sovereignty as a governance programme mean in practice?

KT: Data sovereignty is, first and foremost, a question of decision-making rights. This gives rise to tasks such as joint governance models and clear lines of responsibility between Member States, the European Commission and institutions within the research system, which do not obscure interdependencies. Contracts and standards must be designed so that risks are identified in advance, not only once access has already been restricted.

Does this also require long-term funding?

KT: Yes, of course. That is fundamental! Infrastructure is an ongoing endeavour. Resilience requires redundancy, mirroring, federated nodes, security, standards, staff and operations. Project-based approaches and temporary funding streams are not sufficient for this. If Europe regards infrastructure as a strategic capability, funding must enable operations, security measures and further development on a permanent basis.

How do you see the issue of dependencies and data sovereignty developing over the next few years?

KT: Data sovereignty is not a state that is achieved once and for all. Data sovereignty is an ongoing governance issue. It is crucial to continuously identify and assess dependencies, particularly through ongoing risk analyses, diversification of providers and technologies, and binding standards. What is important to me here is that data cannot be viewed in isolation from the infrastructure. The EOSC is one such key European infrastructure. As a trustworthy, FAIR-compliant research infrastructure provided by European providers, the EOSC makes a significant contribution to securing data sovereignty for researchers in Europe in the long term.

Thank you very much!

The interview was conducted in April 2026.
This text was translated on 2 July 2026 using DeepL Pro.

to Open Science Magazine