Data Spaces

This article provides a high-level definition of data spaces and summarizes related key research aspects. We raise fundamental questions about data spaces and search for answers in the literature.

AI*P: This literature research was conducted manually, and Perplexity was then used to check whether we had overlooked any important aspects.

What is a Data Space?

The goal of data spaces, as introduced by Franklin et al. (2005) [1], is to provide base functionality over various data sources without requiring dedicated data integration and data exchange systems. At the same time, a data space should offer tools for integrating data sources more tightly where necessary. To this end, a Data Space Support Platform (DSSP) must support all the data in the data space and offer integrated means of searching, querying, updating, and administering it. However, the DSSP is not in full control of its data and may only return best-effort answers, because the underlying systems may also be modified through other interfaces or be unavailable altogether.
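The best-effort behavior described above can be illustrated with a minimal Python sketch. It is not taken from [1]: the names DataSource, BestEffortResult, and search_data_space are hypothetical, and a source is assumed to signal unavailability by raising ConnectionError.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DataSource:
    """Hypothetical wrapper around one participating source (file system, database, ...)."""
    name: str
    search: Callable[[str], List[str]]  # keyword -> matching items


@dataclass
class BestEffortResult:
    """Answer of the platform: what was found, and which sources could not be reached."""
    items: List[Tuple[str, str]] = field(default_factory=list)  # (source name, item)
    unavailable: List[str] = field(default_factory=list)


def search_data_space(sources: List[DataSource], keyword: str) -> BestEffortResult:
    """Query every source and skip failing ones instead of aborting the whole query."""
    result = BestEffortResult()
    for source in sources:
        try:
            for item in source.search(keyword):
                result.items.append((source.name, item))
        except ConnectionError:
            # Assumption: the platform does not control the source, so it may be offline.
            result.unavailable.append(source.name)
    return result


def make_list_source(name: str, records: List[str]) -> DataSource:
    """Illustrative in-memory source that matches records by substring."""
    return DataSource(name, lambda keyword: [r for r in records if keyword in r])


def offline_search(keyword: str) -> List[str]:
    """Illustrative search function of a source that is currently unreachable."""
    raise ConnectionError("source is offline")
```

Reporting the unreachable sources alongside the partial answer lets the caller judge how complete the result is, which is one possible reading of "best effort" rather than the only one.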

While this initial definition provides abstractions for data management across organizations with bilateral contracts, the concept of data spaces has since evolved into today’s comprehensive infrastructures for sovereign data exchange. According to the CEN Workshop Agreement 18125 from July 2024 [4], a data space is a framework that enables data transactions between participants and ensures interoperability and trust among them, based on common policies and enabling services. Modern data spaces therefore extend beyond data management for heterogeneous data sources [3].

Participants in a data space generally act as data providers or data consumers. These roles are not mutually exclusive: a participant can be both provider and consumer. The concept of data exchange in the data space can be summarized in three steps:

  1. The data provider publishes a data catalog in the data space, containing metadata about its data and the respective usage policies. The data consumer searches this catalog to find relevant data.
  2. The data provider and the data consumer establish contracts within the data space regarding the data exchange.
  3. The data provider transfers the data, as agreed in the contracts, to the consumer.

Figure: High-level conceptual overview of the participant roles in a data space.
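
The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the API of any data space framework: the classes CatalogEntry, Contract, DataSpace, and Provider are hypothetical, usage policies are reduced to plain strings, and contract negotiation is reduced to the consumer accepting the published policy.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata only: the data itself stays with the provider."""
    dataset_id: str
    provider: str
    description: str
    usage_policy: str


@dataclass(frozen=True)
class Contract:
    dataset_id: str
    provider: str
    consumer: str
    usage_policy: str


class DataSpace:
    """Hypothetical shared services: a catalog and a contract registry."""

    def __init__(self) -> None:
        self._catalog: Dict[str, CatalogEntry] = {}
        self._contracts: List[Contract] = []

    def publish(self, entry: CatalogEntry) -> None:
        self._catalog[entry.dataset_id] = entry

    def search(self, keyword: str) -> List[CatalogEntry]:
        # Step 1: the consumer searches the published metadata.
        kw = keyword.lower()
        return [e for e in self._catalog.values() if kw in e.description.lower()]

    def negotiate(self, consumer: str, entry: CatalogEntry,
                  accepted_policy: str) -> Optional[Contract]:
        # Step 2: a contract exists only if the consumer accepts the usage policy.
        if accepted_policy != entry.usage_policy:
            return None
        contract = Contract(entry.dataset_id, entry.provider, consumer, entry.usage_policy)
        self._contracts.append(contract)
        return contract

    def has_contract(self, contract: Contract) -> bool:
        return contract in self._contracts


class Provider:
    def __init__(self, name: str, space: DataSpace) -> None:
        self.name = name
        self._space = space
        self._data: Dict[str, Any] = {}

    def offer(self, dataset_id: str, description: str,
              usage_policy: str, payload: Any) -> None:
        self._data[dataset_id] = payload
        self._space.publish(CatalogEntry(dataset_id, self.name, description, usage_policy))

    def transfer(self, contract: Contract) -> Any:
        # Step 3: data is handed over only under a registered contract.
        if contract.provider != self.name or not self._space.has_contract(contract):
            raise PermissionError("no valid contract for this transfer")
        return self._data[contract.dataset_id]
```

In this sketch the provider keeps the payload and publishes only metadata, so the provider itself enforces the contract at transfer time. A participant acting in both roles would simply hold a Provider object and also call search and negotiate.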

Why do we need Data Spaces? (Problem formulation)

As more and more processes become data-driven, the demand for “data everywhere” is growing rapidly. Since data is typically spread across various data sources with limited integration between systems, the challenge arises of managing and accessing large amounts of distributed data consistently and efficiently [1, 2]. Fitting these heterogeneous data collections into a single data model or system is difficult. Traditional data management approaches address this challenge by integrating each data source individually, which incurs continuous development effort as new data sources are added. The initial concept of data spaces [1] reduces this development burden by identifying the scope and control of the data space and providing principled management functionality across the underlying systems. As the data space concept evolved, various additional challenges emerged, including data sovereignty, security,

TBC

What does a Data Space entail? (Components and services)

What is the difference to traditional Data Management Systems?



