A new data platform: more efficient, more networked, future-proof
SRG operates five business units whose editorial teams generate and exchange metadata on media such as radio and television programs or articles for the website. The existing interfaces were built for specific data-exchange use cases and are often difficult or impossible to reuse for other purposes, such as comprehensive analyses. A central data platform is intended to simplify this exchange through a standardized interface and a consistent data model, transform unstructured data, and make data available in a timely, needs-based manner.
Connecting teams, data and systems
In SRG’s five business units, teams such as the editorial teams work both autonomously and together: they generate publication-relevant data and also process data from other teams. A large number of systems are in use that continuously generate metadata throughout the media production process.
“Thanks to Panter’s expertise in software engineering and solution architecture, the proof of concept was successfully developed into a resilient and scalable data platform.”
Joël Schmid, Co-Lead Data & AI, SRG
Why data analysis involves a great deal of effort
Data exchange between the teams is complex. Interfaces are built bilaterally between teams for a specific exchange and can often be reused only to a very limited extent for other applications. The data landscape is highly heterogeneous: interfaces and data structures differ by system and process, which makes organization-wide analyses very time-consuming.
From isolated systems to a central data source
With a central data platform, publication-relevant metadata from all company units is to be made available company-wide via a standardized interface and a standardized data model.
Data producers should be able to feed in unstructured data that is transformed by the platform into a standardized data model. This significantly reduces the effort required to provide data.
The platform makes it possible to correlate related data from different source systems in order to facilitate comprehensive evaluations. In addition, the platform can provide data to consumers both event-driven in real time and on request.
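To make the transformation step concrete, here is a minimal sketch assuming a hypothetical standardized model; none of the field or class names come from SRG’s actual PDP schema. A raw payload from one source system is mapped into a single common data class so that records from different systems become comparable.

```kotlin
// Hypothetical standardized model -- field names are illustrative only,
// not SRG's actual PDP schema.
data class PublicationMetadata(
    val id: String,
    val title: String,
    val publishedAt: String,
    val sourceSystem: String,
)

// Sketch of a transformation step: a heterogeneous raw payload (a plain map
// standing in for a stored document) is normalized into the standardized model.
// Different source systems name the same field differently, so known variants
// are tried in order.
fun normalize(raw: Map<String, Any?>, sourceSystem: String): PublicationMetadata =
    PublicationMetadata(
        id = (raw["id"] ?: raw["assetId"] ?: "").toString(),
        title = (raw["title"] ?: raw["headline"] ?: "").toString(),
        publishedAt = (raw["publishedAt"] ?: raw["broadcastDate"] ?: "").toString(),
        sourceSystem = sourceSystem,
    )
```

Once normalized in this way, records from different sources that describe the same broadcast can be correlated on the shared fields.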
Everything at a glance: Publication metadata more accessible than ever
With the resulting data platform (Publication Data Platform, PDP for short), we have achieved the following:
- Increased findability and an improved search experience for all SRG content on digital platforms
- Publication metadata from different sources can be linked and enriched: for example, the archive metadata of the Tagesschau main edition of 3.10.2024 can be combined with the metadata of the same edition on SRF Play
- Quick and easy access to publication metadata from various sources to support audience research, AI enrichment and big data analytics
- Access to all available publication metadata for internal and external users, third parties and B2B partners
- Supra-regional data governance and reporting to the supervisory authorities instead of various language-regional solutions
- A level of security that meets the SRG standard in terms of confidentiality, availability and integrity
The technology behind the new data platform
The PDP was developed primarily to standardize unstructured data from the different business units and make it available centrally. Heterogeneous data first enters the system via REST APIs and is stored in MongoDB. It is then handed off via Kafka messages to data extractors, which transform it into a standardized data model. The data is made available to consumers either via Kafka or via a REST API. The result is a consistent data foundation that supports a wide range of use cases and simplifies company-wide evaluation.
- Kafka serves as a central messaging system for real-time data processing and enables reliable and asynchronous exchange between the various data consumers and the platform.
- MongoDB is used as a NoSQL database to store unstructured and structured data and make it available for further processing steps.
- Quarkus, an optimized framework for cloud-native applications, forms the basis for the development of REST APIs that serve as a uniform interface to the platform. These APIs enable easy access to the data for the various business units.
- We use various services in the AWS Cloud to operate the infrastructure of the entire data platform:
- EKS (Elastic Kubernetes Service) hosts the platform’s services and ensures scalability and reliability.
- S3 is used to store large amounts of data, especially raw data supplied by the standard systems without interfaces. SNS and SQS are used to notify the data platform when files in S3 change.
- OpenSearch is used for full-text search and data analysis to enable fast and targeted queries.
- Kotlin is the primary programming language used to implement the data platform, with a Proof of Concept (PoC) conducted in Scala.
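The flow described above (ingest via REST, store the raw data, notify extractors, deliver via Kafka or REST) can be sketched end to end. This is a self-contained illustration with in-memory stand-ins for MongoDB and Kafka; every class and method name here is hypothetical, not the PDP’s actual API.

```kotlin
// In-memory stand-ins for the real infrastructure: a document store playing
// MongoDB's role and a subscriber list playing Kafka's role. All names are
// hypothetical and exist only to show the sequencing of the pipeline.
class PdpSketch {
    private val rawStore = mutableMapOf<String, Map<String, Any?>>()        // "MongoDB"
    private val standardized = mutableMapOf<String, Map<String, Any?>>()    // normalized records
    private val subscribers = mutableListOf<(Map<String, Any?>) -> Unit>()  // "Kafka" consumers

    // Ingestion endpoint's role: store the raw document, then trigger extraction.
    fun ingest(id: String, raw: Map<String, Any?>) {
        rawStore[id] = raw
        val record = extract(id, raw)
        standardized[id] = record
        subscribers.forEach { it(record) }  // event-driven delivery in real time
    }

    // Extractor's role: map the raw payload onto a standardized field set.
    private fun extract(id: String, raw: Map<String, Any?>): Map<String, Any?> = mapOf(
        "id" to id,
        "title" to (raw["title"] ?: raw["headline"] ?: ""),
    )

    // Kafka-style subscription: consumers receive every newly standardized record.
    fun subscribe(consumer: (Map<String, Any?>) -> Unit) { subscribers += consumer }

    // REST-style on-request delivery: fetch a standardized record by id.
    fun query(id: String): Map<String, Any?>? = standardized[id]
}
```

In the real platform each stand-in is replaced by the managed service named above, but the sequencing stays the same: store the raw data, transform it once, and then deliver it twice over (pushed as events and served on request).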
Flurin Capaul, VP Clients
Interested in working with us?
Contact Flurin and get support from a partner with many years of experience.