SRG

A new data platform: more efficient, more connected, future-proof

SRG operates in five business units whose editorial teams generate and exchange metadata on media such as radio and television programs or articles for the website. Existing interfaces were built for specific data-exchange use cases and are often difficult or impossible to reuse for other purposes, such as comprehensive analyses. A central data platform is intended to simplify this exchange through a standardized interface and a consistent data model, transform unstructured data, and make data available promptly and as needed.

Connecting teams, data and systems

In SRG’s five business units, teams such as the editorial teams work autonomously but also collaborate: they generate publication-relevant data and in turn process data produced by other teams. A large number of systems are in use that continuously generate metadata throughout the media production process.

Thanks to Panter's expertise in software engineering and solution architecture, the proof of concept was successfully developed into a resilient and scalable data platform.

Joël Schmid, Co-Lead Data & AI, SRG

Why data analysis involves a great deal of effort

Data exchange between the teams is complex. Interfaces are built bilaterally between teams for one specific exchange and can often be reused only to a very limited extent for other applications. The data landscape is highly heterogeneous: interfaces and data structures differ from system to system and process to process, which makes organization-wide analyses very time-consuming.

From isolated systems to a central data source

With a central data platform, publication-relevant metadata from all business units is to be made available company-wide via a standardized interface and a uniform data model.

Data producers should be able to feed in unstructured data, which the platform transforms into the standardized data model. This significantly reduces the effort required to provide data.

The platform makes it possible to correlate related data from different source systems, enabling comprehensive evaluations. In addition, it can deliver data to consumers both event-driven in real time and on request.
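To make the idea of a uniform data model concrete, here is a minimal sketch of what a standardized publication-metadata record could look like in Kotlin, the platform's primary language. The field names and types are illustrative assumptions, not the actual PDP schema.

  import java.time.Instant

  // Hypothetical standardized model; the real PDP schema is not public.
  data class Publication(
      val id: String,                              // platform-wide identifier
      val source: String,                          // originating system, e.g. archive or SRF Play
      val businessUnit: String,                    // one of SRG's five business units
      val title: String,
      val publishedAt: Instant,
      val relatedIds: List<String> = emptyList()   // the same content in other source systems
  )

A shared record along these lines is what would allow, for example, the archive entry and the SRF Play entry of the same broadcast to be linked via relatedIds.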

Everything at a glance: Publication metadata more accessible than ever

With the data platform we developed, the Publication Data Platform (PDP for short), we have achieved the following:

  • Increased findability and an improved search experience for all SRG content on digital platforms
  • Publication metadata from different sources can be linked and enriched: for example, the metadata of the Tagesschau main edition of 3 October 2024 in the archive can be combined with the metadata of the same edition on SRF Play
  • Quick and easy access to publication metadata from various sources to support audience research, AI enrichment and big data analytics
  • Access to all available publication metadata for internal and external users, third parties and B2B partners
  • Supra-regional data governance and reporting to the supervisory authorities instead of various language-regional solutions
  • A level of security that meets the SRG standard in terms of confidentiality, availability and integrity

The technology behind the new data platform

The PDP was developed primarily to standardize unstructured data from the different business units and make it available centrally. Heterogeneous data first enters the system via REST APIs and is stored in MongoDB. Kafka messages then hand it off to data extractors, which transform it into the standardized data model. The data is made available to consumers either via Kafka or via a REST API. The result is a consistent data pool that supports a wide range of use cases and simplifies company-wide evaluation.
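As an illustration of the ingestion step, here is a minimal Kotlin sketch using Quarkus, MongoDB and Kafka, the stack listed below. The endpoint path, database and collection names, and the Kafka channel are assumptions, not the actual PDP code.

  import com.mongodb.client.MongoClient
  import jakarta.ws.rs.POST
  import jakarta.ws.rs.Path
  import jakarta.ws.rs.PathParam
  import jakarta.ws.rs.core.Response
  import org.bson.Document
  import org.eclipse.microprofile.reactive.messaging.Channel
  import org.eclipse.microprofile.reactive.messaging.Emitter

  // Hypothetical ingest endpoint: accepts arbitrary JSON from a source system,
  // stores it unchanged in MongoDB, and notifies the extractors via Kafka.
  @Path("/ingest/{source}")
  class IngestResource(
      private val mongo: MongoClient,
      @Channel("raw-publications") private val rawEvents: Emitter<String>  // assumed topic name
  ) {
      @POST
      fun ingest(@PathParam("source") source: String, body: String): Response {
          // No schema is enforced at ingestion time; producers feed in data as-is.
          val raw = Document.parse(body).append("source", source)
          mongo.getDatabase("pdp").getCollection("raw").insertOne(raw)

          // Tell the extractors that a new raw document is ready for transformation.
          rawEvents.send(raw.getObjectId("_id").toHexString())
          return Response.accepted().build()
      }
  }

One advantage of persisting the raw payload before any transformation is that the bar for data producers stays low, and extractors can be re-run whenever the standardized model evolves.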

  • Kafka serves as the central messaging system for real-time data processing and enables reliable, asynchronous exchange between the various data consumers and the platform (a sketch of an extractor consuming these messages follows this list).
  • MongoDB is used as a NoSQL database to store unstructured and structured data and make it available for further processing steps.
  • Quarkus, a framework optimized for cloud-native applications, forms the basis for the development of the REST APIs that serve as a uniform interface to the platform. These APIs give the various business units easy access to the data.
  • We use various services in the AWS Cloud to operate the infrastructure of the entire data platform:
    • EKS (Elastic Kubernetes Service) hosts the platform’s services and ensures scalability and reliability.
    • S3 is used to store large amounts of data, especially raw data supplied by the standard systems without interfaces. SNS and SQS are used to notify the data platform when files in S3 change.
    • OpenSearch is used for full-text search and data analysis to enable fast and targeted queries.
  • Kotlin is the primary programming language used to implement the data platform, with a Proof of Concept (PoC) conducted in Scala.
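The extractor sketch referenced above could look as follows: it consumes the Kafka notification, loads the raw document from MongoDB, maps it onto the standardized model, and republishes the result for downstream consumers. The channel names, collections and field mappings are all assumptions.

  import com.mongodb.client.MongoClient
  import jakarta.enterprise.context.ApplicationScoped
  import org.bson.Document
  import org.bson.types.ObjectId
  import org.eclipse.microprofile.reactive.messaging.Incoming
  import org.eclipse.microprofile.reactive.messaging.Outgoing

  // Hypothetical extractor: raw document in, standardized document out.
  @ApplicationScoped
  class PublicationExtractor(private val mongo: MongoClient) {

      @Incoming("raw-publications")   // topic written by the ingest API above
      @Outgoing("publications")       // standardized topic for data consumers
      fun extract(rawId: String): String {
          val raw = mongo.getDatabase("pdp").getCollection("raw")
              .find(Document("_id", ObjectId(rawId))).first()
              ?: error("raw document $rawId not found")

          // Each source system would get its own mapping onto the shared model;
          // the field names used here are purely illustrative.
          val publication = Document()
              .append("source", raw.getString("source"))
              .append("title", raw.getString("title"))
              .append("publishedAt", raw.get("publishedAt"))

          mongo.getDatabase("pdp").getCollection("publications").insertOne(publication)
          return publication.toJson()  // payload for the outgoing Kafka channel
      }
  }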

Flurin Capaul, VP Clients

Interested in working with us?

Contact Flurin and get support from a partner with many years of experience.