Data Contract

Overview

A data contract is a promise made by a data producer to its data consumers. When a consumer accepts this promise, it becomes a contract.

Imagine you're ordering a pizza. You're the data consumer and the pizza place is the data producer. A data contract is like the menu:

  • it tells you what kind of pizzas (data) are available
  • it lists the toppings and sizes (data properties)
  • it might mention things like "fresh ingredients", "gluten free", or "30-minute delivery" (data guarantees)

When you place your order, you're agreeing to a specific pizza and size (data subscription). The pizza place guarantees they'll make the pizza according to the menu (Service Level Agreement - SLA). This way, you know what to expect and they know what to deliver.

Data contracts work similarly but for data instead of pizza. They establish a clear agreement between those who provide data (data producers) and those who use it (data consumers). This makes data exchange more reliable and predictable. Here's a breakdown in simpler terms:

  • Data: the information being exchanged
    Example: daily transaction records from credit card processing
  • Data Producer: prepares and provides the data
    Example: the payments department within a bank, responsible for processing and sharing transaction data with other teams
  • Data Consumer: uses the data and therefore subscribes to the data contract
    Example: the risk management team, which uses transaction data to monitor for unusual activity and detect potential fraud
  • Data Properties: these are details about the data, like its schema, structure, and quality
    Example: the transaction data schema includes fields such as transaction_id (string), amount (decimal), transaction_date (date), merchant_category (string), and customer_id (string). Data quality checks specify that amount must be positive, transaction_date should be within the last 24 hours, and merchant_category must match a predefined list
  • SLA: a formal set of guarantees that outline the level of service you can expect
    Example: transaction data will be delivered within 15 minutes after each hour. 99.9% availability. Less than 0.5% missing or inaccurate records per month. Response time of 30 minutes for any data delivery issues.
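
The data properties in the example above can be expressed as a small validation routine. The following is a minimal sketch, not Witboost's contract format: the `Transaction` class, the `quality_violations` function, and the contents of `ALLOWED_MERCHANT_CATEGORIES` are hypothetical names chosen for illustration, and `transaction_date` is modeled as a timestamp so that the 24-hour check is well defined.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal

# Hypothetical values: the example only says that a predefined list exists.
ALLOWED_MERCHANT_CATEGORIES = {"grocery", "travel", "fuel"}


@dataclass(frozen=True)
class Transaction:
    """Schema of the example: field names and types as listed above."""
    transaction_id: str
    amount: Decimal
    transaction_date: datetime  # assumption: a timestamp rather than a plain date
    merchant_category: str
    customer_id: str


def quality_violations(tx: Transaction, now: datetime) -> list[str]:
    """Return the data quality rules from the example that `tx` breaks."""
    violations = []
    if tx.amount <= 0:
        violations.append("amount must be positive")
    if not (now - timedelta(hours=24) <= tx.transaction_date <= now):
        violations.append("transaction_date must be within the last 24 hours")
    if tx.merchant_category not in ALLOWED_MERCHANT_CATEGORIES:
        violations.append("merchant_category is not in the predefined list")
    return violations
```

An empty list means the record satisfies all three quality checks; each entry otherwise names one broken rule.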

A data contract helps ensure that the data producer and the data consumer agree on what data is being shared, how it can be accessed, and what quality standards it must meet. It brings clarity and trust to the process of exchanging data.

Data Contract Guardians

A data contract guardian is a controller (typically an automated system) that oversees data flows to ensure the data contract is consistently met.

If the guardian detects any contract violations — like missing fields or delayed delivery — it may trigger alerts, reject the data, or take corrective actions to keep data reliable for consumers.
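
As a rough illustration of how a guardian could detect SLA violations, the sketch below checks the two measurable guarantees from the SLA example in the overview: delivery within 15 minutes after each hour, and fewer than 0.5% missing or inaccurate records. The function names and the way a batch is identified by its hour are assumptions made for this example, not part of any Witboost API.

```python
from datetime import datetime, timedelta

DELIVERY_WINDOW = timedelta(minutes=15)  # "within 15 minutes after each hour"
MAX_BAD_RECORD_RATIO = 0.005             # "less than 0.5% missing or inaccurate records"


def check_delivery(batch_hour: datetime, delivered_at: datetime) -> list[str]:
    """Flag a batch delivered later than 15 minutes after its hour."""
    deadline = batch_hour + DELIVERY_WINDOW
    if delivered_at > deadline:
        return [f"delayed delivery: {delivered_at - deadline} past the deadline"]
    return []


def check_bad_records(bad_records: int, total_records: int) -> list[str]:
    """Flag a period whose share of bad records reaches the 0.5% limit."""
    # Assumption: a period with no records at all is not treated as a violation here.
    if total_records and bad_records / total_records >= MAX_BAD_RECORD_RATIO:
        return ["too many missing or inaccurate records"]
    return []
```

A guardian could pass any non-empty result to whatever alerting or rejection mechanism it uses.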

Data Contracts in Witboost

Witboost provides comprehensive capabilities to:

  • define, validate, and seamlessly manage the evolution of data contracts
  • deploy data contracts and their guardians on the target infrastructure
  • gather monitoring insights and alerts from data contract guardians, fully integrating with the Witboost Computational Governance
  • instantly notify data producers and consumers of any data contract violations
  • track data contract status and issues within the Witboost Marketplace

The following sections of the documentation will delve deeper into each topic, using a sample data contract as a guiding reference:

Data Contract Example

The diagram represents a data flow setup between a Producer System and one or more consumers with a data contract governing the process.

Producer System:

  • Data Ingestion Workload: periodically ingests data from a source system (e.g., CDC on a transactional database, ETL pipeline, ...) and produces messages for the landing topic
  • Data Contract:
    • Landing Topic: message queue, entry point for the data contract
    • Data Contract Guardian: a workload that assesses each message against the data contract criteria and routes it based on compliance:
      • Non-Compliant Topic: stores messages that do not meet contract standards
      • Compliant Topic: stores messages that pass contract validation and makes them available for consumption

The data contract ensures that only compliant data reaches the consumers, while non-compliant data is isolated, creating a controlled and validated data pipeline.
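
The routing performed by the guardian can be sketched in a few lines. This example uses plain Python lists in place of real topics and a deliberately simple compliance rule (all required fields present and not empty); `REQUIRED_FIELDS`, `is_compliant`, and `route` are hypothetical names, and a real guardian would read from and write to the message queue of the target infrastructure.

```python
from typing import Callable, Iterable

# Field names taken from the transaction example in the overview.
REQUIRED_FIELDS = {
    "transaction_id", "amount", "transaction_date",
    "merchant_category", "customer_id",
}


def is_compliant(message: dict) -> bool:
    """Assumed rule: every required field is present and not None."""
    return all(message.get(field) is not None for field in REQUIRED_FIELDS)


def route(
    landing: Iterable[dict],
    check: Callable[[dict], bool] = is_compliant,
) -> tuple[list[dict], list[dict]]:
    """Split landing-topic messages into (compliant, non_compliant)."""
    compliant, non_compliant = [], []
    for message in landing:
        if check(message):
            compliant.append(message)      # made available to consumers
        else:
            non_compliant.append(message)  # isolated for inspection
    return compliant, non_compliant
```

Passing the check as a parameter keeps the routing logic independent of the contract rules, so the same loop can serve different contracts.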

Notifications

In the event of a data contract violation, it is essential to notify all stakeholders, both producers and consumers. This ensures that everyone affected is informed and can take the necessary actions.

The Witboost notification service can be configured by your platform team to send push notifications, trigger webhooks, send emails, and more when a violation is detected.

Info: You may already be familiar with notifications triggered by policy violations in the Witboost Computational Governance. As outlined in previous sections, a data contract violation is effectively a policy violation triggered by the data contract guardian.

Producer Notifications

If a violation occurs on a data contract that you own or develop, you may receive a notification.

Your platform team can configure:

  • whether notifications should be sent to you as an owner, or as a developer
  • the environments where these notifications are enabled (only production by default)

Consumer Notifications

Consumers of a data contract will also receive notifications in the event of a violation.

Your platform team can configure:

  • whether the notification should be sent to owners of systems that read from the data contract, or to members of the data contract’s access control list (ACL)
  • the environments where notifications are enabled (only production by default)
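
To make the effect of these options concrete, the following sketch models a notification rule as a set of audiences plus a set of enabled environments. It is purely illustrative and does not reflect the actual Witboost configuration schema: `NotificationRule`, `resolve_recipients`, and the audience labels are hypothetical, and only the production-only default comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    """Hypothetical model of the options a platform team can configure."""
    audiences: frozenset[str]  # e.g. {"owner"}, {"developer"}, {"acl_member"}
    environments: frozenset[str] = frozenset({"production"})  # production only by default


def resolve_recipients(
    rule: NotificationRule,
    environment: str,
    people_by_audience: dict[str, set[str]],
) -> set[str]:
    """Return who is notified for a violation in `environment`."""
    if environment not in rule.environments:
        return set()  # notifications are disabled for this environment
    recipients: set[str] = set()
    for audience in rule.audiences:
        recipients |= people_by_audience.get(audience, set())
    return recipients
```

With the default environments, a violation in a development environment resolves to no recipients, while the same violation in production notifies everyone in the configured audiences.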