A data contract is an agreement or specification that defines the structure, format, and semantics of data exchanged between systems, applications, or components. It serves as a mutual understanding between the parties involved in data exchange, ensuring that data is transmitted and interpreted correctly.
While contracts encompass agreements, specifications, and various structural aspects, their true value lies in their ability to be validated.
Verifiable elements include schemas, column-level data checks, and operational service level agreements (SLAs) that can be programmatically checked and enforced.
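To make "programmatically checked" concrete, here is a minimal Python sketch of the three kinds of verifiable element applied to a few in-memory rows. The function names, the sample `orders` columns, and the 24-hour freshness threshold are illustrative assumptions and do not come from any particular tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float}


def check_schema(rows, expected):
    """Schema check: every row has exactly the expected columns and types."""
    return all(
        set(row) == set(expected)
        and all(isinstance(row[col], typ) for col, typ in expected.items())
        for row in rows
    )


def check_non_negative(rows, column):
    """Column-level data check: no negative values in `column`."""
    return all(row[column] >= 0 for row in rows)


def check_freshness(last_updated, now, max_age):
    """Operational SLA check: data was updated within `max_age` of `now`."""
    return now - last_updated <= max_age


# Sample data (made up for illustration).
rows = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": 0.0}]
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=3)

results = {
    "schema": check_schema(rows, EXPECTED_SCHEMA),
    "amount_non_negative": check_non_negative(rows, "amount"),
    "fresh_within_24h": check_freshness(last_updated, now, timedelta(hours=24)),
}
print(results)
```

Each check returns a plain boolean, which is what makes it enforceable: a pipeline can block or alert on any `False` value.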
A Data Contract is an agreement between a producer and a consumer that clearly defines:
• what data needs to move from a producer/source to a consumer/destination
• the shape of that data, its schema, and semantics
• expectations around availability and data quality
• details about contract violation(s) and enforcement
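The four elements listed above can be pictured as fields of a single record. The sketch below is one possible shape, assuming hypothetical names (`ColumnSpec`, `DataContract`, `on_violation`); it is not a format defined by the source.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    name: str
    type: str
    description: str = ""  # semantics: what the value means


@dataclass
class DataContract:
    producer: str                 # source of the data
    consumer: str                 # destination of the data
    dataset: str                  # what data moves between them
    schema: list[ColumnSpec]      # shape of the data
    # Availability and data quality expectations (keys are hypothetical).
    expectations: dict[str, str] = field(default_factory=dict)
    # What happens on a violation (the value "alert" is hypothetical).
    on_violation: str = "alert"

    def missing_fields(self, record: dict) -> list[str]:
        """Return the contracted column names absent from `record`."""
        return [col.name for col in self.schema if col.name not in record]


contract = DataContract(
    producer="checkout-service",
    consumer="finance-reporting",
    dataset="orders",
    schema=[
        ColumnSpec("order_id", "int", "Unique order identifier"),
        ColumnSpec("amount", "float", "Order total"),
    ],
    expectations={"freshness": "updated at least every 24 hours"},
)
print(contract.missing_fields({"order_id": 1}))
```

Keeping enforcement (`on_violation`) next to the schema and expectations means a single object answers both "what was promised" and "what happens if the promise is broken".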
End-to-End Implementation of Data Contracts in DataHub
The implementation of Data Contracts in DataHub is designed so that:
• Data producers can author data contracts as YAML files and store them in version control systems like Git.
• These contracts can then be deployed to DataHub, which acts as a repository for contracts and their associated assertions.
• Business users can use DataHub to access and edit/update the Data Contract.
• Existing data quality tools can evaluate these assertions and report the results.
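The flow described above (author a contract as a file, load its assertions, have a tool evaluate them and report results) can be sketched in a few lines. This is an illustration only: the source describes YAML files, but JSON is used here so the sketch runs with the standard library alone, and every key, assertion type, and function name is hypothetical rather than DataHub's actual contract format or API.

```python
import json

# Stand-in for a contract file kept in version control.
# All keys and assertion types below are hypothetical.
CONTRACT_TEXT = """
{
  "dataset": "orders",
  "assertions": [
    {"id": "has-order-id", "type": "column_exists", "column": "order_id"},
    {"id": "min-rows", "type": "min_row_count", "value": 3}
  ]
}
"""


def column_exists(rows, spec):
    """Pass if every row contains the named column."""
    return all(spec["column"] in row for row in rows)


def min_row_count(rows, spec):
    """Pass if the dataset has at least `value` rows."""
    return len(rows) >= spec["value"]


# Plays the role of an existing data quality tool: assertion type -> check.
EVALUATORS = {"column_exists": column_exists, "min_row_count": min_row_count}


def evaluate(contract, rows):
    """Run each assertion and return a report keyed by assertion id."""
    return {
        a["id"]: "PASS" if EVALUATORS[a["type"]](rows, a) else "FAIL"
        for a in contract["assertions"]
    }


contract = json.loads(CONTRACT_TEXT)
rows = [{"order_id": 1}, {"order_id": 2}]  # made-up sample data
report = evaluate(contract, rows)
print(report)
```

Separating the contract text from the evaluators mirrors the division of labor in the list above: producers own the file, while the quality tool owns how each assertion type is checked.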
Going back to the verifiability aspect of Data Contracts, key data elements, such as documentation, ownership, and tags, lack verifiability, but we know how incredibly important they are in the context of the data ecosystem.
And it’s this focus on both verifiable and non-verifiable metadata that anchors DataHub’s approach to Data Contracts. Data Contracts in DataHub integrate with Data Products for a holistic approach to managing data assets. Here’s how.
Data Products in DataHub represent collections of assets grouped under a single concept for you to manage and maintain. They have owners, tags, glossary terms, and documentation.
Data Contracts are the verifiable aspects stated and enforced on individual data assets. They cover schema-related aspects, service level agreements (SLAs), data freshness, and data quality.
With DataHub, you can combine the verifiable (via Data Contracts) and the descriptive, non-verifiable (via Data Products) elements to create a curated metadata graph.
In the near future, to streamline the management of Data Products and Data Contracts, you will be able to use the same YAML file to define both Data Product and Data Contract specifications, allowing them to be managed as a unified definition. This approach ensures that both documentation and schema assertions can be maintained as code, satisfying the needs of different stakeholders.
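One practical benefit of a unified definition is that the descriptive and verifiable parts can be cross-checked against each other. The sketch below assumes a hypothetical structure for such a definition once it has been loaded into Python; the keys and the `contract_coverage` helper are invented for illustration and do not reflect DataHub's eventual file format.

```python
# Hypothetical unified definition, as it might look once loaded from one file.
UNIFIED = {
    "data_product": {  # descriptive, non-verifiable metadata
        "name": "Orders",
        "owners": ["team-commerce"],
        "tags": ["finance"],
        "documentation": "Order facts for reporting.",
        "assets": ["orders", "order_items"],
    },
    "data_contracts": {  # verifiable expectations, keyed by asset
        "orders": {"schema": {"order_id": "int"}, "freshness_hours": 24},
    },
}


def contract_coverage(definition):
    """Report, for each asset in the product, whether a contract is defined."""
    contracts = definition["data_contracts"]
    return {
        asset: asset in contracts
        for asset in definition["data_product"]["assets"]
    }


coverage = contract_coverage(UNIFIED)
print(coverage)
```

Because both parts live in one definition, a review or CI step could flag assets that are documented in the product but carry no verifiable contract.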
While DataHub serves as the foundation for Data Contracts, Acryl’s managed DataHub version provides the advanced tools and capabilities you need to manage them at scale. This includes:
• An inference engine to generate proposals for Data Contracts
• Approval workflows for data producers and consumers, and
• Enforcement mechanisms for data contracts.