Faces Behind Features | Data Contracts in One Data

Introduction

Hi everyone! My name is Christina, and I’m a Product Owner at One Data. Along with my team, we aspire to simplify the management of data products for everyone. In today’s video of Faces Behind Features, I will focus on data contracts. While there are many contributors to this feature, I have the pleasure of representing my team today.

What is a Data Contract?

A data contract is a formal agreement between the consumers and owners of data. It sets clear expectations regarding data structure, quality, and availability. Without these clear expectations, consumers often encounter issues such as outdated data or schema mismatches, which can lead to failing data pipelines and diminished trust in the data. Data contracts aim to address these challenges by defining essential building blocks.

Key Components of a Data Contract

  1. Fundamentals: This includes the basic information about the data product, such as its name, ownership, and location, as well as the data and its schema. It’s essential to define what each column contains and its data type to prevent failing pipelines.

  2. Data Quality: Collaboration between consumers and owners is crucial for extracting business value from data quality measures, which can be defined in the data contract.

  3. Stakeholders: Pricing, data governance, and SLAs (Service Level Agreements) can also be included in the data contract based on the specific needs of your data and stakeholders.
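
To make the three building blocks above concrete, here is a minimal sketch of how a contract could be represented in code. It is an illustration only, not One Data's actual format: the class names, field names, and the example location and column names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One schema entry: column name, data type, and what it contains."""
    name: str
    dtype: str
    description: str = ""


@dataclass
class DataContract:
    # Fundamentals: name, ownership, location, and schema
    name: str
    owner: str
    location: str
    schema: list[Column]
    # Data quality: measures agreed between consumers and owners
    quality_checks: list[str] = field(default_factory=list)
    # Stakeholder terms: pricing and SLAs
    price: float = 0.0
    sla: str = ""


# Hypothetical contract for the credit applications example
contract = DataContract(
    name="credit_applications",
    owner="Christina",
    location="warehouse.finance.credit_applications",  # assumed location
    schema=[
        Column("application_id", "string", "Unique ID of the application"),
        Column("amount", "decimal", "Requested credit amount"),
    ],
    quality_checks=["application_id is never null"],
    sla="updated at least every 24 hours",
)
```

Keeping the stakeholder fields optional with defaults mirrors the idea that pricing and SLAs are included only when the data and its stakeholders need them.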

At One Data, we continually monitor the market so that our data contracts stay compatible and relevant in the long term.

Demo of Setting Up a Data Contract

Let’s dive into a short demonstration within One Data to show how easy it is to set up a data contract. I received a request from my colleague Sasha regarding credit applications. With clear requirements in mind, I moved my data product into development, which automatically created the data contract.

The data freshness is pre-filled based on the request, and the schema is also pre-populated to save time. I can modify these details or add specific purposes and limitations to inform consumers about the data product's requirements.

Providing access information is crucial, as it establishes transparency about where the data product is stored. While Sasha could ask for access, knowing where it is stored fosters greater trust in the data.

In this demo, I also have the option to set a price for using my data product. For instance, if I wish to charge another department, I can do so; however, in this case, it’s free for everyone in the company.

Furthermore, I can manually validate the data freshness. The contract's validity is based on the values I set: if the data wasn’t updated within the last 24 hours, it will be flagged as invalid. I can see the last updated timestamp in the header.
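
The 24-hour freshness rule can be sketched as a simple timestamp comparison. This is a minimal illustration, not One Data's implementation: the function name `is_fresh` and its parameters are assumptions, and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(
    last_updated: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> bool:
    """Return True if the data was updated within max_age before now."""
    return now - last_updated <= max_age


# Example timestamps (made up for illustration)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
updated_recently = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)  # 6 hours ago
updated_long_ago = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)  # 30 hours ago

fresh_result = is_fresh(updated_recently, now)
stale_result = is_fresh(updated_long_ago, now)
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test.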

When validating the schema, if I forget to include crucial columns, I will receive an error message. I can either ignore the missing columns (if they aren’t critical) or easily add them directly in the system, which simplifies the process.
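
A schema check of this kind can be sketched as a comparison between the columns the contract requires and the columns the data product actually has. The function name `schema_errors` and the column names are hypothetical, and the data type comparison is an added assumption: the demo only mentions missing columns.

```python
def schema_errors(required: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Compare required columns (name -> type) against the actual schema.

    Returns one message per missing column or type mismatch.
    An empty list means the schema satisfies the contract.
    """
    errors = []
    for name, dtype in required.items():
        if name not in actual:
            errors.append(f"{name}: missing")
        elif actual[name] != dtype:
            errors.append(f"{name}: expected {dtype}, found {actual[name]}")
    return errors


# Hypothetical schemas for the credit applications example
required = {"application_id": "string", "amount": "decimal"}
actual = {"application_id": "string"}

errors = schema_errors(required, actual)
```

Returning a list of messages instead of raising on the first problem lets the owner see every issue at once and decide which ones to ignore and which to fix.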

Once everything is validated, I can publish my data product. The validation occurs automatically, and notifications keep me informed if anything goes wrong. The health badge visible in the marketplace overview also shows whether the quality and the data contract are valid or not.

Conclusion

In summary, a data contract significantly enhances data quality. It allows for the definition of various checks for freshness, schema automation, and data payload requirements. Additionally, it promotes better collaboration by clarifying who to approach for issues or enhancements related to the data product. Ultimately, data contracts foster trust in data, making it easier to drive data-informed decisions. For more information about One Data, please visit w.ai and take the interactive product tour.


Keywords

Data contracts, data quality, schema, stakeholders, data governance, SLAs, collaboration, data freshness, data-driven decisions, data products.

FAQ

1. What is a data contract?
A data contract is a formal agreement between data consumers and owners that outlines expectations for data structure, quality, and availability.

2. Why are data contracts important?
Data contracts help prevent issues like outdated data and schema mismatches, fostering trust in data and improving decision-making.

3. What are the key components of a data contract?
Key components include fundamentals (basic info and schema), data quality, and governance aspects, like SLAs and pricing.

4. How can I set up a data contract in One Data?
You can easily set up a data contract by moving your data product into development, where the contract is automatically created and pre-filled based on request details.

5. How do I validate a data contract?
You can validate the contract manually, and validation also runs automatically, with notifications alerting you to any issues.