Who has control of your data pipeline?

Mona Rakibe
4 min read · Apr 18, 2022
A data pipeline needs a control plane

Data observability has been the talk of the data community for the last couple of years, but it often gets treated as a pure alerting system. In this blog, I want to share why data observability is not a bystander to your data pipeline but its most crucial part. It’s the goddamn data pipeline control plane!

Let’s start with our hero, the data metric. A data metric is an indicator used to measure data reliability — for example, completeness, accuracy, uniqueness, frequency, length, and distribution. At Telmai, we support monitoring 40+ data metrics and business rules, and we will cover data metrics in detail in a separate blog.
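To make this concrete, here is a minimal sketch of how two such metrics could be computed with pandas. The definitions are illustrative, not Telmai’s implementation:

```python
# Minimal sketch: computing completeness and uniqueness metrics with pandas.
import pandas as pd

def completeness(series: pd.Series) -> float:
    """Fraction of non-null values in a column."""
    return series.notna().mean()

def uniqueness(series: pd.Series) -> float:
    """Fraction of distinct values among the non-null values."""
    non_null = series.dropna()
    return non_null.nunique() / len(non_null) if len(non_null) else 0.0

df = pd.DataFrame({"email": ["a@x.com", None, "b@x.com", "b@x.com"]})
print(completeness(df["email"]))  # 0.75 (3 of 4 values present)
print(uniqueness(df["email"]))    # ~0.67 (2 distinct of 3 non-null)
```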

So what happens when a data metric is below an agreed threshold?

The obvious answer is that the user gets alerted, but often that’s not enough. Users also need to act on the alert, which means making a change in the pipeline workflow. This is known as the “pipeline circuit-breaker pattern.” To be fair, it has existed in data engineering for a while; what has changed is how data observability tooling can enable better orchestration of these patterns.
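As a rough illustration of the pattern, here is a sketch of a reliability check that “opens the circuit” by raising when a metric breaches its agreed threshold. All names and values are illustrative, not part of any specific tool’s API:

```python
# Minimal sketch of the pipeline circuit-breaker pattern: a reliability
# check runs between transformation steps, and a threshold breach halts
# the flow before bad data moves downstream.
class DataReliabilityError(Exception):
    """Raised to open the circuit and halt downstream steps."""

def check_metric(name: str, value: float, threshold: float) -> None:
    """Open the circuit (raise) when a metric falls below its threshold."""
    if value < threshold:
        raise DataReliabilityError(
            f"{name}={value:.2f} is below the agreed threshold {threshold:.2f}"
        )

# Between two transformation steps:
check_metric("email_completeness", value=0.91, threshold=0.95)  # raises
# If no exception is raised, the circuit stays closed and the next step runs.
```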

Let’s look at an example of a data pipeline:

A pipeline consists of multiple transformation steps, and your data monitoring should be plugged into each such step.

Each such pipeline step can monitor multiple data metrics, and depending on the outcome of that reliability check, it leads to two things: alerting and orchestration.

1: Alerting

Almost always, a data issue leads to an alert. However, depending on the severity of the issue and its impact, the alert can be one of three types (a routing sketch follows after the list):

A soft alert, recorded in a monitoring system like Telmai for later analysis and investigation.

A user notification, if the issue requires attention, so the user can investigate and fix the problem. Notifications can be sent via email, Slack, or a ticketing system so that recipients can prioritize their responses.

A pager alert, if the issue is urgent and requires immediate attention.
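Here is the routing sketch mentioned above: a minimal dispatcher keyed on severity. The logging calls are stand-ins for your actual email, Slack, ticketing, or paging integrations:

```python
# Minimal sketch: dispatch an alert according to its severity,
# mirroring the three alert types described above.
import logging
from enum import Enum

log = logging.getLogger("data_observability")

class Severity(Enum):
    SOFT = "soft"      # record for later analysis, no human pinged
    NOTIFY = "notify"  # needs attention: email / Slack / ticket
    PAGE = "page"      # urgent: on-call escalation

def route_alert(message: str, severity: Severity) -> None:
    """Route an alert to the channel matching its severity."""
    if severity is Severity.SOFT:
        log.info("soft alert recorded: %s", message)
    elif severity is Severity.NOTIFY:
        log.warning("notifying on-duty team: %s", message)  # stand-in for Slack/email
    else:
        log.critical("paging on-call: %s", message)  # stand-in for PagerDuty

route_alert("email_completeness dropped to 0.91", Severity.NOTIFY)
```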

2: Orchestration

The other outcome of monitoring is a change in the pipeline workflow. What that change looks like depends on the type of data metric and the downstream impact of breaching its threshold.

Circuit open, or block the pipeline:

Let’s take the example of a pipeline step where a data observability tool identifies incomplete data in an attribute that is used as a join key.

This should trigger a repair job that is launched automatically, followed by reprocessing the pipeline step where the problem was detected.

If you are using Airflow + Telmai, the DAG can leverage the response from Telmai API for its flow orchestration.
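As a rough sketch of what that could look like, an Airflow ShortCircuitOperator can gate the downstream load on a reliability check. The endpoint URL and the response shape (`{"passed": ...}`) are hypothetical stand-ins, not Telmai’s documented API; consult their docs for the real integration:

```python
# Minimal Airflow sketch of the circuit-open pattern: downstream tasks
# are skipped when the reliability check fails.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator

def data_is_reliable() -> bool:
    # Hypothetical observability endpoint: returns {"passed": true} when
    # all monitored metrics for this step are within their thresholds.
    resp = requests.get("https://api.example-observability.com/checks/step_1")
    resp.raise_for_status()
    return resp.json()["passed"]

def load_to_warehouse() -> None:
    print("loading validated data downstream")

with DAG(
    dag_id="pipeline_with_circuit_breaker",
    start_date=datetime(2022, 4, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # ShortCircuitOperator skips all downstream tasks when the callable
    # returns False, i.e. it "opens the circuit" on unreliable data.
    reliability_gate = ShortCircuitOperator(
        task_id="reliability_gate",
        python_callable=data_is_reliable,
    )
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    reliability_gate >> load
```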

The pro of this approach is that it prevents low-quality data from propagating to downstream steps. The tradeoff is that it adds a small amount of latency to the pipeline step.

Circuit closed, and the pipeline continues:

Now take the example of a pipeline step where a data observability tool like Telmai identifies inaccurate job titles in an attribute used for targeted marketing campaigns.

This should not become a pipeline-blocking step. Instead, the outcome can be either of the following:

Remediate asynchronously — a tool like Telmai makes it easy to identify the affected records for remediation — or have the DataOps team segment and query only the records with complete titles.

The biggest pro of this approach is that it adds no latency to the pipeline flow, ensuring timely access to fresh data.
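A minimal sketch of this non-blocking path: the step splits suspect records into a quarantine set for later remediation and continues with the clean ones. The “suspect title” rule here is an illustrative stand-in for whatever your observability tool flags:

```python
# Minimal sketch of the circuit-closed path: the pipeline keeps moving
# while suspect records are split off for asynchronous remediation.
import pandas as pd

df = pd.DataFrame({
    "contact": ["ann", "bob", "cal"],
    "job_title": ["VP Marketing", None, "asdf123"],
})

# Illustrative flag: missing titles or titles containing digits.
suspect = df["job_title"].isna() | df["job_title"].str.contains(r"\d", na=False)
quarantine, clean = df[suspect], df[~suspect]

clean.to_csv("campaign_input.csv", index=False)       # pipeline continues, no added latency
quarantine.to_csv("needs_remediation.csv", index=False)  # DataOps fixes these later
```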

In summary, data observability should not be treated as a boring BI tool that you check in on only every once in a while. If you set up the data observability workflow right, it becomes the center of all your data pipeline workflows.

If you want to learn more about customer case studies or want a demo from an expert, sign up here.

