observability you can NOT have a data platform
2. If your data observability captures only the T in ELT (or ETL), you have none.
Spice level: testing boundaries 7/10 🌶
the problem has occurred.
• “What does my metadata table say?”
Siloed monitoring
• Logs, metrics, and traces are treated as isolated pieces.
• They know nothing of each other.
Costly failures
• Issues detected too late can lead to missed SLAs, resulting in costly consequences.
Pain points of traditional observability
• Important: data lineage focuses on the lifecycle of data within an organization's data ecosystem, while event tracing is more concerned with monitoring the flow of individual events or requests through distributed systems for troubleshooting and performance analysis.
errors in my work.
Treat observability like data
• Let me browse the metadata about my pipelines, just like I do any other data.
Automatic SLAs
• Calculate the SLAs from my SLO so I can focus on firefighting when I (really) need to.
What do we want?
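One possible reading of "Automatic SLAs", sketched in Python: treat run metadata as plain data and measure how often runs land on time against an objective. The `Run` shape, the `slo_attainment` name, and the one-hour deadline are assumptions made for illustration, not something the slides define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Run:
    """One pipeline run, as it might appear in a metadata table (hypothetical shape)."""
    scheduled_at: datetime
    finished_at: Optional[datetime]  # None means the run never finished


def slo_attainment(runs, max_delay, target):
    """Return (share of on-time runs, whether that share meets the target)."""
    on_time = sum(
        1
        for r in runs
        if r.finished_at is not None
        and r.finished_at - r.scheduled_at <= max_delay
    )
    # With no runs recorded there is nothing to breach, so report full attainment.
    attained = on_time / len(runs) if runs else 1.0
    return attained, attained >= target
```

Usage would look like `slo_attainment(runs, timedelta(hours=1), 0.99)`; an alert fires only when the second value is `False`, which leaves firefighting for the cases that really need it.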
need answers:
1. What is the data source?
2. What is the schema?
3. Who is the owner?
4. How often is it updated?
5. Where does it come from?
6. Who is using it?
7. What has changed?
8. & many more…
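The questions above map naturally onto a metadata record per dataset. A minimal sketch, assuming a hypothetical `DatasetRecord` whose field names are chosen here purely for illustration; the helper lists which questions still have no answer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    """Answers to the basic questions about one dataset (hypothetical shape)."""
    name: str
    source: Optional[str] = None            # What is the data source?
    schema: Optional[dict] = None           # What is the schema?
    owner: Optional[str] = None             # Who is the owner?
    update_schedule: Optional[str] = None   # How often is it updated?
    upstream: list = field(default_factory=list)   # Where does it come from?
    consumers: list = field(default_factory=list)  # Who is using it?


def unanswered(record):
    """Return the names of the fields that are still empty."""
    return [key for key, value in vars(record).items() if not value]
```

For example, `unanswered(DatasetRecord(name="orders", owner="sales-eng"))` reports every field except `name` and `owner`, which makes the gaps in what is known about a dataset visible.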
DAGs into smaller ones?
• Tried to switch to more frequent pipelines in order to minimize failures and increase availability?
• Tried adding a control DAG for end-to-end pipeline visibility?
• Offered standardized task groups to your users for easier onboarding and management?
• Written a metadata exporter (or two)?
implementation (Marquez)
• Libraries for common languages, integrations with data pipeline tools
An open platform for collection and analysis of data lineage
for a new generation of powerful, context-aware data tools and best practices. OpenLineage enables consistent collection of lineage metadata, creating a deeper understanding of how data is produced and used.
An open standard for data lineage collection and analysis
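To make "lineage metadata" concrete, here is a rough sketch of an OpenLineage-style run event built as a plain dictionary. The field names follow my understanding of the OpenLineage `RunEvent` structure and should be checked against the spec version you target; `build_run_event`, the producer URL, and the dataset names are hypothetical, and the required `schemaURL` is left as a parameter because its exact value depends on the spec version.

```python
import json
from datetime import datetime, timezone


def build_run_event(event_type, run_id, job_namespace, job_name,
                    inputs, outputs, schema_url):
    """Build a run event as a plain dict; inputs/outputs are (namespace, name) pairs."""
    def dataset(namespace, name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(*pair) for pair in inputs],
        "outputs": [dataset(*pair) for pair in outputs],
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
        "schemaURL": schema_url,  # assumption: supplied by the caller per spec version
    }


def to_json(event):
    """Serialize the event so it can be sent to a lineage backend."""
    return json.dumps(event)
```

Because the same event shape can be emitted by any orchestrator or processing engine, a backend such as Marquez can stitch the inputs and outputs of separate jobs into one lineage graph.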
your data lies within a single vendor,
• If you use only one orchestrator,
• If you use only one processing engine,
• If you do not care about your data sources,
• If you can change fields or their type without anyone realizing (before it's too late),
• If you are responsible both for the data preparation and the presentation.
testing via dependency graphs and schema enforcement during PRs.
• End-to-end data lineage, albeit only based on SQL code.
• Policy enforcement, automated by code and based on data contracts.
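Schema enforcement during a PR can be pictured as a diff between the agreed contract and the proposed schema. The sketch below is a generic illustration, not how any particular tool implements it; `schema_changes` and the column-to-type dictionaries are assumptions.

```python
def schema_changes(contract, proposed):
    """List the differences between a contract schema and a proposed one.

    Both arguments map column names to type names (hypothetical representation).
    """
    changes = []
    for column, expected_type in contract.items():
        if column not in proposed:
            changes.append(f"removed: {column}")
        elif proposed[column] != expected_type:
            changes.append(
                f"type changed: {column} {expected_type} -> {proposed[column]}"
            )
    for column in proposed:
        if column not in contract:
            changes.append(f"added: {column}")
    return changes
```

A CI check could fail the PR whenever the returned list is non-empty, so that a changed field or type is noticed before it reaches downstream consumers.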