Dataops - The WAT, The Pain, The How

A high-level introduction to DataOps and how to deploy or nail it within real organisations.

Xavier Bruhiere

August 22, 2019

Transcript

  1. DataOps: The WAT, The Pain, The How

  2. Hey hi, I’m Xavier Bruhiere, VP Data Engineering @ Lazada. Shameless plug: I AM HIRING data and full-stack engineers.
  3. Agenda
    01. Terms, scope, context
    02. Where it hurts
    03. Hotfixes
  4. Building ETL: What I’ve been told

  5. Building ETL: What I do

  6. 01 Have you met DataOps: Glossary, scope, definitions

  7. DevOps: “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.” - AWS, *What is DevOps?*
  8. DataOps: “A collaborative data management practice, really focused on improving communication, integration, and automation of data flow between managers and consumers of data within an organization.” - Gartner
  9. Orchestrate, monitor Cycle Velocity Sensitive information DataOps Long lasting Slow changing Stateful People Massive Same goals Data != Software Analytics iterations ML / Operations
  10. 02 Diagnosis: Cost and pain of DevOps for Data

  11. Goal
    - Time to delivery and Reliability
    - Fast iterations
    - Safe iterations
    - Correctness
    - Cost down
    - Insight value
  12. But… About tooling, about testing
    - Reproducibility?
    - Local sandbox of a warehouse? Staging?
    - Mocking source data
    - Complex, stateful DAGs
  13. But… About people
    - QA testing with non-tech-savvy people
    - Gap between who requests and owns the domain, and who performs
    - Various backgrounds and expectations
    About security
    - Access management at the column level
    - GDPR
    - Data leakage
    - Anonymization
  14. 03 Counter-measures: “Violent advice and slides ahead” - me

  15. Efficiency: Automate data collection – capture changes

  16. Efficiency: Functional data engineering - Maxime Beauchemin

  17. Safety: DO TESTS – Start small, build up. - Maxime Beauchemin
  18. Quality: Develop Data Governance

  19. Quality: Statistical Monitoring

  20. People // Business: Engineers shouldn’t write ETL - Jeff Magnusson

  21. Takeaway
    01. Ignore the buzz, capture automations to escape chaos
    02. Listen to people
    02-bis. No heroism – self-organizing team
    03. Relentlessly hunt bottlenecks and risks
    05. Have faith
  22. Questions?

  24. Thanks!
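
The "Quality: Statistical Monitoring" slide names the idea without showing it. Here is a minimal sketch of one way it could look, not the approach used in the talk: flag a pipeline metric (here, a daily row count) when it drifts too far from its recent history. The function name `is_anomalous`, the 3-sigma threshold, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True when `latest` sits more than `threshold` standard
    deviations away from the mean of `history` (a plain z-score check)."""
    if len(history) < 2:
        # Not enough points to estimate a spread, so do not alert.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is unexpected.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one table (made-up numbers).
daily_row_counts = [1000, 1020, 980, 1010, 990]

print(is_anomalous(daily_row_counts, 1005))  # a typical day
print(is_anomalous(daily_row_counts, 400))   # a suspicious drop
```

A check like this would typically run right after a load step and block or alert before downstream consumers read the data. Real metrics with trends or seasonality would need something more robust than a global mean and standard deviation.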