
What is even a data engineer?

Mohammed Fazalullah

December 11, 2023

Transcript

  1. © 2022, Amazon Web Services, Inc. or its affiliates.

    What is even a data engineer? Mohammed Fazalullah “Faz”, Senior Developer Advocate, MENAT, AWS
  2. whoami

    >> Developer Advocate at AWS, Coding Ambassador @ coders(hq)
    >> 18 years and counting, Backend engineer to Solutions Architect and Team lead
    >> Community builder @ASEAN and @MENA
    >> Mentorship in technical leadership and how to build a career path in tech
  3. Evolution of the developer/engineer persona
  4. In simpler times…

    A developer would design a basic UI and focus on the backend logic. A DB architect would manage the schema. A sys admin would manage the provisioning and maintenance of servers. (Diagram: Developer, Sys admin, Architect)
  5. Technology evolution

    • Browsers are the new OS, the internet the new delivery medium
    • The rise of multiple programming languages and frameworks
    • More compute options
    • Distributed systems over networks
    • Purpose-built databases
    • Open-source tooling
    • Virtualization and containerization
    • Infrastructure democratization through cloud
  6. Developers now

    Frontend developer, Backend developer, Full stack developer, Cloud developer, Low-code developer, IT admin, DevOps engineer, InfoSec engineer, Cloud engineer, Platform engineer, Data engineer, Data scientist, MLOps engineer, ML engineer, Software architect, Solutions architect, Enterprise architect, Cloud architect, InfoSec architect
  7. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
  8. So what is even a data engineer?
  9. Data engineer?

    Some call them:
    • BI developers
    • Research engineers
    • Machine Learning engineers
    • DataOps
    • Analytics engineers (aka unhappy data analysts)
    • Data engineers
    • And sometimes… Software Engineer

    Photo by Brendan Church on Unsplash
  10. “Is there even such a role like data engineer???”

    (ex-developer, ex-SA, now at a cloud company, 2019)
  11. “Not one of today’s data engineers grew up as a kid imagining becoming one.”

    (IT sage)
  12. A data engineer solves data-related engineering problems in a maintainable way. Also talks a lot.
  13. The Data Science Hierarchy of Needs

    (Pyramid, top to bottom)
    • AI, Deep Learning
    • A/B testing, Experimentation, Simple ML Algorithms
    • Analytics, Metrics, Segments, Aggregates, Features, Training data
    • Cleaning, Anomaly Detection, Preparation
    • Reliable Data Flow, Infrastructure, Pipelines, ETL, Structured and unstructured data storage
    • Instrumentation, Logging, Sensors, External data, User generated content
  14. The Data Science Hierarchy of Needs (same pyramid as slide 13)
  15. The Data Science Hierarchy of Needs

    (Pyramid, top to bottom)
    • AI, Deep Learning
    • A/B testing, Experimentation, Simple ML Algorithms
    • Analytics, Metrics, Segments, Aggregates, Features, Training data
    • Cleaning, Anomaly Detection, Preparation
    • Reliable Data Flow, Infrastructure, Pipelines, ETL, Structured and unstructured data storage
    • Instrumentation, Logging, Sensors, External data, User generated content

    Roles overlaid on the pyramid: Machine Learning Engineer, Data Scientist, Data Analyst, Data Engineer, Data Infrastructure Engineer
  16. The Data Science Hierarchy of Needs (same pyramid and roles as slide 15)
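The bottom layers of the pyramid are where most day-to-day data engineering sits. As a rough illustration only, the Python sketch below walks a handful of hypothetical events up those layers: collection (raw log lines), data flow (parsing into records), cleaning and anomaly detection (dropping bad rows, flagging outliers), and aggregation. The event fields, the sample data and the "more than 10 times the median" outlier rule are assumptions made for this example, not part of the talk.

```python
"""Toy walk up the lower layers of the Data Science Hierarchy of Needs.

All sample data, field names and thresholds are hypothetical.
"""
import json
import statistics
from collections import defaultdict

# Collect: raw, user-generated events as they might arrive from logging.
RAW_EVENTS = [
    '{"user": "a", "country": "AE", "amount": 12.5}',
    '{"user": "b", "country": "SA", "amount": 9.0}',
    '{"user": "c", "country": "AE", "amount": null}',   # missing value
    'not-json',                                          # corrupt line
    '{"user": "d", "country": "SA", "amount": 11.0}',
    '{"user": "e", "country": "AE", "amount": 5000.0}',  # suspicious outlier
]


def extract(lines):
    """Data flow / ETL: parse raw lines, skipping anything that is not valid JSON."""
    records = []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records


def clean(records, factor=10.0):
    """Cleaning and anomaly detection.

    Drops records without a numeric amount, then flags amounts larger than
    `factor` times the median as anomalies (a deliberately simple rule).
    """
    valid = [r for r in records if isinstance(r.get("amount"), (int, float))]
    threshold = factor * statistics.median(r["amount"] for r in valid)
    kept = [r for r in valid if r["amount"] <= threshold]
    anomalies = [r for r in valid if r["amount"] > threshold]
    return kept, anomalies


def aggregate(records):
    """Aggregates: total amount per country."""
    totals = defaultdict(float)
    for r in records:
        totals[r["country"]] += r["amount"]
    return dict(totals)


if __name__ == "__main__":
    kept, anomalies = clean(extract(RAW_EVENTS))
    print(aggregate(kept))                 # {'AE': 12.5, 'SA': 20.0}
    print([r["user"] for r in anomalies])  # ['e']
```

The order mirrors the pyramid: the per-country totals are only trustworthy because the layers beneath them already discarded the corrupt line and isolated the outlier.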
  17. What makes a data engineer?
  18. The past is the future: choose boring

    • Data modeling (1960s)
    • UNIX shell (1971)
    • SQL (1974)
    • Python (1991), Java (1995)
    • Kubernetes YAML (2014)?

    “The longer a technology lives, the longer it can be expected to live.” - Nassim N. Taleb (by way of Mandelbrot, aka the Lindy effect)

    Photo by Lukas on Unsplash
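A naive, purely illustrative reading of the Lindy effect is that a technology's expected remaining lifetime is proportional to its current age; with a proportionality constant of 1, something that has survived n years is expected to last roughly another n. The Python sketch below applies that heuristic to the dated items on this slide (data modeling is left out because the slide only gives a decade). The reference year 2023 and the factor of 1 are assumptions; this is a rule of thumb, not a forecast.

```python
"""Naive Lindy-effect heuristic applied to the technologies on the slide.

Assumption: expected remaining lifetime = LINDY_FACTOR * current age.
"""
REFERENCE_YEAR = 2023  # year the talk was given
LINDY_FACTOR = 1.0     # hypothetical proportionality constant

FIRST_APPEARED = {
    "UNIX shell": 1971,
    "SQL": 1974,
    "Python": 1991,
    "Java": 1995,
    "Kubernetes YAML": 2014,
}


def lindy_ranking(first_appeared, now=REFERENCE_YEAR, factor=LINDY_FACTOR):
    """Return (name, age, expected_remaining_years), longest expected life first."""
    rows = []
    for name, year in first_appeared.items():
        age = now - year
        rows.append((name, age, factor * age))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, age, remaining in lindy_ranking(FIRST_APPEARED):
        print(f"{name:<16} age {age:>2} -> roughly {remaining:.0f} more years")
```

Under this heuristic the "boring" entries at the top of the ranking are also the safest foundations to build on, which is the point of the slide.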
  19. A day in the life of a data engineer…
  20. But what about software/libraries/tools?
  21. DevOps as a data engineer
  22. “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity.”

    Source: https://aws.amazon.com/devops/what-is-devops/
  23. DevOps in practice

    (Diagram: Your enterprise and Customers, linked by a delivery pipeline of Build, Test and Release, and a feedback loop of Monitor and Plan)
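For a data engineer, the delivery-pipeline half of this diagram usually means that transformation code is built, tested, and only then released. The Python sketch below models that idea with hypothetical stage functions: stages run in order, and the first failure stops the run so a broken change never reaches Release. The monitor-and-plan feedback loop is out of scope, and the stage names, the `normalise_country` transformation and its checks are illustrative assumptions rather than any particular CI/CD tool's API.

```python
"""Minimal build -> test -> release pipeline for data transformation code.

All stage functions and the transformation under test are hypothetical.
"""
from dataclasses import dataclass
from typing import Callable, List, Tuple

StageFn = Callable[[], Tuple[bool, str]]


@dataclass
class StageResult:
    name: str
    ok: bool
    detail: str


def run_pipeline(stages: List[Tuple[str, StageFn]]) -> List[StageResult]:
    """Run stages in order and stop at the first failure."""
    results: List[StageResult] = []
    for name, step in stages:
        ok, detail = step()
        results.append(StageResult(name, ok, detail))
        if not ok:
            break  # a failed stage blocks everything after it, including Release
    return results


def normalise_country(code: str) -> str:
    """Hypothetical transformation being delivered: trim and upper-case country codes."""
    return code.strip().upper()


def build() -> Tuple[bool, str]:
    return True, "packaged transformation code"


def run_checks() -> Tuple[bool, str]:
    """Test stage: compare the transformation against known input/output pairs."""
    cases = {" ae": "AE", "sa ": "SA", "Om": "OM"}
    failures = [raw for raw, want in cases.items() if normalise_country(raw) != want]
    return not failures, f"{len(cases) - len(failures)}/{len(cases)} checks passed"


def release() -> Tuple[bool, str]:
    return True, "deployed to production"


if __name__ == "__main__":
    for result in run_pipeline([("Build", build), ("Test", run_checks), ("Release", release)]):
        print(result)
```

Stopping at the first failed stage is the design choice that matters here: the Release step is only ever reached by changes that passed every earlier stage.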
  24. What is even MLOps?
  25. ML: Pilots vs Operationalising

    • Pilot phase. Purpose: answer the question “Is this possible, and should we proceed?”
    • Operational phase. Purpose: put the system in production and achieve the desired business value.

    (Diagram components across the two phases: ML code, configuration, data collection, continuous data collection, ETL, data verification, analysis & evaluations, model evaluation, experiments, infrastructure management, process management, management tools, serving infrastructure, monitoring, testing, automation, CI/CD (continuous integration, continuous deployment))
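Data verification is one of the components listed for putting ML into operation: before a model is retrained or served, incoming data is checked against what the pipeline expects. The Python sketch below is a minimal verification gate under assumed conditions; the column names, types and allowed ranges in `EXPECTATIONS` are invented for illustration, and `ready_for_training` is a hypothetical helper, not part of any ML platform.

```python
"""Simple data verification gate for a training batch.

The expected columns, types and ranges are hypothetical.
"""
from typing import Any, Dict, List, Optional, Tuple

# column -> (expected type, minimum or None, maximum or None)
EXPECTATIONS: Dict[str, Tuple[type, Optional[float], Optional[float]]] = {
    "age": (int, 0, 120),
    "income": (float, 0.0, None),
}


def verify_batch(rows: List[Dict[str, Any]], expectations=EXPECTATIONS) -> List[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems: List[str] = []
    for i, row in enumerate(rows):
        for col, (expected_type, lo, hi) in expectations.items():
            if col not in row:
                problems.append(f"row {i}: missing '{col}'")
                continue
            value = row[col]
            if not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{col}' has type {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
                continue
            if lo is not None and value < lo:
                problems.append(f"row {i}: '{col}'={value} below minimum {lo}")
            if hi is not None and value > hi:
                problems.append(f"row {i}: '{col}'={value} above maximum {hi}")
    return problems


def ready_for_training(rows: List[Dict[str, Any]]) -> bool:
    """Gate used before (re)training: proceed only when verification finds nothing."""
    return not verify_batch(rows)


if __name__ == "__main__":
    batch = [{"age": 34, "income": 5200.0}, {"age": -1, "income": 100.0}, {"age": 40}]
    for problem in verify_batch(batch):
        print(problem)
    print("ready:", ready_for_training(batch))
```

Returning a list of problems rather than raising on the first one keeps the gate useful for reporting: one run shows everything wrong with a batch.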
  26. “MLOps or ML Ops is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.” (Wikipedia)
  27. MLOps = People + Technology + Process

    (Diagram: People, Processes and Technology combine into ML Ops)
  28. MLOps Requirements

    • Tech: ML pipelines, data warehouses, data pipelines, CI/CD, model development environment, model serving infrastructure
    • Processes: training, deployment, monitoring, logging, performance, governance
    • People: roles required (data scientist, ML engineer, SysOps, etc.), capability, collaboration, culture
  29. MLOps is a journey

    (Chart: MLOps maturity vs. models in production)
    • Initial: establish the experimentation environment
    • Repeatable: standardise code repositories and ML solution deployment
    • Reliable: introduce testing, monitoring, and multi-account deployment
    • Scalable: templatise and productionise multiple ML solutions
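Monitoring is one of the practices introduced at the "Reliable" stage. One common way to monitor a deployed model's inputs, swapped in here as an example rather than taken from the talk, is the Population Stability Index (PSI). It compares the binned distribution of a feature at training time with what the model sees in production: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ and aᵢ are the expected and actual proportions in bin i. The Python sketch below assumes fixed bin edges, and the thresholds in `drift_status` are a widely quoted rule of thumb, not a standard.

```python
"""Population Stability Index (PSI) as a simple input-drift monitor.

Bin edges, sample data and thresholds are illustrative assumptions.
"""
import bisect
import math
from typing import List, Sequence


def proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin; len(edges) sorted cut points give len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Sum over bins of (actual - expected) * ln(actual / expected).

    `eps` keeps empty bins from producing log(0) or division by zero.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def drift_status(score: float) -> str:
    # Rule of thumb often quoted for PSI; tune thresholds for your own models.
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate shift"
    return "significant shift"


if __name__ == "__main__":
    edges = [25.0, 50.0, 75.0]
    training = proportions([10, 30, 60, 90, 20, 40, 70, 80], edges)
    live = proportions([80, 85, 90, 95, 70, 60, 99, 88], edges)
    score = psi(training, live)
    print(f"PSI={score:.3f} -> {drift_status(score)}")
```

Binning against fixed edges taken from training data keeps the comparison stable over time; recomputing edges from live data would hide exactly the shift the monitor is meant to catch.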
  30. Principles while architecting data projects

    • Flexibility: use decoupled services
    • Reproducibility: use infrastructure as code (IaC) to deploy your services
    • Reusability: use libraries and references in a shared manner
    • Scalability: choose service configurations to accommodate any data load
    • Auditability: keep an audit trail by using logs, versions, and dependencies

    Source: https://docs.aws.amazon.com/prescriptive-guidance/latest/modern-data-centric-use-cases/data-engineering-principles.html
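The auditability principle (keep an audit trail using logs, versions and dependencies) can be sketched in a few lines. The Python example below is a hypothetical decorator, not an AWS API: for each pipeline step it records the step's declared version, the steps it depends on, and SHA-256 fingerprints of its input and output, so one step's output hash can be matched to the next step's input hash when reconstructing lineage. The in-memory `AUDIT_LOG` list stands in for durable storage, and the step names and version strings are invented.

```python
"""Audit-trail decorator for pipeline steps (hypothetical, in-memory)."""
import functools
import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG = []  # stand-in for a durable audit store


def fingerprint(obj):
    """Stable SHA-256 of a JSON-serialisable object (keys sorted for determinism)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def audited(step_version, depends_on=()):
    """Record version, dependencies and input/output hashes each time a step runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data):
            result = func(data)
            AUDIT_LOG.append({
                "step": func.__name__,
                "version": step_version,
                "depends_on": list(depends_on),
                "input_sha256": fingerprint(data),
                "output_sha256": fingerprint(result),
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@audited(step_version="1.2.0")
def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = fingerprint(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


@audited(step_version="0.3.1", depends_on=("deduplicate",))
def total_amount(rows):
    return {"total": sum(r["amount"] for r in rows)}


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10}, {"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
    print(total_amount(deduplicate(rows)))
    print(json.dumps(AUDIT_LOG, indent=2))
```

Hashing canonical JSON (sorted keys) rather than Python object identity is what makes the trail reproducible: rerunning the same step version on the same input yields the same fingerprints.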
  31. Parting advice

    • Build foundations with the boring stack and the DS hierarchy of needs
    • Look at the hidden details of the role and responsibilities behind a job title
    • Ramp up on automating data processes and deployments to production
    • It’s a journey, with moving goalposts and new responsibilities added as technology and businesses evolve
  32. Thank you!

    Mohammed Fazalullah “Faz”, linkedin.com/in/mohammedfazalullah