Data Engineering behind Data Science World

Presented in the Data Science and Data Engineering class for computer engineering students at Chulalongkorn University.

13 February 2024

Kamolphan Liwprasert

Transcript

  1. Data Engineering Behind Data Science World. Kamolphan Liwprasert (Fon): PhD Student @ Chula; Google Developer Expert - Cloud; Women Techmakers Ambassador; CP39
  2. Andrej Karpathy, “Building the Software 2.0 Stack,” Spark+AI Summit 2018, video, 17:54, https://oreil.ly/Z21Oz. Chip Huyen, “Designing Machine Learning Systems”
  3. Application Workload (OLTP) vs. Analytics Workload (OLAP)
     Application Workload (OLTP): ✅ For application and backend services ✅ Transactional data ✅ Read / Write / Update (ACID) ✅ E.g. Database
     Analytics Workload (OLAP): ✅ For query, dashboards and analytics ✅ Historical data ✅ Write once, read many ✅ E.g. Data Warehouse, Data Lake, Data Mart
  4. Brief History of Data Engineer: when did the “Data Engineer” appear? Many new technologies have emerged, though some may not be widely adopted yet.
     [Timeline figure: Database era (marks at 1979, 1983, 1989); Big Data era (20xx) with the MapReduce paper (‘04) and marks at ‘06, ‘10, ‘11, ‘12, ‘13, ‘14, ‘15, ‘18, ‘19; Data Engineer jobs begin to appear.]
  5. การสรางและดูแล Data Infrastructure • Database • Data Lake • Data

    Warehouse • Cloud / on-premise Infrastructure • Data Pipeline Azure Blob Storage Amazon S3 Google Cloud Storage Amazon Redshift Google BigQuery Azure Synapse Snowflake Apache Hive Database Data Lake Data Warehouse Cloud / on-premise 11
  6. แหลงเรียน Data Engineer ฟรี [FREE WIKI] Data Engineering Wiki https://dataengineering.wiki/

    [FREE PDF] Data Engineering Cookbook https://cookbook.learndataengineering.com/ [ZOOM COURSE] Data Engineering Zoomcamp https://github.com/DataTalksClub/data-engineerin g-zoomcamp [NEWSLETTER] Start Data Engineering https://www.startdataengineering.com/ Youtube.com (ฟรี) รวม Talk จาก Meetup และ Conference ตาง ๆ Channel ที่แนะนำ: - DataTalksClub - Data Council - PyData - Andreas Kretz - Seattle Data Guy - The Cloud Girl - Google Cloud Tech 18
  7. Next Action for Data Engineer https://www.startdataengineering.com/
     Learn: Programming Languages (SQL, Python, Bash, Java / Scala (optional)), Frameworks, Platforms
     Do: (1) Go through the Quickstart (2) Understand its fundamentals (3) Know their objectives
  8. DE Trend • Data Mesh / Data Fabric (datamesh-architecture.com) • Data Observability • Data Reliability Engineering • Self-service Analytic System - Analytics Engineering • Reverse ETL
  9. A simple definition of an AI System https://youtu.be/VfcY0edoSLU
     AI System = Software + Model + DATA
     Disciplines involved: Software Engineering, DevOps, Data Engineering, Data Science, ML/AI Research, ML Engineering
  10. High-level of ML Systems
     1. Data ▸ Data Engineering Pipelines ▸ Data Quality is a key 🔑
     2. Model ▸ Machine Learning Pipelines ▸ Train, Evaluate, Test
     3. Software ▸ Model Serving & Predictions ▸ Deployment strategies & Infra 🐳☁
  11. ML Engineer: “Machine Learning Engineers are technically proficient programmers who research, build, and design self-running software to automate predictive models. An ML Engineer builds artificial intelligence (AI) systems that leverage huge data sets to generate and develop algorithms capable of learning and eventually making predictions.” - Brainstation.io
     In short: a Machine Learning Engineer is a programmer who researches, builds, and designs software that can run predictive models, and who builds AI systems by making use of large amounts of data.
  12. How do MLE, DS, and DE differ?
     Data Engineer: DATA. E.g. Data pipeline
     Data Scientist: MODEL. Stats, Accuracy, Business value
     ML Engineer: MODEL in PROD. Performance, Throughput, Automation
  13. Getting to know ML Systems
     [Diagram from Chip Huyen, “Designing Machine Learning Systems”: ML System Users, Business Requirements, and ML System Developers surround the ML System, which consists of Deployment, monitoring, updating of logics; Feature engineering; ML algorithms; Evaluation; Data; Infrastructure. The Data Scientist's part is highlighted: Feature engineering, ML algorithms, Evaluation.]
  14. “To make great products: do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.” - Martin Zinkevich, Rules of Machine Learning https://developers.google.com/machine-learning/guides/rules-of-ml
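
The OLTP versus OLAP split on slide 3 can be illustrated with a small sketch. This is not from the deck: the `orders` table, its columns, and the `run_demo` function are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for both the application database and the analytics store, which in practice would usually be separate systems.

```python
import sqlite3


def run_demo():
    """Contrast an OLTP-style write path with an OLAP-style read path.

    Hypothetical example: table and column names are assumptions,
    and one SQLite database plays both roles for brevity.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
    )

    # OLTP-style workload: small transactional inserts and updates.
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.executemany(
            "INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)",
            [
                ("alice", 120.0, "paid"),
                ("bob", 80.0, "pending"),
                ("alice", 50.0, "paid"),
            ],
        )
    with conn:
        conn.execute(
            "UPDATE orders SET status = 'paid' WHERE customer = ?", ("bob",)
        )

    # OLAP-style workload: a read-only aggregate over historical rows,
    # the kind of query that feeds a dashboard.
    rows = conn.execute(
        "SELECT customer, COUNT(*), SUM(amount) "
        "FROM orders WHERE status = 'paid' "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, n_orders, total in run_demo():
        print(customer, n_orders, total)
```

The write path touches one or a few rows per transaction, while the read path scans and aggregates many rows without modifying them, which is roughly why the two workloads tend to be served by differently designed systems.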