Slide 1

Slide 1 text

Data Engineering Behind the Data Science World
Kamolphan Liwprasert (กมลพรรณ ลิ้วประเสริฐ, "Fon"): PhD Student @ Chula, Google Developer Expert - Cloud, Women Techmakers Ambassador, CP39

Slide 2

Slide 2 text

Warm up 🔥

Slide 3

Slide 3 text

Andrej Karpathy, “Building the Software 2.0 Stack,” Spark+AI Summit 2018, video, 17:54, https://oreil.ly/Z21Oz.
Chip Huyen, “Designing Machine Learning Systems”

Slide 4

Slide 4 text

Where does data come from?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Why do we need a data engineer?

Slide 7

Slide 7 text

Data Science Hierarchy of Needs: Data Engineers, Data Analysts, Data Scientists

Slide 8

Slide 8 text

Application vs. Analytics

Slide 9

Slide 9 text

Application Workload (OLTP)
✅ For application and backend services
✅ Transactional data
✅ Read / Write / Update (ACID)
✅ E.g. Database

Analytics Workload (OLAP)
✅ For queries, dashboards and analytics
✅ Historical data
✅ Write once, read many
✅ E.g. Data Warehouse, Data Lake, Data Mart
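A minimal sketch of the two access patterns, using Python's built-in `sqlite3` module as a stand-in for both workloads (in practice OLTP usually runs on an operational database and OLAP on a data warehouse). The `orders` table and its rows are hypothetical. The inserts and the single-row update are small transactions; the final query is a read-only aggregate over the accumulated history.

```python
import sqlite3


def run_demo() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )

    # OLTP-style: small writes, each block committed as one transaction.
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("alice", 120.0))
        conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("bob", 80.0))
        conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("alice", 50.0))
    with conn:
        # Update one row by primary key, a typical application operation.
        conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (60.0, 3))

    # OLAP-style: read-only aggregation over all historical rows.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


print(run_demo())  # [('alice', 180.0), ('bob', 80.0)]
```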

Slide 10

Slide 10 text

A Brief History of the Data Engineer
- When did the “Data Engineer” role appear? Many new technologies keep emerging, though some of them may not be widely adopted yet.
Timeline: Database era (1979, 1983, 1989); MapReduce paper ('04); Big Data era (20xx; marks at '06, '10 to '15, '18, '19); Data Engineer job titles start to appear.

Slide 11

Slide 11 text

การสรางและดูแล Data Infrastructure ● Database ● Data Lake ● Data Warehouse ● Cloud / on-premise Infrastructure ● Data Pipeline Azure Blob Storage Amazon S3 Google Cloud Storage Amazon Redshift Google BigQuery Azure Synapse Snowflake Apache Hive Database Data Lake Data Warehouse Cloud / on-premise 11

Slide 12

Slide 12 text

What does data engineering look like at different companies?

Slide 13

Slide 13 text

https://towardsdatascience.com/how-to-measure-your-organizations-data-maturity-2352cbaf1896

Slide 14

Slide 14 text

Example data engineering tech stacks

Slide 15

Slide 15 text

The tech stacks of well-known companies can be found on their blogs and newsletters

Slide 16

Slide 16 text

Example tech stacks of companies in Thailand

Slide 17

Slide 17 text

Example company tech stacks (2)

Slide 18

Slide 18 text

แหลงเรียน Data Engineer ฟรี [FREE WIKI] Data Engineering Wiki https://dataengineering.wiki/ [FREE PDF] Data Engineering Cookbook https://cookbook.learndataengineering.com/ [ZOOM COURSE] Data Engineering Zoomcamp https://github.com/DataTalksClub/data-engineerin g-zoomcamp [NEWSLETTER] Start Data Engineering https://www.startdataengineering.com/ Youtube.com (ฟรี) รวม Talk จาก Meetup และ Conference ตาง ๆ Channel ที่แนะนำ: - DataTalksClub - Data Council - PyData - Andreas Kretz - Seattle Data Guy - The Cloud Girl - Google Cloud Tech 18

Slide 19

Slide 19 text

Next Actions for Data Engineers (https://www.startdataengineering.com/)
Learn:
● Programming languages: Bash, Python, SQL, Java / Scala (optional)
● Frameworks
● Platforms
Do (for each tool):
1. Go through the Quickstart
2. Understand its fundamentals
3. Know its objectives

Slide 20

Slide 20 text

DE Trends
● Data Mesh / Data Fabric (datamesh-architecture.com)
● Data Observability
● Data Reliability Engineering
● Self-service Analytics Systems: Analytics Engineering
● Reverse ETL
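Data observability usually starts with a few automated checks on each table. The sketch below assumes three common ones: freshness, volume, and null ratio. The thresholds, the `country` column, and the function name are hypothetical; dedicated observability tools run checks like these continuously instead of by hand.

```python
from datetime import datetime, timedelta


def check_table_health(
    rows: list[dict],
    last_loaded_at: datetime,
    now: datetime,
    column: str = "country",
    min_rows: int = 1,
    max_staleness: timedelta = timedelta(hours=24),
    max_null_ratio: float = 0.1,
) -> dict[str, bool]:
    """Run three basic observability checks on one (hypothetical) table."""
    # Share of rows where the monitored column is missing or empty.
    null_count = sum(1 for row in rows if row.get(column) in (None, ""))
    null_ratio = null_count / len(rows) if rows else 1.0
    return {
        # Freshness: was the table loaded recently enough?
        "freshness": now - last_loaded_at <= max_staleness,
        # Volume: does the table hold at least the expected number of rows?
        "volume": len(rows) >= min_rows,
        # Completeness: is the share of missing values under the threshold?
        "null_ratio": null_ratio <= max_null_ratio,
    }


now = datetime(2023, 3, 10, 12, 0)
rows = [{"country": "TH"}, {"country": None}, {"country": "SG"}, {"country": "TH"}]
print(check_table_health(rows, last_loaded_at=now - timedelta(hours=2), now=now))
# {'freshness': True, 'volume': True, 'null_ratio': False}
```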

Slide 21

Slide 21 text

MARCH 2023 HOW IT’S GOING (2023)

Slide 22

Slide 22 text

Bonus: ML Engineering

Slide 23

Slide 23 text

A simple definition of an AI system (https://youtu.be/VfcY0edoSLU)
AI System = Software + Model + Data
Disciplines involved: Software Engineering, DevOps, Data Engineering, Data Science, ML/AI Research, ML Engineering

Slide 24

Slide 24 text

High-level View of ML Systems
1. Data
▸ Data Engineering Pipelines
▸ Data quality is key 🔑
2. Model
▸ Machine Learning Pipelines
▸ Train, Evaluate, Test
3. Software
▸ Model Serving & Predictions
▸ Deployment strategies & Infra 🐳☁
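The “Train, Evaluate, Test” step can be illustrated with a toy pipeline. This sketch assumes synthetic data (y = 2x + 1 plus noise), a 70/15/15 split, and single-feature ordinary least squares as a stand-in for a real model; none of these choices come from the slides.

```python
import random


def make_data(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.5)))
    return data


def split(data, train=0.7, val=0.15, seed=0):
    """Shuffle, then split into train / validation / test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fit(train_set):
    """Train: ordinary least squares for a single feature."""
    n = len(train_set)
    mean_x = sum(x for x, _ in train_set) / n
    mean_y = sum(y for _, y in train_set) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train_set)
    var = sum((x - mean_x) ** 2 for x, _ in train_set)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mse(model, dataset):
    """Evaluate: mean squared error of the model on a dataset."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in dataset) / len(dataset)


train_set, val_set, test_set = split(make_data(200))
model = fit(train_set)
print("validation MSE:", mse(model, val_set))
print("test MSE:", mse(model, test_set))
```

The validation set is for comparing models or settings during development; the test set is held back for one final check, so the reported error is not biased by those choices.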

Slide 25

Slide 25 text

ML Engineer
“Machine Learning Engineers are technically proficient programmers who research, build, and design self-running software to automate predictive models. An ML Engineer builds artificial intelligence (AI) systems that leverage huge data sets to generate and develop algorithms capable of learning and eventually making predictions.” - Brainstation.io
In short: a Machine Learning Engineer is a developer who researches, builds, and designs software that can run predictive models, and builds AI systems that make use of large amounts of data.

Slide 26

Slide 26 text

ML Engineer: How do MLE, DS, and DE differ?
● Data Engineer → DATA (e.g. data pipelines)
● Data Scientist → MODEL (stats, accuracy, business value)
● ML Engineer → MODEL in PROD (performance, throughput, automation)

Slide 27

Slide 27 text

ML Engineer: How do MLE, DS, and DE differ? (2)
Data Engineer | Data Scientist | “Automate 'em all!”

Slide 28

Slide 28 text

Getting to know ML systems
Figure from Chip Huyen, “Designing Machine Learning Systems”:
● The full ML system: ML system users, business requirements, ML system developers, data, feature engineering, ML algorithms, evaluation, infrastructure, and deployment, monitoring, and updating of logic
● The data scientist's view: feature engineering, ML algorithms, evaluation

Slide 29

Slide 29 text

Rules of Machine Learning, Martin Zinkevich: https://developers.google.com/machine-learning/guides/rules-of-ml

Slide 30

Slide 30 text

“To make great products: do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.”
Martin Zinkevich, Rules of Machine Learning (https://developers.google.com/machine-learning/guides/rules-of-ml)

Slide 31

Slide 31 text

Thank you!
Kamolphan Liwprasert (กมลพรรณ ลิ้วประเสริฐ, "Fon")

Slide 32

Slide 32 text

Ads

Slide 33

Slide 33 text

facebook.com/wtmbkk | developers.google.com/womentechmakers | IG: @wtmbkk

Slide 34

Slide 34 text

Upcoming: International Women’s Day 2024 (9 March 2024 and 16 March 2024), @wtmbkk