Data Engineering
Behind Data Science World
กมลพรรณ ลิ้วประเสริฐ (ฝน)
PhD Student @ Chula
Google Developer Expert - Cloud
Women Techmakers Ambassador
CP39
Slide 2
Warm up 🔥
Slide 3
Andrej Karpathy, “Building the Software 2.0 Stack,” Spark+AI Summit 2018, video, 17:54, https://oreil.ly/Z21Oz.
Chip Huyen, “Designing Machine Learning Systems”
Slide 4
Where does data come from?
Slide 6
Why do we need a data engineer?
Slide 7
Data Science Hierarchy of Needs
Data Engineers
Data Analysts
Data Scientists
Slide 8
Application Analytics
Slide 9
Application Workload (OLTP)
✅ For application and backend services
✅ Transactional data
✅ Read / Write / Update (ACID)
✅ E.g. Database
Analytics Workload (OLAP)
✅ For query, dashboards and analytics
✅ Historical data
✅ Write once, read many
✅ E.g. Data Warehouse, Data Lake, Data Mart
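The contrast between the two workloads can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 for both sides, and the `orders` table and its columns are hypothetical. In practice the analytics query would run on a separate warehouse, not on the application database.

```python
import sqlite3

# In-memory database standing in for both workloads (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)

# OLTP-style: short transactions that insert or update individual rows.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)",
        [("alice", 120.0, "pending"), ("bob", 80.0, "pending"), ("alice", 50.0, "pending")],
    )
with conn:
    conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (1,))

# OLAP-style: a read-only aggregate over many historical rows.
totals = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 2, 170.0), ('bob', 1, 80.0)]
```

The application side touches one row at a time inside a transaction, while the analytics side scans and aggregates the whole table without modifying it.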
Slide 10
Brief History of Data Engineer
"Data Engineer": since when?
[Timeline figure. Year markers: 1979, 1983, 1989, '04, '06, '10, '11, '12, '13, '14, '15, '18, '19]
Labels on the timeline:
● Database era
● Big Data era (20xx)
● MapReduce paper
● Data Engineer jobs begin to appear
● Many new technologies are emerging, though they may not be popular yet
Slide 11
Building and maintaining data infrastructure
● Database
● Data Lake (e.g. Amazon S3, Google Cloud Storage, Azure Blob Storage)
● Data Warehouse (e.g. Amazon Redshift, Google BigQuery, Azure Synapse, Snowflake, Apache Hive)
● Cloud / on-premise Infrastructure
● Data Pipeline
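A data pipeline, the last item in the list above, can be sketched as three small steps: extract, transform, load. This is a minimal sketch under stated assumptions: the CSV content, the `fact_orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an application database (inline for the sketch).
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,
3,alice,50.00
"""

def extract(text):
    """Read raw rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with a missing amount and cast the remaining fields."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return clean

def load(rows, conn):
    """Write cleaned rows into a warehouse-like table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
print(loaded)  # (2, 170.5)
```

Keeping each step as a separate function makes it easier to test the steps in isolation and to hand them to a scheduler later.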
Free resources for learning Data Engineering
[FREE WIKI] Data Engineering Wiki
https://dataengineering.wiki/
[FREE PDF] Data Engineering Cookbook
https://cookbook.learndataengineering.com/
[ZOOM COURSE] Data Engineering Zoomcamp
https://github.com/DataTalksClub/data-engineering-zoomcamp
[NEWSLETTER] Start Data Engineering
https://www.startdataengineering.com/
YouTube.com
(Free) Talks from various meetups and conferences
Recommended channels:
- DataTalksClub
- Data Council
- PyData
- Andreas Kretz
- Seattle Data Guy
- The Cloud Girl
- Google Cloud Tech
Slide 19
Next Action for Data Engineer https://www.startdataengineering.com/
Learn: Programming Languages, Frameworks, Platforms
Programming Languages:
● Bash
● Python
● SQL
● Java / Scala (optional)
Do:
1. Go through the Quickstart
2. Understand its fundamentals
3. Know their objectives
Slide 20
DE Trend
● Data Mesh / Data Fabric ( datamesh-architecture.com )
● Data Observability
● Data Reliability Engineering
● Self-service Analytic System - Analytics Engineering
● Reverse ETL
Slide 21
HOW IT’S GOING (March 2023)
Slide 22
Bonus: ML Engineering
Slide 23
A simple definition of an AI system
https://youtu.be/VfcY0edoSLU
AI System = Software + Model + DATA
Disciplines involved:
● Software Engineering
● DevOps
● Data Engineering
● Data Science
● ML/AI Research
● ML Engineering
Slide 24
High-level View of ML Systems
1. Data
▸ Data Engineering Pipelines
▸ Data quality is key 🔑
2. Model
▸ Machine Learning Pipelines
▸ Train, Evaluate, Test
3. Software
▸ Model Serving & Predictions
▸ Deployment strategies & Infra
🐳☁
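Since data quality is the key point of the Data step above, a simple quality gate before training might look like the sketch below. It is only an illustration: `check_quality`, the field names, and the allowed range are hypothetical, and real pipelines often rely on a dedicated validation tool for this.

```python
def check_quality(rows, required, ranges):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and not None.
        for field in required:
            if row.get(field) is None:
                problems.append(f"row {i}: missing {field}")
        # Validity: numeric fields must fall inside their allowed range.
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {field}={value} outside [{low}, {high}]")
    return problems

rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 210, "income": 61000},
]
problems = check_quality(rows, required=["age", "income"], ranges={"age": (0, 120)})
for problem in problems:
    print(problem)
# row 1: missing age
# row 2: age=210 outside [0, 120]
```

A pipeline could stop, or quarantine the offending rows, whenever the returned list is not empty, so that bad records never reach model training.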
Slide 25
Machine Learning Engineers are technically proficient programmers who research, build,
and design self-running software to automate predictive models. An ML Engineer builds
artificial intelligence (AI) systems that leverage huge data sets to generate and develop
algorithms capable of learning and eventually making predictions.
- Brainstation.io
ML Engineer
In short, a Machine Learning Engineer is a programmer who researches, builds, and designs software that can run predictive models, and who builds AI systems that make use of large amounts of data.
Slide 26
How do MLE, DS, and DE differ?
Data Engineer: DATA (e.g. data pipeline)
Data Scientist: MODEL (e.g. stats, accuracy, business value)
ML Engineer: MODEL in PROD (e.g. performance, throughput, automation)
Slide 27
ML Engineer
How do MLE, DS, and DE differ? (2)
Data Engineer Data Scientist
Automate 'em all!
Slide 28
Getting to know ML systems
[Diagram from Chip Huyen, “Designing Machine Learning Systems”]
Around the ML System: ML System Users, Business Requirements, ML System Developers
Inside the ML System:
● Infrastructure
● Data
● Feature engineering
● ML algorithms
● Evaluation
● Deployment, monitoring, updating of logics
Data Scientist: Feature engineering, ML algorithms, Evaluation
Slide 29
Rules of Machine Learning
Martin Zinkevich
https://developers.google.com/machine-learning/guides/rules-of-ml
Slide 30
To make great products:
do machine learning like the great
engineer you are, not like the great
machine learning expert you aren’t.
Martin Zinkevich
https://developers.google.com/machine-learning/guides/rules-of-ml
- Rules of Machine Learning -
Slide 31
Thank you!