Data-centric MLOps (이정권)
MLOpsKR
June 05, 2021
Presentation slides from the first online event hosted by MLOps KR (https://www.facebook.com/groups/mlopskr).
Transcript
Data-centric MLOps: Small tools to help with data-centric MLOps
Superb AI 이정권
AI / ML = Model + Data
Data-centric?
Task baseline: 70% accuracy. Target performance: 90% accuracy.
Should the team improve the code or the data? Code: 20%, data: 80%.
Source: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Improve AI → Improve the quality of the data: consistency, error rate, diversity, coverage, feedback frequency, size, ...
Slide credit: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI (https://www.youtube.com/watch?v=06-AZXmwHjo)
Actually, this is what we have always been doing
[Figure: project progress over months 1 to 5, with the phases Code a model, Build data, Launch training job]
Actually, this is what we have always been doing: Building the Software 2.0 Stack (Andrej Karpathy, 2018)
Question: How many labeled images are needed to solve this problem?
Answer: 100,000 images?
My answer: I don't know. Let's start with 5,000. Why?
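"Start with 5,000" describes an iterative loop: label a small set, train, measure, and only then decide whether to label more. A minimal sketch, assuming a hypothetical `train_and_evaluate` callback, a doubling schedule, the 90% target from the earlier slide, and a 100,000-image cap taken from the "100,000 images?" answer (the schedule and cap are illustrative, not the speaker's method):

```python
from typing import Callable, List, Tuple


def grow_until_target(
    train_and_evaluate: Callable[[int], float],
    start: int = 5_000,
    target: float = 0.90,
    max_size: int = 100_000,
    growth: float = 2.0,
) -> List[Tuple[int, float]]:
    """Start with a small labeled set and grow it only while the target is unmet.

    `train_and_evaluate` is a hypothetical callback: given a dataset size, it
    trains a model on that many labeled images and returns validation accuracy.
    The returned (size, accuracy) history is an empirical learning curve.
    """
    history: List[Tuple[int, float]] = []
    size = start
    while True:
        accuracy = train_and_evaluate(size)
        history.append((size, accuracy))
        if accuracy >= target or size >= max_size:
            break
        # Grow geometrically, but never past the labeling budget.
        size = min(int(size * growth), max_size)
    return history


def toy_curve(n: int) -> float:
    """Made-up stand-in for a real training run: accuracy saturates with data."""
    return 0.95 - 2_000 / (n + 6_000)


if __name__ == "__main__":
    for n_images, acc in grow_until_target(toy_curve):
        print(f"{n_images:>7} images -> {acc:.3f}")
```

The point is the shape of the loop: the labeling budget is decided by the measured curve rather than guessed up front.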
We still don't really know → Data-centric MLOps
A systematic and iterative way to build data for ML
Not simply a way to automate tedious work, but a process for solving ML problems
I am working on this problem at a team called Superb AI.
The Problem
< 2 months, < 30 people, < 20,000 images
The Meta Problem
Design Data Spec → Build Data → Train a model → Deploy to service
Starting Point: Labeling Tool, Data, Label
Reusable Data Spec
{ project_name: potato_detect_1
  data_spec:
    good_potato:
      box:
        color: red
        condition: ...
    bad_potato:
      box: }
{ project_name: potato_detect_2
  data_spec:
    good_potato:
      polygon:
        color: red
        condition: ...
    bad_potato:
      box: }
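The two potato specs on this slide differ only in the annotation type of `good_potato` (box vs. polygon). One way to make a spec reusable is to keep a shared template and derive each project from it with a small override. A minimal sketch using plain dicts; representing `box:`/`polygon:` as a `"type"` field is a choice made here, and `deep_merge`/`derive_spec`/`POTATO_TEMPLATE` are hypothetical names, not part of any Superb AI API:

```python
import copy
from typing import Any, Dict

Spec = Dict[str, Any]


def deep_merge(base: Spec, override: Spec) -> Spec:
    """Return a new dict with `override` applied on top of `base`, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def derive_spec(template: Spec, project_name: str, override: Spec) -> Spec:
    """Create a project spec from a shared template plus project-specific changes."""
    spec = deep_merge(template, {"data_spec": override})
    spec["project_name"] = project_name
    return spec


# Hypothetical shared template; only the class names and "red" come from the slide.
POTATO_TEMPLATE: Spec = {
    "project_name": None,
    "data_spec": {
        "good_potato": {"type": "box", "color": "red"},
        "bad_potato": {"type": "box"},
    },
}

spec_1 = derive_spec(POTATO_TEMPLATE, "potato_detect_1", {})
spec_2 = derive_spec(POTATO_TEMPLATE, "potato_detect_2",
                     {"good_potato": {"type": "polygon"}})
```

Because the template is deep-copied, deriving a new project never mutates the shared definition.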
Reusable Data Spec
{ project_name: potato_detect_13
  data_spec:
    best_potato:
      polygon:
        direction:
          options: ...
    good_potato: {}
    normal_potato: {}
    bad_potato: {} }
Goal ≠ Task
ALWAYS configured repeatedly: name, color, type, conditions, options, property, ROI info, ...
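Since name, color, type, options, and similar fields are configured again for every project, it can help to give each class definition a typed schema and validate it once. A sketch with standard-library dataclasses; the allowed annotation types, the `#RRGGBB` color format, and the `LabelClass`/`check_unique_names` names are assumptions for illustration:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed set of annotation types; a real labeling tool may support others.
ANNOTATION_TYPES = {"box", "polygon", "keypoint"}
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


@dataclass
class LabelClass:
    name: str
    annotation_type: str
    color: str
    # Property name -> allowed options, e.g. {"direction": ["up", "down"]}.
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validate once, at definition time, instead of per project by hand.
        if not self.name:
            raise ValueError("class name must be non-empty")
        if self.annotation_type not in ANNOTATION_TYPES:
            raise ValueError(f"unknown annotation type: {self.annotation_type!r}")
        if not HEX_COLOR.match(self.color):
            raise ValueError(f"color must look like #RRGGBB, got {self.color!r}")
        for prop, options in self.properties.items():
            if not options:
                raise ValueError(f"property {prop!r} needs at least one option")


def check_unique_names(classes: List[LabelClass]) -> None:
    """Reject a spec in which two classes share a name."""
    names = [c.name for c in classes]
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"duplicate class names: {sorted(duplicates)}")


best = LabelClass("best_potato", "polygon", "#ff0000", {"direction": ["up", "down"]})
```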
Support flexible pipeline
100 different problems, 100 different datasets, 100 different ways
To support a flexible pipeline: Build Data, Team, Model; WORKING, SUBMITTED, REVIEWED
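The WORKING, SUBMITTED, and REVIEWED statuses on the slide suggest a per-label review workflow. A small state machine makes the allowed moves explicit; the transition table below, including sending a rejected label back to WORKING, is an assumption for illustration and not the product's documented behavior:

```python
from enum import Enum
from typing import Dict, Set


class LabelStatus(Enum):
    WORKING = "WORKING"
    SUBMITTED = "SUBMITTED"
    REVIEWED = "REVIEWED"


# Assumed transitions: a labeler submits; a reviewer approves or rejects.
TRANSITIONS: Dict[LabelStatus, Set[LabelStatus]] = {
    LabelStatus.WORKING: {LabelStatus.SUBMITTED},
    LabelStatus.SUBMITTED: {LabelStatus.REVIEWED, LabelStatus.WORKING},
    LabelStatus.REVIEWED: set(),
}


class Label:
    def __init__(self, label_id: str) -> None:
        self.label_id = label_id
        self.status = LabelStatus.WORKING  # every label starts as work in progress

    def move_to(self, new_status: LabelStatus) -> None:
        """Change status, refusing any move not listed in TRANSITIONS."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.label_id}: cannot go from "
                f"{self.status.value} to {new_status.value}"
            )
        self.status = new_status


label = Label("img_0001")
label.move_to(LabelStatus.SUBMITTED)
label.move_to(LabelStatus.WORKING)    # rejected by a reviewer
label.move_to(LabelStatus.SUBMITTED)
label.move_to(LabelStatus.REVIEWED)
```

Keeping the table as data means a different team or project can plug in its own workflow without touching `Label`.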
Versioning: per set, per experiment
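Versioning per set and per experiment can be approximated by giving each labeled set a content-derived version ID, so an experiment records exactly which labels it trained on. A minimal sketch, assuming labels are JSON-serializable records and that a truncated SHA-256 digest of their canonical JSON is an acceptable ID; the record fields and the `experiments` registry are hypothetical:

```python
import hashlib
import json
from typing import Any, Dict, List


def set_version(labels: List[Dict[str, Any]]) -> str:
    """Deterministic version ID for a set of label records.

    Each record is serialized with sorted keys and the records are sorted,
    so the same labels give the same ID regardless of order; any edit to
    any label changes the ID.
    """
    canonical = sorted(
        json.dumps(r, sort_keys=True, separators=(",", ":")) for r in labels
    )
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return digest[:12]


experiments: Dict[str, str] = {}  # experiment name -> label-set version

v1 = set_version([
    {"image": "a.jpg", "class": "good_potato"},
    {"image": "b.jpg", "class": "bad_potato"},
])
experiments["exp-001"] = v1
```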
For ML engineers …? Detailed Statistics & Report
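A statistics report for ML engineers might start with simple counts: labels per class, class shares, labels per status, and images with no labels at all. A sketch with `collections.Counter`; the record layout (`image`, `class`, `status` keys) and the `label_report` name are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, List


def label_report(images: Iterable[str], labels: List[Dict[str, str]]) -> Dict[str, object]:
    """Summarize a labeled set: counts per class and status, plus unlabeled images."""
    per_class = Counter(lab["class"] for lab in labels)
    per_status = Counter(lab["status"] for lab in labels)
    labeled_images = {lab["image"] for lab in labels}
    total = sum(per_class.values())
    return {
        "per_class": dict(per_class),
        # Class share helps spot imbalance before training.
        "class_share": {c: n / total for c, n in per_class.items()} if total else {},
        "per_status": dict(per_status),
        "unlabeled_images": sorted(set(images) - labeled_images),
    }


report = label_report(
    images=["a.jpg", "b.jpg", "c.jpg"],
    labels=[
        {"image": "a.jpg", "class": "good_potato", "status": "REVIEWED"},
        {"image": "a.jpg", "class": "bad_potato", "status": "SUBMITTED"},
        {"image": "b.jpg", "class": "good_potato", "status": "WORKING"},
    ],
)
```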
Human in the loop ^ 2
[Diagram: a human in the loop around ML (Data, Human Labeling, Model), and a second loop inside the human labeling service itself (Data, Model, Labeling)]
Our model? Uncertain? Label-wise confidence, overall set confidence, user performance estimate, boost labeling, ...
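Label-wise and overall set confidence suggest how the second loop can work: the model pre-labels, and only uncertain labels go to a human. A minimal sketch, assuming each auto-label carries a model confidence in [0, 1], a fixed review threshold of 0.8, and a set confidence defined as the mean label confidence; all three are illustrative choices, not details from the talk:

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class AutoLabel:
    image: str
    class_name: str
    confidence: float  # model's confidence for this single label


def route(
    labels: List[AutoLabel], threshold: float = 0.8
) -> Tuple[List[AutoLabel], List[AutoLabel]]:
    """Split labels into (accepted, needs human review) by label-wise confidence."""
    accepted = [x for x in labels if x.confidence >= threshold]
    to_review = [x for x in labels if x.confidence < threshold]
    return accepted, to_review


def set_confidence(labels: List[AutoLabel]) -> float:
    """Overall set confidence, here simply the mean label confidence."""
    return mean(x.confidence for x in labels) if labels else 0.0


batch = [
    AutoLabel("a.jpg", "good_potato", 0.97),
    AutoLabel("b.jpg", "bad_potato", 0.55),
    AutoLabel("c.jpg", "good_potato", 0.82),
]
accepted, to_review = route(batch)
```

Human effort then goes only to `to_review`, while the set-level score gives a quick signal of whether a whole batch needs attention.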
Keep labels consistent
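One common way to check that labels stay consistent is to have two annotators label the same objects and compare their boxes by intersection over union (IoU). A sketch, assuming boxes are (x_min, y_min, x_max, y_max) in pixels, that the i-th box of each annotator refers to the same object, and that IoU below 0.5 counts as a disagreement; the talk does not say which consistency measure Superb AI uses:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def inconsistent_pairs(
    annotator_a: List[Box], annotator_b: List[Box], threshold: float = 0.5
) -> List[int]:
    """Indices where the two annotators' boxes for the same object disagree."""
    return [
        i for i, (a, b) in enumerate(zip(annotator_a, annotator_b))
        if iou(a, b) < threshold
    ]


a_boxes = [(0, 0, 10, 10), (20, 20, 40, 40)]
b_boxes = [(1, 0, 10, 10), (30, 20, 50, 40)]
flagged = inconsistent_pairs(a_boxes, b_boxes)
```

Flagged objects can be sent back for review, or used to refine the class definitions when many disagreements share a class.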
Summary
Source data analysis, user analysis, logs, task matching, etc.: there is still a lot to do.
Wrap-up
Usage examples with the SDK: next time! https://github.com/superb-AI-Suite/
Full-pipeline MLOps: https://ai-infrastructure.org/