Seven Sins of Data Science Newbie
_themessier
March 10, 2018
Presented at WiDS Mumbai 2018
Transcript
Seven Sins of a Newbie Data Scientist (and how not to commit them) - Sarah Masud, Red Hat
About Me. GitHub: sara-02. Blog: themessier.wordpress.com
Me Learning To Give Back: 1. Open Source Contributions 2. Blogs 3. Meetups, Conferences 4. Mentorship 5. Program review committees
Let’s begin ;) Image: https://commons.wikimedia.org/wiki/File:DataScienceLogo.png
Image: https://chroniclesofanassistant.wordpress.com/2010/11/14/first-day-of-work/
Image: https://www.kdnuggets.com/2016/10/big-data-science-expectation-reality.html
1: The Problem Statement. At College: “On a loan data-set, using logistic regression, determine if a person will default or not.”
1: The Problem Statement. At Work: “We have been collecting these data points for the past 3 years. See what can be done to monetize them.”
1: The Problem Statement. Solution: 1. Understand the business needs! 2. Then understand the data collected. 3. Finally, translate the vague problem into a known one.
2: Show Me the data. At College: “Use the data from Kaggle, the UCI repository, ImageNet, Wikipedia...”
Image: https://me.me/i/show-me-the-data-9747283
2: Show Me the data. At Work: “Use whatever data is legally available, but get this problem solved!”
2: Show Me the data. Solution: 1. Don’t expect someone to give you the data willingly! 2. Learn to deal with a lack of labelled data. 3. Learn web scraping and data ingestion pipelines.
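As a small illustration of the web scraping point above, here is a minimal sketch using only Python's standard library. The sample HTML, the `LinkExtractor` class, and the `extract_links` helper are hypothetical and not from the talk; a real ingestion pipeline would also need fetching, rate limiting, and a check that the data may legally be collected.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag it sees (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical page listing downloadable data files.
SAMPLE = (
    '<ul><li><a href="/data/2017.csv">2017</a></li>'
    '<li><a href="/data/2018.csv">2018</a></li></ul>'
)
print(extract_links(SAMPLE))
```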
3: Using A Missile Gun To Kill The Chicken. At College: “Sounds cool! Let me use this SOTA algorithm.”
Image: https://pbs.twimg.com/media/B83v847CUAAQHKg.jpg:large
3: Using A Missile Gun To Kill The Chicken. At Work: “Provide us with a cheap, accurate, stable solution.”
Image: https://www.someecards.com/usercards/viewcard/if-you-torture-the-data-they-will-confess-94dd7
3: Using A Missile Gun To Kill The Chicken. Solutions: 1. Not every problem needs to be a DS problem! 2. Use switch cases if that is enough. 3. Understand the business constraints.
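To make the "switch cases" advice above concrete, here is a minimal sketch of a rule-based baseline. The ticket-routing task, the categories, and the keywords are all hypothetical and not from the talk; the point is only that a handful of explicit rules can be tried before any model is trained.

```python
# Hypothetical rule table: (category, keywords that trigger it).
RULES = [
    ("billing", ("invoice", "refund", "charge")),
    ("login", ("password", "sign in", "locked out")),
]


def route_ticket(text):
    """Return the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general"  # fallback when no rule matches


print(route_ticket("I need a refund for last month"))
```

If a baseline like this already meets the business constraints, a learned model may not be needed at all; if it does not, it still gives a reference point to compare a model against.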
4: The Value of Your Work. At College: 1. Accuracy of the model. 2. Number of research papers. 3. Subject grade!
4: The Value of Your Work. At Work: 1. RoI. 2. RoI. 3. RoI.
Image: https://me.me/i/show-me-the-money-memes-11885126
4: The Value of Your Work. Solution: 1. Understand the business. 2. Optimise for accuracy vs cost. 3. Keep the end user in mind.
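One way to reason about accuracy vs cost is to put both on the same scale. The sketch below is an assumption-laden illustration, not the speaker's method: the candidate names, accuracies, costs, and the value assigned to each correct or wrong prediction are invented numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float      # fraction of correct predictions, between 0 and 1
    monthly_cost: float  # running and maintenance cost, in currency units


def net_monthly_value(candidate, predictions_per_month,
                      value_per_correct, cost_per_error):
    """Value earned by correct predictions minus error and running costs."""
    correct = candidate.accuracy * predictions_per_month
    wrong = predictions_per_month - correct
    return (correct * value_per_correct
            - wrong * cost_per_error
            - candidate.monthly_cost)


# Hypothetical candidates: a cheap baseline and a costlier, more accurate model.
candidates = [
    Candidate("simple rules", accuracy=0.80, monthly_cost=100.0),
    Candidate("large model", accuracy=0.85, monthly_cost=5000.0),
]

best = max(candidates, key=lambda c: net_monthly_value(c, 10_000, 0.5, 1.0))
print(best.name)
```

With these made-up numbers the extra five points of accuracy do not pay for the added running cost, which is the kind of trade-off the slide asks you to check.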
5: Serving the model. At College: “It is about building the most accurate system and running it from the terminal. And that is it!”
5: Serving the model. At Work: 1. How many concurrent users can we serve? 2. What time delay can we afford before we lose the customer?
5: Serving the model. Industry: 1. How is the model exposed to the UI? 2. Can the model be distributed? 3. Can the model scale with an increase in data?
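The concurrency and delay questions above can be given a first rough answer with Little's law (average requests in flight = arrival rate × time each request spends in the system). The sketch below assumes each worker handles one request at a time and ignores traffic bursts; the function name and the example numbers are illustrative.

```python
import math


def workers_needed(requests_per_second, latency_seconds):
    """Estimate workers required, assuming one request per worker at a time.

    Little's law: average in-flight requests = arrival rate * latency.
    """
    in_flight = requests_per_second * latency_seconds
    return max(1, math.ceil(in_flight))


# Hypothetical load: 200 requests per second, 250 ms per prediction.
print(workers_needed(200, 0.25))  # 50
```

This gives an average-case figure only; real capacity planning would add headroom for peaks and measure latency under load.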
6: Know Thy Audience. At College: “Technical mentors, peers.”
6: Know Thy Audience. At Work: “The audience is always a mixed bag.”
6: Know Thy Audience. Solution: 1. Know your concepts well. 2. Practise “teaching DS to your grandma” style conversations.
Image: http://www.combine-lab.com/if-you-cant-explain-it-simply-you-dont-understand-it-well-enough/
7: Entropy sets in. At College: “Build once, use once, and then forget it!”
7: Entropy sets in. At Work: “The same model and code can be used in production for years without replacement.”
7: Entropy sets in. Solution: 1. Build scalable, robust models. 2. Perform regular model evaluation. 3. Re-train the model from time to time.
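Regular evaluation and re-training can start from something as simple as tracking accuracy on recent labelled outcomes. The following is a minimal sketch under stated assumptions: the `AccuracyMonitor` class, the baseline, the tolerance, and the window size are hypothetical, and it presumes true labels eventually arrive for production predictions.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions (illustrative)."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline    # accuracy measured at deployment time
        self.tolerance = tolerance  # accepted drop before re-training
        self.outcomes = deque(maxlen=window)  # True where prediction was right

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return self.baseline
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge once the window is full, to avoid noisy early alarms.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        degraded = self.rolling_accuracy() < self.baseline - self.tolerance
        return window_full and degraded


# Hypothetical stream of (predicted, actual) pairs with a tiny window.
monitor = AccuracyMonitor(baseline=0.90, tolerance=0.05, window=4)
for predicted, actual in [(1, 1), (0, 1), (1, 0), (1, 1)]:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy(), monitor.needs_retraining())
```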
Love the problem, not your solution. Learn to Unlearn → Relearn → Remodel. BECAUSE ...
Image: https://www.cafepress.com/+entropy_always_wins_3_shot_glass,1289685014
Thank You. Q & A