Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Transcript
Big Data and Machine Learning APIs
Me: Sam Bessalah (@samklr)
Software Engineer, Freelance
Data Engineering, Distributed Systems, Machine Learning
Paris Data Geek Meetup (@DataParis)
A Big Data Legend …
[Diagram: web logs, sensors, and other data sources feed a pipeline that ends in data-driven decisions and smart applications]
BUT …
- Building big data infrastructure is no easy task.
- Leveraging data for decision making requires a mix of multiple skills:
  . System engineering
  . Distributed computing
  . Statistics
  . Machine learning
Solutions …
- Build data platforms as a service.
- Build robust and consistent APIs to bring big data to the masses.
- Leverage fluent APIs for fast data science.
Big Data is not just about throwing data at Hadoop.
It’s also about data pipelines.
Data Sources
- High-throughput distributed messaging platform
- Publish-subscribe model
- Modelled as a distributed, replicated log
- Persists messages to disk
- Categorizes messages into topics
- Retains messages for a specified, possibly long, amount of time
- Allows stream replay in case of failure
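The properties listed above resemble a log-structured broker such as Apache Kafka, although the slide text does not name the product. As a rough illustration of the idea only, here is a minimal in-memory sketch of a per-topic append-only log with offset-based replay. The `TopicLog` class and its methods are hypothetical names invented for this example; a real platform adds replication, disk persistence, and retention policies.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a replicated, disk-backed log."""

    def __init__(self):
        # One append-only list of messages per topic.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every retained message from `offset` onward."""
        return list(self._topics[topic][offset:])


log = TopicLog()
log.publish("weblogs", "GET /index.html")
log.publish("weblogs", "GET /about.html")
log.publish("sensors", "temp=21.5")

# A consumer that failed after handling offset 0 resumes from offset 1.
replayed = log.read("weblogs", offset=1)
```

Because messages are never removed on read, any consumer can re-read a topic from an earlier offset, which is what makes replay after a failure possible.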
[Diagram: data sources feed machine learning, high-latency batch apps, and real-time processing]
How do you build an API around that?
REST API
/ingest
/query
/trainModel
/process
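One way to picture the four endpoints is as a routing table that maps each path to a handler. The sketch below is an assumption about how such an API could be wired, not the design from the talk: the handler names, payload fields, and responses are all hypothetical, and no HTTP server is started.

```python
import json


def ingest(payload):
    # Hypothetical handler: acknowledge how many records were received.
    return {"accepted": len(payload.get("records", []))}


def query(payload):
    # Hypothetical handler: echo the query that would be executed.
    return {"query": payload.get("q", ""), "rows": []}


def train_model(payload):
    # Hypothetical handler: pretend to schedule a training job.
    return {"job": "train", "dataset": payload.get("dataset")}


def process(payload):
    # Hypothetical handler: pretend to schedule a processing job.
    return {"job": "process", "pipeline": payload.get("pipeline")}


ROUTES = {
    "/ingest": ingest,
    "/query": query,
    "/trainModel": train_model,
    "/process": process,
}


def handle(path, body):
    """Dispatch a JSON request body to the handler registered for `path`."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    return 200, json.dumps(handler(payload))


status, response = handle("/ingest", '{"records": [1, 2, 3]}')
```

Keeping the routing table separate from the handlers means the same dispatch function can sit behind any web framework.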
Things to be careful with
- Multitenancy (YARN, Mesos, Docker…)
- Job scheduling
- Security
- Serialisation: ProtoBuf, Thrift, Avro
- Storage format: optimize queries with columnar storage.
- Compression: LZO, Snappy
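To make the storage-format and compression points concrete, here is a small sketch that lays the same records out row-wise and column-wise. It is only an illustration under stated assumptions: the sample records are made up, JSON stands in for a real serialisation format such as Avro, and `zlib` from the standard library stands in for LZO or Snappy.

```python
import json
import zlib

# Made-up sample records for illustration.
rows = [
    {"user": "u%d" % (i % 50), "country": "FR", "latency_ms": i % 7}
    for i in range(1000)
]

# Row layout: complete records stored one after another.
row_bytes = json.dumps(rows).encode("utf-8")

# Columnar layout: all values of one field stored together.
columns = {key: [r[key] for r in rows] for key in rows[0]}
column_bytes = {k: json.dumps(v).encode("utf-8") for k, v in columns.items()}

# A query over a single field only has to decode that one column.
latencies = json.loads(column_bytes["latency_ms"])
mean_latency = sum(latencies) / len(latencies)

# Compress both layouts (zlib is a stand-in for LZO or Snappy).
row_compressed = len(zlib.compress(row_bytes))
col_compressed = sum(len(zlib.compress(b)) for b in column_bytes.values())

print("bytes read for the query:", len(column_bytes["latency_ms"]), "of", len(row_bytes))
print("compressed sizes (row, columnar):", row_compressed, col_compressed)
```

The query touches only the `latency_ms` column, so it reads a fraction of the bytes that a scan of the row layout would need.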
Making sense of data …
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Training: text, images, etc. -> feature extraction -> learning algorithm -> predictive model
Prediction: new data -> feature vector -> predictive model -> prediction
The last build of the slide labels this pipeline: BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, images, etc. -> feature extraction -> predictive model; new data -> prediction
X = vect.fit_transform(input)
clf.fit(X, y)
X_new = vect.transform(new_input)  # reuse the fitted vectorizer; do not refit on new data
y_new = clf.predict(X_new)
http://arxiv.org/abs/1309.0238
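The fit/transform/predict pattern can be shown without installing anything. The sketch below mimics the scikit-learn calling convention using two hypothetical classes, `TinyVectorizer` and `TinyClassifier`, written only for this example; they are not scikit-learn's implementations. The point it illustrates is that new data should go through `transform`, so that it is encoded with the vocabulary learned during training.

```python
class TinyVectorizer:
    """Hypothetical bag-of-words vectorizer mimicking fit/transform."""

    def fit(self, docs):
        words = sorted({w for d in docs for w in d.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        vectors = []
        for d in docs:
            v = [0] * len(self.vocabulary_)
            for w in d.lower().split():
                if w in self.vocabulary_:  # words unseen in training are ignored
                    v[self.vocabulary_[w]] += 1
            vectors.append(v)
        return vectors

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


class TinyClassifier:
    """Hypothetical classifier: scores a class by overlap with its word counts."""

    def fit(self, X, y):
        self.totals_ = {}
        for vec, label in zip(X, y):
            acc = self.totals_.setdefault(label, [0] * len(vec))
            for i, count in enumerate(vec):
                acc[i] += count
        return self

    def predict(self, X):
        def score(vec, label):
            return sum(a * b for a, b in zip(vec, self.totals_[label]))
        return [max(sorted(self.totals_), key=lambda lb: score(vec, lb)) for vec in X]


train_docs = ["cheap pills buy now", "meeting agenda for monday", "buy cheap watches"]
train_labels = ["spam", "ham", "spam"]
new_docs = ["buy pills", "monday meeting"]

vect, clf = TinyVectorizer(), TinyClassifier()
X = vect.fit_transform(train_docs)  # learn the vocabulary, then vectorize
clf.fit(X, train_labels)
X_new = vect.transform(new_docs)    # reuse the learned vocabulary
y_new = clf.predict(X_new)
```

Calling `fit_transform` on the new documents instead would relearn the vocabulary from them, producing vectors whose columns no longer line up with what the classifier was trained on.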
From library to web APIs
Machine Learning workflow
[Diagram: the black box (feature extraction, learning algorithm, predictive model) is replaced by a Predictive API; remaining labels: text, images, etc.; transformed data; application; prediction]
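Seen from the application, a predictive API reduces the whole workflow to "send input, receive prediction". A minimal sketch of that contract follows. The function names, the `input` and `prediction` JSON fields, and the toy keyword rule standing in for a trained model are all assumptions made for illustration.

```python
import json


def black_box_model(text):
    # Hypothetical stand-in for a trained model hidden behind the API.
    return "positive" if "good" in text.lower() else "negative"


def predict_api(request_body):
    """Accept raw input as JSON and return a JSON prediction."""
    data = json.loads(request_body)
    label = black_box_model(data["input"])
    return json.dumps({"prediction": label})


response = predict_api('{"input": "A good talk"}')
```

The caller never sees feature extraction or the learning algorithm, which is convenient but also the source of the black-boxing concern raised later in the deck.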
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and prediction are just a small part of the process.
- Data locality and data gravity
- Support for the full workflow
- Verticalization of platforms
- Scalability
- Collaboration and interoperability
- Black boxing of implementations
Explore machine learning for API orchestration. Talk to Ori (@OriPekelman).
Next frontier? Or current reality?
http://speakerdeck.com/samklr