Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sam Bessalah
December 03, 2014
Technology
4
250
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
340
Intro to Parquet (June 2015)
samklr
0
280
High Performance RPC with Finagle
samklr
1
170
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
790
Datageeks_27-05.pdf
samklr
0
51
Scalable Machine Learning
samklr
2
220
mesos.devoxx.2014
samklr
2
240
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.8k
Algebra for analytics
samklr
1
280
Other Decks in Technology
See All in Technology
CSSの最新トレンド Ver.2025
tonkotsuboy_com
11
3.8k
Spring for GraphQLって実際どうなの?〜小規模スタートアップの事例紹介〜
kogayushi
0
160
Vibe Codingの裏で、 考える力をどう取り戻すか
csekine
0
320
Tenstorrent 開発者プログラム
tenstorrent_japan
0
150
Two-Tower モデルで実現する 検索リランキング / Shibuya_AI_2
visional_engineering_and_design
2
120
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
6.2k
Introduction to Bill One Development Engineer
sansan33
PRO
0
240
20250612_GitHubを使いこなすためにソニーの開発現場が取り組んでいるプラクティス.pdf
osakiy8
1
310
現場で役立つAPIデザイン
nagix
1
180
Flutterアプリを⾃然⾔語で操作する
yukisakai1225
0
210
うちの会社の評判は?SNSの投稿分析にAIを使ってみた
doumae
0
610
Tensix Core アーキテクチャ解説
tenstorrent_japan
0
170
Featured
See All Featured
The Illustrated Children's Guide to Kubernetes
chrisshort
48
50k
KATA
mclloyd
29
14k
Build The Right Thing And Hit Your Dates
maggiecrowley
35
2.7k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
45
9.6k
Balancing Empowerment & Direction
lara
1
100
Building Applications with DynamoDB
mza
95
6.4k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
123
52k
Code Reviewing Like a Champion
maltzj
524
40k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
35
2.3k
Thoughts on Productivity
jonyablonski
69
4.7k
Being A Developer After 40
akosma
91
590k
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr