Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Transcript
Big Data and Machine Learning APIs
About me:
Sam Bessalah (@samklr)
Software Engineer, Freelance
Data Engineering, Distributed Systems, Machine Learning
Paris Data Geek Meetup (@DataParis)
A Big Data Legend …
Web logs, Sensors, Other Data sources -> Data Driven Decisions, Smart Applications
BUT ….
- Building big data infrastructure is no easy task.
- Leveraging data for decision making requires a mix of multiple skills:
  . System Engineering
  . Distributed computing
  . Statistics
  . Machine Learning
Solutions …
- Build data platforms as a service.
- Build robust and consistent APIs to bring big data to the masses.
- Leverage fluent APIs for fast data science.
Big Data is not just about throwing data at Hadoop.
It’s also about data pipelines.
Data Sources
- High-throughput distributed messaging platform
- Publish/subscribe model
- Modelled as a distributed replicated log
- Persists messages to disk
- Categorizes messages into topics
- Allows message retention for a long, specified amount of time
- Allows stream replay in case of failure
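The topic, retention and replay properties listed above can be illustrated with a toy in-memory log. This is only a sketch: `TopicLog`, `publish` and `read_from` are hypothetical names chosen for illustration, not the API of any real messaging platform, and the sketch does no replication or disk persistence.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log, one list of messages per topic (hypothetical API)."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message and return its offset within the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read_from(self, topic, offset=0):
        """Return every retained message at or after `offset`."""
        return list(self._topics[topic][offset:])


log = TopicLog()
log.publish("weblogs", "GET /index.html")
log.publish("weblogs", "GET /about.html")
log.publish("sensors", "temp=21.5")

# A consumer that failed after processing offset 0 replays from offset 1.
replayed = log.read_from("weblogs", offset=1)
```

Because messages are never removed when read, any consumer can restart from an earlier offset, which is what makes replay after a failure possible.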
Data Sources -> Machine Learning, High Latency Batch Apps, Real Time Processing
How do you build an API around that?
REST API
/ingest
/query
/trainModel
/process
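One way to picture these endpoints is a small dispatch table that maps each path to a handler. The sketch below is an assumption-laden illustration: the handler bodies, the JSON payload shapes and the in-memory `EVENTS` store are all invented here, and `/process` is left out because the slides do not say what it does.

```python
import json

# In-memory stand-ins for the real storage and model layers (assumptions).
EVENTS = []
MODEL = {"trained_on": 0}


def ingest(payload):
    """Store one incoming event."""
    EVENTS.append(payload)
    return {"stored": len(EVENTS)}


def query(payload):
    """Return the stored events that match the requested source."""
    source = payload.get("source")
    return {"events": [e for e in EVENTS if e.get("source") == source]}


def train_model(payload):
    """Pretend to train: record how many events the model has seen."""
    MODEL["trained_on"] = len(EVENTS)
    return {"trained_on": MODEL["trained_on"]}


ROUTES = {
    ("POST", "/ingest"): ingest,
    ("POST", "/query"): query,
    ("POST", "/trainModel"): train_model,
}


def handle(method, path, body):
    """Dispatch one request; returns (status_code, json_text)."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    return 200, json.dumps(handler(json.loads(body)))


status, text = handle("POST", "/ingest", '{"source": "weblogs", "line": "GET /"}')
```

Keeping the routing in one table makes it easy to put the same concerns (authentication, scheduling, quotas) in front of every endpoint.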
Things to be careful with
- Multitenancy (YARN, Mesos, Docker…)
- Job scheduling
- Security
- Serialisation: ProtoBuf, Thrift, Avro
- Storage format: optimize queries with columnar storage.
- Compression: LZO, Snappy
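The columnar storage point can be made concrete with plain Python structures. This is a sketch with made-up records, not a real storage format: it only shows why a query that needs one field reads less data when values are grouped by column.

```python
# Row-oriented layout: one record per event (illustrative data).
rows = [
    {"user": "a", "country": "FR", "bytes": 120},
    {"user": "b", "country": "US", "bytes": 300},
    {"user": "c", "country": "FR", "bytes": 80},
]

# Column-oriented layout: one list per field, same data.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# In the row layout, summing one field walks over every full record...
total_from_rows = sum(row["bytes"] for row in rows)

# ...while the column layout only touches the single list it needs.
total_from_columns = sum(columns["bytes"])
```

Grouping similar values together also tends to help compression, which is why the two items sit next to each other in the list above.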
Making sense of data …
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Training: Text, Images, etc -> Feature Extraction -> Learning algorithm -> Predictive Model
Prediction: New Data -> Feature Vector -> Predictive Model -> Prediction
BLACK BOX: Feature Extraction, Learning algorithm, Predictive Model
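The workflow above can be sketched end to end in a few lines of standard-library Python. This is a deliberately naive illustration, not the method from the talk: the bag-of-words features, the word-overlap scoring and the tiny spam/ham data set are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extraction: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def train(examples):
    """Training: sum the word counts seen for each label."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(extract_features(text))
    return model


def predict(model, text):
    """Prediction: pick the label whose training words overlap most."""
    features = extract_features(text)
    scores = {
        label: sum(count * counts[word] for word, count in features.items())
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


training_data = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training_data)
label = predict(model, "buy cheap pills")
```

The same `extract_features` function is used for training and for new data, which is the part of the workflow that is easiest to get wrong.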
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc -> Feature Extraction -> Predictive Model; New Data -> Prediction
X = vect.fit_transform(input)
clf.fit(X, y)
X_new = vect.transform(new_input)  # reuse the fitted vectorizer; do not refit on new data
y_new = clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
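From the application's point of view, a predictive API reduces to "send data, get a prediction back". The sketch below shows that contract as a single function. Everything in it is hypothetical: `model_predict` is a stand-in for a trained model, and the JSON field names `text` and `prediction` are assumptions, not a format defined in the talk.

```python
import json


def model_predict(text):
    """Stand-in for a trained model (assumption): flags texts containing 'buy'."""
    return "spam" if "buy" in text.lower().split() else "ham"


def predict_endpoint(request_body):
    """Accept a JSON request, hide feature extraction and the model, return JSON."""
    try:
        text = json.loads(request_body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' string"})
    return 200, json.dumps({"prediction": model_predict(text)})


status, body = predict_endpoint('{"text": "Buy cheap watches"}')
```

The caller never sees feature vectors or model internals, which is convenient but is also the "black boxing" concern raised later in the deck.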
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the process
- Data locality and data gravity
- Support the full workflow
- Verticalization of platforms
- Scalability
- Collaboration and interoperability
- Black boxing of implementations
Explore machine learning for API orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
http://speakerdeck.com/samklr