Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Sam Bessalah
December 03, 2014
Technology
290
4
Share
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
370
Intro to Parquet (June 2015)
samklr
0
320
High Performance RPC with Finagle
samklr
1
220
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
830
Datageeks_27-05.pdf
samklr
0
74
Scalable Machine Learning
samklr
2
260
mesos.devoxx.2014
samklr
2
290
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
3k
Algebra for analytics
samklr
1
310
Other Decks in Technology
See All in Technology
NOSTR, réseau social et espace de liberté décentralisé
rlifchitz
0
160
デシリアライゼーションを理解する / Inside Deserialization
tomzoh
0
260
New CBs New Challenges
ysuzuki
1
170
Databricksで構築するログ検索基盤とアーキテクチャ設計
cscengineer
0
160
会社紹介資料 / Sansan Company Profile
sansan33
PRO
16
410k
Bill One 開発エンジニア 紹介資料
sansan33
PRO
5
18k
Zero-Downtime Migration: Moving a Massive, Historic iOS App from CocoaPods to SPM and Tuist without Stopping Feature Delivery
kagemiku
0
230
新規サービス開発におけるReact Nativeのリアル〜技術選定の裏側と実践的OSS活用〜
grandbig
2
180
GitHub Copilotを極める会 - 開発者のための活用術
findy_eventslides
6
4k
インフラを Excel 管理していた組織が 3 ヶ月で IaC 化されるまで
geekplus_tech
3
180
非エンジニア職からZOZOへ 〜登壇がキャリアに与えた影響〜
penpeen
0
320
幾億の壁を超えて/Beyond Countless Walls(JP)
ikuodanaka
0
120
Featured
See All Featured
Facilitating Awesome Meetings
lara
57
6.8k
Design in an AI World
tapps
0
190
Unsuck your backbone
ammeep
672
58k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
2
1.4k
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.2k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
9k
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2k
WENDY [Excerpt]
tessaabrams
9
37k
Become a Pro
speakerdeck
PRO
31
5.9k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.6k
The innovator’s Mindset - Leading Through an Era of Exponential Change - McGill University 2025
jdejongh
PRO
1
150
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr