Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Speaker Deck
PRO
Sign in
Sign up for free
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Technology
4
230
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
290
Intro to Parquet (June 2015)
samklr
0
200
High Performance RPC with Finagle
samklr
1
140
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
660
Datageeks_27-05.pdf
samklr
0
41
Scalable Machine Learning
samklr
2
190
mesos.devoxx.2014
samklr
2
210
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.4k
Algebra for analytics
samklr
1
240
Other Decks in Technology
See All in Technology
AWS CLI でやってみる ~ AWS Hands-on for Beginners ECS ハンズオン ~
kentosuzuki
1
560
品質特性のすすめ
honamin09
0
190
森林情報のオープンデータと QGISでの利用
kou_kita
0
340
ReverseETLでユーザーに価値を届ける基盤を実現した話
hakky
0
360
Autonomous Database Cloud 技術詳細 / adb-s_technical_detail_jp
oracle4engineer
PRO
10
19k
PMMやプロダクト関係者と協働するために役割を整理した話 / 20220810_pdmtipslt
rakus_dev
0
130
年700万円損するサーバレスの 認可システムをご紹介します!!
higuuu
3
360
ログ集約基盤をCloudWatchからOpenSearchに変えてみた
yuhta28
0
140
金融スタートアップの上場準備で大事にしたマインドセット / 2022-08-04-the-mindset-in-preparing-for-ipo
stajima
0
340
AWSを使う上で意識しておきたい、クラウドセキュリティ超入門(駆け足版)
kkmory
0
230
増田亨さんによる 「設計の考え方とやり方」勉強会オープニング
tsuyok
0
240
やってみたLT会 Fleet Managerのススメ
yukiiiiikuma
PRO
0
420
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
223
49k
Designing with Data
zakiwarfel
91
4k
Imperfection Machines: The Place of Print at Facebook
scottboms
253
12k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
15
990
What's new in Ruby 2.0
geeforr
335
30k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
316
19k
Web development in the modern age
philhawksworth
197
9.4k
Why Our Code Smells
bkeepers
PRO
324
55k
WebSockets: Embracing the real-time Web
robhawkes
57
5.6k
Typedesign – Prime Four
hannesfritz
34
1.4k
Statistics for Hackers
jakevdp
782
210k
Fashionably flexible responsive web design (full day workshop)
malarkey
396
62k
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr