Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Transcript
Big Data and Machine Learning APIs
me: Sam Bessalah, @samklr
Software Engineer, Freelance
Data Engineering, Distributed systems, Machine Learning
Paris Data Geek Meetup, @DataParis
A Big Data Legend …
Web logs, Sensors, Other Data sources … Data Driven Decisions, Smart Applications
BUT ….
- Building big data infrastructures is no easy task.
- Leveraging data for decision making requires a mix of multiple skills:
  . System Engineering
  . Distributed computing
  . Statistics
  . Machine Learning
Solutions ….
- Build data platforms as a service.
- Build robust and consistent APIs to bring big data to the masses.
- Leverage fluent APIs for fast data science.
Big Data is not just about throwing data at Hadoop.
It’s also about data pipelines.
Data Sources
- High-throughput distributed messaging platform
- Publish-subscribe model
- Modelled as a distributed replicated log
- Persists messages to disk
- Categorizes messages into topics
- Allows message retention for a long, specified amount of time
- Allows stream replay in case of failure
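The log-based messaging properties listed on this slide (topics, retention, replay) can be illustrated with a minimal in-memory sketch. It is not any real broker's API: the class `TopicLog`, its method names, and the `now` parameter are hypothetical, and replication and disk persistence are deliberately left out.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a replicated, disk-backed message log."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        # topic name -> list of (offset, timestamp, message)
        self._topics = defaultdict(list)
        self._next_offset = defaultdict(int)

    def publish(self, topic, message, now):
        """Append a message to a topic and return its offset."""
        offset = self._next_offset[topic]
        self._topics[topic].append((offset, now, message))
        self._next_offset[topic] = offset + 1
        return offset

    def expire(self, now):
        """Drop messages older than the retention window."""
        cutoff = now - self.retention_seconds
        for topic in list(self._topics):
            kept = [e for e in self._topics[topic] if e[1] >= cutoff]
            self._topics[topic] = kept

    def read(self, topic, from_offset=0):
        """Return (offset, message) pairs at or after from_offset."""
        return [(o, m) for o, _, m in self._topics[topic] if o >= from_offset]


log = TopicLog(retention_seconds=3600)
log.publish("weblogs", "GET /index.html", now=0)
log.publish("weblogs", "GET /about.html", now=10)

# A consumer that failed after processing offset 0 replays from offset 1.
replayed = log.read("weblogs", from_offset=1)
```

Because consumers track their own offsets and the log keeps messages for the whole retention window, a failed consumer can restart and re-read from any offset it remembers.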
Data Sources: Machine Learning, High Latency Batch Apps, Real Time Processing
How do you build an API around that?
REST API: /ingest /query /trainModel /process
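One way to picture a REST API over these four endpoints is a small dispatch table. This is only a sketch: the endpoint paths come from the slide, but the HTTP methods, the handler bodies, and the JSON payload shapes are all assumptions made for illustration.

```python
import json


# Placeholder handlers (hypothetical): a real platform would hand off to
# the message log, the query engine, the training job and the stream processor.
def ingest(payload):
    return {"accepted": len(payload.get("records", []))}


def query(payload):
    return {"query": payload.get("q", ""), "rows": []}


def train_model(payload):
    return {"job": "train", "status": "submitted"}


def process(payload):
    return {"job": "process", "status": "submitted"}


ROUTES = {
    ("POST", "/ingest"): ingest,
    ("POST", "/query"): query,
    ("POST", "/trainModel"): train_model,
    ("POST", "/process"): process,
}


def dispatch(method, path, body):
    """Map an HTTP-style request to a handler; return (status, JSON body)."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    return 200, json.dumps(handler(payload))


status, response = dispatch("POST", "/ingest", '{"records": [1, 2, 3]}')
```

Keeping routing separate from the handlers lets the same table sit behind any HTTP server, which is one plausible reading of "robust and consistent APIs".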
Things to be careful with
- Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling
- Security
- Serialisation: ProtoBuf, Thrift, Avro
- Storage Format: optimize queries with columnar storage.
- Compression: LZO, Snappy
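The storage format and compression points can be sketched with the standard library alone. Here JSON stands in for ProtoBuf, Thrift or Avro, and `zlib` stands in for LZO or Snappy (neither ships with Python); the sample records and the helper `to_columns` are made up for illustration, and every row is assumed to have the same fields.

```python
import json
import zlib

# Hypothetical web-log records, stored row by row.
rows = [
    {"user": "u%d" % (i % 5), "status": 200, "bytes": 100 + i}
    for i in range(1000)
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# A query over a single field only needs that column,
# not every field of every row.
total_bytes = sum(columns["bytes"])

# Each column is serialised and compressed independently, so a reader
# can decompress just the columns a query touches.
column_blobs = {
    name: zlib.compress(json.dumps(values).encode("utf-8"))
    for name, values in columns.items()
}
status_values = json.loads(zlib.decompress(column_blobs["status"]))
```

Values within one column tend to resemble each other, which is the usual argument for why columnar layouts also compress well; the sketch does not measure that.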
Making sense of data …
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
- Training: Text, Images, etc; Feature Extraction; Learning algorithm; Predictive Model
- Prediction: New Data; Feature Vector; Predictive Model; Prediction
- BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc; Feature Extraction; Predictive Model; New Data; Prediction
X = vect.fit_transform(input)
clf.fit(X, y)
X_new = vect.transform(new_input)  # reuse the fitted vectorizer; do not refit on new data
y_new = clf.predict(X_new)
http://arxiv.org/abs/1309.0238
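To make the fit / transform / predict pattern concrete without depending on scikit-learn itself, here is a toy re-creation in plain Python. `BagOfWords` and `CentroidClassifier` are hypothetical classes written for this sketch, not scikit-learn's estimators, and the training texts are invented. The key point is that new data goes through `transform`, so it lands in the feature space learned at training time.

```python
from collections import Counter


class BagOfWords:
    """Toy vectorizer: fit learns a vocabulary, transform counts words."""

    def fit(self, texts):
        words = sorted({w for t in texts for w in t.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, texts):
        vectors = []
        for t in texts:
            counts = Counter(t.lower().split())
            # Words outside the fitted vocabulary are ignored.
            vectors.append([counts.get(w, 0) for w in self.vocabulary_])
        return vectors

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


class CentroidClassifier:
    """Toy classifier: predicts the label whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids_ = {
            label: [v / counts[label] for v in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: dist(x, self.centroids_[c]))
            for x in X
        ]


texts = ["cheap pills buy now", "meeting agenda for monday",
         "buy cheap watches", "monday project meeting"]
y = ["spam", "ham", "spam", "ham"]

vect, clf = BagOfWords(), CentroidClassifier()
X = vect.fit_transform(texts)                # learn vocabulary on training data
clf.fit(X, y)
X_new = vect.transform(["buy cheap pills"])  # reuse it for new data
y_new = clf.predict(X_new)
```

Calling `fit_transform` on the new text instead would rebuild the vocabulary from that text alone, so the new vector would no longer line up with the columns the classifier was trained on.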
From library to web APIs
Machine Learning workflow: Text, Images, etc; Feature Extraction; Learning algorithm; Predictive Model; New Data; Prediction; BLACK BOX
Machine Learning workflow: Text, Images, etc; Transformed Data; Application; Prediction; Predictive API
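The step from library to predictive API can be sketched as a function that accepts and returns JSON while hiding the model entirely. Everything here is an assumption made for illustration: the `KeywordModel` stand-in, the `text` and `prediction` field names, and the `make_predict_endpoint` helper do not come from any particular service.

```python
import json


class KeywordModel:
    """Hypothetical stand-in for any trained model hidden behind the API."""

    def predict(self, text):
        return "spam" if "buy" in text.lower().split() else "ham"


def make_predict_endpoint(model):
    """Wrap a model so callers only ever see JSON in and JSON out."""

    def predict_endpoint(request_body):
        request = json.loads(request_body)
        label = model.predict(request["text"])
        return json.dumps({"prediction": label})

    return predict_endpoint


endpoint = make_predict_endpoint(KeywordModel())
response = endpoint('{"text": "Buy cheap pills"}')
```

The application never imports the model or its feature extraction code, which is what makes the implementation a black box from the caller's side.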
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the process
- Data locality and data gravity
- Support the full workflow
- Verticalization of platforms
- Scalability
- Collaboration and interoperability
- Black boxing of implementations
Explore machine learning for API orchestration. Talk to Ori @OriPekelman
Next frontier? Or actual reality?
http://speakerdeck.com/samklr