Machine Learning with Clojure and Apache Spark
Eric Weinstein
October 25, 2016
Slides for my EuroClojure 2016 talk on machine learning.
Transcript
Machine Learning with Clojure and Apache Spark
;; Eric Weinstein
;; EuroClojure 2016
;; Bratislava, Slovakia
;; 25 October 2016
for Joshua
Part 0: Hello!
About Me
    (def eric-weinstein
      {:employer "Hulu"
       :github   "ericqweinstein"
       :twitter  "ericqweinstein"
       :website  "ericweinste.in"})
30% off with EURORUBY30!
Agenda
• Machine learning
• Apache Spark
• Flambo vs. Sparkling
• DL4J, deep learning, and convolutional neural networks
Part 1: ⚡✨
What’s machine learning?
In a word:
Generalization
What’s Supervised Learning?
Classification or regression, generalizing from labeled data to unlabeled data
What’s Apache Spark?
Apache Spark is an open-source cluster computing framework; its parallelism makes it ideal for processing large data sets, and in ML, the more data, the better!
Some Spark Terminology
• RDD: Resilient Distributed Dataset
• Dataset: RDD + Spark SQL execution engine
• DataFrame: Dataset organized into named columns
Our Data
• Police stop data for the city of Los Angeles, California in 2015
• 4 features, ~600,000 instances
• http://bit.ly/2f9jVwn
Features && Labels
• Sex (Male | Female)
• Race (American Indian | Asian | Black | Hispanic | White | Other)
• Stop type (Pedestrian | Vehicle)
• Post-stop activity (Yes | No)
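One way to turn these categorical values into the numeric vectors a decision tree learner expects is to index each category. The sketch below is an illustration only: the category ordering and the names `categories`, `feature-order`, `encode`, and `categorical-features-info` are assumptions, not taken from the talk's code. Post-stop activity is treated here as the label, so it is left out of the feature vector.

```clojure
;; Hypothetical encoding: each category is mapped to its position in the vector.
(def categories
  {:sex       ["Male" "Female"]
   :race      ["American Indian" "Asian" "Black" "Hispanic" "White" "Other"]
   :stop-type ["Pedestrian" "Vehicle"]})

;; Order in which features appear in the encoded vector (an assumption).
(def feature-order [:sex :race :stop-type])

(defn encode
  "Turns a record such as {:sex \"Female\" ...} into a vector of doubles,
  one category index per feature."
  [record]
  (mapv (fn [k] (double (.indexOf (categories k) (record k))))
        feature-order))

;; Feature index -> number of categories, the shape of map that a
;; categorical-features-info argument describes.
(def categorical-features-info
  (into {} (map-indexed (fn [i k] [i (count (categories k))]) feature-order)))

(encode {:sex "Female" :race "Black" :stop-type "Vehicle"})
```

With this ordering, the record above encodes to `[1.0 2.0 1.0]` and the info map is `{0 2, 1 6, 2 2}`.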
Decision Trees
[Figure: decision tree diagram. Root node: X[0] <= 0.5, gini = 0.4033, samples = 139572, value = [100477, 39095]. Deeper nodes split on X[1]; each node shows its Gini impurity, sample count, and per-class counts.]
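The gini figure shown for each node can be reproduced from that node's class counts. The following is a minimal sketch of the standard Gini impurity formula (one minus the sum of squared class proportions); the function name `gini` is an illustrative choice, not something from the talk.

```clojure
(defn gini
  "Gini impurity of a node, given its per-class sample counts."
  [counts]
  (let [total (double (reduce + counts))]
    (- 1.0
       (reduce + (map (fn [c]
                        (let [p (/ c total)]
                          (* p p)))
                      counts)))))

;; Root node of the tree above: value = [100477, 39095]
(gini [100477 39095])
```

For the root node's counts this evaluates to approximately 0.4033, matching the value in the diagram. A node containing a single class has impurity 0.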
Part 2: A Tale of Two DSLs
[Image] vs. ✨✨
Image credit: Adventure Time
Flambo Example
    ;; Assumes (:require [flambo.conf :as conf] [flambo.api :as f])
    (defn make-spark-context
      "Creates the Apache Spark context using the Flambo DSL."
      []
      (-> (conf/spark-conf)
          (conf/master "local")
          (conf/app-name "euroclojure")
          (f/spark-context)))
Sparkling Example
    ;; Assumes (:require [sparkling.conf :as conf] [sparkling.core :as spark])
    (defn make-spark-context
      "Creates the Apache Spark context using the Sparkling DSL."
      []
      (-> (conf/spark-conf)
          (conf/master "local")
          (conf/app-name "euroclojure")
          (spark/spark-context)))
Straight Spark
    ;; Assumes (:import [org.apache.spark.mllib.tree DecisionTree])
    (def model
      (DecisionTree/trainClassifier training
                                    2
                                    categorical-features-info
                                    "gini"
                                    5
                                    32)) ; max depth: 5, max bins: 32

    (defn predict
      [p] ; LabeledPoint
      (let [prediction (.predict model (.features p))]
        [(.label p) prediction]))
Accuracy: 0.77352
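Since `predict` returns a `[label prediction]` pair for each point, accuracy is the fraction of pairs whose two elements agree. A minimal sketch on a plain Clojure collection follows; the function name `accuracy` and the sample pairs are assumptions for illustration, and this is not the Spark job the talk ran.

```clojure
(defn accuracy
  "Fraction of [label prediction] pairs in which the prediction equals the label."
  [pairs]
  (let [correct (count (filter (fn [[label prediction]] (== label prediction))
                               pairs))]
    (/ correct (double (count pairs)))))

;; Made-up pairs: three of the four predictions match their labels.
(accuracy [[1.0 1.0] [0.0 1.0] [0.0 0.0] [1.0 1.0]])
```

On the made-up pairs above the result is 0.75.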
Part 3: Deep Learning
What’s Deep Learning?
• Neural networks (computational architecture modeled after the human brain)
• Neural networks with many layers (> 1 hidden layer, but in practice, can be hundreds)
• The vanishing/exploding gradient problem
Vanishing && Exploding Gradients
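A rough way to see the problem: backpropagation multiplies one factor per layer, so the product shrinks or grows exponentially with depth. The sketch below simplifies heavily by assuming the same factor at every layer; `gradient-scale` is a hypothetical name. The value 0.25 is the maximum of the sigmoid's derivative, and 1.5 is an arbitrary factor greater than 1.

```clojure
(defn gradient-scale
  "Product of an identical per-layer factor across the given number of layers."
  [per-layer-factor layers]
  (Math/pow (double per-layer-factor) (double layers)))

(gradient-scale 0.25 10) ; factors below 1 shrink the gradient toward zero
(gradient-scale 1.5 50)  ; factors above 1 blow it up
```

Ten layers at a factor of 0.25 already scale the gradient below one millionth, while fifty layers at 1.5 scale it past one hundred million.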
Image credit for all ConvNet images: https://deeplearning4j.org/convolutionalnets
Max Pooling/Downsampling
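Max pooling keeps only the largest value in each window, which shrinks the feature map. Here is a minimal pure-Clojure sketch of 2x2 pooling with stride 2. The function name `max-pool-2x2` is illustrative; this is not DL4J's implementation, and it assumes a matrix with an even number of rows and columns.

```clojure
(defn max-pool-2x2
  "Downsamples a matrix (a vector of equal-length row vectors) by taking the
  maximum of each non-overlapping 2x2 window. Assumes even dimensions."
  [matrix]
  (vec
   (for [[top bottom] (partition 2 matrix)]
     (mapv (fn [top-pair bottom-pair]
             (apply max (concat top-pair bottom-pair)))
           (partition 2 top)
           (partition 2 bottom)))))

(max-pool-2x2 [[1 3 2 4]
               [5 6 7 8]
               [3 2 1 0]
               [1 2 3 4]])
```

The 4x4 input above is reduced to the 2x2 matrix `[[6 8] [3 4]]`.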
Alternating Layers
Our Data
Image credit: http://digitalmedia.fws.gov/cdm/
What’s DL4J?
• DL4J == Deep Learning 4 Java, a library (for Java, unsurprisingly)
• Examples on GitHub: https://github.com/deeplearning4j/deeplearning4j
• ConvNet worked example: http://bit.ly/2eBM8ss
DL4J Example
    (def nn-conf
      (-> (NeuralNetConfiguration$Builder.)
          ;; Some values omitted for space
          (.activation "relu")
          (.learningRate 0.0001)
          (.weightInit (WeightInit/XAVIER))
          (.optimizationAlgo OptimizationAlgorithm/STOCHASTIC_GRADIENT_DESCENT)
          (.updater Updater/RMSPROP)
          (.momentum 0.9)
          (.list)
          (.layer 0 conv-init)
          (.layer 1 (max-pool "maxpool1" (int-array [2 2])))
          (.layer 2 (conv-5x5 "cnn2" 100 (int-array [5 5]) (int-array [1 1]) 0))
          (.layer 3 (max-pool "maxpool2" (int-array [2 2])))
          (.layer 4 (fully-connected 500))
          (.layer 5 output-layer)
          (.build)))
How’d We Do?
• Accuracy: 0.375
• Precision: 0.3333
• Recall: 0.375
• F1 Score: 0.3529
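The reported F1 score is consistent with the usual definition of F1 as the harmonic mean of precision and recall. A minimal sketch of that check follows; `f1-score` is an illustrative name, not a DL4J call.

```clojure
(defn f1-score
  "Harmonic mean of precision and recall."
  [precision recall]
  (/ (* 2.0 precision recall)
     (+ precision recall)))

;; Precision and recall as reported on the slide.
(f1-score 0.3333 0.375)
```

With the slide's precision and recall this comes out at roughly 0.3529.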
Summary
• Clojure + Spark =
• Flambo and Sparkling are roughly equally powerful
• Deep learning is super doable with Clojure (though Java interop is kind of a pain)
Takeaways (TL;DPA)
• Contribute to Flambo and/or Sparkling!
• Let’s build or contribute to a nicer DSL for DL4J
• https://github.com/ericqweinstein/euroclojure