Slide 1

W2vUtils.jl Kenta Murata 2015-04-25 JuliaTokyo #3

Slide 2

W2vUtils.jl Kenta Murata 2015-04-25 JuliaTokyo #3

Slide 3

introduce(self)

Slide 4

Kenta Murata (村田 賢太) @mrkn ✓ Cookpad Inc. ✓ Ruby committer, maintainer of bigdecimal ✓ One of many Julia beginners

Slide 5

Julia and me

Slide 6

Julia and me ✓ First encountered it in 2012(?), on Wikipedia

Slide 7

Julia and me ✓ First encountered it in 2012(?), on Wikipedia ✓ First used it in 2013, only in the REPL

Slide 8

Julia and me ✓ First encountered it in 2012(?), on Wikipedia ✓ First used it in 2013, only in the REPL ✓ Wrote my first script yesterday!!

Slide 9

W2vUtils.jl

Slide 10

W2vUtils.jl

Slide 11

W2vUtils.jl ✓ Utilities for word2vec data

Slide 12

W2vUtils.jl ✓ Utilities for word2vec data ✓ My first Julia package

Slide 13

W2vUtils.jl ✓ Utilities for word2vec data ✓ My first Julia package ✓ My first Julia script

Slide 14

Why W2vUtils.jl?

Slide 15

Why W2vUtils.jl? ✓ I first tried to write a SOM learner for this presentation

Slide 16

Why W2vUtils.jl? ✓ I first tried to write a SOM learner for this presentation ✓ What is the most interesting application of SOM?

Slide 17

Why W2vUtils.jl? ✓ I first tried to write a SOM learner for this presentation ✓ What is the most interesting application of SOM? ✓ I think it is interesting to map distributed representations of words onto a 2D lattice.

Slide 18

But…

Slide 19

But… ✓ I couldn’t finish writing the SOM learner

Slide 20

But… ✓ I couldn’t finish writing the SOM learner ✓ Writing both the data loader and the SOM learner was too much to finish in one night

Slide 21

But… ✓ I couldn’t finish writing the SOM learner ✓ Writing both the data loader and the SOM learner was too much to finish in one night ✓ So I focused entirely on making my first package

Slide 22

http://www.slideshare.net/KentaSato/julia-36649709

Slide 23

https://github.com/mrkn/W2vUtils.jl

Slide 24

using W2vUtils

Slide 25

Install W2vUtils.jl

Pkg.clone("git@github.com:mrkn/W2vUtils.jl.git")

Slide 26

Load word2vec data

using W2vUtils
wv = load("vectors.bin", W2vData)
nwords(wv)             #=> The number of words
vocabulary(wv)         #=> The array of words
projdim(wv)            #=> Dimensions of the vectors (== 200)
projection(wv)         #=> The projection matrix
wordindex(wv, word)    #=> Look up the index of the word
wordindices(wv, words) #=> Look up the indices of the words
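For orientation, here is a small sketch of how these accessors relate to one another. It assumes the two-argument projection(wv, word) form that appears on the later slides; the file name and the choice of word are just placeholders.

using W2vUtils
wv = load("vectors.bin", W2vData)
words = vocabulary(wv)
length(words) == nwords(wv)   # one vocabulary entry per word
v = projection(wv, words[1])  # the vector for the first vocabulary word
length(v) == projdim(wv)      # every word vector has projdim(wv) (== 200) elements
wordindex(wv, words[1])       # maps the word back to its index (presumably 1)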

Slide 27

http://techlife.cookpad.com/entry/2015/02/27/093000

Slide 28

N-best nearest words

Slide 29

N-best nearest words

Slide 30

N-best nearest words

using W2vUtils
wv = load("recipe_steps.bin", W2vData)
(words, dists) = distance(wv, "チョコ"; n=5)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("チョコレート",0.9378173408020709)
     ("ガナッシュ",0.7568368811212932)
     ("マシュマロ",0.7461657278585042)
     ("ビターチョコ",0.7439865689272069)
     ("クランチ",0.7296649975102198)
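For intuition, this is roughly what an n-best lookup by cosine similarity looks like when written directly against the accessors from the loading slide. It is a deliberately slow, illustrative sketch (one projection call per vocabulary word), not necessarily how distance() is implemented inside W2vUtils.jl.

# Illustrative n-best nearest-word lookup by cosine similarity (sketch only)
function nbest_by_cosine(wv, query, n)
    words = vocabulary(wv)
    q = projection(wv, query)
    q /= norm(q)                          # normalize the query vector
    sims = Float64[]
    for w in words
        v = projection(wv, w)
        push!(sims, dot(q, v / norm(v)))  # cosine similarity to the query
    end
    order = sortperm(sims, rev=true)
    order = order[order .!= wordindex(wv, query)]  # drop the query word itself
    (words[order[1:n]], sims[order[1:n]])
end

Calling nbest_by_cosine(wv, "チョコ", 5) should then reproduce a ranking like the one above, assuming distance() is indeed cosine-based.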

Slide 31

Word analogy

Slide 32

Word analogy

Slide 33

Word analogy

using W2vUtils
wv = load("recipe_steps-phrase.bin", W2vData)
(words, dists) = analogy(wv, ["名古屋", "赤味噌", "長野"]; n=5)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("信州",0.5091896142078718)
     ("味噌",0.5033710820183608)
     ("麦_味噌",0.5033459461705277)
     ("信州_味噌",0.49885121244487496)
     ("赤味噌_白味噌",0.4816681271563857)

Slide 34

Nearest words for a vector

using W2vUtils
wv = load("recipe_steps-phrase.bin", W2vData)
赤味噌 = projection(wv, "赤味噌")
名古屋 = projection(wv, "名古屋")
長野 = projection(wv, "長野")
(words, dists) = nearest_words(wv, 赤味噌 - 名古屋 + 長野; n=5)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("長野",0.6527766969046891)
     ("赤味噌",0.5241494934036854)
     ("信州",0.5091896142078718)
     ("味噌",0.5033710820183608)
     ("麦_味噌",0.5033459461705277)
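Ignoring the query words 長野 and 赤味噌 at the top, the remaining words and scores are exactly the analogy() output on the previous slide, so analogy() appears to be this same 赤味噌 - 名古屋 + 長野 arithmetic with the query words filtered out of the result.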

Slide 35

Example usage

Slide 36

No content

Slide 37

I want to do this

Slide 38

examples/w2v_pca.jl

using W2vUtils
using MultivariateStats
using Gadfly

wv = load(ARGS[1], W2vData)
(words, dists) = distance(wv, ARGS[2]; n=15)
vecs = W2vUtils.projection(wv, words)
model = fit(PCA, vecs'; maxoutdim=2)
transvecs = transform(model, vecs')
pca_plot = plot(x=transvecs[1, :], y=transvecs[2, :], label=words,
                Geom.point, Geom.label)
draw(PDF(ARGS[3], 4inch, 3inch), pca_plot)

Slide 39

$ julia w2v_pca.jl recipe_steps.bin チョコ pca1.pdf

(PCA scatter plot of the 15 nearest words to "チョコ")

Slide 40

No content

Slide 41

Let’s try this one, too

Slide 42

$ julia w2v_pca.jl recipe_steps-phrase.bin 赤味噌 pca2.pdf

(PCA scatter plot of the 15 nearest words to "赤味噌")

Slide 43

Future work

Slide 44

Future work

Slide 45

Future work ✓ Conform to the standard coding style of Julia

Slide 46

Future work ✓ Conform to the standard coding style of Julia ✓ Submit to METADATA.jl

Slide 47

Future work ✓ Conform to the standard coding style of Julia ✓ Submit to METADATA.jl ✓ Implement a self-organizing map

Slide 48

Future work ✓ Conform to the standard coding style of Julia ✓ Submit to METADATA.jl ✓ Implement a self-organizing map ✓ Visualize a 2D-lattice map of distributed word representations
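As a pointer for the self-organizing-map item: a minimal, hypothetical sketch of one online SOM update step over word vectors, written in 2015-era Julia. The function name som_step!, the grid layout, and the Gaussian neighborhood are my own choices for illustration; none of this is part of W2vUtils.jl.

# One online SOM update: pull the lattice toward a single input vector x.
# grid: d × (gw*gh) matrix, column k is the weight vector of lattice node k
# x: one input word vector; lr: learning rate; radius: neighborhood radius
function som_step!(grid, gw, gh, x, lr, radius)
    # 1. find the best-matching unit (BMU): the node nearest to x
    bmu = indmin([norm(grid[:, k] - x) for k in 1:size(grid, 2)])
    bi, bj = ind2sub((gw, gh), bmu)
    # 2. move every node toward x, weighted by its lattice distance to the BMU
    for k in 1:size(grid, 2)
        i, j = ind2sub((gw, gh), k)
        h = exp(-((i - bi)^2 + (j - bj)^2) / (2 * radius^2))  # Gaussian neighborhood
        grid[:, k] += lr * h * (x - grid[:, k])
    end
    return grid
end

Training would sweep this over the columns of projection(wv) while decaying lr and radius; the final node assignments would then give the 2D-lattice map of words mentioned in the last bullet.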