Slide 1

Slide 1 text

W2vUtils.jl Kenta Murata 2015-04-25 JuliaTokyo #3

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Kenta Murata @mrkn 村田 賢太
✓ Cookpad Inc.
✓ Ruby committer (bigdecimal maintainer)
✓ A Julia beginner

Slide 5

Slide 5 text

Julia and me

Slide 8

Slide 8 text

Julia and me
✓ First encountered it in 2012(?) on Wikipedia
✓ First used it in 2013, only in the REPL
✓ Wrote my first script yesterday!!

Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 13

Slide 13 text

W2vUtils.jl
✓ Utilities for word2vec data
✓ My first Julia package
✓ My first Julia script

Slide 14

Slide 14 text

Why W2vUtils.jl?

Slide 17

Slide 17 text

Why W2vUtils.jl?
✓ At first I tried to write a SOM (self-organizing map) learner for this presentation
✓ What is the most interesting application of a SOM?
✓ I think it is interesting to map distributed representations of words onto a 2D lattice.

Slide 18

Slide 18 text


Slide 21

Slide 21 text

But…
✓ I couldn’t finish writing the SOM learner
✓ Writing both a data loader and a SOM learner was too much to do in one night
✓ So I focused entirely on making my first package

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Slide 24

Slide 24 text

using W2vUtils

Slide 25

Slide 25 text

Install W2vUtils.jl

Pkg.clone("")

Slide 26

Slide 26 text

Load word2vec data

using W2vUtils
wv = load("vectors.bin", W2vData)
nwords(wv)               #=> The number of words
vocabulary(wv)           #=> The array of words
projdim(wv)              #=> Dimensions of vector (== 200)
projection(wv)           #=> The projection matrix
wordindex(wv, word)      #=> Lookup index of the word
wordindices(wv, words)   #=> Lookup indices of the words
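To give a feel for what a loader like this has to do, here is a minimal sketch of reading the word2vec binary format. It is not the W2vUtils.jl implementation: `read_w2v` is a hypothetical name, and the layout (an ASCII header with the word count and dimension, then each word followed by a space and its little-endian Float32 values, optionally ending in a newline) is assumed from the output of the original word2vec tool.

```julia
# Hypothetical reader for the word2vec binary format; not W2vUtils.jl's code.
# Assumed layout: an ASCII header "nwords dim\n", then for each word the
# word text, a single space, and `dim` little-endian Float32 values,
# optionally followed by a newline.
function read_w2v(io::IO)
    header = split(readline(io))
    nwords = parse(Int, header[1])
    dim = parse(Int, header[2])
    vocab = Vector{String}(undef, nwords)
    vectors = Matrix{Float32}(undef, dim, nwords)  # one column per word
    buf = Vector{Float32}(undef, dim)
    for i in 1:nwords
        # Drop the newline that may trail the previous word's vector.
        vocab[i] = lstrip(readuntil(io, ' '), '\n')
        read!(io, buf)
        vectors[:, i] = ltoh.(buf)
    end
    return vocab, vectors
end

# Round-trip check on an in-memory stream with two 3-dimensional words.
io = IOBuffer()
write(io, "2 3\n")
write(io, "foo ", htol.(Float32[1, 2, 3]), "\n")
write(io, "bar ", htol.(Float32[4, 5, 6]), "\n")
seekstart(io)
vocab, vectors = read_w2v(io)
```

Storing one word per column keeps each vector contiguous in memory, which suits Julia's column-major arrays.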

Slide 27

Slide 27 text

Slide 28

Slide 28 text

N-best nearest words

Slide 30

Slide 30 text

N-best nearest words

using W2vUtils
wv = load("recipe_steps.bin", W2vData)
(words, dists) = distance(wv, "チョコ"; n=5)   # チョコ: "choco" (chocolate)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("チョコレート",0.9378173408020709)   # chocolate
     ("ガナッシュ",0.7568368811212932)     # ganache
     ("マシュマロ",0.7461657278585042)     # marshmallow
     ("ビターチョコ",0.7439865689272069)   # bitter chocolate
     ("クランチ",0.7296649975102198)       # crunch
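The scores above look like cosine similarities, which is what the original word2vec distance tool reports; whether W2vUtils.jl computes them the same way is an assumption here. A minimal sketch of an N-best lookup under that assumption, with `nearest_by_cosine` and the toy data being hypothetical:

```julia
using LinearAlgebra

# Hypothetical helper, not W2vUtils.jl's implementation.
# `vectors` holds one word vector per column; `vocab[i]` names column i.
function nearest_by_cosine(vocab, vectors, query; n=5)
    sims = [dot(query, v) / (norm(query) * norm(v)) for v in eachcol(vectors)]
    best = sortperm(sims; rev=true)[1:min(n, length(sims))]
    return vocab[best], sims[best]
end

# Toy data: "a" points the same way as the query, "b" almost, "c" not at all.
toy_vocab = ["a", "b", "c"]
toy_vectors = [1.0 0.9 0.0;
               0.0 0.1 1.0]
words, sims = nearest_by_cosine(toy_vocab, toy_vectors, [1.0, 0.0]; n=2)
```

This sketch ranks every word against a query vector and does not drop the query word itself from the results.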

Slide 31

Slide 31 text

Word analogy

Slide 33

Slide 33 text

Word analogy

using W2vUtils
wv = load("recipe_steps-phrase.bin", W2vData)
# 名古屋: Nagoya, 赤味噌: red miso, 長野: Nagano
(words, dists) = analogy(wv, ["名古屋", "赤味噌", "長野"]; n=5)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("信州",0.5091896142078718)            # Shinshu
     ("味噌",0.5033710820183608)            # miso
     ("麦_味噌",0.5033459461705277)         # barley miso
     ("信州_味噌",0.49885121244487496)      # Shinshu miso
     ("赤味噌_白味噌",0.4816681271563857)   # red miso, white miso

Slide 34

Slide 34 text

Nearest words for a vector

using W2vUtils
wv = load("recipe_steps-phrase.bin", W2vData)
赤味噌 = projection(wv, "赤味噌")   # red miso
名古屋 = projection(wv, "名古屋")   # Nagoya
長野 = projection(wv, "長野")       # Nagano
(words, dists) = nearest_words(wv, 赤味噌 - 名古屋 + 長野; n=5)
collect(zip(words, dists))
#=> 5-element Array{(UTF8String,Float64),1}:
     ("長野",0.6527766969046891)      # Nagano
     ("赤味噌",0.5241494934036854)    # red miso
     ("信州",0.5091896142078718)      # Shinshu
     ("味噌",0.5033710820183608)      # miso
     ("麦_味噌",0.5033459461705277)   # barley miso
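Comparing the two result lists, the analogy output matches the nearest words for the vector b - a + c with the input words left out (the shared entries have identical scores). A minimal sketch of that idea, assuming cosine similarity as the score; `analogy_sketch` and the toy vectors are hypothetical and not taken from W2vUtils.jl:

```julia
using LinearAlgebra

# Hypothetical helper, not W2vUtils.jl's implementation.
# Answers "a is to b as c is to ?" by ranking words against b - a + c.
# Every queried word must be present in `vocab`.
function analogy_sketch(vocab, vectors, a, b, c; n=5)
    col(w) = vectors[:, findfirst(==(w), vocab)]
    target = normalize(col(b) - col(a) + col(c))
    unit = mapslices(normalize, vectors; dims=1)   # unit-length columns
    sims = unit' * target                          # cosine similarities
    # Rank all words, then drop the three input words.
    keep = [i for i in sortperm(sims; rev=true) if !(vocab[i] in (a, b, c))]
    best = keep[1:min(n, length(keep))]
    return vocab[best], sims[best]
end

# Toy 3-dimensional vectors, one column per word.
toy_vocab = ["king", "man", "woman", "queen", "apple"]
toy_vectors = [1.0 0.0 0.0 1.0 0.1;
               1.0 1.0 0.0 0.0 0.1;
               0.0 0.0 1.0 1.0 0.1]
words, sims = analogy_sketch(toy_vocab, toy_vectors, "man", "king", "woman"; n=1)
```

Without the exclusion step the input words tend to rank first, as 長野 and 赤味噌 do in the nearest_words output above.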

Slide 35

Slide 35 text

Example usage

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text


Slide 38

Slide 38 text

examples/w2v_pca.jl

using W2vUtils
using MultivariateStats
using Gadfly

wv = load(ARGS[1], W2vData)
# Take the 15 words nearest to the query word and their vectors
(words, dists) = distance(wv, ARGS[2]; n=15)
vecs = W2vUtils.projection(wv, words)
# Project the vectors onto the first two principal components
model = fit(PCA, vecs'; maxoutdim=2)
transvecs = transform(model, vecs')
# Scatter plot labeled with the words, saved as a PDF
pca_plot = plot(x=transvecs[1, :], y=transvecs[2, :], label=words, Geom.point, Geom.label)
draw(PDF(ARGS[3], 4inch, 3inch), pca_plot)
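The PCA step in this script reduces each word vector to two coordinates for plotting. As a rough illustration of what that projection does, here is a sketch using only the standard library: center the data, take the SVD, and project onto the first two left singular vectors. `pca_2d` is a hypothetical helper and is not how MultivariateStats implements `fit(PCA, ...)`; coordinates may also differ from the library's by sign.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper showing PCA via SVD; not MultivariateStats' code.
# `X` holds one observation per column (dim × nobs), with dim >= 2.
function pca_2d(X::AbstractMatrix)
    centered = X .- mean(X; dims=2)     # subtract the per-dimension mean
    U = svd(centered).U                 # principal directions
    return U[:, 1:2]' * centered        # 2 × nobs coordinates
end

# Four points on a straight line: all variance lies along one direction.
X = [1.0 2.0 3.0 4.0;
     2.0 4.0 6.0 8.0;
     0.0 0.0 0.0 0.0]
Y = pca_2d(X)
```

For points on a line, the second coordinate comes out as zero up to rounding, since the first component already captures all of the variance.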

Slide 39

Slide 39 text

$ julia w2v_pca.jl recipe_steps.bin チョコ pca1.pdf
[Figure: PCA plot written to pca1.pdf]

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text


Slide 42

Slide 42 text

$ julia w2v_pca.jl recipe_steps-phrase.bin 赤味噌 pca2.pdf
[Figure: PCA plot written to pca2.pdf]

Slide 43

Slide 43 text

Future work

Slide 48

Slide 48 text

Future work
✓ Conform to the standard Julia coding style
✓ Submit to METADATA.jl
✓ Implement a self-organizing map
✓ Visualize a 2D lattice map of distributed word representations