Perl for Visualization@YAPC2013

Perl for Visualization YAPCASIA:2013 9/21 (Saturday, September 21, 2013)

print Dumper $me
{
  twitter      => "@muddydixon",
  organization => "NIFTY",
  job          => "low spec full stack engineer",
  skill        => [
    "data collecting",
    "data cleansing",
    "visualization",
  ]
}

Today’s Topic

Key of Visualization: STORIES
Stories are the main concept of visualization, are buried in data, and enhance your business.

Visualization: two purposes
1. Explanatory visualization
2. Exploratory visualization

Explanatory visualization: communicating information clearly and effectively

Can you find STORIES?

Skill improves with the years: the more you use it, the higher your skill.

No 10-year veteran was unable to write Perl on their own: everybody who has touched Perl for 10 years can use Perl :)

Keep working at it: users with 7-10 years of experience, practice more!

Reading anything out of this is impossible: we cannot find STORIES in a text log.

This is the POWER of explanatory visualization

I / you can convey the STORIES in data:
to my / your colleagues
to my / your boss
to my / your audiences

Exploratory visualization: visualization allows you to find STORIES in data (data mining)

via "（道具としての）データサイエンティストのつかい方" ("How to Use Data Scientists (as Tools)")

Problems
Data mining engineers have little domain-specific knowledge.
Domain specialists have little data mining skill.

GOAL: Business Success
Domain Specific Knowledge / Data Mining Skill

How do we apply domain-specific knowledge to data mining processes?

What is the glue between the two? What is the ladder to the goal?


Ans. Visualization

GOAL: Business Success
Domain Specific Knowledge / Data Mining Skill
Trends, five-number summary, co-occurrence, mosaic map, flow chart, parallel chart

Domain Specific Expert / Data Mining Engineer

It is like communicating with your data and with your collaborators

Exploratory visualization for exploratory data analysis: do it before model processing, fitting, or testing, together with domain-specific knowledge

How to communicate? R, S, S-PLUS, SYSTAT, SPSS, pandas

We have Perl!!

Perl Data Language

Perl Data Language gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing. PDL turns Perl into a free, array-oriented, numerical language similar to (but, we believe, better than) such commercial packages as IDL and MatLab. One can write simple perl ...

But we want to handle hash objects.

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use PDL;

my $obj;

# A nested array converts cleanly.
$obj = pdl([[1,2,3],[4,5,6]]);
print $obj;
# [
#  [1 2 3]
#  [4 5 6]
# ]

# An array of hashes does not.
$obj = pdl([{a => 1, b => 2, c => 3}, {a => 4, b => 5, c => 6}]);
# Hash given as a pdl - but not {PDL} key! at Basic/Core/Core.pm.PL (i.e. PDL::Core.pm) line 1292.
# Ehhh!? (´Д`)

Data::Cube is an array of multi-dimensional data that has OLAP methods: dice, slice, etc.

Slice, Rotate, Dice, Drill-down
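Two of these OLAP operations can be sketched in plain Perl over an array of hashes. This is only an illustration of the idea, not how Data::Cube implements it: the sample rows and the helper names `slice_rows` and `dice_rows` are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample rows (not taken from the talk's data set).
my @rows = (
    { Country => 'US', Product => 'Pencil', Units => 56 },
    { Country => 'US', Product => 'Binder', Units => 7  },
    { Country => 'UK', Product => 'Pencil', Units => 32 },
    { Country => 'CA', Product => 'Pen',    Units => 76 },
);

# Slice: fix one dimension to a single value.
sub slice_rows {
    my ($rows, $dim, $value) = @_;
    return [ grep { $_->{$dim} eq $value } @$rows ];
}

# Dice: restrict several dimensions to sets of allowed values.
sub dice_rows {
    my ($rows, %allowed) = @_;
    my @kept;
    ROW: for my $row (@$rows) {
        for my $dim (keys %allowed) {
            my %ok = map { $_ => 1 } @{ $allowed{$dim} };
            next ROW unless $ok{ $row->{$dim} };
        }
        push @kept, $row;
    }
    return \@kept;
}

my $us   = slice_rows(\@rows, Country => 'US');
my $dice = dice_rows(\@rows, Country => [qw(US UK)], Product => ['Pencil']);

printf "slice: %d rows, dice: %d rows\n", scalar @$us, scalar @$dice;
```

Rotate and drill-down would follow the same pattern: reorder the dimensions, or group by one more of them.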


Eh...

Data::Cube released!
processing an array of hash objects
add / remove rollup measure
add / remove dimension
add / remove hierarchy
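To make the dimension / measure / rollup vocabulary concrete, here is a minimal sketch in plain Perl. It is not Data::Cube's implementation: the `rollup` function, its arguments, and the sample rows are assumptions for illustration. Only the output shape (`dim`, `count`, nested `values`) is modelled on the results shown later in the deck.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical rows: an array of hash objects.
my @rows = (
    { Country => 'US', Product => 'Pencil', Units => 56 },
    { Country => 'US', Product => 'Binder', Units => 7  },
    { Country => 'UK', Product => 'Pencil', Units => 32 },
);

# Group rows by the first dimension, then recurse into the remaining ones.
# Each node carries the dimension value, a row count, and every measure.
sub rollup {
    my ($rows, $dims, $measures) = @_;
    my ($dim, @rest) = @$dims;

    my %group;
    push @{ $group{ $_->{$dim} } }, $_ for @$rows;

    my @nodes;
    for my $key (sort keys %group) {
        my $members = $group{$key};
        my %node = (dim => $key, count => scalar @$members);
        for my $name (keys %$measures) {
            $node{$name} = $measures->{$name}->(@$members);
        }
        $node{values} = rollup($members, \@rest, $measures) if @rest;
        push @nodes, \%node;
    }
    return \@nodes;
}

# A measure is just a code ref that reduces a group of rows to one value.
my %measures = (
    sum => sub { my $s = 0; $s += $_->{Units} for @_; $s },
);

my $result = rollup(\@rows, ['Country', 'Product'], \%measures);
printf "%s: count=%d sum=%d\n", @{$_}{qw(dim count sum)} for @$result;
```

Adding or removing a dimension then amounts to changing the list passed as the second argument, and adding a measure to adding one more code ref.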

Data::Cube 1. Data

Date        Country  SalesPerson  Product  Units  Unit_Cost    Total
3/15/2005   US       Sorvino      Pencil      56       2.99   167.44
3/7/2006    US       Sorvino      Binder       7      19.99   139.93
8/24/2006   US       Sorvino      Desk         3     275.00   825.00
9/27/2006   US       Sorvino      Pen         76       1.99   151.24
5/22/2005   US       Thompson     Pencil      32       1.99    63.68
10/14/2006  US       Thompson     Binder      57      19.99  1139.43
4/18/2005   US       Andrews      Pencil      75       1.99   149.25
4/10/2006   US       Andrews      Pencil      66       1.99   131.34
10/31/2006  US       Andrews      Pencil     114       1.29   147.06

Data::Cube 2. Usage

use strict;
use warnings;
use feature 'say';
use Data::Dumper;
use Text::CSV::Slurp;
use Data::Cube;

# Load the CSV file as an array of hashes.
my $file = shift;
my $data = Text::CSV::Slurp->load(file => $file);
my $cube;

say "============================================================";
say "raw data size: ".(scalar @$data)."\n";

# Roll up by a single dimension.
say "\n============================================================";
$cube = Data::Cube->new("experience");
$cube->put($data);
say Dumper $cube->rollup(noValues => 1);

# Add a second dimension and roll up again.
say "\n============================================================";
$cube->add_dimension("skill");
say Dumper $cube->rollup(noValues => 1);

Data::Cube 2. Usage: that is all it takes.

Data::Cube 3. Results

$VAR1 = [
  { 'count' => 150, 'dim' => '10年以上' },  # 10 years or more
  { 'count' => 76,  'dim' => '1~3年' },     # 1 to 3 years
  { 'count' => 32,  'dim' => '1年未満' },   # less than 1 year
  { 'count' => 93,  'dim' => '4~6年' },     # 4 to 6 years
  { 'count' => 43,  'dim' => '7~9年' }      # 7 to 9 years
];

Data::Cube 4. Results

# Skill labels, translated:
#   上級者 (advanced): can write most things if I set my mind to it; when stuck, I just read the source code
#   中級者 (intermediate): can do most of what I want, but check books and websites as I go
#   初級者 (beginner): can hardly write anything without asking someone
$VAR1 = [
  { 'dim' => '10年以上',
    'values' => [
      { 'count' => 79, 'dim' => '上級者（一通り書こうと思えば書ける。わかんなかったらとりあえずソースコード読んじゃう）' },
      { 'count' => 71, 'dim' => '中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする）' }
    ] },
  { 'dim' => '1~3年',
    'values' => [
      { 'count' => 8,  'dim' => '上級者（一通り書こうと思えば書ける。わかんなかったらとりあえずソースコード読んじゃう）' },
      { 'count' => 56, 'dim' => '中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする）' },
      { 'count' => 12, 'dim' => '初級者（人に聞いたりしないとなかなか書けない）' }
    ] },
  { 'dim' => '1年未満',
    'values' => [
      { 'count' => 11, 'dim' => '中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする）' },
      { 'count' => 21, 'dim' => '初級者（人に聞いたりしないとなかなか書けない）' }
    ] },
  { 'dim' => '4~6年',
    'values' => [
      { 'count' => 25, 'dim' => '上級者（一通り書こうと思えば書ける。わかんなかったらとりあえずソースコード読んじゃう）' },
      { 'count' => 64, 'dim' => '中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする）' },
      { 'count' => 4,  'dim' => '初級者（人に聞いたりしないとなかなか書けない）' }
    ] },
  { 'dim' => '7~9年',
    'values' => [
      { 'count' => 19, 'dim' => '上級者（一通り書こうと思えば書ける。わかんなかったらとりあえずソースコード読んじゃう）' },
      { 'count' => 23, 'dim' => '中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする）' },
      { 'count' => 1,  'dim' => '初級者（人に聞いたりしないとなかなか書けない）' }
    ] }
];

Data::Cube 5. Measure

my $cube = Data::Cube->new("Country");
$cube->put($data);

# Total units per group.
$cube->add_measure("sum", sub {
    my $sum = 0;
    foreach my $d (@_) { $sum += $d->{Units}; }
    $sum;
});

# Mean units per group.
$cube->add_measure("mean", sub {
    my $sum = 0;
    foreach my $d (@_) { $sum += $d->{Units}; }
    $sum / (scalar @_);
});

print Dumper $cube->rollup(noValues => 1);

$cube->add_dimension("Product");
print Dumper $cube->rollup(noValues => 1);

Data::Cube 5. Measure: that is all it takes.

Data::Cube 6. Measure

$VAR1 = [
  { 'count' => 10, 'dim' => 'CA', 'sum' => 687,  'mean' => '68.7' },
  { 'count' => 11, 'dim' => 'UK', 'sum' => 764,  'mean' => '69.4545454545455' },
  { 'count' => 22, 'dim' => 'US', 'sum' => 1103, 'mean' => '50.1363636363636' }
];

Data::Cube 7. Measure

$VAR1 = [
  { 'values' => [
      { 'count' => 5, 'dim' => 'Binder', 'sum' => 288, 'mean' => '57.6' },
      { 'count' => 1, 'dim' => 'Pen',    'sum' => 51,  'mean' => 51 },
      { 'count' => 1, 'dim' => 'PenSet', 'sum' => 61,  'mean' => 61 },
      { 'count' => 3, 'dim' => 'Pencil', 'sum' => 287, 'mean' => '95.6666666666667' }
    ],
    'dim' => 'CA' },
  { 'values' => [
      { 'count' => 4, 'dim' => 'Binder', 'sum' => 242, 'mean' => '60.5' },
      { 'count' => 1, 'dim' => 'Pen',    'sum' => 12,  'mean' => 12 },
      { 'count' => 3, 'dim' => 'PenSet', 'sum' => 205, 'mean' => '68.3333333333333' },
      { 'count' => 3, 'dim' => 'Pencil', 'sum' => 305, 'mean' => '101.666666666667' }
    ],
    'dim' => 'UK' },

To visualization
$cube -> HASH ref -> JSON
JSON -> d3.js -> visualization
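The HASH ref to JSON step might look like the following sketch. It uses the core JSON::PP module, which is an assumption since the talk does not say which JSON module was used. The `$rollup` structure is typed in by hand from the per-country result shown earlier rather than produced by Data::Cube.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;

# Hand-copied subset of the per-country rollup result from the deck.
my $rollup = [
    { dim => 'CA', count => 10, sum => 687  },
    { dim => 'UK', count => 11, sum => 764  },
    { dim => 'US', count => 22, sum => 1103 },
];

# canonical() sorts hash keys so the output is stable between runs;
# pretty() adds indentation for readability.
my $json = JSON::PP->new->canonical->pretty->encode($rollup);
print $json;
```

Saved to a file or served over HTTP, this array of objects is a shape that a d3.js page can load and bind to chart elements.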

Summary

Summary: I went to the trouble of introducing all this, but I actually do this kind of processing in R.

> data = read.csv("./data/perl5census2013.csv")
> summary(data)

address
  関東地方 (Kanto)                  : 292
  近畿地方 (Kinki)                  :  39
  中部地方 (Chubu)                  :  23
  九州地方・沖縄 (Kyushu / Okinawa) :  17
  北海道地方 (Hokkaido)             :  12
  東北地方 (Tohoku)                 :   6
  (Other)                           :   5

experience
  1~3年 (1 to 3 years)         :  76
  10年以上 (10 years or more)  : 150
  1年未満 (less than 1 year)   :  32
  4~6年 (4 to 6 years)         :  93
  7~9年 (7 to 9 years)         :  43

skill
  初級者（人に聞いたりしないとなかなか書けない） (beginner) :  38
  上級者（一通り書こうと思えば書ける。わかんなかったらとりあえずソースコード読んじゃう） (advanced) : 131
  中級者（だいたいやりたいことはできるが、本やサイトを頼りにして確認したりする） (intermediate) : 225

           frequencyatbusiness  frequencyatprivate
  Min.                   1.000               1.000
  1st Qu.                2.000               2.000
  Median                 6.000               5.000
  Mean                   5.779               5.388
  3rd Qu.               10.000               8.000
  Max.                  10.000              10.000

versionmanager (システム Perl = system Perl: rpm, yum, preinstalled perl, etc.)
  システム Perl (rpm, yum, インストール済みのperl等)                  : 136
  システム Perl (rpm, yum, インストール済みのperl等), perlbrew        :  92
  perlbrew                                                            :  76
  システム Perl (rpm, yum, インストール済みのperl等), perlbrew, plenv :  14
  perlbrew, plenv                                                     :  12
  plenv                                                               :   9
  (Other)                                                             :  55

Summary: but when you need to keep showing visualizations to domain specialists, this has an overwhelming advantage!

Summary: Example
Time series: sales, repeat rate, DAU, system info, activities
Effects of trial / campaign: attribution, condition, cost, cash back, etc.
