Perl for Visualization@YAPC2013

muddydixon

March 28, 2023

Transcript

  1. print Dumper $me

        {
          twitter      => '@muddydixon',
          organization => 'NIFTY',
          job          => 'low spec full stack engineer',
          skill        => [ 'data collecting', 'data cleansing', 'visualization' ],
        }

    Saturday, September 21, 2013
  2. Key of Visualization

    STORIES are the main concept of visualization, are buried in data, and enhance your business.
  3. I / You can convey the STORIES in data

    to my / your colleagues, to my / your boss, to my / your audiences.
  4. Problems

    Data mining engineer: little domain-specific knowledge.
    Domain specialist: little mining skill.
  5. GOAL: Business Success

    Domain-specific knowledge, data mining skill.
    Trends, five-number summary, co-occurrence, mosaic map, flow chart, parallel chart.
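The five-number summary named on this slide (minimum, lower quartile, median, upper quartile, maximum) can be sketched in plain Perl. This is only an illustration: the helper names `quantile` and `five_number_summary` are made up here, and the quartiles are assumed to be linearly interpolated between order statistics, which is one convention among several.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: quantiles are interpolated linearly between neighbouring
# values of the sorted list; other quartile conventions exist.
sub quantile {
    my ($sorted, $p) = @_;
    my $pos  = $p * (@$sorted - 1);    # fractional index into the sorted list
    my $low  = int $pos;
    my $high = $low < $#$sorted ? $low + 1 : $low;
    return $sorted->[$low] + ($pos - $low) * ($sorted->[$high] - $sorted->[$low]);
}

# Returns (min, lower quartile, median, upper quartile, max).
sub five_number_summary {
    my @sorted = sort { $a <=> $b } @_;
    die "need at least one value\n" unless @sorted;
    return map { quantile(\@sorted, $_) } 0, 0.25, 0.5, 0.75, 1;
}

# Units column from the sample sales table shown later in this deck.
my @units = (56, 7, 3, 76, 32, 57, 75, 66, 114);
printf "min=%s q1=%s median=%s q3=%s max=%s\n", five_number_summary(@units);
```

For the nine sample values this prints `min=3 q1=32 median=57 q3=75 max=114`.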
  6. Exploratory Visualization

    For exploratory data analysis. Do it before model processing or fitting and testing, using domain-specific knowledge.
  9. Perl Data Language

    "PDL gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing. PDL turns Perl into a free, array-oriented, numerical language similar to (but, we believe, better than) such commercial packages as IDL and MatLab. One can write simple perl ..."

    We want Hash Object
  10. #!/usr/bin/env perl

        use strict;
        use warnings;
        use Data::Dumper;
        use PDL;

        my $obj;
        $obj = pdl([[1,2,3],[4,5,6]]);
        print $obj;
        # [
        #  [1 2 3]
        #  [4 5 6]
        # ]

        $obj = pdl([{a => 1, b => 2, c => 3}, {a => 4, b => 5, c => 6}]);
        # Hash given as a pdl - but not {PDL} key! at Basic/Core/Core.pm.PL (i.e. PDL::Core.pm) line 1292.
        # Ehhh?! (´Д`)
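One possible workaround for the error above is to flatten each hash into a row with a fixed column order before handing the data to `pdl()`. The sketch below uses only core Perl; `rows_from_hashes` is a hypothetical helper, not part of PDL, and the `pdl()` call itself is left as a comment so the example runs without PDL installed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): turn an array of hashes into an
# array of rows, using a fixed column order so every row lines up.
sub rows_from_hashes {
    my ($records, @columns) = @_;
    return [ map { my $r = $_; [ map { $r->{$_} } @columns ] } @$records ];
}

my $records = [ { a => 1, b => 2, c => 3 }, { a => 4, b => 5, c => 6 } ];
my $rows    = rows_from_hashes($records, qw(a b c));

print join(' ', @$_), "\n" for @$rows;

# With PDL installed, the nested array has the same shape as the first
# pdl() call on the slide:
#   use PDL;
#   my $obj = pdl($rows);
```

The column names are lost in this conversion, which is presumably part of why the talk asks for something that works on hash objects directly.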
  11. Data::Cube

    is an array of multi-dimensional data
    has OLAP methods: dice, slice, etc.
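As a rough illustration of the two OLAP operations named here: a slice fixes one dimension to a single value, and a dice restricts several dimensions to sets of allowed values. The sketch below is plain Perl on an array of hashes. The helpers `slice_records` and `dice_records` are hypothetical and are not the Data::Cube API, whose method signatures the slides do not show.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helpers for illustration; NOT the Data::Cube API.

# slice: fix one dimension to a single value.
sub slice_records {
    my ($records, $dim, $value) = @_;
    return [ grep { $_->{$dim} eq $value } @$records ];
}

# dice: keep records whose value is allowed for every listed dimension.
sub dice_records {
    my ($records, %allowed) = @_;
    my @kept;
    RECORD: for my $r (@$records) {
        for my $dim (keys %allowed) {
            my %ok = map { $_ => 1 } @{ $allowed{$dim} };
            next RECORD unless $ok{ $r->{$dim} };
        }
        push @kept, $r;
    }
    return \@kept;
}

# A few rows taken from the sample sales table in this deck.
my @sales = (
    { SalesPerson => 'Sorvino',  Product => 'Pencil', Units => 56 },
    { SalesPerson => 'Sorvino',  Product => 'Binder', Units => 7 },
    { SalesPerson => 'Sorvino',  Product => 'Desk',   Units => 3 },
    { SalesPerson => 'Thompson', Product => 'Pencil', Units => 32 },
    { SalesPerson => 'Thompson', Product => 'Binder', Units => 57 },
    { SalesPerson => 'Andrews',  Product => 'Pencil', Units => 75 },
);

my $pencils = slice_records(\@sales, Product => 'Pencil');
my $subcube = dice_records(\@sales,
    SalesPerson => [ 'Sorvino', 'Thompson' ],
    Product     => [ 'Pencil',  'Binder' ],
);
printf "slice: %d rows, dice: %d rows\n", scalar @$pencils, scalar @$subcube;
```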
  12. Data::Cube released!

    Processes an array of hash objects
    add / remove rollup measure
    add / remove dimension
    add / remove hierarchy
  13. Data::Cube 1. Data

        Date        Country  SalesPerson  Product  Units  Unit_Cost    Total
        3/15/2005   US       Sorvino      Pencil      56       2.99   167.44
        3/7/2006    US       Sorvino      Binder       7      19.99   139.93
        8/24/2006   US       Sorvino      Desk         3     275.00   825.00
        9/27/2006   US       Sorvino      Pen         76       1.99   151.24
        5/22/2005   US       Thompson     Pencil      32       1.99    63.68
        10/14/2006  US       Thompson     Binder      57      19.99  1139.43
        4/18/2005   US       Andrews      Pencil      75       1.99   149.25
        4/10/2006   US       Andrews      Pencil      66       1.99   131.34
        10/31/2006  US       Andrews      Pencil     114       1.29   147.06
  15. Data::Cube 2. Usage

        use feature 'say';
        use Data::Dumper;
        use Text::CSV::Slurp;
        use Data::Cube;

        my $file = shift;
        my $data = Text::CSV::Slurp->load(file => $file);
        my $cube;

        say "============================================================";
        say "raw data size: ".(scalar @$data)."\n";

        say "\n============================================================";
        $cube = Data::Cube->new("experience");   # first dimension: "experience"
        $cube->put($data);
        say Dumper $cube->rollup(noValues => 1);

        say "\n============================================================";
        $cube->add_dimension("skill");           # second dimension: "skill"
        say Dumper $cube->rollup(noValues => 1);

    That's all it takes.
  16. Data::Cube 3. Results

        $VAR1 = [
          { 'count' => 150, 'dim' => '10 years or more' },
          { 'count' => 76,  'dim' => '1~3 years' },
          { 'count' => 32,  'dim' => 'less than 1 year' },
          { 'count' => 93,  'dim' => '4~6 years' },
          { 'count' => 43,  'dim' => '7~9 years' }
        ];
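  17. Data::Cube 4. Results

        $VAR1 = [
          { 'dim' => '10 years or more',
            'values' => [
              { 'count' => 79, 'dim' => 'Advanced (can write most things when I try; if stuck, I just read the source code)' },
              { 'count' => 71, 'dim' => 'Intermediate (can do most of what I want, but rely on books and websites to check)' }
            ] },
          { 'dim' => '1~3 years',
            'values' => [
              { 'count' => 8,  'dim' => 'Advanced (can write most things when I try; if stuck, I just read the source code)' },
              { 'count' => 56, 'dim' => 'Intermediate (can do most of what I want, but rely on books and websites to check)' },
              { 'count' => 12, 'dim' => 'Beginner (can hardly write code without asking someone)' }
            ] },
          { 'dim' => 'less than 1 year',
            'values' => [
              { 'count' => 11, 'dim' => 'Intermediate (can do most of what I want, but rely on books and websites to check)' },
              { 'count' => 21, 'dim' => 'Beginner (can hardly write code without asking someone)' }
            ] },
          { 'dim' => '4~6 years',
            'values' => [
              { 'count' => 25, 'dim' => 'Advanced (can write most things when I try; if stuck, I just read the source code)' },
              { 'count' => 64, 'dim' => 'Intermediate (can do most of what I want, but rely on books and websites to check)' },
              { 'count' => 4,  'dim' => 'Beginner (can hardly write code without asking someone)' }
            ] },
          { 'dim' => '7~9 years',
            'values' => [
              { 'count' => 19, 'dim' => 'Advanced (can write most things when I try; if stuck, I just read the source code)' },
              { 'count' => 23, 'dim' => 'Intermediate (can do most of what I want, but rely on books and websites to check)' },
              { 'count' => 1,  'dim' => 'Beginner (can hardly write code without asking someone)' }
            ] }
        ];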
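The nested result above has a regular shape: each node carries a `dim`, a `count`, and, below the top level, a `values` list for the next dimension. A plain-Perl sketch of a rollup that produces this shape might look as follows. The `rollup` function here is hypothetical and is not Data::Cube's implementation, and the records are toy data invented for the example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical plain-Perl rollup (not Data::Cube's implementation): group the
# records by the first dimension, count each group, then recurse into the
# remaining dimensions to build the nested 'values' lists.
sub rollup {
    my ($records, @dims) = @_;
    my ($dim, @rest) = @dims;
    my %groups;
    push @{ $groups{ $_->{$dim} } }, $_ for @$records;

    return [
        map {
            my $members = $groups{$_};
            my %node = (dim => $_, count => scalar @$members);
            $node{values} = rollup($members, @rest) if @rest;
            \%node;
        } sort keys %groups
    ];
}

# Toy records, invented for illustration only (not the census data).
my @answers = (
    { experience => '1-3 years', skill => 'beginner' },
    { experience => '1-3 years', skill => 'intermediate' },
    { experience => '1-3 years', skill => 'intermediate' },
    { experience => '4-6 years', skill => 'advanced' },
);

my $tree = rollup(\@answers, qw(experience skill));
for my $node (@$tree) {
    print "$node->{dim}: $node->{count}\n";
    print "  $_->{dim}: $_->{count}\n" for @{ $node->{values} };
}
```

Sorting the group keys is a choice made here to keep the output stable; the slides do not say how Data::Cube orders its groups.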
  19. Data::Cube 5. Measure

        my $cube = Data::Cube->new("Country");
        $cube->put($data);

        # "sum": total Units over the records passed in
        $cube->add_measure("sum", sub {
            my $sum = 0;
            foreach my $d (@_) { $sum += $d->{Units}; }
            $sum;
        });

        # "mean": average Units per record passed in
        $cube->add_measure("mean", sub {
            my $sum = 0;
            foreach my $d (@_) { $sum += $d->{Units}; }
            $sum / (scalar @_);
        });

        print Dumper $cube->rollup(noValues => 1);
        $cube->add_dimension("Product");
        print Dumper $cube->rollup(noValues => 1);

    That's all it takes.
  20. Data::Cube 6. Measure

        $VAR1 = [
          { 'count' => 10, 'dim' => 'CA', 'sum' => 687,  'mean' => '68.7' },
          { 'count' => 11, 'dim' => 'UK', 'sum' => 764,  'mean' => '69.4545454545455' },
          { 'count' => 22, 'dim' => 'US', 'sum' => 1103, 'mean' => '50.1363636363636' }
        ];
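The idea behind measures, as far as the slides show it, is that each measure is a named callback that receives all records of one group and returns a single value. The sketch below imitates that in plain Perl with a table of code refs. `rollup_with_measures` and `%measures` are hypothetical names, not the Data::Cube API, and the data is the SalesPerson and Units columns of the sample table from the "1. Data" slide.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical plain-Perl version of "measures": named code refs that each
# receive all records of one group, mirroring the callbacks on the slide.
my %measures = (
    sum  => sub { sum(map { $_->{Units} } @_) },
    mean => sub { sum(map { $_->{Units} } @_) / scalar @_ },
);

# Group by one dimension, then apply every registered measure to each group.
sub rollup_with_measures {
    my ($records, $dim) = @_;
    my %groups;
    push @{ $groups{ $_->{$dim} } }, $_ for @$records;
    return [
        map {
            my $key     = $_;
            my @members = @{ $groups{$key} };
            my %row     = (dim => $key, count => scalar @members);
            for my $name (keys %measures) {
                $row{$name} = $measures{$name}->(@members);
            }
            \%row;
        } sort keys %groups
    ];
}

# SalesPerson and Units columns of the sample table from the "1. Data" slide.
my @sales = map { +{ SalesPerson => $_->[0], Units => $_->[1] } } (
    [ Sorvino  => 56 ], [ Sorvino  => 7 ],  [ Sorvino => 3 ],  [ Sorvino => 76 ],
    [ Thompson => 32 ], [ Thompson => 57 ],
    [ Andrews  => 75 ], [ Andrews  => 66 ], [ Andrews => 114 ],
);

my $result = rollup_with_measures(\@sales, 'SalesPerson');
printf "%-8s count=%d sum=%d mean=%s\n", @{$_}{qw(dim count sum mean)} for @$result;
```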
  21. Data::Cube 7. Measure

        $VAR1 = [
          { 'values' => [
              { 'count' => 5, 'dim' => 'Binder', 'sum' => 288, 'mean' => '57.6' },
              { 'count' => 1, 'dim' => 'Pen',    'sum' => 51,  'mean' => 51 },
              { 'count' => 1, 'dim' => 'PenSet', 'sum' => 61,  'mean' => 61 },
              { 'count' => 3, 'dim' => 'Pencil', 'sum' => 287, 'mean' => '95.6666666666667' }
            ],
            'dim' => 'CA' },
          { 'values' => [
              { 'count' => 4, 'dim' => 'Binder', 'sum' => 242, 'mean' => '60.5' },
              { 'count' => 1, 'dim' => 'Pen',    'sum' => 12,  'mean' => 12 },
              { 'count' => 3, 'dim' => 'PenSet', 'sum' => 205, 'mean' => '68.3333333333333' },
              { 'count' => 3, 'dim' => 'Pencil', 'sum' => 305, 'mean' => '101.666666666667' }
            ],
            'dim' => 'UK' },
  22. to Visualization

    $cube -> HASH ref -> JSON
    JSON -> d3.js -> visualization
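The "HASH ref -> JSON" step can be sketched with the core module JSON::PP. The structure below is typed in by hand, with values copied from the Country rollup result earlier in the deck, so the example does not depend on Data::Cube. The variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;    # pure-Perl JSON encoder shipped with Perl since 5.14

# A rollup-shaped structure (values copied from the Country rollup result).
my $rollup = [
    { dim => 'CA', count => 10, sum => 687 },
    { dim => 'UK', count => 11, sum => 764 },
    { dim => 'US', count => 22, sum => 1103 },
];

# canonical() sorts hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode($rollup);
print "$json\n";
```

The resulting string can be written to a file or served over HTTP for d3.js to load on the browser side.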
  23. > data = read.csv("./data/perl5census2013.csv")

        > summary(data)
         address                 experience
         Kanto          :292     1~3 years        : 76
         Kinki          : 39     10 years or more :150
         Chubu          : 23     less than 1 year : 32
         Kyushu, Okinawa: 17     4~6 years        : 93
         Hokkaido       : 12     7~9 years        : 43
         Tohoku         :  6
         (Other)        :  5

         skill
         Beginner (can hardly write code without asking someone)                            : 38
         Advanced (can write most things when I try; if stuck, I just read the source code) :131
         Intermediate (can do most of what I want, but rely on books and websites to check) :225

         frequencyatbusiness   frequencyatprivate
         Min.   : 1.000        Min.   : 1.000
         1st Qu.: 2.000        1st Qu.: 2.000
         Median : 6.000        Median : 5.000
         Mean   : 5.779        Mean   : 5.388
         3rd Qu.:10.000        3rd Qu.: 8.000
         Max.   :10.000        Max.   :10.000

         versionmanager
         System Perl (rpm, yum, preinstalled perl, etc.)                  :136
         System Perl (rpm, yum, preinstalled perl, etc.), perlbrew        : 92
         perlbrew                                                         : 76
         System Perl (rpm, yum, preinstalled perl, etc.), perlbrew, plenv : 14
         perlbrew, plenv                                                  : 12
         plenv                                                            :  9
         (Other)                                                          : 55
  24. Summary: Example

    Time series: sales, repeat rate, DAU, system info, activities.
    Effects of trial / campaign: attribution, condition, cost, cash back, etc.