Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Rubyで機械学習が出来る_未来を目指すRed_Data_Toolsの_現状と今後について...
Search
hatappi
November 01, 2017
0
82
Rubyで機械学習が出来る_未来を目指すRed_Data_Toolsの_現状と今後について.pdf
hatappi
November 01, 2017
Tweet
Share
More Decks by hatappi
See All by hatappi
Cloudflare を活用して変わったメルカリの開発体験 / How Cloudflare Changed Mercari's Development Experience
hatappi
1
630
RubyではじめるGraphQL
hatappi
0
850
RubyでChainerつくってます!!
hatappi
2
1.4k
TDDな個人開発
hatappi
0
310
できるだけ楽して楽しくRails開発しよう
hatappi
2
330
EKSにRailsをのせた
hatappi
1
1.2k
RubyとApache Arrow
hatappi
0
2.4k
Red Chainerを なぜ作って今後どうするのか
hatappi
2
2.4k
Fargateで夢は見られるのか
hatappi
1
2.2k
Featured
See All Featured
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.4k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Practical Orchestrator
shlominoach
188
11k
Typedesign – Prime Four
hannesfritz
42
2.7k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
48
2.9k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.7k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
730
Making the Leap to Tech Lead
cromwellryan
134
9.4k
GraphQLとの向き合い方2022年版
quramy
49
14k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Done Done
chrislema
184
16k
Transcript
RubyͰػցֶश͕ग़དྷΔ ະདྷΛࢦ͢Red Data Toolsͷ ݱঢ়ͱࠓޙʹ͍ͭͯ by Yusaku Hatanaka @RubyWorld Conference
2017 1
self.intoroduction { name: "Yusaku Hatanaka", twitter: "@hatappi", github: "hatappi", hatena:
"hatappi1225", company: "Speee, Inc." languages: %w(ruby go python), icon: "ɹ " } 2
ࣗݾհ w ാத༔࡞ w 5XJUUFSɺ(JUIVCIBUBQQJ ͯͳϒϩάIBUBQQJ w גࣜձࣾ4QFFF w σδλϧίϯαϧςΟϯάࣄۀຊ෦
ΞυςΫࣄۀ෦6;06ࣄۀ w 6;06ࣄۀΤϯδχΞ
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 4
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 5
6 RubyͰ ػցֶश͍ͨ͠ʂ
7 ػցֶशʹݶΒͣ σʔλੳʹ͓͍ͯ σʔλ࿈ܞඞཁͱͳΔ
8 σʔλ࿈ܞͱ͍͏ͱ JSONCSV ??
JSONCSV ϑΝΠϧͰͷΓͱΓ͚ͩͰͳ͘APIͰͷϦΫΤε τɾϨεϙϯεͳͲͰ༻͞ΕΔ 9 PythonͰग़ྗͨ͠CSVΛRubyͰಡΈࠐΉྫ write: 0.8009s read: 0.1227s
CSVJSONͰ େ͖ͳσʔλΛॲཧ ಛఆͷྻͰूܭॲཧΛ͍ͨ͠߹Ͱ Ұͯ͢ΛಡΈࠐΉඞཁ͕͋ΔͷͰ େ͖ͳσʔλΛॲཧ͢Δʹݶք͕͋Δ 10 ػցֶशͰඦສͷେ͖ͳσʔληοτΛ͏͜ͱ ͋ΔͷͰେ͖ͳσʔλΛѻ͑ΔΑ͏ʹͳΓ͍ͨ
11 Apache Parquet
Apache Parquetͱ • ΧϥϜܕͷσʔλϑΥʔϚοτ • શ෦ΛಡΈࠐ·ͣͱσʔλͷҰ෦ΛऔΓग़ͤΔ • ྫ͑ಛఆͷྻΛऔΓग़ͯ͠ूܭ͢Δ • αΠζ͕খ͍͞
• ΧϥϜͰσʔλΛ֨ೲ͢ΔͨΊಉ͡ܕͷσʔλΛѹॖ͢ Δ͜ͱ͕ग़དྷΔ 12 େ͖ͳσʔλΛॲཧͯ͠อଘ͢Δ͜ͱʹ͍͍ͯΔ
Apache Parquet 13 PythonͰग़ྗͨ͠ParquetΛRubyͰಡΈࠐΉྫ write: 0.1390s read: 0.0422s
14 େྔͷσʔλΛ ѻ͑ΔΑ͏ʹͳͬͨ
ෳͷγεςϜؒͰͷ σʔλΛ࿈ܞ ParquetσʔλΛѹॖ͢ΔͨΊσʔλॲཧ͢Δաఔ ͰෳγεςϜؒͰσʔλ࿈ܞΛߦ͏ࡍͦΕΛݩʹ ͢Φʔόʔϔου͕ੜ͡Δ 15 γʔϜϨεʹγεςϜؒͰσʔλ࿈ܞ͍ͨ͠!
16 Apache Arrow
Apache Arrowͱ • ΠϯϝϞϦͰΧϥϜܕσʔλΛѻ͏ͨΊͷϑΥʔϚοτͱΞϧΰ ϦζϜ • γϦΞϥΠζɾσγϦΞϥΠζίετ͕΄΅0 • σʔλΛ͢ଆड͚औΔଆCPUΛΘͳ͍ •
θϩίϐʔ • σʔλΛड͚औΔଆϝϞϦίϐʔʹ࣌ؒΛΘͳͯ͘Α͍ 17 σʔλަʹΉ͍͍ͯΔ
Apache Arrow 18 PythonͰग़ྗͨ͠ParquetΛRubyͰಡΈࠐΉྫ write: 0.0775s read: 0.0074s
19 1ZUIPOͰͷॻ͖ࠐΈ T 3VCZͰͷಡΈࠐΈ T $47 "QBDIF1BSRVFU
"QBDIF"SSPX 5000Ϩίʔυʹର͢Δ ಡΈॻ͖ൺֱ·ͱΊ
Apache Arrow͕ͳ͍࣌ 20
Apache Arrow͕͋Δ࣌ 21
RubyͰParquetΛѻ͑Δͱ 22 ෳͷγεςϜؒ"QBDIF "SSPXͰσʔλΛ࿈ܞΛߦ͍ σʔλੳΛߦ͏ ParquetΛѻ͏͜ͱͰσʔλͷ࿈ܞग़དྷͨ ͔͠͠RubyͰσʔλੳ͕͍ͨ͠ʂʂʹಧ͔ͳ͍
RubyͰArrowΛѻ͑Δͱ 23 ෳͷγεςϜؒ"QBDIF "SSPXͰσʔλΛ࿈ܞΛߦ͍ σʔλੳΛߦ͏ RubyͰσʔλॲཧͷҰ෦Λ୲͏͜ͱ͕ग़དྷΔ!!
3VCZͰ"SSPXΛѻ͑Δͱ 24 "QBDIF"SSPXʹରԠ͢Δ͜ͱͰඞཁͳ෦͔Β 3VCZΛͬͨσʔλੳΛ͡ΊΔ͜ͱ͕ग़དྷΔ w 3VCZͰूΊͨσʔλΛ"QBDIF"SSPXʹରԠͯ͠ ͍Δ1BOEBT4QBSLʹ࿈ܞ͠ੳͨ݁͠ՌΛ3VCZ Ͱड͚औͬͯ3BJMTΛͬͨXFCΞϓϦͰՄࢹԽ w ޙʹ1BOEBT4QBSL෦Λঃʑʹ3VCZҠߦ͢
Δ͜ͱग़དྷΔ ྫ
25
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 26
Red Data Tools • גࣜձࣾΫϦΞίʔυͷਢ౻͞Μ͕20172݄ʹ ϓϩδΣΫτΛઃཱ • Ruby༻ͷσʔλॲཧπʔϧΛఏڙ͢Δ͜ͱΛత ͱͨ͠ϓϩδΣΫτ •
ଟ͘ͷݴޠ͕ڞ௨ͯ͠༻Ͱ͖ΔApache ArrowΛ ༻͢Δ͜ͱͰRubyίϛϡχςΟʔΛ͑ͯڠྗ ͢Δ 27
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 28
ݱঢ়ͷऔΓΈ • Red Arrow • طଘgemͷArrowԽ • ৽͍͠πʔϧΛఏڙ 29
ݱঢ়ͷऔΓΈ • Red Arrow • طଘgemͷArrowԽ • ৽͍͠πʔϧΛఏڙ 30
Red Arrow w "QBDIF"SSPXͷ3VCZόΠϯσΟϯά w (JU)VCSFEEBUBUPPMTSFEBSSPX w "SSPX(-JCͱHPCKFDUJOUSPTQFDUJPOΛͬͯ "QBDIF"SSPXͷόΠϯσΟϯάΛ࣮ݱ͍ͯ͠Δ 31
ݱঢ়ͷऔΓΈ • Red Arrow • طଘgemͷArrowԽ • ৽͍͠πʔϧΛఏڙ 32
SciRubyɾRuby Numo • Պֶٕज़ܭࢉɺσʔλՄࢹԽ༻్ͷGem܈Λ։ൃ ͍ͯ͠ΔϓϩδΣΫτ • σʔλϑϨʔϜΛѻ͑Δdaruߦྻܭࢉ͕ग़དྷΔ numo-narryͳͲ͕͋Δ • ֤༻్Ͱݸผͷgemଘࡏ͢Δ͔Β
ͦΕΒ͕࿈ܞͰ͖ΔΑ͏ʹͳΔͱ ͬͱΑ͍ʂ 33
PyCall • @mrkn͞Μ͕࡞͞Ε͍ͯΔRubyͱPythonͷϒ ϦοδϥΠϒϥϦ • PythonͰ࡞͞Εͨطଘͷࢿ࢈Λͬͯ PythonͷΦϒδΣΫτΛ RubyͰ༻͢Δ͜ͱ͕ग़དྷΔ 34
طଘgemͷArrowԽ 35 4DJ3VCZ 1Z$BMM "QBDIF"SSPXʹରԠͤ͞Δ͜ͱͰ طଘͷHFNΛ༻ͯ͠σʔλੳΛ͡ΊΒΕΔ 3VCZ/VNP
ݱঢ়ͷऔΓΈ • Red Arrow • طଘgemͷArrowԽ • ৽͍͠πʔϧΛఏڙ 36
Red Chainer • ChainerΛRubyϙʔςΟϯάͨ͠ͷ • ChainerͷΫϥεύϥϝʔλͷ࣋ͪํΛࢀߟʹ RubyͰॻ͘͜ͱͰ0͔Β࡞ΔͷͰͳ͘طଘͷࢿ࢈ Λ׆͔ͯ͠࡞Δ͜ͱ͕ग़དྷΔ • ෦ͷྻʹApache
ArrowʹରԠͨ͠numo- narrayͤ͞Δ͜ͱͰApache Arrowܗࣜʹม͢Δ ͜ͱՄೳ 37
38 MNIST
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 39
Red Data Toolsͷࠓޙ • Ҿ͖ଓ͖طଘͷgemͷAapche ArrowͷରԠߦ ͏ • Red ChainerͷΑ͏ͳ৽͍͠πʔϧͷ࡞
40 3VCZؒͰσʔλੳ͕ ग़དྷΔΑ͏ʹͳΓ͍ͨ
Red Data Toolsͷࠓޙ Apache Arrowຊମͷ։ൃͷࢀՃ 41 "QBDIF"SSPX $ FUD
Agenda • Rubyʹ͓͚Δػցֶशͷݱঢ়ʹ͍ͭͯ • Red Data Toolsʹ͍ͭͯ • ݱঢ়ͷऔΓΈ •
ະདྷʹ͍ͭͯ • ·ͱΊ 42
·ͱΊ w "QBDIF"SSPXʹରԠ͢Δ͜ͱͰඞཁͳ෦͔Β 3VCZͰσʔλੳΛ͡ΊΔ͜ͱ͕ग़དྷΔ w গͮͭ͠Ͱ3VCZͰσʔλੳ͕ग़དྷΔະདྷʂ 43
44 ࠂ
։ൃΠϕϯτͬͯ·͢ 45 w ຖ݄ճͷఆظ։࠵ w ॴ4QFFFͰߦͬͯ·͢!౦ژຊ w σʔλੳ͜Ε͔ΒͷਓͰͲͳͨͰ0,Ͱ͢ʂ
Gitter 46 w ຊޠɹIUUQTHJUUFSJNSFEEBUBUPPMTKB w &OHMJTIIUUQTHJUUFSJNSFEEBUBUPPMTFO