The Current State and Future of Red Data Tools: Aiming for a Future Where Machine Learning Is Possible in Ruby
hatappi
November 01, 2017
Technology
RubyWorld Conference #rubyworld
Transcript
The Current State and Future of Red Data Tools: Aiming for a Future Where Machine Learning Is Possible in Ruby, by Yusaku Hatanaka @ RubyWorld Conference 2017
self.introduction({
  name: "Yusaku Hatanaka",
  twitter: "@hatappi",
  github: "hatappi",
  hatena: "hatappi1225",
  company: "Speee, Inc.",
  languages: %w(ruby go python),
  icon: " "
})
Self-introduction • Yusaku Hatanaka • Twitter, GitHub: hatappi • Hatena Blog: hatappi • Speee, Inc. • Digital Consulting Division, Ad Tech Department, UZOU Business • Engineer on the UZOU business
Agenda • The current state of machine learning in Ruby • About Red Data Tools • Current efforts • The future • Summary
I want to do machine learning in Ruby!
Not only in machine learning but in data analysis generally, exchanging data between systems is also necessary.
When it comes to data exchange: JSON? CSV?
JSON and CSV are used not only for exchanging files but also for API requests and responses. Example: a CSV written from Python and read in Ruby (write: 0.8009s, read: 0.1227s).
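The slide's measurement wrote the CSV from Python and read it in Ruby. The sketch below is a rough, Ruby-only version of the same round trip using the standard `csv` library. The 5,000-record size matches the comparison summary later in the deck. The three columns, the file name and the `elapsed` helper are assumptions for illustration, and the timings will not reproduce the slide's numbers. It also shows that CSV carries no types: values come back as Strings unless a converter is requested.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: wall-clock seconds for the given block (monotonic clock).
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical 5,000-record data set with three columns.
records = Array.new(5_000) { |i| [i, "user#{i}", (i * 0.5).round(1)] }
path = File.join(Dir.tmpdir, "records.csv")

write_s = elapsed do
  CSV.open(path, "w") do |csv|
    csv << %w[id name score]
    records.each { |r| csv << r }
  end
end

rows = nil
read_s = elapsed do
  # converters: :numeric turns "42" / "2.5" back into numbers;
  # without it every field would come back as a String.
  rows = CSV.read(path, headers: true, converters: :numeric)
end

printf("write: %.4fs read: %.4fs\n", write_s, read_s)
```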
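Processing large data with CSV or JSON: even when you only want to aggregate one specific column, you must first read everything, so there is a limit to how much data you can process. Machine learning sometimes uses large data sets of millions of records, so we want to be able to handle large data.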
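The sketch below shows why aggregating a single column still means reading everything. The three-column layout and the `price` column are hypothetical. A row-oriented CSV parser has to split every field of every line before the one value of interest is available.

```ruby
require "csv"

# Hypothetical data set: 3 columns, but we only want the sum of "price".
csv_text = CSV.generate do |csv|
  csv << %w[id name price]
  1.upto(1000) { |i| csv << [i, "item#{i}", i * 10] }
end

# Only one column is needed, yet the parser still scans and splits
# every field of every line to reach the "price" values.
fields_parsed = 0
total = 0
CSV.parse(csv_text, headers: true) do |row|
  fields_parsed += row.fields.size
  total += Integer(row["price"])
end

p total         # => 5005000 (10 + 20 + ... + 10000)
p fields_parsed # => 3000 (3 fields x 1000 rows)
```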
Apache Parquet
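What is Apache Parquet? • A columnar data format • You can extract part of the data without reading all of it • For example, extract a specific column and aggregate it • Small size • Because data is stored by column, values of the same type can be compressed together. Well suited to processing and storing large data.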
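The sketch below illustrates the two Parquet properties listed above in plain Ruby, without any Parquet library. The data and the `run_length_encode` helper are hypothetical. The first property is that a columnar layout lets one column be aggregated alone. The second is that storing same-typed values together makes them easy to compress. Run-length encoding stands in for Parquet's real encodings, which are more elaborate.

```ruby
# Row-oriented records, as a CSV file or a JSON array would hold them (hypothetical data).
rows = [
  { "country" => "JP", "year" => 2017, "sales" => 120 },
  { "country" => "JP", "year" => 2017, "sales" => 80 },
  { "country" => "JP", "year" => 2017, "sales" => 95 },
  { "country" => "US", "year" => 2017, "sales" => 200 },
  { "country" => "US", "year" => 2017, "sales" => 150 }
]

# Column-oriented layout: one array per column, so all values of a column
# (and therefore of one type) sit next to each other.
columns = rows.first.keys.to_h { |key| [key, rows.map { |row| row[key] }] }

# Aggregating one column touches only that column's array.
total_sales = columns["sales"].sum

# Run-length encoding: store each run of repeated values once, with a count.
# Parquet's actual encodings (e.g. dictionary, RLE/bit-packing) differ;
# this only shows why grouping same-typed values helps compression.
def run_length_encode(values)
  values.chunk_while { |a, b| a == b }.map { |run| [run.first, run.size] }
end

p run_length_encode(columns["country"]) # => [["JP", 3], ["US", 2]]
p run_length_encode(columns["year"])    # => [[2017, 5]]
p total_sales                           # => 645
```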
Apache Parquet. Example: a Parquet file written from Python and read in Ruby (write: 0.1390s, read: 0.0422s).
Now we can handle large volumes of data.
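Exchanging data between multiple systems: Parquet compresses data, so when data is exchanged between multiple systems during processing, there is overhead in restoring it each time. We want seamless data exchange between systems!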
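The restore overhead can be illustrated with Ruby's standard `zlib` library. This is an analogy, not Parquet itself. General-purpose Deflate stands in for Parquet's encodings and compression codecs, and the integer column is hypothetical. The receiving system cannot compute on the compressed bytes: it must inflate and re-parse them first, and every hop pays that cost again.

```ruby
require "zlib"

# Hypothetical column handed from system A to system B in compressed form.
column = Array.new(100_000) { |i| i % 100 }
text = column.join(",")
payload = Zlib::Deflate.deflate(text)

# System B must inflate and re-parse before the values are usable again.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
restored = Zlib::Inflate.inflate(payload).split(",").map(&:to_i)
decode_s = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts "text: #{text.bytesize} bytes, compressed: #{payload.bytesize} bytes"
printf("decode on the receiving side: %.4fs\n", decode_s)
```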
Apache Arrow
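What is Apache Arrow? • A format and set of algorithms for handling columnar data in memory • Serialization and deserialization cost is almost zero • Neither the sending side nor the receiving side spends CPU converting the data • Zero-copy • The receiving side does not have to spend time copying memory. Well suited to data exchange.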
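The sketch below shows the layout idea behind these properties: a fixed-width column packed into one contiguous binary buffer. The int32 column and the `value_at` helper are hypothetical, and this is not Red Arrow's or Arrow's API. Real Arrow primitive arrays also carry a validity bitmap and follow alignment and padding rules. Ruby's `unpack` copies values into Ruby Integers, so this illustrates the layout, not zero-copy sharing itself.

```ruby
# A hypothetical int32 column laid out like a columnar value buffer:
# fixed-width, little-endian, packed back to back.
values = [3, 1, 4, 1, 5, 9, 2, 6]
buffer = values.pack("l<*") # one contiguous binary String, 4 bytes per value

# The position of value i is pure arithmetic (4 * i), so a reader can jump
# straight to it; there is no delimiter scanning or text-to-number parsing.
def value_at(buffer, i)
  buffer.byteslice(i * 4, 4).unpack1("l<")
end

# The same data as CSV-style text must be scanned and parsed field by field.
text = values.join(",")
parsed = text.split(",").map { |s| Integer(s) }

p buffer.bytesize          # => 32
p value_at(buffer, 5)      # => 9
p buffer.unpack("l<*").sum # => 31
p parsed == values         # => true
```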
Apache Arrow. Example: Arrow data written from Python and read in Ruby (write: 0.0775s, read: 0.0074s).
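Summary of the read/write comparison for 5,000 records (values from the individual slides above)
Format | Write in Python (s) | Read in Ruby (s)
CSV | 0.8009 | 0.1227
Apache Parquet | 0.1390 | 0.0422
Apache Arrow | 0.0775 | 0.0074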
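The sketch below turns the slides' timings into speedups relative to CSV, to make the comparison concrete. It uses only the numbers shown in the CSV, Parquet and Arrow example slides. The ratios describe that single measurement environment and say nothing general about the formats.

```ruby
# Timings (seconds) from the slides: written in Python, read in Ruby, 5,000 records.
timings = {
  "CSV"            => { write: 0.8009, read: 0.1227 },
  "Apache Parquet" => { write: 0.1390, read: 0.0422 },
  "Apache Arrow"   => { write: 0.0775, read: 0.0074 }
}

baseline = timings["CSV"]

# Speedup = CSV time / format time, rounded to one decimal place.
speedups = timings.transform_values do |t|
  { write: (baseline[:write] / t[:write]).round(1),
    read:  (baseline[:read] / t[:read]).round(1) }
end

speedups.each do |format, s|
  puts "#{format}: write #{s[:write]}x, read #{s[:read]}x (relative to CSV)"
end
# CSV: write 1.0x, read 1.0x (relative to CSV)
# Apache Parquet: write 5.8x, read 2.9x (relative to CSV)
# Apache Arrow: write 10.3x, read 16.6x (relative to CSV)
```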
Without Apache Arrow (diagram)
With Apache Arrow (diagram)
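If Ruby can handle Parquet: (diagram: data is exchanged among multiple systems via Apache Arrow and then analyzed) Handling Parquet made data exchange possible, but it still falls short of "I want to do data analysis in Ruby!!"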
If Ruby can handle Arrow: (diagram: data is exchanged among multiple systems via Apache Arrow and then analyzed) Ruby can take on part of the data processing!!
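If Ruby can handle Arrow: by supporting Apache Arrow, you can start doing data analysis with Ruby from just the parts you need. Example: • Send data collected in Ruby to Pandas or Spark, which support Apache Arrow, for analysis; receive the results back in Ruby and visualize them in a Rails web app • Later, gradually migrate the Pandas/Spark parts to Ruby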
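Red Data Tools • A project founded in February 2017 by Sutou-san of ClearCode Inc. • A project whose goal is to provide data processing tools for Ruby • By using Apache Arrow, which many languages can share, it cooperates beyond the Ruby community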
Current efforts • Red Arrow • Apache Arrow support in existing gems • Providing new tools
Red Arrow • Ruby bindings for Apache Arrow • GitHub: red-data-tools/red-arrow • Implements the Apache Arrow bindings using Arrow GLib and gobject-introspection
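SciRuby / Ruby Numo • Projects developing gems for scientific computing and data visualization • They include daru, which handles data frames, and numo-narray, which does matrix computation • Separate gems exist for each purpose, so it would be even better if they could work together!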
PyCall • A bridge library between Ruby and Python developed by @mrkn • Lets you use Python objects from Ruby, making use of existing assets written in Python
Apache Arrow support in existing gems (diagram: SciRuby, PyCall, Ruby Numo) By adding Apache Arrow support, we can start doing data analysis with existing gems.
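Red Chainer • A Ruby port of Chainer • By writing it in Ruby while following how Chainer structures its classes and holds its parameters, we can build on existing work instead of starting from zero • By having it use numo-narray with Apache Arrow support for its internal matrices, it can also convert data to the Apache Arrow format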
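For readers unfamiliar with Chainer's design, the sketch below is a minimal define-by-run automatic differentiation in plain Ruby, loosely modeled on Chainer's split between variables and functions. Every class and method name is hypothetical: this is not Red Chainer's or Chainer's actual API. It works on scalars only and handles only tree-shaped graphs.

```ruby
# Hypothetical classes: minimal define-by-run autodiff in the spirit of
# Chainer's Variable / Function design. NOT Red Chainer's API.
class Variable
  attr_accessor :data, :grad, :creator

  def initialize(data)
    @data = data
    @grad = nil
    @creator = nil # the Function whose forward pass produced this Variable
  end

  def +(other)
    Add.new.call(self, other)
  end

  def *(other)
    Mul.new.call(self, other)
  end

  # Walk the recorded graph from the output back to the inputs.
  # Simplification: assumes each intermediate Variable feeds only one
  # Function (a tree); real frameworks order the backward pass so that
  # shared nodes are handled correctly.
  def backward
    @grad = 1.0
    queue = [creator].compact
    until queue.empty?
      fn = queue.shift
      grads = fn.backward(fn.output.grad)
      fn.inputs.zip(grads).each do |input, g|
        input.grad = input.grad.nil? ? g : input.grad + g
        queue << input.creator if input.creator
      end
    end
  end
end

class Function
  attr_reader :inputs, :output

  # Run the forward computation and remember inputs/output for backward.
  def call(*inputs)
    @inputs = inputs
    @output = Variable.new(forward(*inputs.map(&:data)))
    @output.creator = self
    @output
  end
end

class Add < Function
  def forward(a, b)
    a + b
  end

  def backward(gy)
    [gy, gy]
  end
end

class Mul < Function
  def forward(a, b)
    a * b
  end

  def backward(gy)
    a, b = inputs.map(&:data)
    [gy * b, gy * a]
  end
end

x = Variable.new(3.0)
w = Variable.new(2.0)
b = Variable.new(1.0)
y = x * w + b # the forward pass records the graph as it runs
y.backward

p y.data # => 7.0
p x.grad # => 2.0 (dy/dx = w)
p w.grad # => 3.0 (dy/dw = x)
p b.grad # => 1.0
```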
MNIST
The future of Red Data Tools • Continue adding Apache Arrow support to existing gems • Build new tools such as Red Chainer. We want data analysis to be possible across Ruby tools.
The future of Red Data Tools: taking part in the development of Apache Arrow itself (diagram: Apache Arrow, C, etc.)
Summary • By supporting Apache Arrow, you can start doing data analysis in Ruby from just the parts you need • Step by step, toward a future where data analysis is possible in Ruby!
Announcements
We hold development events • Held regularly every month • Venue: Speee, Tokyo • Anyone is welcome, including people who are just getting started with data analysis!
Gitter • Japanese: https://gitter.im/red-data-tools/ja • English: https://gitter.im/red-data-tools/en