Exploratory Hour #41: I Want to Summarize Data
Takato Shiroto
December 03, 2019
These are the slides from the Exploratory Hour held on December 3, 2019.
They introduce the basic usage of Summarize and the functions most often used with it.
Transcript
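EXPLORATORY

Speaker: Takato Shiroto, Customer Success, EXPLORATORY

Biography: While at university, he founded and led a student organization to reduce food loss. Afterwards, to learn about business, he worked in sales and marketing at the chemical manufacturer DuPont and at a food-tech startup. Feeling that data science was necessary for apps, he joined a company that develops user behavior analytics tools specialized for apps, where he was responsible for work such as KPI analysis for the app industry. He currently works in customer success at Exploratory, Inc., while promoting data science with a focus on data visualization and exploratory data analysis. @ShirotoTakato

Exploratory Hour

This week's question: I want to summarize data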
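Create Calculation (Mutate) returns a calculation result for every row.
Example: Create Calculation (Mutate)

Input:
Customer Name   Sales
Mike            34
Mike            26
Mike            36

Output:
Customer Name   Counts   Sales_avg
Mike            3        32
Mike            3        32
Mike            3        32

Summarize collapses each group and returns the calculation result as one row per group.
Example: Summarize

Input:
Customer Name   Sales
Mike            34
Mike            26
Mike            36

Output:
Customer Name   counts   Sales_avg
Mike            3        32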
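The difference between the two operations can be sketched in plain Python. This is only an illustration of the logic on the slide data, not how Exploratory implements it; the variable names `rows`, `groups`, `mutated`, and `summarized` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Sample rows from the slides: (customer name, sales)
rows = [("Mike", 34), ("Mike", 26), ("Mike", 36)]

# Collect the sales values of each group (customer).
groups = defaultdict(list)
for name, sales in rows:
    groups[name].append(sales)

# Mutate-style: one output row per input row, each carrying its group's result.
mutated = [(name, len(groups[name]), mean(groups[name])) for name, _ in rows]

# Summarize-style: one output row per group.
summarized = [(name, len(values), mean(values)) for name, values in groups.items()]

print(mutated)     # three rows, all (Mike, 3, 32)
print(summarized)  # a single row (Mike, 3, 32)
```

The mutate-style result keeps the original three rows, while the summarize-style result has exactly one row for Mike.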
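Summarize

Summary functions

Function     Description
sum          Sum of the values in the group
n            Number of rows in the group
n_distinct   Number of unique values in the group
mean         Mean of the group
median       Median of the group
min          Minimum value of the group
max          Maximum value of the group
first        First value in the group
last         Last value in the group
nth          Nth value in the group
sd           Standard deviation of the group
var          Variance of the group
IQR          Interquartile range of the group (from the 75th to the 25th percentile)
mad          Mean absolute deviation of the group
na_count     Number of missing values in the group
na_percent   Percentage of missing values in the group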
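For readers who want to check what most of these functions compute, here is a rough standard-library Python sketch applied to a single group. It is an approximation for illustration only: the lists `values` and `with_missing` are made up for this example, the quartile convention used by Exploratory is not stated on the slides (the "inclusive" method is assumed here), and `mad` and `na_percent` are left out because the slides do not define them precisely.

```python
from statistics import mean, median, quantiles, stdev, variance

values = [34, 26, 36]          # one group's values, in row order
with_missing = [34, None, 36]  # hypothetical group containing one missing value

# Quartiles with the "inclusive" convention (an assumption for this sketch).
q1, _, q3 = quantiles(values, n=4, method="inclusive")

summary = {
    "sum": sum(values),
    "n": len(values),                # number of rows
    "n_distinct": len(set(values)),  # number of unique values
    "mean": mean(values),
    "median": median(values),
    "min": min(values),
    "max": max(values),
    "first": values[0],
    "last": values[-1],
    "nth (N=2)": values[1],          # N is 1-based in this sketch
    "sd": stdev(values),             # sample standard deviation
    "var": variance(values),         # sample variance
    "IQR": q3 - q1,
    "na_count": sum(v is None for v in with_missing),
}

for name, result in summary.items():
    print(f"{name}: {result}")
```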
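Summarize: topics
• Basic summary functions (number of rows, sum, mean, number of unique values)
• Finding the most frequent value with the mode function
• The difference between the first (last) value and the minimum (maximum) value
• Summary functions for logical columns (number of TRUE, ratio of TRUE)

Sales data

What we want to summarize for each CustomerID:
• Number of orders
• Kinds of products purchased
• Total sales
• Country
• Customer start date
• Number of returns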
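Select Summarize from the column header menu of the step.
The Summarize dialog appears.
Select Number of Rows as the value. The preview shows the selected value; it can display 5,000 sampled rows.
Select Customer ID for Group By. The rows are now grouped by Customer ID, and the number of rows is summarized for each group.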
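Grouping by customer and counting rows corresponds to a simple tally. The sketch below uses made-up customer and order IDs (the slides do not show the actual sales data) and is not Exploratory's own code.

```python
from collections import Counter

# Hypothetical order rows: (customer ID, order ID), one row per order.
orders = [
    ("C-001", "O-1"),
    ("C-001", "O-2"),
    ("C-002", "O-3"),
    ("C-001", "O-4"),
]

# Group by customer ID and count the rows in each group.
rows_per_customer = Counter(customer_id for customer_id, _ in orders)

print(dict(rows_per_customer))
```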
How should we count the kinds of products purchased?

Customer Name   Product
Mike            Pen
Mike            Notebook
Mike            Desk
Tom             Notebook
Tom             Pen
Tom             Notebook

Using the number of unique values gives the number of kinds without counting duplicated values: Mike bought 3 kinds and Tom bought 2 kinds.
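Click the plus button to add a value.
Select Sub-Category as the value and Number of Unique Values (unique) as the summary function. The number of kinds of sub-categories purchased is now summarized.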
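Counting unique values per group can be sketched with a set per customer. The data is the pen, notebook, and desk example from the slides; the variable names are assumptions for illustration.

```python
from collections import defaultdict

# Rows from the slide example: (customer name, product)
purchases = [
    ("Mike", "pen"), ("Mike", "notebook"), ("Mike", "desk"),
    ("Tom", "notebook"), ("Tom", "pen"), ("Tom", "notebook"),
]

# A set keeps each product once, so duplicates are not counted twice.
products_by_customer = defaultdict(set)
for name, product in purchases:
    products_by_customer[name].add(product)

kinds = {name: len(products) for name, products in products_by_customer.items()}
print(kinds)  # {'Mike': 3, 'Tom': 2}
```

Tom bought a notebook twice, but it contributes only one kind to his count.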
Select Sales as the value and Sum as the summary function. Total sales are now summarized for each customer.
Country is a categorical column, so can it not be summarized?

Customer Name   Country
Mike            China
Mike            Japan
Mike            Japan
Tom             US
Tom             Japan
Tom             US

Customer Name   Country
Mike            ???
Tom             ???

Mode (the most frequent value)

Customer Name   Country_mode
Mike            Japan
Tom             US

Select Country as the value and Mode as the summary function. The country where each customer placed the most orders has been added.
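The most frequent value per group can be sketched with `statistics.mode` from the Python standard library. This mirrors the slide example only; how Exploratory breaks ties is not stated in the slides, whereas Python 3.8 and later return the first mode encountered.

```python
from collections import defaultdict
from statistics import mode

# Rows from the slide example: (customer name, country)
rows = [
    ("Mike", "China"), ("Mike", "Japan"), ("Mike", "Japan"),
    ("Tom", "US"), ("Tom", "Japan"), ("Tom", "US"),
]

# Collect the countries of each customer.
countries = defaultdict(list)
for name, country in rows:
    countries[name].append(country)

# The mode works on categorical values, unlike sum or mean.
country_mode = {name: mode(values) for name, values in countries.items()}
print(country_mode)  # {'Mike': 'Japan', 'Tom': 'US'}
```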
We want to find the date of each customer's first purchase.
Select Order Date as the value and First Value (first) as the summary function.
Is the first value really what we want?
This data is not sorted in ascending order of order date.

First value vs. minimum value

Customer Name   Order Date
Mike            2015-12-3
Mike            2014-7-14
Mike            2015-12-24

With First Value (first):
Customer Name   Start Date
Mike            2015-12-3

With Minimum (min):
Customer Name   Start Date
Mike            2014-7-14

Select Order Date as the value and Minimum (min) as the summary function. The date of each customer's first order is now summarized.
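The difference between the first value and the minimum value shows up whenever the rows are not sorted by date. A small Python sketch using Mike's three order dates from the slides (variable names are assumptions):

```python
from datetime import date

# Mike's order dates in the order the rows appear in the data (not sorted).
order_dates = [date(2015, 12, 3), date(2014, 7, 14), date(2015, 12, 24)]

first_value = order_dates[0]  # depends on row order
min_value = min(order_dates)  # earliest date regardless of row order

print(first_value)  # 2015-12-03
print(min_value)    # 2014-07-14
```

Only the minimum is safe as a customer start date unless the data is known to be sorted by order date.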
The Returned column is a logical column that takes TRUE or FALSE.

Summary functions for logical columns
• Number of TRUE (number of FALSE)
• Ratio of TRUE (ratio of FALSE)

Customer Name   Returned
Mike            TRUE
Mike            FALSE
Mike            TRUE

With Number of TRUE:
Customer Name   Returns
Mike            2

With Ratio of TRUE (%):
Customer Name   Returns
Mike            0.6666

Select Returned as the value and Number of TRUE as the summary function.
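Both logical summaries follow from treating TRUE as 1 and FALSE as 0. A minimal Python sketch with Mike's three rows from the slides (variable names are assumptions):

```python
# Mike's Returned flags from the slide example.
returned = [True, False, True]

# True counts as 1 and False as 0, so the sum is the number of TRUE values.
true_count = sum(returned)

# Dividing by the number of rows gives the ratio of TRUE values.
true_ratio = true_count / len(returned)

print(true_count)
print(round(true_ratio, 4))
```

The count is 2 and the ratio is two thirds, which the slide displays as 0.6666.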
The summary is complete.
Q&A
Contact Email
[email protected]
Twitter hashtag: tweet with #ExploratoryHour!
Twitter: @ExploratoryJp
Exploratory Hour: https://bit.ly/30odd9q