SKL 2019 Intern Training Python Data Analysis
Adam
March 05, 2019
An introduction to data analysis and visualization for the 2019 interns.
Transcript
Data Analysis in Python: Introduction to Data Analysis & Hands-on Python
Yesterday recap! ‣ IDE In Python Development ‣ Markdown ‣
Data type ‣ Control Flow ‣ Exception ‣ Function ‣ Class ‣ Module ‣ Python Debugger
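First, today's recommended learning materials
Why do data analysis? It helps us make better decisions. - me
"Therefore the sage, seeing what goes out, knows what will come in; observing the past, knows the future. This is the principle by which he knows in advance." - Liezi, "Shuo Fu"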
TODAY’S AGENDA 1. Exploratory Data Analysis 2. Data Visualization 3. How to Read Package Documentation? 4. Pandas in Data Analysis 5. Homework
How would you evaluate a dataset? 10 records: the mean. 100 records: the mean. 1,000,000 records: the mean?????
Distribution, distribution, distribution!
Descriptive statistics: Min, 12.5%, 25%, Median, 75%, 87.5%, Max; Midhinge, Mideight; Counts, Mean, STD
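The statistics listed on this slide can be computed with pandas. The following is a minimal sketch on made-up values, not the deck's own code. "Midhinge" is the average of the 25% and 75% quantiles; the slide does not define "Mideight", so the code assumes it means the average of the 12.5% and 87.5% quantiles.

```python
import pandas as pd

# Made-up sample values, for illustration only.
values = pd.Series([12, 15, 18, 21, 24, 27, 30, 33, 36], name="sample")

# describe() reports count, mean, std, min, max and the requested percentiles.
summary = values.describe(percentiles=[0.125, 0.25, 0.5, 0.75, 0.875])

# Midhinge: the average of the first and third quartiles.
midhinge = (values.quantile(0.25) + values.quantile(0.75)) / 2

# Assumption: "Mideight" is the average of the 12.5% and 87.5% quantiles.
mideight = (values.quantile(0.125) + values.quantile(0.875)) / 2

print(summary)
print("midhinge:", midhinge, "mideight:", mideight)
```

On a symmetric sample like this one, the median, midhinge and mean coincide; on skewed data they drift apart, which is one reason to look at more than the mean.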
Null Data
Outlier
Correlation
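The three checks named on the slides above (null data, outliers, correlation) can be sketched in pandas as follows. The data and column names are made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not necessarily the one used in the lecture.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; the column names are hypothetical.
df = pd.DataFrame({
    "area": [50.0, 55.0, 60.0, np.nan, 65.0, 70.0, 300.0],
    "price": [100.0, 110.0, 120.0, 125.0, 130.0, 140.0, 600.0],
})

# Null data: count missing values per column.
null_counts = df.isna().sum()

# Outliers: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = df["area"].quantile(0.25), df["area"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["area"] < q1 - 1.5 * iqr) | (df["area"] > q3 + 1.5 * iqr)

# Correlation: pairwise Pearson correlation; rows with missing values are skipped.
corr = df.corr()

print(null_counts)
print(df[is_outlier])
print(corr)
```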
Data Visualization
Why do we visualize data? - Distribution, distribution, distribution!!!
Figure: Google search interest for Taiwan's life insurance companies over the past year. Y axis: search interest (volume), 0 to 100. X axis: search date, 2018-03-04 to 2019-02-03. Series: Shin Kong Life (新光人壽), Cathay Life (國泰人壽), Fubon Life (富邦人壽), Taiwan Life (台灣人壽), all for Taiwan.
How do we read a chart? 1. Read the title first 2. Axis names 3. Axis units 4. Axis ranges 5. The chart's pattern and legend
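The same five-point checklist can be turned around and used when drawing a chart. The sketch below uses matplotlib with synthetic values and hypothetical series names (it does not reproduce the data in the figure), and sets each element a reader is expected to check.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, so no display is required.
import matplotlib.pyplot as plt

# Synthetic values for two hypothetical series; not the data from the slide.
dates = ["2018-03-04", "2018-03-25", "2018-04-15", "2018-05-06"]
series = {"Company A": [40, 55, 50, 70], "Company B": [30, 35, 45, 40]}

fig, ax = plt.subplots(figsize=(8, 4))
for label, values in series.items():
    ax.plot(dates, values, marker="o", label=label)

ax.set_title("Search interest by company (synthetic data)")  # 1. title
ax.set_xlabel("Search date")                                 # 2. axis name
ax.set_ylabel("Search interest (index, 0-100)")              # 3. axis unit
ax.set_ylim(0, 100)                                          # 4. axis range
ax.legend()                                                  # 5. legend
fig.tight_layout()
```

Calling `fig.savefig("chart.png")` afterwards writes the figure to disk; the file name here is only an example.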
What are the commonly used charts?
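Line Chart (example: values from 0 to 100, April to July). Pros: typically used to assess trends that change over time. Cons: simple structure; usually has to be compared against other information.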
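Bar Chart (example: values from 0 to 100, April to July). Pros: typically used to assess how often events occur. Cons: simple structure; usually has to be compared against other information.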
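Pie Chart (example slices: 7%, 8%, 10%, 11%, 29%, 35%). Pros: typically used to assess each event's share of the whole. Cons: simple structure; usually has to be compared against other information.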
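Histogram. Pros: typically used to assess the frequency distribution of events; one of the most common charts in statistics. Cons: the bin width has to be tuned; the positions of key statistics are not visible.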
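To see why the binning matters, the sketch below bins the same synthetic sample coarsely and finely with `numpy.histogram`. The data, seed and bin counts are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sample made of two clusters, for illustration only.
data = np.concatenate([rng.normal(20, 2, 500), rng.normal(30, 2, 500)])

# The same data with a coarse and a fine binning.
coarse_counts, coarse_edges = np.histogram(data, bins=2)
fine_counts, fine_edges = np.histogram(data, bins=20)

print("2 bins: ", coarse_counts)
print("20 bins:", fine_counts)

# The histogram itself does not mark statistics such as the mean.
print("mean:", data.mean())
```

With only two bins, the two-cluster shape is reduced to a pair of bars; the finer binning shows more of the structure, at the cost of noisier counts per bin.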
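Scatter Plot. Pros: typically used to assess the relationship between two factors. Cons: usually needs a regression line alongside it to show the relationship.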
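The regression line that usually accompanies a scatter plot can be obtained with a least-squares fit. This is a minimal sketch using `numpy.polyfit` on made-up points; the lecture may have used a different tool.

```python
import numpy as np

# Made-up points for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 least-squares fit: y is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation coefficient between the two factors.
r = np.corrcoef(x, y)[0, 1]

print(f"y = {slope:.2f} * x + {intercept:.2f}, r = {r:.3f}")
```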
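Boxplot. Pros: another common chart for looking at distributions; it shows the descriptive statistics clearly. Cons: slightly weaker than a histogram at conveying the overall shape; the two can be cross-checked against each other.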
Heatmap. Pros: typically used to see the strength of correlation between factors. Cons: only suited to data whose strength can be clearly evaluated.
How to read package documentation?
Python in Jupyter
Recap • How to analyze data • How to visualize data and get insight • How to use documentation • How to explore data using Python
Introduce Kaggle https://www.kaggle.com
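Homework - Data Exploration Marathon!
Objective: practice exploring data with Python and build critical thinking about data.
Dataset: Boston House Dataset from Kaggle
1. Perform EDA on each of the (78) features and look at their descriptive statistics (see the table below!).
2. Check whether each feature is related to the house price. Pick 15 features you think are related and 10 you think are not, and write down the reasons for your judgment.
3. Compile the information above into a single Excel sheet; it will be reviewed on Thursday.

| Feature | Chinese name | Data Description | Null Qty | DataType | Unit | Mode | Frequency(%) | Counts | Mean | STD | Min | 25% | Median | 75% | Max | Relationship? | Reason |
| GrLivArea | 生活區域面積 (living area) | Size of the area | 100 | Numerical | m^2 | | | 35872 | 35 | 1.36 | 15 | 25 | 35 | 45 | 50 | Yes | Buyers always look at the indoor floor area |
| Neighborhood | 社區位置 (neighborhood location) | Describes the area the house is in | 3000 | Categorical | qty | A | 75 | 32872 | | | | | | | | Yes | The location of a house is very important |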
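Most of the numeric columns in the homework table can be filled in programmatically. The sketch below is one possible approach, not the expected solution: `summarize` is a hypothetical helper, the input frame is made up, and "Frequency(%)" is assumed to mean the mode's share of the non-null values. The judgment columns (Relationship?, Reason) still have to be written by hand.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Build one summary row per column, loosely following the homework table."""
    rows = []
    for name in df.columns:
        col = df[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        row = {
            "Feature": name,
            "Null Qty": int(col.isna().sum()),
            "DataType": "Numerical" if numeric else "Categorical",
            "Counts": int(col.count()),  # non-null values only
        }
        if numeric:
            row.update({
                "Mean": col.mean(), "STD": col.std(), "Min": col.min(),
                "25%": col.quantile(0.25), "Median": col.median(),
                "75%": col.quantile(0.75), "Max": col.max(),
            })
        else:
            # Assumption: Frequency(%) is the mode's share of non-null values.
            counts = col.value_counts()
            row["Mode"] = counts.index[0]
            row["Frequency(%)"] = 100 * counts.iloc[0] / col.count()
        rows.append(row)
    return pd.DataFrame(rows).set_index("Feature")

# Tiny made-up frame; the column names mirror the two example rows in the table.
houses = pd.DataFrame({
    "GrLivArea": [120.0, 150.0, None, 180.0],
    "Neighborhood": ["A", "A", "B", "A"],
})
table = summarize(houses)
print(table)
```

The result can then be written out with `table.to_excel(...)`, which requires an Excel writer package such as openpyxl to be installed.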