Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma
Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
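The two possible answers above can be illustrated with a minimal Bloom filter sketch. The slides do not show an implementation, so the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for readability, not tuned values.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers "definitely not seen" or "possibly seen"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Assumed sizes for illustration only.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not memory efficient).
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False -> the item does not exist for sure.
        # True  -> the item may or may not exist (false positives happen).
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:
    seen.add(event_id)

empty = BloomFilter()
```

For de-duplication, a `False` from `might_contain` means the record can be accepted without a lookup; a `True` still needs a check against the real store, because it may be a false positive.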
Slowly Changing Dimensions
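One common way to handle a dimension that changes over time is to keep every version of a row with a validity interval (often called a "Type 2" approach). The slides do not say which approach the talk used, so the sketch below is an assumption; the function name `apply_change` and the field names `valid_from` / `valid_to` are hypothetical.

```python
from datetime import date


def apply_change(history, key, new_attrs, effective):
    """Record a new version of a dimension row, closing the previous one.

    history is a list of dicts; the open (current) version has valid_to=None.
    """
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, keep the current version open
        current["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def attrs_as_of(history, key, when):
    """Return the attributes that were valid for key on the given date."""
    for row in history:
        ends = row["valid_to"]
        if row["key"] == key and row["valid_from"] <= when and (ends is None or when < ends):
            return row["attrs"]
    return None


customers = []
apply_change(customers, "c1", {"city": "Pune"}, date(2016, 1, 1))
apply_change(customers, "c1", {"city": "Delhi"}, date(2017, 6, 1))
```

Facts recorded in 2016 can then be joined to the attributes that were valid in 2016, rather than to whatever the row says today.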
Batching vs Streaming
Out-of-Order Processing
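A minimal sketch of one way to cope with out-of-order events: buffer them and release them in event-time order once a watermark (the largest event time seen, minus an allowed delay) has passed. This is an illustration under assumptions, not the design from the talk; `ReorderBuffer` and `max_delay` are hypothetical names.

```python
import heapq


class ReorderBuffer:
    """Buffers events and emits them in event-time order behind a watermark."""

    def __init__(self, max_delay):
        self.max_delay = max_delay  # how late an event may arrive, in time units
        self._heap = []
        self._max_seen = None
        self._seq = 0  # tie-breaker so payloads are never compared

    def push(self, event_time, payload):
        """Accept one event; return the events that are now safe to emit."""
        self._seq += 1
        heapq.heappush(self._heap, (event_time, self._seq, payload))
        if self._max_seen is None or event_time > self._max_seen:
            self._max_seen = event_time
        watermark = self._max_seen - self.max_delay
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            t, _, p = heapq.heappop(self._heap)
            ready.append((t, p))
        return ready

    def flush(self):
        """Emit everything still buffered, in event-time order."""
        ready = []
        while self._heap:
            t, _, p = heapq.heappop(self._heap)
            ready.append((t, p))
        return ready


buf = ReorderBuffer(max_delay=2)
emitted = []
for event_time, payload in [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (5, "e")]:
    emitted.extend(buf.push(event_time, payload))
emitted.extend(buf.flush())
```

An event that arrives more than `max_delay` behind the newest event is still emitted here, but after later events; a real pipeline would have to decide whether to drop it, route it elsewhere, or reprocess.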
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
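The cube operations listed above can be sketched over a tiny in-memory fact table. The data, the dimension names (`region`, `product`, `quarter`), and the helper names are made up for illustration; a real cube would precompute these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure ("sales").
facts = [
    {"region": "north", "product": "tea", "quarter": "Q1", "sales": 10},
    {"region": "north", "product": "coffee", "quarter": "Q1", "sales": 20},
    {"region": "south", "product": "tea", "quarter": "Q1", "sales": 5},
    {"region": "south", "product": "tea", "quarter": "Q2", "sales": 7},
    {"region": "north", "product": "tea", "quarter": "Q2", "sales": 3},
]


def view(rows, dimensions, measure="sales"):
    """A view is a set of dimensions plus an aggregated measure."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dimension] == value]


def dice_rows(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


by_region = view(facts, ["region"])
q1_only = view(slice_rows(facts, "quarter", "Q1"), ["region", "product"])
north_tea = view(dice_rows(facts, region={"north"}, product={"tea"}), ["quarter"])
# Rotate (pivot): same data, dimensions viewed in a different order.
rotated = view(facts, ["product", "region"])
```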
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma
@meson10
Oogway Consulting
http://oogway.in