Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
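The two answers a Bloom filter can give can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the bit-array size, the number of hashes, the salted SHA-256 hashing scheme, and the `evt-*` identifiers are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; sizes are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was definitely never added.
        # True: the item may or may not have been added.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for event_id in ["evt-1", "evt-2", "evt-3"]:  # hypothetical event ids
    seen.add(event_id)

print(seen.might_contain("evt-2"))  # True: added items never report False
```

In a de-duplication pipeline, a `False` answer lets a record skip the expensive exact lookup, while a `True` answer still needs to be confirmed against the real store.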
Slowly Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
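The cube operations named above can be sketched on a tiny in-memory fact table. Everything here is an assumption for illustration: the rows, the dimension names (`region`, `product`, `quarter`), the revenue measure, and the helper functions `rollup`, `slice_`, and `dice` do not come from the deck.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions followed by one measure (revenue).
facts = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 50),
    ("south", "widget", "Q1", 70),
    ("north", "widget", "Q2", 120),
    ("south", "gadget", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")


def rollup(rows, keep):
    """View = dimensions + measure: sum the measure over all other dimensions."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        r for r in rows
        if all(r[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    ]


by_region_quarter = rollup(facts, ("region", "quarter"))
rotated = rollup(facts, ("quarter", "region"))  # rotate: same cells, axes swapped
q1_by_product = rollup(slice_(facts, "quarter", "Q1"), ("product",))
north_widgets = rollup(
    dice(facts, region={"north"}, product={"widget"}), ("quarter",)
)
```

A real cube precomputes these aggregates so retrieval is cheap; the sketch recomputes them on every call to keep the operations visible.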
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma, @meson10
Oogway Consulting, http://oogway.in