Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
270
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
86
Doing SRE the right way - 2
meson10
0
150
Doing SRE the right way
meson10
0
940
Observability and Control Theory
meson10
1
950
Reliability
meson10
0
100
Reliability of Distributed Systems
meson10
0
190
My TLS was broken
meson10
0
83
Technology that builds Organizations
meson10
0
82
Namespace.go
meson10
0
90
Other Decks in Technology
See All in Technology
多領域インシデントマネジメントへの挑戦:ハードウェアとソフトウェアの融合が生む課題/Challenge to multidisciplinary incident management: Issues created by the fusion of hardware and software
bitkey
PRO
2
110
小学3年生夏休みの自由研究「夏休みに Copilot で遊んでみた」
taichinakamura
0
160
Turing × atmaCup #18 - 1st Place Solution
hakubishin3
0
480
大幅アップデートされたRagas v0.2をキャッチアップ
os1ma
2
540
サイボウズフロントエンドエキスパートチームについて / FrontendExpert Team
cybozuinsideout
PRO
5
38k
終了の危機にあった15年続くWebサービスを全力で存続させる - phpcon2024
yositosi
14
12k
Qiita埋め込み用スライド
naoki_0531
0
5.1k
Microsoft Azure全冠になってみた ~アレを使い倒した者が試験を制す!?~/Obtained all Microsoft Azure certifications Those who use "that" to the full will win the exam! ?
yuj1osm
2
110
統計データで2024年の クラウド・インフラ動向を眺める
ysknsid25
2
850
サーバレスアプリ開発者向けアップデートをキャッチアップしてきた #AWSreInvent #regrowth_fuk
drumnistnakano
0
200
複雑性の高いオブジェクト編集に向き合う: プラガブルなReactフォーム設計
righttouch
PRO
0
120
レンジャーシステムズ | 会社紹介(採用ピッチ)
rssytems
0
150
Featured
See All Featured
A Modern Web Designer's Workflow
chriscoyier
693
190k
Reflections from 52 weeks, 52 projects
jeffersonlam
347
20k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.1k
Statistics for Hackers
jakevdp
796
220k
4 Signs Your Business is Dying
shpigford
181
21k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
[RailsConf 2023] Rails as a piece of cake
palkan
53
5k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
2
170
Six Lessons from altMBA
skipperchong
27
3.5k
Building Your Own Lightsaber
phodgson
103
6.1k
Mobile First: as difficult as doing things right
swwweet
222
9k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
159
15k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in