Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing: Achieving Buzzword Compliance
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
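The slide names the star schema without showing one. As a minimal sketch, assuming a hypothetical sales warehouse (the table names `fact_sales`, `dim_product`, `dim_date` and all rows are invented for illustration), a star schema keeps one central fact table of measures whose keys point at flat dimension tables. A snowflake schema would further normalise those dimensions, for example by moving `category` into its own table.

```python
import sqlite3

# Hypothetical star schema: one fact table, two flat dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount INTEGER
);
INSERT INTO dim_product VALUES (1, 'tea', 'beverage'), (2, 'mug', 'kitchen');
INSERT INTO dim_date VALUES (1, 2017, 10), (2, 2017, 11);
INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 15), (2, 2, 8);
""")


def sales_by_category_and_month(connection):
    """Join the fact table to each dimension once, then aggregate the measure."""
    return connection.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()
```

Every analytical query follows the same shape: the fact table joined to each dimension it needs exactly once, which keeps joins shallow compared with a fully normalised model.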
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
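The two answers above can be sketched with a toy Bloom filter used for de-duplication. This is an illustration under assumptions, not the deck's implementation: the class, the bit-array size, the number of hashes and the `deduplicate` helper are all hypothetical, and a Cuckoo filter (which also supports deletion) is not shown.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-slot array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: does not exist for sure. True: may or may not exist.
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(event_ids, seen=None):
    """Yield ids that the filter has definitely not seen before."""
    seen = seen if seen is not None else BloomFilter()
    for event_id in event_ids:
        if not seen.might_contain(event_id):
            seen.add(event_id)
            yield event_id
```

Because a "may exist" answer can be a false positive, this style of de-duplication never emits a duplicate but can occasionally drop a genuinely new id; sizing the filter trades memory against that error rate.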
Slowly Changing Dimensions
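The slide does not say which strategy it has in mind. One common way to handle a dimension whose attributes change over time is to keep every version of a row with a validity interval (often called a Type 2 dimension). The sketch below assumes that approach; the field names `valid_from` and `valid_to` and the customer data are hypothetical.

```python
from datetime import date


def apply_change(history, key, new_attrs, effective):
    """Close the current version of `key` (if it changed) and append a new one."""
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, keep the open version
        current["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on the given date."""
    for row in history:
        starts_before = row["valid_from"] <= when
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and starts_before and still_open:
            return row["attrs"]
    return None
```

With this layout a fact recorded in March still joins to the customer's March address, even after the address changes in June.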
Batching vs Streaming
Out-of-Order Processing
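One way to picture out-of-order processing is to group events by the time they happened rather than the time they arrived, and to keep each window open for a grace period. The sketch below is an assumption-laden illustration, not the deck's design: the function name, the fixed window size and the `allowed_lateness` parameter are all hypothetical.

```python
from collections import defaultdict


def window_counts(events, window=10, allowed_lateness=5):
    """Count events per event-time window, tolerating bounded lateness.

    `events` is an iterable of (event_time, payload) pairs in arrival order.
    Returns (counts_by_window_start, dropped_events).
    """
    open_windows = defaultdict(int)
    closed = {}
    dropped = []
    watermark = float("-inf")  # time before which no more events are expected

    for event_time, payload in events:
        watermark = max(watermark, event_time - allowed_lateness)
        start = (event_time // window) * window
        if start + window <= watermark:
            # The window was already finalised: the event is too late.
            dropped.append((event_time, payload))
        else:
            open_windows[start] += 1
        # Finalise every open window that ends at or before the watermark.
        for s in sorted(w for w in open_windows if w + window <= watermark):
            closed[s] = open_windows.pop(s)

    closed.update(open_windows)  # flush whatever is still open at end of input
    return closed, dropped
```

For example, with the defaults an event stamped 8 that arrives after one stamped 12 is still counted in the window starting at 0, while an event stamped 9 that arrives after one stamped 27 is reported as dropped.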
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
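The cube operations listed above can be sketched in plain Python. The fact rows, the dimension names (`region`, `product`, `month`) and the `amount` measure are hypothetical; a real warehouse would precompute these aggregates rather than scan rows on every call.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (amount).
FACTS = [
    {"region": "north", "product": "tea", "month": "jan", "amount": 10},
    {"region": "north", "product": "coffee", "month": "jan", "amount": 20},
    {"region": "south", "product": "tea", "month": "jan", "amount": 5},
    {"region": "south", "product": "tea", "month": "feb", "amount": 7},
    {"region": "north", "product": "tea", "month": "feb", "amount": 3},
]


def build_view(facts, dimensions, measure="amount"):
    """View: sum the measure for each combination of the chosen dimensions."""
    view = defaultdict(int)
    for row in facts:
        view[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(view)


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in facts if row[dimension] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [
        row for row in facts
        if all(row[d] in values for d, values in allowed.items())
    ]


def rotate(view):
    """Rotate (pivot): swap the two axes of a two-dimensional view."""
    return {(b, a): total for (a, b), total in view.items()}
```

For instance, `build_view(slice_(FACTS, "month", "jan"), ["region"])` gives January totals per region, and `rotate(build_view(FACTS, ["region", "product"]))` presents the same totals keyed by product first.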
Architecture: Revisited
Sample Solution
Thank you!
Piyush Verma (@meson10), Oogway Consulting
http://oogway.in