Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
350
0
Share
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
120
Doing SRE the right way - 2
meson10
0
170
Doing SRE the right way
meson10
0
1k
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
160
Reliability of Distributed Systems
meson10
0
280
My TLS was broken
meson10
0
150
Technology that builds Organizations
meson10
0
140
Namespace.go
meson10
0
170
Other Decks in Technology
See All in Technology
OpenClaw初心者向けセミナー / OpenClaw Beginner Seminar
cmhiranofumio
0
370
AIがコードを書く時代の ジェネレーティブプログラミング
polidog
PRO
3
660
ふりかえりがなかった職能横断チームにふりかえりを導入してみて学んだこと 〜チームのふりかえりを「みんなで未来を考える場」にするプロローグ設計〜
masahiro1214shimokawa
0
300
プロダクトを育てるように生成AIによる開発プロセスを育てよう
kakehashi
PRO
1
900
TanStack Start エコシステムの現在地 / TanStack Start Ecosystem 2026
iktakahiro
1
360
終盤で崩壊させないAI駆動開発
j5ik2o
0
200
【関西電力KOI×VOLTMIND 生成AIハッカソン】空間AIブレイン ~⼤阪おばちゃんフィジカルAIに続く道~
tanakaseiya
0
180
第26回FA設備技術勉強会 - Claude/Claude_codeでデータ分析 -
happysamurai294
0
400
Hooks, Filters & Now Context: Why MCPs Are the “Hooks” of the AI Era
miriamschwab
0
130
Oracle AI Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
6
1.3k
組織的なAI活用を阻む 最大のハードルは コンテキストデザインだった
ixbox
1
1.3k
Databricksを用いたセキュアなデータ基盤構築とAIプロダクトへの応用.pdf
pkshadeck
PRO
0
230
Featured
See All Featured
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
How GitHub (no longer) Works
holman
316
150k
Have SEOs Ruined the Internet? - User Awareness of SEO in 2025
akashhashmi
0
310
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
160
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
1k
Abbi's Birthday
coloredviolet
2
6.4k
So, you think you're a good person
axbom
PRO
2
2k
Automating Front-end Workflow
addyosmani
1370
200k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
150
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
250
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in