Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Piyush Verma
November 13, 2017
Technology
350
0
Share
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
120
Doing SRE the right way - 2
meson10
0
170
Doing SRE the right way
meson10
0
1k
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
160
Reliability of Distributed Systems
meson10
0
280
My TLS was broken
meson10
0
150
Technology that builds Organizations
meson10
0
150
Namespace.go
meson10
0
180
Other Decks in Technology
See All in Technology
コミュニティ・勉強会を作るのは目的じゃない
ohmori_yusuke
0
280
需要創出(Chatwork)×供給(BPaaS) フライホイールとMoat 実行能力の最適配置とAI戦略
kubell_hr
0
1.4k
AWS Agent Registry の基礎・概要を理解する/aws-agent-registry-intro
ren8k
3
420
AI時代 に増える データ活用先
takahal
0
340
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.4k
Percolatorを廃止し、マルチ検索サービスへ刷新した話 / Search Engineering Tech Talk 2026 Spring
visional_engineering_and_design
0
200
AIが書いたコードを信じられない問題 〜レビュー負荷を下げるために変えたこと〜 / The AI Code Trust Gap: Reducing the Review Burden
bitkey
PRO
8
1.4k
EMから幅を広げるために最近挑戦していること / Recent challenges I'm undertaking to expand my horizons beyond EM
hiro_torii
1
160
20年前の「OSS革命」に学ぶ AI時代の生存戦略
samakada
0
500
要件定義の精度を高めるための型と生成AIの活用 / Using Types and Generative AI to Improve the Accuracy of Requirements Definition
haru860
0
110
Anthropic「Long-running a gents」をGeminiで再現してみた
tkikuchi
0
730
MySQL 9.7がやってきた ~これまでのあらすじと基本情報~ @ 日本MySQLユーザ会会2026年04月 / mysql97-yattekita
sakaik
0
120
Featured
See All Featured
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.9k
Abbi's Birthday
coloredviolet
2
7.3k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
120
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
62k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
170
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
1
240
Discover your Explorer Soul
emna__ayadi
2
1.1k
WCS-LA-2024
lcolladotor
0
550
[SF Ruby Conf 2025] Rails X
palkan
2
980
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
680
The SEO identity crisis: Don't let AI make you average
varn
0
450
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
1
200
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in