Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
300
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
87
Doing SRE the right way - 2
meson10
0
150
Doing SRE the right way
meson10
0
960
Observability and Control Theory
meson10
1
1k
Reliability
meson10
0
120
Reliability of Distributed Systems
meson10
0
210
My TLS was broken
meson10
0
96
Technology that builds Organizations
meson10
0
94
Namespace.go
meson10
0
110
Other Decks in Technology
See All in Technology
2025年春に見直したい、リソース最適化の基本
sogaoh
PRO
0
470
Vision Pro X Text to 3D Model ~How Swift and Generative Al Unlock a New Era of Spatial Computing~
igaryo0506
0
260
「それはhowなんよ〜」のガイドライン #orestudy
77web
9
2.4k
似たような課題が何度も蘇ってくるゾンビふりかえりを撲滅するため、ふりかえりのテーマをフォーカスしてもらった話 / focusing on the theme
naitosatoshi
0
400
こんなデータマートは嫌だ。どんな? / waiwai-data-meetup-202504
shuntak
6
1.8k
Cursor AgentによるパーソナルAIアシスタント育成入門―業務のプロンプト化・MCPの活用
os1ma
11
3.6k
Porting PicoRuby to Another Microcontroller: ESP32
yuuu
2
150
技術者はかっこいいものだ!!~キルラキルから学んだエンジニアの生き方~
masakiokuda
2
200
低レイヤを知りたいPHPerのためのCコンパイラ作成入門 / Building a C Compiler for PHPers Who Want to Dive into Low-Level Programming
tomzoh
0
210
NLP2025 参加報告会 / NLP2025
sansan_randd
4
530
7,000名規模の 人材サービス企業における プロダクト戦略・戦術と課題 / Product strategy, tactics and challenges for a 7,000-employee staffing company
techtekt
0
270
Amebaにおける Platform Engineeringの実践
kumorn5s
6
910
Featured
See All Featured
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
47
2.5k
Raft: Consensus for Rubyists
vanstee
137
6.9k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
227
22k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
The Invisible Side of Design
smashingmag
299
50k
Visualization
eitanlees
146
16k
VelocityConf: Rendering Performance Case Studies
addyosmani
328
24k
Scaling GitHub
holman
459
140k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.7k
Testing 201, or: Great Expectations
jmmastey
42
7.4k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5.3k
What's in a price? How to price your products and services
michaelherold
245
12k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in