Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Read more on http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
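
The two answers above ("does not exist for sure" and "may or may not exist") are the only ones a Bloom filter can give, which is what makes it useful as a cheap first check for de-duplication. A minimal sketch in Python, not taken from the deck: the class name, the sizes, and the use of SHA-256 with a seed prefix are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, sizes chosen arbitrarily)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False -> the item was never added (does not exist for sure).
        # True  -> the item may or may not have been added (false positives happen).
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("event-42")
```

In a de-duplication pipeline, a `False` from `might_contain` lets a record through without touching the store; a `True` still needs an exact lookup to rule out a false positive.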
Slowly Changing Dimensions
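
One common way to handle a dimension that changes over time is to keep every version of a row with a validity range (often called "Type 2"). The deck does not say which approach it uses, so the sketch below is an assumption; the function names and the row layout are hypothetical.

```python
from datetime import date


def apply_change(history, key, new_attrs, effective):
    """Close the open row for `key` (if its attributes differ) and append a new version."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["attrs"] == new_attrs:
                return history  # nothing changed, keep the open row
            row["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, day):
    """Return the attributes that were valid for `key` on `day`, or None."""
    for row in history:
        if row["key"] != key:
            continue
        if row["valid_from"] <= day and (row["valid_to"] is None or day < row["valid_to"]):
            return row["attrs"]
    return None


customers = []
apply_change(customers, "c1", {"city": "Pune"}, date(2017, 1, 1))
apply_change(customers, "c1", {"city": "Delhi"}, date(2017, 6, 1))
```

Facts recorded in March can then be joined against `as_of(customers, "c1", date(2017, 3, 1))` and still see the customer's city as it was at that time.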
Batching vs Streaming
Out-of-Order Processing
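
A rough illustration of the out-of-order problem: events are grouped into windows by the time they happened, not the time they arrived, and a watermark decides when a window is too old to accept stragglers. This is a simplified sketch under assumed parameters (`window_size`, `allowed_lateness`, integer timestamps in seconds), not the deck's design.

```python
from collections import defaultdict


def window_counts(event_times, window_size=60, allowed_lateness=30):
    """Count events per tumbling event-time window, in arrival order.

    The watermark trails the largest event time seen by `allowed_lateness`.
    An event whose window ended at or before the watermark is dropped as too late.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = float("-inf")
    for event_time in event_times:
        watermark = max(watermark, event_time - allowed_lateness)
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            dropped.append(event_time)
        else:
            counts[window_start] += 1
    return dict(counts), dropped


# Arrival order differs from event order: 58 arrives after 62, 10 arrives last.
counts, dropped = window_counts([5, 62, 58, 130, 10])
```

Here the event at 58 is slightly late but still lands in its window, while the event at 10 arrives after the watermark has passed the end of the first window and is dropped.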
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
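
To make "Dimension + Measure" and "Slice, Dice & Rotate" concrete, here is a small sketch that pre-aggregates a measure over a few dimensions and then applies each operation. The sample rows, the dimension names, and the helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure ("sales").
facts = [
    {"region": "North", "product": "A", "quarter": "Q1", "sales": 10},
    {"region": "North", "product": "B", "quarter": "Q1", "sales": 5},
    {"region": "South", "product": "A", "quarter": "Q1", "sales": 7},
    {"region": "South", "product": "A", "quarter": "Q2", "sales": 3},
]
DIMS = ["region", "product", "quarter"]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values."""
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(cube)


def slice_cube(cube, dimensions, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = dimensions.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, dimensions, allowed):
    """Dice: keep cells whose coordinates fall inside the allowed values."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[dimensions.index(d)] in vals for d, vals in allowed.items())
    }


def rotate(cube, dimensions, new_order):
    """Rotate (pivot): reorder the dimensions of every cell key."""
    idx = [dimensions.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}


cube = build_cube(facts, DIMS, "sales")
q1 = slice_cube(cube, DIMS, "quarter", "Q1")
south_a = dice_cube(cube, DIMS, {"region": {"South"}, "product": {"A"}})
by_product = rotate(cube, DIMS, ["product", "region", "quarter"])
```

The efficiency point on the slide comes from the first step: once the cube is built, each of these views is a lookup over pre-aggregated cells, with no need to rescan the fact rows.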
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in