$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
260
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
86
Doing SRE the right way - 2
meson10
0
150
Doing SRE the right way
meson10
0
930
Observability and Control Theory
meson10
1
940
Reliability
meson10
0
96
Reliability of Distributed Systems
meson10
0
180
My TLS was broken
meson10
0
79
Technology that builds Organizations
meson10
0
77
Namespace.go
meson10
0
86
Other Decks in Technology
See All in Technology
クラウドネイティブへの小さな一歩!既存VMからコンテナまで、KubeVirtが実現する『無理しないペースの移行』とは!?
tsukaman
0
100
プロセス改善とE2E自動テストによる、プロダクトの品質向上事例
tomasagi
0
410
Empowering Customer Decisions with Elasticsearch: From Search to Answer Generation
hinatades
PRO
0
180
メインテーマはKubernetes
nwiizo
2
330
データ基盤の負債解消のためのリプレイス
livesense
PRO
0
140
Oracle Cloud Infrastructureデータベース・クラウド:各バージョンのサポート期間
oracle4engineer
PRO
30
16k
徹底解説!Microsoft 365 Copilot の拡張機能 / Complete guide to Microsoft 365 Copilot extensions
karamem0
1
1.7k
Entra ID の多要素認証(Japan Microsoft 365 コミュニティ カンファレンス 2024 )
murachiakira
0
1.8k
12/2(月)のBedrockアプデ速報(re:Invent 2024 Daily re:Cap #1 with AWS Heroes)
minorun365
PRO
2
270
GeminiとUnityで実現するインタラクティブアート
hokkey621
0
370
「品質とスピードはトレード・オンできる」に向き合い続けた2年半を振り返る / Quality and speed can be traded on.
mii3king
0
460
次のコンテナセキュリティの時代 - User Namespace With a Pod / CloudNative Days Winter 2024
pfn
PRO
5
450
Featured
See All Featured
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
44
2.2k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
59k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
31
2.7k
Teambox: Starting and Learning
jrom
133
8.8k
Building a Scalable Design System with Sketch
lauravandoore
459
33k
The Cost Of JavaScript in 2023
addyosmani
45
6.9k
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
Site-Speed That Sticks
csswizardry
0
110
Agile that works and the tools we love
rasmusluckow
327
21k
Designing on Purpose - Digital PM Summit 2013
jponch
115
7k
VelocityConf: Rendering Performance Case Studies
addyosmani
326
24k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in