Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Understanding the Basics of Data Analysis
Search
Piyush Verma
November 13, 2017
Technology
0
320
Understanding the Basics of Data Analysis
Read more on
http://blog.oogway.in
Piyush Verma
November 13, 2017
Tweet
Share
More Decks by Piyush Verma
See All by Piyush Verma
SLOs that Lie
meson10
0
99
Doing SRE the right way - 2
meson10
0
160
Doing SRE the right way
meson10
0
980
Observability and Control Theory
meson10
1
1.1k
Reliability
meson10
0
140
Reliability of Distributed Systems
meson10
0
250
My TLS was broken
meson10
0
120
Technology that builds Organizations
meson10
0
110
Namespace.go
meson10
0
140
Other Decks in Technology
See All in Technology
今!ソフトウェアエンジニアがハードウェアに手を出すには
mackee
12
4.8k
自作JSエンジンに推しプロポーザルを実装したい!
sajikix
1
180
Webブラウザ向け動画配信プレイヤーの 大規模リプレイスから得た知見と学び
yud0uhu
0
230
dbt開発 with Claude Codeのためのガードレール設計
10xinc
2
1.2k
Function Body Macros で、SwiftUI の View に Accessibility Identifier を自動付与する/Function Body Macros: Autogenerate accessibility identifiers for SwiftUI Views
miichan
2
180
エラーとアクセシビリティ
schktjm
1
1.3k
Webアプリケーションにオブザーバビリティを実装するRust入門ガイド
nwiizo
7
830
生成AI時代のデータ基盤設計〜ペースレイヤリングで実現する高速開発と持続性〜 / Levtech Meetup_Session_2
sansan_randd
1
150
KotlinConf 2025_イベントレポート
sony
1
140
ハードウェアとソフトウェアをつなぐ全てを内製している企業の E2E テストの作り方 / How to create E2E tests for a company that builds everything connecting hardware and software in-house
bitkey
PRO
1
150
react-callを使ってダイヤログをいろんなとこで再利用しよう!
shinaps
1
240
AIエージェント開発用SDKとローカルLLMをLINE Botと組み合わせてみた / LINEを使ったLT大会 #14
you
PRO
0
120
Featured
See All Featured
RailsConf 2023
tenderlove
30
1.2k
Practical Orchestrator
shlominoach
190
11k
4 Signs Your Business is Dying
shpigford
184
22k
How STYLIGHT went responsive
nonsquared
100
5.8k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.7k
Become a Pro
speakerdeck
PRO
29
5.5k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Thoughts on Productivity
jonyablonski
70
4.8k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Statistics for Hackers
jakevdp
799
220k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance. 1 Piyush
Verma Oogway Consulting
Common thoughts 2
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here? 5
Data :: Business
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering? 10
Why do analysis at all?
Descriptive - Historical. - Deterministic. - Inferential. - Managers make
pretty graphs.
Predictive - Future. - Probabilistic. - Based on Descriptive. -
This is what armchair critics do.
Prescriptive
Architecture: Round 1 15
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1 19
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All Relationships complicate over Dimension of time
Anatomy 26
Anatomy
Challenges: Round 2 28
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters Cuckoo Filters - Does not exist for sure.
- May or may not exist.
Slow Changing Dimensions
Batching vs Streaming
Out-of-Order Processing
Cubes • Efficiency of Retrieval • Warehouse:Cube :: DB:Table •
View: Dimension + Measure • Slice, Dice & Rotate
Architecture: Revisited 37
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in