Understanding the Basics of Data Analysis
Piyush Verma
November 13, 2017
Technology
Read more on
http://blog.oogway.in
Transcript
ABC of Distributed Data Processing. Achieving Buzzword Compliance.
Piyush Verma, Oogway Consulting
Common thoughts
When will my Data become Big Data?
Hive Data Will Save.
How did we reach here?
Data :: Business
Types of Workload
When do I call it Big Enough?
Why bother with Data Engineering?
Why do analysis at all?
Descriptive
- Historical.
- Deterministic.
- Inferential.
- Managers make pretty graphs.
Predictive
- Future.
- Probabilistic.
- Based on Descriptive.
- This is what armchair critics do.
Prescriptive
Architecture: Round 1
What does data look like?
Storage Choice 1
Storage Choice 2
Challenges: Round 1
Scaling
Archival Policy
Oh no
Garbage / Purging
All related entities end up in complex joins
All relationships get more complicated over the dimension of time
Anatomy
Challenges: Round 2
Snowflake Schema
Star Schema
De-Duplication
Bloom Filters, Cuckoo Filters
- Does not exist for sure.
- May or may not exist.
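The two possible answers listed for these filters can be seen in a minimal Bloom filter sketch. The class name `BloomFilter`, the helper `deduplicate`, the bit-array size, and the use of salted SHA-256 are all assumptions made for illustration; none of them come from the deck.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, kept simple on purpose

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was definitely never added.
        # True: the item may or may not have been added.
        return all(self.bits[pos] for pos in self._positions(item))


def deduplicate(events, seen=None):
    """Yield only events that the filter says were definitely not seen before."""
    if seen is None:
        seen = BloomFilter()
    for event in events:
        if not seen.might_contain(event):
            seen.add(event)
            yield event
```

Used for de-duplication, a filter like this never lets a repeated event through, but a false positive can occasionally drop an event that was in fact new. Whether that trade-off is acceptable depends on the workload.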
Slowly Changing Dimensions
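One common way to handle a dimension that changes over time is to keep every version of a row with a validity interval, often called a "Type 2" approach. The sketch below assumes that approach; the deck does not say which one it uses, and the row layout (`key`, `attrs`, `valid_from`, `valid_to`) and function names are hypothetical.

```python
from datetime import date


def apply_change(history, key, new_attrs, effective):
    """Close the current version of `key` and append a new one (Type 2 style)."""
    current = next(
        (row for row in history if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, keep the open version
        current["valid_to"] = effective  # close the old version
    history.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on the date `when`."""
    for row in history:
        starts_before = row["valid_from"] <= when
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and starts_before and still_open:
            return row["attrs"]
    return None


history = []
apply_change(history, "cust-1", {"city": "Pune"}, date(2016, 1, 1))
apply_change(history, "cust-1", {"city": "Delhi"}, date(2017, 6, 1))
```

With the two changes above, `as_of(history, "cust-1", date(2016, 12, 31))` returns the Pune version, while any date from 1 June 2017 onward returns the Delhi version. Old facts stay joinable to the attributes that were true at the time.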
Batching vs Streaming
Out-of-Order Processing
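A minimal sketch of one way to cope with out-of-order events: buffer them and release them in event-time order once a watermark has passed. The function name `reorder`, the `allowed_lateness` parameter, and the policy of dropping events that arrive after their slot was emitted are assumptions for illustration, not something the deck specifies.

```python
import heapq


def reorder(events, allowed_lateness):
    """Yield (event_time, payload) pairs in event-time order.

    `events` is iterated in arrival order. An event is released once the
    highest event time seen so far is at least `allowed_lateness` ahead of it.
    An event older than something already released is dropped as too late.
    """
    buffer = []  # min-heap ordered by event time, then arrival sequence
    max_seen = float("-inf")
    last_emitted = float("-inf")
    for seq, (event_time, payload) in enumerate(events):
        if event_time < last_emitted:
            continue  # too late: output has already moved past this time
        heapq.heappush(buffer, (event_time, seq, payload))
        max_seen = max(max_seen, event_time)
        watermark = max_seen - allowed_lateness
        while buffer and buffer[0][0] <= watermark:
            emitted_time, _, emitted_payload = heapq.heappop(buffer)
            last_emitted = emitted_time
            yield emitted_time, emitted_payload
    # Input is exhausted: flush whatever is still buffered, in order.
    while buffer:
        emitted_time, _, emitted_payload = heapq.heappop(buffer)
        yield emitted_time, emitted_payload
```

A larger `allowed_lateness` tolerates more disorder at the cost of higher latency and a bigger buffer, which is the same tension that separates batching from streaming.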
Cubes
• Efficiency of Retrieval
• Warehouse:Cube :: DB:Table
• View: Dimension + Measure
• Slice, Dice & Rotate
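The cube operations named above can be sketched over a small in-memory fact table. The rows, the dimension names (`region`, `product`, `quarter`), the measure (`revenue`), and the function names are all made up for illustration; a real warehouse would precompute and store these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
FACTS = [
    {"region": "north", "product": "tea", "quarter": "Q1", "revenue": 100},
    {"region": "north", "product": "coffee", "quarter": "Q1", "revenue": 150},
    {"region": "south", "product": "tea", "quarter": "Q1", "revenue": 80},
    {"region": "south", "product": "tea", "quarter": "Q2", "revenue": 120},
    {"region": "north", "product": "tea", "quarter": "Q2", "revenue": 90},
]


def build_view(facts, dimensions, measure):
    """View = dimensions + measure: sum the measure per dimension combination."""
    view = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        view[key] += row[measure]
    return dict(view)


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in facts if row[dimension] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        row for row in facts
        if all(row[dim] in values for dim, values in allowed.items())
    ]


def rotate(view):
    """Rotate (pivot): reverse the order of the dimensions in every key."""
    return {tuple(reversed(key)): total for key, total in view.items()}


by_region_quarter = build_view(FACTS, ["region", "quarter"], "revenue")
q1_by_region = build_view(slice_(FACTS, "quarter", "Q1"), ["region"], "revenue")
north_tea = dice(FACTS, region={"north"}, product={"tea"})
```

Here `by_region_quarter` maps `("north", "Q1")` to 250, and `rotate(by_region_quarter)` exposes the same total under `("Q1", "north")`.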
Architecture: Revisited
Sample Solution
Thank you! Piyush Verma @meson10 Oogway Consulting http://oogway.in