My !!con talk
Sasha Laundy
May 17, 2015
Technology
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming, very little CS
katrinaebowman on flickr
VERY high level
trimmed = FOREACH loaded_data GENERATE userId, website;
grouped = GROUP trimmed BY userId;
counted = FOREACH grouped GENERATE group, COUNT(trimmed);
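For readers who do not know Pig, the three statements on this slide (project two fields, group by user, count per group) can be sketched in plain Python. This is only an illustration: the sample rows and the extra third field are made up, and real Pig would run the same steps as distributed jobs rather than in one process.

```python
from collections import Counter

# Hypothetical sample rows standing in for loaded_data: (userId, website, other)
loaded_data = [
    (1, "a.com", "x"),
    (2, "b.com", "y"),
    (1, "c.com", "z"),
]

# FOREACH ... GENERATE userId, website: keep only the two fields we need
trimmed = [(user_id, website) for user_id, website, _ in loaded_data]

# GROUP ... BY userId followed by COUNT: number of rows per user
counted = Counter(user_id for user_id, _ in trimmed)

print(dict(counted))
```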
I get this for FREE!
• Mappin’ & reducin’
• HDFS in the CLOUD!
• Clusters AND nodes!
• A rockin’ query plan!
Write Pigscript Graphs!
“give me 500 rows where age > 15” Why so slow?
“Seeking is slower than reading”
??
01010110101010001010101000101010101101010101001 GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
“Seeking is slower than reading”
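One way to make the reading/seeking distinction concrete is to count how often the disk head has to jump. The sketch below is a toy cost model, not a measurement: `count_seeks` is a hypothetical helper, and the (offset, length) pairs are made-up examples.

```python
def count_seeks(reads):
    """Count the jumps needed to serve (offset, length) reads in order.

    A read that starts exactly where the previous one ended is a
    contiguous continuation; anything else counts as one seek.
    """
    seeks = 0
    position = None  # head position is unknown before the first read
    for offset, length in reads:
        if offset != position:
            seeks += 1
        position = offset + length
    return seeks


contiguous = [(0, 4), (4, 4), (8, 4)]    # READING: one sequential pass
scattered = [(0, 4), (40, 4), (12, 4)]   # SEEKING: same byte count, more jumps

print(count_seeks(contiguous), count_seeks(scattered))
```

Both lists ask for the same number of bytes; only the number of jumps differs, and on a spinning platter each jump means physically moving the head.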
“give me 500 rows where age > 15” GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
in my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
• Various schema? MONGO
• Fast search? ELASTICSEARCH
• Keep history? DATOMIC
• Want very fast analytics queries? REDSHIFT.
REDSHIFT: production backend for your website! copy of your database for your data team to play with!!
analytics needs lots of AGGREGATION, like SUM, AVG, or COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
So lots of seeking? GOSH DARN IT! but what if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND
READING! YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND “columnar storage”
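The two byte strings on these slides can be rebuilt from the same three records, which makes it easy to see where the age values land in each layout. This is a minimal sketch that treats the layouts as plain Python strings; `age_offsets` is a hypothetical helper, and real storage engines add encoding, compression and block boundaries on top.

```python
records = [("GRACE", 50, "VIRGINIA"), ("ALAN", 45, "ENGLAND"), ("ADA", 30, "ENGLAND")]

# Row layout: the fields of each record sit next to each other.
row_layout = "".join(f"{name}{age}{place}" for name, age, place in records)

# Columnar layout: all names, then all ages, then all places.
names, ages, places = zip(*records)
column_layout = "".join(names) + "".join(map(str, ages)) + "".join(places)


def age_offsets(layout):
    """Return the start offset of each age value within a layout string."""
    offsets, start = [], 0
    for age in ages:
        start = layout.index(str(age), start)
        offsets.append(start)
        start += len(str(age))
    return offsets


print(row_layout, age_offsets(row_layout))
print(column_layout, age_offsets(column_layout))
```

In the row layout the ages are scattered through the string, so an aggregate over age has to hop between them. In the columnar layout they form one contiguous run that can be read in a single pass.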
What’s faster than reading AND seeking? IGNORING
block  min  max
1      1    6
2      7    12
3      13   340
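The block/min/max table is what makes IGNORING possible: if a block's maximum is not above the filter value, nothing in it can match and it never has to be read. A minimal sketch using the numbers from the table, assuming they describe the age column; `blocks_to_read` is a hypothetical helper, not a Redshift API.

```python
# (block, min, max) taken from the table on the slide
zone_map = [(1, 1, 6), (2, 7, 12), (3, 13, 340)]


def blocks_to_read(zone_map, lower_bound):
    """Return the blocks that could hold a value greater than lower_bound."""
    return [block for block, _low, high in zone_map if high > lower_bound]


# "give me ... rows where age > 15": blocks 1 and 2 can be skipped entirely
print(blocks_to_read(zone_map, 15))
```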
Redshift has lots more…
• NODES so you can compute in parallel
• cool QUERY PLANS based on your actual data!
• Not actually a database. “Managed data warehouse service in the cloud”
• So blazing fast!
Really fast! …how fast?
• 21,454,134 rows
• COUNT(*)
• Postgres: 586,931.216 ms (10 minutes)
• Redshift: 1,561.359 ms (1.5 seconds)
376 times faster!
from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
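The headline number follows directly from the two timings quoted on the slide. A quick check of the arithmetic, using only those two figures:

```python
postgres_ms = 586_931.216  # COUNT(*) timing quoted for Postgres
redshift_ms = 1_561.359    # COUNT(*) timing quoted for Redshift

speedup = postgres_ms / redshift_ms
postgres_minutes = postgres_ms / 1000 / 60

print(f"{speedup:.0f}x faster; Postgres took about {postgres_minutes:.1f} minutes")
```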
376x isn’t cool. You know what’s cool? 100,000x
Instead of native Python, a matrix! 100x Speed from OpenBLAS compared to numpy 10x Parallelization (for free from OpenBLAS) 10x 100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io