My !!con talk
Sasha Laundy
May 17, 2015
Technology
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming, very little CS
katrinaebowman on flickr
VERY high level
trimmed = FOREACH loaded_data GENERATE userId, website;
grouped = GROUP trimmed BY userId;
counted = FOREACH grouped GENERATE group, COUNT(trimmed);
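For readers who don't know Pig, here is a rough Python sketch of what the three statements on this slide compute. It assumes `loaded_data` is a small in-memory list of tuples; the sample rows are made up for illustration and are not from the talk. Pig runs the same steps as distributed jobs rather than in one process.

```python
from collections import defaultdict

# Hypothetical sample rows: (userId, website, extra_field). Not from the talk.
loaded_data = [
    ("u1", "a.com", "x"),
    ("u2", "b.com", "y"),
    ("u1", "c.com", "z"),
]

# trimmed: keep only the userId and website fields of every row.
trimmed = [(user_id, website) for user_id, website, _ in loaded_data]

# grouped: collect the trimmed rows that share a userId.
grouped = defaultdict(list)
for user_id, website in trimmed:
    grouped[user_id].append((user_id, website))

# counted: one (group key, row count) pair per userId.
counted = {user_id: len(rows) for user_id, rows in grouped.items()}

print(counted)
```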
I get this for FREE!
• Mappin’ & reducin’
• HDFS in the CLOUD!
• Clusters AND nodes!
• A rockin’ query plan!
Write Pigscript Graphs!
“give me 500 rows where age > 15”
Why so slow?
“Seeking is slower than reading”
??
01010110101010001010101000101010101101010101001 GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
“Seeking is slower than reading”
“give me 500 rows where age > 15” GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
In my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
• Various schema? MONGO
• Fast search? ELASTICSEARCH
• Keep history? DATOMIC
• Want very fast analytics queries? REDSHIFT.
REDSHIFT
production backend for your website!
copy of your database for your data team to play with!!
Analytics needs lots of AGGREGATION, like SUM, AVG, or COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
So lots of seeking? GOSH DARN IT! But what if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND
READING! YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND “columnar storage”
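A minimal sketch of why the second layout helps an aggregate like AVG(age), using the three people from the slides. The lists stand in for positions on disk, and `contiguous_runs` is a hypothetical helper that counts how many separate stretches have to be visited. It illustrates the idea only and is not how any real database lays out bytes.

```python
# The three people from the slides, as (name, age, place) records.
records = [("GRACE", 50, "VIRGINIA"), ("ALAN", 45, "ENGLAND"), ("ADA", 30, "ENGLAND")]
n_fields = 3

# Row layout: each record's fields sit next to each other.
row_layout = [str(field) for record in records for field in record]

# Columnar layout: all names, then all ages, then all places.
columns = list(zip(*records))
column_layout = [str(field) for column in columns for field in column]

def contiguous_runs(positions):
    """Count the separate contiguous stretches needed to cover the positions."""
    runs = 0
    previous = None
    for p in sorted(positions):
        if previous is None or p != previous + 1:
            runs += 1
        previous = p
    return runs

# Where the age values live in each layout.
row_age_positions = [i * n_fields + 1 for i in range(len(records))]
col_age_positions = [len(records) + i for i in range(len(records))]

row_runs = contiguous_runs(row_age_positions)  # scattered: one stretch per record
col_runs = contiguous_runs(col_age_positions)  # a single contiguous stretch

average_age = sum(int(column_layout[p]) for p in col_age_positions) / len(records)
print("".join(row_layout), row_runs)
print("".join(column_layout), col_runs)
```

In the row layout the ages are spread over three separate stretches (seeking); in the columnar layout they sit in one (reading).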
What’s faster than reading AND seeking? IGNORING
block  min  max
1      1    6
2      7    12
3      13   340
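A small sketch of the "ignoring" trick, using the block numbers and min/max values from the slide. The function name `blocks_to_read` and the example queries are assumptions made for illustration; this shows the general idea of skipping blocks by their min/max metadata, not Redshift's actual implementation.

```python
# Block metadata from the slide: block number -> (min, max) of the values it holds.
block_ranges = {1: (1, 6), 2: (7, 12), 3: (13, 340)}

def blocks_to_read(block_ranges, low, high):
    """Return the blocks whose [min, max] range overlaps the query range [low, high]."""
    return [
        block
        for block, (block_min, block_max) in sorted(block_ranges.items())
        if block_max >= low and block_min <= high
    ]

# Hypothetical query: integer values greater than 15, i.e. the range [16, infinity).
needed = blocks_to_read(block_ranges, 16, float("inf"))
print(needed)
```

For this query only block 3 can contain matches, so blocks 1 and 2 are never touched at all.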
Redshift has lots more…
• NODES so you can compute in parallel
• cool QUERY PLANS based on your actual data!
• Not actually a database. “Managed data warehouse service in the cloud”
• So blazing fast!
Really fast! …how fast?
• 21,454,134 rows
• COUNT(*)
• Postgres: 586,931.216 ms (10 minutes)
• Redshift: 1,561.359 ms (1.5 seconds)
376 times faster!
from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
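The arithmetic behind the 376 figure can be checked from the two timings quoted on the slide. The variable names here are just for illustration.

```python
# Timings quoted on the slide, in milliseconds.
postgres_ms = 586_931.216
redshift_ms = 1_561.359

speedup = postgres_ms / redshift_ms
postgres_minutes = postgres_ms / 1000 / 60
redshift_seconds = redshift_ms / 1000

print(round(speedup), round(postgres_minutes, 1), round(redshift_seconds, 1))
```

The ratio rounds to 376, and the raw timings come to roughly 9.8 minutes and 1.6 seconds, which the slide quotes as 10 minutes and 1.5 seconds.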
376x isn’t cool. You know what’s cool? 100,000x
• Instead of native Python, a matrix! 100x
• Speed from OpenBLAS compared to numpy 10x
• Parallelization (for free from OpenBLAS) 10x
100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io