Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
My !!con talk
Search
Sasha Laundy
May 17, 2015
Technology
0
560
My !!con talk
Sasha Laundy
May 17, 2015
Tweet
Share
More Decks by Sasha Laundy
See All by Sasha Laundy
Your Brain's API: Getting and Giving Technical Help
slaundy
4
7.7k
HOWTO Make Your Future Data Science Team Love You
slaundy
0
520
HOWTO Make Your Future Data Science Team Love You
slaundy
1
990
Other Decks in Technology
See All in Technology
us-east-1 の障害が 起きると なぜ ソワソワするのか
miu_crescent
PRO
2
700
Sansan BIが実践する AI on BI とセマンティックレイヤー / data_summit_findy
sansan_randd
0
130
[2025-11-06] ベイズ最適化の基礎とデザイン支援への応用(CVIMチュートリアル)
yuki_koyama
1
560
自己的售票系統自己做!
eddie
0
370
フライトコントローラPX4の中身(制御器)を覗いてみた
santana_hammer
1
140
Flutterで実装する実践的な攻撃対策とセキュリティ向上
fujikinaga
1
290
プログラミング言語を書く前に日本語を書く── AI 時代に求められる「言葉で考える」力/登壇資料(井田 献一朗)
hacobu
PRO
0
130
開発者が知っておきたい複雑さの正体/where-the-complexity-comes-from
hanhan1978
6
2.4k
Snowflakeとdbtで加速する 「TVCMデータで価値を生む組織」への進化論 / Evolving TVCM Data Value in TELECY with Snowflake and dbt
carta_engineering
2
230
QAエンジニアがプロダクト専任で チームの中に入ると。。。?/登壇資料(杉森 太樹)
hacobu
PRO
0
180
激動の2025年、Modern Data Stackの最新技術動向
sagara
0
1.2k
メタプログラミングRuby問題集の活用
willnet
2
730
Featured
See All Featured
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Site-Speed That Sticks
csswizardry
13
960
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
34
2.3k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Designing for humans not robots
tammielis
254
26k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
54k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.1k
Designing for Performance
lara
610
69k
Statistics for Hackers
jakevdp
799
220k
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming ! very little CS
None
None
katrinaebowman on flickr
VERY high level trimmed = FOREACH loaded_data GENERATE userId, website;
! grouped = GROUP trimmed BY userId; ! counted = FOREACH grouped GENERATE group, COUNT(grouped);
None
I get this for FREE! • Mappin’ & reducin’ •
HDFS in the CLOUD! • Clusters AND nodes! • A rockin’ query plan!
None
Write Pigscript Graphs!
None
None
“give me 500 rows where age > 15”
“give me 500 rows where age > 15” Why so
slow?
“Seeking is slower than reading”
??
None
01010110101010001010101000101010101101010101001 GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
None
None
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
“Seeking is slower than reading”
None
“give me 500 rows where age > 15” GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
in my PIGSCRIPTS I had to worry about a spinning
METAL PLATTER somewhere in VIRGINIA!!!!
None
• Various schema? MONGO • Fast search? ELASTICSEARCH • Keep
history? DATOMIC • Want very fast analytics queries? REDSHIFT.
REDSHIFT production backend for your website! copy of your database
for your data team to play with!!
analytics needs lots of AGGREGATION ! like SUM, AVG, or
COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND So lots of seeking? GOSH DARN IT! but what
if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND READING! ! YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND “columnar storage”
What’s faster than reading AND seeking? IGNORING
block min max 1 1 6 2 7 12 3
13 340
Redshift has lots more… • NODES so you can compute
in parallel • cool QUERY PLANS based on your actual data! • Not actually a database. “Managed data warehouse service in the cloud” • So blazing fast!
Really fast! …how fast? • 21,454,134 rows • COUNT(*) •
Postgres: 586,931.216 ms (10 minutes) • Redshift: 1,561.359 ms (1.5 seconds) 376 times faster! from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
376x isn’t cool. You know what’s cool? 100,000x Instead of
native Python, a matrix! 100x Speed from OpenBLAS compared to numpy 10x Parallelization (for free from OpenBLAS) 10x 100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io