Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
My !!con talk
Search
Sasha Laundy
May 17, 2015
Technology
0
560
My !!con talk
Sasha Laundy
May 17, 2015
Tweet
Share
More Decks by Sasha Laundy
See All by Sasha Laundy
Your Brain's API: Getting and Giving Technical Help
slaundy
4
7.7k
HOWTO Make Your Future Data Science Team Love You
slaundy
0
520
HOWTO Make Your Future Data Science Team Love You
slaundy
1
980
Other Decks in Technology
See All in Technology
AWSを活用したIoTにおけるセキュリティ対策のご紹介
kwskyk
0
380
手を動かしてレベルアップしよう!
maruto
0
220
実は強い 非ViTな画像認識モデル
tattaka
3
1.3k
AI自体のOps 〜LLMアプリの運用、AWSサービスとOSSの使い分け〜
minorun365
PRO
2
140
【詳説】コンテンツ配信 システムの複数機能 基盤への拡張
hatena
0
260
OCI Success Journey OCIの何が評価されてる?疑問に答える事例セミナー(2025年2月実施)
oracle4engineer
PRO
2
160
【Findy】「正しく」失敗できる チームの作り方 〜リアルな事例から紐解く失敗を恐れない組織とは〜 / A team that can fail correctly by findy
i35_267
5
910
あなたが人生で成功するための5つの普遍的法則 #jawsug #jawsdays2025 / 20250301 HEROZ
yoshidashingo
2
300
1行のコードから社会課題の解決へ: EMの探究、事業・技術・組織を紡ぐ実践知 / EM Conf 2025
9ma3r
11
3.9k
Autonomous Database Serverless 技術詳細 / adb-s_technical_detail_jp
oracle4engineer
PRO
17
45k
データベースの負荷を紐解く/untangle-the-database-load
emiki
2
520
AIエージェント開発のノウハウと課題
pharma_x_tech
0
310
Featured
See All Featured
Build The Right Thing And Hit Your Dates
maggiecrowley
34
2.5k
Designing for Performance
lara
604
68k
Music & Morning Musume
bryan
46
6.4k
How STYLIGHT went responsive
nonsquared
98
5.4k
Building an army of robots
kneath
303
45k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Being A Developer After 40
akosma
89
590k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7.1k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3k
Building Your Own Lightsaber
phodgson
104
6.2k
Side Projects
sachag
452
42k
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming ! very little CS
None
None
katrinaebowman on flickr
VERY high level trimmed = FOREACH loaded_data GENERATE userId, website;
! grouped = GROUP trimmed BY userId; ! counted = FOREACH grouped GENERATE group, COUNT(grouped);
None
I get this for FREE! • Mappin’ & reducin’ •
HDFS in the CLOUD! • Clusters AND nodes! • A rockin’ query plan!
None
Write Pigscript Graphs!
None
None
“give me 500 rows where age > 15”
“give me 500 rows where age > 15” Why so
slow?
“Seeking is slower than reading”
??
None
01010110101010001010101000101010101101010101001 GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
None
None
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
“Seeking is slower than reading”
None
“give me 500 rows where age > 15” GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
in my PIGSCRIPTS I had to worry about a spinning
METAL PLATTER somewhere in VIRGINIA!!!!
None
• Various schema? MONGO • Fast search? ELASTICSEARCH • Keep
history? DATOMIC • Want very fast analytics queries? REDSHIFT.
REDSHIFT production backend for your website! copy of your database
for your data team to play with!!
analytics needs lots of AGGREGATION ! like SUM, AVG, or
COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND So lots of seeking? GOSH DARN IT! but what
if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND READING! ! YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND “columnar storage”
What’s faster than reading AND seeking? IGNORING
block min max 1 1 6 2 7 12 3
13 340
Redshift has lots more… • NODES so you can compute
in parallel • cool QUERY PLANS based on your actual data! • Not actually a database. “Managed data warehouse service in the cloud” • So blazing fast!
Really fast! …how fast? • 21,454,134 rows • COUNT(*) •
Postgres: 586,931.216 ms (10 minutes) • Redshift: 1,561.359 ms (1.5 seconds) 376 times faster! from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
376x isn’t cool. You know what’s cool? 100,000x Instead of
native Python, a matrix! 100x Speed from OpenBLAS compared to numpy 10x Parallelization (for free from OpenBLAS) 10x 100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io