My !!con talk
Sasha Laundy
May 17, 2015
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming
very little CS
katrinaebowman on flickr
VERY high level
trimmed = FOREACH loaded_data GENERATE userId, website;
grouped = GROUP trimmed BY userId;
counted = FOREACH grouped GENERATE group, COUNT(trimmed);
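For readers who have not used Pig, here is a rough plain-Python sketch of what those three statements compute: keep two fields, group by user, count the rows per user. The sample rows and the third field are invented for illustration; Pig would run the same logic as distributed jobs rather than in one process.

```python
from collections import Counter

# Hypothetical stand-in for loaded_data: (userId, website, some other field)
loaded_data = [
    ("u1", "example.com", "2015-05-01"),
    ("u2", "example.org", "2015-05-01"),
    ("u1", "example.net", "2015-05-02"),
]

# trimmed = FOREACH loaded_data GENERATE userId, website;
trimmed = [(user_id, website) for user_id, website, _ in loaded_data]

# GROUP trimmed BY userId, then count the rows in each group
counted = Counter(user_id for user_id, _ in trimmed)

print(dict(counted))  # {'u1': 2, 'u2': 1}
```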
I get this for FREE!
• Mappin’ & reducin’
• HDFS in the CLOUD!
• Clusters AND nodes!
• A rockin’ query plan!
Write Pigscript Graphs!
“give me 500 rows where age > 15”
Why so slow?
“Seeking is slower than reading”
??
01010110101010001010101000101010101101010101001
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
“Seeking is slower than reading”
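One way to make the distinction concrete is to count how often a reader has to jump to a position that is not right next to the last one. The sketch below is a toy model, not a measurement: the block numbers are made up, and `count_seeks` is a hypothetical helper that only counts jumps, ignoring how far each jump is.

```python
def count_seeks(offsets):
    """Count jumps to a non-adjacent block position.

    The first block always costs one seek to get there. Each later block
    costs a seek only if it is not the block right after the previous one.
    """
    seeks = 0
    previous = None
    for offset in offsets:
        if previous is None or offset != previous + 1:
            seeks += 1
        previous = offset
    return seeks


contiguous = [10, 11, 12, 13, 14, 15]  # READING: one run of neighbours
scattered = [10, 40, 41, 90, 200, 7]   # SEEKING: jumps all over the disk

print(count_seeks(contiguous))  # 1
print(count_seeks(scattered))   # 5
```

Both lists fetch six blocks, but the scattered one pays for five head movements instead of one.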
“give me 500 rows where age > 15”
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
in my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
• Various schema? MONGO
• Fast search? ELASTICSEARCH
• Keep history? DATOMIC
• Want very fast analytics queries? REDSHIFT.
REDSHIFT
production backend for your website!
copy of your database for your data team to play with!!
analytics needs lots of AGGREGATION
like SUM, AVG, or COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
So lots of seeking? GOSH DARN IT!
but what if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND
READING!
YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND
“columnar storage”
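The two strings on the slides are the same three records laid out in two different orders. A small sketch that reproduces both, assuming the records are (name, age, place) tuples. The function names are made up for illustration, and a real store would use a proper on-disk encoding rather than concatenated text.

```python
# The three records from the slides: (name, age, place)
rows = [
    ("GRACE", 50, "VIRGINIA"),
    ("ALAN", 45, "ENGLAND"),
    ("ADA", 30, "ENGLAND"),
]


def row_layout(records):
    """Store each record's fields together, one record after another."""
    return "".join(str(field) for record in records for field in record)


def column_layout(records):
    """Store all values of one column together, one column after another."""
    columns = zip(*records)  # transpose rows into columns
    return "".join(str(value) for column in columns for value in column)


print(row_layout(rows))     # GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
print(column_layout(rows))  # GRACEALANADA504530VIRGINIAENGLANDENGLAND
```

In the column layout the ages sit side by side as 504530, so an aggregate over age touches one contiguous stretch instead of three scattered ones.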
What’s faster than reading AND seeking? IGNORING
block  min  max
1      1    6
2      7    12
3      13   340
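The min/max table is what makes IGNORING possible: if a block's range cannot contain the value you want, you never read that block. A minimal sketch using the three blocks from the slide. `blocks_to_read` is a hypothetical helper, and the query ranges are invented for illustration.

```python
# (block number, min value, max value), as in the table on the slide
zone_map = [(1, 1, 6), (2, 7, 12), (3, 13, 340)]


def blocks_to_read(zone_map, low, high):
    """Return the blocks whose [min, max] range overlaps [low, high]."""
    return [
        block
        for block, block_min, block_max in zone_map
        if block_max >= low and block_min <= high
    ]


print(blocks_to_read(zone_map, 8, 10))     # [2]
print(blocks_to_read(zone_map, 5, 20))     # [1, 2, 3]
print(blocks_to_read(zone_map, 500, 600))  # []
```

A query for values between 8 and 10 skips blocks 1 and 3 entirely, and a query above 340 reads nothing at all.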
Redshift has lots more…
• NODES so you can compute in parallel
• cool QUERY PLANS based on your actual data!
• Not actually a database. “Managed data warehouse service in the cloud”
• So blazing fast!
Really fast! …how fast?
• 21,454,134 rows
• COUNT(*)
• Postgres: 586,931.216 ms (10 minutes)
• Redshift: 1,561.359 ms (1.5 seconds)
376 times faster!
from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
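The 376x figure follows directly from the two timings quoted on the slide. A quick check of the arithmetic, using only those two numbers:

```python
postgres_ms = 586_931.216
redshift_ms = 1_561.359

speedup = postgres_ms / redshift_ms
print(round(speedup))                  # 376
print(round(postgres_ms / 60_000, 1))  # 9.8 (minutes)
print(round(redshift_ms / 1_000, 1))   # 1.6 (seconds)
```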
376x isn’t cool. You know what’s cool? 100,000x
Instead of native Python, a matrix!
100x
Speed from OpenBLAS compared to numpy
10x
Parallelization (for free from OpenBLAS)
10x
10x
100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io