My !!con talk
Sasha Laundy
May 17, 2015
Technology
Transcript
Spinning metal platters IN THE CLOUD!!! @sashalaundy
physics + programming
very little CS
katrinaebowman on flickr
VERY high level
trimmed = FOREACH loaded_data GENERATE userId, website;
grouped = GROUP trimmed BY userId;
counted = FOREACH grouped GENERATE group, COUNT(trimmed);
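For readers who do not know Pig, the statements on this slide project two fields, group the rows by user, and count the rows in each group. A rough Python equivalent might look like the sketch below; the sample rows and the extra third field are hypothetical and are not from the talk.

```python
from collections import Counter

# Hypothetical stand-in for loaded_data: (userId, website, other fields...)
loaded_data = [
    (1, "a.example", "extra"),
    (2, "b.example", "extra"),
    (1, "c.example", "extra"),
]

# trimmed = FOREACH loaded_data GENERATE userId, website;
trimmed = [(user_id, website) for user_id, website, *_ in loaded_data]

# GROUP trimmed BY userId, then count the rows in each group.
counted = Counter(user_id for user_id, _ in trimmed)

print(dict(counted))
```

On a real cluster Pig turns the same three steps into map and reduce jobs; this sketch only shows the logic on one machine.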
I get this for FREE!
• Mappin’ & reducin’
• HDFS in the CLOUD!
• Clusters AND nodes!
• A rockin’ query plan!
Write Pigscript Graphs!
“give me 500 rows where age > 15”
Why so slow?
“Seeking is slower than reading”
??
01010110101010001010101000101010101101010101001
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
READING: grabbing contiguous sections of data
SEEKING: grabbing scattered sections of data
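The difference between the two definitions can be sketched with a toy model. It assumes a disk head that must reposition (one seek) whenever the next position is not directly after the previous one. The function name `count_seeks` and the offsets are made up for illustration; real disks are far more complicated.

```python
def count_seeks(offsets):
    """Count how often the head must jump to read the offsets in order."""
    seeks = 0
    previous = None
    for offset in offsets:
        # A jump is needed at the start and whenever the data is not adjacent.
        if previous is None or offset != previous + 1:
            seeks += 1
        previous = offset
    return seeks


contiguous = [10, 11, 12, 13, 14, 15]  # READING: one jump, then stream
scattered = [10, 40, 12, 90, 3, 57]    # SEEKING: a jump for every piece

print(count_seeks(contiguous))  # 1
print(count_seeks(scattered))   # 6
```

Both calls fetch six pieces of data, but the scattered case pays for six mechanical movements instead of one.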
“Seeking is slower than reading”
“give me 500 rows where age > 15”
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
MIND. BLOWN.
in my PIGSCRIPTS I had to worry about a spinning METAL PLATTER somewhere in VIRGINIA!!!!
• Various schema? MONGO
• Fast search? ELASTICSEARCH
• Keep history? DATOMIC
• Want very fast analytics queries? REDSHIFT.
REDSHIFT
production backend for your website!
copy of your database for your data team to play with!!
analytics needs lots of AGGREGATION
like SUM, AVG, or COUNT across ROWS
GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
So lots of seeking? GOSH DARN IT!
but what if…
GRACEALANADA504530VIRGINIAENGLANDENGLAND
READING!
YAYYYYYY!!!
GRACEALANADA504530VIRGINIAENGLANDENGLAND “columnar storage”
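The two byte strings on these slides can be reproduced from the same three records. The sketch below lays the records out row by row and then column by column. The variable names are illustrative, and a real columnar store adds encoding and compression that are ignored here.

```python
# The three records from the slides: (name, age, place)
rows = [
    ("GRACE", 50, "VIRGINIA"),
    ("ALAN", 45, "ENGLAND"),
    ("ADA", 30, "ENGLAND"),
]

# Row storage: each record's fields sit next to each other.
row_layout = "".join(str(value) for row in rows for value in row)

# Columnar storage: all values of one column sit next to each other.
columns = list(zip(*rows))
column_layout = "".join(str(value) for column in columns for value in column)

print(row_layout)     # GRACE50VIRGINIAALAN45ENGLANDADA30ENGLAND
print(column_layout)  # GRACEALANADA504530VIRGINIAENGLANDENGLAND

# An aggregate over age touches one contiguous run of values.
ages = columns[1]
print(sum(ages))      # 125
```

In the row layout the ages are interleaved with names and places, so summing them means skipping around. In the columnar layout they form one contiguous run.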
What’s faster than reading AND seeking? IGNORING
block  min  max
1      1    6
2      7    12
3      13   340
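The min/max table suggests how whole blocks can be ignored. A minimal sketch follows, assuming the table records the smallest and largest value stored in each block of the filtered column and that the query asks for values greater than 15. The function name `blocks_to_read` is hypothetical and is not a Redshift API.

```python
# Per-block (min, max) metadata, copied from the slide.
blocks = {1: (1, 6), 2: (7, 12), 3: (13, 340)}


def blocks_to_read(block_ranges, threshold):
    """Return ids of blocks that could hold a value greater than threshold."""
    return [
        block_id
        for block_id, (_low, high) in block_ranges.items()
        if high > threshold  # a block whose max is too small can be skipped
    ]


print(blocks_to_read(blocks, 15))  # [3]
```

Blocks 1 and 2 are never read because their maximum values already rule them out.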
Redshift has lots more…
• NODES so you can compute in parallel
• cool QUERY PLANS based on your actual data!
• Not actually a database. “Managed data warehouse service in the cloud”
• So blazing fast!
Really fast! …how fast?
• 21,454,134 rows
• COUNT(*)
• Postgres: 586,931.216 ms (10 minutes)
• Redshift: 1,561.359 ms (1.5 seconds)
376 times faster!
from http://dailytechnology.net/2013/08/03/redshift-what-you-need-to-know/
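The 376x figure follows directly from the two timings on the slide. A quick check, using only those two numbers:

```python
postgres_ms = 586_931.216
redshift_ms = 1_561.359

speedup = postgres_ms / redshift_ms
postgres_minutes = postgres_ms / 60_000

print(round(speedup))              # 376
print(round(postgres_minutes, 1))  # 9.8
```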
376x isn’t cool. You know what’s cool? 100,000x
Instead of native Python, a matrix! 100x
Speed from OpenBLAS compared to numpy 10x
Parallelization (for free from OpenBLAS) 10x
100,000x
redshift is fast
hardware matters
databases are cool
THANKS!!!! @sashalaundy sasha.io