Small Data: Storage For The Rest Of Us
Andrew Godwin
May 26, 2015
A talk I gave at PyWaw Summit 2015.
Transcript
Andrew Godwin @andrewgodwin SMALL DATA: STORAGE FOR THE REST OF US
Andrew Godwin. Hi, I'm: Django Core Developer. Senior Engineer at … Far too many hobbies.
BIG DATA: What does it mean? What is 'big'?
1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?
Scalable designs are a tradeoff: NOW vs LATER
Small company? Agency? Focus on ease of change, not scalability
You don't need to scale from day one. But always leave yourself scaling points.
Rapid development. Continuous deployment. Hardware choice. Scaling 'breakpoints'.
Rapid development: It's all about schema change overhead
Explicit Schema
ID (int)   Name (text)   Weight (uint)
1          Alice         76
2          Bob           84
3          Charles       65
Implicit Schema
{ "id": 342, "name": "David", "weight": 44, }
Silent Failure
{ "id": 342, "name": "David", "weight": 74, }
{ "id": 342, "name": "Ellie", "weight": "85kg", }
{ "id": 342, "nom": "Frankie", "weight": 77, }
{ "id": 342, "name": "Frankie", "weight": -67, }
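The silent-failure records above can be caught with a small check before they are written. The following is a minimal sketch, assuming the explicit schema from the slide (ID int, Name text, Weight uint); the names `SCHEMA` and `validate` are hypothetical and are not taken from the talk.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema mirroring the slide's table: ID int, Name text, Weight uint.
SCHEMA = {"id": int, "name": str, "weight": int}


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    # Every schema field must be present and have exactly the expected type.
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif type(record[field]) is not expected:
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields the schema does not know about are usually typos ("nom" for "name").
    for field in record:
        if field not in SCHEMA:
            problems.append(f"unknown field: {field}")
    # "uint" on the slide implies that a weight must not be negative.
    if type(record.get("weight")) is int and record["weight"] < 0:
        problems.append("weight: must be non-negative")
    return problems
```

An implicit-schema store would accept all four records from the slide without complaint; with a check like this, only the first one passes, and the other three are rejected at write time instead of surfacing later as bad data.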
Continuous deployment: It's 11pm. Do you know where your locks are?
Add NULL and backfill. 1-to-1 relation and backfill. DBMS-supported type changes.
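The "add NULL and backfill" pattern can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module; the `people` table, the `weight` and `weight_grams` columns, and the function name are assumptions made for the example, and locking behaviour differs between SQLite and servers such as PostgreSQL or MySQL.

```python
import sqlite3


def add_column_and_backfill(conn, batch_size=1000):
    """Add a nullable column, then fill it in small batches. Returns rows updated."""
    # Step 1: add the column as nullable. Existing rows simply read as NULL,
    # so old code that does not know about the column keeps working.
    conn.execute("ALTER TABLE people ADD COLUMN weight_grams INTEGER")
    conn.commit()

    # Step 2: backfill in batches so that each transaction is short and
    # holds its locks only briefly.
    total = 0
    while True:
        cursor = conn.execute(
            "UPDATE people SET weight_grams = weight * 1000 "
            "WHERE id IN ("
            "  SELECT id FROM people "
            "  WHERE weight_grams IS NULL AND weight IS NOT NULL "
            "  LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
    return total
```

The point of the batching is the one the slide makes about locks: one large `UPDATE` would hold its locks for the whole backfill, while many small transactions let other traffic through in between.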
Hardware choice: ZOMG RUN IT ON THE CLOUD
VMs are TERRIBLE at IO: Up to 10x slowdown, even with VT-d.
Memory is king: Your database loves it. Don't let other apps steal it.
Adding more power goes far: Especially with PostgreSQL or read-only replicas
Scaling Breakpoints
Sharding point: Datasets partitioned by primary key
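Partitioning by primary key can be as simple as a stable hash of the key modulo the number of shards. The sketch below is one possible way to do it, not the approach from the talk; `shard_for` and `partition` are hypothetical names, and real deployments often prefer range-based or consistent-hashing schemes so that shards can be added without moving most rows.

```python
import zlib
from collections import defaultdict


def shard_for(primary_key, shard_count):
    """Map a primary key to a shard number in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # crc32 gives the same result in every process, unlike the built-in
    # hash(), which is randomised per process for strings.
    return zlib.crc32(str(primary_key).encode("utf-8")) % shard_count


def partition(rows, shard_count, key="id"):
    """Group rows (dicts) by the shard that their primary key maps to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key], shard_count)].append(row)
    return dict(shards)
```

Leaving a sharding point open mostly means making sure every query can be routed by that key; the routing function itself can stay trivial until a second shard is actually needed.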
Vertical split: Entirely unrelated tables
Denormalisation: It's not free!
Consistency leeway: Can you take inconsistent views?
Load Shapes
Read-heavy / Write-heavy / Large size
Read-heavy / Write-heavy / Large size: Wikipedia, TV show website, Minecraft Forums, Amazon Glacier, Eventbrite, Logging
Read-heavy / Write-heavy / Large size: Offline storage, Append formats, In-memory cache / flat files, Many indexes, Fewer indexes
Extremes
Extreme Reads: Heavy replication
Extreme Writes: Sacrifice ordering or consistency
Extreme Size: Sacrifice query time
Extreme Longevity: Flash in cold storage
Extreme Survivability: Rad-hardened Flash
Extreme Auditability: True append-only storage
Approximate time to bit flip, unpowered at room temperature
Media: SSDs, Magnetic Tape, Hard Drives, Consumer Flash, CDs/DVDs, Long-life Flash, Metal-Carbon DVDs
Time ranges: 3-6 months, 5-10 years, 3-5 years, 100+ years
Big Data isn't one thing: It depends on type, size, complexity, throughput, latency...
Focus on the current problems: Future problems don't matter if you never get there
Efficiency and iterating fast matter: The smaller you are, the more time is worth
Good architecture affects product: You're not writing a system in a vacuum
Thanks. Andrew Godwin @andrewgodwin