A talk I gave at RuPy 2012 - discussing databases, their architecture and history, and how to make them happy.
ANDREW GODWIN // @andrewgodwinT H EWONDERFULDATABASES- O F -W O R L D
View Slide
?
=
!Databases areimportant
Journaling / WALTransactionsSchema changesJOINs
Journaling / WALDatabaseDisk
Journaling / WALDatabaseOSDisk
Journaling / WALDatabaseOSDisk ControllerDisk Platter
Journaling / WALDatabaseOSDisk ControllerDisk PlatterRAID controller
Journaling / WALDatabaseOSNetwork cardSwitchNetwork card
TransactionsRead Writetime →Read WriteRead Readt₁t₂t₃
Altering SchemaSchemaData?
Altering SchemaSchemaData? ?
JOINs
Table A Table B
n²
n²Hash JoinsMerge JoinsIndex scans
But wait!There's more!
Asynchronous Callsimport psycopg2import eventleteventlet.monkey_patch(psycopg2=True)cur1 = psycopg2.connect().cursor()cur2 = psycopg2.connect().cursor()cur1.execute("BIG SELECT")cur2.execute("ANOTHER SELECT")results1 = cur1.fetchall()results2 = cur2.fetchall()
Document Stores• Easy to add data• You still need schema• Large updates can be slow
Distributed DatabasesConsistencyAvailabilityPartition Tolerance(pick any two)
Performance Tips
Don't use MySQLReally, don't.
Play to strengthsDo search in a search engine.Don't put relational in MongoDB.
Use indexes!EXPLAIN is your friend.
# explain select * from aeblog_post where id = 50;QUERY PLAN---------------------------------------------------Index Scan using aeblog_post_pkey on aeblog_post(cost=0.00..8.27 rows=1 width=1812)Index Cond: (id = 50)
# explain select * from aeblog_postwhere section = 'trip-notes';QUERY PLAN---------------------------------------------------Seq Scan on aeblog_post (cost=0.00..14.21 rows=1width=1812)Filter: ((section)::text = 'trip-notes'::text)
Analytics. Everywhere.Slow queries. Number of queries.
Avoid giant IN queriesSeriously. A JOIN is better.
Think.Learn.
Fin.Andrew Godwin@andrewgodwinhttp://aeracode.orgThanks.