The Wonderful World Of Databases

The Wonderful World Of Databases

A talk I gave at RuPy 2012 - discussing databases, their architecture and history, and how to make them happy.

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

November 18, 2012
Tweet

Transcript

  1. ANDREW GODWIN // @andrewgodwin T H E WONDERFUL DATABASES -

    O F - W O R L D
  2. ?

  3. None
  4. None
  5. None
  6. =

  7. ! Databases are important

  8. Journaling / WAL Transactions Schema changes JOINs

  9. Journaling / WAL Database Disk

  10. Journaling / WAL Database OS Disk

  11. Journaling / WAL Database OS Disk Controller Disk Platter

  12. Journaling / WAL Database OS Disk Controller Disk Platter RAID

    controller
  13. Journaling / WAL Database OS Network card Switch Network card

  14. Transactions Read Write time → Read Write Read Read t₁

    t₂ t₃
  15. Transactions Read Write time → Read Write Read Read t₁

    t₂ t₃
  16. Altering Schema Schema Data ?

  17. Altering Schema Schema Data ? ?

  18. Altering Schema Schema Data ?

  19. JOINs

  20. JOINs

  21. Table A Table B

  22. n² Hash Joins Merge Joins Index scans

  23. But wait! There's more!

  24. Asynchronous Calls import psycopg2 import eventlet eventlet.monkey_patch(psycopg2=True) cur1 = psycopg2.connect().cursor()

    cur2 = psycopg2.connect().cursor() cur1.execute("BIG SELECT") cur2.execute("ANOTHER SELECT") results1 = cur1.fetchall() results2 = cur2.fetchall()
  25. Document Stores • Easy to add data • You still

    need schema • Large updates can be slow
  26. Distributed Databases Consistency Availability Partition Tolerance (pick any two)

  27. Performance Tips

  28. Don't use MySQL Really, don't.

  29. Play to strengths Do search in a search engine. Don't

    put relational in MongoDB.
  30. Use indexes! EXPLAIN is your friend.

  31. # explain select * from aeblog_post where id = 50;

    QUERY PLAN --------------------------------------------------- Index Scan using aeblog_post_pkey on aeblog_post (cost=0.00..8.27 rows=1 width=1812) Index Cond: (id = 50)
  32. # explain select * from aeblog_post where section = 'trip-notes';

    QUERY PLAN --------------------------------------------------- Seq Scan on aeblog_post (cost=0.00..14.21 rows=1 width=1812) Filter: ((section)::text = 'trip-notes'::text)
  33. Analytics. Everywhere. Slow queries. Number of queries.

  34. Avoid giant IN queries Seriously. A JOIN is better.

  35. Think. Learn.

  36. Fin. Andrew Godwin @andrewgodwin http://aeracode.org Thanks.