$30 off During Our Annual Pro Sale. View Details »

The Wonderful World Of Databases

The Wonderful World Of Databases

A talk I gave at RuPy 2012 - discussing databases, their architecture and history, and how to make them happy.

Andrew Godwin

November 18, 2012
Tweet

More Decks by Andrew Godwin

Other Decks in Programming

Transcript

  1. ANDREW GODWIN // @andrewgodwin
    T H E
    WONDERFUL
    DATABASES
    - O F -
    W O R L D

    View Slide

  2. ?

    View Slide

  3. View Slide

  4. View Slide

  5. View Slide

  6. =

    View Slide

  7. !
    Databases are
    important

    View Slide

  8. Journaling / WAL
    Transactions
    Schema changes
    JOINs

    View Slide

  9. Journaling / WAL
    Database
    Disk

    View Slide

  10. Journaling / WAL
    Database
    OS
    Disk

    View Slide

  11. Journaling / WAL
    Database
    OS
    Disk Controller
    Disk Platter

    View Slide

  12. Journaling / WAL
    Database
    OS
    Disk Controller
    Disk Platter
    RAID controller

    View Slide

  13. Journaling / WAL
    Database
    OS
    Network card
    Switch
    Network card

    View Slide

  14. Transactions
    Read Write
    time →
    Read Write
    Read Read
    t₁
    t₂
    t₃

    View Slide

  15. Transactions
    Read Write
    time →
    Read Write
    Read Read
    t₁
    t₂
    t₃

    View Slide

  16. Altering Schema
    Schema
    Data
    ?

    View Slide

  17. Altering Schema
    Schema
    Data
    ? ?

    View Slide

  18. Altering Schema
    Schema
    Data
    ?

    View Slide

  19. JOINs

    View Slide

  20. JOINs

    View Slide

  21. Table A Table B

    View Slide


  22. View Slide


  23. Hash Joins
    Merge Joins
    Index scans

    View Slide

  24. But wait!
    There's more!

    View Slide

  25. Asynchronous Calls
    import psycopg2
    import eventlet
    eventlet.monkey_patch(psycopg2=True)
    cur1 = psycopg2.connect().cursor()
    cur2 = psycopg2.connect().cursor()
    cur1.execute("BIG SELECT")
    cur2.execute("ANOTHER SELECT")
    results1 = cur1.fetchall()
    results2 = cur2.fetchall()

    View Slide

  26. Document Stores
    • Easy to add data
    • You still need schema
    • Large updates can be slow

    View Slide

  27. Distributed Databases
    Consistency
    Availability
    Partition Tolerance
    (pick any two)

    View Slide

  28. Performance Tips

    View Slide

  29. Don't use MySQL
    Really, don't.

    View Slide

  30. Play to strengths
    Do search in a search engine.
    Don't put relational in MongoDB.

    View Slide

  31. Use indexes!
    EXPLAIN is your friend.

    View Slide

  32. # explain select * from aeblog_post where id = 50;
    QUERY PLAN
    ---------------------------------------------------
    Index Scan using aeblog_post_pkey on aeblog_post
    (cost=0.00..8.27 rows=1 width=1812)
    Index Cond: (id = 50)

    View Slide

  33. # explain select * from aeblog_post
    where section = 'trip-notes';
    QUERY PLAN
    ---------------------------------------------------
    Seq Scan on aeblog_post (cost=0.00..14.21 rows=1
    width=1812)
    Filter: ((section)::text = 'trip-notes'::text)

    View Slide

  34. Analytics. Everywhere.
    Slow queries. Number of queries.

    View Slide

  35. Avoid giant IN queries
    Seriously. A JOIN is better.

    View Slide

  36. Think.
    Learn.

    View Slide

  37. Fin.
    Andrew Godwin
    @andrewgodwin
    http://aeracode.org
    Thanks.

    View Slide