Python and Relational/Non-relational Databases

A talk I gave at PyCon Ukraine 2010.

Andrew Godwin

October 22, 2010

Transcript

  1. Python and
    Relational / Non-relational
    Databases
    Andrew Godwin

  2. Introduction
    Python for 5 years
    Django core developer
    Data modelling / visualisation

  3. "Andrew speaks English
    like a machine gun
    speaks bullets."
    Reinout van Rees

  4. If I speak too fast -
    tell me!

  5. What is a
    relational database?

  6. A relational database is
    a “collection of relations”
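
As a rough illustration of the "collection of relations" idea, here is a minimal sketch using Python's built-in sqlite3 module rather than the PostgreSQL setup used later in the talk. The `users` and `posts` tables and their columns are made up for the example; each table is one relation and each row is one tuple in it.

```python
import sqlite3

# In-memory database so the sketch needs no server; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), title TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'andrew')")
conn.execute("INSERT INTO posts VALUES (10, 1, 'Hello')")

# Each relation is a set of tuples that share the same attributes.
users = conn.execute("SELECT id, username FROM users").fetchall()
posts = conn.execute("SELECT id, user_id, title FROM posts").fetchall()
print(users)
print(posts)
```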

  7. It's what a lot of people
    are used to.

  8. Relational Databases
    PostgreSQL
    MySQL
    SQLite

  9. Let's pick PostgreSQL
    (it's a good choice)

  10. Usage
    import psycopg2
    conn = psycopg2.connect(
        host="localhost",
        user="postgres"
    )
    cursor = conn.cursor()
    # Pass values as query parameters instead of quoting them by hand
    cursor.execute(
        "SELECT * FROM users WHERE username = %s;",
        ("andrew",)
    )
    for row in cursor.fetchall():
        print(row)

  11. You've probably seen all
    that before.

  12. Now, to introduce some
    non-relational databases

  13. Document Databases
    MongoDB
    CouchDB

  14. Key-Value Stores
    Redis
    Cassandra

  15. Message Queues
    AMQP
    Celery

  16. Various Others
    Graph databases
    Filesystems
    VCSs

  17. Redis and MongoDB are
    two good examples here

  18. Redis: Key-value store with
    strings, lists, sets, channels
    and atomic operations.

  19. Redis Example
    import redis
    conn = redis.Redis(host="localhost")
    print(conn.get("top_value"))
    conn.set("last_user", "andrew")
    conn.incr("num_runs")  # atomically increment a counter
    conn.sadd("users", "andrew")
    conn.sadd("users", "martin")
    for item in conn.smembers("users"):
        print(item)

  20. MongoDB: Document store
    with indexing and a wide
    range of query filters.

  21. MongoDB Example
    import pymongo
    # Connect to a MongoDB server on the local machine
    conn = pymongo.MongoClient("localhost")
    db = conn['mongo_example']
    coll = db['users']
    coll.insert_one({
        "username": "andrew",
        "uid": 1000,
    })
    for entry in coll.find({"username": "andrew"}):
        print(entry)

  22. These all solve different
    problems - you can't easily
    replace one with the other.

  23. "When all you have is a
    hammer, everything
    looks like a nail"
    Abraham Maslow (paraphrased)

  24. JOIN - your best friend,
    and your worst enemy.

  25. Denormalising your data speeds
    up reads, and slows down writes.
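
To make the JOIN and denormalisation trade-off concrete, here is a small sketch using the built-in sqlite3 module. The schema (`users`, `posts`, `posts_denorm`) is hypothetical and only meant to show the shape of the problem: the denormalised read skips the JOIN, but renaming a user touches every copied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    -- Denormalised variant: the author's name is copied onto every post.
    CREATE TABLE posts_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                               username TEXT, title TEXT);
    INSERT INTO users VALUES (1, 'andrew');
    INSERT INTO posts VALUES (10, 1, 'First'), (11, 1, 'Second');
    INSERT INTO posts_denorm VALUES (10, 1, 'andrew', 'First'),
                                    (11, 1, 'andrew', 'Second');
""")

# Normalised read: needs a JOIN to find the author's name.
joined = conn.execute(
    "SELECT posts.title, users.username FROM posts "
    "JOIN users ON users.id = posts.user_id ORDER BY posts.id"
).fetchall()

# Denormalised read: one table, no JOIN.
flat = conn.execute(
    "SELECT title, username FROM posts_denorm ORDER BY id"
).fetchall()

# Renaming the user changes one row in the normalised schema...
normalised_writes = conn.execute(
    "UPDATE users SET username = 'andrewg' WHERE id = 1"
).rowcount
# ...but every copied row in the denormalised one.
denormalised_writes = conn.execute(
    "UPDATE posts_denorm SET username = 'andrewg' WHERE user_id = 1"
).rowcount
print(joined == flat, normalised_writes, denormalised_writes)
```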

  26. Schemaless != Denormalised
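
One way to read this slide: documents can lack a fixed schema and still be stored in a normalised way, with references instead of copies. The sketch below uses plain Python lists of dicts as stand-ins for two collections; all field names and values are hypothetical.

```python
# Two schemaless "collections" as plain lists of dicts (hypothetical data).
users = [
    {"_id": 1, "username": "andrew", "languages": ["python"]},
    {"_id": 2, "username": "martin"},  # no "languages" field: no fixed schema
]
posts = [
    # Normalised: each post refers to its author by id instead of copying the name.
    {"_id": 10, "author_id": 1, "title": "First"},
    {"_id": 11, "author_id": 2, "title": "Second"},
]

# Resolving the reference is the application's job (a manual "join").
users_by_id = {user["_id"]: user for user in users}
titles_with_authors = [
    (post["title"], users_by_id[post["author_id"]]["username"])
    for post in posts
]
print(titles_with_authors)
```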

  27. Atomic operations are nice.
    conn.incrby("num_users", 2)

  28. But SQL can do some of them.
    UPDATE foo SET bar = bar + 1 WHERE baz;
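
A minimal sketch of the same idea from Python, using the built-in sqlite3 module. The `counters` table and the `increment` helper are assumptions made for the example; the point is that the read and the write happen inside one UPDATE statement instead of a separate SELECT followed by an UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('num_users', 0)")
conn.commit()


def increment(conn, name, amount=1):
    """Increment in a single statement, so no other writer can slip in
    between reading the old value and writing the new one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE name = ?",
            (amount, name),
        )


increment(conn, "num_users", 2)
increment(conn, "num_users")
value = conn.execute(
    "SELECT value FROM counters WHERE name = ?", ("num_users",)
).fetchone()[0]
print(value)
```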

  29. Redis, the data structure server.
    SETNX, GETSET, EXPIRE and friends

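  30. Locks / Semaphores
    # Lock: only the first caller manages to set the key
    conn.setnx("lock:foo", time.time() + 3600)
    # Semaphore: take a slot, and give it back if none were left
    val = conn.decr("sem:foo")
    if val >= 0:
        ...
    else:
        conn.incr("sem:foo")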
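
The semaphore pattern above could be wrapped up roughly as follows. `DictStore` is a hypothetical in-memory stand-in that only borrows the method names `set`, `incr` and `decr` from a redis-py client, so that the sketch runs without a server; `try_acquire` and `release` are likewise names invented for the example.

```python
class DictStore:
    """Hypothetical stand-in exposing set/incr/decr like a redis-py client.
    Unlike Redis, it is not shared between processes."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]

    def decr(self, key):
        self.data[key] = int(self.data.get(key, 0)) - 1
        return self.data[key]


def try_acquire(conn, key):
    """Take one slot if any are left; undo the decrement otherwise."""
    if conn.decr(key) >= 0:
        return True
    conn.incr(key)
    return False


def release(conn, key):
    """Give a slot back."""
    conn.incr(key)


conn = DictStore()
conn.set("sem:foo", 2)  # two slots available
results = [try_acquire(conn, "sem:foo") for _ in range(3)]
release(conn, "sem:foo")
after_release = try_acquire(conn, "sem:foo")
print(results, after_release)
```

Against a real server, the same two functions should work with a `redis.Redis` connection passed in place of `DictStore`, since each step is a single atomic Redis command.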

  31. Queues
    conn.lpush("myqueue", "workitem")
    # Pop from the opposite end so items come out first-in, first-out
    todo = conn.rpop("myqueue")
    (or publish/subscribe)

  32. Priority Queues
    # Lower score means higher priority
    conn.zadd("myqueue", {"handle-meltdown": 1})
    conn.zadd("myqueue", {"feed-cats": 5})
    todo = conn.zrange("myqueue", 0, 0)  # the single lowest-scored item
    conn.zrem("myqueue", *todo)

  33. Lock-free linked lists!
    new_id = "bgrdsd"
    old_end = conn.getset(":end", new_id)
    conn.set("%s:next" % old_end, new_id)
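
The snippet above only shows the append step. A slightly fuller sketch, under some assumptions, might look like this: `DictStore` is a hypothetical in-memory stand-in borrowing the method names `get`, `set` and `getset` from a redis-py client, and the `head`/`end`/`next` key layout and the `append_node`/`walk` helpers are invented for the example, not taken from the talk.

```python
class DictStore:
    """Hypothetical stand-in exposing get/set/getset like a redis-py client."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def getset(self, key, value):
        # In Redis this swap is one atomic command.
        old = self.data.get(key)
        self.data[key] = value
        return old


def append_node(conn, list_name, node_id):
    """Append without a lock: the atomic swap of the end pointer decides
    the order, then the previous end is linked to the new node."""
    old_end = conn.getset("%s:end" % list_name, node_id)
    if old_end is None:
        conn.set("%s:head" % list_name, node_id)
    else:
        conn.set("%s:next:%s" % (list_name, old_end), node_id)


def walk(conn, list_name):
    """Follow the next pointers from the head."""
    node = conn.get("%s:head" % list_name)
    while node is not None:
        yield node
        node = conn.get("%s:next:%s" % (list_name, node))


conn = DictStore()
for node_id in ["a", "b", "c"]:
    append_node(conn, "mylist", node_id)
nodes = list(walk(conn, "mylist"))
print(nodes)
```

With a real redis-py client, creating it with `decode_responses=True` keeps the returned ids as strings so the key formatting works as written. A reader may briefly see a chain that stops short between the swap and the link step.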

  34. Performance-wise, the fewer
    checks/integrity guarantees,
    the faster it goes.

  35. Maturity can sometimes be
    an issue, but new features
    can appear rapidly.

  36. You can also use databases
    for the wrong thing - it
    often only matters "at scale"

  37. But how does this all
    relate to Python?

  38. Most databases - even
    new ones - have good
    Python bindings

  39. Postgres: psycopg2
    Redis: redis-py
    MongoDB: pymongo
    (and more - neo4j, VCSen, relational, etc.)

  40. Some databases have
    Python available inside
    (Postgres has it as an option)

  41. Document databases map
    really well to Python dicts
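
A quick way to see the mapping is to round-trip a nested dict through a document format. The sketch below uses the standard library's json module as a stand-in (MongoDB itself stores BSON, a binary relative of JSON); the extra fields beyond `username` and `uid` are hypothetical.

```python
import json

# A nested dict shaped like the document from the MongoDB example,
# plus a couple of hypothetical extra fields.
user = {
    "username": "andrew",
    "uid": 1000,
    "languages": ["python", "sql"],
    "settings": {"theme": "dark"},
}

# Serialise and parse again: lists and nested dicts survive unchanged.
encoded = json.dumps(user, sort_keys=True)
decoded = json.loads(encoded)
print(decoded == user)
print(decoded["settings"]["theme"])
```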

  42. You may find non-relational
    databases a nicer way to
    store state - for any app

  43. Remember, you might still
    need transactions/reliability.
    (Business logic is probably better
    off on mature systems for now)
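
For a sense of what transactions buy you, here is a minimal sketch with the built-in sqlite3 module. The `accounts` table, the names in it and the `transfer` helper are all hypothetical; the point is that a failed second step also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move an amount atomically: both updates apply, or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```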

  44. Overall? Just keep all
    the options in mind.
    Don't get caught by trends,
    and don't abuse your relational store

  45. Thanks.
    Andrew Godwin
    @andrewgodwin
    http://aeracode.org
