Python and Relational/Non-relational Databases

A talk I gave at PyCon Ukraine 2010.

Andrew Godwin

October 22, 2010

Transcript

  1. Python and Relational / Non-relational Databases. Andrew Godwin

  2. Introduction: Python for 5 years; Django core developer; data modelling / visualisation

  3. "Andrew speaks English like a machine gun speaks bullets." Reinout van Rees

  4. If I speak too fast - tell me!

  5. What is a relational database?

  6. A relational database is a “collection of relations”

  7. It's what a lot of people are used to.

  8. Relational Databases: PostgreSQL, MySQL, SQLite

  9. Let's pick PostgreSQL (it's a good choice)

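  10. Usage
    import psycopg2
    conn = psycopg2.connect(host="localhost", user="postgres")
    cursor = conn.cursor()
    # Pass the value as a parameter: in PostgreSQL, "andrew" in double
    # quotes would be read as a column name, not a string.
    cursor.execute("SELECT * FROM users WHERE username = %s;", ("andrew",))
    for row in cursor.fetchall():
        print(row)
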
  11. You've probably seen all that before.

  12. Now, to introduce some non-relational databases

  13. Document Databases: MongoDB, CouchDB

  14. Key-Value Stores: Redis, Cassandra

  15. Message Queues: AMQP, Celery

  16. Various Others: graph databases, filesystems, VCSs

  17. Redis and MongoDB are two good examples here

  18. Redis: Key-value store with strings, lists, sets, channels and atomic operations.

  19. Redis Example
    import redis
    conn = redis.Redis(host="localhost")
    print(conn.get("top_value"))
    conn.set("last_user", "andrew")
    conn.incr("num_runs")  # atomic increment
    conn.sadd("users", "andrew")
    conn.sadd("users", "martin")
    for item in conn.smembers("users"):
        print(item)

  20. MongoDB: Document store with indexing and a wide range of query filters.

  21. MongoDB Example
    import pymongo
    conn = pymongo.MongoClient("localhost")  # pymongo.Connection in 2010-era releases
    db = conn["mongo_example"]
    coll = db["users"]
    coll.insert_one({"username": "andrew", "uid": 1000})  # coll.insert in older releases
    for entry in coll.find({"username": "andrew"}):
        print(entry)

  22. These all solve different problems - you can't easily replace one with the other.

  23. "When all you have is a hammer, everything looks like a nail" Abraham Maslow (paraphrased)

  24. JOIN - your best friend, and your worst enemy.

  25. Denormalising your data speeds up reads, and slows down writes.

  26. Schemaless != Denormalised

  27. Atomic operations are nice.
    conn.incrby("num_users", 2)

  28. But SQL can do some of them.
    UPDATE foo SET bar = bar + 1 WHERE baz;

  29. Redis, the data structures server. SETNX, GETSET, EXPIRE and friends

  30. Locks / Semaphores
    conn.setnx("lock:foo", time.time() + 3600)
    val = conn.decr("sem:foo")
    if val >= 0:
        ...
    else:
        conn.incr("sem:foo")

  31. Queues
    conn.lpush("myqueue", "workitem")
    todo = conn.rpop("myqueue")  # pop from the opposite end for FIFO order
    (or publish/subscribe)

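  32. Priority Queues
    # redis-py 3.0+ takes a {member: score} mapping
    conn.zadd("myqueue", {"handle-meltdown": 1})
    conn.zadd("myqueue", {"feed-cats": 5})
    # Lowest score first; the end index is inclusive, so 0, 0 is one item
    todo = conn.zrange("myqueue", 0, 0)[0]
    conn.zrem("myqueue", todo)
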
  33. Lock-free linked lists!
    new_id = "bgrdsd"
    old_end = conn.getset(":end", new_id)
    conn.set("%s:next" % old_end, new_id)

  34. Performance-wise, the fewer checks and integrity guarantees, the faster it goes.

  35. Maturity can sometimes be an issue, but new features can appear rapidly.

  36. You can also use databases for the wrong thing - it often only matters "at scale"

  37. But how does this all relate to Python?

  38. Most databases - even new ones - have good Python bindings

  39. Postgres: psycopg2; Redis: redis-py; MongoDB: pymongo (and more - neo4j, VCSen, relational, etc.)

  40. Some databases have Python available inside (Postgres has it as an option, PL/Python)

  41. Document databases map really well to Python dicts

  42. You may find non-relational databases a nicer way to store state - for any app

  43. Remember, you might still need transactions/reliability. (Business logic is probably better off on mature systems for now)

  44. Overall? Just keep all the options in mind. Don't get caught by trends, and don't abuse your relational store.

  45. Thanks. Andrew Godwin @andrewgodwin http://aeracode.org
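
Slides 24 and 25 (JOINs, and denormalising to speed up reads at the cost of writes) can be sketched with the standard-library `sqlite3` module, since SQLite is one of the relational databases on slide 8. The tables `users`, `posts` and `posts_denorm` and their contents are invented for this illustration; this is a minimal sketch of the trade-off, not code from the talk.

```python
import sqlite3

# In-memory database so the sketch needs no server (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title TEXT NOT NULL
    );
    -- Denormalised variant: the author's name is copied onto every post.
    CREATE TABLE posts_denorm (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        username TEXT NOT NULL,
        title TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'andrew');
    INSERT INTO posts VALUES (1, 1, 'Hello'), (2, 1, 'Again');
    INSERT INTO posts_denorm VALUES (1, 1, 'andrew', 'Hello'), (2, 1, 'andrew', 'Again');
""")

# Normalised read: the username lives in one place, so the read needs a JOIN.
joined = conn.execute(
    "SELECT posts.title, users.username FROM posts "
    "JOIN users ON users.id = posts.user_id ORDER BY posts.id"
).fetchall()

# Denormalised read: one table, no JOIN.
flat = conn.execute(
    "SELECT title, username FROM posts_denorm ORDER BY id"
).fetchall()

# The write side pays for it: a rename touches one row when normalised,
# but every copy of the name when denormalised.
with conn:
    normal_rows = conn.execute(
        "UPDATE users SET username = ? WHERE id = ?", ("andrewg", 1)
    ).rowcount
    denorm_rows = conn.execute(
        "UPDATE posts_denorm SET username = ? WHERE user_id = ?", ("andrewg", 1)
    ).rowcount

print(joined == flat, normal_rows, denorm_rows)
```

Both reads return the same rows; the difference only shows up in how much work each read and each write has to do as the tables grow.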
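
Slide 28's point, that SQL can do some atomic operations itself, can also be tried with `sqlite3`. The `counters` table and the `increment` helper are hypothetical names made up for this sketch; the only part taken from the slide is the `SET bar = bar + 1` pattern.

```python
import sqlite3

# In-memory database so the sketch needs no server (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.execute("INSERT INTO counters (name, value) VALUES (?, ?)", ("num_users", 0))
conn.commit()


def increment(conn, name, amount=1):
    """Add `amount` to a counter in one UPDATE, then read the value back."""
    with conn:  # commits on success, rolls back on error
        # The read-modify-write happens inside the database in a single
        # statement, so the application never holds a stale copy of the value.
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE name = ?",
            (amount, name),
        )
    row = conn.execute(
        "SELECT value FROM counters WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


increment(conn, "num_users", 2)
total = increment(conn, "num_users")
print(total)
```

This mirrors `conn.incrby("num_users", 2)` from slide 27: in both cases the arithmetic is done by the store rather than by fetching, adding in Python, and writing back.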
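
Slide 41's claim that document databases map really well to Python dicts can be illustrated without a server: a document is essentially a nested dict that round-trips through JSON (MongoDB itself stores a binary variant, BSON). The `user` document and the `matches` helper below are hypothetical, and `matches` only mimics the simplest equality filter of a call like `coll.find({"username": "andrew"})`; it is not how MongoDB evaluates queries.

```python
import json

# A "document" is just a nested dict: no schema or table definition needed.
user = {
    "username": "andrew",
    "uid": 1000,
    "languages": ["python", "sql"],
    "settings": {"theme": "dark"},
}

# Round-trip through JSON to show that the structure survives serialisation.
encoded = json.dumps(user, sort_keys=True)
decoded = json.loads(encoded)


def matches(document, query):
    """Return True if every key in `query` equals the document's value."""
    return all(document.get(key) == value for key, value in query.items())


# Documents in one collection do not have to share the same fields.
users = [decoded, {"username": "martin", "uid": 1001}]
found = [doc for doc in users if matches(doc, {"username": "andrew"})]
print(found[0]["settings"]["theme"])
```

Because the query is itself a dict, building filters in Python needs no string templating, which is part of why the bindings on slide 39 feel natural to use.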