Slide 1

Slide 1 text

Relational / Non-relational Databases and Python
Andrew Godwin

Slide 2

Slide 2 text

Introduction
- Python for 5 years
- Django core developer
- Data modelling / visualisation

Slide 3

Slide 3 text

"Andrew speaks English like a machine gun speaks bullets." (Reinout van Rees)

Slide 4

Slide 4 text

If I speak too fast - tell me!

Slide 5

Slide 5 text

What is a relational database?

Slide 6

Slide 6 text

A relational database is a “collection of relations” - tables whose rows all share the same set of columns.

Slide 7

Slide 7 text

It's what a lot of people are used to.

Slide 8

Slide 8 text

Relational Databases
- PostgreSQL
- MySQL
- SQLite

Slide 9

Slide 9 text

Let's pick PostgreSQL (it's a good choice)

Slide 10

Slide 10 text

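Usage
import psycopg2

conn = psycopg2.connect(host="localhost", user="postgres")
cursor = conn.cursor()
# Pass the value as a parameter: in SQL, "andrew" in double quotes is a column name, not a string
cursor.execute("SELECT * FROM users WHERE username = %s;", ("andrew",))
for row in cursor.fetchall():
    print(row)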
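psycopg2 follows Python's DB-API 2.0 (PEP 249), so the same connect / cursor / execute / fetch pattern also works with the standard-library sqlite3 module. A minimal sketch that runs without a PostgreSQL server, assuming an in-memory SQLite database and a made-up users table; note that sqlite3 uses `?` placeholders where psycopg2 uses `%s`.

```python
import sqlite3

def find_users(conn, username):
    """Return all rows from users matching username, using a bound parameter."""
    cursor = conn.cursor()
    # The driver handles quoting, so the SQL string is never built by hand
    cursor.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cursor.fetchall()

# Hypothetical example data in a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("andrew",), ("martin",)])

for row in find_users(conn, "andrew"):
    print(row)  # (1, 'andrew')
```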

Slide 11

Slide 11 text

You've probably seen all that before.

Slide 12

Slide 12 text

Now, to introduce some non-relational databases

Slide 13

Slide 13 text

Document Databases
- MongoDB
- CouchDB

Slide 14

Slide 14 text

Key-Value Stores
- Redis
- Cassandra

Slide 15

Slide 15 text

Message Queues
- AMQP
- Celery

Slide 16

Slide 16 text

Various Others
- Graph databases
- Filesystems
- VCSs

Slide 17

Slide 17 text

Redis and MongoDB are two good examples here

Slide 18

Slide 18 text

Redis: Key-value store with strings, lists, sets, channels and atomic operations.

Slide 19

Slide 19 text

Redis Example
import redis

conn = redis.Redis(host="localhost")
print(conn.get("top_value"))
conn.set("last_user", "andrew")
conn.incr("num_runs")  # redis-py spells this incr(), not inc()
conn.sadd("users", "andrew")
conn.sadd("users", "martin")
for item in conn.smembers("users"):
    print(item)

Slide 20

Slide 20 text

MongoDB: Document store with indexing and a wide range of query filters.

Slide 21

Slide 21 text

MongoDB Example
import pymongo

# MongoClient replaced pymongo.Connection, and insert_one() replaced insert()
conn = pymongo.MongoClient("localhost")
db = conn["mongo_example"]
coll = db["users"]
coll.insert_one({"username": "andrew", "uid": 1000})
for entry in coll.find({"username": "andrew"}):
    print(entry)

Slide 22

Slide 22 text

These all solve different problems - you can't easily replace one with the other.

Slide 23

Slide 23 text

"When all you have is a hammer, everything looks like a nail." (Abraham Maslow, paraphrased)

Slide 24

Slide 24 text

JOIN - your best friend, and your worst enemy.
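Both halves of that slide in miniature: with normalised tables each username is stored once (the "best friend" part), but every read that needs it pays for a JOIN, which can get expensive as tables grow (the "worst enemy" part). A minimal sketch using sqlite3 with a hypothetical users/posts schema and made-up rows.

```python
import sqlite3

# Hypothetical normalised schema: each username is stored exactly once, in users
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        title TEXT
    );
    INSERT INTO users (id, username) VALUES (1, 'andrew'), (2, 'martin');
    INSERT INTO posts (user_id, title) VALUES (1, 'Hello'), (1, 'Queues'), (2, 'Cats');
""")

# The JOIN puts each post back together with its author's name at read time
rows = conn.execute("""
    SELECT posts.title, users.username
    FROM posts JOIN users ON users.id = posts.user_id
    ORDER BY posts.id
""").fetchall()
print(rows)  # [('Hello', 'andrew'), ('Queues', 'andrew'), ('Cats', 'martin')]
```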

Slide 25

Slide 25 text

Denormalising your data speeds up reads, and slows down writes.
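To make the trade-off concrete, a minimal sketch with sqlite3 and a hypothetical posts table that copies the author's name into every row: reads need no JOIN, but renaming one user now rewrites every one of their posts.

```python
import sqlite3

# Hypothetical denormalised schema: the author's name is duplicated in each post
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_name TEXT, title TEXT);
    INSERT INTO posts (author_name, title)
        VALUES ('andrew', 'Hello'), ('andrew', 'Queues'), ('martin', 'Cats');
""")

# Fast read: the name is already in the row, no JOIN required
titles = conn.execute(
    "SELECT title, author_name FROM posts WHERE author_name = ?", ("andrew",)
).fetchall()

# Slower write: one logical change (a rename) touches every copy
cur = conn.execute(
    "UPDATE posts SET author_name = ? WHERE author_name = ?", ("andy", "andrew")
)
conn.commit()
print(cur.rowcount)  # 2
```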

Slide 26

Slide 26 text

Schemaless != Denormalised
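A schemaless store only means documents aren't forced to share the same fields; whether data gets duplicated is a separate design choice. A minimal sketch with plain Python dicts standing in for documents (the collections and fields are hypothetical): the documents vary in shape, yet posts refer to their author by id instead of copying the name, so the data stays normalised.

```python
# Two hypothetical "collections" of schemaless documents
users = {
    1: {"_id": 1, "username": "andrew"},
    2: {"_id": 2, "username": "martin", "email": "martin@example.com"},  # extra field
}
posts = [
    {"title": "Hello", "author_id": 1, "tags": ["intro"]},
    {"title": "Cats", "author_id": 2},  # no tags field at all
]

def author_of(post):
    """Follow the reference by hand, the way an application-side join would."""
    return users[post["author_id"]]["username"]

# A rename is still one write, because the name lives in one place
users[1]["username"] = "andy"
print([author_of(p) for p in posts])  # ['andy', 'martin']
```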

Slide 27

Slide 27 text

Atomic operations are nice.
conn.incrby("num_users", 2)

Slide 28

Slide 28 text

But SQL can do some of them.
UPDATE foo SET bar = bar + 1 WHERE baz;
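The point of `bar = bar + 1` is that the database reads and writes the value in one statement, rather than the client running a SELECT, adding one and sending an UPDATE (where two clients can both read the same old value and one increment is lost). A minimal sketch of the incrby idea with sqlite3, assuming a hypothetical counters table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('num_users', 0)")
conn.commit()

def incrby(conn, name, amount=1):
    """Add amount in a single UPDATE, so the read-add-write happens inside the database."""
    with conn:  # commits the UPDATE, or rolls it back on error
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE name = ?", (amount, name)
        )

def get(conn, name):
    return conn.execute("SELECT value FROM counters WHERE name = ?", (name,)).fetchone()[0]

incrby(conn, "num_users", 2)
incrby(conn, "num_users")
print(get(conn, "num_users"))  # 3
```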

Slide 29

Slide 29 text

Redis, the data structure server: SETNX, GETSET, EXPIRE and friends
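As a rough guide to what those commands do, here is a tiny in-memory stand-in. `ToyStore` is hypothetical, not redis-py, and stores arbitrary Python objects rather than byte strings; it only mimics the semantics: SETNX sets a key only if it is absent, GETSET stores a new value and returns the old one, and EXPIRE makes a key disappear after a number of seconds. The clock is injectable so expiry can be shown without sleeping.

```python
import threading
import time

class ToyStore:
    """Hypothetical single-process stand-in for a few Redis commands (not redis-py)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._expires = {}             # key -> deadline, measured on self._clock
        self._clock = clock
        self._lock = threading.Lock()  # each command runs atomically, as in Redis

    def _purge(self, key):
        """Drop key if its deadline has passed (caller holds the lock)."""
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            del self._expires[key]

    def get(self, key):
        with self._lock:
            self._purge(key)
            return self._data.get(key)

    def setnx(self, key, value):
        """Set key only if it does not exist; return True if we set it."""
        with self._lock:
            self._purge(key)
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def getset(self, key, value):
        """Store value and return the previous one (None if there was none)."""
        with self._lock:
            self._purge(key)
            old = self._data.get(key)
            self._data[key] = value
            self._expires.pop(key, None)  # this toy drops any TTL on overwrite, like SET
            return old

    def expire(self, key, seconds):
        """Delete key after `seconds`; return False if the key does not exist."""
        with self._lock:
            self._purge(key)
            if key not in self._data:
                return False
            self._expires[key] = self._clock() + seconds
            return True

now = [0.0]
store = ToyStore(clock=lambda: now[0])
print(store.setnx("lock:foo", "me"))   # True
print(store.setnx("lock:foo", "you"))  # False, someone already holds it
store.expire("lock:foo", 30)
now[0] = 31.0
print(store.get("lock:foo"))           # None, the lock expired
print(store.getset("counter", 5))      # None
print(store.getset("counter", 6))      # 5
```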

Slide 30

Slide 30 text

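Locks / Semaphores
import time

# Lock: SETNX only succeeds if nobody holds "lock:foo"; the value records when it goes stale
if conn.setnx("lock:foo", time.time() + 3600):
    ...  # we hold the lock
# Semaphore: "sem:foo" counts the free slots
val = conn.decr("sem:foo")
if val >= 0:
    ...  # got a slot
else:
    conn.incr("sem:foo")  # no slot free, so undo our decrement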
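The semaphore half of that slide works because DECR is atomic: whoever drives the counter below zero knows there was no free slot and puts it back. A minimal single-process sketch of that logic, with a hypothetical `AtomicCounter` (a lock-protected int) standing in for the Redis key "sem:foo"; against a real server the equivalent calls would be `conn.decr` and `conn.incr`.

```python
import threading

class AtomicCounter:
    """Hypothetical stand-in for a Redis integer key: decr/incr are atomic."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def decr(self):
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self):
        with self._lock:
            self._value += 1
            return self._value

def try_acquire(sem):
    """Take a slot if one is free; undo the decrement otherwise."""
    if sem.decr() >= 0:
        return True
    sem.incr()  # went negative: no slot, give it back
    return False

def release(sem):
    sem.incr()

sem = AtomicCounter(2)  # two slots
print([try_acquire(sem) for _ in range(3)])  # [True, True, False]
release(sem)
print(try_acquire(sem))  # True
```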

Slide 31

Slide 31 text

Queues
conn.lpush("myqueue", "workitem")
todo = conn.rpop("myqueue")  # pop from the other end for FIFO order; lpop would give LIFO
(or publish/subscribe)
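LPUSH adds to the head of a Redis list, so popping with RPOP takes from the tail and gives first-in, first-out order, while LPOP takes the newest item and behaves like a stack. A minimal in-process sketch of those semantics using `collections.deque`; the function names are hypothetical and only mirror the Redis commands.

```python
from collections import deque

queue = deque()

def lpush(q, item):
    q.appendleft(item)  # like LPUSH: insert at the head

def rpop(q):
    return q.pop() if q else None  # like RPOP: take from the tail, None when empty

def lpop(q):
    return q.popleft() if q else None  # like LPOP: take from the head

for job in ["first", "second", "third"]:
    lpush(queue, job)

print(rpop(queue))  # first  (FIFO: oldest item)
print(lpop(queue))  # third  (LIFO: newest item)
```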

Slide 32

Slide 32 text

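Priority Queues
conn.zadd("myqueue", {"handle-meltdown": 1})  # redis-py 3+ takes a {member: score} mapping
conn.zadd("myqueue", {"feed-cats": 5})
todo = conn.zrange("myqueue", 0, 0)  # the end index is inclusive: just the lowest score
if todo:
    conn.zrem("myqueue", todo[0])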
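A sorted set keeps members ordered by score, so `zrange(..., 0, 0)` always returns the lowest-scored (most urgent) item. Python's `heapq` gives the same "smallest score first" behaviour inside one process; a minimal sketch reusing the slide's example items (unlike a sorted set, a heap happily holds duplicate members).

```python
import heapq

todo_heap = []
heapq.heappush(todo_heap, (5, "feed-cats"))
heapq.heappush(todo_heap, (1, "handle-meltdown"))

# Pop in priority order: lowest score first, like ZRANGE 0 0 followed by ZREM
order = [heapq.heappop(todo_heap)[1] for _ in range(len(todo_heap))]
print(order)  # ['handle-meltdown', 'feed-cats']
```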

Slide 33

Slide 33 text

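Lock-free linked lists!
new_id = "bgrdsd"
# GETSET swaps in the new tail and returns the old one atomically, so two appenders
# never get the same predecessor (assumes decode_responses=True, so values are str)
old_end = conn.getset(":end", new_id)
if old_end is not None:  # None means the list was empty
    conn.set("%s:next" % old_end, new_id)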
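To see why GETSET is enough, here is a minimal single-process sketch with many threads appending at once. `ToyKV` is a hypothetical lock-protected dict standing in for Redis (not redis-py), and the ":head" key is an assumption, since the slide doesn't show how the start of the list is found. Once every appender has finished, the chain from the head reaches every node; while appends are still in flight, a reader may briefly find a ":next" link that hasn't been written yet.

```python
import threading

class ToyKV:
    """Hypothetical stand-in for Redis GET/SET/GETSET; each call is atomic."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def getset(self, key, value):
        with self._lock:
            old = self._data.get(key)
            self._data[key] = value
            return old

def append(store, node_id):
    """Append node_id; GETSET hands each appender a unique predecessor."""
    old_end = store.getset(":end", node_id)
    if old_end is None:
        store.set(":head", node_id)  # first node ever becomes the head
    else:
        store.set("%s:next" % old_end, node_id)

def walk(store):
    """Follow the :next links from the head and return node ids in order."""
    node, out = store.get(":head"), []
    while node is not None:
        out.append(node)
        node = store.get("%s:next" % node)
    return out

store = ToyKV()
threads = [threading.Thread(target=append, args=(store, "node%d" % i)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(walk(store)))  # 50
```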

Slide 34

Slide 34 text

Performance-wise, the fewer checks and integrity guarantees, the faster it goes.

Slide 35

Slide 35 text

Maturity can sometimes be an issue, but new features can appear rapidly.

Slide 36

Slide 36 text

You can also use databases for the wrong thing - it often only matters "at scale"

Slide 37

Slide 37 text

But how does this all relate to Python?

Slide 38

Slide 38 text

Most databases - even new ones - have good Python bindings

Slide 39

Slide 39 text

Postgres: psycopg2
Redis: redis-py
MongoDB: pymongo
(and more - neo4j, VCSen, relational, etc.)

Slide 40

Slide 40 text

Some databases let you run Python inside them (PostgreSQL offers it as an option, via the PL/Python procedural language)
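A small stand-in that runs without a server: the standard-library sqlite3 module can register a Python function and call it from SQL via `Connection.create_function`. This is a different mechanism from PL/Python (the function runs in your own Python process, not inside a database server), and the `slugify` function and table here are hypothetical.

```python
import re
import sqlite3

def slugify(text):
    """Lower-case text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

conn = sqlite3.connect(":memory:")
conn.create_function("slugify", 1, slugify)  # SQL name, number of args, Python callable
conn.execute("CREATE TABLE posts (title TEXT)")
conn.execute("INSERT INTO posts VALUES ('Relational / Non-relational Databases')")

print(conn.execute("SELECT slugify(title) FROM posts").fetchone()[0])
# relational-non-relational-databases
```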

Slide 41

Slide 41 text

Document databases map really well to Python dicts
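A document is essentially a nested dict, and drivers such as pymongo accept and return dicts directly. To show the correspondence without a server, a minimal sketch: a JSON round trip keeps the nesting intact, and a hypothetical `find` helper implements the simplest kind of query filter (top-level equality on every given field), which is only a small subset of MongoDB's query language.

```python
import json

users = [
    {"username": "andrew", "uid": 1000, "langs": ["python", "sql"]},
    {"username": "martin", "uid": 1001},
]

def find(collection, query):
    """Hypothetical helper: yield documents whose fields equal every value in query."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            yield doc

# Dicts, lists, strings and numbers survive serialisation unchanged
print(json.loads(json.dumps(users)) == users)  # True

print(list(find(users, {"username": "andrew"})))
# [{'username': 'andrew', 'uid': 1000, 'langs': ['python', 'sql']}]
```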

Slide 42

Slide 42 text

You may find non-relational databases a nicer way to store state - for any app

Slide 43

Slide 43 text

Remember, you might still need transactions/reliability. (Business logic is probably better off on mature systems for now)
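What transactions buy you: a group of writes either all happen or none do. A minimal sketch with sqlite3 and a hypothetical accounts table, relying on the connection's context manager, which commits on success and rolls back if an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; refuse to overdraw."""
    with conn:  # COMMIT on success, ROLLBACK if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")  # also undoes the credit above
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )

transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 500)
except ValueError:
    pass
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 30)]
```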

Slide 44

Slide 44 text

Overall? Just keep all the options in mind. Don't get caught by trends, and don't abuse your relational store

Slide 45

Slide 45 text

Thanks.
Andrew Godwin
@andrewgodwin
http://aeracode.org