Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Developing good ORMs is HARD! by Nickolas Grigo...

Pycon ZA
October 12, 2018

Developing good ORMs is HARD! by Nickolas Grigoriadis

As with many people, I was looking for an ORM for asyncio Python.

Whilst `asyncio` is a great framework for I/O bound applications, there isn't any mature, recommendable ORMs for it.
Many attempts to wrap an existing sync Python ORM (such as peewee or sqlAlchemy) by having them run in a separate thread, and then dealing with synchronising between the event loop and threads got abandoned, due to a myriad of problems, including performance, correctness and blocking.
Other ORMs were abandoned before they worked, or had so many layers of abstraction that I feared to touch it.

Then I came across Tortoise ORM
It had a simple design. (Inspired by the Django ORM syntax)
It actually worked when I tried it out.

So I decided to jump in, and help with development.

In this talk I'll talk about my experience of being on a development team on an ORM.
There is a reason there are so few successful ORM projects out there.

Developing good ORMs are HARD

Pycon ZA

October 12, 2018
Tweet

More Decks by Pycon ZA

Other Decks in Programming

Transcript

  1. Developing good ORMs is HARD! or, I got involved in

    working on an ORM Nickolas Grigoriadis Github: grigi
  2. 2 Why did I get involved? • I really like

    asyncio • I realised I depended on ORMs • I wanted to use BOTH at the same time Inspired by an issue on the promising Vibora framework, discussing good asyncio ORMs. No clear answer, I decided to search...
  3. 3 What is asyncio? asyncio is a concurrency framework: async

    def do_it(some, thing): query = compute_query(some, thing) # Here we call another async function, ceding control of event loop result = await do_the_query(query) result.tag = 'I did it' return result Ideal for clearly understandable code. Ideal for integration
  4. 4 What is an ORM? Define a model: class Person(Model):

    name ⇒ string happy ⇒ boolean Now you can create a record: Person(name='One', happy=True).save() And fetch a record: Person.get(name='One') >>> <class Person(name='One', happy=True)>
  5. 5 Why did I get involved? (resumed) I found many

    attempts that all failed for some reason: • Wrapped an existing, async-hostile ORM, these projects all appear abandoned. • Only supports a single database (Defeats the purpose of an ORM, thinks I) • Did not work when I tried it.
  6. 6 Why did I get involved? Then I found Tortoise

    ORM... It had a simple, but clear, design. (Inspired by Django ORM) It actually worked when I tried it. https://tortoise.github.io/
  7. 7 How did my involvement start? Wow, this looks nice

    Hi I have been looking for a sane async python orm, and this is the first one that I found that ticks my boxes ito design. Clearly this is a work in progress, but as it is it actually is workable for simpler projects Could you please share your goals of this project, and how can I help?
  8. 8 Early steps Working with Andrey Bondar @Zeliboba5 is awesome.

    He has a really good understanding of SQL and databases in detail. Since I have no real experience doing something like this I decided to go "tactical"... I mean, the little housekeeping things, and quick wins.
  9. 9 Early steps My normal first steps: • Static analysis:

    Catch simple errors, limits bikeshedding. • Set up testing Oh dear, these samples need a whole DB set up! • CI Travis
  10. 10 Early steps Not so normal steps: • Documentation •

    Provide test doubles Modelled after Django, does setUp and tearDown of a DB. • Retroactively write tests for code I don’t know. • Dogfooding This actually caught me out.
  11. 11 Testing was a particular issue There is examples, which

    was used as manual tests. • The first tests was to take the examples and turn them into "fat" tests. One always need fat tests, especially for re-factoring • We redid transactions until we could isolate tests. • Wrote lots of unit tests, and learned the code. Over 91% test coverage now. Why so low you ask? We'll get there...
  12. 12 Priorities As I was learning the project I would

    have questions. • How do I use a thing? • What is correct behaviour? • What makes a “good” ORM? • In case of trade-offs, how do we choose the right one?
  13. 13 Priorities We decided on a simple set of priorities:

    • It should minimally dictate project structure It should be easy to “jam” into an existing project. • Correctness > Ease-of-Use > Performance > Maintenance • We need a community To ensure longevity of the project. Easy decision to make, but does give us a lot of constraints.
  14. 14 So, what is a “good” ORM? At a high

    level, I think it would be: • Reduce complexity of applications • Never surprise • Have all the features one needs • And none of the ones that get in the way • Performant • Cleanly abstract away many different databases
  15. 15 So, what is a “good” ORM? This expands into

    something more concrete: • Reliable • Succinct syntax • Consistent • Enough Field types • Secure • Prevents side-effects • Performant • Tailoring • Data migrations • Easy (de)serialisation
  16. 16 Databases • They are amazing • And unique snowflakes

    Syntax • We are hiding a lot of complexity behind a (hopefully) simple interface. Async adds new complexities • How do I ensure that two different async coroutines on the same event loop don't share transaction state. Why is developing a good ORM hard?
  17. 17 • SQL Injection • Data mangling e.g. MySQL will

    silently truncate a string if it is too big, or replace a too large int with maxint. • SQLite is super relaxed about datatypes e.g. Decimal is stored as a float so mangles by default, Date/time returns as strings, booleans as int. Means we have to handle conversions to minimise mangling. Why is developing a good ORM hard?
  18. 18 • SELECT ... WHERE field=null Does not do what

    you expect, never returns any results. • DELETE FROM ... LIMIT 100 It’s a thing... It also doesn't work. • MySQL doesn't have case-sensitive text operations. • Transactions... ACID did us all a great big favour. Why is developing a good ORM hard?
  19. 19 Each database has a different way of getting the

    primary key of the item you just inserted: • SQLite: SELECT last_insert_rowid() • MySQL: cursor.lastrowid • PostgreSQL: INSERT ... RETURNS id And many more... Why is developing a good ORM hard?
  20. 20 Succinct syntax ✔ Consistent ✔ Enough Field types ✔

    Secure ? Reliable ? Prevents side-effects ~ Handles several workarounds already, should use Hypothesis for greater confidence. Performant ~ Have done some, lots more to do Tailoring ~ We try to limit dictating structure Data migrations ❌ Easy (de)serialisation ❌ How does Tortoise ORM compare?
  21. 21 Tortoise ORM features Tortoise ORM is an ORM (Object-Relational-Mapper)

    written for use with CPython >= 3.5 and asyncio. Designed to be used without enforcing convention. It uses a variant of the Active Record pattern, and is heavily inspired by Django ORM in syntax. It doesn’t keep the Python objects in sync with the Database, but expects explicit .save() methods.
  22. 22 Tortoise ORM features It supports: • SQLite, MySQL &

    PostgreSQL • Provides testing doubles • Supports most common Field types • Comprehensive Query API • PyLint plugin
  23. 23 Tortoise ORM syntax from tortoise.models import Model from tortoise

    import fields class Tournament(Model): id = fields.IntField(pk=True) name = fields.TextField()
  24. 24 Tortoise ORM syntax from tortoise import Tortoise, run_async async

    def init(): await Tortoise.init( db_url='sqlite://db.sqlite3', Modules={'models': ['app.models']}) # Generate the schema → Once only please await Tortoise.generate_schemas()
  25. 25 Tortoise ORM syntax # Create instance by save tournament

    = Tournament(name='New Tournament') await tournament.save() # Or by .create() await Tournament.create(name='Another Tournament') # Now search for a record tour = await Tournament.filter(name__contains='Another').first() print(tour.name) >>> Another Tournament
  26. 27 Performance (SQLite tested, relative) Django peewee Pony ORM SQLAlchemy

    ORM SQLObject Tortoise ORM At first, performance was not ideal... Raw Insert Atomic Insert Filter Get
  27. 28 Performance (SQLite tested, relative) Django peewee Pony ORM SQLAlchemy

    ORM SQLObject Tortoise ORM But easy to improve significantly Raw Insert Atomic Insert Filter Get
  28. 29 Performance A large gain is from simple synchronization fixes,

    as SQLite provides a blocking interface. Another large gain was in actually building SQL queries, we are using the excellent pypika library, and it was doing deepcopy() for every operation. Optimized Model instantiation, now at the point where we will have to do code generation techniques to get any significant performance increases. Still it is ~ of the runtime in the filtering benchmark. ⅔ of the
  29. 30 Tortoise ORM roadmap We plan to release a v1.0

    release "soon"... Outstanding items for v1.0: • Fix connection pooling Need to rewrite →Need to • Schema regeneration refactor Halfway done →Need to • Performance Mostly done →Need to • Comprehensive tests Relation fields & side-effects →Need to • Finish Documentation Reference material →Need to • Cleaning up examples Doctests? →Need to
  30. 31 Tortoise ORM roadmap What does hitting v1.0 mean for

    us? • Minimise API breakage • Confidence that it should be usable in long lived projects.
  31. 32 Tortoise ORM roadmap, future • Aggregate functions • Subqueries

    • Serialization • Migrations • Actually finishing defining Py3 Types • Sentry plugin And whatever the community comes up with.
  32. 33 Next steps Build a community, Github has some nice

    guides at https://opensource.guide/ We have 3 code contributors, another 2 that converse, but over 100 stars. How do we get more people that sees this decide to use it, and then jump in to help? And fix up the missing parts of what makes a “good” ORM, of course ☺