
The Django Meta API, A journey of optimization and design

Daniel Pyrathon

November 15, 2014

Transcript

  1. The Meta API: A journey of optimization and design
    Lessons learned from a SoC project
    Daniel Pyrathon @pirosb3
  2. Who am I

  3. Ex Bachelor student (Just Graduated!) In a very informal way

    ..a closer look
  4. Currently working at Pathflow Yes, we do use some Django..

  5. Google Summer of Code: Program details
    • A global program that offers students stipends to write code for open source projects
    • One of Google’s ways to give back to open source
    • A great occasion for you to give back to open source!
    • May 19 to August 18
    • Nearly 200 open source organisations
    • Each organisation has 1 or more proposed projects (you can also propose others)
    • 5,000 USD for the project
  6. My Summer of Code Formalizing the Meta object task

  7. What is the Meta API: A very simple definition
    • An internal API, hidden under the _meta object within each model
    • Allows Django to introspect a model’s internals
    • Makes a lot of Django’s model magic possible
    • … so basically you don’t know about it, but you may have used it
  8. What is the Meta API: What is inside the Meta object
    Provides metadata about the model:
    • model name
    • app name
    • abstract?
    • proxy?
    • database table name
    • table permissions
    • primary key
  9. What is the Meta API: What is inside the Meta object
    Provides metadata and references to fields and relations in a model:
    • Field names
    • Field instances
    • Model relations
    • Field attributes
    >>> User._meta.fields
    (<django.db.models.fields.AutoField: id>,
     <django.db.models.fields.CharField: password>,
     <django.db.models.fields.DateTimeField: last_login>,
     ... ... )
  10. What is the Meta API: Which apps use the Meta API
    • Admin
    • Migrations
    • ModelForms
    • Other developers?
  11. What is the Meta API: Which apps use the Meta API
    Developers have always used it, even though it’s not officially supported:
    • django-nonrel
    • django-taggit
    • django-rest-framework
    • ..and many more
  12. What is the Meta API: There was a big need for a public API
  14. YESTERDAY (What is currently in master, and where I started from)
  15. Complexity: Previous Meta API entry-points (14)
    def many_to_many(self):
    def get_m2m_with_model(self):
    def get_field(self, name, many_to_many=True):
    def get_field_by_name(self, name):
    def get_all_field_names(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):
    def fields(self):
  16. Complexity: Distinction between Fields and Related Objects
    Example models: Brand and Item.
    1. Item has a ForeignKey to Brand
    2. Brand has a RelatedObject to Item as a consequence of the relation
    Fields: any field defined on the model, with or without a relation.
    Related Objects: objects created as a consequence of a relation from another model to the current model.
  17. Complexity: Previous Meta API entry-points
    • 10 entry-points
    • 4 cached properties
    • 6 separate caching systems for the API
    • Distinction between 4 different types of fields: fields, m2m, related_objects, related_m2m
    • Never been tested
  18. Complexity: Difference between properties and concepts
    All model fields:
    • fields: excludes fields that are M2M
    • many_to_many: only fields that are M2M
    All model related objects (RO):
    • related_objects: excludes RO coming from M2M relations
    • related_m2m: only RO coming from M2M relations
    The decision to split these concepts into multiple properties is only an implementation detail.
  19. Complexity: Previous Meta API entry-points (14)
    def many_to_many(self):
    def get_m2m_with_model(self):
    def get_field(self, name, many_to_many=True):
    def get_field_by_name(self, name):
    def get_all_field_names(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):
  20. TODAY (What is currently in my fork’s Pull Request, not yet in Django)
  21. The new Meta API: Philosophy
    • An official API that everyone can use without risk of breakage.
    • A fast API that Django’s internals can also use.
    • An intuitive API, simple to use and documented.
  22. Complexity: New Meta API entry-points (7)
    def get_fields(self, forward, reverse, ..):
    def get_field(self, field_name):
    def field_names(self):
    def fields(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def related_objects(self):
  23. The new Meta API: 3 intuitive return types
    field_names:
    >>> User._meta.field_names
    set(['name', 'email', ..])
    cached properties:
    >>> User._meta.fields
    (<django.db.models.fields.AutoField: id>, ..,)
    get_field():
    >>> Person._meta.get_field('name')
    <django.db.models.fields.CharField: name>
  24. Complexity: New Meta API entry-points
    • 2 entry-points
    • 5 cached properties
    • Only 1 cache layer
    • Distinction between related objects and fields
    • 46 test cases
  25. The new Meta API: A single generator function: get_fields()
    Diagram: get_fields() feeds fields, many_to_many, related_objects, field_names and get_field().
    Every cached property depends on get_fields():
    • Consistency
    • Maintainability
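    A minimal sketch of the idea on this slide, in plain Python rather than Django. The `Options` class, the `(name, kind)` pairs and the `get_fields_calls` counter are hypothetical stand-ins invented for illustration; only the property names come from the slide. It shows several cached properties all deriving from one `get_fields()` call.

```python
from functools import cached_property


class Options:
    """Toy stand-in for a model's _meta object (not Django's real class)."""

    def __init__(self, field_specs):
        # field_specs: iterable of (name, kind) pairs. The "kind" labels
        # ("data", "m2m", "reverse") are assumptions made for this sketch.
        self._field_specs = tuple(field_specs)
        self.get_fields_calls = 0

    def get_fields(self):
        """Single source of truth: every property below derives from it."""
        self.get_fields_calls += 1
        return self._field_specs

    @cached_property
    def _all_fields(self):
        return self.get_fields()

    @cached_property
    def fields(self):
        return tuple(n for n, kind in self._all_fields if kind == "data")

    @cached_property
    def many_to_many(self):
        return tuple(n for n, kind in self._all_fields if kind == "m2m")

    @cached_property
    def related_objects(self):
        return tuple(n for n, kind in self._all_fields if kind == "reverse")

    @cached_property
    def field_names(self):
        return frozenset(n for n, _ in self._all_fields)
```

    Because every property is a filter over the same result, a change to `get_fields()` reaches all of them at once, which is the consistency and maintainability argument made above.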
  26. The new Meta API: An overview of get_fields()
    Diagram: SuperModel, AbstractModel, Model; get_fields() in steps 1 to 4.
    get_fields() needs to take into consideration inheritance, model swapping, and proxy models.
    Calls to get_fields() are entirely recursive.
  27. The new Meta API: Caching layers and recursiveness
    Diagram: SuperModel, AbstractModel, Model, with a cache; get_fields() in steps 1 to 4.
    The cache is computed for each layer, recursively:
    • Less computation per layer
    • Duplicate data being set
    • Cache invalidation..
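    The recursion and per-layer caching can be sketched roughly as follows. This is an illustrative toy, not the code from the pull request: the `Options` class, its `parents` attribute and the `computations` counter are assumptions, and model swapping and proxy models are left out.

```python
class Options:
    """Toy metadata object; parents are other Options instances."""

    def __init__(self, name, local_fields, parents=()):
        self.name = name
        self.local_fields = tuple(local_fields)
        self.parents = tuple(parents)
        self._cache = None      # one cache per layer of the hierarchy
        self.computations = 0   # how often this layer really did the work

    def get_fields(self):
        if self._cache is None:
            self.computations += 1
            collected = []
            for parent in self.parents:
                # Recursive step: each parent answers from its own cache.
                collected.extend(parent.get_fields())
            collected.extend(self.local_fields)
            # Immutable result, so the same object can be handed out again.
            self._cache = tuple(collected)
        return self._cache

    def clear_cache(self):
        self._cache = None
```

    Each layer computes its own result only once, but a child's cache also contains its parents' fields, which is the "duplicate data being set" trade-off listed above.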
  28. The new Meta API: A single generator function
    Diagram: get_fields(), local_fields, Related Objects Graph.
  29. The new Meta API: Related objects graph
    • A graph of connections between models
    • Generates a map between models and connections
    • Efficient: computed once for everyone and cached
    • Still really expensive on first lookup: it visits every field in every model in every app
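    As a rough illustration of such a map, the sketch below inverts a small registry of forward relations. The `MODELS` registry, its contents and `build_relation_tree()` are hypothetical; the real implementation works on Django model classes and field instances.

```python
from collections import defaultdict

# Hypothetical registry: model name -> list of (field name, target model).
MODELS = {
    "Brand": [],
    "Item": [("brand", "Brand")],
    "Review": [("item", "Item"), ("brand", "Brand")],
}


def build_relation_tree(models):
    """Map each model to the (source model, field name) pairs pointing at it."""
    tree = defaultdict(list)
    # The expensive part: one pass over every field of every model.
    for model_name, fields in models.items():
        for field_name, target in fields:
            tree[target].append((model_name, field_name))
    # Freeze the lists so the shared result can be handed out safely.
    return {name: tuple(tree.get(name, ())) for name in models}
```

    The single pass explains both bullets: the first lookup pays for the whole walk, and every later lookup is a dictionary access.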
  30. The new Meta API: Cached property
    • Avoids function call overhead
    • Uses the instance’s internal __dict__
    • pip install cached-property (PyDanny)
    • Has its limitations
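    The pattern behind a cached property can be sketched from scratch as a non-data descriptor. This is an illustration of the technique, not the source of the cached-property package or of Django's helper; the `Options` class and its `calls` counter are hypothetical.

```python
class cached_property:
    """Non-data descriptor that caches its result in the instance __dict__."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # Shadow the descriptor: later lookups find the value in
        # instance.__dict__ and never reach __get__ again.
        instance.__dict__[self.name] = value
        return value


class Options:
    def __init__(self):
        self.calls = 0

    @cached_property
    def fields(self):
        self.calls += 1
        return ("id", "name")
```

    Two limitations follow directly from the mechanism: the instance needs a `__dict__`, and the only way to invalidate the value is to delete the attribute from it.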
  31. The Meta API
    If this method gets executed, it must be the first ever call to _relation_tree. All other calls return the attribute directly.
  32. The Meta API
    • Avoids instantiating multiple empty lists
    • Stores in __dict__
  33. The Meta API
    Diagram: Apps with aModel1, aModel2, aModel3 and aModel4, each with its own RT Cache.
    Lookup shown: aModel1._meta.relation_tree
  35. The Meta API
    Diagram: Apps with aModel1, aModel2, aModel3 and aModel4, each with its own RT Cache.
    Lookups shown: aModel4._meta.relation_tree, aModel1._meta.relation_tree
  36. Cache invalidation: Relation Tree cache
    Diagram: aModel1, aModel2, aModel3 and NewModel, each with an RT Cache.
    1. Apps.register_model()
    2. Apps.clear_cache()
  37. Cache invalidation
    Diagram: aModel1, aModel2, aModel3 and NewModel, each with an RT Cache; steps 1 to 3, starting with Apps.register_model() and Apps.clear_cache().
    Invalidation is expensive, but it happens only on bootup, and is the price we pay for fewer bugs.
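    The invalidation flow can be sketched as a registry that throws away its cached relation tree whenever a model is registered. Only the method names `register_model()` and `clear_cache()` come from the slides; the toy `Apps` and `Options` classes, the `relations` attribute and the `builds` counter are assumptions for illustration.

```python
class Options:
    """Toy model description: a name plus the names of models it points at."""

    def __init__(self, name, relations=()):
        self.name = name
        self.relations = tuple(relations)


class Apps:
    """Toy registry (not Django's Apps class)."""

    def __init__(self):
        self.models = {}
        self._relation_tree = None
        self.builds = 0  # how many times the tree was rebuilt

    def register_model(self, opts):
        self.models[opts.name] = opts
        # A new model may add relations to any existing model, so the
        # simplest correct move is to drop the whole cached tree.
        self.clear_cache()

    def clear_cache(self):
        self._relation_tree = None

    def relation_tree(self):
        if self._relation_tree is None:
            self.builds += 1
            tree = {name: [] for name in self.models}
            for opts in self.models.values():
                for target in opts.relations:
                    tree.setdefault(target, []).append(opts.name)
            self._relation_tree = {k: tuple(v) for k, v in tree.items()}
        return self._relation_tree
```

    Clearing everything is wasteful while models are still being registered, but it never serves a stale tree, which matches the trade-off stated on the slide.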
  38. The new Meta API: Cached properties in Django
    • fields
    • many_to_many
    • related_objects
    • field_names
    • related_objects_tree
  39. Immutability: Why immutability
    • Memory efficiency: the Meta API is at the core of Django, therefore it must provide excellent memory management. Immutable data structures allocate exactly the required space, without over-allocating.
    • Reusability: by returning a reference to an immutable data structure, we guarantee that the end-user cannot manipulate the array, and therefore we can safely return the same reference.
    • Less bug-prone: personal experience here! API consumers will often retain the array as their own and manipulate the contents.
  40. Immutability: Why immutability

  41. Immutability: Immutability in the Meta API
    What do we do in the Meta API?
    • All return types are immutable, no copies are returned
    • All return types are cached
    • When possible, we use data structures that derive from set and tuple
    How does this impact how Django consumes the API?
    • Iteration over multiple API calls is done using itertools.chain()
    • Use generators everywhere when filtering API results
  42. Immutability: Immutability in the Meta API
    • Use itertools.chain() when possible to avoid allocating a new list
    • Use generator expressions to map or filter API results
    Downsides of generator expressions: no indexing and no repeated iteration. Currently this is needed very little..
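    A small sketch of both techniques and of the stated downsides. The two tuples are made-up stand-ins for cached API results; nothing here touches Django.

```python
from itertools import chain

fields = ("id", "name", "email")          # stand-in for one cached tuple
many_to_many = ("groups", "permissions")  # stand-in for another cached tuple

# chain() walks both tuples lazily; no combined list is allocated.
all_names = chain(fields, many_to_many)

# A generator expression filters lazily as well.
short_names = (name for name in chain(fields, many_to_many) if len(name) <= 5)
```

    Both objects are single-use iterators: they cannot be indexed, and a second pass over them yields nothing. When either is needed, wrap the result in `tuple()` or `list()` at the call site.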
  43. Immutability: Immutability in the Meta API
    We do even more! The API consumer should never pay the price of immutable internals.
    You can always make a copy for your own use.
    And in case you forget, we kindly remind you with an AttributeError.
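    One way to get that behaviour is a tuple subclass whose list-style mutators raise AttributeError with a hint. The class below is a hypothetical sketch written for this transcript; its name, message and method list are assumptions, not the code from the pull request.

```python
class ImmutableList(tuple):
    """Tuple that explains itself when callers try list-style mutation."""

    warning = "This result is cached and shared; call list() to get your own copy."

    def _complain(self, *args, **kwargs):
        raise AttributeError(self.warning)

    # Every mutating list method points at the same complaint.
    append = extend = insert = pop = remove = sort = reverse = _complain
    __setitem__ = __delitem__ = __iadd__ = _complain


fields = ImmutableList(["id", "name"])
own_copy = list(fields)   # the consumer's private, mutable copy
own_copy.append("email")
```

    Reads still behave like a tuple, so the cached object can be returned to every caller without defensive copying.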
  44. The new Meta API: What we have up till now
    • Reduced complexity of the previous API
    • Tests added to previous and current API
    • Performance increased compared to previous API
    • 465 commits in my second PR, as of today!
    • A fully working, refactored API, for what we have today in Django
    • ..But, not what we may want to have tomorrow
    10% performance increase (DjangoBench, median of 1000 runs for each test)
  45. Complexity: New Meta API entry-points (7)
    def get_fields(self, forward, reverse, ..):
    def get_field(self, field_name):
    def field_names(self):
    def fields(self):
    def concrete_fields(self):
    def local_concrete_fields(self):
    def related_objects(self):
  46. TOMORROW (What will be in my Pull Request, and hopefully soon in Django)
  47. The future Meta API: An entirely different approach
    • A minimal get_fields() API
    • Field flags
  48. The future Meta API: Boolean flags
    field.editable       Field is editable
    field.concrete       Field has a respective db column
    field.is_relation    Field has a relation with another model
    field.one_to_many    Cardinality 1-N
    field.many_to_one    Cardinality N-1
    field.many_to_many   Cardinality N-N
    field.one_to_one     Cardinality 1-1
  49. The future Meta API: Data flags
    field.name            Queryable name
    field.hidden          The field is used for another field’s functionality (e.g. GenericForeignKey)
    field.model           The model that contains the field
    field.referred_model  The model that a field points to (when the field has a relation)
  50. The future Meta API: Querying with get_fields() and field flags
    # Fetch all relations that go from A to B
    FIELDS = (f for f in A._meta.get_fields()
              if not f.hidden and f.is_relation and f.referred_model == B)
    # Fetch all fields to show on a form for A (including FKs)
    FIELDS = (f for f in A._meta.get_fields()
              if not f.hidden and f.editable and (not f.is_relation or f.one_to_many))
    # Fetch all fields that have a connected db column
    FIELDS = (f for f in A._meta.get_fields() if f.concrete)
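    To make the flag-based queries concrete without a Django project, here is a self-contained sketch. The `Field` dataclass, the sample fields for model A and the module-level `get_fields()` are all hypothetical; only the flag names come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Toy field carrying a subset of the proposed flags."""
    name: str
    editable: bool = True
    concrete: bool = True
    hidden: bool = False
    is_relation: bool = False
    referred_model: Optional[str] = None


# Made-up fields for a model "A".
A_FIELDS = (
    Field("id", editable=False),
    Field("title"),
    Field("owner", is_relation=True, referred_model="B"),
    Field("tags", concrete=False, is_relation=True, referred_model="Tag"),
    Field("content_object", concrete=False, hidden=True, is_relation=True),
)


def get_fields():
    """Stand-in for A._meta.get_fields(): returns everything, unfiltered."""
    return A_FIELDS


# All relations that go from A to B.
relations_to_b = [f.name for f in get_fields()
                  if not f.hidden and f.is_relation and f.referred_model == "B"]

# All fields that have a connected db column.
with_column = [f.name for f in get_fields() if f.concrete]
```

    The API itself stays tiny; each caller composes the flags it cares about, so new kinds of fields do not require new entry-points.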
  51. Notable past iterations (All major decisions between yesterday and today)

  52. Naming things: Major changes from yesterday to today
    1. Moving to a centralized entry-point: get_fields()
    2. Moving from flags to bit-fields
    3. Making the API even more sparse
    4. Going all the way down to 2 flags
  53. Naming things: Spotting the pattern in the old API
    def _fill_fields_cache(self):
    def get_fields_with_model(self):
    def get_concrete_fields_with_model(self):
    def _fill_m2m_cache(self):
    def get_m2m_with_model(self):
    def _fill_related_objects_cache(self):
    def get_all_related_objects(self, local_only=False, include_hidden=False,
    def get_all_related_objects_with_model(self, local_only=False, ..)
    def _fill_related_many_to_many_cache(self):
    def get_all_related_many_to_many_objects(self, local_only=False):
    def get_all_related_m2m_objects_with_model(self):
  54. Naming things: Compacting into a single get_fields() API
    def get_fields(fields, m2m, related_objects, related_m2m, with_models):
    • Less redundancy
    • A refactored version of the past API, nothing more
    • Some entry-points have unique flags, so generalizing can be very hard
  55. Naming things: Using bit-fields
    def get_fields(types=RELATED_OBJECTS, opts=INCLUDE_HIDDEN | INCLUDE_PROXY):
    • A flexible API
    • Requires imports
    • Entirely anti-pythonic
    • Causes problems with circular imports
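    The trade-off can be seen by putting the two calling styles side by side. This is a hedged sketch: the `Include` flag class and both toy functions are hypothetical, and they only report which options were requested.

```python
from enum import IntFlag


class Include(IntFlag):
    """Bit-field options; callers must import this class to use them."""
    HIDDEN = 1
    PROXY = 2


def get_fields_bits(opts=Include(0)):
    # Bit-field style: options are OR-ed together by the caller.
    return {
        "hidden": bool(opts & Include.HIDDEN),
        "proxy": bool(opts & Include.PROXY),
    }


def get_fields_kwargs(include_hidden=False, include_proxy=False):
    # Keyword style: nothing to import, and the call site is self-describing.
    return {"hidden": include_hidden, "proxy": include_proxy}
```

    A call such as `get_fields_bits(Include.HIDDEN | Include.PROXY)` needs the constants in scope, which is where the import and circular-import concerns come from; `get_fields_kwargs(include_hidden=True, include_proxy=True)` does not.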
  56. Naming things: Making the API even more sparse
    def get_fields(pure_data, pure_m2m, pure_virtual,
                   forward_data, forward_m2m, forward_virtual,
                   related_data, related_m2m, related_virtual,
                   include_hidden, include_proxy, include_concrete):
    • Flags are far better: more pythonic and fewer imports
    • This matrix can describe exactly what we have now
    • This matrix may not describe what we want in the future
    • Field types and options are too sparse to be API parameters.
  57. Naming things: Moving to only 2 main field distinctions
    def get_fields(forward, reverse, include_hidden, include_parents)
    • Only separates the main 2 distinction points
    • The rest of the filtering is done outside the API
    • Far simpler and easier to maintain
    • We are not there yet, as this distinction may not exactly be what we want in the future (future ForeignKeys, future Virtual Fields)
  58. Naming things: Conclusion
    An open source project is nothing without its community. Please give me feedback on the Google Group or on IRC. If you are coming to the sprints and have some ideas, or just want to have a chat, please do so!
  59. Community: A huge thanks
    Without these people, the project would not have gone so far: Russell Keith-Magee, Collin Anderson, Tim Graham, Loic Bistuer, Anssi, and many more.
    Without these people, I wouldn’t be speaking here: Marc Tamlyn, Dutch Django association, Ola Sitarska, and many more!
    Daniel Pyrathon @pirosb3
  60. FIN heb je nog vragen? (questions?) Daniel Pyrathon @pirosb3