Slide 1

Slide 1 text

The Meta API A journey of optimization and design Lessons learned from a SoC project Daniel Pyrathon @pirosb3

Slide 2

Slide 2 text

Who am I

Slide 3

Slide 3 text

Ex Bachelor student (Just Graduated!) In a very informal way ..a closer look

Slide 4

Slide 4 text

Currently working at Pathflow Yes, we do use some Django..

Slide 5

Slide 5 text

Google Summer of Code ● A global program that offers students stipends to write code for open source projects ● One of Google’s ways to give back to Open Source ● A great occasion for you to give back to open source! ● May 19 to August 18 ● Nearly 200 Open Source organisations ● Each organisation has 1 or more proposed projects (you can also propose others) ● 5,000 USD for the project Program details

Slide 6

Slide 6 text

My Summer of Code Formalizing the Meta object task

Slide 7

Slide 7 text

● An internal API, hidden under the _meta object within each model ● Allows Django to introspect a model’s internals ● Makes a lot of Django’s model magic possible ● … so basically you don’t know about it, but you may have used it What is the Meta API A very simple definition

Slide 8

Slide 8 text

Provides metadata about the model: ● model name ● app name ● abstract? ● proxy? ● database table name ● table permissions ● primary key What is inside the Meta object What is the Meta API

Slide 9

Slide 9 text

Provides metadata and references to fields and relations in a model ● Field names ● Field instances ● Model relations ● Field attributes What is inside the Meta object What is the Meta API >>> User._meta.fields (, , , ... ... )

Slide 10

Slide 10 text

Admin Migrations ModelForms Other developers? What is the Meta API Which apps use the Meta API

Slide 11

Slide 11 text

Developers have always used it, even though it’s not officially supported What is the Meta API Which apps use the Meta API django-nonrel django-taggit django-rest-framework ..and many more

Slide 12

Slide 12 text

What is the Meta API There was a big need for a public API

Slide 13

Slide 13 text

What is the Meta API There was a big need for a public API

Slide 14

Slide 14 text

YESTERDAY (What is currently in master, and where I started from)

Slide 15

Slide 15 text

Complexity: Previous Meta API entry-points (14)
def many_to_many(self):
def get_m2m_with_model(self):
def get_field(self, name, many_to_many=True):
def get_field_by_name(self, name):
def get_all_field_names(self):
def get_all_related_objects(self, local_only=False, include_hidden=False,
def get_all_related_objects_with_model(self, local_only=False, ..)
def get_all_related_many_to_many_objects(self, local_only=False):
def get_all_related_m2m_objects_with_model(self):
def concrete_fields(self):
def local_concrete_fields(self):
def get_fields_with_model(self):
def get_concrete_fields_with_model(self):
def fields(self):

Slide 16

Slide 16 text

Distinction between Fields and Related Objects Complexity Brand Item Item has a ForeignKey to Brand 1 Brand has a RelatedObject to Item as a consequence of the relation 2 Related Objects Related objects are objects created as a consequence of a relation of another model with the current model. Fields Any field defined on the model with or without a relation.
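The Brand/Item distinction above can be sketched in plain Python. This is a toy model, not Django code: `ToyModel`, `ForeignKey`, `RelatedObject`, and `add_field` are hypothetical stand-ins that only show how declaring a field on one model creates a related object on the other.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not Django's Model."""

    def __init__(self, name):
        self.name = name
        self.fields = []            # fields declared on this model
        self.related_objects = []   # created by relations from other models


class ForeignKey:
    def __init__(self, name, to):
        self.name = name
        self.to = to
        self.model = None  # set when the field is attached to a model


class RelatedObject:
    def __init__(self, field):
        self.field = field        # the forward field that caused this object
        self.model = field.model  # the model the relation comes from


def add_field(model, field):
    """Attach a field; a relation also registers a RelatedObject on its target."""
    field.model = model
    model.fields.append(field)
    if isinstance(field, ForeignKey):
        field.to.related_objects.append(RelatedObject(field))


brand = ToyModel("Brand")
item = ToyModel("Item")
add_field(item, ForeignKey("brand", to=brand))  # step 1: Item gets a field

print([f.name for f in item.fields])                  # ['brand']
print([r.model.name for r in brand.related_objects])  # ['Item'] (step 2)
```

Brand never declares anything itself; its related object exists only because Item points at it.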

Slide 17

Slide 17 text

● 10 entry-points ● 4 cached properties ● 6 Separate caching systems for API ● Distinction between 4 different types of fields: fields, m2m, related_objects, related_m2m ● Never been tested Complexity Previous Meta API entry-points

Slide 18

Slide 18 text

Complexity Difference between properties and concepts All Model Related Objects related_objects related_m2m Excludes RO coming from M2M relations Only RO coming from M2M relations All Model fields fields many_to_many Excludes fields that are M2M Only fields that are M2M The decision to split these concepts into multiple properties is only an implementation detail

Slide 19

Slide 19 text

Complexity: Previous Meta API entry-points (14)
def many_to_many(self):
def get_m2m_with_model(self):
def get_field(self, name, many_to_many=True):
def get_field_by_name(self, name):
def get_all_field_names(self):
def get_all_related_objects(self, local_only=False, include_hidden=False,
def get_all_related_objects_with_model(self, local_only=False, ..)
def get_all_related_many_to_many_objects(self, local_only=False):
def get_all_related_m2m_objects_with_model(self):
def concrete_fields(self):
def local_concrete_fields(self):
def get_fields_with_model(self):
def get_concrete_fields_with_model(self):

Slide 20

Slide 20 text

TODAY (What is currently in my fork’s Pull Request, not yet in Django)

Slide 21

Slide 21 text

● An official API, that everyone can use without risk of breakage. ● A fast API, that also Django’s internals can use. ● An intuitive API, simple to use and documented. The new Meta API Philosophy

Slide 22

Slide 22 text

Complexity: New Meta API entry-points (7)
def get_fields(self, forward, reverse, ..):
def get_field(self, field_name):
def field_names(self):
def fields(self):
def concrete_fields(self):
def local_concrete_fields(self):
def related_objects(self):

Slide 23

Slide 23 text

The new Meta API: 3 intuitive return types
field_names:
>>> User._meta.field_names
set(['name', 'email', ..])
cached properties:
>>> User._meta.fields
(, ..,)
get_field():
>>> Person._meta.get_field('name')

Slide 24

Slide 24 text

● 2 Entry-points ● 5 Cached properties ● Only 1 cache layer ● Distinction between related objects and fields ● 46 Test Cases Complexity New Meta API entry-points

Slide 25

Slide 25 text

get_fields() fields many_to_many related_objects field_names get_field() Every cached property depends on get_fields() ● Consistency ● Maintainability The new Meta API A single generator function: get_fields()
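A minimal sketch of that dependency, assuming a toy `ToyOptions` class in place of the real `_meta` object. `ToyField`, its `reverse` attribute, and the exact `get_fields()` signature are illustrative assumptions, not the code in the pull request.

```python
from functools import cached_property


class ToyField:
    def __init__(self, name, many_to_many=False, reverse=False):
        self.name = name
        self.many_to_many = many_to_many
        self.reverse = reverse  # True for a related object


class ToyOptions:
    """Hypothetical stand-in for a model's _meta object."""

    def __init__(self, all_fields):
        self._all_fields = tuple(all_fields)

    def get_fields(self, forward=True, reverse=False):
        # The single place where field selection happens.
        return tuple(
            f for f in self._all_fields
            if (forward and not f.reverse) or (reverse and f.reverse)
        )

    # Every cached property below is a thin filter over get_fields().
    @cached_property
    def fields(self):
        return tuple(f for f in self.get_fields() if not f.many_to_many)

    @cached_property
    def many_to_many(self):
        return tuple(f for f in self.get_fields() if f.many_to_many)

    @cached_property
    def related_objects(self):
        return self.get_fields(forward=False, reverse=True)

    @cached_property
    def field_names(self):
        return frozenset(
            f.name for f in self.get_fields(forward=True, reverse=True)
        )


opts = ToyOptions([
    ToyField("name"),
    ToyField("tags", many_to_many=True),
    ToyField("orders", reverse=True),
])
print([f.name for f in opts.fields])           # ['name']
print([f.name for f in opts.related_objects])  # ['orders']
```

Because selection logic lives in one function, a change to how fields are collected reaches every property at once.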

Slide 26

Slide 26 text

SuperModel AbstractModel Model get_fields() 1 2 3 4 get_fields() needs to take into consideration inheritance, model swapping, and proxy models. Calls to get_fields() are entirely recursive The new Meta API An overview of get_fields()
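The recursion can be sketched as follows. This toy version assumes a simple chain Model → AbstractModel → SuperModel and ignores model swapping and proxy models; `ToyOptions`, `parents`, and `include_parents` as used here are hypothetical.

```python
class ToyOptions:
    """Hypothetical stand-in for _meta; tracks only local fields and parents."""

    def __init__(self, name, local_fields, parents=()):
        self.name = name
        self.local_fields = tuple(local_fields)
        self.parents = tuple(parents)

    def get_fields(self, include_parents=True):
        fields = []
        if include_parents:
            for parent in self.parents:
                # Each parent resolves its own parents in turn.
                fields.extend(parent.get_fields())
        fields.extend(self.local_fields)
        return tuple(fields)


super_model = ToyOptions("SuperModel", ["id"])
abstract_model = ToyOptions("AbstractModel", ["created"], parents=[super_model])
model = ToyOptions("Model", ["title"], parents=[abstract_model])

print(model.get_fields())                       # ('id', 'created', 'title')
print(model.get_fields(include_parents=False))  # ('title',)
```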

Slide 27

Slide 27 text

SuperModel AbstractModel Model get_fields() 1 2 3 4 The new Meta API Caching layers and recursiveness Caching is computed for each layer recursively. ● Less computation per layer ● Duplicate data being set ● Cache invalidation.. Cache

Slide 28

Slide 28 text

get_fields() local_fields Related Objects Graph The new Meta API A single generator function

Slide 29

Slide 29 text

Related Objects Graph ● A graph of connections between models ● Generates a map between models and connections ● Efficient, computed once for everyone and cached ● Still really expensive on first lookup: For every model in every field in every app The new Meta API Related objects graph
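A rough sketch of such a graph, assuming models are described by plain dictionaries. `build_relation_graph` and the input format are hypothetical; the point is the nested loop over every field of every model, which is why the first lookup is expensive.

```python
from collections import defaultdict


def build_relation_graph(models):
    """Map each model name to the (source model, field name) pairs pointing at it.

    `models` maps a model name to {field name: target model name or None}.
    """
    graph = defaultdict(list)
    for model_name, fields in models.items():      # every model ...
        for field_name, target in fields.items():  # ... and every field on it
            if target is not None:
                graph[target].append((model_name, field_name))
    return dict(graph)


models = {
    "Brand": {"name": None},
    "Item": {"title": None, "brand": "Brand"},
    "Review": {"item": "Item", "text": None},
}
graph = build_relation_graph(models)  # computed once, then shared
print(graph["Brand"])  # [('Item', 'brand')]
print(graph["Item"])   # [('Review', 'item')]
```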

Slide 30

Slide 30 text

The new Meta API: Cached property ● Avoids function call overhead ● Uses internal __dict__ ● pip install cached-property (PyDanny) ● Has its limitations

Slide 31

Slide 31 text

The Meta API If this method gets executed, it must be the first ever call to _relation_tree. All other calls return the attribute directly
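The behaviour described above follows from how a non-data descriptor interacts with the instance `__dict__`. The sketch below is a generic cached property written for illustration, not the code from the slide; `ToyOptions` and the `relation_tree` body are placeholders.

```python
class cached_property:
    """Non-data descriptor that caches its result in the instance __dict__."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Reached only while the instance __dict__ has no entry for this name.
        value = self.func(instance)
        instance.__dict__[self.name] = value
        return value


class ToyOptions:
    calls = 0

    @cached_property
    def relation_tree(self):
        ToyOptions.calls += 1
        return ("computed once",)


opts = ToyOptions()
opts.relation_tree   # runs the method and stores the result
opts.relation_tree   # plain attribute lookup, no function call
print(ToyOptions.calls)  # 1

del opts.__dict__["relation_tree"]  # invalidation is a dictionary delete
opts.relation_tree
print(ToyOptions.calls)  # 2
```

Since the descriptor defines no `__set__`, the value stored in `__dict__` shadows it on every later access.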

Slide 32

Slide 32 text

The Meta API ● Avoids instantiating multiple empty lists ● Stores in __dict__

Slide 33

Slide 33 text

The Meta API aModel1 aModel2 aModel3 aModel4 Apps RT Cache RT Cache RT Cache RT Cache aModel1._meta.relation_tree

Slide 34

Slide 34 text

The Meta API aModel1 aModel2 aModel3 aModel4 Apps RT Cache RT Cache RT Cache RT Cache aModel1._meta.relation_tree

Slide 35

Slide 35 text

The Meta API aModel1 aModel2 aModel3 aModel4 Apps RT Cache RT Cache RT Cache RT Cache aModel4._meta.relation_tree aModel1._meta.relation_tree

Slide 36

Slide 36 text

Apps.register_model() Apps.clear_cache() aModel1 RT Cache aModel2 RT Cache aModel3 RT Cache NewModel RT Cache 1 2 Cache invalidation Relation Tree cache

Slide 37

Slide 37 text

Apps.register_model() Apps.clear_cache() aModel1 RT Cache aModel2 RT Cache aModel3 RT Cache NewModel RT Cache 1 2 3 Invalidation is expensive, but it happens only on bootup, and is the price we pay for fewer bugs. Cache invalidation
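The flow in these diagrams can be sketched with a toy registry. `ToyApps`, `ToyOptions`, and `populate_relation_trees` are hypothetical names, and the "relation tree" here is just the names of the other models, a placeholder for the real graph.

```python
class ToyOptions:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry

    @property
    def relation_tree(self):
        if "_relation_tree" not in self.__dict__:
            # First lookup: compute for every registered model at once.
            self.registry.populate_relation_trees()
        return self.__dict__["_relation_tree"]


class ToyApps:
    def __init__(self):
        self.models = []
        self.populate_count = 0

    def register_model(self, name):
        opts = ToyOptions(name, self)
        self.models.append(opts)
        self.clear_cache()  # a new model can change every relation tree
        return opts

    def clear_cache(self):
        for opts in self.models:
            opts.__dict__.pop("_relation_tree", None)

    def populate_relation_trees(self):
        self.populate_count += 1
        names = tuple(m.name for m in self.models)
        for opts in self.models:
            opts.__dict__["_relation_tree"] = tuple(
                n for n in names if n != opts.name
            )


apps = ToyApps()
model1 = apps.register_model("aModel1")
model2 = apps.register_model("aModel2")
print(model1.relation_tree)   # ('aModel2',)
print(model2.relation_tree)   # ('aModel1',) served from the cache
apps.register_model("NewModel")
print(model1.relation_tree)   # ('aModel2', 'NewModel') after invalidation
print(apps.populate_count)    # 2
```

Clearing everything on registration is blunt, but registration only happens while the application boots.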

Slide 38

Slide 38 text

fields many_to_many related_objects field_names related_objects_tree The new Meta API Cached properties in Django

Slide 39

Slide 39 text

Immutability: Why immutability
● Memory efficiency: the Meta API is at the core of Django, so it must manage memory well. Immutable data structures allocate exactly the required space, without over-allocating.
● Reusability: because we return a reference to an immutable data structure, the end user cannot manipulate the array, so we can safely return the same reference every time.
● Less bug-prone: personal experience here! API consumers will often treat the returned array as their own and manipulate its contents.

Slide 40

Slide 40 text

Why immutability Immutability

Slide 41

Slide 41 text

What do we do in the Meta API? ● All return types are immutable, no copies are returned ● All return types are cached ● When possible, we use data structures that derive from set and tuple Immutability in the Meta API Immutability How does this impact how Django consumes the API? ● Iteration over multiple API calls is done using itertools.chain() ● Use generators everywhere, when filtering API results

Slide 42

Slide 42 text

Immutability: Immutability in the Meta API ● Use itertools.chain() when possible to avoid allocating a new list ● Use generator expressions to map or filter API results. Drawbacks of generator expressions: no indexing and no repeated iteration. Currently this is rarely a problem..
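Both techniques, and the single-iteration drawback, in a small standalone example. The tuples are stand-ins for cached API results; no Django code is involved.

```python
from itertools import chain

fields = ("id", "name", "email")   # stand-ins for cached, immutable results
many_to_many = ("groups", "tags")

# Iterate over both results without building a combined list.
all_names = chain(fields, many_to_many)

# Filter lazily with a generator expression.
short_names = (name for name in all_names if len(name) <= 4)

print(list(short_names))  # ['id', 'name', 'tags']
print(list(short_names))  # [] because the generator is already exhausted
```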

Slide 43

Slide 43 text

We do even more! The API consumer should never pay the price of immutable internals. You can always make a copy for your own use. And in case you forget, we kindly remind you with an AttributeError Immutability Immutability in the Meta API
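One way to get that kind of reminder is a tuple subclass whose list-style methods raise a descriptive `AttributeError`. `ReadOnlyList` below is a hypothetical illustration of the idea, not the class used in the pull request.

```python
class ReadOnlyList(tuple):
    """Tuple subclass that explains itself when list-style mutation is attempted."""

    def _complain(self, *args, **kwargs):
        raise AttributeError(
            "This result is immutable; make a copy with list(...) to modify it."
        )

    append = extend = insert = pop = remove = sort = reverse = _complain


fields = ReadOnlyList(["id", "name"])

try:
    fields.append("email")
except AttributeError as exc:
    print(exc)

own_copy = list(fields)   # the consumer's private, mutable copy
own_copy.append("email")
print(fields)    # ('id', 'name')
print(own_copy)  # ['id', 'name', 'email']
```

A plain tuple would also fail on `append`, but the custom message tells the consumer what to do instead.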

Slide 44

Slide 44 text

● Reduced complexity of the previous API ● Tests added to previous and current API ● Performance increased compared to previous API ● 465 commits in my second PR, as of today! ● A fully working, refactored API, for what we have today in Django ● ..But, not what we may want to have tomorrow The new Meta API What we have up till now 10% Performance increase DjangoBench, median of 1000 runs for each test

Slide 45

Slide 45 text

Complexity: New Meta API entry-points (7)
def get_fields(self, forward, reverse, ..):
def get_field(self, field_name):
def field_names(self):
def fields(self):
def concrete_fields(self):
def local_concrete_fields(self):
def related_objects(self):

Slide 46

Slide 46 text

TOMORROW (What will be in my Pull Request, and hopefully soon in Django)

Slide 47

Slide 47 text

A Minimal get_fields() API Field flags The future Meta API An entirely different approach

Slide 48

Slide 48 text

field.editable Field is editable field.concrete Field has a respective db column field.is_relation Field has relation with another model field.one_to_many Cardinality 1-N field.many_to_one Cardinality N-1 field.many_to_many Cardinality N-N field.one_to_one Cardinality 1-1 The future Meta API Boolean flags

Slide 49

Slide 49 text

field.name Queryable name field.hidden The field is used for another field’s functionality (ex. GenericForeignKey) field.model The model that contains the field field.referred_model The model that a field points to (in the case the field has a relation) The future Meta API Data flags

Slide 50

Slide 50 text

The future Meta API: Querying with get_fields() and field flags
# Fetch all relations that go from A to B
FIELDS = (f for f in A._meta.get_fields() if not f.hidden and f.is_relation and f.referred_model == B)
# Fetch all fields to show on a form for A (including FKs)
FIELDS = (f for f in A._meta.get_fields() if not f.hidden and f.editable and (not f.is_relation or f.one_to_many))
# Fetch all fields that have a connected db column
FIELDS = (f for f in A._meta.get_fields() if f.concrete)
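The same style of query can be tried without Django. In this sketch `ToyField` carries a few of the flags from the previous slides, and the module-level `get_fields()` is a hypothetical stand-in for `A._meta.get_fields()`; the field values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyField:
    name: str
    editable: bool = True
    concrete: bool = True
    hidden: bool = False
    is_relation: bool = False
    referred_model: Optional[str] = None


def get_fields():
    """Stand-in for A._meta.get_fields(): returns every field, unfiltered."""
    return (
        ToyField("id", editable=False),
        ToyField("title"),
        ToyField("b", is_relation=True, referred_model="B"),
        ToyField("reviews", editable=False, concrete=False,
                 is_relation=True, referred_model="Review"),
    )


# All filtering happens outside the API, driven by the flags.
relations_to_b = [f.name for f in get_fields()
                  if not f.hidden and f.is_relation and f.referred_model == "B"]
concrete = [f.name for f in get_fields() if f.concrete]

print(relations_to_b)  # ['b']
print(concrete)        # ['id', 'title', 'b']
```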

Slide 51

Slide 51 text

Notable past iterations (All major decisions between yesterday and today)

Slide 52

Slide 52 text

Naming things Major changes from yesterday to today 1. Moving to a centralized entry-point: get_fields() 2. Moving from flags to bit-fields 3. Making the API even more sparse 4. Going all the way down to 2 flags

Slide 53

Slide 53 text

Naming things: Spotting the pattern in the old API
def _fill_fields_cache(self):
def get_fields_with_model(self):
def get_concrete_fields_with_model(self):
def _fill_m2m_cache(self):
def get_m2m_with_model(self):
def _fill_related_objects_cache(self):
def get_all_related_objects(self, local_only=False, include_hidden=False,
def get_all_related_objects_with_model(self, local_only=False, ..)
def _fill_related_many_to_many_cache(self):
def get_all_related_many_to_many_objects(self, local_only=False):
def get_all_related_m2m_objects_with_model(self):

Slide 54

Slide 54 text

def get_fields(fields, m2m, related_objects, related_m2m, with_models): Naming things Compacting into a single get_fields() API ● Less redundancy ● A refactored version of the past API, nothing more ● Some entry-points have unique flags, so generalizing can be very hard

Slide 55

Slide 55 text

def get_fields(types=RELATED_OBJECTS, opts=INCLUDE_HIDDEN | INCLUDE_PROXY): Naming things Using bit-fields ● A flexible API ● Requires imports ● Entirely anti-pythonic ● Causes problems with circular imports
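For readers unfamiliar with bit-fields, this is roughly what that iteration looked like to a caller. The sketch uses `enum.IntFlag`; the `Types` and `Opts` classes, their values, and the return format are assumptions made for illustration, with only `RELATED_OBJECTS`, `INCLUDE_HIDDEN`, and `INCLUDE_PROXY` taken from the slide.

```python
from enum import IntFlag


class Types(IntFlag):
    DATA = 1
    M2M = 2
    RELATED_OBJECTS = 4


class Opts(IntFlag):
    INCLUDE_HIDDEN = 1
    INCLUDE_PROXY = 2


def get_fields(types=Types.DATA, opts=Opts(0)):
    """Decode the request; a real implementation would select fields here."""
    selected = [t.name for t in Types if types & t]
    options = [o.name for o in Opts if opts & o]
    return selected, options


# Flags are combined with bitwise OR, so callers must import them first.
print(get_fields(types=Types.RELATED_OBJECTS,
                 opts=Opts.INCLUDE_HIDDEN | Opts.INCLUDE_PROXY))
# (['RELATED_OBJECTS'], ['INCLUDE_HIDDEN', 'INCLUDE_PROXY'])
```

The call site shows the drawbacks listed above: every constant has to be imported, and the signature says little about what is being asked for.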

Slide 56

Slide 56 text

def get_fields(pure_data, pure_m2m, pure_virtual, forward_data, forward_m2m, forward_virtual, related_data, related_m2m, related_virtual, include_hidden, include_proxy, include_concrete): Naming things Making the API even more sparse ● Flags are far better: more pythonic and fewer imports ● This matrix can describe exactly what we have now ● This matrix may not describe what we want in the future ● Field types and options are too sparse to be API parameters.

Slide 57

Slide 57 text

def get_fields(forward, reverse, include_hidden, include_parents) Naming things Moving to only 2 main field distinctions ● Only separates the main 2 distinction points ● The rest of the filtering is done outside the API ● Far simpler and easier to maintain ● We are not there yet, as this distinction may not exactly be what we want in the future (future ForeignKeys, future Virtual Fields)

Slide 58

Slide 58 text

An open source project is nothing without its community. Please give me feedback on the Google Group or on IRC. If you are coming to the sprints and have some ideas or want to have a chat, please do! Naming things Conclusion

Slide 59

Slide 59 text

Without these people, the project would not have gone so far: Russell Keith-Magee, Collin Anderson, Tim Graham, Loïc Bistuer, Anssi, and many more. Without these people, I wouldn’t be speaking here: Marc Tamlyn, Dutch Django association, Ola Sitarska, and many more! Community Daniel Pyrathon @pirosb3 Naming things A huge thanks

Slide 60

Slide 60 text

FIN heb je nog vragen? (questions?) Daniel Pyrathon @pirosb3