A performant application, even with Doctrine!

Thomas Calvet

November 28, 2019

Transcript

  1. Good evening! I am Thomas Calvet. I am a Symfony enthusiast. I work at ekino. You can find me on GitHub and on the Symfony Devs Slack as fancyweb, on Twitter as @fancyweb_.
  2. Object Relational Mapping (ORM)
     ▪ A technique to convert data between incompatible type systems using object-oriented programming languages
     ▪ Doctrine ORM is an Object Relational Mapper
     ▪ Its goal is to simplify the translation between database rows and the PHP object model
  3. UnitOfWork (UoW)
     ▪ A single class of 3k+ lines of code
     ▪ Knows all the managed entities
     ▪ Responsible for tracking changes to those entities
     ▪ Responsible for writing out those changes in the right order
     The private heart of the ORM
  4. EntityManager (EM)
     ▪ A facade to the UoW
     ▪ And a facade to all the other ORM subsystems:
       ▫ Metadata
       ▫ Repositories
       ▫ Queries, etc.
     The central public access point of the ORM
  5. The identity map
     ▪ A big associative array in the UoW
     ▪ Class property ($identityMap)
     ▪ Stores a reference to every managed entity
     ▪ Ensures that a managed entity is loaded only once, into a single in-memory object
     The single source of truth
  6. In this example, only one SQL query is executed since the entity with the id 1 is already in the identity map on the second “find” call.
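The slide's code was not captured in this transcript. A minimal sketch of the scenario it describes, assuming a hypothetical `User` entity and an already configured entity manager (the function name is illustrative too):

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Looks up the same identifier twice.
 * `User` is an assumed entity class, not one taken from the talk.
 */
function findTwice(EntityManagerInterface $em): bool
{
    // First call: the entity is not in the identity map yet,
    // so the ORM executes a SELECT and hydrates the entity.
    $first = $em->find(User::class, 1);

    // Second call: the identity map already holds the entity with id 1,
    // so no SQL query is executed.
    $second = $em->find(User::class, 1);

    // Both variables point to the same in-memory object.
    return $first === $second;
}
```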
  7. In this example, two SQL queries are executed, but thanks to the identity map the same entity instance is returned by the repository.
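Again, the original code is missing from the transcript. A sketch of what such a case can look like, assuming a hypothetical `User` entity with an `email` field whose value belongs to the row with id 1:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Loads the same row through two different lookups.
 * `User`, its `email` field and the address are illustrative assumptions.
 */
function findThenQuery(EntityManagerInterface $em): bool
{
    // Query 1: loads the user with id 1 and registers it in the identity map.
    $byId = $em->find(User::class, 1);

    // Query 2: a lookup by another criterion cannot be answered from the
    // identity map, so a second SELECT is executed.
    $byEmail = $em->getRepository(User::class)
        ->findOneBy(['email' => 'jane@example.com']);

    // When both lookups match the same row, the ORM returns the instance
    // it already manages instead of hydrating a second object.
    return $byId === $byEmail;
}
```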
  8. The states
     ▪ A big associative array in the UoW
     ▪ Cached in a class property ($entityStates)
     ▪ Every entity has a state
     ▪ There are four possible states
     Crucial for optimisation
  9. An entity is “managed” when it is known by the identity map. It is added to this map in multiple cases: for example, when it is hydrated through the ORM or after being inserted in the database. 1 = UnitOfWork::STATE_MANAGED
  10. An entity is “new” when it is not known by the identity map and when it has no identifier or an unknown database identifier. 2 = UnitOfWork::STATE_NEW
  11. An entity is “detached” when it was removed from the identity map, or when it is not known by the identity map and has a known database identifier. “detached” is also the default assumed state. 3 = UnitOfWork::STATE_DETACHED
  12. An entity is “removed” when it has been scheduled for deletion from the database on the next UoW commit. 4 = UnitOfWork::STATE_REMOVED
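The four states can be inspected at runtime through the UnitOfWork. A minimal sketch: `getEntityState()` is the UnitOfWork's own method, while the two helper functions and their labels are hypothetical names introduced for illustration.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/** Maps the integer values listed in the slides to readable labels. */
function stateLabel(int $state): string
{
    switch ($state) {
        case 1:
            return 'managed';  // UnitOfWork::STATE_MANAGED
        case 2:
            return 'new';      // UnitOfWork::STATE_NEW
        case 3:
            return 'detached'; // UnitOfWork::STATE_DETACHED
        case 4:
            return 'removed';  // UnitOfWork::STATE_REMOVED
        default:
            throw new InvalidArgumentException("Unknown state $state");
    }
}

/** Asks the UnitOfWork for the current state of an entity. */
function describeState(EntityManagerInterface $em, object $entity): string
{
    return stateLabel($em->getUnitOfWork()->getEntityState($entity));
}
```

Calling `describeState()` on a freshly instantiated entity, then again after `persist()`, is a quick way to watch it move from “new” to “managed”.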
  13. The changesets
      ▪ A big associative array in the UoW
      ▪ Class property ($entityChangeSets)
      ▪ The differences between the last synchronized (from the database) states and the current states of the entities
      ▪ Computed at the beginning of the commit and cleared afterwards
  14. The pending changes
      ▪ Big associative arrays in the UoW
      ▪ Class properties ($entityInsertions, $collectionUpdates, etc.)
      ▪ All pending information about what to insert, update or delete (entities and collections)
      ▪ Computed at the beginning of the commit and cleared afterwards
  15. 16 big associative arrays in the UoW! The UoW contains highly performance-sensitive code.
  16. “Always keep the identity map and internals in mind. It will greatly improve the performance of your code!”
  17. Transactional write-behind
      ▪ A strategy used by the UoW
      ▪ Delays the execution of SQL queries
      ▪ The goal is to execute them in the most efficient way
      ▪ They are executed in the shortest possible transaction so that all write locks are released quickly
  18. Flushing is a heavy operation
      ▪ It computes all the changesets
      ▪ It dispatches all lifecycle events
      ▪ It generates all needed SQL queries
      ▪ It executes them in a transaction
  19. Do
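The code from this slide is not part of the transcript. One plausible reading, given the previous slide, is to flush once after a loop rather than inside it. A minimal sketch under that assumption (the function name is hypothetical):

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Persists a set of new entities with a single flush.
 *
 * @param object[] $entities new entities to insert
 */
function persistAll(EntityManagerInterface $em, array $entities): void
{
    foreach ($entities as $entity) {
        // persist() only registers the entity in the UnitOfWork; no SQL yet.
        $em->persist($entity);
    }

    // One flush for the whole set instead of one flush per iteration:
    // changesets, events and the transaction are handled once.
    $em->flush();
}
```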

  20. One heavy flush can be catastrophic
      ▪ The more changes there are on managed entities, the longer they take to be processed internally
      ▪ Sometimes, splitting a heavy task into x smaller ones is more efficient
      ▪ Overly large database transactions are bad for concurrency because they hold locks for too long
  21. Do

  22. Do

  23. Reduce the memory usage
      ▪ The UoW stores a large amount of information in all its class properties
      ▪ Some are cleared after each commit
      ▪ Some are never cleared automatically
      ▪ This leftover data can end up using a lot of memory
  24. Do
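The slide's code is not in the transcript. A common way to apply the advice from the slides on heavy flushes and memory usage is to flush and clear in batches. A sketch, assuming a hypothetical `$createEntity` callable that turns one input row into a new entity and an arbitrary batch size of 100:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Imports rows in batches, flushing and clearing after each batch.
 * The function name, $createEntity and the default batch size are
 * illustrative assumptions.
 */
function importInBatches(
    EntityManagerInterface $em,
    iterable $rows,
    callable $createEntity,
    int $batchSize = 100
): void {
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $count = 0;
    foreach ($rows as $row) {
        $em->persist($createEntity($row));

        if (++$count % $batchSize === 0) {
            $em->flush(); // one shorter transaction per batch
            $em->clear(); // detach everything so memory can be reclaimed
        }
    }

    // Write and release whatever is left from the last partial batch.
    $em->flush();
    $em->clear();
}
```

After `clear()`, entities loaded before the call are detached, so any reference still needed afterwards has to be fetched again.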

  25. What happens when the UoW is cleared?
      ▪ It is reset to its initial state
      ▪ All its “stack” class properties are set to an empty array, thus freeing a lot of memory
      ▪ Consequently, all managed entities become “detached”
  26. Don’t: ⚠ Clearing a single entity class is not recommended because it breaks in many scenarios; it won’t be available anymore in Doctrine ORM 3.
  27. Don’t: ⚠ Detaching and merging entities is not recommended; it won’t be available anymore in Doctrine ORM 3.
  28. Change tracking
      ▪ At some point, the ORM determines what changed on the entities it manages thanks to a tracking policy
      ▪ Each tracking policy has advantages and disadvantages
      ▪ Each tracking policy has a different impact on the overall performance
      ▪ There are three different tracking policies
  29. The Deferred Implicit tracking policy
      ▪ The ORM checks all managed entities for changes
      ▪ It checks all property values one by one
      ▪ It checks for new entities that are referenced by other managed entities
      ▪ It obviously takes longer as the UoW grows
      Automatic, but the worst performance
  30. The Deferred Explicit tracking policy
      ▪ Same behavior as the Deferred Implicit tracking policy
      ▪ Except that the ORM only checks entities that have been explicitly marked through a “persist” call
      ▪ Better for a large UoW
      Some manual work, but much better performance
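With annotation mapping, the policy is set per entity class. A minimal sketch, assuming a hypothetical `Product` entity:

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * Hypothetical entity used only for illustration.
 *
 * @ORM\Entity
 * @ORM\ChangeTrackingPolicy("DEFERRED_EXPLICIT")
 */
class Product
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /** @ORM\Column(type="string") */
    private $name = '';

    public function rename(string $name): void
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}
```

With this policy, calling `rename()` on a managed `Product` and then flushing writes nothing unless `persist()` is called on that product first.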
  31. The Notify tracking policy
      ▪ The entities notify interested listeners of every change to their properties
      ▪ You have full control over when you consider a property changed or not
      ▪ The best for a very large UoW
      A lot of manual work, but the best performance
  32. Every loaded entity has an impact
      ▪ An impact on both time and memory, because each one increases the overall number of managed entities
      ▪ Hydrating an entity is heavy
      ▪ Think about the total number of entities your method call could potentially load
      ▪ Don’t use the “findAll()” method
      ▪ Avoid filterless queries
  33. Use the “iterate” feature
      ▪ Avoids hydrating all the resulting entities in memory at once
      ▪ Hydrates the entities one by one instead
      ▪ Limited to queries that don’t need to fetch-join a collection-valued association
      ▪ ⚠ You still need to clear the EM regularly to free the memory
  34. Do

  35. Do
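The slides' code is not in the transcript. A sketch of iterating with periodic clearing, assuming a hypothetical `App\Entity\User` entity, a caller-supplied `$handle` callable and an arbitrary batch size:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Processes every user without hydrating them all at once.
 * The entity name, $handle and the batch size are illustrative assumptions.
 */
function processAllUsers(
    EntityManagerInterface $em,
    callable $handle,
    int $batchSize = 100
): void {
    $query = $em->createQuery('SELECT u FROM App\Entity\User u');

    $count = 0;
    // iterate() hydrates one row at a time; each row is an array whose
    // first element is the entity. (ORM 2.8+ also offers toIterable().)
    foreach ($query->iterate() as $row) {
        $handle($row[0]);

        if (++$count % $batchSize === 0) {
            $em->flush(); // write pending changes, if any
            $em->clear(); // detach processed entities to free memory
        }
    }

    $em->flush();
    $em->clear();
}
```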

  36. Watch the query count
      ▪ Entity associations are lazy loaded
      ▪ An SQL query is executed on demand, when you access the uninitialized collection for the first time
      ▪ When you iterate over an array of entities, the ORM needs to execute an SQL query for every collection you access
      ▪ Therefore, nested iterations multiply the number of queries at each level
  37. Do

  38. Do
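The slides' code is missing here as well. A usual remedy for the N+1 problem is a fetch join, which loads the parent entities and their collection in one query. A sketch, assuming hypothetical `App\Entity\Article` entities with a `comments` association:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Loads articles together with their comments in a single SQL query.
 * Entity and association names are illustrative assumptions.
 *
 * @return object[]
 */
function loadArticlesWithComments(EntityManagerInterface $em): array
{
    // Selecting both aliases ("a, c") turns the join into a fetch join:
    // the comments collections are hydrated up front, so iterating over
    // them later does not trigger one extra query per article.
    return $em->createQuery(
        'SELECT a, c FROM App\Entity\Article a LEFT JOIN a.comments c'
    )->getResult();
}
```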

  39. Don’t set up big inverse side associations
      ▪ In general, avoid non-essential associations
      ▪ Each association implies more work for the ORM, i.e. more time and memory consumed
      ▪ Not having “big” inverse side associations prevents you from using their getter
      ▪ Consequently, it prevents you from loading a large number of entities in memory at once and from running into the N+1 problem
  40. Metadata cache
      ▪ Removes the overhead of parsing the mapping files
      ▪ They don’t need to be parsed on each request because the same mapping files will always produce the same class metadata
      ▪ Your application should never be in production without a configured metadata cache
  41. Query cache
      ▪ Removes the DQL to SQL conversion overhead
      ▪ The same DQL query will always be converted to the same SQL (for the same platform)
      ▪ Your application should never be in production without a configured query cache
  42. Result cache
      ▪ Caches the results of a query
      ▪ Avoids querying the database again and rehydrating the resulting entities
      ▪ You can specify the time to live (TTL) per query
      ▪ Use it particularly on slow queries, even with a short TTL
  43. Do
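The slide's code is not in the transcript. A sketch of enabling the result cache on one query, assuming ORM 2.7 or later (where `enableResultCache()` is available; older versions use `useResultCache()`), a configured result cache, and a hypothetical `Article` entity with a `views` field:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Returns popular articles, served from the result cache when possible.
 * Entity, field, threshold, TTL and cache id are illustrative assumptions.
 *
 * @return object[]
 */
function findPopularArticles(EntityManagerInterface $em): array
{
    return $em->createQuery(
        'SELECT a FROM App\Entity\Article a WHERE a.views > :min'
    )
        ->setParameter('min', 1000)
        // Cache the result for 60 seconds under an explicit cache id.
        ->enableResultCache(60, 'popular_articles')
        ->getResult();
}
```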

  44. Second level cache
      ▪ Reduces the number of necessary database accesses
      ▪ A cache between the identity map and the database
      ▪ Three caching modes:
        ▫ READ_ONLY
        ▫ NONSTRICT_READ_WRITE
        ▫ READ_WRITE
      ▪ Experimental and very complex
      ▪ Limited to a single application and to single primary keys
  45. Prefer entity listeners
      ▪ In general, prefer listeners
      ▪ Entity listeners are better for performance because they are called only for entities of the class they were configured for, so they avoid executing useless code paths
      ▪ Moreover, they can be lazy, i.e. instantiated only when they are actually used
  46. Do

  47. Do
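The slides' code is not in the transcript. A sketch of an entity listener with annotation mapping, assuming a hypothetical `User` entity and `UserListener` class; the listener method only declares the entity parameter, which keeps the sketch independent of the event argument class:

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * Hypothetical entity wired to its own listener class.
 *
 * @ORM\Entity
 * @ORM\EntityListeners({"UserListener"})
 */
class User
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /** @ORM\Column(type="datetime_immutable", nullable=true) */
    private $createdAt;

    public function markCreated(DateTimeImmutable $now): void
    {
        $this->createdAt = $now;
    }

    public function getCreatedAt(): ?DateTimeImmutable
    {
        return $this->createdAt;
    }
}

/** Called only for User entities, unlike a global event listener. */
class UserListener
{
    /** @ORM\PrePersist */
    public function prePersist(User $user): void
    {
        $user->markCreated(new DateTimeImmutable());
    }
}
```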

  48. Avoid composite primary keys
      ▪ They require additional internal work, i.e. they imply time and memory overheads
      ▪ They are much more error-prone because almost nobody uses them
      ▪ Use single primary keys
      ▪ Use UUIDs
  49. Don’t use table inheritance
      ▪ Single Table Inheritance (STI) implies having weak constraints on the table, for example many nullable columns
      ▪ Class Table Inheritance (CTI) implies multiple join operations for any query
      ▪ Representing OOP inheritance in a database is never a good idea
  50. Use MappedSuperclass
      ▪ It provides reusable mapping information and reusable concrete code logic but is not an entity itself
      ▪ Think of it as a trait for your entities but with a bonus: the mapped superclass is part of the child classes’ hierarchy
      ▪ Each “child” entity is independent: it has its own table and is easy to refactor
  51. Do
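The slide's code is not in the transcript. A sketch of a mapped superclass with two independent child entities; all class and field names are illustrative assumptions:

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * Shared mapping and logic; not an entity, so it has no table of its own.
 *
 * @ORM\MappedSuperclass
 */
abstract class Content
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    protected $id;

    /** @ORM\Column(type="string") */
    protected $title = '';

    public function getTitle(): string
    {
        return $this->title;
    }

    public function setTitle(string $title): void
    {
        $this->title = $title;
    }
}

/** @ORM\Entity */
class Article extends Content
{
    /** @ORM\Column(type="text") */
    private $body = '';
}

/** @ORM\Entity */
class Video extends Content
{
    /** @ORM\Column(type="string") */
    private $url = '';
}
```

Each child gets its own table containing the inherited columns, so no discriminator column or join is involved.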

  52. Don’t use cascade removing
      ▪ Cascade removing is done at runtime, is synchronous, and involves a full in-memory load of the entities to delete
      ▪ On big associations, it consumes a lot of time and a lot of memory, and it often leads to OOM errors
      ▪ Use SQL “ON DELETE CASCADE” clauses
  53. Do
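The slide's code is not in the transcript. With annotation mapping, the SQL clause can be requested on the join column of the owning side. A sketch, assuming hypothetical `Article` and `Comment` entities:

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/** @ORM\Entity */
class Article
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;
}

/** @ORM\Entity */
class Comment
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * onDelete="CASCADE" ends up in the foreign key definition, so the
     * database deletes the comments itself when an article row is deleted.
     *
     * @ORM\ManyToOne(targetEntity="Article")
     * @ORM\JoinColumn(nullable=false, onDelete="CASCADE")
     */
    private $article;

    public function __construct(Article $article)
    {
        $this->article = $article;
    }

    public function getArticle(): Article
    {
        return $this->article;
    }
}
```

Because the deletion happens in the database, no comment is loaded in memory and no ORM lifecycle event is dispatched for the deleted comments.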

  54. Don’t use the “remove” lifecycle events
      ▪ In general, avoid having to do things on removal
      ▪ If you don’t use cascade removing, those events won’t be dispatched for the associated entities to remove, and you don’t want to do everything manually yourself
      ▪ Use domain events that are handled asynchronously
      ▪ It’s really easy now thanks to the Symfony Messenger component
  55. Profiling and logging have a cost
      ▪ Collecting data for the profiler takes time and memory
      ▪ Writing logs involves IO operations and thus takes time and memory
      ▪ Consider disabling the profiling and the logging, especially in your batch processing
      ▪ Do it case by case
  56. Do
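The slide's code is not in the transcript. One way to switch off query logging for a batch job is to remove the SQL logger from the connection configuration. A sketch, assuming DBAL 2.x (the SQL logger API was later deprecated and replaced by middlewares); the function name is hypothetical:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Disables SQL logging for the connection used by this entity manager.
 * Assumes the DBAL 2.x Configuration::setSQLLogger() API.
 */
function disableSqlLogging(EntityManagerInterface $em): void
{
    // Without a logger, executed queries are no longer collected in memory
    // for the profiler or written to the logs.
    $em->getConnection()->getConfiguration()->setSQLLogger(null);
}
```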

  57. More topics worth checking out
      ▪ Extra lazy associations
      ▪ Partial objects and references
      ▪ Read-only entities and properties
      ▪ Using criteria on non-loaded associations
      ▪ Using the “filters” feature
      ▪ Aggregating data directly with SQL
      ▪ Multi-step hydration
  58. Thanks! Any questions? You can find me at: fancyweb on GitHub and on the Symfony Devs Slack, @fancyweb_ on Twitter, [email protected] by mail.