A performant application, even with Doctrine!

Thomas Calvet

November 28, 2019

Transcript

  1. A performant application, even with Doctrine!

  2. Good evening! I am Thomas Calvet. I am a Symfony enthusiast and I work at ekino. You can find me on GitHub and on the Symfony Devs Slack as fancyweb, and on Twitter as @fancyweb_.
  3. ORM internals: simply and quickly

  4. Object Relational Mapping (ORM)
    ▪ A technique to convert data between incompatible type systems using object-oriented programming languages
    ▪ Doctrine ORM is an Object Relational Mapper
    ▪ Its goal is to simplify the translation between database rows and the PHP object model
  5. UnitOfWork (UoW): the private heart of the ORM
    ▪ A single class of 3k+ lines of code
    ▪ Knows all the managed entities
    ▪ Responsible for tracking changes to those entities
    ▪ Responsible for writing out those changes in the right order
  6. EntityManager (EM): the central public access point of the ORM
    ▪ A facade to the UoW
    ▪ And a facade to all the other ORM subsystems:
      ▫ Metadata
      ▫ Repositories
      ▫ Queries, etc.
  7. The identity map: the unique source of truth
    ▪ A big associative array in the UoW
    ▪ Class property ($identityMap)
    ▪ Stores a reference to every managed entity
    ▪ Ensures that a managed entity is loaded only once and always as the same in-memory object
  8. In this example, only one SQL query is executed, since the entity with the id 1 is already in the identity map on the second “find” call.
  9. In this example, two SQL queries are executed, but thanks to the identity map the same entity instance is returned by the repository.
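
The slide code did not survive in this transcript. As a stand-in, here is a minimal Python sketch of the identity map idea. It is a toy model, not Doctrine's code: the `Repository` class, its `find` / `find_one_by` methods and the `query_count` counter are hypothetical names used only for illustration.

```python
class User:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Repository:
    """Toy repository backed by an identity map (not Doctrine's code)."""

    def __init__(self, rows):
        self._rows = rows          # stands in for the database table
        self._identity_map = {}    # id -> the single in-memory instance
        self.query_count = 0

    def _query(self, **criteria):
        self.query_count += 1      # one simulated SQL query
        return [r for r in self._rows
                if all(r[k] == v for k, v in criteria.items())]

    def _hydrate(self, row):
        # Reuse the managed instance if there is one, else register a new one.
        entity = self._identity_map.get(row["id"])
        if entity is None:
            entity = User(row["id"], row["name"])
            self._identity_map[row["id"]] = entity
        return entity

    def find(self, id):
        # Lookup by identifier: the identity map can answer without a query.
        if id in self._identity_map:
            return self._identity_map[id]
        rows = self._query(id=id)
        return self._hydrate(rows[0]) if rows else None

    def find_one_by(self, **criteria):
        # Lookup by other criteria: a query is needed, but hydration
        # still goes through the identity map.
        rows = self._query(**criteria)
        return self._hydrate(rows[0]) if rows else None


repo = Repository([{"id": 1, "name": "alice"}])

first = repo.find(1)
second = repo.find(1)
queries_after_two_finds = repo.query_count    # only the first find queried

by_name = repo.find_one_by(name="alice")
queries_after_find_one_by = repo.query_count  # one more query was needed
```

Both lookups by identifier share one query, and the lookup by name costs a second query but still hands back the very same object.
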
  10. The states: crucial for optimisation
    ▪ A big associative array in the UoW
    ▪ Cached in a class property ($entityStates)
    ▪ Every entity has a state
    ▪ There are four possible states
  11. An entity is “managed” when it is known by the identity map. It is added to this map in multiple cases: for example, when it is hydrated through the ORM or after being inserted in the database. 1 = UnitOfWork::STATE_MANAGED
  12. An entity is “new” when it is not known by the identity map and has no identifier or an unknown database identifier. 2 = UnitOfWork::STATE_NEW
  13. An entity is “detached” when it was removed from the identity map, or when it is not known by the identity map and has a known database identifier. “detached” is also the default assumed state. 3 = UnitOfWork::STATE_DETACHED
  14. An entity is “removed” when it has been scheduled for deletion from the database on the next UoW commit. 4 = UnitOfWork::STATE_REMOVED
  15. The changesets
    ▪ A big associative array in the UoW
    ▪ Class property ($entityChangeSets)
    ▪ The differences between the last synchronized states (from the database) and the current states of the entities
    ▪ Computed at the beginning of the commit and cleared afterwards
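
To make the idea of a changeset concrete, here is a small Python sketch that diffs a snapshot taken at load time against the current values. It only illustrates the principle. The function name `compute_change_set` and the dictionary layout are assumptions made for this example, not Doctrine's internal format.

```python
def compute_change_set(original, current):
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        field: (original.get(field), current.get(field))
        for field in original.keys() | current.keys()
        if original.get(field) != current.get(field)
    }


# Snapshot taken when the entity was loaded from the database.
original_data = {"id": 1, "title": "Draft", "views": 10}
# Values held by the entity when the commit starts.
current_data = {"id": 1, "title": "Published", "views": 10}

change_set = compute_change_set(original_data, current_data)
untouched = compute_change_set(original_data, dict(original_data))
```

Only the modified field ends up in the changeset, and an untouched entity produces an empty one, so nothing needs to be written for it.
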
  17. The pending changes
    ▪ Big associative arrays in the UoW
    ▪ Class properties ($entityInsertions, $collectionUpdates, etc.)
    ▪ All pending information about what to insert, update or delete (entities and collections)
    ▪ Computed at the beginning of the commit and cleared afterwards
  19. To summarize: with a diagram

  23. 16 big associative arrays in the UoW! The UoW contains highly performance-sensitive code.
  24. “Always keep the identity map and internals in mind. It will greatly improve the performance of your code!”
  25. Flush carefully: flush once

  27. Transactional write-behind
    ▪ A strategy used by the UoW
    ▪ Delays the execution of SQL queries
    ▪ The goal is to execute them in the most efficient way
    ▪ They are executed in the shortest possible transaction so that all write locks are released quickly
  28. Flushing is a heavy operation
    ▪ It computes all the changesets
    ▪ It dispatches all lifecycle events
    ▪ It generates all needed SQL queries
    ▪ It executes them in a transaction
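
The write-behind strategy can be sketched with Python's built-in `sqlite3` module. This is a toy model, not Doctrine's UnitOfWork: the class, the `product` table and the `transaction_count` counter are hypothetical and exist only to show that scheduling is cheap and that a single flush writes everything in one transaction.

```python
import sqlite3


class UnitOfWork:
    """Toy write-behind queue (an illustration, not Doctrine's UnitOfWork)."""

    def __init__(self, conn):
        self.conn = conn
        self.pending_insertions = []
        self.transaction_count = 0

    def persist(self, name):
        # Nothing is sent to the database yet; the insert is only scheduled.
        self.pending_insertions.append((name,))

    def flush(self):
        if not self.pending_insertions:
            return
        # "with conn" wraps the statements in one transaction:
        # commit on success, rollback on error.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO product (name) VALUES (?)",
                self.pending_insertions,
            )
        self.transaction_count += 1
        self.pending_insertions.clear()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")

uow = UnitOfWork(conn)
for i in range(100):
    uow.persist(f"product-{i}")    # no flush inside the loop

rows_before_flush = count_rows(conn)
uow.flush()                        # one flush, one transaction
rows_after_flush = count_rows(conn)
```

Calling `flush()` inside the loop instead would open and commit 100 transactions for the same result.
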
  29. Don’t

  30. Do

  31. Batch your flushes: when it is necessary

  32. One heavy flush can be catastrophic
    ▪ The more changes there are on managed entities, the longer they take to be processed internally
    ▪ Sometimes, splitting a heavy task into x smaller ones is more efficient
    ▪ Overly large database transactions are bad for concurrency because they lock the tables for too long
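
Here is one way to picture batching, again as a Python `sqlite3` sketch rather than Doctrine code. The `comment` table, the helper names and the batch size of 100 are assumptions chosen for the example. Each batch is written in its own short transaction and then dropped from memory.

```python
import sqlite3


def insert_batch(conn, batch):
    # One short transaction per batch: commit on success, rollback on error.
    with conn:
        conn.executemany("INSERT INTO comment (author) VALUES (?)", batch)


def import_in_batches(conn, authors, batch_size):
    """Insert authors batch by batch and return the number of commits."""
    commits = 0
    batch = []
    for author in authors:
        batch.append((author,))
        if len(batch) == batch_size:
            insert_batch(conn, batch)
            commits += 1
            batch = []             # drop references so memory can be reclaimed
    if batch:                      # write the incomplete last batch
        insert_batch(conn, batch)
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, author TEXT)")

authors = (f"author-{i}" for i in range(1050))
commits = import_in_batches(conn, authors, batch_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

The right batch size depends on the workload, so it is worth measuring a few values rather than trusting the one used here.
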
  33. Don’t

  34. Do

  35. Use SQL: get back to the basics

  36. Do

  37. Clear your Entity Manager: to start over
  38. Reduce the memory usage
    ▪ The UoW stores a large amount of information in all its class properties
    ▪ Some are cleared after each commit
    ▪ Some are never cleared automatically
    ▪ This leftover data can end up using a lot of memory
  39. Do

  40. What happens when the UoW is cleared?
    ▪ It is reset to its initial state
    ▪ All its “stack” class properties are set to an empty array, thus freeing a lot of memory
    ▪ Consequently, all managed entities become “detached”
  41. ⚠ Legacy clear, detach and merge: everything needs to die
  42. ⚠ Don’t. Clearing one entity class is not recommended because of many broken scenarios. It won’t be available anymore in Doctrine ORM 3.
  43. ⚠ Don’t. Detaching and merging entities is not recommended. It won’t be available anymore in Doctrine ORM 3.
  44. Know the tracking policies: to choose the right one
  45. Change tracking
    ▪ At some point, the ORM determines what changed on the entities it manages thanks to a tracking policy
    ▪ Each tracking policy has advantages and disadvantages
    ▪ Each tracking policy has a different impact on the overall performance
    ▪ There are three different tracking policies
  46. The Deferred Implicit tracking policy: automatic but the worst performance
    ▪ The ORM checks all managed entities for changes
    ▪ It checks all property values one by one
    ▪ It checks for new entities that are referenced by other managed entities
    ▪ It obviously takes longer as the UoW grows
  48. The Deferred Explicit tracking policy: some manual work but much better performance
    ▪ Same behavior as the Deferred Implicit tracking policy
    ▪ Except that the ORM only checks entities that have been explicitly marked through a “persist” call
    ▪ Better for a large UoW
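
The difference between the two deferred policies comes down to how many entities are inspected at commit time. The Python sketch below is a deliberately simplified model, not Doctrine's implementation: the `UnitOfWork` class, its `explicit` flag and the `load` helper are hypothetical names.

```python
class Entity:
    def __init__(self, id, title):
        self.id = id
        self.title = title


class UnitOfWork:
    """Toy model of the two deferred policies (not Doctrine's code)."""

    def __init__(self, explicit):
        self.explicit = explicit
        self.managed = {}          # id -> entity
        self.snapshots = {}        # id -> property values at load time
        self.scheduled = set()     # ids marked through persist()

    def register_managed(self, entity):
        self.managed[entity.id] = entity
        self.snapshots[entity.id] = dict(vars(entity))

    def persist(self, entity):
        self.scheduled.add(entity.id)

    def compute_change_sets(self):
        """Return (number of entities checked, ids of changed entities)."""
        ids = self.scheduled if self.explicit else self.managed.keys()
        changed = [i for i in sorted(ids)
                   if vars(self.managed[i]) != self.snapshots[i]]
        return len(ids), changed


def load(uow, count):
    entities = [Entity(i, f"title-{i}") for i in range(count)]
    for entity in entities:
        uow.register_managed(entity)
    return entities


implicit = UnitOfWork(explicit=False)
entities = load(implicit, 1000)
entities[42].title = "edited"
implicit_checked, implicit_changed = implicit.compute_change_sets()

explicit = UnitOfWork(explicit=True)
entities = load(explicit, 1000)
entities[42].title = "edited"
explicit.persist(entities[42])     # the manual step the explicit policy needs
explicit_checked, explicit_changed = explicit.compute_change_sets()
```

Both runs find the same change, but the implicit model inspects all 1000 managed entities while the explicit one inspects a single entity. The flip side is that a change made without the `persist` call would go unnoticed in the explicit model.
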
  51. The Notify tracking policy: a lot of manual work but the best performance
    ▪ The entities notify interested listeners of every change to their properties
    ▪ You have full control over when you consider a property changed or not
    ▪ The best for a very large UoW
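
The notify idea can be sketched in Python as an entity that reports its own changes to a listener. All names below (`Article`, `add_listener`, `property_changed`, `change_sets`) are hypothetical and only loosely modelled on the principle described above, not on Doctrine's actual interfaces.

```python
class UnitOfWork:
    """Toy listener that records changes as they are announced."""

    def __init__(self):
        self.change_sets = {}      # entity id -> {property: (old, new)}

    def property_changed(self, entity, name, old, new):
        changes = self.change_sets.setdefault(entity.id, {})
        # Keep the first "old" value if the property changes several times.
        original = changes[name][0] if name in changes else old
        changes[name] = (original, new)


class Article:
    def __init__(self, id, title):
        self.id = id
        self._title = title
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    @property
    def title(self):
        return self._title

    @title.setter
    def title(self, value):
        if value == self._title:
            return                 # the entity decides what counts as a change
        old, self._title = self._title, value
        for listener in self._listeners:
            listener.property_changed(self, "title", old, value)


uow = UnitOfWork()
article = Article(1, "Draft")
article.add_listener(uow)

article.title = "Draft"            # same value: nothing is reported
changes_after_noop = dict(uow.change_sets)

article.title = "Review"
article.title = "Published"
```

Nothing has to be scanned at commit time in this model because the changeset is built as the changes happen, which is also why every setter needs this extra code.
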
  53. Load as few entities as possible: the out of memory (OOM) problem
  55. Every loaded entity has an impact
    ▪ An impact on both time and memory, because each one increases the overall number of managed entities
    ▪ Hydrating an entity is heavy
    ▪ Think about the potential total number of entities your method call could load
    ▪ Don’t use the “findAll()” method
    ▪ Avoid filterless queries
  56. Don’t. There could be 50 000 comments one day.
  57. Use the “iterate” feature
    ▪ Avoids hydrating all the entities resulting from the query in memory at once
    ▪ Hydrates the entities one by one instead
    ▪ Limited to queries that don’t need to fetch-join a collection-valued association
    ▪ ⚠ You still need to clear the EM regularly to free the memory
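
The same principle, shown outside Doctrine with Python's `sqlite3` module: a generator hydrates one row at a time, and a dictionary standing in for the identity map is cleared every 100 entities. The `Comment` class, the `iterate_comments` helper and the batch size are assumptions made for this sketch.

```python
import sqlite3


class Comment:
    def __init__(self, id, body):
        self.id = id
        self.body = body


def iterate_comments(conn):
    """Yield hydrated comments one at a time instead of building a list."""
    cursor = conn.execute("SELECT id, body FROM comment ORDER BY id")
    for row in cursor:             # rows are fetched lazily from the cursor
        yield Comment(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO comment (body) VALUES (?)",
        [(f"comment {i}",) for i in range(5000)],
    )

BATCH_SIZE = 100
managed = {}                       # stands in for the identity map
peak_managed = 0
processed = 0

for comment in iterate_comments(conn):
    managed[comment.id] = comment
    peak_managed = max(peak_managed, len(managed))
    processed += 1
    if processed % BATCH_SIZE == 0:
        managed.clear()            # the equivalent of clearing the EM regularly
```

All 5000 rows are processed, yet at most 100 hydrated objects are alive at any time. Without the periodic clear, the dictionary would grow to 5000 entries, which is the point of the warning above.
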
  58. Do

  59. Use SQL: yes, again

  60. Do

  61. “Pre select” the associations: the N+1 problem

  64. Watch the query count
    ▪ Entity associations are lazy loaded
    ▪ An SQL query is executed on demand, when you access the uninitialized collection for the first time
    ▪ When you iterate on an array of entities, the ORM needs to execute an SQL query for every collection you access
    ▪ Therefore, nested iterations multiply the number of queries at each nesting level
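
The N+1 effect is easy to reproduce with plain SQL. The Python `sqlite3` sketch below counts the queries needed to read the comments of 10 posts, first lazily and then with a single join. The schema and the `CountingConnection` wrapper are hypothetical and exist only for this illustration.

```python
import sqlite3


class CountingConnection:
    """Wraps a connection and counts the SELECT statements sent through it."""

    def __init__(self, conn):
        self._conn = conn
        self.count = 0

    def select(self, sql, params=()):
        self.count += 1
        return self._conn.execute(sql, params).fetchall()


def load_lazily(db):
    # One query for the posts, then one more query per post (N+1).
    result = {}
    for (post_id,) in db.select("SELECT id FROM post ORDER BY id"):
        rows = db.select(
            "SELECT body FROM comment WHERE post_id = ? ORDER BY id",
            (post_id,),
        )
        result[post_id] = [body for (body,) in rows]
    return result


def load_joined(db):
    # A single query that selects the association up front.
    rows = db.select(
        "SELECT p.id, c.body FROM post p "
        "LEFT JOIN comment c ON c.post_id = p.id "
        "ORDER BY p.id, c.id"
    )
    result = {}
    for post_id, body in rows:
        comments = result.setdefault(post_id, [])
        if body is not None:       # a post without comments yields a NULL body
            comments.append(body)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
""")
with conn:
    for post_id in range(1, 11):
        conn.execute("INSERT INTO post (id, title) VALUES (?, ?)",
                     (post_id, f"post {post_id}"))
        conn.executemany("INSERT INTO comment (post_id, body) VALUES (?, ?)",
                         [(post_id, f"comment {n}") for n in range(3)])

lazy_db = CountingConnection(conn)
lazy_result = load_lazily(lazy_db)

joined_db = CountingConnection(conn)
joined_result = load_joined(joined_db)
```

Both functions return the same data, but the lazy version needs 11 queries where the joined version needs one.
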
  66. Do

  67. ⚠ Big inverse side associations

  68. Don’t

  69. Do

  70. Don’t set up big inverse side associations
    ▪ In general, avoid non-essential associations
    ▪ Each association implies more work for the ORM, i.e. more time and memory consumed
    ▪ Not having “big” inverse side associations prevents you from using their getter
    ▪ Consequently, it prevents you from loading large numbers of entities in memory at once and from running into the N+1 problem
  71. Use caches: do not process the same things again and again
  72. Metadata cache
    ▪ Removes the mapping files parsing overhead
    ▪ Mapping files don’t need to be parsed on each request because the same mapping files will always produce the same class metadata
    ▪ Your application should never be in production without a configured metadata cache
  73. Query cache
    ▪ Removes the DQL to SQL conversion overhead
    ▪ The same DQL query will always be converted to the same SQL (for the same platform)
    ▪ Your application should never be in production without a configured query cache
  74. Result cache
    ▪ Caches the results of a query
    ▪ Avoids re-querying the database and rehydrating the resulting entities
    ▪ You can specify the time to live (TTL) per query
    ▪ Use it particularly on slow queries, even with a short TTL
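
How a per-query TTL behaves can be shown with a few lines of Python. This is a generic sketch of a TTL cache, not Doctrine's result cache: `ResultCache`, `fetch`, `FakeClock` and the `top_articles` key are hypothetical names, and the clock is injected so that the example stays deterministic.

```python
import time


class ResultCache:
    """Toy result cache with a time to live per entry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}         # key -> (expires_at, result)

    def fetch(self, key, ttl, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: the loader is not called
        result = loader()
        self._entries[key] = (now + ttl, result)
        return result


class FakeClock:
    """Manually advanced clock, used instead of sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


calls = []


def slow_query():
    calls.append(1)                # stands in for an expensive SQL query
    return ["row-1", "row-2"]


clock = FakeClock()
cache = ResultCache(clock)

first = cache.fetch("top_articles", ttl=30, loader=slow_query)
clock.now = 10
second = cache.fetch("top_articles", ttl=30, loader=slow_query)   # cache hit
clock.now = 31
third = cache.fetch("top_articles", ttl=30, loader=slow_query)    # expired
```

Three reads cost only two executions of the slow query, and even a short TTL removes most of the load when the same query is hit many times per second.
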
  75. Do

  76. Second level cache
    ▪ Reduces the number of necessary database accesses
    ▪ A cache between the identity map and the database
    ▪ Three caching modes:
      ▫ READ_ONLY
      ▫ NONSTRICT_READ_WRITE
      ▫ READ_WRITE
    ▪ Experimental and very complex
    ▪ Limited to a single application and to single primary keys
  77. Prefer entity listeners: over event listeners and event subscribers
  78. Don’t (when it concerns one kind of entity only).
  79. Don’t (when it concerns one kind of entity only).
  80. Prefer entity listeners
    ▪ In general, prefer listeners
    ▪ Entity listeners are better for performance because they are called only for entities of the class they were configured for, which avoids executing useless code paths
    ▪ Moreover, they can be lazy, i.e. instantiated only when they are actually used
  81. Do

  82. Do

  83. ⚠ Composite primary keys

  84. Don’t

  85. Avoid composite primary keys
    ▪ They require additional internal work, i.e. they imply time and memory overheads
    ▪ They have a much higher probability of errors because almost nobody uses them
    ▪ Use single primary keys
    ▪ Use UUIDs
  86. ⚠ Table inheritance

  87. Don’t

  88. Don’t use table inheritance
    ▪ Single Table Inheritance (STI) implies having weak constraints on the table, for example many nullable columns
    ▪ Class Table Inheritance (CTI) implies multiple join operations for any query
    ▪ Representing OOP inheritance in a database is never a good idea
  89. Use MappedSuperclass
    ▪ It provides reusable mapping information and reusable concrete code logic but is not an entity itself
    ▪ Think of it as a trait for your entities but with a bonus: the mapped superclass is in the class hierarchy of its children
    ▪ Each “child” entity is independent: it has its own table and is easily refactorable
  90. Do

  91. ⚠ Cascade removing

  92. Don’t

  93. Don’t use cascade removing
    ▪ Cascade removing is done at runtime, is synchronous, and involves a full in-memory load of the entities to delete
    ▪ On big associations, it consumes a lot of time and memory, and it often leads to OOM errors
    ▪ Use SQL “ON DELETE CASCADE” clauses
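
What the database does with an “ON DELETE CASCADE” clause can be checked with Python's `sqlite3` module. The `post` / `comment` schema below is made up for the example. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # required for SQLite to enforce FKs
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post (id) ON DELETE CASCADE,
        body TEXT
    );
""")

with conn:
    conn.execute("INSERT INTO post (id, title) VALUES (1, 'kept')")
    conn.execute("INSERT INTO post (id, title) VALUES (2, 'deleted')")
    conn.executemany(
        "INSERT INTO comment (post_id, body) VALUES (?, ?)",
        [(1, "first"), (1, "second")]
        + [(2, f"comment {n}") for n in range(1000)],
    )


def count_comments(conn):
    return conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]


comments_before = count_comments(conn)

with conn:
    # A single statement: the database removes the 1000 child rows itself,
    # without any of them being loaded into application memory.
    conn.execute("DELETE FROM post WHERE id = 2")

comments_after = count_comments(conn)
```

The application sends one DELETE statement and never loads the child rows, which is exactly what the runtime cascade cannot offer.
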
  94. Do

  95. ⚠ “Remove” lifecycle events

  96. Don’t

  97. Don’t use the “remove” lifecycle events
    ▪ In general, avoid having to do things on removal
    ▪ If you don’t use cascade removing, those events won’t be dispatched for the associated entities to remove, and you don’t want to do everything manually yourself
    ▪ Use domain events that are handled asynchronously
    ▪ It’s really easy now thanks to the Symfony Messenger component
  98. ⚠ Profiling and logs: very useful but...
  99. They have a cost
    ▪ Collecting data for the profiler takes time and memory
    ▪ Writing logs involves IO operations and thus takes time and memory
    ▪ Consider disabling the profiling and the logging, especially in your batch processing
    ▪ Do it case by case
  100. Do

  101. No time to explain: it’s already long enough
  102. More topics worth checking out
    ▪ Extra lazy associations
    ▪ Partial objects and references
    ▪ Read only entities and properties
    ▪ Using criteria on non loaded associations
    ▪ Using the “filters” feature
    ▪ Aggregate data directly with SQL
    ▪ Multi-step hydration
  103. “Always keep the identity map and internals in mind”
  104. Thanks! Any questions? You can find me at: fancyweb on GitHub and on the Symfony Devs Slack, @fancyweb_ on Twitter, calvet.thomas@gmail.com by mail.