
Teaching Doctrine to be Lazy

Doctrine object managers are greedy: when you query for a set of objects, they love to load everything, all at once. That's normally great, but what if you're working with large data sets, where you might load tens of thousands of objects?

In this talk, we’ll teach Doctrine how to be lazy by demonstrating how to efficiently query and work with large data sets. We’ll cover:

- Lazy queries
- Lazy relationships
- Profiling and reducing object "hydrations"
- Efficient batch processing
- An alternate, “lazy-by-default” repository pattern

Kevin Bond

June 15, 2023

Transcript

  1. Me?

    From Ontario, Canada. Husband, father of three. Symfony user since 1.0. Symfony Core Team. @kbond on GitHub/Slack, @zenstruck on Twitter.
  2. zenstruck?

    A GitHub organization where my open source packages live: zenstruck/foundry, zenstruck/browser, zenstruck/messenger-test, zenstruck/filesystem (wip), zenstruck/schedule-bundle (for Symfony < 6.3), ... Many are now co-maintained by Nicolas PHILIPPE (@nikophil).
  3. What we'll cover

    Hydration considerations; lazy batch iterating (read-only); lazy batch processing (updating/deleting/persisting); lazy relationships; future ideas.
    Teaching Doctrine to be Lazy, Kevin Bond • @zenstruck • github.com/kbond
  4. Sample App

        +----------+       +------------+
        | PRODUCT  |       | PURCHASE   |
        |----------|       |------------|
        | id       |---+   | id         |
        | sku      |   +--<| product_id |
        | stock    |       | date       |
        | category |       | amount     |
        +----------+       +------------+

    1,000+ products, 100,000+ purchases. Products may have thousands of purchases.
  5. Mongo?

    With some tweaks, the demonstrated techniques should/could apply to any doctrine/persistence implementation. I'm using doctrine/orm for the examples in this talk.
  6. Part 1: Hydration Considerations

    Hydration is expensive. Some rules: only hydrate what you need; only hydrate when you need it; clean up after yourself.
  7. Profiling Hydrations

    Web Profiler? debesha/doctrine-hydration-profiler-bundle. DoctrineBundle? Needs a hook in doctrine/orm. Blackfire.io: metrics.doctrine.entities.hydrated.
  8. Part 2: Batch Iterating (Read-only)

    Use SQL? The purchase:report command generates a report for all purchases.
  9. $repo->findAll()

    100000/100000 100% 1 sec/1 sec 166.0 MiB // Time: 2 secs, Queries: 1
  10. $repo->matching(new Criteria())

    100000/100000 100% 1 sec/1 sec 168.0 MiB // Time: 1 sec, Queries: 2
  11. Doctrine\ORM\Query::toIterable()

    100000 2 secs 166.0 MiB // Time: 2 secs, Queries: 1
    Memory barely drops: rows are hydrated one at a time, but every hydrated entity stays managed by the ObjectManager until it is cleared.
  12. Batch Utilities - Iterator

    ocramius/doctrine-batch-utils takes an ORM Query object and iterates over the result set in batches, clearing the ObjectManager after each batch to free memory. Enhanced: accepts any iterable and any ObjectManager instance.
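The "iterate in batches, clear after each batch" idea can be sketched in plain PHP without Doctrine. This is a hypothetical `batchIterate()` helper. It is not the API of ocramius/doctrine-batch-utils or of the talk's enhanced BatchIterator. The `$clear` callback stands in for `$em->clear()`, and the default batch size of 100 is an arbitrary assumption.

```php
<?php

/**
 * Minimal sketch of "iterate in batches, clear after each batch".
 * Hypothetical helper, not a real library API: $clear stands in for ObjectManager::clear().
 *
 * @template T
 * @param iterable<T>      $items any iterable, e.g. the result of Query::toIterable()
 * @param callable(): void $clear called after every full batch and after a final partial batch
 * @return \Generator<int, T>
 */
function batchIterate(iterable $items, callable $clear, int $batchSize = 100): \Generator
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $i = 0;

    foreach ($items as $item) {
        // The caller processes $item while the generator is paused here.
        yield $item;

        // Once a full batch has been processed, release everything hydrated so far.
        if (++$i % $batchSize === 0) {
            $clear();
        }
    }

    // Release a trailing partial batch (nothing to do if the last batch was full).
    if ($i % $batchSize !== 0) {
        $clear();
    }
}

// Usage: 250 items in batches of 100 -> clear() runs after items 100 and 200, then once at the end.
$clears = 0;
$seen = [];

foreach (batchIterate(range(1, 250), function () use (&$clears): void { $clears++; }, 100) as $n) {
    $seen[] = $n;
}

echo count($seen), ' items, ', $clears, " clears\n"; // 250 items, 3 clears
```

`clear()` detaches every managed object. Objects from an earlier batch should therefore not be modified or reused once that batch has ended, which is the trade-off that keeps memory flat.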
  13. Use BatchIterator

        $iterator = new BatchIterator($query->toIterable(), $this->em);

    100000 100% 2 secs 20.0 MiB // Time: 2 secs, Queries: 1
  14. Memory Stays Constant, Time Increases

    200,000 purchases? 200000 100% 4 secs 20.0 MiB // Time: 4 secs, Queries: 1
    1,000,000 purchases? 1000000 19 secs 22.0 MiB // Time: 19 secs, Queries: 1
  15. 1,000,000 Purchases Using $repo->findAll()?

    1000000/1000000 100% 2 secs/2 secs 1.5 GiB // Time: 16 secs, Queries: 1
    1.5 GiB of memory?
  16. Part 3: Batch Processing
  17. Batch Updating

    product:stock-update command: loop through all products and update each stock level from a source (e.g. CSV files, an API, etc.).
  18. $repo->findAll()

        foreach ($repo->findAll() as $product) {
            /** @var Product $product */
            $product->setStock($this->currentStockFor($product));
            $this->em->flush();
        }
  19. $repo->findAll() (same code as slide 18)

    1000/1000 100% 8 secs/8 secs 16.0 MiB // Time: 8 secs, Queries: 988
  20. $repo->findAll(), Delay Flush

        foreach ($repo->findAll() as $product) {
            /** @var Product $product */
            $product->setStock($this->currentStockFor($product));
        }

        $this->em->flush();
  21. $repo->findAll(), Delay Flush (same code as slide 20)

    1000/1000 100% < 1 sec/< 1 sec 16.0 MiB // Time: < 1 sec, Queries: 2
  22. $repo->findAll(), Delay Flush: 100,000 products?

    100000/100000 100% < 1 sec/< 1 sec 186.0 MiB // Time: 12 secs, Queries: 2
  23. Batch Utilities - Processor

    ocramius/doctrine-batch-utils takes an ORM Query object and iterates over the result set in batches. After each batch it flushes and clears the ObjectManager to save changes and free memory, and it wraps everything in a transaction. Enhanced: accepts any iterable and any ObjectManager instance.
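The processor variant adds flushing and a transaction to the batching loop. Below is a minimal sketch that assumes a hypothetical `BatchTarget` interface bundling `beginTransaction()`, `flush()`, `clear()`, `commit()` and `rollBack()`. In Doctrine these calls are split between the EntityManager and its Connection. This is not the real BatchProcessor class.

```php
<?php

/** Hypothetical stand-in for the ObjectManager/Connection calls a batch processor needs. */
interface BatchTarget
{
    public function beginTransaction(): void;
    public function flush(): void;
    public function clear(): void;
    public function commit(): void;
    public function rollBack(): void;
}

/**
 * Sketch: yield each item, flush + clear after every batch, and wrap the whole
 * run in one transaction that is rolled back unless the loop finishes normally.
 */
function batchProcess(iterable $items, BatchTarget $om, int $batchSize = 100): \Generator
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $om->beginTransaction();
    $committed = false;

    try {
        $i = 0;

        foreach ($items as $item) {
            yield $item; // the caller updates, removes or persists $item here

            if (++$i % $batchSize === 0) {
                $om->flush(); // write this batch's changes
                $om->clear(); // detach managed objects to free memory
            }
        }

        if ($i % $batchSize !== 0) {
            $om->flush(); // write the trailing partial batch
            $om->clear();
        }

        $om->commit();
        $committed = true;
    } finally {
        // Reached on exceptions and when the generator is destroyed before finishing.
        if (!$committed) {
            $om->rollBack();
        }
    }
}

/** Test double that records the calls it receives. */
final class RecordingTarget implements BatchTarget
{
    /** @var list<string> */
    public array $calls = [];

    public function beginTransaction(): void { $this->calls[] = 'begin'; }
    public function flush(): void { $this->calls[] = 'flush'; }
    public function clear(): void { $this->calls[] = 'clear'; }
    public function commit(): void { $this->calls[] = 'commit'; }
    public function rollBack(): void { $this->calls[] = 'rollback'; }
}

$om = new RecordingTarget();

foreach (batchProcess(range(1, 5), $om, 2) as $item) {
    // modify $item here; no explicit flush() needed
}

echo implode(',', $om->calls), "\n"; // begin,flush,clear,flush,clear,flush,clear,commit
```

Tracking success with a `$committed` flag in `finally`, instead of catching exceptions, also covers the case where the caller stops iterating early.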
  24. Using BatchProcessor

        $processor = new BatchProcessor($query->toIterable(), $this->em);

        foreach ($processor as $product) {
            /** @var Product $product */
            $product->setStock($this->currentStockFor($product));
        }

        // no need for "flush"

    1000 < 1 sec 16.0 MiB // Time: < 1 sec, Queries: 1
  25. Using BatchProcessor - 100,000 Products

    100000 11 secs 22.0 MiB // Time: 11 secs, Queries: 2
  26. Batch Deleting

    DQL DELETE statement? Bulk DQL DELETE statements bypass lifecycle events such as PreRemove / PostRemove. purchase:purge command: delete all purchases older than X days. Imagine a PostRemove event that archives the purged purchases.
  27. Using BatchProcessor

        $processor = new BatchProcessor($query->toIterable(), $this->em);

        foreach ($processor as $purchase) {
            /** @var Purchase $purchase */
            $this->em->remove($purchase);
            // no need for "flush"
        }
  28. Using BatchProcessor - 100,000 Purchases

    75237 100% 9 secs 18.0 MiB // Time: 9 secs, Queries: 1
  29. Using BatchProcessor - 1,000,000 Purchases

    753854 100% 1 min 18.0 MiB // Time: 1 min, Queries: 1
  30. Batch Persisting

    The product:import command imports products from a source (e.g. CSV files, an API, etc.). We'll use a Generator to yield Product instances from our source. This requires the enhanced BatchProcessor, which accepts any iterable.
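A generator source for the import might look like the following sketch. It assumes a simple `sku,stock` CSV layout and uses a hypothetical plain `Product` class in place of the real Doctrine entity. In the command, a method like `productsFromCsv()` would play the role of `$this->products()`.

```php
<?php

// Hypothetical stand-in for the Product entity (the real one is Doctrine-mapped).
final class Product
{
    public function __construct(
        public string $sku,
        public int $stock,
    ) {}
}

/**
 * Lazily turn "sku,stock" CSV lines into Product objects, one at a time.
 * Only the current row is in memory, so the source can be arbitrarily large.
 *
 * @param iterable<string> $lines e.g. an SplFileObject opened on the import file
 * @return \Generator<int, Product>
 */
function productsFromCsv(iterable $lines): \Generator
{
    foreach ($lines as $line) {
        $line = trim($line);

        if ($line === '') {
            continue; // skip blank lines such as a trailing newline
        }

        // Explicit separator/enclosure/escape avoids relying on version-dependent defaults.
        [$sku, $stock] = str_getcsv($line, ',', '"', '');

        yield new Product($sku, (int) $stock);
    }
}

$source = ["ABC-1,10\n", "ABC-2,0\n", "\n", "ABC-3,7\n"];

foreach (productsFromCsv($source) as $product) {
    echo $product->sku, ':', $product->stock, "\n";
}
```

Nothing is read or built until the processor pulls the next item. Combined with per-batch flush and clear, this keeps memory usage independent of the size of the import.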
  31. Using BatchProcessor

        $processor = new BatchProcessor(
            $this->products(), // Product[] - our "source"
            $this->em,
        );

        foreach ($processor as $product) {
            /** @var Product $product */
            $this->em->persist($product);
            // no need for "flush"
        }
  32. Using BatchProcessor - Import 1,000

    1000 < 1 sec 16.0 MiB // Time: < 1 sec, Queries: 1
  33. Using BatchProcessor - Import 100,000

    100000 12 secs 16.0 MiB // Time: 12 secs, Queries: 1
  34. Part 4: Lazy Relationships

    product:report command: loop over all products (using our BatchIterator). For each product, fetch details on the most recent purchase and fetch the number of purchases in the last 30 days. Some products have 10,000+ purchases.
  35. Command Code

        foreach ($products as $product) {
            /** @var Product $product */
            /** @var Collection&Selectable $purchases */
            $purchases = $product->getPurchases();
            $last30Days = Criteria::create()->where(
                Criteria::expr()->gte('date', new \DateTimeImmutable('-30 days'))
            );

            $this->addToReport(
                $product->getSku(),
                $purchases->first() ?: null, // most recent purchase
                $purchases->matching($last30Days)->count(),
            );
        }
  36. Standard One-to-Many Relationship

        #[ORM\Entity]
        class Product
        {
            #[ORM\OneToMany(mappedBy: 'product', targetEntity: Purchase::class)]
            #[ORM\OrderBy(['date' => 'DESC'])]
            private Collection $purchases;

            public function getPurchases(): Collection
            {
                return $this->purchases;
            }
        }
  37. Standard One-to-Many Relationship

        $purchases = $product->getPurchases();
        $purchases->count();      // initializes entire collection
        $purchases->first();      // initializes entire collection
        $purchases->slice(0, 10); // initializes entire collection

        foreach ($purchases as $purchase) {
            // initializes entire collection
        }
  38. Standard One-to-Many Relationship

    1000 6 secs 128.0 MiB // Time: 6 secs, Queries: 1001
  39. Extra Lazy One-to-Many Relationship

        #[ORM\Entity]
        class Product
        {
            #[ORM\OneToMany(
                mappedBy: 'product',
                targetEntity: Purchase::class,
                fetch: 'EXTRA_LAZY', // !!!
            )]
            #[ORM\OrderBy(['date' => 'DESC'])]
            private Collection $purchases;
        }
  40. Extra Lazy One-to-Many Relationship

    Assuming the collection hasn't been previously initialized, certain methods create new queries:

        $purchases = $product->getPurchases();
        $purchases->count();      // creates an additional "count" query
        $purchases->first();      // initializes entire collection !!
        $purchases->slice(0, 10); // creates an additional "slice" query

        foreach ($purchases as $purchase) {
            // initializes entire collection
        }
  41. Extra Lazy One-to-Many Relationship

    More efficient first():

        $purchases = $product->getPurchases();
        $purchases->slice(0, 1)[0] ?? null;
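A toy model shows why `slice(0, 1)[0] ?? null` is cheaper than `first()`. The `ToyExtraLazyCollection` class below is hypothetical and is not Doctrine's PersistentCollection. It mimics the behaviour listed on slide 40: `count()` and `slice()` issue their own queries, while `first()` initializes the whole collection. It also counts simulated queries and hydrated rows.

```php
<?php

/**
 * Toy model of an EXTRA_LAZY collection (hypothetical, not Doctrine's class).
 * It counts simulated queries and hydrated rows so the two approaches can be compared.
 */
final class ToyExtraLazyCollection
{
    public int $queries = 0;
    public int $hydrated = 0;

    /** @var list<mixed>|null rows loaded by a full initialization, if any */
    private ?array $loaded = null;

    /** @param list<mixed> $rows simulated database rows, already in ORDER BY order */
    public function __construct(private array $rows) {}

    public function count(): int
    {
        if ($this->loaded !== null) {
            return count($this->loaded);
        }

        $this->queries++; // simulated SELECT COUNT(*) ...

        return count($this->rows);
    }

    /** @return list<mixed> */
    public function slice(int $offset, ?int $length = null): array
    {
        if ($this->loaded !== null) {
            return array_slice($this->loaded, $offset, $length);
        }

        $this->queries++; // simulated SELECT ... LIMIT/OFFSET
        $slice = array_slice($this->rows, $offset, $length);
        $this->hydrated += count($slice);

        return $slice;
    }

    public function first(): mixed
    {
        $this->initialize(); // first() loads the whole collection

        // Like Collection::first(), returns false when the collection is empty.
        return $this->loaded === [] ? false : $this->loaded[0];
    }

    private function initialize(): void
    {
        if ($this->loaded !== null) {
            return;
        }

        $this->queries++; // simulated SELECT of every row
        $this->loaded = $this->rows;
        $this->hydrated += count($this->rows);
    }
}

$rows = range(1, 10000); // e.g. 10,000 purchases, most recent first

$a = new ToyExtraLazyCollection($rows);
$first = $a->first();

$b = new ToyExtraLazyCollection($rows);
$latest = $b->slice(0, 1)[0] ?? null;

printf("first(): %d query, %d rows hydrated\n", $a->queries, $a->hydrated);
printf("slice(0, 1)[0]: %d query, %d rows hydrated\n", $b->queries, $b->hydrated);
```

Both approaches issue one query, but `first()` hydrates all 10,000 rows to return one of them.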
  42. Updated Command Code

        foreach ($products as $product) {
            // ...

            $this->addToReport(
                $product->getSku(),
                $purchases->slice(0, 1)[0] ?? null, // most recent purchase
                $purchases->matching($last30Days)->count(),
            );
        }
  43. Extra Lazy One-to-Many Relationship

    1000 1 sec 18.0 MiB // Time: 1 sec, Queries: 2001
  44. n+x Problem?

    ...it depends... Minimizing the number of queries at all costs is not always the best solution: if the collection has many items, hydrating it will be more expensive than the extra queries. Evaluate your models and use cases.
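The query counts on slides 38 and 43 can be reconstructed under two assumptions, neither stated on the slides: one query streams the N = 1000 products, and with the standard mapping `matching()` runs in memory because `first()` has already loaded the collection:

```latex
\begin{aligned}
Q_{\text{standard}}   &= 1 + N  = 1 + 1000 = 1001 && \text{(one full collection load per product)} \\
Q_{\text{extra lazy}} &= 1 + 2N = 1 + 2 \cdot 1000 = 2001 && \text{(one slice query and one count query per product)}
\end{aligned}
```

The extra-lazy version issues roughly twice as many queries. It still finishes in about 1 sec using 18.0 MiB, compared with 6 secs and 128.0 MiB, because far fewer purchases are hydrated.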
  45. Batch Summary

    Hydration is expensive. The BatchIterator/BatchProcessor can keep the cost down to time only, with memory staying roughly constant. When you have a large or unknown amount of data to process, it's better to move the processing to background tasks.
  46. Part 5: Future Ideas

    Exploring some ideas in zenstruck/collection.
  47. Alternate, Lazy-by-Default ObjectRepository
  48. New ObjectRepository Interface

        /**
         * @template T of object
         * @extends \IteratorAggregate<T>
         */
        interface ObjectRepository extends \IteratorAggregate, \Countable
        {
            /**
             * @param mixed|Criteria $specification
             *
             * @return Result<T>
             */
            public function filter(mixed $specification): Result;
        }
  49. The Result Interface

        /**
         * @template T of object
         * @extends \IteratorAggregate<T>
         */
        interface Result extends \IteratorAggregate, \Countable
        {
            /** @return T|null */
            public function first(): ?object;

            public function take(int $limit, int $offset = 0): self;

            public function process(int $chunkSize = 100): BatchProcessor;

            public function toArray(): array;

            // ...
        }
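A minimal in-memory implementation makes the `Result` contract more concrete. The names `SketchResult` and `ArraySketchResult` are hypothetical, and `process()` is left out because it depends on the BatchProcessor. A Doctrine-backed version would presumably translate `first()`, `take()` and `count()` into `LIMIT`/`OFFSET` and `COUNT` queries instead of slicing an array.

```php
<?php

/**
 * @template T
 * @extends \IteratorAggregate<int, T>
 */
interface SketchResult extends \IteratorAggregate, \Countable
{
    /** @return T|null */
    public function first(): mixed;

    /** @return SketchResult<T> */
    public function take(int $limit, int $offset = 0): self;

    /** @return list<T> */
    public function toArray(): array;
}

/**
 * In-memory implementation, e.g. for tests or data that is already loaded.
 *
 * @template T
 * @implements SketchResult<T>
 */
final class ArraySketchResult implements SketchResult
{
    /** @param list<T> $items */
    public function __construct(private array $items) {}

    public function first(): mixed
    {
        return $this->items[0] ?? null;
    }

    public function take(int $limit, int $offset = 0): self
    {
        // A Doctrine-backed version would add LIMIT/OFFSET to the query instead.
        return new self(array_slice($this->items, $offset, $limit));
    }

    public function toArray(): array
    {
        return $this->items;
    }

    public function count(): int
    {
        // A Doctrine-backed version would run a COUNT query instead.
        return count($this->items);
    }

    public function getIterator(): \Iterator
    {
        return new \ArrayIterator($this->items);
    }
}

$result = new ArraySketchResult(['a', 'b', 'c', 'd', 'e']);

echo $result->first(), "\n";                              // a
echo implode(',', $result->take(2, 1)->toArray()), "\n";  // b,c
echo count($result), "\n";                                // 5
```

Because `take()` returns another result rather than an array, callers can narrow a result further before deciding whether to materialize it.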
  50. ORM ObjectRepository::filter()

    $specification can be:
    - array<string,mixed>: works like findBy()
    - Criteria: works like matching()
    - callable(QueryBuilder, string): void: custom query
  51. Using the $specification callable

        $purchases = $repo->filter(
            function (QueryBuilder $qb, string $root) use ($newerThan) {
                $qb->where("{$root}.date > :newerThan")
                    ->setParameter('newerThan', $newerThan)
                ;
            }
        );
  52. Specification Objects

    You could extend this ObjectRepository to add your own methods, but because filter() accepts callable(QueryBuilder), you can create invokable specification objects instead.
  53. Between Specification

        final class Between
        {
            public function __invoke(QueryBuilder $qb, string $root): void
            {
                if ($this->from) {
                    $qb->andWhere("{$root}.date >= :from")
                        ->setParameter('from', $this->from)
                    ;
                }

                // "to" logic...
            }
        }
  54. Inject as a Service (Symfony 6.3+)

        /**
         * @param ObjectRepository<Purchase> $repo
         */
        public function someAction(
            // extends "Autowire" (creates repo from factory service)
            #[ForClass(Purchase::class)]
            ObjectRepository $repo,
        ) {
            $purchases = $repo->filter(new Between('2021-01-01', '2021-12-31'));
            // ...
        }
  55. Thank You!

    @kbond on GitHub/Slack, @zenstruck on Twitter. Sample code: github.com/kbond/lazy-doctrine. Slides: speakerdeck.com/kbond. zenstruck/collection.
  56. Paginating the Result

        class ResultPagerfantaAdapter implements AdapterInterface
        {
            public function getNbResults(): int
            {
                return $this->result->count();
            }

            public function getSlice(int $offset, int $length): array
            {
                return $this->result->take($length, $offset)->toArray();
            }
        }
  57. Lazier Doctrine Collection

        $purchase = $purchases->first(); // use slice(0, 1)[0] ?? null internally

        foreach ($purchases as $purchase) {
            // lazily iterate "chunks" if large count
        }
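The "lazily iterate chunks" idea can be sketched as a generator that pulls one slice at a time. `iterateInChunks()` is a hypothetical helper. The `$slice` callable stands in for `slice($offset, $length)` on an EXTRA_LAZY collection, where each call would be a separate `LIMIT`/`OFFSET` query.

```php
<?php

/**
 * Lazily iterate a large collection one chunk at a time (hypothetical helper).
 *
 * @param callable(int, int): array $slice returns up to $length items starting at $offset;
 *                                         stands in for slice() on an EXTRA_LAZY collection
 * @return \Generator<int, mixed>
 */
function iterateInChunks(callable $slice, int $chunkSize = 100): \Generator
{
    if ($chunkSize < 1) {
        throw new \InvalidArgumentException('Chunk size must be at least 1.');
    }

    $offset = 0;

    do {
        $chunk = $slice($offset, $chunkSize); // one simulated LIMIT/OFFSET query

        foreach ($chunk as $item) {
            yield $item;
        }

        $offset += $chunkSize;
    } while (count($chunk) === $chunkSize); // a short (or empty) chunk means the end was reached
}

$rows = range(1, 250);
$queries = 0;

$slice = function (int $offset, int $length) use ($rows, &$queries): array {
    $queries++;

    return array_slice($rows, $offset, $length);
};

$sum = 0;

foreach (iterateInChunks($slice, 100) as $n) {
    $sum += $n;
}

echo "sum=$sum, queries=$queries\n"; // sum=31375, queries=3
```

Offset-based chunking assumes the underlying rows do not change during iteration. Inserts or deletes made mid-iteration can cause items to be skipped or repeated.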
  58. Generic Specification System

        $specification = Spec::andX(
            new Between(from: new \DateTimeImmutable('-1 year')), // in last year
            Spec::greaterThan('amount', 100.00), // amount > $100.00
            Spec::sortDesc('date'), // sort by date
        );
  59. Generic Specification System

    Use the same specification object in multiple places:

        // use with ORM
        $purchases = $ormPurchaseRepository->filter($specification);

        // use with Mongo
        $purchases = $mongoPurchaseRepository->filter($specification);

        // use with Collection
        $purchases = $product->getPurchases()->filter($specification);
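The "use with Collection" case means a specification must also be evaluable in memory. The sketch below is hypothetical: `InMemorySpec`, `AndX`, `InMemoryBetween`, `FieldGreaterThan` and `SortDesc` are illustrative names, not zenstruck/collection's API. An ORM or Mongo repository would translate equivalent objects into queries rather than filtering arrays.

```php
<?php

/** A specification evaluated against an in-memory list of rows (associative arrays). */
interface InMemorySpec
{
    public function apply(array $rows): array;
}

/** Keeps rows whose "date" lies within the optional [from, to] range. */
final class InMemoryBetween implements InMemorySpec
{
    public function __construct(
        private ?\DateTimeImmutable $from = null,
        private ?\DateTimeImmutable $to = null,
    ) {}

    public function apply(array $rows): array
    {
        return array_values(array_filter(
            $rows,
            fn (array $row): bool => ($this->from === null || $row['date'] >= $this->from)
                && ($this->to === null || $row['date'] <= $this->to),
        ));
    }
}

/** Keeps rows where $field is strictly greater than $value. */
final class FieldGreaterThan implements InMemorySpec
{
    public function __construct(private string $field, private int|float $value) {}

    public function apply(array $rows): array
    {
        return array_values(array_filter(
            $rows,
            fn (array $row): bool => $row[$this->field] > $this->value,
        ));
    }
}

/** Sorts rows by $field, largest (or newest) first. */
final class SortDesc implements InMemorySpec
{
    public function __construct(private string $field) {}

    public function apply(array $rows): array
    {
        usort($rows, fn (array $a, array $b): int => $b[$this->field] <=> $a[$this->field]);

        return $rows;
    }
}

/** Applies each nested specification in order, feeding the output of one into the next. */
final class AndX implements InMemorySpec
{
    /** @var list<InMemorySpec> */
    private array $specs;

    public function __construct(InMemorySpec ...$specs)
    {
        $this->specs = $specs;
    }

    public function apply(array $rows): array
    {
        foreach ($this->specs as $spec) {
            $rows = $spec->apply($rows);
        }

        return $rows;
    }
}

$purchases = [
    ['date' => new \DateTimeImmutable('2023-01-10'), 'amount' => 50.00],
    ['date' => new \DateTimeImmutable('2023-03-05'), 'amount' => 150.00],
    ['date' => new \DateTimeImmutable('2023-05-20'), 'amount' => 300.00],
    ['date' => new \DateTimeImmutable('2021-06-01'), 'amount' => 500.00],
];

$specification = new AndX(
    new InMemoryBetween(from: new \DateTimeImmutable('2023-01-01')), // fixed date keeps the example deterministic
    new FieldGreaterThan('amount', 100.00),                            // amount > $100.00
    new SortDesc('date'),                                              // newest first
);

$result = $specification->apply($purchases);

foreach ($result as $purchase) {
    echo $purchase['date']->format('Y-m-d'), ' ', $purchase['amount'], "\n";
}
```

Keeping each rule in its own small object is what would let the same specification be handed to different backends. Each backend only has to know how to interpret the rule types it receives.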