Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The worst best idea: Doing batch with Doctrine

The worst best idea: Doing batch with Doctrine

Romain Monceau

May 24, 2018
Tweet

Other Decks in Programming

Transcript

  1. THE WORST BEST IDEA: THE WORST BEST IDEA: DOING BATCH

    WITH PHP AND DOCTRINE DOING BATCH WITH PHP AND DOCTRINE
  2. PHP HAS NOT BEEN DONE FOR BATCH PROCESSES PHP HAS

    NOT BEEN DONE FOR BATCH PROCESSES HTTP requests Timeout Memory consumption
  3. DOCTRINE MAPPING: BEER.ORM.YML DOCTRINE MAPPING: BEER.ORM.YML BeerBundle\Entity\Beer: type: entity table:

    beer repositoryClass: BeerBundle\Entity\Repository\EntityRepository indexes: searchcode_idx: columns: - code uniqueConstraints: searchunique_idx: columns: - code
  4. ENTITY REPOSITORY ENTITY REPOSITORY // BeerBundle\Entity\Repository\EntityRepository public function findOneByIdentifier(string $code)

    { $qb = $this->createQueryBuilder('c'); $qb->andWhere( $qb->expr()->eq('c.code', $qb->expr()->literal($code)) ); $results = $qb->getQuery()->execute(); // ... return $results[0]; }
  5. DOCTRINE MAPPING: BEER.ORM.YML DOCTRINE MAPPING: BEER.ORM.YML BeerBundle\Entity\Beer: type: entity table:

    beer repositoryClass: BeerBundle\Entity\Repository\EntityRepository fields: id: [...] code: [...] name: [...] description: [...] percent: [...] quotation: [...]
  6. DOCTRINE MAPPING: BEER.ORM.YML DOCTRINE MAPPING: BEER.ORM.YML BeerBundle\Entity\Beer: type: entity table:

    beer manyToOne: brewery: targetEntity: BeerBundle\Entity\Brewery joinColumn: name: brewery_id referencedColumnName: id category: targetEntity: BeerBundle\Entity\Category joinColumn: name: category_id referencedColumnName: id
  7. BEER APPLICATION IS READY! BEER APPLICATION IS READY! OH WAIT!

    WE NEED TO IMPORT DATA! OH WAIT! WE NEED TO IMPORT DATA!
  8. IMPORT BREWERIES (1000 LINES) IMPORT BREWERIES (1000 LINES) code;name;description abbey-ales;Abbey

    Ales Brewery;Abbey Ales Brewery black-sheep;Black Sheep Brewery;A black sheep brewery oakkham;Oakham Ales;The famous Oakham brewery
  9. COMMAND TO IMPORT BREWERIES COMMAND TO IMPORT BREWERIES // Read

    file content $fd = fopen($filepath, 'r+'); $headers = fgetcsv($fd, null, ';'); while ($csvRow = fgetcsv($fd, null, ';')) { $csvRow = array_combine($headers, $csvRow); // Process data row $brewery = $this->process($csvRow); // Validate brewery entity $violations = $this->getValidator()->validate($brewery); if ($violations->count() === 0) { // Persist brewery $this->getEntityManager()->persist($brewery); $this->getEntityManager()->flush(); } else { $this->printViolations($violations, $brewery); } }
  10. COMMAND TO IMPORT BREWERIES COMMAND TO IMPORT BREWERIES private function

    process(array $item) { $brewery = $this->findOrCreateBrewery($item['code']); $brewery->setName($item['name']); $brewery->setDescription($item['description']); $brewery->setAddress($item['address']); // ... return $brewery; } private function findOrCreateBrewery(string $code) { $brewery = $this->getRepository()->findOneByIdentifier($code); if (null === $brewery) { $brewery = new Brewery(); $brewery->setCode($code); } return $brewery; }
  11. LAUNCH BREWERIES IMPORT LAUNCH BREWERIES IMPORT $ bin/console batch:import:brewery breweries.csv

    Memory: 8.39M 1001 entity written Time: 23.77s - Memory: 29.36M - Diff: 20.97M $ bin/console batch:import:brewery breweries.csv --env=prod Memory: 6.29M 1001 entity written Time: 21.24s - Memory: 20.97M - Diff: 14.68M
  12. WHAT IS HAPPENING? WHAT IS HAPPENING? [...] DEBUG: SELECT [...]

    FROM brewery b0_ WHERE b0_.code = 'id-quia' DEBUG: SELECT [...] FROM brewery t0 WHERE t0.code = ? ["id-quia"] DEBUG: "START TRANSACTION" [] [] DEBUG: INSERT INTO brewery [...] [] DEBUG: "COMMIT" [] [] DEBUG: SELECT [...] FROM brewery b0_ WHERE b0_.code LIKE 'abbey-ales' DEBUG: SELECT [...] FROM brewery t0 WHERE t0.code = ? ["abbey-ales"] DEBUG: "START TRANSACTION" [] [] DEBUG: INSERT INTO brewery [...] [] DEBUG: "COMMIT" [] [] [...]
  13. INCREASE BATCH SIZE TO 100 INCREASE BATCH SIZE TO 100

    // Write entity $this->getEntityManager()->persist($entity); if (0 === ++$writeCount % 100) { $this->getEntityManager()->flush(); }
  14. INCREASE BATCH SIZE TO 100 INCREASE BATCH SIZE TO 100

    $ bin/console batch:import:brewery breweries.csv --env=prod [Doctrine\DBAL\Exception\UniqueConstraintViolationException] [Doctrine\DBAL\Driver\PDOException] [PDOException] SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'id-quia' for key 'searchunique_idx'
  15. REWORK VALIDATOR REWORK VALIDATOR class UniqueEntityCodeValidator extends ConstraintValidator { private

    $codeSet = []; public function validate($entity, Constraint $constraint) { if (isset($this->codeSet[$entity->getCode()])) { $this->context ->buildViolation($constraint->message) ->setParameter('%unique_code%', $entity->getCode()) ->addViolation(); return; } $this->codeSet[$entity->getCode()] = $entity->getCode(); } }
  16. RELAUNCH THE IMPORT! RELAUNCH THE IMPORT! BATCH SIZE 1 BATCH

    SIZE 1 BATCH SIZE = 100 BATCH SIZE = 100 Memory: 6.29M 1001 entity written Time: 21.24s - Memory: 20.97M - Diff: 14.68M Memory: 6.29M Entity "id-quia" not valid: The value "id-quia" is already set in another entity 1000 entity written Time: 1.95s - Memory: 20.97M - Diff: 14.68M
  17. UPDATE BREWERIES UPDATE BREWERIES $ bin/console batch:import:brewery breweries_update.csv --env=prod Memory:

    6.29M Entity "id-quia" not valid: This value is too long. It should have 100 characters or less. 0 entity written [Doctrine\DBAL\Exception\DriverException] [Doctrine\DBAL\Driver\PDOException] [PDOException] SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'name' at row 1
  18. CHANGE TRACKING POLICY CHANGE TRACKING POLICY "It is the process

    of determining what has changed in managed entities since the last time they were synchronized with the db" http://docs.doctrine-project.org/projects/doctrine- orm/en/latest/reference/change-tracking-policies.html
  19. DEFERRED EXPLICIT IMPLEMENTATION DEFERRED EXPLICIT IMPLEMENTATION BeerBundle\Entity\Brewery: type: entity table:

    brewery repositoryClass: BeerBundle\Entity\Repository\EntityRepository changeTrackingPolicy: DEFERRED_EXPLICIT
  20. RELAUNCH THE JOB RELAUNCH THE JOB $ bin/console batch:import:brewery breweries_update.csv

    --env=prod Memory: 6.29M Entity "id-quia" not valid: This value is too long. It should have 100 characters or less. 0 entity written Time: 0.05s - Memory: 14.68M - Diff: 8.39M
  21. IMPORT BEERS (10000 LINES) IMPORT BEERS (10000 LINES) code;name;description;brewery;category kronenbourg;Kronenbourg;A

    beer to piss;kronenbourg_brewery;piss stella;Stella Artois;A beer that does not respect its country;Artois;pils 33_export;33 Export;The best competitor of Kronenbourg;heineken;piss
  22. IMPORT BEERS (10000 LINES) IMPORT BEERS (10000 LINES) IMPORT BEERS

    (20000 LINES) IMPORT BEERS (20000 LINES) $ bin/console batch:import:beer beers_10k.csv --env=prod Memory: 6.29M 10000 entity written Time: 32.3s - Memory: 92.27M - Diff: 85.98M $ bin/console batch:import:beer beers_20k.csv --env=prod Memory: 6.29M PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)
  23. EXPLANATION MEMORY LEAK PROBLEMS EXPLANATION MEMORY LEAK PROBLEMS Increase memory

    usage GC is run more often... and for nothing! Decrease performance
  24. EXTENSION PHP-MEMINFO EXTENSION PHP-MEMINFO Give insight about PHP memory content

    Free & open source PHP extension https://github.com/BitOne/php-meminfo
  25. PHP-MEMINFO: HOW IT WORKS? PHP-MEMINFO: HOW IT WORKS? 1. Dump

    memory content 2. Analyze content (summary) 3. Analyze memory for an object type 4. Analyze memory path for a speci c object
  26. DUMP MEMORY CONTENT DUMP MEMORY CONTENT // EntityWriter $this->em->flush(); meminfo_info_dump(fopen('/tmp/doctrine_batch.log',

    'w')); { "header": { "memory_usage":89735424, "memory_usage_real":92274688, "peak_memory_usage":90107432, "peak_memory_usage_real":92274688 }, "items": { "0x7f64be256100" : { "type" : "array", "size" : "72", "symbol_name" : "_GET", "is_root" : true, "frame" : "BatchBundle\\Job\\Job->execute()", "children" : {} }, ... } }
  27. SUMMARIZE CONTENT SUMMARIZE CONTENT Type Instances Count BeerBundle\Entity\Brewery 21000 BeerBundle\Entity\Category

    20139 Doctrine\ORM\Query\ResultSetMapping 11145 Doctrine\ORM\Query\ParserResult 11139 Doctrine\ORM\Query\Exec\SingleSelectExecutor 11139 BeerBundle\Entity\Beer 10000 $ bin/analyzer summary /tmp/doctrine_batch.log
  28. ANALYZE MEMORY FOR AN OBJECT TYPE ANALYZE MEMORY FOR AN

    OBJECT TYPE Item ids Item data Children 0x7f64ab801028 Type: object Class: BeerBundle\Entity\Beer [...] id: 0x7f64a8d354e0 code: 0x7f64a8d35500 [...] brewery: 0x7f64a8d355a0 category: 0x7f64a8d355c0 $ bin/analyzer query -f "class=BeerBundle\\Entity\\Beer" -v /tmp/doctrine_batch.log
  29. ANALYZE MEMORY PATH FOR A SPECIFIC OBJECT ANALYZE MEMORY PATH

    FOR A SPECIFIC OBJECT $ bin/analyzer ref-path /tmp/doctrine_batch.log 0x7f64ab801028 Found 3 paths 0x7f64ab801028 ↑ 1 (0x7f64afa4a920) ↑ BeerBundle\Entity\Beer (0x7f64a8c65c80) ↑ identityMap (0x7f64a8c6d320) ↑ unitOfWork (0x7f64afdbe7c0) ↑ doctrine.orm.default_entity_manager (0x7f64af9c08e0) ↑ services (0x7f64af9c1280) ↑ container (0x7f64be256280)
  30. THE IDENTITY MAP THE IDENTITY MAP Stores references to all

    managed entities Entities are grouped by their class name Avoid entity duplication $this->identityMap[$className][$idHash] = $entity; $breweryA = $breweryRepo->findOneBy(['code' => 'id-quia']); $breweryB = $breweryRepo->findOneBy(['name' => 'Dietrich Group']); $this->assertSame($breweryA, $breweryB);
  31. DETACH VERSUS CLEAR DETACH VERSUS CLEAR 3 ways to remove

    from the UOW: $em->detach($entity); $em->clear('BeerBundle\Entiy\Beer'); $em->clear();
  32. DETACH ENTITIES DETACH ENTITIES PREVIOUSLY: PREVIOUSLY: NOW: NOW: $this->em->flush(); $this->em->clear('BeerBundle\Entity\Beer');

    // or loop on items and call detach foreach ($items as $item) { $this->em->detach($item); } $ bin/console batch:import:beer beers_10k.csv --env=prod Time: 21.95s - Memory: 92.27M - Diff: 85.98M $ bin/console batch:import:beer beers_10k.csv --env=prod Time: 21.89s - Memory: 69.21M - Diff: 62.92M
  33. ANALYZE MEMORY ANALYZE MEMORY Type Instances Count Doctrine\ORM\Query\ResultSetMapping 11145 Doctrine\ORM\Query\ParserResult

    11139 Doctrine\ORM\Query\Exec\SingleSelectExecutor 11139 BeerBundle\Entity\Brewery 1000 BeerBundle\Entity\Category 139
  34. DETACH ENTITIES DETACH ENTITIES Categories and Breweries are not detached

    No cascade detach (ManyToOne) Not the same lifecycle
  35. CLEAR UOW CLEAR UOW PREVIOUSLY: PREVIOUSLY: NOW: NOW: $this->em->flush(); $this->em->clear();

    $ bin/console batch:import:beer beers_10k.csv --env=prod Time: 21.89s - Memory: 69.21M - Diff: 62.92M $ bin/console batch:import:beer beers_10k.csv --env=prod Time: 22.16s - Memory: 67.11M - Diff: 60.82M
  36. LET'S HUNT LAST MEMORY LEAKS LET'S HUNT LAST MEMORY LEAKS

    SingleSelectExecutor ResultSetMapping ParserResult
  37. DEDICATED REPOSITORY DEDICATED REPOSITORY public function findOneByIdentifier(string $code) { $qb

    = $this->createQueryBuilder('c'); $qb->andWhere( $qb->expr()->eq('c.code', $qb->expr()->literal($code)) ); $results = $qb->getQuery()->execute(); //... return $results[0]; }
  38. HOW TO QUERY ENTITIES?! HOW TO QUERY ENTITIES?! BAD CODE:

    BAD CODE: GOOD CODE: GOOD CODE: $qb = $this->createQueryBuilder('c'); $qb->andWhere( $qb->expr()->eq('c.code', $qb->expr()->literal($code)) ); $results = $qb->getQuery()->execute(); SELECT c FROM BeerBundle\Entity\Beer c WHERE c.code = 'chaquip-beer' $qb = $this->createQueryBuilder('c'); $qb->andWhere( $qb->expr()->eq('c.code', ':code') ); $qb->setParameter('code', $code); $results = $qb->getQuery()->execute(); SELECT c FROM BeerBundle\Entity\Beer c WHERE c.code = :code
  39. HOW TO QUERY ENTITIES?! HOW TO QUERY ENTITIES?! ONLY WORK

    ON A SINGLE TABLE ONLY WORK ON A SINGLE TABLE public function findOneByIdentifier(string $code) { return $this->findOneBy(['code' => $code]); }
  40. RELAUNCH IMPORT RELAUNCH IMPORT PREVIOUSLY: PREVIOUSLY: NOW: NOW: $ bin/console

    batch:import:beer beers_10k.csv --env=prod Time: 22.16s - Memory: 67.11M - Diff: 60.82M $ bin/console batch:import:beer beers_10k.csv --env=prod Memory: 6.29M Time: 15.9s - Memory: 14.68M - Diff: 8.39M
  41. RELAUNCH WITH 1M BEERS RELAUNCH WITH 1M BEERS $ bin/console

    batch:import:beer beers_1M.csv --env=prod Memory: 6.29M Time: 1573.68s - Memory: 14.68M - Diff: 8.39M
  42. COMMAND TO EXPORT BEERS COMMAND TO EXPORT BEERS $filepath =

    $input->getArgument('filepath'); $fd = fopen($filepath, 'a+'); $headers = ['code', 'name', 'description', 'percent', 'brewery', 'category']; fputcsv($fd, $headers, ';'); $beers = $this->getBeerRepository()->findAll(); foreach ($beers as $beer) { $csvRow = $this->process($beer); fputcsv($fd, $csvRow, ';'); } fclose($fd);
  43. COMMAND TO EXPORT BEERS COMMAND TO EXPORT BEERS private function

    process(Beer $beer) { $data = []; $data['code'] = $beer->getCode(); $data['name'] = $beer->getName(); $data['description'] = $beer->getDescription(); $data['percent'] = $beer->getPercent(); $data['quotation'] = $beer->getQuotation(); $data['brewery'] = $beer->getBrewery()->getCode(); $data['category'] = $beer->getCategory()->getCode(); return $data; }
  44. LAUNCH THE EXPORT (ON 10K BEERS) LAUNCH THE EXPORT (ON

    10K BEERS) Type Instances Count Proxies\__CG__\BeerBundle\Entity\Brewery 21000 Proxies\__CG__\BeerBundle\Entity\Category 20139 BeerBundle\Entity\Beer 10000 $ bin/console batch:export:beer beers.csv --env=prod Memory: 6.29M Time: 0.83s - Memory: 46.14M - Diff: 39.85M
  45. CLEAR THE UOW CLEAR THE UOW Type Instances Count Proxies\__CG__\BeerBundle\Entity\Brewery

    1000 Proxies\__CG__\BeerBundle\Entity\Category 139 $beers = $this->getBeerRepository()->findAll(); $this->getEntityManager()->clear(); foreach ($beers as $beer) { bin/console batch:export:beer beers.csv --env=prod Memory: 6.29M Time: 0.83s - Memory: 46.14M - Diff: 39.85M
  46. WHAT ABOUT A 1M BEERS EXPORT?! WHAT ABOUT A 1M

    BEERS EXPORT?! FIND ALL FIND ALL
  47. HOW TO DO SQL QUERY ON HUGE TABLE?! HOW TO

    DO SQL QUERY ON HUGE TABLE?! Read offset/limit Database cursor Search after
  48. READ OFFSET/LIMIT READ OFFSET/LIMIT mysql> SELECT * FROM beer; 1611768

    rows in set (3,92 sec) mysql> SELECT * FROM beer LIMIT 0, 100; 100 rows in set (0,00 sec) mysql> SELECT * FROM beer LIMIT 100000, 100; 100 rows in set (0,09 sec) mysql> SELECT * FROM beer LIMIT 1000000, 100; 100 rows in set (0,83 sec)
  49. DATABASE CURSOR DATABASE CURSOR $em->getConnection() ->getWrappedConnection() ->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); $qb =

    $this->repository->createQueryBuilder('b'); while ($result = $qb->getQuery()->iterate()) { // Do something... if (++$this->readCount % 100 === 0) $this->getEntityManager()->clear(); }
  50. DATABASE CURSOR DATABASE CURSOR bin/console batch:export:beer beers.csv --env=prod Memory: 6.29M

    Time: 112.78s - Memory: 14.68M - Diff: 8.39M [PDOException] SQLSTATE[HY000]: [...]Cannot execute queries while other unbuffered queries are active
  51. SEARCH AFTER SEARCH AFTER mysql> SELECT * FROM beer WHERE

    id > 0 ORDER BY id ASC LIMIT 100; 100 rows in set (0,00 sec) mysql> SELECT * FROM beer WHERE id > 100000 ORDER BY id ASC LIMIT 100; 100 rows in set (0,00 sec) mysql> SELECT * FROM beer WHERE id > 1000000 ORDER BY id ASC LIMIT 100; 100 rows in set (0,00 sec)
  52. SEARCH AFTER SEARCH AFTER private function getResults($id = 0) {

    $qb = $this->repository->createQueryBuilder('b'); $qb ->where($qb->expr()->gt('b.id', ':id')) ->orderBy('b.id') ->setMaxResults(100) ->setParameter('id', $id); $this->results = $qb->getQuery()->execute(); } $ bin/console batch:export:beer beers.csv --env=prod Memory: 6.29M Time: 132.91s - Memory: 14.68M - Diff: 8.39M
  53. IN A NUTSHELL IN A NUTSHELL Find the right moment

    to ush Rework validation Change Tracking Policy Clear UOW Take care at your queries