How to write well designed imports with Symfony

Importing data into an application is a common task across many fields. Because imports often run in the background, they tend to be forgotten until they break or no longer meet requirements. Let's bring these import scripts out of the shadows and shed some light on how to improve them.

In this talk we will look at a few ways to write imports, from bare-bones scripts to an elaborate import domain that uses all the bells and whistles modern frameworks provide. We will discuss what to look out for when designing and improving imports, and build a checklist of things to consider before getting started. Hopefully, by the end of the talk you will be motivated to look at the imports hidden away in your projects and improve them before things break, to write better imports in the future, or you will at least have the good feeling that you are already on the right track.

Denis Brumann

October 24, 2019

Transcript

  1. How to write well designed imports with Symfony Denis Brumann

    SensioLabs Deutschland
  2. Denis Brumann @dbrumann denis.brumann@sensiolabs.de iSAQB® Certified Professional for Software Architecture

    Symfony 4 Certified Developer
  3. Background JSR-352

  4. Requirements / What do I mean by "well designed"?

    testable, modifiable, reusable, fast, memory efficient
  5. Source

  6. Import: name.basics.tsv, MySQL, ~ 9.5 million lines

  7. Input

    nconst: nm0004813
    primaryName: Nancy Cartwright
    birthYear: 1957
    deathYear: \N
    primaryProfession: actress, soundtrack, producer
    knownForTitles: tt0096697, tt0089153, tt0120685, tt0462538
  8. READ, PROCESS, WRITE

  9. @dbrumann

  10. public function execute(ReadContext $readContext, WriteContext $writeContext)
      {
          $reader = $this->reader->open($readContext);
          $writer = $this->writer->open($writeContext);
          $count = 0;
          $items = [];
          $generator = $reader->read();
          while ($generator->valid()) {
              $item = $generator->current();
              $items[] = $this->processor->process($item);
              ++$count;
              $generator->next();
              if (($count % $this->writeInterval) === 0) {
                  $writer->write($items);
                  $items = [];
              }
          }
          $writer->write($items);
      }
  13. Reader

      interface Reader
      {
          /**
           * @return Reader Returns an opened Reader-instance
           *                that you can read from.
           */
          public function open(ReadContext $context): Reader;

          public function read(): Generator;

          /**
           * Counts the number of processable items based on
           * the current file and line position.
           */
          public function count(): int;
      }
  14. TsvReader

      public function open(ReadContext $context): Reader
      {
          $reader = clone $this;
          $reader->file = new SplFileObject($context->filename(), 'r', false);
          $reader->file->setFlags(
              SplFileObject::DROP_NEW_LINE
              | SplFileObject::READ_AHEAD
              | SplFileObject::SKIP_EMPTY
              | SplFileObject::READ_CSV
          );
          $reader->file->setCsvControl("\t");
          $reader->linePosition = $context->linePosition();
          $reader->readLimit = $context->readLimit();

          return $reader;
      }
  15. TsvReader

      public function read(): Generator
      {
          if ($this->file === null) {
              throw new \RuntimeException('No file opened! Please call open() first.');
          }
          $count = 0;
          $this->file->seek($this->linePosition);
          while ($this->file->valid()
              && ($this->readLimit === null || $count < $this->readLimit)
          ) {
              yield $count => $this->file->current();
              ++$count;
              $this->file->next();
          }
      }
  16. TsvReaderTest

      public function test_reading_names_full(): void
      {
          $reader = new TsvReader();
          $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 0);
          $openedReader = $reader->open($context);
          $generator = $openedReader->read();
          $rows = [];
          while ($generator->valid()) {
              $rows[] = $generator->current();
              $generator->next();
          }
          self::assertCount(3, $rows);
          self::assertSame('nconst', $rows[0][0]);
          self::assertSame('nm0000001', $rows[1][0]);
          self::assertSame('nm0000002', $rows[2][0]);
      }
  17. TsvReaderTest

      public function test_reading_names_skip_headers(): void
      {
          $reader = new TsvReader();
          $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1);
          $openedReader = $reader->open($context);
          $generator = $openedReader->read();
          $rows = [];
          while ($generator->valid()) {
              $rows[] = $generator->current();
              $generator->next();
          }
          self::assertCount(2, $rows);
          self::assertSame('nm0000001', $rows[0][0]);
          self::assertSame('nm0000002', $rows[1][0]);
      }
  18. TsvReaderTest

      public function test_read_limit_skip_headers(): void
      {
          $reader = new TsvReader();
          $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1, 1);
          $openedReader = $reader->open($context);
          $generator = $openedReader->read();
          $rows = [];
          while ($generator->valid()) {
              $rows[] = $generator->current();
              $generator->next();
          }
          self::assertCount(1, $rows);
          self::assertSame('nm0000001', $rows[0][0]);
      }
  19. Reader

    testable, modifiable, reusable, fast, memory efficient

  20. Processor

      public function process($item)
      {
          if ($item[2] === '' || $item[2] === '\N') {
              $birthYear = null;
          } else {
              $birthYear = (int) $item[2];
          }
          if ($item[3] === '' || $item[3] === '\N') {
              $deathYear = null;
          } else {
              $deathYear = (int) $item[3];
          }

          return new Person($item[0], $item[1], $birthYear, $deathYear);
      }
  21. Processor

    testable, modifiable, reusable, fast, memory efficient

  22. Writer

      public function write(iterable $items): void
      {
          if ($this->entityManager === null) {
              throw new \RuntimeException('No EntityManager. Please call open().');
          }
          $count = 0;
          foreach ($items as $item) {
              $this->entityManager->persist($item);
              ++$count;
              if (($count % $this->batchSize) === 0) {
                  $this->entityManager->flush();
                  $this->entityManager->clear();
              }
          }
          $this->entityManager->flush();
          $this->entityManager->clear();
      }
  23. Writer

    testable, modifiable, reusable, fast, memory efficient

  24. Import took 1h 3m 23s using 12.58MB

  25. PARTITIONING

  26. PartitionManager

      while ($range < $totalItemCount) {
          if (count($this->processes) >= $this->processLimit) {
              sleep(2);
              foreach ($this->processes as $index => $process) {
                  if (!$process->isRunning()) {
                      unset($this->processes[$index]);
                  }
              }
          }
          if (count($this->processes) < $this->processLimit) {
              $process = new Process(['php', 'bin/console', $command, '--amount', (string) $partitionSize, (string) $offset]);
              $process->start();
              // Track the process on the manager so the limit check above sees it.
              $this->processes[] = $process;
              $offset += $partitionSize;
              $range += $partitionSize;
          }
      }
  28. Import took 28m 20s using 10.49MB

  29. PARTITIONING

  30. SYMFONY MESSENGER

  31. send

  32. receive

  33. send receive

  34. BETTER PARTITIONING

  35. Queue

  36. Queue

  37. Consuming

      Usage:
        messenger:consume [options] [--] [<receivers>...]
        messenger:consume-messages

      Arguments:
        receivers                        Names of the receivers/transports to consume in order of priority [default: ["async"]]

      Options:
        -l, --limit=LIMIT                Limit the number of received messages
        -m, --memory-limit=MEMORY-LIMIT  The memory limit the worker can consume
        -t, --time-limit=TIME-LIMIT      The time limit in seconds the worker can run
            --sleep=SLEEP                Seconds to sleep before asking for new messages after no messages were found [default: 1]
        -b, --bus=BUS                    Name of the bus to which received messages should be dispatched (if not passed, bus is determined automatically)
  38. Pros

    testable, modifiable, reusable, fast, memory efficient, scalable

  39. Cons

    complex infrastructure needs harder to understand concurrency & side effects not easily retrofitted
  40. Thank You Denis Brumann SensioLabs Deutschland
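
The read, process, write loop from slides 10 to 22 can be tried out without Symfony or Doctrine. What follows is a minimal sketch rather than the code from the talk: `readRows()`, `processRow()`, `import()` and the callback writer are hypothetical stand-ins for the Reader, Processor and Writer, plain arrays replace the `Person` entity, and the two "Example Person" rows are made-up sample data.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical reader: yields one parsed row at a time, so only the current
 * row is held in memory. $offset skips leading lines (e.g. the header).
 */
function readRows(iterable $lines, int $offset = 0, ?int $limit = null): Generator
{
    $position = 0;
    $count = 0;
    foreach ($lines as $line) {
        if ($position++ < $offset) {
            continue;
        }
        if ($limit !== null && $count >= $limit) {
            return;
        }
        yield explode("\t", $line);
        ++$count;
    }
}

/** Hypothetical processor: maps a raw row to typed values; '\N' marks a missing value. */
function processRow(array $row): array
{
    $toYear = static fn (string $value): ?int => ($value === '' || $value === '\N') ? null : (int) $value;

    return [
        'id' => $row[0],
        'name' => $row[1],
        'birthYear' => $toYear($row[2]),
        'deathYear' => $toYear($row[3]),
    ];
}

/**
 * Processes rows and hands them to $write in chunks of $writeInterval items.
 * $write is a hypothetical callback standing in for the Writer.
 */
function import(iterable $rows, callable $write, int $writeInterval): int
{
    $items = [];
    $total = 0;
    foreach ($rows as $row) {
        $items[] = processRow($row);
        ++$total;
        if (count($items) === $writeInterval) {
            $write($items);
            $items = [];
        }
    }
    if ($items !== []) {
        $write($items); // flush the final, partial chunk
    }

    return $total;
}

// Sample input: a header line plus three data lines (the last two are made up).
$lines = [
    implode("\t", ['nconst', 'primaryName', 'birthYear', 'deathYear']),
    implode("\t", ['nm0004813', 'Nancy Cartwright', '1957', '\N']),
    implode("\t", ['nm0000001', 'Example Person A', '1900', '1980']),
    implode("\t", ['nm0000002', 'Example Person B', '', '\N']),
];

// Collect the written chunks in memory instead of persisting them.
$chunks = [];
$total = import(
    readRows($lines, 1),
    static function (array $items) use (&$chunks): void {
        $chunks[] = $items;
    },
    2
);

printf("%d rows written in %d chunks\n", $total, count($chunks));
```

Because the reader is a generator and the loop only buffers `$writeInterval` items at a time, memory use stays roughly constant regardless of file size. Swapping the callback for a database-backed writer would not require changes to the reader or the processor.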