Slide 1

How to Write Well-Designed Imports with Symfony
Denis Brumann, SensioLabs Deutschland

Slide 2

Denis Brumann
@dbrumann
denis.brumann@sensiolabs.de
iSAQB® Certified Professional for Software Architecture
Symfony 4 Certified Developer

Slide 3

Background: JSR-352 (Batch Applications for the Java Platform)
@dbrumann

Slide 4

Requirements: what do I mean by "well-designed"?
- testable
- modifiable
- reusable
- fast
- memory efficient

Slide 5

Source

Slide 6

Import: name.basics.tsv into MySQL (~9.5 million lines)

Slide 7

Input (name.basics.tsv, tab-separated)
nconst: nm0004813
primaryName: Nancy Cartwright
birthYear: 1957
deathYear: \N
primaryProfession: actress, soundtrack, producer
knownForTitles: tt0096697, tt0089153, tt0120685, tt0462538
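
The row above packs two comma-separated lists into single tab-separated columns. A minimal PHP sketch of splitting such a line, assuming tab-separated columns, the literal `\N` for missing values and list entries separated by commas without spaces; `parseNameRow()` is a hypothetical helper, not part of the talk's code:

```php
<?php

/**
 * Hypothetical helper: split one raw line of name.basics.tsv into named fields.
 * Assumes tab-separated columns, "\N" for missing values and
 * comma-separated lists in the last two columns.
 */
function parseNameRow(string $line): array
{
    // An empty escape character disables PHP's non-standard backslash escaping,
    // so the literal "\N" survives untouched.
    $fields = str_getcsv($line, "\t", '"', '');

    [$nconst, $primaryName, $birthYear, $deathYear, $professions, $titles] = $fields;

    return [
        'nconst' => $nconst,
        'primaryName' => $primaryName,
        'birthYear' => $birthYear === '\N' ? null : (int) $birthYear,
        'deathYear' => $deathYear === '\N' ? null : (int) $deathYear,
        'primaryProfession' => $professions === '\N' ? [] : explode(',', $professions),
        'knownForTitles' => $titles === '\N' ? [] : explode(',', $titles),
    ];
}

// "\\N" inside double quotes is the two characters backslash and N.
$line = "nm0004813\tNancy Cartwright\t1957\t\\N\tactress,soundtrack,producer\ttt0096697,tt0089153,tt0120685,tt0462538";
$row = parseNameRow($line);
var_dump($row['deathYear']); // NULL
```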

Slide 8

READ → PROCESS → WRITE

Slide 9


Slide 10

```php
public function execute(ReadContext $readContext, WriteContext $writeContext)
{
    // Open configured copies of the reader and writer for this run.
    $reader = $this->reader->open($readContext);
    $writer = $this->writer->open($writeContext);

    $count = 0;
    $items = [];
    $generator = $reader->read();
    while ($generator->valid()) {
        $item = $generator->current();
        $items[] = $this->processor->process($item);
        ++$count;
        $generator->next();

        // Hand a full chunk to the writer every $writeInterval items.
        if (($count % $this->writeInterval) === 0) {
            $writer->write($items);
            $items = [];
        }
    }

    // Write the last, possibly partial chunk (skip it if nothing is left).
    if ($items !== []) {
        $writer->write($items);
    }
}
```
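
The chunking in `execute()` can be exercised on its own. The sketch below is a simplified stand-in, not the talk's implementation: a generator replaces the Reader, two closures replace the Processor and Writer, and `runChunked()` is a hypothetical name. It uses `foreach`, which drives the generator the same way as the manual `valid()`/`current()`/`next()` calls.

```php
<?php

/**
 * Hypothetical, self-contained version of the chunked read-process-write loop.
 * Items are processed one at a time but written in chunks of $writeInterval.
 */
function runChunked(iterable $source, callable $process, callable $write, int $writeInterval): void
{
    $items = [];
    foreach ($source as $item) {
        $items[] = $process($item);
        // Hand over a full chunk and start collecting the next one.
        if (count($items) === $writeInterval) {
            $write($items);
            $items = [];
        }
    }
    // Write the remainder, skipping the call if nothing is left.
    if ($items !== []) {
        $write($items);
    }
}

// Usage: a generator stands in for the Reader, closures for Processor and Writer.
$source = (function () {
    for ($i = 1; $i <= 7; ++$i) {
        yield $i;
    }
})();

$chunkSizes = [];
runChunked(
    $source,
    fn (int $n) => $n * 10,
    function (array $chunk) use (&$chunkSizes): void {
        $chunkSizes[] = count($chunk);
    },
    3
);
```

With seven items and a write interval of three, the writer receives chunks of 3, 3 and 1 items. Only one chunk of processed items is held in memory at any time, regardless of the file size.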

Slide 13

Reader

```php
interface Reader
{
    /**
     * @return Reader Returns an opened Reader instance
     *                that you can read from.
     */
    public function open(ReadContext $context): Reader;

    public function read(): Generator;

    /**
     * Counts the number of processable items based on
     * the current file and line position.
     */
    public function count(): int;
}
```
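
Because the step only depends on this interface, a file-free implementation can stand in for `TsvReader` in tests. A minimal sketch (PHP 8.0+): the `ReadContext` shape is an assumption inferred from the constructor calls in the tests further on, and `ArrayReader` is hypothetical.

```php
<?php

// Assumed shape of ReadContext, inferred from the constructor calls in the
// tests (filename, line position, optional read limit).
final class ReadContext
{
    public function __construct(
        private string $filename,
        private int $linePosition,
        private ?int $readLimit = null
    ) {
    }

    public function filename(): string
    {
        return $this->filename;
    }

    public function linePosition(): int
    {
        return $this->linePosition;
    }

    public function readLimit(): ?int
    {
        return $this->readLimit;
    }
}

// The Reader contract from the slide (docblocks omitted).
interface Reader
{
    public function open(ReadContext $context): Reader;

    public function read(): Generator;

    public function count(): int;
}

// Hypothetical in-memory Reader, usable as a test double without a file system.
final class ArrayReader implements Reader
{
    private int $linePosition = 0;
    private ?int $readLimit = null;

    public function __construct(private array $lines)
    {
    }

    public function open(ReadContext $context): Reader
    {
        // Same pattern as TsvReader::open(): configure a copy, leave $this untouched.
        $reader = clone $this;
        $reader->linePosition = $context->linePosition();
        $reader->readLimit = $context->readLimit();

        return $reader;
    }

    public function read(): Generator
    {
        yield from $this->remaining();
    }

    public function count(): int
    {
        return count($this->remaining());
    }

    /** Lines from the current position, honouring the read limit, re-indexed from 0. */
    private function remaining(): array
    {
        return array_slice($this->lines, $this->linePosition, $this->readLimit);
    }
}

$reader = new ArrayReader(['nconst', 'nm0000001', 'nm0000002']);
$opened = $reader->open(new ReadContext('unused.tsv', 1, 1));
$rows = iterator_to_array($opened->read());
```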

Slide 14

TsvReader

```php
public function open(ReadContext $context): Reader
{
    // Configure a copy, so the original instance is not modified.
    $reader = clone $this;
    $reader->file = new SplFileObject($context->filename(), 'r', false);
    $reader->file->setFlags(
        SplFileObject::DROP_NEW_LINE
        | SplFileObject::READ_AHEAD
        | SplFileObject::SKIP_EMPTY
        | SplFileObject::READ_CSV
    );
    // A tab as delimiter turns the CSV parser into a TSV parser.
    $reader->file->setCsvControl("\t");
    $reader->linePosition = $context->linePosition();
    $reader->readLimit = $context->readLimit();

    return $reader;
}
```

Slide 15

TsvReader

```php
public function read(): Generator
{
    if ($this->file === null) {
        throw new \RuntimeException('No file opened! Please call open() first.');
    }

    $count = 0;
    // Jump to the start line, e.g. 1 to skip the header row.
    $this->file->seek($this->linePosition);
    while ($this->file->valid()
        && ($this->readLimit === null || $count < $this->readLimit)
    ) {
        // Yield one parsed row at a time instead of loading the whole file.
        yield $count => $this->file->current();
        ++$count;
        $this->file->next();
    }
}
```
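
The same skip-then-limit logic can be sketched without a file on disk. This version swaps `SplFileObject` for plain `fgetcsv()` on a `php://memory` stream, skips rows by counting instead of calling `seek()`, and treats blank lines like `SKIP_EMPTY` (they are neither yielded nor counted). `readTsv()` is a hypothetical name and the sample rows are made up.

```php
<?php

/**
 * Hypothetical sketch of TsvReader::read() built on fgetcsv() instead of
 * SplFileObject: skip $linePosition rows, then yield at most $readLimit rows.
 *
 * @param resource $handle an open, readable stream
 */
function readTsv($handle, int $linePosition = 0, ?int $readLimit = null): Generator
{
    $line = 0;
    $count = 0;
    while ($readLimit === null || $count < $readLimit) {
        // Empty escape string: keep backslashes such as "\N" literally.
        $row = fgetcsv($handle, 0, "\t", '"', '');
        if ($row === false) {
            return; // end of stream
        }
        if ($row === [null]) {
            continue; // blank line, skipped like SplFileObject::SKIP_EMPTY
        }
        if ($line++ < $linePosition) {
            continue; // still moving towards the start position
        }
        yield $count++ => $row;
    }
}

$handle = fopen('php://memory', 'r+');
fwrite($handle, "nconst\tprimaryName\nnm0000001\tPerson A\nnm0000002\tPerson B\n");
rewind($handle);

// Skip the header, read a single row.
$limited = iterator_to_array(readTsv($handle, 1, 1));

// Skip the header, read everything else.
rewind($handle);
$all = iterator_to_array(readTsv($handle, 1));
```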

Slide 16

TsvReaderTest

```php
public function test_reading_names_full(): void
{
    $reader = new TsvReader();
    $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 0);
    $openedReader = $reader->open($context);

    $generator = $openedReader->read();
    $rows = [];
    while ($generator->valid()) {
        $rows[] = $generator->current();
        $generator->next();
    }

    self::assertCount(3, $rows);
    self::assertSame('nconst', $rows[0][0]);
    self::assertSame('nm0000001', $rows[1][0]);
    self::assertSame('nm0000002', $rows[2][0]);
}
```

Slide 17

TsvReaderTest

```php
public function test_reading_names_skip_headers(): void
{
    $reader = new TsvReader();
    $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1);
    $openedReader = $reader->open($context);

    $generator = $openedReader->read();
    $rows = [];
    while ($generator->valid()) {
        $rows[] = $generator->current();
        $generator->next();
    }

    self::assertCount(2, $rows);
    self::assertSame('nm0000001', $rows[0][0]);
    self::assertSame('nm0000002', $rows[1][0]);
}
```

Slide 18

TsvReaderTest

```php
public function test_read_limit_skip_headers(): void
{
    $reader = new TsvReader();
    $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1, 1);
    $openedReader = $reader->open($context);

    $generator = $openedReader->read();
    $rows = [];
    while ($generator->valid()) {
        $rows[] = $generator->current();
        $generator->next();
    }

    self::assertCount(1, $rows);
    self::assertSame('nm0000001', $rows[0][0]);
}
```
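
All three tests drain the generator with the same `while` loop. A hypothetical helper built on `iterator_to_array()` could replace it; passing `false` as the second argument discards the generator's keys, so the result is always a 0-indexed list:

```php
<?php

/**
 * Hypothetical test helper: drain a generator into a 0-indexed list.
 */
function collectRows(Generator $generator): array
{
    return iterator_to_array($generator, false);
}

// Usage with a stand-in for $openedReader->read():
$generator = (function () {
    yield ['nconst'];
    yield ['nm0000001'];
    yield ['nm0000002'];
})();

$rows = collectRows($generator);
```

In the tests, `$rows = collectRows($openedReader->read());` would then precede the unchanged assertions.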

Slide 19

Reader
- testable
- modifiable
- reusable
- fast
- memory efficient

Slide 20

Processor

```php
public function process($item)
{
    // Missing values appear as an empty string or the literal \N.
    if ($item[2] === '' || $item[2] === '\N') {
        $birthYear = null;
    } else {
        $birthYear = (int) $item[2];
    }

    if ($item[3] === '' || $item[3] === '\N') {
        $deathYear = null;
    } else {
        $deathYear = (int) $item[3];
    }

    // Columns used: nconst, primaryName, birthYear, deathYear.
    return new Person($item[0], $item[1], $birthYear, $deathYear);
}
```
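
The two `if`/`else` blocks apply the same rule to different columns. A sketch of pulling that rule into one function; `parseNullableYear()` is a hypothetical name, and the behaviour (an empty string or `\N` becomes `null`, anything else is cast to `int`) is copied from the slide:

```php
<?php

/**
 * Hypothetical helper extracted from process().
 * '\N' in single quotes is the two characters backslash and N,
 * which is what the TSV file contains for missing values.
 */
function parseNullableYear(string $value): ?int
{
    if ($value === '' || $value === '\N') {
        return null;
    }

    return (int) $value;
}

$item = ['nm0004813', 'Nancy Cartwright', '1957', '\N'];
$birthYear = parseNullableYear($item[2]);
$deathYear = parseNullableYear($item[3]);
// new Person($item[0], $item[1], $birthYear, $deathYear) would follow here.
```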

Slide 21

Processor
- testable
- modifiable
- reusable
- fast
- memory efficient

Slide 22

Writer

```php
public function write(iterable $items): void
{
    if ($this->entityManager === null) {
        throw new \RuntimeException('No EntityManager. Please call open().');
    }

    $count = 0;
    foreach ($items as $item) {
        $this->entityManager->persist($item);
        ++$count;
        // Flush in batches and clear the unit of work to free memory.
        if (($count % $this->batchSize) === 0) {
            $this->entityManager->flush();
            $this->entityManager->clear();
        }
    }

    // Flush whatever is left from the last, partial batch.
    $this->entityManager->flush();
    $this->entityManager->clear();
}
```
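
The flush/clear rhythm can be checked without a database by swapping the EntityManager for a recorder. In this sketch `RecordingEntityManager` and `writeInBatches()` are hypothetical stand-ins that only mirror the three calls the Writer makes (`persist()`, `flush()`, `clear()`); they are not Doctrine classes.

```php
<?php

// Hypothetical stand-in that only records calls instead of talking to a database.
final class RecordingEntityManager
{
    public array $persisted = [];
    public int $flushes = 0;
    public int $clears = 0;

    public function persist(object $entity): void
    {
        $this->persisted[] = $entity;
    }

    public function flush(): void
    {
        ++$this->flushes;
    }

    public function clear(): void
    {
        ++$this->clears;
    }
}

// Same batching logic as the Writer, with the EntityManager passed in.
function writeInBatches(RecordingEntityManager $em, iterable $items, int $batchSize): void
{
    $count = 0;
    foreach ($items as $item) {
        $em->persist($item);
        ++$count;
        if (($count % $batchSize) === 0) {
            $em->flush();
            $em->clear();
        }
    }
    // Final flush for the remaining entities.
    $em->flush();
    $em->clear();
}

$em = new RecordingEntityManager();
$items = [];
for ($i = 0; $i < 5; ++$i) {
    $items[] = new stdClass();
}
writeInBatches($em, $items, 2);
```

With five items and a batch size of two, `flush()` runs three times: after the second item, after the fourth, and once more for the remaining item.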

Slide 23

Writer
- testable
- modifiable
- reusable
- fast
- memory efficient

Slide 24

Import took 1h 3m 23s using 12.58 MB

Slide 25

PARTITIONING

Slide 26

PartitionManager

```php
// $this->processes holds Symfony\Component\Process\Process instances.
while ($range < $totalItemCount) {
    if (count($this->processes) >= $this->processLimit) {
        // All slots are busy: wait, then drop finished processes.
        sleep(2);
        foreach ($this->processes as $index => $process) {
            if (!$process->isRunning()) {
                unset($this->processes[$index]);
            }
        }
    }

    if (count($this->processes) < $this->processLimit) {
        // Start one console command per partition of $partitionSize items.
        $process = new Process(['php', 'bin/console', $command, '--amount', (string) $partitionSize, (string) $offset]);
        $process->start();
        // Track the process on the instance so the limit check above sees it.
        $this->processes[] = $process;
        $offset += $partitionSize;
        $range += $partitionSize;
    }
}
```
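
The loop hands each child process an offset and an amount. A sketch of computing those pairs up front, assuming the total item count is known; `partitions()` is hypothetical, and unlike the slide it shrinks the last partition to the items actually left (the slide passes the full `$partitionSize`, which works as long as the reader stops at the end of the file):

```php
<?php

/**
 * Hypothetical helper: split $totalItemCount items into [offset, amount] pairs,
 * one per worker process. The last partition may be smaller.
 *
 * @return list<array{int, int}>
 */
function partitions(int $totalItemCount, int $partitionSize): array
{
    if ($partitionSize < 1) {
        throw new InvalidArgumentException('Partition size must be positive.');
    }

    $result = [];
    for ($offset = 0; $offset < $totalItemCount; $offset += $partitionSize) {
        $result[] = [$offset, min($partitionSize, $totalItemCount - $offset)];
    }

    return $result;
}

// Each pair could then become one command line, e.g.
// ['php', 'bin/console', $command, '--amount', (string) $amount, (string) $offset]
$parts = partitions(10, 4);
```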

Slide 28

Import took 28m 20s using 10.49 MB

Slide 29

PARTITIONING

Slide 30

SYMFONY MESSENGER

Slide 31

send

Slide 32

receive

Slide 33

send / receive
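
A Messenger message is a plain PHP object that the sender dispatches (for example via `MessageBusInterface::dispatch()`) and a worker later hands to a handler. A minimal sketch for the import, assuming PHP 8.1+ for `readonly`; `ImportPartition` and `ImportPartitionHandler` are hypothetical, and the handler delegates to a closure that would run the existing Reader/Processor/Writer step:

```php
<?php

/**
 * Hypothetical message: describes one slice of the file to import.
 * It carries only coordinates, not the rows, so it stays small and serializable.
 */
final class ImportPartition
{
    public function __construct(
        public readonly string $filename,
        public readonly int $offset,
        public readonly int $amount
    ) {
    }
}

/**
 * Hypothetical handler. In a Symfony app it would be registered as a message
 * handler (e.g. with the #[AsMessageHandler] attribute in recent versions).
 */
final class ImportPartitionHandler
{
    public function __construct(private Closure $runStep)
    {
    }

    public function __invoke(ImportPartition $message): void
    {
        // Run the read-process-write step for this slice only.
        ($this->runStep)($message->filename, $message->offset, $message->amount);
    }
}

$calls = [];
$handler = new ImportPartitionHandler(
    function (string $filename, int $offset, int $amount) use (&$calls): void {
        $calls[] = [$filename, $offset, $amount];
    }
);
$handler(new ImportPartition('name.basics.tsv', 100, 50));
```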

Slide 34

BETTER PARTITIONING

Slide 35

Queue

Slide 36

Queue
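
With a queue, the producer no longer has to poll child processes: it enqueues one message per partition, and each running worker takes the next message when it is free. A sequential simulation of that idea, where `SplQueue` stands in for the real transport and round-robin stands in for concurrent workers; `enqueuePartitions()` and `consumeRoundRobin()` are hypothetical:

```php
<?php

// Producer side: one message per partition, all enqueued up front.
function enqueuePartitions(SplQueue $queue, int $totalItemCount, int $partitionSize): void
{
    for ($offset = 0; $offset < $totalItemCount; $offset += $partitionSize) {
        $queue->enqueue([
            'offset' => $offset,
            'amount' => min($partitionSize, $totalItemCount - $offset),
        ]);
    }
}

/**
 * Consumer side: a sequential stand-in for concurrent workers.
 *
 * @return array<int, list<int>> offsets handled per worker
 */
function consumeRoundRobin(SplQueue $queue, int $workerCount): array
{
    $handled = array_fill(0, $workerCount, []);
    $worker = 0;
    while (!$queue->isEmpty()) {
        // Dequeuing removes the message, so no other worker sees it.
        $message = $queue->dequeue();
        $handled[$worker][] = $message['offset'];
        $worker = ($worker + 1) % $workerCount;
    }

    return $handled;
}

$queue = new SplQueue();
enqueuePartitions($queue, 10, 3);
$handled = consumeRoundRobin($queue, 2);
```

Real workers (several `messenger:consume` processes) run concurrently, so which worker handles which partition is not deterministic; scaling out means starting more consumers rather than changing the producer.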

Slide 37

Consuming

```text
Usage:
  messenger:consume [options] [--] [...]
  messenger:consume-messages

Arguments:
  receivers                        Names of the receivers/transports to consume in order of priority [default: ["async"]]

Options:
  -l, --limit=LIMIT                Limit the number of received messages
  -m, --memory-limit=MEMORY-LIMIT  The memory limit the worker can consume
  -t, --time-limit=TIME-LIMIT      The time limit in seconds the worker can run
      --sleep=SLEEP                Seconds to sleep before asking for new messages after no messages were found [default: 1]
  -b, --bus=BUS                    Name of the bus to which received messages should be dispatched (if not passed, bus is determined automatically)
```

Slide 38

Pros
- testable
- modifiable
- reusable
- fast
- memory efficient
- scalable

Slide 39

Cons
- complex infrastructure needs
- harder to understand
- concurrency & side effects
- not easily retrofitted

Slide 40

Thank you
Denis Brumann, SensioLabs Deutschland