How to write well designed imports with Symfony

Importing data into an application is a common task across many fields. Because imports usually run in the background, they tend to be forgotten until they break or no longer meet requirements. Let's bring these import scripts out of the shadows and shed some light on how to improve them.

In this talk we will look at a few ways to write imports, from bare-bones scripts to an elaborate import domain that uses all the bells and whistles provided by modern frameworks. We will discuss what to look out for when designing and improving imports and build a checklist of things to consider before getting started. Hopefully, by the end of the talk you will be motivated to look at the imports hidden away in your projects and improve them before things break, to write better imports in the future, or at least have the good feeling that you are already on the right track.

Denis Brumann

October 24, 2019

Transcript

  1. How to write well designed
    imports with Symfony
    Denis Brumann
    SensioLabs Deutschland

  2. Denis Brumann
    @dbrumann
    [email protected]
    iSAQB® Certified Professional for Software Architecture
    Symfony 4 Certified Developer

  3. Background
    JSR-352

  4. Requirements / What do I mean by "well designed"?
    testable
    modifiable
    reusable
    fast
    memory efficient

  5. Source

  6. Import
    name.basics.tsv → MySQL
    ~ 9.5 million lines

  7. Input
    nconst             nm0004813
    primaryName        Nancy Cartwright
    birthYear          1957
    deathYear          \N
    primaryProfession  actress,soundtrack,producer
    knownForTitles     tt0096697,tt0089153,tt0120685,tt0462538

  8. READ → PROCESS → WRITE

  9. public function execute(ReadContext $readContext, WriteContext $writeContext)
    {
        $reader = $this->reader->open($readContext);
        $writer = $this->writer->open($writeContext);
        $count = 0;
        $items = [];
        $generator = $reader->read();
        while ($generator->valid()) {
            $item = $generator->current();
            $items[] = $this->processor->process($item);
            ++$count;
            $generator->next();
            if (($count % $this->writeInterval) === 0) {
                $writer->write($items);
                $items = [];
            }
        }
        $writer->write($items);
    }
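
The read, process, write loop above can be tried out without files or a database. The following is a minimal sketch, not the speaker's implementation: `SketchStep` is a hypothetical name, and it assumes the reader is any iterable while the processor and writer are plain callables. Unlike the slide, it skips the final write when no items are left over.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified batch step: reads items lazily, processes each
 * one, and hands them to the writer in chunks of $writeInterval.
 */
final class SketchStep
{
    /** @var callable(mixed): mixed */
    private $processor;
    /** @var callable(array): void */
    private $writer;
    private int $writeInterval;

    public function __construct(callable $processor, callable $writer, int $writeInterval)
    {
        if ($writeInterval < 1) {
            throw new InvalidArgumentException('Write interval must be at least 1.');
        }
        $this->processor = $processor;
        $this->writer = $writer;
        $this->writeInterval = $writeInterval;
    }

    /** Returns the number of items that were read and processed. */
    public function execute(iterable $source): int
    {
        $count = 0;
        $items = [];
        foreach ($source as $item) {
            $items[] = ($this->processor)($item);
            ++$count;
            // Hand over a full chunk and start collecting the next one.
            if (($count % $this->writeInterval) === 0) {
                ($this->writer)($items);
                $items = [];
            }
        }
        // Write the last, incomplete chunk (assumption: skip it when empty).
        if ($items !== []) {
            ($this->writer)($items);
        }

        return $count;
    }
}

// Usage: a generator stands in for the reader, a closure collects the batches.
$batches = [];
$step = new SketchStep(
    static fn (int $n): int => $n * 2,
    static function (array $items) use (&$batches): void {
        $batches[] = $items;
    },
    2
);
$total = $step->execute((static function () {
    yield from [1, 2, 3, 4, 5];
})());
```

With five input items and a write interval of 2, the writer is called three times: twice with a full chunk and once with the remaining item.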

  12. Reader
    interface Reader
    {
        /**
         * @return Reader Returns an opened Reader-instance
         *                that you can read from.
         */
        public function open(ReadContext $context): Reader;

        public function read(): Generator;

        /**
         * Counts the numbers of processable items based on
         * the current file and line position.
         */
        public function count(): int;
    }
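
Because the interface is small, an in-memory implementation is easy to write and can serve as a test double. The sketch below is an assumption-laden illustration: `ReadContext` is not shown on the slides, so its shape here is inferred from the calls `filename()`, `linePosition()` and `readLimit()` and from the constructor arguments used in the tests, and `ArrayReader` is a hypothetical class.

```php
<?php
declare(strict_types=1);

// Assumed shape, inferred from how ReadContext is used on the slides.
final class ReadContext
{
    private string $filename;
    private int $linePosition;
    private ?int $readLimit;

    public function __construct(string $filename, int $linePosition = 0, ?int $readLimit = null)
    {
        $this->filename = $filename;
        $this->linePosition = $linePosition;
        $this->readLimit = $readLimit;
    }

    public function filename(): string
    {
        return $this->filename;
    }

    public function linePosition(): int
    {
        return $this->linePosition;
    }

    public function readLimit(): ?int
    {
        return $this->readLimit;
    }
}

interface Reader
{
    public function open(ReadContext $context): Reader;
    public function read(): Generator;
    public function count(): int;
}

// Hypothetical in-memory reader: serves rows from an array instead of a file.
final class ArrayReader implements Reader
{
    private array $rows;
    private int $linePosition = 0;
    private ?int $readLimit = null;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    public function open(ReadContext $context): Reader
    {
        // Like the TsvReader on the slides, return a configured clone.
        $reader = clone $this;
        $reader->linePosition = $context->linePosition();
        $reader->readLimit = $context->readLimit();

        return $reader;
    }

    public function read(): Generator
    {
        yield from array_slice($this->rows, $this->linePosition, $this->readLimit);
    }

    public function count(): int
    {
        return count(array_slice($this->rows, $this->linePosition, $this->readLimit));
    }
}

// Usage: skip the header row and read a single item.
$reader = (new ArrayReader([['nconst'], ['nm0000001'], ['nm0000002']]))
    ->open(new ReadContext('in-memory', 1, 1));
$rows = iterator_to_array($reader->read());
```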

  13. TsvReader
    public function open(ReadContext $context): Reader
    {
        $reader = clone $this;
        $reader->file = new SplFileObject($context->filename(), 'r', false);
        $reader->file->setFlags(
            SplFileObject::DROP_NEW_LINE
            | SplFileObject::READ_AHEAD
            | SplFileObject::SKIP_EMPTY
            | SplFileObject::READ_CSV
        );
        $reader->file->setCsvControl("\t");
        $reader->linePosition = $context->linePosition();
        $reader->readLimit = $context->readLimit();
        return $reader;
    }

  14. TsvReader
    public function read(): Generator
    {
        if ($this->file === null) {
            throw new \RuntimeException('No file opened! Please call open() first.');
        }
        $count = 0;
        $this->file->seek($this->linePosition);
        while ($this->file->valid()
            && ($this->readLimit === null || $count < $this->readLimit)
        ) {
            yield $count => $this->file->current();
            ++$count;
            $this->file->next();
        }
    }
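
The key to the low memory footprint is that `read()` is a generator: only one row is held in memory at a time. The sketch below shows the same idea with plain `fopen()`/`fgetcsv()` instead of `SplFileObject`, so it is a swapped-in technique rather than the code from the talk. `readTsv()` and its parameters are hypothetical names, and the two data rows are sample fixture content.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily yields the rows of a tab-separated file,
 * skipping $offset lines and stopping after $limit rows (null = no limit).
 */
function readTsv(string $filename, int $offset = 0, ?int $limit = null): Generator
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException(sprintf('Could not open "%s".', $filename));
    }

    try {
        // Skip the lines before the requested start position.
        for ($skipped = 0; $skipped < $offset && fgets($handle) !== false; ++$skipped) {
        }

        $count = 0;
        while (($limit === null || $count < $limit)
            && ($row = fgetcsv($handle, 0, "\t", '"', '\\')) !== false
        ) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            yield $count => $row;
            ++$count;
        }
    } finally {
        // Runs when the generator finishes or is destroyed early.
        fclose($handle);
    }
}

// Usage with a small temporary fixture file.
$file = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents(
    $file,
    "nconst\tprimaryName\nnm0000001\tFred Astaire\nnm0000002\tLauren Bacall\n"
);
$all = iterator_to_array(readTsv($file));
$firstItem = iterator_to_array(readTsv($file, 1, 1));
unlink($file);
```

Passing an offset of 1 skips the header line, which mirrors the line position used in the reader tests.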

  15. TsvReaderTest
    public function test_reading_names_full(): void
    {
        $reader = new TsvReader();
        $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 0);
        $openedReader = $reader->open($context);
        $generator = $openedReader->read();
        $rows = [];
        while ($generator->valid()) {
            $rows[] = $generator->current();
            $generator->next();
        }
        self::assertCount(3, $rows);
        self::assertSame('nconst', $rows[0][0]);
        self::assertSame('nm0000001', $rows[1][0]);
        self::assertSame('nm0000002', $rows[2][0]);
    }

  16. TsvReaderTest
    public function test_reading_names_skip_headers(): void
    {
        $reader = new TsvReader();
        $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1);
        $openedReader = $reader->open($context);
        $generator = $openedReader->read();
        $rows = [];
        while ($generator->valid()) {
            $rows[] = $generator->current();
            $generator->next();
        }
        self::assertCount(2, $rows);
        self::assertSame('nm0000001', $rows[0][0]);
        self::assertSame('nm0000002', $rows[1][0]);
    }

  17. TsvReaderTest
    public function test_read_limit_skip_headers(): void
    {
        $reader = new TsvReader();
        $context = new ReadContext(__DIR__ . '/../Fixtures/name.basics.tsv', 1, 1);
        $openedReader = $reader->open($context);
        $generator = $openedReader->read();
        $rows = [];
        while ($generator->valid()) {
            $rows[] = $generator->current();
            $generator->next();
        }
        self::assertCount(1, $rows);
        self::assertSame('nm0000001', $rows[0][0]);
    }

  18. Reader
    testable
    modifiable
    reusable
    fast
    memory efficient

  19. Processor
    public function process($item)
    {
        if ($item[2] === '' || $item[2] === '\N') {
            $birthYear = null;
        } else {
            $birthYear = (int) $item[2];
        }
        if ($item[3] === '' || $item[3] === '\N') {
            $deathYear = null;
        } else {
            $deathYear = (int) $item[3];
        }
        return new Person($item[0], $item[1], $birthYear, $deathYear);
    }
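
The processor is a pure mapping from a row to an object, which is what makes it easy to test. As a sketch of one possible refactoring, the repeated null handling can move into a small helper. `PersonSketch`, `nullableYear()` and `processRow()` are hypothetical names; the real `Person` entity is not shown on the slides.

```php
<?php
declare(strict_types=1);

// Hypothetical value object standing in for the Person entity.
final class PersonSketch
{
    public string $id;
    public string $name;
    public ?int $birthYear;
    public ?int $deathYear;

    public function __construct(string $id, string $name, ?int $birthYear, ?int $deathYear)
    {
        $this->id = $id;
        $this->name = $name;
        $this->birthYear = $birthYear;
        $this->deathYear = $deathYear;
    }
}

/** Maps the dataset's "no value" markers ('' and the literal \N) to null. */
function nullableYear(string $value): ?int
{
    return ($value === '' || $value === '\N') ? null : (int) $value;
}

/** Turns one TSV row into a person; columns follow the input slide. */
function processRow(array $row): PersonSketch
{
    return new PersonSketch($row[0], $row[1], nullableYear($row[2]), nullableYear($row[3]));
}

// Usage with the sample row from the input slide.
$person = processRow([
    'nm0004813',
    'Nancy Cartwright',
    '1957',
    '\N',
    'actress,soundtrack,producer',
    'tt0096697,tt0089153,tt0120685,tt0462538',
]);
```

Note that `'\N'` in single quotes is the two-character string backslash plus N, which is how the dataset marks missing values.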

  20. Processor
    testable
    modifiable
    reusable
    fast
    memory efficient

  21. Writer
    public function write(iterable $items): void
    {
        if ($this->entityManager === null) {
            throw new \RuntimeException('No EntityManager. Please call open().');
        }
        $count = 0;
        foreach ($items as $item) {
            $this->entityManager->persist($item);
            ++$count;
            if (($count % $this->batchSize) === 0) {
                $this->entityManager->flush();
                $this->entityManager->clear();
            }
        }
        $this->entityManager->flush();
        $this->entityManager->clear();
    }
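
The batching behaviour of the writer can be checked without a database by swapping the EntityManager for a recording test double. This is a sketch under assumptions: `ObjectStore`, `RecordingStore` and `BatchWriter` are hypothetical names, and the interface only covers the three methods the slide calls (`persist()`, `flush()`, `clear()`).

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the three EntityManager methods used on the slide.
interface ObjectStore
{
    public function persist(object $item): void;
    public function flush(): void;
    public function clear(): void;
}

// Test double that records how many objects each flush() would have written.
final class RecordingStore implements ObjectStore
{
    /** @var int[] */
    public array $flushSizes = [];
    private array $pending = [];

    public function persist(object $item): void
    {
        $this->pending[] = $item;
    }

    public function flush(): void
    {
        $this->flushSizes[] = count($this->pending);
        $this->pending = [];
    }

    public function clear(): void
    {
        $this->pending = [];
    }
}

final class BatchWriter
{
    private ObjectStore $store;
    private int $batchSize;

    public function __construct(ObjectStore $store, int $batchSize)
    {
        $this->store = $store;
        $this->batchSize = $batchSize;
    }

    public function write(iterable $items): void
    {
        $count = 0;
        foreach ($items as $item) {
            $this->store->persist($item);
            ++$count;
            // Flush and detach regularly so memory use stays flat.
            if (($count % $this->batchSize) === 0) {
                $this->store->flush();
                $this->store->clear();
            }
        }
        // Flush whatever is left over from the last, incomplete batch.
        $this->store->flush();
        $this->store->clear();
    }
}

// Usage: five objects written with a batch size of two.
$store = new RecordingStore();
$items = [];
for ($i = 0; $i < 5; ++$i) {
    $items[] = new stdClass();
}
(new BatchWriter($store, 2))->write($items);
```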

  22. Writer
    testable
    modifiable
    reusable
    fast
    memory efficient

  23. Import took 1h 3m 23s
    using 12.58MB

  24. PARTITIONING

  25. PartitionManager
    while ($range < $totalItemCount) {
        // All slots busy: wait, then drop the processes that have finished.
        if (count($this->processes) >= $this->processLimit) {
            sleep(2);
            foreach ($this->processes as $index => $process) {
                if (!$process->isRunning()) {
                    unset($this->processes[$index]);
                }
            }
        }
        // Free slot: start a worker process for the next partition.
        if (count($this->processes) < $this->processLimit) {
            $process = new Process(['php', 'bin/console', $command,
                '--amount', (string) $partitionSize, (string) $offset]);
            $process->start();
            $this->processes[] = $process;
            $offset += $partitionSize;
            $range += $partitionSize;
        }
    }
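
The arithmetic behind the partitions can be isolated from the process handling. The following sketch computes the offset and amount for each partition up front. `partitionRanges()` is a hypothetical helper, and clamping the last partition to the remaining items is an assumption; the loop above always passes the full partition size and relies on the reader stopping at the end of the file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits $totalItemCount items into partitions of at
 * most $partitionSize items, starting at line $startOffset of the input.
 *
 * @return array<int, array{offset: int, amount: int}>
 */
function partitionRanges(int $totalItemCount, int $partitionSize, int $startOffset = 0): array
{
    if ($partitionSize < 1) {
        throw new InvalidArgumentException('Partition size must be at least 1.');
    }

    $ranges = [];
    for ($done = 0; $done < $totalItemCount; $done += $partitionSize) {
        $ranges[] = [
            'offset' => $startOffset + $done,
            // The last partition only covers the items that are left.
            'amount' => min($partitionSize, $totalItemCount - $done),
        ];
    }

    return $ranges;
}

// Usage: 10 items after a header line (start offset 1), 4 items per partition.
$ranges = partitionRanges(10, 4, 1);
```

Each entry maps onto one worker invocation: the amount goes into the `--amount` option and the offset into the trailing argument of the console command.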

  27. Import took 28m 20s
    using 10.49MB

  28. PARTITIONING

  29. SYMFONY MESSENGER

  30. send

  31. receive

  32. send
    receive

  33. BETTER PARTITIONING

  34. Queue

  35. Queue

  36. Consuming
    Usage:
      messenger:consume [options] [--] [<receivers>...]
      messenger:consume-messages
    Arguments:
      receivers                        Names of the receivers/transports to consume in
                                       order of priority [default: ["async"]]
    Options:
      -l, --limit=LIMIT                Limit the number of received messages
      -m, --memory-limit=MEMORY-LIMIT  The memory limit the worker can consume
      -t, --time-limit=TIME-LIMIT      The time limit in seconds the worker can run
          --sleep=SLEEP                Seconds to sleep before asking for new messages
                                       after no messages were found [default: 1]
      -b, --bus=BUS                    Name of the bus to which received messages should
                                       be dispatched (if not passed, bus is determined
                                       automatically)
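
With a queue, each partition becomes a message and every `messenger:consume` worker handles one partition at a time. The message and handler classes from the talk are not part of this transcript, so the following is only a sketch with hypothetical names (`ImportPartition`, `ImportPartitionHandler`). To stay self-contained it uses a plain array as the queue; in a Symfony application the message would be sent with `MessageBusInterface::dispatch()` and the handler registered as a message handler.

```php
<?php
declare(strict_types=1);

// Hypothetical message: describes one partition of the input file.
final class ImportPartition
{
    private int $offset;
    private int $amount;

    public function __construct(int $offset, int $amount)
    {
        $this->offset = $offset;
        $this->amount = $amount;
    }

    public function offset(): int
    {
        return $this->offset;
    }

    public function amount(): int
    {
        return $this->amount;
    }
}

// Hypothetical handler: an invokable object that receives the message.
final class ImportPartitionHandler
{
    /** @var callable(int, int): void */
    private $importer;

    public function __construct(callable $importer)
    {
        $this->importer = $importer;
    }

    public function __invoke(ImportPartition $message): void
    {
        // Run the read/process/write step for just this slice of the file.
        ($this->importer)($message->offset(), $message->amount());
    }
}

// Producer side: one message per partition (the array stands in for the queue).
$queue = [];
for ($offset = 1; $offset <= 9; $offset += 4) {
    $queue[] = new ImportPartition($offset, 4);
}

// Consumer side: a worker takes messages off the queue and handles them.
$imported = [];
$handler = new ImportPartitionHandler(
    static function (int $offset, int $amount) use (&$imported): void {
        $imported[] = [$offset, $amount];
    }
);
while (($message = array_shift($queue)) !== null) {
    $handler($message);
}
```

Scaling then becomes a matter of starting more consumers rather than managing child processes by hand.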

  37. Pros
    testable
    modifiable
    reusable
    fast
    memory efficient
    scalable

  38. Cons
    complex infrastructure needs
    harder to understand
    concurrency & side effects
    not easily retrofitted

  39. Thank You
    Denis Brumann
    SensioLabs Deutschland
