Iterators & Generators: The Road To Virtual Infinite RAM

Iterators could save your day on many occasions.

Everyone has been bitten by memory overflows at some point when handling collections too big to fit in memory in PHP. Iterators and generators are omnipresent in the most famous PHP libraries: Symfony, Doctrine, PHPUnit and even the SPL. The latter contains many ready-to-use iterators that let you manage collections of any size without ever having to worry about memory.

Soon, you won't be able to do without iterators and generators!

Alexandre Daubois

June 05, 2023

Transcript

  1. Iterators & Generators
    The Road to Virtual Infinite RAM

  2. Who am I?

  3. Alexandre Daubois
    Symfony Lead Developer
    at WanadevDigital
    Author of "Clean Code in PHP"
    Conference speaker
    @alexdaubois

  4. (Image-only slide)

  5. HD Renders from our
    webapp Kazaplan

  6. (Image-only slide)

  7. This is "just" one big JSON
    collection

  8. Collections?

  9. Collections?
    A B C D E

  10. Collections?
    A container of n elements, which could be
    stored in memory, on an SSD, or remotely
    A B C D E

  11. A tree collection
    A
    B C
    D E F

  12. The same tree
    (Because why not)
    With a different traversal method
    A
    B C
    D E F

  13. What about...
    class MyTree
    {
        // ...
    }

    $tree = new MyTree();

    foreach ($tree as $element) {
        // ...
    }

  14. Iterators
    Do you know about design patterns?

  15. Define how a collection
    should be traversed
    When simple array access is no longer possible

  16. Array access doesn't fit.
    A
    B C
    D E F
    A
    B C
    D E F

  17. (Image-only slide)

  18. ✅ Inner structure is hidden
    ✅ Access logic is decoupled from storage
    ✅ The collection actually only stores data
    ✅ Easier to create a new traversal behavior
    ✅ Easier to test

  19. PHP got your back.
    interface Iterator extends Traversable
    {
        public function current(): mixed;
        public function key(): mixed;
        public function next(): void;
        public function rewind(): void;
        public function valid(): bool;
    }

    https://www.php.net/manual/en/class.iterator.php
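
To see how `foreach` drives these five methods, here is a minimal sketch: a hypothetical `LoggingIterator` over a plain PHP array that records every call it receives. The class name and the logging are illustrative only, not part of the interface.

```php
<?php

// Hypothetical iterator over a plain array that logs every method call,
// to show the order in which foreach invokes the Iterator methods.
final class LoggingIterator implements \Iterator
{
    /** @var list<string> */
    public array $calls = [];
    private int $position = 0;

    /** @param list<mixed> $items */
    public function __construct(private array $items) {}

    public function rewind(): void
    {
        $this->calls[] = 'rewind';
        $this->position = 0;
    }

    public function valid(): bool
    {
        $this->calls[] = 'valid';
        return $this->position < \count($this->items);
    }

    public function current(): mixed
    {
        $this->calls[] = 'current';
        return $this->items[$this->position];
    }

    public function key(): mixed
    {
        $this->calls[] = 'key';
        return $this->position;
    }

    public function next(): void
    {
        $this->calls[] = 'next';
        $this->position++;
    }
}

$iterator = new LoggingIterator(['a', 'b']);
$seen = [];
foreach ($iterator as $key => $value) {
    $seen[$key] = $value;
}

echo \implode(' ', $iterator->calls), PHP_EOL;
// rewind valid current key next valid current key next valid
```

The final `valid()` returning `false` is what ends the loop; the collection itself never needs to know a `foreach` is running.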

  21. class WidthIterator implements \Iterator
    {
        // Receives the tree it traverses
        public function __construct(private MyTree $tree) {}

        // current(), key(), next(), rewind() and valid() go here
    }

    class MyTree implements \IteratorAggregate
    {
        // ...

        public function __construct() {}

        // Must be implemented when creating an `\IteratorAggregate`
        public function getIterator(): \Traversable
        {
            return new WidthIterator($this);
        }
    }
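
A fleshed-out version of this slide, as a minimal sketch assuming PHP 8.1+: it assumes "width" means breadth-first (level-by-level) traversal, and that in the A to F tree, B and C sit under A, D and E under B, and F under C. The `Node` class is hypothetical, and unlike the slide, `MyTree` here receives its root node through the constructor.

```php
<?php

// Hypothetical node type: a value and its children.
final class Node
{
    /** @param list<Node> $children */
    public function __construct(
        public readonly string $value,
        public readonly array $children = [],
    ) {}
}

// Assumed meaning of "width": breadth-first traversal, level by level.
final class WidthIterator implements \Iterator
{
    /** @var list<Node> Nodes waiting to be visited, in FIFO order */
    private array $queue = [];
    private int $position = 0;

    public function __construct(private readonly Node $root) {}

    public function rewind(): void
    {
        $this->queue = [$this->root];
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->queue !== [];
    }

    public function current(): mixed
    {
        return $this->queue[0]->value;
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function next(): void
    {
        // Dequeue the current node and enqueue its children at the back
        $node = \array_shift($this->queue);
        foreach ($node->children as $child) {
            $this->queue[] = $child;
        }
        $this->position++;
    }
}

final class MyTree implements \IteratorAggregate
{
    public function __construct(private readonly Node $root) {}

    public function getIterator(): \Traversable
    {
        return new WidthIterator($this->root);
    }
}

// Assumed shape of the A..F tree from the slides.
$tree = new MyTree(new Node('A', [
    new Node('B', [new Node('D'), new Node('E')]),
    new Node('C', [new Node('F')]),
]));

$visited = [];
foreach ($tree as $value) {
    $visited[] = $value;
}
echo \implode(' ', $visited), PHP_EOL; // A B C D E F
```

A depth-first variant would only need a different `WidthIterator` (a stack instead of a queue); `MyTree` and the calling code stay untouched, which is the decoupling the previous slides describe.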

  22. The SPL got your back.
    AppendIterator
    ArrayIterator
    CachingIterator
    CallbackFilterIterator
    DirectoryIterator
    EmptyIterator
    FilesystemIterator
    FilterIterator
    GlobIterator
    InfiniteIterator
    IteratorIterator
    LimitIterator
    MultipleIterator
    NoRewindIterator
    ParentIterator
    RecursiveArrayIterator
    RecursiveCachingIterator
    RecursiveCallbackFilterIterator
    RecursiveDirectoryIterator
    RecursiveFilterIterator
    RecursiveIteratorIterator
    RecursiveRegexIterator
    RecursiveTreeIterator
    RegexIterator
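
These classes are designed to be stacked. As a minimal sketch, wrapping an `ArrayIterator` in a `CallbackFilterIterator` and then a `LimitIterator` keeps the even numbers and stops after three of them; the `range()` data is purely illustrative.

```php
<?php

// Source collection (any Iterator would do here).
$numbers = new \ArrayIterator(\range(1, 20));

// Keep only the even values; the filter is evaluated lazily, element by element.
$even = new \CallbackFilterIterator(
    $numbers,
    static fn (int $value): bool => $value % 2 === 0,
);

// Take at most 3 elements, starting at offset 0.
$firstThree = new \LimitIterator($even, 0, 3);

$result = \iterator_to_array($firstThree, false);
echo \implode(', ', $result), PHP_EOL; // 2, 4, 6
```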

  23. Let's talk about
    virtual infinite RAM.
    Now, you know iterators.

  24. php -d memory_limit=-1
    Who has already done this?

  25. yield
    Who has already done this?

  26. Generators
    All about compromises

  27. Let's (really) talk about
    virtual infinite RAM.

  28. What if the data source is...
    ❓ actually 1 billion elements in a remote cache
    ❓ a file of 1 TB
    ❓ a video being streamed
    ❓ an HTTP response of 800 MB

  29. 🚫 php -d memory_limit=-1
    ✨ Generators

  30. Create values on the fly
    using yield
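
A minimal sketch of what "on the fly" means: calling a generator function runs none of its body, and each value is produced only when the consumer asks for the next one. The `countTo()` function and its `$log` parameter are hypothetical, added only to make the interleaving visible.

```php
<?php

// Hypothetical generator that records when each value is produced.
function countTo(int $max, array &$log): \Generator
{
    for ($i = 1; $i <= $max; $i++) {
        $log[] = "produce $i";
        yield $i;
    }
}

$log = [];
$numbers = countTo(3, $log);

// The body has not started yet: calling countTo() only created a \Generator.
$producedBeforeLoop = \count($log);

foreach ($numbers as $number) {
    $log[] = "consume $number";
}

echo \implode(', ', $log), PHP_EOL;
// produce 1, consume 1, produce 2, consume 2, produce 3, consume 3
```

Producer and consumer alternate, so at any moment only one value needs to exist in memory.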

  31. // 1st solution: the "classical" way
    function getCsvValues(): array
    {
        $handle = \fopen('file.csv', 'r');
        $data = [];

        // Read everything and store it in memory
        while (false !== $line = \fgetcsv($handle)) {
            $data[] = $line;
        }

        \fclose($handle);

        return $data;
    }

    foreach (getCsvValues() as $line) { /** ... */ }

    // 2nd solution: the "generator" way
    function generateCsvValues(): \Generator
    {
        $handle = \fopen('file.csv', 'r');

        // Line by line, yield as soon as it's read
        while (false !== $data = \fgetcsv($handle)) {
            yield $data;
        }

        \fclose($handle);
    }

    foreach (generateCsvValues() as $line) { /** ... */ }

  33. If "file.csv" is 1 TB in size, there's only one winner: the generator.
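
Applied to the big-file case, a minimal sketch: a hypothetical `readLines()` generator that holds only the current line, with a `try`/`finally` so the handle is closed even if the consumer stops early. It uses `fgets()` and `explode()` on a tiny temporary file standing in for the 1 TB one; real CSV parsing (quotes, escaping) would need `fgetcsv()` instead.

```php
<?php

/**
 * Hypothetical helper: yields a file's lines one at a time, so only the
 * current line is held in memory, whatever the file size.
 */
function readLines(string $path): \Generator
{
    $handle = \fopen($path, 'r');
    if (false === $handle) {
        throw new \RuntimeException("Cannot open $path");
    }

    try {
        while (false !== $line = \fgets($handle)) {
            yield \rtrim($line, "\r\n");
        }
    } finally {
        // Runs when the loop ends, and also when the consumer stops early
        // and the suspended generator is destroyed.
        \fclose($handle);
    }
}

// A tiny stand-in for the huge "file.csv".
$path = \tempnam(\sys_get_temp_dir(), 'lines');
\file_put_contents($path, "a,1\nb,2\nc,3\nd,4\n");

// Full pass: lines are counted as they stream by, none are kept.
$count = 0;
foreach (readLines($path) as $line) {
    $count++;
}

// Early stop: only the first two lines are ever yielded and split.
$firstTwo = [];
foreach (readLines($path) as $line) {
    $firstTwo[] = \explode(',', $line);
    if (\count($firstTwo) === 2) {
        break;
    }
}

\unlink($path);
```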

  34. final class Generator implements Iterator
    {
        public function current(): mixed;
        public function getReturn(): mixed;
        public function key(): mixed;
        public function next(): void;
        public function rewind(): void;
        public function send(mixed $value): mixed;
        public function throw(Throwable $exception): mixed;
        public function valid(): bool;
        public function __wakeup(): void;
    }

    Generators are iterators.
    Literally.
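
`send()` and `getReturn()` are what make a generator more than a read-only iterator. A minimal sketch, with a hypothetical `accumulator()` generator that receives numbers through `send()` and exposes its final total through `getReturn()`; using `null` as the stop signal is an arbitrary choice for the example.

```php
<?php

// Hypothetical running-total generator.
function accumulator(): \Generator
{
    $total = 0;
    while (true) {
        // `yield` hands $total to the caller and evaluates to whatever send() passes in
        $value = yield $total;
        if (null === $value) {
            return $total; // becomes available through getReturn()
        }
        $total += $value;
    }
}

$acc = accumulator();
$start = $acc->current();   // runs the body up to the first yield: 0
$afterTen = $acc->send(10); // 10
$afterFive = $acc->send(5); // 15
$acc->send(null);           // the generator returns and is now finished

echo $acc->getReturn(), PHP_EOL; // 15
```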

  35. class LazyCsvProvider implements \IteratorAggregate
    {
        public function __construct(private string $source)
        {
        }

        private function fetchData(): \Generator
        {
            $handle = \fopen($this->source, 'r');

            // Line by line, as soon as it is read
            while (false !== $data = \fgetcsv($handle)) {
                yield $data;
            }

            \fclose($handle);
        }

        public function getIterator(): \Generator
        {
            yield from $this->fetchData();
        }
    }

    $lazyProvider = new LazyCsvProvider(/** ... */);

    foreach ($lazyProvider as $line) {
        // ...
    }
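
One reason to expose the generator through `getIterator()` rather than hand it out directly: a generator object is single-use, while an `IteratorAggregate` creates a fresh generator on every traversal. A minimal sketch with a hypothetical `letters()` generator and `Letters` class:

```php
<?php

function letters(): \Generator
{
    yield 'a';
    yield 'b';
}

// A raw generator can only be traversed once.
$generator = letters();
foreach ($generator as $letter) {
    // first pass works
}

$secondPassFailed = false;
try {
    foreach ($generator as $letter) {
        // never reached
    }
} catch (\Exception $e) {
    // Traversing a generator that has already finished throws an \Exception
    $secondPassFailed = true;
}

// Wrapped in an IteratorAggregate, each traversal calls getIterator() again
// and therefore gets a brand-new generator.
final class Letters implements \IteratorAggregate
{
    public function getIterator(): \Generator
    {
        yield from letters();
    }
}

$letters = new Letters();
$firstPass = \iterator_to_array($letters);
$secondPass = \iterator_to_array($letters);
```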

  36. There are always
    compromises: distribute
    the load.

  37. Identify your bottlenecks
    (Before they happen!)

  38. I/O vs Memory vs Network
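
One way to make the memory versus I/O trade-off explicit, as a minimal sketch: a hypothetical `batch()` generator that groups any iterable into fixed-size chunks. The batch size is the knob: larger batches mean fewer round-trips to the disk, database or network, at the cost of more items held in memory at once.

```php
<?php

/**
 * Hypothetical helper: groups any iterable into arrays of at most $size items.
 */
function batch(iterable $items, int $size): \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1');
    }

    $buffer = [];
    foreach ($items as $item) {
        $buffer[] = $item;
        if (\count($buffer) === $size) {
            yield $buffer;   // hand over a full batch, then start a new one
            $buffer = [];
        }
    }

    if ($buffer !== []) {
        yield $buffer;       // last, possibly smaller, batch
    }
}

$batches = \iterator_to_array(batch(\range(1, 7), 3), false);
echo \json_encode($batches), PHP_EOL; // [[1,2,3],[4,5,6],[7]]
```

Because `batch()` accepts any iterable, it can sit on top of another generator: the whole pipeline then never holds more than one batch in memory.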

  39. Create generators

  40. Generators
    CPU performance

  41. 3 implementations to benchmark
    "range()"

  42. $values = [];

    // Creates values between two bounds
    foreach (\range(1, 5) as $value) {
        $values[] = $value;
    }

    echo \implode(', ', $values);
    // Output: 1, 2, 3, 4, 5

  43. Benchmarking "range()"
    Native implementation
    Iterator implementation
    Generator implementation
    https://gist.github.com/nikic/2975796
    (By Nikita Popov (@nikita_ppv))
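
To make two of the three variants concrete, here is a sketch of an iterator-based and a generator-based `range()`, written in the spirit of the gist by Nikita Popov but not copied from it. `RangeIterator` and `xrange()` are illustrative names, only positive steps are handled, PHP 8.1+ is assumed, and no benchmark figures are reproduced here.

```php
<?php

// Iterator version: all state lives in properties, and every step goes
// through a method call on the object.
final class RangeIterator implements \Iterator
{
    private int $current;
    private int $key = 0;

    public function __construct(
        private readonly int $start,
        private readonly int $end,
        private readonly int $step = 1,
    ) {
        if ($step < 1) {
            throw new \InvalidArgumentException('Only positive steps are supported');
        }
    }

    public function rewind(): void
    {
        $this->current = $this->start;
        $this->key = 0;
    }

    public function valid(): bool
    {
        return $this->current <= $this->end;
    }

    public function current(): mixed
    {
        return $this->current;
    }

    public function key(): mixed
    {
        return $this->key;
    }

    public function next(): void
    {
        $this->current += $this->step;
        $this->key++;
    }
}

// Generator version: the loop's local variables are the state, and the
// engine suspends and resumes the function between values.
function xrange(int $start, int $end, int $step = 1): \Generator
{
    if ($step < 1) {
        throw new \InvalidArgumentException('Only positive steps are supported');
    }

    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

echo \implode(', ', \iterator_to_array(new RangeIterator(1, 5), false)), PHP_EOL; // 1, 2, 3, 4, 5
echo \implode(', ', \iterator_to_array(xrange(1, 5), false)), PHP_EOL;           // 1, 2, 3, 4, 5
```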

  44. The generator implementation is up to
    40% faster than the native implementation

  45. And up to 4x
    faster than the iterator implementation

  46. “ For small ranges (around one hundred elements), [...] generators are
    slightly slower than the native implementation, but still faster than the
    iterator variant.
    - Nikita Popov, Generators RFC
    https://wiki.php.net/rfc/generators

  47. Generators
    Internals: how do they
    really work?

  48. Let's use the
    Reflection
    API

  49. Reflecting Generators
    class ReflectionGenerator
    {
        public function __construct(Generator $generator);
        public function getExecutingLine(): int;
        public function getExecutingFile(): string;
        public function getTrace(int $options = DEBUG_BACKTRACE_PROVIDE_OBJECT): array;
        public function getFunction(): ReflectionFunctionAbstract;
        public function getThis(): ?object;
        public function getExecutingGenerator(): Generator;
    }

  50. Reflecting Generators
    class ReflectionGenerator
    {
        // ...
        public function getExecutingLine(): int;
        public function getExecutingFile(): string;
        // ...
    }

    A generator's "physical" location (the file and line it is executing) is actually stored
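
A minimal sketch of that in action: a hypothetical `countdown()` generator is advanced to its first `yield`, then inspected with `ReflectionGenerator`.

```php
<?php

function countdown(int $from): \Generator
{
    for ($i = $from; $i > 0; $i--) {
        yield $i;
    }
}

$generator = countdown(3);
$generator->current(); // run the body up to its first yield

$reflection = new \ReflectionGenerator($generator);

$function = $reflection->getFunction()->getName(); // "countdown"
$file = $reflection->getExecutingFile();           // the file that defines countdown()
$line = $reflection->getExecutingLine();           // the line of the `yield` it is paused on

echo "$function() is paused at $file:$line", PHP_EOL;
```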

  51. PHP internals, to round out your
    knowledge
    (understandable by everyone, I promise)

  52. Generators: PHP internals
    A Zend Generator holds:
      Instructions of the generating function
      Execution context
      Local variables
      Current instruction
    (and a few other things, but you get it)

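  53. Want more?
    // Zend/zend_generators.h
    // ...
    struct _zend_generator {
        zend_object std;
        /* The suspended execution context */
        zend_execute_data *execute_data;
        /* Frozen call stack for "yield" used in context of other calls */
        zend_execute_data *frozen_call_stack;
        zval value;                          /* Current value */
        zval key;                            /* Current key */
        zval retval;                         /* Return value */
        zval *send_target;                   /* Variable to put sent value into */
        zend_long largest_used_integer_key;  /* For auto-incrementing keys */
        /* Values delegated to by "yield from" (arrays, non-generator Traversables) */
        zval values;
        /* Node of waiting generators for nested "yield from" expressions */
        zend_generator_node node;
        zend_execute_data execute_fake;      /* Fake execute_data for stack traces */
        uint8_t flags;                       /* ZEND_GENERATOR_* flags */
    };

    https://github.com/php/php-src/blob/master/Zend/zend_generators.h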

  54. Generators
    In third-party libraries

  55. Symfony
    Component        Example of usage
    Console          Manage your command's output
    Finder           Discover directories and files lazily
    HttpFoundation   Create streamed responses for big files
    HttpClient       Multiplex responses
    Process          Fetch a process's output
    (They're nearly everywhere!)

  56. Doctrine
    use Doctrine\ORM\EntityManagerInterface;

    public function getProducts(EntityManagerInterface $entityManager)
    {
        $qb = $entityManager->createQueryBuilder()
            ->select('p')
            ->from(Product::class, 'p')
            ->where(/** ... */)
        ;

        foreach ($qb->getQuery()->toIterable() as $product) {
            yield $product;
        }
    }

  57. PHPUnit
    use PHPUnit\Framework\TestCase;

    class MyTest extends TestCase
    {
        /**
         * @dataProvider dataProvider
         */
        public function testSomething(int|float $value): void
        {
            $this->assertGreaterThan(0, $value);
        }

        // Non-static data providers are deprecated as of PHPUnit 10
        public static function dataProvider(): \Generator
        {
            yield 'Positive set' => [1];
            yield 'Float set' => [1.0];
        }
    }

  58. Generators
    are everywhere
    (and for good reason)

  59. Thanks!
    @alexdaubois