Iterators & Generators: The Road To Virtual Infinite RAM

Iterators could save your day on many occasions.

Everyone has run into memory exhaustion errors when handling collections that are too large in PHP. Iterators are omnipresent in the most famous PHP libraries: Symfony, Doctrine, PHPUnit and even the SPL. The latter ships a lot of ready-to-use iterators that let you process collections of any size without having to hold them entirely in memory.

Soon you won't be able to do without iterators and generators!

Alexandre Daubois

June 05, 2023

Transcript

  3. Collections? A container of n elements, which could be stored...

    In memory, on an SSD, or remotely. [Diagram: elements A, B, C, D, E]
  4. What about...

        <?php

        class MyTree
        {
            // ...
        }

        $tree = new MyTree();

        foreach ($tree as $element) {
            // ...
        }
  5. Define how a collection should be browsed

    When a simple array access is no longer possible
  7. ✅ Inner structure is hidden
    ✅ Access logic is decoupled from storage
    ✅ The collection actually only stores data
    ✅ Easier to create a new traversal behavior
    ✅ Easier to test
  8. PHP got your back.

        interface Iterator extends Traversable
        {
            public function current(): mixed;

            public function key(): mixed;

            public function next(): void;

            public function rewind(): void;

            public function valid(): bool;
        }

    https://www.php.net/manual/en/class.iterator.php
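
To make the five methods concrete, here is a minimal sketch of a class implementing `\Iterator`. The `IntegerRange` class is a hypothetical example that does not appear in the talk; it assumes PHP 8.0 or later for constructor promotion and the `mixed` type.

```php
<?php

declare(strict_types=1);

// Hypothetical example class: walks the integers from $start to $end (inclusive)
final class IntegerRange implements \Iterator
{
    private int $current;

    public function __construct(private int $start, private int $end)
    {
        $this->current = $start;
    }

    // Called once when a foreach loop starts
    public function rewind(): void
    {
        $this->current = $this->start;
    }

    // Called before each iteration; the loop stops when this returns false
    public function valid(): bool
    {
        return $this->current <= $this->end;
    }

    public function current(): mixed
    {
        return $this->current;
    }

    // Zero-based position of the current value
    public function key(): mixed
    {
        return $this->current - $this->start;
    }

    // Called after each iteration to move forward
    public function next(): void
    {
        ++$this->current;
    }
}

$squares = [];
foreach (new IntegerRange(1, 4) as $position => $number) {
    $squares[$position] = $number ** 2;
}
// $squares is [1, 4, 9, 16]
```

No array of integers is ever built here: the object only stores its bounds and its current position.
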
  10.

        class WidthIterator implements \Iterator
        {
            // The iterator receives the tree it has to walk through
            public function __construct(private MyTree $collection)
            {
            }

            // ...
        }

        class MyTree implements \IteratorAggregate
        {
            // ...

            public function __construct()
            {
            }

            // Must be implemented when creating an `\IteratorAggregate`
            public function getIterator(): \Traversable
            {
                return new WidthIterator($this);
            }
        }
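
The slide leaves the iterator body out. As a sketch only, and assuming that a "width" iterator means a breadth-first traversal, the idea could look like the following. `Node`, `BreadthFirstIterator` and `Tree` are hypothetical names chosen for this example, not the speaker's classes.

```php
<?php

declare(strict_types=1);

// Hypothetical tree node: a value plus its child nodes
final class Node
{
    /** @param list<Node> $children */
    public function __construct(public string $value, public array $children = [])
    {
    }
}

// Visits nodes level by level using a FIFO queue
final class BreadthFirstIterator implements \Iterator
{
    private \SplQueue $queue;
    private int $position = 0;

    public function __construct(private Node $root)
    {
        $this->rewind();
    }

    public function rewind(): void
    {
        $this->queue = new \SplQueue();
        $this->queue->enqueue($this->root);
        $this->position = 0;
    }

    public function valid(): bool
    {
        return !$this->queue->isEmpty();
    }

    public function current(): mixed
    {
        // bottom() peeks at the oldest queued node without removing it
        return $this->queue->bottom()->value;
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function next(): void
    {
        // Remove the visited node and queue its children for later
        $node = $this->queue->dequeue();
        foreach ($node->children as $child) {
            $this->queue->enqueue($child);
        }
        ++$this->position;
    }
}

// The collection only stores data and delegates traversal to the iterator
final class Tree implements \IteratorAggregate
{
    public function __construct(private Node $root)
    {
    }

    public function getIterator(): \Traversable
    {
        return new BreadthFirstIterator($this->root);
    }
}

$tree = new Tree(new Node('A', [
    new Node('B', [new Node('D'), new Node('E')]),
    new Node('C'),
]));

$visited = \iterator_to_array($tree);
// $visited is ['A', 'B', 'C', 'D', 'E']
```

Swapping the traversal order (depth-first, for instance) would only require another iterator class; `Tree` itself would not change.
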
  11. The SPL got your back.

    AppendIterator, ArrayIterator, CachingIterator, CallbackFilterIterator, DirectoryIterator,
    EmptyIterator, FilesystemIterator, FilterIterator, GlobIterator, InfiniteIterator,
    IteratorIterator, LimitIterator, MultipleIterator, NoRewindIterator, ParentIterator,
    RecursiveArrayIterator, RecursiveCachingIterator, RecursiveCallbackFilterIterator,
    RecursiveDirectoryIterator, RecursiveFilterIterator, RecursiveIteratorIterator,
    RecursiveRegexIterator, RecursiveTreeIterator, RegexIterator
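
These classes are designed to be wrapped around one another. The short sketch below combines four of them; the sample data and variable names are made up for the illustration.

```php
<?php

declare(strict_types=1);

// Loop over three colors forever, but only take the first seven values
$colors = new \ArrayIterator(['red', 'green', 'blue']);
$cycle = new \InfiniteIterator($colors);
$firstSeven = new \LimitIterator($cycle, 0, 7);

// false: ignore the inner keys, which repeat on every cycle
$palette = \iterator_to_array($firstSeven, false);
// $palette is ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']

// Keep only the values accepted by a callback
$numbers = new \ArrayIterator(\range(1, 10));
$evenNumbers = new \CallbackFilterIterator(
    $numbers,
    static fn (int $number): bool => 0 === $number % 2
);

$evens = \iterator_to_array($evenNumbers, false);
// $evens is [2, 4, 6, 8, 10]
```
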
  12. What if the data source is...

    ❓ actually 1 billion elements in a remote cache
    ❓ a file of 1 TB
    ❓ a video being streamed
    ❓ an HTTP response of 800 MB
  13.

        // 1st solution: The "classical" way
        function getCsvValues(): array
        {
            $handle = \fopen('file.csv', 'r');
            $data = [];

            // Read everything and store it in memory
            while (false !== $line = \fgetcsv($handle)) {
                $data[] = $line;
            }

            return $data;
        }

        foreach (getCsvValues() as $line) { /** ... */ }

        // 2nd solution: The "generator" way
        function generateCsvValues(): \Generator
        {
            $handle = \fopen('file.csv', 'r');

            // Line by line, yield as soon as it's read
            while (false !== $data = \fgetcsv($handle)) {
                yield $data;
            }
        }

        foreach (generateCsvValues() as $line) { /** ... */ }
  15. The same two solutions as on the previous slide: the "classical" getCsvValues() and the generator-based generateCsvValues().

    If "file.csv" is 1 TB large, there's only one winner: the generator version, which keeps a single line in memory at a time instead of the whole file.
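
A slightly more defensive variant of the generator approach can be sketched as follows. The function name `readCsvRows`, the temporary file and the explicit `fgetcsv()` arguments are assumptions made so that the example runs on its own; the slides only show the short version.

```php
<?php

declare(strict_types=1);

/**
 * Lazily yields the rows of a CSV file, one array per line.
 * Hypothetical helper written for this example.
 */
function readCsvRows(string $path): \Generator
{
    $handle = \fopen($path, 'r');
    if (false === $handle) {
        throw new \RuntimeException(\sprintf('Unable to open "%s".', $path));
    }

    try {
        while (false !== $row = \fgetcsv($handle, null, ',', '"', '\\')) {
            yield $row;
        }
    } finally {
        // Runs when the file is exhausted, and also when the consumer stops early
        \fclose($handle);
    }
}

// Build a small throwaway CSV file so the sketch is self-contained
$path = \tempnam(\sys_get_temp_dir(), 'csv');
\file_put_contents($path, "id,name\n1,foo\n2,bar\n3,baz\n");

$firstRows = [];
foreach (readCsvRows($path) as $index => $row) {
    $firstRows[] = $row;
    if (1 === $index) {
        break; // the remaining lines are never read
    }
}

\unlink($path);
// $firstRows is [['id', 'name'], ['1', 'foo']]
```

Because rows are produced on demand, breaking out of the loop stops the reading right there, and the `finally` block still closes the file handle.
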
  16. Generators are iterators. Literally.

        final class Generator implements Iterator
        {
            public function current(): mixed;

            public function getReturn(): mixed;

            public function key(): mixed;

            public function next(): void;

            public function rewind(): void;

            public function send(mixed $value): mixed;

            public function throw(Throwable $exception): mixed;

            public function valid(): bool;

            public function __wakeup(): void;
        }
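
Beyond the five `Iterator` methods, `send()` and `getReturn()` make a generator a two-way channel. The following is a small sketch with a hypothetical `runningAverage()` function, written only to show how values travel in both directions.

```php
<?php

declare(strict_types=1);

// Hypothetical example: receives samples through send(), yields the running average
function runningAverage(): \Generator
{
    $sum = 0;
    $count = 0;
    $average = null;

    while (true) {
        // Hands $average to the caller, then waits for the next sent value
        $sample = yield $average;
        if (null === $sample) {
            break;
        }

        $sum += $sample;
        ++$count;
        $average = $sum / $count;
    }

    // Available through getReturn() once the generator has finished
    return $count;
}

$generator = runningAverage();
$generator->current(); // run the function up to its first yield

$averages = [];
foreach ([10, 20, 30] as $sample) {
    $averages[] = $generator->send($sample);
}

$generator->send(null); // leaves the loop, so the generator can return
$sampleCount = $generator->getReturn();
// $averages is [10, 15, 20] and $sampleCount is 3
```
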
  17.

        class LazyCsvProvider implements \IteratorAggregate
        {
            public function __construct(private string $source)
            {
            }

            private function fetchData(): \Generator
            {
                $handle = \fopen($this->source, 'r');

                // Line by line, as soon as it is read
                while (false !== $data = \fgetcsv($handle)) {
                    yield $data;
                }
            }

            public function getIterator(): \Generator
            {
                yield from $this->fetchData();
            }
        }

        $lazyProvider = new LazyCsvProvider(/** ... */);

        // Each iteration receives one CSV row
        foreach ($lazyProvider as $row) {
            // ...
        }
  18.

        <?php

        $values = [];

        // Creates values between two bounds
        foreach (\range(1, 5) as $value) {
            $values[] = $value;
        }

        echo \implode(', ', $values);

    Output: 1, 2, 3, 4, 5
  19. "For small ranges (around one hundred elements), [...] generators are slightly slower than the native implementation, but still faster than the iterator variant."

    - Nikita Popov, Generators RFC
    https://wiki.php.net/rfc/generators
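
For context, the generator side of that comparison is a lazy counterpart to `range()`. Here is a minimal sketch, named `xrange` after the example used in the RFC; the type declarations and the step check are additions made for this illustration.

```php
<?php

declare(strict_types=1);

// Yields the integers from $start to $end one at a time instead of building an array
function xrange(int $start, int $end, int $step = 1): \Generator
{
    if ($step <= 0) {
        throw new \InvalidArgumentException('The step must be positive.');
    }

    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

$values = [];
foreach (xrange(1, 5) as $value) {
    $values[] = $value;
}

$joined = \implode(', ', $values);
// $joined is "1, 2, 3, 4, 5"
```

With `range(1, 1000000)` a million-element array exists before the loop starts; with `xrange(1, 1000000)` only the loop counter is kept between iterations.
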
  20. Reflecting Generators

        class ReflectionGenerator
        {
            public function __construct(Generator $generator);

            public function getExecutingLine(): int;

            public function getExecutingFile(): string;

            public function getTrace(int $options = DEBUG_BACKTRACE_PROVIDE_OBJECT): array;

            public function getFunction(): ReflectionFunctionAbstract;

            public function getThis(): ?object;

            public function getExecutingGenerator(): Generator;
        }
  21. Reflecting Generators

        class ReflectionGenerator
        {
            // ...

            public function getExecutingLine(): int;

            public function getExecutingFile(): string;

            // ...
        }

    Generator's "physical" location is actually stored
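
A quick way to see this in practice is to reflect a suspended generator. In the sketch below, `countdown()` is a hypothetical generator function written for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical generator used as the reflection target
function countdown(int $from): \Generator
{
    while ($from > 0) {
        yield $from--;
    }
}

$generator = countdown(3);
$firstValue = $generator->current(); // runs up to the first yield, gives 3

$reflection = new \ReflectionGenerator($generator);

$functionName = $reflection->getFunction()->getName(); // "countdown"
$line = $reflection->getExecutingLine(); // line where the generator is suspended
$file = $reflection->getExecutingFile(); // file containing countdown()
```

Note that a `\ReflectionGenerator` can only be created while the generator has not finished yet.
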
  22. Generators: PHP internals

    A Zend Generator holds:
    Instructions of the generating function
    Execution context
    Local variables
    Current instruction
    (and a few other things, but you get it)
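
The practical effect of storing the execution context is that local variables survive between two resumptions. As an illustration, here is a hypothetical `fibonacci()` generator (not from the slides) that never terminates, yet only ever holds two integers.

```php
<?php

declare(strict_types=1);

// Infinite sequence: the two local variables are kept alive between yields
function fibonacci(): \Generator
{
    [$previous, $current] = [0, 1];

    while (true) {
        yield $previous;
        [$previous, $current] = [$current, $previous + $current];
    }
}

$generator = fibonacci();

$firstTerms = [];
for ($i = 0; $i < 8; ++$i) {
    $firstTerms[] = $generator->current();
    $generator->next(); // resumes right after the yield, with the locals intact
}
// $firstTerms is [0, 1, 1, 2, 3, 5, 8, 13]
```
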
  23. Want more?

        // Zend/zend_generators.h

        // ...

        struct _zend_generator {
            zend_object std;
            zend_execute_data *execute_data;
            zend_execute_data *frozen_call_stack;
            zval value;
            zval key;
            zval retval;
            zval *send_target;
            zend_long largest_used_integer_key;
            zval values;
            zend_generator_node node;
            zend_execute_data execute_fake;
            uint8_t flags;
        };

    https://github.com/php/php-src/blob/master/Zend/zend_generators.h
  24.

    Symfony Component    Example of usage
    Console              Manage your command's output
    Finder               Discover directories and files lazily
    HttpFoundation       Create streamed responses for big files
    HttpClient           Responses multiplexing
    Process              Fetch a process output

    (They're nearly everywhere!)
  25. Doctrine

        use Doctrine\ORM\EntityManagerInterface;

        public function getProducts(EntityManagerInterface $entityManager)
        {
            $qb = $entityManager->createQueryBuilder()
                ->select('p')
                ->from(Product::class, 'p')
                ->where(/** ... */)
            ;

            foreach ($qb->getQuery()->toIterable() as $product) {
                yield $product;
            }
        }
  26. PHPUnit

        use PHPUnit\Framework\TestCase;

        class MyTest extends TestCase
        {
            /**
             * @dataProvider dataProvider
             */
            public function testSomething(int|float $value)
            {
                $this->assertGreaterThan(0, $value);
            }

            // Declared static: PHPUnit 10 requires data providers to be public and static
            public static function dataProvider(): iterable
            {
                yield 'Positive set' => [1];
                yield 'Float set' => [1.0];
            }
        }