Slide 1

Slide 1 text

Iterators & generators: the Python way
Luciano Ramalho
@ramalhoorg

Slide 2

Slide 2 text

Iteration: C and Python

C:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int i;
        for (i = 0; i < argc; i++)
            printf("%s\n", argv[i]);
        return 0;
    }

Python:

    import sys

    for arg in sys.argv:
        print arg

Slide 3

Slide 3 text

Iteration: Java (classic)

    class Argumentos {
        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++)
                System.out.println(args[i]);
        }
    }

    $ java Argumentos alfa bravo charlie
    alfa
    bravo
    charlie

Slide 4

Slide 4 text

Iteration: Java ≥1.5
• Enhanced for, since 2004

    class Argumentos2 {
        public static void main(String[] args) {
            for (String arg : args)
                System.out.println(arg);
        }
    }

    $ java Argumentos2 alfa bravo charlie
    alfa
    bravo
    charlie

Slide 5

Slide 5 text

Iteration: Java ≥1.5
• Enhanced for, since 2004

    class Argumentos2 {
        public static void main(String[] args) {
            for (String arg : args)
                System.out.println(arg);
        }
    }

Python, since 1991:

    import sys

    for arg in sys.argv:
        print arg

Slide 6

Slide 6 text

Demo: some iterables
• High-level iteration: not limited to built-in types
  • string
  • file
  • XML: ElementTree nodes
  • Django QuerySet
  • etc.

Slide 7

Slide 7 text

    >>> from django.db import connection
    >>> q = connection.queries
    >>> q
    []
    >>> from municipios.models import *
    >>> res = Municipio.objects.all()[:5]
    >>> q
    []
    >>> for m in res: print m.uf, m.nome
    ...
    GO Abadia de Goiás
    MG Abadia dos Dourados
    GO Abadiânia
    MG Abaeté
    PA Abaetetuba
    >>> q
    [{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id", "municipios_municipio"."uf", "municipios_municipio"."nome", "municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id", "municipios_municipio"."capital", "municipios_municipio"."latitude", "municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM "municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]

• In Django, QuerySet is a lazy iterable
• After res is built, q is still empty: no database access so far
• The query is made only when the iteration happens

Slide 8

Slide 8 text

The for statement is not the only construct that groks iterables...

Slide 9

Slide 9 text

List comprehensions
• Expressions that build lists from arbitrary iterables
• Input: any iterable
• Result: always a list
• ≈ math set notation

    >>> s = 'abracadabra'
    >>> l = [ord(c) for c in s]
    >>> l
    [97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]

• Example using every element of a list L:

    L2 = [n*10 for n in L]

Slide 10

Slide 10 text

Set & dict comprehensions
• Expressions that build sets / dicts from arbitrary iterables

    >>> s = 'abracadabra'
    >>> {c for c in s}
    set(['a', 'r', 'b', 'c', 'd'])
    >>> {c:ord(c) for c in s}
    {'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100}
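
The sessions on the comprehension slides were recorded with Python 2. As a minimal sketch, the same three comprehensions can be run under Python 3, where a set is displayed as {'a', 'b', ...} instead of set([...]). The variable names below are illustrative and do not come from the slides.

```python
# Python 3 sketch of the three comprehension forms shown on the slides.
s = 'abracadabra'

codes = [ord(c) for c in s]              # list comprehension: always builds a list
letters = {c for c in s}                 # set comprehension: duplicates collapse
code_by_letter = {c: ord(c) for c in s}  # dict comprehension: one key per distinct letter

print(codes[:4])                         # first four code points
print(sorted(letters))                   # sets are unordered, so sort for display
print(sorted(code_by_letter.items()))
```

Sorting before printing is only for stable display; sets have no defined order.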

Slide 11

Slide 11 text

Built-in iterable types
• basestring
  • str
  • unicode
• dict
• file
• frozenset
• list
• set
• tuple
• xrange

Slide 12

Slide 12 text

Built-in functions that take iterable arguments
• all
• any
• filter
• iter
• len
• map
• max
• min
• reduce
• sorted
• sum
• zip (unrelated to compression)
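
A short sketch of a few of these built-ins consuming iterables, written for Python 3. The word list is borrowed from the command-line examples in the earlier slides; everything else is illustrative. Note that in Python 3 reduce lives in functools, and map, filter and zip return lazy iterators rather than lists.

```python
# Python 3 sketch: built-ins that accept any iterable, including generators.
words = ['alfa', 'bravo', 'charlie']

total_length = sum(len(w) for w in words)       # generator expression as argument
longest = max(words, key=len)                   # max() walks the iterable once
has_short = any(len(w) < 5 for w in words)      # stops at the first true item
all_lower = all(w.islower() for w in words)     # stops at the first false item
numbered = list(zip(range(1, 4), words))        # zip pairs items from two iterables
by_length = sorted(words, key=len, reverse=True)

print(total_length, longest, has_short, all_lower)
print(numbered)
print(by_length)
```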

Slide 13

Slide 13 text

Syntactic support
• Tuple unpacking
  • parallel assignment
  • function calls with *

    >>> def soma(a, b):
    ...     return a + b
    ...
    >>> soma(1, 2)
    3
    >>> t = (3, 4)
    >>> soma(t)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: soma() takes exactly 2 arguments (1 given)
    >>> soma(*t)
    7

    >>> a, b, c = 'XYZ'
    >>> a
    'X'
    >>> b
    'Y'
    >>> c
    'Z'
    >>> g = (n for n in [1, 2, 3])
    >>> a, b, c = g
    >>> a
    1
    >>> b
    2
    >>> c
    3

Slide 14

Slide 14 text

A Python iterable is...
• An object from which the iter function can produce an iterator
• The iter(x) call:
  • invokes x.__iter__() to obtain an iterator
  • but, if x has no __iter__:
    • iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]...
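
The two paths described above can be sketched as follows. Letters and manual_for are hypothetical names introduced only for this illustration: Letters defines __getitem__ but no __iter__, and manual_for spells out roughly what a for loop does with iter() and next().

```python
class Letters:
    """Hypothetical class with __getitem__ only (no __iter__)."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends the iteration.
        return self._text[index]


def manual_for(iterable):
    """Roughly what a for loop does: iter(), then next() until StopIteration."""
    result = []
    iterator = iter(iterable)   # uses __iter__, or falls back to __getitem__
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        result.append(item)
    return result


print(manual_for(Letters('XYZ')))   # fallback path: x[0], x[1], x[2]...
print(manual_for([1, 2, 3]))        # regular path: list.__iter__
```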

Slide 15

Slide 15 text

Train: a sequence of cars
[figure: a train labeled "train", with its first car labeled "train[0]"]
• Sequences were called trains in ABC, the language that preceded Python

Slide 16

Slide 16 text

Train: a sequence of cars

    >>> train = Train(4)
    >>> len(train)
    4
    >>> train[0]
    'car #1'
    >>> train[3]
    'car #4'
    >>> train[-1]
    'car #4'
    >>> train[4]
    Traceback (most recent call last):
      ...
    IndexError: no car at 4
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3
    car #4

• if __getitem__ exists, iteration “just works”

Slide 17

Slide 17 text

Train: a sequence of cars

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

Slide 18

Slide 18 text

Sequence protocol

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

• __len__ and __getitem__ implement the immutable sequence protocol
• protocol: a synonym for interface used in dynamic languages like Smalltalk, Python, Ruby...
• not declared, and not enforced by static checks

Slide 19

Slide 19 text

Sequence ABC
• collections.Sequence abstract base class
• abstract methods: __len__ and __getitem__

    import collections

    class Train(collections.Sequence):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            index = key if key >= 0 else self.cars + key
            if 0 <= index < len(self):  # index 2 -> car #3
                return 'car #%s' % (index + 1)
            else:
                raise IndexError('no car at %s' % key)

Slide 20

Slide 20 text

Sequence ABC
• implement __len__ and __getitem__
• inherit 5 methods

    import collections

    class Train(collections.Sequence):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __getitem__(self, key):
            # body cut off on the slide

    >>> train = Train(4)
    >>> 'car #2' in train
    True
    >>> 'car #7' in train
    False
    >>> for car in reversed(train):
    ...     print(car)
    car #4
    car #3
    car #2
    car #1
    >>> train.index('car #3')
    2
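
The slides use the Python 2 spelling collections.Sequence. In Python 3 the ABC lives in collections.abc, and the old alias was removed in Python 3.10. The following is a sketch of the same Train class adapted to Python 3, exercising the inherited methods (__contains__, __iter__, __reversed__, index and count); it is an adaptation for illustration, not code from the talk.

```python
from collections.abc import Sequence


class Train(Sequence):
    """Python 3 adaptation of the Train class from the slides."""

    def __init__(self, cars):
        self.cars = cars

    def __len__(self):
        return self.cars

    def __getitem__(self, key):
        # Only integer keys are handled, as on the slides (no slice support).
        index = key if key >= 0 else self.cars + key
        if 0 <= index < len(self):  # index 2 -> car #3
            return 'car #%s' % (index + 1)
        raise IndexError('no car at %s' % key)


train = Train(4)
print('car #2' in train)        # __contains__, inherited from Sequence
print(list(reversed(train)))    # __reversed__, inherited
print(train.index('car #3'))    # index, inherited
print(train.count('car #1'))    # count, inherited
```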

Slide 21

Slide 21 text

Iterable ABC
• A concrete subclass of Iterable must implement __iter__
• __iter__ returns an Iterator
• Iterator must implement a next method
  • in Python 3: __next__
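
The distinction between Iterable and Iterator can be checked with isinstance, as in this Python 3 sketch (the ABCs are imported from collections.abc; the variable names are illustrative):

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
iterator = iter(numbers)

list_is_iterable = isinstance(numbers, Iterable)    # lists implement __iter__
list_is_iterator = isinstance(numbers, Iterator)    # but not __next__
iter_is_iterator = isinstance(iterator, Iterator)   # list iterators implement both
returns_itself = iter(iterator) is iterator         # an iterator's __iter__ returns self

print(list_is_iterable, list_is_iterator, iter_is_iterator, returns_itself)
```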

Slide 22

Slide 22 text

Iterator is...
• a classic design pattern
Design Patterns. Gamma, Helm, Johnson & Vlissides. Addison-Wesley, ISBN 0-201-63361-2

Slide 23

Slide 23 text

Head First Design Patterns Poster. O'Reilly, ISBN 0-596-10214-3

Slide 24

Slide 24 text

“The Iterator Pattern provides a way to access the elements of an aggregate object sequentially without exposing the underlying representation”
Head First Design Patterns Poster. O'Reilly, ISBN 0-596-10214-3

Slide 25

Slide 25 text

Train with iterator

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __iter__(self):
            return TrainIterator(self)

    class TrainIterator(object):

        def __init__(self, train):
            self.train = train
            self.current = 0

        def __next__(self):  # Python 3
            if self.current < len(self.train):
                self.current += 1
                return 'car #%s' % (self.current)
            else:
                raise StopIteration()

        next = __next__  # Python 2 compatibility

for car in train:
• calls iter(train) to obtain a TrainIterator
• makes repeated calls to aTrainIterator.__next__() until it raises StopIteration

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3

Slide 26

Slide 26 text

A Python iterable is...
• An object from which the iter function can produce an iterator
• The iter(x) call:
  • invokes x.__iter__() to obtain an iterator (Iterable interface)
  • but, if x has no __iter__:
    • iter makes an iterator which tries to fetch items from x by doing x[0], x[1], x[2]... (sequence protocol)

Slide 27

Slide 27 text

Iteration in C (example 2)

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int i;
        for (i = 0; i < argc; i++)
            printf("%d : %s\n", i, argv[i]);
        return 0;
    }

    $ ./args2 alfa bravo charlie
    0 : ./args2
    1 : alfa
    2 : bravo
    3 : charlie

Slide 28

Slide 28 text

Iteration in Python (ex. 2): not Pythonic

    import sys

    for i in range(len(sys.argv)):
        print i, ':', sys.argv[i]

    $ python args2.py alfa bravo charlie
    0 : args2.py
    1 : alfa
    2 : bravo
    3 : charlie

Slide 29

Slide 29 text

Iteration in Python (ex. 2)

    import sys

    for i, arg in enumerate(sys.argv):
        print i, ':', arg

    $ python args2.py alfa bravo charlie
    0 : args2.py
    1 : alfa
    2 : bravo
    3 : charlie

• enumerate(sys.argv) returns a lazy iterable that behaves like a generator
• it yields (index, item) tuples on demand, one at each iteration

Slide 30

Slide 30 text

Iterator vs. generator
• By definition (GoF) an iterator retrieves successive items from an existing collection
• A generator implements the iterator interface but produces items that are not necessarily in a collection
  • a generator may iterate over a collection, but return the items decorated in some way
  • it may also produce items independently of any other data structure (e.g. a Fibonacci generator)
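
Both kinds of generator mentioned above can be sketched in a few lines. The function names fibonacci and decorated are illustrative, not taken from the slides; islice is used only to take a finite prefix of the boundless Fibonacci generator.

```python
from itertools import islice


def fibonacci():
    """Produces items independently of any collection; never terminates."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def decorated(items):
    """Iterates over an existing collection, yielding each item decorated."""
    for number, item in enumerate(items, 1):
        yield '%d: %s' % (number, item)


first_ten = list(islice(fibonacci(), 10))
labeled = list(decorated(['alfa', 'bravo']))
print(first_ten)
print(labeled)
```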

Slide 31

Slide 31 text

Generator function (Python 2.x)
• When invoked, returns a generator object
• Generator objects implement the iterator interface: .next (.__next__ in Python 3)

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> for i in gen_123(): print(i)
    1
    2
    3
    >>> g = gen_123()
    >>> g
    <generator object gen_123 at 0x...>
    >>> g.next()
    1
    >>> g.next()
    2
    >>> g.next()
    3
    >>> g.next()
    Traceback (most recent call last):
      ...
    StopIteration

Slide 32

Slide 32 text

Generator function (Python 3.x)
• When invoked, returns a generator object
• Generator objects implement the iterator interface: .next (.__next__ in Python 3)

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> for i in gen_123(): print(i)
    1
    2
    3
    >>> g = gen_123()
    >>> g
    <generator object gen_123 at 0x...>
    >>> g.__next__()
    1
    >>> g.__next__()
    2
    >>> g.__next__()
    3
    >>> g.__next__()
    Traceback (most recent call last):
      ...
    StopIteration

Slide 33

Slide 33 text

Generator function (Python ≥ 2.6)
• When invoked, returns a generator object
• Generator objects implement the iterator interface: .next (.__next__ in Python 3)

    >>> def gen_123():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> for i in gen_123(): print(i)
    1
    2
    3
    >>> g = gen_123()
    >>> g
    <generator object gen_123 at 0x...>
    >>> next(g)
    1
    >>> next(g)
    2
    >>> next(g)
    3
    >>> next(g)
    Traceback (most recent call last):
      ...
    StopIteration

Slide 34

Slide 34 text

Generator behavior
• Invoking a generator function builds the generator object but does not execute the body of the function

    >>> def gen_ab():
    ...     print('starting...')
    ...     yield 'A'
    ...     print('here comes B:')
    ...     yield 'B'
    ...     print('the end.')
    ...
    >>> for s in gen_ab(): print(s)
    starting...
    A
    here comes B:
    B
    the end.
    >>> g = gen_ab()
    >>> next(g)
    starting...
    'A'
    >>> next(g)
    here comes B:
    'B'
    >>> next(g)
    Traceback (most recent call last):
      ...
    StopIteration

Slide 35

Slide 35 text

Generator behavior
• The body is executed only when next is called, and only up to the following yield

    >>> def gen_ab():
    ...     print('starting...')
    ...     yield 'A'
    ...     print('here comes B:')
    ...     yield 'B'
    ...     print('the end.')
    ...
    >>> for s in gen_ab(): print(s)
    starting...
    A
    here comes B:
    B
    the end.
    >>> g = gen_ab()
    >>> next(g)
    starting...
    'A'
    >>> next(g)
    here comes B:
    'B'
    >>> next(g)
    Traceback (most recent call last):
      ...
    StopIteration

Slide 36

Slide 36 text

Train with generator function

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                # index 2 is car #3
                yield 'car #%s' % (i+1)

for car in train:
• calls iter(train) to obtain a generator
• makes repeated calls to next(generator) until the function returns, which raises StopIteration

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3

Slide 37

Slide 37 text

Classic iterator vs. generator

Classic iterator: 2 classes, 12 lines of code

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __len__(self):
            return self.cars

        def __iter__(self):
            return TrainIterator(self)

    class TrainIterator(object):

        def __init__(self, train):
            self.train = train
            self.current = 0

        def __next__(self):  # Python 3
            if self.current < len(self.train):
                self.current += 1
                return 'car #%s' % (self.current)
            else:
                raise StopIteration()

Generator: 1 class, 3 lines of code

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            for i in range(self.cars):
                yield 'car #%s' % (i+1)

Slide 38

Slide 38 text

Generator expression
• When evaluated, returns a generator object

    >>> g = (n for n in [1, 2, 3])
    >>> for i in g: print i
    ...
    1
    2
    3
    >>> g = (n for n in [1, 2, 3])
    >>> g
    <generator object <genexpr> at 0x109a4deb0>
    >>> g.next()
    1
    >>> g.next()
    2
    >>> g.next()
    3
    >>> g.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Slide 39

Slide 39 text

Train with generator expression

    class Train(object):

        def __init__(self, cars):
            self.cars = cars

        def __iter__(self):
            return ('car #%s' % (i+1) for i in range(self.cars))

for car in train:
• calls iter(train) to obtain a generator
• makes repeated calls to next(generator) until the generator raises StopIteration

    >>> train = Train(3)
    >>> for car in train:
    ...     print(car)
    car #1
    car #2
    car #3

Slide 40

Slide 40 text

Built-in functions that return iterables, iterators or generators
• dict
• enumerate
• frozenset
• list
• reversed
• set
• tuple

Slide 41

Slide 41 text

The itertools module
• boundless generators
  • count(), cycle(), repeat()
• generators which combine several iterables:
  • chain(), tee(), izip(), imap(), product(), compress()...
• generators which select or group items:
  • compress(), dropwhile(), groupby(), ifilter(), islice()...
• generators producing combinations of items:
  • product(), permutations(), combinations()...
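
A sketch with one or two functions from each group above, written for Python 3. The names izip(), imap() and ifilter() exist only in Python 2; Python 3 dropped them because the built-in zip, map and filter are already lazy. The sample inputs are illustrative.

```python
import itertools

# Boundless generator, trimmed to a finite prefix with islice().
evens = list(itertools.islice(itertools.count(0, 2), 5))

# Combining several iterables into one stream.
joined = list(itertools.chain('ab', [1, 2]))

# Selecting items: compress() keeps items whose selector is true,
# dropwhile() skips items while the predicate holds.
selected = list(itertools.compress('abcd', [1, 0, 1, 0]))
tail = list(itertools.dropwhile(lambda n: n < 3, [1, 2, 3, 1]))

# Grouping consecutive equal items.
runs = [(key, len(list(group))) for key, group in itertools.groupby('aabccc')]

# Combinations of items.
pairs = list(itertools.combinations('abc', 2))

print(evens, joined, selected, tail)
print(runs)
print(pairs)
```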

Slide 42

Slide 42 text

A practical example using generator functions
• Generator functions to decouple reading and writing logic in a database conversion tool designed to handle large datasets
https://github.com/ramalho/isis2json

Slide 43

Slide 43 text

Main loop writes JSON file

Slide 44

Slide 44 text

Another loop reads the input records

Slide 45

Slide 45 text

One implementation: same loop reads/writes

Slide 46

Slide 46 text

But what if we need to read another format?

Slide 47

Slide 47 text

Functions in the script
• iterMstRecords*
• iterIsoRecords*
• writeJsonArray
• main
* generator functions

Slide 48

Slide 48 text

main: read command line arguments

Slide 49

Slide 49 text

main: determine input format
• input generator function is selected based on the input file extension
• selected generator function is passed as an argument

Slide 50

Slide 50 text

writeJsonArray: write JSON records

Slide 51

Slide 51 text

writeJsonArray: iterates over one of the input generator functions
• selected generator function received as an argument...
• ...and called to produce input generator

Slide 52

Slide 52 text

iterIsoRecords: read records from ISO-2709 format file
• generator function!

Slide 53

Slide 53 text

iterIsoRecords
• creates a new dict in each iteration
• yields one record, structured as a dict

Slide 54

Slide 54 text

iterMstRecords: read records from ISIS .MST file
• generator function!

Slide 55

Slide 55 text

iterIsoRecords / iterMstRecords
• creates a new dict in each iteration
• yields one record, structured as a dict
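
The code for these slides did not survive in the transcript, so the following is only a sketch of the design they describe, not the isis2json source. Every name here (iter_csv_records, iter_tsv_records, write_json_array, main) and both toy input formats are hypothetical stand-ins: two reader generator functions each yield one dict per record, and a single writer receives the selected generator function as an argument and calls it.

```python
import io
import json


def iter_csv_records(text):
    """Hypothetical reader: yields one dict per 'id,name' line."""
    for line in text.splitlines():
        rec_id, name = line.split(',')
        yield {'id': rec_id, 'name': name}   # a new dict in each iteration


def iter_tsv_records(text):
    """Hypothetical reader for a second format: tab-separated lines."""
    for line in text.splitlines():
        rec_id, name = line.split('\t')
        yield {'id': rec_id, 'name': name}


def write_json_array(input_gen_func, source, output):
    """Writer: knows nothing about input formats, only consumes dicts."""
    output.write('[')
    for i, record in enumerate(input_gen_func(source)):
        if i > 0:
            output.write(',\n')
        output.write(json.dumps(record, sort_keys=True))
    output.write(']\n')


def main(source, fmt):
    """Selects the reader by format name and passes it to the writer."""
    readers = {'csv': iter_csv_records, 'tsv': iter_tsv_records}
    output = io.StringIO()
    write_json_array(readers[fmt], source, output)
    return output.getvalue()


result = main('1,alfa\n2,bravo', 'csv')
print(result)
```

Because the writer pulls one record at a time from the generator, only one record needs to be in memory at once, and supporting another input format means adding one more reader function.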

Slide 56

Slide 56 text

Generators at work

Slide 57

Slide 57 text

Generators at work

Slide 58

Slide 58 text

Generators at work

Slide 59

Slide 59 text

What we did not cover
• sending data into a generator function with the .send() method (instead of .next()), and using yield as an expression to get the data sent
• using generator functions as coroutines
Both are not very useful in the context of iteration:
“Coroutines are not related to iteration” (David Beazley)
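
For readers curious about the first item, here is a minimal sketch of .send() with yield used as an expression. The running_average function is a common illustrative example, not something shown in the talk.

```python
def running_average():
    """Coroutine-style generator: receives values, yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # yield as an expression: gets what send() passes
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                  # prime the generator: run up to the first yield
print(averager.send(10))
print(averager.send(20))
print(averager.send(30))
```

The initial next() call is needed because a value can only be sent to a generator that is suspended at a yield.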

Slide 60

Slide 60 text

Q & A
Luciano Ramalho
@ramalhoorg
https://github.com/ramalho/isis2json