A Functional Programming Approach To Data Processing In Python

LambdaConf Workshop on Functional Programming with Python

Reuben Cummings

May 26, 2017

Transcript

  1. A FUNCTIONAL PROGRAMMING
    APPROACH TO
    DATA PROCESSING IN PYTHON
    By Reuben Cummings
    LambdaConf — Boulder, Colorado — May 26, 2017

  2. Who am I?
    Reuben Cummings λ @reubano λ #LambdaConf
    Managing Director, Nerevu Development
    Founder, Arusha Coders
    Author of several popular Python packages

  3. WHAT IS DATA?
    SAY BIG DATA ONE MORE TIME
    I dare you, I double dare you!
    Image Credit: www.emaze.com

  4. Organization
    structured:
      language  presenter
      mercury   alex
      scala     gleb
      haskell   michael
    unstructured:
      "This session seeks to entertain and teach the developer who is already..."

  5. Storage
    flat/text:
      type,duration
      leap,360
      hop,120
      de novo,60
      inspire,10
    binary:
      00103e0 b0e6 04...
      00105f0 e4e7 03...
      0010600 0be8 04...
      00105b0 c4e4 02...
      00106e0 b0e9 04...

  6. Organization vs Storage
    organization: structured, unstructured
    storage: flat/text, binary

  7. What is data processing good for?

  8. Spotify's Discover Weekly
    playlist of new songs you like
    adapts to users' shifting musical tastes
    handles outliers and seasonality
    Image Credit: www.spotify.com/int/discoverweekly/

  9. What is functional programming good for?

  10. Om
    rapid UI re-renders
    serializable application state
    time travel/undo
    Image Credit: circleci.com

  11. A BRIEF INTRO TO PYTHON
    Ooouuu, stickers!
    Image Credit: www.pythongear.com

  12. Presentation: GitHub repo
    github.com/reubano/lambdaconf-tutorial

  13. Presentation: Jupyter Notebook
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (presentation.ipynb)

  14. Naive
    Image Credit: (Alika Seu) www.flickr.com

  15. Reading data

  16. Reading data (naive)
    from urllib.request import urlopen
    from json import loads
    BASE = 'https://api.github.com/search'
    _url1 = '{}/repositories?q={}'
    q = 'data&per_page=100'
    url1 = _url1.format(BASE, q)
    f = urlopen(url1)

  19. GitHub API
    Image Credit: https://api.github.com/search/repositories?q=data

  21. Reading data (naive)
    >>> data = loads(f.read().decode('utf-8'))
    >>> repos = data['items']
    >>> repos[0]['description']
    'Jargon from the functional programming world in simple terms!'
    >>> repos[0]['full_name']
    'hemanth/functional-programming-jargon'
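
The naive read pulls the entire HTTP response into memory before `loads` parses it into one Python structure. A minimal offline sketch of the same two steps, assuming a hypothetical in-memory payload shaped like the GitHub search response (a top-level `items` list) in place of the object returned by `urlopen`:

```python
from io import BytesIO
from json import loads

# Hypothetical stand-in for the file-like object returned by urlopen(url1);
# real search results carry many more fields per item.
f = BytesIO(
    b'{"total_count": 2, "items": ['
    b'{"full_name": "owner/repo-a", "watchers": 3}, '
    b'{"full_name": "owner/repo-b", "watchers": 1}]}'
)

# Eager: read every byte, decode it, then build the full structure at once
data = loads(f.read().decode('utf-8'))
repos = data['items']  # a plain list, so it can be indexed and re-iterated

print(repos[0]['full_name'])  # owner/repo-a
```

Because `repos` is a fully built list, nothing is processed until the whole payload has arrived and been parsed.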

  22. Processing data

  23. Processing data (naive)
    def rate(repos):
        rated = []
        for repo in repos:
            rated.append(repo['watchers'] * 2)
        return rated

  27. Processing data (naive)
    >>> rate(repos)[:5]
    [36520, 30174, 28576, 26842, 24092]
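
`rate` builds its result by mutating an accumulator list. As a sketch, the same eager computation can be written with `map` or a list comprehension, which removes the mutable `rated` list; the `sample` records below are hypothetical:

```python
def rate_map(repos):
    # Still eager: list() forces every element, just like the original loop
    return list(map(lambda repo: repo['watchers'] * 2, repos))


def rate_comprehension(repos):
    # Equivalent list comprehension
    return [repo['watchers'] * 2 for repo in repos]


# Hypothetical records containing only the field the functions read
sample = [{'watchers': 18260}, {'watchers': 15087}, {'watchers': 3}]
print(rate_map(sample))  # [36520, 30174, 6]
```

Both versions still materialize the whole list, so they share the naive version's problems with infinite or expensive inputs.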

  30. Processing infinite data (naive)
    >>> from itertools import count
    >>>
    >>> inf_repos = (
    ...     {'watchers': c} for c in count())
    >>>
    >>> rate(inf_repos)
    KeyboardInterrupt                  Traceback (most recent call last)
    in <module>()

  31. Processing expensive data (naive)
    def rate(repos):
        rated = []
        for repo in repos:
            rated.append(repo['watchers'] * 2)
        return rated

  34. Processing expensive data (naive)
    from time import sleep
    def exp_rate(repos):
        rated = []
        for repo in repos:
            sleep(5)  # simulate an expensive computation per repo
            rated.append(repo['watchers'] * 2)
        return rated

  36. Processing expensive data (naive)
    >>> exp_rate(repos)[:5]
    [36520, 30174, 28576, 26842, 24092]

  37. Lazy evaluation
    Image Credit: (Mark Turnauckas) www.flickr.com

  38. Lazy intro

  39. Iterators
    >>> eager_list = list(range(5))
    >>> eager_list
    [0, 1, 2, 3, 4]
    >>> lazy_list = iter(eager_list)
    >>> lazy_list
    >>> next(lazy_list)
    0

  40. Iterators
    >>> list(lazy_list)
    [1, 2, 3, 4]
    >>> next(lazy_list)
    StopIteration                      Traceback (most recent call last)
    in <module>()
    ----> 1 next(lazy_list)
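
An iterator is single-pass: once exhausted, `next` raises `StopIteration`, though the built-in `next` also accepts a default to return instead. A short sketch of the behaviour shown on the last two slides:

```python
eager_list = list(range(5))
lazy_list = iter(eager_list)

first = next(lazy_list)          # 0; the iterator advances by one
rest = list(lazy_list)           # [1, 2, 3, 4]; consumes everything left
done = next(lazy_list, None)     # None instead of raising StopIteration

# The underlying list is untouched, so a fresh iterator starts over
again = list(iter(eager_list))   # [0, 1, 2, 3, 4]
```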

  41. Reading data

  42. Reading data (lazy evaluation)
    $ pip install ijson

  43. Reading data (lazy evaluation)
    >>> from ijson import items
    >>>
    >>> f = urlopen(url1)
    >>> repos = items(f, 'items.item')
    >>> repos
    >>> repo = next(repos)
    >>> repo['full_name']
    'hemanth/functional-programming-jargon'

  44. Processing data

  45. Processing data (lazy evaluation)
    def rate(repos):
        rated = []
        for repo in repos:
            rated.append(repo['watchers'] * 2)
        return rated

  47. Processing data (lazy evaluation)
    def gen_rates(repos):
        for repo in repos:
            yield repo['watchers'] * 2

  48. Processing data (lazy evaluation)
    >>> gen_rates(repos)
    <generator object gen_rates at 0x…>
    >>> rates = gen_rates(repos)
    >>> next(rates)
    36520
    >>> next(rates)
    30174
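
Calling `gen_rates` only creates a generator object; no repo is read until `next` asks for a value. A sketch that makes this visible, using a hypothetical `source` generator that logs each repo it hands out (its watcher counts are made up so the rates match the slide):

```python
consumed = []  # records each repo the source hands out


def source():
    # Hypothetical repo stream: 18260 * 2 == 36520, 15087 * 2 == 30174, ...
    for watchers in (18260, 15087, 14288):
        consumed.append(watchers)
        yield {'watchers': watchers}


def gen_rates(repos):
    for repo in repos:
        yield repo['watchers'] * 2


rates = gen_rates(source())
print(consumed)     # []: creating the generator reads nothing
print(next(rates))  # 36520
print(consumed)     # [18260]: only one repo has been read so far
```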

  49. Processing infinite data (lazy evaluation)
    >>> rates = gen_rates(inf_repos)
    >>> next(rates)
    42220156
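
The first rate here is not 0 because `inf_repos` is the same generator that the interrupted `rate(inf_repos)` call on slide 30 had already advanced; consumed items do not come back. A sketch with a fresh infinite stream, taking a finite prefix with `itertools.islice`:

```python
from itertools import count, islice


def gen_rates(repos):
    for repo in repos:
        yield repo['watchers'] * 2


# A fresh infinite stream: {'watchers': 0}, {'watchers': 1}, ...
inf_repos = ({'watchers': c} for c in count())

# islice stops after five items, so only five repos are ever generated
first_five = list(islice(gen_rates(inf_repos), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```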

  50. Processing expensive data (lazy evaluation)
    def gen_exp_rates(repos):
        for repo in repos:
            sleep(5)
            yield repo['watchers'] * 2

  54. Processing expensive data (lazy evaluation)
    >>> from itertools import islice
    >>>
    >>> rates = gen_exp_rates(repos)
    >>> result = islice(rates, 5)
    >>> list(result)
    [36520, 30174, 28576, 26842, 24092]
    >>> next(rates)
    648
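
With `sleep(5)` per repo, the naive `exp_rate` has to process every repo before `[:5]` can slice its result (roughly 500 seconds for a 100-repo page), while `islice(rates, 5)` triggers only five evaluations (roughly 25 seconds). A sketch that swaps the `sleep` for a call counter (the helper `expensive_double` and the sample data are hypothetical) makes the difference countable:

```python
from itertools import islice

calls = []


def expensive_double(watchers):
    # Stands in for the slides' sleep(5): records each evaluation instead
    calls.append(watchers)
    return watchers * 2


def exp_rate(repos):
    # Eager: evaluates every repo before returning
    return [expensive_double(repo['watchers']) for repo in repos]


def gen_exp_rates(repos):
    # Lazy: evaluates one repo per next()
    for repo in repos:
        yield expensive_double(repo['watchers'])


repos = [{'watchers': w} for w in range(100)]  # hypothetical sample

eager = exp_rate(repos)[:5]
eager_calls = len(calls)   # 100 evaluations for 5 results

calls.clear()
rates = gen_exp_rates(repos)
lazy = list(islice(rates, 5))
lazy_calls = len(calls)    # 5 evaluations for 5 results
```

As on the slide, the generator resumes where `islice` stopped, so a later `next(rates)` evaluates only the sixth repo.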

  55. Grouping data

  56. Grouping data
    >>> f = urlopen(url1)
    >>> repos = items(f, 'items.item')
    >>> repo = next(repos)
    >>> repo.keys()
    dict_keys(['id', 'name', 'full_name', 'owner', 'private',
               'html_url', 'description', 'fork', 'url',
               'forks_url', 'keys_url', ...])

  57. Grouping data
    >>> repo['has_issues']
    True

  58. Grouping data
    >>> import itertools as it
    >>> from operator import itemgetter
    >>>
    >>> keyfunc = itemgetter('has_issues')
    >>> sorted_repos = sorted(repos, key=keyfunc)
    >>> grouped = it.groupby(
    ...     sorted_repos, keyfunc)
    >>> data = (
    ...     (k, len(list(g))) for k, g in grouped)

  59. Grouping data
    >>> next(data)
    (False, 3)
    >>> next(data)
    (True, 96)
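
`itertools.groupby` only merges *consecutive* items with equal keys, which is why the slide sorts by the same key first. A sketch on hypothetical records showing what happens without the sort, plus `collections.Counter` as a sort-free alternative when only counts are needed:

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

repos = [  # hypothetical sample records
    {'name': 'a', 'has_issues': True},
    {'name': 'b', 'has_issues': False},
    {'name': 'c', 'has_issues': True},
]
keyfunc = itemgetter('has_issues')

# Unsorted: the two True records are not adjacent, so they form two groups
unsorted_counts = [(k, len(list(g))) for k, g in groupby(repos, keyfunc)]
# [(True, 1), (False, 1), (True, 1)]

# Sorted first, as on the slide: one group per key (False sorts before True)
sorted_counts = [
    (k, len(list(g)))
    for k, g in groupby(sorted(repos, key=keyfunc), keyfunc)
]
# [(False, 1), (True, 2)]

# Counting only: a single pass, no sort required
counts = Counter(map(keyfunc, repos))
# Counter({True: 2, False: 1})
```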

  60. Memoization
    Image Credit: (olho wodzynski) www.flickr.com

  61. Processing data

  62. Processing expensive data (memoization)
    def gen_exp_rates(repos):
        for repo in repos:
            sleep(5)
            yield repo['watchers'] * 2

  63. Processing expensive data (memoization)
    def calc_rate(watchers):
        sleep(5)
        return watchers * 2
    def gen_exp_rates(repos):
        for repo in repos:
            yield calc_rate(repo['watchers'])

  64. Processing expensive data (memoization)
    from functools import lru_cache
    def _calc_rate(watchers):
        sleep(5)
        return watchers * 2
    cacher = lru_cache()
    calc_rate = cacher(_calc_rate)

  65. Processing expensive data (memoization)
    from functools import lru_cache
    @lru_cache()
    def calc_rate(watchers):
        sleep(5)
        return watchers * 2
    def gen_exp_rates(repos):
        for repo in repos:
            yield calc_rate(repo['watchers'])

  66. Processing expensive data (memoization)
    >>> repos = it.repeat({'watchers': 5})
    >>> rates = gen_exp_rates(repos)
    >>> result = islice(rates, 5)
    >>> list(result)
    [10, 10, 10, 10, 10]
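
With `@lru_cache()`, the five identical calls above run the expensive body only once. A sketch (with the `sleep` removed so it runs instantly) that checks this through `cache_info()`, and that shows why `calc_rate` takes the integer `watchers` rather than the whole repo: `lru_cache` keys its cache on the arguments, which must be hashable, and a `dict` is not. The name `calc_repo_rate` is hypothetical.

```python
from functools import lru_cache
from itertools import islice, repeat


@lru_cache()
def calc_rate(watchers):
    # The slides sleep(5) here; omitted so this sketch runs instantly
    return watchers * 2


def gen_exp_rates(repos):
    for repo in repos:
        yield calc_rate(repo['watchers'])


result = list(islice(gen_exp_rates(repeat({'watchers': 5})), 5))
info = calc_rate.cache_info()  # 1 miss (the first call), then 4 hits


@lru_cache()
def calc_repo_rate(repo):
    # Hypothetical variant that caches on the whole repo dict
    return repo['watchers'] * 2


try:
    calc_repo_rate({'watchers': 5})
    dict_arg_ok = True
except TypeError:  # unhashable type: 'dict'
    dict_arg_ok = False
```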

  67. EXERCISE #1
    Mount Meru — Arusha, Tanzania
    Image Credit: Reuben Cummings

  68. Exercise #1: Problem
    Display the total number of watchers per language
    (ignore repos without a language).

  69. Exercise #1: Result
    C#                    32
    C++                   63
    HTML                 349
    JavaScript          3881
    Jupyter Notebook    5481
    PHP                  201
    Python             37007
    R                     18

  70. Exercise #1: Data source
    https://api.github.com/search/repositories?q=data

  71. Exercise #1: Jupyter Notebook
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (exercises.ipynb)

  72. Exercise #1: Solution
    from urllib.request import urlopen
    from itertools import groupby
    from operator import itemgetter
    from ijson import items
    BASE = 'https://api.github.com/search'  # as defined on slide 16
    url2 = '{}/repositories?q=data'.format(BASE)
    f = urlopen(url2)
    repos = items(f, 'items.item')

  73. Exercise #1: Solution
    keyfunc = itemgetter('language')
    cleaned = filter(keyfunc, repos)
    records = sorted(cleaned, key=keyfunc)
    grouped = groupby(records, keyfunc)
    for key, group in grouped:
        cnt = sum(g['watchers'] for g in group)
        print(key, cnt)
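
The solution can be exercised without network access. A sketch using hypothetical in-memory records in place of the `ijson` stream, alongside a single-pass `defaultdict` variant that avoids the sort. Note that `filter(keyfunc, repos)` drops any record whose language is falsy, i.e. `None` (JSON `null`) or an empty string.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

repos = [  # hypothetical sample records
    {'full_name': 'a/one', 'language': 'Python', 'watchers': 10},
    {'full_name': 'b/two', 'language': None, 'watchers': 99},
    {'full_name': 'c/three', 'language': 'HTML', 'watchers': 3},
    {'full_name': 'd/four', 'language': 'Python', 'watchers': 5},
]


def totals_groupby(repos):
    # Same steps as the slide, collected into a dict instead of printed
    keyfunc = itemgetter('language')
    records = sorted(filter(keyfunc, repos), key=keyfunc)
    return {
        key: sum(r['watchers'] for r in group)
        for key, group in groupby(records, keyfunc)
    }


def totals_single_pass(repos):
    # One pass, no sort; keys appear in order of first occurrence
    totals = defaultdict(int)
    for repo in repos:
        if repo['language']:
            totals[repo['language']] += repo['watchers']
    return dict(totals)


print(totals_groupby(repos))  # {'HTML': 3, 'Python': 15}
```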

  74. Exercise #1: Solution
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (solutions.ipynb)

  75. INTRODUCING MEZA
    Because you might not need Pandas
    Image Credit: github.com/reubano/meza

  76. Meza demo
    $ pip install meza

  77. Meza demo: Jupyter Notebook
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (presentation.ipynb)

  78. Reading data

  79. Reading data
    >>> from urllib.request import urlopen
    >>> from meza.io import read_json
    >>>
    >>> f = urlopen(url2)
    >>> records = read_json(f, path='items.item')
    >>> repo = next(records)
    >>> repo['full_name']
    'emberjs/data'

  80. Reading data
    >>> len(list(records))
    29

  81. Reading data
    >>> from io import StringIO
    >>> from meza.io import read_csv
    >>>
    >>> f = StringIO(
    ...     'greeting,location\nhello,world\n')
    >>>
    >>> next(read_csv(f))
    {'greeting': 'hello', 'location': 'world'}
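
For comparison, the standard library's `csv.DictReader` is also lazy and yields one mapping per row, keyed by the header row. This is only a stdlib point of reference, not how `meza.io.read_csv` is implemented:

```python
import csv
from io import StringIO

f = StringIO('greeting,location\nhello,world\n')
reader = csv.DictReader(f)  # rows are parsed on demand as it is iterated
row = next(reader)
print(row)  # {'greeting': 'hello', 'location': 'world'} on Python 3.8+
```

On Python 3.6 and 3.7 the row is an `OrderedDict`, which still compares equal to the plain `dict`.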

  82. Reading data
    >>> from os import path as p
    >>> from meza.io import join
    >>>
    >>> url3 = '{}&page=2'.format(url2)
    >>> files = map(urlopen, [url2, url3])
    >>> records = join(
    ...     *files, ext='json', path='items.item')

  85. Reading data
    >>> repo = next(records)
    >>> repo['full_name']
    'emberjs/data'
    >>> repo['language']
    'JavaScript'
    >>> len(list(records))
    59
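
Joining several record streams end to end can also be done lazily with `itertools.chain`. A sketch that assumes the pages are already parsed into hypothetical record iterators; unlike `meza.io.join` on the slide, it does not open or parse any files itself:

```python
from itertools import chain

# Hypothetical, already-parsed pages of search results
page1 = iter([{'full_name': 'owner/a'}, {'full_name': 'owner/b'}])
page2 = iter([{'full_name': 'owner/c'}])

records = chain(page1, page2)  # lazy: drains page1, then moves to page2
first = next(records)
remaining = [r['full_name'] for r in records]
print(first['full_name'], remaining)  # owner/a ['owner/b', 'owner/c']
```

For an arbitrary iterable of streams, `chain.from_iterable` does the same without unpacking them into arguments.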

  86. Transforming data

  87. Transforming data
    >>> from meza.process import merge
    >>>
    >>> records = [
    ...     {'a': 200}, {'b': 300}, {'c': 400}]
    >>>
    >>> merge(records)
    {'a': 200, 'b': 300, 'c': 400}
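
Merging a sequence of dicts is a natural fold. A minimal sketch with `functools.reduce`; how `meza.process.merge` resolves duplicate keys is not shown on the slide, so the rule that later records win is an assumption of this sketch, and `merge_dicts` is a hypothetical name:

```python
from functools import reduce


def merge_dicts(records):
    # Fold left to right; on duplicate keys the later record wins (assumption)
    return reduce(lambda acc, record: {**acc, **record}, records, {})


print(merge_dicts([{'a': 200}, {'b': 300}, {'c': 400}]))
# {'a': 200, 'b': 300, 'c': 400}
```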

  88. Transforming data
    >>> from meza.process import group
    >>>
    >>> records = [
    ...     {'item': 'a', 'amount': 200},
    ...     {'item': 'a', 'amount': 200},
    ...     {'item': 'b', 'amount': 400}]
    >>>
    >>> grouped = group(records, 'item')

  89. Transforming data
    >>> key, _group = next(grouped)
    >>> key
    'a'
    >>> _group
    [{'amount': 200, 'item': 'a'},
     {'amount': 200, 'item': 'a'}]

  90. Transforming data
    >>> from meza import process as pr
    >>>
    >>> f = urlopen(url2)
    >>> raw = read_json(f, path='items.item')
    >>> fields = [
    ...     'full_name', 'language', 'watchers',
    ...     'score', 'has_wiki']
    >>>
    >>> cut = pr.cut(raw, fields)

  91. Transforming data
    >>> cut
    <generator object … at 0x10b0410f8>
    >>> cut, preview = pr.peek(cut)
    >>> cut
    >>> len(preview)
    5
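
`pr.peek` hands back the stream together with a preview list, so a generator can be inspected without losing the inspected items. A minimal sketch of that idea with `islice` and `chain`; the name `peek_records` and the default of five items are assumptions (the slide only shows `len(preview)` returning 5), and this is not meza's code:

```python
from itertools import chain, islice


def peek_records(records, n=5):
    """Return (stream, preview): preview holds up to n leading items,
    and stream still yields every item, previewed ones included."""
    records = iter(records)
    preview = list(islice(records, n))
    return chain(preview, records), preview


stream, preview = peek_records(x * 10 for x in range(8))
print(preview)       # [0, 10, 20, 30, 40]
print(list(stream))  # [0, 10, 20, 30, 40, 50, 60, 70]
```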

  92. Transforming data
    >>> preview[0]
    {'full_name': 'substance/data',
     'has_wiki': True,
     'language': 'JavaScript',
     'score': Decimal('72.90926'),
     'watchers': 678}

  93. Transforming data
    >>> filled = pr.fillempty(
    ...     raw, value='', fields=['language'])
    >>>
    >>> pivoted = pr.pivot(
    ...     filled, 'score', 'language',
    ...     rows=['has_wiki'], op=min)

  94. Transforming data
    >>> next(pivoted)
    {'HTML': Decimal('73.52254'),
     'JavaScript': Decimal('53.48755'),
     'PHP': Decimal('41.3122'),
     'Python': Decimal('42.49319'),
     'has_wiki': False}

  95. Transforming data
    >>> next(pivoted)
    {'': Decimal('44.83392'),
     'C#': Decimal('47.793495'),
     'HTML': Decimal('69.20008'),
     'JavaScript': Decimal('70.15174'),
     'PHP': Decimal('44.251198'),
     'Python': Decimal('45.78215'),
     'R': Decimal('46.23451'),
     'has_wiki': True}

  96. Transforming data (before)
    | full_name  | language   | score  | has_wiki |
    | ---------- | ---------- | ------ | -------- |
    | 'aptnote…' | ''         | 76.11… | True     |
    | 'GSA/dat…' | 'HTML'     | 73.52… | False    |
    | 'substan…' | 'JavaScr…' | 72.83… | True     |
    | 'GoogleT…' | 'JavaScr…' | 70.15… | True     |
    | 'curran/…' | 'HTML'     | 69.20… | True     |

  97. Transforming data (after)
    | has_wiki | ''       | HTML     | JavaScript |
    | -------- | -------- | -------- | ---------- |
    | False    |          | 73.52254 |            |
    | True     | 76.11933 | 69.20008 | 70.15174   |
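
Pivoting turns the distinct values of one field (`language`) into columns and aggregates another field (`score`) within each row group (`has_wiki`), here with `min` as on slide 93. A sketch under those assumptions with hypothetical sample records; the parameter names mirror the slide's call, but the function is not meza's implementation. Unlike most of the pipeline, a pivot has to see every record before it can emit a row, so this version is not lazy in its input.

```python
from collections import defaultdict


def pivot(records, data, column, rows, op=min):
    # cells[row_key][column_value] -> list of data values
    cells = defaultdict(lambda: defaultdict(list))
    for record in records:
        row_key = tuple(record[field] for field in rows)
        cells[row_key][record[column]].append(record[data])

    for row_key in sorted(cells):
        out = dict(zip(rows, row_key))
        for col_value, values in cells[row_key].items():
            out[col_value] = op(values)  # e.g. the minimum score per language
        yield out


records = [  # hypothetical sample records
    {'language': 'HTML', 'score': 73.5, 'has_wiki': False},
    {'language': 'HTML', 'score': 69.2, 'has_wiki': True},
    {'language': 'HTML', 'score': 80.0, 'has_wiki': True},
    {'language': 'JavaScript', 'score': 70.1, 'has_wiki': True},
]

for row in pivot(records, 'score', 'language', rows=['has_wiki']):
    print(row)
# {'has_wiki': False, 'HTML': 73.5}
# {'has_wiki': True, 'HTML': 69.2, 'JavaScript': 70.1}
```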

  98. EXERCISE #2
    Mount Kilimanjaro — Kilimanjaro Region, Tanzania
    Image Credit: Reuben Cummings

  99. Exercise #2: Problem
    Display the language with the most watchers
    for each combination of owner_type and has_pages.

  100. Exercise #2: Result (partial)
    {'has_pages': True,
     'language': 'JavaScript',
     'owner_type': 'Organization',
     'watchers': 128605}

  101. Exercise #2: Data source
    https://api.github.com/search/repositories?q=data&sort=stars&order=desc

  102. Exercise #2: Hint
    from meza.fntools import flatten
    # and one of the following
    from meza.process import normalize  # this
    from meza.process import aggregate  # or this
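
The solution refers to an `owner_type` field, while the GitHub API nests the owner's type inside an `owner` object. A sketch of flattening nested dicts by joining keys with `_`; the joining rule is inferred from that field name, and `flatten_record` is a hypothetical stand-in, not `meza.fntools.flatten`:

```python
def flatten_record(record, prefix=''):
    """Yield (key, value) pairs, joining nested dict keys with '_'."""
    for key, value in record.items():
        name = f'{prefix}_{key}' if prefix else key
        if isinstance(value, dict):
            yield from flatten_record(value, name)
        else:
            yield name, value


repo = {  # hypothetical, heavily trimmed API record
    'language': 'C++',
    'has_pages': False,
    'owner': {'login': 'someorg', 'type': 'Organization'},
}
flat = dict(flatten_record(repo))
print(flat['owner_type'])  # Organization
```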

  103. Exercise #2: Jupyter Notebook
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (exercises.ipynb)

  104. Exercise #2: Solution
    from urllib.request import urlopen
    from operator import itemgetter
    from functools import partial
    from meza import process as pr, fntools as ft
    from meza.io import read_json
    BASE = 'https://api.github.com/search'  # as defined on slide 16
    q = 'data&sort=stars&order=desc'
    url4 = '{}/repositories?q={}'.format(BASE, q)
    f = urlopen(url4)

  105. Exercise #2: Solution
    records = read_json(f, path='items.item')
    filled = pr.fillempty(
        records, value='', fields=['language'])
    flat = (dict(ft.flatten(r)) for r in filled)
    args = ('watchers', 'language')
    rows = ['has_pages', 'owner_type']

  106. Exercise #2: Solution
    spun = pr.pivot(
        flat, *args, rows=rows, op=sum)
    spun, preview = pr.peek(spun)

  107. Exercise #2: Solution
    >>> preview[0]
    {'C#': 7675,
     'C++': 55602,
     'Go': 13223,
     'Objective-C': 10556,
     ...
     'has_pages': False,
     'owner_type': 'Organization'}

  108. Exercise #2: Solution
    >>> kw = {'rows': rows, 'invert': True}
    >>> normal = pr.normalize(spun, *args, **kw)
    >>> normal, preview = pr.peek(normal)
    >>> preview[0]
    {'has_pages': False,
     'language': 'Objective-C',
     'owner_type': 'Organization',
     'watchers': 10556}

  109. Exercise #2: Solution
    akeyfunc = itemgetter('watchers')
    gkeyfunc = lambda x: tuple(x[r] for r in rows)
    aggregator = partial(max, key=akeyfunc)
    kwargs = {
        'tupled': False, 'aggregator': aggregator}
    grouped = pr.group(normal, gkeyfunc, **kwargs)
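
Here `partial(max, key=akeyfunc)` acts as the per-group aggregator: applied to one group's records, it returns the whole record with the most watchers. A sketch of that step using `sorted` and `groupby` in place of `pr.group` (an assumption about the grouping mechanics, not meza's code), on sample records whose values are borrowed from the slide previews:

```python
from functools import partial
from itertools import groupby
from operator import itemgetter

rows = ['has_pages', 'owner_type']
akeyfunc = itemgetter('watchers')
aggregator = partial(max, key=akeyfunc)


def gkeyfunc(record):
    # Same key as the slide's lambda: one tuple per (has_pages, owner_type)
    return tuple(record[r] for r in rows)


normal = [  # sample records; values borrowed from the slide previews
    {'has_pages': False, 'owner_type': 'Organization',
     'language': 'Objective-C', 'watchers': 10556},
    {'has_pages': False, 'owner_type': 'Organization',
     'language': 'C++', 'watchers': 55602},
    {'has_pages': False, 'owner_type': 'User',
     'language': 'Python', 'watchers': 54269},
]

best = [
    aggregator(group)
    for _, group in groupby(sorted(normal, key=gkeyfunc), gkeyfunc)
]
for record in best:
    print(record['owner_type'], record['language'], record['watchers'])
# Organization C++ 55602
# User Python 54269
```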

  110. Exercise #2: Solution
    >>> grouped, preview = pr.peek(grouped)
    >>> preview[0]
    {'has_pages': False,
     'language': 'C++',
     'owner_type': 'Organization',
     'watchers': 55602}

  111. Exercise #2: Solution
    sgrouped = sorted(
        grouped, key=akeyfunc, reverse=True)
    for record in sgrouped:
        print(record)

  112. Exercise #2: Result (full)
    | language  | watchers | owner_ty… | has_pages |
    | --------- | -------- | --------- | --------- |
    | 'JavaS…'  | 128605   | 'Organi…' | True      |
    | 'C++'     | 55602    | 'Organi…' | False     |
    | 'Python'  | 54269    | 'User'    | False     |
    | 'Jupyte…' | 12046    | 'User'    | True      |

  113. Exercise #2: Solution
    beta.mybinder.org/v2/gh/reubano/lambdaconf-tutorial/master
    (solutions.ipynb)

  114. Thanks!
    Reuben Cummings
    @reubano

  115. Extra Slides

  116. Processing data (lazy evaluation)
    def gen_rates(repos):
        for repo in repos:
            yield repo['watchers'] * 2

  117. Processing data (lazy evaluation)
    def gen_rates(repos):
        return (
            r['watchers'] * 2 for r in repos)

  118. Exercise #2: Alt. solution
    from urllib.request import urlopen
    from operator import itemgetter
    from functools import partial
    from meza import process as pr, fntools as ft
    from meza.io import read_json
    BASE = 'https://api.github.com/search'  # as defined on slide 16
    q = 'data&sort=stars&order=desc'
    url4 = '{}/repositories?q={}'.format(BASE, q)
    f = urlopen(url4)

  119. Exercise #2: Alt. solution
    records = read_json(f, path='items.item')
    filled = pr.fillempty(
        records, value='', fields=['language'])
    flat = (dict(ft.flatten(r)) for r in filled)
    akeyfunc = itemgetter('watchers')

  120. Exercise #2: Alt. solution
    rows = [
        'has_pages', 'owner_type', 'language',
        'watchers']
    def grouper(records, rows, aggregator):
        kwargs = {'aggregator': aggregator}
        key = lambda x: tuple(x[r] for r in rows)
        _grouper = partial(pr.group, tupled=False)
        return _grouper(records, key, **kwargs)

  121. Exercise #2: Alt. solution
    def agg1(records):
        args = (records, 'watchers', sum)
        return pr.aggregate(*args)
    grouped = grouper(flat, rows[:3], agg1)
    agg2 = partial(max, key=akeyfunc)
    regrouped = grouper(grouped, rows[:2], agg2)
    cut = pr.cut(regrouped, rows)

  122. Exercise #2: Alt. solution
    >>> cut, preview = pr.peek(cut)
    >>> preview[0]
    {'has_pages': False,
     'language': 'C++',
     'owner_type': 'Organization',
     'watchers': 55602}

  123. Exercise #2: Alt. solution
    sgrouped = sorted(
        cut, key=akeyfunc, reverse=True)
    for record in sgrouped:
        print(record)