Stream processing made easy with riko

Interactive workshop on streams and stream processing with the Python library riko

Reuben Cummings

September 29, 2016

Transcript

  1. Stream processing made easy with riko
    DevCraft - Nairobi, KE
    Sep 29, 2016
    by Reuben Cummings
    @reubano #DevCraftKE

  2. Who am I?
    Managing Director, Nerevu Development
    Lead organizer of Arusha Coders
    Author of several popular Python packages (riko, meza, pygogo)

  3. Topics & Format
    data, streams, and stream processing
    code samples and interactive exercises
    hands-on (don't be a spectator)

  4. what is data?

  5. Organization
    structured:
      country    capital
      Kenya      Nairobi
      Tanzania   Dodoma
      Rwanda     Kigali
    unstructured:
      "O God of all creation. Bless this our land and nation. Justice be our shield..."

  6. Storage
    binary (hex dump)
    00105e0 b0e6 0408 9ee7 0408 bce7 0408 d5e7 0408
    00105f0 e4e7 0408 b0e6 0408 f0e7 0408 ffe7 0408
    0010600 0be8 0408 1ae8 0408 b0e6 0408 b0e6 0408
    00105f0 e4e7 0408 b0e6 0408 f0e7 0408 ffe7 0408
    00105e0 b0e6 0408 9ee7 0408 bce7 0408 d5e7 0408
    0010600 0be8 0408 1ae8 0408 b0e6 0408 b0e6 0408
    hexadecimal number
  7. Storage
    binary (hex dump)
    1 byte

  8. Storage
    8 bits
    binary (hex dump)

  9. Storage
    2^8 = 256
    binary (hex dump)

  10. Storage
    0 - 255
    binary (hex dump)
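
The byte arithmetic on the storage slides can be checked directly in Python. This is a minimal sketch: the sample value `b0e6` is copied from the first row of the hex dump, and the variable names are only illustrative.

```python
# One byte is 8 bits, so it can hold 2 ** 8 distinct values: 0 through 255.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE
largest_byte = values_per_byte - 1

# Two hexadecimal digits encode exactly one byte (16 * 16 == 256),
# so the four-digit group 'b0e6' from the dump holds two bytes.
group = bytes.fromhex('b0e6')
byte_values = list(group)

print(values_per_byte)  # 256
print(largest_byte)     # 255
print(byte_values)      # [176, 230]
```

Each value in `byte_values` falls inside the 0 - 255 range shown on the slide.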

  11. flat/text
    Storage
    greeting,loc,rating
    hello,world,3
    good bye,moon,7
    welcome,stars,5
    what's up,sky,2
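
Flat text like this can be read with the standard library alone. The sketch below assumes the sample is held in a string rather than a file; `raw`, `rows`, and `ratings` are illustrative names, not part of the workshop material.

```python
import csv
import io

# The flat-text sample from the slide, held in memory instead of a file.
raw = """greeting,loc,rating
hello,world,3
good bye,moon,7
welcome,stars,5
what's up,sky,2
"""

# DictReader uses the header row as keys and yields one dict per record.
rows = list(csv.DictReader(io.StringIO(raw)))

# Every field arrives as text, so numeric columns need explicit conversion.
ratings = [int(row['rating']) for row in rows]

print(rows[0]['greeting'])  # hello
print(ratings)              # [3, 7, 5, 2]
```

The conversion step is the main practical difference from binary storage: a flat file records no types, only characters.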

  12. binary flat/text
    Organization vs Storage
    structured
    unstructured
    maasai mara
    hell's gate

  13. sample json

  14. [
    {
    "greeting": "hello",
    "location": "world",
    "enthusiasm": 3
    }, {
    "greeting": "good bye",
    "location": "moon",
    "enthusiasm": 7
    }
    ]
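
As a rough illustration of how this sample becomes Python data, the standard `json` module turns the array into a list of dicts. The names `raw`, `records`, `locations`, and `total` are assumptions made for this sketch.

```python
import json

# The JSON sample from the slide, as a string.
raw = '''[
  {"greeting": "hello", "location": "world", "enthusiasm": 3},
  {"greeting": "good bye", "location": "moon", "enthusiasm": 7}
]'''

# json.loads maps JSON arrays to lists, objects to dicts, numbers to int/float.
records = json.loads(raw)

locations = [record['location'] for record in records]
total = sum(record['enthusiasm'] for record in records)

print(locations)  # ['world', 'moon']
print(total)      # 10
```

Unlike the flat-text sample, the numbers keep their type, so `enthusiasm` can be summed without conversion.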

  15. what are streams?

  16. >>> stream = 'abracadabra'
    >>> stream[0]
    'a'
    >>> stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    >>> stream[0]
    1
    >>> stream = ['hello', 'devcraft', 'attendees']
    >>> stream[0]
    'hello'
    >>> stream = [
    ... {'num': 0}, {'num': 1}, {'num': 2}]
    >>> stream[0]
    {'num': 0}
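
Indexing with `stream[0]` works for the strings and lists above, but a stream in the broader sense only needs to hand out items one at a time. A minimal sketch using the built-in `iter` and `next`, with the same sample values (the variable names are illustrative):

```python
streams = [
    'abracadabra',
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    ['hello', 'devcraft', 'attendees'],
    [{'num': 0}, {'num': 1}, {'num': 2}],
]

# iter() wraps each container in an iterator; next() pulls a single item.
firsts = [next(iter(stream)) for stream in streams]
print(firsts)  # ['a', 1, 'hello', {'num': 0}]

# An iterator remembers its position, so repeated next() calls advance it.
letters = iter('abracadabra')
first_three = [next(letters), next(letters), next(letters)]
print(first_three)  # ['a', 'b', 'r']
```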

  17. how do you construct streams?

  18. >>> stream = input('--> ')

  19. >>> stream = input('--> ')
    -->

  20. >>> stream = input('--> ')
    --> abracadabra

  21. >>> stream = input('--> ')
    --> abracadabra
    >>> stream
    'abracadabra'
    >>> s = 'hello devcraft attendees'
    >>> s.split(' ')
    ['hello', 'devcraft', 'attendees']
    >>> list(range(1, 11))
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    >>> [{'num': x} for x in range(4)]
    [{'num': 0}, {'num': 1}, {'num': 2}, {'num': 3}]
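
The constructions above all build the whole stream in memory. One possible alternative, sketched here with a generator function, produces items only when asked; `numbered` is a hypothetical helper written for this example, not something from riko or the workshop.

```python
from itertools import count, islice


def numbered(start=0):
    """Yield {'num': n} dicts indefinitely, starting at `start`."""
    for n in count(start):
        yield {'num': n}


stream = numbered()

# islice takes a bounded slice from a stream that never ends by itself.
first_four = list(islice(stream, 4))
print(first_four)  # [{'num': 0}, {'num': 1}, {'num': 2}, {'num': 3}]

# The generator resumes where it stopped.
next_item = next(stream)
print(next_item)  # {'num': 4}
```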

  22. how do you process streams?

  23. >>> ints = range(1, 10)
    >>> doubled = [2 * x for x in ints]
    >>> doubled
    [2, 4, 6, 8, 10, 12, 14, 16, 18]
    >>> big = [x for x in doubled if x > 10]
    >>> big
    [12, 14, 16, 18]
    >>> [x / 3 for x in big]
    [4.0, 4.666666666666667, 5.333333333333333, 6.0]
    >>> (x / 3 for x in big)
    <generator object <genexpr> at 0x103c10830>
    >>> next(x / 3 for x in big)
    4.0
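
The same steps can be chained without building any intermediate list. This is a sketch of that idea using only generator expressions; it is plain Python rather than riko, and the names `thirds`, `first`, and `rest` are illustrative.

```python
ints = range(1, 10)

# Each stage is a generator expression: nothing runs until an item is requested.
doubled = (2 * x for x in ints)
big = (x for x in doubled if x > 10)
thirds = (x / 3 for x in big)

# next() pulls just enough input through every stage to produce one result.
first = next(thirds)

# The remaining items are still waiting in the pipeline; rounded for display.
rest = [round(x, 4) for x in thirds]

print(first)  # 4.0
print(rest)   # [4.6667, 5.3333, 6.0]
```

Note the difference from `next(x / 3 for x in big)` on the slide: that expression builds a fresh generator over the list on every call, so it returns `4.0` each time, while a generator bound to a name advances.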

  24. so what!

  25. RSS feeds (feedly)

  26. aggregators (kayak)

  27. mashups (portwiture)

  28. introducing riko
    github.com/nerevu/riko

  29. let's get some data

  30. Kenya Open Data (opendata.go.ke)

  31. API access

  32. IPython Demo
    bit.ly/riko-demo
    (examples)

  33. IPython Demo
    bit.ly/riko-demo
    (exercises)

  34. exercise #1

  35. number of schools per district

  36. [
    {'BUTERE/MUMIAS': 1},
    {'HOMA BAY': 1},
    {'KIAMBU': 1},
    {'MACHAKOS': 1},
    {'MAKUENI': 1},
    {'MARAGUA': 2},
    {'MBEERE': 1},
    {'MOMBASA': 2},
    {'NAIROBI': 5},
    {'TRANS NZOIA': 1}
    ]
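
For readers without the demo notebook, the shape of this result can be reproduced in plain Python. This is not the workshop's riko solution, only a sketch of the counting step: the records and the `district` field name are invented for illustration, and the real dataset may be laid out differently.

```python
from collections import Counter

# Invented sample records; field names in the real dataset may differ.
schools = [
    {'name': 'School A', 'district': 'NAIROBI'},
    {'name': 'School B', 'district': 'MOMBASA'},
    {'name': 'School C', 'district': 'NAIROBI'},
    {'name': 'School D', 'district': 'KIAMBU'},
]

# Count one occurrence per record, keyed by district.
counts = Counter(school['district'] for school in schools)

# Emit one single-key dict per district, sorted by name like the slide.
result = [{district: n} for district, n in sorted(counts.items())]
print(result)  # [{'KIAMBU': 1}, {'MOMBASA': 1}, {'NAIROBI': 2}]
```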

  37. exercise #2

  38. boarding only students per division

  39. [
    {'ASEGO': Decimal('277')},
    {'BUTERE': Decimal('224')},
    {'DAGORETTI': Decimal('903')},
    {'EMBAKASI': Decimal('138')},
    {'ISLAND': Decimal('14')},
    {'KANDARA': Decimal('74')},
    {'KASIKEU': Decimal('20')},
    {'KIBERA': Decimal('355')},
    {'KIKUYU': Decimal('69')},
    {'KISAUNI': Decimal('424')},
    ...
    ]
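
A filter followed by a grouped sum yields output of this shape. The sketch below uses plain Python rather than riko, with invented records: the field names (`division`, `boarding`, `students`), the division names, and the figures are all assumptions for illustration and do not come from the Kenya Open Data set.

```python
from collections import defaultdict
from decimal import Decimal

# Invented sample records; values are text, as they would be in a CSV export.
schools = [
    {'division': 'NORTH', 'boarding': 'BOARDING ONLY', 'students': '300'},
    {'division': 'NORTH', 'boarding': 'DAY ONLY', 'students': '120'},
    {'division': 'NORTH', 'boarding': 'BOARDING ONLY', 'students': '55'},
    {'division': 'EAST', 'boarding': 'BOARDING ONLY', 'students': '14'},
]

# defaultdict(Decimal) starts every new division at Decimal('0').
totals = defaultdict(Decimal)

for school in schools:
    # Filter step: keep boarding-only schools.
    if school['boarding'] == 'BOARDING ONLY':
        # Aggregate step: add the student count to the division's total.
        totals[school['division']] += Decimal(school['students'])

result = [{division: total} for division, total in sorted(totals.items())]
print(result)  # [{'EAST': Decimal('14')}, {'NORTH': Decimal('355')}]
```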

  40. exercise #3

  41. create a stream process with the "joining" example

  42. github.com/reubano/devcraft-workshop

  43. github.com/reubano/riko

  44. Reuben Cummings
    [email protected]
    https://reubano.github.io
    Thanks!
    @reubano #DevCraftKE
