Stream processing made easy with riko

Interactive workshop on streams and stream processing with the Python library riko

Reuben Cummings

September 29, 2016

Transcript

  1. Stream processing made easy with riko DevCraft - Nairobi, KE

    Sep 29, 2016 by Reuben Cummings @reubano #DevCraftKE
  2. Who am I?

      Managing Director, Nerevu Development
      Lead organizer of Arusha Coders
      Author of several popular Python packages (riko, meza, pygogo)
  3. Topics & Format

      data, streams, and stream processing
      code samples and interactive exercises
      hands-on (don't be a spectator)
  4. what is data?

  5. Organization

      structured:
        country   capital
        Kenya     Nairobi
        Tanzania  Dodoma
        Rwanda    Kigali
      unstructured:
        "O God of all creation. Bless this our land and nation. Justice be our shield..."
  6. Storage: binary (hex dump); annotation: hexadecimal number

      00105e0 b0e6 0408 9ee7 0408 bce7 0408 d5e7 0408
      00105f0 e4e7 0408 b0e6 0408 f0e7 0408 ffe7 0408
      0010600 0be8 0408 1ae8 0408 b0e6 0408 b0e6 0408
      00105f0 e4e7 0408 b0e6 0408 f0e7 0408 ffe7 0408
      00105e0 b0e6 0408 9ee7 0408 bce7 0408 d5e7 0408
      0010600 0be8 0408 1ae8 0408 b0e6 0408 b0e6 0408
  7. Storage: binary (hex dump, same as slide 6); annotation: 1 byte

  8. Storage: binary (hex dump, same as slide 6); annotation: 8 bits

  9. Storage: binary (hex dump, same as slide 6); annotation: 2^8 = 256

  10. Storage: binary (hex dump, same as slide 6); annotation: 0 - 255
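
The byte arithmetic on the storage slides (1 byte, 8 bits, 2^8 = 256, 0 - 255) can be checked in a few lines of Python. This is a minimal sketch that assumes each 4-digit group in the dump is two bytes written as hex digits; the byte order within a group depends on the tool that produced the dump, which the slides do not state.

```python
# One 4-digit group from the dump: two hex digits per byte, so two bytes
word = 'b0e6'
data = bytes.fromhex(word)
values = list(data)  # each byte as an integer
print(values)

# 8 bits give 2**8 distinct values, so a byte ranges from 0 to 255
n_values = 2 ** 8
print(n_values, 0, n_values - 1)

# A value outside that range does not fit in a single byte
try:
    bytes([256])
except ValueError as err:
    print('rejected:', err)
```

Here `values` comes out as `[176, 230]`, since `b0` is 176 and `e6` is 230 in decimal.
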
  11. Storage: flat/text

      greeting,loc,rating
      hello,world,3
      good bye,moon,7
      welcome,stars,5
      what's up,sky,2
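
Flat text like the CSV on this slide becomes structured records once it is parsed. The sketch below uses the standard library `csv` module on the slide's own rows; the variable names are only illustrative.

```python
import csv
import io

# The flat text from the slide, one record per line
raw = """greeting,loc,rating
hello,world,3
good bye,moon,7
welcome,stars,5
what's up,sky,2
"""

# DictReader takes field names from the first line and yields one mapping per row
records = list(csv.DictReader(io.StringIO(raw)))

for record in records:
    # Every parsed value is a string, so convert the rating explicitly
    print(record['greeting'], record['loc'], int(record['rating']))
```

Each item in `records` maps the header names to string values, which is why `rating` needs the `int` conversion before any arithmetic.
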
  12. Organization vs Storage

      storage: binary, flat/text
      organization: structured, unstructured
      examples: maasai mara, hell's gate
  13. sample json
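
The sample JSON on slide 14 can be loaded into ordinary Python lists and dicts with the standard library `json` module. A minimal sketch, with illustrative variable names:

```python
import json

# The JSON text from slide 14
raw = '''[
  {"greeting": "hello", "location": "world", "enthusiasm": 3},
  {"greeting": "good bye", "location": "moon", "enthusiasm": 7}
]'''

# json.loads turns the array into a list and each object into a dict
records = json.loads(raw)

# Numbers keep their type, so they can be summed directly
total = sum(record['enthusiasm'] for record in records)
print(records[1]['location'], total)
```

Unlike the CSV case, the numeric field arrives as an `int`, so no conversion is needed.
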

  14. [
        {
          "greeting": "hello",
          "location": "world",
          "enthusiasm": 3
        },
        {
          "greeting": "good bye",
          "location": "moon",
          "enthusiasm": 7
        }
      ]
  15. what are streams?

  16. >>> stream = 'abracadabra'
      >>> stream[0]
      'a'
      >>> stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
      >>> stream[0]
      1
      >>> stream = ['hello', 'devcraft', 'attendees']
      >>> stream[0]
      'hello'
      >>> stream = [
      ... {'num': 0}, {'num': 1}, {'num': 2}]
      >>> stream[0]
      {'num': 0}
  17. how do you construct streams?

  18. >>> stream = input('--> ')

  19. >>> stream = input('--> ') --->

  20. >>> stream = input('--> ') ---> abracadabra

  21. >>> stream = input('--> ')
      ---> abracadabra
      >>> s = 'hello devcraft attendees'
      >>> s.split(' ')
      ['hello', 'devcraft', 'attendees']
      >>> list(range(1, 11))
      [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
      >>> stream
      'abracadabra'
      >>> [{'num': x} for x in range(4)]
      [{'num': 0}, {'num': 1}, {'num': 2}, {'num': 3}]
  22. how do you process streams?

  23. >>> ints = range(1, 10)
      >>> doubled = [2 * x for x in ints]
      >>> doubled
      [2, 4, 6, 8, 10, 12, 14, 16, 18]
      >>> big = [x for x in doubled if x > 10]
      >>> big
      [12, 14, 16, 18]
      >>> [x / 3 for x in big]
      [4.0, 4.6667, 5.3333, 6.0]
      >>> (x / 3 for x in big)
      <generator object <genexpr> at 0x103c10830>
      >>> next(x / 3 for x in big)
      4.0
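
The session on this slide builds a full list at each step and only switches to a generator at the end. As a sketch of the same pipeline written lazily throughout, each stage below is a generator expression, so no value is computed until something asks for it. The name `thirds` is introduced here for illustration.

```python
ints = range(1, 10)

# Each stage wraps the previous one; nothing runs yet
doubled = (2 * x for x in ints)
big = (x for x in doubled if x > 10)
thirds = (x / 3 for x in big)

# Pulling one item drives the whole chain just far enough to produce it
first = next(thirds)
print(first)

# The remaining items are produced on demand; a generator can be consumed only once
rest = list(thirds)
print(len(rest))
```

`first` is `4.0` (the first doubled value above 10 is 12), and `rest` holds the three remaining results.
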
  24. so what!

  25. RSS feeds (feedly)

  26. aggregators (kayak)

  27. mashups (portwiture)

  28. introducing riko github.com/nerevu/riko

  29. let's get some data

  30. Kenya Open Data (opendata.go.ke)

  31. API access

  32. IPython Demo bit.ly/riko-demo (examples)

  33. IPython Demo bit.ly/riko-demo (exercises)

  34. exercise #1

  35. number of schools per district

  36. [
        {'BUTERE/MUMIAS': 1}, {'HOMA BAY': 1}, {'KIAMBU': 1},
        {'MACHAKOS': 1}, {'MAKUENI': 1}, {'MARAGUA': 2},
        {'MBEERE': 1}, {'MOMBASA': 2}, {'NAIROBI': 5},
        {'TRANS NZOIA': 1}
      ]
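
One way to arrive at output shaped like slide 36 in plain Python is to count records per district. This is only a sketch: the records and the field names `name` and `district` are made up for illustration, and the workshop's own solution (which uses riko on the Kenya Open Data feed) is not reproduced in this transcript.

```python
from collections import Counter

# Hypothetical records; the real dataset's field names are not shown in the slides
schools = [
    {'name': 'School A', 'district': 'NAIROBI'},
    {'name': 'School B', 'district': 'MOMBASA'},
    {'name': 'School C', 'district': 'NAIROBI'},
    {'name': 'School D', 'district': 'KIAMBU'},
]

# Count how many records fall in each district
counts = Counter(school['district'] for school in schools)

# One single-key dict per district, sorted by name, matching the slide's shape
result = [{district: n} for district, n in sorted(counts.items())]
print(result)
```
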
  37. exercise #2

  38. boarding only students per division

  39. [
        {'ASEGO': Decimal('277')}, {'BUTERE': Decimal('224')},
        {'DAGORETTI': Decimal('903')}, {'EMBAKASI': Decimal('138')},
        {'ISLAND': Decimal('14')}, {'KANDARA': Decimal('74')},
        {'KASIKEU': Decimal('20')}, {'KIBERA': Decimal('355')},
        {'KIKUYU': Decimal('69')}, {'KISAUNI': Decimal('424')},
        ...
      ]
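
Output shaped like slide 39 combines a filter with a grouped sum. The sketch below does this in plain Python with `Decimal` totals. The records, the field names `division`, `type` and `students`, and the reading of "boarding only" as a school type are all assumptions made for illustration, not taken from the dataset.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical records with made-up values
schools = [
    {'division': 'DIV A', 'type': 'BOARDING ONLY', 'students': '200'},
    {'division': 'DIV A', 'type': 'DAY ONLY', 'students': '90'},
    {'division': 'DIV A', 'type': 'BOARDING ONLY', 'students': '150'},
    {'division': 'DIV B', 'type': 'BOARDING ONLY', 'students': '40'},
]

# Decimal() with no argument is Decimal('0'), so it works as the default total
totals = defaultdict(Decimal)

for school in schools:
    # Keep only boarding-only schools, then add their students to the division total
    if school['type'] == 'BOARDING ONLY':
        totals[school['division']] += Decimal(school['students'])

result = [{division: total} for division, total in sorted(totals.items())]
print(result)
```
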
  40. exercise #3

  41. create a stream process with the "joining" example
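
The notebook's "joining" example is not included in this transcript. As a rough idea of what joining two streams on a shared key can look like in plain Python, here is a sketch; the `join` helper and the `code` field are hypothetical, and the country and capital rows are borrowed from slide 5.

```python
# First stream: rows from the structured-data slide
capitals = [
    {'country': 'Kenya', 'capital': 'Nairobi'},
    {'country': 'Tanzania', 'capital': 'Dodoma'},
    {'country': 'Rwanda', 'capital': 'Kigali'},
]

# Second stream: hypothetical extra field keyed on the same country name
codes = [
    {'country': 'Kenya', 'code': 'KE'},
    {'country': 'Rwanda', 'code': 'RW'},
]


def join(left, right, key):
    """Yield a merged item for every left/right pair sharing the same `key` value."""
    # Index the right stream once so each left item is matched by lookup
    index = {}
    for item in right:
        index.setdefault(item[key], []).append(item)

    for item in left:
        for match in index.get(item[key], []):
            yield {**item, **match}


joined = list(join(capitals, codes, 'country'))
print(joined)
```

Items without a partner in the other stream (Tanzania here) are dropped, which makes this an inner join; keeping them would be a different design choice.
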

  42. github.com/reubano/devcraft-workshop

  43. github.com/reubano/riko

  44. Reuben Cummings reubano@gmail.com https://reubano.github.io Thanks! @reubano #DevCraftKE