Quantum of Data: A data science journey

A data science talk given at Python Exposé, Nairobi.

Reuben Cummings

April 01, 2017

Transcript

  1. Quantum of Data: A data science journey

    Python Exposé • Nairobi, Kenya • April 1, 2017, by Reuben Cummings
  2. Who am I?

    Managing Director, Nerevu Development; founder of Arusha Coders; author of several popular Python packages; reubano on Twitter and GitHub
  3. MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE

  4. Ransom Note: If you want to see your shortcake again,

    visit bitly.com/pyexpose for further instructions
  5. EVIDENCE DOCID AGENT 0010 00111 DROPBOX FOLDER WWW.DROPBOX.COM/HOME/EXPOSE/RANSOM

  6. GPG encrypted file, encryption key unknown. Files: decryptme.txt.gpg, readme.txt

  7. To obtain the key, first get the number of attendees

    from the previous meetups. Files: decryptme.txt.gpg, readme.txt
  8. HINTS DOCID AGENT 0011 00111 HINT #1 WWW.MEETUP.COM/PYTHON-NAIROBI/EVENTS/PAST

  9. HINTS DOCID AGENT 0011 00111 HINT #1 WWW.MEETUP.COM/PYTHON-NAIROBI/EVENTS/PAST

  10. from html.parser import HTMLParser
      from itertools import chain

      class AttendanceParser(HTMLParser):
          def reset(self):
              HTMLParser.reset(self)
              self.match = False
              self.nums = iter([])

          def handle_starttag(self, tag, attrs):
              entry = dict(attrs)
              if entry.get('class') == 'event-rating':
                  self.match = True
  11. from html.parser import HTMLParser
      from itertools import chain

      class AttendanceParser(HTMLParser):
          ...

          def handle_data(self, data):
              num = data.strip()
              if self.match and num:
                  self.nums = chain(self.nums, [int(num)])
                  self.match = False
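The two halves of the parser shown on slides 10 and 11 can be tried offline. The sketch below puts them together and feeds the result a small HTML string; the markup is hypothetical and only assumes, as the slides do, that attendance counts sit inside tags whose class is `event-rating`.

```python
from html.parser import HTMLParser
from itertools import chain


class AttendanceParser(HTMLParser):
    """Collect the integer text that follows any tag with class 'event-rating'."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so the state below always exists
        HTMLParser.reset(self)
        self.match = False
        self.nums = iter([])

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get('class') == 'event-rating':
            self.match = True

    def handle_data(self, data):
        num = data.strip()
        if self.match and num:
            self.nums = chain(self.nums, [int(num)])
            self.match = False


# Hypothetical markup for illustration, not the real Meetup page
SAMPLE_HTML = """
<ul>
  <li><span class="event-rating"> 65 </span> went</li>
  <li><span class="event-rating">83</span> went</li>
  <li><span class="other">7</span></li>
</ul>
"""

parser = AttendanceParser()
parser.feed(SAMPLE_HTML)
sample_nums = list(parser.nums)
print(sample_nums)
```

The `match` flag is what ties a start tag to the text that follows it: it is set on the tag, consumed by the first non-empty text node, and then cleared, so the `7` in the unrelated span is ignored.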
  12. from urllib.request import urlopen

      BASE = 'https://www.meetup.com/Python-Nairobi'
      BASE_URL = '{base}/events/past/?page={page}'

      def extract_attendance():
          parser = AttendanceParser()
          url = BASE_URL.format(base=BASE, page=0)
          f = urlopen(url)
          encoding = f.info().get_content_charset()
          [parser.feed(line.decode(encoding)) for line in f]
          return list(parser.nums)

      >>> extract_attendance()
      [65, 83, 50, 64, 46]
  13. def extract_attendance():
          parser = AttendanceParser()

          # Outer loop to extract each page
          for page in range(5):
              url = BASE_URL.format(base=BASE, page=page)
              f = urlopen(url)
              encoding = f.info().get_content_charset()

              # Inner loop to parse each line
              for line in f:
                  parser.feed(line.decode(encoding))

          yield from parser.nums

      >>> len(list(extract_attendance()))
      25
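The paging pattern on slide 13 can be illustrated without network access. In this minimal sketch, `fetch_page` is a hypothetical stand-in for "download and parse one page", and `FAKE_PAGES` is made-up data; neither comes from the talk.

```python
def extract_numbers(fetch_page, pages):
    """Yield the numbers found on each page, in page order."""
    for page in range(pages):
        # `yield from` forwards every item of the per-page iterable
        yield from fetch_page(page)


# Made-up per-page results standing in for parsed meetup pages
FAKE_PAGES = {0: [65, 83], 1: [50], 2: []}

numbers = list(extract_numbers(FAKE_PAGES.get, 3))
print(numbers)
```

Because the function is a generator, nothing is fetched until the caller starts iterating, which is why the slide wraps the call in `list(...)` before taking `len`.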
  14. Hint #2 This code is available at bitly.com/pyexpose-attendance

    In a shell, enter the command: python extract-attendance.py
  15. Hint #3 Each number represents the Unicode code point

    of a character in the password.
  16. Hint #4 chr(i) returns the string representing a character whose

    Unicode code point is the integer i
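As a quick illustration of hint #4, `chr` and its inverse `ord` convert between integers and one-character strings. The code points below are an arbitrary example, not data from the talk.

```python
# Arbitrary example code points (they happen to spell a familiar word)
codes = [80, 121, 116, 104, 111, 110]

word = ''.join(chr(code) for code in codes)
round_trip = [ord(ch) for ch in word]

print(word)
print(chr(17).isprintable())  # control characters are not printable
```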
  17. >>> attendance = list(extract_attendance())
      >>> chr(attendance[0])
      'A'
      >>> chr(attendance[10])
      '\x11'
      >>> print(chr(attendance[10]))
      >>> chr(attendance[10]).isprintable()
      False
  18. >>> [
      ...:     (x, chr(x)) for x in range(150)
      ...:     if chr(x).isprintable()]
      [(32, ' '), (33, '!'), (34, '"'), (35, '#')...]
      >>> printable = [
      ...:     chr(x) for x in range(150)
      ...:     if chr(x).isprintable()]
      >>> len(printable)
      95
      >>> ''.join(printable[num] for num in attendance)
      'asR`N_WMH.153F24682579(?7'
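The decoding step on slide 18 treats each attendance number as an index into the list of printable characters rather than as a raw code point. A minimal sketch, using only the five first-page numbers shown on slide 12 (the helper names are illustrative):

```python
def printable_chars(limit=150):
    """Return the printable characters with code points below `limit`."""
    return [chr(x) for x in range(limit) if chr(x).isprintable()]


def decode(nums, table=None):
    """Map each number to the printable character at that index."""
    table = printable_chars() if table is None else table
    return ''.join(table[num] for num in nums)


first_page = [65, 83, 50, 64, 46]  # output shown on slide 12
prefix = decode(first_page)
print(prefix)
```

Since the first printable character is the space at code point 32, index 65 lands on code point 97, which is why the password starts with a lowercase `a` even though `chr(65)` is `'A'`.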
  19. Hint #5 In a shell, enter the command: gpg ransom/decryptme.txt.gpg

    When prompted, enter the password: asR`N_WMH.153F24682579(?7
  20. Hint #6 This decrypted message is available at bitly.com/pyexpose-decrypted

  21. Decrypted message: Your shortcake is at a cafe in Nairobi

    that shares an object with a snake in this flickr group: https://www.flickr.com/groups/1329313@N21/ Find the first photo taken by the most prolific group member in 2017.
  22. EVIDENCE DOCID AGENT 0100 00111 FLICKR GROUP WWW.FLICKR.COM/GROUPS/1329313@N21/

  23. HINTS DOCID AGENT 0101 00111 HINT #6 API.FLICKR.COM/SERVICES/FEEDS/GROUPS_POOL.GNE?ID=1329313@N21

  24. Hint #7 In a shell, enter the command:

    pip install riko
  25. >>> from riko.collections import SyncPipe
      >>>
      >>> BASE = 'https://api.flickr.com/services/feeds'
      >>> BASE_URL = '{}/groups_pool.gne?id=1329313@N21'
      >>> conf = {'url': BASE_URL.format(BASE)}
      >>> stream = SyncPipe('fetch', conf=conf).output
      >>> next(stream)
  26. {'author.name': 'Sharon B Mott',
       'link': 'https://www.flickr.com/photos/...',
       'pubDate': time.struct_time(tm_year=2017, tm_mo,...),
       'tags': [
           {'label': None,
            'scheme': 'https://www.flickr.com/photos/tags/',
            'term': 'boaconstrictor'},
           {'label': None,
            'scheme': 'https://www.flickr.com/photos/tags/',
            'term': 'boa'},
           ...],
       'title': 'Hints of blue',
       ...}
  27. >>> from datetime import datetime as dt
      >>> rule = {
      ...:     'field': 'pubDate',
      ...:     'op': 'after',
      ...:     'value': dt(2016, 12, 31)}
      >>> stream = (
      ...:     SyncPipe('fetch', conf=conf)
      ...:         .filter(conf={'rule': rule})
      ...:         .list)
      >>> len(stream)
      15
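For readers without riko installed, the effect of the `after` rule on slide 27 can be approximated in plain Python. This is a sketch of the idea, not how riko implements its filter; it assumes each item carries a `pubDate` that is a `time.struct_time`, as in the item on slide 26, and the sample items are made up.

```python
import time
from datetime import datetime as dt


def published_after(items, cutoff):
    """Keep items whose 'pubDate' (a time.struct_time) is later than `cutoff`."""
    return [
        item for item in items
        # The first six fields of a struct_time are year..second
        if dt(*item['pubDate'][:6]) > cutoff]


# Made-up feed items for illustration
items = [
    {'title': 'old', 'pubDate': time.strptime('2016-11-05', '%Y-%m-%d')},
    {'title': 'new', 'pubDate': time.strptime('2017-01-14', '%Y-%m-%d')},
]

recent = published_after(items, dt(2016, 12, 31))
print([item['title'] for item in recent])
```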
  28. >>> creators = [
      ...:     item.get('author.name') for item in stream]
      >>> creators
      ['Sharon B Mott', 'baker.cameron43', 'stevekpriest', 'TessaSmits', 'TessaSmits', 'TessaSmits', 'Sharon B Mott', ... ]
  29. Hint #8 collections.Counter is a dict subclass for counting hashable

    objects
  30. >>> from collections import Counter
      >>> c = Counter(creators)
      >>> c
      Counter({'Jesonis|Photography_On/Off (super busy)': 1,
               'Sabrina Filipiak Vasseur': 3,
               'Sharon B Mott': 5,
               'TessaSmits': 3,
               'baker.cameron43': 2,
               'stevekpriest': 1})
      >>> c.most_common(1)
      [('Sharon B Mott', 5)]
      >>> top_creator = c.most_common(1)[0][0]
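One detail worth knowing when picking a "most prolific" member with `Counter`: when counts are tied, `most_common` lists the tied elements in the order they were first encountered. A small sketch with made-up names:

```python
from collections import Counter

# Made-up author names, not the real feed data
sample_creators = ['ann', 'bob', 'ann', 'cat', 'bob', 'ann', 'cat']

counts = Counter(sample_creators)
top_name, top_count = counts.most_common(1)[0]
ranking = counts.most_common()

print(top_name, top_count)
print(ranking)
```

Here `bob` and `cat` both appear twice, and `bob` is listed first only because it appears earlier in the input.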
  31. >>> links = [
      ...:     item['link'] for item in stream
      ...:     if item.get('author.name') == top_creator]
      >>> links[-1]
      'https://www.flickr.com/photos/125407841@N08/32344001285/in/pool-1329313@N21'
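Taking `links[-1]` yields the earliest photo only if the feed lists items newest first, which appears to be the assumption on slide 31. A sketch that does not depend on feed order picks the minimum `pubDate` explicitly; the items and example.com links below are made up, and `first_photo_link` is an illustrative name.

```python
import time
from collections import Counter


def first_photo_link(items):
    """Return the link of the earliest item by the most frequent author."""
    authors = Counter(item['author.name'] for item in items)
    top_creator = authors.most_common(1)[0][0]
    theirs = [item for item in items if item['author.name'] == top_creator]
    # struct_time slices compare as plain tuples: (year, month, day, ...)
    earliest = min(theirs, key=lambda item: item['pubDate'][:6])
    return earliest['link']


def make_item(author, day, link):
    """Build a made-up feed item with the fields used above."""
    return {
        'author.name': author,
        'pubDate': time.strptime(day, '%Y-%m-%d'),
        'link': link}


items = [
    make_item('ann', '2017-01-20', 'https://example.com/ann-2'),
    make_item('bob', '2017-02-10', 'https://example.com/bob-1'),
    make_item('ann', '2017-03-01', 'https://example.com/ann-3'),
    make_item('ann', '2017-01-05', 'https://example.com/ann-1'),
]

first_link = first_photo_link(items)
print(first_link)
```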
  32. Hint #9 This code is available at bitly.com/pyexpose-flickr

    In a shell, enter the command: python get-flickr-link.py
  33. EVIDENCE DOCID AGENT 0110 00111 FLICKR GROUP PHOTO WWW.FLICKR.COM/PHOTOS/125407841@N08/32344001285/SIZES/L

  34. Hint #10 Your shortcake is at a cafe in Nairobi

    that shares an object with a snake in this flickr group
  35. EVIDENCE DOCID AGENT 0111 00111 GOOGLE MAPS WWW.GOOGLE.CO.KE/MAPS/SEARCH/LEAF+CAFE+NAIROBI/

  36. EVIDENCE DOCID AGENT 0111 00111 GOOGLE MAPS WWW.GOOGLE.CO.KE/MAPS/SEARCH/LEAF+CAFE+NAIROBI/

  37. MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE

    SOLVED!
  38. Thank you! Questions? Reuben Cummings @reubano