Slide 1

Slide 1 text

Quantum of Data
A data science journey
Python Exposé ● Nairobi, Kenya
April 1, 2017
by Reuben Cummings

Slide 2

Slide 2 text

Who am I?
Managing Director, Nerevu Development
Founder of Arusha Coders
Author of several popular Python packages
reubano on Twitter and GitHub

Slide 3

Slide 3 text

MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE

Slide 4

Slide 4 text

Ransom Note
If you want to see your shortcake again, visit bitly.com/pyexpose for further instructions.

Slide 5

Slide 5 text

EVIDENCE DOCID AGENT 0010 00111 DROPBOX FOLDER WWW.DROPBOX.COM/HOME/EXPOSE/RANSOM

Slide 6

Slide 6 text

decryptme.txt.gpg: GPG-encrypted file, encryption key unknown
readme.txt

Slide 7

Slide 7 text

decryptme.txt.gpg
readme.txt: "To obtain the key, first get the number of attendees from the previous meetups."

Slide 8

Slide 8 text

HINTS DOCID AGENT 0011 00111 HINT #1 WWW.MEETUP.COM/PYTHON-NAIROBI/EVENTS/PAST

Slide 10

Slide 10 text

from html.parser import HTMLParser
from itertools import chain

class AttendanceParser(HTMLParser):
    def reset(self):
        HTMLParser.reset(self)
        self.match = False      # True once an 'event-rating' tag opens
        self.nums = iter([])    # attendance counts found so far

    def handle_starttag(self, tag, attrs):
        entry = dict(attrs)

        if entry.get('class') == 'event-rating':
            self.match = True

Slide 11

Slide 11 text

from html.parser import HTMLParser
from itertools import chain

class AttendanceParser(HTMLParser):
    ...

    def handle_data(self, data):
        num = data.strip()

        # Only keep the text that directly follows an 'event-rating' tag
        if self.match and num:
            self.nums = chain(self.nums, [int(num)])
            self.match = False
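To try the parser without hitting meetup.com, it can be fed a small HTML snippet directly. The markup below is a hypothetical stand-in for a past-events page: only the `event-rating` class name comes from the slides, and the real page structure may well differ.

```python
from html.parser import HTMLParser
from itertools import chain


class AttendanceParser(HTMLParser):
    """Collects the integers found inside tags whose class is 'event-rating'."""

    def reset(self):
        HTMLParser.reset(self)
        self.match = False      # True while waiting for an 'event-rating' value
        self.nums = iter([])    # lazily accumulated attendance counts

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get('class') == 'event-rating':
            self.match = True

    def handle_data(self, data):
        num = data.strip()

        if self.match and num:
            self.nums = chain(self.nums, [int(num)])
            self.match = False


# Hypothetical markup: two past events with their attendee counts
SAMPLE_HTML = '''
<li><span class="event-rating">65</span> went</li>
<li><span class="event-rating">
  83
</span> went</li>
'''

parser = AttendanceParser()
parser.feed(SAMPLE_HTML)
counts = list(parser.nums)
print(counts)  # [65, 83]
```

Note that `nums` is an iterator, so reading it with `list()` consumes it; a second `list(parser.nums)` returns an empty list.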

Slide 12

Slide 12 text

from urllib.request import urlopen

BASE = 'https://www.meetup.com/Python-Nairobi'
BASE_URL = '{base}/events/past/?page={page}'

def extract_attendance():
    parser = AttendanceParser()
    url = BASE_URL.format(base=BASE, page=0)
    f = urlopen(url)
    encoding = f.info().get_content_charset()

    # Feed the page to the parser one decoded line at a time
    for line in f:
        parser.feed(line.decode(encoding))

    return list(parser.nums)

>>> extract_attendance()
[65, 83, 50, 64, 46]
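`get_content_charset()` returns `None` when the server's `Content-Type` header carries no charset, and `bytes.decode(None)` then raises a `TypeError`. A minimal sketch of a fallback, assuming UTF-8 is an acceptable default; `decode_lines` is a hypothetical helper, and an `email.message.Message` stands in for the headers that `f.info()` returns on a real response.

```python
from email.message import Message


def decode_lines(lines, headers, default='utf-8'):
    """Decode byte lines using the charset declared in `headers`, else `default`."""
    # get_content_charset() returns None if Content-Type has no charset parameter
    encoding = headers.get_content_charset() or default

    for line in lines:
        yield line.decode(encoding)


# Headers that declare a charset explicitly
latin = Message()
latin['Content-Type'] = 'text/html; charset=ISO-8859-1'

# Headers with no charset, so the UTF-8 default applies
bare = Message()
bare['Content-Type'] = 'text/html'

print(list(decode_lines([b'caf\xe9\n'], latin)))             # ['café\n']
print(list(decode_lines(['café\n'.encode('utf-8')], bare)))  # ['café\n']
```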

Slide 13

Slide 13 text

def extract_attendance():
    parser = AttendanceParser()

    # Outer loop to extract each page
    for page in range(5):
        url = BASE_URL.format(base=BASE, page=page)
        f = urlopen(url)
        encoding = f.info().get_content_charset()

        # Inner loop to parse each line
        for line in f:
            parser.feed(line.decode(encoding))

        yield from parser.nums

>>> len(list(extract_attendance()))
25
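Because `extract_attendance` is now a generator, pages are only downloaded as the caller consumes numbers. The sketch below makes that visible without network access: `fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for the `urlopen` and parsing steps, and the way the five sample counts are split across pages is made up.

```python
from itertools import islice

# Hypothetical stand-in for three pages of parsed attendance counts
FAKE_PAGES = {0: [65, 83], 1: [50], 2: [64, 46]}
fetched = []  # records which pages were actually requested


def fetch_page(page):
    """Pretend to download and parse one page of past events."""
    fetched.append(page)
    return FAKE_PAGES.get(page, [])


def extract_attendance(num_pages=3):
    # Outer loop over pages; yield from flattens each page's counts into one stream
    for page in range(num_pages):
        yield from fetch_page(page)


# Taking only two numbers never requests page 1
first_two = list(islice(extract_attendance(), 2))
print(first_two, fetched)  # [65, 83] [0]
```

Consuming the whole generator, for example with `list(extract_attendance())`, requests every page in turn.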

Slide 14

Slide 14 text

Hint #2
This code is available at bitly.com/pyexpose-attendance
In a shell, enter the command:
python extract-attendance.py

Slide 15

Slide 15 text

Hint #3
Each number represents the Unicode code point of a character in the password.

Slide 16

Slide 16 text

Hint #4
chr(i) returns the string representing a character whose Unicode code point is the integer i.
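Taking Hint #4 literally, each attendance count can be passed straight to `chr()`. A quick check with the five counts from the first page (the `[65, 83, 50, 64, 46]` output shown earlier) suggests why that alone may not give a usable password: `chr(65)` is `'A'`, but small counts map to control characters such as `'\x11'` (code point 17), which `str.isprintable()` rejects.

```python
# Counts from the first page of past meetups (output shown earlier)
first_page = [65, 83, 50, 64, 46]

# Direct mapping: each integer is treated as a Unicode code point
direct = ''.join(chr(n) for n in first_page)
print(direct)  # AS2@.

# ord() is the inverse of chr()
assert all(ord(chr(n)) == n for n in first_page)

# Small code points are control characters and are not printable
print(repr(chr(17)), chr(17).isprintable())  # '\x11' False
```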

Slide 17

Slide 17 text

>>> attendance = list(extract_attendance())
>>> chr(attendance[0])
'A'
>>> chr(attendance[10])
'\x11'
>>> print(chr(attendance[10]))

>>> chr(attendance[10]).isprintable()
False

Slide 18

Slide 18 text

>>> printable = [
...:     chr(x) for x in range(150)
...:     if chr(x).isprintable()]
>>> len(printable)
95
>>> [
...:     (x, chr(x)) for x in range(150)
...:     if chr(x).isprintable()]
[(32, ' '), (33, '!'), (34, '"'), (35, '#')...]
>>> ''.join(printable[num] for num in attendance)
'asR`N_WMH.153F24682579(?7'
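The trick in the last line is to treat each count as an index into the table of printable characters rather than as a code point. The printable characters below 150 are exactly the contiguous run from space (32) to tilde (126), so indexing into that table amounts to adding 32. A minimal sketch with a hypothetical `decode_password` helper; the expected output follows from the first five counts shown earlier.

```python
# Printable characters among the first 150 code points: ' ' (32) through '~' (126)
PRINTABLE = [chr(x) for x in range(150) if chr(x).isprintable()]


def decode_password(nums):
    """Map each number to the printable character at that index (hypothetical helper)."""
    # Reject out-of-range values; negative ones would otherwise index from the end
    if any(not 0 <= n < len(PRINTABLE) for n in nums):
        raise ValueError(f'every number must be in range(0, {len(PRINTABLE)})')

    return ''.join(PRINTABLE[n] for n in nums)


# The table is a plain offset of 32 from the code points
assert len(PRINTABLE) == 95
assert all(PRINTABLE[i] == chr(i + 32) for i in range(len(PRINTABLE)))

print(decode_password([65, 83, 50, 64, 46]))  # asR`N
```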

Slide 19

Slide 19 text

Hint #5
In a shell, enter the command:
gpg ransom/decryptme.txt.gpg
When prompted, enter the password:
asR`N_WMH.153F24682579(?7

Slide 20

Slide 20 text

Hint #6
The decrypted message is available at bitly.com/pyexpose-decrypted

Slide 21

Slide 21 text

Decrypted message
Your shortcake is at a cafe in Nairobi that shares an object with a snake in this flickr group:
https://www.flickr.com/groups/1329313@N21/
Find the first photo taken by the most prolific group member in 2017.

Slide 22

Slide 22 text

EVIDENCE DOCID AGENT 0100 00111 FLICKR GROUP WWW.FLICKR.COM/GROUPS/1329313@N21/

Slide 23

Slide 23 text

HINTS DOCID AGENT 0101 00111 HINT #6 API.FLICKR.COM/SERVICES/FEEDS/GROUPS_POOL.GNE?ID=1329313@N21

Slide 24

Slide 24 text

Hint #7
In a shell, enter the command:
pip install riko

Slide 25

Slide 25 text

>>> from riko.collections import SyncPipe
>>> BASE = 'https://api.flickr.com/services/feeds'
>>> BASE_URL = '{}/groups_pool.gne?id=1329313@N21'
>>> conf = {'url': BASE_URL.format(BASE)}
>>> stream = SyncPipe('fetch', conf=conf).output
>>> next(stream)

Slide 26

Slide 26 text

{'author.name': 'Sharon B Mott',
 'link': 'https://www.flickr.com/photos/...',
 'pubDate': time.struct_time(tm_year=2017, tm_mo,...),
 'tags': [
     {'label': None,
      'scheme': 'https://www.flickr.com/photos/tags/',
      'term': 'boaconstrictor'},
     {'label': None,
      'scheme': 'https://www.flickr.com/photos/tags/',
      'term': 'boa'},
     ...
 ],
 'title': 'Hints of blue',
 ...
}
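Each item's `tags` field is a list of dictionaries, so the tag names live under the `'term'` key. A minimal sketch that flattens them, using the sample item above with its elided fields left out; `tag_terms` is a hypothetical helper, not part of riko.

```python
def tag_terms(item):
    """Return the tag names of one feed item, tolerating items without tags."""
    return [tag['term'] for tag in item.get('tags', []) if tag.get('term')]


# Sample item from the slide, trimmed to the fields shown in full
item = {
    'author.name': 'Sharon B Mott',
    'tags': [
        {'label': None,
         'scheme': 'https://www.flickr.com/photos/tags/',
         'term': 'boaconstrictor'},
        {'label': None,
         'scheme': 'https://www.flickr.com/photos/tags/',
         'term': 'boa'},
    ],
    'title': 'Hints of blue',
}

print(tag_terms(item))  # ['boaconstrictor', 'boa']
```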

Slide 27

Slide 27 text

>>> from datetime import datetime as dt
>>> rule = {
...:     'field': 'pubDate',
...:     'op': 'after',
...:     'value': dt(2016, 12, 31)}
>>> stream = (
...:     SyncPipe('fetch', conf=conf)
...:     .filter(conf={'rule': rule})
...:     .list)
>>> len(stream)
15
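For readers without riko, the same date filter can be sketched in plain Python. This assumes, as the sample item suggests, that `pubDate` is a `time.struct_time`, and it treats `'after'` as strictly later than the cutoff; `published_after` is a hypothetical helper and the two items are made up.

```python
import time
from datetime import datetime as dt


def published_after(items, cutoff):
    """Yield items whose pubDate (a time.struct_time) is later than `cutoff`."""
    for item in items:
        # The first six struct_time fields are year, month, day, hour, minute, second
        published = dt(*item['pubDate'][:6])

        if published > cutoff:
            yield item


# Made-up items, one on each side of the cutoff
items = [
    {'title': 'late 2016',
     'pubDate': time.strptime('2016-12-30 18:00', '%Y-%m-%d %H:%M')},
    {'title': 'early 2017',
     'pubDate': time.strptime('2017-01-02 09:30', '%Y-%m-%d %H:%M')},
]

recent = list(published_after(items, dt(2016, 12, 31)))
print([item['title'] for item in recent])  # ['early 2017']
```

Under the strict comparison assumed here, a photo posted during the day on 31 December 2016 would still pass a cutoff of `dt(2016, 12, 31)`; comparing with `>=` against `dt(2017, 1, 1)` matches "in 2017" more closely.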

Slide 28

Slide 28 text

>>> creators = [
...:     item.get('author.name') for item in stream]
>>> creators
['Sharon B Mott', 'baker.cameron43', 'stevekpriest', 'TessaSmits', 'TessaSmits', 'TessaSmits', 'Sharon B Mott', ... ]

Slide 29

Slide 29 text

Hint #8
collections.Counter is a dict subclass for counting hashable objects.

Slide 30

Slide 30 text

>>> from collections import Counter
>>> c = Counter(creators)
>>> c
Counter({'Jesonis|Photography_On/Off (super busy)': 1,
         'Sabrina Filipiak Vasseur': 3,
         'Sharon B Mott': 5,
         'TessaSmits': 3,
         'baker.cameron43': 2,
         'stevekpriest': 1})
>>> c.most_common(1)
[('Sharon B Mott', 5)]
>>> top_creator = c.most_common(1)[0][0]
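`most_common(1)` is unambiguous here because the top count of 5 is unique. When counts tie, `Counter.most_common` orders equal counts by first occurrence, so the winner would depend on feed order. A small illustration with made-up names:

```python
from collections import Counter

# Made-up creators: 'bob' and 'amy' both appear twice; 'bob' is seen first
creators = ['bob', 'amy', 'amy', 'bob', 'cat']
c = Counter(creators)

print(c.most_common(1))  # [('bob', 2)]

# Collect every creator sharing the top count instead of silently picking one
top = c.most_common(1)[0][1]
tied = [name for name, count in c.items() if count == top]
print(tied)  # ['bob', 'amy']
```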

Slide 31

Slide 31 text

>>> links = [
...:     item['link'] for item in stream
...:     if item.get('author.name') == top_creator]
>>> links[-1]
'https://www.flickr.com/photos/125407841@N08/32344001285/in/pool-1329313@N21'
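`links[-1]` appears to rely on the feed listing photos newest first, so that the last matching link is the earliest one. A sketch that picks the earliest photo by `pubDate` instead, independent of feed order; it assumes `pubDate` is a `time.struct_time` as in the sample item, `earliest_link` is a hypothetical helper, and the items and example.com links are made up. Note that `pubDate` is the feed's publication time, which may differ from when a photo was taken.

```python
import time


def earliest_link(items, creator):
    """Return the link of `creator`'s earliest item by pubDate, or None if there is none."""
    theirs = [item for item in items if item.get('author.name') == creator]

    if not theirs:
        return None

    # Compare (year, month, day, hour, minute, second) tuples to find the earliest
    return min(theirs, key=lambda item: tuple(item['pubDate'][:6]))['link']


# Made-up items, deliberately not in date order
items = [
    {'author.name': 'alice', 'link': 'https://example.com/3',
     'pubDate': time.strptime('2017-03-01', '%Y-%m-%d')},
    {'author.name': 'bob', 'link': 'https://example.com/2',
     'pubDate': time.strptime('2017-01-15', '%Y-%m-%d')},
    {'author.name': 'alice', 'link': 'https://example.com/1',
     'pubDate': time.strptime('2017-01-05', '%Y-%m-%d')},
]

print(earliest_link(items, 'alice'))  # https://example.com/1
```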

Slide 32

Slide 32 text

Hint #9
This code is available at bitly.com/pyexpose-flickr
In a shell, enter the command:
python get-flickr-link.py

Slide 33

Slide 33 text

EVIDENCE DOCID AGENT 0110 00111 FLICKR GROUP PHOTO WWW.FLICKR.COM/PHOTOS/125407841@N08/32344001285/SIZES/L

Slide 34

Slide 34 text

Hint #10
Your shortcake is at a cafe in Nairobi that shares an object with a snake in this flickr group

Slide 35

Slide 35 text

EVIDENCE DOCID AGENT 0111 00111 GOOGLE MAPS WWW.GOOGLE.CO.KE/MAPS/SEARCH/LEAF+CAFE+NAIROBI/

Slide 37

Slide 37 text

MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE SOLVED!

Slide 38

Slide 38 text

Thank you!
Questions?
Reuben Cummings
@reubano