Quantum of Data: A data science journey

Quantum of Data A data science journey Python Exposé •
Nairobi, Kenya April 1, 2017 by Reuben Cummings

Who am I?
Managing Director, Nerevu Development
Founder of Arusha Coders
Author of several popular Python packages
reubano on Twitter and GitHub

MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE

Ransom Note
If you want to see your shortcake again, visit bitly.com/pyexpose for further instructions.

EVIDENCE DOCID AGENT 0010 00111 DROPBOX FOLDER WWW.DROPBOX.COM/HOME/EXPOSE/RANSOM

GPG encrypted file, encryption key unknown
decryptme.txt.gpg readme.txt

To obtain the key, first get the number of attendees from the previous meetups.
decryptme.txt.gpg readme.txt

HINTS DOCID AGENT 0011 00111 HINT #1 WWW.MEETUP.COM/PYTHON-NAIROBI/EVENTS/PAST

from html.parser import HTMLParser
from itertools import chain

class AttendanceParser(HTMLParser):
    def reset(self):
        HTMLParser.reset(self)
        self.match = False
        self.nums = iter([])

    def handle_starttag(self, tag, attrs):
        entry = dict(attrs)
        if entry.get('class') == 'event-rating':
            self.match = True

from html.parser import HTMLParser
from itertools import chain

class AttendanceParser(HTMLParser):
    ...
    def handle_data(self, data):
        num = data.strip()
        if self.match and num:
            self.nums = chain(self.nums, [int(num)])
            self.match = False
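The two parser slides can be combined and tried offline. The following is a minimal sketch that feeds the parser a small HTML string instead of a live page; the markup in SAMPLE_HTML is hypothetical and only assumes that attendance counts sit in elements whose class is 'event-rating'.

```python
from html.parser import HTMLParser
from itertools import chain


class AttendanceParser(HTMLParser):
    def reset(self):
        # HTMLParser.__init__ calls reset(), so state is set up here
        HTMLParser.reset(self)
        self.match = False
        self.nums = iter([])

    def handle_starttag(self, tag, attrs):
        # Flag the next text node when the tag carries the target class
        entry = dict(attrs)
        if entry.get('class') == 'event-rating':
            self.match = True

    def handle_data(self, data):
        # Collect the first non-empty text after a flagged tag
        num = data.strip()
        if self.match and num:
            self.nums = chain(self.nums, [int(num)])
            self.match = False


# Hypothetical markup; the real page layout is an assumption
SAMPLE_HTML = """
<ul>
  <li class="event-rating"> 65 </li>
  <li class="event-title">Monthly meetup</li>
  <li class="event-rating">83</li>
</ul>
"""

parser = AttendanceParser()
parser.feed(SAMPLE_HTML)
attendance = list(parser.nums)
print(attendance)  # [65, 83]
```

Text inside the 'event-title' element is ignored because no start tag set the match flag before it.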

from urllib.request import urlopen

BASE = 'https://www.meetup.com/Python-Nairobi'
BASE_URL = '{base}/events/past/?page={page}'

def extract_attendance():
    parser = AttendanceParser()
    url = BASE_URL.format(base=BASE, page=0)
    f = urlopen(url)
    encoding = f.info().get_content_charset()

    for line in f:
        parser.feed(line.decode(encoding))

    return list(parser.nums)

>>> extract_attendance()
[65, 83, 50, 64, 46]

def extract_attendance():
    parser = AttendanceParser()

    # Outer loop to extract each page
    for page in range(5):
        url = BASE_URL.format(base=BASE, page=page)
        f = urlopen(url)
        encoding = f.info().get_content_charset()

        # Inner loop to parse each line
        for line in f:
            parser.feed(line.decode(encoding))

    yield from parser.nums

>>> len(list(extract_attendance()))
25

Hint #2
This code is available at bitly.com/pyexpose-attendance
In a shell, enter the command:
python extract-attendance.py

Hint #3
Each number represents the Unicode code point of a character in the password.

Hint #4 chr(i) returns the string representing a character whose
Unicode code point is the integer i
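As a quick illustration of this hint, the sketch below converts a few integers to characters and back. The values in codes are arbitrary and are not taken from the attendance data.

```python
# Arbitrary illustrative code points (not from the meetup data)
codes = [80, 121, 33]

# chr() maps each code point to its character
text = ''.join(chr(i) for i in codes)
print(text)  # Py!

# ord() is the inverse of chr()
print(ord('A'))  # 65

# Control characters such as code point 17 are not printable
print(chr(17).isprintable())  # False
```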

>>> attendance = list(extract_attendance())
>>> chr(attendance[0])
'A'
>>> chr(attendance[10])
'\x11'
>>> print(chr(attendance[10]))
>>> chr(attendance[10]).isprintable()
False

>>> printable = [
...:     chr(x) for x in range(150)
...:     if chr(x).isprintable()]
>>> len(printable)
95
>>> [
...:     (x, chr(x)) for x in range(150)
...:     if chr(x).isprintable()]
[(32, ' '), (33, '!'), (34, '"'), (35, '#')...]
>>> ''.join(printable[num] for num in attendance)
'asR`N_WMH.153F24682579(?7'
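The lookup step can be checked without fetching anything. This sketch wraps the join in a small helper (decode is a hypothetical name, not part of the talk's code) and applies it to the five first-page attendance numbers shown earlier.

```python
# Printable characters among the first 150 code points (32 through 126)
printable = [chr(x) for x in range(150) if chr(x).isprintable()]


def decode(nums):
    """Use each number as an index into the printable characters."""
    return ''.join(printable[n] for n in nums)


# Attendance numbers from the first page of past meetups
first_page = [65, 83, 50, 64, 46]
print(decode(first_page))  # asR`N
```

Because the printable characters in this range are contiguous and start at code point 32, printable[n] is the same as chr(n + 32) for any index from 0 to 94.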

Hint #5
In a shell, enter the command:
gpg ransom/decryptme.txt.gpg
When prompted, enter the password:
asR`N_WMH.153F24682579(?7

Hint #6 This decrypted message is available at bitly.com/pyexpose-decrypted

Decrypted message
Your shortcake is at a cafe in Nairobi that shares an object with a snake in this flickr group:
https://www.flickr.com/groups/1329313@N21/
Find the first photo taken by the most prolific group member in 2017.

EVIDENCE DOCID AGENT 0100 00111 FLICKR GROUP WWW.FLICKR.COM/GROUPS/1329313@N21/

HINTS DOCID AGENT 0101 00111 HINT #6 API.FLICKR.COM/SERVICES/FEEDS/GROUPS_POOL.GNE?ID=1329313@N21

Hint #7
In a shell, enter the command:
pip install riko

>>> from riko.collections import SyncPipe
>>>
>>> BASE = 'https://api.flickr.com/services/feeds'
>>> BASE_URL = '{}/groups_pool.gne?id=1329313@N21'
>>> conf = {'url': BASE_URL.format(BASE)}
>>> stream = SyncPipe('fetch', conf=conf).output
>>> next(stream)

{'author.name': 'Sharon B Mott',
 'link': 'https://www.flickr.com/photos/...',
 'pubDate': time.struct_time(tm_year=2017, tm_mo,...),
 'tags': [
   {'label': None,
    'scheme': 'https://www.flickr.com/photos/tags/',
    'term': 'boaconstrictor'},
   {'label': None,
    'scheme': 'https://www.flickr.com/photos/tags/',
    'term': 'boa'},
   ...
 ],
 'title': 'Hints of blue',
 ...
}

>>> from datetime import datetime as dt
>>> rule = {
...:     'field': 'pubDate',
...:     'op': 'after',
...:     'value': dt(2016, 12, 31)}
>>> stream = (
...:     SyncPipe('fetch', conf=conf)
...:     .filter(conf={'rule': rule})
...:     .list)
>>> len(stream)
15
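For readers without riko installed, the idea behind the date rule can be sketched in plain Python. This is not riko's implementation: the items below are made up, to_datetime is a hypothetical helper, and 'after' is assumed to mean strictly later than the cutoff.

```python
import time
from datetime import datetime


def to_datetime(st):
    """Convert a time.struct_time (as in 'pubDate') to a datetime."""
    return datetime(*st[:6])


# Hypothetical feed items; only the 'pubDate' shape mirrors the feed output
items = [
    {'title': 'old', 'pubDate': time.strptime('2016-11-05', '%Y-%m-%d')},
    {'title': 'new', 'pubDate': time.strptime('2017-01-15', '%Y-%m-%d')},
    {'title': 'newer', 'pubDate': time.strptime('2017-03-02', '%Y-%m-%d')},
]

cutoff = datetime(2016, 12, 31)

# Keep only items published after the cutoff
recent = [item for item in items if to_datetime(item['pubDate']) > cutoff]
print([item['title'] for item in recent])  # ['new', 'newer']
```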

>>> creators = [
...:     item.get('author.name') for item in stream]
>>> creators
['Sharon B Mott', 'baker.cameron43', 'stevekpriest', 'TessaSmits', 'TessaSmits', 'TessaSmits', 'Sharon B Mott', ... ]

Hint #8 collections.Counter is a dict subclass for counting hashable
objects

>>> from collections import Counter
>>> c = Counter(creators)
>>> c
Counter({'Jesonis|Photography_On/Off (super busy)': 1, 'Sabrina Filipiak Vasseur': 3, 'Sharon B Mott': 5, 'TessaSmits': 3, 'baker.cameron43': 2, 'stevekpriest': 1})
>>> c.most_common(1)
[('Sharon B Mott', 5)]
>>> top_creator = c.most_common(1)[0][0]

>>> links = [
...:     item['link'] for item in stream
...:     if item.get('author.name') == top_creator]
>>> links[-1]
'https://www.flickr.com/photos/125407841@N08/32344001285/in/pool-1329313@N21'
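The counting and selection steps can be run end to end on stand-in data. In this sketch the items, author names, and example.com links are all hypothetical, and the feed is assumed to list the newest photo first, which is why the last matching link is treated as the earliest photo.

```python
from collections import Counter

# Hypothetical items, newest first (an assumption about feed order)
stream = [
    {'author.name': 'alice', 'link': 'https://example.com/photos/3'},
    {'author.name': 'bob', 'link': 'https://example.com/photos/2'},
    {'author.name': 'alice', 'link': 'https://example.com/photos/1'},
]

# Count photos per author and pick the most prolific one
counts = Counter(item.get('author.name') for item in stream)
top_creator, num_photos = counts.most_common(1)[0]

# Links by the top creator, in feed order
links = [
    item['link'] for item in stream
    if item.get('author.name') == top_creator]

# Last entry is the oldest photo under the newest-first assumption
earliest = links[-1]
print(top_creator, num_photos, earliest)
# alice 2 https://example.com/photos/1
```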

Hint #9
This code is available at bitly.com/pyexpose-flickr
In a shell, enter the command:
python get-flickr-link.py

EVIDENCE DOCID AGENT 0110 00111 FLICKR GROUP PHOTO WWW.FLICKR.COM/PHOTOS/125407841@N08/32344001285/SIZES/L

Hint #10
Your shortcake is at a cafe in Nairobi that shares an object with a snake in this flickr group

EVIDENCE DOCID AGENT 0111 00111 GOOGLE MAPS WWW.GOOGLE.CO.KE/MAPS/SEARCH/LEAF+CAFE+NAIROBI/

MISSION DOCID AGENT 0001 00111 CASE OF THE MISSING SHORTCAKE
SOLVED!

Thank you! Questions? Reuben Cummings @reubano
