Case Studies of Python in Parallel

Tim McNamara
September 02, 2012

A talk about various ways to run Python in a distributed manner, building socket servers by hacking Python's standard library, and a very brief introduction to tools like Celery and IPython's cluster management tools.

(An extended version that goes into more detail will likely follow)

Transcript

  1. Case Studies in Running Python in Parallel

    Tim McNamara (@timClicks), Kiwi Pycon 2012. Content released under the Creative Commons Attribution 3.0 New Zealand licence. Original code released under MIT/X11 licence.
  2. Objectives

    1) introduce people to distributed programming
    2) peek inside Python's standard library
    3) build community around distributed computing in NZ
  3. Outline

    1) introduce a problem
    2) implement several solutions
    3) consider trade offs between different
  4. Caveat

    I will be using the terms parallel, distributed and concurrent fairly interchangeably. This is wrong.
  5. Key point: isolation, leading to fault tolerance

    (Diagram labels: after, error)
  6. Doing more than one thing on the same computer
  7. Let’s work through a use case together
  8. We are worried about government surveillance; how many faces from the WWW is it possible to scan with $100?
  10. def people_in_pages(url):
          page = get(url)
          for img_url in extract_img_urls(page):
              img = get(img_url)
              for face in detect_faces(img):
                  yield face

      Where could we do two things at the same time?
  11. (Same code as slide 10.) Which portions of the code are I/O bound?
  12. (Same code as slide 10.) Which portions of the code are CPU bound?
  13. (Same code as slide 10.) Each loop can be done independently
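Because each image is handled independently, the inner loop can be written as a map over image URLs, which is the shape a thread pool or process pool can later take over. The following is a minimal sketch under stated assumptions: `get`, `extract_img_urls` and `detect_faces` are stand-ins, `FAKE_WEB`, `faces_in_image` and the `map_fn` parameter are hypothetical names, and the data is invented so that nothing touches the network.

```python
from itertools import chain

# Hypothetical stand-ins for the helpers named on the slide; the data is invented.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a.jpg", "http://example.com/b.png"],
    "http://example.com/a.jpg": ["face-1", "face-2"],
    "http://example.com/b.png": ["face-3"],
}

def get(url):
    """Pretend download: look the URL up in the fake web."""
    return FAKE_WEB[url]

def extract_img_urls(page):
    """In this sketch a 'page' is already a list of image URLs."""
    return list(page)

def detect_faces(img):
    """In this sketch an 'image' is already a list of faces."""
    return list(img)

def faces_in_image(img_url):
    """One independent unit of work: fetch one image (I/O), then scan it (CPU)."""
    return detect_faces(get(img_url))

def people_in_pages(url, map_fn=map):
    """Same result as the nested loops, with the per-image work handed to map_fn."""
    img_urls = extract_img_urls(get(url))
    return list(chain.from_iterable(map_fn(faces_in_image, img_urls)))

faces = people_in_pages("http://example.com/")
print(faces)
```

Passing a different `map_fn` (for example a pool's `map` method) would change how the work is scheduled without changing the function body.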
  14. import re
      from tempfile import NamedTemporaryFile as NTF

      import cv, requests

      def people_in_pages(url):
          page = requests.get(url).content
          finds = re.findall('([https://\w\.]+(jpg|png))', page, re.IGNORECASE)
          cascade = cv.Load('/home/tim/Libs/haarcascade_frontalface_alt.xml')
          store = cv.CreateMemStorage()
          for url, extn in finds:
              try:
                  image = requests.get(url).content
              except KeyboardInterrupt:
                  raise
              except:
                  # can't handle relative URLs :(
                  continue
              # kludge to make things easy for OpenCV
              with NTF(suffix='.'+extn, delete=False) as f:
                  f.write(image)
              image = cv.LoadImageM(f.name, iscolor=True)
              faces = cv.HaarDetectObjects(image, cascade, store)
              for face in faces:
                  coordinates, neighbours = face
                  yield coordinates
              f.unlink(f.name)
  15. (Slides 15 to 24 repeat the code from slide 14 and annotate one part at a time; only the annotations are listed here.)

      Download a page
  16. Extract URLs with images in them using a good enough regular expression pattern
  17. Matches h, t, p, s, :, /, word characters and dots (Redundancy added for readability)
  18. That finishes in a useful file extension.
  19. From the page contents.
  20. While ignoring case.
  21. “cascade” is OpenCV terminology.
  22. You can think of it as a model.
  23. Allocate a memory buffer.
  24. The results of re.findall look like ('http://img.com/hi.jpg', 'jpg')
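To make the shape of the `re.findall` results concrete, here is a small sketch that runs the slide's pattern over a made-up page fragment. The pattern is written as a raw string, and the names `IMG_URL` and `page` are assumptions for illustration. Since the pattern contains two groups, `findall` returns one `(url, extension)` tuple per match. The relative URL is matched as well, which is the case the original code skips when the download fails.

```python
import re

# Pattern from the slide, written as a raw string. The square brackets form a
# character class, so "https://" is a set of allowed characters, not a prefix.
IMG_URL = re.compile(r'([https://\w\.]+(jpg|png))', re.IGNORECASE)

# Made-up page content for illustration.
page = '<img src="http://img.com/hi.jpg"> <img src="pics/cat.PNG">'

# Two groups in the pattern, so each match is a (whole URL, extension) tuple.
finds = IMG_URL.findall(page)
for url, extn in finds:
    print(url, extn)
```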
  25. (Slides 25 to 38 repeat the code from slide 14 and annotate one part at a time; only the annotations are listed here.)

      Download the image, ignoring any problems
  26. Unless I have caused the problem.
  27. Write image to a named temporary file
  28. (This approach was easier than deserialising the incoming string, decompressing the data and converting it to an array)
  29. Set file extension correctly
  30. This will enable OpenCV to load it properly later.
  31. Prevent automatic file deletion when file is closed.
  32. Load an image using the file's name.
  33. Editor's Note: This is wrong
  34. Do facial recognition
  35. Yield (x, y, width, height)
  36. Delete the file
  37. Delete the file
  38. (No annotation.)
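The temporary-file kludge can be tried on its own, without OpenCV. The sketch below is an assumption-laden illustration rather than the talk's code: `with_temp_copy` and `describe` are hypothetical names, and `describe` merely stands in for an image loader that needs a path with the right extension. It uses `os.unlink` for the clean-up step.

```python
import os
from tempfile import NamedTemporaryFile

def with_temp_copy(data, extn, consumer):
    """Write data to a named temporary file, pass its path to consumer, then delete it."""
    # delete=False keeps the file on disk after the with block closes it.
    with NamedTemporaryFile(suffix='.' + extn, delete=False) as f:
        f.write(data)
    try:
        # The file is closed here, so everything written has been flushed.
        return consumer(f.name)
    finally:
        # Manual clean-up replaces the automatic deletion that was switched off.
        os.unlink(f.name)

seen_paths = []

def describe(path):
    """Hypothetical consumer standing in for an image loader."""
    seen_paths.append(path)
    return os.path.splitext(path)[1], os.path.getsize(path)

result = with_temp_copy(b'not really an image', 'jpg', describe)
print(result)
```

Putting the deletion in a `finally` clause means the file is removed even if the consumer raises.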
  39. (Slides 39 to 42 repeat the code from slide 14, annotated with the helper names from the outline on slide 10.)

      def get(url)
  40. def extract_img_urls(page)
  41. def extract_img_urls(page)
  42. def detect_faces(image)
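As one possible shape for the `extract_img_urls` helper, the sketch below reuses the slide's regular expression and resolves relative matches against the page's own address with `urllib.parse.urljoin`. The extra `base_url` parameter is an assumption that is not in the talk, added here only to address the "can't handle relative URLs" comment.

```python
import re
from urllib.parse import urljoin

# Pattern from slide 14, written as a raw string.
IMG_URL = re.compile(r'([https://\w\.]+(jpg|png))', re.IGNORECASE)

def extract_img_urls(page, base_url):
    """Return absolute image URLs found in page.

    base_url is a hypothetical extra parameter: the address the page came from.
    """
    # urljoin leaves absolute URLs alone and resolves relative ones against base_url.
    return [urljoin(base_url, found) for found, _extn in IMG_URL.findall(page)]

# Made-up page content for illustration.
page = '<img src="http://img.com/hi.jpg"> <img src="pics/cat.PNG">'
urls = extract_img_urls(page, 'http://example.com/a/index.html')
print(urls)
```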
  43. Introducing multiprocessing.Pool

      from multiprocessing import Pool, cpu_count

      P = Pool(cpu_count()-1)

      def people_in_pages(url):
          page = get(url)
          img_urls = extract_img_urls(page)
          imgs = P.map(get, img_urls)
          faces = P.map(detect_faces, imgs)
          return faces
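A slightly more defensive variant of the pool set-up might look like the sketch below. It rests on a few assumptions: `worker_count`, `faces_in_images` and `main` are hypothetical names, and `detect_faces` is a toy CPU-bound stand-in rather than real face detection. On a single-CPU machine `cpu_count()-1` is zero, which `Pool` rejects, so the worker count is clamped to at least one.

```python
from multiprocessing import Pool, cpu_count

def worker_count(cpus=None):
    """Leave one CPU free, as on the slide, but never ask for fewer than one worker."""
    cpus = cpu_count() if cpus is None else cpus
    return max(1, cpus - 1)

def detect_faces(img):
    """Hypothetical CPU-bound stand-in: img is a list of numbers, result is the sum of squares."""
    return sum(pixel * pixel for pixel in img)

def faces_in_images(imgs):
    """Spread detect_faces over a pool of worker processes; results keep input order."""
    with Pool(worker_count()) as pool:
        return pool.map(detect_faces, imgs)

def main():
    imgs = [[1, 2, 3], [4, 5], [6]]
    print(faces_in_images(imgs))
```

When running this as a script, call `main()` from under `if __name__ == '__main__':`. On platforms where worker processes import the main module, that guard stops each worker from trying to start a pool of its own.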
  44. By the way...

    In Python, using threading for I/O bound problems will be just fine. That is, if you are interfacing with the outside world (audio, serial, network, disc, et cetera), then threads are easier. Multiprocessing is a much better way to distribute work across CPUs.
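For the I/O bound half of the problem, a thread pool from the standard library is one way to overlap the waiting. This is a minimal sketch, not the talk's code: `get` here is a stand-in that sleeps instead of downloading, and `get_all` is a hypothetical helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get(url):
    """Hypothetical stand-in for a download: sleeping simulates waiting on the network."""
    time.sleep(0.1)
    return "contents of " + url

def get_all(urls, max_workers=4):
    """Fetch every URL on a small thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(get, urls))

urls = ["http://example.com/%d.jpg" % n for n in range(4)]
pages = get_all(urls)
print(pages)
```

While one thread is blocked waiting, the others can proceed, so the four simulated downloads overlap rather than run back to back.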
  45. from bottle import abort, request, route, run

      @route('/faces')
      def faces():
          url = request.query.get('url')  # None if the parameter is missing
          if not url:
              abort(400)
          return '%s' % people_in_pages(url)

      @route('/')
      def index():
          page = '''<html><head></head><body>
          <form method='GET' action='/faces'>
          <input name="url" type="text"></input>
          <button type="submit">Hunt</button>
          </form>
          </body></html>
          '''
          return page

      run(host='localhost', port=8000)
  46. (Same code as slide 45.) From previous slide
  47. (Same code as slide 45.)
  48. and an implementation exists in the standard library.
  49. XML-RPC

      try:
          from SimpleXMLRPCServer import SimpleXMLRPCServer
      except ImportError:
          from xmlrpc.server import SimpleXMLRPCServer

      server = SimpleXMLRPCServer(("localhost", 22022))
      server.register_introspection_functions()
      server.register_function(people_in_pages, 'people')
      server.serve_forever()
  50. XML-RPC

      server = SimpleXMLRPCServer(("localhost", 22022))
      server.register_introspection_functions()
      server.register_function(people_in_pages, 'people')
      server.serve_forever()

      Less scary without the boilerplate
  51. (Same code as slide 50.) For spec compliance
  52. (Same code as slide 50.) One thing to note: our function with yield will raise an exception.
  53. (Same code as slide 50.) You can leave the registered function unnamed; the name defaults to fn.__name__
  54. (Same code as slide 49.)
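The note about `yield` can be worked around by materialising the generator before it is returned, since XML-RPC can marshal a list but not a generator object. The sketch below is an illustration under assumptions, written for Python 3 only: `people_in_pages` is replaced by a stand-in that yields fixed tuples, `people` and `start_server` are hypothetical names, and the server runs in a background thread on an OS-chosen port so the example can finish on its own.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def people_in_pages(url):
    """Hypothetical stand-in generator: yields fixed (x, y, width, height) tuples."""
    yield (10, 20, 64, 64)
    yield (200, 40, 48, 48)

def people(url):
    """Wrapper exposed over XML-RPC: turn the generator into a list first."""
    return list(people_in_pages(url))

def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_introspection_functions()
    # No explicit name given, so the function is registered as "people".
    server.register_function(people)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_server()
host, port = server.server_address
rpc = ServerProxy("http://%s:%d" % (host, port))
result = rpc.people("http://example.com/")
print(result)

server.shutdown()
server.server_close()
```

Tuples are sent as XML-RPC arrays, so the client receives them back as lists.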
  55. How do we connect this to our website?
  56. (Same code as slide 45.) Original
  57. New

      try:
          from xmlrpclib import ServerProxy  # Python 2
      except ImportError:
          from xmlrpc.client import ServerProxy  # Python 3
      from bottle import abort, request, route, run

      rpc = ServerProxy('http://localhost:22022')

      @route('/faces')
      def faces():
          url = request.query['url']
          if not url:
              abort(400)
          return '%s' % rpc.people(url)
      ...
  58. Connect to the XML-RPC server: rpc = ServerProxy('http://localhost:22022')
  59. Use the XML-RPC server: rpc.people(url)
  60. Remember, we renamed the function: people_in_pages is registered as 'people'.
  62. Has this helped?
  63. No.
  64. We're still only using one core.
  65. Performance is now slower...
  66. ...everything is serialised as XML.
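To make the serialisation cost concrete, here is a small sketch, assuming Python 3's xmlrpc.client, that marshals a single call to the 'people' method and parses it back. The URL is a made-up example value. ServerProxy builds a similar payload for every call, and the server has to parse it again before dispatching.

```python
import xmlrpc.client

url = 'http://example.com/team.html'  # hypothetical argument

# Marshal a call to the 'people' method into an XML-RPC request body.
payload = xmlrpc.client.dumps((url,), methodname='people')

# The receiving side parses the XML back into Python objects.
params, method = xmlrpc.client.loads(payload)

# Extra characters sent on top of the argument itself.
overhead = len(payload) - len(url)
```

Printing payload shows the argument wrapped in methodCall, params, param, value and string elements, which is the overhead the slide refers to.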
  67. How can we use the standard library to create something more useful?
  68. … but that won't help with CPU-bound problems.
  69. class SimpleXMLRPCServer(SocketServer.TCPServer,
                               SimpleXMLRPCDispatcher):
          """Simple XML-RPC server.

          Simple XML-RPC server that allows functions and a single instance
          to be installed to handle requests. The default implementation
          attempts to dispatch XML-RPC calls to the functions or instance
          installed in the server. Override the _dispatch method inhereted
          from SimpleXMLRPCDispatcher to change this behavior.
          """

          allow_reuse_address = True

          # Warning: this is for debugging purposes only! Never set this to True in
          # production code, as will be sending out sensitive information (exception
          # and stack trace details) when exceptions are raised inside
          # SimpleXMLRPCRequestHandler.do_POST
          _send_traceback_header = False

          def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
                       logRequests=True, allow_none=False, encoding=None, bind_and_activate=True):
              self.logRequests = logRequests

              SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding)
              SocketServer.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)

              # [Bug #1222790] If possible, set close-on-exec flag; if a
              # method spawns a subprocess, the subprocess shouldn't have
              # the listening socket open.
              if fcntl is not None and hasattr(fcntl, 'FD_CLOEXEC'):
                  flags = fcntl.fcntl(self.fileno(), fcntl.F_GETFD)
                  flags |= fcntl.FD_CLOEXEC
                  fcntl.fcntl(self.fileno(), fcntl.F_SETFD, flags)
  70. http://hg.python.org/cpython/file/2.7/Lib/SimpleXMLRPCServer.py#l569
  71. Urgh
  72. Let's simplify
  73. class SimpleXMLRPCServer(SocketServer.TCPServer, SimpleXMLRPCDispatcher):
          allow_reuse_address = True
          _send_traceback_header = False

          def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
                       logRequests=True, allow_none=False,
                       encoding=None, bind_and_activate=True):
              self.logRequests = logRequests
              SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding)
              SocketServer.TCPServer.__init__(self, addr,
                                              requestHandler, bind_and_activate)
              if fcntl is not None and hasattr(fcntl, 'FD_CLOEXEC'):
                  flags = fcntl.fcntl(self.fileno(), fcntl.F_GETFD)
                  flags |= fcntl.FD_CLOEXEC
                  fcntl.fcntl(self.fileno(), fcntl.F_SETFD, flags)
  74. Important things to note
  75. Subclassing from SocketServer mixin...
  76. ...and dispatcher.
  77. Perhaps we can just use SocketServer.ForkingTCPServer?
  78. Watch out for hard coded values! (The base classes are named explicitly inside __init__.)
  79. This thing (the fcntl block) appears to be a workaround to a bug.
  80. Let's leave it alone.
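The two halves of the class can be inspected directly. This is a small sketch, assuming Python 3, where the same class is importable from xmlrpc.server and the SocketServer module is spelled socketserver. It only reads class attributes and starts nothing.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Direct base classes, in the order the class statement lists them.
base_names = [base.__name__ for base in SimpleXMLRPCServer.__bases__]

# serve_forever() is defined on the socket-server side of the family...
serving_class = next(
    cls.__name__ for cls in SimpleXMLRPCServer.__mro__
    if 'serve_forever' in vars(cls)
)

# ...while register_function() is defined on the dispatcher.
registering_class = next(
    cls.__name__ for cls in SimpleXMLRPCServer.__mro__
    if 'register_function' in vars(cls)
)
```

Because the networking and the dispatching come from separate parents, swapping the networking parent for a forking or threading variant leaves the XML-RPC dispatch logic untouched.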
  81. import fcntl  # needed for workaround, now hidden
      from SocketServer import ForkingTCPServer as Server
      from SimpleXMLRPCServer import SimpleXMLRPCDispatcher as Dispatcher
      from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler as Handler

      class ForkingXMLRPCServer(Server, Dispatcher):
          allow_reuse_address = True
          _send_traceback_header = False

          def __init__(self, addr, request_handler=Handler,
                       logRequests=True, allow_none=False,
                       encoding=None, bind_and_activate=True):
              self.logRequests = logRequests
              Dispatcher.__init__(self, allow_none, encoding)
              Server.__init__(self, addr, request_handler, bind_and_activate)
              if fcntl is not None and hasattr(fcntl, 'FD_CLOEXEC'):
                  ...
  82. Here is the magic: ForkingTCPServer as Server.
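A shorter route to a similar result, sketched here for Python 3 on a POSIX system, is to layer socketserver.ForkingMixIn over the stock SimpleXMLRPCServer instead of rebuilding the class from ForkingTCPServer and the dispatcher as the slide does. This is a different technique from the one on the slide, not a transcription of it. The names worker_pid and make_server are hypothetical helpers for illustration.

```python
import os
import socketserver
from xmlrpc.server import SimpleXMLRPCServer


class ForkingXMLRPCServer(socketserver.ForkingMixIn, SimpleXMLRPCServer):
    """XML-RPC server that handles each request in a forked child (POSIX only)."""


def worker_pid():
    """Hypothetical RPC method: report which process handled the call."""
    return os.getpid()


def make_server(host="127.0.0.1", port=0):
    """Build the server; port 0 asks the OS for any free port."""
    server = ForkingXMLRPCServer((host, port), logRequests=False)
    server.register_function(worker_pid, "pid")
    return server
```

Calling make_server(port=22022).serve_forever() would then serve requests on the port used in the slides, and repeated rpc.pid() calls from a client should report different process ids, since each request runs in its own child process.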
  83. from gevent import monkey
      monkey.patch_all()

      import fcntl  # needed for workaround, now hidden
      from SocketServer import ThreadingTCPServer as Server
      from SimpleXMLRPCServer import SimpleXMLRPCDispatcher as Dispatcher
      from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler as Handler

      # The class name is carried over unchanged from slide 81.
      class ForkingXMLRPCServer(Server, Dispatcher):
          allow_reuse_address = True
          _send_traceback_header = False

          def __init__(self, addr, request_handler=Handler,
                       logRequests=True, allow_none=False,
                       encoding=None, bind_and_activate=True):
              self.logRequests = logRequests
              Dispatcher.__init__(self, allow_none, encoding)
              Server.__init__(self, addr, request_handler, bind_and_activate)
              if fcntl is not None and hasattr(fcntl, 'FD_CLOEXEC'):
                  ...
  84. Can you spot the differences?
  85. gevent monkeypatches socket...
  86. ...meaning you can use “threads”.
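For comparison, the same shape works with ordinary OS threads from the standard library and no monkeypatching. The sketch below assumes Python 3 and swaps gevent's green threads for socketserver.ThreadingMixIn, so it illustrates the class structure rather than gevent's scheduling. handler_thread_name and call_once are hypothetical names, and the server binds to a free local port chosen by the OS.

```python
import socketserver
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ThreadingXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    """XML-RPC server that handles each request in its own thread."""
    daemon_threads = True  # do not keep the process alive for stray handlers


def handler_thread_name():
    """Hypothetical RPC method: report which thread handled the call."""
    return threading.current_thread().name


def call_once():
    """Start the server, make one RPC call against it, then shut it down."""
    server = ThreadingXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(handler_thread_name, "thread_name")
    port = server.server_address[1]
    runner = threading.Thread(target=server.serve_forever, daemon=True)
    runner.start()
    try:
        with ServerProxy("http://127.0.0.1:%d" % port) as rpc:
            return rpc.thread_name()
    finally:
        server.shutdown()
        server.server_close()
```

The value returned by call_once() is the name of a worker thread rather than the main thread, which shows that the request was handled concurrently with the accept loop. As with the gevent version, this helps with I/O-bound work but not with CPU-bound work.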
  87. A task queue that is even easier to use.
  88. As a bonus, it can support both gevent for I/O intensive tasks & multiprocessing for CPU intensive tasks (at the same time).
  89. As a bonus, it can support both gevent for I/O intensive tasks & multiprocessing for CPU intensive tasks (at the same time).

    You can use OS threads for I/O tasks, but you will hit a soft maximum of around 200 concurrent tasks instead of over 1000.
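The split between I/O-bound and CPU-bound work can be sketched with the standard library alone. This is only an approximation of the slide's point: it uses OS threads (`ThreadPoolExecutor`) where the slide uses gevent, and it is not the task queue's own API. The functions `fetch` and `count_primes` are made-up placeholders for an I/O-bound and a CPU-bound task.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch(url):
    """Stand-in for an I/O-bound task: sleeps instead of using the network."""
    time.sleep(0.05)
    return "fetched " + url


def count_primes(limit):
    """CPU-bound task: naive count of the primes below `limit`."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def run_io(urls, max_threads=8):
    # Threads suit I/O waits: a thread blocked on I/O releases the GIL.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(fetch, urls))


def run_cpu(limits, max_procs=2):
    # Separate processes sidestep the GIL for CPU-heavy work.
    with ProcessPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    urls = ["http://example.com/%d" % i for i in range(8)]
    with ThreadPoolExecutor(max_workers=1) as outer:
        # Start the I/O batch in the background, then run the CPU batch,
        # so both kinds of work are in flight at the same time.
        io_future = outer.submit(run_io, urls)
        cpu_results = run_cpu([1000, 2000])
        io_results = io_future.result()
    print(len(io_results), cpu_results)
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by re-importing the main module, unguarded code would run again in every child.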
  90. IPython looks just like a shell, but it also has very stable support for distributed programming.
  91. An IPython Cluster (diagram: Controller ... Engine 0 ...). Controllers are your interface to one or more engines.
  92. An IPython Cluster (diagram: Controller ... Engine 0 ...). Engines could be local processes, remote processes via SSH or batch jobs.
  93. An IPython Cluster (diagram: Controller ... Engine 0 ...). Schedulers live here, but are hidden from you.
  94. $ ipython profile create --parallel profile_name
      $ ipcluster start -n=4 --profile=profile_name
      $ ipython notebook --profile=profile_name
  95. In [1]: from IPython.parallel import Client
      In [2]: rc = Client(profile='profile_name')
      In [3]: rc.ids
      Out[3]: [0, 1, 2, 3]
      In [4]: view = rc[:]
      In [n]: view.apply_async(people_in_page, url)
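The session above ends at `apply_async`, which returns a result handle straight away instead of blocking. A minimal sketch of that submit-then-collect pattern follows. It uses the standard library's `ThreadPool` as a local stand-in for the cluster view, so it runs without an IPython cluster. The `people_in_page` defined here is a hypothetical placeholder, not the function from the talk, and `submit_all` is a name invented for this example. One difference to keep in mind: `ThreadPool.apply_async` takes its arguments as a tuple, while the IPython view takes them positionally.

```python
from multiprocessing.pool import ThreadPool


def people_in_page(url):
    """Hypothetical placeholder; the talk's real function is not shown here."""
    return {"url": url, "people": []}


def submit_all(urls, workers=4):
    pool = ThreadPool(workers)
    try:
        # apply_async returns immediately with a handle, not the result.
        handles = [pool.apply_async(people_in_page, (url,)) for url in urls]
        # get() blocks until the task has finished or the timeout expires.
        return [handle.get(timeout=10) for handle in handles]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(submit_all(["http://example.com/a", "http://example.com/b"]))
```

Submitting everything first and collecting afterwards keeps all workers busy; calling `get()` right after each submission would serialise the work.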
  96. Some homework
      To do effective distributed programming, you need to think about (at least):
      • data locality
      • ease of configuration / maintenance
      • decomposition
      • load balancing
      • message size
      • efficient serialisation
      • send references, rather than data, through message queues
      • ease of debugging
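The "send references, rather than data" point can be made concrete with a small sketch. It assumes, purely for illustration, that sender and worker can both read the same directory (shared storage), and every function name, field name and file name is invented for this example. The first message embeds the full payload; the second carries only a path to it.

```python
import json
import os
import pickle
import tempfile


def message_with_data(records):
    """Embed the whole payload in the message."""
    return pickle.dumps({"task": "summarise", "records": records})


def message_with_reference(records, directory):
    """Write the payload to shared storage and send only its location."""
    path = os.path.join(directory, "records.json")
    with open(path, "w") as handle:
        json.dump(records, handle)
    return pickle.dumps({"task": "summarise", "records_path": path})


def worker(message):
    """Worker side: load the data from the reference if one was sent."""
    task = pickle.loads(message)
    if "records_path" in task:
        with open(task["records_path"]) as handle:
            records = json.load(handle)
    else:
        records = task["records"]
    return len(records)


if __name__ == "__main__":
    records = [{"id": i, "name": "person %d" % i} for i in range(10000)]
    with tempfile.TemporaryDirectory() as shared:
        inline = message_with_data(records)
        by_ref = message_with_reference(records, shared)
        # The reference message stays small however large the data grows.
        print(len(inline), len(by_ref))
        print(worker(inline), worker(by_ref))
```

The trade-off is data locality: the reference only helps if the worker can reach the storage cheaply, which is why the two items sit on the same homework list.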