You're No Fun Anymore - PyCon AU 2014

davidjb
August 03, 2014

Lightning talk presented to PyCon AU 2014 in Brisbane.

Transcript

  1. You're no fun anymore
    PyCon AU 2014
    David Beitey (@davidjb or @davidjb_)
    Slides @ https://speakerdeck.com/davidjb
  2. And now for something completely different
    • Far too serious.
    • It's Sunday so let's be a little silly.
    • Ever wondered how much Monty Python influences Python?
  3. References you likely know:
    • Python's name is from Flying Circus
      ◦ Great post from Guido: http://python-history.blogspot.com.au/2009/01/personal-history-part-1-cwi.html
    • IDLE editor, Cheese Shop, Packages, Wheel
    • Sketch and comedian references in documentation, tests
    • Guido "under-analysed" the language's name
      ◦ "First thing that came to mind"
  4. Introducing Blancmange
    • Sweet, creamy dessert. Popular in the UK in the 70s.
    • Topic of much of Ep 7 of Flying Circus
    • Now a Monty Python analysis package for Python
    easy_install blancmange
  5. Introducing Blancmange (console scripts)
    flying-circus-db
      Creates an SQLite DB from the original HTML sketches and divides it into Episodes/Sketches/Words/Actors (who said what).
    flying-circus
      Micro-analyse Flying Circus (165,483 spoken words, Michael Palin said 24.3% of them; 500+ sketches; 10,100 lines spoken; 249,945 words in scripts).
    spot-camels keyword [keyword...]
      Find & display sketches that feature keywords. Open in web browser.
    completely-different
      Look up a random line and output it as Python comments, wrapped and formatted ready for inclusion in your code.
    blancmange [path]
      Process a directory of files (e.g. CPython!) and match with the scripts. Output the most common references, those which are missing, and actor stats.
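
The "who said what" database and the per-actor word share can be sketched roughly as follows. This is only an illustration: the table layout, the function names `build_db` and `word_share_by_actor`, and the sample rows are all assumptions, since the talk does not show the real flying-circus-db schema.

```python
import sqlite3

# Hypothetical schema: the real flying-circus-db layout is not shown in the talk.
SCHEMA = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE lines (
    id INTEGER PRIMARY KEY,
    sketch TEXT NOT NULL,
    actor_id INTEGER NOT NULL REFERENCES actors(id),
    text TEXT NOT NULL
);
"""

# Placeholder rows for illustration only; not real script data.
SAMPLE_LINES = [
    ("Sketch A", "Actor One", "one two three four"),
    ("Sketch A", "Actor Two", "five six"),
    ("Sketch B", "Actor One", "seven eight"),
]


def build_db(rows):
    """Create an in-memory SQLite DB from (sketch, actor, text) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for sketch, actor, text in rows:
        # Insert the actor once, then look up its id for the line row.
        conn.execute("INSERT OR IGNORE INTO actors (name) VALUES (?)", (actor,))
        (actor_id,) = conn.execute(
            "SELECT id FROM actors WHERE name = ?", (actor,)
        ).fetchone()
        conn.execute(
            "INSERT INTO lines (sketch, actor_id, text) VALUES (?, ?, ?)",
            (sketch, actor_id, text),
        )
    conn.commit()
    return conn


def word_share_by_actor(conn):
    """Return {actor name: percentage of all spoken words}, to 1 decimal place."""
    counts = {}
    query = "SELECT a.name, l.text FROM lines l JOIN actors a ON a.id = l.actor_id"
    for name, text in conn.execute(query):
        # Whitespace splitting is a crude stand-in for real tokenisation.
        counts[name] = counts.get(name, 0) + len(text.split())
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

Calling `word_share_by_actor(build_db(SAMPLE_LINES))` gives each placeholder actor's share of the spoken words, which is the same kind of figure as the 24.3% quoted on the slide.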
  6. Run blancmange over CPython
    • Naive text analysis of arbitrary source is hard!
    • Extract episodes from downloaded data (wget/PyQuery)
    • Extract words and counts from episodes (TextBlob/NLTK)
    • Exclude short words, Python syntax (Pygments)
    • Normalise using NLTK. Brown Corpus is from 1979 which is perfect.
    • Wait about 45 mins.
      ◦ 76MB of CPython, 250,000 Flying Circus words
      ◦ Results are pickled for sanity.
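
A standard-library-only sketch of the counting and filtering steps might look like this. It is a stand-in, not the package's code: the talk used TextBlob/NLTK and Pygments, whereas this uses `re`, `collections.Counter` and `keyword`. The 4-letter cut-off and the function names are assumptions, and the Brown Corpus normalisation step is left out.

```python
import keyword
import re
from collections import Counter

MIN_LENGTH = 4  # assumed cut-off for "short words"; the talk gives no number
WORD_RE = re.compile(r"[a-z]+")
# Lower-cased so that True/False/None are filtered after lower-casing the text.
PY_KEYWORDS = {k.lower() for k in keyword.kwlist}


def count_words(text):
    """Count lower-cased words, skipping short ones and Python keywords."""
    words = WORD_RE.findall(text.lower())
    return Counter(
        w for w in words
        if len(w) >= MIN_LENGTH and w not in PY_KEYWORDS
    )


def shared_references(script_text, source_text):
    """Return {word: count in source} for words that also occur in the scripts."""
    script_vocab = set(count_words(script_text))
    source_counts = count_words(source_text)
    return {w: n for w, n in source_counts.items() if w in script_vocab}
```

For example, `shared_references(script, source)` keeps only the source-code words that also appear in the script text, so identifiers such as `eggs` drop out unless the scripts mention them too.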
  7. Fast facts
    • Flying Circus has 76 references to Spam. CPython has 2225!
    • Eric Idle gets mentioned 3495 times in CPython (IDLE, libidle). John Cleese is mentioned 316 times, Terry Gilliam just 8 times. Aww :(
    • Almost all references are within the documentation/tests:
      ◦ May be a good thing or we'd end up with: os.walk → silly.walk
    • Missing references to: Camel Spotting, Fish Slapping, Hell's Grannies, Blancmange, etc.
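
The Eric Idle figure hints at how naive matching inflates counts: names like IDLE contain the surname. The talk does not say how blancmange matches words, so the following is only an illustration of the difference between substring and whole-word counting; the function names and the sample sentence are made up for the example.

```python
import re


def count_substring(text, term):
    """Case-insensitive count of every occurrence, including inside longer words."""
    return text.lower().count(term.lower())


def count_whole_word(text, term):
    """Case-insensitive count of the term only where it stands alone as a word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Invented sample text, not taken from CPython or the scripts.
SAMPLE = "IDLE ships with CPython; see Lib/idlelib. Eric Idle was in Monty Python."
```

On `SAMPLE`, substring counting also picks up the `idle` inside `idlelib`, while whole-word counting does not. Neither approach can tell the editor IDLE apart from the comedian, which is part of why naive text analysis of source is hard.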
  8. My Zen of (Monty) Python
    Monty Python references are one honking great idea -- let's do more of those!
    • Blancmange = rough edges. < 2 days old.
    • Want to help or think I'm crazy? Get in touch!
    • http://github.com/blancmange/blancmange
    • http://davidjb.com, Twitter: @davidjb_
    Slides @ https://speakerdeck.com/davidjb