Circus
◦ Great post from Guido: http://python-history.blogspot.com.au/2009/01/personal-history-part-1-cwi.html
• IDLE editor, Cheese Shop, Packages, Wheel
• Sketch and comedian references in documentation, tests
• Guido "under-analysed" the language's name
  ◦ "First thing that came to mind"
HTML original sketches and divides into Episodes/Sketches/Words/Actors – who said what.
• flying-circus: Micro-analyse Flying Circus (165,483 spoken words, Michael Palin said 24.3% of them; 500+ sketches; 10,100 lines spoken; 249,945 words in scripts).
• spot-camels keyword [keyword...]: Find and display sketches that feature the keywords. Opens in a web browser.
• completely-different: Look up a random line and output it as Python comments, wrapped and formatted ready for inclusion in your code.
• blancmange [path]: Process a directory of files (e.g. CPython!) and match it against the scripts. Outputs the most common references, those which are missing, and actor stats.
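The completely-different idea (a random line, wrapped as Python comments) can be sketched with the standard library alone. This is a minimal illustration, not the tool's actual code: the function names `as_python_comment` and `completely_different` are hypothetical, and `SAMPLE_LINES` is a stand-in (quoted from memory) for the parsed scripts.

```python
import random
import textwrap

# Stand-in data (assumption): the real tool would draw from the parsed scripts.
SAMPLE_LINES = [
    "And now for something completely different.",
    "Nobody expects the Spanish Inquisition! Our chief weapon is surprise, "
    "surprise and fear, fear and surprise.",
]


def as_python_comment(line, width=79):
    """Wrap a spoken line into '# '-prefixed lines no wider than `width`."""
    return textwrap.fill(
        line,
        width=width,
        initial_indent="# ",
        subsequent_indent="# ",
    )


def completely_different(lines=SAMPLE_LINES, rng=random):
    """Pick a random line and format it as a comment block."""
    return as_python_comment(rng.choice(lines))


if __name__ == "__main__":
    print(completely_different())
```

The default width of 79 follows the PEP 8 line limit, so the output can be pasted into code without rewrapping.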
source is hard!
• Extract episodes from downloaded data: wget/PyQuery
• Extract words and counts from episodes: TextBlob/NLTK
• Exclude short words, Python syntax: Pygments
• Normalise using NLTK. Brown Corpus is from 1979 which is perfect.
• Wait about 45 mins.
  ◦ 76 MB of CPython, 250,000 Flying Circus words
  ◦ Results are pickled for sanity.
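The counting steps above can be approximated in a few lines. This sketch deliberately swaps in standard-library pieces (`re`, `tokenize`, `keyword`) for TextBlob/NLTK and Pygments, and it skips the NLTK normalisation step entirely. The function names and the `MIN_LENGTH` cutoff of 4 are assumptions made for illustration, not values taken from the project.

```python
import io
import keyword
import pickle
import re
import tokenize
from collections import Counter

MIN_LENGTH = 4  # assumed cutoff for "short" words
WORD_RE = re.compile(r"[A-Za-z]+")


def count_words(text, min_length=MIN_LENGTH):
    """Count lowercase words in plain text, dropping short ones."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return Counter(w for w in words if len(w) >= min_length)


def count_python_words(source, min_length=MIN_LENGTH):
    """Count words in names, comments and strings, skipping Python keywords."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            continue  # Python syntax, not a reference
        if tok.type in (tokenize.NAME, tokenize.COMMENT, tokenize.STRING):
            counts.update(count_words(tok.string, min_length))
    return counts


def shared_references(script_counts, code_counts):
    """Map each word found in both sources to its count in the code."""
    return {w: code_counts[w] for w in script_counts if w in code_counts}


def save_counts(counts, path):
    """Pickle the counts so a slow run does not need repeating."""
    with open(path, "wb") as fh:
        pickle.dump(counts, fh)


def load_counts(path):
    """Load previously pickled counts."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Caching the counters with `pickle` mirrors the "results are pickled" point: the expensive pass over the scripts and the source tree runs once, and later analysis reloads the saved counts.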
CPython has 2225!
• Eric Idle gets mentioned 3495 times in CPython (IDLE, libidle). John Cleese is mentioned 316 times, Terry Gilliam just 8 times. Aww :(
• Almost all references are within the documentation/tests:
  a. May be a good thing, or we'd end up with: os.walk → silly.walk
• Missing references to: Camel Spotting, Fish Slapping, Hell's Grannies, Blancmange, etc.
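One plausible reading of the Eric Idle figure is that case-insensitive substring matching on a surname also catches identifiers such as IDLE and libidle. The sketch below shows that effect on a made-up sentence; `count_mentions` is a hypothetical helper, and whether the tool matches this way is an assumption.

```python
import re


def count_mentions(text, name, whole_word=False):
    """Count case-insensitive occurrences of `name` in `text`.

    With whole_word=True, matches embedded in longer identifiers
    (for example 'libidle') are not counted.
    """
    pattern = re.escape(name)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up sample text, for illustration only.
SAMPLE = "Open IDLE (see libidle) when the loop is idle, said Eric Idle."

if __name__ == "__main__":
    print(count_mentions(SAMPLE, "idle"))
    print(count_mentions(SAMPLE, "idle", whole_word=True))
```

Substring matching inflates counts for short surnames, while whole-word matching still cannot tell the editor IDLE apart from the comedian, so either number is best read as an upper bound.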
honking great idea -- let's do more of those!
• Blancmange = rough edges. < 2 days old.
• Want to help or think I'm crazy? Get in touch!
• http://github.com/blancmange/blancmange
• http://davidjb.com, Twitter: @davidjb_
• Slides @ https://speakerdeck.com/davidjb