
Discovering Python by David Beazley

PyCon 2014
April 10, 2014


Transcript

  1. Discovering Python David Beazley (@dabeaz) http://www.dabeaz.com PyCon'2014 Montreal

  2. In 2005... ... I was hired to go look at

    1.5 TB (yes, that's Terabytes) of source code sitting in a secret vault.
  3. Six Years Later... I testified in US district court about:

    - Concurrency - Threads - Event loops - Interrupts Good god!
  4. Discovering with Python (or what happens when Python is brought

    into the ring of a legal battle)
  5. Disclaimer Everything in this talk actually happened Names and details

    have been changed Non-disclosure (I'd have to kill you) All exhibits/photos are fictional I know nothing, you'll learn nothing
  6. Meet Alice

  7. Alice Meet Bob

  8. Alice Bob "No, I'll send YOU a message!"

  9. Alice Bob Bob's Attorney

  10. Alice Bob Bob's Attorney "Bwhahahaha!" Patent Infringement

  11. Alice Bob Bob's Attorney "Prepare to die!" Alice's Attorney

  12. Let's Talk Patents A hot-button issue Myth: All patent lawsuits

    are trolls Myth: All patent lawsuits involving software are purely about software Fact: Patent litigation is hell
  13. Patent Litigation You hear about patents a lot But what

    actually happens? This talk is about that! Initial Complaint Fact Discovery (9-12 months) Claim Construction Summary Judgement Trial
  14. Plaintiff Bob "Obvious Infringement"

  15. Defendant Alice "Obviously Different"

  16. Fact Discovery Bob's Attorney Alice's Attorney Facts

  17. "Just the facts, ma'am" Enter: Fact Expert Technical expert Unbiased

    party Privileged Works with legal
  18. The Team Bob's Attorney Bob's Coworkers Fact Expert

  19. Reality Bob's Attorney Bob's Coworkers Fact Expert

  20. Reality Bob's Attorney Bob's Coworkers Fact Expert Me

  21. What Happens You are dropped into a firestorm No technical

    guidance Because no one knows anything... that's why they called you!
  22. Quick Learning The Invention

  23. Quick Learning The Invention 7. The system of claim 5

    or 6, wherein the display and input means comprises displays means and input means, the input means being connected to the central processing unit, the display means being connected to the slithering means and the central processing unit, the display means being arranged to display the displays and the input means transferring the input responses to the central processing unit, and wherein the display and input task means further comprises display task means and input task means, the display task means being arranged to control the display means by transferring display commands to, and receiving the display responses from, the display means, the input task means being arranged to control the input means by transferring input commands to, and receiving input responses from, the input means. The Patent
  25. Invention has some code - 600 pages C - PDF

    - 1989
  26. Patent Compilation Does the patent even work? Would the code

    compile? Can it be explained to others? You'd better find out How?
  27. Hand Compilation from PDF - Use highlighter

  28. Enter Python

        # definitions: page number -> functions defined on that page
        definitions = {
            450: ['spam', 'grok'],
            451: ['foo'],
            452: ['bar'],
        }

        # calls: page number -> functions called on that page
        calls = {
            123: ['blah', 'read_input', 'send_msg'],
            124: ['spam', 'foo', 'bar'],
        }

    Entered by hand (from paper copy)
    A long weekend
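  29. Just Link It

        from collections import defaultdict

        # Map each defined name to the page that defines it
        symbols = {
            name: pageno
            for pageno, defns in definitions.items()
            for name in defns
        }

        # Every call to a name that has no definition
        unresolved = [
            (name, pageno)
            for pageno, clist in calls.items()
            for name in clist
            if name not in symbols
        ]

        # Group the unresolved names by the pages that call them
        missing = defaultdict(list)
        for name, pageno in unresolved:
            missing[name].append(pageno)

        for item in missing.items():
            print("Missing: %s on pages %s" % item)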
  30. Secret Weapons List/dict/set comprehensions collections module

  31. WHY?!?!?!?! Due diligence You'd better understand your side's invention Otherwise,

    you will die
  32. Meet The Enemy Alice

  33. Meet The Enemy Alice Alice's Ninja Rockstar Coders

  34. Alice's Ninja Rockstar Coders Meet The Enemy Alice

  35. Meet The Enemy Alice Alice's Adult Engineers SEI CMMI Level

    4
  36. 5,000 pages Can you look at it? They gave us

    some printouts
  37. Some Documents 500 pages

  38. 500 pages 5,000 pages Some Documents

  39. 500 pages 5,000 pages 500,000 pages Some Documents

  40. (what's better than one? 300,000 that's what!) Sample Documents

  41. Purported Source Code? ATTORNEY EYES ONLY ATTORNEY EYES ONLY 1677723

    1677724
  42. From: Guido van Rossum <guido@python.org>

    Date: Dec 9 23:21:42 CET 2011
    Subject: [Python-Dev][PATCH] Adding braces to __future__

    For me, if I had to design a new language today, I would probably use braces, not because they're better than whitespace, but because pretty much every other language uses them, and there are more interesting concepts to distinguish a new language. That said, I don't regret that Python uses indentation, and the rest I have to say about the topic would violate the above request.

    --
    --Guido van Rossum (python.org/~guido)

    Emails
  43. From: Guido van Rossum <guido@python.org>

    Date: Dec 9 23:21:42 CET 2011
    Subject: [Python-Dev][PATCH] Adding braces to __future__

    For me, if I had to design a new language today, I would probably use braces, not because they're better than whitespace, but because pretty much every other language uses them, and there are more interesting concepts to distinguish a new language. That said, I don't regret that Python uses indentation, and the rest I have to say about the topic would violate the above request.

    --
    --Guido van Rossum (python.org/~guido)

    Emails
    Smoking gun?!?
  44. Alleged Prior Art

  45. Deposition of Crazy Old Guy Prior art

  46. We Have Their Software It's highly proprietary You're the only

    one approved to look at it It's actually sitting over in a vault AKA: Software escrow
  47. None
  48. The Vault

  49. The Vault By the tracks

  50. The Vault By the tracks Rock band rehearsal space

  51. Vault Protocol No computers No phone No electronics No storage

    devices Pen, paper and books okay
  52. The Vault PC in a locked cage (no network) Printer

    Special paper Log Book
  53. What's There? A collection of large hard drives D:\ E:\

    F:\ G:\ Each containing copies of CDs (>1.5 TB total) No documentation or organization
  54. Perspective Software archive for the infringing invention

  55. Perspective Software archive for the infringing invention Embedded Microcontroller System

  56. Perspective Software archive for the infringing invention Embedded Microcontroller System

    Display Module Keypads 7-segment
  57. Perspective Software archive for the infringing invention Embedded Microcontroller System

    Display Module Keypads 7-segment A PC
  58. Perspective Software archive for the infringing invention Embedded Microcontroller System

    Display Module Keypads 7-segment A PC Custom PCI Board
  59. Perspective Software archive for the infringing invention Embedded Microcontroller System

    Display Module Keypads 7-segment A PC Custom PCI Board Second PC
  60. Perspective A PC Custom PCI Board Second PC Custom Router

    Actually, more of a distributed system
  61. Perspective The software is "all stack" (a million lines of

    code) C++/ Win32 C/ASM DCOM/ CORBA C/ASM C/ASM VB Java RMI RTOS
  62. Enter Time

    [Timeline figure: years 90 to 01; operating systems OS/2 and WinNT; software versions V1 to V9; hardware revisions RevA to RevC]

    • Weekly snapshots (52 x 15 years = 780 versions)
    • Multiple hardware revisions/configurations
    • Operating system changes/deployment changes
  63. Enter Customers

    • Dozen major customers (corporations)
    • Customer-specific system modifications
    • Think "skins" on main system
    • Hundreds of interlocking versions

    [Diagram labels: Base System Version 2.51; ACME Vers 1.23; Buy N Large Vers 4.22; Tyrell Corp Vers 3.43]
  64. Provided Tools Windows-XP

  65. Provided Tools Windows-XP Command Prompt

  66. Provided Tools Windows-XP Command Prompt Search Mutt

  67. Provided Tools Notepad

  68. Official Tools Notepad Visual Studio

  69. Printing You can print anything Must be logged Numbered, copied,

    given to opposing side
  70. Constraints No working hardware setup (can't run code) No working

    build environ (can't compile) No tech support (can't call anyone) Fragmentary documentation (if any)
  71. None
  72. Secret Weapon

  73. Python? What? How? Unknown: How did Python get placed on

    the machine in the vault? I have NO idea A new IBM PC with only "approved tools" Best Guess: Used by an IBM OEM tool (Yet, there it was, python... in the Windows path no less).
  74. Desert Island Coding Admit it, you've probably thought Python might

    be a good choice Batteries included FOR THE WIN!
  75. Strategy Create a fact discovery environment from scratch in the

    vault I was destined for this job... I wrote the book
  76. Question: What are the objectives? (What does it mean to

    "look at" the code?)
  77. Goals What was provided? Is it complete? How does the

    code work? Where is the patent in the code?
  78. The Horror! The Horror! Reverse engineering the entire build environ

    Makefiles, config files, etc. Identifying all major software components Examples: .exe files, .DLLs, plugins, etc. Sorting out version histories
  79. You try to figure it out

        MKDEP= mkdep
        SHELL= /bin/sh

        # === Fixed definitions ===
        OBJS= \
            bltinmodule.o \
            ceval.o cgensupport.o compile.o \
            errors.o \
            frozen.o \
            getargs.o getcompiler.o getcopyright.o getmt getplatform.o getversion.o graminit.o \
            import.o importdl.o \
            marshal.o modsupport.o mystrtoul.o \
            pythonrun.o \
            sigcheck.o structmember.o sysmodule.o \
            traceback.o \
            $(LIBOBJS)

        LIB= libPython.a

    [Annotations: Sources (OBJS), Library (LIB)]
  80. None
  81. Basic Tooling Reimplement Unix find grep wc diff tail head

    Because that Windows search mutt must die
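The next slides show navigation and diff. In the same spirit, here is a minimal sketch of what grep, wc and head might look like using only the standard library. The function names and signatures are illustrative assumptions, not code from the talk.

```python
import re
from itertools import islice

def grep(pattern, filename):
    """Yield (line number, line) for every line matching a regex."""
    pat = re.compile(pattern)
    with open(filename, errors='replace') as f:
        for lineno, line in enumerate(f, 1):
            if pat.search(line):
                yield lineno, line.rstrip('\n')

def wc(filename):
    """Return (lines, words, bytes), roughly like Unix wc."""
    lines = words = nbytes = 0
    with open(filename, 'rb') as f:
        for line in f:
            lines += 1
            words += len(line.split())
            nbytes += len(line)
    return lines, words, nbytes

def head(filename, n=10):
    """Return the first n lines of a file as a list."""
    with open(filename, errors='replace') as f:
        return list(islice(f, n))
```

Because grep() is a generator, it can be chained with other tools in the interactive shell without reading whole files into memory.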
  82. Example: navigation

        import os

        def cd(dirname):
            os.chdir(dirname)

        def pwd():
            print(os.getcwd())

        def ls(dirname=''):
            os.system('dir %s' % dirname)
  83. Example: diff

        # diff.py
        import sys, difflib

        def diff(fromfile, tofile):
            fromlines = open(fromfile).readlines()
            tolines = open(tofile).readlines()
            diff = difflib.context_diff(fromlines, tolines, fromfile, tofile)
            sys.stdout.writelines(diff)
  84. Interactive Shell

        >>> cd('pycode')
        >>> pwd()
        D:\Files\pycode
        >>> diff('Python-2.6/Lib/collections.py',
        ...      'Python-2.6.2/Lib/collections.py')
        *** Python-2.6/Lib/collections.py
        --- Python-2.6.2/Lib/collections.py
        ***************
        *** 103,109 ****
          # where the named tuple is created. Bypass this step in e where
          # sys._getframe is not defined (Jython for example).
          if hasattr(_sys, '_getframe'):
        ! result.__module__ = _sys._getframe(1).f_globals['__nam
          return result
        --- 103,109 ----
          # where the named tuple is created. Bypass this step in e where
          # sys._getframe is not defined (Jython for example).
  85. More Than Reinvention Actually implementing an entire workflow Building up

    layers of tools/analyses Not unlike what is done with IPython NB Can't understate Python awesomeness
  86. Example

        def allfiles(topdir):
            return ((path, filename)
                    for path, dirs, files in os.walk(topdir)
                    for filename in files)

        >>> files = allfiles('AllPython')
        >>> next(files)
        ('AllPython/0/python-0.9.1', 'python.man')
        >>> next(files)
        ('AllPython/0/python-0.9.1', 'README')
        >>>
  87. Example

        def filetypes(topdir):
            from collections import Counter
            from pprint import pprint
            c = Counter(os.path.splitext(name)[1]
                        for _, name in allfiles(topdir))
            pprint(c.most_common())

        >>> filetypes('AllPython')
        [('.py', 125277),
         ('.c', 27200),
         ('', 17010),
         ('.rst', 15439),
         ('.h', 14782),
         ('.tex', 12257),
         ...

    allfiles()
  88. Example

        def find(topdir, pattern):
            from fnmatch import fnmatch
            return ((path, name)
                    for path, name in allfiles(topdir)
                    if fnmatch(name, pattern))

        >>> f = find('AllPython', '*.py')
        >>> next(f)
        ('AllPython/0/python-0.9.1/demo/scripts', 'findlinksto.py')
        >>> next(f)
        ('AllPython/0/python-0.9.1/demo/scripts', 'mkreal.py')
        >>> next(f)
        ('AllPython/0/python-0.9.1/demo/scripts', 'ptags.py')
        >>>

    allfiles() filetypes()
  89. Example

        def find_versions(topdir):
            import re
            # Every Python source tree contains pgen.c, so use it as a marker
            for path, _ in find(topdir, 'pgen.c'):
                pypath, _ = os.path.split(path)
                version = re.search(r'-(\w+\.\w+(\.\w+)?)$', pypath).group(1)
                yield version, pypath

    allfiles() filetypes() find()
  90. Example

        >>> vers = find_versions('AllPython')
        >>> next(vers)
        ('0.9.1', 'AllPython/0/python-0.9.1')
        >>> next(vers)
        ('1.0.1', 'AllPython/1/python-1.0.1')
        >>>

    allfiles() filetypes() find() find_versions()
  91. Example

        def write_manifest(topdir):
            import csv
            f = open('manifest.csv','w')
            csv.writer(f).writerows(find_versions(topdir))
            f.close()

    allfiles() filetypes() find() find_versions()
  92. Example allfiles() filetypes() find() find_versions() write_manifest()

  93. Example allfiles() filetypes() find() find_versions() write_manifest() .csv Workflows!

  94. Pile it Higher and Higher HDDs Snapshots "Virtual File System"

    View View View You keep building abstractions Reorganized file layer Different views (version, date, prod, debug, etc.) .csv
  95. Timeline/Inventory Tools Link together every version of every component found

    Development timelines Official vs. Debug releases V1 V2 V3 V4 release release release release release release release
  96. Example: Versioning

        def versions(filename):
            import hashlib
            from collections import defaultdict
            manifest = read_manifest()
            groups = defaultdict(list)
            for vers, path in manifest.items():
                fullname = os.path.join(path, filename)
                if os.path.exists(fullname):
                    digest = hashlib.new('md5')
                    digest.update(open(fullname,'rb').read())
                    groups[digest.digest()].append(vers)
            return sorted([sorted(g) for g in groups.values()])
  97. Example: Versioning

        >>> for x in versions('Python/thread.c'):
        ...     print(x)
        ...
        ['1.0.1']
        ['1.1']
        ['1.2', '1.3']
        ['1.4']
        ['1.5', '1.5.1']
        ['1.5.2', '1.5.2c1']
        ['1.5.2b1', '1.5.2b2']
        ['1.6', '1.6b1']
        ['2.0', '2.0.1', '2.0c1', '2.1', '2.1.1', '2.1.2', '2.1.3']
        ...
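versions() calls read_manifest(), which the slides never show. Here is a minimal sketch, assuming it simply reads back the manifest.csv written by write_manifest() on slide 91. The default filename and the {version: path} return type are assumptions inferred from the `manifest.items()` loop.

```python
import csv

def read_manifest(filename='manifest.csv'):
    """Load (version, path) rows into a {version: path} dict (assumed layout)."""
    with open(filename, newline='') as f:
        # Skip blank rows; keep the first two columns of each record
        return {row[0]: row[1] for row in csv.reader(f) if row}
```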
  98. Navigational Tooling "Virtual File System" View View View Query tools

    for going to any version/file Navigational Tools >>> view('2.7.3', 'Python/ceval.c') >>> Typically launch windows tools (e.g., Vis Studio)
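A navigational query like view() could be sketched as a lookup in the manifest followed by a hand-off to some external tool. This is an assumption about how such a tool might work, not the talk's code: the explicit `manifest` argument, the `locate()` helper and the `launcher` parameter are all hypothetical.

```python
import os

def locate(manifest, version, filename):
    """Return the real path of filename inside one version, or None."""
    root = manifest.get(version)
    if root is None:
        return None
    fullname = os.path.join(root, *filename.split('/'))
    return fullname if os.path.exists(fullname) else None

def view(manifest, version, filename, launcher=print):
    """Locate a file and hand its path to a launcher (print by default)."""
    fullname = locate(manifest, version, filename)
    if fullname is None:
        raise FileNotFoundError('%s not found in version %s' % (filename, version))
    launcher(fullname)
    return fullname
```

On Windows, passing `os.startfile` as the launcher would open the file in whatever application is associated with it.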
  99. Commentary I don't know if the opposing side actually expected

    us to figure out their code We knew almost everything about everything Python FOR.THE.WIN.
  100. How Does Code Work? Better make sure you understand everything

    about the code Software architecture Interaction between components Underlying algorithms
  101. Problem: Code Sucks Nobody wants to read code Better: Design

    documents, specs Nobody wants to give you that "Go read the source."
  102. Let's Go Fishing

    Interesting files
    Code comments
    TPS reports
    PDF, DOC, RTF, HTML, TXT

    // See: Important Document
    Fixed bug. See important specification.
    I ὑ re
  103. Back and Forth An obscure find /* See FS-6541-8v2.0 for

    details */ A request to attorneys "Tell opposing counsel we can't find FS-6541-8v2.0" A few silent days pass....
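Finds like the comment above can be hunted systematically by scanning the source tree for strings that look like document numbers. The following is a sketch only: the regular expression is modeled on the single example FS-6541-8v2.0, and the function name and file extensions are assumptions.

```python
import os
import re
from collections import defaultdict

# Assumed shape of a document ID, modeled on 'FS-6541-8v2.0':
# letters, dash, digits, dash, digits, 'v', dotted version number.
DOC_ID = re.compile(r'\b[A-Z]{2,}-\d+-\d+v\d+(?:\.\d+)*')

def find_doc_refs(topdir, extensions=('.c', '.h', '.cpp')):
    """Map each document ID to the sorted list of source files mentioning it."""
    refs = defaultdict(set)
    for path, dirs, files in os.walk(topdir):
        for name in files:
            if not name.endswith(extensions):
                continue
            fullname = os.path.join(path, name)
            with open(fullname, errors='replace') as f:
                for doc_id in DOC_ID.findall(f.read()):
                    refs[doc_id].add(fullname)
    return {doc_id: sorted(names) for doc_id, names in refs.items()}
```

The resulting dictionary doubles as a request list for the attorneys: every key is a document the code refers to but nobody has handed over yet.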
  104. None
  105. None
  106. None
  107. Casting a Wide Net Search for documents is far and

    wide Software change notices Unrelated software (peripheral devices) Emails The web (catalogs, manuals, job postings, etc.) Analogy: Pulling on a loose thread...
  108. Commentary You're learning the invention from scratch Reading other people's

    code You're teaching attorneys about it The other side doesn't want you to succeed You will learn A LOT in this exercise
  109. Some Lessons Learned SUCKS ROCKS C++ Assembly code Asynchronous Threads

    Objects Functions Makefiles IDEs CASE Tools Humans 1990s 1970s (of course this is just my opinion, I could be wrong) UML Words
  110. Speaking of Attorneys Do the "facts" support patent infringement? Does

    it look like it infringes? Can it be proven that it infringes? (Let the game begin)
  111. None
  112. Remember this? 7. The system of claim 5 or 6,

    wherein the display and input means comprises displays means and input means, the input means being connected to the central processing unit, the display means being connected to the slithering means and the central processing unit, the display means being arranged to display the displays and the input means transferring the input responses to the central processing unit, and wherein the display and input task means further comprises display task means and input task means, the display task means being arranged to control the display means by transferring display commands to, and receiving the display responses from, the display means, the input task means being arranged to control the input means by transferring input commands to, and receiving input responses from, the input means. The Patent
  113. Remember this? 7. The system of claim 5 or 6,

    wherein the display and input means comprises displays means and input means, the input means being connected to the central processing unit, the display means being connected to the slithering means and the central processing unit, the display means being arranged to display the displays and the input means transferring the input responses to the central processing unit, and wherein the display and input task means further comprises display task means and input task means, the display task means being arranged to control the display means by transferring display commands to, and receiving the display responses from, the display means, the input task means being arranged to control the input means by transferring input commands to, and receiving input responses from, the input means. The Patent What does this claim mean? (let's rumble)
  114. None
  115. Defining Claim Terms 7. The system of claim 5 or

    6, wherein the display and input means comprises displays means and input means, the input means being connected to the central processing unit, the display means being connected to the slithering means and the central processing unit, the display means being arranged to display the displays and the input means transferring the input responses to the central processing unit, and wherein the display and input task means further comprises display task means and input task means, the display task means being arranged to control the display means by transferring display commands to, and receiving the display responses from, the display means, the input task means being arranged to control the input means by transferring input commands to, and receiving input responses from, the input means. The Patent Term Plaintiff Defendant central processing unit display means
  116. Claim Construction Claim terms have to be supported by reality

    If not, it's game over A lot of attorney/expert consultation Problem: very specific facts and structure File: Widget/foo.c, lines 230-255. Requires a deep dive
  117. Problem Matching claims to 800 versions of a million line

    program Pick one version? Which one? Match them all?
  118. Fragment Versioning You're familiar with source code control Imagine applying

    it to code fragments/excerpts In reverse Hmmm.
  119. /* source.c */

        void grok() {
            if (spam) {
                foo();
                bar();
            }
            ...
        }
        void blah() {
            ...
        }

  120. /* source.c */

        void grok() {
            if (spam) {
                foo();
                bar();
            }
            ...
        }
        void blah() {
            ...
        }

    Fragment
        file: source.c
        start: 'void grok()'
        end: 'void blah()'

  121. /* source.c */

        void grok() {
            if (spam) {
                foo();
                bar();
            }
            ...
        }
        void blah() {
            ...
        }

    Fragment
        file: source.c
        start: 'void grok()'
        end: 'void blah()'

    Snapshots (>800)
    Global fragment search across all versions

  122. /* source.c */

        void grok() {
            if (spam) {
                foo();
                bar();
            }
            ...
        }
        void blah() {
            ...
        }

    Fragment
        file: source.c
        start: 'void grok()'
        end: 'void blah()'

    Snapshots (>800)

    Ver1

        void grok() {
            if (spam) {
                foo();
                bar();
            }
        }
        void blah() {

    Ver2

        void grok() {
            if (spam) {
                foo();
                bar(x);
            }
        }
        void blah() {

    Ver3

        void grok() {
            if (spam) {
                new_foo();
                bar(x);
            }
        }
        void blah() {
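The fragment idea above can be sketched in a few lines: cut the text between the start and end markers out of each snapshot's copy of the file, then group the snapshots whose fragments are identical. This is a hedged illustration, not the tooling used in the case. The function names, the `snapshots` mapping (version name to directory) and the whitespace normalization are all assumptions.

```python
import os
from collections import defaultdict

def extract_fragment(text, start, end):
    """Return the text from the start marker up to (not including) the
    end marker, or None if either marker is missing."""
    i = text.find(start)
    if i < 0:
        return None
    j = text.find(end, i + len(start))
    if j < 0:
        return None
    return text[i:j]

def fragment_versions(snapshots, filename, start, end):
    """Group snapshot versions by the content of one code fragment.

    snapshots: dict mapping version name -> snapshot directory (assumed).
    Returns a sorted list of (versions, normalized fragment) pairs.
    """
    groups = defaultdict(list)
    for vers, path in snapshots.items():
        fullname = os.path.join(path, filename)
        if not os.path.exists(fullname):
            continue
        with open(fullname, errors='replace') as f:
            frag = extract_fragment(f.read(), start, end)
        if frag is not None:
            # Collapse whitespace so reformatting alone is not a new version
            groups[' '.join(frag.split())].append(vers)
    return sorted((sorted(v), frag) for frag, v in groups.items())
```

Run over hundreds of snapshots, the result is a short list of distinct fragments together with the versions that share each one.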
  123. Big Picture Reduce a massive data set to something sane

    "This claim matches this structure in the code. There have only been six versions of this code over 15 years. Here are the six versions."
  124. PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON

    PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON
  125. PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON

    PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON Python makes the impossible possible
  126. PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON

    PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON PYTHON Python makes the impossible possible (even Python 3)
  127. Final Thoughts If you get the chance to do this,

    do it! You will learn A LOT! Would I want to do it again? Not sure.
  128. But How Did it End?

  129. None
  130. My End Game I learned a lot about generator functions

    Ultimately a well-known PyCon tutorial...
  131. Postscript: Expert Report You may be asked to write an

    expert report Outlines all factual findings Ties facts to patent claims A scientific document It's a document that WILL be read
  132. Postscript: Deposition You A room of attorneys Opposing expert Court

    reporter Videographer 8 hours It will be one of the most intense, surreal, awesome/worst experiences of your whole life.
  133. Postscript: Court Testimony Like deposition, but dialed up to 11

    Twice as many attorneys, more experts Judge & clerks
  134. Questions