$30 off During Our Annual Pro Sale. View Details »

Doug Hellmann - How I built a power debugger out of the standard library and things I found on the internet

Doug Hellmann - How I built a power debugger out of the standard library and things I found on the internet

Smiley spys on your application, recording everything it does, using Python's built-in tracing facility and a database to create a complete record of your program's runtime so you can study it. Most of the tools I used to create Smiley are in Python's standard library, and the history of Smiley's development serves as a framework to discuss those tools and how you can tap their power yourself.

https://us.pycon.org/2016/schedule/presentation/1650/

PyCon 2016

May 29, 2016
Tweet

More Decks by PyCon 2016

Other Decks in Programming

Transcript

  1. How I Built a Power Debugger
    Out of the Standard Library and
    Things I Found on the Internet
    Doug Hellmann
    PyCon US 2016

    View Slide

  2. I Built a Power Debugger!
    Doug Hellmann
    PyCon US 2016

    View Slide

  3. How
    I Built a Power Debugger!
    Doug Hellmann
    PyCon US 2016

    View Slide

  4. github.com/dhellmann/smiley
    smiley.readthedocs.org

    View Slide

  5. View Slide

  6. View Slide

  7. Features
    • Record calls with data
    • Remote monitoring, local
    analysis
    • Browse history
    • Learn new tools
    flickr/Kangrex

    View Slide

  8. View Slide

  9. class Publisher(object):
    def __init__(self, endpoint,
    high_water_mark=10000):
    self.context = zmq.Context()
    self.pub_socket = \
    self.context.socket(zmq.PUSH)
    self.pub_socket.bind(endpoint)
    self.pub_socket.identity = 'publisher'
    self.pub_socket.hwm = high_water_mark
    def send(self, msg_type, data):
    msg = [
    msg_type,
    json.dumps(data,
    default=self._json_special_types),
    ]
    self.pub_socket.send_multipart(msg)

    View Slide

  10. def trace_calls(self, frame, event, arg):
    co = frame.f_code
    filename = co.co_filename
    if filename in (__file__,):
    # Ignore ourself
    return
    self._send_notice(frame, event, arg)
    return self.trace_calls

    View Slide

  11. def _send_notice(self, frame, event, arg):
    co = frame.f_code
    func_name = co.co_name
    line_no = frame.f_lineno
    filename = os.path.abspath(co.co_filename)
    for d in IGNORE_DIRS:
    if filename.startswith(d):
    return
    # …

    View Slide

  12. # …
    interesting_locals = {
    n: v
    for n, v in frame.f_locals.items()
    if (not inspect.ismodule(v)
    and not inspect.isfunction(v)
    and not inspect.ismethod(v)
    and (n[:2] != '__' and n[-2:] != '__'))
    }
    # …

    View Slide

  13. # …
    self.publisher.send(
    event,
    {'func_name': func_name,
    'line_no': line_no,
    'filename': filename,
    'arg': arg,
    'locals': interesting_locals,
    'timestamp': time.time(),
    'run_id': self.run_id,
    })

    View Slide

  14. $ smiley help
    usage: smiley [--version] [-v] [--log-file LOG_FILE]
    [-q] [-h] [--debug]
    smiley spies on your apps as they run
    optional arguments:
    --version show program's version number
    and exit
    -v, --verbose Increase verbosity of output.
    --log-file LOG_FILE Specify a file to log output.
    -q, --quiet suppress output except warnings
    -h, --help show this help message and exit
    --debug show tracebacks on errors
    Commands:
    complete print bash completion command
    help print detailed help for another command
    monitor Listen for running programs and show
    their progress.
    run Run another program with monitoring
    enabled.

    View Slide

  15. $ smiley help run
    usage: smiley run [-h] [--socket SOCKET] command
    [command ...]
    Run another program with monitoring enabled.
    positional arguments:
    command the command to spy on
    optional arguments:
    -h, --help show this help message and exit
    --socket SOCKET URL for the socket where the listener
    will be (tcp://127.0.0.1:5556)

    View Slide

  16. $ smiley help monitor
    usage: smiley monitor [-h] [--socket SOCKET]
    Listen for running programs and show their progress.
    optional arguments:
    -h, --help show this help message and exit
    --socket SOCKET URL for the socket where to monitor on
    (tcp://127.0.0.1:5556)

    View Slide

  17. def _process_message(self, msg):
    print 'MESSAGE:', msg
    msg_type, msg_payload = msg
    if msg_type == 'start_run':
    print (‘Starting new run:',
    msg_payload.get(‘command_line'))
    elif msg_type == 'end_run':
    print 'Finished run'
    else:
    line = linecache.getline(msg_payload['filename'],
    msg_payload['line_no']).rstrip()
    if msg_type == 'return':
    print '%s:%4s: return>>> %s' % (
    msg_payload['filename'],
    msg_payload[‘line_no'], msg_payload['arg'])
    else:
    print '%s:%4s: %s' % (
    msg_payload['filename'],
    msg_payload[‘line_no'], line)
    if msg_payload.get('locals'):
    for n, v in sorted(msg_payload['locals'].items()):
    print '%s %s = %s' % (
    ' ' * len(msg_payload['filename']),
    n,
    v,
    )
    print

    View Slide

  18. def gen(m):
    for i in xrange(m):
    yield i
    def c(input):
    print 'input =', input
    data = list(gen(input))
    print 'Leaving c()'
    def b(arg):
    val = arg * 5
    c(val)
    print 'Leaving b()'
    return val
    def a():
    print 'args:', sys.argv
    b(2)
    print 'Leaving a()'
    a()

    View Slide

  19. $ smiley monitor
    Waiting for incoming data
    MESSAGE: ['start_run', {u'command_line': [u'simple.py'], u'run_id':
    u'e87302b2-402a-4243-bb27-69a467d4bd8e'}]
    Starting new run: [u'simple.py']
    MESSAGE: ['call', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.142863,
    u'line_no': 1,
    u'filename': u'/Users/dhellmann/Devel/smiley/scratch/simple.py',
    u'func_name': u'', u'arg': None, u'locals': {u'self':
    u'', u'command_line':
    [u'simple.py']}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 1: import sys
    command_line =
    [u'simple.py']
    self =

    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.143175,
    u'line_no': 1,
    u'filename': u'/Users/dhellmann/Devel/smiley/scratch/simple.py',
    u'func_name': u'', u'arg': None, u'locals': {u'self':
    u'', u'command_line':
    [u'simple.py']}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 1: import sys
    command_line =
    [u'simple.py']
    self =

    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.143499, u'line_no': 3, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'', u'arg':

    View Slide

  20. $ smiley monitor
    Waiting for incoming data
    MESSAGE: ['start_run', {u'command_line': [u'simple.py'], u'run_id':
    u'e87302b2-402a-4243-bb27-69a467d4bd8e'}]
    Starting new run: [u'simple.py']
    MESSAGE: ['call', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.142863,
    u'line_no': 1,
    u'filename': u'/Users/dhellmann/Devel/smiley/scratch/simple.py',
    u'func_name': u'', u'arg': None, u'locals': {u'self':
    u'', u'command_line':
    [u'simple.py']}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 1: import sys
    command_line =
    [u'simple.py']
    self =

    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.143175,
    u'line_no': 1,
    u'filename': u'/Users/dhellmann/Devel/smiley/scratch/simple.py',
    u'func_name': u'', u'arg': None, u'locals': {u'self':
    u'', u'command_line':
    [u'simple.py']}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 1: import sys
    command_line =
    [u'simple.py']
    self =

    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.143499, u'line_no': 3, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'', u'arg':

    View Slide

  21. MESSAGE: ['call', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.144927, u'line_no': 21, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 21: def a():
    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.145128, u'line_no': 22, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 22: print 'args:',
    sys.argv
    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.145343, u'line_no': 23, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 23: b(2)

    View Slide

  22. View Slide

  23. View Slide

  24. def start_run(self, run_id, cwd, description,
    start_time):
    "Record the beginning of a run."
    with transaction(self.conn) as c:
    c.execute(
    """
    INSERT INTO run (id, cwd, description,
    start_time)
    VALUES
    (:id, :cwd, :description, :start_time)
    """,
    {'id': run_id,
    'cwd': cwd,
    'description': description,
    'start_time': start_time}
    )

    View Slide

  25. def _process_message(self, msg):
    msg_type, msg_payload = msg
    if msg_type == 'start_run':
    command_line = ' ‘.join(
    msg_payload.get('command_line', []))
    self._cwd = msg_payload.get('cwd', '')
    if self._cwd:
    self._cwd = (self._cwd.rstrip(os.sep) +
    os.sep)
    self.db.start_run(
    run_id=msg_payload.get('run_id'),
    cwd=self._cwd,
    description=command_line,
    start_time=msg_payload.get('timestamp'),
    )

    View Slide

  26. elif msg_type == 'end_run':
    self.log.info('Finished run')
    self.db.end_run(
    run_id=msg_payload['run_id'],
    end_time=msg_payload.get('timestamp'),
    message=msg_payload.get('message'),
    traceback=msg_payload.get('traceback'),
    )
    else:
    self.db.trace(
    run_id=msg_payload['run_id'],
    event=msg_type,
    func_name=msg_payload.get('func_name'),
    line_no=msg_payload.get('line_no'),
    filename=msg_payload.get('filename'),
    trace_arg=msg_payload.get('arg'),
    locals=msg_payload.get('locals'),
    timestamp=msg_payload.get('timestamp'),
    )

    View Slide

  27. • Replay past runs
    • Complex data types
    flickr/Chris Marquardt

    View Slide

  28. import json
    import traceback
    import types
    def _json_special_types(obj):
    if isinstance(obj, types.TracebackType):
    return traceback.extract_tb(obj)
    if isinstance(obj, type):
    # We don't want to return classes
    return repr(obj)
    try:
    data = dict(vars(obj))
    data['__class__'] = obj.__class__.__name__
    data['__module__'] = obj.__class__.__module__
    except Exception as err:
    data = repr(obj)
    return data
    def dumps(data):
    return json.dumps(data, default=_json_special_types)

    View Slide

  29. ?

    View Slide

  30. class EventProcessor(object):
    __metaclass__ = abc.ABCMeta
    @abc.abstractmethod
    def start_run(self, run_id, cwd, description,
    start_time):
    """Called when a 'start_run' event is seen.
    """
    @abc.abstractmethod
    def end_run(self, run_id, end_time, message,
    traceback):
    """Called when an 'end_run' event is seen.
    """
    @abc.abstractmethod
    def trace(self, run_id, event,
    func_name, line_no, filename,
    trace_arg, local_vars,
    timestamp):
    """Called when any other event type is seen.
    """

    View Slide

  31. def get_runs(self):
    "Return the runs available to browse."
    with transaction(self.conn) as c:
    c.execute(
    """
    SELECT
    id, cwd, description, start_time,
    end_time, error_message
    FROM run
    """
    )
    return c.fetchall()

    View Slide

  32. def take_action(self, parsed_args):
    self.out = output.OutputFormatter(
    linecache.getline)
    self.db = db.DB(parsed_args.database)
    run_details = self.db.get_run(parsed_args.run_id)
    self.out.start_run(
    run_details.id,
    run_details.cwd,
    run_details.description,
    run_details.start_time,
    )
    for t in self.db.get_trace(parsed_args.run_id):
    self.out.trace(
    t.run_id, t.event, t.func_name,
    t.line_no, t.filename, t.trace_arg,
    t.local_vars, t.timestamp,
    )
    self.out.end_run(
    run_details.id,
    run_details.end_time,
    run_details.error_message,
    None) # run_details.traceback

    View Slide

  33. MESSAGE: ['call', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.144927, u'line_no': 21, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 21: def a():
    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.145128, u'line_no': 22, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 22: print 'args:',
    sys.argv
    MESSAGE: ['line', {u'run_id': u'e87302b2-402a-4243-bb27-69a467d4bd8e',
    u'timestamp': 1436367235.145343, u'line_no': 23, u'filename': u'/Users/
    dhellmann/Devel/smiley/scratch/simple.py', u'func_name': u'a', u'arg': None,
    u'locals': {}}]
    /Users/dhellmann/Devel/smiley/scratch/simple.py: 23: b(2)

    View Slide

  34. class DBLineCache(object):
    def __init__(self, db, run_id):
    self._db = db
    self._run_id = run_id
    self._files = {}
    def getline(self, filename, line_no):
    if filename not in self._files:
    body = self._db.get_cached_file(
    self._run_id, filename)
    self._files[filename] = body.splitlines()
    try:
    return self._files[filename][line_no]
    except IndexError:
    # Line number is out of range
    return ''

    View Slide

  35. def take_action(self, parsed_args):
    # Fix import path
    cwd = os.getcwd()
    if (cwd not in sys.path and
    os.curdir not in sys.path):
    sys.path.insert(0, cwd)
    # Fix command line args
    sys.argv = parsed_args.command
    # Run the app
    p = publisher.Publisher(parsed_args.socket)
    t = tracer.Tracer(p)
    t.run(parsed_args.command)

    View Slide

  36. def take_action(self, parsed_args):
    # Fix import path
    cwd = os.getcwd()
    if (cwd not in sys.path and
    os.curdir not in sys.path):
    sys.path.insert(0, cwd)
    # Fix command line args
    sys.argv = parsed_args.command
    # Run the app
    if parsed_args.mode == 'remote':
    p = publisher.Publisher(parsed_args.socket)
    else:
    p = db.DB(parsed_args.database)
    t = tracer.Tracer(p)
    t.run(parsed_args.command)

    View Slide

  37. 1.ZMQ
    2.Tracing
    3.Command line interface
    4.Database
    5.Complex objects
    6.EventProcessor API
    7.Replay command
    8.Store source files
    9.Local database

    View Slide

  38. 1.ZMQ
    2.Tracing
    3.Command line interface
    4.Database
    5.Complex objects
    6.EventProcessor API
    7.Replay command
    8.Store source files
    9.Local database

    View Slide

  39. UI Tools

    View Slide

  40. View Slide

  41. View Slide

  42. View Slide

  43. class FileController(RestController):
    @expose(generic=True, template='file.html')
    @nav.active_section('runs')
    def get_one(self, run_id, file_id):
    filename, body = \
    request.db.get_cached_file_by_id(run_id,
    file_id)
    run = request.db.get_run(run_id)
    lexer = guess_lexer_for_filename(filename, body)
    formatter = HtmlFormatter(linenos=True)
    styled_body = highlight(body, lexer, formatter)
    return {
    'run_id': run_id,
    'run': run,
    'filename': filename,
    'body': body,
    'styled_body': styled_body,
    }

    View Slide

  44. View Slide

  45. class StyledLineCache(object):
    def __init__(self, db, run_id):
    self._db = db
    self._run_id = run_id
    self._files = {}
    EXPECTED_PREFIX = ''
    EXPECTED_SUFFIX = ''
    def getline(self, filename, line_no):
    if filename not in self._files:
    body = self._db.get_cached_file(self._run_id,
    filename)
    styled_body = apply_style(filename, body,
    linenos=False)
    start = len(self.EXPECTED_PREFIX)
    end = -1 * (len(self.EXPECTED_SUFFIX) + 1)
    middle_body = styled_body[start:end].rstrip('\n')
    self._files[filename] = middle_body.splitlines()
    try:
    return self._files[filename][line_no-1]
    except IndexError:
    # Line number is out of range
    return ''

    View Slide

  46. View Slide

  47. View Slide

  48. View Slide

  49. View Slide

  50. ✓ Web UI
    ✓ Profiling Data
    ✓ Call Graph
    ✓ Syntax Highlighting
    • Only Changed Variables
    • Comments

    View Slide

  51. def _mk_seq(d):
    return sorted(
    (k, pformat(v, width=20))
    for k, v in d.iteritems()
    )
    def get_variable_changes(older, newer):
    s_a = _mk_seq(older)
    s_b = _mk_seq(newer)
    matcher = difflib.SequenceMatcher(None, s_a, s_b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag in {'insert', 'replace'}:
    for i in s_b[j1:j2]:
    yield i

    View Slide

  52. View Slide

  53. View Slide

  54. View Slide

  55. View Slide

  56. Other Features
    Tracing
    Threaded
    Applications
    Static
    HTML
    Reports
    DB
    Export/
    Import
    Pagination
    in
    Web UI

    View Slide

  57. Other Features
    Tracing
    Threaded
    Applications
    Static
    HTML
    Reports
    DB
    Export/
    Import
    Pagination
    in
    Web UI

    View Slide

  58. 0
    1000
    2000
    3000
    4000
    2013-05-25
    2013-05-26
    2013-05-27
    2013-05-28
    2013-06-02
    2013-06-03
    2013-06-05
    2013-06-06
    2013-06-08
    2013-06-16
    2013-06-23
    2013-07-14
    2013-07-27
    2013-07-28
    2013-09-01
    2013-12-28
    2013-12-29
    2013-12-30
    2014-01-04
    2014-01-12
    2014-07-06
    2014-08-01

    View Slide

  59. 0
    1000
    2000
    3000
    4000
    2013-05-25
    2013-05-26
    2013-05-27
    2013-05-28
    2013-06-02
    2013-06-03
    2013-06-05
    2013-06-06
    2013-06-08
    2013-06-16
    2013-06-23
    2013-07-14
    2013-07-27
    2013-07-28
    2013-09-01
    2013-12-28
    2013-12-29
    2013-12-30
    2014-01-04
    2014-01-12
    2014-07-06
    2014-08-01

    View Slide

  60. 0
    1000
    2000
    3000
    4000
    2013-05-25
    2013-05-26
    2013-05-27
    2013-05-28
    2013-06-02
    2013-06-03
    2013-06-05
    2013-06-06
    2013-06-08
    2013-06-16
    2013-06-23
    2013-07-14
    2013-07-27
    2013-07-28
    2013-09-01
    2013-12-28
    2013-12-29
    2013-12-30
    2014-01-04
    2014-01-12
    2014-07-06
    2014-08-01

    View Slide

  61. • Performance
    • Standard I/O
    • GUI
    • Compare runs
    flickr/Mike Mozart

    View Slide

  62. Freenode: dhellmann
    [email protected]
    github.com/dhellmann/smiley
    smiley.readthedocs.org

    View Slide

  63. View Slide