sys._current_frames(): Take real-time x-rays of your software for fun and performance by Leonardo Rochael

sys._current_frames(): Take real-time x-rays of your software for fun and performance by Leonardo Rochael

Take real-time X-rays of your software for Fun and Performance by Leonardo_Rochael

Afcfefa1f067d10bd021de0cc2e5e806?s=128

PyCon 2013

March 17, 2013
Tweet

Transcript

  1. 1.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License sys._current_frames() Take real-time X-rays of your software for Fun and Performance Leonardo Rochael Almeida Simples Consultoria
  2. 2.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Performance Debugging Status Quo • Prevention – Functional Performance testing • testutils.org/multi-mechanize – Before production • You do that, right?!
  3. 3.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Performance Debugging Status Quo • Production: – Wait for the angry mob • “Reproduction instructions, please?” – Proactive Monitoring
  4. 4.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License With reproduction instructions • “profile/cProfile” – Heavy performance toll – Complicated output – Show calls only, not lines
  5. 5.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License With Reproduction Instructions • “statprof” – Sampling of tracebacks → statistics – More relevant info than “cProfile” – And lighter – Still doesn't “tell the story”
  6. 6.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Without reproduction instructions • “profile/cProfile” – Way too slow for production • Sprinkle timing calls – log(time.time() - start_time) – Time consuming... – On the production environment?! • DeadlockDebugger
  7. 7.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Quick Zope Primer • Z Object Publishing Environment • One Listening thread – Parses HTTP Request • A few Worker threads: – http://server/document/method?foo=bar – document.method(foo=“bar”) • Transactional
  8. 8.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License DeadlockDebugger for thread_id, frame in sys._current_frames().items(): output = StringIO() traceback.print_stack(frame, file=output) res.append("Thread %s:\n%s" % (thread_id, output.getvalue())) res.append("End of dump") return '\n'.join(res) • Directly from the Listening thread – At special URL – Works even if system stuck
  9. 9.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License sys._current_frames() • “threadframe” (Python 2.4) • Python dict – Key: thread id (int) • Same as thread.get_ident() for each thread – Value: Stack Frame • Currently executing frame on this thread • Tip of the thread's stack
  10. 10.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Stack Frame • Currently executing Function – file name – line number • Pointer to previous frame • Variables – locals – globals • [...]
  11. 11.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License DeadlockDebugger • Instant snapshot – Thread: 1 • Stack trace … – Thread: 2 • Stack trace … • Good for – System is slow Right Now
  12. 12.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Proactive Monitoring • Parsing of HTTP server logs – Reports slow URLs – But many “false” positives • Maatkit (now Percona) – Reports slow MySQL Queries • Still not enough info – Just sprinkling timing calls
  13. 13.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Can we do better? • DeadlockDebugger continuously • Only in slow requests • Add extra info
  14. 14.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger • Created at Nexedi • Design – Julien Muchembled – Sébastien Robin – Leonardo Rochael Almeida • Implementation – Leonardo Rochael Almeida
  15. 15.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger • Intercepts and Monitores each request • Logfile: – Request X, thread Y, running for T seconds • Request Info • Stack trace – Request X, thread Y, running for T+2 seconds • Request Info • Stack trace – [...]
  16. 16.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger shows • Which requests were slow – Not the collateral damage • Why exactly they were slow • The right spot to add optimizations
  17. 17.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License ERP5 Case Study • “T-Shirt Custom Printing company” • ERP5: Zope based ERP from Nexedi • MRP, CRM, Accounting, ... • Random peaks of horrible performance • Never twice on the same URL • Always at moments of high volume
  18. 18.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License ERP5 Case Study • Culprit: Sale Order Line Price • Calculated on demand – Only the first time – Usually by background activities • Stored on the Sale Order Line • Death by 1000 cuts – Solved in 15 LOCs
  19. 19.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Plone Case Study • Large government portal • LongRequestLogger: – Slowdowns at specific times – Culprit: database network access – Slowed down by scheduled backup
  20. 20.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License What to optimize: Limit between Hot and Cold • Stack Trace at T – A • calls B – B • calls C – C • calls D – D • calls E – E • calls F • Stack Trace at T+1 – A • calls B – B • calls C – C • calls X – X • calls Y – Y • calls Z
  21. 21.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Why it works • Statistic Sampling Profiling • Pure probability – What is slow, shows • Stack traces “tell the story” – Enriched by additional info • Captures even the most un-reproduceable • User Independence – Diagnose/Optimize before complaints
  22. 22.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger: patch def wrapper(*args, **kw): m = Monitor() try: result = wrapper.original(*args, **kw) return result finally: m.stop() def do_patch(): from ZPublisher import Publish wrapper.original = Publish.publish_module_standard Publish.publish_module_standard = wrapper
  23. 23.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger: Monitor class Monitor(Thread): def __init__(self): Thread.__init__(self) self._event_pipe = os.pipe() self.dumper = dumper.Dumper(thread.get_ident()) Thread.start(self) def run(self): timeout = self.dumper.timeout while self.waiting(timeout): self.dumper() timeout = self.dumper.interval
  24. 24.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger: Dumper def extract_request(self, frame): from ZPublisher.Publish import call_object func_code = call_object.func_code while frame is not None: code = frame.f_code if (code is func_code): request = frame.f_locals.get('request') return request frame = frame.f_back def __call__(self): frame = sys._current_frames()[self.thread_id] output = StringIO() thread_info = self.get_thread_info(frame) print >> output, thread_info print >> output, "Traceback:" traceback.print_stack(frame, file=output) self.log.warning(output.getvalue())
  25. 25.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License LongRequestLogger: Monitor def waiting(self, timeout): r, w = self._event_pipe read_ready_list, _, _ = select([r], [], [], timeout) if read_ready_list: os.read(r, 1) # stop() was called by monitored thread. # stop waiting: return False # still waiting for the other thread to finish return True def stop(self): """Stop monitoring this thread""" r, w = self._event_pipe os.write(w, '\0') self.join() map(os.close, (r, w))
  26. 26.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Challenges • Data volume – Parsing – De-interleaving • User Interface (or lack thereof...) – grep – <your text editor here>
  27. 27.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Future Improvements • Reusable Library • Helpers for analytics – Searching / sorting • Better/more information extraction from frames • Structured log storage – Richer comparisons • Ignore set
  28. 28.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License Open Questions • True performance impact – How short can the recurring interval be? – GIL contention if too short? – Single thread to monitor all others?
  29. 29.

    © 2012 Nexedi S/A, Some rights reserved, GNU Free Documentation

    License References • Stack Overflow on sampling profiling: – http://stackoverflow.com/questions/375913 • “statprof”: – http://pypi.python.org/pypi/statprof/ • http://www.nexedi.com • http://www.erp5.com • @LeoRochael • leonardo@nexedi.com • LeoRochael@gmail.com