Designing Reliable Python Extensions (and Debugging)

Course materials from Scientific Computing with Python. 2002.

David Beazley

January 21, 2002

Transcript

  1. Libraries, Shared Libraries, and Dynamic Loading, January 21, 2002 1

    Designing Reliable Python Extensions (and Debugging)

    David M. Beazley
    Department of Computer Science
    University of Chicago

    January 21, 2002

    PY-MA
  2. Introduction

    Python environment is more complicated
    • Scripts
    • Shared libraries
    • Extension modules

    Applications are more flexible
    • Interactive interpreter
    • Can call any function at any time in any order
    • Full access to underlying data
    • Dynamic problem reconfiguration.
    • This is exactly what you want!

    Most scientific programs are not this flexible
    • Written as batch processing jobs
    • Very predictable control flow.
    • Few failure modes.
    • Easy to debug (maybe).
  3. Reliability Concerns

    Frequent crashes
    • When a program is first converted, it may crash a lot
    • Segmentation faults, failed assertions.

    Wrong answers
    • How do you know a program gave the correct result?
    • Missing parameters.
    • Use of incompatible procedures (improper integrator, etc.)

    This is a serious issue
    • Often ignored in the scientific scripting literature
  4. Catastrophic Failure Modes

    Uninitialized data
    • User invoked function before data was initialized.
    • Example: calculate forces before setting up initial condition.

    Bad parameters
    • Function called with invalid parameter value
    • Example: bad filename, negative number, NULL pointer.

    Non-reentrant functions
    • User called the same function twice--program crashes.
    • Example: function assumes uninitialized state on entry.

    Non-graceful error handling
    • Program throws an assertion or calls exit()
    • Python interpreter just dies. No error message.

    Normally these errors are hidden in batch jobs
    • Very predictable control-flow
    • Internals not exposed to user.
  5. Incorrect Results

    Uninitialized data
    • Missing parameter specification.
    • Missing initialization function.

    Bad parameters
    • Function given a physically meaningless parameter.
    • Domain error.

    Data manipulation
    • User modified data directly from interpreter.

    Calling functions in wrong order
    • Incorrect sequence of operations in an integration loop.
  6. Strategy

    Defensive programming
    • If it might break, assume that it will break.
    • Always assume that user will give bad parameters.
    • Always assume that functions might return a bad result.

    Error handling
    • Design a uniform mechanism for returning errors to Python.
    • Don’t just call exit().
    • Use SWIG to help with error handling (some of this can be automated).

    Library design
    • Build error handling into libraries
    • Even if you have to modify the original code.
    • The results pay off--even when Python is not used.
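One possible shape for the "uniform mechanism" above is a single library-wide error slot that every entry point writes to instead of calling exit(). This is only a sketch: the names lib_set_error, lib_last_error, lib_clear_error, set_timestep and get_timestep are hypothetical and do not come from the slides.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical library-wide error state: one flag and one message buffer. */
static char lib_errmsg[256];
static int  lib_errflag = 0;

/* Record a formatted error message instead of calling exit(). */
void lib_set_error(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(lib_errmsg, sizeof lib_errmsg, fmt, ap);
    va_end(ap);
    lib_errflag = 1;
}

/* Returns the pending message, or NULL if no error is pending. */
const char *lib_last_error(void) {
    return lib_errflag ? lib_errmsg : NULL;
}

void lib_clear_error(void) { lib_errflag = 0; }

static double timestep = 0.0;

/* Every entry point uses the same convention: 0 on success, -1 on failure. */
int set_timestep(double dt) {
    if (!(dt > 0.0)) {   /* written this way so that NaN is rejected too */
        lib_set_error("set_timestep: dt must be positive (got %g)", dt);
        return -1;
    }
    timestep = dt;
    return 0;
}

double get_timestep(void) { return timestep; }
```

A wrapper layer can then test the return value after each call and, on -1, pass lib_last_error() to Python (for example with PyErr_SetString(PyExc_RuntimeError, ...)) before clearing the slot. With SWIG, that check is the kind of step that could be generated once for all wrapped functions rather than written by hand.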
  7. Programming with Assertions

    An error-prone function:

        double sqrt(double x) {
            double r;
            /* Compute sqrt */
            ...
            return r;
        }

    With assertions:

        double sqrt(double x) {
            double r;
            if (x < 0) {
                error("sqrt: domain error");
            }
            /* Compute sqrt */
            ...
            if (r < 0) {
                error("sqrt: implementation error!");
            }
            return r;
        }
  8. Assertions

    Pre-assertions
    • Checking of function input values.
    • Makes sure function operates on acceptable input values.

    Post-assertions
    • Checking of function output values
    • Makes sure function is operating correctly.

    Important technique for writing reliable libraries
    • Borrowed from "Design by contract"
  9. Assertions

    Range/value checking
    • Make sure a variable has a proper value.
    • Shown in the sqrt() example.

    Synchronization assertion
    • Make sure function A executes before function B.
    • Implement using some kind of synchronization variable

        int Async = 0;
        void A() {
            ...
            Async = 1;
        }
        void B() {
            if (!Async) error("Must call A() first");
            ...
        }

    • Can make this arbitrarily complex
    • Technique also works for non-reentrant functions.
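The last bullet can be sketched with the same kind of synchronization variable: a setup routine that is not safe to run twice refuses the second call instead of corrupting its state. The function names and the error() hook below are assumptions made for illustration; here error() records a message rather than exiting.

```c
#include <stddef.h>

/* Hypothetical error hook: remembers the message instead of exiting. */
static const char *last_error = NULL;
static void error(const char *msg) { last_error = msg; }

const char *get_last_error(void) { return last_error; }

static int initialized = 0;   /* synchronization variable */

/* Non-reentrant setup: a second call is rejected, not executed. */
int init_simulation(void) {
    if (initialized) {
        error("init_simulation: already initialized; call shutdown_simulation() first");
        return -1;
    }
    /* ... allocate and set up simulation state ... */
    initialized = 1;
    return 0;
}

/* Releases the state so that init_simulation() may be called again. */
int shutdown_simulation(void) {
    if (!initialized) {
        error("shutdown_simulation: nothing to shut down");
        return -1;
    }
    /* ... release simulation state ... */
    initialized = 0;
    return 0;
}
```

From the interpreter, calling init_simulation() twice in a row then yields an error message on the second call, and the sequence init, shutdown, init succeeds.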
  10. Checking Return Codes

    Always check function return codes

        char *ptr = malloc(8192);
        if (!ptr) error("malloc failed!");

        FILE *f = fopen(filename,"r");
        if (!f) error("Can't open file");

    An amazing source of errors
    • Spend 10 hours debugging only to find that you didn’t check a return code.
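Checking a return code is only half of the job; whatever was acquired before the failure also has to be released. A minimal sketch, assuming a hypothetical helper read_block that reports failure through its return value and an output message:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to `size` bytes of `filename` into a newly
   allocated buffer. Returns the number of bytes read, or -1 on failure with
   *errmsg set. On failure *buf is left untouched. */
long read_block(const char *filename, size_t size, char **buf, const char **errmsg) {
    if (size == 0) {
        *errmsg = "read_block: size must be nonzero";
        return -1;
    }

    FILE *f = fopen(filename, "rb");
    if (!f) {
        *errmsg = "read_block: can't open file";
        return -1;
    }

    char *p = malloc(size);
    if (!p) {
        fclose(f);                 /* release what we already hold */
        *errmsg = "read_block: malloc failed";
        return -1;
    }

    size_t n = fread(p, 1, size, f);
    if (ferror(f)) {
        free(p);
        fclose(f);
        *errmsg = "read_block: read error";
        return -1;
    }

    fclose(f);
    *buf = p;                      /* caller now owns the buffer */
    return (long)n;
}
```

The caller frees the buffer on success. On any failure path nothing is leaked and the interpreter gets a message it can report instead of a crash later on.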
  11. An Exit Strategy

    Always provide a bailout mechanism
    • Error return codes
    • C++ exceptions
    • longjmp/setjmp

    Don’t want
    • Program to keep running in an error state
    • Program to exit with no error message.

    Call chain: foo() → bar() → spam() → error

        >>> foo()
        RuntimeError: spam: domain error
        >>>
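The first bailout option, error return codes, can be sketched for the foo() → bar() → spam() chain shown above. The function bodies and the foo_error() accessor are invented for illustration; the point is only that each level stops and passes the failure upward.

```c
#include <stddef.h>

static const char *errmsg = NULL;   /* set by whichever function fails */

static int spam(double x) {
    if (x < 0) {
        errmsg = "spam: domain error";
        return -1;
    }
    /* ... real work ... */
    return 0;
}

static int bar(double x) {
    if (spam(x) < 0) return -1;   /* do not keep running in an error state */
    /* ... rest of bar ... */
    return 0;
}

/* Entry point exposed to Python: 0 on success, -1 on failure. */
int foo(double x) {
    errmsg = NULL;
    if (bar(x) < 0) return -1;
    /* ... rest of foo ... */
    return 0;
}

const char *foo_error(void) { return errmsg; }
```

A wrapper that sees -1 from foo() can raise RuntimeError with the text from foo_error(), which gives the session shown on the slide: the interpreter survives and the user sees where the failure originated.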
  12. Designing for Reconfiguration

    Initialization functions
    • In batch jobs, usually only called once.
    • In scripting, may be called many times.

    Dynamic program reconfiguration
    • User may want to change important problem parameters
    • Grid size.
    • Data distribution across processors.
    • Problem geometry.

    Techniques
    • Make sure an application can clean up previous simulation data
    • Plan for dynamic memory management.
    • Do not rely upon fixed sized arrays.
    • Be careful with memory.
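The techniques above can be sketched as an initialization function that is safe to call any number of times: it allocates the new grid dynamically, and only then releases the previous one. The Grid structure and the grid_* names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical simulation state: a heap-allocated nx-by-ny grid. */
typedef struct {
    int     nx, ny;
    double *data;
} Grid;

static Grid grid = {0, 0, NULL};

/* Safe to call repeatedly. Returns 0 on success, -1 on failure;
   on failure the previous grid is left intact. */
int grid_init(int nx, int ny) {
    if (nx <= 0 || ny <= 0) return -1;           /* bad parameters */

    double *p = calloc((size_t)nx * (size_t)ny, sizeof *p);
    if (!p) return -1;                           /* out of memory */

    free(grid.data);                             /* free(NULL) is a no-op */
    grid.data = p;
    grid.nx = nx;
    grid.ny = ny;
    return 0;
}

/* Releases the grid; also safe to call more than once. */
void grid_free(void) {
    free(grid.data);
    grid.data = NULL;
    grid.nx = grid.ny = 0;
}

int grid_size(void) { return grid.nx * grid.ny; }
```

Allocating before freeing means a failed reconfiguration leaves the user with the old, still valid grid rather than no grid at all.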
  13. Use Common Sense

    Add error checking where it makes sense
    • Functions likely to be invoked directly by user
    • Utility functions.
    • I/O and system related functions.

    Not in the inner loop
    • Low-level math operations
    • Functions used in performance critical operations.
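One way to follow both halves of this advice is to split a routine into a checked entry point and an unchecked kernel. The sketch below uses a dot product as a stand-in; the names dot and dot_unchecked are made up.

```c
#include <stddef.h>

/* Inner kernel: no checks, intended to be called from hot loops. */
static double dot_unchecked(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* User-facing entry point: validate once, then call the fast kernel.
   Returns 0 on success, -1 if any pointer is NULL. */
int dot(const double *a, const double *b, size_t n, double *out) {
    if (!a || !b || !out) return -1;
    *out = dot_unchecked(a, b, n);
    return 0;
}
```

Only dot() would be wrapped for Python; code inside the library that has already validated its arrays calls dot_unchecked() directly and pays nothing for the checks.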
  14. Case Study

    Adding reliability features
    • This was the best modification we made to our application
    • Cleaned up the design.
    • Improved stability.
    • More confidence in our results.
    • A more user friendly environment.

    Our Implementation
    • Used setjmp/longjmp to add C exception handling
    • Merged with Python exception handling using SWIG.
    • Even added line numbers

        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        RuntimeError: malloc(10000000) failed (memory.c, line 52)
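The slides do not show the implementation itself. The following is a minimal sketch of how setjmp/longjmp can provide C-level exceptions that carry a file name and line number; the THROW macro and all function names are assumptions, not the authors' code.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf exc_env;        /* where to resume when an error is thrown */
static char    exc_msg[256];   /* message including source location */

/* Record the message plus file and line, then unwind to the setjmp point.
   Only valid while call_compute() below is active. */
#define THROW(text) do { \
        snprintf(exc_msg, sizeof exc_msg, "%s (%s, line %d)", \
                 (text), __FILE__, __LINE__); \
        longjmp(exc_env, 1); \
    } while (0)

static void *checked_malloc(size_t n) {
    void *p = malloc(n);
    if (!p) THROW("malloc failed");
    return p;
}

static void compute(int n) {
    if (n < 0) THROW("compute: negative size");
    double *work = checked_malloc(((size_t)n + 1) * sizeof *work);
    /* ... use work ... */
    free(work);
}

/* Wrapper-level entry point: 0 on success, -1 with exc_msg set on error. */
int call_compute(int n) {
    if (setjmp(exc_env) != 0)
        return -1;             /* arrived here via longjmp */
    compute(n);
    return 0;
}

const char *exc_message(void) { return exc_msg; }
```

The wrapper generated for Python would call call_compute() and, on -1, raise RuntimeError with exc_message(), producing text of the same form as the traceback above. Note that longjmp skips any cleanup between the throw and the setjmp point, so memory allocated along the way must be tracked separately if it is to be freed.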
  15. Panic!

    When all else fails:

        $ python myscript.py
        Segmentation Fault (core dumped)
        $

    Traditional debugging is difficult
    • What is the application being debugged?
    • A script?
    • An extension module?
    • Python?

    Usual approach
    • Run Python under the debugger.

        $ gdb python
        (gdb) run myscript.py
        ...

    • Not an ideal solution. Debugging with gdb is difficult.
    • Gives no information about script code.
  16. Mixed Language Debugging

    An open area of systems research
    • A little work in Java-C++ debugging.
    • Not much with scripting environments.

    WAD (Wrapped Application Debugger)
    • A highly experimental system I am working on
    • A debugger encapsulated in a shared library
    • Linked to extension modules

        $ cc -shared $(OBJS) -o foomodule.so -lwadpy

    How it works
    • Fatal process errors converted into recoverable Python exceptions
    • Segfaults, Bus errors, assertions, etc.
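The general idea of turning a fatal signal into a recoverable error can be illustrated with POSIX sigaction and sigsetjmp/siglongjmp. This is not WAD's implementation, which the slides do not show; it is a bare sketch with hypothetical names, and it uses raise(SIGSEGV) as a stand-in for a real fault such as a NULL pointer write.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_env;
static volatile sig_atomic_t caught_signal = 0;

/* Signal handler: remember the signal and jump back into guarded_call(). */
static void on_fatal(int sig) {
    caught_signal = sig;
    siglongjmp(recover_env, 1);
}

/* Runs fn(). Returns 0 if it finished normally, or the signal number
   if it was interrupted by SIGSEGV or SIGBUS. */
int guarded_call(void (*fn)(void)) {
    struct sigaction sa, old_segv, old_bus;
    int result = 0;

    sa.sa_handler = on_fatal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS,  &sa, &old_bus);

    if (sigsetjmp(recover_env, 1) == 0) {
        fn();                          /* may fault */
    } else {
        result = caught_signal;        /* arrived here from the handler */
    }

    sigaction(SIGSEGV, &old_segv, NULL);   /* restore previous handlers */
    sigaction(SIGBUS,  &old_bus,  NULL);
    return result;
}

/* Demo functions: one well behaved, one simulating a crash. */
void demo_ok(void) { }
void demo_crash(void) { raise(SIGSEGV); }
```

A wrapper could call guarded_call() around each extension function and raise a Python exception when the result is nonzero. After a real fault the library's internal state is unknown, so this is best treated as a way to report the error and shut down cleanly, not as a guarantee that execution can safely continue.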
  17. Mixed Language Debugging

    Fatal Extension Error with WAD

        $ python myscript.py
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
          File "myscript.py", line 16, in ?
            foo()
          File "myscript.py", line 7, in spam
            doh.doh(a,b,c)
        SegFault: [ C stack trace ]

        #2 0x00027774 in call_builtin(func=0x1c74f0,arg=0x1a1ccc,kw=0x0)
        #1 0xff022f7c in _wrap_doh(0x0,0x1a1ccc,0x160ef4,0x9c,0x56b44,0x1aa3d8)
        #0 0xfe7e0568 in doh(a=0x3,b=0x4,c=0x0) in 'foo.c', line 28

        /u0/beazley/Projects/WAD/Python/foo.c, line 28

            int doh(int a, int b, int *c) {
         =>   *c = a + b;
              return *c;
            }

    Caveats
    • Highly experimental
    • Not portable (i386-Linux and SPARC Solaris only).
    • Maybe in the future...
  18. Summary

    Plan for failure
    • Your application is going to crash
    • Take a defensive approach to programming.

    Adding reliability features is a big win
    • It improves your code even if you don’t use Python
    • Results in extremely stable modules
    • Gives confidence in results.

    Debugging
    • This is the biggest weakness of the scripting approach.
    • Traditional debuggers work.
    • Some research in mixed-language debugging. However, it’s immature.
  19. The End?

    Topics not covered
    • Numeric Python
    • Plotting packages
    • SciPy (www.scipy.org)
    • Working with Fortran

    Resources
    • www.python.org (Python)
    • www.vex.net/parnassus (Vaults of Parnassus)
    • www.swig.org (SWIG)

    Tips
    • Many scientific papers and tutorials in the International Python Conference.
    • All available online.
    • IEEE Computing in Science & Engineering, Computers in Physics.

    Open discussion