Scientific Software as Python Extension Modules

Course materials from Scientific Computing with Python. 2002.

David Beazley

January 21, 2002

Transcript

  1. Scientific Software as Python Extension Modules

    Software Migration, January 21, 2002
    David M. Beazley
    Department of Computer Science
    University of Chicago
    beazley@cs.uchicago.edu
    January 21, 2002
    PY-MA

  2. Introduction

    Question
    • Given an existing application, how do you get it to run inside Python?

    Thoughts
    • Rewriting the whole application is not an option.
    • Breaking the application for an extended period is not an option.
    • Probably want to evolve the code and verify correct behavior as you go.

    Personal bias
    • I think that Python should always be an optional program feature.
    • Clean separation of what a program does and how it is controlled.
    • You always want to keep your options open.
    • Maybe something better than Python will come along.
    • Or you will want to use the code in a different way.

    Alternative view
    • Build your application as a direct extension of Python using Python objects.
    • I won’t discuss that option.

  3. Focus

    The goal is not to just add a Python interpreter
    • You’re really trying to make your software more flexible.
    • Python is only a means to an end.

    Tasks
    • You want to expose the internals of your application.
    • Internal data structures, global variables, functions, constants, etc.
    • You want to use Python as a mechanism for exploring data.
    • Python as a control program.
    • Break the traditional batch processing cycle.
    • If possible, you want to clean up your software by creating modules.

    Comment
    • Almost everything I will discuss is independent of Python.
    • Would apply to other scripting environments (Perl, Tcl, Ruby, etc.).
    • Mostly relates to software engineering and software architecture.

  4. Tools and Resources

    What you need to start
    • Python
    • An extension building tool (e.g., SWIG)
    • A good book on Python.
    • Source code for your application (especially header files).
    • C/C++ compiler.
    • Some understanding of makefiles (hopefully).

    Assumptions
    • I will assume the use of SWIG in this tutorial.
    • Will also assume C/C++ code.
    • If using Fortran, there are special Fortran tools (f2py, pyfort).
    • Extensions still involve some C programming, however.

  5. General Idea

    Remove main() and expose everything that’s left
    • Python interpreter is in control
    • main() no longer used.
    • Python wrappers added to provide interface (generated by SWIG).

    [Figure: the standalone executable myprog, driven by main(), next to python loading myprogmodule.so through the wrappers]

  6. A First Extension Module

    Make sure you can compile a Python extension
    • Create an empty extension module (no functions, no variables, etc.)

        // swig: myprog.i
        %module myprog

    • Compile

        $ swig -python myprog.i
        $ cc -c -I/usr/local/include/python2.1 myprog_wrap.c
        $ cc -shared myprog_wrap.o -o myprogmodule.so

    Try it and make sure there are no errors

        $ python
        Python 2.1 (#3, Aug 20 2001, 15:41:42)
        [GCC 2.95.2 19991024 (release)] on sunos5
        >>> import myprog
        >>>

  7. Creating a Library

    Scripting language extensions are libraries
    • You need to convert your application to a library

    Suggestion
    • Change the makefile to build an archive in addition to an executable.

      Before:

        target:
                cc $(OBJS) -lm -o myprog

      After:

        target:
                cc $(OBJS) -lm -o myprog
                ar cr libmyprog.a $(OBJS)

    Comment
    • Include everything in the archive. Don’t worry about main() or other functions.

  8. Linking Test

    Link your extension module with your library
    • Easy to do in the Makefile (add a new rule).

        python:
                swig -python myprog.i
                cc -c -I/usr/local/include/python2.1 myprog_wrap.c
                cc -shared myprog_wrap.o -L. -lmyprog -o myprogmodule.so

    Try loading into Python

        >>> import myprog
        >>>

    • If you get an ImportError, you are probably missing some libraries.
    • Add extra libraries to the link line and repeat until problems go away.

    Now the tricky part...
    • Actually doing something.
    • Replicating the functionality of main().

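The load test on this slide can be scripted so it runs after every rebuild. The following is a minimal sketch in present-day Python 3 (the slides use Python 2.1). The helper name `try_import` is hypothetical; `myprog` is the module name used in the slides.

```python
import importlib


def try_import(name):
    """Try to import a module and return (ok, message).

    For a compiled extension, the ImportError text usually names the
    unresolved symbol or the shared library that could not be found.
    """
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return False, f"{name}: {exc}"
    return True, f"{name}: loaded"


if __name__ == "__main__":
    # 'myprog' is the extension module produced by the makefile rule.
    ok, message = try_import("myprog")
    print(message)
```

If the import fails, the printed message is the starting point for deciding which library to add to the link line.
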
  9. Program Initialization

    Look for initialization code
    • In main(), there is probably some code similar to this:

        #include "header.h"
        int main() {
            ...
            init_memory();
            init_io();
            ...
        }

    Expose to Python and test

        %module myprog
        %{
        #include "header.h"
        %}
        void init_memory();
        void init_io();
        ...

        >>> import myprog
        >>> myprog.init_memory()
        >>> myprog.init_io()
        ...

    • See if you can get the program to do anything without crashing.

  10. Solve a Simple Problem

    Pick a very simple computational problem
    • Something with a known result
    • Maybe a test run of some kind

    Replicate as a Python script
    • Look at the code in main()
    • Expose all of the functions needed to run the problem from Python.
    • Add to the SWIG interface file.
    • Write a short Python script that executes the same sequence of operations
    • Verify the output.

    Comments
    • This stage of development is probably the most difficult
    • Modifications to build environment.
    • You have forcefully ripped the whole control mechanism out of the code.
    • A tangled mess is left behind.

    Question
    • Now what?

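A script of the kind described on this slide might look like the sketch below (Python 3). Everything the stand-in computes is invented for illustration: only the names `init_memory`, `init_io`, and `integrate(Dt, nsteps)` come from the slides, and the `state` dictionary is a hypothetical substitute for whatever result the real application exposes.

```python
import math
from types import SimpleNamespace


def make_stand_in():
    """Build a fake 'myprog' so the sketch runs without the compiled module.

    ASSUMPTION: the behavior below is invented; a real run would import
    the SWIG-generated module instead.
    """
    state = {"initialized": set(), "time": 0.0}

    def init_memory():
        state["initialized"].add("memory")

    def init_io():
        state["initialized"].add("io")

    def integrate(dt, nsteps):
        if state["initialized"] != {"memory", "io"}:
            raise RuntimeError("integrate() called before initialization")
        state["time"] += dt * nsteps  # pretend to advance the simulation
        return nsteps

    return SimpleNamespace(init_memory=init_memory, init_io=init_io,
                           integrate=integrate, state=state)


def run_test_problem(prog, dt=0.01, nsteps=100):
    """Replay the call sequence that main() used for the test problem."""
    prog.init_memory()
    prog.init_io()
    prog.integrate(dt, nsteps)
    return prog.state["time"]


def verify(result, expected, tol=1e-9):
    """Compare against the known result from the original executable."""
    return math.isclose(result, expected, rel_tol=tol, abs_tol=tol)


if __name__ == "__main__":
    prog = make_stand_in()
    elapsed = run_test_problem(prog)
    print("PASS" if verify(elapsed, 1.0) else "FAIL")
```

Keeping the call sequence in one function makes it easy to rerun the same check after each change to the interface file or the build.
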
  11. Libraries

    Application as a library
    • The program consists of functions, global variables, constants, classes, etc.
    • You will need to think about how all of this is organized.

    Header files
    • You will probably want to create some kind of library header file
    • Describes every function and variable in your program (that you want to expose).

        /* myprog.h */
        #ifndef _HAVE_MYPROG_H
        #define _HAVE_MYPROG_H

        extern void init_memory();
        extern void init_io();
        extern int integrate(double Dt, int nsteps);
        extern double Cutoff;
        ...

        #endif

    • This step may be easy (existing headers)
    • May have to do some searching and cleanup.

  12. Building a Full Extension Module

    Library --> Extension
    • Expose all functions, variables, structures, etc. in library header to Python
    • Complexity of header will determine difficulty of this task
    • In SWIG...

        %module myprog
        %{
        /* Include header in wrapper file */
        #include "myprog.h"
        %}

        /* Parse the header to generate wrappers */
        %include "myprog.h"

    Goal
    • Create some kind of rudimentary extension module that mirrors the application

    Comments
    • May need to do extra work to make this work.
    • May want to tackle the library in small pieces.
    • Focus on functions and variables, then structures and classes.

  13. Start Using the Program

    Replicate the functionality of main()
    • Write some scripts to solve various problems.
    • Exercise various features of the scripting language interface.
    • Make sure everything is working like you expect.
    • You might even rewrite main() entirely in Python as an experiment.
    • Or to maintain backwards compatibility with old programs.

    Case study
    • In our own application, we added integrated visualization
    • Created Python wrappers for some graphics functions.
    • Wrote some small functions to plot data.
    • Experimented with interactive simulation/visualization

        >>> integrate(100)     # simulation
        >>> plot()             # visualization
        >>> rotx()             # visualization
        >>> zoom(200)          # visualization
        >>> integrate(1000)    # simulation
        >>> plot()             # visualization
        ...

  14. Usability Problems

    By using the scripted code, you will discover problems
    • Awkward function calls
    • Inaccessible data
    • Missing functionality
    • Illogical control flow.
    • Catastrophic program crashes.
    • Other annoyances

    You will also uncover useful information
    • A better sense of how different pieces of the program relate
    • Parts of the program that are independent
    • Opportunities for cleanup and improvement.

    This starts a cycle of incremental refinement
    • This depends on the application
    • I will give a few examples

  15. Module Identification

    In many applications, main() is the glue
    • Along with bits and pieces of user interface (UI) code.
    • Code elimination may reveal more structure.

    An opportunity for code reorganization

    [Figure: a big monolithic program held together by main() and scattered UI code; removing main() leaves loosely coupled modules]

  16. Library Reorganization

    Instead of having one module, create many

        src/                lib/
          io/                 libio.a
          integrate/          libintegrate.a
          grid/               libgrid.a
          crack/              libcrack.a
          force/              libforce.a
          ...                 ...

    Separate Python modules
    • Create a Python extension module for each library independently
    • Clean up header files and other source code.

    Comments
    • This process takes a lot of work (and thought).
    • Separates parts of your application into logical subcomponents
    • Changes development.
    • You no longer work on the whole application; you work on smaller components.
    • There are some complications with libraries (will cover later).

  17. Thoughts on Modules

    Programming with modules
    • Subdivision into modules makes the application seem smaller
    • Each module is only a few thousand lines of code
    • Doesn’t look or feel like a huge application anymore
    • Simplifies maintenance. Can work on individual modules.

    Case study
    • In our application, "user" code dropped from 30000 to about 2000 lines of code.
    • This was the code users typically wrote to set up and run simulations.
    • Everything else disappeared into libraries.
    • Instead of copying the whole package into their own directory, users copy a module

  18. Data Structures

    Modules force you to think about data structures
    • How modules exchange data
    • The organization of internal data structures
    • Mechanisms for accessing data
    • Extensible data structures
    • Dynamic allocation of data.
    • Parameter passing

    Example
    • Parameters vs. globals vs. objects

        /* Parameters */
        void foo(int x, int y, double a, double b, int n);

        /* Globals */
        int x, y;
        double a, b;
        void foo(int n);

        /* Objects */
        struct Foo {
            int x, y;
            double a, b;
        };
        void foo(Foo *f, int n);

  19. Data-Centric Design

    In simulation codes, almost everything centers on data
    • Grids, meshes, particles, etc.
    • We decided to make data the center of the system (as globals)
    • Collection of utility functions for accessing the data

    Modules
    • Merely perform operations on the global data (queries, transforms, etc.)
    • Python simply loads and coordinates the modules.

    [Figure: Simulation Data at the center, surrounded by Python, Visualize, Analysis, I/O, Integrate, and Force]

  20. Data-Centric Design

    Comments
    • Assumes that a user only runs one simulation at a time.
    • Greatly simplifies the interface between functions and modules.
    • Global simulation data is implicitly assumed everywhere (not passed).
    • Object-orientation used elsewhere in the system (e.g., visualization)
    • This is not the only approach

    Contrast to data-flow

    [Figure: a data-flow arrangement in which data passes between Initial Condition, Iterator, Integrate, Visualize, Boundary, and Force]

  21. Python Access to Data

    Data Wrapping
    • C/C++ data structures can be hidden behind Python objects
    • Objects can use operator overloading to mimic Python lists, dictionaries, etc.

    Operators
    • Almost all standard mathematical operators can be overloaded

    Tools
    • SWIG makes this very easy: simplified implementation of accessors.

        double *ary;
        ...

        class dArray:
            def __init__(self,ptr):
                self.ptr = ptr
            def __getitem__(self,index):
                ....
            def __setitem__(self,index,value):
                ....
            ...

        >>> a = dArray(ary)
        >>> print a[3]
        3.887233
        >>> a[3] = 2.5
        >>>

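The dArray outline on this slide leaves the accessor bodies elided. The sketch below fills them in for illustration only: it uses the standard library's `ctypes` in place of SWIG-generated accessors, which is a substitution and not the method from the slides. It is written for Python 3, and the `size` argument and the `storage` array are assumptions added so the example is self-contained.

```python
import ctypes


class dArray:
    """Proxy that exposes a C double array with Python list syntax."""

    def __init__(self, ptr, size):
        self.ptr = ptr      # ctypes pointer to the first element
        self.size = size    # element count; a bare C pointer carries no length

    def __len__(self):
        return self.size

    def _check(self, index):
        # Bounds check: indexing a raw pointer out of range is undefined.
        if not 0 <= index < self.size:
            raise IndexError("dArray index out of range")

    def __getitem__(self, index):
        self._check(index)
        return self.ptr[index]

    def __setitem__(self, index, value):
        self._check(index)
        self.ptr[index] = float(value)


# Stand-in for memory owned by the application ('double *ary' on the slide).
storage = (ctypes.c_double * 5)(0.0, 1.5, 3.0, 3.887233, 6.0)
ary = ctypes.cast(storage, ctypes.POINTER(ctypes.c_double))

a = dArray(ary, len(storage))
print(a[3])         # 3.887233
a[3] = 2.5          # writes through to the underlying C memory
print(storage[3])   # 2.5
```

Raising IndexError from `__getitem__` also lets `for x in a` and `list(a)` stop at the end of the array.
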
  22. Python Access to Data

    Access vs. marshalling
    • Primary goal is to provide access to underlying data structures
    • Usually do not want to convert data into Python objects (marshalling).
    • Example: convert a C/C++ array into a Python list.
    • Instead, you provide a proxy object that refers to C/C++ data.

    Discussion
    • Should you use native Python objects in the simulation?
    • Example: Numeric Python arrays.
    • I think it’s a bad idea if you care about portability and long-term development.
    • Not everyone agrees with me on this point.

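The difference between access and marshalling shows up as soon as the application changes its data. In this Python 3 sketch, a `ctypes` array stands in for memory owned by the C/C++ code; that stand-in and the variable names are assumptions for illustration.

```python
import ctypes

# Stand-in for an array owned by the C/C++ application.
c_data = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# Marshalling: copy every element into a new Python list.
marshalled = list(c_data)

# Access: a pointer to the same memory, with no copy made.
proxy = ctypes.cast(c_data, ctypes.POINTER(ctypes.c_double))

# The application updates its data in place.
c_data[0] = 99.0

print(marshalled[0])  # 1.0, the copy is now stale
print(proxy[0])       # 99.0, the proxy sees the update
```

The copy also costs time and memory proportional to the array size, which matters for large simulation data sets.
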
  23. Summary

    Existing application turned into a library
    • Elimination of old user interface.
    • Identification of useful functionality

    Incremental refinement
    • Identification of modules
    • Source cleanup.
    • Changes to the API.
    • Changes to data structures (if necessary).
    • This is a gradual process

    Role of Python
    • Python allows application to be used interactively.
    • Can explore data structures, execute functions, etc.
    • This kind of exploration can identify problems in the design
    • If it’s hard to use from Python, then maybe there is a better way.
    • Interactivity allows you to experiment with module interfaces
    • Interaction between modules

  24. Case Study

    SPaSM code
    • Initial scripting language interface took a few days.
    • Incremental refinement for more than a year afterwards
    • Data structure cleanup.
    • Modules
    • Error handling
    • User interface problems. Changed some functions around.
    • Changes were always motivated by the use of the code from Python

    Overview
    • System broken into about 8 different modules
    • Each module built as a separate Python extension.
    • SWIG interface for each module (incorporated into build process).
    • Modules maintained separately (split the source code).

    Discussion