Creating Scriptable Scientific Software
David M. Beazley
Department of Computer Science
The University of Chicago
[email protected]
April 22, 2002
PY-MA
scientific programs are being transformed
Motivation: Scripting languages offer many benefits
• Interpreted and interactive user environment.
• Extensible with compiled C/C++/Fortran code.
• Rapid prototyping.
• Debugging.
• Systems integration and components.
• Can make applications look a lot like MATLAB, IDL, etc.
[Figure: left, a C/C++ application used for non-interactive batch processing; right, the same C/C++ application as an extension of a scripting language (Python, Perl, Tcl, etc.)]
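wrapper
#include "Python.h"

extern int gcd(int, int);

/* Python-callable wrapper around the C function gcd() */
PyObject *wrap_gcd(PyObject *self, PyObject *args) {
    int x, y;
    int result;
    /* Convert the Python argument tuple into two C ints */
    if (!PyArg_ParseTuple(args, "ii", &x, &y)) {
        return NULL;                    /* Python exception already set */
    }
    result = gcd(x, y);                 /* Call the underlying C function */
    return PyInt_FromLong(result);      /* Convert the C result back to Python */
}
• Data conversion from C <---> Python.
• You would write a wrapper for each part of your program.
• Ex: 300 C functions ==> 300 wrapper functions.
• C++ classes, structures, templates, etc. are more complicated.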
wants to write extension code
• Highly repetitive.
• Prone to error.
• Difficult for complicated programs.
Other issues
• Scientific programs are characterized by rapid change.
• Functions change, variables change, objects change.
• Piecemeal development.
• Would require continual maintenance of the wrappers.
• Complicates development.
• Makes scripting languages impractical to use in the early stages of a project.
• A C/C++ compiler for generating wrappers to existing code.
• Freely available and in development since 1995.
• Currently targets Python, Perl, Tcl, Ruby, Java, PHP, Guile, and MzScheme.
Source translation
• C++ header files are parsed to generate wrappers.
Goals
• Make it extremely easy for users (scientists) to build wrappers.
• Allow the scripting interface to automatically track changes in the underlying source.
• Make the wrapping process as invisible as possible.
[Figure: .h header files -> swig -> wrapper code (C/C++) -> Perl, Python, Tcl, Ruby, ...]
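To make "source translation" concrete, here is a toy sketch in Python. It is not SWIG, and it is far simpler than a real C++ parser: it reads simple prototypes such as `extern int gcd(int x, int y);` from header text and emits a wrapper in the same style as the hand-written `wrap_gcd`. The helper names `parse_prototypes` and `emit_wrapper`, and the restriction to `int` and `double` arguments, are assumptions made for illustration.

```python
import re

# Hypothetical tables: C type -> PyArg_ParseTuple format code, and
# C return type -> Python 2 era C API call that builds the result object.
FORMAT_CODES = {"int": "i", "double": "d"}
RESULT_BUILDERS = {"int": "PyInt_FromLong", "double": "PyFloat_FromDouble"}

# Matches simple declarations such as "extern int gcd(int x, int y);"
PROTOTYPE = re.compile(r"(?:extern\s+)?(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def parse_prototypes(header_text):
    """Return (return_type, name, [arg_types]) for each simple prototype."""
    protos = []
    for ret, name, args in PROTOTYPE.findall(header_text):
        arg_types = []
        for arg in args.split(","):
            arg = arg.strip()
            if arg and arg != "void":
                arg_types.append(arg.split()[0])  # keep the type, drop the name
        protos.append((ret, name, arg_types))
    return protos


def emit_wrapper(ret, name, arg_types):
    """Generate C source for one wrapper in the style of wrap_gcd."""
    names = [f"arg{i}" for i in range(len(arg_types))]
    decls = "".join(f"    {t} {n};\n" for t, n in zip(arg_types, names))
    fmt = "".join(FORMAT_CODES[t] for t in arg_types)
    refs = "".join(f", &{n}" for n in names)
    return (
        f"PyObject *wrap_{name}(PyObject *self, PyObject *args) {{\n"
        f"{decls}"
        f"    {ret} result;\n"
        f"    if (!PyArg_ParseTuple(args, \"{fmt}\"{refs})) {{\n"
        f"        return NULL;\n"
        f"    }}\n"
        f"    result = {name}({', '.join(names)});\n"
        f"    return {RESULT_BUILDERS[ret]}(result);\n"
        f"}}\n"
    )


header = """
extern int gcd(int x, int y);
extern int fact(int n);
"""

# Regenerating after the header changes keeps the wrappers in sync.
for proto in parse_prototypes(header):
    print(emit_wrapper(*proto))
```

Because the wrappers are derived mechanically from the declarations, rerunning the generator after a header changes is all it takes to keep the scripting interface current, which is the maintenance problem described for hand-written wrappers.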
• Header file parsing.
• Special SWIG directives.
Supported C++ features
• Functions, variables, constants.
• Classes.
• Inheritance and multiple inheritance.
• Pointers, references, arrays, member pointers.
• Overloading (with renaming).
• Operators.
• Namespaces.
• Templates.
• Preprocessing.
Not supported
• Nested classes, member templates, template partial specialization.
Will show a few examples
• Not complete coverage of SWIG.
mixed with special SWIG directives.
// example.i : Sample SWIG input file
%module example
%{
#include "example.h"
%}

// Resolve a name clash with Python
%rename(cprint) print;

// C/C++ declarations
extern int gcd(int x, int y);
extern int fact(int n);
...
%include "example.h"    // Parse a header file
...
and linking
% swig -python example.i
% cc -c -I/usr/local/include/python2.1 example_wrap.c
% cc -shared example_wrap.o $(OBJS) -o examplemodule.so
Use
% python
Python 2.1 (#3, Aug 20 2001, 15:41:42)
[GCC 2.95.2 19991024 (release)] on sunos5
>>> import example
>>> example.gcd(12,16)
4
>>>
Comments:
• Modules built as shared libraries/DLLs.
• Dynamic loading used to import into interpreter.
• Contents of the module mirror the C declarations.
User only works with the input file (C++) and scripts
• Details of wrappers hidden.
• Wrappers not modified by user; only used to compile the DLL.
C++ (input):
class Complex {
public:
    Complex(double r = 0, double i = 0);
    Complex operator+(const Complex &);
    double re();
    ...
};
Python script:
>>> a = Complex(2,3)
>>> b = Complex(4,5)
>>> c = a + b
>>> c.re()
6.0
>>>
[Figure: C++ input -> swig -> extension module (DLL) -> used from a Python script]
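One way such an interface can be built is in two layers: flat, C-style functions exported by the extension module, and a Python proxy class that forwards method calls and operators to them. The sketch below models that idea in pure Python. The low-level names (`new_Complex`, `Complex_add`, `Complex_re`) and the dictionary used as a stand-in for C++ objects are hypothetical, chosen for illustration rather than copied from SWIG's generated code.

```python
# Stand-in for C++-side storage: handle -> [real, imag].
# In a real extension these would be Complex objects behind C pointers.
_heap = {}
_next_handle = 0


def new_Complex(r=0.0, i=0.0):
    """Hypothetical low-level constructor: returns an opaque handle."""
    global _next_handle
    _next_handle += 1
    _heap[_next_handle] = [float(r), float(i)]
    return _next_handle


def Complex_add(a, b):
    """Hypothetical low-level wrapper for Complex::operator+."""
    ar, ai = _heap[a]
    br, bi = _heap[b]
    return new_Complex(ar + br, ai + bi)


def Complex_re(h):
    """Hypothetical low-level wrapper for Complex::re()."""
    return _heap[h][0]


class Complex:
    """Proxy class: forwards Python operations to the low-level functions."""

    def __init__(self, r=0.0, i=0.0, _handle=None):
        # Either wrap an existing handle or construct a new object
        self.this = _handle if _handle is not None else new_Complex(r, i)

    def __add__(self, other):
        # Python's '+' is mapped onto the wrapped C++ operator+
        return Complex(_handle=Complex_add(self.this, other.this))

    def re(self):
        return Complex_re(self.this)


a = Complex(2, 3)
b = Complex(4, 5)
c = a + b
print(c.re())   # 6.0
```

The proxy layer is what lets the script use `a + b` instead of calling `Complex_add` directly, while the low-level functions stay a straightforward one-to-one mapping of the C++ declarations.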
bad interface definition language
• Type system complexity:
    typedef int (*PFIA[20])(int, double *x);
    double foo(PFIA *const x);
• Ambiguity in data conversion (pointers, arrays, output values, etc.):
    double bar(double *x, double *y, double *r);
• Structures, classes, unions.
• Templates, namespaces, overloading, operators, etc.
SWIG solution
• Declaration annotation.
• Pattern-based type conversion.
• Will provide a brief tour of internals.
• Must convert data between scripting and C representations.
Example:
In Python
>>> gcd(12,16)
4
>>> count("Hello",5,"e")
1
In C
int gcd(int x, int y);                    /* integers */
int count(char *buf, int len, char c);    /* string, integer, single character */
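The conversions in this example can be modeled in a few lines of Python: each C parameter type names a check-and-convert rule, the wrapper applies those rules to the arguments before calling the function, and the integer result comes back as a Python int. This is a sketch, not SWIG's generated code. The `CONVERTERS` table and `call_c` helper are hypothetical, and the two C functions are reimplemented in Python, assuming `count` counts occurrences of `c` in the first `len` characters of `buf`.

```python
def to_int(value):
    # C int: accept Python integers only (bool is excluded on purpose)
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError("expected an integer")
    return value


def to_string(value):
    # char *: accept a Python string
    if not isinstance(value, str):
        raise TypeError("expected a string")
    return value


def to_char(value):
    # char: accept a string of exactly one character
    if not isinstance(value, str) or len(value) != 1:
        raise TypeError("expected a single character")
    return value


# Hypothetical table: C parameter type -> converter
CONVERTERS = {"int": to_int, "char *": to_string, "char": to_char}


def call_c(func, param_types, *args):
    """Convert each argument according to its C type, then call func."""
    if len(args) != len(param_types):
        raise TypeError(f"expected {len(param_types)} arguments, got {len(args)}")
    c_args = [CONVERTERS[t](a) for t, a in zip(param_types, args)]
    return func(*c_args)


# Python stand-ins for the C functions
def c_gcd(x, y):
    while y:                 # Euclid's algorithm
        x, y = y, x % y
    return x


def c_count(buf, length, c):
    return buf[:length].count(c)


# What the generated wrappers would look like from the script side
def gcd(*args):
    return call_c(c_gcd, ["int", "int"], *args)


def count(*args):
    return call_c(c_count, ["char *", "int", "char"], *args)


print(gcd(12, 16))             # 4
print(count("Hello", 5, "e"))  # 1
```

Calling `count("Hello", 5, "ee")` raises `TypeError`, because the last parameter is declared as a single `char`.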
matching integrated with C++ type system
%typemap(in) int { ... }

typedef int Integer;
...
Integer gcd(Integer x, Integer y);

namespace std {
    class string;
    %typemap(in) string * { ... };
}
namespace S = std;
using std::string;
...
void foo(string *a, S::string *b);
Comments:
• All type conversion in SWIG is pattern based.
• The int typemap also applies to Integer (through the typedef), and the std::string * typemap applies to both string * and S::string *.
• Type conversion by naming convention.
• Mostly hidden from users.
• Allows advanced customization.
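One way to picture pattern-based matching: look up a typemap for the type exactly as written, and if none exists, reduce the type through typedefs, `using` declarations, and namespace aliases, then try again. The Python sketch below is a simplified model of that idea, assuming a single typemap method (`"in"`) and hand-filled tables that mirror the declarations on this slide. It is not SWIG's actual matching algorithm, which also handles qualifiers, arrays, and argument names.

```python
# Tables mirroring the declarations on this slide (hypothetical layout)
TYPEDEFS = {"Integer": "int"}           # typedef int Integer;
USING = {"string": "std::string"}       # using std::string;
NAMESPACE_ALIASES = {"S": "std"}        # namespace S = std;

TYPEMAPS = {
    ("in", "int"): "convert Python int -> C int",
    ("in", "std::string *"): "convert Python str -> std::string *",
}


def reduce_name(base):
    """Apply one reduction step to a base type name (no pointer suffix)."""
    if base in TYPEDEFS:
        return TYPEDEFS[base]
    if base in USING:
        return USING[base]
    if "::" in base:
        ns, rest = base.split("::", 1)
        return NAMESPACE_ALIASES.get(ns, ns) + "::" + rest
    return base


def candidates(ctype):
    """Yield the type as written, then each successively reduced form."""
    base = ctype.rstrip(" *")
    suffix = ctype[len(base):]          # e.g. " *" for a pointer, "" otherwise
    seen = set()
    while base not in seen:             # stop once reduction makes no progress
        seen.add(base)
        yield base + suffix
        base = reduce_name(base)


def lookup(method, ctype):
    """Return (matched_type, typemap_code), or None if nothing matches."""
    for t in candidates(ctype):
        if (method, t) in TYPEMAPS:
            return t, TYPEMAPS[(method, t)]
    return None


print(lookup("in", "Integer"))      # ('int', 'convert Python int -> C int')
print(lookup("in", "S::string *"))  # ('std::string *', 'convert Python str -> std::string *')
```

The design point is that typemaps are written once against a canonical type, and every spelling of that type the header happens to use finds the same rule.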
of Numeric Python array to C
%typemap(in) (double *mat, int nx, int ny) {
    PyArrayObject *array;
    if (!PyArray_Check($input)) {
        PyErr_SetString(PyExc_TypeError, "Expected an array");
        return NULL;
    }
    array = (PyArrayObject *) PyArray_ContiguousFromObject($input, PyArray_DOUBLE, 2, 2);
    if (!array) {
        PyErr_SetString(PyExc_ValueError,
                        "array must be two-dimensional and of type float");
        return NULL;
    }
    $1 = (double *) array->data;      /* Assign grid */
    $2 = array->dimensions[0];        /* Assign nx */
    $3 = array->dimensions[1];        /* Assign ny */
}
...
double determinant(double *mat, int nx, int ny);
Key point
• SWIG can be customized to handle new datatypes.
• Customized data marshalling.
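To see what this typemap hands to the C function, here is a pure-Python sketch of both halves: a converter that checks a nested-list "array", rejects anything that is not two-dimensional, and flattens it into the row-major `(mat, nx, ny)` triple, followed by a `determinant` that works on that flat buffer. Using nested lists instead of Numeric arrays, and computing the determinant by Gaussian elimination with partial pivoting, are assumptions; the slide does not say how `determinant` is implemented.

```python
def as_matrix(obj):
    """Mimic the typemap: validate the input and return (mat, nx, ny)."""
    if not isinstance(obj, (list, tuple)) or not obj:
        raise TypeError("Expected an array")
    if not all(isinstance(row, (list, tuple)) for row in obj):
        raise ValueError("array must be two-dimensional and of type float")
    nx, ny = len(obj), len(obj[0])
    if any(len(row) != ny for row in obj):
        raise ValueError("array must be two-dimensional and of type float")
    mat = [float(v) for row in obj for v in row]   # contiguous, row-major
    return mat, nx, ny


def determinant(mat, nx, ny):
    """Determinant of a square row-major matrix stored in a flat list."""
    if nx != ny:
        raise ValueError("matrix must be square")
    n = nx
    a = list(mat)                 # work on a copy, like a scratch buffer
    det = 1.0
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(a[r * n + col]))
        if a[pivot * n + col] == 0.0:
            return 0.0            # singular matrix
        if pivot != col:
            for k in range(n):
                a[col * n + k], a[pivot * n + k] = a[pivot * n + k], a[col * n + k]
            det = -det            # a row swap flips the sign
        det *= a[col * n + col]
        # Eliminate entries below the pivot
        for r in range(col + 1, n):
            factor = a[r * n + col] / a[col * n + col]
            for k in range(col, n):
                a[r * n + k] -= factor * a[col * n + k]
    return det


print(determinant(*as_matrix([[1, 2], [3, 4]])))
```

The printed value is approximately -2.0. Keeping the validation in the typemap means `determinant` itself can assume a well-formed, contiguous buffer of doubles.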
• Existing C/C++ header files used to build wrappers.
• Process guided by some special SWIG directives.
• Most details hidden from user.
• Can customize output using typemaps and other features.
[Figure: the scientific application (C/C++/Fortran) and its .h header files, plus a .i interface file, are processed by swig into a wrapper layer that is compiled into a DLL]
of several components
• Preprocessor.
• C++ parser.
• C++ type system.
• Fully supports multi-pass compilation/code generation.
• Internal data structures loosely based on XML-DOM.
Target language modules
• Implemented as C++ classes.
• Virtual methods redefined according to target language.
class SomeLanguage : public Language {
public:
    virtual void main(int argc, char *argv[]);
    virtual int top(Node *n);
    virtual int functionWrapper(Node *n);
    virtual int variableWrapper(Node *n);
    ...
};
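The division of labor between the shared front end and the target language modules can be pictured with a small Python analogue of the C++ classes above: a base `Language` class walks a parse tree and calls overridable hooks, and each target language overrides only the hooks it needs. The node structure (plain dicts with a `"kind"` key), the snake_case method names, the example variable `status`, and the generated strings are all hypothetical.

```python
class Language:
    """Base class: walks the parse tree and dispatches to wrapper hooks."""

    def top(self, tree):
        out = []
        for node in tree:
            if node["kind"] == "function":
                out.append(self.function_wrapper(node))
            elif node["kind"] == "variable":
                out.append(self.variable_wrapper(node))
        return out

    # Hooks that each target language module is expected to override
    def function_wrapper(self, node):
        raise NotImplementedError

    def variable_wrapper(self, node):
        raise NotImplementedError


class PythonLanguage(Language):
    def function_wrapper(self, node):
        return f"PyObject *wrap_{node['name']}(PyObject *self, PyObject *args)"

    def variable_wrapper(self, node):
        return f"/* Python get/set functions for {node['name']} */"


class TclLanguage(Language):
    def function_wrapper(self, node):
        return (f"int wrap_{node['name']}(ClientData clientData, Tcl_Interp *interp, "
                f"int objc, Tcl_Obj *const objv[])")

    def variable_wrapper(self, node):
        return f"/* Tcl variable link for {node['name']} */"


# A tiny parse tree standing in for the parser's output
tree = [
    {"kind": "function", "name": "gcd"},
    {"kind": "variable", "name": "status"},
]

for lang in (PythonLanguage(), TclLanguage()):
    for line in lang.top(tree):
        print(line)
```

Adding a new target language then means writing one more subclass; the parser, type system, and tree walk are shared.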
• Nested classes (soon).
• Certain advanced features of templates.
• Not all C++ features map cleanly to a scripting interface.
• Subtle differences in semantics (assignment, overloading, etc.).
Problematic topics
• Callback functions and methods.
• Memory management (object ownership).
• Arrays: no universal representation, marshalling, or mapping to arguments.
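Object ownership is hard because a wrapper cannot tell from a declaration alone whether the caller must eventually free a returned pointer. One common approach, sketched here in Python with a simulated heap, is to attach an ownership flag to each proxy object (SWIG's Python proxies expose a similar `thisown` attribute) and free the underlying memory only when that flag is set. The `CHeap` class and the `new_grid` and `grid_row` functions are hypothetical.

```python
class CHeap:
    """Stand-in for C malloc/free that detects invalid or double frees."""

    def __init__(self):
        self.live = set()
        self.next_addr = 1

    def malloc(self):
        addr = self.next_addr
        self.next_addr += 1
        self.live.add(addr)
        return addr

    def free(self, addr):
        if addr not in self.live:
            raise RuntimeError(f"double free or invalid pointer {addr}")
        self.live.remove(addr)


heap = CHeap()


class Proxy:
    """Wraps a C pointer; release() frees it only if the proxy owns it."""

    def __init__(self, addr, thisown):
        self.addr = addr
        self.thisown = thisown

    def release(self):
        if self.thisown:
            heap.free(self.addr)
            self.thisown = False   # never free the same pointer twice


# Hypothetical wrapped C functions
def new_grid():
    # Newly allocated memory: the script side owns it
    return Proxy(heap.malloc(), thisown=True)


def grid_row(grid):
    # Pointer into memory owned by someone else: the script must not free it
    return Proxy(grid.addr, thisown=False)


g = new_grid()
r = grid_row(g)
r.release()          # no-op: r does not own the memory
g.release()          # frees the grid exactly once
```

If `grid_row` had wrongly marked its result as owned, releasing both proxies would trigger the simulated double free, which is exactly the class of bug that makes ownership a problematic topic.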
SWIG is actively used and developed
• 750 members on mailing list ([email protected]).
• 86,000 downloads in the last 3 years.
• Used in industry and commercial products.
• And in real scientific computing applications.
Status
• Currently working on a major new release (SWIG-1.3.x ===> SWIG-2.0).
• About 6 active developers.
• Major enhancements to C++ handling (templates, namespaces, type system).
• New target languages.
Availability:
• http://www.swig.org
• And many Linux distributions.