An Extensible Compiler for Creating Scriptable Scientific Software

Conference presentation. ICCS 2002. Amsterdam.

David Beazley

April 22, 2002

Transcript

  1. An Extensible Compiler for Creating Scriptable Scientific Software

    ICCS’02, April 22, 2002
    David M. Beazley
    Department of Computer Science
    The University of Chicago
    beazley@cs.uchicago.edu
  2. Scripted Scientific Software

    Many scientific programs are being transformed.

    Motivation: scripting languages offer many benefits
    • Interpreted and interactive user environment.
    • Extensible with compiled C/C++/Fortran code.
    • Rapid prototyping.
    • Debugging.
    • Systems integration and components.
    • Can make applications look a lot like MATLAB, IDL, etc.

    [Diagram: a C/C++ application used for non-interactive batch processing becomes a C/C++ application loaded as an extension of a scripting language (Python, Perl, Tcl, etc.).]
  3. Extension Programming

    Wrappers
    • main() replaced by wrappers (data marshalling, error handling, etc.)
    • Similar to stub code with RPC, CORBA, COM, etc.
    • Goal: expose application internals to the interpreter (functions, classes, variables, ...)

      >>> gcd(12,16)
      4

    [Diagram: in the original application, main() sits on top of the scientific application (C/C++/Fortran). In the scripted version, a scripting interpreter sits on top of a wrapper layer, which sits on top of the scientific application.]
  4. Extension Code Example

    Python wrapper

      #include "Python.h"
      extern int gcd(int,int);

      PyObject *wrap_gcd(PyObject *self, PyObject *args) {
          int x,y;
          int result;
          if (!PyArg_ParseTuple(args,"ii", &x, &y)) {
              return NULL;
          }
          result = gcd(x,y);
          return PyInt_FromLong(result);
      }

    • Data conversion from C <---> Python.
    • You would write a wrapper for each part of your program.
    • Ex: 300 C functions ==> 300 wrapper functions.
    • C++ classes, structures, templates, etc. are more complicated.
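The three steps inside the C wrapper (unpack the arguments, call the function, repack the result) can be sketched in plain Python. This is only a model of the control flow: the Python `gcd` below is a stand-in for the compiled C function, and `wrap_gcd` taking a bare tuple is an assumption made to keep the example self-contained.

```python
def gcd(x, y):
    """Stand-in for the compiled C function gcd(int, int)."""
    while y:
        x, y = y, x % y
    return x


def wrap_gcd(args):
    """Model of the wrapper: unpack arguments, call, repack the result."""
    # Step 1: argument checking, the role PyArg_ParseTuple(args, "ii", ...) plays in C.
    if len(args) != 2 or not all(type(a) is int for a in args):
        raise TypeError("gcd expects two integers")
    x, y = args
    # Step 2: call the underlying function.
    result = gcd(x, y)
    # Step 3: convert the result back to a script-level object.
    return int(result)


print(wrap_gcd((12, 16)))  # prints 4
```

Every exported function needs its own version of these three steps, which is the repetition the next slide complains about.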
  5. The Problem

    No one wants to write extension code
    • Highly repetitive.
    • Prone to error.
    • Difficult for complicated programs.

    Other issues
    • Scientific programs characterized by rapid change.
    • Functions change, variables change, objects change.
    • Piecemeal development.
    • Would require continual maintenance of the wrappers.
    • Complicates development.
    • Makes scripting languages impractical to use in early stages of a project.
  6. The SWIG Project

    SWIG
    • A C/C++ compiler for generating wrappers to existing code.
    • Freely available and in development since 1995.
    • Currently targets Python, Perl, Tcl, Ruby, Java, PHP, Guile, and Mzscheme.

    Source translation
    • C++ header files are parsed to generate wrappers.

    Goals
    • Make it extremely easy for users (scientists) to build wrappers.
    • Allow scripting interface to automatically track changes in underlying source.
    • Make the wrapping process as invisible as possible.

    [Diagram: C/C++ header files (.h) are fed to swig, which produces wrapper code for Perl, Python, Tcl, Ruby, ...]
  7. SWIG Overview

    Key components:
    • Header file parsing.
    • Special SWIG directives.

    Supported C++ features
    • Functions, variables, constants.
    • Classes.
    • Inheritance and multiple inheritance.
    • Pointers, references, arrays, member pointers.
    • Overloading (with renaming).
    • Operators.
    • Namespaces.
    • Templates.
    • Preprocessing.

    Not supported
    • Nested classes, member templates, template partial specialization.

    Will show a few examples
    • Not a complete coverage of SWIG.
  8. Input files

    C/C++ declarations mixed with special SWIG directives.

      // example.i : Sample SWIG input file
      %module example
      %{
      #include "example.h"
      %}

      // Resolve a name clash with Python
      %rename(cprint) print;

      // C/C++ declarations
      extern int gcd(int x, int y);
      extern int fact(int n);
      ...
      %include "example.h"     // Parse a header file
      ...
  9. Creating a Module

    Compilation and linking

      % swig -python example.i
      % cc -c -I/usr/local/include/python2.1 example_wrap.c
      % cc -shared example_wrap.o $(OBJS) -o examplemodule.so

    Use

      % python
      Python 2.1 (#3, Aug 20 2001, 15:41:42)
      [GCC 2.95.2 19991024 (release)] on sunos5
      >>> import example
      >>> example.gcd(12,16)
      4
      >>>

    Comments:
    • Modules built as shared libraries/DLLs.
    • Dynamic loading used to import into interpreter.
    • Contents of the module similar to C.
  10. A More Complicated Example

    • Structures/classes mapped into wrapper objects.
    • Provides natural access from an interpreter.

    C++

      class Complex {
          double real, imag;
      public:
          Complex(double r = 0, double i = 0);
          Complex(const Complex &c);
          Complex &operator=(const Complex &c);
          Complex operator+(const Complex &);
          Complex operator-(const Complex &);
          Complex operator*(const Complex &);
          Complex operator-();
          double re();
          double im();
          ...
      };

    Python

      >>> a = Complex(3,4)
      >>> b = Complex(5,6)
      >>> c = a + b
      >>> c.re()
      8.0
      >>> c.im()
      10.0
      >>>
  11. Structure Extension

    Converting C structures to classes
    • Can make C programs look OO (or extend C++ classes).

      typedef struct {
          double x,y,z;
      } Vector;
      ...
      %addmethods Vector {
          Vector(double x, double y, double z) {
              Vector *r = (Vector *) malloc(sizeof(Vector));
              r->x = x; r->y = y; r->z = z;
              return r;
          }
          double magnitude() {
              return sqrt(self->x*self->x + self->y*self->y + self->z*self->z);
          }
          ...
      };
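From the scripting side, the extended structure should behave roughly like an ordinary class with a constructor and a `magnitude()` method. The sketch below is a pure-Python stand-in for that view, not SWIG output; the sample values are chosen only for illustration.

```python
import math


class Vector:
    """Pure-Python stand-in for the extended C struct as a script would see it."""

    def __init__(self, x, y, z):
        # Mirrors the constructor added through %addmethods.
        self.x, self.y, self.z = float(x), float(y), float(z)

    def magnitude(self):
        # Mirrors the magnitude() method added through %addmethods.
        return math.sqrt(self.x * self.x + self.y * self.y + self.z * self.z)


v = Vector(3, 4, 12)
print(v.magnitude())  # prints 13.0
```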
  12. Template Wrapping

    %template directive

      template<class T> class vector {
      public:
          vector();
          ~vector();
          T get(int index);
          int size();
          ...
      };

      // Instantiate templates
      %template(intvector) vector<int>;
      %template(doublevector) vector<double>;

    In Python

      >>> v = intvector()
      ...
      >>> x = v.get(2)
      >>> print v.size()
      10
      >>>
  17. How it Works

    C++ (input)

      class Complex {
      public:
          Complex(double r = 0, double i = 0);
          Complex operator+(const Complex &);
          double re();
          ...
      };

    Procedure wrappers

      Complex *
      new_Complex(double r, double i) {
          return new Complex(r,i);
      }
      Complex *
      Complex_operator___add__(Complex *self, Complex *other) {
          Complex *r;
          r = new Complex(self->operator+(*other));
          return r;
      }
      double Complex_re(Complex *self) {
          return self->re();
      }

    Extension module (DLL)

    Python class

      class Complex:
          def __init__(self,r,i):
              self.this = new_Complex(r,i)
          def __add__(self,other):
              return Complex_operator___add__(self.this,other)
          def re(self):
              return Complex_re(self.this)
          ...

    Python script

      >>> a = Complex(2,3)
      >>> b = Complex(4,5)
      >>> c = a + b
      >>> c.re()
      6
      >>>
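The layering on this slide (flat procedure wrappers over opaque handles, plus a thin class that delegates to them) can be modelled in a few lines of Python. In this sketch `_NativeComplex` is a hypothetical stand-in for the compiled C++ object, and the `_handle` keyword and the explicit unwrapping of `other.this` are choices made here so the example runs on its own; they are not taken from SWIG's generated code.

```python
class _NativeComplex:
    """Hypothetical stand-in for the C++ Complex object behind a pointer."""

    def __init__(self, r, i):
        self.real, self.imag = float(r), float(i)


# Procedure wrappers: flat functions that operate on opaque handles.
def new_Complex(r, i):
    return _NativeComplex(r, i)


def Complex_operator___add__(self_h, other_h):
    return _NativeComplex(self_h.real + other_h.real, self_h.imag + other_h.imag)


def Complex_re(self_h):
    return self_h.real


# Python class: holds a handle in `this` and delegates every method.
class Complex:
    def __init__(self, r=0, i=0, _handle=None):
        self.this = _handle if _handle is not None else new_Complex(r, i)

    def __add__(self, other):
        # Unwrap both operands, call the flat wrapper, rewrap the result.
        return Complex(_handle=Complex_operator___add__(self.this, other.this))

    def re(self):
        return Complex_re(self.this)


a = Complex(2, 3)
b = Complex(4, 5)
c = a + b
print(c.re())  # prints 6.0
```

The script-level code only ever touches the `Complex` class; the flat functions stay an implementation detail.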
  19. How it Works

    C++ (input)

      class Complex {
      public:
          Complex(double r = 0, double i = 0);
          Complex operator+(const Complex &);
          double re();
          ...
      };

    Procedure wrappers: SWIG generated
    Extension module (DLL)
    Python class: SWIG generated

    Python script

      >>> a = Complex(2,3)
      >>> b = Complex(4,5)
      >>> c = a + b
      >>> c.re()
      6
      >>>
  20. How it Works

    • User only works with input file (C++) and scripts.
    • Details of wrappers hidden.
    • Wrappers not modified by user. Only used to compile DLL.

    C++ (input)

      class Complex {
      public:
          Complex(double r = 0, double i = 0);
          Complex operator+(const Complex &);
          double re();
          ...
      };

    Python script

      >>> a = Complex(2,3)
      >>> b = Complex(4,5)
      >>> c = a + b
      >>> c.re()
      6
      >>>

    [Diagram: swig turns the C++ input into the extension module (DLL) that the Python script uses.]
  21. Challenges

    C/C++ is a bad interface definition language
    • Type system complexity:

        typedef int (*PFIA[20])(int, double *x);
        double foo(PFIA *const x);

    • Ambiguity in data conversion (pointers, arrays, output values, etc.):

        double bar(double *x, double *y, double *r);

    • Structures, classes, unions.
    • Templates, namespaces, overloading, operators, etc.

    SWIG solution
    • Declaration annotation.
    • Pattern based type conversion.
    • Will provide a brief tour of internals.
  22. Declaration Annotation

    The underlying customization mechanism

    Declaration modifiers (special directives)

      %module example
      %rename(cprint) print;
      %ignore Complex::operator=;

      %include "example.h"

    Pattern matching (unmodified C/C++)

      // example.h
      void print(char *s);
      class Complex {
      public:
          void print();
          ...
          Complex& operator=(const Complex &);
          ...
      };
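One way to picture declaration annotation is as a table of modifiers that is consulted while the unmodified declarations are scanned. The sketch below is a toy model, not SWIG's matcher: `apply_annotations` is a hypothetical helper, declarations are reduced to qualified names, and it assumes an unqualified rename applies to every declaration with that name, as the slide's example suggests.

```python
def apply_annotations(declarations, renames, ignores):
    """Map script-level names to C/C++ names after applying the modifiers.

    declarations: qualified C/C++ names, e.g. "Complex::print"
    renames:      {unqualified name: new name}, like %rename(new) old;
    ignores:      set of qualified names, like %ignore Class::member;
    """
    exported = {}
    for qualified in declarations:
        if qualified in ignores:
            continue  # ignored declarations produce no wrapper
        scope, sep, name = qualified.rpartition("::")
        exported[scope + sep + renames.get(name, name)] = qualified
    return exported


decls = ["print", "Complex::print", "Complex::operator="]
result = apply_annotations(decls, {"print": "cprint"}, {"Complex::operator="})
print(result)  # prints {'cprint': 'print', 'Complex::cprint': 'Complex::print'}
```

The header itself is never edited; only the table of modifiers changes.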
  23. Declaration Annotation

    Advanced features
    • Fully integrated with the C++ type system.
    • Annotations can be parameterized with type signatures.

    Example:

      %ignore Object::bar(string *s) const;
      ...
      class Object {
          ...
          void bar(string *s);
          void bar(string *s) const;    // Ignored
          ...
      };
      class Foo : public Object {
          ...
          void bar(string *s);
          void bar(string *s) const;    // Ignored
          ...
      };
  24. Type Conversion

    Problem: marshalling
    • Must convert data between scripting and C representation.

    Example:

      int gcd(int x, int y);
      int count(char *buf, int len, char c);

    In Python

      >>> gcd(12,16)
      4
      >>> count("Hello",5,"e")
      1

    Data involved: integers, a string, a single character.
  25. Pattern-Based Type Conversion

    Typemap patterns: C datatype conversion code (depends on target language).
    Note: user rarely writes this.

      %typemap(in) int {
          $1 = PyInt_AsLong($input);
      }
      %typemap(out) int {
          $result = PyInt_FromLong($1);
      }
      %typemap(in) char * {
          $1 = PyString_AsString($input);
      }
      ...
      %include "example.h"

    C header

      int gcd(int x, int y);
      ...
      int count(char *buf, int len, char c);
      ...
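The idea of keying conversion code on C datatypes can be sketched as a lookup table that is consulted once per argument when a wrapper is built. Everything below is an illustrative model rather than SWIG's implementation: `IN_TYPEMAPS`, `OUT_TYPEMAPS`, and `make_wrapper` are hypothetical names, and `math.gcd` stands in for the compiled C function.

```python
import math


def int_in(obj):
    """'in' conversion for C int: accept only Python integers."""
    if type(obj) is not int:
        raise TypeError("expected an integer")
    return obj


def str_in(obj):
    """'in' conversion for char *: accept only Python strings."""
    if not isinstance(obj, str):
        raise TypeError("expected a string")
    return obj


# Conversion code is selected purely by the declared C type.
IN_TYPEMAPS = {"int": int_in, "char *": str_in}
OUT_TYPEMAPS = {"int": int}


def make_wrapper(func, arg_types, return_type):
    """Build a wrapper by looking up one converter per declared argument type."""
    converters = [IN_TYPEMAPS[t] for t in arg_types]
    convert_result = OUT_TYPEMAPS[return_type]

    def wrapper(*args):
        if len(args) != len(converters):
            raise TypeError("wrong number of arguments")
        c_args = [conv(arg) for conv, arg in zip(converters, args)]
        return convert_result(func(*c_args))

    return wrapper


gcd = make_wrapper(math.gcd, ["int", "int"], "int")
print(gcd(12, 16))  # prints 4
```

Because the table is indexed by type, one entry covers every function that takes or returns that type.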
  26. Typemaps

    Named typemaps:

      %typemap(in) double nonnegative {
          $1 = PyFloat_AsDouble($input);
          if ($1 < 0) {
              PyErr_SetString(PyExc_ValueError,"domain error!");
              return NULL;
          }
      }
      double sqrt(double nonnegative);

    Sequences

      %typemap(in) (char *buf, int len) {
          $1 = PyString_AsString($input);
          $2 = PyString_Size($input);
      }
      int count(char *buf, int len, char c);

      >>> count("Hello","e")
      1
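The two refinements on this slide, matching on the argument name as well as the type, and one pattern covering a run of consecutive arguments, can be modelled as a lookup that tries the most specific pattern first. The tables and the `marshal` helper below are hypothetical; the precedence order (multi-argument, then type plus name, then type alone) is an assumption of this sketch.

```python
def check_nonnegative(obj):
    value = float(obj)
    if value < 0:
        raise ValueError("domain error!")
    return (value,)


# Each converter takes one script-level object and returns a tuple of C arguments.
MULTI = {(("char *", "buf"), ("int", "len")): lambda s: (s, len(s))}
NAMED = {("double", "nonnegative"): check_nonnegative}
PLAIN = {"double": lambda o: (float(o),), "char": lambda o: (o,)}


def marshal(params, py_args):
    """Convert script arguments to C arguments for a list of (type, name) params."""
    c_args = []
    remaining = iter(py_args)
    i = 0
    while i < len(params):
        # Most specific first: a pattern spanning several consecutive parameters.
        for pattern, conv in MULTI.items():
            n = len(pattern)
            if tuple(params[i:i + n]) == pattern:
                break
        else:
            # Then type plus name, then type alone.
            n = 1
            conv = NAMED.get(params[i]) or PLAIN[params[i][0]]
        c_args.extend(conv(next(remaining)))
        i += n
    return c_args


count_params = [("char *", "buf"), ("int", "len"), ("char", "c")]
print(marshal(count_params, ["Hello", "e"]))  # prints ['Hello', 5, 'e']
```

Two script-level arguments fill three C parameters, which is why `count("Hello","e")` on the slide no longer passes the length.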
  27. Typemaps and Datatypes

    Pattern matching integrated with C++ type system

      %typemap(in) int { ... }
      typedef int Integer;
      ...
      Integer gcd(Integer x, Integer y);

      namespace std {
          class string;
          %typemap(in) string * { ... };
      }
      namespace S = std;
      using std::string;
      ...
      void foo(string *a, S::string *b);

    Comments:
    • All type conversion in SWIG is pattern based.
    • Type conversion by naming convention.
    • Mostly hidden from users.
    • Allows advanced customization.
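For a typemap written against `int` or `std::string *` to apply to `Integer`, `string *`, and `S::string *`, each spelling has to be reduced to one canonical type before the lookup. The sketch below models that reduction with three small tables. `canonical` is a hypothetical helper, it handles only a single trailing pointer, and it assumes the typedef table has no cycles.

```python
def canonical(ctype, typedefs, ns_aliases, using):
    """Reduce a type spelling to the form the typemap table is keyed on."""
    base, star, _ = ctype.partition(" *")       # split off a trailing pointer
    scope, sep, name = base.rpartition("::")
    if scope in ns_aliases:                     # namespace S = std;
        base = ns_aliases[scope] + "::" + name
    base = using.get(base, base)                # using std::string;
    while base in typedefs:                     # typedef int Integer;
        base = typedefs[base]
    return base + star


typedefs = {"Integer": "int"}
ns_aliases = {"S": "std"}
using = {"string": "std::string"}
typemaps = {"int": "int typemap", "std::string *": "string pointer typemap"}

for spelling in ["Integer", "string *", "S::string *"]:
    key = canonical(spelling, typedefs, ns_aliases, using)
    print(spelling, "->", key, "->", typemaps[key])
```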
  28. Advanced Typemap Example

    Conversion of Numeric Python array to C

      %typemap(in) (double *mat, int nx, int ny) {
          PyArrayObject *array;
          if (!PyArray_Check($input)) {
              PyErr_SetString(PyExc_TypeError,"Expected an array");
              return NULL;
          }
          array = (PyArrayObject *)
              PyArray_ContiguousFromObject($input, PyArray_DOUBLE, 2, 2);
          if (!array) {
              PyErr_SetString(PyExc_ValueError,
                  "array must be two-dimensional and of type float");
              return NULL;
          }
          $1 = (double *) array->data;    /* Assign grid */
          $2 = array->dimensions[0];      /* Assign nx */
          $3 = array->dimensions[1];      /* Assign ny */
      }
      ...
      double determinant(double *mat, int nx, int ny);

    Key point
    • SWIG can be customized to handle new datatypes.
    • Customized data marshalling.
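The same marshalling logic (check the input, make it contiguous, then hand over a data buffer plus two dimensions) can be sketched without Numeric using only the standard library. `marshal_matrix` is a hypothetical helper; it accepts nested sequences instead of array objects, and `array("d", ...)` stands in for the contiguous `double *` buffer.

```python
from array import array


def marshal_matrix(obj):
    """Return (mat, nx, ny) for a rectangular two-dimensional sequence of numbers."""
    try:
        rows = [list(row) for row in obj]
    except TypeError:
        raise TypeError("Expected a two-dimensional sequence")
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("array must be two-dimensional and rectangular")
    # Flatten row by row into one contiguous buffer of C doubles.
    mat = array("d", (float(value) for row in rows for value in row))
    return mat, len(rows), len(rows[0])


mat, nx, ny = marshal_matrix([[1, 2, 3], [4, 5, 6]])
print(list(mat), nx, ny)  # prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] 2 3
```

One script-level argument again expands into three C arguments, so a wrapped `determinant` would be called with the matrix alone.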
  29. Using SWIG

    Summary
    • Existing C/C++ header files used to build wrappers.
    • Process guided by some special SWIG directives.
    • Most details hidden from user.
    • Can customize output using typemaps and other features.

    [Diagram: the header files (.h) of the scientific application (C/C++/Fortran) and an interface file (.i) are fed to swig, which generates the wrapper layer; the wrapper layer and the application are combined into a DLL.]
  30. Extending SWIG

    SWIG consists of several components
    • Preprocessor.
    • C++ parser.
    • C++ type system.
    • Fully supports multi-pass compilation/code generation.
    • Internal data structures loosely based on XML-DOM.

    Target language modules
    • Implemented as C++ classes.
    • Virtual methods redefined according to target language.

      class SomeLanguage : public Language {
      public:
          virtual void main(int argc, char *argv[]);
          virtual int top(Node *n);
          virtual int functionWrapper(Node *n);
          virtual int variableWrapper(Node *n);
          ...
      };
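The target-language design (a base class that walks the parse tree and calls overridable handlers) can be sketched in Python. This is a toy model under stated assumptions: the dictionary-based node layout, the `ToyPython` class, and the strings it emits are all invented for illustration and do not reflect SWIG's actual node structure or output.

```python
class Language:
    """Base class: walks the tree and dispatches to per-declaration handlers."""

    def top(self, node):
        handlers = {
            "function": self.function_wrapper,
            "variable": self.variable_wrapper,
        }
        for child in node["children"]:
            handlers[child["kind"]](child)

    def function_wrapper(self, node):
        raise NotImplementedError

    def variable_wrapper(self, node):
        raise NotImplementedError


class ToyPython(Language):
    """Hypothetical target module that records one line of output per declaration."""

    def __init__(self):
        self.lines = []

    def function_wrapper(self, node):
        self.lines.append("wrap function " + node["name"])

    def variable_wrapper(self, node):
        self.lines.append("wrap variable " + node["name"])


tree = {
    "children": [
        {"kind": "function", "name": "gcd"},
        {"kind": "variable", "name": "Foo"},
    ]
}
module = ToyPython()
module.top(tree)
print(module.lines)  # prints ['wrap function gcd', 'wrap variable Foo']
```

Adding a new target language then means writing one subclass, while the parser and type system stay shared.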
  31. Limitations

    Unsupported C++ features
    • Nested classes (soon).
    • Certain advanced features of templates.
    • Not all C++ features map cleanly to scripting interface.
    • Subtle differences in semantics (assignment, overloading, etc.)

    Problematic topics
    • Callback functions and methods.
    • Memory management (object ownership).
    • Arrays. No universal representation, marshalling, mapping to arguments.
  32. Related Work

    Many extension building tools are available

    Goals
    • Simplify extension programming.
    • Automate extension programming.

    Common approaches
    • Programming libraries.
    • Specialized compilers.
    • Mixed language procedure inlining.

    Tools
    • SWIG
    • CABLE
    • Inline
    • Boost Python
    • Wrappy
    • Grad
    • f2py
    • pyfort
    • G-wrap
    • Tolua
    • CXX
    • Pyrex
    • Weave
    • Many others
  33. Current Status and Availability

    SWIG is actively used and developed
    • 750 members on mailing list (swig@cs.uchicago.edu).
    • 86000 downloads in last 3 years.
    • Used in industry and commercial products.
    • And real scientific computing applications.

    Status
    • Currently working on major new release (SWIG-1.3.x ===> SWIG-2.0).
    • About 6 active developers.
    • Major enhancements to C++ handling (templates, namespaces, type system).
    • New target languages.

    Availability:
    • http://www.swig.org
    • And many Linux distributions.