An Extensible Compiler for Creating Scriptable Scientific Software

Conference presentation. ICCS 2002. Amsterdam.

David Beazley

April 22, 2002

Transcript

  1. ICCS’02, April 22, 2002 1 [email protected]
    An Extensible Compiler for Creating
    Scriptable Scientific Software
    David M. Beazley
    Department of Computer Science
    The University of Chicago
    [email protected]
    April 22, 2002
    PY-MA

  2. Scripted Scientific Software
    Many scientific programs are being transformed
    Motivation: Scripting languages offer many benefits
    • Interpreted and interactive user environment.
    • Extensible with compiled C/C++/Fortran code.
    • Rapid prototyping.
    • Debugging.
    • Systems integration and components.
    • Can make applications look a lot like MATLAB, IDL, etc.
    [Figure: a non-interactive, batch-processing C/C++ application becomes a C/C++ application running as an extension of a scripting language (Python, Perl, Tcl, etc.).]

  3. Extension Programming
    Wrappers
    • main() replaced by wrappers (data marshalling, error handling, etc.)
    • Similar to stub-code with RPC, CORBA, COM, etc.
    • Goal: expose application internals to interpreter (functions, classes, variables,...)
    >>> gcd(12,16)
    4
    [Figure: in the original application, main() sits on top of the scientific application (C/C++/Fortran); in the scripted version, a scripting interpreter sits on top of a wrapper layer, which sits on top of the scientific application.]

  4. Extension Code Example
    Python wrapper
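    #include "Python.h"
    extern int gcd(int,int);
    /* Wrapper: convert the Python arguments to C, call gcd(), convert the result back. */
    PyObject *wrap_gcd(PyObject *self, PyObject *args) {
        int x,y;
        int result;
        /* "ii" means two C ints; on failure the exception is already set. */
        if (!PyArg_ParseTuple(args,"ii", &x, &y)) {
            return NULL;
        }
        result = gcd(x,y);
        /* Python 2 C API; Python 3 spells this PyLong_FromLong. */
        return PyInt_FromLong(result);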
    }
    • Data conversion from C <---> Python.
    • You would write a wrapper for each part of your program.
    • Ex: 300 C functions ==> 300 wrapper functions
    • C++ classes, structures, templates, etc. are more complicated.
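
The steps a hand-written wrapper performs can be sketched in pure Python. This is only an illustration of the pattern (unpack, check, call, convert), not code from the talk: `gcd` here is a stand-in built on `math.gcd`, and the tuple-based `wrap_gcd` signature is an assumption made for the sketch.

```python
import math


def gcd(x: int, y: int) -> int:
    """Stand-in for the compiled C function gcd(int, int)."""
    return math.gcd(x, y)


def wrap_gcd(args: tuple) -> int:
    """Mimic the C wrapper: unpack, type-check, call, convert the result."""
    # Rough equivalent of PyArg_ParseTuple(args, "ii", &x, &y)
    if len(args) != 2:
        raise TypeError(f"gcd() takes exactly 2 arguments ({len(args)} given)")
    x, y = args
    for value in (x, y):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("an integer is required")
    result = gcd(x, y)      # call into the "application"
    return int(result)      # rough equivalent of PyInt_FromLong(result)


print(wrap_gcd((12, 16)))
```

Every wrapped function repeats this same shape, which is why writing it by hand for hundreds of functions becomes tedious.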

  5. The Problem
    No one wants to write extension code
    • Highly repetitive.
    • Prone to error.
    • Difficult for complicated programs.
    Other issues
    • Scientific programs characterized by rapid change.
    • Functions change, variables change, objects change.
    • Piecemeal development.
    • Would require continual maintenance of the wrappers.
    • Complicates development.
    • Makes scripting languages impractical to use in early stages of a project.

  6. The SWIG Project
    SWIG
    • A C/C++ compiler for generating wrappers to existing code.
    • Freely available and in development since 1995.
    • Currently targets Python, Perl, Tcl, Ruby, Java, PHP, Guile, and Mzscheme.
    Source translation
    • C++ header files are parsed to generate wrappers
    Goals
    • Make it extremely easy for users (scientists) to build wrappers.
    • Allow scripting interface to automatically track changes in underlying source.
    • Make the wrapping process as invisible as possible.
    [Figure: C/C++ header files (.h) are fed to swig, which produces wrapper code for Perl, Python, Tcl, Ruby, ...]

  7. SWIG Overview
    Key components:
    • Header file parsing.
    • Special SWIG directives.
    Supported C++ features
    • Functions, variables, constants.
    • Classes
    • Inheritance and multiple inheritance.
    • Pointers, references, arrays, member pointers.
    • Overloading (with renaming)
    • Operators.
    • Namespaces.
    • Templates.
    • Preprocessing.
    Not supported
    • Nested classes, member templates, template partial specialization
    Will show a few examples
    • Not a complete coverage of SWIG.

  8. Input files
    C/C++ declarations mixed with special SWIG directives.
    // example.i : Sample SWIG input file
    %module example
    %{
    #include "example.h"
    %}
    // Resolve a name clash with Python
    %rename(cprint) print;
    // C/C++ declarations
    extern int gcd(int x, int y);
    extern int fact(int n);
    ...
    %include "example.h" // Parse a header file
    ...

  9. Creating a Module
    Compilation and linking
    % swig -python example.i
    % cc -c -I/usr/local/include/python2.1 example_wrap.c
    % cc -shared example_wrap.o $(OBJS) -o examplemodule.so
    Use
    % python
    Python 2.1 (#3, Aug 20 2001, 15:41:42)
    [GCC 2.95.2 19991024 (release)] on sunos5
    >>> import example
    >>> example.gcd(12,16)
    4
    >>>
    Comments:
    • Modules built as shared libraries/DLLs
    • Dynamic loading used to import into interpreter.
    • Contents of the module similar to C.

  10. A More Complicated Example
    • Structures/classes mapped into wrapper objects.
    • Provides natural access from an interpreter.
    C++:
    class Complex {
        double real, imag;
    public:
        Complex(double r = 0, double i = 0);
        Complex(const Complex &c);
        Complex &operator=(const Complex &c);
        Complex operator+(const Complex &);
        Complex operator-(const Complex &);
        Complex operator*(const Complex &);
        Complex operator-();
        double re();
        double im();
        ...
    };
    Python:
    >>> a = Complex(3,4)
    >>> b = Complex(5,6)
    >>> c = a + b
    >>> c.re()
    8.0
    >>> c.im()
    10.0
    >>>

  11. Structure Extension
    Converting C structures to classes
    • Can make C programs look OO (or extend C++ classes)
    typedef struct {
        double x,y,z;
    } Vector;
    ...
    %addmethods Vector {
        Vector(double x, double y, double z) {
            Vector *r = (Vector *) malloc(sizeof(Vector));
            r->x = x;
            r->y = y;
            r->z = z;
            return r;
        }
        double magnitude() {
            return sqrt(self->x*self->x+self->y*self->y
                        + self->z*self->z);
        }
        ...
    };
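
A loose analogue of "a C structure that gains methods" can be written with the standard `ctypes` module. This is not what SWIG generates; it is a sketch that assumes the same three-double layout as `Vector` and attaches a `magnitude()` method on the Python side.

```python
import ctypes
import math


class Vector(ctypes.Structure):
    """Same memory layout as: typedef struct { double x, y, z; } Vector;"""
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double),
                ("z", ctypes.c_double)]

    def magnitude(self) -> float:
        """Method added on top of the plain C data layout."""
        return math.sqrt(self.x * self.x + self.y * self.y + self.z * self.z)


v = Vector(3.0, 4.0, 12.0)   # positional arguments fill x, y, z in order
print(v.magnitude())
```

The structure stays plain C data; only the scripting-side view of it looks object-oriented.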

  12. Template Wrapping
    %template directive
    template<class T> class vector {
    public:
        vector();
        ~vector();
        T get(int index);
        int size();
        ...
    };
    // Instantiate templates
    %template(intvector) vector<int>;
    %template(doublevector) vector<double>;
    In Python
    >>> v = intvector()
    ...
    >>> x = v.get(2)
    >>> print v.size()
    10
    >>>
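
The idea of giving each template instantiation its own scripting-level name can be mimicked with a class factory. This is a sketch only: `instantiate_vector` and the `append` method are hypothetical helpers introduced here so the example has data to read back, and nothing below reflects SWIG's generated code.

```python
def instantiate_vector(name, element_type):
    """Build one concrete class per element type, loosely like %template."""

    class _Vector:
        def __init__(self):
            self._items = []

        def append(self, value):
            # Coerce to the element type, as a typed C++ container would.
            self._items.append(element_type(value))

        def get(self, index):
            return self._items[index]

        def size(self):
            return len(self._items)

    _Vector.__name__ = _Vector.__qualname__ = name
    return _Vector


intvector = instantiate_vector("intvector", int)
doublevector = instantiate_vector("doublevector", float)

v = intvector()
for i in range(10):
    v.append(i * i)
print(v.get(2), v.size())
```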

  13. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };

  14. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers:
    Complex *
    new_Complex(double r, double i) {
        return new Complex(r,i);
    }
    Complex *
    Complex_operator___add__(
        Complex *self,Complex *other) {
        Complex *r;
        r=new Complex(self->operator+(*other));
        return r;
    }
    double
    Complex_re(Complex *self) {
        return self->re();
    }

  15. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers (extension module, DLL):
    Complex *
    new_Complex(double r, double i) {
        return new Complex(r,i);
    }
    Complex *
    Complex_operator___add__(
        Complex *self,Complex *other) {
        Complex *r;
        r=new Complex(self->operator+(*other));
        return r;
    }
    double
    Complex_re(Complex *self) {
        return self->re();
    }

  16. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers (extension module, DLL):
    Complex *
    new_Complex(double r, double i) {
        return new Complex(r,i);
    }
    Complex *
    Complex_operator___add__(
        Complex *self,Complex *other) {
        Complex *r;
        r=new Complex(self->operator+(*other));
        return r;
    }
    double
    Complex_re(Complex *self) {
        return self->re();
    }
    Python class:
    class Complex:
        def __init__(self,r,i):
            self.this = new_Complex(r,i)
        def __add__(self,other):
            return Complex_operator___add__(
                self.this,other)
        def re(self):
            return Complex_re(self.this)
        ...

  17. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers (extension module, DLL):
    Complex *
    new_Complex(double r, double i) {
        return new Complex(r,i);
    }
    Complex *
    Complex_operator___add__(
        Complex *self,Complex *other) {
        Complex *r;
        r=new Complex(self->operator+(*other));
        return r;
    }
    double
    Complex_re(Complex *self) {
        return self->re();
    }
    Python class:
    class Complex:
        def __init__(self,r,i):
            self.this = new_Complex(r,i)
        def __add__(self,other):
            return Complex_operator___add__(
                self.this,other)
        def re(self):
            return Complex_re(self.this)
        ...
    Python script:
    >>> a = Complex(2,3)
    >>> b = Complex(4,5)
    >>> c = a + b
    >>> c.re()
    6.0
    >>>

  18. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers (extension module, DLL):
    Complex *
    new_Complex(double r, double i) {
        return new Complex(r,i);
    }
    Complex *
    Complex_operator___add__(
        Complex *self,Complex *other) {
        Complex *r;
        r=new Complex(self->operator+(*other));
        return r;
    }
    double
    Complex_re(Complex *self) {
        return self->re();
    }
    Python class:
    class Complex:
        def __init__(self,r,i):
            self.this = new_Complex(r,i)
        def __add__(self,other):
            return Complex_operator___add__(
                self.this,other)
        def re(self):
            return Complex_re(self.this)
        ...
    Python script:
    >>> a = Complex(2,3)
    >>> b = Complex(4,5)
    >>> c = a + b
    >>> c.re()
    6.0
    >>>

  19. How it Works
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Procedure wrappers in the extension module (DLL): SWIG generated
    Python class: SWIG generated
    Python script:
    >>> a = Complex(2,3)
    >>> b = Complex(4,5)
    >>> c = a + b
    >>> c.re()
    6.0
    >>>

  20. How it Works
    • User only works with input file (C++) and scripts
    • Details of wrappers hidden.
    • Wrappers not modified by user. Only used to compile DLL.
    C++ (input):
    class Complex {
    public:
        Complex(double r = 0, double i = 0);
        Complex operator+(const Complex &);
        double re();
        ...
    };
    Extension Module (DLL): compiled from the wrappers that swig generates
    Python script:
    >>> a = Complex(2,3)
    >>> b = Complex(4,5)
    >>> c = a + b
    >>> c.re()
    6.0
    >>>
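
The layering walked through on the preceding slides (flat procedure wrappers underneath, a proxy class on top) can be sketched in runnable form. Everything here is an assumption made for illustration: the "extension module" is simulated with a dictionary of integer handles in place of C++ pointers, and the proxy unwraps `other.this` and re-wraps the result, which goes slightly beyond what the slide shows.

```python
# Stand-in for the compiled extension module: flat functions that pass
# opaque integer handles around instead of real C++ pointers.
_objects = {}


def _store(value):
    handle = len(_objects) + 1
    _objects[handle] = value
    return handle


def new_Complex(r, i):
    return _store(complex(r, i))


def Complex_operator___add__(self_handle, other_handle):
    return _store(_objects[self_handle] + _objects[other_handle])


def Complex_re(self_handle):
    return _objects[self_handle].real


class Complex:
    """Proxy class: holds a handle and forwards to the flat functions."""

    def __init__(self, r=0, i=0, this=None):
        self.this = new_Complex(r, i) if this is None else this

    def __add__(self, other):
        # Wrap the returned handle so the result is a Complex again.
        return Complex(this=Complex_operator___add__(self.this, other.this))

    def re(self):
        return Complex_re(self.this)


a = Complex(2, 3)
b = Complex(4, 5)
c = a + b
print(c.re())
```

The script at the bottom never touches a handle directly, which is the point of the design: the user sees only the class.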

  21. Challenges
    C/C++ is a bad interface definition language
    • Type system complexity:
    typedef int (*PFIA[20])(int, double *x);
    double foo(PFIA *const x);
    • Ambiguity in data conversion (pointers, arrays, output values, etc.)
    double bar(double *x, double *y, double *r);
    • Structures, classes, unions.
    • Templates, namespaces, overloading, operators, etc.
    SWIG solution
    • Declaration annotation.
    • Pattern based type conversion.
    • Will provide a brief tour of internals.

  22. Declaration Annotation
    The underlying customization mechanism
    Declaration modifiers (special directives):
    %module example
    %rename(cprint) print;
    %ignore Complex::operator=;
    %include "example.h"
    Pattern matching (unmodified C/C++):
    // example.h
    void print(char *s);
    class Complex {
    public:
        void print();
        ...
        Complex& operator=(const Complex &);
        ...
    };

  23. Declaration Annotation
    Advanced features
    • Fully integrated with the C++ type system.
    • Annotations can be parameterized with type signatures.
    Example:
    %ignore Object::bar(string *s) const;
    ...
    class Object {
        ...
        void bar(string *s);
        void bar(string *s) const; // Ignored
        ...
    };
    class Foo : public Object {
        ...
        void bar(string *s);
        void bar(string *s) const; // Ignored
        ...
    };
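
How a directive might be matched against declarations, including through inheritance, can be sketched as a lookup over candidate keys. This is a simplified model written for illustration, not SWIG's algorithm: the dictionary-based declarations and the helper names `candidate_keys` and `apply_annotations` are all hypothetical.

```python
def candidate_keys(decl, bases):
    """Yield the keys a directive may match, most specific first."""
    name, sig = decl["name"], decl.get("sig", "")
    scope = decl.get("scope", "")
    while scope:
        yield f"{scope}::{name}{sig}"     # qualified, with signature
        yield f"{scope}::{name}"          # qualified, any signature
        scope = bases.get(scope, "")      # then try the base class
    yield f"{name}{sig}"
    yield name


def apply_annotations(decls, renames, ignores, bases):
    """Return (scope, scripting name, signature) for each wrapped declaration."""
    wrapped = []
    for decl in decls:
        keys = list(candidate_keys(decl, bases))
        if any(key in ignores for key in keys):
            continue
        new_name = next((renames[k] for k in keys if k in renames), decl["name"])
        wrapped.append((decl.get("scope", ""), new_name, decl.get("sig", "")))
    return wrapped


decls = [
    {"name": "print", "sig": "(char *s)"},
    {"scope": "Object", "name": "bar", "sig": "(string *s)"},
    {"scope": "Object", "name": "bar", "sig": "(string *s) const"},
    {"scope": "Foo", "name": "bar", "sig": "(string *s)"},
    {"scope": "Foo", "name": "bar", "sig": "(string *s) const"},
]
result = apply_annotations(
    decls,
    renames={"print": "cprint"},
    ignores={"Object::bar(string *s) const"},
    bases={"Foo": "Object"},
)
for entry in result:
    print(entry)
```

A single rule written against `Object` removes the const overload in `Foo` as well, while the non-const overloads survive in both classes.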

  24. Type Conversion
    Problem: marshalling
    • Must convert data between scripting and C representation.
    Example:
    In Python:
    >>> gcd(12,16)
    4
    >>> count("Hello",5,"e")
    1
    In C:
    int gcd(int x, int y);                   // integers
    int count(char *buf, int len, char c);   // string, length, single character

  25. Pattern-Based Type Conversion
    Typemap patterns: C datatype conversion code (depends on target language)
    %typemap(in) int {
        $1 = PyInt_AsLong($input);
    }
    %typemap(out) int {
        $result = PyInt_FromLong($1);
    }
    %typemap(in) char * {
        $1 = PyString_AsString($input);
    }
    ...
    %include "example.h"
    C header (example.h):
    int gcd(int x, int y);
    ...
    int count(char *buf, int len, char c);
    ...
    Note: user rarely writes this.

  26. Typemaps
    Named typemaps:
    %typemap(in) double nonnegative {
        $1 = PyFloat_AsDouble($input);
        if ($1 < 0) {
            PyErr_SetString(PyExc_ValueError,"domain error!");
            return NULL;
        }
    }
    double sqrt(double nonnegative);
    Sequences
    %typemap(in) (char *buf, int len) {
        $1 = PyString_AsString($input);
        $2 = PyString_Size($input);
    }
    int count(char *buf, int len, char c);
    >>> count("Hello","e")
    1
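
The matching behind named and multi-argument typemaps can be sketched as a pattern table scanned in order. This is a toy model under stated assumptions: the table layout, the first-match-wins ordering, and the helper names (`matches`, `marshal`, `to_buf_len`, and so on) are invented for the example and are not SWIG internals.

```python
def to_nonnegative(value):
    value = float(value)
    if value < 0:
        raise ValueError("domain error!")
    return (value,)


def to_buf_len(value):
    data = value.encode()
    return (data, len(data))        # one Python string fills two C arguments


def to_char(value):
    if len(value) != 1:
        raise TypeError("expected a single character")
    return (value,)


# (pattern of (ctype, name) pairs, converter); a name of None matches any name.
# More specific patterns are listed first because the first match wins here.
TYPEMAPS = [
    ([("char *", "buf"), ("int", "len")], to_buf_len),
    ([("double", "nonnegative")], to_nonnegative),
    ([("double", None)], lambda v: (float(v),)),
    ([("int", None)], lambda v: (int(v),)),
    ([("char", None)], to_char),
]


def matches(pattern, params):
    return len(pattern) <= len(params) and all(
        ptype == ctype and (pname is None or pname == cname)
        for (ptype, pname), (ctype, cname) in zip(pattern, params))


def marshal(params, py_args):
    """Convert Python arguments into the C argument list for `params`."""
    c_args, py_args = [], list(py_args)
    while params:
        for pattern, convert in TYPEMAPS:
            if matches(pattern, params):
                c_args.extend(convert(py_args.pop(0)))
                params = params[len(pattern):]
                break
        else:
            raise TypeError(f"no typemap for {params[0]}")
    return c_args


count_params = [("char *", "buf"), ("int", "len"), ("char", "c")]
print(marshal(count_params, ["Hello", "e"]))
```

Because the `(char *buf, int len)` pattern consumes two C parameters for one Python argument, the scripting call needs only the string and the character.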

  27. Typemaps and Datatypes
    Pattern matching integrated with C++ typesystem
    %typemap(in) int { ... }
    typedef int Integer;
    ...
    Integer gcd(Integer x, Integer y);
    namespace std {
        class string;
        %typemap(in) string * { ... };
    }
    namespace S = std;
    using std::string;
    ...
    void foo(string *a, S::string *b);
    Comments:
    • All type conversion in SWIG is pattern based.
    • Type conversion by naming convention.
    • Mostly hidden from users.
    • Allows advanced customization.
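
One way to picture typemap matching that respects typedefs, namespace aliases, and using-declarations is to reduce every type name to a canonical spelling before the lookup. The sketch below assumes exactly the declarations from this slide and uses hypothetical tables and helpers (`canonical`, `find_typemap`); a real implementation would be considerably more involved.

```python
TYPEDEFS = {"Integer": "int"}            # typedef int Integer;
NAMESPACE_ALIASES = {"S": "std"}         # namespace S = std;
USING = {"string": "std::string"}        # using std::string;
TYPEMAPS = {"int": "int typemap", "std::string *": "string typemap"}


def canonical(ctype):
    """Reduce a type name to the spelling typemaps are registered under."""
    base, sep, suffix = ctype.partition(" ")   # "S::string *" -> "S::string", " ", "*"
    if "::" in base:
        namespace, _, name = base.rpartition("::")
        base = f"{NAMESPACE_ALIASES.get(namespace, namespace)}::{name}"
    base = USING.get(base, base)
    while base in TYPEDEFS:                    # follow typedef chains
        base = TYPEDEFS[base]
    return base + sep + suffix


def find_typemap(ctype):
    return TYPEMAPS.get(canonical(ctype))


for spelled in ("Integer", "string *", "S::string *"):
    print(spelled, "->", canonical(spelled), "->", find_typemap(spelled))
```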

  28. Advanced Typemap Example
    Conversion of Numeric Python array to C
    %typemap(in) (double *mat, int nx, int ny) {
        PyArrayObject *array;
        if (!PyArray_Check($input)) {
            PyErr_SetString(PyExc_TypeError,"Expected an array");
            return NULL;
        }
        array = (PyArrayObject *)
            PyArray_ContiguousFromObject($input, PyArray_DOUBLE, 2, 2);
        if (!array) {
            PyErr_SetString(PyExc_ValueError,
                "array must be two-dimensional and of type float");
            return NULL;
        }
        $1 = (double *) array->data;   /* Assign mat */
        $2 = array->dimensions[0];     /* Assign nx */
        $3 = array->dimensions[1];     /* Assign ny */
    }
    ...
    double determinant(double *mat, int nx, int ny);
    Key point
    • SWIG can be customized to handle new datatypes.
    • Customized data marshalling.
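
The checks performed by the typemap above (is it an array, is it two-dimensional, can it be made contiguous doubles) can be illustrated without Numeric. The sketch assumes nested Python lists as input and uses the standard `array` module for the flat buffer; `matrix_in` and the 2x2-only `determinant` are stand-ins written for this example.

```python
from array import array


def matrix_in(obj):
    """Return (flat row-major doubles, nx, ny), raising like the typemap does."""
    if not isinstance(obj, (list, tuple)) or not obj:
        raise TypeError("Expected an array")
    rows = [list(row) if isinstance(row, (list, tuple)) else None for row in obj]
    if any(row is None for row in rows) or len({len(row) for row in rows}) != 1:
        raise ValueError("array must be two-dimensional and of type float")
    try:
        flat = array("d", (float(x) for row in rows for x in row))
    except (TypeError, ValueError):
        raise ValueError("array must be two-dimensional and of type float") from None
    return flat, len(rows), len(rows[0])


def determinant(mat, nx, ny):
    """Stand-in for the C function; only the 2x2 case is handled here."""
    if nx != 2 or ny != 2:
        raise ValueError("this sketch only handles 2x2 matrices")
    return mat[0] * mat[3] - mat[1] * mat[2]


print(determinant(*matrix_in([[1, 2], [3, 4]])))
```

One scripting-level argument expands into the three C arguments `mat`, `nx`, and `ny`, mirroring the `$1`, `$2`, `$3` assignments.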

  29. Using SWIG
    Summary
    • Existing C/C++ header files used to build wrappers.
    • Process guided by some special SWIG directives.
    • Most details hidden from user.
    • Can customize output using typemaps and other features.
    [Figure: the header files (.h) of the scientific application (C/C++/Fortran), together with an interface file (.i), are processed by swig to produce the wrapper layer; the application and the wrapper layer are built into a DLL.]

  30. Extending SWIG
    SWIG consists of several components
    • Preprocessor.
    • C++ parser.
    • C++ type system.
    • Fully supports multi-pass compilation/code generation.
    • Internal data structures loosely based on XML-DOM.
    Target language modules
    • Implemented as C++ classes.
    • Virtual methods redefined according to target language.
    class SomeLanguage : public Language {
    public:
        virtual void main(int argc, char *argv[]);
        virtual int top(Node *n);
        virtual int functionWrapper(Node *n);
        virtual int variableWrapper(Node *n);
        ...
    };
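
The "one class per target language, virtual methods redefined" design can be sketched as a small tree walk with overridable handlers. The method names `top`, `functionWrapper`, and `variableWrapper` come from the slide; the `Node` layout, the two example subclasses, and the strings they emit are hypothetical.

```python
class Node:
    def __init__(self, kind, name, children=()):
        self.kind = kind
        self.name = name
        self.children = list(children)


class Language:
    """Base class: walks the parse tree and dispatches on node kind."""

    def top(self, node):
        handlers = {"function": self.functionWrapper,
                    "variable": self.variableWrapper}
        return [handlers[child.kind](child)
                for child in node.children if child.kind in handlers]

    def functionWrapper(self, node):
        raise NotImplementedError

    def variableWrapper(self, node):
        raise NotImplementedError


class PythonLike(Language):
    def functionWrapper(self, node):
        return f"python function wrapper for {node.name}"

    def variableWrapper(self, node):
        return f"python attribute wrapper for {node.name}"


class TclLike(Language):
    def functionWrapper(self, node):
        return f"tcl command for {node.name}"

    def variableWrapper(self, node):
        return f"tcl linked variable for {node.name}"


tree = Node("top", "example", [Node("function", "gcd"), Node("variable", "Foo")])
for module in (PythonLike(), TclLike()):
    print(module.top(tree))
```

The traversal lives in the base class, so adding a target language means overriding only the code-emitting methods.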

  31. Limitations
    Unsupported C++ features
    • Nested classes (soon).
    • Certain advanced features of templates.
    • Not all C++ features map cleanly to scripting interface.
    • Subtle differences in semantics (assignment, overloading, etc.)
    Problematic topics
    • Callback functions and methods.
    • Memory management (object ownership).
    • Arrays. No universal representation, marshalling, mapping to arguments.

  32. Related Work
    Many extension building tools are available
    • SWIG
    • CABLE
    • Inline
    • Boost Python
    • Wrappy
    • Grad
    • f2py
    • pyfort
    • G-wrap
    • Tolua
    • CXX
    • Pyrex
    • Weave
    • Many others
    Goals
    • Simplify extension programming.
    • Automate extension programming.
    Common approaches
    • Programming libraries.
    • Specialized compilers.
    • Mixed language procedure inlining.

  33. Current Status and Availability
    SWIG is actively used and developed
    • 750 members on mailing list ([email protected])
    • 86000 downloads in last 3 years.
    • Used in industry and commercial products.
    • And real scientific computing applications.
    Status
    • Currently working on major new release (SWIG-1.3.x ===> SWIG-2.0).
    • About 6 active developers.
    • Major enhancements to C++ handling (templates, namespaces, type system).
    • New target languages.
    Availability:
    • http://www.swig.org
    • And many Linux distributions.
