Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rethinking Extension Programming

Rethinking Extension Programming

Invited Keynote. EuroSciPy 2012. Brussels. Talk was recorded, but the video seems to be lost to the sands of time. A shame as I live-coded some LLVM examples. Contact me if its ever found.

David Beazley

August 25, 2012
Tweet

More Decks by David Beazley

Other Decks in Programming

Transcript

  1. Rethinking Extension
    Programming
    David Beazley (@dabeaz)
    EuroSciPy
    August 25, 2012

    View Slide

  2. "Much of scientific programming is exploratory in
    nature, and for that sort of programming the use of
    compiled languages will cease. Interpreters will simply
    be fast enough for most such calculations. More
    computationally intensive programs will be written as
    extensions of interpreted environments."

    View Slide

  3. "Much of scientific programming is exploratory in
    nature, and for that sort of programming the use of
    compiled languages will cease. Interpreters will simply
    be fast enough for most such calculations. More
    computationally intensive programs will be written as
    extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)

    View Slide

  4. NUMERICAL PYTHON
    Paul F. Dubois, Konrad Hinsen,
    and James Hugunin
    Department Editor:
    Paul F. Dubois
    [email protected]
    P)1hon is a small and easy-la-learn hmgl!age with surpris-
    ing capabilities. It is an interpreted objecl-orienT.ed script-
    ing language and has a full range of sophisticated features
    such as first-class functions, garbage collection, and excep-
    tion handling. Python has properties that m"ke it especially
    appealing for scientific programming:
    Python is quite simple and easy to learn, but it is a full and
    complete language.
    It is simple to extend Python with your own compiled
    objects and functions.
    Python is portable, from Unix to Windows 95 to Linux to
    Macintosh.
    Python is free, with no license required e'/en if you make
    a commercial product out of it. _
    Python has a large user-contributed library of "modules"
    These modules cover a wide variety of needs, such as audio.
    benchmarks for the basic language and
    the numerical extension are available
    on the Python Web site (http://www.
    python.org).
    The numerical extension is still in
    beta test and may therefore change
    slightly from this description. In par-
    ticular, the beta-test period is needed to
    sort out some controversies in naming
    and coercion rules. But with this tutorial
    as a start and the latest readme file for
    the numerical extension, you should be
    able to start using it. Note that you will
    need to add the extension to the Python
    source; every effort is made to keep a
    "minimal Python" as small as possible,
    as Python is being used for applications
    where a small size is important.
    Python is extremely well suited to
    the development of programmable ap-
    plications, as has been advocated on
    these pages (CiP S:I, 1994, p. 70). It has a scripting language
    as the user interface and compiled code for the compute-in-
    tensive portions.
    Introducing Python
    Python is an interpreter. You can either enter commands
    directly into the interpreter Of,more commonly, create a file
    containing a script. On Unix, you can invoke Python with the
    script as the first argument, or you can use the llsual trick of
    starting the script with a comment like this:
    # !/usr/localibinlpython
    Then you give execute permission to the script file. When you
    execute it, the Python interpreter is invoked on the script file
    itself. Since the above line is a comment as far as Python is
    concerned, it is then ignored.
    w
    ex
    th
    by
    80
    lis
    so
    us
    tw
    >
    {
    A l
    obj
    f
    The
    fi
    In m
    tupl
    fir
    2 COMPUTERS IN PHYSICS, VOL. 10,NO.3. MAYIJUN 1996 © 1996 AMERICAN INSTITUTE OF PHYSICS 0894_1866/96110(3)126216/$10.00

    View Slide

  5. View Slide

  6. "Imagine a physics program written in
    Fortran, whose user interface is a
    programming language (not Fortran, of
    course, but imagine something similar).
    In this programming language you can
    create variables, assign values, loop, if-
    test, define functions, and so on."

    View Slide

  7. "Imagine a physics program written in
    Fortran, whose user interface is a
    programming language (not Fortran, of
    course, but imagine something similar).
    In this programming language you can
    create variables, assign values, loop, if-
    test, define functions, and so on."
    1994

    View Slide

  8. 1994

    View Slide

  9. 1994 Awesome!

    View Slide

  10. Cool pictures, but all was not well...

    View Slide

  11. Batch Scripting

    View Slide

  12. Testing Hell

    View Slide

  13. Testing Hell

    View Slide

  14. Testing Hell
    Well, does it conserve energy?

    View Slide

  15. Horrible Workflow
    Batch job

    View Slide

  16. Horrible Workflow
    Batch job
    3-6 hours Data Files

    View Slide

  17. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)

    View Slide

  18. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)

    View Slide

  19. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)
    4 GB
    NFS

    View Slide

  20. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)
    walking
    4 GB
    NFS

    View Slide

  21. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)
    walking
    4 GB
    NFS
    hours

    View Slide

  22. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)
    walking
    4 GB
    NFS
    hours
    Tweak

    View Slide

  23. Horrible Workflow
    Batch job
    3-6 hours
    repeat
    Data Files
    (10-20 GB)
    FTP (10 Mbit/s)
    walking
    4 GB
    NFS
    hours
    Tweak
    Unworkable!

    View Slide

  24. 1995
    (I was in the first year of a computer science Ph.D.)

    View Slide

  25. 1995
    Compiler hacking!

    View Slide

  26. 1995
    Functional
    Programming,
    Type systems.

    View Slide

  27. 1995
    Networks
    (sockets, RPC, etc.)

    View Slide

  28. 1995
    A distracted
    advisor!

    View Slide

  29. A Diabolical
    Convergence
    1995

    View Slide

  30. A REPL is born!
    1995

    View Slide

  31. 1995

    View Slide

  32. 1995
    Mother of Swig! (Yes)

    View Slide

  33. You did what?!?!?!?
    NO! NO! NO! NO! NO! NO! NO!
    (Go look at TCL)

    View Slide

  34. SWIG project unleashed
    1996

    View Slide

  35. 15 of 35
    Scriptable Scientific Software
    [email protected]
    SWIG (Simplified Wrapper and Interface Generator)
    • SWIG is a compiler that automatically constructs scripting interfaces given
    ANSI C/C++ declarations.
    • Can be used to quickly build scripting interfaces from header files.
    • Avoids the problem of writing wrapper code by hand.
    • Disclaimer : SWIG is not the only approach.
    %module SPaSM
    %{
    #include "SPaSM.h"
    %}
    void memory(int natoms);
    void geometry(double ...
    void processors(int ...
    void set_boundary_period
    ...
    >>> from SPaSM import *
    >>> memory(20000)
    >>> geometry(0,0,0,80,80,
    160,2.0)
    ...
    SWIG
    SWIG Interface file
    Python
    SWIG: C declarations to Python wrappers.
    (Details don't really matter here)

    View Slide

  36. 1996
    It was awesome! (and radical)
    Sample Session

    View Slide

  37. T1 * 3 != T3

    View Slide

  38. View Slide

  39. (A finished dissertation is a good dissertation)

    View Slide

  40. Swig lives on!
    New release!
    (Aug 20, 2012)

    View Slide

  41. But, I've long been bothered...
    What is Swig exactly?
    - A gross hack?
    - A programming languages project?
    - A software engineering project?
    - A good idea?
    - A bad idea?
    - Insanity?
    Nobody on my Ph.D. committee quite knew.
    (and neither did I)

    View Slide

  42. It's not easy to wrap your brain around it...
    Example: typemaps
    %define Name ## _input_binary(TYPEMAP, SIZE)
    %typemap(in,noblock=1,fragment=#SWIG_AsCharPtrAndSize) (TYPEMAP, SIZE)
    (int res, Char *buf = 0, size_t size = 0, int alloc = 0) {
    res = SWIG_AsCharPtrAndSize($input, &buf, &size, &alloc);
    if (!SWIG_IsOK(res)) {
    %argument_fail(res, "(TYPEMAP, SIZE)", $symname, $argnum);
    }
    $1 = ($1_ltype) buf;
    $2 = ($2_ltype) size - 1;
    }
    %typemap(freearg,noblock=1,match="in") (TYPEMAP, SIZE) {
    if (alloc$argnum == SWIG_NEWOBJ) %delete_array(buf$argnum);
    }
    %enddef
    Don't ask me--I don't understand it either

    View Slide

  43. Operational Semantics for Multi-Language Programs
    Jacob Matthews Robert Bruce Findler
    University of Chicago
    {jacobm, robby}@cs.uchicago.edu
    Abstract
    Inter-language interoperability is big business, as the success of Mi-
    crosoft’s .NET and COM and Sun’s JVM show. Programming lan-
    guage designers are designing programming languages that reflect
    that fact — SML#, Mondrian, and Scala, to name just a few ex-
    amples, all treat interoperability with other languages as a central
    design feature. Still, current multi-language research tends not to
    focus on the semantics of interoperation features, but only on how
    to implement them efficiently. In this paper, we take first steps to-
    ward higher-level models of interoperating systems. Our technique
    abstracts away the low-level details of interoperability like garbage
    collection and representation coherence, and lets us focus on se-
    mantic properties like type-safety and observable equivalence.
    Beyond giving simple expressive models that are natural com-
    positions of single-language models, our studies have uncovered
    several interesting facts about interoperability. For example, higher-
    order contracts naturally emerge as the glue to ensure that inter-
    operating languages respect each other’s type systems. While we
    present our results in an abstract setting, they shed light on real
    multi-language systems and tools such as the JNI, SWIG, and
    Haskell’s stable pointers.
    Categories and Subject Descriptors D.3.1 [Programming Lan-
    guages]: Formal Definitions and Theory—Semantics
    General Terms Languages, theory
    for the popular wrapper generator SWIG [4]), these new foreign
    function interfaces are built to allow high-level, safe languages to
    interoperate with other high-level, safe languages, such as Python
    with Scheme [32] and Lua with OCaml [38].
    Since these embeddings are driven by practical concerns, the
    research that accompanies them rightly focuses on the bits and
    bytes of interoperability — how to represent data in memory, how
    to call a foreign function efficiently, and so on. But an important
    theoretical problem arises, independent of these implementation-
    level concerns: how can we reason formally about multi-language
    programs? This is a particularly important question for systems
    that involve typed languages, because we have to show that the
    embeddings respect their constituents’ type systems.
    In this paper we present a simple method for giving operational
    semantics to multi-language systems. Our models are rich enough
    to support a wide variety of multi-language embedding strategies,
    and powerful enough that we have been able to use them for
    type soundness and contextual equivalence proofs. Our technique
    is based on simple constructs we call boundaries, cross-language
    casts that regulate both control flow and value conversion between
    languages. We introduce boundaries through a series of operational
    semantics in which we combine a simple ML-like language with a
    simple Scheme-like language.
    In section 2, we introduce those two constituent languages for-
    mally and connect them using a primitive embedding where values
    in one language are opaque to the other. In section 3, we enrich
    People have tried... (POPL'07)

    View Slide

  44. Operational Semantics for Multi-Language Programs
    Jacob Matthews Robert Bruce Findler
    University of Chicago
    {jacobm, robby}@cs.uchicago.edu
    Abstract
    Inter-language interoperability is big business, as the success of Mi-
    crosoft’s .NET and COM and Sun’s JVM show. Programming lan-
    guage designers are designing programming languages that reflect
    that fact — SML#, Mondrian, and Scala, to name just a few ex-
    amples, all treat interoperability with other languages as a central
    design feature. Still, current multi-language research tends not to
    focus on the semantics of interoperation features, but only on how
    to implement them efficiently. In this paper, we take first steps to-
    ward higher-level models of interoperating systems. Our technique
    abstracts away the low-level details of interoperability like garbage
    collection and representation coherence, and lets us focus on se-
    mantic properties like type-safety and observable equivalence.
    Beyond giving simple expressive models that are natural com-
    positions of single-language models, our studies have uncovered
    several interesting facts about interoperability. For example, higher-
    order contracts naturally emerge as the glue to ensure that inter-
    operating languages respect each other’s type systems. While we
    present our results in an abstract setting, they shed light on real
    multi-language systems and tools such as the JNI, SWIG, and
    Haskell’s stable pointers.
    Categories and Subject Descriptors D.3.1 [Programming Lan-
    guages]: Formal Definitions and Theory—Semantics
    General Terms Languages, theory
    for the popular wrapper generator SWIG [4]), these new foreign
    function interfaces are built to allow high-level, safe languages to
    interoperate with other high-level, safe languages, such as Python
    with Scheme [32] and Lua with OCaml [38].
    Since these embeddings are driven by practical concerns, the
    research that accompanies them rightly focuses on the bits and
    bytes of interoperability — how to represent data in memory, how
    to call a foreign function efficiently, and so on. But an important
    theoretical problem arises, independent of these implementation-
    level concerns: how can we reason formally about multi-language
    programs? This is a particularly important question for systems
    that involve typed languages, because we have to show that the
    embeddings respect their constituents’ type systems.
    In this paper we present a simple method for giving operational
    semantics to multi-language systems. Our models are rich enough
    to support a wide variety of multi-language embedding strategies,
    and powerful enough that we have been able to use them for
    type soundness and contextual equivalence proofs. Our technique
    is based on simple constructs we call boundaries, cross-language
    casts that regulate both control flow and value conversion between
    languages. We introduce boundaries through a series of operational
    semantics in which we combine a simple ML-like language with a
    simple Scheme-like language.
    In section 2, we introduce those two constituent languages for-
    mally and connect them using a primitive embedding where values
    in one language are opaque to the other. In section 3, we enrich
    People have tried... (POPL'07)
    e = · · · | (MSG⇥ e)
    e = · · · | (GSM ⇥ e)
    E = · · · | (MSG⇥ E)
    E = · · · | (GSM ⇥ E)
    ⌥S
    e : TST
    ⌥M (MSG⇥ e) : ⇤
    ⌥M
    e : ⇤
    ⌥S (GSM ⇥ e) : TST
    E[MSG n]M
    ⇥ E[n]
    E[MSG v]M
    ⇥ E[MSG (wrong “Non-number”)]
    v ⌅= n
    E[MSG⇥1⇥⇥2 ⇥x.e]M
    ⇥ E[⇥x : ⇤1.MSG⇥2 ((⇥x.e) (GSM ⇥1 x))]
    x not free in e
    E[MSG⇥1⇥⇥2 v]M
    ⇥ E[MSG⇥1⇥⇥2 wrong “Non-procedure”]
    v ⌅= ⇥x.e
    E[(GSM n)]S
    ⇥ E[n]
    E[(GSM ⇥1⇥⇥2 v)]S
    ⇥ E[(⇥x. (GSM ⇥2 (v (MSG⇥1 x))))]
    Figure 3. Extensions to figure 1 to form the simple natural embed-
    ding
    e = · · · | (⇥MSN
    e)
    e = · · · | (G⇥ e) | (SM ⇥
    N
    e)
    E = · · · | (⇥MSN
    E)
    E = · · · | (G⇥ E) | (SM ⇥
    N
    E)
    ⌥S
    e : TST
    ⌥S (G⇥ e) : TST
    ⌥S
    e : TST
    ⌥M (⇥MSN
    e) : ⇤
    ⌥M
    e : ⇤
    ⌥S (SM ⇥
    N
    e) : ⇤
    E[ MSN n]M
    ⇥ E[n]
    E[SMN
    n]S
    ⇥ E[n]
    E[⇥1⇥⇥2MSN ⇥x.e]M
    ⇥ E[⇥x : ⇤1.SM ⇥2
    N
    ((⇥x.e) (SM ⇥1
    N
    x))]
    E[(SM ⇥1⇥⇥2
    N
    v)]S
    ⇥ E[(⇥x.(SM ⇥2
    N
    (v (⇥1MSN
    x))))]
    E[(G n)]S
    ⇥ E[n]
    E[(G v)]S
    ⇥ E[wrong “Non-number”] (v ⌅= n)
    E[(G⇥1⇥⇥2 (⇥x.e))]S
    ⇥ E[(⇥x⇤.(G⇥2 ((⇥x.e)(G⇥1 x⇤))))]
    E[(G⇥1⇥⇥2 v)]S
    ⇥ E[wrong “Non-procedure”] (v ⌅= ⇥x.e)
    Figure 4. Extensions to figure 1 to form the separated-guards
    natural embedding
    guarded bounda
    with unguarded
    Theorem 3. Fo
    the following pr
    (1
    (2
    where ⇤ is obse
    In other wor
    boundaries with
    program, and th
    boundaries is eq
    guards and unc
    figure 3 is the sa
    3.3 A further
    While the guard
    mentation based
    checks. For insta
    alent to
    (⇥x :
    The check perfo
    check performe
    the value is com
    that the convers
    We can refin
    essary checks. W
    written G⇥
    +
    , tha
    negative guards,
    Scheme. Their r
    E[(G+
    n)]S
    E[(G+
    v)]S
    E[(G⇥1⇥⇥2
    +
    v)]S
    E[(G⇥1⇥⇥2
    +
    v)]S
    E[(G v)]S
    E[(G⇥1⇥⇥2 v)]S
    The function
    that result from

    View Slide

  45. But let's not worry about understanding SWIG...

    View Slide

  46. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    "Much of scientific programming is exploratory in
    nature, and for that sort of programming the use of
    compiled languages will cease. Interpreters will simply
    be fast enough for most such calculations.

    View Slide

  47. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    SWIG flips it: Interpreted environments are
    extensions of compiled programs!

    View Slide

  48. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    SWIG flips it: Interpreted environments are
    extensions of compiled programs!
    ...or "real" programs are written in C++.

    View Slide

  49. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    SWIG flips it: Interpreted environments are
    extensions of compiled programs!
    ...or "real" programs are written in C++.
    Scripting is just this add-on... it's just scripting.

    View Slide

  50. I don't want to write C++
    CENSORED
    RANT
    No. I won't do it.

    View Slide

  51. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    So maybe people using ctypes,
    Cython, etc. have the right idea!

    View Slide

  52. More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    So maybe people using ctypes,
    Cython, etc. have the right idea!
    But I'm still bothered!

    View Slide

  53. be fast enough for most such calculations.
    More computationally intensive programs will be
    written as extensions of interpreted environments."
    - Paul Dubois (Computers in Physics, Mar. 1997)
    "Much of scientific programming is exploratory in
    nature, and for that sort of programming the use of
    compiled languages will cease. Interpreters will simply
    Interpreters will simply

    View Slide

  54. - Paul Dubois (Computers in Physics, Mar. 1997)
    "Much of scientific programming is exploratory in
    nature, and for that sort of programming the use of
    compiled languages will cease. Interpreters will simply
    Extensions are still heavily tied to the C compiler
    - Generation of C code
    - Invocation of the C compiler (behind scenes)
    - Shared libraries
    - Dynamic loading
    - Other associated issues

    View Slide

  55. Question : Could you get rid
    of the compiler?

    View Slide

  56. Question : Could you get rid
    of the compiler?
    (if so, what would that look like?)

    View Slide

  57. Maybe it looks like PyPy...

    View Slide

  58. But, maybe it looks like LLVM...

    View Slide

  59. Let's Hack!
    • LLVM (http://llvm.org)
    • LLVMPY (http://www.llvmpy.org)

    View Slide

  60. The Future?
    This LLVM stuff is kind of fun...
    ... just the sort of thing to keep a
    physicist hacker busy.

    View Slide

  61. View Slide

  62. from numba.decorators import jit
    from numba import int32
    @jit(arg_types=[int32[:,:], int32[:,:]], ret_type=int32[:,:])
    def filter2d(image, filt):
    M, N = image.shape
    Mf, Nf = filt.shape
    Mf2 = Mf // 2
    Nf2 = Nf // 2
    result = numpy.zeros_like(image)
    for i in range(Mf2, M - Mf2):
    for j in range(Nf2, N - Nf2):
    num = 0.0
    for ii in range(Mf):
    for jj in range(Nf):
    num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj])
    result[i, j] = num
    return result

    View Slide

  63. View Slide

  64. Why must extensions be written in C/Fortran?
    LLVM

    View Slide

  65. Haskell anyone?
    LLVM

    View Slide

  66. If Python wanted to be bold...
    ...LLVM bindings in the standard library.

    View Slide