Rethinking Extension Programming

Rethinking Extension Programming David Beazley (@dabeaz) EuroSciPy August 25, 2012

"Much of scientiﬁc programming is exploratory in nature, and for
that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments."

"Much of scientiﬁc programming is exploratory in nature, and for
that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997)

NUMERICAL PYTHON Paul F. Dubois, Konrad Hinsen, and James Hugunin
Department Editor: Paul F. Dubois [email protected] P)1hon is a small and easy-la-learn hmgl!age with surpris- ing capabilities. It is an interpreted objecl-orienT.ed scripting language and has a full range of sophisticated features such as first-class functions, garbage collection, and excep- tion handling. Python has properties that m"ke it especially appealing for scientific programming: Python is quite simple and easy to learn, but it is a full and complete language. It is simple to extend Python with your own compiled objects and functions. Python is portable, from Unix to Windows 95 to Linux to Macintosh. Python is free, with no license required e'/en if you make a commercial product out of it. _ Python has a large user-contributed library of "modules" These modules cover a wide variety of needs, such as audio. benchmarks for the basic language and the numerical extension are available on the Python Web site (http://www. python.org). The numerical extension is still in beta test and may therefore change slightly from this description. In par- ticular, the beta-test period is needed to sort out some controversies in naming and coercion rules. But with this tutorial as a start and the latest readme file for the numerical extension, you should be able to start using it. Note that you will need to add the extension to the Python source; every effort is made to keep a "minimal Python" as small as possible, as Python is being used for applications where a small size is important. Python is extremely well suited to the development of programmable applications, as has been advocated on these pages (CiP S:I, 1994, p. 70). It has a scripting language as the user interface and compiled code for the compute-intensive portions. Introducing Python Python is an interpreter. You can either enter commands directly into the interpreter Of,more commonly, create a file containing a script. On Unix, you can invoke Python with the script as the first argument, or you can use the llsual trick of starting the script with a comment like this: # !/usr/localibinlpython Then you give execute permission to the script file. When you execute it, the Python interpreter is invoked on the script file itself. Since the above line is a comment as far as Python is concerned, it is then ignored. w ex th by 80 lis so us tw > { A l obj f The fi In m tupl fir 2 COMPUTERS IN PHYSICS, VOL. 10,NO.3. MAYIJUN 1996 © 1996 AMERICAN INSTITUTE OF PHYSICS 0894_1866/96110(3)126216/$10.00

"Imagine a physics program written in Fortran, whose user interface
is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, deﬁne functions, and so on."

"Imagine a physics program written in Fortran, whose user interface
is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, deﬁne functions, and so on." 1994

1994 Awesome!

Cool pictures, but all was not well...

Batch Scripting

Testing Hell

Testing Hell Well, does it conserve energy?

Horrible Workﬂow Batch job

Horrible Workﬂow Batch job 3-6 hours Data Files

Horrible Workﬂow Batch job 3-6 hours repeat Data Files (10-20
GB)

GB) FTP (10 Mbit/s)

GB) FTP (10 Mbit/s) 4 GB NFS

GB) FTP (10 Mbit/s) walking 4 GB NFS

GB) FTP (10 Mbit/s) walking 4 GB NFS hours

GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak

GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak Unworkable!

1995 (I was in the ﬁrst year of a computer
science Ph.D.)

1995 Compiler hacking!

1995 Functional Programming, Type systems.

1995 Networks (sockets, RPC, etc.)

1995 A distracted advisor!

A Diabolical Convergence 1995

A REPL is born! 1995

1995 Mother of Swig! (Yes)

You did what?!?!?!? NO! NO! NO! NO! NO! NO! NO!
(Go look at TCL)

SWIG project unleashed 1996

15 of 35 Scriptable Scientific Software [email protected] SWIG (Simplified Wrapper
and Interface Generator) • SWIG is a compiler that automatically constructs scripting interfaces given ANSI C/C++ declarations. • Can be used to quickly build scripting interfaces from header files. • Avoids the problem of writing wrapper code by hand. • Disclaimer : SWIG is not the only approach. %module SPaSM %{ #include "SPaSM.h" %} void memory(int natoms); void geometry(double ... void processors(int ... void set_boundary_period ... >>> from SPaSM import * >>> memory(20000) >>> geometry(0,0,0,80,80, 160,2.0) ... SWIG SWIG Interface file Python SWIG: C declarations to Python wrappers. (Details don't really matter here)

1996 It was awesome! (and radical) Sample Session

T1 * 3 != T3

(A ﬁnished dissertation is a good dissertation)

Swig lives on! New release! (Aug 20, 2012)

But, I've long been bothered... What is Swig exactly? -
A gross hack? - A programming languages project? - A software engineering project? - A good idea? - A bad idea? - Insanity? Nobody on my Ph.D. committee quite knew. (and neither did I)

It's not easy to wrap your brain around it... Example:
typemaps %define Name ## _input_binary(TYPEMAP, SIZE) %typemap(in,noblock=1,fragment=#SWIG_AsCharPtrAndSize) (TYPEMAP, SIZE) (int res, Char *buf = 0, size_t size = 0, int alloc = 0) { res = SWIG_AsCharPtrAndSize($input, &buf, &size, &alloc); if (!SWIG_IsOK(res)) { %argument_fail(res, "(TYPEMAP, SIZE)", $symname, $argnum); } $1 = ($1_ltype) buf; $2 = ($2_ltype) size - 1; } %typemap(freearg,noblock=1,match="in") (TYPEMAP, SIZE) { if (alloc$argnum == SWIG_NEWOBJ) %delete_array(buf$argnum); } %enddef Don't ask me--I don't understand it either

Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler
University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming language designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that interoperating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages formally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07)

Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler
University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming language designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that interoperating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages formally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07) e = · · · | (MSG⇥ e) e = · · · | (GSM ⇥ e) E = · · · | (MSG⇥ E) E = · · · | (GSM ⇥ E) ⌥S e : TST ⌥M (MSG⇥ e) : ⇤ ⌥M e : ⇤ ⌥S (GSM ⇥ e) : TST E[MSG n]M ⇥ E[n] E[MSG v]M ⇥ E[MSG (wrong “Non-number”)] v ⌅= n E[MSG⇥1⇥⇥2 ⇥x.e]M ⇥ E[⇥x : ⇤1.MSG⇥2 ((⇥x.e) (GSM ⇥1 x))] x not free in e E[MSG⇥1⇥⇥2 v]M ⇥ E[MSG⇥1⇥⇥2 wrong “Non-procedure”] v ⌅= ⇥x.e E[(GSM n)]S ⇥ E[n] E[(GSM ⇥1⇥⇥2 v)]S ⇥ E[(⇥x. (GSM ⇥2 (v (MSG⇥1 x))))] Figure 3. Extensions to figure 1 to form the simple natural embedding e = · · · | (⇥MSN e) e = · · · | (G⇥ e) | (SM ⇥ N e) E = · · · | (⇥MSN E) E = · · · | (G⇥ E) | (SM ⇥ N E) ⌥S e : TST ⌥S (G⇥ e) : TST ⌥S e : TST ⌥M (⇥MSN e) : ⇤ ⌥M e : ⇤ ⌥S (SM ⇥ N e) : ⇤ E[ MSN n]M ⇥ E[n] E[SMN n]S ⇥ E[n] E[⇥1⇥⇥2MSN ⇥x.e]M ⇥ E[⇥x : ⇤1.SM ⇥2 N ((⇥x.e) (SM ⇥1 N x))] E[(SM ⇥1⇥⇥2 N v)]S ⇥ E[(⇥x.(SM ⇥2 N (v (⇥1MSN x))))] E[(G n)]S ⇥ E[n] E[(G v)]S ⇥ E[wrong “Non-number”] (v ⌅= n) E[(G⇥1⇥⇥2 (⇥x.e))]S ⇥ E[(⇥x⇤.(G⇥2 ((⇥x.e)(G⇥1 x⇤))))] E[(G⇥1⇥⇥2 v)]S ⇥ E[wrong “Non-procedure”] (v ⌅= ⇥x.e) Figure 4. Extensions to figure 1 to form the separated-guards natural embedding guarded bounda with unguarded Theorem 3. Fo the following pr (1 (2 where ⇤ is obse In other wor boundaries with program, and th boundaries is eq guards and unc figure 3 is the sa 3.3 A further While the guard mentation based checks. For insta alent to (⇥x : The check perfo check performe the value is com that the convers We can refin essary checks. W written G⇥ + , tha negative guards, Scheme. Their r E[(G+ n)]S E[(G+ v)]S E[(G⇥1⇥⇥2 + v)]S E[(G⇥1⇥⇥2 + v)]S E[(G v)]S E[(G⇥1⇥⇥2 v)]S The function that result from

But let's not worry about understanding SWIG...

More computationally intensive programs will be written as extensions of
interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientiﬁc programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations.

interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG ﬂips it: Interpreted environments are extensions of compiled programs!

interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG ﬂips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++.

interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG ﬂips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++. Scripting is just this add-on... it's just scripting.

I don't want to write C++ CENSORED RANT No. I
won't do it.

interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea!

interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea! But I'm still bothered!

be fast enough for most such calculations. More computationally intensive
programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientiﬁc programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Interpreters will simply

- Paul Dubois (Computers in Physics, Mar. 1997) "Much of
scientiﬁc programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Extensions are still heavily tied to the C compiler - Generation of C code - Invocation of the C compiler (behind scenes) - Shared libraries - Dynamic loading - Other associated issues

Question : Could you get rid of the compiler?

Question : Could you get rid of the compiler? (if
so, what would that look like?)

Maybe it looks like PyPy...

But, maybe it looks like LLVM...

Let's Hack! • LLVM (http://llvm.org) • LLVMPY (http://www.llvmpy.org)

The Future? This LLVM stuff is kind of fun... ...
just the sort of thing to keep a physicist hacker busy.

from numba.decorators import jit from numba import int32 @jit(arg_types=[int32[:,:], int32[:,:]],
ret_type=int32[:,:]) def filter2d(image, filt): M, N = image.shape Mf, Nf = filt.shape Mf2 = Mf // 2 Nf2 = Nf // 2 result = numpy.zeros_like(image) for i in range(Mf2, M - Mf2): for j in range(Nf2, N - Nf2): num = 0.0 for ii in range(Mf): for jj in range(Nf): num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj]) result[i, j] = num return result

Why must extensions be written in C/Fortran? LLVM

Haskell anyone? LLVM

If Python wanted to be bold... ...LLVM bindings in the
standard library.

Rethinking Extension Programming

Rethinking Extension Programming

More Decks by David Beazley

Other Decks in Programming

Featured

Transcript