Slide 1

Slide 1 text

Rethinking Extension Programming David Beazley (@dabeaz) EuroSciPy August 25, 2012

Slide 2

Slide 2 text

"Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments."

Slide 3

Slide 3 text

"Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997)

Slide 4

Slide 4 text

NUMERICAL PYTHON Paul F. Dubois, Konrad Hinsen, and James Hugunin Department Editor: Paul F. Dubois [email protected] P)1hon is a small and easy-la-learn hmgl!age with surpris- ing capabilities. It is an interpreted objecl-orienT.ed script- ing language and has a full range of sophisticated features such as first-class functions, garbage collection, and excep- tion handling. Python has properties that m"ke it especially appealing for scientific programming: Python is quite simple and easy to learn, but it is a full and complete language. It is simple to extend Python with your own compiled objects and functions. Python is portable, from Unix to Windows 95 to Linux to Macintosh. Python is free, with no license required e'/en if you make a commercial product out of it. _ Python has a large user-contributed library of "modules" These modules cover a wide variety of needs, such as audio. benchmarks for the basic language and the numerical extension are available on the Python Web site (http://www. python.org). The numerical extension is still in beta test and may therefore change slightly from this description. In par- ticular, the beta-test period is needed to sort out some controversies in naming and coercion rules. But with this tutorial as a start and the latest readme file for the numerical extension, you should be able to start using it. Note that you will need to add the extension to the Python source; every effort is made to keep a "minimal Python" as small as possible, as Python is being used for applications where a small size is important. Python is extremely well suited to the development of programmable ap- plications, as has been advocated on these pages (CiP S:I, 1994, p. 70). It has a scripting language as the user interface and compiled code for the compute-in- tensive portions. Introducing Python Python is an interpreter. You can either enter commands directly into the interpreter Of,more commonly, create a file containing a script. On Unix, you can invoke Python with the script as the first argument, or you can use the llsual trick of starting the script with a comment like this: # !/usr/localibinlpython Then you give execute permission to the script file. When you execute it, the Python interpreter is invoked on the script file itself. Since the above line is a comment as far as Python is concerned, it is then ignored. w ex th by 80 lis so us tw > { A l obj f The fi In m tupl fir 2 COMPUTERS IN PHYSICS, VOL. 10,NO.3. MAYIJUN 1996 © 1996 AMERICAN INSTITUTE OF PHYSICS 0894_1866/96110(3)126216/$10.00

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

"Imagine a physics program written in Fortran, whose user interface is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on."

Slide 7

Slide 7 text

"Imagine a physics program written in Fortran, whose user interface is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on." 1994

Slide 8

Slide 8 text

1994

Slide 9

Slide 9 text

1994 Awesome!

Slide 10

Slide 10 text

Cool pictures, but all was not well...

Slide 11

Slide 11 text

Batch Scripting

Slide 12

Slide 12 text

Testing Hell

Slide 13

Slide 13 text

Testing Hell

Slide 14

Slide 14 text

Testing Hell Well, does it conserve energy?

Slide 15

Slide 15 text

Horrible Workflow Batch job

Slide 16

Slide 16 text

Horrible Workflow Batch job 3-6 hours Data Files

Slide 17

Slide 17 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB)

Slide 18

Slide 18 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s)

Slide 19

Slide 19 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s) 4 GB NFS

Slide 20

Slide 20 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s) walking 4 GB NFS

Slide 21

Slide 21 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s) walking 4 GB NFS hours

Slide 22

Slide 22 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak

Slide 23

Slide 23 text

Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20 GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak Unworkable!

Slide 24

Slide 24 text

1995 (I was in the first year of a computer science Ph.D.)

Slide 25

Slide 25 text

1995 Compiler hacking!

Slide 26

Slide 26 text

1995 Functional Programming, Type systems.

Slide 27

Slide 27 text

1995 Networks (sockets, RPC, etc.)

Slide 28

Slide 28 text

1995 A distracted advisor!

Slide 29

Slide 29 text

A Diabolical Convergence 1995

Slide 30

Slide 30 text

A REPL is born! 1995

Slide 31

Slide 31 text

1995

Slide 32

Slide 32 text

1995 Mother of Swig! (Yes)

Slide 33

Slide 33 text

You did what?!?!?!? NO! NO! NO! NO! NO! NO! NO! (Go look at TCL)

Slide 34

Slide 34 text

SWIG project unleashed 1996

Slide 35

Slide 35 text

15 of 35 Scriptable Scientific Software [email protected] SWIG (Simplified Wrapper and Interface Generator) • SWIG is a compiler that automatically constructs scripting interfaces given ANSI C/C++ declarations. • Can be used to quickly build scripting interfaces from header files. • Avoids the problem of writing wrapper code by hand. • Disclaimer : SWIG is not the only approach. %module SPaSM %{ #include "SPaSM.h" %} void memory(int natoms); void geometry(double ... void processors(int ... void set_boundary_period ... >>> from SPaSM import * >>> memory(20000) >>> geometry(0,0,0,80,80, 160,2.0) ... SWIG SWIG Interface file Python SWIG: C declarations to Python wrappers. (Details don't really matter here)

Slide 36

Slide 36 text

1996 It was awesome! (and radical) Sample Session

Slide 37

Slide 37 text

T1 * 3 != T3

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

(A finished dissertation is a good dissertation)

Slide 40

Slide 40 text

Swig lives on! New release! (Aug 20, 2012)

Slide 41

Slide 41 text

But, I've long been bothered... What is Swig exactly? - A gross hack? - A programming languages project? - A software engineering project? - A good idea? - A bad idea? - Insanity? Nobody on my Ph.D. committee quite knew. (and neither did I)

Slide 42

Slide 42 text

It's not easy to wrap your brain around it... Example: typemaps %define Name ## _input_binary(TYPEMAP, SIZE) %typemap(in,noblock=1,fragment=#SWIG_AsCharPtrAndSize) (TYPEMAP, SIZE) (int res, Char *buf = 0, size_t size = 0, int alloc = 0) { res = SWIG_AsCharPtrAndSize($input, &buf, &size, &alloc); if (!SWIG_IsOK(res)) { %argument_fail(res, "(TYPEMAP, SIZE)", $symname, $argnum); } $1 = ($1_ltype) buf; $2 = ($2_ltype) size - 1; } %typemap(freearg,noblock=1,match="in") (TYPEMAP, SIZE) { if (alloc$argnum == SWIG_NEWOBJ) %delete_array(buf$argnum); } %enddef Don't ask me--I don't understand it either

Slide 43

Slide 43 text

Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07)

Slide 44

Slide 44 text

Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07) e = · · · | (MSG⇥ e) e = · · · | (GSM ⇥ e) E = · · · | (MSG⇥ E) E = · · · | (GSM ⇥ E) ⌥S e : TST ⌥M (MSG⇥ e) : ⇤ ⌥M e : ⇤ ⌥S (GSM ⇥ e) : TST E[MSG n]M ⇥ E[n] E[MSG v]M ⇥ E[MSG (wrong “Non-number”)] v ⌅= n E[MSG⇥1⇥⇥2 ⇥x.e]M ⇥ E[⇥x : ⇤1.MSG⇥2 ((⇥x.e) (GSM ⇥1 x))] x not free in e E[MSG⇥1⇥⇥2 v]M ⇥ E[MSG⇥1⇥⇥2 wrong “Non-procedure”] v ⌅= ⇥x.e E[(GSM n)]S ⇥ E[n] E[(GSM ⇥1⇥⇥2 v)]S ⇥ E[(⇥x. (GSM ⇥2 (v (MSG⇥1 x))))] Figure 3. Extensions to figure 1 to form the simple natural embed- ding e = · · · | (⇥MSN e) e = · · · | (G⇥ e) | (SM ⇥ N e) E = · · · | (⇥MSN E) E = · · · | (G⇥ E) | (SM ⇥ N E) ⌥S e : TST ⌥S (G⇥ e) : TST ⌥S e : TST ⌥M (⇥MSN e) : ⇤ ⌥M e : ⇤ ⌥S (SM ⇥ N e) : ⇤ E[ MSN n]M ⇥ E[n] E[SMN n]S ⇥ E[n] E[⇥1⇥⇥2MSN ⇥x.e]M ⇥ E[⇥x : ⇤1.SM ⇥2 N ((⇥x.e) (SM ⇥1 N x))] E[(SM ⇥1⇥⇥2 N v)]S ⇥ E[(⇥x.(SM ⇥2 N (v (⇥1MSN x))))] E[(G n)]S ⇥ E[n] E[(G v)]S ⇥ E[wrong “Non-number”] (v ⌅= n) E[(G⇥1⇥⇥2 (⇥x.e))]S ⇥ E[(⇥x⇤.(G⇥2 ((⇥x.e)(G⇥1 x⇤))))] E[(G⇥1⇥⇥2 v)]S ⇥ E[wrong “Non-procedure”] (v ⌅= ⇥x.e) Figure 4. Extensions to figure 1 to form the separated-guards natural embedding guarded bounda with unguarded Theorem 3. Fo the following pr (1 (2 where ⇤ is obse In other wor boundaries with program, and th boundaries is eq guards and unc figure 3 is the sa 3.3 A further While the guard mentation based checks. For insta alent to (⇥x : The check perfo check performe the value is com that the convers We can refin essary checks. W written G⇥ + , tha negative guards, Scheme. Their r E[(G+ n)]S E[(G+ v)]S E[(G⇥1⇥⇥2 + v)]S E[(G⇥1⇥⇥2 + v)]S E[(G v)]S E[(G⇥1⇥⇥2 v)]S The function that result from

Slide 45

Slide 45 text

But let's not worry about understanding SWIG...

Slide 46

Slide 46 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations.

Slide 47

Slide 47 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs!

Slide 48

Slide 48 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++.

Slide 49

Slide 49 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++. Scripting is just this add-on... it's just scripting.

Slide 50

Slide 50 text

I don't want to write C++ CENSORED RANT No. I won't do it.

Slide 51

Slide 51 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea!

Slide 52

Slide 52 text

More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea! But I'm still bothered!

Slide 53

Slide 53 text

be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Interpreters will simply

Slide 54

Slide 54 text

- Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Extensions are still heavily tied to the C compiler - Generation of C code - Invocation of the C compiler (behind scenes) - Shared libraries - Dynamic loading - Other associated issues

Slide 55

Slide 55 text

Question : Could you get rid of the compiler?

Slide 56

Slide 56 text

Question : Could you get rid of the compiler? (if so, what would that look like?)

Slide 57

Slide 57 text

Maybe it looks like PyPy...

Slide 58

Slide 58 text

But, maybe it looks like LLVM...

Slide 59

Slide 59 text

Let's Hack! • LLVM (http://llvm.org) • LLVMPY (http://www.llvmpy.org)

Slide 60

Slide 60 text

The Future? This LLVM stuff is kind of fun... ... just the sort of thing to keep a physicist hacker busy.

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

from numba.decorators import jit from numba import int32 @jit(arg_types=[int32[:,:], int32[:,:]], ret_type=int32[:,:]) def filter2d(image, filt): M, N = image.shape Mf, Nf = filt.shape Mf2 = Mf // 2 Nf2 = Nf // 2 result = numpy.zeros_like(image) for i in range(Mf2, M - Mf2): for j in range(Nf2, N - Nf2): num = 0.0 for ii in range(Mf): for jj in range(Nf): num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj]) result[i, j] = num return result

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

Why must extensions be written in C/Fortran? LLVM

Slide 65

Slide 65 text

Haskell anyone? LLVM

Slide 66

Slide 66 text

If Python wanted to be bold... ...LLVM bindings in the standard library.