Rethinking Extension Programming

Rethinking Extension Programming

Invited Keynote. EuroSciPy 2012. Brussels. Talk was recorded, but the video seems to be lost to the sands of time. A shame as I live-coded some LLVM examples. Contact me if its ever found.

70c42f4cf225f1455a7e01379bbd4d48?s=128

David Beazley

August 25, 2012
Tweet

Transcript

  1. Rethinking Extension Programming David Beazley (@dabeaz) EuroSciPy August 25, 2012

  2. "Much of scientific programming is exploratory in nature, and for

    that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments."
  3. "Much of scientific programming is exploratory in nature, and for

    that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations. More computationally intensive programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997)
  4. NUMERICAL PYTHON Paul F. Dubois, Konrad Hinsen, and James Hugunin

    Department Editor: Paul F. Dubois duboisl@llnl.gov P)1hon is a small and easy-la-learn hmgl!age with surpris- ing capabilities. It is an interpreted objecl-orienT.ed script- ing language and has a full range of sophisticated features such as first-class functions, garbage collection, and excep- tion handling. Python has properties that m"ke it especially appealing for scientific programming: Python is quite simple and easy to learn, but it is a full and complete language. It is simple to extend Python with your own compiled objects and functions. Python is portable, from Unix to Windows 95 to Linux to Macintosh. Python is free, with no license required e'/en if you make a commercial product out of it. _ Python has a large user-contributed library of "modules" These modules cover a wide variety of needs, such as audio. benchmarks for the basic language and the numerical extension are available on the Python Web site (http://www. python.org). The numerical extension is still in beta test and may therefore change slightly from this description. In par- ticular, the beta-test period is needed to sort out some controversies in naming and coercion rules. But with this tutorial as a start and the latest readme file for the numerical extension, you should be able to start using it. Note that you will need to add the extension to the Python source; every effort is made to keep a "minimal Python" as small as possible, as Python is being used for applications where a small size is important. Python is extremely well suited to the development of programmable ap- plications, as has been advocated on these pages (CiP S:I, 1994, p. 70). It has a scripting language as the user interface and compiled code for the compute-in- tensive portions. Introducing Python Python is an interpreter. You can either enter commands directly into the interpreter Of,more commonly, create a file containing a script. On Unix, you can invoke Python with the script as the first argument, or you can use the llsual trick of starting the script with a comment like this: # !/usr/localibinlpython Then you give execute permission to the script file. When you execute it, the Python interpreter is invoked on the script file itself. Since the above line is a comment as far as Python is concerned, it is then ignored. w ex th by 80 lis so us tw > { A l obj f The fi In m tupl fir 2 COMPUTERS IN PHYSICS, VOL. 10,NO.3. MAYIJUN 1996 © 1996 AMERICAN INSTITUTE OF PHYSICS 0894_1866/96110(3)126216/$10.00
  5. None
  6. "Imagine a physics program written in Fortran, whose user interface

    is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on."
  7. "Imagine a physics program written in Fortran, whose user interface

    is a programming language (not Fortran, of course, but imagine something similar). In this programming language you can create variables, assign values, loop, if- test, define functions, and so on." 1994
  8. 1994

  9. 1994 Awesome!

  10. Cool pictures, but all was not well...

  11. Batch Scripting

  12. Testing Hell

  13. Testing Hell

  14. Testing Hell Well, does it conserve energy?

  15. Horrible Workflow Batch job

  16. Horrible Workflow Batch job 3-6 hours Data Files

  17. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB)
  18. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s)
  19. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) 4 GB NFS
  20. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS
  21. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours
  22. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak
  23. Horrible Workflow Batch job 3-6 hours repeat Data Files (10-20

    GB) FTP (10 Mbit/s) walking 4 GB NFS hours Tweak Unworkable!
  24. 1995 (I was in the first year of a computer

    science Ph.D.)
  25. 1995 Compiler hacking!

  26. 1995 Functional Programming, Type systems.

  27. 1995 Networks (sockets, RPC, etc.)

  28. 1995 A distracted advisor!

  29. A Diabolical Convergence 1995

  30. A REPL is born! 1995

  31. 1995

  32. 1995 Mother of Swig! (Yes)

  33. You did what?!?!?!? NO! NO! NO! NO! NO! NO! NO!

    (Go look at TCL)
  34. SWIG project unleashed 1996

  35. 15 of 35 Scriptable Scientific Software beazley@cs.uchicago.edu SWIG (Simplified Wrapper

    and Interface Generator) • SWIG is a compiler that automatically constructs scripting interfaces given ANSI C/C++ declarations. • Can be used to quickly build scripting interfaces from header files. • Avoids the problem of writing wrapper code by hand. • Disclaimer : SWIG is not the only approach. %module SPaSM %{ #include "SPaSM.h" %} void memory(int natoms); void geometry(double ... void processors(int ... void set_boundary_period ... >>> from SPaSM import * >>> memory(20000) >>> geometry(0,0,0,80,80, 160,2.0) ... SWIG SWIG Interface file Python SWIG: C declarations to Python wrappers. (Details don't really matter here)
  36. 1996 It was awesome! (and radical) Sample Session

  37. T1 * 3 != T3

  38. None
  39. (A finished dissertation is a good dissertation)

  40. Swig lives on! New release! (Aug 20, 2012)

  41. But, I've long been bothered... What is Swig exactly? -

    A gross hack? - A programming languages project? - A software engineering project? - A good idea? - A bad idea? - Insanity? Nobody on my Ph.D. committee quite knew. (and neither did I)
  42. It's not easy to wrap your brain around it... Example:

    typemaps %define Name ## _input_binary(TYPEMAP, SIZE) %typemap(in,noblock=1,fragment=#SWIG_AsCharPtrAndSize) (TYPEMAP, SIZE) (int res, Char *buf = 0, size_t size = 0, int alloc = 0) { res = SWIG_AsCharPtrAndSize($input, &buf, &size, &alloc); if (!SWIG_IsOK(res)) { %argument_fail(res, "(TYPEMAP, SIZE)", $symname, $argnum); } $1 = ($1_ltype) buf; $2 = ($2_ltype) size - 1; } %typemap(freearg,noblock=1,match="in") (TYPEMAP, SIZE) { if (alloc$argnum == SWIG_NEWOBJ) %delete_array(buf$argnum); } %enddef Don't ask me--I don't understand it either
  43. Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler

    University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07)
  44. Operational Semantics for Multi-Language Programs Jacob Matthews Robert Bruce Findler

    University of Chicago {jacobm, robby}@cs.uchicago.edu Abstract Inter-language interoperability is big business, as the success of Mi- crosoft’s .NET and COM and Sun’s JVM show. Programming lan- guage designers are designing programming languages that reflect that fact — SML#, Mondrian, and Scala, to name just a few ex- amples, all treat interoperability with other languages as a central design feature. Still, current multi-language research tends not to focus on the semantics of interoperation features, but only on how to implement them efficiently. In this paper, we take first steps to- ward higher-level models of interoperating systems. Our technique abstracts away the low-level details of interoperability like garbage collection and representation coherence, and lets us focus on se- mantic properties like type-safety and observable equivalence. Beyond giving simple expressive models that are natural com- positions of single-language models, our studies have uncovered several interesting facts about interoperability. For example, higher- order contracts naturally emerge as the glue to ensure that inter- operating languages respect each other’s type systems. While we present our results in an abstract setting, they shed light on real multi-language systems and tools such as the JNI, SWIG, and Haskell’s stable pointers. Categories and Subject Descriptors D.3.1 [Programming Lan- guages]: Formal Definitions and Theory—Semantics General Terms Languages, theory for the popular wrapper generator SWIG [4]), these new foreign function interfaces are built to allow high-level, safe languages to interoperate with other high-level, safe languages, such as Python with Scheme [32] and Lua with OCaml [38]. Since these embeddings are driven by practical concerns, the research that accompanies them rightly focuses on the bits and bytes of interoperability — how to represent data in memory, how to call a foreign function efficiently, and so on. But an important theoretical problem arises, independent of these implementation- level concerns: how can we reason formally about multi-language programs? This is a particularly important question for systems that involve typed languages, because we have to show that the embeddings respect their constituents’ type systems. In this paper we present a simple method for giving operational semantics to multi-language systems. Our models are rich enough to support a wide variety of multi-language embedding strategies, and powerful enough that we have been able to use them for type soundness and contextual equivalence proofs. Our technique is based on simple constructs we call boundaries, cross-language casts that regulate both control flow and value conversion between languages. We introduce boundaries through a series of operational semantics in which we combine a simple ML-like language with a simple Scheme-like language. In section 2, we introduce those two constituent languages for- mally and connect them using a primitive embedding where values in one language are opaque to the other. In section 3, we enrich People have tried... (POPL'07) e = · · · | (MSG⇥ e) e = · · · | (GSM ⇥ e) E = · · · | (MSG⇥ E) E = · · · | (GSM ⇥ E) ⌥S e : TST ⌥M (MSG⇥ e) : ⇤ ⌥M e : ⇤ ⌥S (GSM ⇥ e) : TST E[MSG n]M ⇥ E[n] E[MSG v]M ⇥ E[MSG (wrong “Non-number”)] v ⌅= n E[MSG⇥1⇥⇥2 ⇥x.e]M ⇥ E[⇥x : ⇤1.MSG⇥2 ((⇥x.e) (GSM ⇥1 x))] x not free in e E[MSG⇥1⇥⇥2 v]M ⇥ E[MSG⇥1⇥⇥2 wrong “Non-procedure”] v ⌅= ⇥x.e E[(GSM n)]S ⇥ E[n] E[(GSM ⇥1⇥⇥2 v)]S ⇥ E[(⇥x. (GSM ⇥2 (v (MSG⇥1 x))))] Figure 3. Extensions to figure 1 to form the simple natural embed- ding e = · · · | (⇥MSN e) e = · · · | (G⇥ e) | (SM ⇥ N e) E = · · · | (⇥MSN E) E = · · · | (G⇥ E) | (SM ⇥ N E) ⌥S e : TST ⌥S (G⇥ e) : TST ⌥S e : TST ⌥M (⇥MSN e) : ⇤ ⌥M e : ⇤ ⌥S (SM ⇥ N e) : ⇤ E[ MSN n]M ⇥ E[n] E[SMN n]S ⇥ E[n] E[⇥1⇥⇥2MSN ⇥x.e]M ⇥ E[⇥x : ⇤1.SM ⇥2 N ((⇥x.e) (SM ⇥1 N x))] E[(SM ⇥1⇥⇥2 N v)]S ⇥ E[(⇥x.(SM ⇥2 N (v (⇥1MSN x))))] E[(G n)]S ⇥ E[n] E[(G v)]S ⇥ E[wrong “Non-number”] (v ⌅= n) E[(G⇥1⇥⇥2 (⇥x.e))]S ⇥ E[(⇥x⇤.(G⇥2 ((⇥x.e)(G⇥1 x⇤))))] E[(G⇥1⇥⇥2 v)]S ⇥ E[wrong “Non-procedure”] (v ⌅= ⇥x.e) Figure 4. Extensions to figure 1 to form the separated-guards natural embedding guarded bounda with unguarded Theorem 3. Fo the following pr (1 (2 where ⇤ is obse In other wor boundaries with program, and th boundaries is eq guards and unc figure 3 is the sa 3.3 A further While the guard mentation based checks. For insta alent to (⇥x : The check perfo check performe the value is com that the convers We can refin essary checks. W written G⇥ + , tha negative guards, Scheme. Their r E[(G+ n)]S E[(G+ v)]S E[(G⇥1⇥⇥2 + v)]S E[(G⇥1⇥⇥2 + v)]S E[(G v)]S E[(G⇥1⇥⇥2 v)]S The function that result from
  45. But let's not worry about understanding SWIG...

  46. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply be fast enough for most such calculations.
  47. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs!
  48. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++.
  49. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) SWIG flips it: Interpreted environments are extensions of compiled programs! ...or "real" programs are written in C++. Scripting is just this add-on... it's just scripting.
  50. I don't want to write C++ CENSORED RANT No. I

    won't do it.
  51. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea!
  52. More computationally intensive programs will be written as extensions of

    interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) So maybe people using ctypes, Cython, etc. have the right idea! But I'm still bothered!
  53. be fast enough for most such calculations. More computationally intensive

    programs will be written as extensions of interpreted environments." - Paul Dubois (Computers in Physics, Mar. 1997) "Much of scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Interpreters will simply
  54. - Paul Dubois (Computers in Physics, Mar. 1997) "Much of

    scientific programming is exploratory in nature, and for that sort of programming the use of compiled languages will cease. Interpreters will simply Extensions are still heavily tied to the C compiler - Generation of C code - Invocation of the C compiler (behind scenes) - Shared libraries - Dynamic loading - Other associated issues
  55. Question : Could you get rid of the compiler?

  56. Question : Could you get rid of the compiler? (if

    so, what would that look like?)
  57. Maybe it looks like PyPy...

  58. But, maybe it looks like LLVM...

  59. Let's Hack! • LLVM (http://llvm.org) • LLVMPY (http://www.llvmpy.org)

  60. The Future? This LLVM stuff is kind of fun... ...

    just the sort of thing to keep a physicist hacker busy.
  61. None
  62. from numba.decorators import jit from numba import int32 @jit(arg_types=[int32[:,:], int32[:,:]],

    ret_type=int32[:,:]) def filter2d(image, filt): M, N = image.shape Mf, Nf = filt.shape Mf2 = Mf // 2 Nf2 = Nf // 2 result = numpy.zeros_like(image) for i in range(Mf2, M - Mf2): for j in range(Nf2, N - Nf2): num = 0.0 for ii in range(Mf): for jj in range(Nf): num += (filt[Mf-1-ii, Nf-1-jj] * image[i-Mf2+ii, j-Nf2+jj]) result[i, j] = num return result
  63. None
  64. Why must extensions be written in C/Fortran? LLVM

  65. Haskell anyone? LLVM

  66. If Python wanted to be bold... ...LLVM bindings in the

    standard library.