Slide 1

Mixing languages in scientific codes
David Bindel, 1 May 2017

Slide 2

Why mix languages?

Slide 3

Why mix languages? Language intrinsics
- CUDA for GPU kernels
- MATLAB (and Fortran) have built-in support for arrays and complex arithmetic
- Python, Perl, etc. have much nicer string support than C
- Easier to debug with strongly typed languages

Slide 4

Why mix languages? Libraries
- Low-level libraries often written in C
- LAPACK/BLAS written in Fortran (C bindings now standard)
- Most sparse solvers in Fortran or C
- Various giant libraries/frameworks: Boost (C++), PETSc (C), CGAL (C++), OpenFOAM (C++), SLICOT (Fortran), etc.

Slide 5

Why mix languages? Performance
- Lower-level languages (C/C++/Fortran) provide
  - Lower overhead for inner loops
  - More control over data layout
  - Optimizing compilers
- Higher-level languages provide
  - Interactivity (REPL)
  - Terser and more widely understood syntax
  - Simple library interfaces (?)

Slide 6

Common issues
- Which language "drives"?
- One language per process, or tighter integration?
- How are functions identified? How are parameters marshalled?
- How are data structures managed?
- How are failures/exceptions handled?
- How is the whole system developed (tools)?
- How is the whole system built/deployed?

Slide 7

Common patterns

Slide 8

Common patterns: Pipelines and pre/post-processors
- Unidirectional data flow (source / transform / sink)
- Stages communicate via pipes or intermediate files
- Each stage is a separate code (possibly in a different language)
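As a minimal sketch of the pipeline pattern, the Python code below chains three stages through OS pipes with `subprocess.Popen`. The stage programs (`SOURCE`, `TRANSFORM`, `SINK`) are hypothetical one-liners run by the current interpreter; in a real workflow each slot could be a separate executable written in any language.

```python
import subprocess
import sys

# Hypothetical stage programs: source emits 1..5, transform squares each
# value, sink sums them. Each one is a separate process.
SOURCE = "for i in range(1, 6): print(i)"
TRANSFORM = "import sys\nfor line in sys.stdin: print(int(line) ** 2)"
SINK = "import sys\nprint(sum(int(line) for line in sys.stdin))"


def run_pipeline():
    """Run SOURCE | TRANSFORM | SINK and return the sink's output."""
    src = subprocess.Popen([sys.executable, "-c", SOURCE],
                           stdout=subprocess.PIPE)
    xf = subprocess.Popen([sys.executable, "-c", TRANSFORM],
                          stdin=src.stdout, stdout=subprocess.PIPE)
    src.stdout.close()  # parent drops its copy so only xf holds the read end
    snk = subprocess.Popen([sys.executable, "-c", SINK],
                           stdin=xf.stdout, stdout=subprocess.PIPE, text=True)
    xf.stdout.close()
    out, _ = snk.communicate()  # wait for the sink and collect its output
    src.wait()
    xf.wait()
    return out.strip()


if __name__ == "__main__":
    print(run_pipeline())  # 55
```

Replacing any `Popen` call with an intermediate file (write it, then open it as the next stage's stdin) gives the file-based variant of the same pattern.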

Slide 9

Pre/post-processing example
- Finite element pre-processor: GUI input -> input deck
- Finite element engine: input deck -> deformation fields
- Post-processor / visualization: deformations -> picture

Slide 10

Design issues
- Protocol for passing records (or other info)
- Pipes vs intermediate files
- Naming of intermediate files
- Reporting on intermediate results
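One possible answer to the protocol and file-naming questions, sketched in Python under stated assumptions: records travel as JSON Lines (one JSON object per line), intermediate files follow a hypothetical `<stage>-<step>.jsonl` naming scheme, and each file is written under a temporary name and then renamed with `os.replace`. The rename means a downstream stage never reads a half-written file (it is atomic on POSIX when both names are on the same file system).

```python
import json
import os
import tempfile


def stage_path(workdir, stage, step):
    """Hypothetical naming scheme: <stage>-<step, zero-padded>.jsonl."""
    return os.path.join(workdir, f"{stage}-{step:04d}.jsonl")


def write_records(path, records):
    """Write one JSON object per line, then rename into place."""
    dirname = os.path.dirname(path) or "."
    # Temporary file in the same directory, so the rename stays on one
    # file system.
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
        os.replace(tmp, path)  # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp)  # never leave partial output behind
        raise


def read_records(path):
    """Read back a JSON Lines file, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

JSON Lines is only one choice; a fixed binary record layout is cheaper for large fields but harder to inspect when debugging a stage.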

Slide 11

Pros and cons
- Relatively easy to debug
- Best for batch computations
- I/O costs can add up
- Usually unidirectional communication
- A bit OS-dependent

Slide 12

Common patterns: Shell-outs and bidirectional pipes
- Main task opens child with a pipe (subprocess.Popen)
- Subsequent bidirectional communication across pipe
- Have to flush regularly!
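The flushing warning is easy to demonstrate. In this sketch the child process is a hypothetical stand-in for an engine that answers one request per line. Both sides flush after every message; otherwise data can sit in a buffer while the other side blocks in `readline()`, and the two processes deadlock.

```python
import subprocess
import sys

# Hypothetical child "engine": reads one number per line, replies with twice
# that number. It flushes after each reply so the parent is never left
# waiting on data stuck in the child's output buffer.
CHILD = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(f"{2 * float(line)}\n")
    sys.stdout.flush()
"""


def ask_child(values):
    """Send each value to the child and collect its replies, one at a time."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            text=True)
    replies = []
    try:
        for v in values:
            proc.stdin.write(f"{v}\n")
            proc.stdin.flush()  # the parent has to flush too
            replies.append(float(proc.stdout.readline()))
    finally:
        proc.stdin.close()  # EOF tells the child to exit
        proc.wait()
    return replies


if __name__ == "__main__":
    print(ask_child([1.5, -2, 10]))  # [3.0, -4.0, 20.0]
```

Removing either `flush()` call is a quick way to see the deadlock this slide warns about.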

Slide 13

Common patterns: Library wrapping
- Present language A interface to library in language B
- Usually involves "glue" code to
  - Convert input parameters
  - Handle bad inputs
  - Call the library function
  - Handle exceptions
  - Convert the return values
- More elaborate wrappers map higher-level
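Here is a minimal glue wrapper that follows the steps listed above. It uses Python's `ctypes` around the C math library's `sqrt` and assumes a POSIX system where that library can be found as `m`; the wrapper name `c_sqrt` is hypothetical. C reports a domain error through a NaN result (and `errno`) rather than an exception, so the glue translates that into a Python `ValueError`.

```python
import ctypes
import ctypes.util
import math

# Assumption: POSIX system with the C math library available as "m".
# If find_library fails, CDLL(None) falls back to symbols already loaded
# into the Python process, which normally include libm.
_name = ctypes.util.find_library("m")
_libm = ctypes.CDLL(_name) if _name else ctypes.CDLL(None)

# Declare the C signature double sqrt(double) so ctypes converts correctly.
_libm.sqrt.argtypes = [ctypes.c_double]
_libm.sqrt.restype = ctypes.c_double


def c_sqrt(x):
    """Glue: Python number in, Python float out, C errors as exceptions."""
    # Convert input parameters and handle bad inputs.
    try:
        x = float(x)
    except (TypeError, ValueError):
        raise TypeError(f"c_sqrt expects a real number, got {x!r}") from None
    # Call the library function.
    result = _libm.sqrt(x)
    # Handle errors: translate C's NaN domain-error signal into an exception.
    if math.isnan(result) and not math.isnan(x):
        raise ValueError(f"c_sqrt: domain error for {x!r}")
    # Convert the return value (restype already produced a Python float).
    return result
```

Interface generators automate exactly this boilerplate; writing one wrapper by hand shows what they generate.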

Slide 14

Interface generators
- Writing gateways to convert data is tedious!
- Use wrapper generators / interface generators
  - SWIG: Simplified Wrapper and Interface Generator
  - f2py: Fortran-to-Python
  - Cython
  - Rcpp
  - tolua
  - MWrap

Slide 15

Common issues
- Often need an extra layer for an "idiomatic" interface
- Zero-based or one-based indexing?
- Row-major or column-major array order?
- Hard to push complex data types across languages
  - Objects, callbacks: pass a handle vs. actual data
- Link libraries carefully (runtime support for the library's language)
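The indexing and layout questions come down to one address calculation. The hypothetical helper `offset` below computes the zero-based storage offset of element (i, j) in an m-by-n array stored either row-major (C, C++, NumPy's default) or column-major (Fortran, MATLAB), accepting either zero-based or one-based indices on input.

```python
def offset(i, j, shape, order="C", base=0):
    """Zero-based storage offset of element (i, j) in a contiguous 2-D array.

    order="C": row-major;  order="F": column-major.
    base=0 for zero-based indices (C, Python), 1 for one-based (Fortran, MATLAB).
    """
    m, n = shape
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' (row-major) or 'F' (column-major)")
    i0, j0 = i - base, j - base  # convert to zero-based
    if not (0 <= i0 < m and 0 <= j0 < n):
        raise IndexError(f"({i}, {j}) out of range for shape {shape}, base {base}")
    return i0 * n + j0 if order == "C" else j0 * m + i0


# The same element addressed from both sides of a C/Fortran boundary:
assert offset(0, 1, (2, 3), order="C") == 1          # C:       a[0][1]
assert offset(1, 2, (2, 3), order="F", base=1) == 2  # Fortran: a(1, 2)
```

For example, the buffer [1, 2, 3, 4, 5, 6] holds the 2-by-3 matrix [[1, 2, 3], [4, 5, 6]] in row-major order. Read column-major with the same shape it becomes the scrambled [[1, 3, 5], [2, 4, 6]]; read column-major as 3-by-2 it is exactly the transpose. That is why wrappers either copy the data or pass the transposed problem.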

Slide 16

Common patterns: Configuration and extension
- User extends base code with scripts
  - Game engine scripts
  - Editor scripts
  - Web browsers running JS
- Common embedded language choices
  - Lisp variants (Guile, Elisp)
  - Lua
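Here is a small sketch of the extension pattern, with Python standing in for both the host and the embedded language. A C/C++ host embedding Lua or Guile has the same shape: expose a narrow API, run the user's script, then call the hooks it registered. The `Host` class, its parameters, and the `"step_done"` event are hypothetical. `exec` is not a sandbox, so a script run this way can still loop forever or reach anything Python can.

```python
class Host:
    """Hypothetical base code that users extend with scripts."""

    def __init__(self):
        self.params = {"mesh_size": 0.1, "max_iter": 100}
        self.hooks = {}

    # API exposed to user scripts
    def set_param(self, name, value):
        if name not in self.params:
            raise KeyError(f"unknown parameter {name!r}")
        self.params[name] = value

    def on(self, event):
        """Decorator that registers a user callback for a named event."""
        def register(fn):
            self.hooks.setdefault(event, []).append(fn)
            return fn
        return register

    # Host side
    def load_script(self, source):
        # Only these names are predefined for the script. This is NOT a
        # security sandbox: Python builtins remain reachable.
        api = {"set_param": self.set_param, "on": self.on}
        exec(source, api)

    def fire(self, event, *args):
        """Call every hook registered for `event` and collect the results."""
        return [fn(*args) for fn in self.hooks.get(event, [])]


USER_SCRIPT = '''
set_param("mesh_size", 0.05)

@on("step_done")
def report(step, residual):
    return f"step {step}: residual {residual:.1e}"
'''

host = Host()
host.load_script(USER_SCRIPT)
print(host.params["mesh_size"])           # 0.05
print(host.fire("step_done", 3, 1.5e-6))  # ['step 3: residual 1.5e-06']
```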

Slide 17

Common issues
- Often want a domain-specific language (DSL)
- Easier not to build it yourself
  - Prevents evolving another half-baked Lisp
- Gives user the power to break things!
  - Cause infinite loops
  - Create security holes
  - ...

Slide 18

Common patterns: Front-end / back-end
- Front-end code (JavaScript, Python, ...)
  - Authentication and user profile
  - GUI for problem parameter setup
  - Result visualization
- Back-end code (C/C++, Fortran, ...)
  - Database engine
  - Computational kernels
  - Simulators

Slide 19

Front-end/back-end channels
- Single process with function calls
- Multiple co-located processes with IPC
  - TCP or Unix sockets
  - Pipes
- Processes across possibly different machines
  - Plain TCP sockets
  - Messaging systems (e.g. 0MQ)
  - Remote Procedure Call (RPC) interfaces
  - Web-based API (e.g. HTTP CRUD API)
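As one concrete instance of a socket channel, this sketch runs a hypothetical back-end that speaks a line-based JSON request/reply protocol. To keep the demo self-contained, the two ends come from `socket.socketpair()` and the back-end runs in a thread; across machines the same protocol would run over a TCP connection. The client sets a socket timeout and treats end-of-file as a dropped channel, two simple responses to the usual failure questions. The operation name `"norm2"` and the message fields are assumptions made for illustration.

```python
import json
import socket
import threading


def backend(conn):
    """Hypothetical compute back-end: one JSON request per line in,
    one JSON reply per line out, until the client closes the channel."""
    with conn, conn.makefile("r") as rf, conn.makefile("w") as wf:
        for line in rf:
            try:
                req = json.loads(line)
                if req["op"] != "norm2":
                    raise ValueError(f"unknown op {req['op']!r}")
                reply = {"ok": True,
                         "result": sum(x * x for x in req["args"]) ** 0.5}
            except (ValueError, KeyError, TypeError) as exc:
                # Report bad requests instead of letting the server die.
                reply = {"ok": False, "error": str(exc)}
            wf.write(json.dumps(reply) + "\n")
            wf.flush()


class Client:
    """Front-end side: turns method calls into request/reply messages."""

    def __init__(self, conn, timeout=5.0):
        conn.settimeout(timeout)  # a dead back-end raises instead of hanging
        self.conn = conn
        self.rf = conn.makefile("r")
        self.wf = conn.makefile("w")

    def call(self, op, *args):
        self.wf.write(json.dumps({"op": op, "args": list(args)}) + "\n")
        self.wf.flush()
        line = self.rf.readline()  # after a timeout, treat the channel as broken
        if not line:
            raise ConnectionError("back-end closed the channel")
        reply = json.loads(line)
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def close(self):
        self.rf.close()
        self.wf.close()
        self.conn.close()


if __name__ == "__main__":
    front, back = socket.socketpair()
    threading.Thread(target=backend, args=(back,), daemon=True).start()
    client = Client(front)
    print(client.call("norm2", 3.0, 4.0))  # 5.0
    client.close()
```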

Slide 20

Front-end/back-end issues
- Usual issues with protocol design/choice
- Dropped communication channel?
- Partial failure?

Slide 21

Common patterns: Legacy code front-end
- Idea: Glue a fancy front-end onto an old code
- Examples
  - AUTO 2000
  - MATFEAP and FEAPMEX
  - Misc commercial FEM codes
- Often involves front-end / back-end or pipeline style
- Challenge: Adapt legacy code with minor changes

Slide 22

Common patterns: Code generation
- Goal: High-level interface, low-level performance
  - ... in a restricted domain
- Technique: Write a code generator in a high-level language
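Here is a toy version of the technique. A Python function writes specialized source code for one fixed polynomial (Horner's rule, fully unrolled, with zero coefficients dropped), and the result is compiled at run time. Real generators such as FFTW's emit C and optimize far more aggressively. This sketch emits Python only so that it runs without a compiler, and the names `gen_poly_source` and `make_poly` are hypothetical.

```python
def gen_poly_source(coeffs, name="poly"):
    """Emit Python source for p(x) = coeffs[0] + coeffs[1]*x + ...
    using Horner's rule, fully unrolled, with zero coefficients dropped."""
    if not coeffs:
        raise ValueError("need at least one coefficient")
    expr = repr(float(coeffs[-1]))
    for c in reversed(coeffs[:-1]):
        # Specialization: a zero coefficient contributes no addition.
        expr = f"x * ({expr})" if c == 0 else f"{float(c)!r} + x * ({expr})"
    return f"def {name}(x):\n    return {expr}\n"


def make_poly(coeffs, name="poly"):
    """Generate the source, compile it at run time, and return the function."""
    namespace = {}
    exec(gen_poly_source(coeffs, name), namespace)
    return namespace[name]


print(gen_poly_source([1, 2, 3]), end="")
# def poly(x):
#     return 1.0 + x * (2.0 + x * (3.0))
```

The generator does its work once per polynomial, so the generated function carries no loop or coefficient lookups at call time; that is the trade the slide describes, in miniature.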

Slide 23

Common patterns: Code generation examples
- FFTW: OCaml generating C
- Numba: Python generating LLVM IR
- PyCUDA: Python generating CUDA
- SEJITS: Python generating C variants
- SPIRAL: Special-purpose language generating C
- Matexpr: Special-purpose language generating C/C++

Slide 24

Code generation issues
- Static or dynamic?
  - Dynamic / JIT: tune for new problems
  - Static: user does not need JIT infrastructure!
- External or internal DSL?
  - External: SPIRAL and company
  - Internal: Template metaprogramming, Python transformers
- Model-based optimizer or auto-tuning?
- What space of problems?
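Auto-tuning in miniature: generate several variants of one kernel, check each against a reference result, time them on a representative input with `timeit`, and keep the fastest. The kernel (a sum of squares) and the unroll factors are illustrative assumptions. Under CPython unrolling buys little, so the point here is the search loop rather than the speedup, and the winner may differ from run to run.

```python
import timeit


def gen_sumsq(unroll):
    """Generate a sum-of-squares kernel whose main loop is unrolled `unroll` times."""
    body = " + ".join(f"v[i+{k}]*v[i+{k}]" for k in range(unroll))
    src = (
        "def kernel(v):\n"
        "    n = len(v)\n"
        f"    m = n - n % {unroll}\n"
        "    s = 0.0\n"
        f"    for i in range(0, m, {unroll}):\n"
        f"        s += {body}\n"
        "    for i in range(m, n):\n"  # leftover elements
        "        s += v[i]*v[i]\n"
        "    return s\n"
    )
    namespace = {}
    exec(src, namespace)
    return namespace["kernel"]


def autotune(candidates, sample, repeat=3, number=20):
    """Time each variant on a representative input; return (unroll, kernel)."""
    reference = sum(x * x for x in sample)
    best = None
    for unroll in candidates:
        kernel = gen_sumsq(unroll)
        # Reject wrong variants before timing them (summation order differs,
        # so compare with a relative tolerance).
        if abs(kernel(sample) - reference) > 1e-9 * abs(reference):
            raise AssertionError(f"unroll={unroll} gives a wrong result")
        t = min(timeit.repeat(lambda: kernel(sample), repeat=repeat, number=number))
        if best is None or t < best[1]:
            best = (unroll, t, kernel)
    return best[0], best[2]
```

A model-based optimizer would instead predict the best variant without running them all; auto-tuning pays the search cost up front in exchange for not needing an accurate model.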

Slide 25

Case studies: TkBTG
- My first numerical code! (undergrad summer project, c. 1997)
- Organization
  - Tcl/Tk front-end
  - C++ engine (communication via Unix pipe)
  - Misc Fortran library codes

Slide 26

Case studies: LAPACK from C
- CLAPACK: C translation of LAPACK (via f2c)
  - Run through "polisher" to clean up interfaces
  - Still useful for systems with no Fortran compiler
  - But a pain to maintain!
- LAPACKE: C interfaces to LAPACK
  - Automatically generated wrappers around Fortran interfaces
  - No need to regenerate with successive versions

Slide 27

Case studies: SUGAR
- Initially
  - Custom device description language
  - MATLAB for solvers and models
- Eventually
  - Lua-based device description language
  - MATLAB and C/C++ solvers
  - MATLAB and C/C++ element models

Slide 28

Case studies: HiQLab (and others)
- Lua-based device description language
- C/C++ solvers and element models
- High-level simulation scripting in MATLAB or Lua

Slide 29

Case studies: MWrap
- MATLAB MEX files from decorated MATLAB code
- Involves a little language to specify type signatures
- Automatically checks inputs/outputs
- Automatically handles exceptions

Slide 30

Case studies: FEAPMEX and MATFEAP
- FEAP: An academic Finite Element Analysis Program
- FEAPMEX
  - MATLAB MEX interface to FEAP
  - Directly accesses FEAP data structures
  - Auto-generated "glue" code
- MATFEAP
  - Client-server interface to FEAP
  - FEAP stdin/stdout glued to socket or pipe