Mixing languages in scientific code

Different languages are favored by different computational scientists for good reason: Fortran for legacy and speed, C/C++ for large frameworks and fast kernels, Julia and MATLAB and Python and R for their array syntax and ease of use. But when your simulation framework is in C++, you want to call some old solvers written in Fortran, and the student working with you mostly knows Python or MATLAB, how can you manage it all? In this talk, we discuss some techniques and challenges of mixed language programming, with a particular emphasis on “front-end / back-end” combinations where high-level logic is written in a language like MATLAB or Python and the computational kernels are written in a compiled language.

David Bindel

May 02, 2017

Transcript

  1. Why mix languages? Language intrinsics
     — CUDA for GPU kernels
     — MATLAB (and Fortran) support arrays, complex arithmetic
     — Python, Perl, etc. have much nicer string support than C
     — Easier to debug with strongly typed languages
  2. Why mix languages? Libraries
     — Low-level libraries often written in C
     — LAPACK/BLAS written in Fortran (C bindings now standard)
     — Most sparse solvers in Fortran or C
     — Various giant libraries/frameworks: Boost (C++), PETSc (C), CGAL (C++), OpenFOAM (C++), SLICOT (Fortran), etc.
  3. Why mix languages? Performance
     — Lower-level languages (C/C++/Fortran) provide
         — Lower overhead for inner loops
         — More control over data layout
         — Optimizing compilers
     — Higher-level languages provide
         — Interactivity (REPL)
         — Terser and more widely understood syntax
         — Simple library interfaces (?)
  4. Common issues
     — Which language "drives"?
     — One language per process, or tighter integration?
     — How are functions identified? Parameters marshalled?
     — How are data structures managed?
     — How are failures/exceptions handled?
     — How is the whole system developed (tools)?
     — How is the whole system built/deployed?
  5. Common patterns: Pipelines and pre/post-processors
     — Unidirectional data flow (source / transform / sink)
     — Stages communicate via pipes or intermediate files
     — Each stage is a separate code (possibly different languages)
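
The source / transform / sink arrangement on this slide can be sketched as three separate processes joined by pipes. This is a minimal illustration, not code from the talk: the three stage scripts are hypothetical stand-ins written as inline Python, and in practice each stage could be a program in a different language that reads stdin and writes stdout.

```python
import subprocess
import sys

# Hypothetical stage programs. Each one only reads stdin and writes stdout,
# so any of them could be replaced by a Fortran or C++ executable.
SOURCE = "for i in range(1, 6): print(i)"
TRANSFORM = "import sys\nfor line in sys.stdin: print(int(line) ** 2)"
SINK = "import sys\nprint(sum(int(line) for line in sys.stdin))"


def run_pipeline():
    """Run source | transform | sink and return the sink's output."""
    src = subprocess.Popen([sys.executable, "-c", SOURCE],
                           stdout=subprocess.PIPE)
    xform = subprocess.Popen([sys.executable, "-c", TRANSFORM],
                             stdin=src.stdout, stdout=subprocess.PIPE)
    src.stdout.close()    # the transform stage now owns this pipe end
    sink = subprocess.Popen([sys.executable, "-c", SINK],
                            stdin=xform.stdout, stdout=subprocess.PIPE,
                            text=True)
    xform.stdout.close()  # the sink stage now owns this pipe end
    out, _ = sink.communicate()
    src.wait()
    xform.wait()
    return out.strip()


if __name__ == "__main__":
    print(run_pipeline())  # sum of the squares of 1..5
```

Data flows in one direction only, so each stage can be tested on its own by feeding it a file on stdin.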
  6. Pre/post-processing example
     — Finite element pre-processor: GUI input -> input deck
     — Finite element engine: input deck -> deformation fields
     — Post-processor / visualization: deformations -> picture
  7. Design issues
     — Protocol for passing records (or other info)
     — Pipes vs intermediate files
     — Naming of intermediate files
     — Reporting on intermediate results
  8. Pros and cons
     — Relatively easy to debug
     — Best for batch computations
     — I/O costs can add up
     — Usually unidirectional communication
     — A bit OS dependent
  9. Common patterns: Shell-outs and bidirectional pipes
     — Main task opens child with a pipe (subprocess.Popen)
     — Subsequent bidirectional communication across pipe
     — Have to flush regularly!
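
A minimal sketch of the shell-out pattern with `subprocess.Popen`, assuming a hypothetical child program that squares each number it reads. The child here is an inline Python script for the sake of a self-contained example; the flush calls on both sides are the point of the illustration.

```python
import subprocess
import sys

# Hypothetical child: reads one number per line, replies with its square.
CHILD = r"""
import sys
for line in sys.stdin:
    print(float(line) ** 2)
    sys.stdout.flush()  # without this the reply can sit in a buffer
"""


def square_via_child(values):
    """Send each value to the child process and collect its replies."""
    child = subprocess.Popen([sys.executable, "-c", CHILD],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             text=True)
    results = []
    try:
        for v in values:
            child.stdin.write(f"{v}\n")
            child.stdin.flush()  # push the request out before waiting
            results.append(float(child.stdout.readline()))
    finally:
        child.stdin.close()      # EOF tells the child to exit
        child.stdout.close()
        child.wait()
    return results


if __name__ == "__main__":
    print(square_via_child([1, 2, 3]))
```

If either side forgets to flush, both processes can end up waiting on each other, which is the usual failure mode of this pattern.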
  10. Common patterns: Library wrapping
     — Present language A interface to library in language B
     — Usually involves "glue" code to
         — Convert input parameters
         — Handle bad inputs
         — Call the library function
         — Handle exceptions
         — Convert the return values
     — More elaborate wrappers map higher-level
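
The glue steps listed on this slide can be sketched with Python's `ctypes` wrapping the C math library's `hypot`. This is one possible illustration, not the approach used in the talk. The wrapper name `c_hypot` is hypothetical, and the library lookup assumes a POSIX system: if `find_library` returns `None`, `CDLL(None)` falls back to symbols already loaded into the process.

```python
import ctypes
import ctypes.util
import math

# Assumption: POSIX system where libm (or the process itself) exports hypot.
_libm = ctypes.CDLL(ctypes.util.find_library("m"))
_libm.hypot.argtypes = [ctypes.c_double, ctypes.c_double]
_libm.hypot.restype = ctypes.c_double


def c_hypot(x, y):
    """Python-facing wrapper around C's hypot(double, double)."""
    # Convert input parameters and handle bad inputs.
    try:
        x, y = float(x), float(y)
    except (TypeError, ValueError) as exc:
        raise TypeError("c_hypot expects two real numbers") from exc
    # Call the library function.
    result = _libm.hypot(x, y)
    # Turn the C convention (infinite return value) into a Python exception.
    if math.isinf(result) and not (math.isinf(x) or math.isinf(y)):
        raise OverflowError("hypot result out of range")
    # The return value is already converted to a Python float by ctypes.
    return result


if __name__ == "__main__":
    print(c_hypot(3, 4))
```

Declaring `argtypes` and `restype` is what keeps the call from silently passing or returning the wrong C types.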
  11. Interface generators
     — Writing gateways to convert data is tedious!
     — Use wrapper generators / interface generators
         — SWIG: Simplified Wrapper and Interface Generator
         — f2py: Fortran-to-Python
         — Cython
         — Rcpp
         — tolua
         — MWrap
  12. Common issues
     — Often need an extra layer for "idiomatic" interface
     — Zero-based or one-based indexing?
     — Row-major or column-major array order?
     — Hard to push complex data types across languages
     — Objects, callbacks: pass a handle vs actual data
     — Careful linking libraries (runtime support for library language)
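
The indexing and array-order questions on this slide come down to how an (i, j) pair maps to an offset in a flat buffer. The helper names below are hypothetical, and the sketch uses plain Python lists so that it runs without any array library.

```python
def flat_index(i, j, nrows, ncols, order="C", base=0):
    """Offset of element (i, j) in a flat buffer.

    order="C" is row-major (C, Python); order="F" is column-major
    (Fortran, MATLAB). base is 0 or 1 depending on the caller's convention.
    """
    i -= base
    j -= base
    if not (0 <= i < nrows and 0 <= j < ncols):
        raise IndexError("index out of range")
    if order == "C":
        return i * ncols + j
    if order == "F":
        return j * nrows + i
    raise ValueError("order must be 'C' or 'F'")


def to_column_major(rows):
    """Flatten a list of rows into a column-major buffer."""
    nrows, ncols = len(rows), len(rows[0])
    return [rows[i][j] for j in range(ncols) for i in range(nrows)]


if __name__ == "__main__":
    A = [[1, 2, 3],
         [4, 5, 6]]
    buf = to_column_major(A)
    # One-based, column-major lookup of the element Python calls A[1][2].
    print(buf[flat_index(2, 3, 2, 3, order="F", base=1)])
```

A wrapper that hides these conversions is usually the extra layer needed to make the interface feel idiomatic on the calling side.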
  13. Common patterns: Configuration and extension
     — User extends base code with scripts
         — Game engine scripts
         — Editor scripts
         — Web browsers running JS
     — Common embedded language choices
         — Lisp variants (GUILE, Elisp)
         — Lua
  14. Common issues
     — Often want a domain-specific language (DSL)
     — Easier not to build it yourself
     — Prevents evolving another half-baked Lisp
     — Gives user the power to break things!
         — Cause infinite loops
         — Create security holes
         — ...
  15. Common patterns: Front-end / back-end
     — Front-end code (JavaScript, Python, ...)
         — Authentication and user profile
         — GUI for problem parameter setup
         — Result visualization
     — Back-end code (C/C++, Fortran, ...)
         — Database engine
         — Computational kernels
         — Simulators
  16. Front-end/back-end channels
     — Single process with function calls
     — Multiple co-located processes with IPC
         — TCP or Unix sockets
         — Pipes
     — Processes across possibly different machines
         — Plain TCP sockets
         — Messaging systems (e.g. 0MQ)
         — Remote Procedure Call (RPC) interfaces
         — Web-based API (e.g. HTTP CRUD API)
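
As a rough sketch of a socket channel between a front end and a back end, the example below runs a line-oriented request/reply protocol over a connected socket pair. Everything here is an assumption made for illustration: the protocol (numbers in, their sum out) is invented, and a thread stands in for what would normally be a separate back-end process.

```python
import socket
import threading


def backend(conn):
    """Stand-in compute kernel: reply to each request line with its sum."""
    with conn, conn.makefile("r") as rfile, conn.makefile("w") as wfile:
        for line in rfile:
            total = sum(float(tok) for tok in line.split())
            wfile.write(f"{total}\n")
            wfile.flush()


def request_sum(rfile, wfile, numbers):
    """Front-end side: send one request and wait for the reply."""
    wfile.write(" ".join(str(x) for x in numbers) + "\n")
    wfile.flush()
    return float(rfile.readline())


def demo():
    front, back = socket.socketpair()
    worker = threading.Thread(target=backend, args=(back,))
    worker.start()
    with front, front.makefile("r") as rfile, front.makefile("w") as wfile:
        result = request_sum(rfile, wfile, [1.5, 2.5, 3.0])
    worker.join()  # closing the front end gives the back end EOF
    return result


if __name__ == "__main__":
    print(demo())
```

Swapping the socket pair for a TCP connection would leave `backend` and `request_sum` unchanged, which is one reason a text protocol over a stream is a common first choice.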
  17. Common patterns: Legacy code front-end
     — Idea: Glue a fancy front-end onto an old code
     — Examples
         — AUTO 2000
         — MATFEAP and FEAPMEX
         — Misc commercial FEM codes
     — Often involves front-end / back-end or pipeline style
     — Challenge: Adapt legacy code with minor changes
  18. Common patterns: Code generation
     — Goal: High-level interface, low-level performance
         — ... in a restricted domain
     — Technique: Write a code generator in a high-level language
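
A toy illustration of the technique: a Python generator that emits source code for a polynomial with fixed, finite coefficients, unrolled in Horner form. The function names are hypothetical. Real generators typically emit C or another compiled language; emitting Python here only keeps the sketch self-contained.

```python
def generate_poly_source(coeffs, name="poly"):
    """Return source for a function evaluating a fixed polynomial.

    coeffs lists the coefficients from highest degree to lowest and must
    be non-empty; the loop is unrolled at generation time.
    """
    expr = repr(float(coeffs[0]))
    for c in coeffs[1:]:
        expr = f"({expr}) * x + {float(c)!r}"
    return f"def {name}(x):\n    return {expr}\n"


def compile_poly(coeffs, name="poly"):
    """Generate the specialized source and turn it into a callable."""
    namespace = {}
    exec(generate_poly_source(coeffs, name), namespace)
    return namespace[name]


if __name__ == "__main__":
    print(generate_poly_source([2, -3, 1]))  # 2x^2 - 3x + 1
    p = compile_poly([2, -3, 1])
    print(p(2.0))
```

The restricted domain is what makes this tractable: the generator only has to understand polynomials, not general programs.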
  19. Common patterns: Code generation examples
     — FFTW: OCaml generating C
     — Numba: Python generating LLVM
     — PyCUDA: Python generating CUDA
     — SEJITS: Python generating C variants
     — SPIRAL: Special generating C
     — Matexpr: Special generating C/C++
  20. Code generation issues
     — Static or dynamic?
         — Dynamic / JIT: tune for new problems
         — Static: user does not need JIT infrastructure!
     — External or internal DSL?
         — External: SPIRAL and company
         — Internal: Template meta-programming, Python transformers
     — Model-based optimizer or auto-tuning?
     — What space of problems?
  21. Case studies: TkBTG
     — My first numerical code! (undergrad summer project, c. 1997)
     — Organization
         — Tcl/Tk front-end
         — C++ engine (communication via Unix pipe)
         — Misc Fortran library codes
  22. Case studies: LAPACK from C
     — CLAPACK: C translation of LAPACK (via f2c)
         — Run through "polisher" to clean up interfaces
         — Still useful for systems with no Fortran compiler
         — But a pain to maintain!
     — LAPACKE: C interfaces to LAPACK
         — Automatically generated wrappers around Fortran interfaces
         — No need to regenerate with successive versions
  23. Case studies: SUGAR
     — Initially
         — Custom device description language
         — MATLAB for solvers and models
     — Eventually
         — Lua-based device description language
         — MATLAB and C/C++ solvers
         — MATLAB and C/C++ element models
  24. Case studies: HiQLab (and others)
     — Lua-based device description language
     — C/C++ solvers and element models
     — High-level simulation scripting in MATLAB or Lua
  25. Case studies: MWrap
     — MATLAB MEX files from decorated MATLAB code
     — Involves a little language to specify type signatures
     — Automatically checks inputs/outputs
     — Automatically handles exceptions
  26. Case studies: FEAPMEX and MATFEAP
     — FEAP: An academic Finite Element Analysis Program
     — FEAPMEX
         — MATLAB MEX interface to FEAP
         — Directly accesses FEAP data structures
         — Auto-generated "glue" code
     — MATFEAP
         — Client-server interface to FEAP
         — FEAP stdin/stdout glued to socket or pipe