C/Elixir Interop

Evadne Wu
October 26, 2016

libVIPS/Phoenix Case Study

Transcript

  1. C/Elixir Interop: A libVIPS/Phoenix Case Study
    Evadne Wu, Head of Exam Systems, Faria Education Group
    evadne@managebac.com / @evadne
    Last updated 26 October 2016
  2. Takeaway
    • Learn how you can embed your C program in Elixir
    • Investigate various integration approaches
    • Get a proper project that does what it says
  3. None
  4. Structure
    1. Overview
    2. Requirements
    3. Solutions Assessment
    4. Single-Solution Deep Dive
    5. Demo
    6. Observations / Q&A
  5. Overview
    • If you’ve written a web application, chances are that you’ll need to generate thumbnails.
    • If you have a Phoenix application, you’d probably like to generate thumbnails within your application so everything is in the same place.
    • You’d probably want a solution that just works but is also quite performant.
  6. Requirements
    • Generate thumbnails from Elixir/Erlang in an Erlang-like manner (i.e. one that isolates faults, is fast enough and does not do strange things).
  7. Solutions Assessment
    1. Fork: spawn a process with arguments, wait for completion.
    2. Daemon: swap messages with a long-running daemon.
    3. NIF: run C code in BEAM directly.
    4. Pattern Match: implement scaling code in Erlang/Elixir directly.
    5. Persistent C Server: swap messages with a supervised process.
  8. Solution 1: Fork
    • Fork an OS process to generate one thumbnail at a time.
    • Start a child process with appropriate arguments.
    • Wait for the child process to finish.
    • Look at what the child process has sent.
  9. Fork: Characteristics
    • Assembly Required: No implicit flow control or resource cap.
    • Safe: Crashes isolated to external OS processes; resource cleanup done by OS.
    • Slow: Code/data needs to be reloaded on each run.
    • Expensive: Bigger servers, smaller conference budget.
  10. Fork: Good Bits
    • A simple forked process which exits and returns results at the same time is very easy to reason about. This can be attractive when you do not have a concurrency requirement.
    • Thorough cleanup is almost guaranteed upon process exit.
  11. Forking: Bad Bits
    • Forking is quite bad if your process needs to first load data into memory, or has a heavy initialisation process.
    • You may create a fork bomb if multiple forks can happen concurrently and there is no safeguard.
    • You will most likely need a timeout.
  12. Forking: Example
    System.cmd "mogrify",
      arguments(image, output_path),
      stderr_to_stdout: true
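
The timeout mentioned under "Forking: Bad Bits" is not something `System.cmd/3` provides by itself. A minimal sketch of one way to add it, assuming a hypothetical `ForkWithTimeout` module (not from the talk) that runs the command inside a `Task` and stops waiting after `timeout_ms`:

```elixir
defmodule ForkWithTimeout do
  # Hypothetical helper, not from the talk: runs `cmd` with `args` in a
  # separate BEAM process and gives up waiting after `timeout_ms`.
  def run(cmd, args, timeout_ms) do
    task = Task.async(fn -> System.cmd(cmd, args, stderr_to_stdout: true) end)

    # Task.yield/2 returns nil on timeout; Task.shutdown/2 then stops the task
    # (or returns its reply if it finished in the meantime).
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, {output, 0}} -> {:ok, output}
      {:ok, {output, status}} -> {:error, {:exit_status, status, output}}
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

Killing the task closes the port to the child, but the OS process is not necessarily killed with it; a child that never notices its pipes closing may linger, which is part of the "assembly required" cost of this approach.
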

  13. Solution 2: Daemon
    • Either implement a daemon for your code, or find a project that has one and use that.
    • The daemon will field your requests either over a port directly or via forked child processes that pass messages.
    • Some daemons may even have concurrency support.
  14. Daemon: Characteristics
    • Faster Per-File Processing Times: No need to reload data on each call.
    • Less Memory Pressure: Possible to share some memory among all processes.
    • Faults Isolated: Crashes isolated to an external OS process and its children. Possible to have an OS-level process manager restart the daemon(s).
    • Multiple Failure Modes: Errors can propagate and cause grief because the daemon is probably not written in Erlang.
  15. Daemon: Good Bits
    • ClamAV, a popular open-source virus scanner project, has two variants: it can run as a daemon which then accepts work, or it can be run standalone. In practice the daemon scans a file about 10 to 100 times faster because it does not have to repeatedly load virus definitions.
    • This is an example of a proper daemon not written in Erlang (and you can still supervise a daemon using Erlang).
  16. Daemon: Bad Bits
    • It makes no sense to implement half of Erlang in another language. It takes longer to do that than to learn Erlang.
    • If you do not have the daemon supervised by your application, you will not have a common root for all activities and that leads to madness.
    • You need to find a way to send a message to a daemon. You may need to make a binary/text interface, or you may need to take the hit of forking something that does that for you. Either way it is a lot of work.
  17. Solution 3: NIF
    • Write your code in C.
    • Expose its functions as NIFs (Native Implemented Functions).
    • Call them from BEAM, wait for the response (synchronously), then use that response.
  18. NIF: Good Bits
    • Concurrent: NIFs can be marked “dirty”, and they will be run on a separate set of schedulers.
    • Fast: No context-switching required, so calling NIFs can be quite fast.
  19. NIF: Bad Bits
    • No Fault Isolation: A crash in your NIF brings down the BEAM.
    • Hard to De-Risk: Image formats can be complex; it would be difficult to proclaim any code manipulating them bug-free. Images can come from the Internet (i.e. user-provided input).
    • Elbow Grease Required: Special care is required to mark a NIF dirty. Failure to cover all bases may cause issues.
  20. Solution 4: Pattern-Match
    • Write your conversion code in Erlang using pattern-matching.
    • Requires intimate understanding of all image format specifications and of the BEAM as you will be moving a lot of binary data around.
    • A good weekend project for the tenacious.
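
To give a flavour of what pattern-matching on image data looks like, here is a minimal sketch that only reads the width and height out of a PNG header. The `PngHeader` module is hypothetical and not from the talk; actual scaling would also need decompression, filtering and resampling for every supported format, which is where the weekend goes.

```elixir
defmodule PngHeader do
  # Hypothetical illustration, not from the talk.
  # A PNG file starts with an 8-byte signature, followed by the IHDR chunk:
  # a 4-byte length (13), the type "IHDR", then width and height as
  # 32-bit big-endian unsigned integers.
  def dimensions(
        <<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, 13::32, "IHDR",
          width::32, height::32, _rest::binary>>
      ) do
    {:ok, {width, height}}
  end

  # Anything else is not a PNG this sketch understands.
  def dimensions(_other), do: :error
end
```

Given the contents of a PNG file (for example from `File.read!/1`), `PngHeader.dimensions/1` returns `{:ok, {width, height}}`, and `:error` for any other binary.
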
  21. Solution 5: Persistent C Server
    • Write a synchronous, single-threaded C Server reading from STDIN and writing to STDOUT/STDERR.
    • Supervise the C Server with appropriate Erlang code which restarts the process as needed. Crash the C Server whenever.
    • Put as many of these pairs in a connection pool as needed.
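
The Erlang side of such a pair can be sketched with a plain `Port` owned by a `GenServer`. This is an assumption-laden illustration: the talk itself uses Erlexec rather than a bare port, the `LineServer` module name is hypothetical, and any line-oriented program (here `cat`, which echoes each line back) stands in for the C Server.

```elixir
defmodule LineServer do
  # Hypothetical sketch, not the talk's implementation: owns one external
  # program and exchanges one line per request over its STDIN/STDOUT.
  use GenServer

  def start_link(executable), do: GenServer.start_link(__MODULE__, executable)

  def request(pid, line, timeout \\ 5_000) do
    GenServer.call(pid, {:request, line}, timeout)
  end

  @impl true
  def init(executable) do
    # {:line, 4096} delivers output one line at a time, newline stripped.
    port = Port.open({:spawn_executable, executable}, [:binary, :exit_status, {:line, 4096}])
    {:ok, port}
  end

  @impl true
  def handle_call({:request, line}, _from, port) do
    Port.command(port, line <> "\n")

    # Synchronous, one request at a time, mirroring the single-threaded server.
    receive do
      {^port, {:data, {:eol, reply}}} -> {:reply, {:ok, reply}, port}
      {^port, {:exit_status, status}} -> {:stop, {:exit_status, status}, {:error, :closed}, port}
    after
      4_000 -> {:stop, :timeout, {:error, :timeout}, port}
    end
  end

  @impl true
  # If the external program exits, stop so that a supervisor can restart us.
  def handle_info({port, {:exit_status, status}}, port), do: {:stop, {:exit_status, status}, port}
  def handle_info(_other, port), do: {:noreply, port}
end
```

Started under a supervisor, a crash of the external program stops this process and the supervisor starts a fresh pair, which is what makes "crash the C Server whenever" an acceptable policy.
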
  22. Single Solution
    We can summarise a few more data points from all available information. The ideal solution should be…
    • Not Forking;
    • Isolated in Own Memory Space;
    • Crash-Resistant.
  23. Single Solution: Ingredients
    • Image Manipulation: libVIPS and its High Level C binding. This is a proven solution and is faster than ImageMagick. Its functions can be picked and chosen for our custom C Server.
    • Protocol: Text-Based. This means the C Server can be tested in isolation without an elaborate test harness, will be able to work over STDIN/STDOUT, and will not require code to handle a binary protocol.
  24. None
  25. None
  26. None
  27. Single Solution: Layout
    • Implement a worker pool using Poolboy.
    • In each worker, pull in Erlexec and run/maintain a C server.
    • Implement a façade that checks out a process from the pool and uses it.
  28. None
  29. Single Solution: C Server
    $ scaler
    20 20 foobar
    ERROR - Unable to open file
    288 288 /Users/evadne/Pictures/IMG_0245.PNG
    /tmp/converter-lDMsQF.png
    $ identify /tmp/converter-lDMsQF.png
    /tmp/converter-lDMsQF.png PNG 288x216 288x216+0+0 8-bit sRGB 4.74KB 0.000u 0:00.000
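
Judging only from the session above, a request appears to be a single line of `width height input_path`, and a reply is either an output path or a line starting with `ERROR - `. A minimal sketch of the Elixir side of that exchange, under that assumption (the `ScalerProtocol` module is hypothetical, and the real protocol in the project may differ):

```elixir
defmodule ScalerProtocol do
  # Hypothetical helper; the wire format is inferred from the slide's
  # terminal session, not taken from the project's source.

  # Builds one request line: "<width> <height> <input path>\n".
  def encode_request(in_path, width, height) do
    "#{width} #{height} #{in_path}\n"
  end

  # A reply is either "ERROR - <reason>" or the path of the generated file.
  def decode_reply("ERROR - " <> reason), do: {:error, String.trim(reason)}
  def decode_reply(line), do: {:ok, String.trim(line)}
end
```

Because the protocol is plain text, the same lines can be typed into the C Server by hand, which is the testing benefit the "Ingredients" slide points out.
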
  30. Single Solution: External Façade
    def preview(conn, params) do
      in_path = params["image"].path
      {:ok, out_path} = Resampler.request(in_path, 512, 512)
      {:ok, image} = File.read(out_path)
      base64 = Base.encode64(image)
      render conn, "preview.html", base64: base64, diff: formatted_diff(diff)
    end
  31. Demo
    • Note the local dependency and how a Makefile can be written to build the C bits in the right way.
    • Note how the performance gap seems to widen as the input gets larger.
  32. Observations
    • The solution is indeed a bit faster than the others; the performance gap widens as the images grow larger.
    • Mixing Erlang/Elixir and C does not need to be hard.
    • Best tool for the job.
  33. Open Source
    evadne/supervised-scaler (Elixir + Phoenix, MIT License)