Slide 1

Slide 1 text

C/Elixir Interop: A libVIPS/Phoenix Case Study
Evadne Wu, Head of Exam Systems, Faria Education Group
[email protected] / @evadne
Last updated 26 October 2016

Slide 2

Slide 2 text

Takeaway
• Learn how you can embed your C program in Elixir
• Investigate various integration approaches
• Get a proper project that does what it says

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Structure
1. Overview
2. Requirements
3. Solutions Assessment
4. Single-Solution Deep Dive
5. Demo
6. Observations / Q&A

Slide 5

Slide 5 text

Overview
• If you’ve written a web application, chances are that you’ll need to generate thumbnails.
• If you have a Phoenix application, you’d probably like to generate thumbnails within your application so everything is in the same place.
• You’d probably want a solution that just works but is also quite performant.

Slide 6

Slide 6 text

Requirements
• Generate thumbnails from Elixir/Erlang in an Erlang-like manner (i.e. it isolates faults, is fast enough and does not do strange things)

Slide 7

Slide 7 text

Solutions Assessment
1. Fork: spawn a process with arguments, wait for completion.
2. Daemon: swap messages with a long-running daemon.
3. NIF: run C code in BEAM directly.
4. Pattern-Match: implement scaling code in Erlang/Elixir directly.
5. Persistent C Server: swap messages with a supervised process.

Slide 8

Slide 8 text

Solution 1: Fork
• Fork an OS process to generate one thumbnail at a time.
• Start a child process with appropriate arguments.
• Wait for the child process to finish.
• Look at what the child process has sent.
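
The steps on this slide can be sketched at the OS level in C. This is a minimal illustration, not what Elixir does internally: `run_and_capture` is a hypothetical helper that starts one child, collects whatever it writes to STDOUT, and returns its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with the given arguments, copies up to
   out_len - 1 bytes of its STDOUT into `out` (NUL-terminated), and returns
   the child's exit status, or -1 on failure or abnormal exit. */
int run_and_capture(char *const argv[], char *out, size_t out_len)
{
    assert(argv && argv[0] && out && out_len > 0);

    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: route STDOUT into the pipe, then become the target program. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if execvp failed */
    }

    /* Parent: read until the child closes its end or the buffer is full. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < out_len - 1 &&
           (n = read(fds[0], out + used, out_len - 1 - used)) > 0) {
        used += (size_t)n;
    }
    out[used] = '\0';
    close(fds[0]);

    /* Wait for the child to finish and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `{"echo", "hello", NULL}` and a small buffer should leave the child's output in the buffer. Every call pays for a new process, which is the cost the following slides discuss.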

Slide 9

Slide 9 text

Fork: Characteristics
• Assembly Required: No implicit flow control or resource cap.
• Safe: Crashes isolated to external OS processes; resource cleanup done by OS.
• Slow: Code/data needs to be reloaded on each run.
• Expensive: Bigger servers, smaller conference budget.

Slide 10

Slide 10 text

Fork: Good Bits
• A simple forked process which exits and returns results at the same time is very easy to reason about. This can be attractive when you do not have a concurrency requirement.
• Thorough cleanup is almost guaranteed upon process exit.

Slide 11

Slide 11 text

Forking: Bad Bits
• Forking is quite bad if your process first needs to load data into memory, or has a heavy initialisation step.
• You may create a fork bomb if multiple forks can happen concurrently and there is no safeguard.
• You will most likely need a timeout.
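
The timeout point can be illustrated with a small C sketch. It assumes a simple polling strategy: check on the child every 10 ms and kill it once the deadline passes. `run_with_timeout` and `RUN_TIMED_OUT` are hypothetical names, and the elapsed time is only approximate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define RUN_TIMED_OUT (-2)  /* hypothetical sentinel for "killed on timeout" */

/* Runs argv[0]; returns its exit status, -1 on failure or abnormal exit,
   or RUN_TIMED_OUT if it was still running after roughly timeout_ms. */
int run_with_timeout(char *const argv[], int timeout_ms)
{
    assert(argv && argv[0] && timeout_ms >= 0);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if execvp failed */
    }

    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int status = 0;
    for (int waited_ms = 0; ; waited_ms += 10) {
        /* WNOHANG makes waitpid return immediately if the child is alive. */
        pid_t done = waitpid(pid, &status, WNOHANG);
        if (done == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (done < 0) return -1;
        if (waited_ms >= timeout_ms) break;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: kill the child, then reap it to avoid a zombie. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return RUN_TIMED_OUT;
}
```

A cap on the number of concurrent children would still be needed to guard against the fork-bomb case; this sketch only bounds how long one child may run.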

Slide 12

Slide 12 text

Forking: Example
System.cmd "mogrify",
  arguments(image, output_path),
  stderr_to_stdout: true

Slide 13

Slide 13 text

Solution 2: Daemon
• Either implement a daemon for your code, or find a project that has one and use that.
• The daemon will field your requests either over a port directly or via forked child processes that pass messages.
• Some daemons may even have concurrency support.

Slide 14

Slide 14 text

Daemon: Characteristics
• Faster Per-File Processing Times: No need to reload data on each call.
• Less Memory Pressure: Possible to share some memory among all processes.
• Faults Isolated: Crashes isolated to an external OS process and its children. Possible to have an OS-level process manager restart the daemon(s).
• Multiple Failure Modes: Errors can propagate and cause grief because the daemon is probably not written in Erlang.

Slide 15

Slide 15 text

Daemon: Good Bits
• ClamAV, a popular open-source virus scanner project, has two variants. It can run as a daemon which then accepts work, or it can be run standalone. In practice the daemon scans a file about 10 to 100 times faster because it does not have to repeatedly load virus definitions.
• This is an example of a proper daemon not written in Erlang (and you can still supervise a daemon using Erlang).

Slide 16

Slide 16 text

Daemon: Bad Bits
• It makes no sense to implement half of Erlang in another language. It takes longer to do that than to learn Erlang.
• If you do not have the daemon supervised by your application, you will not have a common root for all activities and that leads to madness.
• You need to find a way to send a message to a daemon. You may need to make a binary/text interface, or you may need to take the hit of forking something that does that. Either way it is a lot of work.

Slide 17

Slide 17 text

Solution 3: NIF
• Write your code in C.
• Expose its functions as NIFs (Native Implemented Functions).
• Call them from BEAM, wait for the response (synchronously), then use that response.

Slide 18

Slide 18 text

NIF: Good Bits
• Concurrent: NIFs can be marked “dirty”, and they will be run on a separate set of schedulers.
• Fast: No context-switching required, so calling NIFs can be quite fast.

Slide 19

Slide 19 text

NIF: Bad Bits
• No Fault Isolation: A crash in your NIF brings down the BEAM.
• Hard to De-Risk: Image formats can be complex; it would be difficult to proclaim any code manipulating them bug-free. Images can come from the Internet (i.e. user-provided input).
• Elbow Grease Required: Special care is required to mark a NIF dirty. Failure to cover all bases may cause issues.

Slide 20

Slide 20 text

Solution 4: Pattern-Match
• Write your conversion code in Erlang using pattern-matching.
• Requires intimate understanding of all image format specifications and of the BEAM as you will be moving a lot of binary data around.
• A good weekend project for the tenacious.

Slide 21

Slide 21 text

Solution 5: Persistent C Server
• Write a synchronous, single-threaded C Server reading from STDIN and writing to STDOUT/STDERR.
• Supervise the C Server with appropriate Erlang code which restarts the process as needed. Crash the C Server whenever.
• Put as many of these pairs in a connection pool as needed.
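
A synchronous, single-threaded server of this kind can be little more than a read loop. The sketch below is an illustration rather than the project's actual code: `serve`, `handler_fn` and `echo_handler` are hypothetical names, and the handler is a placeholder where the real image work would go.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type: turns one request line into one reply line. */
typedef void (*handler_fn)(const char *request, char *reply, size_t reply_len);

/* Reads one request per line from `in` and writes one reply per line to
   `out`. Returns the number of requests served once input ends. */
int serve(FILE *in, FILE *out, handler_fn handle)
{
    assert(in && out && handle);

    char request[4096];
    char reply[4096];
    int served = 0;

    while (fgets(request, sizeof request, in) != NULL) {
        request[strcspn(request, "\r\n")] = '\0';  /* strip the line ending */
        handle(request, reply, sizeof reply);
        fprintf(out, "%s\n", reply);
        /* Streams attached to a pipe are usually block-buffered, so flush
           after every reply or the supervising side may wait forever. */
        fflush(out);
        served++;
    }
    return served;
}

/* Placeholder handler for illustration only: echoes the request back. */
void echo_handler(const char *request, char *reply, size_t reply_len)
{
    snprintf(reply, reply_len, "OK %s", request);
}
```

A real `main` would call `serve(stdin, stdout, some_handler)`. Because the loop keeps no state between requests, crashing and being restarted by the supervisor loses nothing but the request in flight.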

Slide 22

Slide 22 text

Single Solution
We can summarise a few more data points from all available information. The ideal solution should be…
• Not Forking;
• Isolated in Own Memory Space;
• Crash-Resistant.

Slide 23

Slide 23 text

Single Solution: Ingredients
• Image Manipulation: libVIPS and its high-level C binding. This is a proven solution and is faster than ImageMagick. We can pick and choose its functions for our custom C Server.
• Protocol: Text-Based. This means the C Server can be tested in isolation without an elaborate test harness, will be able to work over STDIN/STDOUT, and will not require code to handle a binary protocol.
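
To show how little code a text protocol needs, here is a sketch of a request parser in C. It assumes each request is one line of the form `<width> <height> <path>`, which is inferred from the sample session in this deck rather than taken from the project's source; `struct request`, `parse_request` and `MAX_PATH_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 1024  /* hypothetical limit for this sketch */

struct request {
    int width;
    int height;
    char path[MAX_PATH_LEN];
};

/* Parses "<width> <height> <path>" (assumed format). The path is the rest
   of the line, so it may contain spaces. Returns 0 on success, -1 on error. */
int parse_request(const char *line, struct request *req)
{
    assert(line && req);

    int consumed = 0;
    /* %n records how many characters were read before the path begins. */
    if (sscanf(line, "%d %d %n", &req->width, &req->height, &consumed) != 2)
        return -1;
    if (req->width <= 0 || req->height <= 0)
        return -1;
    if (consumed <= 0 || line[consumed] == '\0')
        return -1;  /* no path given */

    const char *path = line + consumed;
    size_t len = strcspn(path, "\r\n");  /* ignore the line ending */
    if (len == 0 || len >= MAX_PATH_LEN)
        return -1;

    memcpy(req->path, path, len);
    req->path[len] = '\0';
    return 0;
}
```

Because requests are plain lines, the parser can be exercised by typing at a terminal or piping in a file, with no test harness beyond the shell.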

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Single Solution: Layout
• Implement a worker pool using Poolboy.
• In each worker, pull in Erlexec and run/maintain a C server.
• Implement a façade that checks out a process from the pool and uses it.

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Single Solution: C Server
$ scaler
20 20 foobar
ERROR - Unable to open file
288 288 /Users/evadne/Pictures/IMG_0245.PNG
/tmp/converter-lDMsQF.png
$ identify /tmp/converter-lDMsQF.png
/tmp/converter-lDMsQF.png PNG 288x216 288x216+0+0 8-bit sRGB 4.74KB 0.000u 0:00.000
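
In the session above, a 288 by 288 request produced a 288x216 file, which is what one would expect if the scaler fits the image inside the requested box while preserving a 4:3 aspect ratio. The arithmetic can be sketched as follows; this is an assumption about the behaviour, not the project's code, and `fit_within` and `struct size` are hypothetical names.

```c
#include <assert.h>

struct size {
    int width;
    int height;
};

/* Scales (src_w, src_h) to the largest size that fits inside
   (max_w, max_h) with the same aspect ratio, rounding to nearest. */
struct size fit_within(int src_w, int src_h, int max_w, int max_h)
{
    assert(src_w > 0 && src_h > 0 && max_w > 0 && max_h > 0);

    struct size out;
    /* Compare aspect ratios by cross-multiplying, avoiding floating point. */
    if ((long long)src_w * max_h >= (long long)src_h * max_w) {
        /* Source is relatively wider than the box: width is the limit. */
        out.width = max_w;
        out.height = (int)(((long long)src_h * max_w + src_w / 2) / src_w);
    } else {
        /* Source is relatively taller than the box: height is the limit. */
        out.height = max_h;
        out.width = (int)(((long long)src_w * max_h + src_h / 2) / src_h);
    }
    /* Extreme aspect ratios could round a side down to zero. */
    if (out.width < 1) out.width = 1;
    if (out.height < 1) out.height = 1;
    return out;
}
```

For a hypothetical 4032x3024 source (the real dimensions of the sample image are not stated on the slide), `fit_within(4032, 3024, 288, 288)` gives 288x216. Note that this sketch also scales small images up to the box; a real scaler may choose not to.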

Slide 30

Slide 30 text

Single Solution: External Façade
def preview(conn, params) do
  in_path = params["image"].path
  {:ok, out_path} = Resampler.request(in_path, 512, 512)
  {:ok, image} = File.read(out_path)
  base64 = Base.encode64(image)
  render conn, "preview.html", base64: base64, diff: formatted_diff(diff)
end

Slide 31

Slide 31 text

Demo
• Note the local dependency and how a Makefile can be written to build the C bits in the right way.
• Note how the performance gap seems to widen as the input gets larger.

Slide 32

Slide 32 text

Observations
• The solution is indeed somewhat faster than the alternatives; the performance gap widens as the images grow larger.
• Mixing Erlang/Elixir and C does not need to be hard.
• Use the best tool for the job.

Slide 33

Slide 33 text

Open Source
evadne/supervised-scaler
Elixir + Phoenix
MIT License