Slide 1

Slide 1 text

felixge Faster than C? Parsing binary data in JavaScript Felix Geisendörfer

Slide 2

Slide 2 text

@felixge felixge Background 2005 - 2008 2008 - now 2009 - now

Slide 3

Slide 3 text

@felixge felixge The NodeCopter nodecopter.com

Slide 4

Slide 4 text

felixge This Talk

Slide 5

Slide 5 text

felixge JavaScript vs C

Slide 6

Slide 6 text

felixge Good vs Evil

Slide 7

Slide 7 text

felixge Good Parts vs Evil

Slide 8

Slide 8 text

felixge Bad Parts vs Evil

Slide 9

Slide 9 text

felixge Faster than C?

Slide 10

Slide 10 text

No
Sorry for the “title bait”

Slide 11

Slide 11 text

As fast as C bindings
For parsing binary protocols / database drivers

Slide 12

Slide 12 text

felixge The Story

Slide 13

Slide 13 text

felixge early 2010

Slide 14

Slide 14 text

felixge No MySQL module for node.js early 2010

Slide 15

Slide 15 text

felixge

Slide 16

Slide 16 text

felixge Pure JS / No C/C++

Slide 17

Slide 17 text

felixge Before Buffers became usable

Slide 18

Slide 18 text

felixge The Parser was using JavaScript Strings

Slide 19

Slide 19 text

felixge Node.js Trivia

Slide 20

Slide 20 text

“Buffers” used to be called “Blobs”

Slide 21

Slide 21 text

felixge For 3 min and 15 sec

Slide 22

Slide 22 text

felixge RIP Blobs ✞

Slide 23

Slide 23 text

felixge Sun Dec 13 08:39:20 2009 - Sun Dec 13 08:42:45 2009

Slide 24

Slide 24 text

felixge Anyway

Slide 25

Slide 25 text

felixge mysql can be done without libmysql

Slide 26

Slide 26 text

felixge

Slide 27

Slide 27 text

felixge No good deed goes unpunished

Slide 28

Slide 28 text

felixge Sir Isaac Newton

Slide 29

Slide 29 text

felixge Third Law of Motion

Slide 30

Slide 30 text

“When a first body exerts a force F1 on a second body, the second body simultaneously exerts a force F2 = −F1 on the first body. This means that F1 and F2 are equal in magnitude and opposite in direction.”

Slide 31

Slide 31 text

felixge Third Law of Github

Slide 32

Slide 32 text

“When a first person pushes a library L1 into a remote repository, a second person simultaneously starts working on a second library L2 which will be equally awesome, but in a different way.”

Slide 33

Slide 33 text

felixge <3 Github!

Slide 34

Slide 34 text

felixge

Slide 35

Slide 35 text

Benchmark
• Parse ~180 MB / 100,000 rows of MySQL result data
• 5 columns: id, title, text, created, updated
• -> create 100k objects with 500k keys + 500k values

Slide 36

Slide 36 text

[Chart: throughput in mbit (axis 0 to 1,500) per benchmark: mysql-0.9.6, mysql-libmysqlclient-1.5.1]

Slide 37

Slide 37 text

felixge Of course.

Slide 38

Slide 38 text

felixge libmysql = C

Slide 39

Slide 39 text

felixge my library = JavaScript

Slide 40

Slide 40 text

felixge C > JS, right?

Slide 41

Slide 41 text

felixge But V8!

Slide 42

Slide 42 text

felixge And Crankshaft!!

Slide 43

Slide 43 text

felixge Node.js !!!1!

Slide 44

Slide 44 text

felixge Was I living a lie?

Slide 45

Slide 45 text

felixge Kind of

Slide 46

Slide 46 text

felixge V8 / Node = Tools

Slide 47

Slide 47 text

felixge Performance is not a tool

Slide 48

Slide 48 text

felixge Performance is hard work & data analysis

Slide 49

Slide 49 text

[Chart: throughput in mbit (axis 0 to 1,500) per benchmark: mysql-0.9.6, mysql-libmysqlclient-1.5.1, mysql-2.0.0-alpha3]

Slide 50

Slide 50 text

felixge Third Law of Github

Slide 51

Slide 51 text

felixge

Slide 52

Slide 52 text

[Chart: throughput in mbit (axis 0 to 4,000) per benchmark: mysql-0.9.6, mysql-libmysqlclient-1.5.1, mysql-2.0.0-alpha3, mariadb-0.1.7]

Slide 53

Slide 53 text

felixge Time to give up?

Slide 54

Slide 54 text

felixge NEVER!

Slide 55

Slide 55 text

felixge New Parser

Slide 56

Slide 56 text

[Chart: throughput in mbit (axis 0 to 6,000) per benchmark: mysql2, new-parser]

Slide 57

Slide 57 text

felixge Third law of Github?

Slide 58

Slide 58 text

felixge Endgame

Slide 59

Slide 59 text

felixge Last bottleneck: V8 / Creating JS Objects

Slide 60

Slide 60 text

felixge Also: MySQL Server saturated

Slide 61

Slide 61 text

felixge Anyway

Slide 62

Slide 62 text

felixge How to write fast JS

Slide 63

Slide 63 text

felixge Does not work

Slide 64

Slide 64 text

Profiling
• Good at telling you which functions are slow
• Bad at telling you how to fix it

Slide 65

Slide 65 text

felixge Taking performance advice from strangers

Slide 66

Slide 66 text

Taking performance advice from strangers
• Good for ideas & inspiration
• But useless when applied cargo-cult style

Slide 67

Slide 67 text

for (var i = 0; i < array.length; i++) {
  // do some work with array[i]
}

Slide 68

Slide 68 text

for (var i = 0, length = array.length; i < length; i++) {
  // do some work with array[i]
}
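
Whether caching the array length helps depends on the engine and the workload, so the only reliable answer comes from measuring it. The following is a minimal sketch of such a measurement; the function names, the array size and the run count are illustrative assumptions, not taken from the talk.

```javascript
// Hypothetical micro-benchmark comparing the two loop styles above.
function sumUncached(array) {
  var total = 0;
  for (var i = 0; i < array.length; i++) {
    total += array[i];
  }
  return total;
}

function sumCached(array) {
  var total = 0;
  for (var i = 0, length = array.length; i < length; i++) {
    total += array[i];
  }
  return total;
}

// Runs fn(array) `runs` times and reports the elapsed wall-clock time.
function time(fn, array, runs) {
  var start = process.hrtime.bigint();
  var result = 0;
  for (var r = 0; r < runs; r++) {
    result = fn(array);
  }
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result: result, elapsedMs: elapsedMs };
}

var data = [];
for (var n = 0; n < 100000; n++) {
  data.push(n);
}

var uncached = time(sumUncached, data, 200);
var cached = time(sumCached, data, 200);

// One tab separated line per variant: name, elapsed milliseconds.
console.log('uncached\t' + uncached.elapsedMs.toFixed(2));
console.log('cached\t' + cached.elapsedMs.toFixed(2));
```

A single run like this is noisy; repeating it and comparing the distributions gives a more trustworthy picture than one pair of numbers.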

Slide 69

Slide 69 text

felixge What Does Work?

Slide 70

Slide 70 text

felixge Benchmark Driven Development

Slide 71

Slide 71 text

Benchmark Driven Development
• Similar to test driven development
• Use it when performance is an explicit design goal
• Benchmark first > benchmark after!

Slide 72

Slide 72 text

function benchmark() {
  // intentionally empty
}

Slide 73

Slide 73 text

while (true) {
  var start = Date.now();
  benchmark();
  var duration = Date.now() - start;
  console.log(duration);
}

Slide 74

Slide 74 text

Benchmark Driven Development
• Next step: Implement a tiny part of your function
• Example: Parse headers of MySQL packets
• Look at impact, tweak code, repeat
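
As a rough illustration of this step, the sketch below fills the empty benchmark() with packet header parsing only. A MySQL packet begins with a 4-byte header: a 3-byte little-endian payload length followed by a 1-byte sequence id. The helpers buildPackets and parsePacketHeaders, the synthetic input and the bounded run count are assumptions made for this example; this is not the parser from the talk.

```javascript
// Hypothetical helper: builds `count` packets with zero-filled payloads.
function buildPackets(count, payloadLength) {
  var packets = [];
  for (var i = 0; i < count; i++) {
    var packet = Buffer.alloc(4 + payloadLength);
    packet.writeUIntLE(payloadLength, 0, 3); // 3-byte payload length
    packet[3] = i % 256;                     // 1-byte sequence id
    packets.push(packet);
  }
  return Buffer.concat(packets);
}

// Hypothetical helper: walks the headers and skips over the payloads.
// Assumes the buffer holds only complete packets.
function parsePacketHeaders(buffer) {
  var offset = 0;
  var packets = 0;
  var payloadBytes = 0;
  while (offset + 4 <= buffer.length) {
    var length = buffer[offset] |
      (buffer[offset + 1] << 8) |
      (buffer[offset + 2] << 16);
    // buffer[offset + 3] holds the sequence id; it is not needed here.
    offset += 4 + length;
    packets++;
    payloadBytes += length;
  }
  return { packets: packets, payloadBytes: payloadBytes };
}

var input = buildPackets(1000, 100);

function benchmark() {
  return parsePacketHeaders(input);
}

// Bounded variant of the benchmark loop: duration in ms and mbit per run.
for (var run = 0; run < 5; run++) {
  var iterations = 1000;
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    benchmark();
  }
  var duration = Date.now() - start;
  var bits = input.length * iterations * 8;
  var mbit = duration > 0 ? bits / (duration / 1000) / 1e6 : NaN;
  console.log(duration + '\t' + mbit.toFixed(0));
}
```

From here, each further piece of the parser can be added on its own, with the printed numbers showing what it costs.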

Slide 75

Slide 75 text

Example Results
• try...catch is ok
• big switch statement = bad
• function calls = very cheap
• buffering is ok

Slide 76

Slide 76 text

@felixge felixge Favorite Optimization: Loop unrolling with eval()’s good twin new Function()

Slide 77

Slide 77 text

function parseRow(columns, parser) {
  var row = {};
  for (var i = 0; i < columns.length; i++) {
    row[columns[i].name] = parser.readColumnValue();
  }
  return row;
}

Slide 78

Slide 78 text

function parseRow(columns, parser) {
  return {
    id      : parser.readColumnValue(),
    title   : parser.readColumnValue(),
    body    : parser.readColumnValue(),
    created : parser.readColumnValue(),
    updated : parser.readColumnValue(),
  };
}

Slide 79

Slide 79 text

@felixge felixge How can we unroll this loop at runtime?

Slide 80

Slide 80 text

var code = 'return {\n';

columns.forEach(function(column) {
  code += '"' + column.name + '":' + 'parser.readColumnValue(),\n';
});

code += '};\n';

var parseRow = new Function('columns', 'parser', code);
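
A self-contained sketch of the same idea, so it can be run and benchmarked in isolation. The names compileRowParser and createFakeParser are hypothetical, and the fake parser only stands in for a real protocol parser. One deliberate difference from the slide: the column name is passed through JSON.stringify, so a name containing quotes or backslashes cannot break the generated source.

```javascript
// Hypothetical helper: generates a row parser for a fixed column list.
function compileRowParser(columns) {
  var code = 'return {\n';
  columns.forEach(function(column) {
    // JSON.stringify quotes and escapes the column name.
    code += '  ' + JSON.stringify(column.name) + ': parser.readColumnValue(),\n';
  });
  code += '};\n';
  return new Function('parser', code);
}

// Hypothetical stand-in parser that hands out queued values in order.
function createFakeParser(values) {
  var index = 0;
  return {
    readColumnValue: function() {
      return values[index++];
    }
  };
}

var columns = [{ name: 'id' }, { name: 'title' }, { name: 'created' }];

// Compile once per result set, then reuse the function for every row.
var parseRow = compileRowParser(columns);
var row = parseRow(createFakeParser([1, 'Hello', '2012-10-01']));
console.log(row);
```

The generated function is only worth its compile cost when it is reused for many rows with the same columns, which is the situation in the benchmark described earlier.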

Slide 81

Slide 81 text

PLEASE DO NOT REMEMBER THIS - DO YOUR OWN BENCHMARKS!!!

Slide 82

Slide 82 text

@felixge felixge Data Analysis

Slide 83

Slide 83 text

Data analysis
• Produce data points as tab separated values
• Add as many VM/OS metrics as you can get to every line
• Do not mix data and analysis!!
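
A minimal sketch of what such output could look like in Node.js. The column selection, the placeholder workload and the run count are assumptions for illustration. The script only emits raw data points; any plotting or statistics would happen in a separate step.

```javascript
var os = require('os');

// Placeholder workload: allocates objects, roughly as a row parser would.
function benchmark() {
  var rows = [];
  for (var i = 0; i < 10000; i++) {
    rows.push({ id: i, title: 'title ' + i });
  }
  return rows.length;
}

// Assumed column set: timing plus a few VM and OS metrics per line.
var columns = ['number', 'duration_ms', 'heap_total', 'heap_used', 'rss', 'load_avg_1m'];

function measure(number) {
  var start = Date.now();
  benchmark();
  var duration = Date.now() - start;
  var memory = process.memoryUsage();
  return [
    number,
    duration,
    memory.heapTotal,
    memory.heapUsed,
    memory.rss,
    os.loadavg()[0] // reported as 0 on Windows
  ];
}

// Header line first, then one tab separated line per run.
console.log(columns.join('\t'));
for (var number = 1; number <= 10; number++) {
  console.log(measure(number).join('\t'));
}
```

Keeping the run number and the heap figures on every line is what later makes it possible to plot throughput and memory against the run number.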

Slide 84

Slide 84 text

Recommended Tools
• node benchmark.js | tee results.tsv
• R Programming language (with ggplot2)!
• Makefiles, Image Magick, Skitch

Slide 85

Slide 85 text

@felixge felixge Why?

Slide 86

Slide 86 text

[Chart: throughput in mbit (axis 0 to 6,000) per benchmark: mysql2, new-parser]

Slide 87

Slide 87 text

[Chart: throughput in mbit (axis 3,000 to 6,000) per benchmark, individual data points: mysql2, new-parser]

Slide 88

Slide 88 text

[Chart: throughput in mbit (axis 3,000 to 6,000) per benchmark, individual data points: mysql2, new-parser. Two annotations: “Dafuq?”]

Slide 89

Slide 89 text

[Chart: mbit (axis 3,000 to 6,000) against number (axis 0 to 300) per benchmark: mysql2, new-parser]

Slide 90

Slide 90 text

[Chart: Heap Total (MB) (axis 10 to 30) against number (axis 0 to 300) per benchmark: mysql2, new-parser]

Slide 91

Slide 91 text

[Chart: Heap Used (MB) (axis 5 to 15) against number (axis 0 to 300) per benchmark: mysql2, new-parser]

Slide 92

Slide 92 text

@felixge felixge tl;dr

Slide 93

Slide 93 text

1. Write a benchmark
2. Write/change a little code
3. Collect data
4. Find problems
5. Goto 2

Slide 94

Slide 94 text

@felixge felixge Thank you Felix Geisendörfer

Slide 95

Slide 95 text

github.com/felixge/faster-than-c
All benchmarks, results and analysis scripts
Felix Geisendörfer