FlatBuffers for Go

Slide 1

Slide 1 text

FlatBuffers for Go: Fast and Fun Serialization
21 January 2015
Robert Winslow, Programmer

What we'll cover today:
- Serialization basics
- Why we need another serialization format
- What makes FlatBuffers special
- Example code and usage

Slide 2

Slide 2 text

What is a serialization format?
A standard way to store structured data, then read it back.
Examples: JSON, Protocol Buffers, Thrift, XML.

Serialization
Who here spends a lot of time interacting with serialized data?
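Here is a minimal round trip through Go's standard encoding/json package, one of the formats named above. It makes "store structured data, then read it back" concrete. The Player type and the roundTrip helper are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Player is a hypothetical record used only for this illustration.
type Player struct {
	Name   string
	Health int16
}

// roundTrip serializes p to JSON bytes, then parses those bytes back
// into a fresh value.
func roundTrip(p Player) (Player, []byte, error) {
	data, err := json.Marshal(p)
	if err != nil {
		return Player{}, nil, err
	}
	var out Player
	err = json.Unmarshal(data, &out)
	return out, data, err
}

func main() {
	got, data, err := roundTrip(Player{Name: "Robert", Health: 60})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))         // {"Name":"Robert","Health":60}
	fmt.Println(got.Name, got.Health) // Robert 60
}
```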

Slide 3

Slide 3 text

Serialization
Trick question: everybody. Serialization standards are what let us make sense of sequences of bytes.

Why a new serialization format?
Android game developers at Google needed a better way to store data. Games are demanding applications:
- Memory bottlenecks are bad.
- Using too much CPU wastes battery life.

Slide 4

Slide 4 text

Why a new serialization format?
The primary alternative was Protocol Buffers, a major open-source serialization project from Google.
Who here has used Protocol Buffers?

Why a new serialization format?
Good things about Protocol Buffers:
- Robust
- Secure
- Popular
"Nobody ever got fired for choosing Protocol Buffers."

Slide 5

Slide 5 text

Why a new serialization format?
Bad things about Protocol Buffers:
- Allocates temporary objects to unpack data
- No direct random access
- Poor data locality
- Large codebase (3.8 MB of code)
- Slow
- Rumors of cost at scale...

Why a new serialization format?
Fun Propulsion Labs is a tooling group inside Android. They decided to try a new approach.

Slide 6

Slide 6 text

Why a new serialization format?
They asked: could we build a serialization format that
- is simple,
- versions your data with a schema,
- enables random access,
- and is ridiculously fast?
They tried and succeeded. The result is FlatBuffers.

How fast? By the numbers
Read-only microbenchmarks on a small dataset:

    Library                   ops/sec  nanoseconds/op
    FlatBuffers            12,500,000              80
    Protocol Buffers LITE       3,311         302,000
    RapidJSON                   1,718         583,000
    pugixml                     5,102         196,000

Over 1,000 times faster than Protocol Buffers. These numbers come from the C++ version; we hope to make the Go version comparably fast.
Source: google.github.io/flatbuffers/md__benchmarks.html (https://google.github.io/flatbuffers/md__benchmarks.html)
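The two columns are, up to rounding, reciprocals of each other: ns/op = 10^9 / ops/sec. The headline ratio follows from the ns/op column. This small sketch reproduces that arithmetic using only figures from the table. nsPerOp and speedup are hypothetical helper names.

```go
package main

import "fmt"

// nsPerOp converts throughput (operations per second) into latency
// (nanoseconds per operation); one second is 1e9 nanoseconds.
func nsPerOp(opsPerSec float64) float64 {
	return 1e9 / opsPerSec
}

// speedup reports how many times cheaper operation a is than operation b,
// given each one's cost in nanoseconds.
func speedup(aNs, bNs float64) float64 {
	return bNs / aNs
}

func main() {
	flat := nsPerOp(12_500_000)
	fmt.Printf("FlatBuffers: %.0f ns/op\n", flat)                            // FlatBuffers: 80 ns/op
	fmt.Printf("vs Protocol Buffers LITE: %.0fx\n", speedup(flat, 302_000)) // vs Protocol Buffers LITE: 3775x
}
```

On this dataset the ratio works out to about 3,775, so "over 1,000 times" is a conservative statement.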

Slide 7

Slide 7 text

What is FlatBuffers?
- A new serialization standard that is both featureful and fast.
- An open-source project from Google, released in 2014.
- Created and maintained by Wouter van Oortmerssen, a programming language creator and game developer.
- Licensed under the permissive Apache v2 license.

The big idea (TL;DR)

Slide 8

Slide 8 text

The big idea (TL;DR)
FlatBuffer files are statically typed, schema-versioned, portable structs.

Wisdom
"Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."
(Rob Pike, Rob Pike's 5 Rules of Programming)

Slide 9

Slide 9 text

Speed: Part 1

Why care about speed?
Computers are "fast enough", but orders of magnitude still matter. What if you could use 100 computers instead of 1,000 to do a task?

Slide 10

Slide 10 text

What makes FlatBuffers so fast at read operations?
A different approach:
- No memory allocations.
- Tight packing of data is friendly to CPU caches (L1, L2, L3).
- Minimal code on hot execution paths (CPU instruction cache).

The philosophy is different
FlatBuffers relies on pointer arithmetic to read data without allocating intermediate objects. No calls to make / malloc!
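The "no memory allocations" point can be checked empirically. This sketch uses only the Go standard library, not the FlatBuffers runtime. It calls testing.AllocsPerRun to count heap allocations per call for two cases: an in-place little-endian read from a byte buffer, and decoding the same numbers from JSON. The record type and helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the reads.
var sink int32

// flat holds 2, 3, 5, 7 as little-endian int32s; text holds the same values as JSON.
var (
	flat = []byte{2, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0}
	text = []byte(`{"Values":[2,3,5,7]}`)
)

// record is a hypothetical type used only for the JSON comparison.
type record struct {
	Values []int32
}

// inPlaceAllocs measures heap allocations per read when the third value
// is decoded directly from the byte buffer at offset 2*4.
func inPlaceAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = int32(binary.LittleEndian.Uint32(flat[8:]))
	})
}

// jsonAllocs measures heap allocations per read when the JSON text must
// first be parsed into a Go value.
func jsonAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		var r record
		if err := json.Unmarshal(text, &r); err != nil {
			panic(err)
		}
		sink = r.Values[2]
	})
}

func main() {
	fmt.Printf("allocations per read: in place %.0f, JSON %.0f\n", inPlaceAllocs(), jsonAllocs())
}
```

The exact JSON allocation count depends on the Go version. The only firm claim here is that the in-place read allocates nothing.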

Slide 11

Slide 11 text

Minimal serialization of an array
An array is a sequence of fixed-width elements. Use pointer arithmetic to find the data you want.

Minimal serialization of an array
A small array of int32: 2 3 5 7 (the first four primes)
Bytes representing the four numbers, each 4 bytes wide (little-endian):
2 0 0 0 | 3 0 0 0 | 5 0 0 0 | 7 0 0 0
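This layout can also be written and read without unsafe, using encoding/binary from the Go standard library. That package places each byte explicitly in little-endian order, whatever the host CPU. This is a swapped-in technique, not the slide's code. encodeInt32s is a hypothetical helper, and this Get mirrors the slide's offset arithmetic.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt32s lays vals out as consecutive 4-byte little-endian integers.
func encodeInt32s(vals []int32) []byte {
	buf := make([]byte, 4*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint32(buf[4*i:], uint32(v))
	}
	return buf
}

// Get reads the ith int32 back out of buf. The offset arithmetic matches the
// pointer version, but the bytes are decoded explicitly as little-endian,
// so the result does not depend on the host's byte order.
func Get(i int, buf []byte) int32 {
	offset := i * 4
	return int32(binary.LittleEndian.Uint32(buf[offset : offset+4]))
}

func main() {
	buf := encodeInt32s([]int32{2, 3, 5, 7})
	fmt.Println(buf)                                                 // [2 0 0 0 3 0 0 0 5 0 0 0 7 0 0 0]
	fmt.Println(Get(0, buf), Get(1, buf), Get(2, buf), Get(3, buf)) // 2 3 5 7
}
```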

Slide 12

Slide 12 text

Minimal serialization of an array
Given the buffer:

    buf := []byte{2, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0}

Get the ith value:

    import "unsafe"

    // Get reads the ith int32 from buf in place. The cast uses the host's
    // byte order, so it decodes little-endian data only on little-endian CPUs.
    func Get(i int, buf []byte) int32 {
        offset := i * 4
        data := buf[offset : offset+4]
        n := *(*int32)(unsafe.Pointer(&data[0]))
        return n
    }

    Get(0, buf) // 2
    Get(1, buf) // 3
    Get(2, buf) // 5
    Get(3, buf) // 7

Minimal serialization of a struct
A struct is just a heterogeneous group of fixed-width elements. Use pointer arithmetic to find the data you want.

Slide 13

Slide 13 text

Minimal serialization of a struct
Given the struct type Particle:

    type Particle struct {
        X   int16   // bytes: 0, 1
        Y   int16   // bytes: 2, 3
        Z   int16   // bytes: 4, 5
        RGB [3]byte // bytes: 6, 7, 8
    }

Stored in this buffer:

    buf := []byte{1, 0, 2, 0, 3, 0, 128, 0, 192}

Get the X value:

    func GetX(buf []byte) (n int16) {
        data := buf[:2]
        n = *(*int16)(unsafe.Pointer(&data[0]))
        return
    }

Minimal serialization of a struct
Get the Y value from the same buffer:

    func GetY(buf []byte) (n int16) {
        data := buf[2:4]
        n = *(*int16)(unsafe.Pointer(&data[0]))
        return
    }

Slide 14

Slide 14 text

Minimal serialization of a struct
With the same Particle type and buffer, get the Z value:

    func GetZ(buf []byte) (n int16) {
        data := buf[4:6]
        n = *(*int16)(unsafe.Pointer(&data[0]))
        return
    }

Minimal serialization of a struct
And the RGB value:

    func GetRGB(buf []byte) (rgb [3]byte) {
        data := buf[6:9]
        rgb = *(*[3]byte)(unsafe.Pointer(&data[0]))
        return
    }
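The four getters can be gathered into one accessor type that reads fields straight out of the buffer and never copies the whole struct, which is closer in spirit to what generated FlatBuffers code does. This is a hypothetical sketch: ParticleView is not a FlatBuffers type. It also swaps the unsafe casts for encoding/binary, so it behaves the same on any byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ParticleView is a hypothetical read-only view over a 9-byte Particle
// buffer laid out as X, Y, Z (little-endian int16) followed by 3 RGB bytes.
type ParticleView []byte

func (p ParticleView) X() int16 { return int16(binary.LittleEndian.Uint16(p[0:2])) }
func (p ParticleView) Y() int16 { return int16(binary.LittleEndian.Uint16(p[2:4])) }
func (p ParticleView) Z() int16 { return int16(binary.LittleEndian.Uint16(p[4:6])) }

// RGB copies the three color bytes into a fixed-size array.
func (p ParticleView) RGB() (rgb [3]byte) {
	copy(rgb[:], p[6:9])
	return
}

func main() {
	v := ParticleView{1, 0, 2, 0, 3, 0, 128, 0, 192}
	fmt.Println(v.X(), v.Y(), v.Z(), v.RGB()) // 1 2 3 [128 0 192]
}
```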

Slide 15

Slide 15 text

The philosophy is simple
Just like arrays and structs, the FlatBuffers library uses pointer arithmetic. Every FlatBuffer is read in place with pointer arithmetic operations.

Show me code

Slide 16

Slide 16 text

Using FlatBuffers
Here's an example schema:

    // player.fbs
    namespace Game;

    table Player {
      name:string (id: 0, required);
      health:short = 100 (id: 1);
      armor:short = 100 (id: 2);
    }

    root_type Player;
    file_identifier "PLAY";
    file_extension "player";

Using FlatBuffers
Feed the schema to the FlatBuffers generator to create accessor code:

    flatc -g player.fbs

It creates one Go file used to work with the data:

    Game
    └── Player.go

It's a relatively small file:

    $ wc -c Game/Player.go
    1441 Game/Player.go

Slide 17

Slide 17 text

Using FlatBuffers
To create data, use the generated builder functions:

    builder := flatbuffers.NewBuilder(0)
    name := builder.CreateString("Robert")

    game.PlayerStart(builder)
    game.PlayerAddName(builder, name)
    game.PlayerAddHealth(builder, 60)
    // Tell FlatBuffers we are finished writing this object:
    player := game.PlayerEnd(builder)
    builder.Finish(player)

    // Save the backing byte buffer to a file:
    buf := builder.Bytes[builder.Head():]
    if err := ioutil.WriteFile("robert.player", buf, 0666); err != nil {
        log.Fatal(err)
    }

It generates a tiny file:

    $ wc -c robert.player
    40 robert.player

Using FlatBuffers
To read data, use the generated getter functions:

    // Load the buffer from the file:
    buf, err := ioutil.ReadFile("robert.player")
    if err != nil {
        log.Fatal(err)
    }
    // Initialize FlatBuffers code to use the data:
    player := game.GetRootAsPlayer(buf, 0)
    // Print the data we saved:
    fmt.Printf("Name: %s\n", player.Name())
    fmt.Printf("Health: %3d\n", player.Health())
    fmt.Printf("Armor: %3d\n", player.Armor())

It prints the data we saved. Armor was never written, so the schema default of 100 comes back:

    Name: Robert
    Health:  60
    Armor: 100

Slide 18

Slide 18 text

Using FlatBuffers
Here's how the object looks on disk:

      8 bytes: Offset of the Player object
     12 bytes: Player object metadata
      4 bytes: Name string metadata
      4 bytes: Health
      4 bytes: Armor
    + 8 bytes: "Robert" and padding
    ------------------------------------
     40 bytes

The philosophy is simple
FlatBuffers generates code that uses just a few jumps to get any element you want. We use pointer arithmetic to skip the parsing step completely.
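This hand-rolled sketch makes the "few jumps" concrete by locating a scalar field in the FlatBuffers wire format. First comes a 32-bit root offset to the table. The table then starts with a signed 32-bit offset back to its vtable. The vtable holds a 16-bit offset per field, which is 0 when the field was not written. This is for illustration only; real programs should use the generated accessors and the github.com/google/flatbuffers/go runtime. The 20-byte buffer is built by hand and is not byte-for-byte the robert.player file. playerLike and readInt16Field are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// FlatBuffers data is little-endian.
var le = binary.LittleEndian

// playerLike is a hand-built buffer for a table shaped like Player in which
// only health (field id 1) is stored; name (id 0) and armor (id 2) are absent.
var playerLike = []byte{
	12, 0, 0, 0, // root uoffset: the table starts at byte 12
	8, 0, // vtable @4: vtable size in bytes
	8, 0, // inline size of the table in bytes
	0, 0, // id 0 (name): 0 means absent
	4, 0, // id 1 (health): stored 4 bytes into the table
	8, 0, 0, 0, // table @12: soffset, so the vtable is at 12 - 8 = 4
	60, 0, // health = 60
	0, 0, // padding
}

// readInt16Field finds a scalar field with a few jumps: root offset to table,
// table to vtable, vtable entry to field. Absent fields, including ids past
// the end of the vtable, return def.
func readInt16Field(buf []byte, id int, def int16) int16 {
	table := int(le.Uint32(buf))                         // jump 1: root table
	vtable := table - int(int32(le.Uint32(buf[table:]))) // jump 2: signed offset to vtable
	vtSize := int(le.Uint16(buf[vtable:]))
	entry := 4 + 2*id // skip the two uint16 vtable header fields
	if entry >= vtSize {
		return def
	}
	fieldOff := int(le.Uint16(buf[vtable+entry:]))
	if fieldOff == 0 {
		return def
	}
	return int16(le.Uint16(buf[table+fieldOff:])) // jump 3: the value itself
}

func main() {
	fmt.Println("health:", readInt16Field(playerLike, 1, 100)) // health: 60
	fmt.Println("armor:", readInt16Field(playerLike, 2, 100))  // armor: 100
}
```

The absent-field path explains why Armor() returns 100 for Robert: no armor value was written, so the reader falls back to the schema default.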

Slide 19

Slide 19 text

Speed: Part 2

Speed: The big picture
We're not just talking about mobile anymore. Every computer is resource-constrained. At scale, inefficiencies add up to tremendous amounts of energy, time, and money.

Slide 20

Slide 20 text

FlatBuffers can help
Orders of magnitude faster
Read-only microbenchmarks on a small dataset:

    Library                   ops/sec  nanoseconds/op
    FlatBuffers            12,500,000              80
    Protocol Buffers LITE       3,311         302,000
    RapidJSON                   1,718         583,000
    pugixml                     5,102         196,000

Over 1,000 times faster than Protocol Buffers. These numbers come from the C++ version; we hope to make the Go version comparably fast.
Source: google.github.io/flatbuffers/md__benchmarks.html (https://google.github.io/flatbuffers/md__benchmarks.html)

Slide 21

Slide 21 text

More reasons to use FlatBuffers
Not only is it fast, it also supports:
- Schema versioning
- Union fields
- Default values
- Inline structs
- Variable-length vectors

Available in: C++, C#, Java, Go

Hackable
- 2,200+ GitHub stars
- Clearly written, easy to comprehend
- Stable wire format
- Unit test suite
- Fuzz test suites

Few lines of code

    C++              3109
    C/C++ Header     1179
    Go                724
    ---------------------
    Total            5012

Slide 22

Slide 22 text

Use FlatBuffers today!
- Documentation: google.github.io/flatbuffers (https://google.github.io/flatbuffers)
- Source code: github.com/google/flatbuffers (https://github.com/google/flatbuffers)
- Go runtime library: go get github.com/google/flatbuffers/go
- Schema compiler: git clone https://github.com/google/flatbuffers.git

Thank you
Robert Winslow, Programmer
http://rwinslow.com/ (http://rwinslow.com/)
@robert_winslow (http://twitter.com/robert_winslow)