FlatBuffers for Go

Hakka Labs
February 13, 2015

Full post here:

Transcript

  1. FlatBuffers for Go: Fast and Fun Serialization

    21 January 2015
    Robert Winslow, Programmer

    What we'll cover today

    - Serialization basics
    - Why we need another serialization format
    - What makes FlatBuffers special
    - Example code and usage
  2. What is a serialization format?

    A standard way to store structured data, then read it back. Examples:

    - JSON
    - Protocol Buffers
    - Thrift
    - XML

    Serialization

    Who here spends a lot of time interacting with serialized data?
  3. Serialization

    Trick question: everybody. Serialization standards are what let us make sense of sequences of bytes.

    Why a new serialization format?

    Android game developers at Google needed a better way to store data.

    - Games are demanding applications.
    - Memory bottlenecks are bad.
    - Using too much CPU wastes battery life.
  4. Why a new serialization format?

    The primary alternative was Protocol Buffers. Protocol Buffers is a major open source serialization project from Google. Who here has used Protocol Buffers?

    Why a new serialization format?

    Good things about Protocol Buffers:

    - Robust.
    - Secure.
    - Popular.

    "Nobody ever got fired for choosing Protocol Buffers."
  5. Why a new serialization format?

    Bad things about Protocol Buffers:

    - Allocates temporary objects to unpack data.
    - No direct random access.
    - Poor data locality.
    - Large codebase (3.8MB of code).
    - Slow.
    - Rumors of cost at scale...

    Why a new serialization format?

    The Fun Propulsion Lab is a tooling group inside Android. They decided to try a new approach.
  6. Why a new serialization format?

    They asked, could we build a serialization format that:

    - Is simple,
    - Versions your data with a schema,
    - Enables random access,
    - And is ridiculously fast?

    They tried and succeeded. The result is FlatBuffers.

    How fast? By the numbers

    Read-only microbenchmarks on a small dataset:

        Library                  ops/sec       nanoseconds/op
        FlatBuffers              12,500,000    80
        Protocol Buffers LITE    3,311         302,000
        Rapid JSON               1,718         583,000
        pugixml                  5,102         196,000

    Over 1,000 times faster than Protocol Buffers. This uses the C++ version; we hope to make the Go version comparably fast.

    Source: https://google.github.io/flatbuffers/md__benchmarks.html
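
The two columns of the benchmark table are related by simple arithmetic: throughput is one second divided by the per-operation latency. The sketch below redoes that conversion for two rows and computes the ratio behind the "over 1,000 times" claim. The helper names `opsPerSec` and `speedup` are illustrative and not part of any benchmark tooling.

```go
package main

import "fmt"

// opsPerSec converts a per-operation latency in nanoseconds into
// a throughput in operations per second.
func opsPerSec(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

// speedup reports how many times shorter fastNs is than slowNs.
func speedup(fastNs, slowNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	fmt.Printf("FlatBuffers:           %.0f ops/sec\n", opsPerSec(80))
	fmt.Printf("Protocol Buffers LITE: %.0f ops/sec\n", opsPerSec(302000))
	fmt.Printf("Speedup:               %.0fx\n", speedup(80, 302000))
}
```

With the table's figures, the ratio works out to 3,775, so "over 1,000 times" is a conservative summary.
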
  7. What is FlatBuffers?

    - New serialization standard that is both featureful and fast.
    - Open-source project from Google, released in 2014.
    - Created and maintained by Wouter van Oortmerssen, a programming language creator and game developer.
    - Licensed under the permissive Apache v2 license.

    The big idea (TL;DR)
  8. The big idea (TL;DR)

    FlatBuffer files are statically-typed, schema-versioned, portable structs.

    Wisdom

    "Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."
    - Rob Pike, Rob Pike's 5 Rules of Programming
  9. Speed: Part 1

    Why care about speed?

    Computers are "fast enough". Orders of magnitude still matter. What if you could use 100 computers instead of 1,000 to do a task?
  10. What makes FlatBuffers so fast at read operations?

    A different approach:

    - No memory allocations.
    - Tight packing of data is friendly to CPU caches (L1, L2, L3).
    - Minimal code on hot execution paths (CPU instruction cache).

    The philosophy is different

    FlatBuffers relies on pointer arithmetic to read data without allocating intermediate objects. No calls to make / malloc!
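
One way to see the "no allocations" point is to count heap allocations directly. The sketch below is not FlatBuffers code: it compares a hypothetical in-place reader (`readInPlace`) with a hypothetical unpack-first reader (`unpack`) on a plain buffer of little-endian int32 values, using `testing.AllocsPerRun` from the standard library. All names here are illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// readInPlace returns the i-th little-endian int32 straight from buf.
func readInPlace(buf []byte, i int) int32 {
	return int32(binary.LittleEndian.Uint32(buf[i*4:]))
}

// unpack decodes every value into a freshly allocated slice.
func unpack(buf []byte) []int32 {
	out := make([]int32, len(buf)/4)
	for i := range out {
		out[i] = int32(binary.LittleEndian.Uint32(buf[i*4:]))
	}
	return out
}

// Package-level sinks keep the compiler from discarding the results.
var (
	sinkValue int32
	sinkSlice []int32
)

// measureAllocs reports average heap allocations per call for each approach.
func measureAllocs() (inPlace, unpacked float64) {
	buf := make([]byte, 4096)
	binary.LittleEndian.PutUint32(buf[40:], 7) // element 10 holds 7

	inPlace = testing.AllocsPerRun(1000, func() {
		sinkValue = readInPlace(buf, 10)
	})
	unpacked = testing.AllocsPerRun(1000, func() {
		sinkSlice = unpack(buf)
		sinkValue = sinkSlice[10]
	})
	return inPlace, unpacked
}

func main() {
	inPlace, unpacked := measureAllocs()
	fmt.Printf("in-place read: %.0f allocs/op\n", inPlace)
	fmt.Printf("unpack first:  %.0f allocs/op\n", unpacked)
}
```

The in-place reader should report zero allocations per call, while the unpacking reader pays for at least one slice allocation on every call.
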
  11. Minimal serialization of an array

    An array is a sequence of fixed-width elements. Use pointer arithmetic to find the data you want.

    Minimal serialization of an array

    A small array of int32:

        2 3 5 7 (first four primes)

    Bytes for representing four numbers, each 4 bytes wide:

        2 0 0 0   3 0 0 0   5 0 0 0   7 0 0 0 (little-endian)
  12. Minimal serialization of an array

    Given the buffer:

        buf := []byte{2, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0}

    Get the i-th value:

        import "unsafe"

        // Get: Read a little-endian int32 from a buffer.
        // The pointer cast assumes a little-endian host CPU.
        func Get(i int, buf []byte) int32 {
            offset := i * 4
            data := buf[offset : offset+4]
            var n int32 = *(*int32)(unsafe.Pointer(&data[0]))
            return n
        }

        Get(0, buf) // 2
        Get(1, buf) // 3
        Get(2, buf) // 5
        Get(3, buf) // 7

    Minimal serialization of a struct

    A struct is just a heterogeneous group of fixed-width elements. Use pointer arithmetic to find the data you want.
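
The `unsafe` cast in the array slide reads bytes in the host CPU's byte order and does no bounds checking of its own. As a hedged alternative, the sketch below reads the same first-four-primes buffer with the standard `encoding/binary` package, which decodes little-endian data on any host. `GetInt32` and `errOutOfRange` are illustrative names, not part of the FlatBuffers library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errOutOfRange is returned when the index falls outside the buffer.
var errOutOfRange = errors.New("index out of range")

// GetInt32 reads the i-th little-endian int32 from buf without copying
// the buffer or depending on the host's byte order.
func GetInt32(i int, buf []byte) (int32, error) {
	offset := i * 4
	if i < 0 || offset+4 > len(buf) {
		return 0, errOutOfRange
	}
	return int32(binary.LittleEndian.Uint32(buf[offset:])), nil
}

func main() {
	// The first four primes, each stored as a 4-byte little-endian int32.
	buf := []byte{2, 0, 0, 0, 3, 0, 0, 0, 5, 0, 0, 0, 7, 0, 0, 0}
	for i := 0; i < 5; i++ {
		n, err := GetInt32(i, buf)
		if err != nil {
			fmt.Printf("Get(%d): %v\n", i, err)
			continue
		}
		fmt.Printf("Get(%d) = %d\n", i, n)
	}
}
```

Indexes 0 through 3 yield 2, 3, 5, and 7; index 4 reports an error instead of reading past the end of the buffer.
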
  13. Minimal serialization of a struct

    Given the struct type Particle:

        type Particle struct {
            X   int16   // bytes: 0, 1
            Y   int16   // bytes: 2, 3
            Z   int16   // bytes: 4, 5
            RGB [3]byte // bytes: 6, 7, 8
        }

    Stored in this buffer:

        buf := []byte{1, 0, 2, 0, 3, 0, 128, 0, 192}

    Get the X value:

        func GetX(buf []byte) (n int16) {
            data := buf[:2]
            n = *(*int16)(unsafe.Pointer(&data[0]))
            return
        }

    Minimal serialization of a struct

    With the same Particle type and buffer, get the Y value:

        func GetY(buf []byte) (n int16) {
            data := buf[2:4]
            n = *(*int16)(unsafe.Pointer(&data[0]))
            return
        }
  14. Minimal serialization of a struct

    With the same Particle type and buffer, get the Z value:

        func GetZ(buf []byte) (n int16) {
            data := buf[4:6]
            n = *(*int16)(unsafe.Pointer(&data[0]))
            return
        }

    Minimal serialization of a struct

    With the same Particle type and buffer, get the RGB value:

        func GetRGB(buf []byte) (rgb [3]byte) {
            data := buf[6:9]
            rgb = *(*[3]byte)(unsafe.Pointer(&data[0]))
            return
        }
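
The four Particle getters can be gathered into a single view type over the byte slice. This is a sketch, assuming the 9-byte little-endian layout from the slides; `ParticleView` is an illustrative name. It uses `encoding/binary` rather than `unsafe`, and reads each field separately because a Go `Particle` value in memory occupies 10 bytes (the compiler pads the struct to its 2-byte alignment), so the serialized record cannot simply be cast to the struct.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ParticleView reads Particle fields in place from a 9-byte record:
// X at bytes 0-1, Y at 2-3, Z at 4-5 (little-endian int16), RGB at 6-8.
type ParticleView []byte

// X returns the first coordinate.
func (p ParticleView) X() int16 { return int16(binary.LittleEndian.Uint16(p[0:2])) }

// Y returns the second coordinate.
func (p ParticleView) Y() int16 { return int16(binary.LittleEndian.Uint16(p[2:4])) }

// Z returns the third coordinate.
func (p ParticleView) Z() int16 { return int16(binary.LittleEndian.Uint16(p[4:6])) }

// RGB returns the three color bytes.
func (p ParticleView) RGB() [3]byte { return [3]byte{p[6], p[7], p[8]} }

func main() {
	p := ParticleView{1, 0, 2, 0, 3, 0, 128, 0, 192}
	fmt.Println(p.X(), p.Y(), p.Z(), p.RGB()) // 1 2 3 [128 0 192]
}
```
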
  15. The philosophy is simple

    Just like arrays and structs, the FlatBuffers library uses pointer arithmetic. Every FlatBuffer is read in-place with pointer arithmetic operations.

    Show me code
  16. Using FlatBuffers

    Here's an example schema:

        // player.fbs
        namespace Game;

        table Player {
          name:string (id: 0, required);
          health:short = 100 (id: 1);
          armor:short = 100 (id: 2);
        }

        root_type Player;
        file_identifier "PLAY";
        file_extension "player";

    Using FlatBuffers

    Feed the schema to the FlatBuffers generator to create accessor code:

        flatc -g player.fbs

    It creates one Go file used to work with data:

        Game
        └── Player.go

    It's a relatively small file:

        $ wc -c Game/Player.go
        1441 Game/Player.go
  17. Using FlatBuffers

    To create data, use the generated builder functions:

        builder := flatbuffers.NewBuilder(0)
        name := builder.CreateString("Robert")

        game.PlayerStart(builder)
        game.PlayerAddName(builder, name)
        game.PlayerAddHealth(builder, 60)

        // Tell FlatBuffers we are finished writing this object:
        player := game.PlayerEnd(builder)
        builder.Finish(player)

        // Save the backing byte buffer to a file:
        buf := builder.Bytes[builder.Head():]
        err := ioutil.WriteFile("robert.player", buf, 0666)
        if err != nil {
            log.Fatal(err)
        }

    It generates a tiny file:

        $ wc -c robert.player
        40 robert.player

    Using FlatBuffers

    To read data, use generated getter functions:

        // Load the buffer from the file:
        buf, err := ioutil.ReadFile("robert.player")
        if err != nil {
            log.Fatal(err)
        }

        // Initialize FlatBuffers code to use the data:
        player := game.GetRootAsPlayer(buf, 0)

        // Print the data we saved:
        fmt.Printf("Name: %s\n", player.Name())
        fmt.Printf("Health: %3d\n", player.Health())
        fmt.Printf("Armor: %3d\n", player.Armor())

    Prints the data we saved:

        Name: Robert
        Health: 60
        Armor: 100
  18. Using FlatBuffers

    Here's how the object looks on disk:

             8 bytes: Offset of the Player object
            12 bytes: Player object metadata
             4 bytes: Name string metadata
             4 bytes: Health
             4 bytes: Armor
          +  8 bytes: "Robert" and padding
          ---------------------------------------
            40 bytes

    The philosophy is simple

    FlatBuffers generates code that uses just a few jumps to get any element you want. We use pointer arithmetic to skip the parsing step completely.
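
To make "a few jumps" concrete, here is a simplified, hand-written sketch of how a reader can walk a FlatBuffers-style buffer: a root offset leads to the table, the table points back to a vtable, and the vtable gives each field's position (or 0 when the field is absent and the schema default applies). The buffer below was laid out by hand for illustration and is 44 bytes, so it is not byte-for-byte the 40-byte file from the slide. The helpers `fieldPos`, `name`, and `int16Field` are hypothetical stand-ins for what generated code does, and bounds checks are omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

var le = binary.LittleEndian

// A hand-built buffer holding Player{name: "Robert", health: 60}; armor is
// left out so that its default applies.
var playerBuf = []byte{
	20, 0, 0, 0, // 0:  offset of the root table
	'P', 'L', 'A', 'Y', // 4:  file identifier
	10, 0, // 8:  vtable size in bytes
	12, 0, // 10: table size in bytes
	4, 0, // 12: slot 0 (name) is 4 bytes into the table
	8, 0, // 14: slot 1 (health) is 8 bytes into the table
	0, 0, // 16: slot 2 (armor) is absent
	0, 0, // 18: padding
	12, 0, 0, 0, // 20: table start: distance back to the vtable
	8, 0, 0, 0, // 24: name: offset from here to the string
	60, 0, // 28: health
	0, 0, // 30: padding
	6, 0, 0, 0, // 32: string length
	'R', 'o', 'b', 'e', 'r', 't', // 36: string bytes
	0, 0, // 42: terminator and padding
}

// fieldPos returns the absolute position of a field, or 0 if it is absent.
func fieldPos(buf []byte, table, slot int) int {
	vtable := table - int(int32(le.Uint32(buf[table:]))) // jump: table -> vtable
	entry := 4 + 2*slot                                  // skip the two size fields
	if entry >= int(le.Uint16(buf[vtable:])) {
		return 0 // written by an older schema that lacks this field
	}
	off := int(le.Uint16(buf[vtable+entry:]))
	if off == 0 {
		return 0
	}
	return table + off
}

// name returns the string bytes in place, without copying them.
func name(buf []byte, table int) []byte {
	pos := fieldPos(buf, table, 0)
	str := pos + int(le.Uint32(buf[pos:])) // jump: field -> string
	n := int(le.Uint32(buf[str:]))
	return buf[str+4 : str+4+n]
}

// int16Field reads a short field, falling back to def when it is absent.
func int16Field(buf []byte, table, slot int, def int16) int16 {
	if pos := fieldPos(buf, table, slot); pos != 0 {
		return int16(le.Uint16(buf[pos:]))
	}
	return def
}

func main() {
	table := int(le.Uint32(playerBuf)) // jump: start of buffer -> root table
	fmt.Printf("Name: %s\n", name(playerBuf, table))
	fmt.Printf("Health: %d\n", int16Field(playerBuf, table, 1, 100))
	fmt.Printf("Armor: %d\n", int16Field(playerBuf, table, 2, 100))
}
```

Reading any field costs a fixed number of offset lookups regardless of buffer size, and the absent armor slot shows how a schema default (100) is returned without being stored.
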
  19. Speed: Part 2

    Speed: The big picture

    We're not just talking about mobile anymore. Every computer is resource-constrained. At scale, inefficiencies add up to tremendous amounts of energy, time, and money.
  20. FlatBuffers can help

    Orders of magnitude faster

    Read-only microbenchmarks on a small dataset:

        Library                  ops/sec       nanoseconds/op
        FlatBuffers              12,500,000    80
        Protocol Buffers LITE    3,311         302,000
        Rapid JSON               1,718         583,000
        pugixml                  5,102         196,000

    Over 1,000 times faster than Protocol Buffers. This uses the C++ version; we hope to make the Go version comparably fast.

    Source: https://google.github.io/flatbuffers/md__benchmarks.html
  21. More reasons to use FlatBuffers

    Not only is it fast, it also supports

    - Schema versioning
    - Union fields
    - Default values
    - Inline structs
    - Variable-length vectors

    Available in

    - C++
    - C#
    - Java
    - Go

    Hackable

    - 2,200+ GitHub stars
    - Clearly written, easy to comprehend
    - Stable wire format
    - Unit test suite
    - Fuzz test suites

    Few lines of code

        C++             3109
        C/C++ Header    1179
        Go               724
        ---------------------
        Total           5012
  22. Use FlatBuffers today!

    Documentation: https://google.github.io/flatbuffers
    Source code: https://github.com/google/flatbuffers
    Go runtime library: go get github.com/google/flatbuffers/go
    Schema compiler: git clone https://github.com/google/flatbuffers.git

    Thank you

    Robert Winslow, Programmer
    http://rwinslow.com/
    @robert_winslow (http://twitter.com/robert_winslow)