
Bencode - serializer and deserializer in Go

Oleg Kovalov

January 28, 2021

Transcript

  1. Hi, I’m Oleg • Gopher for ~5 years • Open source contributor • Engineer at GoGoApps • olegk.dev • @oleg_kovalov • @cristaloleg
  3. Before we start... • Let's treat them as one thing: • Encode - Decode • Serialize - Deserialize • Marshal - Unmarshal • Is there any _real_ difference? • I don't know ¯\_(ツ)_/¯
  5. Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc. • Fast to process • in both directions • Memory conservative • small memory consumption
  6. Classification • Schema-driven • Protobuf, FlatBuffers, Cap'n Proto, Avro, ... • Binary • Gob, Msgpack, BSON, ... • Text • JSON, XML, YAML, ...
  8. Schema-driven • Protobuf, FlatBuffers, Cap'n Proto, Avro, ... • What is good? • Binary, so compact • Fast because of codegen • What is not? • You must have a schema • Tooling and codegen • Both good and bad • Cross-language compatibility
  10. Binary • Gob, Msgpack, BSON • What is good? • Schema, kind of • Fast because the data is raw • What is not? • Not for humans • Limited cross-language compatibility
  12. Text • JSON, XML, YAML, ... • What is good? • Super easy to read(?) • Easy to integrate • What is not? • Parsing can be painful • Not so compact • Losing data types
  14. Welcome Bencode • "Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data." • info by lovely Wikipedia • Horribly unpopular • used only in torrents; JSON conquered the world • But still an interesting thing to make!
  17. It's simple! • Int as 'i<integer>e' • 42 == i42e • String as '<length>:<contents>' • "hello" == 5:hello • List as 'l<contents>e' • ["hello", 78] == l5:helloi78ee • Map as 'd<contents>e' (keys are sorted) • {"bar": "hello", "foo": 78} == d3:bar5:hello3:fooi78ee
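The four rules are enough to write an encoder. Below is a minimal sketch, not the talk's code: the helper `encode` is hypothetical and assumes values are limited to `int`, `string`, `[]interface{}` and `map[string]interface{}`. Dict keys are sorted with `sort.Strings`, which compares raw bytes, matching bencode's rule that keys are sorted as raw strings.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"strconv"
)

// encode writes the bencoded form of v into buf. Hypothetical helper: only
// int, string, []interface{} and map[string]interface{} are supported.
func encode(buf *bytes.Buffer, v interface{}) error {
	switch v := v.(type) {
	case int: // Int as 'i<integer>e'
		buf.WriteByte('i')
		buf.WriteString(strconv.Itoa(v))
		buf.WriteByte('e')
	case string: // String as '<length>:<contents>'; the length counts bytes
		buf.WriteString(strconv.Itoa(len(v)))
		buf.WriteByte(':')
		buf.WriteString(v)
	case []interface{}: // List as 'l<contents>e'
		buf.WriteByte('l')
		for _, item := range v {
			if err := encode(buf, item); err != nil {
				return err
			}
		}
		buf.WriteByte('e')
	case map[string]interface{}: // Map as 'd<contents>e' with sorted keys
		keys := make([]string, 0, len(v))
		for k := range v {
			keys = append(keys, k)
		}
		sort.Strings(keys) // byte-wise order
		buf.WriteByte('d')
		for _, k := range keys {
			encode(buf, k) // a string key cannot fail
			if err := encode(buf, v[k]); err != nil {
				return err
			}
		}
		buf.WriteByte('e')
	default:
		return fmt.Errorf("bencode: unsupported type %T", v)
	}
	return nil
}

// encodeString runs encode on a fresh buffer and returns the text.
func encodeString(v interface{}) (string, error) {
	var buf bytes.Buffer
	err := encode(&buf, v)
	return buf.String(), err
}

func main() {
	examples := []interface{}{
		42,
		"hello",
		[]interface{}{"hello", 78},
		map[string]interface{}{"foo": 78, "bar": "hello"},
	}
	for _, v := range examples {
		s, err := encodeString(v)
		fmt.Println(s, err)
	}
}
```

The map is written with `"foo"` first on purpose: the output still starts with `3:bar`, because key order comes from sorting, not from the Go map.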
  20. What about the API? • The Go stdlib suggests: • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error • Looks good • Same as JSON, YAML, XML and everything else in the Go world • But is this enough? • Is this optimal? • Can we do better?
  22. Make it enough • Cozy helpers • func Marshal(v interface{}) ([]byte, error) • NewEncoder + Encode on param v • func Unmarshal(data []byte, v interface{}) error • NewDecoder + Decode on params data and v
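The helpers are thin wrappers: `Marshal` points an `Encoder` at a `bytes.Buffer`, and `Unmarshal` points a `Decoder` at a `bytes.Reader`. In this sketch the `Encoder` and `Decoder` are deliberately tiny and are assumptions, not the library's actual types: only `int` and `string` are handled, and the decoder's `*int`/`*string` targets show why `v` has to be an `interface{}`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// Encoder and Decoder are deliberately tiny: only int and string are handled.
type Encoder struct{ w io.Writer }

func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }

func (e *Encoder) Encode(v interface{}) error {
	switch v := v.(type) {
	case int:
		_, err := fmt.Fprintf(e.w, "i%de", v)
		return err
	case string:
		_, err := fmt.Fprintf(e.w, "%d:%s", len(v), v)
		return err
	}
	return fmt.Errorf("bencode: unsupported type %T", v)
}

type Decoder struct{ r *bufio.Reader }

func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: bufio.NewReader(r)} }

// Decode fills a *int or *string target.
func (d *Decoder) Decode(v interface{}) error {
	switch p := v.(type) {
	case *int:
		if c, err := d.r.ReadByte(); err != nil || c != 'i' {
			return fmt.Errorf("bencode: expected an integer")
		}
		s, err := d.r.ReadString('e')
		if err != nil {
			return err
		}
		*p, err = strconv.Atoi(s[:len(s)-1])
		return err
	case *string:
		s, err := d.r.ReadString(':')
		if err != nil {
			return err
		}
		n, err := strconv.Atoi(s[:len(s)-1])
		if err != nil || n < 0 || n > 1<<20 { // arbitrary size cap for the sketch
			return fmt.Errorf("bencode: bad string length %q", s)
		}
		buf := make([]byte, n)
		if _, err := io.ReadFull(d.r, buf); err != nil {
			return err
		}
		*p = string(buf)
		return nil
	}
	return fmt.Errorf("bencode: unsupported target %T", v)
}

// Marshal: NewEncoder + Encode on param v, writing into an in-memory buffer.
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Unmarshal: NewDecoder + Decode on params data and v.
func Unmarshal(data []byte, v interface{}) error {
	return NewDecoder(bytes.NewReader(data)).Decode(v)
}

func main() {
	data, err := Marshal("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	var s string
	if err := Unmarshal(data, &s); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data), s)
}
```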
  24. Make it optimal • Help the runtime manage memory • Marshal(v interface{}) ([]byte, error) • MarshalTo(dst []byte, v interface{}) ([]byte, error) • Also useful: interfaces • type Marshaler interface • MarshalBencode() ([]byte, error) • type Unmarshaler interface • UnmarshalBencode([]byte) error
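`MarshalTo` can follow the append convention of `strconv.AppendInt`: it writes into the caller's slice and returns the possibly grown result, so one buffer can be reused via `dst[:0]`. This is a minimal sketch assuming only `int`, `string`, `[]interface{}` and the slide's `Marshaler` interface are supported. `Version` is a hypothetical user type.

```go
package main

import (
	"fmt"
	"strconv"
)

// Marshaler is the interface from the slide: a type that encodes itself.
type Marshaler interface {
	MarshalBencode() ([]byte, error)
}

// MarshalTo appends the encoding of v to dst and returns the extended slice.
// Passing dst[:0] reuses an existing backing array.
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch v := v.(type) {
	case Marshaler:
		b, err := v.MarshalBencode()
		if err != nil {
			return dst, err
		}
		return append(dst, b...), nil
	case int:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(v), 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(v)), 10)
		dst = append(dst, ':')
		return append(dst, v...), nil
	case []interface{}:
		dst = append(dst, 'l')
		for _, item := range v {
			var err error
			if dst, err = MarshalTo(dst, item); err != nil {
				return dst, err
			}
		}
		return append(dst, 'e'), nil
	}
	return dst, fmt.Errorf("bencode: unsupported type %T", v)
}

// Marshal is MarshalTo without a caller-provided buffer.
func Marshal(v interface{}) ([]byte, error) {
	return MarshalTo(nil, v)
}

// Version is a hypothetical user type that implements Marshaler.
type Version struct{ Major, Minor int }

func (v Version) MarshalBencode() ([]byte, error) {
	return Marshal(fmt.Sprintf("%d.%d", v.Major, v.Minor))
}

func main() {
	buf := make([]byte, 0, 64)
	for _, v := range []interface{}{42, "hello", []interface{}{"hello", 78}, Version{1, 2}} {
		var err error
		if buf, err = MarshalTo(buf[:0], v); err != nil { // same backing array each time
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(buf))
	}
}
```

Note that the `Marshaler` branch still copies a slice that `MarshalBencode` had to allocate, which is the cost the following slide goes after.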
  26. Make it better • type Marshaler interface • MarshalBencode() ([]byte, error) • but this API forces the user to allocate a byte slice... • MarshalBencodeTo(w io.Writer) error • the encoder's writer is passed explicitly • no intermediate allocation is needed • great!
  27. Sorted keys in dict • The number of keys is usually small • Let's use this • Sorting fewer than 20 items • Insertion sort rocks here
  28. Check your bounds • Go is a memory-safe language • You cannot access elements outside a slice or array • you'll get a panic, which is good • well, you can access anything via the unsafe package • The smart compiler adds Bounds Checks™ to prevent bad access • The process of removing unneeded checks is called Bounds Check Elimination (BCE) • How is this related to our fast small sort?
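Here is a sketch of two common ways to help BCE. The first is an early `_ = b[3]` access, the same hint `encoding/binary` uses in its byte-order readers. The second re-slices two slices to a common length before a loop. The function names are hypothetical. Whether a given check is actually removed depends on the compiler version, so verify it. The go-perftuner command on the next slide does that, and so does `go build -gcflags=-d=ssa/check_bce/debug=1`, which reports the bounds checks that remain.

```go
package main

import "fmt"

// readUint32 decodes 4 little-endian bytes. The first line checks the bound
// once; after it the compiler can prove b[0]..b[3] are in range and may drop
// the four separate checks.
func readUint32(b []byte) uint32 {
	_ = b[3] // bounds check hint: panics right here if len(b) < 4
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// dot multiplies pairs up to the shorter length. Re-slicing both inputs to
// the same n may let the compiler see that i < len(a) also means i < len(b).
func dot(a, b []int) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	a, b = a[:n], b[:n]
	sum := 0
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(readUint32([]byte{0x78, 0x56, 0x34, 0x12}) == 0x12345678)
	fmt.Println(dot([]int{1, 2, 3}, []int{4, 5, 6}))
}
```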
  29. Detecting bounds checks • go-perftuner boundChecks . • boundChecks: .: ./util.go:3:26: slice/array has bound checks • boundChecks: .: ./util.go:4:23: slice/array has bound checks • boundChecks: .: ./util.go:4:32: slice/array has bound checks
  30. Ah, also fuzzing • Randomized testing for Go • 1 repo: tons of bugs found and later fixed • https://github.com/dvyukov/go-fuzz • Super useful for deserializers • Don't run it on CI in most cases
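go-fuzz calls a function with the signature `func Fuzz(data []byte) int` and reports any panic as a crash; returning 1 marks the input as interesting and 0 as uninteresting. Below is a minimal sketch of a target for a deserializer, built around a hypothetical `decodeString` helper for bencode strings. It rejects anything that fails to parse, and panics if a successfully decoded string does not survive an encode/decode round trip. Since Go 1.18 the same property can also be written as a native `FuzzXxx(f *testing.F)` test.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// decodeString parses '<length>:<contents>' from the front of data and
// returns the contents plus the unread rest. Hypothetical helper.
func decodeString(data []byte) (string, []byte, error) {
	colon := bytes.IndexByte(data, ':')
	if colon < 1 {
		return "", nil, errors.New("bencode: missing length")
	}
	n, err := strconv.Atoi(string(data[:colon]))
	if err != nil || n < 0 {
		return "", nil, errors.New("bencode: bad length")
	}
	rest := data[colon+1:]
	if n > len(rest) { // never trust the length prefix: check it against the input
		return "", nil, errors.New("bencode: string too short")
	}
	return string(rest[:n]), rest[n:], nil
}

func encodeString(s string) []byte {
	b := strconv.AppendInt(nil, int64(len(s)), 10)
	b = append(b, ':')
	return append(b, s...)
}

// Fuzz follows go-fuzz's entry-point convention.
func Fuzz(data []byte) int {
	s, _, err := decodeString(data)
	if err != nil {
		return 0 // not valid input, nothing more to check
	}
	// Round-trip property: decoding what we encoded must give back s exactly.
	again, rest, err := decodeString(encodeString(s))
	if err != nil || again != s || len(rest) != 0 {
		panic(fmt.Sprintf("round-trip mismatch for %q", data))
	}
	return 1
}

func main() {
	for _, in := range []string{"5:hello", "0:", "3:ab", "x", "-1:"} {
		fmt.Println(in, Fuzz([]byte(in)))
	}
}
```

The `n > len(rest)` check matters because the length prefix is untrusted input, and fuzzers are good at finding decoders that trust it. A stricter property would compare the re-encoded bytes with the consumed input. That would also flag that `strconv.Atoi` accepts non-canonical lengths such as `+5` or `05`.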
  31. Benchmarks • Hard to get right • There is no silver bullet (wat?) • Don't allocate unless you need to • Using one value and forgetting the rest • Look not only at speed (ns/op) but also at allocs/op and B/op
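Go's `testing.Benchmark` runs a benchmark function outside `go test` and reports allocations alongside time. The sketch below compares allocating a fresh slice on every call with reusing one buffer. It relies on a hypothetical `appendInt` helper, and the package-level `sink` keeps results alive so the compiler cannot discard the work being measured. Under `go test`, the `-benchmem` flag or `b.ReportAllocs()` adds the same B/op and allocs/op columns.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// appendInt encodes n as 'i<n>e' into dst (hypothetical helper).
func appendInt(dst []byte, n int64) []byte {
	dst = append(dst, 'i')
	dst = strconv.AppendInt(dst, n, 10)
	return append(dst, 'e')
}

// sink keeps results reachable so the measured work is not optimized away.
var sink []byte

func main() {
	fresh := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = appendInt(nil, int64(i)) // new slice on every iteration
		}
	})
	reused := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		buf := make([]byte, 0, 32)
		for i := 0; i < b.N; i++ {
			buf = appendInt(buf[:0], int64(i)) // same backing array every time
		}
		sink = buf
	})
	fmt.Println("fresh: ", fresh.String(), fresh.MemString())
	fmt.Println("reused:", reused.String(), reused.MemString())
}
```

Both loops do the same formatting work, so any gap in allocs/op and B/op between the two lines comes from buffer handling alone.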
  32. Also see • Other projects: https://github.com/cristalhq • go-perftuner: https://github.com/cristaloleg/go-perftuner • Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020: https://www.youtube.com/watch?v=YIOuEFjCmXE