Bencode - serializer and deserializer in Go

Oleg Kovalov

January 28, 2021

Transcript

  1. Bencode serializer and deserializer in Go • Warsaw, January 2021 • Oleg Kovalov
  2. Hi, I'm Oleg • Gopher for ~5 years • Open source contributor • Engineer at GoGoApps • olegk.dev @oleg_kovalov @cristaloleg
  3. Before we start...

  5. Before we start... • Let's treat them as 1 thing • Encode - Decode • Serialize - Deserialize • Marshal - Unmarshal • Is there any _real_ difference? • I don't know ¯\_(ツ)_/¯
  6. Serializer & deserializer

  9. Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc. • Fast to process • in both directions • Memory conservative • small memory consumption
  10. Serializer.png

  11. Deserializer.png

  12. Classification

  15. Classification • Schema-driven • Protobuf, FlatBuffers, Cap'n Proto, Avro, ... • Binary • Gob, Msgpack, BSON, ... • Text • JSON, XML, YAML, ...
  18. Schema-driven • Protobuf, FlatBuffers, Cap'n Proto, Avro, ... • What is good? • Binary, so compact • Fast because of codegen • What is not? • Must have a schema • Requires tooling and codegen • Both good and bad • Cross-language compatibility
  21. Binary • Gob, Msgpack, BSON • What is good? • Schema, kind of • Fast because it's raw data • What is not? • Not for humans • Cross-language compatibility
  24. Text • JSON, XML, YAML, ... • What is good? • Super easy to read(?) • Easy to integrate • What is not? • Parsing can be painful • Not so compact • Losing data types
  25. Welcome Bencode

  27. Welcome Bencode • Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data. • info by lovely Wikipedia • Horribly unpopular • used only in torrents; JSON conquered the world • But still an interesting thing to make!
  28. It's simple!

  32. It's simple! • Int as 'i<integer>e' • 42 == i42e • String as '<length>:<contents>' • "hello" == 5:hello • List as 'l<contents>e' • ["hello", 78] == l5:helloi78ee • Map as 'd<contents>e' (keys are sorted) • {"bar": "hello", "foo": 78} == d3:bar5:hello3:fooi78ee
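
The four rules on this slide fit in one small recursive function. The following is a minimal sketch, not the code from the talk: the name `appendValue` and the choice of supported Go types (`int`, `string`, `[]interface{}`, `map[string]interface{}`) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// appendValue appends the Bencode form of v to dst and returns the result.
// Hypothetical helper: only int, string, []interface{} and
// map[string]interface{} are supported in this sketch.
func appendValue(dst []byte, v interface{}) ([]byte, error) {
	var err error
	switch x := v.(type) {
	case int: // i<integer>e
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(x), 10)
		return append(dst, 'e'), nil
	case string: // <length>:<contents>
		dst = strconv.AppendInt(dst, int64(len(x)), 10)
		dst = append(dst, ':')
		return append(dst, x...), nil
	case []interface{}: // l<contents>e
		dst = append(dst, 'l')
		for _, item := range x {
			if dst, err = appendValue(dst, item); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	case map[string]interface{}: // d<contents>e, keys in sorted order
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		dst = append(dst, 'd')
		for _, k := range keys {
			dst, _ = appendValue(dst, k) // encoding a string key cannot fail
			if dst, err = appendValue(dst, x[k]); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	default:
		return nil, fmt.Errorf("bencode sketch: unsupported type %T", v)
	}
}
```

Calling `appendValue(nil, map[string]interface{}{"foo": 78, "bar": "hello"})` yields the dictionary example from the slide, because the keys are sorted before they are written.
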
  35. What about API? • The Go stdlib suggests • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error • Looks good • Same as JSON, YAML, XML and everything else in the Go world • But is this enough? • Is this optimal? • Can we do better?
  37. Make it enough • Cozy helpers • func Marshal(v interface{}) ([]byte, error) • NewEncoder + Encode on param v • func Unmarshal(data []byte, v interface{}) error • NewDecoder + Decode on params data and v
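
To make the layering concrete, here is a rough sketch of the stream API with `Marshal` and `Unmarshal` as thin wrappers around it. The function names and signatures come from the slides; the bodies are assumptions, and this version handles only `int64` and `string` to stay short.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// Encoder writes Bencode values to an output stream.
type Encoder struct{ w io.Writer }

// NewEncoder returns an Encoder that writes to w.
func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }

// Encode writes v. This sketch accepts only int64 and string.
func (e *Encoder) Encode(v interface{}) error {
	var err error
	switch x := v.(type) {
	case int64:
		_, err = fmt.Fprintf(e.w, "i%de", x)
	case string:
		_, err = fmt.Fprintf(e.w, "%d:%s", len(x), x)
	default:
		err = fmt.Errorf("bencode sketch: unsupported type %T", v)
	}
	return err
}

// Decoder reads Bencode values from an input stream.
type Decoder struct{ r *bufio.Reader }

// NewDecoder returns a Decoder that reads from r.
func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: bufio.NewReader(r)} }

// Decode reads the next value into v. This sketch accepts only *int64 and *string.
func (d *Decoder) Decode(v interface{}) error {
	switch p := v.(type) {
	case *int64:
		if b, err := d.r.ReadByte(); err != nil {
			return err
		} else if b != 'i' {
			return fmt.Errorf("bencode sketch: want 'i', got %q", b)
		}
		digits, err := d.r.ReadString('e') // includes the trailing 'e'
		if err != nil {
			return err
		}
		*p, err = strconv.ParseInt(digits[:len(digits)-1], 10, 64)
		return err
	case *string:
		prefix, err := d.r.ReadString(':') // includes the trailing ':'
		if err != nil {
			return err
		}
		n, err := strconv.Atoi(prefix[:len(prefix)-1])
		if err != nil || n < 0 {
			return fmt.Errorf("bencode sketch: bad string length %q", prefix)
		}
		buf := make([]byte, n) // a real decoder would cap n before allocating
		if _, err = io.ReadFull(d.r, buf); err != nil {
			return err
		}
		*p = string(buf)
		return nil
	default:
		return fmt.Errorf("bencode sketch: unsupported target %T", v)
	}
}

// Marshal is NewEncoder + Encode on v, writing into an in-memory buffer.
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Unmarshal is NewDecoder + Decode on data and v.
func Unmarshal(data []byte, v interface{}) error {
	return NewDecoder(bytes.NewReader(data)).Decode(v)
}
```

The helpers add no encoding logic of their own; they only choose an in-memory reader or writer, which is why the stream types can remain the primary API.
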
  39. Make it optimal • Help the runtime manage memory • Marshal(v interface{}) ([]byte, error) • MarshalTo(dst []byte, v interface{}) ([]byte, error) • Also useful interfaces • type Marshaler interface • MarshalBencode() ([]byte, error) • type Unmarshaler interface • UnmarshalBencode([]byte) error
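
The point of `MarshalTo` is that the caller owns the destination buffer and can reuse it across calls. A minimal sketch of that idea follows, assuming append semantics (the result is `dst` extended with the encoded value) and supporting only `int` and `string`; the real library may behave differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// MarshalTo appends the Bencode form of v to dst and returns the extended
// slice. If dst has enough spare capacity, no new memory is allocated.
// Assumption: append semantics; only int and string are handled here.
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case int:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(x), 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(x)), 10)
		dst = append(dst, ':')
		return append(dst, x...), nil
	default:
		return dst, fmt.Errorf("bencode sketch: unsupported type %T", v)
	}
}

// Marshal is the convenience form: it starts from a nil slice, so every
// call has to allocate a fresh result.
func Marshal(v interface{}) ([]byte, error) {
	return MarshalTo(nil, v)
}
```

In a hot loop the caller can pass `buf[:0]` on each iteration, so the same backing array is reused once it has grown large enough.
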
  41. Make it better • type Marshaler interface • MarshalBencode() ([]byte, error) • but this API forces the user to allocate a byte slice... • MarshalBencodeTo(w io.Writer) error • passing the encoder's writer explicitly • no allocation is needed • great!
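
A sketch of how an encoder could prefer the writer-based method and fall back to the slice-returning one. The method names are from the slide; the interface name `MarshalerTo`, the `Port` type, the `sliceWriter` helper and the `encodeTo` dispatcher are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"io"
)

// Marshaler returns a freshly built byte slice, so each call allocates.
type Marshaler interface {
	MarshalBencode() ([]byte, error)
}

// MarshalerTo (assumed name) writes straight into the encoder's writer.
type MarshalerTo interface {
	MarshalBencodeTo(w io.Writer) error
}

// Port is a hypothetical user type with a custom encoding.
type Port uint16

// MarshalBencodeTo writes the port as a Bencode integer without building
// an intermediate slice to return.
func (p Port) MarshalBencodeTo(w io.Writer) error {
	_, err := fmt.Fprintf(w, "i%de", uint16(p))
	return err
}

// encodeTo is a hypothetical dispatcher: it prefers MarshalerTo and falls
// back to Marshaler, copying the returned slice into w.
func encodeTo(w io.Writer, v interface{}) error {
	switch m := v.(type) {
	case MarshalerTo:
		return m.MarshalBencodeTo(w)
	case Marshaler:
		b, err := m.MarshalBencode()
		if err != nil {
			return err
		}
		_, err = w.Write(b)
		return err
	default:
		return fmt.Errorf("bencode sketch: unsupported type %T", v)
	}
}

// sliceWriter is a minimal io.Writer used only to demonstrate the call.
type sliceWriter []byte

func (s *sliceWriter) Write(p []byte) (int, error) {
	*s = append(*s, p...)
	return len(p), nil
}
```

With `var sw sliceWriter`, calling `encodeTo(&sw, Port(6881))` leaves `i6881e` in `sw`. Whether a given implementation is truly allocation-free still depends on how it formats its output, so it is worth confirming with a benchmark.
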
  43. Sorted keys in dict • Number of keys is small • Let's use this feature • Sorting fewer than 20 items • Insertion sort rocks here
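
For illustration, an insertion sort over dictionary keys with a fallback to the standard library for larger inputs. This is a sketch: the cutoff constant is taken from the "20 items" remark on the slide, and the names `insertionSortLimit` and `sortKeys` are assumptions.

```go
package main

import "sort"

// insertionSortLimit is an assumed cutoff based on the "20 items" remark.
const insertionSortLimit = 20

// sortKeys sorts keys in place. Short slices use insertion sort, which has
// very little overhead for a handful of items; longer ones fall back to
// sort.Strings.
func sortKeys(keys []string) {
	if len(keys) >= insertionSortLimit {
		sort.Strings(keys)
		return
	}
	for i := 1; i < len(keys); i++ {
		key := keys[i]
		j := i
		// Shift larger elements one slot to the right, then drop key in.
		for j > 0 && keys[j-1] > key {
			keys[j] = keys[j-1]
			j--
		}
		keys[j] = key
	}
}
```

Insertion sort is quadratic in the worst case, which is why the fallback matters if a dictionary ever turns out to be large.
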
  44. Check your bounds • Go is a memory-safe language • You cannot access elements outside a variable's bounds • you'll get a panic, which is good • well, you can access anything via the unsafe pkg • The smart compiler adds Bounds Checks™ to prevent bad access • And the process of removing unneeded checks is called Bounds Check Elimination (BCE) • How is it related to our fast small sort?
  45. Detecting bound checks • go-perftuner boundChecks . • boundChecks: .: ./util.go:3:26: slice/array has bound checks • boundChecks: .: ./util.go:4:23: slice/array has bound checks • boundChecks: .: ./util.go:4:32: slice/array has bound checks
  46. Eliminating checks • go-perftuner boundChecks . • <empty output>
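
The slides do not show the contents of `util.go`, so the following is a generic illustration of the technique, not the code from the talk. A common idiom, used in the standard library's `encoding/binary`, is to touch the highest index first, which gives the compiler a fact it can use to prove the later accesses are in range.

```go
package main

// readUint32 decodes a little-endian uint32 from the first four bytes of b.
func readUint32(b []byte) uint32 {
	// One explicit check up front. After this line the compiler knows that
	// len(b) >= 4, so the four index expressions below are intended to need
	// no bounds checks of their own.
	_ = b[3]
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}
```

Besides go-perftuner, the Go compiler itself can list the checks that remain: building with `-gcflags=-d=ssa/check_bce/debug=1` prints the position of each one that was not eliminated.
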

  47. Ah, also fuzzing • Randomized testing for Go • 1 repo - tons of bugs found and later fixed • https://github.com/dvyukov/go-fuzz • Super useful for deserializers • Don't run on CI in most cases
  48. just 1 file for fuzzing
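
The slide itself is an image, so as a stand-in here is what a single-file go-fuzz harness can look like. go-fuzz looks for a function with the signature `func Fuzz(data []byte) int`; everything else below, including the tiny `decodeString` parser being fuzzed, is a hypothetical example. In a real project the function would live in the library's own package; `package main` only keeps the snippet standalone.

```go
package main

import (
	"bytes"
	"errors"
	"strconv"
)

// decodeString parses one "<length>:<contents>" value from data.
// Hypothetical stand-in for the real decoder under test.
func decodeString(data []byte) (string, error) {
	i := bytes.IndexByte(data, ':')
	if i < 0 {
		return "", errors.New("missing ':'")
	}
	n, err := strconv.Atoi(string(data[:i]))
	if err != nil {
		return "", err
	}
	rest := data[i+1:]
	if n < 0 || n > len(rest) {
		return "", errors.New("length out of range")
	}
	return string(rest[:n]), nil
}

// Fuzz is the go-fuzz entry point. Returning 1 marks the input as
// interesting for the corpus; returning 0 marks it as uninteresting.
func Fuzz(data []byte) int {
	s, err := decodeString(data)
	if err != nil {
		return 0 // rejected input is fine, it just must not crash
	}
	// Round-trip property: re-encoding and decoding gives the same value.
	enc := strconv.Itoa(len(s)) + ":" + s
	if s2, err := decodeString([]byte(enc)); err != nil || s2 != s {
		panic("round trip mismatch")
	}
	return 1
}
```

Any panic inside `Fuzz`, including an out-of-range slice access in the decoder, is reported by the fuzzer together with the input that triggered it.
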

  49. Benchmarks • Hard to get right • There is no silver bullet (wat?) • Don't allocate unless you need to • Using one value and forgetting the rest • Not only op/s but also allocs/op and bytes/op
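
A small benchmark sketch that follows these points: it reuses one buffer, rotates through several inputs instead of a single value (one reading of the "one value" bullet), and reports allocations alongside timing. The function under test, `appendString`, and the helper `runOutsideGoTest` are hypothetical. Normally the benchmark would sit in a `_test.go` file and run with `go test -bench=. -benchmem`.

```go
package main

import (
	"strconv"
	"testing"
)

// appendString appends s in Bencode string form ("<length>:<contents>").
func appendString(dst []byte, s string) []byte {
	dst = strconv.AppendInt(dst, int64(len(s)), 10)
	dst = append(dst, ':')
	return append(dst, s...)
}

// sink keeps the result alive so the compiler cannot discard the loop body.
var sink []byte

// BenchmarkAppendString measures encoding into a reused buffer.
func BenchmarkAppendString(b *testing.B) {
	inputs := []string{"", "hello", "a longer string value for the benchmark"}
	buf := make([]byte, 0, 64)
	b.ReportAllocs() // adds B/op and allocs/op to the ns/op figure
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		buf = appendString(buf[:0], inputs[i%len(inputs)])
	}
	sink = buf
}

// runOutsideGoTest runs the benchmark programmatically, which is handy for
// a quick check without the go test command.
func runOutsideGoTest() testing.BenchmarkResult {
	return testing.Benchmark(BenchmarkAppendString)
}
```

The returned `testing.BenchmarkResult` exposes `NsPerOp()`, `AllocsPerOp()` and `AllocedBytesPerOp()`, which correspond to the three columns mentioned on the slide.
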
  50. Benchmark encoding

  51. Benchmark decoding

  52. A meme is worth 1000 words

  53. Where is the code, Lebowski? • https://github.com/cristalhq/bencode • benchmarks https://github.com/cristaloleg/benches • more benchmarks https://github.com/alecthomas/go_serialization_benchmarks
  54. Also see • Other projects • https://github.com/cristalhq • go-perftuner • https://github.com/cristaloleg/go-perftuner • Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020 • https://www.youtube.com/watch?v=YIOuEFjCmXE
  55. Thanks • Bohdan Storozhuk • Iskander Sharipov • Roman Bystrytskyi • you and Golang Poland <3
  56. Thank you Questions? @oleg_kovalov @cristaloleg