Bencode - serializer and deserializer in Go

Oleg Kovalov

January 28, 2021

Transcript

  1. Bencode
    serializer and
    deserializer in Go
    Warsaw, January 2021
    Oleg Kovalov

  2. • Gopher for ~5 years
    • Open source contributor
    • Engineer at GoGoApps
    Hi, I’m Oleg
    olegk.dev
    @oleg_kovalov
    @cristaloleg

  3. Before we start...

  4. Before we start...
    • Let's treat them as 1 thing
    • Encode - Decode
    • Serialize - Deserialize
    • Marshal - Unmarshal

  5. Before we start...
    • Let's treat them as 1 thing
    • Encode - Decode
    • Serialize - Deserialize
    • Marshal - Unmarshal
    • Is there any _real_ difference?
    • I don't know ¯\_(ツ)_/¯

  6. Serializer & deserializer

  7. Serializer & deserializer
    • Storing and retrieving data
    • Files and databases, RPC calls, etc

  8. Serializer & deserializer
    • Storing and retrieving data
    • Files and databases, RPC calls, etc
    • Fast to process
    • in both directions

  9. Serializer & deserializer
    • Storing and retrieving data
    • Files and databases, RPC calls, etc
    • Fast to process
    • in both directions
    • Memory conservative
    • small memory consumption

  10. Serializer.png

  11. Deserializer.png

  12. Classification

  13. Classification
    • Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...

  14. Classification
    • Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...
    • Binary
    • Gob, Msgpack, BSON, ...

  15. Classification
    • Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...
    • Binary
    • Gob, Msgpack, BSON, ...
    • Text
    • JSON, XML, YAML, ...

  16. Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...

  17. Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...
    • What is good?
    • Binary, so compact
    • Fast because of codegen

  18. Schema-driven
    • Protobuf, FlatBuffers, Cap'n Proto, Avro, ...
    • What is good?
    • Binary, so compact
    • Fast because of codegen
    • What is not
    • Must have a schema
    • Tooling and codegen
    • Both
    • Cross-language compatibility

  19. Binary
    • Gob, Msgpack, BSON

  20. Binary
    • Gob, Msgpack, BSON
    • What is good?
    • Schema, kind of
    • Fast because raw data

  21. Binary
    • Gob, Msgpack, BSON
    • What is good?
    • Schema, kind of
    • Fast because raw data
    • What is not
    • Not for humans
    • Cross-language compatibility

  22. Text
    • JSON, XML, YAML, ...

  23. Text
    • JSON, XML, YAML, ...
    • What is good?
    • Super easy to read(?)
    • Easy to integrate

  24. Text
    • JSON, XML, YAML, ...
    • What is good?
    • Super easy to read(?)
    • Easy to integrate
    • What is not
    • Parsing can be painful
    • Not so compact
    • Losing data types

  25. Welcome Bencode

  26. Welcome Bencode
    • Bencode is the encoding used by the peer-to-peer file sharing
    system BitTorrent for storing and transmitting loosely structured
    data.
    • info by lovely Wikipedia

  27. Welcome Bencode
    • Bencode is the encoding used by the peer-to-peer file sharing
    system BitTorrent for storing and transmitting loosely structured
    data.
    • info by lovely Wikipedia
    • Horribly unpopular
    • used only in torrents; JSON conquered the world
    • But still an interesting thing to make!

  28. It's simple!

  29. It's simple!
    • Int as 'i<number>e'
    • 42 == i42e

  30. It's simple!
    • Int as 'i<number>e'
    • 42 == i42e
    • String as '<length>:<bytes>'
    • "hello" == 5:hello

  31. It's simple!
    • Int as 'i<number>e'
    • 42 == i42e
    • String as '<length>:<bytes>'
    • "hello" == 5:hello
    • List as 'l<items>e'
    • ["hello", 78] == l5:helloi78ee

  32. It's simple!
    • Int as 'i<number>e'
    • 42 == i42e
    • String as '<length>:<bytes>'
    • "hello" == 5:hello
    • List as 'l<items>e'
    • ["hello", 78] == l5:helloi78ee
    • Map as 'd<key><value>...e' (keys are sorted)
    • {"bar": "hello", "foo": 78} == d3:bar5:hello3:fooi78ee
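The four rules on this slide can be sketched as a small recursive encoder. This is only an illustration, not the code from the talk's library: the function name `encode` and the choice to support just `int`, `string`, `[]interface{}` and `map[string]interface{}` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// encode appends the bencode form of v to dst.
// Hypothetical helper: only the four types from the slide are handled.
func encode(dst []byte, v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case int:
		// i<number>e
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(x), 10)
		return append(dst, 'e'), nil
	case string:
		// <length>:<bytes>
		dst = strconv.AppendInt(dst, int64(len(x)), 10)
		dst = append(dst, ':')
		return append(dst, x...), nil
	case []interface{}:
		// l<items>e
		dst = append(dst, 'l')
		for _, item := range x {
			var err error
			if dst, err = encode(dst, item); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	case map[string]interface{}:
		// d<key><value>...e, with keys written in sorted order
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		dst = append(dst, 'd')
		for _, k := range keys {
			dst, _ = encode(dst, k) // encoding a string never fails
			var err error
			if dst, err = encode(dst, x[k]); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	default:
		return nil, fmt.Errorf("bencode: unsupported type %T", v)
	}
}

func main() {
	out, err := encode(nil, map[string]interface{}{"foo": 78, "bar": "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // d3:bar5:hello3:fooi78ee
}
```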

  33. What about API ?
    • Go stdlib suggests
    • NewEncoder(w io.Writer) + Encode(v interface{}) error
    • NewDecoder(r io.Reader) + Decode(v interface{}) error

  34. What about API ?
    • Go stdlib suggests
    • NewEncoder(w io.Writer) + Encode(v interface{}) error
    • NewDecoder(r io.Reader) + Decode(v interface{}) error
    • Looks good
    • Same as JSON, YAML, XML and everything else in the Go world

  35. What about API ?
    • Go stdlib suggests
    • NewEncoder(w io.Writer) + Encode(v interface{}) error
    • NewDecoder(r io.Reader) + Decode(v interface{}) error
    • Looks good
    • Same as JSON, YAML, XML and everything else in the Go world
    • But is this enough?
    • Is this optimal?
    • Can we do better?

  36. Make it enough
    • Cozy helpers
    • func Marshal(v interface{}) ([]byte, error)
    • NewEncoder + Encode on param v

  37. Make it enough
    • Cozy helpers
    • func Marshal(v interface{}) ([]byte, error)
    • NewEncoder + Encode on param v
    • func Unmarshal(data []byte, v interface{}) error
    • NewDecoder + Decode on params data and v
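A rough sketch of how the two helpers might wrap the stream API. The `Encoder` and `Decoder` here are toy stand-ins written for this example (the encoder handles only `int` and `string`, the decoder only a single integer into `*int`, and it does not enforce bencode's stricter integer rules); the real library's internals may look quite different.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// Encoder is a toy stand-in for a streaming bencode encoder.
type Encoder struct{ w io.Writer }

func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }

// Encode writes v in bencode form; this sketch handles only int and string.
func (e *Encoder) Encode(v interface{}) error {
	switch x := v.(type) {
	case int:
		_, err := fmt.Fprintf(e.w, "i%de", x)
		return err
	case string:
		_, err := fmt.Fprintf(e.w, "%d:%s", len(x), x)
		return err
	}
	return fmt.Errorf("bencode: unsupported type %T", v)
}

// Decoder is a toy stand-in for a streaming bencode decoder.
type Decoder struct{ r io.Reader }

func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: r} }

// Decode reads one bencoded integer into v, which must be *int.
func (d *Decoder) Decode(v interface{}) error {
	p, ok := v.(*int)
	if !ok {
		return fmt.Errorf("bencode: unsupported target %T", v)
	}
	data, err := io.ReadAll(d.r)
	if err != nil {
		return err
	}
	if len(data) < 3 || data[0] != 'i' || data[len(data)-1] != 'e' {
		return fmt.Errorf("bencode: malformed integer %q", data)
	}
	n, err := strconv.Atoi(string(data[1 : len(data)-1]))
	if err != nil {
		return err
	}
	*p = n
	return nil
}

// Marshal is NewEncoder + Encode on v, collecting the output in a buffer.
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Unmarshal is NewDecoder + Decode on data and v.
func Unmarshal(data []byte, v interface{}) error {
	return NewDecoder(bytes.NewReader(data)).Decode(v)
}

func main() {
	data, err := Marshal(42)
	if err != nil {
		panic(err)
	}
	var n int
	if err := Unmarshal(data, &n); err != nil {
		panic(err)
	}
	fmt.Println(string(data), n) // i42e 42
}
```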

  38. Make it optimal
    • Help the runtime manage memory
    • Marshal(v interface{}) ([]byte, error)
    • MarshalTo(dst []byte, v interface{}) ([]byte, error)

  39. Make it optimal
    • Help the runtime manage memory
    • Marshal(v interface{}) ([]byte, error)
    • MarshalTo(dst []byte, v interface{}) ([]byte, error)
    • Also useful interfaces
    • type Marshaler interface
    • MarshalBencode() ([]byte, error)
    • type Unmarshaler interface
    • UnmarshalBencode([]byte) error
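One way to read the `MarshalTo` idea: the caller passes in a buffer it already owns, the encoder appends to it, and the same backing array can be reused across calls. The sketch below assumes append-style semantics and a type switch that checks `Marshaler` first. The `Level` type and the limited `int`/`string` support are made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Marshaler lets a type provide its own bencode form (signature from the slide).
type Marshaler interface {
	MarshalBencode() ([]byte, error)
}

// MarshalTo appends the encoding of v to dst and returns the extended slice.
// Sketch only: custom Marshalers, int and string are supported.
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case Marshaler:
		b, err := x.MarshalBencode()
		if err != nil {
			return nil, err
		}
		return append(dst, b...), nil
	case int:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(x), 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(x)), 10)
		dst = append(dst, ':')
		return append(dst, x...), nil
	}
	return nil, fmt.Errorf("bencode: unsupported type %T", v)
}

// Marshal is MarshalTo with a fresh buffer.
func Marshal(v interface{}) ([]byte, error) { return MarshalTo(nil, v) }

// Level is a hypothetical type that encodes itself as a string.
type Level int

func (l Level) MarshalBencode() ([]byte, error) {
	if l == 0 {
		return []byte("3:low"), nil
	}
	return []byte("4:high"), nil
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused for every value
	for _, v := range []interface{}{42, "hello", Level(1)} {
		var err error
		buf, err = MarshalTo(buf[:0], v)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(buf))
	}
}
```

Note that the `Marshaler` branch still receives a freshly allocated slice from the user's method.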

  40. Make it better
    • type Marshaler interface
    • MarshalBencode() ([]byte, error)
    • but this API forces the user to allocate a byte slice...

  41. Make it better
    • type Marshaler interface
    • MarshalBencode() ([]byte, error)
    • but this API forces the user to allocate a byte slice...
    • MarshalBencodeTo(w io.Writer) error
    • pass the encoder's writer explicitly
    • no allocation is needed
    • great!

  42. Sorted keys in dict
    • Number of keys is small
    • Let's use this feature

  43. Sorted keys in dict
    • Number of keys is small
    • Let's use this feature
    • Sorting less than 20 items
    • Insertion sort rocks here
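For reference, a plain insertion sort over dictionary keys might look like this. The function name `sortKeys` is made up, and the threshold of about 20 items is the slide's figure rather than something measured here.

```go
package main

import "fmt"

// sortKeys sorts keys in place with insertion sort. It is quadratic in the
// worst case, but has very little overhead for short inputs.
func sortKeys(keys []string) {
	for i := 1; i < len(keys); i++ {
		// Move keys[i] left until the prefix keys[:i+1] is sorted.
		for j := i; j > 0 && keys[j] < keys[j-1]; j-- {
			keys[j], keys[j-1] = keys[j-1], keys[j]
		}
	}
}

func main() {
	keys := []string{"pieces", "name", "length", "piece length"}
	sortKeys(keys)
	fmt.Println(keys) // [length name piece length pieces]
}
```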

  44. Check your bounds
    • Go is a memory safe language
    • You cannot access elements outside the bounds of a slice or array
    • you'll get a panic, which is good
    • well, you can access anything via the unsafe pkg
    • A smart compiler adds Bounds Checks™ to prevent bad access
    • And the process of removing unneeded checks is called Bounds
    Check Elimination (BCE)
    • How is it related to our fast small sort?

  45. Detecting bound checks
    • go-perftuner boundChecks .
    • boundChecks: .: ./util.go:3:26: slice/array has bound checks
    • boundChecks: .: ./util.go:4:23: slice/array has bound checks
    • boundChecks: .: ./util.go:4:32: slice/array has bound checks

  46. Eliminating checks
    • go-perftuner boundChecks .
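The slide's code is not in the transcript, so here is a generic illustration of one common way to eliminate checks: touch the highest index first, so the compiler can prove that the later, lower accesses are in range. The functions are hypothetical and are not the sort from the talk. Remaining checks can also be listed with the compiler's own debug flag, for example `go build -gcflags="-d=ssa/check_bce/debug=1"`.

```go
package main

import "fmt"

// sum4 indexes upward, so the compiler typically emits a bounds check
// for each of the four accesses.
func sum4(s []int) int {
	return s[0] + s[1] + s[2] + s[3]
}

// sum4Hinted touches the highest index first. Once that single check
// passes, the remaining accesses are provably in range.
func sum4Hinted(s []int) int {
	_ = s[3] // early bounds-check hint
	return s[0] + s[1] + s[2] + s[3]
}

func main() {
	s := []int{1, 2, 3, 4}
	fmt.Println(sum4(s), sum4Hinted(s)) // 10 10
}
```

Both versions still panic on a slice shorter than four elements; the hint only reduces the number of checks, it does not remove the safety.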

  47. Ah, also fuzzing
    • Randomized testing for Go
    • 1 repo - tons of bugs and later fixes
    • https://github.com/dvyukov/go-fuzz
    • Super useful for deserializers
    • In most cases, don't run it on CI
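The fuzzing file from the talk is not in the transcript. As a sketch of the shape go-fuzz expects, the entry point is `func Fuzz(data []byte) int`, returning 1 for inputs worth keeping in the corpus and 0 otherwise. The `decodeInt` function below is a toy decoder written for this example; in real use the `Fuzz` function would live in the library package (commonly behind a `gofuzz` build tag) and call the actual decoder.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// decodeInt is a toy decoder for a single bencoded integer (i<number>e).
func decodeInt(data []byte) (int64, error) {
	if len(data) < 3 || data[0] != 'i' || data[len(data)-1] != 'e' {
		return 0, errors.New("bencode: malformed integer")
	}
	return strconv.ParseInt(string(data[1:len(data)-1]), 10, 64)
}

// Fuzz is the go-fuzz entry point. It must never panic on arbitrary input;
// a panic is what the fuzzer reports as a bug.
func Fuzz(data []byte) int {
	n, err := decodeInt(data)
	if err != nil {
		return 0 // invalid input, not interesting for the corpus
	}
	// Round-trip property: re-encoding and decoding yields the same value.
	again, err := decodeInt([]byte("i" + strconv.FormatInt(n, 10) + "e"))
	if err != nil || again != n {
		panic("round trip mismatch")
	}
	return 1
}

func main() {
	fmt.Println(Fuzz([]byte("i42e")), Fuzz([]byte("garbage"))) // 1 0
}
```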

  48. just 1 file for fuzzing

  49. Benchmarks
    • Hard to get right
    • There is no silver bullet (wat?)
    • Don't allocate unless you need to
    • Using one value and forgetting the rest
    • Not only ns/op but also allocs/op and B/op
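A minimal benchmark sketch that reports allocations alongside timing. The `encodeInt` helper is a stand-in written for this example, not the talk's encoder. The package-level `sink` keeps the compiler from discarding the result, and `testing.Benchmark` lets the snippet run as a normal program; in a `_test.go` file the same body would be a `BenchmarkXxx` function run with `go test -bench . -benchmem`.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the encoded result alive so the work is not optimized away.
var sink []byte

// encodeInt appends n in bencode form (i<number>e) to dst.
func encodeInt(dst []byte, n int64) []byte {
	dst = append(dst, 'i')
	dst = strconv.AppendInt(dst, n, 10)
	return append(dst, 'e')
}

// benchEncodeInt reuses one buffer, so the loop itself should not allocate.
func benchEncodeInt(b *testing.B) {
	b.ReportAllocs()
	buf := make([]byte, 0, 32)
	for i := 0; i < b.N; i++ {
		buf = encodeInt(buf[:0], int64(i))
	}
	sink = buf
}

func main() {
	res := testing.Benchmark(benchEncodeInt)
	// Prints iterations and ns/op, then B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```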

  50. Benchmark encoding

  51. Benchmark decoding

  52. A meme is worth 1000 words

  53. Where is the code, Lebowski?
    • https://github.com/cristalhq/bencode
    • benchmarks https://github.com/cristaloleg/benches
    • more benchmarks https://github.com/alecthomas/go_serialization_benchmarks

  54. Also see
    • Other projects
    • https://github.com/cristalhq
    • go-perftuner
    • https://github.com/cristaloleg/go-perftuner
    • Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020
    • https://www.youtube.com/watch?v=YIOuEFjCmXE

  55. Thanks
    • Bohdan Storozhuk
    • Iskander Sharipov
    • Roman Bystrytskyi
    • you and Golang Poland <3

  56. Thank you
    Questions?
    @oleg_kovalov
    @cristaloleg
