Bencode serializer and deserializer in Go
Warsaw, January 2021
Oleg Kovalov
Hi, I’m Oleg
• Gopher for ~5 years
• Open source contributor
• Engineer at GoGoApps
olegk.dev
@oleg_kovalov
@cristaloleg
Before we start...
• Let's treat these pairs as one thing
• Encode - Decode
• Serialize - Deserialize
• Marshal - Unmarshal
• Is there any _real_ difference?
• I don't know ¯\_(ツ)_/¯
Serializer & deserializer
• Storing and retrieving data
• Files and databases, RPC calls, etc.
• Fast to process
• in both directions
• Memory-conservative
• low memory consumption
Schema-driven
• Protobuf, FlatBuffers, Cap'n Proto, Avro, ...
• What is good?
• Binary, so compact
• Fast because of codegen
• What is not?
• A schema is required
• Tooling and codegen
• Both
• Cross-language compatibility
Binary
• Gob, Msgpack, BSON
• What is good?
• Schema, kind of
• Fast, since data is stored as raw bytes
• What is not?
• Not for humans
• Cross-language compatibility
Text
• JSON, XML, YAML, ...
• What is good?
• Super easy to read (?)
• Easy to integrate
• What is not?
• Parsing can be painful
• Not so compact
• Losing data types
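Here is a concrete case of "losing data types" in Go. When `encoding/json` decodes into `interface{}`, it turns every JSON number into a `float64`, so integers above 2^53 silently lose precision. This is a minimal sketch; the `decodeNumber` helper is illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeNumber unmarshals a JSON object into a generic map and returns the
// value stored under "n". With no concrete Go type to decode into,
// encoding/json represents every JSON number as float64.
func decodeNumber(data []byte) (interface{}, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m["n"], nil
}

func main() {
	// 2^53 + 1 has no exact float64 representation.
	v, err := decodeNumber([]byte(`{"n": 9007199254740993}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %.0f\n", v, v) // float64 9007199254740992
}
```

`(*json.Decoder).UseNumber` keeps numbers as `json.Number` strings instead, but then you have to convert them yourself.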
Welcome Bencode
• Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.
• quote from lovely Wikipedia
• Horribly unpopular
• used only in torrents; JSON conquered the world
• But still an interesting thing to build!
It's simple!
• Int as i<number>e
• 42 == i42e
• String as <length>:<bytes>
• "hello" == 5:hello
• List as l<items>e
• ["hello", 78] == l5:helloi78ee
• Map as d<key><value>...e (keys are sorted)
• {"bar": "hello", "foo": 78} == d3:bar5:hello3:fooi78ee
What about the API?
• The Go stdlib suggests
• NewEncoder(w io.Writer) + Encode(v interface{}) error
• NewDecoder(r io.Reader) + Decode(v interface{}) error
• Looks good
• Same as JSON, YAML, XML and everything else in the Go world
• But is this enough?
• Is this optimal?
• Can we do better?
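For reference, this is the Encoder/Decoder convention as `encoding/json` implements it. A bencode package with the same shape would be used the same way. `Point` and `roundTrip` are illustrative names:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Point struct {
	X, Y int
}

// roundTrip encodes p into an in-memory stream and decodes it back,
// returning the decoded value and the raw wire form.
func roundTrip(p Point) (Point, string, error) {
	var buf bytes.Buffer
	// NewEncoder(w io.Writer) + Encode(v interface{}) error
	if err := json.NewEncoder(&buf).Encode(p); err != nil {
		return Point{}, "", err
	}
	wire := buf.String() // does not consume the buffer
	// NewDecoder(r io.Reader) + Decode(v interface{}) error
	var out Point
	if err := json.NewDecoder(&buf).Decode(&out); err != nil {
		return Point{}, "", err
	}
	return out, wire, nil
}

func main() {
	p, wire, err := roundTrip(Point{X: 1, Y: 2})
	if err != nil {
		panic(err)
	}
	fmt.Print(wire) // {"X":1,"Y":2} (Encode appends a newline)
	fmt.Println(p)  // {1 2}
}
```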
Make it enough
• Cozy helpers
• func Marshal(v interface{}) ([]byte, error)
• NewEncoder + Encode(v) under the hood
• func Unmarshal(data []byte, v interface{}) error
• NewDecoder over data + Decode(v) under the hood
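The helpers can be thin wrappers over the streaming API. This is a minimal sketch, assuming a toy `Encoder`/`Decoder` that only handles `int64` and `string`. The function names come from the slides, but these bodies (and the `maxStrLen` limit) are hypothetical, not the code of any particular library:

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"strconv"
)

const maxStrLen = 1 << 20 // hypothetical cap against huge length prefixes

// Encoder writes bencode to w. Sketch: int64 and string only.
type Encoder struct{ w io.Writer }

func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }

func (e *Encoder) Encode(v interface{}) error {
	var err error
	switch v := v.(type) {
	case int64:
		_, err = fmt.Fprintf(e.w, "i%de", v)
	case string:
		_, err = fmt.Fprintf(e.w, "%d:%s", len(v), v)
	default:
		err = fmt.Errorf("bencode: unsupported type %T", v)
	}
	return err
}

// Decoder reads bencode from r. Sketch: *int64 and *string only;
// integers are parsed leniently by strconv.ParseInt.
type Decoder struct{ r *bufio.Reader }

func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: bufio.NewReader(r)} }

func (d *Decoder) Decode(v interface{}) error {
	switch v := v.(type) {
	case *int64:
		if b, err := d.r.ReadByte(); err != nil || b != 'i' {
			return errors.New("bencode: expected integer")
		}
		s, err := d.r.ReadString('e') // digits plus the closing 'e'
		if err != nil {
			return err
		}
		n, err := strconv.ParseInt(s[:len(s)-1], 10, 64)
		if err != nil {
			return err
		}
		*v = n
	case *string:
		s, err := d.r.ReadString(':') // length prefix plus ':'
		if err != nil {
			return err
		}
		n, err := strconv.Atoi(s[:len(s)-1])
		if err != nil || n < 0 || n > maxStrLen {
			return errors.New("bencode: bad string length")
		}
		buf := make([]byte, n)
		if _, err := io.ReadFull(d.r, buf); err != nil {
			return err
		}
		*v = string(buf)
	default:
		return fmt.Errorf("bencode: cannot decode into %T", v)
	}
	return nil
}

// Marshal: NewEncoder over an in-memory buffer + Encode(v).
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Unmarshal: NewDecoder over data + Decode(v).
func Unmarshal(data []byte, v interface{}) error {
	return NewDecoder(bytes.NewReader(data)).Decode(v)
}

func main() {
	data, err := Marshal("hello")
	if err != nil {
		panic(err)
	}
	var s string
	if err := Unmarshal(data, &s); err != nil {
		panic(err)
	}
	fmt.Println(string(data), s) // 5:hello hello
}
```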
Make it optimal
• Help the runtime manage memory (fewer allocations)
• Marshal(v interface{}) ([]byte, error)
• MarshalTo(dst []byte, v interface{}) ([]byte, error)
• Also, useful interfaces:
• type Marshaler interface
• MarshalBencode() ([]byte, error)
• type Unmarshaler interface
• UnmarshalBencode([]byte) error
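`MarshalTo` can follow the same `append` convention as `strconv.AppendInt`: the caller owns `dst` and can reuse it across calls, so a warm buffer needs no new allocation. This is a minimal sketch, assuming only `int64`, `string` and `Marshaler` values are supported (the Unmarshaler side is not shown). `Version` is a hypothetical type with its own encoding:

```go
package main

import (
	"fmt"
	"strconv"
)

// Marshaler mirrors the interface named on the slide.
type Marshaler interface {
	MarshalBencode() ([]byte, error)
}

// MarshalTo appends the bencode form of v to dst and returns the extended
// slice. If dst has enough capacity, nothing new is allocated here.
// Sketch: supports Marshaler, int64 and string only.
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch v := v.(type) {
	case Marshaler: // checked first, so custom encodings win
		b, err := v.MarshalBencode()
		if err != nil {
			return dst, err
		}
		return append(dst, b...), nil
	case int64:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, v, 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(v)), 10)
		dst = append(dst, ':')
		return append(dst, v...), nil
	default:
		return dst, fmt.Errorf("bencode: unsupported type %T", v)
	}
}

// Marshal is MarshalTo with a nil destination.
func Marshal(v interface{}) ([]byte, error) { return MarshalTo(nil, v) }

// Version is a hypothetical type encoded as a "major.minor" string.
type Version struct{ Major, Minor int64 }

func (v Version) MarshalBencode() ([]byte, error) {
	return MarshalTo(nil, fmt.Sprintf("%d.%d", v.Major, v.Minor))
}

func main() {
	buf := make([]byte, 0, 64) // reused for every value below
	for _, v := range []interface{}{int64(42), "hello", Version{Major: 1, Minor: 2}} {
		var err error
		if buf, err = MarshalTo(buf[:0], v); err != nil {
			panic(err)
		}
		fmt.Println(string(buf))
	}
}
```

With this design, `Marshal` is just the allocating convenience form of `MarshalTo`, so both share one code path.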
Make it better
• type Marshaler interface
• MarshalBencode() ([]byte, error)
• but this API forces the user to allocate a byte slice...
• MarshalBencodeTo(w io.Writer) error
• the encoder's writer is passed explicitly
• no byte slice allocation is needed
• great!
Sorted keys in dict
• The number of keys is usually small
• Let's exploit this
• Sorting fewer than 20 items
• Insertion sort rocks here
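Here is a sketch of the idea, assuming dictionary keys are Go strings. Bencode orders keys as raw byte strings, which matches Go's `<` on strings. Whether insertion sort actually beats `sort.Strings` below ~20 keys is something to confirm with a benchmark. `insertionSort` and `sortedKeys` are illustrative names:

```go
package main

import "fmt"

// insertionSort sorts keys in place in byte-wise order. It is O(n^2) in the
// worst case, but with tiny constant factors and no extra allocations.
func insertionSort(keys []string) {
	for i := 1; i < len(keys); i++ {
		k := keys[i]
		j := i
		// Shift larger keys one slot to the right, then drop k in place.
		for j > 0 && keys[j-1] > k {
			keys[j] = keys[j-1]
			j--
		}
		keys[j] = k
	}
}

// sortedKeys collects map keys and sorts them with insertionSort.
func sortedKeys(m map[string]interface{}) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	insertionSort(keys)
	return keys
}

func main() {
	m := map[string]interface{}{"foo": 78, "bar": "hello", "announce": "x", "Z": 1}
	fmt.Println(sortedKeys(m)) // [Z announce bar foo]
}
```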
Check your bounds
• Go is a memory-safe language
• You cannot access elements outside a slice or array
• you'll get a panic, which is good
• well, you can access anything via the unsafe package
• The compiler adds Bounds Checks™ to prevent bad access
• The process of removing unneeded checks is called Bounds Check Elimination (BCE)
• How is it related to our fast small sort?
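A common way to help BCE is to touch the highest index first, so one check covers the accesses that follow; `encoding/binary` uses this idiom. This is a minimal sketch with an illustrative `readUint32LE` helper:

```go
package main

import "fmt"

// readUint32LE decodes a little-endian uint32 from the first 4 bytes of b.
// The early access b[3] acts as a hint: once it succeeds, the compiler can
// prove b[0]..b[3] are in range and drop the remaining bounds checks.
func readUint32LE(b []byte) uint32 {
	_ = b[3] // single bounds check; panics early if len(b) < 4
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Printf("%#x\n", readUint32LE([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}
```

To see which checks the compiler kept, build with `go build -gcflags='-d=ssa/check_bce/debug=1'`. It should report every bounds check that remains.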
Ah, also fuzzing
• Randomized testing for Go
• One repo, tons of bugs found (and later fixed)
• https://github.com/dvyukov/go-fuzz
• Super useful for deserializers
• In most cases, don't run it on CI
Just 1 file for fuzzing
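go-fuzz looks for a function with the signature `func Fuzz(data []byte) int` in the package under test. It returns 1 for interesting inputs and 0 otherwise, and it panics when an invariant breaks.

This is a minimal sketch with a few assumptions:
- `decodeInt` is a hypothetical helper that only handles a single integer.
- The round-trip check is one example of an invariant worth fuzzing.
- The code is placed in package main with a small `main` so it runs standalone; for go-fuzz it would live in a regular package.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// decodeInt parses an input that is exactly one bencode integer, i<n>e,
// and rejects non-canonical forms: leading zeros, "-0" and a '+' sign.
func decodeInt(data []byte) (int64, error) {
	if len(data) < 3 || data[0] != 'i' || data[len(data)-1] != 'e' {
		return 0, errors.New("bencode: not an integer")
	}
	s := string(data[1 : len(data)-1])
	if s[0] == '+' || (len(s) > 1 && (s[0] == '0' || (s[0] == '-' && s[1] == '0'))) {
		return 0, errors.New("bencode: non-canonical integer")
	}
	return strconv.ParseInt(s, 10, 64)
}

// Fuzz follows go-fuzz's convention.
func Fuzz(data []byte) int {
	n, err := decodeInt(data)
	if err != nil {
		return 0 // rejected input: nothing to check
	}
	// Invariant: a successfully decoded value re-encodes to the same bytes.
	out := []byte("i" + strconv.FormatInt(n, 10) + "e")
	if !bytes.Equal(out, data) {
		panic(fmt.Sprintf("round-trip mismatch: %q -> %q", data, out))
	}
	return 1
}

func main() {
	for _, in := range []string{"i42e", "i-7e", "i042e", "i-0e", "i+5e", "ie"} {
		fmt.Printf("%-6s %d\n", in, Fuzz([]byte(in)))
	}
}
```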
Benchmarks
• Hard to get right
• There is no silver bullet (wat?)
• Don't allocate unless you need to
• Looking at one number and forgetting the rest
• Not only ns/op but also B/op and allocs/op
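Normally benchmarks live in a `_test.go` file and run with `go test -bench . -benchmem`. Here `testing.Benchmark` lets the sketch run standalone. It compares two hypothetical encoders and prints B/op and allocs/op next to ns/op:
- `encodeIntAlloc` allocates a new slice on every call.
- `encodeIntTo` appends into a reused buffer.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink is package-level so the compiler cannot discard the results.
var sink []byte

// encodeIntAlloc returns a freshly allocated encoding on every call.
func encodeIntAlloc(n int64) []byte {
	return []byte("i" + strconv.FormatInt(n, 10) + "e")
}

// encodeIntTo appends the encoding to dst, reusing dst's capacity.
func encodeIntTo(dst []byte, n int64) []byte {
	dst = append(dst, 'i')
	dst = strconv.AppendInt(dst, n, 10)
	return append(dst, 'e')
}

func benchAlloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = encodeIntAlloc(int64(i))
	}
}

func benchReuse(b *testing.B) {
	b.ReportAllocs()
	buf := make([]byte, 0, 32) // one allocation, amortized over b.N iterations
	for i := 0; i < b.N; i++ {
		buf = encodeIntTo(buf[:0], int64(i))
	}
	sink = buf
}

func main() {
	testing.Init() // registers testing flags; needed when not using `go test`
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{{"alloc", benchAlloc}, {"reuse", benchReuse}} {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-6s %s %s\n", bm.name, r.String(), r.MemString())
	}
}
```

No numbers are given here since they depend on the machine. The difference to look for is in allocs/op (at least 1 versus 0), not only in ns/op.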
Benchmark encoding
Benchmark decoding
A meme is worth a thousand words
Where's the code, Lebowski?
• https://github.com/cristalhq/bencode
• Benchmarks: https://github.com/cristaloleg/benches
• More benchmarks: https://github.com/alecthomas/go_serialization_benchmarks
Also see
• Other projects
• https://github.com/cristalhq
• go-perftuner
• https://github.com/cristaloleg/go-perftuner
• Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020
• https://www.youtube.com/watch?v=YIOuEFjCmXE
Thanks
• Bohdan Storozhuk
• Iskander Sharipov
• Roman Bystrytskyi
• you and Golang Poland <3