Slide 1

Slide 1 text

Bencode serializer and deserializer in Go
Warsaw, January 2021
Oleg Kovalov

Slide 2

Slide 2 text

Hi, I’m Oleg • Gopher for ~5 years • Open source contributor • Engineer at GoGoApps • olegk.dev @oleg_kovalov @cristaloleg

Slide 3

Slide 3 text

Before we start...

Slide 4

Slide 4 text

Before we start... • Let's treat them as 1 thing • Encode - Decode • Serialize - Deserialize • Marshal - Unmarshal

Slide 5

Slide 5 text

Before we start... • Let's treat them as 1 thing • Encode - Decode • Serialize - Deserialize • Marshal - Unmarshal • Is there any _real_ difference? • I don't know ¯\_(ツ)_/¯

Slide 6

Slide 6 text

Serializer & deserializer

Slide 7

Slide 7 text

Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc

Slide 8

Slide 8 text

Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc. • Fast to process • in both directions

Slide 9

Slide 9 text

Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc. • Fast to process • in both directions • Memory-efficient • low memory consumption

Slide 10

Slide 10 text

[Image: Serializer]

Slide 11

Slide 11 text

[Image: Deserializer]

Slide 12

Slide 12 text

Classification

Slide 13

Slide 13 text

Classification • Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ...

Slide 14

Slide 14 text

Classification • Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ... • Binary • Gob, Msgpack, BSON, ...

Slide 15

Slide 15 text

Classification • Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ... • Binary • Gob, Msgpack, BSON, ... • Text • JSON, XML, YAML, ...

Slide 16

Slide 16 text

Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ...

Slide 17

Slide 17 text

Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ... • What is good? • Binary, so compact • Fast because of codegen

Slide 18

Slide 18 text

Schema-driven • Protobuf, FlatBuffers, Cap'n'proto, Avro, ... • What is good? • Binary, so compact • Fast because of codegen • What is not? • A schema is required • Tooling and codegen • Both • Cross-language compatibility

Slide 19

Slide 19 text

Binary • Gob, Msgpack, BSON

Slide 20

Slide 20 text

Binary • Gob, Msgpack, BSON • What is good? • Schema, kind of • Fast, since it's raw data

Slide 21

Slide 21 text

Binary • Gob, Msgpack, BSON • What is good? • Schema, kind of • Fast, since it's raw data • What is not? • Not for humans • Cross-language compatibility (Gob is Go-only)

Slide 22

Slide 22 text

Text • JSON, XML, YAML, ...

Slide 23

Slide 23 text

Text • JSON, XML, YAML, ... • What is good? • Super easy to read(?) • Easy to integrate

Slide 24

Slide 24 text

Text • JSON, XML, YAML, ... • What is good? • Super easy to read(?) • Easy to integrate • What is not? • Parsing can be painful • Not so compact • Losing data types
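"Losing data types" is easy to reproduce with Go's own encoding/json: decoded into an `interface{}`, every JSON number comes back as `float64`, so an integer id is no longer an integer and values above 2^53 silently lose precision. (Bencode, by contrast, has a dedicated integer type.) The helper `decodeID` is a hypothetical name used only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeID unmarshals a JSON object into a generic map and returns its "id" field.
func decodeID(doc string) interface{} {
	var v map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		panic(err)
	}
	return v["id"]
}

func main() {
	id := decodeID(`{"id": 42}`)
	fmt.Printf("%T %v\n", id, id) // float64 42

	// Above 2^53 a float64 can no longer represent every integer exactly.
	big := decodeID(`{"id": 9007199254740993}`)
	fmt.Printf("%.0f\n", big) // 9007199254740992
}
```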

Slide 25

Slide 25 text

Welcome Bencode

Slide 26

Slide 26 text

Welcome Bencode • Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data. • info by lovely Wikipedia

Slide 27

Slide 27 text

Welcome Bencode • Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data. • info by lovely Wikipedia • Horribly unpopular • used only in torrents; JSON conquered the world • But still an interesting thing to make!

Slide 28

Slide 28 text

It's simple!

Slide 29

Slide 29 text

It's simple! • Int as 'i<number>e' • 42 == i42e

Slide 30

Slide 30 text

It's simple! • Int as 'i<number>e' • 42 == i42e • String as '<length>:<bytes>' • "hello" == 5:hello

Slide 31

Slide 31 text

It's simple! • Int as 'i<number>e' • 42 == i42e • String as '<length>:<bytes>' • "hello" == 5:hello • List as 'l<items>e' • ["hello", 78] == l5:helloi78ee

Slide 32

Slide 32 text

It's simple! • Int as 'i<number>e' • 42 == i42e • String as '<length>:<bytes>' • "hello" == 5:hello • List as 'l<items>e' • ["hello", 78] == l5:helloi78ee • Map as 'd<key><value>...e' (keys are sorted) • {"bar": "hello", "foo": 78} == d3:bar5:hello3:fooi78ee
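The four rules above fit in one small recursive function. This is a minimal sketch, not the cristalhq/bencode implementation: it assumes values arrive as `int`, `string`, `[]interface{}` or `map[string]interface{}` and rejects everything else. Dictionary keys are sorted with `sort.Strings`, which compares raw bytes, matching Bencode's key ordering.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// encode appends the bencode form of v to dst and returns the extended slice.
func encode(dst []byte, v interface{}) ([]byte, error) {
	switch v := v.(type) {
	case int:
		// Integers: 'i' + decimal digits + 'e', e.g. 42 -> i42e.
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(v), 10)
		return append(dst, 'e'), nil
	case string:
		// Strings: byte length + ':' + raw bytes, e.g. "hello" -> 5:hello.
		dst = strconv.AppendInt(dst, int64(len(v)), 10)
		dst = append(dst, ':')
		return append(dst, v...), nil
	case []interface{}:
		// Lists: 'l' + each encoded item + 'e'.
		dst = append(dst, 'l')
		for _, item := range v {
			var err error
			if dst, err = encode(dst, item); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	case map[string]interface{}:
		// Dictionaries: 'd' + key/value pairs in sorted key order + 'e'.
		keys := make([]string, 0, len(v))
		for k := range v {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		dst = append(dst, 'd')
		for _, k := range keys {
			var err error
			if dst, err = encode(dst, k); err != nil {
				return nil, err
			}
			if dst, err = encode(dst, v[k]); err != nil {
				return nil, err
			}
		}
		return append(dst, 'e'), nil
	default:
		return nil, fmt.Errorf("bencode: unsupported type %T", v)
	}
}

func main() {
	out, _ := encode(nil, map[string]interface{}{"foo": 78, "bar": "hello"})
	fmt.Println(string(out)) // d3:bar5:hello3:fooi78ee
}
```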

Slide 33

Slide 33 text

What about the API? • The Go stdlib suggests • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error

Slide 34

Slide 34 text

What about the API? • The Go stdlib suggests • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error • Looks good • Same as JSON, YAML, XML and everything else in the Go world

Slide 35

Slide 35 text

What about the API? • The Go stdlib suggests • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error • Looks good • Same as JSON, YAML, XML and everything else in the Go world • But is this enough? • Is this optimal? • Can we do better?
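For reference, this is the stdlib shape being copied, shown with the real encoding/json package: the encoder is bound to an `io.Writer`, the decoder to an `io.Reader`, and `Decode` fills a pointer. `Point` and `roundTrip` are illustrative names only.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Point struct {
	X, Y int
}

// roundTrip encodes p via json.NewEncoder and decodes it back via json.NewDecoder.
func roundTrip(p Point) (string, Point, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(p); err != nil {
		return "", Point{}, err
	}
	encoded := buf.String() // Encode appends a trailing newline
	var out Point
	if err := json.NewDecoder(&buf).Decode(&out); err != nil {
		return "", Point{}, err
	}
	return encoded, out, nil
}

func main() {
	enc, p, err := roundTrip(Point{X: 1, Y: 2})
	fmt.Print(enc)              // {"X":1,"Y":2}
	fmt.Println(p.X, p.Y, err) // 1 2 <nil>
}
```

A Bencode package following the same convention lets users switch formats without relearning the API.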

Slide 36

Slide 36 text

Make it enough • Cozy helpers • func Marshal(v interface{}) ([]byte, error) • NewEncoder + Encode on param v

Slide 37

Slide 37 text

Make it enough • Cozy helpers • func Marshal(v interface{}) ([]byte, error) • NewEncoder + Encode on param v • func Unmarshal(data []byte, v interface{}) error • NewDecoder + Decode on params data and v
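The "cozy helper" is just a thin wrapper: create an encoder over an in-memory buffer, encode, return the bytes. The `Encoder` below is a hypothetical minimal version that only understands `int` and `string`; it exists to show the layering, not the real library's code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// Encoder is a hypothetical minimal bencode encoder (int and string only).
type Encoder struct {
	w io.Writer
}

func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }

// Encode writes the bencode form of v to the underlying writer.
func (e *Encoder) Encode(v interface{}) error {
	var b []byte
	switch v := v.(type) {
	case int:
		b = append(b, 'i')
		b = strconv.AppendInt(b, int64(v), 10)
		b = append(b, 'e')
	case string:
		b = strconv.AppendInt(b, int64(len(v)), 10)
		b = append(b, ':')
		b = append(b, v...)
	default:
		return fmt.Errorf("bencode: unsupported type %T", v)
	}
	_, err := e.w.Write(b)
	return err
}

// Marshal is NewEncoder + Encode into a bytes.Buffer.
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	b, err := Marshal("hello")
	fmt.Println(string(b), err) // 5:hello <nil>
}
```

Unmarshal would be the mirror image, wrapping `data` in `bytes.NewReader` and calling `Decode`.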

Slide 38

Slide 38 text

Make it optimal • Help the runtime manage memory • Marshal(v interface{}) ([]byte, error) • MarshalTo(dst []byte, v interface{}) ([]byte, error)
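`MarshalTo` follows the `strconv.AppendInt` convention: it appends to a caller-owned slice and returns the result, so one buffer can be reused across many calls instead of allocating a fresh `[]byte` each time. A sketch assuming only `int` and `string` inputs:

```go
package main

import (
	"fmt"
	"strconv"
)

// MarshalTo appends the bencode form of v to dst (int and string only in this sketch).
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch v := v.(type) {
	case int:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, int64(v), 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(v)), 10)
		dst = append(dst, ':')
		return append(dst, v...), nil
	default:
		return dst, fmt.Errorf("bencode: unsupported type %T", v)
	}
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused below
	for _, v := range []interface{}{42, "hello"} {
		var err error
		buf, err = MarshalTo(buf[:0], v) // reset length, keep capacity
		if err != nil {
			panic(err)
		}
		fmt.Println(string(buf))
	}
}
```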

Slide 39

Slide 39 text

Make it optimal • Help the runtime manage memory • Marshal(v interface{}) ([]byte, error) • MarshalTo(dst []byte, v interface{}) ([]byte, error) • Also useful interfaces • type Marshaler interface • MarshalBencode() ([]byte, error) • type Unmarshaler interface • UnmarshalBencode([]byte) error
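The two interfaces let a type control its own wire form. Bencode has no booleans, so a hypothetical `Flag` type is a natural candidate: it stores itself as `i1e` / `i0e`. The lowercase `marshal` function is a toy stand-in showing how an encoder would check for `Marshaler` before falling back to built-in handling.

```go
package main

import (
	"fmt"
	"strconv"
)

// Marshaler and Unmarshaler as named on the slide.
type Marshaler interface {
	MarshalBencode() ([]byte, error)
}

type Unmarshaler interface {
	UnmarshalBencode([]byte) error
}

// Flag is a hypothetical type encoded as the integer i1e or i0e.
type Flag bool

func (f Flag) MarshalBencode() ([]byte, error) {
	if f {
		return []byte("i1e"), nil
	}
	return []byte("i0e"), nil
}

func (f *Flag) UnmarshalBencode(b []byte) error {
	switch string(b) {
	case "i1e":
		*f = true
	case "i0e":
		*f = false
	default:
		return fmt.Errorf("bencode: invalid flag %q", b)
	}
	return nil
}

// marshal gives custom Marshaler implementations priority over built-in types.
func marshal(v interface{}) ([]byte, error) {
	if m, ok := v.(Marshaler); ok {
		return m.MarshalBencode()
	}
	if n, ok := v.(int); ok {
		return []byte("i" + strconv.Itoa(n) + "e"), nil
	}
	return nil, fmt.Errorf("bencode: unsupported type %T", v)
}

// Compile-time checks that Flag satisfies both interfaces.
var (
	_ Marshaler   = Flag(false)
	_ Unmarshaler = (*Flag)(nil)
)

func main() {
	b, _ := marshal(Flag(true))
	var f Flag
	err := f.UnmarshalBencode(b)
	fmt.Println(string(b), bool(f), err) // i1e true <nil>
}
```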

Slide 40

Slide 40 text

Make it better • type Marshaler interface • MarshalBencode() ([]byte, error) • but this API forces the user to allocate a byte slice...

Slide 41

Slide 41 text

Make it better • type Marshaler interface • MarshalBencode() ([]byte, error) • but this API forces the user to allocate a byte slice... • MarshalBencodeTo(w io.Writer) error • the encoder's writer is passed explicitly • no allocation is needed • great!

Slide 42

Slide 42 text

Sorted keys in dicts • The number of keys is usually small • Let's use this fact

Slide 43

Slide 43 text

Sorted keys in dicts • The number of keys is usually small • Let's use this fact • Sorting fewer than 20 items • Insertion sort rocks here
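For a handful of keys, a plain insertion sort has no setup cost, no interface calls, and good locality, which is why it tends to win at this size. A minimal sketch over string keys; Go's `<` on strings compares bytes, which is the order Bencode requires. The 20-item threshold is the slide's rule of thumb, not something this code enforces.

```go
package main

import "fmt"

// insertionSort sorts keys in place in byte-wise (Bencode) order.
func insertionSort(keys []string) {
	for i := 1; i < len(keys); i++ {
		// Shift keys[i] left until the prefix keys[:i+1] is sorted.
		for j := i; j > 0 && keys[j] < keys[j-1]; j-- {
			keys[j], keys[j-1] = keys[j-1], keys[j]
		}
	}
}

func main() {
	keys := []string{"port", "ip", "info", "announce"}
	insertionSort(keys)
	fmt.Println(keys) // [announce info ip port]
}
```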

Slide 44

Slide 44 text

Check your bounds • Go is a memory-safe language • You cannot access elements outside a slice or array • you'll get a panic, which is good • well, you can access anything via the unsafe package • The compiler adds Bounds Checks™ to prevent bad access • Removing unneeded checks is called Bounds Check Elimination (BCE) • How is this related to our fast small sort?

Slide 45

Slide 45 text

Detecting bounds checks • go-perftuner boundChecks . • boundChecks: .: ./util.go:3:26: slice/array has bound checks • boundChecks: .: ./util.go:4:23: slice/array has bound checks • boundChecks: .: ./util.go:4:32: slice/array has bound checks

Slide 46

Slide 46 text

Eliminating checks • go-perftuner boundChecks . •
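A common way to eliminate such checks is a single up-front access: after `_ = b[3]` succeeds, the compiler can prove `b[0]` through `b[3]` are in range and drop the per-index checks. The standard library's `encoding/binary` uses this idiom; `readUint32` below is an illustrative copy of the pattern, not code from the talk.

```go
package main

import "fmt"

// readUint32 decodes a big-endian uint32 from the first four bytes of b.
func readUint32(b []byte) uint32 {
	_ = b[3] // one bounds check here lets the compiler drop the four below
	return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}

func main() {
	fmt.Println(readUint32([]byte{0x00, 0x00, 0x01, 0x02})) // 258
}
```

Besides go-perftuner, building with `-gcflags=-d=ssa/check_bce/debug=1` should make the compiler itself report the checks that remain; short inputs still panic, just at the hint line.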

Slide 47

Slide 47 text

Ah, also fuzzing • Randomized testing for Go • 1 repo, tons of bugs found and later fixed • https://github.com/dvyukov/go-fuzz • Super useful for deserializers • In most cases, don't run it in CI

Slide 48

Slide 48 text

Just 1 file for fuzzing
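That one file contains a `Fuzz` function with the go-fuzz signature `func Fuzz(data []byte) int`: return 1 for inputs that parsed (so the fuzzer prioritizes them), 0 otherwise, and panic when an invariant breaks. go-fuzz conventionally keeps this file behind the `gofuzz` build tag; the tag is omitted here so the sketch runs as a normal program. `decodeInt` is a hypothetical single-integer decoder, and the invariant checked is a round trip: canonical Bencode integers must re-encode to exactly the input bytes.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// decodeInt is a hypothetical decoder for one bencode integer, e.g. "i42e".
func decodeInt(data []byte) (int64, error) {
	if len(data) < 3 || data[0] != 'i' || data[len(data)-1] != 'e' {
		return 0, errors.New("bencode: not an integer")
	}
	digits := string(data[1 : len(data)-1])
	// Bencode integers are canonical: no '+', no leading zeros, no "-0".
	if digits[0] == '+' || (len(digits) > 1 && digits[0] == '0') || strings.HasPrefix(digits, "-0") {
		return 0, errors.New("bencode: non-canonical integer")
	}
	return strconv.ParseInt(digits, 10, 64)
}

// Fuzz follows the go-fuzz convention described above.
func Fuzz(data []byte) int {
	n, err := decodeInt(data)
	if err != nil {
		return 0
	}
	// Round-trip invariant: re-encoding must reproduce the input exactly.
	if again := "i" + strconv.FormatInt(n, 10) + "e"; again != string(data) {
		panic(fmt.Sprintf("round trip mismatch: %q -> %d -> %q", data, n, again))
	}
	return 1
}

func main() {
	for _, in := range []string{"i42e", "i-7e", "i0e", "i03e", "i-0e", "i+1e", "x"} {
		fmt.Println(in, Fuzz([]byte(in)))
	}
}
```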

Slide 49

Slide 49 text

Benchmarks • Hard to get right • There is no silver bullet (wat?) • Don't allocate unless you need to • Using one value and forgetting the rest • Not only op/s but also allocs/op and bytes/op
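Go can report all three numbers (time, allocs/op, bytes/op). The sketch below uses `testing.Benchmark` so it runs outside `go test`; inside a `_test.go` file the same body would live in a `BenchmarkXxx(b *testing.B)` function. `appendInt` is a hypothetical helper, and the package-level `sink` is a common guard against the compiler discarding unused results.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the benchmark's result reachable.
var sink []byte

// appendInt appends n as a bencode integer (i<n>e) to dst.
func appendInt(dst []byte, n int64) []byte {
	dst = append(dst, 'i')
	dst = strconv.AppendInt(dst, n, 10)
	return append(dst, 'e')
}

func main() {
	testing.Init() // registers testing flags when not running under go test
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		buf := make([]byte, 0, 32) // reused: no allocation per iteration
		for i := 0; i < b.N; i++ {
			buf = appendInt(buf[:0], int64(i))
		}
		sink = buf
	})
	// ns/op plus B/op and allocs/op, not only speed.
	fmt.Println(res.String(), res.MemString())
}
```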

Slide 50

Slide 50 text

Benchmark encoding

Slide 51

Slide 51 text

Benchmark decoding

Slide 52

Slide 52 text

A meme is worth a thousand words

Slide 53

Slide 53 text

Where is the code, Lebowski? • https://github.com/cristalhq/bencode • benchmarks https://github.com/cristaloleg/benches • more benchmarks https://github.com/alecthomas/go_serialization_benchmarks

Slide 54

Slide 54 text

Also see • Other projects • https://github.com/cristalhq • go-perftuner • https://github.com/cristaloleg/go-perftuner • Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020 • https://www.youtube.com/watch?v=YIOuEFjCmXE

Slide 55

Slide 55 text

Thanks • Bohdan Storozhuk • Iskander Sharipov • Roman Bystrytskyi • you and Golang Poland <3

Slide 56

Slide 56 text

Thank you Questions? @oleg_kovalov @cristaloleg