Before we start... • Let's treat them as one thing • Encode - Decode • Serialize - Deserialize • Marshal - Unmarshal • Is there any _real_ difference? • I don't know ¯\_(ツ)_/¯
Serializer & deserializer • Storing and retrieving data • Files and databases, RPC calls, etc. • Fast to process • in both directions • Memory conservative • small memory consumption
Schema-driven • Protobuf, FlatBuffers, Cap'n Proto, Avro, ... • What is good? • Binary, so compact • Fast because of codegen • What is not • Must have a schema • Tooling and codegen • Both • Cross-language compatibility
Text • JSON, XML, YAML, ... • What is good? • Super easy to read(?) • Easy to integrate • What is not • Parsing can be painful • Not so compact • Losing data types
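A quick illustration of the "losing data types" point, using only the standard `encoding/json` package. The helper name `decodeID` is made up for this sketch. When JSON is decoded into a generic `interface{}`, every number becomes a `float64`, so large integers can silently lose precision.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeID (hypothetical helper) decodes a JSON object into a generic map
// and returns whatever ended up under the "id" key.
func decodeID(data []byte) (interface{}, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m["id"], nil
}

func main() {
	// 9007199254740993 is 2^53 + 1, which float64 cannot represent exactly.
	id, err := decodeID([]byte(`{"id": 9007199254740993}`))
	if err != nil {
		panic(err)
	}
	// The integer arrives as a float64 and is rounded to 2^53.
	fmt.Printf("%T %.0f\n", id, id) // float64 9007199254740992
}
```

Decoding into a typed struct field (or using `json.Decoder.UseNumber`) avoids this, at the cost of knowing the shape up front.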
Welcome Bencode • Bencode is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data. • info by lovely Wikipedia • Horribly unpopular • used only in torrents; JSON conquered the world • But still an interesting thing to make!
What about the API? • The Go stdlib suggests • NewEncoder(w io.Writer) + Encode(v interface{}) error • NewDecoder(r io.Reader) + Decode(v interface{}) error • Looks good • Same as JSON, YAML, XML and everything else in the Go world • But is this enough? • Is this optimal? • Can we do better?
Make it enough • Cozy helpers • func Marshal(v interface{}) ([]byte, error) • NewEncoder + Encode on param v • func Unmarshal(data []byte, v interface{}) error • NewDecoder + Decode on params data and v
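The helpers are thin wrappers, roughly as sketched below. To keep the example self-contained, the `Encoder` and `Decoder` here are stripped-down stand-ins that handle strings only and read the whole input at once; those simplifications are assumptions of this sketch, not how a real implementation has to work.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// Encoder and Decoder are string-only stand-ins for the real types.
type Encoder struct{ w io.Writer }
type Decoder struct{ r io.Reader }

func NewEncoder(w io.Writer) *Encoder { return &Encoder{w: w} }
func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: r} }

func (e *Encoder) Encode(v interface{}) error {
	s, ok := v.(string)
	if !ok {
		return fmt.Errorf("bencode: unsupported type %T", v)
	}
	_, err := fmt.Fprintf(e.w, "%d:%s", len(s), s)
	return err
}

func (d *Decoder) Decode(v interface{}) error {
	p, ok := v.(*string)
	if !ok {
		return fmt.Errorf("bencode: unsupported target %T", v)
	}
	data, err := io.ReadAll(d.r) // simplification: slurp the whole input
	if err != nil {
		return err
	}
	i := bytes.IndexByte(data, ':')
	if i < 0 {
		return fmt.Errorf("bencode: missing ':'")
	}
	n, err := strconv.Atoi(string(data[:i]))
	if err != nil || n < 0 || n > len(data)-i-1 {
		return fmt.Errorf("bencode: bad string length")
	}
	*p = string(data[i+1 : i+1+n])
	return nil
}

// Marshal is NewEncoder + Encode on an in-memory buffer.
func Marshal(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Unmarshal is NewDecoder + Decode on an in-memory reader.
func Unmarshal(data []byte, v interface{}) error {
	return NewDecoder(bytes.NewReader(data)).Decode(v)
}

func main() {
	data, err := Marshal("spam")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // 4:spam

	var s string
	if err := Unmarshal(data, &s); err != nil {
		panic(err)
	}
	fmt.Println(s) // spam
}
```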
Make it optimal • Help the runtime manage memory • Marshal(v interface{}) ([]byte, error) • MarshalTo(dst []byte, v interface{}) ([]byte, error) • Also useful interfaces • type Marshaler interface • MarshalBencode() ([]byte, error) • type Unmarshaler interface • UnmarshalBencode([]byte) error
Make it better • type Marshaler interface • MarshalBencode() ([]byte, error) • but this API forces the user to allocate a byte slice... • MarshalBencodeTo(w io.Writer) error • passing the encoder's writer explicitly • no allocation is needed • great!
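The point of `MarshalTo` is that the caller owns the buffer and can reuse it across calls, in the same spirit as `strconv.AppendInt`. A minimal sketch, assuming append-style semantics (the result is `dst` extended with the encoded value) and supporting only `int64` and `string`:

```go
package main

import (
	"fmt"
	"strconv"
)

// MarshalTo appends the Bencode form of v to dst and returns the extended
// slice. Sketch only: int64 and string are supported.
func MarshalTo(dst []byte, v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case int64:
		dst = append(dst, 'i')
		dst = strconv.AppendInt(dst, x, 10)
		return append(dst, 'e'), nil
	case string:
		dst = strconv.AppendInt(dst, int64(len(x)), 10)
		dst = append(dst, ':')
		return append(dst, x...), nil
	default:
		return dst, fmt.Errorf("bencode: unsupported type %T", v)
	}
}

// Marshal is the convenience form: it always starts from a fresh slice.
func Marshal(v interface{}) ([]byte, error) {
	return MarshalTo(nil, v)
}

func main() {
	// One buffer, reused for every value: buf[:0] keeps the capacity.
	buf := make([]byte, 0, 64)
	for _, v := range []interface{}{int64(42), "spam"} {
		var err error
		buf, err = MarshalTo(buf[:0], v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s\n", buf)
	}
}
```

As long as the encoded value fits in the buffer's capacity, the loop above does not need a new backing array per value.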
Check your bounds • Go is a memory-safe language • You cannot access elements outside a variable's bounds • you'll get a panic, which is good • well, you can access anything via the unsafe pkg • Smart compiler adds Bounds Checks™ to prevent bad access • And the process of removing unneeded checks is called Bounds Check Elimination (BCE) • How is it related to our fast small sort?
Ah, also fuzzing • Randomized testing for Go • 1 repo - tons of bugs and later fixes • https://github.com/dvyukov/go-fuzz • Super useful for deserialisers • Don't run on CI in most cases
Benchmarks • Hard to get right • There is no silver bullet (wat?) • Don't allocate unless you need to • Using one value and forgetting the rest • Not only op/s but also allocs/op and bytes/op
Where is the code, Lebowski? • https://github.com/cristalhq/bencode • benchmarks https://github.com/cristaloleg/benches • more benchmarks https://github.com/alecthomas/go_serialization_benchmarks
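go-fuzz looks for an entry point with the signature `func Fuzz(data []byte) int`, returning 1 when the input is worth prioritising and 0 otherwise. Below is a minimal sketch for an integer-only decoder; `decodeInt` and `encodeInt` are hypothetical helpers written for this example. In a real setup `Fuzz` would live in the library package itself; `main` is here only so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// decodeInt (hypothetical) parses a Bencode integer such as "i42e".
func decodeInt(data []byte) (int64, error) {
	if len(data) < 3 || data[0] != 'i' || data[len(data)-1] != 'e' {
		return 0, errors.New("bencode: malformed integer")
	}
	return strconv.ParseInt(string(data[1:len(data)-1]), 10, 64)
}

// encodeInt (hypothetical) is the inverse of decodeInt.
func encodeInt(n int64) []byte {
	b := strconv.AppendInt([]byte{'i'}, n, 10)
	return append(b, 'e')
}

// Fuzz is the go-fuzz entry point. The decoder must never panic on
// arbitrary input, and every value it accepts must survive a round trip.
func Fuzz(data []byte) int {
	n, err := decodeInt(data)
	if err != nil {
		return 0 // invalid input is fine, as long as we did not crash
	}
	m, err := decodeInt(encodeInt(n))
	if err != nil || m != n {
		panic("round-trip mismatch")
	}
	return 1 // valid input: ask the fuzzer to prioritise it
}

func main() {
	for _, in := range []string{"i42e", "i-7e", "garbage", ""} {
		fmt.Printf("%q -> %d\n", in, Fuzz([]byte(in)))
	}
}
```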
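A small sketch of those points with the standard `testing` package: report allocations, vary the input instead of encoding one value forever, and store the result in a package-level sink so the compiler cannot throw the work away. The names `appendInt`, `benchAppendInt` and `sink` are made up for this example. Normally this would be a `BenchmarkXxx` function in a `_test.go` file run with `go test -bench=. -benchmem`; `testing.Benchmark` is used here only so the sketch runs as a plain program.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the last result alive so the benchmarked work is not optimised out.
var sink []byte

// appendInt (hypothetical) appends a Bencode integer to dst.
func appendInt(dst []byte, n int64) []byte {
	dst = append(dst, 'i')
	dst = strconv.AppendInt(dst, n, 10)
	return append(dst, 'e')
}

// benchAppendInt encodes a different value on every iteration into a
// reused buffer and reports allocations alongside timing.
func benchAppendInt(b *testing.B) {
	buf := make([]byte, 0, 32)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		buf = appendInt(buf[:0], int64(i))
	}
	sink = buf
}

func main() {
	testing.Init() // needed when calling testing.Benchmark outside "go test"
	res := testing.Benchmark(benchAppendInt)
	// Prints iterations and ns/op, followed by B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```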
Also see • Other projects • https://github.com/cristalhq • go-perftuner • https://github.com/cristaloleg/go-perftuner • Serializing Data in Go | Klaus Post | Go Systems Conf SF 2020 • https://www.youtube.com/watch?v=YIOuEFjCmXE