Upgrade to Pro — share decks privately, control downloads, hide ads and more …

YAJBE - Yet Another JSON Binary Encoding

YAJBE - Yet Another JSON Binary Encoding

YAJBE is a compact binary data format built to be a drop-in replacement for JSON (JavaScript Object Notation).

We have a lot of services exchanging or storing data using JSON, and most of them don't want to switch to a data format that requires a schema.

We wanted to remove the overhead of the JSON format (especially field names), but keeping the same data model flexibility (numbers, strings, arrays, maps/objects, and a few values such as false, true, and null).

Read more about it at https://matteobertozzi.github.io/posts/yajbe-encoding

Matteo Bertozzi

March 12, 2023
Tweet

More Decks by Matteo Bertozzi

Other Decks in Programming

Transcript

  1. Th30z {code} YAJBE is a compact binary data format built

    to be a drop-in replacement for JSON (JavaScript Object Notation). What is YAJBE? Motivations [ {“name”: …, “surname”: …, “age”: …}, {“name”: …, “surname”: …, “age”: …}, {“name”: …, “surname”: …, “age”: …}, ] We have a lot of services exchanging or storing data using JSON, and most of them don't want to switch to a data format that requires a schema. We wanted to remove the overhead of the JSON format (especially field names), but keeping the same data model flexibility (numbers, strings, arrays, maps/objects, and a few values such as false, true, and null). fieldNamesSpaceUsed = len(“name”) + len(“surname”) + len(“age”); fieldNamesOverhead = (nRows - 1) * fieldNamesSpaceUsed
  2. Th30z {code} • Remove the space-overhead of duplicate "field names"

    for list of objects. • Reduce the space-overhead of Integer, boolean and other values. • Support for "raw" binary data, avoiding the base64 encode. • Faster encode/decode by moving to a binary format. • Simple spec to be easy to implement in many languages. • Drop-in replacement stringify()/parse() methods Goals [ {“name”: …, “surname”: …, “age”: …}, {“name”: …, “surname”: …, “age”: …}, {“name”: …, “surname”: …, “age”: …}, ] fieldNamesSpaceUsed = len(“name”) + len(“surname”) + len(“age”); fieldNamesOverhead = (nRows - 1) * fieldNamesSpaceUsed ObjectMapper mapper = new JsonMapper() -> YajbeMapper() byte[] encoded = mapper.writeValueAsBytes(obj) MyObject obj = mapper.readValue(encoded, MyObject.class); Java using Jackson Library JSON.stringify(): string -> YAJBE.encode(): Uint8Array JSON.parse(string): T -> YAJBE.decode(Uint8Array): T Javascript/Typescript
  3. Th30z {code} null 0 0 0 0 0 0 0

    0 0 1 2 3 4 5 6 7 8 false 0 0 0 0 0 0 1 0 0 1 2 3 4 5 6 7 8 true 0 0 0 0 0 0 1 1 0 1 2 3 4 5 6 7 8 JSON YAJBE 4bytes 5bytes 4bytes 1byte 1byte 1byte Reduce the overhead of Primitive Types
  4. Th30z {code} Reduce the overhead of Primitive Types 0 1

    0 x x x x x 0 1 2 3 4 5 6 7 8 0 1 1 x x x x x Int:unsigned [1-24] Int:signed [-23,0] 0 1 x 1 1 0 0 1 0 1 x 1 1 0 1 0 0 1 x 1 1 0 1 1 0 1 x 1 1 1 0 0 0 1 x 1 1 1 0 1 0 1 x 1 1 1 1 0 0 1 x 1 1 1 1 1 0 1 x 1 1 0 0 0 Int with external 1byte length Int with external 2bytes length Int with external 3bytes length Int with external 4bytes length Int with external 5bytes length Int with external 6bytes length Int with external 7bytes length Int with external 8bytes length JSON YAJBE 0 0 1 1 0 0 0 0 0 0 1 2 3 4 5 6 7 8 24 0 1 1 0 1 1 1 0 1 2 3 4 5 6 7 8 -23 0 1 1 1 0 1 1 1 0 1 2 3 4 5 6 7 8 1byte 2bytes 3bytes 1byte 1byte 1byte 0 0 1 0 1 1 0 0 0 0 1 2 3 4 5 6 7 8 1 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 0 1 2 3 4 5 6 7 8 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2byte 3byte 279 3bytes -65559 6bytes 0 1 2 3 4 5 6 7 8
  5. Th30z {code} Reduce the overhead of Field Names 1 0

    0 0 1 0 1 1 0 1 2 3 4 5 6 7 8 [ { “description”: …, “image_big_url”: …, “image_small_url”: …, “image_thumb_url”: …, “image_type”: …, }, { “description”: …, “image_big_url”: …, “image_small_url”: …, “image_thumb_url”: …, “image_type”: …, }, ] • Keep track of field names that we’ve already seen 
 (each field will get an index) • If the field name was already seen write the index • If the field has prefix and/or suffix in common with the previous store the difference • Otherwise store the full field “description” 1 0 0 0 1 1 0 1 “image_big_url” 1 1 1 0 0 1 0 1 6 4 “small” 1 1 1 0 0 1 0 1 6 4 “thumb” 1 1 0 0 0 1 0 0 6 “type” 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 0 1 0 0 1 0 0 Field Name at index 0 Field Name at index 1 Field Name at index 2 Field Name at index 3 Field Name at index 4 Full field name. Length 11 6bytes Prefix & 4bytes Suffix are in common with the previous field name. 5bytes are unique 6bytes of prefix, 4bytes are unique First object in the array Other objects in the array Full field name. Length 13
  6. Th30z {code} Arrays and Objects/Maps 0 0 1 x 0

    0 0 0 0 1 2 3 4 5 6 7 8 Empty Array [] Empty Object {} 0 0 1 x x x x x 0 1 2 3 4 5 6 7 8 items length < 11 Small Array/Object Array/Object with unspecified length 0 0 1 x 1 1 1 1 0 1 2 3 4 5 6 7 8 items 0 0 0 0 0 0 0 1 0 1 2 3 4 5 6 7 8 unspecified length End Of Array/Object 0 0 1 x 1 0 1 1 0 1 2 3 4 5 6 7 8 length Array/Object Items 0 0 1 x 1 1 0 0 0 0 1 x 1 1 0 1 0 0 1 x 1 1 1 0 External 2bytes length External 3bytes length External 4bytes length
  7. Th30z {code} String/Bytes 1 x 0 0 0 0 0

    0 0 1 2 3 4 5 6 7 8 Empty String/Bytes 1 x x x x x x x 0 1 2 3 4 5 6 7 8 bytes length < 60 Small String/Bytes 1 x 1 1 1 1 0 0 0 1 2 3 4 5 6 7 8 length String/Bytes bytes 1 x 1 1 1 1 0 1 1 x 1 1 1 1 1 0 1 x 1 1 1 1 1 1 External 2bytes length External 3bytes length External 4bytes length “Enum” Mapping In JSON enums are generally written as strings. Since there is no schema, we don’t know which string is an enum. But, we can find frequent strings and index them. 0 0 0 0 1 0 0 1 index 0 1 2 3 4 5 6 7 8