Beyond JSON: Fantastic Serialization Formats and Where to Find Them

Yos Riady
January 13, 2017

Today, JSON (JavaScript Object Notation) is the de facto serialization format for exchanging data between HTTP-connected services. Several features make JSON a useful general-purpose format: it is human readable, easy to learn, and supported almost everywhere thanks to the ubiquity of JavaScript. In this talk, let's look beyond JSON. We'll learn about three serialization formats (JSON, MessagePack, and Protocol Buffers) and discover the benefits unique to each.

https://goo.gl/f6ncAQ

Transcript

  1. A Web API is a website for your program

    A Web API lets software systems communicate with each other over the network. Serialization is a key step in this communication process. What medium do systems use to communicate?
  2. • Background: Introduction to serialization
     • JSON: The de-facto serialization format of today
     • MessagePack: An efficient binary serialization format
     • Protocol Buffers: For serializing structured data
     • Conclusion: Next Steps
  3. Introduction to Serialization (and Deserialization)

    Serialization is the process of translating object state into a format that can be transmitted and reconstructed later.
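As a minimal sketch of this definition, the round trip below serializes an in-memory object to bytes and reconstructs it afterwards (the deserialization step). It assumes JSON as the wire format and uses only Python's standard library; the `person` record and the variable names are illustrative, not taken from the talk.

```python
import json

# Hypothetical object state held in memory by one program.
person = {
    "first_name": "George",
    "last_name": "Washington",
    "birthday": "1732-02-22",
}

# Serialize: translate the object into bytes that can be sent over the network.
wire_bytes = json.dumps(person).encode("utf-8")

# Deserialize: reconstruct an equivalent object on the receiving side.
restored = json.loads(wire_bytes.decode("utf-8"))
```

After the round trip, `restored` is a separate object with the same content as `person`.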
  5. Reasons for Serialization

    • Communication: For transferring data between systems
      ◦ Systems need a shared language to exchange information
      ◦ The language has to be platform independent
  6. Challenges of Serialization

    • Human readability
    • Types and validation
    • Schema evolution
    • Interface Definition / Documentation
    • Performance
    • Others
  8. JSON (JavaScript Object Notation)

    • The de facto standard for data serialization on the web
      ◦ Easy to parse, generate, and read
      ◦ Human readable
      ◦ No schema
      ◦ No type checking
    • Easy to work with, but not very efficient over the wire
    • No built-in schema support
  9. JSON: Human readable

    {
      "first_name": "George",
      "last_name": "Washington",
      "birthday": "1732-02-22",
      "address": {
        "street_address": "3200 Mount Vernon Memorial Highway",
        "city": "Mount Vernon",
        "state": "Virginia",
        "country": "United States"
      }
    }
  10. No types

    • Type information from statically typed languages is ‘lost in translation’
    • Messages have to be validated by ad-hoc validation code, which needs to be written by hand
      ◦ Checking if a required attribute exists
      ◦ Checking the type of an attribute
      ◦ Other validations
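The kind of ad-hoc validation code this slide refers to might look like the sketch below. The function name `validate_person` and the choice of required attributes are assumptions made for illustration; the point is only that every existence and type check has to be written by hand because JSON itself carries no schema.

```python
import json


def validate_person(raw):
    """Parse a JSON string and hand-check it. Returns (data, errors)."""
    data = json.loads(raw)
    errors = []
    if not isinstance(data, dict):
        return data, ["message must be a JSON object"]
    # Hypothetical rule: both names are required and must be strings.
    for field in ("first_name", "last_name"):
        if field not in data:
            errors.append(f"missing required attribute: {field}")
        elif not isinstance(data[field], str):
            errors.append(f"{field} must be a string")
    return data, errors


good_data, good_errors = validate_person(
    '{"first_name": "George", "last_name": "Washington"}'
)
bad_data, bad_errors = validate_person('{"first_name": 42}')
```

Here `good_errors` is empty, while `bad_errors` reports both a wrong type and a missing attribute. Each new rule means more hand-written checks of this kind.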
  12. MessagePack

    • Like JSON, but with efficient binary encoding
      ◦ Not human readable
      ◦ Smaller: Takes less space
      ◦ Faster: Cut your client-server exchange traffic
      ◦ Schemas & Types (IDL)
    • Useful for systems that require low latency and high throughput.
      ◦ Realtime games & systems / APIs
    • Can be used alongside JSON
  13. MessagePack: More compact than JSON

    JSON: 27 bytes (without whitespace)
    {"compact":true,"schema":0}

    MessagePack: 18 bytes
    82 a7 63 6f 6d 70 61 63 74 c3 a6 73 63 68 65 6d 61 00
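The byte counts above can be reproduced with a small sketch. The `pack` function below is a hypothetical, hand-rolled encoder that covers only the handful of MessagePack types this example needs (booleans, integers from 0 to 127, strings under 32 bytes, maps with fewer than 16 entries); it is not the official MessagePack library. The 27-byte JSON figure assumes the compact form with no whitespace.

```python
import json


def pack(value):
    """Encode a tiny subset of MessagePack; anything else raises ValueError."""
    if value is True:
        return b"\xc3"                            # true
    if value is False:
        return b"\xc2"                            # false
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])                     # positive fixint
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data   # fixstr: 0xa0 + length
    if isinstance(value, dict) and len(value) < 16:
        out = bytes([0x80 | len(value)])          # fixmap: 0x80 + entry count
        for key, item in value.items():
            out += pack(key) + pack(item)
        return out
    raise ValueError(f"unsupported value in this sketch: {value!r}")


message = {"compact": True, "schema": 0}
as_json = json.dumps(message, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(message)
```

The saving comes from replacing quotes, colons, commas and the literal `true` with one-byte type tags: `82` opens a two-entry map, `a7` and `a6` prefix the two key strings, `c3` is `true`, and `00` is the integer 0.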
  15. Protocol Buffers

    • A way of encoding structured data in an efficient yet extensible format.
      ◦ “The language of data” at Google
      ◦ Communication between internal services
    • Compact binary format
    • Schemas
    • Client generation
  16. “We carefully craft our data models inside our databases, maintain layers of code to keep these models in check, and then allow all that forethought to fly out of the window when we want to send that data over the wire to another service.”
  17. Protobufs: Schemas are awesome

    // Generated Java client code
    Person john = Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("[email protected]")
        .build();
    // Write the serialized message to the file named by the first argument
    FileOutputStream output = new FileOutputStream(args[0]);
    john.writeTo(output);
  18. message Person {
        required string name = 1;
        required int32 id = 2;
        optional string email = 3;

        enum PhoneType {
          MOBILE = 0;
          HOME = 1;
          WORK = 2;
        }

        message PhoneNumber {
          required string number = 1;
          optional PhoneType type = 2 [default = HOME];
        }

        repeated PhoneNumber phone = 4;
      }
  19. Protobufs: Schema evolution

    We only know something once we start doing it. Can we add new fields to our schema over time without breaking backward compatibility?
  20. Protobufs: Backward compatibility

    message Person {
      required int32 id = 1;
      required string name = 2;
      optional string email = 3;
    }

    message Person {
      required int32 id = 1;
      required string name = 2;
      optional int32 age = 4;
    }

    • Old code will happily read new messages and simply ignore any new fields
    • To the old code, optional fields that were deleted will simply have their default value
    • New code will also transparently read old messages
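To illustrate why these compatibility rules hold, here is a minimal sketch of the Protocol Buffers wire format in plain Python. It hand-rolls varint and length-delimited fields instead of using the official protobuf library or generated code. The helper names (`encode_field`, `decode`), the `OLD_SCHEMA` and `NEW_SCHEMA` tables, and the sample values (including the example email address) are assumptions for illustration, and only non-negative integers and strings are handled.

```python
def encode_varint(n):
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number, value):
    """Each field is a key, (field_number << 3) | wire_type, then the value."""
    if isinstance(value, int):                       # wire type 0: varint
        return encode_varint((number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")                     # wire type 2: length-delimited
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def decode(buf, schema):
    """Keep fields listed in schema (number -> name); skip unknown ones."""
    message, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in schema:
            message[schema[number]] = value
    return message


OLD_SCHEMA = {1: "id", 2: "name", 3: "email"}   # first Person definition
NEW_SCHEMA = {1: "id", 2: "name", 4: "age"}     # second Person definition

old_message = (encode_field(1, 1234) + encode_field(2, "John Doe")
               + encode_field(3, "john@example.com"))
new_message = (encode_field(1, 1234) + encode_field(2, "John Doe")
               + encode_field(4, 42))

old_code_reads_new = decode(new_message, OLD_SCHEMA)
new_code_reads_old = decode(old_message, NEW_SCHEMA)
```

Because every field is tagged with its number and wire type, a reader can skip fields it does not know about without understanding them. That is why the old code ignores `age`, and the new code ignores the removed `email`, while both still recover `id` and `name`.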
  22. In Closing

    • When is JSON a good fit?
      ◦ You want data to be human readable
      ◦ Data is consumed directly in the browser
      ◦ It’s not important to tie the data model to a schema
    • MessagePack
      ◦ When low latency and high throughput are key
      ◦ Internal communication
    • Protocol Buffers
      ◦ Serializing structured data with Schemas & Types
      ◦ Client generation across languages
      ◦ Backward compatibility & Schema evolution
      ◦ Internal communication