
Beyond JSON: Fantastic Serialization Formats and Where to Find Them

Yos Riady
January 13, 2017

Today, JSON (JavaScript Object Notation) is the de facto serialization format for exchanging data between HTTP-connected services. Several features make JSON a useful general-purpose format: it is human readable, it is easy to learn, and it benefits from the ubiquity of JavaScript. In this talk, let's look beyond JSON. We'll learn about three serialization formats (JSON, MessagePack, and Protocol Buffers) and discover the benefits unique to each.

https://goo.gl/f6ncAQ


Transcript

  1. Beyond JSON Fantastic Serialization Formats and Where To Find Them

    Yos Riady yos.io goo.gl/f6ncAQ
  2. Beyond JSON Fantastic Cerealization Formats and Where To Find Them

    Yos Riady yos.io goo.gl/f6ncAQ
  3. None
  4. What’s a Web API?

  5. A Web API is a website for your program

    A Web API lets software systems communicate with each other over the network. Serialization is a key step in this communication process. What medium do systems communicate with?
  6. Examples of Web APIs

  7. • Background: Introduction to serialization

     • JSON: The de-facto serialization format of today
     • MessagePack: An efficient binary serialization format
     • Protocol Buffers: For serializing structured data
     • Conclusion: Next Steps
  8. None
  9. Serialization, what’s that?

  10. Introduction to Serialization (and Deserialization)

    Serialization is the process of translating object state into a format that can be transmitted and reconstructed later.
  11. Introduction to Serialization (and Deserialization)

    Serialization is the process of translating object state into a format that can be transmitted and reconstructed later.
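
As a minimal sketch of the definition above, the round trip below uses Python's standard `json` module. The `person` dictionary and the `serialize` / `deserialize` helper names are assumptions made for illustration and do not come from the talk.

```python
import json

# Hypothetical in-memory object state, used only for illustration.
person = {
    "first_name": "George",
    "last_name": "Washington",
    "birthday": "1732-02-22",
}


def serialize(obj: dict) -> bytes:
    """Translate object state into bytes that can be transmitted."""
    return json.dumps(obj).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Reconstruct the object state from the transmitted bytes."""
    return json.loads(payload.decode("utf-8"))


wire_bytes = serialize(person)      # what would travel over the network
restored = deserialize(wire_bytes)  # what the receiving system rebuilds
```

The receiver ends up with an equal copy of the original state, which is the whole contract of a serialization format.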
  12. For APIs, communication is key.

  13. Reasons for Serialization

    • Communication: For transferring data between systems
      ◦ Systems need a shared language to exchange information
      ◦ The language has to be platform independent
  14. None
  15. Data serialization format

  16. None
  17. None
  18. Challenges of Serialization

    • Human readability
    • Types and validation
    • Schema evolution
    • Interface Definition / Documentation
    • Performance
    • Others
  19. • Background: Introduction to serialization

      • JSON: The de-facto serialization format of today
      • MessagePack: An efficient binary serialization format
      • Protocol Buffers: For serializing structured data
      • Conclusion: Next Steps
  20. None
  21. JSON (JavaScript Object Notation)

    • The de facto standard for data serialization on the web
      ◦ Easy to parse, generate, and read
      ◦ Human readable
      ◦ No schema
      ◦ No type checking
    • Easy to work with, but not very efficient over the wire
    • No built-in schema support
  22. JSON: Human readable

    {
      "first_name": "George",
      "last_name": "Washington",
      "birthday": "1732-02-22",
      "address": {
        "street_address": "3200 Mount Vernon Memorial Highway",
        "city": "Mount Vernon",
        "state": "Virginia",
        "country": "United States"
      }
    }
  23. No types

    • Type information from statically typed languages is ‘lost in translation’
    • Validating messages is done by ad-hoc validation code, which needs to be written
      ◦ Checking if a required attribute exists
      ◦ Checking the type of an attribute
      ◦ Other validations
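
The kind of ad-hoc validation code this slide describes might look like the sketch below. The `REQUIRED_FIELDS` table, the `validate_person` name, and the date check are assumptions chosen for illustration; the field names are borrowed from the George Washington example.

```python
import json
from datetime import date

# Assumed contract for illustration: required attribute name -> expected type.
REQUIRED_FIELDS = {"first_name": str, "last_name": str, "birthday": str}


def validate_person(message):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in message:
            # Checking if a required attribute exists
            errors.append(f"missing required attribute: {name}")
        elif not isinstance(message[name], expected_type):
            # Checking the type of an attribute
            errors.append(f"{name} must be of type {expected_type.__name__}")

    # Other validations: the birthday string must be a YYYY-MM-DD date.
    birthday = message.get("birthday")
    if isinstance(birthday, str):
        try:
            date.fromisoformat(birthday)
        except ValueError:
            errors.append("birthday must be a YYYY-MM-DD date")
    return errors


good = json.loads(
    '{"first_name": "George", "last_name": "Washington", "birthday": "1732-02-22"}'
)
bad = json.loads('{"first_name": "George", "birthday": 1732}')

good_errors = validate_person(good)  # no errors
bad_errors = validate_person(bad)    # missing last_name, wrong birthday type
```

Every service that consumes the message has to carry a copy of checks like these, which is the cost the slide is pointing at.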
  24. JSON Schema

  25. • Background: Introduction to serialization

      • JSON: The de-facto serialization format of today
      • MessagePack: An efficient binary serialization format
      • Protocol Buffers: For serializing structured data
      • Conclusion: Next Steps
  26. None
  27. None
  28. MessagePack

    • Like JSON, but with efficient binary encoding
      ◦ Not human readable
      ◦ Smaller: Takes less space
      ◦ Faster: Cut your client-server exchange traffic
      ◦ Schemas & Types (IDL)
    • Useful for systems that require low latency and high throughput.
      ◦ Realtime games & systems / APIs
    • Can be used alongside JSON
  29. MessagePack: More compact than JSON

    JSON: 27 bytes
    {"compact":true,"schema":0}

    MessagePack: 18 bytes
    82 a7 63 6f 6d 70 61 63 74 c3 a6 73 63 68 65 6d 61 00
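
The byte counts on this slide can be reproduced with a hand-rolled encoder. The sketch below is not the official MessagePack library: `pack` is a hypothetical helper that covers only the four MessagePack types this example needs (booleans, positive fixint, fixstr, fixmap).

```python
import json


def pack(value) -> bytes:
    """Encode a tiny subset of MessagePack (illustrative only)."""
    if value is True:
        return b"\xc3"                          # true
    if value is False:
        return b"\xc2"                          # false
    if isinstance(value, int) and 0 <= value <= 127:
        return bytes([value])                   # positive fixint: the value itself
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only supports strings up to 31 bytes")
        return bytes([0xA0 | len(raw)]) + raw   # fixstr: 101xxxxx + UTF-8 bytes
    if isinstance(value, dict):
        if len(value) > 15:
            raise ValueError("sketch only supports maps up to 15 entries")
        out = bytes([0x80 | len(value)])        # fixmap: 1000xxxx + key/value pairs
        for key, item in value.items():
            out += pack(key) + pack(item)
        return out
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


message = {"compact": True, "schema": 0}

# JSON without optional whitespace, as counted on the slide.
json_bytes = json.dumps(message, separators=(",", ":")).encode("utf-8")
packed = pack(message)
packed_hex = " ".join(f"{byte:02x}" for byte in packed)
```

Here `len(json_bytes)` is 27 and `len(packed)` is 18. The saving comes from dropping the quotes, colons, and braces, and from encoding `true` and `0` in one byte each.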
  30. MessagePack Demo

  31. None
  32. • Background: Introduction to serialization

      • JSON: The de-facto serialization format of today
      • MessagePack: An efficient binary serialization format
      • Protocol Buffers: For serializing structured data
      • Conclusion: Next Steps
  33. None
  34. Protocol Buffers

    • A way of encoding structured data in an efficient yet extensible format.
      ◦ “The language of data” at Google
      ◦ Communication between internal services
    • Compact binary format
    • Schemas
    • Client generation
  35. “We carefully craft our data models inside our databases, maintain

    layers of code to keep these models in check, and then allow all that forethought to fly out of the window when we want to send that data over the wire to another service.”
  36. Protobufs: Schemas are awesome

    // Generated Java client code
    Person john = Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .build();
    // Write the serialized message to the file named by the first argument
    FileOutputStream output = new FileOutputStream(args[0]);
    john.writeTo(output);
  37. message Person {
        required string name = 1;
        required int32 id = 2;
        optional string email = 3;

        enum PhoneType {
          MOBILE = 0;
          HOME = 1;
          WORK = 2;
        }

        message PhoneNumber {
          required string number = 1;
          optional PhoneType type = 2 [default = HOME];
        }

        repeated PhoneNumber phone = 4;
      }
  38. None
  39. Protobufs: Schema evolution

    We only know something once we start doing it. Can we add new fields to our schema over time, without breaking backwards-compatibility?
  40. Protobufs: Backward compatibility

    message Person {
      required int32 id = 1;
      required string name = 2;
      optional string email = 3;
    }

    message Person {
      required int32 id = 1;
      required string name = 2;
      optional int32 age = 4;
    }

    • Old code will happily read new messages and simply ignore any new fields
    • To the old code, optional fields that were deleted will simply have their default value
    • New code will also transparently read old messages
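
Why old code can skip fields it does not know follows from the Protocol Buffers wire format, where every field carries its own number and wire type. The sketch below hand-encodes that wire format rather than using the real protobuf library. All function names and the `OLD_SCHEMA` / `NEW_SCHEMA` tables are assumptions for illustration, only non-negative integers and strings are handled, and absent fields are simply omitted instead of being filled with defaults as a real generated client would do.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint, 7 bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Key is (field_number << 3) | wire_type; 0 = varint, 2 = length-delimited."""
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)
    raw = value.encode("utf-8")
    return encode_varint(number << 3 | 2) + encode_varint(len(raw)) + raw


def decode(buf: bytes, known_fields: dict) -> dict:
    """Read every field, keeping only the field numbers this reader knows."""
    message, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known_fields:  # unknown field numbers are skipped
            message[known_fields[number]] = value
    return message


OLD_SCHEMA = {1: "id", 2: "name", 3: "email"}  # first Person definition
NEW_SCHEMA = {1: "id", 2: "name", 4: "age"}    # second Person definition

new_message = encode_field(1, 1234) + encode_field(2, "John Doe") + encode_field(4, 42)
old_message = (
    encode_field(1, 1234)
    + encode_field(2, "John Doe")
    + encode_field(3, "jdoe@example.com")
)

read_by_old_code = decode(new_message, OLD_SCHEMA)  # ignores field 4 (age)
read_by_new_code = decode(old_message, NEW_SCHEMA)  # ignores field 3 (email)
```

Because field numbers, not names or positions, identify values on the wire, a number should never be reused for a different meaning once it has been retired; that is why the second schema assigns `age` the number 4 rather than 3.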
  41. • Background: Introduction to serialization

      • JSON: The de-facto serialization format of today
      • MessagePack: An efficient binary serialization format
      • Protocol Buffers: For serializing structured data
      • Conclusion: Next Steps
  42. In Closing

    • When is JSON a good fit?
      ◦ You want data to be human readable
      ◦ Data is consumed directly in the browser
      ◦ It’s not important to tie the data model to a schema
    • MessagePack
      ◦ When low latency and high throughput are key
      ◦ Internal communication
    • Protocol Buffers
      ◦ Serializing structured data with Schemas & Types
      ◦ Client generation across languages
      ◦ Backward compatibility & Schema evolution
      ◦ Internal communication
  43. None
  44. Thanks Yos Riady yos.io

  45. Questions Yos Riady yos.io