Beyond JSON: Fantastic Serialization Formats and Where to Find Them

Yos Riady
January 13, 2017

Today, JSON (JavaScript Object Notation) is the de facto serialization format for exchanging data between HTTP-connected services. Several features make JSON a useful general-purpose format: it's human readable, it's easy to learn, and JavaScript is ubiquitous. In this talk, let's look beyond JSON. We'll learn about three serialization formats (JSON, MessagePack, and Protocol Buffers) and discover the benefits unique to each.

https://goo.gl/f6ncAQ

Transcript

  1. Beyond JSON
    Fantastic Serialization Formats and Where To Find Them
    Yos Riady
    yos.io
    goo.gl/f6ncAQ

  2. Beyond JSON
    Fantastic Cerealization Formats and Where To Find Them
    Yos Riady
    yos.io
    goo.gl/f6ncAQ

  4. What’s a Web API?

  5. A Web API is a website for your program
    A Web API lets programs communicate with each other over
    the network.
    Serialization is a key step in this communication process.
    What medium do systems communicate with?

  6. Examples of Web APIs

  7. ● Background: Introduction to serialization
    ● JSON: The de facto serialization format of today
    ● MessagePack: An efficient binary serialization format
    ● Protocol Buffers: For serializing structured data
    ● Conclusion: Next steps

  9. Serialization, what’s that?

  10. Introduction to Serialization (and Deserialization)
    Serialization is the process of translating object state into a format that can be
    stored or transmitted; deserialization reconstructs the object from that format later.
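
A minimal Java sketch of that round trip, using only the JDK's `DataOutputStream` and `DataInputStream`. The `Point` class and its byte layout (two big-endian ints followed by a length-prefixed string) are hypothetical choices for illustration; a real format defines its own layout, and the point is only that both sides must agree on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical object whose state we want to send to another system.
final class Point {
    final int x;
    final int y;
    final String label;

    Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }
}

final class PointCodec {
    // Serialization: write each field, in an agreed order, into bytes.
    static byte[] serialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);      // 4 bytes, big-endian
            out.writeInt(p.y);      // 4 bytes, big-endian
            out.writeUTF(p.label);  // 2-byte length prefix, then (modified) UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: read the fields back in the same order to rebuild the object.
    static Point deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int x = in.readInt();
            int y = in.readInt();
            String label = in.readUTF();
            return new Point(x, y, label);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new Point(3, -7, "origin-ish"));
        Point copy = deserialize(wire);
        System.out.println(wire.length + " bytes -> (" + copy.x + ", " + copy.y + ", " + copy.label + ")");
    }
}
```

For this input the encoding is 20 bytes (4 + 4 + 2-byte length + 10 ASCII characters), and the reader can only make sense of them because it reads the fields in exactly the order the writer used.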

  12. For APIs, communication is key.

  13. Reasons for Serialization
    ● Communication: For transferring data between systems
    ○ Systems need a shared language to exchange information
    ○ The language has to be platform independent
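
"Platform independent" reaches all the way down to byte order: x86 and most ARM machines are little-endian, while network protocols conventionally use big-endian. A minimal Java sketch, assuming big-endian as the shared convention (it matches network byte order and Java's `DataOutputStream`): `ByteBuffer` with an explicit `ByteOrder` produces the same bytes no matter what `ByteOrder.nativeOrder()` reports on the encoding machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    // Encode an int using an explicitly chosen byte order.
    static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Decode with a given byte order, independent of the local CPU.
    static int decodeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println("this machine:  " + ByteOrder.nativeOrder());
        // The shared convention: always big-endian on the wire.
        System.out.println("big-endian:    " + hex(encodeInt(value, ByteOrder.BIG_ENDIAN)));
        // What a sender would emit if it used a little-endian layout instead.
        byte[] little = encodeInt(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println("little-endian: " + hex(little));
        // A big-endian reader misinterprets those bytes without any error.
        System.out.printf("misread as:    0x%08x%n", decodeInt(little, ByteOrder.BIG_ENDIAN));
    }
}
```

The last line shows the failure mode a shared format rules out: the little-endian bytes decode "successfully" as `0x04030201`, silently producing the wrong value.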

  15. Data serialization format

  18. Challenges of Serialization
    ● Human readability
    ● Types and validation
    ● Schema evolution
    ● Interface Definition / Documentation
    ● Performance
    ● Others

  21. JSON (JavaScript Object Notation)
    ● The de facto standard for data serialization on the web
    ○ Easy to parse, generate, and read
    ○ Human readable
    ○ No schema
    ○ No type checking
    ● Easy to work with, but verbose over the wire: field names
    and values travel as text in every message

  22. JSON: Human readable
    {
      "first_name": "George",
      "last_name": "Washington",
      "birthday": "1732-02-22",
      "address": {
        "street_address": "3200 Mount Vernon Memorial Highway",
        "city": "Mount Vernon",
        "state": "Virginia",
        "country": "United States"
      }
    }

  23. No types
    ● Type information from statically typed languages
    is ‘lost in translation’
    ● Messages are validated by ad hoc validation code,
    which you have to write yourself:
    ○ Checking that a required attribute exists
    ○ Checking an attribute's type
    ○ Other validations
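
A minimal Java sketch of that hand-written validation for the George Washington payload. It assumes the JSON has already been parsed into a `Map<String, Object>` by some JSON library (which one is left open); the class name and the rules in `validatePerson` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PersonValidator {
    // Returns a list of problems; an empty list means the payload passed.
    static List<String> validatePerson(Map<String, Object> json) {
        List<String> errors = new ArrayList<>();
        // Required attributes must exist and be strings.
        requireString(json, "first_name", errors);
        requireString(json, "last_name", errors);
        // Other validations: a rough YYYY-MM-DD shape check on an optional birthday.
        Object birthday = json.get("birthday");
        if (birthday != null
                && !(birthday instanceof String && ((String) birthday).matches("\\d{4}-\\d{2}-\\d{2}"))) {
            errors.add("birthday must be a YYYY-MM-DD string");
        }
        // Nested objects need checks of their own (only the type is checked here).
        Object address = json.get("address");
        if (address != null && !(address instanceof Map)) {
            errors.add("address must be an object");
        }
        return errors;
    }

    private static void requireString(Map<String, Object> json, String key, List<String> errors) {
        Object value = json.get(key);
        if (value == null) {
            errors.add(key + " is required");
        } else if (!(value instanceof String)) {
            errors.add(key + " must be a string");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("first_name", "George");
        payload.put("last_name", 1732);          // wrong type: a number instead of a string
        payload.put("birthday", "1732-02-22");
        System.out.println(validatePerson(payload));
    }
}
```

Every rule here has to be written, tested, and kept in sync with every producer of the data by hand, which is the gap that JSON Schema and schema-based formats aim to close.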

  24. JSON Schema

  28. MessagePack
    ● Like JSON, but with an efficient binary encoding
    ○ Not human readable
    ○ Smaller: Takes less space on the wire
    ○ Faster: Less data to send, so client-server exchanges are quicker
    ○ Schemaless like JSON; schemas & types need a separate IDL
    ● Useful for systems that require low latency and high throughput
    ○ Realtime games & systems / APIs
    ● Can be used alongside JSON

  29. MessagePack: More compact than JSON
    JSON: 27 bytes
    {"compact":true,"schema":0}
    MessagePack: 18 bytes
    82 a7 63 6f 6d 70 61 63 74 c3 a6 73 63 68 65 6d 61 00
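
The 18 bytes follow directly from MessagePack's type prefixes: `82` is a fixmap with 2 entries, `a7` and `a6` are fixstr headers for 7- and 6-byte strings, `c3` is `true`, and `00` is the positive fixint 0. A minimal Java sketch that encodes only the handful of types this example needs, assuming maps of at most 15 entries, strings of at most 31 UTF-8 bytes, and integers from 0 to 127; a real application would use a MessagePack library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class TinyMsgPack {
    // Encodes a tiny subset of MessagePack; anything else is rejected.
    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(value, out);
        return out.toByteArray();
    }

    private static void write(Object value, ByteArrayOutputStream out) {
        if (value == null) {
            out.write(0xc0);                               // nil
        } else if (value instanceof Boolean) {
            out.write((Boolean) value ? 0xc3 : 0xc2);      // true / false
        } else if (value instanceof Integer && (Integer) value >= 0 && (Integer) value <= 0x7f) {
            out.write((Integer) value);                    // positive fixint: the byte is the value
        } else if (value instanceof String) {
            byte[] utf8 = ((String) value).getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 31) throw new IllegalArgumentException("fixstr only: " + value);
            out.write(0xa0 | utf8.length);                 // fixstr header: 101xxxxx, xxxxx = length
            out.write(utf8, 0, utf8.length);
        } else if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            if (map.size() > 15) throw new IllegalArgumentException("fixmap only");
            out.write(0x80 | map.size());                  // fixmap header: 1000xxxx, xxxx = entries
            for (Map.Entry<?, ?> e : map.entrySet()) {
                write(e.getKey(), out);
                write(e.getValue(), out);
            }
        } else {
            throw new IllegalArgumentException("unsupported in this sketch: " + value);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();  // keeps insertion order
        doc.put("compact", true);
        doc.put("schema", 0);
        String json = "{\"compact\":true,\"schema\":0}";
        byte[] packed = encode(doc);
        System.out.println("JSON:        " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
        System.out.println("MessagePack: " + packed.length + " bytes: " + hex(packed));
    }
}
```

`LinkedHashMap` is used so the keys come out in the same order as the slide; with a `HashMap` the byte count would be the same, but `schema` might be emitted first.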

  30. MessagePack Demo

  34. Protocol Buffers
    ● A way of encoding structured data in an efficient yet extensible format.
    ○ “The language of data” at Google
    ○ Communication between internal services
    ● Compact binary format
    ● Schemas
    ● Client generation
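
Part of what makes the Protocol Buffers binary format compact is its base-128 varint encoding for integers: each byte carries 7 bits of the value, lowest-order group first, and the high bit signals that more bytes follow. (Fields are also identified by small numbers rather than by names.) A minimal Java sketch of the varint step for non-negative values only; negative `int32` values and zigzag-encoded `sint32` fields are out of scope here.

```java
import java.io.ByteArrayOutputStream;

final class Varint {
    // Base-128 varint: 7 payload bits per byte, least significant group first;
    // the high bit (0x80) is set on every byte except the last.
    static byte[] encode(long value) {
        if (value < 0) throw new IllegalArgumentException("non-negative values only in this sketch");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value >= 0x80) {
            out.write((int) ((value & 0x7f) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) break;   // high bit clear: last byte of this varint
            shift += 7;
        }
        return result;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long v : new long[] {1, 150, 1234, 300000}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + hex(enc) + " (" + enc.length
                + " bytes vs " + Long.toString(v).length() + " characters as text)");
        }
    }
}
```

Small numbers, which dominate IDs, counts, and enum values in practice, take a single byte; 1234 needs two (`d2 09`) instead of four characters of text.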

  35. “We carefully craft our data models inside
    our databases, maintain layers of code to
    keep these models in check, and then allow
    all that forethought to fly out of the window
    when we want to send that data over the
    wire to another service.”

  36. Protobufs: Schemas are awesome
    // Using the Java class generated from the Person schema
    Person john = Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .build();
    try (FileOutputStream output = new FileOutputStream(args[0])) {
      john.writeTo(output);
    }

  37. message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;
      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }
      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }
      repeated PhoneNumber phone = 4;
    }

  39. Protobufs: Schema evolution
    We only know something once we start doing it.
    Can we add new fields to our schema over time without breaking
    backward compatibility?

  40. Protobufs: Backward compatibility
    // Old schema
    message Person {
      required int32 id = 1;
      required string name = 2;
      optional string email = 3;
    }
    // New schema: email (field 3) removed, age (field 4) added
    message Person {
      required int32 id = 1;
      required string name = 2;
      optional int32 age = 4;
    }
    ● Old code will happily read new
    messages and simply ignore any new
    fields
    ● To the old code, optional fields that
    were deleted will simply have their
    default value
    ● New code will also transparently read
    old messages
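
Old code can "simply ignore" new fields because of how the wire format is laid out: every field is prefixed with a key, `(field_number << 3) | wire_type`, and the wire type alone tells a reader how many bytes to skip. A minimal Java sketch that hand-encodes a message under the new `Person` schema and decodes it with a reader that only knows the old one. It handles just wire types 0 (varint) and 2 (length-delimited), which is all these fields use, and the class and method names are illustrative, not protobuf's generated API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SchemaEvolutionDemo {
    static final int VARINT = 0, LENGTH_DELIMITED = 2;   // protobuf wire types

    // Writer built from the NEW schema: id = 1, name = 2, age = 4.
    static byte[] encodeNewPerson(int id, String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (1 << 3) | VARINT);              // key for field 1
        writeVarint(out, id);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (2 << 3) | LENGTH_DELIMITED);    // key for field 2
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        writeVarint(out, (4 << 3) | VARINT);              // key for field 4 (new)
        writeVarint(out, age);
        return out.toByteArray();
    }

    // What a reader built from the OLD schema (id = 1, name = 2, email = 3) sees.
    static final class OldPerson {
        int id;
        String name = "";
        String email = "";   // absent on the wire: stays at the proto2 default, ""
        int skippedFields;   // fields the old reader did not recognize
    }

    static OldPerson decodeWithOldSchema(byte[] data) {
        OldPerson p = new OldPerson();
        int[] pos = {0};                                  // mutable read cursor
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (fieldNumber == 1 && wireType == VARINT) {
                p.id = (int) readVarint(data, pos);
            } else if ((fieldNumber == 2 || fieldNumber == 3) && wireType == LENGTH_DELIMITED) {
                int len = (int) readVarint(data, pos);
                String s = new String(data, pos[0], len, StandardCharsets.UTF_8);
                pos[0] += len;
                if (fieldNumber == 2) p.name = s; else p.email = s;
            } else {
                skipField(data, pos, wireType);           // unrecognized field: skip, don't fail
                p.skippedFields++;
            }
        }
        return p;
    }

    // The wire type says how long the value is, even when the field is unknown.
    static void skipField(byte[] data, int[] pos, int wireType) {
        if (wireType == VARINT) {
            readVarint(data, pos);
        } else if (wireType == LENGTH_DELIMITED) {
            int len = (int) readVarint(data, pos);
            pos[0] += len;
        } else {
            throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
        }
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7fL) != 0) {
            out.write((int) ((value & 0x7f) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    public static void main(String[] args) {
        byte[] wire = encodeNewPerson(1234, "John Doe", 42);
        OldPerson old = decodeWithOldSchema(wire);
        System.out.println("id=" + old.id + " name=" + old.name
            + " email='" + old.email + "' skipped=" + old.skippedFields);
    }
}
```

The same mechanism explains why a deleted field's number (3 here) should never be reused for a different field: an old reader would decode the new field's bytes as `email`.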

  42. In Closing
    ● When is JSON a good fit?
    ○ You want data to be human readable
    ○ Data is consumed directly in the browser
    ○ It’s not important to tie the data model to a schema
    ● MessagePack
    ○ When low latency and high throughput are key
    ○ Internal communication
    ● Protocol Buffers
    ○ Serializing structured data with Schemas & Types
    ○ Client generation across languages
    ○ Backward compatibility & Schema evolution
    ○ Internal communication

  44. Thanks
    Yos Riady
    yos.io

  45. Questions
    Yos Riady
    yos.io
