Extensibility, Forward Compatibility, and Structured Data in Cloud Events

An explanation of the arguments for and against changing the JSON spec from one that can be described in Proto Lang to one that cannot

Rachel Myers

August 23, 2018

Transcript

  1. Confidential + Proprietary Proto Lang

    The .proto file lets developers define a message type using the Protobuf Interface Definition Language to generate libraries and encodings in multiple languages. One of the encodings is JSON; we’re bending over backwards to define a CloudEvents message in Proto Lang that will generate the JSON defined in json_format.md.

    cloudevents.proto:

        syntax = "proto3";

        package io.cloudevents.v0;

        import "google/protobuf/any.proto";
        import "google/protobuf/struct.proto";
        import "google/protobuf/timestamp.proto";

        option go_package = "cloudevents.io/protobuf/";
        option java_package = "io.cloudevents";
        option java_multiple_files = true;

        message CloudEvent {
          string event_type = 1;
          string event_type_version = 2;
          string cloud_events_version = 3;
          string source = 4;
          string event_id = 5;
          google.protobuf.Timestamp timestamp = 6;
          string schema_url = 7;
          string content_type = 8;
          oneof payload {
            google.protobuf.Value data = 9;
            bytes bytes_data = 10;
            google.protobuf.Any proto_data = 11;
          }
          google.protobuf.Struct extensions = 12;
        }
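As a rough illustration of how the JSON keys fall out of the CloudEvent message above: by default, the proto3 JSON mapping derives each key from the field name by converting it to lowerCamelCase. The sketch below approximates that rule in Python. `to_json_name` is a hypothetical helper written for this note, not part of any Protobuf library, and it ignores edge cases such as an explicit `json_name` option.

```python
def to_json_name(field_name: str) -> str:
    """Approximate proto3's default JSON name: drop each underscore and
    upper-case the character that follows it."""
    out = []
    upper_next = False
    for ch in field_name:
        if ch == "_":
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


# Field names copied from the CloudEvent message in cloudevents.proto.
FIELDS = [
    "event_type", "event_type_version", "cloud_events_version", "source",
    "event_id", "timestamp", "schema_url", "content_type", "extensions",
]

JSON_KEYS = [to_json_name(name) for name in FIELDS]
```

Under this rule `event_type` becomes `eventType` and `extensions` stays as it is, which is why the JSON shape is tied to the Proto Lang definition rather than chosen independently.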
  2. Proto Binary

    • The other use of “proto” is an alternative serialization format, the Proto Binary.
    • It’s used both internally and externally when higher performance is required.
  3. Proto disambiguation.

    • We define a CloudEvents message in Proto Lang that will generate serializations for both JSON and Proto. JSON serialization is described here
    • A source of confusion has been the assumption that the Proto Binary and JSON serializations are independent; they’re not if they’re both generated by a common Proto Lang definition.
    • Only a subset of valid JSON can be generated by Proto Lang, because it enforces guardrails
    • Moving extensions to top level properties makes it impossible to define a spec-compliant CloudEvents message in Proto Lang
    • We acknowledge that the extensions bag is clunkier and makes the promotion process for JSON-only systems more complicated. We think it makes the spec more reliable and useful for more systems
  4. Proto Lang and Proto Binary are widely used

    • Proto is used extensively by other CNCF projects, like gRPC and Kubernetes, and our spec should play well with the other CNCF projects
    • Proto is used at companies with popular APIs that we’d love to have support CloudEvents; proto’s public mailing list has ~4000 users
    • Publishing a CloudEvents definition in Proto Lang would let those companies quickly start using CloudEvents with their existing Proto-based systems
    • If CloudEvents JSON cannot be expressed in Proto Lang, every company using Proto internally will have a higher cost to start sending and receiving CloudEvents
  5. Demo

    An example (using @duglin’s jsonext tool) shows that we get non-deterministic behavior if we circumvent Proto Lang’s guardrails for JSON.

    Sticking to Proto Lang gives us a balance of forward compatibility and extensibility.
  6. Forward Compatibility

    We want to be able to add new attributes to the spec in the future without breaking event consumers that use the old version.
  7. Forward compatibility in JSON:

    • JSON keys are strings
    • If keys are uniquely named, there are no collisions
    • All values are JSON primitives
    • Future iterations of the spec that add new known properties were assumed to be a non-breaking change for existing JSON event consumers
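The assumption in the last bullet can be sketched as follows: a JSON consumer written against an older spec picks out the keys it knows and ignores the rest, so a newly added attribute does not disturb it. This is a minimal sketch; the attribute names, the `KNOWN_ATTRIBUTES` set, and `read_known` are illustrative choices for this note, not taken from the spec.

```python
import json

# Attributes this hypothetical older consumer was written against.
KNOWN_ATTRIBUTES = {"eventType", "eventId", "source"}


def read_known(raw: str) -> dict:
    """Parse an event and keep only the attributes the consumer understands."""
    event = json.loads(raw)
    return {key: value for key, value in event.items() if key in KNOWN_ATTRIBUTES}


# The same event as produced under an older and a newer spec version.
OLD_EVENT = '{"eventType": "com.example.created", "eventId": "1", "source": "/demo"}'
NEW_EVENT = (
    '{"eventType": "com.example.created", "eventId": "1", "source": "/demo",'
    ' "newAttribute": "x"}'
)
```

Both inputs yield the same view for this consumer, which is the sense in which adding a known property was assumed to be non-breaking for JSON.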
  8. Forward compatibility in Proto Binary:

    Why should event consumers using Proto Binary not use “unknown fields” for extensions?
    • “Unknown fields” don’t provide forward compatibility
      ◦ Top level keys are integers
      ◦ We’ll use a high number for the integer to avoid collisions instead of a normal low-index int
      ◦ It wouldn’t be convertible to JSON if it only has an int id.
    • It’s very hard to use two extensions
      ◦ An event consumer would have to individually write an additional Proto Lang definition for the base spec that combines the two extensions into a new tag
    • If the extensions are in a property bag instead of unknown fields, then promoting them from the property bag to a known field is a major change, not a minor change
      ◦ Even though it’s a minor change for JSON, it’s a major change for binary formats.
      ◦ IMPORTANT QUESTION: Will the WG increment semantic versioning only when a change will break a JSON-only event consumer?
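To make the "top level keys are integers" point concrete, the sketch below hand-encodes a single length-delimited field the way the Protobuf wire format does: the key is a varint holding `(field_number << 3) | wire_type`, followed by a length and the payload. The helper names, the field number 1000, and the value "0.5" are assumptions made for illustration only.

```python
LENGTH_DELIMITED = 2  # Protobuf wire type for strings, bytes and sub-messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    key = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(key) + encode_varint(len(payload)) + payload


def decode_field(data: bytes):
    """Return (field number, wire type, raw payload) for one field."""
    key, pos = decode_varint(data)
    length, pos = decode_varint(data, pos)
    return key >> 3, key & 0x7, data[pos:pos + length]


# A hypothetical extension sent as an unknown field with a high field number.
WIRE = encode_string_field(1000, "0.5")
```

A consumer without the matching definition can recover only the number 1000 and the raw bytes. The attribute name never appears on the wire, so there is nothing to use as a JSON key.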
  9. Extensibility via a property bag: The Pros

    • For event consumers using a binary format, extensions can be used without special handling:
      ◦ A vendor-specific extension sampledRate is added to the extensions property bag.
      ◦ Event consumers can assign extensions to be a struct, which is designed to handle arbitrary JSON keys and values
      ◦ A lot of work has gone into making the conversion between JSON and Proto Binary smooth, and we can take advantage of this
    • For event consumers using a JSON-only format, extensions can be used without special handling
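On the JSON side, reading from the property bag might look like the sketch below. The sample event, its values, and the `get_extension` helper are hypothetical; only the `extensions` bag and the `sampledRate` name come from the slides.

```python
import json

# Illustrative event: arbitrary keys and nested values live under "extensions".
RAW = (
    '{"eventType": "com.example.sampled", "eventId": "42", "source": "/demo",'
    ' "extensions": {"sampledRate": 0.5, "vendor": {"region": "eu"}}}'
)


def get_extension(event: dict, name: str, default=None):
    """Look up an extension in the property bag; a missing bag is tolerated."""
    return event.get("extensions", {}).get(name, default)


EVENT = json.loads(RAW)
```

The consumer needs no per-extension schema: any key, including a nested object, is reachable through the same lookup.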
  10. Extensibility via a property bag: The Cons

    • Promoting an extension to a top level attribute is a breaking change for both binary and JSON formats
      ◦ For example, if sampledRate is widely used and promoted to the top level
        ▪ To be backwards compatible, event consumers will need to continue to accept CE v1.0, where sampledRate is in the extensions bag
        ▪ To support CE v2.0, event consumers will need to look for sampledRate as a top level property
    • Avoiding the breaking change is a motivation to move away from the property bag and put all extensions at the top level
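The dual lookup described in the sampledRate example can be sketched as follows. The two sample events and the `sampled_rate` helper are hypothetical; the version strings mirror the "CE v1.0" and "CE v2.0" labels used on the slide.

```python
def sampled_rate(event: dict):
    """Read sampledRate from either location a consumer must support."""
    if "sampledRate" in event:  # promoted: top level property (CE v2.0)
        return event["sampledRate"]
    # not promoted: still in the extensions bag (CE v1.0)
    return event.get("extensions", {}).get("sampledRate")


V1_EVENT = {"cloudEventsVersion": "1.0", "extensions": {"sampledRate": 0.5}}
V2_EVENT = {"cloudEventsVersion": "2.0", "sampledRate": 0.5}
```

Every promoted extension adds one more branch like this to each consumer, which is the cost this slide attributes to the property bag.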
  11. Extensibility via top level properties

    • For JSON-only implementations
      ◦ Arbitrary top level attributes are fine, as long as they are uniquely named
      ◦ The promotion path is seamless; event consumers will see no change between the attribute being an extension and being a known attribute
    • For Proto Lang implementations
      ◦ Cannot easily handle arbitrary top-level attributes
      ◦ Workaround 1: Hand-craft the Proto Binary, adding the known attributes to integer-keyed top level attributes, and adding unknown properties to a property bag
        ▪ This requires abandoning built-in conversion between JSON and Proto Binary provided by Proto Lang and special casing CloudEvents in every system that wants to support it
      ◦ Workaround 2: Event consumers could drop unknown attributes, effectively dropping extensibility
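A minimal sketch of what the two workarounds would mean for code that converts incoming JSON: Workaround 1 has to sort top level keys into known attributes and a hand-built bag, and Workaround 2 simply discards the remainder. The `KNOWN` set, the helper names, and the sample event are assumptions for illustration; real code would also have to emit the Proto Binary by hand.

```python
# Attribute names this hypothetical converter has integer-keyed fields for.
KNOWN = {"eventType", "eventId", "source"}


def split_attributes(event: dict):
    """Workaround 1: separate known attributes from everything else,
    which would then be packed into a property bag by hand."""
    known = {k: v for k, v in event.items() if k in KNOWN}
    bag = {k: v for k, v in event.items() if k not in KNOWN}
    return known, bag


def drop_unknown(event: dict) -> dict:
    """Workaround 2: keep only known attributes, losing the extensions."""
    return split_attributes(event)[0]


EVENT = {
    "eventType": "com.example.sampled",
    "eventId": "42",
    "source": "/demo",
    "sampledRate": 0.5,  # extension sent as a top level property
}
```

Either way, this logic sits outside the conversion that Proto Lang generates, so each system supporting CloudEvents would need its own copy.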
  12. Conclusion

    • If the extensions bag is removed, the JSON format cannot be expressed by Proto Lang, and Google will be unable to avoid fracturing the spec (duplicate JSON format, differing compatibility across versions).
    • We know it’s a sacrifice to give up the cleaner JSON and seamless property promotion for JSON-only systems. We believe the change will also limit the usefulness of the spec
    • Requests for the group:
      ◦ Understand the conflict between forward compatibility, extensibility, and structured data, and decide which goals we value and which to sacrifice.
      ◦ If the WG wants to sacrifice support for structs, or if the WG will only declare breaking changes when they break JSON-only systems, we need to know sooner rather than later