JSON Schema Intro and Workshop - GA4GH Hinxton 2019

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
This presentation and workshop session introduces you to JSON Schema.
First we look at who uses JSON Schema and what they use it for.
Next we cover some key concepts and terminology to help you understand how vocabulary keywords work.
Then we look at a few JSON Schema keywords for each category type.
This is followed by a small workshop session in which we create a JSON Schema.
Moving on, we cover some more advanced keywords in JSON Schema.
Lastly, we look at some upcoming draft-8 features!

To read more and support my ongoing work on JSON Schema, please see https://ko-fi.com/relequestual

Ben Hutton

April 29, 2019

Transcript

  1. JSON Schema
    Workshop
    Validation and annotation of JSON documents
    2019/04/29
    Ben Hutton – Senior Web Developer

  2. JSON Schema
    A vocabulary that allows you to annotate and validate JSON documents

  3. Overview
    • Who uses JSON Schema?
    • Case studies on the uses of JSON Schema
    • IETF and JSON Schema draft versions
    • Key concepts
    • Basic JSON Schema - Validation and Annotation
    • Let’s build a JSON Schema – Interactive. Laptops required!
    • Advanced JSON Schema – Application and Referencing
    • Let’s build a JSON Schema again!
    • Questions and schema troubleshooting

  4. JSON Schema for GA4GH
    Search API request format

  5. JSON Schema?

  6. JSON Schema?

  7. JSON Schema!
    15,000,000+ Weekly downloads*
    from Node Package Manager
    in 2019 so far
    * Weekly downloads of the package “ajv”, a JSON Schema Validator, as recorded by npm

  8. “What for?”

  9. It’s good to be validated!
    (Case Studies)

  10. Amazon API Gateway
    Data structure of a payload
    Request validation
    Generate an SDK
    Amazon Web Services
    https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html

  11. Amazon API Gateway
    “API Gateway can perform the basic validation. This enables you, the API developer, to focus on
    app-specific deep validation in the backend. For the basic validation, API Gateway verifies either
    or both of the following conditions:
    The required request parameters in the URI, query string, and headers of an incoming request
    are included and non-blank.
    The applicable request payload adheres to the configured JSON schema request model of the
    method.”
    OpenAPI Specification payload definitions
    “Currently, API Gateway supports generating an SDK for an API in Java, JavaScript, Java for
    Android, Objective-C or Swift for iOS, and Ruby.”
    https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html

  12. Gov.uk publishing system
    Check data for publishing is valid
    Contract testing between layers
    Collaboration across departments and teams
    Documentation generation
    Government Digital Service
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas

  13. GDS Publishing Platform
    Multiple applications
    Different teams
    Frequent changes
    Pull Requests require working implementations
    on multiple fronts before merging
    Examples normally required
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas

  14. Documentation
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas

  15. Example payloads
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas

  16. Metadata Ingestion
    Collaborative definitions
    Generate spreadsheets for users
    (and convert back to JSON)
    Documentation generation
    Validate user submissions
    Human Cell Atlas
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure

  17. Five major entities
    Used together to form an experiment
    (project)
    All validated using JSON Schema
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI

  18. JSON documents are self described
    Schemas are semantically versioned
    following clear major.minor.patch rules
    Agile and able to adapt to changes
    JSON Schema extension for validation of
    ontology terms
    Developed governance model to manage
    modifications
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI

  19. Documentation
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI

  20. API Documentation
    API Testing
    Cloud Platform
    as a Service
    provider
    https://blog.heroku.com/json-schema-document-debug-apis

  21. “When we at Heroku started testing
    our API with committee it immediately
    uncovered some inconsistencies
    between the existing JSON Schema
    files and what various API endpoints
    actually returned – and it turns out to
    be a gift that keeps giving.”
    Jessie Young, Heroku
    https://blog.heroku.com/json-schema-document-debug-apis

  22. Data Dictionary (models) and validation
    Firefox telemetry format
    MDN web docs data
    Form generation
    Cloud Deployment Management
    Google API Discovery Service
    Experience platform
    Form generation
    Validating tests
    Database level validation

  23. JSON Schema
    A vocabulary that allows you to annotate and validate JSON documents
    (and a few other things too!)

  24. JSON Schema
    Fundamentals

  25. Key Concepts
    Validation and annotation
    Let’s make a JSON Schema!
    Application keywords
    Referencing
    Let’s make a JSON Schema! …Again!
    Not covering JSON Hyper Schema
    JSON Schema
    the specification

  26. Draft version?
    Core
    Validation
    <= Draft 4 >= Draft 5 Draft 7
    IETF draft document
    Personal drafts
    Draft 7:
    draft-handrews-json-schema-01
    AND
    draft-handrews-json-schema-validation-01
    http://json-schema.org/specification.html
    TODAY: Draft 7 and Draft 8

  27. The “instance” : The JSON document which is being validated or described by a JSON Schema
    The “schema” : The JSON Schema document
    A schema must be an Object or Boolean.
    Constraints based:
    Key Concepts
    Empty schema and `true` are equal
    Any valid JSON document passes validation
    `false` and `{ "not": { } }` are equal
    Any valid JSON document fails validation
    https://tools.ietf.org/html/draft-handrews-json-schema-01

  28. Schema “keywords” : Object properties that are applied to the instance
    Keywords fall under one or both of two categories (mostly):
    Assertions : produce a boolean result when applied to an instance
    Annotations : attach information to an instance for application use
    Root Schema : Schema that is the whole JSON document
    Subschemas : A schema as a value of an object or array
    Some keywords take a schema as their value
    Key Concepts
    https://tools.ietf.org/html/draft-handrews-json-schema-01

  29. Validation?
    “JSON Schema validation applies schemas to locations within the instance, and asserts constraints on the
    structure of the data at each location. An instance location that satisfies all asserted constraints is then
    annotated with any keywords that contain non-assertion information, such as descriptive metadata and usage
    hints. If all locations within the instance satisfy all asserted constraints, then the instance is said to be valid
    against the schema.”
    Applicability : Determining which schemas are applied to which instances - “Validation begins by applying the
    root schema to the complete instance document.”
    Assertions : Statement of fact in terms of valid or not. – “Each assertion adds constraints that an instance must
    satisfy in order to successfully validate.”
    Annotation : Labels or other metadata which apply to the instance data based on assertions.
    Key Concepts - Validation
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  30. An instance has one of six primitive types, and a range of possible values depending on the type:
    null : A JSON "null" value
    boolean : A JSON "true" or "false" value
    object : An unordered set of properties mapping a string to an instance
    array : An ordered list of instances
    number : An arbitrary-precision, base-10 decimal number value
    string : A string of Unicode code points
    JSON Primitive Types
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  31. Validation - Keywords for…
    • Any Instance Type
    • Numeric Instances (number and integer)
    • Strings
    • Arrays
    • Objects
    Validation Keywords
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  32. Validation – Keywords for Any Instance Type – “type”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    The value of `type` may be a String or an array of unique Strings.
    The String values must be one of "null", "boolean", "object", "array", "number", "string", or "integer".
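
The slide's schema and instance examples were images and are not in the transcript. As a rough stand-in, here is a minimal Python sketch of how the `type` keyword could be checked. The function names `matches_type` and `check_type` are hypothetical, and this is an illustration under the draft-07 rules quoted above, not how any particular validator is implemented.

```python
def matches_type(instance, type_name):
    """Return True if a parsed JSON value matches one JSON Schema type name."""
    if type_name == "null":
        return instance is None
    if type_name == "boolean":
        return isinstance(instance, bool)
    if type_name == "object":
        return isinstance(instance, dict)
    if type_name == "array":
        return isinstance(instance, list)
    if type_name == "string":
        return isinstance(instance, str)
    # In Python, bool is a subclass of int, so exclude it from numeric types.
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if type_name == "number":
        return is_number
    if type_name == "integer":
        # Draft-07 treats any number with a zero fractional part as an integer.
        return is_number and (isinstance(instance, int) or instance.is_integer())
    raise ValueError(f"unknown type name: {type_name!r}")


def check_type(instance, type_value):
    """`type_value` is a single type name or a list of unique type names."""
    names = [type_value] if isinstance(type_value, str) else type_value
    return any(matches_type(instance, name) for name in names)
```

For example, `check_type(None, ["string", "null"])` passes, while `check_type(True, "number")` fails because a JSON boolean is not a number.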

  33. Validation – Keywords for Numeric Instances – Ranges
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`
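
A minimal sketch of the four range keywords, assuming the draft-06/07 form in which `exclusiveMinimum` and `exclusiveMaximum` are numbers. The helper name `check_range` is hypothetical.

```python
def check_range(instance, schema):
    """Apply minimum/maximum/exclusiveMinimum/exclusiveMaximum to one instance."""
    # Numeric keywords only constrain numbers; any other type passes.
    if isinstance(instance, bool) or not isinstance(instance, (int, float)):
        return True
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    if "exclusiveMinimum" in schema and instance <= schema["exclusiveMinimum"]:
        return False
    if "exclusiveMaximum" in schema and instance >= schema["exclusiveMaximum"]:
        return False
    return True
```

The boundary value is where the inclusive and exclusive forms differ: `check_range(16, {"maximum": 16})` passes, while `check_range(16, {"exclusiveMaximum": 16})` fails.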

  34. Validation – Keywords for String Instances – “pattern”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    `pattern` value should be a regex.
    The regex is not anchored!
    Regex for “does not include ‘nice’”
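
To illustrate what "not anchored" means, here is a small sketch using Python's `re.search`, which matches anywhere in the string. JSON Schema recommends the ECMA 262 regex dialect, so Python's `re` is only a stand-in here. The regex the slide showed is not in the transcript; `NOT_NICE` below is one possible way to write "does not include 'nice'", and both names are hypothetical.

```python
import re


def check_pattern(instance, pattern):
    """`pattern` only constrains strings, and matches anywhere unless anchored."""
    if not isinstance(instance, str):
        return True
    return re.search(pattern, instance) is not None


# One possible "does not include 'nice'" regex: a negative lookahead
# anchored at the start of the string.
NOT_NICE = r"^(?!.*nice)"
```

With these definitions, `check_pattern("abcdef", "cd")` passes because the match may start anywhere, whereas `check_pattern("abcdef", "^cd$")` fails. `check_pattern("have a nice day", NOT_NICE)` fails and `check_pattern("have a day", NOT_NICE)` passes.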

  35. Validation – Keywords for Array Instances – “items”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    The value of `items` can be a schema or an array of schemas.
    If the value is a schema, that schema is applicable to each instance in the array.
    If the value is an array of schemas, each schema is applicable to the instance at the same location in the array.
    You usually only want a schema as opposed to an array of schemas.
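
The two forms of `items` can be sketched as follows. This toy `is_valid` function is hypothetical and supports only `type` (for three type names) and `items`, which is just enough to show the difference between the single-schema form and the array-of-schemas form.

```python
PY_TYPES = {"string": str, "array": list, "object": dict}


def is_valid(instance, schema):
    """Toy validator: supports only `type` (three names) and `items`."""
    if "type" in schema and not isinstance(instance, PY_TYPES[schema["type"]]):
        return False
    items = schema.get("items")
    if items is not None and isinstance(instance, list):
        if isinstance(items, dict):
            # Single schema: applied to every element of the array.
            return all(is_valid(element, items) for element in instance)
        # Array of schemas: each schema applies to the element at the same
        # position; extra elements are unconstrained in this sketch.
        return all(is_valid(element, sub) for element, sub in zip(instance, items))
    return True


LIST_OF_STRINGS = {"type": "array", "items": {"type": "string"}}
STRING_THEN_OBJECT = {"type": "array", "items": [{"type": "string"}, {"type": "object"}]}
```

`["a", "b"]` is valid against `LIST_OF_STRINGS` and `["a", 1]` is not. Against `STRING_THEN_OBJECT`, `["a", {}]` is valid and `[{}, "a"]` is not.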

  36. Validation – Keywords for Array Instances – “uniqueItems”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    Must be a Boolean.
    With a value of `true`, asserts true if all items in the array are unique.
    A value of `false` always asserts true. Same as omitting the keyword.
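
A minimal sketch of `uniqueItems`, assuming JSON equality rather than Python equality (in Python `1 == True`, but the JSON values `1` and `true` are different). The helper names `json_equal` and `check_unique_items` are hypothetical.

```python
def json_equal(a, b):
    """Compare two parsed JSON values using JSON, not Python, equality."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def check_unique_items(instance, unique_items):
    """Only a value of True on an array instance imposes a constraint."""
    if unique_items is not True or not isinstance(instance, list):
        return True
    for index, left in enumerate(instance):
        for right in instance[index + 1:]:
            if json_equal(left, right):
                return False
    return True
```

`check_unique_items([1, 2, 1], True)` fails, while `check_unique_items([1, 1], False)` passes because `false` never constrains.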

  37. Validation – Keywords for Object Instances – “properties”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    The value of `properties` must be an object.
    The values of this object must be a JSON Schema.
    That JSON Schema is APPLIED to the child instance (or value) for the corresponding key in the instance object.
    ?

  38. Validation Keyword
    `properties`
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    The value for `age` is a JSON Schema, but it’s only applicable to the instance object FOR the matching key IF it exists.
    `properties` defines how child instances are validated, and not the immediate instance.
    It’s an APPLICATOR keyword.
    The values of this object must be a JSON Schema.
    That JSON Schema is APPLIED to the child instance (value) for the corresponding key in the instance object.
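
The applicator behaviour of `properties` can be sketched with a toy validator. The `is_valid` function is hypothetical and supports only `type` (for "string" and a simplified "integer") and `properties`; the `age` schema stands in for the slide's example, which is not in the transcript.

```python
def is_valid(instance, schema):
    """Toy validator: supports only `type` (two names) and `properties`."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(instance, str):
        return False
    # Simplified integer check: Python ints only, excluding booleans.
    if expected == "integer" and (isinstance(instance, bool) or not isinstance(instance, int)):
        return False
    if isinstance(instance, dict):
        for key, subschema in schema.get("properties", {}).items():
            # The subschema is applied only if the key exists in the instance.
            if key in instance and not is_valid(instance[key], subschema):
                return False
    return True


AGE_SCHEMA = {"properties": {"age": {"type": "integer"}}}
```

`{"age": 30}` is valid and `{"age": "thirty"}` is not, but `{}` is also valid: `properties` says how `age` is validated if present, not that it must be present.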

  39. Validation – Keywords for Object Instances – “required”
    Validation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema Valid instance Invalid instance
    The value of `required` must be an array.
    Validation is successful if every item in the array is also a key in the instance object.
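
A minimal sketch of the `required` rule. The helper name `check_required` is hypothetical.

```python
def check_required(instance, required):
    """Every name in `required` must be a key of the instance object."""
    # `required` only constrains objects; other instance types pass.
    if not isinstance(instance, dict):
        return True
    return all(name in instance for name in required)
```

`check_required({"name": "Ben"}, ["name"])` passes and `check_required({"age": 30}, ["name"])` fails. Note that only the presence of the key is checked, so `{"name": None}` also passes.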

  40. Annotation – “title” and “description”
    Annotation Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema
    The values of `title` and `description` must be strings.
    They can both be used to “decorate” a user interface or documentation generated from the schema.
    Schema with annotations

  41. Let’s make a
    JSON Schema!

  42. “$schema” keyword:
    Identifies the version of JSON Schema being used, and the location of the meta-schema.
    “$id” keyword:
    The unique identifier for the schema, and the base URI for reference resolving (more on that later)
    …One more thing

  43. Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links!
    A JSON Schema like structure (In YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml
    Let’s make it a JSON Schema!
    You will need:
    A YAML to JSON (and JSON to YAML) converter: https://www.json2yaml.com
    A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net
    Core and Validation spec documents: http://json-schema.org/specification.html
    Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema
    Example data (included in first link)
    The $schema and $id to start (Included in first link)
    Let’s make a
    JSON Schema!
    You may find it easier or faster to write in YAML

  44. View Slide

  45. "pattern": "^\\w+:\\w+$" in JSON
    Backslashes in strings have to be escaped!
    Semantic version with build metadata
    Must be an object
    “id” is required, but label is not.
    (Not specified this way, but could be the case)
    “description” is an annotation field.
    | (pipe) allows for multi-line text in YAML.
    Newlines are replaced in the conversion to
    JSON with “\n”.
    In YAML – because it’s easier to read!
    “examples” is an annotation keyword, which must be
    an array, but there are no restrictions on the values of
    that array.
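
The escaping described above can be seen by serialising the values with Python's standard `json` module. The sample description text and the ontology-style term `HP:0000118` are assumptions for illustration, not taken from the slide.

```python
import json
import re

# The regex as a regex engine sees it: single backslashes.
PATTERN = r"^\w+:\w+$"

# Serialising to JSON doubles each backslash inside the string.
schema_text = json.dumps({"pattern": PATTERN})

# A multi-line string (what a YAML "|" block produces) is written to JSON
# with each newline replaced by the two characters backslash and "n".
description_text = json.dumps("First line\nSecond line\n")

# Parsing the JSON text gives back the original single-backslash regex.
round_tripped = json.loads(schema_text)["pattern"]
matches_term = re.search(round_tripped, "HP:0000118") is not None
```

Here `schema_text` is the JSON text `{"pattern": "^\\w+:\\w+$"}`, matching the slide, and `description_text` contains no raw newline characters.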

  46. Questions?

  47. Validation - Keywords for…
    • Any Instance Type
    • Numeric Instances (number and integer)
    • Strings
    • Arrays
    • Objects
    Application - Keywords for…
    • Applying Subschemas With Boolean Logic
    • Applying Subschemas Conditionally
    Validation Keywords
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  48. • Let’s take our previous simple schema
    • Add date of birth
    • Always require “name”
    • Require age OR date of birth
    Application Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
    Schema

  49. Application keywords – “oneOf”, “allOf”, “anyOf”
    Taking our previous “name” and “age” example…
    The previous schema didn’t make either value required, just
    specified their type if included.
    “name” is now required.
    “oneOf” must be an Array, where each value must be a schema.
    Validation is successful if exactly one of the schemas in the array
    validates successfully against the instance.
    What if we want “age” AND / OR “dateOfBirth”?
    “anyOf” is similar, but “at least one” as opposed to “exactly one”.
    Subschemas applied with boolean logic!
    Application Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
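
The slide's schema was an image and is not in the transcript, so the two schemas below are a reconstruction from the bullet points, not the original. The toy `is_valid` function is hypothetical and supports only `required`, `allOf`, `anyOf` and `oneOf`.

```python
def is_valid(instance, schema):
    """Toy validator: supports only `required`, `allOf`, `anyOf`, `oneOf`."""
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
    if "allOf" in schema and not all(is_valid(instance, s) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(is_valid(instance, s) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema:
        # Exactly one subschema must validate successfully.
        if sum(is_valid(instance, s) for s in schema["oneOf"]) != 1:
            return False
    return True


ALTERNATIVES = [{"required": ["age"]}, {"required": ["dateOfBirth"]}]
EXACTLY_ONE = {"required": ["name"], "oneOf": ALTERNATIVES}
AT_LEAST_ONE = {"required": ["name"], "anyOf": ALTERNATIVES}
```

An instance with `name`, `age` and `dateOfBirth` fails `EXACTLY_ONE` (two subschemas pass) but is valid against `AT_LEAST_ONE`. An instance with only `name` fails both.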

  50. Application keywords – “if”, “then”, “else”
    Conditional applicability
    The value of these keywords must be a schema. These
    schemas are “subschemas”.
    If the schema from “if” validates successfully, the “then”
    schema is applied to the instance.
    If the schema from “if” fails validation, the “else” schema
    is applied to the instance.
    Let’s try: “If age is less than 16, guardianName is required”
    Can anyone spot why this schema won’t do what you
    might expect?
    Application Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  51. Application keywords – “if”, “then”, “else”
    The value of these keywords must be a schema. These schemas
    are “subschemas”.
    The value of “if” is a “valid” schema, but imposes no constraints,
    because “age” is not a JSON Schema keyword.
    Remember: no constraints is equivalent to an empty schema “{ }”
    or `true`, meaning validation passes.
    “age” must be wrapped in a “properties” keyword in order for its
    value to be applied to the instance, which then generates an
    assertion on pass or fail for validation.
    Application Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

  52. Application keywords – “if”, “then”, “else”
    The value of these keywords must be a schema. These schemas
    are “subschemas”.
    The value of “if” is now a subschema which has constraints!
    Fixed!
    A common and easy error to make.
    It comes up 2-3 times a week on the JSON Schema Slack or Stack Overflow.
    (Additionally, it should be “exclusiveMaximum”)
    Application Keyword
    Examples
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
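
The schemas on these slides were images, so `BROKEN` and `FIXED` below are reconstructions of the "If age is less than 16, guardianName is required" example based on the slide text. The toy `is_valid` function is hypothetical; like a real validator, it ignores keys it does not recognise, which is exactly what makes the broken schema misbehave.

```python
def is_valid(instance, schema):
    """Toy validator: exclusiveMaximum, required, properties, if/then/else.

    Unrecognised keys (such as a bare "age") are ignored.
    """
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "exclusiveMaximum" in schema and is_number:
        if not instance < schema["exclusiveMaximum"]:
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], subschema):
                return False
    if "if" in schema:
        branch = "then" if is_valid(instance, schema["if"]) else "else"
        if branch in schema and not is_valid(instance, schema[branch]):
            return False
    return True


THEN = {"required": ["guardianName"]}
# "age" is not a keyword, so this "if" imposes no constraints and always passes.
BROKEN = {"if": {"age": {"exclusiveMaximum": 16}}, "then": THEN}
# Wrapping "age" in "properties" applies the subschema to the age value.
FIXED = {"if": {"properties": {"age": {"exclusiveMaximum": 16}}}, "then": THEN}
```

With `BROKEN`, an adult such as `{"name": "A", "age": 30}` is wrongly rejected because “then” is always applied. With `FIXED`, the adult passes and `{"name": "B", "age": 10}` is rejected until `guardianName` is added.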

  53. “definitions” and “$ref”
    “definitions” provides a place to put and reference reusable
    parts of a JSON Schema document.
    The value of “definitions” must be an object, where each
    value must be a schema.
    “An object schema with a “$ref” property must be
    interpreted as a “$ref” reference.” *
    The value of “$ref” must be a URI Reference.
    Referenced schema is applied to the instance.
    Other properties in the object schema must be ignored.
    Schema Reuse
    https://tools.ietf.org/html/draft-handrews-json-schema-01
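
A minimal sketch of looking up a same-document reference such as `#/definitions/name`, treating the fragment as a JSON Pointer. The function name `resolve_local_ref` and the example schema are hypothetical; references to other documents are deliberately not handled.

```python
from urllib.parse import unquote


def resolve_local_ref(root_schema, ref):
    """Resolve a same-document "$ref" whose fragment is a JSON Pointer."""
    if ref == "#":
        return root_schema
    if not ref.startswith("#/"):
        raise ValueError(f"only same-document pointer references are handled: {ref!r}")
    target = root_schema
    for raw_token in ref[2:].split("/"):
        # Undo URI percent-encoding, then JSON Pointer escapes (~1 is "/", ~0 is "~").
        token = unquote(raw_token).replace("~1", "/").replace("~0", "~")
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target


SCHEMA = {
    "definitions": {"name": {"type": "string"}},
    "properties": {
        "firstName": {"$ref": "#/definitions/name"},
        "lastName": {"$ref": "#/definitions/name"},
    },
}
```

Here `resolve_local_ref(SCHEMA, "#/definitions/name")` returns `{"type": "string"}`, the schema that is then applied to both `firstName` and `lastName`.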

  54. “$ref” and “$id” – URI resolution – RFC 3986
    A reference of “#item” resolves against the base URI of this
    document to:
    http://example.net/root.json#item
    The schema for “single” identifies as “#item”, and so the
    reference can be resolved within the same document.
    Think of a subschema’s use of a relative “$id” as similar to
    giving an HTML element an “id” and creating a link.
    The reference to “other.json” resolves against the base URI of
    this document to:
    http://example.net/other.json
    Not defined in this document.
    Users may preload other schemas into implementations or
    allow implementations to take network actions to resolve
    referenced schemas.
    Schema Reuse
    https://tools.ietf.org/html/draft-handrews-json-schema-01
    The use of “$id” in subschemas could change the base URI of URI
    resolution. Therefore it is not advised unless you know what you’re
    doing and why. This gets complex. Please see:
    https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8
    and RFC 3986
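
The two resolutions described on this slide can be reproduced with `urllib.parse.urljoin` from Python's standard library, which resolves a reference against a base URI. The nested base URI in the last example is a hypothetical illustration of how a subschema `$id` could change the result.

```python
from urllib.parse import urljoin

# Base URI of the document, as given by its root "$id".
BASE = "http://example.net/root.json"

# A fragment-only reference stays within the same document.
item_uri = urljoin(BASE, "#item")

# A relative reference replaces the last path segment of the base URI.
other_uri = urljoin(BASE, "other.json")

# Hypothetical: if a subschema "$id" had changed the base URI, the same
# relative reference would resolve to a different location.
NESTED_BASE = "http://example.net/nested/sub.json"
nested_other_uri = urljoin(NESTED_BASE, "other.json")
```

`item_uri` is `http://example.net/root.json#item` and `other_uri` is `http://example.net/other.json`, as on the slide, while `nested_other_uri` is `http://example.net/nested/other.json`.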

  55. Let’s make a
    JSON Schema!
    …again!

  56. Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links!
    A JSON Schema like structure (In YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml
    Let’s make it a JSON Schema!
    You will need:
    A YAML to JSON (and JSON to YAML) converter: https://www.json2yaml.com
    A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net
    Core and Validation spec documents: http://json-schema.org/specification.html
    Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema
    Let’s make a
    JSON Schema!
    You may find it easier or faster to write in YAML

  57. Questions?

  58. Recap
    • Who uses JSON Schema?
    • Case studies on the uses of JSON Schema
    • IETF and JSON Schema draft versions
    • Key concepts
    • Basic JSON Schema - Validation and Annotation
    • Let’s build a JSON Schema – Interactive. Laptops required!
    • Advanced JSON Schema – Application and Referencing
    • Let’s build a JSON Schema again!
    • Questions and schema troubleshooting

  59. Moving Forward
    Draft 8

  60. But can JSON Schema…
    “Validate an ontology term based on the supplied ontology identifier?”
    Well… no. JSON Schema doesn’t prohibit you adding your own keywords though…
    “OK, so I’ll create a new one to do this!”
    https://github.com/elixir-europe/json-schema-validator
    As an npm package or as a standalone server.
    “This validator has three custom keywords implemented, `graph_restriction`,
    `isChildTermOf` and `isValidTerm`.”
    Uses EBI Ontology Lookup Service.
    Used by Human Cell Atlas and others!
    https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md

  61. But can JSON Schema…
    “Validate an ontology term based on the supplied ontology identifier?”
    If you add to the keyword vocabulary. But you’ll have to tell people in advance!
    Enter: “$vocabularies”
    Take a schema and extend it by creating a new meta-schema.
    The new meta-schema defines the “$vocabularies” it uses.
    Your new schemas use a “$schema” value of the “$id” from your new meta-schema.
    This allows implementations to dynamically load libraries that support additional sets
    of keywords, if they are provided and identified.
    https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md

  62. Still lots more!

  63. Acknowledgements

  64. Thank you!
    Ben Hutton
    Wellcome Sanger Institute
    JSON Schema core
    GitHub: relequestual
    Twitter: @relequestual
    http://json-schema.org
    Includes link to join Slack
    Additional Sponsors:
    http://bit.ly/json-schema-work
