JSON Schema Intro and Workshop - GA4GH Hinxton 2019

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
This presentation and workshop session introduces you to JSON Schema.
First we look at who uses JSON Schema and what they use it for.
Next we cover some key concepts and terminology to help you understand how vocabulary keywords work.
Then we look at a few JSON Schema keywords for each category.
This is followed by a small workshop session in which we create a JSON Schema.
After that, we cover some more advanced keywords in JSON Schema.
Lastly, we look at some upcoming draft-8 features!

To read more and support my ongoing work on JSON Schema, please see https://ko-fi.com/relequestual

Ben Hutton

April 29, 2019

Transcript

  1. JSON Schema Workshop

    Validation and annotation of JSON documents
    2018/04/29
    Ben Hutton – Senior Web Developer
  2. JSON Schema

    A vocabulary that allows you to annotate and validate JSON documents
  3. Overview

    • Who uses JSON Schema?
    • Case studies on the uses of JSON Schema
    • IETF and JSON Schema draft versions
    • Key concepts
    • Basic JSON Schema - Validation and Annotation
    • Let’s build a JSON Schema – Interactive. Laptops required!
    • Advanced JSON Schema – Application and Referencing
    • Let’s build a JSON Schema again!
    • Questions and schema troubleshooting
  4. JSON Schema for GA4GH Search API request format

  5. JSON Schema?

  6. JSON Schema?

  7. JSON Schema!

    15,000,000+ weekly downloads* from Node Package Manager in 2019 so far
    * Weekly downloads of the package “ajv”, a JSON Schema validator, as recorded by npm
  8. “What for?”

  9. It’s good to be validated! (Case Studies)

  10. Amazon API Gateway

    • Data structure of a payload
    • Request validation
    • Generate an SDK
    Amazon Web Services
    https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html
  11. Amazon API Gateway

    “API Gateway can perform the basic validation. This enables you, the API developer, to focus on app-specific deep validation in the backend. For the basic validation, API Gateway verifies either or both of the following conditions:
    • The required request parameters in the URI, query string, and headers of an incoming request are included and non-blank.
    • The applicable request payload adheres to the configured JSON schema request model of the method.”
    OpenAPI Specification payload definitions
    “Currently, API Gateway supports generating an SDK for an API in Java, JavaScript, Java for Android, Objective-C or Swift for iOS, and Ruby.”
    https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html
  12. Gov.uk publishing system

    • Check data for publishing is valid
    • Contract testing between layers
    • Collaboration across departments and teams
    • Documentation generation
    Government Digital Service
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas
  13. GDS Publishing Platform

    • Multiple applications
    • Different teams
    • Frequent changes
    • Pull Requests require working implementations on multiple fronts before merging
    • Examples normally required
    https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/
    https://github.com/alphagov/govuk-content-schemas
  14. Documentation https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

  15. Example payloads https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

  16. Metadata Ingestion

    • Collaborative definitions
    • Generate spreadsheets for users (and convert back to JSON)
    • Documentation generation
    • Validate user submissions
    Human Cell Atlas
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
  17. Five major entities

    • Used together to form an experiment (project)
    • All validated using JSON Schema
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI
  18. JSON documents are self-described

    • Schemas are semantically versioned following clear major.minor.patch rules
    • Agile and able to adapt to changes
    • JSON Schema extension for validation of ontology terms
    • Developed governance model to manage modifications
    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI
  19. Documentation

    https://github.com/HumanCellAtlas/metadata-schema
    https://prod.data.humancellatlas.org/metadata/design-principles/structure
    Images from presentation by Mallory Freeberg @ EBI
  20. API Documentation API Testing Cloud Platform as a Service provider

    https://blog.heroku.com/json-schema-document-debug-apis
  21. “When we at Heroku started testing our API with committee it immediately uncovered some inconsistencies between the existing JSON Schema files and what various API endpoints actually returned – and it turns out to be a gift that keeps giving.”

    Jessie Young, Heroku
    https://blog.heroku.com/json-schema-document-debug-apis
  22. Data Dictionary (models) and validation Firefox telemetry format MDN web

    docs data Form generation Cloud Deployment Management Google API Discovery Service Experience platform Form generation Validating tests Database level validation
  23. JSON Schema

    A vocabulary that allows you to annotate and validate JSON documents (and a few other things too!)
  24. JSON Schema Fundamentals

  25. JSON Schema the specification

    • Key Concepts
    • Validation and annotation
    • Let’s make a JSON Schema!
    • Application keywords
    • Referencing
    • Let’s make a JSON Schema! …Again!
    Not covering JSON Hyper Schema
  26. Draft version? Core Validation <= Draft 4 >= Draft 5

    Draft 7 IETF draft document Personal drafts Draft 7: draft-handrews-json-schema-01 AND draft-handrews-json-schema-validation-01 http://json-schema.org/specification.html TODAY: Draft 7 and Draft 8
  27. Key Concepts

    The “instance”: the JSON document which is being validated or described by a JSON Schema.
    The “schema”: the JSON Schema document. A schema must be an Object or a Boolean.
    Constraints based:
    • The empty schema and `true` are equivalent: any valid JSON document passes validation.
    • `false` and “not: empty schema” are equivalent: any valid JSON document fails validation.
    https://tools.ietf.org/html/draft-handrews-json-schema-01
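The equivalences on this slide can be checked with a few lines of Python. This is a hand-rolled sketch, not a conforming validator: the function name `validate` is an assumption for illustration, and it only understands boolean schemas, the empty schema, and a lone `not`.

```python
def validate(schema, instance):
    """Toy validator: boolean schemas, the empty schema, and a lone "not"."""
    if schema is True:
        return True                  # `true` accepts every instance
    if schema is False:
        return False                 # `false` rejects every instance
    if isinstance(schema, dict):
        if not schema:
            return True              # empty schema: no constraints at all
        if set(schema) == {"not"}:
            return not validate(schema["not"], instance)
    raise NotImplementedError("keyword not covered by this sketch")


instances = [None, 42, "text", [1, 2], {"a": 1}]

# {} and true accept everything; false and {"not": {}} reject everything.
all_pass = all(validate({}, i) and validate(True, i) for i in instances)
all_fail = not any(validate(False, i) or validate({"not": {}}, i) for i in instances)
```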
  28. Key Concepts

    Schema “keywords”: object properties that are applied to the instance.
    Keywords fall under one or both of two categories (mostly):
    • Assertions: produce a boolean result when applied to an instance
    • Annotations: attach information to an instance for application use
    Root schema: the schema that is the whole JSON document.
    Subschemas: a schema as a value of an object or array. Some keywords take a schema as their value.
    https://tools.ietf.org/html/draft-handrews-json-schema-01
  29. Key Concepts - Validation

    Validation?
    “JSON Schema validation applies schemas to locations within the instance, and asserts constraints on the structure of the data at each location. An instance location that satisfies all asserted constraints is then annotated with any keywords that contain non-assertion information, such as descriptive metadata and usage hints. If all locations within the instance satisfy all asserted constraints, then the instance is said to be valid against the schema.”
    • Applicability: determining which schemas are applied to which instances. “Validation begins by applying the root schema to the complete instance document.”
    • Assertions: statement of fact in terms of valid or not. “Each assertion adds constraints that an instance must satisfy in order to successfully validate.”
    • Annotation: labels or other metadata which apply to the instance data based on assertions.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  30. JSON Primitive Types

    An instance has one of six primitive types, and a range of possible values depending on the type:
    • null: a JSON "null" value
    • boolean: a JSON "true" or "false" value
    • object: an unordered set of properties mapping a string to an instance
    • array: an ordered list of instances
    • number: an arbitrary-precision, base-10 decimal number value
    • string: a string of Unicode code points
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  31. Validation Keywords

    Validation - Keywords for…
    • Any Instance Type
    • Numeric Instances (number and integer)
    • Strings
    • Arrays
    • Objects
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  32. Validation – Keywords for Any Instance Type – “type”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    The value of `type` may be a String or an array of unique Strings. The String values must be one of "null", "boolean", "object", "array", "number", "string", or "integer".
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
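The rule for `type` can be sketched in Python as follows. The helper names `json_type_matches` and `check_type` are assumptions made for this illustration, and the mapping from Python values to JSON types is one reasonable choice rather than the behaviour of any particular validator. Following draft 7, "integer" matches any number with a zero fractional part.

```python
def json_type_matches(instance, type_name):
    """Return True if a parsed JSON value has the named JSON Schema type."""
    if type_name == "null":
        return instance is None
    if type_name == "boolean":
        return isinstance(instance, bool)
    if type_name == "object":
        return isinstance(instance, dict)
    if type_name == "array":
        return isinstance(instance, list)
    if type_name == "string":
        return isinstance(instance, str)
    # bool is a subclass of int in Python, so exclude it from the numeric types.
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if type_name == "number":
        return is_number
    if type_name == "integer":
        return is_number and (isinstance(instance, int) or instance.is_integer())
    raise ValueError(f"unknown type name: {type_name!r}")


def check_type(schema, instance):
    """Apply only the `type` keyword: a single name or an array of names."""
    if "type" not in schema:
        return True
    declared = schema["type"]
    names = [declared] if isinstance(declared, str) else declared
    return any(json_type_matches(instance, name) for name in names)
```

With an array value such as `["string", "null"]`, the instance only needs to match one of the listed types.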
  33. Validation – Keywords for Numeric Instances – Ranges

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
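A minimal Python sketch of the four range keywords, assuming the draft 7 form in which `exclusiveMinimum` and `exclusiveMaximum` take a number (not a boolean). `check_range` is a hypothetical helper name, and non-numeric instances are simply ignored, as range keywords do not apply to them.

```python
def check_range(schema, instance):
    """Apply minimum / maximum / exclusiveMinimum / exclusiveMaximum only."""
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if not is_number:
        return True  # range keywords only constrain numeric instances
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    if "exclusiveMinimum" in schema and instance <= schema["exclusiveMinimum"]:
        return False
    if "exclusiveMaximum" in schema and instance >= schema["exclusiveMaximum"]:
        return False
    return True


# Example schema (an assumption for illustration): 0 <= value < 16
child_age = {"minimum": 0, "exclusiveMaximum": 16}
```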
  34. Validation – Keywords for String Instances – “pattern”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    The `pattern` value should be a regex. The regex is not anchored!
    Regex for “does not include ‘nice’”
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
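The "not anchored" point is easy to demonstrate with Python's `re.search`, which matches anywhere in the string. This sketch uses Python's regex dialect as a stand-in for the ECMA 262 dialect that JSON Schema recommends, and the negative-lookahead pattern below is one possible way to express "does not include 'nice'", not necessarily the regex shown on the slide.

```python
import re


def check_pattern(schema, instance):
    """Apply only the `pattern` keyword; the regex may match anywhere."""
    if not isinstance(instance, str) or "pattern" not in schema:
        return True
    return re.search(schema["pattern"], instance) is not None


unanchored = {"pattern": "nice"}          # matches if "nice" appears anywhere
anchored = {"pattern": "^nice$"}          # matches only the exact string "nice"
excludes_nice = {"pattern": "^(?!.*nice)"}  # assumed regex: no "nice" anywhere
```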
  35. Validation – Keywords for Array Instances – “items”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    The value of `items` can be a schema or an array of schemas.
    • If the value is a schema, that schema is applicable to each instance in the array.
    • If the value is an array of schemas, each schema is applicable to the instance at the same location in the array.
    You usually only want a schema as opposed to an array of schemas.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
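The two forms of `items` can be sketched in Python like this. The toy `validate` function is an assumption for illustration and understands only `type` (for three type names) and `items`; boolean subschemas and `additionalItems` are left out.

```python
PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def validate(schema, instance):
    """Toy validator covering `type` (three names only) and `items`."""
    if "type" in schema:
        # bool is a subclass of int in Python; keep it out of "number".
        if isinstance(instance, bool) and schema["type"] != "boolean":
            return False
        if not isinstance(instance, PYTHON_TYPES[schema["type"]]):
            return False
    items = schema.get("items")
    if isinstance(instance, list) and items is not None:
        if isinstance(items, dict):
            # Single schema: applied to every element of the array.
            return all(validate(items, value) for value in instance)
        # Array of schemas: applied position by position; extra elements
        # are unconstrained by `items` itself.
        return all(validate(sub, value) for sub, value in zip(items, instance))
    return True


list_of_strings = {"items": {"type": "string"}}
string_then_number = {"items": [{"type": "string"}, {"type": "number"}]}
```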
  36. Validation – Keywords for Array Instances – “uniqueItems”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    Must be a Boolean. With a value of `true`, asserts true if all items in the array are unique. A value of `false` always asserts true, which is the same as omitting the keyword.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
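A hedged Python sketch of `uniqueItems`. The helper names are assumptions, and `json_equal` is a simplified notion of JSON equality written for this example: it keeps booleans distinct from numbers, since Python would otherwise treat `True` and `1` as equal.

```python
def json_equal(a, b):
    """Simplified JSON equality: booleans never equal numbers."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def check_unique_items(schema, instance):
    """Apply only `uniqueItems`; false or absent always passes."""
    if not isinstance(instance, list) or not schema.get("uniqueItems", False):
        return True
    return not any(
        json_equal(instance[i], instance[j])
        for i in range(len(instance))
        for j in range(i + 1, len(instance))
    )
```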
  37. Validation – Keywords for Object Instances – “properties”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    The value of `properties` must be an object. The values of this object must be a JSON Schema. That JSON Schema is APPLIED to the child instance (or value) for the corresponding key in the instance object.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  38. Validation Keyword `properties`

    The value for `age` is a JSON Schema, but it’s only applicable to the instance object FOR the matching key IF it exists.
    `properties` defines how child instances are validated, and not the immediate instance. It’s an APPLICATOR keyword.
    The values of this object must be a JSON Schema. That JSON Schema is APPLIED to the child instance (value) for the corresponding key in the instance object.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
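The "only if the key exists" behaviour of `properties` can be sketched in Python. The toy `validate` function below is an assumption for illustration and handles only `minimum` and `properties`; the `age` schema is a made-up example in the spirit of the slide.

```python
def validate(schema, instance):
    """Toy validator covering only `minimum` and `properties`."""
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "minimum" in schema and is_number and instance < schema["minimum"]:
        return False
    if "properties" in schema and isinstance(instance, dict):
        for key, subschema in schema["properties"].items():
            # The subschema is applied to the child value only when the
            # key is present; `properties` never requires the key.
            if key in instance and not validate(subschema, instance[key]):
                return False
    return True


person = {"properties": {"age": {"minimum": 0}}}
```

Note that an object without `age`, and even a non-object instance, passes: `properties` constrains child values, not the presence of keys or the type of the parent.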
  39. Validation – Keywords for Object Instances – “required”

    Validation Keyword Examples
    (Example panels on the slide: Schema, Valid instance, Invalid instance.)
    The value of `required` must be an array. Validation is successful if every item in the array is also a key in the instance object.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
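As a minimal Python sketch (the helper name `check_required` is an assumption), `required` only checks that the listed keys are present; it says nothing about their values.

```python
def check_required(schema, instance):
    """Apply only `required`: every listed name must be a key of the object."""
    if not isinstance(instance, dict):
        return True  # `required` only constrains object instances
    return all(name in instance for name in schema.get("required", []))


needs_name = {"required": ["name"]}
```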
  40. Annotation – “title” and “description”

    Annotation Keyword Examples
    (Example panels on the slide: Schema, Schema with annotations.)
    The value of `title` and `description` must be a string. They can both be used to “decorate” a user interface or documentation generated from the schema.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  41. Let’s make a JSON Schema!

  42. …One more thing

    “$schema” keyword: identifies the version of JSON Schema being used, and the location of the meta-schema.
    “$id” keyword: the unique identifier for the schema, and the base URI for reference resolving (more on that later).
  43. Let’s make a JSON Schema!

    Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links!
    A JSON Schema like structure (in YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml
    Let’s make it a JSON Schema! You will need:
    • A YAML to JSON to YAML converter: https://www.json2yaml.com
    • A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net
    • Core and Validation spec documents: http://json-schema.org/specification.html
    • Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema
    • Example data (included in first link)
    • The $schema and $id to start (included in first link)
    You may find it easier or faster to write in YAML.
  44. None
  45. In YAML – because it’s easier to read!

    • "pattern": "^\\w+:\\w+$" in JSON. Backslashes in strings have to be escaped!
    • Semantic version with build metadata.
    • Must be an object.
    • “id” is required, but label is not. (Not specified this way, but could be the case.)
    • “description” is an annotation field. | (pipe) allows for multi-line text in YAML. Newlines are replaced in the conversion to JSON with “\n”.
    • “examples” is an annotation keyword, which must be an array, but there are no restrictions on the values of that array.
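The escaping point can be seen by serializing the pattern with Python's standard `json` module. The regex is the one from the slide; the sample identifier `HP:0001250` is only an illustrative value chosen for this sketch.

```python
import json
import re

# The regex itself contains single backslashes: ^\w+:\w+$
schema = {"pattern": r"^\w+:\w+$"}

# Serializing to JSON doubles each backslash, giving "^\\w+:\\w+$" in the file.
serialized = json.dumps(schema)

# Parsing the JSON text restores the original single-backslash regex.
round_tripped = json.loads(serialized)

# Illustrative instance value (an assumption, not taken from the slides).
matches = re.search(round_tripped["pattern"], "HP:0001250") is not None
```

YAML plain scalars do not need this escaping, which is one reason the same pattern looks simpler in the YAML source.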
  46. Questions?

  47. Validation Keywords

    Validation - Keywords for…
    • Any Instance Type
    • Numeric Instances (number and integer)
    • Strings
    • Arrays
    • Objects
    Application - Keywords for…
    • Applying Subschemas With Boolean Logic
    • Applying Subschemas Conditionally
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  48. Application Keyword Examples

    • Let’s take our previous simple schema
    • Add date of birth
    • Always require “name”
    • Require age OR date of birth
    (Example panel on the slide: Schema.)
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  49. Application keywords – “oneOf”, “allOf”, “anyOf”

    Application Keyword Examples
    Taking our previous “name” and “age” example…
    • The previous schema didn’t make either value required, just specified their type if included.
    • “name” is now required.
    • “oneOf” must be an Array, where each value must be a schema. Validation is successful if exactly one of the schemas in the array validates successfully against the instance.
    • What if we want “age” AND / OR “dateOfBirth”? “anyOf” is similar, but “at least one” as opposed to “exactly one”.
    Subschemas applied with boolean logic!
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
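The difference between "exactly one" and "at least one" can be sketched in Python. The toy `validate` function is an assumption for illustration and handles only `required`, `allOf`, `anyOf` and `oneOf`; the two schemas are reconstructions of the idea on the slide, not the slide's own code.

```python
def validate(schema, instance):
    """Toy validator covering `required` and the boolean-logic applicators."""
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
    if "allOf" in schema and not all(validate(s, instance) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(validate(s, instance) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema:
        # Exactly one subschema must validate successfully.
        if sum(validate(s, instance) for s in schema["oneOf"]) != 1:
            return False
    return True


age_or_dob = [{"required": ["age"]}, {"required": ["dateOfBirth"]}]
exactly_one = {"required": ["name"], "oneOf": age_or_dob}
at_least_one = {"required": ["name"], "anyOf": age_or_dob}

both = {"name": "Ada", "age": 30, "dateOfBirth": "1989-01-01"}
```

An instance carrying both `age` and `dateOfBirth` fails the `oneOf` version but passes the `anyOf` version.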
  50. Application keywords – “if”, “then”, “else”

    Application Keyword Examples
    Conditional applicability
    • The value of these keywords must be a schema. These schemas are “subschemas”.
    • If the schema from “if” validates successfully, the “then” schema is applied to the instance.
    • If the schema from “if” fails validation, the “else” schema is applied to the instance.
    Let’s try: “If age is less than 16, guardianName is required”
    Can anyone spot why this schema won’t do what you might expect?
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  51. Application keywords – “if”, “then”, “else”

    Application Keyword Examples
    • The value of these keywords must be a schema. These schemas are “subschemas”.
    • The value of “if” is a “valid” schema, but imposes no constraints, because “age” is not a JSON Schema keyword.
    • Remember: no constraints is equivalent to an empty schema “{ }” or `true`, meaning validation passes.
    • “age” must be wrapped in a “properties” keyword in order for its value to be applied to the instance, which then generates an assertion on pass or fail for validation.
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
  52. Application keywords – “if”, “then”, “else”

    Application Keyword Examples
    • The value of these keywords must be a schema. These schemas are “subschemas”.
    • The value of “if” is now a subschema which has constraints! Fixed!
    • A common and easy error to make. It comes up 2-3 times a week on the JSON Schema Slack or StackOverflow.
    • (Additionally, it should be “exclusiveMaximum”)
    https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
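The mistake and its fix can be reproduced with a small Python sketch. The toy `validate` function is an assumption for illustration: it handles only `exclusiveMaximum`, `required`, `properties` and `if`/`then`/`else`, and, like a real validator, silently ignores keywords it does not know (such as a bare `age`). The two schemas are reconstructions of the idea described on the slides.

```python
def validate(schema, instance):
    """Toy validator: exclusiveMaximum, required, properties, if/then/else."""
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "exclusiveMaximum" in schema and is_number:
        if not instance < schema["exclusiveMaximum"]:
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not validate(subschema, instance[key]):
                return False
    if "if" in schema:
        branch = "then" if validate(schema["if"], instance) else "else"
        if branch in schema and not validate(schema[branch], instance):
            return False
    return True


# Broken: "age" is not a keyword, so "if" has no constraints and always passes.
broken = {
    "if": {"age": {"exclusiveMaximum": 16}},
    "then": {"required": ["guardianName"]},
}

# Fixed: "age" is wrapped in "properties", so its subschema is actually applied.
fixed = {
    "if": {"properties": {"age": {"exclusiveMaximum": 16}}},
    "then": {"required": ["guardianName"]},
}
```

With the broken schema, an adult without `guardianName` is rejected because `then` always applies. One further subtlety of the fixed version: an object with no `age` at all still passes the `if`, since `properties` does not require the key, so adding `"required": ["age"]` inside `if` may be what is actually wanted.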
  53. Schema Reuse – “definitions” and “$ref”

    “definitions” provides a place to put and reference reusable parts of a JSON Schema document. The value of “definitions” must be an object, where each value must be a schema.
    “An object schema with a “$ref” property must be interpreted as a “$ref” reference.” *
    The value of “$ref” must be a URI Reference. The referenced schema is applied to the instance.
    Other properties in the object schema must be ignored.
    https://tools.ietf.org/html/draft-handrews-json-schema-01
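A minimal Python sketch of same-document references, assuming every "$ref" is a JSON Pointer fragment of the form `#/definitions/...` that walks through objects only. The helper names and the example schema are assumptions for illustration; real implementations resolve full URI References.

```python
def resolve_local_ref(root, ref):
    """Resolve a "#/a/b" style reference against the root schema document."""
    if not ref.startswith("#/"):
        raise ValueError("this sketch only handles same-document '#/...' references")
    node = root
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: "~1" means "/", "~0" means "~".
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def validate(schema, instance, root):
    """Toy validator covering `$ref`, `type: string` and `properties`."""
    if "$ref" in schema:
        # Draft 7: other properties next to "$ref" are ignored.
        return validate(resolve_local_ref(root, schema["$ref"]), instance, root)
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not validate(subschema, instance[key], root):
                return False
    return True


root_schema = {
    "definitions": {"personName": {"type": "string"}},
    "properties": {
        "name": {"$ref": "#/definitions/personName"},
        "guardianName": {"$ref": "#/definitions/personName"},
    },
}
```

Both `name` and `guardianName` reuse the single `personName` definition, so a change there applies to both.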
  54. Schema Reuse – “$ref” and “$id” – URI resolution – RFC 3986

    • A reference of “#item” resolves against the base URI of this document to: http://example.net/root.json#item
    • The schema for “single” identifies as “#item”, and so the reference can be resolved within the same document.
    • Think of a subschema’s use of a relative “$id” as similar to giving an HTML element an “id” and creating a link.
    • The reference to “other.json” resolves against the base URI of this document to: http://example.net/other.json
    • Not defined in this document. Users may preload other schemas into implementations or allow implementations to take network actions to resolve referenced schemas.
    The use of “$id” in subschemas could change the base URI of URI resolution. Therefore it is not advised unless you know what you’re doing and why. This gets complex. Please see: https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8 and RFC 3986
    https://tools.ietf.org/html/draft-handrews-json-schema-01
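The resolution steps above follow RFC 3986, which Python's standard `urllib.parse.urljoin` implements. This sketch reuses the base URI and references from the slide; the `folder/` and `item.json` values in the last step are hypothetical, added only to show how a subschema "$id" can shift the base URI.

```python
from urllib.parse import urljoin

# Base URI of the document, taken from its root "$id" on the slide.
base = "http://example.net/root.json"

# A fragment-only reference stays within the same document.
item_uri = urljoin(base, "#item")

# A relative reference replaces the last path segment of the base.
other_uri = urljoin(base, "other.json")

# Hypothetical: a subschema "$id" of "folder/" changes the base URI,
# so later relative references inside it resolve somewhere else.
nested_base = urljoin(base, "folder/")
nested_ref = urljoin(nested_base, "item.json")
```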
  55. Let’s make a JSON Schema! …again!

  56. Let’s make a JSON Schema!

    Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links!
    A JSON Schema like structure (in YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml
    Let’s make it a JSON Schema! You will need:
    • A YAML to JSON to YAML converter: https://www.json2yaml.com
    • A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net
    • Core and Validation spec documents: http://json-schema.org/specification.html
    • Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema
    You may find it easier or faster to write in YAML.
  57. Questions?

  58. Recap

    • Who uses JSON Schema?
    • Case studies on the uses of JSON Schema
    • IETF and JSON Schema draft versions
    • Key concepts
    • Basic JSON Schema - Validation and Annotation
    • Let’s build a JSON Schema – Interactive. Laptops required!
    • Advanced JSON Schema – Application and Referencing
    • Let’s build a JSON Schema again!
    • Questions and schema troubleshooting
  59. Moving Forward Draft 8

  60. But can JSON Schema…

    “Validate an ontology term based on the supplied ontology identifier?”
    Well… no. JSON Schema doesn’t prohibit you adding your own keywords though…
    “OK, so I’ll create a new one to do this!”
    https://github.com/elixir-europe/json-schema-validator
    • As an npm package or as a standalone server.
    • “This validator has three custom keywords implemented, `graph_restriction`, `isChildTermOf` and `isValidTerm`.”
    • Uses EBI Ontology Lookup Service.
    • Used by Human Cell Atlas and others!
    https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md
  61. But can JSON Schema…

    “Validate an ontology term based on the supplied ontology identifier?”
    If you add to the keyword vocabulary. But you’ll have to tell people in advance!
    Enter: “$vocabularies”
    • Take a schema and extend it by creating a new meta-schema.
    • The new meta-schema defines the “$vocabularies” it uses.
    • Your new schemas use a “$schema” value equal to the “$id” of your new meta-schema.
    This allows implementations to dynamically load libraries that support additional sets of keywords, if they are provided and identified.
    https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md
  62. Still lots more!

  63. Acknowledgements

  64. Thank you!

    Ben Hutton
    Wellcome Sanger Institute
    JSON Schema core
    Github: relequestual
    Twitter: @relequestual
    http://json-schema.org (includes link to join Slack)
    Additional Sponsors: http://bit.ly/json-schema-work