Slide 1

Slide 1 text

JSON Schema Workshop Validation and annotation of JSON documents 2018/04/29 Ben Hutton – Senior Web Developer

Slide 2

Slide 2 text

JSON Schema A vocabulary that allows you to annotate and validate JSON documents

Slide 3

Slide 3 text

Overview • Who uses JSON Schema? • Case studies on the uses of JSON Schema • IETF and JSON Schema draft versions • Key concepts • Basic JSON Schema - Validation and Annotation • Let’s build a JSON Schema – Interactive. Laptops required! • Advanced JSON Schema – Application and Referencing • Let’s build a JSON Schema again! • Questions and schema troubleshooting

Slide 4

Slide 4 text

JSON Schema for GA4GH Search API request format

Slide 5

Slide 5 text

JSON Schema?

Slide 6

Slide 6 text

JSON Schema?

Slide 7

Slide 7 text

JSON Schema! 15,000,000+ Weekly downloads* from Node Package Manager in 2019 so far * Weekly downloads of the package “ajv”, a JSON Schema Validator, as recorded by npm

Slide 8

Slide 8 text

“What for?”

Slide 9

Slide 9 text

It’s good to be validated! (Case Studies)

Slide 10

Slide 10 text

Amazon API Gateway Data structure of a payload Request validation Generate an SDK Amazon Web Services https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html

Slide 11

Slide 11 text

Amazon API Gateway “API Gateway can perform the basic validation. This enables you, the API developer, to focus on app-specific deep validation in the backend. For the basic validation, API Gateway verifies either or both of the following conditions: The required request parameters in the URI, query string, and headers of an incoming request are included and non-blank. The applicable request payload adheres to the configured JSON schema request model of the method.” OpenAPI Specification payload definitions “Currently, API Gateway supports generating an SDK for an API in Java, JavaScript, Java for Android, Objective-C or Swift for iOS, and Ruby.” https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html

Slide 12

Slide 12 text

Gov.uk publishing system Check data for publishing is valid Contract testing between layers Collaboration across departments and teams Documentation generation Government Digital Service https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

Slide 13

Slide 13 text

GDS Publishing Platform Multiple applications Different teams Frequent changes Pull Requests require working implementations on multiple fronts before merging Examples normally required https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

Slide 14

Slide 14 text

Documentation https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

Slide 15

Slide 15 text

Example payloads https://technology.blog.gov.uk/2015/01/07/validating-a-distributed-architecture-with-json-schema/ https://github.com/alphagov/govuk-content-schemas

Slide 16

Slide 16 text

Metadata Ingestion Collaborative definitions Generate spreadsheets for users (and convert back to JSON) Documentation generation Validate user submissions Human Cell Atlas https://github.com/HumanCellAtlas/metadata-schema https://prod.data.humancellatlas.org/metadata/design-principles/structure

Slide 17

Slide 17 text

Five major entities Used together to form an experiment (project) All validated using JSON Schema https://github.com/HumanCellAtlas/metadata-schema https://prod.data.humancellatlas.org/metadata/design-principles/structure Images from presentation by Mallory Freeberg @ EBI

Slide 18

Slide 18 text

JSON documents are self described Schemas are semantically versioned following clear major.minor.patch rules Agile and able to adapt to changes JSON Schema extension for validation of ontology terms Developed governance model to manage modifications https://github.com/HumanCellAtlas/metadata-schema https://prod.data.humancellatlas.org/metadata/design-principles/structure Images from presentation by Mallory Freeberg @ EBI

Slide 19

Slide 19 text

https://github.com/HumanCellAtlas/metadata-schema https://prod.data.humancellatlas.org/metadata/design-principles/structure Images from presentation by Mallory Freeberg @ EBI Documentation

Slide 20

Slide 20 text

API Documentation API Testing Cloud Platform as a Service provider https://blog.heroku.com/json-schema-document-debug-apis

Slide 21

Slide 21 text

“When we at Heroku started testing our API with committee it immediately uncovered some inconsistencies between the existing JSON Schema files and what various API endpoints actually returned – and it turns out to be a gift that keeps giving.” Jessie Young, Heroku https://blog.heroku.com/json-schema-document-debug-apis

Slide 22

Slide 22 text

Data Dictionary (models) and validation Firefox telemetry format MDN web docs data Form generation Cloud Deployment Management Google API Discovery Service Experience platform Form generation Validating tests Database level validation

Slide 23

Slide 23 text

JSON Schema A vocabulary that allows you to annotate and validate JSON documents (and a few other things too!)

Slide 24

Slide 24 text

JSON Schema Fundamentals

Slide 25

Slide 25 text

Key Concepts Validation and annotation Let’s make a JSON Schema! Application keywords Referencing Let’s make a JSON Schema! …Again! Not covering JSON Hyper Schema JSON Schema the specification

Slide 26

Slide 26 text

Draft version? Core Validation <= Draft 4 >= Draft 5 Draft 7 IETF draft document Personal drafts Draft 7: draft-handrews-json-schema-01 AND draft-handrews-json-schema-validation-01 http://json-schema.org/specification.html TODAY: Draft 7 and Draft 8

Slide 27

Slide 27 text

Key Concepts The “instance”: the JSON document which is being validated or described by a JSON Schema. The “schema”: the JSON Schema document. A schema must be an Object or a Boolean. Schemas are constraints based: an empty schema and `true` are equivalent (any valid JSON document passes validation); `false` and `{"not": {}}` (“not” applied to an empty schema) are equivalent (any valid JSON document fails validation). https://tools.ietf.org/html/draft-handrews-json-schema-01
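
As a minimal sketch of the boolean-schema rule (plain Python, not a real validator; the `validate` helper is hypothetical and deliberately handles only the four trivial schemas named on this slide):

```python
def validate(schema, instance):
    """Evaluate only the four trivial schemas: true, {}, false and {"not": {}}."""
    if schema is True or schema == {}:
        return True   # no constraints, so every instance passes
    if schema is False:
        return False  # nothing can pass
    if isinstance(schema, dict) and set(schema) == {"not"}:
        return not validate(schema["not"], instance)
    raise NotImplementedError("this sketch only covers trivial schemas")


instances = [None, True, 42, "text", [1, 2], {"key": "value"}]
always_pass = [validate(True, i) and validate({}, i) for i in instances]
always_fail = [validate(False, i) or validate({"not": {}}, i) for i in instances]
```

Every entry in `always_pass` is true and every entry in `always_fail` is false, whatever the instance is.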

Slide 28

Slide 28 text

Schema “keywords” : Object properties that are applied to the instance Keywords fall under one or both of two categories (mostly): Assertions : produce a boolean result when applied to an instance Annotations : attach information to an instance for application use Root Schema : Schema that is the whole JSON document Subschemas : A schema as a value of an object or array Some keywords take a schema as their value Key Concepts https://tools.ietf.org/html/draft-handrews-json-schema-01

Slide 29

Slide 29 text

Validation? “JSON Schema validation applies schemas to locations within the instance, and asserts constraints on the structure of the data at each location. An instance location that satisfies all asserted constraints is then annotated with any keywords that contain non-assertion information, such as descriptive metadata and usage hints. If all locations within the instance satisfy all asserted constraints, then the instance is said to be valid against the schema.” Applicability : Determining which schema are applied to which instances - “Validation begins by applying the root schema to the complete instance document.” Assertions : Statement of fact in terms of valid or not. – “Each assertion adds constraints that an instance must satisfy in order to successfully validate.” Annotation : Labels or other metadata which apply to the instance data based on assertions. Key Concepts - Validation https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 30

Slide 30 text

An instance has one of six primitive types, and a range of possible values depending on the type: null : A JSON "null" value boolean : A JSON "true" or "false" value object : An unordered set of properties mapping a string to an instance array : An ordered list of instances number : An arbitrary-precision, base-10 decimal number value string : A string of Unicode code points JSON Primitive Types https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 31

Slide 31 text

Validation - Keywords for… • Any Instance Type • Numeric Instances (number and integer) • Strings • Arrays • Objects Validation Keywords https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 32

Slide 32 text

Validation – Keywords for Any Instance Type – “type” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance The value of `type` may be a String or an array of unique Strings. The String values must be one of "null", "boolean", "object", "array", "number", "string", or "integer".
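
The example images from this slide are not in the text, so here is a hedged stand-in: a small Python sketch of how `type` could be checked against parsed JSON. The helper names `matches_type` and `check_type` are made up for illustration and this is not how any particular validator implements it.

```python
def matches_type(instance, name):
    """Check one JSON Schema type name against a parsed JSON value."""
    # bool is a subclass of int in Python, so exclude it from the numeric types.
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if name == "null":
        return instance is None
    if name == "boolean":
        return isinstance(instance, bool)
    if name == "object":
        return isinstance(instance, dict)
    if name == "array":
        return isinstance(instance, list)
    if name == "string":
        return isinstance(instance, str)
    if name == "number":
        return is_number
    if name == "integer":
        # Draft 7 treats any number with a zero fractional part as an integer.
        return is_number and (isinstance(instance, int) or instance.is_integer())
    raise ValueError(f"unknown type name: {name}")


def check_type(schema, instance):
    allowed = schema["type"]
    if isinstance(allowed, str):
        allowed = [allowed]  # a single string behaves like a one-item array
    return any(matches_type(instance, name) for name in allowed)


schema = {"type": ["string", "null"]}
valid_instances = ["hello", None]
invalid_instances = [42, ["hello"]]
```

With this schema, `"hello"` and `null` are valid, while `42` and `["hello"]` are not.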

Slide 33

Slide 33 text

Validation – Keywords for Numeric Instances – Ranges Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`
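
A rough sketch of the four range keywords, assuming the Draft 7 form in which the exclusive keywords take a number (in Draft 4 they were booleans that modified `minimum` and `maximum`). The `check_range` helper and the example schema are illustrative, not taken from the slide.

```python
def check_range(schema, instance):
    """Apply minimum / maximum / exclusiveMinimum / exclusiveMaximum to one value."""
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if not is_number:
        return True  # range keywords only constrain numbers
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    if "exclusiveMinimum" in schema and instance <= schema["exclusiveMinimum"]:
        return False
    if "exclusiveMaximum" in schema and instance >= schema["exclusiveMaximum"]:
        return False
    return True


schema = {"minimum": 0, "exclusiveMaximum": 16}
valid_instances = [0, 15.5]
invalid_instances = [-1, 16]
```

Note that `0` passes because `minimum` is inclusive, while `16` fails because `exclusiveMaximum` is not.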

Slide 34

Slide 34 text

Validation – Keywords for String Instances – “pattern” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance `pattern` value should be a regex. The regex is not anchored! Regex for “does not include ‘nice’”
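
To illustrate the "not anchored" point, here is a sketch using Python's `re.search` as a stand-in for the ECMA 262 regular expressions that JSON Schema recommends (the two dialects agree on these simple patterns). The slide's own regex is not in the text, so the negative lookahead below is one possible way to express "does not include 'nice'", not necessarily the one shown.

```python
import re


def check_pattern(schema, instance):
    """An unanchored match: the regex may match anywhere in the string."""
    if not isinstance(instance, str):
        return True  # pattern only constrains strings
    return re.search(schema["pattern"], instance) is not None


contains_nice = {"pattern": "nice"}
exactly_nice = {"pattern": "^nice$"}
does_not_include_nice = {"pattern": "^(?!.*nice)"}

sentence = "have a nice day"
```

`contains_nice` accepts the sentence because the match can start anywhere; `exactly_nice` rejects it because of the explicit `^` and `$` anchors; `does_not_include_nice` rejects it and accepts `"have a good day"`.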

Slide 35

Slide 35 text

Validation – Keywords for Array Instances – “items” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance The value of `items` can be a schema or an array of schemas. If the value is a schema, that schema is applicable to each instance in the array. If the value is an array of schemas, each schema is applicable to the instance at the same location in the array. You usually only want a schema as opposed to an array of schemas.
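
A hedged sketch of the two forms of `items`. The `validate` function is a toy that understands only `type` (as a single string) and `items`; the schemas are invented for the example.

```python
PYTHON_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def validate(schema, instance):
    """Toy validator: supports only `type` (single string) and `items`."""
    if "type" in schema:
        if isinstance(instance, bool) or not isinstance(instance, PYTHON_TYPES[schema["type"]]):
            return False
    if "items" in schema and isinstance(instance, list):
        items = schema["items"]
        if isinstance(items, dict):
            # One schema: applied to every element of the array.
            return all(validate(items, element) for element in instance)
        # Array of schemas: each applies to the element at the same position.
        return all(validate(sub, element) for sub, element in zip(items, instance))
    return True


list_schema = {"type": "array", "items": {"type": "number"}}
tuple_schema = {"type": "array", "items": [{"type": "string"}, {"type": "number"}]}
```

With `list_schema`, `[1, 2, 3]` is valid and `[1, "two"]` is not. With `tuple_schema`, `["a", 1]` is valid, `[1, "a"]` is not, and extra elements such as the third item in `["a", 1, {}]` are left unconstrained by `items` alone.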

Slide 36

Slide 36 text

Validation – Keywords for Array Instances – “uniqueItems” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance Must be a Boolean. With a value of `true`, validation succeeds only if all items in the array are unique. A value of `false` always asserts true, which is the same as omitting the keyword.
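
A small sketch of `uniqueItems`, under the assumption that uniqueness means JSON equality (so `1` and `true` are different values even though Python treats them as equal). Both helper names are hypothetical.

```python
def json_equal(a, b):
    """Compare two parsed JSON values without letting True == 1 slip through."""
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b
    if type(a) is not type(b):
        return False
    if isinstance(a, list):
        return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
    return a == b


def check_unique_items(schema, instance):
    if not schema.get("uniqueItems", False) or not isinstance(instance, list):
        return True  # false (or omitted) never fails
    return not any(
        json_equal(a, b)
        for index, a in enumerate(instance)
        for b in instance[index + 1:]
    )


schema = {"uniqueItems": True}
```

Here `[1, 2, 3]` is valid, while `[1, 2, 1]` and `[{"a": 1}, {"a": 1}]` are not.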

Slide 37

Slide 37 text

Validation – Keywords for Object Instances – “properties” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance The value of `properties` must be an object. The values of this object must be a JSON Schema. That JSON Schema is APPLIED to the child instance (or value) for the corresponding key in the instance object. ?

Slide 38

Slide 38 text

Validation Keyword `properties` https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 The value for `age` is a JSON Schema, but it’s only applicable to the instance object FOR the matching key IF it exists. `properties` defines how child instances are validated, and not the immediate instance. It’s an APPLICATOR keyword. Each value of this object must be a JSON Schema. That JSON Schema is APPLIED to the child instance (value) for the corresponding key in the instance object.
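
The "only IF the key exists" behaviour can be sketched as follows. This toy `validate` supports just `type` (three names) and `properties`; the `name` and `age` schema is reconstructed from the slide's description rather than copied from it.

```python
def validate(schema, instance):
    """Toy validator: `type` (string, integer, object) and `properties` only."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(instance, str):
        return False
    if expected == "integer" and (isinstance(instance, bool) or not isinstance(instance, int)):
        return False
    if expected == "object" and not isinstance(instance, dict):
        return False
    if "properties" in schema and isinstance(instance, dict):
        for key, subschema in schema["properties"].items():
            # The subschema is applied to the child value, and only if the key is present.
            if key in instance and not validate(subschema, instance[key]):
                return False
    return True


schema = {
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    }
}
```

`{"name": "Ben", "age": 30}`, `{"name": "Ben"}` and even `{}` are all valid: nothing here makes a key mandatory. Only a present but wrong value, such as `{"age": "thirty"}`, fails.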

Slide 39

Slide 39 text

Validation – Keywords for Object Instances – “required” Validation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema Valid instance Invalid instance The value of `required` must be an array. Validation is successful if every item in the array is also a key in the instance object.
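
A minimal sketch of `required`, assuming it checks key presence only and says nothing about the value. `check_required` is an illustrative helper, not a library function.

```python
def check_required(schema, instance):
    """Every name listed in `required` must be a key of the instance object."""
    if not isinstance(instance, dict):
        return True  # required only constrains objects
    return all(key in instance for key in schema.get("required", []))


schema = {"required": ["name"]}
```

`{"name": "Ben"}` is valid and `{"age": 30}` is not. `{"name": null}` is also valid, because the key exists; constraining the value is a job for `properties`.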

Slide 40

Slide 40 text

Annotation – “title” and “description” Annotation Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema The value of `title` and `description` must be a string. They can both be used to “decorate” a user interface or documentation generated from the schema. Schema with annotations
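
One possible way annotations could feed documentation is sketched below. The schema content and the `describe` helper are invented for illustration; real documentation generators are considerably more thorough.

```python
schema = {
    "title": "Person",
    "description": "A person attending the workshop.",
    "properties": {
        "name": {"title": "Full name", "description": "Name as it should appear on a badge."},
        "age": {"title": "Age", "description": "Age in whole years."},
    },
}


def describe(schema):
    """Turn title/description annotations into plain documentation lines."""
    lines = [f"{schema.get('title', '(untitled)')}: {schema.get('description', '')}"]
    for key, sub in schema.get("properties", {}).items():
        lines.append(f"- {key} ({sub.get('title', key)}): {sub.get('description', '')}")
    return lines


doc_lines = describe(schema)
```

Neither keyword affects whether an instance is valid; they only carry information for the application reading the schema.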

Slide 41

Slide 41 text

Let’s make a JSON Schema!

Slide 42

Slide 42 text

”$schema” keyword: Identifies the version of JSON Schema being used, and the location of the meta-schema. ”$id” keyword: The unique identifier for the schema, and the base URI for reference resolving (more on that later) …One more thing
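
A starting skeleton might look like the sketch below. The `$schema` value is the Draft 7 meta-schema URI; the `$id` is a made-up example URI, so substitute one you control.

```python
import json

schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    # Hypothetical identifier, used here only for illustration.
    "$id": "http://example.net/schemas/person.json",
    "type": "object",
}

# Serialise to the JSON text you would save in a .json file.
schema_text = json.dumps(schema, indent=2)
```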

Slide 43

Slide 43 text

Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links! A JSON Schema like structure (In YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml Let’s make it a JSON Schema! You will need: A YAML to JSON to YAML converter: https://www.json2yaml.com A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net Core and Validation spec documents: http://json-schema.org/specification.html Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema Example data (included in first link) The $schema and $id to start (Included in first link) Let’s make a JSON Schema! You may find it easier or faster to write in YAML

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

"pattern": "^\\w+:\\w+$" in JSON. Backslashes in strings have to be escaped! Semantic version with build metadata. Must be an object. “id” is required, but “label” is not. (Not specified this way, but could be the case.) “description” is an annotation field. | (pipe) allows for multi-line text in YAML. Newlines are replaced in the conversion to JSON with “\n”. In YAML, because it’s easier to read! “examples” is an annotation keyword, which must be an array, but there are no restrictions on the values of that array.

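The escaping point from slide 45 can be checked with Python's standard `json` module, as in this sketch. The test strings `ABC:123` and `no colon` are made-up values, not taken from the workshop's example data.

```python
import json
import re

# JSON text as it would appear in a file: each backslash is doubled.
json_text = r'{"pattern": "^\\w+:\\w+$"}'
schema = json.loads(json_text)
regex = schema["pattern"]  # after parsing, the value is ^\w+:\w+$


def single_backslash_is_rejected():
    """A lone backslash before 'w' is not a legal JSON escape sequence."""
    try:
        json.loads(r'{"pattern": "^\w+:\w+$"}')
    except json.JSONDecodeError:
        return True
    return False


matches = re.search(regex, "ABC:123") is not None
rejects = re.search(regex, "no colon") is None
```

In YAML the same pattern can be written unquoted or in single quotes with single backslashes, which is one reason the YAML form is easier to read.
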
Slide 46

Slide 46 text

Questions?

Slide 47

Slide 47 text

Validation - Keywords for… • Any Instance Type • Numeric Instances (number and integer) • Strings • Arrays • Objects Application - Keywords for… • Applying Subschemas With Boolean Logic • Applying Subschemas Conditionally Validation Keywords https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 48

Slide 48 text

• Let’s take our previous simple schema • Add date of birth • Always require “name” • Require age OR date of birth Application Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01 Schema

Slide 49

Slide 49 text

Application keywords -“oneOf”, “allOf”, “anyOf” Taking our previous “name” and “age” example… The previous schema didn’t make either value required, just specified their type if included. “name” is now required. “oneOf” must be an Array, where each value must be a schema. Validation is successful if exactly one of the schemas in the array validates successfully against the instance. What if we want ”age” AND / OR “dateOfBirth”? “anyOf” is similar, but “at least one” as opposed to “exactly one”. Subschemas applied with boolean logic! Application Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
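
The difference between “oneOf” and “anyOf” can be sketched like this. The toy `validate` supports only `required` and the three boolean-logic keywords, and both schemas are reconstructed from the slide's description since the original images are not in the text.

```python
def validate(schema, instance):
    """Toy validator: `required`, `allOf`, `anyOf` and `oneOf` only."""
    if "required" in schema and isinstance(instance, dict):
        if any(key not in instance for key in schema["required"]):
            return False
    if "allOf" in schema and not all(validate(s, instance) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(validate(s, instance) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema and sum(validate(s, instance) for s in schema["oneOf"]) != 1:
        return False
    return True


alternatives = [{"required": ["age"]}, {"required": ["dateOfBirth"]}]
exactly_one = {"required": ["name"], "oneOf": alternatives}
at_least_one = {"required": ["name"], "anyOf": alternatives}

age_only = {"name": "Ben", "age": 30}
both = {"name": "Ben", "age": 30, "dateOfBirth": "1989-01-01"}
neither = {"name": "Ben"}
```

`age_only` passes both schemas, `neither` fails both, and `both` is where they differ: “oneOf” rejects it because two subschemas match, while “anyOf” accepts it.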

Slide 50

Slide 50 text

Application keywords -“if”, “then”, “else” Conditional applicability The value of these keywords must be a schema. These schemas are “subschemas”. If the schema from “if” validates successfully, the “then” schema is applied to the instance. If the schema from “if” fails validation, the “else” schema is applied to the instance. Let’s try: “If age is less than 16, guardianName is required” Can anyone spot why this schema won’t do what you might expect? Application Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 51

Slide 51 text

Application keywords -“if”, “then”, “else” The value of these keywords must be a schema. These schemas are “subschemas”. The value of “if” is a “valid” schema, but imposes no constraints, because “age” is not a JSON Schema keyword. Remember: no constraints is equivalent to an empty schema “{ }” or `true`, meaning validation passes. “age” must be wrapped in a “properties” keyword in order for its value to be applied to the instance, which then generates an assertion on pass or fail for validation. Application Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01

Slide 52

Slide 52 text

Application keywords -“if”, “then”, “else” The value of these keywords must be a schema. These schemas are “subschemas”. The value of “if” is now a subschema which has constraints! Fixed! A common and easy error to make: it comes up 2-3 times a week on the JSON Schema Slack or StackOverflow. (Additionally, it should be “exclusiveMaximum”) Application Keyword Examples https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
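
The slide's schemas are not reproduced in the text, so `broken` and `fixed` below are reconstructions from the description. The toy `validate` ignores unknown keywords, as JSON Schema does, which is exactly why the broken version misbehaves.

```python
def validate(schema, instance):
    """Toy validator: exclusiveMaximum, required, properties and if/then/else."""
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "exclusiveMaximum" in schema and is_number and instance >= schema["exclusiveMaximum"]:
        return False
    if "required" in schema and isinstance(instance, dict):
        if any(key not in instance for key in schema["required"]):
            return False
    if "properties" in schema and isinstance(instance, dict):
        for key, sub in schema["properties"].items():
            if key in instance and not validate(sub, instance[key]):
                return False
    if "if" in schema:
        branch = "then" if validate(schema["if"], instance) else "else"
        if branch in schema and not validate(schema[branch], instance):
            return False
    return True  # unknown keywords, such as a bare "age", are simply ignored


guardian_rule = {"required": ["guardianName"]}
# "age" is not a keyword, so this "if" has no constraints and always passes.
broken = {"if": {"age": {"exclusiveMaximum": 16}}, "then": guardian_rule}
fixed = {"if": {"properties": {"age": {"exclusiveMaximum": 16}}}, "then": guardian_rule}

adult = {"name": "Ben", "age": 30}
child = {"name": "Sam", "age": 12}
```

With `broken`, the adult is wrongly rejected for lacking `guardianName`. With `fixed`, the adult passes and the child needs a guardian. One remaining subtlety: an instance with no `age` at all still satisfies the fixed “if”, because `properties` only applies to keys that exist; adding `"required": ["age"]` inside “if” would close that gap.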

Slide 53

Slide 53 text

“definitions” and “$ref” “definitions” provides a place to put and reference reusable parts of a JSON Schema document. The value of “definitions” must be an object, where each value must be a schema. “An object schema with a “$ref” property must be interpreted as a “$ref” reference.” * The value of “$ref” must be a URI Reference. The referenced schema is applied to the instance. Other properties in the object schema must be ignored. Schema Reuse https://tools.ietf.org/html/draft-handrews-json-schema-01
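
A hedged sketch of same-document referencing. `resolve_ref` handles only JSON Pointer fragments of the form `#/a/b` into objects (no arrays, no percent-decoding, no remote documents), and the `nonEmptyString` definition is invented for the example.

```python
def resolve_ref(root, ref):
    """Follow a '#/...' JSON Pointer fragment through nested objects."""
    if not ref.startswith("#/"):
        raise NotImplementedError("this sketch only resolves '#/...' references")
    node = root
    for token in ref[2:].split("/"):
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def validate(schema, instance, root):
    if "$ref" in schema:
        # Draft 7: keywords that sit next to $ref are ignored.
        return validate(resolve_ref(root, schema["$ref"]), instance, root)
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if "minLength" in schema and isinstance(instance, str) and len(instance) < schema["minLength"]:
        return False
    if "properties" in schema and isinstance(instance, dict):
        for key, sub in schema["properties"].items():
            if key in instance and not validate(sub, instance[key], root):
                return False
    return True


schema = {
    "definitions": {"nonEmptyString": {"type": "string", "minLength": 1}},
    "properties": {
        "name": {"$ref": "#/definitions/nonEmptyString"},
        # The sibling minLength is ignored because $ref is present.
        "guardianName": {"$ref": "#/definitions/nonEmptyString", "minLength": 100},
    },
}
```

`{"name": "Ben"}` is valid and `{"name": ""}` is not. `{"guardianName": "Al"}` is also valid, which shows the "other properties must be ignored" rule in action.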

Slide 54

Slide 54 text

“$ref” and “$id” – URI resolution – RFC 3986 A reference of “#item” resolves against the base URI of this document to: http://example.net/root.json#item The schema for “single” identifies as ”#item”, and so the reference can be resolved within the same document. Think of a subschema’s use of a relative ”$id” as similar to giving an HTML element an ”id” and creating a link. The reference to “other.json” resolves against the base URI of this document to: http://example.net/other.json Not defined in this document. Users may preload other schemas into implementations or allow implementations to take network actions to resolve referenced schemas. Schema Reuse https://tools.ietf.org/html/draft-handrews-json-schema-01 The use of “$id” in subschemas could change the base URI of URI resolution. Therefore it is not advised unless you know what you’re doing and why. This gets complex. Please see: https://tools.ietf.org/html/draft-handrews-json-schema-01#section-8 and RFC 3986
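
The resolution described on this slide can be reproduced with Python's `urllib.parse.urljoin`, which resolves a relative reference against a base URI. The `folder/` subschema “$id” in the last step is a hypothetical value, added to show how a relative “$id” shifts the base.

```python
from urllib.parse import urljoin

base = "http://example.net/root.json"  # the root schema's $id

# A fragment-only reference stays within the same document.
item_uri = urljoin(base, "#item")

# A relative reference to another file replaces the last path segment.
other_uri = urljoin(base, "other.json")

# Hypothetical: a subschema with "$id": "folder/" changes the base URI,
# so the same relative reference now resolves somewhere else.
nested_base = urljoin(base, "folder/")
nested_other_uri = urljoin(nested_base, "other.json")
```

`item_uri` is `http://example.net/root.json#item` and `other_uri` is `http://example.net/other.json`, matching the slide. Under the hypothetical nested base, `other.json` becomes `http://example.net/folder/other.json`, which is why a relative “$id” in a subschema needs care.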

Slide 55

Slide 55 text

Let’s make a JSON Schema! …again!

Slide 56

Slide 56 text

Go to http://bit.ly/ga4gh-json-schema-workshop-01 for all the links! A JSON Schema like structure (In YAML): https://github.com/ga4gh-schemablocks/blocks/blob/master/src/yaml/ontology_term.yaml Let’s make it a JSON Schema! You will need: A YAML to JSON to YAML converter: https://www.json2yaml.com A means to quickly and easily test a JSON Schema: https://www.jsonschemavalidator.net Core and Validation spec documents: http://json-schema.org/specification.html Slightly more friendly documentation and examples: http://json-schema.org/understanding-json-schema Let’s make a JSON Schema! You may find it easier or faster to write in YAML

Slide 57

Slide 57 text

Questions?

Slide 58

Slide 58 text

Recap • Who uses JSON Schema? • Case studies on the uses of JSON Schema • IETF and JSON Schema draft versions • Key concepts • Basic JSON Schema - Validation and Annotation • Let’s build a JSON Schema – Interactive. Laptops required! • Advanced JSON Schema – Application and Referencing • Let’s build a JSON Schema again! • Questions and schema troubleshooting

Slide 59

Slide 59 text

Moving Forward Draft 8

Slide 60

Slide 60 text

But can JSON Schema… “Validate an ontology term based on the supplied ontology identifier?” Well… no. JSON Schema doesn’t prohibit you adding your own keywords though… “OK, so I’ll create a new one to do this!” https://github.com/elixir-europe/json-schema-validator As an npm package or as a standalone server. “This validator has three custom keywords implemented, `graph_restriction`, `isChildTermOf` and `isValidTerm`.” Uses EBI Ontology Lookup Service. Used by Human Cell Atlas and others! https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md

Slide 61

Slide 61 text

But can JSON Schema… “Validate an ontology term based on the supplied ontology identifier?” If you add to the keyword vocabulary. But you’ll have to tell people in advance! Enter: “$vocabularies” Take a schema and extend it by creating a new meta-schema. The new meta-schema defines the “$vocabularies” it uses. Your new schemas then use the “$id” of your new meta-schema as their “$schema” value. This allows implementations to dynamically load libraries that support additional sets of keywords, if they are provided and identified. https://github.com/elixir-europe/BioHackathon/blob/master/interoperability/JSON%20schema%20validation%20with%20ontologies/README.md

Slide 62

Slide 62 text

Still lots more!

Slide 63

Slide 63 text

Acknowledgements

Slide 64

Slide 64 text

Thank you! Ben Hutton Wellcome Sanger Institute JSON Schema core Github: relequestual Twitter: @relequestual http://json-schema.org Includes link to join slack Additional Sponsors: http://bit.ly/json-schema-work