Why your configuration needs a schema

Talk from Configuration Management Camp 2018, all about the high cost of current configuration management approaches, why that leads to serialisation formats like YAML and JSON being edited directly, and how schemas and auto-generation can help.

Gareth Rushgrove

February 06, 2018


  1. Why your configuration needs a schema Gareth Rushgrove

  2. @garethr

  3. None
  4. - The proliferation of config file formats
     - The high cost of config management
     - Why configuration needs a schema
     - Auto-generate everything
  5. The proliferation of configuration file formats: The state of things

  6. XML, INI, JSON, YAML, EDN, HOCON, TOML, CSON, Java Properties,

    internal DSLs, ...
  7. Everyone has opinions about config file formats

  8. Don’t use JSON as a Config File Format

  9. None
  10. None
  11. Some formats are associated with certain languages or frameworks

  12. Others become the default for communities of practice

  13. Apart from the parsing bit, to the application being configured

    it’s just a data structure
  14. For the operator all of the different formats are separate

    user interfaces
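
Slides 13 and 14 can be illustrated with a minimal Python sketch. The same settings, written once as JSON and once as INI, parse to an identical dictionary, even though the operator has to learn two different syntaxes. The section name, keys and values are illustrative assumptions, not taken from the talk.

```python
import configparser
import json

# Hypothetical settings, written in two different file formats.
JSON_TEXT = '{"server": {"host": "localhost", "port": "8080"}}'
INI_TEXT = """
[server]
host = localhost
port = 8080
"""


def load_json(text):
    """Parse JSON text into a plain dictionary."""
    return json.loads(text)


def load_ini(text):
    """Parse INI text into a plain dictionary of sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


from_json = load_json(JSON_TEXT)
from_ini = load_ini(INI_TEXT)

# Once parsed, the application sees the same data structure either way.
print(from_json == from_ini)
```

The port is a string in the JSON version only because INI has no native numeric type. That kind of format-specific detail is exactly what leaks through to the operator.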
  15. This is one of the reasons for higher-level configuration management

  16. The cost of configuration management: A barrier to entry to good tooling
  17. Having multiple configuration management tools is not a bad thing, but

    it does mean a lot of reinventing the wheel
  18. Everyone ends up with a way to manage packages, services,

    files, users and groups
  19. Worse, everyone ends up with a way to manage Apache

  20. - sous-chef/apache2: 1498 commits, 116 contributors, 46 releases
      - puppetlabs/puppetlabs-apache: 2992 commits, 342 contributors, 39 releases
      - Ansible Galaxy: 298 results for apache
  21. Managing files is a big part of managing most systems

  22. Using BigQuery on 7.5 million lines of Puppet

  23. What types are used the most?

  24. More than 30% of Puppet resources were files

  25. Option 1: Templates. None of the benefits of your chosen

    tool, and you’re exposed to all the configuration file formats directly
  26. You need a separate templating language

          template '/etc/app/config.yaml' do
            source 'config.yaml.erb'
            mode '0755'
            owner 'web'
            group 'web'
          end
  27. How can you reason about a system when the configuration

    is spread across an explosion of templating languages, file formats and templates?
  28. Option 2: Format-specific resources. You can now use your chosen

    tool, but the tool has no context for the application (it’s just data), and the format still bleeds through
  29. Manage an INI file with Puppet

          ini_setting { "sample setting":
            ensure  => present,
            path    => '/tmp/foo.ini',
            section => 'bar',
            setting => 'baz',
            value   => 'quux',
          }
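
As a rough illustration of what a format-specific resource does under the hood, here is a minimal Python sketch of an idempotent "ensure this INI setting" operation. It is not how Puppet's `ini_setting` is implemented. The function name `ensure_ini_setting` is hypothetical, and the sketch works on a temporary file instead of `/tmp/foo.ini`.

```python
import configparser
import os
import tempfile


def ensure_ini_setting(path, section, setting, value):
    """Set section/setting to value in the INI file at path.

    Returns True if the file was changed and False if it was already correct.
    """
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently treated as empty
    if parser.get(section, setting, fallback=None) == value:
        return False
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, setting, value)
    with open(path, "w") as handle:
        parser.write(handle)
    return True


# Usage against a throwaway file; the second call should be a no-op.
with tempfile.TemporaryDirectory() as workdir:
    ini_path = os.path.join(workdir, "foo.ini")
    first_run = ensure_ini_setting(ini_path, "bar", "baz", "quux")
    second_run = ensure_ini_setting(ini_path, "bar", "baz", "quux")
    print(first_run, second_run)
```

The function knows about sections and keys but nothing about the application that reads the file, which is the limitation slide 28 describes. This sketch also rewrites the whole file, so any comments in it would be lost.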
  30. A PowerShell DSC Resource for INI files

  31. Manage an INI file with DSC

          Import-DscResource -ModuleName DSCR_IniFile
          cIniFile Apple
          {
              Path    = "C:\Test.ini"
              Section = ""
              Key     = "Fruit_A"
              Value   = "Apple"
          }
  32. Ansible module for INI files

  33. Chef resources for JSON and YAML files

  34. Option 3: App-specific resources. You get all the power of

    your chosen tool, but at the cost of bespoke development
  35. A bespoke application in Chef

          webapp "cfgmgmtcamp" do
            static_url_path "/my_project/static"
            mysql_database_user "project_user_name"
            show_settings_route "/show-settings"
            debug true
          end
  36. How can we lower the cost of native resources for

  37. What have schemas got to do with this? Moving on

    to talk about solutions
  38. Most Chef cookbooks or Ansible or Puppet modules are not

    written by the developers of the application being managed
  39. Most configuration is informally specified via implementation, and often not

    versioned like an API
  40. What if instead applications provided a schema for their configuration?

  41. Examples and demonstrations: Experiments in auto-generating tools

  42. Kubernetes has a well-defined set of configuration primitives: Pods, Deployments,

    Services, ReplicationControllers, etc.
  43. Kubernetes uses OpenAPI to describe the API

  44. OpenAPI uses JSON Schema internally

  45. Kubernetes JSON Schema

  46. That’s a lot of JSON

          PS> (Get-Content -Path swagger.json | Measure-Object -line).Lines
          85340
          PS> (Get-ChildItem -Path v*/*.json -Recurse | Measure-Object).Count
          26181
          PS> (Get-ChildItem -Path v*/*.json -Recurse | Get-Content | Measure-Object -line).Lines
          7296392
  47. Generated Puppet types and providers

  48. Generated jsonnet templates

  49. Generated ICL templates

  50. Validation tools

  51. Programming language clients

  52. Could we have this for any application?

  53. A simple example application from the internet

  54. Our application has a configuration file

          {
            "STATIC_URL_PATH": "/my_project/static",
            "MYSQL_DATABASE_USER": "project_user_name",
            "SHOW_SETTINGS_ROUTE": "/show-settings",
            "DEBUG": true
          }
  55. Let’s write a (JSON) schema

          {
            "definitions": {},
            "$schema": "http://json-schema.org/draft-06/schema#",
            "id": "app_config",
            "title": "app_config",
            "type": "object",
            "additionalProperties": false,
            "required": [
              "STATIC_URL_PATH",
              "MYSQL_DATABASE_USER"
            ],
            "properties": {
              "STATIC_URL_PATH": {
                "$id": "/properties/STATIC_URL_PATH",
                "type": "string",
                "title": "Static URL path",
                "description": "A filesystem path for static assets",
  56. Only allow the defined properties

          "additionalProperties": false,

  57. These properties are required

          "required": [ "STATIC_URL_PATH", "MYSQL_DATABASE_USER" ],

  58. Describe each individual property

          "STATIC_URL_PATH": {
            "$id": "/properties/STATIC_URL_PATH",
            "type": "string",
            "title": "Static URL path",
            "description": "A filesystem path for static assets",
            "examples": [
              "/my_project/static"
            ]
          },
  59. Validate config using the schemas

          $ jsonschema -F "{error.message}" -i app.json schema.json
          u'STATIC_URL_PATH' is a required property
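
To make the three schema keywords from slides 56 to 58 concrete, here is a hand-rolled sketch that checks only `required`, `additionalProperties` and `type`. It is not a JSON Schema implementation and is no substitute for the `jsonschema` tool shown on slide 59. The types for `SHOW_SETTINGS_ROUTE` and `DEBUG` are inferred from the quicktype output later in the deck, because the schema on the slides is truncated. The wording of the second and third error messages is made up for this sketch.

```python
# Map the JSON Schema type names used here to Python types.
TYPES = {"string": str, "boolean": bool}

SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["STATIC_URL_PATH", "MYSQL_DATABASE_USER"],
    "properties": {
        "STATIC_URL_PATH": {"type": "string"},
        "MYSQL_DATABASE_USER": {"type": "string"},
        # Assumed types: the schema on the slides is cut off before these.
        "SHOW_SETTINGS_ROUTE": {"type": "string"},
        "DEBUG": {"type": "boolean"},
    },
}


def validate(config, schema):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in config:
            errors.append(f"'{name}' is a required property")
    for name, value in config.items():
        if name not in properties:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"'{name}' is not an allowed property")
            continue
        type_name = properties[name]["type"]
        if not isinstance(value, TYPES[type_name]):
            errors.append(f"'{name}' is not of type '{type_name}'")
    return errors


# A config that is missing a required key and has a wrongly typed value.
errors = validate({"MYSQL_DATABASE_USER": "project_user_name", "DEBUG": "yes"}, SCHEMA)
print(errors)
```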
  60. We have a schema. Now what?

  61. Validate arbitrary structures with JSON Schema

          import json
          import fastjsonschema

          data = {
              "STATIC_URL_PATH": "/my_project/static",
              "MYSQL_DATABASE_USER": "project_user_name",
              "SHOW_SETTINGS_ROUTE": "/show-settings",
              "DEBUG": True,
          }
          validate = fastjsonschema.compile(json.load(open('schema.json')))
          validate(data)
          print(json.dumps(data))
  62. The JSON in JSON Schema refers to the syntax for

    the schema. It can be used to validate data in other formats
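
The point on slide 62 can be sketched with the standard library alone. Here the configuration is stored as INI rather than JSON, parsed into a dictionary, and then checked against the same schema keywords. The INI rendering, the `[app]` section name and the extra `LOG_LEVEL` key are assumptions made for illustration. Only `required` and `additionalProperties` are checked.

```python
import configparser

# Only the parts of the schema this sketch checks.
SCHEMA = {
    "additionalProperties": False,
    "required": ["STATIC_URL_PATH", "MYSQL_DATABASE_USER"],
    "properties": {
        "STATIC_URL_PATH": {},
        "MYSQL_DATABASE_USER": {},
        "SHOW_SETTINGS_ROUTE": {},
        "DEBUG": {},
    },
}

# Hypothetical INI rendering of the application's configuration.
INI_TEXT = """
[app]
STATIC_URL_PATH = /my_project/static
SHOW_SETTINGS_ROUTE = /show-settings
LOG_LEVEL = info
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key case; the default lowercases keys
parser.read_string(INI_TEXT)
config = dict(parser["app"])

# The schema does not care which file format the dictionary came from.
missing = [key for key in SCHEMA["required"] if key not in config]
unknown = [key for key in config if key not in SCHEMA["properties"]]
print(missing, unknown)
```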
  63. Generate browser-based user interfaces

  64. Generate interactive documentation

  65. Generate models in different languages

  66. Quicktype generating Simple Types

          $ docker run -v ${PWD}:/pwd quicktype -l types -s schema /pwd/schemas/schema.json
          class Schema {
              staticURLPath: String
              mysqlDatabaseUser: String
              showSettingsRoute: Maybe<String>
              debug: Maybe<Bool>
          }
  67. Quicktype generating Go

          $ docker run -v ${PWD}:/pwd quicktype -l go -s schema /pwd/schemas/schema.json
          // To parse and unparse this JSON data, add this code to your project and do:
          //
          //    r, err := UnmarshalSchema(bytes)
          //    bytes, err = r.Marshal()

          package main

          import "encoding/json"

          func UnmarshalSchema(data []byte) (Schema, error) {
              var r Schema
              err := json.Unmarshal(data, &r)
              return r, err
          }

          func (r *Schema) Marshal() ([]byte, error) {
  68. Quicktype currently supports generating TypeScript, Elm, Java, C#, Go, Swift

    and C++
  69. Python JSON Schema Objects

  70. Dynamically build objects from schemas

          import python_jsonschema_objects as pjs
          import json

          schema = json.load(open('schema.json'))
          builder = pjs.ObjectBuilder(schema)
          ns = builder.build_classes()
          Config = ns.AppConfig
          config = Config(
              STATIC_URL_PATH="/static",
              MYSQL_DATABASE_USER="db",
          )
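
For readers without `python_jsonschema_objects` installed, the idea of building a class from a schema can be approximated with the standard library. This is a minimal sketch and not how that library works internally. The function name `build_class` is hypothetical, and the types of the two optional properties are inferred from the quicktype output on slide 66.

```python
from dataclasses import field, make_dataclass
from typing import Optional

# Map the JSON Schema type names used here to Python types.
TYPES = {"string": str, "boolean": bool}

SCHEMA = {
    "title": "app_config",
    "required": ["STATIC_URL_PATH", "MYSQL_DATABASE_USER"],
    "properties": {
        "STATIC_URL_PATH": {"type": "string"},
        "MYSQL_DATABASE_USER": {"type": "string"},
        "SHOW_SETTINGS_ROUTE": {"type": "string"},
        "DEBUG": {"type": "boolean"},
    },
}


def build_class(schema):
    """Build a dataclass whose fields mirror the schema's properties."""
    required = set(schema.get("required", []))
    required_fields, optional_fields = [], []
    for name, spec in schema["properties"].items():
        python_type = TYPES[spec["type"]]
        if name in required:
            required_fields.append((name, python_type))
        else:
            # Optional properties default to None so they can be omitted.
            optional_fields.append((name, Optional[python_type], field(default=None)))
    class_name = schema["title"].title().replace("_", "")
    # Fields without defaults must come before fields with defaults.
    return make_dataclass(class_name, required_fields + optional_fields)


AppConfig = build_class(SCHEMA)
config = AppConfig(STATIC_URL_PATH="/static", MYSQL_DATABASE_USER="db")
print(config)
```

Unlike the library on the slide, this sketch only enforces that required fields are supplied. It does not check value types or reject invalid data.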
  71. What if we could generate Puppet types, Chef resources, Libral

    providers, Ansible modules, etc.?
  72. Live Demo Klaxon

  73. Generate Chef resource from schema

          $ ./to_chef.py
          resource_name :app_config
          property :path, String, name_property: true
          property :static_url_path, String, required: true
          property :mysql_database_user, String, required: true
          property :show_settings_route, String, default: '/settings'
          property :debug, Boolean
          action :create do
            file path do
              content "{ STATIC_URL_PATH: "#{static_url_path}", MYSQL_DATABASE_USER: "#{mysql_database_user}", SHOW_SETTINGS_ROUTE: "#{show_settings_route}", DEBUG: #{debug} }"
  74. Generate Libral provider from schema

          $ ./to_libral.py
          #! /usr/bin/python
          import json
          import sys
          import os

          METADATA="""
          ---
          provider:
            type: app_config
            invoke: json
            actions: [get,set]
            suitable: true
            attributes:
              path:
                desc: The filepath for the configuration file
  75. Generate Puppet type from schema

          $ ./to_puppet.py
          Puppet::Type.newtype(:app_config) do
            ensurable

            validate do
              required_properties = [
                :static_url_path,
                :mysql_database_user,
              ]
              required_properties.each do |property|
                if self[property].nil? and self.provider.send(property) == :absent
                  fail "You must provide a #{property}"
                end
              end
            end

            newparam(:path, namevar: true) do
  76. Conclusions: If all you remember is...

  77. Schemas can allow for greater portability, and improved interoperability, between

  78. If you’re building applications, consider writing a schema to describe

    your configuration
  79. If you’re building configuration management tools, consider relying a lot

    more on auto-generation
  80. Any questions? And thanks for listening