Devops meets Functional Programming - Vladimir Kirillov

This is a story of an Infrastructure team at Zalora that implemented DevOps using Haskell and Nix.

The story is about:

- drowning in the inherent complexity of an existing Puppet configuration
- establishing a functional programming community inside the company
- implementing configuration management using the purely functional language and package manager Nix, with NixOS as the base OS
- challenges of using new tools at scale
- building cloud infrastructure tools using Haskell
- building a code-driven deployment platform borrowing design practices from Erlang/OTP, Mesos and other successful distributed system frameworks, accommodating engineering team growth
- overcoming adoption failures and finally reaching operational happiness

DevOpsDays Singapore

October 17, 2015

Transcript

  1. Lambda the Ultimate Devops Vlad Ki / @darkproger

  2. • Platform Software / SRE at Zalora, CS undergrad at

    NTUU KPI • We're hiring to our distributed team!
  3. • config mgmt • running systems • monitoring • deployment

    / CI / CD • immutable infrastructure • supervising failures • ... • system integration
  4. None
  5. None
  6. systems as code

  7. Microgram The Zalora Platform

  8. Microgram • Application realms (infra + configs) • Application definitions

    (how to run / scale / monitor) • User APIs • Runtime converts definitions to real things (like infra) • Runtime handles operations & automates labor • Fail
  9. DSLs everywhere

  10. /etc/passwd nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false root:*:0:0:System Administrator:/var/root:/bin/sh daemon:*:1:1:System Services:/var/root:/usr/bin/false _uucp:*:4:4:Unix to Unix

    Copy Protocol:/var/spool/uucp:/usr/sbin/uucico _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false _networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false _installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false _lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false _postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false _scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
  11. /etc/pf.conf block quick from <bad_hosts> pass in on $ext_if proto

    tcp to $webserver port www \ (max-src-conn-rate 100/10, \ overload <bad_hosts> flush global) pass out on em0 inet proto tcp \ from $developerhosts to any port 80 \ set queue developers pass out on em0 inet proto tcp \ from any to any port 25 \ set queue mail
  12. sendmail.mc define(`confTO_CONNECT', `1m')dnl define(`confTO_IDENT', `0')dnl define(`confTO_COMMAND', `2m')dnl LOCAL_NET_CONFIG # This

    rule ensures that all local mail is delivered using the # smtp transport, everything else will go via the smart host. R$* < @ $* .$m. > $* $#smtp $@ $2.$m. $: $1 < @ $2.$m. > $3
  13. None
  14. JSON { "domain": "www.example.com", "mongodb": { "host": "localhost", "port": 27017

    } }
  15. YAML / Ansible tasks: - name: take out of load

    balancer pool command: /usr/bin/take_out_of_pool {{ inventory_hostname }} delegate_to: 127.0.0.1
  16. "Programming" with YAML - command: /opt/application/upgrade_db.py when: inventory_hostname == webservers[0]

  17. "Programming" with JSON (CloudFormation) "Outputs" : { "MyOutput" : {

    "Value" : { "Fn::Join" : [ "%", [ "A-string", {"Ref" : "AWS::StackName" } ] ] } } }
  18. Puppet: resource-oriented DSL, isolated effects Although Puppet’s language is built

    around describing resources (and the relationships between them) in a declarative way, several parts of the language do depend on evaluation order case $operatingsystem { centos, redhat: { $service_name = 'ntpd' } debian, ubuntu: { $service_name = 'ntp' } } package { $service_name: ensure => installed, }
  19. Puppet + Hiera Apparently treating code and data separately is

    cool again! --- mysql::server::root_password: 'strongpassword' databases: gotcms: user: 'got' password: 'super_secret_db_password' host: 'localhost' grant: 'ALL'
  20. Chef: DSL + side effects ruby_block 'sleep30s' do block do

    sleep 30 end action :nothing end
  21. <?xml version='1.0' encoding='UTF-8'?> <project> <actions/> <description></description> <logRotator> <daysToKeep>7</daysToKeep> <numToKeep>-1</numToKeep> <artifactDaysToKeep>-1</artifactDaysToKeep>

    <artifactNumToKeep>-1</artifactNumToKeep> </logRotator> <keepDependencies>false</keepDependencies> <properties/> <scm class="hudson.scm.NullSCM"/> <canRoam>true</canRoam> <disabled>false</disabled> <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding> <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding> <triggers class="vector"/>
  22. <tr> <xsl:attribute name="style"> <xsl:choose> <xsl:when test="CONDITION"> <xsl:value-of select="'visibility: visible'"/> </xsl:when>

    <xsl:otherwise> <xsl:value-of select="'visibility: collapse'"/> </xsl:otherwise> </xsl:choose> </xsl:attribute> </tr>
  23. None
  24. systems as code systems are code

  25. None
  26. Case study: MariaDB multi-source replication

  27. % cat databass.yaml sg: db-alpha.infra.zalora.io my: db-psi.infra.zalora.io id: db-psi.infra.zalora.io hk:

    db-psi.infra.zalora.io th: db-theta.infra.zalora.io ph: db-beta.infra.zalora.io vn: db-beta.infra.zalora.io
  28. • MariaDB channels are server-centric • our model is country-centric

    • countries can share servers gotta aggregate!
  29. Any insufficiently expressive DSL leads to the necessity of using

    an ultimate systems integration tool.
  30. (drumroll)

  31. Which is bash. With some awk, sed and perl

  32. % cat databass.yaml | \ awk -F': ' '{print $2,

    $1}' | \ awk '{a[$1] = a[$1] (a[$1] ? "," : "") $2; } END { for(k in a) print k, a[k]; }' db-beta.infra.zalora.io ph,vn db-psi.infra.zalora.io my,id,hk db-alpha.infra.zalora.io sg db-theta.infra.zalora.io th
  33. Bash • somewhat ubiquitous and mostly unportable • hard to

    scale • strings • more strings • unexpected EOF while looking for matching `"' • also strings
  34. Other ways to tackle refactoring • rewrite manually? • use

    a random script to do it during preprocessing? • write a puppet plugin to do that? • AbstractAggregationVirtualMethodFactoryFactory? • find a better template engine? • like, embed python into your tags? • why not use an expressive language in the first place?
  35. Scaling Bash

  36. Scaling Bash with Nix

  37. db-slave-channels = let mapper = _: { db-name, master-host, ...

    }: nameValuePair (to-key master-host) { inherit master-host; databases = [ db-name ]; }; reducer = { name, value }: all: all // { ${name} = value // { databases = all.${name}.databases or [] ++ value.databases; }; }; in fold reducer {} (mapAttrsToList mapper conf);
  38. None
  39. bash = "${pkgs.bash}/bin/bash"; base64 = "${pkgs.coreutils}/bin/base64"; jq = "${pkgs.jq}/bin/jq"; curl

    = "${pkgs.curl}/bin/curl -s --retry --fail"; awk = "${pkgs.gawk}/bin/awk"; openssl = "${pkgs.openssl}/bin/openssl";
  40. None
  41. Nix • functional language • dynamically typed • lightweight "schema

    validation" (hard to say typing) • one side-effect (derivation, used to build a package manager framework)
  42. Case study: configuring Jenkins • click through all the forms?

    • can we build a DSL?
  43. eris-sdk = erisJob { branch = "master"; shell = ''

    export SLACK_CHANNEL='#eris-facepalm' export SLACK_TIMEOUT=5m bin/slack make sdk ''; ssh-keys = [ credentials.hydrabot ]; triggers = [ (ghprb-trigger "eris-sdk") (github-push-trigger) ]; };
  44. mapAttrs' (realm-name: spec: deployJob "deploy-${realm-name}" { inherit realm-name; scm =

    eris_master; permissions = with spec; { build = humans-can-build ++ others-can-build; }; ssh-keys = attrValues credentials; })) (filterAttrs is-deployable realms);
  45. choice-parameter = { name , description ? "" , choices

    ? [ "this is a list" ] }: (term "hudson.model.ChoiceParameterDefinition" null [ (term "name" null name) (term "description" null description) (term "choices" { class = "java.util.Arrays$ArrayList"; } [ (term "a" { class = "string-array"; } (map (term "string" null) choices)) ]) ]);
  46. NixOS • Nix packages + Linux kernel + systemd •

    Immutable system images • Full-stack
  47. Infrastructure Specs

  48. infra = { ec2-instance = { web1 = m3large; web2

    = m3large; db-master = { infra, ... }: r3xlarge' { blockDeviceMapping."/dev/xvdm".disk = infra.ebs.database; }; cron = instance "m3.large"; }; elb = elb.defaults; ebs.database = { inherit (realm.ec2-args) region zone; size = 200; volumeType.gp2 = true; }; };
  49. Pluggable static verification

  50. writeBashScript = name: script: let prelude = '' #!${pkgs.bash}/bin/bash set

    -e -o pipefail ''; in pkgs.runCommand name { inherit prelude script; } '' echo "$prelude" >> "$out" echo "$script" >> "$out" chmod +x "$out" ${ShellCheck}/bin/shellcheck \ "$out" '';
  51. None
  52. Experiences with Nix

  53. None
  54. • zalora/upcast - declarative infra provisioning (like nixops/ terraform/fugue) •

    zalora/replicator - automated MySQL replication • zalora/sproxy - proxy that handles OAuth2 + ACL interface • zalora/aws-ec2 - EC2 extensions for aristidb/aws • unicron, a single-user cron • a lot more on Zalora's GitHub
  55. None
  56. Correct by construction • harden your interoperability with type-safe APIs

    data Expr :: * -> * where E :: Executable -> e -> Expr e Pipe :: Expr e -> Expr e -> Expr e Seq :: Expr e -> Expr e -> Expr e Or :: Expr e -> Expr e -> Expr e Redir :: Expr e -> FilePath -> Expr e Env :: [Pair] -> Expr e -> Expr e Sudo :: Expr e -> Expr e SSH :: Hostname -> e -> Expr e -> Expr e
  57. Correct by construction: web APIs data DeploymentParams = -- ...

    data DeploymentStatus = -- ... type Deployments = "deployments.xml" :> Header "x-api-key" ApiKey :> ReqBody '[FormUrlEncoded] DeploymentParams :> Post '[XML] DeploymentStatus
  58. Interpreting effects wtf :: (Member Spec r, Member Upcast r)

    => Eff r (Map Machine ExitCode) wtf = infra >>= machines >>= traverse (`ssh` cmd) where cmd :: Commandline cmd = exec "wtf" []
  59. -- | Idempotent effects based on evaluating the spec (ok

    to cache). data Spec v where NixQuery :: FromJSON json => Query json -> Spec json Infra :: Spec Infras Stub :: Show a => a -> Spec () -- | Effects that require networking. data Upcast v where Machines :: Infras -> Upcast (Attrs Machine) NixBuild :: Query a -> Upcast [StorePath] NixInstantiate :: Query a -> Upcast [StorePath] MachineExec :: (Machine, Commandline) -> Upcast ExitCode SystemInstalls :: (Machine, StorePath) -> Upcast ExitCode NRNotification :: ApiKey -> [DNS] -> Upcast ()
  60. Integrating with the rest of the stringy world • UNIX

    is a minefield for experimenting with parser combinators! • a lot of perf analysis or systems exploration is done by analysing streams of text • use haskell if lost in awk+perl+sed • see proger/lxkit and zalora/gctuner
  61. Notable tools • propellor • shake/bake • language-puppet • literate

    haskell (idea from org-mode with emacs)
  62. None
  63. Why not just take a random PL again? • DSLs

    are about isolating effects • (like calling system() during a pure graph traversal) • hard to work in a language where you can't protect from shooting self in the foot
  64. Don't sacrifice expressivity for cheap wins. You'll pay the cost

    later.
  65. There are infinitely many tools to do cool things.

  66. There are not enough languages that are powerful enough to

    map those domains.
  67. Keep the unicorn happy!
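
As a rough illustration of the question that closes slide 34 ("why not use an expressive language in the first place?"), the country-to-server aggregation from slides 27 and 32 could be sketched in Haskell as follows. This is only a sketch under stated assumptions, not code from the talk: the names `countryServers` and `groupByServer` are hypothetical, and the data is hard-coded from slide 27 instead of being parsed from the YAML file.

```haskell
module Main where

import Data.List (intercalate)
import qualified Data.Map.Strict as Map

-- | Country code paired with the database server hosting it.
-- Hypothetical name; the entries mirror databass.yaml on slide 27.
countryServers :: [(String, String)]
countryServers =
  [ ("sg", "db-alpha.infra.zalora.io")
  , ("my", "db-psi.infra.zalora.io")
  , ("id", "db-psi.infra.zalora.io")
  , ("hk", "db-psi.infra.zalora.io")
  , ("th", "db-theta.infra.zalora.io")
  , ("ph", "db-beta.infra.zalora.io")
  , ("vn", "db-beta.infra.zalora.io")
  ]

-- | Invert the country-centric mapping into a server-centric one.
-- 'flip (++)' appends each new country after the ones already seen,
-- so countries keep their input order within each server.
groupByServer :: [(String, String)] -> Map.Map String [String]
groupByServer pairs =
  Map.fromListWith
    (flip (++))
    [ (server, [country]) | (country, server) <- pairs ]

-- | Print one "server country,country,..." line per server,
-- in the same shape as the awk pipeline's output on slide 32.
main :: IO ()
main = mapM_ render (Map.toList (groupByServer countryServers))
  where
    render (server, countries) =
      putStrLn (server ++ " " ++ intercalate "," countries)
```

Unlike the awk version, which iterates its array in an unspecified order, `Map.toList` yields the servers in ascending key order, and the mapping from server to countries is an ordinary typed value that later code can consume without re-parsing strings.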