Current/Future action with SRE

Sharing our team's activities.

k.yanagimoto

October 27, 2018

Transcript

  1. Current/Future action with SRE
    Oct 27, 2018
    Koichi Yanagimoto
    EC Incubation Development Dept.
    Rakuten, Inc.

  2. Who am I?
    Name: Koichi Yanagimoto
    @kyanagimoto
    Hobby: Snowboarding, Golf
    Joined Rakuten in 2009
    Working as an SRE in our team.

  3. About SRE
    “class SRE implements DevOps.”

    DevOps                            SRE
    Reduce organization silos         Share ownership
    Accept failure as normal          SLOs & blameless postmortems
    Implement gradual change          Reduce costs of failure
    Leverage tooling and automation   Automate your job away
    Measure everything                Measure toil and reliability

  4. Share ownership

    Sapporo Team (Development/Operation)
    Tokyo Team (SRE/Development)
    - Request and discuss toil.
    - Give feedback on new tools and architecture.
    - Propose new architecture.
    - Build PoCs.
    Development

  5. SLOs & Blameless postmortem

    SLI: service level indicators
    SLO: service level objectives
    SLA: service level agreements

    Availability level   Per year       Per month      Per day
    99%                  3.65 days      7.2 hours      14.4 minutes
    99.9%                8.76 hours     43.2 minutes   1.44 minutes
    99.99%               52.6 minutes   4.32 minutes   8.64 seconds
    99.999%              5.26 minutes   25.9 seconds   0.86 seconds
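
    The downtime figures on this slide follow from simple arithmetic. The sketch below is not from the deck; it assumes a 365-day year and a 30-day month, and the names allowed_downtime_seconds and PERIOD_DAYS are hypothetical.

```python
# Sketch: downtime budget implied by an availability target.
# Assumption: 365-day year and 30-day month (not stated on the slide).

SECONDS_PER_DAY = 24 * 60 * 60
PERIOD_DAYS = {"year": 365, "month": 30, "day": 1}


def allowed_downtime_seconds(availability_percent, period):
    """Return the allowed downtime in seconds for one period."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return PERIOD_DAYS[period] * SECONDS_PER_DAY * unavailable_fraction


if __name__ == "__main__":
    # Print the budget in seconds per year, month, and day for each target.
    for target in (99.0, 99.9, 99.99, 99.999):
        budgets = [allowed_downtime_seconds(target, p) for p in PERIOD_DAYS]
        print(target, [round(seconds, 2) for seconds in budgets])
```

    Dividing the result by 60, 3600, or 86400 gives minutes, hours, or days.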

  6. Reduce costs of failure

    w/ Spinnaker
    https://www.spinnaker.io/
    From now… on-premises
    w/ original pipeline

  7. Automate your job away
    Example… (Squid + Varnish : S-OUT varnish VM)
    [Diagram labels: Internet; Squid layer (Squid-1, Squid-2, Squid-3); Varnish layer (Varnish-1, Varnish-2, Varnish-3); S-in; S-out; "All-in config" (x3); "V-1 s-out config" (x3)]

  8. Consul/consul-template
    https://www.consul.io/
    (Consul is a distributed service mesh to connect, secure, and configure
    services across any runtime platform and public or private cloud)
    consul-template
    https://github.com/hashicorp/consul-template
    (The daemon consul-template queries a Consul or Vault cluster and
    updates any number of specified templates on the file system.)
    From https://www.consul.io/, accessed 15 Oct. 2018
    From https://github.com/hashicorp/consul-template, accessed 15 Oct. 2018

  9. Automate your job away
    [Diagram labels: Internet; K8S; Squid (Squid-1, Squid-2, Squid-3); Varnish (Varnish-1, Varnish-2, Varnish-3); Consul-template; Consul servers (Consul-1, Consul-2, Consul-3)]

  10. Automate your job away
    [Diagram labels: Internet; K8S; Squid (Squid-1, Squid-2, Squid-3); Varnish (Varnish-1, Varnish-2, Varnish-3); Consul-template; Consul servers (Consul-1, Consul-2, Consul-3)]
    Callouts: "w/o V-1 config", "Rewrite config", "Reload config"

  11. Consul-template sample
    The template is written in Go template syntax; consul-template's own configuration file uses the “HashiCorp Configuration Language” (HCL).
    Squid.ctmpl
    Replaced with the IP address of each Squid node.
    Replaced with the IP addresses of every Varnish node.
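
    The "rewrite config, then reload" flow can be illustrated in plain Python. This is a minimal sketch and not consul-template itself or the deck's Squid.ctmpl: the node names, IP addresses, port 6081, and the helpers render_squid_peers and needs_reload are all assumptions made for illustration.

```python
# Sketch of the "rewrite config, then reload" idea in plain Python.
# NOT consul-template; node names, IPs, and the port are hypothetical.

from typing import Dict, List


def render_squid_peers(varnish_nodes: Dict[str, str], port: int = 6081) -> str:
    """Render one Squid cache_peer line per healthy Varnish node."""
    lines: List[str] = []
    for name in sorted(varnish_nodes):
        ip = varnish_nodes[name]
        # Squid syntax: cache_peer <host> <type> <http-port> <icp-port> [options]
        lines.append(
            f"cache_peer {ip} parent {port} 0 no-query round-robin name={name}"
        )
    return "\n".join(lines) + "\n"


def needs_reload(current_config: str, new_config: str) -> bool:
    """Reload only when the rendered text actually changed."""
    return current_config != new_config


if __name__ == "__main__":
    healthy = {
        "varnish-1": "10.0.0.11",
        "varnish-2": "10.0.0.12",
        "varnish-3": "10.0.0.13",
    }
    before = render_squid_peers(healthy)
    del healthy["varnish-1"]  # varnish-1 drops out of the healthy set
    after = render_squid_peers(healthy)
    print(after)
    print("reload needed:", needs_reload(before, after))
```

    In the setup described in the deck, consul-template plays this role: it watches Consul, re-renders the template when the set of nodes changes, and then runs a reload command.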

  12. Automate your job away
    Docker image build…

  13. Measure toil and reliability - Toil
    We store the data in Elasticsearch and view it with Kibana; both run on
    K8S.
    https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-node.html
    We built the Elasticsearch cluster following the best practices at this
    link.

    Best practice:
    • Master nodes
    • Ingest nodes
    • Data nodes
    • Kibana node
    • Elastic APM node
    • Curator
    • Elastalert
    • Beats

  14. Measure toil and reliability - Toil

  15. Measure toil and reliability - Toil
