Utilising OSS to Operate a Centralised, Globally Distributed Cloud Platform

Condé Nast International is home to some of the largest online publications in the world, including Vogue, GQ, Wired, and Vanity Fair. In an effort to provide a cohesive vision for these brands across more than 30 markets, a truly global platform was required. With AWS and Kubernetes at its core, the platform officially launched in September 2018 and serves over 200 million unique visitors per month.

Of course, operating Cloud Native Infrastructure is more than just spinning up a container orchestrator! Auxiliary services are required to operate it effectively and to provide developers with a true platform experience. Open Source Software (OSS) forms the backbone of much of what we do. This talk therefore focuses on how Condé Nast International utilises OSS to operate multiple Kubernetes clusters across the world, paying special attention to observability, testing, application delivery, and developer experience.

Josh Michielsen

September 05, 2019

Transcript

  1. Utilising OSS to Operate
    a Centralised, Globally
    Distributed Cloud
    Platform
    Josh Michielsen

  2. About Me
    → Snr Software Engineer, Platform
    Engineering @ Condé Nast
    (@condenasteng)
    → Live in Cambridge, UK
    → Cyclist
    → Photographer
    → Dog Lover!
    @jmickey_
    jmichielsen
    jmickey
    mickey.dev
    [email protected]

  3. AGENDA
    01 Value of Open Source
    What is the value of utilising Open Source
    software when developing Cloud Native
    software.
    02 Platform Overview
    Overview of the Cloud Platform at Condé
    Nast built on top of Kubernetes & AWS.
    03 Closer Look - App Deployment
    Helm simplifies the packaging and
    deployment of applications running on
    Kubernetes.
    04 Closer Look - Ingress
    How we use Traefik as an ingress controller
    for public and private ingress for our
    Kubernetes clusters.
    05 Closer Look - Cluster Deployment
    How we deploy and upgrade our clusters
    with Terraform and Tectonic.
    06 Closer Look - Logging
    Shipping logs with Fluentd makes retrieving
    logs in-cluster relatively simple. At Condé
    we pair this with Elasticsearch and Kibana.
    07 Looking to the Future
    What the future holds for the Condé Nast
    Cloud Platform.

  4. The Value of Open Source
    01

  5. Software is eating the world and
    open source is eating software.
    Alexis Richardson
    Founder and CEO of Weaveworks, and CNCF TOC Chairman

  6. Basic Benefits
    → Flexibility and agility
    → Cost effective
    → Access to source code, allowing for
    greater understanding of the product
    → Avoid lock-in
    → Community

  7. Platform Overview
    02

  8. Global Cloud Platform
    Clusters in 4 Regions
    11 Markets
    130m+ Monthly Pageviews
    17/34 Publications Migrated

  9. KubeCon Keynote:
    http://bit.ly/cni-keynote-kubecon

  10. X-cache: MISS Ingress
    Credit: Katie Gamanji - @k_gamanji

  11. Credit: Katie Gamanji - @k_gamanji

  12. Credit: Katie Gamanji - @k_gamanji

  13. Closer Look:
    App Deployment
    03

  14. A Kubernetes package manager that simplifies the
    packaging, configuration, and deployment of applications
    and services onto Kubernetes clusters

  15. Helm Basics
    Provides a templating language that can be used to
    generate standard resource configurations. Charts
    can be provided a set of override values.
    Helm charts can have dependencies, allowing you
    to modularise your Helm configurations.
    When executed, Helm:
    → Replaces the values in the configuration
    → Builds the resource definitions
    → Deploys them to Kubernetes, and keeps track of
    all those associated resources
    → All while versioning them as a set (a.k.a. a
    “release”)
    $ helm create myapp
    $ cat myapp/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ include "myapp.fullname" . }}
      labels:
    {{ include "myapp.labels" . | indent 4 }}
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app.kubernetes.io/name: {{ include "myapp.name" . }}
          app.kubernetes.io/instance: {{ .Release.Name }}
    ...
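
The "override values" idea on this slide can be illustrated with a small sketch. It is not Helm's implementation; it only assumes that nested maps are merged key by key while scalars and lists from the override replace the default. The names `chart_defaults` and `prod_overrides` and their contents are hypothetical.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Layer overrides on top of defaults: nested maps merge, other values are replaced."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps, so merge them recursively.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists (or mismatched types) are replaced outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults and a per-environment override file.
chart_defaults = {"replicaCount": 1, "image": {"repository": "myapp", "tag": "latest"}}
prod_overrides = {"replicaCount": 3, "image": {"tag": "1.4.2"}}

values = merge_values(chart_defaults, prod_overrides)
print(values)
```

The merged `values` map is what a template expression such as `{{ .Values.replicaCount }}` would then read from.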

  16. Helm at Condé
    → Single base Helm chart used across all development teams.
    → YAML file to provide values for each environment, stored in the
    application repo.
    → Conditionals on dependencies mean developers can choose
    the features they want to use by simply specifying the config
    for that feature.
    → We set non-negotiable Helm configuration items that must be
    included (e.g. Limits).
    → Deployed to Kubernetes from CircleCI.
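
The "non-negotiable configuration items" point could be enforced with a check like the one below before a deploy. This is only a sketch: the slide does not say how the rule is enforced, and the key paths in `REQUIRED_PATHS` are hypothetical examples of what "Limits" might map to in a values file.

```python
from typing import List

# Hypothetical dotted paths that every values file must define.
REQUIRED_PATHS = ["resources.limits.cpu", "resources.limits.memory"]


def missing_required(values: dict, required_paths: List[str]) -> List[str]:
    """Return the required dotted paths that are absent from the values map."""
    missing = []
    for path in required_paths:
        node = values
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# A values file that sets a CPU limit but forgets the memory limit.
incomplete = {"name": "myapp", "resources": {"limits": {"cpu": "500m"}}}
print(missing_required(incomplete, REQUIRED_PATHS))
```

A CI step could fail the build whenever the returned list is non-empty.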

  17. Base Helm Chart
    Dependency: Ingress (Condition: ingress.enabled)
    Dependency: HPA (Condition: hpa.enabled)
    Dependency: Service (Condition: service.enabled)
    myapp/prod.yaml
    name: myapp
    replicas: 3
    ingress:
      enabled: true
      ...
    service:
      enabled: true
      ...
    v0.0.2
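
The condition mechanism in this diagram can be sketched as a lookup of each dependency's dotted condition path in the values file. The sketch treats a missing path as "disabled", which is an assumption about the base chart's defaults and not Helm's own rule; the helper names are hypothetical.

```python
# Dependency name -> condition path, as shown on the slide.
DEPENDENCIES = {
    "ingress": "ingress.enabled",
    "hpa": "hpa.enabled",
    "service": "service.enabled",
}


def lookup(values: dict, dotted_path: str, default=None):
    """Walk a dotted path through nested dicts, returning default if any part is missing."""
    node = values
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def enabled_dependencies(values: dict) -> list:
    """List the dependencies whose condition path resolves to True."""
    return [
        name
        for name, condition in DEPENDENCIES.items()
        if lookup(values, condition, default=False) is True
    ]


# The values from myapp/prod.yaml on the slide (elided parts omitted).
prod_values = {
    "name": "myapp",
    "replicas": 3,
    "ingress": {"enabled": True},
    "service": {"enabled": True},
}
print(enabled_dependencies(prod_values))
```

With these values the HPA dependency stays off because `hpa.enabled` is never set.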

  18. Closer Look:
    Ingress
    04

  19. A modern HTTP reverse proxy and load balancer that
    makes deploying microservices easy. Traefik integrates
    with your existing infrastructure components and
    configures itself “automatically and dynamically”.

  20. Traefik at Condé
    → Each development team has a namespace.
    → Each namespace has a public ingress, and a private
    ingress.
    → Certificates are configured on AWS ELBs via AWS ACM.
    → Ingress rules are managed via an ingress configuration
    block within the Helm chart.
    → Enables developers to manage their own application
    ingress rules, including allow and block lists.
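
As a rough illustration of what an allow and block list decides per request, the sketch below checks a client address against CIDR ranges. It does not reflect Traefik's configuration syntax or evaluation order; the rule that a block entry wins over an allow entry is an assumption made for the example, and `is_allowed` is a hypothetical helper.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip: str, allow=(), block=()) -> bool:
    """Decide whether a client may reach the ingress.

    Assumed precedence: a matching block entry always denies; otherwise an
    empty allow list admits everyone, and a non-empty one must match.
    """
    addr = ip_address(client_ip)
    if any(addr in ip_network(cidr) for cidr in block):
        return False
    if not allow:
        return True
    return any(addr in ip_network(cidr) for cidr in allow)


# Example ranges, chosen only for illustration.
print(is_allowed("10.1.2.3", allow=["10.0.0.0/8"]))
print(is_allowed("192.0.2.10", allow=["10.0.0.0/8"]))
print(is_allowed("10.9.9.9", allow=["10.0.0.0/8"], block=["10.9.0.0/16"]))
```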

  21. Closer Look:
    Cluster Deployment
    05

  22. Tectonic Installer provides the ability to declare
    Kubernetes clusters in Terraform. Together with
    Continuous Delivery we’re able to deploy and update
    clusters easily and quickly while storing all cluster state as
    Terraform code.

  23. Cluster Deployment
    at Condé
    → We self-deploy Kubernetes to CoreOS hosts in AWS
    EC2.
    → Kubernetes master, worker, and etcd node
    configuration is bootstrapped using CoreOS Ignition.
    → We specify control-plane component versions as
    Terraform variables, including Bootkube, Calico, Flannel,
    etcd, and CoreDNS.
    → Tectonic Installer handles the creation of AWS VPC,
    subnets, security groups, NACLs, etc.
    → Upgrades: Submit a PR in a private configuration repo
    (stores Terraform variables), which triggers a CircleCI
    pipeline.
    → terraform apply is gated by manual approvals so that
    terraform plan output can be reviewed first.

  24. Pull Repo & Clone Tectonic
    Plan Dev → Hold → Apply Dev → Acceptance Dev
    Plan Staging → Hold → Apply Staging → Acceptance Staging
    Plan Prod → Hold → Apply Prod → Acceptance Prod
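
The gating behaviour in this pipeline (plan, hold for manual approval, apply, acceptance) can be modelled in a few lines. This is a toy simulation, not CircleCI configuration; it assumes environments are processed one after another and that the pipeline stops at the first hold without approval. `run_pipeline` is a hypothetical name.

```python
def run_pipeline(environments, approvals):
    """Return the ordered steps executed, stopping at the first unapproved hold."""
    steps = []
    for env in environments:
        # The plan output is produced first so it can be reviewed.
        steps.append(f"plan {env}")
        if not approvals.get(env, False):
            # No manual approval yet: nothing is applied for this or later environments.
            steps.append(f"hold {env} (awaiting approval)")
            break
        steps.append(f"apply {env}")
        steps.append(f"acceptance {env}")
    return steps


# Only the dev plan has been approved so far.
for step in run_pipeline(["dev", "staging", "prod"], {"dev": True}):
    print(step)
```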

  25. Closer Look:
    Logging
    06

  26. Fluentd is an open source data collector for unified
    logging. It provides an easy way to retrieve, process,
    format, and forward application logs.

  27. Fluentd at Condé
    → Application developers configure their apps to log to
    stdout.
    → All development teams must adhere to our structured
    logging standard.
    → Fluentd is deployed as a Kubernetes DaemonSet within
    its own namespace.
    → Fluentd is configured with access to the local node logs,
    and the Kubernetes log volume.
    → Logs are processed with additional metadata (e.g.
    namespace, labels, env, region).
    → Logs are then forwarded to AWS Elasticsearch via a
    cluster-local ES proxy.
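
The first and fifth points above (apps log structured lines to stdout, and metadata is attached later) can be sketched as follows. The field names and the `kubernetes` key are hypothetical; the slide does not describe the actual structured logging standard or the exact metadata layout.

```python
import json
import sys


def log_event(level: str, message: str, **fields) -> str:
    """Write one JSON log line to stdout and return it."""
    line = json.dumps({"level": level, "message": message, **fields}, sort_keys=True)
    sys.stdout.write(line + "\n")
    return line


def enrich(line: str, metadata: dict) -> dict:
    """Parse a JSON log line and attach cluster metadata, as a log shipper might."""
    record = json.loads(line)
    record["kubernetes"] = metadata
    return record


# Application side: emit a structured line.
line = log_event("info", "request served", status=200)

# Shipper side: add hypothetical metadata before forwarding.
record = enrich(line, {"namespace": "web", "env": "prod", "region": "eu-west-1"})
print(record)
```

Keeping the application output as plain JSON on stdout means the app needs no knowledge of where the logs end up.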

  28. <source>
      type tail
      # The format for the log line, in this case Kubernetes.
      format kubernetes
      # Interval between buffer flushes.
      multiline_flush_interval 5s
      # Location of the log file in the node file system.
      path /var/log/kube-proxy.log
      # Stores the last position read within the log file.
      pos_file /var/log/kube-proxy.pos
      # Tags the logs with the Kubernetes service.
      tag kube-proxy
    </source>
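
The role of the position file can be shown with a simplified tailer: remember the byte offset reached, and on the next pass read only what was appended since. This is a sketch of the idea only; Fluentd's real position file format and its handling of log rotation and partial lines are more involved. `read_new_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_new_lines(log_path: str, pos_path: str) -> list:
    """Return lines appended since the offset stored in pos_path, then update it."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as pos_file:
            offset = int(pos_file.read().strip() or 0)
    with open(log_path, "rb") as log_file:
        log_file.seek(offset)
        data = log_file.read()
        new_offset = log_file.tell()
    # Persist the new offset so a restart resumes from here.
    with open(pos_path, "w") as pos_file:
        pos_file.write(str(new_offset))
    return data.decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "kube-proxy.log")
    pos_path = os.path.join(tmp, "kube-proxy.pos")
    with open(log_path, "w") as f:
        f.write("first\n")
    first = read_new_lines(log_path, pos_path)
    with open(log_path, "a") as f:
        f.write("second\n")
    second = read_new_lines(log_path, pos_path)

print(first, second)
```

Because the offset survives restarts, lines already shipped are not read again.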

  29. The Future
    07

  30. Replace Tectonic
    → The cluster bootstrapping space has evolved
    considerably. Keeping a close eye on Cluster API.
    Kubeadm and Kops have improved.
    Prometheus
    → The introduction of tools like Thanos and Cortex
    has made managing Prometheus across multiple
    clusters, envs, and even namespaces much easier.
    Weaveworks Flux
    → GitOps for Kubernetes. Git becomes the single
    source of truth, and Flux executes automatic
    remediation when drift occurs.
    Service Mesh
    → mTLS throughout the cluster, retries, service
    discovery, load balancing, auth(n/z).
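
The GitOps point on this slide (Git as the source of truth, with remediation when drift occurs) rests on comparing desired state with live state. The sketch below shows that comparison in its simplest form; it is not how Flux is implemented, and the resource names and the decision to report extra live resources as "delete" are assumptions for illustration.

```python
def detect_drift(desired: dict, live: dict) -> dict:
    """Map each drifted resource name to the action needed to match the desired state."""
    drift = {}
    for name in sorted(set(desired) | set(live)):
        if name not in live:
            drift[name] = "create"
        elif name not in desired:
            drift[name] = "delete"
        elif desired[name] != live[name]:
            drift[name] = "update"
    return drift


# Hypothetical state declared in Git versus state observed in the cluster.
desired_state = {"deployment/myapp": {"replicas": 3}, "service/myapp": {"port": 80}}
live_state = {"deployment/myapp": {"replicas": 2}, "configmap/old": {"key": "value"}}

print(detect_drift(desired_state, live_state))
```

A reconciliation loop would run a comparison like this on an interval and apply the resulting actions.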

  31. Thanks for Listening!
    @jmickey_
    jmichielsen
    jmickey
    mickey.dev
    [email protected]
