
Utilising OSS to Operate a Centralised, Globally Distributed Cloud Platform


Condé Nast International is home to some of the largest online publications in the world - including Vogue, GQ, Wired, and Vanity Fair. In an effort to provide a cohesive vision for these brands across more than 30 markets, a truly global platform was required. Utilising AWS and Kubernetes at its core, the platform officially launched in September 2018 and serves over 200 million unique visitors/month.

Of course, operating Cloud Native Infrastructure is more than just spinning up a container orchestrator! Auxiliary services are required to operate it effectively and to provide developers with a true platform experience. Open Source Software (OSS) forms the backbone of much of what we do. This talk therefore focuses on how Condé Nast International uses OSS to operate multiple Kubernetes clusters across the world, paying special attention to observability, testing, application delivery, and developer experience.

Josh Michielsen

September 05, 2019


Transcript

  1. Utilising OSS to Operate
    a Centralised, Globally
    Distributed Cloud
    Platform
    Josh Michielsen


  2. About Me
    → Snr Software Engineer, Platform
    Engineering @ Condé Nast
    (@condenasteng)
    → Live in Cambridge, UK
    → Cyclist
    → Photographer
    → Dog Lover!
    @jmickey_
    jmichielsen
    jmickey
    mickey.dev



  4. AGENDA
    01 Value of Open Source
    What is the value of utilising Open Source
    software when developing Cloud Native
    software?
    02 Platform Overview
    Overview of the Cloud Platform at Condé
    Nast built on top of Kubernetes & AWS.
    03 Closer Look - App Deployment
    Helm simplifies the packaging and
    deployment of applications running on
    Kubernetes.
    04 Closer Look - Ingress
    How we use Traefik as an ingress controller
    for public and private ingress for our
    Kubernetes clusters.
    05 Closer Look - Cluster Deployment
    How we deploy and upgrade our clusters
    with Terraform and Tectonic.
    06 Closer Look - Logging
    Shipping logs with Fluentd makes retrieving
    logs in-cluster relatively simple. At Condé
    we pair this with Elasticsearch and Kibana.
    07 Looking to the Future
    What the future holds for the Condé Nast
    Cloud Platform.


  5. The Value of Open Source
    01


  6. Software is eating the world and
    open source is eating software.
    Alexis Richardson
    Founder and CEO of Weaveworks, and CNCF TOC Chairman


  7. Basic Benefits
    → Flexibility and agility
    → Cost effective
    → Access to source code, allowing for
    greater understanding of the product
    → Avoid lock-in
    → Community


  8. Platform Overview
    02


  9. Global Cloud Platform
    Clusters in 4 Regions
    11 Markets
    130m+ Monthly Pageviews
    17/34 Publications Migrated


  10. KubeCon Keynote:
    http://bit.ly/cni-keynote-kubecon


  11. X-cache: MISS Ingress
    Credit: Katie Gamanji - @k_gamanji


  12. Credit: Katie Gamanji - @k_gamanji


  13. Credit: Katie Gamanji - @k_gamanji


  14. Closer Look:
    App Deployment
    03


  15. A Kubernetes package manager that simplifies the
    packaging, configuration, and deployment of applications
    and services onto Kubernetes clusters


  16. Helm Basics
    Provides a templating language that can be used to
    generate standard resource configurations. Charts
    can be given a set of override values.
    Helm charts can have dependencies, allowing you
    to modularise your Helm configurations.
    When executed, Helm:
    → Replaces the values in the configuration
    → Builds the resource definitions
    → Deploys them to Kubernetes, and keeps track of
    all the associated resources
    → Versions them as a set (a.k.a. a “release”)
    $ helm create myapp
    $ cat myapp/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ include "myapp.fullname" . }}
      labels:
    {{ include "myapp.labels" . | indent 4 }}
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app.kubernetes.io/name: {{ include "myapp.name" . }}
          app.kubernetes.io/instance: {{ .Release.Name }}
    ...
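
The idea of a chart receiving a set of override values can be illustrated outside Helm. The following is a minimal Python sketch, not Helm's implementation: it assumes that nested mappings merge key by key and that any other override replaces the default. The default and override values shown are hypothetical.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied; nested mappings merge key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the default outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults, as a values.yaml might define them.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "nginx", "tag": "stable"},
}

# Hypothetical per-environment overrides.
prod_overrides = {"replicaCount": 3, "image": {"tag": "1.17"}}

values = merge_values(chart_defaults, prod_overrides)
print(values)
```

The merged result is what a template expression such as `{{ .Values.replicaCount }}` would then read from.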


  17. Helm at Condé
    → A single base Helm chart is used across all development teams.
    → A YAML file provides the values for each environment and is stored in the
    application repo.
    → Conditionals on dependencies mean developers can choose
    the features they want to use by simply specifying the config
    for that feature.
    → We set non-negotiable Helm configuration items that must be
    included (e.g. resource limits).
    → Deployed to Kubernetes from CircleCI.
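
One way to enforce non-negotiable configuration items is a check that runs before deployment. The sketch below is an assumption about how such a check could look, not a description of the Condé Nast pipeline: the required key paths (`resources.limits.cpu`, `resources.limits.memory`) and the function name are hypothetical.

```python
# Hypothetical list of values that every application must set.
REQUIRED_PATHS = ["resources.limits.cpu", "resources.limits.memory"]


def missing_required(values: dict, required_paths: list) -> list:
    """Return the dotted paths from required_paths that are absent in values."""
    missing = []
    for path in required_paths:
        node = values
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# Hypothetical per-environment values with only a CPU limit set.
app_values = {"name": "myapp", "resources": {"limits": {"cpu": "500m"}}}

print(missing_required(app_values, REQUIRED_PATHS))
```

A CI job could fail the build whenever the returned list is non-empty.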


  18. Base Helm Chart
    Dependency: Ingress (Condition: ingress.enabled)
    Dependency: HPA (Condition: hpa.enabled)
    Dependency: Service (Condition: service.enabled)
    myapp/prod.yaml
    name: myapp
    replicas: 3
    ingress:
      enabled: true
      ...
    service:
      enabled: true
      ...
    v0.0.2
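
The dependency conditions on this slide are dotted paths into the values file. A minimal Python sketch of how such conditions could be resolved follows. It is an illustration only: the function names are hypothetical, and it treats a missing path as disabled to match the opt-in model described in the talk, whereas in Helm itself a condition whose path does not exist has no effect on the dependency.

```python
def condition_enabled(values: dict, condition: str) -> bool:
    """Resolve a dotted condition path and report whether it is set to True."""
    node = values
    for part in condition.split("."):
        if not isinstance(node, dict) or part not in node:
            return False  # Assumption: an unset feature is treated as switched off.
        node = node[part]
    return node is True


def enabled_dependencies(values: dict, conditions: dict) -> list:
    """Return the names of dependencies whose condition resolves to True."""
    return [name for name, cond in conditions.items() if condition_enabled(values, cond)]


# Conditions as listed on the slide.
conditions = {
    "ingress": "ingress.enabled",
    "hpa": "hpa.enabled",
    "service": "service.enabled",
}

# Values as shown for myapp/prod.yaml (elided keys omitted).
prod_values = {
    "name": "myapp",
    "replicas": 3,
    "ingress": {"enabled": True},
    "service": {"enabled": True},
}

print(enabled_dependencies(prod_values, conditions))
```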


  19. Closer Look:
    Ingress
    04


  20. A modern HTTP reverse proxy and load balancer that
    makes deploying microservices easy. Traefik integrates
    with your existing infrastructure components and
    configures itself “automatically and dynamically”.



  22. Traefik at Condé
    → Each development team has a namespace.
    → Each namespace has a public ingress and a private
    ingress.
    → Certificates are configured on AWS ELBs via AWS ACM.
    → Ingress rules are managed via an ingress configuration
    block within the Helm chart.
    → This enables developers to manage their own application
    ingress rules, including allow and block lists.
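
To make the allow and block lists concrete, here is a minimal Python sketch of the decision such lists encode. It does not show Traefik's implementation or configuration syntax. The precedence (block list first, then allow list, with an empty allow list meaning "allow everyone") is an assumption, and the CIDR ranges are reserved documentation addresses.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip: str, allow: list, block: list) -> bool:
    """Decide whether client_ip may reach the application."""
    addr = ip_address(client_ip)
    # Assumption: a match on the block list always wins.
    if any(addr in ip_network(cidr) for cidr in block):
        return False
    # Assumption: an empty allow list places no restriction on clients.
    if not allow:
        return True
    return any(addr in ip_network(cidr) for cidr in allow)


# Hypothetical lists a team might declare for a private ingress.
allow_list = ["203.0.113.0/24"]
block_list = ["203.0.113.128/25"]

print(is_allowed("203.0.113.10", allow_list, block_list))
print(is_allowed("203.0.113.200", allow_list, block_list))
```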



  24. Closer Look:
    Cluster Deployment
    05


  25. Tectonic Installer provides the ability to declare
    Kubernetes clusters in Terraform. Together with
    Continuous Delivery we’re able to deploy and update
    clusters easily and quickly while storing all cluster state as
    Terraform code.


  26. Cluster Deployment
    at Condé
    → We self-deploy Kubernetes to CoreOS hosts in AWS
    EC2.
    → Kubernetes master, worker, and etcd node
    configuration is bootstrapped using CoreOS Ignition.
    → We specify control-plane component versions as
    Terraform variables, including Bootkube, Calico, Flannel,
    etcd, and CoreDNS.
    → Tectonic Installer handles the creation of the AWS VPC,
    subnets, security groups, NACLs, etc.
    → Upgrades: Submit a PR in a private configuration repo
    (which stores the Terraform variables); this triggers a CircleCI
    pipeline.
    → terraform apply is gated by manual approvals so that
    terraform plan output can be reviewed first.


  27. Pull Repo & Clone Tectonic
    Dev: Plan → Hold → Apply → Acceptance
    Staging: Plan → Hold → Apply → Acceptance
    Prod: Plan → Hold → Apply → Acceptance
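
The gating shown in this pipeline can be modelled in a few lines. The Python sketch below is an illustration, not the CircleCI configuration used at Condé Nast. It assumes the environments run strictly in sequence and that each hold sits between plan and apply; all names are hypothetical.

```python
ENVIRONMENTS = ["dev", "staging", "prod"]
STEPS = ["plan", "hold", "apply", "acceptance"]


def build_pipeline(environments: list) -> list:
    """Return (environment, step) pairs in execution order."""
    return [(env, step) for env in environments for step in STEPS]


def run_until_blocked(pipeline: list, approved: set) -> list:
    """Return the steps that complete before the first unapproved hold."""
    completed = []
    for env, step in pipeline:
        if step == "hold" and env not in approved:
            break  # Wait here until someone has reviewed the plan output.
        completed.append((env, step))
    return completed


pipeline = build_pipeline(ENVIRONMENTS)

# Only the dev plan has been reviewed and approved so far.
progress = run_until_blocked(pipeline, approved={"dev"})
print(progress)
```

With only dev approved, the run stops after the staging plan, so staging and prod remain unchanged until their plans are reviewed.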


  28. Closer Look:
    Logging
    06


  29. Fluentd is an open source data collector for unified
    logging. It provides an easy way to retrieve, process,
    format, and forward application logs.


  30. Fluentd at Condé
    → Application developers configure their apps to log to
    stdout.
    → All development teams must adhere to our structured
    logging standard.
    → Fluentd is deployed as a Kubernetes DaemonSet within
    its own namespace.
    → Fluentd is configured with access to the local node logs
    and the Kubernetes log volume.
    → Logs are enriched with additional metadata (e.g.
    namespace, labels, env, region).
    → Logs are then forwarded to AWS Elasticsearch via a
    cluster-local ES proxy.
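
From the application side, logging to stdout in a structured format can look like the following Python sketch. The talk does not spell out the structured logging standard, so the JSON field names used here (`timestamp`, `level`, `logger`, `message`) are assumptions, as are the class and function names.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout only."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger("myapp")
logger.info("request served in %d ms", 42)
```

Because every line is a self-contained JSON object, a collector running on the node can parse it and attach further metadata without application-specific rules.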



  31. <source>
      type tail
      # The format for the log line. In this case Kubernetes.
      format kubernetes
      # Interval between buffer flushes.
      multiline_flush_interval 5s
      # Location of the log file in the node file system.
      path /var/log/kube-proxy.log
      # Stores the last position read within the log file.
      pos_file /var/log/kube-proxy.pos
      # Tag the log with the Kubernetes service.
      tag kube-proxy
    </source>
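
The role of the position file can be shown with a small Python sketch: it remembers how far into the log the reader got, so a restart resumes from that point instead of re-reading the file. This is a simplification for illustration and not Fluentd's implementation or file format; the function name is hypothetical and the files are created in a temporary directory.

```python
import os
import tempfile


def read_new_lines(log_path: str, pos_path: str) -> list:
    """Return complete lines appended since the last call, tracking the offset in pos_path."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as pos_file:
            offset = int(pos_file.read().strip() or 0)
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only up to the last newline so a half-written line is left for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    with open(pos_path, "w") as pos_file:
        pos_file.write(str(offset + end))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "kube-proxy.log")
    pos_path = os.path.join(tmp, "kube-proxy.pos")
    with open(log_path, "w") as log:
        log.write("first\nsecond\n")
    first_batch = read_new_lines(log_path, pos_path)
    with open(log_path, "a") as log:
        log.write("third\n")
    second_batch = read_new_lines(log_path, pos_path)

print(first_batch)
print(second_batch)
```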


  32. The Future
    07


  33. Replace Tectonic → The cluster bootstrapping space has evolved
    considerably. We are keeping a close eye on Cluster API;
    kubeadm and kops have also improved.
    Prometheus → The introduction of tools like Thanos and Cortex
    has made managing Prometheus across multiple
    clusters, envs, and even namespaces much easier.
    Weaveworks Flux → GitOps for Kubernetes. Git becomes the single
    source of truth, and Flux automatically
    remediates drift when it occurs.
    Service Mesh → mTLS throughout the cluster, retries, service
    discovery, load balancing, auth(n/z).
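
The GitOps idea of treating Git as the source of truth and remediating drift can be reduced to a comparison between declared and observed state. The Python sketch below illustrates that idea only. It is not how Flux is implemented; the function names and resource shapes are hypothetical, and it assumes resources that Git does not declare are left untouched.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Map each declared resource to (desired, actual) where the cluster differs from Git."""
    return {
        name: (spec, actual.get(name))
        for name, spec in desired.items()
        if actual.get(name) != spec
    }


def remediate(desired: dict, actual: dict) -> dict:
    """Return the cluster state after re-applying everything declared in Git."""
    fixed = dict(actual)
    fixed.update(desired)  # Assumption: undeclared resources are left as they are.
    return fixed


# Hypothetical state: Git declares 3 replicas, someone scaled the cluster to 5 by hand.
git_state = {"myapp": {"replicas": 3}}
cluster_state = {"myapp": {"replicas": 5}, "legacy": {"replicas": 1}}

drift = detect_drift(git_state, cluster_state)
print(drift)

cluster_state = remediate(git_state, cluster_state)
print(detect_drift(git_state, cluster_state))
```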


  34. Thanks for Listening!
    @jmickey_
    jmichielsen
    jmickey
    mickey.dev
