Utilising OSS to Operate a Centralised, Globally Distributed Cloud Platform

Condé Nast International is home to some of the largest online publications in the world - including Vogue, GQ, Wired, and Vanity Fair. In an effort to provide a cohesive vision for these brands across more than 30 markets, a truly global platform was required. Utilising AWS and Kubernetes at its core, the platform officially launched in September 2018 and serves over 200 million unique visitors/month.

Of course, operating Cloud Native Infrastructure is more than just spinning up a container orchestrator! Auxiliary services are required in order to operate it effectively and provide developers with a true platform experience. Open Source Software (OSS) forms the backbone for much of what we do. As such, this talk will be focusing on how Condé Nast International utilises OSS to effectively operate multiple Kubernetes clusters across the world, paying special attention to observability, testing, application delivery, and developer experience.

Josh Michielsen

September 05, 2019

Transcript

  1. 2.

    About Me → Snr Software Engineer, Platform Engineering @ Condé

    Nast (@condenasteng) → Live in Cambridge, UK → Cyclist → Photographer → Dog Lover! @jmickey_ jmichielsen jmickey mickey.dev j@mickey.dev
  2. 3.
  3. 4.

    AGENDA

    01 Value of Open Source: What is the value of utilising Open Source software when developing Cloud Native software?
    02 Platform Overview: Overview of the Cloud Platform at Condé Nast, built on top of Kubernetes & AWS.
    03 Closer Look - App Deployment: Helm simplifies the packaging and deployment of applications running on Kubernetes.
    04 Closer Look - Ingress: How we use Traefik as an ingress controller for public and private ingress for our Kubernetes clusters.
    05 Closer Look - Cluster Deployment: How we deploy and upgrade our clusters with Terraform and Tectonic.
    06 Closer Look - Logging: Shipping logs with Fluentd makes retrieving logs in-cluster relatively simple. At Condé we pair this with ElasticSearch and Kibana.
    07 Looking to the Future: What the future holds for the Condé Nast Cloud Platform.
  4. 6.

    Software is eating the world and open source is eating

    software. Alexis Richardson Founder and CEO of Weaveworks, and CNCF TOC Chairman
  5. 7.

    Basic Benefits → Flexibility and agility → Cost effective →

    Access to source code, allowing for greater understanding of the product → Avoid lock-in → Community
  6. 9.

    Global Cloud Platform Clusters in 4 Regions 11 Markets 130m+

    Monthly Pageviews 17/34 Publications Migrated
  7. 15.

    A Kubernetes package manager that simplifies the packaging, configuration, and

    deployment of applications and services onto Kubernetes clusters
  8. 16.

    Helm Basics → Provides a templating language that can be used

    to generate standard resource configurations. → Charts can be provided a set of override values. → Helm charts can have dependencies, allowing you to modularise your Helm configurations. When executed, Helm: → Replaces the values in the configuration → Builds the resource definitions → Deploys them to Kubernetes, and keeps track of all those associated resources → All while versioning them as a set (a.k.a. a “release”)

    $ helm create myapp
    $ cat myapp/templates/deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: {{ include "myapp.fullname" . }}
      labels:
    {{ include "myapp.labels" . | indent 4 }}
    spec:
      replicas: {{ .Values.replicaCount }}
      selector:
        matchLabels:
          app.kubernetes.io/name: {{ include "myapp.name" . }}
          app.kubernetes.io/instance: {{ .Release.Name }}
      ...
  9. 17.

    Helm at Condé → Single base helm chart used across

    all development teams. → YAML file to provide values for each environment, stored in the application repo. → Conditionals on dependencies means developers can choose the features they want to use by simply specifying the config for that feature. → We set non-negotiable Helm configuration items that must be included (e.g. Limits). → Deployed to Kubernetes from CircleCI.
  10. 18.

    Base Helm Chart → Dependency: Ingress (condition: ingress.enabled)

    → Dependency: HPA (condition: hpa.enabled) → Dependency: Service (condition: service.enabled)

    myapp/prod.yaml (chart v0.0.2):
    name: myapp
    replicas: 3
    ingress:
      enabled: true
      ...
    service:
      enabled: true
      ...
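The conditional-dependency wiring described on this slide can be sketched as a Helm v2 `requirements.yaml` in the base chart. The subchart names, versions, and repository paths below are illustrative assumptions, not Condé's actual configuration:

```yaml
# Hypothetical requirements.yaml for the base chart (Helm v2 style).
# Each dependency is only rendered when its condition value is true,
# which is how developers opt in to features via their values file.
dependencies:
  - name: ingress
    version: 0.1.0
    repository: "file://../ingress"   # path is an illustrative assumption
    condition: ingress.enabled
  - name: hpa
    version: 0.1.0
    repository: "file://../hpa"
    condition: hpa.enabled
  - name: service
    version: 0.1.0
    repository: "file://../service"
    condition: service.enabled
```

With this in place, a `myapp/prod.yaml` that sets `ingress.enabled: true` pulls in the ingress subchart, while omitting `hpa` leaves that dependency unrendered.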
  11. 20.

    A modern HTTP reverse proxy and load balancer that makes

    deploying microservices easy. Traefik integrates with your existing infrastructure components and configures itself “automatically and dynamically”.
  12. 21.
  13. 22.

    Traefik at Condé → Each development team has a namespace.

    → Each namespace has a public ingress and a private ingress. → Certificates are configured on AWS ELBs via AWS ACM. → Ingress rules are managed via an ingress configuration block within the Helm chart. → Enables developers to manage their own application ingress rules, including allow and block lists.
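An Ingress resource of the kind the chart's ingress block might render for Traefik 1.x could look like the following. The hostname, namespace, and CIDR range are illustrative assumptions:

```yaml
# Hypothetical Ingress rendered from the chart's ingress configuration block.
# Uses the extensions/v1beta1 API and Traefik 1.x annotations current in 2019.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myapp
  namespace: team-a                  # each team has its own namespace
  annotations:
    kubernetes.io/ingress.class: traefik
    # Traefik 1.x annotation implementing an IP allow list
    traefik.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8"
spec:
  rules:
    - host: myapp.example.com        # illustrative hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: myapp
              servicePort: 80
```

Because the rules live in the team's own Helm values, developers can change hosts, paths, and allow lists without platform-team involvement.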
  14. 23.
  15. 25.

    Tectonic Installer provides the ability to declare Kubernetes clusters in

    Terraform. Together with Continuous Delivery we’re able to deploy and update clusters easily and quickly while storing all cluster state as Terraform code.
  16. 26.

    Cluster Deployment at Condé → We self-deploy Kubernetes to CoreOS

    hosts in AWS EC2. → Kubernetes master and worker node, and etcd node configuration is bootstrapped using CoreOS Ignition. → We specify control-plane component versions as Terraform variables, including Bootkube, Calico, Flannel, etcd, and CoreDNS. → Tectonic Installer handles the creation of AWS VPC, subnets, security groups, NACLs, etc. → Upgrades: Submit a PR in a private configuration repo (stores Terraform variables), which triggers a CircleCI pipeline. → terraform apply is gated by manual approvals so that terraform plan output can be reviewed first.
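The "versions as Terraform variables" approach might look like the sketch below. The variable names and version numbers are illustrative assumptions, not the actual Tectonic Installer variable set:

```hcl
# Hypothetical terraform.tfvars in the private configuration repo.
# Bumping a version here and opening a PR is what triggers the
# CircleCI plan/approve/apply pipeline described above.
tectonic_cluster_name = "platform-eu-west-1"   # illustrative name
tectonic_aws_region   = "eu-west-1"

# Control-plane component versions, pinned explicitly
component_versions = {
  bootkube = "v0.14.0"
  calico   = "v3.7.2"
  flannel  = "v0.11.0"
  etcd     = "v3.3.12"
  coredns  = "1.5.0"
}
```

Keeping versions in variables means an upgrade is a reviewable one-line diff, and `terraform plan` shows exactly which cluster components will change before anyone approves the apply.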
  17. 27.

    Pull Repo & Clone Tectonic → Plan Dev → Hold → Apply Dev

    → Acceptance Dev → Plan Staging → Hold → Apply Staging → Acceptance Staging → Plan Prod → Hold → Apply Prod → Acceptance Prod
  18. 29.

    Fluentd is an open source data collector for unified logging.

    It provides an easy way to retrieve, process, format, and forward application logs.
  19. 30.

    Fluentd at Condé → Application developers configure their apps to

    log to stdout. → All development teams must adhere to our structured logging standard. → Fluentd is deployed as a Kubernetes DaemonSet within its own namespace. → Fluentd is configured with access to the local node logs, and the Kubernetes log volume. → Logs are processed with additional metadata (e.g. namespace, labels, env, region). → Logs are then forwarded to AWS ElasticSearch via a cluster-local ES proxy.
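A log line conforming to a structured logging standard of this kind might look like the following. The field names and values are illustrative assumptions, not Condé's actual schema:

```json
{
  "timestamp": "2019-09-05T10:15:30Z",
  "level": "info",
  "service": "myapp",
  "message": "request completed",
  "status": 200
}
```

Emitting JSON like this to stdout lets Fluentd parse each line into fields, enrich it with namespace, labels, env, and region, and forward it to ElasticSearch where Kibana can filter on any field.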
  20. 31.

    <source>
      type tail
      format kubernetes
      multiline_flush_interval 5s
      path /var/log/kube-proxy.log
      pos_file /var/log/kube-proxy.pos
      tag kube-proxy
    </source>

    format: the format of the log line, in this case Kubernetes. multiline_flush_interval: interval between buffer flushes. path: location of the log file on the node file system. pos_file: stores the last position read within the log file. tag: tags the log line with the Kubernetes service.
  21. 33.

    Replace Tectonic → The cluster bootstrapping space has evolved considerably.

    Keeping a close eye on ClusterAPI. Kubeadm and Kops have improved. Prometheus → The introduction of tools like Thanos and Cortex has made managing Prometheus across multiple clusters, envs, and even namespaces much easier. Weaveworks Flux → GitOps for Kubernetes. Git becomes the single source of truth, and Flux executes automatic remediation when drift occurs. Service Mesh → mTLS throughout the cluster, retries, service discovery, load balancing, auth(n/z).