Slide 1

Slide 1 text

Journey from VMs to Containers Engineering

Slide 2

Slide 2 text

tl;dr

Slide 3

Slide 3 text

Micheal Benedict
• I am @micheal
• Eng Manager, Infrastructure. Focus on:
  • Continuous Delivery
  • Kubernetes
  • Infrastructure Governance

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Mission: Help you discover and do what you love

Slide 6

Slide 6 text

Pin: A bookmark someone has saved from the internet to a board they’ve created.
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 7

Slide 7 text

Pin
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 8

Slide 8 text

Pin
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 9

Slide 9 text

Pin
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 10

Slide 10 text

Pin
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 11

Slide 11 text

Board: A greater collection of ideas.
[Example Pin card: Cravings · Omar Seyal · The perfect path to cold brew · 36 · Caffeinated Inc.]

Slide 12

Slide 12 text

• 200M+ People on Pinterest each month
• 100B+ Pins
• 3B+ Boards
• 10B+ Recommendations/Day
• 450+ Engineers

Slide 13

Slide 13 text

Homefeed
Goal: fresh, contextual & relevant
From 100B Pins to 5K Pins.

Slide 14

Slide 14 text

How to recommend for this pin?
[Diagram: graph of Boards and Pins]
Challenges: Graph traversal, Candidate generation, Scoring & Serving across billions of objects
Source: The Interplay of User Experience & Machine Learning by Vanja Josifovski @Pinterest

Slide 15

Slide 15 text

How to recommend for this pin?
[Diagram: graph of Boards and Pins]
Challenges: Graph traversal, Candidate generation, Scoring & Serving across billions of objects
• Random walks from a node = Personalized PageRank.
• More connections = higher score.
• 100K+ steps in < 50 ms
Source: The Interplay of User Experience & Machine Learning by Vanja Josifovski @Pinterest
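The random-walk idea on this slide can be illustrated with a minimal sketch. It is not Pinterest's production system: the tiny pin/board graph, the function name `random_walk_scores`, and the `restart_prob` value are all assumptions made for illustration. The walk hops between pins and boards, keeps restarting at the query pin, and counts visits; nodes with more connections to the query pin accumulate more visits, which approximates a Personalized PageRank score.

```python
import random
from collections import Counter


def random_walk_scores(graph, start, steps=100_000, restart_prob=0.5, seed=0):
    """Approximate Personalized PageRank by counting random-walk visits.

    graph: dict mapping a node to the list of its neighbours
           (pins link to boards, boards link to pins).
    start: the query node the walk keeps restarting from.
    """
    rng = random.Random(seed)
    visits = Counter()
    node = start
    for _ in range(steps):
        neighbours = graph.get(node, [])
        # Restart at the query node on a dead end or with probability restart_prob.
        if not neighbours or rng.random() < restart_prob:
            node = start
            continue
        node = rng.choice(neighbours)
        visits[node] += 1
    visits.pop(start, None)  # the query pin itself is not a recommendation
    return visits


# Hypothetical toy graph: pins are saved to boards, boards contain pins.
graph = {
    "pin:cold_brew": ["board:cravings", "board:coffee"],
    "pin:espresso": ["board:cravings", "board:coffee"],
    "pin:latte": ["board:coffee"],
    "pin:hiking": ["board:outdoors"],
    "board:cravings": ["pin:cold_brew", "pin:espresso"],
    "board:coffee": ["pin:cold_brew", "pin:espresso", "pin:latte"],
    "board:outdoors": ["pin:hiking"],
}

scores = random_walk_scores(graph, "pin:cold_brew", steps=20_000)
pin_scores = {n: c for n, c in scores.items() if n.startswith("pin:")}
ranked = sorted(pin_scores, key=pin_scores.get, reverse=True)
print(ranked)
```

In this toy graph the espresso pin shares two boards with the query pin and the latte pin shares one, so the espresso pin collects more visits; the hiking pin is unreachable and never appears.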

Slide 16

Slide 16 text

Visual Search
Define visual similarity between any visual object and images in a dataset, in real time.
Object Detection

Slide 17

Slide 17 text

Visual Search
Define visual similarity between any visual object and images in a dataset, in real time.
Lens
Near Real-time

Slide 18

Slide 18 text

Under the hood

Slide 19

Slide 19 text

Overall Architecture (Serving & Analytics)
[Diagram components]
• Clients: iOS, Android, Web, Mobile Web
• Serving: API, µServices, Cache, NoSQL, Index, Sharded DB
• Message Bus
• Analytics: Hadoop / Spark / TensorFlow, Presto, Hive, Impala, Dashboard, Big Data Storage
• Processing: Streaming, Hybrid, Batch
• Machine Learning, Neural Nets, …
Source: Thinking with both sides of the brain by David Chaiken @Pinterest

Slide 20

Slide 20 text

Infrastructure Footprint
• 1M+ requests per sec
• 175PB+ of data
• O(10^4) servers
• O(10^3) services

Slide 21

Slide 21 text

Diverse Workloads
• Serving (Monoliths + Microservices)
• Analytics (Batch, Streaming, structured/unstructured queries)

Slide 22

Slide 22 text

Compute Platform

Slide 23

Slide 23 text

Vision
Fastest path from an idea to production, without worrying about infrastructure

Slide 24

Slide 24 text

Vision
Fastest path from an idea to production, without worrying about infrastructure

Slide 25

Slide 25 text

Focus #1: Simplify E2E Dev XP
What are the steps a developer is required (but not expected) to do when building, launching & managing services, batch jobs, etc.?

Slide 26

Slide 26 text

Focus #2: An integrated Infra Platform
What is required to build a reliable, scalable, efficient & well-integrated infrastructure platform?

Slide 27

Slide 27 text

Focus #3: Infra Governance
Without hampering developer experience or adding ops work, what controls are required to effectively utilize & manage infrastructure?

Slide 28

Slide 28 text

Developer XP
CODE · TEST · BUILD · DEPLOY & DELIVERY · OPERATE

Slide 29

Slide 29 text

Scope
DEV XP: UI · CLI · API
[Diagram labels] SETUP · TEST & BUILD · UNIT TEST · IMAGE MANAGEMENT · OPERATIONS · METRICS · LOGS · TRACING · DELIVERY · WORKFLOW MANAGEMENT · JOB SUBMISSION · INTEGRATION TEST · OWNERSHIP · SCAFFOLDING · ROLES, KEYS & SECRETS · RESOURCE MANAGEMENT · QUOTA · AMI MANAGEMENT · CLUSTER PROVISIONING · METERING · HEALTH CHECK · JOB STATUS · JOB CONFIG

Slide 30

Slide 30 text

Timeline: Containers @Pinterest
H1 2016 · H2 2016 · H1 2017 · H2 2017 · H1 2018
Kickoff
Phase 1: Docker MVP
• Developer Workflow
• Image Management
• Integration w/ existing security & networking systems
• First Production Service migrated
Phase 2: Productionize Docker & Adoption
• Metrics, logging, security and high availability support.
• Fully production ready and over one hundred services migrated (+API fleet)

Slide 31

Slide 31 text

Timeline: Container Orchestration @Pinterest
H1 2017 · H2 2017 · H1 2018 · H2 2018
Kickoff
• Motivation / Evaluation
• MVP: build & operate production cluster for a use-case
Phase 1: Onboard workloads (non-serving, batch type)
• Ad hoc job submission (Tooling)
• Onboarded Jenkins
• Onboard JupyterHub
• Prototyped TensorFlow (using Kubeflow)
Phase 2: Onboard workloads (serving but non-critical)
• Productionize TensorFlow (using Kubeflow)
• Onboard non-critical serving workloads
• Deployment workflow manager*
• Infrastructure Governance

Slide 32

Slide 32 text

Container Orchestration: Evaluation Framework
CHOICES · POC · CRITERIA · OUTCOME

Slide 33

Slide 33 text

Container Orchestration
CHOICES · POC · CRITERIA · OUTCOME
● Resource and task scheduling (Flexibility, Multi-Tenancy, Extensibility etc.)
● Scalability
● Integration Cost
● Docker Support, Sidecar support and Runtime extensibility
● Network Support on AWS*
● Security Support on AWS*
● Stateful Service Support
● Ecosystem and Community
● Cluster Operations & Support

Slide 34

Slide 34 text

Container Orchestration CHOICES POC CRITERIA OUTCOME

Slide 35

Slide 35 text

CHOICES POC CRITERIA OUTCOME Container Orchestration

Slide 36

Slide 36 text

Container Orchestration CHOICES POC CRITERIA OUTCOME ❤

Slide 37

Slide 37 text

Adopting K8S
Pillars: PLATFORM · APPLICATION · CLUSTER · ONCALL/SUPPORT
Early Customers

Slide 38

Slide 38 text

Adopting K8S: Cluster
• Self Hosted vs. Managed - Using a combination of both. kops (for self-hosted) & etcd-manager
• Number of Clusters - Mixed opinions. POC to evaluate burden vs. flexibility
• HA Strategy - Go cross-AZ at the minimum. Multi-region is flaky (without federation)
• Ingress - Mostly for internal web tools, using Amazon’s ALB (Inter-VPC routing key)
• Machine Types (homogeneous vs. heterogeneous) - Use node scheduling policy, taints & labels judiciously. POC to capture benefits of diverse instance types & workloads.
• Maintenance - Decide SLOs upfront
• Stateless vs. Persistent vs. Durable Store - Leaning towards Persistent
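To make the "taints & labels" point concrete, here is a simplified sketch of how node labels and taints interact with a pod's node selector and tolerations. It models only the basic Kubernetes rules (exact-match `nodeSelector`, `Equal`/`Exists` tolerations, and `NoSchedule`/`NoExecute` taints) and is not the real scheduler. The label key `workload-class`, the taint `dedicated=ml`, and the helper names are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified rules)."""
    if toleration.get("key") != taint["key"]:
        return False
    # A toleration without an effect matches every effect.
    if toleration.get("effect") not in (None, taint["effect"]):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint.get("value")


def can_schedule(pod, node):
    """Check label selection and taints for a pod/node pair."""
    labels = node.get("labels", {})
    selector_ok = all(
        labels.get(key) == value
        for key, value in pod.get("nodeSelector", {}).items()
    )
    # Only hard taints block scheduling; PreferNoSchedule is a soft hint.
    hard_taints = [
        t for t in node.get("taints", [])
        if t["effect"] in ("NoSchedule", "NoExecute")
    ]
    taints_ok = all(
        any(tolerates(tol, taint) for tol in pod.get("tolerations", []))
        for taint in hard_taints
    )
    return selector_ok and taints_ok


# Hypothetical heterogeneous cluster: one tainted ML node, one general node.
ml_node = {
    "labels": {"workload-class": "ml"},
    "taints": [{"key": "dedicated", "value": "ml", "effect": "NoSchedule"}],
}
general_node = {"labels": {"workload-class": "general"}, "taints": []}

ml_pod = {
    "nodeSelector": {"workload-class": "ml"},
    "tolerations": [
        {"key": "dedicated", "operator": "Equal", "value": "ml", "effect": "NoSchedule"}
    ],
}
web_pod = {}  # no selector, no tolerations

for name, pod in (("ml_pod", ml_pod), ("web_pod", web_pod)):
    for node_name, node in (("ml_node", ml_node), ("general_node", general_node)):
        print(name, node_name, can_schedule(pod, node))
```

The taint keeps general workloads off the specialised node, while the label plus node selector steers the ML pod onto it; using both together is what keeps a heterogeneous fleet predictable.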

Slide 39

Slide 39 text

Adopting K8S: Platform
• Security[1] - Ensure workloads can be authenticated & access-controlled (enforce or trust/verify)
• Networking[2] - Offer dedicated & shared
• Service Discovery - Support existing and provide path to move to new
  • Pinterest’s internal solution (ZUM) backed by Zookeeper.
  • Next generation is Envoy based (still POC)
• Ingress - For Internal Services, everyone likes Heroku! Expose HTTP and provide sharable URL
  • Contour (github.com/heptio/contour)
• Metrics & Logging - Observe pods automatically (both metrics & logging)
  • Offer tiered SLOs (ex, Application Logging > Debuggability)
• Governance - Ensure ownership of Jobs, Quota and integration with Chargeback*

Slide 40

Slide 40 text

K8S Platform - Security[1]
Pod IAM Setup
● Role set as annotation of Pod
● IPTables rule redirects to local meta-proxy (Drome)
● Drome Agent consults Kubelet, acquires token from “Role Assume Service”
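The three steps above can be sketched as a small lookup flow. This is an illustration under stated assumptions, not Drome's actual code: the annotation key `example.com/iam-role`, the function names, and the two injected callables (`list_pods` standing in for a Kubelet query, `assume_role` standing in for the "Role Assume Service") are all hypothetical. The proxy identifies the calling pod by the source IP of the redirected request, reads the role from the pod's annotation, and only then asks for a token.

```python
ROLE_ANNOTATION = "example.com/iam-role"  # hypothetical annotation key


def role_for_source_ip(pods, source_ip):
    """Find the pod that owns source_ip and return its annotated role, if any."""
    for pod in pods:
        if pod.get("status", {}).get("podIP") == source_ip:
            annotations = pod.get("metadata", {}).get("annotations", {})
            return annotations.get(ROLE_ANNOTATION)
    return None


def handle_credentials_request(source_ip, list_pods, assume_role):
    """Serve a redirected metadata request on behalf of the calling pod.

    list_pods:   callable returning the pods on this node (stand-in for Kubelet).
    assume_role: callable exchanging a role name for a token
                 (stand-in for the role-assume service).
    """
    role = role_for_source_ip(list_pods(), source_ip)
    if role is None:
        # Unknown caller or pod without a role annotation: give out nothing.
        raise PermissionError(f"no IAM role for caller {source_ip}")
    return assume_role(role)


# Fake node state and a fake token service, for illustration only.
pods_on_node = [
    {
        "metadata": {"name": "reader", "annotations": {ROLE_ANNOTATION: "s3-reader"}},
        "status": {"podIP": "10.0.0.5"},
    },
    {"metadata": {"name": "no-role"}, "status": {"podIP": "10.0.0.6"}},
]


def fake_assume_role(role):
    return {"role": role, "token": "fake-token"}


creds = handle_credentials_request("10.0.0.5", lambda: pods_on_node, fake_assume_role)
print(creds["role"])
```

Failing closed when the caller cannot be mapped to an annotated pod keeps a pod from inheriting whatever credentials the host itself holds.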

Slide 41

Slide 41 text

K8S Platform - Networking[2]
Support for ENI & Bridge mode
● Support for AWS IAM role and Security Group, Network Isolation and VPC routable IP
● AWS’s Elastic Network Interfaces per pod.
● Support different CNI plugins (configured by Pod annotations)
● Collaborating w/ AWS on amazon-vpc-cni-k8s
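As a rough illustration of "configured by Pod annotations", the sketch below picks a network mode per pod from an annotation. The annotation key `example.com/network-mode`, the mode names, and the choice of bridge as the default are assumptions for this example; the slide does not state the real key or default.

```python
NETWORK_ANNOTATION = "example.com/network-mode"  # hypothetical annotation key
SUPPORTED_MODES = {"bridge", "eni"}  # shared bridge vs. dedicated ENI per pod


def network_mode(pod, default="bridge"):
    """Return the network mode requested by a pod, falling back to the default."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    mode = annotations.get(NETWORK_ANNOTATION, default)
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported network mode: {mode!r}")
    return mode


shared_pod = {"metadata": {"name": "web"}}
dedicated_pod = {
    "metadata": {"name": "payments", "annotations": {NETWORK_ANNOTATION: "eni"}}
}

print(network_mode(shared_pod), network_mode(dedicated_pod))
```

Keeping the choice in pod metadata lets most workloads share the cheaper mode while the few that need their own security group or a VPC-routable IP opt in explicitly.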

Slide 42

Slide 42 text

Adopting K8S: Application Tooling
• Developer Experience: Define the dev and prod deployment user experience upfront
  • CLI - PinCloud
  • UI - Infrastructure Console
• App Configs - Pinterest Service Description Spec
  • Canonical JobTypes, Ownership, SidecarConfig
• Deployment workflow manager*
• Job Submission Service - Manage canonical job metadata (agnostic of underlying compute infra)
[Diagram labels: UI/CLI · JOB SUBMISSION SERVICE · WORKFLOW MANAGER · K8S · HADOOP]
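One way to picture a job submission service that is "agnostic of underlying compute infra" is a thin layer that validates canonical job metadata and hands the job to whichever backend is registered for its type. The sketch below is a guess at the shape, not the Pinterest Service Description Spec: the `JobSpec` fields, the class and method names, and the fake Kubernetes and Hadoop submitters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JobSpec:
    """Canonical job metadata, independent of where the job runs."""
    name: str
    owner: str
    job_type: str  # e.g. "service" or "batch"
    sidecars: List[str] = field(default_factory=list)


class JobSubmissionService:
    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[JobSpec], str]] = {}
        self.jobs: Dict[str, JobSpec] = {}

    def register_backend(self, job_type: str, submit: Callable[[JobSpec], str]) -> None:
        """Map a canonical job type to a backend-specific submit function."""
        self._backends[job_type] = submit

    def submit(self, spec: JobSpec) -> str:
        """Validate the spec, record it, and dispatch to the matching backend."""
        if not spec.owner:
            raise ValueError("every job needs an owner")
        if spec.job_type not in self._backends:
            raise ValueError(f"no backend for job type {spec.job_type!r}")
        self.jobs[spec.name] = spec
        return self._backends[spec.job_type](spec)


# Fake backends standing in for real Kubernetes and Hadoop submission clients.
service = JobSubmissionService()
service.register_backend("service", lambda spec: f"k8s/{spec.name}")
service.register_backend("batch", lambda spec: f"hadoop/{spec.name}")

handle_a = service.submit(JobSpec(name="homefeed-api", owner="team-feed", job_type="service"))
handle_b = service.submit(JobSpec(name="nightly-index", owner="team-search", job_type="batch"))
print(handle_a, handle_b)
```

Because callers only ever see `JobSpec`, ownership checks and quota or chargeback hooks can live in one place, and a workload can move between backends without its metadata changing.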

Slide 43

Slide 43 text

K8S App Tools - pincloud CLI

Slide 44

Slide 44 text

Thank you!