Slide 1

Slide 1 text

Microservices Lifecycle Management
Micheal Benedict (@micheal)
Product Manager, Cloud & Data Infrastructure @Pinterest

Slide 2

Slide 2 text

Agenda
‣ History (Microservices at Twitter and Pinterest)
‣ Lifecycle of a job
‣ What is Governance?
‣ Challenges & Solution
‣ Future

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

2010

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

MONOLITH

Slide 7

Slide 7 text

2017

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

MONOLITH + MICROSERVICES

Slide 11

Slide 11 text

Number of microservices: O(10^2)

Slide 12

Slide 12 text

Number of microservices: O(10^2)
Number of (offline) jobs: O(10^3)

Slide 13

Slide 13 text

Number of VM instances: O(10^4)
Number of microservices: O(10^2)
Number of (offline) jobs: O(10^3)

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

MONOLITH MICROSERVICES

Slide 16

Slide 16 text

Source

Slide 17

Slide 17 text

Source number of microservices >1000

Slide 18

Slide 18 text

tl;dr

Slide 19

Slide 19 text

MICROSERVICES: The obvious benefits
‣ FENCING & OWNERSHIP: Clear isolation of services and their ownership.
‣ RELIABILITY: Failure isolation and graceful degradation.
‣ SCALABILITY & EFFICIENCY: Scale independently, ensuring efficient use of infrastructure.
‣ DEVELOPER PRODUCTIVITY: Make it simple for engineers to build and launch services quickly and easily.

Slide 20

Slide 20 text

Microservices enable organizations to scale

Slide 21

Slide 21 text

However…

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Not always what it seems…

Slide 24

Slide 24 text

What is the lifecycle of a job?

Slide 25

Slide 25 text

A job can be…
‣ long-running service
‣ batch job
‣ map reduce
‣ model training
‣ experiment

Slide 26

Slide 26 text

CREATE
RELEASE
MONITOR
DEPRECATE

Slide 27

Slide 27 text

CREATE
RELEASE: TEST & BUILD; PACKAGE; DEPLOY (CANARY/PROD)
MONITOR: LOGS, METRICS & TRACE; GRAPH & ALERTS; ONCALL
DEPRECATE

Slide 28

Slide 28 text

CREATE
RELEASE: TEST & BUILD; PACKAGE; DEPLOY (CANARY/PROD)
MONITOR: LOGS, METRICS & TRACE; GRAPH & ALERTS; ONCALL
MANAGE
DEPRECATE

Slide 29

Slide 29 text

CREATE
RELEASE: TEST & BUILD; PACKAGE; DEPLOY (CANARY/PROD)
MONITOR: LOGS, METRICS & TRACE; GRAPH & ALERTS; ONCALL
MANAGE: IDENTITY & CREDENTIAL; METADATA; RESOURCE & CAPACITY; BUDGET & SPEND; OWNERSHIP
DEPRECATE

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Governance is needed, but how?

Slide 32

Slide 32 text

Deploy package `pin_write_service`
COMPUTE: _cluster=pin_write_cluster, _namespace=pin_write; vCPU: 8.0, Memory: 12G, Instances: 10, Service Discovery: pinwriter

Slide 33

Slide 33 text

Deploy package `pin_write_service`
COMPUTE: _cluster=pin_write_cluster, _namespace=pin_write; vCPU: 8.0, Memory: 12G, Instances: 10, Service Discovery: pinwriter
KEY/VAL STORAGE: _namespace=pin_write; GB: 20GB, RPS: 100K, WPS: 10K
BLOB STORAGE: _prefix=pin_media_pictures, _prefix=pin_media_videos; GB: 2TB, GETs: 500K, PUTs: 50K

Slide 34

Slide 34 text

Who owns what?

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

JOB OWNERSHIP DIRECTORY: The dictionary
Logical grouping of identifiers tied to the business.
OWNERSHIP: BUSINESS OWNER -(1:N)-> TEAM -(1:N)-> PROJECT -(1:N)-> JOB NAME
IDENTITY: JOB NAME -(1:N)-> identifiers (Depends on Identity & Credential Manager)

Slide 37

Slide 37 text

OWNERSHIP: BUSINESS OWNER -(1:N)-> TEAM -(1:N)-> PROJECT -(1:N)-> JOB NAME
IDENTITY: JOB NAME -(1:N)-> identifiers (Depends on Identity & Credential Manager)
Example: INFRASTRUCTURE -(1:N)-> CORE-SERVICE -(1:N)-> PinAndBoard -(1:N)-> pin_writer_service -(1:N)-> identifiers
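The ownership chain on this slide can be sketched as a small data model. This is only an illustration: the class names and the `find_owner` helper are hypothetical and not part of the talk; the example values are the ones shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Project:
    name: str
    jobs: List[str] = field(default_factory=list)  # PROJECT 1:N JOB NAME


@dataclass
class Team:
    name: str
    projects: List[Project] = field(default_factory=list)  # TEAM 1:N PROJECT


@dataclass
class BusinessOwner:
    name: str
    teams: List[Team] = field(default_factory=list)  # BUSINESS OWNER 1:N TEAM


def find_owner(directory: List[BusinessOwner], job_name: str) -> Optional[Tuple[str, str, str]]:
    """Answer "who owns what?": return (business owner, team, project) for a job."""
    for owner in directory:
        for team in owner.teams:
            for project in team.projects:
                if job_name in project.jobs:
                    return (owner.name, team.name, project.name)
    return None  # job is not registered in the directory


# Values taken from the slide's example.
directory = [
    BusinessOwner(
        "INFRASTRUCTURE",
        [Team("CORE-SERVICE", [Project("PinAndBoard", ["pin_writer_service"])])],
    )
]
```

A real directory would likely live in a database; the nested loops here only make the 1:N relationships explicit.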

Slide 38

Slide 38 text

How to provision identifiers per resource for a job?

Slide 39

Slide 39 text

Deploy package `pin_write_service`
COMPUTE: _cluster=pin_write_cluster, _namespace=pin_write; vCPU: 8.0, Memory: 12G, Instances: 10, Service Discovery: pinwriter
KEY/VAL STORAGE: _namespace=pin_write; GB: 20GB, RPS: 100K, WPS: 10K
BLOB STORAGE: _prefix=pin_media_pictures, _prefix=pin_media_videos; GB: 2TB, GETs: 500K, PUTs: 50K

Slide 40

Slide 40 text

CANONICAL JOB IDENTIFIER (JOB NAME): pin_write_service
IDENTIFIER PER RESOURCE TYPE (1:N):
COMPUTE: _cluster=
KEY/VAL STORAGE: _namespace=
BLOB STORAGE: _prefix=, _prefix=

Slide 41

Slide 41 text

IDENTITY MANAGER: Canonical identifiers for a job
Identifying a job across platform/infrastructure services.
foo_service -> IDENTITY PROVISIONING SERVICE
COMPUTE: _cluster=
KEY/VAL STORAGE: _namespace=
BLOB STORAGE: _prefix=
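One way to picture the identity manager is as a registry that maps a canonical job name to its per-resource identifiers and back. The sketch below is an assumption about how such a registry could behave; the `IdentityManager` class and its methods are hypothetical, and the identifier values are the ones from the `pin_write_service` slides.

```python
from collections import defaultdict


class IdentityManager:
    """Hypothetical registry: canonical job name <-> identifiers per resource type."""

    def __init__(self):
        self._by_job = defaultdict(list)  # job -> [(resource_type, key, value), ...]
        self._owner = {}                  # (resource_type, key, value) -> job

    def provision(self, job, resource_type, key, value):
        """Register an identifier for a job; an identifier may belong to one job only."""
        ident = (resource_type, key, value)
        existing = self._owner.get(ident)
        if existing is not None and existing != job:
            raise ValueError(f"{ident} is already provisioned for {existing}")
        if existing is None:
            self._owner[ident] = job
            self._by_job[job].append(ident)
        return ident

    def identifiers(self, job):
        """All identifiers provisioned for a job (the 1:N side)."""
        return list(self._by_job.get(job, []))

    def job_for(self, resource_type, key, value):
        """Reverse lookup: which job does this identifier belong to?"""
        return self._owner.get((resource_type, key, value))


manager = IdentityManager()
manager.provision("pin_write_service", "compute", "_cluster", "pin_write_cluster")
manager.provision("pin_write_service", "compute", "_namespace", "pin_write")
manager.provision("pin_write_service", "keyval", "_namespace", "pin_write")
manager.provision("pin_write_service", "blob", "_prefix", "pin_media_pictures")
manager.provision("pin_write_service", "blob", "_prefix", "pin_media_videos")
```

The reverse lookup is what lets other systems (credentials, metering) attribute a cluster, namespace, or prefix back to a single job.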

Slide 42

Slide 42 text

How to manage credentials per identifier?

Slide 43

Slide 43 text

CREDENTIAL MANAGER
A consistent (role-based) method to generate credentials for access control & auditability.
foo_service -> IDENTITY PROVISIONING SERVICE
COMPUTE: _cluster=foo_cluster
RDBMS: _database=foodb
BLOCK STORAGE: _prefix=fooStore, _prefix=barStore
CREDENTIAL PROVISIONING SERVICE: generate service account with privileges based on identifiers (IAM Keys and Secrets)
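A minimal sketch of role-based credential provisioning, assuming a service account is granted privileges per identifier. The role names, the `svc-` naming, and the functions below are hypothetical; only the `foo_service` identifiers come from the slide.

```python
import secrets
from dataclasses import dataclass

# Hypothetical role -> privileges mapping; a real system would define its own roles.
ROLE_PRIVILEGES = {
    "reader": ("read",),
    "writer": ("read", "write"),
}


@dataclass(frozen=True)
class ServiceAccount:
    name: str
    grants: tuple  # ((identifier, privilege), ...)
    secret: str


def provision_credentials(job, role, identifiers):
    """Generate a service account whose privileges derive from the job's identifiers."""
    privileges = ROLE_PRIVILEGES[role]
    grants = tuple((ident, priv) for ident in identifiers for priv in privileges)
    # secrets.token_urlsafe returns a random URL-safe string suitable as a secret.
    return ServiceAccount(name=f"svc-{job}", grants=grants, secret=secrets.token_urlsafe(32))


def is_allowed(account, identifier, privilege):
    """Access-control check; logging each call here would give an audit trail."""
    return (identifier, privilege) in account.grants


account = provision_credentials(
    "foo_service", "reader", ["_cluster=foo_cluster", "_database=foodb"]
)
```

Because grants are derived from identifiers, revoking or auditing access reduces to looking up what a job's service account was provisioned with.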

Slide 44

Slide 44 text

Where to look up job metadata/configuration?

Slide 45

Slide 45 text

https://github.com/pinterest/teletraan

Slide 46

Slide 46 text

METADATA MANAGER: Source of truth for Job Metadata
Key/Val pairs tied to Jobs & Projects following a hierarchical order.
OWNERSHIP: BUSINESS OWNER -(1:N)-> TEAM -(1:N)-> PROJECT (KEY/VAL) -(1:N)-> JOB NAME (KEY/VAL)
IDENTITY: JOB NAME -(1:N)-> identifiers (Depends on Identity & Credential Manager)
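Hierarchical key/val lookup can be sketched with a `ChainMap`, assuming job-level values override project-level ones (the slide does not state the precedence, so that is an assumption). The metadata keys and values below are made up for illustration.

```python
from collections import ChainMap

# Hypothetical metadata; only the project and job names appear in the talk.
project_metadata = {"PinAndBoard": {"oncall": "core-service-oncall", "tier": "1"}}
job_metadata = {"pin_writer_service": {"tier": "0"}}
job_project = {"pin_writer_service": "PinAndBoard"}


def resolve_metadata(job):
    """Return a view where job-level keys shadow project-level keys."""
    project = job_project[job]
    return ChainMap(job_metadata.get(job, {}), project_metadata.get(project, {}))


meta = resolve_metadata("pin_writer_service")
print(meta["tier"])    # job-level value wins
print(meta["oncall"])  # falls back to the project-level value
```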

Slide 47

Slide 47 text

How to allocate infra resources consistently?

Slide 48

Slide 48 text

RESOURCE MANAGER: So, what resources can I use?
Inventorying and provisioning of resources across platform/infrastructure services.
Define resources to offer:
- Online Compute
- Storage
- Batch Compute
Abstract resource provisioning by providing a workflow to provision resources:
- Allows policies (ex: < 100 vCPU free to launch)
- Tie to identity system
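The policy example (fewer than 100 vCPU is free to launch) could be expressed as a simple check in the provisioning workflow. This sketch assumes the limit applies to the total vCPU of a single request and that larger requests go to a review step; both are assumptions, and the names are hypothetical.

```python
from dataclasses import dataclass

FREE_VCPU_LIMIT = 100  # from the slide's example policy: < 100 vCPU free to launch


@dataclass
class ComputeRequest:
    job: str                  # canonical job name, tying the request to the identity system
    vcpu_per_instance: float
    instances: int

    @property
    def total_vcpu(self) -> float:
        return self.vcpu_per_instance * self.instances


def evaluate(request: ComputeRequest) -> str:
    """Return the workflow decision for a compute request."""
    if request.total_vcpu < FREE_VCPU_LIMIT:
        return "auto-approve"
    return "needs-review"


# The pin_write_service deployment from the earlier slide: 8.0 vCPU x 10 instances.
decision = evaluate(ComputeRequest("pin_write_service", 8.0, 10))
```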

Slide 49

Slide 49 text

RESOURCE MANAGER: So, what resources can I use?
Inventorying and provisioning of resources across platform/infrastructure services.
foo_service: IDENTITY PROVISIONING SERVICE, RESOURCE PROVISIONING SERVICE, CLOUD PROVISIONING
COMPUTE: CPU, MEMORY, DISK
BLOB STORAGE: STORAGE IN GB, GETS, PUTS
KEY/VAL STORAGE: STORAGE IN GB, WPS, RPS

Slide 50

Slide 50 text

How to meter utilization of resources & attribute cost?

Slide 51

Slide 51 text

METER & CHARGEBACK: How much am I using? $$
Ability to meter allocation and utilization of resources per service and per engineering team, and charge them accordingly.
Enables Visibility & Accountability.
Metering across infrastructure requires a standard `schema`:
- ts (timestamp)
- identifier
- infrastructure
- resource
- utilization
Leverage the internal visibility/observability stack.
Unit price definition per resource can be difficult.
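As a rough sketch, the metering schema maps directly onto a record type, and chargeback becomes utilization multiplied by a unit price, summed per identifier. The unit prices, resource names, and sample records below are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    ts: int               # timestamp (epoch seconds)
    identifier: str       # identifier provisioned for the job
    infrastructure: str   # e.g. "compute", "blob_storage"
    resource: str         # e.g. "vcpu_hours", "gb_months"
    utilization: float


# Hypothetical unit prices per (infrastructure, resource).
UNIT_PRICE = {
    ("compute", "vcpu_hours"): 0.05,
    ("blob_storage", "gb_months"): 0.02,
}


def chargeback(records):
    """Sum cost per identifier: utilization x unit price."""
    totals = defaultdict(float)
    for r in records:
        totals[r.identifier] += r.utilization * UNIT_PRICE[(r.infrastructure, r.resource)]
    return dict(totals)


records = [
    UsageRecord(1500000000, "pin_write_service", "compute", "vcpu_hours", 80.0),
    UsageRecord(1500000000, "pin_write_service", "blob_storage", "gb_months", 2000.0),
]
costs = chargeback(records)
```

Rolling costs up from identifier to team or business owner would then be a join against the ownership directory.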

Slide 52

Slide 52 text

GOVERNANCE: IDENTITY & CREDENTIAL; METADATA; RESOURCE & CAPACITY; BUDGET & SPEND; OWNERSHIP

Slide 53

Slide 53 text

Future

Slide 54

Slide 54 text

DASHBOARD (SINGLE PANE OF GLASS)
OWNERSHIP; IDENTITY & CREDENTIAL; METADATA; RESOURCE & CAPACITY; BUDGET, METERING & CHARGEBACK
REPORTING; WORKFLOWS
INTERNAL APIS
PROVIDER APIS & ADAPTERS
INFRASTRUCTURE AND PLATFORM SERVICES
DATACENTER / PUBLIC CLOUD

Slide 55

Slide 55 text

Thanks!