Slide 1

Slide 1 text

Changing the way cities move
Microservices pitfalls
Addressing the most frequent pitfalls when transitioning to Microservices

Slide 2

Slide 2 text

Mobimeo – Changing the way cities move
Easy access to daily mobility
Our technology empowers mobility providers to orchestrate existing and new modes of public transport. Together we create an effortless transport experience to make mobility services attractive to millions of users.
More mobility. Less traffic.

Slide 3

Slide 3 text

About Mobimeo
We know what drives the mobility sector - today and tomorrow
Founded in 2018 as a subsidiary of Deutsche Bahn AG and merged with parts of moovel Group GmbH in 2020
Offices in Berlin and Hamburg
170 Mobimeos from over 39 nations

Slide 4

Slide 4 text

Magnus Kulke
Engineering Manager
github.com/mkulke
lnkd.in/magnuskulke
Addressing the most frequent pitfalls when transitioning to Microservices - 2020 12 08 - Magnus Kulke/Lothar Schulz

Slide 5

Slide 5 text

Lothar Schulz
Engineering Manager
lotharschulz.info
github.com/lotharschulz
lnkd.in/lotharschulz
@lothar_schulz

Slide 6

Slide 6 text

Contracts
Lawyer up! Ambiguities and Unmet Expectations

Slide 7

Slide 7 text

Microservices are (also/primarily?) a social tool
● There is a relation between architecture and team setup
● “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.” (Conway’s Law)
● Enables teams to make autonomous decisions

Slide 8

Slide 8 text

Service Boundaries are Defined by Contracts
● Codify expectations towards an API from the consumer’s perspective
  ○ Behaviour: does not change unexpectedly
  ○ Availability: when can we retire an API?
● How to express such a contract?
  ○ Machine readable: Swagger/OpenAPI, JSON Schema, GraphQL
  ○ API Versions
● Abstain from breaking changes
  ○ Additional properties?
  ○ Extending enums?
● Make everything optional: Protobuf3
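
The questions about additional properties and extended enums can be illustrated from the consumer's side. The following is a minimal sketch of a tolerant reader, not a definitive implementation; the payload shape, the `Ticket` type and the `TicketStatus` values are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from enum import Enum


class TicketStatus(Enum):
    VALID = "valid"
    EXPIRED = "expired"
    UNKNOWN = "unknown"  # fallback for enum values the provider adds later


@dataclass
class Ticket:
    id: str
    status: TicketStatus


def parse_ticket(payload: dict) -> Ticket:
    """Read only the fields this consumer needs and ignore everything else."""
    try:
        status = TicketStatus(payload.get("status"))
    except ValueError:
        # A new or missing status must not break this consumer.
        status = TicketStatus.UNKNOWN
    return Ticket(id=str(payload["id"]), status=status)
```

A consumer written this way keeps working when the provider adds a property or a new enum value, which is what makes such changes non-breaking in practice.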

Slide 9

Slide 9 text

Problem: A Schema might not be expressive enough
● Documents can be formally correct
● But semantics have changed
  ○ References in a document
  ○ Content: New ID for entity
● Pragmatic solution: Contract tests
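
A contract test can check semantics that a schema cannot express, for example that references inside a document still resolve. This is a minimal sketch under assumptions: the `stops` and `legs` fields and the helper name are hypothetical, and a real test would fetch the document from the provider instead of using a literal.

```python
from typing import List


def dangling_references(document: dict) -> List[str]:
    """Return stop IDs that legs refer to but the document does not define."""
    known_ids = {stop["id"] for stop in document["stops"]}
    return [leg["stop_id"] for leg in document["legs"]
            if leg["stop_id"] not in known_ids]


def test_legs_reference_known_stops():
    # Stand-in for a response fetched from the provider (assumption).
    response = {
        "stops": [{"id": "stop-1"}, {"id": "stop-2"}],
        "legs": [{"stop_id": "stop-1"}, {"stop_id": "stop-2"}],
    }
    assert dangling_references(response) == []
```

Both the valid and a broken document would pass a structural schema check; only the test notices when an ID scheme changes on one side of the reference.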

Slide 10

Slide 10 text

Performance Characteristics
● Service level objectives
● Rate limits
● Request budgets
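
One common way to enforce a rate limit is a token bucket. The sketch below is an illustration only, not the mechanism used in the talk; the class name, the parameters and the injectable `clock` are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow `rate_per_second` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_second: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A provider would call `allow()` per incoming request and reject (for example with HTTP 429) when it returns `False`; a consumer can use the same structure to stay within its request budget.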

Slide 11

Slide 11 text

The Other Side: Protection from Harmful Workloads
● Unforeseen (ab)use patterns
● How to attribute incoming traffic?
  ○ Correlation Ids
  ○ Callers need to tag their requests
● Manage access
  ○ Service Accounts
  ○ Declarative: Service Mesh
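
Attributing traffic requires that every caller forwards a correlation ID and identifies itself. A minimal sketch, assuming plain header dictionaries; the header names `X-Correlation-Id` and `X-Caller-Service` are conventions chosen for this example, not a standard.

```python
import uuid

CORRELATION_HEADER = "X-Correlation-Id"  # assumed header name
CALLER_HEADER = "X-Caller-Service"       # assumed header name


def outgoing_headers(incoming: dict, service_name: str) -> dict:
    """Propagate the caller's correlation ID, or start a new one at the edge."""
    correlation_id = incoming.get(CORRELATION_HEADER) or str(uuid.uuid4())
    return {
        CORRELATION_HEADER: correlation_id,
        CALLER_HEADER: service_name,  # tag the request with its origin
    }
```

With both headers logged on the receiving side, an unexpected load pattern can be traced back to the service and the user request that caused it.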

Slide 12

Slide 12 text

Domains
None of your concern! Slicing microservices properly

Slide 13

Slide 13 text

Database as Microservice

Slide 14

Slide 14 text

Monolith
My Shop
- Find goods
- Buy goods
- Pay the goods

Slide 15

Slide 15 text

Domains
Book, Search, Pay

Slide 16

Slide 16 text

Domains
Scaling
Book, Search, Pay

Slide 17

Slide 17 text

Domains
Scaling
- Vertical
Book, Search, Pay

Slide 18

Slide 18 text

Domains
Scaling
- Vertical
- Horizontal
Book, Search, Pay

Slide 19

Slide 19 text

Domains
Scaling
- Vertical
- Horizontal
- Sharding
Book, Search, Pay
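
Of the scaling options listed on this slide, sharding is the one that needs a routing decision per record. A minimal sketch of hash-based shard selection, as one possible scheme among several; the function name and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index that is stable across processes and restarts."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A plain modulo scheme like this moves most keys when `shard_count` changes, which is one reason consistent hashing is often preferred for growing systems.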

Slide 20

Slide 20 text

Domains
Book, Search, Pay

Slide 21

Slide 21 text

Domains
Book, Search, Pay, Recommendation

Slide 22

Slide 22 text

Domains
Book, Search, Pay, Recommendation, Voucher

Slide 23

Slide 23 text

Domains - Bounded Contexts
Book, Search, Pay, Recommendation, Voucher

Slide 24

Slide 24 text

Distributed Systems
Your Consensus is a House of Cards

Slide 25

Slide 25 text

Consensus Systems are Great 🖥💥 🔫
● HA/Clustering prior to consensus systems
  ○ Heartbeats with serial cable
  ○ DRBD/GFS
  ○ STONITH Hardware
● Complex HA machinery was often the cause of outages

Slide 26

Slide 26 text

Safe Coordination in Distributed Systems
● Systems need to agree on a single truth
● Consensus Protocols
● L. Lamport: The Part-Time Parliament, 1998
● Simple example: Raft (Consul, etcd)
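
Majority-based consensus protocols such as Raft make progress only while a majority of the cluster can communicate. The arithmetic can be sketched as follows; the function names are chosen for this example.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of nodes that may fail while a majority remains."""
    return cluster_size - quorum(cluster_size)
```

A cluster of four tolerates the same single failure as a cluster of three, which is why odd cluster sizes are the usual choice.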

Slide 27

Slide 27 text

However: Murphy’s Law
"Anything that can go wrong will [eventually] go wrong"
We take a lot of things for granted + there are unknown unknowns.

Slide 28

Slide 28 text

Scenario 1: DockerHub
● Recently introduced rate limits
  ○ Urgent rollback, 3am
  ○ Node cannot pull redis:latest 🙀
● DNS Load Balancing
● DNS transport is UDP
● UDP packets are limited in size
● Per spec, DNS over UDP allows <= 512 bytes

Slide 29

Slide 29 text

Scenario 1: DockerHub, cont.
● DNS responses > 512 bytes fall back to TCP
  ○ Your sysadmin might not know this
  ○ Security Group blocks tcp/53
● Not all resolvers are alike / agree on the spec
  ○ Glibc “salvages” truncated DNS messages
  ○ Golang DNS resolver (Docker) does not
  ○ Quick fix: CGO_ENABLED=1
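
The TCP fallback is signalled by the "truncated" (TC) bit in the DNS header: a resolver that sees it is expected to retry the query over TCP. A minimal sketch that inspects a raw DNS message, assuming classic DNS over UDP without EDNS0; the function name is chosen for this example.

```python
import struct

TC_FLAG = 0x0200       # "truncated" bit in the DNS header flags (RFC 1035)
DNS_HEADER_LENGTH = 12  # bytes


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message asks the client to retry over TCP."""
    if len(message) < DNS_HEADER_LENGTH:
        raise ValueError("a DNS message starts with a 12-byte header")
    _message_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)
```

If tcp/53 is blocked, the retry never succeeds, and what happens next depends on whether the resolver uses or discards the truncated answer.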

Slide 30

Slide 30 text

Scenario 2: DNS, again (it’s always DNS)
● Our J2EE service is stuck in an exception loop
  ○ Logs a lot of large stack traces (lots of lines)
● Engineers integrate cool .io SaaS for tailing logs in Logstash
  ○ Every line is a request to the cool .io data sink
  ○ For every line a hostname is resolved
● Cloud provider disapproves, starts rate-limiting DNS for the service’s node
● K8S api-server/node communication is affected
  ○ Node is marked as broken
  ○ Scheduler moves the ever-crashing service to a fresh, healthy node
● Repeat
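
One possible mitigation for the per-line lookups is to cache resolutions for a short time. This is a sketch under assumptions, not what was deployed in the scenario: `CachingResolver`, its `resolve` callback and the TTL handling are hypothetical, and the cache ignores the TTL of the actual DNS record.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Cache hostname lookups instead of resolving once per log line."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and cached[1] > now:
            return cached[0]
        address = self._resolve(host)  # only reached on a miss or after expiry
        self._cache[host] = (address, now + self._ttl)
        return address
```

Reusing one connection to the data sink or batching log lines would reduce the lookups in a similar way.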

Slide 31

Slide 31 text

Scenario 3: Seemingly unlimited resources
● Nov. 25th Kinesis outage
  ○ Every node connects with every other node
  ○ After scaling, the thread count exceeded threads-max
● File Handles
  ○ Some workloads do not properly close TCP/IP connections
  ○ Intermediate proxies have to arbitrarily terminate
  ○ (Old) user-land kube-proxy leaked goroutines & file handles
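
When every node talks to every other node, the per-node resource need grows with the fleet size, so adding capacity can push each node over an OS limit. A back-of-the-envelope sketch, assuming one thread per peer connection; the function names and the `baseline_threads` parameter are hypothetical.

```python
def threads_per_node(fleet_size: int, baseline_threads: int = 0) -> int:
    """Threads one node needs if it keeps one thread per peer (assumption)."""
    return baseline_threads + (fleet_size - 1)


def largest_safe_fleet(threads_max: int, baseline_threads: int = 0) -> int:
    """Largest fleet size at which a node still stays within threads_max."""
    return threads_max - baseline_threads + 1
```

The limit is hit on every node at the same moment, which is what turns a routine scale-up into a fleet-wide failure.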

Slide 32

Slide 32 text

Observability
How to X-Ray a hairball

Slide 35

Slide 35 text

Tailor towards audience
Example:
- 24x7
- the engineering teams
- Management
- End customers

Slide 36

Slide 36 text

Service Level Objectives
Intuition, experience, and an understanding of what engineers know about the services they serve are used to define
- service level indicators (SLIs),
- objectives (SLOs),
- and agreements (SLAs).
Source: SRE Book - Service Level Objectives

Slide 37

Slide 37 text

Guidance - The Four Golden Signals
- request latency: request response time and/or timeout rate
- error rate: proportion of service errors
- traffic / system throughput: typically measured in requests per second
- availability: what’s the uptime of a service
- saturation: measures the system fraction, emphasizing the resources that are most constrained (e.g., in a memory-constrained system, show memory; in an I/O-constrained system, show I/O). I have seen systems degrade before being fully saturated, e.g. 90% CPU utilization already triggered a service degradation.
Source: SRE Book - The Four Golden Signals
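
Two of these signals can be computed directly from request samples and compared against an objective. A minimal sketch, assuming in-memory samples and the nearest-rank percentile method; the `Request` type, the function names and the thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    latency_ms: float
    ok: bool


def error_rate(requests: List[Request]) -> float:
    """Proportion of failed requests."""
    return sum(not r.ok for r in requests) / len(requests)


def latency_percentile(requests: List[Request], percentile: float) -> float:
    """Nearest-rank percentile of the observed latencies."""
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_slo(requests: List[Request], max_error_rate: float, p99_ms: float) -> bool:
    """Check the indicators (SLIs) against the objective (SLO)."""
    return (error_rate(requests) <= max_error_rate
            and latency_percentile(requests, 99) <= p99_ms)
```

In production these values usually come from a metrics system over a rolling window instead of a list in memory.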

Slide 38

Slide 38 text

Results

Slide 39

Slide 39 text

Results

Slide 40

Slide 40 text

Questions please
