Slide 1

Slide 1 text

2019 DevDay Failure Detection of Microservices With Test and Zipkin
 > Hiroyuki Ito
 > LINE SET TF TF Leader

Slide 2

Slide 2 text

Key Message: A Proposal in the Era of Microservices
 > Detecting failures more quickly with Test Automation
 > over fixing almost all bugs before release

Slide 3

Slide 3 text

“Build Quality in” Is the Basics

Slide 4

Slide 4 text

MSA Is Expanding Continuously
 > MSA = Microservice Architecture
 > Resilience is a premise of MSA
 > We cannot foresee all bugs beforehand

Slide 5

Slide 5 text

We Can Utilize Test Automation for…
 > reducing MTTR (Mean Time to Repair)
 > contributing to our business
 > achieving and enhancing resiliency of Microservices
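
To make the MTTR goal concrete, here is a minimal sketch of how the metric could be computed from outage records. It assumes MTTR is measured from the start of a failure to the restoration of service (so slow detection inflates it); the `Outage` and `MttrCalculator` names are hypothetical and do not come from the talk.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** One outage: when the failure started and when service was restored (hypothetical type). */
final class Outage {
    final Instant startedAt;
    final Instant recoveredAt;

    Outage(Instant startedAt, Instant recoveredAt) {
        this.startedAt = startedAt;
        this.recoveredAt = recoveredAt;
    }

    /** Time from failure start to recovery; detection delay is part of it. */
    Duration timeToRepair() {
        return Duration.between(startedAt, recoveredAt);
    }
}

final class MttrCalculator {
    /** MTTR = total time to repair / number of outages. */
    static Duration meanTimeToRepair(List<Outage> outages) {
        if (outages.isEmpty()) {
            return Duration.ZERO; // nothing to average
        }
        Duration total = Duration.ZERO;
        for (Outage outage : outages) {
            total = total.plus(outage.timeToRepair());
        }
        return total.dividedBy(outages.size());
    }
}
```

Because detection time is included in each outage's duration under this assumption, detecting failures sooner lowers MTTR even when the fix itself takes just as long.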

Slide 6

Slide 6 text

Agenda
 > Background / Challenges
 > Failure Detection with Test Automation
 > Make API Tests Easy with Karate
 > Reduce MTTR with Zipkin

Slide 7

Slide 7 text

Background / Challenges

Slide 8

Slide 8 text

About 2,500 Microservices Behind our Products

Slide 9

Slide 9 text

Increase of Outages
 > Growth and expansion of our business
 > Growth and expansion of our Microservices
 > Rapid increase of Integration Points
 > Especially at Integration Points

Slide 10

Slide 10 text

Becoming Difficult
 > Hard to detect failures
 > Hard to distinguish which Microservices caused failures
 > Teams and Product Managers couldn't respond to users / customers quickly and lost money

Slide 11

Slide 11 text

Lots of Confusion
 How To Test?
 > Need UI to test API?
 > Most of them didn't know we can test APIs in a programmatic way
 Actions
 > How to detect failures?
 > How to recover services quickly?
 > How to contribute to business / users / customers?
 Lack of a Whole Picture due to Independent Developability and Deployability
 > One team tests only its own Microservice(s)
 > Lack of testing beyond teams / silos

Slide 12

Slide 12 text

SET = "Software Engineer In Test"
 > Responsible for process improvements in each product development team
 > With automation techniques (Test Automation and DevOps)
 > With Agile methodologies

Slide 13

Slide 13 text

Experience Report At Channel Gateway

Slide 14

Slide 14 text

System Image of Channel Gateway
 (Diagram showing User, Channel Gateway, Game, Store, and Others. This is an illustrative image.)

Slide 15

Slide 15 text

Failure Detection With Test Automation

Slide 16

Slide 16 text

Improve MTTR for Microservices
 > Implement API Test scripts with JUnit and Spring Boot for Developer Testing
 > Run them periodically via CI Server
 > Notify failures and recovery via chat channel
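
The "notify failures and recovery" step can be sketched as a small state machine that turns periodic test results into chat messages only when the state changes. This is an illustrative sketch, not the talk's implementation: the `FailureNotifier` class, the message wording, and the assumption that a service starts out healthy are all hypothetical, and actually posting to a chat channel is left out.

```java
import java.util.Optional;

/** Turns the results of periodic API test runs into failure / recovery messages. */
final class FailureNotifier {
    private final String serviceName;
    private boolean lastRunPassed = true; // assumption: the service starts healthy

    FailureNotifier(String serviceName) {
        this.serviceName = serviceName;
    }

    /**
     * Records one scheduled test run. A message is returned only on a
     * pass -> fail or fail -> pass transition, so an ongoing outage does
     * not flood the channel with repeated alerts.
     */
    Optional<String> record(boolean passed) {
        Optional<String> message = Optional.empty();
        if (lastRunPassed && !passed) {
            message = Optional.of("FAILURE: " + serviceName + " API test failed");
        } else if (!lastRunPassed && passed) {
            message = Optional.of("RECOVERED: " + serviceName + " API test passes again");
        }
        lastRunPassed = passed;
        return message;
    }
}
```

A CI job that runs the tests on a schedule would call `record` once per run and forward any returned message to the chat channel.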

Slide 17

Slide 17 text

Example of JUnit
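
The code shown on this slide is not preserved in the transcript. The sketch below is only a stand-in for the kind of API-level check the talk describes. It uses the JDK's own `HttpClient` rather than JUnit and Spring Boot, and the `/health` endpoint, its JSON body, and the local stub server are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

final class ApiCheck {
    /** Returns true when GET url answers 200 and the body contains expectedFragment. */
    static boolean passes(String url, String expectedFragment) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 && response.body().contains(expectedFragment);
        } catch (IOException e) {
            return false; // connection errors and timeouts count as a failed check
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Starts a throwaway local stub with a hypothetical /health endpoint and checks it. */
    static boolean demoAgainstLocalStub() {
        HttpServer server;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/health", exchange -> {
            byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/health";
            return passes(url, "\"status\":\"UP\"");
        } finally {
            server.stop(0);
        }
    }
}
```

In a real test suite the same request and assertions would sit inside a test method and target a deployed Microservice instead of a local stub.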

Slide 18

Slide 18 text

Results (Good)
 > Failure detection worked
 > Some developers started implementing API tests
 > Detected some infrastructure vulnerabilities (unintentionally)

Slide 19

Slide 19 text

Results (Need To Improve)
 Hard To Analyze Root Cause and Identify Which Microservices Caused Failures
 > Couldn't achieve quick recovery
 Didn't Become Established
 > Few developers maintained them
 > Hard to read and implement for developers and a Product Manager
 Left to Developers' Own Choice
 > How to analyze root cause?
 > How to distinguish which Microservices caused failures?

Slide 20

Slide 20 text

Make API Tests Easy
 With Karate

Slide 21

Slide 21 text

JUnit and Spring Boot -> Karate Framework
 > Can implement with less code
 > Easy to read, implement, and maintain for developers and the Product Manager
 > BDD (Behavior-Driven Development) style
 Worked With the Product Development Team
 > Taught how to write proper tests with examples
 > Added / updated Karate features based on real needs

Slide 22

Slide 22 text

Example of Karate

Slide 23

Slide 23 text

Example of JUnit

Slide 24

Slide 24 text

Results (Good)
 > Reduced outages (30 - 50%)
 > Even the Product Manager started writing tests
 > The product development team became self-organized

Slide 25

Slide 25 text

Results (Need To Improve)
 > Couldn't easily tell which Microservices failed

Slide 26

Slide 26 text

Reduce MTTR With Zipkin

Slide 27

Slide 27 text

What Is Zipkin?
 Distributed Tracing System
 > Can visualize latency of each Microservice
 > Can visualize dependency among Microservices
 > Can detect failures of Microservices with Trace ID
 > OSS
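
Zipkin-instrumented services commonly carry the trace ID between Microservices in B3 propagation headers. The sketch below shows how a test client could start a new trace so that every downstream call shares one trace ID. Whether the services in the talk used the B3 multi-header format is an assumption, and the `B3Headers` class is hypothetical.

```java
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;

final class B3Headers {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns the given number of random bytes as lower-case hex (two characters per byte). */
    static String randomHex(int byteCount) {
        byte[] buffer = new byte[byteCount];
        RANDOM.nextBytes(buffer);
        StringBuilder hex = new StringBuilder(byteCount * 2);
        for (byte b : buffer) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Builds B3 multi-header values that start a new, sampled trace. */
    static Map<String, String> newTrace() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", randomHex(16)); // 128-bit trace ID, 32 hex characters
        headers.put("X-B3-SpanId", randomHex(8));   // 64-bit span ID, 16 hex characters
        headers.put("X-B3-Sampled", "1");           // ask tracers to record this trace
        return headers;
    }
}
```

A test that sends these headers with its request can log the `X-B3-TraceId` value, which is what later allows a failed test to be matched to the spans of each Microservice it touched.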

Slide 28

Slide 28 text

Sebas Family Powered by Zipkin
 Sebas-Bot: Chat-Bot
 > Can kick tests via chat channel
 > Don't need to access the CI Server to change configurations
 Sebas-Report: Intelligent Test Reports
 > Expansion of Karate Test Reports
 > Added Zipkin's trace ID to track Microservices
 > Rewrote report feature with Vue.js
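
As a rough idea of what attaching a trace ID to a test report can look like, the sketch below formats one report line with a link to Zipkin's v2 API endpoint for a single trace (`/api/v2/trace/{traceId}`). It is not Sebas-Report's code: the `TraceReport` class, the line format, and the Zipkin base URL passed in by the caller are hypothetical.

```java
final class TraceReport {
    /** Builds the Zipkin v2 API URL that returns the spans of one trace. */
    static String traceApiUrl(String zipkinBaseUrl, String traceId) {
        // Zipkin trace IDs are 64-bit or 128-bit values written as lower-case hex.
        if (!traceId.matches("[0-9a-f]{16}|[0-9a-f]{32}")) {
            throw new IllegalArgumentException("not a lower-hex trace ID: " + traceId);
        }
        String base = zipkinBaseUrl.endsWith("/")
                ? zipkinBaseUrl.substring(0, zipkinBaseUrl.length() - 1)
                : zipkinBaseUrl;
        return base + "/api/v2/trace/" + traceId;
    }

    /** Formats one test result so a reader can jump from the report to the trace. */
    static String reportLine(String scenario, boolean passed, String zipkinBaseUrl, String traceId) {
        String status = passed ? "PASSED" : "FAILED";
        return status + " " + scenario + " trace=" + traceApiUrl(zipkinBaseUrl, traceId);
    }
}
```

With such a link next to each failed scenario, whoever reads the report can open the trace and see which Microservice in the call chain returned the error.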

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Delete /todo/{username}/{todoitemid} [34.037ms]
 delete /todo/{username}/{todoitemid} [39.239ms]

Slide 33

Slide 33 text

Results (Good)
 > Could easily tell which Microservices failed
 > 1 week -> 1 - 2 hours
 > Resilience: Can recover services quickly

Slide 34

Slide 34 text

Results (Need To Improve)
 > Add/expand Zipkin to all related Microservices

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Improved Dramatically
 Became a Self-Organized/Self-Running Team
 > Can define goals and milestones
 > Can prioritize targets to implement tests based on # of outages and impacts to users
 Became a Model Team With Guidelines and Reference Implementations
 Reduced Outages Dramatically (30 - 50%)

Slide 37

Slide 37 text

Product Manager Can Write Code
 For contributing to the product!
 > Writing production code may hurt the team
 > Writing test scripts doesn't hurt the team and can contribute to it
 > Good to learn behavior of current products and teach members how to clarify requirements

Slide 38

Slide 38 text

Expansion
 > Some teams started using Karate (including Messaging API)
 > Holding Boot Camp / Hackathon events

Slide 39

Slide 39 text

Future Plan
 Sebas-Bot
 > Expand Users
 Sebas-Report
 > Expand Zipkin to all related Microservices
 > Detect problematic Microservices
 > Generate test scripts
 Utilize Them for Process Improvement for Product Development Teams

Slide 40

Slide 40 text

Conclusion

Slide 41

Slide 41 text

Key Message, Again: A Proposal in the Era of Microservices
 > Detecting failures more quickly with Test Automation
 > over fixing almost all bugs before release

Slide 42

Slide 42 text

Utilize technology for improving business
 Lead process improvements on each product development team
 Tame complexity of Microservices

Slide 43

Slide 43 text

Test Automation Is NOT Only For Quality Assurance!

Slide 44

Slide 44 text

Innovate New Ways With Technical Excellence!

Slide 45

Slide 45 text

Thank You