Failure Detection of Microservices with Test and Zipkin

Hiroyuki Ito
LINE SET TF TF Leader
https://linedevday.linecorp.com/jp/2019/sessions/B2-5

LINE DevDay 2019

November 21, 2019

Transcript

  1. 2019 DevDay Failure Detection of Microservices With Test and Zipkin

    > Hiroyuki Ito > LINE SET TF TF Leader
  2. Key Message: A Proposal for the Era of Microservices

    Detecting failures more quickly with test automation
    over solving almost all bugs before release
  3. “Build Quality In” Is the Foundation

  4. MSA Is Expanding Continuously

    MSA = Microservice Architecture
    Resilience is a premise of MSA: we cannot foresee all bugs beforehand
  5. We Can Use Test Automation For…

    > Reducing MTTR (Mean Time To Repair)
    > Contributing to our business
    > Achieving and enhancing the resilience of Microservices
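For reference, MTTR is usually computed as the mean, over N incidents, of the time from failure to recovery. The second form below splits each interval into detection time d_i, analysis time a_i, and repair time r_i. That breakdown is a common one and is not taken from the talk. It shows why detecting failures sooner and pinpointing the failing Microservice both shorten MTTR:

```latex
\mathrm{MTTR}
  = \frac{1}{N}\sum_{i=1}^{N}\left(t_i^{\mathrm{recovered}} - t_i^{\mathrm{failed}}\right)
  = \frac{1}{N}\sum_{i=1}^{N}\left(d_i + a_i + r_i\right)
```

In these terms, periodic test automation mainly shrinks d_i, and tracing with Zipkin mainly shrinks a_i.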
  6. Agenda

    > Background / Challenges
    > Failure Detection with Test Automation
    > Make API Tests Easy with Karate
    > Reduce MTTR with Zipkin
  7. Background / Challenges

  8. About 2,500 Microservices Behind our Products

  9. Increase in Outages, Especially at Integration Points

    > Growth and expansion of our business
    > Growth and expansion of our Microservices
    > Rapid increase in integration points
  10. Things Became Difficult

    > Teams and Product Managers couldn't respond to users / customers quickly, and we lost money
    > Hard to detect failures
    > Hard to distinguish which Microservices caused failures
  11. Lots of Confusion

    How to Test?
     > Do we need a UI to test an API?
     > Most people didn't know we can test APIs programmatically
    Actions
     > How to detect failures?
     > How to recover services quickly?
     > How to contribute to the business / users / customers?
    Lack of a Big Picture Due to Independent Developability and Deployability
     > Each team tests only its own Microservice(s)
     > Lack of testing across teams / silos
  12. SET = "Software Engineer in Test"

    Responsible for process improvements in each product development team
     > With automation techniques (Test Automation and DevOps)
     > With Agile methodologies
  13. Experience Report At Channel Gateway

  14. System Image of Channel Gateway

    [Diagram showing User, Channel Gateway, Game, Store, and Others; an illustrative image only]
  15. Failure Detection With Test Automation

  16. Improve MTTR for Microservices

    > Implemented API test scripts with JUnit and Spring Boot for developer testing
    > Notified failures and recoveries via a chat channel
    > Ran them periodically via a CI server
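The following is a minimal sketch of the "notify failures and recovery" step. It assumes each periodic CI run produces one pass/fail result per API check. The names `CheckResult` and `TransitionNotifier` are hypothetical, and so is the `send` callback, which stands in for a chat-channel webhook client. None of this is the team's actual code. The notifier posts only when a check changes state, so the channel does not repeat the same failure on every run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class CheckResult:
    """Outcome of one API check in one CI run (hypothetical structure)."""
    name: str          # e.g. "get-user-profile"
    passed: bool
    detail: str = ""   # e.g. "HTTP 500" when the check failed


@dataclass
class TransitionNotifier:
    """Posts to a chat channel only when a check changes state."""
    send: Callable[[str], None]                      # hypothetical chat sender (webhook client)
    failing: Set[str] = field(default_factory=set)   # checks currently known to fail

    def report(self, results: List[CheckResult]) -> None:
        for r in results:
            was_failing = r.name in self.failing
            if not r.passed and not was_failing:
                # ok -> failing: announce the new failure once
                self.failing.add(r.name)
                self.send(f"[FAILURE] {r.name}: {r.detail}")
            elif r.passed and was_failing:
                # failing -> ok: announce the recovery
                self.failing.discard(r.name)
                self.send(f"[RECOVERED] {r.name}")
            # unchanged state: stay quiet to avoid repeated alerts
```

On a CI server, each run is usually a fresh process. In practice, the set of failing checks would therefore need to be persisted between runs, for example in a file. This sketch keeps it in memory.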
  17. Example of JUnit

  18. Results (Good)

    > Failure detection worked
    > Some developers started implementing API tests
    > Detected some infrastructure vulnerabilities (unintentionally)
  19. Results (Need to Improve)

    Hard to Analyze Root Causes and Distinguish Which Microservices Caused Failures
     > Couldn't achieve quick recovery
    Tests Didn't Become Established
     > Few developers maintained them
     > Hard for developers and the Product Manager to read and implement
    Left to Developers' Own Choice
     > How to analyze root causes?
     > How to distinguish which Microservices caused failures?
  20. Make API Tests Easy with Karate

  21. JUnit and Spring Boot -> Karate Framework

    > Can implement tests with less code
    > Easy for developers and the Product Manager to read, implement, and maintain
    > BDD (Behavior-Driven Development) style
    Worked with the Product Development Team
     > Taught how to write proper tests, with examples
     > Added / updated Karate features based on real needs
  22. Example of Karate

  23. Example of JUnit

  24. Results (Good)

    > Reduced outages (30-50%)
    > Even the Product Manager started writing tests
    > The product development team became self-organized
  25. Results (Need to Improve)

    > Still couldn't easily tell which Microservices failed

  26. Reduce MTTR With Zipkin

  27. What Is Zipkin?

    A distributed tracing system (OSS)
     > Can visualize the latency of each Microservice
     > Can visualize dependencies among Microservices
     > Can detect failures of Microservices via trace IDs
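The function below sketches how a trace can point to the failing Microservice. It takes the spans of one trace in Zipkin's v2 JSON format, for example as returned by the `GET /api/v2/trace/{traceId}` endpoint, and applies a simple heuristic: an error span with no deeper error span beneath it is the most likely origin of the failure. The sketch assumes failures are marked with Zipkin's conventional `error` tag. The function names and sample service names are hypothetical, and this is not how Sebas-Report is implemented.

```python
from typing import Dict, List, Set


def service_of(span: Dict) -> str:
    """Service name recorded by the tracer that produced the span."""
    return span.get("localEndpoint", {}).get("serviceName", "unknown")


def has_error(span: Dict) -> bool:
    """Zipkin v2 convention: a span carrying an "error" tag failed."""
    return "error" in span.get("tags", {})


def likely_failing_services(spans: List[Dict]) -> Set[str]:
    """Services owning error spans that have no error span beneath them."""
    errors = [s for s in spans if has_error(s)]
    origins: Set[str] = set()
    for s in errors:
        error_below = any(
            t is not s
            and (
                t.get("parentId") == s["id"]  # an erroring child span
                # shared-span model: the callee's SERVER span reuses the
                # caller's CLIENT span id
                or (s.get("kind") == "CLIENT"
                    and t.get("kind") == "SERVER"
                    and t["id"] == s["id"])
            )
            for t in errors
        )
        if not error_below:
            origins.add(service_of(s))
    return origins
```

A Microservice that is not instrumented produces no spans. In that case, the deepest instrumented caller is reported instead of the real origin.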
  28. Sebas Family, Powered by Zipkin

    Sebas-Bot: Chat Bot
     > Can trigger tests via a chat channel
     > No need to access the CI server to change configurations
    Sebas-Report: Intelligent Test Reports
     > An extension of Karate test reports
     > Added Zipkin trace IDs to track Microservices
     > Rewrote the report feature with Vue.js
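This sketch shows one possible way for a test report to carry Zipkin trace IDs. It rests on three assumptions:

- The test starts the trace itself by sending B3 propagation headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-Sampled`).
- The Microservice under test is instrumented with a tracer that joins incoming B3 traces.
- The Zipkin UI serves traces under `/zipkin/traces/{traceId}`.

The helper names are hypothetical. The talk does not say how Sebas-Report obtains trace IDs.

```python
import secrets
from dataclasses import dataclass
from typing import Dict


@dataclass
class TracedRequest:
    trace_id: str
    headers: Dict[str, str]


def new_b3_headers(sampled: bool = True) -> TracedRequest:
    """Start a new trace for one test request using B3 multi-header propagation."""
    trace_id = secrets.token_hex(16)  # 128-bit trace ID as 32 lower-case hex chars
    span_id = secrets.token_hex(8)    # 64-bit span ID as 16 lower-case hex chars
    headers = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",  # "1" asks tracers to record the trace
    }
    return TracedRequest(trace_id, headers)


def zipkin_trace_url(zipkin_base: str, trace_id: str) -> str:
    """Report link to the trace; assumes the UI serves traces under /zipkin/traces/."""
    return f"{zipkin_base.rstrip('/')}/zipkin/traces/{trace_id}"


def report_entry(test_name: str, passed: bool, req: TracedRequest, zipkin_base: str) -> Dict:
    """One row of a hypothetical test report, linking the result to its trace."""
    return {
        "test": test_name,
        "passed": passed,
        "traceId": req.trace_id,
        "traceUrl": zipkin_trace_url(zipkin_base, req.trace_id),
    }
```

A test would send `req.headers` with its HTTP call and store the entry from `report_entry`. A failed row in the report then leads directly to the trace that shows which Microservice failed.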
  32. Example Trace

    > delete /todo/{username}/{todoitemid} [34.037ms]
    > delete /todo/{username}/{todoitemid} [39.239ms]

  33. Results (Good)

    > Could easily tell which Microservices failed
    > 1 week -> 1-2 hours
    > Resilience: can recover services quickly
  34. Results (Need to Improve)

    > Add / expand Zipkin to all related Microservices

  36. Improved Dramatically

    Became a Self-Organized / Self-Running Team
     > Can define goals and milestones
     > Can prioritize which tests to implement based on the number of outages and their impact on users
    Became a Model Team with Guidelines and Reference Implementations
    Reduced Outages Dramatically (30-50%)
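The prioritization on this slide can be sketched as a simple score. The sketch assumes each candidate API has a count of past outages and a hypothetical 1-5 user-impact rating. Multiplying the two is an illustrative choice, not the team's documented formula.

```python
from typing import List, NamedTuple


class Target(NamedTuple):
    api: str          # hypothetical API / integration point name
    outages: int      # number of past outages involving this API
    user_impact: int  # hypothetical 1-5 rating of impact on users


def prioritize(targets: List[Target]) -> List[Target]:
    """Order test targets by outages x user impact, highest first.

    sorted() is stable even with reverse=True, so ties keep their input order.
    """
    return sorted(targets, key=lambda t: t.outages * t.user_impact, reverse=True)
```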
  37. A Product Manager Can Write Code to Contribute to the Product!

    > Writing production code may hurt the team
    > Writing test scripts doesn't hurt the team and can contribute to it
    > Good for learning the behavior of current products and for teaching members how to clarify requirements
  38. Expansion

    > Some teams started using Karate (including Messaging API)
    > Holding boot camp / hackathon events
  39. Future Plan

    Sebas-Bot
     > Expand users
    Sebas-Report
     > Expand Zipkin to all related Microservices
     > Detect problematic Microservices
     > Generate test scripts
    Use them for process improvement in product development teams
  40. Conclusion

  41. Key Message Again: A Proposal for the Era of Microservices

    Detecting failures more quickly with test automation
    over solving almost all bugs before release
  42. Use Technology to Improve the Business

    > Lead process improvements in each product development team
    > Tame the complexity of Microservices
  43. Test Automation Is NOT Only For Quality Assurance!

  44. Innovate New Ways With Technical Excellence!

  45. Thank You