Failure Detection of Microservices with Test and Zipkin

Hiroyuki Ito
LINE SET TF, TF Leader
https://linedevday.linecorp.com/jp/2019/sessions/B2-5

LINE DevDay 2019

November 21, 2019

Transcript

  1. As a Key Message: A Proposal for the Era of Microservices
    Detecting failures more quickly over solving almost all bugs before release, with test automation
  2. MSA Is Expanding Continuously
    MSA = Microservice Architecture
    Resilience is the premise of MSA: we cannot foresee all bugs beforehand
  3. We Can Utilize Test Automation For…
    > reducing MTTR (Mean Time to Repair)
    > contributing to our business
    > achieving and enhancing resiliency of Microservices
  4. Agenda
    > Background / Challenges
    > Failure Detection with Test Automation
    > Make API Tests Easy with Karate
    > Reduce MTTR with Zipkin
  5. Increase of Outages, Especially at Integration Points
    > Growth and expansion of our business
    > Growth and expansion of our Microservices
    > Rapid increase of integration points
  6. Becoming Difficult
    > Hard to detect failures
    > Hard to distinguish which Microservices caused failures
    Teams and Product Managers couldn't respond to users / customers quickly, and we lost money
  7. Lots of Confusion
    How to test?
    > Do we need a UI to test an API?
    > Most people didn't know that APIs can be tested programmatically
    Actions
    > How to detect failures?
    > How to recover services quickly?
    > How to contribute to the business / users / customers?
    Lack of a big picture due to independent developability and deployability
    > Each team tests only its own Microservice(s)
    > No testing across teams / silos
  8. SET = "Software Engineer in Test"
    Responsible for process improvements in each product development team
    > with automation techniques (test automation and DevOps)
    > with Agile methodologies
  9. Improve MTTR for Microservices
    > Implement API test scripts with JUnit and Spring Boot, as used for developer testing
    > Run them periodically via a CI server
    > Notify failures and recoveries via a chat channel
  10. Results (Good)
    > Failure detection worked
    > Some developers started implementing the tests themselves
    > Detected some infrastructure vulnerabilities (unintentionally)
  11. Results (Need to Improve)
    Hard to analyze the root cause and distinguish which Microservices caused failures
    > Couldn't achieve quick recovery
    Didn't become established
    > Few developers maintained the tests
    > Hard for developers and the Product Manager to read and implement
    Left to developers' own judgment
    > How to analyze the root cause?
    > How to distinguish which Microservices caused failures?
  12. JUnit and Spring Boot -> Karate Framework
    > Can implement tests with less code
    > Easy for developers and the Product Manager to read, implement, and maintain
    > BDD (Behavior-Driven Development) style
    Worked with the product development team
    > Taught how to write proper tests with examples
    > Added / updated Karate features based on real needs
  13. Results (Good)
    > Reduced outages (30 - 50%)
    > Even the Product Manager started writing tests
    > The product development team became self-organized
  14. What Is Zipkin?
    A distributed tracing system
    > Can visualize the latency of each Microservice
    > Can visualize dependencies among Microservices
    > Can detect failures of Microservices by trace ID
    > OSS
  15. Sebas Family, Powered by Zipkin
    Sebas-Bot: Chat Bot
    > Can trigger tests via a chat channel
    > No need to access the CI server to change configurations
    Sebas-Report: Intelligent Test Reports
    > Extension of Karate test reports
    > Added Zipkin's trace ID to track Microservices
    > Rewrote the report feature with Vue.js
  16. Results (Good)
    > Could easily tell which Microservices failed: 1 week -> 1 - 2 hours
    > Resilience: can recover services quickly
  17. Improved Dramatically
    Became a self-organized / self-running team
    > Can define goals and milestones
    > Can prioritize which targets to test based on the number of outages and their impact on users
    Became a model team, with guidelines and reference implementations
    Reduced outages dramatically (30 - 50%)
  18. The Product Manager Can Write Code, for Contributing to the Product!
    > Writing production code may hurt the team
    > Writing test scripts doesn't hurt the team and can contribute to it
    > Good for learning the behavior of current products and teaching members how to clarify requirements
  19. Future Plan
    Utilize them for process improvement in product development teams
    Sebas-Bot
    > Expand users
    Sebas-Report
    > Expand Zipkin to all related Microservices
    > Detect problematic Microservices
    > Generate test scripts
  20. As a Key Message Again: A Proposal for the Era of Microservices
    Detecting failures more quickly over solving almost all bugs before release, with test automation
  21. Utilize technology to improve the business
    > Lead process improvements in each product development team
    > Tame the complexity of Microservices
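Slide 3 names reducing MTTR as a goal of test automation. As a minimal sketch, assuming MTTR is computed as the average time from when a failure started to when service was restored (definitions vary; some teams start the clock at detection instead), the arithmetic looks like this. The function name and the incident data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_repair(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean Time to Repair: average of (recovered_at - started_at) over incidents.

    Each incident is a (started_at, recovered_at) pair. Measuring from the
    moment the failure started (not from when someone noticed it) means that
    faster detection directly lowers MTTR.
    """
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    total = sum((recovered - started for started, recovered in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents, for illustration only.
incidents = [
    (datetime(2019, 11, 1, 10, 0), datetime(2019, 11, 1, 12, 0)),  # 2 h outage
    (datetime(2019, 11, 5, 9, 0), datetime(2019, 11, 5, 10, 0)),   # 1 h outage
]
print(mean_time_to_repair(incidents))  # 1:30:00
```

Starting the clock at failure onset is what lets quicker detection, the talk's key message, show up directly in the number.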
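Slide 9 describes API tests that run periodically on a CI server and post failures and recoveries to a chat channel. The sketch below covers only the notification decision, assuming a hypothetical `TransitionNotifier` helper that remembers the last result of each check; the real chat integration and CI scheduling are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TransitionNotifier:
    """Decide when a periodic API check should post to chat (hypothetical helper).

    Only state *changes* produce a message: passing -> failing gives a failure
    alert, failing -> passing gives a recovery notice, and repeated results
    stay quiet so a job that runs every few minutes does not spam the channel.
    """

    last_ok: Dict[str, bool] = field(default_factory=dict)

    def observe(self, check: str, ok: bool) -> Optional[str]:
        previous = self.last_ok.get(check)
        self.last_ok[check] = ok
        if previous is None:
            # First result seen for this check: alert only if it already fails.
            return None if ok else f"[FAILURE] {check}"
        if previous and not ok:
            return f"[FAILURE] {check}"
        if not previous and ok:
            return f"[RECOVERY] {check}"
        return None  # unchanged state, nothing to post


# Simulated results of four scheduled CI runs for one API check.
notifier = TransitionNotifier()
for ok in [True, False, False, True]:
    message = notifier.observe("GET /users", ok)
    if message:
        print(message)  # a real job would send this to the chat channel instead
```

Reporting only transitions is a design choice: it keeps both the failure and the recovery visible while avoiding a repeated alert on every scheduled run.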
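Slide 12 credits Karate with needing less code. Much of that comes from its `match` keyword with fuzzy-match markers such as `'#number'` and `'#string'`, which lets a whole response be checked in one line like `match response == { id: '#number', name: '#string' }`. To show the idea without depending on Karate, here is a simplified Python re-implementation of a few markers; `fuzzy_match` and the sample response are hypothetical, and real Karate supports many more markers and options.

```python
from typing import Any, Callable, Dict

# A few of Karate's fuzzy-match markers, re-implemented only to illustrate
# the idea; this is a simplified sketch, not how Karate itself works.
MARKERS: Dict[str, Callable[[Any], bool]] = {
    "#string": lambda v: isinstance(v, str),
    "#number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "#boolean": lambda v: isinstance(v, bool),
    "#array": lambda v: isinstance(v, list),
    "#object": lambda v: isinstance(v, dict),
    "#notnull": lambda v: v is not None,
}


def fuzzy_match(actual: Any, expected: Any) -> bool:
    """Compare a parsed JSON response against an expected shape."""
    if isinstance(expected, str) and expected in MARKERS:
        return MARKERS[expected](actual)
    if isinstance(expected, dict):
        # Simplification: keys must match exactly (no optional-key markers).
        return (
            isinstance(actual, dict)
            and actual.keys() == expected.keys()
            and all(fuzzy_match(actual[k], expected[k]) for k in expected)
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(actual) == len(expected)
            and all(fuzzy_match(a, e) for a, e in zip(actual, expected))
        )
    return actual == expected  # literal values must be equal


# Hypothetical response body from an API under test.
response = {"id": 42, "name": "Brown", "friends": ["Cony"], "active": True}
print(fuzzy_match(response, {"id": "#number", "name": "#string",
                             "friends": "#array", "active": "#boolean"}))  # True
```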
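Slide 14 says Zipkin can visualize dependencies among Microservices and help detect failing ones. Assuming the spans of one trace have already been fetched in Zipkin's v2 JSON format (fields such as `id`, `parentId`, `localEndpoint.serviceName`, `duration` in microseconds, and `tags`, where an `error` tag marks a failed span), a rough sketch of both analyses follows. It is a simplification, not Zipkin's own dependency linker, and the trace data is made up.

```python
from typing import Dict, List, Set, Tuple

Span = Dict  # one span in Zipkin's v2 JSON model, already parsed


def service_of(span: Span) -> str:
    return span.get("localEndpoint", {}).get("serviceName", "unknown")


def failed_services(spans: List[Span]) -> Set[str]:
    """Services that recorded an `error` tag anywhere in the trace."""
    return {service_of(s) for s in spans if "error" in s.get("tags", {})}


def dependency_edges(spans: List[Span]) -> Set[Tuple[str, str]]:
    """(caller, callee) service pairs derived from parentId links.

    Simplification: assumes every span has a unique id, i.e. no client and
    server sides sharing one span id.
    """
    by_id = {s["id"]: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.get("parentId"))  # None for the root span
        if parent is not None and service_of(parent) != service_of(s):
            edges.add((service_of(parent), service_of(s)))
    return edges


# Hypothetical trace: gateway -> user-service -> profile-service (which fails).
# Durations are in microseconds, as in Zipkin.
trace = [
    {"traceId": "5af7183fb1d4cf5f", "id": "a1", "name": "get /users",
     "localEndpoint": {"serviceName": "gateway"}, "duration": 900_000},
    {"traceId": "5af7183fb1d4cf5f", "id": "b2", "parentId": "a1", "name": "get user",
     "localEndpoint": {"serviceName": "user-service"}, "duration": 700_000},
    {"traceId": "5af7183fb1d4cf5f", "id": "c3", "parentId": "b2", "name": "get profile",
     "localEndpoint": {"serviceName": "profile-service"}, "duration": 650_000,
     "tags": {"error": "500"}},
]
print(failed_services(trace))           # {'profile-service'}
print(sorted(dependency_edges(trace)))  # [('gateway', 'user-service'), ('user-service', 'profile-service')]
```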
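Slide 15 mentions adding Zipkin's trace ID to the Karate test reports. One way to obtain such an ID, sketched here as an assumption rather than how Sebas-Report actually works, is for the test to start the trace itself by sending B3 propagation headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-Sampled`) and then store the trace ID in the report, so a failed test links straight to its trace. This only works if the services under test propagate B3 headers; the Zipkin host and `ReportEntry` are hypothetical.

```python
import secrets
from dataclasses import dataclass

# Hypothetical Zipkin UI location; replace with your own server.
ZIPKIN_UI = "http://zipkin.example.com:9411/zipkin"


def new_b3_headers() -> dict:
    """B3 headers that start a new, sampled trace for one test request."""
    return {
        "X-B3-TraceId": secrets.token_hex(16),  # 128-bit id, 32 lowercase hex chars
        "X-B3-SpanId": secrets.token_hex(8),    # 64-bit id, 16 lowercase hex chars
        "X-B3-Sampled": "1",                    # ask tracers to record this trace
    }


@dataclass
class ReportEntry:
    """One row of a (hypothetical) test report, linked to its Zipkin trace."""

    test_name: str
    passed: bool
    trace_id: str

    @property
    def trace_url(self) -> str:
        return f"{ZIPKIN_UI}/traces/{self.trace_id}"


headers = new_b3_headers()
# ... send the API request under test with `headers` attached ...
entry = ReportEntry("GET /users returns 200", passed=False,
                    trace_id=headers["X-B3-TraceId"])
if not entry.passed:
    print(f"{entry.test_name} failed, see {entry.trace_url}")
```

Generating the ID on the test side means the report knows the trace ID before the request is even sent, so no lookup in Zipkin is needed to connect a failing test to the Microservices it touched.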
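The improvement on slide 16 can be put in numbers, assuming "1 week" means 168 calendar hours:

```latex
\begin{aligned}
1\ \text{week} &= 7 \times 24\ \text{h} = 168\ \text{h} \\
\frac{168\ \text{h}}{2\ \text{h}} &= 84, \qquad \frac{168\ \text{h}}{1\ \text{h}} = 168
\end{aligned}
```

So telling which Microservices failed became roughly 84 to 168 times faster; counted in working hours instead (5 × 8 = 40 h), the factor would be about 20 to 40.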
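Slide 17 mentions prioritizing which targets to test based on the number of outages and their impact on users. The talk does not give a formula, so the following is a hypothetical scoring sketch that simply multiplies an outage count by a rough user-impact figure; the service names and numbers are invented.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    service: str
    outages: int         # outages attributed to the service in some period
    users_affected: int  # rough impact per outage (hypothetical measure)


def prioritize(candidates: List[Candidate]) -> List[str]:
    """Order services for test coverage by a hypothetical outages x impact score.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    ranked = sorted(candidates, key=lambda c: (-c.outages * c.users_affected, c.service))
    return [c.service for c in ranked]


backlog = [
    Candidate("user-service", outages=2, users_affected=100),      # score 200
    Candidate("sticker-service", outages=5, users_affected=10),    # score 50
    Candidate("payment-service", outages=1, users_affected=1000),  # score 1000
]
print(prioritize(backlog))  # ['payment-service', 'user-service', 'sticker-service']
```

A multiplicative score favors a rare but wide-impact failure over frequent minor ones; a team could just as well weight the two factors differently.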