Reliability Engineering for Enterprise Serverless

Masashi Terui

March 11, 2018

Transcript

  1. RELIABILITY ENGINEERING FOR ENTERPRISE SERVERLESS, MASASHI TERUI @ JAWS DAYS 2018
  2. MASASHI TERUI, ARCHITECT / DEVELOPER, SERVERWORKS CO.,LTD. + FREELANCER
    • Serverless Oji-san
    • Serverless Framework Plugin Developer
    • Serverlessconf Tokyo 2016/2017 speaker
    • Remote worker (in Sapporo)
    • The best Cloud Engineer in Hokkaido!! (or so I would like to be)
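  3. AGENDA
    1. What is SERVERLESS, again?
    2. What is reliability for SERVERLESS?
    3. Common SERVERLESS issues
    4. Grasping what SERVERLESS really is
    5. SERVERLESS is not special
    6. RELIABILITY: way of thinking
    7. RELIABILITY: design
    8. RELIABILITY: implementation
    9. RELIABILITY: monitoring
    10. SUMMARY: expanding the world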
  4. WHAT IS SERVERLESS, AGAIN?

  5. CNCF SERVERLESS WHITEPAPER V1.0
    • Serverless computing refers to the concept of building and running applications that do not require server management
    • A platform may provide one or both of the following:
      • Functions-as-a-Service (FaaS)
      • Backend-as-a-Service (BaaS)
    • Products or platforms deliver the following benefits to developers:
      • Zero Server Ops
      • No Compute Cost When Idle
    https://github.com/cncf/wg-serverless/tree/master/whitepaper
  6. SERVERLESS IS NOT GLUE IN ENTERPRISE APPLICATION: “THE ORCHESTRATOR MANAGES THE TRADES USING A GRAPH OF STATES”
  7. SERVERLESS CLOUD NATIVE LANDSCAPE

  8. BUT I PREFER THIS ONE https://www.slideshare.net/acloudguru/ant-stanley-being-serverless

  9. SERVERLESS USE CASES (FROM CNCF WP)
    • Asynchronous, concurrent, easy to parallelize into independent units of work
    • Infrequent or has sporadic demand, with large, unpredictable variance in scaling requirements
    • Stateless, ephemeral, without a major need for instantaneous cold start time
    • Highly dynamic in terms of changing business requirements that drive a need for accelerated developer velocity
    • Non-HTTP-centric and non-elastic scale workloads that weren’t good fits for an IaaS, PaaS, or CaaS solution (Event Driven workloads)
  10. Do you find yourself thinking the following?
    “There are many workloads that are stateful and/or not easy to parallelize”
    “Asynchronous and Event Driven processing is too difficult for humans”
  11. WHAT IS RELIABILITY FOR SERVERLESS?

  12. RELIABILITY (RASIS)
    Broken down, reliability has several aspects:
    • Reliability
    • Availability
    • Serviceability
    • Integrity
    • Security
    Here it is defined as follows: “Reliability = reliability in the broad sense (RASIS)”
  13. IS IT DIFFICULT TO GUARANTEE RELIABILITY WITH SERVERLESS?
    • Strongly depends on the FaaS platform and BaaS products
      • Lose business continuity (Reliability, Availability)
    • Distributed instances
      • Lose traceability (Serviceability)
    • Hard to develop
      • Everything becomes a function (Serviceability)
      • NoSQL fits better than RDB (Integrity)
  14. COMMON SERVERLESS ISSUES

  15. COMMON FaaS ISSUES
    • How to test the functions?
    • Granularity of the functions
    • Messaging between the functions and backends
    • Handling requests and responses (Error Handling)
    • Log Aggregation, Traceability
    • Monitoring
  16. COMMON BaaS ISSUES
    • How to choose the services?
    • Fault Tolerance
    • Monitoring
  17. GRASPING WHAT SERVERLESS REALLY IS (HINT: MECHANISM)

  18. MECHANISM OF FAAS

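  19. SERVERLESS PROCESSING MODEL https://github.com/cncf/wg-serverless/tree/master/whitepaper#detail-view-serverless-processing-model
    Grasp the overall picture of processing: event-processing requests from event sources are handled in a distributed manner by Functions running on many instances.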

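  20. THE INTERNAL FLOW OF PROCESSING https://github.com/apache/incubator-openwhisk/blob/master/docs/about.md#the-internal-flow-of-processing
    Grasp the flow by which an event (HTTP) is processed. The basic pattern is distributed processing with streams or queues in between. Even when it looks synchronous from the outside, the inside is asynchronous and distributed.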

  21. FaaS is… FUNCTIONS INVOKED ASYNCHRONOUSLY IN CONTAINERS
  22. WHAT IS BAAS?

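  23. FROM OWNERSHIP TO USE OF SERVICES
    The cloud moved servers from something you own to something you use. Middleware and libraries are now moving the same way. Using BaaS is simply the natural next step.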

  24. BaaS is… FULLY MANAGED AND ABSTRACTED MIDDLEWARE AND LIBRARIES

  25. SERVERLESS IS NOT SPECIAL

  26. Serverless is… CONSTITUTED OF ABSTRACTED FUNCTIONS AND MIDDLEWARE
    It is merely abstracted; underneath, it is made of functions and middleware.

  27. SO… WE CAN MAKE SERVERLESS APPLICATIONS RELIABLE. Reliability can be built!

  28. RELIABILITY: WAY OF THINKING

  29. RELIABILITY: WAY OF THINKING
    • Build reliability yourself
      • Serverless will help you, but will not protect your business
    • Think simple
      • Apply general development/operation practices
      • If you can't apply the practices, look closely at the serverless mechanism
    • Keep simple
      • Don't be afraid of increasing the number of functions
      • We should be afraid of complicated architecture
    • Change your mindset: treat it all as software
      • Everything is part of your application
  30. RELIABILITY: DESIGN (Architecting)

  31. ALL EVENTS FLOW IN THE SAME DIRECTION
    Send events (data) in one direction. If a result is needed, do not return it synchronously; have the client come and fetch it.

  32. ALL EVENTS FLOW IN THE SAME DIRECTION
    • They will naturally be Asynchronous and Functional
      • Asynchronous processing is Retriable
      • Functional processing is Reproducible
    • The clients fetch the results themselves
      • However, polling is not a good choice...
      • Pushing is a better choice
      • Can we be happy with AppSync? (Pushing via WebSocket)
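
The one-directional flow on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not the speaker's implementation: `event_queue`, `result_store`, `submit`, `run_worker`, and `fetch_result` are hypothetical names, and a real system would use a managed queue or stream and a database instead of Python objects.

```python
import uuid
from collections import deque

# In-memory stand-ins (assumptions): a queue for the event stream and a
# dict for the result store that clients read from.
event_queue = deque()
result_store = {}


def submit(payload):
    """Accept an event and return immediately with an ID, not a result."""
    request_id = str(uuid.uuid4())
    event_queue.append({"id": request_id, "payload": payload, "attempts": 0})
    return request_id


def process(payload):
    """A pure function: the same input always produces the same output."""
    return payload * 2


def run_worker(max_attempts=3):
    """Drain the queue; failed events are re-queued until max_attempts."""
    while event_queue:
        event = event_queue.popleft()
        event["attempts"] += 1
        try:
            result = process(event["payload"])
            result_store[event["id"]] = {"status": "done", "result": result}
        except Exception as exc:
            if event["attempts"] < max_attempts:
                event_queue.append(event)  # asynchronous, so it can be retried
            else:
                result_store[event["id"]] = {"status": "failed", "error": str(exc)}


def fetch_result(request_id):
    """The client fetches the result itself; a push channel could replace this."""
    return result_store.get(request_id, {"status": "pending"})
```

The caller only ever receives an ID from `submit`, so the data never flows back along the request path. Polling `fetch_result` is the simplest way to read the outcome; as the slide notes, pushing the result to the client would be the better choice.
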
  33. UNIFY THE ENDPOINTS BETWEEN THE SERVICES
    Consolidate the endpoints between services. Wrapping and consolidating them gives you control.

  34. UNIFY THE ENDPOINTS BETWEEN THE SERVICES
    • Microservices
      • Separate the services by domain (each BaaS is one of your services)
      • A service does not have a single endpoint; it has an endpoint for each operation
    • Wrap the endpoints to abstract them
      • Like “MySQL Server” and “libmysql”
      • Do you call “libmysql” directly?
    • You can build a Failover/Failsafe mechanism
      • Like a Reverse Proxy
      • Do you connect to multiple “Read replicas” from each app server?
      • Traffic control, Caching
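
A wrapper of the kind this slide describes might look like the sketch below. `ServiceClient` is a hypothetical class, and the endpoints are modelled as plain callables; in practice they would be SDK or HTTP calls to the wrapped service.

```python
class ServiceClient:
    """Single entry point for one backend service (hypothetical wrapper)."""

    def __init__(self, endpoints, fallback=None):
        self.endpoints = list(endpoints)  # callables, tried in order
        self.fallback = fallback          # failsafe value when all endpoints fail

    def call(self, request):
        errors = []
        for endpoint in self.endpoints:
            try:
                return endpoint(request)  # first endpoint that succeeds wins (failover)
            except Exception as exc:
                errors.append(exc)
        if self.fallback is not None:
            return self.fallback          # failsafe: degrade instead of crashing
        raise RuntimeError(f"all {len(self.endpoints)} endpoints failed: {errors}")
```

Because every function goes through `ServiceClient.call` rather than calling the service directly, failover order, fallback values, caching, or traffic control can be changed in one place.
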
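  35. ALL SERIES OF EVENTS HAVE THE SAME ID
    Assign the same ID to every step in a series of processing. By passing it along, you can trace the whole series by that ID. The ID can also be used for various kinds of control.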
  36. ALL SERIES OF EVENTS HAVE THE SAME ID
    • Log Aggregation
      • A series of events can be traced by the ID
      • Monitor the progress
      • Log all event messages
    • Execution control
      • At least once -> Exactly once
      • e.g. DynamoDB Conditional Writes
    • Make it easy to implement with something like a decorator
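
As a rough sketch of turning at-least-once delivery into exactly-once execution with a decorator: the code below uses a hypothetical `InMemoryTable` in place of DynamoDB, and its `put_if_absent` method stands in for a conditional write that succeeds only when the key does not exist yet. The names `exactly_once` and `event_id` are assumptions for illustration.

```python
import functools


class ConditionalCheckFailed(Exception):
    """Raised when the key already exists (mirrors a failed conditional write)."""


class InMemoryTable:
    """Stand-in for a table that supports 'put only if the key does not exist'."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, value):
        if key in self.items:
            raise ConditionalCheckFailed(key)
        self.items[key] = value

    def delete(self, key):
        self.items.pop(key, None)


def exactly_once(table):
    """Decorator: run the handler only if this event ID has not been claimed."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event):
            key = event["event_id"]
            try:
                table.put_if_absent(key, "claimed")
            except ConditionalCheckFailed:
                return None  # duplicate delivery: skip the handler
            try:
                return handler(event)
            except Exception:
                table.delete(key)  # release the claim so a retry can run
                raise
        return wrapper
    return decorator
```

The same ID that is used for tracing doubles as the deduplication key here. Note that this sketch releases the claim when the handler raises, which is one possible policy; whether a failed event should be retried depends on the application.
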
  37. DATA MODELING
    • Make friends with DynamoDB
      • Distributed by Partition Key and Indexed (B-tree) by Sort Key, LSI
      • GSI is a projection of sorted (indexed) data
    • Consistency can be guaranteed without ACID transactions
      • Denormalization
      • Strongly consistent reads, Conditional Writes
    • Some situations are difficult
      • Write asynchronously to RDB
  38. RELIABILITY: IMPLEMENTATION

  39. GRANULARITY OF THE FUNCTIONS
    • Testable!!
      • Unit testing is justice in the serverless world
    • Make dependencies on other services replaceable
      • They can then be replaced with mocks
      • Easy to Failover/Failsafe
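
One way to keep a function testable is to inject its service dependency, as in the sketch below. `make_handler` and the `storage.save` interface are hypothetical; the mock comes from Python's standard `unittest.mock` module.

```python
from unittest.mock import Mock


def make_handler(storage):
    """Build a handler whose storage dependency is injected, hence replaceable."""
    def handler(event):
        order = {"id": event["order_id"], "total": sum(event["prices"])}
        storage.save(order)  # the only touch point with the outside world
        return order
    return handler


def test_handler_saves_order():
    storage = Mock()  # the mock replaces the real backend service
    handler = make_handler(storage)
    result = handler({"order_id": "o-1", "prices": [100, 250]})
    assert result == {"id": "o-1", "total": 350}
    storage.save.assert_called_once_with({"id": "o-1", "total": 350})
```

The same injection point that accepts a mock in tests can accept a wrapper with failover behaviour in production, which is why replaceable dependencies also make Failover/Failsafe easier.
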
  40. HOW TO TEST THE FUNCTIONS
    • Unit testing is justice in the serverless world (for the second time)
    • Deploy a new environment if mocks are not enough for integration testing
      • It is easy with some frameworks (e.g. Serverless, SAM)
      • Services outside of AWS need to be easy to deploy (via API)
    • Continuous E2E testing with a traceable ID
      • It becomes a form of monitoring
  41. RELIABILITY: MONITORING

  42. RELIABILITY MONITORING
    • The best monitoring is notifications from the application itself
      • Be sure to catch errors and send notifications for them
    • Collect the metrics of the services
      • CloudWatch
      • This is a criterion for choosing services outside of AWS
    • Continuous E2E testing with a traceable ID
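
Catching errors and notifying from inside the application can be sketched as a decorator. Here `notify_on_error` and the `trace_id` field are hypothetical, and `notify` is any callable; in a real deployment it might publish to a messaging or chat service.

```python
import functools
import traceback


def notify_on_error(notify):
    """Decorator: report any uncaught exception via `notify`, then re-raise."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event):
            try:
                return handler(event)
            except Exception as exc:
                notify({
                    "function": handler.__name__,
                    "trace_id": event.get("trace_id"),
                    "error": repr(exc),
                    "traceback": traceback.format_exc(),
                })
                raise  # re-raise so the caller or platform still sees the failure
        return wrapper
    return decorator
```

Including the traceable ID in the notification ties the alert back to the series of events it belongs to, so the same ID serves logging, E2E testing, and monitoring.
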
  43. SUMMARY

  44. THANKS!!
    “SERVERLESS IS NOT SPECIAL”
    “BUILD RELIABILITY YOURSELF”
    “THINK SIMPLE, KEEP SIMPLE”
    “EVERYTHING IS PART OF YOUR APPLICATION”
    “LET'S EXPAND THE SERVERLESS WORLD”
  45. PLEASE TAKE A SURVEY: bit.ly/jd2018-sls #jawsdays #jd2018_a