
The Wix Microservice Stack

A talk given at Wix R&D in Dnipro, Ukraine in March 2017. Video available at https://www.youtube.com/watch?v=eIX33mQdkAI&feature=youtu.be

While microservices are conceptually simple, it's a deep rabbit hole to go down. Deceptively simple questions can have far-reaching implications: Which communication protocol should I choose? Is event-driven the way to go? What monitoring tools should I put in place?

In this talk we'll cover some of the fundamental questions, outline the solutions adopted or developed by Wix, and share, in hindsight, what worked well for us, what didn't, and thoughts on future directions for our stack.

Tomer Gabel

March 29, 2017
Transcript

  1. Service Scheduling
     • A hard problem!
     • Multiple dimensions:
       – Resource utilization (disk space, I/O, RAM, network, power…)
       – Resource availability
       – Failover (physical server, rack, row…)
       – Custom constraints (zoning, e.g. PCI compliance)
  3. Service Scheduling
     • The middle ground:
       – Naïve automatic scheduler
       – Human-configured overrides for zoning, optimization
     • Easy but limited scale
       – A few hundred servers
  4. In practice
     • Static topology
       – Managed with Frying Pan
       – Exported to Chef
       – Deployed via configuration files
     • Live registry in ZooKeeper
       – Deployment only
       – … for now
  5. Protocol
     • RPC-style
       – Sync or async
       – Point-to-point
     • Message passing
       – Async only
       – Requires broker
     Shared concerns: topology, serialization, operations
  6. Protocol
     • Wix RPC
       – RPC-style
       – Custom JSON
       – HTTP
     • Pros/cons
       – Rock-solid
       – Sync/blocking
       – Legacy
     Image: psycho chicken by Bernhard Latzko (CC BY-ND 2.0)
  7. Protocol
     • Greyhound
       – Message-passing
       – Custom JSON
       – Kafka
     • Pros/cons
       – Async + replayable
       – Still experimental
     Image: Robin Fledgeling by edgeplot (CC BY-NC-SA 2.0)
  8. Load balancing
     • Centralized
       – Simple
       – Limited flexibility
       – Limited scale
       – Thin implementation → highly portable
       – Suitable for static topologies
     • Distributed
       – Highly scalable
       – Flexible
       – Fully dynamic
       – Fat implementation → difficult to port
     • Quasi-distributed
       – e.g. Synapse
       – Best of both worlds?
     Frying Pan → Chef → Nginx
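The distributed option above boils down to each client picking a target on its own. A minimal round-robin chooser illustrates the idea — this is a sketch for illustration only, not code from the Wix stack; the class and endpoint names are made up:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> endpoints) {
        // Defensive copy: the rotation order must stay stable.
        this.endpoints = List.copyOf(endpoints);
    }

    // Pick the next endpoint in rotation; safe to call from many threads.
    String pick() {
        int i = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```

A real client-side balancer would also need health checks and a dynamic endpoint list — which is exactly the "fat implementation" cost the slide refers to.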
  10. To our shame
     • There’s always IDL.
     • Informal
       – Text documentation
       – Code samples
     • Formal
       – Swagger, Apiary etc.
       – ProtoBuf, Thrift, Avro
       – WSDL, god forbid!
     • … or ad-hoc:

       public interface SiteMembersService {
         SiteMemberDto getMemberById(
           Guid<SiteMember> memberId, UserGuid userId);
         SiteMemberDto getMemberOrOwnerById(
           Guid<SiteMember> memberId, Guid<SMCollection> collectionId);
         SiteMemberDto getMemberDtoByEmailAndCollectionId(
           String email, Guid<SMCollection> collectionId);
         List<SiteMemberDto> listMembersByCollectionId(
           Guid<SMCollection> collectionId);
       }
  12. In Detail
     • Java interfaces?
       + Ridiculously simple
       + Lend well to RPC
       – Coupled to JVM
     • JSON serialization
       + Jackson-based
       + Custom, extensible mapping
       – Reflection-based
     • Server stack (JVM)
       – Jetty
       – Spring + Spring MVC
       – Custom handler
     • RPC client stack (JVM)
       – Spring
       – Proxy classes generated at runtime
       – AsyncHttpClient
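The "proxy classes generated at runtime" approach can be sketched with Java's built-in `java.lang.reflect.Proxy`: the client codes against the shared service interface, and an invocation handler turns each method call into a transport request. This is a minimal illustration, not Wix's actual implementation — the `GreetingService` and `Transport` types are hypothetical stand-ins:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcProxyDemo {
    // Hypothetical service contract, in the interface-as-IDL style.
    interface GreetingService {
        String greet(String name);
    }

    // Stand-in for the HTTP/JSON transport layer (AsyncHttpClient at Wix);
    // a real one would serialize args to JSON and POST to the service.
    interface Transport {
        String invoke(String method, Object[] args);
    }

    @SuppressWarnings("unchecked")
    static <T> T createClient(Class<T> iface, Transport transport) {
        // Every method call on the proxy is routed through the transport.
        InvocationHandler handler = (proxy, method, args) ->
                transport.invoke(method.getName(), args);
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }

    public static void main(String[] args) {
        // Fake transport that just echoes the call instead of going over HTTP.
        Transport fake = (method, a) -> method + "(" + a[0] + ")";
        GreetingService client = createClient(GreetingService.class, fake);
        System.out.println(client.greet("world")); // prints "greet(world)"
    }
}
```

The appeal is that the caller never sees HTTP or JSON at all — which is also why the approach is "coupled to JVM": the contract only exists as a Java interface.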
  13. In Detail
     • Java interfaces?
       + Ridiculously simple
       + Lend well to RPC
       – Coupled to JVM
     • JSON serialization
       + Jackson-based
       + Custom, extensible mapping
       – Reflection-based
     • Alternative stack
       – Based on Node.js
       – Generated RPC clients
       – Manually-converted entity schema :-(
  15. Cascade Failures
     • What is a cascade failure?
     • Mitigations
       – Bulkheading
       – Circuit breakers
       – Load shedding
     • We don’t do any of that (mostly)
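Of the mitigations above, a circuit breaker is the easiest to show in a few lines: after enough consecutive failures it "trips" and fails fast instead of hammering a sick dependency. This is a bare-bones sketch for illustration — production systems would use a library such as Hystrix or Resilience4j rather than this hand-rolled version:

```java
import java.util.function.Supplier;

public class CircuitBreaker {
    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private final long retryAfterMillis;
    private int failures = 0;
    private long openedAt = 0;
    private State state = State.CLOSED;

    CircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    <T> T call(Supplier<T> action, T fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryAfterMillis)
                return fallback;          // tripped: shed load, fail fast
            state = State.CLOSED;         // cooldown over, allow a trial call
            failures = 0;
        }
        try {
            T result = action.get();
            failures = 0;                 // success resets the failure count
            return result;
        } catch (RuntimeException e) {
            if (++failures >= failureThreshold) {
                state = State.OPEN;       // trip the breaker
                openedAt = System.currentTimeMillis();
            }
            return fallback;
        }
    }
}
```

Bulkheading (isolating resource pools per dependency) and load shedding (rejecting excess work early) attack the same problem from different angles; a breaker only stops one client from piling onto an already-failing callee.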
  16. Does it go?
     • Short answer: yes.
     • Battle-tested
       – Evolving since 2010.
       – >200 services in production.
     • Known quantity
       – Easy to operate
       – Performs well enough
       – Known workarounds
  17. Not all is well, though
     • Polyglot development
       – Custom client stack
       – Expensive to port!
     • Implicit state
       – Transparently handled by the framework
       – Thread-local storage
       – Hard to go async!
     [Diagram: session info, transaction ID and A/B experiment data accumulate implicitly as a call flows from client proxy through Service A to Service B]
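The "implicit state in thread-local storage" pattern, and why it fights async code, can be shown in a few lines. This is a simplified illustration of the general technique, not Wix's framework code; the `RequestContext` name is made up:

```java
public class RequestContext {
    // Per-thread storage for cross-cutting request data (e.g. transaction ID).
    private static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<>();

    static void set(String txId) { TRANSACTION_ID.set(txId); }
    static String get()          { return TRANSACTION_ID.get(); }
    static void clear()          { TRANSACTION_ID.remove(); }

    public static void main(String[] args) throws Exception {
        set("tx-123");
        System.out.println(get()); // prints "tx-123" on this thread

        // The catch: the context does NOT follow work onto another thread,
        // which is exactly why this pattern makes going async hard.
        Thread other = new Thread(() -> System.out.println(get())); // prints "null"
        other.start();
        other.join();
    }
}
```

The framework can populate and read such a context transparently on a thread-per-request server; the moment a request hops threads (futures, event loops), the context silently vanishes unless it is explicitly captured and restored.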
  18. Codebase modeling
     • A product comprises multiple services
     • Services have dependencies
       – Creating a DAG
       – Tends to cluster around domains
     • Org structure reflects the clustering (Conway)
  19. Codebase modeling
     • Repository-per-domain
       – Small repositories
       – Artifacts built independently
       – Binary dependencies
       – Requires specialized tools to manage versions and build dependencies
     • Monorepo
       – Repository contains everything
       – Code is built atomically
       – Source dependencies
       – Requires a specialized build tool
  20. At Wix
     • One repo per domain
     • Dependencies:
       – Declared in POMs
       – Version management via custom plugin
       – Builds managed by custom tool*
     • Custom dashboard, “Wix Lifecycle”
     * Lifecycle – Dependency Management Algorithm
  21. Version management
     • All artifacts share a common parent
       – Master list of versions
     • Manually-triggered release builds
       – Custom release plugin
       – Increments version
       – Updates master
       – Pushes changes to git

     Sample release-build log:

       [INFO] QuickRelease /home/builduser/agent01/work/d9922a1c87aee4bbbf1bc8bcfb2eccebc4268651c5f19faa689be6e4
       [08:10:55][INFO] Adding tag RC;.;1.20.0
       [08:10:56][INFO] Tag RC;.;1.20.0 added successfully
       [08:10:56][INFO] Working on onboarding-server-web
       [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar deployable copied
       [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar sources copied
       [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar copied
       [08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar tests copied
       [08:10:56][INFO] onboarding-server-web pom deployed
       [08:10:57][INFO] Deploying artifacts to release artifacts repository
       [08:10:57][INFO] Deploying onboarding-server-web to RELEASE
       [08:10:57][INFO] pushing new pom
       [08:10:59]2016-02-22 08:10:39 [INFO ] /usr/bin/git push --tag origin master exitValue = 0
  22. Health
     • Host monitoring
       – Sensu alerts
       – Usual host metrics
       – Health-check endpoint in framework
     • End-to-end
       – Pingdom
     • Business
       – Custom BI toolchain
  23. Instrumentation
     • Metrics
       – DropWizard Metrics
       – Graphite and Anodot
       – Built-in metrics (RPC, resource pools…)
       – APIs for custom metrics
     • Alerts
       – Anodot, NewRelic
       – Via PagerDuty
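To give a flavor of what a custom-metric API measures, here is a toy timer in plain Java. It only records invocation count and mean latency — DropWizard Metrics provides the real thing (timers with percentile histograms, meters, gauges, a registry); this self-contained sketch is not its API:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class ToyTimer {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Time an action, record its duration, and pass its result through.
    <T> T time(Supplier<T> action) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            count.incrementAndGet();
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long count() { return count.get(); }

    double meanMillis() {
        long c = count.get();
        return c == 0 ? 0.0 : totalNanos.get() / (double) c / 1_000_000.0;
    }
}
```

A metrics framework then periodically ships such counters to a backend like Graphite or Anodot, where alerting rules pick them up.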
  24. Debugging
     • Logs
       – Good old Logback
       – No centralized aggregation
       – Not particularly useful
     • Feature toggle overrides
     • Distributed tracing
  25. WE’RE DONE HERE! … AND YES, WE’RE HIRING :-)
     Thank you for listening
     [email protected]
     @tomerg
     http://il.linkedin.com/in/tomergabel
     Wix Engineering blog: http://engineering.wix.com