Slide 1

Slide 1 text

The Wix Microservice Stack
Tomer Gabel, Wix
March 2017 @ Dnipro, UA

Slide 2

Slide 2 text

Agenda
1. Topology
2. Networking
3. Structure
4. Operations
5. Beer

Slide 3

Slide 3 text

Our conceptual system
[Diagram: Store Service, Checkout Service, Cart Service]

Slide 4

Slide 4 text

1. TOPOLOGY Image: Penrose Steps by Alex Eylar (CC BY-NC-SA 2.0)

Slide 5

Slide 5 text

Our conceptual system
[Diagram: Store Service, Checkout Service, Cart Service; Host A, Host B, Host C]

Slide 6

Slide 6 text

Topology
[Diagram: Topology; service → host mapping; server inventory; service catalogue]
Formally, “scheduling”

Slide 7

Slide 7 text

Service Scheduling
• A hard problem!
• Multiple dimensions:
  – Resource utilization (disk space, I/O, RAM, network, power…)
  – Resource availability
  – Failover (physical server, rack, row…)
  – Custom constraints (zoning, e.g. PCI compliance)

Slide 9

Slide 9 text

Service Scheduling
• The middle ground:
  – Naïve automatic scheduler
  – Human-configured overrides for zoning, optimization
• Easy but limited scale
  – A few hundred servers
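A minimal sketch of the "middle ground" described on this slide: pinned services (the human-configured overrides) are placed first, and everything else goes to the least-loaded host. The class name, the service and host names, and the "fewest services wins" heuristic are all assumptions made for illustration; the talk does not describe how the Wix scheduler is implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a naive scheduler with human-configured overrides.
class NaiveScheduler {
    /**
     * Assigns each service to a host. Services named in `overrides` go to
     * the host they are pinned to; the rest go to whichever host currently
     * has the fewest services assigned.
     */
    static Map<String, String> schedule(List<String> services,
                                        List<String> hosts,
                                        Map<String, String> overrides) {
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String host : hosts) load.put(host, 0);

        Map<String, String> placement = new LinkedHashMap<>();
        // Place overrides first so automatic placement sees their load.
        for (String service : services) {
            String pinned = overrides.get(service);
            if (pinned == null) continue;
            if (!load.containsKey(pinned))
                throw new IllegalArgumentException("Unknown host: " + pinned);
            placement.put(service, pinned);
            load.merge(pinned, 1, Integer::sum);
        }
        // Everything else: pick the least-loaded host (first one wins ties).
        for (String service : services) {
            if (placement.containsKey(service)) continue;
            String best = null;
            for (String host : hosts)
                if (best == null || load.get(host) < load.get(best)) best = host;
            if (best == null) throw new IllegalStateException("No hosts available");
            placement.put(service, best);
            load.merge(best, 1, Integer::sum);
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, String> placement = schedule(
            List.of("store", "checkout", "cart"),
            List.of("host-a", "host-b", "host-c"),
            Map.of("checkout", "host-c")); // e.g. pinned to a PCI-compliant zone
        System.out.println(placement);
    }
}
```

A scheduler like this ignores most of the dimensions listed earlier (I/O, RAM, failover domains), which is consistent with the slide's point that the approach is easy but only stretches to a limited scale.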

Slide 10

Slide 10 text

Our conceptual system
[Diagram: Store Service, Checkout Service, Cart Service]
http://err:42/uh … derp?

Slide 11

Slide 11 text

Service Discovery
[Diagram: axes Static vs. Dynamic and Logical vs. Physical; annotation: “That way madness lies”]

Slide 14

Slide 14 text

In practice
• Static topology
  – Managed with Frying Pan
  – Exported to Chef
  – Deployed via configuration files
• Live registry in Zookeeper
  – Deployment only
  – … for now
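To make "deployed via configuration files" concrete, here is a rough sketch of static service discovery: a service looks up its peers in a file that was rendered at deploy time. The `service=host:port,host:port` format, the class name and the host names are assumptions for illustration only; the talk does not show what the files generated by Frying Pan and Chef look like.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: static discovery backed by a deployed config file.
class StaticTopology {
    private final Map<String, List<String>> endpoints = new HashMap<>();

    /** Parses lines of the assumed form "service=host:port,host:port". */
    static StaticTopology parse(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable topology", e);
        }
        StaticTopology topology = new StaticTopology();
        for (String service : props.stringPropertyNames()) {
            List<String> hosts = new ArrayList<>();
            for (String host : props.getProperty(service).split(","))
                if (!host.trim().isEmpty()) hosts.add(host.trim());
            topology.endpoints.put(service, hosts);
        }
        return topology;
    }

    /** Returns the endpoints for a service; the mapping never changes at runtime. */
    List<String> lookup(String service) {
        List<String> hosts = endpoints.get(service);
        if (hosts == null)
            throw new IllegalArgumentException("Unknown service: " + service);
        return hosts;
    }

    public static void main(String[] args) {
        StaticTopology topology = parse(
            "cart=host-a:8080, host-b:8080\n" +
            "store=host-c:8080\n");
        System.out.println(topology.lookup("cart"));
    }
}
```

The trade-off is the one the slide implies: a topology change means re-rendering and redeploying the file, whereas a live registry would let clients observe changes as they happen.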

Slide 15

Slide 15 text

2. NETWORKING Image: Neurons by Birth Into Being (CC BY-NC-SA 2.0)

Slide 16

Slide 16 text

Back to diagrams
[Diagram: Store Service, Checkout Service, Cart Service]

Slide 17

Slide 17 text

Back to diagrams
[Diagram: Store Service, Checkout Service, Cart Service; the connections between them are labelled “Protocol”]

Slide 18

Slide 18 text

Protocol
• RPC-style
  – Sync or async
  – Point-to-point
• Message passing
  – Async only
  – Requires broker
Shared concerns: topology, serialization, operations

Slide 19

Slide 19 text

Protocol
• Wix RPC
  – RPC-style
  – Custom JSON
  – HTTP
• Pros/cons
  – Rock-solid
  – Sync/blocking
  – Legacy
Image: psycho chicken by Bernhard Latzko (CC BY-ND 2.0)

Slide 20

Slide 20 text

Protocol
• Greyhound
  – Message-passing
  – Custom JSON
  – Kafka
• Pros/cons
  – Async + replayable
  – Still experimental
Image: Robin Fledgeling by edgeplot (CC BY-NC-SA 2.0)

Slide 21

Slide 21 text

Load balancing
• Centralized
  – Simple
  – Limited flexibility
  – Limited scale
  – Thin implementation → highly portable
  – Suitable for static topologies
• Distributed
  – Highly scalable
  – Flexible
  – Fully dynamic
  – Fat implementation → difficult to port
• Quasi-distributed
  – e.g. Synapse
  – Best of both worlds?

Slide 22

Slide 22 text

Load balancing
Frying Pan → Chef → Nginx
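As an illustration of why balancing over a static topology can stay "thin", the sketch below rotates through a fixed backend list in round-robin order, which is also the default strategy for an Nginx upstream group. The class and backend names are hypothetical, and this is not a description of the Nginx configuration Wix generates.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: round-robin selection over a fixed list of backends.
class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) throw new IllegalArgumentException("No backends");
        this.backends = List.copyOf(backends); // static topology: fixed at construction
    }

    /** Returns the next backend in rotation; safe to call from multiple threads. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer =
            new RoundRobinBalancer(List.of("host-a:8080", "host-b:8080"));
        for (int i = 0; i < 4; i++) System.out.println(balancer.pick());
    }
}
```

A distributed (client-side) balancer would need the same selection logic plus membership updates and health tracking in every client stack, which is the "fat implementation" cost the slide mentions.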

Slide 23

Slide 23 text

To our shame
• There’s always IDL.
• Informal
  – Text documentation
  – Code samples
• Formal
  – Swagger, Apiary etc.
  – ProtoBuf, Thrift, Avro
  – WSDL, god forbid!
• … or
  – Ad-hoc

public interface SiteMembersService {
    SiteMemberDto getMemberById(Guid memberId, UserGuid userId);
    SiteMemberDto getMemberOrOwnerById(Guid memberId, Guid collectionId);
    SiteMemberDto getMemberDtoByEmailAndCollectionId(String email, Guid collectionId);
    List listMembersByCollectionId(Guid collectionId);
}

Slide 25

Slide 25 text

In Detail
• Java interfaces?
  + Ridiculously simple
  + Lend well to RPC
  – Coupled to JVM
• JSON serialization
  + Jackson-based
  + Custom, extensible mapping
  – Reflection-based
• Server stack (JVM)
  – Jetty
  – Spring + Spring MVC
  – Custom handler
• RPC client stack (JVM)
  – Spring
  – Proxy classes generated at runtime
  – AsyncHttpClient
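"Proxy classes generated at runtime" can be sketched with the JDK's own `java.lang.reflect.Proxy`: the client holds nothing but the Java interface, and every method call is turned into a request by an invocation handler. The `CartService` interface, the request format and the `transport` function are hypothetical stand-ins (the real stack serializes with Jackson and sends over AsyncHttpClient); the talk does not say which proxying mechanism Wix actually uses.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch: an RPC client derived from a Java interface at runtime.
class RpcProxySketch {
    // Hypothetical service contract, shared by client and server.
    interface CartService {
        String getCart(String cartId);
    }

    /**
     * Builds a client for `contract`. Each call becomes a request string that
     * is handed to `transport`, which stands in for serialization plus HTTP.
     */
    static <T> T client(Class<T> contract, String baseUrl,
                        Function<String, Object> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            // toString/hashCode/equals are answered locally, not sent over the wire.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "toString": return contract.getSimpleName() + "@" + baseUrl;
                    case "hashCode": return System.identityHashCode(proxy);
                    default:         return proxy == args[0]; // equals
                }
            }
            String request = baseUrl + "/" + contract.getSimpleName()
                + "/" + method.getName()
                + " " + Arrays.toString(args == null ? new Object[0] : args);
            return transport.apply(request);
        };
        Object proxy = Proxy.newProxyInstance(
            contract.getClassLoader(), new Class<?>[] {contract}, handler);
        return contract.cast(proxy);
    }

    public static void main(String[] args) {
        CartService cart = client(CartService.class, "http://cart",
            request -> "sent: " + request);
        System.out.println(cart.getCart("42"));
    }
}
```

This also shows where the "coupled to JVM" drawback comes from: the contract exists only as a Java interface, so a non-JVM client has nothing to generate a stub from.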

Slide 26

Slide 26 text

In Detail
• Alternative stack
  – Based on Node.js
  – Generated RPC clients
  – Manually-converted entity schema :-(

Slide 28

Slide 28 text

Cascade Failures
• What is a cascade failure?
• Mitigations
  – Bulkheading
  – Circuit breakers
  – Load shedding
• We don’t do any of that (mostly)
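For readers unfamiliar with the second mitigation, here is a minimal circuit breaker sketch: after a number of consecutive failures it stops calling the downstream service for a cool-down period and answers with a fallback instead, so a struggling dependency is not hammered further. The class, its thresholds and the injected clock are assumptions for illustration; as the slide says, this is mostly not something the Wix stack does.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: a count-based circuit breaker with a cool-down period.
class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so behaviour is testable
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private boolean open = false;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Runs `operation` unless the breaker is open; falls back on failure. */
    <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (!allowRequest()) return fallback.get(); // fail fast, skip the call
        try {
            T result = operation.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        if (!open) return true;
        // Once the cool-down has elapsed, let requests through again as a trial.
        return clock.getAsLong() - openedAt >= openMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        open = false;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong(); // (re)start the cool-down
        }
    }
}
```

A caller might wrap a remote call as `breaker.call(() -> cartClient.getCart(id), () -> emptyCart)`, where `cartClient` and `emptyCart` are placeholders. The trial phase here is deliberately simplified: it admits every request after the cool-down rather than a single probe.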

Slide 29

Slide 29 text

Does it go?
• Short answer: yes.
• Battle-tested
  – Evolving since 2010.
  – >200 services in production.
• Known quantity
  – Easy to operate
  – Performs well enough
  – Known workarounds

Slide 30

Slide 30 text

Not all is well, though
• Polyglot development
  – Custom client stack
  – Expensive to port!

Slide 31

Slide 31 text

Not all is well, though
• Polyglot development
  – Custom client stack
  – Expensive to port!
• Implicit state
  – Transparently handled by the framework
  – Thread local storage
  – Hard to go async!
[Diagram: Client, Proxy, Service A, Service B; context passed between hops: session info, transaction ID, A/B experiment]
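The "hard to go async" point can be demonstrated with plain JDK classes: a value stored in a `ThreadLocal` on the request thread is simply not there when the work continues on a pool thread. The `TRANSACTION_ID` holder and method names below are hypothetical; the sketch illustrates the general problem, not the Wix framework's internals.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: thread-local request context does not follow async work.
class ImplicitStateSketch {
    // Hypothetical per-request context kept in thread-local storage.
    static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<>();

    /** Reads the context on whichever thread runs the task (implicit). */
    static String implicitLookup(ExecutorService pool) {
        return CompletableFuture.supplyAsync(TRANSACTION_ID::get, pool).join();
    }

    /** Captures the context on the calling thread and passes it along (explicit). */
    static String explicitLookup(ExecutorService pool) {
        String captured = TRANSACTION_ID.get();
        return CompletableFuture.supplyAsync(() -> captured, pool).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TRANSACTION_ID.set("tx-123");
            System.out.println("implicit: " + implicitLookup(pool)); // prints "implicit: null"
            System.out.println("explicit: " + explicitLookup(pool)); // prints "explicit: tx-123"
        } finally {
            TRANSACTION_ID.remove();
            pool.shutdown();
        }
    }
}
```

Going async therefore means either passing the context explicitly through every call, or teaching every executor and callback to copy it across threads, both of which are invasive changes to a framework that has so far handled the state transparently.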

Slide 32

Slide 32 text

3. STRUCTURE

Slide 33

Slide 33 text

Codebase modeling
• A product comprises multiple services
• Services have dependencies
  – Creating a DAG
  – Tends to cluster around domains
• Org structure reflects the clustering (Conway)
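Because service dependencies form a DAG, a valid build (or deployment) order can be derived with a topological sort. The sketch below uses Kahn's algorithm; the class name and the example services are assumptions for illustration and are not taken from the Wix build tooling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: ordering services so dependencies come first.
class DependencyGraph {
    /**
     * `deps` maps each service to the services it depends on.
     * Returns an order in which every service follows its dependencies.
     */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> remaining = new TreeMap<>();       // unmet dependency counts
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            remaining.merge(e.getKey(), e.getValue().size(), Integer::sum);
            for (String dep : e.getValue()) {
                remaining.putIfAbsent(dep, 0);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with everything that has no dependencies at all.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String service = ready.remove();
            order.add(service);
            // Building this service unblocks the services that depend on it.
            for (String dependent : dependents.getOrDefault(service, List.of()))
                if (remaining.merge(dependent, -1, Integer::sum) == 0) ready.add(dependent);
        }
        // Anything left over sits on a cycle, so the graph was not a DAG.
        if (order.size() != remaining.size())
            throw new IllegalStateException("Dependency cycle detected");
        return order;
    }

    public static void main(String[] args) {
        System.out.println(buildOrder(Map.of(
            "checkout", List.of("cart", "store"),
            "cart", List.of("store"),
            "store", List.<String>of())));
    }
}
```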

Slide 34

Slide 34 text

Codebase modeling

Repository-per-domain
• Small repositories
• Artifacts built independently
• Binary dependencies
• Requires specialized tools to manage:
  – Versions
  – Build dependencies

Monorepo
• Repository contains everything
• Code is built atomically
• Source dependencies
• Requires a specialized build tool

Slide 35

Slide 35 text

At Wix
• One repo per domain
• Dependencies:
  – Declared in POMs
  – Version management via custom plugin
  – Builds managed by custom tool*
• Custom dashboard, “Wix Lifecycle”
* Lifecycle – Dependency Management Algorithm

Slide 36

Slide 36 text

Version management
• All artifacts share a common parent
  – Master list of versions
• Manually-triggered release builds
  – Custom release plugin
  – Increments version
  – Updates master
  – Pushes changes to git

[INFO] QuickRelease /home/builduser/agent01/work/d9922a1c87aee4bb bf1bc8bcfb2eccebc4268651c5f19faa689be6e4
[08:10:55][INFO] Adding tag RC;.;1.20.0
[08:10:56][INFO] Tag RC;.;1.20.0 added successfully
[08:10:56][INFO] Working on onboarding-server-web
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar deployable copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar sources copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar copied
[08:10:56][INFO] onboarding-server-web-1.19.0-SNAPSHOT jar tests copied
[08:10:56][INFO] onboarding-server-web pom deployed
[08:10:57][INFO] Deploying artifacts to release artifacts repository
[08:10:57][INFO] Deploying onboarding-server-web to RELEASE
[08:10:57][INFO] pushing new pom
[08:10:59]2016-02-22 08:10:39 [INFO ] /usr/bin/git push --tag origin master exitValue = 0
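The "increments version" step might look roughly like the sketch below. It assumes, based only on the log excerpt (snapshot artifacts at 1.19.0, release tag 1.20.0), that a release bumps the minor component and resets the patch; the actual rule applied by the custom release plugin is not stated in the talk, and the class and method names are hypothetical.

```java
// Hypothetical sketch: deriving a release version from a snapshot version.
class VersionBump {
    /**
     * Strips an optional "-SNAPSHOT" suffix and bumps the minor component,
     * e.g. "1.19.0-SNAPSHOT" becomes "1.20.0". The minor-bump rule is an
     * assumption inferred from the log, not a documented plugin behaviour.
     */
    static String nextMinor(String version) {
        String suffix = "-SNAPSHOT";
        String base = version.endsWith(suffix)
            ? version.substring(0, version.length() - suffix.length())
            : version;
        String[] parts = base.split("\\.");
        if (parts.length != 3)
            throw new IllegalArgumentException("Expected major.minor.patch: " + version);
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major + "." + (minor + 1) + ".0";
    }

    public static void main(String[] args) {
        System.out.println(nextMinor("1.19.0-SNAPSHOT"));
    }
}
```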

Slide 37

Slide 37 text

4. OPERATIONS

Slide 38

Slide 38 text

Back to diagrams
[Diagram: Store Service, Checkout Service, Cart Service]
How ya doin’?

Slide 39

Slide 39 text

Health
• Host monitoring
  – Sensu alerts
  – Usual host metrics
  – Health-check endpoint in framework
• End-to-end
  – Pingdom
• Business
  – Custom BI toolchain

Slide 40

Slide 40 text

Instrumentation
• Metrics
  – DropWizard Metrics
  – Graphite and Anodot
  – Built-in metrics (RPC, resource pools…)
  – APIs for custom metrics
• Alerts
  – Anodot, NewRelic
  – Via PagerDuty

Slide 41

Slide 41 text

Debugging
• Logs
  – Good old Logback
  – No centralized aggregation
  – Not particularly useful
• Feature toggle overrides
• Distributed tracing

Slide 42

Slide 42 text

WE’RE DONE HERE!
… AND YES, WE’RE HIRING :-)
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Wix Engineering blog: http://engineering.wix.com