
Pragmatic Micro-services for Organisational Scalability

At FashionTrade, we run Docker containers on Kubernetes, hosted on Google Cloud Platform, primarily to achieve two goals: (a) anyone who joins the team should be able to make their first useful production commit within a day or two, and (b) creating, testing and deploying a service should not be much harder than editing a text file and pushing to git. We achieve both with our Docker-based service setup and an automated deployment pipeline built on Jenkins. In this talk we'll go into the technical details of our setup, the organisational aspects of our approach and the results so far.

FashionTrade.com Engineering

January 19, 2017

Transcript

  1. Pragmatic Micro-Services at FashionTrade Docker Meetup January 2017

  2. About Friso van Vollenhoven Mostly worked in software dev and

    related roles. Former CTO at a (big) data analytics and machine learning company. Now CTO at FashionTrade. I am the proud owner of a three character Twitter handle: @fzk. I have 18 endorsements for Awesomeness on LinkedIn.
  3. FashionTrade A B2B platform for fashion wholesale. Fashion brands and

    retailers can connect and do business. E-commerce for (fashion) businesses. Tagline: “We simplify wholesale so you can Connect, Trade & Grow.” About
  4. None
  5. Product information enters the brand integration API Validation Product information

    is merged with existing data that applies Price lists, stock levels, existing images, etc. Product enters search engine Only if complete (i.e. it has a known price, availability, etc.) Product information is used for orders, confirmations, etc. The life of a product
  6. The life of a product

  7. Why services Separates business concerns Allows people to work on

    many things concurrently Works well with log based data architecture Ideally, makes for organisational scalability at the cost of added complexity in delivery
  8. Example

  9. Service written in Java External facing Requires authentication Has a

    health check (required) Has an admin endpoint for internal querying Brand integration API service
  10. Dockerfile

    FROM eu.gcr.io/ft-main/jre:8
    EXPOSE 8080 8081
    LABEL \
      meta.attributes.id="pim-integration" \
      meta.attributes.type="service" \
      meta.attributes.team="Developers" \
      meta.description="The FashionTrade PIM integration service." \
      meta.checks.health.endpoint="/healthcheck" \
      meta.checks.health.port="8081" \
      meta.ports.http.service="80" \
      meta.ports.http.container="8080" \
      meta.ports.admin.service="8081" \
      meta.ports.admin.container="8081" \
      meta.routing.gateway.mapping.path-segment="pim" \
      meta.routing.gateway.mapping.dns-prefix="api"
    ENV \
      NAMESPACE="ft-prod" \
      KAFKA_BOOTSTRAP_SERVERS="kafka:9092"
    ENTRYPOINT ["java", "-jar", "pim-integration-service.jar", "server", "config.yml"]
    COPY config.yml config.yml
    COPY target/pim-integration-service-all.jar pim-integration-service.jar
  11. Dockerfile, deployment information

    LABEL \
      meta.attributes.id="pim-integration" \
      meta.attributes.type="service" \
      meta.attributes.team="Developers" \
      meta.description="The FashionTrade PIM integration service." \
      meta.checks.health.endpoint="/healthcheck" \
      meta.checks.health.port="8081" \
      meta.ports.http.service="80" \
      meta.ports.http.container="8080" \
      meta.ports.admin.service="8081" \
      meta.ports.admin.container="8081" \
      meta.routing.gateway.mapping.path-segment="pim" \
      meta.routing.gateway.mapping.dns-prefix="api"
  12. Jenkins seed job

    // Snippet from seed_job.groovy
    // ...
    // ============================================================================
    // Services
    // ============================================================================
    /** The master list of all services that should be built and deployed. */
    def services = [
      'api-docs-service' : Builder.Bash,
      'app-service' : Builder.Npm,
      'brand-service' : Builder.Maven,
      'canary-service' : Builder.Maven,
      'connection-service' : Builder.Maven,
      'gatekit' : Builder.Maven,
      'image-service' : Builder.Maven,
      'login-service' : Builder.Npm,
      'order-service' : Builder.Maven,
      'pim-integration-service' : Builder.Maven,
      'product-search-service' : Builder.Maven,
      'product-service' : Builder.Maven,
      'retailer-service' : Builder.Maven,
      'user-service' : Builder.Maven
    ]
    // ...
  13. What happens? Build Deploy Update routing state Monitor (Scale)

  14. Build Jenkins creates a build pipeline from a seed job

    (new repos are added manually) Pipelines for dev branches terminate at build The pipeline for master includes a deploy step Deploy currently goes to a dev environment Pushing to production is manual Will automate when we have better integration testing in place As a startup you don’t always have time for everything you want to do ...
  15. Deploy Metatron Internal tool Generates a Kubernetes manifest (YAML config

    file) from Docker labels Deploy step in Jenkins Runs metatron Applies manifest against target environment Environment specific configuration managed through Kubernetes secrets
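As an illustration of the label-to-manifest idea, here is a minimal sketch of how flat Docker labels could be turned into a Kubernetes Service manifest. It is not Metatron's actual code: the function name `service_manifest`, the `app` selector convention and the choice to emit JSON (which `kubectl apply` also accepts) are assumptions; only the label keys and values come from the Dockerfile slide.

```python
import json


def service_manifest(labels, namespace="default"):
    """Build a Kubernetes Service manifest (as a dict) from flat Docker labels.

    Hypothetical helper: the real Metatron tool may work differently.
    """
    name = labels["meta.attributes.id"]
    prefix = "meta.ports."
    # Collect port names such as "http" and "admin" from keys like
    # "meta.ports.http.service" and "meta.ports.http.container".
    port_names = sorted(
        {key[len(prefix):].split(".")[0] for key in labels if key.startswith(prefix)}
    )
    ports = [
        {
            "name": port_name,
            "port": int(labels[f"{prefix}{port_name}.service"]),
            "targetPort": int(labels[f"{prefix}{port_name}.container"]),
        }
        for port_name in port_names
    ]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"description": labels.get("meta.description", "")},
        },
        # Assumption: pods are selected by an "app" label equal to the service id.
        "spec": {"selector": {"app": name}, "ports": ports},
    }


# Label values copied from the Dockerfile slide.
labels = {
    "meta.attributes.id": "pim-integration",
    "meta.description": "The FashionTrade PIM integration service.",
    "meta.ports.http.service": "80",
    "meta.ports.http.container": "8080",
    "meta.ports.admin.service": "8081",
    "meta.ports.admin.container": "8081",
}
manifest = service_manifest(labels, namespace="ft-prod")
print(json.dumps(manifest, indent=2))
```

The point of the design is that the Dockerfile stays the single place where a developer describes the service; everything Kubernetes-specific is derived from it at deploy time.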
  16. Service routing is essentially a reverse proxy that knows about

    all services Routing state depends on currently deployed and healthy services Should not treat it as configuration Hard to correctly centrally manage Custom built service router: gatekit Routing
  17. Custom service router for Kubernetes Routing state is runtime state

    based on deployed services; not static configuration Docker labels become service metadata in Kubernetes services Gatekit polls Kubernetes cluster for services, metadata and health status Current logic: at least one pod healthy == service healthy Gatekit provides later opportunity to solve A/B testing and canary deployments at the platform level Gatekit
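The "routing state is runtime state" idea can be sketched as a pure function from the currently observed services to a routing table. This is an assumption-laden illustration, not Gatekit's implementation: the input shape (`pod_health` as a list of booleans), the function name `routing_table` and the `orders` path segment are hypothetical. The health rule is the one stated on the slide.

```python
def routing_table(services):
    """Map (dns_prefix, path_segment) to a service name for every healthy service."""
    routes = {}
    for svc in services:
        # Rule from the slide: at least one healthy pod means the service is healthy.
        if not any(svc["pod_health"]):
            continue
        routes[(svc["dns_prefix"], svc["path_segment"])] = svc["name"]
    return routes


# Hypothetical snapshot of what a poll of the cluster might return.
services = [
    {"name": "pim-integration", "dns_prefix": "api", "path_segment": "pim",
     "pod_health": [False, True]},
    {"name": "order-service", "dns_prefix": "api", "path_segment": "orders",
     "pod_health": [False, False]},
]
routes = routing_table(services)
print(routes)
```

Because the table is recomputed on every poll, a service that loses its last healthy pod simply drops out of the routes without anyone editing configuration.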
  18. The bigger picture

  19. Deployment abstraction through Dockerfile Dockerfile / image + labels are

    the lingua franca of our deployments I.e. development delivers containers, which go to production automatically Is that DevOps? Not everybody always knows the entire stack But would that scale? There is a tradeoff between complete understanding of all the moving parts and the speed of onboarding before productivity.
  20. A word on: Monitoring Most systems are pull based Need

    to install and configure an agent to read the necessary data Pull based is complex in dynamic (service) environments Currently experimenting with Datadog (https://www.datadoghq.com/) Services push to agent Agent sends data upstream Still learning
  21. Datadog setup (artist impression) Datadog agent deployed using Daemon Sets

    Services push metrics to the agent (found on service host name) Agent takes care of bringing data upstream SaaS solution for dashboard / alerts / etc.
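To make the push model concrete, here is a small sketch of a service sending a metric to a local agent over UDP using the StatsD-style datagram format that the Datadog agent accepts (`name:value|type|#tags`). The metric name, the tag and the helper names are made up for illustration; 8125 is the agent's default DogStatsD port, and the host would be whatever name the agent is reachable on in your cluster.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Format one StatsD-style datagram, with optional Datadog tags."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def push_metric(datagram, host="localhost", port=8125):
    """Fire-and-forget send over UDP; the agent forwards the data upstream."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical counter for created orders.
datagram = format_metric("orders.created", 1, "c", ["service:order-service"])
print(datagram)  # orders.created:1|c|#service:order-service
```

Calling `push_metric(datagram, host=...)` from the service is all that is needed on the application side; nothing has to discover or scrape the service, which is what makes push simpler in a dynamic environment.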
  22. A word on: Logging Logging is mostly push based (JVM

    logging uses Appenders) Currently using Stackdriver on GCP Limited functionality Poor out-of-the-box experience Moving to ELK Using a hosted ELK SaaS provider (http://logz.io/)
  23. A word on: Dependencies

  24. A word on: Dependencies Sync vs. Async It is mentioned

    that sync dependencies are just expensive method calls Probably because it’s called RPC It’s not about sync vs. async It’s about schemas And being evolution friendly with your schemas
  25. Schema evolution When adding a new field to an entity,

    it must be optional Removing fields from the schema can’t be done But a producer can stop populating optional fields Readers / consumers / clients must have sensible handling of empty optionals Usually default values Sometimes different behaviour Whether the entity comes in over RPC or a queue is a different concern
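The reader side of these rules can be sketched as follows: fields added after the first version carry defaults, unknown fields are ignored, and empty optionals fall back to the default. The `Product` entity and its field names are hypothetical; the sketch only illustrates the rules on the slide, whatever the transport.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Product:
    # Present since the first schema version.
    id: str
    name: str
    # Added later, so they must be optional and have a sensible default.
    currency: str = "EUR"
    season: Optional[str] = None


def read_product(message: dict) -> Product:
    """Build a Product from a decoded message, tolerating schema drift."""
    known = {f.name for f in fields(Product)}
    # Skip unknown fields (newer producer) and empty optionals (producer
    # stopped populating them) so the defaults above apply.
    usable = {k: v for k, v in message.items() if k in known and v is not None}
    return Product(**usable)


old_message = {"id": "p1", "name": "Coat"}
new_message = {"id": "p2", "name": "Scarf", "currency": "USD", "colour": "red"}
old_product = read_product(old_message)
new_product = read_product(new_message)
print(old_product, new_product)
```

The same `read_product` works whether the dictionary was decoded from an RPC response or from a message on a queue, which is the slide's point about sync vs. async being a separate concern.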
  26. Side: diagramming made simple

    $ python render.py
    $ dot -Tpdf -Gratio='fill' -Gsize='11.7,8.3!' \
    > -Gmargin='0' /tmp/dependencies.gv -O
    $ open /tmp/dependencies.gv.pdf
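The slides do not show `render.py` itself. A script of that kind could look roughly like the sketch below, which writes a Graphviz DOT file from a service dependency map. The function name `to_dot` and the dependency edges are invented for illustration (only the service names appear in the deck); the output path matches the one used in the commands above.

```python
def to_dot(dependencies):
    """Render a {service: [services it depends on]} map as Graphviz DOT source."""
    lines = ["digraph dependencies {", "  rankdir=LR;"]
    for service in sorted(dependencies):
        # Declare the node so services without edges still show up.
        lines.append(f'  "{service}";')
        for target in sorted(dependencies[service]):
            lines.append(f'  "{service}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Made-up edges between services named in the seed job list.
dependencies = {
    "order-service": ["product-service", "user-service"],
    "product-search-service": ["product-service"],
    "product-service": [],
}
dot_source = to_dot(dependencies)

if __name__ == "__main__":
    with open("/tmp/dependencies.gv", "w") as handle:
        handle.write(dot_source)
```

Keeping the dependency map as plain data means the diagram can be regenerated on every change instead of being maintained by hand.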
  27. Random Experiences and Learnings

  28. GKE Ingress Magic

    $ kubectl get ingress gatekit -o yaml
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      annotations:
        ingress.kubernetes.io/backends: '{"k8s-be-30535--cc30c14d35b2a243":"HEALTHY"}'
        <... lines snipped ...>
        ingress.kubernetes.io/url-map: k8s-um-default-gatekit--cc30c14d35b2a243
      creationTimestamp: 2016-12-19T16:03:07Z
      generation: 1
      name: gatekit
      namespace: default
      resourceVersion: "23182657"
      selfLink: /apis/extensions/v1beta1/namespaces/default/ingresses/gatekit
      uid: a1bdfb46-c604-11e6-a6ee-42010af00031
    spec:
      backend:
        serviceName: gatekit-public
        servicePort: 80
      tls:
      - secretName: gatekit-tls-certs
    status:
      loadBalancer:
        ingress:
        - ip: 130.211.27.124
  29. Creating a Kubernetes ingress on GKE actually creates a Google

    Cloud Load Balancer Also, the IP stays static for as long as you don’t change the service name that it’s tied to (Google’s load balancing is not DNS based, but BGP based as it should be) If you change the value of the Kubernetes secret that holds the TLS cert, it automatically reconfigures the load balancer (not bad, right?) Of course you can still configure your own ingress controllers (e.g. if you need pod stickiness) GKE Ingress Magic
  30. Two levels of autoscaling pod autoscaling within Kubernetes (HPA) Node

    autoscaling by GKE (scaling settings defined for instance group) With CPU usage bursting, sometimes a cluster node hangs / becomes unavailable This also stops the cluster autoscaler, so you don’t get new nodes Unfortunate because of the CPU burst in the first place Still looking into this Autoscaling sometimes doesn’t
  31. Using JSON for everything (including Kafka messages) Schemas defined in

    code Code attracts logic; schemas shouldn’t have logic JSON is more troublesome than anticipated Really easy to publish evolution incompatible messages on a queue Conscious decision to lower learning curve while bootstrapping development Will move to binary message format with formal schema definitions as soon as possible gRPC looks promising for synchronous dependencies Schema discipline
  32. Some Observations

  33. None
  34. None
  35. None
  36. None
  37. None
  38. None
  39. None
  40. Warning: shameless “we’re hiring” slide coming up...

  41. Vacancies Back end engineer (JVM, Python) Core Platform Customer Success

    Solutions Front end engineer (JavaScript, React / Redux) Infrastructure / deployment engineer Responsible for infra, Kafka + ES clusters and the build + deployment pipeline
  42. Questions?

  43. THANKS ! www.fashiontrade.com | hello@fashiontrade.com