Real World Microservices

Christoph Leiter

April 07, 2017

Transcript

  1. 2.

    Agenda

    1 Introduction
    2 Microservices
    3 Creating the infrastructure
    4 Implementation of microservices
    5 Lessons learned
  2. 5.

    starjack II

    Customers sign up and order their personal keycard
    Tickets of various lift operators can be booked and are available within seconds
    No more standing in line for a ticket
  3. 6.

    starjack III

    Created with microservices from the ground up
    Everything is a REST interface
    Frontend uses ES6 and React/Redux
    100% Open Source components
    Development infrastructure
      Self hosted GitLab (git, issues, Docker registry)
      Jenkins: builds are executed on DigitalOcean and pushed to the Docker registry
  4. 8.

    Why Microservices

    Architectural choice with advantages and drawbacks
      + Scalability, Reliability, Isolation
      − Operational complexity
    starjack communicates with many different 3rd party systems
      Uses multiple protocols like REST and SOAP
    Used to isolate services from each other
      If one service has problems or is down, not everything is affected
      Should one service get compromised, an attacker does not get access to all data
    Reduces complexity of individual services
      It’s easier to manage 12 services with 2k LOC than 1 service with 24k LOC
      When you change one system you can only break so much
  5. 9.

    Our Microservices I

    starjack-auth: Holds user data and handles logins
    starjack-dta: Gets tickets from lift operators using the DTA interface from SkiData AG
    starjack-axess: Gets tickets from lift operators using the Axess interface from Axess AG
    starjack-liftoperator: Manages our lift operators and works as a facade for dta and axess
    starjack-order: Verifies and processes customer orders. Creates invoices and sends emails
    starjack-payment: Handles payments with the Mpay24 PSP
  6. 10.

    Our Microservices II

    starjack-keycards: Orders new keycards from a 3rd party supplier and updates status once produced
    starjack-weather: Retrieves current weather for lift operators, currently using OpenWeatherMap
    starjack-maps: Used to get map locations for lift operators and for travel duration estimation. Uses Google Maps
    starjack-faq: Used to manage FAQ entries
    starjack-mail: Sends mails using the Mailgun mail service
    As the system grows we will have more services instead of growing one monolithic system without bounds
  7. 12.

    starjack Deployment

    starjack uses AWS as its deployment platform
      Offers tons of services, everything is fully automatable
      Very high reliability possible if your architecture supports it
      Great security properties because you get your own software defined network
    Everything is deployed in three availability zones
    We use many AWS services: EC2, S3, CloudFront, RDS, ElastiCache, SQS, Route 53, CloudWatch, ...
  8. 16.

    The Basics

    So, what do we need to get started with a microservices architecture?
    Very good fully automated infrastructure management is key, see Martin Fowler’s “You need to be this tall to use microservices”
    Fowler says you need to
      be able to rapidly provision servers
      have very good monitoring and logging infrastructure
      have deployment automated
    Bonus points if you can programmatically recreate your infrastructure from scratch
  9. 17.

    Infrastructure as Code I

    Terraform allows you to specify your whole infrastructure as simple HCL files
    Supports AWS, Google Cloud, Mailgun and dozens of other services
    When you run Terraform it will compare your current state with the desired state and apply the needed changes
    Creates a dependency graph between your resources and modifies them in the right order
    No more clicking around in the AWS web console
    Every change is documented and versioned, manual changes will be reverted on the next run
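The compare-and-apply behaviour described on this slide can be illustrated with a small sketch. This is not how Terraform is implemented; it only shows the idea of diffing current against desired state and ordering changes by dependency. The resource names, attributes, and the `plan` function are hypothetical.

```python
from graphlib import TopologicalSorter


def plan(current, desired, depends_on):
    """Return ordered (action, resource) pairs that move `current` to `desired`.

    Assumes every dependency named in `depends_on` is itself a desired resource.
    """
    # static_order() yields a resource only after everything it depends on.
    graph = {name: depends_on.get(name, ()) for name in desired}
    actions = []
    for name in TopologicalSorter(graph).static_order():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            # Drift, for example a manual change in the web console, is reverted.
            actions.append(("update", name))
    # Anything that exists but is no longer described gets removed.
    for name in sorted(set(current) - set(desired)):
        actions.append(("destroy", name))
    return actions


# Hypothetical state: the instance type was changed by hand, the subnet is missing.
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet": {"cidr": "10.0.1.0/24"},
    "instance": {"type": "t2.micro"},
}
current = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "instance": {"type": "t2.nano"},
    "old_bucket": {},
}
depends_on = {"subnet": ["vpc"], "instance": ["subnet"]}

actions = plan(current, desired, depends_on)
print(actions)
```

The untouched `vpc` produces no action, which mirrors the point that only the needed changes are applied.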
  10. 20.

    Infrastructure as Code IV

    We use Terraform for the whole basic infrastructure
      VPC, Firewall rules
      EC2 instances, ELB, S3, CloudFront, Route 53, SQS
      RDS Cluster, Redis Cluster
      Mailgun
    Very easy to use and works really well
  11. 21.

    Server Provisioning

    Now we have our bare EC2 instances running and we need to install some software on them
    Terraform is not a provisioner, so we need another tool to automate that
    We chose Ansible to provision our servers
      Works over SSH and doesn’t have requirements for the clients besides Python
    We tag our instances by role with Terraform. The inventory file is generated automatically by using ec2.py
    Configures EC2 instances, creates databases and users, defines DigitalOcean Jenkins Slave, ...
  12. 22.

    Service Scheduling

    We use a cluster as our microservice deployment platform. A scheduler is needed to make decisions on where in your cluster your services should run.
    We chose Nomad
      Easy to get started with in comparison to other schedulers
      Just one binary to install
      Relatively new and not as mature as other solutions
      Requires three servers, which should be deployed to different availability zones
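To make the scheduler's job concrete, here is a toy placement rule. It is an assumption for illustration only and not Nomad's actual algorithm: it picks the node with the least free memory that still fits the job. The node names and sizes are invented.

```python
def place(job_memory_mb, nodes):
    """Choose a node for a job and reserve its memory; return None if nothing fits.

    `nodes` maps node name to free memory in MB and is updated in place.
    """
    candidates = [(free, name) for name, free in nodes.items() if free >= job_memory_mb]
    if not candidates:
        return None
    # Best fit: the tightest node that can still hold the job.
    free, name = min(candidates)
    nodes[name] = free - job_memory_mb
    return name


# Hypothetical worker nodes with their free memory in MB.
nodes = {"node-a": 512, "node-b": 2048, "node-c": 1024}

first = place(600, nodes)    # fits on node-c and node-b, node-c is tighter
second = place(600, nodes)   # node-c is now too full
third = place(4000, nodes)   # larger than any node
print(first, second, third)
```

A real scheduler also weighs CPU, constraints such as availability zones, and spreading instances of one service across nodes.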
  13. 23.

    Service Discovery

    Now that our services are deployed somewhere we need to find them
    That’s the job of a service discovery service
    We chose Consul because it plays well together with Nomad
    Whenever a new service is deployed with Nomad it will register its endpoints in Consul
    A tool called consul-template updates the nginx configuration file as soon as changes happen and reloads nginx
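What consul-template does for nginx can be approximated as "render a config from the current set of registered endpoints, and reload only when the result changed". The sketch below imitates that idea in plain Python; it does not use Consul or the consul-template syntax, and the addresses, ports, and function names are made up.

```python
def render_upstreams(services):
    """Render one nginx `upstream` block per service from (host, port) endpoints."""
    blocks = []
    for name in sorted(services):
        servers = "\n".join(f"    server {host}:{port};" for host, port in services[name])
        blocks.append(f"upstream {name} {{\n{servers}\n}}")
    return "\n\n".join(blocks)


def needs_reload(previous_config, services):
    """Return (changed, new_config); nginx only needs a reload when changed is True."""
    new_config = render_upstreams(services)
    return new_config != previous_config, new_config


# Hypothetical registry content after two instances of one service registered.
registry = {"starjack-auth": [("10.0.1.12", 23001), ("10.0.2.7", 23044)]}
changed, config = needs_reload("", registry)
print(config)

# A third instance appears, so the rendered file differs and a reload is due.
registry["starjack-auth"].append(("10.0.3.9", 23102))
changed_again, config2 = needs_reload(config, registry)
```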
  14. 24.

    Deployment I

    Nomad needs a job description file to know what to schedule where
    Nginx needs to be configured so it knows which endpoints should be routed to which services
    We deploy our services using our small custom YAML DSL which is processed by Ansible
  15. 26.

    Deployment III

    Deployment steps
      1 Modify services DSL file
      2 Trigger deployment with Ansible
      3 Ansible creates Nomad job specification files and triggers scheduling
      4 Nomad does a rolling update of the services
      5 Nomad worker nodes pull new Docker images and start them
      6 As new versions are rolled out Consul and Nginx get updated
  16. 29.

    Microservices Details

    Microservices are written in Kotlin and are based on Spring Boot & Hibernate
    Each service has its own git repository
    One common library which services can use
      Be careful not to introduce unwanted dependencies between services!
      Treat as API and don’t break it
      Only used for cross cutting concerns
    Communication between microservices by using REST for synchronous and a queue for asynchronous communication
    Not covered in detail, your implementation will be different anyways :)
  17. 30.

    Authentication

    User requests need to be authenticated
    We don’t want to query the authentication service for every request
      Would create a lot of load and a potential bottleneck
    Instead we use JSON Web Tokens (JWT)
      Authentication service creates a cryptographically signed token for the user using its private key
      Services have a public key and can check whether the token is legitimate
      Since there’s no invalidation of a token we use a low TTL and a refresh mechanism
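The token flow can be sketched with the standard library alone. Note the substitution: the slide describes an asymmetric signature (private key signs, public key verifies), whereas this sketch uses a shared secret with HMAC-SHA256 (the JWT `HS256` scheme) because Python's standard library has no RSA or ECDSA. The low TTL check works the same way in both variants. The function names, the secret, and the claim values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data):
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret, signing_input):
    return _b64(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())


def issue(user_id, secret, ttl_seconds, now=None):
    """Create a signed token that expires `ttl_seconds` after `now`."""
    now = int(time.time()) if now is None else now
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": user_id, "exp": now + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify(token, secret, now=None):
    """Return the claims if the signature is valid and the token has not expired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _sign(secret, header + "." + payload)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(_unb64(payload))
    # No revocation list: a short lifetime bounds how long a stolen token works.
    if claims["exp"] <= now:
        return None
    return claims


secret = b"demo-secret-not-for-production"
token = issue("user-42", secret, ttl_seconds=300, now=1000)
fresh = verify(token, secret, now=1100)
expired = verify(token, secret, now=1300)
forged = verify(token, b"some-other-key", now=1100)
```

Because verification needs no call to the authentication service, each service can check requests locally; the client has to obtain a new token through the refresh mechanism before the old one expires.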
  18. 31.
  19. 32.

    Interservice Communication

    If an immediate response is needed communication is simply done via REST HTTP requests
      Passes Authentication header if it’s required to identify the user
    For asynchronous communication we use a queue
      Decouples services
      Messages don’t get lost if other service is down
      Automatic retries
      Increases reliability of your system
    Coordination between multiple instances of the same type is done with Redis
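The retry property of queue-based communication can be shown with an in-memory stand-in. This is only an illustration: a managed queue such as SQS handles redelivery itself, and the `drain` function, the message names, and the attempt limit here are assumptions.

```python
from collections import deque


def drain(queue, handler, max_attempts=3):
    """Deliver every (message, attempts) pair; re-queue failures up to max_attempts.

    Returns the messages that still failed after the last allowed attempt.
    """
    dead = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            if attempts + 1 < max_attempts:
                # The message is not lost, it simply waits for another try.
                queue.append((message, attempts + 1))
            else:
                dead.append(message)
    return dead


processed = []
remaining_failures = {"order-2": 1}  # the consumer is "down" once for this message


def handler(message):
    if remaining_failures.get(message, 0) > 0:
        remaining_failures[message] -= 1
        raise ConnectionError("consumer unavailable")
    processed.append(message)


queue = deque([("order-1", 0), ("order-2", 0)])
dead = drain(queue, handler)
print(processed, dead)
```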
  20. 33.

    Logging I

    When you have dozens of services running you need a good centralized logging solution
    AWS offers CloudWatch Logs
    Docker can natively log to CloudWatch
    Every log message should only be one event (one line)
    We use awslogs as a “remote grep” tool for CloudWatch
  21. 35.

    Logging III

    Logs are useless if nobody looks at them. You need a quick overview and notifications for errors
    We have an AWS Lambda function which subscribes to our application logs
    Filters logs for events we are interested in
      Errors and warnings
      Whitelisted info events
    Uses Slack API to push log messages to Slack
      Different channels based on severity and type
      Adds Slack notifications for errors
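The filtering and routing step of such a log subscriber could look roughly like the following. It deliberately leaves out the Lambda handler and the Slack call and shows only the decision logic; the event shape, the whitelist entries, and the channel names are hypothetical.

```python
# Hypothetical info events that are worth forwarding despite their low severity.
WHITELISTED_INFO = {"order.completed", "user.registered"}


def route(event):
    """Return the Slack channel for a log event, or None if it should be dropped."""
    level = str(event.get("level", "")).upper()
    if level == "ERROR":
        return "#alerts-errors"
    if level in ("WARN", "WARNING"):
        return "#alerts-warnings"
    if level == "INFO" and event.get("type") in WHITELISTED_INFO:
        return "#events"
    return None


events = [
    {"level": "ERROR", "type": "payment.failed", "message": "PSP timeout"},
    {"level": "INFO", "type": "order.completed", "message": "order 1234 paid"},
    {"level": "INFO", "type": "cache.hit", "message": "weather cached"},
    {"level": "warn", "type": "mail.retry", "message": "retrying delivery"},
]
channels = [route(event) for event in events]
print(channels)
```

Keeping the routing a pure function makes it easy to test without any AWS or Slack credentials.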
  22. 37.

    Monitoring

    We need external monitoring to notify us in case a system is down
    We expose simple status endpoints from our services and let StatusCake monitor those
    If a service is unreachable StatusCake uses Pushover to send us a push notification
  23. 39.

    Log Everything

    When things fail you need to be able to debug the problem
    It helps to log as much as you can
      Application logs
      Web server access/error logs
      Queue messages
      Linux system logs
      Frontend logs
    For application logs think about how you will search for messages later, i.e. include relevant data
    For system logs use blacklists for messages you are not interested in and get notified about everything you don’t expect
  24. 40.

    Distributed Systems

    A distributed system might not behave exactly as you’d think, see “Fallacies of Distributed Computing”
    Even when something has a failure likelihood of << 1%, if it runs thousands of times it will eventually go wrong
    Expect that things will go wrong and make your system as robust as possible
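The claim about small failure likelihoods follows from basic probability: if each run fails independently with probability p, the chance of at least one failure in n runs is 1 − (1 − p)^n. The numbers below are illustrative, and independence between runs is an assumption.

```python
def p_at_least_one_failure(p, runs):
    """Probability of one or more failures in `runs` independent attempts."""
    return 1 - (1 - p) ** runs


# A 0.1% failure chance per call, repeated 5000 times.
risk = p_at_least_one_failure(0.001, 5000)
print(f"{risk:.2%}")
```

With these example values the result is above 99%, so a failure that looks negligible per call is close to certain at volume.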
  25. 41.

    DB Connections

    Usually DBs have a connection limit set
    Once the limit is reached you won’t get a new connection
    Think about how many connections you will have, it might be more than you think
    connections = services × instances × poolsize
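A quick calculation shows how fast the formula adds up. The figures are invented for illustration (11 services, 3 instances each, a pool of 10, and a database limit of 200) and are not taken from the talk.

```python
def max_connections(services, instances, pool_size):
    """Worst-case open connections if every pool is fully used."""
    return services * instances * pool_size


# Hypothetical sizing: one instance per availability zone, default-sized pools.
total = max_connections(services=11, instances=3, pool_size=10)
db_limit = 200  # assumed connection limit of the database
print(total, "connections needed,", db_limit, "allowed:", "OK" if total <= db_limit else "too many")
```

In this example the pools could request 330 connections against a limit of 200, so either the pool size or the limit would have to change.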
  26. 42.

    System Resources

    Really hard to know your service memory requirements beforehand
    Services will probably consume more resources than you assume as they also need to have their runtime environment in memory
    No memory sharing/deduplication if you use Docker
    Be careful if you do hard memory limit enforcement
  27. 43.

    Summary

    The biggest challenge when doing microservices isn’t programming microservices but the infrastructure
    Everything has to be automated. You need:
      Automatic infrastructure setup: Terraform, CloudFormation, Heat, ...
      A provisioner: Ansible, Puppet, Chef, ...
      Automatic builds: Jenkins, GitLab, Travis CI, ...
      A scheduler: Nomad, Kubernetes, Docker Swarm, ...
      A discovery service: Consul, etcd, ZooKeeper, ...
      An easy way to deploy services
      Centralized logging: CloudWatch, ELK, Graylog, ...
      Monitoring: StatusCake, Nagios, Sensu, ...
  28. 44.

    Summary II

    You only need to invest in infrastructure knowledge once
    Makes development and evolution of your system easier
    Dramatically reduces complexity of individual services
    You get rewarded with a distributed highly reliable system
  29. 45.

    Kotlin Meetup

    If you’re interested in Kotlin please join our meetup!
    Next meetup is on April 18th.
  30. 47.

    References

    starjack: https://starjack.at/
    MicroservicePrerequisites: https://martinfowler.com/bliki/MicroservicePrerequisites.html
    Terraform: https://www.terraform.io/
    Ansible: https://www.ansible.com/
    Nomad: https://www.nomadproject.io/
    Consul: https://www.consul.io/
    consul-template: https://github.com/hashicorp/consul-template
    Fallacies of distributed computing: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing