
Real World Microservices

Christoph Leiter

April 07, 2017

Transcript

  1. Real World Microservices
    Christoph Leiter
    Vienna Microservices Meetup, April 6th 2017


  2. Agenda
    1 Introduction
    2 Microservices
    3 Creating the infrastructure
    4 Implementation of microservices
    5 Lessons learned

  3. Introduction


  4. starjack
    A platform for buying ski tickets online:

  5. starjack II
    Customers sign up and order their personal keycard
    Tickets from various lift operators can be booked and are
    available within seconds
    No more standing in line for a ticket

  6. starjack III
    Created with microservices from the ground up
    Everything is a REST interface
    Frontend uses ES6 and React/Redux
    100% Open Source components
    Development infrastructure
    Self-hosted GitLab (git, issues, Docker registry)
    Jenkins - builds are executed on DigitalOcean and the resulting
    images are pushed to the Docker registry

  7. Microservices


  8. Why Microservices
    Architectural choice with advantages and drawbacks
    + Scalability, Reliability, Isolation
    − Operational complexity
    starjack communicates with many different 3rd party systems
    Uses multiple protocols like REST and SOAP
    Used to isolate services from each other
    If one service has problems or is down, not everything is affected
    Should one service get compromised, an attacker does not get
    access to all data
    Reduces complexity of individual services
    It’s easier to manage 12 services with 2k LOC than 1 service
    with 24k LOC
    When you change one system you can only break so much

  9. Our Microservices I
    starjack-auth Holds user data and handles logins
    starjack-dta Gets tickets from lift operators using the
    DTA interface from SkiData AG
    starjack-axess Gets tickets from lift operators using the
    Axess interface from Axess AG
    starjack-liftoperator Manages our lift operators and works as a
    facade for dta and axess
    starjack-order Verifies and processes customer orders.
    Creates invoices and sends emails
    starjack-payment Handles payments with the Mpay24 PSP

  10. Our Microservices II
    starjack-keycards Orders new keycards from a 3rd party
    supplier and updates status once produced
    starjack-weather Retrieves current weather for lift operators,
    currently using OpenWeatherMap
    starjack-maps Used to get map locations for lift operators
    and for travel duration estimation. Uses
    Google Maps
    starjack-faq Used to manage FAQ entries
    starjack-mail Sends mails using Mailgun mail service
    As the system grows, we will have more services instead of growing
    one monolithic system without bounds

  11. Our Microservices III

  12. starjack Deployment
    starjack uses AWS as its deployment platform
    Offers tons of services, everything is fully automatable
    Very high reliability possible if your architecture supports it
    Great security properties because you get your own software
    defined network
    Everything is deployed in three availability zones
    We use many AWS services: EC2, S3, CloudFront, RDS,
    ElastiCache, SQS, Route53, CloudWatch, . . .

  13. Deployment II
    Logical view on AWS

  14. Deployment III
    Deployment view on AWS

  15. Creating the Infrastructure


  16. The Basics
    So, what do we need to get started with a microservices
    architecture?
    Very good fully automated infrastructure management is key,
    see Martin Fowler’s “You need to be this tall to use
    microservices”
    Fowler says you need to
    be able to rapidly provision servers
    have very good monitoring and logging infrastructure
    have deployment automated
    Bonus points if you can programmatically recreate your
    infrastructure from scratch

  17. Infrastructure as Code I
    Terraform allows you to specify your whole infrastructure as
    simple HCL files
    Supports AWS, Google Cloud, Mailgun and dozens of other
    services
    When you run Terraform it will compare your current state
    with the desired state and apply the needed changes
    Creates a dependency graph between your resources and
    modifies them in the right order
    No more clicking around in the AWS web console
    Every change is documented and versioned; manual changes
    will be reverted on the next run

  18. Infrastructure as Code II

  19. Infrastructure as Code III

  20. Infrastructure as Code IV
    We use Terraform for the whole basic infrastructure
    VPC, Firewall rules
    EC2 instances, ELB, S3, CloudFront, Route 53, SQS
    RDS Cluster, Redis Cluster
    Mailgun
    Very easy to use and works really well

  21. Server Provisioning
    Now we have our bare EC2 instances running and we need to
    install some software on them
    Terraform is not a provisioner – we need another tool to
    automate that
    We chose Ansible to provision our servers
    Works over SSH and has no requirements for the clients
    besides Python
    We tag our instances by role with Terraform; the inventory
    file is generated automatically using ec2.py
    Configures EC2 instances, creates databases and users, sets up
    the DigitalOcean Jenkins slave, . . .

  22. Service Scheduling
    We use a cluster as our microservice deployment platform. A
    scheduler is needed to make decisions on where in your cluster your
    services should run.
    We chose Nomad
    In comparison to other schedulers easy to get started
    Just one binary to install
    Relatively new and not as mature as other solutions
    Requires three servers, which should be deployed to different
    availability zones

  23. Service Discovery
    Now that our services are deployed somewhere we need to find
    them
    That’s the job of a service discovery service
    We chose Consul because it plays well with Nomad
    Whenever a new service is deployed with Nomad it will
    register its endpoints in Consul
    A tool called consul-template updates the nginx configuration
    file as soon as changes happen and reloads nginx

  24. Deployment I
    Nomad needs a job description file to know what to schedule
    where
    Nginx needs to be configured so it knows which endpoints
    should be routed to which services
    We deploy our services using our small custom YAML DSL
    which is processed by Ansible

  25. Deployment II

  26. Deployment III
    Deployment steps
    1 Modify the services DSL file
    2 Trigger the deployment with Ansible
    3 Ansible creates Nomad job specification files and triggers scheduling
    4 Nomad does a rolling update of the services
    5 Nomad worker nodes pull the new Docker images and start them
    6 As new versions are rolled out, Consul and Nginx get updated

  27. Infrastructure Components
    In summary we have these infrastructure components

  28. Implementation of
    Microservices


  29. Microservices Details
    Microservices are written in Kotlin and are based on
    Spring Boot & Hibernate
    Each service has its own git repository
    One common library which services can use
    Be careful not to introduce unwanted dependencies between
    services!
    Treat it as an API and don't break it
    Only used for cross-cutting concerns
    Communication between microservices uses REST for
    synchronous and a queue for asynchronous communication
    Not covered in detail, your implementation will be different
    anyway :)
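
    As a rough illustration of what such a service looks like, here is a
    minimal Spring Boot sketch in Kotlin; the FAQ-style endpoint, DTO and
    class names are invented for this example and are not the actual
    starjack code:

        import org.springframework.boot.autoconfigure.SpringBootApplication
        import org.springframework.boot.runApplication
        import org.springframework.web.bind.annotation.GetMapping
        import org.springframework.web.bind.annotation.PathVariable
        import org.springframework.web.bind.annotation.RestController

        @SpringBootApplication
        class FaqApplication

        fun main(args: Array<String>) {
            runApplication<FaqApplication>(*args)
        }

        // Hypothetical resource; each service exposes its data as JSON over REST
        data class FaqEntryDto(val id: Long, val question: String, val answer: String)

        @RestController
        class FaqController {
            // In the real services this would be backed by Hibernate entities
            private val entries = mapOf(
                1L to FaqEntryDto(1, "How fast are tickets available?", "Within seconds")
            )

            @GetMapping("/faq/{id}")
            fun byId(@PathVariable id: Long): FaqEntryDto =
                entries[id] ?: throw NoSuchElementException("unknown FAQ entry $id")
        }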

  30. Authentication
    User requests need to be authenticated
    We don’t want to query the authentication service for every
    request
    Would create a lot of load and a potential bottleneck
    Instead we use JSON Web Tokens (JWT)
    Authentication service creates cryptographically signed token
    for the user using its private key
    Services have a public key and can check whether the token is
    legitimate
    Since there’s no invalidation of a token we use a low TTL and
    a refresh mechanism
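
    A minimal sketch of the verification side, assuming the jjwt library
    and an RSA key pair; the library choice and names are illustrative,
    not necessarily what starjack uses:

        import io.jsonwebtoken.Claims
        import io.jsonwebtoken.Jwts
        import java.security.PublicKey

        // Each service only needs the auth service's public key; no call to
        // starjack-auth is necessary per request
        fun verifyToken(token: String, authPublicKey: PublicKey): Claims =
            Jwts.parserBuilder()
                .setSigningKey(authPublicKey)
                .build()
                .parseClaimsJws(token)  // throws if the signature is invalid or the token has expired
                .body

        // The short TTL lives in the token's "exp" claim; clients call a refresh
        // endpoint on the auth service before it runs out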

  31. JWT
    31

    View Slide

  32. Interservice Communication
    If an immediate response is needed, communication is simply
    done via REST HTTP requests
    Passes the Authorization header if it's required to identify the
    user
    For asynchronous communication we use a queue
    Decouples services
    Messages don't get lost if the other service is down
    Automatic retries
    Increases reliability of your system
    Coordination between multiple instances of the same type is
    done with Redis
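
    As an illustration, a synchronous call could forward the caller's
    token like this; the class names and target URL are made up, and in
    reality requests are routed via nginx/Consul:

        import org.springframework.http.HttpEntity
        import org.springframework.http.HttpHeaders
        import org.springframework.http.HttpMethod
        import org.springframework.web.client.RestTemplate

        data class LiftOperator(val id: Long = 0, val name: String = "")

        // Synchronous request/response: forward the user's JWT so the called
        // service can authorize the request itself
        fun fetchLiftOperator(id: Long, userJwt: String): LiftOperator? {
            val headers = HttpHeaders().apply { setBearerAuth(userJwt) }
            val response = RestTemplate().exchange(
                "http://starjack-liftoperator/liftoperators/$id",  // hypothetical URL
                HttpMethod.GET,
                HttpEntity<Void>(headers),
                LiftOperator::class.java
            )
            return response.body
        }

        // For asynchronous work the service would instead put a message on a
        // queue (SQS) and let the consumer retry if processing fails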

  33. Logging I
    When you have dozens of services running you need a good
    centralized logging solution
    AWS offers CloudWatch Logs
    Docker can natively log to CloudWatch
    Every log message should only be one event (one line)
    We use awslogs as a “remote grep” tool for CloudWatch

  34. Logging II
    Allows searching in multiple log groups
    Time-based restrictions with -s and -e

  35. Logging III
    Logs are useless if nobody looks at them. You need a quick
    overview and notifications for errors
    We have an AWS Lambda function which subscribes to our
    application logs
    Filters logs for events we are interested in
    Errors and warnings
    Whitelisted info events
    Uses the Slack API to push log messages to Slack
    Different channels based on severity and type
    Adds Slack notifications for errors
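
    A rough sketch of such a Lambda handler in Kotlin, assuming the
    aws-lambda-java-events and Jackson libraries and a Slack incoming
    webhook; the filtering here is simplified compared to the real
    function:

        import com.amazonaws.services.lambda.runtime.Context
        import com.amazonaws.services.lambda.runtime.RequestHandler
        import com.amazonaws.services.lambda.runtime.events.CloudWatchLogsEvent
        import com.fasterxml.jackson.databind.ObjectMapper
        import java.net.URI
        import java.net.http.HttpClient
        import java.net.http.HttpRequest
        import java.net.http.HttpResponse
        import java.util.Base64
        import java.util.zip.GZIPInputStream

        class LogToSlackHandler : RequestHandler<CloudWatchLogsEvent, Unit> {
            private val mapper = ObjectMapper()
            private val http = HttpClient.newHttpClient()
            private val webhook = System.getenv("SLACK_WEBHOOK_URL")  // hypothetical configuration

            override fun handleRequest(event: CloudWatchLogsEvent, context: Context) {
                // CloudWatch Logs delivers the subscribed events as base64-encoded, gzipped JSON
                val compressed = Base64.getDecoder().decode(event.awsLogs.data)
                val json = GZIPInputStream(compressed.inputStream()).bufferedReader().readText()
                val logEvents = mapper.readTree(json).get("logEvents")

                logEvents
                    .map { it.get("message").asText() }
                    .filter { it.contains("ERROR") || it.contains("WARN") }  // simplified filter
                    .forEach { postToSlack(it) }
            }

            private fun postToSlack(message: String) {
                val body = mapper.writeValueAsString(mapOf("text" to message))
                val request = HttpRequest.newBuilder(URI.create(webhook))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build()
                http.send(request, HttpResponse.BodyHandlers.discarding())
            }
        }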

  36. Logging IV

  37. Monitoring
    We need external monitoring to notify us in case a system is
    down
    We expose simple status endpoints from our services and let
    StatusCake monitor those
    If a service is unreachable StatusCake uses Pushover to send
    us a push notification
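
    The status endpoint itself can be as small as a controller that
    returns a constant; the path and payload below are illustrative:

        import org.springframework.web.bind.annotation.GetMapping
        import org.springframework.web.bind.annotation.RestController

        @RestController
        class StatusController {
            // StatusCake only needs an HTTP 200 to consider the service up;
            // a failing instance simply stops answering
            @GetMapping("/status")
            fun status(): Map<String, String> = mapOf("status" to "UP")
        }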

  38. Lessons Learned


  39. Log Everything
    When things fail you need to be able to debug the problem
    It helps to log as much as you can
    Application logs
    Web server access/error logs
    Queue messages
    Linux system logs
    Frontend logs
    For application logs think about how you will search for
    messages later, i.e. include relevant data (see the sketch below)
    For system logs use blacklists for messages you are not
    interested in, and get notified for everything you don't expect
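
    For example, a log line that carries the identifiers you will later
    search for could look like this with SLF4J (field names are made up):

        import org.slf4j.LoggerFactory

        private val log = LoggerFactory.getLogger("OrderService")

        fun logOrderProcessed(orderId: Long, userId: Long, liftOperatorId: Long) {
            // One event per line, with the ids you will grep for later
            log.info("order processed orderId={} userId={} liftOperatorId={}",
                orderId, userId, liftOperatorId)
        }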

  40. Distributed Systems
    A distributed system might not exactly behave as you’d think,
    see “Fallacies of Distributed Computing”
    Even when something has a failure likelihood of << 1%, if it
    runs thousands of times it will eventually go wrong
    Expect that things will go wrong and make your system as
    robust as possible
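
    To make that concrete with illustrative numbers: an operation that
    fails with probability 0.1% per call fails at least once in 10,000
    calls with probability 1 - 0.999^10000 ≈ 99.995%, so the "rare"
    failure is practically guaranteed at scale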

  41. DB Connections
    Usually DBs have a connection limit set
    Once the limit is reached you won’t get a new connection
    Think about how many connections you will have, it might be
    more than you think
    connections = services × instances × poolsize
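
    A quick worked example with made-up numbers: 12 services × 3
    instances each × a pool size of 10 already means 360 connections,
    well above e.g. PostgreSQL's default max_connections of 100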

  42. System Resources
    Really hard to know your service memory requirements
    beforehand
    Services will probably consume more resources than you
    assume as they also need to have their runtime environment
    in memory
    No memory sharing/deduplication if you use Docker
    Be careful if you do hard memory limit enforcement

  43. Summary
    The biggest challenge when doing microservices isn’t
    programming microservices but the infrastructure
    Everything has to be automated. You need:
    Automatic infrastructure setup: Terraform, CloudFormation,
    Heat, . . .
    A provisioner: Ansible, Puppet, Chef, . . .
    Automatic builds: Jenkins, GitLab, Travis CI, . . .
    A scheduler: Nomad, Kubernetes, Docker Swarm, . . .
    A discovery service: Consul, etcd, Zookeeper, . . .
    An easy way to deploy services
    Centralized logging: CloudWatch, ELK, Graylog, . . .
    Monitoring: StatusCake, Nagios, Sensu, . . .

  44. Summary II
    You only need to invest in infrastructure knowledge once
    Makes development and evolution of your system easier
    Dramatically reduces complexity of individual services
    You get rewarded with a distributed highly reliable system

  45. Kotlin Meetup
    If you’re interested in Kotlin please join our meetup!
    Next meetup is on April 18th.

  46. End
    Thanks for your attention
    Questions?
    Contact me at [email protected]

  47. References
    starjack: https://starjack.at/
    MicroservicePrerequisites: https://martinfowler.com/bliki/MicroservicePrerequisites.html
    Terraform: https://www.terraform.io/
    Ansible: https://www.ansible.com/
    Nomad: https://www.nomadproject.io/
    Consul: https://www.consul.io/
    consul-template: https://github.com/hashicorp/consul-template
    Fallacies of distributed computing: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing