
A Journey Through Wonderland


In this talk we're going to introduce you to Wonderland, Jimdo's in-house PaaS for microservices. Wonderland provides Jimdo developers with an API and other tools that make deploying Dockerized applications easy. Our PaaS utilizes Amazon ECS to run Docker containers on a CoreOS cluster in EC2. Besides those basic building blocks, we integrate different external services for metrics and logging.

We'll show how Wonderland works under the hood, what we like and what we don't like about the current setup, and how our teams are using the platform for running services in production.

(Talk given at ContainerDays 2016: http://www.containerdays.de/)

Mathias Lafeldt

June 28, 2016

Transcript

  1. A Journey Through Wonderland
    Paul Seiffert, Mathias Lafeldt

  2. The Purpose of Wonderland


  4. How we got here
    ● Took Jimdo 5 years to migrate core infrastructure from bare metal to AWS
    ● Teams started to love the cloud
    ● Many experiments in different AWS accounts
    ● “Reinvented” production stacks

  5. Werkzeugschmiede Team
    ● Founded to solve common infrastructure problems of Jimdo teams
    ● Provides standard platform that is reliable and simple to use: Wonderland
    ● Allows Jimdo developers to focus on product development

  6. Wonderland’s History


  7. Wonderland 101


  8. PaaS allowing Jimdo developers to run their Dockerized applications

  9. Features
    ● Long-running stateless services
      ○ DNS, load balancing, health checks, auto scaling, …
    ● One-off tasks and cron jobs
    ● Centralized logging and metrics collection via external providers

  10. Interfaces
    ● APIs
    ● CLI tool wl
    ● Chatbot Alice
    ● Docker registry
    ● Vault
    ● No SSH access

  11. Internal service provider
    ● SLA
    ● Status page
    ● Documentation
    ● Workshops
    ● Use-case-driven development

  12. Wonderland Internals


  13. We run...
    ● AWS infrastructure
    ● Services providing our APIs


  14. AWS Infrastructure
    ● Networking
    ● Cluster of EC2 instances
    ● Jenkins
    ● Route 53, DynamoDB, S3, SQS, SNS, ...


  15. “Crims” Cluster
    ● Runs user applications + system services
    ● EC2 auto-scaling group
    ● Providing resources to ECS
    ● CoreOS


  16. AWS ECS
    AWS EC2 Auto-Scaling Group



  18. Two-Dimensional Auto-Scaling
    ● Services (based on resource consumption)
    ● Cluster (based on available slots)
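
The cluster dimension of this scheme can be sketched in Python. This is a minimal illustration under stated assumptions, not Wonderland's actual autoscaler: the idea of a fixed-size "slot", the `Instance` fields, and the `min_slots` / `max_slots` thresholds are all hypothetical. The function name only echoes the DesiredClusterSizeDelta metric that appears in the deck.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    cpu_free: int     # unreserved CPU units on the instance (assumed field)
    memory_free: int  # unreserved memory in MiB (assumed field)


def available_slots(instances, slot_cpu, slot_memory):
    """Count how many more slot-sized tasks would fit on the cluster."""
    return sum(
        min(i.cpu_free // slot_cpu, i.memory_free // slot_memory)
        for i in instances
    )


def desired_cluster_size_delta(instances, slot_cpu, slot_memory,
                               min_slots, max_slots, slots_per_instance):
    """Return how many instances to add (> 0) or remove (< 0)."""
    slots = available_slots(instances, slot_cpu, slot_memory)
    if slots < min_slots:
        # Too little headroom: add enough instances to reach min_slots.
        return math.ceil((min_slots - slots) / slots_per_instance)
    if slots > max_slots:
        # Too much headroom: remove only whole instances' worth of slots.
        return -((slots - max_slots) // slots_per_instance)
    return 0
```

Scaling the cluster on free slots rather than on raw CPU usage keeps room for services to scale out, which is the other dimension on the slide. The service dimension would be driven separately by each service's resource consumption.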



  20. [Chart, 1 week: AWS/AutoScaling GroupDesiredCapacity and Wonderland/ECS DesiredClusterSizeDelta]

  21. A Crims Cluster Instance
    [Diagram: ECS Agent, Log Forwarder, Datadog Agent, and Services A, B, and C on one instance under AWS ECS, with two ELBs. Port labels: HTTP :80, HTTPS :443, HTTP :11411, TCP :1234, TCP :11412]

  22. Infrastructure Development
    ● Infrastructure as code
    ● CloudFormation and Ansible
    ● Applied by a Central State Enforcer
    ● Workflow based on GitHub pull requests
    ● Automated rollout to production

  23. QA
    ● We test everything
    ● Unit, integration, and system tests
    ● Tests in staging environment
    ● Staging is set up from scratch every week
    ● Periodic GameDays

  24. Our Services
    ● provide APIs
    ● deploy other services
    ● are Wonderland services


  25. [Architecture diagram with the components: SQS Queue, Status Check Service, AutoScaler, Deployer API, (Dash-)Boards, Oraculum (Logs), AWS Route53, AWS Application AutoScaling, Notifications, AWS SNS, Alice (Chatbot), Deployer Worker, WL (CLI Tool), AWS S3]

  26. Service Configuration
    $ cat wonderland-autoscaler/wonderland.yaml
    ---
    scale: 2
    components:
    - name: autoscaler
      image: registry.example.com/wonderland-autoscaler:v1.0.3
      env:
        DYNAMODB_TABLE_NAME: wonderland-autoscaling-configs
    endpoint:
      domain: autoscaler.example.com
      load-balancer:
        healthcheck:
          path: /v1/health
        ports:
        - port: 443
          protocol: HTTPS
      component: autoscaler
      port: 80
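
To make the shape of this configuration concrete, here is a small Python sketch that mirrors it as a dictionary and runs two sanity checks. It is only an illustration: the nesting assumes that `component` and `port` sit directly under `endpoint` (the transcript does not preserve indentation), and the `validate` function and its rules are hypothetical, not part of the `wl` tool.

```python
# The slide's wonderland.yaml expressed as a Python dict (nesting is assumed).
config = {
    "scale": 2,
    "components": [
        {
            "name": "autoscaler",
            "image": "registry.example.com/wonderland-autoscaler:v1.0.3",
            "env": {"DYNAMODB_TABLE_NAME": "wonderland-autoscaling-configs"},
        }
    ],
    "endpoint": {
        "domain": "autoscaler.example.com",
        "load-balancer": {
            "healthcheck": {"path": "/v1/health"},
            "ports": [{"port": 443, "protocol": "HTTPS"}],
        },
        "component": "autoscaler",
        "port": 80,
    },
}


def validate(config):
    """Return a list of problems found in the config (hypothetical rules)."""
    errors = []
    scale = config.get("scale")
    if not isinstance(scale, int) or scale < 1:
        errors.append("scale must be a positive integer")
    names = [c.get("name") for c in config.get("components", [])]
    if not names:
        errors.append("at least one component is required")
    endpoint = config.get("endpoint")
    if endpoint is not None and endpoint.get("component") not in names:
        # The endpoint must route traffic to one of the declared components.
        errors.append("endpoint.component must name a declared component")
    return errors


print(validate(config))
```

Read this way, the load balancer accepts HTTPS on port 443 and forwards to port 80 of the `autoscaler` component, with `/v1/health` as the health check path.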


  27. Deploy it!
    $ wl deploy autoscaler -f wonderland-autoscaler/wonderland.yaml
    autoscaler/1466583476 This is try 1
    autoscaler/1466583476 Updating ELB autoscaler-1466437217
    autoscaler/1466583476 Configuring health check HTTP:11011/v1/health
    autoscaler/1466583476 Enabling cross-zone load balancing
    autoscaler/1466583476 Configuring connection draining with a timeout of 180s
    autoscaler/1466583476 Not enabling access log
    autoscaler/1466583476 Letting autoscaler.example.com point to autoscaler-1363526915.eu-west-1.elb.amazonaws.com
    autoscaler/1466583476 Registered new ECS TaskDefinition (autoscaler:58) for service autoscaler
    autoscaler/1466583476 Updating ECS service autoscaler-1466437217
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 180s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 170s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 160s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 150s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 140s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 130s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 120s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 110s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 100s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 90s)
    autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 80s)
    autoscaler/1466583476 Rolling update completed successfully.
    autoscaler/1466583476 Waiting for ELB to have at least one healthy instance
    autoscaler/1466583476 Deleting old ECS Task Definition service-autoscaler:57
    autoscaler/1466583476 Marking deployment autoscaler/1466583476 active
    autoscaler/1466583476 [Boards] Creating Board for Service [werkzeugschmiede] autoscaler
    autoscaler/1466583476 [Datadog] Creating Deployment Event
    autoscaler/1466583476 [Notifications] Notification channel is /v1/teams/werkzeugschmiede/channels/autoscaler
    autoscaler/1466583476 [StatusCheck] CheckID is f85ded4d-9ad0-4375-81b4-5989964e8ed5
    autoscaler/1466583476 Deployment successful
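
The repeated "Waiting for service ... (timeout in ...s)" lines suggest a polling loop with a countdown. A minimal Python sketch of such a loop follows. It is not the Deployer's code: `wait_for_rolling_update`, its parameters, and the injected `is_complete` check are hypothetical, and the 180 s timeout and 10 s interval are simply read off the log above.

```python
import time


def wait_for_rolling_update(is_complete, timeout=180, interval=10,
                            sleep=time.sleep, log=print):
    """Poll is_complete() until it returns True or the timeout runs out.

    is_complete is any zero-argument callable; in a real deployer it
    would query the container scheduler (assumed, not shown here).
    """
    remaining = timeout
    while True:
        if is_complete():
            log("Rolling update completed successfully.")
            return True
        if remaining <= 0:
            # Out of time: report failure to the caller.
            return False
        log(f"Waiting for rolling update (timeout in {remaining}s)")
        sleep(interval)
        remaining -= interval


# Usage with a fake check that succeeds on the third poll and no real sleeping.
polls = iter([False, False, True])
ok = wait_for_rolling_update(lambda: next(polls), sleep=lambda seconds: None)
```

Passing `sleep` and `log` as parameters keeps the loop testable without waiting in real time.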


  28. Monitor it!
    $ wl status autoscaler
    Current deployment: 1466583491
    Desired scale: 2
    Machine     Component   Status   Started               Deployment  ELB
    -------     ---------   ------   -------               ----------  ---
    i-7db992f7  autoscaler  RUNNING  22 Jun 16 11:14 CEST  1466583491  InService
    i-fb2f5b77  autoscaler  RUNNING  24 Jun 16 01:13 CEST  1466583491  InService
    $ wl logs -f autoscaler
    ...


  30. The Future

  31. Improvements
    ● Persistent disk storage
    ● Dynamic load balancing
    ● Long-running / memory hungry jobs
    ● Speed up ECS cluster rotation
    ● Make crons more reliable
    ● Outsource Docker registry

  32. Questions? Thank you.
    Twitter: @seiffertp / @mlafeldt
    https://medium.com/production-ready