A Journey Through Wonderland
Paul Seiffert
Mathias Lafeldt
The Purpose of Wonderland
How we got here
● It took Jimdo five years to migrate its core infrastructure from bare metal to AWS
● Teams started to love the cloud
● Many experiments in different AWS accounts
● Teams “reinvented” production stacks
Werkzeugschmiede Team
● Founded to solve infrastructure problems common to Jimdo teams
● Provides a standard platform that is reliable and simple to use: Wonderland
● Allows Jimdo developers to focus on product development
Wonderland’s History
Wonderland 101
A PaaS that allows Jimdo developers to run their Dockerized applications
Features
● Long-running stateless services
○ DNS, load balancing, health checks, auto scaling, …
● One-off tasks and cron jobs
● Centralized logging and metrics collection via external providers
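Long-running services get health checks from the platform. The deploy output in this deck configures an ELB health check against HTTP:11011/v1/health, so as a rough illustration, here is a minimal sketch of a service answering such a check with Python's standard library. The JSON body, the handler name, and the helper start_health_server are hypothetical; the slides describe only the port and path, not a response contract.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Path and port are taken from the health check in the deploy output;
# the JSON body is a hypothetical choice, not a documented contract.
HEALTH_PATH = "/v1/health"
HEALTH_PORT = 11011


class HealthHandler(BaseHTTPRequestHandler):
    """Answers load balancer health checks with 200 OK; other paths get 404."""

    def do_GET(self):
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Don't write an access-log line for every health check.
        pass


def start_health_server(host="0.0.0.0", port=HEALTH_PORT):
    """Serve the health endpoint on a daemon thread and return the server.

    Binding to 0.0.0.0 lets a load balancer on another host reach it.
    """
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping the health endpoint cheap and side-effect free matters because the load balancer calls it repeatedly and takes an instance out of rotation when it fails.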
Interfaces
● APIs
● CLI tool wl
● Chatbot Alice
● Docker registry
● Vault
● No SSH access
Internal service provider
● SLA
● Status page
● Documentation
● Workshops
● Use-case-driven development
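The slides do not state Wonderland's SLA target. To make the idea of an availability SLA concrete, this small sketch converts a target into a downtime budget. The 99.9% figure and the 30-day window are illustrative assumptions only: 0.1% of 30 days is 43.2 minutes.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Return the downtime an availability target allows over the given period.

    The default 30-day period is an assumption for illustration.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    # Allowed downtime is the unavailable fraction of the period.
    return period * (1 - availability_pct / 100)


# Hypothetical 99.9% target over 30 days: about 43 minutes of downtime.
print(downtime_budget(99.9))
```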
Wonderland Internals
We run...
● AWS infrastructure
● Services providing our APIs
[Diagram: A Crims Cluster Instance. An AWS ECS instance runs Services A, B, and C alongside an ECS Agent, a Log Forwarder, and a Datadog Agent, with two ELBs in front of the services. Ports shown: HTTP :80, HTTPS :443, HTTP :11411, TCP :1234, TCP :11412.]
Infrastructure Development
● Infrastructure as code
● CloudFormation and Ansible
● Applied by a Central State Enforcer
● Workflow based on GitHub pull requests
● Automated rollout to production
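The slides say infrastructure code is applied by a Central State Enforcer but do not describe its internals. A common design for such a component is a reconciliation loop: compare the desired state from the repository with the observed state and apply the difference. The sketch below shows that idea with in-memory dictionaries; plan, enforce, and the state model are hypothetical, not Wonderland's actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical state model: resource name -> configuration (e.g. a template hash).
State = Dict[str, str]
Action = Tuple[str, str]


def plan(desired: State, actual: State) -> List[Action]:
    """Compute the actions needed to make `actual` match `desired`."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def enforce(desired: State, actual: State) -> List[Action]:
    """Apply the planned actions to `actual` in place and return them.

    A real enforcer would presumably run CloudFormation or Ansible here and
    repeat this on a schedule, so manual drift is corrected on the next pass.
    """
    actions = plan(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = desired[name]
    return actions
```

Because enforce is idempotent, running it a second time against an already converged state yields no actions, which is what makes periodic enforcement safe.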
QA
● We test everything
● Unit, integration, and system tests
● Tests in staging environment
● Staging is set up from scratch every week
● Periodic GameDays
Our Services
● provide APIs
● deploy other services
● are themselves Wonderland services
[Diagram: Wonderland services and related AWS services: WL (CLI Tool), Alice (Chatbot), Deployer API, SQS Queue, Deployer Worker, Status Check Service, AutoScaler, (Dash-)Boards, Oraculum (Logs), Notifications, AWS Route53, AWS Application AutoScaling, AWS SNS, AWS S3.]
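The service overview lists a Deployer API, an SQS queue, and a Deployer Worker, which suggests that deploy requests are queued and processed asynchronously. The sketch below illustrates that producer/consumer pattern, with Python's in-process queue.Queue standing in for SQS. DeployerAPI, run_worker, and the job format are hypothetical names chosen for illustration.

```python
import queue
import threading
import time
from typing import Callable, Dict

Job = Dict[str, str]


class DeployerAPI:
    """Hypothetical API front end: enqueues deploy jobs instead of deploying inline."""

    def __init__(self, jobs: "queue.Queue[Job]"):
        self.jobs = jobs

    def request_deploy(self, service: str, manifest: str) -> str:
        # IDs modeled on the "service/<unix timestamp>" IDs in the deploy output.
        deployment_id = f"{service}/{int(time.time())}"
        self.jobs.put({"id": deployment_id, "service": service, "manifest": manifest})
        return deployment_id


def run_worker(jobs: "queue.Queue[Job]", deploy: Callable[[Job], None], stop: threading.Event) -> None:
    """Hypothetical worker loop: take one job at a time until `stop` is set."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            deploy(job)
        except Exception as exc:
            # Keep the worker alive; a real deployer would report or retry.
            print(f"{job['id']} failed: {exc}")
        finally:
            jobs.task_done()
```

Decoupling the API from the worker lets the API return a deployment ID immediately while the slow steps run in the background; with a durable queue such as SQS, an unfinished job can be picked up again after a worker restart.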
Deploy it!
$ wl deploy autoscaler -f wonderland-autoscaler/wonderland.yaml
autoscaler/1466583476 This is try 1
autoscaler/1466583476 Updating ELB autoscaler-1466437217
autoscaler/1466583476 Configuring health check HTTP:11011/v1/health
autoscaler/1466583476 Enabling cross-zone load balancing
autoscaler/1466583476 Configuring connection draining with a timeout of 180s
autoscaler/1466583476 Not enabling access log
autoscaler/1466583476 Letting autoscaler.example.com point to autoscaler-1363526915.eu-west-1.elb.amazonaws.com
autoscaler/1466583476 Registered new ECS TaskDefinition (autoscaler:58) for service autoscaler
autoscaler/1466583476 Updating ECS service autoscaler-1466437217
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 180s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 170s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 160s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 150s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 140s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 130s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 120s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 110s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 100s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 90s)
autoscaler/1466583476 Waiting for service autoscaler-1466437217 to complete rolling update (timeout in 80s)
autoscaler/1466583476 Rolling update completed successfully.
autoscaler/1466583476 Waiting for ELB to have at least one healthy instance
autoscaler/1466583476 Deleting old ECS Task Definition service-autoscaler:57
autoscaler/1466583476 Marking deployment autoscaler/1466583476 active
autoscaler/1466583476 [Boards] Creating Board for Service [werkzeugschmiede] autoscaler
autoscaler/1466583476 [Datadog] Creating Deployment Event
autoscaler/1466583476 [Notifications] Notification channel is /v1/teams/werkzeugschmiede/channels/autoscaler
autoscaler/1466583476 [StatusCheck] CheckID is f85ded4d-9ad0-4375-81b4-5989964e8ed5
autoscaler/1466583476 Deployment successful
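The deploy output above polls the ECS service every 10 seconds and reports how much of a 180-second timeout remains. A minimal sketch of such a wait loop follows. It assumes a caller-supplied is_complete check (hypothetical; a real deployer would query ECS for the service's deployment status), and the clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable


def wait_for_rolling_update(
    service: str,
    is_complete: Callable[[], bool],
    timeout: float = 180,
    interval: float = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> None:
    """Poll `is_complete` until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"rolling update of {service} did not finish within {timeout}s")
        log(f"Waiting for service {service} to complete rolling update (timeout in {remaining:.0f}s)")
        if is_complete():
            log("Rolling update completed successfully.")
            return
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Using a monotonic clock keeps the timeout correct even if the system time changes during a deployment.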
Monitor it!
$ wl status autoscaler
Current deployment: 1466583491
Desired scale: 2
Machine     Component   Status   Started               Deployment  ELB
-------     ---------   ------   -------               ----------  ---
i-7db992f7  autoscaler  RUNNING  22 Jun 16 11:14 CEST  1466583491  InService
i-fb2f5b77  autoscaler  RUNNING  24 Jun 16 01:13 CEST  1466583491  InService
$ wl logs -f autoscaler
...
The Future
Improvements
● Persistent disk storage
● Dynamic load balancing
● Long-running and memory-hungry jobs
● Speed up ECS cluster rotation
● Make crons more reliable
● Outsource Docker registry