10 Real problems & solutions in your build and deploy process

About AppsFlyer
• Mobile analytics & attribution
• 11 offices worldwide
• Vast majority of mobile platforms supported
• More than 8B events per day
• Over 2K integrated partners
• Cloud based (AWS, GCP)

Build & Deploy tools
[Diagram; labels: Commits, Deploys, Building projects, Building Docker images, Communication, Image repository, Machines running code, Configuration / state, Versions storage, Deployment system, Registry cache, SVC]

Categories
• General issues
• Scale optimizations
• Features & Tools
• Visibility

General Issues

Case #1: Deployment failures
1) Health check timeouts
2) Machine list mismatch
3) Wrong Java version
4) Incorrect startup parameters
5) Wrong timing for LB health check
6) Port in use

The Solution #1
1) Tune the start time and validate that the timeout is not too short
2) Update the machine list dynamically, add mismatch alerts and auto-correction
3) Java version on Jenkins = Java version on the instance/container
4) Set reasonable defaults, guide new team members
5) Sync the LB health check with the deployment state
6) Dynamic port allocation / validation
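Points 1 and 6 above can be sketched as two small pre- and post-deploy checks. This is a minimal illustration, not the deck's actual tooling: the function names, the loopback host and the polling interval are all assumptions.

```python
import socket
import time
from typing import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def wait_until_healthy(check: Callable[[], bool], timeout_s: float,
                       interval_s: float = 0.1) -> bool:
    """Poll a health check until it passes or the tuned timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False
```

The timeout is passed in explicitly so that it can be tuned per service according to its real start time instead of being hard-coded.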

Case #2: Docker issues
1) Hostname character limit
2) JVM escape
3) Image is corrupted in the registry
4) Conntrack table limits

The Solution #2
1) Make sure the hostname does not exceed 64 chars
2) Upgrade to a higher Docker version
3) Let the client fail over automatically to another registry; otherwise push a fake commit to recreate the image
4) Increase the conntrack table size & file descriptor limits
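The hostname check in point 1 can be enforced before a container is started. A minimal sketch, assuming the 64-character limit quoted above; the function name and the example hostnames are hypothetical.

```python
MAX_HOSTNAME_LEN = 64  # limit quoted on the slide


def validate_hostname(name: str, limit: int = MAX_HOSTNAME_LEN) -> str:
    """Return the hostname unchanged, or raise if it exceeds the limit."""
    if len(name) > limit:
        raise ValueError(
            f"hostname is {len(name)} chars, limit is {limit}: {name!r}"
        )
    return name
```

Failing early in the deployment system gives a clearer error than a container that refuses to start.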

Features & Tools

Case #3: (NOT) Losing traffic while deploying
1) Losing traffic when deploying to a server behind a load balancer
2) Losing traffic when stopping a container

The Solution #3
1) Connection draining: remove the server from the load balancer and wait x seconds before deploying. The drain time is set in Consul per service; the default is 30 seconds
2) Graceful shutdown: allow several seconds before the container is killed, and capture SIGTERM to flush in-process messages to an external DB or queue
Ex: POST /containers/e90e34656806/stop?t=5 HTTP/1.1
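On the service side, the graceful shutdown in point 2 could look roughly like the sketch below. With `t=5`, Docker sends SIGTERM and follows with SIGKILL after 5 seconds, so the flush has to finish inside that window. The in-memory queue and the `sink` callable are assumptions standing in for the service's real buffer and its external DB or queue client.

```python
import queue
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only set a flag, the main loop does the real work."""
    shutdown_requested.set()


def install_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def flush_pending(pending: queue.Queue, sink) -> int:
    """Drain buffered messages into sink (hypothetical DB/queue writer)."""
    flushed = 0
    while True:
        try:
            message = pending.get_nowait()
        except queue.Empty:
            return flushed
        sink(message)
        flushed += 1
```

A main loop would check `shutdown_requested.is_set()` on each iteration, stop accepting new work, call `flush_pending`, and exit before the stop timeout runs out.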

Case #4: Deploy from branch
Building & deploying from a non-default branch

The Solution #4
We added an option to build & deploy from a branch in our deployment system. The branch flow includes several steps:
1) Save the current configuration (Jenkins)
2) Update the new branch & revision
3) Initiate Jenkins build
4) Revert configuration state
5) Detect when the image is available
6) Enable deployment
** Alternatively, you can create a new Jenkins configuration for each branch and clean up later
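Steps 1 to 4 of the branch flow amount to a save / modify / restore pattern. The sketch below uses a hypothetical in-memory `FakeJenkins` class purely for illustration; it is not a real Jenkins client API, and steps 5 and 6 (image detection and enabling deployment) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FakeJenkins:
    """Hypothetical in-memory stand-in for a Jenkins client."""
    configs: dict
    builds: list = field(default_factory=list)

    def get_config(self, job):
        return dict(self.configs[job])

    def set_config(self, job, config):
        self.configs[job] = dict(config)

    def build(self, job):
        # Record which configuration the build was started with.
        self.builds.append((job, dict(self.configs[job])))


def build_from_branch(jenkins, job, branch, revision):
    saved = jenkins.get_config(job)                      # 1) save current config
    try:
        jenkins.set_config(
            job, {**saved, "branch": branch, "revision": revision}
        )                                                # 2) point at the branch
        jenkins.build(job)                               # 3) initiate the build
    finally:
        jenkins.set_config(job, saved)                   # 4) always revert
```

The `finally` block restores the default configuration even when the build call fails. Against a real Jenkins server the revert would likely need to wait until the queued build has picked up the branch configuration.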

The Solution #4
A few notes:
• Maintain separate KV for default and branch
• Provide an option to build either from “scratch” or from a “base image”
• Regularly back up Jenkins configurations
• Send build failures to Slack

Case #5: Free developers from your burden with self serve

The Solution #5
Build a self-serve UI that lets developers add or edit:
• Services
• Modes
• Autoscale
• Healer
• Spot instances
• Alerts

Scale

Case #6: Slow build time
The time between pushing code and deployment readiness grows over time

The Solution #6
• Create base images, according to service type
• Increase the number of Jenkins slaves
• Migrate slaves to new-generation CPU instances
• Split compilation and image build to tune the workload
• Redesign the registry to improve image push time
• Add proxies (jars, npm, etc.)

Case #7: Distributing Docker registry
A single Docker registry in active–passive mode becomes a bottleneck when building and deploying simultaneously to several dozen services and instances

The Solution #7
Distributed sharded registry with replication factor, high availability, rack awareness and automatic recovery
Example: scenario of 3 registries and RF of 2:
• Each service/mode is served by 2 registries
• Pairs are distributed evenly between modes
• Metadata is saved in Consul KV
• All images are uploaded to S3, so reseeding a registry is easy
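The 3-registry, RF-2 example can be illustrated with a simple round-robin over all registry pairs. This is only a sketch of the idea of even distribution, not the scheme the deck's system uses; the function name and the registry and service names are made up.

```python
from itertools import combinations, cycle


def assign_registries(service_modes, registries, rf=2):
    """Map each service/mode to a group of rf registries, round-robin."""
    groups = cycle(combinations(sorted(registries), rf))
    return {sm: next(groups) for sm in sorted(service_modes)}


assignment = assign_registries(
    ["api/prod", "api/staging", "etl/prod", "etl/staging", "web/prod", "web/staging"],
    ["reg-1", "reg-2", "reg-3"],
)
```

With 3 registries there are 3 possible pairs, so 6 service/modes land on each pair twice and every registry serves 4 of them. A plain round-robin reshuffles when services are added, which is one reason to persist the mapping (the deck keeps its metadata in Consul KV).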

The Solution #7
Relevant Links:
• Project Blog: http://relmos.blogspot.co.il/2016/09/scaling-private-docker-registry-at_49.html
• Registry Deploy: https://docs.docker.com/registry/deploying/

Case #8: Cleaning old versions
Prevent old versions from piling up

The Solution #8
This is where Docker shines. We clean:
• Containers in a stopped (exited) state
• Old images which are not being used
• Registry
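Selecting which old images to delete can be reduced to a small pure function. The sketch below is an assumption about one reasonable policy (keep anything in use plus the newest few); the `keep_latest` parameter and the tuple layout are not from the deck.

```python
def images_to_remove(images, in_use, keep_latest=3):
    """images: list of (tag, created_timestamp); in_use: tags still running.

    Returns tags that are neither in use nor among the newest keep_latest.
    """
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    keep = {tag for tag, _ in newest_first[:keep_latest]} | set(in_use)
    return [tag for tag, _ in newest_first if tag not in keep]
```

Keeping the selection separate from the actual deletion makes the policy easy to test and to run in a dry-run mode first.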

Visibility

Case #9: Detecting version inconsistency
Different versions of the same service deployed in production (unintentionally)

The Solution #9
• A graphical, near real-time view of versions per service
• An easy way to add alerts on inconsistent versions
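The check behind such an alert is a group-by over whatever inventory reports the running versions. A minimal sketch, assuming the inventory is available as (service, host, version) tuples; that input shape and the function name are hypothetical.

```python
from collections import defaultdict


def inconsistent_services(deployed):
    """deployed: iterable of (service, host, version) tuples.

    Returns {service: sorted versions} for services running more than one version.
    """
    versions = defaultdict(set)
    for service, _host, version in deployed:
        versions[service].add(version)
    return {s: sorted(v) for s, v in versions.items() if len(v) > 1}
```

During a rolling deployment two versions are expected for a short time, so an alert built on this would probably need a grace period.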

Case #10: Tracking it all
Lack of visibility on deployments and failed builds

The Solution #10
• Add a Slack integration that includes all relevant information (version, servers, instances, user)
• Send deployment events to Graphite and combine them with relevant dashboards
• Send build & deployment logs to a central log system
• Event system which graphically presents important events (deployments, heals, autoscale, etc.)
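Sending a deployment event to Graphite can be as small as writing one line in its plaintext protocol (`<path> <value> <timestamp>`). The metric naming scheme and the host are assumptions; 2003 is Graphite's default plaintext port.

```python
import socket


def graphite_line(service: str, version: str, timestamp: float, value: int = 1) -> str:
    """Format one plaintext-protocol line for a deployment event."""
    # Dots separate path segments in Graphite, so keep the version in one segment.
    safe_version = version.replace(".", "_")
    return f"deploys.{service}.{safe_version} {value} {int(timestamp)}\n"


def send_to_graphite(line: str, host: str, port: int = 2003) -> None:
    """Send a single line over TCP (host is deployment-specific)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(line.encode("ascii"))
```

Plotting these points on top of service dashboards makes it easy to correlate a metric change with the deployment that caused it.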

[email protected]
