Jeremy Voorhis
Senior Engineer, AppFog Inc
@jvoorhis
http://appfog.com/
Hacking CloudFoundry
Sunday, April 1, 2012
Overview
• Survey VCAP architecture
• Example: Creating a new VCAP component
• Example: Integrating OSS monitoring tools
• Example: Adding runtime support
CloudFoundry Architecture
Open Source Projects
• http://github.com/cloudfoundry
• NATS – messaging
• VCAP – applications and services
• VMC – API client
• 10 more at the time of this talk
VCAP Design Principles
• Loosely coupled components
• Fail fast
• Minimize single points of failure
• Infrastructure unaware
Loosely Coupled Components
• Collection of single purpose daemons
• Connected by pub/sub and HTTP
• Distributed state
• System state controlled by REST API
Fail Fast
• Optimized for mean-time-to-recovery
• Detect unrecoverable errors (e.g. dropped connections), log and exit(1)
• Process supervisor (e.g. Monit) strongly advised
• Components are easily replaceable
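A minimal sketch of the fail-fast pattern on this slide. The `UnrecoverableError` class and the `fail_fast` helper are hypothetical names chosen for illustration, not part of VCAP.

```ruby
require "logger"

# Hypothetical error class for conditions a component cannot recover from,
# such as a dropped connection to the message bus.
class UnrecoverableError < StandardError; end

# Runs the block. On an unrecoverable error it logs the cause and exits with
# status 1, leaving the restart to a process supervisor such as Monit.
def fail_fast(logger)
  yield
rescue UnrecoverableError => e
  logger.fatal("unrecoverable: #{e.message}")
  exit(1)
end
```

The component makes no attempt to repair its own state; a fresh process is assumed to be cheaper than a partial recovery.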
Minimize Single Points of Failure
• Most components scale horizontally
• Singletons
  • CCDB – use replication!
  • HealthManager – intermittent failures don’t disrupt service
  • NATS – cluster work in progress
  • Service daemons
Infrastructure Unaware
• Dev/Test: OS X, Micro CloudFoundry
• Public Cloud: AWS, Rackspace, HP, Joyent, more...
• Data Centers: vSphere, Bare metal
• Ubuntu 10.04 LTS recommended
Implementation Notes
• UNIX daemons
• Written in Ruby
• EventMachine library for asynchronous I/O
• Discoverable via NATS
• HTTP APIs for rolling metrics and health checks
CloudFoundry Components
• NATS
• Router
• CloudController
• DEA
• HealthManager
• Service Daemons
VCAP Topology
NATS
• The “Nervous System”
• Pub/Sub message bus
• Messages have a topic + payload
• Pattern matching for topics
• Many communication patterns
• Supports all CloudFoundry components
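To make the topic pattern matching concrete, here is a small in-process sketch. It assumes NATS-style wildcards, where `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens. `ToyBus` is a hypothetical stand-in, not the NATS client API.

```ruby
# Returns true when a dot-separated subject matches a subscription pattern.
# "*" matches exactly one token; ">" matches one or more trailing tokens.
def subject_match?(pattern, subject)
  p_tokens = pattern.split(".")
  s_tokens = subject.split(".")
  p_tokens.each_with_index do |tok, i|
    return s_tokens.length > i if tok == ">"
    return false if i >= s_tokens.length
    return false unless tok == "*" || tok == s_tokens[i]
  end
  p_tokens.length == s_tokens.length
end

# Tiny in-process stand-in for a pub/sub bus (hypothetical, for illustration).
class ToyBus
  def initialize
    @subs = []
  end

  # Registers a handler for every subject matching the pattern.
  def subscribe(pattern, &handler)
    @subs << [pattern, handler]
  end

  # Delivers the payload to each handler whose pattern matches the subject.
  def publish(subject, payload)
    @subs.each do |pattern, handler|
      handler.call(subject, payload) if subject_match?(pattern, subject)
    end
  end
end
```

A subscriber to `dea.*.start` would then receive a message published on `dea.42.start` but not one published on `router.register`.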
Router
• Proxies HTTP to hosted apps and CloudFoundry components
• Discovers routes to backends via NATS
• Random load balancing
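The routing behaviour on this slide can be sketched as a table of backends per hostname with a uniformly random pick. `RouteTable` and its method names are hypothetical; in the real Router the register and unregister calls would be driven by NATS messages.

```ruby
# Toy route table: maps a hostname to the backends that registered for it.
class RouteTable
  def initialize(rng: Random.new)
    @routes = Hash.new { |h, k| h[k] = [] }
    @rng = rng
  end

  # Adds a backend ("host:port") for a hostname, ignoring duplicates.
  def register(host, backend)
    @routes[host] << backend unless @routes[host].include?(backend)
  end

  # Removes a backend, for example when an app instance stops.
  def unregister(host, backend)
    @routes[host].delete(backend)
  end

  # Picks one backend uniformly at random, or nil if none is known.
  def pick(host)
    @routes.fetch(host, []).sample(random: @rng)
  end
end
```

Random selection needs no shared counters or per-backend state, which suits a router that is itself meant to scale horizontally.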
CloudController
• Controls the system via a REST + JSON API
• Single point of truth for accounts (CCDB)
  • Users
  • Apps
  • Services
DEA
• “Droplet Execution Agent”
• Starts and stops apps
• Allocates resources and enforces resource consumption
• Announces app state transitions via NATS
• Supports rolling restarts
HealthManager
• Monitors system for drift
  • e.g. number of running instances per app
• Signals to CloudController when something isn’t right
• Only non-CC component connecting to CCDB
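Drift detection of this kind reduces to comparing desired and observed instance counts. The sketch below assumes both are available as plain hashes keyed by app name; `detect_drift` is a hypothetical helper, not HealthManager code.

```ruby
# Compares the desired instance count per app with what is actually running
# and returns only the apps that have drifted, with the signed difference
# (negative: instances missing, positive: extra instances).
def detect_drift(expected, running)
  (expected.keys | running.keys).each_with_object({}) do |app, drift|
    delta = running.fetch(app, 0) - expected.fetch(app, 0)
    drift[app] = delta unless delta.zero?
  end
end
```

For example, with `{"api" => 3}` expected and `{"api" => 1}` running, the result is `{"api" => -2}`, which is the kind of signal that would be passed to the CloudController.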
Service Daemons
• Services are external resources that you can bind to your app
  • e.g. RDBMS, key/value store, filesystem, message queues, SMTP
• Nodes control individual service installations
  • e.g. create a database and grant permissions for MySQL
• Gateways broker nodes to the system
  • Select a node for provisioning
Putting it together: pushing an app
• Client POSTs app metadata, creates app in system
• Client PUTs needed files
• CloudController stages app, discovers DEA to run it
• DEA pulls app package, announces route to app on NATS
• Router discovers route, forwards traffic
Example: VCAP Component for Cache Control
• HTTP caching reverse proxy
• Load balancer
• Stores cache in virtual memory
HTTP Caching: Motivation
• Influenced by PHP Fog
• Many read-heavy sites (e.g. Drupal, WordPress)
• Cache control headers often incorrect
• Purge cached content on deploy
• Doesn’t replace a CDN, but really helps!
Deploying Varnish
[Diagram: two deployments, one with ELB, Router, and DEA; one with ELB, Router, DEA, and Varnish]
Varnishing
• New VCAP component
• Deployed on Varnish nodes
• App deployments trigger purge command
• Reports activity and latency via /varz API
• < 300 LoC
Varnishing Boilerplate
• Read config
• Register as VCAP component
• Set up NATS subscriptions
• Write PID file, trap signals, enter run loop
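The last step of that boilerplate can be sketched with the Ruby standard library alone. This is a simplified illustration covering only the PID file, signal traps, and run loop; `Daemon` and `write_pid_file` are hypothetical names, and the real component would run an EventMachine reactor rather than a polling loop.

```ruby
# Writes the current process ID to the given path and returns the path.
def write_pid_file(path)
  File.write(path, "#{Process.pid}\n")
  path
end

# Minimal daemon skeleton: PID file, TERM/INT traps, and a polling run loop.
class Daemon
  attr_reader :running

  def initialize(pid_path)
    @pid_path = pid_path
    @running = false
  end

  # Runs the given block repeatedly until TERM or INT arrives, then removes
  # the PID file on the way out.
  def start
    write_pid_file(@pid_path)
    @running = true
    %w[TERM INT].each { |sig| Signal.trap(sig) { @running = false } }
    while @running
      yield if block_given?
      sleep 0.01
    end
  ensure
    File.delete(@pid_path) if File.exist?(@pid_path)
  end
end
```

The signal handlers only flip a flag, so all cleanup happens in normal control flow rather than inside the trap context.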
Varnishing Logic
• Subscribe to ‘dea.*.start’
• Script varnishadm
  • Authenticate with secret
  • Issue a purge command for each route associated with the app
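A hedged sketch of the purge step: given the JSON payload of a start message, build one `varnishadm` invocation per route. The `uris` key, the admin address, and the secret file path are assumptions made for illustration, and the `purge` CLI command is the Varnish 2.x name (later releases call it `ban`).

```ruby
require "json"

# Builds one varnishadm argument vector per route found in a start message.
# Assumptions: routes live under a "uris" key; the admin interface listens on
# 127.0.0.1:6082; the shared secret is stored in /etc/varnish/secret.
def purge_commands(message_json, admin: "127.0.0.1:6082", secret: "/etc/varnish/secret")
  uris = JSON.parse(message_json).fetch("uris", [])
  uris.map do |host|
    ["varnishadm", "-T", admin, "-S", secret, "purge", "req.http.host", "==", host]
  end
end
```

Each result is an argument array suitable for `system(*argv)`, which avoids shell quoting problems with hostnames taken from a message.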
Example: Monitoring
• Monitoring infrastructure is a solved problem
• What about monitoring CloudFoundry?
• Problems:
  • System health
  • Capacity forecasting
  • QoS
Monitoring VCAP: Goals
• Leverage existing systems
• IaaS agnostic
• Alerts for on-call rotation
• Correlate reservation metrics with usage
• Aggregate service data (e.g. % memory reserved across the DEA pool)
• Detect changes in clusters automatically
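As an illustration of the aggregation goal, the percentage of memory reserved across a DEA pool can be computed from per-node figures. The `:max_memory` and `:reserved_memory` keys are hypothetical field names, not actual /varz keys.

```ruby
# Computes the percentage of memory reserved across a pool of DEAs.
# Each hash stands in for one DEA's metrics; the key names are hypothetical.
def percent_memory_reserved(deas)
  total = deas.sum { |d| d[:max_memory] }
  return 0.0 if total.zero?

  reserved = deas.sum { |d| d[:reserved_memory] }
  (100.0 * reserved / total).round(1)
end
```

Summing before dividing weights each node by its capacity, which matters when the DEA pool is heterogeneous.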
Collecting VCAP Data
• Every component has a JSON + REST API
• Components are discoverable via NATS
• Capability security model
• Credentials are generated at boot (UUID / UUID)
• http://<user>:<password>@<host>/varz
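A sketch of turning a discovery announcement into an authenticated /varz request using only the standard library. The `host` and `credentials` keys in the announcement are assumptions about the message shape, and `varz_request` is a hypothetical helper.

```ruby
require "json"
require "net/http"
require "uri"

# Builds an HTTP Basic authenticated GET request for a component's /varz
# endpoint. Assumption: the announcement carries "host" ("address:port") and
# "credentials" (a two-element [user, password] array).
def varz_request(announcement_json)
  info = JSON.parse(announcement_json)
  user, password = info.fetch("credentials")
  uri = URI("http://#{info.fetch('host')}/varz")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end
```

Sending it is then a matter of `Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }`. Because the credentials are generated at boot, only a client that received the announcement can read the metrics.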
Monitoring Prototype
• Ad hoc solution: dashboard built with NodeJS
• Beautiful graphs
• Gained experience plumbing VCAP
• Unsolved problems:
  • Inventory
  • Correlation with lower layer
  • Alerting
Our Monitoring Stack
varz-query
• NodeJS CLI
• Discovers components and presents metrics
• < 100 LoC
• Added support for timed request/response to node_nats
Check_MK Configuration
• Inventories VCAP components
• VCAP-specific checks implemented in Python
• Submits results to Nagios
• WARNING and CRITICAL results reported to PagerDuty
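The checks in the talk were written in Python against Check_MK. The Ruby sketch below only illustrates the underlying idea of mapping a metric onto the standard Nagios state codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `check_threshold` and its thresholds are hypothetical and do not follow the Check_MK check API.

```ruby
NAGIOS_STATES = { 0 => "OK", 1 => "WARNING", 2 => "CRITICAL", 3 => "UNKNOWN" }.freeze

# Maps a metric value onto a Nagios state code using warn/crit thresholds.
# A missing value is reported as UNKNOWN rather than guessed at.
def check_threshold(name, value, warn:, crit:)
  code = if value.nil? then 3
         elsif value >= crit then 2
         elsif value >= warn then 1
         else 0
         end
  [code, "#{NAGIOS_STATES[code]} - #{name} is #{value.inspect}"]
end
```

Only codes 1 and 2 would be forwarded to the paging system in the setup described on this slide.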
Chef’s Role
• Generates main.mk
• Nodes added automatically when provisioned
• Deploys Check_MK agent, varz-query
Putting It All Together
[Diagram: varz-query, NATS, VCAP]
Example: Adding PHP Support
• AppFog is the CloudFoundry Community Lead for PHP
• We created and actively develop PHP Fog
• We contributed our PHP experience back to CloudFoundry
Our PHP Stack
• Apache2
  • PHP community widely depends on mod_rewrite, .htaccess
• PHP 5.3
• Leverages UNIX for isolation
  • Every app runs under a unique UID / GID
  • Same effect as VCAP’s secure mode
Our PHP Stack (Cont’d)
• Uses Apache2 vhosts in multi-tenant environment
• mpm-itk module allows workers to drop privileges
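A sketch of how a per-tenant virtual host might be generated. `AssignUserID` is the mpm-itk directive that sets the user and group a request is served under; the `render_vhost` helper, its parameters, and the example values are hypothetical.

```ruby
# Renders a minimal Apache2 virtual host for one tenant. AssignUserID is the
# mpm-itk directive that makes the worker serve requests for this vhost as
# the given user and group.
def render_vhost(server_name:, document_root:, user:, group:, port: 80)
  <<~CONF
    <VirtualHost *:#{port}>
      ServerName #{server_name}
      DocumentRoot #{document_root}
      AssignUserID #{user} #{group}
    </VirtualHost>
  CONF
end

puts render_vhost(server_name: "blog.example.com",
                  document_root: "/var/vcap/apps/blog",
                  user: "app_1001", group: "app_1001")
```

One vhost per app with its own UID and GID lets a single Apache2 installation host many tenants while keeping their files mutually inaccessible.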
How VCAP Represents Apps
• Framework
  • App deployment know-how
  • e.g. Sinatra, Rails
• Runtime
  • VMs, interpreters, common libraries
  • e.g. ruby-1.8.7-p, ruby-1.9.3-p125
  • Constrained by framework
App Packages
• Contain source code and dependency metadata
• Two flavors:
  • Unstaged – just the app bits
  • Staged – includes start/stop scripts, dependencies
Implementing a Staging Plugin
• Inherit from abstract StagingPlugin class
• MUST implement #framework, #stage_application
• MAY specialize further
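The contract on this slide can be sketched as follows. The `StagingPlugin` class here is a reduced stand-in defined only so the example is self-contained (the real VCAP base class does considerably more), and `ToyPhpPlugin` with its generated script is hypothetical.

```ruby
# Stand-in for VCAP's abstract base class, reduced to the two methods the
# slide says a plugin MUST implement.
class StagingPlugin
  def framework
    raise NotImplementedError, "#{self.class} must implement #framework"
  end

  def stage_application
    raise NotImplementedError, "#{self.class} must implement #stage_application"
  end
end

# Hypothetical plugin: "stages" an app by returning the files it would write.
class ToyPhpPlugin < StagingPlugin
  def framework
    "php"
  end

  # Returns a map of file name to contents; the script body is a placeholder
  # that runs Apache2 in the foreground.
  def stage_application
    { "startup" => "#!/bin/bash\nexec apache2 -DFOREGROUND\n" }
  end
end
```

Raising `NotImplementedError` in the base class means a plugin that forgets either method fails loudly the first time it is staged.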
Implementing a Staging Plugin (Cont’d)
• Staging manifest
• Runtimes
• App servers
• Detection rules
Staging Plugin for PHP
• Runs Apache2 in foreground
• Generates and bundles Apache2 and PHP configs
• Full support for environment vars, service bindings, resource limits
• Available now in open source project
Runtimes
• Deploy multiple runtime versions on any node
• DEA pool can be heterogeneous
• Versions can be upgraded without migrating existing apps
Going Further: A Custom DEA
• Motivation: multi-tenant PHP for free tier
• Married PHP Fog’s stack with VCAP’s secure user pool
• mpm-itk
• Subclassed DEA::Agent
Go Forth And Hack!
• http://github.com/cloudfoundry
• http://cloudfoundry.org/
• http://appfog.com/
Thank You!