Slide 1

Slide 1 text

Hacking CloudFoundry
Jeremy Voorhis, Senior Engineer, AppFog Inc
[email protected] @jvoorhis http://appfog.com/
Sunday, April 1, 12

Slide 2

Slide 2 text

Overview
• Survey VCAP architecture
• Example: Creating a new VCAP component
• Example: Integrating OSS monitoring tools
• Example: Adding runtime support

Slide 3

Slide 3 text

CloudFoundry Architecture

Slide 4

Slide 4 text

Open Source Projects
• http://github.com/cloudfoundry
• NATS – messaging
• VCAP – applications and services
• VMC – API client
• 10 more at the time of this talk

Slide 5

Slide 5 text

VCAP Design Principles
• Loosely coupled components
• Fail fast
• Minimize single points of failure
• Infrastructure unaware

Slide 6

Slide 6 text

Loosely Coupled Components
• Collection of single-purpose daemons
• Connected by pub/sub and HTTP
• Distributed state
• System state controlled by REST API

Slide 7

Slide 7 text

Fail Fast
• Optimized for mean time to recovery
• Detect unrecoverable errors (e.g. dropped connections), log and exit(1)
• Process supervisor (e.g. Monit) strongly advised
• Components are easily replaceable
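
A minimal sketch of the fail-fast pattern on this slide, written in Python for brevity even though VCAP components are Ruby daemons. `ConnectionLost`, `run_component`, and `broken_loop` are hypothetical names, not VCAP identifiers. The daemon does not try to recover: it logs the error and exits with status 1, leaving the restart to a supervisor such as Monit.

```python
import logging
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("component")


class ConnectionLost(Exception):
    """Hypothetical marker for an unrecoverable error (e.g. a dropped connection)."""


def run_component(main_loop):
    """Run main_loop; on an unrecoverable error, log it and exit with status 1.

    There is deliberately no reconnect or retry logic here: the process
    supervisor is expected to start a fresh process.
    """
    try:
        main_loop()
    except ConnectionLost as err:
        log.error("unrecoverable error, exiting: %s", err)
        sys.exit(1)


def broken_loop():
    # Stand-in for an event loop whose message bus connection dropped.
    raise ConnectionLost("message bus connection dropped")
```

Calling `run_component(broken_loop)` ends the process with exit status 1, which is the signal a supervisor needs to restart the component.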

Slide 8

Slide 8 text

Minimize Single Points of Failure
• Most components scale horizontally
• Singletons
  • CCDB – use replication!
  • HealthManager – intermittent failures don’t disrupt service
  • NATS – cluster work in progress
  • Service daemons

Slide 9

Slide 9 text

Infrastructure Unaware
• Dev/Test: OS X, Micro CloudFoundry
• Public Cloud: AWS, Rackspace, HP, Joyent, more...
• Data Centers: vSphere, bare metal
• Ubuntu 10.04 LTS recommended

Slide 10

Slide 10 text

Implementation Notes
• UNIX daemons
• Written in Ruby
• EventMachine library for asynchronous I/O
• Discoverable via NATS
• HTTP APIs for rolling metrics and health checks

Slide 11

Slide 11 text

CloudFoundry Components
• NATS
• Router
• CloudController
• DEA
• HealthManager
• Service Daemons

Slide 12

Slide 12 text

VCAP Topology

Slide 13

Slide 13 text

NATS
• The “Nervous System”
• Pub/Sub message bus
• Messages have a topic + payload
• Pattern matching for topics
• Many communication patterns
• Supports all CloudFoundry components
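
To illustrate "topic + payload" and "pattern matching for topics", here is a toy in-memory bus in Python. It is not the NATS client API: `ToyBus` and `subject_matches` are hypothetical names. The wildcard rules follow the NATS convention that `*` matches exactly one dot-separated token and `>` matches one or more trailing tokens.

```python
from collections import defaultdict


def subject_matches(pattern, subject):
    """Return True if a dot-separated subject matches a pattern.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


class ToyBus:
    """In-memory stand-in for a pub/sub message bus."""

    def __init__(self):
        self.subscriptions = defaultdict(list)

    def subscribe(self, pattern, callback):
        self.subscriptions[pattern].append(callback)

    def publish(self, subject, payload):
        # Deliver the message to every subscription whose pattern matches.
        for pattern, callbacks in self.subscriptions.items():
            if subject_matches(pattern, subject):
                for callback in callbacks:
                    callback(subject, payload)
```

A subscriber to `dea.*.start` receives a message published on `dea.abc123.start` but not one published on `dea.abc123.stop`.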

Slide 14

Slide 14 text

Router
• Proxies HTTP to hosted apps and CloudFoundry components
• Discovers routes to backends via NATS
• Random load balancing
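
A small Python sketch of random load balancing over discovered routes. `RouteTable` and its methods are hypothetical and only model the idea; in the real Router, registrations would arrive as NATS messages rather than direct method calls.

```python
import random


class RouteTable:
    """Maps a hostname to the backends that announced it (toy model)."""

    def __init__(self, rng=None):
        self.backends = {}
        self.rng = rng or random.Random()

    def register(self, host, backend):
        self.backends.setdefault(host, set()).add(backend)

    def unregister(self, host, backend):
        instances = self.backends.get(host, set())
        instances.discard(backend)
        if not instances:
            # Drop the host entirely once its last backend is gone.
            self.backends.pop(host, None)

    def pick(self, host):
        """Return a randomly chosen backend for host, or None if no route exists."""
        instances = self.backends.get(host)
        if not instances:
            return None
        return self.rng.choice(sorted(instances))
```

Random selection needs no shared state between router instances, which fits the goal of scaling routers horizontally.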

Slide 15

Slide 15 text

CloudController
• Control systems via REST + JSON API
• Single point of truth for accounts (CCDB)
  • Users
  • Apps
  • Services

Slide 16

Slide 16 text

DEA
• “Droplet Execution Agent”
• Starts and stops apps
• Allocates resources and enforces resource consumption
• Announces app state transitions via NATS
• Supports rolling restarts

Slide 17

Slide 17 text

HealthManager
• Monitors system for drift
  • e.g. number of running instances per app
• Signals to CloudController when something isn’t right
• Only non-CC component connecting to CCDB
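
Drift detection can be sketched as a comparison between the desired instance count per app and the count actually observed. This is an illustration only: `find_drift` and its input shape are assumptions, not the HealthManager's real data model.

```python
def find_drift(desired, running):
    """Compare desired and observed instance counts per app.

    Both arguments map app name -> instance count. Returns {app: delta}
    for apps that differ; a positive delta means instances are missing,
    a negative delta means extra instances are running.
    """
    drift = {}
    for app in set(desired) | set(running):
        delta = desired.get(app, 0) - running.get(app, 0)
        if delta != 0:
            drift[app] = delta
    return drift
```

For example, with `desired = {"blog": 3}` and `running = {"blog": 2, "old": 1}`, the result flags one missing `blog` instance and one stray `old` instance.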

Slide 18

Slide 18 text

Service Daemons
• Services are external resources that you can bind to your app
  • e.g. RDBMS, Key/Value Store, Filesystem, Message Queues, SMTP
• Nodes control individual service installations
  • e.g. Create database and grant permissions for MySQL
• Gateways broker nodes to system
  • Select a node for provisioning

Slide 19

Slide 19 text

Putting it together: pushing an app
• Client POSTs app metadata, creates app in system
• Client PUTs needed files
• CloudController stages app, discovers DEA to run it
• DEA pulls app package, announces route to app on NATS
• Router discovers route, forwards traffic

Slide 20

Slide 20 text

Example: VCAP Component for Cache Control
• HTTP caching reverse proxy
• Load balancer
• Stores cache in virtual memory

Slide 21

Slide 21 text

HTTP Caching: Motivation
• Influenced by PHP Fog
• Many read-heavy sites (e.g. Drupal, WordPress)
• Cache control headers often incorrect
• Purge cached content on deploy
• Doesn’t replace a CDN, but really helps!

Slide 22

Slide 22 text

Deploying Varnish
[Diagram: two deployment topologies, one with ELB, Router, DEA and one with ELB, Router, DEA, Varnish]

Slide 23

Slide 23 text

Varnishing
• New VCAP component
• Deployed on Varnish nodes
• App deployments trigger purge command
• Reports activity and latency via /varz API
• < 300 LoC

Slide 24

Slide 24 text

Varnishing Boilerplate
• Read config
• Register as VCAP component
• Set up NATS subscriptions
• Write PID file, trap signals, enter run loop

Slide 25

Slide 25 text

Varnishing Logic
• Subscribe to ‘dea.*.start’
• Scripting varnishadm
  • Authenticate with secret
  • Issue purge command for each route associated with app
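
A rough Python sketch of the purge logic: for each route in a start message, build one `varnishadm` invocation. Several details are assumptions and not taken from the talk: the payload is taken to be JSON with a `uris` list, and the purge expression follows the Varnish 2.x CLI form (`purge <field> <operator> <arg>`), which later Varnish versions renamed to `ban`. The function only builds argument lists; it does not run them.

```python
import json


def purge_commands(message, admin_addr, secret_file):
    """Build one varnishadm argv list per route named in a start message.

    Assumptions: the payload is JSON with a "uris" list of hostnames, and
    the purge expression below suits the deployed Varnish version.
    """
    payload = json.loads(message)
    commands = []
    for host in payload.get("uris", []):
        commands.append([
            "varnishadm",
            "-T", admin_addr,    # address of the Varnish management interface
            "-S", secret_file,   # shared secret file used to authenticate
            "purge", "req.http.host", "==", host,
        ])
    return commands
```

Each returned list could be handed to `subprocess.run` from a handler subscribed to ‘dea.*.start’.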

Slide 26

Slide 26 text

Example: Monitoring
• Monitoring infrastructure is a solved problem
• What about monitoring CloudFoundry?
• Problems:
  • System health
  • Capacity forecasting
  • QoS

Slide 27

Slide 27 text

Monitoring VCAP: Goals
• Leverage existing systems
• IaaS agnostic
• Alerts for on-call rotation
• Correlate reservation metrics with usage
• Aggregate service data (e.g. % mem reserved across DEA pool)
• Detect changes in clusters automatically
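
The aggregate mentioned on this slide, percent of memory reserved across the DEA pool, is total reserved memory divided by total capacity. A small Python sketch follows; the `(reserved_mb, max_mb)` input shape is an assumption, not the actual /varz field layout.

```python
def percent_memory_reserved(dea_stats):
    """Return the percentage of memory reserved across a pool of DEAs.

    dea_stats is an iterable of (reserved_mb, max_mb) pairs, one per DEA.
    """
    total_reserved = 0
    total_max = 0
    for reserved_mb, max_mb in dea_stats:
        total_reserved += reserved_mb
        total_max += max_mb
    if total_max == 0:
        # An empty pool has nothing reserved; avoid dividing by zero.
        return 0.0
    return 100.0 * total_reserved / total_max
```

Summing before dividing weights each DEA by its capacity, which differs from averaging per-node percentages when the pool is heterogeneous.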

Slide 28

Slide 28 text

Collecting VCAP Data
• Every component has a JSON + REST API
• Components are discoverable via NATS
• Capability security model
• Credentials are generated at boot (UUID / UUID)
• http://user:[email protected]/varz
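
Given a component's host and generated credentials, querying its /varz endpoint is an HTTP GET with basic authentication. The Python sketch below uses only the standard library. The function names are hypothetical, and the host, port, and credentials are assumed to have been learned elsewhere (per the slide, through discovery over NATS).

```python
import base64
import json
import urllib.request


def build_varz_request(host, port, user, password):
    """Build (but do not send) an authenticated GET request for /varz."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"http://{host}:{port}/varz")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_varz(request, timeout=5):
    """Send the request and decode the JSON body into Python objects."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the credentials are regenerated at boot, a collector would need to rediscover them after a component restarts rather than cache them indefinitely.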

Slide 29

Slide 29 text

Monitoring Prototype
• Ad hoc solution: dashboard built with NodeJS
• Beautiful graphs
• Gained experience plumbing VCAP
• Unsolved problems:
  • Inventory
  • Correlation with lower layer
  • Alerting

Slide 30

Slide 30 text

Our Monitoring Stack

Slide 31

Slide 31 text

varz-query
• NodeJS CLI
• Discovers components and presents metrics
• < 100 LoC
• Added support for timed request/response to node_nats

Slide 32

Slide 32 text

Check_MK Configuration
• Inventories VCAP components
• VCAP-specific checks implemented in Python
• Submits results to Nagios
• WARNING and CRITICAL results reported to PagerDuty
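
The core of a threshold check can be sketched in plain Python. This is not the Check_MK check API and not the checks from the talk: `check_threshold` is a hypothetical helper that classifies a metric using the conventional Nagios status codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin status codes


def check_threshold(name, value, warn, crit):
    """Classify a metric against warning and critical thresholds.

    Returns (status_code, message); higher values are treated as worse.
    """
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name} is {value} (crit at {crit})"
    if value >= warn:
        return WARNING, f"WARNING - {name} is {value} (warn at {warn})"
    return OK, f"OK - {name} is {value}"
```

A check built this way could, for instance, warn when a pool-wide memory reservation percentage crosses 80 and go critical at 90; those thresholds are illustrative only.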

Slide 33

Slide 33 text

Chef’s Role
• Generates main.mk
• Nodes added automatically when provisioned
• Deploys Check_MK agent, varz-query

Slide 34

Slide 34 text

Putting It All Together
[Diagram labels: varz-query, NATS, VCAP]

Slide 35

Slide 35 text

Example: Adding PHP Support
• AppFog is the CloudFoundry Community Lead for PHP
• We created and actively develop PHP Fog
• We contributed our PHP experience back to CloudFoundry

Slide 36

Slide 36 text

Our PHP Stack
• Apache2
  • PHP community widely depends on mod_rewrite, .htaccess
• PHP 5.3
• Leverages UNIX for isolation
  • Every app runs under a unique UID / GID
  • Same effect as VCAP’s secure mode

Slide 37

Slide 37 text

Our PHP Stack (Cont’d)
• Uses Apache2 vhosts in multi-tenant environment
• mpm-itk module allows workers to drop privileges

Slide 38

Slide 38 text

How VCAP Represents Apps
• Framework
  • App deployment know-how
  • e.g. Sinatra, Rails
• Runtime
  • VMs, interpreters, common libraries
  • e.g. ruby-1.8.7-p, ruby-1.9.3-p125
  • Constrained by framework

Slide 39

Slide 39 text

App Packages
• Contain source code and dependency metadata
• Two flavors:
  • Unstaged – just the app bits
  • Staged – includes start/stop scripts, dependencies

Slide 40

Slide 40 text

Staging Apps
• Stager
  • Builds staged app packages
  • Plugin architecture

Slide 41

Slide 41 text

Implementing a Staging Plugin
• Inherit from abstract StagingPlugin class
• MUST implement #framework, #stage_application
• MAY specialize further
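
The plugin contract can be sketched as an abstract base class with two required methods. The real StagingPlugin is a Ruby class inside VCAP. The Python stand-in below only mirrors the shape described on the slide, and the method arguments, the `PhpPlugin` class, and the `startup`/`stop` file names are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class StagingPlugin(ABC):
    """Python stand-in for the abstract base class named on the slide."""

    @abstractmethod
    def framework(self):
        """Return the name of the framework this plugin handles."""

    @abstractmethod
    def stage_application(self, source_dir, staged_dir):
        """Turn an unstaged app in source_dir into a staged package."""


class PhpPlugin(StagingPlugin):
    """Hypothetical plugin; it only reports the steps it would take."""

    def framework(self):
        return "php"

    def stage_application(self, source_dir, staged_dir):
        # A real plugin would copy files and generate scripts and configs.
        return [
            f"copy {source_dir} -> {staged_dir}/app",
            f"write {staged_dir}/startup",
            f"write {staged_dir}/stop",
        ]
```

Declaring both methods abstract enforces the "MUST implement" rule: instantiating a subclass that omits either one raises `TypeError`.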

Slide 42

Slide 42 text

Implementing a Staging Plugin (Cont’d)
• Staging manifest
  • Runtimes
  • App servers
  • Detection rules

Slide 43

Slide 43 text

Staging Plugin for PHP
• Runs Apache2 in foreground
• Generates and bundles Apache2 and PHP configs
• Full support for environment vars, service bindings, resource limits
• Available now in open source project

Slide 44

Slide 44 text

Runtimes
• Deploy multiple runtime versions on any node
• DEA pool can be heterogeneous
• Versions can be upgraded without migrating existing apps

Slide 45

Slide 45 text

Runtime Rollout
• CloudController:
  • config file
  • App::Runtimes # FIXME
  • Staging manifests
• DEA:
  • dea.yml
  • Executables, stdlibs, cached dependencies

Slide 46

Slide 46 text

Going Further: A Custom DEA
• Motivation: multi-tenant PHP for free tier
• Married PHPFog’s stack with VCAP’s secure user pool
• mpm-itk
• Subclassed DEA::Agent

Slide 47

Slide 47 text

Go Forth And Hack!
• http://github.com/cloudfoundry
• http://cloudfoundry.org/
• http://appfog.com/

Slide 48

Slide 48 text

Thank You!
Jeremy Voorhis, Senior Engineer, AppFog Inc
[email protected] @jvoorhis http://appfog.com/

Slide 49

Slide 49 text

Questions?