smartstack

martin rhoads
October 30, 2013

Transcript

  1. Intros
     Who are these guys up on that there stage?
     Igor Serebryany
     + SRE at Airbnb since 2012
     + Built datacenter automation at SingleHop
     + Scientific computing at University of Chicago
     + Hobbies: welding, biking, long walks on the beach
     Note: it’s a clip-on
  2. Intros
     This guy is even more bearded than the last!
     Martin Rhoads
     + SRE at Airbnb
     + User of AWS since 2006
     + First 10 employees at RightScale
     + Previously worked at Cloudscaling deploying OpenStack at Tier 1s and telcos
     + Bioinformatics at UCSB
     + Obsessed with making things easier
  3. Wat are trying to sell me? Wy do I need

    an SOA? + The definitive way to scale your architecture + Allow different people to work on different code without stepping on toes + Separate deployment schedules + Separate machine and data requirements + Fail separately -- so you can have graceful degradation 5
  4. How an SOA happens
     When customers love a service very, very much...
     That looks pretty workable...
  5. How an SOA happens
     When customers love a service very, very much...
     Hmm, well, it makes sense.
  6. How an SOA happens
     When customers love a service very, very much...
     I have no idea what I’m doing...
  7. Here’s how it ends up
     A certain kind of fun
     Diagram labels, one role per line:
     Role help (4 machines): help
     Role rendezvous (3 machines): rendezvous
     Role shiny-server (1 machine): shiny
     Role redis-trebuchet-master (1 machine): redis_trebuchet
     Role redis-fraud-master (1 machine): redis_fraud, redis_fraud_slave
     Role monorail-web (79 machines): monorail
     Role redis-feeds-master (1 machine): redis_feeds
     Role social_connections (6 machines): social_connections
     Role geminabox-internal (1 machine): geminabox-internal
     Role redis-resque-master (1 machine): redis_resque
     Role moweb (3 machines): moweb
     Role social-api (1 machine): social-api
     Role redis-social-2-master (1 machine): redis_social-2
     Role communities (3 machines): communities
     Role candidate-pricing (9 machines): pricing_candidate_thrift
     Role testimonials-service (2 machines): testimonials-service
     Role cashout (1 machine): cashout
     Role redis-social-1-master (1 machine): redis_social-1
     Role optica (3 machines): optica
     Role ganesh (3 machines): ganesh
     Role zookeeper-main (5 machines): exhibitor-main
     Role corgi (6 machines): corgi
     Role fraud-prediction (6 machines): fraud-prediction
     Role kibana (3 machines): kibana
     Role redis-rookery-master (1 machine): redis_rookery
     Role flog (6 machines): flog, flog_thrift
     Role ssspy (3 machines): ssspy
     Role name_matching (1 machine): name_matching
     Role redis-counters-master (1 machine): redis_counters
     Role aircorps-elasticsearch (4 machines): aircorps-elasticsearch
     Role zookeeper-di (3 machines): exhibitor-di
     Role redis-ganesh-master (1 machine): redis_ganesh
     Role internalauth (1 machine): internalauth
     Role redis-corgi-slave (1 machine): redis_corgi_slave
     Role search (6 machines): search
     Role rabbitmq (4 machines): rabbitmq, rabbitmq_management
     Role logstash-collector (4 machines): logstash
     Role openscoring (3 machines): openscoring
     Role hadoop-journal-production (14 machines): hive_thrift
     Role ec2admin (2 machines): ec2admin-api, ec2admin
     Role dyson (4 machines): dyson
     Role commitment (2 machines): commitment
     Role community (3 machines): community
     Role redis-general-master (1 machine): redis_general
     Role testimonials-search (3 machines): testimonials-search
     Role minotaur (1 machine): minotaur
     Role graze-default (2 machines): graze
     Role host_standards (7 machines): host_standards
     Role monorail-resque-scheduler (1 machine): monorail-resque-admin
     Role bouncer-worker (6 machines): bouncer_worker
     Role redis-rendezvous-master (1 machine): redis_rendezvous
     Role log_lady (1 machine): log_lady
     Role pricing (9 machines): pricing, pricing_thrift
     Role redis-corgi-master (1 machine): redis_corgi
     Role rabbitmq-data (3 machines): rabbitmq-data-management, rabbitmq-data
     Role sherlock (3 machines): sherlock
     Role autopricing (3 machines): autopricing
     Role redis-omgpro-master (1 machine): redis_omgpro
     Role rookery-api (3 machines): rookery-api
     Role kibana3 (1 machine): kibana3, logstash-elasticsearch
     Role companion (1 machine): companion
     Role billow (3 machines): billow
     Role monorail-titanic-admin (4 machines): monorail-admin
     Role sphinx (1 machine): sphinx
     Role bouncer (9 machines): bouncer
     Role redis-communities-master (1 machine): redis_communities
     Role calendar-service (4 machines): calendar-service
     Role commitment-publisher (2 machines): commitment-publisher
     Also on the diagram, with no role label: facebook_friends_db, fraud_db, calendar-api, calendar-db-slave, translation-memory, airmaster, fraud_db_slave, spark-db, airslave, squash, help-db, calendar-db, social-db-a, social-db-b, social-db-c, social-db-d-slave, cashout-db, session-activities-db, session-activities-db-slave, air18n, airbatch, kafka-h1, airlift-discovery, host_standards-db, bouncer_db, rookery-db, companion-db, calendar-service-db, calendar-service-db-slave
  8. To sum up
     1. Services help you scale
     2. SOA is an architecture style designed around services
     3. An SOA is hard to manage
     4. SmartStack makes managing an SOA a breeze
  9. SERVICE, ZOOKEEPER, NERVE, SYNAPSE
     1. SERVICE: the service(s) you want to deliver
     2. ZOOKEEPER: ZooKeeper registry to track everything
     3. NERVE: Nerve checks health and updates ZooKeeper
     4. SYNAPSE: Synapse routes between services
  10. Diagram: MONORAIL and MOBILE WEB, each with NERVE and SYNAPSE, connected through ZOOKEEPER
      + Nerve asks each service "HEALTHY?" and gets "YES!"
      + Nerve then sends REGISTER HEALTHY to ZooKeeper:
      + /production/monorail/services/i-1234567 => {'host': 1.2.3.4, 'port': 5678}
      + /production/mobile_web/services/i-0abcdef => {'host': 5.6.7.8, 'port': 5678}
      + Registered entries: mobile_web (host: 5.6.7.8, port: 5678) and monorail (host: 1.2.3.4, port: 5678)
      + "MONORAIL?" is answered with 1.2.3.4:5678
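The register-and-discover flow on this slide can be sketched in a few lines of Python. This is a minimal in-memory illustration, not SmartStack's code: the `registry` dict stands in for ZooKeeper, and the names `nerve_report` and `synapse_discover` are hypothetical. Only the paths and host/port values come from the slide.

```python
import json

# Hypothetical stand-in for ZooKeeper: maps a node path to a JSON payload.
registry = {}


def nerve_report(env, service, instance_id, host, port, healthy):
    """Register the instance while it is healthy; remove it otherwise."""
    path = f"/{env}/{service}/services/{instance_id}"
    if healthy:
        registry[path] = json.dumps({"host": host, "port": port})
    else:
        registry.pop(path, None)
    return path


def synapse_discover(env, service):
    """Return sorted 'host:port' strings for every registered instance."""
    prefix = f"/{env}/{service}/services/"
    backends = []
    for path, payload in registry.items():
        if path.startswith(prefix):
            node = json.loads(payload)
            backends.append(f"{node['host']}:{node['port']}")
    return sorted(backends)


# The two registrations shown on the slide.
nerve_report("production", "monorail", "i-1234567", "1.2.3.4", 5678, healthy=True)
nerve_report("production", "mobile_web", "i-0abcdef", "5.6.7.8", 5678, healthy=True)

print(synapse_discover("production", "monorail"))  # ['1.2.3.4:5678']
```

In this sketch a failed health check simply deletes the entry, so the next discovery call no longer returns that backend.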
  11. haproxy
      At the core of synapse
      We get myriad benefits from haproxy
      + Stable and well-tested
      + Performs in-process connectivity checks
      + Great introspection and logging
      + Lots of load-balancing algorithms (RR, least-conn)
      + Somewhat dynamically reconfigurable (stats socket)
      + Did we mention stable?
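As an illustration of the "stats socket" bullet: haproxy accepts text commands such as `disable server <backend>/<server>` on its Unix stats socket. The sketch below builds and sends such a command. It is not Synapse's code; the function names are hypothetical, the socket path is whatever your haproxy config declares, and the socket is assumed to be configured with admin level so that enable/disable are permitted.

```python
import socket


def server_command(action, backend, server):
    """Build an haproxy stats-socket command that toggles one server."""
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action: {action}")
    return f"{action} server {backend}/{server}\n"


def send_command(socket_path, command):
    """Send one command to the stats socket and return haproxy's reply.

    socket_path is an assumption: use the path from the
    'stats socket' line of your own haproxy configuration.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # haproxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


print(server_command("disable", "monorail", "i-1234567"), end="")
```

Only `server_command` is exercised here; `send_command` needs a running haproxy with a stats socket to talk to.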
  12. Abstraction
      + The same code in the same language is always doing discovery/registration
      + Your application doesn’t know about nerve/synapse -- it only knows about its dependencies
      + Always consistent across your infrastructure
  13. Automatic Failure Handling
      You don’t have to wake up
      + Bad backends are automatically taken out of rotation
      + Useful during both problems and routine maintenance/deploys
      + Push-based => very rapid detection; avoid those little blips
      + haproxy even routes around network partitions!
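One common way to take bad backends out of rotation without flapping is a rise/fall counter: a backend changes state only after several consecutive check results disagree with its current state. The sketch below shows that technique in isolation. The class name and the default thresholds are assumptions for illustration, not values taken from the talk.

```python
class HealthTracker:
    """Track one backend's state using rise/fall thresholds."""

    def __init__(self, rise=2, fall=3):
        self.rise = rise    # consecutive passes needed to mark the backend up
        self.fall = fall    # consecutive failures needed to mark it down
        self.up = False     # backends start out of rotation
        self.streak = 0     # consecutive results that contradict self.up

    def observe(self, ok):
        """Record one health-check result and return the current state."""
        if ok == self.up:
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.rise if ok else self.fall
            if self.streak >= threshold:
                self.up = ok
                self.streak = 0
        return self.up


tracker = HealthTracker(rise=2, fall=3)
checks = [True, True, False, False, True, False, False, False]
print([tracker.observe(ok) for ok in checks])
# [False, True, True, True, True, True, True, False]
```

The single passing check in the middle resets the failure streak, so the backend only drops out after three failures in a row.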
  14. Introspection
      See what’s REALLY going on
      Leverage the power of haproxy
      + Status page that lets you see local state
      + Lots of available integrations to gather global state
      + World-class logging for large-scale analysis
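To illustrate reading local state: haproxy can report its statistics as CSV whose header line starts with `# pxname,svname,...`. The sketch below groups servers by their `status` column. The sample is abbreviated to four columns for readability (real output has many more), and the instance ID i-7654321 and the function name are made up for the example.

```python
import csv
import io

# Abbreviated, hypothetical sample of haproxy's CSV stats output.
SAMPLE = (
    "# pxname,svname,scur,status\n"
    "monorail,FRONTEND,3,OPEN\n"
    "monorail,i-1234567,2,UP\n"
    "monorail,i-7654321,0,DOWN\n"
    "monorail,BACKEND,2,UP\n"
)


def servers_by_status(stats_csv):
    """Group 'proxy/server' names by status, skipping aggregate rows."""
    text = stats_csv.lstrip("# ")  # drop the leading '# ' of the header
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # these rows summarize the proxy, not one server
        name = f"{row['pxname']}/{row['svname']}"
        result.setdefault(row["status"], []).append(name)
    return result


print(servers_by_status(SAMPLE))
# {'UP': ['monorail/i-1234567'], 'DOWN': ['monorail/i-7654321']}
```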
  15. Distributed by Design
      No central point of failure
      + Traffic flows directly between boxes -- no routing layer
      + Even if SmartStack is stopped or broken, haproxy keeps traffic flowing
      + ZooKeeper helps to avoid common pitfalls (like different backends in different network segments)
  16. The impact
      How SmartStack has changed Airbnb
      + 100+ services using SmartStack
      + 2K requests per second
      + 3K LOC deleted
      + 30 engineers using SmartStack
  17. Our engineers love SmartStack
      Like, a platonic kind of love
      + Ben: “SmartStack is fucking great! It helped me to discover services – and quit smoking”
      + Phillippe: “Distributed computing? And all this time I thought everything was running on one machine”
      + Nelson: “I would give you a quote... except that my praise for SmartStack would test the limits of my credibility. :)”
      + Spike: “Nerve and Synapse have greatly simplified my life as an application developer, and have enabled me to launch our first Node.js services with very little ops overhead.”
      + Barbara: “I love it!”
      + Sean: “SmartStack has made deployment of new Java services a matter of beer and 20 lines of Ruby”
  18. Future Direction
      Is this project, like, done...?
      1. Better resiliency: more graceful handling of ZooKeeper edge cases
      2. Better testing: improve on the current integration test suite
      3. Dynamic registration: for services running on Mesos et al.
      4. A push API for nerve: allow services to communicate coming downtime
      5. An auto-scaling layer: use nerve information to determine load levels
  19. Getting Started
      1. Install Vagrant
      2. git clone https://github.com/airbnb/smartstack-cookbook.git
      3. vagrant up