WombatOAM

Scale, Manage, Prevent: Generic Operations and Maintenance in Erlang done right!

Building distributed systems that scale to many machines is easier with Erlang/OTP than with most other technologies. Deploying, managing and monitoring thousands of Erlang nodes prepared for massive load, however, remains a tough (and repetitive) challenge. In the context of the EU-funded RELEASE project, the main task of WombatOAM is to provide the scalable infrastructure for deploying thousands of Erlang nodes. It provides a broker layer capable of dynamically scaling heterogeneous cloud clusters based on capability profile matching.

In this talk, I will tell you how the scalability and robustness capabilities of WombatOAM were addressed, allowing us to deploy and monitor 10,000 Erlang nodes running an ant farm simulation in 4 minutes. The talk will cover the journey – focusing on the analysis of WombatOAM (using WombatOAM, as we like dog food), on the applied techniques and on the key decisions taken when advancing WombatOAM.

Francesco Cesarini is the founder of Erlang Solutions Ltd. He has used Erlang on a daily basis since 1995, starting as an intern at Ericsson’s computer science laboratory, the birthplace of Erlang. He moved on to Ericsson’s Erlang training and consulting arm, working on the first release of OTP and applying it to turnkey solutions and flagship telecom applications. In 1999, soon after Erlang was released as open source, he founded Erlang Solutions, which has become the world leader in Erlang-based consulting, contracting, training and systems development. Francesco has worked on major Erlang-based projects both within and outside Ericsson and, as Technical Director, has led the development and consulting teams at Erlang Solutions. He is also the co-author of 'Erlang Programming' and 'Designing for Scalability with Erlang/OTP', both published by O’Reilly, and lectures at Oxford University.

erlang.paris

July 29, 2015

Transcript

  1. © 1999-2015 Erlang Solutions Ltd. Scale, manage and prevent! Francesco Cesarini, @FrancescoC
  2. 99,9999999 - A Heck Of A Lot Of Nines. “We are extremely pleased with the outcome of the initial phase of this project. This is a major step in the phased development of what we believe is a world-leading Next Generation Network,” said Richard Newman, General Manager of Planning and Delivery of Network Transport at BT Wholesale. Ericsson Press Release, 5 July 2002. “Since cut-over of the first nodes in BT’s network in January 2002, only one minor fault has occurred, resulting in 99.9999999% availability.”
  3. 99,9999999 - A Heck Of A Lot Of Nines. “As a matter of fact, the network performance has been so reliable that there is almost a risk that our field engineers do not learn maintenance skills.” Bert Nilsson, Director, NGS-Programs Ericsson. Ericsson Contact, Issue 19 2002. 99.9999999% uptime
  4. - Business & System Metrics - Notifications & Logs - Minor, Major and Critical Alarms. Visibility
  5. - Business & System Metrics - Notifications & Logs - Minor, Major and Critical Alarms. Visibility. Pre-emptive Support
  6. - Business & System Metrics - Notifications & Logs - Minor, Major and Critical Alarms. Visibility. Post-mortem Debugging
  7. Why Wombat? Erlang RPC, REST, HTTP, BTO Software
  8. Characteristics • No bottleneck • No single point of failure • Scale out in size • Purely parallel • Distributed • Fault-tolerant
  9. The Myths of the Hero Programmer… Is it Documented? Is the developer supporting it? What visibility does your devops have into what is going on? - Live Tracing - Audit Trails - Metrics - Alarms - OPS have a CLI / HTTP Interface
  10. Road Map
    • Integration with standard Monitoring Tools
    • Plugins for the major OAM tools
    • Interfaces towards OAM SAAS providers
    • SNMP
    • Plugins for OTP applications
    • Mnesia, MySQL and Postgres Drivers, Cowboy
    • Riak, RabbitMQ, MongooseIM, Ejabberd, CouchDB
    • Operations
    • Tools such as etop, Pman, AppMon in the dashboard
    • Configuration management
    • Automated software upgrade
    • Live tracing and profiling
    • Orchestration no longer in beta
    • Rules Based Engine
    • Trigger based system with a rules based engine
    • Own DSL
    • Auto scalability
    Customer Driven 2015/2016 2014/15 2015
  11. Learn from the past, Live in the future. Scale, manage and prevent! Discount Code: authd. 50% off the Early Release, 40% off the printed copy. Francesco Cesarini, @FrancescoC