WombatOAM

Scale, Manage, Prevent: Generic Operations and Maintenance in Erlang done right!

Building distributed systems that scale to many machines is easier with Erlang/OTP than with most other technologies. Deploying, managing and monitoring thousands of Erlang nodes prepared for massive load, however, remains a tough (and repetitive) challenge. In the context of the EU-funded RELEASE project, the main task of WombatOAM is to provide the scalable infrastructure for deploying thousands of Erlang nodes. It provides a broker layer capable of dynamically scaling heterogeneous cloud clusters based on capability profile matching.
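
To make "capability profile matching" concrete, here is a minimal sketch of how a broker might choose where to start nodes. It is not WombatOAM's implementation: the `Provider` type, the capability names, and the greedy placement strategy are all hypothetical, chosen only to illustrate matching a required profile against what each cloud offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """A hypothetical cloud provider offering a set of capabilities."""
    name: str
    capabilities: frozenset
    free_slots: int


def match_providers(required, providers):
    """Return providers with spare capacity whose capabilities cover the required profile."""
    required = frozenset(required)
    return [p for p in providers if required <= p.capabilities and p.free_slots > 0]


def place_nodes(count, required, providers):
    """Greedily spread `count` nodes over the matching providers.

    Returns a dict mapping provider name to the number of nodes placed there,
    and raises ValueError if the matching providers lack capacity.
    """
    if count == 0:
        return {}
    placement = {}
    remaining = count
    for provider in match_providers(required, providers):
        take = min(remaining, provider.free_slots)
        placement[provider.name] = take
        remaining -= take
        if remaining == 0:
            return placement
    raise ValueError(f"not enough matching capacity: {remaining} nodes unplaced")
```

With three hypothetical providers where only two offer both `linux` and `ssd`, `place_nodes(6, {"linux", "ssd"}, providers)` fills the first matching provider before moving on to the next. A real broker would also weigh cost, locality and failure domains.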

In this talk, I will tell you how the scalability and robustness capabilities of WombatOAM were addressed, allowing us to deploy and monitor 10,000 Erlang nodes running an ant farm simulation in 4 minutes. The talk will cover the journey – focusing on the analysis of WombatOAM (using WombatOAM, as we like dog food), on the applied techniques and on the key decisions taken when advancing WombatOAM.

Francesco Cesarini is the founder of Erlang Solutions Ltd. He has used Erlang on a daily basis since 1995, starting as an intern at Ericsson’s computer science laboratory, the birthplace of Erlang. He moved on to Ericsson’s Erlang training and consulting arm, working on the first release of OTP and applying it to turnkey solutions and flagship telecom applications. In 1999, soon after Erlang was released as open source, he founded Erlang Solutions, which has become the world leader in Erlang-based consulting, contracting, training and systems development. Francesco has worked on major Erlang-based projects both within and outside Ericsson and, as Technical Director, has led the development and consulting teams at Erlang Solutions. He is also the co-author of 'Erlang Programming' and 'Designing for Scalability with Erlang/OTP', both published by O’Reilly, and lectures at Oxford University.

erlang.paris

July 29, 2015

Transcript

  1. Erlang Solutions Ltd.
    © 1999-2015 Erlang Solutions Ltd.
    Scale, manage and prevent!
    Francesco Cesarini
    @FrancescoC

  2. 99.9999999% - A Heck Of A Lot Of Nines
    “We are extremely pleased with the outcome of the initial phase of this
    project. This is a major step in the phased development of what we
    believe is a world-leading Next Generation Network,” said Richard
    Newman, General Manager of Planning and Delivery of Network
    Transport at BT Wholesale.
    Ericsson Press Release 5 July, 2002
    “Since cut-over of the first nodes in BT’s network in
    January 2002, only one minor fault has occurred,
    resulting in 99.9999999% availability.”
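
To put the quoted figure in perspective, the short sketch below converts an availability percentage into expected downtime, as if that availability were sustained for a full year. The 365-day year and the function name are assumptions made for illustration; the press release measures availability since the January 2002 cut-over, not per year.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_seconds_per_year(availability_percent: str) -> Fraction:
    """Expected yearly downtime for an availability given as a percentage string.

    Fractions keep the arithmetic exact; plain floats lose precision when
    subtracting a number this close to 1.
    """
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * SECONDS_PER_YEAR


nine_nines = downtime_seconds_per_year("99.9999999")
print(float(nine_nines * 1000), "ms per year")
```

Nine nines works out to roughly 31.5 milliseconds of downtime per year, compared with a little over five minutes for the more commonly quoted five nines (`"99.999"`).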

  3. 99.9999999% - A Heck Of A Lot Of Nines
    “As a matter of fact, the network performance has been so reliable
    that there is almost a risk that our field engineers do not learn
    maintenance skills.”
    Bert Nilsson, Director, NGS-Programs Ericsson

    Ericsson Contact, Issue 19 2002
    99.9999999% uptime

  4. Pre-emptive support

  5. Visibility
    - Business & System Metrics
    - Notifications & Logs
    - Minor, Major and Critical Alarms

  6. Visibility
    - Business & System Metrics
    - Notifications & Logs
    - Minor, Major and Critical Alarms
    Pre-emptive Support

  7. Visibility
    - Business & System Metrics
    - Notifications & Logs
    - Minor, Major and Critical Alarms
    Post-mortem Debugging

  8. Why Wombat?
    Erlang RPC
    REST
    HTTP
    BTO Software

  9. Case study
    XMPP Load Generation
    MongooseIM

  10. ETS limit: major

  11. Process message queue: major
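
The two alarms above both carry a severity. One common way to derive a severity is to compare current usage of a resource against its limit, as in the sketch below. The threshold values and the function name are hypothetical and are not taken from WombatOAM; the slides only show that such alarms exist and that they are graded minor, major or critical.

```python
from typing import Optional

# Hypothetical thresholds, expressed as a fraction of the resource limit and
# ordered from most to least severe. These are not WombatOAM's real values.
THRESHOLDS = (("critical", 0.95), ("major", 0.80), ("minor", 0.60))


def alarm_severity(current: int, limit: int) -> Optional[str]:
    """Return 'minor', 'major' or 'critical', or None when usage is below every threshold."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = current / limit
    for severity, threshold in THRESHOLDS:
        if usage >= threshold:
            return severity
    return None
```

For example, with an illustrative limit of 1400 ETS tables, 1200 tables in use is about 86% of the limit and would be classified as `major` under these assumed thresholds. The same function could grade a process message queue length against a configured maximum.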

  12. Different application versions

  13. module_clash

  14. Apply Application filter

  15. Analysis of MongoDB driver

  16. Wombat’s attractive features
    Metric, Notification, Alarm

  17. Wombat-tree
    200 x 200 x 200 x 200 x 200 x

  18. Wombat-tree
    100 x 100 x 100 x 100 x

  19. Wombat-tree
    66 x 66 x 66 x

  20. Characteristics
    • No bottleneck
    • No single point of failure
    • Scale out in size
    • Purely parallel
    • Distributed
    • Fault-tolerant

  21. I wrote my Erlang system in 4 weeks!

  22. The Myths of the Hero Programmer…
    Is it documented?
    Is the developer supporting it?
    What visibility does your devops have into what is going on?
    - Live Tracing
    - Audit Trails
    - Metrics
    - Alarms
    - OPS have a CLI / HTTP Interface

  23. Road Map
    • Integration with standard Monitoring Tools
    • Plugins for the major OAM tools
    • Interfaces towards OAM SAAS providers
    • SNMP
    • Plugins for OTP applications
    • Mnesia, MySQL and Postgres Drivers, Cowboy
    • Riak, RabbitMQ, MongooseIM, Ejabberd, CouchDB
    • Operations
    • Tools such as etop, Pman, AppMon in the dashboard
    • Configuration management
    • Automated software upgrade
    • Live tracing and profiling
    • Orchestration no longer in beta
    • Rules Based Engine
    • Trigger based system with a rules based engine
    • Own DSL
    • Auto scalability
    Timeline labels from the slide: Customer Driven, 2015/2016, 2014/15, 2015

  24. Learn from the past
    Live in the future
    Scale, manage and prevent!
    Discount Code: authd
    50% off the Early Release
    40% off the printed copy
    Francesco Cesarini
    @FrancescoC