Alan Robertson
August 08, 2013

2013 August BLUG - Discovery and Monitoring Without Limit

A presentation of the Assimilation Project to the Boulder Linux User's Group

Transcript

  1. Discovery and Monitoring Without Limit using The Assimilation Project #AssimProj

    @OSSAlanR http://assimproj.org/ Alan Robertson <[email protected]> Assimilation Systems Limited http://assimilationsystems.com
  2. 8 August 2013 © 2013 Assimilation Systems Limited 2/33 Biography

    • Founded Linux-HA project - led 1998-2007 - now called Pacemaker
    • Founded Assimilation Project in 2010
    • Founded Assimilation Systems Limited in 2013
    • Alumnus of Bell Labs, SuSE, IBM
  3. Project background

    • Available as GPL (or commercial)
      – your fork/exec scripts not required to be GPL
    • Founded in late 2010
    • Now my full time endeavor
      – Assimilation Systems Limited
    • Currently around 25K lines of code
    • First release: April 2013
  4. T.A.N.S.T.A.A.F.L.

    What I need from you...
    • Feedback on the project/product
      – Is it useful – why or why not?
      – Would it sell to management?
    • Feedback on my approach to presenting it
    • Other presentation feedback
      – Clarity, Style, etc...
  5. Project Scope

    Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
    • Continuous extensible discovery
      – systems, switches, services, dependencies
      – zero network footprint
    • Extensible exception monitoring
      – more than 100K systems
    • All data goes into central graph database
  6. Questions

    • How many of you have monitoring?
      – Open or closed source?
      – How many of you are happy with it?
    • How many of you have discovery?
      – Open or closed source?
      – Is it continuous?
      – How many of you are happy with it?
  7. Why Assimilation Software?

    • Management Perspective
    • DevOps Perspective
  8. Risk Management/Mitigation

    • Intrusions
    • Licensed Software
    • Audit Risk
    • Outages
    • System management
  9. Why Discovery? (DevOps)

    • Documentation: incomplete, incorrect
    • Dependencies: unknown
    • Planning: Needs accurate data
    • Best Practices: Verification needs data
    • ITIL CMDB (Configuration Mgmt DataBase)
  10. Why Our Monitoring?

    • Simpler to configure (in theory)
    • Growth is non-issue
    • Extremely low network traffic
    • Ideal for cross-WAN monitoring
    • Highlight cascading failure root causes
    • Not confused by switch failures
    • Most switches get monitored “for free”
  11. This all sounds unreasonable...

    • Huge scalability without complexity?
    • Discovery without sending packets? Really?
  12. Architectural Overview

    Collective Management Authority
    • One CMA per installation
    Nanoprobes
    • One nanoprobe per OS image
    Data Storage
    • Central Neo4j graph database
    General Rule: “No News Is Good News”
  13. Simple Scalability

    • I can explain how we scale so your grandmother would understand
  14. Massive Scalability – or “I see dead servers in O(1) time”

    • Adding systems does not increase the monitoring work on any system
    • Each server monitors 2 (or 4) neighbors
    • Each server monitors its own services
    • Ring repair and alerting is O(n) – but a very small amount of work
    • Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
    Current Implementation
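
The ring arrangement on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: `ring_neighbors`, `packets_per_day`, and the host names are hypothetical, and the sketch assumes a single ring in which each node watches its immediate predecessor and successor. It also checks the slide's arithmetic (one packet every 9 seconds is 9,600 packets per day, under the quoted 10K).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ring_neighbors(nodes, index):
    """Return the ring neighbors that the node at `index` would monitor.

    Hypothetical helper: assumes one ring, two neighbors per node.
    """
    n = len(nodes)
    if n < 2:
        return []  # a lone node has nobody to watch
    prev_node = nodes[(index - 1) % n]
    next_node = nodes[(index + 1) % n]
    # In a two-node ring both neighbors are the same host.
    return [prev_node] if prev_node == next_node else [prev_node, next_node]


def packets_per_day(seconds_per_packet):
    """Convert a packet interval into a daily packet count."""
    return SECONDS_PER_DAY / seconds_per_packet


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(6)]  # made-up host names
    for i, host in enumerate(hosts):
        print(host, "monitors", ring_neighbors(hosts, i))
    print("packets/day at 1 per 9 s:", packets_per_day(9))
```

However long the `hosts` list grows, each node's neighbor list stays at two entries, which is the sense in which per-node monitoring work does not grow with the number of systems.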
  15. Decreasing Network Footprint (planned)

    • Support diagnosing switch issues
    • Minimize network traffic
    • Ideal for multi-site arrangements
  16. Service Monitoring

    Based on Linux-HA LRM
    • LRM == Local Resource Manager
    • Well-proven architecture:
      – “no news is good news” AKA management by exception
    • Implements Open Cluster Framework standard (and others)
    • Each system monitors own services
    • Can also start, stop, migrate services
  17. Monitoring Pros and Cons

    Pros
      Simple & Scalable
      Uniform work distribution
      No single point of failure
      Distinguishes switch vs host failure
      Easy on LAN, WAN
    Cons
      Active agents
      Potential slowness at power-on
  18. How does this apply to clouds?

    • Fits nicely into a cloud infrastructure
      – Should integrate into OpenStack, et al
      – Can control VMs
    • Can monitor customer VMs
      – Add nanoprobe to base image
      – bottom level of rings disappear without LLDP or CDP
  19. Nanoprobe Functions ('C')

    Announce self to CMA
    • Reserved multicast address (can be unicast address or name if no multicast)
    Do what CMA says
    • receive configuration information
      – CMA addresses, ports, defaults
    • send/expect heartbeats
    • perform discovery actions
    • perform monitoring actions
    No persistent state across reboots
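
The "send/expect heartbeats" item, combined with the "no news is good news" rule, can be illustrated with a small Python sketch. It is a hedged illustration only: the real nanoprobe is written in C, and the `HeartbeatWatcher` class, its method names, and the deadline value are all assumptions made for this example. The point it shows is that the watcher stays silent while heartbeats arrive and speaks up only when a peer newly misses its deadline.

```python
class HeartbeatWatcher:
    """Tracks peers we expect heartbeats from and reports only exceptions.

    Hypothetical class for illustration; not the project's API.
    """

    def __init__(self, deadline_seconds):
        self.deadline = deadline_seconds
        self.last_heard = {}        # peer -> time of last heartbeat
        self.reported_dead = set()  # peers already reported as overdue

    def expect(self, peer, now):
        """Start expecting heartbeats from a peer (as directed by the CMA)."""
        self.last_heard[peer] = now

    def heartbeat(self, peer, now):
        """Record a heartbeat; return True if the peer had been reported dead."""
        if peer not in self.last_heard:
            return False  # ignore peers we were never told to watch
        self.last_heard[peer] = now
        was_dead = peer in self.reported_dead
        self.reported_dead.discard(peer)
        return was_dead

    def overdue(self, now):
        """Return peers that newly missed the deadline; otherwise nothing."""
        newly_dead = [
            peer for peer, seen in self.last_heard.items()
            if now - seen > self.deadline and peer not in self.reported_dead
        ]
        self.reported_dead.update(newly_dead)
        return newly_dead


if __name__ == "__main__":
    watcher = HeartbeatWatcher(deadline_seconds=10)
    watcher.expect("neighbor-a", now=0)
    watcher.expect("neighbor-b", now=0)
    watcher.heartbeat("neighbor-a", now=8)
    print(watcher.overdue(now=11))  # only the silent peer is reported
```

Each overdue peer is reported once rather than on every check, so a dead neighbor produces a single exception instead of a stream of repeated alerts.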
  20. Basic CMA Functions (python)

    Nanoprobe management
    • Configure & direct
    • Hear alerts & discovery
    • Update rings: join/leave
    Update database
    Issue alerts
  21. Why a graph database? (Neo4j)

    • Dependency & Discovery information: graph
    • Speed of graph traversals depends on size of subgraph, not total graph size
    • Root cause queries => graph traversals
      – notoriously slow in relational databases
    • Visualization of relationships
    • Schema-less design: good for constantly changing heterogeneous environment
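
The claim that traversal cost depends on the subgraph reached, not on total graph size, can be shown with a plain breadth-first search. The sketch below deliberately uses an in-memory dictionary instead of Neo4j, so it says nothing about the Neo4j API; the `affected_by` function and the node names are hypothetical. It assumes the graph is stored as "who depends on me" adjacency lists.

```python
from collections import deque


def affected_by(dependents, failed):
    """Return every node that directly or indirectly depends on `failed`.

    `dependents` maps a node to the nodes that depend on it. Only nodes
    reachable from `failed` are ever visited, so the work is proportional
    to the affected subgraph rather than to the whole graph.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    seen.discard(failed)  # report the impact, not the failed node itself
    return seen


if __name__ == "__main__":
    # Made-up topology: two switches, three hosts, one service.
    graph = {
        "switch1": ["hostA", "hostB"],
        "hostA": ["sshd@hostA"],
        "switch2": ["hostC"],
    }
    print(sorted(affected_by(graph, "switch1")))
```

A failure of `switch1` never touches `switch2` or `hostC` in this walk, no matter how many other unrelated nodes the graph holds.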
  22. How does discovery work?

    Nanoprobe scripts perform discovery
    • Each discovers one kind of information
    • Can take arguments (in environment)
    • Output JSON
    CMA stores Discovery Information
    • JSON stored in Neo4j database
    • CMA discovery plugins => graph nodes and relationships
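
A discovery script of the kind described here (one kind of information, arguments from the environment, JSON on standard output) might look roughly like the following. This is a sketch under assumptions, not one of the project's scripts: the `discover_os` function, the `EXAMPLE_HOSTNAME` environment variable, and the `discovertype`/`host`/`data` field names are all invented for illustration.

```python
import json
import os
import platform


def discover_os(environ=None):
    """Collect one kind of information (OS details) and return it as JSON.

    Arguments arrive through the environment, as the slide describes.
    The variable name and the JSON layout are hypothetical.
    """
    environ = os.environ if environ is None else environ
    result = {
        "discovertype": "os",
        "host": environ.get("EXAMPLE_HOSTNAME", platform.node()),
        "data": {
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
        },
    }
    return json.dumps(result, sort_keys=True)


if __name__ == "__main__":
    # A discovery script simply prints its JSON for the caller to collect.
    print(discover_os())
```

Keeping each script to a single kind of information means a new discovery type can be added without touching the existing ones.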
  23. sshd Service JSON Snippet (from netstat and /proc)

    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": { "proto": "tcp", "addr": "0.0.0.0", "port": 22 },
        and so on...
  24. ssh Client JSON Snippet (from netstat and /proc)

    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": { "proto": "tcp", "addr": "10.10.10.5", "port": 22 },
        and so on...
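
One way the service and client snippets could be turned into dependency relationships is to match each client's `clientaddrs` entry against some host's `listenaddrs` entry. The Python sketch below is an illustration, not the CMA's plugin code: `find_dependencies`, the per-host `ips`/`services`/`clients` layout, the `desktop` host, and the address `10.10.10.20` are assumptions, while the `listenaddrs` and `clientaddrs` entries follow the shape shown on the two slides.

```python
def find_dependencies(hosts):
    """Match client connections to listening services.

    Returns a list of ((client_host, client), (server_host, service)) pairs.
    A listener on 0.0.0.0 is treated as listening on all of its host's IPs.
    """
    listeners = {}  # (addr, port, proto) -> (host, service)
    for hostname, info in hosts.items():
        for svc_name, svc in info.get("services", {}).items():
            for listen in svc.get("listenaddrs", {}).values():
                if listen["addr"] == "0.0.0.0":
                    addrs = info["ips"]
                else:
                    addrs = [listen["addr"]]
                for addr in addrs:
                    key = (addr, listen["port"], listen["proto"])
                    listeners[key] = (hostname, svc_name)

    edges = []
    for hostname, info in hosts.items():
        for client_name, client in info.get("clients", {}).items():
            for target in client.get("clientaddrs", {}).values():
                key = (target["addr"], target["port"], target["proto"])
                if key in listeners:
                    edges.append(((hostname, client_name), listeners[key]))
    return edges


# Example data: sshd and ssh entries mirror the slides; the rest is made up.
EXAMPLE_HOSTS = {
    "servidor": {
        "ips": ["10.10.10.5"],
        "services": {"sshd": {"listenaddrs": {
            "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}},
    },
    "desktop": {
        "ips": ["10.10.10.20"],
        "clients": {"ssh": {"clientaddrs": {
            "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}},
    },
}

if __name__ == "__main__":
    for client, server in find_dependencies(EXAMPLE_HOSTS):
        print(client, "depends on", server)
```

Each resulting pair corresponds to one relationship between two graph nodes, which is the kind of dependency information the graph database is meant to hold.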
  25. Switch Discovery Data from LLDP (or CDP)

    CRM transforms LLDP (CDP) Data to JSON
  26. Current State

    • First release was April 2013
    • Great unit test infrastructure
    • Nanoprobe code – works well
    • Service monitoring works
    • Lacking real digital signatures, encryption, compression
    • Reliable UDP comm code all working
    • CMA code works, much more to go
    • Several discovery methods written
    • Licensed under the GPL
  27. Future Plans

    • Production grade by end of year
    • Support, commercial licenses
    • “Real” digital signatures, compression, encryption
    • Other security enhancements
    • Much more discovery
    • GUI
    • Alerting
    • Reporting
    • Add Statistical Monitoring
    • Best Practice Audits
    • Dynamic (aka cloud) specialization
    • Hundreds more ideas
      – See: https://trello.com/b/OpaED3AT
  28. Get Involved!

    Powerful Ideas and Infrastructure
    Fun, ground-breaking project
    Looking for early adopters, testers!!
    Needs for every kind of skill
    • Awesome User Interfaces (UI/UX)
    • Evangelism, community building
    • Test Code (simulate 10^6 servers!)
    • Python, C, script coding
    • Documentation
    • Feedback: Testing, Ideas, Plans
    • Many others!
  29. Resistance Is Futile!

    #AssimProj @OSSAlanR #AssimMon
    Project Web Site: http://assimproj.org
    Blog: techthoughts.typepad.com
    lists.community.tummy.com/cgi-bin/mailman/admin/assimilation