2013 OSMC Assimilation Presentation

Presentation at the 2013 Open Source Monitoring Conference in Nuremberg, Germany on 24 October 2013.

Alan Robertson

October 24, 2013

Transcript

  1. O S M C IT Discovery and Monitoring Without Limit

    using The Assimilation Project
    #AssimProj @OSSAlanR
    http://assimproj.org/
    http://bit.ly/AssimOSMC2013
    Alan Robertson <[email protected]>
    Assimilation Systems Limited
    http://assimilationsystems.com
  2. OSMC 24 October 2013 © 2013 Assimilation Systems Limited 2/45

    Project Scope
    Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
    • Continuous extensible discovery
      – systems, switches, services, dependencies
      – zero network footprint
    • Extensible exception monitoring
      – more than 100K systems
    • All data goes into central graph database
  3.

    Assimilation Project History
    • Inspired by 2 million core computer (cyclops64)
    • Concerns for extreme scale
    • Topology aware monitoring
    • Topology discovery w/out security issues
    =► Discovery of everything!
  4.

    An 8-dimensional overview
    • Problems Addressed
    • Unique Capabilities
    • Distribution of Work
    • Architectural Components
    • Discovery Graph Schema
    • Extensible Discovery API
    • Current Status
    • Project Needs
  5.

    First Dimension: Problems Addressed
    Risk Management at extreme scale
    1. Maintaining detailed discovery database
    2. Discovering systems you've forgotten about
    3. Discovering what (licensed) software you're running – and where
    4. Monitoring services, systems and switches
    5. Finding services you aren't monitoring
  6.

    Risk Management/Mitigation
    • Intrusions
    • Licensed Software
    • Audit Risk
    • Outages
    • System management
  7.

    Why Discovery? (DevOps)
    • Documentation: incomplete, incorrect
    • Dependencies: unknown
    • Planning: Needs accurate data
    • Best Practices: Verification needs data
    • ITIL CMDB (Configuration Mgmt DataBase)
    Our Discovery: continuous, low-profile
  8.

    Second Dimension: Unique Powerful Features
    1. Continuous Discovery
    2. Zero network footprint
    3. Centralized graph database
    4. We know everything that changes
    5. Discover and update dependency information
  9.

    (even more) Features...
    6. Discovery and monitoring tightly integrated
    7. Discovery and monitoring easily extensible
    8. Naturally scalable to > 100K systems
    9. Server failures distinguishable from switch failures
    10. Minimal network load
    11. Multi-tenant support
  10.

    This all sounds unreasonable...
    • Huge scalability without complexity?
    • Discovery without sending packets?
    Really?
  11.

    Third Dimension: Uniformly, fully distributed work
    Two philosophical underpinnings
    1. Monitoring and Discovery are fully distributed
    2. Reliable “no news is good news”
    Only responses to changes are centralized
  12.

    Simple Scalability
    • I can explain how we distribute work so your grandmother would understand
  13.

    Massive Scalability – or “I see dead servers in O(1) time”
    • Adding systems does not increase the monitoring work on any system
    • Each server monitors 2 (or 4) neighbors
    • Each server monitors its own services
    • Ring repair and alerting is O(n) – but a very small amount of work
    • Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
    Current Implementation
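The ring idea on this slide can be illustrated with a minimal sketch. It is not the project's code: the function name `ring_neighbors`, the `span` parameter, and the host names are hypothetical, and the ring is assumed to be a simple ordered list. It shows that each node watches the same small number of neighbors however large the ring grows, and it checks the packet arithmetic quoted above.

```python
def ring_neighbors(nodes, index, span=1):
    """Return the neighbors that nodes[index] would watch in a ring.

    span=1 gives 2 neighbors (one on each side); span=2 gives 4.
    Hypothetical helper, not the Assimilation API.
    """
    n = len(nodes)
    offsets = [d for k in range(1, span + 1) for d in (-k, k)]
    # dict.fromkeys de-duplicates while keeping order (matters for tiny rings)
    picked = dict.fromkeys(nodes[(index + d) % n] for d in offsets)
    picked.pop(nodes[index], None)  # a node never monitors itself
    return list(picked)


small = [f"host{i}" for i in range(5)]
large = [f"host{i}" for i in range(100_000)]

# Per-node work is the same in a 5-node ring and a 100,000-node ring.
print(ring_neighbors(small, 0))          # 2 neighbors
print(ring_neighbors(large, 0))          # still 2 neighbors
print(ring_neighbors(large, 0, span=2))  # 4 neighbors

# One packet every 9 seconds adds up to 9,600 packets per day,
# which is under the 10K figure quoted on the slide.
packets_per_day = 24 * 60 * 60 // 9
print(packets_per_day)
```

The per-node cost depends only on `span`, which is the sense in which adding systems does not increase the monitoring work on any one system.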
  14.

    Minimizing Network Footprint (planned)
    • Support diagnosing switch issues
    • Minimize network traffic
    • Ideal for multi-site arrangements
  15.

    Fourth Dimension: Architectural Components
    Three Architectural Components
    Collective Management Authority
    • One CMA per installation
    Nanoprobes
    • One nanoprobe per system
    Data Storage
    • Central Neo4j graph database
  16.

    Nanoprobe Functions ('C')
    Announce self to CMA
    • Reserved multicast address (can be unicast address or name if no multicast)
    Do what CMA says
    • receive configuration information
      – CMA addresses, ports, defaults
    • send/expect heartbeats
    • perform discovery actions
    • perform monitoring actions
    No persistent state across reboots
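The "send/expect heartbeats" item, combined with the "no news is good news" principle from the earlier slide, can be sketched as follows. This is an illustration under stated assumptions, not the nanoprobe's implementation (which the slide says is written in C): the class name `HeartbeatWatcher`, the `deadline` value, and the peer names are all hypothetical. The watcher stays silent while heartbeats arrive on time and reports a peer only once, when it first becomes overdue.

```python
class HeartbeatWatcher:
    """Track expected heartbeats and report only exceptions (hypothetical sketch)."""

    def __init__(self, peers, deadline=10.0):
        self.deadline = deadline                       # seconds of allowed silence
        self.last_seen = {peer: 0.0 for peer in peers}
        self.reported = set()                          # peers already reported as late

    def heard_from(self, peer, now):
        """Record a heartbeat; a peer that comes back can be reported again later."""
        self.last_seen[peer] = now
        self.reported.discard(peer)

    def overdue(self, now):
        """Return peers that have just become overdue; empty when all is well."""
        late = [peer for peer, seen in self.last_seen.items()
                if now - seen > self.deadline and peer not in self.reported]
        self.reported.update(late)
        return late


watcher = HeartbeatWatcher(["left-neighbor", "right-neighbor"], deadline=10.0)
watcher.heard_from("left-neighbor", now=5.0)
watcher.heard_from("right-neighbor", now=5.0)
print(watcher.overdue(now=12.0))   # nothing to report
watcher.heard_from("left-neighbor", now=14.0)
print(watcher.overdue(now=20.0))   # right-neighbor has been silent too long
print(watcher.overdue(now=21.0))   # already reported, so silent again
```

Times are passed in explicitly so the sketch is deterministic; a real agent would read a clock and send the exception upstream instead of printing it.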
  17.

    Service Monitoring based on Linux-HA/Pacemaker LRM
    • LRM == Local Resource Manager
    • Well-proven architecture:
      – “no news is good news” AKA management by exception
    • Implements Open Cluster Framework standard (and others)
    • Each system monitors own services
    • Can also start, stop, migrate services
  18.

    Basic CMA Functions (python)
    Nanoprobe management
    • Configure & direct
    • Hear alerts & discovery
    • Update rings: join/leave
    Update database
    Issue alerts
  19.

    Monitoring Pros and Cons
    Pros
    • Simple & Scalable
    • Uniform work distribution
    • No single point of failure
    • Distinguishes switch vs host failure
    • Easy on LAN, WAN
    • Multi-tenant approach
    Cons
    • Active agents
    • Potential slowness at power-on
  20.

    Why a graph database? (Neo4j)
    • Humans describe systems as graphs
    • Dependency & Discovery information: graph
    • Speed of graph traversals depends on size of subgraph, not total graph size
    • Root cause queries => graph traversals
      – notoriously slow in relational databases
    • Visualization is Natural
    • Schema-less design: good for constantly changing heterogeneous environment
    • Graph Model === Object Model
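The "root cause queries are graph traversals" point can be made concrete with a small sketch. It uses a plain Python dictionary rather than Neo4j, and every node name in `depends_on` is invented for illustration. The traversal visits only the nodes reachable from the failed component, which is the property the slide attributes to graph databases.

```python
from collections import deque

# Hypothetical dependency data: each key depends on the items in its list.
depends_on = {
    "webapp": ["appserver"],
    "appserver": ["database", "cache"],
    "database": ["switch1"],
    "cache": ["switch1"],
    "reporting": ["database"],
    "mailserver": ["switch2"],
}


def affected_by(failed, depends_on):
    """Return everything that directly or indirectly depends on `failed`."""
    # Build the reverse index once; a graph database keeps relationships
    # navigable in both directions, so it would not need this step.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    seen, queue = set(), deque([failed])
    while queue:  # breadth-first walk over the affected subgraph only
        current = queue.popleft()
        for node in dependents.get(current, []):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen


print(sorted(affected_by("database", depends_on)))
print(sorted(affected_by("switch2", depends_on)))
```

A failure of `switch2` touches one node however many unrelated systems exist, while the same question in a relational schema typically needs a recursive self-join.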
  21.

    Fifth Dimension: Discovery API
    Scripts perform discovery – output JSON
    Three Discovery Snippets
    • OS information
    • Service discovery
    • Client discovery
  22.

    How does discovery work?
    Nanoprobe scripts perform discovery
    • Each discovers one kind of information
    • Can take arguments from environment
    • Output JSON
    CMA stores Discovery Information
    • JSON stored in Neo4j database
    • CMA discovery plugins => graph nodes and relationships
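A discovery script of the kind described here could look roughly like the sketch below. It is an assumption-laden illustration, not one of the project's scripts: the function name `discover_os` and the environment variable `EXAMPLE_DISCOVERY_TAG` are hypothetical, and the project's own scripts may be written in another language. It discovers one kind of information, optionally takes an argument from the environment, and prints JSON, using key names borrowed from the OS discovery snippet in this deck.

```python
import json
import os
import platform


def discover_os(environ=os.environ):
    """Collect basic OS facts as a JSON-serializable dict (hypothetical sketch)."""
    info = platform.uname()
    result = {
        "nodename": info.node,
        "machine": info.machine,
        "kernel-name": info.system,
        "kernel-release": info.release,
        "kernel-version": info.version,
    }
    # Hypothetical environment argument, only to show the mechanism.
    tag = environ.get("EXAMPLE_DISCOVERY_TAG")
    if tag is not None:
        result["tag"] = tag
    return result


if __name__ == "__main__":
    # A discovery script only writes JSON to stdout; storing it is the CMA's job.
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to one kind of information and one output format is what makes it cheap to add a new discovery type.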
  23.

    OS discovery JSON Snippet
    {
      "nodename": "alanr-1225B",
      "operating-system": "GNU/Linux",
      "machine": "x86_64",
      "processor": "x86_64",
      "hardware-platform": "x86_64",
      "kernel-name": "Linux",
      "kernel-release": "3.8.0-31-generic",
      "kernel-version": "#46-Ubuntu SMP ...",
      "Distributor ID": "Ubuntu",
      "Description": "Ubuntu 13.04",
      "Release": "13.04",
      "Codename": "raring"
    }
  24.

    sshd Service JSON Snippet (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22
        },
    and so on...
  25.

    ssh Client JSON Snippet (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22
        },
    and so on...
  26.

    Sixth Dimension: Graph Schema
    Two Schema subgraphs
    • Client / server dependency
    • Switch interconnect
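One way the client / server dependency subgraph could be derived from the service and client JSON snippets is sketched below. This is a guess at the general approach, not the CMA's plugin code: the function name `dependency_edges`, the `host_addrs` table, the address 10.10.10.2, and the assumption that "servidor" owns 10.10.10.5 are all hypothetical. It matches each client's remote address and port against the listening addresses found on other hosts and emits one edge per match, ignoring the protocol field for brevity.

```python
def dependency_edges(discovery_by_host, host_addrs):
    """Return ((client_host, client), (server_host, service)) pairs (hypothetical sketch)."""
    listeners = {}
    for host, services in discovery_by_host.items():
        for name, svc in services.items():
            for listen in svc.get("listenaddrs", {}).values():
                # A wildcard listener is reachable on every address the host owns.
                if listen["addr"] == "0.0.0.0":
                    addrs = host_addrs[host]
                else:
                    addrs = [listen["addr"]]
                for addr in addrs:
                    listeners[(addr, listen["port"])] = (host, name)

    edges = []
    for host, services in discovery_by_host.items():
        for name, svc in services.items():
            for client in svc.get("clientaddrs", {}).values():
                server = listeners.get((client["addr"], client["port"]))
                if server is not None:
                    edges.append(((host, name), server))
    return edges


discovery = {
    "servidor": {"sshd": {"listenaddrs": {
        "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}},
    "alanr-1225B": {"ssh": {"clientaddrs": {
        "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}},
}
# Assumed address ownership, for illustration only.
host_addrs = {"servidor": ["10.10.10.5"], "alanr-1225B": ["10.10.10.2"]}

print(dependency_edges(discovery, host_addrs))
```

Each returned pair corresponds to one relationship between a client node and a service node in the graph.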
  27.

    Switch Discovery Data from LLDP (or CDP)
    CMA transforms LLDP (CDP) Data to JSON
  28.

    Seventh Dimension: Current Status
    • First release April 2013
    • Great unit tests
    • Nanoprobe code works well
    • Several discovery methods written
    • CMA restructuring finishing up
    • UI development underway
    • Licensed under GPL: commercial options available
  29.

    Eighth Dimension: Get Involved!
    We need every talent!
    • Early adopters
    • Testers, Continuous Integration
    • Designers
    • Developers (C, Python, Shell, PowerShell, JavaScript)
    • Porters (esp Windows)
    • Promoters, publicists
    • Packagers
    • And so on...
  30.

    Get Involved!
    Powerful Ideas and Infrastructure
    Fun, ground-breaking project
    Looking for early adopters, testers!!
    Needs for every kind of skill
    • Awesome User Interfaces (UI/UX)
    • Evangelism, community building
    • Test Code (simulate 10^6 servers!)
    • Python, C, script coding
    • Documentation
    • Feedback: Testing, Ideas, Plans
    • Many others!
  31.

    Resistance Is Futile!
    Mailing List bit.ly/AssimML
    #AssimProj @OSSAlanR
    Project Web Site assimproj.org
    Blog techthoughts.typepad.com
    assimilationsystems.com
  32.

    Discovery
    Discovering
    • systems you've forgotten
    • what you're not monitoring
    • whatever you'd like
    • without setting off network security alarms
  33.

    Monitoring
    Monitoring
    • extreme scale
    • topology aware
    • integrated with discovery
    • easy-to-configure
  34.

    Why Assimilation Software?
    • Management Perspective
    • DevOps Perspective
  35.

    How does this apply to clouds?
    • Fits nicely into a cloud infrastructure
      – Should integrate into OpenStack, et al
      – Can control VMs
    • Can monitor customer VMs
      – Add nanoprobe to base image
      – bottom level of rings disappears without LLDP or CDP
  36.

    Future Plans
    • Production grade by end of year
    • Purchased support
    • Real digital signatures, compression, encryption
    • Other security enhancements
    • Much more discovery
    • GUI
    • Alerting
    • Reporting
    • Add Statistical Monitoring
    • Best Practice Audits
    • Dynamic (aka cloud) specialization
    • Hundreds more ideas
      – See: https://trello.com/b/OpaED3AT