This is a presentation given on Wed 08 January 2014 at linux.conf.au in Perth, Australia.

LCA2014
IT Discovery and Monitoring Without Limit using The Assimilation Project
#AssimProj @OSSAlanR
http://assimproj.org/
Alan Robertson
Assimilation Systems Limited
http://assimilationsystems.com

linux.conf.au, 08 January 2014, © 2013 Assimilation Systems Limited, LCA2014

Project Scope
Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint
● Extensible exception monitoring
  – more than 100K systems
● All data goes into central graph database

Questions
● How many of you have monitoring?
  – Open or closed source?
  – How many of you are happy with it?
● How many of you have discovery?
  – Open or closed source?
  – Is it continuous?
  – How many of you are happy with it?

Assimilation Project History
● Inspired by 2 million core computer (cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery w/out security issues
=► Discovery of everything!

An 8-dimensional overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Discovery Graph Schema
● Extensible Discovery API
● Current Status
● Project Needs

First Dimension: Problems Addressed
Risk Management at extreme scale
1. Maintaining detailed discovery database
2. Discovering systems you've forgotten about
3. Discovering what (licensed) software you're running – and where
4. Monitoring services, systems and switches
5. Finding services you aren't monitoring

Risk Management/Mitigation
● Intrusions
● Licensed Software
● Audit Risk
● Outages
● System management

Why Discovery? (DevOps)
● Documentation: incomplete, incorrect
● Dependencies: unknown
● Planning: Needs accurate data
● Best Practices: Verification needs data
● ITIL CMDB (Configuration Mgmt DataBase)
Our Discovery: continuous, low-profile

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Zero network discovery footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information

(even more) Features...
6. Discovery and monitoring tightly integrated – discovery drives monitoring
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Server failures distinguishable from switch failures
10. Minimal network load
11. Multi-tenant support

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without sending packets?
Really?

Third Dimension: Uniformly, fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Simple Scalability
● I can explain how we distribute work so your grandmother would understand

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors its own services
● Ring repair and alerting is O(n) – but a very small amount of work
● Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
Current Implementation
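
The ring arrangement above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's code: the function name `ring_neighbors` and the `host0`-style names are hypothetical. It shows that a host's peer list stays the same size however many hosts join the ring, and it checks the packet arithmetic quoted on the slide.

```python
def ring_neighbors(hosts, index, degree=2):
    """Return the peers that hosts[index] exchanges heartbeats with.

    degree=2 means one neighbor on each side of the ring;
    degree=4 means two on each side. (Hypothetical helper.)
    """
    n = len(hosts)
    peers = []
    for step in range(1, degree // 2 + 1):
        for offset in (-step, step):
            peer = hosts[(index + offset) % n]
            # Skip ourselves and duplicates, which only occur in tiny rings.
            if peer != hosts[index] and peer not in peers:
                peers.append(peer)
    return peers


# One packet every 9 seconds, as quoted on the slide, over one day.
SECONDS_PER_DAY = 24 * 60 * 60
packets_per_day = SECONDS_PER_DAY // 9

if __name__ == "__main__":
    small = [f"host{i}" for i in range(6)]
    large = [f"host{i}" for i in range(100_000)]
    print(ring_neighbors(small, 0))              # peers in a 6-host ring
    print(len(ring_neighbors(large, 50_000)))    # still the same peer count
    print(packets_per_day)                       # below the 10K figure
```

The per-host work depends only on `degree`, which is the sense in which adding systems does not add monitoring work to any one system.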

Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Fourth Dimension: Architectural Components
Three Architectural Components
Collective Management Authority
● One CMA per installation
Nanoprobes
● One nanoprobe per system
Data Storage
● Central Neo4j graph database

Basic CMA Functions (python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts

Nanoprobe Functions ('C')
Announce self to CMA
● Reserved multicast address (can be unicast address or name if no multicast)
Do what CMA says
● receive configuration information
  – CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots

Service Monitoring based on Linux-HA/Pacemaker LRM
● LRM == Local Resource Manager
● Well-proven architecture:
  – “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (and others)
● Each system monitors own services
● Can also start, stop, migrate services
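
Management by exception can be illustrated with a small Python sketch. It is not the LRM or nanoprobe implementation: the class `ExceptionReporter`, its `observe` method, and the message fields are hypothetical names chosen for this example. Each system checks its own services locally and sends a message upstream only when a service's status changes.

```python
class ExceptionReporter:
    """Report a service's status upstream only when it changes.

    Hypothetical sketch: `send` is any callable that delivers a message
    to a central collector.
    """

    def __init__(self, send):
        self._send = send
        self._last = {}  # service name -> last observed status (True = ok)

    def observe(self, service, ok):
        """Record a local check result; return True if a report was sent."""
        previous = self._last.get(service)
        self._last[service] = ok
        if previous is None and ok:
            return False  # first result is healthy: nothing to report
        if previous == ok:
            return False  # unchanged: no news is good news
        self._send({"service": service, "status": "ok" if ok else "failed"})
        return True


if __name__ == "__main__":
    reporter = ExceptionReporter(print)
    for result in (True, True, False, False, True):
        reporter.observe("sshd", result)
```

In this run only two messages are sent, one for the failure and one for the recovery. Silence is only trustworthy if the reporting host is known to be alive, which is the role heartbeats play in the design described earlier.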

Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries are graph traversals – notoriously slow in relational databases
● Visualization is Natural
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model
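
The point about traversal cost can be shown with a plain Python dictionary standing in for the graph. This sketch does not use Neo4j or Cypher, and the function name `failed_dependencies` and the sample node names are hypothetical. It walks only what a given service depends on, so unrelated parts of the graph are never visited.

```python
from collections import deque


def failed_dependencies(depends_on, start, is_failed):
    """Breadth-first walk of everything `start` depends on.

    Returns the failed nodes found, in the order they were discovered.
    Only nodes reachable from `start` are visited, so the cost tracks the
    size of that subgraph rather than the size of the whole graph.
    """
    seen = {start}
    queue = deque([start])
    failed = []
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep in seen:
                continue
            seen.add(dep)
            if is_failed(dep):
                failed.append(dep)
            queue.append(dep)
    return failed


if __name__ == "__main__":
    # Hypothetical dependency data: service -> things it depends on.
    graph = {
        "webapp": ["db", "cache"],
        "db": ["switch1"],
        "cache": ["switch1"],
        "unrelated": ["switch9"],
    }
    down = {"switch1", "switch9"}
    print(failed_dependencies(graph, "webapp", lambda n: n in down))
```

In a relational store the same question typically needs a recursive query or repeated self-joins, which is the slowness the slide refers to.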

Fifth Dimension: Discovery API
Scripts perform discovery – output JSON
Three Sample Discovery Snippets
● OS information
● Service discovery
● Client discovery

A multi-dimensional demo
● Demonstrate basic capabilities
  – Discovery
  – Automatic monitoring configuration
  – Monitoring – failures / successes
● No configuration was supplied – everything comes from discovery

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from environment
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships
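
A discovery script in this style could look like the Python sketch below. It is an assumption-laden illustration, not one of the project's own scripts: the function name `discover_os` and the environment variable `EXAMPLE_DISCOVERY_LABEL` are hypothetical, and the output keys are a subset borrowed from the OS discovery snippet in this talk. It discovers one kind of information, reads an optional argument from the environment, and prints JSON.

```python
import json
import os
import platform


def discover_os(environ=os.environ):
    """Collect basic OS facts as a JSON-serializable dict."""
    uname = platform.uname()
    info = {
        "nodename": uname.node,
        "kernel-name": uname.system,
        "kernel-release": uname.release,
        "kernel-version": uname.version,
        "machine": uname.machine,
    }
    # Hypothetical argument passed in through the environment.
    label = environ.get("EXAMPLE_DISCOVERY_LABEL")
    if label:
        info["label"] = label
    return info


if __name__ == "__main__":
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to one kind of information and one JSON document on standard output means a new discovery type needs no change to the agent that runs it.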

OS discovery JSON Snippet

    {
      "nodename": "alanr-1225B",
      "operating-system": "GNU/Linux",
      "machine": "x86_64",
      "processor": "x86_64",
      "hardware-platform": "x86_64",
      "kernel-name": "Linux",
      "kernel-release": "3.8.0-31-generic",
      "kernel-version": "#46-Ubuntu SMP ...",
      "Distributor ID": "Ubuntu",
      "Description": "Ubuntu 13.04",
      "Release": "13.04",
      "Codename": "raring"
    }

sshd Service JSON Snippet (from netstat and /proc)

    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22
        }, and so on...

ssh Client JSON Snippet (from netstat and /proc)

    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22
        }, and so on...

Sixth Dimension: Graph Schema
Two Schema subgraphs
● Client / server dependency
● Switch interconnect

ssh -> sshd dependency graph
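
One way to derive a client-to-server edge from the two JSON snippets shown earlier is to match each client's remote address and port against the listening addresses of known services. The Python sketch below is a guess at the general technique, not the CMA's plugin code. The function `dependency_edges`, the `host_addrs` mapping, and the assumption that the host named servidor owns 10.10.10.5 are all hypothetical.

```python
def dependency_edges(clients_by_host, servers_by_host, host_addrs):
    """Return (client_host, client, server_host, server) tuples.

    clients_by_host / servers_by_host: host -> discovery JSON for that host.
    host_addrs: host -> list of IP addresses it owns, used to expand the
    wildcard listen address 0.0.0.0. (All three inputs are hypothetical.)
    """
    # Index every listening (address, port) pair by its owning service.
    listeners = {}
    for host, services in servers_by_host.items():
        for name, svc in services.items():
            for entry in svc.get("listenaddrs", {}).values():
                if entry["addr"] == "0.0.0.0":
                    addrs = host_addrs.get(host, [])
                else:
                    addrs = [entry["addr"]]
                for addr in addrs:
                    listeners[(addr, entry["port"])] = (host, name)

    # Look up each client connection in that index.
    edges = []
    for host, clients in clients_by_host.items():
        for name, cli in clients.items():
            for entry in cli.get("clientaddrs", {}).values():
                target = listeners.get((entry["addr"], entry["port"]))
                if target is not None:
                    edges.append((host, name, target[0], target[1]))
    return edges


if __name__ == "__main__":
    servers = {"servidor": {"sshd": {"listenaddrs": {
        "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}}}
    clients = {"alanr-1225B": {"ssh": {"clientaddrs": {
        "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}}}
    addrs = {"servidor": ["10.10.10.5"]}  # assumed for illustration
    print(dependency_edges(clients, servers, addrs))
```

Each tuple corresponds to one dependency relationship that could be stored as an edge between a client node and a service node in the graph.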

Switch Discovery Data from LLDP (or CDP)
CRM transforms LLDP (CDP) Data to JSON
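
To give a feel for what turning LLDP data into JSON involves, here is a minimal Python sketch that decodes a few mandatory fields from an LLDP payload. It follows the general LLDP wire layout (a 7-bit type and 9-bit length header before each value) and is not the project's parser. The function names and the output keys such as `chassis_id` are hypothetical, and CDP is not handled.

```python
import json
import struct


def parse_lldp_tlvs(payload):
    """Split an LLDP payload (bytes after the Ethernet header) into TLVs."""
    tlvs = []
    offset = 0
    while offset + 2 <= len(payload):
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type = header >> 9      # top 7 bits
        length = header & 0x01FF    # low 9 bits
        offset += 2
        if tlv_type == 0:           # End of LLDPDU
            break
        tlvs.append((tlv_type, payload[offset:offset + length]))
        offset += length
    return tlvs


def lldp_to_dict(payload):
    """Decode a few common TLVs into a JSON-serializable dict."""
    info = {}
    for tlv_type, value in parse_lldp_tlvs(payload):
        if tlv_type == 1 and value[:1] == b"\x04":
            # Chassis ID, subtype 4: MAC address
            info["chassis_id"] = ":".join(f"{b:02x}" for b in value[1:])
        elif tlv_type == 2 and value[:1] in (b"\x05", b"\x07"):
            # Port ID, subtype 5 (interface name) or 7 (locally assigned)
            info["port_id"] = value[1:].decode("utf-8", "replace")
        elif tlv_type == 3 and len(value) >= 2:
            info["ttl"] = struct.unpack("!H", value[:2])[0]
        elif tlv_type == 5:
            info["system_name"] = value.decode("utf-8", "replace")
    return info


if __name__ == "__main__":
    def tlv(tlv_type, value):
        return struct.pack("!H", (tlv_type << 9) | len(value)) + value

    # Made-up sample frame contents, for illustration only.
    sample = (tlv(1, b"\x04" + bytes.fromhex("001122334455"))
              + tlv(2, b"\x05" + b"ge-0/0/1")
              + tlv(3, struct.pack("!H", 120))
              + tlv(5, b"switch1")
              + tlv(0, b""))
    print(json.dumps(lldp_to_dict(sample), indent=2))
```

A host that listens for these frames learns which switch and port it is attached to without sending any packets of its own.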

Seventh Dimension: Current Status
● First release April 2013
● Great unit tests
● Nanoprobe code works well
● Several discovery methods written
● CMA restructuring complete
● Discovery => Automatic Monitoring (WOOT!)
● UI development underway
● Licensed under GPL: commercial options available

Eighth Dimension: Get Involved!
We need every talent!
● Early adopters
● Testers, Continuous Integration
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp Windows)
● Promoters, publicists
● Packagers
● And so on...

Resistance Is Futile!
Mailing List: bit.ly/AssimML
#AssimProj @OSSAlanR
Project Web Site: assimproj.org
Blog: techthoughts.typepad.com
assimilationsystems.com