Presentation at the 2013 Open Source Monitoring Conference in Nuremberg, Germany on 24 October 2013.
IT Discovery and Monitoring Without Limit
using The Assimilation Project
#AssimProj @OSSAlanR
http://assimproj.org/
http://bit.ly/AssimOSMC2013
Alan Robertson
Assimilation Systems Limited
http://assimilationsystems.com

OSMC, 24 October 2013. © 2013 Assimilation Systems Limited.

Project Scope
Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
– systems, switches, services, dependencies
– zero network footprint
● Extensible exception monitoring
– more than 100K systems
● All data goes into central graph database

Assimilation Project History
● Inspired by 2 million core computer (cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery w/out security issues
=► Discovery of everything!

An 8-dimensional overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Discovery Graph Schema
● Extensible Discovery API
● Current Status
● Project Needs

First Dimension: Problems Addressed
Risk Management at extreme scale
1. Maintaining detailed discovery database
2. Discovering systems you've forgotten about
3. Discovering what (licensed) software you're running – and where
4. Monitoring services, systems and switches
5. Finding services you aren't monitoring

Risk Management/Mitigation
● Intrusions
● Licensed Software
● Audit Risk
● Outages
● System management

Why Discovery? (DevOps)
● Documentation: incomplete, incorrect
● Dependencies: unknown
● Planning: Needs accurate data
● Best Practices: Verification needs data
● ITIL CMDB (Configuration Mgmt DataBase)
Our Discovery: continuous, low-profile

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Zero network footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information

(even more) Features...
6. Discovery and monitoring tightly integrated
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Server failures distinguishable from switch failures
10. Minimal network load
11. Multi-tenant support

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without sending packets?
Really?

Third Dimension: Uniformly, fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Simple Scalability
● I can explain how we distribute work so your grandmother would understand

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors its own services
● Ring repair and alerting is O(n) – but a very small amount of work
● Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
Current Implementation
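
The scaling claim on this slide can be illustrated with a small sketch. It assumes the servers form a logical ring in which each one exchanges heartbeats only with its nearest neighbors; the function name `ring_neighbors`, the `span` parameter, and the host names are hypothetical and are not taken from the project's code. The last lines check the slide's arithmetic for 10K packets per day.

```python
def ring_neighbors(nodes, index, span=1):
    """Return the peers that nodes[index] monitors in a logical ring.

    span=1 gives the 2 adjacent neighbors, span=2 gives 4, and so on.
    Hypothetical helper for illustration only.
    """
    count = len(nodes)
    peers = []
    for offset in range(1, span + 1):
        for j in ((index - offset) % count, (index + offset) % count):
            # Skip ourselves and duplicates (both happen in very small rings).
            if j != index and nodes[j] not in peers:
                peers.append(nodes[j])
    return peers


small_ring = [f"host{i}" for i in range(5)]
large_ring = [f"host{i}" for i in range(100_000)]

# The per-server workload is the same in both rings.
print(ring_neighbors(small_ring, 0))              # ['host4', 'host1']
print(len(ring_neighbors(large_ring, 0)))         # 2
print(len(ring_neighbors(large_ring, 0, span=2))) # 4

# Arithmetic behind "approximately 1 packet per 9 seconds".
SECONDS_PER_DAY = 24 * 60 * 60
seconds_per_packet = SECONDS_PER_DAY / 10_000
print(seconds_per_packet)                         # 8.64
```

The number of peers depends only on `span`, never on the ring size, which is the sense in which per-server monitoring work stays constant as systems are added.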

Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Fourth Dimension: Architectural Components
Three Architectural Components
Collective Management Authority
● One CMA per installation
Nanoprobes
● One nanoprobe per system
Data Storage
● Central Neo4j graph database

Nanoprobe Functions ('C')
Announce self to CMA
● Reserved multicast address (can be unicast address or name if no multicast)
Do what CMA says
● receive configuration information
– CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots
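
The "send/expect heartbeats" item, combined with the earlier "no news is good news" principle, can be sketched as follows. This is an illustrative Python model, not the nanoprobe's C code: the class `HeartbeatExpectations`, its method names, and the 10-second deadline are all assumptions. Time is passed in explicitly so the example is deterministic.

```python
class HeartbeatExpectations:
    """Track peers we expect to hear from and flag only the silent ones."""

    def __init__(self, deadline_seconds):
        self.deadline_seconds = deadline_seconds
        self.last_heard = {}

    def expect(self, peer, now):
        """Start expecting heartbeats from a peer (as directed by the CMA)."""
        self.last_heard[peer] = now

    def heard(self, peer, now):
        """Record a heartbeat; peers we were not told to expect are ignored."""
        if peer in self.last_heard:
            self.last_heard[peer] = now

    def overdue(self, now):
        """Return the peers whose last heartbeat is older than the deadline."""
        return sorted(
            peer
            for peer, last in self.last_heard.items()
            if now - last > self.deadline_seconds
        )


monitor = HeartbeatExpectations(deadline_seconds=10)
monitor.expect("host-a", now=0)
monitor.expect("host-b", now=0)
monitor.heard("host-a", now=8)

# Nothing is reported while heartbeats arrive on time; only silence is news.
print(monitor.overdue(now=5))   # []
print(monitor.overdue(now=12))  # ['host-b']
```

Only the result of `overdue` would need to travel to a central point, which is why healthy systems generate no reporting traffic in this model.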

Service Monitoring based on Linux-HA/Pacemaker LRM
● LRM == Local Resource Manager
● Well-proven architecture:
– “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (and others)
● Each system monitors own services
● Can also start, stop, migrate services

Basic CMA Functions (python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts

Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries are graph traversals – notoriously slow in relational databases
● Visualization is Natural
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model
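
The point that traversal cost follows the subgraph rather than the whole graph can be shown with a plain-Python sketch. It uses an in-memory adjacency dictionary instead of Neo4j, and the function `affected_by` and all node names are hypothetical. The dictionary maps each item to the things that depend directly on it.

```python
from collections import deque


def affected_by(dependents, failed):
    """Return everything that depends, directly or indirectly, on `failed`.

    Only nodes reachable from `failed` are visited, so the work is
    proportional to the affected subgraph, not to the whole graph.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen - {failed}


# Hypothetical dependency data: key -> things that depend on the key.
dependents = {
    "switch1": ["server1", "server2"],
    "server1": ["sshd@server1"],
    "sshd@server1": ["ssh@laptop"],
    "switch2": ["server3"],
}

print(sorted(affected_by(dependents, "server1")))
# ['ssh@laptop', 'sshd@server1']
```

In this example the query for `server1` never touches `switch2` or `server3`, however many unrelated entries the dictionary holds.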

Fifth Dimension: Discovery API
Scripts perform discovery
– output JSON
Three Discovery Snippets
● OS information
● Service discovery
● Client discovery

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from environment
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships
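
A discovery script in this style might look like the sketch below. It is written in Python for illustration and is not one of the project's scripts: the function `discover_os` and the environment variable `DISCOVERY_LABEL` are made up, and only a subset of the fields from the OS snippet on the next slide is produced, using the standard library's `platform.uname()`.

```python
import json
import os
import platform


def discover_os(environ=os.environ):
    """Collect one kind of information (OS identity) as a JSON-ready dict."""
    uname = platform.uname()
    info = {
        "nodename": uname.node,
        "kernel-name": uname.system,
        "kernel-release": uname.release,
        "kernel-version": uname.version,
        "machine": uname.machine,
    }
    # DISCOVERY_LABEL is a hypothetical variable name; it only shows how an
    # argument could reach the script through the environment.
    label = environ.get("DISCOVERY_LABEL")
    if label is not None:
        info["label"] = label
    return info


if __name__ == "__main__":
    # The script's entire output is a single JSON document on stdout.
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to one kind of information and one JSON document means new discovery types can be added without changing the existing ones.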

OS discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22
    }, and so on...
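
As a rough illustration of the "CMA discovery plugins => graph nodes and relationships" step, the sketch below turns a service record shaped like the sshd snippet into node and relationship tuples. It is not the project's plugin API: the function `service_to_graph`, the labels `Service`, `Endpoint`, `HOSTS` and `LISTENS_ON`, and the host name `server1` are all assumptions, and the sample JSON is a trimmed, closed-off version of the slide's truncated snippet.

```python
import json

# Trimmed sample modeled on the slide's snippet (one listen address only).
SERVICE_JSON = """
{
  "sshd": {
    "exe": "/usr/sbin/sshd",
    "cmdline": ["/usr/sbin/sshd", "-D"],
    "uid": "root",
    "gid": "root",
    "cwd": "/",
    "listenaddrs": {
      "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}
    }
  }
}
"""


def service_to_graph(host, discovery):
    """Convert discovered services into (nodes, relationships) lists.

    Nodes are (label, id, properties); relationships are (from, type, to).
    All labels and relationship types here are hypothetical.
    """
    nodes = []
    relationships = []
    for name, service in discovery.items():
        service_id = f"{name}@{host}"
        properties = {"exe": service["exe"], "uid": service["uid"]}
        nodes.append(("Service", service_id, properties))
        relationships.append((host, "HOSTS", service_id))
        for endpoint, details in service.get("listenaddrs", {}).items():
            endpoint_id = f"{details['proto']}://{endpoint}@{host}"
            nodes.append(("Endpoint", endpoint_id, details))
            relationships.append((service_id, "LISTENS_ON", endpoint_id))
    return nodes, relationships


nodes, relationships = service_to_graph("server1", json.loads(SERVICE_JSON))
for relationship in relationships:
    print(relationship)
```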

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22
    }, and so on...

Sixth Dimension: Graph Schema
Two Schema subgraphs
● Client / server dependency
● Switch interconnect

ssh -> sshd dependency graph
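
One way such a dependency could be derived from the two snippets is to match each client's remote address against the listening services. The sketch below shows that idea only; it is not the project's algorithm. The functions `matches` and `find_dependencies`, the flattened record layout, and the assumptions that the client runs on `alanr-1225B` and that `servidor` owns address 10.10.10.5 are all illustrative.

```python
WILDCARD_ADDRESSES = ("0.0.0.0", "::")


def matches(listener, client, host_addresses):
    """Return True if the client's connection targets this listener."""
    if listener["proto"] != client["proto"] or listener["port"] != client["port"]:
        return False
    if listener["addr"] in WILDCARD_ADDRESSES:
        # A wildcard listener accepts connections on any of its host's addresses.
        return client["addr"] in host_addresses.get(listener["host"], set())
    return listener["addr"] == client["addr"]


def find_dependencies(clients, listeners, host_addresses):
    """Return (client, server) pairs, one per matched connection."""
    return [
        (f"{client['name']}@{client['host']}",
         f"{listener['name']}@{listener['host']}")
        for client in clients
        for listener in listeners
        if matches(listener, client, host_addresses)
    ]


# Hypothetical flattened records based on the sshd and ssh snippets.
listeners = [{"name": "sshd", "host": "servidor", "proto": "tcp",
              "addr": "0.0.0.0", "port": 22}]
clients = [{"name": "ssh", "host": "alanr-1225B", "proto": "tcp",
            "addr": "10.10.10.5", "port": 22}]
host_addresses = {"servidor": {"10.10.10.5"}}  # assumed address ownership

print(find_dependencies(clients, listeners, host_addresses))
# [('ssh@alanr-1225B', 'sshd@servidor')]
```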

Switch Discovery Data from LLDP (or CDP)
CRM transforms LLDP (CDP) Data to JSON

Seventh Dimension: Current Status
● First release April 2013
● Great unit tests
● Nanoprobe code works well
● Several discovery methods written
● CMA restructuring finishing up
● UI development underway
● Licensed under GPL: commercial options available

Eighth Dimension: Get Involved!
We need every talent!
● Early adopters
● Testers, Continuous Integration
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp Windows)
● Promoters, publicists
● Packagers
● And so on...

Get Involved!
Powerful Ideas and Infrastructure
Fun, ground-breaking project
Looking for early adopters, testers!!
Needs for every kind of skill
● Awesome User Interfaces (UI/UX)
● Evangelism, community building
● Test Code (simulate 10^6 servers!)
● Python, C, script coding
● Documentation
● Feedback: Testing, Ideas, Plans
● Many others!

Resistance Is Futile!
Mailing List: bit.ly/AssimML
#AssimProj @OSSAlanR
Project Web Site: assimproj.org
Blog: techthoughts.typepad.com
assimilationsystems.com

My Older GeekGirl

Discovery
Discovering
● systems you've forgotten
● what you're not monitoring
● whatever you'd like
● without setting off network security alarms

Monitoring
Monitoring
● extreme scale
● topology aware
● integrated with discovery
● easy-to-configure

Why Assimilation Software?
● Management Perspective
● DevOps Perspective

How does this apply to clouds?
● Fits nicely into a cloud infrastructure
– Should integrate into OpenStack, et al
– Can control VMs
● Can monitor customer VMs
– Add nanoprobe to base image
– bottom level of rings disappears without LLDP or CDP

Future Plans
● Production grade by end of year
● Purchased support
● Real digital signatures, compression, encryption
● Other security enhancements
● Much more discovery
● GUI
● Alerting
● Reporting
● Add Statistical Monitoring
● Best Practice Audits
● Dynamic (aka cloud) specialization
● Hundreds more ideas
– See: https://trello.com/b/OpaED3AT