Presentation of the Assimilation Project to NCAR Boulder
IT Discovery and Monitoring Without Limit
using The Assimilation Project
#AssimProj @OSSAlanR
http://assimproj.org/
http://bit.ly/AssimNCAR2013
Alan Robertson
Assimilation Systems Limited
http://assimilationsystems.com

NCAR, 26 September 2013. © 2013 Assimilation Systems Limited

Why Am I Here?
● Understand Your Environment and Needs
  – What you currently do for discovery
  – What you currently do for monitoring
  – How those are working for you
  – Are multi-tenant capabilities appealing?
● Give an overview of the project
● Engage with your communities
● Understand if we ought to stay in touch

Upcoming Events
National Center for Atmospheric Research (today!)
Denver Open Source User’s Group
Facebook presentation
GraphConnect San Francisco
Open Source Monitoring Conference - Nürnberg
NSA / Homeland Security Assimilation Technical Talk
Large Installation System Administration Conference - DC
Colorado Springs Open Source User’s Group
linux.conf.au – Linux Conference in Australia - Perth
Details on http://assimilationsystems.com/

Discovery
Discovering
● systems you've forgotten
● what you're not monitoring
● whatever you'd like
● without setting off security alarms

Monitoring
Monitoring
● extreme scale
● topology aware
● integrated with discovery
● easy-to-configure

Assimilation Project History
● Inspired by 2 million core computer (cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery w/out security issues
=► Discovery of everything!

Project Scope
Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint
● Extensible exception monitoring
  – more than 100K systems
● All data goes into central graph database

Why Assimilation Software?
● Management Perspective
● DevOps Perspective

Risk Management/Mitigation
● Intrusions
● Licensed Software
● Audit Risk
● Outages
● System management

Why Discovery? (DevOps)
● Documentation: incomplete, incorrect
● Dependencies: unknown
● Planning: Needs accurate data
● Best Practices: Verification needs data
● ITIL CMDB (Configuration Management Database)
Our Discovery: continuous, low-profile

Why Our Monitoring?
● Simpler to configure (in theory)
● Growth is a non-issue
● Extremely low network traffic
● Ideal for cross-WAN monitoring
● Highlight cascading failure root causes
● Not confused by switch failures
● Most switches get monitored “for free”

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without sending packets?
Really?

Architectural Overview
Collective Management Authority (CMA)
● One CMA per installation
Nanoprobes
● One nanoprobe per OS image
Data Storage
● Central Neo4j graph database
General Rule: “No News Is Good News”

Simple Scalability
● I can explain how we scale so your grandmother would understand

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors its own services
● Ring repair and alerting is O(n) – but a very small amount of work
● Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
(Current Implementation)

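The ring idea on this slide can be sketched in a few lines of Python. This is an illustration only, not the project's code: `ring_neighbors` and `join_ring` are hypothetical names, and the ring is modeled as a plain list. The point it shows is that each node's monitoring work stays constant as the ring grows, and that a join touches only the nodes next to the insertion point. The last two lines check the slide's arithmetic.

```python
def ring_neighbors(nodes, index):
    """Return the ring neighbors (previous, next) of nodes[index].

    The work is the same whether the ring has 4 nodes or a million.
    """
    n = len(nodes)
    if n < 2:
        return []
    prev_node = nodes[(index - 1) % n]
    next_node = nodes[(index + 1) % n]
    # In a two-node ring the previous and next neighbor are the same node.
    return [prev_node] if prev_node == next_node else [prev_node, next_node]


def join_ring(nodes, new_node):
    """Append new_node to the ring and return the set of nodes whose
    neighbor lists changed: the new node plus the two nodes around it."""
    nodes.append(new_node)
    n = len(nodes)
    return {nodes[(n - 2) % n], new_node, nodes[0]}


# Arithmetic from the slide: 10K ring-repair packets spread over one day.
SECONDS_PER_DAY = 24 * 60 * 60
seconds_per_packet = SECONDS_PER_DAY / 10_000  # 8.64, i.e. roughly 9 seconds
```

For example, `ring_neighbors(["a", "b", "c", "d"], 0)` gives `["d", "b"]`, and the answer has the same size for a ring of any length.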
Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Service Monitoring
Based on Linux-HA LRM
● LRM == Local Resource Manager
● Well-proven architecture:
  – “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (and others)
● Each system monitors own services
● Can also start, stop, migrate services

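A minimal sketch of “no news is good news”, assuming a local agent that checks its own services and reports only when a status changes. `ExceptionReporter` and its `send` callback are hypothetical names for illustration. They are not part of the LRM or the Assimilation code.

```python
class ExceptionReporter:
    """Report a service only when its health changes (management by exception)."""

    def __init__(self, send):
        self._send = send      # callable that delivers one (service, event) report
        self._last_ok = {}     # last known health per service

    def record(self, service, ok):
        """Record one local check result; return True if a report was sent."""
        # A service is assumed healthy until a check says otherwise,
        # so a healthy first check generates no traffic.
        previous = self._last_ok.get(service, True)
        self._last_ok[service] = ok
        if ok == previous:
            return False       # no news is good news: stay silent
        self._send((service, "recovered" if ok else "failed"))
        return True
```

With this shape, a thousand successful checks in a row send nothing. Only the transition to failed, and later back to healthy, produces a message.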
Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

How does this apply to clouds?
● Fits nicely into a cloud infrastructure
  – Should integrate into OpenStack, et al
  – Can control VMs
● Can monitor customer VMs
  – Add nanoprobe to base image
  – Bottom level of rings disappears without LLDP or CDP

Architectural Details
● Nanoprobes
● CMA
● Neo4j

Nanoprobe Functions ('C')
Announce self to CMA
● Reserved multicast address (can be unicast address or name if no multicast)
Do what CMA says
● receive configuration information
  – CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots

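The “send/expect heartbeats” item can be sketched as follows. The real nanoprobe is written in C; this Python version only shows the bookkeeping. The class name, method names, and the `deadtime` parameter are assumptions made for the example. Time is passed in explicitly so the logic is easy to follow and to test.

```python
class HeartbeatMonitor:
    """Track heartbeats from the peers we were told to expect."""

    def __init__(self, deadtime):
        self.deadtime = deadtime   # seconds of silence before a peer is overdue
        self._last_heard = {}      # peer -> time of last heartbeat

    def expect(self, peer, now):
        """Start expecting heartbeats from peer (as directed by the CMA)."""
        self._last_heard[peer] = now

    def stop_expecting(self, peer):
        """Stop watching peer, e.g. after the ring has been repaired."""
        self._last_heard.pop(peer, None)

    def heard(self, peer, now):
        """Record a heartbeat; heartbeats from unexpected peers are ignored."""
        if peer in self._last_heard:
            self._last_heard[peer] = now

    def overdue(self, now):
        """Return the peers that have been silent for longer than deadtime."""
        return sorted(peer for peer, last in self._last_heard.items()
                      if now - last > self.deadtime)
```

Only the peers returned by `overdue` would be reported upstream. Everything else stays silent, which matches the “no news is good news” rule.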
Basic CMA Functions (python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts

Why a graph database? (Neo4j)
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries are graph traversals – notoriously slow in relational databases
● Visualization of relationships
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model

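To make the traversal point concrete, here is a small in-memory sketch in plain Python. It does not use Neo4j or its query language. The graph layout (a dict mapping each item to the things it depends on) and the function names are assumptions for illustration. The traversal only visits nodes reachable from the starting point, so its cost follows the size of that subgraph rather than the whole graph.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first walk of everything start depends on, directly or not."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(start)
    return seen


def root_causes(graph, failed, start):
    """Failed nodes in start's dependency subgraph (start included)
    that do not themselves depend directly on another failed node."""
    candidates = (reachable(graph, start) | {start}) & failed
    return {node for node in candidates
            if not set(graph.get(node, ())) & failed}


# Hypothetical dependency data: web -> app -> db -> switch1; mail -> switch2.
deps = {"web": ["app"], "app": ["db"], "db": ["switch1"], "mail": ["switch2"]}
```

With `failed = {"web", "app", "db"}`, `root_causes(deps, failed, "web")` returns `{"db"}`, and the `mail` branch is never visited.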
How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments (in environment)
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships

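A sketch of what one such discovery script could look like, assuming Python for illustration. The environment variable `SKETCH_FIELDS` and the output keys (`discovertype`, `host`, `data`) are invented for this example and are not the project's actual names or schema. The script discovers one kind of information (basic OS facts), takes its argument from the environment, and emits JSON.

```python
import json
import os
import platform
import socket

# Only these uname fields may be requested through the environment.
ALLOWED_FIELDS = ("system", "node", "release", "version", "machine")


def discover_os(environ=None):
    """Collect basic OS facts; the field selection comes from the environment."""
    environ = os.environ if environ is None else environ
    # SKETCH_FIELDS is a hypothetical variable name used only in this example.
    wanted = environ.get("SKETCH_FIELDS", "system,release,machine").split(",")
    uname = platform.uname()
    data = {field: getattr(uname, field)
            for field in wanted if field in ALLOWED_FIELDS}
    return {"discovertype": "os", "host": socket.gethostname(), "data": data}


if __name__ == "__main__":
    # A discovery script writes a single JSON document to standard output.
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to one kind of information means new discovery types can be added without touching existing ones.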
sshd Service JSON Snippet (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22
        },
    and so on...

ssh Client JSON Snippet (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22
        },
    and so on...

ssh -> sshd dependency graph

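One way the two JSON snippets could be joined into a client-to-service dependency is sketched below. This is not the CMA's plugin code. The function names are hypothetical, and the `host_addrs` table, which says that host `servidor` owns address 10.10.10.5, is an assumption added so the example can be run. A client connection matches a service when the target address belongs to the host and the service listens on that port, either on that address or on the wildcard 0.0.0.0.

```python
def listens_on(listen, target):
    """True if a listenaddrs entry accepts connections aimed at target."""
    return (listen["proto"] == target["proto"]
            and listen["port"] == target["port"]
            and listen["addr"] in (target["addr"], "0.0.0.0"))


def find_dependencies(clients, services_by_host, host_addrs):
    """Return (client, host, service) triples derived from discovery JSON."""
    found = []
    for client_name, client in clients.items():
        for target in client.get("clientaddrs", {}).values():
            for host, services in services_by_host.items():
                if target["addr"] not in host_addrs.get(host, ()):
                    continue   # the connection goes to some other host
                for service_name, service in services.items():
                    listens = service.get("listenaddrs", {}).values()
                    if any(listens_on(entry, target) for entry in listens):
                        found.append((client_name, host, service_name))
    return found


# Data taken from the two snippets, trimmed to the fields used here.
clients = {"ssh": {"clientaddrs": {
    "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}}
services_by_host = {"servidor": {"sshd": {"listenaddrs": {
    "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}}}
host_addrs = {"servidor": {"10.10.10.5"}}   # assumed address ownership
```

Calling `find_dependencies(clients, services_by_host, host_addrs)` yields one edge, from the `ssh` client to `sshd` on `servidor`, which is the relationship the dependency graph slide depicts.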
Switch Discovery Data from LLDP (or CDP)
CRM transforms LLDP (CDP) Data to JSON

Current State
● First release was April 2013
● Great unit test infrastructure
● Nanoprobe code – works well
● Service monitoring works
● Lacks digital signatures, encryption, compression
● Reliable UDP comm code working
● Several discovery methods written
● CMA and database code restructuring near-complete
● UI development underway
● Licensed under the GPL, commercial license available

Future Plans
● Production grade by end of year
● Purchased support
● “Real” digital signatures, compression, encryption
● Other security enhancements
● Much more discovery
● GUI
● Alerting
● Reporting
● Add Statistical Monitoring
● Best Practice Audits
● Dynamic (aka cloud) specialization
● Hundreds more ideas
  – See: https://trello.com/b/OpaED3AT

Get Involved!
Powerful Ideas and Infrastructure
Fun, ground-breaking project
Looking for early adopters, testers!!
Needs for every kind of skill
● Awesome User Interfaces (UI/UX)
● Evangelism, community building
● Test Code (simulate 10^6 servers!)
● Python, C, script coding
● Documentation
● Feedback: Testing, Ideas, Plans
● Many others!

Resistance Is Futile!
#AssimProj @OSSAlanR #AssimMon
Project Web Site: http://assimproj.org
Blog: techthoughts.typepad.com
lists.community.tummy.com/cgi-bin/mailman/admin/assimilation