Alan presents on the Assimilation Project
Monitoring 2014
Modeling and Monitoring Hundreds of Thousands of Servers using The Assimilation Project
#AssimProj @OSSAlanR
http://assimproj.org/
Alan Robertson
Assimilation Systems Limited
http://assimilationsystems.com
© 2014 Assimilation Systems Limited

Monitoring Meetup, 04 December 2014

Biography
● 35+ years in IT/development – 10 years in system management (SysAdmin)
● Founded Linux-HA project - led 1998-2007
  – aka “Heartbeat” - now called Pacemaker
● Founded Assimilation Project in 2010
● Founded Assimilation Systems Limited in 2013
● Alumnus of Bell Labs, SuSE, IBM

Highly Scalable Discovery-Driven Automation
Continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint discovery process
● Extensible exception monitoring
  – more than 100K systems
● All data goes into central graph CMDB

Assimilation Project History
● Inspired by 2 million core computer (Cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery w/out security issues
=► Discovery of everything!

An 8-dimensional overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Discovery Graph Schema
● Extensible Discovery API
● Current Status
● Project Needs

First Dimension: Problems Addressed
1. Risk Management at extreme scale
2. Maintaining detailed discovery database
3. Discovering systems you've forgotten
4. Discovering vulnerable and licensed software you're running – and where
5. Monitoring services, systems & switches
6. Finding services you aren't monitoring

Risk Management/Mitigation
● Intrusions
● Vulnerable Software
● Licensed Software
● Audit Risk
● Outages
● System management

Why Discovery? (DevOps)
● Documentation: incomplete, incorrect
● Dependencies: unknown
● Planning: Needs accurate data
● Best Practices: Verification needs data
● ITIL CMDB (Configuration Management Database)
Our Discovery: continuous, low-profile

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Discovery: Zero network footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information
6. Discovery and monitoring tightly integrated – discovery drives automation

(even more) Features...
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Minimal network load
10. Server failures distinguishable from switch failures
11. Best practice and vulnerability alerts
12. Multi-tenant support

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without pings or port scans?
Really?

Third Dimension: Fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Simple Scalability
I can explain how we scale so your grandmother would understand...

Massive Scalability – or “I see dead servers in O(1) time”
Current Implementation
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors and discovers its own services
● Ring repair and alerting is O(n) – but a very small amount of work

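The ring arrangement on this slide can be illustrated with a minimal Python sketch. It assumes hosts are kept in a simple ordered list; the function names `ring_neighbors` and `repair_ring` are hypothetical and are not taken from the project's code. The point it shows is that each host's heartbeat partners stay at 2 (or 4) no matter how many hosts join the ring.

```python
def ring_neighbors(hosts, host, per_side=1):
    """Return the peers `host` exchanges heartbeats with.

    `hosts` is the ring order; `per_side=1` gives 2 neighbors,
    `per_side=2` gives 4. (Hypothetical helper, for illustration only.)
    """
    n = len(hosts)
    i = hosts.index(host)
    neighbors = []
    for step in range(1, per_side + 1):
        for idx in ((i - step) % n, (i + step) % n):
            peer = hosts[idx]
            # In very small rings both directions can reach the same peer.
            if peer != host and peer not in neighbors:
                neighbors.append(peer)
    return neighbors


def repair_ring(hosts, dead):
    """Drop a dead host; its former neighbors become adjacent."""
    return [h for h in hosts if h != dead]


ring = ["a", "b", "c", "d", "e"]
print(ring_neighbors(ring, "c"))                    # peers on either side of c
print(ring_neighbors(repair_ring(ring, "c"), "b"))  # b's peers after c is removed
```

The per-host cost depends only on `per_side`, not on `len(hosts)`; only the central bookkeeping in `repair_ring` touches the whole ring.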
Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Fourth Dimension: Architectural Components
Three Architectural Components
1. Collective Management Authority
   ● One CMA per installation
2. Nanoprobes (agents)
   ● One per system
3. Data Storage
   ● Central Neo4j graph database (CMDB)

Basic CMA Functions (Python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts
  – provide event notification

Nanoprobe Functions ('C')
Announce self to CMA
● Default: use reserved multicast address
Do what CMA says
● receive configuration information
  – CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots

Service Monitoring based on HA Technologies
● Well-proven architecture:
  – “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (LSB and others)
● Each system monitors own services
● Can also start, stop, migrate services

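Management by exception can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: the class name `ExceptionReporter` and the `send` callback are hypothetical. A local monitor checks its own services as often as it likes, but forwards a message only when a service's state changes.

```python
class ExceptionReporter:
    """Forward a service's status only when it changes.

    Hypothetical helper: `send` stands in for whatever transport
    delivers events to the central authority.
    """

    def __init__(self, send):
        self._send = send
        self._last = {}  # service name -> last observed health

    def observe(self, service, ok):
        # A service never seen before is assumed healthy, so a first
        # "ok" result produces no traffic ("no news is good news").
        previous = self._last.get(service, True)
        self._last[service] = ok
        if ok != previous:
            self._send((service, ok))


events = []
reporter = ExceptionReporter(events.append)
for status in (True, True, False, False, True):
    reporter.observe("sshd", status)
print(events)  # only the failure and the recovery were forwarded
```

Five local checks produce two messages here; a service that stays healthy produces none.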
Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries are graph traversals – notoriously slow in relational databases
● Visualization is Natural
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model

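The claim about traversal cost can be illustrated without a database. The Python sketch below uses a plain dictionary as a stand-in for the graph (the node names and the `impacted_by` function are hypothetical, and this is not how Neo4j is queried). It walks outward from a failed node and only ever visits nodes reachable from it.

```python
from collections import deque


def impacted_by(dependents, failed):
    """Return everything that directly or indirectly depends on `failed`.

    `dependents` maps a node to the nodes that depend on it.
    Only nodes reachable from `failed` are visited, so the cost follows
    the size of that subgraph rather than the size of the whole graph.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen - {failed}


# Hypothetical dependency data: key -> things that depend on it.
graph = {
    "switch1": ["db-host"],
    "db-host": ["postgres"],
    "postgres": ["webapp"],
    "other-host": ["cron"],
}
print(sorted(impacted_by(graph, "db-host")))
```

Adding thousands of unrelated entries to `graph` would not change the work done for `"db-host"`.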
A multi-dimensional demo
● Demonstrate basic capabilities
  – Discovery
  – Discovery-driven monitoring configuration
  – Discovery-driven 'tripwire-like' checksums
  – Monitoring – failures / successes
  – Host down notification
● No configuration was supplied
  – everything comes from discovery
http://assimilationsystems.com/90_second_demo/

Fifth Dimension: Discovery API
Scripts perform discovery
  – output JSON
Three Sample Discovery Snippets
● OS information
● Service discovery
● Client discovery

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from environment
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships

A Few Canned Queries
allipports       get all port/ip/service/hosts
allswitchports   get switch connections
crashed          get crashed servers
shutdown         get gracefully shutdown servers
downservices     get nonworking services
findip           get system owning IP
findmac          get system owning MAC
unknownips       get unknown IP addresses
unmonitored      get unmonitored services

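To give a feel for what two of these queries answer, here is a small Python sketch over in-memory records. Everything in it is an assumption made for illustration: the real queries run against the graph database, and the record layout, the sample addresses, and the function bodies below are hypothetical rather than the project's implementation.

```python
# Hypothetical discovery records: host -> known IPs and, for each
# discovered service, whether a monitor is attached to it.
SYSTEMS = {
    "alanr-1225B": {"ips": {"10.10.10.2"}, "services": {"sshd": True, "cron": False}},
    "servidor": {"ips": {"10.10.10.5"}, "services": {"sshd": True}},
}


def findip(systems, ip):
    """Return the name of the system owning `ip`, or None if unknown."""
    for name, record in systems.items():
        if ip in record["ips"]:
            return name
    return None


def unmonitored(systems):
    """Return (system, service) pairs that were discovered but not monitored."""
    return sorted(
        (name, service)
        for name, record in systems.items()
        for service, monitored in record["services"].items()
        if not monitored
    )


print(findip(SYSTEMS, "10.10.10.5"))
print(unmonitored(SYSTEMS))
```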
OS discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

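A discovery script of this kind can be very small. The following is a minimal Python sketch, not one of the project's own scripts: it covers only the fields that Python's standard `platform.uname()` provides, reuses the key names from the snippet above, and reads one argument from the environment. The variable name `EXAMPLE_PRETTY` is a hypothetical placeholder.

```python
import json
import os
import platform


def discover_os():
    """Collect one kind of information (basic OS facts) as a dict."""
    u = platform.uname()
    return {
        "nodename": u.node,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
        "machine": u.machine,
    }


if __name__ == "__main__":
    # EXAMPLE_PRETTY is a made-up environment variable, used only to show
    # a script taking an argument from its environment.
    indent = 2 if os.environ.get("EXAMPLE_PRETTY") else None
    print(json.dumps(discover_os(), indent=indent))
```

The script keeps no state and does one job, which matches the "each discovers one kind of information" rule.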
sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22 },

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22 },

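Client and service snippets like these contain enough to infer a dependency between processes on different hosts. The Python sketch below shows one way such matching could work. It is an illustration only: `find_dependencies` is a hypothetical name, the assumption that the host "servidor" owns 10.10.10.5 is mine, and the project's CMA plugins may do this differently.

```python
WILDCARDS = {"0.0.0.0", "::"}  # listeners bound to every local address


def find_dependencies(discovered, host_ips):
    """Match client connections to listening services.

    discovered: {host: {process_name: discovery_record}}
    host_ips:   {host: set of IP addresses owned by that host}
    Returns a list of ((client_host, client), (server_host, server)) edges.
    """
    ip_owner = {ip: host for host, ips in host_ips.items() for ip in ips}
    edges = []
    for client_host, procs in discovered.items():
        for client_name, info in procs.items():
            for target in info.get("clientaddrs", {}).values():
                server_host = ip_owner.get(target["addr"])
                if server_host is None:
                    continue  # address belongs to no known system
                for server_name, server in discovered.get(server_host, {}).items():
                    for listen in server.get("listenaddrs", {}).values():
                        if (listen["port"] == target["port"]
                                and listen["proto"] == target["proto"]
                                and listen["addr"] in WILDCARDS | {target["addr"]}):
                            edges.append(((client_host, client_name),
                                          (server_host, server_name)))
    return edges


discovered = {
    "alanr-1225B": {"ssh": {"clientaddrs": {
        "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}},
    "servidor": {"sshd": {"listenaddrs": {
        "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}},
}
host_ips = {"servidor": {"10.10.10.5"}}  # assumed ownership of the address
print(find_dependencies(discovered, host_ips))
```

Each resulting edge corresponds to one client-to-server relationship that could be stored in the graph.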
Sixth Dimension: Graph Schema
Two Schema subgraphs
● Client / server dependency
● Switch interconnect

ssh -> sshd dependency graph

Switch Discovery Data from LLDP (or CDP)

Seventh Dimension: Current Status
● Fourth release out 20 October 2014
  – next release (December?) will have encrypted communication
● Great unit tests
● Several discovery methods written
● Extensible Automated Discovery Triggers
● Discovery => Automatic Monitoring (WOOT!)
● Discovery => Network-Facing Checksums
● Command Line Queries
● Licenses: Commercial or GPLv3

Eighth Dimension: Get Involved!
We need you!
● Early adopters
● Testers, Continuous Integration
● Best practice experts
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp Windows)
● Promoters, Publicists, Packagers, etc.

Resistance Is Futile!
These slides       bit.ly/AssimLFNW14
Mailing List       bit.ly/AssimML
Twitter            #AssimProj @OSSAlanR
IRC                #assimilation on freenode
Project Web Site   assimproj.org
Company Web Site   assimilationsystems.com