This talk gives an overview of the Assimilation Project from the perspective of its distributed computing aspects: scalability, protocol, encryption, and so on.
DistComp.2014
Distributed Computing in The Assimilation Project
#AssimProj @OSSAlanR
http://assimproj.org/
Alan Robertson, Assimilation Systems Limited
http://assimilationsystems.com
© 2014 Assimilation Systems Limited

Distributed Computing Meetup, 09 December 2014

Biography
● 35+ years in IT/development – 10 years in system management (SysAdmin)
● Founded Linux-HA project - led 1998-2007
  – aka “Heartbeat” - now called Pacemaker
● Founded Assimilation Project in 2010
● Founded Assimilation Systems Limited in 2013
● Alumnus of Bell Labs (21), SuSE (1), IBM (13)

Highly Scalable Discovery-Driven Automation
Continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint discovery process
● Extensible exception monitoring
  – more than 100K systems
● All data goes into central graph CMDB

Assimilation Project History
● Inspired by 2 million core computer (Cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery w/out security issues
=► Discovery of everything!
Basically a C2I system: Command, Communication and Intelligence

A seven-dimensional overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Communications Protocol
● Current Status
● Project Needs

First Dimension: Problems Addressed
1. Risk Management at extreme scale
2. Maintaining detailed discovery database
3. Discovering systems you've forgotten
4. Discovering vulnerable and licensed software you're running – and where
5. Monitoring services, systems & switches
6. Finding services you aren't monitoring

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Discovery: Zero network footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information
6. Discovery and monitoring tightly integrated – discovery drives automation

(even more) Features...
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Minimal network load
10. Server failures distinguishable from switch failures
11. Best practice and vulnerability alerts
12. Multi-tenant support

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without pings or port scans?
Really?

Typical Monitoring Algorithm
● A system sends out pings to see if systems are alive
● Probe each service over the network
  – sometimes aggregated by endpoint agents
● Load on system rises rapidly
● Load on network rises rapidly with a hot spot around monitoring system
● Growth accomplished by more systems, proxies, and other forms of complexity

More about Cyclops64
● Specialized monitoring hardware
● Cube communication topology
● 24 × 24 × 24 × 160 [2,211,840] cores (!)
● Round trip costs up to 132 forwards
● Traditional monitoring protocol:
  – really, really bad idea

Typical Discovery Algorithms
● Turn off intrusion detection system
  – Ping every address
  – Port scan every address
  – SNMP and other probes done against open ports
  – Walk network to find switch connections
● Turn intrusion detection back on
● Repeat annually, quarterly, monthly or weekly

Third Dimension: Fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Simple Scalability
I can explain how we scale so your grandmother would understand...

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors and discovers its own services
● Ring repair and alerting is O(n) – but a very small amount of work
Current Implementation

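The ring idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ring_neighbors` and `remove_from_ring`, the list-based ring, and the `fanout` parameter are hypothetical and are not taken from the project's code. It shows that each member watches a fixed number of peers regardless of ring size, and that removing a dead member rewires only that member's neighbors.

```python
def ring_neighbors(members, node, fanout=1):
    """Return the peers that `node` exchanges heartbeats with.

    `members` is the ordered ring membership (hypothetical representation).
    `fanout` is how many neighbors to watch on each side:
    1 gives 2 neighbors, 2 gives 4 neighbors.
    """
    n = len(members)
    i = members.index(node)
    peers = []
    for step in range(1, fanout + 1):
        for j in ((i - step) % n, (i + step) % n):
            peer = members[j]
            # Small rings can wrap around onto the node itself or a repeat.
            if peer != node and peer not in peers:
                peers.append(peer)
    return peers


def remove_from_ring(members, dead):
    """Repair the ring after `dead` fails.

    Returns the surviving members and the sorted list of members whose
    neighbor set changed (with fanout 1, only the two ex-neighbors).
    """
    before = {m: ring_neighbors(members, m) for m in members if m != dead}
    survivors = [m for m in members if m != dead]
    after = {m: ring_neighbors(survivors, m) for m in survivors}
    changed = sorted(m for m in survivors if before[m] != after[m])
    return survivors, changed


ring = ["a", "b", "c", "d", "e"]
print(ring_neighbors(ring, "a"))      # ['e', 'b']
print(remove_from_ring(ring, "c"))    # (['a', 'b', 'd', 'e'], ['b', 'd'])
```

The sketch recomputes every neighbor set only to show which ones changed; a real repair would touch just the dead member's neighbors.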
Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Fourth Dimension: Architectural Components
Three Architectural Components
1. Collective Management Authority
   ● One CMA per installation
2. Nanoprobes (agents)
   ● One per system
3. Data Storage
   ● Central Neo4j graph database (CMDB)

Basic CMA Functions (python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts -- provide event notification

Nanoprobe Functions ('C')
Announce self to CMA
● Default: use reserved multicast address
Do what CMA says
● receive configuration information
  – CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots

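The "send/expect heartbeats" item can be illustrated with a small sketch. The real nanoprobe is written in C; this Python version is only a model of the idea, and the class name `HeartbeatWatcher`, the `deadtime` parameter, and the event tuples are assumptions made for illustration. The point is that a normal heartbeat produces no upstream traffic, and only a change of state (a peer going silent, or coming back) produces an event.

```python
class HeartbeatWatcher:
    """Tracks expected heartbeats from a few peers and reports only exceptions."""

    def __init__(self, peers, deadtime):
        # deadtime: seconds of silence before a peer is declared dead (assumed knob)
        self.deadtime = deadtime
        self.last_seen = {peer: 0.0 for peer in peers}
        self.dead = set()

    def heartbeat(self, peer, now):
        """Record a heartbeat; return an event only if the peer had been dead."""
        self.last_seen[peer] = now
        if peer in self.dead:
            self.dead.discard(peer)
            return ("returned", peer)
        return None  # no news is good news: nothing to report

    def check(self, now):
        """Return newly dead peers; peers already reported are not repeated."""
        events = []
        for peer, seen in self.last_seen.items():
            if peer not in self.dead and now - seen > self.deadtime:
                self.dead.add(peer)
                events.append(("dead", peer))
        return events


watcher = HeartbeatWatcher(["b", "c"], deadtime=3.0)
watcher.heartbeat("b", now=1.0)
watcher.heartbeat("c", now=1.0)
watcher.heartbeat("b", now=4.0)
print(watcher.check(now=5.0))   # [('dead', 'c')]
```

Timestamps are passed in explicitly so the logic can be exercised without real clocks or sockets.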
Service Monitoring based on HA Technologies
● Well-proven architecture:
  – “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (LSB and others)
● Each system monitors own services
● Can also start, stop, migrate services

Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries are graph traversals – notoriously slow in relational databases
● Visualization is Natural
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model

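The traversal argument can be made concrete with a minimal sketch. It does not use Neo4j; a plain dictionary stands in for the stored adjacency, and the function name `impacted_by` and the example node names are hypothetical. The breadth-first walk only visits nodes reachable from the failed component, which is why the cost tracks the size of the affected subgraph and not the size of the whole graph.

```python
from collections import deque

# Hypothetical adjacency: each key maps to the things that depend on it.
EXAMPLE_DEPENDENTS = {
    "switch1": ["db"],
    "db": ["app", "batch"],
    "app": ["web"],
    "switch2": ["mail"],
}


def impacted_by(dependents, failed):
    """Return everything that directly or transitively depends on `failed`."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        # Only neighbors of visited nodes are touched; unrelated parts
        # of the graph are never examined.
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(impacted_by(EXAMPLE_DEPENDENTS, "db")))   # ['app', 'batch', 'web']
```

In a relational store the same question typically needs one self-join per hop, which is the slowness the slide refers to.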
A multi-dimensional demo
● Demonstrate basic capabilities
  – Discovery
  – Discovery-driven monitoring configuration
  – Discovery-driven 'tripwire-like' checksums
  – Monitoring – failures / successes
  – Host down notification
● No configuration was supplied
  – everything comes from discovery
http://assimilationsystems.com/90_second_demo/

Communications Attributes
● Non-heartbeat communication is rare
  – could be months or years between packets
● Some data sent to CMA is sensitive
● Commands sent to nanoprobes are potentially dangerous
● CMA connects to up to 10^6 clients
● No news is good news: cannot lose information

Fifth Dimension: Communications Protocol
● UDP with reliable transmission protocol
  – packets ACKed when acted on
● Includes signatures, encryption, compression
● Communication resets happen on next communication – not immediately
● Encryption is almost done (this week!)
  – using libsodium – curve25519 encryption

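"Packets ACKed when acted on" can be modelled without any networking. The sketch below is an in-memory simulation under stated assumptions: the `Sender` and `Receiver` classes, the integer sequence numbers starting at 1, and the strict in-order delivery are all hypothetical, since the slides do not describe the wire format, windowing, or retransmission timers. What it shows is the ordering: the receiver runs the handler first and acknowledges second, so an ACK means the packet was processed, not merely received.

```python
class Receiver:
    """Delivers packets in order and ACKs only after acting on them."""

    def __init__(self, handler):
        self.handler = handler
        self.next_seq = 1

    def receive(self, seq, payload):
        """Return the sequence number to ACK, or None to stay silent."""
        if seq > self.next_seq:
            return None                 # gap: say nothing, sender will retransmit
        if seq == self.next_seq:
            self.handler(payload)       # act on the packet first...
            self.next_seq += 1
        return seq                      # ...then ACK (duplicates are re-ACKed)


class Sender:
    """Keeps every packet until its ACK arrives."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = {}

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq, payload

    def ack(self, seq):
        self.unacked.pop(seq, None)

    def pending(self):
        """Packets still awaiting an ACK, oldest first (retransmit candidates)."""
        return sorted(self.unacked.items())


handled = []
receiver = Receiver(handled.append)
sender = Sender()
first = sender.send("config")
second = sender.send("start-monitoring")
receiver.receive(*second)               # arrives early: not processed, no ACK
sender.ack(receiver.receive(*first))    # processed, then ACKed
for seq, payload in sender.pending():   # retransmit what is still unacked
    sender.ack(receiver.receive(seq, payload))
print(handled)                          # ['config', 'start-monitoring']
```

Because duplicates are re-ACKed but not re-handled, a lost ACK costs one retransmission and never a repeated command.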
Key Management Scenarios
● Nanoprobe one-time initialization
● CMA one-time initialization
● Nanoprobe startup
● Command flow

Nanoprobe one-time initialization

CMA one-time initialization

Nanoprobe Startup

Command Processing

Sixth Dimension: Current Status
● Fourth release out 20 October 2014
  – next release (December?) will have encrypted comm
● Great unit tests
● Several discovery methods written
● Extensible Automated Discovery Triggers
● Discovery => Automatic Monitoring (WOOT!)
● Discovery => Network-Facing Checksums
● Command Line Queries
● Licenses: Commercial or GPLv3

Seventh Dimension: Get Involved!
We need you!
● Early adopters
● Testers, Continuous Integration
● Best practice experts
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp Windows)
● Promoters, Publicists, Packagers, etc.

Resistance Is Futile!
These slides: bit.ly/AssimDCM14
Mailing List: bit.ly/AssimML
#AssimProj @OSSAlanR
#assimilation on freenode IRC
Project Web Site: assimproj.org
Company Web Site: assimilationsystems.com

Fifth Dimension: Discovery API
Scripts perform discovery
  – output JSON
Three Sample Discovery Snippets
● OS information
● Service discovery
● Client discovery

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from environment
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships

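The last step, turning discovery JSON into graph nodes and relationships, might look roughly like the following. This is a sketch, not the project's plugin API: the function name `service_to_graph`, the node labels (`Host`, `Process`, `IpPort`), and the relationship names (`HOSTS`, `LISTENS_ON`) are invented for illustration, and the output is plain tuples where a real plugin would write to Neo4j. The input shape follows the service-discovery snippet shown later in the deck.

```python
import json


def service_to_graph(hostname, discovery_json):
    """Turn service-discovery JSON into (nodes, relationships) lists.

    Labels and relationship names are hypothetical placeholders.
    """
    data = json.loads(discovery_json)
    nodes = [("Host", hostname)]
    rels = []
    for name, proc in data.items():
        proc_id = f"{hostname}:{name}"
        nodes.append(("Process", proc_id))
        rels.append((hostname, "HOSTS", proc_id))
        # Each listening address becomes its own node linked to the process.
        for addr in proc.get("listenaddrs", {}):
            nodes.append(("IpPort", addr))
            rels.append((proc_id, "LISTENS_ON", addr))
    return nodes, rels


sample = json.dumps({
    "sshd": {
        "exe": "/usr/sbin/sshd",
        "listenaddrs": {
            "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}
        },
    }
})
nodes, rels = service_to_graph("alanr-1225B", sample)
for rel in rels:
    print(rel)
```

Keeping the script output as generic JSON and doing the graph mapping centrally means a new discovery type needs one script and one plugin, with no change to the agent.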
A Few Canned Queries
allipports      get all port/ip/service/hosts
allswitchports  get switch connections
crashed         get crashed servers
shutdown        get gracefully shutdown servers
downservices    get nonworking services
findip          get system owning IP
findmac         get system owning MAC
unknownips      get unknown IP addresses
unmonitored     get unmonitored services

OS discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22 },

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22 },

Two Schema subgraphs
● Client / server dependency
● Switch interconnect

ssh -> sshd dependency graph

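One way such a dependency edge could be derived is by joining the client-side `clientaddrs` against the addresses servers listen on. The sketch below is an assumption about the approach, not the project's implementation: `client_server_edges` and the `DEPENDS_ON` label are made-up names, and it assumes that wildcard listeners such as `0.0.0.0:22` have already been expanded to the host's concrete addresses. The example data also assumes, purely for illustration, that the host `servidor` owns `10.10.10.5`.

```python
def client_server_edges(clients, servers):
    """Match client connections to listening services.

    clients: {(host, process): ["ip:port", ...]}  remote endpoints (clientaddrs)
    servers: {(host, process): ["ip:port", ...]}  concrete listening addresses
             (assumed already expanded from wildcards such as 0.0.0.0)
    """
    # Index listening addresses so each client endpoint is a single lookup.
    listening = {addr: svc for svc, addrs in servers.items() for addr in addrs}
    edges = []
    for client, remotes in clients.items():
        for remote in remotes:
            if remote in listening:
                edges.append((client, "DEPENDS_ON", listening[remote]))
    return edges


clients = {("alanr-1225B", "ssh"): ["10.10.10.5:22"]}
servers = {("servidor", "sshd"): ["10.10.10.5:22"]}   # address ownership is assumed
for edge in client_server_edges(clients, servers):
    print(edge)
```

Client endpoints with no matching listener are simply skipped here; they could instead be surfaced as dependencies on systems that have not been discovered yet.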
Switch Discovery Data from LLDP (or CDP)