Slide 1

Slide 1 text

DistComp 2014
Distributed Computing in the Assimilation Project
#AssimProj @OSSAlanR http://assimproj.org/
Alan Robertson, Assimilation Systems Limited
http://assimilationsystems.com
© 2014 Assimilation Systems Limited

Slide 2

Slide 2 text

Distributed Computing Meetup, 09 December 2014
© 2014 Assimilation Systems Limited
Biography
● 35+ years in IT/development
  – 10 years in system management (SysAdmin)
● Founded the Linux-HA project; led it 1998-2007
  – aka “Heartbeat”, now called Pacemaker
● Founded the Assimilation Project in 2010
● Founded Assimilation Systems Limited in 2013
● Alumnus of Bell Labs (21 years), SuSE (1 year), IBM (13 years)

Slide 3

Slide 3 text

Highly Scalable Discovery-Driven Automation
Continuous Discovery integrated with extreme-scale Monitoring
● Continuous, extensible discovery
  – systems, switches, services, dependencies
  – zero-network-footprint discovery process
● Extensible exception monitoring
  – more than 100K systems
● All data goes into a central graph CMDB

Slide 4

Slide 4 text

Assimilation Project History
● Inspired by a 2-million-core computer (Cyclops64)
● Concerns for extreme scale
● Topology-aware monitoring
● Topology discovery without security issues
=► Discovery of everything!
Basically a C2I system: Command, Communication and Intelligence

Slide 5

Slide 5 text

A Seven-Dimensional Overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Communications Protocol
● Current Status
● Project Needs

Slide 6

Slide 6 text

First Dimension: Problems Addressed
1. Risk management at extreme scale
2. Maintaining a detailed discovery database
3. Discovering systems you've forgotten
4. Discovering vulnerable and licensed software you're running, and where
5. Monitoring services, systems & switches
6. Finding services you aren't monitoring

Slide 7

Slide 7 text

Second Dimension: Unique, Powerful Features
1. Continuous discovery
2. Discovery: zero network footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information
6. Discovery and monitoring tightly integrated
   – discovery drives automation

Slide 8

Slide 8 text

(Even More) Features...
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Minimal network load
10. Server failures distinguishable from switch failures
11. Best-practice and vulnerability alerts
12. Multi-tenant support

Slide 9

Slide 9 text

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without pings or port scans? Really?

Slide 10

Slide 10 text

Typical Monitoring Algorithm
● A central system pings hosts to see whether they are alive
● It probes each service over the network
  – sometimes aggregated by endpoint agents
● Load on the monitoring system rises rapidly as systems are added
● Load on the network rises rapidly, with a hot spot around the monitoring system
● Growth is accomplished with more systems, proxies, and other forms of complexity

Slide 11

Slide 11 text

More about Cyclops64
● Specialized monitoring hardware
● Cube communication topology
● 24 × 24 × 24 × 160 = 2,211,840 cores (!)
● Round trip costs up to 132 forwards
● Traditional monitoring protocol:
  – really, really bad idea

Slide 12

Slide 12 text

Typical Discovery Algorithms
● Turn off the intrusion detection system
  – Ping every address
  – Port-scan every address
  – Run SNMP and other probes against open ports
  – Walk the network to find switch connections
● Turn intrusion detection back on
● Repeat annually, quarterly, monthly or weekly

Slide 13

Slide 13 text

Third Dimension: Fully Distributed Work
Two philosophical underpinnings:
1. Monitoring and discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized
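The “no news is good news” principle can be illustrated with a minimal Python sketch. The class and callback names are hypothetical, not project code: each node keeps its last observed state locally and forwards only transitions, so steady-state checks generate no central traffic.

```python
from typing import Callable, Dict, List, Tuple


class ChangeReporter:
    """Keeps the last known state of each service locally and reports
    only transitions to a central collector (hypothetical callback)."""

    def __init__(self, report: Callable[[str, bool], None]) -> None:
        self._last: Dict[str, bool] = {}
        self._report = report

    def observe(self, service: str, healthy: bool) -> None:
        previous = self._last.get(service)
        self._last[service] = healthy
        # Forward the first observation and any change of state;
        # repeated identical results never leave the node.
        if previous != healthy:
            self._report(service, healthy)


if __name__ == "__main__":
    sent: List[Tuple[str, bool]] = []
    reporter = ChangeReporter(lambda svc, ok: sent.append((svc, ok)))
    for ok in [True, True, True, False, False, True]:
        reporter.observe("sshd", ok)
    print(sent)  # [('sshd', True), ('sshd', False), ('sshd', True)]
```

Six local checks produce three central messages. Because silence is read as health, the channel that carries those few messages must be reliable.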

Slide 14

Slide 14 text

Simple Scalability
I can explain how we scale so your grandmother would understand...

Slide 15

Slide 15 text

Simple Scalability
[Image: istockphoto ©bowdenimages]

Slide 16

Slide 16 text

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors and discovers its own services
● Ring repair and alerting is O(n), but a very small amount of work
Current Implementation
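A minimal Python sketch of the ring idea follows. It assumes a single ring ordered by host name and a CMA that treats a host as dead only when all of its ring neighbors report it silent; both choices, and all function names, are illustrative rather than taken from the project. The “or 4” case would correspond to a host that sits in two rings, which this sketch does not model.

```python
from typing import Dict, List, Set, Tuple


def ring_neighbors(hosts: List[str]) -> Dict[str, Set[str]]:
    """Map each host to the ring neighbors it exchanges heartbeats with.

    Assumes a single ring ordered by host name (a hypothetical choice).
    """
    ring = sorted(hosts)
    n = len(ring)
    neighbors: Dict[str, Set[str]] = {h: set() for h in ring}
    if n < 2:
        return neighbors
    for i, host in enumerate(ring):
        neighbors[host].add(ring[i - 1])        # previous (wraps at i == 0)
        neighbors[host].add(ring[(i + 1) % n])  # next (wraps at the end)
    return neighbors


def suspected_dead(host: str, neighbors: Dict[str, Set[str]],
                   complaints: Set[Tuple[str, str]]) -> bool:
    """True when every ring neighbor of `host` has reported it silent.

    `complaints` holds (reporter, silent_host) pairs sent to the CMA.
    """
    peers = neighbors[host]
    return bool(peers) and all((peer, host) in complaints for peer in peers)


if __name__ == "__main__":
    for size in (10, 1_000, 100_000):
        ring = ring_neighbors([f"host{i:06d}" for i in range(size)])
        # Per-host work stays at two peers regardless of ring size.
        print(size, max(len(p) for p in ring.values()))
    ring = ring_neighbors(["a", "b", "c", "d"])
    print(suspected_dead("c", ring, {("b", "c"), ("d", "c")}))  # True
```

Requiring both neighbors to agree is one illustrative way to avoid declaring a host dead because of a single lost link.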

Slide 17

Slide 17 text

Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Slide 18

Slide 18 text

Fourth Dimension: Architectural Components
Three architectural components:
1. Collective Management Authority (CMA)
   ● One CMA per installation
2. Nanoprobes (agents)
   ● One per system
3. Data storage
   ● Central Neo4j graph database (CMDB)

Slide 19

Slide 19 text

Basic CMA Functions (Python)
Nanoprobe management:
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts: provide event notification
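The “update rings: join/leave” duty can be sketched in Python as a diff between the ring's neighbor map before and after a membership change, turned into start/stop heartbeat instructions for the affected nanoprobes. The sorted-ring layout, the instruction tuples, and the `Ring` class are hypothetical. Recomputing the whole map is O(n), trading efficiency for simplicity.

```python
import bisect
from typing import Dict, List, Set, Tuple

Instruction = Tuple[str, str, str]  # (nanoprobe, "start" | "stop", peer)


def neighbor_map(members: List[str]) -> Dict[str, Set[str]]:
    """Heartbeat peers for each member of a sorted ring."""
    n = len(members)
    result: Dict[str, Set[str]] = {m: set() for m in members}
    if n >= 2:
        for i, m in enumerate(members):
            result[m].update({members[i - 1], members[(i + 1) % n]})
    return result


def diff_instructions(before: Dict[str, Set[str]],
                      after: Dict[str, Set[str]]) -> List[Instruction]:
    """Turn a change in ring membership into per-nanoprobe instructions."""
    out: List[Instruction] = []
    for host in sorted(after):  # departed hosts get no instructions
        old, new = before.get(host, set()), after[host]
        out += [(host, "start", p) for p in sorted(new - old)]
        out += [(host, "stop", p) for p in sorted(old - new)]
    return out


class Ring:
    """Hypothetical CMA-side ring bookkeeping."""

    def __init__(self) -> None:
        self.members: List[str] = []

    def join(self, host: str) -> List[Instruction]:
        before = neighbor_map(self.members)
        bisect.insort(self.members, host)
        return diff_instructions(before, neighbor_map(self.members))

    def leave(self, host: str) -> List[Instruction]:
        before = neighbor_map(self.members)
        self.members.remove(host)
        return diff_instructions(before, neighbor_map(self.members))


if __name__ == "__main__":
    ring = Ring()
    for h in ["a", "b", "d"]:
        ring.join(h)
    print(ring.join("c"))
    # [('b', 'start', 'c'), ('b', 'stop', 'd'), ('c', 'start', 'b'),
    #  ('c', 'start', 'd'), ('d', 'start', 'c'), ('d', 'stop', 'b')]
    print(ring.leave("c"))
```

Only the neighbors of the joining or leaving host receive instructions; everyone else's heartbeat work is unchanged.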

Slide 20

Slide 20 text

Nanoprobe Functions (C)
Announce self to CMA
● Default: use reserved multicast address
Do what the CMA says
● Receive configuration information
  – CMA addresses, ports, defaults
● Send/expect heartbeats
● Perform discovery actions
● Perform monitoring actions
No persistent state across reboots
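The “send/expect heartbeats” duty amounts to remembering when each expected peer was last heard from. Below is a Python sketch of that bookkeeping (the real nanoprobe is written in C). The fixed dead-time, the injected clock, and all names are assumptions made for illustration and testability.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Tracks peers the CMA told us to expect heartbeats from (hypothetical)."""

    def __init__(self, deadtime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.deadtime = deadtime
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def expect(self, peer: str) -> None:
        """Start expecting heartbeats from `peer`; the clock starts now."""
        self.last_seen[peer] = self.clock()

    def heard_from(self, peer: str) -> None:
        # Heartbeats from peers we were not told to expect are ignored here.
        if peer in self.last_seen:
            self.last_seen[peer] = self.clock()

    def overdue(self) -> List[str]:
        """Peers silent for longer than `deadtime`: candidates to report to the CMA."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.deadtime)


if __name__ == "__main__":
    now = [0.0]
    tracker = HeartbeatTracker(deadtime=10.0, clock=lambda: now[0])
    tracker.expect("host-a")
    tracker.expect("host-b")
    now[0] = 8.0
    tracker.heard_from("host-a")
    now[0] = 12.0
    print(tracker.overdue())  # ['host-b']
```

Nothing here is written to disk, which fits the “no persistent state across reboots” rule: after a restart the CMA simply re-issues the peer list.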

Slide 21

Slide 21 text

Service Monitoring Based on HA Technologies
● Well-proven architecture:
  – “no news is good news”, AKA management by exception
● Implements the Open Cluster Framework (OCF) standard (also LSB and others)
● Each system monitors its own services
● Can also start, stop, migrate services
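OCF resource agents are invoked with an action name such as `monitor`, receive their parameters as `OCF_RESKEY_<name>` environment variables, and report status through standard exit codes. The Python sketch below shows how a monitoring action might interpret those codes. The function names and the stand-in agent are hypothetical, a real invocation would also set `OCF_ROOT`, and LSB agents use a different status convention that is not covered here.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Return codes defined by the OCF resource agent API.
OCF_SUCCESS = 0          # monitor: resource is running
OCF_ERR_GENERIC = 1
OCF_ERR_ARGS = 2
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_PERM = 4
OCF_ERR_INSTALLED = 5
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7      # monitor: resource is cleanly stopped


def classify(rc: int) -> str:
    """Reduce a monitor exit code to running / stopped / failed."""
    if rc == OCF_SUCCESS:
        return "running"
    if rc == OCF_NOT_RUNNING:
        return "stopped"
    return "failed"


def run_monitor(agent: List[str], params: Optional[Dict[str, str]] = None,
                timeout: float = 30.0) -> str:
    """Run `agent monitor`, passing parameters as OCF_RESKEY_<name> variables."""
    env = dict(os.environ)
    for name, value in (params or {}).items():
        env["OCF_RESKEY_" + name] = value
    result = subprocess.run(agent + ["monitor"], env=env, timeout=timeout)
    return classify(result.returncode)


if __name__ == "__main__":
    # Stand-in "agent": a Python one-liner that exits with OCF_NOT_RUNNING.
    fake_agent = [sys.executable, "-c", "import sys; sys.exit(7)"]
    print(run_monitor(fake_agent, {"config": "/etc/example.conf"}))  # stopped
```

Combined with change-only reporting, a nanoprobe would tell the CMA only when this classification changes.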

Slide 22

Slide 22 text

Monitoring Pros and Cons
Pros:
● Simple & scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons:
● Active agents
● Potential slowness at power-on

Slide 23

Slide 23 text

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & discovery information: graph
● Speed of graph traversals depends on the size of the subgraph, not the total graph size
● Root-cause queries are graph traversals
  – notoriously slow in relational databases
● Visualization is natural
● Schema-less design: good for constantly changing heterogeneous environments
● Graph model === object model
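To make “root-cause queries are graph traversals” concrete, here is a Python sketch of one such query over a hypothetical in-memory dependency graph. In the project this data lives in Neo4j; the code only makes the traversal explicit. Starting from a failing service, it walks dependencies breadth-first. A failed node whose own dependencies are all healthy is reported as a candidate root cause. Only the reachable subgraph is visited.

```python
from collections import deque
from typing import Dict, List, Set


def root_cause_candidates(depends_on: Dict[str, List[str]], start: str,
                          down: Set[str]) -> List[str]:
    """Walk dependencies outward from `start`; a failed node whose own
    dependencies are all healthy is a candidate root cause.

    Only nodes reachable from `start` are visited, so the cost grows with
    that subgraph rather than with the whole graph.
    """
    seen = {start}
    queue = deque([start])
    candidates: List[str] = []
    while queue:
        node = queue.popleft()
        deps = depends_on.get(node, [])
        if node in down and not any(d in down for d in deps):
            candidates.append(node)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical dependency edges: service -> what it depends on.
    graph = {
        "webapp": ["db", "dns"],
        "db": ["host-db"],
        "host-db": ["switch-1"],
        "dns": ["host-dns"],
        "host-dns": ["switch-2"],
    }
    down = {"webapp", "db", "host-db", "switch-1"}
    print(root_cause_candidates(graph, "webapp", down))  # ['switch-1']
```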

Slide 24

Slide 24 text

A Multi-Dimensional Demo
● Demonstrate basic capabilities
  – Discovery
  – Discovery-driven monitoring configuration
  – Discovery-driven 'tripwire-like' checksums
  – Monitoring: failures / successes
  – Host-down notification
● No configuration was supplied
  – everything comes from discovery
http://assimilationsystems.com/90_second_demo/

Slide 25

Slide 25 text

Communications Attributes
● Non-heartbeat communication is rare
  – could be months or years between packets
● Some data sent to the CMA is sensitive
● Commands sent to nanoprobes are potentially dangerous
● CMA connects to up to 10⁶ clients
● No news is good news: cannot lose information

Slide 26

Slide 26 text

Fifth Dimension: Communications Protocol
● UDP with a reliable transmission protocol
  – packets ACKed when acted on
● Includes signatures, encryption, compression
● Communication resets happen on the next communication, not immediately
● Encryption is almost done (this week!)
  – using libsodium
  – curve25519 encryption
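The “ACKed when acted on” rule can be sketched in Python as below, simulated without sockets. Per-peer sequence numbers, strict in-order delivery, the retransmit interval, and all class names are assumptions for illustration. This is not the project's wire protocol, and signatures, encryption and compression are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Sender:
    """Keeps every packet until the receiver ACKs it; retransmits on timeout."""

    def __init__(self, retransmit_after: float) -> None:
        self.retransmit_after = retransmit_after
        self.next_seq = 1
        self.unacked: Dict[int, Tuple[str, float]] = {}  # seq -> (payload, last sent)

    def send(self, payload: str, now: float) -> Tuple[int, str]:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return seq, payload

    def ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)

    def due_for_retransmit(self, now: float) -> List[Tuple[int, str]]:
        due = []
        for seq, (payload, sent) in sorted(self.unacked.items()):
            if now - sent >= self.retransmit_after:
                self.unacked[seq] = (payload, now)
                due.append((seq, payload))
        return due


class Receiver:
    """Acts on packets strictly in order and ACKs only after acting."""

    def __init__(self, act: Callable[[str], None]) -> None:
        self.act = act
        self.expected = 1

    def receive(self, seq: int, payload: str) -> Optional[int]:
        if seq < self.expected:
            return seq            # duplicate: already acted on, re-ACK it
        if seq > self.expected:
            return None           # gap: drop it and wait for retransmission
        self.act(payload)         # act first...
        self.expected += 1
        return seq                # ...then ACK


if __name__ == "__main__":
    actions: List[str] = []
    tx, rx = Sender(retransmit_after=1.0), Receiver(actions.append)
    tx.send("discover", now=0.0)             # suppose this datagram is lost
    second = tx.send("monitor", now=0.0)
    assert rx.receive(*second) is None       # out of order: not acted on, not ACKed
    for seq, payload in tx.due_for_retransmit(now=1.0):
        ack = rx.receive(seq, payload)
        if ack is not None:
            tx.ack(ack)
    print(actions, tx.unacked)  # ['discover', 'monitor'] {}
```

Because an ACK means “done” rather than “arrived”, the sender can forget a command only once it has taken effect. This matches the requirement that no information may be lost.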

Slide 27

Slide 27 text

Key Management Scenarios
● Nanoprobe one-time initialization
● CMA one-time initialization
● Nanoprobe startup
● Command flow

Slide 28

Slide 28 text

Nanoprobe One-Time Initialization

Slide 29

Slide 29 text

CMA One-Time Initialization

Slide 30

Slide 30 text

Nanoprobe Startup

Slide 31

Slide 31 text

Command Processing

Slide 32

Slide 32 text

Sixth Dimension: Current Status
● Fourth release out 20 October 2014
  – next release (December?) will have encrypted communication
● Great unit tests
● Several discovery methods written
● Extensible automated discovery triggers
● Discovery => automatic monitoring (WOOT!)
● Discovery => network-facing checksums
● Command-line queries
● Licenses: commercial or GPLv3

Slide 33

Slide 33 text

Seventh Dimension: Get Involved!
We need you!
● Early adopters
● Testers, continuous integration
● Best-practice experts
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp. Windows)
● Promoters, publicists, packagers, etc.

Slide 34

Slide 34 text

Resistance Is Futile!
These slides: bit.ly/AssimDCM14
Mailing list: bit.ly/AssimML
#AssimProj @OSSAlanR
#assimilation on freenode IRC
Project web site: assimproj.org
Company web site: assimilationsystems.com

Slide 35

Slide 35 text

Fifth Dimension: Discovery API
Scripts perform discovery and output JSON
Three sample discovery snippets:
● OS information
● Service discovery
● Client discovery

Slide 36

Slide 36 text

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from the environment
● Output JSON
CMA stores discovery information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships
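Here is a discovery script in the style described above: it gathers one kind of information, reads an optional argument from the environment, and prints JSON. It is written in Python for consistency with the other sketches and is not one of the project's scripts. Key names follow the OS discovery snippet. The `OS_DISCOVERY_FIELDS` variable is a hypothetical example of an environment argument.

```python
import json
import os
import platform
from typing import Dict, Iterable, Optional


def discover_os(fields: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Collect one kind of information: basic OS facts from uname."""
    u = platform.uname()
    info = {
        "nodename": u.node,
        "machine": u.machine,
        "processor": u.processor,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
    }
    if fields:
        wanted = set(fields)
        info = {k: v for k, v in info.items() if k in wanted}
    return info


if __name__ == "__main__":
    # Hypothetical argument passed through the environment,
    # e.g. OS_DISCOVERY_FIELDS="nodename,machine".
    raw = os.environ.get("OS_DISCOVERY_FIELDS")
    selected = raw.split(",") if raw else None
    print(json.dumps(discover_os(selected), indent=2))
```

Keeping each script to a single kind of information keeps the JSON shape predictable for the CMA plugin that turns it into graph nodes.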

Slide 37

Slide 37 text

A Few Canned Queries
allipports       get all port/IP/service/hosts
allswitchports   get switch connections
crashed          get crashed servers
shutdown         get gracefully shut-down servers
downservices     get nonworking services
findip           get system owning IP
findmac          get system owning MAC
unknownips       get unknown IP addresses
unmonitored      get unmonitored services
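The following are rough Python equivalents of three of these queries. They run over hypothetical in-memory data rather than the Neo4j database that the CMA actually queries, and they only illustrate what each query computes. The data shapes and function bodies are assumptions.

```python
from typing import Dict, List, Set, Tuple

Service = Tuple[str, str]  # (host, service name)


def unmonitored(discovered: Set[Service], monitored: Set[Service]) -> List[Service]:
    """Services seen by discovery that no monitoring action covers."""
    return sorted(discovered - monitored)


def unknownips(host_ips: Dict[str, Set[str]], seen_ips: Set[str]) -> List[str]:
    """Addresses seen on the network (e.g. as client peers) owned by no known host."""
    known = set().union(*host_ips.values())
    return sorted(seen_ips - known)


def findip(host_ips: Dict[str, Set[str]], ip: str) -> List[str]:
    """Hosts that own the given IP address."""
    return sorted(h for h, ips in host_ips.items() if ip in ips)


if __name__ == "__main__":
    host_ips = {"servidor": {"10.10.10.5"}, "workstation": {"10.10.10.9"}}
    print(findip(host_ips, "10.10.10.5"))                       # ['servidor']
    print(unknownips(host_ips, {"10.10.10.5", "10.10.10.77"}))  # ['10.10.10.77']
    print(unmonitored({("servidor", "sshd"), ("servidor", "mysqld")},
                      {("servidor", "sshd")}))                  # [('servidor', 'mysqld')]
```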

Slide 38

Slide 38 text

OS Discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

Slide 39

Slide 39 text

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22
    },

Slide 40

Slide 40 text

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22
    },

Slide 41

Slide 41 text

Two Schema Subgraphs
● Client/server dependency
● Switch interconnect

Slide 42

Slide 42 text

ssh -> sshd dependency graph
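The sketch below shows how a CMA-side plugin might derive the ssh -> sshd relationship from the client and service discovery JSON. It is written in Python, and the following are all hypothetical: the function name, the `DEPENDS_ON` label, the IP-to-host lookup, and the host names. The real plugins store nodes and relationships in Neo4j. A client connection is matched to a server process that listens on the same protocol and port, either on the target address or on the IPv4 wildcard `0.0.0.0`. IPv6 wildcards are not handled.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[Tuple[str, str], str, Tuple[str, str]]


def dependency_edges(client_host: str, client_procs: Dict[str, Any],
                     servers: Dict[str, Dict[str, Any]],
                     ip_owner: Dict[str, str]) -> List[Edge]:
    """Link each client connection to the server process listening for it.

    client_procs: client discovery JSON (process name -> details)
    servers:      host -> service discovery JSON for that host
    ip_owner:     IP address -> host name (hypothetical lookup)
    """
    edges: List[Edge] = []
    for cname, cinfo in client_procs.items():
        for target in cinfo.get("clientaddrs", {}).values():
            owner = ip_owner.get(target["addr"])
            if owner is None:
                continue  # an address no known host owns
            for sname, sinfo in servers.get(owner, {}).items():
                for listen in sinfo.get("listenaddrs", {}).values():
                    if (listen["proto"] == target["proto"]
                            and listen["port"] == target["port"]
                            and listen["addr"] in ("0.0.0.0", target["addr"])):
                        edges.append(((client_host, cname), "DEPENDS_ON", (owner, sname)))
    return edges


if __name__ == "__main__":
    # Trimmed from the ssh and sshd snippets; host names are hypothetical.
    ssh = {"ssh": {"clientaddrs": {"10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}}
    sshd = {"sshd": {"listenaddrs": {"0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}}
    print(dependency_edges("workstation", ssh, {"servidor": sshd}, {"10.10.10.5": "servidor"}))
    # [(('workstation', 'ssh'), 'DEPENDS_ON', ('servidor', 'sshd'))]
```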

Slide 43

Slide 43 text

Switch Discovery Data from LLDP (or CDP)