Slide 1

Slide 1 text

LCA 2014
IT Discovery and Monitoring Without Limit using The Assimilation Project
#AssimProj @OSSAlanR http://assimproj.org/
Alan Robertson
Assimilation Systems Limited http://assimilationsystems.com

Slide 2

Slide 2 text

linux.conf.au 08 January 2014 © 2013 Assimilation Systems Limited
Project Scope
Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
● Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint
● Extensible exception monitoring
  – more than 100K systems
● All data goes into central graph database

Slide 3

Slide 3 text

Questions
● How many of you have monitoring?
  – Open or closed source?
  – How many of you are happy with it?
● How many of you have discovery?
  – Open or closed source?
  – Is it continuous?
  – How many of you are happy with it?

Slide 4

Slide 4 text

Assimilation Project History
● Inspired by 2 million core computer (cyclops64)
● Concerns for extreme scale
● Topology aware monitoring
● Topology discovery without security issues
=> Discovery of everything!

Slide 5

Slide 5 text


Slide 6

Slide 6 text

An 8-dimensional overview
● Problems Addressed
● Unique Capabilities
● Distribution of Work
● Architectural Components
● Discovery Graph Schema
● Extensible Discovery API
● Current Status
● Project Needs

Slide 7

Slide 7 text

First Dimension: Problems Addressed
Risk Management at extreme scale
1. Maintaining detailed discovery database
2. Discovering systems you've forgotten about
3. Discovering what (licensed) software you're running – and where
4. Monitoring services, systems and switches
5. Finding services you aren't monitoring

Slide 8

Slide 8 text

Risk Management/Mitigation
● Intrusions
● Licensed Software
● Audit Risk
● Outages
● System management

Slide 9

Slide 9 text

Why Discovery? (DevOps)
● Documentation: incomplete, incorrect
● Dependencies: unknown
● Planning: Needs accurate data
● Best Practices: Verification needs data
● ITIL CMDB (Configuration Mgmt DataBase)
Our Discovery: continuous, low-profile

Slide 10

Slide 10 text

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Zero network discovery footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information

Slide 11

Slide 11 text

(even more) Features...
6. Discovery and monitoring tightly integrated
  – discovery drives monitoring
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Server failures distinguishable from switch failures
10. Minimal network load
11. Multi-tenant support

Slide 12

Slide 12 text

This all sounds unreasonable...
● Huge scalability without complexity?
● Discovery without sending packets?
Really?

Slide 13

Slide 13 text

Third Dimension: Uniformly, fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Slide 14

Slide 14 text

Simple Scalability
● I can explain how we distribute work so your grandmother would understand

Slide 15

Slide 15 text

Massive Scalability – or “I see dead servers in O(1) time”
● Adding systems does not increase the monitoring work on any system
● Each server monitors 2 (or 4) neighbors
● Each server monitors its own services
● Ring repair and alerting is O(n) – but a very small amount of work
● Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
Current Implementation
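The neighbor-monitoring idea on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a single ring held as a list, two neighbors per node); `ring_neighbors` and `repair_ring` are hypothetical names, not part of the project's code.

```python
def ring_neighbors(ring, node):
    """Return the two neighbors a node monitors in a ring (hypothetical helper)."""
    i = ring.index(node)
    # Python's negative indexing wraps the first node around to the last one.
    return ring[i - 1], ring[(i + 1) % len(ring)]


def repair_ring(ring, dead):
    """Drop a dead node; its former neighbors become each other's neighbors."""
    return [n for n in ring if n != dead]


ring = ["a", "b", "c", "d", "e"]
print(ring_neighbors(ring, "a"))   # "a" watches "e" and "b"
ring = repair_ring(ring, "b")
print(ring_neighbors(ring, "a"))   # after repair, "a" watches "e" and "c"

# Arithmetic check of the slide's figure: one packet every 9 seconds.
packets_per_day = 24 * 60 * 60 / 9
print(packets_per_day)             # 9600.0, under the quoted 10K per day
```

However many nodes join the ring, each one still watches only its two neighbors, which is why the per-system monitoring load stays constant.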

Slide 16

Slide 16 text

Minimizing Network Footprint (planned)
● Support diagnosing switch issues
● Minimize network traffic
● Ideal for multi-site arrangements

Slide 17

Slide 17 text

Fourth Dimension: Architectural Components
Three Architectural Components
Collective Management Authority
● One CMA per installation
Nanoprobes
● One nanoprobe per system
Data Storage
● Central Neo4j graph database

Slide 18

Slide 18 text

Basic CMA Functions (python)
Nanoprobe management
● Configure & direct
● Hear alerts & discovery
● Update rings: join/leave
Update database
Issue alerts

Slide 19

Slide 19 text

Nanoprobe Functions ('C')
Announce self to CMA
● Reserved multicast address (can be unicast address or name if no multicast)
Do what CMA says
● receive configuration information
  – CMA addresses, ports, defaults
● send/expect heartbeats
● perform discovery actions
● perform monitoring actions
No persistent state across reboots
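The send/expect heartbeat behavior, combined with "no news is good news", might look roughly like the sketch below. It is written in Python for brevity even though the nanoprobe itself is in C; the class name, method names and the deadline value are all assumptions made for illustration.

```python
class HeartbeatWatcher:
    """Tracks expected heartbeats and reports only the peers that went quiet."""

    def __init__(self, deadline_secs):
        self.deadline = deadline_secs
        self.last_seen = {}

    def expect(self, peer, now):
        """Start expecting heartbeats from a peer (as directed by the CMA)."""
        self.last_seen[peer] = now

    def heard(self, peer, now):
        """Record a heartbeat; heartbeats from unexpected peers are ignored here."""
        if peer in self.last_seen:
            self.last_seen[peer] = now

    def overdue(self, now):
        """Return peers whose last heartbeat is older than the deadline."""
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.deadline)


watcher = HeartbeatWatcher(deadline_secs=3)
watcher.expect("left-neighbor", now=0)
watcher.expect("right-neighbor", now=0)
watcher.heard("left-neighbor", now=2)
print(watcher.overdue(now=4))   # only the silent peer would be reported upstream
```

Nothing is sent to the central authority while heartbeats keep arriving; only the exception (an overdue peer) produces a report.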

Slide 20

Slide 20 text

Service Monitoring based on Linux-HA/Pacemaker LRM
● LRM == Local Resource Manager
● Well-proven architecture:
  – “no news is good news” AKA management by exception
● Implements Open Cluster Framework standard (and others)
● Each system monitors own services
● Can also start, stop, migrate services

Slide 21

Slide 21 text

Monitoring Pros and Cons
Pros
● Simple & Scalable
● Uniform work distribution
● No single point of failure
● Distinguishes switch vs host failure
● Easy on LAN, WAN
● Multi-tenant approach
Cons
● Active agents
● Potential slowness at power-on

Slide 22

Slide 22 text

Why a graph database? (Neo4j)
● Humans describe systems as graphs
● Dependency & Discovery information: graph
● Speed of graph traversals depends on size of subgraph, not total graph size
● Root cause queries => graph traversals
  – notoriously slow in relational databases
● Visualization is Natural
● Schema-less design: good for constantly changing heterogeneous environment
● Graph Model === Object Model
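To illustrate why traversal cost follows the size of the subgraph rather than the whole graph, here is a small breadth-first traversal over an in-memory dependency map. It uses plain Python rather than Neo4j, and the node names and the `upstream` function are invented for the example.

```python
from collections import deque


def upstream(depends_on, start):
    """List everything `start` depends on, directly or indirectly."""
    found = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, []):
            if dep not in visited:
                visited.add(dep)
                found.append(dep)
                queue.append(dep)
    return found


# Hypothetical dependency data; the "unrelated" nodes are never visited.
depends_on = {
    "webapp": ["database", "sshd"],
    "database": ["switch-1"],
    "sshd": ["switch-1"],
    "unrelated-a": ["unrelated-b"],
}
print(upstream(depends_on, "webapp"))  # ['database', 'sshd', 'switch-1']
```

The traversal touches only nodes reachable from the starting point, so adding unrelated systems to the graph does not slow this query down.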

Slide 23

Slide 23 text

Fifth Dimension: Discovery API
Scripts perform discovery – output JSON
Three Sample Discovery Snippets
● OS information
● Service discovery
● Client discovery

Slide 24

Slide 24 text

A multi-dimensional demo
● Demonstrate basic capabilities
  – Discovery
  – Automatic monitoring configuration
  – Monitoring – failures / successes
● No configuration was supplied – everything comes from discovery

Slide 25

Slide 25 text

How does discovery work?
Nanoprobe scripts perform discovery
● Each discovers one kind of information
● Can take arguments from environment
● Output JSON
CMA stores Discovery Information
● JSON stored in Neo4j database
● CMA discovery plugins => graph nodes and relationships
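A discovery script in this style could be as small as the sketch below: it collects one kind of information and prints it as JSON. It is written in Python using the standard `platform` module purely for illustration; the project's own scripts may be written differently, and the choice of keys here is an assumption.

```python
import json
import platform


def discover_os():
    """Collect basic OS facts as a JSON-serializable dict (illustrative keys)."""
    u = platform.uname()
    return {
        "nodename": u.node,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
        "machine": u.machine,
    }


if __name__ == "__main__":
    # A discovery script's only job is to print its findings as JSON.
    print(json.dumps(discover_os(), indent=2))
```

Because the script only writes JSON to standard output, adding a new kind of discovery would mean adding another small script rather than changing the agent.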

Slide 26

Slide 26 text

OS discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

Slide 27

Slide 27 text

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22
    },
and so on...

Slide 28

Slide 28 text

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22
    },
and so on...

Slide 29

Slide 29 text

Sixth Dimension: Graph Schema
Two Schema subgraphs
● Client / server dependency
● Switch interconnect

Slide 30

Slide 30 text

ssh -> sshd dependency graph
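One way to derive an ssh -> sshd dependency edge from the client and service snippets shown earlier is to match each client connection against the listening addresses of known services. The sketch below is a simplified guess at that matching, not the project's algorithm; the record layout, the function names, and the assumption that host "servidor" owns 10.10.10.5 are all hypothetical.

```python
WILDCARD = "0.0.0.0"


def listens_on(server, addr, port):
    """True if a server record accepts connections to addr:port."""
    for laddr, lport in server["listen"]:
        if lport != port:
            continue
        # A wildcard listener matches any address owned by the server's host.
        if laddr == addr or (laddr == WILDCARD and addr in server["host_ips"]):
            return True
    return False


def dependencies(clients, servers):
    """Return (client, server) edges for every matched connection."""
    edges = []
    for client in clients:
        for addr, port in client["connects_to"]:
            for server in servers:
                if listens_on(server, addr, port):
                    edges.append((client["name"], server["name"]))
    return edges


# Assumed data, loosely based on the two JSON snippets.
clients = [{"name": "ssh@alanr-1225B", "connects_to": [("10.10.10.5", 22)]}]
servers = [{"name": "sshd@servidor", "host_ips": ["10.10.10.5"],
            "listen": [("0.0.0.0", 22)]}]
print(dependencies(clients, servers))
```

Each resulting pair corresponds to one edge in the dependency graph, from the client process to the service it relies on.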

Slide 31

Slide 31 text

Switch Discovery Data from LLDP (or CDP)
CMA transforms LLDP (CDP) Data to JSON
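As a rough illustration of turning LLDP data into JSON, the sketch below walks the type-length-value (TLV) records of an LLDP payload, where each record starts with a 16-bit header holding a 7-bit type and a 9-bit length. The TLV type numbers are the standard LLDP ones; the JSON key names, the function names and the sample payload are made up for the example and are not the project's output format.

```python
import json
import struct

# Standard LLDP TLV type numbers; the key names are hypothetical.
TLV_NAMES = {1: "ChassisId", 2: "PortId", 3: "TTL", 5: "SystemName"}


def parse_lldp_tlvs(payload):
    """Decode a few common LLDP TLVs into a JSON-serializable dict."""
    out = {}
    offset = 0
    while offset + 2 <= len(payload):
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        value = payload[offset + 2: offset + 2 + length]
        offset += 2 + length
        if tlv_type == 0:          # End-of-LLDPDU marker
            break
        name = TLV_NAMES.get(tlv_type)
        if name == "TTL":
            out[name] = int.from_bytes(value, "big")
        elif name == "SystemName":
            out[name] = value.decode("utf-8", "replace")
        elif name is not None and value:
            # Chassis and port IDs start with a one-byte subtype.
            out[name] = {"subtype": value[0], "value": value[1:].hex()}
    return out


def tlv(tlv_type, value):
    """Build one TLV record (helper for the made-up sample below)."""
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


sample = (tlv(1, bytes([4]) + bytes.fromhex("001122334455"))
          + tlv(2, bytes([5]) + b"eth0")
          + tlv(3, (120).to_bytes(2, "big"))
          + tlv(5, b"switch-1")
          + tlv(0, b""))
print(json.dumps(parse_lldp_tlvs(sample), indent=2))
```

Unknown TLV types are skipped rather than rejected, which keeps the parser tolerant of vendor-specific extensions.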

Slide 32

Slide 32 text

Seventh Dimension: Current Status
● First release April 2013
● Great unit tests
● Nanoprobe code works well
● Several discovery methods written
● CMA restructuring complete
● Discovery => Automatic Monitoring (WOOT!)
● UI development underway
● Licensed under GPL: commercial options available

Slide 33

Slide 33 text

Eighth Dimension: Get Involved!
We need every talent!
● Early adopters
● Testers, Continuous Integration
● Designers
● Developers (C, Python, Shell, PowerShell, JavaScript)
● Porters (esp Windows)
● Promoters, publicists
● Packagers
● And so on...

Slide 34

Slide 34 text

Resistance Is Futile!
Mailing List bit.ly/AssimML
#AssimProj @OSSAlanR
Project Web Site assimproj.org
Blog techthoughts.typepad.com
assimilationsystems.com