2014 Monitoring Meetup

Alan presents on the Assimilation Project

Alan Robertson

December 04, 2014

Transcript

  1. Monitoring 2014
    Modeling and Monitoring Hundreds of Thousands of Servers
    using The Assimilation Project
    #AssimProj @OSSAlanR
    http://assimproj.org/
    Alan Robertson
    Assimilation Systems Limited
    http://assimilationsystems.com
    © 2014 Assimilation Systems Limited

  2. Monitoring Meetup, 04 December 2014
    Biography

    35+ years in IT/development – 10 years in
    system management (SysAdmin)

    Founded Linux-HA project - led 1998-2007
    – aka “Heartbeat” - now called Pacemaker

    Founded Assimilation Project in 2010

    Founded Assimilation Systems Limited in
    2013

    Alumnus of Bell Labs, SuSE, IBM

  3. Highly Scalable Discovery-Driven Automation
    Continuous Discovery integrated with
    extreme-scale Monitoring

    Continuous extensible discovery
    – systems, switches, services, dependencies –
    zero network footprint discovery process

    Extensible exception monitoring
    – more than 100K systems

    All data goes into central graph CMDB

  4. Assimilation Project History

    Inspired by a 2-million-core computer (Cyclops64)

    Concerns for extreme scale

    Topology aware monitoring

    Topology discovery w/out security issues
    => Discovery of everything!

  5. (slide contains no text)

  6. An 8-dimensional overview

    Problems Addressed

    Unique Capabilities

    Distribution of Work

    Architectural Components

    Discovery Graph Schema

    Extensible Discovery API

    Current Status

    Project Needs

  7. First Dimension: Problems Addressed
    1. Risk Management at extreme scale
    2. Maintaining detailed discovery database
    3. Discovering systems you've forgotten
    4. Discovering vulnerable and licensed
    software you're running – and where
    5. Monitoring services, systems & switches
    6. Finding services you aren't monitoring

  8. Risk Management/Mitigation

    Intrusions

    Vulnerable Software

    Licensed Software

    Audit Risk

    Outages

    System management

  9. Why Discovery? (DevOps)

    Documentation: incomplete, incorrect

    Dependencies: unknown

    Planning: Needs accurate data

    Best Practices: Verification needs data

    ITIL CMDB (Configuration Management Database)
    Our Discovery: continuous, low-profile

  10. Second Dimension: Unique Powerful Features
    1. Continuous Discovery
    2. Discovery: Zero network footprint
    3. Centralized graph database
    4. We know everything that changes
    5. Discover and update dependency
    information
    6. Discovery and monitoring tightly
    integrated – discovery drives automation

  11. (even more) Features...
    7. Discovery and monitoring easily
    extensible
    8. Naturally scalable to > 100K systems
    9. Minimal network load
    10. Server failures distinguishable
    from switch failures
    11. Best practice and vulnerability alerts
    12. Multi-tenant support
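
One way to picture item 10: if topology discovery records which switch each server is attached to, a burst of dead-server reports can be grouped by switch. The following is a minimal sketch only. The `attached_to` mapping and the rule "every server on the switch went silent" are assumptions made for illustration; the slides do not show the project's actual logic.

```python
from collections import defaultdict

def classify_failures(attached_to, dead_hosts):
    """Group dead hosts by switch and guess whether the switch itself failed.

    attached_to: dict host -> switch name (hypothetical discovery output)
    dead_hosts: set of hosts whose heartbeats stopped
    Returns (suspect_switches, isolated_host_failures), both sorted lists.
    """
    hosts_by_switch = defaultdict(set)
    for host, switch in attached_to.items():
        hosts_by_switch[switch].add(host)

    suspect_switches = []
    isolated = set(dead_hosts)
    for switch, hosts in hosts_by_switch.items():
        # Assumed rule: a switch is suspect when more than one host hangs
        # off it and every one of them went silent.
        if len(hosts) > 1 and hosts <= dead_hosts:
            suspect_switches.append(switch)
            isolated -= hosts
    return sorted(suspect_switches), sorted(isolated)

# Hypothetical topology: two servers per switch.
topology = {"web1": "sw-a", "web2": "sw-a", "db1": "sw-b", "db2": "sw-b"}
switches, hosts = classify_failures(topology, {"web1", "web2", "db1"})
```

Here `sw-a` is flagged because both of its servers are silent, while `db1` is treated as an ordinary host failure because `db2` on the same switch is still alive.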

  12. This all sounds unreasonable...

    Huge scalability without complexity?

    Discovery without pings or port scans?
    Really?

  13. Third Dimension: Fully distributed work
    Two philosophical underpinnings
    1. Monitoring and Discovery are fully distributed
    2. Reliable “no news is good news”
    Only responses to changes are centralized

  14. Simple Scalability
    I can explain how we scale so your
    grandmother would understand...

  15. Simple Scalability
    (photo credit: istockphoto ©bowdenimages)

  16. Massive Scalability – or “I see dead servers in O(1) time”

    Adding systems does not increase the monitoring work on any
    system

    Each server monitors 2 (or 4) neighbors

    Each server monitors and discovers its own services

    Ring repair and alerting is O(n) – but a very small amount of work
    Current Implementation
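
The ring idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the ring is modeled as an ordered Python list, and the function names `ring_neighbors` and `remove_dead` are invented here. The point it shows is that a server's heartbeat partners are always its two ring neighbors, however long the ring grows, and that repair only makes the dead node's neighbors adjacent.

```python
def ring_neighbors(ring, host):
    """Return the (previous, next) neighbors that `host` exchanges heartbeats with."""
    i = ring.index(host)
    return ring[i - 1], ring[(i + 1) % len(ring)]

def remove_dead(ring, dead):
    """Ring repair: drop the dead host so its two neighbors become adjacent."""
    return [h for h in ring if h != dead]

ring = ["a", "b", "c", "d", "e"]
before = ring_neighbors(ring, "c")   # c watches b and d
ring = remove_dead(ring, "d")        # d is reported dead
after = ring_neighbors(ring, "c")    # c now watches b and e
```

Each server still has exactly two partners after the repair; only the bookkeeping that maintains the list touches more than a constant number of entries.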

  17. Minimizing Network Footprint (planned)

    Support diagnosing switch issues

    Minimize network traffic

    Ideal for multi-site arrangements

  18. Fourth Dimension: Architectural Components
    Three Architectural Components
    1. Collective Management Authority
    – One CMA per installation
    2. Nanoprobes (agents)
    – One per system
    3. Data Storage
    – Central Neo4j graph database (CMDB)

  19. Basic CMA Functions (Python)
    Nanoprobe management

    Configure & direct

    Hear alerts & discovery

    Update rings: join/leave
    Update database
    Issue alerts
    -- provide event notification

  20. Nanoprobe Functions ('C')
    Announce self to CMA

    Default: use reserved multicast address
    Do what CMA says

    receive configuration information
    – CMA addresses, ports, defaults

    send/expect heartbeats

    perform discovery actions

    perform monitoring actions
    No persistent state across reboots
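
The "send/expect heartbeats" item can be pictured with a small sketch of the receiving side. It is written in Python for brevity even though the slide says nanoprobes are written in C, and the class name, method names, and the 3-second deadline are all assumptions made for illustration. Healthy peers produce no output; only peers that stay silent past the deadline are reported.

```python
class HeartbeatWatcher:
    """Track expected peers and report only the ones that have gone silent."""

    def __init__(self, deadline_secs):
        self.deadline_secs = deadline_secs  # assumed value, chosen by the caller
        self.last_heard = {}

    def expect(self, peer, now):
        """Start expecting heartbeats from `peer` as of time `now`."""
        self.last_heard[peer] = now

    def heard(self, peer, now):
        """Record a heartbeat; nothing is reported while peers stay healthy."""
        if peer in self.last_heard:
            self.last_heard[peer] = now

    def overdue(self, now):
        """Return peers silent for longer than the deadline (the exceptions)."""
        return sorted(p for p, t in self.last_heard.items()
                      if now - t > self.deadline_secs)

watcher = HeartbeatWatcher(deadline_secs=3.0)
watcher.expect("left-neighbor", now=0.0)
watcher.expect("right-neighbor", now=0.0)
watcher.heard("left-neighbor", now=2.5)
silent = watcher.overdue(now=4.0)
```

Timestamps are passed in explicitly so the example is deterministic; a real agent would read a monotonic clock instead.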

  21. Service Monitoring based on HA Technologies

    Well-proven architecture:
    – “no news is good news” AKA
    management by exception

    Implements Open Cluster Framework
    standard (LSB and others)

    Each system monitors own services

    Can also start, stop, migrate services

  22. Monitoring Pros and Cons
    Pros
    – Simple & Scalable
    – Uniform work distribution
    – No single point of failure
    – Distinguishes switch vs host failure
    – Easy on LAN, WAN
    – Multi-tenant approach
    Cons
    – Active agents
    – Potential slowness at power-on

  23. Why a graph database? (Neo4j)

    Humans describe systems as graphs

    Dependency & Discovery information: graph

    Speed of graph traversals depends on size of
    subgraph, not total graph size

    Root cause queries are graph traversals –
    notoriously slow in relational databases

    Visualization is Natural

    Schema-less design: good for constantly changing
    heterogeneous environment

    Graph Model === Object Model
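
The claim about traversal speed can be illustrated without a database. The sketch below uses a plain Python dict as an adjacency list rather than Neo4j, and the node names are hypothetical. It walks only the nodes reachable from the starting service, so unrelated parts of the graph are never touched however large they are.

```python
from collections import deque

def depends_on(graph, start):
    """Breadth-first walk of everything `start` depends on.

    Only nodes reachable from `start` are visited, so the cost tracks the
    size of that subgraph rather than the size of the whole graph.
    """
    seen = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in visited:
                visited.add(dep)
                seen.append(dep)
                queue.append(dep)
    return seen

# Hypothetical dependency edges: service -> things it relies on.
graph = {
    "webapp": ["database", "cache"],
    "database": ["storage-host"],
    "cache": [],
    "billing": ["database"],
    "unrelated-1": ["unrelated-2"],
}
root_cause_candidates = depends_on(graph, "webapp")
```

For a failing `webapp`, the candidates are its direct and indirect dependencies; `billing` and the `unrelated-*` nodes are never visited.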

  24. A multi-dimensional demo

    Demonstrate basic capabilities
    – Discovery
    – Discovery-driven monitoring configuration
    – Discovery-driven 'tripwire-like' checksums
    – Monitoring – failures / successes
    – Host down notification

    No configuration was supplied
    – everything comes from discovery
    http://assimilationsystems.com/90_second_demo/
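
The 'tripwire-like' checksum idea from the demo list can be sketched as follows. This is an assumption-laden illustration: SHA-256 is chosen here for convenience, the helper names are invented, and the file list is passed in by hand, whereas the slide says the project derives what to checksum from discovery.

```python
import hashlib
import os
import tempfile

def checksum_files(paths):
    """Return {path: sha256 hex digest} for each file in `paths`."""
    sums = {}
    for path in paths:
        with open(path, "rb") as f:
            sums[path] = hashlib.sha256(f.read()).hexdigest()
    return sums

def changed_files(previous, current):
    """List paths whose checksum differs from the previous run."""
    return sorted(p for p in current if previous.get(p) != current[p])

# Demonstration on a temporary file standing in for a monitored binary.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "fake-sshd")
    with open(target, "wb") as f:
        f.write(b"original contents")
    baseline = checksum_files([target])
    with open(target, "wb") as f:
        f.write(b"tampered contents")
    changed = changed_files(baseline, checksum_files([target]))
    unchanged = changed_files(baseline, baseline)
    changed_names = [os.path.basename(p) for p in changed]
```

Comparing a fresh run against the stored baseline flags the modified file, and comparing the baseline with itself reports nothing.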

  25. Fifth Dimension: Discovery API
    Scripts perform discovery
    – output JSON
    Three Sample Discovery Snippets

    OS information

    Service discovery

    Client discovery

  26. How does discovery work?
    Nanoprobe scripts perform discovery

    Each discovers one kind of information

    Can take arguments from environment

    Output JSON
    CMA stores Discovery Information

    JSON stored in Neo4j database

    CMA discovery plugins => graph nodes
    and relationships
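
The script side of this flow might look like the sketch below: one script, one kind of information, an argument taken from the environment, and JSON on standard output. The environment variable `EXAMPLE_DISCOVER_DIR` and the JSON key names are made up for this example and are not the project's actual conventions.

```python
import json
import os

def discover_directory(environ):
    """Discover one kind of information: the entries in a directory.

    The directory comes from the environment, mirroring "can take arguments
    from environment". EXAMPLE_DISCOVER_DIR is a made-up variable name.
    """
    directory = environ.get("EXAMPLE_DISCOVER_DIR", ".")
    return {
        "discovery": "directory-listing",   # illustrative key names
        "directory": directory,
        "entries": sorted(os.listdir(directory)),
    }

def main():
    # A discovery script's whole job: print one JSON document to stdout.
    print(json.dumps(discover_directory(os.environ), indent=2))

if __name__ == "__main__":
    main()
```

Keeping the collection logic in a function that takes the environment as a parameter makes the script easy to test without touching the real process environment.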

  27. A Few Canned Queries
    allipports      get all port/ip/service/hosts
    allswitchports  get switch connections
    crashed         get crashed servers
    shutdown        get gracefully shutdown servers
    downservices    get nonworking services
    findip          get system owning IP
    findmac         get system owning MAC
    unknownips      get unknown IP addresses
    unmonitored     get unmonitored services
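
The slides do not show the query text behind these names. As a stand-in, the sketch below shows what two of them compute, using in-memory Python data instead of the graph database. The function bodies, the sample services, and the assumption that `10.10.10.5` belongs to a host called `servidor` are all hypothetical.

```python
def unmonitored(services, monitored):
    """Services that discovery found but no monitoring action covers."""
    return sorted(set(services) - set(monitored))

def findip(ip_owner, ip):
    """Return the system owning `ip`, or None if the address is unknown."""
    return ip_owner.get(ip)

# Hypothetical discovery results.
discovered_services = ["sshd", "nginx", "postgres"]
monitored_services = ["sshd", "postgres"]
ip_owner = {"10.10.10.5": "servidor"}

gaps = unmonitored(discovered_services, monitored_services)
owner = findip(ip_owner, "10.10.10.5")
```

Because both discovery and monitoring state live in one place, "what are we not monitoring" reduces to a set difference.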

  28. OS discovery JSON Snippet
    { "nodename": "alanr-1225B",
    "operating-system": "GNU/Linux",
    "machine": "x86_64",
    "processor": "x86_64",
    "hardware-platform": "x86_64",
    "kernel-name": "Linux",
    "kernel-release": "3.8.0-31-generic",
    "kernel-version": "#46-Ubuntu SMP ...",
    "Distributor ID": "Ubuntu",
    "Description": "Ubuntu 13.04",
    "Release": "13.04",
    "Codename": "raring" }

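A few of the fields in the OS snippet above can be gathered with Python's standard library alone. This is a sketch, not the project's discovery script: it covers only the uname-style fields, uses `platform.uname()` as the data source, and leaves out the distribution fields such as "Distributor ID" and "Codename".

```python
import json
import platform

def discover_os():
    """Collect a few uname-style fields using only the standard library."""
    u = platform.uname()
    return {
        "nodename": u.node,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
        "machine": u.machine,
    }

os_info = discover_os()
os_json = json.dumps(os_info, indent=2)
```

The values depend on the machine the code runs on, so no sample output is shown.
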
  29. sshd Service JSON Snippet (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22 },
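
The shape of this record, with `listenaddrs` keyed by "addr:port", can be reproduced with a small helper. The sketch below only assembles the structure. The socket tuples are passed in directly, whereas the slide says the real data comes from netstat and /proc, and the function names are invented for illustration.

```python
def listen_entry(proto, addr, port):
    """Build one listenaddrs item keyed by "addr:port", as in the snippet."""
    return "%s:%d" % (addr, port), {"proto": proto, "addr": addr, "port": port}

def service_record(exe, cmdline, uid, gid, cwd, sockets):
    """Assemble a service record from process facts and listening sockets.

    `sockets` is a list of (proto, addr, port) tuples, which a real script
    would gather from netstat and /proc; here they are passed in directly.
    """
    return {
        "exe": exe,
        "cmdline": cmdline,
        "uid": uid,
        "gid": gid,
        "cwd": cwd,
        "listenaddrs": dict(listen_entry(*s) for s in sockets),
    }

sshd = service_record("/usr/sbin/sshd", ["/usr/sbin/sshd", "-D"],
                      "root", "root", "/", [("tcp", "0.0.0.0", 22)])
```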

  30. ssh Client JSON Snippet (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22 },

  31. Sixth Dimension: Graph Schema
    Two Schema subgraphs

    Client / server dependency

    Switch interconnect

  32. ssh -> sshd dependency graph
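
A dependency edge like this one can be derived by matching a client's `clientaddrs` against services' `listenaddrs`. The sketch below is one possible matching rule, not the project's implementation. It assumes a wildcard listener on `0.0.0.0` is reachable on any of its host's addresses, and that the host `servidor` owns `10.10.10.5`, which the slides do not state.

```python
def find_dependencies(clients, servers):
    """Match client connections to listening services.

    clients: {(host, process): clientaddrs dict}
    servers: {(host, process): (host_ips, listenaddrs dict)}
    Returns sorted (client, server) pairs. A listener on 0.0.0.0 is treated
    as reachable on any of its host's IP addresses.
    """
    edges = []
    for client, clientaddrs in clients.items():
        for conn in clientaddrs.values():
            for server, (host_ips, listenaddrs) in servers.items():
                for listen in listenaddrs.values():
                    same_port = (listen["port"] == conn["port"]
                                 and listen["proto"] == conn["proto"])
                    addr_ok = (listen["addr"] == conn["addr"]
                               or (listen["addr"] == "0.0.0.0"
                                   and conn["addr"] in host_ips))
                    if same_port and addr_ok:
                        edges.append((client, server))
    return sorted(set(edges))

# Host names and the IP ownership below are assumptions for the example.
clients = {("alanr-1225B", "ssh"): {
    "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}
servers = {("servidor", "sshd"): (["10.10.10.5"], {
    "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}})}
edges = find_dependencies(clients, servers)
```

Each resulting pair corresponds to one client-to-server relationship that could be stored as an edge in the graph.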

  33. Switch Discovery Data from LLDP (or CDP)

  34. Seventh Dimension: Current Status

    Fourth release out 20 October 2014
    – next release (December?) will have encrypted communication

    Great unit tests

    Several discovery methods written

    Extensible Automated Discovery Triggers

    Discovery => Automatic Monitoring (WOOT!)

    Discovery => Network-Facing Checksums

    Command Line Queries

    Licenses: Commercial or GPLv3

  35. Eighth Dimension: Get Involved!
    We need you!

    Early adopters

    Testers, Continuous Integration

    Best practice experts

    Designers

    Developers (C, Python, Shell, PowerShell, JavaScript)

    Porters (esp. Windows)

    Promoters, Publicists, Packagers, etc.

  36. Resistance Is Futile!
    These slides: bit.ly/AssimLFNW14
    Mailing List: bit.ly/AssimML
    #AssimProj @OSSAlanR
    #assimilation on freenode IRC
    Project Web Site: assimproj.org
    Company Web Site: assimilationsystems.com
