Painlessly Discovering (and Monitoring) All The Things

A dirty little secret in IT is that we don’t always know everything we have or what our systems are doing, and we don’t always monitor them fully. The Assimilation Project integrates continuous discovery and monitoring: it builds a graph CMDB of your infrastructure and services and monitors them scalably with near-zero configuration. Come learn how to put your infrastructure knowledge in one place, monitor your systems, services and configurations, keep that knowledge updated automatically, and check it against best practices.

Alan Robertson

January 30, 2015

Transcript

  1. CfgMgmtCamp 2015
    Painlessly Discovering
    (and monitoring)
    All The Things
    #AssimProj @OSSAlanR
    http://assimproj.org/
    Alan Robertson
    Assimilation Systems Limited
    http://assimilationsystems.com
    © 2015 Assimilation Systems Limited

  2. CfgMgmtCamp, 03 February 2015
    Biography

    35+ years in IT/development – 10 years in
    system management (SysAdmin)

    Founded the Linux-HA project and led it 1998-2007
    – aka “Heartbeat”, now called Pacemaker

    Founded Assimilation Project in 2010

    Founded Assimilation Systems Limited in
    2013

    Alumnus of Bell Labs, SuSE, IBM

  3. Assimilation Project History

    Inspired by a 2-million-core computer (Cyclops64)

    Concerns for extreme scale

    Topology-aware monitoring

    Topology discovery without security issues
    => Discovery of everything!

  4. An 8-dimensional overview
    1. Problems Addressed
    2. Unique Capabilities
    3. Distribution of Work
    4. Architectural Components
    5. Sample Graph and Discovery API
    6. Best Practice Analyses
    7. Current Status
    8. What You Need To Do!

  5. First Dimension:
    Problems Addressed

    Discovering and maintaining documentation
    (CMDB) using continuous discovery
    – Services, Systems, Dependencies, Switches,
    Interconnects, Configuration

    Monitoring and alerting: services, systems
    and compliance

    Managing compliance

    Mitigating risk

  6. Highly Scalable Discovery-Driven Automation
    Continuous Discovery drives everything

    Continuous extensible discovery (CMDB)
    – systems, switches, services, dependencies –
    zero network footprint discovery process

    Extensible exception monitoring
    – scales to more than 100K systems

    Discovery Drives Best Practice Analyses
    – Initially concentrating on security

    All data goes into a central graph CMDB

  7. Why Discovery? (DevOps)

    Documentation: incomplete, incorrect

    Dependencies: unknown

    Planning: Needs accurate data

    Best Practices: Verification needs data

    ITIL CMDB (Configuration Management
    Database)

    Our Discovery: continuous, low-profile

  8. Second Dimension:
    Unique Powerful Features
    1. Continuous Discovery
    2. Discovery: Zero network footprint
    3. Centralized graph database
    4. We know everything that changes
    5. Discover and update dependency
    information
    6. Discovery and monitoring tightly
    integrated – discovery drives automation

  9. (even more) Features...
    7. Discovery and monitoring easily
    extensible
    8. Naturally scalable to > 100K systems
    9. Minimal network load
    10. Server failures distinguishable
    from switch failures
    11. Best practice and vulnerability alerts
    12. Multi-tenant support

  10. This all sounds unreasonable...

    Huge scalability without complexity?

    Discovery without pings or port scans?
    Really?

  11. Third Dimension:
    Fully distributed work
    Two philosophical underpinnings
    1. Monitoring and Discovery are fully distributed
    2. Reliable “no news is good news”
    Only responses to changes are centralized

  12. Simple Scalability
    I can explain how we scale so your
    grandmother would understand...

  13. Simple Scalability
    I can explain how we scale so your
    grandmother would understand...
    istockphoto
    ©bowdenimages

  14. Massive Scalability – or
    “I see dead servers in O(1) time”

    Adding systems does not increase the monitoring work on any
    system

    Each server monitors 2 (or 4) neighbors

    Each server monitors and discovers its own services

    Ring repair and alerting is O(n) – but a very small amount of work
    Current Implementation
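
    A minimal Python sketch of the ring idea, assuming a single ring in which each host heartbeats only with its predecessor and successor. The "4 neighbours" variant is not modelled. The functions `ring_peers` and `repair_ring` and the host names are hypothetical illustrations, not the CMA's actual code.

```python
from typing import Dict, List, Set, Tuple


def ring_peers(ring: List[str]) -> Dict[str, Set[str]]:
    """Map each host to the neighbours it exchanges heartbeats with.

    Each host watches only its predecessor and successor, so the per-host
    monitoring work stays constant however many hosts join the ring.
    """
    n = len(ring)
    peers: Dict[str, Set[str]] = {host: set() for host in ring}
    if n < 2:
        return peers  # a lone host has nobody to watch
    for i, host in enumerate(ring):
        peers[host].add(ring[(i - 1) % n])  # predecessor, wrapping around
        peers[host].add(ring[(i + 1) % n])  # successor, wrapping around
    return peers


def repair_ring(ring: List[str], dead: str) -> Tuple[List[str], Tuple[str, str]]:
    """Drop a dead host and return the new ring plus the two hosts to re-pair.

    Locating the host and rebuilding the list is O(n), but only the dead
    host's former neighbours need new instructions from the CMA.
    """
    i = ring.index(dead)
    n = len(ring)
    pred, succ = ring[(i - 1) % n], ring[(i + 1) % n]
    return ring[:i] + ring[i + 1:], (pred, succ)


if __name__ == "__main__":
    ring = ["web1", "web2", "db1", "db2", "cache1"]
    new_ring, notify = repair_ring(ring, "db1")
    print(new_ring, notify)  # ['web1', 'web2', 'db2', 'cache1'] ('web2', 'db2')
```

    Adding a host changes only its two neighbours' peer sets, which is why per-host monitoring work does not grow with the size of the installation.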

  15. Fourth Dimension:
    Architectural Components
    Three Architectural Components
    1. Collective Management Authority

    One CMA per installation
    2. Nanoprobes (agents)

    One per system
    3. Data Storage

    Central Neo4j graph database (CMDB)

  16. Basic CMA Functions (Python)
    Nanoprobe management

    Configure & direct

    Hear alerts & discovery

    Update rings: join/leave
    Update database
    Analyze configuration changes
    Issue alerts
    -- provide event notification

  17. Nanoprobe Functions (C)
    Announce self to CMA

    Default: use reserved multicast address
    Do what CMA says

    receive configuration information
    – CMA addresses, ports, defaults

    send/expect heartbeats

    perform discovery actions

    perform monitoring actions
    No persistent state across reboots
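
    The nanoprobe is written in C. The Python below only sketches the "send/expect heartbeats" bookkeeping, assuming one fixed deadtime for every peer. `HeartbeatWatcher` and its methods are hypothetical names. Time is passed in explicitly so the logic can be exercised without a real clock.

```python
from typing import Dict, List, Set


class HeartbeatWatcher:
    """Track heartbeats from peers and report only the ones that go silent.

    "No news is good news": nothing is reported while heartbeats arrive.
    A peer counts as dead once more than `deadtime` seconds pass without a
    heartbeat, and each death is reported exactly once.
    """

    def __init__(self, peers: List[str], deadtime: float, now: float) -> None:
        self.deadtime = deadtime
        self.last_heard: Dict[str, float] = {p: now for p in peers}
        self.reported_dead: Set[str] = set()

    def heard_from(self, peer: str, now: float) -> None:
        """Record a heartbeat; a peer that reappears is no longer dead."""
        self.last_heard[peer] = now
        self.reported_dead.discard(peer)

    def newly_dead(self, now: float) -> List[str]:
        """Peers that have just exceeded the deadtime (the only 'news')."""
        dead = [p for p, t in self.last_heard.items()
                if now - t > self.deadtime and p not in self.reported_dead]
        self.reported_dead.update(dead)
        return sorted(dead)
```

    Because only newly dead peers are returned, a nanoprobe built this way sends nothing to the CMA while its neighbours stay healthy.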

  18. Service Monitoring based on
    HA Technologies

    Well-proven architecture:
    – reliable “no news is good news”

    Implements the Open Cluster Framework (OCF)
    standard, plus LSB and others

    Each system monitors own services

    Can also start, stop, migrate services
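
    This sketch shows "each system monitors its own services" with "no news is good news" reporting. The constants are the standard OCF monitor exit codes (0 running, 7 not running, 8 running as master) and the LSB status codes (0 running, 3 not running). Every other code is lumped together as a failure here. `ServiceMonitor` is hypothetical. A real OCF agent would also need its `OCF_ROOT` and `OCF_RESKEY_*` environment, which is not shown.

```python
import subprocess
from typing import List, Optional

# Standard OCF exit codes for the "monitor" action.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8
# Standard LSB exit codes for the init-script "status" action.
LSB_STATUS_RUNNING = 0
LSB_STATUS_NOT_RUNNING = 3


def classify(exit_code: int, agent_class: str) -> str:
    """Map a monitor/status exit code to 'running', 'stopped' or 'failed'."""
    if agent_class == "ocf":
        if exit_code in (OCF_SUCCESS, OCF_RUNNING_MASTER):
            return "running"
        if exit_code == OCF_NOT_RUNNING:
            return "stopped"
    elif agent_class == "lsb":
        if exit_code == LSB_STATUS_RUNNING:
            return "running"
        if exit_code == LSB_STATUS_NOT_RUNNING:
            return "stopped"
    return "failed"  # every other code is treated as a failure in this sketch


class ServiceMonitor:
    """Check one local service and report only changes of state."""

    def __init__(self, argv: List[str], agent_class: str) -> None:
        self.argv = argv              # e.g. [agent_path, "monitor"] for OCF
        self.agent_class = agent_class
        self.last_state: Optional[str] = None

    def check(self) -> Optional[str]:
        result = subprocess.run(self.argv, capture_output=True)
        state = classify(result.returncode, self.agent_class)
        if state == self.last_state:
            return None               # no news is good news: stay silent
        self.last_state = state
        return state                  # only the transition would go to the CMA
```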

  19. A multi-dimensional demo

    Demonstrate basic capabilities
    – Discovery
    – Discovery-driven monitoring configuration
    – Discovery-driven 'tripwire-like' checksums
    – Monitoring – failures / successes
    – Host down notification

    No configuration was supplied
    – everything comes from discovery
    http://assimilationsystems.com/90_second_demo/
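
    Here is a sketch of discovery-driven, tripwire-like checksums. It assumes the list of files comes from discovery, for example the `exe` paths of discovered services as in the sshd snippet later in this deck. SHA-256 is an assumed hash, and the project's actual choice may differ. `snapshot` and `changed_files` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def checksum(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large binaries are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Checksum every discovered file that exists as a regular file."""
    return {str(p): checksum(p) for p in paths if p.is_file()}


def changed_files(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Files whose checksum changed, appeared or disappeared since the baseline."""
    keys = baseline.keys() | current.keys()
    return sorted(k for k in keys if baseline.get(k) != current.get(k))
```

    Discovery supplies the paths, so nobody has to maintain a separate list of files to watch.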

  20. Fifth Dimension:
    Discovery Graph and API

  21. How does discovery work?
    Nanoprobe scripts perform discovery

    Each discovers one kind of information

    Can take arguments from environment

    Output JSON
    CMA stores Discovery Information

    JSON stored in Neo4j database

    CMA discovery plugins => graph nodes and
    relationships
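
    Below is a minimal discovery script in the spirit described above. It gathers one kind of information, takes an argument from the environment, and writes JSON to standard output. The environment variable `DISCOVERY_INDENT` is a hypothetical example. The keys only loosely mirror the OS discovery snippet in this deck, and this is not one of the project's shipped scripts.

```python
import json
import os
import platform
import sys


def discover_os() -> dict:
    """Collect one kind of information: the identity of the OS and kernel."""
    u = platform.uname()
    return {
        "nodename": u.node,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
        "machine": u.machine,
    }


def main() -> None:
    # Hypothetical argument passed through the environment: JSON indentation.
    indent = int(os.environ.get("DISCOVERY_INDENT", "0")) or None
    json.dump(discover_os(), sys.stdout, indent=indent)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

    Keeping each script to a single kind of information means the CMA-side plugin for that type only has to understand one JSON shape.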

  22. A Few Canned Queries
    allipports: get all ports with their IPs, services and hosts
    allswitchports: get switch connections
    crashed: get crashed servers
    shutdown: get gracefully shut-down servers
    downservices: get non-working services
    findip: get the system owning an IP address
    findmac: get the system owning a MAC address
    unknownips: get unknown IP addresses
    unmonitored: get unmonitored services
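
    In the project, these canned queries run against the Neo4j graph. The sketch below only illustrates what three of them answer. It uses a hypothetical in-memory model (`Host`, with MAC-to-IP and service-to-monitored maps) instead of the real graph schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    macs: Dict[str, List[str]] = field(default_factory=dict)   # MAC -> IPs on that NIC
    services: Dict[str, bool] = field(default_factory=dict)    # service -> monitored?


def findip(hosts: List[Host], ip: str) -> Optional[str]:
    """Which system owns this IP address?"""
    for h in hosts:
        if any(ip in ips for ips in h.macs.values()):
            return h.name
    return None


def findmac(hosts: List[Host], mac: str) -> Optional[str]:
    """Which system owns this MAC address? (case-insensitive)"""
    mac = mac.lower()
    for h in hosts:
        if mac in (m.lower() for m in h.macs):
            return h.name
    return None


def unmonitored(hosts: List[Host]) -> List[str]:
    """Services that discovery found but nothing is monitoring."""
    return sorted(f"{h.name}:{svc}" for h in hosts
                  for svc, watched in h.services.items() if not watched)
```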

  23. OS discovery JSON Snippet
    { "nodename": "alanr-1225B",
    "operating-system": "GNU/Linux",
    "machine": "x86_64",
    "processor": "x86_64",
    "hardware-platform": "x86_64",
    "kernel-name": "Linux",
    "kernel-release": "3.8.0-31-generic",
    "kernel-version": "#46-Ubuntu SMP ...",
    "Distributor ID": "Ubuntu",
    "Description": "Ubuntu 13.04",
    "Release": "13.04",
    "Codename": "raring" }

  24. Sixth Dimension:
    Best Practice Analyses
    This is a planned direction of the project

    Triggered by Discovery Updates
    – Analysis occurs within seconds of change
    – No change => No analysis

    We can analyze anything discovered

    Expect to create alerts and reports
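
    This slide describes a planned direction, so the following is only a sketch of "no change => no analysis". The analyzer fingerprints each host's discovery JSON per discovery type and runs the registered rules only when the fingerprint changes. `ChangeTriggeredAnalyzer`, `register` and `on_discovery` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Rule = Callable[[str, Dict[str, Any]], List[str]]   # (host, data) -> alerts


class ChangeTriggeredAnalyzer:
    """Run best-practice rules only when a host's discovery data changes."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[Rule]] = {}             # discovery type -> rules
        self.fingerprints: Dict[Tuple[str, str], str] = {}

    def register(self, discovery_type: str, rule: Rule) -> None:
        self.rules.setdefault(discovery_type, []).append(rule)

    def on_discovery(self, host: str, discovery_type: str,
                     data: Dict[str, Any]) -> List[str]:
        # Canonical JSON, so a mere change of key order is not a "change".
        digest = hashlib.sha256(
            json.dumps(data, sort_keys=True).encode()).hexdigest()
        key = (host, discovery_type)
        if self.fingerprints.get(key) == digest:
            return []                                      # no change => no analysis
        self.fingerprints[key] = digest
        alerts: List[str] = []
        for rule in self.rules.get(discovery_type, []):
            alerts.extend(rule(host, data))
        return alerts
```

    Analysis cost then tracks the rate of change in the environment rather than its size.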

  25. Sample Security Best Practices

    Inappropriate services (telnet, etc.)

    Settings in /proc/sys/

    Security Patch Coverage
    – OS vendor (Red Hat, SuSE, Canonical, etc.)
    – Application (Oracle, IBM, WordPress, etc.)

    Other OS settings

    Common Application Settings
    FYI: Sharing information with the Lynis project
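
    Two of these checks can be sketched as rules over discovered data. The sysctl values and port list below are common hardening recommendations chosen for illustration, not the project's rule set. The input dictionaries and lists are assumed to come from discovery, and the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative hardening expectations for settings under /proc/sys/.
RECOMMENDED_SYSCTLS: Dict[str, str] = {
    "net.ipv4.ip_forward": "0",                 # unless the host is a router
    "net.ipv4.conf.all.accept_redirects": "0",  # ignore ICMP redirects
    "kernel.randomize_va_space": "2",           # full ASLR
}

# Services that send credentials in clear text.
INAPPROPRIATE_PORTS: Dict[int, str] = {21: "ftp", 23: "telnet", 513: "rlogin"}


def sysctl_path(name: str) -> str:
    """Dotted sysctl name -> file under /proc/sys (names containing dots,
    such as some VLAN interfaces, are not handled)."""
    return "/proc/sys/" + name.replace(".", "/")


def check_sysctls(settings: Dict[str, str]) -> List[str]:
    """Alerts for discovered settings that differ from the recommendation."""
    return [f"{name} is {settings[name]!r}, expected {want!r}"
            for name, want in RECOMMENDED_SYSCTLS.items()
            if name in settings and settings[name] != want]


def check_listeners(ports: List[int]) -> List[str]:
    """Alerts for listening ports that belong to inappropriate services."""
    return [f"{INAPPROPRIATE_PORTS[p]} listening on port {p}"
            for p in sorted(set(ports)) if p in INAPPROPRIATE_PORTS]
```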

  26. Other Sample Security Features

    Discovery of “forgotten” IP addresses

    Monitoring of Open Ports and Services

    Nmap profiling of new MAC addresses

    Checksum outlier analysis

    Security Best Practice Analyses
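
    This is one plausible reading of "checksum outlier analysis". It assumes that hosts meant to run the same package version should have identical checksums for the same file, and flags any host that disagrees with a strict majority. The `min_hosts` threshold and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def checksum_outliers(sums: Dict[str, str], min_hosts: int = 3) -> List[str]:
    """Hosts whose checksum of the same file differs from the majority.

    `sums` maps host -> checksum of one file on hosts that are expected to
    run the same package version. Without enough hosts, or without a strict
    majority, there is no "normal" copy to compare against, so nothing is
    flagged.
    """
    if len(sums) < min_hosts:
        return []
    common, count = Counter(sums.values()).most_common(1)[0]
    if count * 2 <= len(sums):
        return []
    return sorted(host for host, s in sums.items() if s != common)
```

    An outlier is not proof of tampering, since it may simply be an unpatched host, but it is a cheap signal worth an alert.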

  27. Seventh Dimension:
    Current Status

    Fifth release tagged 30 January 2015

    Moving towards security emphasis

    Great unit and system tests

    Strongly encrypted communication

    Several discovery methods written

    Extensible Automated Discovery Triggers

    Discovery => Automatic Monitoring + Network-Facing
    Checksums

    Command Line Queries

    Licenses: Commercial or GPLv3

  28. Eighth Dimension:
    Get Involved!

    Early adopters – customers!

    Contributors
    – Testers, Continuous Integration
    – Best practice experts
    – Designers
    – Developers (C, Python, Shell, PowerShell,
    JavaScript)
    – Porters (esp. Windows)
    – Promoters, Publicists, Packagers, etc.

  29. Resistance Is Futile!
    These slides bit.ly/AssimCFGMC15
    Mailing List bit.ly/AssimML
    #AssimProj @OSSAlanR
    #assimilation on freenode IRC
    Project Web Site
    assimproj.org
    Company Web Site
    assimilationsystems.com

  30. Risk Management/Mitigation

    Intrusions

    Vulnerable Software

    Licensed Software

    Audit Risk

    Outages

    System management

  31. Why a graph database? (Neo4j)

    Humans describe systems as graphs

    Dependency & Discovery information: graph

    Speed of graph traversals depends on size of
    subgraph, not total graph size

    Root cause queries  graph traversals –
    notoriously slow in relational databases

    Visualization is Natural

    Schema-less design: good for constantly changing
    heterogeneous environment

    Graph Model === Object Model
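
    To illustrate why traversal cost depends on the subgraph rather than the whole graph, here is a breadth-first walk over plain in-memory adjacency lists, not Neo4j. Following "depends on" edges from a failed service yields its candidate root causes. The node names in the test are made up.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first traversal from `start` over adjacency lists.

    Work is proportional to the nodes and edges actually reached, not to
    the size of the whole graph.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

    In a relational schema, the same walk typically needs one self-join per hop, which is why such queries are slow there.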

  32. Monitoring Pros and Cons
    Pros
    – Simple & Scalable
    – Uniform work distribution
    – No single point of failure
    – Distinguishes switch vs host failure
    – Easy on LAN, WAN
    – Multi-tenant approach

    Cons
    – Active agents
    – Potential slowness at power-on

  33. Minimizing Network Footprint
    (planned)

    Support diagnosing switch issues

    Minimize network traffic

    Ideal for multi-site arrangements

  34. Sixth Dimension:
    Graph Schema
    Two Schema subgraphs

    Client/server dependency

    Switch interconnect

  35. sshd Service JSON Snippet
    (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22 },

  36. ssh Client JSON Snippet
    (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22 },

  37. ssh -> sshd dependency graph
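
    Here is a sketch of how the sshd and ssh snippets could become an ssh -> sshd edge: index every listening address, then match each client connection against that index. It assumes the JSON shapes shown in those snippets, plus a hypothetical `host_ips` map from each host to its IP addresses. `0.0.0.0` is expanded to all of a host's addresses. This is not the CMA's actual discovery plugin.

```python
from typing import Dict, List, Tuple


def dependency_edges(
    servers: Dict[str, Dict[str, dict]],  # host -> {service: JSON with "listenaddrs"}
    clients: Dict[str, Dict[str, dict]],  # host -> {process: JSON with "clientaddrs"}
    host_ips: Dict[str, List[str]],       # host -> IP addresses it owns
) -> List[Tuple[str, str]]:
    """Return sorted (client "host:process", server "host:service") pairs."""
    # Index every listener by (ip, port, proto); 0.0.0.0 means all host IPs.
    listeners: Dict[Tuple[str, int, str], str] = {}
    for host, services in servers.items():
        for svc, info in services.items():
            for la in info.get("listenaddrs", {}).values():
                ips = host_ips.get(host, []) if la["addr"] == "0.0.0.0" else [la["addr"]]
                for ip in ips:
                    listeners[(ip, la["port"], la["proto"])] = f"{host}:{svc}"
    # Each client connection that lands on a known listener is a dependency.
    edges = set()
    for host, procs in clients.items():
        for proc, info in procs.items():
            for ca in info.get("clientaddrs", {}).values():
                server = listeners.get((ca["addr"], ca["port"], ca["proto"]))
                if server is not None:
                    edges.add((f"{host}:{proc}", server))
    return sorted(edges)
```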

  38. Switch Discovery Data
    from LLDP (or CDP)
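
    Here is a sketch of decoding that switch data. An LLDP frame (EtherType 0x88CC) carries a sequence of TLVs, each with a 16-bit header holding a 7-bit type and a 9-bit length. The function below parses an already-captured LLDPDU payload and keeps only a few standard TLVs. Capturing frames (which needs raw sockets or libpcap) and CDP are not shown, and the nanoprobe's own C implementation may differ.

```python
import struct
from typing import Dict, Iterator, Tuple

TLV_NAMES = {1: "chassis_id", 2: "port_id", 3: "ttl", 4: "port_description",
             5: "system_name", 6: "system_description"}


def iter_tlvs(lldpdu: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs until the End-of-LLDPDU TLV (type 0)."""
    offset = 0
    while offset + 2 <= len(lldpdu):
        (header,) = struct.unpack_from("!H", lldpdu, offset)
        tlv_type, length = header >> 9, header & 0x01FF  # 7-bit type, 9-bit length
        if tlv_type == 0:
            return
        yield tlv_type, lldpdu[offset + 2: offset + 2 + length]
        offset += 2 + length


def parse_lldpdu(lldpdu: bytes) -> Dict[str, object]:
    """Extract chassis ID, port ID, TTL and a few text TLVs."""
    info: Dict[str, object] = {}
    for tlv_type, value in iter_tlvs(lldpdu):
        name = TLV_NAMES.get(tlv_type)
        if name is None:
            continue                      # e.g. capabilities, org-specific TLVs
        if tlv_type == 3:
            info[name] = struct.unpack("!H", value[:2])[0]   # TTL in seconds
        elif tlv_type in (1, 2) and value:
            # First byte is a subtype (MAC address, interface name, ...).
            info[name] = (value[0], value[1:])
        else:
            info[name] = value.decode("utf-8", errors="replace")
    return info
```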