Assimilation Project Distributed Computing Overview

This talk gives an overview of the Assimilation Project from the perspective of its distributed computing aspects, covering scalability, protocol, encryption, and related topics.

Alan Robertson

December 09, 2014

Transcript

  1. DistComp.2014
    Distributed Computing in The Assimilation Project
    #AssimProj @OSSAlanR
    http://assimproj.org/
    Alan Robertson
    Assimilation Systems Limited
    http://assimilationsystems.com
    © 2014 Assimilation Systems Limited

  2. Distributed Computing Meetup, 09 December 2014
    Biography

    35+ years in IT/development – 10 years in
    system management (SysAdmin)

    Founded Linux-HA project - led 1998-2007
    – aka “Heartbeat” - now called Pacemaker

    Founded Assimilation Project in 2010

    Founded Assimilation Systems Limited in
    2013

    Alumnus of Bell Labs(21), SuSE(1), IBM(13)

  3. Highly Scalable Discovery-Driven Automation
    Continuous Discovery integrated with
    extreme-scale Monitoring

    Continuous extensible discovery
    – systems, switches, services, dependencies –
    zero network footprint discovery process

    Extensible exception monitoring
    – more than 100K systems

    All data goes into central graph CMDB

  4. Assimilation Project History

    Inspired by a 2-million-core computer (Cyclops64)

    Concerns for extreme scale

    Topology aware monitoring

    Topology discovery w/out security issues
    => Discovery of everything!
    Basically a C2I system:
    Command, Communication and Intelligence

  5. A seven-dimensional overview

    Problems Addressed

    Unique Capabilities

    Distribution of Work

    Architectural Components

    Communications Protocol

    Current Status

    Project Needs

  6. First Dimension: Problems Addressed
    1. Risk Management at extreme scale
    2. Maintaining detailed discovery database
    3. Discovering systems you've forgotten
    4. Discovering vulnerable and licensed
    software you're running – and where
    5. Monitoring services, systems & switches
    6. Finding services you aren't monitoring

  7. Second Dimension: Unique Powerful Features
    1. Continuous Discovery
    2. Discovery: Zero network footprint
    3. Centralized graph database
    4. We know everything that changes
    5. Discover and update dependency
    information
    6. Discovery and monitoring tightly
    integrated – discovery drives automation

  8. (even more) Features...
    7. Discovery and monitoring easily
    extensible
    8. Naturally scalable to > 100K systems
    9. Minimal network load
    10. Server failures distinguishable
    from switch failures
    11. Best practice and vulnerability alerts
    12. Multi-tenant support

  9. This all sounds unreasonable...

    Huge scalability without complexity?

    Discovery without pings or port scans?
    Really?

  10. Typical Monitoring Algorithm

    A system sends out pings to see if systems are alive

    Probe each service over the network
    – sometimes aggregated by endpoint agents

    Load on system rises rapidly

    Load on network rises rapidly with a hot spot around
    monitoring system

    Growth accomplished by more systems, proxies,
    and other forms of complexity

  11. More about Cyclops64

    Specialized monitoring hardware

    Cube communication topology
    24 × 24 × 24 × 160 = 2,211,840 cores (!)

    Round trip costs up to 132 forwards

    Traditional monitoring protocol:
    – really, really bad idea

  12. Typical Discovery Algorithms

    Turn off intrusion detection system
    – Ping every address
    – Port scan every address
    – SNMP and other probes done against
    open ports
    – Walk network to find switch connections

    Turn intrusion detection back on

    Repeat annually, quarterly, monthly or weekly

  13. Third Dimension: Fully distributed work
    Two philosophical underpinnings
    1. Monitoring and Discovery are fully distributed
    2. Reliable “no news is good news”
    Only responses to changes are centralized

  14. Simple Scalability
    I can explain how we scale so your
    grandmother would understand...

  15. Simple Scalability
    I can explain how we scale so your
    grandmother would understand...
    istockphoto
    ©bowdenimages

  16. Massive Scalability – or “I see dead servers in O(1) time”

    Adding systems does not increase the monitoring work on any
    system

    Each server monitors 2 (or 4) neighbors

    Each server monitors and discovers its own services

    Ring repair and alerting is O(n) – but a very small amount of work
    Current Implementation
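
The ring idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the `Ring` class, its method names, and `monitoring_load` are hypothetical. It shows that the number of peers each host watches stays at two no matter how many hosts join, and that removing a host leaves its former neighbors watching each other.

```python
from typing import Dict, List, Tuple


class Ring:
    """Hypothetical ring of hosts; each host watches its two ring neighbors."""

    def __init__(self) -> None:
        self.members: List[str] = []

    def join(self, host: str) -> None:
        # A new host is appended to the ring if it is not already a member.
        if host not in self.members:
            self.members.append(host)

    def leave(self, host: str) -> None:
        # Removing a host "repairs" the ring implicitly: its former
        # neighbors become adjacent in the member list.
        self.members.remove(host)

    def neighbors(self, host: str) -> Tuple[str, ...]:
        """Return the distinct peers this host should monitor (at most 2)."""
        n = len(self.members)
        i = self.members.index(host)
        prev_host = self.members[(i - 1) % n]
        next_host = self.members[(i + 1) % n]
        # dict.fromkeys de-duplicates while keeping order (matters for
        # rings of one or two hosts).
        return tuple(h for h in dict.fromkeys((prev_host, next_host)) if h != host)


def monitoring_load(ring: Ring) -> Dict[str, int]:
    """Number of peers each host monitors; independent of ring size."""
    return {h: len(ring.neighbors(h)) for h in ring.members}
```

The list lookup in this sketch is linear in the ring size, but that bookkeeping would live in one central place; the point is that the per-host heartbeat work does not grow as hosts are added.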

  17. Minimizing Network Footprint (planned)

    Support diagnosing switch issues

    Minimize network traffic

    Ideal for multi-site arrangements

  18. Fourth Dimension: Architectural Components
    Three Architectural Components
    1. Collective Management Authority

    One CMA per installation
    2. Nanoprobes (agents)

    One per system
    3. Data Storage

    Central Neo4j graph database (CMDB)

  19. Basic CMA Functions (python)
    Nanoprobe management

    Configure & direct

    Hear alerts & discovery

    Update rings: join/leave
    Update database
    Issue alerts
    -- provide event notification

  20. Nanoprobe Functions ('C')
    Announce self to CMA

    Default: use reserved multicast address
    Do what CMA says

    receive configuration information
    – CMA addresses, ports, defaults

    send/expect heartbeats

    perform discovery actions

    perform monitoring actions
    No persistent state across reboots
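
The "send/expect heartbeats" item can be illustrated with a small Python sketch. The real nanoprobes are written in C; everything here (the `HeartbeatMonitor` class, its method names, and the single fixed deadline) is an assumption made for illustration. It reports a peer only when its heartbeat is overdue, and reports it once, in the spirit of "no news is good news".

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Hypothetical tracker: stays silent unless a peer misses its deadline."""

    def __init__(self, deadline: float) -> None:
        self.deadline = deadline              # seconds allowed between heartbeats
        self.last_seen: Dict[str, float] = {}
        self.reported: Set[str] = set()

    def expect(self, peer: str, now: float) -> None:
        # Start expecting heartbeats from this peer as of `now`.
        self.last_seen[peer] = now
        self.reported.discard(peer)

    def heartbeat(self, peer: str, now: float) -> None:
        # Heartbeats from peers we were not told to expect are ignored.
        if peer in self.last_seen:
            self.last_seen[peer] = now
            self.reported.discard(peer)

    def overdue(self, now: float) -> List[str]:
        """Peers newly past their deadline; each is reported once until it recovers."""
        late = sorted(
            p for p, t in self.last_seen.items()
            if now - t > self.deadline and p not in self.reported
        )
        self.reported.update(late)
        return late
```

Timestamps are passed in explicitly so the sketch is deterministic; a real agent would take them from a monotonic clock.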

  21. Service Monitoring based on HA Technologies

    Well-proven architecture:
    – “no news is good news” AKA
    management by exception

    Implements Open Cluster Framework
    standard (LSB and others)

    Each system monitors own services

    Can also start, stop, migrate services

  22. Monitoring Pros and Cons
    Pros
    Simple & Scalable
    Uniform work distribution
    No single point of failure
    Distinguishes switch vs
    host failure
    Easy on LAN, WAN
    Multi-tenant approach
    Cons
    Active agents
    Potential slowness
    at power-on

  23. Why a graph database? (Neo4j)

    Humans describe systems as graphs

    Dependency & Discovery information: graph

    Speed of graph traversals depends on size of
    subgraph, not total graph size

    Root cause queries => graph traversals –
    notoriously slow in relational databases

    Visualization is Natural

    Schema-less design: good for constantly changing
    heterogeneous environment

    Graph Model === Object Model
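
To make the traversal argument concrete, here is a minimal Python sketch of an impact query over a dependency graph. It uses a plain dictionary rather than Neo4j, and the function name `impacted_by` and the example node names are hypothetical. The walk only touches nodes reachable from the starting point, so its cost follows the size of that subgraph rather than the size of the whole graph.

```python
from collections import deque
from typing import Dict, List, Set


def impacted_by(dependents: Dict[str, List[str]], failed: str) -> Set[str]:
    """Breadth-first walk of everything that transitively depends on `failed`.

    `dependents` maps a node to the nodes that depend on it directly.
    """
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            # Skip nodes already visited so cycles cannot loop forever.
            if dep not in seen and dep != failed:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical example graph: a switch, a host, and services on top of it.
EXAMPLE_DEPENDENTS = {
    "switch1": ["db-host"],
    "db-host": ["postgres"],
    "postgres": ["webapp", "reports"],
    "other-switch": ["unrelated"],
}
```

Calling `impacted_by(EXAMPLE_DEPENDENTS, "db-host")` never visits `other-switch` or `unrelated`, however many such nodes the full graph contains.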

  24. A multi-dimensional demo

    Demonstrate basic capabilities
    – Discovery
    – Discovery-driven monitoring configuration
    – Discovery-driven 'tripwire-like' checksums
    – Monitoring – failures / successes
    – Host down notification

    No configuration was supplied
    – everything comes from discovery
    http://assimilationsystems.com/90_second_demo/

  25. Communications Attributes

    Non-heartbeat communication is rare
    – could be months or years between packets

    Some data sent to CMA is sensitive

    Commands sent to nanoprobes are
    potentially dangerous

    CMA connects to up to 10^6 clients

    No news is good news: cannot lose
    information

  26. Fifth Dimension: Communications Protocol

    UDP with reliable transmission protocol
    – packets ACKed when acted on

    Includes signatures, encryption,
    compression

    Communication resets happen on next
    communication – not immediately

    Encryption is almost done (this week!)
    – using libsodium – curve25519 encryption
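
The "ACKed when acted on" rule can be sketched as follows. This is an in-memory Python illustration, not the project's wire protocol: there are no sockets, signatures, encryption, or compression here, and the `ReliableSender` and `ReliableReceiver` classes are hypothetical. The sender keeps every packet until it is acknowledged; the receiver acknowledges only after the payload has been acted on, and re-acknowledges duplicates without acting twice.

```python
from typing import Callable, Dict, List, Set, Tuple

Packet = Tuple[int, str]  # (sequence number, payload)


class ReliableSender:
    """Keeps each packet until the peer acknowledges its sequence number."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.unacked: Dict[int, str] = {}

    def send(self, payload: str) -> Packet:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return (seq, payload)

    def ack(self, seq: int) -> None:
        # Unknown or repeated ACKs are harmless.
        self.unacked.pop(seq, None)

    def pending(self) -> List[Packet]:
        """Packets that still need (re)transmission, oldest first."""
        return sorted(self.unacked.items())


class ReliableReceiver:
    """Acts on each packet once, and only then returns its ACK."""

    def __init__(self, act: Callable[[str], None]) -> None:
        self.act = act
        self.done: Set[int] = set()

    def receive(self, packet: Packet) -> int:
        seq, payload = packet
        if seq not in self.done:
            self.act(payload)   # act first...
            self.done.add(seq)
        return seq              # ...then ACK; duplicates are simply re-ACKed
```

This sketch does not enforce in-order delivery or time out retransmissions; it only shows why a lost packet or a lost ACK cannot silently lose information.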

  27. Key Management Scenarios

    Nanoprobe one-time initialization

    CMA one-time initialization

    Nanoprobe startup

    Command flow

  28. Nanoprobe one-time initialization

  29. CMA one-time initialization

  30. Nanoprobe Startup

  31. Command Processing

  32. Sixth Dimension: Current Status

    Fourth release out 20 October 2014
    – next release (December?) will have encrypted comm

    Great unit tests

    Several discovery methods written

    Extensible Automated Discovery Triggers

    Discovery => Automatic Monitoring (WOOT!)

    Discovery => Network-Facing Checksums

    Command Line Queries

    Licenses: Commercial or GPLv3

  33. Seventh Dimension: Get Involved!
    We need you!

    Early adopters

    Testers, Continuous Integration

    Best practice experts

    Designers

    Developers (C, Python, Shell, PowerShell, JavaScript)

    Porters (esp Windows)

    Promoters, Publicists, Packagers, etc.

  34. Resistance Is Futile!
    These slides: bit.ly/AssimDCM14
    Mailing List bit.ly/AssimML
    #AssimProj @OSSAlanR
    #assimilation on freenode IRC
    Project Web Site
    assimproj.org
    Company Web Site
    assimilationsystems.com

  35. Fifth Dimension: Discovery API
    Scripts perform discovery
    – output JSON
    Three Sample Discovery Snippets

    OS information

    Service discovery

    Client discovery

  36. How does discovery work?
    Nanoprobe scripts perform discovery

    Each discovers one kind of information

    Can take arguments from environment

    Output JSON
    CMA stores Discovery Information

    JSON stored in Neo4j database

    CMA discovery plugins => graph nodes
    and relationships
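
A discovery script in the style described above might look like the following Python sketch. It is an illustration only: the function name `discover_os` and the environment variable `EXAMPLE_DISCOVERY_TAG` are hypothetical, and the project's own discovery scripts may be written differently. It discovers one kind of information, optionally takes an argument from the environment, and writes JSON to standard output.

```python
import json
import os
import platform


def discover_os() -> dict:
    """Collect basic operating-system facts as a JSON-serializable dict."""
    u = platform.uname()
    info = {
        "nodename": u.node,
        "kernel-name": u.system,
        "kernel-release": u.release,
        "kernel-version": u.version,
        "machine": u.machine,
    }
    # EXAMPLE_DISCOVERY_TAG is a made-up variable showing how a script
    # could take an argument from its environment.
    tag = os.environ.get("EXAMPLE_DISCOVERY_TAG")
    if tag:
        info["tag"] = tag
    return info


if __name__ == "__main__":
    # One JSON document on stdout, ready to be collected and stored.
    print(json.dumps(discover_os(), indent=2))
```

The key names mirror a few of those in the OS discovery snippet shown later in the deck; fields such as the distribution name are left out because `platform.uname()` does not provide them.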

  37. A Few Canned Queries
    allipports      get all port/ip/service/hosts
    allswitchports  get switch connections
    crashed         get crashed servers
    shutdown        get gracefully shutdown servers
    downservices    get nonworking services
    findip          get system owning IP
    findmac         get system owning MAC
    unknownips      get unknown IP addresses
    unmonitored     get unmonitored services

  38. OS discovery JSON Snippet
    { "nodename": "alanr-1225B",
    "operating-system": "GNU/Linux",
    "machine": "x86_64",
    "processor": "x86_64",
    "hardware-platform": "x86_64",
    "kernel-name": "Linux",
    "kernel-release": "3.8.0-31-generic",
    "kernel-version": "#46-Ubuntu SMP ...",
    "Distributor ID": "Ubuntu",
    "Description": "Ubuntu 13.04",
    "Release": "13.04",
    "Codename": "raring" }

  39. sshd Service JSON Snippet (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22 },

  40. ssh Client JSON Snippet (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22 },

  41. Two Schema subgraphs

    Client / server
    dependency

    Switch interconnect

  42. ssh -> sshd dependency graph
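
One plausible way to derive such a dependency edge from the client and service snippets is sketched below in Python. It is an assumption about the approach, not the project's plugin code: the function `dependency_edges`, the `host_ips` mapping, and the address assigned to each host are all hypothetical. Each listening socket is indexed by (address, port), with `0.0.0.0` expanded to every address the host owns, and each client socket is then looked up in that index.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, str]  # (host name, process name)


def dependency_edges(
    discovery: Dict[str, Dict[str, dict]],
    host_ips: Dict[str, List[str]],
) -> List[Tuple[Endpoint, Endpoint]]:
    """Return (client, server) pairs where a client socket matches a listener."""
    listeners: Dict[Tuple[str, int], Endpoint] = {}
    for host, procs in discovery.items():
        for name, proc in procs.items():
            for sock in proc.get("listenaddrs", {}).values():
                # A wildcard listener accepts connections on all host addresses.
                addrs = host_ips[host] if sock["addr"] == "0.0.0.0" else [sock["addr"]]
                for addr in addrs:
                    listeners[(addr, sock["port"])] = (host, name)

    edges: List[Tuple[Endpoint, Endpoint]] = []
    for host, procs in discovery.items():
        for name, proc in procs.items():
            for sock in proc.get("clientaddrs", {}).values():
                target = listeners.get((sock["addr"], sock["port"]))
                if target is not None:
                    edges.append(((host, name), target))
    return edges


# Hypothetical input shaped like the discovery snippets in this deck.
EXAMPLE_DISCOVERY = {
    "alanr-1225B": {"ssh": {"clientaddrs": {
        "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}},
    "servidor": {"sshd": {"listenaddrs": {
        "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}},
}
EXAMPLE_HOST_IPS = {"alanr-1225B": ["10.10.10.2"], "servidor": ["10.10.10.5"]}
```

With the example data, the only edge found links the `ssh` process on `alanr-1225B` to the `sshd` process on `servidor`, which is the kind of relationship a graph database can store directly.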

  43. Switch Discovery Data from LLDP (or CDP)