October 2013 DOSUG Assimilation Presentation

Presentation on http://assimproj.org/

Alan Robertson

October 01, 2013

Transcript

  1. DOSUG
    IT Discovery and Monitoring Without Limit
    using The Assimilation Project
    #AssimProj @OSSAlanR
    http://assimproj.org/
    http://bit.ly/AssimDOSUG2013
    Alan Robertson
    Assimilation Systems Limited
    http://assimilationsystems.com

  2. DOSUG, 1 October 2013
    © 2013 Assimilation Systems Limited
    Upcoming Events
    Denver Open Source User’s Group (today!)
    Facebook presentation
    GraphConnect San Francisco
    Open Source Monitoring Conference - Nürnberg
    NSA / Homeland Security Assimilation Technical Talk
    Large Installation System Administration Conference - DC
    Colorado Springs Open Source User’s Group
    linux.conf.au – Awesome Australian Linux Conf - Perth
    Details on http://assimilationsystems.com/

  3. Discovery
    Discovering

    systems you've forgotten

    what you're not monitoring

    whatever you'd like

    without setting off security alarms

  4. Monitoring
    Monitoring

    extreme scale

    topology aware

    integrated with discovery

    easy-to-configure

  5. Assimilation Project History

    Inspired by 2 million core computer (cyclops64)

    Concerns for extreme scale

    Topology aware monitoring

    Topology discovery w/out security issues
    => Discovery of everything!

  6. Project Scope
    Zero-network-footprint continuous Discovery
    integrated with extreme-scale Monitoring

    Continuous extensible discovery
    – systems, switches, services, dependencies
    – zero network footprint

    Extensible exception monitoring
    – more than 100K systems

    All data goes into central graph database

  7. Why Assimilation Software?

    Management Perspective

    DevOps Perspective

  8. Risk Management/Mitigation

    Intrusions

    Licensed Software

    Audit Risk

    Outages

    System management

  9. Why Discovery? (DevOps)

    Documentation: incomplete, incorrect

    Dependencies: unknown

    Planning: Needs accurate data

    Best Practices: Verification needs data

    ITIL CMDB (Configuration Management Database)
    Our Discovery: continuous, low-profile

  10. Why Our Monitoring?

    Simpler to configure (in theory)

    Growth is non-issue

    Extremely low network traffic

    Ideal for cross-WAN monitoring

    Highlight cascading failure root causes

    Not confused by switch failures

    Most switches get monitored “for free”

  11. This all sounds unreasonable...

    Huge scalability without complexity?

    Discovery without sending packets?
    Really?

  12. Architectural Overview
    Collective Management Authority (CMA)
    – One CMA per installation
    Nanoprobes
    – One nanoprobe per OS image
    Data Storage
    – Central Neo4j graph database
    General Rule: “No News Is Good News”

  13. Simple Scalability

    I can explain how we scale so your grandmother would understand

  14. Massive Scalability – or
    “I see dead servers in O(1) time”

    Adding systems does not increase the monitoring work on any system

    Each server monitors 2 (or 4) neighbors

    Each server monitors its own services

    Ring repair and alerting is O(n) – but a very small amount of work

    Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
    Current Implementation
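
The ring idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: `ring_neighbors`, `seconds_per_packet`, and the host names are hypothetical, and the sketch assumes a single ring in which each server watches just its predecessor and successor. The second function simply checks the packet-rate arithmetic quoted above.

```python
def ring_neighbors(ring, host):
    """Return the (previous, next) hosts that `host` exchanges heartbeats with.

    Hypothetical helper: the ring is modelled as a plain list of host names.
    """
    i = ring.index(host)
    # Index -1 wraps to the end of the list; the modulo wraps to the start.
    return ring[i - 1], ring[(i + 1) % len(ring)]


def seconds_per_packet(packets_per_day):
    """Average spacing between packets for a given daily packet count."""
    return 24 * 60 * 60 / packets_per_day


ring = ["web1", "web2", "db1", "db2"]   # made-up host names
print(ring_neighbors(ring, "web1"))     # ('db2', 'web2')

# Growing the ring does not change how many peers any one host watches.
bigger_ring = ring + ["cache1", "cache2"]
print(len(ring_neighbors(bigger_ring, "web1")))  # 2

print(seconds_per_packet(10_000))       # 8.64
```

However long the list gets, each host still has exactly two neighbors, which is the sense in which per-host monitoring work stays constant. At 10,000 packets per day the average spacing is 8.64 seconds, consistent with the slide's "approximately 1 packet per 9 seconds".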

  15. Minimizing Network Footprint
    (planned)

    Support diagnosing switch issues

    Minimize network traffic

    Ideal for multi-site arrangements

  16. Service Monitoring
    Based on Linux-HA LRM

    LRM == Local Resource Manager

    Well-proven architecture:
    – “no news is good news” AKA management by exception

    Implements Open Cluster Framework standard (and others)

    Each system monitors own services

    Can also start, stop, migrate services
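
"Management by exception" can be illustrated with a short Python sketch. It is not the Linux-HA LRM interface: `exception_reports`, the service names, and the status strings are all hypothetical. It assumes each system checks its own services repeatedly and sends a report only when a status changes.

```python
def exception_reports(samples):
    """Yield (service, status) only when a service's status changes.

    `samples` is an iterable of (service, status) pairs from repeated
    local checks. A service is assumed to start out "ok", so a healthy
    service never generates any report at all.
    """
    last = {}
    for service, status in samples:
        previous = last.get(service, "ok")
        if status != previous:
            yield service, status
        last[service] = status


# Six local checks, but only two of them are news worth sending.
checks = [
    ("sshd", "ok"), ("httpd", "ok"), ("sshd", "ok"),
    ("httpd", "failed"), ("httpd", "failed"), ("httpd", "ok"),
]
print(list(exception_reports(checks)))
# [('httpd', 'failed'), ('httpd', 'ok')]
```

Report volume then follows the number of state changes rather than the number of checks, which is why adding healthy systems adds almost no traffic.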

  17. Monitoring Pros and Cons
    Pros
    – Simple & Scalable
    – Uniform work distribution
    – No single point of failure
    – Distinguishes switch vs host failure
    – Easy on LAN, WAN
    – Multi-tenant approach
    Cons
    – Active agents
    – Potential slowness at power-on

  18. How does this apply to clouds?

    Fits nicely into a cloud infrastructure
    – Should integrate into OpenStack, et al
    – Can control VMs

    Can monitor customer VMs
    – Add nanoprobe to base image
    – bottom level of rings disappears without LLDP or CDP

  19. Architectural Details

    Nanoprobes

    CMA

    Neo4j

  20. Nanoprobe Functions ('C')
    Announce self to CMA
    – Reserved multicast address (can be unicast address or name if no multicast)
    Do what CMA says
    – receive configuration information (CMA addresses, ports, defaults)
    – send/expect heartbeats
    – perform discovery actions
    – perform monitoring actions
    No persistent state across reboots
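
The "send/expect heartbeats" item can be sketched as follows. The real nanoprobe is written in C; this Python version is only an illustration, and the class name, method names, deadline value, and peer names are all hypothetical. It assumes the CMA tells a nanoprobe which peers to expect and that a peer counts as overdue once it has been silent longer than a fixed deadline.

```python
class HeartbeatWatcher:
    """Track when each expected peer was last heard from (illustrative only)."""

    def __init__(self, deadline):
        self.deadline = deadline   # seconds of silence tolerated
        self.last_seen = {}        # peer name -> time of last heartbeat

    def expect(self, peer, now):
        """Start expecting heartbeats from `peer` (as directed by the CMA)."""
        self.last_seen[peer] = now

    def heard(self, peer, now):
        """Record a heartbeat; heartbeats from unexpected peers are ignored."""
        if peer in self.last_seen:
            self.last_seen[peer] = now

    def overdue(self, now):
        """Return the peers that have been silent longer than the deadline."""
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.deadline
        )


watcher = HeartbeatWatcher(deadline=10.0)
watcher.expect("db1", now=0.0)
watcher.expect("web1", now=0.0)
watcher.heard("db1", now=8.0)
print(watcher.overdue(now=9.0))    # []
print(watcher.overdue(now=12.0))   # ['web1']
```

While every peer keeps sending heartbeats, `overdue` stays empty and nothing needs to be reported, in line with the "no news is good news" rule. The state lives only in memory, matching the slide's "no persistent state across reboots".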

  21. Basic CMA Functions (python)
    Nanoprobe management
    – Configure & direct
    – Hear alerts & discovery
    – Update rings: join/leave
    Update database
    Issue alerts

  22. Why a graph database? (Neo4j)

    Dependency & Discovery information: graph

    Speed of graph traversals depends on size of subgraph, not total graph size

    Root cause queries => graph traversals – notoriously slow in relational databases

    Visualization of relationships

    Schema-less design: good for constantly changing heterogeneous environment

    Graph Model === Object Model
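
The point about traversal cost can be illustrated without a database at all. The sketch below is plain Python, not Neo4j or Cypher, and the `impacted` function, the adjacency-dict layout, and the node names are assumptions made for illustration. It walks outward from a failed node and touches only the nodes reachable from it.

```python
from collections import deque


def impacted(dependents, failed):
    """Return every node that directly or transitively depends on `failed`.

    `dependents` maps a node to the nodes that depend on it. The breadth-first
    walk visits only nodes reachable from `failed`, so its cost follows the
    size of that subgraph rather than the size of the whole graph.
    """
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen


# Made-up topology: web1 depends on app1, which depends on db1.
dependents = {
    "switch1": ["db1", "web1"],
    "db1": ["app1"],
    "app1": ["web1"],
    "switch2": ["backup1"],
}
print(sorted(impacted(dependents, "db1")))      # ['app1', 'web1']
print(sorted(impacted(dependents, "switch2")))  # ['backup1']
```

Neither query ever looks at the unrelated part of the graph. In a relational schema the same question typically needs a recursive self-join.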

  23. How does discovery work?
    Nanoprobe scripts perform discovery
    – Each discovers one kind of information
    – Can take arguments (in environment)
    – Output JSON
    CMA stores Discovery Information
    – JSON stored in Neo4j database
    – CMA discovery plugins => graph nodes and relationships
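
A discovery script of the kind described here might look like the sketch below. It is a guess at the general shape, not one of the project's scripts: the environment variable `ASSIM_DISCOVERY_LABEL` and the JSON keys `discovertype` and `data` are made up for illustration. It discovers one kind of information, takes its argument from the environment, and writes JSON to standard output.

```python
import json
import os
import platform


def discover_os(environ=os.environ):
    """Collect one kind of information (basic OS facts) as a JSON-ready dict."""
    return {
        # ASSIM_DISCOVERY_LABEL is a hypothetical variable name for this
        # sketch; arguments arrive through the environment, not argv.
        "discovertype": environ.get("ASSIM_DISCOVERY_LABEL", "os"),
        "data": {
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
        },
    }


if __name__ == "__main__":
    # The only output is JSON on stdout, ready to be forwarded and stored.
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to a single kind of information means new discovery types can be added without touching the existing ones.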

  24. sshd Service JSON Snippet
    (from netstat and /proc)
    "sshd": {
      "exe": "/usr/sbin/sshd",
      "cmdline": [ "/usr/sbin/sshd", "-D" ],
      "uid": "root",
      "gid": "root",
      "cwd": "/",
      "listenaddrs": {
        "0.0.0.0:22": {
          "proto": "tcp",
          "addr": "0.0.0.0",
          "port": 22
        }, and so on...

  25. ssh Client JSON Snippet
    (from netstat and /proc)
    "ssh": {
      "exe": "/usr/sbin/ssh",
      "cmdline": [ "ssh", "servidor" ],
      "uid": "alanr",
      "gid": "alanr",
      "cwd": "/home/alanr/monitor/src",
      "clientaddrs": {
        "10.10.10.5:22": {
          "proto": "tcp",
          "addr": "10.10.10.5",
          "port": 22
        }, and so on...

  26. ssh -> sshd dependency graph
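
One way the two JSON snippets could be joined into a dependency edge is sketched below. This is an assumption about the general approach, not the CMA's actual plugin code: the function names, the input layout, and the client host name `alanr-laptop` are hypothetical. The sketch also assumes the sshd snippet was discovered on the host that owns 10.10.10.5, which the slides do not state.

```python
def split_addr(key):
    """Split 'host:port' into (host, port); assumes IPv4 as in the snippets."""
    host, port = key.rsplit(":", 1)
    return host, int(port)


def find_dependencies(clients, servers):
    """Return (client_host, client_service, server_ip, server_service) tuples.

    `clients` maps a host name to {service: {"clientaddrs": {...}}}.
    `servers` maps a host IP to {service: {"listenaddrs": {...}}}.
    Both layouts are assumptions made for this sketch.
    """
    edges = []
    for chost, cservices in clients.items():
        for cname, cinfo in cservices.items():
            for target in cinfo.get("clientaddrs", {}):
                thost, tport = split_addr(target)
                for sname, sinfo in servers.get(thost, {}).items():
                    for listen in sinfo.get("listenaddrs", {}):
                        lhost, lport = split_addr(listen)
                        # 0.0.0.0 means "listening on every local address".
                        if lport == tport and lhost in ("0.0.0.0", thost):
                            edges.append((chost, cname, thost, sname))
    return edges


clients = {
    "alanr-laptop": {  # hypothetical host name
        "ssh": {"clientaddrs": {"10.10.10.5:22": {
            "proto": "tcp", "addr": "10.10.10.5", "port": 22}}},
    },
}
servers = {
    "10.10.10.5": {
        "sshd": {"listenaddrs": {"0.0.0.0:22": {
            "proto": "tcp", "addr": "0.0.0.0", "port": 22}}},
    },
}
print(find_dependencies(clients, servers))
# [('alanr-laptop', 'ssh', '10.10.10.5', 'sshd')]
```

Each resulting tuple corresponds to one edge in the graph: the `ssh` client process depends on the `sshd` service it is connected to.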

  27. Switch Discovery Data
    from LLDP (or CDP)
    CMA transforms LLDP (CDP) Data to JSON

  28. Current State

    First release was April 2013

    Great unit test infrastructure

    Nanoprobe code – works well

    Service monitoring works

    Lacks digital signatures, encryption, compression

    Reliable UDP comm code working

    Several discovery methods written

    CMA and database code restructuring near-complete

    UI development underway

    Licensed under the GPL, commercial license available

  29. Future Plans

    Production grade by end of year

    Purchased support

    Real digital signatures, compression, encryption

    Other security enhancements

    Much more discovery

    GUI

    Alerting

    Reporting

    Add Statistical Monitoring

    Best Practice Audits

    Dynamic (aka cloud) specialization

    Hundreds more ideas
    – See: https://trello.com/b/OpaED3AT

  30. Get Involved!
    Powerful Ideas and Infrastructure
    Fun, ground-breaking project
    Looking for early adopters, testers!!
    Need for every kind of skill

    Awesome User Interfaces (UI/UX)

    Evangelism, community building

    Test Code (simulate 10^6 servers!)

    Python, C, script coding

    Documentation

    Feedback: Testing, Ideas, Plans

    Many others!

  31. Resistance Is Futile!
    #AssimProj @OSSAlanR
    #AssimMon
    Project Web Site
    http://assimproj.org
    Blog
    techthoughts.typepad.com
    lists.community.tummy.com/cgi-bin/mailman/admin/assimilation
