2013 September Assimilation presentation to TWC

IT Discovery and Monitoring Without Limit using The Assimilation Project
#AssimProj @OSSAlanR http://assimproj.org/ http://bit.ly/AssimTWC2013 Alan Robertson <[email protected]> Assimilation Systems Limited http://assimilationsystems.com

Upcoming Events
• Time-Warner Cable Developer Conference
• National Center for Atmospheric Research
• Denver Open Source User’s Group
• GraphConnect San Francisco
• Open Source Monitoring Conference - Nürnberg
• NSA / Homeland Security Assimilation Technical Talk
• Large Installation System Administration Conference - DC
• Colorado Springs Open Source User’s Group
• linux.conf.au – Linux Conference in Australia - Perth

12 September 2013 © 2013 Assimilation Systems Limited

Discovery
Discovering
• systems you've forgotten
• what you're not monitoring
• whatever you'd like
• without setting off security alarms

Monitoring
• extreme scale
• topology aware
• integrated with discovery
• easy-to-configure

Assimilation Project History
• Inspired by 2 million core computer
• Concerns for extreme scale
• Topology aware monitoring
• Topology discovery w/out security issues
• Discovery of everything!

Project Scope
Zero-network-footprint continuous Discovery integrated with extreme-scale Monitoring
• Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint
• Extensible exception monitoring
  – more than 100K systems
• All data goes into central graph database

Why Assimilation Software?
• Management Perspective
• DevOps Perspective

Risk Management/Mitigation
• Intrusions
• Licensed Software
• Audit Risk
• Outages
• System management

Why Discovery? (DevOps)
• Documentation: incomplete, incorrect
• Dependencies: unknown
• Planning: Needs accurate data
• Best Practices: Verification needs data
• ITIL CMDB (Configuration Mgmt DataBase)

Why Our Monitoring?
• Simpler to configure (in theory)
• Growth is non-issue
• Extremely low network traffic
• Ideal for cross-WAN monitoring
• Highlight cascading failure root causes
• Not confused by switch failures
• Most switches get monitored “for free”

This all sounds unreasonable...
• Huge scalability without complexity?
• Discovery without sending packets? Really?

Architectural Overview
Collective Management Authority
• One CMA per installation
Nanoprobes
• One nanoprobe per OS image
Data Storage
• Central Neo4j graph database
General Rule: “No News Is Good News”

Simple Scalability
• I can explain how we scale so your grandmother would understand

Massive Scalability – or “I see dead servers in O(1) time”
• Adding systems does not increase the monitoring work on any system
• Each server monitors 2 (or 4) neighbors
• Each server monitors its own services
• Ring repair and alerting is O(n) – but a very small amount of work
• Ring repair for a million nodes is less than 10K packets per day (approximately 1 packet per 9 seconds)
Current Implementation
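The ring idea on this slide can be illustrated with a minimal Python sketch. It is not the project's code: the function names and the flat, sorted list of hosts are assumptions made for illustration. It shows that the set of peers a node watches stays the same size however large the ring grows, and it checks the packet-rate arithmetic quoted above.

```python
def ring_neighbors(nodes, index, span=1):
    """Return the peers that nodes[index] exchanges heartbeats with.

    span=1 gives 2 neighbors (one on each side); span=2 gives 4.
    The list order stands in for ring order (an assumption of this sketch).
    """
    n = len(nodes)
    neighbors = []
    for offset in range(1, span + 1):
        for peer in (nodes[(index - offset) % n], nodes[(index + offset) % n]):
            # Skip ourselves and duplicates, which only occur in tiny rings.
            if peer != nodes[index] and peer not in neighbors:
                neighbors.append(peer)
    return neighbors


def seconds_between_packets(packets_per_day):
    """Average gap between packets for a given daily packet budget."""
    return 86400 / packets_per_day


if __name__ == "__main__":
    small = [f"host{i}" for i in range(5)]
    large = [f"host{i}" for i in range(100_000)]
    # Per-node monitoring work is identical for both ring sizes.
    print(len(ring_neighbors(small, 0)), len(ring_neighbors(large, 0)))
    # 10K packets per day works out to one packet roughly every 9 seconds.
    print(seconds_between_packets(10_000))
```

Passing span=2 models the "(or 4)" case, where each server also watches the next-nearest neighbor on each side.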

Decreasing Network Footprint (planned)
• Support diagnosing switch issues
• Minimize network traffic
• Ideal for multi-site arrangements

Service Monitoring
Based on Linux-HA LRM
• LRM == Local Resource Manager
• Well-proven architecture:
  – “no news is good news” AKA management by exception
• Implements Open Cluster Framework standard (and others)
• Each system monitors own services
• Can also start, stop, migrate services
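"Management by exception" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the check callable, and the report callable are hypothetical and are not the LRM or OCF interfaces. The point is that a service is checked locally and a message is sent only when its status changes.

```python
class ExceptionReporter:
    """Checks one service locally and reports only when its status changes."""

    def __init__(self, name, check, report):
        self.name = name
        self.check = check    # callable returning True when the service is healthy
        self.report = report  # callable invoked as report(name, healthy) on a change
        self.last = None      # status is unknown until the first check

    def poll(self):
        healthy = bool(self.check())
        # The first poll reports the initial status; after that, silence
        # means nothing has changed ("no news is good news").
        if healthy != self.last:
            self.report(self.name, healthy)
            self.last = healthy
        return healthy


if __name__ == "__main__":
    sent = []
    results = iter([True, True, False, False, True])
    monitor = ExceptionReporter("sshd", lambda: next(results),
                                lambda name, ok: sent.append((name, ok)))
    for _ in range(5):
        monitor.poll()
    # Five local checks produce only three messages.
    print(sent)
```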

Monitoring Pros and Cons
Pros
• Simple & Scalable
• Uniform work distribution
• No single point of failure
• Distinguishes switch vs host failure
• Easy on LAN, WAN
Cons
• Active agents
• Potential slowness at power-on

How does this apply to clouds?
• Fits nicely into a cloud infrastructure
  – Should integrate into OpenStack, et al
  – Can control VMs
• Can monitor customer VMs
  – Add nanoprobe to base image
  – bottom level of rings disappear without LLDP or CDP

Architectural Details
• Nanoprobes
• CMA
• Neo4j

Nanoprobe Functions ('C')
Announce self to CMA
• Reserved multicast address (can be unicast address or name if no multicast)
Do what CMA says
• receive configuration information
  – CMA addresses, ports, defaults
• send/expect heartbeats
• perform discovery actions
• perform monitoring actions
No persistent state across reboots
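The "send/expect heartbeats" item can be pictured with the following Python sketch. The real nanoprobe is written in C, and nothing here reflects its actual interfaces: the class, method names, and the deadtime parameter are hypothetical. Timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatListener:
    """Tracks the peers we were told to expect heartbeats from."""

    def __init__(self, deadtime):
        self.deadtime = deadtime  # seconds of silence before a peer is declared dead
        self.last_seen = {}       # peer -> time of the most recent heartbeat

    def expect(self, peer, now):
        """Start expecting heartbeats from peer (as directed by the CMA)."""
        self.last_seen[peer] = now

    def heard(self, peer, now):
        """Record a heartbeat; ignore peers we were not told to expect."""
        if peer in self.last_seen:
            self.last_seen[peer] = now

    def dead_peers(self, now):
        """Return the peers that have been silent for longer than deadtime."""
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.deadtime)


if __name__ == "__main__":
    listener = HeartbeatListener(deadtime=10)
    listener.expect("left-neighbor", now=0)
    listener.expect("right-neighbor", now=0)
    listener.heard("left-neighbor", now=8)
    # Only the silent peer is reported; a healthy peer generates no news.
    print(listener.dead_peers(now=12))
```

Because the listener holds only what it was told after startup, the sketch is consistent with keeping no persistent state across reboots.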

Basic CMA Functions (python)
Nanoprobe management
• Configure & direct
• Hear alerts & discovery
• Update rings: join/leave
Update database
Issue alerts

Why a graph database? (Neo4j)
• Dependency & Discovery information: graph
• Speed of graph traversals depends on size of subgraph, not total graph size
• Root cause queries => graph traversals
  – notoriously slow in relational databases
• Visualization of relationships
• Schema-less design: good for constantly changing heterogeneous environment
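To make the traversal point concrete, here is a small Python sketch of a root-cause query over a dependency graph. A plain dictionary stands in for the graph database; this is not Neo4j's API, and the node names and function are invented for illustration. The walk only touches nodes reachable from the starting service, so unrelated parts of the graph cost nothing.

```python
from collections import deque


def root_causes(depends_on, failed, start):
    """Walk the dependencies of start, breadth first.

    Returns (causes, visited): the failed nodes that have no failed
    dependency beneath them, and how many nodes the walk touched.
    """
    seen = {start}
    queue = deque([start])
    causes = []
    while queue:
        node = queue.popleft()
        deps = depends_on.get(node, [])
        # A failed node whose own dependencies are all healthy is a root cause.
        if node in failed and not any(dep in failed for dep in deps):
            causes.append(node)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return causes, len(seen)


if __name__ == "__main__":
    depends_on = {
        "web": ["app"],
        "app": ["db", "cache"],
        "db": ["switch1"],
        "cache": ["switch1"],
        "unrelated1": ["unrelated2"],  # never visited when starting at "web"
    }
    failed = {"web", "app", "db", "switch1"}
    print(root_causes(depends_on, failed, "web"))
```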

How does discovery work?
Nanoprobe scripts perform discovery
• Each discovers one kind of information
• Can take arguments (in environment)
• Output JSON
CMA stores Discovery Information
• JSON stored in Neo4j database
• CMA discovery plugins => graph nodes and relationships
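A discovery script of the kind described here might look like the Python sketch below. It is not one of the project's scripts: the environment variable name DISCOVERY_LABEL and the output field names are assumptions chosen for illustration. It gathers one kind of information, takes its argument from the environment, and writes JSON to standard output.

```python
import json
import os
import platform


def discover(environ=None):
    """Collect basic operating system facts as a JSON-serializable dict."""
    environ = os.environ if environ is None else environ
    # DISCOVERY_LABEL is a hypothetical argument passed in the environment.
    label = environ.get("DISCOVERY_LABEL", "os")
    return {
        "kind": label,  # hypothetical field naming the type of discovery
        "data": {
            "nodename": platform.node(),
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
        },
    }


if __name__ == "__main__":
    # The script's only output is a single JSON document.
    print(json.dumps(discover(), indent=2))
```

Keeping each script to one kind of information means a new discovery type can be added without touching the others.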

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22
    },
    and so on...

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22
    },
    and so on...
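One way such client and server snippets could be turned into dependency relationships is sketched below in Python. This is a guess at the general technique, not the CMA's plugin code: the per-host "ips" and "procs" wrapper, the host name "desktop", and the assumption that 10.10.10.5 belongs to servidor are all invented for the example. Each client address is matched against the listening addresses of every host, treating 0.0.0.0 as "all of this host's addresses".

```python
def dependency_edges(hosts):
    """Return (client_host, client_proc, server_host, server_proc) tuples."""
    # Index every listener by the (ip, port) pairs it can be reached on.
    listeners = {}
    for host, info in hosts.items():
        for proc, details in info["procs"].items():
            for sock in details.get("listenaddrs", {}).values():
                ips = info["ips"] if sock["addr"] == "0.0.0.0" else [sock["addr"]]
                for ip in ips:
                    listeners[(ip, sock["port"])] = (host, proc)
    # Look up each client connection in that index.
    edges = []
    for host, info in hosts.items():
        for proc, details in info["procs"].items():
            for sock in details.get("clientaddrs", {}).values():
                target = listeners.get((sock["addr"], sock["port"]))
                if target is not None:
                    edges.append((host, proc) + target)
    return edges


if __name__ == "__main__":
    hosts = {
        "servidor": {
            "ips": ["10.10.10.5"],
            "procs": {"sshd": {"listenaddrs": {
                "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}}},
        },
        "desktop": {
            "ips": ["10.10.10.20"],
            "procs": {"ssh": {"clientaddrs": {
                "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}}},
        },
    }
    print(dependency_edges(hosts))
```

Each resulting tuple corresponds to one relationship that could be stored between a client process node and a server process node in the graph.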

Switch Discovery Data from LLDP (or CDP)
CRM transforms LLDP (CDP) Data to JSON

Current State
• First release was April 2013
• Great unit test infrastructure
• Nanoprobe code – works well
• Service monitoring works
• Lacking real digital signatures, encryption, compression
• Reliable UDP comm code all working
• CMA code works, much more to go
• Several discovery methods written
• Licensed under the GPL

Future Plans
• Production grade by end of year
• Support, commercial licenses
• “Real” digital signatures, compression, encryption
• Other security enhancements
• Much more discovery
• GUI
• Alerting
• Reporting
• Add Statistical Monitoring
• Best Practice Audits
• Dynamic (aka cloud) specialization
• Hundreds more ideas
  – See: https://trello.com/b/OpaED3AT

Get Involved!
Powerful Ideas and Infrastructure
Fun, ground-breaking project
Looking for early adopters, testers!!
Needs for every kind of skill
• Awesome User Interfaces (UI/UX)
• Evangelism, community building
• Test Code (simulate 10^6 servers!)
• Python, C, script coding
• Documentation
• Feedback: Testing, Ideas, Plans
• Many others!

Resistance Is Futile!
#AssimProj @OSSAlanR #AssimMon
Project Web Site http://assimproj.org
Blog techthoughts.typepad.com
lists.community.tummy.com/cgi-bin/mailman/admin/assimilation
