2014 Monitoring Meetup
Alan Robertson
December 04, 2014
Alan presents on the Assimilation Project
Transcript
Monitoring 2014
Modeling and Monitoring Hundreds of Thousands of Servers using The Assimilation Project
#AssimProj @OSSAlanR http://assimproj.org/
Alan Robertson <[email protected]>
Assimilation Systems Limited http://assimilationsystems.com
© 2014 Assimilation Systems Limited
Monitoring Meetup, 04 December 2014

Biography
• 35+ years in IT/development
  – 10 years in system management (SysAdmin)
• Founded Linux-HA project - led 1998-2007
  – aka “Heartbeat” - now called Pacemaker
• Founded Assimilation Project in 2010
• Founded Assimilation Systems Limited in 2013
• Alumnus of Bell Labs, SuSE, IBM

Highly Scalable Discovery-Driven Automation
Continuous Discovery integrated with extreme-scale Monitoring
• Continuous extensible discovery
  – systems, switches, services, dependencies
  – zero network footprint discovery process
• Extensible exception monitoring
  – more than 100K systems
• All data goes into central graph CMDB

Assimilation Project History
• Inspired by 2 million core computer (cyclops64)
• Concerns for extreme scale
• Topology aware monitoring
• Topology discovery w/out security issues
=> Discovery of everything!

An 8-dimensional overview
• Problems Addressed
• Unique Capabilities
• Distribution of Work
• Architectural Components
• Discovery Graph Schema
• Extensible Discovery API
• Current Status
• Project Needs

First Dimension: Problems Addressed
1. Risk Management at extreme scale
2. Maintaining detailed discovery database
3. Discovering systems you've forgotten
4. Discovering vulnerable and licensed software you're running – and where
5. Monitoring services, systems & switches
6. Finding services you aren't monitoring

Risk Management/Mitigation
• Intrusions
• Vulnerable Software
• Licensed Software
• Audit Risk
• Outages
• System management

Why Discovery? (DevOps)
• Documentation: incomplete, incorrect
• Dependencies: unknown
• Planning: Needs accurate data
• Best Practices: Verification needs data
• ITIL CMDB (Configuration Management Data Base)
Our Discovery: continuous, low-profile

Second Dimension: Unique Powerful Features
1. Continuous Discovery
2. Discovery: Zero network footprint
3. Centralized graph database
4. We know everything that changes
5. Discover and update dependency information
6. Discovery and monitoring tightly integrated
   – discovery drives automation

(even more) Features...
7. Discovery and monitoring easily extensible
8. Naturally scalable to > 100K systems
9. Minimal network load
10. Server failures distinguishable from switch failures
11. Best practice and vulnerability alerts
12. Multi-tenant support

This all sounds unreasonable...
• Huge scalability without complexity?
• Discovery without pings or port scans?
Really?

Third Dimension: Fully distributed work
Two philosophical underpinnings
1. Monitoring and Discovery are fully distributed
2. Reliable “no news is good news”
Only responses to changes are centralized

Simple Scalability
I can explain how we scale so your grandmother would understand...

Massive Scalability – or “I see dead servers in O(1) time”
• Adding systems does not increase the monitoring work on any system
• Each server monitors 2 (or 4) neighbors
• Each server monitors and discovers its own services
• Ring repair and alerting is O(n) – but a very small amount of work
Current Implementation
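The constant per-server cost described on this slide can be illustrated with a minimal Python sketch. It assumes a single ring in which each host watches its immediate predecessor and successor; the function names `ring_neighbors` and `monitoring_plan` and the host names are hypothetical and are not taken from the Assimilation code.

```python
def ring_neighbors(hosts, index):
    """Return the ring neighbors that hosts[index] would monitor."""
    n = len(hosts)
    if n < 2:
        return []  # a lone host has nobody to watch
    prev_host = hosts[(index - 1) % n]
    next_host = hosts[(index + 1) % n]
    # With only two hosts, predecessor and successor are the same peer.
    return sorted({prev_host, next_host})


def monitoring_plan(hosts):
    """Map every host to the peers it is responsible for watching."""
    return {host: ring_neighbors(hosts, i) for i, host in enumerate(hosts)}


small = monitoring_plan([f"host{i}" for i in range(5)])
large = monitoring_plan([f"host{i}" for i in range(50_000)])

# The busiest host watches the same number of peers in both rings.
max_small = max(len(peers) for peers in small.values())
max_large = max(len(peers) for peers in large.values())
print(max_small, max_large)
```

Growing the ring from 5 to 50,000 hosts leaves the per-host watch list at two entries; only the central bookkeeping (building the plan, repairing the ring) grows with the number of hosts.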

Minimizing Network Footprint (planned)
• Support diagnosing switch issues
• Minimize network traffic
• Ideal for multi-site arrangements

Fourth Dimension: Architectural Components
Three Architectural Components
1. Collective Management Authority
   • One CMA per installation
2. Nanoprobes (agents)
   • One per system
3. Data Storage
   • Central Neo4j graph database (CMDB)

Basic CMA Functions (python)
Nanoprobe management
• Configure & direct
• Hear alerts & discovery
• Update rings: join/leave
Update database
Issue alerts – provide event notification

Nanoprobe Functions ('C')
Announce self to CMA
• Default: use reserved multicast address
Do what CMA says
• receive configuration information
  – CMA addresses, ports, defaults
• send/expect heartbeats
• perform discovery actions
• perform monitoring actions
No persistent state across reboots
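The "send/expect heartbeats" item, combined with the "no news is good news" principle, can be sketched as follows. This is an illustration in Python, not the nanoprobe's C implementation: the class name `HeartbeatWatcher`, its methods, and the 3-second deadline are assumptions made for the example. Time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatWatcher:
    """Track expected heartbeats and report only peers that go silent."""

    def __init__(self, deadline_seconds):
        self.deadline = deadline_seconds
        self.last_seen = {}    # peer -> time of the last heartbeat
        self.reported = set()  # peers already reported as late

    def expect(self, peer, now):
        """Start expecting heartbeats from a peer."""
        self.last_seen[peer] = now

    def heartbeat(self, peer, now):
        """Record a heartbeat; a peer that returns can be reported again later."""
        self.last_seen[peer] = now
        self.reported.discard(peer)

    def check(self, now):
        """Return peers that have newly missed their deadline."""
        late = sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.deadline and peer not in self.reported
        )
        self.reported.update(late)
        return late


watcher = HeartbeatWatcher(deadline_seconds=3.0)
watcher.expect("hostA", now=0.0)
watcher.expect("hostB", now=0.0)
watcher.heartbeat("hostA", now=2.0)

quiet = watcher.check(now=2.5)     # everyone is on time: nothing to send
first = watcher.check(now=4.0)     # hostB has been silent for 4 seconds
repeat = watcher.check(now=4.5)    # hostB is not reported a second time
print(quiet, first, repeat)
```

While all peers are healthy, `check` returns an empty list, so nothing needs to travel to the central authority.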

Service Monitoring based on HA Technologies
• Well-proven architecture:
  – “no news is good news” AKA management by exception
• Implements Open Cluster Framework standard (LSB and others)
• Each system monitors own services
• Can also start, stop, migrate services

Monitoring Pros and Cons
Pros
• Simple & Scalable
• Uniform work distribution
• No single point of failure
• Distinguishes switch vs host failure
• Easy on LAN, WAN
• Multi-tenant approach
Cons
• Active agents
• Potential slowness at power-on
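One way topology data could help separate switch failures from host failures is sketched below. The slides do not describe the actual algorithm, so this heuristic is an assumption: if every host attached to a switch goes silent at once, the switch is the more likely culprit. The function name `classify_failures` and the host and switch names are hypothetical.

```python
def classify_failures(host_to_switch, down_hosts):
    """Split silent hosts into suspected switch failures and host failures."""
    down = set(down_hosts)

    # Group hosts by the switch they are attached to.
    by_switch = {}
    for host, switch in host_to_switch.items():
        by_switch.setdefault(switch, set()).add(host)

    # A switch is suspect when it serves several hosts and all are silent.
    suspect_switches = sorted(
        switch
        for switch, hosts in by_switch.items()
        if len(hosts) > 1 and hosts <= down
    )

    # Hosts behind a suspect switch are not counted as host failures.
    covered = set().union(*(by_switch[s] for s in suspect_switches))
    host_failures = sorted(down - covered)
    return suspect_switches, host_failures


topology = {"web1": "sw1", "web2": "sw1", "db1": "sw2", "db2": "sw2"}
switches, hosts = classify_failures(topology, ["web1", "web2", "db1"])
print(switches, hosts)
```

In this example both hosts on `sw1` are silent, so the switch is flagged, while `db1` is treated as an individual host failure because `db2` on the same switch is still alive.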

Why a graph database? (Neo4j)
• Humans describe systems as graphs
• Dependency & Discovery information: graph
• Speed of graph traversals depends on size of subgraph, not total graph size
• Root cause queries are graph traversals – notoriously slow in relational databases
• Visualization is Natural
• Schema-less design: good for constantly changing heterogeneous environment
• Graph Model === Object Model
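The point that traversal cost follows the size of the subgraph rather than the whole graph can be shown with a small breadth-first search in plain Python. This is an in-memory illustration, not Neo4j or the project's schema; the function name `impacted` and all node names are hypothetical.

```python
from collections import deque


def impacted(dependents, failed):
    """Return every node that directly or transitively depends on `failed`.

    `dependents` maps a node to the nodes that depend on it.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(failed)
    return seen


graph = {
    "switch1": ["db-host"],
    "db-host": ["postgres"],
    "postgres": ["webapp"],
    "other-host": ["cron"],
}
before = impacted(graph, "db-host")

# Add 10,000 unrelated systems; the traversal never visits them.
for i in range(10_000):
    graph[f"unrelated{i}"] = [f"service{i}"]
after = impacted(graph, "db-host")
print(sorted(before), before == after)
```

The search only follows edges reachable from the failed node, so the unrelated systems add nothing to the work done.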

A multi-dimensional demo
• Demonstrate basic capabilities
  – Discovery
  – Discovery-driven monitoring configuration
  – Discovery-driven 'tripwire-like' checksums
  – Monitoring – failures / successes
  – Host down notification
• No configuration was supplied – everything comes from discovery
http://assimilationsystems.com/90_second_demo/

Fifth Dimension: Discovery API
Scripts perform discovery – output JSON
Three Sample Discovery Snippets
• OS information
• Service discovery
• Client discovery

How does discovery work?
Nanoprobe scripts perform discovery
• Each discovers one kind of information
• Can take arguments from environment
• Output JSON
CMA stores Discovery Information
• JSON stored in Neo4j database
• CMA discovery plugins => graph nodes and relationships
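A discovery script of the kind described here (one kind of information, arguments from the environment, JSON on standard output) might look like the following Python sketch. It is not one of the project's scripts. The key names mirror the OS discovery snippet shown in the deck, while the function name `discover_os` and the environment variable `EXAMPLE_DISCOVERY_TAG` are hypothetical.

```python
import json
import os
import platform


def discover_os(environ=None):
    """Collect basic OS facts as a JSON-serializable dictionary."""
    if environ is None:
        environ = os.environ
    uname = platform.uname()
    info = {
        "nodename": uname.node,
        "kernel-name": uname.system,
        "kernel-release": uname.release,
        "kernel-version": uname.version,
        "machine": uname.machine,
    }
    # Hypothetical argument passed through the environment.
    tag = environ.get("EXAMPLE_DISCOVERY_TAG")
    if tag:
        info["tag"] = tag
    return info


if __name__ == "__main__":
    # A discovery script reports its findings as JSON on stdout.
    print(json.dumps(discover_os(), indent=2))
```

Keeping each script to a single kind of information means a new discovery type can be added without touching the existing ones.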

A Few Canned Queries
allipports      get all port/ip/service/hosts
allswitchports  get switch connections
crashed         get crashed servers
shutdown        get gracefully shutdown servers
downservices    get nonworking services
findip          get system owning IP
findmac         get system owning MAC
unknownips      get unknown IP addresses
unmonitored     get unmonitored services
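To make the intent of two of these queries concrete, here is a toy Python version that runs over an in-memory dictionary instead of the graph database. Only the query names `findip` and `unmonitored` come from the slide; the record layout, the IP address of `alanr-1225B`, and the `cups` service are invented for illustration.

```python
# Hypothetical discovery records: host -> addresses and service monitoring state.
SYSTEMS = {
    "alanr-1225B": {
        "ips": ["10.10.10.2"],
        "services": {"sshd": True, "cups": False},  # True means monitored
    },
    "servidor": {
        "ips": ["10.10.10.5"],
        "services": {"sshd": True},
    },
}


def findip(systems, ip):
    """Return the name of the system owning `ip`, or None if unknown."""
    for name, record in systems.items():
        if ip in record["ips"]:
            return name
    return None


def unmonitored(systems):
    """Return (system, service) pairs that were discovered but not monitored."""
    return sorted(
        (name, service)
        for name, record in systems.items()
        for service, is_monitored in record["services"].items()
        if not is_monitored
    )


print(findip(SYSTEMS, "10.10.10.5"))
print(unmonitored(SYSTEMS))
```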

OS discovery JSON Snippet
{
  "nodename": "alanr-1225B",
  "operating-system": "GNU/Linux",
  "machine": "x86_64",
  "processor": "x86_64",
  "hardware-platform": "x86_64",
  "kernel-name": "Linux",
  "kernel-release": "3.8.0-31-generic",
  "kernel-version": "#46-Ubuntu SMP ...",
  "Distributor ID": "Ubuntu",
  "Description": "Ubuntu 13.04",
  "Release": "13.04",
  "Codename": "raring"
}

sshd Service JSON Snippet (from netstat and /proc)
"sshd": {
  "exe": "/usr/sbin/sshd",
  "cmdline": [ "/usr/sbin/sshd", "-D" ],
  "uid": "root",
  "gid": "root",
  "cwd": "/",
  "listenaddrs": {
    "0.0.0.0:22": {
      "proto": "tcp",
      "addr": "0.0.0.0",
      "port": 22
    },

ssh Client JSON Snippet (from netstat and /proc)
"ssh": {
  "exe": "/usr/sbin/ssh",
  "cmdline": [ "ssh", "servidor" ],
  "uid": "alanr",
  "gid": "alanr",
  "cwd": "/home/alanr/monitor/src",
  "clientaddrs": {
    "10.10.10.5:22": {
      "proto": "tcp",
      "addr": "10.10.10.5",
      "port": 22
    },

Sixth Dimension: Graph Schema
Two Schema subgraphs
• Client / server dependency
• Switch interconnect

ssh -> sshd dependency graph
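A dependency edge like the one named on this slide can be derived by matching a client's `clientaddrs` entries against the `listenaddrs` of discovered services. The Python sketch below shows the idea on data shaped like the two JSON snippets in the deck. The function `find_dependencies`, the per-host record layout, and the assumptions that the client runs on `alanr-1225B` and that `servidor` owns 10.10.10.5 are all illustrative; the project's CMA plugins and graph schema may work differently.

```python
def find_dependencies(hosts):
    """Return ((client_host, client), (server_host, service)) edges."""
    # Index every listening endpoint by (ip, port).
    listeners = {}
    for hostname, host in hosts.items():
        for service, info in host.get("services", {}).items():
            for listen in info["listenaddrs"].values():
                # A wildcard listener is reachable on all of the host's addresses.
                if listen["addr"] == "0.0.0.0":
                    addrs = host["ips"]
                else:
                    addrs = [listen["addr"]]
                for ip in addrs:
                    listeners[(ip, listen["port"])] = (hostname, service)

    # Match each client connection against the listener index.
    edges = []
    for hostname, host in hosts.items():
        for client, info in host.get("clients", {}).items():
            for conn in info["clientaddrs"].values():
                target = listeners.get((conn["addr"], conn["port"]))
                if target is not None:
                    edges.append(((hostname, client), target))
    return edges


hosts = {
    "servidor": {
        "ips": ["10.10.10.5"],
        "services": {
            "sshd": {"listenaddrs": {
                "0.0.0.0:22": {"proto": "tcp", "addr": "0.0.0.0", "port": 22}}},
        },
    },
    "alanr-1225B": {
        "ips": ["10.10.10.2"],
        "clients": {
            "ssh": {"clientaddrs": {
                "10.10.10.5:22": {"proto": "tcp", "addr": "10.10.10.5", "port": 22}}},
        },
    },
}
edges = find_dependencies(hosts)
print(edges)
```

Because both sides come from discovery data, no one has to declare the dependency by hand.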

Switch Discovery
Data from LLDP (or CDP)

Seventh Dimension: Current Status
• Fourth release out 20 October 2014
  – next release (December?) will have encrypted comm
• Great unit tests
• Several discovery methods written
• Extensible Automated Discovery Triggers
• Discovery => Automatic Monitoring (WOOT!)
• Discovery => Network-Facing Checksums
• Command Line Queries
• Licenses: Commercial or GPLv3

Eighth Dimension: Get Involved!
We need you!
• Early adopters
• Testers, Continuous Integration
• Best practice experts
• Designers
• Developers (C, Python, Shell, PowerShell, JavaScript)
• Porters (esp Windows)
• Promoters, Publicists, Packagers, etc.

Resistance Is Futile!
These slides: bit.ly/AssimLFNW14
Mailing List: bit.ly/AssimML
#AssimProj @OSSAlanR
#assimilation on freenode IRC
Project Web Site: assimproj.org
Company Web Site: assimilationsystems.com