Efficient monitoring with open source tools (Time series databases)

Efficient monitoring with open source tools
Osman Ungur, github.com/o

Who am I?
• Software developer with a system administration background of over 10 years
• Mostly writes Java and PHP
• Also works on infrastructure design, system automation, deployment and monitoring
• Obsessed with clean, well-structured, maintainable and scalable architectures
• Loves open source: github.com/o

My career path
• In 2002, I started to learn the fundamentals of Linux networking and security. After that, I sold and managed dedicated servers and shared web hosting for years
• After the Linux administration years, in 2005 I dived into PHP and learned the principles of object-oriented programming
• In 2010, I started a company that uses Java, Spring Framework and a service-oriented architecture (SOA). I ported thousands of lines of PHP code to Java and gained experience with very large traffic. I slowly embraced Java, NoSQL, RESTful and microservice architectures
• Since August 2015, I have been working as a freelance consultant, trainer and developer. I am an active contributor to and author of open source projects

Today
• Why do I need monitoring?
• Best practices
• Time series databases
• Agents
• Dashboards
• Alerting

What is going on?

• What is your application doing right now?
• Will you be notified when a server fails?

How about fixing things?

• Fixing problems is difficult without logs and monitoring
• Sleep better with automation and monitoring

Customers and Boss

• Don't tolerate software errors
• Everyone hates "500 Server Error"
• Don't like slow websites

Loss of
• productivity
• money
• reputation
• time
• customers
• trust

What kind of problems? What to monitor?!?!

• Can the users hit my page?
• What is the 95th percentile page load time?
• Has our revenue increased?
• Which exceptions occurred most often in the last hour?
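The 95th percentile question can be sketched as follows. This is a minimal illustration using the nearest-rank method; the sample load times are made up, and a real setup would normally let the time series database or dashboard compute the percentile.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical page load times in milliseconds (illustrative data only).
load_times_ms = [120, 135, 150, 110, 980, 140, 125, 160, 130, 145]

p50 = percentile(load_times_ms, 50)
p95 = percentile(load_times_ms, 95)
print(f"median={p50} ms, p95={p95} ms")
```

The single slow request barely moves the median but dominates the 95th percentile, which is why percentiles are usually more telling than averages for page load time.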

• I didn't change the code, so why is something wonky?
• Which part of the system is inaccessible?
• Do I need to scale my servers up or down?
• Are my servers running over capacity?

If something fails?

You need to get it up and running ASAP

Our objective is reducing
• time to detect
• time to repair

Time series databases

A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace.
Wikipedia
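As a rough illustration of "numbers indexed by time", the sketch below keeps values sorted by timestamp and answers a time-range query with binary search. The `TimeSeries` class and its method names are hypothetical and exist only for this example; a real TSDB adds compression, retention and persistence on top of this idea.

```python
from bisect import bisect_left, bisect_right


class TimeSeries:
    """In-memory series of (timestamp, value) points kept in time order."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, timestamp, value):
        # Assumption for this sketch: points arrive in time order.
        if self.timestamps and timestamp < self.timestamps[-1]:
            raise ValueError("out-of-order point")
        self.timestamps.append(timestamp)
        self.values.append(value)

    def range(self, start, end):
        """Return all points with start <= timestamp <= end."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


# A temperature trace sampled every 60 seconds (illustrative data).
trace = TimeSeries()
for ts, celsius in [(0, 21.5), (60, 21.7), (120, 22.1), (180, 22.0)]:
    trace.append(ts, celsius)

print(trace.range(30, 120))
```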

RRDTool
• Round robin database tool (file based)
• Successor of MRTG
• Used by Nagios, Munin, Cacti, pfSense, Ganglia
• Storing and graphing capability
• Outdated data model, only command line interface
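The "round robin" idea can be sketched with a fixed-size buffer: once the archive is full, each new point overwrites the oldest one, so storage never grows. The `RoundRobinArchive` class is a hypothetical name for this example and does not reproduce RRDTool's file format or its consolidation into several archives of different resolutions.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive: new points push out the oldest ones."""

    def __init__(self, slots):
        # deque with maxlen discards the oldest item when it is full.
        self.points = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.points.append((timestamp, value))

    def fetch(self):
        return list(self.points)


# Keep only the last 3 samples (illustrative size and data).
archive = RoundRobinArchive(slots=3)
for minute in range(5):
    archive.update(minute * 60, minute * 10)

print(archive.fetch())
```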

Graphite
• Whisper database library (file based)
• Very popular, simple to operate
• Tons of tools that work with Graphite
• Comes with dashboard, nice functions
• Outdated data model, doesn't scale
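Part of the reason so many tools work with Graphite is its plaintext protocol: one line per data point in the form `<metric path> <value> <timestamp>`, sent over TCP to the Carbon line receiver (port 2003 by default). The sketch below assumes a Carbon instance on `localhost`; the metric name and helper function names are made up for the example.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(line, host="localhost", port=2003):
    """Send one formatted line to Carbon (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name; the timestamp is fixed so the output is stable.
line = graphite_line("web01.requests.count", 42, timestamp=1462000000)
print(line, end="")
```

Calling `send_to_graphite(line)` would deliver the point, provided a Carbon listener is actually running at the assumed address.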

InfluxDB
• Time Structured Merge Tree (TSM)
• Easy to operate, highly customisable
• Also supports events
• Good performance, InfluxQL
• Clustering removed from open source edition

Prometheus
• Local file per time series
• Pull based metric collectors, PromQL
• Easy to operate, good data model
• Efficient storage, good performance
• Also supports alerting
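"Pull based" means the application exposes its current metric values over HTTP and Prometheus scrapes them on a schedule. The sketch below renders samples in the Prometheus text exposition format and serves them at `/metrics` using only the standard library. The sample metrics, the `render_metrics` and `serve` helpers and port 8000 are assumptions for illustration; in practice an official client library would normally do this work.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical samples: (metric name, labels, current value).
SAMPLES = [
    ("http_requests_total", {"method": "get"}, 1027),
    ("process_open_fds", {}, 12),
]


def render_metrics(samples):
    """Render samples as text exposition lines: name{label="v"} value."""
    lines = []
    for name, labels, value in samples:
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        suffix = f"{{{label_text}}}" if label_text else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(SAMPLES), end="")
```

Running `serve()` and pointing a scrape job at the chosen port would let Prometheus collect these values; the application never pushes anything itself.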

OpenTSDB
• Hadoop backed
• Scales very well, moderate performance
• JSON over HTTP
• One of the first databases to use metric labels in its data model
• Painful to operate
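The "JSON over HTTP" and "metric labels" points can be sketched together: a data point is a JSON object with a metric name, a timestamp, a value and a set of tags, posted to the `/api/put` endpoint. The base URL below assumes a local OpenTSDB on its default port 4242; the metric, tags and helper names are made up for the example.

```python
import json
import urllib.request


def opentsdb_datapoint(metric, timestamp, value, tags):
    """Build one data point; OpenTSDB expects at least one tag."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {
        "metric": metric,
        "timestamp": timestamp,
        "value": value,
        "tags": dict(tags),
    }


def put_datapoints(points, base_url="http://localhost:4242"):
    """POST data points to /api/put (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/api/put",
        data=json.dumps(points).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Hypothetical metric and tags; tags play the role of labels.
point = opentsdb_datapoint(
    "sys.cpu.user", 1462000000, 18.5, {"host": "web01", "dc": "ams"}
)
print(json.dumps([point], sort_keys=True))
```

Calling `put_datapoints([point])` would send the data, provided an OpenTSDB instance is reachable at the assumed URL.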

RiakTS
• Riak backed
• Very easy to operate
• Moderate performance
• Highly resilient
• Good data model, SQL-like querying

DalmatinerDB
• Riak backed
• Very high performance
• Clustering and fault tolerance
• Works with ZFS, Postgres
• Limited client support

KairosDB
• Cassandra storage
• Fast writes
• Good data model
• Inefficient storage
• Slow to query

Blueflood
• Cassandra storage
• Good performance
• Highly scalable
• Outdated data model
• Metric processing system behind Rackspace Metrics

Others
• Druid
• Netflix Atlas
• Chronix Server

Complex monitoring suites
• Nagios
• Sensu
• Ganglia

Log based
• Graylog
• ELK
• Splunk

Agents
• Collectd
• Diamond
• Metrics

Alerting
• Riemann
• Seyren
• Icinga

Questions? github.com/o
