Osman
October 23, 2016

Efficient monitoring with open source tools (Time series databases)

Ozgur Web Gunleri 2016

Transcript

  1. Who am I? • Software developer with over 10 years of system administration background • Mostly writes Java and PHP • Also works on infrastructure design, system automation, deployment and monitoring • Obsessed with clean, well-structured, maintainable and scalable architectures • Loves open source github.com/o
  2. My career path • In 2002, I started learning the fundamentals of Linux networking and security. After that, I sold and managed dedicated servers and shared web hosting for years • After the Linux administration chapter, in 2005 I dived into PHP and learned the principles of object-oriented programming • In 2010, I started a company that uses Java, the Spring Framework and SOA. I ported thousands of lines of PHP code to Java and gained experience with very large traffic. Gradually I embraced Java, NoSQL, RESTful and microservice architectures • Since August 2015, I have been working as a freelance consultant, trainer and developer. I'm an active contributor to and author of open-source projects.
  3. Today • Why do I need it? • Best practices • Time-series databases • Agents • Dashboards • Alerting
  4. • What is your application doing right now? • Will you be notified when a server fails?
  5. • Fixing problems is difficult without logs and monitoring • Sleep better with automation and monitoring
  6. • Can users reach my page? • What is the 95th percentile page load time? • Has our revenue increased? • Which exceptions occurred most often in the last hour?
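
The 95th percentile question above can be answered directly from raw page load samples. A minimal sketch, assuming the nearest-rank definition of a percentile (other definitions interpolate between samples); the function name and the sample values are hypothetical:

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical page load times in milliseconds.
load_times_ms = [120, 95, 110, 2300, 130, 105, 98, 140, 115, 101,
                 99, 125, 108, 112, 135, 97, 103, 118, 122, 450]

p95 = percentile_nearest_rank(load_times_ms, 95)
print(p95)  # 450
```

With these samples the single 2300 ms outlier does not affect the result, which is why a percentile is usually more informative than an average for page load times.
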
  7. • I didn't change the code; is something wonky? • Which part of the system is inaccessible? • Do I need to scale my servers up or down? • Are my servers running over capacity?
  8. • Is the (rdbms|mq|cache) running healthily? • What is the (mem|cpu|disk|io) usage of the servers? • Is the (app server|web server|lb) up? • How does current (bandwidth|network) usage compare with previous weeks?
  9. A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace. Wikipedia
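
To make the definition concrete: a time series can be modelled as (timestamp, value) pairs kept in time order, and the typical query asks for a time range. A minimal in-memory sketch, not how any particular TSDB stores data; the class and method names are hypothetical:

```python
import bisect


class TimeSeries:
    """Values indexed by time, kept sorted by timestamp (Unix seconds)."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # Find the insert position so out-of-order points stay sorted.
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._values.insert(index, value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


# A hypothetical "temperature trace" sampled once a minute.
trace = TimeSeries()
trace.append(1477216800, 21.5)
trace.append(1477216920, 21.9)
trace.append(1477216860, 21.7)  # arrives late, still ends up in order

last_two_minutes = trace.range(1477216860, 1477216980)
```

Keeping the points sorted by time is what makes range queries cheap: both ends of the range are found with a binary search.
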
  10. RRDTool • Round robin database tool (file based) • Successor of MRTG • Used by Nagios, Munin, Cacti, pfSense, Ganglia • Storing and graphing capability • Outdated data model, only command line interface
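
The "round robin" in the name refers to fixed-size storage in which the newest sample overwrites the oldest, so the database never grows. A minimal sketch of that idea using a bounded deque; it illustrates the concept only and is not RRDTool's file format or API, and all names are hypothetical:

```python
from collections import deque


class RoundRobinArchive:
    """Keeps only the most recent `slots` samples; older ones are overwritten."""

    def __init__(self, slots):
        # A deque with maxlen drops the oldest entry when a new one is appended.
        self._samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self._samples.append((timestamp, value))

    def fetch(self):
        """Return the retained samples, oldest first."""
        return list(self._samples)


archive = RoundRobinArchive(slots=3)
for minute, cpu_percent in enumerate([10, 20, 30, 40, 50]):
    archive.update(minute * 60, cpu_percent)

retained = archive.fetch()
print(retained)  # [(120, 30), (180, 40), (240, 50)]
```

The trade-off is predictable disk usage in exchange for losing old data once the archive wraps around.
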
  11. Graphite • Whisper database library (file based) • Very popular, simple to operate • Tons of tools that work with Graphite • Comes with a dashboard and nice functions • Outdated data model, doesn't scale
  12. InfluxDB • Time Structured Merge Tree (TSM) • Easy to operate, highly customisable • Also supports events • Good performance, InfluxQL • Clustering removed from open source edition
  13. Prometheus • Local file per time series • Pull based metric collectors, PromQL • Easy to operate, good data model • Efficient storage, good performance • Also supports alerting
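
In a pull-based model the application only exposes its current metric values, and the collector fetches them on its own schedule. A minimal sketch of rendering samples in the line-oriented `name{label="value"} value` text form that Prometheus scrapes; no HTTP server is included, and the helper names and metric values are hypothetical:

```python
def escape_label_value(value):
    # Backslash, double quote and newline are escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample as `name{label="value",...} value`."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


# Hypothetical metrics an application might expose for scraping.
lines = [
    render_sample("http_requests_total", {"method": "get", "code": "200"}, 1027),
    render_sample("process_open_fds", {}, 12),
]
body = "\n".join(lines) + "\n"
```

Labels are sorted here only to make the output deterministic.
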
  14. OpenTSDB • Hadoop backed • Scales very well, moderate performance • JSON over HTTP • One of the first databases to use metric labels in its data model • Painful to operate
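
Metric labels mean that one metric name can cover many series, each identified by its set of tags. A minimal sketch that builds a JSON body, assuming the data point shape used by OpenTSDB's HTTP put endpoint (`metric`, `timestamp`, `value`, `tags`); nothing is sent over the network, and the metric name, hosts and values are hypothetical:

```python
import json


def make_data_point(metric, timestamp, value, tags):
    """Build one data point; the tags are the labels that identify the series."""
    return {
        "metric": metric,
        "timestamp": timestamp,
        "value": value,
        "tags": dict(tags),
    }


# Same metric name, different tags: two distinct time series.
points = [
    make_data_point("sys.cpu.user", 1477216800, 42.5, {"host": "web01", "dc": "ist"}),
    make_data_point("sys.cpu.user", 1477216800, 17.0, {"host": "web02", "dc": "ist"}),
]

payload = json.dumps(points, sort_keys=True)
```

Because the host is a tag rather than part of the metric name, a query can aggregate across all hosts or filter down to a single one.
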
  15. RiakTS • Riak backed • Very easy to operate • Moderate performance • Highly resilient • Good data model, SQL-like querying
  16. DalmatinerDB • Riak backed • Very high performance • Clustering and fault tolerance • Works with ZFS, Postgres • Limited client support
  17. KairosDB • Cassandra storage • Fast writes • Good data model • Inefficient storage • Slow to query
  18. Blueflood • Cassandra storage • Good performance • Highly scalable • Outdated data model • Metric processing system behind Rackspace Metrics