
Move over Graphite, Prometheus is here - php[tek]

We all agree metrics are important, and Graphite's a great tool for capturing them. However, in the last few years the metrics space has seen a wave of great tools that blow Graphite out of the water, one of which is Prometheus, originally built at SoundCloud. Prometheus lets you query your data along any dimension while still storing it in a highly efficient format.

Together, we’ll take a look at how to get started with Prometheus, including how to create dashboards with Grafana and alerts using AlertManager. By the time you leave, you’ll understand how Prometheus works and will be itching to add it to your projects!

Michael Heap

May 31, 2018

Transcript

  1. #phptek @mheap How many 500 errors in the last 5 minutes? Is our data processing rate better, worse, or the same as this time last week? How many concurrent users do we have?
  2. #phptek @mheap How many active phone calls are there? What's the average call duration? How many calls have there been to 441234567890 today?
  3. #phptek @mheap
     # HELP node_filesystem_free_bytes Filesystem free space in bytes.
     # TYPE node_filesystem_free_bytes gauge
     node_filesystem_free_bytes{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 4.9138515968e+10
     node_filesystem_free_bytes{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 3.62441240576e+11
     node_filesystem_free_bytes{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 4.36741931008e+11
  4. #phptek @mheap # HELP go_gc_duration_seconds A summary of the GC

    invocation durations. # TYPE go_gc_duration_seconds summary go_gc_duration_seconds{quantile="0"} 0 go_gc_duration_seconds{quantile="0.25"} 0 go_gc_duration_seconds{quantile="0.5"} 0 go_gc_duration_seconds{quantile="0.75"} 0 go_gc_duration_seconds{quantile="1"} 0 go_gc_duration_seconds_sum 0 go_gc_duration_seconds_count 0 # HELP go_goroutines Number of goroutines that currently exist. # TYPE go_goroutines gauge go_goroutines 6 # HELP go_info Information about the Go environment. # TYPE go_info gauge go_info{version="go1.10"} 1 # HELP go_memstats_alloc_bytes Number of bytes allocated and still in use. # TYPE go_memstats_alloc_bytes gauge go_memstats_alloc_bytes 827952 # HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed. # TYPE go_memstats_alloc_bytes_total counter go_memstats_alloc_bytes_total 827952 # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table. # TYPE go_memstats_buck_hash_sys_bytes gauge go_memstats_buck_hash_sys_bytes 1.443286e+06 # HELP go_memstats_frees_total Total number of frees. # TYPE go_memstats_frees_total counter go_memstats_frees_total 243 # HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started. # TYPE go_memstats_gc_cpu_fraction gauge go_memstats_gc_cpu_fraction 0 # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata. # TYPE go_memstats_gc_sys_bytes gauge go_memstats_gc_sys_bytes 169984 # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use. # TYPE go_memstats_heap_alloc_bytes gauge go_memstats_heap_alloc_bytes 827952 # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used. # TYPE go_memstats_heap_idle_bytes gauge go_memstats_heap_idle_bytes 761856 # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use. # TYPE go_memstats_heap_inuse_bytes gauge go_memstats_heap_inuse_bytes 1.990656e+06 # HELP go_memstats_heap_objects Number of allocated objects. # TYPE go_memstats_heap_objects gauge go_memstats_heap_objects 7710 # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS. # TYPE go_memstats_heap_released_bytes gauge go_memstats_heap_released_bytes 0 # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system. # TYPE go_memstats_heap_sys_bytes gauge go_memstats_heap_sys_bytes 2.752512e+06 # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection. # TYPE go_memstats_last_gc_time_seconds gauge go_memstats_last_gc_time_seconds 0 # HELP go_memstats_lookups_total Total number of pointer lookups. # TYPE go_memstats_lookups_total counter go_memstats_lookups_total 5 # HELP go_memstats_mallocs_total Total number of mallocs. # TYPE go_memstats_mallocs_total counter go_memstats_mallocs_total 7953 # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures. # TYPE go_memstats_mcache_inuse_bytes gauge go_memstats_mcache_inuse_bytes 6944 # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system. # TYPE go_memstats_mcache_sys_bytes gauge go_memstats_mcache_sys_bytes 16384 # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures. # TYPE go_memstats_mspan_inuse_bytes gauge go_memstats_mspan_inuse_bytes 30096 # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system. 
# TYPE go_memstats_mspan_sys_bytes gauge go_memstats_mspan_sys_bytes 32768 # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place. # TYPE go_memstats_next_gc_bytes gauge go_memstats_next_gc_bytes 4.473924e+06 # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations. # TYPE go_memstats_other_sys_bytes gauge go_memstats_other_sys_bytes 1.059618e+06 # HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator. # TYPE go_memstats_stack_inuse_bytes gauge go_memstats_stack_inuse_bytes 393216 # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator. # TYPE go_memstats_stack_sys_bytes gauge go_memstats_stack_sys_bytes 393216 # HELP go_memstats_sys_bytes Number of bytes obtained from system. # TYPE go_memstats_sys_bytes gauge go_memstats_sys_bytes 5.867768e+06 # HELP go_threads Number of OS threads created. # TYPE go_threads gauge go_threads 7 # HELP node_cpu_seconds_total Seconds the cpus spent in each mode. # TYPE node_cpu_seconds_total counter node_cpu_seconds_total{cpu="0",mode="idle"} 537140.75 node_cpu_seconds_total{cpu="0",mode="nice"} 0 node_cpu_seconds_total{cpu="0",mode="system"} 202810.13 node_cpu_seconds_total{cpu="0",mode="user"} 236956.35 node_cpu_seconds_total{cpu="1",mode="idle"} 789924.55 node_cpu_seconds_total{cpu="1",mode="nice"} 0 node_cpu_seconds_total{cpu="1",mode="system"} 76430.46 node_cpu_seconds_total{cpu="1",mode="user"} 110379.86 node_cpu_seconds_total{cpu="2",mode="idle"} 521434.82 node_cpu_seconds_total{cpu="2",mode="nice"} 0 node_cpu_seconds_total{cpu="2",mode="system"} 206715.68 node_cpu_seconds_total{cpu="2",mode="user"} 248584.66 node_cpu_seconds_total{cpu="3",mode="idle"} 788754.35 node_cpu_seconds_total{cpu="3",mode="nice"} 0 node_cpu_seconds_total{cpu="3",mode="system"} 76188.77 node_cpu_seconds_total{cpu="3",mode="user"} 111791.47 # HELP node_disk_read_bytes_total The total number of bytes read successfully. # TYPE node_disk_read_bytes_total counter node_disk_read_bytes_total{device="disk0"} 6.22708862976e+11 node_disk_read_bytes_total{device="disk3"} 1.12842752e+08 # HELP node_disk_read_seconds_total The total number of seconds spent by all reads. # TYPE node_disk_read_seconds_total counter node_disk_read_seconds_total{device="disk0"} 22165.627411002 node_disk_read_seconds_total{device="disk3"} 67.88703918 # HELP node_disk_read_sectors_total The total number of sectors read successfully. # TYPE node_disk_read_sectors_total counter node_disk_read_sectors_total{device="disk0"} 4327.06494140625 node_disk_read_sectors_total{device="disk3"} 7.34765625 # HELP node_disk_reads_completed_total The total number of reads completed successfully. # TYPE node_disk_reads_completed_total counter node_disk_reads_completed_total{device="disk0"} 1.7723658e+07 node_disk_reads_completed_total{device="disk3"} 3762 # HELP node_disk_write_seconds_total This is the total number of seconds spent by all writes. # TYPE node_disk_write_seconds_total counter node_disk_write_seconds_total{device="disk0"} 8632.255762983 node_disk_write_seconds_total{device="disk3"} 0 # HELP node_disk_writes_completed_total The total number of writes completed successfully. # TYPE node_disk_writes_completed_total counter node_disk_writes_completed_total{device="disk0"} 1.9779856e+07 node_disk_writes_completed_total{device="disk3"} 0 # HELP node_disk_written_bytes_total The total number of bytes written successfully. 
# TYPE node_disk_written_bytes_total counter node_disk_written_bytes_total{device="disk0"} 6.94838308864e+11 node_disk_written_bytes_total{device="disk3"} 0 # HELP node_disk_written_sectors_total The total number of sectors written successfully. # TYPE node_disk_written_sectors_total counter node_disk_written_sectors_total{device="disk0"} 4829.06640625 node_disk_written_sectors_total{device="disk3"} 0 # HELP node_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which node_exporter was built. # TYPE node_exporter_build_info gauge node_exporter_build_info{branch="HEAD",goversion="go1.10",revision="002c1ca02917406cbecc457162e2bdb1f29c2f49",version="0.16.0-rc.0"} 1 # HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes. # TYPE node_filesystem_avail_bytes gauge node_filesystem_avail_bytes{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 4.7416078336e+10 node_filesystem_avail_bytes{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 3.58532878336e+11 node_filesystem_avail_bytes{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 3.58565429248e+11 node_filesystem_avail_bytes{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 2.322432e+07 node_filesystem_avail_bytes{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_avail_bytes{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device. # TYPE node_filesystem_device_error gauge node_filesystem_device_error{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 0 node_filesystem_device_error{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 0 node_filesystem_device_error{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 0 node_filesystem_device_error{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 0 node_filesystem_device_error{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_device_error{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_files Filesystem total file nodes. # TYPE node_filesystem_files gauge node_filesystem_files{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 9.223372036854776e+18 node_filesystem_files{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 9.223372036854776e+18 node_filesystem_files{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 9.223372036854776e+18 node_filesystem_files{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 4.294967279e+09 node_filesystem_files{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_files{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_files_free Filesystem total free file nodes. 
# TYPE node_filesystem_files_free gauge node_filesystem_files_free{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 9.223372036854352e+18 node_filesystem_files_free{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 9.223372036853541e+18 node_filesystem_files_free{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 9.223372036854776e+18 node_filesystem_files_free{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 4.294964965e+09 node_filesystem_files_free{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_files_free{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_free_bytes Filesystem free space in bytes. # TYPE node_filesystem_free_bytes gauge node_filesystem_free_bytes{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 4.9138515968e+10 node_filesystem_free_bytes{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 3.62441240576e+11 node_filesystem_free_bytes{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 4.36741931008e+11 node_filesystem_free_bytes{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 2.322432e+07 node_filesystem_free_bytes{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_free_bytes{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_readonly Filesystem read-only status. # TYPE node_filesystem_readonly gauge node_filesystem_readonly{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 0 node_filesystem_readonly{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 0 node_filesystem_readonly{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 0 node_filesystem_readonly{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 1 node_filesystem_readonly{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_readonly{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_filesystem_size_bytes Filesystem size in bytes. # TYPE node_filesystem_size_bytes gauge node_filesystem_size_bytes{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 5.9999997952e+10 node_filesystem_size_bytes{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 4.3996317696e+11 node_filesystem_size_bytes{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 4.3996317696e+11 node_filesystem_size_bytes{device="/dev/disk3s2",fstype="hfs",mountpoint="/Volumes/Deckset"} 1.3418496e+08 node_filesystem_size_bytes{device="map -hosts",fstype="autofs",mountpoint="/net"} 0 node_filesystem_size_bytes{device="map auto_home",fstype="autofs",mountpoint="/home"} 0 # HELP node_load1 1m load average. # TYPE node_load1 gauge node_load1 2.451171875 # HELP node_load15 15m load average. # TYPE node_load15 gauge node_load15 2.7646484375 # HELP node_load5 5m load average. # TYPE node_load5 gauge node_load5 2.6083984375 # HELP node_memory_active_bytes_total Memory information field active_bytes_total. # TYPE node_memory_active_bytes_total gauge node_memory_active_bytes_total 3.251331072e+09 # HELP node_memory_bytes_total Memory information field bytes_total. # TYPE node_memory_bytes_total gauge node_memory_bytes_total 1.7179869184e+10 # HELP node_memory_free_bytes_total Memory information field free_bytes_total. # TYPE node_memory_free_bytes_total gauge node_memory_free_bytes_total 5.61926144e+08 # HELP node_memory_inactive_bytes_total Memory information field inactive_bytes_total. 
# TYPE node_memory_inactive_bytes_total gauge node_memory_inactive_bytes_total 3.997949952e+09 # HELP node_memory_swapped_in_pages_total Memory information field swapped_in_pages_total. # TYPE node_memory_swapped_in_pages_total gauge node_memory_swapped_in_pages_total 2.51926528e+09 # HELP node_memory_swapped_out_pages_total Memory information field swapped_out_pages_total. # TYPE node_memory_swapped_out_pages_total gauge node_memory_swapped_out_pages_total 3.131211776e+09 # HELP node_memory_wired_bytes_total Memory information field wired_bytes_total. # TYPE node_memory_wired_bytes_total gauge node_memory_wired_bytes_total 3.211726848e+09 # HELP node_network_receive_bytes_total Network device statistic receive_bytes. # TYPE node_network_receive_bytes_total counter node_network_receive_bytes_total{device="XHC0"} 0 node_network_receive_bytes_total{device="XHC1"} 0 node_network_receive_bytes_total{device="XHC20"} 0 node_network_receive_bytes_total{device="awdl0"} 5120 node_network_receive_bytes_total{device="bridge0"} 0 node_network_receive_bytes_total{device="en0"} 1.214772224e+09 node_network_receive_bytes_total{device="en1"} 0 node_network_receive_bytes_total{device="en2"} 0 node_network_receive_bytes_total{device="en3"} 0 node_network_receive_bytes_total{device="en4"} 0 node_network_receive_bytes_total{device="en5"} 1.000448e+06 node_network_receive_bytes_total{device="gif0"} 0 node_network_receive_bytes_total{device="lo0"} 2.01657344e+09 node_network_receive_bytes_total{device="p2p0"} 0 node_network_receive_bytes_total{device="stf0"} 0 node_network_receive_bytes_total{device="utun0"} 0 node_network_receive_bytes_total{device="utun1"} 505856 node_network_receive_bytes_total{device="utun2"} 23552 node_network_receive_bytes_total{device="utun3"} 46080 node_network_receive_bytes_total{device="utun4"} 0 node_network_receive_bytes_total{device="utun5"} 0 node_network_receive_bytes_total{device="utun6"} 0 node_network_receive_bytes_total{device="vboxnet0"} 1.631232e+06 # HELP node_network_receive_errs_total Network device statistic receive_errs. # TYPE node_network_receive_errs_total counter node_network_receive_errs_total{device="XHC0"} 0 node_network_receive_errs_total{device="XHC1"} 0 node_network_receive_errs_total{device="XHC20"} 0 node_network_receive_errs_total{device="awdl0"} 0 node_network_receive_errs_total{device="bridge0"} 0 node_network_receive_errs_total{device="en0"} 0 node_network_receive_errs_total{device="en1"} 0 node_network_receive_errs_total{device="en2"} 0 node_network_receive_errs_total{device="en3"} 0 node_network_receive_errs_total{device="en4"} 0 node_network_receive_errs_total{device="en5"} 0 node_network_receive_errs_total{device="gif0"} 0 node_network_receive_errs_total{device="lo0"} 0 node_network_receive_errs_total{device="p2p0"} 0 node_network_receive_errs_total{device="stf0"} 0 node_network_receive_errs_total{device="utun0"} 0 node_network_receive_errs_total{device="utun1"} 0 node_network_receive_errs_total{device="utun2"} 0 node_network_receive_errs_total{device="utun3"} 0 node_network_receive_errs_total{device="utun4"} 0 node_network_receive_errs_total{device="utun5"} 0 node_network_receive_errs_total{device="utun6"} 0 node_network_receive_errs_total{device="vboxnet0"} 0 # HELP node_network_receive_multicast_total Network device statistic receive_multicast. 
# TYPE node_network_receive_multicast_total counter node_network_receive_multicast_total{device="XHC0"} 0 node_network_receive_multicast_total{device="XHC1"} 0 node_network_receive_multicast_total{device="XHC20"} 0 node_network_receive_multicast_total{device="awdl0"} 33 node_network_receive_multicast_total{device="bridge0"} 0 node_network_receive_multicast_total{device="en0"} 5.331321e+06 node_network_receive_multicast_total{device="en1"} 0 node_network_receive_multicast_total{device="en2"} 0 node_network_receive_multicast_total{device="en3"} 0 node_network_receive_multicast_total{device="en4"} 0 node_network_receive_multicast_total{device="en5"} 4 node_network_receive_multicast_total{device="gif0"} 0 node_network_receive_multicast_total{device="lo0"} 266605 node_network_receive_multicast_total{device="p2p0"} 0 node_network_receive_multicast_total{device="stf0"} 0 node_network_receive_multicast_total{device="utun0"} 0 node_network_receive_multicast_total{device="utun1"} 0 node_network_receive_multicast_total{device="utun2"} 0 node_network_receive_multicast_total{device="utun3"} 0 node_network_receive_multicast_total{device="utun4"} 0 node_network_receive_multicast_total{device="utun5"} 0 node_network_receive_multicast_total{device="utun6"} 0 node_network_receive_multicast_total{device="vboxnet0"} 98 # HELP node_network_receive_packets_total Network device statistic receive_packets. # TYPE node_network_receive_packets_total counter node_network_receive_packets_total{device="XHC0"} 0 node_network_receive_packets_total{device="XHC1"} 0 node_network_receive_packets_total{device="XHC20"} 0 node_network_receive_packets_total{device="awdl0"} 42 node_network_receive_packets_total{device="bridge0"} 0 node_network_receive_packets_total{device="en0"} 5.6394197e+07 node_network_receive_packets_total{device="en1"} 0 node_network_receive_packets_total{device="en2"} 0 node_network_receive_packets_total{device="en3"} 0 node_network_receive_packets_total{device="en4"} 0 node_network_receive_packets_total{device="en5"} 4299 node_network_receive_packets_total{device="gif0"} 0 node_network_receive_packets_total{device="lo0"} 3.243677e+06 node_network_receive_packets_total{device="p2p0"} 0 node_network_receive_packets_total{device="stf0"} 0 node_network_receive_packets_total{device="utun0"} 0 node_network_receive_packets_total{device="utun1"} 3548 node_network_receive_packets_total{device="utun2"} 168 node_network_receive_packets_total{device="utun3"} 226 node_network_receive_packets_total{device="utun4"} 0 node_network_receive_packets_total{device="utun5"} 0 node_network_receive_packets_total{device="utun6"} 0 node_network_receive_packets_total{device="vboxnet0"} 1533 # HELP node_network_transmit_bytes_total Network device statistic transmit_bytes. 
# TYPE node_network_transmit_bytes_total counter node_network_transmit_bytes_total{device="XHC0"} 0 node_network_transmit_bytes_total{device="XHC1"} 0 node_network_transmit_bytes_total{device="XHC20"} 0 node_network_transmit_bytes_total{device="awdl0"} 1.50016e+06 node_network_transmit_bytes_total{device="bridge0"} 0 node_network_transmit_bytes_total{device="en0"} 2.575358976e+09 node_network_transmit_bytes_total{device="en1"} 0 node_network_transmit_bytes_total{device="en2"} 0 node_network_transmit_bytes_total{device="en3"} 0 node_network_transmit_bytes_total{device="en4"} 0 node_network_transmit_bytes_total{device="en5"} 483328 node_network_transmit_bytes_total{device="gif0"} 0 node_network_transmit_bytes_total{device="lo0"} 2.01657344e+09 node_network_transmit_bytes_total{device="p2p0"} 0 node_network_transmit_bytes_total{device="stf0"} 0 node_network_transmit_bytes_total{device="utun0"} 0 node_network_transmit_bytes_total{device="utun1"} 493568 node_network_transmit_bytes_total{device="utun2"} 23552 node_network_transmit_bytes_total{device="utun3"} 46080 node_network_transmit_bytes_total{device="utun4"} 0 node_network_transmit_bytes_total{device="utun5"} 0 node_network_transmit_bytes_total{device="utun6"} 0 node_network_transmit_bytes_total{device="vboxnet0"} 1.695744e+06 # HELP node_network_transmit_errs_total Network device statistic transmit_errs. # TYPE node_network_transmit_errs_total counter node_network_transmit_errs_total{device="XHC0"} 0 node_network_transmit_errs_total{device="XHC1"} 0 node_network_transmit_errs_total{device="XHC20"} 0 node_network_transmit_errs_total{device="awdl0"} 0 node_network_transmit_errs_total{device="bridge0"} 0 node_network_transmit_errs_total{device="en0"} 0 node_network_transmit_errs_total{device="en1"} 0 node_network_transmit_errs_total{device="en2"} 0 node_network_transmit_errs_total{device="en3"} 0 node_network_transmit_errs_total{device="en4"} 0 node_network_transmit_errs_total{device="en5"} 0 node_network_transmit_errs_total{device="gif0"} 0 node_network_transmit_errs_total{device="lo0"} 0 node_network_transmit_errs_total{device="p2p0"} 0 node_network_transmit_errs_total{device="stf0"} 0 node_network_transmit_errs_total{device="utun0"} 0 node_network_transmit_errs_total{device="utun1"} 0 node_network_transmit_errs_total{device="utun2"} 0 node_network_transmit_errs_total{device="utun3"} 0 node_network_transmit_errs_total{device="utun4"} 0 node_network_transmit_errs_total{device="utun5"} 0 node_network_transmit_errs_total{device="utun6"} 0 node_network_transmit_errs_total{device="vboxnet0"} 0 # HELP node_network_transmit_multicast_total Network device statistic transmit_multicast. 
# TYPE node_network_transmit_multicast_total counter node_network_transmit_multicast_total{device="XHC0"} 0 node_network_transmit_multicast_total{device="XHC1"} 0 node_network_transmit_multicast_total{device="XHC20"} 0 node_network_transmit_multicast_total{device="awdl0"} 0 node_network_transmit_multicast_total{device="bridge0"} 0 node_network_transmit_multicast_total{device="en0"} 0 node_network_transmit_multicast_total{device="en1"} 0 node_network_transmit_multicast_total{device="en2"} 0 node_network_transmit_multicast_total{device="en3"} 0 node_network_transmit_multicast_total{device="en4"} 0 node_network_transmit_multicast_total{device="en5"} 0 node_network_transmit_multicast_total{device="gif0"} 0 node_network_transmit_multicast_total{device="lo0"} 0 node_network_transmit_multicast_total{device="p2p0"} 0 node_network_transmit_multicast_total{device="stf0"} 0 node_network_transmit_multicast_total{device="utun0"} 0 node_network_transmit_multicast_total{device="utun1"} 0 node_network_transmit_multicast_total{device="utun2"} 0 node_network_transmit_multicast_total{device="utun3"} 0 node_network_transmit_multicast_total{device="utun4"} 0 node_network_transmit_multicast_total{device="utun5"} 0 node_network_transmit_multicast_total{device="utun6"} 0 node_network_transmit_multicast_total{device="vboxnet0"} 0 # HELP node_network_transmit_packets_total Network device statistic transmit_packets. # TYPE node_network_transmit_packets_total counter node_network_transmit_packets_total{device="XHC0"} 0 node_network_transmit_packets_total{device="XHC1"} 0 node_network_transmit_packets_total{device="XHC20"} 0 node_network_transmit_packets_total{device="awdl0"} 6691 node_network_transmit_packets_total{device="bridge0"} 1 node_network_transmit_packets_total{device="en0"} 3.2582836e+07 node_network_transmit_packets_total{device="en1"} 0 node_network_transmit_packets_total{device="en2"} 0 node_network_transmit_packets_total{device="en3"} 0 node_network_transmit_packets_total{device="en4"} 0 node_network_transmit_packets_total{device="en5"} 4145 node_network_transmit_packets_total{device="gif0"} 0 node_network_transmit_packets_total{device="lo0"} 3.243677e+06 node_network_transmit_packets_total{device="p2p0"} 0 node_network_transmit_packets_total{device="stf0"} 0 node_network_transmit_packets_total{device="utun0"} 2 node_network_transmit_packets_total{device="utun1"} 3236 node_network_transmit_packets_total{device="utun2"} 160 node_network_transmit_packets_total{device="utun3"} 223 node_network_transmit_packets_total{device="utun4"} 2 node_network_transmit_packets_total{device="utun5"} 2 node_network_transmit_packets_total{device="utun6"} 2 node_network_transmit_packets_total{device="vboxnet0"} 73766 # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape. # TYPE node_scrape_collector_duration_seconds gauge node_scrape_collector_duration_seconds{collector="cpu"} 0.00013298 node_scrape_collector_duration_seconds{collector="diskstats"} 0.000803364 node_scrape_collector_duration_seconds{collector="filesystem"} 0.000119007 node_scrape_collector_duration_seconds{collector="loadavg"} 2.3448e-05 node_scrape_collector_duration_seconds{collector="meminfo"} 5.3036e-05 node_scrape_collector_duration_seconds{collector="netdev"} 0.000338404 node_scrape_collector_duration_seconds{collector="textfile"} 1.7727e-05 node_scrape_collector_duration_seconds{collector="time"} 2.8571e-05 # HELP node_scrape_collector_success node_exporter: Whether a collector succeeded. 
# TYPE node_scrape_collector_success gauge node_scrape_collector_success{collector="cpu"} 1 node_scrape_collector_success{collector="diskstats"} 1 node_scrape_collector_success{collector="filesystem"} 1 node_scrape_collector_success{collector="loadavg"} 1 node_scrape_collector_success{collector="meminfo"} 1 node_scrape_collector_success{collector="netdev"} 1 node_scrape_collector_success{collector="textfile"} 1 node_scrape_collector_success{collector="time"} 1 # HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise # TYPE node_textfile_scrape_error gauge node_textfile_scrape_error 0 # HELP node_time_seconds System time in seconds since epoch (1970). # TYPE node_time_seconds gauge node_time_seconds 1.5210412225783854e+09 # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served. # TYPE promhttp_metric_handler_requests_in_flight gauge promhttp_metric_handler_requests_in_flight 1 # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code. # TYPE promhttp_metric_handler_requests_total counter promhttp_metric_handler_requests_total{code="200"} 0 promhttp_metric_handler_requests_total{code="500"} 0 promhttp_metric_handler_requests_total{code="503"} 0
  5. #phptek @mheap
     # HELP node_filesystem_readonly Filesystem read-only status.
     # TYPE node_filesystem_readonly gauge
     node_filesystem_readonly{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 0
     node_filesystem_readonly{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 0
     node_filesystem_readonly{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 0
     node_filesystem_readonly{device="map -hosts",fstype="autofs",mountpoint="/net"} 0
     node_filesystem_readonly{device="map auto_home",fstype="autofs",mountpoint="/home"} 0
     # HELP node_filesystem_size_bytes Filesystem size in bytes.
     # TYPE node_filesystem_size_bytes gauge
     node_filesystem_size_bytes{device="/dev/disk1s1",fstype="apfs",mountpoint="/Volumes/Macintosh HD"} 5.9999997952e+10
     node_filesystem_size_bytes{device="/dev/disk2s1",fstype="apfs",mountpoint="/"} 4.3996317696e+11
     node_filesystem_size_bytes{device="/dev/disk2s4",fstype="apfs",mountpoint="/private/var/vm"} 4.3996317696e+11
     node_filesystem_size_bytes{device="map -hosts",fstype="autofs",mountpoint="/net"} 0
     node_filesystem_size_bytes{device="map auto_home",fstype="autofs",mountpoint="/home"} 0
     # HELP node_load1 1m load average.
     # TYPE node_load1 gauge
     node_load1 2.451171875
     # HELP node_load15 15m load average.
     # TYPE node_load15 gauge
     node_load15 2.7646484375
  6. #phptek @mheap node_exporter
     Key         Description
     arp         Exposes ARP statistics from /proc/net/arp.
     cpu         Exposes CPU statistics.
     filesystem  Exposes filesystem statistics, such as disk space used.
     ipvs        Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats.
     netstat     Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s.
     uname       Exposes system information as provided by the uname system call.
  7. #phptek @mheap mysqld_exporter
     Key                      Description
     perf_schema.tablelocks   Collect metrics from performance_schema.table_lock_waits_summary_by_table
     info_schema.processlist  Collect thread state counts from information_schema.processlist
     binlog_size              Collect the current size of all registered binlog files
     auto_increment.columns   Collect auto_increment columns and max values from information_schema
  8. #phptek @mheap haproxy_exporter
     Key                      Description
     current_queue            Current number of queued requests assigned to this server
     current_sessions         Current number of active sessions
     bytes_in_total           Current total of incoming bytes
     connection_errors_total  Total of connection errors
  9. #phptek @mheap memcached_exporter
     Key                  Description
     bytes_read           Total number of bytes read by this server from network
     connections_total    Total number of connections opened since the server started running
     items_evicted_total  Total number of valid items removed from cache to free memory for new items
     commands_total       Total number of all requests broken down by command (get, set, etc.) and status per slab
  10. #phptek @mheap Pushgateway
      The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus, as sketched below.
      https://github.com/prometheus/pushgateway
      https://github.com/Lazyshot/prometheus-php
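      A minimal sketch of that push from PHP, using the built-in curl extension. The /metrics/job/<job_name> endpoint and the text exposition body are the Pushgateway's documented interface; the host pushgateway:9091 and the job/metric names here are illustrative assumptions.

      <?php
      // Sketch: push a counter from a short-lived batch job to a Pushgateway.
      // Assumes a Pushgateway reachable at pushgateway:9091; names are illustrative.
      $job  = 'nightly_report';
      $body = "# TYPE reports_generated_total counter\n"
            . "reports_generated_total 42\n";

      $ch = curl_init('http://pushgateway:9091/metrics/job/' . urlencode($job));
      curl_setopt_array($ch, [
          CURLOPT_CUSTOMREQUEST  => 'PUT',               // PUT replaces all metrics for this job
          CURLOPT_POSTFIELDS     => $body,               // text exposition format payload
          CURLOPT_HTTPHEADER     => ['Content-Type: text/plain'],
          CURLOPT_RETURNTRANSFER => true,
      ]);
      curl_exec($ch);
      if (curl_getinfo($ch, CURLINFO_HTTP_CODE) >= 300) {
          error_log('Failed to push metrics to the Pushgateway');
      }
      curl_close($ch);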
  11. #phptek @mheap
      Counters: use for counting events that happen (e.g. total number of requests) and query using rate()
      Gauges: use to instrument the current state of a metric (e.g. memory usage, jobs in queue)
      Histograms: use to sample observations in order to analyse the distribution of a data set (e.g. request latency)
      Summaries: use for pre-calculated quantiles on the client side, but be mindful of calculation cost and aggregation limitations
      A PHP sketch of the first three types follows.
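      A minimal PHP sketch of those types, assuming the prometheus_client_php client library; exact class and method names may differ between client versions.

      <?php
      // Sketch using a PHP Prometheus client (assumed: prometheus_client_php);
      // class/method names may vary between client versions.
      require 'vendor/autoload.php';

      use Prometheus\CollectorRegistry;
      use Prometheus\RenderTextFormat;
      use Prometheus\Storage\InMemory;

      $registry = new CollectorRegistry(new InMemory());

      // Counter: only ever goes up; query with rate() to get calls per second
      $calls = $registry->getOrRegisterCounter('nexmo', 'calls_placed_total', 'Calls placed', ['network', 'type']);
      $calls->inc(['BT', 'landline']);

      // Gauge: current state, can go up or down
      $queue = $registry->getOrRegisterGauge('app', 'jobs_in_queue', 'Jobs waiting to run', []);
      $queue->set(12);

      // Histogram: bucketed observations for latency distributions
      $latency = $registry->getOrRegisterHistogram('app', 'request_duration_seconds', 'Request latency', [], [0.1, 0.5, 1, 2.5]);
      $latency->observe(0.35);

      // Expose everything on /metrics in the text exposition format shown earlier
      header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
      echo (new RenderTextFormat())->render($registry->getMetricFamilySamples());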
  12. #phptek @mheap calls_placed_total
      Element                                                                                                            Value
      calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="BT",number="441234567890",type="landline"} 4
      calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="BT",number="442079460000",type="landline"} 8
      calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="BT",number="442079460000",type="landline"} 1
      calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="o2",number="447700900000",type="mobile"}   6
      calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="o2",number="447908249481",type="mobile"}   7
  13. #phptek @mheap calls_placed_total{number="441234567890"}[3m]
      Element: calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="BT",number="441234567890",type="landline"}
      Value: 3 @1521482766.23, 4 @1521482769.23, 12 @1521482772.229, 16 @1521482775.229, 21 @1521482778.23, 25 @1521482781.23, 27 @1521482784.229, 31 @1521482787.229, 35 @1521482790.229
  14. #phptek @mheap calls_placed_total{number="441234567890"}[3m] offset 1w
      Element: calls_placed_total{instance="localhost:3000",job="nexmo_calls",network="BT",number="441234567890",type="landline"}
      Value: 2 @1521311766.23, 7 @1521311769.23, 18 @1523112772.229, 20 @1523112775.229, 27 @1523112778.23, 28 @1523112781.23, 30 @1523112784.229, 36 @1523112787.229, 39 @1523112790.229
  15. #phptek @mheap
      # Total number of calls regardless of any labels
      sum(calls_placed_total)
      # Total number of calls, broken down by the number label
      sum(calls_placed_total) by (number)
      # Total per-second rate over the last 5 minutes, by number
      sum(rate(calls_placed_total[5m])) by (number)
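      The same queries can also be run programmatically through Prometheus's HTTP API; /api/v1/query is part of Prometheus itself, while the localhost:9090 address below is an assumption. A minimal PHP sketch:

      <?php
      // Sketch: run a PromQL query against Prometheus's HTTP API.
      // Assumes Prometheus is listening on localhost:9090.
      $query = 'sum(rate(calls_placed_total[5m])) by (number)';
      $url   = 'http://localhost:9090/api/v1/query?query=' . urlencode($query);

      $response = json_decode(file_get_contents($url), true);

      // Each result carries its label set ("metric") and the latest [timestamp, value] pair
      foreach ($response['data']['result'] as $series) {
          printf("%s => %s calls/sec\n", $series['metric']['number'], $series['value'][1]);
      }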
  16. #phptek @mheap
      alert: HighCallsBeingPlacedOnLandline
      expr: rate(calls_placed_total{network=~".*",type="landline"}[1m]) > 10
      for: 5m
      labels:
        severity: critical
      annotations:
        description: 'Unusually high call count on {{ $labels.network }}'
        summary: 'High call count on {{ $labels.network }}'
  17. #phptek @mheap
      [ smtp_from: <tmpl_string> ]
      [ slack_api_url: <string> ]
      [ victorops_api_key: <string> ]
      [ victorops_api_url: <string> | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ]
      [ pagerduty_url: <string> | default = "https://events.pagerduty.com/v2/enqueue" ]
      [ opsgenie_api_key: <string> ]
      [ opsgenie_api_url: <string> | default = "https://api.opsgenie.com/" ]
      [ hipchat_api_url: <string> | default = "https://api.hipchat.com/" ]
      [ hipchat_auth_token: <secret> ]
  18. #phptek @mheap
      route:
        receiver: 'default-receiver'
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 4h
        group_by: [cluster, alertname]
        routes:
        - receiver: 'database-pager'
          group_wait: 10s
          match_re:
            service: mysql|cassandra
        - receiver: 'frontend-pager'
          group_by: [product, environment]
          match:
            team: frontend
  22. #phptek @mheap
      receivers:
      - name: 'team-X-mails'
        email_configs:
        - to: '[email protected]'
      - name: 'team-X-pager'
        email_configs:
        - to: '[email protected]'
        pagerduty_configs:
        - routing_key: <team-X-key>
      - name: 'team-Y-mails'
        email_configs:
        - to: '[email protected]'
      - name: 'team-Y-pager'
        pagerduty_configs:
        - routing_key: <team-Y-key>
      - name: 'team-DB-pager'
        pagerduty_configs:
        - routing_key: <team-DB-key>
  23. #phptek @mheap
      ! Database is down
      ! User login failure > 100
      ! Report generation failure > 15
      ! GET /healthcheck returned 500
  27. #phptek @mheap 4.6M time series per server, 72k samples ingested per second per server, 185 production Prometheus servers