Logging revamp for OSG
Basic information about syslog
- IETF documented status quo in RFC 3164
- Later obsoleted by RFC 5424
- Originally used for sendmail
- Main idea: keep logs locally, and optionally
send a copy to a remote server
- Implementations: syslog-ng, rsyslog (4.x
default in RHEL6)
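As a rough illustration of "keep logs locally, optionally send a copy to a server", here is a minimal Python sketch using the standard library's `logging.FileHandler` and `logging.handlers.SysLogHandler`. The function name `build_logger`, the `local4` facility choice, and the message format are assumptions made for the example, not part of the proposed setup.

```python
import logging
import logging.handlers


def build_logger(name, local_path, remote_addr=None):
    """Log to a local file and, optionally, to a remote syslog server (UDP).

    remote_addr is a (host, port) tuple; None means local-only logging.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    fmt = logging.Formatter("%(name)s: %(message)s")

    # The local copy is always kept.
    local = logging.FileHandler(local_path)
    local.setFormatter(fmt)
    logger.addHandler(local)

    # The remote copy is optional. Facility local4 is an arbitrary choice here.
    if remote_addr is not None:
        remote = logging.handlers.SysLogHandler(
            address=remote_addr,
            facility=logging.handlers.SysLogHandler.LOG_LOCAL4,
        )
        remote.setFormatter(fmt)
        logger.addHandler(remote)
    return logger
```

Usage would look like `build_logger("myapp", "/tmp/myapp.log", ("loghost.example.org", 514)).info("started")`, where the host name is a placeholder.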
Facility Number Keyword Facility Description
0 kern kernel messages
1 user user-level messages
2 mail mail system
3 daemon system daemons
4 auth security/authorization messages
5 syslog messages generated internally by syslogd
6 lpr line printer subsystem
7 news network news subsystem
8 uucp UUCP subsystem
9 cron clock daemon
10 authpriv security/authorization messages
11 ftp FTP daemon
12 - NTP subsystem
13 - log audit
14 - log alert
15 - clock daemon
16 local0 local use 0 (local0)
17 local1 local use 1 (local1)
18 local2 local use 2 (local2)
19 local3 local use 3 (local3)
20 local4 local use 4 (local4)
21 local5 local use 5 (local5)
22 local6 local use 6 (local6)
23 local7 local use 7 (local7)
Code  Severity  Keyword  Description / General Description
0  Emergency  emerg (panic)  System is unusable. A "panic" condition, usually affecting multiple apps/servers/sites. At this level it would usually notify all tech staff on call.
1  Alert  alert  Action must be taken immediately. Should be corrected immediately, therefore notify staff who can fix the problem. An example would be the loss of a primary ISP connection.
2  Critical  crit  Critical conditions. Should be corrected immediately, but indicates failure in a secondary system. An example is the loss of a backup ISP connection.
3  Error  err (error)  Error conditions. Non-urgent failures; these should be relayed to developers or admins, and each item must be resolved within a given time.
4  Warning  warning (warn)  Warning conditions. Not an error, but an indication that an error will occur if action is not taken, e.g. file system 85% full. Each item must be resolved within a given time.
5  Notice  notice  Normal but significant condition. Events that are unusual but not error conditions. Might be summarized in an email to developers or admins to spot potential problems. No immediate action required.
6  Informational  info  Informational messages. Normal operational messages that may be harvested for reporting, measuring throughput, etc. No action required.
7  Debug  debug  Debug-level messages. Info useful to developers for debugging the application, not useful during operations.
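The severity table can be read as a simple escalation policy. The sketch below encodes the eight codes and groups them by the kind of response the table describes. The `Severity` enum, the function `response_for`, and the four response labels are hypothetical names chosen for this example.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severity codes; a lower number is more severe."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def response_for(severity):
    """Map a severity code to a response class (labels are illustrative)."""
    sev = Severity(severity)  # raises ValueError for codes outside 0..7
    if sev <= Severity.CRIT:
        return "immediate"   # emerg, alert, crit: correct immediately
    if sev <= Severity.WARNING:
        return "scheduled"   # err, warning: resolve within a given time
    if sev == Severity.NOTICE:
        return "summary"     # notice: summarize, no immediate action
    return "none"            # info, debug: no action required
```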
Anatomy of a syslog message
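RFC 5424 gives an ABNF grammar for the message format, BUT RFC 3164 messages on the wire have three parts: PRI (priority), HEADER (timestamp and source hostname or IP address), and MSG (a TAG of at most 32 characters, terminated by the first non-alphanumeric character such as ':' or '[', followed by CONTENT). Layout: "<PRI>TIMESTAMP HOSTNAME TAG CONTENT".
Total length is at most 1024 bytes in RFC 3164; RFC 5424 instead requires receivers to accept at least 480 octets. The RFC 5424 grammar: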
SYSLOG-MSG = HEADER SP STRUCTURED-DATA [SP MSG]
HEADER = PRI VERSION SP TIMESTAMP SP HOSTNAME
SP APP-NAME SP PROCID SP MSGID
PRI = "<" PRIVAL ">"
PRIVAL = 1*3DIGIT ; range 0 .. 191
VERSION = NONZERO-DIGIT 0*2DIGIT
HOSTNAME = NILVALUE / 1*255PRINTUSASCII
APP-NAME = NILVALUE / 1*48PRINTUSASCII
PROCID = NILVALUE / 1*128PRINTUSASCII
MSGID = NILVALUE / 1*32PRINTUSASCII
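To make the grammar concrete, here is a minimal Python sketch that splits an RFC 5424 header into its fields with a regular expression. It only approximates the ABNF: TIMESTAMP is accepted as any run of printable ASCII, and STRUCTURED-DATA plus MSG are returned unparsed as `rest`. The names `HEADER_RE` and `parse_header` are made up for this example.

```python
import re

# PRINTUSASCII is %d33-126, i.e. the character class [!-~].
HEADER_RE = re.compile(
    r"<(?P<prival>[0-9]{1,3})>"       # PRI = "<" PRIVAL ">"
    r"(?P<version>[1-9][0-9]{0,2}) "  # VERSION = NONZERO-DIGIT 0*2DIGIT
    r"(?P<timestamp>[!-~]+) "         # not validated further in this sketch
    r"(?P<hostname>[!-~]{1,255}) "
    r"(?P<app_name>[!-~]{1,48}) "
    r"(?P<procid>[!-~]{1,128}) "
    r"(?P<msgid>[!-~]{1,32}) "
    r"(?P<rest>.*)",                  # STRUCTURED-DATA [SP MSG], left as-is
    re.DOTALL,
)


def parse_header(line):
    """Return the header fields as a dict; NILVALUE ("-") becomes None."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 style message")
    fields = match.groupdict()
    fields["prival"] = int(fields["prival"])
    if fields["prival"] > 191:
        raise ValueError("PRIVAL out of range 0..191")
    fields["version"] = int(fields["version"])
    for key in ("timestamp", "hostname", "app_name", "procid", "msgid"):
        if fields[key] == "-":
            fields[key] = None
    return fields
```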
In RFC 3164 the TIMESTAMP field is the local time, in the format "Mmm dd hh:mm:ss" (without the quote marks). There is no YYYY! If the year matters,
it must go in the message text; RFC 5424 fixes this by using full RFC 3339 timestamps.
The format of "TAG[pid]:" - without the quote marks - is common. The left square bracket is used to terminate the
TAG field in this case and is then the first character in the CONTENT
field. If the process id is immaterial, it may be left off.
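A minimal Python sketch of splitting an RFC 3164 style message along these lines is shown below. It assumes the strict reading in which TAG is purely alphanumeric (real-world tags often contain other characters), and because the timestamp has no year, the caller must supply one. `BSD_RE`, `PID_RE`, `parse_bsd`, and `assumed_year` are names invented for this example.

```python
import re
from datetime import datetime

BSD_RE = re.compile(
    r"<(?P<pri>[0-9]{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<tag>[A-Za-z0-9]{1,32})"  # TAG ends at the first non-alphanumeric
    r"(?P<content>.*)",            # CONTENT starts with that terminator
    re.DOTALL,
)
PID_RE = re.compile(r"\[([0-9]+)\]")


def parse_bsd(line, assumed_year):
    """Split "<PRI>TIMESTAMP HOSTNAME TAG CONTENT" into a dict."""
    match = BSD_RE.match(line)
    if match is None:
        raise ValueError("not an RFC 3164 style message")
    content = match.group("content")
    pid_match = PID_RE.match(content)  # "[pid]" is optional
    # The wire format carries no year, so one has to be assumed.
    when = datetime.strptime(
        f"{assumed_year} {match.group('timestamp')}", "%Y %b %d %H:%M:%S"
    )
    return {
        "pri": int(match.group("pri")),
        "when": when,
        "hostname": match.group("hostname"),
        "tag": match.group("tag"),
        "pid": int(pid_match.group(1)) if pid_match else None,
        "content": content,
    }
```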
Anatomy of a syslog message (pt 2)
The Priority value is calculated by first:
1. multiplying the Facility number by 8 and then
2. adding the numerical value of the Severity
For example, a kernel message (Facility=0) with a Severity of Emergency (Severity=0) would have
a Priority value of 0. Also, a "local use 4" message (Facility=20) with a Severity of Notice
(Severity=5) would have a Priority value of 165. In the PRI part of a syslog message, these
values would be placed between the angle brackets as <0> and <165> respectively. The only
time a value of "0" will follow the "<" is for the Priority value of "0". Otherwise, leading "0"s
MUST NOT be used.
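The calculation above, and its inverse, can be sketched in a few lines of Python. The function names `encode_pri` and `decode_pri` are made up for this example; the range limits (facility 0..23, severity 0..7, so PRI 0..191) follow from the tables earlier in the deck.

```python
def encode_pri(facility, severity):
    """Return the PRI part, e.g. "<165>", for a facility and severity."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return f"<{facility * 8 + severity}>"


def decode_pri(pri):
    """Return (facility, severity) for a PRI string such as "<165>"."""
    if not (pri.startswith("<") and pri.endswith(">")):
        raise ValueError("PRI must be enclosed in angle brackets")
    digits = pri[1:-1]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("PRI value must be decimal digits")
    # Leading zeros are only allowed for the value "0" itself.
    if len(digits) > 1 and digits[0] == "0":
        raise ValueError("leading zeros are not allowed")
    value = int(digits)
    if value > 191:
        raise ValueError("PRI value must be in 0..191")
    return divmod(value, 8)
```

With these definitions, `encode_pri(20, 5)` gives `"<165>"` and `decode_pri("<165>")` gives `(20, 5)`, matching the "local use 4" Notice example.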
Diversion: What logs do we have?
- tsm/ship - What are these?
- imapd, mail (sendmail)
- shib (idp-*, shibd, transaction, native)
- torque, listserv (listserv.log), catalina.out
- www (access, error, suexec)
- kern, messages, syslog, auth, local, yum,
transaction, anaconda, up2date
- mod_jk (cm), slapd, net-snmpd (xen)
- handful of others, but legacy
Phase 1: Convert everything to a remote syslog architecture
- Local files with remote syslog
- Local files rotate 'last N days' + permanent
archive copy can be kept on the remote system
- Remote files could be a rough facsimile of
current /nerdc/log setup
- No more log bale (do we really need this???)
- Sane permissions on /nerdc/log subdirectories
for everything possible. Or not, and just allow
Phase 2: Parse deep!
- Logstash is an abstraction layer over
logs coming out of rsyslog
- It can parse and annotate logs, add context
- It can also be used to load log data into
fancier backends like ElasticSearch
- ElasticSearch for the last N days of logs.
- Kibana as a nice UI for ElasticSearch.
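To give a feel for what "parse and annotate, add context" means in practice, here is a small Python stand-in. It is not Logstash and does not use its configuration language; it only shows the general idea of turning a raw log line into a JSON document that a search backend could index. The function `annotate`, the field names, and the "error" tagging rule are all assumptions made for the example.

```python
import json


def annotate(raw_line, host, service, extra=None):
    """Wrap a raw log line in a JSON document with added context."""
    event = {
        "message": raw_line.rstrip("\n"),
        "host": host,        # context: where the line came from
        "service": service,  # context: which application produced it
        "tags": [],
    }
    # Illustrative annotation rule: tag lines that mention an error.
    if "error" in raw_line.lower():
        event["tags"].append("error")
    event.update(extra or {})
    return json.dumps(event, sort_keys=True)
```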
Kibana + Elastic Search