Errors: to log, or not to log?

Konrad Reiche

Image credit: Renée French (CC)

if err != nil {
	return nil, err
}

cmd, err := rdb.Get(ctx, "key")
if err != nil {
	return nil, err
}

middleware.go

resp, err := handler(ctx, req)

middleware.go

resp, err := handler(ctx, req)
if err != nil {
	log.Error(err)
}

[ERROR] client update: read tcp: i/o timeout
[ERROR] dial tcp: lookup redis: i/o timeout
[ERROR] write tcp 10.101.244.225:46470->192.168.22.191: i/o timeout
[ERROR] redis: read tcp 10.101.244.225:46470->192.168.22.191:6379: i/o timeout
[ERROR] fetch data: invalid parameter type: int
[ERROR] context canceled
[ERROR] unexpected end of JSON input
[ERROR] read tcp 10.108.131.55:46078->192.168.10.169:9090: read: connection reset by peer
[ERROR] internal error
[ERROR] rpc error: code = Unknown desc = no actions were found to be valid
[ERROR] rpc error: code = Canceled desc = context canceled
[ERROR] context canceled
[ERROR] context canceled
[ERROR] HTTP 500: write tcp 10.108.251.133:9090->10.111.167.92:33674: i/o timeout
[ERROR] rpc error: code = Unknown desc = no actions were found to be valid

Noisy Error Monitoring System
- Time-consuming to find critical errors
- Error log is ignored altogether
- Logging errors in the first place becomes pointless

[ERROR] dial tcp: lookup redis: i/o timeout
[ERROR] write tcp 10.101.244.225:46470->192.168.22.191: i/o timeout
[ERROR] redis: read tcp 10.101.244.225:46470->192.168.22.191:6379: i/o timeout
[ERROR] context canceled
[ERROR] rpc error: code = Canceled desc = context canceled
[ERROR] context canceled
[ERROR] context canceled
[ERROR] HTTP 500: write tcp 10.108.251.133:9090->10.111.167.92:33674: i/o timeout

middleware.go

resp, err := handler(ctx, req)
if err != nil {
	var e net.Error
	if errors.As(err, &e) && e.Timeout() {
		timeouts.Inc()
		return
	}
	log.Error(err)
}

middleware.go

resp, err := handler(ctx, req)
if err != nil {
	var e net.Error
	if errors.As(err, &e) && e.Timeout() {
		timeouts.Inc()
		return
	}
	if errors.Is(err, context.Canceled) {
		contextsCanceled.Inc()
		return
	}
	log.Error(err)
}

redis.go ... ... middleware.go

cmd, err := rdb.Get(ctx, "key")
if err != nil {
	return nil, err
}

redis.go ... ... middleware.go

cmd, err := rdb.Get(ctx, "key")
if err != nil {
	var e net.Error
	if errors.As(err, &e) && e.Timeout() {
		timeouts.Inc()
		// ???
	}
	return nil, err
}

type loggable interface {
	error
	Log() bool
}

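type ObservedError struct {
	cause error
}

// Error makes ObservedError satisfy the error interface, which loggable embeds.
func (e ObservedError) Error() string { return e.cause.Error() }

func (e ObservedError) Log() bool { return false }

func (e ObservedError) Unwrap() error { return e.cause }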

redis.go ... ... middleware.go

cmd, err := rdb.Get(ctx, "key")
if err != nil {
	var e net.Error
	if errors.As(err, &e) && e.Timeout() {
		timeouts.Inc()
		err = metrics.NewObservedError(err)
		return nil, err
	}
	return nil, err
}

middleware.go

resp, err := handler(ctx, req)
if err != nil {
	var e loggable
	if errors.As(err, &e) && !e.Log() {
		return
	}
	log.Error(err)
}

[ERROR] dial tcp: lookup redis: i/o timeout
[ERROR] fatal error: concurrent map read and map write
[ERROR] redis: read tcp 10.101.244.225:46470->192.168.22.191:6379: i/o timeout
[ERROR] fetch data: invalid parameter type: int
[ERROR] context canceled
[ERROR] unexpected end of JSON input
[ERROR] read tcp 10.108.131.55:46078->192.168.10.169:9090: read: connection reset by peer
[ERROR] internal error
[ERROR] rpc error: code = Unknown desc = no actions were found to be valid
[ERROR] rpc error: code = Canceled desc = context canceled
[ERROR] context canceled
[ERROR] context canceled
[ERROR] HTTP 500: write tcp 10.108.251.133:9090->10.111.167.92:33674: i/o timeout
[ERROR] rpc error: code = Unknown desc = no actions were found to be valid

[ERROR] fatal error: concurrent map read and map write
[ERROR] fetch data: invalid parameter type: int

Thank you [@|u/]konradreiche[.com]
