Reliable Log Aggregation System in Multi-Tenant Kubernetes cluster

Speaker: Hiroki Sakamoto
- Role: Site Reliability Engineer at Verda
- Mission: Improving system reliability
- Interests: Kubernetes, distributed systems

Background

What is Verda?
- Based on OpenStack
- In operation since 2016
- Services shown: FaaS, PaaS, IaaS, NAT, LB, bare metal

What is Verda?
- Virtual machines: 74,000+
- Physical machines: 30,000+
- Hypervisors: 4,000+

SRE Teams for Verda
- Platform-wide SRE: provide a Verda-internal platform to improve the reliability of Verda services
- Infra Management: manage the physical infrastructure resources that host Verda services

SRE Teams for Verda
- Platform-wide SRE (here!): provide a Verda-internal platform to improve the reliability of Verda services

Log aggregation in Verda

Presentation Targets
Targets
- People who face issues similar to ours
- People who are considering a multi-tenant architecture
- People who make architecture decisions
Theme
- Config management in multi-tenant Kubernetes
- Operating Fluentd in multi-tenant Kubernetes

Benefits
- Get a useful idea for handling config management in multi-tenant Kubernetes
- Learn about log management and config management in multi-tenant Kubernetes before your own release

Issues about logging in Verda

Provide Multi-Tenant Kubernetes for Verda
Purpose
- Aggregate infra resources
- Standardize operations
- Provide internal platform tools to reduce operation costs
[Diagram labels: Nova, Neutron, Keystone, Monitoring, Loadbalancer, NAT, Dashboard, Designate, Cinder]

Original log aggregation mechanism
[Diagram: NovaAPI and NeutronAPI Pods, each with an emptyDir and Logrotated, sending logs to Elasticsearch]

- Fluentd and Logrotated run in each Pod as sidecars
- An emptyDir is used to share log files among the containers

- The application generates logs
- Fluentd tails the logs
- Fluentd sends the logs to Elasticsearch
- Logrotated rotates the logs if needed

Pain Points
- Too many sidecars across all of the Pods
- Every developer must maintain Fluentd, regardless of how well they know it
- Nobody monitors Fluentd or looks after its performance, reliability and durability

- Pods are hard to schedule efficiently because they contain too many containers
- Quality depends on each team
- We need to send audit logs but don't have enough monitoring

(Emoji: Twemoji, copyright Twitter, Inc and other contributors, licensed under CC-BY)

We need to rethink this!

Solutions for the issues

2 solutions
- Provide a managed Fluentd cluster
- Provide a Fluentd Config Operator

Managed Fluentd Cluster
[Diagram: on NodeA and NodeB, the Nova, Neutron and Keystone Pods write to stdout and a logfile; one Forwarder per node sends to three Aggregators on the Aggregator Nodes, which send to Elasticsearch]

Managed scope (highlighted in the diagram)

Forwarders
- Collect logs and send them to the aggregators
- Deployed as a DaemonSet, so each node runs only one Fluentd container

Aggregators
- Receive logs from the forwarders
- Process and filter logs
- Send logs to a datastore such as Elasticsearch
- Deployed as a StatefulSet
- With a PersistentVolume

Why split?
- The DaemonSet does not need many resources
- Scalability improves
- Each deployment changes a smaller scope

- Pods output logs to stdout
- The Docker log driver copies stdout to a log file
- The Forwarder tails the log files
- The Aggregators aggregate and process the logs
- The Aggregators send the logs to Elasticsearch

Fluentd is a shared resource

For durability: Forwarders
- Buffer logs in a directory on each host
- Flush buffers at shutdown
- Save the position already read into a file
- Require an ack response from the aggregator
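
Saving the read position is what lets a forwarder resume after a restart without re-sending or skipping lines. The following is a minimal Python sketch of that idea only, not Fluentd's actual implementation; the function name `read_new_lines` and the plain-integer position file format are assumptions made for illustration.

```python
import os
import tempfile


def read_new_lines(log_path, pos_path):
    """Return complete lines appended since the saved position, then save the new position."""
    # Load the last saved byte offset; start from 0 if no position file exists yet.
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line: leave it for the next call
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    # Persist the offset only after the lines were read.
    with open(pos_path, "w") as f:
        f.write(str(offset))
    return lines


def demo():
    """Read twice with a 'restart' in between; only the position file carries state."""
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "app.log")
        pos_path = os.path.join(d, "app.log.pos")
        with open(log_path, "w") as f:
            f.write("first\nsecond\n")
        batch1 = read_new_lines(log_path, pos_path)
        with open(log_path, "a") as f:
            f.write("third\n")
        batch2 = read_new_lines(log_path, pos_path)
        return batch1, batch2
```

Because the second call starts from the saved offset, it returns only the line written after the first call.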

For durability: Aggregators
- Buffer logs in each PersistentVolume
- Flush buffers at shutdown
- Distribute Pods across nodes

Pros
- Developers don't need to maintain Fluentd
- Fluentd can buffer logs while the destinations are down
- Aggregators are easy to scale
- The SRE team monitors Fluentd, so developers don't need to
- The SRE team ensures durability, reliability and performance

So... how do developers apply their own logging config?

2 solutions
- Provide a managed Fluentd cluster
- Provide a Fluentd Config Operator

Shared Fluentd Issues
[Diagram: the managed Fluentd cluster, with Forwarders on NodeA and NodeB, Aggregators on the Aggregator Nodes, and Elasticsearch]

- One team applies its config...
- Another team applies its config...
- Conflict!
- Some team applies a broken config...
- Process down! What happened?!

Shared Fluentd Issues
- Config changes from different teams can conflict
- An invalid config brings the process down
- Developers need to take care of Fluentd when applying config

Requirements
- Every config must be validated before it is applied
- No config may affect other teams' configs
- No config may bring the process down
- Applying a config must not require manual operations from developers

Fluentd Config Operator
[Diagram: Developers apply a LoggingPipeline and SRE apply a FluentdNode; the Fluentd Config Operator produces the Forwarder Config and the Aggregator Config for the Forwarder and Aggregator]

Fluentd Config Operator
- Automatically validates the config written in the CRD "LoggingPipeline"
- Automatically compiles the Fluentd config into a ConfigMap if the config is valid
- Automatically notifies Fluentd to reload the new config
- Automatically blocks the config if it is invalid
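
The validate, compile, notify and block behaviour listed here can be sketched as a single reconcile function. This is a toy model in Python, not the operator's code: `FakeCluster` stands in for the Kubernetes API, the `source` and `destination` keys are hypothetical field names, and the "compiled" text is a placeholder rather than real Fluentd config.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Stand-in for the Kubernetes API: holds ConfigMaps and reload notifications."""
    configmaps: dict = field(default_factory=dict)
    reloads: list = field(default_factory=list)


def validate(pipeline):
    """Hypothetical check: a pipeline needs both a source and a destination."""
    errors = []
    for key in ("source", "destination"):
        if not pipeline.get(key):
            errors.append(f"missing {key}")
    return errors


def reconcile(cluster, name, pipeline):
    """Validate first; compile and notify only if valid, otherwise block."""
    errors = validate(pipeline)
    if errors:
        # Invalid config is blocked: nothing already running is touched.
        return {"status": "Blocked", "errors": errors}
    compiled = (
        f"# pipeline {name}\n"
        f"source={pipeline['source']}\n"
        f"destination={pipeline['destination']}\n"
    )
    if cluster.configmaps.get(name) != compiled:
        cluster.configmaps[name] = compiled
        cluster.reloads.append(name)  # notify Fluentd only when the config changed
    return {"status": "Applied", "errors": []}
```

Reconciling the same valid pipeline twice triggers one reload, and an invalid pipeline never reaches the ConfigMap, which is the property the requirements ask for.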

- SRE apply a FluentdNode to specify the managed Fluentd
- Developers apply their logging config

The logging config specifies:
- Where to collect logs
- How to process logs
- Where to send logs

- The operator starts to reconcile
- It compiles the config into ConfigMaps for validation, one for the forwarder and one for the aggregator
- It starts validation of both
- If validation succeeds, it compiles and updates the configs
- If a config was updated, it notifies the Fluentd specified in the FluentdNode CRD

All developers need to do is specify the log source and destination in the CRD.

Let's dive into this in more detail

CRD: LoggingPipeline
- Supports stdout as a log source
- Supports logs in an emptyDir as a log source
- Supports logs defined in a snippet as a log source

Compile LoggingPipeline
Compiled for forwarders
- The emptyDir path on the host is generated automatically
- Parameters that ensure durability are added automatically
Compiled for aggregators
- The config is relabeled so that it is encapsulated and does not affect other configs
- A prefix is filled in automatically so that buffers are saved in the persistent volume

Compile LoggingPipeline
- Compile separately for forwarders and aggregators
- Automatically fill in important parameters
- Automatically wrap the config in a label to isolate it from other configs
- Automatically change the buffer directory to ensure durability
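
To make the label wrapping and buffer directory rewrite concrete, here is a small Python sketch that renders an aggregator-side Fluentd fragment. It is an illustration under assumptions, not the operator's compiler: the function name, the label naming scheme and the `/buffers` root are hypothetical. The directives it emits (`<label>`, `<match>`, `<buffer>` with `@type file`, `path` and `flush_at_shutdown`) are standard Fluentd configuration syntax.

```python
def compile_for_aggregator(team, output_type, buffer_root="/buffers"):
    """Render one team's output config wrapped in its own label with its own buffer path."""
    # One label per team keeps each team's <match> rules out of everyone else's routing.
    label = "@" + team.upper().replace("-", "_")
    lines = [
        f"<label {label}>",
        "  <match **>",
        f"    @type {output_type}",
        "    <buffer>",
        "      @type file",
        # Per-team directory, assumed to live on the persistent volume.
        f"      path {buffer_root}/{team}",
        "      flush_at_shutdown true",
        "    </buffer>",
        "  </match>",
        "</label>",
    ]
    return label, "\n".join(lines) + "\n"
```

Calling `compile_for_aggregator("team-a", "elasticsearch")` yields a block under the label `@TEAM_A` whose buffer path is `/buffers/team-a`, so two teams never share a label or a buffer directory.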

Config Validation
- Run static validation
- For forwarders, run a Pod with the dry-run command
- For aggregators, run a Pod with an actual run to ensure connectivity to the destinations
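
As a rough illustration of the first two validation steps, the Python sketch below checks that section directives are balanced and builds a dry-run command line. The balance check is a stand-in: the talk does not say what the operator's static validation covers. The function names are hypothetical; `--dry-run` and `-c` are existing `fluentd` command-line options.

```python
import re

_OPEN = re.compile(r"^<([a-z_]+)(\s[^>]*)?>$")
_CLOSE = re.compile(r"^</([a-z_]+)>$")


def static_validate(config_text):
    """Return a list of errors for unbalanced <section> ... </section> directives."""
    errors, stack = [], []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        close = _CLOSE.match(line)
        open_ = _OPEN.match(line)
        if close:
            if not stack or stack[-1] != close.group(1):
                errors.append(f"line {lineno}: unexpected </{close.group(1)}>")
            else:
                stack.pop()
        elif open_:
            stack.append(open_.group(1))
    errors.extend(f"unclosed <{name}>" for name in stack)
    return errors


def dry_run_command(config_path):
    """Build the command a validation Pod could run to check a config without starting Fluentd."""
    return ["fluentd", "--dry-run", "-c", config_path]
```

A config that passes the cheap static check would then be handed to the dry-run Pod, and only the aggregator config needs the more expensive actual run against its destinations.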

Operations for the solutions

Load Test
- Dark launch in production
- About 250 GB/day

Load Test Results…
The CPU usage of some aggregators periodically got very high, even though there were plenty of aggregator instances.
- The event thread in Fluentd was hanging
- There were not too many connections between the aggregator and the forwarders
- That means aggregation, processing and buffer writes are heavy
- But I/O was not hanging

The log chunk size may be too large. Let's lower the chunk size!

Resolved
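
For a feel of the numbers involved, the Python sketch below converts the 250 GB/day figure into an average byte rate and shows how chunk size changes how often chunks fill. It assumes decimal gigabytes and perfectly even traffic, which real log traffic is not, and the chunk sizes in the usage note are hypothetical because the talk does not state the values used. It illustrates the trade-off only and does not explain the root cause.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_bytes_per_sec(gb_per_day):
    """Average ingest rate, assuming decimal GB and evenly spread traffic."""
    return gb_per_day * 10**9 / SECONDS_PER_DAY


def chunks_per_minute(rate_bytes_per_sec, chunk_bytes):
    """How many chunks of the given size fill up per minute at that rate."""
    return rate_bytes_per_sec * 60 / chunk_bytes
```

At 250 GB/day the average is roughly 2.9 MB/s across the whole cluster. For any fixed rate, halving the chunk size doubles the number of chunks per minute while halving the work done per chunk, which is the direction of the change described above.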

Monitoring
[Diagram labels: AlertManager, Cluster, Remote write, Monitor, VM Alert, Fire alerts, Query periodically, plus the Forwarders and Aggregators]

Monitoring
- Prometheus in the cluster scrapes metrics from the Fluentd containers
- That Prometheus is itself monitored by another Prometheus outside the cluster
- Prometheus writes the metrics to the VictoriaMetrics TSDB
- VM Alert periodically queries VictoriaMetrics with predefined rules
- VM Alert fires alerts to AlertManager when a rule matches
- AlertManager sends notifications to destinations such as Slack and PagerDuty

Monitoring
- CPU and memory usage
- Process down, Pod down, Pod restart count
- Whether no logs are being sent to the destination
- Whether log inflow speed stays below log processing speed
- Disk usage for buffering, and buffered bytes
- Number of Fluentd errors and slow flushes
- Number of Fluentd Config Operator errors
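
The inflow versus processing speed check can be sketched in Python as a comparison of two counter rates. This is an illustration only: the function names and the sample format, a list of `(timestamp_seconds, counter_value)` pairs from monotonically increasing counters, are assumptions, and the actual alert rules are not shown in the talk.

```python
def rate(samples):
    """Per-second rate between the first and last (timestamp, counter) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def is_falling_behind(inflow_samples, processed_samples):
    """True when records arrive faster than they are processed over the window."""
    return rate(inflow_samples) > rate(processed_samples)
```

If 6,000 records arrive in a minute while only 3,000 are processed, `is_falling_behind` returns `True`, which is the condition an alert would fire on before buffers fill the disk.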

Project Results

Project Results
Provided a Managed Fluentd Cluster and a Fluentd Config Operator
- Freed developers from maintaining Fluentd
- Developers only need to manage their own logging config
- Removed about 172 containers from a cluster
- Improved the reliability, durability and performance of logging
- Found previously undetected logging errors through monitoring

However… some issues occurred after release
- The Docker JSON log driver splits log lines longer than 16k, so broken JSON logs reach our Fluentd…
- We need a mechanism to notify developers of parsing errors
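
One way to handle the split lines is to stitch the pieces back together before parsing. The Python sketch below assumes the json-file convention that only the final piece of a split message has a `log` field ending in a newline; treat that as an assumption to verify against your Docker version. The function name is hypothetical, and for brevity the sketch ignores interleaving between the stdout and stderr streams.

```python
import json


def reassemble(json_lines):
    """Join Docker json-file entries whose "log" field was split into partial pieces.

    Returns (complete_messages, leftover), where leftover is an unfinished tail.
    """
    messages, pending = [], []
    for line in json_lines:
        entry = json.loads(line)
        pending.append(entry["log"])
        if entry["log"].endswith("\n"):
            # A trailing newline marks the last piece of a message.
            messages.append("".join(pending).rstrip("\n"))
            pending = []
    return messages, "".join(pending)
```

After reassembly the application's own JSON payload parses again, and anything left in the leftover buffer is a candidate for the dead letter route rather than a silent drop.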

Dead Letter Routing

Next Project

Introduce Kafka and…
[Diagram: the managed Fluentd cluster, with Forwarders on each node, Aggregators on the Aggregator Nodes, and Elasticsearch]

- Remove the direct dependency between forwarder and aggregator to improve scalability
- Enable developers to send logs from outside the cluster

Standardize logging across all Verda services!

Thank you

More Decks by LINE DEVDAY 2021
