Data Router? Vector / Getting Started with Vector
watawuwu
August 07, 2019
Transcript
Getting Started with Vector
Cloud Native Meetup Tokyo #9
This document includes work distributed under the Apache License 2.0.

profile:
  name: Wataru Matsui
  org: [ Z Lab, 3bi.tech ]
  twitter: @watawuwu
Agenda
• What's Vector?
• Usage
• VS ...
• Roadmap
• Conclusions
What's Vector?
https://vector.dev
Logs, Metrics & Events Router. Is it like Fluentd?
Developed by Timber.io
https://timber.io
Features
• Logs, Metrics, or Events
• Agent or Service
• Fast
• Correct
• Clear Guarantees
• Vendor Neutral
• Easy to Deploy
• Hot Reload

Similar tools
• Fluentd
• Fluent Bit
• Filebeat
• Logstash
Summary ©timber.io
©timber.io
Topologies: Distributed ©timber.io
Topologies: Centralized ©timber.io
Topologies: Stream-Based ©timber.io
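The centralized topology above can be sketched with the `vector` source and sink, which forward events between Vector instances. A minimal sketch, assuming the 2019-era config schema (component names and keys have changed in later releases); the aggregator hostname is a placeholder:

```toml
# Agent node: tail local files and forward to a central Vector instance.
[sources.local_logs]
type = "file"
include = ["/var/log/app/*.log"]

[sinks.to_aggregator]
type = "vector"
inputs = ["local_logs"]
address = "aggregator.example.internal:9000" # placeholder address

# Aggregator node (a separate config file): receive events from agents.
[sources.from_agents]
type = "vector"
address = "0.0.0.0:9000"
```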
How to use Vector
Source types
• file
• statsd
• syslog
• tcp
• vector
• stdin (debug)
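For a quick local trial, the stdin (debug) source can be wired straight to a console sink. A minimal sketch, assuming the 2019-era config schema:

```toml
# Echo pipeline: read lines from stdin, print events to stdout.
[sources.my_stdin_source_id]
type = "stdin"

[sinks.my_console_sink_id]
type = "console"
inputs = ["my_stdin_source_id"]
encoding = "text"
```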
Source config

[sources.my_file_source_id]
# REQUIRED - General
type = "file" # must be: "file"
include = ["/var/log/nginx/*.log"]
exclude = [""]
Source config

[sources.my_tcp_source_id]
# REQUIRED - General
type = "tcp" # must be: "tcp"
address = "0.0.0.0:9000"
Sink types
• aws
  ◦ cloudwatch_logs
  ◦ kinesis_streams
  ◦ s3
• elasticsearch
• http
• kafka
• prometheus
• splunk_hec
• tcp
• vector
• console
• blackhole (/dev/null)
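The blackhole sink is handy for throughput tests like the ones later in this deck: it discards events while letting Vector report progress. A sketch, assuming the 2019-era config schema; `print_amount` may differ by version:

```toml
# Benchmark sink: drop events, logging a line every 10,000 events.
[sinks.my_blackhole_sink_id]
type = "blackhole"
inputs = ["my_file_source_id"]
print_amount = 10000
```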
Sinks config

[sinks.my_tcp_sink_id]
# REQUIRED - General
type = "tcp" # must be: "tcp"
inputs = ["my_tcp_source_id"]
address = "92.12.333.224:5000"

# OPTIONAL - Requests
encoding = "json" # default, enum: "json", "text"
Sinks config

[sinks.my_s3_sink_id]
# REQUIRED - General
type = "s3" # must be: "s3"
inputs = ["my_file_source_id"]
bucket = "my-bucket"
region = "ap-northeast-1"
encoding = "ndjson" # enum: "ndjson", "text"

# OPTIONAL - Requests
key_prefix = "date=%F/" # default
Sinks config

[sinks.my_prometheus_sink_id]
# REQUIRED - General
type = "prometheus" # must be: "prometheus"
inputs = ["my_log2metrics_source_id"]
address = "0.0.0.0:9598"
Transform types
• Field
  ◦ add_fields
  ◦ remove_fields
  ◦ field_filter
• Parser
  ◦ grok_parser
  ◦ json_parser
  ◦ regex_parser
  ◦ tokenizer
• log_to_metric
• sampler
• lua
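The field transforms round out a pipeline; for example, add_fields can stamp static metadata onto every event. A sketch, assuming the 2019-era config schema; the field names and values here are illustrative:

```toml
# Enrich every event with static metadata.
[transforms.my_add_fields_id]
type = "add_fields"
inputs = ["my_file_source_id"]

[transforms.my_add_fields_id.fields]
environment = "production" # illustrative values
region = "ap-northeast-1"
```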
Transform config

[transforms.my_regex_trans_id]
# REQUIRED - General
type = "regex_parser" # must be: "regex_parser"
inputs = ["my_file_source_id"]
regex = "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"

# OPTIONAL - Types
[transforms.my_regex_trans_id.types]
status = "int"
method = "string"
bytes_in = "int"
bytes_out = "int"
Transform config

[transforms.my_prometheus_trans_id]
# REQUIRED - General
type = "log_to_metric" # must be: "log_to_metric"
inputs = ["my_file_source_id"]

# OPTIONAL - Metrics
[[transforms.my_prometheus_trans_id.metrics]]
type = "counter" # enum: "counter", "gauge"
field = "duration"
increment_by_value = false
name = "duration_total"
labels = {host = "${HOSTNAME}", region = "us-east-1"}
Vector config

[sources.logs]
type = 'file'
include = ['/var/log/*.log']

[transforms.tokenizer]
inputs = ['logs']
type = 'tokenizer'
field_names = ["timestamp", "level", "message"]

[transforms.sampler]
inputs = ['tokenizer']
type = 'sampler'
hash_field = 'request_id'
rate = 10

[sinks.search]
inputs = ['sampler']
type = 'elasticsearch'
host = '123.123.123.123:5000'

[sinks.backup]
inputs = ['tokenizer']
type = 's3'
region = 'ap-northeast-1'
bucket = 'log-backup'
key_prefix = 'date=%F'
VS
Performance report by Timber.io

                Vector       Fluent Bit   Fluentd
File to TCP     76.7 MiB/s   35 MiB/s     26.1 MiB/s
Regex Parsing   13.2 MiB/s   20.5 MiB/s   2.6 MiB/s
TCP to HTTP     26.7 MiB/s   19.6 MiB/s   <1 MiB/s
Performance report by Timber.io

                Vector      Fluent Bit   Fluentd
Memory          188.1 MiB   370 MiB      890 MiB
CPU (1m avg)    1.51        0.56         0.57
Don't trust the reports. Measure, Measure, Measure!
Measure using GKE
• Kubernetes: v1.13.7
• Node x4
  ◦ 4 CPU
  ◦ 3.6 GB Memory
  ◦ 100 GB Storage (Standard)
• Manifests
  ◦ https://github.com/watawuwu/vector-test
Memory Usage
Memory usage is low. Why does Fluent Bit use so much memory?
Vector: 26 MiB
Fluent Bit: 1.091 GiB
Fluentd: 92 MiB
CPU Usage
CPU usage is high.
Vector: 1.84 core
Fluent Bit: 0.26 core
Fluentd: 1.25 core
IO Throughput
Throughput is low. An error in the test method?
Vector: 9.39 MiB/s
Fluent Bit: 8.26 MiB/s
Fluentd: 13.64 MiB/s
Roadmap
Roadmap
• v0.4 Schemas (current)
• v0.5 Stream Consumers
• v0.6 Columnar Writing
• v0.7 CLI
• v0.8 Wire Level Tailing
• v1.0 Stable => 2019/12 Release!!
Conclusions
watawuwu's TECH RADAR: ADOPT / TRIAL / ASSESS / HOLD
Thanks! Kubernetes, Cloud Native zlab.co.jp