Data Router? Vector / Getting Started with Vector

Slide 1

Slide 1 text

Getting Started with Vector
Cloud Native Meetup Tokyo #9
This document includes work that is distributed under the Apache License 2.0.

Slide 2

Slide 2 text

profile:
  name: Wataru Matsui
  org: [ Z Lab, 3bi.tech ]
  twitter: @watawuwu

Slide 3

Slide 3 text

Agenda

● What's Vector?
● Usage
● VS ...
● Roadmap
● Conclusions

Slide 4

Slide 4 text

What's Vector?
https://vector.dev

Slide 5

Slide 5 text

Logs, Metrics & Events Router
Is it like Fluentd?

Slide 6

Slide 6 text

Developed by Timber.io https://timber.io

Slide 7

Slide 7 text

Features

● Logs, Metrics, or Events
● Agent or Service
● Fast
● Correct
● Clear Guarantees
● Vendor Neutral
● Easy to Deploy
● Hot Reload

Slide 8

Slide 8 text

Similar tools

● Fluentd
● Fluent Bit
● Filebeat
● Logstash

Slide 9

Slide 9 text

Summary ©timber.io

Slide 10

Slide 10 text

©timber.io

Slide 11

Slide 11 text

Topologies: Distributed ©timber.io

Slide 12

Slide 12 text

Topologies: Centralized ©timber.io

Slide 13

Slide 13 text

Topologies: Stream-Based ©timber.io

Slide 14

Slide 14 text

How to use Vector

Slide 15

Slide 15 text

Source types

● file
● statsd
● syslog
● tcp
● vector
● stdin (debug)

Slide 16

Slide 16 text

Source config

    [sources.my_file_source_id]
    # REQUIRED - General
    type = "file" # must be: "file"
    include = ["/var/log/nginx/*.log"]
    exclude = [""]

Slide 17

Slide 17 text

Source config

    [sources.my_tcp_source_id]
    # REQUIRED - General
    type = "tcp" # must be: "tcp"
    address = "0.0.0.0:9000"

Slide 18

Slide 18 text

Sink types

● aws
  ○ cloudwatch_logs
  ○ kinesis_streams
  ○ s3
● elasticsearch
● http
● kafka
● prometheus
● splunk_hec
● tcp
● vector
● console
● blackhole (/dev/null)
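
As a first experiment, the debug-oriented `stdin` source and the `console` sink from the lists above can be wired together directly. This is a minimal sketch in the option style used on these slides (Vector v0.4 era). The component IDs `in` and `out` are made up for illustration, and the `encoding` value is an assumption that newer Vector releases may spell differently.

```toml
# Read lines from standard input (component ID "in" is hypothetical)
[sources.in]
type = "stdin"

# Echo every event to the terminal (component ID "out" is hypothetical)
[sinks.out]
type = "console"
inputs = ["in"]     # consume events from the stdin source above
encoding = "text"   # assumed option value; check the docs for your version
```

Running Vector with this file and typing a line should print the same line back, which is a quick way to confirm that the source-to-sink wiring works before adding real inputs.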

Slide 19

Slide 19 text

Sink config

    [sinks.my_tcp_sink_id]
    # REQUIRED - General
    type = "tcp" # must be: "tcp"
    inputs = ["my_tcp_source_id"]
    address = "92.12.333.224:5000"

    # OPTIONAL - Requests
    encoding = "json" # default, enum: "json", "text"

Slide 20

Slide 20 text

Sink config

    [sinks.my_s3_sink_id]
    # REQUIRED - General
    type = "s3" # must be: "s3"
    inputs = ["my_file_source_id"]
    bucket = "my-bucket"
    region = "ap-northeast-1"
    encoding = "ndjson" # enum: "ndjson", "text"

    # OPTIONAL - Requests
    key_prefix = "date=%F/" # default

Slide 21

Slide 21 text

Sink config

    [sinks.my_prometheus_sink_id]
    # REQUIRED - General
    type = "prometheus" # must be: "prometheus"
    inputs = ["my_log2metrics_source_id"]
    address = "0.0.0.0:9598"
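
The centralized topology shown earlier can be sketched with the `vector` sink and `vector` source, which let one Vector instance forward events to another. The example below is an illustration only: the two file names, all component IDs, and the host name `vector-central.example.internal` are hypothetical, and the options follow the style of these slides rather than any specific release.

```toml
# --- agent.toml: runs on every node (hypothetical file name) ---
[sources.node_logs]
type = "file"
include = ["/var/log/*.log"]

[sinks.to_central]
type = "vector"
inputs = ["node_logs"]
address = "vector-central.example.internal:9000"  # hypothetical host

# --- central.toml: runs as the central service (hypothetical file name) ---
[sources.from_agents]
type = "vector"
address = "0.0.0.0:9000"   # listen for events sent by the agents

[sinks.archive]
type = "s3"
inputs = ["from_agents"]
bucket = "log-backup"
region = "ap-northeast-1"
encoding = "ndjson"
```

Keeping the agents this thin moves parsing and sink credentials to the central service, at the cost of an extra network hop.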

Slide 22

Slide 22 text

Transform types

● Field
  ○ add_fields
  ○ remove_fields
  ○ field_filter
● Parser
  ○ grok_parser
  ○ json_parser
  ○ regex_parser
  ○ tokenizer
● log_to_metric
● sampler
● lua
● vector
● console
● blackhole (/dev/null)

Slide 23

Slide 23 text

Transform config

    [transforms.my_regex_trans_id]
    # REQUIRED - General
    type = "regex_parser" # must be: "regex_parser"
    inputs = ["my_file_source_id"]
    regex = "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"

    # OPTIONAL - Types
    [transforms.my_regex_trans_id.types]
    status = "int"
    method = "string"
    bytes_in = "int"
    bytes_out = "int"

Slide 24

Slide 24 text

Transform config

    [transforms.my_prometheus_trans_id]
    # REQUIRED - General
    type = "log_to_metric" # must be: "log_to_metric"
    inputs = ["my_file_source_id"]

    # REQUIRED - Metrics
    [[transforms.my_prometheus_trans_id.metrics]]
    type = "counter" # enum: "counter", "gauge"
    field = "duration"
    increment_by_value = false
    name = "duration_total"
    labels = {host = "${HOSTNAME}", region = "us-east-1"}
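
To show how a `log_to_metric` transform fits between a parser and the `prometheus` sink, here is a small end-to-end sketch. It assumes space-delimited log lines whose second token is a log level. The log path, the component IDs, and the metric name `log_lines_total` are all hypothetical, and only options that already appear on these slides are used.

```toml
# Tail application logs (hypothetical path)
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]

# Split each line into named fields so later stages can reference them
[transforms.parse]
type = "tokenizer"
inputs = ["app_logs"]
field_names = ["timestamp", "level", "message"]

# Derive a counter from every event that carries a "level" field
[transforms.count_lines]
type = "log_to_metric"
inputs = ["parse"]

[[transforms.count_lines.metrics]]
type = "counter"
field = "level"
name = "log_lines_total"   # hypothetical metric name

# Expose the resulting metrics for Prometheus to scrape
[sinks.metrics]
type = "prometheus"
inputs = ["count_lines"]
address = "0.0.0.0:9598"
```

The point of the sketch is the `inputs` chain: the metric transform reads from the parser rather than from the raw file source, so the field it counts actually exists on the event.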

Slide 25

Slide 25 text

Vector config

    [sources.logs]
    type = 'file'
    include = ['/var/log/*.log']

    [transforms.tokenizer]
    inputs = ['logs']
    type = 'tokenizer'
    field_names = ["timestamp", "level", "message"]

    [transforms.sampler]
    inputs = ['tokenizer']
    type = 'sampler'
    hash_field = 'request_id'
    rate = 10

    [sinks.search]
    inputs = ['sampler']
    type = 'elasticsearch'
    host = '123.123.123.123:5000'

    [sinks.backup]
    inputs = ['tokenizer']
    type = 's3'
    region = 'ap-northeast-1'
    bucket = 'log-backup'
    key_prefix = 'date=%F'

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Performance report by Timber.io

                   Vector       FluentBit    FluentD
    File to TCP    76.7MiB/s    35MiB/s      26.1MiB/s
    Regex Parsing  13.2MiB/s    20.5MiB/s    2.6MiB/s
    TCP to HTTP    26.7MiB/s    19.6MiB/s    <1MiB/s

Slide 28

Slide 28 text

Performance report by Timber.io

              Vector           FluentBit        FluentD
    Memory    188.1MiB         370MiB           890MiB
    CPU       1.51 (1m avg)    0.56 (1m avg)    0.57 (1m avg)

Slide 29

Slide 29 text

Don't trust the reports. Measure, Measure, Measure!

Slide 30

Slide 30 text

Measure using GKE

● Kubernetes: v1.13.7
● Node x4
  ○ 4 CPU
  ○ 3.6 GB Memory
  ○ 100 GB Storage (Standard)
● Manifests
  ○ https://github.com/watawuwu/vector-test

Slide 31

Slide 31 text

Memory Usage

    Vector      26 MiB
    Fluent Bit  1.091 GiB
    Fluentd     92 MiB

Vector's memory usage is low.
Why does Fluent Bit use so much memory?

Slide 32

Slide 32 text

CPU Usage

    Vector      1.84 core
    Fluent Bit  0.26 core
    Fluentd     1.25 core

Vector's CPU usage is high.

Slide 33

Slide 33 text

IO Throughput

    Vector      9.39 MiB/s
    Fluent Bit  8.26 MiB/s
    Fluentd     13.64 MiB/s

Throughput is low.
Is there an error in the test method?

Slide 34

Slide 34 text

Roadmap

Slide 35

Slide 35 text

Roadmap

● v0.4 Schemas (current)
● v0.5 Stream Consumers
● v0.6 Columnar Writing
● v0.7 CLI
● v0.8 Wire Level Tailing
● v1.0 Stable => 2019/12 release!!