A Data Router? Vector / Getting Started with Vector
watawuwu
August 07, 2019
Technology
Transcript
Getting Started with Vector
Cloud Native Meetup Tokyo #9
This document includes work distributed under the Apache License 2.0.
profile:
  name: Wataru Matsui
  org: [Z Lab, 3bi.tech]
  twitter: @watawuwu
Agenda
• What’s Vector?
• Usage
• VS ...
• Roadmap
• Conclusions
What’s Vector? https://vector.dev
A logs, metrics & events router. Is it like Fluentd?
Developed by Timber.io https://timber.io
Features
• Logs, Metrics, or Events
• Agent or Service
• Fast
• Correct
• Clear Guarantees
• Vendor Neutral
• Easy to Deploy
• Hot Reload

Similar tools
• Fluentd
• Fluent Bit
• Filebeat
• Logstash
Summary ©timber.io
©timber.io
Topologies: Distributed ©timber.io
Topologies: Centralized ©timber.io
Topologies: Stream-Based ©timber.io
How to use Vector
Source types
• file
• statsd
• syslog
• tcp
• vector
• stdin (debug)
Source config

[sources.my_file_source_id]
# REQUIRED - General
type = "file" # must be: "file"
include = ["/var/log/nginx/*.log"]
exclude = [""]
Source config

[sources.my_tcp_source_id]
# REQUIRED - General
type = "tcp" # must be: "tcp"
address = "0.0.0.0:9000"
Sink types
• aws
  ◦ cloudwatch_logs
  ◦ kinesis_streams
  ◦ s3
• elasticsearch
• http
• kafka
• prometheus
• splunk_hec
• tcp
• vector
• console
• blackhole (/dev/null)
Sinks config

[sinks.my_tcp_sink_id]
# REQUIRED - General
type = "tcp" # must be: "tcp"
inputs = ["my_tcp_source_id"]
address = "92.12.333.224:5000"
# OPTIONAL - Requests
encoding = "json" # default, enum: "json", "text"
Sinks config

[sinks.my_s3_sink_id]
# REQUIRED - General
type = "s3" # must be: "s3"
inputs = ["my_file_source_id"]
bucket = "my-bucket"
region = "ap-northeast-1"
encoding = "ndjson" # enum: "ndjson", "text"
# OPTIONAL - Requests
key_prefix = "date=%F/" # default
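The `key_prefix = "date=%F/"` above partitions S3 objects by day. A small sketch of how such a strftime-style prefix expands, using Python's `strftime` with `%Y-%m-%d` (which is what `%F` abbreviates; `%F` itself is platform-dependent in Python):

```python
from datetime import datetime, timezone

def s3_key_prefix(ts: datetime, template: str = "date=%Y-%m-%d/") -> str:
    # Expand the strftime template; events from the same day
    # land under the same date= "directory" in the bucket.
    return ts.strftime(template)

prefix = s3_key_prefix(datetime(2019, 8, 7, 12, 0, tzinfo=timezone.utc))
print(prefix)  # date=2019-08-07/
```

Daily prefixes like this keep objects queryable by partition (e.g. from Athena or lifecycle rules) without any extra bookkeeping in the sink.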
Sinks config

[sinks.my_prometheus_sink_id]
# REQUIRED - General
type = "prometheus" # must be: "prometheus"
inputs = ["my_log2metrics_source_id"]
address = "0.0.0.0:9598"
Transform types
• Field
  ◦ add_fields
  ◦ remove_fields
  ◦ field_filter
• Parser
  ◦ grok_parser
  ◦ json_parser
  ◦ regex_parser
  ◦ tokenizer
• log_to_metric
• sampler
• lua
Transform config

[transforms.my_regex_trans_id]
# REQUIRED - General
type = "regex_parser" # must be: "regex_parser"
inputs = ["my_file_source_id"]
regex = "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"
# OPTIONAL - Types
[transforms.my_regex_trans_id.types]
status = "int"
method = "string"
bytes_in = "int"
bytes_out = "int"
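The pattern above can be checked against a sample access-log line before deploying. This sketch uses Python's `re` module (Vector itself uses Rust's regex engine, but the named-group syntax is the same); the sample log line is made up for illustration:

```python
import re

# The regex_parser pattern from the config above, TOML escaping removed.
PATTERN = re.compile(
    r'^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) '
    r'\[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" '
    r'(?P<status>[\d]+) (?P<bytes_out>[\d]+)$'
)

line = '10.0.0.1 - alice 123 [07/Aug/2019:12:00:00 +0900] "GET /index.html" 200 4096'
event = PATTERN.match(line).groupdict()
print(event["method"], event["status"])  # GET 200
```

Each named group becomes a field on the event; the `[transforms...types]` table then coerces `status`, `bytes_in`, and `bytes_out` from strings to integers.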
Transform config

[transforms.my_prometheus_trans_id]
# REQUIRED - General
type = "log_to_metric" # must be: "log_to_metric"
inputs = ["my_file_source_id"]
# OPTIONAL - Metrics
[[transforms.my_prometheus_trans_id.metrics]]
type = "counter" # enum: "counter", "gauge"
field = "duration"
increment_by_value = false
name = "duration_total"
labels = {host = "${HOSTNAME}", region = "us-east-1"}
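A sketch of what `increment_by_value` controls: with `false`, each event carrying the field bumps the counter by 1; with `true`, the counter grows by the field's numeric value instead. The function below is illustrative only, not Vector's internal implementation:

```python
from collections import Counter

def log_to_metric(events, field="duration", increment_by_value=False):
    # Mimic a "counter" metric named duration_total.
    total = Counter()
    for event in events:
        if field not in event:
            continue  # events without the field don't touch the counter
        total["duration_total"] += float(event[field]) if increment_by_value else 1
    return total

events = [{"duration": 3.0}, {"duration": 7.0}, {"level": "info"}]
print(log_to_metric(events))                           # counts events: 2
print(log_to_metric(events, increment_by_value=True))  # sums values: 10.0
```

The choice matters for the Prometheus sink: event counts give you a request-rate counter, while value sums give you a total-duration counter usable for average latency.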
Vector config

[sources.logs]
type = 'file'
include = ['/var/log/*.log']

[transforms.tokenizer]
inputs = ['logs']
type = 'tokenizer'
field_names = ["timestamp", "level", "message"]

[transforms.sampler]
inputs = ['tokenizer']
type = 'sampler'
hash_field = 'request_id'
rate = 10

[sinks.search]
inputs = ['sampler']
type = 'elasticsearch'
host = '123.123.123.123:5000'

[sinks.backup]
inputs = ['tokenizer']
type = 's3'
region = 'ap-northeast-1'
bucket = 'log-backup'
key_prefix = 'date=%F'
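In the pipeline above, the `sampler` keeps roughly 1-in-`rate` events, and `hash_field = 'request_id'` makes the decision sticky: all lines sharing a `request_id` are kept or dropped together, so sampled traces stay complete. A sketch of that idea (the MD5 bucketing below is illustrative; Vector's actual hash may differ):

```python
import hashlib

def keep(request_id: str, rate: int = 10) -> bool:
    # Stable hash of the field value; keep the event when its bucket
    # is 0, i.e. roughly 1 out of `rate` distinct request ids.
    digest = hashlib.md5(request_id.encode()).hexdigest()
    return int(digest, 16) % rate == 0

# The same id always gets the same decision; ~10% of ids are kept overall.
kept = sum(keep(f"req-{i}") for i in range(10_000))
```

Note that only the `search` sink reads from the sampler; the `backup` sink reads from the tokenizer directly, so S3 still receives every event unsampled.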
VS
Performance report by Timber.io

               Vector       Fluent Bit   Fluentd
File to TCP    76.7 MiB/s   35 MiB/s     26.1 MiB/s
Regex Parsing  13.2 MiB/s   20.5 MiB/s   2.6 MiB/s
TCP to HTTP    26.7 MiB/s   19.6 MiB/s   <1 MiB/s

         Vector          Fluent Bit      Fluentd
Memory   188.1 MiB       370 MiB         890 MiB
CPU      1.51 (1m avg)   0.56 (1m avg)   0.57 (1m avg)
Don't trust the reports. Measure, Measure, Measure!
Measure using GKE
• Kubernetes: v1.13.7
• Node x4
  ◦ 4 CPU
  ◦ 3.6 GB Memory
  ◦ 100 GB Storage (Standard)
• Manifests
  ◦ https://github.com/watawuwu/vector-test
Memory Usage
Vector: 26 MiB
Fluent Bit: 1.091 GiB
Fluentd: 92 MiB
Vector’s memory usage is low. Why does Fluent Bit use so much memory?
CPU Usage
Vector: 1.84 core
Fluent Bit: 0.26 core
Fluentd: 1.25 core
Vector’s CPU usage is high.
IO Throughput
Vector: 9.39 MiB/s
Fluent Bit: 8.26 MiB/s
Fluentd: 13.64 MiB/s
Vector’s throughput is low. An error in the test method?
Roadmap
Roadmap
• v0.4 Schemas (current)
• v0.5 Stream Consumers
• v0.6 Columnar Writing
• v0.7 CLI
• v0.8 Wire Level Tailing
• v1.0 Stable => 2019/12 Release!!
Conclusions
watawuwu’s TECH RADAR: ADOPT / TRIAL / ASSESS / HOLD
Thanks! Kubernetes, Cloud Native zlab.co.jp