Data Router? Vector / Getting Started with Vector

watawuwu
August 07, 2019


Transcript

  1. Getting Started with Vector Cloud Native Meetup Tokyo #9 This

    document includes work that is distributed under the Apache License 2.0
  2. profile: name: Wataru Matsui org: [ Z Lab, 3bi.tech ]

    twitter: @watawuwu
  3. • What’s Vector? • Usage • VS ... • Roadmap

    • Conclusions Agenda
  4. What’s Vector? https://vector.dev

  5. Logs, Metrics & Events Router. Is it like Fluentd?

  6. Developed by Timber.io https://timber.io

  7. Features • Logs, Metrics, or Events • Agent or Service

    • Fast • Correct • Clear Guarantees • Vendor Neutral • Easy To Deploy • Hot Reload
  8. • Fluentd • Fluent Bit • Filebeat • Logstash Similar

    tools
  9.  Summary ©timber.io

  10. ©timber.io

  11. Topologies: Distributed ©timber.io

  12. Topologies: Centralized ©timber.io

  13. Topologies: Stream-Based ©timber.io

  14. How to use Vector

  15. Source types • file • statsd • syslog • tcp

    • vector • stdin(debug)
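The stdin source above pairs naturally with the console sink for quick experiments. A minimal sketch, assuming Vector's 0.x TOML schema (the component ids are arbitrary):

```toml
# Minimal debug pipeline: read lines from stdin, print each event to stdout.
[sources.my_stdin_source_id]
type = "stdin"

[sinks.my_console_sink_id]
type = "console"
inputs = ["my_stdin_source_id"]
encoding = "text" # print raw lines instead of JSON-encoded events
```

Piping a line into `vector --config debug.toml` should echo the event straight back, which makes this a handy smoke test before wiring up real sources and sinks.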
  16. [sources.my_file_source_id]
 
 # REQUIRED - General
 type = "file" # must be: "file"
 include = ["/var/log/nginx/*.log"]
 exclude = []
 Source config
  17. [sources.my_tcp_source_id]
 
 # REQUIRED - General
 type = "tcp" # must be: "tcp"
 address = "0.0.0.0:9000"
 Source config
  18. Sink types • aws ◦ cloudwatch_logs ◦ kinesis_streams ◦ s3

    • elasticsearch • http • kafka • prometheus • splunk_hec • tcp • vector • console • blackhole(/dev/null)
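Since the talk later stresses measuring for yourself, the blackhole sink from this list is worth noting: it accepts and discards every event, so throughput numbers reflect Vector's own overhead rather than a downstream system. A sketch under that assumption (ids are illustrative):

```toml
# Benchmark sketch: tail a log file and discard all events,
# isolating Vector's processing cost from any downstream sink.
[sources.bench_file]
type = "file"
include = ["/var/log/nginx/*.log"]

[sinks.devnull]
type = "blackhole"
inputs = ["bench_file"]
```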
  19. [sinks.my_tcp_sink_id]
 
 # REQUIRED - General
 type = "tcp" # must be: "tcp"
 inputs = ["my_tcp_source_id"]
 address = "92.12.33.224:5000"
 
 # OPTIONAL - Requests
 encoding = "json" # default, enum: "json", "text"
 
 Sinks config
  20. [sinks.my_s3_sink_id]
 
 # REQUIRED - General
 type = "s3" # must be: "s3"
 inputs = ["my_file_source_id"]
 bucket = "my-bucket"
 region = "ap-northeast-1"
 encoding = "ndjson" # enum: "ndjson", "text"
 
 # OPTIONAL - Requests
 key_prefix = "date=%F/" # default
 Sinks config
  21. [sinks.my_prometheus_sink_id]
 
 # REQUIRED - General
 type = "prometheus" # must be: "prometheus"
 inputs = ["my_log2metrics_source_id"]
 address = "0.0.0.0:9598"
 Sinks config
  22. Transform types • Field ◦ add_fields ◦ remove_fields ◦ field_filter

    • Parser ◦ grok_parser ◦ json_parser ◦ regex_parser ◦ tokenizer • log_to_metric • sampler • lua
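The following slides walk through regex_parser and log_to_metric; for logs that are already JSON, the json_parser transform from this list is the simpler path. A hedged sketch (ids and options are illustrative, per the 0.x schema):

```toml
# Parse each event's message as JSON and merge the parsed
# fields into the event.
[transforms.my_json_trans_id]
type = "json_parser"
inputs = ["my_file_source_id"]
drop_invalid = true # discard events whose message is not valid JSON
```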
  23. [transforms.my_regex_trans_id]
 
 # REQUIRED - General
 type = "regex_parser" # must be: "regex_parser"
 inputs = ["my_file_source_id"]
 regex = "^(?P<host>[\\w\\.]+) - (?P<user>[\\w]+) (?P<bytes_in>[\\d]+) \\[(?P<timestamp>.*)\\] \"(?P<method>[\\w]+) (?P<path>.*)\" (?P<status>[\\d]+) (?P<bytes_out>[\\d]+)$"
 
 # OPTIONAL - Types
 [transforms.my_regex_trans_id.types]
 status = "int"
 method = "string"
 bytes_in = "int"
 bytes_out = "int"
 Transform config
  24. [transforms.my_prometheus_trans_id]
 
 # REQUIRED - General
 type = "log_to_metric" # must be: "log_to_metric"
 inputs = ["my_file_source_id"]
 
 # OPTIONAL - Metrics
 [[transforms.my_prometheus_trans_id.metrics]]
 type = "counter" # enum: "counter", "gauge"
 field = "duration"
 increment_by_value = false
 name = "duration_total"
 labels = {host = "${HOSTNAME}", region = "us-east-1"}
 
 Transform config
  25. [sources.logs]
 type = 'file'
 include = ['/var/log/*.log']
 
 [transforms.tokenizer]
 inputs = ['logs']
 type = 'tokenizer'
 field_names = ["timestamp", "level", "message"]
 
 [transforms.sampler]
 inputs = ['tokenizer']
 type = 'sampler'
 hash_field = 'request_id'
 rate = 10
 
 [sinks.search]
 inputs = ['sampler']
 type = 'elasticsearch'
 host = '123.123.123.123:5000'
 
 [sinks.backup]
 inputs = ['tokenizer']
 type = 's3'
 region = 'ap-northeast-1'
 bucket = 'log-backup'
 key_prefix = 'date=%F'
 Vector config
  26. VS

  27.              Vector      Fluent Bit  Fluentd
    File to TCP    76.7 MiB/s  35 MiB/s    26.1 MiB/s
    Regex Parsing  13.2 MiB/s  20.5 MiB/s  2.6 MiB/s
    TCP to HTTP    26.7 MiB/s  19.6 MiB/s  <1 MiB/s
    Performance report by Timber.io
  28.              Vector     Fluent Bit  Fluentd
    Memory         188.1 MiB  370 MiB     890 MiB
    CPU (1m avg)   1.51       0.56        0.57
    Performance report by Timber.io
  29. Don't trust the reports. Measure, Measure, Measure!

  30. Measure using GKE • Kubernetes: v1.13.7 • Node x4 ◦

    4 CPU ◦ 3.6 GB Memory ◦ 100 GB Storage(Standard) • Manifests ◦ https://github.com/watawuwu/vector-test
  31. Memory Usage: Vector's memory usage is low. Why does Fluent Bit use so much memory?

    Vector      26 MiB
    Fluent Bit  1.091 GiB
    Fluentd     92 MiB
  32. CPU Usage: Vector's CPU usage is high.

    Vector      1.84 cores
    Fluent Bit  0.26 cores
    Fluentd     1.25 cores

  33. IO Throughput: throughput is low for Vector and Fluent Bit. An error in the test method?

    Vector      9.39 MiB/s
    Fluent Bit  8.26 MiB/s
    Fluentd     13.64 MiB/s

  34. Roadmap

  35. Roadmap • v0.4 Schemas(current) • v0.5 Stream Consumers • v0.6

    Columnar Writing • v0.7 CLI • v0.8 Wire Level Tailing • v1.0 Stable => 2019/12 Release!!
  36. Conclusions

 ADOPT
 TRIAL
 ASSESS
 HOLD
 watawuwu’s TECH RADAR

  38. Thanks! Kubernetes, Cloud Native zlab.co.jp