Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Introducing kmql

Avatar for Yuto Kawamura Yuto Kawamura
December 18, 2020

Introducing kmql

Avatar for Yuto Kawamura

Yuto Kawamura

December 18, 2020
Tweet

More Decks by Yuto Kawamura

Other Decks in Technology

Transcript

  1. Me → Yuto Kawamura → Lead of Kafka infra team

    @ LINE → Performance Engineering/Reliability Engineering → Speaker at Kafka Summit 2017, 2018, Kafka Meetup ... many times
  2. Query Kafka cluster's metadata with SQL $ kmql --bootstrap-servers="YOUR CLUSTER's

    bootstrap.servers" query> SELECT * FROM replicas LIMIT 3; ╔═══════════╤═══════════╤═══════════╤═══════════╤═════════════════════╤════════════╗ ║ TOPIC │ PARTITION │ BROKER_ID │ IS_LEADER │ IS_PREFERRED_LEADER │ IS_IN_SYNC ║ ╠═══════════╪═══════════╪═══════════╪═══════════╪═════════════════════╪════════════╣ ║ topic-xyz │ 0 │ 1 │ true │ true │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 3 │ false │ false │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 2 │ false │ false │ true ║ ╚═══════════╧═══════════╧═══════════╧═══════════╧═════════════════════╧════════════╝
  3. kafka-topics.sh, kafka-configs.sh, kafka- consumer-groups.sh ... → Ok for simple queries

    → Difficult to feed output for another command/ script → Can't query complex conditions $ ./kafka-topics.sh ... --describe --topic topic-xyz Topic: topic-xyz PartitionCount: 6 ReplicationFactor: 3 Configs: min.insync.replicas=2 Topic: topic-xyz Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,3,1 Topic: topic-xyz Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 2,3,1
  4. SQL for query, JSON or SSV for output SQL: →

    Can express very complex conditions → Built-in functions for aggregations → SUM(), COUNT(), AVG() JSON/SSV: → Easy for scripts to parse → Easy to chain commands by pipe
  5. Usage - interactive console $ kmql --bootstrap-servers="YOUR CLUSTER's bootstrap.servers" query>

    SELECT * FROM replicas LIMIT 3; ╔═══════════╤═══════════╤═══════════╤═══════════╤═════════════════════╤════════════╗ ║ TOPIC │ PARTITION │ BROKER_ID │ IS_LEADER │ IS_PREFERRED_LEADER │ IS_IN_SYNC ║ ╠═══════════╪═══════════╪═══════════╪═══════════╪═════════════════════╪════════════╣ ║ topic-xyz │ 0 │ 1 │ true │ true │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 3 │ false │ false │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 2 │ false │ false │ true ║ ╚═══════════╧═══════════╧═══════════╧═══════════╧═════════════════════╧════════════╝
  6. Usage - processing the output JSON: kmql ... -e "SELECT

    * FROM replicas LIMIT 3" --format=json | jq . [ { "TOPIC": "topic-xyz", "PARTITION": 0, "BROKER_ID": 1, "IS_LEADER": true, "IS_PREFERRED_LEADER": true, "IS_IN_SYNC": true }, ... ]
  7. Usage - processing the output SSV (Space-Separated-Values): $ kmql ...

    -e "SELECT * FROM replicas LIMIT 3" --format=ssv | grep ... # TOPIC PARTITION BROKER_ID IS_LEADER IS_PREFERRED_LEADER IS_IN_SYNC topic-xyz 0 1 true true true topic-xyz 0 3 false false true topic-xyz 0 2 false false true
  8. Features → Table caching → Keyword (table names, column names)

    completion → ANSI SQL (thanks to the underlying H2 DB)
  9. Supported Tables → replicas - topic, partition, assigned broker, ISR

    state etc. → brokers - brokers in the cluster, hostname, port, rack, and controllership → logdirs - directories storing topic data, topic, partition, size → configs - static/dynamic configurations like min.insync.replicas, retention.ms → consumers - all consumer groups, coordinator broker, group state, host and topic/partitions → acls - ACLs. (To-Be-Come)
  10. Simple Ones # Dump replica info for the specific topic

    SELECT * FROM replicas WHERE topic = 'topic-name' # Topic/partitions which its leader is assigned to broker 1 SELECT topic, partition FROM replicas WHERE broker_id = 1 AND is_leader # Topic partitions that has an out-of-sync replica SELECT DISTINCT topic, partition FROM replicas WHERE NOT is_in_sync
  11. Broker hostnames along with topic/ partitions that has out-of-sync replicas

    SELECT host, topic, partition FROM replicas JOIN brokers ON broker_id = brokers.id WHERE NOT is_in_sync; ╔═════════════╤═══════════╤═══════════╗ ║ HOST │ TOPIC │ PARTITION ║ ╠═════════════╪═══════════╪═══════════╣ ║ host008.com │ topic-xyz │ 0 ║ ╟─────────────┼───────────┼───────────╢ ║ host009.com │ topic-xyz │ 2 ║ ╟─────────────┼───────────┼───────────╢ ║ host012.com │ topic-xyz │ 8 ║ ╚═════════════╧═══════════╧═══════════╝
  12. Top disk space consuming cleanup.policy=compaction topics SELECT topic, SUM(size) as

    total_size FROM logdirs JOIN configs ON topic = configs.name WHERE resource_type = 'topic' AND key = 'cleanup.policy' AND value = 'compact' GROUP BY topic ORDER BY total_size DESC LIMIT 3; ╔═══════════════════════════╤═════════════╗ ║ TOPIC │ TOTAL_SIZE ║ ╠═══════════════════════════╪═════════════╣ ║ large-compaction-topicA │ 30732396152 ║ ╟───────────────────────────┼─────────────╢ ║ middle-compaction-topic │ 24356079436 ║ ╟───────────────────────────┼─────────────╢ ║ so-so-topic │ 15073406600 ║ ╚═══════════════════════════╧═════════════╝
  13. Consumer group along with their host that is consuming any

    topic from broker=3 SELECT DISTINCT group_id, host FROM consumers JOIN replicas ON consumers.topic = replicas.topic AND consumers.partition = replicas.partition WHERE broker_id = 3 AND is_leader; ╔══════════════════════════════════════════╤═════════════════╗ ║ GROUP_ID │ HOST ║ ╠══════════════════════════════════════════╪═════════════════╣ ║ group-A │ /xxx.yy.123.251 ║ ╟──────────────────────────────────────────┼─────────────────╢ ║ group-B │ /xxx.yy.123.76 ║ ╟──────────────────────────────────────────┼─────────────────╢ ║ group-C │ /xxx.yy.123.162 ║ ╚══════════════════════════════════════════╧═════════════════╝