Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Introducing kmql

Yuto Kawamura
December 18, 2020

Introducing kmql

Yuto Kawamura

December 18, 2020
Tweet

More Decks by Yuto Kawamura

Other Decks in Technology

Transcript

  1. Me → Yuto Kawamura → Lead of Kafka infra team

    @ LINE → Performance Engineering/Reliability Engineering → Speaker at Kafka Summit 2017, 2018, Kafka Meetup ... many times
  2. Query Kafka cluster's metadata with SQL $ kmql --bootstrap-servers="YOUR CLUSTER's

    bootstrap.servers" query> SELECT * FROM replicas LIMIT 3; ╔═══════════╤═══════════╤═══════════╤═══════════╤═════════════════════╤════════════╗ ║ TOPIC │ PARTITION │ BROKER_ID │ IS_LEADER │ IS_PREFERRED_LEADER │ IS_IN_SYNC ║ ╠═══════════╪═══════════╪═══════════╪═══════════╪═════════════════════╪════════════╣ ║ topic-xyz │ 0 │ 1 │ true │ true │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 3 │ false │ false │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 2 │ false │ false │ true ║ ╚═══════════╧═══════════╧═══════════╧═══════════╧═════════════════════╧════════════╝
  3. kafka-topics.sh, kafka-configs.sh, kafka- consumer-groups.sh ... → Ok for simple queries

    → Difficult to feed output for another command/ script → Can't query complex conditions $ ./kafka-topics.sh ... --describe --topic topic-xyz Topic: topic-xyz PartitionCount: 6 ReplicationFactor: 3 Configs: min.insync.replicas=2 Topic: topic-xyz Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 2,3,1 Topic: topic-xyz Partition: 1 Leader: 1 Replicas: 1,2,3 Isr: 2,3,1
  4. SQL for query, JSON or SSV for output SQL: →

    Can express very complex conditions → Built-in functions for aggregations → SUM(), COUNT(), AVG() JSON/SSV: → Easy for scripts to parse → Easy to chain commands by pipe
  5. Usage - interactive console $ kmql --bootstrap-servers="YOUR CLUSTER's bootstrap.servers" query>

    SELECT * FROM replicas LIMIT 3; ╔═══════════╤═══════════╤═══════════╤═══════════╤═════════════════════╤════════════╗ ║ TOPIC │ PARTITION │ BROKER_ID │ IS_LEADER │ IS_PREFERRED_LEADER │ IS_IN_SYNC ║ ╠═══════════╪═══════════╪═══════════╪═══════════╪═════════════════════╪════════════╣ ║ topic-xyz │ 0 │ 1 │ true │ true │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 3 │ false │ false │ true ║ ╟───────────┼───────────┼───────────┼───────────┼─────────────────────┼────────────╢ ║ topic-xyz │ 0 │ 2 │ false │ false │ true ║ ╚═══════════╧═══════════╧═══════════╧═══════════╧═════════════════════╧════════════╝
  6. Usage - processing the output JSON: kmql ... -e "SELECT

    * FROM replicas LIMIT 3" --format=json | jq . [ { "TOPIC": "topic-xyz", "PARTITION": 0, "BROKER_ID": 1, "IS_LEADER": true, "IS_PREFERRED_LEADER": true, "IS_IN_SYNC": true }, ... ]
  7. Usage - processing the output SSV (Space-Separated-Values): $ kmql ...

    -e "SELECT * FROM replicas LIMIT 3" --format=ssv | grep ... # TOPIC PARTITION BROKER_ID IS_LEADER IS_PREFERRED_LEADER IS_IN_SYNC topic-xyz 0 1 true true true topic-xyz 0 3 false false true topic-xyz 0 2 false false true
  8. Features → Table caching → Keyword (table names, column names)

    completion → ANSI SQL (thanks to the underlying H2 DB)
  9. Supported Tables → replicas - topic, partition, assigned broker, ISR

    state etc. → brokers - brokers in the cluster, hostname, port, rack, and controllership → logdirs - directories storing topic data, topic, partition, size → configs - static/dynamic configurations like min.insync.replicas, retention.ms → consumers - all consumer groups, coordinator broker, group state, host and topic/partitions → acls - ACLs. (To-Be-Come)
  10. Simple Ones # Dump replica info for the specific topic

    SELECT * FROM replicas WHERE topic = 'topic-name' # Topic/partitions which its leader is assigned to broker 1 SELECT topic, partition FROM replicas WHERE broker_id = 1 AND is_leader # Topic partitions that has an out-of-sync replica SELECT DISTINCT topic, partition FROM replicas WHERE NOT is_in_sync
  11. Broker hostnames along with topic/ partitions that has out-of-sync replicas

    SELECT host, topic, partition FROM replicas JOIN brokers ON broker_id = brokers.id WHERE NOT is_in_sync; ╔═════════════╤═══════════╤═══════════╗ ║ HOST │ TOPIC │ PARTITION ║ ╠═════════════╪═══════════╪═══════════╣ ║ host008.com │ topic-xyz │ 0 ║ ╟─────────────┼───────────┼───────────╢ ║ host009.com │ topic-xyz │ 2 ║ ╟─────────────┼───────────┼───────────╢ ║ host012.com │ topic-xyz │ 8 ║ ╚═════════════╧═══════════╧═══════════╝
  12. Top disk space consuming cleanup.policy=compaction topics SELECT topic, SUM(size) as

    total_size FROM logdirs JOIN configs ON topic = configs.name WHERE resource_type = 'topic' AND key = 'cleanup.policy' AND value = 'compact' GROUP BY topic ORDER BY total_size DESC LIMIT 3; ╔═══════════════════════════╤═════════════╗ ║ TOPIC │ TOTAL_SIZE ║ ╠═══════════════════════════╪═════════════╣ ║ large-compaction-topicA │ 30732396152 ║ ╟───────────────────────────┼─────────────╢ ║ middle-compaction-topic │ 24356079436 ║ ╟───────────────────────────┼─────────────╢ ║ so-so-topic │ 15073406600 ║ ╚═══════════════════════════╧═════════════╝
  13. Consumer group along with their host that is consuming any

    topic from broker=3 SELECT DISTINCT group_id, host FROM consumers JOIN replicas ON consumers.topic = replicas.topic AND consumers.partition = replicas.partition WHERE broker_id = 3 AND is_leader; ╔══════════════════════════════════════════╤═════════════════╗ ║ GROUP_ID │ HOST ║ ╠══════════════════════════════════════════╪═════════════════╣ ║ group-A │ /xxx.yy.123.251 ║ ╟──────────────────────────────────────────┼─────────────────╢ ║ group-B │ /xxx.yy.123.76 ║ ╟──────────────────────────────────────────┼─────────────────╢ ║ group-C │ /xxx.yy.123.162 ║ ╚══════════════════════════════════════════╧═════════════════╝