I want to use a KVS on Kubernetes as if it were managed! / Managed KVS on Kubernetes

I want to use a KVS on Kubernetes as if it were managed!
@zuiurs
Cloud Native Meetup Tokyo #10
2019/09/26

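Self-introduction
• Mizuki Urushida @zuiurs
• Joined CyberAgent, Inc. in 2018
  ◦ Cross-division infrastructure team, AI Business Division
• Work
  ◦ Development of AKE (Kubernetes as a Service)
  ◦ Internal Go study group
  ◦ A bit of OpenStack, networking, and so on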


[Premise]
Managed services are great

[But]
• I want to keep everything within Kubernetes
• I want to make good use of on-premises resources

Agenda
• Kubernetes Operator (3 min)
• Redis Operator (17 min)
  ◦ How to use it
  ◦ Sentinel architecture
  ◦ Performance comparison with ElastiCache
  ◦ Various detailed settings
• TiKV (10 min)
  ◦ How to use it
  ◦ Architecture
  ◦ Performance

Kubernetes Operator

Stateful applications have become easier to use
• Kubernetes Operators are becoming widespread
  ◦ A mechanism that automates the operations a human (operator) used to perform
• Operator ≒ CRD + CustomController
  ◦ CRD: Declarative API
  ◦ CustomController: API Logic

    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
      name: samples.zuiurs.com
    spec:
      group: zuiurs.com
      version: v1
      scope: Namespaced
      names:
        kind: Sample
        plural: samples

[Diagram: the Operator manages and operates Sample resources, which are defined by the CRD]
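
To make the "CRD = declarative API, CustomController = API logic" split a little more concrete, here is a minimal sketch of a single reconcile pass in plain Python. It does not talk to a real cluster: the `Sample` class, its `replicas` field, and the action strings are all hypothetical names invented for this illustration, not part of any real Operator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical custom resource: the desired state declared by the user."""
    name: str
    replicas: int


def reconcile(desired: Sample, running_pods: List[str]) -> List[str]:
    """Return the actions that move the observed state toward the desired state."""
    actions = []
    diff = desired.replicas - len(running_pods)
    if diff > 0:
        # Too few Pods: create the missing ones.
        start = len(running_pods)
        for index in range(start, start + diff):
            actions.append(f"create {desired.name}-{index}")
    elif diff < 0:
        # Too many Pods: delete the surplus from the end of the list.
        for pod in running_pods[desired.replicas:]:
            actions.append(f"delete {pod}")
    return actions


print(reconcile(Sample("sample", 3), ["sample-0"]))
```

A real custom controller would run this kind of comparison repeatedly, triggered by changes to the resource, rather than once.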

I want to run things like a managed service with an Operator

Redis Operator

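Redis Operator
• An Operator for managing Redis in an HA configuration
  ◦ HA is provided by Redis Sentinel
  ◦ https://github.com/spotahome/redis-operator
• Features
  ◦ Sets up an HA Redis configuration without any extra thought
  ◦ Hands out a Redis endpoint in one shot
  ◦ Read Replicas are easy to scale
  ◦ Assumes access from inside the cluster
  ◦ Because it uses Sentinel, there is no sharding (Writes go to a single node)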

Installing the Operator
• Helm Chart, or apply the configuration file directly
• Helm is recommended
  ◦ Parameters are easy to change
  ◦ The Operator becomes easier to manage

    $ git clone https://github.com/spotahome/redis-operator.git
    $ cd redis-operator

    # Helm (Recommended)
    $ helm install --name redis-ha charts/redisoperator

    # or
    # apply configuration file directly
    $ kubectl apply -f example/operator/all-redis-operator-resources.yaml

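Operator installation tips
• Configure the chart to create dedicated RBAC resources
  ◦ The default values.yaml does not create RBAC resources
  ◦ The default ServiceAccount does not have permission to create CRDs
• Use a separate Namespace
  ◦ Users rarely touch the Operator, so hide it in a different space

    # --set defines a parameter; defining it in values.yaml is also OK
    $ helm install \
        --name redis-ha \
        --set rbac.install=true \
        --namespace operators \
        charts/redisoperator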

[Diagram: Namespace operators contains the Redis Operator Pod, shown next to Namespace default. Legend: Pod, Service]

Deploying Redis
• Define a RedisFailover resource
• Configure each component
  ◦ sentinel and redis
  ◦ Detailed features are covered later

    apiVersion: databases.spotahome.com/v1
    kind: RedisFailover
    metadata:
      name: redis
    spec:
      sentinel:
        replicas: 3
        resources:
          requests:
            cpu: 100m
      redis:
        replicas: 3
        resources:
          requests:
            cpu: 100m
            memory: 100Mi

rfr- is Redis, rfs- is Sentinel

    $ kubectl get pod,svc
    NAME                             READY   STATUS    RESTARTS   AGE
    pod/rfr-redis-0                  1/1     Running   0          4d19h
    pod/rfr-redis-1                  1/1     Running   0          4d19h
    pod/rfr-redis-2                  1/1     Running   0          4d19h
    pod/rfs-redis-7cb5779964-9vmpd   1/1     Running   0          4d19h
    pod/rfs-redis-7cb5779964-c2gb9   1/1     Running   0          4d15h
    pod/rfs-redis-7cb5779964-hlpfr   1/1     Running   0          4d15h

    NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
    service/kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP     38d
    service/rfs-redis    ClusterIP   10.106.233.2   <none>        26379/TCP   4d19h

[Diagram: Namespace operators contains the Redis Operator; Namespace default contains three Redis Pods and three Redis Sentinel Pods. Legend: Pod, ClusterIP Service, Deployment, StatefulSet]

[Diagram: the same Pods, now shown as managed by the RedisFailover resource. Legend: Pod, ClusterIP Service]

Trying SET/GET against the Redis Master

    $ redis-cli -h rfs-redis -p 26379 \
    > sentinel get-master-addr-by-name mymaster
    1) "10.112.0.29"
    2) "6379"
    $ redis-cli -h 10.112.0.29 \
    > set hello world
    OK
    $ redis-cli -h 10.112.0.29 \
    > get hello
    "world"

Sentinel

A quick explanation of Sentinel

Redundancy options provided by Redis
• Replication
  ◦ Simply copies the Master's data via the slaveof setting
  ◦ The Master is a SPoF, so nobody runs it as-is
• Cluster
  ◦ A sharding mechanism (Writes scale)
  ◦ A certain cloud's managed KVS uses this
• Sentinel
  ◦ A mechanism that makes Replication highly available
  ◦ Explained on the next slide →

[Diagram (Replication): SET key value goes to Redis (Master), which copies it to two Redis (Slave) nodes]
[Diagram (Cluster): three Master/Slave pairs (Master0, Master1, Master2) split the hash slots into 0~5461, 5462~10922, and 10923~16383; SET hello world is routed to the Master that owns the slot for hello ("hello goes here")]
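
How a key such as hello ends up on one particular shard can be sketched as follows. Redis Cluster maps a key to one of 16384 hash slots using CRC16 (the XMODEM variant) modulo 16384. The slot ranges below are copied from the diagram, but the shard labels (`shard-a` and so on) are made up for this example, and hash tags (`{...}` inside a key) are deliberately ignored to keep the sketch short.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key (hash tags are not handled in this sketch)."""
    return crc16_xmodem(key.encode()) % HASH_SLOTS


# Slot ranges from the diagram; the shard labels are hypothetical.
SHARDS = [
    ("shard-a", range(0, 5462)),
    ("shard-b", range(5462, 10923)),
    ("shard-c", range(10923, 16384)),
]


def shard_for(key: str) -> str:
    """Name of the shard whose slot range contains the key's slot."""
    slot = key_slot(key)
    return next(name for name, slots in SHARDS if slot in slots)


print(key_slot("hello"), shard_for("hello"))
```

With Sentinel, by contrast, there is no such routing step: every Write goes to the single Master.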

Redis Sentinel
• Monitors the Replication state and manages the configuration
  ◦ When the Master goes down, promotes a Slave to Master (removes its slaveof)
  ◦ Points the other Slaves at the new Master (resets their slaveof)
• Clients learn which node is Master/Slave from Sentinel

[Diagram: Redis (Master) replicates to two Redis (Slave) nodes while three Redis (Sentinel) nodes monitor them]

Redis Sentinel (after failover)
[Diagram: next to the failed Redis (Master), one Redis (Slave) has become the new Redis (Master); the Redis (Sentinel) nodes apply configuration changes to the promoted node and to the remaining Redis (Slave), and Replication now flows from the new Master]
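
The promote-and-repoint behaviour can be modelled with a small simulation. This is only a sketch under simplifying assumptions: the topology is a plain dictionary, and the new Master is simply the first Slave by name, whereas real Sentinel first needs enough Sentinels to agree that the Master is down and then ranks the candidates. The node names are hypothetical.

```python
from typing import Dict, Optional


def failover(topology: Dict[str, Optional[str]], failed: str) -> Dict[str, Optional[str]]:
    """Return the topology after the Master `failed` goes down.

    `topology` maps each node to the Master it replicates from (None = Master).
    """
    if failed not in topology or topology[failed] is not None:
        raise ValueError(f"{failed} is not the current master")
    slaves = sorted(node for node, master in topology.items() if master == failed)
    if not slaves:
        raise RuntimeError("no slave available for promotion")
    new_master = slaves[0]  # simplification: real Sentinel ranks the candidates
    result: Dict[str, Optional[str]] = {}
    for node in slaves:
        # Promoted node: slaveof is removed. Others: slaveof points at the new Master.
        result[node] = None if node == new_master else new_master
    return result


before = {"redis-0": None, "redis-1": "redis-0", "redis-2": "redis-0"}
print(failover(before, "redis-0"))
```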

[Diagram: the Redis and Redis Sentinel Pods mapped onto Kubernetes resources. Legend: Pod, ClusterIP Service, Deployment, StatefulSet]

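Redis Operator features
• Supported versions
  ◦ Any version works as long as a container image exists
    ▪ https://hub.docker.com/_/redis
  ◦ 2.8 through 5.0.5 use Sentinel 2, so 2.8 or later looks like a good choice
  ◦ Depending on where the Master sits, a Rolling Update can trigger Failover several times
• Backup
  ◦ Redis settings can be written in .spec.redis.customConfig[]
    ▪ One line per entry, e.g. - "save 300 10"
  ◦ Put the data directory on an RWX Volume and copy it with a CronJob
• Affinity
  ◦ Set .spec.redis.affinity

MEMO: Downgrading is mostly a no-go because RDB is not backward compatible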

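Failover settings
• Configurable through .spec.sentinel.customConfig[] (two items)
  ◦ down-after-milliseconds 1000 sets it to 1 second (default: 5 seconds)
  ◦ failover-timeout 3000 sets how long to wait for Failover to complete
• Caveats
  ◦ The format is the Operator's own
    ▪ Right: down-after-milliseconds 1000
    ▪ Wrong: sentinel down-after-milliseconds mas 1000
    ▪ .spec.redis.customConfig[] uses the normal format, so no worries there
  ◦ If you set one, set both together
    ▪ If only one is set, the other ends up undefined (bug)
    ▪ It does not even get the Operator's default; it falls back to the Redis default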
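
The two caveats above are easy to trip over, so a small pre-flight check can help. The following is a hypothetical helper, not something shipped with the Operator: it only encodes the two rules from this slide (no sentinel prefix, and set both items together) for a list of strings destined for .spec.sentinel.customConfig[].

```python
from typing import List

# The two items this slide says are configurable.
SENTINEL_KEYS = ("down-after-milliseconds", "failover-timeout")


def check_sentinel_custom_config(entries: List[str]) -> List[str]:
    """Return human-readable problems found in the customConfig entries."""
    problems = []
    seen = set()
    for entry in entries:
        words = entry.split()
        if not words:
            continue
        if words[0] == "sentinel":
            # Plain sentinel.conf syntax is the wrong format here.
            problems.append(f"'{entry}': drop the 'sentinel' prefix and the master name")
            continue
        seen.add(words[0])
    missing = [key for key in SENTINEL_KEYS if key not in seen]
    if seen & set(SENTINEL_KEYS) and missing:
        problems.append("set both together; missing: " + ", ".join(missing))
    return problems


print(check_sentinel_custom_config(["down-after-milliseconds 1000"]))
```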

Performance measurement
• Used memtier_benchmark
  ◦ Used instead of redis-benchmark as a carry-over from the Cluster evaluation
  ◦ RedisLabs/memtier_benchmark
  ◦ Tuned the Pipeline count and compared peak performance
  ◦ The managed side was not configured as a Redis Cluster

    Flavor                  Core  RAM (GiB)  BW (Gbps)  CPU (Xeon)                Single Thread Rate
    An on-prem k8s node     4     14         ~10        E5-2680 v3 @2.5GHz        1862
    A managed instance      4     16         1~10       Platinum 8175M @ 2.5GHz   1796

    cpu scored by cpubenchmark.net

MEMO: Chosen so that the CPU becomes the bottleneck

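Max Ops
• Measured SET/GET requests per second
• GET (Read) shows a large difference
  ◦ Possibly dependent on CPU cache performance?
• SET (Write) shows a small difference
  ◦ Possibly dependent on the bus clock?
• Newer CPUs seem to do well

    E5-2680 v3 @2.5GHz        1862
    Platinum 8175M @ 2.5GHz   1796

MEMO: Redis is single-threaded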

Bonus (for server-side engineers)

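Sentinel-specific cluster access (Operator version)
1. Access Sentinel through the ClusterIP
2. Request the IP:Port of the Role you want to connect to
   a. Master: SENTINEL get-master-addr-by-name <master_name>
   b. Slave: SENTINEL slaves <master_name>
3. Connect again using the IP:Port you obtained
4. Check the Role with ROLE before using the connection
   a. Connect with KeepAlive
   b. The connection is always dropped on Failover, so just repeat the same procedure without overthinking it

[Diagram: the Client reaches the three Redis Sentinel Pods through the ClusterIP, then connects to the Redis Pods directly]
Official: https://redis.io/topics/sentinel-clients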
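
The four steps can be sketched with nothing but the standard library by speaking the Redis wire protocol (RESP) directly. This is a rough illustration, not a replacement for a real client library: the helper names (`encode_command`, `read_reply`, `call`, `find_master`) are made up for this example, each call opens a fresh connection instead of keeping one alive, and only the Master path (step 2a) is covered.

```python
import io
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def read_reply(reader):
    """Parse one RESP2 reply from a binary file-like object."""
    line = reader.readline().rstrip(b"\r\n")
    kind, rest = line[:1], line[1:]
    if kind == b"+":  # simple string
        return rest.decode()
    if kind == b"-":  # error reply
        raise RuntimeError(rest.decode())
    if kind == b":":  # integer
        return int(rest)
    if kind == b"$":  # bulk string ($-1 means nil)
        size = int(rest)
        return None if size < 0 else reader.read(size + 2)[:-2].decode()
    if kind == b"*":  # array, possibly nested
        count = int(rest)
        return None if count < 0 else [read_reply(reader) for _ in range(count)]
    raise ValueError(f"unexpected reply: {line!r}")


def call(host: str, port: int, *args: str):
    """Open a connection, send one command, and return its reply."""
    with socket.create_connection((host, port), timeout=3) as sock:
        sock.sendall(encode_command(*args))
        with sock.makefile("rb") as reader:
            return read_reply(reader)


def find_master(sentinel_host: str, master_name: str, sentinel_port: int = 26379):
    """Steps 1 to 4 for the Master role; raises if the node is not a master."""
    reply = call(sentinel_host, sentinel_port,
                 "SENTINEL", "get-master-addr-by-name", master_name)
    if reply is None:
        raise RuntimeError(f"Sentinel does not know {master_name!r}")
    host, port = reply[0], int(reply[1])
    role = call(host, port, "ROLE")
    if role[0] != "master":
        raise RuntimeError(f"{host}:{port} reports role {role[0]!r}; ask Sentinel again")
    return host, port


# Parsing a canned Sentinel reply, so this part runs without a server.
canned = io.BytesIO(b"*2\r\n$11\r\n10.112.0.29\r\n$4\r\n6379\r\n")
print(read_reply(canned))
```

Inside the cluster from the earlier slides, `find_master("rfs-redis", "mymaster")` would be the equivalent of the redis-cli session shown there. On a Failover the caller would simply run it again.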

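Impressions
• Once you understand Sentinel, it is simple and easy to use
• It is single-threaded, so the configuration is easy to reason about
  ◦ Number of Nodes, number of Replicas
• Only reachable from inside the cluster, which makes it slightly awkward to use
  ◦ It would be nice if LBs for Master/Slave were created automatically, but that seems to stray from Sentinel's design philosophy
• Might be usable in production
  ◦ It just uses Sentinel, so it feels stable
  ◦ The Operator itself is also fairly mature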

TiKV

[Screenshots of the CNCF project list; the counts shown are (20), (16), and (6)]
https://www.cncf.io/projects/

https://www.cncf.io/blog/2019/05/21/toc-votes-to-move-tikv-into-cncf-incubator/

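TiKV
• A distributed KVS implemented in Rust
  ◦ Influenced by Google Spanner and Apache HBase
  ◦ Promoted to Incubating in 2019/05
• Developed by PingCAP as the backend store for TiDB
  ◦ TiDB is a MySQL-compatible distributed HTAP database
• Now that it is Incubating, it might catch on?
• Characteristics
  ◦ Distributed KVS (the architecture looks good)
  ◦ Features and documentation are immature
  ◦ Assumes access from inside the cluster (being improved)

MEMO: pronounced "tai-kay-vee"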

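The players in the TiKV architecture
• TiKV
  ◦ Distributed KVS
  ◦ Chunks of data called Regions are shared across multiple nodes and managed with Raft
  ◦ Transaction locking is Optimistic
• Placement Driver (PD)
  ◦ The component that manages the TiKV cluster
  ◦ Balances load and data and performs auto-sharding
  ◦ (It is different, but) picturing it in a Sentinel-like position makes it easier to understand
• RocksDB
  ◦ The backend DB library for TiKV
  ◦ Uses an LSM Tree as its data structure

MEMO: From 3.0 onward, locking can also be Pessimistic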

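[How data is accessed (roughly)]
The Client asks PD where the data (Region) is located, then sends its request
(gRPC keeps the latency of this flow low)

[Diagram: the Client consults PD-0, PD-1, PD-2 and sends GET/PUT to TiKV-0, TiKV-1, TiKV-2; data is synchronized across the Raft group]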

Raft

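Raft (roughly)
• A distributed consensus algorithm
  ◦ Multiple nodes agree on a single consistent value
• Node states
  ◦ Leader
    ▪ Handles all Read/Write requests
    ▪ Propagates the results to Followers as a log
  ◦ Follower
    ▪ Accepts log entries (an entry is committed once a majority has accepted it)
  ◦ Candidate
    ▪ The state while no Leader has been decided

[Figure: illustration of Leader Election] https://raft.github.io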
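
The "committed once a majority has accepted it" rule comes down to a little arithmetic, sketched below. The function names are made up for this example, and `acks` is assumed to count every node that has stored the entry, including the Leader itself.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def is_committed(acks: int, cluster_size: int) -> bool:
    """True once a majority of nodes (Leader included) has stored the entry."""
    return acks >= quorum(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return cluster_size - quorum(cluster_size)


for size in (3, 5):
    print(size, quorum(size), tolerated_failures(size))
```

This is why odd group sizes are the usual choice: going from 3 to 4 nodes raises the quorum from 2 to 3 without increasing the number of failures that can be tolerated.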

Region management with Multi-Raft
• Data is sharded in units called Regions (range-based)
  ◦ Range Scan performance is high

[Figure: architecture diagram with the Leaders marked] https://tikv.org/docs/3.0/concepts/architecture/
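
Why range-based sharding helps Range Scans can be illustrated with a sorted list of Region start keys and a binary search. The Region boundaries and store names below are entirely hypothetical, and the lookup table merely stands in for the answer a client would get from PD; it is not TiKV's actual data structure.

```python
import bisect
from typing import List

# Hypothetical Regions: Region i covers [REGION_STARTS[i], REGION_STARTS[i + 1]).
REGION_STARTS = [b"", b"g", b"p"]
REGION_LEADERS = ["tikv-0", "tikv-1", "tikv-2"]  # made-up Leader store per Region


def locate(key: bytes) -> int:
    """Index of the Region whose key range contains `key`."""
    return bisect.bisect_right(REGION_STARTS, key) - 1


def scan_regions(start: bytes, end: bytes) -> List[int]:
    """Regions touched by a scan over the key range [start, end)."""
    first = locate(start)
    last = bisect.bisect_left(REGION_STARTS, end) - 1
    return list(range(first, max(first, last) + 1))


print(REGION_LEADERS[locate(b"hello")])
print(scan_regions(b"a", b"h"))
```

Because neighbouring keys live in the same or adjacent Regions, a scan only touches a contiguous run of Regions. With hash-based slots, as in Redis Cluster, neighbouring keys would be scattered across shards.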

Let's deploy it

The official TiKV deployment methods

No Kubernetes manifests and no Operator......?

Hold on

It looks like TiDB Operator could be used

I want to use it somehow

https://github.com/pingcap/tidb-operator/issues/267
The gist: if you want to develop with TiKV, set the number of TiDB replicas to 0

[Brute-force deployment]
• Use TiDB Operator
• Set the number of TiDB replicas to 0

Deploying TiDB Operator
• Install from the official repository
  ◦ Tip: RBAC resources are created for you without any extra steps
  ◦ Tip: CRDs are not created for you, so apply the ones for the target release yourself

    $ helm repo add pingcap https://charts.pingcap.org/
    $ helm repo update

    # --version: the Helm Chart release number (check it with helm search)
    $ helm install \
        --name tidb \
        --namespace operators \
        --version v1.0.0 \
        pingcap/tidb-operator

    # Install the CRDs (use the branch that matches the release number)
    $ kubectl apply -f \
        https://raw.githubusercontent.com/pingcap/tidb-operator/release-1.0/manifests/crd.yaml

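[Diagram: Namespace operators contains TiDB Controller Manager and TiDB Scheduler, shown next to Namespace default]
• TiDB Controller Manager
  ◦ The core of the Operator
  ◦ Manages the various resources
• TiDB Scheduler
  ◦ Pod scheduling (Scheduler Extender)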

Kubernetes Scheduler Extender
• A feature that lets users extend the scheduling process
  ◦ A Custom Controller customizes kube-controller-manager
  ◦ A Scheduler Extender customizes kube-scheduler
• TiDB Scheduler
  ◦ Adjusts scheduling so that multiple TiKV/PD Pods are not placed on the same Node

[Diagram: kube-scheduler narrows the candidate Nodes with Predicates (filtering) and Priorities (scoring) to decide on a Node; the Scheduler Extender adds your own filtering and scoring logic]
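
The filtering and scoring steps can be pictured as two plain functions. This is a simplified model: a real Scheduler Extender is called by kube-scheduler over HTTP, and the sketch makes no claim about how TiDB Scheduler is actually implemented. The node names, the data layout, and the "fewer Pods is better" scoring rule are assumptions made up for illustration; only the "no two TiKV/PD Pods on the same Node" filter comes from the slide.

```python
from typing import Dict, List, Optional


def filter_nodes(candidates: List[str], pods_on_node: Dict[str, List[str]],
                 component: str) -> List[str]:
    """Filtering step: drop Nodes that already run a Pod of the same component."""
    return [node for node in candidates
            if component not in pods_on_node.get(node, [])]


def score_node(node: str, pods_on_node: Dict[str, List[str]]) -> int:
    """Scoring step (made-up rule): fewer Pods on the Node gives a higher score."""
    return -len(pods_on_node.get(node, []))


def pick_node(candidates: List[str], pods_on_node: Dict[str, List[str]],
              component: str) -> Optional[str]:
    """Run filtering, then scoring; return None when no Node is feasible."""
    feasible = filter_nodes(candidates, pods_on_node, component)
    if not feasible:
        return None
    # Sort first so that ties are broken by name, deterministically.
    return max(sorted(feasible), key=lambda node: score_node(node, pods_on_node))


pods = {"node-a": ["tikv"], "node-b": ["pd"], "node-c": []}
print(pick_node(["node-a", "node-b", "node-c"], pods, "tikv"))
```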

Deploying the TiKV cluster
• Use the Helm Chart for clusters
  ◦ Create it with the TiDB count set to 0
  ◦ storageClassName is required
    ▪ If it is not specified, the default SC is not picked up, so PVC creation fails

    $ helm install \
        --name production \
        --set tidb.replicas=0 \
        --set pd.storageClassName=silver \
        --set tikv.storageClassName=silver \
        pingcap/tidb-cluster

[Diagram: Namespace operators contains TiDB Controller Manager and TiDB Scheduler; Namespace default contains three TiKV Pods, three PD Pods, Monitor, and Discovery, with ClusterIP and NodePort Services. Legend: Pod, ClusterIP Service, Deployment, StatefulSet. *Many details omitted]

Affinity settings
• Affinity
  ◦ Use a Helm override file
  ◦ Other parameters may as well be set here too

    # helm install -f override.yaml -n tikv pingcap/tidb-cluster
    tikv:
      storageClassName: silver
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodepool-name
                operator: In
                values:
                - tikv
    tidb:
      replicas: 0

Monitoring
• Dashboards are created for you with Prometheus + Grafana
  ◦ A TiDB Operator feature
  ◦ Created when .monitor.create = true
• The dashboards are rich
• Watching PD is basically enough
  ◦ PD manages capacity and Regions

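Downtime
• Recovers within a few seconds to about 20 seconds after the Leader is taken down
  ◦ Within 2x the Lease Time (time spent waiting for the Leader + time until promotion)
    ▪ The Lease Time is 10 seconds
  ◦ Failover happens at the Raft level
• When a Pod goes down
  ◦ Kubernetes auto-heals it
• When a Pod stops responding
  ◦ A new node is created
    ▪ Configured with .controllerManager.tikvFailoverPeriod
    ▪ The default is 5 minutes
    ▪ Set in the Operator's values.yaml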

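Backing up TiKV
• There does not seem to be a way to back up a TiKV cluster
  ◦ It is still at the stage of defining an RFC (#11)
    ▪ It has been left untouched for about half a year
    ▪ In the end it will probably create a backup Node that belongs to every Raft group and writes out to various data stores
  ◦ The RocksDB dump can be used, but it works per instance, so it is tough
    ▪ tikv-ctl ldb --hex --db=/path/to/db dump
• Note that the Backup feature built into the Operator is for TiDB
  ◦ Dumps in MySQL format
  ◦ TiKV keys unrelated to SQL are naturally not saved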

Clients
• There does not seem to be anything as handy as redis-cli
  ◦ tikv-ctl has no PUT-type commands
• Client libraries for various languages are plentiful
  ◦ Rust, Go, Java, C
  ◦ zuiurs/tikv-cli (a client that only does GET/PUT)
• go-ycsb, developed by PingCAP, supports TiKV

    $ kubectl run -it --rm --image=pingcap/go-ycsb \
        --restart=Never tikv-cli --command -- \
        /go-ycsb shell tikv -p tikv.pd=tikv-pd:2379

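Performance
• Results from the latest go-ycsb
  ◦ A Go implementation of Yahoo! YCSB
  ◦ It did not work well in my environment, so the official results are quoted
• Workload ratios (R:W)
  ◦ A: 50:50
  ◦ B: 95:5
  ◦ C: 100:0
• It is a distributed KVS, so it is not compared with Redis

https://tikv.org/blog/tikv-3.0ga/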

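Impressions
• The architecture looks good and inspires confidence
  ◦ Distributed databases are scary when they break, but fault tolerance seems high
• Still under development, so it does not seem user friendly
• The documentation is gradually filling out, so there is reason for optimism
  ◦ The Deep Dive material on the official site is especially thorough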

Redis Operator looks good
TiKV looks promising too

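Summary
• Introduced ways to easily deploy a KVS on Kubernetes
  ◦ Redis Operator
  ◦ TiKV
• Redis Operator runs on Sentinel
  ◦ It feels stable, so it seems usable in production
• TiKV looks likely to catch on
  ◦ The architecture looks very good
  ◦ It is still immature and seems hard to operate, so the hope is on future progress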

Thank you for your attention

References
• Redis
  ◦ https://redis.io/topics/sentinel
  ◦ https://redis.io/topics/sentinel-clients
• TiKV
  ◦ https://tikv.org/docs/deep-dive/introduction/
  ◦ https://github.com/tikv/tikv/wiki
  ◦ Building a Transactional Key-Value Store That Scales to 100+ Nodes - FileId - 137042.pdf
  ◦ TiKV Best Practices.pdf
  ◦ https://www.slideshare.net/pfi/raft-36155398
  ◦ https://gist.github.com/sile/ad435262c17eb79f133d
