Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
平和なConsul Cluster運用 / consul-casual-1
Search
FUJIWARA Shunichiro
August 01, 2016
Technology
17
2.6k
平和なConsul Cluster運用 / consul-casual-1
Consul Casual Talks #1
http://connpass.com/event/35836/
FUJIWARA Shunichiro
August 01, 2016
Tweet
Share
More Decks by FUJIWARA Shunichiro
See All by FUJIWARA Shunichiro
ISUCONに強くなるかもしれない日々の過ごしかた/Findy ISUCON 2024-11-14
fujiwara3
8
870
「最高のチューニング」をしないために / hack@delta 24.10
fujiwara3
21
3.8k
AWS Lambdaで実現するスケーラブルで低コストなWebサービス構築/YAPC::Hakodate2024
fujiwara3
10
4.3k
CEL(Common Expression Language)で書いた条件にマッチしたIAM Policyを見つける / iam-policy-finder
fujiwara3
2
1.4k
awslim - Goで実装された高速なAWS CLIの代替品を作った/layerx.go#1
fujiwara3
6
730
AWS CLIの起動が重くてつらいので aws-sdk-client-go を書いた / kamakura.go#6
fujiwara3
7
10k
コードを書く隙間を見つけて生きていく技術/Findy 思考の現在地
fujiwara3
31
7.1k
fujiwara-ware OSSをひたすら紹介する/ya8-2024
fujiwara3
8
770
Amazon ECSで好きなだけ検証環境を起動できるOSSの設計・実装・運用 / YAPC::Hiroshima 2024
fujiwara3
25
8.5k
Other Decks in Technology
See All in Technology
ドメインの本質を掴む / Get the essence of the domain
sinsoku
2
150
[FOSS4G 2024 Japan LT] LLMを使ってGISデータ解析を自動化したい!
nssv
1
210
信頼性に挑む中で拡張できる・得られる1人のスキルセットとは?
ken5scal
2
530
VideoMamba: State Space Model for Efficient Video Understanding
chou500
0
190
dev 補講: プロダクトセキュリティ / Product security overview
wa6sn
1
2.3k
Why App Signing Matters for Your Android Apps - Android Bangkok Conference 2024
akexorcist
0
120
Amplify Gen2 Deep Dive / バックエンドの型をいかにしてフロントエンドへ伝えるか #TSKaigi #TSKaigiKansai #AWSAmplifyJP
tacck
PRO
0
370
リンクアンドモチベーション ソフトウェアエンジニア向け紹介資料 / Introduction to Link and Motivation for Software Engineers
lmi
4
300k
オープンソースAIとは何か? --「オープンソースAIの定義 v1.0」詳細解説
shujisado
5
590
マルチモーダル / AI Agent / LLMOps 3つの技術トレンドで理解するLLMの今後の展望
hirosatogamo
37
12k
iOSチームとAndroidチームでブランチ運用が違ったので整理してます
sansantech
PRO
0
130
複雑なState管理からの脱却
sansantech
PRO
1
140
Featured
See All Featured
Why Our Code Smells
bkeepers
PRO
334
57k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
280
13k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
31
2.7k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
16
2.1k
Building Better People: How to give real-time feedback that sticks.
wjessup
364
19k
We Have a Design System, Now What?
morganepeng
50
7.2k
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
Making the Leap to Tech Lead
cromwellryan
133
8.9k
The Art of Programming - Codeland 2020
erikaheidi
52
13k
Documentation Writing (for coders)
carmenintech
65
4.4k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
47
2.1k
GraphQLとの向き合い方2022年版
quramy
43
13k
Transcript
ฏͳ Consul cluster ӡ༻ Consul Casual Talks #1@fujiwara
౻ݪ ढ़Ұ @fujiwara github.com/fujiwara sfujiwara.hatenablog.com ٕज़෦
Game & Community
Agenda Consulͷ׆༻ࣄྫ ฏʹӡ༻͢ΔͨΊͷϙΠϯτ
Consulͷ׆༻ࣄྫ 1. Internal DNS (node, service) 2. maintͰϝϯςφϯε 3. StretcherʹΑΔσϓϩΠɺChef࣮ߦ
4. consul-templateʹΑΔnginxͷઃఆߋ৽ 5. 1͔͠ಈ͔ͨ͘͠ͳ͍daemonͷഉଞىಈ
Internal DNS
Internal DNS (node, service) node໊ (ྫ kayac-web-i-1234567...) service໊(ྫ) • log-aggregator
: Fluentdͷूαʔό • log-analyzer : Norikra • internal-proxy : ֎ʹग़ͯߦͨ͘ΊͷSquid • internal-mta : ֎ʹग़ͯߦͨ͘ΊͷPostfix
Internal DNS (node, service) dnsmasqΛશͰىಈ .consul υϝΠϯͷ໊લղܾconsul agent 127.0.0.1:53 Λdnsmasq͕Listen͢Δ
# dnsmasq.conf server=/consul/127.0.0.1#8600 bind-interfaces listen-address=127.0.0.1
None
Internal DNS (node, service) resolv.conf Ͱ (node|service).consul ΛݕࡧυϝΠϯʹࢦఆ → node໊ɺservice໊͚ͩͰଓͰ͖Δ
# /etc/resolv.conf search node.consul service.consul nameserver 127.0.0.1 # dnsmasq nameserver 172.16.0.2 # VPC resolver nameserver 172.16.0.254 # Unbound on EC2
bash-completionͰsshͷϗετ໊ิ ~/.bash_profile _known_hosts_real() { local members=$(consul members -status=alive | awk
'!/Node/{printf("%s ", $1)}') COMPREPLY=( $( \ compgen -W "$members" \ ${COMP_WORDS[COMP_CWORD]} \ ) ) return 0 } ੜ͖͍ͯΔϗετͷΈิީิʹͳΔ http://qiita.com/sfujiwara/items/f4fa907ead53ed104e1a
FluentdͷूαʔόૹΔઃఆ ConsulͰఏڙ͢ΔDNS໊ϥϯυϩϏϯͰૹ৴ <match **> type forward expire_dns_cache 15 dns_round_robin true
heartbeat_type tcp <server> host log-aggregator.service.consul </server> </match> ૹ৴ઌͷྻڍෆཁɺࣗಈΓ͠
maintͰϝϯςφϯε
consul maint consul maint -enable [-reason "..."] ͋ΔnodeΛϝϯςφϯεϞʔυʹ͢Δ → serviceͷ໊લղܾ͔Β֎ΕΔ
→ nodeͷ໊લղܾͰ͖Δ (sshͱ͔)
consul maint༻ྫ FluentdूαʔόΛϝϯςφϯε͍ͨ͠߹ʹmaint → DNS͔Β֎ΕΔͷͰૹ৴͕ࢭ·Δ (expire_dns_cache ͷઃఆ͕ඞཁ) NorikraͷΠϯελϯεΛऔΓସ͍͑ͨͱ͖ʹ • ৽͍͠ϗετΛ
maint -enable ͰηοτΞοϓ • چͰ maint -enable, ৽Ͱ maint -disable • DNSͰೖΕସΘΔͷͰૹ৴ઌ͕ΓସΘΔ
PackerͰ AMI ࡞࣌ʹ maint 1. consul cluster ʹ join 2.
maint -enable (ߏஙதʹΈࠐ·Εͳ͍Α͏ʹ) 3. ChefͰߏங 4. maintঢ়ଶͷ·· AMI ࡞ 5. AMI͔Βىಈͨ͠Πϯελϯεmaintͷ·· 6. ىಈޙͷॾʑ͕ऴΘͬͨΒ maint -disable → αʔϏεΠϯ
maintͳΒىಈ͠ͳ͍ daemontools ͷ run script #!/bin/bash maint=$(consul maint) if [[
$maint != "" ]]; then echo "$maint" sleep 10 exit 1 fi exec ... ϝϯς࣌ʹىಈͯ͠ཉ͘͠ͳ͍daemonΛ੍ޚ (maint -enableʹͳͬͯstopͨ͠Γ͠ͳ͍)
StretcherʹΑΔσϓϩΠ
StretcherʹΑΔσϓϩΠ github.com/fujiwara/stretcher Consul / Serf ͱ࿈ܞͯ͠ಈ͘σϓϩΠπʔϧ
None
StretcherͰChef࣮ߦ Chef-Server → Stretcher + Chef-Solo • Chef-Serverr͕ SPOF /
ϘτϧωοΫʹͳΒͳ͍ • શʹಉ͡tar, eventΛˠద༻͢ΔjsonΛ֤ϊʔυͰܾఆ # /etc/sysconfig/hostname-prefix HOSTNAME_PREFIX="xxx-app" → nodes/xxx-app.json Λద༻
ChefͷroleݕࡧΛserviceఆٛͰ /etc/consul.d/role.json { "service": { "name": "role", "tags": [ "batch-server",
"db-client", ... ] } } Serviceͱͯ͠ఆٛͯ͠ݕࡧՄೳʹ http://localhost:8500/v1/catalog/service/role? tag=db-client
http://localhost:8500/v1/catalog/service/role? tag=internal-proxy [ { "Node": "xxx-i-10bf0fe2", "Address": "10.0.0.123", "ServiceID": "role",
"ServiceName": "role", ... }, { "Node": "xxx-i-3c1b72b3", "Address": "10.0.1.234", "ServiceID": "role", "ServiceName": "role", ... } ]
DaemontoolsཧԼͷdaemonserviceఆٛ { "service": { "name": "daemontools", "tags": [ "app", stretcher",
"gunfish", ... ] } }
͋ΔdaemontoolsཧϓϩηεΛ࠶ىಈ͍ͨ͠ curl http://localhost:8500/v1/catalog/service/ daemontools?tag=gunfish | jq -r ".[].Node" xxx-admin-i-0391d6162be552655 xxx-app-i-01a7ff42f4796be4f
xxx-app-i-05bd652734828b522 xxx-batch-i-0095ac858fe87d8e5 Regexp::TrieͰ࠷దͳਖ਼نදݱʹͯ͠ consul exec consul exec -node '(?:xxx\-(?:a(?:dmin|pp)|batch))' "svc -h /service/gunfish"
consul-templateʹΑΔnginxͷઃఆߋ৽
consul-template https://github.com/hashicorp/consul-template • Consul KVͷɺServiceͷղܾ݁ՌͳͲΛτϦΨʹ • ςϯϓϨʔτߋ৽ɺҙscript kick͕Ͱ͖Δ
nginxͷઃఆߋ৽ # config.hcl template { source = "/etc/nginx/spam.ip.conf.ctmpl" destination =
"/etc/nginx/spam.ip.conf" command = "service nginx reload" perms = 0644 backup = true } # spam.ip.conf.ctmpl {{key "spam_ips"}} localhost:8500/v1/kv/spam_ips ʹPUT͢Δ͚ͩͰઃఆߋ৽
1͔͠ಈ͔ͨ͘͠ͳ͍daemonͷഉଞىಈ
1͔͠ಈ͔ͨ͘͠ͳ͍daemonͷഉଞىಈ WebSocketड৴Ͱಈ͘Slack bot→ 2Ҏ্Ͱಈ͘ͱ͢Δ ͰՄ༻ੑΛ͍࣋ͨͤͨ…
1͔͠ಈ͔ͨ͘͠ͳ͍daemonͷഉଞىಈ consul lock Λ͏ ϩοΫΛऔಘͰ͖ͨΒࢦఆͨ͠ίϚϯυ͕࣮ߦ͞ΕΔwrapper consul lock -n 1 nuko
"/path/to/run-nuko.sh" Consul leader͕ೖΕସΘΔͱϩοΫ͕ղ์͞ΕΔͷͰҙ
ฏʹӡ༻͢ΔͨΊͷϙΠϯτ
ฏʹӡ༻͢ΔͨΊͷϙΠϯτ RaftΛ(େ·͔ʹͰ͍͍ͷͰ)͓ͬͯ͘ http://thesecretlivesofdata.com/raft/ ࢄڥͰͷ߹ҙܗΞϧΰϦζϜ • Ϧʔμʔબग़ʹʮաʯͷ߹ҙ͕ඞཁ • 2 = 1མͪΔͱա(=2)͕औΕͳ͍
• 3 = 1མͪͯա(=2)͕औΕΔ • 4 = 2མͪΔͱա(=3)͕औΕͳ͍
Deployment Table ຊ൪Ͱ࠷3, 3 or 5͕ਪ consul.io/docs/internals/consensus.html
ServerʹඞཁͳϦιʔε • CPU: 2CPUͰे • Memory: 20MBʙ • Disk: 2MBʙ
Memory, DiskKVͷར༻ঢ়گ࣍ୈ KV dump JSON 10MB, data_dir/raft 120MB → consul agent RSS 250MB
Serverʹઐ༻ϗετ͕ඞཁʁ consul agentࣗମͦΕ΄ͲϦιʔεΛ༻͠ͳ͍ Disk IO͕ߴෛՙͳ߹ʹRaftͷHeartbeat͕ࣦഊ͍͢͠ • Timeout 500ms • Heartbeatʹࣦഊ͢ΔͱLeaderબग़͕ߦΘΕΔ
• ௨ৗ2,3ඵͰબग़ྃ͢Δ • Consul server tmpfs Λอଘઌʹͯ͠Disk IOͷӨڹճආ
ߴՄ༻ੑͷͨΊʹ ServerʹΑΓಉ࣌ʹোΛىͯ͜͠ ͳ͍node͕มΘΔ • 3 node → 1 • 5
node → 2 3 nodeߏ࣌ɺ2མͪͯΓ1ʹͳ ͬͯ͠·͏ͱLeader͕બग़Ͱ͖ͳ͍ ࣌ؒΓ͢ϝϯςφϯε࣌ʹҰ࣌ తʹServer nodeΛ૿͢ख
ߴՄ༻ੑͷͨΊʹ Server nodeͷfailoverࣗಈ ! Ϣʔβৗʹlocalhostͷagent͚ͩΛΈ͍ͯΕΑ͍
nodeো࣌ͷӨڹ ! LeaderͰͳ͍ → " ଞnodeʹӨڹͳ͠ ! Leader → "
Leader࠶બग़ σϑΥϧτͰͯ͢ͷಡΈॻ͖ΛLeader͕ॲཧ (ڧҰ؏ੑ) Leader͕ܾ·Δ·ͰΞΫηεෆೳ (DNS, HTTP)
Stale mode (DNS) Leader࠶બग़௨ৗ2ʙ3ඵͰྃ ͦͷؒDNSͰNode, Service໊ղܾΛ͍ͨ͠ʁ → Stale mode :
Leaderະબग़ͰԠՄೳ "dns_config":{ "allow_stale": true, // default false "max_stale": "10s" // default 5s } ݁Ռݹ͍Մೳੑ͕͋Δ(݁Ռ߹ੑ)
DNS TTL defaultTTL 0 → cache͞Εͳ͍ node, serviceผʹTTLΛઃఆՄೳ DNS cache(ͨͱ͑dnsmasq)Λલஈʹஔͯ͠cacheͰ͖Δ
"dns_config":{ "node_ttl": "60s", "service_ttl": { "*": "15s" } }
Stale mode (HTTP API) HTTP APIͰstale modeʹ͢Δ߹Ҿ stale $ curl
"http://127.0.0.1:8500/v1/kv/web/key1?stale" staleҾͳ͠ͰLeaderબग़தʹΞΫηε → 500 Internal Server Error
ӡ༻தͷUpgrade consul.io/docs/upgrading.html consul.io/docs/upgrade-specific.html όʔδϣϯผʹҙ͕͋ΔͷͰυΩϡϝϯτΛ ॱ൪ʹAgentΛೖΕସ͑Δ͜ͱͰ Rolling upgradeՄೳ (Leader nodeೖΕସ͑Ͱ࠶બग़ى͖Δ)
҆ఆੑ v0.2͔࣌Β2Ҏ্ӡ༻ Agentϓϩηε͕མͪͨ͜ͱ1ճ͚ͩ(0.4.1࣌) EBS(gp2)ͷΫϨδοτރׇ → IO waitେྔ → panic: Timeout
starting MDB transaction ΦϖϛεͰServerΛམͱ͗͢͠Δͱճ෮ෆೳ
KVͷόοΫΞοϓ ͋Δ֊ͷԼͷΛ࠶ؼతʹऔΓ͍ͨ߹ recurse $ curl -s "http://127.0.0.1:8500/v1/kv/?recurse" [ {"CreateIndex":112,"ModifyIndex":115,"LockIndex":0, "Key":"key1","Flags":123,"Value":"dGVzdA=="},
{"CreateIndex":122,"ModifyIndex":122,"LockIndex":0, "Key":"key2","Flags":0,"Value":"dGVzdDI="}, {"CreateIndex":124,"ModifyIndex":124,"LockIndex":0, "Key":"test/1","Flags":0,"Value":"dGVzdDM="} ] Key, Flags, ValueΛPUT͠ͳ͓͠ͰϨετΞͰ͖Δ
࣮ࡍʹLeader͕ೖΕସΘͬͨͱ͖ͷϩά 2016/07/30 10:07:28 [WARN] raft: Heartbeat timeout reached, starting election
2016/07/30 10:07:28 [INFO] raft: Node at 10.0.2.132:8300 [Candidate] entering Candidate state 2016/07/30 10:07:30 [WARN] raft: Election timeout reached, restarting election 2016/07/30 10:07:30 [INFO] raft: Node at 10.0.2.132:8300 [Candidate] entering Candidate state 2016/07/30 10:07:30 [INFO] raft: Election won. Tally: 3 2016/07/30 10:07:30 [INFO] raft: Node at 10.0.2.132:8300 [Leader] entering Leader state 2016/07/30 10:07:30 [INFO] consul: cluster leadership acquired 2016/07/30 10:07:30 [INFO] consul: New leader elected: xxx-consul-i-ff26ca5a 2ඵఔͰճ෮ DNSͷcache / stale mode ͰαʔϏεӨڹͳ͠ stale໌ࣔ͠ͳ͍HTTP API500ʹͳΔˠৗ࣌ୟ͖·͘Δͷ…?
࣮ࡍʹ͋ͬͨා͍
consul exec Ͱେྔ݁Ռऔಘ consul exec "cat /var/log/foo.log" | grep ...
֤ϗετͷϩάΛconsul execͰऔಘ͠Α͏ͱͨ͠ → consul exec KVʹҰ୴อଘ͢ΔͷͰϝϞϦ/DBංେԽ serverΛ1ͣͭ࠶ىಈͯ͠ճ෮
ΦϖϛεͰΫϥελ่յ upgrade͔ͨͬͨ͠ 3ߏͷserverͷ1Λམͱͯ͠ɺ৽͍͠όΠφϦͰىಈͨ͠ (ͭΓͩͬͨ) ͪΌΜͱىಈ͍ͯ͠ͳ͍ͷʹ2ͷαʔόΛམͱͨ͠ → ่յ
่յͨ͠ΒͲ͏͢Ε 1. མͪண͘ 2. serverΛશ෦ࢭΊΔ 3. σʔλ(data_dir)શ෦ফ͢ 4. serverΛ -bootstrap-expect
N Ͱىಈ • start_join ·ͨ खಈͰ join 5. (ඞཁͳΒ) KVΛόοΫΞοϓ͔Β͢