Real-World Kubernetes Deployments @ OSCON 2016
Brandon Philips
May 16, 2016
Transcript
Real World Kubernetes Deployments: failure domains, upgrades, high availability (@coreoslinux, @brandonphilips)
Follow Along Instructions: http://bit.ly/1XeUbMW
Stickers Upfront: decorate your laptop, dog, kid, phone.
Brandon Philips CTO, CoreOS github.com/philips
Build, Store and Distribute your Containers quay.io
Linux
Mission: Secure the Internet
Strategy: Separate Apps from OS
Strategy: Make Servers Consistent
Strategy: Tolerate Machine Failures
Strategy: Make Servers Easy to Upgrade
Strategy: Simplify Application Upgrades
1. Application Packaging
Abstract the app away from the OS
2. Linux at Scale
Patches to the OS and kernel are hard:
- Security: retest after updates, no automation
- Application: dependency breakage, uptime risk
Auto-updating browsers fixed security, and we got HTML5 at the same time.
3. Clustering
Operations Paradise: easy scale-out, painless app upgrades, tolerant of machine failure
[Figure: app status readouts across the cluster, each showing App Req/sec between 6,000 and 8,000 and App Healthy: True]
The 3 pieces: application packaging, clustering, Linux at scale
Follow Along Instructions: https://github.com/philips (repository: 2016-OSCON-containers-at-scale)
CoreOS + Kubernetes on Vagrant, AWS, bare metal, etc.: coreos.com/kubernetes/docs/latest/
Kubernetes architecture in practice
[Diagram: several worker nodes, each running a kubelet, plus a node running the scheduler & API]
Worker & API works on 1 node too
kube-aws: Initial Cluster Setup
[Diagram: worker nodes running kubelets, plus a controller node running the scheduler, etcd & API]
Demo Boot up a Cluster
Demo Run an App
Demo Understand the Network
Let's Talk About Failure Domains
Failure domains are regions or components of the infrastructure that have the potential to fail. These regions can be physical or logical boundaries, and each has its own risks and challenges to architect for.
Failure Feud
- Machine failure: network/disks/RAM/processor/power supply
- Rack failure: network/power
- Data center failure: network/power/fire/semi-trucks
- Internet failure: network/political/natural
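To make the failure-domain idea concrete, here is a minimal sketch, assuming an etcd-style cluster that needs a strict majority (quorum) of its members to keep working. The helper names `quorum` and `survivesDomainLoss` and the member layouts are hypothetical; the check asks whether losing any one whole domain (a rack, a zone) still leaves a majority.

```go
package main

import "fmt"

// quorum returns the number of members that must agree in a cluster of
// size n: a strict majority.
func quorum(n int) int { return n/2 + 1 }

// survivesDomainLoss reports whether a cluster whose members are spread
// across failure domains (domain name -> member count) still has quorum
// after any single domain fails entirely.
func survivesDomainLoss(members map[string]int) bool {
	total := 0
	for _, c := range members {
		total += c
	}
	for _, c := range members {
		if total-c < quorum(total) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical layouts for a 5-member cluster.
	oneRack := map[string]int{"rack-a": 5}
	threeZones := map[string]int{"zone-a": 2, "zone-b": 2, "zone-c": 1}
	fmt.Println(quorum(5))                      // 3
	fmt.Println(survivesDomainLoss(oneRack))    // false
	fmt.Println(survivesDomainLoss(threeZones)) // true
}
```

With five members in one rack, a single rack failure takes out the whole cluster; spreading the same five members 2/2/1 across three zones survives any one zone going down.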
Failure Analysis: Kid Celebrating
Kid Hitting His Eye
Failure Analysis:
- Failure is caused by human error
- Celebration continues; eye unnecessary
- Kid has two eyes, can continue seeing
- Brain elects new eye automatically
Primary Datastore: etcd operations
- /etc distributed (hence, the name...)
- a clustered key-value store with GET and SET operations
- a building block for higher-order systems: primitives for building reliable distributed systems
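The "building block" point is easiest to see with a compare-and-swap primitive. The sketch below is a hypothetical, single-process stand-in for a clustered key-value store (`kvStore`, `CompareAndSwap` and `tryLock` are illustrative names, not etcd's client API): every write bumps a per-key revision, and a write conditioned on the revision the caller last saw is enough to build a simple lock. etcd v3 exposes this kind of check through transactions that compare a key's revision; a real lock would also need leases so that a crashed holder does not keep the lock forever.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a hypothetical in-memory stand-in for a clustered key-value
// store: every write bumps a per-key revision.
type kvStore struct {
	mu   sync.Mutex
	data map[string]string
	rev  map[string]int64
}

func newKVStore() *kvStore {
	return &kvStore{data: map[string]string{}, rev: map[string]int64{}}
}

// Get returns the value and revision for key (revision 0 means never written).
func (s *kvStore) Get(key string) (string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key], s.rev[key]
}

// Set unconditionally writes key and returns its new revision.
func (s *kvStore) Set(key, val string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
	s.rev[key]++
	return s.rev[key]
}

// CompareAndSwap writes val only if key is still at revision expect.
func (s *kvStore) CompareAndSwap(key string, expect int64, val string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rev[key] != expect {
		return false // someone else wrote since the caller's read
	}
	s.data[key] = val
	s.rev[key]++
	return true
}

// tryLock builds mutual exclusion on top of CompareAndSwap: the lock is
// free when its value is empty, and only one writer can win the swap.
func tryLock(s *kvStore, key, owner string) bool {
	val, rev := s.Get(key)
	if val != "" {
		return false
	}
	return s.CompareAndSwap(key, rev, owner)
}

func main() {
	s := newKVStore()
	fmt.Println(tryLock(s, "/locks/reboot", "node-1")) // true
	fmt.Println(tryLock(s, "/locks/reboot", "node-2")) // false
	owner, _ := s.Get("/locks/reboot")
	fmt.Println(owner) // node-1
}
```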
Demo play.etcd.io
Failure Analysis: etcd
[Diagram: worker nodes running kubelets, plus a scheduler & API node]
kube-aws: high availability in the cloud
[Diagram: scheduler & API nodes with EBS storage, inside an ASG (Auto Scaling Group)]
etcd protects against:
- Machine failure: replication, automatic leader election
- Flaky disk failure: CRC checksums on WAL files
- Network failure: timeouts and a linearized state machine
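The flaky-disk defense relies on checksums: each record written to the write-ahead log carries a CRC, and a record whose bytes no longer match is rejected on read instead of being replayed. Below is a minimal sketch using Go's `hash/crc32` with the Castagnoli (CRC-32C) table; the `[length][crc][data]` framing and the function names are assumptions for illustration, not etcd's on-disk WAL format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// castagnoli is the CRC-32C table used for the checksums in this sketch.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// errCorrupt signals that a record's bytes no longer match its checksum.
var errCorrupt = errors.New("wal: crc mismatch")

// encodeRecord frames data as [4-byte length][4-byte CRC-32C][data].
func encodeRecord(data []byte) []byte {
	buf := make([]byte, 8+len(data))
	binary.BigEndian.PutUint32(buf[0:4], uint32(len(data)))
	binary.BigEndian.PutUint32(buf[4:8], crc32.Checksum(data, castagnoli))
	copy(buf[8:], data)
	return buf
}

// decodeRecord checks the framing and checksum before returning the payload,
// so a corrupted record is rejected instead of replayed.
func decodeRecord(rec []byte) ([]byte, error) {
	if len(rec) < 8 {
		return nil, errors.New("wal: short record")
	}
	n := binary.BigEndian.Uint32(rec[0:4])
	if int(n) != len(rec)-8 {
		return nil, errors.New("wal: bad length")
	}
	data := rec[8:]
	if crc32.Checksum(data, castagnoli) != binary.BigEndian.Uint32(rec[4:8]) {
		return nil, errCorrupt
	}
	return data, nil
}

func main() {
	rec := encodeRecord([]byte("put /foo bar"))
	data, err := decodeRecord(rec)
	fmt.Println(string(data), err) // put /foo bar <nil>

	rec[10] ^= 0xFF // simulate a flaky disk flipping bits in the payload
	_, err = decodeRecord(rec)
	fmt.Println(err) // wal: crc mismatch
}
```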
etcd does not protect against:
- Denial of service: future work on proxies
- Lying etcd peers: we do a ton of functional testing as a hedge
- Buggy or broken clients: a client deleting all keys requires a restore from backup
Demo etcd restore backup
[Diagram: a log of entries at indexes 1, 2, 3, 4]
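Restoring from backup amounts to rebuilding state from a snapshot plus the log entries that follow it, applied in index order. This hypothetical sketch (the `entry` type and `restore` function are illustrative, not etcd's restore tooling) also takes an `upTo` index, which shows why a backup helps against a buggy client: replaying only up to the entry just before the bad delete recovers the earlier keys.

```go
package main

import "fmt"

// entry is a hypothetical log entry: a set or delete recorded at an index.
type entry struct {
	Index int
	Op    string // "set" or "delete"
	Key   string
	Val   string
}

// restore rebuilds key-value state from a snapshot taken at snapIndex plus
// the log entries after it, applied strictly in index order and stopping
// after index upTo (point-in-time recovery).
func restore(snap map[string]string, snapIndex int, log []entry, upTo int) (map[string]string, error) {
	state := make(map[string]string, len(snap))
	for k, v := range snap {
		state[k] = v
	}
	next := snapIndex + 1
	for _, e := range log {
		if e.Index <= snapIndex {
			continue // already reflected in the snapshot
		}
		if e.Index > upTo {
			break // stop before the unwanted entries
		}
		if e.Index != next {
			return nil, fmt.Errorf("gap in log: want index %d, got %d", next, e.Index)
		}
		switch e.Op {
		case "set":
			state[e.Key] = e.Val
		case "delete":
			delete(state, e.Key)
		default:
			return nil, fmt.Errorf("unknown op %q at index %d", e.Op, e.Index)
		}
		next++
	}
	return state, nil
}

func main() {
	log := []entry{
		{1, "set", "/a", "1"},
		{2, "set", "/b", "2"},
		{3, "delete", "/a", ""}, // the buggy client's delete
		{4, "set", "/c", "3"},
	}
	latest, _ := restore(map[string]string{"/a": "1"}, 1, log, 4)
	fmt.Println(latest) // map[/b:2 /c:3]
	beforeDelete, _ := restore(map[string]string{"/a": "1"}, 1, log, 2)
	fmt.Println(beforeDelete) // map[/a:1 /b:2]
}
```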
Kubernetes Control: API Server, Scheduler, Controller Manager
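The controller manager's job can be pictured as a control loop: compare the desired state stored through the API server with what is actually running, then act on the difference. Here is a minimal sketch of one reconciliation step for a replica count; `reconcile` is a hypothetical helper, not code from Kubernetes.

```go
package main

import "fmt"

// reconcile is one step of a hypothetical control loop: given the desired
// replica count and the pods observed running, it returns how many pods to
// create and which ones to delete.
func reconcile(desired int, running []string) (create int, remove []string) {
	if desired < 0 {
		desired = 0 // treat a negative request as "scale to zero"
	}
	switch {
	case len(running) < desired:
		return desired - len(running), nil
	case len(running) > desired:
		return 0, running[desired:]
	default:
		return 0, nil
	}
}

func main() {
	// A worker failed and took web-3 with it: one replacement is needed.
	create, remove := reconcile(3, []string{"web-1", "web-2"})
	fmt.Println(create, remove) // 1 []

	// Scaling down from 3 to 1 replica: the extras are deleted.
	create, remove = reconcile(1, []string{"web-1", "web-2", "web-3"})
	fmt.Println(create, remove) // 0 [web-2 web-3]
}
```

Because the step only looks at the gap between desired and observed state, the same logic handles a lost worker (create replacements) and a scale-down (delete extras).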
Failure Analysis: Kubernetes
Demo etcd down for API server
Demo etcd restore for API server
Demo node partition from API
Demo node scaling up
[Diagram: a new worker kubelet joins the existing workers and the scheduler & API node]
Demo node scheduled outage API
Demo node unplanned outage
Demo node downgrade/upgrade outage
Future Work: Upstream Kubernetes and Elsewhere
Upstream: rktnetes, Auth/OIDC, node self-signed TLS
Scaling: 15x scheduler performance, 30k pods on 1k nodes, SIG-scale
Automatic Node Drain: Locksmith, design doc
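Locksmith coordinates node reboots so that only a limited number of machines go down at once, using a lock held in etcd. Here is a hypothetical in-memory sketch of that idea as a counting semaphore with named holders (`rebootSemaphore`, `Acquire` and `Release` are illustrative names, not Locksmith's API); a real implementation would keep the holder list in etcd and update it with compare-and-swap so that several nodes can race safely.

```go
package main

import "fmt"

// rebootSemaphore is a hypothetical in-memory stand-in for a cluster-wide
// reboot lock: at most max nodes may hold it (drain and reboot) at once.
type rebootSemaphore struct {
	max     int
	holders []string
}

// held reports whether node id already holds a slot.
func (s *rebootSemaphore) held(id string) bool {
	for _, h := range s.holders {
		if h == id {
			return true
		}
	}
	return false
}

// Acquire grants node id a slot if one is free. Re-acquiring a slot the
// node already holds succeeds without consuming another.
func (s *rebootSemaphore) Acquire(id string) bool {
	if s.held(id) {
		return true
	}
	if len(s.holders) >= s.max {
		return false
	}
	s.holders = append(s.holders, id)
	return true
}

// Release frees node id's slot once it is back up and healthy.
func (s *rebootSemaphore) Release(id string) {
	for i, h := range s.holders {
		if h == id {
			s.holders = append(s.holders[:i], s.holders[i+1:]...)
			return
		}
	}
}

func main() {
	sem := &rebootSemaphore{max: 1}
	fmt.Println(sem.Acquire("node-1")) // true: node-1 drains and reboots
	fmt.Println(sem.Acquire("node-2")) // false: node-2 waits its turn
	sem.Release("node-1")
	fmt.Println(sem.Acquire("node-2")) // true
}
```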
Performance: etcd3 / ZooKeeper (snapshot disabled)
Memory usage for 512MB of data (2M keys of 256B each): 10GB, 2.4GB, 0.8GB
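For scale, the memory figures can be compared with the raw data size. Treating 1 GB as 1,000 MB (an assumption about the units on the slide), the data volume and the overhead relative to it work out as follows:

```latex
\begin{aligned}
2\times10^{6}\ \text{keys} \times 256\ \text{B} &= 5.12\times10^{8}\ \text{B} = 512\ \text{MB} \\
10\ \text{GB} \,/\, 512\ \text{MB} &\approx 19.5 \\
2.4\ \text{GB} \,/\, 512\ \text{MB} &\approx 4.7 \\
0.8\ \text{GB} \,/\, 512\ \text{MB} &\approx 1.6
\end{aligned}
```

So the smallest footprint is roughly 1.6 times the raw data, and the largest is nearly 20 times.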
Sounds good, but... is anyone successful with all this in prod?
Publicly traded options exchange
Containers on CoreOS are powering ISE's high-throughput, low-latency financial exchange:
- Running in production
- Bare metal & AWS
- Billions of transactions a day
- 150 million req/sec
Thank you! Brandon Philips | @brandonphilips | coreos.com
We're hiring in all departments! Positions: coreos.com/careers