Building PostgreSQL as a Service
with Kubernetes
PGConf.Asia 2019
2019/9/9
@tzkb
My Activities
PGConf.Asia 2018 @Tokyo
A guide of PostgreSQL on Kubernetes
- In terms of Storage -
CloudNativeDays Tokyo 2019
The Future of Database on Kubernetes
- What run with Cloud Native Storage -
PostgreSQL + Kubernetes = ∞
Agenda
1. Recap: What is Kubernetes?
2. The Issues for Database on Kubernetes
3. How to run your PostgreSQL on K8s
4. Kubernetes becomes The Platform
1. Recap: What is Kubernetes?
What is Kubernetes?
(Diagram: Pods running on Kubernetes)
• Kubernetes (K8s) is an orchestration tool for containers.
It has the three features below.
• Declarative config
• Auto-healing
• Immutable
A database is not immutable.
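Declarative config and auto-healing can be pictured as a reconcile loop: you declare how many Pods you want, and a controller keeps creating replacements until the observed state matches. The following is only a minimal Python sketch of that idea. `reconcile` and the Pod names are hypothetical and are not the Kubernetes API.

```python
import itertools

_pod_ids = itertools.count(1)  # hypothetical source of unique Pod names


def reconcile(desired_replicas, running_pods):
    """Return a new Pod list that matches the declared replica count.

    Pods are never repaired in place (immutable): missing ones are
    replaced by freshly created Pods, surplus ones are dropped.
    """
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")  # auto-healing: create a replacement
    return pods[:desired_replicas]


# Declared state: 3 replicas. One Pod has crashed, so only 2 are observed.
healed = reconcile(3, ["pod-a", "pod-b"])
print(healed)
```

A replacement Pod starts empty, which is why this model fits stateless workloads well and a database poorly.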
Is it better to handle the database system with Kubernetes?
(Diagram: three Nodes; the Master replicates to the Slave)
• A database usually has state, which is not easy for Kubernetes to maintain.
• Instances must start up in order.
• They must never lose their data.
• Databases are handled as pets.
Example of Database on Kubernetes: Vitess
(Diagram: apps send SQL to VTgate, which sits in front of the VTtablets)
• Vitess, which is used at YouTube, is a CNCF incubating project.
• Vitess provides MySQL sharding on K8s.
• VTgate and VTtablet can be scaled by K8s.
• When a component terminates abnormally, Kubernetes repairs it automatically.
The choice: How to manage your database
(Diagram: options for Compute and Storage: Managed (Amazon Aurora, Amazon Redshift, Amazon RDS), on Cloud, on Kubernetes)
• You can choose to manage the database yourself or to leave it to a managed service.
2. The Issues for Database on Kubernetes
Kubernetes is a Distributed System
• It was developed following a distributed architecture.
• When a node doesn't reply, what happened?
– Network partition?
– Process failure?
– Node failure?
• If a disk resource is attached, it is harder to decide.
Failover?
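The ambiguity above can be sketched in Python: a heartbeat-based detector only sees that a reply is missing, so all three causes collapse into the same "suspected" status. The names `Status` and `check_node` and the 5-second timeout are illustrative assumptions, not part of Kubernetes.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    SUSPECTED = "suspected"  # no reply: the cause is unknown


def check_node(last_heartbeat, now, timeout=5.0):
    """Classify a node from the age of its last heartbeat (in seconds)."""
    return Status.ALIVE if now - last_heartbeat <= timeout else Status.SUSPECTED


# Three different root causes, one identical observation: silence for 30 s.
causes = ["network partition", "process failure", "node failure"]
observed = {cause: check_node(last_heartbeat=0.0, now=30.0) for cause in causes}
print(observed)
```

Because the detector cannot tell the cases apart, failing over a node that still holds a disk is risky: the "failed" node may in fact still be writing.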
Database Architects are familiar with Clustering
“If you don’t know the status, it’s okay.
We act on the premise of failsafe.”
“No need to share resources. Right?”
“Both have long been known for database
clustering.”
Basic: Database Clustering
               HA                  Replication           Sharding
               (Active/Standby)    (Active/Active)
Instances      1                   2 or more             2 or more
Redundancy     Shared Disk         Log Shipping          ---
Scaleout?      ×                   Read                  Read/Write
Availability   Failover (Fencing)  Promotion (Election)  ---
• There are different ways to build a DB cluster from multiple nodes.
Clustering #1: HA
(Diagram: VIP, Linux-HA, two Controllers)
• With Linux-HA
• Uses highly available shared storage
• Multiple writes to storage
• Fencing
• It has been used since before Linux but is still helpful.
Note: Fencing
(Diagram: VIP, Linux-HA, two Controllers)
< When Detecting Node Failure >
1. Forced node power off
   i. Processes definitely stop
   ii. Storage is unmounted
   iii. Virtual IP is detached
2. PostgreSQL starts to run on the standby node.
• The failed node is isolated from resources = Fencing
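The ordering of the steps above is the important part: the standby takes over only after the failed node has lost every shared resource. A minimal Python sketch of that ordering follows. The node dictionaries and `fence_and_failover` are hypothetical stand-ins, not the Linux-HA or kube-fencing interface.

```python
def fence_and_failover(failed, standby, log):
    """Isolate the failed node from every resource, then start the standby."""
    # Step 1: forced power off, which implies the three effects below.
    failed["powered"] = False
    failed["postgres_running"] = False   # i. processes definitely stop
    failed["storage_mounted"] = False    # ii. storage is unmounted
    failed["has_vip"] = False            # iii. virtual IP is detached
    log.append(f"fenced {failed['name']}")

    # Step 2: only now may the standby take over the shared resources.
    standby["storage_mounted"] = True
    standby["has_vip"] = True
    standby["postgres_running"] = True
    log.append(f"started PostgreSQL on {standby['name']}")


def new_node(name, active):
    """Build a node record; an active node holds storage, VIP and PostgreSQL."""
    return {"name": name, "powered": True, "postgres_running": active,
            "storage_mounted": active, "has_vip": active}


node_a, node_b, events = new_node("node-a", True), new_node("node-b", False), []
fence_and_failover(node_a, node_b, events)
print(events)
```

At no point do both nodes have the storage mounted, which is what prevents multiple writers on the shared disk.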
Clustering #2: Replication
(Diagram: the Master sends WAL to two Slaves)
• The master can read/write; slaves are read-only.
• Data synchronization by WAL transmission
• 2 or more Masters
• Leader Election
• Redundancy built into PostgreSQL = Streaming Replication
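As a rough illustration of synchronization by WAL transmission, the toy model below has the master log every change and lets each slave replay the records it has not yet applied. This is only a sketch: the `Master` and `Slave` classes are hypothetical, and the real PostgreSQL WAL is a binary log of page-level changes, not a list of key/value pairs.

```python
class Master:
    def __init__(self):
        self.data, self.wal = {}, []

    def write(self, key, value):
        self.wal.append((key, value))  # log the change first
        self.data[key] = value


class Slave:
    def __init__(self):
        self.data, self.replayed = {}, 0  # replayed = WAL records applied so far

    def replay(self, wal):
        for key, value in wal[self.replayed:]:
            self.data[key] = value
        self.replayed = len(wal)

    def write(self, key, value):
        raise PermissionError("slave is read-only")


master, slaves = Master(), [Slave(), Slave()]
master.write("a", 1)
master.write("b", 2)
for s in slaves:
    s.replay(master.wal)  # WAL transmission
print(slaves[0].data)
```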
Note: Leader Election
(Diagram: one Slave is promoted to master; the other is still a slave and receives WAL)
• Always one master
• The former master joins as a slave.
1. One of the remaining slaves is elected as the leader.
2. The leader is promoted to master.
• Algorithms such as Paxos and Raft are used.
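The "always one master" rule can be sketched with a majority check. The Python below is a deliberately simplified stand-in, not Paxos or Raft: a side of the cluster may promote a slave only if it can reach more than half of all members, and it prefers the slave with the most replayed WAL. `elect_leader` and its arguments are hypothetical.

```python
def elect_leader(candidates, cluster_size):
    """Pick the slave to promote, or None if no majority is reachable.

    candidates maps each reachable slave name to its replayed WAL position.
    cluster_size counts every member, including the failed master.
    """
    if len(candidates) <= cluster_size // 2:
        return None  # the minority side of a partition must not promote anyone
    # Prefer the most up-to-date slave; sorting first breaks ties by name.
    return max(sorted(candidates), key=lambda name: candidates[name])


# Three-node cluster, master lost: the two slaves form a majority.
print(elect_leader({"slave-1": 120, "slave-2": 118}, cluster_size=3))
# A single isolated slave is a minority and stays a slave.
print(elect_leader({"slave-1": 120}, cluster_size=3))
```

Since two disjoint groups cannot both hold a majority, at most one side ever promotes a master.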
Clustering #3: Sharding
(Diagram: a Coordinator in front of the shard nodes)
• Divides data between nodes and operates as one DB.
• Dispatches queries to the relevant nodes.
• Basically provides no availability.
• Problems with transactions.
• It is for scalability rather than availability.
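The coordinator's job of dispatching queries to the relevant nodes can be sketched with hash-based routing. This Python example is an assumption for illustration only: the slides do not say which partitioning scheme is meant, and `shard_for` and `Coordinator` are hypothetical names.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class Coordinator:
    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]  # one dict per node

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


coord = Coordinator(shard_count=3)
for user in ["alice", "bob", "carol", "dave"]:
    coord.put(user, f"row for {user}")
print(coord.get("carol"))
```

Each row lives on exactly one node, so losing that node makes its rows unavailable. That is why sharding alone adds scalability but not availability.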
3. How to Run your PostgreSQL on Kubernetes
Implementation Overview: PostgreSQL on Kubernetes
#  Category     OSS used      Description
ⅰ  HA           Rook/Ceph     Use Rook/Ceph as Shared Storage.
ⅱ  HA           LINSTOR/DRBD  Use LINSTOR/DRBD as Shared Storage.
ⅲ  Replication  Stolon        Use Streaming Replication, without Shared Storage.
ⅳ  Operator     KubeDB        Build and operate Replication automatically.
• We can see the following four patterns.
HA (i): Rook/Ceph
(Diagram: Replicas:1, kube-fencing)
• PostgreSQL is deployed as a StatefulSet using Rook/Ceph.
• K8s manages everything (DB, storage)
• Shared storage: Ceph
• Fenced by kube-fencing
< Disadvantage >
• Complicated
• Insufficient IO
Note: Without Fencing
Replicas:1
• When a node goes down, failover never happens.
• This protects against a network partition.
• It is by design.
Note: What is Rook?
• Rook is a Kubernetes Operator that manages Ceph and other storage systems.
(Diagram: the Rook operator with agent/discover, osd and mon on each node; CSI with csi-provisioner and a csi-rbdplugin on each node)
• Rook makes it easy to build a Ceph cluster.
• It also makes it easy to deploy CSI modules.
• CSI: Container Storage Interface
HA (ii): LINSTOR/DRBD
(Diagram: Replicas:1, kube-fencing)
• LINSTOR is Software-Defined Storage based on DRBD.
• K8s manages everything (DB, storage)
• Redundancy: DRBD
• Simple; read IO without the network
< Disadvantage >
• Limited scalability
Replication: Stolon
(Diagram: three proxy, three keeper and three sentinel processes)
• Stolon builds Streaming Replication on top of Kubernetes.
• 3 types of processes have different roles
• Without shared resources
< Disadvantage >
• No built-in read offloading
Operator: KubeDB
• KubeDB operates not only PostgreSQL but also other databases.
(Diagram: kubedb-operator manages Pods -0, -1, -2 and the postgres, snapshot and dormantdatabases resources)
• Database Operator for
– PostgreSQL
– MySQL
– Redis
• kubedb-operator builds Streaming Replication (SR).
• Snapshots can be taken and restored easily.
Example: PostgreSQL Configuration by KubeDB
apiVersion: kubedb.com/v1alpha1
kind: Postgres
metadata:
  name: ha-postgres
  namespace: demo
spec:
  version: "10.6-v2"
  replicas: 3
  storageType: Durable
  storage:
    storageClassName: "standard"
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
spec.version
– Choose the PostgreSQL version.
spec.replicas
– The number of instances.
spec.storage
– Define storage type, size, etc.
• KubeDB lets you define Streaming Replication with simple YAML.
Example: Snapshot by KubeDB
apiVersion: kubedb.com/v1alpha1
kind: Snapshot
metadata:
  name: snapshot-to-s3
  labels:
    kubedb.com/kind: Postgres
spec:
  databaseName: ha-postgres
  storageSecretName: s3-secret
  s3:
    endpoint: 's3.amazonaws.com'
    bucket: kubedb-qa
    prefix: demo
• Snapshot settings are written declaratively in YAML.
• Backup is as simple as applying this YAML.
• You can select the storage:
– S3
– Swift
– Kubernetes Persistent Volume
To Recap
Kubernetes-native components for database clustering are already available.
There are also some operators that automate DBA tasks.
However, it is not over yet.
Cloud Native Storage + PostgreSQL + Kubernetes = ???
The Signs
I. Pluggable Storage
Optimized Storage system for DB on K8s?
II. Forked and Cloud-Oriented PostgreSQL
AWS Aurora, Azure Hyperscale
THE LOG IS THE DATABASE.
(Diagram: AWS Aurora (PostgreSQL): SQL, Transactions and Caching on top of three Logging/Storage nodes. Azure Hyperscale: CPU, Memory and Cache (SSD) on top of Page with Cache (SSD) and Log.)
• Both split the RDBMS functions into separate layers, and each is extended by its own cloud.
As the platform for PostgreSQL as a Service
DBaaS by Kubernetes
STaaS by Kubernetes
What we got for DBaaS
• HA
• Streaming Replication
• DB Operator
Also for STaaS
• Simple Redundancy
• Distributed Storage
• Interoperable interface (CSI)
• Kubernetes will be "The Platform for Platforms."