由Spanner來看Google資料庫的前世今生

由 Spanner來看 Google資料庫的前世今⽣生 Szu-Kai Hsu (brucehsu)

Spanner is a scalable multi-version globally-distributed synchronously-replicated database

BigTable

Handling

Handling really

Handling really BIG DATA

key-value

key-value { “CCU”: “123”, “NCTU”: “113”, “NTU”: “112” }; key

key-value { “CCU”: “123”, “NCTU”: “113”, “NTU”: “112” }; value

distributed

Lack of transaction, think of our ﬁrst project.

Consistency A P

Consistency Availability P

Consistency Availability Partition tolerance

Consistency Availability Partition tolerance Consistency

Megastore

NoSQL datastores are highly scalable, but their limited API and
loose consistency models complicate application development. “ “

In Megastore, data model is declared in a strong-typed schema
strong-typed schema CREATE TABLE User { required int64 user_id; required string name; } PRIMARY KEY(user_id), ENTITY GROUP ROOT;

Based on BigTable BigTable

PRIMARY user_id PRIMARY user_id, nyan_id

Local and Global Indexes are introduced: Local Index Find corresponding
data in entity group Global Index Find corresponding data in external groups Local Index Global Index

(user_id, born,nyan_id) For local index CREATE LOCAL INDEX NyanByBorn ON
Nyan(user_id, born); CREATE LOCAL INDEX NyanByBorn ON Nyan(user_id, born);

Consistency achieved via Paxos algorithm Paxos 2 Replicas 1 Witness
At least

Replica consists of Replication server and Coordinator Replication server Coordinator
write oversee

Witness’ Replication server only writes logs logs

Average Latency: 100-400ms Poor write throughput 100-400ms

Spanner ,ﬁnally.

We believe it is better to have application programmers deal
with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. “ “

Data model is almost identical to Megastore almost identical Basic
unit deﬁned as Directory Directory

unit deﬁned as Directory Directory Same preﬁx key, therefore adjacent

unit deﬁned as Directory Directory Same preﬁx key, therefore adjacent Fine-grained mapping

unit deﬁned as Directory Directory Same preﬁx key, therefore adjacent Fine-grained mapping Interleaved rows gain performance

Two-phase commit for distributed transactions Two-phase commit 1Vote Coordinator Participants

Two-phase commit for distributed transactions Two-phase commit 2Commit Coordinator Participants

Locking remains a big issue Locking Especially when someone went
down, causing deadlock, literally.

Paxos is here to rescue, again Paxos will make sure
ALL logs are copied to every replicas. ALL logs

Real Innovation lies in time TrueTime API utilizes atomic clock
& GPS to determine the order of each transactions atomic clock GPS

NewSQL is the new NoSQL and Spanner is the best
example so far.

由Spanner來看Google資料庫的前世今生

由Spanner來看Google資料庫的前世今生

More Decks by Szu-Kai Hsu (brucehsu)

Other Decks in Technology

Featured

Transcript