
Couchbase Live 2016 - Criteo

Deimos Fr
October 06, 2016

Couchbase usage and performance at Criteo. Presentation given at Couchbase Live 2016.


Transcript

  1. About me
      Pierre Mavro - Lead DevOps - NoSQL Team
      Working at Criteo as Site Reliability Engineer
      @deimosfr
  2. Criteo technical insights
      • 700 engineers
      • 17K servers
      • 27K displays per second
      • 2.4M requests per second
  3. Criteo SRE: biggest challenges
      • Scaling
      • Low latency
      • High throughput
      • Resiliency
      • Automation
  4. Couchbase figures at Criteo (worldwide)
      • 1300+ physical servers
      • 100+ clusters (up to 50 servers each)
      • 90 TB of data in memory
      • 25M QPS
      • < 8 ms constant latency
  5. Couchbase usage at Criteo
      • Storing UUIDs (< 30 B)
      • Storing blobs (e.g. binary images)
      • Key size sometimes larger than value size
      • Serving between 100 Kqps and 2.5 Mqps per cluster
      • Low latency: 99th percentile < 2 ms
      • Data size per cluster between 500 GB and ~12 TB (with replica)
      • All data fits in memory
      • Inter-datacenter replication (custom client driver)
  6. Legacy infrastructure
      • Couchbase v1.8 legacy (80%) and v3.0.1 Community (20%)
      • Slow rebalance (up to 48h for 1 server)
      • Rebalance failures on highly loaded clusters
      • Max connections reached on v1.8 (9k)
  7. Legacy infrastructure
      • Persisted and non-persisted buckets shared the same clusters
      • No dedicated latency monitoring tool
      • No automatic restart/upgrade orchestrator
      • Server benchmarks needed updating
      • Lack of Couchbase best practices
  8. Benchmarks
      • Couchbase Enterprise 3.1.3
      • 3x HP GEN9 DL360 (256 GB RAM, 6x 400 GB SSD RAID10, 1 Gb network interface): 2 injectors + 1 server
      • Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
      • Value size: uniform range between 750 B and 1250 B (avg 1 kB)
      • Number of items: 50M/node (with replica) or 100M/node (without replica)
      • Resident active items (= items fully in RAM): ~50%
      • Value-only ejection mode (only data values can be evicted from RAM; keys and metadata stay in RAM)
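      A rough sizing check based on these figures: each item weighs about 36 B (key) + 56 B (metadata) + ~1 kB (value) ≈ 1.1 kB, so 50M items per node is on the order of 55 GB. With value-only ejection all keys and metadata (~4.6 GB) stay resident, and at ~50% residency roughly half of the values do as well, which implies the bucket RAM quota was presumably set well below the 256 GB of physical RAM so that a share of reads has to hit disk.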
  9. Benchmarks: heavy writes / light reads (10 Kqps) without replica
      Write rate per node | Status | Disk write queue | Latency p50 | p95    | p99   | p99.9
      40 Kset/s           | OK     | 10M items        | 0.4 ms      | 0.7 ms | 2 ms  | 8 ms
      60 Kset/s           | OK     | 30M items        | 0.4 ms      | 0.7 ms | 2 ms  | 20 ms
      80 Kset/s           | OK     | 50M items        | 0.4 ms      | 2 ms   | 7 ms  | 30 ms
      100 Kset/s          | OK     | 70M items        | 1.5 ms      | 5 ms   | 10 ms | 40 ms
  10. Benchmarks: heavy writes / light reads (10 Kqps) with one replica
      Write rate per node | Status    | Disk write queue | Latency p50 | p95  | p99   | p99.9
      20 Kset/s           | OK        | 12M items        | 0.4 ms      | 1 ms | 2 ms  | 10 ms
      30 Kset/s           | OK        | 33M items        | 0.5 ms      | 2 ms | 4 ms  | 20 ms
      40 Kset/s           | OK        | 60M items        | 0.6 ms      | 2 ms | 5 ms  | 25 ms
      50 Kset/s           | NOK (OOM) | >70M items       | 0.7 ms      | 5 ms | 50 ms | 75 ms
  11. Benchmarks: heavy reads / light writes (10 Kqps) with one replica
      Read rate per node | Status | Disk write queue  | Latency p50 | p95    | p99   | p99.9
      25 Kget/s          | OK     | 130k items        | 0.4 ms      | 0.7 ms | 4 ms  | 8 ms
      50 Kget/s          | OK     | 130k items        | 0.4 ms      | 1 ms   | 5 ms  | 10 ms
      75 Kget/s          | OK     | 130k items        | 0.4 ms      | 5 ms   | 15 ms | 25 ms
      100 Kget/s         | NOK    | 50k to 500k items | 16 ms       | 25 ms  | 45 ms | 100 ms
  12. Benchmarks: conclusions for a single node
      • The 1 Gb network is the bottleneck
      • Replicas introduce latency
      • Reads are fast
      • Max writes per node with replica: 40 Kqps
      • Max reads per node with replica: 90 Kqps
      • Max reads/writes per node without replica: 90 Kqps
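      A quick back-of-the-envelope check of the network claim, using the ~1 kB average values from the benchmark setup: 90 Kqps × ~1.1 kB per item ≈ 100 MB/s, which is close to the ~125 MB/s theoretical limit of a 1 Gb/s link once keys, metadata and TCP/protocol overhead are added.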
  13. Metrics
      Metrics are great!
      • Total QPS (reads + writes)
      • Total RAM usage
      • Availability
      • Number of items
      • …
      But they are not enough on their own to know the global service status!
  14. SLI: add the major missing metric
      Add latency monitoring as an SLI, to become part of our Couchbase SLO and SLA.
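      One lightweight way to spot-check per-operation latency on a node is the bundled cbstats tool; a sketch is shown below, where the node name, bucket and password are placeholders and 11210 is the default data port. A production SLI would feed such timings into a metrics pipeline rather than rely on manual checks.
      # Show get/set latency histograms for one node and bucket (replace the placeholders)
      /opt/couchbase/bin/cbstats <node>:11210 timings -b <bucket> -p <password>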
  15. Support contract
      • Get the latest Couchbase bug fixes
      • Suggest Couchbase enhancements
      • Speed up incident resolution with the help of support
      • Get better Couchbase tuning recommendations for performance
  16. Split usages
      • High-load (QPS) buckets are on dedicated clusters
      • Low-load (QPS) buckets are grouped on separate "shared" clusters
      • Persisted and non-persisted clusters are not on the same servers anymore
  17. Automation: why?
      • Need to upgrade from the Community to the Enterprise edition
      • Need to apply new configuration options that require restarting all the nodes in a cluster
      • Need to apply fixes that require rebooting all the nodes in a cluster
      • Need to reinstall servers from scratch
  18. Automation: how?
      • Criteo uses Chef to bootstrap servers and to deploy applications and configuration
      • We did not want to add yet another tool to the loop
      • Nothing with the required features existed
      • We developed a FOSS Chef cookbook for this and other use cases: Choregraphie
        https://github.com/criteo-cookbooks/choregraphie
  19. Automation: Choregraphie
      With Choregraphie we can perform:
      • Rolling restart with rebalance
      • Rolling upgrade with rebalance
      • Use of an optional, additional server to speed up rebalance
      • Rolling reboot with rebalance
      • Rolling reinstall with rebalance
      Choregraphie is open source! Feel free to contribute.
  20. Couchbase best practices / system tuning
      • Minimize swap usage:
        ◦ vm.swappiness = 0 (set to 1 for kernels > 3.5)
      • Disable transparent hugepages:
        ◦ chkconfig disable-thp on
      • Set the SSD I/O scheduler to deadline:
        ◦ echo "deadline" > /sys/block/sdX/queue/scheduler
      • Change the CPUfreq governor:
        ◦ modprobe cpufreq_performance
      • Raise the maximum number of connections:
        ◦ max_conns_on_port_XXXX: 30000
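      The OS-level settings above can be applied at runtime roughly as follows (a sketch only: sdX is a placeholder for each SSD device, the disable-thp service is assumed to exist as on the slide, and the values should be persisted via sysctl.conf or configuration management):
      sysctl -w vm.swappiness=0                        # use 1 instead of 0 on kernels newer than 3.5
      chkconfig disable-thp on                         # disable transparent hugepages at boot
      echo deadline > /sys/block/sdX/queue/scheduler   # repeat for each SSD device
      modprobe cpufreq_performance                     # load the "performance" CPUfreq governor
      for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"                        # then select it on every core
      done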
  21. Couchbase tuning
      • Raise the max_num_nonio parameter to 8 to avoid rebalance failures on highly loaded clusters:
        ◦ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
      • Disable the access log if you don't need it, to reduce disk usage (native in Couchbase 4.5):
        ◦ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
  22. Tuning... what's next?
      • Network teaming: 802.3ad (bonding) with 2x 1Gb cards
      • 10Gb network cards
      • Upgrade to Couchbase 4.5
      • Upgrade to a newer vanilla LTS kernel to enable multi-queue SSD support
      • Switch to Mesos to reduce administration time
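      For the 802.3ad teaming item, a minimal sketch of what the bond could look like on a RHEL/CentOS-style system (interface names, the IP address and file paths are illustrative, and the switch ports must also be configured for LACP):
      # /etc/sysconfig/network-scripts/ifcfg-bond0 (example)
      DEVICE=bond0
      TYPE=Bond
      BONDING_MASTER=yes
      BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
      BOOTPROTO=none
      IPADDR=10.0.0.10
      NETMASK=255.255.255.0
      ONBOOT=yes
      # /etc/sysconfig/network-scripts/ifcfg-eth0 (same pattern for eth1)
      DEVICE=eth0
      MASTER=bond0
      SLAVE=yes
      BOOTPROTO=none
      ONBOOT=yes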