
Couchbase Live 2016 - Criteo

Deimos Fr
October 06, 2016

Couchbase usage and performance at Criteo. Presentation given at Couchbase Live 2016.


Transcript

  1. About me
      Pierre Mavro - Lead DevOps - NoSQL Team
      Working at Criteo as Site Reliability Engineer
      @deimosfr
  2. Criteo technical insights
      • 700 engineers
      • 17K servers
      • 27K displays per second
      • 2.4M requests per second
  3. Criteo SRE: biggest challenges
      • Scaling
      • Low latency
      • High throughput
      • Resiliency
      • Automation
  4. Couchbase figures at Criteo (worldwide)
      • 1300+ physical servers
      • 100+ clusters (up to 50 servers each)
      • 90 TB of data in memory
      • 25M QPS
      • < 8 ms constant latency
  5. Couchbase usage at Criteo
      • Storing UUIDs (< 30 B)
      • Storing blobs (e.g. binary images)
      • Key size sometimes larger than value size
      • Serving between 100 Kqps and 2.5 Mqps per cluster
      • Low latency: 99th percentile < 2 ms
      • Data size per cluster between 500 GB and ~12 TB (with replica)
      • All data fits in memory
      • Inter-datacenter replication (custom client driver)
  6. Legacy infrastructure
      • Couchbase v1.8 legacy (80%) and v3.0.1 Community (20%)
      • Slow rebalance (up to 48h for 1 server)
      • Rebalance failures on highly loaded clusters
      • Max connections reached on v1.8 (9k)
  7. Legacy infrastructure
      • Persisted and non-persisted buckets shared the same clusters
      • No dedicated latency monitoring tool
      • No automatic restart/upgrade orchestrator
      • Server benchmarks needed updating
      • Lack of Couchbase best practices
  8. Benchmarks
      • Couchbase Enterprise 3.1.3
      • 3x HP GEN9 DL360 (256 GB RAM, 6x 400 GB SSD RAID10, 1 Gb network interface): 2 injectors + 1 server
      • Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
      • Value size: uniform range between 750 B and 1250 B (avg 1 kB)
      • Number of items: 50M/node (with replica) or 100M/node (without replica)
      • Resident active items (= items fully in RAM): ~50%
      • Value-only ejection mode (only data values can be evicted from RAM; keys and metadata stay in RAM)
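      A rough sizing check based on these figures: each item weighs about 36 B (key) + 56 B (metadata) + ~1 kB (value) ≈ 1.1 kB, so 50M items per node is on the order of 55 GB. With value-only ejection all keys and metadata (~4.6 GB) stay resident, and at ~50% residency roughly half of the values do as well, which implies the bucket RAM quota was presumably set well below the 256 GB of physical RAM so that a share of reads has to hit disk.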
  9. Benchmarks: heavy writes / light reads (10 Kqps) without replica
      Write rate per node | Status | Disk write queue | Latency p50 | p95    | p99   | p99.9
      40 Kset/s           | OK     | 10M items        | 0.4 ms      | 0.7 ms | 2 ms  | 8 ms
      60 Kset/s           | OK     | 30M items        | 0.4 ms      | 0.7 ms | 2 ms  | 20 ms
      80 Kset/s           | OK     | 50M items        | 0.4 ms      | 2 ms   | 7 ms  | 30 ms
      100 Kset/s          | OK     | 70M items        | 1.5 ms      | 5 ms   | 10 ms | 40 ms
  10. Benchmarks: heavy writes / light reads (10 Kqps) with one replica
      Write rate per node | Status    | Disk write queue | Latency p50 | p95  | p99   | p99.9
      20 Kset/s           | OK        | 12M items        | 0.4 ms      | 1 ms | 2 ms  | 10 ms
      30 Kset/s           | OK        | 33M items        | 0.5 ms      | 2 ms | 4 ms  | 20 ms
      40 Kset/s           | OK        | 60M items        | 0.6 ms      | 2 ms | 5 ms  | 25 ms
      50 Kset/s           | NOK (OOM) | >70M items       | 0.7 ms      | 5 ms | 50 ms | 75 ms
  11. Benchmarks: heavy reads / light writes (10 Kqps) with one replica
      Read rate per node | Status | Disk write queue  | Latency p50 | p95    | p99   | p99.9
      25 Kget/s          | OK     | 130k items        | 0.4 ms      | 0.7 ms | 4 ms  | 8 ms
      50 Kget/s          | OK     | 130k items        | 0.4 ms      | 1 ms   | 5 ms  | 10 ms
      75 Kget/s          | OK     | 130k items        | 0.4 ms      | 5 ms   | 15 ms | 25 ms
      100 Kget/s         | NOK    | 50k to 500k items | 16 ms       | 25 ms  | 45 ms | 100 ms
  12. Benchmarks: conclusions for a single node
      • The 1 Gb network is the bottleneck
      • Replicas introduce latency
      • Reads are fast
      • Max writes per node with replica: 40 Kqps
      • Max reads per node with replica: 90 Kqps
      • Max reads/writes per node without replica: 90 Kqps
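      A quick back-of-the-envelope check of the network claim, using the ~1 kB average values from the benchmark setup: 90 Kqps × ~1.1 kB per item ≈ 100 MB/s, which is close to the ~125 MB/s theoretical limit of a 1 Gb/s link once keys, metadata and TCP/protocol overhead are added.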
  13. Metrics
      Metrics are great!
      • Total QPS (reads + writes)
      • Total RAM usage
      • Availability
      • Number of items
      • …
      But they are not enough on their own to know the global service status!
  14. SLI: add the major missing metric
      Add latency monitoring as an SLI, to become part of our Couchbase SLO and SLA.
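      One lightweight way to spot-check per-operation latency on a node is the bundled cbstats tool; a sketch is shown below, where the node name, bucket and password are placeholders and 11210 is the default data port. A production SLI would feed such timings into a metrics pipeline rather than rely on manual checks.
      # Show get/set latency histograms for one node and bucket (replace the placeholders)
      /opt/couchbase/bin/cbstats <node>:11210 timings -b <bucket> -p <password>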
  15. Support contract
      • Get the latest Couchbase bug fixes
      • Suggest Couchbase enhancements
      • Speed up incident resolution with the help of support
      • Get better Couchbase tuning recommendations for performance
  16. Split usages
      • High-load (QPS) buckets are on dedicated clusters
      • Low-load (QPS) buckets are grouped on separate "shared" clusters
      • Persisted and non-persisted clusters are not on the same servers anymore
  17. Automation: why?
      • Need to upgrade from the Community to the Enterprise edition
      • Need to apply new configuration options that require restarting all the nodes in a cluster
      • Need to apply fixes that require rebooting all the nodes in a cluster
      • Need to reinstall servers from scratch
  18. Automation: how?
      • Criteo uses Chef to bootstrap servers and to deploy applications and configuration
      • We did not want to add yet another tool to the loop
      • Nothing with the required features existed
      • We developed a FOSS Chef cookbook for this and other use cases: Choregraphie
        https://github.com/criteo-cookbooks/choregraphie
  19. Automation: Choregraphie
      With Choregraphie we can perform:
      • Rolling restart with rebalance
      • Rolling upgrade with rebalance
      • Use of an optional, additional server to speed up rebalance
      • Rolling reboot with rebalance
      • Rolling reinstall with rebalance
      Choregraphie is open source! Feel free to contribute.
  20. Couchbase best practices / system tuning
      • Minimize swap usage:
        ◦ vm.swappiness = 0 (set to 1 for kernels > 3.5)
      • Disable transparent hugepages:
        ◦ chkconfig disable-thp on
      • Set the SSD I/O scheduler to deadline:
        ◦ echo "deadline" > /sys/block/sdX/queue/scheduler
      • Change the CPUfreq governor:
        ◦ modprobe cpufreq_performance
      • Raise the maximum number of connections:
        ◦ max_conns_on_port_XXXX: 30000
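      The OS-level settings above can be applied at runtime roughly as follows (a sketch only: sdX is a placeholder for each SSD device, the disable-thp service is assumed to exist as on the slide, and the values should be persisted via sysctl.conf or configuration management):
      sysctl -w vm.swappiness=0                        # use 1 instead of 0 on kernels newer than 3.5
      chkconfig disable-thp on                         # disable transparent hugepages at boot
      echo deadline > /sys/block/sdX/queue/scheduler   # repeat for each SSD device
      modprobe cpufreq_performance                     # load the "performance" CPUfreq governor
      for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo performance > "$g"                        # then select it on every core
      done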
  21. Couchbase tuning
      • Raise the max_num_nonio parameter to 8 to avoid rebalance failures on highly loaded clusters:
        ◦ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
      • Disable the access log if you don't need it, to reduce disk usage (native in Couchbase 4.5):
        ◦ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
  22. Tuning... what's next?
      • Network teaming: 802.3ad (bonding) with 2x 1Gb cards
      • 10Gb network cards
      • Upgrade to Couchbase 4.5
      • Upgrade to a newer vanilla LTS kernel to enable multi-queue SSD support
      • Switch to Mesos to reduce administration time
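      For the 802.3ad teaming item, a minimal sketch of what the bond could look like on a RHEL/CentOS-style system (interface names, the IP address and file paths are illustrative, and the switch ports must also be configured for LACP):
      # /etc/sysconfig/network-scripts/ifcfg-bond0 (example)
      DEVICE=bond0
      TYPE=Bond
      BONDING_MASTER=yes
      BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
      BOOTPROTO=none
      IPADDR=10.0.0.10
      NETMASK=255.255.255.0
      ONBOOT=yes
      # /etc/sysconfig/network-scripts/ifcfg-eth0 (same pattern for eth1)
      DEVICE=eth0
      MASTER=bond0
      SLAVE=yes
      BOOTPROTO=none
      ONBOOT=yes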