
Swift at Scale: The IBM SoftLayer Story


Brian Cline

October 25, 2016


Transcript

  1. Swift at Scale: The IBM SoftLayer Story
     Brian Cline, Object Storage Development Lead
     OpenStack Summit • Ocata series • 2016.10.25 • Barcelona, Spain
     twitter/irc: @briancline
  2. Our History with Public Object Storage
     • 2012 — First three clusters go live (DAL, AMS, SNG)
     • 2014 — Dedicated development team
     • 2014 — Launch 11 clusters in new datacenters
     • 2015 — Launch 5 clusters in new DCs
     • 2015 — Product integrations with IBM Bluemix
     • 2016 — Launch 3 clusters in new DCs (and expand an existing cluster into multiple DCs)
  3. 2012: When things were [mostly] simpler…
     • 7-10 nodes in each cluster
     • Two node types:
       • Proxy
       • Data — account, container, and object services
     • Load balancer
     • FreeBSD with ZFS ⚠ Do not attempt.
     • No centralized logs
     • No log analysis tools
  4. 2016: Adjusted for scale (blood, sweat, tears, dreams, starlight…)
     • Up to hundreds of nodes per cluster
     • Three node types:
       • Proxy
       • Meta — account and container services
       • Data — object services
     • Load balancer cluster
     • Debian Linux
     • Centralized and searchable logs
     • Analytics via Spark and Hadoop
  5. Tens of thousands of requests per second — GET, HEAD, PUT, DELETE (with notable variability between clusters)
  6. Hardware we like
     • Supermicro 36-disk chassis
     • 12-16 physical cores (24-32 HT cores)
     • 128GB RAM for proxies
     • 256GB RAM for data nodes
     • 10Gbps NICs (separate API vs. storage/replication networks)
     • 3-4 TB disks
     • Controller card:
       • 2 disks for OS (RAID1)
       • 1 disk for OS hotswap
       • 4 disks for SSD caching
       • 29 disks for data storage
     • Usually expand by ½-row or a full row at a time
  7. Our Stack — Software
     • OS: Debian
     • Base: Swift (duh) — sometimes with backports
     • Authentication:
       • Swauth — some internal patches and enhancements
       • Keystone (API v3) — starting with Bluemix accounts
     • Metadata Search: Elasticsearch
     • Monitoring & Logging: collectd, Nagios, Capacity Dashboard, Logstash, Kibana, Graphite, Grafana, slogging
     • Automation: Chef, Jenkins, Fabric
  8. Our Stack — Custom Middlewares
     • CDN operations (purge, load, CNAMEs, TTL, compression, etc.)
     • CDN origin pull
     • Search indexer (on successful PUT/POST/DELETE)
     • Search query operations
     • Checkpoint (account enable/disable/etc. abilities for resellers)
     • Internal management (sysmeta read/write, proxy-level recon)
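Swift proxy middlewares like those above are ordinary WSGI filters wired into the proxy pipeline via paste.deploy. The sketch below shows the general shape with a hypothetical account-gating filter loosely in the spirit of the "Checkpoint" middleware; the class name, config key, and behavior are illustrative assumptions, not the actual SoftLayer code.

```python
# Minimal sketch of the WSGI-filter shape a Swift proxy middleware takes.
# The middleware name and the disabled_accounts option are hypothetical.

class CheckpointMiddleware(object):
    """Reject requests to accounts a reseller has disabled."""

    def __init__(self, app, disabled_accounts=None):
        self.app = app  # the next WSGI app in the proxy pipeline
        self.disabled = set(disabled_accounts or [])

    def __call__(self, environ, start_response):
        # Swift paths look like /v1/AUTH_account/container/object
        parts = environ.get('PATH_INFO', '').split('/')
        account = parts[2] if len(parts) > 2 else None
        if account in self.disabled:
            start_response('403 Forbidden',
                           [('Content-Type', 'text/plain')])
            return [b'Account disabled\n']
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    """paste.deploy entry point, as Swift expects for pipeline filters."""
    disabled = local_conf.get('disabled_accounts', '').split()

    def factory(app):
        return CheckpointMiddleware(app, disabled)
    return factory
```

In a real deployment the filter would be declared in `proxy-server.conf` and listed in the pipeline; the point here is only how little plumbing a custom middleware needs.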
  9. Lessons Learned: Automation
     • Make automation a must-have, day-one deliverable
     • Never launch something new without test/deploy automation
     • Must work across all environments (dev, QA, UAT/staging, prod)
     • Automation needs tests and metrics, too — it is code!
     • Functional testing should be an automated part of every deploy
     • Remember your orchestration (knowledge of Swift zones)
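One concrete form "functional testing as part of every deploy" can take is a post-deploy gate that polls each proxy's `/healthcheck` endpoint (Swift's stock healthcheck middleware answers it with 200 OK). The host list, timeout, and gating policy below are assumptions for illustration.

```python
# Sketch of a post-deploy gate: hit each proxy's /healthcheck endpoint
# and refuse to proceed if any node is unhealthy.
import http.client


def proxy_is_healthy(host, port, timeout=5):
    """Return True if the proxy answers /healthcheck with HTTP 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request('GET', '/healthcheck')
        resp = conn.getresponse()
        resp.read()
        return resp.status == 200
    except OSError:
        return False
    finally:
        conn.close()


def gate_deploy(proxies):
    """Return the unhealthy (host, port) pairs; an empty list means go."""
    return [(h, p) for h, p in proxies if not proxy_is_healthy(h, p)]
```

A deploy tool (Fabric, Jenkins, etc.) would run this after each batch of nodes and halt the rollout on a non-empty result.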
  10. Lessons Learned: Monitoring
     • Scale-test any monitoring/logging infrastructure you put into place
     • Very obvious stuff:
       • Space and IOPS; errors from SMART/XFS/kernel/controller, etc.
       • HTTP response code aggregates, latency aggregates by verb, etc.
     • Swift metrics:
       • If nothing else, async pendings
       • Replicator failures and partitions/sec rates
       • Replicator last completion timestamp vs. ring push timestamp
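On the "if nothing else, async pendings" point: Swift queues failed container updates as files under `<device>/async_pending*` on each object node, so a climbing file count is an early distress signal. A minimal check can just count those files; the directory layout below matches stock Swift (`async_pending` for policy 0, `async_pending-<index>` for other policies), while the devices root and any alerting threshold are deployment-specific assumptions.

```python
# Sketch: count queued container updates (async pendings) on one node.
import os


def count_async_pendings(devices_root='/srv/node'):
    """Count async-pending files across all devices on this node."""
    total = 0
    for device in os.listdir(devices_root):
        device_path = os.path.join(devices_root, device)
        if not os.path.isdir(device_path):
            continue
        for entry in os.listdir(device_path):
            # async_pending (policy 0) or async_pending-<policy index>
            if entry.startswith('async_pending'):
                pending_dir = os.path.join(device_path, entry)
                for _dirpath, _dirs, files in os.walk(pending_dir):
                    total += len(files)
    return total
```

`swift-recon --async` reports the same figure cluster-wide; a local check like this is useful as a per-node Nagios/collectd probe.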
  11. Lessons Learned: Monitoring
     • Any middleware you create needs to emit ops metrics
     • New features benefit from emitting usage metrics
     • Don’t forget debug-level log messages
     • Automatic checks for precipitating conditions that lead to failures (not just for the error log lines that result from them afterwards)
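Emitting ops metrics from a custom middleware costs very little: Swift itself ships StatsD-style counters over UDP, and a middleware can do the same with a few lines. The client below is a minimal sketch; the metric names and prefix are hypothetical examples.

```python
# Sketch of a fire-and-forget StatsD emitter for custom middleware.
import socket


class StatsdClient(object):
    def __init__(self, host='127.0.0.1', port=8125, prefix='middleware'):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, name, value=1):
        """Counter, e.g. middleware.cdn_purge.200:1|c"""
        payload = '%s.%s:%d|c' % (self.prefix, name, value)
        self.sock.sendto(payload.encode('utf-8'), self.addr)

    def timing(self, name, ms):
        """Timer sample in milliseconds, e.g. middleware.cdn_purge:42|ms"""
        payload = '%s.%s:%d|ms' % (self.prefix, name, ms)
        self.sock.sendto(payload.encode('utf-8'), self.addr)
```

Because UDP sends are non-blocking and lossy by design, metric emission never adds latency to or fails a customer request, which is why this pattern suits the hot path of a proxy middleware.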
  12. Lessons Learned: Rebalancing
     • Keep tabs on your rebalance times (and keep them small when possible)
     • Coordinate rebalances around node/cluster maintenance
     • Don’t let IOPS levels grow too high before expanding capacity
     • Customer IOPS vs. replicator & auditor IOPS — know your limits
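"Keep tabs on your rebalance times" can start as back-of-envelope arithmetic: from the ring's partition power and replica count, the fraction of partitions a ring change reassigns, and the partitions/sec rate you observe from the replicators, you can estimate how long the cluster needs to settle. This is a rough sketch under those stated inputs, not a Swift API; every number in the example is hypothetical.

```python
# Rough estimate of how long replicators need to settle a ring change.
# Inputs are assumptions you measure from your own cluster.

def estimate_rebalance_hours(part_power, replicas,
                             reassigned_fraction, parts_per_sec):
    """Hours for replicators to move the reassigned partition replicas."""
    total_partitions = 2 ** part_power                      # ring size
    moved = total_partitions * replicas * reassigned_fraction
    return moved / parts_per_sec / 3600.0


# Hypothetical example: part power 18, 3 replicas, 5% of partitions
# reassigned, replicators observed sustaining 10 partitions/sec.
hours = estimate_rebalance_hours(18, 3, 0.05, 10)
```

If the estimate is long, that argues for smaller ring changes (Swift's `min_part_hours` already throttles how much can move per rebalance) and for scheduling pushes away from maintenance windows.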
  13. Lessons Learned: Swift
     • Use 256-byte inode sizes (or the smallest you can get away with)
     • Using swauth? Use an SSD storage policy for AUTH_.auth containers
     • Namespace any custom API additions (and be consistent)
     • When possible, ask the community for thoughts on new middleware
     • Upstream is important! Stay involved and give back when possible