Elasticsearch & 63 Million WordPress Sites

xyu
February 06, 2014

Overview of the Elasticsearch infrastructure that Automattic maintains to support WordPress.com.

Transcript

  1. Elasticsearch & 63 Million WordPress Sites, Elasticsearch Boston (Feb. 2014)

  2. Xiao Yu, Code Wrangler at Automattic. @HypertextRanch, me@xyu.io, xyu
  3. VaultPress, Jetpack, Simplenote, Akismet, Polldaddy, Gravatar, VideoPress, IntenseDebate, Simperium, Code Poet, Cloudup
  4. Elasticsearch + WordPress.com
     Cluster Stats
     • 63M Sites
     • 743M Documents
     • 12TB Primary + Replicas
     • 51M Query Ops / Day
     • 15M Index Ops / Day
     2 Major Use Cases
     • Global Search
     • Local Search
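To put the daily totals on the Cluster Stats slide in more familiar units, the sketch below converts them to per-second averages. It uses only the figures from the slide. The helper name `per_second` is hypothetical, and real traffic is unlikely to be spread evenly over the day, so these are averages and not peak rates.

```python
# Average rates derived from the Cluster Stats slide (not peak load).
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(ops_per_day: float) -> float:
    """Convert a daily operation count to an average per-second rate."""
    return ops_per_day / SECONDS_PER_DAY


query_rate = per_second(51_000_000)        # 51M query ops / day
index_rate = per_second(15_000_000)        # 15M index ops / day
docs_per_site = 743_000_000 / 63_000_000   # 743M documents over 63M sites

print(f"{query_rate:.0f} queries/s, "
      f"{index_rate:.0f} index ops/s, "
      f"{docs_per_site:.1f} docs/site")
# prints: 590 queries/s, 174 index ops/s, 11.8 docs/site
```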
  5. Infrastructure Layout (diagram; labels: Internal API, Cache, REST API, PHP, Node 1, Node 2, Cluster A, Node 1, Node 2, Node 3, Node n, Cluster B, Stats)
  6. Documents & Types

     /index/post
     {
       blog_id: 123,
       post_id: 456,
       title: "Search!",
       content: "…",
       blog: {
         lang: "en",
         …
       },
       …
     }

     /index/blog
     {
       blog_id: 123,
       url: "www.xyu.io",
       follower_ids: [
         789,
         …
       ],
       lang: "en",
       indexable: true,
       …
     }
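The document shapes above suggest how the two use cases might be queried. The sketch below builds request bodies only, as plain dictionaries, without contacting a cluster. The field names (`blog_id`, `title`, `content`, `blog.lang`) come from the slide. The `bool`/`filter`/`term` query form is taken from current Elasticsearch query DSL and is an assumption: the slides do not show the queries actually used, and the cluster described here predates that syntax. The function names are hypothetical.

```python
import json


def local_search_body(blog_id: int, text: str) -> dict:
    """Assumed shape of a local search: full-text match within one site."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text,
                                     "fields": ["title", "content"]}}
                ],
                # Restrict hits to a single site via the post's blog_id field.
                "filter": [{"term": {"blog_id": blog_id}}],
            }
        }
    }


def global_search_body(text: str, lang: str) -> dict:
    """Assumed shape of a global search: all sites, narrowed by language."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text,
                                     "fields": ["title", "content"]}}
                ],
                # blog.lang is the nested language field on the post document.
                "filter": [{"term": {"blog.lang": lang}}],
            }
        }
    }


print(json.dumps(local_search_body(123, "Search!"), indent=2))
```

Copying blog-level fields such as `lang` into each post document, as the slide shows, is what would let a global query filter on them without a join.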
  7. Indices & Shards
     Storage Strategy
     • Grow Number of Indices (10M Sites / Index)
     • 25 Shards / Index (400K Sites / Shard)
     • 3 Copies of Data (1 Primary + 2 Replicas)
     2 Major Use Cases
     • Global Search
       • Query All Shards
     • Local Search
       • Query One Shard
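The numbers on this slide can be checked with a little arithmetic, and the "Query One Shard" case can be illustrated with a toy routing function. Elasticsearch selects a shard by hashing a routing value modulo the number of primary shards. Using `blog_id` as that routing value is an assumption here, since the slides do not name the key, and `zlib.crc32` is only a stand-in for Elasticsearch's internal hash. All function names are hypothetical, and the index count assumes every index is filled to 10M sites.

```python
import math
import zlib

SITES_PER_INDEX = 10_000_000  # from the slide: 10M sites / index
SHARDS_PER_INDEX = 25         # from the slide: 25 shards / index
COPIES = 3                    # from the slide: 1 primary + 2 replicas


def sites_per_shard() -> int:
    """Sites held by one shard if sites spread evenly across an index."""
    return SITES_PER_INDEX // SHARDS_PER_INDEX


def indices_needed(total_sites: int) -> int:
    """Indices required, assuming each index is filled to capacity."""
    return math.ceil(total_sites / SITES_PER_INDEX)


def total_shard_copies(total_sites: int) -> int:
    """Primary and replica shards across all indices."""
    return indices_needed(total_sites) * SHARDS_PER_INDEX * COPIES


def shard_for(blog_id: int) -> int:
    """Toy routing: hash the (assumed) routing key, modulo the shard count.

    crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(str(blog_id).encode("utf-8")) % SHARDS_PER_INDEX


print(sites_per_shard())               # 400000, matching the slide
print(indices_needed(63_000_000))      # 7
print(total_shard_copies(63_000_000))  # 525
```

Because every post from one site hashes to the same shard under this scheme, a local search would only need to touch that shard, while a global search fans out to all of them.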
  8. Nodes & Clusters (Warning, YMMV!)

  9. Monitoring Cluster Health

  10. Monitoring Cluster Health

  11. Monitoring Cluster Health

  12. Monitoring Cluster Health

  13. Thanks! @HypertextRanch, me@xyu.io, xyu