Cortex: Horizontally Scalable Prometheus

Cortex: open-source, horizontally-scalable, distributed Prometheus

In this talk we'll present a solution for a horizontally scalable, distributed Prometheus implementation dubbed "Cortex" (née Frankenstein).

Cortex turns many of Prometheus's architectural assumptions on their head by marrying a scale-out PromQL query engine with a cloud-native storage layer based on DynamoDB. Not only does Cortex add horizontal scalability in terms of ingest rate and active time series, it also offers virtually infinite data retention and some interesting opportunities for query optimisation.
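Scaling ingest rate and active time series horizontally depends on spreading series across many ingesters. The Go sketch below shows one common way to do that, a consistent-hash ring with replication, loosely modeled on the Distributor and Ingester split in the architecture slides. It is not Cortex's code: the FNV-1a hash, the 128 tokens per ingester, the replication factor of 3 and the tenant-plus-metric-name shard key are all assumptions for illustration, and the ring here lives in memory rather than in a shared store such as Consul.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// token is one point on the hash ring, owned by a single ingester.
type token struct {
	value    uint32
	ingester string
}

// Ring is a simplified consistent-hash ring: each ingester owns several tokens,
// and a key is assigned to the owners of the next tokens clockwise from its hash.
type Ring struct {
	tokens []token // kept sorted by value
}

// hash32 is the (assumed) hash function for both tokens and series keys.
func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // hash.Hash writes never return an error
	return h.Sum32()
}

// NewRing gives every ingester tokensPer tokens derived from its name.
func NewRing(ingesters []string, tokensPer int) *Ring {
	r := &Ring{}
	for _, ing := range ingesters {
		for i := 0; i < tokensPer; i++ {
			r.tokens = append(r.tokens, token{hash32(fmt.Sprintf("%s-%d", ing, i)), ing})
		}
	}
	sort.Slice(r.tokens, func(i, j int) bool { return r.tokens[i].value < r.tokens[j].value })
	return r
}

// Get returns up to replicas distinct ingesters for key, walking clockwise from hash(key).
func (r *Ring) Get(key string, replicas int) []string {
	if len(r.tokens) == 0 {
		return nil
	}
	h := hash32(key)
	// First token at or after h; wraps to index 0 via the modulo below.
	start := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i].value >= h })
	seen := map[string]bool{}
	var out []string
	for i := 0; i < len(r.tokens) && len(out) < replicas; i++ {
		t := r.tokens[(start+i)%len(r.tokens)]
		if !seen[t.ingester] {
			seen[t.ingester] = true
			out = append(out, t.ingester)
		}
	}
	return out
}

// seriesKey is a hypothetical shard key: tenant ID plus metric name.
func seriesKey(tenant, metric string) string { return tenant + "/" + metric }

func main() {
	ring := NewRing([]string{"ingester-1", "ingester-2", "ingester-3", "ingester-4"}, 128)
	fmt.Println(ring.Get(seriesKey("tenant-a", "http_requests_total"), 3))
}
```

Because each ingester owns many small slices of the ring, adding an ingester moves only a fraction of the series, which is what lets capacity grow by adding processes.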

Bio: Tom is an independent software engineer. Previously he was Director of Software Engineering at Weaveworks and a Site Reliability Manager for Google Analytics at Google. Before that he was Founder, VP Eng and CTO at Acunu, and before that a software engineer at XenSource. In his spare time, Tom likes to make craft beer and build home automation systems.

Tom Wilkie

June 14, 2017

Transcript

  1. Requirements: (1) API compatible, (2) cost-effective to run, (3) easy to operate & scale, (4) scale to thousands of users
  2. Timeline: June 2016, started design doc; Aug 2016, launched at PromCon; Nov 2016, split into own repo; Mar 2017, launched hosted alerting; Jun 2017, talk at London Meetup. http://goo.gl/prdUYV
  3. Cortex Architecture (diagram): Prometheus and Your Jobs; Frontend, Distributor, Ingester; DynamoDB, S3, Memcache, Consul. Legend: write requests, read requests, control requests.
  4. Cortex Architecture (diagram): as above, adding the Table Manager.
  5. Cortex Architecture (diagram): as above, adding the Querier.
  6. Cortex Architecture (diagram): as above, adding the Ruler.
  7. Cortex Architecture (diagram): Prometheus and Your Jobs; Frontend, Distributor, Querier, Ruler, Ingester, Table Manager; DynamoDB, Memcache, Consul. Legend: write requests, read requests, control requests.
  8. Running for ~10 months:
    • Availability: ~99.95% (querier unavailable for ~12 hours)
    • Durability: >99.5% (lost <2 days of data)
    • 99th-percentile write latency: ~60 ms
    • 99th-percentile query latency: <200 ms
  9. Future:
    • Direct chunk writes from Prometheus -> Chunk Store
    • Single process for ease of experimentation & development
    • Storage for GCP: Bigtable, etc.
    • Separate ingester index
    • Use prometheus/tsdb for the ingesters
  10. I left Weaveworks at the beginning of June to focus on Prometheus & Cortex development. I’m available for training, bespoke development and support work. Email: [email protected]
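If each write is replicated to several ingesters, a read of recent data returns overlapping copies of the same series, which the Querier in the architecture slides has to merge before PromQL evaluation. This minimal Go sketch assumes that replication and uses a hypothetical `Sample` type of its own (not Cortex's or Prometheus's): it concatenates the replicas, sorts by timestamp, and keeps the first sample at each timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical (timestamp, value) pair used only for this sketch.
type Sample struct {
	TimestampMs int64
	Value       float64
}

// mergeSamples combines per-replica sample slices for one series into a single
// slice ordered by timestamp, keeping only the first sample seen at each timestamp.
func mergeSamples(replicas ...[]Sample) []Sample {
	var all []Sample
	for _, r := range replicas {
		all = append(all, r...)
	}
	// Stable sort so that, for equal timestamps, samples from earlier replicas win.
	sort.SliceStable(all, func(i, j int) bool { return all[i].TimestampMs < all[j].TimestampMs })
	out := make([]Sample, 0, len(all))
	for _, s := range all {
		if len(out) > 0 && out[len(out)-1].TimestampMs == s.TimestampMs {
			continue // duplicate copy from another replica
		}
		out = append(out, s)
	}
	return out
}

func main() {
	fromIngesterA := []Sample{{1000, 1}, {2000, 2}}
	fromIngesterB := []Sample{{1000, 1}, {2000, 2}, {3000, 3}}
	fmt.Println(mergeSamples(fromIngesterA, fromIngesterB))
}
```

A k-way merge would avoid the full sort when each replica's samples are already ordered; sort-then-deduplicate is used here only because it is shorter.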