Building Distributed Timeseries Database in Go

Vulcan is an open source distributed timeseries database based on Prometheus. This talk covers its origins and how we built it.

Matthew Campbell

February 25, 2017

Transcript

  1. Distributed Timeseries Database in Go

  2. Go is the future of NoSQL/NewSQL

  3. Databases written in Go
    - Prometheus
    - CockroachDB
    - InfluxDB
    - Dgraph
    - etcd
    - Consul

  4. Talk about architecture

  5. What is a timeseries

  6. Use cases for timeseries
    - Stocks
    - Monitoring
    - IoT

  7. About timeseries
    - Timeseries can be lossy
    - Timeseries compress differently depending on the data set
    - Write heavy
    - Each sample is: Key, Time, DataPoint
    - Example: CNX:IND, June 15 12:23, $23.40
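
The (Key, Time, DataPoint) shape on this slide can be sketched as a small Go type. This is a minimal illustration, not Vulcan's actual data model: the names `Point`, `Series`, and `Append` are hypothetical, and the year in the example timestamp is an assumption because the slide gives only "June 15 12:23".

```go
package main

import (
	"fmt"
	"time"
)

// Point is a hypothetical representation of one sample: key, time, value.
type Point struct {
	Key   string
	Time  time.Time
	Value float64
}

// Series groups points that share a key, in the order they were appended.
type Series struct {
	Key    string
	Points []Point
}

// Append adds a sample to the series and rejects samples with a different key.
func (s *Series) Append(p Point) error {
	if p.Key != s.Key {
		return fmt.Errorf("key mismatch: series %q, point %q", s.Key, p.Key)
	}
	s.Points = append(s.Points, p)
	return nil
}

func main() {
	s := &Series{Key: "CNX:IND"}
	// The year 2016 is an assumption; the slide does not state one.
	t := time.Date(2016, time.June, 15, 12, 23, 0, 0, time.UTC)
	if err := s.Append(Point{Key: "CNX:IND", Time: t, Value: 23.40}); err != nil {
		fmt.Println("append failed:", err)
		return
	}
	fmt.Println(len(s.Points), s.Points[0].Value)
}
```

Because the workload is write heavy, an append-only structure like this is a natural starting point; real storage would batch and compress the points.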

  8. Dark days
    - Graphite
    - InfluxDB
    - MySQL storing metrics
    - OpenTSDB (UGGHHHHHH)

  9. Prometheus

  11. rate(http_request_latency[1m])
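
To give a feel for what a query like this computes, here is a simplified Go sketch of a per-second rate over a trailing window. It is not Prometheus's implementation: the real `rate()` also handles counter resets and extrapolates to the window boundaries. The `Sample` type and `simpleRate` function are hypothetical names, and the samples are assumed to be sorted by timestamp.

```go
package main

import "fmt"

// Sample is one (timestamp, value) pair; timestamps are Unix seconds.
type Sample struct {
	TS    int64
	Value float64
}

// simpleRate returns the average per-second increase between the first and
// last samples inside the window (end-window, end].
// Simplification: no counter-reset handling and no extrapolation.
func simpleRate(samples []Sample, end, window int64) (float64, bool) {
	var first, last Sample
	n := 0
	for _, s := range samples {
		if s.TS <= end-window || s.TS > end {
			continue // outside the window
		}
		if n == 0 {
			first = s
		}
		last = s
		n++
	}
	if n < 2 || last.TS == first.TS {
		return 0, false // not enough data to compute a slope
	}
	return (last.Value - first.Value) / float64(last.TS-first.TS), true
}

func main() {
	samples := []Sample{{0, 0}, {15, 30}, {30, 60}, {45, 90}, {60, 120}}
	r, ok := simpleRate(samples, 60, 60) // a 1m window ending at t=60
	fmt.Println(r, ok)
}
```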

  13. Initial architecture
    Beta for 3000 customers

  14. Hash sharded Prometheus
    3-4 per datacenter
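
Hash sharding can be sketched in a few lines of Go. The slide says only that the Prometheus servers were hash sharded, so the choice of FNV-1a, the key format, and the `shardFor` function below are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards using the FNV-1a hash.
// n must be greater than zero. The hash choice is an assumption.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Hypothetical keys; 4 shards matches the "3-4 per datacenter" on the slide.
	for _, key := range []string{"customer-1/cpu", "customer-2/cpu", "customer-3/disk"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, 4))
	}
}
```

A plain modulo scheme like this remaps most keys when the shard count changes, which is one reason fixed-size hash sharding becomes awkward as a system grows.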

  17. Performance requirements
    - 3 Gbit/s of traffic
    - 100k writes per second
    - 50 ms reads
    - 100,000 customers to start
    - 20 TB of storage

  18. Introducing Vulcan
    https://github.com/digitalocean/vulcan

  19. Strange PRs

  20. A fateful meeting at Soundcloud...

  21. Architecture changes
    - Split to microservices
    - Containerization
    - Message Queues

  22. Pipelining data

  24. Scaling storage

  26. Metrics format

  27. Timeseries Schema
    - V1 Timeseries Table
      - key (Combined Key)
      - timestamp (Combined Key)
      - datapoint (float64)

  28. V2 Chunks (1 KB)

  29. V2 Timeseries Table
    - key (Combined Key)
    - timestamp range (2 hours) (Combined Key)
    - raw data (1 KB blob)
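
The V2 row key pairs a series key with a 2-hour time range. One plausible way to derive that range is to round each timestamp down to a 2-hour boundary, as sketched below. The `ChunkKey` type and both function names are hypothetical, and aligning buckets to the Unix epoch is an assumption; the slide states only the 2-hour width.

```go
package main

import "fmt"

// bucketSeconds is the 2-hour range width named on the slide.
const bucketSeconds = 2 * 60 * 60

// ChunkKey is a hypothetical composite row key for the V2 table.
type ChunkKey struct {
	Key         string
	BucketStart int64 // Unix seconds, aligned to a 2-hour boundary
}

// bucketStart rounds a non-negative Unix timestamp down to its 2-hour bucket.
func bucketStart(unixSec int64) int64 {
	return unixSec - unixSec%bucketSeconds
}

// chunkKeyFor builds the row key under which a sample would be stored.
func chunkKeyFor(key string, unixSec int64) ChunkKey {
	return ChunkKey{Key: key, BucketStart: bucketStart(unixSec)}
}

func main() {
	// 1488030300 is 2017-02-25 13:45:00 UTC; its bucket starts at 12:00:00 UTC.
	fmt.Println(chunkKeyFor("cpu_usage", 1488030300))
}
```

Every sample for a series within the same 2 hours maps to one row, so a read of a time range touches a handful of blobs instead of one row per datapoint as in V1.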

  30. Index Table
    - Customer (Combined Key)
    - keyPrefix (Combined Key)
    - time
    - key
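
As a rough in-memory analogue of this index table, the sketch below maps (customer, keyPrefix) to the full keys seen under it, along with a timestamp. The slide does not define keyPrefix, so treating it as the metric name before the first `{` is an assumption, and `Index`, `prefixOf`, `Add`, and `Lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// indexKey mirrors the combined key on the slide: customer plus key prefix.
type indexKey struct {
	Customer  string
	KeyPrefix string
}

// Index maps each combined key to full series keys and a last-seen time.
type Index struct {
	rows map[indexKey]map[string]int64 // full key -> last-seen Unix seconds
}

func NewIndex() *Index {
	return &Index{rows: make(map[indexKey]map[string]int64)}
}

// prefixOf assumes the prefix is the metric name before the first '{'.
func prefixOf(key string) string {
	if i := strings.IndexByte(key, '{'); i >= 0 {
		return key[:i]
	}
	return key
}

// Add records that a customer wrote to the given series key at time ts.
func (ix *Index) Add(customer, key string, ts int64) {
	ik := indexKey{Customer: customer, KeyPrefix: prefixOf(key)}
	if ix.rows[ik] == nil {
		ix.rows[ik] = make(map[string]int64)
	}
	ix.rows[ik][key] = ts
}

// Lookup returns the sorted full keys for a customer and prefix.
func (ix *Index) Lookup(customer, prefix string) []string {
	row := ix.rows[indexKey{Customer: customer, KeyPrefix: prefix}]
	keys := make([]string, 0, len(row))
	for k := range row {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ix := NewIndex()
	ix.Add("c1", `http_requests{path="/a"}`, 100)
	ix.Add("c1", `http_requests{path="/b"}`, 110)
	ix.Add("c2", `http_requests{path="/a"}`, 120)
	fmt.Println(ix.Lookup("c1", "http_requests"))
}
```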

  31. In memory query engine

  32. Downsampling
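
The slide gives no detail on how downsampling was done. As a generic illustration of the idea, the sketch below averages samples into fixed-width time buckets. The `Sample` type and `downsample` function are hypothetical, averaging is only one possible aggregation, and the input is assumed to be sorted with non-negative timestamps.

```go
package main

import "fmt"

// Sample is one (timestamp, value) pair; timestamps are Unix seconds.
type Sample struct {
	TS    int64
	Value float64
}

// downsample averages sorted samples into buckets of `width` seconds and
// returns one sample per non-empty bucket, stamped with the bucket start.
// width must be greater than zero.
func downsample(samples []Sample, width int64) []Sample {
	var out []Sample
	var sum float64
	var count int
	var cur int64
	for _, s := range samples {
		b := s.TS - s.TS%width // start of the bucket this sample falls in
		if count > 0 && b != cur {
			out = append(out, Sample{TS: cur, Value: sum / float64(count)})
			sum, count = 0, 0
		}
		cur = b
		sum += s.Value
		count++
	}
	if count > 0 {
		out = append(out, Sample{TS: cur, Value: sum / float64(count)})
	}
	return out
}

func main() {
	in := []Sample{{0, 1}, {30, 3}, {60, 10}, {90, 20}, {130, 5}}
	for _, s := range downsample(in, 60) {
		fmt.Println(s.TS, s.Value)
	}
}
```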

  33. Final Architecture

  35. Questions?
