
Refactoring InfluxDB: from Go to Go

Paul Dix
February 19, 2015

Talk given at the Golang SF meetup about the rewrite of InfluxDB from 0.8 to 0.9.

Transcript

  1. Refactoring InfluxDB: from Go to Go

    Paul Dix, CEO and cofounder of InfluxDB, @pauldix, paul@influxdb.com
  2. Data model

    • Databases
    • Measurements: cpu_load, temperature, log_lines, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
  3. Data model

    • Databases
    • Measurements: cpu_load, temperature, log, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
    • Series: measurement + unique tagset
  4. Data model

    • Databases
    • Measurements: cpu_load, temperature, log, click, etc.
    • Tags: region=uswest, host=serverA, building=23, service=redis, etc.
    • Series: measurement + unique tagset
    • Points
      • Fields: bool, int64, float64, string, []byte
      • Timestamp: nanosecond epoch
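The data model above defines a series as a measurement plus a unique tagset. The Go sketch below shows one way to derive a stable series identity from that definition. It rests on a few assumptions. `Point` and `seriesKey` are hypothetical names. The comma-separated key format is illustrative and not necessarily how InfluxDB 0.9 encodes series internally. Tag keys are sorted so that the same tagset always produces the same key, regardless of Go's random map iteration order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Point is a hypothetical in-memory shape for one point in the data model.
type Point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]interface{} // bool, int64, float64, string or []byte
	Time        int64                  // nanoseconds since the Unix epoch
}

// seriesKey identifies a series: the measurement plus its tagset, with tag
// keys sorted so equal tagsets always yield equal keys. (Illustrative format.)
func seriesKey(measurement string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(measurement)
	for _, k := range keys {
		b.WriteString(",")
		b.WriteString(k)
		b.WriteString("=")
		b.WriteString(tags[k])
	}
	return b.String()
}

func main() {
	p := Point{
		Measurement: "cpu_load",
		Tags:        map[string]string{"region": "uswest", "host": "serverA"},
		Fields:      map[string]interface{}{"value": 0.64},
		Time:        time.Date(2009, 11, 10, 23, 0, 0, 0, time.UTC).UnixNano(),
	}
	fmt.Println(seriesKey(p.Measurement, p.Tags)) // cpu_load,host=serverA,region=uswest
}
```

Two points whose tags differ only in insertion order therefore land in the same series.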
  5. Writing Data

    {
      "database": "mydb",
      "retentionPolicy": "30d",
      "points": [
        {
          "name": "cpu_load",
          "tags": {
            "host": "server01",
            "region": "us-west"
          },
          "timestamp": "2009-11-10T23:00:00Z",
          "fields": {
            "value": 0.64
          }
        }
      ]
    }

    Annotations: Measurement ("name"), Tags ("tags"), Fields ("fields")
  6. QUERY: SELECT value FROM cpu WHERE host = 'serverA'

    RESULTS:

    {
      "results": [
        {
          "query": "SELECT value FROM cpu WHERE host='serverA'",
          "series": [
            {
              "name": "cpu",
              "tags": {
                "host": "serverA"
              },
              "columns": ["time", "value"],
              "values": [
                ["2009-11-10T23:00:00Z", 22.1],
                ["2009-11-10T23:00:10Z", 25.2]
              ]
            }
          ]
        }
      ]
    }
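A Go client can decode the response above into a few small structs. This sketch uses hypothetical type names (`Response`, `Result`, `SeriesResult`). Each row is decoded as `[]interface{}` because a row mixes a time string with a numeric value, and `encoding/json` turns JSON numbers into `float64`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SeriesResult is one series in a result: name, tags, columns and rows.
type SeriesResult struct {
	Name    string            `json:"name"`
	Tags    map[string]string `json:"tags"`
	Columns []string          `json:"columns"`
	Values  [][]interface{}   `json:"values"`
}

// Result pairs a query string with the series it returned.
type Result struct {
	Query  string         `json:"query"`
	Series []SeriesResult `json:"series"`
}

// Response is the top-level envelope holding one result per query.
type Response struct {
	Results []Result `json:"results"`
}

// sample is the response shown on the slide.
const sample = `{"results":[{"query":"SELECT value FROM cpu WHERE host='serverA'",
 "series":[{"name":"cpu","tags":{"host":"serverA"},"columns":["time","value"],
 "values":[["2009-11-10T23:00:00Z",22.1],["2009-11-10T23:00:10Z",25.2]]}]}]}`

func decode(data []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	r, err := decode([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, s := range r.Results[0].Series {
		for _, row := range s.Values {
			// row[0] is the time string; row[1] is the value as a float64.
			fmt.Println(s.Tags["host"], row[0], row[1].(float64))
		}
	}
}
```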
  7. QUERY: SELECT value FROM cpu WHERE host = 'serverA' OR host = 'serverB'

    SERIES IN RESULT:

    {
      "series": [
        {
          "name": "cpu",
          "tags": {
            "host": "serverA"
          },
          "columns": ["time", "value"],
          "values": []
        },
        {
          "name": "cpu",
          "tags": {
            "host": "serverB"
          },
          "columns": ["time", "value"],
          "values": []
        }
      ]
    }
  8. QUERY: SELECT percentile(90, value) FROM cpu WHERE time > now() - 4h GROUP BY time(10m), region

    SERIES IN RESULT:

    [
      {
        "name": "cpu",
        "tags": {
          "region": "us-west"
        },
        "columns": ["time", "percentile"],
        "values": []
      },
      {
        "name": "cpu",
        "tags": {
          "region": "us-east"
        },
        "columns": ["time", "percentile"],
        "values": []
      }
    ]
  9. Multiple aggregates

    SELECT mean(value), percentile(90, value), min(value), max(value)
    FROM cpu
    WHERE host='serverA' AND time > now() - 48h
    GROUP BY time(1h)
  10. Return every series in cpu

    SELECT mean(value)
    FROM cpu
    WHERE time > now() - 48h
    GROUP BY time(1h), *
  11. SHOW MEASUREMENTS

    {
      "results": [
        {
          "query": "SHOW MEASUREMENTS",
          "series": [
            {
              "name": "measurements",
              "columns": ["name"],
              "values": [
                ["cpu"],
                ["memory"],
                ["network"]
              ]
            }
          ]
        }
      ]
    }
  12. SHOW SERIES

    {
      "results": [
        {
          "query": "SHOW SERIES",
          "series": [
            {
              "name": "cpu",
              "columns": ["id", "region", "host"],
              "values": [
                [1, "us-west", "serverA"],
                [2, "us-east", "serverB"]
              ]
            }
          ]
        }
      ]
    }
  13. SHOW MEASUREMENTS WHERE service='redis'

    {
      "query": "SHOW MEASUREMENTS WHERE service='redis'",
      "series": [
        {
          "name": "measurements",
          "columns": ["measurement"],
          "values": [
            ["key_count"],
            ["connections"]
          ]
        }
      ]
    }
  14. SHOW TAG KEYS FROM cpu

    {
      "query": "SHOW TAG KEYS from cpu",
      "series": [
        {
          "name": "keys",
          "columns": ["key"],
          "values": [
            ["region"],
            ["host"]
          ]
        }
      ]
    }
  15. SHOW TAG VALUES WITH KEY = service

    {
      "query": "SHOW TAG VALUES WITH KEY = service",
      "series": [
        {
          "name": "series",
          "columns": ["service"],
          "values": [
            ["redis"],
            ["apache"]
          ]
        }
      ]
    }
  16. SHOW TAG VALUES FROM cpu WITH KEY = service

    {
      "query": "SHOW TAG VALUES FROM cpu WITH KEY = service",
      "series": [
        {
          "name": "series",
          "columns": ["service"],
          "values": [
            ["redis"],
            ["apache"]
          ]
        }
      ]
    }
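The SHOW queries above are answered from series metadata rather than from point data. One way to picture this is an inverted index from tag key/value pairs to series IDs. The Go sketch below makes these assumptions:

- `seriesMeta`, `metaIndex` and its methods are hypothetical names.
- It is a general illustration, not InfluxDB's actual index structure.
- It returns results sorted, so the order can differ from the slides.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesMeta is a hypothetical record for one series: ID, measurement and tagset.
type seriesMeta struct {
	ID          uint64
	Measurement string
	Tags        map[string]string
}

// metaIndex is a hypothetical in-memory index over series metadata.
type metaIndex struct {
	series map[uint64]seriesMeta
	byTag  map[string]map[string][]uint64 // tag key -> tag value -> series IDs
}

func newMetaIndex() *metaIndex {
	return &metaIndex{
		series: make(map[uint64]seriesMeta),
		byTag:  make(map[string]map[string][]uint64),
	}
}

// add registers a series and indexes each of its tag key/value pairs.
func (ix *metaIndex) add(s seriesMeta) {
	ix.series[s.ID] = s
	for k, v := range s.Tags {
		if ix.byTag[k] == nil {
			ix.byTag[k] = make(map[string][]uint64)
		}
		ix.byTag[k][v] = append(ix.byTag[k][v], s.ID)
	}
}

// measurementsWhere answers SHOW MEASUREMENTS WHERE key='value'.
func (ix *metaIndex) measurementsWhere(key, value string) []string {
	seen := make(map[string]bool)
	for _, id := range ix.byTag[key][value] {
		seen[ix.series[id].Measurement] = true
	}
	return sortedKeys(seen)
}

// tagKeys answers SHOW TAG KEYS FROM measurement.
func (ix *metaIndex) tagKeys(measurement string) []string {
	seen := make(map[string]bool)
	for _, s := range ix.series {
		if s.Measurement != measurement {
			continue
		}
		for k := range s.Tags {
			seen[k] = true
		}
	}
	return sortedKeys(seen)
}

// tagValues answers SHOW TAG VALUES WITH KEY = key across all measurements.
func (ix *metaIndex) tagValues(key string) []string {
	seen := make(map[string]bool)
	for v := range ix.byTag[key] {
		seen[v] = true
	}
	return sortedKeys(seen)
}

// sortedKeys returns the members of a set in sorted order for stable output.
func sortedKeys(set map[string]bool) []string {
	out := make([]string, 0, len(set))
	for k := range set {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := newMetaIndex()
	ix.add(seriesMeta{1, "cpu", map[string]string{"region": "us-west", "host": "serverA"}})
	ix.add(seriesMeta{2, "key_count", map[string]string{"service": "redis"}})
	ix.add(seriesMeta{3, "connections", map[string]string{"service": "redis"}})
	ix.add(seriesMeta{4, "requests", map[string]string{"service": "apache"}})

	fmt.Println(ix.measurementsWhere("service", "redis")) // [connections key_count]
	fmt.Println(ix.tagKeys("cpu"))                        // [host region]
	fmt.Println(ix.tagValues("service"))                  // [apache redis]
}
```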
  17. “… the single worst strategic mistake that any software company can make …”

    – Joel Spolsky on rewriting from scratch, Things You Should Never Do, Part I, http://www.joelonsoftware.com/articles/fog0000000069.html
  18. Feature Requests

    • Moving average
    • Different kinds of derivatives
    • Ways to fill data
    • Top N for a given period
    • Exact data point for min/max
  19. Previous API: many series with metadata in the name, like in Graphite

    region.us.data_center.1.host.serverA.network_in
    region.us.data_center.1.host.serverA.network_out
    5m.mean.region.us.data_center.1.host.serverA.network_in
    5m.mean.region.us.data_center.1.host.serverA.network_out
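One way to see the contrast with tags is to turn one of the dotted names above into a measurement plus tags. The Go sketch below makes these assumptions:

- It uses a hypothetical layout, `key1.value1.key2.value2.….measurement`. Graphite itself imposes no such rule, which is part of the problem.
- `parseDottedName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDottedName splits a Graphite-style name into a measurement and tags,
// assuming the hypothetical layout key1.value1.key2.value2....measurement.
func parseDottedName(name string) (string, map[string]string, error) {
	parts := strings.Split(name, ".")
	measurement := parts[len(parts)-1]
	pairs := parts[:len(parts)-1]
	if len(pairs)%2 != 0 {
		return "", nil, fmt.Errorf("name %q does not alternate key.value segments", name)
	}
	tags := make(map[string]string, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		tags[pairs[i]] = pairs[i+1]
	}
	return measurement, tags, nil
}

func main() {
	m, tags, err := parseDottedName("region.us.data_center.1.host.serverA.network_in")
	if err != nil {
		panic(err)
	}
	fmt.Println(m, tags) // network_in map[data_center:1 host:serverA region:us]
}
```

The same helper silently misreads `5m.mean.region.us.data_center.1.host.serverA.network_in` as having a tag `5m=mean`. It shows how fragile metadata encoded in series names is, compared with explicit tags.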
  20. Understandable API Design: Retention Policies

    • Previously called shard spaces
    • Users tell the server which shard space to read/write data into, based on a regex matched against the series name
  21. Pushing users in the right direction: Tags

    Users wanted this:

    SELECT mean(value) FROM cpu WHERE host = 'serverA'
  22. Structural Design Problems

    LIST SERIES /.*region\.uswest.*/

    Tell users to have many series:

    SELECT mean(value)
    FROM merge(/.*region\.uswest.*/)
    WHERE time > now() - 2h
    GROUP BY time(5m)
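With metadata in series names, a query like `merge(/.*region\.uswest.*/)` amounts to testing series names against a regex, then combining the matched series in time order before aggregating. The Go sketch below covers only that selection-and-merge step, and in it every series name is tested against the pattern. It assumes the hypothetical names `tsPoint` and `mergeMatching` and the illustrative series names in `main`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// tsPoint is one point of a series: a nanosecond timestamp and a value.
type tsPoint struct {
	Time  int64
	Value float64
}

// mergeMatching mimics the shape of merge(/regex/): it selects every series
// whose name matches the pattern and combines their points in time order.
func mergeMatching(data map[string][]tsPoint, pattern string) ([]tsPoint, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var merged []tsPoint
	for name, pts := range data { // every series name is scanned
		if re.MatchString(name) {
			merged = append(merged, pts...)
		}
	}
	sort.SliceStable(merged, func(i, j int) bool { return merged[i].Time < merged[j].Time })
	return merged, nil
}

func main() {
	data := map[string][]tsPoint{
		"region.uswest.host.serverA.cpu": {{Time: 1, Value: 0.5}, {Time: 3, Value: 0.7}},
		"region.uswest.host.serverB.cpu": {{Time: 2, Value: 0.6}},
		"region.useast.host.serverC.cpu": {{Time: 4, Value: 0.9}},
	}
	merged, err := mergeMatching(data, `.*region\.uswest.*`)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged) // [{1 0.5} {2 0.6} {3 0.7}]
}
```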
  23. Structural Design Problems: Query Engine

    • Pipes raw data over the network
    • Need to redesign to get data locality
    • MapReduce framework
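The difference between piping raw data over the network and a map/reduce query engine can be shown for `mean()`. Each shard computes a partial result next to its data, and only those small partials travel to the node that finishes the query. The Go sketch below assumes the hypothetical names `partialMean`, `mapMean` and `reduceMean`. It shows the general technique, not InfluxDB's actual implementation.

```go
package main

import "fmt"

// partialMean is what a shard sends back instead of its raw points.
type partialMean struct {
	Sum   float64
	Count int
}

// mapMean runs next to the data on one shard.
func mapMean(values []float64) partialMean {
	var p partialMean
	for _, v := range values {
		p.Sum += v
		p.Count++
	}
	return p
}

// reduceMean combines shard partials into the final mean; ok is false when
// no shard had any data.
func reduceMean(parts []partialMean) (mean float64, ok bool) {
	var total partialMean
	for _, p := range parts {
		total.Sum += p.Sum
		total.Count += p.Count
	}
	if total.Count == 0 {
		return 0, false
	}
	return total.Sum / float64(total.Count), true
}

func main() {
	shards := [][]float64{{1, 2, 3}, {4, 5}, {}}
	parts := make([]partialMean, 0, len(shards))
	for _, s := range shards {
		parts = append(parts, mapMean(s))
	}
	mean, ok := reduceMean(parts)
	fmt.Println(mean, ok) // 3 true
}
```

The partial carries (sum, count) rather than a per-shard mean. Averaging the shard means directly would give (2 + 4.5) / 2 = 3.25 here instead of the correct 3, because the shards hold different numbers of points.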
  24. Underlying Technology Choices

    • LevelDB
      • Too many file handles
      • No online backups
      • Too hard to transfer a shard from one server to another
  25. Underlying Technology Choices

    • Flex & Bison for the parser
      • Very hard to understand and update
      • CGo code
  26. All of these things together pointed to a rewrite

    Large API changes, underlying technology changes, and code in every area getting touched