Refactoring InfluxDB: from Go to Go

Paul Dix
February 19, 2015

Talk given at the Golang SF meetup about the rewrite from 0.8 to 0.9 of InfluxDB


Transcript

  1. Refactoring InfluxDB: from Go to Go
    Paul Dix, CEO and cofounder of InfluxDB
    @pauldix, paul@influxdb.com
  2. Data model
    • Databases
    • Measurements
      • cpu_load, temperature, log_lines, click, etc.
    • Tags
      • region=uswest, host=serverA, building=23, service=redis, etc.
  3. Data model
    • Databases
    • Measurements
      • cpu_load, temperature, log, click, etc.
    • Tags
      • region=uswest, host=serverA, building=23, service=redis, etc.
    • Series - measurement + unique tagset
  4. Data model
    • Databases
    • Measurements
      • cpu_load, temperature, log, click, etc.
    • Tags
      • region=uswest, host=serverA, building=23, service=redis, etc.
    • Series - measurement + unique tagset
    • Points
      • Fields - bool, int64, float64, string, []byte
      • Timestamp - nano epoch
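The data model on the slide above can be sketched in Go roughly as follows. This is only an illustration: `Point` and `SeriesKey` are hypothetical names, not InfluxDB's actual types, and the key format (measurement followed by sorted `key=value` pairs, joined by commas) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Point is a hypothetical type mirroring the slide's data model.
type Point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]interface{} // bool, int64, float64, string, []byte
	Timestamp   time.Time              // nanosecond precision
}

// SeriesKey identifies a series as measurement + unique tag set.
// Tag keys are sorted so the key does not depend on map iteration order.
func SeriesKey(p Point) string {
	keys := make([]string, 0, len(p.Tags))
	for k := range p.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := []string{p.Measurement}
	for _, k := range keys {
		parts = append(parts, k+"="+p.Tags[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	p := Point{
		Measurement: "cpu_load",
		Tags:        map[string]string{"region": "uswest", "host": "serverA"},
		Fields:      map[string]interface{}{"value": 0.64},
		Timestamp:   time.Date(2009, 11, 10, 23, 0, 0, 0, time.UTC),
	}
	fmt.Println(SeriesKey(p))
}
```

Two points with the same measurement and the same tags land in the same series regardless of the order in which the tags were supplied.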
  5. Writing Data
    {
      "database": "mydb",
      "retentionPolicy": "30d",
      "points": [
        {
          "name": "cpu_load",
          "tags": {
            "host": "server01",
            "region": "us-west"
          },
          "timestamp": "2009-11-10T23:00:00Z",
          "fields": {
            "value": 0.64
          }
        }
      ]
    }
    Labels on slide: Measurement, Tags, Fields
  6. QUERY:
    SELECT value FROM cpu WHERE host = 'serverA'
    RESULTS:
    {
      "results": [
        {
          "query": "SELECT value FROM cpu WHERE host='serverA'",
          "series": [
            {
              "name": "cpu",
              "tags": {
                "host": "serverA"
              },
              "columns": ["time", "value"],
              "values": [
                ["2009-11-10T23:00:00Z", 22.1],
                ["2009-11-10T23:00:10Z", 25.2]
              ]
            }
          ]
        }
      ]
    }
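On the client side, a response shaped like the one on the slide above could be decoded as in the sketch below. The type names and `ParseResponse` are hypothetical; only the JSON field names are taken from the slide. Values are left as `interface{}` because each row mixes a timestamp string with numeric columns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Series is one series in a query result: a name, its tags, and rows of values.
type Series struct {
	Name    string            `json:"name"`
	Tags    map[string]string `json:"tags"`
	Columns []string          `json:"columns"`
	Values  [][]interface{}   `json:"values"`
}

// Result holds the series returned for a single query.
type Result struct {
	Query  string   `json:"query"`
	Series []Series `json:"series"`
}

// Response is the top-level object of the result document.
type Response struct {
	Results []Result `json:"results"`
}

// ParseResponse decodes a JSON result document.
func ParseResponse(body []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(body, &r)
	return r, err
}

// sample reproduces the result shown on the slide.
const sample = `{"results":[{"query":"SELECT value FROM cpu WHERE host='serverA'",
"series":[{"name":"cpu","tags":{"host":"serverA"},"columns":["time","value"],
"values":[["2009-11-10T23:00:00Z",22.1],["2009-11-10T23:00:10Z",25.2]]}]}]}`

func main() {
	r, err := ParseResponse([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, s := range r.Results[0].Series {
		for _, row := range s.Values {
			fmt.Println(s.Name, s.Tags["host"], row[0], row[1])
		}
	}
}
```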
  7. QUERY:
    SELECT value FROM cpu
    WHERE host = 'serverA' OR host = 'serverB'
    SERIES IN RESULT:
    {
      "series": [
        {
          "name": "cpu",
          "tags": {
            "host": "serverA"
          },
          "columns": ["time", "value"],
          "values": []
        },
        {
          "name": "cpu",
          "tags": {
            "host": "serverB"
          },
          "columns": ["time", "value"],
          "values": []
        }
      ]
    }
  8. QUERY:
    SELECT percentile(90, value) FROM cpu
    WHERE time > now() - 4h
    GROUP BY time(10m), region
    SERIES IN RESULT:
    [
      {
        "name": "cpu",
        "tags": {
          "region": "us-west"
        },
        "columns": ["time", "percentile"],
        "values": []
      },
      {
        "name": "cpu",
        "tags": {
          "region": "us-east"
        },
        "columns": ["time", "percentile"],
        "values": []
      }
    ]
  9. Multiple aggregates
    SELECT mean(value), percentile(90, value), min(value), max(value)
    FROM cpu
    WHERE host='serverA' AND time > now() - 48h
    GROUP BY time(1h)
  10. Return every series in CPU
    SELECT mean(value)
    FROM cpu
    WHERE time > now() - 48h
    GROUP BY time(1h), *
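To make the `GROUP BY time(...)` part of the queries above concrete, here is a minimal sketch of bucketing samples into fixed intervals and taking the mean of each bucket. It is not InfluxDB's query engine: `Sample` and `MeanByInterval` are hypothetical names, timestamps are assumed to be non-negative nanoseconds since the epoch, and grouping by tags is left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Sample is a hypothetical (timestamp, value) pair; the timestamp is
// nanoseconds since the Unix epoch, as in the data model slide.
type Sample struct {
	Timestamp int64
	Value     float64
}

// MeanByInterval groups samples into fixed-width time buckets and returns
// the mean of each bucket, keyed by the bucket's start time in nanoseconds.
// It assumes non-negative timestamps and a positive interval.
func MeanByInterval(samples []Sample, interval time.Duration) map[int64]float64 {
	width := int64(interval)
	sums := make(map[int64]float64)
	counts := make(map[int64]int)
	for _, s := range samples {
		start := s.Timestamp - s.Timestamp%width // round down to the bucket start
		sums[start] += s.Value
		counts[start]++
	}
	means := make(map[int64]float64, len(sums))
	for start, sum := range sums {
		means[start] = sum / float64(counts[start])
	}
	return means
}

func main() {
	samples := []Sample{
		{Timestamp: 0, Value: 1},
		{Timestamp: int64(30 * time.Minute), Value: 3},
		{Timestamp: int64(70 * time.Minute), Value: 10},
	}
	means := MeanByInterval(samples, time.Hour)

	// Print buckets in time order; map iteration order is not defined.
	starts := make([]int64, 0, len(means))
	for start := range means {
		starts = append(starts, start)
	}
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })
	for _, start := range starts {
		fmt.Println(time.Unix(0, start).UTC().Format(time.RFC3339), means[start])
	}
}
```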
  11. {
      "results": [
        {
          "query": "SHOW MEASUREMENTS",
          "series": [
            {
              "name": "measurements",
              "columns": ["name"],
              "values": [
                ["cpu"],
                ["memory"],
                ["network"]
              ]
            }
          ]
        }
      ]
    }
  12. {
      "results": [
        {
          "query": "SHOW SERIES",
          "series": [
            {
              "name": "cpu",
              "columns": ["id", "region", "host"],
              "values": [
                [1, "us-west", "serverA"],
                [2, "us-east", "serverB"]
              ]
            }
          ]
        }
      ]
    }
  13. {
      "query": "SHOW MEASUREMENTS WHERE service='redis'",
      "series": [
        {
          "name": "measurements",
          "name": "series",
          "columns": ["measurement"],
          "values": [
            ["key_count"],
            ["connections"]
          ]
        }
      ]
    }
  14. {
      "query": "SHOW TAG KEYS from cpu",
      "series": [
        {
          "name": "keys",
          "columns": ["key"],
          "values": [
            ["region"],
            ["host"]
          ]
        }
      ]
    }
  15. {
      "query": "SHOW TAG VALUES WITH KEY = service",
      "series": [
        {
          "name": "series",
          "columns": ["service"],
          "values": [
            ["redis"],
            ["apache"]
          ]
        }
      ]
    }
  16. {
      "query": "SHOW TAG VALUES FROM cpu WITH KEY = service",
      "series": [
        {
          "name": "series",
          "columns": ["service"],
          "values": [
            ["redis"],
            ["apache"]
          ]
        }
      ]
    }
  17. “… the single worst strategic mistake that any software company can make …”
    – Joel Spolsky on rewriting from scratch, Things You Should Never Do, Part I
    http://www.joelonsoftware.com/articles/fog0000000069.html
  18. Feature Requests
    Moving average, different kinds of derivatives, ways to fill data, top N for a given period, exact data point for min/max
  19. Previous API
    Many series with metadata in the name like in Graphite
    region.us.data_center.1.host.serverA.network_in
    region.us.data_center.1.host.serverA.network_out
    5m.mean.region.us.data_center.1.host.serverA.network_in
    5m.mean.region.us.data_center.1.host.serverA.network_out
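To show how metadata packed into a dotted series name relates to the measurement-plus-tags model, the sketch below splits a name like the first one on the slide above into a measurement and a tag map. It assumes a hypothetical convention of alternating `key.value` segments followed by the measurement; real Graphite-style names follow no fixed scheme, and the `5m.mean.` rollup prefixes on the slide are not handled specially. `ParseDottedName` is an illustrative helper, not part of InfluxDB.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseDottedName splits "k1.v1.k2.v2.measurement" into the measurement
// and a tag map. It assumes alternating key/value segments followed by a
// final measurement segment, so the segment count must be odd.
func ParseDottedName(name string) (string, map[string]string, error) {
	if name == "" {
		return "", nil, fmt.Errorf("empty series name")
	}
	parts := strings.Split(name, ".")
	if len(parts)%2 == 0 {
		return "", nil, fmt.Errorf("expected key.value pairs followed by a measurement: %q", name)
	}
	tags := make(map[string]string)
	for i := 0; i+1 < len(parts)-1; i += 2 {
		tags[parts[i]] = parts[i+1]
	}
	return parts[len(parts)-1], tags, nil
}

func main() {
	m, tags, err := ParseDottedName("region.us.data_center.1.host.serverA.network_in")
	if err != nil {
		panic(err)
	}
	fmt.Println(m, tags["region"], tags["data_center"], tags["host"])
}
```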
  20. Understandable API Design: Retention Policies
    • Previously called shard spaces
    • Users tell the server which shard space to read/write data into based on a regex to match against the series name
  21. Pushing users in the right direction: Tags
    Users wanted this:
    SELECT mean(value) FROM cpu
    WHERE host = 'serverA'
  22. Structural Design Problems
    Tell users to have many series:
    LIST SERIES /.*region\.uswest.*/
    SELECT mean(value)
    FROM merge(/.*region\.uswest.*/)
    WHERE time > now() - 2h
    GROUP BY time(5m)
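The regex-based selection on the slide above amounts to testing a pattern against every series name. A minimal sketch with Go's `regexp` package follows; `MatchSeries` and the sample names are hypothetical and this is not how 0.8 implemented `merge`, but it illustrates why the work grows with the total number of series rather than with the number of matching ones.

```go
package main

import (
	"fmt"
	"regexp"
)

// MatchSeries returns the names that match pattern, in input order.
// Every name is tested, so the cost is linear in the number of series.
func MatchSeries(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, name := range names {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	names := []string{
		"region.uswest.host.serverA.cpu",
		"region.useast.host.serverB.cpu",
		"region.uswest.host.serverC.cpu",
	}
	matched, err := MatchSeries(names, `.*region\.uswest.*`)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched)
}
```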
  23. Structural Design Problems: Query Engine
    • Pipes raw data over the network
    • Need to redesign to get data locality
    • MapReduce framework
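The data-locality idea on the slide above can be illustrated with a mean computed in two phases: each shard reduces its raw values to a small partial (sum and count), and only the partials are combined centrally. This is a sketch of the general MapReduce pattern under that assumption, not the InfluxDB 0.9 query engine; `Partial`, `MapMean`, and `ReduceMean` are hypothetical names.

```go
package main

import "fmt"

// Partial is the per-shard intermediate result for a mean.
type Partial struct {
	Sum   float64
	Count int
}

// MapMean runs next to the data and reduces raw values to one Partial.
func MapMean(values []float64) Partial {
	var p Partial
	for _, v := range values {
		p.Sum += v
		p.Count++
	}
	return p
}

// ReduceMean combines partials from all shards into the final mean.
// The boolean is false when there were no values at all.
func ReduceMean(parts []Partial) (float64, bool) {
	var total Partial
	for _, p := range parts {
		total.Sum += p.Sum
		total.Count += p.Count
	}
	if total.Count == 0 {
		return 0, false
	}
	return total.Sum / float64(total.Count), true
}

func main() {
	// Two shards holding raw values; only two Partials cross the "network".
	shards := [][]float64{{1, 2, 3}, {4, 5}}
	parts := make([]Partial, 0, len(shards))
	for _, values := range shards {
		parts = append(parts, MapMean(values))
	}
	mean, ok := ReduceMean(parts)
	fmt.Println(mean, ok)
}
```

Averaging per-shard means directly would weight the shards equally and give the wrong answer when they hold different numbers of points, which is why the partial carries both sum and count.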
  24. Underlying Technology Choices
    • LevelDB
      • Too many file handles
      • No online backups
      • Too hard to transfer shard from one server to another
  25. Underlying Technology Choices
    • Flex & Bison for Parser
      • Very hard to understand and update
      • CGo code
  26. All of these things together pointed to a rewrite
    Large API changes, underlying technology changes, and code in every area getting touched