Big Data + Time Series + Real-time + Analytics with KairosDB

Time series databases have become a fundamental building block for storing and analyzing many types of machine data. In the era of the Internet of Things, this machine data carries one essential component: time!

So how do you store it efficiently and scalably, given the flood of data that pours into your information system every day?

What are the options, the technologies, and the advantages and drawbacks of the different solutions?

Loic Coulet, software and data engineer at Kratos ISE, shares his experience with Big Data and time series technologies. He will look back on deploying an application for a satellite operator that reports KPIs and the status of each service in real time, based on the customer's SLA and business rules. The application also generates trend reports and provides analysis support tools.

In his talk you will discover KairosDB, a versatile and robust time series database that stores its data in Cassandra. Loic will show how he and his team extended the tool to meet their customers' specific BI and analytics needs.

http://www.meetup.com/fr/Tlse-Data-Science/events/221370114/

Toulouse Data Science

July 02, 2015

Transcript

  1. Big Data, Time Series, Real-Time & Analytics with KairosDB

  2. 2 www.integ-europe.com | TDS – KairosDB REX | Agenda

    1. Presentation
    2. Storage needs
       1. Our Needs
       2. Why KairosDB & Cassandra?
    3. How does it work?
       1. Design decisions
       2. Data Model
       3. Queries & aggregations
       4. APIs
       5. Modularity
  3. Agenda

    3. Add value to data
       1. Enhance existing tools
       2. Predict the future
       3. Search & Discover correlations
    4. A specific need: Satellite Business Intelligence
       1. Real-time BI & Time Series
       2. Simple configuration of a complex tool
       3. Reporting
    5. Conclusion
       1. What KairosDB can do for us
       2. Our contributions
       3. And then?
  4. Agenda: Presentation

  5. Who’s talking?

    Loic COULET, Software Engineer, Kratos ISE, Systems Integration. Software Passionate. Now Learning. Presenting today. 10 years. M&C, CSM, Satellite C2, Java, Databases, Web, Computer Science, Big Data Analytics, NoSQL, Big Data, Business Intelligence, Learning
  6. Kratos Integral Systems Europe (KISE)

    Subsidiary of Kratos / Kratos ISI. Labège, France (Toulouse)
  7. Kratos Integral Systems International (KISI)

    Formerly Integral Systems Incorporated. USA: Lanham, MD, near Washington DC
  8. Kratos Defense & Security Solutions

  9. Information about KISE

    • 20 employees
    • System integrators for satellite ground stations
    • Multicultural company
    • International: EMEA, Asia
    • Devices & software
  10. KISE provides

    Ground station solutions
  11. Storage Needs

    • Our Needs
    • Why KairosDB & Cassandra?
  12. A new storage…

    needs Big Data?
  13. Big Data = 3 V’s

  14. Big Data = 3 V’s

    • Number of metrics
    • Storage requirements
    • Throughput
  15. Data Source Systems

    • CSM
    • M&C
    • Satellite C2
    • Network Mgmt
  16. Big Data Problem?

    • Store all data for any length of time?
    • Correlation between data sources?
    • Further analysis to detect unknown information?
    • Learning model to anticipate failures?
    AND THEN…
    [Diagram labels: Systems generate… data amount; Real-Time processing; Legacy Storage. Is everything archived? How efficiently is data stored and used?]
  17. Why KairosDB & Cassandra?

  18. The choice drivers

    Requirements:
    • Millisecond precision
    • Efficient storage
    • Evolvable system
    • Affordable cost (R&D project, no large budget for a POC)
    • On-premises (no SaaS)
  19. Back in 2013 - Old generation TSDB

    In 2013, very few choices:
    • Graphite http://graphite.wikidot.com/ (2006)
    • RRDtool http://oss.oetiker.ch/rrdtool/ (1999)
    • tsdb https://code.google.com/p/tsdb/ (2007)
    All based on static time buckets and no tagging.
  20. Back in 2013 - New Gen TSDB

    In 2013, few choices:
    • OpenTSDB http://opentsdb.net/ (2010)
    • Druid http://druid.io/ (end 2012)
    • Rhombus https://github.com/Pardot/Rhombus (early 2013), also uses Cassandra
    • Seriesly https://github.com/dustin/seriesly (end 2012)
    • ElasticSearch stack? http://elasticsearch.org
    • Then appeared KairosDB https://github.com/kairosdb/kairosdb (2nd quarter 2013)
  21. Today: many new choices

    • InfluxDB http://influxdb.com/ (end 2013)
    • DalmatinerDB https://dalmatiner.io/ (end 2013)
    • Prometheus http://prometheus.io/ (2012, open-source release in Jan. 2015), a non-distributed time series DB
    • SiteWhere http://www.sitewhere.org/ (early 2014)
    • Rhombus https://github.com/Pardot/Rhombus (early 2013)
    • Akumuli https://github.com/akumuli/Akumuli (end 2014)
    • yawndb http://kukuruku.co/hub/erlang/yawndb-time-series-database (early 2014)
    • BlueFlood http://blueflood.io/ (end 2013)
    • Newts https://github.com/OpenNMS/newts (end 2013), uses Cassandra
    • … And many more
  22. Main Reasons

    Back in 2013: OpenTSDB vs KairosDB on some specific requirements

    Requirement                               OpenTSDB                   KairosDB
    License                                   GPL                        Apache V2
    Pluggable datastore                       No (WIP?)                  Yes
    Millisecond precision                     No (work-around in 2014)   Yes
    Unlimited metrics & tags                  No (~16M)                  Yes
    Ad-hoc metric creation                    No (fixed in 2014)         Yes
    Respect data integrity                    No                         Yes
    Extensible aggregation                    No (plugins in 2014?)      Yes
    Presentation separated from processing    No                         Yes
    Custom Data Types                         No                         Yes
  23. Apache Cassandra Features

    1. Distributed database (not relational)
    2. Automatic sharding & replication
    3. One node type, scale to any size
    4. Widely used, big community, strong support
    5. Free (commercial support available)
  24. Efficiency matters: every bit counts

    [Bar chart: data size per sample (in bytes) by storage solution: RDBMS (relational database), HDF and compressed HDF (indexed file archive), KairosDB. Each bar is split into overhead, index, data and transactions log. Values shown on the chart: ~25B, ~50B, ~18B, ~5B, ~13B.]
  25. How does it work?

    • Time Series Database (KairosDB)
    • NoSQL Database as storage backend (Apache Cassandra)
    • Domain expertise and deep integration

    Section outline:
    • Design decisions
    • Data Model
    • Queries & aggregations
    • APIs
    • Modularity
  26. How does it work?

    ONE single database with a Time Series Web Service frontend
    • A Time Series Database frontend (based on KairosDB)
    • A NoSQL Database as storage backend (Apache Cassandra)
      We never query Cassandra directly.
  27. The architecture

    [Architecture diagram labels: Carrier Monitoring, M&C, Satellite C2, NMS, Other Data Sources; four Data Collector agents; Data Integration Frontend; Storage; Reporting & analytics Frontend; Web UI; External Analytics systems.]
  28. Typical System(s)

    Frontend(s) and backend cluster on a single-node cluster:
    • Commodity server
    • 4 to 8 TB HDD
    • 2 CPUs
    • 60 GB RAM
    • Optional replica node
    Criteria shown: fault management, data replication, low cost, quick start, easy administration, scale to any size, best performance, backups
  29. Optimal System?

    Fault-tolerant small cluster, replication factor x3
    Criteria shown: fault management, data replication, low cost, quick start, easy administration, scale to any size, best performance, backups
  30. Design Decisions

  31. Design For Efficiency

    • Data is stored as time series
      • Data field name / metric name
      • Value
      • Timestamp
      • Contextual information (tag key/value pairs)
    • Optimize for large queries
    • Leverage Cassandra design
      • Row key = aggregate of metric + base timestamp + tag keys/values
      • Use column qualifier as timestamp offset, leverage wide rows

    Row key                               | 0    | … | N    | … | X    | Y    | Z
    X001{12234400}tag1=val1;tag2=val2…    | val1 |   | valN |   |      | valY |
    X001{12234400}tag1=val3;tag2=val2…    |      |   |      |   | valX |      |
    X002{12234400}tag1=val1;tag2=val2…    | val1 |   |      |   |      |      | valZ
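
To make the row-key layout on this slide concrete, here is a minimal Python sketch. It only illustrates the idea (metric + base timestamp + tags in the row key, timestamp offset as the column name) and is not KairosDB's actual encoding. The function name `row_key`, the three-week row width and the sorted `key=value;` tag encoding are assumptions made for the example.

```python
# Assumed row width: three weeks in milliseconds (illustrative value).
ROW_WIDTH_MS = 3 * 7 * 24 * 3600 * 1000


def row_key(metric, timestamp_ms, tags):
    """Return (row_key, column_offset) for one sample.

    The row key groups all samples of one metric/tag combination that fall
    in the same time window; the column offset locates the sample in the row.
    """
    base = timestamp_ms - (timestamp_ms % ROW_WIDTH_MS)
    # Sort tags so the same tag set always yields the same key.
    tag_part = ";".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{base}}}{tag_part}", timestamp_ms - base


key_a, offset_a = row_key("X001", 1431986400000, {"tag1": "val1", "tag2": "val2"})
key_b, offset_b = row_key("X001", 1431986401000, {"tag2": "val2", "tag1": "val1"})
key_c, offset_c = row_key("X001", 1431986400000, {"tag1": "val3", "tag2": "val2"})
```

Here `key_a` and `key_b` land in the same wide row with offsets one second apart, while `key_c` (a different tag value) goes to a separate row, which mirrors the table on the slide.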
  32. Design decisions

    The design decisions were oriented towards:
    • Simplicity
    • Scalability (one to N nodes)
    • Storage efficiency: a few bytes per sample (eq. DM4)
    • Fast processing (per node: 100K/s in, 500K/s out)
    • Flexibility and evolution
    • A unified format using time series
  33. Data model

    The model is the same for all data: time series. How are the data points organized in the database?
    Datapoint = Metric + Timestamp + Value + Tags
    • Metric: identifies the measurement, e.g. a telemetry mnemonic in Raw, Eu or State conversion
    • Timestamp: time of the measurement (ms)
    • Value: the measurement itself; values are typed
    • Tags: characteristics of the measurement, e.g. satellite or stream, or the quality of the measurement
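
The data point described above can be sketched as a small Python record. The class name `DataPoint`, the metric name and the tag values are hypothetical. The `name` / `timestamp` / `value` / `tags` keys in `to_payload` are modelled on the JSON query shown later in the deck and should be checked against the KairosDB REST documentation before use.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataPoint:
    metric: str          # identifies the measurement
    timestamp_ms: int    # time of the measurement, in milliseconds
    value: float         # the measurement itself
    tags: dict = field(default_factory=dict)  # e.g. satellite, quality

    def to_payload(self):
        """Return a JSON-serializable dict (key names are an assumption)."""
        return {
            "name": self.metric,
            "timestamp": self.timestamp_ms,
            "value": self.value,
            "tags": dict(self.tags),
        }


point = DataPoint(
    "telemetry.mnemonic.eu",                   # hypothetical metric name
    1431986400000,
    12.5,
    {"satellite": "sat01", "quality": "good"},  # hypothetical tags
)
body = json.dumps([point.to_payload()])
```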
  34. Data types

    Values are typed. A value can be of any data type. KairosDB provides basic types:
    • String / numerical (float, long) / complex
    Other data types are possible, including composite types:
    • Just implement an interface and a factory
  35. System, queries & Ad-hoc aggregations

  36. Presentation of the system architecture

    How do the systems interact?
    [Diagram labels: work station, server, backend cluster; frontend query, backend query, raw data, processing, raw data.]
  37. Presentation of the system architecture

    The two roles of the server: a frontend query carries data selection instructions and processing instructions.
    • Data selection instructions are forwarded to the cluster to get raw data
    • Processing instructions are kept by the server to process the raw data
  38. System throughput

    First implication: transfer speed
    [Diagram labels: work station, server, backend cluster; links marked SLOW and FAST.]
  39. System throughput

    First implication: transfer speed
    You are interested in the evolution of the central frequency of one carrier:
    • 3,600 datapoints an hour
    • 86,400 datapoints a day
    • 31,536,000 datapoints a year
    [Diagram: 31,536,000 DP travel over both the SLOW and the FAST link.]
  40. System throughput

    What is the solution? You do not need 31,536,000 datapoints to see the evolution over one year → use aggregation.
    [Diagram: links marked SLOW and FAST; 31,536,000 DP on one link, 365 DP (daily average) on the other.]
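
The arithmetic behind these slides (one sample per second gives 86,400 points a day and 31,536,000 a year, reduced to 365 daily averages) can be sketched in a few lines of Python. The function name `daily_average` and the synthetic input are illustrative only; this is not how KairosDB implements its aggregators.

```python
from collections import defaultdict

DAY_S = 86_400  # seconds per day, i.e. samples per day at one sample per second


def daily_average(samples):
    """Reduce (timestamp_seconds, value) samples to one mean value per day."""
    sums = defaultdict(lambda: [0.0, 0])  # day start -> [running total, count]
    for ts, value in samples:
        acc = sums[ts - ts % DAY_S]
        acc[0] += value
        acc[1] += 1
    return [(day, total / count) for day, (total, count) in sorted(sums.items())]


# Three days of synthetic data at one sample per second.
samples = ((t, float(t // DAY_S)) for t in range(3 * DAY_S))
result = daily_average(samples)

points_per_year = 365 * DAY_S
```

With this reduction, three days of raw data (259,200 samples) become three points; over a year the same step turns 31,536,000 samples into 365.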
  41. System throughput

    Second implication: data volume
    • Backend cluster: very large data → 10s of terabytes
    • Server: large but limited memory → 10s of gigabytes
    • Work station: very limited memory → a few gigabytes
  42. Data model

    How do you query datapoints?
    • Metric name (mandatory)
      o E.g. metric.name
    • Tag filtering (optional)
      o E.g. source=src01, quality=good
    This will return all the datapoints that match these criteria → this might return too many points.
    [Illustration: points from antenna 1 or 2 over Day 1, Day 2 and Day 3.]
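
As a rough illustration of the selection rule on this slide (mandatory metric name, optional tag filter), here is a small in-memory Python sketch. The `select` function and the sample points are hypothetical; a real query is sent to the KairosDB web service rather than evaluated client-side.

```python
POINTS = [
    {"name": "metric.name", "timestamp": 1000, "value": 1.0,
     "tags": {"source": "src01", "quality": "good"}},
    {"name": "metric.name", "timestamp": 2000, "value": 2.0,
     "tags": {"source": "src02", "quality": "good"}},
    {"name": "metric.name", "timestamp": 3000, "value": 3.0,
     "tags": {"source": "src01", "quality": "bad"}},
    {"name": "other.metric", "timestamp": 1000, "value": 9.0,
     "tags": {"source": "src01", "quality": "good"}},
]


def select(points, metric, tag_filter=None):
    """Keep points of `metric` whose tags match every entry of `tag_filter`.

    `tag_filter` maps a tag key to the list of accepted values, so a filter
    such as {"source": ["src01", "src02"]} means "source is src01 or src02".
    """
    tag_filter = tag_filter or {}
    return [
        p for p in points
        if p["name"] == metric
        and all(p["tags"].get(key) in allowed for key, allowed in tag_filter.items())
    ]


everything = select(POINTS, "metric.name")
good_src01 = select(POINTS, "metric.name", {"source": ["src01"], "quality": ["good"]})
either_source = select(POINTS, "metric.name", {"source": ["src01", "src02"]})
```

Without a tag filter every point of the metric comes back, which is the "too many points" case the slide warns about.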
  43. Query model

    How do you query datapoints?
    Data reduction steps: Filtering, Group by, Aggregation
  44. Query model

    And then?
    Data reduction steps: Filtering, Group by, Aggregation
    Followed by: V. Aggregation, Prediction, Serialization → RESULTS
  45. Interoperability Features

    1. All features are provided as web services (HTTP / REST)
    2. Open APIs
    3. Interoperable data format based on JSON
    4. Intuitive Web UI to get started with the system
    5. APIs include:
       • Data acquisition
       • Data querying
       • Analysis features (prediction, correlations)
  46. Query API

    • KairosDB provides web services for performing queries
    • Queries are JSON documents

    {
      "start_absolute": 1431986400000,
      "metrics": [
        {
          "name": "kairosdb.jvm.free_memory",
          "limit": 1000000,
          "group_by": [{"name": "tag", "tags": ["host"]}],
          "aggregators": [
            {
              "name": "avg",
              "sampling": {"value": 1, "unit": "hours"},
              "align_start_time": true,
              "align_sampling": true
            }
          ]
        }
      ],
      "cache_time": 0
    }
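
The JSON document above can be built and sent from any HTTP client. The following Python sketch builds the same query as a dictionary and shows one way it might be posted. The helper names `build_query` and `post_query` are hypothetical, and the URL is the one used in the R example later in this deck, so it is an assumption about the local setup rather than a fixed address.

```python
import json
import urllib.request


def build_query(metric, start_ms, group_tag, value=1, unit="hours"):
    """Build a query document with one averaged, tag-grouped metric."""
    return {
        "start_absolute": start_ms,
        "metrics": [{
            "name": metric,
            "limit": 1000000,
            "group_by": [{"name": "tag", "tags": [group_tag]}],
            "aggregators": [{
                "name": "avg",
                "sampling": {"value": value, "unit": unit},
                "align_start_time": True,
                "align_sampling": True,
            }],
        }],
        "cache_time": 0,
    }


def post_query(query, url="http://localhost:8081/api/v1/datapoints/query"):
    """POST the query as JSON and return the decoded response (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_query("kairosdb.jvm.free_memory", 1431986400000, "host")
encoded = json.dumps(query)
```

`post_query(query)` is deliberately not called here, since it requires a reachable KairosDB instance.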
  47. Query Engine & aggregations

    • Ad-hoc queries and statistics calculation
    • Business Intelligence features already implemented (aggregate, drill & pivot)
    • Data aggregates: Min, Max, Sum, Average, Count, Rate, Std Deviation, etc.
    • Multi-level Group-by feature using tags, value, or time
    • Filter by tag values
  48. Aggregations

    Usually a function used to reduce or summarize the number of samples (features).
    In KairosDB an aggregator can do almost anything.
    Data reduction steps: Filtering, Group by, Aggregation
  49. Aggregations

    Aggregators are designed to be chained.
    They work in streaming: fast and memory-efficient.
    [Example chains: 5min Avg → Derivative; Derivative → 1day Sum]
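
Chained, streaming aggregators can be sketched with Python generators: each stage consumes points one at a time and yields reduced points, so nothing needs to hold the full series in memory. The functions `window_avg` and `derivative` below are illustrative stand-ins for the "5min Avg → Derivative" chain, not KairosDB's implementation, and they assume the input is sorted by time.

```python
def window_avg(points, width):
    """Yield (window_start, mean) for consecutive windows of `width` time units."""
    start, total, count = None, 0.0, 0
    for ts, value in points:
        bucket = ts - ts % width
        if start is not None and bucket != start:
            yield start, total / count   # window closed: emit its mean
            total, count = 0.0, 0
        start = bucket
        total += value
        count += 1
    if count:
        yield start, total / count       # emit the last, possibly partial, window


def derivative(points):
    """Yield (timestamp, rate of change) between consecutive points."""
    prev = None
    for ts, value in points:
        if prev is not None:
            yield ts, (value - prev[1]) / (ts - prev[0])
        prev = (ts, value)


# One sample per minute for 20 minutes, value growing by 2 per second.
raw = ((t, 2.0 * t) for t in range(0, 1200, 60))
chained = list(derivative(window_avg(raw, 300)))  # 5 min average, then derivative
```

Because both stages are generators, the chain pulls samples lazily from `raw`; adding a third stage is just one more wrapping call.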
  50. Aggregations: available aggregators in KairosDB

    • Min
    • Max
    • Avg
    • Sum
    • Std Dev
    • Scale
    • Rate (Derivative)
    • Least Square
    • Count
    • Percentile
  51. Flexibility & Modularity

  52. KairosDB is modular

    [Diagram: core with Module A, Module B, Module C]
    Modules may add new:
    - Features, services
    - Web service API endpoints
    - Data types
    - Aggregators
    - Predictors
    - Query processor
    - New correlation models
    - Datastore(s)
    - …
  53. Datastore Module

    Pluggable DataStore
    [Diagram: core, HDF 5]
    kairosdb.service.datastore=org.kairosdb.datastore.cassandra.CassandraModule
  54. Our usage: examples

    [Diagram: core with Real-time Monitoring Dashboard Module and Custom Analytics Module; External Systems; Dashboard, Reporting, Analytics]
  55. Add Value to the data

    • Enhance existing tools
    • Predict the future
    • Search & Discover correlations
  56. Enhance the existing

  57. Need for better aggregations

    We doubled the number of aggregators
  58. Issues

    • KairosDB time-windowed aggregation model is only “horizontal”
  59. Issues

    • So it is highly affected by the series sampling rate
  60. Issues

    • We also needed vertical aggregations, and built them
  61. Aggregations: our vertical aggregators

    • Min
    • Max
    • Avg
    • Sum
    • Diff
    • Preference
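
The slides do not define "vertical" aggregation in detail. One plausible reading, assumed here, is that it combines several series at each timestamp, whereas "horizontal" aggregation summarizes one series over a time window. The Python sketch below follows that reading; `vertical_aggregate` and the restriction to timestamps present in every series are choices made for the example, not a description of the team's module.

```python
def vertical_aggregate(series_list, combine):
    """Combine several series point by point.

    Each series is a dict {timestamp: value}. Only timestamps present in
    every series are kept (an assumption made for this sketch).
    """
    common = set(series_list[0]).intersection(*series_list[1:])
    return {ts: combine([s[ts] for s in series_list]) for ts in sorted(common)}


uplink = {0: 1.0, 60: 2.0, 120: 3.0}
downlink = {0: 10.0, 60: 20.0}

v_sum = vertical_aggregate([uplink, downlink], sum)
v_max = vertical_aggregate([uplink, downlink], max)
v_avg = vertical_aggregate([uplink, downlink], lambda vs: sum(vs) / len(vs))
v_diff = vertical_aggregate([uplink, downlink], lambda vs: vs[0] - vs[1])
```

The `combine` argument plays the role of the Min, Max, Avg, Sum and Diff aggregators listed above; "Preference" is not sketched because the slides do not say how it behaves.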
  62. Trying to predict the future

  63. Time Series Prediction analysis

    1. Generic predictive analysis (in the query engine) is being implemented
    2. Several predictors: linear (exponential smoothing, Holt, least squares), or dynamic with Dynamic Linear Model (DLM)
    [Chart legend: Actual Data, Prediction]
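
Of the predictors listed above, least squares is the simplest to illustrate. The Python sketch below fits a straight line to past points and extrapolates it. It is a textbook ordinary least squares fit, with the hypothetical helpers `least_squares_fit` and `predict`, and says nothing about how the predictors in the talk are actually implemented.

```python
def least_squares_fit(points):
    """Return (slope, intercept) of the ordinary least squares line through points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)  # zero if all timestamps are equal
    slope = cov / var
    return slope, mean_v - slope * mean_t


def predict(points, future_timestamps):
    """Extrapolate the fitted line to the given future timestamps."""
    slope, intercept = least_squares_fit(points)
    return [(t, slope * t + intercept) for t in future_timestamps]


history = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]
slope, intercept = least_squares_fit(history)
forecast = predict(history, [4, 5])
```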
  64. Search & Discover correlations

  65. Search correlations

    Correlate one reference series to many others
    [Interactive histogram showing the most correlated series; reference series: eirp carrier=Carrier_1_Ref]
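
A correlation search of this kind can be sketched as: score every candidate series against the reference, then sort by strength. The slides do not say which correlation measure is used, so the Pearson coefficient below is an assumption, as are the function names and the sample series. The sketch also assumes all series are sampled at the same timestamps.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long value lists (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def rank_by_correlation(reference, candidates):
    """Return (name, coefficient) pairs, strongest absolute correlation first."""
    scores = {name: pearson(reference, values) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


reference = [1.0, 2.0, 3.0, 4.0]        # stands in for the reference carrier series
candidates = {                           # hypothetical candidate series
    "carrier_2": [2.0, 4.0, 6.0, 8.0],
    "carrier_3": [4.0, 3.0, 2.0, 1.0],
    "carrier_4": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_by_correlation(reference, candidates)
```

Sorting on the absolute value keeps strongly anti-correlated series near the top, which is usually what a discovery tool wants to surface.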
  66. Time Series Correlations Discovery

    Same query model as in correlation search
    [Interactive correlation matrix]
  67. A specific need: Satellite Business Intelligence

    • Real-time BI & Time Series
    • Simple configuration of a complex tool
    • Reporting
  68. Real-time Business Intelligence

    • Specific requirements have been provided by Es’hailsat
    • Determination of the service status from special business rules
    • Configurable through a CSV file
    • Reconfiguration of Compass devices from the dashboard
    [Diagram labels: Business Evaluation Analyze Implement KPI Evaluate]
  69. Satellite Business Intelligence = ?

    Business…
  70. Satellite Business Intelligence = ?

    + Intelligence…
    [Diagram labels: Business Evaluation Analyze Implement KPI Evaluate]
  71. KPI?

  72. …For your satellite services

  73. Correlations for services monitoring dashboard

    [Diagram labels: Data Source System (x2); Metrics; Limit checking Rules; KPI; SLA Check Rules; Services; Report; Correlations + Business rules]
  74. THE Dashboard

    The Es’hailsat Monitoring Dashboard is displayed in a web browser with real-time information
  75. Configuration using a CSV file

    The CMC Monitoring Dashboard configuration file (CSV file) is edited manually by the CMC operator. It configures:
    • KPI Thresholds
    • Monitoring Plans
    • Monitored Services
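
The slides do not show the layout of the configuration file, so the following Python sketch is purely illustrative: the column names (`service`, `kpi`, `warning`, `critical`), the sample rows, and the rule that lower values are worse are all invented for the example. It only shows how KPI thresholds kept in a CSV file could drive a status decision.

```python
import csv
import io

# Hypothetical file content; the real column layout is not given in the slides.
CONFIG_CSV = """service,kpi,warning,critical
Service_A,eirp,50.0,48.0
Service_B,cn0,70.0,65.0
"""


def load_thresholds(text):
    """Parse CSV text into {(service, kpi): (warning, critical)}."""
    rows = csv.DictReader(io.StringIO(text))
    return {
        (row["service"], row["kpi"]): (float(row["warning"]), float(row["critical"]))
        for row in rows
    }


def status(value, warning, critical):
    """Classify a KPI value, assuming lower values are worse."""
    if value < critical:
        return "CRITICAL"
    if value < warning:
        return "WARNING"
    return "OK"


thresholds = load_thresholds(CONFIG_CSV)
current = status(49.0, *thresholds[("Service_A", "eirp")])
```

Keeping thresholds in a flat file like this lets an operator change the rules without touching code, which matches the "simple configuration of a complex tool" goal stated earlier.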
  76. 10- Conclusion

    • What KairosDB can do for us
    • Our contributions
    • And then?
  77. What KairosDB can do for us

  78. KairosDB Features

    • System operational and robust
    • On-the-fly statistics generation
    • Batch generation (rollups) expected soon, planned in KairosDB
    • System is:
      • Fast
      • Scalable (1 to N nodes)
      • Fault tolerant (1 to N replicas)
      • Easy to back up (e.g. Cassandra snapshot files)
      • Modular and evolvable
  79. Leads to a simple system

    • Thanks to KairosDB our system is simple, robust and versatile
    • Thanks to this system we could build efficient solutions for generic and bespoke features
  80. Contributions

  81. Existing integrated toolsuite

    Because its model is simple, with a common web services API, we could easily integrate it with:
    • A reporting tool (using BIRT)
    • A real-time dashboard (using Grafana)
    • A scientific computing environment (R)
  82. Reporting with BIRT

    Using the BIRT reporting tool
  83. Grafana Real-time Dashboard – KairosDB Plugin

  84. R interface

  85. R interface

    Library interfacing with the R statistical environment

    # Load the library
    library("kairosdb")

    # Create the metric queries
    metric1 = KairosMetric('kairosdb.jvm.free_memory',
                           aggregators = aggregator.avg(1, TimeUnit.HOURS, alignSampling = TRUE),
                           tagGroupBy = "host")
    metric2 = KairosMetric('kairosdb.jvm.max_memory',
                           aggregators = aggregator.avg(1, TimeUnit.HOURS, alignSampling = TRUE),
                           tagGroupBy = "host")

    # Query & prepare results
    query = KairosMetricQuery(list(metric1, metric2), '05/19/2015')
    response = executeQuery(query, 'http://localhost:8081/api/v1/datapoints/query')
    series = getSeriesByTag(response, 'host')
    timestampsAsDate = convertTimestampsToDate(series[,'timestamp'])

    # Plot results
    plot(timestampsAsDate, series[,'value'])
  87. Keep moving

    • KairosDB will implement rollups (automatic pre-aggregation of data)
    • We keep moving forward on time series
  88. Thank You !

  89. Any Questions?