PTSM - #1 - Warp10 - Advanced Time Series Technology & Use Cases

Introduction to Warp10 and advanced time series use cases, by Mathias Herberts (CTO and co-founder of SenX, the publisher of Warp10)



September 25, 2019


  1. Advanced Time Series Technology & Use Cases

    Mathias Herberts - CTO
    Mathias.Herberts@senx.io @herberts
  2. Introduction

  3. Time Series are universal and ubiquitous

    ▪ Time Series are all about capturing change, not simply state
    ▪ Time Series help understand the past and predict the future
    ▪ Time Series are the bridges between the physical world and its digital twin
    ▪ Time Series are the memory of the universe we live in
    ▪ Time Series are eating the world
  4. What are Time Series?

    ▪ Time Series are sequences of values indexed by time
    ▪ Time is an illusion, any sequence can be seen as a Time Series
  5. Where can Time Series be found?

    ▪ Time Series are present in many if not all verticals
  6. Why do Time Series require specific tools?

    ▪ Time Series data are different by nature
    ▪ Their production rate is massive and continuous
    ▪ The historical datasets that need to be retained are gigantic
    ▪ The access pattern to Time Series data is unique
    ▪ The type of analysis performed on Time Series data is uncommon
    ▪ Traditional tools MUST be adapted if they are to be used
  7. for Machine Data Storage Analytics Visualization

  8. Data Model

  9. A universal data model

  10. Geo Time Series™ data containers

  11. Architecture

  12. Warp 10™ standalone version

    ▪ Single jar, no external dependencies
    ▪ In-memory or disk-based persistence (HDD / SSD)
  13. Standalone with datalog replication

    (Diagram: three Standalone Warp 10™ instances)
  14. Standalone with datalog sharding

    (Diagram: three Standalone Warp 10™ instances)
  15. Warp 10™ distributed version

    (Diagram components: metadata index, WarpScript™ analytics engine, ingestion endpoint, persistence daemon)
  16. Storage

  17. A high performance Geo TSDB

    ▪ Simple interaction via HTTP and text format for easy integration
    ▪ Ability to ingest and fetch very long streams of data points
    ▪ Support for WebSocket input and output
    ▪ Fine grained access control via cryptographic tokens
    ▪ Proven scalability with no cardinality problems
    ▪ Support for Univariate and Multivariate data points
    ▪ Distributed throttling mechanisms for number of series and data points rate
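To illustrate the "HTTP and text format" point, here is a minimal Python sketch that builds (but does not send) an ingestion request. The `/api/v0/update` path, the `X-Warp10-Token` header, the host, the token and the sample series are assumptions for illustration and do not come from the slides; check them against the Warp 10 documentation before use.

```python
import urllib.request


def build_update_request(base_url, write_token, lines):
    """Build an HTTP POST carrying one data point per text line.

    Assumptions (not from the slides): the ingestion path is
    /api/v0/update and the write token travels in X-Warp10-Token.
    """
    body = ("\n".join(lines) + "\n").encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/update",
        data=body,
        method="POST",
        headers={
            "X-Warp10-Token": write_token,
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Hypothetical host, token and series, for illustration only.
request = build_update_request(
    "http://localhost:8080",
    "WRITE_TOKEN",
    ["1569400000000000// temperature{room=kitchen} 21.5"],
)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read and run without a server.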
  18. Anatomy of storage engine input

    TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE

    ▪ Support for time precisions from ns to ms
    ▪ Class and labels support UTF-8 in both names and values
    ▪ Support for 5 types: LONG, DOUBLE, BOOLEAN, STRING, BINARY
      64 -Infinity NaN 4E-05 F 'foo' b64:UmVmbHV4Cg==
    ▪ Support for nested Multivariate values - each MV is a GTS (Geo Time Series™)
      [ 2/42 64/48.0:-4.5/'hello' 128/[ 1 2 3 ] 256/hex:12345 ]
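The input line layout above can be made concrete with a small Python sketch of a parser. This is a simplified illustration, not the Warp 10 parser: it assumes labels are written as comma-separated `key=value` pairs, that booleans are written `T` or `F`, and it ignores percent-encoding and nested multivariate values. The names `GTSPoint`, `parse_value` and `parse_line` are hypothetical.

```python
import base64
import re
from typing import Dict, NamedTuple, Optional, Union

Value = Union[int, float, bool, str, bytes]


class GTSPoint(NamedTuple):
    ts: Optional[int]
    lat: Optional[float]
    lon: Optional[float]
    elev: Optional[int]
    cls: str
    labels: Dict[str, str]
    value: Value


# TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE
LINE_RE = re.compile(
    r"^(?P<ts>-?\d+)?/(?:(?P<lat>[^:/]+):(?P<lon>[^/]+))?/(?P<elev>-?\d+)?\s+"
    r"(?P<cls>[^{\s]+)\{(?P<labels>[^}]*)\}\s+(?P<value>.+)$"
)


def parse_value(text: str) -> Value:
    """Map the textual value onto one of the five types listed on the slide."""
    if text in ("T", "F"):                      # BOOLEAN (assumed spelling)
        return text == "T"
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1]                       # STRING
    if text.startswith("b64:"):
        return base64.b64decode(text[4:])       # BINARY
    try:
        return int(text)                        # LONG
    except ValueError:
        return float(text)                      # DOUBLE, incl. NaN and -Infinity


def parse_line(line: str) -> GTSPoint:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid input line: {line!r}")
    labels = dict(kv.split("=", 1) for kv in m["labels"].split(",") if kv)
    return GTSPoint(
        ts=int(m["ts"]) if m["ts"] else None,
        lat=float(m["lat"]) if m["lat"] else None,
        lon=float(m["lon"]) if m["lon"] else None,
        elev=int(m["elev"]) if m["elev"] else None,
        cls=m["cls"],
        labels=labels,
        value=parse_value(m["value"]),
    )
```

For example, `parse_line("1569400000000000/48.0:-4.5/10 temp{room=kitchen} 21.5")` yields a point with a timestamp, a location, an elevation, one label and a DOUBLE value; the position and elevation fields may be left empty, as in `1569400000000000// flag{} T`.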
  19. Real world scalability and performance figures

    ▪ Known deployments of over 500M series
    ▪ Ingestion performance of 120M data points per second on a single in-memory instance
    ▪ Historical datasets of several hundred trillion data points
    ▪ Sustained ingestion of several million data points per second per ingress
    ▪ Ingestion of over 300k data points per second on a single thread on an RPi 4
    ▪ Random deletions at several million data points per second
  20. Analytics

  21. Built around a data processing language

  22. Full featured language dedicated to Time Series

    ▪ Fully functional concatenative language
    ▪ Turing complete with loops, conditionals, asynchronous transfer of control
    ▪ Supports Geo Time Series as first class citizens
    ▪ Over 980 functions available - from summary statistics to signal processing
    ▪ 6 frameworks - BUCKETIZE, MAP, REDUCE, FILL, APPLY, FILTER
    ▪ Fully extensible and embeddable
    ▪ Ability to call external programs
  23. Web IDE and Visual Studio Code Plugin

  24. Powerful expressiveness

    [ 'TOKEN' 'class' {} NOW 24 h ] FETCH 'gts' STORE // Fetch last 24 hours
    [ $gts bucketizer.mean NOW 1 m 0 ] BUCKETIZE 'mean' STORE // Mean every 1 minute
    [ $gts mapper.rate 1 0 0 ] MAP 'rate' STORE // Compute rate of change

    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
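As a rough illustration of what the BUCKETIZE step with `bucketizer.mean` on this slide computes, here is a Python sketch of time bucketing with a mean. It is an analogy, not the Warp 10 implementation: it assumes each bucket is labeled by its end tick and covers the half-open interval (end - span, end], and the function name `bucketize_mean` is hypothetical.

```python
from collections import defaultdict


def bucketize_mean(points, last_bucket, span):
    """Average (tick, value) pairs into fixed-width time buckets.

    Assumption: a bucket ending at `end` holds ticks in (end - span, end];
    ticks later than `last_bucket` are ignored.
    Returns a list of (bucket_end, mean) sorted by bucket_end.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket_end -> [sum, count]
    for tick, value in points:
        if tick > last_bucket:
            continue
        whole_spans_back = (last_bucket - tick) // span
        bucket_end = last_bucket - whole_spans_back * span
        totals[bucket_end][0] += value
        totals[bucket_end][1] += 1
    return [(end, s / n) for end, (s, n) in sorted(totals.items())]


# Four points averaged into two buckets of width 2 ending at ticks 2 and 4.
means = bucketize_mean([(1, 1.0), (2, 3.0), (3, 5.0), (4, 7.0)], last_bucket=4, span=2)
```

Aligning irregular measurements on a regular time grid this way is what makes later steps, such as computing a rate between consecutive points, straightforward.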
  25. Complex algorithms available as simple functions

    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
    DUP 100 LTTB
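To show how much logic the single LTTB call on this slide stands for, here is a Python sketch of Largest-Triangle-Three-Buckets downsampling applied to a 1000-point random walk reduced to 100 points. It is a plain rendering of the published algorithm, not Warp 10's code; the helper names are hypothetical, and the walk uses increasing ticks starting at 0 rather than counting back from NOW as the slide does.

```python
import random


def random_walk(n=1000, step=10 * 60 * 1_000_000, seed=42):
    """n points spaced `step` time units apart, each moving by at most 0.5."""
    rng = random.Random(seed)
    v, points = 0.0, []
    for i in range(n):
        v += rng.random() - 0.5
        points.append((i * step, v))
    return points


def lttb(points, threshold):
    """Downsample (x, y) points to `threshold` points, keeping first and last."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    every = (n - 2) / (threshold - 2)   # width of each inner bucket
    sampled = [points[0]]
    a = 0                               # index of the last selected point
    for i in range(threshold - 2):
        # Average of the next bucket acts as the third triangle vertex.
        nxt_start = int((i + 1) * every) + 1
        nxt_end = min(int((i + 2) * every) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)
        # Pick the point of the current bucket forming the largest triangle.
        start, end = int(i * every) + 1, int((i + 1) * every) + 1
        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            bx, by = points[j]
            area = abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay)) * 0.5
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best
    sampled.append(points[-1])
    return sampled


walk = random_walk()
downsampled = lttb(walk, 100)
```

The downsampled series keeps the first and last points and one visually significant point per bucket, which is why the technique is popular for plotting long series.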
  26. Hiding WarpScript complexity in macros

    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
    'UTC' @senx/cal/byday
  27. None
  28. Complete documentation online at warp10.io

  29. Visualization

  30. Flexible visualization options

  31. Full support for Processing in WarpScript

    800 'width' STORE
    800 'height' STORE
    400.0 'maxspeed' STORE
    40000.0 'maxalt' STORE
    3.0 2.0 2.0 @orbit/heatmap/kernel/triangular 'kernel' STORE
    @orbit/heatmap/palette/classic 'palette' STORE
    'TOKEN' 'token' STORE
    $width $height '2D' PGraphics
    'MULTIPLY' PblendMode
    'CENTER' PimageMode
    [ $token '~(ALT|CAS)' {} NOW -2000000 ] FETCH
    DUP 0 GET LASTTICK 'now' STORE
    [ SWAP bucketizer.last $now STU 0 ] BUCKETIZE
    // Create heatmap
    <%
      7 GET LIST-> DROP 'CAS' STORE 'ALT' STORE
      <% $CAS ISNULL NOT $ALT ISNULL NOT && %>
      <% $kernel $CAS $maxspeed / $width * $ALT $maxalt / 1.0 SWAP - $height * Pimage %>
      IFT
      0 NaN NaN NaN NULL
    %> MACROREDUCER 'GRAPHER' STORE
    [ SWAP [] $GRAPHER ] REDUCE DROP
    // Colorize
    Ppixels <% DROP Palpha $palette SWAP GET %> LMAP PupdatePixels
    Pencode Pdecode
    $width $height '2D' PGraphics
    // Do the grid
    PnoFill
    0 0 $width 1 - $height 1 - Prect
    2.0 PstrokeWeight
    200.0 Pcolor Pstroke
    250.0 $maxspeed / $width * DUP 0 SWAP $height Pline
    0 10000 $maxalt / 1.0 SWAP - $height * DUP $width SWAP Pline
    SWAP 0 0 Pimage
    Pencode
  32. Extensibility

  33. Macros

    Factoring WarpScript code to separate responsibilities and encourage reuse

    <%
      // This is a macro body
    %>

    ▪ Macros can be deployed on the server side
    ▪ Macros can be packaged in a jar
    ▪ Macros can access some config elements (MACROCONFIG)
    ▪ Macros can be deployed on a remote server
  34. WarpFleet™ Resolver

    Enable hosting of macros on remote servers

    ▪ Macros can be hosted on any HTTP server including GitHub
    ▪ Resolution is performed at runtime
    ▪ Support for multiple macro repositories
    ▪ Script execution can modify repositories
    ▪ WarpFleet™ resolver can be disabled altogether
    ▪ Support for versioning via the IMPORT function
    ▪ SenX provides a growing set of macros via its own repo
    ▪ Warp 10 does intelligent caching of fetched macros
    ▪ Support for runtime injection of elements (MACROCONFIG)
  35. Extensions

    Add, remove or modify WarpScript functions

    ▪ Write new functions in Java (JVM), Go, Rust, C++, C (JNA)
    ▪ Simple API to interact with the WarpScript execution runtime
    ▪ Freedom of licensing for extensions
    ▪ Growing list of existing extensions, contributions welcome!

    Barcode, GeoTransforms, Grok, InfluxDB, JDBC, PCap, PMML, Polyglot, Redis, S3, Swift, TensorFlow, EGADS, Elastic, GCode, H2O, Keras, memcached, Parquet, ORC, Neo4J, OpenTSDB, LAS, Pig, Spark

    Some commercial ones by SenX: LevelDB, MapMatching, Forecasting, WarpScript Compiler
  36. Plugins

    Extend Warp 10 by adding new features

    ▪ Plugins are run in the Warp 10 process
    ▪ Plugins can be written in Java (JVM) or in Go, Rust, C, C++ (JNA)
    ▪ Very diverse things can be done using plugins
    ▪ Authentication plugins add new types of credentials
    ▪ No license constraints

    Kafka, MQTT, WarpStudio, Zeppelin, HTTP, UDP, TCP, Py4J, InfluxDB Line Protocol

    OVH is considering open sourcing plugins to support the PromQL, Graphite, OpenTSDB and InfluxQL query languages. Poke them to make it happen!
  37. WarpFleet™

    Community site for finding extensions, macro packages and plugins

    ▪ CLI tool on NPM - npm install -g @senx/warpfleet
    ▪ Modules are hosted on Maven repositories
    ▪ Benefit from dependency resolution mechanisms
    ▪ Modules can be fetched by Spark for example
    ▪ Again, contributions more than welcome!
  38. Integrations

  39. Augment existing tools and frameworks

  40. Use Cases

  41. Flight data analysis for fleet reliability

    (Figure: Pressure Altitude vs TAS)

  42. Weather data

    1,000,000 cells
    400 parameters
    208 time steps
    86 B data points every 6 hours in 400 M series

    Using rank 2 tensors as multivariate values, Warp 10 can store all of GFS in just 1,000,000 Geo Time Series
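The orders of magnitude on this slide can be checked with a few lines of Python. Using the rounded inputs as printed gives about 83 B data points per run, slightly below the 86 B quoted, which suggests the slide's cell, parameter or step counts are themselves rounded (an assumption, since the exact inputs are not given). Reading each multivariate value as one parameters-by-time-steps tensor per cell is also an interpretation of the slide.

```python
cells = 1_000_000      # grid cells, as printed on the slide
parameters = 400       # weather parameters per cell
time_steps = 208       # forecast time steps per run

# One series per (cell, parameter) pair in the univariate layout.
univariate_series = cells * parameters

# Every series receives one point per time step in each run.
points_per_run = univariate_series * time_steps

# Assumed multivariate layout: one series per cell, each value being a
# rank 2 tensor of shape (parameters, time_steps).
multivariate_series = cells
elements_per_tensor = parameters * time_steps

print(f"{univariate_series:,} univariate series")    # 400,000,000
print(f"{points_per_run:,} data points per run")     # 83,200,000,000
print(f"{multivariate_series:,} multivariate series of {elements_per_tensor:,} elements")
```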
  43. Helping racing sailboats fly

    Automatic phase extraction by TWA analysis

  44. chemtrails-locator.com

    200,000 aircraft
    15 B positions
    Spatio-temporal indexing with 150 km / 5 minute cells
    Served entirely by Warp 10
  45. sandbox.senx.io

  46. @SenXHQ - @Warp10io - @WarpScript senx.io - warp10.io