PTSM - #1 - Warp10 - Advanced Time Series Technology & Use Cases

TimeSeriesFr
September 25, 2019

Introduction to Warp10 and advanced time series use cases, by Mathias Herberts (CTO and co-founder of SenX, the publisher of Warp10)

Transcript

  1. Time Series are universal and ubiquitous
    ▪ Time Series are all about capturing change, not simply state
    ▪ Time Series help understand the past and predict the future
    ▪ Time Series are the bridges between the physical world and its digital twin
    ▪ Time Series are the memory of the universe we live in
    ▪ Time Series are eating the world
  2. What are Time Series?
    ▪ Time Series are sequences of values indexed by time
    ▪ Time is an illusion, any sequence can be seen as a Time Series
  3. Where can Time Series be found?
    ▪ Time Series are present in many if not all verticals
  4. Why do Time Series require specific tools?
    ▪ Time Series data are different by nature
    ▪ Their production rate is massive and continuous
    ▪ The historical datasets that need to be retained are gigantic
    ▪ The access pattern to Time Series data is unique
    ▪ The type of analysis performed on Time Series data is uncommon
    ▪ Traditional tools MUST be adapted if they are to be used
  5. A high-performance Geo TSDB
    ▪ Simple interaction via HTTP and text format for easy integration
    ▪ Ability to ingest and fetch very long streams of data points
    ▪ Support for WebSocket input and output
    ▪ Fine-grained access control via cryptographic tokens
    ▪ Proven scalability with no cardinality problems
    ▪ Support for Univariate and Multivariate data points
    ▪ Distributed throttling mechanisms for the number of series and the data point rate
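To make the "HTTP and text format" bullet concrete, here is a minimal Python sketch that builds, without sending, an ingestion request. It assumes the standard Warp 10 ingress endpoint `/api/v0/update` and the `X-Warp10-Token` header; the base URL, the token placeholder, the series name and the helper `build_update_request` are hypothetical and not taken from the slides.

```python
import urllib.request
from typing import Sequence


def build_update_request(base_url: str, write_token: str,
                         lines: Sequence[str]) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying data points as plain text."""
    # One data point per line, in the text input format.
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/update",  # assumed ingress endpoint
        data=body,
        headers={"X-Warp10-Token": write_token, "Content-Type": "text/plain"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8080",  # hypothetical local instance
    "WRITE_TOKEN",            # placeholder, not a real token
    ["1569369600000000// sensor.temperature{room=lab} 21.5"],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send the data; that step is left out so the sketch runs without a server.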
  6. Anatomy of storage engine input
    TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE
    ▪ Support for time precisions from ns to ms
    ▪ Class and labels support UTF-8 in both names and values
    ▪ Support for 5 types: LONG, DOUBLE, BOOLEAN, STRING, BINARY
      64 -Infinity NaN 4E-05 F 'foo' b64:UmVmbHV4Cg==
    ▪ Support for nested Multivariate values - each MV is a GTS (Geo Time Series™)
      [ 2/42 64/48.0:-4.5/'hello' 128/[ 1 2 3 ] 256/hex:12345 ]
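The input line layout on this slide can be sketched as a small Python formatter. This is an illustration under assumptions rather than a reference encoder: the helper names `format_value` and `format_line` are hypothetical, percent-encoding of names and strings is assumed, and the `b64:` prefix is assumed to take standard Base64.

```python
import base64
import math
from urllib.parse import quote


def format_value(value) -> str:
    """Render one value in the text input format (sketch, five basic types)."""
    if isinstance(value, bool):  # must be tested before int: bool is an int subclass
        return "T" if value else "F"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
        return repr(value)
    if isinstance(value, bytes):
        return "b64:" + base64.b64encode(value).decode("ascii")
    return "'" + quote(str(value), safe="") + "'"  # STRING, single-quoted


def format_line(ts, name, labels, value, lat=None, lon=None, elev=None) -> str:
    """Build TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE."""
    geo = f"{lat}:{lon}" if lat is not None and lon is not None else ""
    elevation = "" if elev is None else str(elev)
    label_text = ",".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(labels.items())
    )
    return f"{ts}/{geo}/{elevation} {quote(name, safe='')}{{{label_text}}} {format_value(value)}"


print(format_line(1569369600000000, "sensor.temperature", {"room": "lab"}, 21.5))
print(format_line(0, "pos", {}, "hello", lat=48.0, lon=-4.5))
```

Location and elevation are optional, which is why a point without them keeps the two slashes with nothing between them.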
  7. Real-world scalability and performance figures
    ▪ Known deployments of over 500M series
    ▪ Ingestion performance of 120M data points per second on a single in-memory instance
    ▪ Historical datasets of several hundred trillion data points
    ▪ Sustained ingestion of several million data points per second per ingress
    ▪ Ingestion of over 300k data points per second on a single thread on a RPi 4
    ▪ Random deletions at several million data points per second
  8. Full-featured language dedicated to Time Series
    ▪ Fully functional concatenative language
    ▪ Turing complete with loops, conditionals, asynchronous transfer of control
    ▪ Supports Geo Time Series as first class citizens
    ▪ Over 980 functions available - from summary statistics to signal processing
    ▪ 6 frameworks - BUCKETIZE, MAP, REDUCE, FILL, APPLY, FILTER
    ▪ Fully extensible and embeddable
    ▪ Ability to call external programs
  9. Powerful expressiveness
    [ 'TOKEN' 'class' {} NOW 24 h ] FETCH 'gts' STORE // Fetch last 24 hours
    [ $gts bucketizer.mean NOW 1 m 0 ] BUCKETIZE 'mean' STORE // Mean over 1-minute buckets
    [ $gts mapper.rate 1 0 0 ] MAP 'rate' STORE // Compute rate of change

    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
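For readers unfamiliar with the BUCKETIZE and MAP frameworks, the two operations on this slide (a mean per bucket, then a rate of change) can be approximated in plain Python. This is a sketch of the idea only, not Warp 10's implementation: the function names are hypothetical, a bucket is assumed to be identified by its end tick, and the rate is expressed per tick unit, which may differ from the scaling `mapper.rate` applies.

```python
from collections import defaultdict


def bucketize_mean(points, last_bucket, span):
    """Average values per bucket; a bucket ending at tick e covers (e - span, e]."""
    buckets = defaultdict(list)
    for tick, value in points:
        if tick > last_bucket:
            continue  # newer than the last bucket: ignored in this sketch
        k = (last_bucket - tick) // span  # how many buckets back from the last one
        buckets[last_bucket - k * span].append(value)
    return [(end, sum(vals) / len(vals)) for end, vals in sorted(buckets.items())]


def rate_of_change(points):
    """Difference between consecutive values divided by the elapsed ticks."""
    ordered = sorted(points)
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(ordered, ordered[1:])
    ]


# Hypothetical data: (tick in seconds, value)
raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 9.0), (120, 11.0)]
means = bucketize_mean(raw, last_bucket=120, span=60)
rates = rate_of_change(means)
print(means)
print(rates)
```

Bucketizing first gives the series regular ticks, which is what makes a point-to-point rate meaningful afterwards.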
  10. Complex algorithms available as simple functions
    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
    DUP 100 LTTB
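The `LTTB` call on this slide downsamples the 1,000-point random walk to 100 points. The sketch below is a plain Python version of the published Largest-Triangle-Three-Buckets algorithm, shown to illustrate what such a function hides; Warp 10's own implementation may differ in details, and the random walk here only mimics the slide's (seed, step and names are assumptions).

```python
import random


def lttb(points, threshold):
    """Downsample (x, y) points to `threshold` points, keeping the visual shape."""
    n = len(points)
    if threshold < 3:
        raise ValueError("threshold must be at least 3")
    if threshold >= n:
        return list(points)
    every = (n - 2) / (threshold - 2)  # bucket width, first and last points excluded
    sampled = [points[0]]
    a = 0  # index of the most recently selected point
    for i in range(threshold - 2):
        # Average of the next bucket, used as the third triangle vertex.
        avg_start = int((i + 1) * every) + 1
        avg_end = min(int((i + 2) * every) + 1, n)
        count = avg_end - avg_start
        avg_x = sum(p[0] for p in points[avg_start:avg_end]) / count
        avg_y = sum(p[1] for p in points[avg_start:avg_end]) / count
        # Pick the point of the current bucket forming the largest triangle.
        start = int(i * every) + 1
        end = int((i + 1) * every) + 1
        ax, ay = points[a]
        best_area, best_idx = -1.0, start
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) * 0.5
            if area > best_area:
                best_area, best_idx = area, j
        sampled.append(points[best_idx])
        a = best_idx
    sampled.append(points[-1])
    return sampled


# Random walk: 1000 points, one every 10 minutes (ticks in seconds here).
rng = random.Random(42)
value, walk = 0.0, []
for step in range(1000):
    value += rng.random() - 0.5
    walk.append((step * 600, value))

reduced = lttb(walk, 100)
print(len(walk), "->", len(reduced))
```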
  11. Hiding WarpScript complexity in macros
    NEWGTS 'randomwalk' RENAME
    0.0 'v' STORE
    42 PRNG
    1 1000
    <% 10 m * NOW SWAP - NaN DUP DUP $v SRAND 0.5 - + 'v' STORE $v ADDVALUE %>
    FOR
    'UTC' @senx/cal/byday
  12. Full support for Processing in WarpScript
    800 'width' STORE
    800 'height' STORE
    400.0 'maxspeed' STORE
    40000.0 'maxalt' STORE
    3.0 2.0 2.0 @orbit/heatmap/kernel/triangular 'kernel' STORE
    @orbit/heatmap/palette/classic 'palette' STORE
    'TOKEN' 'token' STORE
    $width $height '2D' PGraphics
    'MULTIPLY' PblendMode
    'CENTER' PimageMode
    [ $token '~(ALT|CAS)' {} NOW -2000000 ] FETCH
    DUP 0 GET LASTTICK 'now' STORE
    [ SWAP bucketizer.last $now STU 0 ] BUCKETIZE
    // Create heatmap
    <%
      7 GET LIST-> DROP 'CAS' STORE 'ALT' STORE
      <% $CAS ISNULL NOT $ALT ISNULL NOT && %>
      <% $kernel $CAS $maxspeed / $width * $ALT $maxalt / 1.0 SWAP - $height * Pimage %>
      IFT
      0 NaN NaN NaN NULL
    %> MACROREDUCER 'GRAPHER' STORE
    [ SWAP [] $GRAPHER ] REDUCE DROP
    // Colorize
    Ppixels <% DROP Palpha $palette SWAP GET %> LMAP PupdatePixels
    Pencode Pdecode
    $width $height '2D' PGraphics
    // Do the grid
    PnoFill 0 0 $width 1 - $height 1 - Prect
    2.0 PstrokeWeight 200.0 Pcolor Pstroke
    250.0 $maxspeed / $width * DUP 0 SWAP $height Pline
    0 10000 $maxalt / 1.0 SWAP - $height * DUP $width SWAP Pline
    SWAP 0 0 Pimage
    Pencode
  13. Macros
    Factorizing WarpScript code to separate responsibilities and encourage reusability
    <% // This is a macro body %>
    ▪ Macros can be deployed on the server side
    ▪ Macros can be packaged in a jar
    ▪ Macros can access some config elements (MACROCONFIG)
    ▪ Macros can be deployed on a remote server
  14. WarpFleet™ Resolver
    Enable hosting of macros on remote servers
    ▪ Macros can be hosted on any HTTP server including GitHub
    ▪ Resolution is performed at runtime
    ▪ Support for multiple macro repositories
    ▪ Script execution can modify repositories
    ▪ WarpFleet™ resolver can be disabled altogether
    ▪ Support for versioning via the IMPORT function
    ▪ SenX provides a growing set of macros via its own repo
    ▪ Warp 10 does intelligent caching of fetched macros
    ▪ Support for runtime injection of elements (MACROCONFIG)
  15. Extensions
    Add, remove or modify WarpScript functions
    ▪ Write new functions in Java (JVM), Go, Rust, C++, C (JNA)
    ▪ Simple API to interact with the WarpScript execution runtime
    ▪ Freedom of licensing for extensions
    ▪ Growing list of existing extensions, contributions welcome!
      Barcode, GeoTransforms, Grok, InfluxDB, JDBC, PCap, PMML, Polyglot, Redis, S3, Swift, TensorFlow, EGADS, Elastic, GCode, H2O, Keras, memcached, Parquet, ORC, Neo4J, OpenTSDB, LAS, Pig, Spark
      Some commercial ones by SenX: LevelDB, MapMatching, Forecasting, WarpScript Compiler
  16. Plugins
    Extend Warp 10 by adding new features
    ▪ Plugins are run in the Warp 10 process
    ▪ Plugins can be written in Java (JVM) or in Go, Rust, C, C++ (JNA)
    ▪ Very diverse things can be done using plugins
    ▪ Authentication plugins add new types of credentials
    ▪ No license constraints
      Kafka, MQTT, WarpStudio, Zeppelin, HTTP, UDP, TCP, Py4J, InfluxDB Line Protocol
      OVH is considering open sourcing plugins to support the PromQL, Graphite, OpenTSDB and InfluxQL query languages. Poke them to make it happen!
  17. WarpFleet™
    Community site for finding extensions, macro packages and plugins
    ▪ CLI tool on NPM - npm install -g @senx/warpfleet
    ▪ Modules are hosted on Maven repositories
    ▪ Benefit from dependency resolution mechanisms
    ▪ Modules can be fetched by Spark for example
    ▪ Again, contributions more than welcome!
  18. Weather data
    1,000,000 cells
    400 parameters
    208 time steps
    86 B data points every 6 hours in 400 M series
    Using rank-2 tensors as multivariate values, Warp 10 can store all of GFS in just 1,000,000 Geo Time Series
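The figures on the weather slide can be checked with a few lines of Python arithmetic. This is a sketch using the slide's rounded inputs; reading the rank-2 tensor as one (parameter × time step) matrix per cell is an interpretation, not something the slide states, and the variable names are illustrative.

```python
cells = 1_000_000   # grid cells (rounded figure from the slide)
parameters = 400    # weather parameters per cell
time_steps = 208    # forecast time steps per model run

# One univariate series per (cell, parameter) pair.
univariate_series = cells * parameters

# One data point per (cell, parameter, time step), produced every 6 hours.
points_per_run = univariate_series * time_steps

# Assumed layout: one series per cell, each value a rank-2 tensor
# of shape (parameters, time_steps).
multivariate_series = cells
elements_per_tensor = parameters * time_steps

# Cell count implied by the slide's "86 B" figure.
implied_cells = 86e9 / (parameters * time_steps)

print(f"{univariate_series:,} univariate series")
print(f"{points_per_run:,} data points per run")
print(f"{multivariate_series:,} multivariate series of {elements_per_tensor:,} elements")
print(f"{implied_cells:,.0f} cells implied by 86 B points")
```

The rounded inputs give 83.2 B points per run, slightly below the slide's 86 B, which suggests the actual grid has a little over one million cells.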