PTSM - #1 - Warp10 - Advanced Time Series Technology & Use Cases

TimeSeriesFr
September 25, 2019

Introduction to Warp 10 and advanced time series use cases, by Mathias Herberts (CTO and co-founder of SenX, the publisher of Warp 10)

Transcript

  1. Advanced Time Series
    Technology & Use Cases
    Mathias Herberts - CTO
    [email protected]
    @herberts

  2. Introduction

  3. Time Series are universal and ubiquitous
    ■ Time Series are all about capturing change, not simply state
    ■ Time Series help understand the past and predict the future
    ■ Time Series are the bridges between the physical world and its digital twin
    ■ Time Series are the memory of the universe we live in
    ■ Time Series are eating the world

  4. What are Time Series?
    ■ Time Series are sequences of values indexed by time
    ■ Time is an illusion, any sequence can be seen as a Time Series

  5. Where can Time Series be found?
    ■ Time Series are present in many if not all verticals

  6. Why do Time Series require specific tools?
    ■ Time Series data are different by nature
    ■ Their production rate is massive and continuous
    ■ The historical datasets that need to be retained are gigantic
    ■ The access pattern to Time Series data is unique
    ■ The type of analysis performed on Time Series data is uncommon
    ■ Traditional tools MUST be adapted if they are to be used

  7. for Machine Data
    Storage, Analytics, Visualization

  8. Data Model

  9. A universal data model

  10. Geo Time Series™ data containers

  11. Architecture

  12. Warp 10™ standalone version
    Single jar, no external dependencies
    in-memory
    disk-based persistence
    HDD / SSD

  13. Standalone with datalog replication
    [Diagram: three Standalone Warp 10™ instances]

  14. Standalone with datalog sharding
    [Diagram: three Standalone Warp 10™ instances]

  15. Warp 10™ distributed version
    Metadata index
    WarpScript™ analytics engine
    Ingestion endpoint
    Persistence daemon

  16. Storage

  17. A high performance Geo TSDB
    ■ Simple interaction via HTTP and text format for easy integration
    ■ Ability to ingest and fetch very long streams of data points
    ■ Support for WebSocket input and output
    ■ Fine-grained access control via cryptographic tokens
    ■ Proven scalability with no cardinality problems
    ■ Support for Univariate and Multivariate data points
    ■ Distributed throttling mechanisms for the number of series and the data point rate

  18. Anatomy of storage engine input
    TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE
    ■ Support for time precisions from ns to ms
    ■ Class and labels support UTF-8 in both names and values
    ■ Support for 5 types LONG, DOUBLE, BOOLEAN, STRING, BINARY
    64 -Infinity NaN 4E-05 F 'foo' b64:UmVmbHV4Cg==
    ■ Support for nested Multivariate values - each MV is a GTS (Geo Time Series™)
    [ 2/42 64/48.0:-4.5/'hello' 128/[ 1 2 3 ] 256/hex:12345 ]
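
The line layout on this slide can be illustrated with a small parser. This is a minimal sketch in Python, not Warp 10's own parser: the names `parse_line`, `parse_value` and `DataPoint` are hypothetical, `T` is assumed to be the counterpart of the `F` boolean shown above, and binary (`b64:`, `hex:`) and multivariate values are left as raw text rather than decoded.

```python
import re
from typing import Dict, NamedTuple, Optional, Union

# Layout from the slide: TIMESTAMP/LATITUDE:LONGITUDE/ELEVATION CLASS{LABELS} VALUE
LINE_RE = re.compile(
    r"^(?P<ts>-?\d*)/(?P<geo>[^/]*)/(?P<elev>-?\d*)\s+"
    r"(?P<cls>[^{\s]+)\{(?P<labels>[^}]*)\}\s+(?P<value>.+)$"
)

Scalar = Union[int, float, bool, str]


class DataPoint(NamedTuple):
    timestamp: Optional[int]
    latitude: Optional[float]
    longitude: Optional[float]
    elevation: Optional[int]
    class_name: str
    labels: Dict[str, str]
    value: Scalar


def parse_value(text: str) -> Scalar:
    """Map the textual value to a Python scalar (sketch, scalars only)."""
    if text == "T":  # assumed counterpart of F
        return True
    if text == "F":
        return False
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return text[1:-1]  # quoted STRING
    try:
        return int(text)  # LONG
    except ValueError:
        pass
    try:
        return float(text)  # DOUBLE, including NaN and -Infinity
    except ValueError:
        return text  # b64:, hex: or multivariate payload, kept as-is


def parse_line(line: str) -> DataPoint:
    """Split one input line into its timestamp, position, class, labels and value."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized input line: " + line)
    lat = lon = None
    if match["geo"]:
        lat_text, lon_text = match["geo"].split(":")
        lat, lon = float(lat_text), float(lon_text)
    labels = {}
    for pair in filter(None, match["labels"].split(",")):
        name, _, label_value = pair.partition("=")
        labels[name] = label_value
    return DataPoint(
        int(match["ts"]) if match["ts"] else None,
        lat,
        lon,
        int(match["elev"]) if match["elev"] else None,
        match["cls"],
        labels,
        parse_value(match["value"]),
    )
```

For instance, `parse_line("1568851200000000/48.0:-4.5/10 temp{room=lab} 21.5")` yields a `DataPoint` whose position, labels and value are all populated, while a line such as `1568851200000000// temp{} F` leaves latitude, longitude and elevation unset.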

  19. Real world scalability and performance figures
    ■ Known deployments of over 500M series
    ■ Ingestion performance of 120M data points per second on a single in-memory
    ■ Historical datasets of several hundreds of trillions of data points
    ■ Sustained ingestion of several million data points per second per ingress
    ■ Ingestion of over 300k data points per second on a single thread on a RPi 4
    ■ Random deletions at several million data points per second

  20. Analytics

  21. Built around a data processing language

  22. Full featured language dedicated to Time Series
    ■ Fully functional concatenative language
    ■ Turing complete with loops, conditionals, asynchronous transfer of control
    ■ Supports Geo Time Series as first-class citizens
    ■ Over 980 functions available - from summary statistics to signal processing
    ■ 6 frameworks - BUCKETIZE, MAP, REDUCE, FILL, APPLY, FILTER
    ■ Fully extensible and embeddable
    ■ Ability to call external programs

  23. Web IDE and Visual Studio Code Plugin

  24. Powerful expressiveness
    [ 'TOKEN' 'class' {} NOW 24 h ] FETCH 'gts' STORE // Fetch the last 24 hours
    [ $gts bucketizer.mean NOW 1 m 0 ] BUCKETIZE 'mean' STORE // Mean over 1-minute buckets
    [ $gts mapper.rate 1 0 0 ] MAP 'rate' STORE // Compute the rate of change
    NEWGTS 'randomwalk' RENAME 0.0 'v' STORE 42 PRNG 1 1000
    FOR
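
For readers unfamiliar with the two operations named on this slide, here is a rough Python sketch of what "mean per bucket" and "rate of change" amount to. It is an illustration only, not WarpScript semantics: `bucketize_mean` and `rate_of_change` are hypothetical helpers, buckets are assumed to be aligned on multiples of the span and labelled by their start, and the rate is expressed per time unit of the timestamps.

```python
from collections import defaultdict
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


def bucketize_mean(points: List[Point], span: int) -> List[Point]:
    """Average the values falling in each bucket of `span` time units."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket_start = ts - ts % span  # assumed alignment on multiples of span
        sums[bucket_start][0] += value
        sums[bucket_start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


def rate_of_change(points: List[Point]) -> List[Point]:
    """Rate between each point and its predecessor; the first point has none."""
    ordered = sorted(points)
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
    means = bucketize_mean(raw, 60)
    print(means)
    print(rate_of_change(means))
```

Regularizing the series first and deriving the rate afterwards keeps the rate comparable from one bucket to the next, since all intervals then have the same length.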

  25. Complex algorithms available as simple functions
    NEWGTS 'randomwalk' RENAME 0.0 'v' STORE 42 PRNG 1 1000
    FOR
    DUP 100 LTTB
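
LTTB stands for Largest Triangle Three Buckets, a downsampling method that keeps the visual shape of a series. The sketch below is a plain Python rendition of the published algorithm, given here to show what a call such as `100 LTTB` asks for; it is not the Warp 10 implementation, and the function name `lttb` and the list-of-tuples representation are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), with x increasing


def lttb(points: List[Point], threshold: int) -> List[Point]:
    """Downsample `points` to `threshold` points, keeping first and last."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)

    every = (n - 2) / (threshold - 2)  # average bucket width
    sampled = [points[0]]
    a = 0  # index of the most recently selected point

    for i in range(threshold - 2):
        # Average of the next bucket, used as the third triangle vertex.
        avg_start = int((i + 1) * every) + 1
        avg_end = min(int((i + 2) * every) + 1, n)
        span = avg_end - avg_start
        avg_x = sum(p[0] for p in points[avg_start:avg_end]) / span
        avg_y = sum(p[1] for p in points[avg_start:avg_end]) / span

        # Pick the point of the current bucket forming the largest triangle.
        start = int(i * every) + 1
        end = int((i + 1) * every) + 1
        ax, ay = points[a]
        best_area = -1.0
        best_index = start
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2.0
            if area > best_area:
                best_area = area
                best_index = j

        sampled.append(points[best_index])
        a = best_index

    sampled.append(points[-1])
    return sampled
```

The selected points are always actual points of the input, which is why the method suits charting: peaks and troughs survive, whereas averaging would flatten them.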

  26. Hiding WarpScript complexity in macros
    NEWGTS 'randomwalk' RENAME 0.0 'v' STORE 42 PRNG 1 1000
    FOR
    'UTC' @senx/cal/byday

  28. Complete documentation online at warp10.io

  29. Visualization

  30. Flexible visualization options

  31. Full support for Processing in WarpScript
    800 'width' STORE 800 'height' STORE
    400.0 'maxspeed' STORE 40000.0 'maxalt' STORE
    3.0 2.0 2.0 @orbit/heatmap/kernel/triangular 'kernel' STORE
    @orbit/heatmap/palette/classic 'palette' STORE
    'TOKEN' 'token' STORE
    $width $height '2D' PGraphics
    'MULTIPLY' PblendMode 'CENTER' PimageMode
    [ $token '~(ALT|CAS)' {} NOW -2000000 ] FETCH
    DUP 0 GET LASTTICK 'now' STORE
    [ SWAP bucketizer.last $now STU 0 ] BUCKETIZE
    // Create heatmap
    7 GET LIST-> DROP 'CAS' STORE 'ALT' STORE

    %>
    IFT
    0 NaN NaN NaN NULL
    %> MACROREDUCER 'GRAPHER' STORE
    [ SWAP [] $GRAPHER ] REDUCE DROP
    // Colorize
    Ppixels LMAP
    PupdatePixels Pencode Pdecode
    $width $height '2D' PGraphics
    // Do the grid
    PnoFill 0 0 $width 1 - $height 1 - Prect
    2.0 PstrokeWeight 200.0 Pcolor Pstroke
    250.0 $maxspeed / $width * DUP 0 SWAP $height Pline
    0 10000 $maxalt / 1.0 SWAP - $height * DUP $width SWAP Pline
    SWAP 0 0 Pimage Pencode

  32. Extensibility

  33. Macros
    Factorizing WarpScript code to separate responsibilities and encourage reusability
    <%
      // This is a macro body
    %>
    ■ Macros can be deployed on the server side
    ■ Macros can be packaged in a jar
    ■ Macros can access some config elements (MACROCONFIG)
    ■ Macros can be deployed on a remote server

  34. WarpFleet™ Resolver
    Enable hosting of macros on remote servers
    ■ Macros can be hosted on any HTTP server including GitHub
    ■ Resolution is performed at runtime
    ■ Support for multiple macro repositories
    ■ Script execution can modify repositories
    ■ WarpFleet™ resolver can be disabled altogether
    ■ Support for versioning via the IMPORT function
    ■ SenX provides a growing set of macros via its own repo
    ■ Warp 10 does intelligent caching of fetched macros
    ■ Support for runtime injection of elements (MACROCONFIG)

  35. Extensions
    Add, remove or modify WarpScript functions
    ■ Write new functions in Java (JVM), Go, Rust, C++, C (JNA)
    ■ Simple API to interact with the WarpScript execution runtime
    ■ Freedom of licensing for extensions
    ■ Growing list of existing extensions, contributions welcome!
    Barcode, GeoTransforms, Grok, InfluxDB, JDBC, PCap, PMML,
    Polyglot, Redis, S3, Swift, TensorFlow, EGADS, Elastic,
    GCode, H2O, Keras, memcached, Parquet, ORC, Neo4J,
    OpenTSDB, LAS, Pig, Spark
    Some commercial ones by SenX
    LevelDB, MapMatching, Forecasting, WarpScript Compiler

  36. Plugins
    Extend Warp 10 by adding new features
    ■ Plugins are run in the Warp 10 process
    ■ Plugins can be written in Java (JVM) or in Go, Rust, C, C++ (JNA)
    ■ Very diverse things can be done using plugins
    ■ Authentication plugins add new types of credentials
    ■ No license constraints
    Kafka, MQTT, WarpStudio, Zeppelin, HTTP, UDP, TCP, Py4J,
    InfluxDB Line Protocol
    OVH is considering open sourcing plugins to support
    PromQL, Graphite, OpenTSDB, InfluxQL query languages
    Poke them to make it happen!

  37. WarpFleet™
    Community site for finding extensions, macro packages and plugins
    ■ CLI tool on NPM - npm install -g @senx/warpfleet
    ■ Modules are hosted on Maven repositories
    ■ Benefit from dependency resolution mechanisms
    ■ Modules can be fetched by Spark for example
    ■ Again, contributions more than welcome!

  38. Integrations

  39. Augment existing tools and frameworks

  40. Use Cases

  41. Flight data analysis for fleet reliability
    Pressure Altitude vs TAS

  42. Weather data
    1,000,000 cells
    400 parameters
    208 time steps
    86 B data points every 6 hours
    in 400 M series
    Using rank 2 tensors as multivariate values,
    Warp 10 can store all of GFS in just
    1,000,000 Geo Time Series
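
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses the rounded numbers from the slide, so the product comes out slightly under the quoted 86 B (the slide's cell count is presumably rounded down). The tensor layout, one parameters-by-time-steps tensor per cell and per model run, is an assumption made for the example and is not stated on the slide.

```python
# Rounded figures taken from the slide.
cells = 1_000_000
parameters = 400
time_steps = 208

# One scalar series per (cell, parameter) pair.
scalar_series = cells * parameters

# Data points produced by one model run (every 6 hours).
points_per_run = cells * parameters * time_steps

# Assumed layout: one series per cell, each value a rank 2 tensor
# of shape (parameters, time_steps).
tensor_series = cells
values_per_tensor = parameters * time_steps

print(f"scalar series:     {scalar_series:,}")
print(f"points per run:    {points_per_run:,}")
print(f"tensor series:     {tensor_series:,}")
print(f"values per tensor: {values_per_tensor:,}")
print(f"series reduction:  {scalar_series // tensor_series}x")
```

Under these assumptions the number of series drops by a factor equal to the number of parameters, while the number of stored values per run is unchanged.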

  43. Helping racing sailboats fly
    Automatic phase extraction
    by TWA analysis

  44. chemtrails-locator.com
    200,000 aircraft
    15 B positions
    Spatio-temporal indexing
    150 km / 5-minute cells
    Served entirely by Warp 10
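
One simple way to picture spatio-temporal cells of about 150 km by 5 minutes is a fixed grid key. The Python sketch below is only an illustration of that idea and is not the indexing scheme used by the site or by Warp 10: `cell_key` is a hypothetical helper, the grid is a plain latitude/longitude grid (so cells get narrower in kilometres towards the poles), and timestamps are assumed to be in seconds.

```python
import math
from typing import Tuple

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def cell_key(
    lat: float,
    lon: float,
    ts_seconds: int,
    cell_km: float = 150.0,
    cell_seconds: int = 300,
) -> Tuple[int, int, int]:
    """Return (row, column, time slot) of the cell containing a position."""
    cell_deg = cell_km / KM_PER_DEGREE  # cell size expressed in degrees
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    slot = ts_seconds // cell_seconds
    return (row, col, slot)
```

Positions sharing a key fall in the same cell, so a lookup for "what was near this point at this time" only has to scan that key and its neighbours rather than all stored positions.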

  45. sandbox.senx.io

  46. @SenXHQ - @Warp10io - @WarpScript
    senx.io - warp10.io
