
PTSM - #2 - Présentation QuasarDB

TimeSeriesFr
November 05, 2019

Transcript

  1. Meet-up Time Series Paris
    QuasarDB

  2. The endless growth of data
    [Chart: Annual Size of the Global Datasphere, in zettabytes, 2010 to 2025]
    AI
    • Spending on cognitive and AI systems will exceed $50B in 2021 (IDC)
    • AI-derived business value is forecast to reach $3.9 trillion in 2022 (Gartner)
    IoT
    • IoT software spending will total $154 billion in 2019 and will see the fastest growth over the five-year forecast period, with a CAGR of 16.6% (IDC)
    The new world
    • By 2021, AI augmentation will help recover 6.2 billion hours of worker productivity
    • By 2022, half of data and analytics services will be performed by machines

  3. Timeseries: a new challenge
    Timeseries databases are the fastest-growing database category, for good reason
    Everything event-based can be stored as a timeseries to observe and analyze change
    Timeseries have unique challenges that can’t be solved by relational databases
    The challenge starts at the edge
    • Collecting, storing, and indexing the data is a challenge in itself because of the volume of data per device (jet engines on a Paris to NYC flight: 200 TB)
    • You need operational intelligence before the data reaches the datacenter
    Most of the data is “uninteresting”, but you don’t know which part
    • The database needs to handle write-heavy scenarios
    • It requires powerful indexing and compression technology
    [Chart: Popularity Changes, May 2018 to May 2019, ©2019 DB-Engines.com; categories: Time Series DBMS, Graph DBMS, Key-value stores, Native XML DBMS, Wide column stores, Multivalue DBMS, Relational DBMS, RDF stores, Search engines, Document stores, Object oriented DBMS]
    Timeseries have specific querying needs
    • ASOF joins – Merge two different timeseries into one, matching each point with the most recent point of the other series at or before its timestamp
    • Resampling – Aggregate a high-frequency timeseries into a lower-frequency timeseries
    • Trimming – Quickly remove old data
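
    To make these three operations concrete, here is a minimal pure-Python sketch, assuming a timeseries is a list of (timestamp, value) pairs sorted by timestamp with integer timestamps. The functions asof_join, resample and trim are hypothetical illustrations of the techniques, not QuasarDB's API.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Sketch only: a timeseries is a list of (timestamp, value) pairs sorted by timestamp,
# with integer timestamps (e.g. seconds). These helpers are hypothetical, not QuasarDB's API.


def asof_join(left, right):
    """Pair each left point with the latest right value at or before its timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts) - 1  # last right point with timestamp <= ts
        joined.append((ts, value, right[i][1] if i >= 0 else None))
    return joined


def resample(series, period, agg=sum):
    """Aggregate points into fixed buckets of `period` units, keyed by bucket start."""
    buckets = defaultdict(list)
    for ts, value in series:
        buckets[ts - ts % period].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


def trim(series, cutoff):
    """Drop every point older than `cutoff`, using binary search to find the boundary."""
    return series[bisect_left([ts for ts, _ in series], cutoff):]


if __name__ == "__main__":
    trades = [(1, 100.0), (5, 101.0), (9, 99.5)]
    quotes = [(0, "q0"), (4, "q1"), (8, "q2")]
    print(asof_join(trades, quotes))  # [(1, 100.0, 'q0'), (5, 101.0, 'q1'), (9, 99.5, 'q2')]
```

    Each ASOF lookup here is a binary search over the sorted right-hand series; since both inputs are sorted, an engine could also merge them in a single linear pass.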


  4. From Big Data to Big Value
    We have the edge-to-cloud solution, based on 10 years of R&D
    What about Relational Database Management Systems (RDBMS)?
    They cannot handle the volume, manage timeseries very inefficiently, and are not optimized for write-heavy scenarios, which makes their TCO unrealistic
    Structured data models don’t scale, hence the emergence of Big Data
    “Big Data” technologies (NoSQL, Hadoop) are disappointing because they sacrificed speed, reliability, or functionality for scale
    The goal is not to hoard data, but to extract intelligence out of the data.
    Until now, timeseries databases have mostly been monitoring databases
    Today, two thirds of timeseries use cases are related to IT operations (Gartner)
    • You don’t need high-speed, reliable ingestion for those use cases
    • You don’t need advanced analytics
    • Thus, the abundance of “timeseries databases” is a red herring
    The future is in the other third, IoT and operational intelligence, which is growing rapidly
    • But can the existing offering deliver?

  5. Cheaper, Faster, Safer, Easier
    QuasarDB: a unified, integrated, and optimized stack
    [Diagram: Data → quasarDB API (high-speed direct data insertion) → QuasarDB Cluster; Client → quasarDB API (query) → QuasarDB Cluster → results]
    • Fastest data ingestion on the market
    • Queries on both historical and real-time data with standard tooling (Python, SQL)
    • QuasarDB loads data from disk as needed, quickly and transparently, as if it were in memory

  6. QuasarDB was designed with the following principles:
    SPEED – Use the best algorithms. Implement them efficiently. Don’t use scalability as an excuse for poor nominal performance.
    SCALABILITY – The engine must scale up, then scale out, without any headache.
    STORAGE EFFICIENCY – Leverage compression and smart encoding as much as possible; enable users to store as much as they want.
    SAFETY – Preserve the data. Don’t surprise the user with behavior such as “eventual consistency.” Be consistent. Be transactional.
    SECURITY – Make unauthorized access extremely hard. Deploy in-depth security. Empower the administrators
    to control and audit access.
    CONVENIENCE AND FAMILIARITY – Don’t push the hard problems back to the users (consistency,
    integrity, aggregations, etc.). Give users access to the data through familiar interfaces.
    UNIVERSALITY – Data is everywhere. Run everywhere. In the cloud. In the datacenter. On the desktop. On the
    edge (e.g., support Intel and ARM architectures).


  7. Technology deep dive
    Implementation
    QuasarDB is written in modern C++17, with the most critical parts handwritten in assembly. This delivers a portable binary unconstrained by a virtual machine or complex dependencies.
    Zero-overhead architecture
    Because QuasarDB combines the orchestrator, the query engine, the aggregation engine, and the persistence layer in a single binary, it can use a zero-copy architecture that avoids the efficiency loss typically found in big data systems that combine different software products. For example, the memory buffer from the network card can be written directly to disk after sanity checks, removing the overhead caused by allocations and copies.
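
    Python cannot reproduce a network-card-to-disk path, but the idea of avoiding intermediate copies can be sketched with memoryview: a frame is received into a preallocated buffer, sanity-checked in place, and the payload slice is handed straight to the writer. This is a minimal sketch under assumptions: the 4-byte length-prefixed frame format and the receive_and_persist helper are hypothetical, and the kernel still copies socket data into the user buffer.

```python
import io
import socket
import struct

# Hypothetical frame format for this sketch: 4-byte big-endian payload length, then payload.
HEADER = struct.Struct(">I")


def _fill(sock, view):
    """Receive exactly len(view) bytes directly into `view` (no temporary bytes objects)."""
    filled = 0
    while filled < len(view):
        got = sock.recv_into(view[filled:])
        if got == 0:
            raise ConnectionError("peer closed mid-frame")
        filled += got


def receive_and_persist(sock, buf, out):
    """Read one frame into the preallocated `buf` and write its payload to `out`."""
    view = memoryview(buf)                      # a window over buf; slicing it copies nothing
    _fill(sock, view[:HEADER.size])
    (length,) = HEADER.unpack_from(view)        # parse the header in place
    end = HEADER.size + length
    if end > len(buf):                          # sanity check before accepting the payload
        raise ValueError("frame larger than receive buffer")
    _fill(sock, view[HEADER.size:end])
    return out.write(view[HEADER.size:end])     # hand the payload slice straight to the writer


if __name__ == "__main__":
    a, b = socket.socketpair()
    payload = b"sensor-42,1572940800,21.5"
    a.sendall(HEADER.pack(len(payload)) + payload)
    disk = io.BytesIO()                         # stands in for a file opened in "wb" mode
    receive_and_persist(b, bytearray(4096), disk)
    print(disk.getvalue() == payload)           # True
    a.close()
    b.close()
```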
    Core engine
    QuasarDB is a multithreaded, column-oriented timeseries database. Through its hybrid engine, it combines the best of two worlds: the speed of in-memory databases and the capacity of long-term storage. The aggregation engine leverages the column-oriented nature of the data for ultra-fast aggregations and uses Single Instruction Multiple Data (SIMD) instructions whenever available.
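
    The benefit of a column-oriented layout for aggregation can be sketched in a few lines: once records are pivoted into one contiguous typed array per column, a sum reads only the column it needs. This pure-Python sketch uses the standard array module with a hypothetical schema; real engines apply SIMD instructions to such contiguous columns, which interpreted Python does not.

```python
from array import array

# Hypothetical schema: column name -> array typecode ("q" = signed 64-bit int, "d" = double).
SCHEMA = {"ts": "q", "price": "d", "volume": "q"}


def to_columns(records, schema=SCHEMA):
    """Pivot row-oriented records (dicts) into one contiguous typed array per column."""
    return {name: array(code, (r[name] for r in records)) for name, code in schema.items()}


def column_sum(columns, name):
    """Aggregate a single column; the other columns are never read."""
    return sum(columns[name])


def column_mean(columns, name):
    """Average of one column, or None when the column is empty."""
    col = columns[name]
    return sum(col) / len(col) if col else None


if __name__ == "__main__":
    rows = [
        {"ts": 1, "price": 10.0, "volume": 100},
        {"ts": 2, "price": 10.5, "volume": 250},
        {"ts": 3, "price": 11.0, "volume": 50},
    ]
    cols = to_columns(rows)
    print(column_sum(cols, "volume"), column_mean(cols, "price"))  # 400 10.5
```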


  8. Beyond the Quasar Engine
    • Network protocols
    • Numerical data compression
    • Caching strategies
    • Distributed queries
    • Lock-free memory management structures
    • Indexing structures and algorithms
    The Quasar Engine by itself doesn’t explain all the performance gains.

  9. Clustering

  10. Partitioning

  11. Storage

  12. Queries
    QuasarDB can be operated through a comprehensive, simple-to-use API available in all popular languages.
    It also supports SQL queries with timeseries extensions. Instead of inventing a brand-new language and locking users into a proprietary product, QuasarDB makes sure data scientists get up to speed quickly.
    The timeseries extensions simplify writing complex time-related queries while keeping them easy to understand, even for someone unfamiliar with QuasarDB. The query language is aware of the Gregorian calendar and makes creating time slices a breeze.
    SELECT sum(volume) FROM my_stocks IN RANGE (today, -1month) GROUP BY day
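
    To clarify what the example query computes, here is a plain-Python equivalent under stated assumptions: rows are (timestamp, volume) pairs, IN RANGE (today, -1month) is taken to mean the calendar month ending at the start of today, and GROUP BY day buckets by calendar date. The deck does not give the exact boundary semantics of QuasarDB's range syntax, so this is an approximation; one_month_before shows the kind of Gregorian-calendar arithmetic (clamping March 31 to February 28 or 29) that a calendar-aware query language handles for you.

```python
import calendar
from collections import defaultdict
from datetime import date, datetime


def one_month_before(day):
    """Step back one calendar month, clamping the day (March 31 -> February 28 or 29)."""
    year, month = (day.year, day.month - 1) if day.month > 1 else (day.year - 1, 12)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def sum_volume_by_day(rows, today):
    """rows: iterable of (datetime, volume) pairs.

    Assumed semantics: keep rows whose date falls in [one month before today, today),
    then sum `volume` per calendar day.
    """
    start = one_month_before(today)
    totals = defaultdict(int)
    for ts, volume in rows:
        if start <= ts.date() < today:
            totals[ts.date()] += volume
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = [
        (datetime(2019, 11, 4, 9, 30), 100),
        (datetime(2019, 11, 4, 16, 0), 50),
        (datetime(2019, 10, 10, 12, 0), 7),
        (datetime(2019, 9, 1, 8, 0), 999),  # older than one month: excluded
    ]
    print(sum_volume_by_day(rows, date(2019, 11, 5)))
```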

  13. Query

  14. Memory usage optimisation

  15. Compression

  16. 1 GB consolidated benchmarks from 10/2018
    [Chart: three bar charts comparing InfluxDB, TimescaleDB and QuasarDB: insertion time in ms (values shown: 25851, 123964, 435755), disk usage in MiB (values shown: 320, 2212, 274), query time in ms (values shown: 91, 2800, 42502)]

  17. Who are we ?
    • A software publisher with 10 years of R&D
    • A recognized actor in financial markets
    • We have raised 3.5M USD to date
    • We have two main offices
    • An international team

  18. QuasarDB in the field
    Transportation / Energy: Tier 1 automotive supplier
    • Needs an extremely good understanding of the behavior and status of its devices
    • Will collect data from a million vehicles to correctly model the behavior of its equipment
    • After unsuccessfully trying open-source solutions, the customer evaluated QuasarDB, considerably reduced storage, and can now scale up
    Finance: Tier 1 stock exchange
    • With petabytes of data in Amazon S3, the stock exchange unsuccessfully tried to build a Spark-based system to give analytical capabilities to its customers.
    • QuasarDB is the only solution that can, at a realistic cost, leverage that data directly to let the stock exchange’s customers run ad hoc queries.
    Defense: defense contractor
    • 200 million sensors at 2 Hz.
    • Constrained environment that cannot use just “any hardware”.
    • The defense contractor has no solution.
    • QuasarDB is currently working to deliver a solution to this “impossible” problem thanks to its patent-pending compression technology.
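
    A back-of-the-envelope calculation shows the scale of the defense case. The sensor count and sampling rate come from the slide; the 16 bytes per uncompressed point (8-byte timestamp plus 8-byte float value) is an assumption for illustration.

```python
# Back-of-the-envelope ingestion rate for the defense use case above.
SENSORS = 200_000_000
SAMPLING_HZ = 2
SECONDS_PER_DAY = 86_400
BYTES_PER_POINT = 16  # assumption: uncompressed 8-byte timestamp + 8-byte float value

points_per_second = SENSORS * SAMPLING_HZ
points_per_day = points_per_second * SECONDS_PER_DAY
raw_bytes_per_day = points_per_day * BYTES_PER_POINT

print(f"{points_per_second:,} points/s")              # 400,000,000 points/s
print(f"{points_per_day:,} points/day")               # 34,560,000,000,000 points/day
print(f"{raw_bytes_per_day / 1e12:.2f} TB/day raw")   # 552.96 TB/day raw
```

    Under that assumption, roughly 553 TB of raw data per day would have to be absorbed by hardware that cannot be arbitrarily scaled, which is why compression is central to this case.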


  19. They trust QuasarDB

  20. Business model
    Subscription includes edge and datacenter licensing; the customer chooses whether or not the deployment is fully managed.
    Extended support is available as a premium option.
    Deployment can be part of a package and done by a partner company.
    Customer-managed solution
    • Sum the RAM of every machine running QuasarDB at the customer
    • €120k per TB of RAM per year
    • Includes basic 9-to-5 support
    • Unlimited storage
    • Unlimited cores
    QuasarDB-managed solution
    • €48k per year per block
    • A block is:
      • 8 GB of RAM
      • 64 GB of storage
      • 1:1 redundancy
      • Daily backups
      • 24/7 monitoring
    Ad hoc edge (embedded) licensing based on device cost
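
    The pricing rules lend themselves to a quick worked example. This minimal sketch only applies the list prices above; the rounding rules (1 TB = 1,000 GB, linear pro-rating of RAM, a managed deployment needing enough blocks to cover both its RAM and its storage, at least one block) are assumptions, not terms stated on the slide.

```python
# Hypothetical helpers applying the list prices on this slide. Assumptions (not stated on the
# slide): 1 TB = 1000 GB, RAM-based fees are pro-rated linearly, and a managed deployment needs
# enough blocks to cover both its RAM and its storage, with a minimum of one block.
EUR_PER_TB_RAM_YEAR = 120_000
EUR_PER_BLOCK_YEAR = 48_000
BLOCK_RAM_GB = 8
BLOCK_STORAGE_GB = 64


def customer_managed_fee(ram_gb_per_machine):
    """Annual fee: total RAM of every machine running QuasarDB, priced per TB."""
    total_gb = sum(ram_gb_per_machine)
    return total_gb * EUR_PER_TB_RAM_YEAR / 1000


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def managed_blocks(ram_gb, storage_gb):
    """Smallest number of blocks whose RAM and storage both cover the requirement."""
    return max(ceil_div(ram_gb, BLOCK_RAM_GB), ceil_div(storage_gb, BLOCK_STORAGE_GB), 1)


def managed_fee(ram_gb, storage_gb):
    """Annual fee for a QuasarDB-managed deployment."""
    return managed_blocks(ram_gb, storage_gb) * EUR_PER_BLOCK_YEAR


if __name__ == "__main__":
    print(customer_managed_fee([256, 256, 256, 256]))     # 122880.0 (1.024 TB of RAM)
    print(managed_blocks(20, 100), managed_fee(20, 100))  # 3 144000
```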


  21. What does QuasarDB give you?
    Ingest all the data you need at the speed you require.
    QuasarDB delivers the fastest timeseries ingestion speed on the market through its combination of an ultra-optimized protocol, unique compression algorithms, and built-in clustering.
    QuasarDB is designed to ingest terabytes per day and manage petabytes.
    Out of the box, QuasarDB comes with the tooling to ingest text files at maximum efficiency without requiring you to write programs or scripts.
    The model adapts to your needs, not the other way around.
    Timeseries are stored in tables. QuasarDB combines the flexibility typically found in NoSQL databases with the rigor of relational databases.
    Powerful and instant insights on all your data
    Query your data through SQL or perform powerful analytics with Python (pandas) or R. Stream your results to the BI tool of your choice with our ODBC connector. Since access to the data is uniform regardless of its age, insights are more accurate and unconstrained.

  22. To conclude, what does QuasarDB give you?
    Safely and securely manage your data
    QuasarDB’s distributed transactional engine lets you perform complex queries over the cluster with the guarantee of a consistent view. Security features include fine-grained access control, cryptographically strong authentication, and traffic encryption. Every update to the database is recorded in an audit trail.
    QuasarDB supports intra-cluster replication. Cluster-to-cluster replication is also possible with a tool that allows one-time replication, migration, and continuous updates.
    Unlimited and efficient storage
    With its Delta4C compression algorithm, QuasarDB delivers efficient disk usage (ten times less than relational databases) without impacting performance. This makes it possible to use a higher tier of storage (for example, moving from SSD to NVMe). Unlike in-memory databases, QuasarDB does not require you to explicitly load the data from disk before performing a query: it transparently pages data in and out.
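
    The deck does not describe how Delta4C works, so the following is not Delta4C. It is a minimal sketch of a generic, widely used timeseries technique, delta-of-delta encoding of timestamps with zigzag varints, to show why regularly sampled data compresses so well: for a stream sampled every 500 ms, almost every timestamp fits in a single byte.

```python
# Generic delta-of-delta timestamp encoding (NOT QuasarDB's Delta4C, which the deck does not detail).


def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    """Inverse of zigzag."""
    return z >> 1 if (z & 1) == 0 else -((z + 1) >> 1)


def put_varint(u, out):
    """Append u as a little-endian base-128 varint (high bit = more bytes follow)."""
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)


def encode_timestamps(timestamps):
    out = bytearray()
    prev = prev_delta = 0
    for i, t in enumerate(timestamps):
        delta = t - prev
        # First point: stored as-is. Later points: store how much the delta changed.
        put_varint(zigzag(delta if i == 0 else delta - prev_delta), out)
        prev, prev_delta = t, (0 if i == 0 else delta)
    return bytes(out)


def decode_timestamps(data):
    timestamps, prev, prev_delta, pos = [], 0, 0, 0
    while pos < len(data):
        u = shift = 0
        while True:  # read one varint
            byte = data[pos]
            pos += 1
            u |= (byte & 0x7F) << shift
            shift += 7
            if byte < 0x80:
                break
        first = not timestamps
        delta = unzigzag(u) if first else prev_delta + unzigzag(u)
        prev += delta
        prev_delta = 0 if first else delta
        timestamps.append(prev)
    return timestamps


if __name__ == "__main__":
    ts = [1_572_940_800_000 + 500 * i for i in range(1000)]  # 2 Hz, milliseconds
    blob = encode_timestamps(ts)
    print(len(blob), decode_timestamps(blob) == ts)  # 1006 True (vs 8000 bytes as raw int64)
```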


  23. What does QuasarDB give you?
    Low total cost of ownership
    The combination of efficient storage, low CPU usage, integration, and ease of use results in tremendous savings for businesses. Deploying QuasarDB is cheaper and faster than typical Big Data solutions.
    Data analysts use familiar tools such as Python and SQL to interface with their data: there is no arcane proprietary language to learn, and skills are transferable.
    QuasarDB integrates ETL, distribution, storage, analytics, orchestration, and connectivity in a single package, radically simplifying deployment and administration and shortening project timelines.
    Efficient storage means significantly reduced storage costs.
    Speed means lower CPU usage, which directly translates to lower hosting costs.

  24. Automotive Parts Manufacturer

  25. JEAN-CLAUDE TAGGER
    COO
    [email protected]
    USA
    222 BROADWAY – 19TH FLOOR
    NEW YORK
    NY 10038
    UK
    40 BANK STREET
    CANARY WHARF
    LONDON E14 5NR
    FRANCE
    24, RUE FEYDEAU
    75002 PARIS


  26. Backup slides

  27. Details on Security (1)
    Authentication and rights management
    User authentication is cryptographically strong and based on opaque security tokens, avoiding the security holes associated with passwords.
    Administrators can regenerate tokens and disable users at will, should an account be compromised.
    Users have default privileges, and the database supports fine-grained access control to handle even the most advanced security requirements.
    Data entitlement can be managed with table-based filters, for example, to prevent users without the proper privileges from seeing the most recent data.
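
    The token and entitlement ideas can be illustrated with a minimal sketch. The deck does not describe QuasarDB's token format, so everything below is hypothetical: TokenRegistry issues random opaque tokens (stored only as SHA-256 digests), regenerating a token invalidates the old one, disabled users are rejected, and entitled_rows models a table filter that hides data newer than an assumed delay from unprivileged users.

```python
import hashlib
import hmac
import secrets


class TokenRegistry:
    """Hypothetical in-memory registry of opaque user tokens (illustration only)."""

    def __init__(self):
        self._digests = {}      # user -> SHA-256 digest of the current token (token never stored)
        self._disabled = set()

    def issue(self, user):
        """Create, or regenerate, the user's token; any previous token stops working."""
        token = secrets.token_urlsafe(32)  # 32 random bytes, meaningless to the client
        self._digests[user] = hashlib.sha256(token.encode()).digest()
        return token

    def disable(self, user):
        self._disabled.add(user)

    def authenticate(self, user, token):
        expected = self._digests.get(user)
        if expected is None or user in self._disabled:
            return False
        presented = hashlib.sha256(token.encode()).digest()
        return hmac.compare_digest(expected, presented)  # constant-time comparison


def entitled_rows(rows, now, delay, privileged):
    """Hypothetical table filter: unprivileged users only see rows at least `delay` old."""
    return rows if privileged else [(ts, v) for ts, v in rows if ts <= now - delay]
```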


  28. Details on Security (2)
    Audit trail
    Every update to the database is stored in an audit trail, itself stored as a timeseries. This timeseries can be protected through access control to prevent external modifications. The audit trail records the nature of the operation, the affected timeseries, the modified range, and the user who originated the modification.
    Backup, replication and migration
    QuasarDB supports intra-cluster replication through a configuration setting. This internal replication delivers continuous service should a node fail. Cluster-to-cluster replication is possible through an external tool developed by the QuasarDB team. This tool leverages the audit trail and thus allows one-time replication, migration, and continuous updates. Backups can be performed at the file level or through a backup tool that also leverages the audit trail; full and incremental backups are thus supported.
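
    Why an audit trail enables incremental backups can be sketched as follows, assuming a hypothetical AuditEntry record with the fields the slide lists (operation, timeseries, modified range, user) plus a logging timestamp: an incremental backup only needs the ranges modified since the last backup, merged per timeseries. This is an illustration of the idea, not QuasarDB's backup tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit-trail record holding the fields the slide lists."""
    logged_at: int   # when the modification was recorded
    user: str        # who made the modification
    operation: str   # nature of the operation, e.g. "insert", "update", "erase"
    table: str       # the timeseries concerned
    start: int       # modified range, half-open [start, end)
    end: int


def ranges_to_copy(trail, since):
    """Merged modified ranges per timeseries for entries logged after `since`.

    An incremental backup copies only these ranges; a full backup uses since=float("-inf").
    """
    per_table = defaultdict(list)
    for entry in trail:
        if entry.logged_at > since:
            per_table[entry.table].append((entry.start, entry.end))
    merged = {}
    for table, ranges in per_table.items():
        ranges.sort()
        spans = [list(ranges[0])]
        for start, end in ranges[1:]:
            if start <= spans[-1][1]:           # overlapping or adjacent: extend the span
                spans[-1][1] = max(spans[-1][1], end)
            else:
                spans.append([start, end])
        merged[table] = [tuple(s) for s in spans]
    return merged
```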


  29. Details on Security (3)
    Security
    QuasarDB supports traffic encryption based on AES-256 GCM. The key is 256 bits long and the MAC 128 bits long. Every packet is authenticated to detect accidental or intentional modifications. Session keys are securely exchanged using X25519. Each session key is used only once, which provides Perfect Forward Secrecy (PFS).
    This means that should an attacker manage to compromise a user certificate, she will not be able to decipher past messages, only future ones.
    QuasarDB security isn’t limited to user management and authentication. It is also deeply integrated into the protocol.
    The server has several mechanisms to protect against Denial of Service (DoS), whether through maliciously crafted packets or valid but excessive user requests. For example, a server-side parameter limits how large user packets may be, ensuring the server cannot run out of memory (OOM) because of user requests.
    QuasarDB’s cryptographic primitives are based on NaCl, written by Daniel J. Bernstein, Tanja Lange, and Peter Schwabe.
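
    The packet-size limit can be sketched as a reader that checks the declared length of a length-prefixed packet before reading or buffering its body, so an oversized request is refused without allocating memory for it. The 4-byte prefix, the 64 MiB default and all names below are assumptions; the deck does not name the actual server parameter.

```python
import io
import struct

# Assumptions for this sketch: 4-byte big-endian length prefix and a 64 MiB default limit.
LENGTH_PREFIX = struct.Struct(">I")
MAX_PACKET_SIZE = 64 * 1024 * 1024


class PacketTooLarge(Exception):
    """Raised when a client declares a packet above the server-side limit."""


def reader_from(stream):
    """Wrap a binary stream in a read-exactly-n-bytes callable."""
    def read_exact(n):
        data = stream.read(n)
        if len(data) != n:
            raise EOFError("truncated packet")
        return data
    return read_exact


def read_packet(read_exact, max_size=MAX_PACKET_SIZE):
    """Read one packet, refusing it before reading or buffering a body above max_size."""
    (length,) = LENGTH_PREFIX.unpack(read_exact(LENGTH_PREFIX.size))
    if length > max_size:
        raise PacketTooLarge(f"declared {length} bytes, limit is {max_size}")
    return read_exact(length)


if __name__ == "__main__":
    ok = io.BytesIO(LENGTH_PREFIX.pack(5) + b"hello")
    print(read_packet(reader_from(ok)))  # b'hello'
```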
