PTSM - #2 - QuasarDB Presentation

TimeSeriesFr
November 05, 2019

Transcript

  1. The endless growth of data

    [Chart: Annual Size of the Global Datasphere, in zettabytes, 2010 to 2025]

    AI
    • Spending on cognitive and AI systems will exceed $50B in 2021 (IDC)
    • AI-derived business value is forecast to reach $3.9 trillion in 2022 (Gartner)

    IoT
    • IoT software spending will total $154 billion in 2019 and will see the fastest growth over the five-year forecast period, with a CAGR of 16.6% (IDC)

    The new world
    • By 2021, AI augmentation will help recover 6.2 billion hours of worker productivity
    • By 2022, half of data and analytics services will be performed by machines
  2. Timeseries: a new challenge

    Timeseries databases are the fastest-growing database category, and for good reason

    [Chart: Popularity Changes by DBMS category, May 2018 to May 2019: Time Series DBMS, Graph DBMS, Key-value stores, Native XML DBMS, Wide column stores, Multivalue DBMS, Relational DBMS, RDF stores, Search engines, Document stores, Object oriented DBMS. ©2019, DB-Engines.com]

    Everything event-based can be stored as timeseries to observe and analyze change

    Timeseries have unique challenges that can’t be solved by relational databases

    The challenge starts at the edge
    • Collecting, storing, and indexing the data is a challenge in itself because of the volume of data per device (jet engines, Paris to NYC: 200 TB)
    • You need operational intelligence before the data is in the datacenter

    Most of the data is “uninteresting”, but you don’t know which part
    • The database needs to handle write-heavy scenarios
    • Requires powerful indexing and compression technology

    Timeseries have specific querying needs
    • ASOF joins – Merge two different timeseries into one
    • Resampling – Aggregate a high-frequency timeseries into a lower-frequency timeseries
    • Trimming – Quickly remove “old data”
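The three querying needs listed above can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not QuasarDB's implementation: the function names (`asof_join`, `resample_sum`, `trim`) and the sample trades and quotes are hypothetical, and each series is assumed to be a list of `(timestamp, value)` pairs sorted by timestamp.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def asof_join(left, right):
    """ASOF join: pair each left point with the latest right value at or before it."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # number of right points with timestamp <= ts
        match = right[i - 1][1] if i > 0 else None
        joined.append((ts, value, match))
    return joined


def resample_sum(series, bucket):
    """Resampling: sum values into fixed-width buckets aligned on the Unix epoch."""
    totals = {}
    for ts, value in series:
        start = ts - (ts - EPOCH) % bucket  # start of the bucket containing ts
        totals[start] = totals.get(start, 0) + value
    return sorted(totals.items())


def trim(series, cutoff):
    """Trimming: drop every point older than the cutoff."""
    return [(ts, value) for ts, value in series if ts >= cutoff]


# Hypothetical sample data: trade prices and quote prices, seconds after t0.
t0 = datetime(2019, 11, 5, 9, 0, 0)
trades = [(t0 + timedelta(seconds=s), p) for s, p in [(1, 100.0), (5, 101.0), (9, 99.5)]]
quotes = [(t0 + timedelta(seconds=s), q) for s, q in [(0, 99.9), (4, 100.8), (10, 99.0)]]

joined = asof_join(trades, quotes)
five_second_totals = resample_sum(trades, timedelta(seconds=5))
recent = trim(trades, t0 + timedelta(seconds=5))
```

A dedicated timeseries engine would run these operations over indexed, compressed columns rather than Python lists; the sketch only pins down what each operation is expected to return.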
  3. From Big Data to Big Value

    What about Relational Database Management Systems (RDBMS)?
    They cannot handle the volume, manage timeseries very inefficiently, and are not optimized for write-heavy scenarios, so their TCO is unrealistic
    Structured data models don’t scale, hence the emergence of Big Data

    “Big Data” technologies (NoSQL, Hadoop) are disappointing
    Because they sacrificed speed, reliability, or functionality for scale
    The goal is not to hoard data, but to extract intelligence out of the data.

    Until now, timeseries databases have mostly been monitoring databases
    Today, 2/3 of timeseries use cases are IT operations related (Gartner)
    • You don’t need high-speed, reliable ingestion for those use cases
    • You don’t need advanced analytics
    • Thus, the abundance of “timeseries databases” is a red herring

    The future is in the other third, IoT and operational intelligence, which is growing rapidly
    • But can the existing offerings deliver?

    We have the edge-to-cloud solution, based on 10 years of R&D
  4. QuasarDB: a unified, integrated, and optimized stack

    Cheaper, Faster, Safer, Easier

    [Diagram: the client sends data through the QuasarDB API for high-speed direct insertion into the QuasarDB cluster, and sends queries through the QuasarDB API to get results back]

    • Fastest data ingestion on the market
    • Queries on both historical and real-time data with standard tooling (Python, SQL)
    • QuasarDB loads data from disk as needed, quickly and transparently, as if it were in memory
  5. QuasarDB was designed with the following principles:

    SPEED – Use the best algorithms. Implement them efficiently. Don’t use scalability as an excuse for poor nominal performance.
    SCALABILITY – The engine must scale up, then scale out, without any headache.
    STORAGE EFFICIENCY – Leverage compression and smart encoding as much as possible, enable the user to store as much as they want.
    SAFETY – Preserve the data. Don’t surprise the user with behavior such as “eventual consistency.” Be consistent. Be transactional.
    SECURITY – Make unauthorized access extremely hard. Deploy in-depth security. Empower the administrators to control and audit access.
    CONVENIENCE AND FAMILIARITY – Don’t push the hard problems back to the users (consistency, integrity, aggregations, etc.). Give users access to the data through familiar interfaces.
    UNIVERSALITY – Data is everywhere. Run everywhere. In the cloud. In the datacenter. On the desktop. On the edge (e.g., support Intel and ARM architectures).
  6. Technology deep dive

    Implementation
    QuasarDB is written in modern C++17, with the most critical parts handwritten in assembly. This delivers a portable binary unconstrained by a virtual machine or complex dependencies.

    Zero-overhead architecture
    QuasarDB combines the orchestrator, the query engine, the aggregation engine, and the persistence layer in a single binary. It can therefore use a zero-copy architecture that avoids the efficiency loss typically found in big data systems that combine different software products. For example, the memory buffer from the network card can be written directly to disk after sanity checks, removing the overhead caused by allocations and copies.

    Core engine
    QuasarDB is a multithreaded, column-oriented timeseries database. Through its hybrid engine, it combines the best of two worlds: the speed of in-memory databases and the capacity of long-term storage. The aggregation engine leverages the column-oriented nature of the data for ultra-fast aggregations and uses Single Instruction Multiple Data (SIMD) instructions whenever available.
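To make the column-oriented idea concrete, here is a minimal sketch using only Python's standard `array` module. The sample rows and variable names are hypothetical, and nothing here reflects QuasarDB's internal layout. Plain Python does not issue SIMD instructions; the point is only that an aggregation over one column scans a single contiguous, typed buffer, which is the kind of layout SIMD code can exploit.

```python
from array import array

# Hypothetical sample data: (timestamp, price, volume) rows.
rows = [
    (1572944400, 101.5, 300),
    (1572944401, 101.7, 120),
    (1572944402, 101.6, 80),
]

# Row-oriented aggregation: walk every row object to reach a single field.
row_total = sum(row[2] for row in rows)

# Column-oriented layout: one contiguous, typed array per column.
timestamps = array("q", (row[0] for row in rows))  # signed 64-bit integers
prices = array("d", (row[1] for row in rows))      # 64-bit floats
volumes = array("q", (row[2] for row in rows))     # signed 64-bit integers

# Column-oriented aggregation: scan only the volume buffer.
column_total = sum(volumes)
```

Both totals are identical; what changes is how much memory has to be traversed to compute them.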
  7. Beyond the Quasar Engine

    The Quasar Engine by itself doesn’t explain all the performance gains.
    • Network protocols
    • Numerical data compression
    • Caching strategies
    • Distributed queries
    • Lock-free memory management structures
    • Indexing structures and algorithms
  8. Queries

    QuasarDB can be operated through a comprehensive and simple-to-use API available in all popular languages. It also supports SQL queries with timeseries extensions. Instead of inventing a brand-new language and jailing users in a proprietary product, QuasarDB makes sure data scientists are quickly up to speed. The timeseries extensions simplify the writing of complex time-related queries while making them easy to understand, even for someone unfamiliar with QuasarDB. The query language is aware of the Gregorian calendar and makes creating time slices a breeze.

    SELECT sum(volume) FROM my_stocks IN RANGE (today, -1month) GROUP BY day
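For readers more at home in Python than in SQL, the sketch below shows roughly what the query on this slide computes: the sum of `volume` per calendar day over the last month. It is an illustration under stated assumptions, not the QuasarDB client API. The function `daily_volume`, the sample rows, and the approximation of `-1month` as 30 days are all hypothetical; the real query language is calendar-aware, and the exact semantics of `today` are not spelled out in the deck.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_volume(rows, now, lookback=timedelta(days=30)):
    """Sum volume per calendar day for rows with start <= timestamp < now.

    Assumption: one month is approximated as 30 days.
    """
    start = now - lookback
    totals = defaultdict(int)
    for ts, volume in rows:
        if start <= ts < now:
            totals[ts.date()] += volume
    return dict(sorted(totals.items()))


# Hypothetical (timestamp, volume) rows standing in for the my_stocks table.
my_stocks = [
    (datetime(2019, 9, 1, 9, 0), 999),     # older than one month: excluded
    (datetime(2019, 10, 20, 9, 0), 70),
    (datetime(2019, 11, 4, 10, 0), 100),
    (datetime(2019, 11, 4, 15, 30), 50),
]

result = daily_volume(my_stocks, now=datetime(2019, 11, 5))
```

The one-line SQL form pushes the range filter and the calendar-aware grouping into the database instead of the client.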
  9. 1 GB consolidated benchmarks from 10/2018

    [Three bar charts comparing InfluxDB, TimescaleDB, and QuasarDB on insertion time in ms, disk usage in MiB, and query time in ms. Values shown: 25851, 123964, 435755; 320, 2212, 274; 91, 2800, 42502.]
  10. Who are we?

    • A software publisher with 10 years of R&D
    • A recognized player in financial markets
    • We have raised USD 3.5M to date
    • We have two main offices
    • An international team
  11. QuasarDB in the field

    Transportation - Energy: Tier 1 automotive supplier
    • Need to have an extremely good understanding of the behavior and status of devices
    • They will collect data from a million vehicles to correctly model the behavior of their equipment
    • After unsuccessfully trying open-source solutions, the customer evaluated QuasarDB, considerably reduced its storage, and can now scale up

    Finance: Tier 1 stock exchange
    • With petabytes of data in Amazon S3, the stock exchange unsuccessfully tried to build a system based on Spark to give analytical capabilities to its customers.
    • QuasarDB is the only solution that can, at a realistic cost, leverage that data directly to enable the stock exchange’s customers to run ad-hoc queries.

    Defense: Defense contractor
    • 200 million sensors at 2 Hz.
    • Constrained environment that cannot use “any hardware”.
    • The defense contractor has no solution.
    • QuasarDB is currently working to deliver a solution to this “impossible” problem thanks to its patent-pending compression technology.
  12. Business model

    Subscription includes edge and datacenter licensing; customer opts for fully managed or not.
    Extended support available as premium.
    Deployment can be part of a package and done by a partner company.

    Customer managed solution
    • Sum the RAM of every machine running QuasarDB at the customer
    • €120k per TB of RAM per year
    • Includes basic 9 to 5 support
    • Unlimited storage
    • Unlimited cores

    QuasarDB managed solution
    • Charge €48k per year per block
    • A block is: 8 GB of RAM, 64 GB of storage, 1:1 redundancy
    • Daily backups
    • 24/7 monitoring

    Ad-hoc edge (embedded) licensing based on device cost
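The two pricing rules can be turned into a small back-of-the-envelope calculator. The prices and block sizes come from the slide; everything else is an assumption made for illustration: 1 TB is taken as 1024 GB, the customer managed price is applied pro rata rather than rounded up to whole terabytes, and the number of managed blocks is driven by whichever of RAM or storage needs more blocks. Function names are hypothetical.

```python
import math

EUR_PER_TB_RAM_PER_YEAR = 120_000  # customer managed solution (from the slide)
EUR_PER_BLOCK_PER_YEAR = 48_000    # QuasarDB managed solution (from the slide)
BLOCK_RAM_GB = 8
BLOCK_STORAGE_GB = 64


def customer_managed_cost(ram_gb_per_machine):
    """Yearly cost: sum the RAM of every machine, then price it per TB.

    Assumptions: 1 TB = 1024 GB, pro rata pricing.
    """
    total_tb = sum(ram_gb_per_machine) / 1024
    return total_tb * EUR_PER_TB_RAM_PER_YEAR


def managed_blocks(ram_gb, storage_gb):
    """Blocks needed to cover both the RAM and the storage requirement (assumption)."""
    return max(math.ceil(ram_gb / BLOCK_RAM_GB), math.ceil(storage_gb / BLOCK_STORAGE_GB))


def managed_cost(ram_gb, storage_gb):
    """Yearly cost of the managed solution for the given requirements."""
    return managed_blocks(ram_gb, storage_gb) * EUR_PER_BLOCK_PER_YEAR


# Hypothetical deployments.
on_prem = customer_managed_cost([256, 256, 256, 256])  # four machines, 1 TB of RAM in total
hosted = managed_cost(ram_gb=32, storage_gb=500)       # storage-bound: 8 blocks
```

Under these assumptions the first deployment costs €120k per year and the second €384k per year; real quotes would depend on terms the slide does not state.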
  13. What does QuasarDB give you?

    Ingest all the data you need at the speed you require.
    QuasarDB delivers the fastest timeseries ingestion speed on the market through its combination of an ultra-optimized protocol, unique compression algorithms, and built-in clustering. QuasarDB is designed to ingest terabytes per day and manage petabytes. Out of the box, QuasarDB comes with the tooling to ingest text files at maximum efficiency without requiring you to write programs or scripts.

    The model adapts to your need, not the other way around.
    Timeseries are stored in tables. QuasarDB combines the flexibility typically found in NoSQL databases with the rigor of relational databases.

    Powerful and instant insights on all your data
    Query your data through SQL or perform powerful analytics with Python Pandas or R. Stream your results through the BI tool of your choice with our ODBC connector. Since access to the data is uniform regardless of its age, insights are more accurate and unconstrained.
  14. To conclude: what does QuasarDB give you?

    Safely and securely manage your data
    QuasarDB’s distributed transactional engine enables you to perform complex queries over the cluster with the guarantee that you’ll have a consistent view. Security features include fine-grained access control, cryptographically strong authentication, and traffic encryption. Every update to the database is stored in an audit trail. QuasarDB supports intra-cluster replication. Cluster-to-cluster replication is also possible with a tool that allows one-time replication, migration, and continuous updates.

    Unlimited and efficient storage
    With its Delta4C compression algorithm, QuasarDB delivers efficient disk usage (ten times less than a relational DB) without impacting performance. This makes it possible to move to a higher tier of storage (for example, from SSD to NVMe). Contrary to in-memory databases, QuasarDB does not require you to explicitly load the data from the disk before performing a query. It transparently pages data in and out.
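The deck does not describe how Delta4C works, so the sketch below is explicitly not that algorithm. It shows a generic, well-known technique, delta-of-delta encoding, only to illustrate why regularly sampled timeseries compress well: timestamps arriving at a near-constant interval turn into a run of small numbers close to zero, which a later bit-packing or entropy-coding stage can store compactly. The function names and sample timestamps are hypothetical.

```python
def delta_of_delta_encode(timestamps):
    """Encode as: first value, then the change in the delta between consecutive values."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    previous, previous_delta = timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - previous
        encoded.append(delta - previous_delta)
        previous, previous_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded):
    """Inverse of delta_of_delta_encode."""
    if not encoded:
        return []
    decoded = [encoded[0]]
    delta = 0
    for change in encoded[1:]:
        delta += change
        decoded.append(decoded[-1] + delta)
    return decoded


# Hypothetical sensor timestamps: one reading every 10 ticks, with a little jitter.
readings = [1000, 1010, 1020, 1030, 1041, 1051]
packed = delta_of_delta_encode(readings)
restored = delta_of_delta_decode(packed)
```

After the first two entries, the encoded stream holds only values such as 0, 1, and -1, and decoding restores the original timestamps exactly.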
  15. What does QuasarDB give you?

    Low total cost of ownership
    The combination of efficient storage, low CPU usage, integration, and ease of use results in tremendous savings for businesses.

    Deploying QuasarDB is cheaper and faster than typical Big Data solutions
    Data analysts use known tools such as Python and SQL to interface with their data: there is no arcane proprietary language to learn, and skills are transferable. QuasarDB integrates ETL, distribution, storage, analytics, orchestration, and connectivity in a single package, radically simplifying deployment and administration and shortening project timelines. Efficient storage means significantly reduced storage costs. Speed means lower CPU usage, which directly translates to lower hosting costs.
  16. JEAN-CLAUDE TAGGER, COO, [email protected]

    USA: 222 BROADWAY – 19TH FLOOR, NEW YORK NY 10038
    UK: 40 BANK STREET, CANARY WHARF, LONDON E14 5NR
    FRANCE: 24, RUE FEYDEAU, 75002 PARIS
  17. Details on Security (1)

    Authentication and rights management
    User authentication is cryptographically strong and based on opaque security tokens, preventing security holes associated with passwords. Administrators can regenerate tokens and disable users at will, should an account be compromised. Users have default privileges, and the database supports fine-grained access control to handle even the most advanced security requirements. Data entitlement can be managed with table-based filters, for example, to prevent users who don’t have the proper privileges from seeing the most recent data.
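The token lifecycle described here (issue, regenerate, disable) can be sketched with Python's standard library. This is a hypothetical illustration of opaque tokens in general; the deck does not specify QuasarDB's token format or storage, and the `TokenRegistry` class is an invented name. The sketch stores only a hash of each token and compares digests in constant time.

```python
import hashlib
import hmac
import secrets


class TokenRegistry:
    """Hypothetical server-side registry of opaque user tokens."""

    def __init__(self):
        self._token_hashes = {}  # user -> SHA-256 digest of the current token
        self._disabled = set()

    def issue(self, user):
        """Create (or regenerate) a random token; any previous token stops working."""
        token = secrets.token_urlsafe(32)
        self._token_hashes[user] = hashlib.sha256(token.encode()).digest()
        return token

    def disable(self, user):
        """Block a user, for example after an account compromise."""
        self._disabled.add(user)

    def verify(self, user, token):
        """Return True only for an enabled user presenting the current token."""
        expected = self._token_hashes.get(user)
        if expected is None or user in self._disabled:
            return False
        presented = hashlib.sha256(token.encode()).digest()
        return hmac.compare_digest(expected, presented)


registry = TokenRegistry()
first_token = registry.issue("alice")
second_token = registry.issue("alice")  # regeneration invalidates first_token
```

Because the token is random and never derived from a user-chosen secret, there is no password to guess or reuse, which is the property the slide highlights.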
  18. Details on Security (2)

    Audit trail
    Every update to the database is recorded in an audit trail, which is itself stored as a timeseries. This timeseries can be protected through access control to prevent external modifications. The audit trail contains the nature of the operation, the timeseries concerned, the modified range, and the user who originated the modification.

    Backup, replication and migration
    QuasarDB supports intra-cluster replication through a configuration setting. This internal replication delivers continuous service should a node fail. Cluster-to-cluster replication is possible through an external tool developed by the QuasarDB team. This tool leverages the audit trail and thus allows one-time replication, migration, and continuous updates. Backup can be performed at the file level or through a backup tool that also leverages the audit trail. Full and incremental backups are thus supported.
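A minimal sketch of how an audit trail can drive incremental replication or backup: each entry carries the fields listed on the slide, and a replica only replays entries newer than the last one it applied. The `AuditEntry` fields mirror the slide, but the field names, the integer timestamps, and the `replicate` function are hypothetical; the actual QuasarDB tools are not described in the deck. The sketch assumes the trail is sorted and that timestamps are unique.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    timestamp: int    # when the update happened
    operation: str    # nature of the operation
    timeseries: str   # timeseries concerned
    range_start: int  # start of the modified range
    range_end: int    # end of the modified range
    user: str         # user who originated the modification


def entries_after(trail, last_applied):
    """Return the entries strictly newer than last_applied (trail sorted by timestamp)."""
    keys = [entry.timestamp for entry in trail]
    return trail[bisect_right(keys, last_applied):]


def replicate(trail, last_applied, apply):
    """Replay pending entries on a replica and return the new high-water mark."""
    for entry in entries_after(trail, last_applied):
        apply(entry)
        last_applied = entry.timestamp
    return last_applied


# Hypothetical trail and a replica that has already applied the entry at timestamp 10.
trail = [
    AuditEntry(10, "insert", "my_stocks", 0, 100, "alice"),
    AuditEntry(20, "insert", "my_stocks", 100, 200, "alice"),
    AuditEntry(30, "erase", "sensors", 0, 50, "bob"),
]
applied = []
high_water_mark = replicate(trail, last_applied=10, apply=applied.append)
```

Starting from a high-water mark of zero gives a full copy; starting from the last applied timestamp gives an incremental one, which matches the full and incremental modes mentioned above.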
  19. Details on Security (3)

    Security
    QuasarDB supports traffic encryption based on AES-256 GCM. The key is 256 bits long and the MAC 128 bits long. Every packet is authenticated to detect accidental or intentional modifications. Session keys are securely exchanged using X25519. Each session key is used only once, which permits Perfect Forward Secrecy (PFS). This means that should an attacker manage to compromise a user certificate, she will not be able to decipher the messages from the past, only future messages.

    QuasarDB security isn’t limited to user management and authentication. It is also deeply integrated into the protocol. The server has several mechanisms to protect against Denial of Service (DoS), whether through maliciously crafted packets or through valid but excessive user requests. For example, there is a server-side parameter that limits how large user packets may be, ensuring the server cannot end up in a situation where it runs out of memory (OOM) because of user requests.

    QuasarDB’s cryptographic primitives are based on NaCl, written by Daniel J. Bernstein, Tanja Lange, and Peter Schwabe.
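The packet-size limit mentioned above can be illustrated with a small length-prefixed frame reader. The wire format (a 4-byte big-endian length followed by the payload), the `MAX_PACKET_BYTES` value, and the names below are assumptions made for this sketch; the deck does not document QuasarDB's protocol. The key point is that the declared length is checked before any payload memory is allocated.

```python
import io
import struct

MAX_PACKET_BYTES = 1 << 20  # hypothetical server-side limit: 1 MiB


class PacketTooLarge(Exception):
    """Raised when a client announces a packet above the configured limit."""


def read_packet(stream, max_bytes=MAX_PACKET_BYTES):
    """Read one length-prefixed packet, rejecting oversized ones up front."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack("!I", header)  # unsigned 32-bit, network byte order
    if length > max_bytes:
        # Refuse before reading or allocating anything for the payload.
        raise PacketTooLarge(f"{length} bytes announced, limit is {max_bytes}")
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload


# Hypothetical well-formed packet, read from an in-memory stream.
good_packet = io.BytesIO(struct.pack("!I", 5) + b"hello")
payload = read_packet(good_packet)
```

A client that announces a 2 GiB packet against a 1 KiB limit is rejected after only four bytes have been read, so a malicious header alone cannot force a large allocation.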