Introduction to CrateDB (Michael Beer, crate.io)

This talk demonstrates crate.io's open machine data stack for data science, which includes CrateDB, IBM Watson, and other tools.

Presentation given at the 4ländereck Data Science Meetup: https://www.meetup.com/4laendereck-Data-Science-Meetup/events/238393988/

Transcript

  1. Crate Focus: Machine Data Management

    [Chart, 0 to 100 scale, with categories: data from "things", log data, customer/user data, network data, geospatial, social media, metadata, full text, other]
    • Network monitoring
    • IT/cloud infrastructure monitoring
    • Security audit monitoring
    • IoT/sensors
    • Industrial IoT
    • Wearables
    • Machine learning
    72% of production CrateDB deployments manage machine data.
  2. Unique Challenges of Machine Data Management & Analysis

    Millions of data points per second
    • Streaming in from sensors, devices, logs, etc.
    • Highly concurrent, from many connections
    Data diversity
    • Structured and unstructured: JSON, BLOBs
    Real-time query performance
    • Custom apps, analytics, interactive dashboards
    • Monitoring and alerting, machine learning
    Complex queries over big data volumes
    • Query terabytes of historical data
    • Combine streaming data with large volumes of historical data
    CrateDB makes this possible for millions of mainstream SQL developers.
  3. CrateDB Key Ideas

    • Distributed SQL for scale-out
    • Highly concurrent reads and writes
    • Simple scalability
      - Masterless, shared-nothing
      - Microservices architecture
      - Auto-sharding and partitioning
    • Real-time search and aggregations
    • Dynamic schema
    • Time series and geospatial support
    • In-memory speed
  4. Production Scale

    Actual CrateDB cluster deployments in production:
    ‣ On-premises, private cloud, or public cloud
    ‣ e.g. 100+ nodes in 4 data centers
    ‣ Billions of inserts per day
    ‣ Peak speeds: several million inserts per second
    ‣ Hundreds of TB of data
    ‣ Running on cheap commodity hardware, yet highly available
  5. The Crate Open Machine Data Stack: build your own with SQL

    [Diagram layers: Input, DB, Apps (custom SQL apps)]
    ‣ Integrates easily
    ‣ Low learning curve
    ‣ Greatest flexibility
    ‣ No lock-in
  6. Crate Customer: Skyhigh Networks

    Real-time cloud security monitoring
    • Millions of users
    • 40% of the Fortune 500
    • 600+ enterprise customers
    Data:
    • Billions of inserts per day (peak 100K/sec)
    • Tens of thousands of concurrent TCP connections
    Implementation:
    • Replaced MySQL and Elasticsearch with CrateDB
    • 75% fewer servers after the switch
    • 20x faster performance
  7. Crate Customer: Digital Domain

    [Film titles shown: Oblivion, Iron Man 3, Independence Day: Resurgence, X-Men: Apocalypse, 47 Ronin, The Girl with the Dragon Tattoo, Aeon Flux, The Fifth Element]
    IM360: a recently launched, immersive 360-degree streaming video experience
    Data:
    • Server log monitoring
    • Crate enables real-time metrics from a huge AWS server farm, second by second as events unfold
    Implementation:
    • Grafana as the visualization engine
    • “Better time series than Kibana”
  8. Artificial Intelligence with Watson

    [Diagram elements, in order: GitHub Archive (https://www.githubarchive.org/), data since 12 Feb. 2011 up to today; github.raw; Jupyter (http://jupyter.org/index.html); github.event; Node-RED (https://nodered.org/); github.event; IBM Watson (https://www.ibm.com/watson/)]
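The "Unique Challenges" slide stresses real-time queries over streaming and historical data for dashboards, monitoring, and alerting. Queries like these typically aggregate timestamped readings into time buckets. The sketch below is plain Python with no database. It only mirrors what a query such as `SELECT date_trunc('minute', ts), avg(value) ... GROUP BY 1` would compute. The `readings` table and its `ts`/`value` columns are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Roughly equivalent SQL a dashboard might send to CrateDB (hypothetical table/columns):
#   SELECT date_trunc('minute', ts) AS minute, avg(value)
#   FROM readings GROUP BY 1 ORDER BY 1;


def truncate_to_minute(ts: datetime) -> datetime:
    """Drop seconds and microseconds, mirroring date_trunc('minute', ts)."""
    return ts.replace(second=0, microsecond=0)


def per_minute_average(readings):
    """Group (timestamp, value) pairs into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[truncate_to_minute(ts)].append(value)
    # Return buckets in chronological order, like ORDER BY 1
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


readings = [
    (datetime(2017, 3, 1, 12, 0, 5, tzinfo=timezone.utc), 1.0),
    (datetime(2017, 3, 1, 12, 0, 40, tzinfo=timezone.utc), 3.0),
    (datetime(2017, 3, 1, 12, 1, 10, tzinfo=timezone.utc), 10.0),
]
for minute, avg in per_minute_average(readings).items():
    print(minute.isoformat(), avg)
```

In a real deployment the database does this grouping close to the data, so a dashboard receives one row per bucket instead of every raw reading.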
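The "CrateDB Key Ideas" slide lists auto-sharding and partitioning, a dynamic schema, and geospatial support. The minimal sketch below shows how these features might surface in CrateDB table DDL, written as a Python helper that only builds the statement. The table, the columns (`device_id`, `ts`, `day`, `location`, `payload`), and the shard count are all hypothetical. The clause syntax follows current CrateDB documentation, so check it against the version you run.

```python
def sensor_table_ddl(table: str, shards: int = 6) -> str:
    """Build a CREATE TABLE statement for a hypothetical sensor-readings table.

    - PARTITIONED BY (day): one partition per day, derived by a generated column.
    - CLUSTERED BY (device_id) INTO n SHARDS: rows are spread over n shards
      (per partition) by routing on device_id.
    - GEO_POINT: a geospatial column type.
    - OBJECT(DYNAMIC): new keys inside `payload` are accepted without ALTER TABLE.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return (
        f"CREATE TABLE {table} (\n"
        "  device_id TEXT,\n"
        "  ts TIMESTAMP WITH TIME ZONE,\n"
        "  day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts),\n"
        "  location GEO_POINT,\n"
        "  payload OBJECT(DYNAMIC)\n"
        ")\n"
        "PARTITIONED BY (day)\n"
        f"CLUSTERED BY (device_id) INTO {shards} SHARDS"
    )


print(sensor_table_ddl("sensor_readings", shards=4))
```

Partitioning by a day column derived from `ts` is a common pattern for time series. It lets old days be dropped or queried as whole partitions, while sharding spreads each day's writes across nodes.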
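The open machine data stack is pitched as "build your own with SQL". One low-friction way for a custom app to write into CrateDB is over plain HTTP. The minimal sketch below assumes CrateDB's `/_sql` HTTP endpoint on the default port 4200, and uses its `bulk_args` field for batched inserts. The URL, table, and columns are illustrative only, and the request is built but not sent.

```python
import json
import urllib.request


def bulk_insert_request(url: str, table: str, columns, rows) -> urllib.request.Request:
    """Build (but do not send) a POST to CrateDB's HTTP SQL endpoint.

    One parameterised INSERT is sent together with many argument lists in
    `bulk_args`, so a whole batch of rows travels in a single request.
    """
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    body = {"stmt": stmt, "bulk_args": [list(row) for row in rows]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = bulk_insert_request(
    "http://localhost:4200/_sql",  # assumed: a local node on the default HTTP port
    "sensor_readings",             # hypothetical table
    ["device_id", "ts"],
    [("dev-1", "2017-03-01T12:00:00Z"), ("dev-2", "2017-03-01T12:00:01Z")],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it requires a running CrateDB node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Batching many rows per request is one way to approach high insert rates. The slides do not say how the production deployments they mention actually ingest data.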
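The Skyhigh Networks figures mix a daily volume ("billions of inserts per day") with a per-second peak (100K/sec). The back-of-the-envelope check below puts the two in relation. It assumes exactly one billion inserts per day, the lower bound of "billions", spread evenly over a 24-hour day.

```latex
\[
\frac{10^{9}\ \text{inserts/day}}{86\,400\ \text{s/day}} \approx 11\,574\ \text{inserts/s (average)},
\qquad
\frac{100\,000\ \text{inserts/s (peak)}}{11\,574\ \text{inserts/s}} \approx 8.6
\]
```

Under that assumption, the quoted peak is roughly 8.6 times the average write rate. A higher daily volume would shrink this ratio.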
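The "Artificial Intelligence with Watson" slide starts from GitHub Archive data that reaches back to 12 Feb. 2011. The minimal sketch below enumerates the hourly archive files for a time range. It assumes GH Archive's current download host (`data.gharchive.org`) and its `YYYY-MM-DD-H.json.gz` file naming, in which the hour is not zero-padded. The slide does not show how the talk downloaded the data or loaded it into `github.raw`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: GH Archive's current host and hourly naming scheme,
# e.g. https://data.gharchive.org/2015-01-01-15.json.gz (hour not zero-padded).
BASE_URL = "https://data.gharchive.org"


def hourly_archive_urls(start: datetime, end: datetime):
    """Yield one archive URL per hour in [start, end), oldest first."""
    t = start.replace(minute=0, second=0, microsecond=0)
    while t < end:
        yield f"{BASE_URL}/{t:%Y-%m-%d}-{t.hour}.json.gz"
        t += timedelta(hours=1)


# The slide's range starts on 12 Feb. 2011; this lists only the first three hours.
start = datetime(2011, 2, 12, tzinfo=timezone.utc)
for url in hourly_archive_urls(start, start + timedelta(hours=3)):
    print(url)
```

Generating the URLs lazily keeps the full span, roughly one file per hour since 2011, cheap to iterate, for example to feed a downloader that loads each file into a raw table.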