Apache Druid

What Is Druid?
• Real-Time Analytics Database
• Distributed Architecture
• Open Source
• Highly Performant
• Time Series Database
• Apache License 2.0
• Written in Java

Druid Use Cases
• User activity and behaviour
• Network flows
• Digital marketing
• Application performance management
• IoT and device metrics
• OLAP and business intelligence
Suited to workloads that need real-time data ingestion, fast queries and high uptime.

Druid Features
• Column-oriented storage
• Native search indexes
• Streaming and batch ingest
• Flexible schemas
• Time-optimized partitioning
• SQL support
• Horizontal scalability
• Easy operation
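
Time-optimized partitioning and column-oriented storage work together: Druid splits a datasource into segments by time chunk, and each segment holds its data column by column. The Python sketch below is a simplified in-memory model, not Druid's on-disk segment format. It assumes a DAY segment granularity and UTC timestamps, and the names `partition_by_day` and `Event` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical row shape: (ISO-8601 timestamp, country, clicks)
Event = tuple[str, str, int]


def partition_by_day(events: list[Event]) -> dict[str, dict[str, list]]:
    """Group rows into one time chunk per UTC day, storing each chunk column-wise.

    Simplified model only: assumes DAY granularity and plain Python lists as columns.
    """
    chunks: dict[str, dict[str, list]] = defaultdict(
        lambda: {"__time": [], "country": [], "clicks": []}
    )
    for ts, country, clicks in events:
        t = datetime.fromisoformat(ts).astimezone(timezone.utc)
        start = t.replace(hour=0, minute=0, second=0, microsecond=0)
        # Interval key in "start/end" form, one per day
        interval = f"{start.isoformat()}/{(start + timedelta(days=1)).isoformat()}"
        cols = chunks[interval]
        # Column-oriented: each field is appended to its own list
        cols["__time"].append(t.isoformat())
        cols["country"].append(country)
        cols["clicks"].append(clicks)
    return dict(chunks)


events = [
    ("2020-01-01T10:00:00+00:00", "UK", 3),
    ("2020-01-01T23:30:00+00:00", "US", 5),
    ("2020-01-02T01:15:00+00:00", "UK", 2),
]
for interval, cols in sorted(partition_by_day(events).items()):
    # A query on one day touches one chunk; summing clicks reads one column.
    print(interval, sum(cols["clicks"]))
```

The point of the model is pruning: a time filter selects whole chunks, and an aggregation reads only the columns it names.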

Druid Users
• Airbnb
• Alibaba
• Booking.com
• Cisco
• eBay
• Hulu
• Lyft
• Outbrain
• PayPal
• Pinterest
• Slack
• Twitter
• Walmart
• Yahoo
These are some of the better-known users, among many others.

Druid MetaStore
• Stores metadata about the system and the data it holds
• Can use one of the following databases
  – Derby, MySQL, PostgreSQL
• Stores metadata such as
  – Segments, rules, config
  – Tasks, audit
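
As a sketch of how the MetaStore backend is chosen, the function below renders the connector settings that normally go in Druid's `common.runtime.properties`. The `druid.metadata.storage.*` property names and the `mysql-metadata-storage` / `postgresql-metadata-storage` extension names follow the Druid configuration reference, but they should be checked against the docs for your Druid version. The helper `metadata_properties`, the host, database name and credentials are all hypothetical.

```python
import json

# Extension that provides each metadata store; Derby needs none.
EXTENSIONS = {
    "derby": None,
    "mysql": "mysql-metadata-storage",
    "postgresql": "postgresql-metadata-storage",
}
DEFAULT_PORTS = {"derby": 1527, "mysql": 3306, "postgresql": 5432}


def metadata_properties(kind: str, host: str, database: str,
                        user: str = "", password: str = "") -> str:
    """Render MetaStore settings for common.runtime.properties (hypothetical helper)."""
    if kind not in EXTENSIONS:
        raise ValueError(f"unsupported metadata store: {kind}")
    port = DEFAULT_PORTS[kind]
    lines = []
    if EXTENSIONS[kind]:
        # A real loadList usually names other extensions too (deep storage etc.)
        lines.append(f"druid.extensions.loadList={json.dumps([EXTENSIONS[kind]])}")
    lines.append(f"druid.metadata.storage.type={kind}")
    if kind == "derby":
        # Embedded-style default for single-machine setups
        lines.append("druid.metadata.storage.connector.connectURI="
                     f"jdbc:derby://{host}:{port}/{database};create=true")
    else:
        lines.append("druid.metadata.storage.connector.connectURI="
                     f"jdbc:{kind}://{host}:{port}/{database}")
        lines.append(f"druid.metadata.storage.connector.user={user}")
        lines.append(f"druid.metadata.storage.connector.password={password}")
    return "\n".join(lines)


print(metadata_properties("postgresql", "metadata.example.internal", "druid",
                          user="druid", password="change-me"))
```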

Druid Deep Storage
• Deep storage persists Druid segment data
• Uses storage such as
  – Local mounts, AWS S3, HDFS
• Core extensions are maintained by the Druid committers
• Other extensions include
  – Azure, Cassandra, Cloudfiles
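
Deep storage is the durable copy of every segment; Historical processes serve queries from a local segment cache that they fill by pulling from it. The Python sketch below models only that relationship, with a dictionary standing in for S3, HDFS or a local mount. The class names and the segment ID are hypothetical, and real segment loading is driven by the Coordinator rather than called directly.

```python
from dataclasses import dataclass, field


@dataclass
class DeepStorage:
    """Stand-in for S3 / HDFS / a local mount: the durable copy of each segment."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def push(self, segment_id: str, data: bytes) -> None:
        self.blobs[segment_id] = data

    def pull(self, segment_id: str) -> bytes:
        return self.blobs[segment_id]


@dataclass
class Historical:
    """Serves segments from a local cache, downloading from deep storage on a miss."""
    deep: DeepStorage
    cache: dict[str, bytes] = field(default_factory=dict)
    downloads: int = 0

    def load(self, segment_id: str) -> bytes:
        if segment_id not in self.cache:
            self.cache[segment_id] = self.deep.pull(segment_id)
            self.downloads += 1
        return self.cache[segment_id]


deep = DeepStorage()
# Ingestion publishes a finished segment (hypothetical ID) to deep storage.
deep.push("wikipedia_2020-01-01_v1", b"columnar segment bytes")

h1 = Historical(deep)
h1.load("wikipedia_2020-01-01_v1")
h1.load("wikipedia_2020-01-01_v1")  # served from the local cache
# If h1 is lost, a replacement rebuilds its cache from deep storage.
h2 = Historical(deep)
print(h2.load("wikipedia_2020-01-01_v1") == deep.pull("wikipedia_2020-01-01_v1"))
```

Because the durable copy lives outside the Historicals, losing a Historical costs a reload, not data.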

Druid Architecture

Druid Architecture 2

Druid Processes
• Historical – stores and queries historical data
• MiddleManager – ingests new data
• Broker – processes client queries
• Coordinator – watches over Historical processes
• Overlord – watches over MiddleManager processes
• Router (optional) – provides a unified API gateway
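
The six processes are usually grouped onto three server types, which the Druid documentation calls Master, Query and Data servers. The Python sketch below records that grouping with each process's default plaintext port as given in the Druid docs (8090 for the Overlord assumes it runs separately from the Coordinator). Deployments often change these ports, and the `Process` class and `by_server` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    server: str  # server type the process is usually deployed on
    port: int    # default plaintext port; commonly overridden
    role: str


PROCESSES = [
    Process("Coordinator", "Master", 8081, "manages segment availability on Historicals"),
    Process("Overlord", "Master", 8090, "assigns ingestion tasks to MiddleManagers"),
    Process("Broker", "Query", 8082, "receives client queries and merges results"),
    Process("Router", "Query", 8888, "optional gateway to Brokers, Coordinators, Overlords"),
    Process("Historical", "Data", 8083, "stores and queries historical segments"),
    Process("MiddleManager", "Data", 8091, "runs ingestion tasks for new data"),
]


def by_server(procs: list[Process]) -> dict[str, list[str]]:
    """Group process names by server type, preserving list order."""
    groups: dict[str, list[str]] = {}
    for p in procs:
        groups.setdefault(p.server, []).append(p.name)
    return groups


for server, names in by_server(PROCESSES).items():
    print(f"{server} server: {', '.join(names)}")
# Master server: Coordinator, Overlord
# Query server: Broker, Router
# Data server: Historical, MiddleManager
```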

Druid Query
• Druid supports both native JSON queries and SQL queries
• The SQL syntax includes these GROUP BY options
  – GROUPING SETS computes several groupings in one query, avoiding repeated scans of the data
  – ROLLUP adds subtotals for each level of a column hierarchy, plus a grand total
  – CUBE produces a grouping for every combination of the listed columns
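
The three GROUP BY options differ only in which grouping sets they expand to. The Python sketch below shows that expansion using standard SQL semantics, then sums a measure for every set in a single pass over the rows, which is where GROUPING SETS saves scanning. It illustrates the semantics only, not Druid's query engine; the helper names and sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rollup(cols: list[str]) -> list[tuple[str, ...]]:
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(cols: list[str]) -> list[tuple[str, ...]]:
    """CUBE(a, b) expands to every subset: (a, b), (a), (b), ()."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def grouping_sets(rows: list[dict], sets: list[tuple[str, ...]], measure: str):
    """Sum `measure` for every grouping set in a single pass over `rows`."""
    totals = {gs: Counter() for gs in sets}
    for row in rows:      # each row is read once ...
        for gs in sets:   # ... and contributes to every grouping set
            key = tuple(row[c] for c in gs)
            totals[gs][key] += row[measure]
    return totals


rows = [
    {"country": "UK", "city": "London", "clicks": 3},
    {"country": "UK", "city": "Leeds", "clicks": 2},
    {"country": "US", "city": "Boston", "clicks": 5},
]
print(rollup(["country", "city"]))  # [('country', 'city'), ('country',), ()]
print(cube(["country", "city"]))    # [('country', 'city'), ('country',), ('city',), ()]

# Roughly what GROUP BY ROLLUP (country, city) with SUM(clicks) computes
totals = grouping_sets(rows, rollup(["country", "city"]), "clicks")
print(dict(totals[("country",)]))   # {('UK',): 5, ('US',): 5}
print(dict(totals[()]))             # {(): 10}
```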

Druid High Availability (HA)
• Use 3 or 5 ZooKeeper nodes on their own hardware
• Use MySQL or PostgreSQL for the MetaStore
  – With replication and failover
• Use multiple Coordinators and Overlords
  – Sharing the same MetaStore and ZooKeeper
• Scale Brokers horizontally
• Use a load balancer in front of the Brokers
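
The advice to run 3 or 5 ZooKeeper nodes follows from majority quorums: the ensemble stays available while a strict majority of nodes is up. The short Python calculation below (the `zk_quorum` helper is hypothetical) shows that an even-sized ensemble tolerates no more failures than the odd size below it.

```python
def zk_quorum(n: int) -> tuple[int, int]:
    """Return (majority needed, node failures tolerated) for an n-node ensemble."""
    majority = n // 2 + 1
    return majority, n - majority


for n in range(3, 7):
    majority, tolerated = zk_quorum(n)
    print(f"{n} nodes: quorum {majority}, survives {tolerated} failure(s)")
# 3 nodes: quorum 2, survives 1 failure(s)
# 4 nodes: quorum 3, survives 1 failure(s)
# 5 nodes: quorum 3, survives 2 failure(s)
# 6 nodes: quorum 4, survives 2 failure(s)
```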

Available Books
• See “Big Data Made Easy” – Apress, Jan 2015
• See “Mastering Apache Spark” – Packt, Oct 2015
• See “Complete Guide to Open Source Big Data Stack” – Apress, Jan 2018
• Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
• Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020

Connect
• Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020
• See my open source blog – open-source-systems.blogspot.com/
• I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration

Mike Frampton