Gian Merlino
[email protected]
Three weird tips for high performance analytics applications
Slide 2
Who am I?
Gian Merlino
Committer & PMC member on
Cofounder & CTO at
Slide 3
Agenda
Intro to Druid
The virtue of brevity
The virtue of foresight
The virtue of flexibility
Slide 4
open source, high-performance,
column-oriented, distributed data store
Slide 5
What is Druid?
● “high performance”: low query latency, high ingest rates
● “column-oriented”: best possible scan rates
● “distributed”: deployed in clusters, typically 10s–100s of nodes
● “data store”: the cluster stores a copy of your data
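To make the "column-oriented" point concrete, here is a minimal sketch in plain Python. It is not Druid code: the sample rows, the field names, and both helper functions are hypothetical and exist only for illustration. It shows the same aggregate computed over a row layout and a column layout. In the column layout the scan reads only the one column it needs.

```python
# Toy illustration only: hypothetical data and helper names, not Druid APIs.

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"timestamp": 1, "page": "a", "bytes": 100},
    {"timestamp": 2, "page": "b", "bytes": 250},
    {"timestamp": 3, "page": "a", "bytes": 75},
]

# Column-oriented layout: each field is stored as its own contiguous list.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def sum_row_oriented(rows, field):
    """Visit every record, then pick out the one field we need."""
    return sum(row[field] for row in rows)


def sum_column_oriented(columns, field):
    """Read only the column we need; other columns are never touched."""
    return sum(columns[field])


total_from_rows = sum_row_oriented(rows, "bytes")
total_from_columns = sum_column_oriented(columns, "bytes")
```

Both functions return the same total. The difference is how much data each one has to walk through, which is where a columnar store gets its scan-rate advantage when a query touches only a few columns.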
Slide 6
The Problem
Slide 7
Powered by Druid
Source: http://druid.io/druid-powered.html
Not an endorsement.
Slide 8
Powered by Druid
From Yahoo:
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html