Time Series Processing with Solr and Spark

Slide 1

Slide 1 text

OCTOBER 11-14, 2016 • BOSTON, MA

Slide 2

Slide 2 text

Time Series Processing with Solr and Spark
Josef Adersberger (@adersberger), CTO, QAware

Slide 3

Slide 3 text

TIME SERIES 101

Slide 4

Slide 4 text

WE’RE SURROUNDED BY TIME SERIES
▸ Operational data: Monitoring data, performance metrics, log events, …
▸ Data Warehouse: Dimension time
▸ Measured Me: Activity tracking, ECG, …
▸ Sensor telemetry: Sensor data, …
▸ Financial data: Stock charts, …
▸ Climate data: Temperature, …
▸ Web tracking: Clickstreams, …
▸ …
@adersberger

Slide 5

Slide 5 text

WE’RE SURROUNDED BY TIME SERIES (Pt. 2)
▸ Oktoberfest: Visitor and beer consumption trend (chart label: “the singularity”)

Slide 6

Slide 6 text

TIME SERIES: BASIC TERMS
▸ univariate time series
▸ multivariate time series
▸ multi-dimensional time series (time series tensor)
▸ time series set
▸ observation
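
The terms can be made concrete with a few plain Java types. This is a minimal sketch under our own assumptions: the class names (TimeSeriesTerms, Observation, Series) are hypothetical and are not the Chronix data model; an observation is a (timestamp, value) pair, a univariate series holds one value per timestamp plus its dimensions, and a set is just a list of series.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical types illustrating the basic terms (not the Chronix API). */
final class TimeSeriesTerms {

    /** Observation: one timestamp (epoch millis) with one numeric value. */
    static final class Observation {
        final long timestamp;
        final double value;

        Observation(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Univariate time series: one value per timestamp. The dimensions (host, process, ...)
     * are what make a set of such series multi-dimensional.
     */
    static final class Series {
        final Map<String, String> dimensions;
        final List<Observation> observations = new ArrayList<>();

        Series(Map<String, String> dimensions) {
            this.dimensions = new HashMap<>(dimensions); // defensive copy
        }

        Series add(long timestamp, double value) {
            observations.add(new Observation(timestamp, value));
            return this;
        }
    }

    // A multivariate series would instead hold several values per timestamp (e.g. a double[]).

    /** Time series set: a collection of series, here two series that differ in one dimension. */
    static List<Series> exampleSet() {
        Map<String, String> dims = new HashMap<>();
        dims.put("host", "aws42");
        dims.put("process", "tomcat");
        Series tomcat = new Series(dims).add(1000L, 512.0).add(2000L, 530.0);
        dims.put("process", "jenkins");
        Series jenkins = new Series(dims).add(1000L, 128.0);
        return new ArrayList<>(Arrays.asList(tomcat, jenkins));
    }
}
```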

Slide 7

Slide 7 text

ILLUSTRATIVE OPERATIONS ON TIME SERIES
▸ Time series => Time series: align, diff, downsampling, outlier
▸ Time series => Scalar: min/max, avg/med, slope, std-dev
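
A few of these operations, sketched in plain Java over parallel arrays of timestamps and values. This is an illustration under our own assumptions (the class and method names are hypothetical): diff and downsampling map a series to a series, while slope (least-squares fit over time) and std-dev (population standard deviation) reduce a series to a scalar.

```java
import java.util.Arrays;

/** Hypothetical sketches of illustrative time series operations. */
final class TimeSeriesOps {

    /** Time series => time series: difference between consecutive values (length n - 1). */
    static double[] diff(double[] values) {
        double[] out = new double[Math.max(0, values.length - 1)];
        for (int i = 1; i < values.length; i++) {
            out[i - 1] = values[i] - values[i - 1];
        }
        return out;
    }

    /** Time series => time series: average over non-overlapping buckets of bucketSize values. */
    static double[] downsampleAvg(double[] values, int bucketSize) {
        if (bucketSize <= 0) throw new IllegalArgumentException("bucketSize must be > 0");
        int buckets = (values.length + bucketSize - 1) / bucketSize;
        double[] out = new double[buckets];
        for (int b = 0; b < buckets; b++) {
            int from = b * bucketSize;
            int to = Math.min(values.length, from + bucketSize);
            out[b] = Arrays.stream(values, from, to).average().orElse(Double.NaN);
        }
        return out;
    }

    /** Time series => scalar: slope of the least-squares line through (timestamp, value). */
    static double slope(long[] timestamps, double[] values) {
        double meanT = Arrays.stream(timestamps).average().orElse(Double.NaN);
        double meanV = Arrays.stream(values).average().orElse(Double.NaN);
        double num = 0;
        double den = 0;
        for (int i = 0; i < values.length; i++) {
            double dt = timestamps[i] - meanT;
            num += dt * (values[i] - meanV);
            den += dt * dt;
        }
        return num / den;
    }

    /** Time series => scalar: population standard deviation of the values. */
    static double stdDev(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sumSq = 0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSq / values.length);
    }
}
```

For example, diff of {1, 4, 9, 16} is {3, 5, 7}, and the slope through (0, 1), (1, 3), (2, 5), (3, 7) is 2.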

Slide 8

Slide 8 text

OUR USE CASE

Slide 9

Slide 9 text

Monitoring data analysis of a business-critical, worldwide distributed software system. Enable root cause analysis and anomaly detection.
▸ > 1,000 nodes worldwide
▸ > 10 processes per node
▸ > 20 metrics per process (OS, JVM, app-specific)
▸ Measured every second
= about 6.3 trillion observations p.a.
▸ Data retention: 5 yrs.
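
The headline figure follows from the lower bounds on the slide, assuming a 365-day year; the last line adds the 5-year retention period:

```latex
\begin{align*}
N_{\text{series}} &= 1{,}000 \times 10 \times 20 = 2 \times 10^{5} \\
N_{\text{obs/yr}} &= 2 \times 10^{5} \times (365 \times 86{,}400) = 2 \times 10^{5} \times 3.1536 \times 10^{7} \approx 6.3 \times 10^{12} \\
N_{\text{obs, 5 yrs}} &\approx 5 \times 6.3 \times 10^{12} \approx 3.2 \times 10^{13}
\end{align*}
```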

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

USE CASE: EXPLORING
▸ Drill-down: host, process, measurements, counters (metrics)
▸ Query time series metadata
▸ Superimpose time series

Slide 12

Slide 12 text

USE CASE: STATISTICS

Slide 13

Slide 13 text

USE CASE: ANOMALY DETECTION
Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection) and Yahoo EGADS (https://github.com/yahoo/egads)

Slide 14

Slide 14 text

USE CASE: SQL AND ZEPPELIN

Slide 15

Slide 15 text

CHRONIX SPARK https://github.com/ChronixDB/chronix.spark

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

http://www.datasciencecentral.com

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

AVAILABLE TIME SERIES DATABASES https://github.com/qaware/big-data-landscape

Slide 20

Slide 20 text

EASY-TO-USE BIG TIME SERIES DATA STORAGE & PROCESSING ON SPARK

Slide 21

Slide 21 text

THE CHRONIX STACK (chronix.io)
▸ Big time series database
▸ Scale-out
▸ Storage-efficient
▸ Interactive queries
▸ No separate servers: drop-in to existing Solr and Spark installations
▸ Integrated into the relevant open source ecosystem
(Diagram: Core: Chronix Storage, Chronix Server, Chronix Spark, Chronix Format; Collection: Logstash, fluentd, collectd, Prometheus; Ingestion Bridge: KairosDB, OpenTSDB, InfluxDB, Graphite; Analytics: Chronix Analytics; Frontends: Grafana, Zeppelin)

Slide 22

Slide 22 text

Distributed Data & Data Retrieval
‣ Data sharding
‣ Fast index-based queries
‣ Efficient storage format
Distributed Processing
‣ Heavy-lifting distributed processing
‣ Efficient integration of Spark and Solr
Result Processing
‣ Post-processing on a smaller set of time series
Data flow icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (tasklist)

Slide 23

Slide 23 text

TIME SERIES MODEL
Set of univariate multi-dimensional numeric time series
▸ set … because it’s more flexible and better to parallelise if operations can input and output multiple time series.
▸ univariate … because multivariate will introduce too much complexity (and we have our set to bundle multiple time series).
▸ multi-dimensional … because the ability to slice & dice in the set of time series is very convenient for a lot of use cases.
▸ numeric … because it’s the most common use case.
A single time series is identified by a combination of its non-temporal dimensional values (e.g. unit “mem usage” + host “aws42” + process “tomcat”).
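
The identity rule above can be sketched as a key type whose equality is defined only by the dimension values. SeriesIdentity is a hypothetical name for illustration (the deck later uses a MetricTimeSeriesKey for this role, whose internals are not shown). Map equality already ignores insertion order; the sorted copy additionally gives a stable string form.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical identity key: a series is identified by its non-temporal dimension values. */
final class SeriesIdentity {
    private final Map<String, String> dimensions;

    SeriesIdentity(Map<String, String> dimensions) {
        // Sorted, read-only copy: a stable toString() and no aliasing with the caller's map.
        this.dimensions = Collections.unmodifiableMap(new TreeMap<>(dimensions));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SeriesIdentity && dimensions.equals(((SeriesIdentity) o).dimensions);
    }

    @Override
    public int hashCode() {
        return dimensions.hashCode();
    }

    @Override
    public String toString() {
        return dimensions.toString();
    }
}
```

Because equals and hashCode are consistent, such keys can be used directly for grouping, e.g. in a HashMap or in a Spark groupBy.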

Slide 24

Slide 24 text

CHRONIX SPARK API: ENTRY POINTS
ChronixSparkContext
‣ Creates ChronixRDDs
‣ Speaks with the Chronix Server (Solr)
ChronixRDD
‣ Represents a set of time series
‣ Distributed operations on sets of time series

Slide 25

Slide 25 text

CHRONIX SPARK API: DATA MODEL
ChronixRDD (a set of MetricTimeSeries)
+ toDataFrame() → DataFrame
+ toDataset() → Dataset of MetricTimeSeries
+ toObservationsDataset() → Dataset of MetricObservation
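
The observation-level view flattens each series into one row per (timestamp, value), with the metric and dimensions copied onto every row so that rows can be filtered and ordered independently (as the DataFrame query on slide 30 does with time, value, process and metric). The following is a plain-Java sketch of that idea under our own assumptions; Series, Row and flatten are hypothetical names, not the Chronix implementation of toObservationsDataset().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flatten a set of series into one row per observation. */
final class ObservationFlattening {

    static final class Series {
        final String metric;
        final Map<String, String> dimensions;
        final long[] timestamps;
        final double[] values;

        Series(String metric, Map<String, String> dimensions, long[] timestamps, double[] values) {
            this.metric = metric;
            this.dimensions = dimensions;
            this.timestamps = timestamps;
            this.values = values;
        }
    }

    static final class Row {
        final String metric;
        final Map<String, String> dimensions;
        final long time;
        final double value;

        Row(String metric, Map<String, String> dimensions, long time, double value) {
            this.metric = metric;
            this.dimensions = dimensions;
            this.time = time;
            this.value = value;
        }
    }

    static List<Row> flatten(List<Series> set) {
        List<Row> rows = new ArrayList<>();
        for (Series s : set) {
            for (int i = 0; i < s.timestamps.length; i++) {
                // Every row carries its series' metric and dimensions.
                rows.add(new Row(s.metric, s.dimensions, s.timestamps[i], s.values[i]));
            }
        }
        return rows;
    }
}
```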

Slide 26

Slide 26 text

SPARK APIs FOR DATA PROCESSING
             RDD      DataFrame   Dataset
typed        yes      no          yes
optimized    medium   highly      highly
mature       yes      yes         medium
SQL          no       yes         no

Slide 27

Slide 27 text

CHRONIX RDD
▸ The set characteristic: a JavaRDD of MetricTimeSeries
▸ Statistical operations
▸ Filter the set (esp. by dimensions)
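
The two set-level operations can be illustrated with plain Java streams instead of Spark. This is a hypothetical stand-in, not the ChronixRDD API: filterBy keeps the series whose dimension matches a value, and stats computes min/max/mean over all values of all series in the (filtered) set, pooled together.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Plain-Java stand-in for set-level filtering and statistics (not the ChronixRDD API). */
final class SeriesSetOps {

    static final class Series {
        final Map<String, String> dimensions;
        final double[] values;

        Series(Map<String, String> dimensions, double[] values) {
            this.dimensions = dimensions;
            this.values = values;
        }
    }

    /** Filter the set by one dimension value, e.g. host = "aws42". */
    static List<Series> filterBy(List<Series> set, String dimension, String value) {
        return set.stream()
                .filter(s -> value.equals(s.dimensions.get(dimension)))
                .collect(Collectors.toList());
    }

    /** Overall min/max/mean across all values of all series in the set. */
    static DoubleSummaryStatistics stats(List<Series> set) {
        return set.stream()
                .flatMapToDouble(s -> Arrays.stream(s.values))
                .summaryStatistics();
    }
}
```

In Spark the same shape would be a filter on the RDD followed by an aggregation that runs in parallel on the partitions.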

Slide 28

Slide 28 text

METRICTIMESERIES DATA TYPE
▸ The multi-dimensionality: get/set dimensions (attributes)
▸ Access all timestamps
▸ Access all numeric values
▸ Access all observations as stream

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

// Create a Chronix Spark context from a SparkContext / JavaSparkContext
ChronixSparkContext csc = new ChronixSparkContext(sc);

// Read data into a ChronixRDD
SolrQuery query = new SolrQuery(
        "metric:\"java.lang:type=Memory/HeapMemoryUsage/used\"");
ChronixRDD rdd = csc.query(query,
        "localhost:9983",               // ZooKeeper host
        "chronix",                      // Solr collection for Chronix
        new ChronixSolrCloudStorage());

// Calculate the overall min/max/mean of all time series in the RDD
double min = rdd.min();
double max = rdd.max();
double mean = rdd.mean();

// Switch to a DataFrame and query it
DataFrame df = rdd.toDataFrame(sqlContext);
DataFrame res = df
        .select("time", "value", "process", "metric")
        .where("process='jenkins-jolokia'")
        .orderBy("time");
res.show();

Slide 31

Slide 31 text

CHRONIX SPARK INTERNALS

Slide 32

Slide 32 text

Distributed Data & Data Retrieval
‣ Data sharding (OK)
‣ Fast index-based queries (OK)
‣ Efficient storage format

Slide 33

Slide 33 text

CHRONIX FORMAT: CHUNKING TIME SERIES
Logical: TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ dimensions: Map ‣ observations: byte[]
Physical: n chunks, each a TIME SERIES with the same fields ‣ start: TimeStamp ‣ end: TimeStamp ‣ dimensions: Map ‣ observations: byte[]
Chunking:
‣ 1 logical time series = n physical time series (chunks)
‣ 1 chunk = fixed amount of observations
‣ 1 chunk = 1 Solr document
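
A minimal sketch of the chunking step, assuming sorted timestamps and a chunk size given as a number of observations. The names are hypothetical, and the raw arrays stand in for the encoded observations: byte[] field, which each chunk would carry in Chronix together with its dimensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: split one logical series into chunks of at most chunkSize observations. */
final class Chunking {

    static final class Chunk {
        final long start;          // first timestamp in the chunk
        final long end;            // last timestamp in the chunk
        final long[] timestamps;   // stands in for the encoded observations
        final double[] values;

        Chunk(long[] timestamps, double[] values) {
            this.timestamps = timestamps;
            this.values = values;
            this.start = timestamps[0];
            this.end = timestamps[timestamps.length - 1];
        }
    }

    /** Assumes timestamps are sorted ascending and both arrays have the same length. */
    static List<Chunk> chunk(long[] timestamps, double[] values, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        List<Chunk> chunks = new ArrayList<>();
        for (int from = 0; from < timestamps.length; from += chunkSize) {
            int to = Math.min(timestamps.length, from + chunkSize);
            chunks.add(new Chunk(
                    Arrays.copyOfRange(timestamps, from, to),
                    Arrays.copyOfRange(values, from, to)));
        }
        return chunks;
    }
}
```

Each resulting chunk would become one Solr document, with start and end indexed so that time-range queries can skip chunks without decoding them.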

Slide 34

Slide 34 text

CHRONIX FORMAT: ENCODING OF OBSERVATIONS
▸ Binary encoding of all timestamp/value pairs (observations) with ProtoBuf, incl. binary compression.
▸ Delta encoding leads to more effective binary compression …
… of time stamps (DDC, Date-Delta-Compaction): a chunk has a timespan and a number of observations. The periodic distributed time stamps (pts) follow from timespan / nbr. of observations and are compared with the real time stamps (rts):
  if |pts(x) - rts(x)| < threshold : rts(x) = pts(x)
  value_to_store = pts(x) - rts(x)
… of values (diff):
  value_to_store = value(x) - value(x-1)
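
The two delta encodings can be sketched directly from the formulas on the slide. This is a simplified illustration, not Chronix's actual DDC codec: the class and method names are hypothetical, and as our own assumption the periodic grid spans n - 1 intervals so that its first and last points coincide with the chunk's start and end (which are kept as chunk metadata and needed for decoding). Timestamps within the threshold are snapped to the grid (lossy by design); value deltas are lossless for integer-valued data, while for arbitrary doubles the running sum can accumulate rounding error.

```java
/** Simplified sketch of the slide's delta-encoding formulas; NOT Chronix's actual codec. */
final class DeltaEncodingSketch {

    /** Periodic time stamp x on a grid over [start, end] with n points (assumed n - 1 intervals). */
    static long periodic(long start, long end, int n, int x) {
        long intervals = Math.max(1, n - 1);
        return start + x * (end - start) / intervals;
    }

    static long[] encodeTimestamps(long[] rts, long threshold) {
        int n = rts.length;
        long start = rts[0];
        long end = rts[n - 1];
        long[] stored = new long[n];
        for (int x = 0; x < n; x++) {
            long deviation = periodic(start, end, n, x) - rts[x];
            // Within the threshold, rts(x) := pts(x), so the stored deviation is 0.
            stored[x] = Math.abs(deviation) < threshold ? 0L : deviation;
        }
        return stored;
    }

    /** start and end come from the chunk metadata. */
    static long[] decodeTimestamps(long start, long end, long[] stored) {
        int n = stored.length;
        long[] rts = new long[n];
        for (int x = 0; x < n; x++) {
            rts[x] = periodic(start, end, n, x) - stored[x];
        }
        return rts;
    }

    /** value_to_store = value(x) - value(x-1); the first value is stored as-is. */
    static double[] encodeValues(double[] values) {
        double[] stored = new double[values.length];
        for (int x = 0; x < values.length; x++) {
            stored[x] = x == 0 ? values[0] : values[x] - values[x - 1];
        }
        return stored;
    }

    static double[] decodeValues(double[] stored) {
        double[] values = new double[stored.length];
        double running = 0;
        for (int x = 0; x < stored.length; x++) {
            running += stored[x];
            values[x] = running;
        }
        return values;
    }
}
```

With real time stamps {1000, 2003, 3500, 4000} and a threshold of 10, the stored deviations are {0, 0, -500, 0}: the small jitter at 2003 is snapped to 2000, while the large deviation at 3500 survives. Mostly-zero sequences like this are what makes the subsequent binary compression effective.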

Slide 35

Slide 35 text

CHRONIX FORMAT: TUNING CHUNK SIZE AND CODEC
GZIP + 128 kBytes (chart: storage demand vs. access time)
Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger: Chronix: Efficient Storage and Query of Operational Time Series. International Conference on Software Maintenance and Evolution 2016 (submitted)
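
The codec part of this setting can be illustrated with the JDK's own GZIP streams. This sketch only shows compressing and decompressing a chunk's already-serialized observation bytes; how those bytes are produced (ProtoBuf on the previous slide) and the 128 kBytes chunk sizing are outside its scope, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: GZIP-compress and decompress a chunk's serialized observation bytes. */
final class ChunkCompression {

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } // closing the GZIP stream flushes the remaining data and writes the trailer
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Larger chunks generally give the compressor more redundancy to exploit but mean more bytes to decode per query, which is the storage demand vs. access time trade-off charted on the slide.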

Slide 36

Slide 36 text

CHRONIX FORMAT: STORAGE EFFICIENCY BENCHMARK

Slide 37

Slide 37 text

CHRONIX FORMAT: PERFORMANCE BENCHMARK (chart: time in seconds per query and number of queries)

Slide 38

Slide 38 text

Distributed Processing
‣ Heavy-lifting distributed processing
‣ Efficient integration of Spark and Solr

Slide 39

Slide 39 text

SPARK AND SOLR BEST PRACTICES: ALIGN PARALLELISM
• Unit of parallelism in Spark: Partition
• Unit of parallelism in Solr: Shard
• 1 Spark Partition = 1 Solr Shard
(Diagram: each Solr shard holds SolrDocuments (chunks); each shard maps to one partition of the ChronixRDD, which holds TimeSeries.)

Slide 40

Slide 40 text

ALIGN THE PARALLELISM WITHIN CHRONIXRDD
public ChronixRDD queryChronixChunks(
        final SolrQuery query,
        final String zkHost,
        final String collection,
        final ChronixSolrCloudStorage chronixStorage) throws SolrServerException, IOException {

    // First get a list of replicas to query for this collection:
    // figure out all Solr shards (using CloudSolrClient in the background)
    List<String> shards = chronixStorage.getShardList(zkHost, collection);

    // Parallelize the requests to the shards (one Spark partition per Solr shard):
    // query each shard in parallel and convert SolrDocuments to MetricTimeSeries
    JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
            (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
                    new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
    return new ChronixRDD(docs);
}

Slide 41

Slide 41 text

SPARK AND SOLR BEST PRACTICES: PUSHDOWN
SolrQuery query = new SolrQuery("");
ChronixRDD rdd = csc.query(query, …
Predicate pushdown
• Pre-filter time series based on their metadata (dimensions, start, end) with Solr.
Aggregation pushdown
• Perform pre-aggregations (min/max/avg/…) at ingestion time and store them as metadata.
• (to come) Perform aggregations at Solr level at query time by enabling Solr to decode observations.
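
Both kinds of pushdown rest on per-chunk metadata. The sketch below is hypothetical (ChunkMeta and the method names are ours) and runs in plain Java what Solr would do via indexed fields: pre-aggregates are computed once at ingestion time, a query predicate is evaluated on metadata only, and some aggregates can be answered without decoding any observations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of ingestion-time pre-aggregation and metadata-only filtering. */
final class PushdownSketch {

    static final class ChunkMeta {
        final long start;
        final long end;
        final double min;
        final double max;
        final double avg;

        ChunkMeta(long start, long end, double min, double max, double avg) {
            this.start = start;
            this.end = end;
            this.min = min;
            this.max = max;
            this.avg = avg;
        }
    }

    /** Ingestion time: compute min/max/avg once and keep them next to start/end. */
    static ChunkMeta preAggregate(long[] timestamps, double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        return new ChunkMeta(timestamps[0], timestamps[timestamps.length - 1], min, max, sum / values.length);
    }

    /** Query time: chunks overlapping [from, to] whose max reaches a threshold, from metadata only. */
    static List<ChunkMeta> select(List<ChunkMeta> chunks, long from, long to, double maxAtLeast) {
        List<ChunkMeta> hits = new ArrayList<>();
        for (ChunkMeta c : chunks) {
            boolean overlaps = c.start <= to && c.end >= from;
            if (overlaps && c.max >= maxAtLeast) {
                hits.add(c);
            }
        }
        return hits;
    }

    /** An overall maximum answered from the stored pre-aggregates alone. */
    static double overallMax(List<ChunkMeta> chunks) {
        double max = Double.NEGATIVE_INFINITY;
        for (ChunkMeta c : chunks) {
            max = Math.max(max, c.max);
        }
        return max;
    }
}
```

Only the chunks that pass the predicate would then be transferred to Spark and decoded.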

Slide 42

Slide 42 text

SPARK AND SOLR BEST PRACTICES: EFFICIENT DATA TRANSFER
▸ Reduce volume: pushdown & compression
▸ Use efficient protocols: low-overhead, bulk, stream
▸ Avoid remote transfer: place Spark tasks (each processes 1 partition) on the Solr node with the appropriate shard (to come by using SolrRDD).
(Diagram: Solr / SolrJ: Export Handler → CloudSolrStream, transferring bulks of JSON tuples; Chronix Spark: Format Decoder → ChronixRDD.)

Slide 43

Slide 43 text

private Stream<MetricTimeSeries> streamWithCloudSolrStream(String zkHost, String collection, String shardUrl,
                                                           SolrQuery query, TimeSeriesConverter converter)
        throws IOException {
    Map params = new HashMap();
    params.put("q", query.getQuery());
    params.put("sort", "id asc");
    // Pin the query to one shard
    params.put("shards", extractShardIdFromShardUrl(shardUrl));
    params.put("fl",
            Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +
            ", metric, host, measurement, process, ag, group");
    // Use the export request handler
    params.put("qt", "/export");
    params.put("distrib", false);

    CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);
    solrStream.open();
    // Boilerplate code to stream the response
    SolrTupleStreamingService tupStream = new SolrTupleStreamingService(solrStream, converter);
    return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);
}

Slide 44

Slide 44 text

Time Series Databases should be first-class citizens. Chronix leverages Solr and Spark to be storage efficient and to allow interactive queries for big time series data.

Slide 45

Slide 45 text

THANK YOU! QUESTIONS?
Twitter: @adersberger
TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE

Slide 46

Slide 46 text

BONUS SLIDES

Slide 47

Slide 47 text

PERFORMANCE

Slide 48

Slide 48 text

codingvoding.tumblr.com

Slide 49

Slide 49 text

PREMATURE OPTIMIZATION IS NOT EVIL IF YOU HANDLE BIG DATA Josef Adersberger

Slide 50

Slide 50 text

PERFORMANCE: USING A JAVA PROFILER WITH A LOCAL CLUSTER

Slide 51

Slide 51 text

PERFORMANCE: HIGH-PERFORMANCE, LOW-OVERHEAD COLLECTIONS
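
The slide names the idea without details. As an illustration of what "low-overhead" means for time series data, here is a hypothetical growable list of primitive doubles: unlike a List&lt;Double&gt;, it stores values unboxed in one contiguous array, so there is no per-element object header or pointer. This is our own minimal sketch, not the collection library used by Chronix.

```java
import java.util.Arrays;

/** Hypothetical low-overhead collection: a growable list of unboxed doubles. */
final class PrimitiveDoubleList {
    private double[] data;
    private int size;

    PrimitiveDoubleList(int initialCapacity) {
        data = new double[Math.max(1, initialCapacity)];
    }

    void add(double value) {
        if (size == data.length) {
            // Grow by roughly 1.5x so that appends are amortized O(1).
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        data[size++] = value;
    }

    double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Trimmed copy of the contents, e.g. for serialization. */
    double[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```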

Slide 52

Slide 52 text

PERFORMANCE: 830 MB -> 360 MB (-57%); this unveiled incorrect Jackson handling inside SolrClient

Slide 53

Slide 53 text

THE SECRETS OF DISTRIBUTED PROCESSING PERFORMANCE
Rule 1: Be as close to the data as possible! (CPU cache > memory > local disk > network)
Rule 2: Reduce data volume as early as possible! (as long as you don’t sacrifice parallelization)
Rule 3: Parallelize as much as possible! (max = #cores * x)

Slide 54

Slide 54 text

PERFORMANCE: THE RULES APPLIED
‣ Rule 1: Be as close to the data as possible!
  1. Solr caching
  2. Spark in-memory processing with activated RDD compression
  3. Binary protocol between Solr and Spark
‣ Rule 2: Reduce data volume as early as possible!
  ‣ Efficient storage format (Chronix Format)
  ‣ Predicate pushdown to Solr (query)
  ‣ Group-by & aggregation pushdown to Solr (faceting within a query)
‣ Rule 3: Parallelize as much as possible!
  ‣ Scale-out on data-level with SolrCloud
  ‣ Scale-out on processing-level with Spark

Slide 55

Slide 55 text

APACHE SPARK 101

Slide 56

Slide 56 text

CHRONIX SPARK WONDERLAND ARCHITECTURE

Slide 57

Slide 57 text

APACHE SPARK: SPARK TERMINOLOGY (1/2)
▸ RDD: Has transformations and actions. Hides data partitioning and distributed computation. References a set of partitions (“output partitions”), materialized or not, and has dependencies on other RDDs (“input partitions”). RDD operations are evaluated as late as possible (when an action is called). Unless it is the root RDD, an RDD’s partitions are held in memory, and they can be persisted on request.
▸ Partitions: (Logical) chunks of data. Default unit and level of parallelism: inside a partition, everything is a sequential operation on records. Has to fit into memory. Can have different representations (in-memory, on disk, off-heap, …).

Slide 58

Slide 58 text

APACHE SPARK: SPARK TERMINOLOGY (2/2)
▸ Job: A computation job, launched when an action is called on an RDD.
▸ Task: The atomic unit of work (function). Bound to exactly one partition.
▸ Stage: Set of task pipelines that can be executed in parallel, one task per partition.
▸ Shuffling: Happens when partitions need to be transferred between executors. Shuffle write = outbound partition transfer. Shuffle read = inbound partition transfer.
▸ DAG Scheduler: Computes the DAG of stages from the RDD DAG. Determines the preferred location for each task.

Slide 59

Slide 59 text

THE COMPETITORS / ALTERNATIVES: CHRONIX RDD VS. SPARK-TS
▸ Spark-TS provides no specific time series storage; it uses the Spark persistence mechanisms instead. This leads to less efficient storage usage and fewer possibilities for performance optimizations via predicate pushdown.
▸ In contrast to Spark-TS, Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.
▸ Chronix provides multi-dimensional time series, as this is very useful for data warehousing and APM.
▸ Chronix supports Datasets, as this will be an important Spark API in the near future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML.
▸ Chronix is written purely in Java. There is no explicit support for Python and Scala yet.
▸ Chronix does not support a ZonedTime, as this would make it far more complicated.

Slide 60

Slide 60 text

CHRONIX SPARK INTERNALS

Slide 61

Slide 61 text

CHRONIXRDD: GET THE CHUNKS FROM SOLR
queryChronixChunks (same code as on slide 40): figure out all Solr shards (using CloudSolrClient in the background), then query each shard in parallel and convert SolrDocuments to MetricTimeSeries.

Slide 62

Slide 62 text

BINARY PROTOCOL WITH STANDARD SOLR CLIENT
private Stream<MetricTimeSeries> streamWithHttpSolrClient(String shardUrl,
                                                         SolrQuery query,
                                                         TimeSeriesConverter converter) {
    // Use an HttpSolrClient pinned to one shard
    HttpSolrClient solrClient = getSingleNodeSolrClient(shardUrl);
    // Use the binary (request) protocol
    solrClient.setRequestWriter(new BinaryRequestWriter());
    query.set("distrib", false);
    // Boilerplate code to stream the response
    SolrStreamingService solrStreamingService =
            new SolrStreamingService<>(converter, query, solrClient, nrOfDocumentPerBatch);
    return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(solrStreamingService, Spliterator.SIZED), false);
}

Slide 63

Slide 63 text

EXPORT HANDLER PROTOCOL
streamWithCloudSolrStream (same code as on slide 43): pin the query to one shard, use the export request handler, and stream the response.

Slide 64

Slide 64 text

CHRONIXRDD: FROM CHUNKS TO TIME SERIES
public ChronixRDD joinChunks() {
    // Group chunks according to their identity
    JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd
            = this.groupBy(MetricTimeSeriesKey::new);

    // Join the chunks of each group into one logical time series
    JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd
            = groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {
        MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();
        List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);
        MetricTimeSeries result = null;
        for (MetricTimeSeries mts : orderedChunks) {
            if (result == null) {
                result = new MetricTimeSeries
                        .Builder(mts.getMetric())
                        .attributes(mts.attributes()).build();
            }
            result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());
        }
        return result;
    });

    JavaRDD<MetricTimeSeries> resultJavaRdd =
            joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);

    return new ChronixRDD(resultJavaRdd);
}
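
The same three steps (group by identity, order the chunks, concatenate) can be sketched in plain Java without Spark. This is a hypothetical analog, not the Chronix code: a String key stands in for MetricTimeSeriesKey, and ordering chunks by their first timestamp is our assumption about what MetricTimeSeriesOrdering does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java analog of joinChunks: group by identity, order by start, concatenate. */
final class ChunkJoin {

    static final class Series {
        final String key;          // stands in for the series identity (metric + dimensions)
        final long[] timestamps;
        final double[] values;

        Series(String key, long[] timestamps, double[] values) {
            this.key = key;
            this.timestamps = timestamps;
            this.values = values;
        }

        long start() {
            return timestamps[0];
        }
    }

    static List<Series> joinChunks(List<Series> chunks) {
        // 1. Group chunks by identity (first-seen order of keys is kept).
        Map<String, List<Series>> groups = new LinkedHashMap<>();
        for (Series c : chunks) {
            groups.computeIfAbsent(c.key, k -> new ArrayList<>()).add(c);
        }
        List<Series> result = new ArrayList<>();
        for (Map.Entry<String, List<Series>> e : groups.entrySet()) {
            // 2. Order the chunks of one series by their first timestamp.
            List<Series> ordered = new ArrayList<>(e.getValue());
            ordered.sort(Comparator.comparingLong(Series::start));
            // 3. Concatenate their observations into one logical series.
            int total = 0;
            for (Series c : ordered) {
                total += c.timestamps.length;
            }
            long[] ts = new long[total];
            double[] vs = new double[total];
            int pos = 0;
            for (Series c : ordered) {
                System.arraycopy(c.timestamps, 0, ts, pos, c.timestamps.length);
                System.arraycopy(c.values, 0, vs, pos, c.values.length);
                pos += c.timestamps.length;
            }
            result.add(new Series(e.getKey(), ts, vs));
        }
        return result;
    }
}
```

In the Spark version, the grouping step is a shuffle, which is why pushing down filters before joining chunks pays off.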