Slide 1

Slide 1 text

Andrew Montalenti, CTO, Parse.ly (@amontalenti). Web Content Analytics at Scale with Parse.ly

Slide 2

Slide 2 text

content analytics fuels real-time decision-making for people who run the largest sites

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

175 TB of compressed customer data. Growing 20 TB+ per month.

Slide 12

Slide 12 text

before MongoDB, before Cassandra, before “NoSQL”, there was Lucene.

Slide 13

Slide 13 text

2013! "Can one use Solr as a Time Series Engine?"

Slide 14

Slide 14 text

2014! "Are Elasticsearch aggregations a dream query layer?"

Slide 15

Slide 15 text

2015! "Can we push ES to take 10K+ writes/sec and store 10TB+ of customer data?"

Slide 16

Slide 16 text

(turns out, the answer to each question is "yes", but with lots of caveats!)

Slide 17

Slide 17 text

fielddata: don't do it! doc_values All The Things!

Slide 18

Slide 18 text

_source: don't do it! Especially if your schema has high-cardinality multi-value fields.
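
A minimal sketch of how the two mapping tips above might be applied together, assuming a 1.x/2.x-era mapping and the elasticsearch-py client; the index name follows the naming scheme shown later in the deck, but the field list and client setup are illustrative, not Parse.ly's actual mappings:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Disable _source (skips storing a copy of every large document, at the
    # cost of reindexing from the original data pipeline) and force doc_values
    # so aggregations read from disk-backed columnar storage instead of
    # heap-resident fielddata.
    es.indices.create(
        index="v1_shared-1day-2015.01",
        body={
            "mappings": {
                "rollup": {
                    "_source": {"enabled": False},
                    "properties": {
                        "url": {"type": "string", "index": "not_analyzed", "doc_values": True},
                        "ts": {"type": "date", "doc_values": True},
                        "metrics": {
                            "properties": {
                                "$all/page_views": {"type": "long", "doc_values": True}
                            }
                        },
                    },
                }
            }
        },
    )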

Slide 19

Slide 19 text

"logstash-style" raw records! are nice, but...! ! to operate with good query latency, you need rollups, and these are tricky.

Slide 20

Slide 20 text

{ "url": "http://arstechnica.com/12345", "ts": "2015-01-02T00:00:000Z",! "visitors": ["3f3f", "3f3g", ...millions],! ! "metrics": { "$all/page_views": 6200000, "desktop/page_views": 4200000,! "mobile/page_views": 2000000,! "$all/engaged_secs": 27500000,! "new/engaged_secs": 250000000,! "returning/engaged_secs": 25000000, }, ! "metas": { "title": "Obama gives speech",! "authors": ["Mike Journo"],! "section": "Politics",! "pub_date": "2015-01-02T08:00:000Z", } } partition and time bucket high-cardinality metric numeric metrics metadata 1day rollup! (1 per day)

Slide 21

Slide 21 text

{ "url": "http://arstechnica.com/12345", "ts": "2015-01-02T08:05:000Z",! "visitors": ["3f3f", "3f3g", ...hundreds],! ! "metrics": { "$all/page_views": 62, "desktop/page_views": 42,! "mobile/page_views": 20,! "$all/engaged_secs": 275,! "new/engaged_secs": 250,! "returning/engaged_secs": 25, }, ! "metas": { "title": "Obama gives speech",! "authors": ["Mike Journo"],! "section": "Politics",! "pub_date": "2015-01-02T08:00:000Z", } } partition and time bucket high-cardinality metric numeric metrics metadata 5min rollup! (288 per day)

Slide 22

Slide 22 text

{ "url": "http://arstechnica.com/12345", "ts": "2015-01-02T08:05:123Z",! "visitors": ["3f3f3"],! ! "metrics": { "$all/page_views": 1, "desktop/page_views": 1,! "mobile/page_views": 0,! "$all/engaged_secs": 0,! "new/engaged_secs": 0,! "returning/engaged_secs": 0, }, ! "metas": { "title": "Obama gives speech",! "authors": ["Mike Journo"],! "section": "Politics",! "pub_date": "2015-01-02T08:00:000Z", } } partition and time bucket high-cardinality metric numeric metrics metadata raw event! (millions per day)

Slide 23

Slide 23 text

document grouping in time-based indices:
• url-raw by hour
• url-5min by day
• url-1day by month

Slide 24

Slide 24 text

initial data access layer (sketched below):
• top_things(...)
• thing_details(...)
• site_timeline(...)
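
As an illustration only (the real signatures are elided above), a call like top_things plausibly translates into a time-bounded terms aggregation; every name, parameter, and field below is a hypothetical stand-in rather than Parse.ly's API:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    def top_things(index, start, end, size=10):
        """Hypothetical sketch: top URLs by page views within a time range."""
        body = {
            "size": 0,
            "query": {"range": {"ts": {"gte": start, "lt": end}}},
            "aggs": {
                "top_urls": {
                    "terms": {"field": "url", "size": size},
                    "aggs": {
                        "page_views": {"sum": {"field": "metrics.$all/page_views"}}
                    },
                },
            },
        }
        result = es.search(index=index, body=body)
        return result["aggregations"]["top_urls"]["buckets"]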

Slide 25

Slide 25 text

historical analytics for spotting long-term opportunities, trends, and insights in heaps of data

Slide 26

Slide 26 text

Parse.ly "Batch Layer" Topologies with Spark and Amazon S3 Parse.ly "Speed Layer" Topologies with Storm & Kafka Parse.ly Dashboards and APIs with Elasticsearch & Cassandra Parse.ly Raw Data Pipeline with Amazon Kinesis & S3 Access mage building blocks

Slide 27

Slide 27 text

rebuild the world!

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Mid-2015: "This sort of works, but it seems we need more hardware... and what's up with response times?"

Slide 30

Slide 30 text

"You need to give big ! customers their own indices." - Otis "You need to use node-shard! allocation for hot/cold tiers." - Radu

Slide 31

Slide 31 text

all together now!
• Time-based indices
• Index versioning
• Customer namespaces
• Node-shard allocation

Slide 32

Slide 32 text

Time-Based
• v1_shared-1day-2015.01
• v1_shared-1day-2015.02

Slide 33

Slide 33 text

Versioning
• v1_shared-1day-2015.01
• v2_shared-1day-2015.01

Slide 34

Slide 34 text

Namespaces
• v1_shared-1day-2015.01
• v1_condenast-1day-2015.01

Slide 35

Slide 35 text

Node-Shard Allocation (see the sketch below)
• v1_shared-1day-2015.01 => cold (mem, rust)
• v1_shared-5min-2015.02.01 => warm (mem, ssd)
• v1_shared-5min-2015.03.15 => hot (mem, cpu)
• v1_shared-raw-2015.03.15T12 => raw (cpu)
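
A minimal sketch of how this kind of tiering is commonly wired up with Elasticsearch's shard allocation filtering; the node attribute name (box_type) and the client setup are assumptions, not necessarily Parse.ly's configuration:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])  # hypothetical cluster address

    # Assumes each data node is started with a custom attribute in
    # elasticsearch.yml, e.g.  node.box_type: hot | warm | cold | raw.

    # Fresh 5min indices are pinned to hot (cpu-heavy) nodes...
    es.indices.put_settings(
        index="v1_shared-5min-2015.03.15",
        body={"index.routing.allocation.require.box_type": "hot"},
    )

    # ...and aging indices are demoted to warm/cold tiers by changing the
    # same setting, which makes the cluster relocate their shards.
    es.indices.put_settings(
        index="v1_shared-1day-2015.01",
        body={"index.routing.allocation.require.box_type": "cold"},
    )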

Slide 36

Slide 36 text

• Cluster: 40 nodes, 500+ indices, 7,000+ shards
• Tiers: 4 client, 3 master, 9 raw, 9 hot, 12 warm, 3 cold
• Instances: 1 TB+ of RAM, 500+ CPU cores
• Disks: 12+ TB of data, >50% on SSDs, the rest on spinning rust
• Writes: 10K+ writes per second
• Reads: hundreds of aggregations per second

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Late-2015: "This is shipped! It works! ...but some issues remain."

Slide 40

Slide 40 text

bugs, lies, and hogs:
• OOMs = bugs
• timeouts = lies
• queries = hogs

Slide 41

Slide 41 text

In the worst case, a bad query takes longer than its timeout, hogs the cluster, and hits an OOM bug.

Slide 42

Slide 42 text

excited about the future:
• better resiliency
• store compression
• task management
• aggregation paging
• query profiling

Slide 43

Slide 43 text

Questions? Tweet to @amontalenti!

Slide 44

Slide 44 text

links
• Lucene: The Good Parts
• Mage: The Magical Time Series Backend
• Pythonic Analytics with Elasticsearch
• Visit us: http://parse.ly
• Join us: http://parse.ly/jobs

Slide 45

Slide 45 text

appendix

Slide 46

Slide 46 text

building mage: a streaming time series engine for our next 1,000 customers

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

pykafka: ingest raw event data at high speed

Slide 52

Slide 52 text

(diagram: "Python State Code" consumer processes spread across Server 1-3, two cores each, fed by a pykafka.producer)

consumer = ...  # balanced
while True:
    msg = consumer.consume()
    msg = json.loads(msg)
    urlparse(msg["url"])
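
A more complete, self-contained version of the snippet above, assuming pykafka's balanced consumer API; the broker and ZooKeeper addresses, topic, and consumer group are placeholders:

    import json
    from urllib.parse import urlparse

    from pykafka import KafkaClient

    client = KafkaClient(hosts="kafka1:9092,kafka2:9092")   # placeholder brokers
    topic = client.topics[b"raw_events"]                    # placeholder topic

    # A balanced consumer splits the topic's partitions across every
    # consumer in the same group, giving the scale-out in the diagram.
    consumer = topic.get_balanced_consumer(
        consumer_group=b"url_parsers",                      # placeholder group
        zookeeper_connect="zk1:2181",                       # placeholder ZooKeeper
    )

    while True:
        msg = consumer.consume()
        event = json.loads(msg.value)
        print(urlparse(event["url"]).netloc)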

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

scale-out functions over a stream of inputs in order to generate a stream of outputs

Slide 55

Slide 55 text

(diagram: "Python State Code" worker processes spread across Server 1-3, two cores each, fed by a pykafka.producer and connected via Storm's multi-lang JSON protocol)

class UrlParser(Topology):
    url_spout = UrlSpout.spec(p=1)
    url_bolt = UrlBolt.spec(p=4, input=url_spout)
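
For context, a minimal sketch of what the UrlBolt referenced above might look like, assuming streamparse's Python Bolt API; the class body and output field are illustrative:

    from urllib.parse import urlparse

    from streamparse import Bolt

    class UrlBolt(Bolt):
        """Illustrative bolt: parse each incoming URL and emit its hostname."""
        outputs = ["host"]

        def process(self, tup):
            url = tup.values[0]
            self.emit([urlparse(url).netloc])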

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

pyspark: scale-out batch functions over a static dataset to perform transformations and actions

Slide 58

Slide 58 text

(diagram: "Python State Code" worker processes spread across Server 1-3, two cores each, driven by a pyspark.SparkContext over cloudpickle, py4j, and binary pipes)

sc = SparkContext()
file_rdd = sc.textFile(files)
file_rdd.map(urlparse).take(1)
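
A self-contained version of the snippet above; the S3 path is a placeholder and each input line is assumed to be one JSON-encoded raw event with a "url" field, matching the raw events shown earlier:

    import json
    from urllib.parse import urlparse

    from pyspark import SparkContext

    sc = SparkContext(appName="url_batch")               # master set via spark-submit

    files = "s3://example-bucket/raw-events/*.json"      # placeholder path
    file_rdd = sc.textFile(files)

    # Transformations are lazy; countByValue() is the action that triggers
    # the distributed job and pulls results back to the driver.
    host_counts = (
        file_rdd
        .map(json.loads)
        .map(lambda event: urlparse(event["url"]).netloc)
        .countByValue()
    )
    print(sorted(host_counts.items(), key=lambda kv: -kv[1])[:10])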

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

lesson learned: log-oriented "lambda architecture" works well, but it costs time and money!

Slide 61

Slide 61 text

multi-process, not multi-thread
multi-node, not multi-core
message passing, not shared memory
heaps of data and streams of data