Slide 1

Slide 1 text

1 March 2016 What’s new in Elasticland? San Francisco Edition

Slide 2

Slide 2 text

2 Agenda 1. #elasticon16 2. BM25 3. Security with Elastic 4. Q&A Q&A all the time!

Slide 3

Slide 3 text

3 Elastic{ON} 16 – Pier 48, San Francisco, CA, February 17 – 19, 2016
Attendees: 1,802
Days: 3
Talks: 28
All recordings on https://www.elastic.co/elasticon/conf/2016/sf

Slide 4

Slide 4 text

4 Speakers

Slide 5

Slide 5 text

5

Slide 6

Slide 6 text

6

Slide 7

Slide 7 text

7

Slide 8

Slide 8 text

8 Updates

Slide 9

Slide 9 text

9 Community Update
Meetups: 500
Members: 51,000+
Downloads: 50M

Slide 10

Slide 10 text

10 You know, for Search

Slide 11

Slide 11 text

11 ELK stack

Slide 12

Slide 12 text

12 Along Came Beats

Slide 13

Slide 13 text

13 1 BELK? KELB? ELKB?

Slide 14

Slide 14 text

14

Slide 15

Slide 15 text

15 The Elastic Stack

Slide 16

Slide 16 text

16

Slide 17

Slide 17 text

17 Columnar Store

Slide 18

Slide 18 text

18 Java Security Manager

Slide 19

Slide 19 text

19 Profile API

Slide 20

Slide 20 text

20 Pipeline Aggregations
[Chart: data value, smooth average, and upper control limit, plotted by day over August]

Slide 21

Slide 21 text

21 Demo Time NYC

Slide 22

Slide 22 text

22

Slide 23

Slide 23 text

23 Colors!

Slide 24

Slide 24 text

24 Custom Maps

Slide 25

Slide 25 text

25 Global Timezone Which 3 pm?

Slide 26

Slide 26 text

26

Slide 27

Slide 27 text

27 Config Reload Load from file webserver

Slide 28

Slide 28 text

28 75% MORE AWESOME Java Event

Slide 29

Slide 29 text

29 Performance

Slide 30

Slide 30 text

30

Slide 31

Slide 31 text

31 Packetbeat Capture the packet

Slide 32

Slide 32 text

32 Topbeat Old-school server monitoring

Slide 33

Slide 33 text

33 Topbeat The New World

Slide 34

Slide 34 text

34 Filebeat Logstash Forwarder Filebeat

Slide 35

Slide 35 text

35 Winlogbeat Let’s go from 1998

Slide 36

Slide 36 text

36 Winlogbeat To 2016

Slide 37

Slide 37 text

37 Metricbeat MySQL metricbeat Redis Apache + To the Future

Slide 38

Slide 38 text

38 Community execbeat elasticbeat redisbeat twitterbeat apachebeat nginxbeat dockerbeat pingbeat udpbeat

Slide 39

Slide 39 text

39 Versions

Slide 40

Slide 40 text

40 It’s complicated
es: 1.4 (Nov 5, 2014), 1.5 (May 23, 2015), 1.6 (Jun 9, 2015), 1.7 (Jul 16, 2015)
kibana: 4.0 (Feb 19, 2015), 4.1 (Jun 10, 2015)
ls: 1.5 (May 14, 2015)
beats: 1.0 Beta 1 (May 27, 2015), 1.0 Beta 2 (Jul 13, 2015), 1.0 Beta 3 (Sep 4, 2015)

Slide 41

Slide 41 text

41 Release Bonanza

Slide 42

Slide 42 text

42

Slide 43

Slide 43 text

43 Ingest “I just want to tail a file.”

Slide 44

Slide 44 text

44

Slide 45

Slide 45 text

45 Simple things should be simple

Slide 46

Slide 46 text

46 Ingest Node

Slide 47

Slide 47 text

47 Extensions

Slide 48

Slide 48 text

48 NO EDITIONS OPEN SOURCE VS. ENTERPRISE

Slide 49

Slide 49 text

49 Security (Shield) Scheduling (Watcher) Monitoring (Marvel) Graph (coming in 2.3) Reporting (coming in 5.0)

Slide 50

Slide 50 text

50 Simple UI to Explore Your Data in New Ways

Slide 51

Slide 51 text

51 Simple API that combines Search and Graph Techniques
GET /wikipedia/_graph/explore
{
  "query": { "query_string": { "query": "Jack Johnson" } },
  "vertices": [{ "field": "artists.raw" }],
  "connections": { "vertices": [{ "field": "artists.raw" }] }
}

Slide 52

Slide 52 text

52 Elasticsearch + Kibana as a Service Latest release of the Elastic Stack and X-Pack FOUND

Slide 53

Slide 53 text

53 Cloud as a Product * Not actual packaging

Slide 54

Slide 54 text

54 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/opening-keynote

Slide 55

Slide 55 text

55 Math Time TF/IDF and BM25

Slide 56

Slide 56 text

56 Lucene Practical Scoring Function
score(q, d) = queryNorm(q) · coord(q, d) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · norm(t, d) · t.getBoost() )
queryNorm(q): Query normalization factor - Normalize query to compare with results from other queries
coord(q, d): Coordination factor - Reward documents with more individual query terms
tf(t in d): Term frequency - How often does the term appear in the doc?
idf(t): Inverse document frequency - How often does the term appear in all docs?
t.getBoost(): Term boost
norm(t, d): Field length norm - How long is the field?

Slide 57

Slide 57 text

57 TF/IDF
score(q, d) = queryNorm(q) · coord(q, d) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · norm(t, d) · t.getBoost() )
TF/IDF

Slide 58

Slide 58 text

58 TF/IDF and BM25
score(q, d) = queryNorm(q) · coord(q, d) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · norm(t, d) · t.getBoost() )

Slide 59

Slide 59 text

59 Term frequency
BM25 - approaches limit
TF/IDF - keeps growing
[Chart: TF/IDF vs. BM25]
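The saturation behaviour named on this slide can be illustrated with a small sketch. It assumes the term-frequency factors as Lucene defines them (square root of the frequency for classic TF/IDF, and the BM25 factor with the default k1 = 1.2); length normalization is left out here (b = 0) to isolate the effect. The function names are illustrative, not part of any API.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor of classic TF/IDF: sqrt(freq), unbounded."""
    return math.sqrt(freq)


def bm25_tf(freq: float, k1: float = 1.2) -> float:
    """BM25 term-frequency factor without length normalization (b = 0).

    As freq grows, the value approaches the limit k1 + 1.
    """
    return freq * (k1 + 1) / (freq + k1)


if __name__ == "__main__":
    for freq in (1, 2, 5, 10, 100, 1000):
        print(f"freq={freq:>4}  tf/idf={classic_tf(freq):7.2f}  bm25={bm25_tf(freq):.3f}")
```

With these assumptions the BM25 factor never exceeds k1 + 1 = 2.2, while the classic factor keeps growing with every additional occurrence.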

Slide 60

Slide 60 text

60 IDF comparison
[Chart: TF/IDF vs. BM25]
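For the IDF comparison, a minimal sketch under the assumption that both similarities use the formulas from Lucene: 1 + ln(N / (df + 1)) for classic TF/IDF and ln(1 + (N - df + 0.5) / (df + 0.5)) for BM25, where N is the number of documents and df the number of documents containing the term. The function names are made up for illustration.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF of classic TF/IDF; stays close to 1 even for very common terms."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """IDF of BM25; drops towards 0 for terms that occur in almost every doc."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    num_docs = 1000
    for doc_freq in (1, 10, 100, 500, 1000):
        print(
            f"df={doc_freq:>4}  tf/idf={classic_idf(doc_freq, num_docs):.3f}"
            f"  bm25={bm25_idf(doc_freq, num_docs):.3f}"
        )
```

A term that appears in every document still contributes about 1 under the classic formula but almost nothing under BM25, which is one way to read "less influence of common words".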

Slide 61

Slide 61 text

61 Field length norm
[Chart: TF/IDF vs. BM25]
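The difference in field length handling can be sketched as follows, assuming the Lucene definitions: classic TF/IDF multiplies by 1 / sqrt(field length), while BM25 compares the field length with the average field length inside the term-frequency factor (defaults k1 = 1.2, b = 0.75). Index-time boosts and the lossy norm encoding are ignored, and the function names are hypothetical.

```python
import math


def classic_norm(field_length: int) -> float:
    """Classic field length norm: shorter fields always score higher."""
    return 1.0 / math.sqrt(field_length)


def bm25_tf_norm(
    freq: float,
    field_length: float,
    avg_field_length: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """BM25 term-frequency factor with length normalization.

    The field length only matters relative to the average field length;
    b controls how strongly (b = 0 disables length normalization).
    """
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))


if __name__ == "__main__":
    avg = 10
    for length in (2, 5, 10, 20, 100):
        print(
            f"len={length:>3}  tf/idf norm={classic_norm(length):.3f}"
            f"  bm25 (freq=1)={bm25_tf_norm(1, length, avg):.3f}"
        )
```

In this sketch a field of average length gets a BM25 factor of 1 for a single occurrence, and the bonus for very short fields is capped at k1 + 1, which matches the "short fields not auto-boosted" point made later in the deck.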

Slide 62

Slide 62 text

62 So why BM25?

Slide 63

Slide 63 text

63 Why BM25?
• Less influence of common words
• Short fields not auto-boosted
• Tweakable parameters (beware)
• Is BM25 better?
  ‒ Literature suggests so
  ‒ Challenges suggest so (TREC, ...)
  ‒ Users say so
  ‒ Lucene developers say so
  ‒ Konrad Beiske says so: Blog “BM25 vs Lucene Default Similarity”
• But: It depends on the features of your corpus

Slide 64

Slide 64 text

64 You’ll see. If you don’t like it, change it to any of: TF/IDF, DFR, DFI, IB, LM

Slide 65

Slide 65 text

65 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25

Slide 66

Slide 66 text

66 USE CASE SECURITY

Slide 67

Slide 67 text

67 Chris Rimondi @crimondi TAP(ping) Out Security Threats at FireEye

Slide 68

Slide 68 text

68 TAP Overview

Slide 69

Slide 69 text

69 How do we enable the analyst?

Slide 70

Slide 70 text

70 Analyst asks TAP: Citrix connections originating from Russia, China, and Iran, grouped by duration, received bytes, and destination port

Slide 71

Slide 71 text

71 Query Language
class:bro_conn dstipv4:$external_citrix_servers srccountrycode:[ru,cn,ir] connstate:sf | groupby [duration,rcvdipbytes,dstport] 200

Slide 72

Slide 72 text

72 Elasticsearch DSL
{"query":{"filtered":{"filter":{"and":[{"range":{"meta_ts":{"gte":"2016-01-26T17:00:00.000Z","lte":"2016-02-02T17:10:20.478Z"}}},{"term":{"class":"bro_conn"}},{"terms":{"dstipv4":{"index":"lists","type":"indicator","id":"external_citrix_servers","path":"values","cache":false}}},{"terms":{"srccountrycode":["ru","cn","ir"],"execution":"or"}},{"term":{"connstate":"sf"}},{"limit":{"value":208333}}]}}},"aggs":{"groupby:duration_rcvdipbytes_dstport":{"terms":{"lang":"native","script":"join","params":{"fields":["duration","rcvdipbytes","dstport"],"separator":", "},"size":200,"min_doc_count":1,"order":{"_count":"desc"}}}},"size":10,"from":0,"timeout":120000}

Slide 73

Slide 73 text

73 Production Footprint
3.6P Raw Storage: across ~40 production clusters
700B Indexed Events: in 400+ nodes, peak 20B/day
300K EPS: events per second indexed to production

Slide 74

Slide 74 text

74 How many eggs can we fry with a bad regex query?

Slide 75

Slide 75 text

75 Show me credit card data!
{"query":{"filtered":{"filter":{"and":[{"range":{"meta_ts":{"gte":"2015-10-25T13:00:00.000Z","lte":"2015-10-26T13:37:07.554Z"}}},{"query":{"common":{"metaclass":{"query":"http_proxy","low_freq_operator":"and","high_freq_operator":"and","cutoff_frequency":0.001,"analyzer":"standard"}}}},{"script":{"script":"regexp","lang":"native","params":{"regexp":".*encoding\\\\=.*\\\\&t\\\\=.*\\\\&cc\\\\=.*\\\\&process\\\\=.*\\\\&track\\\\=/","field":"uri","limit":-1}}}]}}},"size":10,"from":0,"timeout":120000}

Slide 76

Slide 76 text

76 1080 Cores pegged at 100% CPU for 83 minutes!

Slide 77

Slide 77 text

77 Eggs Fried Per Query
1. Thermal mass for a single egg is 274 J/°C. Result: 274 J/°C
2. Integrated temperature from 4 to 80 °C gives us total heat of 274 J/°C * (80 - 4) °C. Result: 20,812 J
3. D2 series uses Haswell Intel Xeon E5-2673 v3 processors with a Thermal Design Power of 120 W. We used 8 of the 12 cores: .75 * 120 W = 90 W per processor, * 135 procs. Result: 12,150 W
4. Total query execution time in seconds: 83 min x 60 s. Result: 4,980 s
5. Total energy = 12,150 W * 4,980 seconds (length of query). Result: 60.5 MJ
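The arithmetic on this slide can be replayed in a few lines. This is only a sketch that plugs in the numbers from the slide; the variable names are made up. Note that multiplying 274 J/°C by 76 °C gives 20,824 J, slightly more than the 20,812 J printed on the slide, so the egg count is computed for both values.

```python
# Inputs as stated on the slide.
EGG_THERMAL_MASS_J_PER_C = 274   # J/°C for a single egg
START_TEMP_C, END_TEMP_C = 4, 80 # fridge temperature to "fried"
TDP_W = 120                      # thermal design power per processor
UTILIZATION = 0.75               # factor used on the slide
PROCESSORS = 135                 # 1080 cores / 8 cores per processor
QUERY_MINUTES = 83
SLIDE_HEAT_PER_EGG_J = 20_812    # value printed on the slide

heat_per_egg_j = EGG_THERMAL_MASS_J_PER_C * (END_TEMP_C - START_TEMP_C)
power_w = UTILIZATION * TDP_W * PROCESSORS
query_seconds = QUERY_MINUTES * 60
total_energy_j = power_w * query_seconds

eggs_computed = total_energy_j / heat_per_egg_j
eggs_slide = total_energy_j / SLIDE_HEAT_PER_EGG_J

if __name__ == "__main__":
    print(f"heat per egg:  {heat_per_egg_j} J (slide: {SLIDE_HEAT_PER_EGG_J} J)")
    print(f"power:         {power_w:.0f} W")
    print(f"query time:    {query_seconds} s")
    print(f"total energy:  {total_energy_j / 1e6:.1f} MJ")
    print(f"eggs fried:    {eggs_computed:.0f} (with slide value: {eggs_slide:.0f})")
```

With the slide's heat value the result rounds to the 2,907 eggs quoted in the talk; with the recomputed 20,824 J it comes out about one egg lower.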

Slide 78

Slide 78 text

78 2,907 Eggs Fried Searching for credit card track data in URIs

Slide 79

Slide 79 text

79 Elasticsearch Kids
[Chart: wakeups per week (0 to 8) by Elasticsearch version (number of kids): 0.9 (3), 1.1 (3), 1.2 (3), 1.3 (3), 1.5 (3), 1.7 (4)]

Slide 80

Slide 80 text

80 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/tapping-out-security-threats-at-fireeye

Slide 81

Slide 81 text

81 ALSO WATCH
Graph Capabilities in the Elastic Stack
All Quiet on the Digital Front: Security Analytics @ USAA
OpenSource Connections: The Ghost in the Search Machine
Grid Monitoring at CERN with the Elastic Stack
Contributing to Elasticsearch: How to Get Started
All recordings: https://www.elastic.co/elasticon/conf/2016/sf/

Slide 82

Slide 82 text

82 Public Training
Core Elasticsearch: Operations, Stockholm, Sweden, April 25
Core Elasticsearch: Developer, Stockholm, Sweden, April 26 - 27
training.elastic.co

Slide 83

Slide 83 text

83 Q&A ASK ME ANYTHING