What's New In Elasticland?

Elasticsearch Inc
March 17, 2016

Following Elastic{ON}16, this is a recap of the key announcements, followed by brief summaries of the talks on BM25 and FireEye's TAP security platform.


Transcript

  1. 1 March 2016 What’s new in Elasticland? San Francisco Edition

  2. 2 Agenda

    1. #elasticon16
    2. BM25
    3. Security with Elastic
    4. Q&A (Q&A all the time!)
  3. 3 Elastic{ON} 16 – Pier 48, San Francisco, CA, February 17 – 19, 2016

    Attendees: 1,802. Days: 3. Talks: 28.
    All recordings on https://www.elastic.co/elasticon/conf/2016/sf
  4. 4 Speakers

  5. 5

  6. 6

  7. 7

  8. 8 Updates

  9. 9 Community Update: Meetups 500, Members 51,000+, Downloads 50M

  10. 10 You know, for Search

  11. 11 ELK stack

  12. 12 Along Came Beats

  13. 13 1 BELK? KELB? ELKB?

  14. 14

  15. 15 The Elastic Stack

  16. 16

  17. 17 Columnar Store

  18. 18 Java Security Manager

  19. 19 Profile API

  20. 20 Pipeline Aggregations

    [Chart: Data Value, Smooth Average, and Upper Control Limit plotted by day over August]
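The chart on this slide plots a data value, a smoothed average, and an upper control limit. Elasticsearch computes derived series like these server-side from the buckets of another aggregation; the sketch below only illustrates the arithmetic. It assumes a simple trailing moving average and a limit of the window mean plus three standard deviations. The window size, the 3-sigma rule, the function name, and the sample values are illustrative assumptions, not taken from the talk.

```python
from statistics import mean, pstdev

def moving_stats(values, window=5, sigmas=3.0):
    """Return (smooth_average, upper_control_limit) lists for a trailing window."""
    smooth, upper = [], []
    for i in range(len(values)):
        # Trailing window: up to `window` values ending at position i.
        chunk = values[max(0, i - window + 1): i + 1]
        avg = mean(chunk)
        smooth.append(avg)
        # Assumed control limit: window mean plus `sigmas` standard deviations.
        upper.append(avg + sigmas * pstdev(chunk))
    return smooth, upper

daily_values = [20, 22, 21, 25, 60, 23, 22]  # made-up daily metric
smooth, upper = moving_stats(daily_values, window=3)
for value, avg, limit in zip(daily_values, smooth, upper):
    print(value, round(avg, 2), round(limit, 2))
```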
  21. 21 Demo Time NYC

  22. 22

  23. 23 Colors!

  24. 24 Custom Maps

  25. 25 Global Timezone Which 3 pm?

  26. 26

  27. 27 Config Reload Load from file webserver

  28. 28 75% MORE AWESOME Java Event

  29. 29 Performance

  30. 30

  31. 31 Packetbeat Capture the packet

  32. 32 Topbeat Old-school server monitoring

  33. 33 Topbeat The New World

  34. 34 Filebeat Logstash Forwarder Filebeat

  35. 35 Winlogbeat Let’s go from 1998

  36. 36 Winlogbeat To 2016

  37. 37 Metricbeat MySQL metricbeat Redis Apache + To the Future

  38. 38 Community: execbeat, elasticbeat, redisbeat, twitterbeat, apachebeat, nginxbeat, dockerbeat, pingbeat, udpbeat
  39. 39 Versions

  40. 40 It’s complicated

    es: 1.4 (Nov 5, 2014), 1.5 (May 23, 2015), 1.6 (Jun 9, 2015), 1.7 (Jul 16, 2015)
    kibana: 4.0 (Feb 19, 2015), 4.1 (Jun 10, 2015)
    ls: 1.5 (May 14, 2015)
    beats: 1.0 Beta 1 (May 27, 2015), 1.0 Beta 2 (Jul 13, 2015), 1.0 Beta 3 (Sep 4, 2015)
  41. 41 Release Bonanza

  42. 42

  43. 43 Ingest “I just want to tail a file.”

  44. 44

  45. 45 Simple things should be simple

  46. 46 Ingest Node

  47. 47 Extensions

  48. 48 NO EDITIONS OPEN SOURCE VS. ENTERPRISE

  49. 49 Security (Shield), Scheduling (Watcher), Monitoring (Marvel), Graph (coming in 2.3), Reporting (coming in 5.0)
  50. 50 Simple UI to Explore Your Data in New Ways
  51. 51 Simple API that combines Search and Graph Techniques

    GET /wikipedia/_graph/explore
    {
      "query": { "query_string": { "query": "Jack Johnson" } },
      "vertices": [{ "field": "artists.raw" }],
      "connections": { "vertices": [{ "field": "artists.raw" }] }
    }
  52. 52 Elasticsearch + Kibana as a Service. Latest release of the Elastic Stack and X-Pack. FOUND
  53. 53 Cloud as a Product * Not actual packaging

  54. 54 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/opening-keynote

  55. 55 Math Time TF/IDF and BM25

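  56. 56 Lucene Practical Scoring Function

    $\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \text{ in } q} \big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \big)$

    queryNorm: query normalization factor. Normalize query to compare with results from other queries.
    coord: coordination factor. Reward documents with more individual query terms.
    tf: term frequency. How often does the term appear in the doc?
    idf: inverse document frequency. How often does the term appear in all docs?
    t.getBoost(): term boost.
    norm: field length norm. How long is the field?
  57. 57 TF/IDF

    $\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \text{ in } q} \big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \big)$
  58. 58 TF/IDF and BM25

    $\mathrm{score}(q, d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q, d) \cdot \sum_{t \text{ in } q} \big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{norm}(t, d) \cdot t.\mathrm{getBoost}() \big)$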
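A minimal Python sketch of how the factors in the formula above combine for one document. It assumes the classic Lucene definitions tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and norm = 1 / sqrt(field length), with every boost set to 1. The function name, the toy documents, and the document frequencies are made up for illustration, and real Lucene stores norms in a lossy compressed form, which this ignores.

```python
import math

def classic_score(query_terms, doc_terms, doc_freq, num_docs):
    """Practical scoring function for one document, all boosts assumed to be 1."""
    def idf(term):
        return 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))

    # queryNorm: 1 / sqrt(sum of squared term weights), so queries are comparable.
    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in query_terms))
    matched = [t for t in query_terms if t in doc_terms]
    # coord: fraction of the query terms that occur in the document.
    coord = len(matched) / len(query_terms)
    # norm: shorter fields get a larger weight.
    norm = 1.0 / math.sqrt(len(doc_terms))
    total = sum(math.sqrt(doc_terms.count(t)) * idf(t) ** 2 * norm for t in matched)
    return query_norm * coord * total

doc_freq = {"quick": 2, "fox": 1}  # hypothetical corpus statistics
query = ["quick", "fox"]
both_terms = classic_score(query, ["quick", "brown", "fox"], doc_freq, 3)
one_term = classic_score(query, ["quick", "quick", "dog", "barks"], doc_freq, 3)
no_terms = classic_score(query, ["lazy", "cat"], doc_freq, 3)
print(both_terms, one_term, no_terms)
```

In this toy corpus the document containing both query terms scores highest, the one repeating a single term scores lower, and the one with no query terms scores zero.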
  59. 59 Term frequency: BM25 approaches a limit, TF/IDF keeps growing (chart: TF/IDF vs. BM25)
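The saturation described on this slide can be checked numerically. The sketch below assumes the classic Lucene term-frequency factor sqrt(freq) and the BM25 term-frequency factor freq * (k1 + 1) / (freq + k1) with the default k1 = 1.2; length normalisation is left out, as if the field had exactly average length. The function names are illustrative.

```python
import math

def tf_classic(freq):
    """Classic TF/IDF term-frequency factor: grows without bound."""
    return math.sqrt(freq)

def bm25_tf_saturation(freq, k1=1.2):
    """BM25 term-frequency factor for an average-length field: bounded by k1 + 1."""
    return freq * (k1 + 1) / (freq + k1)

for freq in (1, 2, 5, 10, 100, 1000):
    print(freq, round(tf_classic(freq), 3), round(bm25_tf_saturation(freq), 3))
```

With k1 = 1.2 the BM25 factor never exceeds 2.2, while the square root keeps rising with every extra occurrence.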
  60. 60 IDF comparison (chart: TF/IDF vs. BM25)
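A small sketch of the two IDF curves being compared, assuming the Lucene formulas idf = 1 + ln(N / (df + 1)) for classic TF/IDF and idf = ln(1 + (N - df + 0.5) / (df + 0.5)) for BM25, where N is the number of documents and df the number of documents containing the term. The corpus size of 1,000 documents is an arbitrary example value.

```python
import math

def idf_classic(doc_freq, num_docs):
    """Classic TF/IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def idf_bm25(doc_freq, num_docs):
    """BM25 inverse document frequency as implemented in Lucene."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

num_docs = 1000  # arbitrary example corpus size
for doc_freq in (1, 10, 100, 500, 1000):
    print(doc_freq, round(idf_classic(doc_freq, num_docs), 3),
          round(idf_bm25(doc_freq, num_docs), 3))
```

For a term that occurs in every document the classic value stays close to 1, while the BM25 value drops to almost 0, which is one way to read "less influence of common words".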

  61. 61 Field length norm (chart: TF/IDF vs. BM25)
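To make the field length comparison concrete, the sketch below assumes the classic norm 1 / sqrt(field length) and the BM25 term-frequency factor with length normalisation, freq * (k1 + 1) / (freq + k1 * (1 - b + b * len / avg_len)), using the defaults k1 = 1.2 and b = 0.75. The field lengths are invented, and the function names are illustrative.

```python
import math

def norm_classic(field_len):
    """Classic field length norm: depends only on the absolute length."""
    return 1.0 / math.sqrt(field_len)

def bm25_tf_with_length(freq, field_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency factor, normalised by length relative to the average."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * field_len / avg_len))

avg_len = 10  # hypothetical average field length in terms
for field_len in (2, 5, 10, 20, 40):
    print(field_len, round(norm_classic(field_len), 3),
          round(bm25_tf_with_length(1, field_len, avg_len), 3))
```

The classic norm rewards any short field regardless of the corpus, whereas BM25 only adjusts a field's weight by how far its length is from the average.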

  62. 62 So why BM25?

  63. 63 Why BM25?

    • Less influence of common words
    • Short fields not auto-boosted
    • Tweakable parameters (beware)
    • Is BM25 better?
      ‒ Literature suggests so
      ‒ Challenges suggest so (TREC, ...)
      ‒ Users say so
      ‒ Lucene developers say so
      ‒ Konrad Beiske says so: Blog “BM25 vs Lucene Default Similarity”
    • But: It depends on the features of your corpus
  64. 64 You’ll see. If you don’t like it, change it to any of: TF/IDF, DFR, DFI, IB, LM
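As a rough illustration of changing the similarity, the sketch below builds an index-creation body that defines a custom BM25 similarity with explicit parameters and assigns it to one field. The similarity name "my_bm25", the mapping type "doc", and the field "title" are hypothetical, and the field type "text" assumes a 5.x index (2.x would use "string"). The body is only printed here, not sent to a cluster.

```python
import json

# Hypothetical names: "my_bm25", "doc", and "title" are placeholders.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                # k1 controls term-frequency saturation, b controls length normalisation.
                "my_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                # Assumes a 5.x "text" field; use "string" on 2.x.
                "title": {"type": "text", "similarity": "my_bm25"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The values shown for k1 and b are the defaults, so this particular body changes nothing until they are tuned, which the slide's "beware" suggests doing with care.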
  65. 65 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25

  66. 66 USE CASE SECURITY

  67. 67 TAP(ping) Out Security Threats at FireEye, Chris Rimondi (@crimondi)
  68. 68 TAP Overview

  69. 69 How do we enable the analyst?

  70. 70 Analyst asks TAP: Citrix connections originating from Russia, China, Ireland and grouped by duration, received bytes, and destination port
  71. 71 Query Language

    class:bro_conn dstipv4:$external_citrix_servers srccountrycode:[ru,cn,ir] connstate:sf | groupby [duration,rcvdipbytes,dstport] 200
  72. 72 Elasticsearch DSL

    {"query":{"filtered":{"filter":{"and":[{"range":{"meta_ts":{"gte":"2016-01-26T17:00:00.000Z","lte":"2016-02-02T17:10:20.478Z"}}},{"term":{"class":"bro_conn"}},{"terms":{"dstipv4":{"index":"lists","type":"indicator","id":"external_citrix_servers","path":"values","cache":false}}},{"terms":{"srccountrycode":["ru","cn","ir"],"execution":"or"}},{"term":{"connstate":"sf"}},{"limit":{"value":208333}}]}}},"aggs":{"groupby:duration_rcvdipbytes_dstport":{"terms":{"lang":"native","script":"join","params":{"fields":["duration","rcvdipbytes","dstport"],"separator":", "},"size":200,"min_doc_count":1,"order":{"_count":"desc"}}}},"size":10,"from":0,"timeout":120000}
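The step from the analyst's query language to the Elasticsearch DSL can be pictured with a toy translator. This is not FireEye's implementation: it is a sketch that handles only `field:value` and `field:[a,b]` terms plus a trailing `groupby [...] N` stage, and it skips the time range, the `$list` lookup, and the limit filter seen in the real request. The function name is hypothetical, and the aggregation body simply mirrors the shape shown on the slide, including its native "join" script.

```python
import json
import re

def tap_to_dsl(query):
    """Translate a simplified TAP-style query into a filtered-query body (sketch)."""
    search, _, pipeline = query.partition("|")
    filters = []
    for field, value in re.findall(r"(\w+):(\[[^\]]*\]|\S+)", search):
        if value.startswith("["):
            # field:[a,b,c] becomes a terms filter over the listed values.
            filters.append({"terms": {field: value[1:-1].split(",")}})
        else:
            filters.append({"term": {field: value}})
    body = {"query": {"filtered": {"filter": {"and": filters}}}}

    match = re.search(r"groupby\s+\[([^\]]*)\]\s*(\d+)?", pipeline)
    if match:
        fields = match.group(1).split(",")
        size = int(match.group(2)) if match.group(2) else 10
        # Mirrors the slide: one terms aggregation keyed on the joined fields.
        body["aggs"] = {
            "groupby:" + "_".join(fields): {
                "terms": {
                    "lang": "native",
                    "script": "join",
                    "params": {"fields": fields, "separator": ", "},
                    "size": size,
                }
            }
        }
    return body

dsl = tap_to_dsl(
    "class:bro_conn srccountrycode:[ru,cn,ir] connstate:sf "
    "| groupby [duration,rcvdipbytes,dstport] 200"
)
print(json.dumps(dsl, indent=2))
```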
  73. 73 Production Footprint

    3.6P Raw Storage: across ~40 production clusters
    700B Indexed Events: in 400+ nodes
    300K EPS: events per second indexed to production, peak 20B/day
  74. 74 How many eggs can we fry with a bad regex query?
  75. 75 Show me credit card data!

    {"query":{"filtered":{"filter":{"and":[{"range":{"meta_ts":{"gte":"2015-10-25T13:00:00.000Z","lte":"2015-10-26T13:37:07.554Z"}}},{"query":{"common":{"metaclass":{"query":"http_proxy","low_freq_operator":"and","high_freq_operator":"and","cutoff_frequency":0.001,"analyzer":"standard"}}}},{"script":{"script":"regexp","lang":"native","params":{"regexp":".*encoding\\\\=.*\\\\&t\\\\=.*\\\\&cc\\\\=.*\\\\&process\\\\=.*\\\\&track\\\\=/","field":"uri","limit":-1}}}]}}},"size":10,"from":0,"timeout":120000}
  76. 76 1080 Cores pegged at 100% CPU for 83 minutes!
  77. 77 Eggs Fried Per Query

    1. Thermal mass for a single egg is 274 J/°C
    2. Integrated temperature from 4 to 80 °C gives us total heat of 274 J/°C * (80 - 4 °C): 20,812 J
    3. D2 series uses Haswell Intel Xeon E5-2673v3 processors. Thermal Design Power: 120 W. We used 8 cores of the 12 cores total, for .75 * 120 W = 90 W per processor, and 90 W * 135 procs: 12,150 W
    4. Total query execution time in seconds, 83 min x 60 s: 4,980 s
    5. Total energy = 12,150 W * 4,980 seconds (length of query): 60.5 MJ
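The egg arithmetic can be replayed in a few lines. The sketch below uses only figures from these slides (1080 cores, 8 cores used per processor, a 0.75 load factor on a 120 W TDP, 83 minutes, 274 J/°C per egg). One caveat: multiplying 274 J/°C by 76 °C gives 20,824 J, while the slide shows 20,812 J, and it is the slide's figure that reproduces the 2,907 eggs quoted next. The variable names are illustrative.

```python
EGG_THERMAL_MASS_J_PER_C = 274      # from the slide
START_TEMP_C, FRIED_TEMP_C = 4, 80  # from the slide
SLIDE_HEAT_PER_EGG_J = 20_812       # value printed on the slide

# Heat needed per egg, computed directly (20,824 J, slightly above the slide's figure).
heat_per_egg_j = EGG_THERMAL_MASS_J_PER_C * (FRIED_TEMP_C - START_TEMP_C)

processors = 1080 // 8               # 1080 busy cores, 8 used per processor
power_w = 0.75 * 120 * processors    # 90 W per processor, as on the slide
seconds = 83 * 60                    # query ran for 83 minutes
energy_j = power_w * seconds

print("power (W):", power_w)
print("energy (MJ):", energy_j / 1e6)
print("eggs, slide heat figure:", int(energy_j / SLIDE_HEAT_PER_EGG_J))
print("eggs, recomputed heat figure:", int(energy_j / heat_per_egg_j))
```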
  78. 78 2,907 Eggs Fried: searching for credit card track data in URIs
  79. 79 Elasticsearch Kids

    [Chart: wakeups per week (axis 0 to 8) by Elasticsearch version (number of kids): 0.9 (3), 1.1 (3), 1.2 (3), 1.3 (3), 1.5 (3), 1.7 (4)]
  80. 80 WATCH IT YOURSELF https://www.elastic.co/elasticon/conf/2016/sf/tapping-out-security-threats-at-fireeye

  81. 81 ALSO WATCH

    Graph Capabilities in the Elastic Stack
    All Quiet on the Digital Front: Security Analytics @ USAA
    OpenSource Connections: The Ghost in the Search Machine
    Grid Monitoring at CERN with the Elastic Stack
    Contributing to Elasticsearch: How to Get Started
    All recordings: https://www.elastic.co/elasticon/conf/2016/sf/
  82. 82 Public Training (training.elastic.co)

    Core Elasticsearch: Operations, STOCKHOLM, Sweden, April 25
    Core Elasticsearch: Developer, STOCKHOLM, Sweden, April 26 - 27
  83. 83 Q&A ASK ME ANYTHING