
What's New In Elasticland?

Elasticsearch Inc
March 17, 2016


Following Elastic{ON}16, this is a recap of the key announcements, followed by brief summaries of the talks on BM25 and FireEye's TAP security platform.


Transcript

  1. 1
    March 2016
    What’s new in Elasticland?
    San Francisco Edition


  2. 2
    Agenda
    1. #elasticon16
    2. BM25
    3. Security with Elastic
    4. Q&A
    Q&A all the time!


  3. 3
    Elastic{ON} 16 – Pier 48, San Francisco, CA
    February 17 – 19, 2016
    Attendees: 1,802
    Days: 3
    Talks: 28
    All recordings on https://www.elastic.co/elasticon/conf/2016/sf


  4. 4
    Speakers


  5. 5


  6. 6


  7. 7


  8. 8
    Updates


  9. 9
    Community Update
    Meetups: 500
    Members: 51,000+
    Downloads: 50M


  10. 10
    You know, for Search


  11. 11
    ELK stack


  12. 12
    Along Came Beats


  13. 13
    BELK?
    KELB?
    ELKB?


  14. 14


  15. 15
    The
    Elastic
    Stack


  16. 16


  17. 17
    Columnar Store


  18. 18
    Java Security Manager


  19. 19
    Profile API


  20. 20
    Pipeline Aggregations
    [Chart: Data Value, Smooth Average, and Upper Control Limit per day; x-axis from Thu 31 to Tue 19 (August), y-axis 10 to 70]
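A chart like this one can be driven by a pipeline aggregation, which post-processes the buckets produced by another aggregation. The following is a minimal sketch of a 2.x-style request body, written as a Python dict. The field names `timestamp` and `value` and the aggregation names are hypothetical, and the upper control limit series from the chart is not reproduced here.

```python
import json

# Hypothetical field names: "timestamp" (date) and "value" (numeric).
request_body = {
    "size": 0,  # only aggregation results are needed, no hits
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {
                # Metric computed inside each daily bucket.
                "data_value": {"avg": {"field": "value"}},
                # Pipeline aggregation: smooths the sibling metric across buckets.
                "smooth_average": {
                    "moving_avg": {
                        "buckets_path": "data_value",
                        "window": 5,
                        "model": "simple",
                    }
                },
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The `moving_avg` aggregation is the form used in the 2.x line; later releases changed this API, so check the documentation for the version you run.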


  21. 21
    Demo Time
    NYC


  22. 22


  23. 23
    Colors!


  24. 24
    Custom Maps


  25. 25
    Global Timezone
    Which
    3 pm?


  26. 26


  27. 27
    Config Reload
    Load from file
    webserver


  28. 28
    75%
    MORE
    AWESOME
    Java Event


  29. 29
    Performance


  30. 30


  31. 31
    Packetbeat
    Capture the
    packet


  32. 32
    Topbeat
    Old-school
    server
    monitoring


  33. 33
    Topbeat
    The
    New World


  34. 34
    Filebeat
    Logstash
    Forwarder
    Filebeat


  35. 35
    Winlogbeat
    Let’s go
    from 1998


  36. 36
    Winlogbeat
    To 2016


  37. 37
    Metricbeat
    MySQL
    metricbeat
    Redis Apache +
    To the Future


  38. 38
    Community
    execbeat elasticbeat redisbeat
    twitterbeat apachebeat nginxbeat
    dockerbeat pingbeat udpbeat


  39. 39
    Versions


  40. 40
    It’s complicated
    es: 1.4 (Nov 5, 2014), 1.5 (May 23, 2015), 1.6 (Jun 9, 2015), 1.7 (Jul 16, 2015)
    kibana: 4.0 (Feb 19, 2015), 4.1 (Jun 10, 2015)
    ls: 1.5 (May 14, 2015)
    beats: 1.0 Beta 1 (May 27, 2015), 1.0 Beta 2 (Jul 13, 2015), 1.0 Beta 3 (Sep 4, 2015)


  41. 41
    Release Bonanza


  42. 42


  43. 43
    Ingest
    “I just want to tail a file.”


  44. 44


  45. 45
    Simple things should be simple


  46. 46
    Ingest Node


  47. 47
    Extensions


  48. 48
    NO EDITIONS
    OPEN SOURCE VS. ENTERPRISE


  49. 49
    Security (Shield)
    Scheduling (Watcher)
    Monitoring (Marvel)
    Graph (coming in 2.3)
    Reporting (coming in 5.0)


  50. 50
    Simple UI to Explore Your Data in New Ways


  51. 51
    Simple API that combines Search and Graph Techniques
    GET /wikipedia/_graph/explore
    {
      "query": {
        "query_string": { "query": "Jack Johnson" }
      },
      "vertices": [{ "field": "artists.raw" }],
      "connections": {
        "vertices": [{ "field": "artists.raw" }]
      }
    }


  52. 52
    Elasticsearch + Kibana as a Service
    Latest release of the Elastic Stack and X-Pack
    FOUND


  53. 53
    Cloud as a Product
    * Not actual packaging


  54. 54
    WATCH IT YOURSELF
    https://www.elastic.co/elasticon/conf/2016/sf/opening-keynote


  55. 55
    Math Time
    TF/IDF and BM25


  56. 56
    Lucene Practical Scoring Function
    $$\text{score}(q, d) = \text{queryNorm}(q) \cdot \text{coord}(q, d) \cdot \sum_{t \in q} \left( \text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot \text{norm}(t, d) \cdot t.\text{getBoost}() \right)$$
    Query normalization factor: normalize the query to compare with results from other queries
    Coordination factor: reward documents with more individual query terms
    Term frequency: how often does the term appear in the doc?
    Inverse document frequency: how often does the term appear in all docs?
    Term boost
    Field length norm: how long is the field?
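To make the factors concrete, here is a minimal sketch of the practical scoring function in Python. It assumes the commonly documented classic Lucene definitions (tf as the square root of the term count, idf as 1 + ln(numDocs / (docFreq + 1)), field length norm as 1 / sqrt(number of terms), queryNorm as 1 / sqrt(sum of squared term weights)) and ignores index-time encoding of norms. The function and parameter names are illustrative, not Lucene's API.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    # Rare terms get a high weight, common terms a low one.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def classic_score(query_terms, doc_terms, doc_freqs, num_docs, boosts=None):
    """Score one non-empty document field against a list of query terms."""
    boosts = boosts or {}
    counts = Counter(doc_terms)

    # queryNorm(q): makes scores from different queries roughly comparable.
    sum_sq = sum(
        (idf(doc_freqs[t], num_docs) * boosts.get(t, 1.0)) ** 2 for t in query_terms
    )
    query_norm = 1.0 / math.sqrt(sum_sq)

    # coord(q, d): fraction of query terms that occur in the document.
    matched = [t for t in query_terms if counts[t] > 0]
    coord = len(matched) / len(query_terms)

    # norm(t, d): shorter fields score higher.
    norm = 1.0 / math.sqrt(len(doc_terms))

    total = sum(
        math.sqrt(counts[t])                # tf(t in d)
        * idf(doc_freqs[t], num_docs) ** 2  # idf(t)^2
        * norm                              # norm(t, d)
        * boosts.get(t, 1.0)                # t.getBoost()
        for t in matched
    )
    return query_norm * coord * total


doc_freqs = {"quick": 50, "fox": 10}  # made-up corpus statistics
both = classic_score(["quick", "fox"], "the quick brown fox".split(), doc_freqs, 1000)
one = classic_score(["quick", "fox"], "the quick brown dog".split(), doc_freqs, 1000)
print(both > one)
```

The document containing both query terms wins twice over: it collects two term contributions, and its coordination factor is 1.0 rather than 0.5.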


  57. 57
    TF/IDF
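    $$\text{score}(q, d) = \text{queryNorm}(q) \cdot \text{coord}(q, d) \cdot \sum_{t \in q} \left( \text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot \text{norm}(t, d) \cdot t.\text{getBoost}() \right)$$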
    TF/IDF


  58. 58
    TF/IDF and BM25
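    $$\text{score}(q, d) = \text{queryNorm}(q) \cdot \text{coord}(q, d) \cdot \sum_{t \in q} \left( \text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot \text{norm}(t, d) \cdot t.\text{getBoost}() \right)$$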


  59. 59
    Term frequency
    BM25: approaches a limit
    TF/IDF: keeps growing
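The saturation effect can be seen in a few lines of Python. This sketch assumes the classic term frequency factor sqrt(freq) and the textbook BM25 factor freq * (k1 + 1) / (freq + k1) with the usual default k1 = 1.2, for a field of average length so that length normalization drops out. The function names are illustrative.

```python
import math


def tf_classic(freq):
    # Classic TF/IDF: grows without bound as the term repeats.
    return math.sqrt(freq)


def tf_bm25(freq, k1=1.2):
    # BM25: approaches k1 + 1 as freq grows, so repetition stops paying off.
    return freq * (k1 + 1) / (freq + k1)


for freq in (1, 2, 5, 10, 100, 1000):
    print(freq, round(tf_classic(freq), 2), round(tf_bm25(freq), 2))
```

With k1 = 1.2 the BM25 factor can never exceed 2.2, while the classic factor reaches 10 at 100 occurrences and keeps climbing.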


  60. 60
    IDF comparison
    TF/IDF
    BM25
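A small sketch of the two IDF curves, assuming the commonly documented formulas: 1 + ln(N / (n + 1)) for the classic similarity and ln(1 + (N - n + 0.5) / (n + 0.5)) for BM25, where N is the number of documents and n is the number of documents containing the term. The function names and the corpus size are illustrative.

```python
import math


def idf_classic(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def idf_bm25(doc_freq, num_docs):
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


num_docs = 1000  # made-up corpus size
for doc_freq in (1, 10, 100, 500, 1000):
    print(
        doc_freq,
        round(idf_classic(doc_freq, num_docs), 3),
        round(idf_bm25(doc_freq, num_docs), 3),
    )
```

For a term that appears in every document, the BM25 value is close to zero while the classic value stays near 1, which is one way to read "less influence of common words".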


  61. 61
    Field length norm
    TF/IDF
    BM25
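Field length enters the two models differently. The sketch below assumes the classic norm 1 / sqrt(field length) and the textbook BM25 term factor, where length is measured relative to the average field length and damped by b (default 0.75). Names and sample lengths are illustrative.

```python
import math


def norm_classic(field_length):
    # Depends only on absolute length: very short fields get a large boost.
    return 1.0 / math.sqrt(field_length)


def tf_bm25(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Length is compared with the corpus average; b=0 turns the effect off.
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))


avg = 100  # made-up average field length
for length in (5, 50, 100, 200, 1000):
    print(length, round(norm_classic(length), 3), round(tf_bm25(1, length, avg), 3))
```

A field of exactly average length yields a BM25 factor of 1.0 for a single occurrence; shorter fields score higher, but the factor is bounded by k1 + 1 instead of growing like 1 / sqrt(length).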


  62. 62
    So why BM25?


  63. 63
    Why BM25?
    • Less influence of common words
    • Short fields not auto-boosted
    • Tweakable parameters (beware)
    • Is BM25 better?
    ‒ Literature suggests so
    ‒ Challenges suggest so (TREC, ...)
    ‒ Users say so
    ‒ Lucene developers say so
    ‒ Konrad Beiske says so: Blog “BM25 vs Lucene Default Similarity”
    • But: It depends on the features of your corpus


  64. 64
    You’ll see.
    If you don’t like it, change it to any of:
    TF/IDF
    DFR
    DFI
    IB
    LM
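Switching similarity is an index-level setting plus a per-field mapping option. The following is a hedged sketch of a 2.x-style index creation body, built as a Python dict. The similarity name `my_bm25`, the type name `doc`, and the field name `title` are hypothetical; mapping syntax and built-in similarity names differ between major versions.

```python
import json

# Body for creating an index with a tuned BM25 similarity on one field.
index_body = {
    "settings": {
        "index": {
            "similarity": {
                # Custom, named similarity; k1 and b are the tweakable parameters.
                "my_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                # The field opts in by referencing the similarity by name.
                "title": {"type": "string", "similarity": "my_bm25"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

Replacing the `type` value with another supported similarity, together with that similarity's own parameters, follows the same pattern.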


  65. 65
    WATCH IT YOURSELF
    https://www.elastic.co/elasticon/conf/2016/sf/improved-text-scoring-with-bm25


  66. 66
    USE CASE
    SECURITY


  67. 67
    Chris Rimondi
    @crimondi
    TAP(ping) Out Security
    Threats at FireEye


  68. 68
    TAP Overview


  69. 69
    How do we enable
    the analyst?


  70. 70
    Analyst asks TAP
    Citrix connections originating
    from Russia, China, Ireland
    and grouped by duration,
    received bytes, and destination
    port


  71. 71
    Query Language
    class:bro_conn dstipv4:$external_citrix_servers srccountrycode:[ru,cn,ir] connstate:sf | groupby [duration,rcvdipbytes,dstport] 200


  72. 72
    Elasticsearch DSL
    {"query":{"filtered":{"filter":{"and":[
      {"range":{"meta_ts":{"gte":"2016-01-26T17:00:00.000Z","lte":"2016-02-02T17:10:20.478Z"}}},
      {"term":{"class":"bro_conn"}},
      {"terms":{"dstipv4":{"index":"lists","type":"indicator","id":"external_citrix_servers","path":"values","cache":false}}},
      {"terms":{"srccountrycode":["ru","cn","ir"],"execution":"or"}},
      {"term":{"connstate":"sf"}},
      {"limit":{"value":208333}}
    ]}}},
    "aggs":{"groupby:duration_rcvdipbytes_dstport":{"terms":{"lang":"native","script":"join","params":{"fields":["duration","rcvdipbytes","dstport"],"separator":","},"size":200,"min_doc_count":1,"order":{"_count":"desc"}}}},
    "size":10,"from":0,"timeout":120000}


  73. 73
    Production Footprint
    Raw Storage: 3.6P, across ~40 production clusters
    Indexed Events: 700B, in 400+ nodes, peak 20B/day
    EPS: 300K events per second indexed to production


  74. 74
    How many
    eggs can we
    fry with a bad
    regex query?


  75. 75
    Show me credit card data!
    {"query":{"filtered":{"filter":{"and":[
      {"range":{"meta_ts":{"gte":"2015-10-25T13:00:00.000Z","lte":"2015-10-26T13:37:07.554Z"}}},
      {"query":{"common":{"metaclass":{"query":"http_proxy","low_freq_operator":"and","high_freq_operator":"and","cutoff_frequency":0.001,"analyzer":"standard"}}}},
      {"script":{"script":"regexp","lang":"native","params":{"regexp":".*encoding\\\\=.*\\\\&t\\\\=.*\\\\&cc\\\\=.*\\\\&process\\\\=.*\\\\&track\\\\=/","field":"uri","limit":-1}}}
    ]}}},
    "size":10,"from":0,"timeout":120000}


  76. 76
    1080 cores pegged at 100% CPU for 83 minutes!


  77. 77
    Eggs Fried Per Query
    1. Thermal mass for a single egg is 274 J/°C.
    2. Integrated temperature from 4 to 80 °C gives us total heat of: 274 J/°C * (80 - 4 °C).
    3. D2 series uses Haswell Intel Xeon E5-2673v3 processors. Thermal Design Power: 120 W. We used 8 cores of the 12 cores total, for 0.75 * 120 W = 90 W per processor; 90 W * 135 procs = 12,150 W.
    4. Total query execution time in seconds: 83 min x 60 s = 4,980 s.
    5. Total energy = 12,150 W * 4,980 seconds (length of query).
    Summary: 274 J/°C | 20,812 J | 12,150 W | 4,980 s | 60.5 MJ


  78. 78
    2,907
    Eggs Fried
    Searching for
    credit card track
    data in URIs
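The egg count can be re-derived from the figures on the previous slide. This is a small worked calculation using the slide's own numbers (20,812 J per egg, a 0.75 load factor on a 120 W TDP, 135 processors, 83 minutes). The variable names are illustrative.

```python
# Figures taken from the slides; variable names are illustrative.
HEAT_PER_EGG_J = 20_812    # slide's figure for heating one egg from 4 to 80 °C
TDP_W = 120                # thermal design power per processor
LOAD_FACTOR = 0.75         # share of TDP attributed to the query on the slide
PROCESSORS = 135           # 1080 cores at 8 cores used per processor
QUERY_SECONDS = 83 * 60    # 83 minutes

power_w = LOAD_FACTOR * TDP_W * PROCESSORS   # total power draw
energy_j = power_w * QUERY_SECONDS           # energy spent on the query
eggs = energy_j / HEAT_PER_EGG_J

print(f"{power_w:,.0f} W for {QUERY_SECONDS:,} s = {energy_j / 1e6:.1f} MJ, "
      f"about {int(eggs):,} eggs")
# prints: 12,150 W for 4,980 s = 60.5 MJ, about 2,907 eggs
```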


  79. 79
    [Chart: wakeups per week (axis 0 to 8) by Elasticsearch version (number of kids): 0.9 (3), 1.1 (3), 1.2 (3), 1.3 (3), 1.5 (3), 1.7 (4). Series: Elasticsearch, Kids]


  80. 80
    WATCH IT YOURSELF
    https://www.elastic.co/elasticon/conf/2016/sf/tapping-out-security-threats-at-fireeye


  81. 81
    ALSO WATCH
    Graph Capabilities in the Elastic Stack
    All Quiet on the Digital Front: Security Analytics @ USAA
    OpenSource Connections: The Ghost in the Search Machine
    Grid Monitoring at CERN with the Elastic Stack
    Contributing to Elasticsearch: How to Get Started
    All recordings: https://www.elastic.co/elasticon/conf/2016/sf/


  82. 82
    Core Elasticsearch:
    Operations
    STOCKHOLM, Sweden April 25
    Core Elasticsearch:
    Developer
    STOCKHOLM, Sweden April 26 - 27
    training.elastic.co
    Public Training


  83. 83
    Q&A
    ASK ME ANYTHING
