ElasticSearch Data Exploration in Your Terminal

You've seen the pretty graphs. Visuals are great for signaling there is a problem somewhere in your system. How do you, a command line guru, go from pretty graphs to root cause analysis? Most likely you'll be reaching for paradigms from the command line: composability and a flexible, compact syntax to ask your questions. I'd like to talk more about integrating ElasticSearch-based dashboards back to the command line workflows I love.

This talk is an overview of a tool I developed while working at Booking.com to drastically reduce the time and complexity of performing incident response against rich, structured data in ElasticSearch. It was developed with the help of the security and fraud teams to perform the ad hoc queries critical for incident response. The tool served the team well and has been under active development ever since. It continues to grow in capabilities aimed at making ad hoc analysis simple, easy, and accessible to hardened command line jockeys and command line newbies alike.

Join me to learn how to bring the logging data you love back to your terminal!

Brad Lhotsky

October 22, 2019

Transcript

  1. ELASTICSEARCH DATA EXPLORATION IN YOUR TERMINAL
    Things you never knew you needed until it was too late @reyjrar

  2. TRIGGER WARNING
    4 - 7 - 8 (Inhale) - (Hold) - (Exhale)

  3. WHY?

  4. WE HAVE KIBANA
    It’s an Elastic Product too!

  5. WE HAVE GRAFANA
    Such pretty, much datasources!

  6. AND NOW WE HAVE LOKI OR WHATEVER
    Flip between logs and graphs, oh my.

  7. WHY DRAG THE TERMINAL INTO THIS?

  8. MICE SLOW ME DOWN
    If you prefer a browser, cool.

  9. MY BROWSER IS NOT AN IDE
    ➤ Which browser?
    ➤ Are you using privacy extensions?
    ➤ What happens when I hit “Backspace”
    ➤ Don’t get me started on “gestures”
    ➤ Bloaty and slow
    ➤ Prone to distraction

  10. THE CLI IS A WORKSPACE
    ➤ Which shell?
    ➤ Tab Autocomplete
    ➤ ReadLine
    ➤ dotfiles
    ➤ Access to a plethora of interoperable tools
    ➤ OK, I could MUD from my terminal
    ➤ Otherwise, fairly purpose built

  11. MY TERMINAL IS WHERE I WORK

  12. GUIDING LIGHTS
    Things that mean something to me

  13. “There are a finite number of key strokes before you die, use them wisely.”
    - A Wise Programmer

  14. UNIX PHILOSOPHY
    ➤ Do One Thing Well
    ➤ Assume output will be used as input and vice versa
    ➤ Favor the creation of tools or scripts, even for seemingly one-off jobs

  15. PERL
    ➤ Easy things easy, hard things possible
    ➤ Sloppy and unpredictable uses just like natural languages
    ➤ Grow with you
    ➤ DWIM
    ➤ TIMTOWTDI
    ➤ The CPAN
  16. EXPLORATION being places you weren't intended to be

  17. I SAW THE POWER OF ELASTICSEARCH EARLY

  18. BUT IT WASN’T VERY CLI OR OPS FRIENDLY

  19. SO I DID A PERL
    App::ElasticSearch::Utilities
    https://github.com/reyjrar/es-utils
    https://metacpan.org/pod/App::ElasticSearch::Utilities

  20. ES-UTILS
    ➤ Monitoring
    ➤ Maintenance
    ➤ Status and Informational Tools
    ➤ Built as a Reusable Perl functional library
    ➤ ES Version Agnostic
    ➤ Assumes index-%Y.%m.%d index names
    ➤ And then came...
    ➤ es-search.pl

  21. OPTIONAL: INSTALL PERLBREW
    # Install perlbrew
    curl -L https://install.perlbrew.pl \
        | bash
    # Setup perlbrew
    perlbrew install -j8 -n 5.30.0
    perlbrew switch 5.30.0
    perlbrew install-cpanm

  22. INSTALLATION
    cpanm App::ElasticSearch::Utilities

  23. ES-SEARCH.PL
    Bringing ElasticSearch to your terminal since 2012

  24. GETTING STARTED: RTFM
    es-search.pl --help
    es-search.pl --manual

  25. ILL ADVISED LIVE DEMO

  26. GETTING STARTED: CONNECTING
    # Defaults
    es-search.pl --host localhost --port 9200
    # Connect to es-node01
    es-search.pl --host es-node01

  27. GETTING STARTED: CONNECT PREFERENCES
    cat ~/.es-utils.yml
    ---
    host: es-gateway.corp.company.com
    port: 443
    proto: https
    http-username: bob
    password-exec: ~/bin/get-es-password.sh

  28. SOME HELPFUL NOTES
    ➤ Searches are constrained by the calendar date in the index name
    ➤ Use --days 7 for opening scope to 7 days
    ➤ Searches will stop once they receive --size 20 results
    ➤ Use --all to get all results across full timespan
    ➤ Sort order is descending, override with --asc
    ➤ Target a specific index with --index logstash-2019.10.21
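To make the date scoping in these notes concrete, here is a minimal Python sketch of how a window of days could map onto daily index names. It is not the tool's Perl implementation: `daily_indices` is a hypothetical helper, the `logstash` base and the index-%Y.%m.%d naming are taken from the slides, and counting today as the first day of the window is an assumption.

```python
from datetime import date, timedelta


def daily_indices(base, days, today):
    """Return daily index names, newest first, for a window of `days` days.

    Assumption: the window includes `today` as its first day.
    """
    return [
        f"{base}-{(today - timedelta(days=offset)).strftime('%Y.%m.%d')}"
        for offset in range(days)
    ]


# Indices a three day window would touch on the day before the talk.
print(daily_indices("logstash", 3, date(2019, 10, 21)))
```

A search scoped this way never reads indices outside the window, which is why widening the day count is the first thing to try when an expected document does not show up.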
  29. BOUND TO FAIL LIVE DEMO

  30. GETTING STARTED: INDEX SELECTION
    # List index basenames
    $ es-search.pl --bases
    Bases available for search:
    access security syslog
    # Bases: 1 from a combined 61 indices.

  31. GETTING STARTED: SHOW ME MONEY DATA
    # Show all the fields in the base
    es-search.pl --base log --fields
    # Show most recently indexed doc
    es-search.pl --base log --size 1

  32. GETTING STARTED: SET A DEFAULT "BASE"
    cat ~/.es-utils.yml
    ---
    host: es-gateway.corp.company.com
    base: log
    days: 1

  33. GETTING STARTED: TIMESTAMP DETECTION
    # Specify timestamp
    es-search.pl --base log \
        --timestamp timestamp
    cat ~/.es-utils.yml
    ---
    base: log
    timestamp: timestamp

  34. GETTING STARTED: TIMESTAMP PREFERENCES
    cat ~/.es-utils.yml
    ---
    base: log
    # Global default timestamp field
    timestamp: timestamp
    # Per base settings
    meta:
      logstash:
        timestamp: '@timestamp'

  35. GETTING STARTED: SHOW ME MONEY DATA QUICKER
    # Show most recently indexed doc
    es-search.pl --size 1

  36. GETTING STARTED: SHOW ME MONEY DATA MORE LIKE LOGS
    # Show just selected fields
    es-search.pl --show hostname,program,message

  37. GETTING STARTED: SEARCH FOR MATCHING DOCS
    # Search for program:sshd
    es-search.pl program:sshd

  38. GETTING STARTED: COMPLEX QUERIES
    # Search for sshd and ip 1.2.3.4
    es-search.pl program:sshd AND src_ip:1.2.3.4

  39. GETTING STARTED: SEARCH OPTIMIZATIONS
    # Search for sshd and ip 1.2.3.4
    es-search.pl program:sshd src_ip:1.2.3.4
    App::ElasticSearch::Utilities::QueryString uses a default join for dangling search terms of 'AND'
    # Search for sshd or ip 1.2.3.4
    es-search.pl --or program:sshd src_ip:1.2.3.4
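The default join for dangling terms can be pictured with a short Python sketch. This only illustrates the idea and is not how App::ElasticSearch::Utilities::QueryString is written: `join_terms` is a hypothetical helper, and treating lowercase `and`, `or`, and `not` as operators is an assumption made for the sketch.

```python
OPERATORS = {"AND", "OR", "NOT"}


def join_terms(terms, default="AND"):
    """Join search terms, inserting `default` wherever no operator was given."""
    out = []
    for term in terms:
        # Treat bare and/or/not as operators regardless of case.
        token = term.upper() if term.upper() in OPERATORS else term
        # A join is needed when the previous token is an operand and the
        # current token is not itself a binary operator.
        if out and out[-1] not in OPERATORS and token not in ("AND", "OR"):
            out.append(default)
        out.append(token)
    return " ".join(out)


print(join_terms(["program:sshd", "src_ip:1.2.3.4"]))
print(join_terms(["program:sshd", "src_ip:1.2.3.4"], default="OR"))
```

Passing `default="OR"` plays the role that `--or` plays on the command line: the terms stay the same and only the implied operator changes.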
  40. GETTING STARTED: I WANT TO USE JQ
    # Make output pipe friendly to jq
    es-search.pl program:sshd --exists src_ip \
        --jq | jq .src_ip | sort | uniq -c
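For readers less at home with `jq | sort | uniq -c`, the same tally can be sketched in Python. The sketch assumes the pipe-friendly output is one JSON document per line, which the slide does not state outright; `count_field` and the sample documents are hypothetical.

```python
import json
from collections import Counter


def count_field(lines, field):
    """Count how often each value of `field` appears across JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        doc = json.loads(line)
        if field in doc:
            counts[doc[field]] += 1
    return counts


# Made-up documents standing in for search results.
sample = [
    '{"program": "sshd", "src_ip": "1.2.3.4"}',
    '{"program": "sshd", "src_ip": "5.6.7.8"}',
    '{"program": "sshd", "src_ip": "1.2.3.4"}',
]
for value, seen in count_field(sample, "src_ip").most_common():
    print(seen, value)
```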
  41. EVEN MORE OPTIMIZATIONS Short cuts to save key strokes

  42. WHY WOULD YOU TRY A LIVE DEMO

  43. QUERY STRING EXTENSIONS: BARE WORDS
    # and, or, not uppercased
    es-search.pl not program:sshd

  44. QUERY STRING EXTENSIONS: IP
    # Use CIDR Notation for IPs
    es-search.pl src_ip:10.0.0.0/8
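As a reminder of what a CIDR block such as 10.0.0.0/8 covers, here is a small Python sketch using the standard `ipaddress` module. It says nothing about how es-search.pl translates the term for ElasticSearch; `cidr_bounds` and `in_cidr` are hypothetical helpers for illustration.

```python
import ipaddress


def cidr_bounds(cidr):
    """Return the first and last address of a CIDR block as strings."""
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network.network_address), str(network.broadcast_address)


def in_cidr(address, cidr):
    """Report whether `address` falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr, strict=False)


print(cidr_bounds("10.0.0.0/8"))
print(in_cidr("10.20.30.40", "10.0.0.0/8"))
```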
  45. QUERY STRING EXTENSIONS: RANGE
    # Range and range combos
    es-search.pl dst_port:'<1024'
    es-search.pl status:'<500,>=400'
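One way to picture the range shorthand is as a translation into an ElasticSearch `range` query clause, which uses the keys `lt`, `lte`, `gt`, and `gte`. The Python below is a sketch under that assumption, not the tool's parser; `range_clause` is a hypothetical name and only numeric bounds are handled.

```python
import re

# Map comparison symbols onto the keys of an ElasticSearch range query.
_BOUND_KEYS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}
_PART = re.compile(r"\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*")


def range_clause(field, spec):
    """Turn a spec such as '<500,>=400' into a range query clause."""
    bounds = {}
    for part in spec.split(","):
        match = _PART.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized range part: {part!r}")
        symbol, number = match.groups()
        bounds[_BOUND_KEYS[symbol]] = float(number) if "." in number else int(number)
    return {"range": {field: bounds}}


print(range_clause("status", "<500,>=400"))
```

The single quotes around the range on the command line matter: without them the shell would treat `<` and `>` as redirections before the tool ever saw them.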
  46. QUERY STRING EXTENSIONS: TERMS PROMOTION
    # Don't stress the Lucene escapes
    es-search.pl =exec:/bin/bash

  47. QUERY STRING EXTENSIONS: PREFIX
    # String prefixes
    es-search.pl _prefix_:user_agent:"Go"

  48. QUERY STRING EXTENSIONS: TERMS IN A FILE
    # Build terms from a TSV file, last column
    es-search.pl src_ip:badguys.dat
    # Build terms from a TSV file, first column
    es-search.pl src_ip:badguys.dat[0]
    # Build terms from a CSV file, last column
    es-search.pl src_ip:badguys.csv
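The idea of building terms from one column of a delimited file can be sketched in Python as follows. This is an illustration only: `column_terms` and `terms_clause` are hypothetical helpers, the sample rows are made up, and the `terms` query shape is standard ElasticSearch rather than something confirmed by the slides.

```python
import csv
import io


def column_terms(text, delimiter="\t", column=-1):
    """Collect one column (last by default) from delimited text."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row[column].strip() for row in rows if row]


def terms_clause(field, values):
    """Wrap the values in an ElasticSearch terms query clause."""
    return {"terms": {field: list(values)}}


# Made-up file content: date, label, address.
tsv = "2019-10-21\tscanner\t1.2.3.4\n2019-10-21\tbrute\t5.6.7.8\n"
print(terms_clause("src_ip", column_terms(tsv)))
```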
  49. QUERY STRING EXTENSIONS: TERMS IN A JSON DATA SET
    # Build terms from an NDJSON file .ip
    es-search.pl src_ip:threatfeed.json[ip]
    # Build terms from an NDJSON file nested field
    es-search.pl src_ip:threatfeed.json[actor.ip]
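Pulling a possibly nested field such as `actor.ip` out of an NDJSON feed can be sketched in a few lines of Python. `ndjson_terms` is a hypothetical helper and the feed lines are made up; the tool's own handling of missing or non-scalar values may differ.

```python
import json


def ndjson_terms(lines, path):
    """Collect the value at a dotted `path` from each JSON line that has it."""
    keys = path.split(".")
    values = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the feed
        node = json.loads(line)
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node is not None:
            values.append(node)
    return values


# Made-up threat feed entries.
feed = [
    '{"ip": "1.2.3.4", "actor": {"ip": "9.9.9.9"}}',
    '{"ip": "5.6.7.8"}',
]
print(ndjson_terms(feed, "ip"))
print(ndjson_terms(feed, "actor.ip"))
```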
  50. CAN HAZ AGGREGATIONS i thought you'd never ask!

  51. AGGREGATION CAVEATS
    ➤ Supported during "facets" and ES 0.17
    ➤ Early versions of ES, up to v2.x were splodey
    ➤ Some limitations which I'm slowly rolling back
    ➤ per day
    ➤ Top aggregation must be a bucket
    ➤ Limited to 2 levels deep
    ➤ Well, 3 in a certain instance
  52. TIME TO BURN SAGE LIVE DEMO

  53. AGGREGATIONS: TOP THING
    # Top 20 5xx-ing uri
    es-search.pl --top uri 'status:>=500'
    # Top 50 5xx-ing uri
    es-search.pl --top uri 'status:>=500' --size 50
    es-search.pl --top uri 'status:>=500' --limit 50
    es-search.pl --top uri 'status:>=500' -n 50

  54. AGGREGATIONS: TOP THING PER HOUR
    # Top 20 uri per hour
    es-search.pl --top uri --interval 1h

  55. AGGREGATIONS: TOP THING WITH ANOTHER THING
    # Top 20 uri with the top 3 countries
    es-search.pl --top uri --with src_country
    # Top 20 uri with the top 10 countries
    es-search.pl --top uri --with src_country:10

  56. AGGREGATIONS: TOP THING BY SOMETHING OTHER THAN DOC COUNT
    # Top 20 uri by the cardinality of country
    es-search.pl --top uri \
        --by cardinality:src_country
    # Top 20 ip by the total traffic
    es-search.pl --top src_ip \
        --by sum:out_bytes
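For orientation, a "top N by some metric" question corresponds roughly to an ElasticSearch `terms` aggregation ordered by a metric sub-aggregation. The Python sketch below builds such a request body; it is a plausible shape rather than the request es-search.pl actually sends, and `top_terms_request` plus the aggregation names `top` and `by` are hypothetical.

```python
def top_terms_request(field, size=20, by=None, query=None):
    """Build a search body asking for the top `size` values of `field`.

    `by` is an optional 'metric:field' string such as 'sum:out_bytes';
    when given, buckets are ordered by that metric instead of doc count.
    """
    terms = {"field": field, "size": size}
    aggregation = {"terms": terms}
    if by is not None:
        metric, metric_field = by.split(":", 1)
        terms["order"] = {"by": "desc"}
        aggregation["aggs"] = {"by": {metric: {"field": metric_field}}}
    body = {"size": 0, "aggs": {"top": aggregation}}  # size 0: buckets only, no hits
    if query is not None:
        body["query"] = query
    return body


print(top_terms_request("src_ip", by="sum:out_bytes"))
```

Ordering by a sub-aggregation only works for single-value metrics, which `sum` and `cardinality` both are.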
  57. AGGREGATIONS: WHERE'S MY DATA GOING
    # Top 20 ip by the total traffic
    # With top uri's
    es-search.pl --top src_ip \
        --by sum:out_bytes \
        --with uri:1

  58. AGGREGATIONS: STATISTICS ANYONE?
    # Top 20 uri by average render time
    # with a statistical summary
    es-search.pl --top uri \
        --by avg:render_ms \
        --with stats:render_ms

  59. AGGREGATIONS: PERCENTILES, TOO
    # Top 20 uri by average render time
    # With median, 90, and 99th percentile
    es-search.pl --top uri \
        --by avg:render_ms \
        --with percentiles:render_ms:50,90,99

  60. AGGREGATIONS: I GOT YOUR HISTOGRAMS
    # Top 20 uri by average render time
    # with histogram of 100ms
    es-search.pl --top uri \
        --by avg:render_ms \
        --with histogram:render_ms:100
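The `stats`, `percentiles`, and `histogram` specs above line up with ElasticSearch aggregations of the same names. As a rough Python sketch of that correspondence (not the tool's parser; `sub_aggregation` is a hypothetical helper and field names containing colons are not handled):

```python
def sub_aggregation(spec):
    """Translate 'kind:field[:argument]' into an aggregation clause."""
    parts = spec.split(":")
    if len(parts) < 2:
        raise ValueError(f"expected kind:field, got {spec!r}")
    kind, field = parts[0], parts[1]
    argument = parts[2] if len(parts) > 2 else None

    if kind == "stats":
        return {"stats": {"field": field}}
    if kind == "percentiles":
        clause = {"field": field}
        if argument:
            # Comma-separated percentile ranks, e.g. 50,90,99
            clause["percents"] = [float(p) for p in argument.split(",")]
        return {"percentiles": clause}
    if kind == "histogram":
        if argument is None:
            raise ValueError("histogram needs an interval")
        return {"histogram": {"field": field, "interval": float(argument)}}
    raise ValueError(f"unsupported aggregation: {kind!r}")


print(sub_aggregation("percentiles:render_ms:50,90,99"))
print(sub_aggregation("histogram:render_ms:100"))
```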
  61. AGGREGATIONS: I'M ALL ABOUT SIGNIFICANCE
    # Top 20 significant uri for search
    es-search.pl --top significant_terms:uri \
        'render_ms:>1000' src_country:US
    # Top 20 significant uri for search,
    # Background is only US
    es-search.pl --top significant_terms:uri \
        'render_ms:>1000' src_country:US \
        --bg-filter src_country:US
  62. ONE MORE THING Well, maybe more than one more thing..

  63. BUILT WITH CLI::HELPERS
    ➤ General purpose, functional library for developing command line utilities in Perl
    ➤ Handles input
    ➤ Provides output customization including color support, --color
    ➤ Allow output tagged as data to be redirected into a file, --data-file=output.dat
    ➤ NoPaste support via App::NoPaste and --no-paste

  64. NOTES ON APP::NOPASTE
    ➤ CLI::Helpers will only paste to a service flagged as "public" if you specify --no-paste-public
    ➤ Subclass an App::NoPaste::Service object for your internal paste service, it's pretty simple
    ➤ Easily share things with colleagues directly from the command line

  65. PUTTING SOME THINGS TOGETHER
    # Let's say we have a list of bad ip
    es-search.pl --top src_ip \
        _prefix_:path:\/admin \
        'status:<400' \
        src_ip:threatfeed.json[ip] \
        --data-file=insidethehouse.dat

  66. PUTTING SOME THINGS TOGETHER
    # Dump a full log of what they've done
    es-search.pl src_ip:insidethehouse.dat \
        --show src_ip,src_user,uri,out_bytes \
        --all
    # Share with your colleagues
    es-search.pl src_ip:insidethehouse.dat \
        --show src_ip,src_user,uri,out_bytes \
        --all --no-paste

  67. FUTURE PLANS
    ➤ Arbitrary levels of nested aggregations
    ➤ JSON output for aggregations
    ➤ Better support for nested documents
    ➤ Arbitrary data joins at query time: rdns, whois, db lookups, etc.
    ➤ <your idea here>
  68. Thank you!
    brad.lhotsky@gmail.com
    https://twitter.com/reyjrar
    https://github.com/reyjrar
    https://speakerdeck.com/reyjrar
    https://www.craigslist.org/about/craigslist_is_hiring
    https://www.craigslist.org/about/cl_app_beta