ElasticSearch Data Exploration in Your Terminal

You've seen the pretty graphs. Visuals are great for signaling there is a problem somewhere in your system. How do you, a command line guru, go from pretty graphs to root cause analysis? Most likely you'll be reaching for paradigms from the command line: composability and a flexible, compact syntax to ask your questions. I'd like to talk more about integrating ElasticSearch-based dashboards back to the command line workflows I love.

This talk is an overview of a tool I developed while working at Booking.com to drastically reduce the time and complexity of performing incident response against rich, structured data in ElasticSearch. It was developed with the help of the security and fraud teams to perform ad hoc queries critical for incident response. The tool served the team well and it's been under active development ever since. It continues to grow in capabilities aimed at making ad hoc analysis simple, easy, and accessible to hardened command line jockeys and command line newbies.

Join me to learn how to bring the logging data you love back to your terminal!

Brad Lhotsky

October 22, 2019
Transcript

  1. MY BROWSER IS NOT AN IDE
    ➤ Which browser?
    ➤ Are you using privacy extensions?
    ➤ What happens when I hit “Backspace”
    ➤ Don’t get me started on “gestures”
    ➤ Bloaty and slow
    ➤ Prone to distraction
    @reyjrar
  2. THE CLI IS A WORKSPACE
    ➤ Which shell?
    ➤ Tab Autocomplete
    ➤ ReadLine
    ➤ dotfiles
    ➤ Access to a plethora of interoperable tools
    ➤ OK, I could MUD from my terminal
    ➤ Otherwise, fairly purpose built
  3. “There are a finite number of key strokes before you die, use them wisely.” - A Wise Programmer
  4. UNIX PHILOSOPHY
    ➤ Do One Thing Well
    ➤ Assume output will be used as input and vice versa
    ➤ Favor the creation of tools or scripts, even for seemingly one-off jobs
  5. PERL
    ➤ Easy things easy, hard things possible
    ➤ Sloppy and unpredictable uses just like natural languages
    ➤ Grows with you
    ➤ DWIM
    ➤ TIMTOWTDI
    ➤ The CPAN
  6. ES-UTILS
    ➤ Monitoring
    ➤ Maintenance
    ➤ Status and Informational Tools
    ➤ Built as a Reusable Perl functional library
    ➤ ES Version Agnostic
    ➤ Assumes index-%Y.%m.%d index names
    ➤ And then came...
    ➤ es-search.pl
  7. OPTIONAL: INSTALL PERLBREW
    # Install perlbrew
    curl -L https://install.perlbrew.pl \
      | bash
    # Setup perlbrew
    perlbrew install -j8 -n 5.30.0
    perlbrew switch 5.30.0
    perlbrew install-cpanm
  8. GETTING STARTED: CONNECTING
    # Defaults
    es-search.pl --host localhost --port 9200
    # Connect to es-node01
    es-search.pl --host es-node01
  9. GETTING STARTED: CONNECT PREFERENCES
    cat ~/.es-utils.yml
    ---
    host: es-gateway.corp.company.com
    port: 443
    proto: https
    http-username: bob
    password-exec: ~/bin/get-es-password.sh
  10. SOME HELPFUL NOTES
    ➤ Searches are constrained by the calendar date in the index name
    ➤ Use --days 7 for opening scope to 7 days
    ➤ Searches will stop once they receive --size 20 results
    ➤ Use --all to get all results across full timespan
    ➤ Sort order is descending, override with --asc
    ➤ Target a specific index with --index logstash-2019.10.21
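To make the date constraint concrete, here is a minimal Python sketch of how a day-based scope could translate into daily index names. It is not the tool's code: the function name `indices_for_days` is hypothetical, the `-` separator follows the `index-%Y.%m.%d` convention mentioned earlier in the deck, and it assumes that N days means today plus the N-1 days before it.

```python
from datetime import date, timedelta


def indices_for_days(base, days, today):
    """Return daily index names, newest first, covering `days` calendar days.

    Hypothetical helper: assumes names look like <base>-YYYY.MM.DD and that
    the span includes `today` itself.
    """
    return [
        f"{base}-{(today - timedelta(days=offset)).strftime('%Y.%m.%d')}"
        for offset in range(days)
    ]


print(indices_for_days("logstash", 3, date(2019, 10, 21)))
# ['logstash-2019.10.21', 'logstash-2019.10.20', 'logstash-2019.10.19']
```

Under these assumptions a search scoped to 7 days would touch seven indices, which is why widening the scope costs more than a search against a single named index.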
  11. GETTING STARTED: INDEX SELECTION
    # List index basenames
    $ es-search.pl --bases
    Bases available for search: access security syslog
    # Bases: 1 from a combined 61 indices.
  12. GETTING STARTED: SHOW ME MONEY DATA
    # Show all the fields in the base
    es-search.pl --base log --fields
    # Show most recently indexed doc
    es-search.pl --base log --size 1
  13. GETTING STARTED: SET A DEFAULT "BASE"
    cat ~/.es-utils.yml
    ---
    host: es-gateway.corp.company.com
    base: log
    days: 1
  14. GETTING STARTED: TIMESTAMP DETECTION
    # Specify timestamp
    es-search.pl --base log \
        --timestamp timestamp
    cat ~/.es-utils.yml
    ---
    base: log
    timestamp: timestamp
  15. GETTING STARTED: TIMESTAMP PREFERENCES
    cat ~/.es-utils.yml
    ---
    base: log
    # Global default timestamp field
    timestamp: timestamp
    # Per base settings
    meta:
      logstash:
        timestamp: '@timestamp'
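One way to read the timestamp preferences is as a two-level lookup: a per-base setting wins, and the global default applies otherwise. The Python sketch below models that reading with a plain dict mirroring the YAML; the function name `timestamp_field` and the precedence order are assumptions for illustration, not taken from the tool's source.

```python
# Mirrors the YAML preferences shown in the deck.
config = {
    "base": "log",
    "timestamp": "timestamp",  # global default timestamp field
    "meta": {"logstash": {"timestamp": "@timestamp"}},  # per-base settings
}


def timestamp_field(config, base):
    """Pick the timestamp field for `base` (hypothetical precedence:
    per-base setting first, then the global default)."""
    per_base = config.get("meta", {}).get(base, {})
    return per_base.get("timestamp", config.get("timestamp"))


print(timestamp_field(config, "logstash"))  # @timestamp
print(timestamp_field(config, "log"))       # timestamp
```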
  16. GETTING STARTED: SHOW ME MONEY DATA QUICKER
    # Show most recently indexed doc
    es-search.pl --size 1
  17. GETTING STARTED: SHOW ME MONEY DATA MORE LIKE LOGS
    # Show just selected fields
    es-search.pl --show hostname,program,message
  18. GETTING STARTED: COMPLEX QUERIES
    # Search for sshd and ip 1.2.3.4
    es-search.pl program:sshd AND src_ip:1.2.3.4
  19. GETTING STARTED: SEARCH OPTIMIZATIONS
    # Search for sshd and ip 1.2.3.4
    es-search.pl program:sshd src_ip:1.2.3.4
    App::ElasticSearch::Utilities::QueryString joins dangling search terms with 'AND' by default
    # Search for sshd or ip 1.2.3.4
    es-search.pl --or program:sshd src_ip:1.2.3.4
  20. GETTING STARTED: I WANT TO USE JQ
    # Make output pipe friendly to jq
    es-search.pl program:sshd --exists src_ip \
        --jq | jq .src_ip | sort | uniq -c
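For readers less familiar with the `jq .src_ip | sort | uniq -c` idiom, the counting step can be sketched in Python. This assumes the `--jq` output is one JSON document per line, which the deck implies but does not spell out; the sample documents and the function name `count_field` are made up for illustration.

```python
import json
from collections import Counter


def count_field(ndjson_lines, field):
    """Count occurrences of each value of `field` across JSON lines.

    Unlike `jq .src_ip`, which prints null for a missing field, this sketch
    simply skips documents that lack the field.
    """
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        if field in doc:
            counts[doc[field]] += 1
    return counts


# Invented sample documents, one per line.
sample = [
    '{"program": "sshd", "src_ip": "1.2.3.4"}',
    '{"program": "sshd", "src_ip": "5.6.7.8"}',
    '{"program": "sshd", "src_ip": "1.2.3.4"}',
]
for ip, n in count_field(sample, "src_ip").most_common():
    print(n, ip)
```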
  21. QUERY STRING EXTENSIONS: BARE WORDS
    # and, or, not uppercased
    es-search.pl not program:sshd
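The default join and the bare-word handling can be pictured as a small token pass over the command line arguments. The Python sketch below is a simplification and not the App::ElasticSearch::Utilities::QueryString implementation: the function name `normalize` is hypothetical, and it ignores quoting and parentheses.

```python
OPERATORS = {"AND", "OR", "NOT"}


def normalize(tokens, default_join="AND"):
    """Uppercase bare and/or/not and join adjacent terms with `default_join`."""
    out = []
    for tok in tokens:
        if tok.upper() in OPERATORS:
            tok = tok.upper()
        elif out and out[-1] not in OPERATORS:
            # Two search terms in a row: insert the default operator.
            out.append(default_join)
        out.append(tok)
    return " ".join(out)


print(normalize(["program:sshd", "src_ip:1.2.3.4"]))
# program:sshd AND src_ip:1.2.3.4
print(normalize(["program:sshd", "src_ip:1.2.3.4"], default_join="OR"))
# program:sshd OR src_ip:1.2.3.4
print(normalize(["not", "program:sshd"]))
# NOT program:sshd
```

Passing `default_join="OR"` here stands in for what the `--or` flag does on the command line.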
  22. QUERY STRING EXTENSIONS: IP
    # Use CIDR Notation for IPs
    es-search.pl src_ip:10.0.0.0/8
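As a reminder of what a CIDR block like `10.0.0.0/8` covers, this short Python sketch checks membership with the standard `ipaddress` module. It only illustrates the matching semantics; how the tool translates the notation into an ElasticSearch query is not shown in the deck, and `in_cidr` is a hypothetical helper name.

```python
import ipaddress


def in_cidr(ip, cidr):
    """Return True when `ip` falls inside the network written as `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


print(in_cidr("10.1.2.3", "10.0.0.0/8"))     # True
print(in_cidr("192.168.1.1", "10.0.0.0/8"))  # False
```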
  23. QUERY STRING EXTENSIONS: RANGE
    # Range and range combos
    es-search.pl dst_port:'<1024'
    es-search.pl status:'<500,>=400'
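The range shorthand maps naturally onto an ElasticSearch `range` query with `lt`, `lte`, `gt`, and `gte` bounds. The Python sketch below shows one plausible translation; the parser and the name `range_clause` are assumptions for illustration, not the tool's code.

```python
import re

# Comparison operator -> ElasticSearch range query key.
_OPS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}


def range_clause(field, spec):
    """Turn a spec such as '<500,>=400' into a range query for `field`."""
    bounds = {}
    for part in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized range expression: {part!r}")
        op, number = match.groups()
        bounds[_OPS[op]] = float(number) if "." in number else int(number)
    return {"range": {field: bounds}}


print(range_clause("status", "<500,>=400"))
# {'range': {'status': {'lt': 500, 'gte': 400}}}
```

Note that the quotes around the expression on the command line matter: without them the shell treats `<` and `>` as redirections.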
  24. QUERY STRING EXTENSIONS: TERMS PROMOTION
    # Don't stress the Lucene escapes
    es-search.pl =exec:/bin/bash
  25. QUERY STRING EXTENSIONS: TERMS IN A FILE
    # Build terms from a TSV file, last column
    es-search.pl src_ip:badguys.dat
    # Build terms from a TSV file, first column
    es-search.pl src_ip:badguys.dat[0]
    # Build terms from a CSV file, last column
    es-search.pl src_ip:badguys.csv
  26. QUERY STRING EXTENSIONS: TERMS IN A JSON DATA SET
    # Build terms from an NDJSON file, field .ip
    es-search.pl src_ip:threatfeed.json[ip]
    # Build terms from an NDJSON file, nested field
    es-search.pl src_ip:threatfeed.json[actor.ip]
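The file-based extensions boil down to "pull one value out of every record, then search for any of them". Here is a hedged Python sketch of that idea, producing an ElasticSearch `terms` query. The helper names are hypothetical, the sample data is invented, and the real tool may handle headers, duplicates, or large files differently.

```python
import csv
import io
import json


def terms_from_delimited(text, column=-1, delimiter="\t"):
    """Collect one column (last by default) from TSV or CSV text."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row[column] for row in rows if row]


def terms_from_ndjson(text, path):
    """Collect a possibly nested field such as 'actor.ip' from NDJSON text."""
    terms = []
    for line in text.splitlines():
        if not line.strip():
            continue
        value = json.loads(line)
        for key in path.split("."):
            value = value[key]
        terms.append(value)
    return terms


def terms_query(field, values):
    """Build a terms query that matches any of `values`."""
    return {"terms": {field: values}}


# Invented sample data.
tsv = "2019-10-21\t1.2.3.4\n2019-10-21\t5.6.7.8\n"
feed = '{"actor": {"ip": "9.9.9.9"}}\n{"actor": {"ip": "8.8.4.4"}}\n'

print(terms_query("src_ip", terms_from_delimited(tsv)))
print(terms_query("src_ip", terms_from_ndjson(feed, "actor.ip")))
```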
  27. AGGREGATION CAVEATS
    ➤ Supported during "facets" and ES 0.17
    ➤ Early versions of ES, up to v2.x were splodey
    ➤ Some limitations which I'm slowly rolling back
    ➤ per day
    ➤ Top aggregation must be a bucket
    ➤ Limited to 2 levels deep
    ➤ Well, 3 in a certain instance
  28. AGGREGATIONS: TOP THING
    # Top 20 5xx-ing uri
    es-search.pl --top uri status:'>=500'
    # Top 50 5xx-ing uri
    es-search.pl --top uri status:'>=500' --size 50
    es-search.pl --top uri status:'>=500' --limit 50
    es-search.pl --top uri status:'>=500' -n 50
  29. AGGREGATIONS: TOP THING PER HOUR
    # Top 20 uri per hour
    es-search.pl --top uri --interval 1h
  30. AGGREGATIONS: TOP THING WITH ANOTHER THING
    # Top 20 uri with the top 3 countries
    es-search.pl --top uri --with src_country
    # Top 20 uri with the top 10 countries
    es-search.pl --top uri --with src_country:10
  31. AGGREGATIONS: TOP THING BY SOMETHING OTHER THAN DOC COUNT
    # Top 20 uri by the cardinality of country
    es-search.pl --top uri \
        --by cardinality:src_country
    # Top 20 ip by the total traffic
    es-search.pl --top src_ip \
        --by sum:out_bytes
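In ElasticSearch terms, "top thing by something other than doc count" corresponds to a `terms` aggregation ordered by a metric sub-aggregation. The Python sketch below builds such a request body. The aggregation names `top` and `by` and the function name `top_by` are hypothetical; the body the tool actually sends may differ.

```python
import json


def top_by(field, metric, metric_field, size=20):
    """Build a terms aggregation on `field`, ordered by a metric sub-aggregation."""
    return {
        "size": 0,  # only aggregation results are wanted, not hits
        "aggs": {
            "top": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"by": "desc"},  # sort buckets by the sub-aggregation
                },
                "aggs": {"by": {metric: {"field": metric_field}}},
            }
        },
    }


# Roughly what `--top src_ip --by sum:out_bytes` asks for.
print(json.dumps(top_by("src_ip", "sum", "out_bytes"), indent=2))
```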
  32. AGGREGATIONS: WHERE'S MY DATA GOING
    # Top 20 ip by the total traffic
    # With top uri's
    es-search.pl --top src_ip \
        --by sum:out_bytes \
        --with uri:1
  33. AGGREGATIONS: STATISTICS ANYONE?
    # Top 20 uri by average render time
    # with a statistical summary
    es-search.pl --top uri \
        --by avg:render_ms \
        --with stats:render_ms
  34. AGGREGATIONS: PERCENTILES, TOO
    # Top 20 uri by average render time
    # With median, 90, and 99th percentile
    es-search.pl --top uri \
        --by avg:render_ms \
        --with percentiles:render_ms:50,90,99
  35. AGGREGATIONS: I GOT YOUR HISTOGRAMS
    # Top 20 uri by average render time
    # with histogram of 100ms
    es-search.pl --top uri \
        --by avg:render_ms \
        --with histogram:render_ms:100
  36. AGGREGATIONS: I'M ALL ABOUT SIGNIFICANCE
    # Top 20 significant uri for search
    es-search.pl --top significant_terms:uri \
        render_ms:'>1000' src_country:US
    # Top 20 significant uri for search,
    # Background is only US
    es-search.pl --top significant_terms:uri \
        render_ms:'>1000' src_country:US \
        --bg-filter src_country:US
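The background filter matters because `significant_terms` compares the matching documents against a background set, which is the whole index unless narrowed. As a rough Python sketch, the second command might correspond to a request body like the one below. The aggregation name `significant`, the function name, and the use of a `query_string` query are assumptions; only `significant_terms` and its `background_filter` option are standard ElasticSearch.

```python
import json


def significant(field, query, background=None, size=20):
    """Build a significant_terms request, optionally narrowing the background set."""
    agg = {"field": field, "size": size}
    if background is not None:
        # Compare against documents matching this term instead of the whole index.
        agg["background_filter"] = {"term": background}
    return {
        "size": 0,
        "query": {"query_string": {"query": query}},
        "aggs": {"significant": {"significant_terms": agg}},
    }


body = significant(
    "uri",
    "render_ms:>1000 AND src_country:US",
    background={"src_country": "US"},
)
print(json.dumps(body, indent=2))
```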
  37. BUILT WITH CLI::HELPERS
    ➤ General purpose, functional library for developing command line utilities in Perl
    ➤ Handles input
    ➤ Provides output customization including color support, --color
    ➤ Allows output tagged as data to be redirected into a file, --data-file=output.dat
    ➤ NoPaste support via App::NoPaste and --no-paste
  38. NOTES ON APP::NOPASTE
    ➤ CLI::Helpers will only paste to a service flagged as "public" if you specify --no-paste-public
    ➤ Subclass an App::NoPaste::Service object for your internal paste service, it's pretty simple
    ➤ Easily share things with colleagues directly from the command line
  39. PUTTING SOME THINGS TOGETHER
    # Let's say we have a list of bad ip
    es-search.pl --top src_ip \
        _prefix_:path:\/admin \
        status:'<400' \
        src_ip:threatfeed.json[ip] \
        --data-file=insidethehouse.dat
  40. PUTTING SOME THINGS TOGETHER
    # Dump a full log of what they've done
    es-search.pl src_ip:insidethehouse.dat --show src_ip,src_user,uri,out_bytes \
        --all
    # Share with your colleagues
    es-search.pl src_ip:insidethehouse.dat --show src_ip,src_user,uri,out_bytes \
        --all --no-paste
  41. FUTURE PLANS
    ➤ Arbitrary levels of nested aggregations
    ➤ JSON output for aggregations
    ➤ Better support for nested documents
    ➤ Arbitrary data joins at query time: rdns, whois, db lookups, etc.
    ➤ <your idea here>