Toppling the Stack: Practical Outlier Detection for Threat Hunters

As presented at the SANS Threat Hunting and Incident Response Summit 2017.

So much of what we do as hunters is based on finding oddballs, but most published hunt procedures seem to rely on a single method: stack counting. In this session, we’ll examine a few other ways of finding outliers in your data, with samples and use cases for each.

David J. Bianco

April 18, 2017

Transcript

  1. None
  2. ... okay, but because you said that, we’re breaking up.

    https://xkcd.com/539
  3. Least-Commonly Accessed Files

    Dataset: CERT/CC Insider Threat Dataset, http://www.cert.org/insider-threat/tools/

  4. SQL: SELECT cs_user_agent, count(*) AS cnt FROM proxy WHERE […] GROUP BY cs_user_agent ORDER BY cnt ASC LIMIT 100

    Splunk: index=proxy method=POST status=200 | stats count by cs_user_agent | sort +count | head 100

    Python (using the 'pandas' module and DataFrames): print(proxy_df["cs_user_agent"].value_counts(ascending=True).head(100))
  5. “I’ve never seen this service on any of my 100,000+ systems before. That’s pretty suspicious.”
  6. http://skepdic.com/graphics/elvistoast.jpg

  7. None
  8. # assumes: import plotly.graph_objs as go; import plotly.offline as pyo
    trace = go.Scattergl(x=points_x, y=points_y, mode='markers')
    data = [trace]
    pyo.iplot(dict(data=data))

    Basic scatter plot: <= 6 LOC
  9. “Most users don’t change upload habits often, so benign activity should fall close to the ‘no change’ trend line.”
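One way to turn the "close to the trend line" idea from slide 9 into a ranking is to measure how far each point sits from the line y = x. This is a minimal sketch under assumptions: the user names, the upload volumes, and the choice of perpendicular distance as the measure are all illustrative and are not taken from the talk.

```python
import math

def distance_from_no_change(previous, current):
    """Perpendicular distance of the point (previous, current) from the line y = x."""
    return abs(current - previous) / math.sqrt(2.0)

# Hypothetical upload volumes (MB) per user for two consecutive periods.
uploads = {
    "alice": (100.0, 110.0),
    "bob": (250.0, 240.0),
    "carol": (40.0, 900.0),
    "dave": (75.0, 80.0),
}

# Rank users by distance from the "no change" line, largest first.
ranked = sorted(
    ((user, distance_from_no_change(prev, cur)) for user, (prev, cur) in uploads.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for user, dist in ranked:
    print(f"{user}: {dist:.1f}")
```

Users at the top of the ranking are the ones whose behavior changed the most between the two periods, which is what the scatter plot shows visually.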
  10. None
  11. data = [go.Box(y=shells["command_line_length"], boxpoints='all')]
    layout = go.Layout(title="Command Line Lengths for Shell Processes")
    pyo.iplot(dict(data=data, layout=layout))

    Basic box plot: <= 7 LOC
  12. “I have no idea what’s normal for this column of numbers, let alone whether there are any outliers.”
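Box plots conventionally place their whiskers using Tukey's rule: anything more than 1.5 times the interquartile range beyond the quartiles is drawn as an outlier. The sketch below applies that rule numerically to answer the question on slide 12. The command-line lengths are made up for illustration, and the quartile method used here may differ slightly from the one a plotting library uses, so the fences may not match a rendered plot exactly.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical command-line lengths for shell processes.
command_line_lengths = [22, 25, 27, 30, 31, 33, 35, 36, 40, 41, 950]

lower, upper = tukey_fences(command_line_lengths)
outliers = [v for v in command_line_lengths if v < lower or v > upper]

print(f"fences: {lower} .. {upper}")
print(f"outliers: {outliers}")
```

The fences give a working definition of "normal" for a column you know nothing about, without assuming the values follow any particular distribution.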
  13. You have to be careful doing this. Sometimes, when you push the whisker down, dynamite explodes.

    https://xkcd.com/1798
  14. Isolation Forest: a form of unsupervised machine learning. Iteratively split the dataset on random dimensions and random values until you can’t split anymore. This is a “tree”. Do this several times to grow a “forest” of trees. The average depth at which each point is isolated, taken across all trees, reflects its “outlierness”: outliers are isolated after fewer splits, so a shallower average depth means a more unusual point.

    http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
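The splitting procedure described on slide 14 can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation from the paper or from the talk: it follows only the branch that contains the point being scored (so each call stands in for one fresh random tree), it skips the subsampling step the paper uses, and the depth cap of 12 and the synthetic data are assumptions made for the example. For real work, scikit-learn's `sklearn.ensemble.IsolationForest` is one maintained implementation.

```python
import random

def isolation_depth(point, rows, rng, depth=0, max_depth=12):
    """Number of random splits needed to isolate `point` from `rows` in one random tree."""
    if len(rows) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))          # pick a random dimension
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:                              # cannot split on this dimension
        return depth
    split = rng.uniform(lo, hi)               # pick a random split value
    # Keep only the rows that fall on the same side of the split as the point.
    if point[dim] < split:
        same_side = [r for r in rows if r[dim] < split]
    else:
        same_side = [r for r in rows if r[dim] >= split]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def average_depths(rows, n_trees=100, seed=0):
    """Average isolation depth of every row over `n_trees` random trees."""
    rng = random.Random(seed)
    return [
        sum(isolation_depth(p, rows, rng) for _ in range(n_trees)) / n_trees
        for p in rows
    ]

# Synthetic data: 50 points clustered around the origin plus one far-away point.
data_rng = random.Random(42)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
rows.append((10.0, 10.0))

depths = average_depths(rows)
most_unusual = min(range(len(rows)), key=lambda i: depths[i])
print(rows[most_unusual], round(depths[most_unusual], 2))
```

The far-away point is separated from the cluster after only a split or two on average, while points inside the cluster need many more, which is why the smallest average depth marks the strongest outlier.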
  15. https://github.com/DavidJBianco/Clearcut

  16. “Which of these HTTP log entries are most unlike the others?”
  17. When it comes to outlier detection, you’ve got options. Don’t be afraid to play around a bit and see what works best for you!
  18. “ ”