DNS Log Analysis -- Case Study


AusCERT 2015 Talk


Mohammed Makhlouf

June 11, 2015

Transcript

  1. DNS Log Analysis Case Study
    Osama Kamal (@okamalo), Mohammed Makhlouf
    www.mak.my
  2. Why DNS? (attacker side / analysis side)
    kill chain, few privacy concerns, C&C, payload, encryption, widely used by attackers, easy setup, unverified whois data, free/low cost
  3. raw data >> information

  4. from raw logs
    3/31/2015 11:58:31 PM 1154 PACKET 00000000064A2360 UDP Snd 172.21.2.224 8b50 Q [1001 D NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:31 PM 1154 PACKET 00000000058AD750 UDP Snd 172.22.9.209 c8da R Q [8081 DR NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:31 PM 1154 PACKET 000000000191FBA0 UDP Rcv 172.21.2.224 8b50 R Q [9081 DR NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:34 PM 1154 PACKET 0000000005D2BE40 UDP Rcv 172.22.9.209 f315 Q [0001 D NOERROR] A (5)ctldl(13)windowsupdate(3)com(0)
    3/31/2015 11:58:34 PM 1154 PACKET 0000000005D2BE40 UDP Snd 172.22.9.209 f315 R Q [8081 DR NOERROR] A (5)ctldl(13)windowsupdate(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 0000000004BB9610 UDP Snd 172.21.2.224 05b8 Q [1001 D NOERROR] A (5)e8218(2)ce(10)akamaiedge(3)net(0)
    3/31/2015 11:58:41 PM 1154 PACKET 000000000191FBA0 UDP Snd 172.22.9.209 c33f R Q [8081 DR NOERROR] A (4)ocsp(8)verisign(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 000000000191FBA0 UDP Rcv 172.22.9.209 c33f Q [0001 D NOERROR] A (4)ocsp(8)verisign(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 0000000004B1F460 UDP Rcv 172.21.2.224 05b8 R Q [9081 DR NOERROR] A (5)e8218(2)ce(10)akamaiedge(3)net(0)
    3/31/2015 11:58:46 PM 114C PACKET 0000000004AE68A0 UDP Rcv 172.22.9.209 e85d Q [0001 D NOERROR] A (3)crl(8)verisign(3)com(0)
    3/31/2015 11:58:46 PM 114C PACKET 0000000004AE68A0 UDP Snd 172.22.9.209 e85d R Q [8081 DR NOERROR] A (3)crl(8)verisign(3)com(0)
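
The lines above appear to come from the Windows DNS Server debug log. The following is a minimal Python sketch of the first step: it turns one PACKET line into a structured record and decodes the length-prefixed query name. The regular expression is inferred only from the sample lines above, which is an assumption. `DnsEvent`, `parse_line` and `decode_qname` are hypothetical names and are not the authors' code.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Layout inferred from the sample PACKET lines (assumption); non-matching lines return None.
PACKET_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+(?P<ampm>[AP]M)\s+"
    r"(?P<thread>[0-9A-Fa-f]+)\s+PACKET\s+(?P<context>[0-9A-Fa-f]+)\s+"
    r"(?P<proto>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+(?P<ip>\S+)\s+"
    r"(?P<xid>[0-9A-Fa-f]{4})\s+(?P<response>R\s+)?(?P<opcode>\S)\s+"
    r"\[(?P<flags>[^\]]+)\]\s+(?P<qtype>\S+)\s+(?P<qname>\S+)\s*$"
)


@dataclass
class DnsEvent:
    timestamp: datetime
    direction: str      # "Snd" or "Rcv"
    remote_ip: str
    xid: str            # DNS transaction ID (hex)
    is_response: bool   # the optional "R" marker before the opcode
    rcode: str          # e.g. NOERROR, NXDOMAIN
    qtype: str          # e.g. A, AAAA, MX
    qname: str          # decoded, dot-separated domain name


def decode_qname(raw: str) -> str:
    """Turn '(13)csc3-2010-crl(8)verisign(3)com(0)' into 'csc3-2010-crl.verisign.com'."""
    return re.sub(r"\(\d+\)", ".", raw).strip(".")


def parse_line(line: str) -> Optional[DnsEvent]:
    m = PACKET_RE.match(line.strip())
    if m is None:
        return None  # not a PACKET line (or malformed); the caller decides what to do
    ts = datetime.strptime(f"{m['date']} {m['time']} {m['ampm']}", "%m/%d/%Y %I:%M:%S %p")
    return DnsEvent(
        timestamp=ts,
        direction=m["direction"],
        remote_ip=m["ip"],
        xid=m["xid"],
        is_response=m["response"] is not None,
        rcode=m["flags"].split()[-1],  # last token inside [...] is the response code
        qtype=m["qtype"],
        qname=decode_qname(m["qname"]),
    )
```

For example, the fourth sample line parses to a query from 172.22.9.209 for `ctldl.windowsupdate.com` with `is_response=False`.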
  5. Enrichment
    DNS records (dig), whois, geolocation, passive DNS, historical whois, reputation data, active probing
  6. to information

  7. Threat feeds: blacklists from various sources; large volume; easily searchable
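
The following is one way to keep a large blacklist quickly searchable. It loads the feed into a set and checks a queried name and each of its parent domains against it, so an entry for a domain also covers its subdomains. The feed format assumed here (one domain per line, `#` comments) and the function names are hypothetical.

```python
from typing import Iterable, Optional, Set


def load_blacklist(entries: Iterable[str]) -> Set[str]:
    """Normalise feed entries: lower-case, strip whitespace and trailing dots, skip comments."""
    result = set()
    for entry in entries:
        entry = entry.strip().lower().rstrip(".")
        if entry and not entry.startswith("#"):
            result.add(entry)
    return result


def blacklist_hit(domain: str, blacklist: Set[str]) -> Optional[str]:
    """Return the blacklist entry matching `domain` or one of its parent domains, else None."""
    labels = domain.lower().rstrip(".").split(".")
    # Check 'a.b.evil.com', then 'b.evil.com', then 'evil.com', then 'com'.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in blacklist:  # O(1) set lookup per label
            return candidate
    return None
```

The set lookup costs the same regardless of feed size, so the main cost is the number of labels in the queried name.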

  8. Historical IOCs

  9. Findings

  10. Unknown DGA algorithm, false positives: xxxxxxxxxxxxxxxx.mcafee.com, xxxxxxxxxxxxxxxx.gstats.com, xxxxxxxxxxxxxxxx.symantec
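
The slide shows that long random-looking labels are not always malicious. The following sketch combines a common heuristic for unknown DGAs (Shannon entropy of the left-most label) with an allowlist of parent domains, so that names like those above are not flagged. The thresholds, the allowlist contents and the function names are hypothetical. The talk does not say which heuristic was used.

```python
import math
from collections import Counter

# Hypothetical allowlist built from known false positives such as the ones above.
ALLOWLIST = {"mcafee.com", "gstats.com"}


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the characters in `s`, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_generated(domain: str, min_len: int = 12, min_entropy: float = 3.5) -> bool:
    labels = domain.lower().rstrip(".").split(".")
    parent = ".".join(labels[-2:])  # naive: ignores multi-label suffixes like co.uk
    if parent in ALLOWLIST:
        return False
    label = labels[0]  # the random-looking part sits in the left-most label here
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy
```

A production version would derive the parent domain from the Public Suffix List rather than the last two labels.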

  11. DGA - Bedep malware [ad fraud]

  12. DGA - Dyre/Dyreza malware [PoS]

  13. DGA - NXDOMAIN
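
A DGA-infected host tries many generated names, and most of them are unregistered. As a result, one client receiving NXDOMAIN for many distinct names stands out. The following is a minimal sketch over one batch of `(client_ip, qname, rcode)` tuples. The threshold, the batching window and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def nxdomain_suspects(events: Iterable[Tuple[str, str, str]], threshold: int = 50) -> Dict[str, int]:
    """Return {client_ip: distinct NXDOMAIN names} for clients at or above `threshold`."""
    failed: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, qname, rcode in events:
        if rcode == "NXDOMAIN":
            failed[client_ip].add(qname.lower())  # distinct names, so retries count once
    return {ip: len(names) for ip, names in failed.items() if len(names) >= threshold}
```

Counting distinct names rather than raw responses keeps a single mistyped name that is retried repeatedly from looking like DGA activity.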

  14. Never seen before
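
Flagging domains that have never been seen before needs a memory of first-seen times. The following sketch keeps that memory in an in-memory dict. This is an assumption made for illustration. A real deployment would persist the baseline, for example in the data store behind the pipeline. The class name is hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional


class FirstSeenTracker:
    """Remembers when each domain was first observed and reports brand-new ones."""

    def __init__(self) -> None:
        self._first_seen: Dict[str, datetime] = {}

    def observe(self, domain: str, ts: datetime) -> bool:
        """Record an observation; return True only the first time a domain appears."""
        domain = domain.lower().rstrip(".")
        existing = self._first_seen.get(domain)
        if existing is not None:
            if ts < existing:  # tolerate out-of-order log batches
                self._first_seen[domain] = ts
            return False
        self._first_seen[domain] = ts
        return True

    def first_seen(self, domain: str) -> Optional[datetime]:
        return self._first_seen.get(domain.lower().rstrip("."))
```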

  15. China DNS Server - 1
    dig @219.141.136.10 twitter.com
    twitter.com.    1828    IN    A    189.203.254.212
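
The slide queries the resolver at 219.141.136.10 for twitter.com and shows the answer it returned. One way to flag answers like this is to compare them with a trusted resolver's answers for the same name. The following sketch compares at /24 granularity to tolerate CDN address rotation, which is an assumption. The reference addresses in the usage note come from the 192.0.2.0/24 documentation range and are placeholders, not real Twitter addresses.

```python
import ipaddress
from typing import Iterable


def _networks(ips: Iterable[str], prefix: int) -> set:
    """Map each address to its enclosing /prefix network."""
    return {ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips}


def answers_disagree(suspect: Iterable[str], reference: Iterable[str], prefix: int = 24) -> bool:
    """True if no suspect A record falls in the same /prefix network as any reference record."""
    s, r = _networks(suspect, prefix), _networks(reference, prefix)
    return bool(s) and bool(r) and s.isdisjoint(r)
```

For example, `answers_disagree(["189.203.254.212"], ["192.0.2.10"])` returns True. Obtaining the reference answers (for example with dig against a trusted resolver) is left outside the sketch.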
  16. China DNS Server - 2

  17. Bad IP reputation

  18. PUP - Potentially Unwanted Programs

  19. PUP - Potentially Unwanted Programs

  20. Dynamic DNS

  21. Keywords
    update-java.net, systemsvc.net, adobe-update.net
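
The example domains imitate software-update or service names. The following minimal keyword matcher flags such names unless they sit under a trusted parent domain. The keyword list and the trusted parents are hypothetical, chosen only to cover the three examples above.

```python
from typing import List

SUSPICIOUS_KEYWORDS = ("update", "java", "adobe", "svc")         # hypothetical
TRUSTED_PARENTS = ("windowsupdate.com", "java.com", "adobe.com")  # hypothetical


def keyword_hits(domain: str) -> List[str]:
    """Return the suspicious keywords contained in `domain`, ignoring trusted parents."""
    domain = domain.lower().rstrip(".")
    if any(domain == p or domain.endswith("." + p) for p in TRUSTED_PARENTS):
        return []
    return [k for k in SUSPICIOUS_KEYWORDS if k in domain]
```

Plain substring matching is deliberately loose. In practice the hits would be one signal among several (whois age, reputation) rather than a verdict.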
  22. Final Report

  23. Actively Exploring Domains

  24. None
  25. None
  26. System Architecture

  27. High Level Goals
    • Accept logs at any rate: batches of log files or a stream of log entries
    • Never drop a single log entry, or else we would come up with wrong conclusions
    • Absolute elasticity: scale dynamically by adding/removing nodes
  28. Data Pipeline Stages: Ingest & Persist, Enrich & Analyze, Index & Visualize
  29. Ingest & Persist

  30. Apache Kafka
    • An Apache project initially developed at LinkedIn
    • Distributed publish-subscribe messaging system
    • Specifically designed for real-time activity streams
    • Does not use JMS APIs
    • Great multi-language client libraries
  31. None
  32. Ingest & Persist: our own multi-threaded, Python-based producer
    • Can accept log entries over TCP/HTTP
    • Can scan a DFS/network-mounted directory of log files
    • Performs basic parsing & validation
    • Immediately writes to Kafka at 240K logs/sec [avg. 200 bytes]
    • Uses the kafka-python client https://github.com/mumrah/kafka-python
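
The following sketch shows the shape of a multi-threaded producer like the one described: worker threads validate lines taken from a bounded queue and hand valid ones to a `send(topic, value)` callable. Rejected lines are counted, so nothing is dropped silently. The function names, the default validation and the queue size are hypothetical. The TCP/HTTP and directory-scanning front ends are omitted.

```python
import queue
import threading
from typing import Callable, Iterable

SendFn = Callable[[str, bytes], object]  # (topic, value) -> e.g. a future
_STOP = object()                         # sentinel that tells a worker thread to exit


def default_validate(line: str) -> bool:
    """Placeholder check: accept any non-blank line."""
    return bool(line.strip())


def produce(lines: Iterable[str], send: SendFn, topic: str,
            validate: Callable[[str], bool] = default_validate,
            n_threads: int = 4) -> int:
    """Validate lines on `n_threads` workers and pass valid ones to `send`.

    Returns the number of rejected lines so that nothing disappears silently.
    """
    q: "queue.Queue[object]" = queue.Queue(maxsize=10_000)  # bounded: reader blocks if workers lag
    rejected = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal rejected
        while True:
            item = q.get()
            if item is _STOP:
                return
            line = str(item).rstrip("\r\n")
            if validate(line):
                send(topic, line.encode("utf-8"))
            else:
                with lock:
                    rejected += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for line in lines:
        q.put(line)
    for _ in threads:
        q.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    return rejected
```

With a current kafka-python release, `send` could be the bound method of a shared `KafkaProducer(bootstrap_servers=...)`, which its documentation describes as thread-safe. Call `flush()` on the producer afterwards. The 2015 client API the authors used may have differed, and the topic name is hypothetical.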
  33. We Kafka
    • Persistent messaging
    • High throughput, low overhead
    • Uses ZooKeeper for forming a cluster of nodes
    • Supports both queue and topic semantics
  34. +200K Messages / Sec http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11pa

  35. +2M Messages / Sec Multi-Threaded Producers / Consumers

  36. Enrich & Analyze

  37. Apache Storm
    • Developed by Nathan Marz
    • Open sourced by Twitter in 2011
    • Now an Apache Software Foundation project
    • {Map/Reduce}-like semantics for stream processing
    • Supports a multi-language protocol (JSON over STDIN/STDOUT)
  38. Apache Storm
    A spout provides a stream of data; a bolt performs the actual computation.
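
The spout/bolt split can be seen without Storm. The following in-process analogue in plain Python is not Storm's or Pyleus's API. A spout yields tuples, each bolt maps one tuple to zero or more tuples, and a linear topology chains them. Parallel execution, stream groupings and acking, which Storm provides, are deliberately left out. All names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]
Bolt = Callable[[Record], Iterable[Record]]  # one input tuple -> zero or more output tuples


def spout(lines: Iterable[str]) -> Iterator[Record]:
    """Source of the stream: emits one single-field tuple per raw log line."""
    for line in lines:
        yield (line,)


def _apply(bolt: Bolt, stream: Iterable[Record]) -> Iterator[Record]:
    for record in stream:
        yield from bolt(record)


def run_linear_topology(source: Iterable[Record], bolts: List[Bolt]) -> Iterator[Record]:
    """Chain the bolts so each consumes the previous one's output, lazily like a stream."""
    stream: Iterable[Record] = source
    for bolt in bolts:
        stream = _apply(bolt, stream)  # helper binds `bolt` now, avoiding late binding
    return iter(stream)


def split_bolt(record: Record) -> Iterable[Record]:
    """Toy bolt: split a raw line into whitespace-separated fields, dropping blank lines."""
    fields = record[0].split()
    if fields:
        yield tuple(fields)
```

For example, `list(run_linear_topology(spout(lines), [split_bolt]))` yields one field tuple per non-blank log line.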
  39. We Kinda Storm
    • Scalable real-time computation system
    • Also uses ZooKeeper for forming a cluster of nodes
    But we need to use the Java toolchain to build and submit topologies.
  40. We srsly Pyleus
    A Python framework for developing & launching Storm topologies.
    • Open sourced by Yelp
    • Storm topology defined in YAML
    • MsgPack-based serializer (runs faster)
    • Code entirely in Python
    • Don’t have to touch Java
  41. Enrich & Analyze
    A Storm topology written entirely in Python on top of the Pyleus framework.
    • Kafka-Python spout for pulling the log entries
    • Summary stats bolts [2 bolts]
    • Enrichment bolts [10+ bolts]
    • Analysis bolts [6+ bolts]
    • Archiving bolts [2 bolts]
  42. Main Storm Topology

  43. Main Storm Topology

  44. Main Storm Topology

  45. Main Storm Topology

  46. Minions
    Pre-configured cloud-based instances for time-consuming enrichment processes. We’ve got 20 of them.
    • DNS records (dig)
    • Whois
    • Domain reputation
    • Active probing
    • GeoIP
    • Historical whois
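
The minions spread slow lookups across 20 instances. The following single-machine sketch approximates that fan-out with a thread pool of 20 workers, threads standing in for the cloud instances (an assumption). Each distinct domain is enriched once, and a failing source is recorded rather than losing the whole record. The `lookups` callables (whois, GeoIP, reputation and so on) are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

Lookup = Callable[[str], object]  # e.g. a whois, GeoIP or reputation query for one domain


def enrich_all(domains: Iterable[str], lookups: Dict[str, Lookup],
               workers: int = 20) -> Dict[str, Dict[str, object]]:
    """Run every lookup for every distinct domain concurrently."""
    unique = sorted({d.lower().rstrip(".") for d in domains})  # each slow lookup runs once

    def enrich_one(domain: str) -> Dict[str, object]:
        record: Dict[str, object] = {}
        for name, lookup in lookups.items():
            try:
                record[name] = lookup(domain)
            except Exception as exc:  # one slow or broken source must not sink the others
                record[name] = f"error: {exc}"
        return record

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(enrich_one, unique)))
```

Threads suit this workload because the lookups are network-bound. Spreading them over separate instances, as the slide describes, additionally spreads the outbound query rate across source addresses.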
  47. Index & Visualize

  48. ElasticSearch
    • Open sourced under the Apache license
    • Distributed search engine
    • Fully exposes Lucene search functionality
    • Built for clustering from the ground up
    • High availability / multi-tenancy
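
Indexing raw and enriched records in bulk is usually done through Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The following minimal sketch builds such a body. The daily index name in the usage example is hypothetical.

```python
import json
from typing import Iterable, Mapping


def bulk_body(docs: Iterable[Mapping[str, object]], index: str) -> str:
    """Build an Elasticsearch _bulk request body: an action line plus a source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc, default=str))  # default=str serialises datetimes
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline
```

The body would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`. Elasticsearch versions from the time of this talk also expected a `_type` in each action line. Current versions do not.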
  49. Plan to throw one away

  50. Cassandra
    • Highly scalable, distributed key-value store
    • Impressive write performance
    • Apache project
    Use Cassandra as both an authoritative data store and a queue.
  51. Kafka > Cassandra
    Cassandra is not designed to be a queue system. Kafka does a great job persisting the data (fewer headaches).
  52. Java Allergy

  53. Big Wins
    • Both raw and enriched logs are indexed in ES
    • The parsed raw log stream is persisted in Kafka (the queue can be replayed)
    • Adding new enrichment or analysis bolts is very simple with Pyleus
  54. All means to an end

  55. Thank You