
DNS Log Analysis -- Case Study

AusCERT 2015 Talk

Mohammed Makhlouf

June 11, 2015

Transcript

  1. Why DNS? Attacker side Analysis side kill chain little privacy concerns C&C payload encryption widely used by attackers easy setup WHOIS data unverified free/low cost
  2. From raw logs
    3/31/2015 11:58:31 PM 1154 PACKET 00000000064A2360 UDP Snd 172.21.2.224 8b50 Q [1001 D NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:31 PM 1154 PACKET 00000000058AD750 UDP Snd 172.22.9.209 c8da R Q [8081 DR NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:31 PM 1154 PACKET 000000000191FBA0 UDP Rcv 172.21.2.224 8b50 R Q [9081 DR NOERROR] A (13)csc3-2010-crl(8)verisign(3)com(0)
    3/31/2015 11:58:34 PM 1154 PACKET 0000000005D2BE40 UDP Rcv 172.22.9.209 f315 Q [0001 D NOERROR] A (5)ctldl(13)windowsupdate(3)com(0)
    3/31/2015 11:58:34 PM 1154 PACKET 0000000005D2BE40 UDP Snd 172.22.9.209 f315 R Q [8081 DR NOERROR] A (5)ctldl(13)windowsupdate(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 0000000004BB9610 UDP Snd 172.21.2.224 05b8 Q [1001 D NOERROR] A (5)e8218(2)ce(10)akamaiedge(3)net(0)
    3/31/2015 11:58:41 PM 1154 PACKET 000000000191FBA0 UDP Snd 172.22.9.209 c33f R Q [8081 DR NOERROR] A (4)ocsp(8)verisign(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 000000000191FBA0 UDP Rcv 172.22.9.209 c33f Q [0001 D NOERROR] A (4)ocsp(8)verisign(3)com(0)
    3/31/2015 11:58:41 PM 1154 PACKET 0000000004B1F460 UDP Rcv 172.21.2.224 05b8 R Q [9081 DR NOERROR] A (5)e8218(2)ce(10)akamaiedge(3)net(0)
    3/31/2015 11:58:46 PM 114C PACKET 0000000004AE68A0 UDP Rcv 172.22.9.209 e85d Q [0001 D NOERROR] A (3)crl(8)verisign(3)com(0)
    3/31/2015 11:58:46 PM 114C PACKET 0000000004AE68A0 UDP Snd 172.22.9.209 e85d R Q [8081 DR NOERROR] A (3)crl(8)verisign(3)com(0)
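The raw entries above follow a fixed field order, with the query name written as length-prefixed labels such as (4)ocsp(8)verisign(3)com(0). A minimal parsing sketch is shown here. The regular expression, the field names, and the `parse_line` and `decode_qname` helpers are assumptions made for illustration, derived only from the sample lines on this slide, and are not the parser used in the talk.

```python
import re
from datetime import datetime

# Field layout inferred from the sample lines only; real logs may contain
# other record shapes that this pattern does not cover.
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M)\s+"
    r"(?P<thread>[0-9A-Fa-f]+)\s+PACKET\s+(?P<packet>[0-9A-Fa-f]+)\s+"
    r"(?P<proto>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+(?P<ip>\S+)\s+"
    r"(?P<xid>[0-9A-Fa-f]{4})\s+(?:(?P<response>R)\s+)?(?P<opcode>[A-Z?])\s+"
    r"\[(?P<flags_hex>[0-9A-Fa-f]{4})\s+(?P<flags>[A-Z]*)\s+(?P<rcode>[A-Z]+)\]\s+"
    r"(?P<qtype>\S+)\s+(?P<qname>\S+)\s*$"
)

# One "(length)label" pair; the trailing "(0)" has no label and is skipped.
LABEL_RE = re.compile(r"\(\d+\)([^()]+)")


def decode_qname(raw):
    """Turn '(4)ocsp(8)verisign(3)com(0)' into 'ocsp.verisign.com'."""
    return ".".join(LABEL_RE.findall(raw))


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    timestamp_text = " ".join(record.pop("ts").split())
    record["timestamp"] = datetime.strptime(timestamp_text, "%m/%d/%Y %I:%M:%S %p")
    record["is_response"] = record.pop("response") is not None
    record["qname"] = decode_qname(record["qname"])
    return record
```

Returning None for lines that do not match, instead of raising, lets a caller count and inspect unparsed lines separately.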
  3. Keywords
    update-java.net
    systemsvc.net
    adobe-update.net
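One simple way to surface names like these is a substring match against a keyword list. The sketch below assumes that approach. The `KEYWORDS` tuple and the `matching_keywords` helper are hypothetical and chosen only to illustrate the idea, since the talk does not list the actual keywords or the matching logic.

```python
# Illustrative keyword list; the talk does not publish the real one.
KEYWORDS = ("update", "java", "adobe", "svc")


def matching_keywords(qname, keywords=KEYWORDS):
    """Return the keywords that occur anywhere in the queried name."""
    name = qname.lower()
    return [keyword for keyword in keywords if keyword in name]
```

A match on its own is weak evidence: a legitimate name such as ctldl.windowsupdate.com from the raw log sample also contains "update", so a hit would normally be combined with other signals.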
  4. High Level Goals
    •Accept logs at any rate
      -Batches of log files or a stream of log entries
    •Never drop a single log entry
      -or else we would come up with wrong conclusions
    •Absolute elasticity
      -scale dynamically by adding / removing nodes
  5. Apache Kafka
    •An Apache project initially developed at LinkedIn
    •Distributed publish-subscribe messaging system
    •Specifically designed for real-time activity streams
    •Does not use JMS APIs
    •Great multi-language client libraries
  6. Ingest & Persist
    Our own multi-threaded Python-based producer
    •Can accept log entries over TCP / HTTP
    •Can scan a DFS/network-mounted directory of log files
    •Performs basic parsing & validation
    •Immediately writes to Kafka at 240K logs / sec [Avg. 200 Bytes]
    •Uses the Kafka Python client https://github.com/mumrah/kafka-python
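The validate-then-write step can be sketched as follows. This is not the producer from the talk: the required field names, the `dns-logs` topic, and the `encode_entry`, `publish`, and `kafka_sender` helpers are assumptions for illustration. `KafkaProducer` is the producer class in current kafka-python releases, which may differ from the client API available when the talk was given.

```python
import json

# Hypothetical minimum set of fields an entry must carry to be accepted.
REQUIRED_FIELDS = ("timestamp", "client_ip", "qname")


def encode_entry(entry):
    """Validate one parsed log entry and serialize it to bytes."""
    missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return json.dumps(entry, sort_keys=True).encode("utf-8")


def publish(entries, send, topic="dns-logs"):
    """Send valid entries through send(topic, payload).

    Invalid entries are returned to the caller instead of being silently
    dropped, so nothing disappears without a trace.
    """
    sent = 0
    rejected = []
    for entry in entries:
        try:
            payload = encode_entry(entry)
        except ValueError:
            rejected.append(entry)
            continue
        send(topic, payload)
        sent += 1
    return sent, rejected


def kafka_sender(bootstrap_servers="localhost:9092"):
    """Build a send callable backed by kafka-python (needs a reachable broker)."""
    from kafka import KafkaProducer  # imported lazily; only needed here

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)

    def send(topic, payload):
        producer.send(topic, payload)

    return send, producer.flush
```

Passing `send` in as a callable keeps the validation path testable without a broker. As a sanity check on the quoted rate, 240K entries per second at an average of 200 bytes is about 48 MB/s of payload.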
  7. We Kafka
    •Persistent messaging
    •High throughput, low overhead
    •Uses ZooKeeper for forming a cluster of nodes
    •Supports both queue and topic semantics
  8. Apache Storm
    •Developed by Nathan Marz
    •Open sourced by Twitter in 2011
    •Now an Apache Software Foundation project
    •{Map/Reduce}-like semantics for stream processing
    •Supports a multi-language protocol (JSON over STDIN/STDOUT)
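The multi-language protocol mentioned in the last bullet frames each message as a JSON document followed by a line containing only `end`. A rough sketch of that framing from the component side is given below. The `read_message`, `write_message`, and `handle_tuple` helpers are hypothetical names, the lower-casing step is only a placeholder for real processing, and the sketch leaves out the initial handshake and the other message types a real component must handle.

```python
import io
import json


def read_message(stream):
    """Read one JSON message terminated by a line containing only 'end'."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    else:
        # The loop ran out of input without seeing the terminator.
        raise EOFError("stream closed before 'end' marker")
    return json.loads("\n".join(lines))


def write_message(stream, message):
    """Write one JSON message followed by the 'end' terminator."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def handle_tuple(tup, out):
    """Emit a lower-cased copy of the first field, anchored to the input, then ack."""
    qname = tup["tuple"][0].lower()
    write_message(out, {"command": "emit", "anchors": [tup["id"]], "tuple": [qname]})
    write_message(out, {"command": "ack", "id": tup["id"]})
```

In a running component the streams would be `sys.stdin` and `sys.stdout`; in-memory `io.StringIO` objects are enough to exercise the framing.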
  9. We Kinda Storm
    •Scalable real-time computation system
    •Also uses ZooKeeper for forming a cluster of nodes
    But you need to use the Java toolchain to build and submit topologies.
  10. We srsly Pyleus
    A Python framework for developing & launching Storm topologies.
    •Open sourced by Yelp
    •Storm topology defined in YAML
    •MsgPack-based serializer (runs faster)
    •Code entirely in Python
    •Don’t have to touch Java
  11. Enrich & Analyze
    A Storm topology written entirely in Python on top of the Pyleus framework.
    •Kafka-Python Spout for “pull”-ing the log entries
    •Summary Stats Bolts [2 bolts]
    •Enrichment Bolts [+10 bolts]
    •Analysis Bolts [+6 bolts]
    •Archiving Bolts [2 bolts]
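The talk does not show what the summary stats bolts compute. As one plausible illustration, the sketch below keeps running counts per client and per queried name. The `SummaryStats` class and the `client_ip` and `qname` record keys are assumptions. Inside a topology this logic would sit in a bolt's per-tuple handler, but it is written here as plain Python so that it runs without Storm.

```python
from collections import Counter


class SummaryStats:
    """Running counters over parsed DNS log records (hypothetical record keys)."""

    def __init__(self):
        self.total = 0
        self.by_client = Counter()
        self.by_qname = Counter()

    def update(self, record):
        """Fold one record into the counters."""
        self.total += 1
        self.by_client[record["client_ip"]] += 1
        self.by_qname[record["qname"]] += 1

    def top_qnames(self, n=10):
        """Return the n most queried names with their counts."""
        return self.by_qname.most_common(n)
```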
  12. Minions
    Pre-configured cloud-based instances for time-consuming enrichment processes. We’ve got 20 of them.
    •DNS records (Dig)
    •Whois
    •Domain Reputation
    •Active Probing
    •GeoIP
    •Historical Whois
  13. ElasticSearch
    •Open sourced under Apache license
    •Distributed search engine
    •Fully exposes Lucene search functionality
    •Built for clustering from the ground up
    •High availability / Multi-tenancy
  14. Cassandra
    •Highly scalable key-value distributed store
    •Impressive write performance
    •Apache project
    Use Cassandra as both an authoritative data store and as a queue.
  15. Kafka > Cassandra
    Cassandra is not designed to be a queue system.
    Kafka does a great job persisting the data (fewer headaches).
  16. Big Wins
    •Both raw and enriched logs are indexed in ES
    •Parsed raw log stream is persisted in Kafka (can replay the queue)
    •Adding new enrichment or analysis bolts is very simple with Pyleus
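For the first point, indexing raw and enriched records in ES, one common route is the Elasticsearch bulk API. Its request body is newline-delimited JSON: an action line followed by the document, with a trailing newline at the end. The sketch below only builds that body. The `bulk_body` helper and the `dns-raw` index name are hypothetical, the talk does not say how its archiving bolts write to ES, and older Elasticsearch versions also expected a `_type` in the action line.

```python
import json


def bulk_body(docs, index):
    """Build a newline-delimited bulk request body that indexes each doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc, sort_keys=True))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(
    [{"qname": "ocsp.verisign.com", "client_ip": "172.22.9.209"}],
    index="dns-raw",  # hypothetical index name
)
```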