Network Forensic Analysis in an Encrypted World

ICEBRG
August 04, 2017

Will Peteroy and Justin Warner presented some of their research on performing analysis and forensics on top of encrypted communication at BSidesLV 2017.

Transcript

  1. Will Peteroy (@wepiv)

    • Co-founder and CEO of ICEBRG
    • Former lead for Windows and Internet Explorer product security @ Microsoft / MSRC
    • Bug bounty programs, security strategy, etc. @ Microsoft / MSRC
    • Former Technical Director and subject matter expert at DoD
    • Global presenter at information security conferences
  2. Justin Warner (@sixdub)

    • Principal Security Engineer at ICEBRG
    • Computer Science grad from USAF Academy & former USAF Cyber Guy
    • BlackHat USA Instructor in 2015 & 2016 for Adaptive Red Team Tactics
    • Co-founder of PowerShell Empire and contributor on numerous open-source projects
  3. Real Talk

    With the rise of HTTPS Everywhere and similar movements, encrypted traffic is becoming ubiquitous. In client networks we observe that 35-45% of global enterprise network traffic is now encrypted with SSL/TLS.
  4. Encryption’s Impact on the Quadrant

    [Figure: quadrant labeled Batch, Real Time, Metadata, Content]

    • No content inspection, application inspection or signatures
    • No artifact extraction or analysis (or sandboxing)

    The bottom half of the quadrant disappears.
  5. What this Means for Network Defenders

    Effectively leaves defenders with two options (other than "just surrender and let your network get owned"):
    • Terminate SSL/TLS traffic at an A10 or F5-like appliance
    • Shift from content-based to metadata-based collection and analysis techniques
  6. Termination pros and cons

    Pros
    • Old content tools work
    • Full visibility* of encrypted streams

    Cons
    • User privacy
    • Adding certificate to trusted store for endpoints
    • Broken compatibility for devices that cannot be MITM'd / use certificate pinning

    Metadata analysis pros and cons

    Pros
    • Can evaluate larger and long-term data flows and patterns

    Cons
    • Requires large amount of storage (hundreds of TBs)
    • Significant development and infrastructure management resources (~3 FTE to set up, manage and maintain in a < 10,000 person organization)
    • Collecting metadata != analysis; you have to build analysis to go along with it
  7. We Still Have a Lot to Work With…

    Homing in on detection of malicious SSL is all the rage nowadays. Lots of products have started focusing on magic solutions to spotting all evil, with promises of high-success machine learning algorithms and predictability of traffic.

    What have we seen? The internet is a pretty strange place. With that said, there are some definite techniques that can help. Let us be transparent with you about what we have seen…
  8. Encrypted Traffic Metadata

    Encrypted traffic actually provides a decent amount of telemetry that can be used for threat detection or tracking:

    Feature: Meaning
    • Protocol Version: Version of the protocol in use
    • Cipher Suite: Negotiated cipher
    • Server Name: The hostname of the server embedded in the cert
    • Server Subject: Additional host names to be protected by the certificate
    • Server Issuer: Identifies the entity who has signed and issued the certificate
    • Client Subject: Additional client names to be accepted for the certificate
    • Client Issuer: Identifies the entity who has signed and issued the certificate
    • Validation Dates: Identifies the valid dates for the certificate

    … And all the typical flow metadata (IPs, upload/download sizes, etc.)
  9. Leverage Encryption as an Advantage to Shift Balance of Power to Defenders

    Hypothesis
    • Opportunity to leverage SSL certificates to hunt for adversaries, as they may be more likely to use overlapping SSL certificates and infrastructure

    Proven
    • In the research Mark Parson presented at BSides Charm in 2016, he was able to more than triple his coverage of attacker infrastructure, and in some cases increase it by over 100x
  10. Hunting Primer

    • Proactive identification of threats in the environment that have successfully evaded traditional reactive security measures
    • Performed using a focused, hypothesis-centric approach to efficiently analyze activity at scale
    • The key is to focus on how specific features vary between normal traffic and malicious traffic

    Hunting questions:
    • What’s normal/common?
    • Where are the outliers?
    • What do we know about attackers?
  11. What is Normal?

    Cipher suites, versions and other observed features of SSL / TLS vary.

    [Chart: log-scale axis from 1 to 1E+10]

    Cipher Suite: %
    • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256: 26.435%
    • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256: 18.560%
    • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384: 14.104%
    • TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384: 6.999%
    • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA: 6.616%
    • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256: 6.473%
    • TLS_RSA_WITH_AES_128_GCM_SHA256: 4.833%
    • TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA: 4.394%
    • TLS_RSA_WITH_AES_256_CBC_SHA: 2.288%
    • TLS_RSA_WITH_AES_128_CBC_SHA: 1.916%
    • TLS_RSA_WITH_AES_256_CBC_SHA256: 1.795%
    • TLS_RSA_WITH_AES_256_GCM_SHA384: 1.040%
    • TLS_DHE_RSA_WITH_AES_256_GCM_SHA384: 0.849%
    • TLS_RSA_WITH_RC4_128_SHA: 0.788%
    • TLS_RSA_WITH_RC4_128_MD5: 0.688%
    • TLS_RSA_WITH_3DES_EDE_CBC_SHA: 0.643%
    • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384: 0.443%
    • TLS_DHE_RSA_WITH_AES_128_GCM_SHA256: 0.354%
    • TLS_RSA_WITH_AES_128_CBC_SHA256: 0.248%
    • … list continues
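A baseline like the cipher suite distribution above can be approximated with a simple frequency count. The following is a minimal sketch, not the presenters' implementation: it assumes the negotiated cipher suite of each connection is already available as a string, and the function names and the 1.5% rarity threshold are hypothetical.

```python
from collections import Counter


def cipher_suite_baseline(observed_suites):
    """Return (suite, percent) pairs, most common first."""
    counts = Counter(observed_suites)
    total = sum(counts.values())
    return [(suite, 100.0 * n / total) for suite, n in counts.most_common()]


def rare_suites(baseline, threshold_pct=1.5):
    """Suites below the (hypothetical) threshold are candidates for review."""
    return [suite for suite, pct in baseline if pct < threshold_pct]


# Illustrative sample: 100 connections, made up for this sketch.
sample = (
    ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"] * 97
    + ["TLS_RSA_WITH_RC4_128_MD5"] * 2
    + ["TLS_RSA_WITH_3DES_EDE_CBC_SHA"]
)
baseline = cipher_suite_baseline(sample)
outliers = rare_suites(baseline)
```

A rare suite is not malicious by itself; it is only a starting point for asking where the outliers are.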
  12. Commonality – Asset / Request Distributions

    Hypothesis
    • Adversaries will utilize SSL/TLS C2 to gain access to a relatively isolated set of hosts but have high request counts due to beaconing

    Results
    • Useful feature for enriching suspicious communications
    • Combined with other patterns of beaconing, can be predictable
    • One good way to rule out advertising

    Example: Assets: 1 / Requests: 344 (across 24 hours). Also, cert date invalid. In this case, looks like ad/redirect activity.
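The asset / request distribution can be computed per server name with two counters. This is a rough sketch under assumptions: each TLS connection is reduced to an `(asset, server_name)` pair, and the cut-offs `max_assets` and `min_requests` are hypothetical values chosen only for illustration.

```python
from collections import defaultdict


def asset_request_distribution(events):
    """events: iterable of (asset, server_name), one entry per TLS connection.

    Returns {server_name: (distinct_assets, total_requests)}.
    """
    assets = defaultdict(set)
    requests = defaultdict(int)
    for asset, server_name in events:
        assets[server_name].add(asset)
        requests[server_name] += 1
    return {name: (len(assets[name]), requests[name]) for name in requests}


def isolated_high_volume(dist, max_assets=2, min_requests=100):
    """Server names contacted by few assets but with many requests."""
    return sorted(
        name
        for name, (n_assets, n_requests) in dist.items()
        if n_assets <= max_assets and n_requests >= min_requests
    )


# Illustrative data: one host beaconing 344 times, 50 hosts browsing a CDN.
events = [("host-01", "beacon.example")] * 344
events += [(f"host-{i:02d}", "cdn.example") for i in range(50) for _ in range(10)]

dist = asset_request_distribution(events)
flagged = isolated_high_volume(dist)
```

As the example on the slide shows, a hit here may still turn out to be ad or redirect activity, so the result is better used as enrichment than as a verdict.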
  13. Send/Recv Ratios by Server Name

    Hypothesis
    • Malware infections using encryption will have a predictable pattern of send/receive bytes
    • Low variance
    • Send > Receive during odd hours (not normal for host)

    Results
    • Low fidelity when used as a first search but useful as contextual enrichment
    • Possible detection point for data loss analytic
    • Tough to model in modern enterprises due to VPN, 24/7 work, and always-on software
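One way to look at send/receive patterns is to compute the mean and variance of the sent-to-received byte ratio per server name. The sketch below is an assumption-laden illustration: flows are given as `(server_name, bytes_sent, bytes_received)` tuples, and population variance is used as the measure of "low variance"; neither choice comes from the talk.

```python
from collections import defaultdict
from statistics import mean, pvariance


def send_recv_profile(flows):
    """flows: iterable of (server_name, bytes_sent, bytes_received).

    Returns {server_name: (mean_ratio, variance_of_ratio)}.
    Flows with zero received bytes are skipped to avoid division by zero.
    """
    ratios = defaultdict(list)
    for server_name, sent, received in flows:
        if received > 0:
            ratios[server_name].append(sent / received)
    return {name: (mean(r), pvariance(r)) for name, r in ratios.items()}


# Illustrative data: a steady beacon versus ordinary browsing.
flows = [("c2.example", 2000, 1000)] * 5
flows += [("news.example", 500, 50000), ("news.example", 1000, 20000)]

profile = send_recv_profile(flows)
# Send > Receive shows up as a mean ratio above 1.
upload_heavy = sorted(name for name, (avg, _) in profile.items() if avg > 1)
```

Given the caveats listed above (VPNs, always-on software), the output is more plausible as context for an existing lead than as a first search.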
  14. Newly Observed Certs

    Hypothesis
    • Adversaries will introduce new certificates into the environment, breaking the established baseline and creating observables

    Results
    • Very useful starting point
    • Useful to construct passive SSL (pSSL) table to track observation dates
    • VERY good data point for organic threat intel programs
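A passive SSL (pSSL) table can be as small as a dictionary keyed by certificate fingerprint that tracks first and last observation times. This is a minimal in-memory sketch; the field names, the fingerprint strings, and the choice of a plain dict are all assumptions, and a real deployment would need persistent storage.

```python
from datetime import datetime


def update_pssl(table, fingerprint, seen_at):
    """Record one observation; return True if the certificate is newly observed."""
    entry = table.get(fingerprint)
    if entry is None:
        table[fingerprint] = {"first_seen": seen_at, "last_seen": seen_at, "count": 1}
        return True
    entry["first_seen"] = min(entry["first_seen"], seen_at)
    entry["last_seen"] = max(entry["last_seen"], seen_at)
    entry["count"] += 1
    return False


# Illustrative observations with made-up fingerprints.
pssl = {}
first = update_pssl(pssl, "aa:bb:cc", datetime(2017, 8, 1, 9, 0))
again = update_pssl(pssl, "aa:bb:cc", datetime(2017, 8, 3, 14, 30))
other = update_pssl(pssl, "dd:ee:ff", datetime(2017, 8, 4, 2, 15))
```

Each `True` return marks a certificate that breaks the established baseline and is worth a look.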
  15. What We Have Seen Actors Using…

    In practical observation we have seen actors leveraging:
    • Self-signed / default certificates
    • Free certificates
    • Cheap certificates

    Research:
    • Let’s look at who’s using free certificates…

    Follow-up: Send examples of attackers using EV certs to [email protected]
  16. Let’s Encrypt Things!

    Let’s Encrypt: free SSL/TLS certificates for everyone

    Handy article: "The CA's Role in Fighting Phishing and Malware"*
    • CAs Make Poor Content Watchdogs
    • A.k.a. we're encrypting content but take no responsibility for making it easy/free to encrypt malicious communications

    This can cause issues when the general public doesn’t understand the difference between DV and EV certificates … is this a big deal?
  17. Different Levels of Certificates

    Domain Validation (DV) Certificates
    • The issuer gives the cert to anyone who can show they own the domain
    • No documentation required for a DV certificate
    • You can set up a script that will grab a DV certificate in less than 30 seconds with no real validation

    Extended Validation (EV) Certificates
    • The issuer will only give a cert to a person or company that can prove that they are the legal entity and representative

    Users often do not know how to recognize EV vs DV certificates.
  18. Who Uses DV Certificates?

    Issuer: Percentage
    • CN=Symantec Class 3 Secure Server CA - G4,OU=Symantec Trust Network,O=Symantec Corporation,C=US: 12.913%
    • CN=Microsoft IT SSL SHA2,OU=Microsoft IT,O=Microsoft Corporation,L=Redmond,ST=Washington,C=US: 11.736%
    • CN=Google Internet Authority G2,O=Google Inc,C=US: 10.841%
    • CN=DigiCert SHA2 High Assurance Server CA,OU=www.digicert.com,O=DigiCert Inc,C=US: 8.754%
    • CN=DigiCert SHA2 Secure Server CA,O=DigiCert Inc,C=US: 6.937%
    • CN=GeoTrust SSL CA - G3,O=GeoTrust Inc.,C=US: 6.063%
    • CN=Go Daddy Secure Certificate Authority - G2,OU=http://certs.godaddy.com/repository/,O=GoDaddy.com\, Inc.,L=Scottsdale,ST=Arizona,C=US: 5.594%
    • CN=COMODO RSA Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB: 3.646%
    • CN=RapidSSL SHA256 CA - G3,O=GeoTrust Inc.,C=US: 3.030%
    • CN=DigiCert Cloud Services CA-1,O=DigiCert Inc,C=US: 2.922%
    • CN=Amazon,OU=Server CA 1B,O=Amazon,C=US: 2.689%
    • CN=Entrust Certification Authority - L1C,OU=(c) 2009 Entrust\, Inc.,OU=www.entrust.net/rpa is incorporated by reference,O=Entrust\, Inc.,C=US: 2.340%
    • CN=Microsoft Secure Server CA 2011,O=Microsoft Corporation,L=Redmond,ST=Washington,C=US: 2.237%
    • CN=Entrust Certification Authority - L1K,OU=(c) 2012 Entrust\, Inc. - for authorized use only,OU=See www.entrust.net/legal-terms,O=Entrust\, Inc.,C=US: 2.082%
    • … list continues
  19. Changing the Mindset

    Traditionally, people saw SSL/TLS-enabled websites as legitimate in nature; there were far fewer of them then. With the increased use of DV certs and wider adoption of encryption across the board, users need to recognize that the use of SSL/TLS does not make traffic more “legitimate” in nature.

    Great discussion here: https://www.troyhunt.com/on-the-perceived-value-ev-certs-cas-phishing-lets-encrypt/
  20. Who Would Abuse Free Certificates?

    Does this happen in the real world? Yes – in our data set, advertising is the top consumer of Let’s Encrypt certificates.

    Domain / Server Name: Count in last 30 days (Purpose)
    • Ifmnwi.club: 798,494 (user tracking / advertising)
    • litix.io: 58,824 (user tracking / advertising)
    • metrics.nt.vc: 58,675 (user tracking / advertising)
    • ardrone.swoop.com: 45,808 (user tracking / advertising)
    • s.arlime.com: 25,793 (user tracking / advertising)
    • pageview.activengage.com: 19,665 (user tracking / advertising)
  21. Predictability of Low Cost / Free Certificate Issuer

    Hypothesis
    • Actors want to minimize cost of operations, so hunting there is useful

    Results
    • Lots of suspicious and malicious domains discovered
    • Low fidelity as a standalone analytic but easy to combine
    • Several legitimate providers seem to be moving to Let’s Encrypt, which will dilute the usefulness of this

    Domain / Server Name: Notes
    • b6227.xyz: parked namecheap
    • vy95e.xyz: parked namecheap
    • accounactivity-info.xyz: suspended phishing domain
    • avtopoliv.top: Ukrainian WordPress site
    • c.ozjga.top: not super legit
    • iqsns.top
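Hunting on low-cost issuers can start as a simple filter over the certificate issuer field, with a known-good list to cut the advertising noise. The sketch below is illustrative only: the issuer marker list, the record shape `(server_name, issuer_dn)`, the issuer DN strings, and the domain names are all assumptions, not data from the talk.

```python
# Assumption: substring markers for issuers considered low cost or free.
LOW_COST_ISSUER_MARKERS = ("let's encrypt",)


def is_low_cost_issuer(issuer_dn):
    """Case-insensitive substring match against the marker list."""
    dn = issuer_dn.lower()
    return any(marker in dn for marker in LOW_COST_ISSUER_MARKERS)


def hunt_candidates(records, known_good=frozenset()):
    """records: iterable of (server_name, issuer_dn).

    Returns server names with a low-cost issuer that are not already vetted.
    """
    return sorted(
        {
            name
            for name, issuer in records
            if is_low_cost_issuer(issuer) and name not in known_good
        }
    )


# Illustrative records with hypothetical names and issuer strings.
records = [
    ("suspicious.example", "CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US"),
    ("ads.example", "CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US"),
    ("shop.example", "CN=Some Commercial CA,O=Example CA Inc,C=US"),
]
candidates = hunt_candidates(records, known_good={"ads.example"})
```

Because this is low fidelity on its own, the candidate list is best intersected with other analytics such as newly observed certificates or asset / request counts.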
  22. Basic Detection → Forensics Process

    Phase: Detection

    Detection – Identify Suspicious Traffic
    • Traditional intrusion detection
    • File-based detection (sandboxing and static analysis)
    • Correlation with threat intelligence
    • Advanced analytics, e.g. ratio of GETs to HEAD requests

    Phase: Investigation + Forensic Analysis

    Analyze Content for Artifacts
    • Infrastructure
    • IPs
    • Hostnames
    • Protocol-level content
    • File-based artifacts

    Analyze Artifacts
    • Pull extracted artifacts
    • Sandbox / analyze files
    • Enrich / pivot on infrastructure

    Build Timeline
    • When did the activity start?
    • Where were actions taken (affected assets)?
    • What actions were taken?
    • When did the activity end?

    Impact Analysis + Presentation
    • Was sensitive data accessed?
    • What data was taken from the environment?
  23. Detection and Forensics – Unencrypted HTTP

    When performing detection or forensic investigation work on HTTP, you simply look for the bad content (because you can see content).

    Unencrypted content yields an abundance of forensic artifacts:
    • Infrastructure information – IPs, Host headers
    • Payload / Content – full content, URI, parameters, response code, user-agent, request/response body, files, etc.

    With all of this information, analyzing intrusions over HTTP is fairly straightforward.
  24. Detection and Forensics – Unencrypted HTTP Workflow

    Starting point
    • Content alerts fire – look to match content

    Analyze infrastructure (IPs)
    • Enrich -> pull HTTP Host header to get domain

    Timelining
    • Pull content for affected time window
    • Match content / context with HTTP content (HTTP response codes, encoded data in GETs or PUTs, files)
    • Easily match HTTP host records to content alerts and generate information on communications frequency and data types exchanged

    Finish line / Presentation
    • Content match to IP, domain infrastructure
    • Communications timeline and content matched to full infrastructure
  25. Detection and Forensics – Encrypted HTTP/S

    Detection
    • No content-based detection (files, hostnames, domains)
    • Basic detection based on IP matches in threat intelligence?

    No HTTP protocol metadata
    • Starting with an IP address and an event time…
    • We can pull events to IP, but how do we understand:
    • What IPs correspond?
    • What was the nature of the communication?
    • Which IP connections relate to?

    Without all of this information, analyzing intrusions over HTTPS is not so straightforward.
  26. Detection and Forensics – Encrypted HTTP/S

    Detection
    • Can no longer match HTTP host data (proxy logs etc.)
    • Instead, we collect SSL certificate data and match on domains in the server_name and subject fields

    Since we do not have HTTP payload data
    • Starting with an IP address and an event time…
    • We can pull tight-time-bounded PDNS to understand which domains were accessed on cloud / shared hosting IPs
    • We can pull data sent / received ratio in time window to understand type of communication

    We need more data – new techniques
  27. Detection and Forensics – Encrypted HTTP/S Workflow

    Starting point
    • Matching on domains / TLDs in SSL certificate metadata

    Analyze infrastructure (IPs)
    • Leverage tight-time-bounded PDNS to associate time frames, hostnames and IPs

    Timelining
    • Pull events for affected time window
    • Pull tight-time-bounded PDNS to understand which domains were accessed on cloud / shared hosting IPs
    • Pull data sent / received ratio in time window to understand type of communication

    Finish line / Presentation
    • Content match to IP, domain infrastructure
    • Communications timeline and content matched to full infrastructure
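The tight-time-bounded PDNS step amounts to asking which domains resolved to a given IP close to the event time. Here is a minimal sketch under stated assumptions: passive DNS records are available as `(resolved_at, domain, ip)` tuples, and the five-minute window is an arbitrary illustrative default rather than a value from the talk.

```python
from datetime import datetime, timedelta


def domains_for_ip(pdns, ip, event_time, window=timedelta(minutes=5)):
    """pdns: iterable of (resolved_at, domain, ip) tuples.

    Returns the domains that resolved to `ip` within `window` of `event_time`.
    """
    lo, hi = event_time - window, event_time + window
    return sorted(
        {
            domain
            for resolved_at, domain, rec_ip in pdns
            if rec_ip == ip and lo <= resolved_at <= hi
        }
    )


# Illustrative records on a shared hosting IP (documentation address range).
pdns = [
    (datetime(2017, 8, 4, 12, 2), "tenant-a.example", "203.0.113.10"),
    (datetime(2017, 8, 4, 9, 0), "tenant-b.example", "203.0.113.10"),
    (datetime(2017, 8, 4, 12, 1), "other.example", "203.0.113.99"),
]
hits = domains_for_ip(pdns, "203.0.113.10", datetime(2017, 8, 4, 12, 0))
```

On cloud or shared hosting IPs a wide window would return many unrelated tenants, which is why the bound is kept tight.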
  28. So… Encryption Isn’t the End of the World

    Encrypted traffic is in abundant use across every enterprise and is growing in prevalence. Attackers will use encryption as a smoke screen to infiltrate networks over legitimate and illegitimate services.

    You have options:
    • Terminate encryption to better inspect traffic
    • Leverage features of encryption for analytics and intel

    There is a way forward for this:
    • Detection by looking at predictable or notable features for abnormalities is possible
    • Forensics is of lower fidelity, but you are still able to obtain a lot of critical information
  29. Encrypted NSM Security Model (ECNSMM)

    Define a dual-track maturity model for termination and non-termination NSM options:

    Content 1
    • People: 1+
    • Process: Method for terminating SSL/TLS traffic to a single appliance
    • Technology: Web Proxy/Gateway

    Content 2
    • People: 1+
    • Process: Terminating all traffic to a central TAP/NSM layer
    • Technology: SSL Terminator, TAP architecture

    Metadata 1
    • People: 1+
    • Process: Comprehensive visibility and logging of DNS to a central repository, make DNS available to all analysts
    • Technology: DNS metadata extraction, logging, access layers

    Metadata 2
    • People: 1+
    • Process: Automatically link DNS and IP resolutions based on time period, basic data flows calculated
    • Technology: Big-data infrastructure to dynamically link DNS and IP resolutions in real-time

    Metadata 3
    • People: 1+
    • Process: Data flow analytics, encryption metadata analytics, enrich data onto data flow records
    • Technology: Analytics infrastructure to analyze flows and metadata at scale, distributed queuing for enrichment