DDoS Beasts and How to Fight Them (Nginx Conf 2018)

The DDoS threat has been evolving rapidly, to the point where it has
become a community-wide problem. Numerous IoT-related working groups
have been spawned over the last 2 years, mostly due to the infamous
1.1 Tbps IoT DDoS attack in autumn 2016. Fast-forward 1.5 years, and
we see attacks that are even more disastrous.

This tutorial aims at dissecting the DDoS threat. It goes over the
ISO/OSI layers, offering a mutually exclusive and collectively
exhaustive classification of denial-of-service attacks, a description
of what makes them possible, and a set of possible ways to mitigate
attacks of any kind, from an ISP perspective.

The tutorial is based on personal experience. It is vendor-agnostic
and doesn't cover or promote any solutions available on the market;
attendees are welcome to use it as a guide to build their own.

Video:
- First part: https://youtu.be/9psG7knYFtw
- Second part: https://youtu.be/YAX11MKMDb4

Artyom "Töma" Gavrichenkov

October 09, 2018

Transcript

  1. Timeline of ancient history •First attacks: 1999-2000 •2005: STRIDE model

    by Microsoft • Spoofing Identity • Tampering with Data • Repudiation • Information Disclosure • Denial of Service • Elevation of Privilege
  2. [D?]DoS The difference between “a distributed attack” and an, err,

    not distributed one is vague. Traditional meaning: a distributed attack comes from multiple sources. • What is a source? Is it an IP address or a machine? • If it is a machine, does a virtual instance count? Or a few instances under the same physical hypervisor? What if they often migrate between physical machines? If I’m a victim, how do I tell a single-sourced attack from a multiple-sourced one? • If it is an IP, then how do we treat spoofed traffic?
  3. [D?]DoS Hence, a different sort of thinking applies: • DoS

    (as implied in STRIDE): a vulnerability in software (e.g. a NULL pointer dereference, or a crash bug such as Ping of Death) • DDoS: computational resource exhaustion
  4. Risk management The basic idea behind STRIDE and other approaches

    is risk assessment, modelling and management.
  5. Probability/Impact Matrix • Impact scale: Trivial, Minor, Moderate, Significant, Severe

    • Probability scale: Rare, Unlikely, Moderate, Likely, Very Likely • DDoS attack, 2018 • Impact: Severe • Probability: ?
  6. Motivation of an attacker • Fun! • Blackmail • Self-promotion

    • Political statement • Revenge • Market competition • Diverting attention (e.g. in case of theft) • Preventing access to compromising information
  7. Motivation of an attacker • Fun! • Blackmail • Self-promotion

    • Political statement • Revenge • Market competition • Diverting attention (e.g. in case of theft) • Preventing access to compromising information • Rather hard to evaluate and control • More or less predictable!
  8. Network resource exhaustion • A computer network, as of today*,

    consists of layers • A network resource is not available to its users when at least one network layer fails to provide service • Hence, a DDoS attack can be attributed to a network layer which it affects
  9. DDoS Classification According to the ISO/OSI model: • L2-3: generic bandwidth exhaustion

    • L4-6: exploitation of TCP/TLS edge cases • L7: application-specific bottlenecks
  10. Typical amplification attack • Most servers on the Internet send

    more data to a client than they receive • UDP-based servers generally do not verify the source IP address • This allows for amplification DDoS • Diagram: the attacker sends 1 Gbps of queries (“ANY? com.”, Src: victim (spoofed), Dst: amplifier); the amplifier sends 29 Gbps of responses (“com. NS i.gtld-...”, Src: amplifier, Dst: victim) to the victim
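
The ratio in the diagram above can be written down as a one-line calculation. This is only a sketch of the arithmetic; the function name `amplification_factor` is an illustrative choice, not something from the talk.

```python
def amplification_factor(request_bps: float, response_bps: float) -> float:
    """Ratio of traffic reaching the victim to traffic the attacker sends."""
    if request_bps <= 0:
        raise ValueError("request_bps must be positive")
    return response_bps / request_bps


GBPS = 1e9  # bits per second in one gigabit per second

# Figures from the slide: 1 Gbps of spoofed queries, 29 Gbps of responses.
factor = amplification_factor(1 * GBPS, 29 * GBPS)
print(f"amplification factor: {factor:.0f}x")
```

The attacker's own uplink is the denominator, which is why even a modest botnet becomes dangerous once a reflector with a large factor is available.
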
  11. Vulnerable protocols • A long list actually • Mostly obsolete protocols (RIPv1 anyone?)

    • Modern protocols as well: gaming • NTP • DNS • SNMP • SSDP • ICMP • NetBIOS • RIPv1 • PORTMAP • CHARGEN • Quake • Steam • …
  12. Vulnerable servers • As it’s mostly obsolete servers, they eventually get updated

    • or replaced • or just trashed • Thus, the number of amplifiers shows a steady downtrend Source: Qrator.Radar network scanner
  13. Amp power • A downtrend in the number of amplifiers, and a

    downtrend in available power • However, once in a while, a new vulnerable protocol is discovered Source: Qrator.Radar network scanner
  14. Mitigation • Most amplification attacks are easy to track, as the

    source UDP port is fixed • NTP • DNS • SNMP • SSDP • ICMP • NetBIOS • RIPv1 • PORTMAP • CHARGEN • QOTD • Quake • …
  15. Mitigation • Most amplification attacks are easy to track, as the

    source UDP port is fixed • Two major issues: • ICMP • Amplification without a fixed port (Bittorrent?) • NTP • DNS • SNMP • SSDP • ICMP • NetBIOS • RIPv1 • PORTMAP • CHARGEN • QOTD • Quake • …
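
Because reflected traffic arrives from the reflector's service port, a first-pass classifier can be little more than a lookup table. The sketch below assumes packets have already been reduced to `(source_port, length)` pairs by some capture tool; the table holds the standard service ports for some of the protocols listed above, and the helper names are illustrative. ICMP has no port and, as the slide notes, needs separate handling.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Standard UDP service ports of common reflectors.
REFLECTOR_PORTS = {
    17: "QOTD",
    19: "CHARGEN",
    53: "DNS",
    111: "PORTMAP",
    123: "NTP",
    137: "NetBIOS",
    161: "SNMP",
    520: "RIPv1",
    1900: "SSDP",
    11211: "memcached",
}


def classify_source_port(src_port: int) -> Optional[str]:
    """Return the reflector protocol for a UDP source port, if known."""
    return REFLECTOR_PORTS.get(src_port)


def bytes_by_reflector(packets: Iterable[Tuple[int, int]]) -> Counter:
    """Sum packet lengths per suspected reflector protocol."""
    totals: Counter = Counter()
    for src_port, length in packets:
        protocol = classify_source_port(src_port)
        if protocol is not None:
            totals[protocol] += length
    return totals


# Hypothetical sample: (UDP source port, packet length in bytes).
sample = [(123, 468), (123, 468), (53, 512), (40000, 64)]
print(bytes_by_reflector(sample))
```

A source port match is only a hint: legitimate DNS or NTP replies use the same ports, so a real filter would rate-limit or combine this with inventory knowledge rather than drop outright.
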
  16. memcached •A fast in-memory cache •Heavily used in Web development

    •Listens on all interfaces, port 11211, by default
  17. memcached •Basic ASCII protocol doesn’t do authentication •2014, Blackhat USA:

    “An attacker can inject arbitrary data into memory”
  18. memcached •Basic ASCII protocol doesn’t do authentication •2014, Blackhat USA:

    “An attacker can inject arbitrary data into memory” •2017, Power of Community: “An attacker can send data from memory to a third party via spoofing victim’s IP address”
  19. print '\0\x01\0\0\0\x01\0\0gets a a a a a\r\n' – to retrieve

    a value 5 times. Or 10 times. Or a hundred.
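
To make the byte string above easier to read, here is a small decoder for it. It relies on the memcached UDP frame header being four 16-bit big-endian fields (request ID, sequence number, total datagrams, reserved); the variable names are illustrative. The sketch only parses the datagram and sends nothing.

```python
import struct

# The datagram from the slide: 8-byte UDP frame header + ASCII command.
datagram = b"\0\x01\0\0\0\x01\0\0gets a a a a a\r\n"

# Four unsigned 16-bit integers in network byte order.
request_id, sequence, total_datagrams, reserved = struct.unpack("!HHHH", datagram[:8])

command = datagram[8:].decode("ascii")
verb, *keys = command.split()

print(f"request id={request_id}, seq={sequence}, total={total_datagrams}")
print(f"command={verb!r}, keys requested={len(keys)}")
```

The request is a few dozen bytes, while the response carries the stored value once per repeated key, which is where the amplification comes from.
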
  20. Default memcached conf. in Red Hat • memcached listens on

    all network interfaces • both TCP and UDP transports are enabled • no authentication is required to access Memcached • the service has to be manually enabled or started • the default firewall configuration does not allow remote access to Memcached •Also Zimbra, etc.
  21. Amplification factor • Bar chart (scale 0 to 600): NTP, CharGEN, QotD, RIPv1,

    Quake, LDAP • Source: https://www.us-cert.gov/ncas/alerts/TA14-017A • Typical amplification factor used to be hundreds • For memcached, it’s millions, and no fixed source port • Amplification isn’t something to underestimate
  22. ipv4 access-list exploitable-ports
       permit udp any eq 11211 any
      !
      ipv6 access-list exploitable-ports-v6
       permit udp any eq 11211 any
      !
      class-map match-any exploitable-ports
       match access-group ipv4 exploitable-ports
       end-class-map
      !
      policy-map ntt-external-in
       class exploitable-ports
        police rate percent 1
         conform-action transmit
         exceed-action drop
        !
        set precedence 0
        set mpls experimental topmost 0
       !

    Source: http://mailman.nlnog.net/pipermail/nlnog/2018-March/002697.html
  23. ...
       class class-default
        set mpls experimental imposition 0
        set precedence 0
       !
       end-policy-map
      !
      interface Bundle-Ether19
       description Customer: the best customer
       service-policy input ntt-external-in
       ipv4 address xxx/x
       ipv6 address yyy/y
       ...
      !
      interface Bundle-Ether20
       service-policy input ntt-external-in
       ...
      ... etc ...

    Source: http://mailman.nlnog.net/pipermail/nlnog/2018-March/002697.html
  24. Proof of Source Address Ownership E.g., QUIC: • Initial handshake

    packet padded to 1280 bytes • Source address validation
  25. IoT attacks! •2014: LizardStresser •2015: SOHO routers become a persistent

    target for malware •2016: Mirai •2017: Persirai, Hajime, …
  26. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks
  27. L2-3 mitigation From a victim’s perspective: • Anycast network with

    enough inspection power • Inventory management to drop unsolicited traffic vectors (e.g. UDP towards an HTTP server) • Rate-limiting less important traffic • Challenges and handshakes (more on that later)
  28. L2-3 mitigation From a victim’s perspective: • Anycast network with

    enough inspection power • Inventory management to drop unsolicited traffic vectors (e.g. UDP towards an HTTP server) • Rate-limiting less important traffic • Challenges and handshakes (more on that later) From an ISP’s view: • Simple heuristics against typical attacks • RTBH (and let the customer take care of it themselves)
  29. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks
  30. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks • L4-6 • SYN flood, TCP connection flood, Sockstress, and so on • TLS attacks
  31. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks • L4-6 • SYN flood, TCP connection flood, Sockstress, and so on • TLS attacks An attack can affect multiple layers at once
  32. Combined attacks • Say, NTP amplification and SYN flood at

    the same time. • The idea is to divert the attention of the people in charge of mitigation and to prevent them from focusing on the real threat
  33. 21:30:01.226868 IP 94.251.116.51 > 178.248.233.141: GREv0, length 544: IP 184.224.242.144.65323 > 167.42.221.164.80: UDP, length 512
      21:30:01.226873 IP 46.227.212.111 > 178.248.233.141: GREv0, length 544: IP 90.185.119.106.50021 > 179.57.238.88.80: UDP, length 512
      21:30:01.226881 IP 46.39.29.150 > 178.248.233.141: GREv0, length 544: IP 31.173.79.118.42580 > 115.108.7.79.80: UDP, length 512
  34. L4+ mitigation • SYN flood: 3-way handshake-based SYN cookies &

    SYN proxy, allowing a victim to verify the source IP address
  35. L4+ mitigation • SYN flood: 3-way handshake-based SYN cookies &

    SYN proxy, allowing a victim to verify the source IP address • Other packet-based flood: other handshakes and challenges to do the same • The rest: session analysis, heuristics and blocklists
  36. A True Story • An enterprise got ~40 Gbps of

    DNS amplification • Decided it’s a good idea to parse the source IP addresses of reflectors and populate a blocklist
  37. A True Story • An enterprise got ~40 Gbps of

    DNS amplification • Decided it’s a good idea to parse the source IP addresses of reflectors and populate a blocklist • 2 hours later, the attacker started enumerating IPv4 0/0 in the source addresses of empty packets (with source UDP port 53) • Started with the most popular ISP access prefixes
  38. A True Story • An enterprise got ~40 Gbps of

    DNS amplification • Decided it’s a good idea to parse the source IP addresses of reflectors and populate a blocklist • 2 hours later, the attacker started enumerating IPv4 0/0 in the source addresses of empty packets (with source UDP port 53) • Started with the most popular ISP access prefixes • 8 hours later, nothing is working, ~1 billion IPv4 addresses in the blocklist
  39. L4+ mitigation • SYN flood: 3-way handshake-based SYN cookies &

    SYN proxy, allowing a victim to verify the source IP address • Other packet-based flood: other handshakes and challenges to do the same • The rest: session analysis, heuristics and blocklists
  40. L4+ mitigation • SYN flood: 3-way handshake-based SYN cookies &

    SYN proxy, allowing a victim to verify the source IP address • Other packet-based flood: other handshakes and challenges to do the same • The rest: session analysis, heuristics and blocklists • It is dangerous to use blocklists or allowlists without source IP address verification! • Do not forget about inventory management!
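
The idea behind SYN cookies can be sketched as a keyed hash over the connection 4-tuple and a coarse time counter: the server stores nothing until the client echoes the value back, which proves the client receives traffic at the claimed source address. This is a simplified illustration, not the layout any particular kernel uses; `SECRET`, `make_cookie` and `check_ack` are hypothetical names, and real implementations also encode MSS and other options into the cookie.

```python
import hashlib
import hmac
import ipaddress
import struct

SECRET = b"hypothetical-server-secret"  # assumption: a per-server random key
MASK32 = 0xFFFFFFFF


def make_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                counter: int, secret: bytes = SECRET) -> int:
    """Derive a 32-bit initial sequence number from the 4-tuple and a time slot."""
    msg = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HHI", src_port, dst_port, counter))
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              ack_number: int, counter: int, secret: bytes = SECRET) -> bool:
    """Accept the final ACK only if it acknowledges cookie + 1 for a recent slot."""
    for slot in (counter, counter - 1):  # tolerate one slot of clock drift
        if slot < 0:
            continue
        expected = (make_cookie(src_ip, src_port, dst_ip, dst_port, slot, secret) + 1) & MASK32
        if expected == ack_number:
            return True
    return False


# A client that really owns 192.0.2.10 sees the SYN-ACK and can echo cookie + 1.
cookie = make_cookie("192.0.2.10", 40000, "198.51.100.1", 443, counter=100)
ack = (cookie + 1) & MASK32
print(check_ack("192.0.2.10", 40000, "198.51.100.1", 443, ack, counter=100))
```

Only addresses that pass a check like this are safe to feed into a blocklist or allowlist; a spoofing attacker never sees the cookie and so cannot complete the handshake on someone else's behalf.
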
  41. L4+ mitigation • L2-L4 attacks might target not only servers,

    but client networks as well • Real world scenarios: • Gaming and betting: altering the results of an online tournament • Altering results of online exams to prevent competing students from collecting good marks • Stocks and auctions • https://www.v3.co.uk/v3-uk/news/2478411/ec-offices-taken-offline-by-large-scale-ddos-attack • Defense is basically the same • Scalability is a problem though
  42. L4+ mitigation • It’s wrong to believe L4 is only

    TCP (though, yes, UDP doesn’t matter a lot) • New transport protocols are implemented • By vendors • By applications • By IETF • End-user servers? • End-user backoffice? • Transit and ISPs?
  43. Blocking known attack sources • Also known as: “I’m not

    expecting Spanish inquisition Chinese customers, why don’t we just deny access to the Chinese IPs?”
  44. Network Redlining Why is it a bad idea? Here are

    a few reasons: • GeoIP databases are unofficial and unreliable
  45. MaxMind GeoIP database Has its “owner location vs actual location”

    dilemma. Generally unreliable for anything except statistics. • https://stackoverflow.com/questions/22986794/continuously-decreasing-accuracy-of-maxmind-geolite-city • https://www.techdirt.com/articles/20160413/12012834171/how-bad-are-geolocation-tools-really-really-bad.shtml • https://splinternews.com/how-an-internet-mapping-glitch-turned-a-random-kansas-f-1793856052
  46. MaxMind GeoIP database Has its “owner location vs actual location”

    dilemma. Generally unreliable for anything except statistics. • There’s no geography on the Internet, just network topology. • There are no countries, just autonomous systems and their relations.
  47. Network Redlining Why is it a bad idea? Here are

    a few reasons: • GeoIP databases are unofficial and unreliable
  48. Network Redlining Why is it a bad idea? Here are

    a few reasons: • GeoIP databases are unofficial and unreliable • IP addresses get sold and bought • Some IP networks are being used far from the original RIR • Anycast
  49. Network Redlining • GeoIP databases are unofficial and unreliable •

    IP addresses get sold and bought • Some IP networks are being used far from the original RIR • Anycast Some of the above might be better with IPv6.
  50. IPv6 issues • 128-bit IP addresses • Possible: to address

    each atom on the Earth’s surface • Impossible: to store a large number of entries in memory • About 10 years ago, blocking whole IPv4 networks was already considered a bad practice • With IPv6, there is no other way than to return to this method
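
A back-of-the-envelope calculation shows why per-entry blocklists do not scale to IPv6. The sketch assumes a blocklist keyed by /64 prefix at 8 bytes per entry with no per-entry overhead, and uses the documentation prefix `2001:db8::/32` as a stand-in for a single provider allocation; all of these are illustrative assumptions.

```python
import ipaddress


def subprefix_count(network: str, new_prefix: int) -> int:
    """Number of prefixes of length new_prefix inside the given network."""
    net = ipaddress.ip_network(network)
    if new_prefix < net.prefixlen:
        raise ValueError("new_prefix must not be shorter than the network prefix")
    return 2 ** (new_prefix - net.prefixlen)


BYTES_PER_ENTRY = 8  # assumption: one /64 prefix stored as a bare 64-bit key

entries = subprefix_count("2001:db8::/32", 64)
gib = entries * BYTES_PER_ENTRY / 2 ** 30

print(f"/64 prefixes in one /32 allocation: {entries}")
print(f"memory for a flat blocklist of them: {gib} GiB")
```

One allocation already contains as many /64 prefixes as there are addresses in all of IPv4, so an attacker rotating source prefixes forces the defender back to blocking whole networks.
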
  51. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks • L4-6 • SYN flood, TCP connection flood, Sockstress, and so on • TLS attacks
  52. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks • L4-6 • SYN flood, TCP connection flood, Sockstress, and so on • TLS attacks • L7 • Application-specific flood
  53. Wordpress Pingback • GET /whatever • User-Agent: WordPress/3.9.2; http://example.com/; verifying pingback from 192.0.2.150

    • 150 000 – 170 000 vulnerable servers at once • SSL/TLS-enabled • Data from Qrator monitoring engine
  54. Another example of an L7 attack: FBS • A bot

    can actually be more clever than a Wordpress machine • Advanced botnets are capable of using a headless browser (IE/Edge or Chrome) => “full browser stack” (FBS) botnets • An FBS-enabled bot is able to go through even complex challenges, like Javascript code execution
  55. Another example of an L7 attack: FBS CAPTCHA is a

    weapon of last resort against FBS, when we speak of active countermeasures. Pros: • Easy to implement • Generally, might work
  56. CAPTCHA Cons (1/3): • Requires UX injection, may break UX

    • Breaks mobile applications • Sometimes harder for humans than for robots
  57. CAPTCHA Cons (2/3): • Requires UX injection, may break UX

    • Breaks mobile applications • Sometimes harder for humans than for robots • Not all bots are malicious, and not all humans are innocent • CAPTCHA proxies and farms, like http://antigate.com/ • Malware is able to inject CAPTCHA into pages that the user of the infected computer is looking at
  58. CAPTCHA Cons (3/3): • Requires UX injection, may break UX

    • Breaks mobile applications • Sometimes harder for humans than for robots • Not all bots are malicious, and not all humans are innocent • CAPTCHA proxies and farms, like http://antigate.com/ • Malware is able to inject CAPTCHA into pages that the user of the infected computer is looking at • OCR tools evolve fast • Voice recognition evolves even faster • “Security by obscurity”: an open-sourced CAPTCHA is (relatively) easy to break using open source machine learning tools.
  59. Another example of an L7 attack: FBS Under most conditions,

    unlike Wordpress pingback, such attacks won’t cause link degradation, and hence are generally out of scope of a network operator’s responsibility
  60. Another example of an L7 attack: DNS • DNS is

    built on top of UDP*, and a DNS request fits in a packet • The structure of a DNS query is simple
  61. DNS lookup

    10:00:34.510826 IP (proto UDP (17), length 56) 192.168.1.5.63097 > 8.8.8.8.53: 9508+ A? facebook.com. (30)
    10:00:34.588632 IP (proto UDP (17), length 72) 8.8.8.8.53 > 192.168.1.5.63097: 9508 1/0/0 facebook.com. A 31.13.72.36 (45)
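
To show how simple a DNS query is on the wire, the sketch below assembles the query from the capture above by hand: a 12-byte header, the name as length-prefixed labels, then the type and class. The function name `build_query` is illustrative; the field layout follows the standard DNS message format.

```python
import struct


def build_query(name: str, query_id: int, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a minimal DNS query (default: type A, class IN) with recursion desired."""
    # Header: ID, flags (0x0100 = RD bit), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


query = build_query("facebook.com", 9508)
print(len(query), query.hex())
```

The result is 30 bytes long, matching the `(30)` that tcpdump prints after the query. Nothing in those bytes ties the query to its sender, which is why a server cannot tell a spoofed query from a genuine one over UDP.
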
  62. DNS lookup • DNS is built on top of UDP*,

    and a DNS request fits in a packet • The structure of a DNS query is simple • An attacker capable of generating spoofed queries will make a userspace DNS application process all those fake requests, rendering a DNS server unavailable L7-wise.
  63. DNS lookup • An attacker capable of generating spoofed queries

    will make a userspace DNS application process all those fake requests, rendering a DNS server unavailable, this time L7-wise. • “Water torture” • This is what happened in October 2016 with Dyn.
  64. DNS lookup • An attacker capable of generating spoofed queries

    will make a userspace DNS application process all those fake requests, rendering a DNS server unavailable, this time L7-wise. • Luckily, DNS protocol allows switching to TCP, and in TCP, we have a handshake to verify the source IP address, hence, blocklists apply. • Once again, though, enough bandwidth and inspection power are required
  65. DNS lookup • Luckily, DNS protocol allows switching to TCP,

    and in TCP, we have a handshake to verify the source IP address, hence, blocklists apply. • Unfortunately, other UDP-based protocols (e.g. gaming) are mostly built without DDoS mitigation in mind
  66. L7 mitigation COMPLICATED • Active: • HTTP/JS challenges • CAPTCHA

    • Passive: • Application session analysis • Big Data • Correlation, machine learning • Monitoring, incident response
  67. False P/N • Everything learning-based is not strict • A

    false positive: the algorithm shows a match when there’s no match • A false negative: the algorithm shows no match when there’s a match • Basically, any algorithm may be tuned to either 0% FP or 0% FN • The truth is somewhere in between • The balance is defined by the purpose
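
The trade-off between false positives and false negatives can be seen by sweeping a detection threshold over scored traffic. The scores and labels below are made-up illustration data, and `error_rates` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
from typing import List, Tuple

# Hypothetical samples: (anomaly score from some detector, is it really an attack?)
samples: List[Tuple[float, bool]] = [
    (0.1, False), (0.3, False), (0.6, False),
    (0.4, True), (0.7, True), (0.9, True),
]


def error_rates(data: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) when flagging score >= threshold."""
    legit = [score for score, is_attack in data if not is_attack]
    attacks = [score for score, is_attack in data if is_attack]
    false_positives = sum(1 for score in legit if score >= threshold)
    false_negatives = sum(1 for score in attacks if score < threshold)
    return false_positives / len(legit), false_negatives / len(attacks)


for threshold in (0.0, 0.5, 1.0):
    fp_rate, fn_rate = error_rates(samples, threshold)
    print(f"threshold={threshold}: FP rate={fp_rate:.2f}, FN rate={fn_rate:.2f}")
```

Flagging everything removes false negatives at the cost of blocking every legitimate user, and flagging nothing does the opposite; where to sit between the two depends on what the filter is protecting.
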
  68. Attack examples • L2-3 • Volumetric attacks: UDP flood, SYN

    flood, amplification, and so on (we don’t need to care exactly) • Infrastructure attacks • L4-6 • SYN flood, TCP connection flood, Sockstress, and so on • TLS attacks • L7 • Application-based flood A classification which is: • Mutually exclusive * • Collectively exhaustive
  69. A decades old job interview quiz • “What happens when

    you type www.google.com in your browser?” • https://github.com/alex/what-happens-when:
  70. “What happens when…”? • DNS lookup • Opening of a

    socket • TLS handshake • HTTP protocol • HTTP Server Request Handle
  71. “What happens when…”? • DNS lookup • IPv4/IPv6 selection •

    Opening of a socket • Deep packet inspection • TLS handshake • CRL/OCSP • HTTP protocol • Load balancer • HTTP Server Request Handle • CDN
  72. “What happens when…”? • DNS lookup • IPv4/IPv6 selection •

    Opening of a socket • Deep packet inspection • TLS handshake • CRL/OCSP • HTTP protocol • Load balancer • HTTP Server Request Handle • CDN • As the Dyn incident shows, an application server is not the only possible direct target of a DDoS attack • Each step could suffer from an attack, L2-L7-wise • Inventory management • Infrastructure monitoring
  73. Architectural view • Security is not a product, not an

    appliance, it’s a process • The ability to mitigate DDoS attacks must be built into the design of any protocol • A concerned company must follow policies: • Updates • Risk management • Incident handling
  74. Risk management for an ISP/DC/cloud • A network operator will

    basically suffer only from bandwidth-consuming attacks • Sometimes, cloud adds CPU/memory costs • However, an attacker will most likely use just the tool they have at their disposal: an amplifier or a botnet, doesn’t matter • Thus, the probability of an attack towards the network is the aggregate probability of an attack for each customer in the network
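
One way to read "aggregate probability" is the chance that at least one customer is attacked in a given period. The sketch below assumes per-customer attack events are independent, which is a simplification introduced here rather than something the slide states, and the per-customer figures are invented.

```python
import math
from typing import Iterable


def network_attack_probability(customer_probabilities: Iterable[float]) -> float:
    """P(at least one customer is attacked), assuming independent events."""
    probabilities = list(customer_probabilities)
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must be within [0, 1]")
    # Complement of "no customer is attacked at all".
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Hypothetical per-customer probabilities of being attacked this year.
customers = [0.02] * 200
print(f"{network_attack_probability(customers):.3f}")
```

Even when each customer is an unlikely target, a network with enough customers should plan for an attack as a near-certain event.
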
  75. Risk management for a customer • The rest of it!

    • It’s important to stay aware of PR activities, marketing initiatives, and news • Even more important: to choose a solution, given all the layers and risks
  76. What’s next? •memcached: • Disclosure in November 2017 • In

    the wild: February 2018 •Three months is a very short interval •Next time, it might be even shorter •Meltdown/Spectre show: the “embargo” approach doesn’t work well for a large enough community
  77. What’s next? •The problem is not Internet of Things only,

    it’s the overall insecurity, operational failures, and ignorance of some Internet community members. •Sounds like we’ve found the root cause… yet, it won’t go away anytime soon.
  78. What’s next? •Collaboration •Proper and timely reaction •Reach out to

    your CERT/CSIRT (you do have one, right?) for advisory.