
Tales from the Ops Side - Creepy Crawler


Continuing this series, this is a tale of a battle to keep a website online against scrapers trying everything in their arsenal to steal a customer's content.

Hany Fahim

June 25, 2018


Transcript

  1. Site down! • Very high load on the main database server (MySQL). • Looking closer, noticed a lot of similar queries (pagination). • Site recovered on its own within 10 minutes.
  2. Investigation • Excessive pagination - could it be a crawler? • Customer does a lot of SEO. • Checked logs, noticed a few Yahoo crawlers (fewer than 200/hr). • Collected info and went back to life.
  3. Same pattern • High load on DB server. • Traced some queries to a scheduled job running on an admin server. • Further investigation shows it runs every 5 min.
  4. Same pattern • High load on DB server. • Traced some queries to a scheduled job running on an admin server. • Further investigation shows it runs every 5 min. git blame says it’s 1 yr old!
  5. Site recovers • Outage lasted about 15 min. • Cron job was the prime suspect. • Job is a cache-refresher. • Waited for the next execution. Nothing. • Will disable if the site goes down again.
  6. Same pattern • High DB load. • Cron was running. • Terminated and disabled the job. • Site did not recover. • Where to look next?
  7. Staring at the logs • Something caught our eye. • Proxy logs showed many requests with a Referrer of the customer’s blog. • Grouped these requests together and analyzed further.
  8. Staring at the logs • Also noticed: • User-Agent was Baiduspider/2.0. • Request to /en/all-categories • Query string was: manufacturer=<VARYING NUMBER>&on_sale=yes
  9. Is it a crawler? • Was Baidu wreaking havoc? • Known to misbehave. • 10,829 unique IPs in the last 24 hours. • All belonged to American residential ISPs (Comcast, Verizon, Roadrunner, etc…).
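A count like the 10,829 unique IPs above can be pulled straight from the proxy logs. A minimal shell sketch with inline sample data (it assumes the client IP is the first whitespace-separated field, as in the common/combined log formats; the IPs and paths are illustrative):

```shell
# Build a tiny sample access log (real analysis would read the proxy's own log).
printf '%s\n' \
  '203.0.113.5 - - [25/Jun/2018:10:00:01] "GET /en/all-categories?on_sale=yes HTTP/1.1" 200' \
  '198.51.100.7 - - [25/Jun/2018:10:00:02] "GET /en/all-categories?on_sale=yes HTTP/1.1" 200' \
  '203.0.113.5 - - [25/Jun/2018:10:00:03] "GET /en/all-categories?on_sale=yes HTTP/1.1" 200' \
  > access.log

# Unique client IPs: take the first field, de-duplicate, count.
awk '{print $1}' access.log | sort -u | wc -l
```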
  10. Suspicious • Customer mainly targets Canada, with some US business. • High volume of American IPs during the previous 2 outages. • Site recovered in 15 min. • Flow of American IPs stopped.
  11. Not Baidu • Convinced us this was malicious. Time to block. • Want to be surgical. • Because the IPs were residential and varied, they can’t be used for blocking.
  12. Surgical blocking • Requests for /en/all-categories • User-Agent was Baiduspider/2.0 • Is on_sale=yes • Had a Referrer of the customer’s blog!
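One way to express such a compound rule at the proxy layer is a sketch like the following, assuming an nginx front end (the directives are real nginx; the blog hostname and exact patterns are illustrative, not from the talk):

```nginx
# Return 403 only when a request matches every observed fingerprint at once.
location /en/all-categories {
    set $blk "";
    if ($http_user_agent ~ "Baiduspider/2\.0")          { set $blk "ua"; }
    if ($arg_on_sale = "yes")                           { set $blk "${blk}+sale"; }
    if ($http_referer ~ "blog\.customer-site\.example") { set $blk "${blk}+ref"; }
    if ($blk = "ua+sale+ref")                           { return 403; }
}
```

Chaining `set` through several `if` blocks is the usual workaround for nginx's lack of boolean AND in conditions.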
  13. Checked logs • Our rules were not being matched anymore. • American IPs were coming through. • Referrer was now empty! • Attacker was active and adapting.
  14. Modify block • Remove the Referrer condition. • Not preferable - less surgical. • Site recovers immediately.
  15. Log check • Same request pattern, same American IPs. • Still an empty Referrer. • No longer Baiduspider/2.0. • User-Agent was now Chrome!
  16. User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36. This version was 6 months old!
  17. Old Chrome • All the suspicious requests had the same User-Agent. • Unusual, since Chrome is good at auto-updating. • This version had known vulnerabilities.
  18. CVE-2014-9689, aka Gyrophone: Gyroscopes found on modern smartphones are sufficiently sensitive to measure acoustic signals in the vicinity of the phone.
  19. CVE-2014-9689, aka Gyrophone: Gyroscopes found on modern smartphones are sufficiently sensitive to measure acoustic signals in the vicinity of the phone. Likely not relevant.
  20. Adapt block • Strange that so many requests have the same User-Agent. • Possible, but unlikely. Client was hesitant. • Adapted the block to include this version of Chrome. • Site recovers immediately.
  21. New pattern • Different request: /en/gear-equipment. • Same American IPs, same User-Agent. • Decided to block the User-Agent outright. • Site recovers immediately.
  22. New User-Agent Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36
  23. Getting very annoyed • Clearly an invalid version. • “Version” numbers would vary. • OK, fine. Let’s play.
  24. Time for RegEx (\d{6,})\.(\d{6,})\.(\d{6,})\.(\d{6,}) This matches 4 dot-separated groups of 6 or more digits, like 1413019477.612818924.1339797527.1263967477
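The pattern is easy to sanity-check before deploying it as a block rule; a minimal Python sketch, using the two User-Agent strings from the slides:

```python
import re

# Four dot-separated groups of 6+ digits, matching the attacker's
# randomly generated "Chrome version" but no real Chrome version.
pattern = re.compile(r"(\d{6,})\.(\d{6,})\.(\d{6,})\.(\d{6,})")

bogus_ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) "
            "Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36")
real_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) "
           "AppleWebKit/537.36 (KHTML, like Gecko) "
           "Chrome/40.0.2214.115 Safari/537.36")

print(bool(pattern.search(bogus_ua)))  # True  - would be blocked
print(bool(pattern.search(real_ua)))   # False - a legitimate version passes
```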
  25. Different pattern • Nothing in the logs. No traffic flow. • DB was not loaded this time. • App servers had very high load. • Every app process consuming 100% CPU.
  26. Different pattern • Nothing in the logs. No traffic flow. • DB was not loaded this time. • App servers had very high load. • Every app process consuming 100% CPU. Attacker upping their game?
  27. strace • Attached strace to a php-fpm process. • No output. Very odd. • Restarted the app processes.
  28. Watch logs • Some requests come in, then the flow stops again. • CPU spikes back up immediately. • Quick analysis: about 200 requests after the restart. • 20 app servers x 10 processes each = 200.
  29. Isolate • Removed 1 server from the load balancer to isolate it. • Configured rules to direct our IP to this server. • Reduced the process count to 1. • Attached strace and loaded the site in a browser.
  30. Analysis • Lots of expected output. • Some calls to the DB and Redis. • Then nothing. CPU spikes.
  31. Deeper dive • Could it be Redis? No, the server was healthy. • Turned on “slow request” logging in php-fpm. • Found our culprit! Awesome idea!
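Slow-request logging in php-fpm is a per-pool setting; a minimal sketch (the directive names are php-fpm's real ones, the path and threshold are illustrative):

```ini
; In the pool config (e.g. www.conf): any request running longer than the
; timeout gets a PHP backtrace dumped to the slowlog.
slowlog = /var/log/php-fpm/www-slow.log
request_slowlog_timeout = 5s
```

The backtrace shows exactly which PHP function the stuck process is sitting in, which is what pointed at the loop below.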
  32. while (isset($hash[$code][$unKey])) {
          $unKey .= Mage::getStoreConfig('special_char');
      }
      $code is “manufacturer”, $unKey is “headhaus”, $hash[$code][$unKey] is 10560.
  33. Infinite loop • 'special_char' is set to - • headhaus should become headhaus-, then headhaus--, etc… • Mage::getStoreConfig values are cached in Redis, then memory. • Redis returns an empty string.
  34. Infinite loop • 'special_char' is set to - • headhaus should become headhaus-, then headhaus--, etc… • Mage::getStoreConfig values are cached in Redis, then memory. • Redis returns an empty string. Did our attacker compromise Redis somehow?
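The failure mode is easy to model: if the configured separator comes back as an empty string, appending it never changes the key, so the de-duplication loop can never terminate. A minimal Python sketch of the same logic (the names mirror the slide; the iteration cap stands in for "spins at 100% CPU forever"):

```python
def unique_key(hash_, code, un_key, special_char, max_iter=1000):
    """Append special_char until the key is unused, as the Magento loop does."""
    iterations = 0
    while un_key in hash_.get(code, {}):
        un_key += special_char
        iterations += 1
        if iterations >= max_iter:  # stands in for the infinite loop
            raise RuntimeError("separator is empty: key never changes")
    return un_key

hash_ = {"manufacturer": {"headhaus": 10560}}

# Healthy config: '-' comes back from the cache.
print(unique_key(hash_, "manufacturer", "headhaus", "-"))  # headhaus-

# Broken config: Redis returned an empty string.
# unique_key(hash_, "manufacturer", "headhaus", "") raises RuntimeError here,
# where the real code simply never returns.
```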
  35. More scheduled jobs • admin server has a scheduled job at 3 am to clear the cache. • Also had a job at 12 am to refresh products. • Doesn’t really smell like an attack. • Went to bed. Needed to rest.
  36. Following morning • The previous day, the customer had updated several components of their app. • Newer versions had performance improvements, including phpredis. • Customer was also able to replicate in staging!
  37. Trigger on demand • Running the product refresh job breaks the app. • Running the cache-clearing job fixes the app. • What was the product refresh job doing?
  38. Simple logic • Clears several cached entries, including special_char. • Reads in attributes: Mage::getStoreConfig('attributes') • Refreshes the catalogue.
  39. Simple logic • Clears several cached entries, including special_char. • Reads in attributes: Mage::getStoreConfig('attributes') • Refreshes the catalogue. This was returning garbage!
  40. Compressed data • The new version of phpredis supported compression. • The admin server was not updated. • It could not parse the returned value and failed. • Never re-populates special_char. • App breaks.
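The mismatch is reproducible outside phpredis: if the writer compresses values transparently but the reader does not decompress, the reader sees bytes it cannot parse. A minimal Python sketch of the idea (zlib here stands in for phpredis's compression; the cached value is the special_char from the slides):

```python
import zlib

value = "-"  # the cached special_char

# Updated app server: its client writes with transparent compression.
stored = zlib.compress(value.encode())

# admin server: an old client reads the raw bytes with no decompression.
try:
    parsed = stored.decode()  # zlib header bytes are not valid UTF-8
except UnicodeDecodeError:
    parsed = ""               # parse fails; the cache entry is effectively empty
print(repr(parsed))           # ''

# A client that knows about the compression gets the value back intact.
print(zlib.decompress(stored).decode())  # -
```

The empty string is exactly what fed the infinite loop: the separator that should have been "-" came back as "".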
  41. Hypothesis • Due to the lack of sophistication of the scrape attempt (low skill), • and access to a very large number of compromised systems (>50k — high skill), • it’s likely the attackers rented a botnet in an effort to steal the content rapidly.
  42. Lessons • Heavy applications are vulnerable. • Sometimes staring at logs works. Sometimes. • strace telling you nothing tells you everything.