Tales from the Ops Side - Creepy Crawler

Continuing this series, this is a tale about a battle to keep a website online while scrapers tried everything in their arsenal to steal a customer's content.

Hany Fahim

June 25, 2018

Transcript

  1. Tales From The Ops Side: The Tale of the Creepy Crawler

    Hany Fahim, Founder & CEO, @iHandroid
  2. It was a lovely late summer’s evening…

  3. ! 10:07pm Site Down! A customer’s large E-commerce app went

    down.
  4. Site down! • Very high load on the main database

    server (MySQL). • Looking closer, noticed a lot of similar queries (pagination). • Site recovered on its own within 10 minutes.
  5. Investigation • Excessive pagination - could it be a crawler?

    • Customer does a lot of SEO. • Checked logs, noticed a few Yahoo crawlers (less than 200/hr). • Collected info and went back to life.
  6. ! 4:56am Site down!

  7. Same pattern • High load on DB server. • Traced

    some queries to a scheduled job running on an admin server. • Further investigation shows it runs
 every 5 min.
  8. Same pattern • High load on DB server. • Traced

    some queries to a scheduled job running on an admin server. • Further investigation shows it runs
 every 5 min. git blame says it’s 1yr old!
  9. Site recovers • Outage lasted about 15 min. • Cron

    job was prime suspect. • Job is a cache-refresher. • Waited for next execution. Nothing. • Will disable if site goes down again.
  10. Back to bed. ZZZ

  11. ! 9:19am Site down again!

  12. Same pattern • High DB load. • Cron was running.

    • Terminated and disabled the job. • Site did not recover. • Where to look next?
  13. Staring at the logs • Something caught our eye. •

    Proxy logs showed many requests with Referrer of customer’s blog. • Grouped these requests together and analyzed further.
  14. Staring at the logs • Also noticed: • User-Agent was

    Baiduspider/2.0. • Request to /en/all-categories • Query String was:
 
 manufacturer=<VARYING NUMBER>
 
 &on_sale=yes
  15. Is it a crawler? • Was Baidu wreaking havoc? •

    Known to misbehave. • 10,829 unique IPs in last 24 hours. • All belonged to American Residential ISPs (Comcast, Verizon, Roadrunner, etc…).
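
The grouping step described above can be sketched as follows. This is a minimal illustration, assuming the proxy log lines have already been parsed into `(ip, user_agent)` pairs; the function name and the record shape are hypothetical and do not come from the talk.

```python
from collections import defaultdict


def unique_ips_by_user_agent(records):
    """Count distinct client IPs seen for each User-Agent string.

    `records` is an iterable of (ip, user_agent) pairs. How those pairs are
    extracted from the proxy log is an assumption left outside this sketch.
    """
    ips = defaultdict(set)
    for ip, user_agent in records:
        ips[user_agent].add(ip)
    return {user_agent: len(addresses) for user_agent, addresses in ips.items()}


if __name__ == "__main__":
    # Documentation-range IPs used as sample data only.
    sample = [
        ("203.0.113.1", "Baiduspider/2.0"),
        ("203.0.113.2", "Baiduspider/2.0"),
        ("203.0.113.1", "Baiduspider/2.0"),
        ("198.51.100.7", "Yahoo! Slurp"),
    ]
    print(unique_ips_by_user_agent(sample))
```

A single crawler identity spread across thousands of distinct addresses is the kind of signal this count makes visible.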
  16. Suspicious • Customer mainly targets Canada with some US business.

    • High volume of American IPs during previous 2 outages. • Site recovered in 15 min. • Flow of American IPs stopped.
  17. Not Baidu • Convinced us this was malicious. Time to

    block. • Want to be surgical. • Because IPs were residential and varied, can’t be used for blocking.
  18. Surgical blocking • Requests for /en/all-categories • User-Agent was Baiduspider/2.0

    • Is on_sale=yes • Had a Referrer of customer’s blog!
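
The four conditions above can be expressed as a single predicate. The sketch below is not the rule syntax used on the actual proxy, which the talk does not show; `BLOG_HOST` is a placeholder because the customer's domain is not named, and `should_block` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder host: the talk does not name the customer's blog domain.
BLOG_HOST = "blog.example.com"


def should_block(path_and_query, user_agent, referrer):
    """Return True only when all four conditions from the slide hold."""
    parts = urlsplit(path_and_query)
    query = parse_qs(parts.query)
    return (
        parts.path == "/en/all-categories"
        and "Baiduspider/2.0" in user_agent
        and query.get("on_sale") == ["yes"]
        and urlsplit(referrer).hostname == BLOG_HOST
    )
```

Requiring every condition keeps the block narrow, but it also means the rule stops matching as soon as the client changes any one of the four fields.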
  19. Sit and wait.

  20. ! 1:36pm 4hrs later

  21. Checked logs • Our rules were not being matched anymore.

    • American IPs were coming through. • Referrer was now empty! • Attacker was active and adapting.
  22. Modify block • Remove Referrer condition. • Not preferable -

    less surgical. • Site recovers immediately.
  23. Sit and wait.

  24. ! 10:50am The next day

  25. Log check • Same request pattern, same American IPs. •

    Still empty Referrer. • No longer Baiduspider/2.0. • User-Agent was now Chrome!
  26. User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML,

    like Gecko) Chrome/40.0.2214.115 Safari/537.36
  27. User-Agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML,

    like Gecko) Chrome/40.0.2214.115 Safari/537.36 This version was 6 months old!
  28. Old Chrome • All the suspicious requests had the same

    User-Agent. • Unusual since Chrome is good at auto-updating. • This version had known vulnerabilities.
  29. CVE-2014-9689
 aka: Gyrophone Gyroscopes found on modern smart phones are

    sufficiently sensitive to measure acoustic signals in the vicinity of the phone.
  30. CVE-2014-9689
 aka: Gyrophone Gyroscopes found on modern smart phones are

    sufficiently sensitive to measure acoustic signals in the vicinity of the phone. Likely not relevant
  31. Adapt block • Strange that so many requests have the

    same User-Agent. • Possible, but unlikely. Client was hesitant. • Adapted block to include this version of Chrome. • Site recovers immediately.
  32. ! 12:24pm 1.5hrs later

  33. New pattern • Different request: /en/gear-equipment. • Same American IPs,

    same User-Agent. • Decided to block User-Agent outright. • Site recovers immediately.
  34. ! 3:32pm 3hrs later

  35. Come on! • American IPs. • Same request. • New

    User-Agent?
  36. New User-Agent Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)

    Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36
  37. New User-Agent Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)

    Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36 Now using Linux?
  38. New User-Agent Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)

    Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36 Now using Linux? What the heck is this?
  39. Getting very annoyed • Clearly an invalid version. • “Version”

    numbers would vary. • OK fine. Let’s play.
  40. Time for RegEx (\d{6,})\.(\d{6,})\.(\d{6,})\.(\d{6,})

  41. Time for RegEx (\d{6,})\.(\d{6,})\.(\d{6,})\.(\d{6,}) This matches 4 sets of 6

    or more numbers like this 1413019477.612818924.1339797527.1263967477
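
To see what the pattern does and does not catch, here is a small check using Python's `re` module. The regular expression is copied from the slide; the helper name `has_bogus_version` is made up for illustration, and the real rule lived in the proxy configuration, which the talk does not show.

```python
import re

# Pattern from the slide: four dot-separated groups of six or more digits.
BOGUS_VERSION = re.compile(r"(\d{6,})\.(\d{6,})\.(\d{6,})\.(\d{6,})")


def has_bogus_version(user_agent):
    """Return True if the User-Agent contains the implausible version pattern."""
    return BOGUS_VERSION.search(user_agent) is not None


if __name__ == "__main__":
    fake = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/1413019477.612818924.1339797527.1263967477 Safari/537.36"
    )
    real = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
    )
    print(has_bogus_version(fake), has_bogus_version(real))
```

Genuine Chrome versions such as 40.0.2214.115 have short components, so the six-digit minimum is what keeps ordinary browsers out of the block.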
  42. Site recovers immediately.

  43. Site recovers immediately. Bring it on.

  44. ! 4:31pm 1hr later

  45. New User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)

    Chrome/0.0.0.0 Safari/537.36
  46. New User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)

    Chrome/0.0.0.0 Safari/537.36 Really? Windows?
  47. Adapt: 0\.0\.0\.0

  48. Adapt: 0\.0\.0\.0 Recover.

  49. Adapt: 0\.0\.0\.0 Recover. We can do this all day.

  50. And then silence.

  51. ! 12:02am 2 days later

  52. Different pattern • Nothing in the logs. No traffic flow.

    • DB was not loaded this time. • App servers had very high load. • Every app process consuming 100% CPU.
  53. Different pattern • Nothing in the logs. No traffic flow.

    • DB was not loaded this time. • App servers had very high load. • Every app process consuming 100% CPU. Attacker upping their game?
  54. strace • Attached strace to a php-fpm process. • No

    output. Very odd. • Restarted app processes.
  55. Watch logs • Some requests come in, then flow stops

    again. • CPU spikes back up immediately. • Quick analysis: about 200 requests after restart. • 20 app servers x 10 processes each = 200.
  56. Every request locks up every php-fpm process?

  57. Every request locks up every php-fpm process? What was our

    attacker up to?
  58. Isolate • Removed 1 server from the load balancer to

    isolate. • Configured rules to direct our IP to this server. • Reduced process count to 1. • Attached strace and loaded site in browser.
  59. Analysis • Lots of expected output. • Some calls to

    the DB and Redis. • Then nothing. CPU spikes.
  60. Deeper dive • Could it be Redis? No, server was

    healthy. • Turned on “slow request” logging in php-fpm. • Found our culprit! Awesome idea!
  61. Stack trace script_filename = /data/app/index.php [0x00007f5baee1e1d0] getAllFilterableOptionsAsHash() /data/app/code/Shopby/Helper/Attributes.php:36

  62. Stack trace script_filename = /data/app/index.php [0x00007f5baee1e1d0] getAllFilterableOptionsAsHash() /data/app/code/Shopby/Helper/Attributes.php:36 What’s on

    line 36?
  63. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
  64. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    Seems to be stuck here.
  65. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    Seems to be stuck here. Debug with print statements!
  66. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    $code is "manufacturer"
    $unKey is "headhaus"
    $hash[$code][$unKey] is 10560
  67. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    Line 37 appends to $unKey.
  68. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    Line 37 appends to $unKey. The getStoreConfig call returns nothing!
  69. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    Line 37 appends to $unKey. The getStoreConfig call returns nothing! No wonder we’re looping!
  70. Infinite loop • 'special_char' is set to - • headhaus

    should become headhaus- then headhaus--, etc… • Mage::getStoreConfig values are cached in Redis, then memory. • Redis returns an empty string.
  71. Infinite loop • 'special_char' is set to - • headhaus

    should become headhaus- then headhaus--, etc… • Mage::getStoreConfig values are cached in Redis, then memory. • Redis returns an empty string. Did our attacker compromise Redis somehow?
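
The failure mode is easy to reproduce outside the application. The sketch below is a Python re-expression of the loop on lines 36 to 38, not the customer's PHP code: `make_unique_key` and the `max_iterations` guard are hypothetical additions made so the demonstration terminates, and Python's `in` test only approximates PHP's `isset`.

```python
def make_unique_key(existing, key, special_char, max_iterations=1000):
    """Append special_char to key until it no longer collides with existing.

    The original loop has no iteration limit. The guard here is an addition
    for this sketch so that an empty special_char raises instead of spinning.
    """
    iterations = 0
    while key in existing:
        key += special_char
        iterations += 1
        if iterations >= max_iterations:
            raise RuntimeError(
                "key never became unique; special_char=%r" % special_char
            )
    return key


if __name__ == "__main__":
    taken = {"headhaus": 10560}
    print(make_unique_key(taken, "headhaus", "-"))  # suffix changes the key
    try:
        make_unique_key(taken, "headhaus", "")  # empty suffix: key never changes
    except RuntimeError as error:
        print(error)
```

With a non-empty suffix the key changes on every pass and the loop ends; with an empty string the key is identical each time, which matches a process pinned at 100% CPU while making no system calls for strace to show.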
  72. Site recovers on its own.

  73. Site recovers on its own. We were down for 3hrs.

    Time check: 3am
  74. Line 36

    36 while (isset($hash[$code][$unKey])) {
    37     $unKey .= Mage::getStoreConfig('special_char');
    38 }
    The getStoreConfig call now returns -! Loop is broken!
  75. More scheduled jobs • admin server has a scheduled job

    at 3am to clear the cache. • Also had a job at 12am to refresh products. • Doesn’t really smell like an attack. • Went to bed. Needed to rest.
  76. Following morning • Previous day, customer had updated several components

    of their app. • Newer versions had performance improvements, including phpredis. • Customer was also able to replicate in staging!
  77. Trigger on demand • Running the product refresh job breaks

    the app. • Running cache-clearing job fixes the app. • What was the product refresh job doing?
  78. Simple logic • Clears several cached entries,
 including special_char. •

    Reads in attributes: Mage::getStoreConfig('attributes') • Refreshes catalogue.
  79. Simple logic • Clears several cached entries,
 including special_char. •

    Reads in attributes: Mage::getStoreConfig('attributes') • Refreshes catalogue. This was returning garbage!
  80. Compressed data • New version of phpredis supported compression. •

    admin server was not updated. • Could not parse returned value and fails. • Never re-populates special_char. • App breaks.
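
The version mismatch can be illustrated with a toy cache. This sketch uses `zlib` and JSON purely as stand-ins: the talk does not say which compression algorithm or serializer was in play, and every function name here is hypothetical. The order of steps in `refresh_products` follows the slides, while the exact point where `special_char` is repopulated is an assumption.

```python
import json
import zlib


def new_client_write(value):
    """Updated app servers: serialize, then compress before caching."""
    return zlib.compress(json.dumps(value).encode("utf-8"))


def old_client_read(raw):
    """Admin server with the old client: expects uncompressed bytes."""
    return json.loads(raw.decode("utf-8"))


def refresh_products(cache):
    """Toy version of the product refresh job running on the admin server."""
    cache.pop("special_char", None)           # clears cached entries
    old_client_read(cache["attributes"])      # raises on compressed data
    cache["special_char"] = "-"               # never reached after a failure


if __name__ == "__main__":
    cache = {"attributes": new_client_write(["manufacturer"]), "special_char": "-"}
    try:
        refresh_products(cache)
    except ValueError as error:
        print("refresh failed:", type(error).__name__)
    print("special_char cached:", "special_char" in cache)
```

Because the job deletes the entry before the read fails, the cache is left without `special_char` until something else repopulates it, which is consistent with the 3am cache-clearing job restoring the site.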
  81. Updating on admin server resolves the issue.

  82. Attacker never came back.

  83. Few months later customer found their own content on other

    e-commerce sites and blogs.
  84. Hypothesis • Due to the lack of sophistication of the

    scrape attempt (low skill), • And access to a very large number of compromised systems (>50k, high skill), • It’s likely the attackers rented a botnet in an effort to steal the customer's content quickly.
  85. Lessons • Heavy applications are vulnerable.

  86. Lessons • Heavy applications are vulnerable. • Sometimes staring at

    logs works. Sometimes.
  87. Lessons • Heavy applications are vulnerable. • Sometimes staring at

    logs works. Sometimes. • strace telling you nothing tells you everything.
  88. Hany Fahim Founder & CEO @iHandroid Thank you Psst… we’re

    hiring!