
LDNWebPerf October 2015 - Tudor James


Blocking Bots - Good or Bad for Innovation & Performance

London Web Performance Group

October 06, 2015



  1. Blocking Bots - Good or Bad for Innovation and Performance.

    Tudor James, London Web Performance, October 2015
  2. Profile Battling ‘Bad Bots’ for over 15 years. Worked for

    online business directories in the UK and US, where ‘data scraping’ is a particular problem. Started deploying in-house solutions at Yell.com, then engaged a security consultancy, Sentor, to help strengthen defences. Joined Distil in April of this year. Distil is a SaaS provider of bot mitigation services.
  3. What is a Bot - A Bot or Robot is

    simply an automated way of visiting a web property and mimicking human behaviour via a computer program. Bots crawl / spider to download content, or to purchase concert tickets, book restaurants or buy cheap flights before human end users get the chance.
  4. What Makes a Bot Good or Bad Much depends on

    context. Bad bots don’t obey robots.txt. They disguise themselves as humans, or even as Googlebot. They are generally written for financial gain. Monitoring software, SEO and SEM tools can all be considered good bots, along with a number of online tools for checking web performance. http://www.robotstxt.org/robotstxt.html
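A robots.txt file, per the convention linked above, is how a site tells bots what they may crawl: good bots obey it, bad bots ignore it. A minimal illustrative example (the paths and the bot name are hypothetical; `Crawl-delay` is a non-standard extension that only some crawlers honour):

```
# Applies to all bots: pause between requests, skip search result pages
User-agent: *
Crawl-delay: 10
Disallow: /search/

# A specific bad actor, banned outright (it will likely ignore this)
User-agent: BadBot
Disallow: /
```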
  5. Bad Bot Tasks Unauthorised downloading and use of data. Price

    scraping. Parasitic business models. Brute-force login attacks. Vulnerability scans. Click fraud.
  6. Bot Problem Key Facts Non-‘human’ activity on the web

    is increasing. Even ‘good’ bots could be draining your resources: Amazon was the #1 offender in 2014, and search engine crawls can index aggressively.
  7. The Growing Bad Bot Problem Cloud / DC Based Hosted

    services are getting cheaper, with free tiers available - semi-legal botnets. Docker and other virtualisation techniques mean software is portable, so a bot can simply move from one hosting provider to the next; detection is hard because the IPs are not static. Botnets: the number of compromised machines is increasing, the number of internet and broadband connections is increasing globally, and connections are getting faster. The price of zombie armies / botnets is decreasing. Detection is hard, as requests might be distributed across hundreds of machines, each making only 1 or 2 requests. VPNs and the TOR network.
  8. How Bad Bots Can Impact Performance Makes capacity planning hard.

    There is no quality assurance on bad bots, and no warning of sudden spikes in usage. They can literally cause the DevOps team to lose sleep! They can burn through your AWS budget.
  9. How do you know if you have a Bot Problem?

    Suspicious visitor activity. Slow or unresponsive pages. Skewed access logs. Unknown incoming IP ranges. Unexpected technology costs. Foreign user agents.
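Skewed access logs are the symptom that is easiest to check mechanically: count requests per client and see whether a handful of IPs dominate. A minimal sketch, assuming a combined-format access log (the function name and the documentation-range IPs in the usage below are illustrative):

```python
import re
from collections import Counter

# Each combined-format access log line begins with the client IP
LOG_LINE = re.compile(r'^(\S+)\s')

def top_clients(log_lines, n=3):
    """Count requests per client IP; one IP dominating is a bot red flag."""
    ips = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m:
            ips[m.group(1)] += 1
    return ips.most_common(n)
```

The same counting works for user agents instead of IPs, which surfaces ‘foreign user agents’ from the list above.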
  10. Detecting and Preventing Bots - Basics User agent blocks. IP

    blocks. Rate limiting. CAPTCHA. Log file analysis.
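Rate limiting, from the basics above, can be sketched as a per-IP sliding window: keep recent request timestamps per client and refuse once a threshold is crossed. The class name, limit and window below are illustrative, not recommendations:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per client IP."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have fallen outside the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: block, or serve a CAPTCHA
        q.append(now)
        return True
```

In practice this state would live in something shared like Redis rather than process memory, so that all front-end servers see the same counts.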
  11. Detecting and Preventing Bots - Advanced Honeypots Good at catching

    less sophisticated bots. Easy to implement via an HTML snippet - a hidden link that no human ever sees, but that a crawler following every link will visit: <a href="byyfwvrwrvzxvyabqxdfr.html" style="display: none;" rel="file" id="qqcfxsrq">qqazdstr</a>
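The hidden link only catches bots if the server notices who requests it. A minimal server-side sketch, assuming the honeypot path matches the snippet above (the function name and flag set are illustrative):

```python
# Any client requesting the hidden honeypot URL is assumed to be a bot,
# since a human browsing the rendered page never sees the link.
HONEYPOT_PATHS = {"/byyfwvrwrvzxvyabqxdfr.html"}

flagged_ips = set()

def inspect_request(ip, path):
    """Flag clients that follow the invisible honeypot link; return True if flagged."""
    if path in HONEYPOT_PATHS:
        flagged_ips.add(ip)
    return ip in flagged_ips
```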
  12. Detecting and Preventing Bots - Advanced Check that users of

    the site ‘behave’ like normal users: they download images, download JavaScript, execute JavaScript, and pass cookie validation. If you use Turing tests, serve the CAPTCHA only to the suspected bot, not to all users.
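Cookie validation can be sketched as: a small script on the page sets a cookie, real browsers send it back, and clients that keep making requests without it get challenged - which also delivers the ‘CAPTCHA only for the suspect’ idea above. The threshold and names here are illustrative:

```python
# A JavaScript snippet on the page sets a cookie; most crude bots
# never execute it, so their requests arrive cookieless.
SUSPECT_THRESHOLD = 5  # assumed cut-off, tune per site

cookieless_hits = {}

def classify(ip, has_js_cookie):
    """Return 'ok', or 'captcha' once an IP makes too many cookieless requests."""
    if has_js_cookie:
        cookieless_hits.pop(ip, None)  # behaves like a browser: reset
        return "ok"
    cookieless_hits[ip] = cookieless_hits.get(ip, 0) + 1
    if cookieless_hits[ip] >= SUSPECT_THRESHOLD:
        return "captcha"  # challenge only the suspected bot
    return "ok"
```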
  13. Detecting and Preventing Bots - Advanced Code Obfuscation Make sure

    to keep the writers of bots guessing: change the location and names of honeypots and JavaScript files. Use data science / big data to determine trends.
  14. Innovation from Bots Start-ups love scraping: it allows them access

    to content without partnerships, and they have people with the technical skills. Often start-ups will make much better use of the data than the scraped sites do. Some sites will not engage with small start-ups, or charge for and limit API access, so the only alternative is to write bots.
  15. How to Write a Good Bot Use a defined user agent, with a URL,

    and a defined set of IP addresses. Follow the site’s guidance for writing a bot. Use a public API where available - check the terms and conditions. Follow the robots.txt file; if there is no guidance specific to your bot, follow the rules given for Googlebot. Generally, be a good citizen: test, and take care.
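The robots.txt advice above can be followed with Python’s standard-library `urllib.robotparser`. A sketch of a well-behaved bot with a defined user agent; the bot name, URL and rules are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A defined User-Agent with a URL explaining who operates the bot
USER_AGENT = "GoodBot/1.0 (+https://example.com/bot-info)"

# Normally fetched from https://<site>/robots.txt; inlined here for the sketch
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def may_fetch(path):
    """Check the site's robots.txt rules before requesting a URL."""
    return parser.can_fetch(USER_AGENT, path)
```

A real bot would also honour any crawl delay, identify itself with `USER_AGENT` on every request, and back off on errors.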
  16. Conclusions Blocking bad bots can increase site performance, increase scale,

    reduce site slowdowns and maintain competitive advantage. It protects data and brand reputation. Bot mitigation can prevent more serious security problems (login / password theft, security vulnerabilities being detected and exploited). Blocking bad bots shouldn’t impact innovation - you should still be able to re-purpose data; just be ‘good’.