Slide 1


Blocking Bots - Good or Bad for Innovation and Performance?
Tudor James, London Web Performance, October 2015

Slide 2


Profile
- Battling 'bad bots' for over 15 years.
- Worked for online business directories in the UK and US, where data scraping is a particular problem.
- Started deploying in-house solutions at Yell.com, then involved a security consultancy, Sentor, to help strengthen defences.
- Joined Distil in April of this year. Distil is a SaaS provider of bot mitigation services.

Slide 3


What is a Bot?
- A bot, or robot, is simply a computer program that visits a web property automatically, mimicking human behaviour.
- Bots crawl / spider sites to download content, or purchase concert tickets, restaurant bookings, or cheap flights before human end users get the chance.

Slide 4


What Makes a Bot Good or Bad?
- Much depends on context.
- Bad bots don't obey robots.txt. They disguise themselves as humans, or even as Googlebot, and are generally written for financial gain.
- Monitoring software, SEO and SEM tools can all be considered good bots, along with a number of online tools for checking web performance.
- http://www.robotstxt.org/robotstxt.html
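The 'fake Googlebot' disguise is worth a note: genuine Googlebot addresses reverse-resolve to googlebot.com or google.com hostnames that resolve back to the same IP, which gives a cheap verification. A minimal sketch using only the Python standard library (my addition, not from the talk):

    import socket

    def is_real_googlebot(ip):
        """Verify a claimed Googlebot via reverse-then-forward DNS."""
        try:
            host, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except socket.herror:
            return False
        # Genuine crawler hosts live under googlebot.com or google.com.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            # Forward-confirm: the hostname must map back to the same IP.
            return socket.gethostbyname(host) == ip
        except socket.gaierror:
            return False

Any client sending a Googlebot user agent that fails this check is lying about its identity.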

Slide 5


Bad Bot Tasks
- Unauthorised downloading and use of data
- Price scraping
- Parasitic business models
- Brute-force login attacks
- Vulnerability scans
- Click fraud

Slide 6


Bot Problem: Key Facts
- Non-human activity on the web is increasing.
- Even 'good' bots could be draining your resources.
- Amazon was the #1 offender in 2014.
- Aggressive indexing by search engine crawls.

Slide 7


The Growing Bad Bot Problem
Cloud / DC based:
- Hosted services are getting cheaper; free tiers enable semi-legal botnets.
- Docker and other virtualisation techniques make bot software portable: it can simply move from one hosting provider to the next.
- Hard to detect, as the IPs are not static.
Botnets:
- The number of compromised machines is increasing.
- The number of internet and broadband connections is increasing globally, and connections are getting faster.
- The price of zombie armies / botnets is decreasing.
- Detection is hard, as requests might be distributed across hundreds of machines, each making only one or two requests.
VPNs and the Tor network add a further layer of anonymity.

Slide 8


How Bad Bots Can Impact Performance
- They make capacity planning hard.
- There is no quality assurance on bad bots, and no warning of sudden spikes in usage.
- They can literally cause the DevOps team to lose sleep!
- They can burn through your AWS budget.

Slide 9


How Do You Know If You Have a Bot Problem?
- Suspicious visitor activity
- Slow or unresponsive pages
- Skewed access logs
- Unknown incoming IP ranges
- Unexpected technology costs
- Unfamiliar ('foreign') user agents
A few minutes with your access logs will usually confirm the suspicion; see the sketch below.
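To make 'skewed access logs' concrete, here is a minimal sketch, assuming an Apache/nginx combined-format log at a hypothetical path access.log, that surfaces the busiest client IPs and user agents:

    from collections import Counter

    ip_counts = Counter()
    ua_counts = Counter()

    # Tally requests per client IP and per user agent. Assumes combined
    # log format: IP is the first field, the user agent is the last
    # double-quoted field on the line.
    with open("access.log") as log:
        for line in log:
            parts = line.split()
            if not parts:
                continue
            ip_counts[parts[0]] += 1
            if line.count('"') >= 6:
                ua_counts[line.rsplit('"', 2)[-2]] += 1

    # A handful of IPs or UAs dominating the log is a classic bot signal.
    print("Top IPs:", ip_counts.most_common(5))
    print("Top user agents:", ua_counts.most_common(5))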

Slide 10


Detecting and Preventing Bots - Basics
- User agent blocks
- IP blocks
- Rate limiting
- CAPTCHA
- Log file analysis
The first three can be combined in a few lines of middleware, as sketched below.
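A minimal, framework-agnostic sketch of the first three controls as a single check. The blocklists, window size, and request limit are illustrative values, not recommendations, and the in-memory state is a sketch only:

    import time
    from collections import defaultdict, deque

    BLOCKED_AGENTS = ("curl", "python-requests", "scrapy")  # illustrative
    BLOCKED_IPS = {"203.0.113.7"}                           # illustrative
    WINDOW_SECONDS = 60
    MAX_REQUESTS = 120  # per IP per window

    recent = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow_request(ip, user_agent):
        """Return True if the request passes UA block, IP block and rate limit."""
        if ip in BLOCKED_IPS:
            return False
        ua = (user_agent or "").lower()
        if any(bad in ua for bad in BLOCKED_AGENTS):
            return False
        # Sliding-window rate limit: drop timestamps older than the window.
        now = time.time()
        window = recent[ip]
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        window.append(now)
        return len(window) <= MAX_REQUESTS

Note the obvious limits of the basics: user agents are trivially forged, and distributed botnets sail under any per-IP rate limit, which is what motivates the advanced techniques on the next slides.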

Slide 11


Detecting and Preventing Bots - Advanced
Honeypots:
- Good at catching less sophisticated bots.
- Easy to implement via an HTML snippet; see the sketch below.
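One common honeypot pattern (a sketch with made-up route names, not the speaker's implementation): embed a link that humans can't see, and flag any client that follows it. A minimal Flask example:

    from flask import Flask, request

    app = Flask(__name__)
    suspected_bots = set()  # in production this would be persistent storage

    # The link is invisible to humans (display:none), so only clients that
    # parse the raw HTML and follow every link will ever request it.
    HONEYPOT_HTML = '<a href="/pricing-archive" style="display:none">archive</a>'

    @app.route("/")
    def index():
        return "<html><body>Welcome!" + HONEYPOT_HTML + "</body></html>"

    @app.route("/pricing-archive")
    def honeypot():
        # Any visitor here followed a link no human can see.
        suspected_bots.add(request.remote_addr)
        return "", 204

Disallowing the honeypot path in robots.txt sharpens the trap: good bots are told to stay away, so only robots.txt-ignoring bots fall in.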

Slide 12


Detecting and Preventing Bots - Advanced
Check that visitors to the site 'behave' like normal users:
- Download images
- Download JavaScript
- Execute JavaScript
- Cookie validation
If you use Turing tests, only serve the CAPTCHA to the suspected bot, not to all users. One way to wire these checks together is sketched below.
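A common way to combine the JavaScript-execution and cookie-validation checks (an assumption on my part, not necessarily Distil's mechanism): set a cookie from script and look for it on the next request; clients that never present it are escalated individually. A minimal Flask sketch:

    from flask import Flask, request

    app = Flask(__name__)

    # Served to clients without the proof cookie: a real browser runs the
    # script, sets the cookie and reloads; most simple bots do neither.
    CHALLENGE_PAGE = """<html><body><script>
      document.cookie = "js_ok=1; path=/";
      location.reload();
    </script></body></html>"""

    @app.route("/content")
    def content():
        if request.cookies.get("js_ok") == "1":
            return "Real content"
        # No cookie: suspected bot. Escalate this client only, e.g. to a
        # CAPTCHA page, instead of challenging every user.
        return CHALLENGE_PAGE

In practice the cookie would carry a signed, expiring token rather than a constant, otherwise a bot can simply replay a captured value.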

Slide 13


Detecting and Preventing Bots - Advanced
Code obfuscation:
- Make sure to keep the writers of bots guessing.
- Change the location and names of honeypots and JavaScript files; rotation can be automated, as sketched below.
Data science / big data to determine trends.
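Rotation doesn't have to be manual. A minimal sketch (the secret key and path prefix are placeholders) derives the honeypot path from the date, so it changes daily while staying deterministic for the server:

    import datetime
    import hashlib
    import hmac

    SECRET_KEY = b"rotate-me"  # placeholder; keep real keys out of source

    def honeypot_path(day=None):
        """Return today's honeypot URL path, reproducible server-side."""
        day = day or datetime.date.today().isoformat()
        digest = hmac.new(SECRET_KEY, day.encode(), hashlib.sha256).hexdigest()
        return "/assets-" + digest[:12]

    # Templates embed honeypot_path() in the hidden link each day, and the
    # request handler treats any hit on that path as a bot signal.
    print(honeypot_path())

A bot author who hard-codes today's trap URL is caught again tomorrow.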

Slide 14


Detecting and Preventing Bots - Professional
- Fingerprinting: identify the client by how it speaks, not which IP it speaks from (illustrated below).
- Network effect: a bot fingerprinted on one protected site can be blocked across all of them.
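A heavily simplified illustration of fingerprinting; commercial products also draw on TLS handshake details, JavaScript-collected browser properties, fonts and more, none of which is shown here:

    import hashlib

    def fingerprint(headers):
        """Hash stable request attributes into one identifier.

        `headers` is an ordered mapping of header name -> value, e.g. as
        received by a WSGI app. Simplified for illustration only.
        """
        parts = [
            headers.get("User-Agent", ""),
            headers.get("Accept", ""),
            headers.get("Accept-Language", ""),
            headers.get("Accept-Encoding", ""),
            # The set and order of header names is itself a signal:
            # automation stacks emit headers in characteristic patterns.
            ",".join(headers),
        ]
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

Two requests from different IPs but the same automation stack will often yield the same fingerprint, so blocks follow the bot rather than the address it happens to rent.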

Slide 15


Innovation from Bots
- Start-ups love scraping: it allows them access to content without partnerships, and they have people with the technical skills.
- Often start-ups will make much better use of the data than the scraped sites do.
- Some sites will not engage with small start-ups, or charge for and limit API access, so the only alternative is to write bots.

Slide 16


How to Write a Good Bot
- Use a defined user agent, with a URL, and a defined set of IP addresses.
- Follow the site's guidance for writing a bot.
- Use a public API where available; check the terms and conditions.
- Follow the robots.txt file; if there is no specific guidance for your bot, follow the rules for Googlebot.
- Generally: be a good citizen, test, and take care.
A minimal polite fetcher along these lines is sketched below.
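Putting those points together in Python's standard library; the bot name, contact URL, target site, and crawl delay are all placeholders:

    import time
    import urllib.request
    from urllib import robotparser

    # A good bot identifies itself and says where to learn more about it.
    USER_AGENT = "ExampleBot/1.0 (+http://example.com/bot-info)"  # placeholder
    CRAWL_DELAY = 5  # seconds between requests; err on the polite side

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()

    def polite_fetch(url):
        """Fetch a URL only if robots.txt allows it, at a gentle rate."""
        if not rp.can_fetch(USER_AGENT, url):
            return None  # respect the site's wishes
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
        time.sleep(CRAWL_DELAY)
        return body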

Slide 17


Conclusions
- Blocking bad bots can increase site performance, increase scale, reduce slowdowns, and maintain competitive advantage.
- Protect data / brand reputation.
- Bot mitigation can prevent more serious security problems (login / password theft, security vulnerabilities being discovered and exploited).
- Blocking bad bots shouldn't impact innovation: you should still be able to re-purpose data, just be 'good'.

Slide 18


Questions, Comments?
[email protected]
www.distilnetworks.com
London Web Performance, October 2015