LDNWebPerf October 2015 - Tudor James

Blocking Bots - Good or Bad for Innovation & Performance

London Web Performance Group

October 06, 2015

Transcript

  1. Blocking Bots - Good or Bad
    for Innovation and
    Performance.
    Tudor James
    London Web Performance October 2015

  2. Profile
    Battling ‘Bad Bots’ for over 15 years.
    Worked for online business directories in the UK
    and US, where ‘data scraping’ is a particular
    problem.
    Started deploying in-house solutions at Yell.com,
    then brought in a security consultancy, Sentor, to
    help strengthen defences.
    Joined Distil in April of this year. Distil is a SaaS
    provider of bot mitigation services.

  3. What is a Bot
    - A Bot or Robot is simply an automated way of
    visiting a web property and mimicking human
    behaviour via a computer program.
    - Bots crawl / spider to download content, purchase
    concert tickets, or book restaurants or cheap flights
    before human end users get the chance.

  4. What Makes a Bot Good or Bad
    Much depends on context.
    Bad bots don’t obey robots.txt. They
    disguise themselves as humans or even as
    Googlebot, and are generally written for financial gain.
    Monitoring software, SEO and SEM tools can
    all be considered good bots, along with a
    number of online tools for checking web
    performance.
    http://www.robotstxt.org/robotstxt.html

  5. Bad Bot tasks
    Unauthorised downloading and use of data
    Price scraping
    Parasitic business models
    Brute force login attacks
    Vulnerability scans
    Click fraud

  6. Bot Problem
    Key Facts
    Non-‘human’ activity on the web is increasing.
    Even ‘good’ bots could be draining your resources.
    Amazon was the #1 offender in 2014.
    Search engine crawlers can index aggressively.

  7. The Growing Bad Bot Problem
    Cloud / DC based
    Hosted services are getting cheaper.
    Free tiers available - semi-legal botnets.
    Docker and other virtualization techniques
    mean software is portable.
    Bots can simply move from one hosting provider
    to the next.
    Hard to detect as IPs are not static.
    BotNets
    Number of compromised machines is increasing.
    Number of internet and broadband connections
    is increasing globally, and connections are getting
    faster.
    Price of zombie armies / botnets is decreasing.
    Detection is hard as requests might be distributed
    across 100s of machines, each making 1 or 2
    requests.
    VPNs and the Tor network.

  8. How Bad Bots can Impact Performance
    Makes capacity planning hard.
    No quality assurance on bad bots.
    No warning of sudden spikes in usage.
    Can literally cause the DevOps team to lose sleep!
    Burns through your AWS budget.

  9. How do you know if you have a Bot Problem?
    Suspicious Visitor Activity
    Slow or Unresponsive Pages
    Skewed Access Logs
    Unknown Incoming IP Ranges
    Unexpected Technology costs
    Foreign User Agents
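
    A quick first check for several of these signs is to count requests per IP and
    per user agent in your access logs. A minimal sketch in Python, assuming a
    combined-format log at the illustrative path access.log:

      # Count requests per client IP and per user agent in a combined-format
      # access log to spot skewed traffic (path and output size are illustrative).
      import re
      from collections import Counter

      LINE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "(.*?)" \d+ \S+ "(.*?)" "(.*?)"')

      ip_counts, ua_counts = Counter(), Counter()
      with open("access.log") as log:          # hypothetical log path
          for line in log:
              match = LINE.match(line)
              if not match:
                  continue
              ip, _request, _referrer, user_agent = match.groups()
              ip_counts[ip] += 1
              ua_counts[user_agent] += 1

      print("Top IPs by request count:")
      for ip, count in ip_counts.most_common(10):
          print(f"  {ip}: {count}")
      print("Top user agents:")
      for ua, count in ua_counts.most_common(10):
          print(f"  {ua}: {count}")

    A handful of IPs or one odd user agent dominating the log is a strong hint
    that the traffic is not human.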

  10. Detecting and Preventing Bots - Basics
    User Agent Blocks
    IP Blocks
    Rate Limiting
    CAPTCHA
    Log File analysis
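
    A minimal sketch of the first three basics - user-agent blocks, IP blocks and
    rate limiting - as a single per-request check. The block lists and limits are
    illustrative; in production this usually lives at the proxy or CDN layer:

      # Combine user-agent blocks, IP blocks and a sliding-window rate limit.
      # All lists and thresholds here are illustrative.
      import time
      from collections import defaultdict, deque

      BLOCKED_AGENTS = ("curl", "python-requests", "scrapy")   # illustrative
      BLOCKED_IPS = {"203.0.113.7"}                            # illustrative
      MAX_REQUESTS = 60          # requests allowed per window
      WINDOW_SECONDS = 60

      recent = defaultdict(deque)   # ip -> timestamps of recent requests

      def allow_request(ip: str, user_agent: str) -> bool:
          """Return False if the request should be blocked or rate limited."""
          if ip in BLOCKED_IPS:
              return False
          if any(bad in user_agent.lower() for bad in BLOCKED_AGENTS):
              return False
          now = time.time()
          window = recent[ip]
          window.append(now)
          while window and now - window[0] > WINDOW_SECONDS:
              window.popleft()
          return len(window) <= MAX_REQUESTS

      print(allow_request("198.51.100.10", "Mozilla/5.0 (Windows NT 10.0)"))  # True
      print(allow_request("198.51.100.11", "python-requests/2.31"))           # False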

  11. Detecting and Preventing Bots - Advanced
    Honeypots
    Good at catching less sophisticated bots.
    Easy to implement via an HTML snippet - for example, a link hidden from
    real users that only automated clients will find and follow (the trap
    URL here is illustrative):
    <a href="/trap" id="qqcfxsrq" style="display:none">qqazdstr</a>

  12. Detecting and Preventing Bots - Advanced
    Check that users of the site ‘behave’ like ‘normal users’
    Download Images
    Download JavaScript
    Execute JavaScript
    Cookie Validation
    If you use Turing tests, only serve a CAPTCHA to the suspected bot, not to all
    users.
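
    One way to apply these checks is to score each session on the behaviours a
    real browser exhibits, and only challenge the sessions that score badly. A
    minimal sketch; the field names and threshold are illustrative:

      # Score a session on browser-like behaviour; only low scorers get a CAPTCHA.
      from dataclasses import dataclass

      @dataclass
      class Session:
          downloaded_images: bool
          downloaded_js: bool
          executed_js: bool        # e.g. a beacon fired from page JavaScript
          has_valid_cookie: bool   # e.g. a cookie set on the first response

      def behaviour_score(s: Session) -> int:
          return sum([s.downloaded_images, s.downloaded_js,
                      s.executed_js, s.has_valid_cookie])

      def needs_captcha(s: Session) -> bool:
          """Serve a CAPTCHA only to suspected bots, not to every visitor."""
          return behaviour_score(s) <= 1

      print(needs_captcha(Session(True, True, True, True)))      # False - looks human
      print(needs_captcha(Session(False, False, False, False)))  # True - looks like a bot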

  13. Detecting and Preventing Bots - Advanced
    Code obfuscation
    Make sure to keep the writers of bots guessing.
    Change the location and names of
    honeypots and JavaScript files.
    Use data science / big data to determine trends.
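
    Rotating the honeypot's location and name can be as simple as generating
    fresh random values on each deploy and templating them into both the page and
    the server-side check. A minimal sketch with illustrative names:

      # Generate a fresh honeypot path and element id so bot authors
      # cannot hard-code them (all values are illustrative).
      import secrets

      def new_honeypot():
          path = "/" + secrets.token_urlsafe(8)      # hidden trap URL
          element_id = "hp_" + secrets.token_hex(4)  # obfuscated element id
          snippet = f'<a href="{path}" id="{element_id}" style="display:none">.</a>'
          return path, snippet

      trap_path, html_snippet = new_honeypot()
      print(trap_path)
      print(html_snippet)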

  14. Detecting and Preventing Bots - Professional
    Fingerprinting
    Network Effect

  15. Innovation from Bots
    Start-ups love scraping: it allows them access to
    content without partnerships, and they have people
    with the technical skills.
    Often start-ups will make much better use of the data
    than the scraped sites do.
    Some sites will not engage with small start-ups, or
    charge for and limit API access, so the only alternative
    is to write bots.

  16. How to write a good Bot
    A defined user agent (with a URL) and a defined set of IP addresses.
    Follow the site's guidance for writing a bot.
    Use a public API where available - check the terms and conditions.
    Follow the robots.txt file; if there is no specific guidance for your bot,
    use the Googlebot rules.
    Generally, be a good citizen: test and take care.
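
    A well-behaved bot can check robots.txt before every fetch using the Python
    standard library. A minimal sketch; the crawler user agent and target site
    are illustrative:

      # Check robots.txt before fetching, identifying the crawler honestly.
      from urllib import robotparser

      USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

      parser = robotparser.RobotFileParser("https://example.com/robots.txt")
      parser.read()

      url = "https://example.com/some/page"
      if parser.can_fetch(USER_AGENT, url):
          print("Allowed to fetch", url)      # go ahead - politely and rate limited
      else:
          print("robots.txt disallows", url)  # skip it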

  17. Conclusions
    Blocking bad bots can increase site performance, increase scale, reduce
    slowdowns, maintain competitive advantage, and protect data and brand reputation.
    Bot mitigation can prevent more serious security problems (login / password
    theft, security vulnerabilities being detected and exploited).
    Blocking bad bots shouldn't impact innovation - you should still be able to
    re-purpose data, just be 'good'.

  18. Questions, Comments?
    [email protected]
    www.distilnetworks.com
    London Web Performance October 2015
