Big Data for Web Application Security

The security posture of an application is directly proportional to the amount of information that is known about the application. Although the advantages of analytics from a data science perspective are well known and well documented, the advantages of analytics from a web application security perspective are neither. How can we, as web application security practitioners, use big data stacks to improve the security posture of our applications? This talk will dive into the ways that big data analytics can be used to create effective defenses for web applications today. We'll outline the fundamental problems that can and should be solved with big data, and the classes of security mechanisms that, by their nature, cannot be built with it. Once an understanding of the domain is established, we'll explore several specific examples of how one security team uses big data every day to solve hard, interesting problems and create a safer experience for its users.

Mike Arpaia

August 01, 2013

Transcript

  1. Big Data for Web
    Application Security
    Mike Arpaia
    Kyle Barry

  3. Mike Arpaia
    Senior Software Engineer
    @mikearpaia

  4. Mike Arpaia
    Senior Software Engineer
    @mikearpaia
    Kyle Barry
    Security Engineering Manager
    @allofmywats

  5. https://www.etsy.com/listing/92868829/the-oh-my-orange-elephant-designer-wall

  6. https://www.etsy.com/listing/100312293/keep-calm-and-carry-yarn-digital

  7. http://www.etsy.com/listing/116016218/atomic-orbits-chemistry-fat-quarter

  8. https://www.etsy.com/listing/104411356/leather-iphone-44s-case-slipcover-sleeve

  9. MapReduce

  10. Disk Performance
    [Chart: disk capacity in GB by year, 1998 to 2008]

  11. Disk Performance
    [Chart: disk capacity in GB and transfer rate in GB/s by year, 1998 to 2008]

  12. Let’s add disks!
    [Chart: seconds it takes to read 1 TB of data at 1 GB/s, for 1 to 10 disks]

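The arithmetic behind the "Let's add disks!" chart can be sketched roughly as follows. This assumes each disk sustains 1 GB/s, that reads are spread evenly across disks, and that 1 TB is 1000 GB. The helper `seconds_to_read` is a hypothetical name used only for illustration.

```ruby
# Assumptions (not from the talk): 1 TB = 1000 GB, every disk reads at
# 1 GB/s, and the dataset is split evenly across all disks.
DATASET_GB = 1000.0
RATE_GB_PER_SEC = 1.0

# Hypothetical helper: seconds needed to read the dataset in parallel.
def seconds_to_read(disks, dataset_gb = DATASET_GB, rate = RATE_GB_PER_SEC)
  raise ArgumentError, 'disks must be positive' unless disks.positive?

  dataset_gb / (rate * disks)
end

# Print the read time for 1 through 10 disks.
(1..10).each do |n|
  puts format('%2d disk(s): %7.1f s', n, seconds_to_read(n))
end
```

Under these assumptions, read time falls in proportion to the number of disks, which is the point the slide makes before introducing MapReduce as a way to spread work across many machines.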

  13. Sounds good.

  14. Good for ad-hoc, whole dataset analysis
    Linearly scalable programming model

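As a rough illustration of the MapReduce programming model mentioned above, here is a minimal in-memory sketch. The log line format and the names `map_phase` and `reduce_phase` are assumptions made for this example. A real job would run the same two phases across many machines rather than in one process.

```ruby
# In-memory illustration only; not a distributed implementation.

# Map: emit one [key, 1] pair per log line. The key is assumed to be
# the second whitespace-separated field (the request path).
def map_phase(lines)
  lines.map { |line| [line.split[1], 1] }
end

# Reduce: group the pairs by key and sum the counts for each key.
def reduce_phase(pairs)
  pairs.group_by(&:first).transform_values { |group| group.sum { |_, n| n } }
end

# Hypothetical input lines.
lines = ['GET /listing/1 200', 'GET /search 200', 'GET /listing/1 404']
counts = reduce_phase(map_phase(lines))
puts counts.inspect
```

Because each line is mapped independently and each key is reduced independently, both phases can be split across workers, which is what makes the model scale with the number of machines.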

  15. Cascading

  16. Complex Workflows

  17. Fewer lines of code

  18. Minimal barrier to
    entry

  19. Etsy’s Workflow

  20. Awesome Data Team

  21. 5 steps to create a job

  22. 1. Write job

  23. 2. Run job

  24. 3. Verify job output

  25. 4. Commit job

  26. 5. Schedule job

  27. Be happy

  28. Security
    Mechanisms

  29. First, a thesis

  30. The security posture of your
    application is directly
    proportional to how much
    you know about your
    application.

  31. Reactive Security

  32. Real-time event monitoring and alerting
    Events that trigger immediate response
    You always query the same data and you
    do it often

  34. graphite

  36. Proactive Security

  37. Things we do now to protect us later
    Actions taken to prevent future
    compromise

  40. Incident Response

  41. Ad-hoc analysis of a large dataset
    Driven by an event or incident
    You’re not going to do it more than once
    Needs to be fast

  43. Gather data to create reactive security
    mechanisms
    Gather data to create proactive security
    mechanisms
    Directly create a new proactive security
    mechanism
    Perform incident response

  45. Gather data to create reactive security
    mechanisms
    Gather data to create proactive security
    mechanisms
    Directly create new proactive security
    mechanisms
    Perform incident response

  48. Reactive Security

  53. Use analytics to set
    thresholds

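One way to use analytics to set a threshold is sketched below. The talk does not say how its thresholds were derived. Using the mean plus three standard deviations is an assumption here, as are the sample counts and the name `alert_threshold`.

```ruby
# Hypothetical helper: derive an alert threshold from historical event
# counts as mean + k * (population standard deviation).
# The choice of k = 3 is an assumption, not something stated in the talk.
def alert_threshold(history, k = 3.0)
  raise ArgumentError, 'history must not be empty' if history.empty?

  mean = history.sum.to_f / history.size
  variance = history.sum { |x| (x - mean)**2 } / history.size
  mean + k * Math.sqrt(variance)
end

# Hypothetical hourly counts of some event, e.g. produced by a batch job.
history = [10, 12, 8, 10, 10]
threshold = alert_threshold(history)
puts format('alert when the hourly count exceeds %.2f', threshold)
```

The batch job supplies the history, and the resulting number can then be used by a real-time alerting system, which matches the split between whole-dataset analysis and reactive monitoring described earlier in the deck.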

  54. Proactive Security

  55. Goal
    Full-site SSL for all
    Etsy sellers

  56. analytics_cascade do
      analytics_flow do
        analytics_source 'event_logs'
        tap_db_snapshot 'users_index'
        assembly 'event_logs' do
          group_by 'user_id', 'scheme' do
            count 'value'
          end
        end
        assembly 'users_index' do
          project 'user_id', 'is_seller'
        end
        assembly 'ssl_traffic' do
          project 'user_id', 'is_seller', 'scheme', 'value'
          group_by 'is_seller', 'scheme' do
            count 'value'
          end
        end
        analytics_sink 'ssl_traffic'
      end
    end

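To make the data flow of the job above easier to follow, here is a plain-Ruby, in-memory sketch of one reading of it. The sample records are hypothetical. The join on `user_id` is an assumption, since the slide does not show how the two assemblies are combined, and `count` is assumed to count rows per group.

```ruby
# Hypothetical sample data standing in for 'event_logs' and 'users_index'.
events = [
  { user_id: 1, scheme: 'https' },
  { user_id: 1, scheme: 'https' },
  { user_id: 1, scheme: 'http' },
  { user_id: 2, scheme: 'http' },
  { user_id: 3, scheme: 'https' }
]
users = { 1 => true, 2 => true, 3 => false } # user_id => is_seller

# Step 1: count events per (user_id, scheme).
per_user = events.group_by { |e| [e[:user_id], e[:scheme]] }
                 .map { |(user_id, scheme), rows| { user_id: user_id, scheme: scheme, value: rows.size } }

# Step 2 (assumed join): attach is_seller to each row by user_id.
joined = per_user.map { |row| row.merge(is_seller: users.fetch(row[:user_id])) }

# Step 3: count rows per (is_seller, scheme).
ssl_traffic = joined.group_by { |r| [r[:is_seller], r[:scheme]] }
                    .transform_values(&:size)
puts ssl_traffic.inspect
```

The result is keyed by `[is_seller, scheme]`, which is the breakdown needed to judge how much seller activity already happens over HTTPS before enabling full-site SSL.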

  58. Incident Response

  59. Simple Patterns
    • URL Patterns
    • IP Addresses

  60. analytics_cascade do
      analytics_flow do
        analytics_source 'access_logs'
        assembly 'incident_response' do
          query_event 'timestamp', 'request_uri', 'useragent', 'ip'
          where '"/bad_url.php".equals(request_uri:string)'
          group_by 'url' do
            count 'value'
          end
        end
        analytics_sink 'incident_response'
      end
    end

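The filter-then-count pattern in the job above can be sketched in plain Ruby as follows. The records are hypothetical, and grouping by IP address is a choice made for this example (the slide groups by URL). `AccessLog` and `hits_by_ip` are illustrative names, not part of the talk's tooling.

```ruby
# Hypothetical record type mirroring the fields selected on the slide.
AccessLog = Struct.new(:timestamp, :request_uri, :useragent, :ip)

# Hypothetical records; the IP addresses come from documentation ranges.
logs = [
  AccessLog.new(1375315200, '/bad_url.php', 'curl/7.30', '192.0.2.10'),
  AccessLog.new(1375315201, '/listing/1', 'Mozilla/5.0', '198.51.100.7'),
  AccessLog.new(1375315202, '/bad_url.php', 'curl/7.30', '192.0.2.10'),
  AccessLog.new(1375315203, '/bad_url.php', 'Mozilla/5.0', '198.51.100.7')
]

# Keep only requests for the suspicious URI, then count them per IP.
def hits_by_ip(logs, bad_uri)
  logs.select { |r| r.request_uri == bad_uri }
      .group_by(&:ip)
      .transform_values(&:size)
end

result = hits_by_ip(logs, '/bad_url.php')
puts result.inspect
```

Swapping the `group_by` key for `request_uri` or `useragent` gives the other simple breakdowns an investigator might want during an incident.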
