Big Data for Web Application Security

The security posture of an application is directly proportional to the amount of information that is known about the application. Although the advantages of analytics from a data science perspective are well known and well documented, the advantages of analytics from a web application security perspective are neither. How can we, as web application security practitioners, use big data stacks to improve the security posture of our applications? This talk will dive into the ways that big data analytics can be used to create effective defenses for web applications today. We'll outline the fundamental problems that can and should be solved with big data, as well as the classes of security mechanisms that, by their nature, cannot be built with big data. Once an understanding of the domain is established, we'll explore several specific examples of how one security team uses big data every day to solve hard, interesting problems and create a safer experience for its users.

Mike Arpaia

August 01, 2013

Transcript

  1. Big Data for Web Application Security Mike Arpaia Kyle Barry

  2. None
  3. Mike Arpaia, Senior Software Engineer, @mikearpaia

  4. Mike Arpaia, Senior Software Engineer, @mikearpaia; Kyle Barry, Security Engineering Manager, @allofmywats
  5. https://www.etsy.com/listing/92868829/the-oh-my-orange-elephant-designer-wall

  6. https://www.etsy.com/listing/100312293/keep-calm-and-carry-yarn-digital

  7. http://www.etsy.com/listing/116016218/atomic-orbits-chemistry-fat-quarter

  8. https://www.etsy.com/listing/104411356/leather-iphone-44s-case-slipcover-sleeve

  9. MapReduce

  10. Disk Performance [chart: capacity in GB, 1998 to 2008]
  11. Disk Performance [chart: capacity in GB and transfer rate in GB/s, 1998 to 2008]
  12. Let’s add disks! [chart: seconds it takes to read 1 TB of data at 1 GB/s, for 1 to 10 disks]
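
The arithmetic behind the "add disks" chart can be sketched as follows. This is a minimal illustration, assuming the data is striped evenly across the disks, every disk is read in parallel at the full 1 GB/s, and 1 TB is taken as 1000 GB. The helper name `read_seconds` is hypothetical and does not come from the talk.

```ruby
# Hypothetical helper: seconds to read a dataset that is striped evenly
# across `disks` drives, all of which are read in parallel.
def read_seconds(dataset_gb, rate_gb_per_s, disks)
  dataset_gb.to_f / (rate_gb_per_s * disks)
end

DATASET_GB = 1000.0     # 1 TB in decimal units (assumption)
RATE_GB_PER_S = 1.0     # per-disk transfer rate, as stated on the slide

# Print the ideal read time for 1 through 10 disks.
(1..10).each do |n|
  puts format('%2d disks: %7.1f s', n, read_seconds(DATASET_GB, RATE_GB_PER_S, n))
end
```

Under these assumptions the read time falls as 1/n: one disk takes 1000 seconds, ten disks take 100 seconds.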
  13. Sounds good.

  14. Good for ad-hoc, whole-dataset analysis; linearly scalable programming model
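
To make the MapReduce programming model concrete, here is a minimal single-process sketch in plain Ruby. It is not Hadoop or Cascading code; it only mirrors the map, shuffle, and reduce phases. The `ip path` log line layout and all function names are assumptions made for this illustration.

```ruby
# Map phase: turn one log line into zero or more [key, value] pairs.
# The "ip path" line layout is an assumption made for this sketch.
def map_line(line)
  ip, _path = line.split(' ', 2)
  ip ? [[ip, 1]] : []
end

# Reduce phase: combine all values that share a key.
def reduce_counts(_key, values)
  values.sum
end

def map_reduce(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  grouped = pairs.group_by(&:first) # shuffle: bring equal keys together
  grouped.each_with_object({}) do |(key, kvs), out|
    out[key] = reduce_counts(key, kvs.map(&:last))
  end
end

LOG_LINES = [
  '10.0.0.1 /listing/1',
  '10.0.0.2 /cart',
  '10.0.0.1 /search'
].freeze

RESULT = map_reduce(LOG_LINES)
puts RESULT.inspect
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across machines, which is where the linear scalability comes from.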

  15. Cascading

  16. Complex Workflows

  17. Fewer lines of code

  18. Minimal barrier to entry

  19. Etsy’s Workflow

  20. Awesome Data Team

  21. 5 steps to create a job

  22. 1. Write job

  23. 2. Run job

  24. 3. Verify job output

  25. 4. Commit job

  26. 5. Schedule job

  27. Be happy

  28. Security Mechanisms

  29. First, a thesis

  30. The security posture of your application is directly proportional to how much you know about your application.
  31. Reactive Security

  32. Real-time event monitoring and alerting; events that trigger immediate response; you always query the same data and you do it often
  33. None
  34. graphite

  35. None
  36. Proactive Security

  37. Things we do now to protect us later; actions taken to prevent future compromise
  38. None
  39. None
  40. Incident Response

  41. Ad-hoc analysis of a large dataset; driven by an event or incident; you’re not going to do it more than once; needs to be fast
  42. None
  43. Gather data to create reactive security mechanisms; gather data to create proactive security mechanisms; directly create a new proactive security mechanism; perform incident response
  44. Gather data to create reactive security mechanisms; gather data to create proactive security mechanisms; directly create a new proactive security mechanism; perform incident response
  45. Gather data to create reactive security mechanisms; gather data to create proactive security mechanisms; directly create new proactive security mechanisms; perform incident response
  46. Gather data to create reactive security mechanisms; gather data to create proactive security mechanisms; directly create new proactive security mechanisms; perform incident response
  47. None
  48. Reactive Security

  49. None
  50. None
  51. None
  52. None
  53. Use analytics to set thresholds
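
One way to turn historical data into an alert threshold is sketched below. The slides do not say how the thresholds were derived, so the rule used here (mean plus three standard deviations) is an assumption chosen for illustration, and the hourly counts are made up.

```ruby
# Historical per-hour counts of some event (for example, failed logins).
# These numbers are made up for illustration.
HOURLY_COUNTS = [10, 12, 11, 9, 13, 10, 12, 11].freeze

def mean(values)
  values.sum.to_f / values.size
end

# Population standard deviation of the historical counts.
def stddev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

# Assumed rule: alert when a count exceeds mean + k standard deviations.
def alert_threshold(values, k = 3)
  mean(values) + k * stddev(values)
end

def alert?(count, threshold)
  count > threshold
end

THRESHOLD = alert_threshold(HOURLY_COUNTS)
puts format('threshold: %.2f', THRESHOLD)
puts "alert on 20 events/hour? #{alert?(20, THRESHOLD)}"
```

A batch job could recompute the threshold on a schedule and hand it to the real-time monitoring system, so the reactive alert stays cheap while the analysis runs offline.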

  54. Proactive Security

  55. Goal: full-site SSL for all Etsy sellers

  56. analytics_cascade do
        analytics_flow do
          analytics_source 'event_logs'
          tap_db_snapshot 'users_index'

          assembly 'event_logs' do
            group_by 'user_id', 'scheme' do
              count 'value'
            end
          end

          assembly 'users_index' do
            project 'user_id', 'is_seller'
          end

          assembly 'ssl_traffic' do
            project 'user_id', 'is_seller', 'scheme', 'value'
            group_by 'is_seller', 'scheme' do
              count 'value'
            end
          end

          analytics_sink 'ssl_traffic'
        end
      end
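
For readers unfamiliar with the DSL on this slide, the same aggregation can be approximated in plain Ruby over in-memory data. This is a sketch under stated assumptions: the sample rows are made up, the join between events and users on `user_id` is inferred because the slide does not show it, and the final count is interpreted as the number of users per (`is_seller`, `scheme`) pair. The function name `ssl_traffic` is borrowed from the sink name for convenience only.

```ruby
# Made-up sample data; field names follow the slide.
EVENT_LOGS = [
  { 'user_id' => 1, 'scheme' => 'https' },
  { 'user_id' => 1, 'scheme' => 'https' },
  { 'user_id' => 2, 'scheme' => 'http' },
  { 'user_id' => 3, 'scheme' => 'https' },
  { 'user_id' => 3, 'scheme' => 'http' },
  { 'user_id' => 4, 'scheme' => 'https' }
].freeze

USERS_INDEX = {
  1 => { 'is_seller' => true },
  2 => { 'is_seller' => true },
  3 => { 'is_seller' => false },
  4 => { 'is_seller' => true }
}.freeze

def ssl_traffic(event_logs, users_index)
  # Step 1: collapse events to one row per (user_id, scheme).
  per_user = event_logs.map { |e| [e['user_id'], e['scheme']] }.uniq

  # Step 2: attach is_seller (assumed join on user_id) and count
  # users per (is_seller, scheme).
  counts = Hash.new(0)
  per_user.each do |user_id, scheme|
    user = users_index[user_id]
    next unless user # skip events with no matching user row

    counts[[user['is_seller'], scheme]] += 1
  end
  counts
end

SSL_TRAFFIC = ssl_traffic(EVENT_LOGS, USERS_INDEX)
SSL_TRAFFIC.each { |(seller, scheme), n| puts "seller=#{seller} #{scheme}: #{n}" }
```

The output answers the question behind the goal: how many sellers are still seen on plain HTTP, which indicates how disruptive a forced move to full-site SSL would be.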
  57. None
  58. Incident Response

  59. Simple Patterns: • URL Patterns • IP Addresses

  60. analytics_cascade do
        analytics_flow do
          analytics_source 'access_logs'

          assembly 'incident_response' do
            query_event 'timestamp', 'request_uri', 'useragent', 'ip'
            where '"/bad_url.php".equals(request_uri:string)'
            group_by 'url' do
              count 'value'
            end
          end

          analytics_sink 'incident_response'
        end
      end
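
The same kind of ad-hoc incident query can be sketched in plain Ruby. The access log rows below are made up (the IP addresses come from the reserved documentation ranges), and `hits_by_ip` is a hypothetical helper. Unlike the slide, which groups by URL, this sketch groups the matching requests by client IP, to show how one simple pattern (a URL) leads to the other (IP addresses).

```ruby
# Made-up access log rows; field names follow the slide.
ACCESS_LOGS = [
  { 'timestamp' => 1_375_315_200, 'request_uri' => '/bad_url.php',
    'useragent' => 'curl/7.30.0', 'ip' => '203.0.113.5' },
  { 'timestamp' => 1_375_315_201, 'request_uri' => '/listing/1',
    'useragent' => 'Mozilla/5.0', 'ip' => '198.51.100.7' },
  { 'timestamp' => 1_375_315_202, 'request_uri' => '/bad_url.php',
    'useragent' => 'curl/7.30.0', 'ip' => '203.0.113.5' },
  { 'timestamp' => 1_375_315_203, 'request_uri' => '/bad_url.php',
    'useragent' => 'Mozilla/5.0', 'ip' => '198.51.100.9' }
].freeze

# Hypothetical helper: count requests for `bad_uri` per client IP.
def hits_by_ip(access_logs, bad_uri)
  access_logs
    .select { |row| row['request_uri'] == bad_uri }
    .group_by { |row| row['ip'] }
    .transform_values(&:size)
end

HITS = hits_by_ip(ACCESS_LOGS, '/bad_url.php')
HITS.each { |ip, n| puts "#{ip}: #{n} request(s)" }
```

The resulting IP list can then feed a follow-up query for everything else those addresses requested during the incident window.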
  61. None