Data Driven Web Application Security

The security posture of an application is directly proportional to the amount of information that is known about the application. How can we, as web application security practitioners, use application metrics to improve the security posture of our product? This talk explores how application data and metrics can be used to build effective defenses for web applications today. We’ll outline the fundamental classes of web application security mechanisms. Once that understanding of the domain is established, we’ll explore several specific examples of how Etsy’s security team uses metrics, analytics and big data every day to solve hard, interesting problems and create a safer experience for millions of users all over the world.

Mike Arpaia

August 30, 2013

Transcript

  1. Data Driven Web
    Application Security
    Mike Arpaia
    Kyle Barry

  5. Mike Arpaia
    Senior Software Engineer
    @mikearpaia

  6. Mike Arpaia
    Senior Software Engineer
    @mikearpaia
    Kyle Barry
    Security Engineering Manager
    @allofmywats

  7. https://www.etsy.com/listing/92868829/the-oh-my-orange-elephant-designer-wall

  8. https://www.etsy.com/listing/116016218/atomic-orbits-chemistry-fat-quarter

  9. https://www.etsy.com/listing/104411356/leather-iphone-44s-case-slipcover-sleeve

  10. Data
    Infrastructure

  11. Graphite

  13. https://github.com/etsy/dashboard

  14. https://github.com/etsy/statsd

  15. if (is_xss($request_params)) {
          StatsD::increment('security.potential_xss');
      }
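
As a rough illustration of what an increment like the one on this slide puts on the wire, the sketch below builds a StatsD counter packet and sends it over UDP. It is written in Python rather than the slide's PHP. The `is_xss` function is a hypothetical, deliberately naive stand-in for whatever detection the talk had in mind, and the address 127.0.0.1:8125 (8125 being StatsD's default port) is an assumption.

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed address; 8125 is StatsD's default port


def counter_packet(bucket, value=1):
    """Format a StatsD counter line: <bucket>:<value>|c"""
    return f"{bucket}:{value}|c".encode("ascii")


def increment(bucket, addr=STATSD_ADDR):
    """Fire-and-forget counter increment over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(counter_packet(bucket), addr)


def is_xss(request_params):
    """Hypothetical, naive check; a real detector would be far more careful."""
    return any("<script" in str(v).lower() for v in request_params.values())


def handle_request(request_params):
    """Count suspicious requests without blocking or altering them."""
    if is_xss(request_params):
        increment("security.potential_xss")
        return True
    return False
```

Because the packet goes over UDP and nothing waits for a reply, instrumenting a code path this way adds very little latency to the request being measured.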

  18. Splunk

  23. MySQL

  25. Sharded application
    data
    http://www.slideshare.net/jgoulah/the-etsy-shard-architecture-starts-with-s-and-ends-with-hard

  26. Dozens of database
    servers

  27. Hundreds of tables

  28. Postgres

  29. Legacy

  30. Hadoop

  31. MapReduce

  32. Disk Performance
    [Chart: capacity in GB by year, 1998 to 2008; y-axis 0 to 2000]

  33. Disk Performance
    [Chart: capacity in GB and transfer rate in GB/s by year, 1998 to 2008; y-axis 0 to 2000]

  34. Let’s add disks!
    [Chart: seconds it takes to read 1 TB of data at 1 GB/s; x-axis 1 to 10, y-axis 0 to 1100]
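
The arithmetic behind this chart can be sketched as follows. The sketch assumes the x-axis is the number of disks read in parallel, that 1 TB is 1000 GB, and that throughput scales perfectly with disk count; none of these is stated on the slide.

```python
def seconds_to_read(total_gb, disks, gb_per_second_per_disk=1.0):
    """Time to scan total_gb when the data is spread evenly across `disks`
    drives that are all read in parallel (idealized: no coordination overhead)."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return total_gb / (disks * gb_per_second_per_disk)


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n:2d} disk(s): {seconds_to_read(1000, n):7.1f} s")
```

Under these assumptions one disk needs 1000 seconds and ten disks need 100, which is the shape of curve that motivates spreading data across many machines.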

  35. Sounds good.

  36. Good for ad-hoc, whole dataset analysis
    Linearly scalable programming model
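
To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code. The record shape (a dict with an `event_type` key) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_event(record):
    """Map step: emit (key, 1) for each log record."""
    yield (record["event_type"], 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_event(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

Each map call sees one record and each reduce call sees one key, so the work can be split across machines without the functions knowing about each other. That independence is what lets the model scale roughly linearly as nodes are added.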

  37. MySQL data

  38. Event logs
    &
    Visit logs

  39. Cascading

  40. Complex Workflows

  41. Fewer lines of code

  42. Minimal barrier to
    entry

  43. Awesome Data Team

  44. 96 cores

  45. 384 GB of RAM

  46. 24 TB of storage

  47. ...per 2U of rack space

  48. 160 nodes
    960 TB storage
    3840 cores
    15 TB of RAM
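
The cluster totals follow from the per-2U figures on the preceding slides. The sketch below checks that arithmetic. It assumes four nodes per 2U chassis, a value inferred from the totals rather than stated in the talk, and 1024 GB per TB for RAM.

```python
PER_2U = {"cores": 96, "ram_gb": 384, "storage_tb": 24}  # figures from the slides
NODES = 160                                               # figure from the slides
NODES_PER_2U = 4  # assumption inferred from the totals, not stated in the talk


def cluster_totals(nodes=NODES, nodes_per_2u=NODES_PER_2U, per_2u=PER_2U):
    """Scale the per-chassis figures up to the whole cluster."""
    chassis = nodes // nodes_per_2u
    return {
        "chassis": chassis,
        "cores": chassis * per_2u["cores"],
        "ram_tb": chassis * per_2u["ram_gb"] / 1024,
        "storage_tb": chassis * per_2u["storage_tb"],
    }
```

With these inputs the result is 40 chassis, 3840 cores, 15 TB of RAM and 960 TB of storage, matching the slide.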

  49. Vertica

  50. Proprietary

  51. Columnar

  52. Postgres-like syntax

  53. MySQL + Postgres

  54. Fast analytics
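
One reason a columnar layout suits analytics is that an aggregate over a single field only has to scan that field's values. The toy sketch below contrasts the two layouts in plain Python. It says nothing about how Vertica is implemented, and the sample records are made up for illustration.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_bytes_row_layout(rows):
    """Row layout: walk every record to pull out a single field."""
    return sum(row["bytes"] for row in rows)


def total_bytes_column_layout(columns):
    """Column layout: scan only the one column the query needs."""
    return sum(columns["bytes"])


# Made-up sample data, for illustration only.
SAMPLE_ROWS = [
    {"user_id": 1, "scheme": "https", "bytes": 512},
    {"user_id": 2, "scheme": "http", "bytes": 2048},
    {"user_id": 3, "scheme": "https", "bytes": 128},
]
```

Both functions return the same total. The difference on a real system is how much data has to be read from disk to produce it.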

  55. Security
    Mechanisms

  56. First, a thesis

  57. The security posture of your
    application is directly
    proportional to how much
    you know about your
    application.

  58. Reactive Security

  59. Real-time event monitoring and alerting
    Events that trigger immediate response
    You always query the same data and you
    do it often

  61. Graphite

  63. Proactive Security

  64. Things we do now to protect us later
    Actions taken to prevent future
    compromise

  67. Incident Response

  68. Ad-hoc analysis of a large dataset
    Driven by an event or incident
    You’re not going to do it more than once
    Needs to be fast

  70. Gather data to create reactive security
    mechanisms
    Gather data to create proactive security
    mechanisms
    Directly create a new proactive security
    mechanism
    Perform incident response

  72. Gather data to create reactive security
    mechanisms
    Gather data to create proactive security
    mechanisms
    Directly create new proactive security
    mechanisms
    Perform incident response

  74. Case Studies

  75. Reactive Security

  77. Alerting

  81. Use analytics to set
    thresholds
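
One common way to derive an alert threshold from historical data is the mean plus a few standard deviations. The sketch below shows that approach. The talk does not say which method Etsy uses, so the formula, the default of three standard deviations and the function names are all assumptions.

```python
from statistics import mean, stdev


def alert_threshold(history, sigmas=3.0):
    """Threshold = mean + sigmas * sample standard deviation of past counts."""
    if len(history) < 2:
        raise ValueError("need at least two data points")
    return mean(history) + sigmas * stdev(history)


def should_alert(current, history, sigmas=3.0):
    """True when the current count exceeds the threshold derived from history."""
    return current > alert_threshold(history, sigmas)
```

For example, `should_alert(40, [10, 12, 11, 9, 13, 10, 12])` is true, while a count of 14 against the same history stays below the threshold. Deriving the number from data avoids hand-picked limits that either page constantly or never fire.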

  82. Reporting

  83. SuperBIT

  85. Putting it together

  87. Proactive Security

  88. Goal
    Full-site SSL for all
    Etsy sellers

  89. analytics_cascade do
        analytics_flow do
          analytics_source 'event_logs'
          tap_db_snapshot 'users_index'
          assembly 'event_logs' do
            group_by 'user_id', 'scheme' do
              count 'value'
            end
          end
          assembly 'users_index' do
            project 'user_id', 'is_seller'
          end
          assembly 'ssl_traffic' do
            project 'user_id', 'is_seller', 'scheme', 'value'
            group_by 'is_seller', 'scheme' do
              count 'value'
            end
          end
          analytics_sink 'ssl_traffic'
        end
      end
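
For readers unfamiliar with the Cascading DSL, here is one reading of what this flow computes, written as plain Python over in-memory lists. The join between the two inputs on `user_id` is an assumption, since the slide does not show it. Under this reading the final count is the number of users seen on each scheme, split by seller status, not the raw number of requests.

```python
from collections import Counter


def ssl_traffic(event_logs, users_index):
    """Count users per (is_seller, scheme).

    event_logs:  iterable of dicts with 'user_id' and 'scheme'
    users_index: iterable of dicts with 'user_id' and 'is_seller'
    """
    # Assembly 'event_logs': requests per (user_id, scheme).
    per_user = Counter((e["user_id"], e["scheme"]) for e in event_logs)

    # Assembly 'users_index': keep only user_id -> is_seller.
    is_seller = {u["user_id"]: u["is_seller"] for u in users_index}

    # Assembly 'ssl_traffic': join on user_id (assumed), then count the
    # joined rows in each (is_seller, scheme) group.
    result = Counter()
    for (user_id, scheme), _requests in per_user.items():
        if user_id in is_seller:
            result[(is_seller[user_id], scheme)] += 1
    return dict(result)
```

The output answers the question behind the goal slide: how many sellers still reach the site over plain HTTP, and how that compares with non-sellers.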

  91. Keeping current

  92. Two Factor
    Authentication

  94. Do Etsy app users
    use two factor auth?

  95. Splunk & Vertica

  96. Proactively Realtime

  97. Content Security
    Policy Violations
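
Browsers report Content Security Policy violations by POSTing a JSON document to the policy's report endpoint. The sketch below parses such report bodies and tallies them by violated directive, which is the kind of count that could then be graphed. The `csp-report` wrapper and its field names follow the CSP report format. The function names and the idea of counting by directive are assumptions, not something the talk describes.

```python
import json
from collections import Counter


def parse_report(body):
    """Extract the fields of interest from one CSP violation report body."""
    report = json.loads(body).get("csp-report", {})
    return {
        "document_uri": report.get("document-uri", ""),
        "violated_directive": report.get("violated-directive", ""),
        "blocked_uri": report.get("blocked-uri", ""),
    }


def count_by_directive(bodies):
    """Tally reports per violated directive, ready to emit as metrics."""
    return Counter(parse_report(b)["violated_directive"] for b in bodies)
```

A sudden rise in one directive's count can point either at an injection attempt or at a legitimate page change that the policy has not caught up with.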

  99. Incident Response

  100. Needle in a haystack

  101. Simple Patterns
    • URL Patterns
    • IP Addresses

  102. analytics_cascade do
         analytics_flow do
           analytics_source 'access_logs'
           assembly 'incident_response' do
             query_event 'timestamp', 'request_uri', 'useragent', 'ip'
             where '"/bad_url.php".equals(request_uri:string)'
             group_by 'url' do
               count 'value'
             end
           end
           analytics_sink 'incident_response'
         end
       end
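
The same needle-in-a-haystack query can be sketched in plain Python over in-memory records. This is an illustration of the filter-then-count pattern, not the Hadoop job itself. The sketch groups by `request_uri` by default because that is the field the flow selects, and the `group_field` parameter is a hypothetical addition.

```python
from collections import Counter


def count_matching(access_logs, bad_uri="/bad_url.php", group_field="request_uri"):
    """Keep only requests for bad_uri, then count them per group_field value."""
    hits = Counter()
    for event in access_logs:
        if event["request_uri"] == bad_uri:
            hits[event[group_field]] += 1
    return dict(hits)
```

Grouping by a different field turns the same scan into a different question. For example, `group_field="ip"` shows which addresses requested the URL.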

  103. Phishing Attack
    In Two Parts

  104. Part One

  107. Part Two

  108. source="access_logs" client_ip=10.163.2.3 | transaction request_uri
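
In rough terms, this search keeps the events from one client IP and groups them by the value of `request_uri`. The Python sketch below mimics that grouping over in-memory records. It is a simplification that does not model the time-window behaviour of Splunk's `transaction` command, and the record shape is an assumption.

```python
from collections import defaultdict


def transactions(events, client_ip, field="request_uri"):
    """Group one client's events by a shared field value, oldest first."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["client_ip"] == client_ip:
            grouped[event[field]].append(event)
    return dict(grouped)
```

Reading the grouped events in time order makes it easier to reconstruct what a single address did during an incident.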

  109. Collusion Fraud

  110. Look for patterns
    Incident Response

  111. Set up monitoring
    Be reactive

  112. Stay Aware
    Get proactive

  113. Conclusions

  114. Instrument your application at length
    Understand security mechanisms
    Use your data and use it often
