Slide 1

Slide 1 text

Big Data for Web Application Security
Mike Arpaia, Kyle Barry

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Mike Arpaia, Senior Software Engineer, @mikearpaia

Slide 4

Slide 4 text

Mike Arpaia, Senior Software Engineer, @mikearpaia
Kyle Barry, Security Engineering Manager, @allofmywats

Slide 5

Slide 5 text

https://www.etsy.com/listing/92868829/the-oh-my-orange-elephant-designer-wall

Slide 6

Slide 6 text

https://www.etsy.com/listing/100312293/keep-calm-and-carry-yarn-digital

Slide 7

Slide 7 text

http://www.etsy.com/listing/116016218/atomic-orbits-chemistry-fat-quarter

Slide 8

Slide 8 text

https://www.etsy.com/listing/104411356/leather-iphone-44s-case-slipcover-sleeve

Slide 9

Slide 9 text

MapReduce

Slide 10

Slide 10 text

Disk Performance (chart: capacity in GB, 1998 to 2008)

Slide 11

Slide 11 text

Disk Performance (chart: capacity in GB and transfer rate in GB/s, 1998 to 2008)

Slide 12

Slide 12 text

Let’s add disks! (chart: seconds it takes to read 1 TB of data at 1 GB/s, for 1 to 10 disks)
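The arithmetic behind this chart can be sketched in a few lines. This is an idealized model, assuming the data is split evenly across disks that each sustain 1 GB/s and are all read in parallel; `read_seconds` is a hypothetical helper written for illustration, not something from the talk.

```ruby
# Idealized model: data is striped evenly and all disks are read in parallel.
# read_seconds is a hypothetical helper used only for illustration.
def read_seconds(total_gb, disks, gb_per_second_per_disk = 1.0)
  raise ArgumentError, 'need at least one disk' if disks < 1

  total_gb / (disks * gb_per_second_per_disk)
end

TOTAL_GB = 1000.0 # 1 TB, using decimal units

# Print the read time for 1 through 10 disks.
(1..10).each do |disks|
  puts format('%2d disks: %7.1f s', disks, read_seconds(TOTAL_GB, disks))
end
```

Under these assumptions the read time falls as 1/N, which is the intuition MapReduce builds on: spread the data out and read it in parallel.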

Slide 13

Slide 13 text

Sounds good.

Slide 14

Slide 14 text

• Good for ad-hoc, whole dataset analysis
• Linearly scalable programming model
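To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Ruby. It is not Hadoop or Cascading code; `map_reduce`, the sample requests, and the per-IP counting job are all assumptions chosen for illustration.

```ruby
# A single-process imitation of map -> shuffle -> reduce.
# On a real cluster each phase runs in parallel across many machines.
def map_reduce(records, mapper, reducer)
  # Map: every record yields zero or more [key, value] pairs.
  pairs = records.flat_map { |record| mapper.call(record) }
  # Shuffle: bring together all pairs that share a key.
  grouped = pairs.group_by { |key, _value| key }
  # Reduce: collapse each key's values into a single result.
  grouped.map do |key, key_values|
    [key, reducer.call(key, key_values.map { |_key, value| value })]
  end.to_h
end

# Made-up request records.
requests = [
  { ip: '10.0.0.1', uri: '/' },
  { ip: '10.0.0.2', uri: '/login' },
  { ip: '10.0.0.1', uri: '/login' }
]

mapper  = ->(request) { [[request[:ip], 1]] }
reducer = ->(_ip, ones) { ones.sum }

requests_per_ip = map_reduce(requests, mapper, reducer)
puts requests_per_ip.inspect
```

Because the mapper looks at one record at a time and the reducer at one key at a time, the same job can be split across more machines as the dataset grows.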

Slide 15

Slide 15 text

Cascading

Slide 16

Slide 16 text

Complex Workflows

Slide 17

Slide 17 text

Fewer lines of code

Slide 18

Slide 18 text

Minimal barrier to entry

Slide 19

Slide 19 text

Etsy’s Workflow

Slide 20

Slide 20 text

Awesome Data Team

Slide 21

Slide 21 text

5 steps to create a job

Slide 22

Slide 22 text

1. Write job

Slide 23

Slide 23 text

2. Run job

Slide 24

Slide 24 text

3. Verify job output

Slide 25

Slide 25 text

4. Commit job

Slide 26

Slide 26 text

5. Schedule job

Slide 27

Slide 27 text

Be happy

Slide 28

Slide 28 text

Security Mechanisms

Slide 29

Slide 29 text

First, a thesis

Slide 30

Slide 30 text

The security posture of your application is directly proportional to how much you know about your application.

Slide 31

Slide 31 text

Reactive Security

Slide 32

Slide 32 text

• Real-time event monitoring and alerting
• Events that trigger immediate response
• You always query the same data and you do it often

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

graphite

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Proactive Security

Slide 37

Slide 37 text

• Things we do now to protect us later
• Actions taken to prevent future compromise

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Incident Response

Slide 41

Slide 41 text

• Ad-hoc analysis of a large dataset
• Driven by an event or incident
• You’re not going to do it more than once
• Needs to be fast

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

• Gather data to create reactive security mechanisms
• Gather data to create proactive security mechanisms
• Directly create a new proactive security mechanism
• Perform incident response

Slide 44

Slide 44 text

• Gather data to create reactive security mechanisms
• Gather data to create proactive security mechanisms
• Directly create a new proactive security mechanism
• Perform incident response

Slide 45

Slide 45 text

• Gather data to create reactive security mechanisms
• Gather data to create proactive security mechanisms
• Directly create new proactive security mechanisms
• Perform incident response

Slide 46

Slide 46 text

• Gather data to create reactive security mechanisms
• Gather data to create proactive security mechanisms
• Directly create new proactive security mechanisms
• Perform incident response

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Reactive Security

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

Use analytics to set thresholds
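One way to turn historical data into an alert threshold is sketched below. The slides do not say how the thresholds were derived, so the rule used here (mean plus three sample standard deviations) is an assumption, as are the `alert_threshold` helper and the made-up hourly counts.

```ruby
# Hypothetical helper: derive an alert threshold from historical samples
# as mean + N sample standard deviations. The rule itself is an assumption.
def alert_threshold(samples, sigmas = 3.0)
  raise ArgumentError, 'need at least two samples' if samples.size < 2

  mean = samples.sum.to_f / samples.size
  variance = samples.sum { |x| (x - mean)**2 } / (samples.size - 1)
  mean + sigmas * Math.sqrt(variance)
end

# Made-up history, e.g. failed logins per hour taken from a batch job's output.
hourly_failed_logins = [12, 15, 9, 14, 11, 13, 10, 16]

threshold = alert_threshold(hourly_failed_logins)
puts format('alert when an hour exceeds %.1f failed logins', threshold)
```

The batch job supplies the history; the resulting number can then be fed to whatever real-time alerting is already in place.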

Slide 54

Slide 54 text

Proactive Security

Slide 55

Slide 55 text

Goal Full-site SSL for all Etsy sellers

Slide 56

Slide 56 text

analytics_cascade do
  analytics_flow do
    analytics_source 'event_logs'
    tap_db_snapshot 'users_index'

    assembly 'event_logs' do
      group_by 'user_id', 'scheme' do
        count 'value'
      end
    end

    assembly 'users_index' do
      project 'user_id', 'is_seller'
    end

    assembly 'ssl_traffic' do
      project 'user_id', 'is_seller', 'scheme', 'value'
      group_by 'is_seller', 'scheme' do
        count 'value'
      end
    end

    analytics_sink 'ssl_traffic'
  end
end
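For readers unfamiliar with the DSL, the following plain-Ruby sketch walks through what this job appears to compute on a handful of made-up rows. It assumes the two assemblies are joined on `user_id` (the slide does not show the join) and that the final count tallies requests per `is_seller` and `scheme`; the sample data is invented.

```ruby
# Made-up rows standing in for the 'event_logs' source.
event_logs = [
  { user_id: 1, scheme: 'https' },
  { user_id: 1, scheme: 'https' },
  { user_id: 1, scheme: 'http' },
  { user_id: 2, scheme: 'http' },
  { user_id: 3, scheme: 'https' }
]

# Made-up stand-in for the 'users_index' snapshot: user_id => is_seller.
users_index = { 1 => true, 2 => false, 3 => true }

# Step 1: count requests per (user_id, scheme).
requests_per_user = Hash.new(0)
event_logs.each { |event| requests_per_user[[event[:user_id], event[:scheme]]] += 1 }

# Step 2: join on user_id (assumed) and roll up to (is_seller, scheme).
ssl_traffic = Hash.new(0)
requests_per_user.each do |(user_id, scheme), requests|
  next unless users_index.key?(user_id) # skip users missing from the snapshot

  ssl_traffic[[users_index[user_id], scheme]] += requests
end

ssl_traffic.each do |(is_seller, scheme), requests|
  puts "is_seller=#{is_seller} scheme=#{scheme} requests=#{requests}"
end
```

The output answers the planning question behind the goal: how much seller traffic already arrives over HTTPS and how much would have to move.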

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

Incident Response

Slide 59

Slide 59 text

Simple Patterns
• URL Patterns
• IP Addresses

Slide 60

Slide 60 text

analytics_cascade do
  analytics_flow do
    analytics_source 'access_logs'

    assembly 'incident_response' do
      query_event 'timestamp', 'request_uri', 'useragent', 'ip'
      where '"/bad_url.php".equals(request_uri:string)'
      group_by 'url' do
        count 'value'
      end
    end

    analytics_sink 'incident_response'
  end
end
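The same filter-then-count pattern can be sketched in plain Ruby on a few made-up access log rows. `count_hits` is a hypothetical helper, and grouping by `ip` rather than by URL is a variation chosen here to show who requested the suspicious path.

```ruby
# Made-up rows standing in for the 'access_logs' source
# (IP addresses come from the documentation ranges).
access_logs = [
  { timestamp: 1_350_000_000, request_uri: '/bad_url.php', useragent: 'curl/7.0', ip: '203.0.113.5' },
  { timestamp: 1_350_000_005, request_uri: '/index.php',   useragent: 'Mozilla',  ip: '198.51.100.7' },
  { timestamp: 1_350_000_009, request_uri: '/bad_url.php', useragent: 'curl/7.0', ip: '203.0.113.5' },
  { timestamp: 1_350_000_020, request_uri: '/bad_url.php', useragent: 'Mozilla',  ip: '198.51.100.9' }
]

# Hypothetical helper: keep requests for one URI, then count them by a field.
def count_hits(logs, request_uri, field)
  logs.select { |entry| entry[:request_uri] == request_uri }
      .each_with_object(Hash.new(0)) { |entry, counts| counts[entry[field]] += 1 }
end

hits_by_ip = count_hits(access_logs, '/bad_url.php', :ip)
hits_by_ip.each { |ip, hits| puts "#{ip}: #{hits}" }
```

Swapping the `field` argument for `:useragent` or `:request_uri` gives the other simple groupings an investigator is likely to want first.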

Slide 61

Slide 61 text

No content