Slide 1

Slide 1 text

FERAL CONCURRENCY CONTROL: AN EMPIRICAL INVESTIGATION OF MODERN APPLICATION INTEGRITY
PETER BAILIS, Alan Fekete, Mike Franklin, Ali Ghodsi, Joe Hellerstein, Ion Stoica
UC Berkeley and University of Sydney
SIGMOD 2015, 3 June, Melbourne, Australia

Slide 2

Slide 2 text

How do application developers use concurrency control today?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

1.) Adopted

Slide 6

Slide 6 text

2.) Idiomatic

Slide 10

Slide 10 text

3.) Opinionated

Slide 11

Slide 11 text

3.) Opinionated
“I don’t want my database to be clever! …Stored procedures and constraints [are] vile and reckless destroyers of coherence. No, Mr. Database, you can not have my business logic…you’ll have to pry [it] from my dead, cold object-oriented hands . . .”

Slide 12

Slide 12 text

How do developers use concurrency control today?

Slide 13

Slide 13 text

How do developers use concurrency control today?
1.) Adopted
2.) Idiomatic
3.) Opinionated

Slide 14

Slide 14 text

For our purposes: application server in three-tier model

Slide 15

Slide 15 text

For our purposes: application server in three-tier model
[Diagram: Web Server]

Slide 16

Slide 16 text

For our purposes: application server in three-tier model
[Diagram: Web Server, DBMS]

Slide 17

Slide 17 text

For our purposes: application server in three-tier model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]

Slide 19

Slide 19 text

For our purposes: application server in three-tier model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently

Slide 20

Slide 20 text

For our purposes: application server in three-tier model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently
So how do applications handle concurrency?

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena

Slide 23

Slide 23 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables

Slide 24

Slide 24 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Transactions: 259 total; avg. 0.13 per table

Slide 25

Slide 25 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Transactions: 259 total; avg. 0.13 per table
Transactions are not popular!

Slide 26

Slide 26 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Transactions: 259 total; avg. 0.13 per table
Transactions are not popular!
So how do apps ensure integrity?

Slide 27

Slide 27 text

Rails Integrity Mechanisms

Slide 28

Slide 28 text

Rails Integrity Mechanisms
1.) Transactions: in business logic

Slide 29

Slide 29 text

Rails Integrity Mechanisms
1.) Transactions: in business logic
2.) Validations: invariants on schema

Slide 30

Slide 30 text

Rails Integrity Mechanisms
1.) Transactions: in business logic
2.) Validations: invariants on schema
class Person < ActiveRecord::Base
  validates :name, presence: true, uniqueness: true
end

Slide 31

Slide 31 text

Rails Integrity Mechanisms
1.) Transactions: in business logic
2.) Validations: invariants on schema
class Person < ActiveRecord::Base
  validates :name, presence: true, uniqueness: true
end
3.) Associations: relational integrity

Slide 32

Slide 32 text

Rails Integrity Mechanisms
1.) Transactions: in business logic
2.) Validations: invariants on schema
class Person < ActiveRecord::Base
  validates :name, presence: true, uniqueness: true
end
3.) Associations: relational integrity
class Order < ActiveRecord::Base
  belongs_to :customer
end

Slide 33

Slide 33 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Transactions: 259 total; avg. 0.13 per table

Slide 34

Slide 34 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Validations and associations: 9986 total; avg. 5.1 per table
Transactions: 259 total; avg. 0.13 per table
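As a rough illustration of how counts like these could be gathered, the sketch below tallies transaction blocks, validation declarations, and association declarations in Ruby source using regular expressions. This is a hypothetical, deliberately crude counter (`PATTERNS` and `count_mechanisms` are assumptions, not the study's methodology): it matches text only, so it misses metaprogrammed declarations and would also count commented-out code.

```ruby
# Hypothetical regexes for the three mechanisms; purely textual matching.
PATTERNS = {
  transactions: /\btransaction\s*(?:do\b|\{)/,
  validations:  /^\s*validates?(?:_\w+)?\b/,
  associations: /^\s*(?:belongs_to|has_one|has_many|has_and_belongs_to_many)\b/
}.freeze

# Returns a Hash mapping each mechanism to its number of matches in `source`.
def count_mechanisms(source)
  PATTERNS.transform_values { |re| source.scan(re).length }
end

sample = <<~RUBY
  class Person < ActiveRecord::Base
    validates :name, presence: true, uniqueness: true
    has_many :orders
  end

  class Order < ActiveRecord::Base
    belongs_to :customer
    validates_presence_of :total
  end

  ActiveRecord::Base.transaction do
    Order.create!(total: 10)
  end
RUBY

counts = count_mechanisms(sample)
```

Running it over every model file in a corpus and summing the hashes would give totals of the kind shown on the slide, subject to the textual-matching caveats.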

Slide 35

Slide 35 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Validations and associations: 9986 total; avg. 5.1 per table
Transactions: 259 total; avg. 0.13 per table
ALTERNATIVES MORE COMMON: 37x

Slide 36

Slide 36 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Validations and associations: 9986 total; avg. 5.1 per table
Transactions: 259 total; avg. 0.13 per table
ALTERNATIVES MORE COMMON: 37x
WHY?

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

IT’S ALL ABOUT IDIOMS!

Slide 39

Slide 39 text

IT’S ALL ABOUT IDIOMS!
Validations and associations are core Rails concepts

Slide 42

Slide 42 text

Rails Integrity Mechanisms
1.) Transactions: in business logic
2.) Validations: invariants on schema
class Person < ActiveRecord::Base
  validates :name, presence: true, uniqueness: true
end
3.) Associations: relational integrity
class Order < ActiveRecord::Base
  belongs_to :customer
end

Slide 43

Slide 43 text

2.) Validations: invariants on schema
3.) Associations: relational integrity

Slide 45

Slide 45 text

2.) Validations: invariants on schema
3.) Associations: relational integrity
ARE:
- OPAQUE TO DBMS

Slide 46

Slide 46 text

2.) Validations: invariants on schema
3.) Associations: relational integrity
ARE:
- OPAQUE TO DBMS
- IMPLEMENTED IN RAILS LOGIC

Slide 47

Slide 47 text

2.) Validations: invariants on schema
3.) Associations: relational integrity
ARE:
- OPAQUE TO DBMS
- IMPLEMENTED IN RAILS LOGIC
FERAL CONCURRENCY CONTROL
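To make "opaque to the DBMS, implemented in application logic" concrete, here is a minimal plain-Ruby sketch that does not use Rails. `Table` is a hypothetical stand-in for the database that enforces nothing, and `FeralPerson#save` is a hypothetical model that, loosely in the spirit of `validates :name, uniqueness: true`, performs its check as a read followed by a write in application code. The table never learns about the rule, so any write that bypasses the model is accepted.

```ruby
# Toy stand-in for the DBMS: it stores rows and enforces no constraints.
class Table
  def initialize
    @rows = []
  end

  # Rows whose attributes equal every key/value pair in `attrs`.
  def where(attrs)
    @rows.select { |row| attrs.all? { |key, value| row[key] == value } }
  end

  def insert(row)
    @rows << row
    row
  end

  def count
    @rows.size
  end
end

# Hypothetical model: the uniqueness rule lives only in this save method.
class FeralPerson
  def initialize(table)
    @table = table
  end

  def save(name)
    return false unless @table.where(name: name).empty? # read: any existing row?
    @table.insert(name: name)                            # write: only if none was seen
    true
  end
end

people = Table.new
model = FeralPerson.new(people)
first_save  = model.save("Stu") # => true
second_save = model.save("Stu") # => false, rejected by the app-level check
people.insert(name: "Stu")      # a write that bypasses the model still succeeds
```

The design point is the split itself: correctness depends on every writer going through `save`, and on nothing changing between its read and its write.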

Slide 48

Slide 48 text

Rails developers prefer feral mechanisms

Slide 49

Slide 49 text

Rails developers prefer feral mechanisms Do they work?

Slide 50

Slide 50 text

Rails developers prefer feral mechanisms Do they work? What does this mean for databases?

Slide 52

Slide 52 text

Rails Execution Model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently

Slide 54

Slide 54 text

Rails Execution Model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently
Validations run concurrently!

Slide 55

Slide 55 text

Rails Execution Model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently
Validations run concurrently!
CAN’T WE USE TXNS?

Slide 56

Slide 56 text

Rails Execution Model
[Diagram: Web Server → Rails | Rails | Rails → DBMS]
Requests processed concurrently
Validations run concurrently!
CAN’T WE USE TXNS?
INSUFFICIENT DUE TO WEAK ISOLATION
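The race can be reproduced without any real concurrency by replaying one legal interleaving by hand. In this minimal sketch, `SaveRequest` is a hypothetical object that splits a uniqueness-validated save into its two steps: the validation (a read) and the insert (a write). Running both validations before either insert is a schedule that weak isolation levels such as Read Committed allow even if each request runs inside a transaction, because neither request's read sees the other's not-yet-committed insert.

```ruby
# Hypothetical two-step save: validate (read), then insert (write).
class SaveRequest
  attr_reader :valid

  def initialize(table, name)
    @table = table
    @name = name
  end

  # Step 1: the uniqueness validation, a read of the rows it can currently see.
  def validate
    @valid = @table.none? { |row| row[:name] == @name }
  end

  # Step 2: the insert, performed if the earlier read saw no duplicate.
  def insert
    @table << { name: @name } if @valid
  end
end

table = [] # shared table with no unique index
a = SaveRequest.new(table, "Stu")
b = SaveRequest.new(table, "Stu")

# One interleaving: both validations run before either insert.
a.validate
b.validate
a.insert
b.insert

puts table.count { |row| row[:name] == "Stu" } # prints 2
```

Both requests passed validation, yet the table now holds a duplicate: the check and the write were never atomic with respect to each other.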

Slide 57

Slide 57 text

How can we tell if a given validation will enforce integrity under concurrent execution?

Slide 58

Slide 58 text

How can we tell if a given validation will enforce integrity under concurrent execution?
Key idea: Check if validations can be violated by “merging” independent operations
ICT: Invariant Confluence Test [VLDB 2015]

Slide 59

Slide 59 text

Key idea: Check if validations can be violated by “merging” independent operations ICT: Invariant Confluence Test [VLDB 2015]

Slide 60

Slide 60 text

Key idea: Check if validations can be violated by “merging” independent operations
ICT: Invariant Confluence Test [VLDB 2015]
VALIDATION: User IDs are unique
OPERATION: Save new user
MERGE: Add both records to DB

Slide 61

Slide 61 text

Key idea: Check if validations can be violated by “merging” independent operations
ICT: Invariant Confluence Test [VLDB 2015]
VALIDATION: User IDs are unique
OPERATION: Save new user
MERGE: Add both records to DB
Starting from {}: add {Stu,ID=1} and, independently, add {Ann,ID=1}
MERGE → {{Stu,ID=1}, {Ann,ID=1}}
Validation fails!

Slide 62

Slide 62 text

Key idea: Check if validations can be violated by “merging” independent operations
ICT: Invariant Confluence Test [VLDB 2015]
VALIDATION: User IDs are positive
OPERATION: Save new user
MERGE: Add both records to DB

Slide 63

Slide 63 text

Key idea: Check if validations can be violated by “merging” independent operations
ICT: Invariant Confluence Test [VLDB 2015]
VALIDATION: User IDs are positive
OPERATION: Save new user
MERGE: Add both records to DB
Starting from {}: add {Stu,ID=1} and, independently, add {Ann,ID=1}
MERGE → {{Stu,ID=1}, {Ann,ID=1}}
Validation holds!
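The two merge examples can be checked mechanically. The sketch below assumes a database state is a `Set` of `[name, id]` pairs, each operation inserts one record, and merge is set union; `merge_preserves?`, `UNIQUE_IDS`, and `POSITIVE_IDS` are hypothetical names. It tests only one pair of divergent operations from one starting state, so it can exhibit a violation but is not a proof that an invariant is confluent in general.

```ruby
require "set"

# The two slide validations, expressed as predicates over a database state.
UNIQUE_IDS   = ->(db) { db.map { |_name, id| id }.uniq.size == db.size }
POSITIVE_IDS = ->(db) { db.all? { |_name, id| id.positive? } }

# Apply op_a and op_b independently to `start`, check each side locally,
# then check the invariant on the merged (unioned) state.
def merge_preserves?(invariant, start, op_a, op_b)
  left  = start | [op_a] # replica that applied only op_a
  right = start | [op_b] # replica that applied only op_b
  unless invariant.(left) && invariant.(right)
    raise ArgumentError, "each divergent state must be valid on its own"
  end
  invariant.(left | right) # merge = set union
end

start = Set.new
stu = ["Stu", 1]
ann = ["Ann", 1]

unique_ok   = merge_preserves?(UNIQUE_IDS, start, stu, ann)   # false: two rows with ID=1
positive_ok = merge_preserves?(POSITIVE_IDS, start, stu, ann) # true: every ID is still > 0
```

Each replica's local check passes in both cases; only the uniqueness predicate is broken by the merge, which is why it needs coordination while the positivity check does not.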

Slide 64

Slide 64 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Validations and associations: 9986 total; avg. 5.1 per table
Transactions: 259 total; avg. 0.13 per table

Slide 65

Slide 65 text

adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena
67 projects, 1.77M LoC, 1957 tables
Validations and associations: 9986 total; avg. 5.1 per table
Transactions: 259 total; avg. 0.13 per table
86.9% PASS ICT

Slide 66

Slide 66 text

Rails developers prefer feral mechanisms
Do they work?

Slide 67

Slide 67 text

Rails developers prefer feral mechanisms
Do they work?
86.9% PASS ICT

Slide 68

Slide 68 text

Rails developers prefer feral mechanisms
Do they work?
86.9% PASS ICT: Yes, safe!

Slide 69

Slide 69 text

Rails developers prefer feral mechanisms
Do they work?
86.9% PASS ICT: Yes, safe!
13.1% FAIL ICT

Slide 70

Slide 70 text

Rails developers prefer feral mechanisms
Do they work?
86.9% PASS ICT: Yes, safe!
13.1% FAIL ICT: No, unsafe!

Slide 71

Slide 71 text

Rails developers prefer feral mechanisms
Do they work?
86.9% PASS ICT: Yes, safe!
13.1% FAIL ICT: No, unsafe!
How bad are these 13.1%?

Slide 72

Slide 72 text

Uniqueness validation is broken

Slide 73

Slide 73 text

Uniqueness validation is broken (FAILS ICT)

Slide 74

Slide 74 text

Uniqueness validation is broken (FAILS ICT)
[Figure: number of duplicate records (log scale, 0 to 10^4) vs. number of Rails processes (1 to 64); series: without validation, with validation]

Slide 78

Slide 78 text

Association validation is broken

Slide 79

Slide 79 text

Association validation is broken
[Figure: number of orphaned users (log scale, 0 to 10^4) vs. number of Rails workers (1 to 64); series: without validation, with validation]

Slide 80

Slide 80 text

Association validation is broken (FAILS ICT)
[Figure: number of orphaned users (log scale, 0 to 10^4) vs. number of Rails workers (1 to 64); series: without validation, with validation]
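One way orphans can arise is sketched below by replaying a single interleaving, assuming a hypothetical schema in which each user row references a department by id and the store has no foreign key. Request A creates a user after a feral check that its department exists; request B, running in between, deletes the department and the users it can see, as an application-level cascade in the style of `dependent: :destroy` would. The department names and ids are illustrative only.

```ruby
departments = { 1 => "Eng" } # id => name; no foreign keys in this toy store
users = []

# Request A, step 1: feral association check (does the parent exist?).
parent_ok = departments.key?(1)

# Request B runs in between: cascade-delete visible users, then the parent.
users.reject! { |u| u[:department_id] == 1 }
departments.delete(1)

# Request A, step 2: insert based on its now-stale check.
users << { name: "Stu", department_id: 1 } if parent_ok

orphans = users.reject { |u| departments.key?(u[:department_id]) }
puts orphans.size # prints 1
```

Request B's cascade could not see a user that did not exist yet, and request A's check could not see a deletion that had not happened yet, so neither step alone detects the violation.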

Slide 84

Slide 84 text

Takeaway: Rails’s feral validations, though popular, do not guarantee integrity!

Slide 85

Slide 85 text

Takeaway: Rails’s feral validations, though popular, do not guarantee integrity!
Not just Rails!
ORM                     Supports feral validation?
Java Persistence API    Yes
Hibernate (Java)        Yes
CakePHP                 Yes
Laravel (PHP)           Yes
Django (Python)         Yes
Waterline (node.js)     Yes

Slide 86

Slide 86 text

Databases are providing the wrong abstractions:

Slide 87

Slide 87 text

Databases are providing the wrong abstractions: 1.) DBs should allow apps to express validations natively, while enforcing them correctly
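By contrast with a feral check, a store that owns the rule can make the check and the write a single atomic step. The sketch below is a toy: `UniqueTable` is a hypothetical class, not a real DBMS API, whose `insert` checks an in-memory index and inserts under one `Mutex`, roughly what a unique index provides. Sixteen threads race to insert the same name; exactly one succeeds regardless of scheduling.

```ruby
# Hypothetical store that enforces uniqueness on `key` itself.
class UniqueTable
  def initialize(key)
    @key = key
    @rows = []
    @index = {}
    @lock = Mutex.new
  end

  # Atomically check the index and insert. Returns true if inserted,
  # false if a row with the same key already exists.
  def insert(row)
    @lock.synchronize do
      return false if @index.key?(row[@key])
      @index[row[@key]] = true
      @rows << row
      true
    end
  end

  def size
    @lock.synchronize { @rows.size }
  end
end

table = UniqueTable.new(:name)
threads = Array.new(16) { Thread.new { table.insert({ name: "Stu" }) } }
results = threads.map(&:value) # Thread#value joins and returns the block's result
```

The application still expresses the rule (here, by naming the key), but enforcement happens where all writers meet, so no interleaving of requests can slip a duplicate past it.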

Slide 88

Slide 88 text

Databases are providing the wrong abstractions:
1.) DBs should allow apps to express validations natively, while enforcing them correctly
2.) Apps should pay the price of coordination only when strictly necessary (for the 13.1% of validations that fail ICT)
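A minimal sketch of "coordinate only when necessary", under stated assumptions: the `NEEDS_COORDINATION` table is hypothetical, marking uniqueness and association as needing coordination (both fail ICT in this talk) and treating presence and positivity checks as confluent. `coordination_plan` splits a model's validation kinds accordingly, and the hypothetical `Coordinator` takes a single global lock only when some validation requires it; unknown kinds raise `KeyError` rather than silently skipping coordination.

```ruby
# Assumed classification; only uniqueness and association are shown failing ICT in the talk.
NEEDS_COORDINATION = {
  presence:    false,
  positive:    false,
  uniqueness:  true,
  association: true
}.freeze

# Returns [kinds needing coordination, kinds checkable locally].
def coordination_plan(kinds)
  kinds.partition { |kind| NEEDS_COORDINATION.fetch(kind) }
end

# Hypothetical executor: serializes work only when a validation requires it.
class Coordinator
  def initialize
    @lock = Mutex.new
  end

  def run(kinds, &work)
    if kinds.any? { |kind| NEEDS_COORDINATION.fetch(kind) }
      @lock.synchronize(&work)
    else
      work.call
    end
  end
end

coordinated, local = coordination_plan([:presence, :uniqueness, :positive, :association])
```

A single global lock is the bluntest possible form of coordination; the point of the sketch is only the routing decision, so that confluent validations never wait on it.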

Slide 89

Slide 89 text

Databases are providing the wrong abstractions:
1.) DBs should allow apps to express validations natively, while enforcing them correctly
2.) Apps should pay the price of coordination only when strictly necessary (for the 13.1% of validations that fail ICT)
3.) DBs should facilitate portable deployment of applications

Slide 90

Slide 90 text

Databases are providing the wrong abstractions:
1.) DBs should allow apps to express validations natively, while enforcing them correctly
2.) Apps should pay the price of coordination only when strictly necessary (for the 13.1% of validations that fail ICT)
3.) DBs should facilitate portable deployment of applications
RESEARCH GOLDMINE + USERS CARE!

Slide 91

Slide 91 text

How do application developers use concurrency control today?

Slide 92

Slide 92 text

A broader opportunity

Slide 93

Slide 93 text

A broader opportunity How are our abstractions performing?

Slide 94

Slide 94 text

A broader opportunity
How are our abstractions performing?
Do people like using transactions?
Do people like using the relational model?
Do people like using XML?

Slide 95

Slide 95 text

A broader opportunity
How are our abstractions performing?
Do people like using transactions?
Do people like using the relational model?
Do people like using XML?
Open source provides a quantitative lens!

Slide 96

Slide 96 text

Open source provides a quantitative lens!

Slide 97

Slide 97 text

Open source provides a quantitative lens!
[Figure: two CDFs over the proportion of authors (0.0 to 1.0): commits authored, and validations/associations authored]
Asking creative questions is the hardest part!

Slide 98

Slide 98 text

Conclusions
• Feral validations are surprisingly common in many modern open source web applications
• By usage, many validations are safe, but a substantial fraction are not and may allow data corruption
• Providing proper RDBMS support for these validations is a pressing research challenge for the community
• Open source is an exciting, untapped resource; be bold!