
Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

pbailis
June 03, 2015

SIGMOD 2015
3 June, Melbourne, Australia

Paper URL: http://www.bailis.org/papers/feral-sigmod2015.pdf

Abstract

The rise of data-intensive “Web 2.0” Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications’ use and disuse of database concurrency control mechanisms. Specifically, we focus our study on the common use of feral, or application-level, mechanisms for maintaining database integrity, which, across a range of ORM systems, often take the form of declarative correctness criteria, or invariants. We quantitatively analyze the use of these mechanisms in a range of open source applications written using the Ruby on Rails ORM and find that feral invariants are the most popular means of ensuring integrity (and, by usage, are over 37 times more popular than transactions). We evaluate which of these feral invariants actually ensure integrity (by usage, up to 86.9%) and which, due to concurrency errors and lack of database support, may lead to data corruption (the remainder), which we experimentally quantify. In light of these findings, we present recommendations for database system designers for better supporting these modern ORM programming patterns, thus eliminating their adverse effects on application integrity.


Transcript

  1. FERAL CONCURRENCY CONTROL: AN EMPIRICAL INVESTIGATION OF MODERN APPLICATION INTEGRITY
    Peter Bailis, Alan Fekete, Mike Franklin, Ali Ghodsi, Joe Hellerstein, Ion Stoica
    UC Berkeley and University of Sydney
    SIGMOD 2015, 3 June, Melbourne, Australia
  2. “I don’t want my database to be clever! …Stored procedures and constraints [are] vile and reckless destroyers of coherence. No, Mr. Database, you can not have my business logic…you’ll have to pry [it] from my dead, cold object-oriented hands . . .”
    3.) Opinionated
  4. For our purposes: the application server in a three-tier model
    [Diagram: Web Server, three Rails processes, DBMS]
    Requests processed concurrently
    So how do applications handle concurrency?
  6. adopt-a-hydrant, alchemy_cms, amahi, bostonrb, boxroom, brevidy, browsercms, bucketwise, calagator, canvas-lms, carter, chiliproject, citizenry, comas, comfortable-mexican-sofa, communityengine, copycopter-server, danbooru, diaspora, discourse, enki, fat_free_crm, fedena, forem, fulcrum, gitlab-ci, gitlabhq, govsgo, heaven, inkwell, insoshi, jobsworth, juvia, kandan, linuxfr.org, lobsters, lovd-by-less, nimbleshop, obtvse, onebody, opal, opencongress, opengovernment, openproject, piggybak, publify, radiant, railscollab, redmine, refinerycms, ror_ecommerce, rucksack, saasy, salor-retail, selfstarter, sharetribe, skyline, spot-us, spree, sprintapp, squaresquash, sugar, teambox, tracks, tryshoppe, wallgig, zena
    67 projects; 1.77M LoC; 1,957 tables
  9. Same 67 projects (1.77M LoC; 1,957 tables)
    Transactions: 259 total; avg. 0.13 per table
    Transactions are not popular! So how do apps ensure integrity?
  12. Rails Integrity Mechanisms
    1.) Transactions: in business logic
    2.) Validations: invariants on schema
        class Person < ActiveRecord::Base
          validates :name, presence: true, uniqueness: true
        end
    3.) Associations: relational integrity
        class Order < ActiveRecord::Base
          belongs_to :customer
        end
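Validations like these run in the Rails process, not in the database. The plain-Ruby sketch below (no Rails involved; `InMemoryTable` and `save_person` are hypothetical stand-ins for a table and for `Person#save`) models a presence-and-uniqueness validation as a read followed by a separate write. Run one save at a time, it behaves as intended:

```ruby
# Hypothetical stand-in for a database table; this is not ActiveRecord.
class InMemoryTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Plays the role of the existence query a uniqueness validation issues.
  def exists?(name)
    @rows.any? { |row| row[:name] == name }
  end

  def insert(row)
    @rows << row
  end
end

# Sketch of `validates :name, presence: true, uniqueness: true` plus save:
# every check runs in application code, and the insert is a separate step.
def save_person(table, name)
  return false if name.nil? || name.strip.empty? # presence check
  return false if table.exists?(name)            # uniqueness check (read)
  table.insert({ name: name })                   # write
  true
end

people = InMemoryTable.new
puts save_person(people, "Stu") # => true
puts save_person(people, "Stu") # => false (duplicate rejected)
puts save_person(people, "")    # => false (blank name rejected)
puts people.rows.size           # => 1
```

The gap between the read and the write is what the rest of the talk is about: nothing stops another process from inserting between them.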
  16. Same 67 projects (1.77M LoC; 1,957 tables)
    Alternatives: 9,986 total; avg. 5.1 per table
    Transactions: 259 total; avg. 0.13 per table
    ALTERNATIVES MORE COMMON: 37x
    WHY?
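The per-table averages and the headline ratio follow directly from the raw counts on the slide. A quick check in Ruby (the constant names are illustrative):

```ruby
# Counts as reported on the slide for the 67-project corpus.
TABLES = 1957
TRANSACTIONS = 259
ALTERNATIVES = 9986 # feral mechanisms: validations and associations

txn_per_table = (TRANSACTIONS.to_f / TABLES).round(2)
alt_per_table = (ALTERNATIVES.to_f / TABLES).round(1)
ratio = (ALTERNATIVES.to_f / TRANSACTIONS).round(1)

puts txn_per_table # => 0.13
puts alt_per_table # => 5.1
puts ratio         # => 38.6
```

A ratio of about 38.6 is consistent with the abstract's statement that feral invariants are over 37 times more popular than transactions.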
  19. 2.) Validations: invariants on schema
    3.) Associations: relational integrity
    ARE: opaque to the DBMS; implemented in Rails logic
    FERAL CONCURRENCY CONTROL
  20. Rails Execution Model
    [Diagram: Web Server, three Rails processes, DBMS]
    Requests processed concurrently, so validations run concurrently!
    CAN’T WE USE TXNS? INSUFFICIENT DUE TO WEAK ISOLATION
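Why wrapping the validation in a transaction is not enough: under a weak isolation level, each transaction can run its uniqueness check against its own snapshot. The toy model below assumes snapshot isolation with first-committer-wins conflict detection (`ToyDB` and `Txn` are hypothetical, not a real database API, and actual default isolation levels vary by DBMS). Because the two inserts write different row IDs, neither commit detects a conflict, and both records survive:

```ruby
# Hypothetical toy model of snapshot isolation; not a real DBMS API.
class ToyDB
  attr_reader :rows

  def initialize
    @rows = {}       # committed rows: id => record
    @version = 0     # incremented on every successful commit
    @last_write = {} # id => version of the commit that last wrote it
    @next_id = 1
  end

  def begin_txn
    Txn.new(self, @rows.dup, @version) # snapshot of committed state
  end

  def allocate_id
    id = @next_id
    @next_id += 1
    id
  end

  # First-committer-wins: abort only if another transaction committed a
  # write to one of the same row ids after this snapshot was taken.
  def commit(txn)
    conflict = txn.writes.keys.any? { |id| @last_write.fetch(id, -1) > txn.start_version }
    return false if conflict
    @version += 1
    txn.writes.each do |id, record|
      @rows[id] = record
      @last_write[id] = @version
    end
    true
  end
end

class Txn
  attr_reader :writes, :start_version

  def initialize(db, snapshot, start_version)
    @db = db
    @snapshot = snapshot
    @start_version = start_version
    @writes = {}
  end

  # The validation's read sees only this transaction's snapshot.
  def name_taken?(name)
    @snapshot.values.any? { |r| r[:name] == name }
  end

  def insert(record)
    @writes[@db.allocate_id] = record
  end
end

db = ToyDB.new
t1 = db.begin_txn
t2 = db.begin_txn
# Both transactions validate against snapshots taken before either commits...
[t1, t2].each { |t| t.insert({ name: "Stu" }) unless t.name_taken?("Stu") }
# ...and both commit, because their write sets (row ids 1 and 2) do not overlap.
puts db.commit(t1) # => true
puts db.commit(t2) # => true
puts db.rows.values.count { |r| r[:name] == "Stu" } # => 2
```

This is the classic write-skew pattern: the conflict is on a predicate ("no row named Stu"), which row-level write-conflict detection does not see.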
  21. How can we tell if a given validation will enforce integrity under concurrent execution?
  22. Key idea: check if validations can be violated by “merging” independent operations
    ICT: Invariant Confluence Test [VLDB 2015]
  25. VALIDATION: User IDs are unique
    OPERATION: Save new user
    MERGE: Add both records to DB
    [Diagram: starting from the empty database {}, one request adds {Stu, ID=1} and another adds {Ann, ID=1}; merging yields {{Stu, ID=1}, {Ann, ID=1}}]
    Validation fails!
  27. VALIDATION: User IDs are positive
    OPERATION: Save new user
    MERGE: Add both records to DB
    [Diagram: starting from the empty database {}, one request adds {Stu, ID=1} and another adds {Ann, ID=1}; merging yields {{Stu, ID=1}, {Ann, ID=1}}]
    Validation holds!
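The merge test on the last two slides can be sketched in plain Ruby. This is an illustrative, example-based check with hypothetical names (`Record`, `survives_merge?`), not the ICT procedure from the VLDB 2015 paper: it applies two operations to the same valid starting state, merges the results by set union, and re-checks the invariant. A failure is a counterexample; a single pass is only evidence that the invariant may be safe.

```ruby
require "set"

# Hypothetical record type; Struct gives value equality, so Set dedups copies.
Record = Struct.new(:name, :id)

unique_ids   = ->(db) { db.map(&:id).uniq.size == db.size }
positive_ids = ->(db) { db.all? { |r| r.id > 0 } }

# Apply each operation independently to the same starting state, then merge.
def survives_merge?(invariant, initial, op_a, op_b)
  branch_a = op_a.call(initial)
  branch_b = op_b.call(initial)
  # If either branch is invalid on its own, that operation would be rejected
  # anyway, so this example says nothing about merging.
  return true unless invariant.call(branch_a) && invariant.call(branch_b)
  invariant.call(branch_a | branch_b) # merge = set union of records
end

empty = Set.new
add_stu = ->(db) { db | [Record.new("Stu", 1)] }
add_ann = ->(db) { db | [Record.new("Ann", 1)] }

puts survives_merge?(unique_ids, empty, add_stu, add_ann)   # => false (validation fails)
puts survives_merge?(positive_ids, empty, add_stu, add_ann) # => true (validation holds)
```

Uniqueness is a property of the whole table, so two individually valid branches can combine into an invalid one; positivity is checked per record, so merging valid records cannot break it.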
  29. Same 67 projects (1.77M LoC; 1,957 tables)
    Alternatives: 9,986 total; avg. 5.1 per table
    Transactions: 259 total; avg. 0.13 per table
    86.9% PASS ICT
  31. Do they work? Rails developers prefer feral mechanisms.
    86.9% PASS ICT: Yes, safe!
    13.1% FAIL ICT: No, unsafe!
    How bad are these 13.1%?
  32. Uniqueness validation is broken (FAILS ICT)
    [Figure: number of duplicate records (log scale, 0 to 10^4) vs. number of Rails processes (1 to 64), without and with validation]
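The duplicate-record experiment can be approximated with threads. In this hypothetical sketch, `concurrent_saves` runs N workers that each check whether a name exists and then insert it; a condition-variable barrier forces the worst-case interleaving in which every check finishes before any insert. Real request timing is less adversarial, so measured counts (as in the plot) depend on load, but the failure mechanism is the same:

```ruby
# Hypothetical simulation of N Rails workers saving the same name at once.
def concurrent_saves(workers, name)
  rows = []        # shared "table"
  lock = Mutex.new # makes each single read or write atomic, but not the pair
  arrived = 0
  all_checked = ConditionVariable.new

  threads = Array.new(workers) do
    Thread.new do
      # Validation step: existence check.
      taken = lock.synchronize { rows.include?(name) }
      # Barrier: wait until every worker has finished its check.
      lock.synchronize do
        arrived += 1
        all_checked.broadcast if arrived == workers
        all_checked.wait(lock) until arrived == workers
      end
      # Save step: insert if the earlier check passed.
      lock.synchronize { rows << name } unless taken
    end
  end
  threads.each(&:join)
  rows.count(name) - 1 # records beyond the first are duplicates
end

[1, 2, 4, 8].each do |n|
  puts "workers=#{n} duplicates=#{concurrent_saves(n, 'Stu')}"
end
# prints duplicates=0, 1, 3, 7 for 1, 2, 4, 8 workers
```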
  37. Association validation is broken (FAILS ICT)
    [Figure: number of orphaned users (log scale, 0 to 10^4) vs. number of Rails workers (1 to 64), without and with validation]
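Orphaned users arise from the same check-then-act pattern applied to an association. The hypothetical interleaving below assumes each user must belong to an existing department: request B validates that the department exists; request A then deletes the department along with any users it can currently see in it (none yet); finally B inserts the user its validation already approved. A Hash and an Array stand in for the two tables:

```ruby
# Hypothetical two-request interleaving; Hash/Array stand in for two tables.
departments = { 1 => "Eng" }
users = []

# Step 1, request B: the association validation reads the parent row.
b_valid = departments.key?(1)

# Step 2, request A: application-level cascade. It collects the users it can
# see in department 1 (none yet), removes them, then removes the department.
doomed = users.select { |u| u[:department_id] == 1 }
users -= doomed
departments.delete(1)

# Step 3, request B: inserts the user its earlier check approved.
users << { name: "Stu", department_id: 1 } if b_valid

orphans = users.reject { |u| departments.key?(u[:department_id]) }
puts orphans.size # => 1
```

A foreign-key constraint enforced by the database would instead reject B's insert in step 3.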
  41. Takeaway: Rails’s feral validations, though popular, do not guarantee integrity!
    Not just Rails!
    ORM                      Supports feral validation?
    Java Persistence API     Yes
    Hibernate (Java)         Yes
    CakePHP                  Yes
    Laravel (PHP)            Yes
    Django (Python)          Yes
    Waterline (node.js)      Yes
  45. Databases are providing the wrong abstractions:
    1.) DBs should allow apps to express validations natively, while enforcing them correctly
    2.) Apps should pay the price of coordination only when strictly necessary (13.1% of the time, by usage)
    3.) DBs should facilitate portable deployment of applications
    USERS CARE! + RESEARCH GOLDMINE
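One way to read points 1.) and 2.) together: the store enforces each invariant itself and coordinates only for the invariants that fail ICT. In this hypothetical sketch, `UniqueStore` checks presence without any lock (a per-record invariant that survives merging) but performs the uniqueness check and the insert as one atomic step under a mutex, roughly what a unique index provides inside a real database. Sixteen concurrent saves of the same name then yield exactly one record:

```ruby
# Hypothetical store that enforces its own invariants, coordinating only
# where the merge test says it must.
class UniqueStore
  def initialize
    @names = {}
    @lock = Mutex.new
  end

  def insert(name)
    # Presence: per-record, safe under merging, so no coordination needed.
    return false if name.nil? || name.empty?
    # Uniqueness: fails the merge test, so check and insert are one atomic step.
    @lock.synchronize do
      return false if @names.key?(name)
      @names[name] = true
    end
    true
  end

  def count
    @lock.synchronize { @names.size }
  end
end

store = UniqueStore.new
results = Array.new(16) { Thread.new { store.insert("Stu") } }.map(&:value)
puts results.count(true) # => 1
puts store.count         # => 1
```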
  47. A broader opportunity: how are our abstractions performing?
    Do people like using transactions? Do people like using the relational model? Do people like using XML?
    Open source provides a quantitative lens!
  48. Open source provides a quantitative lens!
    [Figure: CDFs of commits authored and of validations/associations authored, each plotted against proportion of authors (both axes 0.0 to 1.0)]
    Asking creative questions is the hardest part!
  49. Conclusions
    • Feral validations are surprisingly common in many modern open source web applications
    • By usage, many validations are safe, but a substantial fraction are not and may allow data corruption
    • Providing proper RDBMS support for these validations is a pressing research challenge for the community
    • Open source is an exciting, untapped resource; be bold!