Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity

pbailis
June 03, 2015

SIGMOD 2015
3 June, Melbourne, Australia

Paper URL: http://www.bailis.org/papers/feral-sigmod2015.pdf

Abstract

The rise of data-intensive “Web 2.0” Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications’ use and disuse of database concurrency control mechanisms. Specifically, we focus our study on the common use of feral, or application-level, mechanisms for maintaining database integrity, which, across a range of ORM systems, often take the form of declarative correctness criteria, or invariants. We quantitatively analyze the use of these mechanisms in a range of open source applications written using the Ruby on Rails ORM and find that feral invariants are the most popular means of ensuring integrity (and, by usage, are over 37 times more popular than transactions). We evaluate which of these feral invariants actually ensure integrity (by usage, up to 86.9%) and which, due to concurrency errors and lack of database support, may lead to data corruption (the remainder), which we experimentally quantify. In light of these findings, we present recommendations for database system designers for better supporting these modern ORM programming patterns, thus eliminating their adverse effects on application integrity.

Transcript

  1. FERAL CONCURRENCY CONTROL: AN EMPIRICAL INVESTIGATION OF MODERN APPLICATION INTEGRITY. Peter Bailis, Alan Fekete, Mike Franklin, Ali Ghodsi, Joe Hellerstein, Ion Stoica. UC Berkeley and University of Sydney. SIGMOD 2015, 3 June, Melbourne, Australia
  2. How do application developers use concurrency control today?

  3. None
  4. 1.) Adopted

  6. 2.) Idiomatic

  10. 3.) Opinionated

  11. 3.) Opinionated: “I don’t want my database to be clever! …Stored procedures and constraints [are] vile and reckless destroyers of coherence. No, Mr. Database, you can not have my business logic…you’ll have to pry [it] from my dead, cold object-oriented hands . . .”
  12. How do developers use concurrency control today?

  13. How do developers use concurrency control today? 1.) Adopted 2.) Idiomatic 3.) Opinionated
  20. For our purposes: application server in three-tier model. [Diagram: Web Server, three Rails instances, DBMS] Requests processed concurrently. So how do applications handle concurrency?
  21. None
  23. adopt-a-hydrant alchemy_cms amahi bostonrb boxroom brevidy browsercms bucketwise calagator canvas-lms carter chiliproject citizenry comas comfortable-mexican-sofa communityengine copycopter-server danbooru diaspora discourse enki fat_free_crm fedena forem fulcrum gitlab-ci gitlabhq govsgo heaven inkwell insoshi jobsworth juvia kandan linuxfr.org lobsters lovd-by-less nimbleshop obtvse onebody opal opencongress opengovernment openproject piggybak publify radiant railscollab redmine refinerycms ror_ecommerce rucksack saasy salor-retail selfstarter sharetribe skyline spot-us spree sprintapp squaresquash sugar teambox tracks tryshoppe wallgig zena

    67 projects, 1.77M LoC, 1957 tables
  26. Same project corpus as slide 23 (67 projects, 1.77M LoC, 1957 tables). Transactions: 259 total; avg. 0.13 per table. Transactions are not popular! So how do apps ensure integrity?
  32. Rails Integrity Mechanisms

    1.) Transactions: in business logic

    2.) Validations: invariants on schema

        class Person < ActiveRecord::Base
          validates :name, presence: true, uniqueness: true
        end

    3.) Associations: relational integrity

        class Order < ActiveRecord::Base
          belongs_to :customer
        end
  36. Same project corpus as slide 23 (67 projects, 1.77M LoC, 1957 tables). Validations and associations: 9986 total; avg. 5.1 per table. Transactions: 259 total; avg. 0.13 per table. ALTERNATIVES 37x MORE COMMON. WHY?
  37. None
  41. IT’S ALL ABOUT IDIOMS! Validations and associations are core Rails concepts
  47. 2.) Validations (invariants on schema) and 3.) Associations (relational integrity) ARE:

    - OPAQUE TO DBMS
    - IMPLEMENTED IN RAILS LOGIC

    FERAL CONCURRENCY CONTROL
  51. Rails developers prefer feral mechanisms. Do they work? What does this mean for databases?
  56. Rails Execution Model. [Diagram: Web Server, three Rails instances, DBMS] Requests processed concurrently. Validations run concurrently! CAN’T WE USE TXNS? INSUFFICIENT DUE TO WEAK ISOLATION
  57. How can we tell if a given validation will enforce integrity under concurrent execution?

  59. ICT: Invariant Confluence Test [VLDB 2015]. Key idea: Check if validations can be violated by “merging” independent operations

  61. VALIDATION: User IDs are unique. OPERATION: Save new user. MERGE: Add both records to DB. Starting from {}, one operation adds {Stu,ID=1} and the other adds {Ann,ID=1}; MERGE yields {{Stu,ID=1}, {Ann,ID=1}}. Validation fails!

  63. VALIDATION: User IDs are positive. OPERATION: Save new user. MERGE: Add both records to DB. Starting from {}, one operation adds {Stu,ID=1} and the other adds {Ann,ID=1}; MERGE yields {{Stu,ID=1}, {Ann,ID=1}}. Validation holds!
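
The merge check on slides 58 to 63 can be sketched in a few lines of plain Ruby. This is a toy model under stated assumptions, not the formal test from the cited VLDB 2015 work: a database state is an array of hashes, an invariant is a lambda over the whole state, and the helper names `merge` and `survives_merge` are hypothetical.

```ruby
# Toy model of the merge check: a database state is an array of record hashes.
# Helper names are illustrative only and do not come from the talk.

# Merge two states that diverged from a common base by adding both sides' new records.
def merge(base, left, right)
  (base + (left - base) + (right - base)).uniq
end

# True when each branch satisfies the invariant on its own
# and the merged state still satisfies it.
def survives_merge(invariant, base, left, right)
  unless invariant.call(left) && invariant.call(right)
    raise ArgumentError, "each branch must be valid before merging"
  end
  invariant.call(merge(base, left, right))
end

unique_ids   = ->(db) { db.map { |r| r[:id] }.uniq.size == db.size }
positive_ids = ->(db) { db.all? { |r| r[:id] > 0 } }

base  = []
left  = base + [{ name: "Stu", id: 1 }]  # one operation saves Stu
right = base + [{ name: "Ann", id: 1 }]  # an independent operation saves Ann

unique_ok   = survives_merge(unique_ids, base, left, right)
positive_ok = survives_merge(positive_ids, base, left, right)

puts "uniqueness survives merge: #{unique_ok}"
puts "positivity survives merge: #{positive_ok}"
```

Each branch is valid in isolation; only the uniqueness invariant breaks once both records land in the same state, which mirrors the "Validation fails!" and "Validation holds!" outcomes on the slides.
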
  65. Same project corpus as slide 23 (67 projects, 1.77M LoC, 1957 tables). Validations and associations: 9986 total; avg. 5.1 per table. Transactions: 259 total; avg. 0.13 per table. 86.9% PASS ICT
  71. Rails developers prefer feral mechanisms. Do they work? 86.9% PASS ICT: Yes, safe! 13.1% FAIL ICT: No, unsafe! How bad are these 13.1%?
  77. Uniqueness validation is broken. FAILS ICT. [Figure: Number of Duplicate Records (log scale, ticks 0 and 10^0 to 10^4) vs. Number of Rails Processes (1, 2, 4, 8, 16, 32, 64); series: Without validation, With validation]
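
A minimal sketch of how a feral uniqueness check can admit duplicates, assuming the validation is a read followed by a separate write and the table has no database-side unique constraint. The in-memory array, the method names, and the scripted interleaving are all hypothetical stand-ins for concurrent Rails processes; this is not the experiment behind the figure.

```ruby
# Toy table with no unique index: it accepts whatever is inserted.
# Method names are illustrative; the interleaving is scripted, not truly concurrent.

# The application-level check: "is there already a row with this name?"
def name_free(table, name)
  table.none? { |row| row[:name] == name }
end

def insert_row(table, name)
  table << { name: name }
end

table = []

# Two processes handle "create user Stu" at the same time.
# Both run their check before either one writes.
ok_a = name_free(table, "Stu")
ok_b = name_free(table, "Stu")

# Both checks passed against the same empty table, so both writes go ahead.
insert_row(table, "Stu") if ok_a
insert_row(table, "Stu") if ok_b

duplicates = table.count { |row| row[:name] == "Stu" } - 1
puts "duplicate records: #{duplicates}"
```

Running the two requests one after the other would let the second check see the first write and reject the save; the anomaly depends on the checks overlapping.
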
  83. Association validation is broken. FAILS ICT. [Figure: Number of Orphaned Users (log scale, ticks 0 and 10^0 to 10^4) vs. Number of Rails Workers (1, 2, 4, 8, 16, 32, 64); series: Without validation, With validation]
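
The orphaned-record anomaly can be sketched the same way. The schema here (users that reference a department) is an assumption made for illustration, as are the variable names; the interleaving is scripted by hand and there is no database-side foreign key.

```ruby
# Toy schema (assumed for illustration): users reference departments by id.
# No foreign key constraint exists, so nothing stops a dangling reference.
departments = { 1 => "Engineering" }
users = []

# Worker A: application-level association check before saving a new user.
dept_exists = departments.key?(1)

# Worker B: deletes the department and the users that belong to it right now.
users.reject! { |u| u[:department_id] == 1 }
departments.delete(1)

# Worker A: the earlier check passed, so the insert proceeds.
users << { name: "Stu", department_id: 1 } if dept_exists

# An orphan is a user whose department no longer exists.
orphans = users.reject { |u| departments.key?(u[:department_id]) }
puts "orphaned users: #{orphans.size}"
```
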
  85. Takeaway: Rails’s feral validations, though popular, do not guarantee integrity! Not just Rails!

    ORM                     Supports Feral Validation?
    Java Persistence API    Yes
    Hibernate (Java)        Yes
    CakePHP                 Yes
    Laravel (PHP)           Yes
    Django (Python)         Yes
    Waterline (node.js)     Yes
  90. Databases are providing the wrong abstractions:

    1.) DBs should allow apps to express validations natively, while enforcing them correctly
    2.) Apps should pay the price of coordination only when strictly necessary (13.1% of time)
    3.) DBs should facilitate portable deployment of applications

    RESEARCH GOLDMINE + USERS CARE!
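
One way to read recommendation 1.) is that the check and the write should happen as a single atomic step inside the data store rather than as two separate steps in the application. The sketch below illustrates that idea only: `make_store` and `insert_unique` are hypothetical names, and a Ruby `Mutex` stands in for whatever mechanism a real database would use.

```ruby
# Toy store that enforces uniqueness itself: the check and the write
# happen together under one lock. All names here are illustrative.
def make_store
  { rows: [], lock: Mutex.new }
end

# Returns true if the name was inserted, false if it was already present.
def insert_unique(store, name)
  store[:lock].synchronize do
    return false if store[:rows].include?(name)
    store[:rows] << name
    true
  end
end

store = make_store

# Eight concurrent writers all try to save the same name.
threads  = Array.new(8) { Thread.new { insert_unique(store, "Stu") } }
results  = threads.map(&:value)
accepted = results.count(true)

puts "accepted writes: #{accepted}, rows stored: #{store[:rows].size}"
```

Only the uniqueness check pays for the lock here, which is in the spirit of recommendation 2.): invariants that pass the ICT would need no such coordination.
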
  91. How do application developers use concurrency control today?

  95. A broader opportunity: How are our abstractions performing? Do people like using transactions? Do people like using the relational model? Do people like using XML? Open source provides a quantitative lens!
  97. Open source provides a quantitative lens! [Figure: Commits Authored (CDF) and Valid/Assoc Authored (CDF) vs. Proportion Authors, all axes 0.0 to 1.0] Asking creative questions is the hardest part!
  98. Conclusions

    • Feral validations are surprisingly common in many modern open source web applications
    • By usage, many validations are safe, but a substantial fraction are not and may allow data corruption
    • Providing proper RDBMS support for these validations is a pressing research challenge for the community
    • Open source is an exciting, untapped resource; be bold!