Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NoSQL in SQL: Getting to Have It All

NoSQL in SQL: Getting to Have It All

Three techniques that work together to make your Ruby on Rails application use a relational database with much of the power and expressivity as a NoSQL store, while maintaining the robustness, operational flexibility, and robust ecosystem of a relational database. Given at SFRails, May 15, 2014: http://www.meetup.com/SFRails/events/167606272/.

Andrew Geweke

May 15, 2014
Tweet

Other Decks in Technology

Transcript

  1. NoSQL in SQL: Getting to Have It All Andrew Geweke

    SFRails • May 15, 2014 a.k.a. ageweke’s guide to a totally fun, crazy-scalable relational database
  2. Who Am I? Started with Rails in 2005 Rails 0.13.0,

    80KBytes of sheer awesomeness
 (Rails 4.1.0 is 17x bigger) Architecture, Code, Operations, Management at: 2009–2011 Dir. Eng, Scribd (top-100 site, 50+ pages/second) 2011–2013 VP Product/Eng, Couchsurfing (≥7M users) 2014–now Swiftype (Search as a Service) My Interests: Rails “At Large” — challenges of:
 lots of developers, large codebase, high complexity These techniques proven at ≥100 million row scale
  3. Who Is This Talk For? Anyone involved with the core

    data layer in your app Wherever you store key business data Not about “Big Data” for analytics or data science Apps can be just about anything: Traditional page-oriented Web apps JavaScript-driven single-page apps Native mobile apps Both brand-new and grizzled old applications
  4. Core Business Data Users Accounts Permissions Email Preferences A/B Test

    Assignments Blocked Users Web Logs Behavioral
 Analysis Audit Trails Objects ≥ 10MB Graph Data
  5. Where Do We Put This Stuff? RDBMS ✓ ACID Transactions

    ✓ Operational Flexibility ✓ Robust Ecosystem ✓ Efficient Storage NoSQL ╳ Limited Data Types ╳ Heavyweight Schema ╳ Scalability Concerns? ╳ Transactions? ╳ Deployment/Support? ╳ We’re All Learning ╳ Less-Efficient Storage ✓ Flexible Data Types ✓ No Migrations! ✓ Better Scaling (?)
  6. Our Users Table Column Name Type Column Name Type email_address

    VARCHAR(255) latitude DOUBLE username VARCHAR(30) longitude DOUBLE password_hash CHAR(60) email_verified BOOLEAN locale VARCHAR(8) deleted BOOLEAN gender TINYINT send_newsletter BOOLEAN full_name VARCHAR(255) forum_level TINYINT anonymous BOOLEAN facebook_oauth VARCHAR(255) email_visible BOOLEAN stripe_token VARCHAR(255) tos_date DATE created_at TIMESTAMP date_of_birth DATE updated_at TIMESTAMP phone VARCHAR(30)
  7. What Was Once Heresy… There’s no reason not to store

    data this way If you’re careful And have the right tools
  8. Being Careful Means: Only storing data that… You won’t ever

    need to query on You only ever need for one row at a time Never, ever storing defaults Defaults should usually be in code, for lots of good reasons
  9. Columns for a JSON Store Column Name Type Column Name

    Type email_address VARCHAR(255) latitude DOUBLE username VARCHAR(30) longitude DOUBLE password_hash CHAR(60) email_verified BOOLEAN locale VARCHAR(8) deleted BOOLEAN gender TINYINT send_newsletter BOOLEAN full_name VARCHAR(255) forum_level TINYINT anonymous BOOLEAN facebook_oauth VARCHAR(255) email_visible BOOLEAN stripe_token VARCHAR(255) tos_date DATE created_at TIMESTAMP date_of_birth DATE updated_at TIMESTAMP phone VARCHAR(30)
  10. Let’s Make It Happen! class AddJsonToUsers < ActiveRecord::Migration def up

    add_column :users, :json_store, :text remove_column :users, :password_hash remove_column :users, :locale remove_column :users, :phone remove_column :users, :facebook_oauth_token remove_column :users, :stripe_customer_token end end ! class User < ActiveRecord::Base store :json_store, :accessors => [ :password_hash, :locale, :phone, :facebook_oauth_token, :stripe_customer_token ], :coder => JSON end Tah-Dah! …except not quite.
  11. What Have We Really Done? Removed five standard columns Added

    one JSON store with their contents +85 bytes of JSON overhead Mostly “column names” like facebook_oauth_token Loaded every time you retrieve users for any reason This is not a win.
  12. Let’s Try Again. class AddJsonTableForUsers < ActiveRecord::Migration def up create_table

    :user_details, :primary_key => :user_id do |t| t.text :json_store end ! remove_column :users, :password_hash remove_column :users, :locale remove_column :users, :phone remove_column :users, :facebook_oauth_token remove_column :users, :stripe_customer_token end end
  13. Let’s Try Again. Now, we only load the JSON store

    for (e.g.) the
 currently-logged-in user, instead of on every User find. require 'flex_columns' ! class UserDetail < ActiveRecord::Base flex_column :json_store do field :password_hash field :locale field :phone field :facebook_oauth_token field :stripe_customer_token end end ! class User < ActiveRecord::Base has_one :user_detail include_flex_columns_from :user_detail end
  14. Let’s Try Again. require 'flex_columns' ! class UserDetail < ActiveRecord::Base

    flex_column :json_store do field :password_hash, :json => :pw field :locale, :json => :l field :phone, :json => :p field :facebook_oauth_token, :json => :fb field :stripe_customer_token, :json => :st end end ! class User < ActiveRecord::Base has_one :user_detail include_flex_columns_from :user_detail end 67% less JSON overhead — same code.
  15. More flex_columns Store JSON in a separate table Short JSON

    field names for less overhead Transparent GZip compression (optional) Override accessor methods Full validation support Remove no-longer-used fields from JSON “Retire” old fields so you don’t accidentally reuse them Supports Rails 3.0–4.1, Ruby 1.8–2.1 + JRuby,
 MySQL, PostgreSQL, SQLite Industrial-strength version of ActiveRecord::Store.
  16. cardinality (n.) The number of distinct values a column in

    the database stores. High cardinality: username, email address, first name Low cardinality: deleted, gender, admin level
 “enum types” Why use an entire column for a 1-bit flag?
  17. Low-Cardinality Columns Column Name Type Column Name Type email_address VARCHAR(255)

    latitude DOUBLE username VARCHAR(30) longitude DOUBLE password_hash CHAR(60) email_verified BOOLEAN locale VARCHAR(8) deleted BOOLEAN gender TINYINT send_newsletter BOOLEAN full_name VARCHAR(255) forum_level TINYINT anonymous BOOLEAN facebook_oauth VARCHAR(255) email_visible BOOLEAN stripe_token VARCHAR(255) tos_date DATE created_at TIMESTAMP date_of_birth DATE updated_at TIMESTAMP phone VARCHAR(30)
  18. Moving To low_card_tables class AddUserStatuses < ActiveRecord::Migration def up create_table

    :user_statuses, :low_card => true do t.string :gender t.boolean :anonymous t.boolean :email_visible t.date :tos_date t.boolean :email_verified t.boolean :deleted t.boolean :send_newsletter t.string :forum_level end end end ! class UserStatus < ActiveRecord::Base is_low_card_table end ! class User < ActiveRecord::Base has_low_card_table :status end
  19. Looking At The Data id gender anonymous forum_level email_verified send_newsletter

    1 female 0 beginner FALSE FALSE 2 female 0 beginner TRUE FALSE 3 female 0 beginner TRUE TRUE 4 male 0 beginner FALSE FALSE user_id username full_name user_status_id 1 mschultz@gmail Mary Schultz 3 2 fbonana@yahoo. Friday Bonana 2 3 alison.b@faceb Alison Bateman 3 4 mstrauss@gmail Mark Strauss 4 . . . . . .
  20. What’s Going On Here? Low-cardinality table gets one row for

    each distinct combination of values, auto-assigned to parent table Rows are generated on-demand, only for combinations used Entire table is cached in memory — zero extra queries We can use friendly strings, not integers we have to remember Vastly more straightforward from a SQL prompt Indexes become useful again!
  21. Using It From Rails my_user = User.find(12345) my_user.deleted? # =>

    false my_user.gender = :male my_user.forum_level # => 'wise_veteran' ! User.where(:deleted => false) ! class User < ActiveRecord::Base validates :forum_level, :in => %w{beginner in_training wise_veteran} ! default_scope { where(:deleted => false) } ! def self.anonymous_newsletter_recipients where(:anonymous => true, :send_newsletter => true) end end ! class UserStatus < ActiveRecord::Base validates :tos_date, :in => ALLOWED_TOS_DATES end Exactly like you’d expect.
  22. More low_card_tables Validations work on either the low-card model or

    the owning model Full bulk support Look up new low-card values en masse Update them easily using activerecord-import, etc. API for partial queries (“which IDs satisfy this predicate?”), etc. ActiveSupport::Notifications support for logging/debugging Supports Rails 3.0–4.1, Ruby 1.8–2.1 + JRuby,
 MySQL, PostgreSQL, SQLite
  23. What Have We Done? users email_address VARCHAR(255) username VARCHAR(30) full_name

    VARCHAR(255) date_of_birth DATE phone VARCHAR(30) latitude DOUBLE longitude DOUBLE user_status_id INTEGER created_at TIMESTAMP updated_at TIMESTAMP user_details user_id INTEGER json_store TEXT user_statuses id INTEGER gender VARCHAR(20) anonymous BOOLEAN email_visible BOOLEAN tos_date DATE email_verified BOOLEAN deleted BOOLEAN send_newsletter BOOLEAN forum_level VARCHAR(50)
  24. What Have We Done? users email_address VARCHAR(255) username VARCHAR(30) full_name

    VARCHAR(255) date_of_birth DATE phone VARCHAR(30) latitude DOUBLE longitude DOUBLE user_status_id INTEGER created_at TIMESTAMP updated_at TIMESTAMP before: 302 bytes after: 88 bytes Rows now take up 29% as much space Disk is cheap, but RAM (buffer cache) is not! EC2 r3.xlarge ($24/month): 180 million users, served from RAM 29% as much data across the network, ≤½ as many columns to map in AR
  25. What Have We Done? user_details user_id INTEGER json_store TEXT before:

    loaded for every single user after: only loaded when necessary
  26. What Have We Done? user_statuses id INTEGER gender VARCHAR(20) anonymous

    BOOLEAN email_visible BOOLEAN tos_date DATE email_verified BOOLEAN deleted BOOLEAN send_newsletter BOOLEAN forum_level VARCHAR(50) before: bad representation, so many flags after: compact, migrateable DB-friendly format
  27. One Last Step… Rails only supports one database …well, in

    2006, that was true There’s no reason to keep everything in one single database any more World, meet db-charmer Huge shout-out to Oleksiy Kovyrin
 (http://github.com/kovyrin/) 
 Most amazing ops person I’ve ever met No, you can’t have him, he’s ours
  28. Models with db-charmer class User < ActiveRecord::Base db_magic :connection =>

    :users_master, :slave => [ :users_slave1, :users_slave2 ] end ! my_user = User.find(params[:user_id]) # on a round-robin slave my_user.deleted = true my_user.save! # writes go to master ! User.on_master do # everything in here goes to the master end Assign different models to different databases Automatic use of slaves for reads Flexible connection management, migration support, database remapping, … Has processed hundreds of billions of queries without error
  29. Allocating Databases: Users lots of RAM, serves from memory users

    low RAM, serves from disk user_details leftover server, almost no load at all user_statuses
  30. Allocating Databases: All lots of RAM, serves from memory users

    low RAM, serves from disk, add spindles user_details leftover server, almost no load at all user_statuses lots of RAM, posts
  31. Allocating Databases: Starting lots of RAM, serves from memory users

    low RAM, serves from disk user_details leftover server, almost no load at all user_statuses posts links tweets
  32. Fun, Scalable
 Relational Databases Non-queried, one-row-at-once columns: JSON store!
 http://github.com/ageweke/flex_columns

    Low-cardinality columns (few distinct values):
 http://github.com/ageweke/low_card_tables Databases anywhere, read slaves, etc.:
 http://github.com/kovyrin/db-charmer ! Design your tables correctly, move them to the right hosts:
 by the time you have scaling problems, you’ll be stupidly successful and have lots of minions to help you
  33. Why Do I Care? RDBMS ✓ ACID Transactions ✓ Operational

    Flexibility ✓ Robust Ecosystem ✓ Efficient Storage NoSQL ╳ Limited Data Types ╳ Heavyweight Schema ╳ Scalability Concerns? ╳ Transactions? ╳ Deployment/Support? ╳ We’re All Learning ╳ Less-Efficient Storage ✓ Flexible Data Types ✓ No Migrations! ✓ Better Scaling (?) This is way more
 painful to lose
 than you realize
  34. Where Do We Put This Stuff? RDBMS ✓ ACID Transactions

    ✓ Operational Flexibility ✓ Robust Ecosystem ✓ Really Efficient Storage NoSQL ✓ Flexible Data Types ✓ Lighter-weight Schema ✓ Scales Like Crazy ╳ Transactions? ╳ Deployment/Support? ╳ We’re All Learning ╳ Less-Efficient Storage ✓ Flexible Data Types ✓ No Migrations! ✓ Better Scaling (?)