Slide 1

Slide 1 text

Threadsafe Upsert Myles Megyesi Software Craftsman @mylesmegyesi

Slide 2

Slide 2 text

Let’s talk about a use case

Slide 3

Slide 3 text

CSV Import Name Email Last Updated John [email protected] 2014-11-14 12:42:24 Mary [email protected] 2014-11-13 12:42:24 Sam [email protected] 2014-11-12 12:42:24

Slide 4

Slide 4 text

Requirements • If the person does not exist, create them • If the person’s email already exists, update their name and last updated timestamp and do not create a new row

Slide 5

Slide 5 text

Setup before :each do with_db_connection do |connection| connection.create_table(:people) do primary_key :id String :vendor_name String :vendor_email, unique: true Time :vendor_updated_at end end end

Slide 6

Slide 6 text

Test Case #1

Slide 7

Slide 7 text

it 'creates a person if they do not exist' do with_db_connection do |connection| result = upsert(connection, 'John', '[email protected]', yesterday) expect(result).to eq(status: :success) people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('John') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(yesterday) end end

Slide 8

Slide 8 text

% bundle exec rspec F Failures: 1) Upsert::Mysql creates a person if they do not exist Failure/Error: result = upsert(connection, 'John', '[email protected]', yesterday) NoMethodError: undefined method `upsert' for Upsert::Mysql:Class Finished in 0.02278 seconds (files took 0.29508 seconds to load) 1 example, 1 failure

Slide 9

Slide 9 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end

Slide 10

Slide 10 text

% bundle exec rspec . Finished in 0.04644 seconds (files took 0.29773 seconds to load) 1 example, 0 failures

Slide 11

Slide 11 text

Test Case #2

Slide 12

Slide 12 text

it "updates the person's name and updated_at if the email already exists" do with_db_connection do |connection| create_result = upsert(connection, 'John', '[email protected]', yesterday) update_result = upsert(connection, 'Johnathon', '[email protected]', now) expect(update_result).to eq(status: :success) people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('Johnathon') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(now) end end

Slide 13

Slide 13 text

% bundle exec rspec spec/upsert/mysql_spec.rb Compiled extensions not installed, pure Ruby Atomic will be used. .F Failures: 1) Upsert::Mysql updates the person's name and updated_at if the email already exists Failure/Error: update_result = upsert(connection, 'Johnathon', '[email protected]', now) Sequel::UniqueConstraintViolation: Mysql2::Error: Duplicate entry '[email protected]' for key 'vendor_email' Finished in 0.05595 seconds (files took 0.28592 seconds to load) 2 examples, 1 failure Failed examples: rspec ./spec/upsert/shared_examples.rb:44 # Upsert::Mysql updates the person's name and updated_at if the email already exists

Slide 14

Slide 14 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end

Slide 15

Slide 15 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row connection.from(:people). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at }) {status: :success} else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 16

Slide 16 text

% bundle exec rspec ... Finished in 0.06233 seconds (files took 0.27974 seconds to load) 2 examples, 0 failures

Slide 17

Slide 17 text

Test Case #3

Slide 18

Slide 18 text

it 'does not override newer data with old data' do with_db_connection do |connection| create_result = upsert(connection, 'John', '[email protected]', now) update_result = upsert(connection, 'Johnathon', '[email protected]', yesterday) expect(update_result).to eq({ status: :failure, reason: :stale_data }) people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('John') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(now) end end

Slide 19

Slide 19 text

% bundle exec rspec ..F Failures: 1) Upsert::Mysql does not override newer data with old data Failure/Error: expect(update_result).to eq({ expected: {:status=>:failure, :reason=>:stale_data} got: {:status=>:success} (compared using ==) Finished in 0.07561 seconds (files took 0.28662 seconds to load) 3 examples, 1 failure

Slide 20

Slide 20 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row connection.from(:people). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at }) {status: :success} else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 21

Slide 21 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row if updated_at > row[:vendor_updated_at] connection.from(:people). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at }) {status: :success} else {status: :failure, reason: :stale_data} end else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 22

Slide 22 text

% bundle exec rspec ... Finished in 0.06233 seconds (files took 0.27974 seconds to load) 3 examples, 0 failures

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row if updated_at > row[:vendor_updated_at] connection.from(:people). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at }) {status: :success} else {status: :failure, reason: :stale_data} end else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 25

Slide 25 text

What is a race condition?

Slide 26

Slide 26 text

A race condition occurs when two or more threads can access shared data and they try to change it at the same time. — Some guy on StackOverflow

Slide 27

Slide 27 text

Test Case #4

Slide 28

Slide 28 text

it 'handles many writers trying to update' do writers = 100 updated_at_times = generate_n_times_between(yesterday, now, writers) in_parallel_options = { times: writers, concurrency: writers / 5, args: updated_at_times.shuffle } results = in_parallel(in_parallel_options) do |connection, updated_at| upsert(connection, 'John', '[email protected]', updated_at) end expect(results).to have(writers).items successful_writes, failed_writes = results.partition do |result| result[:status] == :success end expect(successful_writes).to have_at_least(1).item with_db_connection do |connection| people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('John') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(updated_at_times.max) end end

Slide 29

Slide 29 text

% bundle exec rspec .... Finished in 0.12343 seconds (files took 0.28886 seconds to load) 4 examples, 0 failures

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

% rvm use jruby-1.7.16.1@threadsafe-upsert Using ~/.rvm/gems/jruby-1.7.16.1 with gemset threadsafe-upsert

Slide 32

Slide 32 text

% bundle exec rspec ...F Failures: 1) Upsert::Mysql handles many writers trying to update Failure/Error: expect(row[:vendor_updated_at]).to eq(updated_at_times.max) expected: 2014-11-14 12:42:24.000000000 +0000 got: 2014-11-14 10:32:48.000000000 +0000 Finished in 12.62 seconds (files took 1.49 seconds to load) 4 examples, 1 failure

Slide 33

Slide 33 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row if updated_at > row[:vendor_updated_at] connection.from(:people). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at }) {status: :success} else {status: :failure, reason: :stale_data} end else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 34

Slide 34 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row affected_rows = connection.from(:people). where(vendor_email: email). where(Sequel.expr(updated_at) > :vendor_updated_at). update({ vendor_name: name, vendor_updated_at: updated_at }) if affected_rows == 0 {status: :failure, reason: :stale_data} else {status: :success} end else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 35

Slide 35 text

% bundle exec rspec .... Finished in 0.12343 seconds (files took 0.28886 seconds to load) 4 examples, 0 failures

Slide 36

Slide 36 text

Test Case #5

Slide 37

Slide 37 text

it 'handles many writers trying to insert the same piece of data' do writers = 20 results = in_parallel(times: writers) do |connection| upsert(connection, 'John', '[email protected]', now) end expect(results.size).to eq(writers) successful_writes, failed_writes = results.partition do |result| result[:status] == :success end expect(successful_writes.size).to eq(1) expect(failed_writes.size).to eq(writers - 1) expect(failed_writes.map{|r| r[:reason]}).to all(eq(:stale_data)) with_db_connection do |connection| people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('John') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(now) end end

Slide 38

Slide 38 text

% bundle exec rspec ....F Failures: 1) Upsert::Mysql handles many writers trying to insert the same piece of data Failure/Error: expect(results.size).to eq(writers) expected: 100 got: 85 Finished in 2.02 seconds (files took 1.64 seconds to load) 5 examples, 1 failure

Slide 39

Slide 39 text

it 'handles many writers trying to insert the same piece of data' do writers = 20 results = in_parallel(times: writers) do |connection| upsert(connection, 'John', '[email protected]', now) end expect(results.size).to eq(writers) successful_writes, failed_writes = results.partition do |result| result[:status] == :success end expect(successful_writes.size).to eq(1) expect(failed_writes.size).to eq(writers - 1) expect(failed_writes.map{|r| r[:reason]}).to all(eq(:stale_data)) with_db_connection do |connection| people = all_people(connection) expect(people.size).to eq(1) row = people.first expect(row[:vendor_name]).to eq('John') expect(row[:vendor_email]).to eq('[email protected]') expect(row[:vendor_updated_at]).to eq(now) end end

Slide 40

Slide 40 text

results = in_parallel(times: writers, concurrency: writers / 5) do |connection| begin upsert(connection, 'John', '[email protected]', now) rescue => e puts "#{Thread.current.object_id} #{e.message}" raise end end

Slide 41

Slide 41 text

% bundle exec rspec .... 2724 Sequel::UniqueConstraintViolation: Duplicate entry '[email protected]' for key 'vendor_email' 2728 Sequel::UniqueConstraintViolation: Duplicate entry '[email protected]' for key 'vendor_email' 2732 Sequel::UniqueConstraintViolation: Duplicate entry '[email protected]' for key 'vendor_email' 2730 Sequel::UniqueConstraintViolation: Duplicate entry '[email protected]' for key 'vendor_email' F Failures: 1) Upsert::Mysql handles many writers trying to insert the same piece of data Failure/Error: expect(results.size).to eq(writers) expected: 100 got: 96 Finished in 2.31 seconds (files took 1.35 seconds to load) 5 examples, 1 failures

Slide 42

Slide 42 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) row = connection.from(:people). where(vendor_email: email). first if row affected_rows = connection.from(:people). where(vendor_email: email). where(Sequel.expr(updated_at) > :vendor_updated_at). update({ vendor_name: name, vendor_updated_at: updated_at }) if affected_rows == 0 {status: :failure, reason: :stale_data} else {status: :success} end else connection.from(:people).insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} end end end end

Slide 43

Slide 43 text

module Upsert class Mysql def self.upsert(connection, name, email, updated_at) table = connection.from(:people) begin table.insert({ vendor_name: name, vendor_email: email, vendor_updated_at: updated_at }) {status: :success} rescue Sequel::UniqueConstraintViolation => e affected_rows = table. where(Sequel.expr(updated_at) > :vendor_updated_at). where(vendor_email: email). update({ vendor_name: name, vendor_updated_at: updated_at, }) if affected_rows == 0 {status: :failure, reason: :stale_data} else {status: :success} end end end end end

Slide 44

Slide 44 text

% bundle exec rspec ..... Finished in 2.15 seconds (files took 1.42 seconds to load) 5 examples, 0 failures

Slide 45

Slide 45 text

Thank you! @mylesmegyesi https://github.com/mylesmegyesi/threadsafe-upsert