I am Alexis Bernard and work at Official.fm as
chief technical architect.
[email protected]
github.com/alexisbernard
twitter.com/alexis_bernard
Slide 3
What is a batch?
A batch is a process that runs on a group of
records.
Slide 4
How do we code batches with Ruby
on Rails?
Track.not_encoded.each do |track|
  track.encode_mp3
end
Loads all instances in memory
Not usable on large tables
Slide 5
Track.not_encoded.find_each do |track|
  track.encode_mp3
end
Loads instances in batches of 1000,
but is not parallelized
Because the records are processed one after another,
this batch can take many days or weeks!
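To see why loading per batch keeps memory bounded, here is a minimal plain-Ruby sketch of the idea. It is not the ActiveRecord implementation: `Row`, `FakeTable`, `fetch_after` and `each_in_batches` are hypothetical names, the array stands in for a table, and positive integer ids are assumed.

```ruby
# Hypothetical in-memory stand-in for a table; not ActiveRecord.
Row = Struct.new(:id)

class FakeTable
  attr_reader :largest_load

  def initialize(rows)
    @rows = rows.sort_by(&:id)
    @largest_load = 0 # largest number of rows returned by a single fetch
  end

  # Stands in for "WHERE id > last_id ORDER BY id LIMIT limit".
  def fetch_after(last_id, limit)
    batch = @rows.select { |row| row.id > last_id }.first(limit)
    @largest_load = [@largest_load, batch.size].max
    batch
  end
end

# Yields every row while holding at most batch_size rows at a time.
def each_in_batches(table, batch_size: 1000)
  last_id = 0 # assumes ids start above 0
  loop do
    batch = table.fetch_after(last_id, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last.id
  end
end

table = FakeTable.new((1..2500).map { |id| Row.new(id) })
seen = []
each_in_batches(table, batch_size: 1000) { |row| seen << row.id }
```

Every row is visited in id order, yet no single fetch returns more than `batch_size` rows. The loop is still sequential, which is the limitation noted above.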
Improved version
Slide 6
How can we parallelize?
Threads and processes are limited by the resources
of a single device.
Distributing the work lifts that limit, since it
runs on many devices.
Parallel batch handles everything needed to distribute
your batches across servers.
Slide 7
class TrackBatch < ParallelBatch
  def scope
    Track.not_encoded
  end

  def perform(track)
    track.encode_mp3
  end
end

TrackBatch.start(5) # Start 5 workers
Slide 8
How does it work under the hood?
Parallel batch relies on the database as a mutex.
This prevents workers from processing a record multiple
times.
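The locking idea can be sketched in plain Ruby, with a `Mutex` standing in for the database lock and threads standing in for workers on separate servers. This is an illustration under those assumptions, not the library's code; `SharedOffset` and `claim` are hypothetical names.

```ruby
# Stand-in for the shared batch state: the Mutex plays the role of the database lock.
class SharedOffset
  def initialize(ids, batch_size)
    @ids = ids.sort
    @batch_size = batch_size
    @offset = 0 # id of the last record already handed out
    @lock = Mutex.new
  end

  # Atomically claims the next batch and advances the offset past it.
  # Returns nil when nothing is left.
  def claim
    @lock.synchronize do
      batch = @ids.select { |id| id > @offset }.first(@batch_size)
      next nil if batch.empty?
      @offset = batch.last
      batch
    end
  end
end

shared = SharedOffset.new((1..500).to_a, 50)
processed = Queue.new # thread-safe collector for the processed ids

workers = 5.times.map do
  Thread.new do
    while (batch = shared.claim)
      batch.each { |id| processed << id }
    end
  end
end
workers.each(&:join)

ids = []
ids << processed.pop until processed.empty?
```

Because reading the offset and moving it forward happen inside the same critical section, two workers can never claim the same batch. In a real deployment the lock would have to live in the database, since a `Mutex` only protects threads within one process.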
Slide 9
ParallelBatch is an ActiveRecord
subclass
irb :001 > ParallelBatch
ParallelBatch(
  id: integer,
  type: string,
  offset: string,
  created_at: datetime,
  updated_at: datetime)
Slide 10
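class ParallelBatch < ActiveRecord::Base
  # Processes batches until none remain.
  def run
    while (records = next_batch)
      records.each { |record| perform(record) }
    end
  end

  # Claims the next batch under a row lock so that no other worker
  # receives the same records.
  def next_batch
    transaction do
      reload(lock: true) # SELECT ... FOR UPDATE on this batch row
      next unless (records = find_records).last # nothing left: the block returns nil
      update_attributes!(offset: records.last.id) # move the shared offset forward
      records
    end
  end
end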