Slide 1

Slide 1 text

Parallel batch http://github.com/officialfm/parallel_batch

Slide 2

Slide 2 text

I am Alexis Bernard and I work at Official.fm as chief technical architect. github.com/alexisbernard twitter.com/alexis_bernard

Slide 3

Slide 3 text

What is a batch? A batch is a process that runs over a group of records.

Slide 4

Slide 4 text

How do we code batches with Ruby on Rails?

    Track.not_encoded.each do |track|
      track.encode_mp3
    end

Loads all instances in memory. Not usable on large tables.

Slide 5

Slide 5 text

Improved version

    Track.not_encoded.find_each do |track|
      track.encode_mp3
    end

Loads instances in batches of 1000, but is not parallelizable. Since it is not parallelized, this batch can take many days or weeks!
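To make the "batches of 1000" idea concrete, here is a minimal plain-Ruby sketch of loading records one batch at a time, resuming after the last id seen. It is only an illustration: the `RECORDS` array and the `each_in_batches` helper are hypothetical stand-ins for a table and a query, not Rails code.

```ruby
# In-memory stand-in for a table, sorted by id (hypothetical data).
RECORDS = (1..25).map { |id| { id: id } }

# Yield records in id order, at most batch_size at a time.
# The select/first pair stands in for a query such as
# "WHERE id > last_id ORDER BY id LIMIT batch_size".
def each_in_batches(records, batch_size:)
  last_id = 0
  loop do
    batch = records.select { |r| r[:id] > last_id }.first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id] # resume after the last id of this batch
  end
end

batch_sizes = []
each_in_batches(RECORDS, batch_size: 10) { |batch| batch_sizes << batch.size }
# batch_sizes is now [10, 10, 5]
```

Only one batch is held at a time, which keeps memory bounded, but the batches are still processed one after another by a single worker.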

Slide 6

Slide 6 text

How can we parallelize? Threads and processes are limited by the resources of a single device. Distribution is not bound by that limit, since it runs on many devices. Parallel batch handles everything needed to distribute your batches across servers.

Slide 7

Slide 7 text

    class TrackBatch < ParallelBatch
      def scope
        Track.not_encoded
      end

      def perform(track)
        track.encode_mp3
      end
    end

    TrackBatch.start(5) # Start 5 workers

Slide 8

Slide 8 text

How does it work under the hood? Parallel batch relies on the database as a mutex. This prevents workers from processing the same record more than once.

Slide 9

Slide 9 text

ParallelBatch is an ActiveRecord subclass.

    irb :001 > ParallelBatch
    ParallelBatch(id: integer, type: string, offset: string, created_at: datetime, updated_at: datetime)

Slide 10

Slide 10 text

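    class ParallelBatch < ActiveRecord::Base
      def run
        while records = next_batch
          records.each { |record| perform(record) }
        end
      end

      def next_batch
        transaction do
          # Lock this batch's row so only one worker advances the offset at a time.
          reload(lock: true)
          next unless (records = find_records).last
          # Remember the last id handed out, so the next worker starts after it.
          update_attributes!(offset: records.last.id)
          records
        end
      end
    end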
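The loop above can be mimicked without a database to see why each record is handled once. This is a rough simulation under stated assumptions, not the gem's code: `FakeBatch` is a hypothetical class, a Ruby `Mutex` stands in for the row lock taken by `reload(lock: true)`, threads stand in for workers on separate servers, and plain integers stand in for records.

```ruby
# Plain-Ruby simulation; nothing here touches ActiveRecord.
class FakeBatch
  attr_reader :processed

  def initialize(ids, batch_size)
    @ids = ids.sort         # stand-in for the scoped table
    @batch_size = batch_size
    @offset = 0             # last id handed out, like the offset column
    @lock = Mutex.new       # stand-in for the database row lock
    @processed = Queue.new  # thread-safe log of processed ids
  end

  # Claim the next batch under the lock and advance the offset.
  def next_batch
    @lock.synchronize do
      records = @ids.select { |id| id > @offset }.first(@batch_size)
      next nil if records.empty?
      @offset = records.last
      records
    end
  end

  def perform(id)
    @processed << id
  end

  def run
    while (records = next_batch)
      records.each { |id| perform(id) }
    end
  end
end

batch = FakeBatch.new((1..1000).to_a, 50)
workers = 5.times.map { Thread.new { batch.run } } # 5 simulated workers
workers.each(&:join)

processed_ids = []
processed_ids << batch.processed.pop until batch.processed.empty?
```

Because claiming a batch and moving the offset happen inside the same critical section, two workers can never receive overlapping batches. The actual processing happens outside the lock, so workers still run in parallel.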

Slide 11

Slide 11 text

Thanks for your attention! Any questions?