
Parallel batch

alexis
September 19, 2012

Parallel batch helps distribute batches across servers. It works well with Rails apps.

Transcript

  1. I am Alexis Bernard and I work at Official.fm as chief technical architect.
     [email protected] github.com/alexisbernard twitter.com/alexis_bernard
  2. What is a batch? A batch is a process that runs on a group of records.
  3. How do we code batches with Ruby on Rails?

         Track.not_encoded.each do |track|
           track.encode_mp3
         end

     This loads all instances into memory, so it is not usable on large tables.
  4. Improved version:

         Track.not_encoded.find_each do |track|
           track.encode_mp3
         end

     This loads instances in batches of 1000, but it is not parallelizable. Since it is not parallelized, this batch can take many days or weeks!
  5. How can we parallelize? Threads and processes are limited by the resources of a single machine. Distributing the work is not bound by that limit, since it runs on many machines. Parallel batch handles everything needed to distribute your batches across servers.
  6.     class TrackBatch < ParallelBatch
           def scope
             Track.not_encoded
           end

           def perform(track)
             track.encode_mp3
           end
         end

         TrackBatch.start(5) # Start 5 workers
  7. How does it work under the hood? Parallel batch relies on the database as a mutex. This prevents workers from processing a record multiple times.
  8. ParallelBatch is an ActiveRecord subclass:

         irb :001 > ParallelBatch
         ParallelBatch(id: integer, type: string, offset: string, created_at: datetime, updated_at: datetime)
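  9.     class ParallelBatch < ActiveRecord::Base
           def run
             # Keep claiming batches until none are left.
             while records = next_batch
               records.each { |record| perform(record) }
             end
           end

           def next_batch
             transaction do
               # Lock this batch's row so only one worker advances the offset at a time.
               reload(lock: true)
               # Stop when there are no more records to claim.
               next unless (records = find_records).last
               # Store the last claimed id so the next claim starts after it.
               update_attributes!(offset: records.last.id)
               records
             end
           end
         end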
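
The claim loop from the last slide can be tried without a database. What follows is a minimal sketch under stated assumptions, not the gem's implementation: a `Mutex` stands in for the row lock taken by `reload(lock: true)`, threads stand in for workers on separate servers, and `FakeBatch`, `BATCH_SIZE`, `processed`, and the in-memory list of ids are hypothetical names invented for this illustration.

```ruby
# Simulation only: a Mutex plays the role of the database row lock, and
# threads play the role of workers running on separate servers.
class FakeBatch
  BATCH_SIZE = 3 # hypothetical batch size, kept small for the demo

  attr_reader :processed

  def initialize(ids)
    @ids = ids.sort        # stands in for the table, ordered by id
    @offset = 0            # last id handed out to any worker
    @lock = Mutex.new      # stands in for the locked ParallelBatch row
    @processed = Queue.new # thread-safe log of the ids that were performed
  end

  # Claim the next slice of ids; returns nil once nothing is left.
  def next_batch
    @lock.synchronize do
      batch = @ids.select { |id| id > @offset }.first(BATCH_SIZE)
      next if batch.empty?
      @offset = batch.last # later claims start after this id
      batch
    end
  end

  def perform(id)
    @processed << id
  end

  # Same shape as the run loop on the slide.
  def run
    while (batch = next_batch)
      batch.each { |id| perform(id) }
    end
  end

  # Start the given number of workers and wait for all of them.
  def start(worker_count)
    Array.new(worker_count) { Thread.new { run } }.each(&:join)
  end
end

batch = FakeBatch.new((1..20).to_a)
batch.start(5)
ids = Array.new(batch.processed.size) { batch.processed.pop }
puts ids.sort == (1..20).to_a # checks that every id was handled exactly once
```

Because the offset only moves forward while the lock is held, two workers can never claim the same slice. In the real setup the work inside `perform` runs outside the lock, so the lock is held only for the short time it takes to claim a batch.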