Little needs, big problems

"That's easy, I just need... could you do that in 5 minutes?"
Love it or hate it, you've heard that a lot.

Young devs happily play the rockstar and say yes; seasoned devs refuse outright, seeing the request as a symptom of deeper and more dangerous problems.
How and why?

What I'll show here are simple needs you run into every day, the traps they hide, and some of the ways I've seen them evolve.

[Currently this deck only covers importing CSV; my next pet peeves could be versioning, logging, cleaning data...]

Sylvain Abélard

November 12, 2013

Transcript

  1. comp_cache = {}  # memoize Company lookups by name
     FasterCSV.parse(File.open('test.csv', 'r')).each { |line|
       Client.create(
         attr: line[0],
         company: comp_cache[line[1]] ||
           (comp_cache[line[1]] = Company.find_by_name(line[1])),
         active: ['1', 'oui', 'X'].include?(line[2])
       )
     }

     10mn: perf’s not good
  2. comp_cache = {}
     FasterCSV.parse(File.open('test.csv', 'r')).each { |line|
       # skip empty or malformed rows
       next if line.blank? || line.size != 5
       comp = comp_cache[line[1]] ||
         (comp_cache[line[1]] = Company.find_by_name(line[1]))
       if !comp
         puts "OMG IDK #{line[1]}"  # unknown company: report and skip
         next
       end
       Client.create(
         attr: line[0],
         company: comp,
         active: ['1', 'oui', 'X'].include?(line[2])
       )
     }

     20mn: data not good
  3. comp_cache = {}
     FasterCSV.parse(File.open('test.csv', 'r')).each { |line|
       # fields 0..6 are read below, so a row needs 7 fields
       next if line.blank? || line.size != 7
       comp = comp_cache[line[1]] ||
         (comp_cache[line[1]] = Company.find_by_name(line[1]))
       if !comp
         puts "OMG IDK #{line[1]}"
         next
       end
       Client.create(
         attr: line[0],
         company: comp,
         active: ['1', 'oui', 'X'].include?(line[2]),
         first_name: line[3].split.first,
         last_name: line[3].split.last,
         comments: "#{line[4]} #{line[5]} (#{line[6]})"
       )
     }

     30mn: more info
  4. comp_cache = {}
     # map (not each) so each row's result reaches partition below
     FasterCSV.parse(File.open('test.csv', 'r')).map { |line|
       next if line.blank? || line.size != 7
       comp = comp_cache[line[1]] ||
         (comp_cache[line[1]] = Company.find_by_name(line[1]))
       if !comp
         puts "OMG IDK #{line[1]}"
         next
       end
       c = Client.find_or_create_by_attr(line[0])
       # assign in memory, then save once so validations run
       c.assign_attributes(
         company: comp,
         active: ['1', 'oui', 'X'].include?(line[2]),
         first_name: line[3].split.first,
         last_name: line[3].split.last,
         comments: "#{line[4]} #{line[5]} (#{line[6]})"
       )
       ok = c.save
       puts c.errors.full_messages.inspect unless ok
       ok
     }.partition { |x| x }.map { |a| "#{a.size} #{a.first}" }

     45mn: some logs
  5. def to_date(date)
       months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                 "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
       tmp = date.scan(/([0-9]{2})([A-Z]{3})([0-9]{4})/).flatten
       tmp[0] = tmp[0].to_i
       tmp[1] = months.index(tmp[1]) + 1
       tmp[2] = tmp[2].to_i
       Date.new(tmp[2], tmp[1], tmp[0])
     end

     def to_float(var)
       (var.blank? || var == '.' ? nil : var.to_s.tr(' ,', '_.').to_f)
     end

     def transcode_file(filepath)
       input = File.open(filepath)
       File.open(Rails.root.join("tmp", "import_utf8.csv"), "wb") do |f|
         until input.eof?
           content = input.read(2**16)
           detection = CharlockHolmes::EncodingDetector.detect(content)
           content = CharlockHolmes::Converter.convert(content, detection[:encoding], "UTF-8")
           f.write(content)
         end
       end
     end

     1h: some ‘utils’
  6. # gem install upsert
     # gem install smarter_csv
     SmarterCSV.process(Rails.root.join("tmp", "test.csv"),
                        :col_sep => ';', :chunk_size => 10000) do |chunk|
       # ...
       Upsert.batch(connection, 'infos') do |upsert_infos|
         # each chunk is an array of row hashes keyed by column name
         chunk.each do |row|
           upsert_infos.row({code: row[:code]}, name: row[:nom])
         end
         # ...
       end
     end

     Tools to the rescue
  7. And much more...
     - source
     - field separator
     - encoding
     - XLS tab
     - behaviour
     - numbers
     - offset
     - errors
     - create / update
     - column match
     - update on... ?
  8. Check my data
     ‣ find duplicates
       - error or merge
     ‣ error correction
       - merging tool
     ‣ errors as a CSV
       - import, export, reimport
  9. Another source
     ‣ another format?
       - some more refactoring ahead
       - perf issues?
     ‣ keep the source!
       - file
       - format
       - person
       - timestamp
       - BLAME ALL THE THINGS!
  10. I just need...
      ‣ to import CSV files
        - that’s easy right?
        - right...
      ‣ reporting dashboards
        - just some sums right?
        - yeah, and filters
        - and export!
        - and graphs!
        - ...
  11. Programmer Time Translation Table

      est   real     coder thinks    manager knows
      30s   1h       trivial!        do, build, test, deploy...
      5mn   2h       easy!           unexpected problem
      1h    2h       code...         not on the 1st try
      4h    4h       check docs      realistic
      8h    12~16d   minor refactor  many dependencies
      2d    5d       OK, real code   same as before
      1wk   2~20d    wow, er...      let’s see with the team

      http://coding.abel.nu/2012/06/programmer-time-translation-table/
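
The hand-rolled to_date and to_float helpers on slide 5 could probably lean on Ruby's standard library instead. The following is a minimal sketch, assuming plain Ruby without Rails and dates shaped like "12NOV2013"; the names parse_compact_date and parse_decimal are hypothetical and do not appear in the deck.

```ruby
require 'date'

# Hypothetical stand-in for the deck's to_date helper.
# Date.strptime with %b matches abbreviated English month names
# case-insensitively, so no hand-written month table is needed.
def parse_compact_date(text)
  Date.strptime(text, '%d%b%Y')
rescue ArgumentError, TypeError
  nil # malformed or missing date: the caller decides how to report it
end

# Hypothetical stand-in for the deck's to_float helper.
# Accepts "1 234,5" style numbers (space as thousands separator,
# comma as decimal mark) and returns nil for blanks or junk.
def parse_decimal(text)
  cleaned = text.to_s.strip
  return nil if cleaned.empty? || cleaned == '.'
  Float(cleaned.delete(' ').tr(',', '.'))
rescue ArgumentError
  nil
end
```

Returning nil instead of raising keeps one bad cell from aborting the whole import; whether that is the right trade-off depends on how the errors are reported afterwards.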
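
The "errors as a CSV" idea from slide 8 and the "keep the source!" advice from slide 9 can be combined: rejected rows go into their own CSV, tagged with where they came from, so they can be corrected and reimported. This is only a sketch using Ruby's standard csv library; ImportSource, split_rows, errors_csv and their fields are assumptions made for illustration, not code from the deck.

```ruby
require 'csv'
require 'time'

# Hypothetical record of where an import came from
# (slide 9: file, format, person, timestamp).
ImportSource = Struct.new(:file, :format, :person, :imported_at)

# Splits parsed rows into accepted and rejected ones.
# `validate` is a callable returning nil for a valid row,
# or an error message string otherwise.
def split_rows(csv_text, col_sep, validate)
  accepted = []
  rejected = []
  CSV.parse(csv_text, col_sep: col_sep).each_with_index do |row, index|
    error = validate.call(row)
    if error
      rejected << [index + 1, error, *row] # keep line number and reason
    else
      accepted << row
    end
  end
  [accepted, rejected]
end

# Renders rejected rows as CSV: line number, error, source details,
# then the original fields so the row can be fixed and reimported.
def errors_csv(rejected, source, col_sep)
  CSV.generate(col_sep: col_sep) do |out|
    rejected.each do |line_no, error, *fields|
      out << [line_no, error, source.file, source.person,
              source.imported_at.iso8601, *fields]
    end
  end
end
```

A caller would pass its own validation rule, for example `->(row) { row.size == 3 ? nil : "expected 3 fields" }`, and hand the resulting errors CSV back to whoever supplied the file.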