Little needs, big problems

"That's easy, I just need... could you do that in 5 minutes?"
Love it or hate it, you've heard that a lot.

Young devs agree to play the rockstar; old devs refuse outright, seeing the request as a symptom of deeper and more dangerous problems.
How and why?

Here I show simple needs you run into every day, the traps they hide, and some of the ways I've seen them evolve.

[Currently this deck only covers importing CSV; my next pet peeves could be versioning, logging, cleaning data...]

Sylvain Abélard

November 12, 2013

Transcript

  1. Little
    needs
    @abelar_s / maitre-du-monde.fr
    HumanTalks 2013-11-12

  2. Big
    problems

  3. Advice

  4. Gurus

  5. YAGNI
    Gurus

  6. YAGNI
    You Ain’t Gonna Need It
    So many devs...

  7. Suits

  8. $
    Suits

  9. $
    Contract
    So many salesguys...

  10. My friend

  11. $
    My friend

  12. $
    Free has no value
    Freelancer

  13. ?
    Myself

  14. ?
    My own company

  15. ?
    Long-term value
    Software editor

  16. Code?

  17. Import a
    CSV file

  18. IO.readlines('test.csv').each { |line|
        Model.create(attr: line.split(',')[0])
      }
    30s: trivial

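The 30-second version splits raw lines on commas. That breaks as soon as a field contains a quoted comma. The following minimal sketch uses Ruby's standard `csv` library, which succeeded FasterCSV in Ruby 1.9. The sample row is made up for illustration:

```ruby
require 'csv'

# A made-up row whose first field contains a comma inside quotes.
raw = %("Doe, John",ACME,oui)

# Naive approach: String#split knows nothing about CSV quoting.
naive = raw.split(',')

# CSV.parse_line honours the quotes and returns one array of fields.
parsed = CSV.parse_line(raw)

puts naive.inspect   # 4 fields: the name is cut in two
puts parsed.inspect  # 3 fields
```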

  19. FasterCSV.parse(File.read('test.csv')).each { |line|
        Client.create(
          attr: line[0],
          company: Company.find_by_name(line[1]),
          active: ['1', 'oui', 'X'].include?(line[2])
        )
      }
    5 min: some more data

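The `['1', 'oui', 'X'].include?(line[2])` test matches exactly. As a result, `'x'`, `' oui'` or `'Oui'` all count as inactive. A small normalising helper can make the rule explicit. In this sketch, the name `truthy_flag?` is hypothetical, and the accepted values are the slide's own, lowercased:

```ruby
require 'set'

# The slide's accepted values, stored lowercase (hypothetical constant).
TRUTHY_FLAGS = Set['1', 'oui', 'x'].freeze

# Normalises case and surrounding whitespace before checking membership.
# nil (a missing column) becomes "" and is therefore false.
def truthy_flag?(value)
  TRUTHY_FLAGS.include?(value.to_s.strip.downcase)
end

%w[1 X x Oui non].each { |v| puts "#{v.inspect} -> #{truthy_flag?(v)}" }
```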

  20. comp_cache = {}
      FasterCSV.parse(File.read('test.csv')).each { |line|
        Client.create(
          attr: line[0],
          company: comp_cache[line[1]] ||
                   (comp_cache[line[1]] = Company.find_by_name(line[1])),
          active: ['1', 'oui', 'X'].include?(line[2])
        )
      }
    10 min: perf’s not good

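The `comp_cache[line[1]] || (comp_cache[line[1]] = ...)` idiom never caches a miss. When `find_by_name` returns `nil`, the next row with the same unknown company queries the database again. The sketch below memoises misses too. A hypothetical `lookup` lambda over a fixed hash stands in for `Company.find_by_name`, and a counter shows how many lookups actually run:

```ruby
# Stand-in for Company.find_by_name (hypothetical data), counting calls.
KNOWN = { 'ACME' => :acme_company }.freeze
calls = 0
lookup = ->(name) { calls += 1; KNOWN[name] }

# The default block runs only for keys not yet in the hash, and it stores
# the result even when it is nil, so each unknown name is looked up once.
comp_cache = Hash.new { |cache, name| cache[name] = lookup.call(name) }

%w[ACME Unknown ACME Unknown Unknown].each { |name| comp_cache[name] }

puts calls # one lookup per distinct name
```

With `||` or `||=`, the three `Unknown` rows would have triggered three lookups instead of one.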

  21. comp_cache = {}
      FasterCSV.parse(File.read('test.csv')).each { |line|
        next if line.blank? || line.size != 5
        comp = comp_cache[line[1]] ||
               (comp_cache[line[1]] = Company.find_by_name(line[1]))
        if !comp
          puts "OMG IDK #{line[1]}"
          next
        end
        Client.create(
          attr: line[0],
          company: comp,
          active: ['1', 'oui', 'X'].include?(line[2])
        )
      }
    20 min: data not good

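Skipping bad rows with `next` and a `puts` loses track of them. The sketch below keeps a reason and a record number for each rejected row, using the stdlib `csv` library. The expected column count of 5 comes from the slide. The `reject_reason` helper and the sample data are hypothetical:

```ruby
require 'csv'

EXPECTED_COLUMNS = 5 # column count checked on the slide

# Returns nil for a usable row, or a short reason string otherwise.
def reject_reason(row, known_companies)
  return 'blank line' if row.compact.all? { |f| f.strip.empty? }
  return "expected #{EXPECTED_COLUMNS} columns, got #{row.size}" if row.size != EXPECTED_COLUMNS
  return "unknown company #{row[1].inspect}" unless known_companies.include?(row[1])
  nil
end

data = "c1,ACME,1,Jane Doe,note\n\nc2,ACME,1\nc3,Nope,X,John Roe,note\n"
rejects = []
# Record numbers start at 1; they match physical lines only when no
# quoted field spans several lines.
CSV.parse(data).each.with_index(1) do |row, lineno|
  reason = reject_reason(row, ['ACME'])
  rejects << [lineno, reason] if reason
end

rejects.each { |lineno, reason| puts "line #{lineno}: #{reason}" }
```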

  22. comp_cache = {}
      FasterCSV.parse(File.read('test.csv')).each { |line|
        next if line.blank? || line.size != 7
        comp = comp_cache[line[1]] ||
               (comp_cache[line[1]] = Company.find_by_name(line[1]))
        if !comp
          puts "OMG IDK #{line[1]}"
          next
        end
        Client.create(
          attr: line[0],
          company: comp,
          active: ['1', 'oui', 'X'].include?(line[2]),
          first_name: line[3].split.first,
          last_name: line[3].split.last,
          comments: "#{line[4]} #{line[5]} (#{line[6]})"
        )
      }
    30 min: more info

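`line[3].split.first` and `line[3].split.last` give the same first and last name for a one-word value, and they drop middle names. The sketch below is a hedged alternative. The helper name `split_full_name` and its rule (last word is the last name, the rest is the first name) are assumptions, not a general solution for personal names:

```ruby
# Hypothetical rule: the last word is the last name, everything before it
# is the first name. A single word becomes a last name with no first name.
def split_full_name(full)
  words = full.to_s.split
  return [nil, nil] if words.empty?
  return [nil, words.first] if words.size == 1
  [words[0..-2].join(' '), words.last]
end

p split_full_name('Anne Marie Dupont') # => ["Anne Marie", "Dupont"]
p split_full_name('Madonna')           # => [nil, "Madonna"]
p split_full_name('   ')               # => [nil, nil]
```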

  23. comp_cache = {}
      FasterCSV.parse(File.read('test.csv')).map { |line|
        next if line.blank? || line.size != 7
        comp = comp_cache[line[1]] ||
               (comp_cache[line[1]] = Company.find_by_name(line[1]))
        if !comp
          puts "OMG IDK #{line[1]}"
          next
        end
        c = Client.find_or_initialize_by_attr(line[0])
        c.assign_attributes(
          company: comp,
          active: ['1', 'oui', 'X'].include?(line[2]),
          first_name: line[3].split.first,
          last_name: line[3].split.last,
          comments: "#{line[4]} #{line[5]} (#{line[6]})"
        )
        ok = c.save
        puts c.errors.full_messages.inspect unless ok
        ok
      }.partition { |x| x }.map { |a| "#{a.size} #{a.first}" }
    45 min: some logs

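The summary at the end of this slide relies on the block's return values, so the rows must go through `map` rather than `each` (`Array#each` returns the original rows). The sketch below covers the counting step alone. The per-row results are hypothetical: `true` for saved, `false` for a failed save, `nil` for a skipped line:

```ruby
# Hypothetical outcomes of the per-row block.
results = [true, false, nil, true, true, nil]

# Array#partition splits on truthiness: nil and false both land in the
# second group, so skipped and failed rows are lumped together.
ok, ko = results.partition { |x| x }
puts "#{ok.size} imported, #{ko.size} rejected"

# Counting each outcome separately keeps skipped rows distinct from failures.
counts = Hash.new(0)
results.each do |x|
  key = case x
        when true then :ok
        when false then :failed
        else :skipped
        end
  counts[key] += 1
end
p counts
```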

  24. def to_date(date)
        months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP",
                  "OCT", "NOV", "DEC"]
        tmp = date.scan(/([0-9]{2})([A-Z]{3})([0-9]{4})/).flatten
        tmp[0] = tmp[0].to_i
        tmp[1] = months.index(tmp[1]) + 1
        tmp[2] = tmp[2].to_i
        Date.new(tmp[2], tmp[1], tmp[0])
      end
      def to_float(var)
        (var.blank? || var == '.' ? nil : var.to_s.tr(' ,', '_.').to_f)
      end
      def transcode_file(filepath)
        File.open(filepath, "rb") do |input|
          File.open(Rails.root.join("tmp", "import_utf8.csv"), "wb") do |f|
            until input.eof?
              content = input.read(2**16)
              detection = CharlockHolmes::EncodingDetector.detect(content)
              content = CharlockHolmes::Converter.convert(content,
                detection[:encoding], "UTF-8")
              f.write(content)
            end
          end
        end
      end
    1h: some ‘utils’

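Two of these helpers can lean on the standard library. This sketch assumes dates look like `12NOV2013`, and that numbers use a space as the thousands separator and a comma as the decimal mark, as the slide's `tr(' ,', '_.')` implies. Unlike the slide's version, it returns `nil` instead of raising on an unrecognised date. The `_safe` names are hypothetical, and the code avoids Rails' `blank?` so it runs in plain Ruby:

```ruby
require 'date'

# Date::ABBR_MONTHNAMES is [nil, "Jan", "Feb", ...], so the index of a
# capitalised abbreviation is already the month number.
def to_date_safe(str)
  m = str.to_s.match(/\A(\d{2})([A-Za-z]{3})(\d{4})\z/)
  return nil unless m
  month = Date::ABBR_MONTHNAMES.index(m[2].capitalize)
  return nil unless month
  day, year = m[1].to_i, m[3].to_i
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
end

# Removes spaces (thousands separator) and turns the decimal comma into a dot.
# Empty values and a lone '.' become nil, as on the slide.
def to_float_safe(value)
  s = value.to_s.strip
  return nil if s.empty? || s == '.'
  s.delete(' ').tr(',', '.').to_f
end

p to_date_safe('12NOV2013')
p to_date_safe('31FEB2013')
p to_float_safe('1 234,5')
```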

  25. Feeling
    ashamed

  26. ... automating...

  27. # gem install upsert
      # gem install smarter_csv
      SmarterCSV.process(
        Rails.root.join("tmp", "test.csv"),
        :col_sep => ';',
        :chunk_size => 10000) do |chunk|
        # ...
        Upsert.batch(connection, 'infos') do |upsert_infos|
          upsert_infos.row({code: row[:code]}, name: row[:nom])
          # ...
        end
      end
    Tools to the rescue

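The same shape can be sketched with the standard `csv` library, without the gems: read in chunks, then insert or update each row by a key. Here an in-memory hash keyed by `code` stands in for the `infos` table, and the sample data is made up. In the slide, `SmarterCSV` and `Upsert` do the real work against a file and a database:

```ruby
require 'csv'
require 'stringio'

# Made-up ';'-separated input; headers mirror the slide's row[:code] / row[:nom].
input = StringIO.new("code;nom\nA1;Alpha\nB2;Beta\nA1;Alpha bis\n")

# Stand-in for the 'infos' table: code => attributes.
infos = {}

csv = CSV.new(input, col_sep: ';', headers: true, header_converters: :symbol)

# CSV is Enumerable, so each_slice yields arrays of rows without loading
# everything first; 2 is a toy chunk size (the slide uses 10000).
csv.each_slice(2) do |chunk|
  chunk.each do |row|
    # Upsert semantics: the code selects the record, the name is set on it.
    # A later row with the same code overwrites the earlier value.
    (infos[row[:code]] ||= {})[:name] = row[:nom]
  end
end

p infos
```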

  28. See the
    client

  29. Independence day!

  30. Bad data: purge

  31. Where am I?

  32. Bad files

  33. OK, KO, warnings...

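Knowing where an import stands, and which lines were OK, KO or only produced warnings, comes down to recording what happened to each line. This is a minimal sketch of such a report. The `ImportReport` class and its three buckets are hypothetical:

```ruby
# Hypothetical import report: every message is tied to a line number,
# which also answers "where am I?" when a long import stops halfway.
class ImportReport
  attr_reader :ok, :ko, :warnings

  def initialize
    @ok = []       # line numbers imported (possibly with warnings)
    @ko = {}       # line number => error message, line not imported
    @warnings = {} # line number => list of warnings
  end

  def success(lineno)
    @ok << lineno
  end

  def error(lineno, message)
    @ko[lineno] = message
  end

  def warning(lineno, message)
    (@warnings[lineno] ||= []) << message
  end

  def summary
    "#{@ok.size} OK, #{@ko.size} KO, #{@warnings.size} with warnings"
  end
end

report = ImportReport.new
report.success(2)
report.warning(3, 'date missing, left empty')
report.success(3)
report.error(4, 'unknown company "Nope"')
puts report.summary
```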

  34. And much more...
    - source
    - field separator
    - encoding
    - XLS tab
    - behaviour
    - numbers
    - offset
    - errors
    - create / update
    - column match
    - update on... ?

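Of the options in that list, the field separator is one that can be guessed rather than asked for. This heuristic sketch assumes the header line is representative: it counts each candidate separator and keeps the most frequent. The candidate list and the `guess_col_sep` name are assumptions, and a quoted header containing separators would fool it:

```ruby
# Candidate separators often met in CSV exports (assumed list).
CANDIDATE_SEPARATORS = [',', ';', "\t", '|'].freeze

# Picks the candidate that occurs most often in the header line.
# Returns nil when none occurs, so the caller can ask the user instead.
def guess_col_sep(header_line)
  best = CANDIDATE_SEPARATORS.max_by { |sep| header_line.count(sep) }
  header_line.count(best).zero? ? nil : best
end

p guess_col_sep("code;nom;actif")
p guess_col_sep("code,nom,actif")
p guess_col_sep("code")
```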

  35. What
    else?

  36. Check my data
    ‣ find duplicates
    - error or merge
    ‣ error correction
    - merging tool
    ‣ errors as a CSV
    - import, export, reimport

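Two of these checks take only a few lines of plain Ruby. Grouping rows by a key finds duplicates. Writing rejected rows back out as CSV, with an extra column for the reason, produces a file that can be fixed and reimported. This is a sketch with made-up data. Both the choice of key column and the `error` column name are assumptions:

```ruby
require 'csv'

rows = [
  ['c1', 'ACME', '1'],
  ['c2', 'ACME', 'oui'],
  ['c1', 'ACME', 'X'], # same code as the first row
]

# Duplicates: group rows by the assumed key column (the code) and keep
# groups with more than one row. Erroring or merging is a separate decision.
duplicates = rows.group_by { |row| row[0] }.select { |_code, group| group.size > 1 }

# Errors as a CSV: the input columns plus the reason, so the file can be
# corrected and fed back to the import.
errors_csv = CSV.generate do |csv|
  csv << %w[code company active error]
  duplicates.each do |code, group|
    group.each { |row| csv << row + ["duplicate code #{code}"] }
  end
end

puts errors_csv
```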

  37. Another source
    ‣ another format?
    - some more refactoring ahead
    - perf issues?
    ‣ keep the source!
    - file
    - format
    - person
    - timestamp
    - BLAME ALL THE THINGS!

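Keeping the source means storing, next to each imported value, where it came from. This is a minimal sketch built on `Struct`. The field names follow the slide's list (file, format, person, timestamp). The line number and all sample values are added assumptions:

```ruby
require 'time'

# Where a record came from: the slide's four items plus a line number.
Source = Struct.new(:file, :format, :person, :imported_at, :lineno) do
  def to_s
    "#{file}:#{lineno} (#{format}) by #{person} at #{imported_at.iso8601}"
  end
end

# Each imported record keeps its source alongside its attributes.
Record = Struct.new(:attributes, :source)

src = Source.new('clients_2013-11.csv', 'csv;utf-8', 'alice',
                 Time.utc(2013, 11, 12, 19, 0, 0), 42)
record = Record.new({ code: 'c1', name: 'ACME' }, src)

puts record.source
```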

  38. I just need...
    ‣ to import CSV files
    - that’s easy, right?
    - right...
    ‣ reporting dashboards
    - just some sums, right?
    - yeah, and filters
    - and export!
    - and graphs!
    - ...

  39. Programmer Time Translation Table
    estimate | real   | coder thinks   | manager knows
    30s      | 1h     | trivial!       | do, build, test, deploy...
    5 min    | 2h     | easy!          | unexpected problem
    1h       | 2h     | code...        | not on the 1st try
    4h       | 4h     | check docs     | realistic
    8h       | 12~16d | minor refactor | many dependencies
    2d       | 5d     | OK, real code  | same as before
    1wk      | 2~20d  | wow, er...     | let’s see with the team
    http://coding.abel.nu/2012/06/programmer-time-translation-table/

  40. Thanks!
    @abelar_s / maitre-du-monde.fr
    HumanTalks 2013-11-12

  41. Questions?
    @abelar_s / maitre-du-monde.fr
    HumanTalks 2013-11-12
