Cleaning up a huge ruby application

0f54aa7d6df3dbcbde91a02d0e2f75d5?s=47 Shia
April 20, 2019

Cleaning up a huge ruby application

0f54aa7d6df3dbcbde91a02d0e2f75d5?s=128

Shia

April 20, 2019
Tweet

Transcript

  1. 3.

    !3

  2. 4.

    Index •about cookpad_all •define the code want to delete •what

    makes it difficult •what we try & solve !4
  3. 6.
  4. 7.
  5. 9.

    cookpad_all •Make a bug to fix a bug •Gems are

    too old version which have no feature we want, and it's hard to update •A lot of monkey patch to Object, String, gems !9
  6. 11.

    Project Odaiba •Clean up codes •Let it work on (docker)

    container •Extract system/services which mixed in it !11
  7. 12.

    Project Odaiba •Clean up codes ‣ Today's topic •Let it

    work on (docker) container •Extract system/services which mixed in it !12
  8. 14.

    the code we may delete •Known unnecessary code •Unknown unnecessary

    code ‣ Executed meaningless code ‣ Not executed code !14
  9. 15.

    the code we may delete •Known unnecessary code •Unknown unnecessary

    code ‣ Executed meaningless code ‣ Not executed code !15
  10. 16.

    the code we may delete •Known unnecessary code •Unknown unnecessary

    code ‣ Executed meaningless code ‣ Not executed code !16
  11. 23.

    What makes it difficult? !23 def foo # ... end

    def foo_with_bar # ... end alias_method_chain :foo, :bar alias_method_chain :foo, :bar # => alias_method :foo_without_bar, :foo alias_method :foo, :foo_with_bar
  12. 29.

    What can we do •Low priority ‣ Raise priority •Continuously

    growing •High cost(especially time) !29
  13. 30.

    What can we do •Low priority ‣ Raise priority •Continuously

    growing ‣ Continuously delete code •High cost(especially time) !30
  14. 31.

    What can we do •Low priority ‣ Raise priority •Continuously

    growing ‣ Continuously delete code •High cost(especially time) ‣ Auto detect potential unused code to reduce cost !31
  15. 34.

    KitchenCleaner •Auto-detect potential unused code •Guess people related these code

    •Open issue or ping assignee •Give context as much as possible for decision making !34
  16. 38.

    Target code •Controller which has no pv in a year

    •Batch class which have no execution history in 3 month •Unused (chanko) unit !38
  17. 39.

    Target developer •List up authors from "git log" •Filter out

    who quited the job •Random assign from it !39
  18. 42.

    !42 ko1: It seems to be possible to find out

    unused codes on production environment by iseq lazy loading introduced in Ruby 2.4 shia: Why not? Let's try! ko1: I wrote the patch on my way to work.
  19. 48.

    InstructionSequence(iseq) !48 > puts RubyVM::InstructionSequence.compile_file('sample.rb').disasm == disasm: #<ISeq:<main>@sample.rb:1 (1,0)-(5,3)> (catch:

    FALSE) ... == disasm: #<ISeq:<class:Cat>@sample.rb:1 (1,0)-(5,3)> (catch: FALSE) ... == disasm: #<ISeq:sleep?@sample.rb:2 (2,2)-(4,5)> (catch: FALSE) ...
  20. 50.

    Logging code execution !50 # sample.rb class Cat # <-

    executed def sleep? # <- not executed true end def go_out # <- executed # ... end end Cat.new.go_out
  21. 51.

    Logging code execution !51 // https://github.com/ruby/ruby/blob/v2_4_3/vm_core.h#L415-L424 static inline const rb_iseq_t

    * rb_iseq_check(const rb_iseq_t *iseq) { #if USE_LAZY_LOAD if (iseq->body == NULL) { rb_iseq_complete((rb_iseq_t *)iseq); } #endif return iseq; }
  22. 52.

    Logging code execution !52 static inline const rb_iseq_t * rb_iseq_check(const

    rb_iseq_t *iseq) { @@ -419,6 +429,11 @@ rb_iseq_check(const rb_iseq_t *iseq) if (iseq->body == NULL) { rb_iseq_complete((rb_iseq_t *)iseq); } +#endif +#if USE_EXECUTED_CHECK + if ((iseq->flags & ISEQ_FL_EXECUTED) == 0) { + rb_iseq_executed_check_dump((rb_iseq_t *)iseq); + } #endif
  23. 53.

    Performance •Impact to performance is almost zero. •Boot time might

    be slower. ‣ In our case, logging via File I/O gives 2~3% slower boot time. !53
  24. 54.

    Transfer logs app fluentd Redshift •log to file when it

    triggered. •{ created_at:, filename:, lineno:, label: } !54
  25. 55.

    Generate report •What we need ‣ Which iseq is used

    or not ‣ Base on latest source code !55
  26. 56.

    Logging code execution !56 Iseq = Struct.new(:path, :lineno, :label) def

    traverse(results, iseq) results << Iseq.new(iseq.absolute_path.sub(PROJECT_BASE_PATH + '/', ''), iseq.first_lineno, iseq.label) iseq.each_child do |child| traverse(results, child) end results end iseq = RubyVM::InstructionSequence.compile_file(some_path)
  27. 57.

    Logging code execution !57 { created_at: ..., filename: 'sample.rb', lineno:

    1, label: 'main' } { created_at: ..., filename: 'sample.rb', lineno: 1, label: '<class:Cat>' } { created_at: ..., filename: 'sample.rb', lineno: 2, label: 'sleep?' } { created_at: ..., filename: 'sample.rb', lineno: 1, label: 'main' } { created_at: ..., filename: 'sample.rb', lineno: 1, label: '<class:Cat>' } Executed iseqs Iseqs from latest code
  28. 63.

    Judge which log is usable •How to judge which log

    is usable or not? ‣ Let log usable if previous log have same filename + lineno + label with latest source code. !63
  29. 64.

    Judge which log is usable !64 # revision A #

    monthly_batch.rb # ... # recipe.rb class Recipe def author_name author.name end end # revision B # monthly_batch.rb # ... # recipe.rb class Recipe def author_name author.name end def author_id author.id end end
  30. 65.

    Judge which log is usable !65 Logs - 4/19, recipe.rb,

    L1, “<class:Recipe>”, revision A - 4/19, recipe.rb, L2, “<class:Recipe>”, revision A - 4/19, monthly_batch.rb, L1, “<class:MonthlyBatch>”, revision A - 4/20, recipe.rb, L1, “<class:Recipe>”, revision B - 4/20, recipe.rb, L6, “<class:Recipe>”, revision B # revision B # monthly_batch.rb # ... # recipe.rb class Recipe def author_name author.name end def author_id author.id end end
  31. 66.

    Judge which log is usable •In our case, only 65%

    of files updated in a year •Assume file which rarely updated might have unused codes then frequently updated one !66
  32. 68.
  33. 69.

    Bad point !69 # It has dead condition if some_method_returns_always_false

    not_called_in_if else always_called end # What we want always_called not_called_in_if
  34. 70.

    Bad point !70 # It has dead condition if some_method_returns_always_false

    not_called_in_if else always_called end not_called_in_if # What we want always_called not_called_in_if
  35. 71.
  36. 73.

    !73 mame: I think coverage is enough for that use

    case. shia: I'm worry about degrade performance by coverage, because it tries to hook when every line execution. mame: How about remove hook after it triggered once?
  37. 77.

    Oneshot coverage •It just work as same as normal coverage,

    but it only check line executed or not. ‣ It removes hook after triggered. !77
  38. 78.

    Oneshot coverage !78 require "coverage" Coverage.start(oneshot_lines: true) load "test.rb" p

    Coverage.result # {"test.rb"=>{:oneshot_lines=>[2, 10, 3, 4, 11]}} # 1: # test.rb # 2: def foo(n) # 3: if n <= 10 # 4: p "n < 10" # 5: else # 6: p "n >= 10" # 7: end # 8: end # 9: # 10: foo(1) # 11: foo(2)
  39. 79.

    Transfer logs app DynamoDB Redshift •Send log when: ‣ per

    1 hour ‣ after batch execution finished •{ filename:, md5_hash:, executed_lines: } !79 per 1 hour per 1 day
  40. 80.

    Transfer logs app DynamoDB Redshift •Use dynamoDB: ‣ Sending logs

    per line is not practical approach in this case ‣ Send data to Redshift as daily snapshot •{ filename:, md5_hash:, executed_lines: } !80 per 1 hour per 1 day Purge per day
  41. 81.

    How to merge logs !81 Case: need to merge -

    DynamoDB => "recipe.rb-somehash": [1,2,3,5,9] - Coverage.result => "recipe.rb-somehash": [1,2,3,6,7,8] - New data => "recipe.rb-somehash": [1,2,3,5,9,6,7,8] Case: Don't need to merge - DynamoDB => "recipe.rb-somehash": [1,2,3,5,9] - Coverage.result => "recipe.rb-somehash": [1,2,3]
  42. 82.

    Transfer logs app DynamoDB Redshift •Use dynamoDB: ‣ Sending logs

    per line is not practical approach in this case ‣ Send data to Redshift as daily snapshot •{ filename:, md5_hash:, executed_lines: } !82 per 1 hour per 1 day Purge per day
  43. 83.

    Generate report •What we need ‣ Which line is used

    or not ‣ Base on latest source code !83
  44. 86.

    Judge which log is usable •In our case, only 65%

    of files updated in a year •Assume file which rarely updated might have unused codes then frequently updated one !86
  45. 88.

    Good point •No need to build Ruby! •Per line logs

    give us much more information !88
  46. 89.

    Bad point •Too many number of logs •Hard to find

    unused file (than IseqLogger) !89
  47. 90.

    Bad point !90 1: class Recipe 2: def name 3:

    @name 4: end 5: 6: def author 7: @author 8: end 9:end
  48. 91.

    Bad point !91 1: class Recipe 2: def name 3:

    @name -: end -: 6: def author 7: @author -: end -:end
  49. 92.

    Bad point !92 # oneshot coverage 1: class Recipe 2:

    def name 3: @name -: end -: 6: def author 7: @author -: end -:end # IseqLogger class Recipe # <- Executed def name # <- Not executed @name end def author # <- Not executed @author end end
  50. 96.

    Coverband •https://github.com/danmayer/coverband/ •We don't use this: ‣ oneshot coverage is

    not supported, yet ‣ verbose toolkit to us ‣ might loss unused information with not changed file !96
  51. 100.
  52. 101.
  53. 102.
  54. 104.
  55. 105.

    !105 •Low priority ‣ Let code cleaning be a part

    of project. •Continuously growing ‣ Let developers remove their own unused codes per month •High cost(especially time) ‣ Auto detect potential unused code on production to reduce cost Summary
  56. 108.

    It is necessary to work hard for cleaning codes !108

    It is necessary to work little more