Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Cleaning up a huge ruby application

Shia
April 20, 2019

Cleaning up a huge ruby application

Shia

April 20, 2019
Tweet

More Decks by Shia

Other Decks in Technology

Transcript

  1. Cleaning up a huge ruby
    application
    RubyKaigi 2019
    @shia

    View Slide

  2. •Sangyong Sim(@shia)
    •Cookpad Inc.
    •Software Engineer
    •i18n team of w.r-l.o, elixirschool
    !2

    View Slide

  3. !3

    View Slide

  4. Index
    •about cookpad_all
    •define the code want to delete
    •what makes it difficult
    •what we try & solve
    !4

    View Slide

  5. cookpad_all

    View Slide

  6. View Slide

  7. View Slide

  8. cookpad_all
    !8
    cookpad
    (web)
    pantry
    (api)
    kuroko
    (batch)
    papa
    (admin)
    mobile
    shared

    View Slide

  9. cookpad_all
    •Make a bug to fix a bug
    •Gems are too old version which have no feature we
    want, and it's hard to update
    •A lot of monkey patch to Object, String, gems
    !9

    View Slide

  10. Project Odaiba

    View Slide

  11. Project Odaiba
    •Clean up codes
    •Let it work on (docker) container
    •Extract system/services which mixed in it
    !11

    View Slide

  12. Project Odaiba
    •Clean up codes
    ‣ Today's topic
    •Let it work on (docker) container
    •Extract system/services which mixed in it
    !12

    View Slide

  13. the code we want to delete

    View Slide

  14. the code we may delete
    •Known unnecessary code
    •Unknown unnecessary code
    ‣ Executed meaningless code
    ‣ Not executed code
    !14

    View Slide

  15. the code we may delete
    •Known unnecessary code
    •Unknown unnecessary code
    ‣ Executed meaningless code
    ‣ Not executed code
    !15

    View Slide

  16. the code we may delete
    •Known unnecessary code
    •Unknown unnecessary code
    ‣ Executed meaningless code
    ‣ Not executed code
    !16

    View Slide

  17. Why delete code?

    View Slide

  18. why delete code?
    •Reduce code to understand
    •Reduce library dependency
    •Fast test, fast boot
    !18

    View Slide

  19. why delete code?
    •Better productivity
    !19

    View Slide

  20. What makes it difficult?

    View Slide

  21. What makes it difficult?
    •High cost(especially time)
    •Low priority
    •Continuously growing
    !21

    View Slide

  22. What makes it difficult?
    !22
    if condition_seem_not_to_be_false
    # ..
    else
    foo_without_bar
    end

    View Slide

  23. What makes it difficult?
    !23
    def foo
    # ...
    end
    def foo_with_bar
    # ...
    end
    alias_method_chain :foo, :bar
    alias_method_chain :foo, :bar
    # =>
    alias_method :foo_without_bar, :foo
    alias_method :foo, :foo_with_bar

    View Slide

  24. What makes it difficult?
    •High cost(especially time)
    •Low priority
    •Continuously growing
    !24

    View Slide

  25. What makes it difficult?
    •High cost(especially time)
    •Low priority
    •Continuously growing
    !25

    View Slide

  26. What makes it difficult?
    !26

    View Slide

  27. What makes it difficult?
    •High cost(especially time)
    •Low priority
    •Continuously growing
    !27

    View Slide

  28. What we can do?

    View Slide

  29. What can we do
    •Low priority
    ‣ Raise priority
    •Continuously growing
    •High cost(especially time)
    !29

    View Slide

  30. What can we do
    •Low priority
    ‣ Raise priority
    •Continuously growing
    ‣ Continuously delete code
    •High cost(especially time)
    !30

    View Slide

  31. What can we do
    •Low priority
    ‣ Raise priority
    •Continuously growing
    ‣ Continuously delete code
    •High cost(especially time)
    ‣ Auto detect potential unused code to reduce cost
    !31

    View Slide

  32. What we try

    View Slide

  33. Let continuously delete code

    View Slide

  34. KitchenCleaner
    •Auto-detect potential unused code
    •Guess people related these code
    •Open issue or ping assignee
    •Give context as much as possible for decision
    making
    !34

    View Slide

  35. KitchenCleaner
    !35

    View Slide

  36. KitchenCleaner
    !36

    View Slide

  37. KitchenCleaner
    !37

    View Slide

  38. Target code
    •Controller which has no pv in a year
    •Batch class which have no execution history in 3
    month
    •Unused (chanko) unit
    !38

    View Slide

  39. Target developer
    •List up authors from "git log"
    •Filter out who quited the job
    •Random assign from it
    !39

    View Slide

  40. Current status
    •Target code is reducing, but not fast
    •It leaves a good context
    !40

    View Slide

  41. Logging code execution on
    production

    View Slide

  42. !42
    ko1: It seems to be possible to find out unused codes on
    production environment by iseq lazy loading introduced
    in Ruby 2.4
    shia: Why not? Let's try!
    ko1: I wrote the patch on my way to work.

    View Slide

  43. Ship it!!!!

    View Slide

  44. IseqLogger
    •Logging when iseq first executed.
    !44

    View Slide

  45. Wait, What is "iseq"?

    View Slide

  46. InstructionSequence(iseq)
    •A compiled sequence of instructions for the Ruby
    Virtual Machine.
    !46

    View Slide

  47. InstructionSequence(iseq)
    !47
    # sample.rb
    class Cat
    def sleep?
    true
    end
    end

    View Slide

  48. InstructionSequence(iseq)
    !48
    > puts RubyVM::InstructionSequence.compile_file('sample.rb').disasm
    == disasm: #@sample.rb:1 (1,0)-(5,3)> (catch: FALSE)
    ...
    == disasm: #@sample.rb:1 (1,0)-(5,3)> (catch: FALSE)
    ...
    == disasm: # (catch: FALSE)
    ...

    View Slide

  49. InstructionSequence(iseq)
    !49

    -
    - sleep?

    View Slide

  50. Logging code execution
    !50
    # sample.rb
    class Cat # <- executed
    def sleep? # <- not executed
    true
    end
    def go_out # <- executed
    # ...
    end
    end
    Cat.new.go_out

    View Slide

  51. Logging code execution
    !51
    // https://github.com/ruby/ruby/blob/v2_4_3/vm_core.h#L415-L424
    static inline const rb_iseq_t *
    rb_iseq_check(const rb_iseq_t *iseq)
    {
    #if USE_LAZY_LOAD
    if (iseq->body == NULL) {
    rb_iseq_complete((rb_iseq_t *)iseq);
    }
    #endif
    return iseq;
    }

    View Slide

  52. Logging code execution
    !52
    static inline const rb_iseq_t *
    rb_iseq_check(const rb_iseq_t *iseq)
    {
    @@ -419,6 +429,11 @@ rb_iseq_check(const rb_iseq_t *iseq)
    if (iseq->body == NULL) {
    rb_iseq_complete((rb_iseq_t *)iseq);
    }
    +#endif
    +#if USE_EXECUTED_CHECK
    + if ((iseq->flags & ISEQ_FL_EXECUTED) == 0) {
    + rb_iseq_executed_check_dump((rb_iseq_t *)iseq);
    + }
    #endif

    View Slide

  53. Performance
    •Impact to performance is almost zero.
    •Boot time might be slower.
    ‣ In our case, logging via File I/O gives 2~3% slower
    boot time.
    !53

    View Slide

  54. Transfer logs
    app fluentd Redshift
    •log to file when it triggered.
    •{ created_at:, filename:, lineno:, label: }
    !54

    View Slide

  55. Generate report
    •What we need
    ‣ Which iseq is used or not
    ‣ Base on latest source code
    !55

    View Slide

  56. Logging code execution
    !56
    Iseq = Struct.new(:path, :lineno, :label)
    def traverse(results, iseq)
    results << Iseq.new(iseq.absolute_path.sub(PROJECT_BASE_PATH + '/', ''),
    iseq.first_lineno, iseq.label)
    iseq.each_child do |child|
    traverse(results, child)
    end
    results
    end
    iseq = RubyVM::InstructionSequence.compile_file(some_path)

    View Slide

  57. Logging code execution
    !57
    { created_at: ..., filename: 'sample.rb', lineno: 1, label: 'main' }
    { created_at: ..., filename: 'sample.rb', lineno: 1, label: '' }
    { created_at: ..., filename: 'sample.rb', lineno: 2, label: 'sleep?' }
    { created_at: ..., filename: 'sample.rb', lineno: 1, label: 'main' }
    { created_at: ..., filename: 'sample.rb', lineno: 1, label: '' }
    Executed iseqs
    Iseqs from latest code

    View Slide

  58. Logging code execution
    !58

    View Slide

  59. Logging code execution
    !59

    View Slide

  60. Logging code execution
    !60

    View Slide

  61. Logging code execution
    !61

    View Slide

  62. Judge which log is usable
    •How to judge which log is usable or not?
    !62

    View Slide

  63. Judge which log is usable
    •How to judge which log is usable or not?
    ‣ Let log usable if previous log have same filename +
    lineno + label with latest source code.
    !63

    View Slide

  64. Judge which log is usable
    !64
    # revision A
    # monthly_batch.rb
    # ...
    # recipe.rb
    class Recipe
    def author_name
    author.name
    end
    end
    # revision B
    # monthly_batch.rb
    # ...
    # recipe.rb
    class Recipe
    def author_name
    author.name
    end
    def author_id
    author.id
    end
    end

    View Slide

  65. Judge which log is usable
    !65
    Logs
    - 4/19, recipe.rb, L1, “”, revision A
    - 4/19, recipe.rb, L2, “”, revision A
    - 4/19, monthly_batch.rb, L1, “”, revision A
    - 4/20, recipe.rb, L1, “”, revision B
    - 4/20, recipe.rb, L6, “”, revision B
    # revision B
    # monthly_batch.rb
    # ...
    # recipe.rb
    class Recipe
    def author_name
    author.name
    end
    def author_id
    author.id
    end
    end

    View Slide

  66. Judge which log is usable
    •In our case, only 65% of files updated in a year
    •Assume file which rarely updated might have
    unused codes then frequently updated one
    !66

    View Slide

  67. Good point
    •We could find out which template rendered or not!!
    !67

    View Slide

  68. Bad point
    •Need to build Ruby
    •Using iseq for detecting could miss some cases we
    want
    !68

    View Slide

  69. Bad point
    !69
    # It has dead condition
    if some_method_returns_always_false
    not_called_in_if
    else
    always_called
    end
    # What we want
    always_called
    not_called_in_if

    View Slide

  70. Bad point
    !70
    # It has dead condition
    if some_method_returns_always_false
    not_called_in_if
    else
    always_called
    end
    not_called_in_if
    # What we want
    always_called
    not_called_in_if

    View Slide

  71. Bad point
    •Need to build Ruby
    •Using iseq for detecting could miss some cases we
    want
    !71

    View Slide

  72. Logging code execution on
    production per line!!

    View Slide

  73. !73
    mame: I think coverage is enough for that use case.
    shia: I'm worry about degrade performance by coverage,
    because it tries to hook when every line execution.
    mame: How about remove hook after it triggered once?

    View Slide

  74. Ship it!!!!

    View Slide

  75. First of all...

    View Slide

  76. Let cookpad_all
    to use Ruby 2.6!!!!!!

    View Slide

  77. Oneshot coverage
    •It just work as same as normal coverage, but it
    only check line executed or not.
    ‣ It removes hook after triggered.
    !77

    View Slide

  78. Oneshot coverage
    !78
    require "coverage"
    Coverage.start(oneshot_lines: true)
    load "test.rb"
    p Coverage.result
    # {"test.rb"=>{:oneshot_lines=>[2,
    10, 3, 4, 11]}}
    # 1: # test.rb
    # 2: def foo(n)
    # 3: if n <= 10
    # 4: p "n < 10"
    # 5: else
    # 6: p "n >= 10"
    # 7: end
    # 8: end
    # 9:
    # 10: foo(1)
    # 11: foo(2)

    View Slide

  79. Transfer logs
    app DynamoDB Redshift
    •Send log when:
    ‣ per 1 hour
    ‣ after batch execution finished
    •{ filename:, md5_hash:, executed_lines: }
    !79
    per 1 hour per 1 day

    View Slide

  80. Transfer logs
    app DynamoDB Redshift
    •Use dynamoDB:
    ‣ Sending logs per line is not practical approach in this case
    ‣ Send data to Redshift as daily snapshot
    •{ filename:, md5_hash:, executed_lines: }
    !80
    per 1 hour per 1 day
    Purge per day

    View Slide

  81. How to merge logs
    !81
    Case: need to merge
    - DynamoDB => "recipe.rb-somehash": [1,2,3,5,9]
    - Coverage.result => "recipe.rb-somehash": [1,2,3,6,7,8]
    - New data => "recipe.rb-somehash": [1,2,3,5,9,6,7,8]
    Case: Don't need to merge
    - DynamoDB => "recipe.rb-somehash": [1,2,3,5,9]
    - Coverage.result => "recipe.rb-somehash": [1,2,3]

    View Slide

  82. Transfer logs
    app DynamoDB Redshift
    •Use dynamoDB:
    ‣ Sending logs per line is not practical approach in this case
    ‣ Send data to Redshift as daily snapshot
    •{ filename:, md5_hash:, executed_lines: }
    !82
    per 1 hour per 1 day
    Purge per day

    View Slide

  83. Generate report
    •What we need
    ‣ Which line is used or not
    ‣ Base on latest source code
    !83

    View Slide

  84. Generate report
    !84

    View Slide

  85. Judge which log is usable
    •How to judge which log is usable or not?
    !85

    View Slide

  86. Judge which log is usable
    •In our case, only 65% of files updated in a year
    •Assume file which rarely updated might have
    unused codes then frequently updated one
    !86

    View Slide

  87. Judge which log is usable
    •Use md5 hash whether log is usable or not
    !87

    View Slide

  88. Good point
    •No need to build Ruby!
    •Per line logs give us much more information
    !88

    View Slide

  89. Bad point
    •Too many number of logs
    •Hard to find unused file (than IseqLogger)
    !89

    View Slide

  90. Bad point
    !90
    1: class Recipe
    2: def name
    3: @name
    4: end
    5:
    6: def author
    7: @author
    8: end
    9:end

    View Slide

  91. Bad point
    !91
    1: class Recipe
    2: def name
    3: @name
    -: end
    -:
    6: def author
    7: @author
    -: end
    -:end

    View Slide

  92. Bad point
    !92
    # oneshot coverage
    1: class Recipe
    2: def name
    3: @name
    -: end
    -:
    6: def author
    7: @author
    -: end
    -:end
    # IseqLogger
    class Recipe # <- Executed
    def name # <- Not executed
    @name
    end
    def author # <- Not executed
    @author
    end
    end

    View Slide

  93. Current status
    •Running on production
    !93

    View Slide

  94. Performance Impact
    !94

    View Slide

  95. Performance Impact
    !95

    View Slide

  96. Coverband
    •https://github.com/danmayer/coverband/
    •We don't use this:
    ‣ oneshot coverage is not supported, yet
    ‣ verbose toolkit to us
    ‣ might loss unused information with not changed file
    !96

    View Slide

  97. oneshot_coverage
    •https://github.com/riseshia/oneshot_coverage
    •Provide minimum toolkit for logging
    •Support (only) oneshot coverage mode
    •Focus on how to emit data from app
    !97

    View Slide

  98. Auto-detect unused codes on production
    is not almighty.

    View Slide

  99. Rome was not built in a day

    View Slide

  100. !100

    View Slide

  101. !101

    View Slide

  102. !102

    View Slide

  103. This is everyone's achievement.

    View Slide

  104. Summary

    View Slide

  105. !105
    •Low priority
    ‣ Let code cleaning be a part of project.
    •Continuously growing
    ‣ Let developers remove their own unused codes per month
    •High cost(especially time)
    ‣ Auto detect potential unused code on production to reduce cost
    Summary

    View Slide

  106. Conclusion
    !106

    View Slide

  107. It is necessary to work hard for cleaning codes
    !107

    View Slide

  108. It is necessary to work hard for cleaning codes
    !108
    It is necessary to work little more

    View Slide