Saving People from Typos

Yuki Nishijima

June 05, 2015

Transcript

  1. Saving People from Typos

  2. @yuki YUKI NISHIJIMA

  3. @yuki YUKI NISHIJIMA
     Software Engineer at …

  4. @yuki YUKI NISHIJIMA
     Software Engineer at …
     Committer

  5. @yuki YUKI NISHIJIMA
     Maintainer of kaminari

  6. @yuki YUKI NISHIJIMA
     Maintainer of kaminari
     Creator of did_you_mean
  10. "Yuki".starts_with?("Y")
      # => NoMethodError: undefined method
      #    `starts_with?' for "Yuki":String

      "Yuki".start_with?("Y") # => true
  13. WHY CAN'T RUBY DO THE SAME?

  14. RUBY IS DESIGNED TO MAKE PROGRAMMERS HAPPY

  16. The did_you_mean gem
  19. require "did_you_mean"

      "Yuki".starts_with?("Y")
      # => NoMethodError: undefined method
      #    `starts_with?' for "Yuki":String
      #
      #    Did you mean? start_with?
      #

      "Yuki".start_with?("Y") # => true
  20. HOW DOES IT WORK?

  21. • A spell checker
      • Monkey patches

  25. Input (with noise) → Spell checker → Output

  30. INSIDE THE SPELL CHECKER
      • Dictionary
      • Control mechanism
      • Optimization
  31. DICTIONARY
      • A set of words
      • A spell checker may have several dictionaries
      • The content of a dictionary may differ depending on the type of the spell checker

  32. CONTROL MECHANISM
      • Picks up the most likely spelling correction(s) for the input
      • A spell checker may have several control mechanisms
      • Uses one or more metrics
        • Similarity between strings (e.g. Levenshtein, Jaro-Winkler)
        • n-gram
      • Naively scanning all the words in the dictionary would be painfully slow

  33. OPTIMIZATION
      • Improves performance and/or accuracy
      • Optimization techniques may be context-specific
        • Grammar analysis
        • Statistical model
        • Pronunciation-based correction
  34. “They weere traveling through the night”

  36. “They weere traveling through the night”
      • were: ✅ grammatically correct
      • where: incorrect

  37. “They no how to do it”

  38. “They no how to do it”
      • know: ✅ same pronunciation
  39. The did_you_mean gem
      • Dictionary
        • A set of symbols
      • Control mechanism
        • Uses Levenshtein distance to correct mistyped words
        • Uses Jaro-Winkler distance to correct misspelt words
      • Optimization
        • Context-based dictionary

  40. Dictionary: Symbol.all_symbols

  41. OPTIMIZATION
      • NameError (uninitialized constant) → ClassNameChecker
      • NameError (other errors) → VariableNameChecker
      • NoMethodError → MethodNameChecker

  42. CONTROL MECHANISM
      • Mistype Correction
      • Misspell Correction
  43. Why Two Types of Control Mechanism Are Needed
      Mistyped words
      • The correct spelling is correctly remembered, but one or more incorrect letters are typed by mistake
      • Corrected by the did_you_mean gem using Levenshtein distance
      Misspelt words
      • The correct spelling is misremembered or not remembered at all
      • The first character is always correct (Yannakoudakis, E.J. and Fawthrop, D., "The rules of spelling errors," Information Processing and Management, vol. 19, no. 2, pp. 87-99, 1983)
      • Corrected by the did_you_mean gem using Jaro-Winkler distance

  44. Mistype Correction, Misspell Correction
  45. Levenshtein Distance

  47. Levenshtein Distance: start_with → starts_with
      • ✅ insertion
      • The distance is 1

  50. Levenshtein Distance: first_name → full_name
      • ✅ ✅ ✅ substitution
      • ✅ deletion
      • The distance is 4
  51. Mistype Correction

          threshold = (input.length * 0.25).ceil
          dictionary.select do |word|
            Levenshtein.distance(word, input) <= threshold
          end
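The slide calls a `Levenshtein.distance` helper that the transcript never defines. The following is a minimal sketch, assuming a plain dynamic-programming edit distance; the `Levenshtein` module body, the `mistype_corrections` wrapper, and the sample dictionary are illustrative and not taken from the gem.

```ruby
# Hypothetical stand-in for the Levenshtein helper used on the slide.
module Levenshtein
  # Edit distance counting insertions, deletions and substitutions.
  def self.distance(a, b)
    prev = (0..b.length).to_a            # distances from "" to each prefix of b
    a.each_char.with_index(1) do |ca, i|
      curr = [i]                         # distance from a[0, i] to ""
      b.each_char.with_index(1) do |cb, j|
        cost = ca == cb ? 0 : 1
        curr << [prev[j] + 1,            # deletion
                 curr[j - 1] + 1,        # insertion
                 prev[j - 1] + cost].min # substitution (or match)
      end
      prev = curr
    end
    prev.last
  end
end

# Same rule as the slide: keep words within 25% of the input length.
def mistype_corrections(input, dictionary)
  threshold = (input.length * 0.25).ceil
  dictionary.select { |word| Levenshtein.distance(word, input) <= threshold }
end

dictionary = %w[start_with? end_with? first_name full_name]

p Levenshtein.distance("start_with", "starts_with") # => 1
p Levenshtein.distance("first_name", "full_name")   # => 4
p mistype_corrections("starts_with?", dictionary)   # => ["start_with?"]
```

With a 12-character input the threshold is 3, so only `start_with?` (distance 1) survives the filter.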
  55. Jaro-Winkler Distance

  56. Jaro Distance + Prefix Bonus

  58. Jaro Distance
      m: the number of matching characters
      t: half the number of transpositions

  61. Jaro Distance: first_name, frist_name ???

  62. Matching window
      l1 = length of string1
      l2 = length of string2

  63. Jaro Distance: first_name, frist_name
      ⛔ Matching window = (10 / 2) - 1 = 4

  64. Jaro Distance: first_name, frist_name
      m is 10

  66. Jaro Distance: first_name, frist_name
      ✅ ✅ t is 1 (half the number of transpositions)

  69. Jaro Distance
      m: 10 (the number of matching letters)
      t: 1 (half the number of transpositions)
      The distance is 0.9666…
  71. Prefix Bonus: first_name, frist_name

  73. Prefix Bonus: first_name, frist_name ? ? ? ?

  75. Prefix Bonus: first_name, frist_name ⛔

  77. Prefix Bonus
      w = 0.1 (weight)
      mp = number of prefix matches
      j = Jaro distance
      The prefix bonus is 0.0033…

  78. Jaro-Winkler Distance
      Jaro-Winkler distance = 0.9666… + 0.0033… = 0.9699…
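The formulas on these slides were images and did not survive in the transcript, so here is a rough sketch of the calculation using the commonly published definitions. It assumes the matching window is `max(l1, l2) / 2 - 1`, the Jaro value is the mean of `m / l1`, `m / l2` and `(m - t) / m`, the prefix bonus is `mp * w * (1 - j)`, and at most 4 prefix characters count. The method names and the `max_prefix` parameter are illustrative, not the gem's API.

```ruby
# Jaro value for two strings (1.0 means identical).
def jaro_distance(s1, s2)
  return 1.0 if s1 == s2
  return 0.0 if s1.empty? || s2.empty?

  # Characters only match if they are at most `window` positions apart.
  window = [[s1.length, s2.length].max / 2 - 1, 0].max
  s1_matched = Array.new(s1.length, false)
  s2_matched = Array.new(s2.length, false)
  m = 0

  s1.each_char.with_index do |char, i|
    lo = [i - window, 0].max
    hi = [i + window, s2.length - 1].min
    (lo..hi).each do |j|
      next if s2_matched[j] || s2[j] != char
      s1_matched[i] = s2_matched[j] = true
      m += 1
      break
    end
  end
  return 0.0 if m.zero?

  # t is half the number of matched characters that appear in a different order.
  a = s1.chars.select.with_index { |_, i| s1_matched[i] }
  b = s2.chars.select.with_index { |_, j| s2_matched[j] }
  t = a.zip(b).count { |x, y| x != y } / 2

  (m.fdiv(s1.length) + m.fdiv(s2.length) + (m - t).fdiv(m)) / 3
end

# Jaro value plus a bonus for a shared prefix.
def jaro_winkler_distance(s1, s2, weight: 0.1, max_prefix: 4)
  j  = jaro_distance(s1, s2)
  mp = s1.chars.zip(s2.chars).take(max_prefix).take_while { |x, y| x == y }.size
  j + mp * weight * (1 - j)
end

puts jaro_distance("first_name", "frist_name").round(4)         # 0.9667
puts jaro_winkler_distance("first_name", "frist_name").round(4) # 0.97
```

For `first_name` and `frist_name` this gives m = 10 and t = 1, and only the leading `f` counts toward the prefix bonus, which matches the numbers on the slides.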

  79. Misspell Correction
      • Picks up the closest word only if
        • No mistype corrections are found
        • The Levenshtein distance is lower than the length of the shorter string
      d = Levenshtein distance between 2 strings
      l1 = length of string1
      l2 = length of string2

  80. The did_you_mean's Spell Checker: http://git.io/vRdYW

  81. • A spell checker ✅
      • Monkey patches
  84. • NameError
      • NoMethodError
      Spell checker: User input + Dictionary → “Did you mean? …”
  85. OVERRIDING THE ERROR MESSAGE

          module DidYouMean
            module NameErrorExtension
              prepend_features NameError

              def to_s
                super + Formatter.new(corrections).to_s
              rescue
                super
              end

              def corrections
                SPELL_CHECKERS[self.class.to_s].new(self).corrections
              end
            end
          end
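The same prepend-and-`super` pattern can be tried in isolation without touching `NameError`. The sketch below is an illustration only: `LookupFailure`, `SuggestionExtension` and the hard-coded suggestion are hypothetical names, not part of did_you_mean.

```ruby
# Hypothetical error class, used so that built-in errors stay untouched.
class LookupFailure < StandardError; end

module SuggestionExtension
  # Runs before the original #to_s because the module is prepended.
  def to_s
    super + "\n\n    Did you mean? #{suggestion}"
  rescue
    super # fall back to the plain message if building the suggestion fails
  end

  # A real spell checker would compute this; it is hard-coded here.
  def suggestion
    "start_with?"
  end
end

LookupFailure.prepend(SuggestionExtension)

error = LookupFailure.new("undefined method `starts_with?'")
puts error.message
```

`Exception#message` returns the result of `to_s`, so overriding `to_s` is enough to change what the user sees.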
  88. INSIDE THE INITIALIZER

          module DidYouMean
            class ANameChecker
              include SpellCheckable

              def initialize(exception)
                # pull out the user input and generate
                # a dictionary using the exception object.
              end
            end
          end

  90. NameError#name

          begin
            doesnt_exist
          rescue NameError => error
            error.name # => :doesnt_exist
          end

          begin
            DoesntExist
          rescue NameError => error
            error.name # => :DoesntExist
          end

          begin
            @@doesnt_exist
          rescue NameError => error
            error.name # => :@@doesnt_exist
          end

  91. NameError#name

          begin
            doesnt_exist("argument")
          rescue NoMethodError => error
            error.name # => :doesnt_exist
          end

          begin
            self.doesnt_exist
          rescue NoMethodError => error
            error.name # => :doesnt_exist
          end

  92. INSIDE THE INITIALIZER

          module DidYouMean
            class ANameChecker
              include SpellCheckable

              def initialize(exception)
                @name  = exception.name # user input
                @names = ????           # dictionary
              end
            end
          end

  93. • NameError (uninitialized constant) → ClassNameChecker
      • NameError (other errors) → VariableNameChecker
      • NoMethodError → MethodNameChecker

  95. METHOD NAMES

          module DidYouMean
            class MethodNameChecker
              include SpellCheckable

              def initialize(no_method_error)
                # user input
                @name = no_method_error.name
                # dictionary
                @method_names = no_method_error.receiver.methods
              end
            end
          end

  97. NameError#receiver (Ruby 2.3!)

          string = "receiver"
          begin
            string.doesnt_exist
          rescue NameError => error
            error.receiver == string # => true
          end

      • Used to be implemented as a C extension
      • Is now part of Ruby
      • Returns the receiver that the method was called on

  99. METHOD NAMES

          module DidYouMean
            class MethodNameChecker
              include SpellCheckable

              def initialize(no_method_error)
                # user input
                @name = no_method_error.name
                # dictionary
                @method_names = no_method_error.receiver.methods
              end

              def candidates
                { @name => @method_names }
              end
            end
          end
  100. • NameError (uninitialized constant) → ClassNameChecker
       • NameError (other errors) → VariableNameChecker
       • NoMethodError → MethodNameChecker ✅
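To see what the method name checker has to work with, the two inputs can be inspected directly on a rescued exception. This is a small sketch assuming Ruby 2.3 or later (for `NameError#receiver`) and no library that defines `String#starts_with?`; the variable names `user_input` and `dictionary` are illustrative.

```ruby
begin
  "Yuki".starts_with?("Y")
rescue NoMethodError => error
  user_input = error.name             # the mistyped method name, as a Symbol
  dictionary = error.receiver.methods # every public method the receiver responds to
end

puts user_input.inspect                 # :starts_with?
puts dictionary.include?(:start_with?)  # true
puts error.receiver.inspect             # "Yuki"
```

The dictionary depends on the receiver, so only names that could actually be called on that object are ever suggested.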

  102. VARIABLE NAMES

           module DidYouMean
             class VariableNameChecker
               include SpellCheckable

               def initialize(name_error)
                 # user input
                 @name = name_error.name.to_s

                 # dictionary
                 r = name_error.receiver
                 @method_names = r.methods + r.private_methods
                 @ivar_names   = r.instance_variables
                 @cvar_names   = r.class.class_variables
                 @cvar_names  += r.class_variables if r.kind_of?(Module)
                 @lvar_names   = name_error.local_variables
               end

               def candidates
                 { @name => (@lvar_names + @method_names + @ivar_names + @cvar_names) }
               end
             end
           end

  105. • NameError (uninitialized constant) → ClassNameChecker
       • NameError (other errors) → VariableNameChecker ✅
       • NoMethodError → MethodNameChecker ✅

  107. CLASS NAMES

           module DidYouMean
             class ClassNameChecker
               include SpellCheckable

               def initialize(exception)
                 @class_name, @receiver = exception.name, exception.receiver
               end

               def candidates
                 { @class_name => class_names }
               end

               # generates a dictionary
               def class_names
                 scopes.flat_map do |scope|
                   scope.constants.map do |constant|
                     scope == Object ? constant : "#{scope}::#{constant}"
                   end
                 end
               end

               def scopes
                 @receiver.to_s.split("::").inject([Object]) do |_scopes, scope|
                   _scopes << _scopes.last.const_get(scope)
                 end.uniq
               end
             end
           end

  108. CLASS NAMES: TL;DR
  111. CLASS NAMES

           ...
           class Person
             ...
             def address
               Address.new(raw_address)
             end

             class Address
               ...
               def zipcode
                 ZipCode.new(raw_zipcode) # => NameError
               end

               class Zipcode
                 ...
               end
             end
           end

       Object.constants + Person.constants + Address.constants
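The scope walk from the class name checker can be run on its own against a nested class like the one on the slide. This is a sketch under assumptions: the standalone `scopes` and `class_names` methods take the receiver as an argument and return strings throughout, which differs slightly from the slide's instance methods.

```ruby
class Person
  class Address
    class Zipcode; end
  end
end

# Every enclosing scope of the receiver, starting from Object.
def scopes(receiver)
  receiver.to_s.split("::").inject([Object]) do |found, name|
    found << found.last.const_get(name)
  end.uniq
end

# Constant names visible from those scopes, qualified below the top level.
def class_names(receiver)
  scopes(receiver).flat_map do |scope|
    scope.constants.map do |constant|
      scope == Object ? constant.to_s : "#{scope}::#{constant}"
    end
  end
end

p scopes(Person::Address) # => [Object, Person, Person::Address]

names = class_names(Person::Address)
p names.include?("Person::Address::Zipcode") # => true
```

Because `Person::Address::Zipcode` ends up in the dictionary, a mistyped `ZipCode` inside `Address` has a nearby candidate to match against.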
  112. • NameError (uninitialized constant) → ClassNameChecker ✅
       • NameError (other errors) → VariableNameChecker ✅
       • NoMethodError → MethodNameChecker ✅

  113. • NameError
       • NoMethodError
       Spell checker: User input + Dictionary → “Did you mean? …” ✅ ✅

  114. • A spell checker ✅
       • Monkey patches ✅

  115. https://rubygems.org/gems/did_you_mean
       did_you_mean 1.0.0.rc1: Now available!

  116. Ruby will ship with did_you_mean. Coming soon!

  117. THANKS