My favourite algorithm

A lightning talk about my favourite algorithm, the Burrows–Wheeler transform (http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform).

Given at Brighton Ruby Conference 2014 (http://lanyrd.com/2014/brightonruby/) and BaRuCo 2014 (http://lanyrd.com/2014/baruco/).

Tom Stuart

July 21, 2014

Transcript

  1. MY
    FAVOURITE
    ALGORITHM
    @tomstuart / BaRuCo / 2014-09-12

  2. BURROWS—
    WHEELER
    TRANSFORM

  3. B A N A N A

  4. B A N A N A
    A N A N A B
    N A N A B A
    A N A B A N
    N A B A N A
    A B A N A N

  5. A B A N A N
    A N A B A N
    A N A N A B
    B A N A N A
    N A B A N A
    N A N A B A

  6. A B A N A N
    A N A B A N
    A N A N A B
    B A N A N A
    N A B A N A
    N A N A B A
    “NNBAAA”

  7. def burrows_wheeler(string)
      chars = string.chars

      chars.each_index.
        map(&chars.method(:rotate)).
        sort.map(&:last).join
    end

  8. >> string = 'banana'
    => "banana"

  9. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

  10. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

    >> chars.each_index
    => #<Enumerator: ["b", "a", "n", "a", "n", "a"]:each_index>

  11. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

    >> chars.each_index.
         map(&chars.method(:rotate))
    => [
         ["b", "a", "n", "a", "n", "a"],
         ["a", "n", "a", "n", "a", "b"],
         ["n", "a", "n", "a", "b", "a"],
         ["a", "n", "a", "b", "a", "n"],
         ["n", "a", "b", "a", "n", "a"],
         ["a", "b", "a", "n", "a", "n"]
       ]

  12. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

    >> chars.each_index.
         map(&chars.method(:rotate)).
         sort
    => [
         ["a", "b", "a", "n", "a", "n"],
         ["a", "n", "a", "b", "a", "n"],
         ["a", "n", "a", "n", "a", "b"],
         ["b", "a", "n", "a", "n", "a"],
         ["n", "a", "b", "a", "n", "a"],
         ["n", "a", "n", "a", "b", "a"]
       ]

  13. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

    >> chars.each_index.
         map(&chars.method(:rotate)).
         sort.map(&:last)
    => ["n", "n", "b", "a", "a", "a"]

  14. >> string = 'banana'
    => "banana"

    >> chars = string.chars
    => ["b", "a", "n", "a", "n", "a"]

    >> chars.each_index.
         map(&chars.method(:rotate)).
         sort.map(&:last).join
    => "nnbaaa"

  15. >> burrows_wheeler('banana')
    => "nnbaaa"
    >> burrows_wheeler('The rain in Spain stays mainly in the plain')
    => "nnyseenn nrplmthhtT aa aapn iiiiiiS  y s la"
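
The slides do not spell out why this output is interesting. One well-known property of the Burrows–Wheeler transform is that it tends to pull identical characters together into runs, which suits run-length style compression. The sketch below counts runs before and after the transform; `run_count` is a hypothetical helper, not something from the talk, and `burrows_wheeler` is restated from slide 7 so the block runs on its own.

```ruby
# Restated from slide 7 so this example is self-contained.
def burrows_wheeler(string)
  chars = string.chars
  chars.each_index.map(&chars.method(:rotate)).sort.map(&:last).join
end

# Hypothetical helper: number of maximal runs of identical characters.
def run_count(string)
  string.chars.chunk_while { |a, b| a == b }.count
end

puts run_count('banana')                   # 6 runs: b, a, n, a, n, a
puts run_count(burrows_wheeler('banana'))  # 3 runs: nn, b, aaa

text = 'The rain in Spain stays mainly in the plain'
puts run_count(text)
puts run_count(burrows_wheeler(text))
```

Fewer runs for the same characters is only a rough indicator; how much it helps depends on the input.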

  16. !

  17. ? ? ? ? ? ?
    ? ? ? ? ? ?
    ? ? ? ? ? ?
    ? ? ? ? ? ?
    ? ? ? ? ? ?
    ? ? ? ? ? ?
    “NNBAAA”

  18. ? ? ? ? ? N
    ? ? ? ? ? N
    ? ? ? ? ? B
    ? ? ? ? ? A
    ? ? ? ? ? A
    ? ? ? ? ? A
    “NNBAAA”

  19. N ? ? ? ? ?
    N ? ? ? ? ?
    B ? ? ? ? ?
    A ? ? ? ? ?
    A ? ? ? ? ?
    A ? ? ? ? ?
    “NNBAAA”

  20. A ? ? ? ? ?
    A ? ? ? ? ?
    A ? ? ? ? ?
    B ? ? ? ? ?
    N ? ? ? ? ?
    N ? ? ? ? ?
    “NNBAAA”

  21. A ? ? ? ? N
    A ? ? ? ? N
    A ? ? ? ? B
    B ? ? ? ? A
    N ? ? ? ? A
    N ? ? ? ? A
    “NNBAAA”

  22. N A ? ? ? ?
    N A ? ? ? ?
    B A ? ? ? ?
    A B ? ? ? ?
    A N ? ? ? ?
    A N ? ? ? ?
    “NNBAAA”

  23. A B ? ? ? ?
    A N ? ? ? ?
    A N ? ? ? ?
    B A ? ? ? ?
    N A ? ? ? ?
    N A ? ? ? ?
    “NNBAAA”

  24. A B ? ? ? N
    A N ? ? ? N
    A N ? ? ? B
    B A ? ? ? A
    N A ? ? ? A
    N A ? ? ? A
    “NNBAAA”

  25. N A B ? ? ?
    N A N ? ? ?
    B A N ? ? ?
    A B A ? ? ?
    A N A ? ? ?
    A N A ? ? ?
    “NNBAAA”

  26. A B A ? ? ?
    A N A ? ? ?
    A N A ? ? ?
    B A N ? ? ?
    N A B ? ? ?
    N A N ? ? ?
    “NNBAAA”

  27. A B A ? ? N
    A N A ? ? N
    A N A ? ? B
    B A N ? ? A
    N A B ? ? A
    N A N ? ? A
    “NNBAAA”

  28. N A B A ? ?
    N A N A ? ?
    B A N A ? ?
    A B A N ? ?
    A N A B ? ?
    A N A N ? ?
    “NNBAAA”

  29. A B A N ? ?
    A N A B ? ?
    A N A N ? ?
    B A N A ? ?
    N A B A ? ?
    N A N A ? ?
    “NNBAAA”

  30. A B A N ? N
    A N A B ? N
    A N A N ? B
    B A N A ? A
    N A B A ? A
    N A N A ? A
    “NNBAAA”

  31. N A B A N ?
    N A N A B ?
    B A N A N ?
    A B A N A ?
    A N A B A ?
    A N A N A ?
    “NNBAAA”

  32. A B A N A ?
    A N A B A ?
    A N A N A ?
    B A N A N ?
    N A B A N ?
    N A N A B ?
    “NNBAAA”

  33. A B A N A N
    A N A B A N
    A N A N A B
    B A N A N A
    N A B A N A
    N A N A B A
    “NNBAAA”

  34. A B A N A N
    A N A B A N
    A N A N A B
    B A N A N A
    N A B A N A
    N A N A B A
    “NNBAAA”
    B A N A N A

  35. def inverse_burrows_wheeler(string)
      chars = string.chars

      chars.
        inject([]) { |table| chars.zip(table).sort }.
        map(&:join)
    end
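
The `inject` one-liner is compact enough to hide the table-filling steps from the preceding slides. Here is a more explicit sketch of the same prepend-then-sort procedure that keeps every intermediate table. The name `inverse_table_steps` is made up for illustration, and rows are held as strings rather than the nested arrays the slide's version builds.

```ruby
# Hypothetical, more verbose variant of the slide's inverse_burrows_wheeler.
# Returns the table after each prepend-and-sort pass, with rows as strings.
def inverse_table_steps(string)
  chars = string.chars
  table = Array.new(chars.length, '') # one empty row per character

  chars.length.times.map do
    # Prepend the transformed string as a new first column, then re-sort the rows.
    table = chars.zip(table).map(&:join).sort
  end
end

inverse_table_steps('nnbaaa').each { |rows| puts rows.join(' ') }
# a a a b n n
# ab an an ba na na
# aba ana ana ban nab nan
# aban anab anan bana naba nana
# abana anaba anana banan naban nanab
# abanan anaban ananab banana nabana nanaba
```

The last row of output is the fully rebuilt table, one rotation per entry.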

  36. >> inverse_burrows_wheeler('nnbaaa')
    => ["abanan", "anaban", "ananab",
    "banana", "nabana", "nanaba"]

  37. >> inverse_burrows_wheeler('nnyseenn nrplmthhtT aa aapn iiiiiiS  y s la')

    => [" Spain stays mainly in the plainThe rain in", " in Spain stays mainly in
    the plainThe rain", " in the plainThe rain in Spain stays mainly", " mainly in
    the plainThe rain in Spain stays", " plainThe rain in Spain stays mainly in
    the", " rain in Spain stays mainly in the plainThe", " stays mainly in the
    plainThe rain in Spain", " the plainThe rain in Spain stays mainly in", "Spain
    stays mainly in the plainThe rain in ", "The rain in Spain stays mainly in the
    plain", "ain in Spain stays mainly in the plainThe r", "ain stays mainly in the
    plainThe rain in Sp", "ainThe rain in Spain stays mainly in the pl", "ainly in
    the plainThe rain in Spain stays m", "ays mainly in the plainThe rain in Spain
    st", "e plainThe rain in Spain stays mainly in th", "e rain in Spain stays
    mainly in the plainTh", "he plainThe rain in Spain stays mainly in t", "he rain
    in Spain stays mainly in the plainT", "in Spain stays mainly in the plainThe
    rain ", "in in Spain stays mainly in the plainThe ra", "in stays mainly in the
    plainThe rain in Spa", "in the plainThe rain in Spain stays mainly ", "inThe
    rain in Spain stays mainly in the pla", "inly in the plainThe rain in Spain
    stays ma", "lainThe rain in Spain stays mainly in the p", "ly in the plainThe
    rain in Spain stays main", "mainly in the plainThe rain in Spain stays ", "n
    Spain stays mainly in the plainThe rain i", "n in Spain stays mainly in the
    plainThe rai", "n stays mainly in the plainThe rain in Spai", "n the plainThe
    rain in Spain stays mainly i", "nThe rain in Spain stays mainly in the plai",
    "nly in the plainThe rain in Spain stays mai", "pain stays mainly in the
    plainThe rain in S", "plainThe rain in Spain stays mainly in the ", "rain in
    Spain stays mainly in the plainThe ", "s mainly in the plainThe rain in Spain
    stay", "stays mainly in the plainThe rain in Spain ", "tays mainly in the
    plainThe rain in Spain s", "the plainThe rain in Spain stays mainly in ", "y in
    the plainThe rain in Spain stays mainl", "ys mainly in the plainThe rain in
    Spain sta"]

  38. def burrows_wheeler(string)
      chars = string.chars + ['$']

      chars.each_index.
        map(&chars.method(:rotate)).
        sort.map(&:last).join
    end

    def inverse_burrows_wheeler(string)
      chars = string.chars

      chars.
        inject([]) { |table| chars.zip(table).sort }.
        map(&:join).detect { |s| s.end_with?('$') }.chop
    end
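
The end marker only identifies the original row if `$` never occurs in the input. Below is a minimal sketch of a guard, assuming the same `$` marker as the slide. The `SENTINEL` constant and the `checked_burrows_wheeler` name are made up for illustration and are not from the talk.

```ruby
SENTINEL = '$' # the end marker used on the slide

# Same transform as the slide, plus an input check (an addition, not from the talk).
def checked_burrows_wheeler(string)
  if string.include?(SENTINEL)
    raise ArgumentError, "input must not contain #{SENTINEL.inspect}"
  end

  chars = string.chars + [SENTINEL]
  chars.each_index.map(&chars.method(:rotate)).sort.map(&:last).join
end

# The slide's inverse, unchanged apart from using the constant.
def inverse_burrows_wheeler(string)
  chars = string.chars

  chars.
    inject([]) { |table| chars.zip(table).sort }.
    map(&:join).detect { |s| s.end_with?(SENTINEL) }.chop
end

inverse_burrows_wheeler(checked_burrows_wheeler('banana')) # => "banana"
```

Without the guard, an input such as `'a$'` does not round-trip, because more than one row of the rebuilt table ends with the marker and `detect` returns the first of them.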

  39. >> burrows_wheeler('banana')
    => "annb$aa"

    >> inverse_burrows_wheeler('annb$aa')
    => "banana"

    >> burrows_wheeler('The rain in Spain stays mainly in the plain')
    => "nnyseennn $rplmthhtT aa aapn iiiiiiS  y s la"

    >> inverse_burrows_wheeler('nnyseennn $rplmthhtT aa aapn iiiiiiS  y s la')
    => "The rain in Spain stays mainly in the plain"
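
A commonly used alternative to an end marker, not shown in the slides, is to keep the row index of the original string alongside the transformed text. This is a sketch under that assumption; both method names are hypothetical, and the inverse reuses the prepend-then-sort idea from slide 35 with rows held as strings.

```ruby
# Hypothetical variant: return the transform plus the index of the original row.
def burrows_wheeler_with_index(string)
  chars = string.chars
  rotations = chars.each_index.map { |i| chars.rotate(i) }.sort
  [rotations.map(&:last).join, rotations.index(chars)]
end

# Rebuild the sorted table, then pick the row the index points at.
def inverse_burrows_wheeler_with_index(string, index)
  return '' if string.empty? # nothing to rebuild, and index would be nil

  chars = string.chars
  table = Array.new(chars.length, '')
  chars.length.times { table = chars.zip(table).map(&:join).sort }
  table[index]
end

transformed, index = burrows_wheeler_with_index('banana')
# transformed is "nnbaaa" and index is 3, the fourth row of the table on slide 36
inverse_burrows_wheeler_with_index(transformed, index) # => "banana"
```

The trade-off is that the index has to travel with the transformed string, whereas the marker approach keeps everything in one string at the cost of reserving a character.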

  40. thanks!
    @tomstuart / [email protected]
