
Horspool's Algorithm

Jim Counts
November 13, 2012

A presentation I created for CS 513 (Fall 2012) about a substring matching algorithm.

Transcript

  1. Problem: Substring Matching
     - How can we find an instance of a pattern in some text?
     - Pattern: CS
     - Text: You must pass CS 513 to obtain your MS in CS.
     - Brute-force efficiency (n = text length, m = pattern length):
       - Worst case: $m(n - m + 1) \in O(nm)$ character comparisons
       - Average case: $\Theta(n)$
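To make the brute-force baseline concrete, here is a minimal Python sketch (an illustration, not code from the slides) that tries every alignment, compares left to right, and counts character comparisons. The function name `brute_force_match` is hypothetical.

```python
def brute_force_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (index of the first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):        # every possible alignment of the pattern
        j = 0
        while j < m:                  # compare left to right
            comparisons += 1
            if pattern[j] != text[i + j]:
                break
            j += 1
        if j == m:                    # all m characters matched
            return i, comparisons
    return -1, comparisons


print(brute_force_match("CS", "You must pass CS 513 to obtain your MS in CS."))  # (14, 16)
print(brute_force_match("001", "0000"))  # (-1, 6), i.e. m(n - m + 1) = 3 * 2
```

The second call is a worst-case input for brute force: every one of the $n - m + 1$ alignments fails only at the last of its $m$ comparisons.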
  2. Improving Brute-Force Search
     - First: match from right to left.
     - Second: when the match fails, shift the pattern as far to the right as possible without skipping over a possible match.
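A rough Python sketch of these two ideas, under our own assumptions: compare right to left, and on failure shift so that the rightmost earlier occurrence of the text character under the pattern's last position lines up with it (or past it entirely if there is none). Here that occurrence is found by rescanning the pattern on every mismatch, which is exactly the cost slide 3 complains about. The name `match_without_table` is hypothetical.

```python
def match_without_table(pattern: str, text: str) -> int:
    """Right-to-left matching with maximal safe shifts, computed by rescanning."""
    n, m = len(text), len(pattern)
    i = m - 1                          # text index under the pattern's last character
    while i < n:
        k = 0                          # characters matched so far, counted from the right
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1           # start index of the match
        # Rescan pattern[0 .. m-2] leftward for the rightmost copy of text[i].
        c = text[i]
        shift = m                      # default: c not found, jump past it entirely
        for j in range(m - 2, -1, -1):
            if pattern[j] == c:
                shift = m - 1 - j
                break
        i += shift
    return -1


print(match_without_table("CS", "You must pass CS 513 to obtain your MS in CS."))  # 14
```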
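  3. Problem: Finding the Shift Size
     - If we have to scan the left part of the pattern for the rightmost occurrence of the text character on every mismatch, we don't gain much.
     - Solution: create a table that maps every* character c to its shift size t(c): the distance from the rightmost occurrence of c among the first m − 1 pattern characters to the pattern's last character, or the pattern length m if c does not occur there.

         Text char (c)   A   B   E   R   *
         Shift t(c)      4   2   1   3   6

       (* in the table stands for every other character.)
     - *Unfortunately, you really do need every character in the table if you want O(1) access times. Why?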
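The table above is consistent with the pattern BARBER (m = 6); the slide does not name its pattern, so treat BARBER as our assumption. The sketch below builds the shift table over a whole alphabet, here assumed to be uppercase letters plus underscore. A Python dict over the full alphabet stands in for the array indexed by character code that an O(1) lookup would normally use; `shift_table` is a hypothetical name.

```python
import string


def shift_table(pattern: str, alphabet: str) -> dict[str, int]:
    """Map every character of `alphabet` to its Horspool shift for `pattern`."""
    m = len(pattern)
    table = {c: m for c in alphabet}   # default: character absent from pattern[:-1]
    for j in range(m - 1):             # the last pattern character is excluded
        table[pattern[j]] = m - 1 - j  # a later (more rightward) j overwrites earlier ones
    return table


t = shift_table("BARBER", string.ascii_uppercase + "_")
print({c: t[c] for c in "ABER"}, t["Z"])  # {'A': 4, 'B': 2, 'E': 1, 'R': 3} 6
```

Note that B appears twice in BARBER; the loop keeps the rightmost occurrence before the last position, giving t(B) = 2 rather than 5.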
  4. Algorithm
     1. For some text and pattern (where the alphabet of the pattern is a subset of the alphabet of the text), construct the shift table for the entire alphabet.
     2. Align the pattern against the beginning of the text.
     3. Starting with the last character in the pattern, compare it with the aligned character in the text.
        - On a match, compare the next character to the left, trying to match them all.
        - On a mismatch, look up the text character aligned with the pattern's last (rightmost) character in the shift table; this is not necessarily the mismatched character! Then shift the pattern to the right by the amount the table indicates.
        - Repeat until the pattern matches, or until no match is possible because the pattern is longer than the remaining text.
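The steps above can be sketched in Python roughly as follows. This is our own illustration, not code from the presentation; `horspool` is a hypothetical name, and instead of materialising the whole alphabet (step 1) it stores only the pattern's characters and lets `dict.get` default everything else to m, which gives the same shifts.

```python
def horspool(pattern: str, text: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Step 1: shift table; characters absent from pattern[:-1] default to m below.
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    # Step 2: align the pattern with the start of the text.
    i = m - 1                          # text index under the pattern's last character
    while i < n:                       # stop once the pattern overhangs the text
        # Step 3: compare right to left.
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift using text[i], the character under the pattern's LAST position,
        # not the character that actually mismatched.
        i += table.get(text[i], m)
    return -1


print(horspool("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))  # 16
```

On this input the pattern is shifted by t(A) = 4, t(E) = 1, t(_) = 6, t(B) = 2 and t(R) = 3 before the full match at index 16; the shift after aligning on R uses R even though the mismatch happened at E.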
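  5. Efficiency
     - Worst case in $O(nm)$. How?
       - Consider looking for pattern 100 in text 0000: each alignment costs m = 3 comparisons and the shift is always t(0) = 1, so both possible alignments are tried, for 3 · 2 = 6 comparisons. In general, searching for a 1 followed by m − 1 zeros in a text of n zeros takes $m(n - m + 1) \in O(nm)$ comparisons.
     - Average case in $\Theta(n)$ (for random texts).
     - Still better than brute force. Why?
       - Although two algorithms might have the same relationship between their growth rate and their input size, we cannot conclude that they have the same performance: Horspool's shifts skip over text characters, so it typically makes far fewer comparisons.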
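A small experiment, not a proof, can illustrate both points. The sketch below counts character comparisons for brute force and for Horspool's algorithm up to the first match; both function names are hypothetical. It reproduces the worst-case count $m(n - m + 1)$ on the slide's input family and compares the two algorithms on the slide-1 example.

```python
def count_brute_force(pattern: str, text: str) -> int:
    """Comparisons made by left-to-right brute force up to the first match."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if pattern[j] != text[i + j]:
                break
            j += 1
        if j == m:
            return comparisons
    return comparisons


def count_horspool(pattern: str, text: str) -> int:
    """Comparisons made by Horspool's algorithm up to the first match."""
    n, m = len(text), len(pattern)
    # In a dict comprehension later keys overwrite earlier ones,
    # so each character keeps the shift of its rightmost occurrence.
    table = {pattern[j]: m - 1 - j for j in range(m - 1)}
    comparisons = 0
    i = m - 1
    while i < n:
        k = 0
        while k < m:
            comparisons += 1
            if pattern[m - 1 - k] != text[i - k]:
                break
            k += 1
        if k == m:
            return comparisons
        i += table.get(text[i], m)
    return comparisons


# Worst case from the slide: a 1 followed by m - 1 zeros, searched in n zeros.
m = 3
for n in (4, 8, 16):
    print(n, count_horspool("1" + "0" * (m - 1), "0" * n), m * (n - m + 1))
# 4 6 6
# 8 18 18
# 16 42 42

text = "You must pass CS 513 to obtain your MS in CS."
print(count_horspool("CS", text), count_brute_force("CS", text))  # 9 16
```

Both algorithms share the same worst-case class, yet on the slide-1 example Horspool's algorithm needs 9 comparisons to brute force's 16, because most mismatches let it jump two positions at once.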