Horspool's Algorithm

Horspool’s Algorithm JIM COUNTS | CSUSM FALL 2012 | [email protected]

Problem: Substring Matching
- How can we find an instance of a pattern in some text?
- Pattern: CS
- Text: You must pass CS 513 to obtain your MS in CS.
- Brute Force Efficiency (n = text length, m = pattern length)
  - Worst Case: m(n − m + 1) ∈ O(nm)
  - Average Case: Θ(n)
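As a baseline for the improvements that follow, here is a minimal sketch of brute-force matching on the slide's own example. The function name `brute_force_search` is an illustrative choice, not something from the deck.

```python
def brute_force_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or a full match.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


text = "You must pass CS 513 to obtain your MS in CS."
print(brute_force_search(text, "CS"))  # prints 14
```

Each of the n − m + 1 alignments can cost up to m comparisons, which is where the worst-case bound comes from.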

Improving Brute Force Search
- First: Match from right to left.
- Second: When the match fails, move the pattern as far to the right as possible.

How far can we shift?
- The character from the text isn’t in the pattern: shift by the full pattern length.

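How far can we shift?
- The character from the text occurs in the pattern but is misaligned with it (it is not the pattern's last character): shift until the rightmost occurrence of that character in the pattern lines up with it.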

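How far can we shift?
- The character from the text matches the pattern's last character and doesn’t repeat elsewhere in the pattern: shift by the full pattern length.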

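How far can we shift?
- The character from the text matches the pattern's last character and repeats earlier in the pattern: shift until the rightmost earlier occurrence lines up with it.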
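The four cases can be checked with a short sketch that finds the shift by scanning the pattern directly. The function name `shift_by_scanning` and the sample patterns BARBER and LEADER are assumptions chosen for illustration; `c` is the text character aligned with the pattern's last position.

```python
def shift_by_scanning(pattern, c):
    """Shift for text character c, found by scanning the pattern right to left."""
    m = len(pattern)
    # Skip the last position: only the first m - 1 characters are candidates.
    for k in range(m - 2, -1, -1):
        if pattern[k] == c:
            # Distance from this rightmost occurrence to the last position.
            return m - 1 - k
    # c does not occur among the first m - 1 characters: move the whole pattern.
    return m


print(shift_by_scanning("BARBER", "X"))  # not in pattern: prints 6
print(shift_by_scanning("BARBER", "A"))  # misaligned: prints 4
print(shift_by_scanning("LEADER", "R"))  # last character, no repeat: prints 6
print(shift_by_scanning("BARBER", "R"))  # last character, repeats: prints 3
```

This version rescans the pattern on every mismatch, so each lookup costs time proportional to the pattern length.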

Problem: Finding the Shift Size
- If we have to scan the left side of the pattern for repeated instances of the text character, we don’t gain much.
- Solution: Create a table which maps every* character to the shift size.

Text char (c)   A   B   E   R   *
Shift t(c)      4   2   1   3   6

(In the table, * stands for any other character.)

*Unfortunately you really do need every character in the table if you want O(1) access times. Why?
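A minimal sketch of building such a table follows. The slide does not name its pattern; BARBER is assumed here because it reproduces the values shown. The 256-entry alphabet (one entry per byte value) and the name `build_shift_table` are also assumptions.

```python
ALPHABET_SIZE = 256  # assumption: every character fits in one byte


def build_shift_table(pattern, alphabet_size=ALPHABET_SIZE):
    """Return a list mapping each character code to its shift size."""
    m = len(pattern)
    # Every character starts with the full pattern length as its shift.
    table = [m] * alphabet_size
    # Walk the first m - 1 characters left to right; later occurrences
    # overwrite earlier ones, so the rightmost occurrence wins.
    for k in range(m - 1):
        table[ord(pattern[k])] = m - 1 - k
    return table


table = build_shift_table("BARBER")
print([table[ord(c)] for c in "ABER"])  # prints [4, 2, 1, 3]
print(table[ord("Z")])                  # prints 6
```

Indexing a list by character code gives a guaranteed constant-time lookup. A dictionary with the pattern length as its default is a common alternative, with expected rather than guaranteed constant-time access.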

Algorithm
1. For some text and pattern (where the alphabet of the pattern is a subset of the alphabet of the text), construct the shift table for the entire alphabet.
2. Align the pattern against the beginning of the text.
3. Starting with the last character in the pattern, compare it with the corresponding character in the text.
   - On a match, compare the next character to the left, trying to match them all.
   - On a mismatch, look up the text character aligned with the rightmost character of the pattern (not necessarily the mismatched character!) in the shift table. Then shift the pattern to the right by the amount indicated in the shift table.
   - Repeat until the pattern matches, or until it cannot match because the pattern is longer than the remaining text.
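The steps above can be sketched as follows. This is one possible rendering rather than the deck's own code: the name `horspool_search` is hypothetical, and for brevity it swaps the full-alphabet table for a dictionary that defaults to the pattern length.

```python
def horspool_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Step 1: shift table from the first m - 1 pattern characters.
    # Characters missing from the dictionary shift by the full length m.
    shift = {}
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    # Step 2: i is the text index aligned with the pattern's last character.
    i = m - 1
    while i < n:
        # Step 3: compare right to left, counting matched characters.
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift by the entry for the text character under the pattern's
        # last position, which need not be the mismatched character.
        i += shift.get(text[i], m)
    return -1


print(horspool_search("JIM_SAW_ME_IN_A_BARBERSHOP", "BARBER"))  # prints 16
```

The loop ends as soon as the pattern's last position would fall past the end of the text, which covers the "pattern is longer than the remaining text" stopping condition.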

Efficiency
- Worst case in O(nm), where n is the text length and m is the pattern length.
  - How?
  - Consider looking for pattern 100 in text 0000
  - C_worst = m(n − m + 1) ∈ O(nm)
- Average case in Θ(n)
  - Still better than brute force.
  - Why?
  - Although two algorithms might have the same relationship between their growth rate and their input size, we cannot conclude that they have the same performance.
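The worst-case count can be checked on the slide's example with a sketch that tallies character comparisons. The function name `horspool_count` and the dictionary-based shift table are assumptions made for illustration.

```python
def horspool_count(text, pattern):
    """Return (match index or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    shift = {pattern[k]: m - 1 - k for k in range(m - 1)}
    comparisons = 0
    i = m - 1  # text index aligned with the pattern's last character
    while i < n:
        k = 0
        while k < m:
            comparisons += 1
            if pattern[m - 1 - k] != text[i - k]:
                break
            k += 1
        if k == m:
            return i - m + 1, comparisons
        i += shift.get(text[i], m)
    return -1, comparisons


print(horspool_count("0000", "100"))  # prints (-1, 6)
```

With n = 4 and m = 3 there are n − m + 1 = 2 alignments. Each one matches the two trailing zeros before failing on the leading 1, so it costs m = 3 comparisons, and the shift for 0 is only 1. The total is 3 × 2 = 6, matching m(n − m + 1).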
