Slide 1

Slide 1 text

Horspool’s Algorithm JIM COUNTS | CSUSM FALL 2012 | [email protected]

Slide 2

Slide 2 text

Problem: Substring Matching
How can we find an instance of a pattern in some text?
Pattern: CS
Text: You must pass CS 513 to obtain your MS in CS.
Brute Force Efficiency (text length $n$, pattern length $m$)
Worst Case: $m(n - m + 1) \in O(nm)$ character comparisons
Average Case: $\Theta(n)$
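As a baseline for the slides that follow, here is a minimal sketch of brute-force matching, run on the pattern and text from this slide. The function name `brute_force_search` is hypothetical, and returning -1 when nothing matches is an assumed convention.

```python
def brute_force_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text, left to right.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the pattern is exhausted.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


text = "You must pass CS 513 to obtain your MS in CS."
print(brute_force_search(text, "CS"))  # prints 14
```

After a failed alignment the pattern always moves by exactly one position, which is where the $n - m + 1$ factor in the worst case comes from.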

Slide 3

Slide 3 text

Improving Brute Force Search
First: Match from right to left.
Second: When the match fails, move the pattern as far to the right as possible without skipping a potential match.

Slide 4

Slide 4 text

How far can we shift? Character from text isn’t in pattern

Slide 5

Slide 5 text

How far can we shift? Character from text is misaligned with pattern

Slide 6

Slide 6 text

How far can we shift? Character from text doesn’t repeat in pattern

Slide 7

Slide 7 text

How far can we shift? Character from text repeats in pattern

Slide 8

Slide 8 text

Problem: Finding the Shift Size
If we have to scan the left side of the pattern for repeated instances of the text character, we don’t gain much.
Solution: Create a table which maps every* character to the shift size.

Text char (c)   A  B  E  R  *
Shift t(c)      4  2  1  3  6

*Unfortunately you really do need every character in the table if you want O(1) access times. Why?
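The table can be precomputed in one pass over the pattern. The sketch below assumes the pattern is BARBER and the alphabet is the uppercase letters. The slide names neither, but BARBER reproduces the shift values shown. The function name `build_shift_table` is hypothetical.

```python
import string


def build_shift_table(pattern, alphabet):
    """Map every character of the alphabet to its shift size."""
    m = len(pattern)
    # Characters that do not occur in the pattern shift by the full length.
    table = {c: m for c in alphabet}
    # For the first m - 1 pattern characters, record the distance from the
    # character to the end of the pattern. Later (rightmost) occurrences
    # overwrite earlier ones, so the smallest safe shift is kept.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


# Assumed example: pattern BARBER over the uppercase alphabet.
table = build_shift_table("BARBER", string.ascii_uppercase)
print(table["A"], table["B"], table["E"], table["R"], table["Z"])  # prints 4 2 1 3 6
```

The last pattern character is deliberately skipped by the loop: if it occurs nowhere else in the pattern, its entry stays at the full pattern length.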

Slide 9

Slide 9 text

Algorithm
1. For some text and pattern (where the alphabet of the pattern is a subset of the alphabet of the text), construct the shift table for the entire alphabet.
2. Align the pattern against the beginning of the text.
3. Starting with the last character in the pattern, compare it with the aligned character in the text.
   - On a match, compare the next character to the left, trying to match them all.
   - On a mismatch, look up the text character aligned with the last character of the pattern (not necessarily the mismatched character!) in the shift table. Then shift the pattern to the right by the amount indicated in the shift table.
   - Repeat until the pattern matches the text, or until the text cannot match because the pattern is longer than the remaining text.
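The steps above can be sketched as follows. This is an illustrative version, not necessarily the lecture's own code: the name `horspool_search` is hypothetical, and instead of a table covering the entire alphabet it stores only the pattern's characters in a dict and falls back to the pattern length for everything else.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # assumed convention: the empty pattern matches at index 0

    # Step 1: shift table (only pattern characters are stored explicitly).
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    # Step 2: i is the text index aligned with the pattern's last character.
    i = m - 1
    while i < n:
        # Step 3: compare right to left; k counts matched characters.
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1  # full match; report where it starts
        # Mismatch: shift by the entry for the text character aligned with
        # the END of the pattern, not the character that mismatched.
        i += shift.get(text[i], m)
    return -1


text = "You must pass CS 513 to obtain your MS in CS."
print(horspool_search(text, "CS"))  # prints 14
```

Using `shift.get(text[i], m)` trades the array lookup described on the shift-table slide for a hash lookup with a default, which avoids filling an entry for every character of a large alphabet.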

Slide 10

Slide 10 text

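Efficiency
Worst case in $O(nm)$, for a text of length $n$ and a pattern of length $m$. How?
Consider looking for pattern 100 in text 0000: every alignment matches the trailing zeros before failing on the leading 1, and each shift moves the pattern only one position.
$C_{worst} = m(n - m + 1) \in O(nm)$
Average case in $\Theta(n)$
Still better than brute force. Why?
Although two algorithms might have the same relationship between their growth rate and their input size, we cannot conclude that they have the same performance.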
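To make the worst-case count concrete, the sketch below instruments the search loop to count character comparisons. The function name `horspool_comparisons` is hypothetical, and the third input is an assumed favorable case that does not appear on the slides.

```python
def horspool_comparisons(text, pattern):
    """Count character comparisons made while searching for pattern in text."""
    n, m = len(text), len(pattern)
    # Shift table for the pattern's characters; any other character shifts by m.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    comparisons = 0
    i = m - 1  # text index aligned with the pattern's last character
    while i < n:
        k = 0
        while k < m:
            comparisons += 1
            if pattern[m - 1 - k] != text[i - k]:
                break
            k += 1
        if k == m:
            break  # match found; stop counting
        i += shift.get(text[i], m)
    return comparisons


# Worst case from the slide: m(n - m + 1) = 3 * (4 - 3 + 1) = 6 comparisons.
print(horspool_comparisons("0000", "100"))        # prints 6
# Same shape, larger input: 4 * (10 - 4 + 1) = 28 comparisons.
print(horspool_comparisons("0" * 10, "1000"))     # prints 28
# Assumed favorable case: no text character occurs in the pattern, so each
# alignment costs one comparison and the pattern jumps by its full length.
print(horspool_comparisons("a" * 12, "xyz"))      # prints 4
```

In the last case a brute-force search would try 10 alignments, while the shift table lets the pattern skip ahead three positions at a time, which illustrates why the same growth rate does not imply the same performance.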