LeetCode Day 09 String(2/2)

4 minute read

Published: August 12, 2023

Today is another day of Strings Day(2/2), focus on two KMP algorithm questions.

Question 1

28. Find the Index of the First Occurrence in a String

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Intuition: usw two pointers:

### ***WRONG Answer***
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        pHay, pNee = 0, 0
        result = 0
        while pHay <= len(haystack)-1:
            for i in range(len(haystack)):
                if haystack[i] == needle[pNee]:
                    result = i
                    while pNee <= len(needle)-1:
                        for j in range(len(needle)):
                            if haystack[i+j] == needle[j]:
                                pNee += 1
                            else:
                                return -1
                    return result
                else: 
                    return -1
            pHay += 1

But with two levels of while and for loop, not only is the code very poorly readable, there are test cases that don’t pass. It seems that this is not the right way.

Starting today’s study: the KMP algorithm. This question is a classic KMP algorithm question.

The main idea of KMP is that when there is a string mismatch, you can record a portion of the text that has been matched before, and use this information to avoid having to do the matching from scratch.

What is KMP

It was invented by these three scholars: Knuth, Morris and Pratt, so it took the initials of the three scholars’ names. So it is called KMP

What is the prefix table

The next array is a prefix table (prefix table).

What is the purpose of the prefix table?

The prefix table is used for backtracking, it records where the pattern string should be re-matched from when the pattern string does not match the main string (text string). The task of the prefix table is to fail to match at the current position, find the position that has been matched before, and then re-match, which also means that in case of a character mismatch, the prefix table will tell you which position the pattern string should be jumped to in the next step of matching. How does the prefix table work? The first step is to calculate the longest common prefix and suffix.

How to calculate the longest equal prefix and suffix

Prefixes are all consecutive substrings not containing the last character that begin with the first character.
Suffixes are all consecutive substrings not containing the first character that end with the last character.

For example, aabaaf

Prefixes with a, aa, aab, aaba, aabaa

Suffixes are f, af, aaf, baaf, aabaaf

Taking the prefixes, the longest equal prefix and suffix of a is 0, aa is 1, aab is 0, aaba is 1, aabaa is 2

The longest is 2, so when using KMP matching, you can restart the match from the position where the subscript is 2 without having to start all over again

So the solution of this Question is:

class Solution:
    def getNext(self, next, s):
        j = -1
        next[0] = j
        for i in range(1, len(s)):
            while j >= 0 and s[i] != s[j+1]:
                j = next[j]
            if s[i] == s[j+1]:
                j += 1
            next[i] = j
    
    def strStr(self, haystack: str, needle: str) -> int:
        if not needle:
            return 0
        next = [0] * len(needle)
        self.getNext(next, needle)
        j = -1
        for i in range(len(haystack)):
            while j >= 0 and haystack[i] != needle[j+1]:
                j = next[j]
            if haystack[i] == needle[j+1]:
                j += 1
            if j == len(needle) - 1:
                return i - len(needle) + 1
        return -1

Question 2

459. Repeated Substring Pattern

Given a string s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.

Another problem that can be solved using the KMP algorithm. In fact, the next array records the longest identical prefixes and suffixes, so if you find the next array, you’ll be halfway through the problem.

class Solution:
    def repeatedSubstringPattern(self, s: str) -> bool:  
        if len(s) == 0:
            return False
        nxt = [0] * len(s)
        self.getNext(nxt, s)
        if nxt[-1] != -1 and len(s) % (len(s) - (nxt[-1] + 1)) == 0:
            return True
        return False
    
    def getNext(self, nxt, s):
        nxt[0] = -1
        j = -1
        for i in range(1, len(s)):
            while j >= 0 and s[i] != s[j+1]:
                j = nxt[j]
            if s[i] == s[j+1]:
                j += 1
            nxt[i] = j
        return nxt

Ziqi Han

LeetCode Day 09 String(2/2)

Question 1

What is KMP

What is the prefix table

What is the purpose of the prefix table?

How to calculate the longest equal prefix and suffix

Question 2

You May Also Enjoy

LeetCode Day 60 Monotonic Stack (3/3)

LeetCode Day 59 Monotonic Stack (2/3)

LeetCode Day 58 Monotonic Stack (1/3)

LeetCode Day 57 Dynamic Programming (17/17)