LeetCode Day 09 String(2/2)
Published:
Today is another day of Strings Day(2/2), focus on two KMP algorithm questions.
Question 1
28. Find the Index of the First Occurrence in a String
Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Intuition: usw two pointers:
### ***WRONG Answer***
class Solution:
def strStr(self, haystack: str, needle: str) -> int:
pHay, pNee = 0, 0
result = 0
while pHay <= len(haystack)-1:
for i in range(len(haystack)):
if haystack[i] == needle[pNee]:
result = i
while pNee <= len(needle)-1:
for j in range(len(needle)):
if haystack[i+j] == needle[j]:
pNee += 1
else:
return -1
return result
else:
return -1
pHay += 1
But with two levels of while and for loop, not only is the code very poorly readable, there are test cases that don’t pass. It seems that this is not the right way.
Starting today’s study: the KMP algorithm. This question is a classic KMP algorithm question.
The main idea of KMP is that when there is a string mismatch, you can record a portion of the text that has been matched before, and use this information to avoid having to do the matching from scratch.
What is KMP
It was invented by these three scholars: Knuth, Morris and Pratt, so it took the initials of the three scholars’ names. So it is called KMP
What is the prefix table
The next array is a prefix table (prefix table).
What is the purpose of the prefix table?
The prefix table is used for backtracking, it records where the pattern string should be re-matched from when the pattern string does not match the main string (text string). The task of the prefix table is to fail to match at the current position, find the position that has been matched before, and then re-match, which also means that in case of a character mismatch, the prefix table will tell you which position the pattern string should be jumped to in the next step of matching. How does the prefix table work? The first step is to calculate the longest common prefix and suffix.
How to calculate the longest equal prefix and suffix
Prefixes are all consecutive substrings not containing the last character that begin with the first character.
Suffixes are all consecutive substrings not containing the first character that end with the last character.
For example, aabaaf
Prefixes with a, aa, aab, aaba, aabaa
Suffixes are f, af, aaf, baaf, aabaaf
Taking the prefixes, the longest equal prefix and suffix of a is 0, aa is 1, aab is 0, aaba is 1, aabaa is 2
The longest is 2, so when using KMP matching, you can restart the match from the position where the subscript is 2 without having to start all over again
So the solution of this Question is:
class Solution:
def getNext(self, next, s):
j = -1
next[0] = j
for i in range(1, len(s)):
while j >= 0 and s[i] != s[j+1]:
j = next[j]
if s[i] == s[j+1]:
j += 1
next[i] = j
def strStr(self, haystack: str, needle: str) -> int:
if not needle:
return 0
next = [0] * len(needle)
self.getNext(next, needle)
j = -1
for i in range(len(haystack)):
while j >= 0 and haystack[i] != needle[j+1]:
j = next[j]
if haystack[i] == needle[j+1]:
j += 1
if j == len(needle) - 1:
return i - len(needle) + 1
return -1
Question 2
459. Repeated Substring Pattern
Given a string
s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.
Another problem that can be solved using the KMP algorithm. In fact, the next array records the longest identical prefixes and suffixes, so if you find the next array, you’ll be halfway through the problem.
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
if len(s) == 0:
return False
nxt = [0] * len(s)
self.getNext(nxt, s)
if nxt[-1] != -1 and len(s) % (len(s) - (nxt[-1] + 1)) == 0:
return True
return False
def getNext(self, nxt, s):
nxt[0] = -1
j = -1
for i in range(1, len(s)):
while j >= 0 and s[i] != s[j+1]:
j = nxt[j]
if s[i] == s[j+1]:
j += 1
nxt[i] = j
return nxt
