LeetCode 10: Regular Expression Matching

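Given a string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.
'.' matches any single character.
'*' matches zero or more of the preceding element.
The match must cover the entire string (s), not just part of it.
Notes:
s may be empty and contains only lowercase letters a-z.
p may be empty and contains only lowercase letters a-z plus the characters . and *.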

Example 1:
Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:
Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' matches zero or more of the preceding element, here 'a'. Repeating 'a' once therefore yields "aa".

Example 3:
Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means zero or more ('*') of any character ('.').

Example 4:
Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: 'c' can be repeated zero times and 'a' can be repeated once, so the pattern matches "aab".

Example 5:
Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

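Problem analysis:
We need to implement a regular expression matching function that obeys two rules:

'.' matches any single character.
'a*' stands for x consecutive 'a' characters, where 0 <= x < inf.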
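
One way to read these two rules is that a pattern is a sequence of units, each a single character optionally followed by '*'. The sketch below splits a pattern that way. `tokenize` is a hypothetical helper introduced only for illustration; it is not part of the solution and it assumes the pattern is well formed (every '*' follows a letter or '.').

```python
def tokenize(p):
    """Split a pattern into (char, starred) units.

    Hypothetical helper for illustration only; assumes every '*'
    in p directly follows a letter or '.'.
    """
    units = []
    j = 0
    while j < len(p):
        # A unit is starred when the next pattern character is '*'.
        starred = j + 1 < len(p) and p[j + 1] == '*'
        units.append((p[j], starred))
        # Skip over the '*' as well when the unit is starred.
        j += 2 if starred else 1
    return units


print(tokenize('c*a*b'))  # [('c', True), ('a', True), ('b', False)]
```

Seen this way, a starred unit may consume any number of matching characters (including none), while an unstarred unit consumes exactly one.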

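Approach:
Look at the first two characters of p (assume for now that len(p) >= 2; the shorter case is handled later) and call them a and b. Then:
a is either '.' or a lowercase letter;
b is '.', a lowercase letter, or '*'.
As long as b is not '*', it has no effect on a, so a can be matched directly against the first character of s.
If that character matches, we continue with s[1:] and p[1:]; otherwise the match fails.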

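def isMatch(s, p):
    # Partial sketch: only covers the case where p[1] is not '*'.
    if len(p) >= 2 and p[1] != '*':
        # s must be non-empty before s[0] can be read.
        if s and (p[0] == s[0] or p[0] == '.'):
            return isMatch(s[1:], p[1:])
        return False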

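What if b is '*'? Then a* can match any number of a characters, including zero.
Consider s = 'cccbacd', p = 'c*cbacd'.
Whether these two match hinges on how many 'c' characters 'c*' consumes. At each step there are two options: 'c*' stops here, or it consumes one more 'c'. The match succeeds if either option succeeds.
If c* matches 0 c's: s = 'cccbacd', p = 'cbacd' -> False
If c* matches 1 c: s = 'ccbacd', p = 'cbacd' -> False
If c* matches 2 c's: s = 'cbacd', p = 'cbacd' -> True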
…
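
The enumeration above can be reproduced with a short loop. This is only a sketch for this specific example: because the rest of the pattern ('cbacd') contains no '.' or '*', plain string equality is enough to check it. The name `try_star_counts` is an assumption made for illustration.

```python
def try_star_counts(s, star_char, rest):
    """Try every count k that star_char* could consume from the front of s.

    Illustrative helper only: `rest` must be a literal string (no '.' or '*'),
    so the remainder is compared with ==.
    Returns a list of (k, matched) pairs.
    """
    results = []
    k = 0
    while True:
        # star_char* has consumed k characters; compare what is left with rest.
        results.append((k, s[k:] == rest))
        if k < len(s) and s[k] == star_char:
            k += 1  # star_char* may consume one more character
        else:
            break   # next character differs (or s is exhausted)
    return results


print(try_star_counts('cccbacd', 'c', 'cbacd'))
# [(0, False), (1, False), (2, True), (3, False)]
```

The recursive formulation makes the same choice one character at a time instead of looping over k explicitly.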

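def isMatch(s, p):
    # Partial sketch: only covers the case where p[1] is '*'.
    if len(p) >= 2 and p[1] == '*':
        first_match = bool(s) and p[0] in (s[0], '.')
        # Either "x*" matches nothing, or it consumes one character of s and stays in play.
        return isMatch(s, p[2:]) or (first_match and isMatch(s[1:], p))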

With the overall approach settled, the len(p) < 2 case is straightforward. The complete code:

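class Solution(object):
    def isMatch(self, s, p):
        # An empty pattern matches only an empty string.
        if not p: return not s
        # Does the first character of s match the first pattern character?
        first_match = bool(s) and p[0] in [s[0],'.']

        if len(p) >= 2 and p[1] == '*':
            # Either skip "x*" entirely, or consume one character of s and keep "x*".
            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))

        # No '*': consume one character from both s and p.
        return first_match and self.isMatch(s[1:], p[1:])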
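
A quick way to gain confidence in this recursion is to compare it against Python's standard `re.fullmatch` on small inputs. For patterns restricted to lowercase letters, '.' and '*' (with every '*' following a letter or '.'), the two should agree. The sketch below is an illustration, not part of the original solution: `is_match` is a standalone copy of the method above, and `UNITS` and `cross_check` are names assumed here.

```python
import itertools
import re


def is_match(s, p):
    # Standalone copy of the recursive solution.
    if not p:
        return not s
    first_match = bool(s) and p[0] in (s[0], '.')
    if len(p) >= 2 and p[1] == '*':
        return is_match(s, p[2:]) or (first_match and is_match(s[1:], p))
    return first_match and is_match(s[1:], p[1:])


# Building patterns from whole units guarantees every '*' has a preceding element.
UNITS = ['a', 'b', '.', 'a*', 'b*', '.*']


def cross_check(max_units=3, max_len=4):
    """Return every (s, p) pair on which is_match disagrees with re.fullmatch."""
    mismatches = []
    for n in range(max_units + 1):
        for units in itertools.product(UNITS, repeat=n):
            p = ''.join(units)
            for m in range(max_len + 1):
                for chars in itertools.product('ab', repeat=m):
                    s = ''.join(chars)
                    expected = re.fullmatch(p, s) is not None
                    if is_match(s, p) != expected:
                        mismatches.append((s, p))
    return mismatches


print(cross_check())  # an empty list means no disagreement was found
```

The search space is deliberately tiny (a two-letter alphabet, at most three pattern units), so this is a sanity check rather than a proof.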

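However, this version does a lot of repeated work: self.isMatch(s, p[2:]) and self.isMatch(s[1:], p) reach many of the same subproblems. Memoization brings the time complexity down, but using whole strings as hash keys is expensive, so the cache is keyed on index pairs instead.
The improved code: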

class Solution(object):
    def isMatch(self, s, p):
        cache = {}  # (i, j) -> whether s[i:] matches p[j:]
        def helper(i, j):
            # Pattern exhausted: succeed only if the string is exhausted too.
            if j >= len(p):
                return i == len(s)

            key = (i, j)
            if key in cache:
                return cache[key]
            first_match = i < len(s) and p[j] in [s[i], '.']

            if j+1 < len(p) and p[j+1] == '*':
                # Skip "x*", or consume s[i] and keep "x*" available.
                cache[key] = helper(i, j+2) or (first_match and helper(i+1, j))
            else:
                # No '*': advance both indices.
                cache[key] = first_match and helper(i+1, j+1)
            return cache[key]

        return helper(0, 0)
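
To see how much work the cache saves, one can count calls on an input that forces heavy backtracking. The sketch below is an illustration under stated assumptions: it substitutes `functools.lru_cache` for the explicit dict used above, and the names `count_naive_calls` and `count_memo_states` are introduced here only for the comparison.

```python
from functools import lru_cache


def count_naive_calls(s, p):
    """Run the plain recursion and return (result, number of calls made)."""
    calls = 0

    def match(s, p):
        nonlocal calls
        calls += 1
        if not p:
            return not s
        first_match = bool(s) and p[0] in (s[0], '.')
        if len(p) >= 2 and p[1] == '*':
            return match(s, p[2:]) or (first_match and match(s[1:], p))
        return first_match and match(s[1:], p[1:])

    return match(s, p), calls


def count_memo_states(s, p):
    """Run the index-based recursion and return (result, distinct states computed)."""

    @lru_cache(maxsize=None)
    def helper(i, j):
        if j >= len(p):
            return i == len(s)
        first_match = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            return helper(i, j + 2) or (first_match and helper(i + 1, j))
        return first_match and helper(i + 1, j + 1)

    result = helper(0, 0)
    # Each cache miss corresponds to one (i, j) state that was actually computed.
    return result, helper.cache_info().misses


s, p = 'a' * 12, 'a*a*a*a*c'
print(count_naive_calls(s, p))
print(count_memo_states(s, p))
```

With memoization, the number of computed states is bounded by (len(s) + 1) * (len(p) + 1), whereas the plain recursion revisits the same suffix pairs along many different paths.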