检查 Python 中是否存在切片列表-IT科技

摘要：问题描述：我想编写一个函数来确定一个子列表是否存在于一个更大的列表中。list1 = [1,0,1,1,1,0,0] list2 = [1,0,1,0,1,0,1] #Should return true sublistExists(list1, [1,1,1]) #Should return false...

问题描述：

我想编写一个函数来确定一个子列表是否存在于一个更大的列表中。

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#Should return true
sublistExists(list1, [1,1,1])

#Should return false
sublistExists(list2, [1,1,1])

是否有一个 Python 函数可以做到这一点？

解决方案 1：

让我们更实用一点吧，可以吗？:)

def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))

请注意，any()在 O(m*n) 次操作后，如果 lst 中第一次匹配 sublst，则停止 - 如果没有匹配，则失败

解决方案 2：

如果您确定输入只包含个位数 0 和 1，那么您可以转换为字符串：

def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))

这会创建两个字符串，因此它不是最有效的解决方案，但由于它利用了 Python 中优化的字符串搜索算法，所以它可能足以满足大多数用途。

如果效率非常重要，您可以查看适合列表的Boyer-Moore字符串搜索算法。

简单搜索的最坏情况是 O(n*m)，但如果您不能使用转换为字符串技巧并且不需要担心性能，那么它是合适的。

解决方案 3：

我不知道有这个函数

def sublistExists(list, sublist):
    for i in range(len(list)-len(sublist)+1):
        if sublist == list[i:i+len(sublist)]:
            return True #return position (i) if you wish
    return False #or -1

正如 Mark 所说，这不是最有效的搜索（它是 O(n*m)）。这个问题可以用与字符串搜索大致相同的方式来解决。

解决方案 4：

我最喜欢的简单解决方案如下（但是，它非常暴力，所以我不建议在大数据上使用它）：

>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>>any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True

上述代码实际上创建了长度为 l2 的所有可能的 l1 切片，并按顺序将它们与 l2 进行比较。

详细说明

只有当你不明白它的工作原理（并且你想知道它）时才阅读此解释，否则没有必要阅读它

首先，这是迭代 l1 项索引的方法：

>>> [i for i in range(len(l1))]
[0, 1, 2, 3]

因此，由于i代表 l1 中项目的索引，您可以使用它来显示实际项目，而不是索引号：

>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']

然后从 l1 创建长度为 2 的切片（类似于从列表中选择项目）：

>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.

现在你可以将每个切片与 l2 进行比较，你会看到第二个切片匹配：

>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one

最后，使用名为any 的函数，可以检查至少有一个布尔值是否为 True：

>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True

解决方案 5：

实现此目的的有效方法是使用Boyer-Moore 算法，正如 Mark Byers 所建议的那样。我已经在这里完成了：使用 Python 中的 Boyer-Moore 搜索列表以查找子列表，但将在此处粘贴代码。它基于 Wikipedia 文章。

该search()函数返回正在搜索的子列表的索引，如果失败则返回 -1。

def search(haystack, needle):
    """
    Search list `haystack` for sublist `needle`.
    """
    if len(needle) == 0:
        return 0
    char_table = make_char_table(needle)
    offset_table = make_offset_table(needle)
    i = len(needle) - 1
    while i < len(haystack):
        j = len(needle) - 1
        while needle[j] == haystack[i]:
            if j == 0:
                return i
            i -= 1
            j -= 1
        i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i]));
    return -1

    
def make_char_table(needle):
    """
    Makes the jump table based on the mismatched character information.
    """
    table = {}
    for i in range(len(needle) - 1):
        table[needle[i]] = len(needle) - 1 - i
    return table
    
def make_offset_table(needle):
    """
    Makes the jump table based on the scan offset in which mismatch occurs.
    """
    table = []
    last_prefix_position = len(needle)
    for i in reversed(range(len(needle))):
        if is_prefix(needle, i + 1):
            last_prefix_position = i + 1
        table.append(last_prefix_position - i + len(needle) - 1)
    for i in range(len(needle) - 1):
        slen = suffix_length(needle, i)
        table[slen] = len(needle) - 1 - i + slen
    return table
    
def is_prefix(needle, p):
    """
    Is needle[p:end] a prefix of needle?
    """
    j = 0
    for i in range(p, len(needle)):
        if needle[i] != needle[j]:
            return 0
        j += 1    
    return 1
    
def suffix_length(needle, p):
    """
    Returns the maximum length of the substring ending at p that is a suffix.
    """
    length = 0;
    j = len(needle) - 1
    for i in reversed(range(p + 1)):
        if needle[i] == needle[j]:
            length += 1
        else:
            break
        j -= 1
    return length

以下是问题中的例子：

def main():
    list1 = [1,0,1,1,1,0,0]
    list2 = [1,0,1,0,1,0,1]
    index = search(list1, [1, 1, 1])
    print(index)
    index = search(list2, [1, 1, 1])
    print(index)

if __name__ == '__main__':
    main()

输出：

2
-1

解决方案 6：

这是一种适用于简单列表的方法，比 Mark 的方法稍微不那么脆弱

def sublistExists(haystack, needle):
    def munge(s):
        return ", "+format(str(s)[1:-1])+","
    return munge(needle) in munge(haystack)

解决方案 7：

def sublistExists(x, y):
  occ = [i for i, a in enumerate(x) if a == y[0]]
  for b in occ:
      if x[b:b+len(y)] == y:
           print 'YES-- SUBLIST at : ', b
           return True
      if len(occ)-1 ==  occ.index(b):
           print 'NO SUBLIST'
           return False

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#should return True
sublistExists(list1, [1,1,1])

#Should return False
sublistExists(list2, [1,1,1])

解决方案 8：

不妨加入@NasBanov 解决方案的递归版本

def foo(sub, lst):
    '''Checks if sub is in lst.

    Expects both arguments to be lists
    '''
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])

解决方案 9：

def sublist(l1,l2):
  if len(l1) < len(l2):
    for i in range(0, len(l1)):
      for j in range(0, len(l2)):
        if l1[i]==l2[j] and j==i+1:
        pass
      return True
  else:
    return False

解决方案 10：

Nas Banov 的答案中提出的函数有一个根本问题：它创建了许多列表，这些列表仅用于比较原始列表的部分内容。

具体来说，当有人写

(sublst == lst[i: i + n]) for i in range(len(lst) - n + 1)

然后创建len(lst) - n + 1长度列表n（在最坏的情况下，即在原始列表中找不到子列表时）。

在某些情况下，这种方法可能会变得非常慢，尤其是当要创建的列表很大时（即子列表很大）。对于这种情况，此实现要快得多：

def is_sublist(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )

让我们比较一下这两个函数（is_sublist_a是 Nas Banov 最初提出的函数，is_sublist_b也是我提出的函数）：

def is_sublist_a(sub_lst, lst):
    n = len(sub_lst)
    return any(
        sub_lst == lst[i: i + n]
        for i in range(len(lst) - n + 1)
    )

def is_sublist_b(sub_lst, lst):
    n = len(sub_lst)
    return any(
        all(lst[i + j] == sub_lst[j] for j in range(n))
        for i in range(len(lst) - n + 1)
    )

让我们看一下最坏的情况（子列表不存在）。这是我用来测量函数返回结果所需时间的函数：

from time import time

def exec_time(is_sublist, lst, sub_lst):
    start = time()
    _ = is_sublist(sub_lst, lst)
    return time() - start

我们可以看到，对于较小的子列表，第一个函数更快：

>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 10)
1.7239012718200684
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 30)
2.3223540782928467
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 50)
3.017274856567383

>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 10)
5.492832899093628
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 30)
5.4729719161987305
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 50)
5.4685280323028564

您可以轻松注意到，函数is_sublist_a更快。但您也可以注意到函数的速度is_sublist_b与子列表的长度无关，而is_sublist_a的速度则取决于。

因此很容易证明这is_sublist_b比更大的子列表要快得多is_sublist_a：

>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 500)
15.868159055709839
>>> exec_time(is_sublist_a, [0] * 10**7, [1] * 1000)
29.75873899459839

>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 500)
5.8182408809661865
>>> exec_time(is_sublist_b, [0] * 10**7, [1] * 1000)
6.155586004257202

解决方案 11：

我知道这可能与原始问题不太相关，但如果两个列表中项目的顺序无关紧要，那么对于其他人来说，这可能是非常优雅的一行解决方案。如果 List1 元素在 List2 中（无论顺序如何），则下面的结果将显示 True。如果顺序很重要，请不要使用此解决方案。

List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)

输出

True

解决方案 12：

如果我理解正确的话，你有一个更大的列表，例如：

list_A= ['john', 'jeff', 'dave', 'shane', 'tim']

然后还有其他列表

list_B= ['sean', 'bill', 'james']

list_C= ['cole', 'wayne', 'jake', 'moose']

然后我将列表 B 和 C 附加到列表 A

list_A.append(list_B)

list_A.append(list_C)

所以当我打印list_A时

print (list_A)

我得到以下输出

['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]

现在我想检查子列表是否存在：

for value in list_A:
    value= type(value)
    value= str(value).strip('<>').split()[1]
    if (value == "'list'"):
        print "True"
    else:
        print "False"

如果大列表中有任何子列表，这将返回“True”。