How to check if one dictionary is a subset of another, larger dictionary?


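Problem description:

I am trying to write a custom filter method that takes an arbitrary number of kwargs and returns a list containing the elements of a database-like list that contain those kwargs.

For example, suppose d1 = {'a':'2', 'b':'3'} and d2 = the same thing. d1 == d2 results in True. But suppose d2 = the same thing plus a bunch of other things. My method needs to be able to tell whether d1 is in d2, but Python cannot do that with dictionaries.

Context:

I have a Word class, and each object has properties such as word, definition, part_of_speech, and so on. I want to be able to call a filter method on the main list of these words, like Word.objects.filter(word='jump', part_of_speech='verb-intransitive'). I cannot figure out how to manage these keys and values at the same time. This could also be useful to other people outside this particular context.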

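Solution 1:

In Python 3, you can use dict.items() to get a set-like view of the dict items. You can then use the <= operator to test whether one view is a "subset" of the other:

d1.items() <= d2.items()

In Python 2.7, use dict.viewitems() to do the same:

d1.viewitems() <= d2.viewitems()

In Python 2.6 and below you will need a different solution, such as using all():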

all(key in d2 and d2[key] == d1[key] for key in d1)
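
Applied to the filter method from the question, the items-view comparison might look like the sketch below. The Word dataclass and the filter_words helper are hypothetical stand-ins (the asker's real class and manager are not shown), and the sketch assumes the attributes live in the instance __dict__, which is what vars() reads.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical stand-in for the asker's Word class.
    word: str
    definition: str
    part_of_speech: str


def filter_words(words, **criteria):
    """Return the words whose attributes include every key/value in criteria."""
    # vars(w) is the instance's attribute dict; the criteria must be a subset of it.
    return [w for w in words if criteria.items() <= vars(w).items()]


words = [
    Word('jump', 'to spring off the ground', 'verb-intransitive'),
    Word('jump', 'a sudden rise', 'noun'),
    Word('run', 'to move fast on foot', 'verb-intransitive'),
]

matches = filter_words(words, word='jump', part_of_speech='verb-intransitive')
print(matches)
```

Calling filter_words with no keyword arguments returns every word, because an empty items view is a subset of any other.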

Solution 2:

Convert to item pairs and check for containment.

all(item in superset.items() for item in subset.items())

Optimization is left as an exercise for the reader.

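Solution 3:

Note for people who need this for unit testing: Python's TestCase class also has an assertDictContainsSubset() method.

http://docs.python.org/2/library/unittest.html?highlight=assertdictcontainssubset#unittest.TestCase.assertDictContainsSubset

However, it has been deprecated since 3.2. I am not sure why; maybe there is a replacement for it.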
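
Given the deprecation, one possible substitute inside a TestCase is to compare items views with assertLessEqual, or to check that merging the expected entries into the actual dict changes nothing. This is only a sketch, not an officially documented replacement; the test class and data are made up for illustration.

```python
import unittest


class SubsetTest(unittest.TestCase):
    def test_expected_entries_present(self):
        expected = {'a': '2'}
        actual = {'a': '2', 'b': '3'}
        # Passes when every (key, value) pair of expected also occurs in actual.
        self.assertLessEqual(expected.items(), actual.items())
        # Alternative: merging expected into actual must leave actual unchanged.
        self.assertEqual(actual, {**actual, **expected})
```

The second form tends to give a more readable failure message, since assertEqual prints a dict diff.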

Solution 4:

For completeness, you can also do this:

def is_subdict(small, big):
    return dict(big, **small) == big

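However, I make no claims whatsoever about speed (or lack thereof) or readability (or lack thereof).

Update: as pointed out in Boris's comment, this trick does not work if your small dict has non-string keys and you are using Python >= 3 (in other words: with arbitrarily typed keys, it only works in legacy Python 2.x).

If you are using Python 3.9 or newer, though, you can make it work with non-string keys and get a neater syntax as well.

Provided your code already has both dictionaries as variables, checking this inline is very concise: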

if big | small == big:
    # do something

Otherwise, or if you prefer a reusable function as above, you can use this:

def is_subdict(small, big):
    return big | small == big

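The working principle is the same as in the first function, only this time it uses the union operator, which was extended to support dicts.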
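
For Python versions from 3.5 up to 3.8, where the | operator for dicts is not available, the same merge-and-compare idea can be sketched with dict unpacking, which also accepts non-string keys. The sample dictionaries are made up; the try block only illustrates the keyword-expansion failure described in the update.

```python
def is_subdict(small, big):
    # Merging small into big leaves big unchanged only if small is a subset.
    return {**big, **small} == big


big = {1: 'one', 2: 'two', 'three': 3}
small = {1: 'one'}

print(is_subdict(small, big))        # True
print(is_subdict({1: 'uno'}, big))   # False

try:
    dict(big, **small)               # keyword expansion requires string keys
except TypeError as exc:
    print('dict(big, **small) failed:', exc)
```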

Solution 5:

To check both keys and values, use:
set(d1.items()).issubset(set(d2.items()))

If you only need to check the keys:
set(d1).issubset(set(d2))
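
One limitation worth keeping in mind: building a set from the items requires every value to be hashable. The sketch below, with made-up data, shows the set() approach failing on list values while a plain all() check, which only compares values with ==, still works.

```python
d1 = {'tags': ['a', 'b']}
d2 = {'tags': ['a', 'b'], 'id': 7}

try:
    set(d1.items()).issubset(set(d2.items()))
except TypeError as exc:
    # Lists are unhashable, so the (key, value) tuples cannot go into a set.
    print('set() approach failed:', exc)

# Comparing with == avoids hashing the values.
print(all(k in d2 and d2[k] == v for k, v in d1.items()))   # True
```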

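Solution 6:

Here is a solution that also properly recurses into lists and sets contained within the dictionary. You can also use it for lists containing dicts, and so on.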

def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
    
    if isinstance(subset, list) or isinstance(subset, set):
        return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)

    # assume that subset is a plain value if none of the above match
    return subset == superset

With Python 3.10, you can use the new match statement to do the type checking:

def is_subset(subset, superset):
    match subset:
        case dict(_): return all(key in superset and is_subset(val, superset[key]) for key, val in subset.items())
        case list(_) | set(_): return all(any(is_subset(subitem, superitem) for superitem in superset) for subitem in subset)
        # assume that subset is a plain value if none of the above match
        case _: return subset == superset
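
A short usage sketch may help clarify how lists are treated: each element of the subset list only needs to match some element of the superset list, so order and multiplicity are ignored. The function is repeated from the isinstance-based version above so the snippet runs on its own; the sample data is made up.

```python
def is_subset(subset, superset):
    if isinstance(subset, dict):
        return all(key in superset and is_subset(val, superset[key])
                   for key, val in subset.items())
    if isinstance(subset, (list, set)):
        # Every sub-item must match at least one item of the superset.
        return all(any(is_subset(subitem, superitem) for superitem in superset)
                   for subitem in subset)
    # Anything else is treated as a plain value.
    return subset == superset


big = {'a': [{'x': 1, 'y': 2}, {'z': 3}], 'b': 0}

print(is_subset({'a': [{'x': 1}]}, big))        # True
print(is_subset({'a': [4]}, {'a': [1, 2, 3]}))  # False
print(is_subset([1, 1], [1]))                   # True: duplicates are not counted
```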

Solution 7:

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True

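Context (this is a Python 2 session; on Python 3, use items() instead of iteritems() and next(gen) instead of gen.next()):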

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> list(d1.iteritems())
[('a', '2'), ('b', '3')]
>>> [(k,v) for k,v in d1.iteritems()]
[('a', '2'), ('b', '3')]
>>> k,v = ('a','2')
>>> k
'a'
>>> v
'2'
>>> k in d2
True
>>> d2[k]
'2'
>>> k in d2 and d2[k]==v
True
>>> [(k in d2 and d2[k]==v) for k,v in d1.iteritems()]
[True, True]
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems())
<generator object <genexpr> at 0x02A9D2B0>
>>> ((k in d2 and d2[k]==v) for k,v in d1.iteritems()).next()
True
>>> all((k in d2 and d2[k]==v) for k,v in d1.iteritems())
True
>>>

Solution 8:

For Python 3.9, this is what I use:

def dict_contains_dict(small: dict, big: dict):
    return (big | small) == big

Solution 9:

My function for the same purpose, doing it recursively:

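def dictMatch(patn, real):
    """Does the real dict match the pattern?"""
    try:
        for pkey, pvalue in patn.items():  # use iteritems() on Python 2
            if isinstance(pvalue, dict):
                # Nested pattern: the sub-dict must match recursively.
                if not dictMatch(pvalue, real[pkey]):
                    return False
            elif real[pkey] != pvalue:
                return False
    except KeyError:
        # A key required by the pattern is missing from real.
        return False
    # An empty pattern matches anything.
    return True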

In your example, dictMatch(d1, d2) should return True even if d2 has other stuff in it, and it also applies to lower levels:

d1 = {'a':'2', 'b':{3: 'iii'}}
d2 = {'a':'2', 'b':{3: 'iii', 4: 'iv'},'c':'4'}

dictMatch(d1, d2)   # True

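Note: there could be an even better solution that avoids the explicit dict type check on pvalue and applies to a wider range of cases (such as lists of hashes). Also, the recursion depth is not limited here, so use at your own risk. ;)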
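
As one way to address both points in the note, the sketch below replaces the recursion with an explicit stack and checks against collections.abc.Mapping rather than dict. The name dict_match_iterative is hypothetical, and the sketch still only descends into mappings, not into lists.

```python
from collections.abc import Mapping


def dict_match_iterative(patn, real):
    """Return True if every (possibly nested) entry of patn appears in real."""
    stack = [(patn, real)]
    while stack:
        p, r = stack.pop()
        if not isinstance(r, Mapping):
            return False  # the pattern expects a mapping at this level
        for key, pvalue in p.items():
            if key not in r:
                return False
            if isinstance(pvalue, Mapping):
                # Defer the nested comparison instead of recursing.
                stack.append((pvalue, r[key]))
            elif r[key] != pvalue:
                return False
    return True


d1 = {'a': '2', 'b': {3: 'iii'}}
d2 = {'a': '2', 'b': {3: 'iii', 4: 'iv'}, 'c': '4'}
print(dict_match_iterative(d1, d2))   # True
```

Because no Python-level recursion is involved, nesting deeper than the interpreter's recursion limit does not raise RecursionError.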

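Solution 10:

I know this question is a bit old, but here is my solution for checking whether one nested dictionary is part of another nested dictionary. The solution is recursive.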

def compare_dicts(a, b):
    for key, value in a.items():
        if key in b:
            if isinstance(a[key], dict):
                if not compare_dicts(a[key], b[key]):
                    return False
            elif value != b[key]:
                return False
        else:
            return False
    return True

Solution 11:

If you don't mind using pydash, it has an is_match function that does exactly this:

import pydash

a = {1:2, 3:4, 5:{6:7}}
b = {3:4.0, 5:{6:8}}
c = {3:4.0, 5:{6:7}}

pydash.predicates.is_match(a, b) # False
pydash.predicates.is_match(a, c) # True

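Solution 12:

This seemingly straightforward question cost me a couple of hours of research to find a 100% reliable solution, so I documented my findings in this answer.

"Pythonically" speaking, small_dict <= big_dict would be the most intuitive way, but unfortunately it does not work. {'a': 1} < {'a': 1, 'b': 2} seemingly works in Python 2, but it is not reliable, because the official documentation explicitly calls it out: search the documentation for "Outcomes other than equality are resolved consistently, but are not otherwise defined." Not to mention that comparing two dicts this way in Python 3 raises a TypeError exception.
The second most intuitive approach is small.viewitems() <= big.viewitems(), which works on Python 2.7 only, and small.items() <= big.items() on Python 3. There is one caveat, though: it is potentially buggy. If your program could be run on Python <= 2.6, d1.items() <= d2.items() actually compares two lists of tuples in no particular order, so the result is unreliable and becomes a nasty bug in your program. I am not keen to write yet another implementation for Python <= 2.6, but I still do not feel comfortable shipping code with a known bug (even on an unsupported platform), so I abandoned this approach.
I settled on @blubberdiblub's answer (credit goes to him):

def is_subdict(small, big):
    return dict(big, **small) == big

It is worth pointing out that this answer relies on the == behavior between dicts, which is clearly defined in the official documentation and hence should work in every Python version (the dict(big, **small) call itself still needs string keys on Python 3, as the update in Solution 4 notes). Go search:

* "Dictionaries compare equal if and only if they have the same (key, value) pairs." is the last sentence on this page
* "Mappings (instances of dict) compare equal if and only if they have equal (key, value) pairs. Equality comparison of the keys and elements enforces reflexivity." on this page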
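
The three approaches discussed in this answer can be compared side by side on Python 3 with a small sketch; the sample dictionaries are made up.

```python
small = {'a': 1}
big = {'a': 1, 'b': 2}

try:
    small <= big
except TypeError as exc:
    # Python 3 defines no ordering between dicts.
    print('dict <= dict is not supported:', exc)

print(small.items() <= big.items())   # True
print(dict(big, **small) == big)      # True (works here because the keys are strings)
```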

Solution 13:

Here is a general recursive solution to the given problem:

import unittest

def is_subset(superset, subset):
    for key, value in subset.items():
        if key not in superset:
            return False

        if isinstance(value, dict):
            if not is_subset(superset[key], value):
                return False

        elif isinstance(value, str):
            if value not in superset[key]:
                return False

        elif isinstance(value, list):
            if not set(value) <= set(superset[key]):
                return False
        elif isinstance(value, set):
            if not value <= superset[key]:
                return False

        else:
            if not value == superset[key]:
                return False

    return True


class Foo(unittest.TestCase):

    def setUp(self):
        self.dct = {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
            'f': {
                'a': 'hello world',
                'b': 12345,
                'c': 1.2345,
                'd': [1, 2, 3, 4, 5],
                'e': {1, 2, 3, 4, 5},
                'g': False,
                'h': None
            },
            'g': False,
            'h': None,
            'question': 'mcve',
            'metadata': {}
        }

    def tearDown(self):
        pass

    def check_true(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), True)

    def check_false(self, superset, subset):
        return self.assertEqual(is_subset(superset, subset), False)

    def test_simple_cases(self):
        self.check_true(self.dct, {'a': 'hello world'})
        self.check_true(self.dct, {'b': 12345})
        self.check_true(self.dct, {'c': 1.2345})
        self.check_true(self.dct, {'d': [1, 2, 3, 4, 5]})
        self.check_true(self.dct, {'e': {1, 2, 3, 4, 5}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'b': 12345,
            'c': 1.2345,
            'd': [1, 2, 3, 4, 5],
            'e': {1, 2, 3, 4, 5},
        }})
        self.check_true(self.dct, {'g': False})
        self.check_true(self.dct, {'h': None})

    def test_tricky_cases(self):
        self.check_true(self.dct, {'a': 'hello'})
        self.check_true(self.dct, {'d': [1, 2, 3]})
        self.check_true(self.dct, {'e': {3, 4}})
        self.check_true(self.dct, {'f': {
            'a': 'hello world',
            'h': None
        }})
        self.check_false(
            self.dct, {'question': 'mcve', 'metadata': {'author': 'BPL'}})
        self.check_true(
            self.dct, {'question': 'mcve', 'metadata': {}})
        self.check_false(
            self.dct, {'question1': 'mcve', 'metadata': {}})

if __name__ == "__main__":
    unittest.main()

Note: the original code failed in certain cases; credit for the fix goes to @olivier-melançon.

Solution 14:

Another way of doing it:

>>> d1 = {'a':'2', 'b':'3'}
>>> d2 = {'a':'2', 'b':'3','c':'4'}
>>> d3 = {'a':'1'}
>>> set(d1.items()).issubset(d2.items())
True
>>> set(d3.items()).issubset(d2.items())
False

Solution 15:

Use this wrapper object, which provides partial comparison and nice diffs:


class DictMatch(dict):
    """Partial match of a dictionary against another one."""
    def __eq__(self, other: dict):
        assert isinstance(other, dict)
        # A key missing from other counts as a mismatch instead of raising KeyError.
        return all(name in other and other[name] == value for name, value in self.items())

actual_name = {'praenomen': 'Gaius', 'nomen': 'Julius', 'cognomen': 'Caesar'}
expected_name = DictMatch({'praenomen': 'Gaius'})  # partial match
assert expected_name == actual_name  # True

Solution 16:

Most of the answers will not work if the dict contains arrays of other dicts. Here is a solution for that case:

def d_eq(d, d1):
    if not isinstance(d, (dict, list)):
        return d == d1
    if isinstance(d, list):
        return all(d_eq(a, b) for a, b in zip(d, d1))
    return all(d.get(i) == d1[i] or d_eq(d.get(i), d1[i]) for i in d1)

def is_sub(d, d1):
    if isinstance(d, list):
        return any(is_sub(i, d1) for i in d)
    return d_eq(d, d1) or (isinstance(d, dict) and any(is_sub(b, d1) for b in d.values()))

# dct_1 is the larger structure, dict_2 the candidate subset (both defined elsewhere)
print(is_sub(dct_1, dict_2))

Taken from "How to check if dict is subset of another complex dict".

Solution 17:

assert d1 == {k: v for (k,v) in d2.items() if k in d1}

This gives a clean error message under pytest, because pytest has built-in support for dict equality assertions.

Solution 18:

This function works for non-hashable values. I also think it is clear and easy to read.

def isSubDict(subDict,dictionary):
    for key in subDict.keys():
        if (not key in dictionary) or (not subDict[key] == dictionary[key]):
            return False
    return True

In [126]: isSubDict({1:2},{3:4})
Out[126]: False

In [127]: isSubDict({1:2},{1:2,3:4})
Out[127]: True

In [128]: isSubDict({1:{2:3}},{1:{2:3},3:4})
Out[128]: True

In [129]: isSubDict({1:{2:3}},{1:{2:4},3:4})
Out[129]: False
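
Two properties of this function may be worth spelling out with a sketch (the function is repeated so the snippet runs on its own; the inputs are made up): unhashable values such as lists are fine, and nested dicts are compared with ==, so a nested value must match exactly rather than as a subset.

```python
def isSubDict(subDict, dictionary):
    # Same logic as above: every key must exist with an equal value.
    for key in subDict:
        if key not in dictionary or subDict[key] != dictionary[key]:
            return False
    return True


print(isSubDict({'a': [1, 2]}, {'a': [1, 2], 'b': 3}))   # True: list values work
print(isSubDict({1: {2: 3}}, {1: {2: 3, 5: 6}}))          # False: nested dicts must be equal
```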

Solution 19:

A short recursive implementation that works for nested dictionaries:

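def compare_dicts(a, b):
    if isinstance(a, dict):
        if not a:
            return True  # an empty dict is contained in anything
        key, val = a.popitem()
        return isinstance(b, dict) and key in b and compare_dicts(val, b.pop(key)) and compare_dicts(a, b)
    # Plain values (including falsy ones such as 0 or '') must be equal.
    return a == b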

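This consumes the a and b dicts. If anyone knows a good way to avoid that without resorting to partially iterative solutions as in other answers, please tell me. I would need a way to split a dict into a head and a tail based on a key.

This code is more useful as a programming exercise, and it is probably a lot slower than the other solutions here that mix recursion and iteration. @Nutcracker's solution is pretty good for nested dictionaries.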
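
One possible answer to the head-and-tail question, offered as a sketch rather than the author's method: an iterator over a.items() can play the role of the tail, with next() yielding the head, so neither dict is modified. The names is_nested_subset and _match_items are hypothetical.

```python
_DONE = object()  # sentinel marking an exhausted iterator


def is_nested_subset(a, b):
    """Non-destructive variant: True if a is contained in b, recursing into dicts."""
    if isinstance(a, dict):
        return isinstance(b, dict) and _match_items(iter(a.items()), b)
    return a == b


def _match_items(items, b):
    """Check the remaining (key, value) pairs of an items iterator against b."""
    head = next(items, _DONE)
    if head is _DONE:
        return True  # no pairs left, so everything matched
    key, val = head
    return key in b and is_nested_subset(val, b[key]) and _match_items(items, b)


a = {'x': 0, 'y': {'z': 1}}
b = {'x': 0, 'y': {'z': 1, 'w': 2}, 'q': 3}
print(is_nested_subset(a, b))   # True
print(a)                        # unchanged: {'x': 0, 'y': {'z': 1}}
```

The recursion depth grows with the number of keys per dict as well as with the nesting depth, so this remains an exercise more than a practical implementation.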

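Solution 20:

I have read most of the answers here and still think something is missing. Specifically, two aspects have to be considered for a solution that is as simple as possible:

* There is a difference between the presented sub-dict being equal to a part of the larger dict and the presented sub-dict being identical to (the very same object as) a part of the larger dict.
* The solution must be able to cope with nested dicts and nested lists, whatever the complexity.

Note: for my purposes, I do not regard a dict as a "sub-dict" of another, "superior" dict merely because the latter contains all the keys of the sub-dict plus some other keys, even if the shared keys have matching values. This may seem a bit strange, but I think the test of equality/identity is a function in its own right, and for simplicity I have applied the strictest tests to start with.

Instead, the main criterion is to find the subordinate dict (equal or identical) somewhere in the structure of the superior dict, with possibly deep nesting through dicts and lists.

It is fairly simple to tweak this to apply looser conditions on key matching and/or key-value matching. I think it is legitimate to separate that function from the task of "walking" the structure. Linguistically speaking, I think it is better to say that a dict which has more keys than another, but whose common key-value pairs all match, extends the sub-dict...

My solution: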

def is_subdict(subdict, superdict, identity_required=True, depth=0):
    if depth == 0:
        if not isinstance(subdict, dict):
            print(f'subdict is type {type(subdict)}, not dict: {subdict}')
            return False
        if not isinstance(superdict, dict):
            print(f'superdict is type {type(superdict)}, not dict: {superdict}')
            return False
        
    if dicts_match(subdict, superdict, identity_required):
        return True
    
    for key, value in superdict.items():
        if isinstance(value, dict):
            if is_subdict(subdict, value, identity_required, depth+1):
                return True
            
        elif isinstance(value, list):
            for element in value:
                if isinstance(element, dict):
                    if is_subdict(subdict, element, identity_required, depth+1):
                        return True
    return False

def dicts_match(subdict, superdict, identity_required):
    if identity_required:
        # strictest test for identity (can be modified)
        return subdict is superdict
    else:
        # strictest test for equality (can be modified)
        return subdict == superdict

Essentially, the task comes down to this: explore every dict value anywhere inside the big dict, then apply whatever test (equality/identity) you want.
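
The looser key-value matching mentioned earlier could be sketched as an alternative matcher with the same parameters as dicts_match. The name dicts_match_loose is hypothetical and not part of the original answer; it accepts a candidate whose pairs all occur in the dict being examined, even when that dict has extra keys.

```python
def dicts_match_loose(subdict, superdict, identity_required=False):
    if identity_required:
        # Identity cannot be loosened: it is the very same object or it is not.
        return subdict is superdict
    # Every key/value pair of subdict must also be present in superdict.
    return all(key in superdict and superdict[key] == value
               for key, value in subdict.items())


node = {'title': 'bobble', 'child_items': []}
print(dicts_match_loose({'title': 'bobble'}, node))                   # True
print(dicts_match_loose({'title': 'bobble', 'mash': 'mish'}, node))   # False
```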

Here is some test data:

TITLE_STRING = 'title'
CHILD_ITEMS_STRING = 'child_items'
TREE_STRING = 'tree'

MAIN_DICT = {
    TREE_STRING: {
        CHILD_ITEMS_STRING: [
            {  TITLE_STRING: 'bobble',
                CHILD_ITEMS_STRING: [
                    {   TITLE_STRING: 'son of bobble',
                        CHILD_ITEMS_STRING: [
                            { TITLE_STRING: 'grandoffspring 1 of bobble',},
                            { TITLE_STRING: 'grandoffspring 2 of bobble',
                                CHILD_ITEMS_STRING: [
                                    { TITLE_STRING: 'great-grandoffspring 1 of bobble', },
                                ]
                            },
                            { TITLE_STRING: 'grandoffspring 3 of bobble', },
                        ]
                    },
                    { TITLE_STRING: '2nd son of bobble', },
                    { TITLE_STRING: '3rd son of bobble',
                        CHILD_ITEMS_STRING: [
                            { TITLE_STRING: 'granddaughter 1 of bobble', },
                            { TITLE_STRING: 'granddaughter 2 of bobble', },
                            { TITLE_STRING: 'granddaughter 3 of bobble', },
                        ]
                    },
                    { TITLE_STRING: '4th son of bobble', },
                ]
            },
            { TITLE_STRING: 'sibling of bobble', },
        ]
    }
}

# looks like part of the above, but instead is just **equal** to part of the above
TEST_DATA_4 = { TITLE_STRING: 'grandoffspring 1 of bobble', }

# **equal** to part of the above, except has an extra key
TEST_DATA_5 = { TITLE_STRING: 'grandoffspring 1 of bobble',
    'mash': 'mish'
}

...and some tests:

dict_actually_inside_superdict = MAIN_DICT[TREE_STRING][CHILD_ITEMS_STRING][0][CHILD_ITEMS_STRING][0][CHILD_ITEMS_STRING][1] 
print(f'dict_actually_inside_superdict[TITLE] {dict_actually_inside_superdict[TITLE_STRING]}') # ("grandoffspring 2 of bobble")
print(f'+++ is_subdict(dict_actually_inside_superdict, MAIN_DICT) {is_subdict(dict_actually_inside_superdict, MAIN_DICT)}') # True - subdict was found, on basis of identity, somewhere in superdict
print(f'+++ is_subdict(TEST_DATA_4, MAIN_DICT) {is_subdict(TEST_DATA_4, MAIN_DICT)}') # False: "subdict" not **inside** superdict, but keys and values match a part of superdict
print(f'+++ is_subdict(TEST_DATA_4, MAIN_DICT, False) {is_subdict(TEST_DATA_4, MAIN_DICT, False)}') # True: "subdict" not **inside** superdict, but keys and values match a part of superdict
print(f'+++ is_subdict(TEST_DATA_5, MAIN_DICT, False) {is_subdict(TEST_DATA_5, MAIN_DICT, False)}') # False: extra key in submitted "subdict"

Solution 21:

The AnyWithEntries matcher from the anys package makes this simple and supports arbitrary subset matching of nested dicts:

from anys import AnyWithEntries

assert {'a': '2', 'b': '3'} == AnyWithEntries({'a': '2'})
assert {'a': '2', 'b': '3'} != AnyWithEntries({'c': '2'})
assert {'a': '2', 'b': {'x': True, 'y': False}} == AnyWithEntries({
    'a': '2',
    'b': AnyWithEntries({'x': True}),
})
assert {'a': '2', 'b': {'x': True, 'y': False}} != AnyWithEntries({
    'a': '2',
    'b': {'x': True},
})

The downside is that pytest's assertion error explanations become less readable.