Comparing Python dictionaries and nested dictionaries

Question:

I know there are many similar questions, but mine is quite different and I find it difficult. I have two dictionaries:

d1 = {'a': {'b': {'cs': 10}, 'd': {'cs': 20}}}
d2 = {'a': {'b': {'cs': 30}, 'd': {'cs': 20}}, 'newa': {'q': {'cs': 50}}}

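That is, d1 has key 'a', and d2 has keys 'a' and 'newa' (in other words, d1 is my old dictionary and d2 is my new one).

I want to iterate over these dictionaries and, where a key is the same, check its value (a nested dictionary). For example, when I find key 'a' in d2, I check whether 'b' exists; if it does, I check the value of 'cs' (changed from 10 to 30), and if that value has changed, I want to print it.

The other case is that I want to report key 'newa' from d2 as a newly added key.

So, after iterating over these two dictionaries, the expected output is: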

"d2" has new key "newa"
Value of "cs" is changed from 10 to 30 of key "b" which is of key "a"

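I have the following code, which tries to do this with many loops that do not work. That is not a good approach anyway, so I am looking for a way to get the expected output with recursive code.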

for k, v in d1.iteritems():
    for k1, v1 in d2.iteritems():
        if k is k1:
            print k
            for k2 in v:
                for k3 in v1:
                    if k2 is k3:
                        print k2, "sub key matched"

        else:
            print "sorry no match found"

Solution 1:

Compare the two dictionaries using recursion:

Edited for Python 3 (also works in Python 2):

d1= {'a':{'b':{'cs':10},'d':{'cs':20}}}
d2= {'a':{'b':{'cs':30} ,'d':{'cs':20}},'newa':{'q':{'cs':50}}}

def findDiff(d1, d2, path=""):
    for k in d1:
        if k in d2:
            if type(d1[k]) is dict:
                findDiff(d1[k],d2[k], "%s -> %s" % (path, k) if path else k)
            elif d1[k] != d2[k]:  # report leaf values only; nested dicts are handled by the recursion
                result = [ "%s: " % path, " - %s : %s" % (k, d1[k]) , " + %s : %s" % (k, d2[k])]
                print("\n".join(result))
        else:
            print ("%s%s as key not in d2\n" % ("%s: " % path if path else "", k))

print("comparing d1 to d2:")
findDiff(d1,d2)
print("comparing d2 to d1:")
findDiff(d2,d1)

Old Python 2 answer:

def findDiff(d1, d2, path=""):
    for k in d1:
        if (k not in d2):
            print (path, ":")
            print (k + " as key not in d2", "\n")
        else:
            if type(d1[k]) is dict:
                # Build the child path in a new variable so sibling keys keep the parent's path
                child_path = k if path == "" else path + "->" + k
                findDiff(d1[k],d2[k], child_path)
            else:
                if d1[k] != d2[k]:
                    print (path, ":")
                    print (" - ", k," : ", d1[k])
                    print (" + ", k," : ", d2[k])

Output:

comparing d1 to d2:
a -> b: 
 - cs : 10
 + cs : 30
comparing d2 to d1:
a -> b: 
 - cs : 30
 + cs : 10
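
The question asks for messages in a particular wording rather than a generic diff. The following is a minimal sketch of one way to produce them and is not part of the original answer: the function name `describe_changes`, its `parents` parameter, and the "under key" wording for nested additions are assumptions made for illustration. It reports keys added to the new dictionary and values that changed, and deliberately ignores keys that were removed.

```python
def describe_changes(old, new, parents=()):
    """Return messages for keys added in `new` and values changed since `old`."""
    messages = []
    # For parents ('a', 'b') this gives: "b" which is of key "a"
    where = " which is of key ".join('"%s"' % p for p in reversed(parents))

    # First pass: keys that exist only in the new dictionary
    for key in new:
        if key not in old:
            suffix = " under key " + where if where else ""
            messages.append('"d2" has new key "%s"%s' % (key, suffix))

    # Second pass: keys present in both; recurse into nested dictionaries
    for key in new:
        if key not in old:
            continue
        if isinstance(old[key], dict) and isinstance(new[key], dict):
            messages.extend(describe_changes(old[key], new[key], parents + (key,)))
        elif old[key] != new[key]:
            suffix = " of key " + where if where else ""
            messages.append('Value of "%s" is changed from %s to %s%s'
                            % (key, old[key], new[key], suffix))
    return messages


d1 = {'a': {'b': {'cs': 10}, 'd': {'cs': 20}}}
d2 = {'a': {'b': {'cs': 30}, 'd': {'cs': 20}}, 'newa': {'q': {'cs': 50}}}
for message in describe_changes(d1, d2):
    print(message)
```

For the two dictionaries in the question this prints the two lines given there as the expected output. Returning a list instead of printing inside the recursion also makes the result easy to test or log.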

Solution 2:

Modified user3's code to make it better:

d1= {'as': 1, 'a':
        {'b':
            {'cs':10,
             'qqq': {'qwe':1}
            },
            'd': {'csd':30}
        }
    }
d2= {'as': 3, 'a':
        {'b':
            {'cs':30,
             'qqq': 123
            },
            'd':{'csd':20}
        },
        'newa':
        {'q':
            {'cs':50}
        }
    }

def compare_dictionaries(dict_1, dict_2, dict_1_name, dict_2_name, path=""):
    """Compare two dictionaries recursively to find non matching elements

    Args:
        dict_1: dictionary 1
        dict_2: dictionary 2

    Returns: string

    """
    err = ''
    key_err = ''
    value_err = ''
    old_path = path
    for k in dict_1.keys():
        path = old_path + "[%s]" % k
        if k not in dict_2:
            key_err += "Key %s%s not in %s\n" % (dict_1_name, path, dict_2_name)
        else:
            if isinstance(dict_1[k], dict) and isinstance(dict_2[k], dict):
                err += compare_dictionaries(dict_1[k], dict_2[k], dict_1_name, dict_2_name, path)
            else:
                if dict_1[k] != dict_2[k]:
                    value_err += "Value of %s%s (%s) not same as %s%s (%s)\n"\
                        % (dict_1_name, path, dict_1[k], dict_2_name, path, dict_2[k])

    for k in dict_2.keys():
        path = old_path + "[%s]" % k
        if k not in dict_1:
            key_err += "Key %s%s not in %s\n" % (dict_2_name, path, dict_1_name)

    return key_err + value_err + err


a = compare_dictionaries(d1,d2,'d1','d2')
print(a)

Output:

Key d2[newa] not in d1
Value of d1[as] (1) not same as d2[as] (3)
Value of d1[a][b][cs] (10) not same as d2[a][b][cs] (30)
Value of d1[a][b][qqq] ({'qwe': 1}) not same as d2[a][b][qqq] (123)
Value of d1[a][d][csd] (30) not same as d2[a][d][csd] (20)

Solution 3:

Why not use the deepdiff library?

See: https://github.com/seperman/deepdiff

>>> from deepdiff import DeepDiff
>>> t1 = {1:1, 3:3, 4:4}
>>> t2 = {1:1, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> print(ddiff)
{'dictionary_item_added': {'root[5]', 'root[6]'}, 'dictionary_item_removed': {'root[4]'}}

It is of course much more powerful than this; see the documentation for more.

Solution 4:

This should give you the helper functions you need:

For Python 2.7:

def isDict(obj):
    return obj.__class__.__name__ == 'dict'

def containsKeyRec(vKey, vDict):
    for curKey in vDict:
        if curKey == vKey or (isDict(vDict[curKey]) and containsKeyRec(vKey, vDict[curKey])):
            return True
    return False

def getValueRec(vKey, vDict):
    for curKey in vDict:
        if curKey == vKey:
            return vDict[curKey]
        elif isDict(vDict[curKey]) and containsKeyRec(vKey, vDict[curKey]):
            # Return the nested value itself, not the boolean from containsKeyRec
            return getValueRec(vKey, vDict[curKey])
    return None

d1= {'a':{'b':{'cs':10},'d':{'cs':20}}}
d2= {'a':{'b':{'cs':30} ,'d':{'cs':20}},'newa':{'q':{'cs':50}}}

for key in d1:
    if containsKeyRec(key, d2):
        print "dict d2 contains key: " + key
        d2Value = getValueRec(key, d2)
        if d1[key] == d2Value:
            print "values are equal, d1: " + str(d1[key]) + ", d2: " + str(d2Value)
        else:
            print "values are not equal, d1: " + str(d1[key]) + ", d2: " + str(d2Value)

    else:
        print "dict d2 does not contain key: " + key

For Python 3 (or later):

def id_dict(obj):
    return obj.__class__.__name__ == 'dict'


def contains_key_rec(v_key, v_dict):
    for curKey in v_dict:
        if curKey == v_key or (id_dict(v_dict[curKey]) and contains_key_rec(v_key, v_dict[curKey])):
            return True
    return False


def get_value_rec(v_key, v_dict):
    for curKey in v_dict:
        if curKey == v_key:
            return v_dict[curKey]
        elif id_dict(v_dict[curKey]) and contains_key_rec(v_key, v_dict[curKey]):
            # Return the nested value itself, not the boolean from contains_key_rec
            return get_value_rec(v_key, v_dict[curKey])
    return None


d1 = {'a': {'b': {'cs': 10}, 'd': {'cs': 20}}}
d2 = {'a': {'b': {'cs': 30}, 'd': {'cs': 20}}, 'newa': {'q': {'cs': 50}}}

for key in d1:
    if contains_key_rec(key, d2):
        d2_value = get_value_rec(key, d2)
        if d1[key] == d2_value:
            print("values are equal, d1: " + str(d1[key]) + ", d2: " + str(d2_value))
        else:
            print("values are not equal:\n"
                  "list1: " + str(d1[key]) + "\n" +
                  "list2: " + str(d2_value))
    else:
        print("dict d2 does not contain key: " + key)

Solution 5:

Code for Python 3 or later that compares any data.

import logging
import operator

log = logging.getLogger(__name__)


def do_compare(data1, data2, data1_name, data2_name, path=""):
    if operator.eq(data1, data2) and not path:
        log.info("Both data have same content")
    else:
        if isinstance(data1, dict) and isinstance(data2, dict):
            compare_dict(data1, data2, data1_name, data2_name, path)
        elif isinstance(data1, list) and isinstance(data2, list):
            compare_list(data1, data2, data1_name, data2_name, path)
        else:
            if data1 != data2:
                value_err = "Value of %s%s (%s) not same as %s%s (%s)\n"\
                            % (data1_name, path, data1, data2_name, path, data2)
                print (value_err)
        # findDiff(data1, data2)

def compare_dict(data1, data2, data1_name, data2_name, path):
    old_path = path
    for k in data1.keys():
        path = old_path + "[%s]" % k
        if k not in data2:
            key_err = "Key %s%s not in %s\n" % (data1_name, path, data2_name)
            print (key_err)
        else:
            do_compare(data1[k], data2[k], data1_name, data2_name, path)
    for k in data2.keys():
        path = old_path + "[%s]" % k
        if k not in data1:
            key_err = "Key %s%s not in %s\n" % (data2_name, path, data1_name)
            print (key_err)

def compare_list(data1, data2, data1_name, data2_name, path):
    data1_length = len(data1)
    data2_length = len(data2)
    old_path = path
    if data1_length != data2_length:
        value_err = "No: of items in %s%s (%s) not same as %s%s (%s)\n"\
                            % (data1_name, path, data1_length, data2_name, path, data2_length)
        print (value_err)
    for index, item in enumerate(data1):
        path = old_path + "[%s]" % index
        try:
            do_compare(data1[index], data2[index], data1_name, data2_name, path)
        except IndexError:
            pass
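
The same dict-and-list traversal can also yield its findings instead of printing them, which leaves the caller free to log, filter, or assert on them. This is a rough sketch under that assumption, not the answer's own code: `iter_differences` and the `MISSING` sentinel are hypothetical names introduced here, and the path format simply mirrors the `[key]` style used above.

```python
class _Missing:
    """Sentinel type marking a side that has no value at a given path."""

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()


def iter_differences(data1, data2, path=""):
    """Yield (path, value1, value2) for every place the two structures differ."""
    if isinstance(data1, dict) and isinstance(data2, dict):
        for key in data1:
            key_path = "%s[%s]" % (path, key)
            if key in data2:
                yield from iter_differences(data1[key], data2[key], key_path)
            else:
                yield (key_path, data1[key], MISSING)
        for key in data2:
            if key not in data1:
                yield ("%s[%s]" % (path, key), MISSING, data2[key])
    elif isinstance(data1, list) and isinstance(data2, list):
        # Walk up to the longer length so extra items on either side are reported
        for index in range(max(len(data1), len(data2))):
            item_path = "%s[%s]" % (path, index)
            if index >= len(data1):
                yield (item_path, MISSING, data2[index])
            elif index >= len(data2):
                yield (item_path, data1[index], MISSING)
            else:
                yield from iter_differences(data1[index], data2[index], item_path)
    elif data1 != data2:
        yield (path, data1, data2)


old = {'tags': ['a', 'b'], 'size': {'w': 1}}
new = {'tags': ['a', 'c', 'd'], 'size': {'w': 1}, 'extra': None}
for difference in iter_differences(old, new):
    print(difference)
```

Using a dedicated sentinel rather than None keeps "key is absent" distinguishable from "key is present with value None", as with `'extra'` in the example.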

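Solution 6:

Adding a version with some more features:

Can compare arbitrarily nested JSON-like dictionaries and lists
Lets you specify keys to ignore (for example in flaky unit tests)
Lets you specify keys with numeric values that are treated as equal as long as the values are within a certain percentage of each other

If you define the deep_diff function as shown below and call it on @rkatkam's example, you get: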

>>> deep_diff(d1, d2)

{'newa': (None, {'q': {'cs': 50}}), 'a': {'b': {'cs': (10, 30)}}}

Here is the function definition:

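from copy import deepcopy


def float_or_None(x):
    # Helper assumed by deep_diff (its definition is not shown in the answer):
    # convert to float, or return None when the value is not numeric
    try:
        return float(x)
    except (TypeError, ValueError):
        return None


def deep_diff(x, y, parent_key=None, exclude_keys=[], epsilon_keys=[]):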
    """
    Take the deep diff of JSON-like dictionaries

    No warranties when keys, or values are None

    """
    # pylint: disable=unidiomatic-typecheck

    EPSILON = 0.5
    rho = 1 - EPSILON

    if x == y:
        return None

    if parent_key in epsilon_keys:
        xfl, yfl = float_or_None(x), float_or_None(y)
        if xfl and yfl and xfl * yfl >= 0 and rho * xfl <= yfl and rho * yfl <= xfl:
            return None

    if not (isinstance(x, (list, dict)) and (isinstance(x, type(y)) or isinstance(y, type(x)))):
        return x, y

    if isinstance(x, dict):
        d = type(x)()  # handles OrderedDict's as well
        for k in x.keys() ^ y.keys():
            if k in exclude_keys:
                continue
            if k in x:
                d[k] = (deepcopy(x[k]), None)
            else:
                d[k] = (None, deepcopy(y[k]))

        for k in x.keys() & y.keys():
            if k in exclude_keys:
                continue

            next_d = deep_diff(
                x[k], y[k], parent_key=k, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys
            )
            if next_d is None:
                continue

            d[k] = next_d

        return d if d else None

    # assume a list:
    d = [None] * max(len(x), len(y))
    flipped = False
    if len(x) > len(y):
        flipped = True
        x, y = y, x

    for i, x_val in enumerate(x):
        d[i] = (
            deep_diff(
                y[i], x_val, parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys
            )
            if flipped
            else deep_diff(
                x_val, y[i], parent_key=i, exclude_keys=exclude_keys, epsilon_keys=epsilon_keys
            )
        )

    for i in range(len(x), len(y)):
        d[i] = (y[i], None) if flipped else (None, y[i])

    return None if all(map(lambda x: x is None, d)) else d
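
The tolerance rule applied to `epsilon_keys` in the function above can be looked at in isolation. The sketch below restates that condition as a standalone predicate; the name `within_tolerance` and its `epsilon` parameter are introduced here for illustration only, and the conversion to float is an assumption about how non-numeric values should be treated.

```python
def within_tolerance(x, y, epsilon=0.5):
    """Return True when x and y pass the relative check rho*x <= y and rho*y <= x."""
    rho = 1 - epsilon
    try:
        xf, yf = float(x), float(y)
    except (TypeError, ValueError):
        return False  # non-numeric values are never considered close
    if not xf or not yf or xf * yf < 0:
        return False  # zero on either side, or opposite signs
    return rho * xf <= yf and rho * yf <= xf


print(within_tolerance(10, 8))    # True: 5 <= 8 and 4 <= 10
print(within_tolerance(10, 4))    # False: 5 <= 4 fails
print(within_tolerance(-10, -8))  # False: -5 <= -8 fails
```

As the last call suggests, this particular condition can only succeed for positive values: with 0 < rho < 1 and both numbers negative, the two inequalities contradict each other. Exactly equal values never reach the check in `deep_diff`, because `x == y` returns early.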

Solution 7:

Adding a non-recursive solution.

  # Non Recursively traverses through a large nested dictionary
  # Uses a queue of dicts_to_process to keep track of what needs to be traversed rather than using recursion.
  # Slightly more complex than the recursive version, but arguably better as there is no risk of stack overflow from
  # too many levels of recursion
  def get_dict_diff_non_recursive(dict1, dict2):
      dicts_to_process = [(dict1, dict2, "")]
      while dicts_to_process:
          d1, d2, current_path = dicts_to_process.pop()
          for key in d1.keys():
              # Build the path per key so sibling keys do not inherit each other's path
              key_path = f"{current_path}/{key}" if current_path else f"{key}"
              if key not in d2:
                  print(f"difference at {key_path}")
                  continue  # nothing to descend into on the d2 side
              if d1[key] != d2[key]:
                  print(f"difference at {key_path}")
              if isinstance(d1[key], dict) and isinstance(d2[key], dict):
                  dicts_to_process.append((d1[key], d2[key], key_path))
              elif isinstance(d1[key], list) and isinstance(d2[key], list):
                  # zip stops at the shorter list, so unequal lengths cannot raise IndexError
                  for item1, item2 in zip(d1[key], d2[key]):
                      if isinstance(item1, dict) and isinstance(item2, dict):
                          dicts_to_process.append((item1, item2, key_path))
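
A variant of the same explicit-stack idea that collects results instead of printing them might look like the sketch below. It is an illustration rather than the answer's code: `diff_paths` is a hypothetical name, it handles dictionaries only, and it reports keys missing on either side.

```python
def diff_paths(dict1, dict2):
    """Return the set of key paths (as tuples) at which two nested dicts differ."""
    differences = set()
    stack = [(dict1, dict2, ())]  # explicit stack replaces the call stack
    while stack:
        left, right, path = stack.pop()
        for key in set(left) | set(right):
            key_path = path + (key,)
            if key not in left or key not in right:
                differences.add(key_path)  # key exists on one side only
            elif isinstance(left[key], dict) and isinstance(right[key], dict):
                stack.append((left[key], right[key], key_path))  # descend later
            elif left[key] != right[key]:
                differences.add(key_path)
    return differences


d1 = {'a': {'b': {'cs': 10}, 'd': {'cs': 20}}}
d2 = {'a': {'b': {'cs': 30}, 'd': {'cs': 20}}, 'newa': {'q': {'cs': 50}}}
print(diff_paths(d1, d2))
```

Tuples are used for paths so that keys containing separator characters stay unambiguous, and a set is returned because the visiting order of a stack-based walk carries no meaning.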

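Solution 8:

I didn't like many of the answers I found across many threads... A lot of them recommend deepdiff, which is very powerful, don't get me wrong, but it does not give me the output I want. I don't want just a string of differences, or a newly built odd-looking dictionary whose keys are collected from the original nested keys... I want an actual dictionary with the original keys and the delta values.

My use case is sending a smaller payload, or no payload at all when nothing differs, over an MQTT network.

The solution I found is partly taken from this link, but modified to give only the deltas. I then parse the result recursively, calling diff_dict() again for nested values, to build the final diff dictionary. It turned out to be much simpler than many of the examples. FYI, it does not care about ordering.

My solution: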

def diff_dict(d1, d2):
    d1_keys = set(d1.keys())
    d2_keys = set(d2.keys())
    shared_keys = d1_keys.intersection(d2_keys)
    shared_deltas = {o: (d1[o], d2[o]) for o in shared_keys if d1[o] != d2[o]}
    added_keys = d2_keys - d1_keys
    added_deltas = {o: (None, d2[o]) for o in added_keys}
    deltas = {**shared_deltas, **added_deltas}
    return parse_deltas(deltas)


def parse_deltas(deltas: dict):
    res = {}
    for k, v in deltas.items():
        if isinstance(v[0], dict):
            tmp = diff_dict(v[0], v[1])
            if tmp:
                res[k] = tmp
        else:
            res[k] = v[1]
    return res

Example:

original = {
    'int': 1,
    'float': 0.1000,
    'string': 'some string',
    'bool': True,
    'nested1': {
        'int': 2,
        'float': 0.2000,
        'string': 'some string2',
        'bool': True,
        'nested2': {
            'string': 'some string3'
        }
    }
}
new = {
    'int': 2,
    'string': 'some string',
    'nested1': {
        'int': 2,
        'float': 0.5000,
        'string': 'new string',
        'bool': False,
        'nested2': {
            'string': 'new string nested 2 time'
        }
    },
    'test_added': 'added_val'
}

print(diff_dict(original, new))

Output:

{'int': 2, 'nested1': {'string': 'new string', 'nested2': {'string': 'new string nested 2 time'}, 'bool': False, 'float': 0.5}, 'test_added': 'added_val'}
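
For the MQTT use case described above, the "send only when something changed" step could be sketched as follows. Everything here is an assumption for illustration: `changed_fields` is a compact stand-in for the delta logic, `publish_if_changed` is a hypothetical helper, and `publish` is any callable that accepts a string payload (no specific MQTT client API is implied).

```python
import json


def changed_fields(old, new):
    """Return only the entries of `new` that are added or differ from `old`."""
    delta = {}
    for key, new_value in new.items():
        if key not in old:
            delta[key] = new_value
        elif isinstance(old[key], dict) and isinstance(new_value, dict):
            nested = changed_fields(old[key], new_value)
            if nested:  # keep nested dicts only when something inside changed
                delta[key] = nested
        elif old[key] != new_value:
            delta[key] = new_value
    return delta


def publish_if_changed(old, new, publish):
    """Send the delta as JSON through `publish`; send nothing when it is empty."""
    delta = changed_fields(old, new)
    if not delta:
        return False
    publish(json.dumps(delta, sort_keys=True))
    return True


sent = []  # stands in for the network; a real client's publish call would go here
publish_if_changed({'temp': 20, 'cfg': {'rate': 5}}, {'temp': 21, 'cfg': {'rate': 5}}, sent.append)
publish_if_changed({'temp': 21}, {'temp': 21}, sent.append)
print(sent)  # ['{"temp": 21}']
```

Passing the publish function in as a parameter keeps the diff logic testable without a broker; the second call sends nothing because the two states are identical.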

Solution 9:

Solution

def compare_dicts(dict1, dict2, indent=4, level=0, offset=0):
    if not (isinstance(dict1, dict) or isinstance(dict2, dict)):
        if dict1 == dict2:
            return 'OK!'
        else:
            return 'MISMATCH!'
        
    if level > 0:
        print()
    keys1 = set(dict1.keys())
    keys2 = set(dict2.keys())
    if len(keys1 | keys2) == 0:
        return '' if level else None
        
    max_len = max(tuple(map(len, keys1 | keys2))) + 2
    for key in keys1 & keys2:
        print(' '*indent*level + f'{key+":":<{max_len}}', end='')
        print(compare_dicts(dict1[key], dict2[key], indent=indent, level=level+1))
    for key in keys1 - keys2:
        print(' '*indent*level + f'{key+":":<{max_len}}'
              + 'presented only in dict 1!', end='')
    for key in keys2 - keys1:
        print(' '*indent*level + f'{key+":":<{max_len}}'
              + 'presented only in dict 2!', end='')
        
    return '' if level else None

Example

dict1 = {
    'a': 1,
    'b': {
        'ba': 21,
        'bb': 22,
        'bc': 23,
    },
    'c': 3,
    'd': 4,
}

dict2 = {
    'a': 1,
    'b': {
        'ba': 21,
        'bb': -22,
    },
    'c': 3,
    'd': -4,
    'e': 5,
}

compare_dicts(dict1, dict2)

Output

b: 
    bb: MISMATCH!
    ba: OK!
    bc: presented only in dict 1!
a: OK!
d: MISMATCH!
c: OK!
e: presented only in dict 2!