I need to merge multiple dictionaries. Here is what I have, for example:
dict1 = {1:{"a":{A}}, 2:{"b":{B}}}
dict2 = {2:{"c":{C}}, 3:{"d":{D}}}
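A, B, C and D are the leaves of the tree, e.g. {"info1":"value", "info2":"value2"}
The depth (number of levels) of the dictionaries is unknown; it could be {2:{"c":{"z":{"y":{C}}}}}
In my case it represents a directory/file structure, where nodes are documents and leaves are files.
I want to merge them to obtain: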
dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}
I'm not sure how to do this easily in Python.
This simple recursive procedure merges one dictionary into another, overwriting conflicting keys:
#!/usr/bin/env python2.7
def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))
Output:
{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}
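Note that the procedure above merges in place: the first argument is modified and returned, and nested dictionaries from the second argument end up shared with the result. If that matters, one option is to merge deep copies instead. This is only a minimal sketch, assuming the cost of copying both inputs is acceptable; the wrapper name merged_copy is hypothetical and not part of the answer.

```python
import copy


def merge_dicts(dict1, dict2):
    """Recursively merge dict2 into dict1 (same logic as above)."""
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1


def merged_copy(dict1, dict2):
    """Hypothetical helper: merge deep copies so neither input is modified."""
    return merge_dicts(copy.deepcopy(dict1), copy.deepcopy(dict2))


left = {1: {"a": "A"}, 2: {"b": "B"}}
right = {2: {"c": "C"}, 3: {"d": "D"}}
result = merged_copy(left, right)
print(result)
print(left)  # still the original two-key dictionary
```

Both inputs are copied because the merge stores sub-dictionaries of the second argument directly in the result, so later changes to the result could otherwise leak back into it.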
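Of course, the code will depend on your rules for resolving merge conflicts. Here is a version that takes an arbitrary number of arguments and merges them recursively to any depth, without mutating any of the input objects. It uses the following rules to resolve merge conflicts:
Dictionaries take precedence over non-dict values ({"foo":{…}} takes precedence over {"foo": "bar"})
Later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a": 2} and {"a": 3} in that order, the result will be {"a": 3})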
try:
    from collections.abc import Mapping  # Python 3.3+
except ImportError:
    from collections import Mapping  # Python 2
def merge_dicts(*dicts):
    """
    Return a new dictionary that is the result of merging the arguments together.
    In case of conflicts, later arguments take precedence over earlier arguments.
    """
    updated = {}
    # grab all keys
    keys = set()
    for d in dicts:
        keys = keys.union(set(d))
    for key in keys:
        values = [d[key] for d in dicts if key in d]
        # which ones are mapping types? (aka dict)
        maps = [value for value in values if isinstance(value, Mapping)]
        if maps:
            # if we have any mapping types, call recursively to merge them
            updated[key] = merge_dicts(*maps)
        else:
            # otherwise, just grab the last value we have, since later arguments
            # take precedence over earlier arguments
            updated[key] = values[-1]
    return updated
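To make the two conflict rules concrete, here is a short usage sketch. It restates the function in condensed form so the example runs on its own (Python 3 only); the sample inputs are made up for illustration.

```python
from collections.abc import Mapping


def merge_dicts(*dicts):
    """Return a new dict; mappings beat non-mappings, later arguments beat earlier ones."""
    updated = {}
    keys = set()
    for d in dicts:
        keys |= set(d)
    for key in keys:
        values = [d[key] for d in dicts if key in d]
        maps = [v for v in values if isinstance(v, Mapping)]
        # Merge the mappings if there are any, otherwise take the last value.
        updated[key] = merge_dicts(*maps) if maps else values[-1]
    return updated


# Rule 1: the nested dict wins over the plain string, regardless of argument order.
rule1 = merge_dicts({"foo": {"x": 1}}, {"foo": "bar"})
# Rule 2: among non-dict values, the last argument wins.
rule2 = merge_dicts({"a": 1}, {"a": 2}, {"a": 3})
# The example from the question, with strings standing in for the leaves.
dict3 = merge_dicts({1: {"a": "A"}, 2: {"b": "B"}}, {2: {"c": "C"}, 3: {"d": "D"}})
print(rule1, rule2, dict3)
```

One consequence of rule 1 worth knowing: a non-dict value is silently dropped whenever any argument holds a dict under the same key.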
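One difficulty with this question is that the values of the dict can be arbitrarily complex pieces of data. Based on these and other answers, I came up with the following code: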
class YamlReaderError(Exception):
    pass
def data_merge(a, b):
    """merges b into a and return merged result
    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        # Python 3: str and int cover what unicode and long covered in Python 2
        if a is None or isinstance(a, (str, int, float)):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError as e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a
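My use case is merging YAML files, where I only have to deal with a subset of the possible data types, so I can ignore tuples and other objects. For me, a sensible merge logic means:
replace scalars
append lists
merge dicts by adding missing keys and updating existing keys
Everything else, including anything unforeseen, results in an error.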
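The three rules can be seen in a small example. The following is a condensed Python 3 sketch of the same logic, not the code above verbatim, and the sample configuration data is invented for illustration.

```python
class YamlReaderError(Exception):
    pass


def data_merge(a, b):
    """Sketch: replace scalars, extend lists, merge dicts, fail on anything else."""
    if a is None or isinstance(a, (str, int, float)):
        return b  # scalars are replaced
    if isinstance(a, list):
        if isinstance(b, list):
            a.extend(b)  # list + list: concatenate
        else:
            a.append(b)  # list + single value: append
        return a
    if isinstance(a, dict):
        if not isinstance(b, dict):
            raise YamlReaderError('Cannot merge non-dict %r into dict %r' % (b, a))
        for key in b:
            a[key] = data_merge(a[key], b[key]) if key in a else b[key]
        return a
    raise YamlReaderError('NOT IMPLEMENTED %r into %r' % (b, a))


base = {"name": "app", "tags": ["x"], "db": {"host": "localhost"}}
override = {"name": "app2", "tags": ["y"], "db": {"port": 5432}}
merged = data_merge(base, override)
print(merged)

try:
    data_merge({"db": {"host": "h"}}, {"db": "oops"})
except YamlReaderError as err:
    print("merge failed:", err)
```

Here "name" is replaced, "tags" grows to two items, "db" gains a "port" key, and trying to merge a string into a dict raises the error.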
The following is from @andrew cooke's answer.
It handles nested lists in a better way.
def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list.
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.
    If length of incoming list is more than that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])
        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])
        else:
            original[idx] = incoming[idx]
    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])
def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.
    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])
            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])
            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]
I have not tested this extensively, so feedback is welcome.
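To illustrate what merging lists position by position means, here is a condensed sketch of the two functions with a small example. The helper _merge_value is hypothetical; it only factors out the per-value rule that both functions share, and the sample data is invented.

```python
def _merge_value(old, new):
    """Hypothetical helper: the per-value rule shared by both functions."""
    if isinstance(old, dict) and isinstance(new, dict):
        deep_merge_dicts(old, new)
        return old
    if isinstance(old, list) and isinstance(new, list):
        deep_merge_lists(old, new)
        return old
    return new  # other or conflicting types: the incoming value overrides


def deep_merge_lists(original, incoming):
    """Merge element by element; extra incoming elements are appended."""
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        original[idx] = _merge_value(original[idx], incoming[idx])
    original.extend(incoming[common_length:])


def deep_merge_dicts(original, incoming):
    """Merge incoming into original in place."""
    for key in incoming:
        if key in original:
            original[key] = _merge_value(original[key], incoming[key])
        else:
            original[key] = incoming[key]


original = {"servers": [{"host": "a", "port": 80}, {"host": "b"}]}
incoming = {"servers": [{"port": 8080}, {"host": "c"}, {"host": "d"}]}
deep_merge_dicts(original, incoming)
print(original)
```

The first server keeps its host and gets the new port, the second is updated in place, and the third is appended. Whether position-based matching is appropriate depends on the lists having a stable order.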
# Illustrative data only; the original left the keys and values unspecified.
dict1 = dict(zip(["a", "b"], [1, 2]))
dict2 = dict(zip(["b", "c"], [3, 4]))
def mergeDict(dict1, dict2):
    # Shallow merge first; for shared keys this keeps dict2's value.
    dict3 = {**dict1, **dict2}
    for key, value in dict3.items():
        if key in dict1 and key in dict2:
            # Shared key: keep both values as [dict2's value, dict1's value].
            dict3[key] = [value, dict1[key]]
    return dict3
dict3 = mergeDict(dict1, dict2)
# Sort keys alphabetically.
print(sorted(dict3.keys()))
This merges two dictionaries and, for keys they have in common, keeps both values in a list.
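As a variation, the values of common keys can be collected with collections.defaultdict, which also extends to more than two dictionaries. This is a minimal sketch, assuming every key should map to the list of all its values in argument order; collect_values is a hypothetical name, and unlike mergeDict it wraps values of unshared keys in a list as well.

```python
from collections import defaultdict


def collect_values(*dicts):
    """Map each key to the list of values it has across all arguments."""
    collected = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            collected[key].append(value)
    return dict(collected)  # plain dict, so missing keys raise KeyError again


combined = collect_values({"a": 1, "b": 2}, {"b": 3, "c": 4})
# Iterate over the keys in alphabetical order.
for key in sorted(combined):
    print(key, combined[key])
```

Note that this is a flat merge: nested dictionaries are collected as whole values and are not merged recursively.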