我有一个数据结构,本质上相当于一个嵌套的字典。假设它是这样的:

{'new jersey': {'mercer county': {'plumbers': 3,
                                  'programmers': 81},
                'middlesex county': {'programmers': 81,
                                     'salesmen': 62}},
 'new york': {'queens county': {'plumbers': 9,
                                'salesmen': 36}}}

现在,维护和创造这个是相当痛苦的;每次我有一个新的州/县/职业,我都必须通过讨厌的try/catch块创建下层字典。此外,如果我想遍历所有值,就必须创建恼人的嵌套迭代器。

我也可以使用元组作为键,像这样:

{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'programmers'): 81,
 ('new jersey', 'middlesex county', 'salesmen'): 62,
 ('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

这使得遍历值非常简单和自然,但在语法上更痛苦的事情,如聚合和查看字典的子集(例如,如果我只想逐个状态查看)。

基本上,有时我想把一个嵌套字典看作一个平面字典,有时我想把它看作一个复杂的层次结构。我可以把这些都打包到一个类中,但似乎有人已经这样做了。或者,似乎有一些非常优雅的语法结构可以做到这一点。

我怎样才能做得更好呢?

附录:我知道setdefault(),但它并不是真正的干净语法。此外,您创建的每个子字典仍然需要手动设置setdefault()。


当前回答

我可以把这些都打包到一个类中,但似乎有人已经这样做了。

来自开源ndicts包(我是作者)的NestedDict类试图减轻处理嵌套字典的痛苦。我认为它满足了所有问题的要求。

这里有它的功能概要,要了解更多细节,请查看文档。

初始化

>>> from ndicts import NestedDict
>>> nd = NestedDict({"a": {"aa": 0}, "b": 1})

得到项目

把NestedDict看作是一个扁平的字典。

>>> nd["a", "aa"]
0

同时,您可以获得中间节点,而不仅仅是叶值。

>>> nd["a"]
{"aa": 0}

如果键不存在,则抛出异常。

>>> nd["asd"]
Traceback (most recent call last):
...
KeyError: ('asd',)

设置项

与普通字典一样,如果缺少一个键,则将它添加到NestedDict中。

>>> nd["a", "ab"] = 2
>>> nd
NestedDict({"a": {"aa": 0, "ab": 2}, "b": 1})

这允许从一个空的NestedDict开始,可以通过设置新项来激活它。

迭代

谈到迭代,可以把NestedDict看作是一个扁平的字典。我们熟悉的.keys(), .values()和.item()方法是可用的。

>>> [key for key in nd]
[('a', 'aa'), ('a', 'ab'), ('b',)]
>>> [value for value in nd.values()]
[0, 2, 1]

其他回答

我以前用过这个函数。安全、快捷、易于维护。

def deep_get(dictionary, keys, default=None):
    return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)

例子:

>>> from functools import reduce
>>> def deep_get(dictionary, keys, default=None):
...     return reduce(lambda d, key: d.get(key, default) if isinstance(d, dict) else default, keys.split("."), dictionary)
...
>>> person = {'person':{'name':{'first':'John'}}}
>>> print (deep_get(person, "person.name.first"))
John
>>> print (deep_get(person, "person.name.lastname"))
None
>>> print (deep_get(person, "person.name.lastname", default="No lastname"))
No lastname
>>>

我也有类似的事情。我有很多这样的案例:

thedict = {}
for item in ('foo', 'bar', 'baz'):
  mydict = thedict.get(item, {})
  mydict = get_value_for(item)
  thedict[item] = mydict

但要深入很多层次。这是“。”Get (item,{})",这是一个键,因为如果已经没有字典,它将创建另一个字典。与此同时,我一直在想办法对付 这个更好。现在,有很多

value = mydict.get('foo', {}).get('bar', {}).get('baz', 0)

所以,我做了:

def dictgetter(thedict, default, *args):
  totalargs = len(args)
  for i,arg in enumerate(args):
    if i+1 == totalargs:
      thedict = thedict.get(arg, default)
    else:
      thedict = thedict.get(arg, {})
  return thedict

如果你这样做,效果是一样的:

value = dictgetter(mydict, 0, 'foo', 'bar', 'baz')

更好吗?我想是的。

我发现setdefault非常有用;它检查一个键是否存在,如果不存在就添加它:

d = {}
d.setdefault('new jersey', {}).setdefault('mercer county', {})['plumbers'] = 3

Setdefault总是返回相关的键,所以你实际上是在原地更新'd'的值。

说到迭代,我相信你可以很容易地编写一个生成器,如果Python中还没有这样的生成器:

def iterateStates(d):
    # Let's count up the total number of "plumbers" / "dentists" / etc.
    # across all counties and states
    job_totals = {}

    # I guess this is the annoying nested stuff you were talking about?
    for (state, counties) in d.iteritems():
        for (county, jobs) in counties.iteritems():
            for (job, num) in jobs.iteritems():
                # If job isn't already in job_totals, default it to zero
                job_totals[job] = job_totals.get(job, 0) + num

    # Now return an iterator of (job, number) tuples
    return job_totals.iteritems()

# Display all jobs
for (job, num) in iterateStates(d):
    print "There are %d %s in total" % (job, num)
class JobDb(object):
    def __init__(self):
        self.data = []
        self.all = set()
        self.free = []
        self.index1 = {}
        self.index2 = {}
        self.index3 = {}

    def _indices(self,(key1,key2,key3)):
        indices = self.all.copy()
        wild = False
        for index,key in ((self.index1,key1),(self.index2,key2),
                                             (self.index3,key3)):
            if key is not None:
                indices &= index.setdefault(key,set())
            else:
                wild = True
        return indices, wild

    def __getitem__(self,key):
        indices, wild = self._indices(key)
        if wild:
            return dict(self.data[i] for i in indices)
        else:
            values = [self.data[i][-1] for i in indices]
            if values:
                return values[0]

    def __setitem__(self,key,value):
        indices, wild = self._indices(key)
        if indices:
            for i in indices:
                self.data[i] = key,value
        elif wild:
            raise KeyError(k)
        else:
            if self.free:
                index = self.free.pop(0)
                self.data[index] = key,value
            else:
                index = len(self.data)
                self.data.append((key,value))
                self.all.add(index)
            self.index1.setdefault(key[0],set()).add(index)
            self.index2.setdefault(key[1],set()).add(index)
            self.index3.setdefault(key[2],set()).add(index)

    def __delitem__(self,key):
        indices,wild = self._indices(key)
        if not indices:
            raise KeyError
        self.index1[key[0]] -= indices
        self.index2[key[1]] -= indices
        self.index3[key[2]] -= indices
        self.all -= indices
        for i in indices:
            self.data[i] = None
        self.free.extend(indices)

    def __len__(self):
        return len(self.all)

    def __iter__(self):
        for key,value in self.data:
            yield key

例子:

>>> db = JobDb()
>>> db['new jersey', 'mercer county', 'plumbers'] = 3
>>> db['new jersey', 'mercer county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'programmers'] = 81
>>> db['new jersey', 'middlesex county', 'salesmen'] = 62
>>> db['new york', 'queens county', 'plumbers'] = 9
>>> db['new york', 'queens county', 'salesmen'] = 36

>>> db['new york', None, None]
{('new york', 'queens county', 'plumbers'): 9,
 ('new york', 'queens county', 'salesmen'): 36}

>>> db[None, None, 'plumbers']
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new york', 'queens county', 'plumbers'): 9}

>>> db['new jersey', 'mercer county', None]
{('new jersey', 'mercer county', 'plumbers'): 3,
 ('new jersey', 'mercer county', 'programmers'): 81}

>>> db['new jersey', 'middlesex county', 'programmers']
81

>>>

编辑:现在使用通配符(None)查询时返回字典,否则返回单值。

你可以在lambdas和defaultdict中使用递归,不需要定义名称:

a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))

这里有一个例子:

>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
        {'new jersey': defaultdict(<function __main__.<lambda>>,
                     {'mercer county': defaultdict(<function __main__.<lambda>>,
                                  {'plumbers': 3, 'programmers': 81}),
                      'middlesex county': defaultdict(<function __main__.<lambda>>,
                                  {'programmers': 81, 'salesmen': 62})})})