我有一个字典,里面有一大堆词条。我只对其中的几个感兴趣。有什么简单的方法可以把其他的都剪掉吗?


当前回答

给定你的原始字典orig和你感兴趣的键的条目集:

filtered = dict(zip(keys, [orig[k] for k in keys]))

这并不像delnan的答案那么好,但应该适用于每个感兴趣的Python版本。然而,它对原始字典中存在的每个键元素都是脆弱的。

其他回答

这是我的方法,支持嵌套字段,如mongo查询。

使用方法:

>>> obj = { "a":1, "b":{"c":2,"d":3}}
>>> only(obj,["a","b.c"])
{'a': 1, 'b': {'c': 2}}

只有函数:

def only(object,keys):
    obj = {}
    for path in keys:
        paths = path.split(".")
        rec=''
        origin = object
        target = obj
        for key in paths:
            rec += key
            if key in target:
                target = target[key]
                origin = origin[key]
                rec += '.'
                continue
            if key in origin:
                if rec == path:
                    target[key] = origin[key]
                else:
                    target[key] = {}
                target = target[key]
                origin = origin[key]
                rec += '.'
            else:
                target[key] = None
                break
    return obj

我们也可以通过稍微更优雅的字典理解来实现这一点:

my_dict = {"a":1,"b":2,"c":3,"d":4}

filtdict = {k: v for k, v in my_dict.items() if k.startswith('a')}
print(filtdict)

这只是一个简单的单行函数,带有一个过滤器,只允许现有的键。

data = {'give': 'what', 'not': '___', 'me': 'I', 'no': '___', 'these': 'needed'}
keys = ['give', 'me', 'these', 'not_present']

n = { k: data[k] for k in filter(lambda k: k in data, keys) }

print(n)
print(list(n.keys()))
print(list(n.values()))

输出:

{“给予”:“什么”,“我”:“我”,“这些”:“需要”} ['give', 'me', 'these'] ['what', 'I', 'needed']

在我看来,这是最简单的方法:

d1 = {'a':1, 'b':2, 'c':3}
d2 = {k:v for k,v in d1.items() if k in ['a','c']}

我也喜欢这样做来揭示价值观:

a, c = {k:v for k,v in d1.items() if k in ['a','c']}.values()

根据问题的标题,人们会期望在适当的地方过滤字典-几个答案建议了这样做的方法-仍然不明显的一个明显的方法是什么-我添加了一些时间:

import random
import timeit
import collections

repeat = 3
numbers = 10000

setup = ''
def timer(statement, msg='', _setup=None):
    print(msg, min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

timer('pass', 'Empty statement')

dsize = 1000
d = dict.fromkeys(range(dsize))
keep_keys = set(random.sample(range(dsize), 500))
drop_keys = set(random.sample(range(dsize), 500))

def _time_filter_dict():
    """filter a dict"""
    global setup
    setup = r"""from __main__ import dsize, collections, drop_keys, \
keep_keys, random"""
    timer('d = dict.fromkeys(range(dsize));'
          'collections.deque((d.pop(k) for k in drop_keys), maxlen=0)',
          "pop inplace - exhaust iterator")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys];'
          'collections.deque('
              '(d.pop(k) for k in list(d) if k not in keep_keys), maxlen=0)',
          "pop inplace - exhaust iterator (drop_keys)")
    timer('d = dict.fromkeys(range(dsize));'
          'list(d.pop(k) for k in drop_keys)',
          "pop inplace - create list")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys];'
          'list(d.pop(k) for k in drop_keys)',
          "pop inplace - create list (drop_keys)")
    timer('d = dict.fromkeys(range(dsize))\n'
          'for k in drop_keys: del d[k]', "del inplace")
    timer('d = dict.fromkeys(range(dsize));'
          'drop_keys = [k for k in d if k not in keep_keys]\n'
          'for k in drop_keys: del d[k]', "del inplace (drop_keys)")
    timer("""d = dict.fromkeys(range(dsize))
{k:v for k,v in d.items() if k in keep_keys}""", "copy dict comprehension")
    timer("""keep_keys=random.sample(range(dsize), 5)
d = dict.fromkeys(range(dsize))
{k:v for k,v in d.items() if k in keep_keys}""",
          "copy dict comprehension - small keep_keys")

if __name__ == '__main__':
    _time_filter_dict()

结果:

Empty statement 8.375600000000427e-05
pop inplace - exhaust iterator 1.046749841
pop inplace - exhaust iterator (drop_keys) 1.830537424
pop inplace - create list 1.1531293939999987
pop inplace - create list (drop_keys) 1.4512304149999995
del inplace 0.8008298079999996
del inplace (drop_keys) 1.1573763689999979
copy dict comprehension 1.1982901489999982
copy dict comprehension - small keep_keys 1.4407784069999998

因此,如果我们想要在适当的地方更新,似乎del是赢家-字典理解解决方案取决于正在创建的字典的大小,当然,删除一半的键已经太慢了-所以避免创建一个新的字典,如果你可以在适当的地方过滤。

编辑来解决@mpen的评论-我从keep_keys中计算了drop key(假设我们没有drop key) -我假设keep_keys/drop_keys是这个迭代的集合,或者会花很长时间。有了这些假设,del仍然更快——但要确定的是:如果你有一个(set, list, tuple)的下拉键,使用del