I have created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?


You could use the pickle module in the standard library. Here's an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as outp:
    company1 = Company('banana', 40)
    pickle.dump(company1, outp, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, outp, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as inp:
    company1 = pickle.load(inp)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(inp)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following, which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
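A matching loader is a natural companion to save_object. The sketch below is not part of the original answer: load_object is a hypothetical helper name, and a temporary directory stands in for the working directory so that the snippet runs on its own.

```python
import os
import pickle
import tempfile

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object: read back a single object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# Round trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'company1.pkl')
    save_object(Company('banana', 40), path)
    restored = load_object(path)

print(restored.name, restored.value)
```

The class definition has to be importable when the file is loaded, because pickle stores only a reference to it.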

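Update

Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

In Python 2, it's almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier; just change the import statement to this: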

import cPickle as pickle

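In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary because the pickle module now does it automatically; see What difference between pickle and _pickle in python 3?

The rundown is that you could use something like the following to ensure that your code always uses the C version when it's available, in both Python 2 and 3: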

try:
    import cPickle as pickle
except ImportError:  # Also covers Python 3.0-3.5, which lack ModuleNotFoundError.
    import pickle
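On Python 3 you can check that the plain pickle module is already backed by the C implementation. This is a small sketch that assumes CPython, where the accelerator module is named _pickle; on other interpreters the import may fail, which the code simply treats as "not accelerated". The variable name using_c is illustrative.

```python
import pickle

try:
    import _pickle  # C accelerator module on CPython 3
    # pickle re-exports the C classes when the accelerator is available.
    using_c = pickle.Pickler is _pickle.Pickler
except ImportError:
    using_c = False

print('C-accelerated pickle in use:', using_c)
```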

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, but in Python 3.8.1 it's protocol version 4. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL attribute added to it, but that doesn't exist in Python 2.
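A quick way to see the difference between protocols is to serialize the same object with protocol 0 and with the highest protocol and compare the results. The following is a minimal sketch for Python 3; the sample dict and variable names are illustrative.

```python
import pickle

data = {'name': 'banana', 'value': 40}

text_form = pickle.dumps(data, protocol=0)
binary_form = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Protocol 0 output for this data consists only of ASCII bytes.
is_ascii = all(byte < 128 for byte in text_form)
print('protocol 0 is ASCII:', is_ascii)

# Both attributes exist in Python 3; the values depend on the interpreter version.
print('default:', pickle.DEFAULT_PROTOCOL, 'highest:', pickle.HIGHEST_PROTOCOL)

# Either form loads back to an equal object.
restored_text = pickle.loads(text_form)
restored_binary = pickle.loads(binary_form)
```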

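Fortunately there's shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing: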

pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, outp, -1)

Either way, you only have to specify the protocol once if you create a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(outp, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc...
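Here is a self-contained version of the Pickler idea, written as a sketch: an in-memory io.BytesIO buffer stands in for the output file, and the sample objects are illustrative. It also checks that -1 produces the same bytes as pickle.HIGHEST_PROTOCOL.

```python
import io
import pickle

buffer = io.BytesIO()  # stands in for a file opened in 'wb' mode

pickler = pickle.Pickler(buffer, -1)  # protocol is given once, here
pickler.dump({'name': 'banana', 'value': 40})
pickler.dump(['spam', 42])

# -1 is shorthand for the highest protocol, so the output is identical.
same_bytes = pickle.dumps('spam', -1) == pickle.dumps('spam', pickle.HIGHEST_PROTOCOL)

# Read the objects back in the order they were written.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
```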

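Note: If you're in an environment running different versions of Python, then you'll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).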
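As a sketch of hardcoding a protocol, the snippet below pins protocol 2, on the assumption that some of the readers still run Python 2 (protocol 2 is the newest one Python 2 understands). SHARED_PROTOCOL is a hypothetical constant name.

```python
import pickle

SHARED_PROTOCOL = 2  # assumption: the oldest reader is a Python 2 interpreter

payload = pickle.dumps({'name': 'banana', 'value': 40}, protocol=SHARED_PROTOCOL)

# Pickles written with protocol 2 or later begin with the PROTO opcode
# (byte 0x80) followed by the protocol number, so the choice can be verified.
header = bytearray(payload[:2])
written_protocol = header[1]

restored = pickle.loads(payload)
print('written with protocol', written_protocol)
```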

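Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the samples above, when there's an unknown number of them it's often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict, and write them all to the file in a single call: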

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as inp:
    tech_companies = pickle.load(inp)

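The major advantage is that you don't need to know how many object instances were saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally, I liked @Lutz Prechelt's answer the best, so that's the approach used in the sample code below: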

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickle_loader(filename):
    """ Deserialize a file of pickled objects. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickle_loader('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))

I think it's a pretty strong assumption that the object is a class. What if it's not a class? There's also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some Python objects have attributes added to their __dict__ after creation, pickle doesn't respect the addition of those attributes (i.e. it 'forgets' they were added, because pickle serializes by reference to the object definition).
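To make the by-reference point concrete, here is a small sketch using only the standard pickle module. It compares an instance, whose attribute values are written into the pickle, with a function, which is stored only as a reference to its name. The names Company, slogan and owner are illustrative.

```python
import pickle

class Company:
    pass

def slogan():
    return 'fresh fruit'

company1 = Company()
company1.name = 'banana'   # attribute added to an instance after creation
slogan.owner = 'rhubarb'   # attribute attached to a function after creation

instance_payload = pickle.dumps(company1)
function_payload = pickle.dumps(slogan)

# The instance's attribute value is part of the serialized data...
instance_keeps_attribute = b'banana' in instance_payload
# ...but the function is stored by name only, so its attribute is not.
function_keeps_attribute = b'rhubarb' in function_payload

print(instance_keeps_attribute, function_keeps_attribute)
```

Loading function_payload in a fresh interpreter would therefore give back whatever slogan is defined there, without the owner attribute.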

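In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created) that has attributes (either added in the object definition, or afterward), your best bet is to use dill, which can serialize almost anything in Python.

We start with a class…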

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can't handle it. Let's try dill. We'll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

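It works. The reason pickle fails and dill doesn't is that dill treats __main__ like a module (for the most part), and can also pickle class definitions instead of pickling by reference (as pickle does). The reason dill can pickle a lambda is that it gives it a name; then the pickling magic can happen.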
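The two limitations of the standard pickle module described above can be observed directly. This sketch uses only the standard library (no dill); the Company class and its slogan attribute are illustrative. Depending on the Python version, the failed lambda raises pickle.PicklingError or AttributeError, so both are caught.

```python
import pickle

class Company:
    slogan = 'fresh fruit daily'

# A class is pickled by reference: the payload names the class,
# but none of the class body is stored.
class_payload = pickle.dumps(Company)
stores_name = b'Company' in class_payload
stores_body = b'fresh fruit daily' in class_payload

# A lambda has no importable name, so the standard pickle refuses it.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(stores_name, stores_body, lambda_pickled)
```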

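Actually, there's an easier way to save all these objects, especially if you have created a lot of them. Just dump the whole Python session, and come back to it later.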

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now turn off your computer, go enjoy an espresso or whatever, and come back later...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

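The only major drawback is that dill is not part of the Python standard library. So if you can't install a Python package on your server, then you can't use it.

However, if you are able to install Python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.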

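You can use anycache to do the job for you. It takes care of all the details:

- It uses dill as its backend, which extends the Python pickle module to handle lambdas and all the nice Python features.
- It stores different objects in different files and reloads them properly.
- It limits the cache size.
- It allows cache clearing.
- It allows sharing objects between multiple runs.
- It respects input files that influence the result.

Assuming you have a function myfunc which creates the instance: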

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')
def myfunc(name, value):
    return Company(name, value)

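Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any consecutive run, the pickled object is loaded. If the cachedir is preserved between Python runs, the pickled object is taken from the previous Python run.

For any further details see the documentation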
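The mechanism described above can be sketched with the standard library alone. This is not anycache's implementation, only an illustration of the idea: disk_cache is a hypothetical decorator, the filename is derived from a hash of the function name and arguments, and plain pickle is used instead of dill.

```python
import hashlib
import os
import pickle
import tempfile
from functools import wraps

def disk_cache(cachedir):
    """Hypothetical decorator: cache results as pickle files in cachedir."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a filename from the function name and its arguments.
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = os.path.join(cachedir, hashlib.sha256(key).hexdigest() + '.pkl')
            if os.path.exists(path):
                # Cache hit: load the stored result instead of calling func.
                with open(path, 'rb') as inp:
                    return pickle.load(inp)
            result = func(*args, **kwargs)
            with open(path, 'wb') as outp:
                pickle.dump(result, outp, pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator

calls = []                      # records how often the real function runs
cachedir = tempfile.mkdtemp()   # stands in for '/path/to/your/cache'

@disk_cache(cachedir)
def make_company(name, value):
    calls.append(name)
    return {'name': name, 'value': value}

first = make_company('banana', 40)   # computed and written to disk
second = make_company('banana', 40)  # loaded from the pickle file
```

Because the cache lives on disk, a later Python run pointed at the same directory would also get the stored result.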

A quick example using company1 from your question, with Python 3.

import pickle

# Save the object; the with block closes (and flushes) the file.
with open("company1.pickle", "wb") as outp:
    pickle.dump(company1, outp)

# Reload the object
with open("company1.pickle", "rb") as inp:
    company1_reloaded = pickle.load(inp)

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the object; the with block closes (and flushes) the file.
with open("company1.pickle", "wb") as outp:
    dill.dump(company1, outp)

# Reload the object
with open("company1.pickle", "rb") as inp:
    company1_reloaded = dill.load(inp)

Newer versions of pandas also have functionality for saving pickles.

I find it easier. E.g.

import pandas as pd

pd.to_pickle(object_to_save, '/temp/saved_pkl.pickle')