After training a model in TensorFlow:
How do you save the trained model? How do you later restore this saved model?
Current answer
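Below is a simple example of a basic MNIST classifier built with the Keras functional API and saved in the TensorFlow 2.0 SavedModel format (the recommended format, according to the docs), without much in the way of bells and whistles: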
# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
# Load data
mnist = tf.keras.datasets.mnist # 28 x 28
(x_train,y_train), (x_test, y_test) = mnist.load_data()
# L2-normalize along axis 1: pixel values in [0,255] end up in [0,1]
x_train = tf.keras.utils.normalize(x_train,axis=1)
x_test = tf.keras.utils.normalize(x_test,axis=1)
# Create model
input = Input(shape=(28,28), dtype='float64', name='graph_input')
x = Flatten()(input)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
output = Dense(10, activation='softmax', name='graph_output', dtype='float64')(x)
model = Model(inputs=input, outputs=output)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train
model.fit(x_train, y_train, epochs=3)
# Save model in SavedModel format (Tensorflow 2.0)
export_path = 'model'
tf.saved_model.save(model, export_path)
# ... possibly another python program
# Reload model: tf.saved_model.load returns the restored object, whose
# .signatures attribute holds the exported serving functions
loaded_model = tf.saved_model.load(export_path)
# Get image sample for testing
index = 0
img = x_test[index:index + 1]  # shape (1, 28, 28): the signature expects a batch dimension; already normalized above
# Predict using the signature definition (Tensorflow 2.0)
predict = loaded_model.signatures["serving_default"]
prediction = predict(tf.constant(img))
# Show results
print(np.argmax(prediction['graph_output']))  # prints the class number
plt.imshow(x_test[index], cmap=plt.cm.binary)  # draws the image
plt.show()
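What is serving_default?
It is the name of the signature definition (SignatureDef) under the selected tag (in this case, the default serve tag was selected). Also, this article explains how to use saved_model_cli to find a model's tags and signatures.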
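The same tag and signature information can also be inspected from Python. The following is a minimal sketch, assuming TF 2.x eager mode; the Doubler module, its variable, and the x/y input and output names are hypothetical. From the command line, `saved_model_cli show --dir <export_dir> --all` prints comparable information.

```python
import tempfile
import tensorflow as tf

class Doubler(tf.Module):
    """Hypothetical toy module: multiplies its input by a variable."""
    def __init__(self):
        super().__init__()
        self.scale = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32, name="x")])
    def __call__(self, x):
        return {"y": x * self.scale}

export_dir = tempfile.mkdtemp()
module = Doubler()
# Export __call__ explicitly under the "serving_default" signature key.
tf.saved_model.save(module, export_dir, signatures={"serving_default": module.__call__})

loaded = tf.saved_model.load(export_dir)
print(list(loaded.signatures.keys()))      # available signature names
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)    # input names, shapes, dtypes
print(infer.structured_outputs)            # output dictionary keys
result = infer(x=tf.constant([1.0, 3.0]))["y"].numpy()
print(result)
```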
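Disclaimer
This is only a basic example for getting something running. It is by no means a complete answer; maybe I can update it in the future. I just wanted to give a simple example of using SavedModel in TF 2.0, because I haven't seen one this simple anywhere.
@Tom's answer is a SavedModel example, but it does not work on TensorFlow 2.0 because, unfortunately, there are some breaking changes.
@Vishnuvardhan Janapati's answer covers TF 2.0, but not the SavedModel format.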
Other answers
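If it is a model you saved yourself, you simply create a restorer for all variables:
restorer = tf.train.Saver(tf.global_variables())  # tf.all_variables() in very old releases
and use it to restore the variables in the current session:
restorer.restore(self._sess, model_file)
For an external model, you need to specify a mapping from its variable names to your variable names. You can view the model's variable names with the command
python /path/to/tensorflow/tensorflow/python/tools/inspect_checkpoint.py --file_name=/path/to/pretrained_model/model.ckpt
The inspect_checkpoint.py script can be found in the ./tensorflow/python/tools folder of the TensorFlow source tree.
To specify the mapping, you can use my Tensorflow-Worklab, which contains a set of classes and scripts to train and retrain different models. It includes an example of retraining ResNet models.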
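A minimal sketch of such a name mapping, assuming TF 2.x with the tf.compat.v1 graph APIs. The scopes old_scope and new_scope and the checkpoint file name are hypothetical. Here tf.train.list_variables serves as an in-Python alternative to inspect_checkpoint.py, and passing a dictionary as var_list tells the Saver which checkpoint name should be loaded into which variable.

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "pretrained.ckpt")

# 1) Stand-in for an external, pretrained checkpoint: variable "old_scope/w".
with tf.Graph().as_default():
    with tf1.variable_scope("old_scope"):
        tf1.get_variable("w", initializer=tf.constant([1.0, 2.0, 3.0]))
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        saver.save(sess, ckpt_prefix)

# 2) List the (name, shape) pairs stored in the checkpoint.
stored = dict(tf.train.list_variables(ckpt_prefix))
print(stored)

# 3) Restore into a variable with a *different* name by passing a
#    {name_in_checkpoint: variable_in_my_graph} mapping to the Saver.
with tf.Graph().as_default():
    with tf1.variable_scope("new_scope"):
        w_new = tf1.get_variable("w", shape=[3])
    restorer = tf1.train.Saver(var_list={"old_scope/w": w_new})
    with tf1.Session() as sess:
        restorer.restore(sess, ckpt_prefix)
        restored = sess.run(w_new)
print(restored)
```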
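TensorFlow 2.0
It's very simple.
import tensorflow as tf
SAVE
model.save("model_name")  # TF 2.x before Keras 3 writes a SavedModel directory; Keras 3 requires a name ending in .keras
RESTORE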
model = tf.keras.models.load_model('model_name')
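A self-contained round trip of the snippet above, offered as a sketch assuming TF 2.x. The toy model, random data and file name are hypothetical. It uses a .keras file name because Keras 3 (the default from TF 2.16) rejects model.save paths without a .keras or .h5 extension.

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

# Hypothetical toy model: 4 input features -> 1 output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# SAVE
path = os.path.join(tempfile.mkdtemp(), "model_name.keras")
model.save(path)
# RESTORE
restored = tf.keras.models.load_model(path)

# The restored model should produce the same predictions.
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```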
In most cases, saving to and restoring from disk with tf.train.Saver is your best option:
... # build your model
saver = tf.train.Saver()
with tf.Session() as sess:
    ... # train the model
    saver.save(sess, "/tmp/my_great_model")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
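You can also save and restore the graph structure itself (see the MetaGraph documentation for details). By default, the Saver saves the graph structure into a .meta file. You can call import_meta_graph() to restore it: it restores the graph structure and returns a Saver that you can use to restore the model's state: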
saver = tf.train.import_meta_graph("/tmp/my_great_model.meta")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_great_model")
    ... # use the model
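A sketch of restoring from the .meta file alone, without the model-building code, assuming TF 2.x with tf.compat.v1. The tensor names x and y, the variable w and the file prefix are hypothetical. Naming tensors explicitly when building the graph is what makes get_tensor_by_name usable on the restoring side.

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1
prefix = os.path.join(tempfile.mkdtemp(), "my_great_model")

# Build and save a tiny graph with explicitly named tensors.
with tf.Graph().as_default():
    x = tf1.placeholder(tf.float32, shape=[None], name="x")
    w = tf1.get_variable("w", initializer=tf.constant(3.0))
    y = tf.multiply(x, w, name="y")
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        saver.save(sess, prefix)  # writes my_great_model.meta, .index, .data-*

# Restore in a fresh graph using only the saved files.
with tf.Graph().as_default() as g:
    saver = tf1.train.import_meta_graph(prefix + ".meta")
    with tf1.Session(graph=g) as sess:
        saver.restore(sess, prefix)
        out = sess.run(g.get_tensor_by_name("y:0"),
                       feed_dict={g.get_tensor_by_name("x:0"): [1.0, 2.0]})
print(out)
```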
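In some cases, however, you need something faster. For example, if you implement early stopping, you want to save a checkpoint every time the model improves during training (as measured on the validation set), and then roll back to the best model if there is no progress for some time. Saving the model to disk every time it improves would slow training down tremendously. The trick is to save the variable state to memory and restore it later: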
... # build your model
# get a handle on the graph nodes we need to save/restore the model
graph = tf.get_default_graph()
gvars = graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
assign_ops = [graph.get_operation_by_name(v.op.name + "/Assign") for v in gvars]
init_values = [assign_op.inputs[1] for assign_op in assign_ops]
with tf.Session() as sess:
    ... # train the model
    # when needed, save the model state to memory
    gvars_state = sess.run(gvars)
    # when needed, restore the model state
    feed_dict = {init_value: val
                 for init_value, val in zip(init_values, gvars_state)}
    sess.run(assign_ops, feed_dict=feed_dict)
A quick explanation: when you create a variable X, TensorFlow automatically creates an assignment operation X/Assign to set the variable's initial value. Instead of creating placeholders and extra assignment ops (which would just make the graph messy), we just use these existing assignment ops. The first input of each assignment op is a reference to the variable it is supposed to initialize, and the second input (assign_op.inputs[1]) is the initial value. So in order to set any value we want (instead of the initial value), we need to use a feed_dict and replace the initial value. Yes, TensorFlow lets you feed a value for any op, not just for placeholders, so this works fine.
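A runnable miniature of this trick, assuming TF 2.x with tf.compat.v1. A single variable and an assign_add op are hypothetical stand-ins for the model and the training step. Instead of looking ops up by the "/Assign" suffix, the sketch uses each variable's initializer property, which should refer to the same assignment op.

```python
import tensorflow as tf

tf1 = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    w = tf1.get_variable("w", initializer=tf.constant(0.0))
    train_step = tf1.assign_add(w, 1.0)  # hypothetical stand-in for one training step
    gvars = tf1.global_variables()
    # Each variable's initializer assigns input 1 (the initial value) to it.
    assign_ops = [v.initializer for v in gvars]
    init_values = [op.inputs[1] for op in assign_ops]

with tf1.Session(graph=graph) as sess:
    sess.run(tf1.global_variables_initializer())
    sess.run(train_step)          # w == 1.0: pretend this is the best epoch
    best_state = sess.run(gvars)  # snapshot kept in memory, no disk I/O
    sess.run(train_step)
    sess.run(train_step)          # w == 3.0: pretend validation got worse
    # Roll back: re-run the assign ops, feeding the snapshot instead of the initial values.
    sess.run(assign_ops, feed_dict=dict(zip(init_values, best_state)))
    restored = sess.run(w)
print(restored)
```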
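You can save the variables in the network using
saver = tf.train.Saver()
saver.save(sess, 'path of save/fileName.ckpt')
To restore the network for later reuse, or in another script, use:
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('path of save/'))
sess.run(....)
Important points:
The graph run in sess must have the same (consistent) structure in the first run and in later runs.
tf.train.latest_checkpoint takes the path of the folder containing the saved files, not an individual file path, and returns the checkpoint prefix that saver.restore expects.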
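The sketch below shows the folder passed to tf.train.latest_checkpoint and the prefix it returns for saver.restore, assuming TF 2.x with tf.compat.v1. The variable v and the file names are hypothetical; passing global_step appends -<step> to each saved prefix.

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1
save_dir = tempfile.mkdtemp()

with tf.Graph().as_default():
    v = tf1.get_variable("v", initializer=tf.constant(0.0))
    bump = tf1.assign_add(v, 1.0)
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        for step in (1, 2):
            sess.run(bump)
            # Writes fileName.ckpt-<step>.* plus a "checkpoint" state file in save_dir.
            saver.save(sess, os.path.join(save_dir, "fileName.ckpt"), global_step=step)

# latest_checkpoint takes the folder and returns the newest prefix.
latest = tf.train.latest_checkpoint(save_dir)
print(latest)

with tf.Graph().as_default():
    v = tf1.get_variable("v", shape=[])  # same structure as when saving
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        saver.restore(sess, latest)
        value = sess.run(v)
print(value)
```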
TensorFlow 2 docs
Saving checkpoints
Adapted from the docs
# -------------------------
# ----- Toy Context -----
# -------------------------
import tensorflow as tf
class Net(tf.keras.Model):
    """A simple linear model."""
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)
    def call(self, x):
        return self.l1(x)
def toy_dataset():
    inputs = tf.range(10.0)[:, None]
    labels = inputs * 5.0 + tf.range(5.0)[None, :]
    return (
        tf.data.Dataset.from_tensor_slices(dict(x=inputs, y=labels)).repeat().batch(2)
    )
def train_step(net, example, optimizer):
    """Trains `net` on `example` using `optimizer`."""
    with tf.GradientTape() as tape:
        output = net(example["x"])
        loss = tf.reduce_mean(tf.abs(output - example["y"]))
    variables = net.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss
# ----------------------------
# ----- Create Objects -----
# ----------------------------
net = Net()
opt = tf.keras.optimizers.Adam(0.1)
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# ----------------------------
# ----- Train and Save -----
# ----------------------------
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    loss = train_step(net, example, opt)
    ckpt.step.assign_add(1)
    if int(ckpt.step) % 10 == 0:
        save_path = manager.save()
        print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
        print("loss {:1.2f}".format(loss.numpy()))
# ---------------------
# ----- Restore -----
# ---------------------
# In another script, re-initialize objects
opt = tf.keras.optimizers.Adam(0.1)
net = Net()
dataset = toy_dataset()
iterator = iter(dataset)
ckpt = tf.train.Checkpoint(
step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator
)
manager = tf.train.CheckpointManager(ckpt, "./tf_ckpts", max_to_keep=3)
# Re-use the manager code above ^
ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from {}".format(manager.latest_checkpoint))
else:
    print("Initializing from scratch.")
for _ in range(50):
    example = next(iterator)
    # Continue training or evaluate etc.
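A small sketch of how max_to_keep behaves, assuming TF 2.x eager mode; the step variable is a hypothetical stand-in for real training state. After five saves only the three newest prefixes should remain, and any of them can be passed to restore to roll back.

```python
import tempfile
import tensorflow as tf

step = tf.Variable(0)
ckpt = tf.train.Checkpoint(step=step)
manager = tf.train.CheckpointManager(ckpt, tempfile.mkdtemp(), max_to_keep=3)

for _ in range(5):
    step.assign_add(1)
    manager.save()  # ckpt-1 ... ckpt-5; older ones are deleted beyond max_to_keep

print(manager.checkpoints)        # the 3 most recent prefixes
print(manager.latest_checkpoint)

# Roll the variable back to the oldest checkpoint that is still kept.
ckpt.restore(manager.checkpoints[0])
print(int(step))
```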
More links
A thorough and useful tutorial on saved_model -> https://www.tensorflow.org/guide/saved_model
A detailed Keras guide to saving models -> https://www.tensorflow.org/guide/keras/save_and_serialize
Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available. The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).
(emphasis mine)
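The difference described in the quote can be seen in a minimal sketch, assuming TF 2.x eager mode; the Scaler module and its factor variable are hypothetical. Restoring the checkpoint needs the Scaler class to recreate the object first, while tf.saved_model.load returns a callable object without it, because the traced computation of __call__ is stored in the SavedModel.

```python
import os
import tempfile
import tensorflow as tf

class Scaler(tf.Module):
    """Hypothetical toy module: y = x * factor."""
    def __init__(self):
        super().__init__()
        self.factor = tf.Variable(1.0)

    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return x * self.factor

model = Scaler()
model.factor.assign(4.0)  # pretend this is the trained value

# Checkpoint: parameter values only.
ckpt_path = tf.train.Checkpoint(model=model).write(os.path.join(tempfile.mkdtemp(), "ckpt"))
# SavedModel: parameter values plus the traced computation.
sm_dir = tempfile.mkdtemp()
tf.saved_model.save(model, sm_dir)

# Checkpoint restore: the Python class is needed to rebuild the object.
fresh = Scaler()
tf.train.Checkpoint(model=fresh).restore(ckpt_path)
from_ckpt = float(fresh(tf.constant(2.0)))

# SavedModel load: no Scaler class needed.
loaded = tf.saved_model.load(sm_dir)
from_saved_model = float(loaded(tf.constant(2.0)))
print(from_ckpt, from_saved_model)
```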
Tensorflow < 2
From the docs:
Save
# Create some variables.
v1 = tf.get_variable("v1", shape=[3], initializer = tf.zeros_initializer)
v2 = tf.get_variable("v2", shape=[5], initializer = tf.zeros_initializer)
inc_v1 = v1.assign(v1+1)
dec_v2 = v2.assign(v2-1)
# Add an op to initialize the variables.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables, do some work, and save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Do some work with the model.
    inc_v1.op.run()
    dec_v2.op.run()
    # Save the variables to disk.
    save_path = saver.save(sess, "/tmp/model.ckpt")
    print("Model saved in path: %s" % save_path)
Restore
tf.reset_default_graph()
# Create some variables.
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "/tmp/model.ckpt")
    print("Model restored.")
    # Check the values of the variables
    print("v1 : %s" % v1.eval())
    print("v2 : %s" % v2.eval())
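To check what the docs example actually wrote, tf.train.load_checkpoint can read a checkpoint's raw values without building a graph or opening a Session. A minimal sketch, assuming TF 2.x with tf.compat.v1 and a temporary path in place of /tmp/model.ckpt:

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Recreate the docs' checkpoint: v1 = zeros(3) + 1, v2 = zeros(5) - 1.
with tf.Graph().as_default():
    v1 = tf1.get_variable("v1", shape=[3], initializer=tf1.zeros_initializer())
    v2 = tf1.get_variable("v2", shape=[5], initializer=tf1.zeros_initializer())
    inc_v1 = v1.assign(v1 + 1)
    dec_v2 = v2.assign(v2 - 1)
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        sess.run([inc_v1, dec_v2])
        saver.save(sess, ckpt_prefix)

# Read the stored values back directly from the checkpoint files.
reader = tf.train.load_checkpoint(ckpt_prefix)
print(reader.get_variable_to_shape_map())
v1_value = reader.get_tensor("v1")
v2_value = reader.get_tensor("v2")
print(v1_value, v2_value)
```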
simple_save
Lots of good answers; for completeness I'll add my 2 cents: simple_save. This is also a standalone code example using the tf.data.Dataset API.
Python 3; Tensorflow 1.14
import tensorflow as tf
from tensorflow.saved_model import tag_constants
with tf.Graph().as_default():
    with tf.Session() as sess:
        ...
        # Saving
        inputs = {
            "batch_size_placeholder": batch_size_placeholder,
            "features_placeholder": features_placeholder,
            "labels_placeholder": labels_placeholder,
        }
        outputs = {"prediction": model_output}
        tf.saved_model.simple_save(
            sess, 'path/to/your/location/', inputs, outputs
        )
Restoring:
restored_graph = tf.Graph()
with restored_graph.as_default():
    with tf.Session() as sess:
        tf.saved_model.loader.load(
            sess,
            [tag_constants.SERVING],
            'path/to/your/location/',
        )
        batch_size_placeholder = restored_graph.get_tensor_by_name('batch_size_placeholder:0')
        features_placeholder = restored_graph.get_tensor_by_name('features_placeholder:0')
        labels_placeholder = restored_graph.get_tensor_by_name('labels_placeholder:0')
        prediction = restored_graph.get_tensor_by_name('dense/BiasAdd:0')
        sess.run(prediction, feed_dict={
            batch_size_placeholder: some_value,
            features_placeholder: some_other_value,
            labels_placeholder: another_value
        })
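Rather than hardcoding an internal name such as 'dense/BiasAdd:0', the tensor names can be read from the signature that simple_save stores under the "serving_default" key of the MetaGraphDef returned by loader.load. A minimal sketch, assuming TF 2.x with tf.compat.v1; the toy model (x, w, y) is hypothetical:

```python
import os
import tempfile
import tensorflow as tf

tf1 = tf.compat.v1
export_dir = os.path.join(tempfile.mkdtemp(), "simple")  # must not exist yet

with tf.Graph().as_default():
    x = tf1.placeholder(tf.float32, [None, 2], name="x")
    w = tf1.get_variable("w", initializer=tf.constant([[2.0], [3.0]]))
    y = tf.matmul(x, w, name="y")
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        tf1.saved_model.simple_save(sess, export_dir, inputs={"x": x}, outputs={"y": y})

with tf.Graph().as_default() as g:
    with tf1.Session(graph=g) as sess:
        meta = tf1.saved_model.loader.load(
            sess, [tf1.saved_model.tag_constants.SERVING], export_dir)
        sig = meta.signature_def["serving_default"]
        # Look up tensor names via the signature keys given to simple_save.
        x_name = sig.inputs["x"].name
        y_name = sig.outputs["y"].name
        result = sess.run(y_name, feed_dict={x_name: [[1.0, 1.0]]})
print(result)
```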
Standalone example
Original blog post
For demonstration purposes, the following code generates random data.
We start by creating the placeholders. They will hold the data at runtime. From them, we create the Dataset and then its Iterator. We get the iterator's generated tensor, called input_tensor which will serve as input to our model. The model itself is built from input_tensor: a GRU-based bidirectional RNN followed by a dense classifier. Because why not. The loss is a softmax_cross_entropy_with_logits, optimized with Adam. After 2 epochs (of 2 batches each), we save the "trained" model with tf.saved_model.simple_save. If you run the code as is, then the model will be saved in a folder called simple/ in your current working directory. In a new graph, we then restore the saved model with tf.saved_model.loader.load. We grab the placeholders and logits with graph.get_tensor_by_name and the Iterator initializing operation with graph.get_operation_by_name. Lastly we run an inference for both batches in the dataset, and check that the saved and restored model both yield the same values. They do!
Code:
import os
import shutil
import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants
def model(graph, input_tensor):
    """Create the model which consists of
    a bidirectional rnn (GRU(10)) followed by a dense classifier
    Args:
        graph (tf.Graph): Tensors' graph
        input_tensor (tf.Tensor): Tensor fed as input to the model
    Returns:
        tf.Tensor: the model's output layer Tensor
    """
    cell = tf.nn.rnn_cell.GRUCell(10)
    with graph.as_default():
        ((fw_outputs, bw_outputs), (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=cell,
            cell_bw=cell,
            inputs=input_tensor,
            sequence_length=[10] * 32,
            dtype=tf.float32,
            swap_memory=True,
            scope=None)
        outputs = tf.concat((fw_outputs, bw_outputs), 2)
        mean = tf.reduce_mean(outputs, axis=1)
        dense = tf.layers.dense(mean, 5, activation=None)
    return dense
def get_opt_op(graph, logits, labels_tensor):
    """Create optimization operation from model's logits and labels
    Args:
        graph (tf.Graph): Tensors' graph
        logits (tf.Tensor): The model's output without activation
        labels_tensor (tf.Tensor): Target labels
    Returns:
        tf.Operation: the operation performing a step of the Adam optimizer
    """
    with graph.as_default():
        with tf.variable_scope('loss'):
            loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                logits=logits, labels=labels_tensor, name='xent'),
                name="mean-xent"
            )
        with tf.variable_scope('optimizer'):
            opt_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
    return opt_op
if __name__ == '__main__':
    # Set random seed for reproducibility
    # and create synthetic data
    np.random.seed(0)
    features = np.random.randn(64, 10, 30)
    labels = np.eye(5)[np.random.randint(0, 5, (64,))]
    graph1 = tf.Graph()
    with graph1.as_default():
        # Random seed for reproducibility
        tf.set_random_seed(0)
        # Placeholders
        batch_size_ph = tf.placeholder(tf.int64, name='batch_size_ph')
        features_data_ph = tf.placeholder(tf.float32, [None, None, 30], 'features_data_ph')
        labels_data_ph = tf.placeholder(tf.int32, [None, 5], 'labels_data_ph')
        # Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features_data_ph, labels_data_ph))
        dataset = dataset.batch(batch_size_ph)
        iterator = tf.data.Iterator.from_structure(dataset.output_types, dataset.output_shapes)
        dataset_init_op = iterator.make_initializer(dataset, name='dataset_init')
        input_tensor, labels_tensor = iterator.get_next()
        # Model
        logits = model(graph1, input_tensor)
        # Optimization
        opt_op = get_opt_op(graph1, logits, labels_tensor)
        with tf.Session(graph=graph1) as sess:
            # Initialize variables
            tf.global_variables_initializer().run(session=sess)
            for epoch in range(3):
                batch = 0
                # Initialize dataset (could feed epochs in Dataset.repeat(epochs))
                sess.run(
                    dataset_init_op,
                    feed_dict={
                        features_data_ph: features,
                        labels_data_ph: labels,
                        batch_size_ph: 32
                    })
                values = []
                while True:
                    try:
                        if epoch < 2:
                            # Training
                            _, value = sess.run([opt_op, logits])
                            print('Epoch {}, batch {} | Sample value: {}'.format(epoch, batch, value[0]))
                            batch += 1
                        else:
                            # Final inference
                            values.append(sess.run(logits))
                            print('Epoch {}, batch {} | Final inference | Sample value: {}'.format(epoch, batch, values[-1][0]))
                            batch += 1
                    except tf.errors.OutOfRangeError:
                        break
            # Save model state
            print('\nSaving...')
            cwd = os.getcwd()
            path = os.path.join(cwd, 'simple')
            shutil.rmtree(path, ignore_errors=True)
            inputs_dict = {
                "batch_size_ph": batch_size_ph,
                "features_data_ph": features_data_ph,
                "labels_data_ph": labels_data_ph
            }
            outputs_dict = {
                "logits": logits
            }
            tf.saved_model.simple_save(
                sess, path, inputs_dict, outputs_dict
            )
            print('Ok')
    # Restoring
    graph2 = tf.Graph()
    with graph2.as_default():
        with tf.Session(graph=graph2) as sess:
            # Restore saved values
            print('\nRestoring...')
            tf.saved_model.loader.load(
                sess,
                [tag_constants.SERVING],
                path
            )
            print('Ok')
            # Get restored placeholders
            labels_data_ph = graph2.get_tensor_by_name('labels_data_ph:0')
            features_data_ph = graph2.get_tensor_by_name('features_data_ph:0')
            batch_size_ph = graph2.get_tensor_by_name('batch_size_ph:0')
            # Get restored model output
            restored_logits = graph2.get_tensor_by_name('dense/BiasAdd:0')
            # Get dataset initializing operation
            dataset_init_op = graph2.get_operation_by_name('dataset_init')
            # Initialize restored dataset
            sess.run(
                dataset_init_op,
                feed_dict={
                    features_data_ph: features,
                    labels_data_ph: labels,
                    batch_size_ph: 32
                }
            )
            # Compute inference for both batches in dataset
            restored_values = []
            for i in range(2):
                restored_values.append(sess.run(restored_logits))
                print('Restored values: ', restored_values[i][0])
    # Check if original inference and restored inference are equal
    valid = all((v == rv).all() for v, rv in zip(values, restored_values))
    print('\nInferences match: ', valid)
This will print:
$ python3 save_and_restore.py
Epoch 0, batch 0 | Sample value: [-0.13851789 -0.3087595 0.12804556 0.20013677 -0.08229901]
Epoch 0, batch 1 | Sample value: [-0.00555491 -0.04339041 -0.05111827 -0.2480045 -0.00107776]
Epoch 1, batch 0 | Sample value: [-0.19321944 -0.2104792 -0.00602257 0.07465433 0.11674127]
Epoch 1, batch 1 | Sample value: [-0.05275984 0.05981954 -0.15913513 -0.3244143 0.10673307]
Epoch 2, batch 0 | Final inference | Sample value: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Epoch 2, batch 1 | Final inference | Sample value: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Saving...
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: b'/some/path/simple/saved_model.pb'
Ok
Restoring...
INFO:tensorflow:Restoring parameters from b'/some/path/simple/variables/variables'
Ok
Restored values: [-0.26331693 -0.13013336 -0.12553 -0.04276478 0.2933622 ]
Restored values: [-0.07730117 0.11119192 -0.20817074 -0.35660955 0.16990358]
Inferences match: True