从Udacity的深度学习课程中,y_i的softmax仅仅是指数除以整个Y向量的指数之和:
其中S(y_i)是y_i的软最大函数e是指数函数j是no。输入向量Y中的列。
我试过以下几种方法:
import numpy as np
def softmax(x):
"""Compute softmax values for each sets of scores in x."""
e_x = np.exp(x - np.max(x))
return e_x / e_x.sum()
scores = [3.0, 1.0, 0.2]
print(softmax(scores))
返回:
[ 0.8360188 0.11314284 0.05083836]
但建议的解决方案是:
def softmax(x):
"""Compute softmax values for each sets of scores in x."""
return np.exp(x) / np.sum(np.exp(x), axis=0)
它产生与第一个实现相同的输出,尽管第一个实现显式地取每列与Max的差值,然后除以和。
有人能用数学方法解释一下吗?一个是对的,另一个是错的?
实现在代码和时间复杂度方面是否相似?哪个更有效率?
根据所有的回复和CS231n的注释,请允许我总结如下:
def softmax(x, axis):
x -= np.max(x, axis=axis, keepdims=True)
return np.exp(x) / np.exp(x).sum(axis=axis, keepdims=True)
用法:
x = np.array([[1, 0, 2,-1],
[2, 4, 6, 8],
[3, 2, 1, 0]])
softmax(x, axis=1).round(2)
输出:
array([[0.24, 0.09, 0.64, 0.03],
[0. , 0.02, 0.12, 0.86],
[0.64, 0.24, 0.09, 0.03]])
编辑。从1.2.0版本开始,scipy包含了softmax作为一个特殊函数:
https://scipy.github.io/devdocs/generated/scipy.special.softmax.html
我写了一个在任意轴上应用softmax的函数:
def softmax(X, theta = 1.0, axis = None):
"""
Compute the softmax of each element along an axis of X.
Parameters
----------
X: ND-Array. Probably should be floats.
theta (optional): float parameter, used as a multiplier
prior to exponentiation. Default = 1.0
axis (optional): axis to compute values along. Default is the
first non-singleton axis.
Returns an array the same size as X. The result will sum to 1
along the specified axis.
"""
# make X at least 2d
y = np.atleast_2d(X)
# find axis
if axis is None:
axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
# multiply y against the theta parameter,
y = y * float(theta)
# subtract the max for numerical stability
y = y - np.expand_dims(np.max(y, axis = axis), axis)
# exponentiate y
y = np.exp(y)
# take the sum along the specified axis
ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)
# finally: divide elementwise
p = y / ax_sum
# flatten if X was 1D
if len(X.shape) == 1: p = p.flatten()
return p
正如其他用户所描述的那样,减去最大值是很好的做法。我在这里写了一篇详细的文章。