从Udacity的深度学习课程中,y_i的softmax仅仅是指数除以整个Y向量的指数之和:

其中S(y_i)是y_i的软最大函数e是指数函数j是no。输入向量Y中的列。

我试过以下几种方法:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

返回:

[ 0.8360188   0.11314284  0.05083836]

但建议的解决方案是:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

它产生与第一个实现相同的输出,尽管第一个实现显式地取每列与Max的差值,然后除以和。

有人能用数学方法解释一下吗?一个是对的,另一个是错的?

实现在代码和时间复杂度方面是否相似?哪个更有效率?


当前回答

在上面的回答中已经回答了很多细节。Max被减去以避免溢出。我在这里添加了python3中的另一个实现。

import numpy as np
def softmax(x):
    mx = np.amax(x,axis=1,keepdims = True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    res = x_exp / x_sum
    return res

x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

其他回答

从数学的角度看,两边是相等的。

这很容易证明。m = max (x)。现在你的函数softmax返回一个向量,它的第i个坐标等于

注意,这适用于任何m,因为对于所有(甚至是复数)数e^m != 0

from computational complexity point of view they are also equivalent and both run in O(n) time, where n is the size of a vector. from numerical stability point of view, the first solution is preferred, because e^x grows very fast and even for pretty small values of x it will overflow. Subtracting the maximum value allows to get rid of this overflow. To practically experience the stuff I was talking about try to feed x = np.array([1000, 5]) into both of your functions. One will return correct probability, the second will overflow with nan your solution works only for vectors (Udacity quiz wants you to calculate it for matrices as well). In order to fix it you need to use sum(axis=0)

在这里你可以找到为什么他们使用- max。

从这里开始:

“当你在实际中编写计算Softmax函数的代码时,由于指数的存在,中间项可能非常大。大数除法在数值上可能不稳定,所以使用标准化技巧很重要。”

更简明的说法是:

def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

我用了这三句简单的话:

x_exp=np.exp(x)
x_sum=np.sum(x_exp, axis = 1, keepdims = True)
s=x_exp / x_sum
import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()