从Udacity的深度学习课程中,y_i的softmax仅仅是指数除以整个Y向量的指数之和:

其中S(y_i)是y_i的软最大函数e是指数函数j是no。输入向量Y中的列。

我试过以下几种方法:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

返回:

[ 0.8360188   0.11314284  0.05083836]

但建议的解决方案是:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

它产生与第一个实现相同的输出,尽管第一个实现显式地取每列与Max的差值,然后除以和。

有人能用数学方法解释一下吗?一个是对的,另一个是错的?

实现在代码和时间复杂度方面是否相似?哪个更有效率?


当前回答

从数学的角度看,两边是相等的。

这很容易证明。m = max (x)。现在你的函数softmax返回一个向量,它的第i个坐标等于

注意,这适用于任何m,因为对于所有(甚至是复数)数e^m != 0

from computational complexity point of view they are also equivalent and both run in O(n) time, where n is the size of a vector. from numerical stability point of view, the first solution is preferred, because e^x grows very fast and even for pretty small values of x it will overflow. Subtracting the maximum value allows to get rid of this overflow. To practically experience the stuff I was talking about try to feed x = np.array([1000, 5]) into both of your functions. One will return correct probability, the second will overflow with nan your solution works only for vectors (Udacity quiz wants you to calculate it for matrices as well). In order to fix it you need to use sum(axis=0)

其他回答

在这里你可以找到为什么他们使用- max。

从这里开始:

“当你在实际中编写计算Softmax函数的代码时,由于指数的存在,中间项可能非常大。大数除法在数值上可能不稳定,所以使用标准化技巧很重要。”

目标是使用Numpy和Tensorflow实现类似的结果。与原始答案的唯一变化是np的轴参数。和api。

初始方法:axis=0 -然而,当维度为N时,这并不能提供预期的结果。

修改方法:axis=len(e_x.shape)-1 -总是在最后一个维度上求和。这提供了与tensorflow的softmax函数类似的结果。

def softmax_fn(input_array):
    """
    | **@author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array))
    return e_x / e_x.sum(axis=len(e_x.shape)-1)

下面是使用numpy的广义解,以及与tensorflow和scipy的正确性比较:

数据准备:

import numpy as np

np.random.seed(2019)

batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)

输出:

logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822  0.3930805 ]
  [0.62397    0.6378774 ]
  [0.88049906 0.299172  ]]]

使用tensorflow的Softmax:

import tensorflow as tf

logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_np, axis=-1)

print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)

with tf.Session() as sess:
    scores_np = sess.run(scores_tf)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

输出:

logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

使用scipy的Softmax:

from scipy.special import softmax

scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

输出:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.4965232  0.5034768 ]
  [0.6413727  0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

Softmax使用numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter,
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p


scores_np = softmax(logits_np, axis=-1)

print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)

print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))

输出:

scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
  [0.49652317 0.5034768 ]
  [0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

似乎每个人都发布了他们的解决方案,所以我将发布我的:

def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T

我得到了与从sklearn导入的完全相同的结果:

from sklearn.utils.extmath import softmax
import tensorflow as tf
import numpy as np

def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T

logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])

sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()