lasagne.objectives
Provides some minimal help with building loss expressions for training or validating a neural network.
Six functions build element- or item-wise loss expressions from network predictions and targets:
- binary_crossentropy: Computes the binary cross-entropy between predictions and targets.
- categorical_crossentropy: Computes the categorical cross-entropy between predictions and targets.
- squared_error: Computes the element-wise squared difference between two tensors.
- binary_hinge_loss: Computes the binary hinge loss between predictions and targets.
- multiclass_hinge_loss: Computes the multi-class hinge loss between predictions and targets.
- huber_loss: Computes the Huber loss between predictions and targets.
A convenience function aggregates such losses into a scalar expression suitable for differentiation:
- aggregate: Aggregates an element- or item-wise loss to a scalar loss.
Note that these functions merely help to write more readable code and are completely optional: essentially, any differentiable scalar Theano expression can be used as a training objective.
Finally, two functions compute evaluation measures that are useful for validation and testing only, not for training:
- binary_accuracy: Computes the binary accuracy between predictions and targets.
- categorical_accuracy: Computes the categorical accuracy between predictions and targets.
Those can also be aggregated into a scalar expression if needed.
Examples
Assuming you have a simple neural network for 3-way classification:
>>> from lasagne.layers import InputLayer, DenseLayer, get_output
>>> from lasagne.nonlinearities import softmax, rectify
>>> l_in = InputLayer((100, 20))
>>> l_hid = DenseLayer(l_in, num_units=30, nonlinearity=rectify)
>>> l_out = DenseLayer(l_hid, num_units=3, nonlinearity=softmax)
And Theano variables representing your network input and targets:
>>> import theano
>>> data = theano.tensor.matrix('data')
>>> targets = theano.tensor.matrix('targets')
You’d first construct an element-wise loss expression:
>>> from lasagne.objectives import categorical_crossentropy, aggregate
>>> predictions = get_output(l_out, data)
>>> loss = categorical_crossentropy(predictions, targets)
Then aggregate it into a scalar (you could also just call mean() on it):
>>> loss = aggregate(loss, mode='mean')
Finally, this gives a loss expression you can pass to any of the update methods in lasagne.updates. For validation of a network, you will usually want to repeat these steps with deterministic network output, i.e., without dropout or any other nondeterministic computation in between:
>>> test_predictions = get_output(l_out, data, deterministic=True)
>>> test_loss = categorical_crossentropy(test_predictions, targets)
>>> test_loss = aggregate(test_loss)
This gives a loss expression good for monitoring validation error.
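To actually run training and validation, these expressions would typically be compiled into Theano functions. A minimal sketch reusing the variables above (the choice of update rule and learning rate is illustrative, not prescribed by the library):
>>> from lasagne.layers import get_all_params
>>> from lasagne.updates import sgd
>>> params = get_all_params(l_out, trainable=True)
>>> updates = sgd(loss, params, learning_rate=0.1)
>>> train_fn = theano.function([data, targets], loss, updates=updates)
>>> val_fn = theano.function([data, targets], test_loss)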
Loss functions
- lasagne.objectives.binary_crossentropy(predictions, targets)
Computes the binary cross-entropy between predictions and targets.
\[L = -t \log(p) - (1 - t) \log(1 - p)\]
Parameters: - predictions : Theano tensor
Predictions in (0, 1), such as sigmoidal output of a neural network.
- targets : Theano tensor
Targets in [0, 1], such as ground truth labels.
Returns: - Theano tensor
An expression for the element-wise binary cross-entropy.
Notes
This is the loss function of choice for binary classification problems and sigmoid output units.
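For instance, a quick numeric check with illustrative values (results shown as comments are approximate):
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import binary_crossentropy
>>> p, t = T.dvector('p'), T.dvector('t')
>>> binary_crossentropy(p, t).eval({p: np.array([0.9, 0.1]),
...                                 t: np.array([1.0, 0.0])})  # ≈ [0.105, 0.105]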
- lasagne.objectives.categorical_crossentropy(predictions, targets)
Computes the categorical cross-entropy between predictions and targets.
\[L_i = - \sum_j{t_{i,j} \log(p_{i,j})}\]
where \(p\) are the predictions, \(t\) are the targets, \(i\) denotes the data point and \(j\) denotes the class.
Parameters: - predictions : Theano 2D tensor
Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.
- targets : Theano 2D tensor or 1D tensor
Either targets in [0, 1] matching the layout of predictions, or a vector of int giving the correct class index per data point. In the case of an integer vector argument, each element represents the position of the ‘1’ in a one-hot encoding.
Returns: - Theano 1D tensor
An expression for the item-wise categorical cross-entropy.
Notes
This is the loss function of choice for multi-class classification problems and softmax output units. For hard targets, i.e., targets that assign all of the probability to a single class per data point, providing a vector of int for the targets is usually slightly more efficient than providing a matrix with a single 1.0 per row.
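For example, a sketch using the integer-vector form for hard targets (the probabilities are made up; the result shown as a comment follows from the formula above):
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import categorical_crossentropy
>>> preds, targets = T.dmatrix('preds'), T.ivector('targets')
>>> categorical_crossentropy(preds, targets).eval(
...     {preds: np.array([[0.1, 0.7, 0.2]]),
...      targets: np.array([1], dtype='int32')})  # ≈ [-log(0.7)] ≈ [0.357]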
- lasagne.objectives.squared_error(a, b)
Computes the element-wise squared difference between two tensors.
\[L = (p - t)^2\]
Parameters: - a, b : Theano tensor
The tensors to compute the squared difference between.
Returns: - Theano tensor
An expression for the element-wise squared difference.
Notes
This is the loss function of choice for many regression problems or auto-encoders with linear output units.
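A quick numeric check with illustrative values:
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import squared_error
>>> a, b = T.dvector('a'), T.dvector('b')
>>> squared_error(a, b).eval({a: np.array([2.0, 0.0]),
...                           b: np.array([0.5, 1.0])})  # [2.25, 1.0]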
- lasagne.objectives.binary_hinge_loss(predictions, targets, delta=1, log_odds=None, binary=True)
Computes the binary hinge loss between predictions and targets.
\[L_i = \max(0, \delta - t_i p_i)\]
Parameters: - predictions : Theano tensor
Predictions in (0, 1), such as sigmoidal output of a neural network (or log-odds of predictions depending on log_odds).
- targets : Theano tensor
Targets in {0, 1} (or in {-1, 1} depending on binary), such as ground truth labels.
- delta : scalar, default 1
The hinge loss margin.
- log_odds : bool, default None
False if predictions are sigmoid outputs in (0, 1), True if predictions are sigmoid inputs (i.e., log-odds). If None, assumes True, but warns that the default will change to False.
- binary : bool, default True
True if targets are in {0, 1}, False if they are in {-1, 1}.
Returns: - Theano tensor
An expression for the element-wise binary hinge loss.
Notes
This is an alternative to the binary cross-entropy loss for binary classification problems.
Note that it is a drop-in replacement only when giving log_odds=False. Otherwise, it requires log-odds rather than sigmoid outputs. Be aware that depending on the Theano version, log_odds=False with a sigmoid output layer may be less stable than log_odds=True with a linear layer.
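For instance, a sketch applying the loss to raw linear scores (variable names are illustrative):
>>> import theano.tensor as T
>>> from lasagne.objectives import binary_hinge_loss, aggregate
>>> scores = T.vector('scores')    # linear network outputs, i.e., log-odds
>>> targets = T.vector('targets')  # ground truth labels in {0, 1}
>>> loss = aggregate(binary_hinge_loss(scores, targets, log_odds=True))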
- lasagne.objectives.multiclass_hinge_loss(predictions, targets, delta=1)
Computes the multi-class hinge loss between predictions and targets.
\[L_i = \max_{j \not = t_i} (0, p_j - p_{t_i} + \delta)\]
Parameters: - predictions : Theano 2D tensor
Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.
- targets : Theano 2D tensor or 1D tensor
Either a vector of int giving the correct class index per data point, or a 2D tensor giving a one-hot encoding of the correct class in the same layout as predictions (non-binary targets in [0, 1] do not work!)
- delta : scalar, default 1
The hinge loss margin.
Returns: - Theano 1D tensor
An expression for the item-wise multi-class hinge loss.
Notes
This is an alternative to the categorical cross-entropy loss for multi-class classification problems.
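A numeric sketch with an integer target (values are made up; the result shown as a comment follows from the formula above):
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import multiclass_hinge_loss
>>> preds, targets = T.dmatrix('preds'), T.ivector('targets')
>>> multiclass_hinge_loss(preds, targets).eval(
...     {preds: np.array([[0.1, 0.7, 0.2]]),
...      targets: np.array([1], dtype='int32')})  # max(0, 0.2 - 0.7 + 1) = [0.5]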
- lasagne.objectives.huber_loss(predictions, targets, delta=1)
Computes the Huber loss between predictions and targets.
\[L_i = \begin{cases} \frac{(p - t)^2}{2} & \text{if } |p - t| \le \delta \\ \delta \left( |p - t| - \frac{\delta}{2} \right) & \text{if } |p - t| > \delta \end{cases}\]
Parameters: - predictions : Theano 2D tensor or 1D tensor
Prediction outputs of a neural network.
- targets : Theano 2D tensor or 1D tensor
Ground truth against which the predictions are compared; either a vector or a 2D tensor.
- delta : scalar, default 1
The point where the loss changes from quadratic to linear. The default of 1 corresponds to the SmoothL1Loss described in the Fast R-CNN paper [1].
Returns: - Theano tensor
An expression for the element-wise Huber loss [2].
Notes
This is an alternative to the squared error for regression problems.
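A quick numeric check with illustrative values (delta left at its default of 1):
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import huber_loss
>>> p, t = T.dvector('p'), T.dvector('t')
>>> huber_loss(p, t).eval({p: np.array([0.5, 3.0]),
...                        t: np.array([0.0, 0.0])})  # [0.125, 2.5]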
References
[1] Ross Girshick (2015): Fast R-CNN. https://arxiv.org/pdf/1504.08083.pdf
[2] Peter J. Huber (1964): Robust Estimation of a Location Parameter. https://projecteuclid.org/euclid.aoms/1177703732
Aggregation functions
- lasagne.objectives.aggregate(loss, weights=None, mode='mean')
Aggregates an element- or item-wise loss to a scalar loss.
Parameters: - loss : Theano tensor
The loss expression to aggregate.
- weights : Theano tensor, optional
The weights for each element or item, must be broadcastable to the same shape as loss if given. If omitted, all elements will be weighted the same.
- mode : {‘mean’, ‘sum’, ‘normalized_sum’}
Whether to aggregate by averaging, by summing or by summing and dividing by the total weights (which requires weights to be given).
Returns: - Theano scalar
A scalar loss expression suitable for differentiation.
Notes
By supplying binary weights (i.e., only using values 0 and 1), this function can also be used for masking out particular entries in the loss expression. Note that masked entries still need to be valid values; not-a-numbers (NaNs) will propagate through.
When applied to batch-wise loss expressions, setting mode to 'normalized_sum' ensures that the loss per batch is of a similar magnitude, independent of associated weights. However, it means that a given data point contributes more to the loss when it shares a batch with low-weighted or masked data points than with high-weighted ones.
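For example, a sketch that masks out particular entries with binary weights (the mask variable is illustrative):
>>> import theano.tensor as T
>>> from lasagne.objectives import squared_error, aggregate
>>> preds, targs = T.matrix('preds'), T.matrix('targs')
>>> mask = T.matrix('mask')  # 1 for entries to keep, 0 for entries to ignore
>>> loss = aggregate(squared_error(preds, targs), weights=mask,
...                  mode='normalized_sum')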
Evaluation functions
- lasagne.objectives.binary_accuracy(predictions, targets, threshold=0.5)
Computes the binary accuracy between predictions and targets.
\[L_i = \mathbb{I}(t_i = \mathbb{I}(p_i \ge \alpha))\]
where \(\alpha\) denotes the threshold.
Parameters: - predictions : Theano tensor
Predictions in [0, 1], such as a sigmoidal output of a neural network, giving the probability of the positive class.
- targets : Theano tensor
Targets in {0, 1}, such as ground truth labels.
- threshold : scalar, default: 0.5
The threshold at or above which a prediction is considered to be of the positive class.
Returns: - Theano tensor
An expression for the element-wise binary accuracy in {0, 1}.
Notes
This objective function should not be used with a gradient calculation; its gradient is zero everywhere. It is intended as a convenience for validation and testing, not training.
To obtain the average accuracy, call theano.tensor.mean() on the result, passing dtype=theano.config.floatX to compute the mean on GPU.
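For instance, a sketch computing the mean accuracy over a batch (variable names are illustrative):
>>> import theano
>>> import theano.tensor as T
>>> from lasagne.objectives import binary_accuracy
>>> preds, targs = T.vector('preds'), T.vector('targs')
>>> acc = binary_accuracy(preds, targs)  # element-wise 0/1 correctness
>>> mean_acc = T.mean(acc, dtype=theano.config.floatX)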
- lasagne.objectives.categorical_accuracy(predictions, targets, top_k=1)
Computes the categorical accuracy between predictions and targets.
\[L_i = \mathbb{I}(t_i = \operatorname{argmax}_c p_{i,c})\]
Can be relaxed to allow matches among the top \(k\) predictions:
\[L_i = \mathbb{I}(t_i \in \operatorname{argsort}_c (-p_{i,c})_{:k})\]
Parameters: - predictions : Theano 2D tensor
Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.
- targets : Theano 2D tensor or 1D tensor
Either a vector of int giving the correct class index per data point, or a 2D tensor giving a one-hot encoding of the correct class in the same layout as predictions.
- top_k : int
Regard a prediction to be correct if the target class is among the top_k largest class probabilities. For the default value of 1, a prediction is correct only if the target class is the most probable.
Returns: - Theano 1D tensor
An expression for the item-wise categorical accuracy in {0, 1}.
Notes
This function is not differentiable, as it includes an argmax; it should never be used with a gradient calculation. It is intended as a convenience for validation and testing, not training.
To obtain the average accuracy, call theano.tensor.mean() on the result, passing dtype=theano.config.floatX to compute the mean on GPU.
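For example, comparing top-1 and top-2 accuracy on a single made-up prediction (results shown as comments follow from the definition above):
>>> import numpy as np
>>> import theano.tensor as T
>>> from lasagne.objectives import categorical_accuracy
>>> preds, targets = T.dmatrix('preds'), T.ivector('targets')
>>> feed = {preds: np.array([[0.1, 0.7, 0.2]]),
...         targets: np.array([2], dtype='int32')}
>>> categorical_accuracy(preds, targets).eval(feed)           # [0]: class 2 is not top-1
>>> categorical_accuracy(preds, targets, top_k=2).eval(feed)  # [1]: class 2 is among the top-2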