lasagne.objectives

Provides some minimal help with building loss expressions for training or validating a neural network.

Five functions build element- or item-wise loss expressions from network predictions and targets:

binary_crossentropy Computes the binary cross-entropy between predictions and targets.
categorical_crossentropy Computes the categorical cross-entropy between predictions and targets.
squared_error Computes the element-wise squared difference between two tensors.
binary_hinge_loss Computes the binary hinge loss between predictions and targets.
multiclass_hinge_loss Computes the multi-class hinge loss between predictions and targets.

A convenience function aggregates such losses into a scalar expression suitable for differentiation:

aggregate Aggregates an element- or item-wise loss to a scalar loss.

Note that these functions are entirely optional and only serve to make code more readable; any differentiable scalar Theano expression can be used as a training objective.

Finally, two functions compute evaluation measures that are useful for validation and testing only, not for training:

binary_accuracy Computes the binary accuracy between predictions and targets.
categorical_accuracy Computes the categorical accuracy between predictions and targets.

Those can also be aggregated into a scalar expression if needed.

Examples

Assuming you have a simple neural network for 3-way classification:

>>> from lasagne.layers import InputLayer, DenseLayer, get_output
>>> from lasagne.nonlinearities import softmax, rectify
>>> l_in = InputLayer((100, 20))
>>> l_hid = DenseLayer(l_in, num_units=30, nonlinearity=rectify)
>>> l_out = DenseLayer(l_hid, num_units=3, nonlinearity=softmax)

And Theano variables representing your network input and targets:

>>> import theano
>>> data = theano.tensor.matrix('data')
>>> targets = theano.tensor.matrix('targets')

You’d first construct an element-wise loss expression:

>>> from lasagne.objectives import categorical_crossentropy, aggregate
>>> predictions = get_output(l_out, data)
>>> loss = categorical_crossentropy(predictions, targets)

Then aggregate it into a scalar (you could also just call mean() on it):

>>> loss = aggregate(loss, mode='mean')

Finally, this gives a loss expression you can pass to any of the update methods in lasagne.updates. For validation of a network, you will usually want to repeat these steps with deterministic network output, i.e., without dropout or any other nondeterministic computation in between:

>>> test_predictions = get_output(l_out, data, deterministic=True)
>>> test_loss = categorical_crossentropy(test_predictions, targets)
>>> test_loss = aggregate(test_loss)

This gives a loss expression good for monitoring validation error.

Loss functions

lasagne.objectives.binary_crossentropy(predictions, targets)[source]

Computes the binary cross-entropy between predictions and targets.

\[L = -t \log(p) - (1 - t) \log(1 - p)\]
Parameters:

predictions : Theano tensor

Predictions in (0, 1), such as sigmoidal output of a neural network.

targets : Theano tensor

Targets in [0, 1], such as ground truth labels.

Returns:

Theano tensor

An expression for the element-wise binary cross-entropy.

Notes

This is the loss function of choice for binary classification problems and sigmoid output units.
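As a quick illustration of the formula above, here is a plain-NumPy sketch (a hypothetical stand-in for the symbolic Theano expression the function actually returns):

```python
import numpy as np

def binary_crossentropy_np(p, t):
    """Element-wise binary cross-entropy: -t*log(p) - (1-t)*log(1-p)."""
    p = np.asarray(p, dtype=float)
    t = np.asarray(t, dtype=float)
    return -t * np.log(p) - (1 - t) * np.log(1 - p)

# A confident correct prediction yields a small loss,
# a confident wrong prediction a large one.
losses = binary_crossentropy_np([0.9, 0.1], [1, 1])
```

Note how the loss grows without bound as a prediction for the true class approaches zero, which is what makes confident mistakes expensive.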

lasagne.objectives.categorical_crossentropy(predictions, targets)[source]

Computes the categorical cross-entropy between predictions and targets.

\[L_i = - \sum_j{t_{i,j} \log(p_{i,j})}\]
Parameters:

predictions : Theano 2D tensor

Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.

targets : Theano 2D tensor or 1D tensor

Either targets in [0, 1] matching the layout of predictions, or a vector of int giving the correct class index per data point.

Returns:

Theano 1D tensor

An expression for the item-wise categorical cross-entropy.

Notes

This is the loss function of choice for multi-class classification problems and softmax output units. For hard targets, i.e., targets that assign all of the probability to a single class per data point, providing a vector of int for the targets is usually slightly more efficient than providing a matrix with a single 1.0 per row.
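The equivalence between one-hot matrix targets and integer-vector targets mentioned above can be sketched in plain NumPy (an illustrative stand-in, not the Theano implementation):

```python
import numpy as np

def categorical_crossentropy_np(predictions, targets):
    """Item-wise categorical cross-entropy; targets may be a one-hot
    matrix or a vector of class indices, as in the Theano version."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets)
    if targets.ndim == 1:
        # hard targets: directly pick the probability of the true class
        return -np.log(predictions[np.arange(len(targets)), targets])
    # soft targets: weighted sum of log-probabilities per row
    return -(targets * np.log(predictions)).sum(axis=1)

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
hard = categorical_crossentropy_np(p, np.array([0, 1]))
soft = categorical_crossentropy_np(p, np.eye(3)[[0, 1]])
# hard and soft agree: each one-hot row selects the same entry,
# but the integer path skips the full multiply-and-sum
```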

lasagne.objectives.squared_error(a, b)[source]

Computes the element-wise squared difference between two tensors.

\[L = (a - b)^2\]
Parameters:

a, b : Theano tensor

The tensors to compute the squared difference between.

Returns:

Theano tensor

An expression for the element-wise squared difference.

Notes

This is the loss function of choice for many regression problems or auto-encoders with linear output units.
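For reference, the element-wise operation amounts to the following (plain-NumPy sketch):

```python
import numpy as np

# e.g. network outputs vs. regression targets
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.5, 2.0, 1.0])
loss = (a - b) ** 2  # element-wise squared difference
```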

lasagne.objectives.binary_hinge_loss(predictions, targets, delta=1, log_odds=None, binary=True)[source]

Computes the binary hinge loss between predictions and targets.

\[L_i = \max(0, \delta - t_i p_i)\]
Parameters:

predictions : Theano tensor

Predictions in (0, 1), such as sigmoidal output of a neural network (or log-odds of predictions depending on log_odds).

targets : Theano tensor

Targets in {0, 1} (or in {-1, 1} depending on binary), such as ground truth labels.

delta : scalar, default 1

The hinge loss margin

log_odds : bool, default None

False if predictions are sigmoid outputs in (0, 1); True if predictions are sigmoid inputs, i.e., log-odds. If None, assumes True, but warns that the default will change to False.

binary : bool, default True

True if targets are in {0, 1}, False if they are in {-1, 1}

Returns:

Theano tensor

An expression for the element-wise binary hinge loss

Notes

This is an alternative to the binary cross-entropy loss for binary classification problems.

Note that it is a drop-in replacement only when giving log_odds=False. Otherwise, it requires log-odds rather than sigmoid outputs. Be aware that depending on the Theano version, log_odds=False with a sigmoid output layer may be less stable than log_odds=True with a linear layer.
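The target remapping and margin behavior can be sketched in plain NumPy (an illustrative stand-in for the log-odds case, not the Theano implementation):

```python
import numpy as np

def binary_hinge_np(predictions, targets, delta=1.0, binary=True):
    """Element-wise binary hinge loss on log-odds predictions,
    mirroring L = max(0, delta - t * p) with t in {-1, 1}."""
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets, dtype=float)
    if binary:            # map {0, 1} targets onto {-1, 1}
        t = 2 * t - 1
    return np.maximum(0, delta - t * p)

# A positive example predicted beyond the margin incurs no loss;
# one inside the margin incurs delta - t*p.
losses = binary_hinge_np([2.0, 0.5], [1, 1])
```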

lasagne.objectives.multiclass_hinge_loss(predictions, targets, delta=1)[source]

Computes the multi-class hinge loss between predictions and targets.

\[L_i = \max_{j \not = t_i} \left(0, p_{i,j} - p_{i,t_i} + \delta\right)\]
Parameters:

predictions : Theano 2D tensor

Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.

targets : Theano 2D tensor or 1D tensor

Either a vector of int giving the correct class index per data point or a 2D tensor of one-hot encoding of the correct class in the same layout as predictions (non-binary targets in [0, 1] do not work!)

delta : scalar, default 1

The hinge loss margin

Returns:

Theano 1D tensor

An expression for the item-wise multi-class hinge loss

Notes

This is an alternative to the categorical cross-entropy loss for multi-class classification problems.
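For integer class targets, the formula can be sketched in plain NumPy as follows (an illustrative stand-in, not the Theano implementation):

```python
import numpy as np

def multiclass_hinge_np(predictions, targets, delta=1.0):
    """Item-wise multi-class hinge loss for integer class targets:
    max over wrong classes j of max(0, p_ij - p_i,t_i + delta)."""
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets)
    rows = np.arange(len(t))
    correct = p[rows, t]             # probability of the true class
    rest = np.copy(p)
    rest[rows, t] = -np.inf          # exclude the true class from the max
    return np.maximum(0, rest.max(axis=1) - correct + delta)

p = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.3, 0.4]])
losses = multiclass_hinge_np(p, np.array([0, 2]))
```

The loss is zero only when the true class beats every other class by at least the margin delta.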

Aggregation functions

lasagne.objectives.aggregate(loss, weights=None, mode='mean')[source]

Aggregates an element- or item-wise loss to a scalar loss.

Parameters:

loss : Theano tensor

The loss expression to aggregate.

weights : Theano tensor, optional

The weights for each element or item, must be broadcastable to the same shape as loss if given. If omitted, all elements will be weighted the same.

mode : {‘mean’, ‘sum’, ‘normalized_sum’}

Whether to aggregate by averaging, by summing or by summing and dividing by the total weights (which requires weights to be given).

Returns:

Theano scalar

A scalar loss expression suitable for differentiation.

Notes

By supplying binary weights (i.e., only using values 0 and 1), this function can also be used for masking out particular entries in the loss expression. Note that masked entries still need to be valid values; not-a-number (NaN) entries will propagate through even when their weight is zero.

When applied to batch-wise loss expressions, setting mode to 'normalized_sum' ensures that the loss per batch is of a similar magnitude, independent of associated weights. However, it means that a given data point contributes more to the loss when it shares a batch with low-weighted or masked data points than with high-weighted ones.
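The three aggregation modes and the masking use case can be sketched in plain NumPy (an illustrative analogue, not the Theano implementation):

```python
import numpy as np

def aggregate_np(loss, weights=None, mode='mean'):
    """NumPy analogue of aggregate(): reduce a loss array to a scalar."""
    loss = np.asarray(loss, dtype=float)
    if weights is not None:
        loss = loss * weights
    if mode == 'mean':
        return loss.mean()
    if mode == 'sum':
        return loss.sum()
    if mode == 'normalized_sum':
        if weights is None:
            raise ValueError("require weights for mode='normalized_sum'")
        return loss.sum() / np.sum(weights)
    raise ValueError("mode must be 'mean', 'sum' or 'normalized_sum'")

# Binary weights mask out the second entry; 'normalized_sum' then
# averages over the unmasked entries only.
masked = aggregate_np([1.0, 5.0, 3.0], weights=[1, 0, 1],
                      mode='normalized_sum')
```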

Evaluation functions

lasagne.objectives.binary_accuracy(predictions, targets, threshold=0.5)[source]

Computes the binary accuracy between predictions and targets.

\[L_i = \mathbb{I}(t_i = \mathbb{I}(p_i \ge \alpha))\]
Parameters:

predictions : Theano tensor

Predictions in [0, 1], such as a sigmoidal output of a neural network, giving the probability of the positive class

targets : Theano tensor

Targets in {0, 1}, such as ground truth labels.

threshold : scalar, default: 0.5

The threshold at or above which a prediction is considered to be of the positive class

Returns:

Theano tensor

An expression for the element-wise binary accuracy in {0, 1}

Notes

This objective function should not be used with a gradient calculation; its gradient is zero everywhere. It is intended as a convenience for validation and testing, not training.

To obtain the average accuracy, call theano.tensor.mean() on the result, passing dtype=theano.config.floatX to compute the mean on GPU.
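The thresholding behavior can be sketched in plain NumPy (an illustrative stand-in for the Theano expression):

```python
import numpy as np

def binary_accuracy_np(predictions, targets, threshold=0.5):
    """Element-wise binary accuracy: 1 where the thresholded
    prediction matches the target, 0 elsewhere."""
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets)
    return (t == (p >= threshold)).astype(float)

acc = binary_accuracy_np([0.8, 0.3, 0.6], [1, 1, 0])
# average accuracy is then simply acc.mean()
```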

lasagne.objectives.categorical_accuracy(predictions, targets, top_k=1)[source]

Computes the categorical accuracy between predictions and targets.

\[L_i = \mathbb{I}(t_i = \operatorname{argmax}_c p_{i,c})\]

Can be relaxed to allow matches among the top \(k\) predictions:

\[L_i = \mathbb{I}(t_i \in \operatorname{argsort}_c (-p_{i,c})_{:k})\]
Parameters:

predictions : Theano 2D tensor

Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.

targets : Theano 2D tensor or 1D tensor

Either a vector of int giving the correct class index per data point, or a 2D tensor of one-hot encodings of the correct class in the same layout as predictions

top_k : int

Regard a prediction to be correct if the target class is among the top_k largest class probabilities. For the default value of 1, a prediction is correct only if the target class is the most probable.

Returns:

Theano 1D tensor

An expression for the item-wise categorical accuracy in {0, 1}

Notes

This function is not differentiable, as it includes an argmax; it should never be used with a gradient calculation. It is intended as a convenience for validation and testing, not training.

To obtain the average accuracy, call theano.tensor.mean() on the result, passing dtype=theano.config.floatX to compute the mean on GPU.
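The top-k relaxation can be sketched in plain NumPy for integer targets (an illustrative stand-in, not the Theano implementation):

```python
import numpy as np

def categorical_accuracy_np(predictions, targets, top_k=1):
    """Item-wise categorical accuracy with a top-k relaxation."""
    p = np.asarray(predictions, dtype=float)
    t = np.asarray(targets)
    # indices of the top_k largest probabilities per row
    top = np.argsort(-p, axis=1)[:, :top_k]
    # correct if the target index appears anywhere among them
    return (top == t[:, None]).any(axis=1).astype(float)

p = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]])
top1 = categorical_accuracy_np(p, np.array([1, 2]))
top2 = categorical_accuracy_np(p, np.array([1, 2]), top_k=2)
# relaxing to top_k=2 turns the near-miss in the first row into a hit
```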