lasagne.objectives
Provides some minimal help with building loss expressions for training or validating a neural network.
Three functions build element- or item-wise loss expressions from network predictions and targets:
binary_crossentropy | Computes the binary cross-entropy between predictions and targets.
categorical_crossentropy | Computes the categorical cross-entropy between predictions and targets.
squared_error | Computes the element-wise squared difference between two tensors.
A convenience function aggregates such losses into a scalar expression suitable for differentiation:
aggregate | Aggregates an element- or item-wise loss to a scalar loss.
Note that these functions exist only to make code more readable; they are entirely optional. Any differentiable scalar Theano expression can be used as a training objective.
Examples
Assuming you have a simple neural network for 3-way classification:
>>> from lasagne.layers import InputLayer, DenseLayer, get_output
>>> from lasagne.nonlinearities import softmax, rectify
>>> l_in = InputLayer((100, 20))
>>> l_hid = DenseLayer(l_in, num_units=30, nonlinearity=rectify)
>>> l_out = DenseLayer(l_hid, num_units=3, nonlinearity=softmax)
And Theano variables representing your network input and targets:
>>> import theano
>>> data = theano.tensor.matrix('data')
>>> targets = theano.tensor.matrix('targets')
You’d first construct an element-wise loss expression:
>>> from lasagne.objectives import categorical_crossentropy, aggregate
>>> predictions = get_output(l_out, data)
>>> loss = categorical_crossentropy(predictions, targets)
Then aggregate it into a scalar (you could also just call mean() on it):
>>> loss = aggregate(loss, mode='mean')
Finally, this gives a loss expression you can pass to any of the update methods in lasagne.updates. For validation of a network, you will usually want to repeat these steps with deterministic network output, i.e., without dropout or any other nondeterministic computation in between:
>>> test_predictions = get_output(l_out, data, deterministic=True)
>>> test_loss = categorical_crossentropy(test_predictions, targets)
>>> test_loss = aggregate(test_loss)
This gives a loss expression good for monitoring validation error.
Loss functions
- lasagne.objectives.binary_crossentropy(predictions, targets)
Computes the binary cross-entropy between predictions and targets.
\[L = -t \log(p) - (1 - t) \log(1 - p)\]
Parameters:
predictions : Theano tensor
Predictions in (0, 1), such as sigmoidal output of a neural network.
targets : Theano tensor
Targets in [0, 1], such as ground truth labels.
Returns: Theano tensor
An expression for the element-wise binary cross-entropy.
Notes
This is the loss function of choice for binary classification problems and sigmoid output units.
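To make the formula concrete, here is a minimal pure-Python sketch of the same computation for a single element (the real function operates symbolically on Theano tensors):

```python
import math

def binary_crossentropy(p, t):
    """Element-wise binary cross-entropy: L = -t*log(p) - (1-t)*log(1-p)."""
    return -t * math.log(p) - (1 - t) * math.log(1 - p)

# A confident, correct prediction gives a small loss ...
low = binary_crossentropy(0.9, 1.0)   # -log(0.9), about 0.105
# ... while the same prediction against the opposite target is penalized heavily.
high = binary_crossentropy(0.9, 0.0)  # -log(0.1), about 2.303
```

Note how the two terms trade off: for a target of 1 only the `-t*log(p)` term is active, for a target of 0 only `-(1-t)*log(1-p)`.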
- lasagne.objectives.categorical_crossentropy(predictions, targets)
Computes the categorical cross-entropy between predictions and targets.
\[L_i = - \sum_j{t_{i,j} \log(p_{i,j})}\]
Parameters:
predictions : Theano 2D tensor
Predictions in (0, 1), such as softmax output of a neural network, with data points in rows and class probabilities in columns.
targets : Theano 2D tensor or 1D tensor
Either targets in [0, 1] matching the layout of predictions, or a vector of int giving the correct class index per data point.
Returns: Theano 1D tensor
An expression for the item-wise categorical cross-entropy.
Notes
This is the loss function of choice for multi-class classification problems and softmax output units. For hard targets, i.e., targets that assign all of the probability to a single class per data point, providing a vector of int for the targets is usually slightly more efficient than providing a matrix with a single 1.0 per row.
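The equivalence of the two target formats can be sketched in plain Python (the actual function builds a symbolic Theano expression and handles the int-vector case without materializing a one-hot matrix):

```python
import math

def categorical_crossentropy(predictions, targets):
    """Item-wise categorical cross-entropy: L_i = -sum_j t_ij * log(p_ij)."""
    return [-sum(t * math.log(p) for p, t in zip(p_row, t_row))
            for p_row, t_row in zip(predictions, targets)]

preds = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]

# Hard targets as a one-hot matrix ...
one_hot = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
loss_soft = categorical_crossentropy(preds, one_hot)

# ... reduce to picking out -log of the correct class probability,
# which is what the int-vector form computes directly.
class_ids = [0, 2]
loss_hard = [-math.log(row[c]) for row, c in zip(preds, class_ids)]
```

For hard targets, the sum over classes has a single nonzero term, which is why the int-vector form can skip the multiplication entirely.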
- lasagne.objectives.squared_error(a, b)
Computes the element-wise squared difference between two tensors.
\[L = (a - b)^2\]
Parameters:
a, b : Theano tensor
The tensors to compute the squared difference between.
Returns: Theano tensor
An expression for the element-wise squared difference.
Notes
This is the loss function of choice for many regression problems or auto-encoders with linear output units.
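A pure-Python sketch of the element-wise computation (the real function works on symbolic tensors of any shape):

```python
def squared_error(a, b):
    """Element-wise squared difference: L = (a - b)^2."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

loss = squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])  # [0.25, 0.0, 4.0]
```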
Aggregation functions
- lasagne.objectives.aggregate(loss, weights=None, mode='mean')
Aggregates an element- or item-wise loss to a scalar loss.
Parameters: loss : Theano tensor
The loss expression to aggregate.
weights : Theano tensor, optional
The weights for each element or item, must be broadcastable to the same shape as loss if given. If omitted, all elements will be weighted the same.
mode : {‘mean’, ‘sum’, ‘normalized_sum’}
Whether to aggregate by averaging, by summing, or by summing and dividing by the total weights (the latter requires weights to be given).
Returns: Theano scalar
A scalar loss expression suitable for differentiation.
Notes
By supplying binary weights (i.e., only using values 0 and 1), this function can also be used for masking out particular entries in the loss expression. Note that masked entries still need to be valid values; not-a-numbers (NaNs) will propagate through.
When applied to batch-wise loss expressions, setting mode to 'normalized_sum' ensures that the loss per batch is of a similar magnitude, independent of associated weights. However, it means that a given datapoint contributes more to the loss when it shares a batch with low-weighted or masked datapoints than with high-weighted ones.
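The three modes and the masking behaviour can be illustrated with a small pure-Python model of the aggregation (the real function returns a symbolic Theano scalar):

```python
def aggregate(loss, weights=None, mode='mean'):
    """Reduce a list of per-item losses to one number, mirroring the three modes."""
    if weights is not None:
        loss = [l * w for l, w in zip(loss, weights)]
    if mode == 'mean':
        return sum(loss) / len(loss)
    elif mode == 'sum':
        return sum(loss)
    elif mode == 'normalized_sum':
        if weights is None:
            raise ValueError("require weights for mode='normalized_sum'")
        return sum(loss) / sum(weights)
    raise ValueError("mode must be 'mean', 'sum' or 'normalized_sum'")

losses = [2.0, 4.0, 6.0, 8.0]
mask = [1.0, 1.0, 0.0, 0.0]  # binary weights mask out the last two items

mean_loss = aggregate(losses, mode='mean')                 # (2+4+6+8)/4 = 5.0
masked_loss = aggregate(losses, mask, mode='normalized_sum')  # (2+4)/(1+1) = 3.0
```

With `mode='mean'`, masked-out entries would still count toward the denominator `len(loss)`; `'normalized_sum'` divides by the total weight instead, which is why it is the natural choice when masking.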