Special-purpose layers

class lasagne.layers.NonlinearityLayer(incoming, nonlinearity=lasagne.nonlinearities.rectify, **kwargs)

A layer that just applies a nonlinearity.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape
nonlinearity : callable or None: The nonlinearity that is applied to the layer activations. If None is provided, the layer will be linear.

class lasagne.layers.BiasLayer(incoming, b=lasagne.init.Constant(0), shared_axes='auto', **kwargs)

A layer that just adds a (trainable) bias term.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape
b : Theano shared variable, expression, numpy array, callable or None: Initial value, expression or initializer for the biases. If set to None, the layer will have no biases and pass through its input unchanged. Otherwise, the bias shape must match the incoming shape, skipping those axes the biases are shared over (see the example below). See lasagne.utils.create_param() for more information.
shared_axes : ‘auto’, int or tuple of int: The axis or axes to share biases over. If 'auto' (the default), share over all axes except for the second: this will share biases over the minibatch dimension for dense layers, and additionally over all spatial dimensions for convolutional layers.

Notes

The bias parameter dimensionality is the input dimensionality minus the number of axes the biases are shared over, which matches the bias parameter conventions of DenseLayer or Conv2DLayer. For example:

>>> from lasagne.layers import BiasLayer
>>> layer = BiasLayer((20, 30, 40, 50), shared_axes=(0, 2))
>>> layer.b.get_value().shape
(30, 50)
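The shape rule described above can be sketched in plain Python. This is only an illustration of the rule, not Lasagne's implementation; the helper name `param_shape` is hypothetical.

```python
def param_shape(input_shape, shared_axes="auto"):
    """Shape of a per-axis parameter after dropping the shared axes."""
    if shared_axes == "auto":
        # 'auto': share over every axis except the second one (axis 1)
        shared_axes = (0,) + tuple(range(2, len(input_shape)))
    elif isinstance(shared_axes, int):
        shared_axes = (shared_axes,)
    # Keep only the sizes of the axes that are not shared
    return tuple(size for axis, size in enumerate(input_shape)
                 if axis not in shared_axes)


print(param_shape((20, 30, 40, 50), shared_axes=(0, 2)))  # (30, 50)
print(param_shape((20, 30, 40, 50)))                      # (30,)
print(param_shape((100, 200)))                            # (200,)
```

With the default 'auto' setting, a 4D convolutional input yields one value per channel and a 2D dense input yields one value per unit.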

class lasagne.layers.ScaleLayer(incoming, scales=lasagne.init.Constant(1), shared_axes='auto', **kwargs)

A layer that scales its inputs by learned coefficients.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape
scales : Theano shared variable, expression, numpy array, or callable: Initial value, expression or initializer for the scale. The scale shape must match the incoming shape, skipping those axes the scales are shared over (see the example below). See lasagne.utils.create_param() for more information.
shared_axes : ‘auto’, int or tuple of int: The axis or axes to share scales over. If 'auto' (the default), share over all axes except for the second: this will share scales over the minibatch dimension for dense layers, and additionally over all spatial dimensions for convolutional layers.

Notes

The scales parameter dimensionality is the input dimensionality minus the number of axes the scales are shared over, which matches the bias parameter conventions of DenseLayer or Conv2DLayer. For example:

>>> from lasagne.layers import ScaleLayer
>>> layer = ScaleLayer((20, 30, 40, 50), shared_axes=(0, 2))
>>> layer.scales.get_value().shape
(30, 50)
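To see how a (30, 50) parameter can act on a (20, 30, 40, 50) input, the following NumPy sketch broadcasts the scales over the shared axes. It is a stand-in for what the layer computes, assuming elementwise multiplication with broadcasting; it does not use Lasagne itself.

```python
import numpy as np

x = np.ones((20, 30, 40, 50), dtype="float32")
shared_axes = (0, 2)

# One coefficient per position along the non-shared axes (1 and 3)
scales = np.full((30, 50), 2.0, dtype="float32")

# Insert length-1 axes where the scales are shared so NumPy can broadcast
broadcast_shape = tuple(1 if axis in shared_axes else size
                        for axis, size in enumerate(x.shape))
y = x * scales.reshape(broadcast_shape)

print(broadcast_shape)  # (1, 30, 1, 50)
print(y.shape)          # (20, 30, 40, 50)
```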

lasagne.layers.standardize(layer, offset, scale, shared_axes='auto')

Convenience function for standardizing inputs by applying a fixed offset and scale. This is useful when you want the input to your network to have, say, zero mean and unit standard deviation over the feature dimensions. The function lets you include the statistics needed for this normalization as part of your network and applies them to its input. The statistics are supplied as the offset and scale parameters: offset is subtracted from the input and the result is divided by scale, with dimensions shared as specified by the shared_axes argument.

Parameters:

layer : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape.
offset : Theano shared variable or numpy array: The offset to apply (via subtraction) to the axis/axes being standardized.
scale : Theano shared variable or numpy array: The scale to apply (via division) to the axis/axes being standardized.
shared_axes : ‘auto’, int or tuple of int: The axis or axes to share the offset and scale over. If 'auto' (the default), share over all axes except for the second: this will share the offset and scale over the minibatch dimension for dense layers, and additionally over all spatial dimensions for convolutional layers.

Examples

Assuming your training data exists in a 2D numpy ndarray called training_data, you can use this function to scale input features to the [0, 1] range based on the training set statistics like so:

>>> import lasagne
>>> import numpy as np
>>> training_data = np.random.standard_normal((100, 20))
>>> input_shape = (None, training_data.shape[1])
>>> l_in = lasagne.layers.InputLayer(input_shape)
>>> offset = training_data.min(axis=0)
>>> scale = training_data.max(axis=0) - training_data.min(axis=0)
>>> l_std = lasagne.layers.standardize(l_in, offset, scale, shared_axes=0)

Alternatively, to z-score your inputs based on training set statistics, you could set offset = training_data.mean(axis=0) and scale = training_data.std(axis=0) instead.
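The arithmetic behind both choices can be checked with NumPy alone. The sketch below assumes the operation is exactly "subtract offset, divide by scale" as described above; `apply_standardize` is a hypothetical helper, not part of Lasagne.

```python
import numpy as np

rng = np.random.RandomState(0)
training_data = rng.standard_normal((100, 20))


def apply_standardize(x, offset, scale):
    # Subtract the offset, then divide by the scale. Both have shape (20,)
    # and broadcast over the minibatch axis, as with shared_axes=0.
    return (x - offset) / scale


# Min-max scaling: each feature ends up in the [0, 1] range
offset = training_data.min(axis=0)
scale = training_data.max(axis=0) - training_data.min(axis=0)
scaled = apply_standardize(training_data, offset, scale)

# z-scoring: each feature ends up with zero mean and unit standard deviation
z_scored = apply_standardize(training_data,
                             training_data.mean(axis=0),
                             training_data.std(axis=0))
```

Note that the statistics come from the training set only; the same fixed offset and scale are then applied to any later input.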

class lasagne.layers.ExpressionLayer(incoming, function, output_shape=None, **kwargs)

This layer provides boilerplate for a custom layer that applies a simple transformation to the input.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape.
function : callable: A function to be applied to the output of the previous layer.
output_shape : None, callable, tuple, or ‘auto’: Specifies the output shape of this layer. If a tuple, this fixes the output shape for any input shape (the tuple can contain None if some dimensions may vary). If a callable, it should return the calculated output shape given the input shape. If None, the output shape is assumed to be the same as the input shape. If ‘auto’, an attempt will be made to automatically infer the correct output shape.

Notes

An ExpressionLayer that does not change the shape of the data (i.e., is constructed with the default setting of output_shape=None) is functionally equivalent to a NonlinearityLayer.

Examples

>>> from lasagne.layers import InputLayer, ExpressionLayer
>>> l_in = InputLayer((32, 100, 20))
>>> l1 = ExpressionLayer(l_in, lambda X: X.mean(-1), output_shape='auto')
>>> l1.output_shape
(32, 100)
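When output_shape is given as a callable instead of 'auto', it has to agree with what function does to the data. One way to sanity-check such a pair before building the layer is to run both on a NumPy array, as in this sketch. The names `function` and `output_shape` simply mirror the parameter names, and NumPy stands in for the symbolic input.

```python
import numpy as np


def function(X):
    # Same transformation as in the example above: average over the last axis
    return X.mean(-1)


def output_shape(input_shape):
    # Shape rule matching `function`: the last axis is removed
    return input_shape[:-1]


input_shape = (32, 100, 20)
X = np.zeros(input_shape, dtype="float32")

# The declared shape should match the shape the function actually produces
assert function(X).shape == output_shape(input_shape)
print(output_shape(input_shape))  # (32, 100)
```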

class lasagne.layers.InverseLayer(incoming, layer, **kwargs)

The InverseLayer class performs inverse operations for a single layer of a neural network by applying the partial derivative of the layer to be inverted with respect to its input: transposed layer for a DenseLayer, deconvolutional layer for Conv2DLayer, Conv1DLayer; or an unpooling layer for MaxPool2DLayer.

It is especially useful for building (convolutional) autoencoders with tied parameters.

Note that if the layer to be inverted contains a nonlinearity and/or a bias, the InverseLayer will include the derivative of that in its computation.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape.
layer : a Layer instance or a tuple: The layer that this InverseLayer inverts.

Examples

>>> import lasagne
>>> from lasagne.layers import InputLayer, Conv2DLayer, DenseLayer
>>> from lasagne.layers import InverseLayer
>>> l_in = InputLayer((100, 3, 28, 28))
>>> l1 = Conv2DLayer(l_in, num_filters=16, filter_size=5)
>>> l2 = DenseLayer(l1, num_units=20)
>>> l_u2 = InverseLayer(l2, l2)  # backprop through l2
>>> l_u1 = InverseLayer(l_u2, l1)  # backprop through l1
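For the simplest case, a dense layer with no bias and no nonlinearity, "applying the partial derivative with respect to the input" reduces to multiplying by the transposed weight matrix. The NumPy sketch below illustrates that special case only; the function names are hypothetical and nothing here is Lasagne's own code.

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.standard_normal((5, 3))  # weights of a linear dense layer: 5 inputs, 3 units
x = rng.standard_normal((4, 5))  # minibatch of 4 inputs


def dense(x):
    # Forward pass of a dense layer without bias or nonlinearity
    return x.dot(W)


def inverse_dense(g):
    # Derivative of dense() with respect to its input, applied to g:
    # the "transposed layer", which reuses the same W (tied parameters)
    return g.dot(W.T)


h = dense(x)              # shape (4, 3)
x_rec = inverse_dense(h)  # shape (4, 5), back in the input space
```

Because the same W appears in both directions, an autoencoder built this way has no separate decoder weights to learn.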

class lasagne.layers.TransformerLayer(incoming, localization_network, downsample_factor=1, border_mode='nearest', **kwargs)

Spatial transformer layer

The layer applies an affine transformation on the input. The affine transformation is parameterized with six learned parameters [1]. The output is interpolated with a bilinear transformation.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape. The output of the incoming layer should be a 4D tensor, with shape (batch_size, num_input_channels, input_rows, input_columns).
localization_network : a Layer instance: The network that calculates the parameters of the affine transformation. See the example for how to initialize to the identity transform.
downsample_factor : float or iterable of float: A float or a 2-element tuple specifying the downsample factor for the output image (in both spatial dimensions). A value of 1 will keep the original size of the input. Values larger than 1 will downsample the input. Values below 1 will upsample the input.
border_mode : ‘nearest’, ‘mirror’, or ‘wrap’: Determines how border conditions are handled during interpolation. If ‘nearest’, points outside the grid are clipped to the boundary. If ‘mirror’, points are mirrored across the boundary. If ‘wrap’, points wrap around to the other side of the grid. See http://stackoverflow.com/q/22669252/22670830#22670830 for details.

References

[1]	Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu (2015): Spatial Transformer Networks. NIPS 2015, http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf

Examples

Here we set up the layer to initially perform the identity transform, similarly to [1]. Note that you will want to use a localization network with a linear output. If the output of the localization network is [t1, t2, t3, t4, t5, t6], then t1 and t5 determine zoom, t2 and t4 determine skewness, and t3 and t6 move the center position.

>>> import numpy as np
>>> import lasagne
>>> b = np.zeros((2, 3), dtype='float32')
>>> b[0, 0] = 1
>>> b[1, 1] = 1
>>> b = b.flatten()  # identity transform
>>> W = lasagne.init.Constant(0.0)
>>> l_in = lasagne.layers.InputLayer((None, 3, 28, 28))
>>> l_loc = lasagne.layers.DenseLayer(l_in, num_units=6, W=W, b=b,
... nonlinearity=None)
>>> l_trans = lasagne.layers.TransformerLayer(l_in, l_loc)
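The role of the six parameters can be made concrete with a small NumPy sketch that applies them to a sampling grid. It assumes the parameters are reshaped to a 2x3 matrix acting on homogeneous coordinates [x, y, 1] normalized to [-1, 1], as in [1]; `transform_grid` is a hypothetical helper and the interpolation step is left out.

```python
import numpy as np


def transform_grid(theta, height, width):
    """Apply the six affine parameters to a normalized sampling grid."""
    theta = np.asarray(theta, dtype="float64").reshape(2, 3)
    # Target coordinates normalized to [-1, 1] in both directions
    xs, ys = np.meshgrid(np.linspace(-1, 1, width),
                         np.linspace(-1, 1, height))
    # Homogeneous coordinates [x, y, 1], one column per output pixel
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    return theta.dot(grid)  # row 0: source x, row 1: source y


identity = [1, 0, 0, 0, 1, 0]     # [t1, t2, t3, t4, t5, t6]
shifted = [1, 0, 0.5, 0, 1, 0]    # t3 moves the sampling center along x
zoomed = [0.5, 0, 0, 0, 0.5, 0]   # t1 and t5 shrink the sampled region

src_identity = transform_grid(identity, 3, 3)
src_shifted = transform_grid(shifted, 3, 3)
src_zoomed = transform_grid(zoomed, 3, 3)
```

With the identity parameters every output pixel samples its own location, which is why the example above initializes W to zero and b to the flattened identity matrix.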

class lasagne.layers.TPSTransformerLayer(incoming, localization_network, downsample_factor=1, control_points=16, precompute_grid='auto', border_mode='nearest', **kwargs)

Spatial transformer layer

The layer applies a thin plate spline transformation [2] on the input as in [1]. The thin plate spline transform is determined based on the movement of some number of control points. The starting positions for these control points are fixed. The output is interpolated with a bilinear transformation.

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape. The output of the incoming layer should be a 4D tensor, with shape (batch_size, num_input_channels, input_rows, input_columns).
localization_network : a Layer instance: The network that calculates the parameters of the thin plate spline transformation as the x and y coordinates of the destination offsets of each control point. The output of the localization network should be a 2D tensor, with shape (batch_size, 2 * num_control_points)
downsample_factor : float or iterable of float: A float or a 2-element tuple specifying the downsample factor for the output image (in both spatial dimensions). A value of 1 will keep the original size of the input. Values larger than 1 will downsample the input. Values below 1 will upsample the input.
control_points : integer: The number of control points to be used for the thin plate spline transformation. These points will be arranged as a grid along the image, so the value must be a perfect square. Default is 16.
precompute_grid : ‘auto’ or boolean: Flag to precompute the U function [2] for the grid and source points. If ‘auto’, will be set to true as long as the input height and width are specified. If true, the U function is computed when the layer is constructed for a fixed input shape. If false, grid will be computed as part of the Theano computational graph, which is substantially slower as this computation scales with num_pixels*num_control_points. Default is ‘auto’.
border_mode : ‘nearest’, ‘mirror’, or ‘wrap’: Determines how border conditions are handled during interpolation. If ‘nearest’, points outside the grid are clipped to the boundary. If ‘mirror’, points are mirrored across the boundary. If ‘wrap’, points wrap around to the other side of the grid. See http://stackoverflow.com/q/22669252/22670830#22670830 for details.

References

[1]	Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu (2015): Spatial Transformer Networks. NIPS 2015, http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf

[2]	Fred L. Bookstein (1989): Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence. http://doi.org/10.1109/34.24792

Examples

Here, we’ll implement an identity transform using a thin plate spline transform. First we’ll create the destination control point offsets. To make everything invariant to the shape of the image, the x and y range of the image is normalized to [-1, 1] as in ref [1]. To replicate an identity transform, we’ll set the bias so that all offsets are 0. More complicated transformations can be implemented using different x and y offsets (importantly, each control point can have its own pair of offsets).

>>> import numpy as np
>>> import lasagne
>>>
>>> # Create the network
>>> # we'll initialize the weights and biases to zero, so it starts
>>> # as the identity transform (all control point offsets are zero)
>>> W = b = lasagne.init.Constant(0.0)
>>>
>>> # Set the number of points
>>> num_points = 16
>>>
>>> l_in = lasagne.layers.InputLayer((None, 3, 28, 28))
>>> l_loc = lasagne.layers.DenseLayer(l_in, num_units=2*num_points,
...                                   W=W, b=b, nonlinearity=None)
>>> l_trans = lasagne.layers.TPSTransformerLayer(l_in, l_loc,
...                                          control_points=num_points)
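The control-point layout described above can be sketched in NumPy. This assumes the points sit on an evenly spaced square grid over the normalized [-1, 1] range and that destinations are source positions plus offsets; `control_point_grid` is a hypothetical helper, and the ordering of coordinates in the real layer may differ.

```python
import numpy as np


def control_point_grid(num_control_points):
    """Arrange control points on a square grid over [-1, 1] x [-1, 1]."""
    grid_size = int(round(np.sqrt(num_control_points)))
    if grid_size ** 2 != num_control_points:
        raise ValueError("control_points must be a perfect square")
    xs, ys = np.meshgrid(np.linspace(-1, 1, grid_size),
                         np.linspace(-1, 1, grid_size))
    return np.stack([xs.ravel(), ys.ravel()], axis=1)  # (num_points, 2)


source = control_point_grid(16)
offsets = np.zeros_like(source)   # all-zero offsets: identity transform
destination = source + offsets

print(source.shape)     # (16, 2)
print(2 * len(source))  # 32 values expected from the localization network
```

This is why the localization network in the example has 2*num_points output units and why zero weights and biases give the identity transform.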

class lasagne.layers.ParametricRectifierLayer(incoming, alpha=init.Constant(0.25), shared_axes='auto', **kwargs)

A layer that applies parametric rectify nonlinearity to its input following [1].

Equation for the parametric rectifier linear unit: \(\varphi(x) = \max(x,0) + \alpha \min(x,0)\)
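As a quick numerical check of this equation, here is a NumPy sketch with a single scalar alpha set to the default initial value of 0.25. The function name `prelu_forward` is hypothetical; in the layer itself alpha is a learned parameter.

```python
import numpy as np


def prelu_forward(x, alpha):
    # max(x, 0) keeps positive inputs; alpha * min(x, 0) scales negative ones
    return np.maximum(x, 0) + alpha * np.minimum(x, 0)


x = np.array([-2.0, -0.5, 0.0, 1.5])
y = prelu_forward(x, alpha=0.25)
print(y)  # values: -0.5, -0.125, 0.0, 1.5
```

Setting alpha to 0 recovers the plain rectifier, and a fixed small alpha recovers the leaky rectifier.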

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape
alpha : Theano shared variable, expression, numpy array or callable: Initial value, expression or initializer for the alpha values. The shape must match the incoming shape, skipping those axes the alpha values are shared over (see the example below). See lasagne.utils.create_param() for more information.
shared_axes : ‘auto’, ‘all’, int or tuple of int: The axes along which the parameters of the rectifier units are going to be shared. If 'auto' (the default), share over all axes except for the second - this will share the parameter over the minibatch dimension for dense layers, and additionally over all spatial dimensions for convolutional layers. If 'all', share over all axes, which corresponds to a single scalar parameter.
**kwargs: Any additional keyword arguments are passed to the Layer superclass.

Notes

The alpha parameter dimensionality is the input dimensionality minus the number of axes it is shared over, which follows the same convention as BiasLayer. For example:

>>> from lasagne.layers import ParametricRectifierLayer
>>> layer = ParametricRectifierLayer((20, 3, 28, 28), shared_axes=(0, 3))
>>> layer.alpha.get_value().shape
(3, 28)

References

[1]	K He, X Zhang et al. (2015): Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, http://arxiv.org/abs/1502.01852

lasagne.layers.prelu(layer, **kwargs)

Convenience function to apply parametric rectify to a given layer’s output. Will set the layer’s nonlinearity to identity if there is one and will apply the parametric rectifier instead.

Parameters:

layer : a Layer instance: The Layer instance to apply the parametric rectifier layer to; note that it will be irreversibly modified as specified above
**kwargs: Any additional keyword arguments are passed to the ParametricRectifierLayer

Examples

Note that this function modifies an existing layer, like this:

>>> from lasagne.layers import InputLayer, DenseLayer, prelu
>>> layer = InputLayer((32, 100))
>>> layer = DenseLayer(layer, num_units=200)
>>> layer = prelu(layer)

In particular, prelu() cannot be passed as a nonlinearity.

class lasagne.layers.RandomizedRectifierLayer(incoming, lower=0.3, upper=0.8, shared_axes='auto', **kwargs)

A layer that applies a randomized leaky rectify nonlinearity to its input.

The randomized leaky rectifier was first proposed and used in the Kaggle NDSB Competition, and later evaluated in [1]. Compared to the standard leaky rectifier leaky_rectify(), it has a randomly sampled slope for negative input during training, and a fixed slope during evaluation.

Equation for the randomized rectifier linear unit during training: \(\varphi(x) = \max((\sim U(lower, upper)) \cdot x, x)\)

During evaluation, the factor is fixed to the arithmetic mean of lower and upper.
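The difference between the two modes can be sketched in NumPy. This is an illustration under simplifying assumptions: one slope is drawn per element (no shared axes), and `rrelu_forward` is a hypothetical function, not the layer's implementation.

```python
import numpy as np


def rrelu_forward(x, lower=0.3, upper=0.8, deterministic=False, rng=None):
    if deterministic:
        # Evaluation: fixed slope, the arithmetic mean of the two bounds
        slope = (lower + upper) / 2.0
    else:
        # Training: slopes drawn from U(lower, upper), here one per element
        if rng is None:
            rng = np.random.RandomState()
        slope = rng.uniform(lower, upper, size=x.shape)
    # For slopes in (0, 1) this scales negative inputs and keeps positive ones
    return np.maximum(slope * x, x)


x = np.array([-2.0, 0.0, 3.0])
eval_out = rrelu_forward(x, deterministic=True)  # values: -1.1, 0.0, 3.0
train_out = rrelu_forward(x, rng=np.random.RandomState(0))
```

In training mode the negative entry lands somewhere between -2 * upper and -2 * lower, while non-negative entries pass through unchanged in both modes.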

Parameters:

incoming : a Layer instance or a tuple: The layer feeding into this layer, or the expected input shape
lower : Theano shared variable, expression, or constant: The lower bound for the randomly chosen slopes.
upper : Theano shared variable, expression, or constant: The upper bound for the randomly chosen slopes.
shared_axes : ‘auto’, ‘all’, int or tuple of int: The axes along which the random slopes of the rectifier units are going to be shared. If 'auto' (the default), share over all axes except for the second - this will share the random slope over the minibatch dimension for dense layers, and additionally over all spatial dimensions for convolutional layers. If 'all', share over all axes, thus using a single random slope.
**kwargs: Any additional keyword arguments are passed to the Layer superclass.

References

[1]	Bing Xu, Naiyan Wang et al. (2015): Empirical Evaluation of Rectified Activations in Convolutional Network, http://arxiv.org/abs/1505.00853

get_output_for(input, deterministic=False, **kwargs)

Parameters:

input : tensor: The output from the previous layer
deterministic : bool: If true, the arithmetic mean of lower and upper is used for the leaky slope.

lasagne.layers.rrelu(layer, **kwargs)

Convenience function to apply randomized rectify to a given layer’s output. Will set the layer’s nonlinearity to identity if there is one and will apply the randomized rectifier instead.

Parameters:

layer : a Layer instance: The Layer instance to apply the randomized rectifier layer to; note that it will be irreversibly modified as specified above
**kwargs: Any additional keyword arguments are passed to the RandomizedRectifierLayer

Examples

Note that this function modifies an existing layer, like this:

>>> from lasagne.layers import InputLayer, DenseLayer, rrelu
>>> layer = InputLayer((32, 100))
>>> layer = DenseLayer(layer, num_units=200)
>>> layer = rrelu(layer)

In particular, rrelu() cannot be passed as a nonlinearity.