lasagne.nonlinearities
Non-linear activation functions for artificial neurons.
sigmoid(x) | Sigmoid activation function \(\varphi(x) = \frac{1}{1 + e^{-x}}\) |
softmax(x) | Softmax activation function \(\varphi(\mathbf{x})_j = \frac{e^{\mathbf{x}_j}}{\sum_{k=1}^K e^{\mathbf{x}_k}}\) where \(K\) is the total number of neurons in the layer. |
tanh(x) | Tanh activation function \(\varphi(x) = \tanh(x)\) |
ScaledTanH([scale_in, scale_out]) | Scaled tanh \(\varphi(x) = \tanh(\alpha \cdot x) \cdot \beta\) |
rectify(x) | Rectify activation function \(\varphi(x) = \max(0, x)\) |
LeakyRectify([leakiness]) | Leaky rectifier \(\varphi(x) = (x > 0)? x : \alpha \cdot x\) |
leaky_rectify(x) | Instance of LeakyRectify with leakiness \(\alpha=0.01\) |
very_leaky_rectify(x) | Instance of LeakyRectify with leakiness \(\alpha=1/3\) |
elu(x) | Exponential Linear Unit \(\varphi(x) = (x > 0) ? x : e^x - 1\) |
SELU([scale, scale_neg]) | Scaled Exponential Linear Unit |
selu(x) | Instance of SELU with \(\alpha\approx 1.6733, \lambda\approx 1.0507\) |
softplus(x) | Softplus activation function \(\varphi(x) = \log(1 + e^x)\) |
linear(x) | Linear activation function \(\varphi(x) = x\) |
identity(x) | Linear activation function \(\varphi(x) = x\) |
Detailed description
- lasagne.nonlinearities.sigmoid(x)
Sigmoid activation function \(\varphi(x) = \frac{1}{1 + e^{-x}}\)
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32 in [0, 1]
The output of the sigmoid function applied to the activation.
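Lasagne's own sigmoid builds a Theano expression; for checking values numerically, a plain NumPy stand-in can be handy. The sketch below assumes NumPy is available, and the name `sigmoid_np` is hypothetical (not part of Lasagne). It uses the identity \(\frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}\), which avoids overflow in \(e^{-x}\) for large negative inputs.

```python
import numpy as np

def sigmoid_np(x):
    """Hypothetical NumPy sketch of the logistic sigmoid 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=np.float64)
    # 1 / (1 + e^-x) == (1 + tanh(x / 2)) / 2, and tanh never overflows.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

x = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
y = sigmoid_np(x)
# Every output lies in [0, 1], and sigmoid(0) is exactly 0.5.
print(y)
```

The same identity shows that the sigmoid is just a shifted and rescaled tanh, which is why the two functions have the same shape.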
- lasagne.nonlinearities.softmax(x)
Softmax activation function \(\varphi(\mathbf{x})_j = \frac{e^{\mathbf{x}_j}}{\sum_{k=1}^K e^{\mathbf{x}_k}}\) where \(K\) is the total number of neurons in the layer. This activation function is applied row-wise, so each row of the input is normalized independently.
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32 where each row sums to 1 and each value is in [0, 1]
The output of the softmax function applied to the activation.
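To illustrate the row-wise behaviour, here is a minimal NumPy sketch (assuming NumPy; `softmax_rows` is a hypothetical name, not Lasagne's Theano-based implementation). It subtracts each row's maximum before exponentiating: the common factor \(e^{-\max}\) cancels between numerator and denominator, so the result is unchanged but `exp` cannot overflow.

```python
import numpy as np

def softmax_rows(x):
    """Hypothetical NumPy sketch of a row-wise softmax for a 2D array."""
    x = np.asarray(x, dtype=np.float64)
    # Shift each row by its maximum; the shift cancels in the ratio below.
    shifted = x - x.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    # Normalize each row separately so that it sums to 1.
    return e / e.sum(axis=1, keepdims=True)

activations = np.array([[1.0, 2.0, 3.0],
                        [1000.0, 1000.0, 1000.0]])
probs = softmax_rows(activations)
print(probs.sum(axis=1))  # each row sums to 1
```

Without the shift, the second row would produce `inf / inf` in float64.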
- lasagne.nonlinearities.tanh(x)
Tanh activation function \(\varphi(x) = \tanh(x)\)
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32 in [-1, 1]
The output of the tanh function applied to the activation.
- class lasagne.nonlinearities.ScaledTanH(scale_in=1, scale_out=1)
Scaled tanh \(\varphi(x) = \tanh(\alpha \cdot x) \cdot \beta\)
This is a modified tanh function that allows rescaling both the input and the output of the activation.
Scaling the input down decreases the maximum slope of the tanh, so the function stays in its approximately linear regime over a larger interval of the input space. Scaling the input up increases the maximum slope of the tanh and thus brings it closer to a step function.
Scaling the output changes the output interval to \([-\beta,\beta]\).
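The effect of the two scales can be checked numerically. The sketch below is a hypothetical NumPy stand-in (`ScaledTanhSketch` is not Lasagne's class, and NumPy is assumed). Differentiating gives \(\varphi'(x) = \alpha\beta\,(1 - \tanh^2(\alpha x))\), so the maximum slope is \(\alpha\beta\) at \(x = 0\), while the outputs approach \(\pm\beta\) for large \(|x|\).

```python
import numpy as np

class ScaledTanhSketch:
    """Hypothetical NumPy stand-in for phi(x) = tanh(alpha * x) * beta."""

    def __init__(self, scale_in=1.0, scale_out=1.0):
        self.scale_in = scale_in    # alpha
        self.scale_out = scale_out  # beta

    def __call__(self, x):
        return np.tanh(self.scale_in * np.asarray(x, dtype=np.float64)) * self.scale_out

    def max_slope(self):
        # phi'(x) = alpha * beta * (1 - tanh^2(alpha * x)) peaks at x = 0.
        return self.scale_in * self.scale_out

phi = ScaledTanhSketch(scale_in=0.5, scale_out=2.4)
x = np.linspace(-50.0, 50.0, 1001)
y = phi(x)
print(y.min(), y.max())  # approaches -beta and +beta for large |x|

# Central finite difference at 0 should match alpha * beta.
h = 1e-6
slope0 = (phi(h) - phi(-h)) / (2 * h)
print(slope0, phi.max_slope())
```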
Parameters: - scale_in : float32
The scale parameter \(\alpha\) for the input
- scale_out : float32
The scale parameter \(\beta\) for the output
Notes
LeCun et al. (in [1], Section 4.4) suggest scale_in=2./3 and scale_out=1.7159, which has \(\varphi(\pm 1) = \pm 1\), maximum second derivative at 1, and an effective gain close to 1.
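The properties quoted for LeCun's constants can be verified with the standard library alone. In the sketch below, `phi` is a hypothetical helper. Differentiating twice gives \(\varphi''(x) = -2\alpha^2\beta\,\tanh(\alpha x)\,\mathrm{sech}^2(\alpha x)\), whose magnitude peaks where \(\tanh^2(\alpha x) = 1/3\). For \(\alpha = 2/3\) that point lies near, though not exactly at, \(|x| = 1\).

```python
import math

# Constants suggested by LeCun et al.: alpha = scale_in, beta = scale_out.
alpha, beta = 2.0 / 3.0, 1.7159

def phi(x):
    return beta * math.tanh(alpha * x)

# phi(+-1) should come out very close to +-1.
print(phi(1.0), phi(-1.0))

# |phi''(x)| peaks where tanh^2(alpha * x) = 1/3,
# i.e. at |x| = atanh(1 / sqrt(3)) / alpha.
x_star = math.atanh(1.0 / math.sqrt(3.0)) / alpha
print(x_star)  # roughly 0.99, i.e. close to 1
```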
By carefully matching \(\alpha\) and \(\beta\), the nonlinearity can also be tuned to preserve the mean and variance of its input:
- scale_in=0.5, scale_out=2.4: If the input is a standard normal random variable, the output will have approximately zero mean and unit variance.
- scale_in=1, scale_out=1.6: Same property, but with a smaller linear regime in input space.
- scale_in=0.5, scale_out=2.27: If the input is a uniform random variable with zero mean and unit variance, the output will have approximately zero mean and unit variance.
- scale_in=1, scale_out=1.48: Same property, but with a smaller linear regime in input space.
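The four settings listed above can be checked with a quick Monte Carlo estimate. This is a sketch only, assuming NumPy. It also assumes that the "uniform" input means a zero-mean, unit-variance uniform variable, which is uniform on \([-\sqrt{3}, \sqrt{3}]\). Because the listed scales are rounded, the estimated variances land within roughly 0.01 of 1 rather than exactly on it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Zero-mean, unit-variance inputs (the uniform range is an assumption).
normal = rng.standard_normal(n)
uniform = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)

settings = [
    ("normal", normal, 0.5, 2.4),
    ("normal", normal, 1.0, 1.6),
    ("uniform", uniform, 0.5, 2.27),
    ("uniform", uniform, 1.0, 1.48),
]
results = {}
for name, x, scale_in, scale_out in settings:
    # Scaled tanh phi(x) = tanh(scale_in * x) * scale_out.
    y = np.tanh(scale_in * x) * scale_out
    results[(name, scale_in, scale_out)] = (y.mean(), y.var())
    print(f"{name:8s} scale_in={scale_in} scale_out={scale_out}: "
          f"mean={y.mean():+.3f} var={y.var():.3f}")
```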
References
[1] LeCun, Yann A., et al. (1998): Efficient BackProp, http://link.springer.com/chapter/10.1007/3-540-49430-8_2, http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
[2] Masci, Jonathan, et al. (2011): Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction, http://link.springer.com/chapter/10.1007/978-3-642-21735-7_7, http://people.idsia.ch/~ciresan/data/icann2011.pdf
Examples
In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:
>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import ScaledTanH
>>> scaled_tanh = ScaledTanH(scale_in=0.5, scale_out=2.27)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=scaled_tanh)
Methods
__call__(x) Apply the scaled tanh function to the activation x.
- lasagne.nonlinearities.ScaledTanh
alias of ScaledTanH
- lasagne.nonlinearities.rectify(x)
Rectify activation function \(\varphi(x) = \max(0, x)\)
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32
The output of the rectify function applied to the activation.
- class lasagne.nonlinearities.LeakyRectify(leakiness=0.01)
Leaky rectifier \(\varphi(x) = (x > 0)? x : \alpha \cdot x\)
The leaky rectifier was introduced in [1]. Compared to the standard rectifier rectify(), it has a nonzero gradient for negative input, which often helps convergence.
Parameters: - leakiness : float
Slope for negative input, usually between 0 and 1. A leakiness of 0 will lead to the standard rectifier, a leakiness of 1 will lead to a linear activation function, and any value in between will give a leaky rectifier.
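The role of the leakiness can be seen with a small NumPy stand-in. `LeakyRectifySketch` is a hypothetical class (not Lasagne's Theano-based `LeakyRectify`), and NumPy is assumed. With leakiness 0 it matches `max(0, x)`, with leakiness 1 it is the identity, and values in between keep a small slope for negative inputs.

```python
import numpy as np

class LeakyRectifySketch:
    """Hypothetical NumPy stand-in for phi(x) = x if x > 0 else leakiness * x."""

    def __init__(self, leakiness=0.01):
        self.leakiness = leakiness

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        # Keep positive entries; scale the rest by the leakiness.
        return np.where(x > 0, x, self.leakiness * x)

x = np.array([-3.0, -1.0, 0.0, 2.0])
relu_like = LeakyRectifySketch(0.0)(x)          # same as max(0, x)
identity_like = LeakyRectifySketch(1.0)(x)      # linear activation
very_leaky = LeakyRectifySketch(1.0 / 3.0)(x)   # leakiness 1/3
print(relu_like, identity_like, very_leaky)
```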
See also
- leaky_rectify
- Instance with default leakiness of 0.01, as in [1].
- very_leaky_rectify
- Instance with high leakiness of 1/3, as in [2].
References
[1] Maas et al. (2013): Rectifier Nonlinearities Improve Neural Network Acoustic Models, http://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf
[2] Graham, Benjamin (2014): Spatially-sparse convolutional neural networks, http://arxiv.org/abs/1409.6070
Examples
In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:
>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import LeakyRectify
>>> custom_rectify = LeakyRectify(0.1)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=custom_rectify)
Alternatively, you can use the provided instance for leakiness=0.01:
>>> from lasagne.nonlinearities import leaky_rectify
>>> l2 = DenseLayer(l_in, num_units=200, nonlinearity=leaky_rectify)
Or the one for a high leakiness of 1/3:
>>> from lasagne.nonlinearities import very_leaky_rectify
>>> l3 = DenseLayer(l_in, num_units=200, nonlinearity=very_leaky_rectify)
Methods
__call__(x) Apply the leaky rectify function to the activation x.
- lasagne.nonlinearities.leaky_rectify(x)
Instance of LeakyRectify with leakiness \(\alpha=0.01\)
- lasagne.nonlinearities.very_leaky_rectify(x)
Instance of LeakyRectify with leakiness \(\alpha=1/3\)
- lasagne.nonlinearities.elu(x)
Exponential Linear Unit \(\varphi(x) = (x > 0) ? x : e^x - 1\)
The Exponential Linear Unit (ELU) was introduced in [1]. Compared to the linear rectifier rectify(), it has a mean activation closer to zero and nonzero gradient for negative input, which can help convergence. Compared to the leaky rectifier LeakyRectify, it saturates for highly negative inputs.
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32
The output of the exponential linear unit for the activation.
Notes
In [1], an additional parameter \(\alpha\) controls the (negative) saturation value for negative inputs, but it is set to 1 in all experiments. It is omitted here, which corresponds to \(\alpha = 1\).
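For reference, the sketch below writes the ELU with the paper's \(\alpha\) made explicit and defaulting to 1. It assumes NumPy, and `elu_sketch` is a hypothetical function, not Lasagne's `elu`. `np.expm1` computes \(e^x - 1\) accurately near 0. Clamping with `np.minimum` keeps `exp` from overflowing on large positive entries, which `np.where` would otherwise still evaluate.

```python
import numpy as np

def elu_sketch(x, alpha=1.0):
    """Hypothetical NumPy ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.

    alpha=1.0 corresponds to the fixed form described for elu above.
    """
    x = np.asarray(x, dtype=np.float64)
    negative_branch = alpha * np.expm1(np.minimum(x, 0.0))
    return np.where(x > 0, x, negative_branch)

x = np.array([-20.0, -1.0, 0.0, 1.5])
y = elu_sketch(x)
print(y)  # saturates toward -alpha (here -1) for very negative x
```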
References
[1] Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), http://arxiv.org/abs/1511.07289
- class lasagne.nonlinearities.SELU(scale=1, scale_neg=1)
Scaled Exponential Linear Unit \(\varphi(x)=\lambda \left[(x>0) ? x : \alpha(e^x-1)\right]\)
The Scaled Exponential Linear Unit (SELU) was introduced in [1] as an activation function that allows the construction of self-normalizing neural networks.
Parameters: - scale : float32
The scale parameter \(\lambda\) for scaling all output.
- scale_neg : float32
The scale parameter \(\alpha\) for scaling the output for nonpositive inputs.
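The roles of the two parameters can be traced with a hypothetical NumPy stand-in (`SELUSketch` is not Lasagne's class, and NumPy is assumed): `scale` (\(\lambda\)) multiplies every output, while `scale_neg` (\(\alpha\)) only multiplies the \(e^x - 1\) branch. With both set to 1, it reduces to the ELU.

```python
import numpy as np

class SELUSketch:
    """Hypothetical NumPy stand-in for
    phi(x) = scale * (x if x > 0 else scale_neg * (exp(x) - 1))."""

    def __init__(self, scale=1.0, scale_neg=1.0):
        self.scale = scale          # lambda
        self.scale_neg = scale_neg  # alpha

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        # Clamp before expm1 so large positive inputs cannot overflow.
        neg = self.scale_neg * np.expm1(np.minimum(x, 0.0))
        return self.scale * np.where(x > 0, x, neg)

x = np.array([-1.0, 0.0, 1.0])
out = SELUSketch(2, 3)(x)  # same arguments as the SELU(2, 3) doctest
print(out)
```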
References
[1] Klambauer, Günter, et al. (2017): Self-Normalizing Neural Networks, https://arxiv.org/abs/1706.02515
Examples
In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:
>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import SELU
>>> selu = SELU(2, 3)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=selu)
Methods
__call__(x) Apply the SELU function to the activation x.
- lasagne.nonlinearities.selu(x)
Instance of SELU with \(\alpha\approx 1.6733, \lambda\approx 1.0507\)
This has a stable and attracting fixed point of \(\mu=0\), \(\sigma=1\) under the assumptions of the original paper on self-normalizing neural networks.
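The fixed-point property can be illustrated with a Monte Carlo sketch, assuming NumPy. The full-precision constants below are the commonly quoted SELU values that the text above rounds to 1.6733 and 1.0507; `selu_sketch` is a hypothetical helper, not Lasagne's `selu`. For a standard normal input (\(\mu=0\), \(\sigma=1\)), the output mean and variance should come out close to 0 and 1.

```python
import numpy as np

# Commonly quoted full-precision SELU constants.
ALPHA = 1.6732632423543772   # scale_neg
LAMBDA = 1.0507009873554805  # scale

def selu_sketch(x):
    x = np.asarray(x, dtype=np.float64)
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(np.minimum(x, 0.0)))

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # input with mean 0, variance 1
out = selu_sketch(z)
print(out.mean(), out.var())        # both close to the fixed point (0, 1)
```

This only checks a single application to Gaussian inputs; the attracting property across layers relies on the weight assumptions of the original paper.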
- lasagne.nonlinearities.softplus(x)
Softplus activation function \(\varphi(x) = \log(1 + e^x)\)
Parameters: - x : float32
The activation (the summed, weighted input of a neuron).
Returns: - float32
The output of the softplus function applied to the activation.
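A numerically safe way to evaluate softplus outside Theano is \(\log(1 + e^x) = \mathrm{logaddexp}(0, x)\). The sketch below assumes NumPy, and `softplus_sketch` is a hypothetical name. It also checks by finite differences that the derivative of softplus is the sigmoid \(\frac{1}{1 + e^{-x}}\).

```python
import numpy as np

def softplus_sketch(x):
    """Hypothetical NumPy softplus log(1 + e^x), computed as logaddexp(0, x)
    so that large positive x does not overflow in exp()."""
    return np.logaddexp(0.0, np.asarray(x, dtype=np.float64))

x = np.array([-30.0, 0.0, 1.0, 1000.0])
y = softplus_sketch(x)
print(y)  # log(2) at 0; approximately x for large x

# The derivative of softplus is the sigmoid; check by central difference at 1.
h = 1e-5
grad = (softplus_sketch(1.0 + h) - softplus_sketch(1.0 - h)) / (2 * h)
print(grad, 1.0 / (1.0 + np.exp(-1.0)))
```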