Activation Function
Nerd Cafe
What is an Activation Function in Artificial Neural Networks (ANNs)?
An activation function is a mathematical function applied to the output of each neuron (or node) in a neural network. It determines whether the neuron should be activated, that is, whether its signal is passed on to the next layer.
Why Do We Use Activation Functions?
Activation functions serve three major purposes:
1. Introduce Non-Linearity
Without activation functions, an ANN would just be a linear model (like linear regression), regardless of how many layers it has.
Most real-world problems (e.g., image recognition, speech, finance) are non-linear, and linear models can't capture complex patterns.
Activation functions such as ReLU, Sigmoid, and Tanh introduce non-linearity, making the network capable of learning complex mappings; the sketch below illustrates the difference.
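A quick way to see why, sketched in NumPy with toy weights chosen arbitrarily for illustration: without an activation, two stacked linear layers collapse into a single linear map, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

# Toy weights and input, chosen arbitrarily for illustration
W1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])
W2 = np.array([[2.0, 1.0]])
x = np.array([1.0, 1.0])

# Without an activation, two linear layers collapse into one linear map (W2 @ W1)
print(W2 @ (W1 @ x), (W2 @ W1) @ x)   # both give [-0.5]

# With a ReLU in between, the result differs: the network is no longer linear
print(W2 @ np.maximum(0, W1 @ x))     # gives [1.5]
```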
2. Control the Output Range
Each activation function maps the neuron's output to a specific range:
| Activation | Output Range |
|---|---|
| Sigmoid | (0, 1) |
| Tanh | (−1, 1) |
| ReLU | [0, ∞) |
| Leaky ReLU | (−∞, ∞) |
| ELU | (−α, ∞) |
| Softmax | [0, 1] (probabilities summing to 1) |
This is especially useful in output layers:
Sigmoid for binary classification.
Softmax for multi-class classification.
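As an illustration, a minimal softmax sketch that turns raw class scores (logits) into probabilities; the logits below are made up:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize the exponentials
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # illustrative raw scores for 3 classes
probs = softmax(logits)
print(np.round(probs, 3))            # e.g. [0.659 0.242 0.099], summing to 1
```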
3. Gradient Propagation in Backpropagation
During training, ANNs use gradient descent to update weights. The derivatives of activation functions are essential for calculating how much each neuron contributes to the error.
For example:
The derivative of ReLU is either 0 or 1, which helps avoid vanishing gradients.
Sigmoid and Tanh can suffer from vanishing gradients when the input is very large or very small, because their derivatives approach zero in those saturated regions (see the sketch below).
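A small numerical sketch of this difference, comparing the sigmoid derivative σ(x)(1 − σ(x)) with the ReLU derivative at a few illustrative inputs:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)               # peaks at 0.25 (x = 0), shrinks toward 0 elsewhere

def relu_grad(x):
    return (x > 0).astype(float)     # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(np.round(sigmoid_grad(x), 4))  # nearly 0 at the extremes -> vanishing gradients
print(relu_grad(x))                  # stays 0 or 1
```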
Common Activation Functions

1. Step Function (Binary Threshold)

Formula:

f(x) = 1 if x ≥ 0, otherwise 0

Numerical Example:

For example, f(2) = 1 and f(−1) = 0.

Python Code:
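A minimal NumPy sketch of the step function, evaluated at a few illustrative inputs:

```python
import numpy as np

def step(x):
    # Binary threshold: output 1 where x >= 0, otherwise 0
    return np.where(x >= 0, 1, 0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(step(x))
```

Output: [0 0 1 1]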
2. Sigmoid Function

Formula:

σ(x) = 1 / (1 + e^(−x))

Numerical Example:

For example, σ(2) = 1 / (1 + e^(−2)) ≈ 0.8808.

Python Code:
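A minimal NumPy sketch of the sigmoid, using a few illustrative inputs:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(np.round(sigmoid(x), 4))
```

Output: approximately [0.1192, 0.5, 0.8808]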
3. Tanh (Hyperbolic Tangent)

Formula:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

Numerical Example:

For example, tanh(1) ≈ 0.7616 and tanh(−1) ≈ −0.7616.

Python Code:
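A minimal sketch using NumPy's built-in np.tanh, with illustrative inputs:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])
# Tanh squashes inputs into (-1, 1) and is zero-centered
print(np.round(np.tanh(x), 4))
```

Output: approximately [-0.7616, 0.0, 0.7616]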
4. ReLU (Rectified Linear Unit)

Formula:

f(x) = max(0, x)

Numerical Example:

For example, f(3) = 3 and f(−2) = 0.

Python Code:
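A minimal NumPy sketch of ReLU, with illustrative inputs:

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged and zeroes out negatives
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
```

Output: [0. 0. 3.]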
5. Leaky ReLU

Formula:

f(x) = x if x > 0, otherwise αx (with a small slope such as α = 0.01)

Numerical Example:

For example, with α = 0.01: f(3) = 3 and f(−2) = −0.02.

Python Code:
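A minimal NumPy sketch of Leaky ReLU, assuming the common slope α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope instead of becoming 0
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))
```

Output: approximately [-0.02, 0.0, 3.0]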
6. ELU (Exponential Linear Unit)

Formula:

f(x) = x if x > 0, otherwise α(e^x − 1)

Numerical Example:

For α = 1: f(2) = 2 and f(−1) = e^(−1) − 1 ≈ −0.6321.

Python Code:
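A minimal NumPy sketch of ELU, assuming α = 1 as in the example above:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for positive inputs; smooth exponential curve toward -alpha for negatives
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-1.0, 0.0, 2.0])
print(np.round(elu(x), 4))
```

Output: approximately [-0.6321, 0.0, 2.0]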
7. Softplus

Formula:

f(x) = ln(1 + e^x)

Numerical Example:

For example, f(0) = ln(2) ≈ 0.6931 and f(2) = ln(1 + e^2) ≈ 2.1269.

Python Code:
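A minimal NumPy sketch of Softplus, using log1p for better numerical behaviour:

```python
import numpy as np

def softplus(x):
    # Smooth approximation of ReLU: always positive, differentiable everywhere
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(np.round(softplus(x), 4))
```

Output: approximately [0.1269, 0.6931, 2.1269]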
8. Swish (by Google)

Formula:

f(x) = x · σ(x) = x / (1 + e^(−x))

Numerical Example:

For example, f(2) = 2 · σ(2) ≈ 2 · 0.8808 ≈ 1.7616.

Python Code:
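A minimal NumPy sketch of Swish, reusing the sigmoid from above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish(x):
    # Smooth and non-monotonic: slightly negative near zero, linear-like for large x
    return x * sigmoid(x)

x = np.array([-2.0, 0.0, 2.0])
print(np.round(swish(x), 4))
```

Output: approximately [-0.2384, 0.0, 1.7616]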
9. Mish

Formula:

f(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + e^x))

Python Code:
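A minimal NumPy sketch of Mish, with illustrative inputs:

```python
import numpy as np

def mish(x):
    # x scaled by tanh(softplus(x)); smooth and non-monotonic, similar to Swish
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.array([-2.0, 0.0, 2.0])
print(np.round(mish(x), 4))
```

Output: approximately [-0.2525, 0.0, 1.944]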
Summary Table:

| Function | Output Range | Differentiable | Non-linear | Typical Use |
|---|---|---|---|---|
| Step | {0, 1} | ❌ | ✔️ | Simple perceptrons |
| Sigmoid | (0, 1) | ✔️ | ✔️ | Binary classification |
| Tanh | (−1, 1) | ✔️ | ✔️ | Hidden layers |
| ReLU | [0, ∞) | ✔️ | ✔️ | Deep networks |
| Leaky ReLU | (−∞, ∞) | ✔️ | ✔️ | Solves ReLU's dying-neuron problem |
| ELU | (−α, ∞) | ✔️ | ✔️ | Better than ReLU in some cases |
| Softplus | (0, ∞) | ✔️ | ✔️ | Smooth approximation of ReLU |
| Swish | (−∞, ∞) | ✔️ | ✔️ | Newer deep learning models |
| Mish | (−∞, ∞) | ✔️ | ✔️ | Best performance in some networks |
Keywords

activation function, artificial neural network, ANN, non-linearity, ReLU, sigmoid, tanh, leaky ReLU, ELU, softmax, swish, mish, deep learning, neuron, backpropagation, gradient descent, output range, classification, neural network training, nonlinear activation, nerd cafe