What is an Activation Function in Artificial Neural Networks (ANNs)?
An activation function is a mathematical function applied to the output of each neuron (or node) in a neural network. It determines whether the neuron should be activated, i.e., whether its signal is passed on to the next layer.
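As a minimal sketch of this idea (the inputs, weights, and choice of ReLU below are illustrative assumptions, not from any particular library), a neuron computes a weighted sum of its inputs and then applies its activation function:

```python
import numpy as np

def relu(z):
    # ReLU activation: passes positive values through, zeroes out negatives
    return np.maximum(0.0, z)

# Illustrative values for a single neuron
x = np.array([0.5, -1.2, 3.0])   # inputs arriving from the previous layer
w = np.array([0.4, 0.3, -0.6])   # the neuron's weights
b = 0.1                          # the neuron's bias

z = np.dot(w, x) + b             # pre-activation (weighted sum)
a = relu(z)                      # the activation decides what is passed on
print(z, a)                      # z ≈ -1.86, a = 0.0 -> neuron stays "off"
```

Here the weighted sum is negative, so ReLU outputs 0 and the neuron passes nothing to the next layer.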
Why Do We Use Activation Functions?
Activation functions serve three major purposes:
1. Introduce Non-Linearity
Without activation functions, an ANN would just be a linear model (like linear regression), regardless of how many layers it has.
Most real-world problems (e.g., image recognition, speech, finance) are non-linear, and linear models can't capture complex patterns.
Activation functions like ReLU, Sigmoid, and Tanh introduce non-linear boundaries, making the network capable of learning complex mappings.
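A quick way to see why, as a sketch (the random weight matrices here are made up for the demonstration): two stacked linear layers with no activation in between collapse into a single linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # weights of the first "layer"
W2 = rng.normal(size=(2, 4))   # weights of the second "layer"
x = rng.normal(size=3)         # an arbitrary input vector

# Two linear layers without an activation between them...
deep = W2 @ (W1 @ x)
# ...are exactly equivalent to one linear layer with weights W2 @ W1
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))  # True: the extra layer added no power
```

Inserting a non-linearity such as ReLU between the two layers breaks this equivalence, which is what lets depth add expressive power.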
2. Control the Output Range
Each activation function maps the neuron's output to a specific range:
| Activation Function | Output Range |
| --- | --- |
| Sigmoid | (0, 1) |
| Tanh | (−1, 1) |
| ReLU | [0, ∞) |
| Leaky ReLU / ELU | (−∞, ∞) |
| Softmax | [0, 1] (probabilities that sum to 1) |
These ranges are especially useful at the output layer:
Sigmoid for binary classification (a single probability).
Softmax for multi-class classification (a probability distribution over the classes).
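A minimal NumPy sketch of these functions and their ranges (the Leaky ReLU slope alpha = 0.01 is one common choice, not a fixed standard):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # (0, 1): usable as a probability

def tanh(z):
    return np.tanh(z)                      # (-1, 1): zero-centered

def relu(z):
    return np.maximum(0.0, z)              # [0, inf)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # (-inf, inf): small negative slope

def softmax(z):
    e = np.exp(z - np.max(z))              # shift by max for numerical stability
    return e / e.sum()                     # [0, 1] entries that sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))      # [0.119 0.5   0.953]
print(tanh(z))         # [-0.964  0.     0.995]
print(relu(z))         # [0. 0. 3.]
print(leaky_relu(z))   # [-0.02  0.    3.  ]

# At the output layer: sigmoid turns one logit into P(class 1),
# softmax turns a vector of logits into a distribution over classes.
print(sigmoid(np.array([1.3])))             # ~0.79
print(softmax(np.array([2.0, 0.5, -1.0])))  # ~[0.79 0.18 0.04]
```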
3. Gradient Propagation in Backpropagation
During training, ANNs use gradient descent to update weights. The derivatives of activation functions are essential for calculating how much each neuron contributes to the error.
For example:
The derivative of ReLU is 0 for negative inputs and 1 for positive inputs, so gradients that do flow pass through at full strength, which helps avoid vanishing gradients.
Sigmoid and Tanh can suffer from the vanishing gradient problem: their derivatives approach 0 when the input is very large or very small, so very little gradient reaches the earlier layers.
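A small numerical sketch of this, using the standard closed-form derivatives evaluated at a few illustrative points:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25, shrinks for large |z|

def relu_grad(z):
    return (z > 0).astype(float)    # exactly 0 or 1

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # [~4.5e-05  0.25  ~4.5e-05] -> gradient nearly vanishes
print(relu_grad(z))     # [0. 0. 1.] -> full gradient wherever the input is positive
```

With derivatives this small at the extremes, stacking several sigmoid layers multiplies tiny numbers together, which is exactly the vanishing gradient problem described above.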