Activation Function

Nerd Cafe

What is an Activation Function in Artificial Neural Networks (ANNs)?

An activation function is a mathematical function applied to the output of each neuron (or node) in a neural network. It determines whether the neuron should be activated, that is, whether its signal is passed on to the next layer.

Why Do We Use Activation Functions?

Activation functions serve three major purposes:

1. Introduce Non-Linearity

  • Without activation functions, an ANN would just be a linear model (like linear regression), regardless of how many layers it has.

  • Most real-world problems (e.g., image recognition, speech, finance) are non-linear, and linear models can't capture complex patterns.

  • Activation functions such as ReLU, Sigmoid, and Tanh introduce non-linearity, making the network capable of learning complex mappings (see the sketch below).
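To illustrate this first point, the following minimal sketch (assuming NumPy) shows that two stacked layers with no activation in between collapse into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a small batch of inputs
W1 = rng.normal(size=(3, 5))       # weights of "layer 1"
W2 = rng.normal(size=(5, 2))       # weights of "layer 2"

# Two layers with no activation in between...
two_layers = x @ W1 @ W2

# ...are exactly equivalent to one linear layer with weights W1 @ W2.
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))   # True
```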

2. Control the Output Range

Each activation function maps the neuron's output to a specific range:

| Activation Function | Output Range |
| --- | --- |
| Sigmoid | (0, 1) |
| Tanh | (−1, 1) |
| ReLU | [0, ∞) |
| Leaky ReLU | (−∞, ∞) |
| ELU | (−α, ∞) |
| Softmax | [0, 1] (outputs sum to 1, so they can be read as probabilities) |

This is especially useful in:

  • Sigmoid for binary classification.

  • Softmax for multi-class classification (a minimal sketch follows below).
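Softmax appears in the range table but does not get its own section below, so here is a minimal sketch of it (assuming NumPy); the example scores are illustrative only:

```python
import numpy as np

def softmax(z):
    """Softmax: exponentiate each score, then normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw scores (logits) for three classes
print(softmax(scores))               # ~[0.659, 0.242, 0.099]
print(softmax(scores).sum())         # 1.0
```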

3. Gradient Propagation in Backpropagation

During training, ANNs use gradient descent to update weights. The derivatives of activation functions are essential for calculating how much each neuron contributes to the error.

For example:

  • The derivative of ReLU is either 0 or 1, which helps avoid vanishing gradients.

  • Sigmoid and Tanh can suffer from the vanishing gradient problem when the input is very large or very small (the sketch below compares the sigmoid gradient with ReLU's).
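A minimal sketch (assuming NumPy) of how these two gradients behave as the input grows:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    """Derivative of ReLU: 0 for x < 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

for x in [1.0, 5.0, 10.0]:
    # The sigmoid gradient shrinks toward 0 as |x| grows,
    # while the ReLU gradient stays at 1 for any positive input.
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.5f}, ReLU'={relu_grad(x)}")
```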

Common Activation Functions

1. Step Function (Binary Threshold)

Formula:

$$f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Numerical Example:

$$f(2) = 1 \quad \text{and} \quad f(-3) = 0$$

Python Code:
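A minimal sketch of the step function, assuming NumPy, reproducing the example values above:

```python
import numpy as np

def step(x):
    """Binary step: 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

print(step(2))    # 1
print(step(-3))   # 0
```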


2. Sigmoid Function

Formula:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0.5 \quad \text{and} \quad f(2) \approx 0.88$$

Python Code:
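A minimal NumPy-based sketch reproducing the example values above:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))   # 0.5
print(sigmoid(2))   # ~0.8808
```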


3. Tanh (Hyperbolic Tangent)

Formula:

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Numerical Example:

$$\tanh(0) = 0 \quad \text{and} \quad \tanh(1) \approx 0.76$$

Python Code:
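A minimal sketch that simply delegates to NumPy's built-in tanh:

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent, delegating to NumPy's implementation."""
    return np.tanh(x)

print(tanh(0))   # 0.0
print(tanh(1))   # ~0.7616
```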


4. ReLU (Rectified Linear Unit)

Formula:

$$f(x) = \max(0, x)$$

Numerical Example:

$$f(-3) = 0 \quad \text{and} \quad f(2) = 2$$

Python Code:
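A minimal sketch, assuming NumPy, reproducing the example values above:

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0, x)

print(relu(-3))   # 0
print(relu(2))    # 2
```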


5. Leaky ReLU

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} \quad \text{where } \alpha = 0.01$$

Numerical Example:

$$f(-3) = -0.03 \quad \text{and} \quad f(2) = 2$$

Python Code:
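A minimal NumPy sketch with the default slope α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(-3))   # -0.03
print(leaky_relu(2))    # 2
```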


6. ELU (Exponential Linear Unit)

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha (e^{x} - 1) & \text{if } x \leq 0 \end{cases}$$

Numerical Example:

For 𝛼=1:

$$f(-1) \approx -0.632 \quad \text{and} \quad f(1) = 1$$

Python Code:
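A minimal NumPy sketch with α = 1, matching the example above:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (e^x - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

print(elu(-1))   # ~-0.632
print(elu(1))    # 1
```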


7. Softplus

Formula:

$$f(x) = \ln(1 + e^{x})$$

Numerical Example:

$$f(0) \approx 0.693 \quad \text{and} \quad f(1) \approx 1.31$$

Python Code:
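A minimal NumPy sketch reproducing the example values above:

```python
import numpy as np

def softplus(x):
    """Softplus: ln(1 + e^x), a smooth approximation of ReLU."""
    return np.log(1 + np.exp(x))

print(softplus(0))   # ~0.693
print(softplus(1))   # ~1.313
```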


8. Swish (by Google)

Formula:

$$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0 \quad \text{and} \quad f(2) \approx 1.76$$

Python Code:
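A minimal sketch, assuming NumPy, that builds Swish from a sigmoid defined inline:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x * sigmoid(x)

print(swish(0))   # 0.0
print(swish(2))   # ~1.76
```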


9. Mish

Formula:

$$f(x) = x \cdot \tanh\big(\ln(1 + e^{x})\big) = x \cdot \tanh\big(\text{softplus}(x)\big)$$

Python Code:
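A minimal NumPy sketch; since this section has no numerical example, the sample inputs and commented values are illustrative additions:

```python
import numpy as np

def softplus(x):
    return np.log(1 + np.exp(x))

def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))

print(mish(0))   # 0.0
print(mish(2))   # ~1.94
```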


Summary Table:

| Function | Range | Differentiable | Non-linearity | Uses |
| --- | --- | --- | --- | --- |
| Step | {0, 1} | ❌ | ✔️ | Simple perceptrons |
| Sigmoid | (0, 1) | ✔️ | ✔️ | Binary classification |
| Tanh | (−1, 1) | ✔️ | ✔️ | Hidden layers |
| ReLU | [0, ∞) | ✔️ | ✔️ | Deep networks |
| Leaky ReLU | (−∞, ∞) | ✔️ | ✔️ | Solves ReLU dying neurons |
| ELU | (−α, ∞) | ✔️ | ✔️ | Better than ReLU in some cases |
| Softplus | (0, ∞) | ✔️ | ✔️ | Smooth ReLU |
| Swish | (−∞, ∞) | ✔️ | ✔️ | New-gen deep learning |
| Mish | (−∞, ∞) | ✔️ | ✔️ | Best performance in some nets |

Keywords

activation function, artificial neural network, ANN, non-linearity, ReLU, sigmoid, tanh, leaky ReLU, ELU, softmax, swish, mish, deep learning, neuron, backpropagation, gradient descent, output range, classification, neural network training, nonlinear activation, nerd cafe
