Activation Function

Nerd Cafe

What is an Activation Function in Artificial Neural Networks (ANNs)?

An activation function is a mathematical function applied to the output of each neuron (or node) in a neural network. It determines whether the neuron should be activated, that is, whether its signal is passed on to the next layer.

Why Do We Use Activation Functions?

Activation functions serve three major purposes:

1. Introduce Non-Linearity

  • Without activation functions, an ANN would just be a linear model (like linear regression), regardless of how many layers it has.

  • Most real-world problems (e.g., image recognition, speech, finance) are non-linear, and linear models can't capture complex patterns.

  • Activation functions such as ReLU, Sigmoid, and Tanh introduce non-linearity, making the network capable of learning complex mappings (see the sketch below).
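To illustrate this first point, the following minimal sketch (assuming NumPy) shows that two stacked layers with no activation in between collapse into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a small batch of inputs
W1 = rng.normal(size=(3, 5))       # weights of "layer 1"
W2 = rng.normal(size=(5, 2))       # weights of "layer 2"

# Two layers with no activation in between...
two_layers = x @ W1 @ W2

# ...are exactly equivalent to one linear layer with weights W1 @ W2.
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))   # True
```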

2. Control the Output Range

Each activation function maps the neuron's output to a specific range:

| Activation Function | Output Range |
| --- | --- |
| Sigmoid | (0, 1) |
| Tanh | (−1, 1) |
| ReLU | [0, ∞) |
| Leaky ReLU | (−∞, ∞) |
| ELU | (−α, ∞) |
| Softmax | [0, 1] (outputs sum to 1, so they can be read as probabilities) |

This is especially useful in:

  • Sigmoid for binary classification.

  • Softmax for multi-class classification (a minimal sketch follows below).
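Softmax appears in the range table but does not get its own section below, so here is a minimal sketch of it (assuming NumPy); the example scores are illustrative only:

```python
import numpy as np

def softmax(z):
    """Softmax: exponentiate each score, then normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw scores (logits) for three classes
print(softmax(scores))               # ~[0.659, 0.242, 0.099]
print(softmax(scores).sum())         # 1.0
```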

3. Gradient Propagation in Backpropagation

During training, ANNs use gradient descent to update weights. The derivatives of activation functions are essential for calculating how much each neuron contributes to the error.

For example:

  • The derivative of ReLU is either 0 or 1, which helps avoid vanishing gradients.

  • Sigmoid and Tanh can suffer from the vanishing gradient problem when the input is very large or very small (the sketch below compares the sigmoid gradient with ReLU's).
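A minimal sketch (assuming NumPy) of how these two gradients behave as the input grows:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    """Derivative of ReLU: 0 for x < 0, 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

for x in [1.0, 5.0, 10.0]:
    # The sigmoid gradient shrinks toward 0 as |x| grows,
    # while the ReLU gradient stays at 1 for any positive input.
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.5f}, ReLU'={relu_grad(x)}")
```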

Common Activation Functions

1. Step Function (Binary Threshold)

Formula:

$$f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Numerical Example:

$$f(2) = 1 \quad \text{and} \quad f(-3) = 0$$

Python Code:
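A minimal sketch of the step function, assuming NumPy, reproducing the example values above:

```python
import numpy as np

def step(x):
    """Binary step: 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1, 0)

print(step(2))    # 1
print(step(-3))   # 0
```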


2. Sigmoid Function

Formula:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0.5 \quad \text{and} \quad f(2) \approx 0.88$$

Python Code:
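A minimal NumPy-based sketch reproducing the example values above:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))   # 0.5
print(sigmoid(2))   # ~0.8808
```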


3. Tanh (Hyperbolic Tangent)

Formula:

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Numerical Example:

$$\tanh(0) = 0 \quad \text{and} \quad \tanh(1) \approx 0.76$$

Python Code:
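A minimal sketch that simply delegates to NumPy's built-in tanh:

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent, delegating to NumPy's implementation."""
    return np.tanh(x)

print(tanh(0))   # 0.0
print(tanh(1))   # ~0.7616
```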


4. ReLU (Rectified Linear Unit)

Formula:

$$f(x) = \max(0, x)$$

Numerical Example:

$$f(-3) = 0 \quad \text{and} \quad f(2) = 2$$

Python Code:
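A minimal sketch, assuming NumPy, reproducing the example values above:

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0, x)

print(relu(-3))   # 0
print(relu(2))    # 2
```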


5. Leaky ReLU

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} \quad \text{where } \alpha = 0.01$$

Numerical Example:

$$f(-3) = -0.03 \quad \text{and} \quad f(2) = 2$$

Python Code:
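A minimal NumPy sketch with the default slope α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(-3))   # -0.03
print(leaky_relu(2))    # 2
```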


6. ELU (Exponential Linear Unit)

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha (e^{x} - 1) & \text{if } x \leq 0 \end{cases}$$

Numerical Example:

For 𝛼=1:

$$f(-1) \approx -0.632 \quad \text{and} \quad f(1) = 1$$

Python Code:
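A minimal NumPy sketch with α = 1, matching the example above:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (e^x - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

print(elu(-1))   # ~-0.632
print(elu(1))    # 1
```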


7. Softplus

Formula:

$$f(x) = \ln(1 + e^{x})$$

Numerical Example:

$$f(0) \approx 0.693 \quad \text{and} \quad f(1) \approx 1.31$$

Python Code:
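A minimal NumPy sketch reproducing the example values above:

```python
import numpy as np

def softplus(x):
    """Softplus: ln(1 + e^x), a smooth approximation of ReLU."""
    return np.log(1 + np.exp(x))

print(softplus(0))   # ~0.693
print(softplus(1))   # ~1.313
```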


8. Swish (by Google)

Formula:

$$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0 \quad \text{and} \quad f(2) \approx 1.76$$

Python Code:
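A minimal sketch, assuming NumPy, that builds Swish from a sigmoid defined inline:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x * sigmoid(x)

print(swish(0))   # 0.0
print(swish(2))   # ~1.76
```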


9. Mish

Formula:

$$f(x) = x \cdot \tanh\big(\ln(1 + e^{x})\big) = x \cdot \tanh\big(\text{softplus}(x)\big)$$

Python Code:
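A minimal NumPy sketch; since this section has no numerical example, the sample inputs and commented values are illustrative additions:

```python
import numpy as np

def softplus(x):
    return np.log(1 + np.exp(x))

def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))

print(mish(0))   # 0.0
print(mish(2))   # ~1.94
```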


Summary Table:

| Function | Range | Differentiable | Non-linearity | Uses |
| --- | --- | --- | --- | --- |
| Step | {0, 1} | ❌ | ✔️ | Simple perceptrons |
| Sigmoid | (0, 1) | ✔️ | ✔️ | Binary classification |
| Tanh | (−1, 1) | ✔️ | ✔️ | Hidden layers |
| ReLU | [0, ∞) | ✔️ | ✔️ | Deep networks |
| Leaky ReLU | (−∞, ∞) | ✔️ | ✔️ | Solves ReLU dying neurons |
| ELU | (−α, ∞) | ✔️ | ✔️ | Better than ReLU in some cases |
| Softplus | (0, ∞) | ✔️ | ✔️ | Smooth ReLU |
| Swish | (−∞, ∞) | ✔️ | ✔️ | New-gen deep learning |
| Mish | (−∞, ∞) | ✔️ | ✔️ | Best performance in some nets |

Keywords

activation function, artificial neural network, ANN, non-linearity, ReLU, sigmoid, tanh, leaky ReLU, ELU, softmax, swish, mish, deep learning, neuron, backpropagation, gradient descent, output range, classification, neural network training, nonlinear activation, nerd cafe
