Activation Function

Nerd Cafe

What is an Activation Function in Artificial Neural Networks (ANNs)?

An activation function is a mathematical function applied to the output of each neuron (or node) in a neural network. It determines whether the neuron should be activated, that is, whether its signal should be passed on to the next layer.
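
A minimal numerical sketch of that idea (the input values, weights, and bias below are made up, and sigmoid is used only as one possible activation): the neuron first computes a weighted sum of its inputs plus a bias, and the activation function then squashes that sum before it is passed to the next layer.

import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs arriving at the neuron (illustrative values)
w = np.array([0.4, 0.1, -0.6])   # the neuron's weights (illustrative values)
b = 0.2                          # bias term

z = np.dot(w, x) + b             # pre-activation: weighted sum plus bias
a = 1 / (1 + np.exp(-z))         # activation: sigmoid maps z into (0, 1)
print(z, a)                      # z ≈ -1.52, a ≈ 0.18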

Why Do We Use Activation Functions?

Activation functions serve three major purposes:

1. Introduce Non-Linearity

  • Without activation functions, an ANN would just be a linear model (like linear regression), regardless of how many layers it has; the short sketch after this list demonstrates this numerically.

  • Most real-world problems (e.g., image recognition, speech, finance) are non-linear, and linear models can't capture complex patterns.

  • Activation functions such as ReLU, Sigmoid, and Tanh introduce non-linear decision boundaries that make the network capable of learning complex mappings.
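
To back up the first bullet numerically, here is a small NumPy sketch (random matrices, purely illustrative) showing that two stacked linear layers with no activation in between collapse into a single linear layer:

import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=4)          # an arbitrary input vector
W1 = rng.normal(size=(5, 4))     # weights of a first linear "layer"
W2 = rng.normal(size=(3, 5))     # weights of a second linear "layer"

two_layers = W2 @ (W1 @ x)       # two linear layers, no activation in between
one_layer  = (W2 @ W1) @ x       # one linear layer with merged weights

print(np.allclose(two_layers, one_layer))   # True: the extra layer adds no expressive power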

2. Control the Output Range

Each activation function maps the neuron's output to a specific range:

| Activation Function | Output Range |
| --- | --- |
| Sigmoid | (0, 1) |
| Tanh | (−1, 1) |
| ReLU | [0, ∞) |
| Leaky ReLU | (−∞, ∞) |
| ELU | (−α, ∞) |
| Softmax | [0, 1] (probabilities) |

This is especially useful for:

  • Sigmoid for binary classification.

  • Softmax for multi-class classification (a minimal sketch follows below).
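
Softmax is listed in the table above but not implemented in the sections that follow, so here is a minimal sketch (the helper name softmax and the example scores are illustrative) of how it turns a vector of raw scores into probabilities that sum to 1:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))    # subtract the max for numerical stability
    return e / e.sum()           # normalize so the outputs sum to 1

scores = np.array([2.0, 1.0, 0.1])   # example raw scores (logits) for 3 classes
probs = softmax(scores)
print(probs)                     # ≈ [0.659, 0.242, 0.099]
print(probs.sum())               # 1.0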

3. Gradient Propagation in Backpropagation

During training, ANNs use gradient descent to update weights. The derivatives of activation functions are essential for calculating how much each neuron contributes to the error.

For example:

  • The derivative of ReLU is either 0 or 1, which helps avoid vanishing gradients.

  • Sigmoid and Tanh can suffer from vanishing gradients when the input is very large or very small; the sketch below shows this numerically.
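
A short sketch of the two bullets above (the derivative formulas are written out by hand here, not taken from a library): the sigmoid derivative σ(x)(1 − σ(x)) shrinks toward zero as |x| grows, while the ReLU derivative stays at 1 for any positive input.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)                  # derivative of the sigmoid

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)    # derivative of ReLU: 0 at or below zero, 1 above

for v in [0.0, 2.0, 10.0]:
    print(v, sigmoid_grad(v), relu_grad(v))
# at x = 10 the sigmoid gradient is about 4.5e-05 (nearly vanished); ReLU's is still 1.0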

1. Step Function (Binary Threshold)

Formula:

$$f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Numerical Example:

$$f(2) = 1 \quad \text{and} \quad f(-3) = 0$$

Python Code:

import numpy as np
import matplotlib.pyplot as plt

def step_function(x):
    return np.where(x >= 0, 1, 0)

x = np.linspace(-10, 10, 1000)
y = step_function(x)

plt.plot(x, y)
plt.title("Step Function")
plt.grid(True)
plt.show()

Output

2. Sigmoid Function

Formula:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0.5 \quad \text{and} \quad f(2) \approx 0.88$$

Python Code:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

y = sigmoid(x)

plt.plot(x, y)
plt.title("Sigmoid Function")
plt.grid(True)
plt.show()

Output

3. Tanh (Hyperbolic Tangent)

Formula:

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Numerical Example:

$$\tanh(0) = 0 \quad \text{and} \quad \tanh(1) \approx 0.76$$

Python Code:

def tanh(x):
    return np.tanh(x)

y = tanh(x)

plt.plot(x, y)
plt.title("Tanh Function")
plt.grid(True)
plt.show()

Output

4. ReLU (Rectified Linear Unit)

Formula:

$$f(x) = \max(0, x)$$

Numerical Example:

$$f(-3) = 0 \quad \text{and} \quad f(2) = 2$$

Python Code:

def relu(x):
    return np.maximum(0, x)

y = relu(x)

plt.plot(x, y)
plt.title("ReLU Function")
plt.grid(True)
plt.show()

Output:

5. Leaky ReLU

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} \quad \text{where } \alpha = 0.01$$

Numerical Example:

$$f(-3) = -0.03 \quad \text{and} \quad f(2) = 2$$

Python Code:

def leaky_relu(x, alpha=0.01):
    # x for positive inputs, a small slope alpha*x otherwise
    return np.where(x > 0, x, alpha * x)

y = leaky_relu(x)

plt.plot(x, y)
plt.title("Leaky ReLU Function")
plt.grid(True)
plt.show()

Output:

6. ELU (Exponential Linear Unit)

Formula:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(e^{x} - 1) & \text{if } x \leq 0 \end{cases}$$

Numerical Example:

For 𝛼=1:

$$f(-1) \approx -0.632 \quad \text{and} \quad f(1) = 1$$

Python Code:

def elu(x, alpha=1.0):
    # x for positive inputs, alpha*(e^x - 1) for non-positive inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

y = elu(x)

plt.plot(x, y)
plt.title("ELU Function")
plt.grid(True)
plt.show()

Output:

7. Softplus

Formula:

$$f(x) = \ln(1 + e^{x})$$

Numerical Example:

$$f(0) \approx 0.693 \quad \text{and} \quad f(1) \approx 1.31$$

Python Code:

def softplus(x):
    return np.log(1 + np.exp(x))

y = softplus(x)

plt.plot(x, y)
plt.title("Softplus Function")
plt.grid(True)
plt.show()

Output:

8. Swish (by Google)

Formula:

$$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$

Numerical Example:

$$f(0) = 0 \quad \text{and} \quad f(2) \approx 1.76$$

Python Code:

def swish(x):
    return x * sigmoid(x)

y = swish(x)

plt.plot(x, y)
plt.title("Swish Function")
plt.grid(True)
plt.show()

Output:

9. Mish

Formula:

$$f(x) = x \cdot \tanh(\ln(1 + e^{x})) = x \cdot \tanh(\operatorname{softplus}(x))$$
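
Numerical Example:

$$f(0) = 0 \quad \text{and} \quad f(1) \approx 0.87$$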

Python Code:

def mish(x):
    # x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * np.tanh(np.log(1 + np.exp(x)))

y = mish(x)

plt.plot(x, y)
plt.title("Mish Function")
plt.grid(True)
plt.show()

Output:

Summary Table:

| Function | Range | Differentiable | Non-linearity | Uses |
| --- | --- | --- | --- | --- |
| Step | {0, 1} | ❌ | ✔️ | Simple perceptrons |
| Sigmoid | (0, 1) | ✔️ | ✔️ | Binary classification |
| Tanh | (−1, 1) | ✔️ | ✔️ | Hidden layers |
| ReLU | [0, ∞) | ✔️ | ✔️ | Deep networks |
| Leaky ReLU | (−∞, ∞) | ✔️ | ✔️ | Solves ReLU's dying-neuron problem |
| ELU | (−α, ∞) | ✔️ | ✔️ | Better than ReLU in some cases |
| Softplus | (0, ∞) | ✔️ | ✔️ | Smooth approximation of ReLU |
| Swish | (−∞, ∞) | ✔️ | ✔️ | Newer deep-learning models |
| Mish | (−∞, ∞) | ✔️ | ✔️ | Best performance in some networks |

Keywords

activation function, artificial neural network, ANN, non-linearity, ReLU, sigmoid, tanh, leaky ReLU, ELU, softmax, swish, mish, deep learning, neuron, backpropagation, gradient descent, output range, classification, neural network training, nonlinear activation, nerd cafe

