Python Encyclopedia for Academics

Regression Analysis in Machine Learning

Nerd Cafe

Regression analysis is a statistical technique used in machine learning to predict continuous numeric values by modeling the relationship between independent (predictor) variables and a dependent (target) variable. Its primary objective is to estimate how a change in one or more predictors affects the outcome, and to fit a line or curve that best describes that relationship.

Key Terminologies

| Term | Description |
| --- | --- |
| Independent Variables | Input features (predictors) used to predict the output. |
| Dependent Variable | The output value we want to predict (target). |
| Regression Line | A best-fit line or curve that describes the relationship. |
| Overfitting | Model fits the training data too well but fails to generalize. |
| Underfitting | Model fails to capture the underlying trend (too simplistic). |
| Outliers | Extreme data points that deviate from the other observations. |
| Multicollinearity | Predictors are correlated with each other, which can distort the model. |
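To make overfitting and underfitting concrete, here is a minimal sketch that fits polynomials of increasing degree to a small noisy sample. The sine-shaped data, the noise level, and the degrees 1, 3, and 9 are all assumptions chosen for illustration, not part of the original text.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(3.0 * x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = np.sin(3.0 * x_test) + rng.normal(scale=0.2, size=x_test.size)


def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree={degree}: train MSE={train_mse[degree]:.4f}, "
          f"test MSE={test_mse[degree]:.4f}")
```

Training error can only fall as the degree rises, so it is the comparison with the test error that matters: a high error on both sets suggests underfitting, while a low training error paired with a noticeably higher test error suggests overfitting.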

How Regression Works

  • Use labeled data (with known inputs and outputs).

  • Train a model to learn the relationship between features and the target.

  • Use the model to predict continuous output for new input data.
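The three steps above can be sketched in plain Python for the simplest case, one feature and a straight line. The data values and the helper names `fit_line` and `predict` are made up for illustration; the fit uses the standard closed-form least-squares solution.

```python
# Step 1: labeled data, known inputs (x) with known outputs (y). Values are made up.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    """Apply the learned line to a new input."""
    return slope * x + intercept


# Step 2: train, i.e. learn the relationship between feature and target.
slope, intercept = fit_line(x_train, y_train)
print(slope, intercept)                # 2.0 1.0

# Step 3: predict a continuous output for an input the model has not seen.
print(predict(6.0, slope, intercept))  # 13.0
```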

Types of Regression Models in ML

| No. | Type | Description |
| --- | --- | --- |
| 1 | Linear Regression | Predicts the output using a straight line: Y = mX + b. |
| 2 | Logistic Regression | Used for classification rather than continuous prediction; outputs a probability (0 to 1). |
| 3 | Polynomial Regression | Fits the data with a polynomial curve (non-linear). |
| 4 | Lasso Regression | L1 regularization; shrinks the coefficients of less important features and can set some of them to exactly zero. |
| 5 | Ridge Regression | L2 regularization; shrinks coefficients toward zero and helps with multicollinearity. |
| 6 | Decision Tree Regression | Splits the data into tree nodes to predict values. |
| 7 | Random Forest Regression | Ensemble of decision trees; improves accuracy and reduces variance. |
| 8 | Support Vector Regression (SVR) | Uses support vectors for linear or non-linear regression. |
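As a rough illustration, most of the model types listed above are available in scikit-learn and share the same `fit` / `score` interface. The sketch below trains each one on a synthetic dataset from `make_regression`; the dataset, the hyperparameters (`alpha=1.0`, `degree=2`, `n_estimators=50`), and the scaling step in front of SVR are assumptions for illustration. Logistic regression is left out because it is a classifier.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic, nearly linear data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear": LinearRegression(),
    "Polynomial (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R² on held-out data
    print(f"{name:22s} R² = {scores[name]:.3f}")
```

Because this data is generated from a linear relationship, the linear models should be expected to score well here; on real data the ranking can look quite different.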

Regression Model Types

  • Simple Regression: One independent variable.

  • Multiple Regression: Multiple independent variables.
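The difference between the two shows up in the shape of the input: one column for simple regression, several for multiple regression. Here is a small sketch using NumPy's least-squares solver; the data is made up and generated exactly from y = 1 + 2·x1 + 3·x2, and `fit_least_squares` is a hypothetical helper name.

```python
import numpy as np

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]


def fit_least_squares(features, target):
    """Return [intercept, coef_1, ..., coef_k] from ordinary least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs


simple = fit_least_squares(X[:, [0]], y)  # one independent variable (x1 only)
multiple = fit_least_squares(X, y)        # two independent variables (x1, x2)

print("simple:  ", simple.shape, simple)
print("multiple:", multiple.shape, multiple)
```

The multiple regression recovers the generating coefficients because both predictors are available; the simple regression has to explain the same target with x1 alone, so its slope absorbs part of the effect of x2.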

How to Choose the Best Regression Model

  1. Compare performance metrics:

    • MAE, MSE, RMSE, R², MAPE

  2. Check model complexity & interpretability.

  3. Evaluate generalization on test data.
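One common way to apply these steps is k-fold cross-validation, which scores each candidate on data it was not trained on. The following is a minimal sketch with scikit-learn's `cross_val_score`; the two candidate models, the synthetic dataset, and the choice of RMSE with 5 folds are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic, nearly linear data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
}

rmse = {}
for name, model in candidates.items():
    # scikit-learn reports negated errors so that higher is always better.
    neg_rmse = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = float(-neg_rmse.mean())
    print(f"{name}: mean RMSE over 5 folds = {rmse[name]:.2f}")

best = min(rmse, key=rmse.get)
print("lowest cross-validated RMSE:", best)
```

The metric alone does not settle the choice: when two models score similarly, the simpler and more interpretable one is usually the safer pick.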

Evaluation Metrics

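| Metric | Description |
| --- | --- |
| MAE | Mean absolute error: the average of the absolute errors. |
| MSE | Mean squared error: the average of the squared errors. |
| RMSE | Root mean squared error: the square root of MSE, in the same units as the target. |
| R² Score | Proportion of variance explained (1 is best). |
| MAPE | Mean absolute percentage error: error as a percentage of the true values. |
| Median Absolute Error | Median of the absolute errors; robust to outliers. |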
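Each metric in the table can be computed by hand from a list of true values and predictions. The sketch below uses only the Python standard library; the four data points are made up for illustration.

```python
import math
import statistics

# Made-up true values and predictions.
y_true = [3.0, 5.0, 2.0, 8.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
sq_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

mae = sum(abs_errors) / len(abs_errors)   # average absolute error
mse = sum(sq_errors) / len(sq_errors)     # average squared error
rmse = math.sqrt(mse)                     # back in the units of the target

# R² = 1 - (residual sum of squares) / (total sum of squares)
mean_true = sum(y_true) / len(y_true)
ss_res = sum(sq_errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot

# MAPE is undefined when a true value is 0; none are zero here.
mape = 100.0 * sum(e / abs(t) for e, t in zip(abs_errors, y_true)) / len(y_true)
medae = statistics.median(abs_errors)     # median absolute error

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")
# MAE=1.000 MSE=1.500 RMSE=1.225
print(f"R2={r2:.3f} MAPE={mape:.2f}% MedAE={medae:.3f}")
# R2=0.714 MAPE=36.46% MedAE=1.000
```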

Building a Regressor in Python

You can build regression models using libraries like Scikit-learn, TensorFlow, and Statsmodels.

Example with Scikit-learn:

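from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

# Load dataset ("data.csv" is a placeholder path; point it at your own file
# containing the columns feature1, feature2, and target)
dataset = pd.read_csv("data.csv")
X = dataset[['feature1', 'feature2']]
y = dataset['target']

# Split data: hold out 20% for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model and fit it on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out test data
predictions = model.predict(X_test)

# Evaluate: mean squared error between true and predicted values
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)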
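The example above needs a DataFrame named `dataset` with `feature1`, `feature2`, and `target` columns. For readers without such a file at hand, here is a self-contained variant that builds a synthetic stand-in; the generating formula (target = 3·feature1 − 2·feature2 + 5 plus noise), the sample size, and the seeds are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: target = 3*feature1 - 2*feature2 + 5 + noise.
rng = np.random.default_rng(42)
dataset = pd.DataFrame({
    "feature1": rng.normal(size=200),
    "feature2": rng.normal(size=200),
})
dataset["target"] = (
    3.0 * dataset["feature1"]
    - 2.0 * dataset["feature2"]
    + 5.0
    + rng.normal(scale=0.5, size=200)
)

X = dataset[["feature1", "feature2"]]
y = dataset["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Report RMSE and R² alongside the learned parameters.
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
r2 = r2_score(y_test, predictions)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print(f"RMSE={rmse:.3f}  R²={r2:.3f}")
```

Because the generating formula is known, the fitted coefficients and intercept can be checked against 3, −2, and 5, which is a quick sanity test of the whole pipeline before switching to real data.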

Keywords

regression analysis, machine learning, linear regression, logistic regression, polynomial regression, lasso regression, ridge regression, support vector regression, decision tree regression, random forest regression, overfitting, underfitting, multicollinearity, prediction, evaluation metrics, mean squared error, root mean squared error, R-squared, model selection, feature selection

