Simple Linear Regression Implementation
Nerd Cafe
The primary purpose of implementing a Simple Linear Regression model on the below dataset
is to model and quantify the linear relationship between years of experience (independent variable) and salary (dependent variable) in order to make predictions. This serves the following specific goals:
1. Understanding Relationships
You want to determine how salary changes with years of experience.
A linear regression line provides an interpretable model that tells you how much increase in salary is expected for each additional year of experience.
2. Predict Future Outcomes
Once trained, the model can be used to predict salary for unseen or future inputs (e.g., "What will be the salary of a person with 7.5 years of experience?").
3. Evaluate Model Accuracy
By comparing predicted values with actual values on the test set, you can evaluate how well the model generalizes to new data.
The
y_testvsy_predcomparison helps calculate accuracy metrics like:Mean Squared Error (MSE)
R² Score (coefficient of determination)
Mean Absolute Error (MAE)
4. Learn Model Parameters
After fitting the model, the values of
w₀(intercept) andw₁(slope) are learned from the training data.These parameters define the best-fit line:

which explains the trend in your dataset.
5. Validate Assumptions
By plotting residuals or checking histograms, you can verify the assumptions of linear regression (linearity, normality, independence, homoskedasticity).
6. Foundational Learning
This is a baseline model in machine learning. It introduces important concepts such as:
Supervised learning
Cost/loss functions
Optimization (e.g., gradient descent)
Model evaluation and visualization
Python code
The dataset contains two columns:
Years of Experience: Number of years a person has worked.
Salary: The corresponding salary.
Here's a preview of the first few rows:
The output is:
Let's build and visualize a simple linear regression model to predict salary based on years of experience.
1. Import Required Libraries
pandasis used to load and manage the dataset.matplotlib.pyplotis for plotting the graph.sklearn.linear_model.LinearRegressionis the core linear regression model.train_test_splitsplits the data into training and testing sets.r2_scoreandmean_squared_errorare evaluation metrics.
2. Load the CSV File
This reads the CSV file into a DataFrame called
df.
3. Separate Features and Labels python Copy Edit
Xcontains the input feature: Years of Experience (2D array).ycontains the target variable: Salary.
4. Split into Training and Testing Sets
80% of the data goes into training (
X_train,y_train).20% goes into testing (
X_test,y_test).random_state=42ensures reproducibility
5. Train the Linear Regression Model
We create a linear regression model.
.fit()trains the model using training data.
6. Make Predictions on Test Set
This uses the trained model to predict salaries for
X_test.
7. Print the Model Equation
slopeis the coefficient (how much salary increases per year).interceptis the base salary when experience is 0.You get the equation in the form: Salary = a × YearsExperience + b
Output
8. Evaluate Accuracy
R² Score: how well the model fits the data (1 = perfect).MSE: average of squared prediction errors (lower is better).
Output
9. Predict Salary for a Specific Experience
We predict the salary for 5 years of experience using our model.
Output
10. Plot the Data and the Regression Line
Blue points are actual data.
Red line is the predicted regression line.
Shows how well the line fits the data.

Last updated