Linear Regression

Posted May 31, 2025

By Nancy Chemutai

1 min read

Linear Regression

Linear Regression using Python

This project aims at predicting an outcome (e.g., house prices) based on a single feature (e.g., house area). It can be achieved through:

Checking the shape
- The dataset consists of 6 rows (inclusive of header) and 4 columns
Check for irregularities
- Presence of missing values in bedrooms column
- bedrooms should be of integer data type
Summary statistics

Impute bedrooms
- The missing value was filled by median i.e., the middle value of the bedrooms, since the value was missing at random (MAR)

The visualizations show the relationship between area, bedrooms, and age versus price

Scatter Plot of area vs price
Scatter Plot of bedrooms vs price
Scatter Plot of age vs price
Correlation Heatmap This shows the relationship between features
- There age of the house has a negative relationship with the price. As the age of the house increases, the price decreases.
- The area and the number of bedrooms show a positive relationship with the price. As the area and the number of bedrooms increases, the price of the house also increases.

Define target and feature variables and split the data into train and test

The model performed fairly due to the distance between the actual and predicted values. The model can be improved in future by performing hyperparameter tuning.
The model is performing poorly due to poor fit between the model and the data which oversimplifies the model causing biasness, thus underfitting. This can be improved by increasing the data.

This post is licensed under CC BY 4.0 by the author.