Linear Regression

Linear Regression using Python

Objective

This project aims to predict an outcome (e.g., house prices) based on a single feature (e.g., house area). The work involves:

  • Exploring a real-world dataset
  • Preparing and splitting data for training and testing
  • Building a simple linear regression model using Scikit-learn’s LinearRegression
  • Evaluating the model using key metrics: MAE, MSE, RMSE, and the R² score
  • Visualizing predictions and regression lines
  • Publishing the project on GitHub/Portfolio

Project Execution

  1. Import relevant libraries

  2. Load the dataset

    • The dataset consists of area, bedrooms, age, and price columns
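The two steps above might look like the following minimal sketch. The post does not include the data file, so the CSV text here is a hypothetical stand-in: the column names come from the post, but every value is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file. Only the column names
# come from the post; all values are invented for illustration.
CSV_TEXT = """area,bedrooms,age,price
1000,2,30,200000
1500,3,20,290000
2000,,15,380000
2500,3,10,460000
3000,5,5,570000
"""

# With a real file on disk, pass its path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```

The empty field in the third data row is read as NaN, which mirrors the missing bedrooms value discussed in the next section.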

Exploratory Data Analysis

  1. Check the shape
    • The dataset consists of 6 rows (inclusive of header) and 4 columns
  2. Check for irregularities
    • There are missing values in the bedrooms column
    • bedrooms should be of integer data type
  3. Summary statistics
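These checks can be sketched with a few pandas calls. The data below is an invented stand-in with the same columns as the post's dataset, so the printed numbers are illustrative only.

```python
import io

import pandas as pd

# Hypothetical data: column names from the post, values invented.
CSV_TEXT = """area,bedrooms,age,price
1000,2,30,200000
1500,3,20,290000
2000,,15,380000
2500,3,10,460000
3000,5,5,570000
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Shape as (rows, columns). pandas does not count the header row,
# so 6 lines in the file become 5 data rows.
print(df.shape)

# Missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Data types. A column with a gap is read as float, which is why
# bedrooms does not show up as an integer column.
print(df.dtypes)

# Summary statistics (count, mean, std, min, quartiles, max).
summary = df.describe()
print(summary)
```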

Impute Missing Values

  1. Impute bedrooms
    • The missing value was filled with the median (the middle value) of the bedrooms column, since the value was missing at random (MAR)
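A minimal sketch of median imputation follows, again on invented stand-in data. Casting to integer afterwards is an assumption based on the earlier note that bedrooms should be an integer column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names from the post, values invented.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, np.nan, 3, 5],
    "age": [30, 20, 15, 10, 5],
    "price": [200000, 290000, 380000, 460000, 570000],
})

# median() skips NaN by default, so it is computed from the known values.
median_bedrooms = df["bedrooms"].median()

# Fill the gap, then cast to integer. astype(int) truncates, so a
# fractional median such as 3.5 would become 3.
df["bedrooms"] = df["bedrooms"].fillna(median_bedrooms).astype(int)
print(df)
```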

Visualize Data

The visualizations show how area, bedrooms, and age each relate to price.

  1. Scatter plot of area vs price

  2. Scatter plot of bedrooms vs price

  3. Scatter plot of age vs price

  4. Correlation heatmap, which shows the relationship between features

    • The age of the house has a negative relationship with the price: as the age of the house increases, the price decreases.
    • The area and the number of bedrooms show a positive relationship with the price: as the area and the number of bedrooms increase, the price of the house also increases.
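The four plots above could be produced roughly as follows. The post does not say which plotting library was used, so this sketch assumes plain matplotlib (with `imshow` for the heatmap) and runs on invented, already-imputed stand-in data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already-imputed data: column names from the post,
# values invented.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 3, 5],
    "age": [30, 20, 15, 10, 5],
    "price": [200000, 290000, 380000, 460000, 570000],
})

fig, axes = plt.subplots(1, 4, figsize=(18, 4))

# One scatter plot per feature against price.
for ax, feature in zip(axes[:3], ["area", "bedrooms", "age"]):
    ax.scatter(df[feature], df["price"])
    ax.set_xlabel(feature)
    ax.set_ylabel("price")
    ax.set_title(f"{feature} vs price")

# Pearson correlation matrix drawn as a heatmap.
corr = df.corr()
heat = axes[3].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[3].set_xticks(range(len(corr.columns)))
axes[3].set_xticklabels(corr.columns, rotation=45)
axes[3].set_yticks(range(len(corr.columns)))
axes[3].set_yticklabels(corr.columns)
axes[3].set_title("Correlation heatmap")
fig.colorbar(heat, ax=axes[3])
fig.tight_layout()

print(corr["price"])
```

In an interactive session, `plt.show()` displays the figure; `fig.savefig(...)` writes it to a file instead.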

Feature Engineering

  1. Define the target and feature variables, then split the data into train and test sets
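As a sketch of this step, the example below treats price as the target and area as the only feature, following the single-feature goal stated in the Objective. The feature choice, the 60/40 split, and the `random_state` are assumptions, and the data is an invented stand-in.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, already-imputed data: column names from the post,
# values invented.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 3, 5],
    "age": [30, 20, 15, 10, 5],
    "price": [200000, 290000, 380000, 460000, 570000],
})

# Feature matrix (2-D, hence the double brackets) and target vector.
X = df[["area"]]
y = df["price"]

# test_size=0.4 keeps 2 of the 5 rows for testing, so that R² can be
# computed later (it is undefined on a single test sample).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)
print(X_train.shape, X_test.shape)
```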

Model Building

  1. Train the model

  2. Print the coefficients

  3. Print the intercept
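The three steps above might look like this with Scikit-learn's `LinearRegression`. The single-feature setup, split parameters, and data values are assumptions carried over for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: column names from the post, values invented.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 290000, 380000, 460000, 570000],
})
X = df[["area"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Fit an ordinary least squares line: price ≈ coef * area + intercept.
model = LinearRegression()
model.fit(X_train, y_train)

# coef_ holds one slope per feature; intercept_ is the value at area = 0.
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
```

With more than one feature column in `X`, `model.coef_` would hold one coefficient per column in the same order.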

Evaluate the Model

  1. Evaluate the model using MSE, R², RMSE, and MAE

  2. Compare predicted vs actual values
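The evaluation can be sketched as below, using the metric functions from `sklearn.metrics`. RMSE is taken as the square root of MSE. The data, feature choice, and split are the same illustrative assumptions as before, so the printed scores say nothing about the post's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: column names from the post, values invented.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 290000, 380000, 460000, 570000],
})
X_train, X_test, y_train, y_test = train_test_split(
    df[["area"]], df["price"], test_size=0.4, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)   # mean of |actual - predicted|
mse = mean_squared_error(y_test, y_pred)    # mean of squared errors
rmse = np.sqrt(mse)                         # same units as price
r2 = r2_score(y_test, y_pred)               # 1.0 would be a perfect fit

print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}  R²: {r2:.3f}")

# Side-by-side view of actual and predicted prices.
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison)
```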

Conclusion

  • The model performed only fairly, as shown by the distance between the actual and predicted values. It could be improved in future through hyperparameter tuning.
  • The weak performance comes from a poor fit between the model and the data: the model is too simple for the data, which introduces bias and leads to underfitting. Collecting more data could help.
This post is licensed under CC BY 4.0 by the author.