Implementing Simple Linear Regression without any Python Machine Learning Libraries

Simple Linear Regression implementation in Python

Today we are going to implement the most popular and most straightforward regression technique, simple linear regression, purely in Python. By purely in Python, I mean without using any machine learning libraries.

When I say simple linear regression, what comes to your mind? Let me guess 😛

  • It’s so simple to implement simple linear regression.
  • Simple linear regression is easier to understand than linear regression.
  • In terms of time complexity, simple linear regression takes less time to process.

I guess those were your thoughts when I said simple linear regression. These assumptions may be technically reasonable, but there is a particular reason for the name. First, let’s understand why it is called simple linear regression. Then we can start my favorite part: coding simple linear regression in Python.

What is simple linear regression?

In the linear regression analysis article, we mainly concentrated on explaining the linear regression concepts. We used the equation below while describing the general linear regression equations.

$latex \hat{y} = {w}_{0} + {w}_{1} * {x}$

The above equation closely resembles the straight line equation.

$latex \textrm{ y = m*x + c }$

Where m is the slope of the straight line and c is a constant, the point where the line crosses the y-axis. If we compare the two equations, we can see how close they are. They differ only in notation; everything else is the same.

In linear regression, the m ($latex {w}_{1}$) value is known as the coefficient and the c ($latex {w}_{0}$) value is called the intercept. In the above equation, we have only one dependent variable and one independent variable. That’s the reason we have only one coefficient.

  • Dependent variable –> y or $latex \hat{y}$
  • Independent variable –> x or  $latex {x}$

If we have k independent variables, we will get k coefficient values. If we have more than one independent variable to predict the dependent value, it is a linear regression problem (often named multiple linear regression). When we have only one independent variable to predict the dependent value, it is a simple linear regression problem.
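
To make the one-coefficient versus k-coefficient distinction concrete, here is a minimal sketch. The function names `predict_simple` and `predict_multiple` and all the numbers are assumptions made up for illustration; they are not part of the article's code.

```python
def predict_simple(x, w0, w1):
    """One independent variable: y_hat = w0 + w1 * x."""
    return w0 + w1 * x


def predict_multiple(xs, w0, ws):
    """k independent variables: y_hat = w0 + w1*x1 + ... + wk*xk."""
    return w0 + sum(w * x for w, x in zip(ws, xs))


# Hypothetical coefficients and inputs, chosen only for illustration
simple_prediction = predict_simple(3, w0=10.0, w1=2.0)             # 10 + 2*3
multiple_prediction = predict_multiple([3, 5], 10.0, [2.0, 0.5])   # 10 + 2*3 + 0.5*5
print(simple_prediction, multiple_prediction)
```

With one independent variable there is a single coefficient next to the intercept; with two there are two, and so on.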

Let me give a few more examples to show the difference between linear regression and simple linear regression problems.

Simple linear regression examples

  • Using the number of rooms to predict the house price.
    • The number of rooms is the independent variable and the price is the dependent variable.
  • Using the number of hours a student studied to predict the marks percentage the student will get.
    • The number of hours is the independent variable and the marks percentage is the dependent variable.
  • Predicting the temperature outside your room given the time.
    • Time is the independent variable and temperature is the dependent variable.

Linear regression examples

  • Using features like the number of rooms, the age of the house in years, and the garden space to predict the house price.
    • The number of rooms, the age, and the garden area are the independent variables, and the house price is the dependent variable.
  • Using the number of hours a student spent on English, Mathematics, and Physics to predict the marks percentage the student will get.
    • The hours the student spent on English, Mathematics, and Physics are the independent variables, and the student's marks percentage is the dependent variable.
  • Predicting the temperature outside your room given the time and climate details.
    • Time and the climate details are the independent variables, and the temperature is the dependent variable.

With the above explanation, I hope I addressed the difference between simple linear regression and linear regression.

In short:

Simple Linear Regression: Having one independent variable to predict the dependent variable.

Linear Regression: Having more than one independent variable to predict the dependent variable.

Now let’s build simple linear regression in Python without using any machine learning libraries.

To implement simple linear regression, we need to know the formulas below.

  • A formula for calculating the mean value.
  • A formula for calculating the variance value.
  • A formula for calculating the covariance between two series of readings (say, X and Y).
  • Formulas for calculating the $latex {w}_{0}$ and $latex {w}_{1}$ values.

Formula for calculating mean value

$latex \textrm{mean(x)} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

Formula for calculating the variance value

$latex \sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - mean(x))^2} {n-1}$

Formula for calculating covariance between two series of readings

$latex cov_{x,y}=\frac{\sum_{i=1}^{n}(x_{i}-mean(x))(y_{i}-mean(y))}{n-1}$

Formula for calculating the $latex {w}_{0}$ and $latex {w}_{1}$ values

$latex {w}_1 = \frac{covariance(x,y)} {variance(x)}$

$latex {w}_0 = mean(y) - (w_1 * mean(x))$
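
Before turning the formulas into functions, it may help to walk through them once on a tiny dataset. The sketch below applies each formula step by step; the readings `x` and `y` are hypothetical values invented for this walk-through.

```python
# Hypothetical readings, invented for this walk-through
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / float(n)   # (1 + 2 + 3 + 4 + 5) / 5 = 3.0
mean_y = sum(y) / float(n)   # (2 + 4 + 5 + 4 + 5) / 5 = 4.0

# Sample variance of x: squared deviations summed, divided by n - 1
variance_x = sum((xi - mean_x) ** 2 for xi in x) / float(n - 1)    # 10 / 4 = 2.5

# Sample covariance of x and y: deviation products summed, divided by n - 1
covariance_xy = sum((xi - mean_x) * (yi - mean_y)
                    for xi, yi in zip(x, y)) / float(n - 1)        # 6 / 4 = 1.5

w1 = covariance_xy / variance_x   # 1.5 / 2.5, about 0.6
w0 = mean_y - w1 * mean_x         # 4.0 - 0.6 * 3.0, about 2.2

print(round(w0, 4), round(w1, 4))
```

Because both the variance and the covariance divide by n - 1, that factor cancels in the ratio, so $latex {w}_{1}$ depends only on the two sums.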

We are going to use all the formulas listed above to implement simple linear regression purely in Python, without any machine learning libraries.

To implement simple linear regression in Python, we first implement all the above formulas as functions. Then we use those functions to build the simple linear regression model.

After that, we are going to use pandas, the Python tabular data analysis package, to implement the same simple linear regression model with a few lines of code. We can treat it as a check on the previous implementation.

Let’s start building the required functions in order.

  • Mean Function.
  • Variance Function.
  • Covariance Function.
  • Functions to calculate the $latex {w}_{0}$ and $latex {w}_{1}$ values.

Function to Calculate the mean value

def cal_mean(readings):
    """
    Function to calculate the mean value of the input readings
    :param readings:
    :return:
    """
    readings_total = sum(readings)
    number_of_readings = len(readings)
    mean = readings_total / float(number_of_readings)
    return mean

  • The cal_mean function calculates the mean value of the given readings.
  • We calculate the sum of the readings and store it in readings_total.
  • We find number_of_readings using the len function.
  • We divide readings_total by number_of_readings to get the mean.
  • Finally, we return the calculated mean value.
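
As a quick sanity check, the hand-written mean can be compared with the standard library's `statistics.mean`. This is only a sketch; the `square_feet` readings are hypothetical, and `cal_mean` is repeated here so the snippet runs on its own.

```python
from statistics import mean


def cal_mean(readings):
    """Same logic as the cal_mean function above: total divided by count."""
    return sum(readings) / float(len(readings))


# Hypothetical readings, made up for this example
square_feet = [150, 200, 250, 300, 350]

our_mean = cal_mean(square_feet)    # 1250 / 5 = 250.0
library_mean = mean(square_feet)    # standard library result for comparison
print(our_mean, library_mean)
```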

Function to Calculate the Variance Value

from math import pow

def cal_variance(readings):
    """
    Calculating the variance of the readings
    :param readings:
    :return:
    """

    # To calculate the variance we need the mean value
    # Calculating the mean value from the cal_mean function
    readings_mean = cal_mean(readings)
    # mean difference squared readings
    mean_difference_squared_readings = [pow((reading - readings_mean), 2) for reading in readings]
    variance = sum(mean_difference_squared_readings)
    return variance / float(len(readings) - 1)

  • The cal_variance function calculates the variance of the given readings.
  • Using the already implemented cal_mean function, we calculate the mean value.
  • Then we calculate the difference between each reading and the mean value, square that difference, and store the squared values in mean_difference_squared_readings.
  • Finally, we sum the mean_difference_squared_readings and return that sum divided by the number of readings minus 1.
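
Dividing by the number of readings minus 1 gives the sample variance, not the population variance. The sketch below, on hypothetical readings, compares the hand-written function with `statistics.variance` (n - 1 denominator) and `statistics.pvariance` (n denominator) from the standard library.

```python
from math import pow
from statistics import pvariance, variance


def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_variance(readings):
    """Sample variance, mirroring the function above."""
    readings_mean = cal_mean(readings)
    squared = [pow(reading - readings_mean, 2) for reading in readings]
    return sum(squared) / float(len(readings) - 1)


# Hypothetical readings, made up for this example
square_feet = [150, 200, 250, 300, 350]

our_variance = cal_variance(square_feet)       # 25000 / 4 = 6250.0
sample_variance = variance(square_feet)        # n - 1 denominator, same value
population_variance = pvariance(square_feet)   # n denominator: 25000 / 5
print(our_variance, sample_variance, population_variance)
```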

Function to Calculate the Covariance Value

def cal_covariance(readings_1, readings_2):
    """
    Calculate the covariance between two different list of readings
    :param readings_1:
    :param readings_2:
    :return:
    """
    readings_1_mean = cal_mean(readings_1)
    readings_2_mean = cal_mean(readings_2)
    readings_size = len(readings_1)
    covariance = 0.0
    for i in range(readings_size):
        covariance += (readings_1[i] - readings_1_mean) * (readings_2[i] - readings_2_mean)
    return covariance / float(readings_size - 1)

  • The cal_covariance function calculates the covariance between two series of readings, here readings_1 and readings_2.
  • We use the already implemented cal_mean function to calculate the means of readings_1 and readings_2.
  • Then, for each pair of readings, we multiply the difference from the readings_1 mean by the difference from the readings_2 mean and sum these products.
  • Finally, we return that sum divided by the number of readings minus 1 (readings_size - 1).
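
One possible variation, shown here only as a sketch, pairs the readings with `zip` instead of indexing and rejects series of different lengths. The data is hypothetical. The example also shows that the covariance of a series with itself equals its sample variance.

```python
def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_covariance(readings_1, readings_2):
    """Sample covariance; zip pairs the readings instead of indexing."""
    if len(readings_1) != len(readings_2):
        raise ValueError("both series need the same number of readings")
    mean_1 = cal_mean(readings_1)
    mean_2 = cal_mean(readings_2)
    total = sum((a - mean_1) * (b - mean_2)
                for a, b in zip(readings_1, readings_2))
    return total / float(len(readings_1) - 1)


# Hypothetical readings, made up for this example
square_feet = [150, 200, 250, 300, 350]
price = [320, 390, 510, 590, 690]

cov_xy = cal_covariance(square_feet, price)         # 47000 / 4 = 11750.0
cov_xx = cal_covariance(square_feet, square_feet)   # 25000 / 4 = 6250.0, the sample variance
print(cov_xy, cov_xx)
```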

With the above functions, we are ready to calculate the simple linear regression coefficients, the $latex {w}_{0}$ and $latex {w}_{1}$ values. Once we have calculated them, we can use those values to perform the prediction.

Functions to calculate the $latex {w}_{0}$ and $latex {w}_{1}$ values

def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    """
    Calculating the simple linear regression coefficients (w0, w1)
    :param x_readings:
    :param y_readings:
    :return:
    """
    # Coefficient W1 = covariance of x_readings and y_readings divided by variance of x_readings
    # Directly calling the implemented covariance and the variance functions
    # To calculate the coefficient W1
    w1 = cal_covariance(x_readings, y_readings) / float(cal_variance(x_readings))

    # Coefficient W0 = mean of y_readings - ( W1 * the mean of the x_readings )
    w0 = cal_mean(y_readings) - (w1 * cal_mean(x_readings))
    return w0, w1

  • From the above formulas for calculating $latex {w}_{0}$ and $latex {w}_{1}$, we create the cal_simple_linear_regression_coefficients function.
  • To calculate the $latex {w}_{1}$ value, we divide the covariance of x_readings and y_readings by the variance of x_readings.
  • Using $latex {w}_{1}$, we calculate the $latex {w}_{0}$ value.
  • Finally, we return the $latex {w}_{0}$ and $latex {w}_{1}$ values.
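
To see the coefficient function in action, here is a compact, self-contained sketch that fits a line to a few hypothetical readings and predicts the price for a new square_feet value. The helper functions are condensed versions of the ones above; the data and the value 275 are made up for illustration.

```python
def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_variance(readings):
    mean = cal_mean(readings)
    return sum((r - mean) ** 2 for r in readings) / float(len(readings) - 1)


def cal_covariance(readings_1, readings_2):
    mean_1, mean_2 = cal_mean(readings_1), cal_mean(readings_2)
    total = sum((a - mean_1) * (b - mean_2)
                for a, b in zip(readings_1, readings_2))
    return total / float(len(readings_1) - 1)


def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    w1 = cal_covariance(x_readings, y_readings) / cal_variance(x_readings)
    w0 = cal_mean(y_readings) - w1 * cal_mean(x_readings)
    return w0, w1


# Hypothetical readings, made up for this example
square_feet = [150, 200, 250, 300, 350]
price = [320, 390, 510, 590, 690]

w0, w1 = cal_simple_linear_regression_coefficients(square_feet, price)
predicted_price = w0 + w1 * 275   # prediction for a house of 275 square feet
print(round(w0, 2), round(w1, 2), round(predicted_price, 2))
```

For these readings the slope works out to about 1.88 and the intercept to about 30, so the predicted price for 275 square feet is about 547.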

Now let’s use all the functions implemented above to predict the house price using the simple linear regression technique.

Predicting House Price With Simple Linear Regression In Python

We are using the same house price dataset as in the linear regression implementation in Python article.

Let’s first load the dataset and see what features it contains. To load the dataset, we are going to use pandas.

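import pandas as pd


def get_headers(dataframe):
    """
    Get the header names of the dataframe
    :param dataframe: pandas DataFrame
    :return: array of column names
    """
    return dataframe.columns.values


def simple_linear_regression(dataset):
    """
    Implementing the simple linear regression without using any python machine learning library
    :param dataset: pandas DataFrame holding the readings
    :return:
    """

    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers :: ", dataset_headers)

input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
simple_linear_regression(house_price_dataset)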

  • We provide the input_path where the dataset is located.
  • Using the input_path, we load the data into a pandas data frame.
  • Next, we call the simple_linear_regression function with the loaded data frame.
  • Inside the simple_linear_regression function, for now we only get the header names and print them.

If pandas is set up on our system, we can expect the output below.

Script Output

Dataset Headers ::  ['square_feet' 'price']
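
For readers who want to inspect the headers without pandas, the same step can be sketched with the standard library `csv` module. The in-memory sample below is a hypothetical stand-in for '../Inputs/input_data.csv'; the real file's values are not shown in this article, so the numbers are invented.

```python
import csv
import io

# Hypothetical sample standing in for '../Inputs/input_data.csv'
sample_csv = io.StringIO(
    "square_feet,price\n"
    "150,320\n"
    "200,390\n"
    "250,510\n"
)

reader = csv.DictReader(sample_csv)
headers = reader.fieldnames   # reads the first line as the header row
rows = [{name: float(value) for name, value in row.items()} for row in reader]

print("Dataset Headers :: ", headers)
print(rows[0])
```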

From the script output, we know that we have one independent variable (square_feet) and one dependent variable (price). Our intention is to use the square_feet and price readings to calculate the simple linear regression coefficients. Then we are going to use the calculated coefficients to predict the house price.

Now let’s write a simple function to visualize how the price of the house varies with the square_feet. We are going to use a plotly scatter plot for the visualization.

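# Packages for creating the graphs
# Note: in plotly 4 and later, the online plotting module moved to the
# separate chart_studio package (import chart_studio.plotly as py)
import plotly.plotly as py
import plotly.graph_objs as go
# Replace the placeholders with your own plotly credentials
py.sign_in('YOUR_USER_NAME', 'YOUR_API_KEY')


def scatter_graph(x, y, graph_title, x_axis_title, y_axis_title):
    """
    Scatter graph of the y readings against the x readings
    :param x: readings for the x axis
    :param y: readings for the y axis
    :param graph_title: title shown above the graph
    :param x_axis_title: label for the x axis
    :param y_axis_title: label for the y axis
    :return: URL of the generated plot
    """
    trace = go.Scatter(
        x=x,
        y=y,
        mode='markers'
    )
    layout = go.Layout(
        title=graph_title,
        xaxis=dict(title=x_axis_title), yaxis=dict(title=y_axis_title)
    )
    data = [trace]
    fig = go.Figure(data=data, layout=layout)
    plot_url = py.plot(fig)
    return plot_url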

Now let’s call the scatter_graph function with the square_feet readings as the x parameter and the price readings as the y parameter.

input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
scatter_graph(house_price_dataset['square_feet'], house_price_dataset['price'],
              'Square_feet Vs Price', 'Square Feet', 'Price')

Now let’s use the house price dataset to build the simple linear regression model.

def simple_linear_regression(dataset):
    """
    Implementing the simple linear regression without using any python library
    :param dataset:
    :return:
    """

    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers :: ", dataset_headers)

    # Calculating the mean of the square feet and the price readings
    square_feet_mean = cal_mean(dataset[dataset_headers[0]])
    price_mean = cal_mean(dataset[dataset_headers[1]])

    square_feet_variance = cal_variance(dataset[dataset_headers[0]])
    price_variance = cal_variance(dataset[dataset_headers[1]])

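    # Calculating the regression
    # DataFrame.cov() returns the sample covariance (N - 1 denominator),
    # the same value cal_covariance would give for these two columns
    covariance_of_price_and_square_feet = dataset.cov()[dataset_headers[0]][dataset_headers[1]]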
    w1 = covariance_of_price_and_square_feet / float(square_feet_variance)

    w0 = price_mean - (w1 * square_feet_mean)

    # Predictions
    dataset['Predicted_Price'] = w0 + w1 * dataset[dataset_headers[0]]

input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
simple_linear_regression(house_price_dataset)

  • In the simple_linear_regression function, we use the already implemented cal_mean function to calculate the means of square_feet and price.
  • Next, we use the already implemented cal_variance function to calculate the variances of square_feet and price.
  • After that, we calculate the $latex {w}_{0}$ and $latex {w}_{1}$ values.
  • We use the $latex {w}_{0}$ and $latex {w}_{1}$ values to perform the prediction, which is nothing but predicting the house price for a given square_feet value.
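
Two properties follow from the formulas and make a handy sanity check on any fit: the fitted line passes through the point (mean(x), mean(y)), and the residuals (actual minus predicted) sum to zero. The sketch below checks both on hypothetical readings; `fit_line` is an illustrative helper, not a function from the article.

```python
def fit_line(x_readings, y_readings):
    """Return (w0, w1) using w1 = covariance / variance."""
    n = len(x_readings)
    x_mean = sum(x_readings) / float(n)
    y_mean = sum(y_readings) / float(n)
    covariance = sum((x - x_mean) * (y - y_mean)
                     for x, y in zip(x_readings, y_readings)) / float(n - 1)
    variance = sum((x - x_mean) ** 2 for x in x_readings) / float(n - 1)
    w1 = covariance / variance
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Hypothetical readings, made up for this example
square_feet = [150, 200, 250, 300, 350]
price = [320, 390, 510, 590, 690]

w0, w1 = fit_line(square_feet, price)

# Prediction at the mean square_feet (250) should be the mean price (500)
mean_point_prediction = w0 + w1 * 250.0

# Residuals should cancel out, up to floating point rounding
residual_sum = sum(p - (w0 + w1 * s) for s, p in zip(square_feet, price))
print(round(mean_point_prediction, 6), round(residual_sum, 6))
```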

Check out the complete code below.

#!/usr/bin/env python
# python_pickle.py
# Author : Saimadhu
# Date: 14-Feb-2017
# About: Simple linear regression without any machine learning libraries

# Required Python Packages
import pandas as pd
from math import pow


def get_headers(dataframe):
    """
    Get the headers name of the dataframe
    :param dataframe:
    :return:
    """
    return dataframe.columns.values


def cal_mean(readings):
    """
    Function to calculate the mean value of the input readings
    :param readings:
    :return:
    """
    readings_total = sum(readings)
    number_of_readings = len(readings)
    mean = readings_total / float(number_of_readings)
    return mean


def cal_variance(readings):
    """
    Calculating the variance of the readings
    :param readings:
    :return:
    """

    # To calculate the variance we need the mean value
    # Calculating the mean value from the cal_mean function
    readings_mean = cal_mean(readings)
    # mean difference squared readings
    mean_difference_squared_readings = [pow((reading - readings_mean), 2) for reading in readings]
    variance = sum(mean_difference_squared_readings)
    return variance / float(len(readings) - 1)


def cal_covariance(readings_1, readings_2):
    """
    Calculate the covariance between two different list of readings
    :param readings_1:
    :param readings_2:
    :return:
    """
    readings_1_mean = cal_mean(readings_1)
    readings_2_mean = cal_mean(readings_2)
    readings_size = len(readings_1)
    covariance = 0.0
    for i in range(readings_size):
        covariance += (readings_1[i] - readings_1_mean) * (readings_2[i] - readings_2_mean)
    return covariance / float(readings_size - 1)


def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    """
    Calculating the simple linear regression coefficients (B0, B1)
    :param x_readings:
    :param y_readings:
    :return:
    """
    # Coefficient B1 = covariance of x_readings and y_readings divided by variance of x_readings
    # Directly calling the implemented covariance and the variance functions
    # To calculate the coefficient B1
    b1 = cal_covariance(x_readings, y_readings) / float(cal_variance(x_readings))

    # Coefficient B0 = mean of y_readings - ( B1 * the mean of the x_readings )
    b0 = cal_mean(y_readings) - (b1 * cal_mean(x_readings))
    return b0, b1


def predict_target_value(x, b0, b1):
    """
    Calculating the target (y) value using the input x and the coefficients b0, b1
    :param x:
    :param b0:
    :param b1:
    :return:
    """
    return b0 + b1 * x


def cal_rmse(actual_readings, predicted_readings):
    """
    Calculating the root mean square error
    :param actual_readings:
    :param predicted_readings:
    :return:
    """
    square_error_total = 0.0
    total_readings = len(actual_readings)
    for i in range(total_readings):
        error = predicted_readings[i] - actual_readings[i]
        square_error_total += pow(error, 2)
    # Root of the mean squared error
    rmse = pow(square_error_total / float(total_readings), 0.5)
    return rmse


def simple_linear_regression(dataset):
    """
    Implementing simple linear regression without using any python library
    :param dataset:
    :return:
    """

    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers :: ", dataset_headers)

    # Calculating the mean of the square feet and the price readings
    square_feet_mean = cal_mean(dataset[dataset_headers[0]])
    price_mean = cal_mean(dataset[dataset_headers[1]])

    square_feet_variance = cal_variance(dataset[dataset_headers[0]])
    price_variance = cal_variance(dataset[dataset_headers[1]])

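    # Calculating the regression
    # DataFrame.cov() returns the sample covariance (N - 1 denominator),
    # the same value cal_covariance would give for these two columns
    covariance_of_price_and_square_feet = dataset.cov()[dataset_headers[0]][dataset_headers[1]]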
    w1 = covariance_of_price_and_square_feet / float(square_feet_variance)

    w0 = price_mean - (w1 * square_feet_mean)

    # Predictions
    dataset['Predicted_Price'] = w0 + w1 * dataset[dataset_headers[0]]


if __name__ == "__main__":

    input_path = '../Inputs/input_data.csv'
    house_price_dataset = pd.read_csv(input_path)
    simple_linear_regression(house_price_dataset)
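
The listing defines predict_target_value and a root mean square error helper but never calls them. Here is a minimal sketch of how predictions could be scored. The readings and the coefficients `b0` and `b1` are hypothetical, and `rmse` is a local helper written for this example that takes the square root of the mean squared error.

```python
from math import sqrt


def predict_target_value(x, b0, b1):
    """Target (y) value for input x and coefficients b0, b1."""
    return b0 + b1 * x


def rmse(actual_readings, predicted_readings):
    """Root mean square error: sqrt of the mean of the squared errors."""
    squared_errors = [(p - a) ** 2
                      for a, p in zip(actual_readings, predicted_readings)]
    return sqrt(sum(squared_errors) / float(len(actual_readings)))


# Hypothetical readings and coefficients, made up for this example
square_feet = [150, 200, 250, 300, 350]
price = [320, 390, 510, 590, 690]
b0, b1 = 30.0, 1.88

predicted = [predict_target_value(x, b0, b1) for x in square_feet]
error = rmse(price, predicted)
print(round(error, 2))
```

The error is in the same units as the price, which makes it easier to interpret than the mean squared error alone.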

Give me five 🙂 With this, we have implemented simple linear regression without any machine learning libraries.

You can fork the complete code from our GitHub: simple linear regression code

I hope you like this post. If you have any questions, feel free to comment below. If you want me to write on a particular topic, tell me in the comments below.
