# Implementing Simple Linear Regression without any Python Machine Learning Libraries

## Simple linear regression implementation in Python

Today we are going to implement the most popular and most straightforward regression technique, **simple linear regression**, purely in Python. When I say purely in Python, I mean **without using** any machine learning libraries.

When I say simple linear regression, what comes to your mind? Let me guess 😛

- It’s so **simple** to implement simple linear regression.
- **Understanding** simple linear regression is much easier than understanding linear regression.
- In terms of time **complexity level**, simple linear regression will take less time to process.

I guess this is the analysis you were doing when I said simple linear regression. Those assumptions may be technically reasonable, but there is a particular reason for calling it simple linear regression. First, let’s understand why we call it simple linear regression. Then we can start my favorite part: coding simple linear regression in Python.


### What is simple linear regression?

In the linear regression analysis article, we mainly concentrated on explaining the linear regression concepts. We used the below equation while describing the general linear regression equations.

$$y = w_0 + w_1 x$$

The above equation closely resembles the **straight line** equation.

$$y = mx + c$$

Where **m** is the slope of the straight line, and **c** is the constant value. If we compare the above two equations, we can sense how close they are. They differ only in the way they are written; everything else is the same.

In linear regression, m ($w_1$) is known as the coefficient, and c ($w_0$) is known as the intercept.

- Dependent variable –> y
- Independent variable –> x

If we have **k** independent variables, we will get k coefficient values. If we have more than one independent variable to predict the dependent value, then it is called the linear regression algorithm. When we have only one independent variable to predict the dependent value, then it is a **simple linear regression** problem.

Let me give a few more examples to show the **difference between** linear regression and simple linear regression problems.

### Simple linear regression examples

- Using the number of rooms as the feature to predict the house price.
  - The number of rooms is the independent variable, and the price is the dependent variable.

- Using the number of hours a student studied to predict the marks percentage the student will get.
  - The number of hours is the independent variable, and the marks percentage is the dependent variable.

- Using the time of day to predict the temperature outside your room.
  - Time is the independent variable, and temperature is the dependent variable.

### Linear regression examples

- Using features like the number of rooms, the age of the house in years, and the garden space to predict the house price.
  - The number of rooms, the age in years, and the garden area are the independent variables, and the house price is the dependent variable.

- Using the number of hours a student spent on English, Mathematics, and Physics to predict the marks percentage the student will get.
  - The numbers of hours the student spent on English, Mathematics, and Physics are the independent variables, and the student's score percentage is the dependent variable.

- Using the time and climate details to predict the temperature outside your room.
  - Time and the climate details are the independent variables, and the temperature is the dependent variable.

With the above explanation, I hope I addressed the difference between simple linear regression and linear regression.

#### In Short:

**Simple Linear Regression:** Having one independent variable to predict the dependent variable.

**Linear Regression:** Having more than one independent variable to predict the dependent variable.
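To make the distinction concrete, here is a minimal sketch of what a prediction looks like in each case. The function names `predict_simple` and `predict_multiple` and all the coefficient values are made up for illustration; they are not part of the implementation that follows.

```python
def predict_simple(x, w0, w1):
    """Simple linear regression: one independent variable, one coefficient."""
    return w0 + w1 * x


def predict_multiple(xs, w0, ws):
    """Linear regression with k independent variables: k coefficients."""
    return w0 + sum(w * x for w, x in zip(ws, xs))


# Hypothetical coefficients, chosen only to show the shape of each call
simple_prediction = predict_simple(3, w0=10.0, w1=2.0)
multiple_prediction = predict_multiple([3, 5, 1], w0=10.0, ws=[2.0, 0.5, 4.0])
print(simple_prediction, multiple_prediction)
```

With a single independent variable the model needs only the intercept and one coefficient, which is why the rest of this post can get by with just two values.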

Now let’s build simple linear regression in Python without using any machine learning libraries.

To implement simple linear regression, we need to know the below formulas.

- A formula for calculating the **mean** value.
- A formula for calculating the **variance** value.
- A formula for calculating the **covariance** between two series of readings (say X and Y).
- Formulas for calculating the $w_0$ and $w_1$ values.

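#### Formula for calculating mean value

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

#### Formula for calculating the variance value

$$\mathrm{var}(x) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

#### Formula for calculating covariance between two series of readings

$$\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

#### Formula for calculating the $w_0$ and $w_1$ values

$$w_1 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}$$

$$w_0 = \bar{y} - w_1 \bar{x}$$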
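As a quick worked example of these quantities, take the readings x = (1, 2, 3, 4, 5) and y = (2, 4, 5, 4, 5). These five readings are made up purely for illustration, and the variance and covariance use the n − 1 denominator, matching the functions implemented later in this post.

$$
\begin{aligned}
\bar{x} &= \frac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \frac{2+4+5+4+5}{5} = 4 \\
\mathrm{var}(x) &= \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5-1} = \frac{10}{4} = 2.5 \\
\mathrm{cov}(x, y) &= \frac{(-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1)}{5-1} = \frac{6}{4} = 1.5 \\
w_1 &= \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = \frac{1.5}{2.5} = 0.6 \\
w_0 &= \bar{y} - w_1 \bar{x} = 4 - 0.6 \times 3 = 2.2
\end{aligned}
$$

So for these readings the fitted line would be y = 2.2 + 0.6x.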

We are going to use all the above listed formulas to implement simple linear regression purely in Python without any machine learning libraries.

To implement simple linear regression in Python, we are first going to implement all the above formulas. Then we are going to use the implemented functions to build the simple linear regression model.

After that, we are going to use a Python tabular data analysis package to implement the same simple linear regression model with a few lines of code. We can treat it as a check on the previous implementation.

Let’s start building the required functions in order.

- Mean Function.
- Variance Function.
- Covariance Function.
- Functions to calculate the $w_0$ and $w_1$ values.

### Function to Calculate the mean value

```python
def cal_mean(readings):
    """
    Function to calculate the mean value of the input readings
    :param readings:
    :return:
    """
    readings_total = sum(readings)
    number_of_readings = len(readings)
    mean = readings_total / float(number_of_readings)
    return mean
```

- With the **cal_mean** function, we are going to calculate the **mean value** of the given readings.
- We are calculating the sum of the readings and storing it in **readings_total**.
- Finding the **number_of_readings** by using the **len** function.
- Using the **readings_total** and the **number_of_readings** values to calculate the mean.
- Finally, we return the calculated **mean** value.
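A quick way to sanity-check a helper like this is to compare it with Python's built-in `statistics` module, which is part of the standard library and not a machine learning library. The sketch below repeats a compact version of the function so it runs on its own; the readings are made up for illustration.

```python
import statistics


def cal_mean(readings):
    """Mean of the readings: sum divided by count."""
    return sum(readings) / float(len(readings))


# Hypothetical readings, used only to exercise the function
sample_readings = [1, 2, 3, 4, 5]
mean_value = cal_mean(sample_readings)

# The standard library should agree with the hand-written version
assert mean_value == statistics.mean(sample_readings)
print(mean_value)
```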

### Function to Calculate the Variance Value

```python
from math import pow


def cal_variance(readings):
    """
    Calculating the variance of the readings
    :param readings:
    :return:
    """
    # To calculate the variance we need the mean value
    # Calculating the mean value from the cal_mean function
    readings_mean = cal_mean(readings)
    # mean difference squared readings
    mean_difference_squared_readings = [pow((reading - readings_mean), 2) for reading in readings]
    variance = sum(mean_difference_squared_readings)
    return variance / float(len(readings) - 1)
```

- With the **cal_variance** function, we are going to calculate the **variance** of the given readings.
- Using the already implemented **cal_mean** function, we are calculating the **mean** value.
- Then we are calculating the difference between each reading and the mean value. After that, we are squaring each difference and storing the squared values in **mean_difference_squared_readings**.
- Finally, we find the sum of the **mean_difference_squared_readings** and return the ratio of that sum to the number of readings minus 1.
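Dividing by the number of readings minus 1 gives what is usually called the sample variance. As a rough check, the sketch below compares a compact copy of the function against `statistics.variance` from the standard library, which uses the same n − 1 denominator, and shows `statistics.pvariance` (which divides by n) for contrast. The readings are made up for illustration.

```python
import math
import statistics


def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_variance(readings):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    readings_mean = cal_mean(readings)
    squared_differences = [(reading - readings_mean) ** 2 for reading in readings]
    return sum(squared_differences) / float(len(readings) - 1)


# Hypothetical readings, used only to exercise the function
sample_readings = [1, 2, 3, 4, 5]
variance_value = cal_variance(sample_readings)

# statistics.variance divides by n - 1, statistics.pvariance divides by n
assert math.isclose(variance_value, statistics.variance(sample_readings))
population_variance = statistics.pvariance(sample_readings)
print(variance_value, population_variance)
```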

### Function to Calculate the Covariance Value

```python
def cal_covariance(readings_1, readings_2):
    """
    Calculate the covariance between two different list of readings
    :param readings_1:
    :param readings_2:
    :return:
    """
    readings_1_mean = cal_mean(readings_1)
    readings_2_mean = cal_mean(readings_2)
    readings_size = len(readings_1)
    covariance = 0.0
    # Sum the products of the deviations from each mean
    for i in range(0, readings_size):
        covariance += (readings_1[i] - readings_1_mean) * (readings_2[i] - readings_2_mean)
    return covariance / float(readings_size - 1)
```

- With the **cal_covariance** function, we are going to calculate the covariance between two series of readings, say **readings_1** and **readings_2**.
- Using the already implemented **cal_mean** function to calculate the mean of **readings_1** and **readings_2**.
- Then summing the products of the differences from the mean for **readings_1** and **readings_2**.
- Finally, return the ratio of that sum to the number of readings minus 1 (**readings_size – 1**).
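One property worth checking is that the covariance of a series with itself equals its variance. The sketch below is a self-contained variation that pairs the readings with `zip` instead of an index loop (a stylistic choice for this example, not the version used in the post) and runs on made-up readings.

```python
def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_covariance(readings_1, readings_2):
    """Sample covariance of two equally long lists of readings."""
    mean_1 = cal_mean(readings_1)
    mean_2 = cal_mean(readings_2)
    total = 0.0
    for value_1, value_2 in zip(readings_1, readings_2):
        total += (value_1 - mean_1) * (value_2 - mean_2)
    return total / float(len(readings_1) - 1)


# Hypothetical readings, used only to exercise the function
x_readings = [1, 2, 3, 4, 5]
y_readings = [2, 4, 5, 4, 5]

covariance_xy = cal_covariance(x_readings, y_readings)
# Covariance of a series with itself is its sample variance
covariance_xx = cal_covariance(x_readings, x_readings)
print(covariance_xy, covariance_xx)
```

A positive covariance, as here, means the two series tend to move in the same direction.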

With the above functions, we are ready to calculate the simple linear regression coefficients $w_0$ and $w_1$.

### Functions to calculate the $w_0$ and $w_1$ values

```python
def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    """
    Calculating the simple linear regression coefficients (w0, w1)
    :param x_readings:
    :param y_readings:
    :return:
    """
    # Coefficient W1 = covariance of x_readings and y_readings divided by variance of x_readings
    # Directly calling the implemented covariance and the variance functions
    # To calculate the coefficient W1
    w1 = cal_covariance(x_readings, y_readings) / float(cal_variance(x_readings))

    # Coefficient W0 = mean of y_readings - ( W1 * the mean of the x_readings )
    w0 = cal_mean(y_readings) - (w1 * cal_mean(x_readings))
    return w0, w1
```

- From the above formulas for calculating $w_0$ and $w_1$, we are creating the **cal_simple_linear_regression_coefficients** function.
- To calculate the $w_1$ value, we need to find the ratio of the covariance of the **x_readings** and **y_readings** to the variance of the **x_readings**.
- Using the $w_1$ value, we are calculating the $w_0$ value.
- Finally, we are returning the $w_0$ and $w_1$ values.
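To see the coefficient calculation end to end, here is a self-contained sketch that bundles compact versions of the helper functions and fits a line to five made-up readings. The data is hypothetical and is not the house price dataset used later.

```python
import math


def cal_mean(readings):
    return sum(readings) / float(len(readings))


def cal_variance(readings):
    mean = cal_mean(readings)
    return sum((r - mean) ** 2 for r in readings) / float(len(readings) - 1)


def cal_covariance(readings_1, readings_2):
    mean_1, mean_2 = cal_mean(readings_1), cal_mean(readings_2)
    total = sum((a - mean_1) * (b - mean_2) for a, b in zip(readings_1, readings_2))
    return total / float(len(readings_1) - 1)


def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    # w1 is the slope, w0 is the intercept
    w1 = cal_covariance(x_readings, y_readings) / cal_variance(x_readings)
    w0 = cal_mean(y_readings) - w1 * cal_mean(x_readings)
    return w0, w1


# Hypothetical readings, used only to exercise the functions
x_readings = [1, 2, 3, 4, 5]
y_readings = [2, 4, 5, 4, 5]
w0, w1 = cal_simple_linear_regression_coefficients(x_readings, y_readings)
print("w0 = %.2f, w1 = %.2f" % (w0, w1))  # w0 = 2.20, w1 = 0.60
```

Note that $w_1$ has to be computed first, because the formula for $w_0$ depends on it.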

Now let’s use all the above implemented functions to predict the house price using the simple linear regression technique.

We are using the same house price dataset from the linear regression implementation in Python.

Let’s first load the dataset and see what features it has. To load the dataset, we are going to use pandas.

```python
import pandas as pd


def get_headers(dataframe):
    """Get the headers name of the dataframe"""
    return dataframe.columns.values


def simple_linear_regression(dataset):
    """
    Implementing the simple linear regression without using any python machine learning library
    :param dataset:
    :return:
    """
    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers ::", dataset_headers)


input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
simple_linear_regression(house_price_dataset)
```

- We have given the input_path where the dataset is located.
- Using the input_path, we are loading the data into a pandas data frame.
- Next, with the loaded data frame, we are calling the simple_linear_regression function.
- Inside the simple_linear_regression function, as of now we are just getting the header names and printing them.

If we have pandas set up on our system, we can expect the below output.

#### Script Output

```
Dataset Headers :: ['square_feet' 'price']
```

From the script output, we know that we have one independent variable (square_feet) and one dependent variable (price). Our intention is to use the square_feet and price readings to calculate the simple linear regression coefficients. Then we are going to use the calculated coefficients to predict the house price.
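If pandas is not available, the same two columns can be read with the standard library `csv` module. The sketch below assumes a file with the headers `square_feet` and `price` shown in the script output; the rows are made-up values held in memory so the example runs without the actual dataset.

```python
import csv
import io

# Made-up rows standing in for the real input file
sample_csv = io.StringIO(
    "square_feet,price\n"
    "150,6450\n"
    "200,7450\n"
    "250,8450\n"
)

reader = csv.DictReader(sample_csv)
square_feet_readings = []
price_readings = []
for row in reader:
    # csv yields strings, so convert each value to a number
    square_feet_readings.append(float(row["square_feet"]))
    price_readings.append(float(row["price"]))

dataset_headers = reader.fieldnames
print("Dataset Headers ::", dataset_headers)
```

For a real file, the `io.StringIO` object would be replaced by `open(input_path, newline="")`. The resulting plain lists can be passed straight to the mean, variance, and covariance functions.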

Now let’s write a simple function to visualize how the price of the house varies with the square_feet. We are going to use a plotly scatter plot for the visualization.

```python
# Packages for creating the graphs
# Note: in plotly 4 and later, this online module lives in the separate
# chart_studio package (import chart_studio.plotly as py)
import plotly.plotly as py
import plotly.graph_objs as go

# Replace the placeholders with your own plotly credentials
py.sign_in('YOUR_USER_NAME', 'YOUR_API_KEY')


def scatter_graph(x, y, graph_title, x_axis_title, y_axis_title):
    """
    Scatter Graph
    :param x:
    :param y:
    :param graph_title:
    :param x_axis_title:
    :param y_axis_title:
    :return:
    """
    trace = go.Scatter(
        x=x,
        y=y,
        mode='markers'
    )
    layout = go.Layout(
        title=graph_title,
        xaxis=dict(title=x_axis_title),
        yaxis=dict(title=y_axis_title)
    )
    data = [trace]
    fig = go.Figure(data=data, layout=layout)
    plot_url = py.plot(fig)
```

Now let’s call the scatter_graph function with the square_feet readings as the x parameter and the price readings as the y parameter.

```python
input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
scatter_graph(house_price_dataset['square_feet'], house_price_dataset['price'],
              'Square_feet Vs Price', 'Square Feet', 'Price')
```

Now let’s use the house price dataset to model the simple linear regression.

```python
def simple_linear_regression(dataset):
    """
    Implementing the simple linear regression without using any python library
    :param dataset:
    :return:
    """
    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers ::", dataset_headers)

    # Calculating the mean of the square feet and the price readings
    square_feet_mean = cal_mean(dataset[dataset_headers[0]])
    price_mean = cal_mean(dataset[dataset_headers[1]])

    square_feet_variance = cal_variance(dataset[dataset_headers[0]])
    price_variance = cal_variance(dataset[dataset_headers[1]])

    # Calculating the regression
    # The covariance is read from the pandas covariance matrix here;
    # pandas also divides by (n - 1), so it matches cal_covariance
    covariance_of_price_and_square_feet = dataset.cov()[dataset_headers[0]][dataset_headers[1]]
    w1 = covariance_of_price_and_square_feet / float(square_feet_variance)
    w0 = price_mean - (w1 * square_feet_mean)

    # Predictions
    dataset['Predicted_Price'] = w0 + w1 * dataset[dataset_headers[0]]


input_path = '../Inputs/input_data.csv'
house_price_dataset = pd.read_csv(input_path)
simple_linear_regression(house_price_dataset)
```

- In the **simple_linear_regression** function, we are using the already implemented **cal_mean** function to calculate the mean of **square_feet** and **price**.
- Next, we are using the already implemented **cal_variance** function to calculate the variance of the **square_feet** and **price**.
- After that, we are calculating the $w_1$ and $w_0$ values.
- We are using the $w_0$ and $w_1$ values to perform the prediction, which is nothing but predicting the house price given the square_feet value.
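The prediction step can also be sketched with plain Python lists instead of a data frame column. The example below fits the line to five made-up readings (not the house price dataset), predicts a value for each reading, and checks a known property of a least squares line with an intercept: the residuals sum to approximately zero.

```python
# Hypothetical readings, used only for illustration
x_readings = [1, 2, 3, 4, 5]
y_readings = [2, 4, 5, 4, 5]

n = len(x_readings)
x_mean = sum(x_readings) / n
y_mean = sum(y_readings) / n

# w1 = covariance(x, y) / variance(x); the (n - 1) factors cancel out
w1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(x_readings, y_readings))
      / sum((x - x_mean) ** 2 for x in x_readings))
w0 = y_mean - w1 * x_mean

# Predicted value for every reading, like the Predicted_Price column
predicted = [w0 + w1 * x for x in x_readings]

# Residual = actual value minus predicted value
residuals = [actual - guess for actual, guess in zip(y_readings, predicted)]
print(predicted)
print(sum(residuals))
```

Because of floating point rounding, the residual sum may print as a tiny non-zero number rather than exactly 0.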

Check out the complete code below.

```python
#!/usr/bin/env python
# Author : Saimadhu
# Date: 14-Feb-2017
# About: Simple linear regression without any machine learning libraries

# Required Python Packages
import pandas as pd
from math import pow, sqrt


def get_headers(dataframe):
    """
    Get the headers name of the dataframe
    :param dataframe:
    :return:
    """
    return dataframe.columns.values


def cal_mean(readings):
    """
    Function to calculate the mean value of the input readings
    :param readings:
    :return:
    """
    readings_total = sum(readings)
    number_of_readings = len(readings)
    mean = readings_total / float(number_of_readings)
    return mean


def cal_variance(readings):
    """
    Calculating the variance of the readings
    :param readings:
    :return:
    """
    # To calculate the variance we need the mean value
    # Calculating the mean value from the cal_mean function
    readings_mean = cal_mean(readings)
    # mean difference squared readings
    mean_difference_squared_readings = [pow((reading - readings_mean), 2) for reading in readings]
    variance = sum(mean_difference_squared_readings)
    return variance / float(len(readings) - 1)


def cal_covariance(readings_1, readings_2):
    """
    Calculate the covariance between two different list of readings
    :param readings_1:
    :param readings_2:
    :return:
    """
    readings_1_mean = cal_mean(readings_1)
    readings_2_mean = cal_mean(readings_2)
    readings_size = len(readings_1)
    covariance = 0.0
    for i in range(0, readings_size):
        covariance += (readings_1[i] - readings_1_mean) * (readings_2[i] - readings_2_mean)
    return covariance / float(readings_size - 1)


def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    """
    Calculating the simple linear regression coefficients (B0, B1)
    :param x_readings:
    :param y_readings:
    :return:
    """
    # Coefficient B1 = covariance of x_readings and y_readings divided by variance of x_readings
    # Directly calling the implemented covariance and the variance functions
    # To calculate the coefficient B1
    b1 = cal_covariance(x_readings, y_readings) / float(cal_variance(x_readings))

    # Coefficient B0 = mean of y_readings - ( B1 * the mean of the x_readings )
    b0 = cal_mean(y_readings) - (b1 * cal_mean(x_readings))
    return b0, b1


def predict_target_value(x, b0, b1):
    """
    Calculating the target (y) value using the input x and the coefficients b0, b1
    :param x:
    :param b0:
    :param b1:
    :return:
    """
    return b0 + b1 * x


def cal_rmse(actual_readings, predicted_readings):
    """
    Calculating the root mean square error
    :param actual_readings:
    :param predicted_readings:
    :return:
    """
    square_error_total = 0.0
    total_readings = len(actual_readings)
    for i in range(0, total_readings):
        error = predicted_readings[i] - actual_readings[i]
        square_error_total += pow(error, 2)
    # Root of the mean squared error
    rmse = sqrt(square_error_total / float(total_readings))
    return rmse


def simple_linear_regression(dataset):
    """
    Implementing simple linear regression without using any python library
    :param dataset:
    :return:
    """
    # Get the dataset header names
    dataset_headers = get_headers(dataset)
    print("Dataset Headers ::", dataset_headers)

    # Calculating the mean of the square feet and the price readings
    square_feet_mean = cal_mean(dataset[dataset_headers[0]])
    price_mean = cal_mean(dataset[dataset_headers[1]])

    square_feet_variance = cal_variance(dataset[dataset_headers[0]])
    price_variance = cal_variance(dataset[dataset_headers[1]])

    # Calculating the regression
    covariance_of_price_and_square_feet = dataset.cov()[dataset_headers[0]][dataset_headers[1]]
    w1 = covariance_of_price_and_square_feet / float(square_feet_variance)
    w0 = price_mean - (w1 * square_feet_mean)

    # Predictions
    dataset['Predicted_Price'] = w0 + w1 * dataset[dataset_headers[0]]


if __name__ == "__main__":
    input_path = '../Inputs/input_data.csv'
    house_price_dataset = pd.read_csv(input_path)
    simple_linear_regression(house_price_dataset)
```
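The complete code includes a **cal_rmse** helper that the walkthrough does not cover. Root mean square error is the square root of the average squared difference between the predicted and actual values, so it is expressed in the same units as the target. Here is a minimal, self-contained sketch of that idea on made-up prices; the numbers are hypothetical and chosen so the result is easy to verify by hand.

```python
from math import sqrt


def cal_rmse(actual_readings, predicted_readings):
    """Root mean square error between actual and predicted readings."""
    squared_errors = [
        (predicted - actual) ** 2
        for actual, predicted in zip(actual_readings, predicted_readings)
    ]
    mean_squared_error = sum(squared_errors) / float(len(squared_errors))
    return sqrt(mean_squared_error)


# Hypothetical prices: every prediction is off by exactly 2
actual_prices = [10.0, 20.0, 30.0, 40.0]
predicted_prices = [12.0, 18.0, 32.0, 38.0]

rmse = cal_rmse(actual_prices, predicted_prices)
perfect_rmse = cal_rmse(actual_prices, actual_prices)
print(rmse, perfect_rmse)
```

A lower value means the predictions sit closer to the actual readings, and a perfect fit gives 0.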

Give me five 🙂 With this, we have implemented simple linear regression without any machine learning libraries.

You can fork the complete code from our GitHub: simple linear regression code


I hope you like this post. If you have any questions, then feel free to comment below. If you want me to write on one particular topic, then do tell it to me in the comments below.
