
Linear Regression, Clearly Explained

Linear Regression In Python Without Sklearn

Linear Regression is a regression algorithm under Supervised Learning. In regression, the target label is a continuous numerical data type.

To understand Linear Regression, consider an example of predicting patients' Blood Pressure. Here Drug Dosage is the feature and Blood Pressure is the label. Now plot Dosage vs Blood Pressure on a scatter plot. To predict the target variable on new data, we must first train the model on the provided dataset. The formula used to predict the target label for unseen data is:

y_pred = W × feature_data + bias

By applying the formula we get the predicted value. There will almost always be a difference between the actual value and the predicted value, and this difference is known as the loss (error). The loss function is defined as the sum of the squared differences between the actual and predicted values.

Cost Function and Gradient Descent

With further computation, it is possible to reduce this error. We do so by minimising the Cost Function, which measures the average error over the entire training dataset. In Linear Regression we draw a straight line to describe the relationship in the data. But how do we get this line? Or, to put the question better: how do we get the optimal line that makes the best predictions? This is where Gradient Descent comes into the picture.


Gradient Descent is used to find the best regression line by iteratively updating the parameters using the gradients of the loss (the same idea behind backpropagation). We first assign initial values to the Weights and Bias, then calculate the cost function; our aim is now to reduce the loss, so we update the Weights and Bias. To update them we use the Chain Rule and a step size: the Chain Rule gives the derivatives of the loss with respect to the Weights and Bias, and the step size, also known as the Learning Rate, is the value multiplied with these derivatives to obtain the new Weights and Bias. The step size should be neither too small nor too large.
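
To see why the step size matters, here is a toy sketch (not part of the article's dataset) that minimises f(w) = (w - 3)^2, whose derivative is 2(w - 3), with three different learning rates:

# gradient descent on f(w) = (w - 3)**2; the derivative is 2 * (w - 3)
for lr in [0.01, 0.5, 1.1]:          # too small, reasonable, too large
    w = 0.0
    for _ in range(20):
        w = w - lr * 2 * (w - 3)     # update: w = w - learning_rate * gradient
    print(f"learning rate {lr}: w after 20 steps = {w:.4f}")

With lr = 0.01 the updates crawl slowly towards the minimum at w = 3, lr = 0.5 reaches it quickly, and lr = 1.1 overshoots further on every step and diverges.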

Let’s get into the coding part now:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

celsius_feature = np.arange(20,80)
fahren_label = (celsius_feature * 1.8) + 32

df = pd.DataFrame({"Feature":celsius_feature,"Label":fahren_label})

X = df['Feature']
Y = df.Label   #two ways to select the data

Data Preprocessing

The first step before training the model is to understand the data. In Linear Regression, w, x and b should be in the form of vectors, i.e. arrays of numbers (a numerical data type). But we do not always get numerical features, and the data is not always clean, so a lot of preprocessing is required before we train the model. Let us go through the preprocessing techniques one by one along with the code:
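
For example, a 1-D NumPy array can be turned into a column vector with reshape, which is exactly what the train/test split step below does with reshape(-1,1):

v = np.arange(5)
print(v.shape)                 # (5,)  -> a flat array
print(v.reshape(-1, 1).shape)  # (5, 1) -> a column vector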

Handle Missing Values

Real-world datasets usually come with many missing values, denoted as NaN (Not a Number). With the help of the Pandas library we can handle this missing data: different strategies are used to fill the empty values, and sometimes the affected rows are simply removed. We have already covered how to do this in the Day 2 and Day 3 content. You can also check different preprocessing techniques in the Day 10 content of the 100DaysofML repository.

import seaborn as sns

sns.heatmap(df.isnull())
plt.show()
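
The Celsius/Fahrenheit data generated above has no missing values, so the heatmap shows no gaps. On a real dataset, a minimal sketch of the usual strategies with pandas (fill with a statistic, then drop whatever is still empty) would look like this:

# fill numerical gaps with the column mean, then drop any rows that are still empty
df["Feature"] = df["Feature"].fillna(df["Feature"].mean())
df = df.dropna()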

Scale And Encode

If a feature is a numerical data type, it usually needs to be scaled: scaling compresses a wide range of values into a small range. Encoding converts categorical data types into numerical ones. Again, you can check the Day 10 content to understand the workflow of Scaling and Encoding.

#StandardScaler
class StandardScaler():
    def fit_transform(self,x):
        mean = np.mean(x)
        std_dev = np.std(x)
        return (x-mean)/std_dev

sc = StandardScaler()
x = sc.fit_transform(X)
x = x.to_numpy()          # convert the scaled feature Series to a NumPy array
Y = Y.values
y = sc.fit_transform(Y)   # scale the label as well, so the split step below has both x and y
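
This dataset has only numerical columns, so no encoding is needed here. For categorical features, a commonly used option is one-hot encoding; a minimal sketch with pandas (the "City" column is hypothetical, not part of this dataset):

# one-hot encode a hypothetical categorical column
cities = pd.DataFrame({"City": ["Mumbai", "Delhi", "Mumbai"]})
print(pd.get_dummies(cities, columns=["City"]))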

Split the Data

The given data is further divided into two different sets: Training and Testing. Often the data is instead divided into three sets: Training, Validation, and Testing.

train_size = int(0.8*len(x))
# or test_size = int(0.2*len(x))

size = list(range(len(x)))
np.random.shuffle(size)
x = x[size]
y = y[size]

x_train = x[0:train_size].reshape(-1,1)
y_train = y[0:train_size].reshape(-1,1)
x_test = x[train_size:].reshape(-1,1)
y_test = y[train_size:].reshape(-1,1)

How To Train a Linear Regression Model?

First, assign initial values for the Weights and Bias.

# we shall first define initial values for Weights and Biases
m = x_train.shape[1]              # number of features (1 in this example)
W = 0.01 * np.random.randn(m,1)   # small random initial weights
b = np.zeros((1, 1))              # bias initialised to zero

y_pred = np.dot(x_train,W) + b    # predictions with the initial (untrained) parameters

Loss Function and Cost Function

The Loss function measures the error for every individual training example, whereas the Cost function measures the average error for the entire training data set.

Formula:

Loss = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2

J(\theta) = MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
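
In code, the cost function above maps directly to the loss helper used by the training loop; this is the same function that appears again in the full listing below:

def loss(y_train,y_pred):
    # mean squared error over the N training examples
    N = len(y_train)
    loss = (1/N) * np.sum((y_pred - y_train)**2)
    return loss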

Gradient Descent

Check the below steps to implement Gradient Descent in the linear regression algorithm.

Update Weight and Bias

During Gradient Descent the weights and bias are updated as follows:

\partial W = -\frac{2}{N} \sum_i (y_i - X_i W)\, X_i = -\frac{2}{N} \sum_i (y_i - \hat{y}_i)\, X_i
\partial b = -\frac{2}{N} \sum_i (y_i - X_i W) = -\frac{2}{N} \sum_i (y_i - \hat{y}_i)
W = W - \alpha \cdot \partial W
b = b - \alpha \cdot \partial b

The learning rate (α) is the step size.

def optimize(W,b,x_train,y_train,learning_rate,y_pred):
    N = len(y_train)
    # gradients of the MSE cost with respect to the weights and bias
    dW = -(2/N) * np.sum((y_train - y_pred) * x_train)
    db = -(2/N) * np.sum((y_train - y_pred))
    # update step: move against the gradient, scaled by the learning rate
    W += -learning_rate* dW
    b += -learning_rate* db

    grad = {"dW":dW,"db":db}
    update = {"W":W,"b":b}
    return grad,update

Train and Predict the Linear Regression Model
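
The training loop below also calls a predict helper. It is defined in the full listing further down and is repeated here so that the walk-through runs on its own:

def predict(W,b,X):
    # linear model: prediction = X·W + b
    prediction = np.dot(X,W) + b
    return prediction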

num_of_iterations = 201 # change the value and see the difference
learning_rate = 0.01
W = 0.01 * np.random.randn(m,1)
b = np.zeros((1, 1))

for i in range(num_of_iterations):
    y_pred = np.dot(x_train, W) + b
    loss_func = loss(y_train,y_pred)
    
    if i%20 == 0:
        print(f"Iteration:{i}, Loss: {loss_func}")
    
    gradient,change = optimize(W,b,x_train,y_train,learning_rate,y_pred)
    dW = gradient["dW"]
    db = gradient['db']
    W = change["W"]
    b = change["b"]
    
train_predict = predict(W,b,x_train)
test_predict = predict(W,b,x_test)
Output:

Iteration:0, Loss: 0.9781162808237794
Iteration:20, Loss: 0.4334018619487675
Iteration:40, Loss: 0.19205821657440603
Iteration:60, Loss: 0.08511715226734567
Iteration:80, Loss: 0.03772624550514388
Iteration:100, Loss: 0.016722939657209977
Iteration:120, Loss: 0.007413519133635098
Iteration:140, Loss: 0.003286844582149845
Iteration:160, Loss: 0.0014573943987548953
Iteration:180, Loss: 0.0006462765394055764
Iteration:200, Loss: 0.0002866178556400658

Evaluation Metrics

Finally, the trained model is evaluated on its performance. Common evaluation metrics for a regression model are: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).

train_mse = np.mean((y_train - train_predict) ** 2)
test_mse = np.mean((y_test - test_predict) ** 2)
print(f"Train MSE: {train_mse}, Test MSE: {test_mse}")

train_rmse = np.sqrt(train_mse)   # RMSE is just the square root of MSE
test_rmse = np.sqrt(test_mse)
print(f"Train RMSE: {train_rmse}, Test RMSE: {test_rmse}")

Linear Regression Without Sklearn: Putting It All Together

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

celsius_feature = np.arange(20,80)
fahren_label = (celsius_feature * 1.8) + 32

df = pd.DataFrame({"Feature":celsius_feature,"Label":fahren_label})

X = df['Feature']
Y = df.Label   #two ways to select the data

plt.scatter(X.values,Y.values)
plt.title("Celsius vs Fahrenheit")
plt.xlabel("Celsius")
plt.ylabel("Fahrenheit")
plt.show()

import seaborn as sns

sns.heatmap(df.isnull())
plt.show()

#StandardScaler
class StandardScaler():
    def fit_transform(self,x):
        mean = np.mean(x)
        std_dev = np.std(x)
        return (x-mean)/std_dev

sc = StandardScaler()

x = sc.fit_transform(X)
x = x.to_numpy()
Y = Y.values
y = sc.fit_transform(Y)

train_size = int(0.8*len(x))
# or test_size = int(0.2*len(x))

size = list(range(len(x)))
np.random.shuffle(size)
x = x[size]
y = y[size]

x_train = x[0:train_size].reshape(-1,1)
y_train = y[0:train_size].reshape(-1,1)
x_test = x[train_size:].reshape(-1,1)
y_test = y[train_size:].reshape(-1,1)

# we shall first define initial values for Weights and Biases
m = x_train.shape[1]
W = 0.01 * np.random.randn(m,1)
b = np.zeros((1, 1))

def loss(y_train,y_pred):
    N = len(y_train)
    loss = (1/N) * np.sum((y_pred - y_train)**2)
    return loss

def optimize(W,b,x_train,y_train,learning_rate,y_pred):
    N = len(y_train)
    dW = -(2/N) * np.sum((y_train - y_pred) * x_train)
    db = -(2/N) * np.sum((y_train - y_pred))
    W += -learning_rate* dW
    b += -learning_rate* db
    
    grad = {"dW":dW,"db":db}
    update = {"W":W,"b":b}
    return grad,update

def predict(W,b,X):
    prediction = np.dot(X,W) + b
    return prediction

num_of_iterations = 201 # change the value and see the difference
learning_rate = 0.01
W = 0.01 * np.random.randn(m,1)
b = np.zeros((1, 1))

for i in range(num_of_iterations):
    y_pred = np.dot(x_train, W) + b
    loss_func = loss(y_train,y_pred)
    
    if i%20 == 0:
        print(f"Iteration:{i}, Loss: {loss_func}")
    
    gradient,change = optimize(W,b,x_train,y_train,learning_rate,y_pred)
    dW = gradient["dW"]
    db = gradient['db']
    W = change["W"]
    b = change["b"]
    
train_predict = predict(W,b,x_train)
test_predict = predict(W,b,x_test)

train_mse = np.mean((y_train - train_predict) ** 2)
test_mse = np.mean((y_test - test_predict) ** 2)
print(f"Train MSE: {train_mse}, Test MSE: {test_mse}")

train_rmse = np.sqrt(train_mse)   # RMSE is just the square root of MSE
test_rmse = np.sqrt(test_mse)
print(f"Train RMSE: {train_rmse}, Test RMSE: {test_rmse}")

Linear Regression With Python Sklearn

Scikit-learn, aka sklearn, is a Python Machine Learning library. It makes our job easier by providing ready-made implementations of Machine Learning algorithms. We shall first install the scikit-learn library and then dive directly into a real-world dataset.

pip install scikit-learn

The syntax to implement Linear Regression with sklearn, here used for a Car Prediction model:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train,y_train)
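
A slightly fuller sketch, assuming train/test splits named x_train, x_test, y_train, y_test as in the earlier sections, and using sklearn's mean_squared_error for evaluation:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()
model.fit(x_train, y_train)            # learns the weights (model.coef_) and bias (model.intercept_)
test_predict = model.predict(x_test)   # predictions on unseen data
print("Test MSE:", mean_squared_error(y_test, test_predict))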

Complete Source Code: Car Prediction.

Regularisation

What should we do if our model is overfitting the training dataset?

Regularisation is a technique with which you can avoid overfitting. To regularise the model we add a penalty for complexity to the loss function: the sum of the squares of the weights multiplied by a factor lambda. The most common regularisation technique is L2 Regularisation, and we shall see it later in the ongoing #100DaysofML. For the time being, just remember that the reason we use Regularisation is to reduce overfitting on the training dataset.
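
As a sketch of what L2 (Ridge) regularisation adds, the penalised cost function looks like this, where \lambda controls the strength of the penalty:

J(\theta) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_j w_j^2

A larger \lambda pushes the weights towards smaller values, which reduces overfitting at the cost of a slightly higher training error.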

Conclusion

I have covered pretty much everything you need to know to get started with Linear Regression. To find a better model, we try different ML algorithms, and we shall cover them all in this journey of #100daysofCode.

Want to learn Machine Learning with a proper roadmap and resources? Then check this repository: https://github.com/lucifertrj/100DaysOfML

Subscribe to our Newsletter to never miss out on the content: https://animevyuh.org/newsletter. Join our Newsletter now for amazing Anime recommendations, Python, and Machine Learning content.

Support Us: https://www.buymeacoffee.com/trjtarun and https://ko-fi.com/tarunrjain751.
GitHub: https://github.com/lucifertrj.
Twitter: https://twitter.com/TRJ_0751.