
Logistic Regression Clearly Explained

Logistic Regression Without the Sklearn Library

As part of the ongoing 100DaysOfML series, this article dives deeper into Logistic Regression and how it works. We will then build a Logistic Regression model both with and without the Sklearn library.

What is Logistic Regression?

Like Linear Regression, Logistic Regression is a supervised machine learning algorithm, but it is widely used for classification rather than regression: it predicts discrete labels instead of continuous values. When there are only two possible labels, the task is known as Binary Classification.

Logistic regression is used to solve binary classification problems, for example, Yes or No, Survived or Not Survived, Dead or Alive, and so on.

How does Logistic Regression Work?

Logistic Regression passes a linear combination of the input features through an activation function to estimate the relationship between one or more independent variables and a categorical target variable, which is either 0 or 1. The activation function it uses is the Sigmoid Function, which returns a value in the range 0 to 1. A value above 0.5 is treated as class 1, and a value of 0.5 or lower is treated as class 0.

The Sigmoid Function is an S-shaped curve whose output lies between zero and one, approaching but never reaching either extreme.

(Figure: the sigmoid function used in Logistic Regression)
z = (W.T * X) + b
y_pred = 1 / (1 + e^(-z))
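
For intuition, here is a minimal sketch (plain NumPy; the helper name sigmoid is ours, chosen for illustration) showing how a few z values are squashed into the 0 to 1 range:

import numpy as np

def sigmoid(z):
    # squashes any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))
# approximately [0.119, 0.5, 0.881]: negative scores fall below 0.5, positive scores above it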

Applying this formula gives the predicted value. There will always be some difference between the actual value and the predicted value, and this difference is measured by the loss. The loss function of Logistic Regression (log loss) differs from the squared-error loss used in Linear Regression. Averaging the loss over the entire training data set gives the Cost Function, and our goal is to minimize it.

loss = -y*log(y_pred) - (1-y)*log(1-y_pred)
J = sum(loss) / m  #cost
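
To make the loss concrete, here is a small hand-worked example (natural log, matching np.log used in the code below). A confident wrong prediction is penalized far more heavily than a confident correct one:

loss(y=1, y_pred=0.9) = -log(0.9) ≈ 0.105
loss(y=1, y_pred=0.1) = -log(0.1) ≈ 2.303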

Our primary goal is to minimize the Cost Function, and to do this we use Gradient Descent (often described here as backpropagation): we compute the gradients of the cost with respect to the weights and bias and use them to update the parameters. This process uses the Learning Rate and the Number of Iterations as hyperparameters, and on each iteration the cost of the model should decrease.
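
Concretely, the parameter update performed on every iteration is the standard gradient-descent step (written here in the article's notation, with alpha denoting the learning rate and dJ/dW, dJ/db the gradients of the cost derived in the Gradient Descent section below):

W = W - alpha * dJ/dW
b = b - alpha * dJ/db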

Logistic Regression Without Sklearn

Enough with the theory; let us dig directly into the code implementation of Logistic Regression, both with and without the Sklearn library.

Import and Read Data

You can download the dataset here:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

train_data = pd.read_csv("hr_job.csv")
train_data.fillna(method="bfill",inplace=True)

Preprocessing

In preprocessing, we shall standardize the data and initialize the weights and bias. Standardization rescales each feature to have a mean of 0 and a standard deviation of 1; since we are avoiding Sklearn here, we use a small custom scaler class for this.

train_data.drop('city',axis=1,inplace=True)
X = train_data.iloc[:,:-1]
Y = train_data.target

# minimal custom scaler: rescales each column to mean 0 and standard deviation 1
class StandardScalar():
    def fit_transform(self,x):
        mean = np.mean(x,axis=0)
        std_dev = np.std(x,axis=0)
        return (x-mean)/std_dev

sc = StandardScalar()
X = sc.fit_transform(X)
Y = Y.to_numpy().reshape(-1,1)
x_T,x_V,y_T,y_V = train_test_split(X,Y,train_size=0.75)

def initialize_weights(dim):
    # start every weight at 0.1 and the bias at 0
    W = np.full((dim,1),0.1)
    b = 0.0
    return W,b

Activation Function

An activation function is a non-linear function used to map the model output into the range 0 to 1. The activation function used in Logistic Regression is the Sigmoid Function, which is well suited for binary classification.

def activation(z):
    # sigmoid: squashes z into the open interval (0, 1)
    sigmoid = 1/(1+np.exp(-z))
    return sigmoid

Cost Function

The cost function measures the average error for the entire training data set. 

def costFunction(y_pred,y):
    training_samples = y.shape[0]
    loss = (y*np.log(y_pred))+((1-y)*np.log(1-y_pred))
    return -np.sum(loss)/training_samples

Gradient Descent: Backward Propagation

For Gradient Descent we use the chain rule: we partially differentiate the cost with respect to the weights and the bias, and update them using the learning rate. The learning rate determines the step size of each parameter update.
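
For the sigmoid activation combined with the log loss, the chain rule yields the following gradients (a standard result, restated here so the code below is easier to follow; m is the number of training samples):

dW = X.T * (y_pred - y) / m
db = sum(y_pred - y) / m

These are exactly the expressions computed in gradient_descent below.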

def gradient_descent(x_train,y_train,y_pred):
    cost = costFunction(y_pred,y_train)
    derivative_weight = (np.dot(x_train.T,(y_pred-y_train)))/x_train.shape[0] 
    derivative_bias = np.sum(y_pred-y_train,keepdims=True)/x_train.shape[0]
    gradients = {"dW": derivative_weight,"db": derivative_bias}
    return cost,gradients

def update(W,b,x_train,y_train,y_pred,learning_rate,number_of_iterations):
    costRecord = []
    for i in range(number_of_iterations):
        # recompute the predictions with the current parameters on every iteration
        y_pred = activation(np.dot(x_train,W)+b)
        cost,gradients = gradient_descent(x_train,y_train,y_pred)
        W -= learning_rate*gradients['dW']
        b -= learning_rate*gradients['db']
        if i%100 == 0:
            costRecord.append(cost)
    parameters = {'W':W,'b':b}
    # plot the recorded cost (y-axis) against the iteration number (x-axis)
    plt.plot(np.arange(0,number_of_iterations,100),costRecord)
    plt.show()
    return parameters

Prediction

Since we have used the Sigmoid Function as the activation function, the model returns predicted values in the range 0 to 1. Because we need a binary output, we initialize a new NumPy array and store 0 when the predicted probability is below 0.5 and 1 otherwise.

def predict(W,b,x):
    Y_prediction = np.zeros((x.shape[0],1))
    z = np.dot(x,W)+b   # use the features passed in, not the global X
    y_pred = activation(z)
    # threshold each predicted probability at 0.5
    for i in range(y_pred.shape[0]):
        if y_pred[i,0] < 0.5:
            Y_prediction[i,0] = 0
        else:
            Y_prediction[i,0] = 1
    return Y_prediction

Putting It All Together: Logistic Regression Without Sklearn

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# minimal custom scaler: rescales each column to mean 0 and standard deviation 1
class StandardScalar():
    def fit_transform(self,x):
        mean = np.mean(x,axis=0)
        std_dev = np.std(x,axis=0)
        return (x-mean)/std_dev


train_data = pd.read_csv("hr_job.csv")
train_data.fillna(method="bfill",inplace=True)

train_data.drop('city',axis=1,inplace=True)
X = train_data.iloc[:,:-1]
Y = train_data.target

sc = StandardScalar()
X = sc.fit_transform(X)

def initialize_weights(dim):
    W = np.full((dim,1),0.1)
    b = 0.0
    return W,b
    
def activation(Z):
    sigmoid = 1/(1+np.exp(-Z))
    return sigmoid

def costFunction(y_pred,y):
    training_samples = y.shape[0]
    loss = (y*np.log(y_pred))+((1-y)*np.log(1-y_pred))
    return -np.sum(loss)/training_samples

def gradient_descent(x_train,y_train,y_pred):
    cost = costFunction(y_pred,y_train)
    derivative_weight = (np.dot(x_train.T,(y_pred-y_train)))/x_train.shape[0] 
    derivative_bias = np.sum(y_pred-y_train,keepdims=True)/x_train.shape[0]
    gradients = {"dW": derivative_weight,"db": derivative_bias}
    return cost,gradients

def update(W,b,x_train,y_train,y_pred,learning_rate,number_of_iterations):
    costRecord = []
    for i in range(number_of_iterations):
        # recompute the predictions with the current parameters on every iteration
        y_pred = activation(np.dot(x_train,W)+b)
        cost,gradients = gradient_descent(x_train,y_train,y_pred)
        W -= learning_rate*gradients['dW']
        b -= learning_rate*gradients['db']
        if i%100 == 0:
            costRecord.append(cost)
    parameters = {'W':W,'b':b}
    # plot the recorded cost (y-axis) against the iteration number (x-axis)
    plt.plot(np.arange(0,number_of_iterations,100),costRecord)
    plt.show()
    return parameters

def predict(W,b,x):
    Y_prediction = np.zeros((x.shape[0],1))
    z = np.dot(x,W)+b   # use the features passed in, not the global X
    y_pred = activation(z)
    # threshold each predicted probability at 0.5
    for i in range(y_pred.shape[0]):
        if y_pred[i,0] < 0.5:
            Y_prediction[i,0] = 0
        else:
            Y_prediction[i,0] = 1
    return Y_prediction

Y = Y.to_numpy().reshape(-1,1)
x_T,x_V,y_T,y_V = train_test_split(X,Y,train_size=0.75)

def logistic_regression(x_train, y_train, x_test, y_test, learning_rate, num_iterations):
    W,b = initialize_weights(x_train.shape[1])
    z = np.dot(x_train,W)+b
    y_pred = activation(z)
    parameters = update(W, b, x_train, y_train,y_pred,learning_rate,num_iterations)

    y_prediction_train = predict(parameters['W'],parameters['b'],x_train)
    y_prediction_test = predict(parameters['W'],parameters['b'],x_test)

    print("Train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("Test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
    
logistic_regression(x_T, y_T, x_V, y_V, 0.001, 1000)

Logistic Regression With Sklearn

Sklearn is a Python machine learning library that makes it easy to use machine learning algorithms without coding the mathematics by hand. The common machine learning algorithms are already implemented in the Sklearn module; we just need to instantiate them.

First, we create the instance of the model and then fit the training data into the model.

model = LogisticRegression()
model.fit(x_T, y_T.ravel())   # ravel() because sklearn expects a 1-D target array
y_pred = model.predict(x_V)
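
As a quick sanity check (a minimal sketch using the standard estimator API), the fitted model also exposes a score method that returns the mean accuracy on held-out data:

# mean accuracy on the validation split
print(model.score(x_V, y_V.ravel()))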

Evaluation Metrics

Evaluation metrics are used to evaluate the overall performance of the model's predictions on the test data. In Linear Regression or other regression algorithms, the evaluation metrics used are Mean Squared Error or Mean Absolute Error, whereas in Logistic Regression or other classification algorithms, the evaluation metrics used are Accuracy Score, Precision, Recall, F1 Score, and the Confusion Matrix. In this article, we shall look at all of these.

A true positive is an outcome where the model correctly predicts the positive class. A true negative is an outcome where the model correctly predicts the negative class.

A false positive is an outcome where the model incorrectly predicts the positive class. A false negative is an outcome where the model incorrectly predicts the negative class.

In simple words:

predicted = 1, actual = 1 => True Positive
predicted = 0, actual = 0 => True Negative
predicted = 0, actual = 1 => False Negative
predicted = 1, actual = 0 => False Positive

from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import confusion_matrix

Accuracy Score

The accuracy_score method calculates the fraction of the model's predictions that are correct.

Accuracy Score = (TP+TN)/ (TP+FN+TN+FP)

score = accuracy_score(y_V, y_pred)

Precision

Precision is a measure of how many of the positive predictions made are correct (true positives).

Precision Score = (TP)/ (TP+FP)

precision = precision_score(y_V, y_pred)

Recall

Recall is a measure of how many of the positive cases the classifier correctly predicted, out of all the positive cases in the data. It is also known as Sensitivity.

Recall Score = (TP)/ (TP+FN)

recall = recall_score(y_V, y_pred)

F1 Score

The F1 Score is the harmonic mean of Precision and Recall, combining both into a single measure.

F1-Score = 2*(Precision*Recall)/(Precision+Recall)

f1 = f1_score(y_V, y_pred)

Confusion Matrix

For binary classification, the confusion matrix is a 2×2 matrix that holds the four values TP, FP, TN, and FN.

confusion_matrix(y_V, y_pred)

# to visualize, you can use a heatmap
import seaborn as sns
sns.heatmap(confusion_matrix(y_V, y_pred), annot=True)
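
To connect the formulas above with the code, here is a minimal sketch on a tiny hand-made example (not the hr_job data) that pulls TP, FP, TN, and FN out of the confusion matrix and recomputes accuracy, precision, and recall by hand:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat  = np.array([1, 0, 0, 1, 0, 1, 1, 1])

# sklearn lays the binary confusion matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

print((tp + tn) / (tp + tn + fp + fn))   # accuracy  = 0.625
print(tp / (tp + fp))                    # precision = 0.6
print(tp / (tp + fn))                    # recall    = 0.75

# the hand-computed values match sklearn's metric functions
print(accuracy_score(y_true, y_hat), precision_score(y_true, y_hat), recall_score(y_true, y_hat))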

In this article, we discussed only the Accuracy Score, F1 Score, Recall, Precision, and Confusion Matrix. In the coming 100DaysofML articles, we shall also look into the ROC curve, ROC_AUC score, and more.

Application Of Logistic Regression

Logistic Regression is most commonly used to solve classification problem statements, which mainly include:

  • Breast Cancer Prediction
  • Brain Tumor Prediction
  • Fraud Detection
  • Accessory Buy Prediction In Gaming
  • Marketing analysis, such as predicting customer response to a campaign

Kickstart with 100 Days Of Machine Learning Challenge