Thursday, March 8, 2018

Logistic Regression: A Classical Linear Machine Learning Algorithm for Binary Classification

Logistic Regression is one of the most basic machine learning classification algorithms. It is fast and easy to understand, but it can only separate two classes. For multi-class problems, we may instead need Softmax Regression, a generalized form of Logistic Regression, or Factorization Machines; the growing use of SVM, ANN, and Decision Tree methods can also be highly powerful.
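For reference, Softmax Regression replaces the sigmoid with the softmax function, keeping one weight vector $w_k$ per class $k$ (this is the standard textbook form, not anything specific to this post):

$$P(y = k \mid x) = \frac{e^{w_k^\top x}}{\sum_{j=1}^{K} e^{w_j^\top x}}$$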

However, in this blog post I will focus only on Logistic Regression, briefly summarizing its essential steps and giving a simple example. The dataset is available here; it was extracted from https://catalog.data.gov/dataset/state-energy-data-system-seds-complete-dataset-through-2009#sec-dates.

I. Introduction to the Dataset

The dataset describes the energy profile of all U.S. states from 1960 to 2009. The meaning of each attribute is given in the metadata posted on the website. The file 'energy_data.csv' contains exactly the same data, preprocessed into panel form.

Suppose we would like to teach our computer to distinguish the economic and energy development status of different states. To make the problem solvable by Logistic Regression, we consider only two states: Hawaii ('HI') and New York ('NY'). To keep the results easy to visualize, we include only two attributes as explanatory variables: Total Energy Total Consumption ('TETCB', in Billion Btu) and Real GDP Value ('GDPRV', in Million Dollars).

First, we can take a quick look at scatter plots of the two states with the following code:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Load the preprocessed panel dataset and keep only the columns we need
df = pd.read_csv('energy_data.csv')
df1 = df[['StateCode', 'Year', 'GDPRV', 'TETCB']].dropna()

# Restrict the data to two states so the problem is binary
state1 = 'HI'
state2 = 'NY'
fullset = df1[(df1['StateCode'] == state1) | (df1['StateCode'] == state2)]

# Scatter plot of the two states in the (TETCB, GDPRV) plane
plt.figure(dpi=150)
plt.title(state1 + ' & ' + state2)
plt.xlabel('Total Energy Total Consumption (Billion Btu)')
plt.ylabel('Real GDP Value (Million Dollars)')
plt.plot(fullset[fullset.StateCode == state1].TETCB,
         fullset[fullset.StateCode == state1].GDPRV,
         ls='', marker='.', color='blue')
plt.plot(fullset[fullset.StateCode == state2].TETCB,
         fullset[fullset.StateCode == state2].GDPRV,
         ls='', marker='.', color='red')
plt.legend([state1, state2])
plt.savefig(state1 + '_' + state2 + '_scatter.png')


Because our setting is simplified, we can easily see the difference between the two clusters; for example, a point at (600000, 40000) clearly belongs to 'NY'. Now we can test whether the machine learning algorithm can do the same.

II. Steps and Code

Logistic Regression uses the sigmoid function to map a linear combination of the attributes to a probability, which conveniently makes the loss function convex, and then applies maximum likelihood estimation: maximizing the likelihood of the labels is equivalent to minimizing the loss between the estimates and the real values.
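Concretely, folding the intercept into the weight vector $w$ and writing $h_i = \sigma(w^\top x_i)$, the pieces implemented below are:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \ell(w) = \sum_i \big[ y_i \log h_i + (1 - y_i) \log(1 - h_i) \big], \qquad w \leftarrow w + \alpha\, X^\top (y - h)$$

Maximizing $\ell(w)$ is the same as minimizing the convex log-loss $-\ell(w)$, and the update on the right is exactly the line w = w + alpha * feature.T * err in the code below.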

Instead of looking for an analytical solution, we start from a random initial hyperplane and use gradient descent to move it toward the optimum. After thousands of iterations, the weights should steadily converge to constant values, which define the final classification hyperplane in matrix form.

The following code implements these steps:

def sig(x):
    # Sigmoid function: maps any real number into (0, 1)
    return 1.0 / (1 + np.exp(-x))

def lr_train_bgd(feature, label, maxCycle, alpha):
    # Train Logistic Regression with batch gradient descent.
    # feature: m x n matrix of samples (first column is all ones),
    # label: m x 1 matrix of 0/1 labels,
    # maxCycle: number of iterations, alpha: learning rate.
    n = np.shape(feature)[1]
    w = np.mat(np.random.rand(n, 1))    # random initial weights
    i = 0
    while i < maxCycle:
        i += 1
        h = sig(feature * w)            # predicted probabilities
        err = label - h                 # residuals between labels and predictions
        if i % (maxCycle // 10) == 0:   # report progress ten times during training
            print("\t---------iter=" + str(i) +
                  " , sum of err= " + str(np.sum(np.abs(err))))
            print('\t-------------weights=', end=' ')
            print(w)
        w = w + alpha * feature.T * err  # gradient step on the log-likelihood
    return w

# The functions sig(*) and lr_train_bgd(*) are adapted from https://github.com/zhaozhiyong19890102/Python-Machine-Learning-Algorithm/blob/master/Chapter_1%20Logistic%20Regression/lr_test.py

# Prepend a column of ones so that w[0] acts as the intercept
ones = np.ones((len(fullset), 1))
feature = np.mat(np.column_stack((ones, fullset[['TETCB', 'GDPRV']])))
# Encode the two states as 0/1 labels
label = []
for s in fullset.StateCode:
    if s == state1:
        label.append(0.0)
    elif s == state2:
        label.append(1.0)
label = np.mat(label).T
weights = lr_train_bgd(feature, label, 10000, 0.1)

plt.figure(dpi=150)
plt.title(state1 + ' & ' + state2 + ' without Standardization')
plt.xlabel('Total Energy Total Consumption (Billion Btu)')
plt.ylabel('Real GDP Value (Million Dollars)')
plt.plot(fullset[fullset.StateCode == state1].TETCB,
         fullset[fullset.StateCode == state1].GDPRV,
         ls='', marker='.', color='blue')
plt.plot(fullset[fullset.StateCode == state2].TETCB,
         fullset[fullset.StateCode == state2].GDPRV,
         ls='', marker='.', color='red')
# Decision boundary: w0 + w1*x + w2*y = 0, i.e. y = -(w0 + w1*x) / w2
x = np.linspace(200000, 700000)
y = (- weights[0] / weights[2] - weights[1] * x / weights[2]).T
plt.plot(x, y)
plt.legend([state1, state2])
plt.savefig(state1 + '_' + state2 + '_LogisticRegression_noStandard.png')


The solid line in the plot is the classification line: points on one side are classified as 'HI' and on the other side as 'NY'. Obviously, the result is not satisfactory. I asked a friend who works in computer science for his code and got nearly the same result. At that point I was upset enough to suspect that the algorithm itself was unreliable.

III. The Importance of Standardizing Attributes

Then it suddenly hit me: I had forgotten to standardize my explanatory attributes! Because Logistic Regression uses gradient descent to improve the classification hyperplane, leaving the attributes on very different scales turns the loss surface into an extremely elongated ellipsoid, so any fixed learning rate is either too large for the iterations to converge or too small for them to make progress at a reasonable speed.
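The standard remedy is min-max scaling, which maps each attribute into $[0, 1]$:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$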

We change the code slightly by adding a min-max scaling loop (the new lines marked below) right after constructing fullset:

# ...
state1 = 'NY'
state2 = 'HI'

fullset = df1[(df1['StateCode'] == state1) | (df1['StateCode'] == state2)].copy()
# NEW: min-max scaling of both attributes into [0, 1]
for col in ('GDPRV', 'TETCB'):
    fullset[col] = (fullset[col] - np.min(fullset[col])) / \
                   (np.max(fullset[col]) - np.min(fullset[col]))

def sig(x):
    return 1.0 / (1 + np.exp(-x))
# ...

We then obtain a satisfactory classification result:

[Plot: 'NY' & 'HI' with standardized attributes; the classification line now separates the two clusters]

Therefore, we can use this code to solve binary classification problems.
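As a minimal usage sketch, assuming the weights trained on the standardized data above, and with tetcb_min, tetcb_max, gdprv_min, and gdprv_max as hypothetical names for the stored min-max constants of the training data, a new observation could be classified like this:

def predict(tetcb, gdprv):
    # Scale the new point with the SAME min/max constants used in training
    # (tetcb_min, tetcb_max, gdprv_min, gdprv_max are hypothetical stored values)
    x = np.mat([[1.0,
                 (tetcb - tetcb_min) / (tetcb_max - tetcb_min),
                 (gdprv - gdprv_min) / (gdprv_max - gdprv_min)]])
    p = sig(x * weights)[0, 0]   # estimated probability of belonging to state2
    return state2 if p > 0.5 else state1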

IV. Available Source Code

I have also tried a simple test without standardizing the attributes (but with attributes that are already roughly on the same scale), and I provide all the code:

Part I: Introduction to dataset
Part II: Logistic Regression without Standardization
Part III: Logistic Regression with Manual Dataset
Part IV: Logistic Regression with Standardization

Reference:

Zhiyong Zhao, Python Machine Learning Algorithms (《Python机器学习算法》, 赵志勇); the essential code is from his GitHub repository referenced above.
