Ordinary linear regression and its assumptions, weighted linear regression, Lasso regression, Ridge regression and Elastic Net regression
After so many articles about regression on Medium.com, I think it's best to abridge the definition and explanation of regression to avoid repetition. With one predictor, regression fits a line to predict the dependent variable. With two predictors, it fits a plane or a curved surface (when interaction terms are included).
Similarly, the concept extends to datasets with many dimensions.
All machine learning algorithms involve minimizing or maximizing a function, called the objective function. In other words, every machine learning problem or algorithm is an optimisation problem with an objective function and constraints.
In linear regression, the objective function is the mean squared error; in classification and deep learning, the log loss function is commonly used.
Regression equation:
$y = b\,x + b_0$
Here, $y$ is the predicted variable, $b$ is the coefficient (slope) and $b_0$ is the intercept (constant).
The mean squared error is given by
$E = \frac{1}{N} \sum (Y - y)^2$, where $Y$ is the actual value.
Expanding the term by substituting the regression equation, we get
$E = \frac{1}{N} \sum \big(Y - (b\,x + b_0)\big)^2$
Now, to find the $b$ and $b_0$ at which the error is minimum, we differentiate $E$ with respect to $b$ and $b_0$ and equate the derivatives to zero.
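Carrying out that differentiation (a short sketch of the standard derivation; $\bar{x}$ and $\bar{Y}$ denote the sample means):
$\frac{\partial E}{\partial b_0} = -\frac{2}{N}\sum (Y - b\,x - b_0) = 0 \;\Rightarrow\; b_0 = \bar{Y} - b\,\bar{x}$
$\frac{\partial E}{\partial b} = -\frac{2}{N}\sum x\,(Y - b\,x - b_0) = 0 \;\Rightarrow\; b = \frac{\operatorname{cov}(x, Y)}{\operatorname{var}(x)}$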
We get a regression line with slope $b$ that passes through the point $(\bar{x}, \bar{Y})$.
But what exactly is $b$? $b = \operatorname{cov}(x, Y) / \operatorname{var}(x)$.
Compare that with the correlation coefficient, $\operatorname{corr}(x, Y) = \operatorname{cov}(x, Y) / (\operatorname{std}(x)\,\operatorname{std}(Y))$.
Hence $b = \operatorname{corr}(x, Y)\,\operatorname{std}(Y) / \operatorname{std}(x)$: the slope of the line is just the correlation between the independent and dependent variables, rescaled by their standard deviations. Sounds logical, doesn't it?
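A quick numerical check of this identity (a minimal sketch on synthetic data; the variable names are illustrative):
import numpy as np
rng = np.random.RandomState(0)
x_demo = rng.normal(size=200)
Y_demo = 2.5 * x_demo + 1.0 + rng.normal(scale=0.5, size=200)
slope = np.cov(x_demo, Y_demo, ddof=1)[0, 1] / np.var(x_demo, ddof=1)      # b = cov(x, Y) / var(x)
via_corr = np.corrcoef(x_demo, Y_demo)[0, 1] * Y_demo.std(ddof=1) / x_demo.std(ddof=1)
print(slope, via_corr)   # both equal the OLS slope (close to 2.5 here)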
The same definition and derivation can be extended to multiple linear regression.
Assumptions of linear regression (L.I.N.E.):
Linearity: the relationship between predictors and the response should be linear; non-linear data should be modeled with polynomial regression. Additionally, linear regression is highly susceptible to outliers.
Independence (no multicollinearity): multicollinearity is correlation between the independent variables. Methods to detect and avoid this problem are detailed below.
Normally distributed independent variables: independent variables should be normally distributed; if they are not, a log transformation or feature engineering can help.
Equal variance (homoscedasticity): heteroscedasticity can be dealt with by taking a log transformation of features or by using the weighted least squares method.
Assumptions for residuals:
No autocorrelation among the residuals.
Why use shrinkage/regularization factors?
Before we answer that question, we need to discuss the legendary bias vs. variance trade-off in data science. Different people define it differently; I'll try to explain it in two lines. Bias is the difference between the expected (or average) prediction of many models built on different samples of the same population and the correct value (truth) we are trying to predict. Bias error decreases as model complexity increases. Variance error arises from sensitivity to small fluctuations in the training set; high variance can cause an algorithm to model the random noise in the training data rather than the intended outputs. Total error is the sum of these two errors plus the irreducible error. As more and more features are added to a model, bias decreases and variance increases, so it all comes down to trading complexity against variance: decreasing complexity increases bias error but decreases model variance. According to the Gauss-Markov theorem, OLS has the lowest variance (and hence the lowest mean squared error) among linear unbiased estimators, i.e. those with $E(b_i) = \beta_i$. Therefore regularization is used to introduce a little bias and decrease model variance in regression.
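To make the trade-off concrete, here is a minimal simulation sketch on assumed synthetic data (the true function is sin(x) with Gaussian noise; a degree-1 polynomial is the simple, high-bias model and a degree-9 polynomial the complex, high-variance one):
import numpy as np
rng = np.random.RandomState(0)
x_grid = np.linspace(0, 2 * np.pi, 50)
truth = np.sin(x_grid)

def poly_predictions(degree, n_models=200, n_points=30, noise=0.3):
    preds = np.empty((n_models, x_grid.size))
    for m in range(n_models):
        x = rng.uniform(0, 2 * np.pi, n_points)          # fresh training sample
        y_sim = np.sin(x) + rng.normal(0, noise, n_points)
        coefs = np.polyfit(x, y_sim, degree)              # fit a polynomial of the given degree
        preds[m] = np.polyval(coefs, x_grid)              # predict on a fixed grid
    return preds

for degree in (1, 9):
    p = poly_predictions(degree)
    bias2 = np.mean((p.mean(axis=0) - truth) ** 2)        # squared bias of the average prediction
    variance = np.mean(p.var(axis=0))                     # spread of predictions across samples
    print("degree %d: bias^2 = %.3f, variance = %.3f" % (degree, bias2, variance))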
(I am going to avoid the mathematical representations here.) There are three forms of regularization:
L2 - Ridge: a penalty proportional to the squared coefficients is added to the loss, shrinking the regression coefficients and thus introducing bias into the model. Ridge is good for grouped selection of correlated features.
L1 - Lasso: a penalty proportional to the absolute values of the coefficients is added to the loss, shrinking the coefficients and introducing bias. Lasso can drive some coefficients exactly to zero, so it also helps with feature selection. Regularization reduces the variance of the model and improves its prediction accuracy.
Elastic Net: a weighted combination of the L1 and L2 penalties, used later in this notebook.
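For readers who do want the formulas, a brief sketch of the penalized objectives in the notation from above ($\lambda$ is the penalty weight, $\alpha$ the L1/L2 mixing ratio; exact scaling conventions vary between libraries):
Ridge: $\min_b \sum (Y - y)^2 + \lambda \sum_j b_j^2$
Lasso: $\min_b \sum (Y - y)^2 + \lambda \sum_j |b_j|$
Elastic Net: $\min_b \sum (Y - y)^2 + \lambda \big(\alpha \sum_j |b_j| + (1 - \alpha) \sum_j b_j^2\big)$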
As Richard Berk suggests in the book "Regression Analysis: A Constructive Critique", simple linear regression works well as a descriptive tool, whereas regularized regression techniques can be used as predictive tools for more accurate prediction.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
import pandas as pd
import sklearn
import warnings
import os
warnings.filterwarnings('ignore')
%config IPCompleter.greedy=True
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
os.getcwd()
df_raw = pd.read_csv('SwissRailways.csv')
df_raw.head(10)
Description of attributes
Swiss Railways, N = 49, T(i) = 3 to 13
Variables in the file are
ID = Company ID from 1 to 50 (49 companies, 604 observations); ID 21 has zero observations.
RAILROAD = Company ID reduced to remove gap at ID=21, values 1 to 49.
YEAR = Year (1985 to 1997).
NI = Number of year observations for firm, repeated in each observation.
STOPS = Number of stations on the network.
NETWORK = Length of railway network (m).
LABOREXP = Labor expenses in 1000 CHF.
STAFF = Number of employees.
ELECEXP = Electricity expenses in 1000 CHF.
KWH = Total consumed electricity (in 1000 kWh).
TOTCOST = Total cost (in 1000 CHF).
NARROW_T = Dummy for the networks with narrow track (1 m wide).
RACK = Dummy for networks with rack rail (crémaillère) in at least some part (used to maintain slow movement of the train on steep slopes).
TUNNEL = Dummy for networks that have tunnels with an average length of more than 300 meters.
VIRAGE = Dummy for the networks whose minimum radius of curvature is 100 meters or less.
CT = Total costs adjusted for inflation (1000 CHF).
Q1 = Total output in train kilometers.
Q2 = Total passenger output in passenger kilometers.
Q3 = Total goods output in ton kilometers.
PL = Labor price adjusted for inflation (in CHF per person per year).
PK = Capital price using the total number of seats as a proxy for capital stock (CHF per seat).
PE = Price of electricity (CHF per kWh).
LABOR = Quantity of labor.
ELEC = Quantity of energy.
CAPITAL = Quantity of Capital
Capital costs = TOTCOST- (LABOREXP + ELECEXP)
Inflation adjustment is done with respect to 1997 prices.
Logs of costs and prices (lnCT, lnPK, lnPL) are normalized by PE.
LNCT, LNPK, LNPL, LNQ1, LNQ2, LNQ3, LNSTOP, LNCAP, LNNET = logs of variables
Let's simplify the dataset by removing the logs of costs and prices (lnCT, lnPK, lnPL) that are normalized by PE. Also, since labor expenses and electricity expenses are directly related to total cost, the target variable, we remove those features too.
df_raw.columns.values
df=df_raw[['STOPS', 'NETWORK', 'STAFF',
'KWH', 'TOTCOST', 'NARROW_T', 'RACK', 'TUNNEL',
'Q1', 'Q2', 'Q3', 'VIRAGE', 'LABOR', 'ELEC',
'CAPITAL', 'LNQ1', 'LNQ2', 'LNQ3', 'LNNET', 'LNPE',
'LNSTOP', 'LNCAP']]
print(df.columns.values)
df.head(20)
# Let's study the distributions of some variables
sns.distplot(df.TOTCOST )
sns.distplot(df.LABOR)
sns.distplot(df.NETWORK)
sns.distplot(df.ELEC)
sns.distplot(df.Q2)
##### pair plots for correlation analysis
#sns.color_palette("Set1")
sns.pairplot(df[['STOPS', 'NETWORK', 'STAFF',
'KWH', 'TOTCOST',
'Q1', 'Q2', 'Q3', 'LABOR', 'ELEC',
'CAPITAL']])
df.isnull().sum()
#sns.light_palette("navy", reverse=True)
#cmap = sns.cubehelix_palette(light=1, as_cmap=True)
sns.reset_orig()
g = sns.PairGrid(df[['STOPS', 'NETWORK', 'STAFF',
'KWH', 'TOTCOST',
'Q1', 'Q2', 'Q3', 'LABOR', 'ELEC',
'CAPITAL']])
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, n_levels=6);
Since the main intention behind this notebook is to try regression with different cost functions and variable selection methods, let's begin with the modeling without further ado. As discussed in the clustering notebook, the data should be normalized before modelling.
# Subset the predictor attributes for modelling (everything except the target TOTCOST).
df_temp = df.loc[:, df.columns != 'TOTCOST']
df_temp.head()
from sklearn import preprocessing
# Standardize the predictors (zero mean, unit variance); returns a NumPy array.
df_scaled = preprocessing.scale(df_temp)
df_scaled
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from scipy import stats
X=df.loc[:, df.columns != 'TOTCOST']
y=df[['TOTCOST']]
X2 = sm.add_constant(X) #beta0 or intercept to the model
est = sm.OLS(y, X2)
model = est.fit()
print(model.summary())
Now that we have constructed a full model, it's time to build a parsimonious one. To build the simplest model we need to remove the insignificant features (those with p-values greater than 0.05) and rebuild the model.
Also, according to the fundamental assumptions of regression, every predictor variable should be independent of the others. Correlation among the features causes multicollinearity. Why is multicollinearity such a big issue? Glad you asked. A regression coefficient is, by definition, the increase in the response variable when its predictor is increased by one unit while keeping all other variables constant. Keeping all other variables constant is the important part: when the features are correlated, we cannot hold the others constant, which breaks down the interpretation of the regression equation
$y = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_0$
There are several methods to find correlated features; correlation plots and variance inflation factors are a few of them.
But feature selection can also be done by other methods, such as:
Forward selection: start from the null model and move towards the full model by adding one feature at a time, evaluating each model's accuracy statistics (R², AIC and BIC); a minimal sketch follows this list.
Backward elimination: start from the full model and eliminate individual features until the best model is obtained.
Stepwise selection: do both of the above simultaneously, using partial F-tests.
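A minimal forward-selection sketch using AIC with statsmodels (illustrative only, not the exact procedure used below; it assumes the candidate features are the columns of X and that lower AIC is better):
import numpy as np
import statsmodels.api as sm

def forward_select_aic(X_cand, y_target):
    remaining = list(X_cand.columns)
    selected = []
    best_aic = np.inf
    improved = True
    while improved and remaining:
        improved = False
        aics = []
        for col in remaining:
            exog = sm.add_constant(X_cand[selected + [col]])
            aics.append((sm.OLS(y_target, exog).fit().aic, col))
        aic, col = min(aics)            # candidate giving the lowest AIC this round
        if aic < best_aic:              # keep it only if the AIC actually improves
            best_aic, improved = aic, True
            selected.append(col)
            remaining.remove(col)
    return selected, best_aic

# Example usage with the X and y defined above:
# print(forward_select_aic(X, y.TOTCOST))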
By cross-referencing the variance inflation factors with the correlation coefficients, we can identify features that are highly correlated and avoid multicollinearity problems.
from statsmodels.stats.outliers_influence import variance_inflation_factor
# VIF for each predictor; note we index X.columns (not df.columns) so the names line up.
for j in range(X.shape[1]):
    print(X.columns.values[j], variance_inflation_factor(X.values, j))
#cols=[0,1,3,4]
df.corr()
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
model = LinearRegression()
rfe = RFE(model, n_features_to_select=13)  # keep the 13 top-ranked features
fit = rfe.fit(X, y)
y_pred=fit.predict(X)
print("Num Features: %s" % (fit.n_features_))
print("Selected Features: %s" % (fit.support_))
print("Feature Ranking: %s" % (fit.ranking_))
print("Mean squared error: %.2f"
% mean_squared_error(y, y_pred))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, y_pred))
Feature selection didn't give an optimal set of features.
We see that recursive feature elimination doesn't always yield an optimal feature set, and in particular it isn't helpful for rejecting the features that cause multicollinearity. For these reasons, we use the VIFs and filter out the features with high values, cross-referencing against the correlation matrix created above. In this particular case, we find that 'VIRAGE', 'STOPS', 'ELEC' and 'KWH' have high VIFs. Also, all the insignificant features should be removed to get a parsimonious model.
X.corr()
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from scipy import stats
X=df.loc[:, df.columns != 'TOTCOST']
y=df[['TOTCOST']]
X['LNKWH']=np.log(X.KWH)
X['LNLABOR']=np.log(X.LABOR)
X['LNSTAFF']=np.log(X.STAFF)
X2 = sm.add_constant(X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL']]) #beta0 or intercept to the model
est = sm.OLS(y, X2)
model = est.fit()
print(model.summary())
sns.distplot(X.KWH)
X.columns.values
regr = LinearRegression()
# Train the model using the training sets
regr.fit(X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values, y.values)
# Make predictions using the testing set
y_pred = regr.predict(X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values)
from math import sqrt
from sklearn.metrics import mean_squared_error, r2_score
print("Mean squared error: %.2f"
% mean_squared_error(y, y_pred))
print("Root mean squared error: %.2f"
% sqrt(mean_squared_error(y, y_pred)))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, y_pred))
residuals=(y-y_pred)
#residuals
ax=sns.kdeplot(residuals.TOTCOST,shade=True,color="g",legend=False)
ax.set(xlabel='Residuals', ylabel='Density')
plt.show()
ax=sns.residplot(y_pred.flatten(),residuals.TOTCOST.values, lowess=True, color="g")
ax.set(xlabel='Fitted Values', ylabel='Residuals')
plt.show()
from statsmodels.graphics.gofplots import qqplot
qqplot(preprocessing.scale(residuals.TOTCOST), line='s')
plt.show()
There's high heteroscedasticity, partly because of outliers. One way to reduce this is to remove the outliers or model them separately.
The fitted values vs. residuals plot clearly shows heteroskedasticity, which means the variance of the residuals grows as the fitted values (and predictors) grow. Let's confirm this with the Breusch-Pagan test, a test for conditional heteroskedasticity.
from statsmodels.compat import lzip
name = ['Lagrange multiplier statistic', 'p-value',
        'f-value', 'f p-value']
# het_breuschpagan expects the explanatory variables to include a constant term
exog_bp = sm.add_constant(X[['LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values)
test = sm.stats.diagnostic.het_breuschpagan(residuals.TOTCOST.values, exog_bp)
lzip(name, test)
Null hypothesis: the error terms are homoscedastic.
Alternative hypothesis: the error terms are heteroscedastic.
Since the p-value is almost 0, we reject the null hypothesis (we would fail to reject the null only if the p-value were above 0.05).
Therefore the residuals are heteroskedastic.
Hence it is essential to transform the data (or reweight the observations) to reduce the heteroskedasticity.
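While we are running residual diagnostics, the no-autocorrelation assumption from the list above can also be checked with the Durbin-Watson statistic (a minimal sketch; values near 2 indicate little autocorrelation, though with panel data like this the ordering of observations matters):
from statsmodels.stats.stattools import durbin_watson
# Durbin-Watson statistic of the OLS residuals computed above
print('Durbin-Watson:', durbin_watson(residuals.TOTCOST.values))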
import math
trans_y=y.TOTCOST.apply(math.log)
from scipy import stats
import numpy as np
regr.fit(X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values, trans_y)
# Make predictions using the testing set
y_pred = regr.predict(X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values)
ax=sns.residplot(np.exp(y_pred),((np.exp(y_pred))-y.TOTCOST), lowess=True, color="g")
ax.set(xlabel='Fitted Values', ylabel='Residuals')
plt.show()
Heteroscedasticity is reduced to some extent.
print("Mean squared error: %.2f"
% mean_squared_error(y,np.exp(y_pred)))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, np.exp(y_pred)))
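As mentioned in the assumptions section, weighted least squares is another way to handle heteroskedasticity. A minimal sketch with statsmodels WLS, assuming weights inversely proportional to the squared OLS fitted values (one common choice, not the only one):
import statsmodels.api as sm
feats = ['LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']
X_wls = sm.add_constant(X[feats])
ols_fit = sm.OLS(y.TOTCOST, X_wls).fit()
weights = 1.0 / (ols_fit.fittedvalues ** 2)   # down-weight observations with large fitted values
wls_fit = sm.WLS(y.TOTCOST, X_wls, weights=weights).fit()
print(wls_fit.summary())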
# Cross-validation using repeated random train/test splits (ShuffleSplit)
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import cross_val_score
regr = LinearRegression()
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
scores = cross_val_score(regr, X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values, y, scoring='neg_mean_squared_error', cv=cv)
print("regression model's mean squared errors of cross validation",-scores.mean())
from sklearn.linear_model import Ridge
#Ridge regression model
ridgeReg = Ridge(alpha=0.005, normalize=True)
ridgeReg.fit( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']],y)
y_ridge = ridgeReg.predict( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']])
print("Mean squared error: %.2f"
% mean_squared_error(y,(y_ridge)))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, (y_ridge)))
n=0
alpha=[1,.7,.5,.3,.1,.07,.05,.01,.007,.005,.001,.0005]
scores=[0]*len(alpha)
for i in alpha:
    ridgeReg = Ridge(alpha=i, normalize=True)
    cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
    score = cross_val_score(ridgeReg, X[['LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']].values, y, scoring='neg_mean_squared_error', cv=cv)
    scores[n]=score.mean()
    print("mean of ridge regression model's mean squared errors of cross validation for alpha",i,"is",-score.mean())
    n+=1
from sklearn.linear_model import Lasso
#Lasso regression model
lassoReg = Lasso(alpha=0.1, normalize=True)
lassoReg.fit( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']],y)
y_lasso = lassoReg.predict( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']])
print("Mean squared error: %.2f"
% mean_squared_error(y,(y_lasso)))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, (y_lasso)))
n=0
alpha=[10,5,4,3,1,.7,.5,.3,.1,.07,.05,.01,.007,.005,.001,.0005]
scores=[0]*len(alpha)   # size the list from alpha to avoid an index mismatch
for i in alpha:
    lassoReg = Lasso(alpha=i, normalize=True)
    cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
    score = cross_val_score(lassoReg, X[['LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']], y, scoring='neg_mean_squared_error', cv=cv)
    scores[n]=score.mean()
    print("mean of Lasso regression model's mean squared errors of cross validation for alpha",i,"is",-score.mean())
    n+=1
from sklearn.linear_model import ElasticNet
#Elastic Net regression model
elasReg = ElasticNet(alpha=0.01,l1_ratio=0.5, normalize=True)
elasReg.fit( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']],y)
y_elas = elasReg.predict( X[[ 'LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']])
print("Mean squared error: %.2f"
% mean_squared_error(y,(y_elas)))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % r2_score(y, (y_elas)))
n=0
alpha=[10,5,4,3,1,.7,.5,.3,.1,.07,.05,.01,.007,.005,.001,.0005,0.0001,0.00007]
scores=[0]*len(alpha)   # size the list from alpha to avoid an index mismatch
for i in alpha:
    elasReg = ElasticNet(alpha=i, normalize=True, l1_ratio=0.9)
    cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
    score = cross_val_score(elasReg, X[['LNKWH','LNQ2','LNQ3','STOPS','LNSTAFF','LNNET','ELEC','TUNNEL','KWH']], y, scoring='neg_mean_squared_error', cv=cv)
    scores[n]=score.mean()
    print("mean of Elastic Net regression model's mean squared errors of cross validation for alpha",i,"is",-score.mean())
    n+=1
Comparing the cross-validated mean squared errors of Lasso, Ridge and plain regression, we find that Lasso and Ridge achieve lower mean squared error than simple linear regression.