Weighted Least SquaresΒΆ

Link to Notebook GitHub

In [1]:
from __future__ import print_function
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std
from statsmodels.iolib.table import (SimpleTable, default_txt_fmt)
np.random.seed(1024)

WLS Estimation

Artificial data: Heteroscedasticity 2 groups

Model assumptions:

  • Misspecification: true model is quadratic, estimate only linear
  • Independent noise/error term
  • Two groups for error variance, low and high variance groups
In [2]:
nsample = 50
x = np.linspace(0, 20, nsample)
X = np.column_stack((x, (x - 5)**2))
X = sm.add_constant(X)
beta = [5., 0.5, -0.01]
sig = 0.5
w = np.ones(nsample)
w[nsample * 6/10:] = 3
y_true = np.dot(X, beta)
e = np.random.normal(size=nsample)
y = y_true + sig * w * e
X = X[:,[0,1]]

WLS knowing the true variance ratio of heteroscedasticity

In [3]:
mod_wls = sm.WLS(y, X, weights=1./w)
res_wls = mod_wls.fit()
print(res_wls.summary())
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.910
Model:                            WLS   Adj. R-squared:                  0.909
Method:                 Least Squares   F-statistic:                     487.9
Date:                Wed, 27 Apr 2016   Prob (F-statistic):           8.52e-27
Time:                        01:52:23   Log-Likelihood:                -57.048
No. Observations:                  50   AIC:                             118.1
Df Residuals:                      48   BIC:                             121.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          5.2726      0.185     28.488      0.000         4.900     5.645
x1             0.4379      0.020     22.088      0.000         0.398     0.478
==============================================================================
Omnibus:                        5.040   Durbin-Watson:                   2.242
Prob(Omnibus):                  0.080   Jarque-Bera (JB):                6.431
Skew:                           0.024   Prob(JB):                       0.0401
Kurtosis:                       4.756   Cond. No.                         17.0
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

OLS vs. WLS

Estimate an OLS model for comparison:

In [4]:
res_ols = sm.OLS(y, X).fit()
print(res_ols.params)
print(res_wls.params)
[ 5.2426  0.4349]
[ 5.2726  0.4379]

Compare the WLS standard errors to heteroscedasticity corrected OLS standard errors:

In [5]:
se = np.vstack([[res_wls.bse], [res_ols.bse], [res_ols.HC0_se],
                [res_ols.HC1_se], [res_ols.HC2_se], [res_ols.HC3_se]])
se = np.round(se,4)
colnames = ['x1', 'const']
rownames = ['WLS', 'OLS', 'OLS_HC0', 'OLS_HC1', 'OLS_HC3', 'OLS_HC3']
tabl = SimpleTable(se, colnames, rownames, txt_fmt=default_txt_fmt)
print(tabl)
=====================
          x1   const
---------------------
WLS     0.1851 0.0198
OLS     0.2707 0.0233
OLS_HC0 0.194  0.0281
OLS_HC1 0.198  0.0287
OLS_HC3 0.2003 0.029
OLS_HC3 0.207   0.03
---------------------

Calculate OLS prediction interval:

In [6]:
covb = res_ols.cov_params()
prediction_var = res_ols.mse_resid + (X * np.dot(covb,X.T).T).sum(1)
prediction_std = np.sqrt(prediction_var)
tppf = stats.t.ppf(0.975, res_ols.df_resid)
In [7]:
prstd_ols, iv_l_ols, iv_u_ols = wls_prediction_std(res_ols)

Draw a plot to compare predicted values in WLS and OLS:

In [8]:
prstd, iv_l, iv_u = wls_prediction_std(res_wls)

fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'o', label="Data")
ax.plot(x, y_true, 'b-', label="True")
# OLS
ax.plot(x, res_ols.fittedvalues, 'r--')
ax.plot(x, iv_u_ols, 'r--', label="OLS")
ax.plot(x, iv_l_ols, 'r--')
# WLS
ax.plot(x, res_wls.fittedvalues, 'g--.')
ax.plot(x, iv_u, 'g--', label="WLS")
ax.plot(x, iv_l, 'g--')
ax.legend(loc="best");
Error in callback <function post_execute at 0x7f89c987e578> (for post_execute):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in post_execute()
    145             def post_execute():
    146                 if matplotlib.is_interactive():
--> 147                     draw_all()
    148 
    149             # IPython >= 2

/usr/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.pyc in draw_all(cls, force)
    148         for f_mgr in cls.get_all_fig_managers():
    149             if force or f_mgr.canvas.figure.stale:
--> 150                 f_mgr.canvas.draw_idle()
    151 
    152 atexit.register(Gcf.destroy_all)

/usr/lib/python2.7/dist-packages/matplotlib/backend_bases.pyc in draw_idle(self, *args, **kwargs)
   2024         if not self._is_idle_drawing:
   2025             with self._idle_draw_cntx():
-> 2026                 self.draw(*args, **kwargs)
   2027 
   2028     def draw_cursor(self, event):

/usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in draw(self)
    472 
    473         try:
--> 474             self.figure.draw(self.renderer)
    475         finally:
    476             RendererAgg.lock.release()

/usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     59     def draw_wrapper(artist, renderer, *args, **kwargs):
     60         before(artist, renderer)
---> 61         draw(artist, renderer, *args, **kwargs)
     62         after(artist, renderer)
     63 

/usr/lib/python2.7/dist-packages/matplotlib/figure.pyc in draw(self, renderer)
   1157         dsu.sort(key=itemgetter(0))
   1158         for zorder, a, func, args in dsu:
-> 1159             func(*args)
   1160 
   1161         renderer.close_group('figure')

/usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     59     def draw_wrapper(artist, renderer, *args, **kwargs):
     60         before(artist, renderer)
---> 61         draw(artist, renderer, *args, **kwargs)
     62         after(artist, renderer)
     63 

/usr/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe)
   2322 
   2323         for zorder, a in dsu:
-> 2324             a.draw(renderer)
   2325 
   2326         renderer.close_group('axes')

/usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs)
     59     def draw_wrapper(artist, renderer, *args, **kwargs):
     60         before(artist, renderer)
---> 61         draw(artist, renderer, *args, **kwargs)
     62         after(artist, renderer)
     63 

/usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in draw(self, renderer, *args, **kwargs)
   1106         ticks_to_draw = self._update_ticks(renderer)
   1107         ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(ticks_to_draw,
-> 1108                                                                 renderer)
   1109 
   1110         for tick in ticks_to_draw:

/usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in _get_tick_bboxes(self, ticks, renderer)
   1056         for tick in ticks:
   1057             if tick.label1On and tick.label1.get_visible():
-> 1058                 extent = tick.label1.get_window_extent(renderer)
   1059                 ticklabelBoxes.append(extent)
   1060             if tick.label2On and tick.label2.get_visible():

/usr/lib/python2.7/dist-packages/matplotlib/text.pyc in get_window_extent(self, renderer, dpi)
    959             raise RuntimeError('Cannot get window extent w/o renderer')
    960 
--> 961         bbox, info, descent = self._get_layout(self._renderer)
    962         x, y = self.get_unitless_position()
    963         x, y = self.get_transform().transform_point((x, y))

/usr/lib/python2.7/dist-packages/matplotlib/text.pyc in _get_layout(self, renderer)
    350         tmp, lp_h, lp_bl = renderer.get_text_width_height_descent('lp',
    351                                                          self._fontproperties,
--> 352                                                          ismath=False)
    353         offsety = (lp_h - lp_bl) * self._linespacing
    354 

/usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in get_text_width_height_descent(self, s, prop, ismath)
    227             fontsize = prop.get_size_in_points()
    228             w, h, d = texmanager.get_text_width_height_descent(s, fontsize,
--> 229                                                                renderer=self)
    230             return w, h, d
    231 

/usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in get_text_width_height_descent(self, tex, fontsize, renderer)
    673         else:
    674             # use dviread. It sometimes returns a wrong descent.
--> 675             dvifile = self.make_dvi(tex, fontsize)
    676             dvi = dviread.Dvi(dvifile, 72 * dpi_fraction)
    677             try:

/usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize)
    420                      'string:\n%s\nHere is the full report generated by '
    421                      'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) +
--> 422                      report))
    423             else:
    424                 mpl.verbose.report(report, 'debug')

RuntimeError: LaTeX was not able to process the following string:
'lp'
Here is the full report generated by LaTeX:

<matplotlib.figure.Figure at 0x7f89bd291750>

Feasible Weighted Least Squares (2-stage FWLS)

In [9]:
resid1 = res_ols.resid[w==1.]
var1 = resid1.var(ddof=int(res_ols.df_model)+1)
resid2 = res_ols.resid[w!=1.]
var2 = resid2.var(ddof=int(res_ols.df_model)+1)
w_est = w.copy()
w_est[w!=1.] = np.sqrt(var2) / np.sqrt(var1)
res_fwls = sm.WLS(y, X, 1./w_est).fit()
print(res_fwls.summary())
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.914
Model:                            WLS   Adj. R-squared:                  0.912
Method:                 Least Squares   F-statistic:                     507.1
Date:                Wed, 27 Apr 2016   Prob (F-statistic):           3.65e-27
Time:                        01:52:24   Log-Likelihood:                -55.777
No. Observations:                  50   AIC:                             115.6
Df Residuals:                      48   BIC:                             119.4
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          5.2710      0.177     29.828      0.000         4.916     5.626
x1             0.4390      0.019     22.520      0.000         0.400     0.478
==============================================================================
Omnibus:                        4.076   Durbin-Watson:                   2.251
Prob(Omnibus):                  0.130   Jarque-Bera (JB):                4.336
Skew:                           0.003   Prob(JB):                        0.114
Kurtosis:                       4.443   Cond. No.                         16.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.