Generalized Linear ModelsΒΆ
from __future__ import print_function
import numpy as np
import statsmodels.api as sm
from scipy import stats
from matplotlib import pyplot as plt
GLM: Binomial response data
Load data
In this example, we use the Star98 dataset which was taken with permission from Jeff Gill (2000) Generalized linear models: A unified approach. Codebook information can be obtained by typing:
print(sm.datasets.star98.NOTE)
:: Number of Observations - 303 (counties in California). Number of Variables - 13 and 8 interaction terms. Definition of variables names:: NABOVE - Total number of students above the national median for the math section. NBELOW - Total number of students below the national median for the math section. LOWINC - Percentage of low income students PERASIAN - Percentage of Asian student PERBLACK - Percentage of black students PERHISP - Percentage of Hispanic students PERMINTE - Percentage of minority teachers AVYRSEXP - Sum of teachers' years in educational service divided by the number of teachers. AVSALK - Total salary budget including benefits divided by the number of full-time teachers (in thousands) PERSPENK - Per-pupil spending (in thousands) PTRATIO - Pupil-teacher ratio. PCTAF - Percentage of students taking UC/CSU prep courses PCTCHRT - Percentage of charter schools PCTYRRND - Percentage of year-round schools The below variables are interaction terms of the variables defined above. PERMINTE_AVYRSEXP PEMINTE_AVSAL AVYRSEXP_AVSAL PERSPEN_PTRATIO PERSPEN_PCTAF PTRATIO_PCTAF PERMINTE_AVTRSEXP_AVSAL PERSPEN_PTRATIO_PCTAF
Load the data and add a constant to the exogenous (independent) variables:
data = sm.datasets.star98.load()
data.exog = sm.add_constant(data.exog, prepend=False)
The dependent variable is N by 2 (Success: NABOVE, Failure: NBELOW):
print(data.endog[:5,:])
[[ 452. 355.] [ 144. 40.] [ 337. 234.] [ 395. 178.] [ 8. 57.]]
The independent variables include all the other variables described above, as well as the interaction terms:
print(data.exog[:2,:])
[[ 3.43973000e+01 2.32993000e+01 1.42352800e+01 1.14111200e+01 1.59183700e+01 1.47064600e+01 5.91573200e+01 4.44520700e+00 2.17102500e+01 5.70327600e+01 0.00000000e+00 2.22222200e+01 2.34102872e+02 9.41688110e+02 8.69994800e+02 9.65065600e+01 2.53522420e+02 1.23819550e+03 1.38488985e+04 5.50403520e+03 1.00000000e+00] [ 1.73650700e+01 2.93283800e+01 8.23489700e+00 9.31488400e+00 1.36363600e+01 1.60832400e+01 5.95039700e+01 5.26759800e+00 2.04427800e+01 6.46226400e+01 0.00000000e+00 0.00000000e+00 2.19316851e+02 8.11417560e+02 9.57016600e+02 1.07684350e+02 3.40406090e+02 1.32106640e+03 1.30502233e+04 6.95884680e+03 1.00000000e+00]]
Fit and summary
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())
res = glm_binom.fit()
print(res.summary())
Generalized Linear Model Regression Results ============================================================================== Dep. Variable: ['y1', 'y2'] No. Observations: 303 Model: GLM Df Residuals: 282 Model Family: Binomial Df Model: 20 Link Function: logit Scale: 1.0 Method: IRLS Log-Likelihood: -2998.6 Date: Wed, 27 Apr 2016 Deviance: 4078.8 Time: 23:33:06 Pearson chi2: 4.05e+03 No. Iterations: 7 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ x1 -0.0168 0.000 -38.749 0.000 -0.018 -0.016 x2 0.0099 0.001 16.505 0.000 0.009 0.011 x3 -0.0187 0.001 -25.182 0.000 -0.020 -0.017 x4 -0.0142 0.000 -32.818 0.000 -0.015 -0.013 x5 0.2545 0.030 8.498 0.000 0.196 0.313 x6 0.2407 0.057 4.212 0.000 0.129 0.353 x7 0.0804 0.014 5.775 0.000 0.053 0.108 x8 -1.9522 0.317 -6.162 0.000 -2.573 -1.331 x9 -0.3341 0.061 -5.453 0.000 -0.454 -0.214 x10 -0.1690 0.033 -5.169 0.000 -0.233 -0.105 x11 0.0049 0.001 3.921 0.000 0.002 0.007 x12 -0.0036 0.000 -15.878 0.000 -0.004 -0.003 x13 -0.0141 0.002 -7.391 0.000 -0.018 -0.010 x14 -0.0040 0.000 -8.450 0.000 -0.005 -0.003 x15 -0.0039 0.001 -4.059 0.000 -0.006 -0.002 x16 0.0917 0.015 6.321 0.000 0.063 0.120 x17 0.0490 0.007 6.574 0.000 0.034 0.064 x18 0.0080 0.001 5.362 0.000 0.005 0.011 x19 0.0002 2.99e-05 7.428 0.000 0.000 0.000 x20 -0.0022 0.000 -6.445 0.000 -0.003 -0.002 const 2.9589 1.547 1.913 0.056 -0.073 5.990 ==============================================================================
Quantities of interest
print('Total number of trials:', data.endog[0].sum())
print('Parameters: ', res.params)
print('T-values: ', res.tvalues)
Total number of trials: 807.0 Parameters: [ -1.68150366e-02 9.92547661e-03 -1.87242148e-02 -1.42385609e-02 2.54487173e-01 2.40693664e-01 8.04086739e-02 -1.95216050e+00 -3.34086475e-01 -1.69022168e-01 4.91670212e-03 -3.57996435e-03 -1.40765648e-02 -4.00499176e-03 -3.90639579e-03 9.17143006e-02 4.89898381e-02 8.04073890e-03 2.22009503e-04 -2.24924861e-03 2.95887793e+00] T-values: [-38.74908321 16.50473627 -25.1821894 -32.81791308 8.49827113 4.21247925 5.7749976 -6.16191078 -5.45321673 -5.16865445 3.92119964 -15.87825999 -7.39093058 -8.44963886 -4.05916246 6.3210987 6.57434662 5.36229044 7.42806363 -6.44513698 1.91301155]
First differences: We hold all explanatory variables constant at their means and manipulate the percentage of low income households to assess its impact on the response variables:
means = data.exog.mean(axis=0)
means25 = means.copy()
means25[0] = stats.scoreatpercentile(data.exog[:,0], 25)
means75 = means.copy()
means75[0] = lowinc_75per = stats.scoreatpercentile(data.exog[:,0], 75)
resp_25 = res.predict(means25)
resp_75 = res.predict(means75)
diff = resp_75 - resp_25
The interquartile first difference for the percentage of low income households in a school district is:
print("%2.4f%%" % (diff*100))
-11.8753%
Plots
We extract information that will be used to draw some interesting plots:
nobs = res.nobs
y = data.endog[:,0]/data.endog.sum(1)
yhat = res.mu
Plot yhat vs y:
from statsmodels.graphics.api import abline_plot
fig, ax = plt.subplots()
ax.scatter(yhat, y)
line_fit = sm.OLS(y, sm.add_constant(yhat, prepend=True)).fit()
abline_plot(model_results=line_fit, ax=ax)
ax.set_title('Model Fit Plot')
ax.set_ylabel('Observed values')
ax.set_xlabel('Fitted values');
Error in callback <function post_execute at 0x7fb786441500> (for post_execute):
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /usr/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in post_execute() 145 def post_execute(): 146 if matplotlib.is_interactive(): --> 147 draw_all() 148 149 # IPython >= 2 /usr/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.pyc in draw_all(cls, force) 148 for f_mgr in cls.get_all_fig_managers(): 149 if force or f_mgr.canvas.figure.stale: --> 150 f_mgr.canvas.draw_idle() 151 152 atexit.register(Gcf.destroy_all) /usr/lib/python2.7/dist-packages/matplotlib/backend_bases.pyc in draw_idle(self, *args, **kwargs) 2024 if not self._is_idle_drawing: 2025 with self._idle_draw_cntx(): -> 2026 self.draw(*args, **kwargs) 2027 2028 def draw_cursor(self, event): /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in draw(self) 472 473 try: --> 474 self.figure.draw(self.renderer) 475 finally: 476 RendererAgg.lock.release() /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/figure.pyc in draw(self, renderer) 1157 dsu.sort(key=itemgetter(0)) 1158 for zorder, a, func, args in dsu: -> 1159 func(*args) 1160 1161 renderer.close_group('figure') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe) 2322 2323 for zorder, a in dsu: -> 2324 a.draw(renderer) 2325 2326 renderer.close_group('axes') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in draw(self, renderer, *args, **kwargs) 1106 ticks_to_draw = self._update_ticks(renderer) 1107 ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(ticks_to_draw, -> 1108 renderer) 1109 1110 for tick in ticks_to_draw: /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in _get_tick_bboxes(self, ticks, renderer) 1056 for tick in ticks: 1057 if tick.label1On and tick.label1.get_visible(): -> 1058 extent = tick.label1.get_window_extent(renderer) 1059 ticklabelBoxes.append(extent) 1060 if tick.label2On and tick.label2.get_visible(): /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in get_window_extent(self, renderer, dpi) 959 raise RuntimeError('Cannot get window extent w/o renderer') 960 --> 961 bbox, info, descent = self._get_layout(self._renderer) 962 x, y = self.get_unitless_position() 963 x, y = self.get_transform().transform_point((x, y)) /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in _get_layout(self, renderer) 350 tmp, lp_h, lp_bl = renderer.get_text_width_height_descent('lp', 351 self._fontproperties, --> 352 ismath=False) 353 offsety = (lp_h - lp_bl) * self._linespacing 354 /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in get_text_width_height_descent(self, s, prop, ismath) 227 fontsize = prop.get_size_in_points() 228 w, h, d = texmanager.get_text_width_height_descent(s, fontsize, --> 229 renderer=self) 230 return w, h, d 231 /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in get_text_width_height_descent(self, tex, fontsize, renderer) 673 else: 674 # use dviread. It sometimes returns a wrong descent. --> 675 dvifile = self.make_dvi(tex, fontsize) 676 dvi = dviread.Dvi(dvifile, 72 * dpi_fraction) 677 try: /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize) 420 'string:\n%s\nHere is the full report generated by ' 421 'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) + --> 422 report)) 423 else: 424 mpl.verbose.report(report, 'debug') RuntimeError: LaTeX was not able to process the following string: 'lp' Here is the full report generated by LaTeX:
<matplotlib.figure.Figure at 0x7fb7736e1b50>
Plot yhat vs. Pearson residuals:
fig, ax = plt.subplots()
ax.scatter(yhat, res.resid_pearson)
ax.hlines(0, 0, 1)
ax.set_xlim(0, 1)
ax.set_title('Residual Dependence Plot')
ax.set_ylabel('Pearson Residuals')
ax.set_xlabel('Fitted values')
<matplotlib.text.Text at 0x7fb773d27cd0>
Error in callback <function post_execute at 0x7fb786441500> (for post_execute):
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /usr/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in post_execute() 145 def post_execute(): 146 if matplotlib.is_interactive(): --> 147 draw_all() 148 149 # IPython >= 2 /usr/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.pyc in draw_all(cls, force) 148 for f_mgr in cls.get_all_fig_managers(): 149 if force or f_mgr.canvas.figure.stale: --> 150 f_mgr.canvas.draw_idle() 151 152 atexit.register(Gcf.destroy_all) /usr/lib/python2.7/dist-packages/matplotlib/backend_bases.pyc in draw_idle(self, *args, **kwargs) 2024 if not self._is_idle_drawing: 2025 with self._idle_draw_cntx(): -> 2026 self.draw(*args, **kwargs) 2027 2028 def draw_cursor(self, event): /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in draw(self) 472 473 try: --> 474 self.figure.draw(self.renderer) 475 finally: 476 RendererAgg.lock.release() /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/figure.pyc in draw(self, renderer) 1157 dsu.sort(key=itemgetter(0)) 1158 for zorder, a, func, args in dsu: -> 1159 func(*args) 1160 1161 renderer.close_group('figure') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe) 2322 2323 for zorder, a in dsu: -> 2324 a.draw(renderer) 2325 2326 renderer.close_group('axes') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in draw(self, renderer, *args, **kwargs) 1106 ticks_to_draw = self._update_ticks(renderer) 1107 ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(ticks_to_draw, -> 1108 renderer) 1109 1110 for tick in ticks_to_draw: /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in _get_tick_bboxes(self, ticks, renderer) 1056 for tick in ticks: 1057 if tick.label1On and tick.label1.get_visible(): -> 1058 extent = tick.label1.get_window_extent(renderer) 1059 ticklabelBoxes.append(extent) 1060 if tick.label2On and tick.label2.get_visible(): /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in get_window_extent(self, renderer, dpi) 959 raise RuntimeError('Cannot get window extent w/o renderer') 960 --> 961 bbox, info, descent = self._get_layout(self._renderer) 962 x, y = self.get_unitless_position() 963 x, y = self.get_transform().transform_point((x, y)) /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in _get_layout(self, renderer) 350 tmp, lp_h, lp_bl = renderer.get_text_width_height_descent('lp', 351 self._fontproperties, --> 352 ismath=False) 353 offsety = (lp_h - lp_bl) * self._linespacing 354 /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in get_text_width_height_descent(self, s, prop, ismath) 227 fontsize = prop.get_size_in_points() 228 w, h, d = texmanager.get_text_width_height_descent(s, fontsize, --> 229 renderer=self) 230 return w, h, d 231 /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in get_text_width_height_descent(self, tex, fontsize, renderer) 673 else: 674 # use dviread. It sometimes returns a wrong descent. --> 675 dvifile = self.make_dvi(tex, fontsize) 676 dvi = dviread.Dvi(dvifile, 72 * dpi_fraction) 677 try: /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize) 420 'string:\n%s\nHere is the full report generated by ' 421 'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) + --> 422 report)) 423 else: 424 mpl.verbose.report(report, 'debug') RuntimeError: LaTeX was not able to process the following string: 'lp' Here is the full report generated by LaTeX:
<matplotlib.figure.Figure at 0x7fb7735e76d0>
Histogram of standardized deviance residuals:
from scipy import stats
fig, ax = plt.subplots()
resid = res.resid_deviance.copy()
resid_std = stats.zscore(resid)
ax.hist(resid_std, bins=25)
ax.set_title('Histogram of standardized deviance residuals');
Error in callback <function post_execute at 0x7fb786441500> (for post_execute):
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /usr/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in post_execute() 145 def post_execute(): 146 if matplotlib.is_interactive(): --> 147 draw_all() 148 149 # IPython >= 2 /usr/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.pyc in draw_all(cls, force) 148 for f_mgr in cls.get_all_fig_managers(): 149 if force or f_mgr.canvas.figure.stale: --> 150 f_mgr.canvas.draw_idle() 151 152 atexit.register(Gcf.destroy_all) /usr/lib/python2.7/dist-packages/matplotlib/backend_bases.pyc in draw_idle(self, *args, **kwargs) 2024 if not self._is_idle_drawing: 2025 with self._idle_draw_cntx(): -> 2026 self.draw(*args, **kwargs) 2027 2028 def draw_cursor(self, event): /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in draw(self) 472 473 try: --> 474 self.figure.draw(self.renderer) 475 finally: 476 RendererAgg.lock.release() /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/figure.pyc in draw(self, renderer) 1157 dsu.sort(key=itemgetter(0)) 1158 for zorder, a, func, args in dsu: -> 1159 func(*args) 1160 1161 renderer.close_group('figure') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe) 2322 2323 for zorder, a in dsu: -> 2324 a.draw(renderer) 2325 2326 renderer.close_group('axes') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in draw(self, renderer, *args, **kwargs) 1106 ticks_to_draw = self._update_ticks(renderer) 1107 ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(ticks_to_draw, -> 1108 renderer) 1109 1110 for tick in ticks_to_draw: /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in _get_tick_bboxes(self, ticks, renderer) 1056 for tick in ticks: 1057 if tick.label1On and tick.label1.get_visible(): -> 1058 extent = tick.label1.get_window_extent(renderer) 1059 ticklabelBoxes.append(extent) 1060 if tick.label2On and tick.label2.get_visible(): /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in get_window_extent(self, renderer, dpi) 959 raise RuntimeError('Cannot get window extent w/o renderer') 960 --> 961 bbox, info, descent = self._get_layout(self._renderer) 962 x, y = self.get_unitless_position() 963 x, y = self.get_transform().transform_point((x, y)) /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in _get_layout(self, renderer) 350 tmp, lp_h, lp_bl = renderer.get_text_width_height_descent('lp', 351 self._fontproperties, --> 352 ismath=False) 353 offsety = (lp_h - lp_bl) * self._linespacing 354 /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in get_text_width_height_descent(self, s, prop, ismath) 227 fontsize = prop.get_size_in_points() 228 w, h, d = texmanager.get_text_width_height_descent(s, fontsize, --> 229 renderer=self) 230 return w, h, d 231 /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in get_text_width_height_descent(self, tex, fontsize, renderer) 673 else: 674 # use dviread. It sometimes returns a wrong descent. --> 675 dvifile = self.make_dvi(tex, fontsize) 676 dvi = dviread.Dvi(dvifile, 72 * dpi_fraction) 677 try: /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize) 420 'string:\n%s\nHere is the full report generated by ' 421 'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) + --> 422 report)) 423 else: 424 mpl.verbose.report(report, 'debug') RuntimeError: LaTeX was not able to process the following string: 'lp' Here is the full report generated by LaTeX:
<matplotlib.figure.Figure at 0x7fb773b20210>
QQ Plot of Deviance Residuals:
from statsmodels import graphics
graphics.gofplots.qqplot(resid, line='r')
<matplotlib.figure.Figure at 0x7fb773b4ded0>
Error in callback <function post_execute at 0x7fb786441500> (for post_execute):
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) /usr/lib/python2.7/dist-packages/matplotlib/pyplot.pyc in post_execute() 145 def post_execute(): 146 if matplotlib.is_interactive(): --> 147 draw_all() 148 149 # IPython >= 2 /usr/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.pyc in draw_all(cls, force) 148 for f_mgr in cls.get_all_fig_managers(): 149 if force or f_mgr.canvas.figure.stale: --> 150 f_mgr.canvas.draw_idle() 151 152 atexit.register(Gcf.destroy_all) /usr/lib/python2.7/dist-packages/matplotlib/backend_bases.pyc in draw_idle(self, *args, **kwargs) 2024 if not self._is_idle_drawing: 2025 with self._idle_draw_cntx(): -> 2026 self.draw(*args, **kwargs) 2027 2028 def draw_cursor(self, event): /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in draw(self) 472 473 try: --> 474 self.figure.draw(self.renderer) 475 finally: 476 RendererAgg.lock.release() /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/figure.pyc in draw(self, renderer) 1157 dsu.sort(key=itemgetter(0)) 1158 for zorder, a, func, args in dsu: -> 1159 func(*args) 1160 1161 renderer.close_group('figure') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axes/_base.pyc in draw(self, renderer, inframe) 2322 2323 for zorder, a in dsu: -> 2324 a.draw(renderer) 2325 2326 renderer.close_group('axes') /usr/lib/python2.7/dist-packages/matplotlib/artist.pyc in draw_wrapper(artist, renderer, *args, **kwargs) 59 def draw_wrapper(artist, renderer, *args, **kwargs): 60 before(artist, renderer) ---> 61 draw(artist, renderer, *args, **kwargs) 62 after(artist, renderer) 63 /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in draw(self, renderer, *args, **kwargs) 1106 ticks_to_draw = self._update_ticks(renderer) 1107 ticklabelBoxes, ticklabelBoxes2 = self._get_tick_bboxes(ticks_to_draw, -> 1108 renderer) 1109 1110 for tick in ticks_to_draw: /usr/lib/python2.7/dist-packages/matplotlib/axis.pyc in _get_tick_bboxes(self, ticks, renderer) 1056 for tick in ticks: 1057 if tick.label1On and tick.label1.get_visible(): -> 1058 extent = tick.label1.get_window_extent(renderer) 1059 ticklabelBoxes.append(extent) 1060 if tick.label2On and tick.label2.get_visible(): /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in get_window_extent(self, renderer, dpi) 959 raise RuntimeError('Cannot get window extent w/o renderer') 960 --> 961 bbox, info, descent = self._get_layout(self._renderer) 962 x, y = self.get_unitless_position() 963 x, y = self.get_transform().transform_point((x, y)) /usr/lib/python2.7/dist-packages/matplotlib/text.pyc in _get_layout(self, renderer) 350 tmp, lp_h, lp_bl = renderer.get_text_width_height_descent('lp', 351 self._fontproperties, --> 352 ismath=False) 353 offsety = (lp_h - lp_bl) * self._linespacing 354 /usr/lib/python2.7/dist-packages/matplotlib/backends/backend_agg.pyc in get_text_width_height_descent(self, s, prop, ismath) 227 fontsize = prop.get_size_in_points() 228 w, h, d = texmanager.get_text_width_height_descent(s, fontsize, --> 229 renderer=self) 230 return w, h, d 231 /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in get_text_width_height_descent(self, tex, fontsize, renderer) 673 else: 674 # use dviread. It sometimes returns a wrong descent. --> 675 dvifile = self.make_dvi(tex, fontsize) 676 dvi = dviread.Dvi(dvifile, 72 * dpi_fraction) 677 try: /usr/lib/python2.7/dist-packages/matplotlib/texmanager.pyc in make_dvi(self, tex, fontsize) 420 'string:\n%s\nHere is the full report generated by ' 421 'LaTeX: \n\n' % repr(tex.encode('unicode_escape')) + --> 422 report)) 423 else: 424 mpl.verbose.report(report, 'debug') RuntimeError: LaTeX was not able to process the following string: 'lp' Here is the full report generated by LaTeX:
<matplotlib.figure.Figure at 0x7fb773b4ded0>
GLM: Gamma for proportional count response
Load data
In the example above, we printed the NOTE
attribute to learn about the
Star98 dataset. Statsmodels datasets ships with other useful information. For
example:
print(sm.datasets.scotland.DESCRLONG)
This data is based on the example in Gill and describes the proportion of voters who voted Yes to grant the Scottish Parliament taxation powers. The data are divided into 32 council districts. This example's explanatory variables include the amount of council tax collected in pounds sterling as of April 1997 per two adults before adjustments, the female percentage of total claims for unemployment benefits as of January, 1998, the standardized mortality rate (UK is 100), the percentage of labor force participation, regional GDP, the percentage of children aged 5 to 15, and an interaction term between female unemployment and the council tax. The original source files and variable information are included in /scotland/src/
Load the data and add a constant to the exogenous variables:
data2 = sm.datasets.scotland.load()
data2.exog = sm.add_constant(data2.exog, prepend=False)
print(data2.exog[:5,:])
print(data2.endog[:5])
[[ 7.12000000e+02 2.10000000e+01 1.05000000e+02 8.24000000e+01 1.35660000e+04 1.23000000e+01 1.49520000e+04 1.00000000e+00] [ 6.43000000e+02 2.65000000e+01 9.70000000e+01 8.02000000e+01 1.35660000e+04 1.53000000e+01 1.70395000e+04 1.00000000e+00] [ 6.79000000e+02 2.83000000e+01 1.13000000e+02 8.63000000e+01 9.61100000e+03 1.39000000e+01 1.92157000e+04 1.00000000e+00] [ 8.01000000e+02 2.71000000e+01 1.09000000e+02 8.04000000e+01 9.48300000e+03 1.36000000e+01 2.17071000e+04 1.00000000e+00] [ 7.53000000e+02 2.20000000e+01 1.15000000e+02 6.47000000e+01 9.26500000e+03 1.46000000e+01 1.65660000e+04 1.00000000e+00]] [ 60.3 52.3 53.4 57. 68.7]
Fit and summary
glm_gamma = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma())
glm_results = glm_gamma.fit()
print(glm_results.summary())
Generalized Linear Model Regression Results ============================================================================== Dep. Variable: y No. Observations: 32 Model: GLM Df Residuals: 24 Model Family: Gamma Df Model: 7 Link Function: inverse_power Scale: 0.00358428317349 Method: IRLS Log-Likelihood: -83.017 Date: Wed, 27 Apr 2016 Deviance: 0.087389 Time: 23:33:09 Pearson chi2: 0.0860 No. Iterations: 6 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ x1 4.962e-05 1.62e-05 3.060 0.002 1.78e-05 8.14e-05 x2 0.0020 0.001 3.824 0.000 0.001 0.003 x3 -7.181e-05 2.71e-05 -2.648 0.008 -0.000 -1.87e-05 x4 0.0001 4.06e-05 2.757 0.006 3.23e-05 0.000 x5 -1.468e-07 1.24e-07 -1.187 0.235 -3.89e-07 9.56e-08 x6 -0.0005 0.000 -2.159 0.031 -0.001 -4.78e-05 x7 -2.427e-06 7.46e-07 -3.253 0.001 -3.89e-06 -9.65e-07 const -0.0178 0.011 -1.548 0.122 -0.040 0.005 ==============================================================================
GLM: Gaussian distribution with a noncanonical link
Artificial data
nobs2 = 100
x = np.arange(nobs2)
np.random.seed(54321)
X = np.column_stack((x,x**2))
X = sm.add_constant(X, prepend=False)
lny = np.exp(-(.03*x + .0001*x**2 - 1.0)) + .001 * np.random.rand(nobs2)
Fit and summary
gauss_log = sm.GLM(lny, X, family=sm.families.Gaussian(sm.families.links.log))
gauss_log_results = gauss_log.fit()
print(gauss_log_results.summary())
Generalized Linear Model Regression Results ============================================================================== Dep. Variable: y No. Observations: 100 Model: GLM Df Residuals: 97 Model Family: Gaussian Df Model: 2 Link Function: log Scale: 1.05311425588e-07 Method: IRLS Log-Likelihood: 662.92 Date: Wed, 27 Apr 2016 Deviance: 1.0215e-05 Time: 23:33:10 Pearson chi2: 1.02e-05 No. Iterations: 7 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ x1 -0.0300 5.6e-06 -5361.316 0.000 -0.030 -0.030 x2 -9.939e-05 1.05e-07 -951.091 0.000 -9.96e-05 -9.92e-05 const 1.0003 5.39e-05 1.86e+04 0.000 1.000 1.000 ==============================================================================