Regression Analysis III: Advanced Methods

Instructor(s):

  • David Armstrong, Western University

Linear regression is the workhorse of social science methodology. Its relative robustness and ease of interpretation are but two of the reasons that it is generally the first, and frequently the last, stop on the way to characterizing empirical relationships among observed variables. This course extends the basic linear model framework in a number of directions to address potential problems in the analysis before they arise. We will use the Generalized Additive Model for Location, Scale and Shape (GAMLSS) framework to build up model complexity while remaining within the regression framework. This will allow us to investigate non-linearity and non-additivity in both standard and non-standard ways. It will also permit more flexible means of estimating relationships: we can use machine learning tools, such as random forests, multivariate adaptive regression splines (MARS), gradient boosting machines, and polynomial expansions, to represent the relationships among a subset of variables in the model. Thus, we can impose strong functional form assumptions on some aspects of the model while letting others be estimated more flexibly. We use the linear model framework for its intuitive appeal, but the tools we learn for model specification, diagnosis, and evaluation can all be used in the broader GLM framework and beyond. Time permitting, we will also consider finite mixture models, missing data/multiple imputation, and model selection/multi-model inference.

This course is great for anyone trying to figure out how to employ machine learning tools in conventional social science analytical situations, that is, where inference and hypothesis testing remain among the main goals. It is also good for anyone looking to extend their knowledge of how functional form assumptions are handled in regression models.

The course assumes an intimate familiarity with the details of OLS regression and a working knowledge of matrices and linear algebra (taking Mathematics for Social Scientists, II concurrently should be sufficient for those with no prior knowledge). The course relies entirely on the R computing language. In the past, motivated participants have found that this course, taken in conjunction with Introduction to R, provides a sufficiently broad introduction to R. In general, if you have had a linear models course taught at the level of Wooldridge, Gujarati, or Kennedy, you should be in the right place.

Readings from the course will come, in part, from the following books:

Fox, John. 2016. Applied Regression and Generalized Linear Models, 3rd ed. Sage.

James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. Springer.

Stasinopoulos, Mikis, Robert Rigby, Gillian Heller, Vlasios Voudouris and Fernanda De Bastiani. 2017. Flexible Regression and Smoothing: Using GAMLSS in R. Chapman & Hall/CRC.

Rigby, Robert, Mikis Stasinopoulos, Gillian Heller and Fernanda De Bastiani. 2019. Distributions for Modeling Location, Scale and Shape: Using GAMLSS in R. Chapman & Hall/CRC.

Fee and Registration: This course is part of the first four-week session. Please see our fee chart on our Registration page for the cost of attending one (or both) four-week session(s). Participants who enroll in a four-week session may take as many courses (workshops and lectures) as desired during the session for which they are enrolled.

Tags: regression, regression diagnostics, R-statistical computing environment, outliers, goodness of fit, linear regression, model selection

Course Sections

Section 1

Location: ICPSR -- Ann Arbor, MI

Date(s): June 22 - July 17

Time: 3:00 PM - 5:00 PM

Instructor(s):

  • David Armstrong, Western University