Analyzing Longitudinal and Multilevel Data in R and Stan (Toronto, ON)


The course consists of a one-day workshop on R followed by a four-day course on models for longitudinal and multilevel data making intensive use of specialized packages in R. The R workshop is tailored to the specific needs of the subsequent course.

Monday, May 28: Introduction to the R Statistical Computing Environment

Instructor: John Fox

The free, open-source R statistical programming language and computing environment is in wide use in many disciplines, including the social sciences. The substantial capabilities of the basic R software are augmented by more than 12,000 contributed R "packages" for various statistical methods, freely available on the Comprehensive R Archive Network (CRAN).

This one-day workshop provides a basic introduction to R and to RStudio, a sophisticated and free editor customized for R. The goal of the workshop is to prepare participants for the subsequent four-day course on "Analyzing Longitudinal and Multilevel Data".

Topics to be covered include: getting started with R and RStudio; reading and manipulating data in R; basic statistical graphics in R; and fitting and working with linear and generalized linear models in R.

Tuesday-Friday, May 29-June 1: Analyzing Longitudinal and Multilevel Data using R and Stan

Instructor: Georges Monette

Mixed and hierarchical models allow researchers to analyze complex data with observations that are interdependent as a result of clustering or proximity in time or space. They allow researchers to model much richer phenomena than is possible with ordinary linear models, but they also present interesting challenges.

The first part of the course is a study of models for longitudinal and multilevel data using mixed-effects models in the 'nlme' package in R. The content parallels the first seven chapters of Singer, J. D. and Willett, J. B. (2003) "Applied Longitudinal Data Analysis", Oxford University Press. We emphasize the notion that these models not only allow the analysis of complex data but, more interestingly, give researchers access to richer insights into the phenomena of their disciplines.

The second part of the course uses Bayesian Markov chain Monte Carlo methods, specifically Hamiltonian Monte Carlo implemented with the Stan modelling language, to extend the reach of the ideas of mixed-effects models. We will look at a wide range of distributions for the response variable, including those of generalized linear mixed models, multivariate responses with combinations of discrete and continuous components, and thicker-tailed response distributions for robust analysis. We will also consider the treatment of missing data using a variety of approaches.

All methods are illustrated with extensive examples coded in R, primarily using the 'nlme' package and the Stan modelling language with the 'rstan' package.

Fee: Members = $1700; Non-members = $3200

Tags: longitudinal data, multilevel data
