Missing Data: An Introduction to the Analysis of Incomplete Datasets


Missing data pervade all academic fields, including the social, behavioral, educational and biomedical sciences. The vagaries of empirical observation almost inevitably produce incomplete data sets in which some of the objects under investigation lack information on one or more of the relevant variables. In the past, researchers often ignored the problem of missing data by simply dropping incomplete observations from statistical analyses. However, doing so may (and in general will) compromise the validity of the analyses and raise questions about the dependability of any substantive conclusions drawn from the incomplete data. In recognition of this problem, missing data analysis has become a major area of statistical research over the past few decades. We now recognize that the presence of incomplete observations has multiple and far-reaching implications for research in the social and behavioral sciences.
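To make the cost of dropping incomplete observations concrete, the following is a minimal simulation sketch in Python. It is not part of the workshop materials; the variable names, sample size, and missingness rule (y is unobserved whenever x is positive) are illustrative assumptions chosen so that missingness depends only on an observed variable.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Compare the full-sample mean of y with the complete-case mean.

    Hypothetical setup: x is always observed, y = x + noise, and y is
    missing whenever x > 0 (missingness depends only on observed x).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    # Mark y as missing (None) for every case with x > 0.
    y_obs = [yi if xi <= 0 else None for xi, yi in zip(x, y)]

    # Mean of y if nothing had been missing (unknowable in practice).
    full_mean = statistics.fmean(y)

    # Listwise deletion: keep only cases where y was observed.
    complete_cases = [yi for yi in y_obs if yi is not None]
    cc_mean = statistics.fmean(complete_cases)
    return full_mean, cc_mean


full_mean, cc_mean = simulate()
print(f"full-sample mean of y:   {full_mean:.3f}")
print(f"complete-case mean of y: {cc_mean:.3f}")
```

By construction the full-sample mean of y is close to zero, while the complete-case mean falls well below it, because the retained cases are exactly those with low values of x.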

This workshop provides an introduction to statistical strategies for dealing with missing data in empirical analyses. We will begin by focusing on the major mechanisms of missing data, along with the serious limitations of the ad hoc methods that have been used in the past to deal with incomplete datasets. Next, we will discuss two systematic approaches for analyzing missing data: (full information) maximum likelihood (ML) and multiple imputation (MI). We will then look at the inclusive analytic strategy, which is useful when some of the assumptions of ML are violated and can be just as useful for MI. As part of this, we will examine ways to find auxiliary variables that can considerably enhance the plausibility of the missing at random (MAR) assumption underlying widely available software implementations of ML and MI. The workshop takes an applied approach, with plenty of empirical examples and ample opportunity for participants to gain hands-on experience with the various methods for dealing with missing data, using the software packages Mplus, Stata, and R. Participants should have a good understanding of the multiple regression model. Experience with matrix algebra and calculus, or familiarity with the software used, is not required.

Fee: Members = $1500; Non-members = $3000

Tags: missing data, incomplete data

Course Sections