Machine Learning: Applications and Opportunities in Social Science Research


The field of machine learning is most commonly associated with "big data": the use of massive datasets to make better predictions about things like credit card fraud and Netflix recommendations. Though machine learning has been most influential in its commercial and medical applications, a growing number of social scientists are taking advantage of these methods to (1) uncover patterns and structure embedded in data, (2) test and improve model specification and predictions, and (3) perform data reduction. This course covers the mechanics underlying machine learning methods and discusses how social scientists can leverage these techniques to gain new insight from their data. Specifically, the course will cover decision trees, random forests, boosting, k-means clustering and nearest neighbors, support vector machines, kernels, neural networks, and ensemble learning. We will also discuss best practices, including error rates, cross-validation, and the use of bootstrapping methods to develop uncertainty estimates.

Software: The course will use R to demonstrate the theoretical properties and empirical applications of these methods, so participants should have some basic familiarity with R or a similar statistical computing environment (such as Stata, SAS, or Python). An advanced programming background is neither required nor assumed.

Prerequisites: Participants should also have some prior exposure to linear regression models.

Fee: Members = $1700; Non-members = $3200

Tags: machine learning

Course Sections