Data Mining (Hubert M. Blalock Memorial Lecture Series)

Instructor(s):

  • Robert Stine, University of Pennsylvania

In many settings, large amounts of data allow computational methods to construct predictive models of behavior. Sometimes data mining produces models that are interpretable, sometimes not. These lectures introduce participants to data mining, which can be used for time series forecasting, modeling categorical choices, and classifying groups. Any time you think a regression or similar model might work, you can try data mining, especially when you have a lot of data. The lectures illustrate data mining using software for the key algorithms: regression models, trees, and neural networks.

Regression analysis is a good starting point, and it can be made into a powerful data mining tool so long as you are careful about how you use it. Stepwise methods can work quite well, but they can also go badly wrong unless you know how to guide them. Over-fitting is the bane of data mining: thinking you have found a pattern when there is really nothing going on.

The two other modeling techniques presented are regression trees and neural nets. Trees generate predictions not from an equation, as regression does, but from a sequence of yes/no questions; following the path of answers to those questions leads to a prediction of behavior. Trees can also be much easier to explain than a regression. Neural nets are another method in the same spirit. Both trees and neural nets work much like regression, just cloaking the model in new words and nice pictures, so the lessons learned in using regression for data mining apply to them as well.

The Hubert M. Blalock Memorial Lecture Series: Advanced Topics in Social Research -- Frontiers of Quantitative Methods

This is a special lecture series covering advanced topics at the frontier of quantitative methods for social research. Some of this material draws upon recent work in fields such as applied statistics, econometrics, computer science, and mathematical modeling.

This series is dedicated to the late Hubert "Tad" Blalock, whose scholarship, integrity, insight, and wit benefited all the social sciences through his work in applied statistics, causal modeling, theory construction, conceptualization, and measurement.

Fees: Consult the fee structure.

Tags: data mining, regression trees, neural networks

Course Sections

Section 1

Location: ICPSR -- Ann Arbor, MI

Date(s): June 30 - July 11

Time: 1:30 PM - 3:00 PM

Instructor(s):

  • Robert Stine, University of Pennsylvania
