Finding Patterns in Data, Big and Small


The goal of this course is to explore methods for finding groups in data. We will begin with "regular" data sets (e.g., observations by variables) and cover both traditional clustering techniques and more recent statistical approaches. We will discuss both algorithmic approaches (e.g., pure optimization algorithms, genetic algorithms, simulated annealing) and formal statistical models (finite mixture models, latent profile analysis, latent class analysis, etc.). Beyond classic data structures, we will spend time on how to find groups in network data (blockmodeling, community detection), longitudinal and repeated-measures data (time series cluster models, growth mixture models, etc.), text mining (finding clusters in documents), and proximity data. We will also discuss how to implement these methods on large data sets.

Many important decisions will be discussed, including: (1) how to choose variables, (2) how to standardize/transform variables, (3) which approaches are best for particular data structures, (4) how to simultaneously reduce dimensionality and find groups, and (5) how to determine whether a solution is a valid representation of latent structure. Importantly, each approach to finding groups/patterns in data will be related back to traditional multivariate techniques such as principal component analysis, factor analysis, and structural equation modeling. The class will delineate which approaches work well together and which should not be combined in a single analysis.

All techniques will be demonstrated by working through real data analyses, with attention given to which output to focus on and how to discuss the final results.

Fee: Members = $1500; Non-members = $3000

Tags: cluster analysis, scaling

