Collaborative Psychiatric Epidemiology Surveys

Merging the Study Datasets

The CPES dataset contains data merged from three study datasets: NCS-R, NLAAS, and NSAL. The data for each study were collected using the Blaise® computer-assisted interviewing (CAI) software, which stored metadata (question text, data type, missing data codes, etc.) for each question. After receiving the raw Blaise® data, analysts for each study provided a clean SAS dataset that included added variables, such as the diagnostic variables, constructed during study post-processing. To accommodate these added variables, sections that did not appear in the study instruments were added to the NCS-R, NLAAS, NSAL, and CPES codebooks:

  • Supplemental Variables (project ID, case ID, and weights);
  • Constructed Demographic Variables; and
  • Sections for each type of diagnosis provided by the studies (e.g., DX Adult Separation Anxiety Disorder), with the relevant DSM-IV and/or ICD-10 constructed diagnostic variables.

To facilitate harmonizing and merging the three study datasets, processors created a crosswalk table listing the variables from all three datasets, ordered first by the NCS-R instrument sections, then by NLAAS-specific sections, and finally by NSAL-specific sections. Where studies asked the same sections, questions with identical names were initially linked. Questions whose names did not match stayed in the order in which they appeared within their section but remained unlinked. Over several iterations, processors reviewed question text and code frames. Differently named variables whose question text and code frames matched were linked, which sometimes required moving a study question from one section to another. Variables whose question text or code frames differed substantially across studies were unlinked in the crosswalk table, even if their names were the same. Some questions with minor differences remained linked and were flagged in the CPES codebook as having differences. Many of the recodes listed below allowed variables whose code frames initially differed to be re-linked and harmonized. Finally, variable labels were reviewed and revised to make them consistent across studies.
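The linking rules described above can be illustrated with a minimal Python sketch. It is not the processors' actual tooling: the `Question` class, the helper names, and the example variable names (`D1`, `DX9`, and so on) are hypothetical. It also assumes that "matching" means identical question text after case and whitespace normalization plus identical code frames, whereas the real review relied on human judgment about what counted as a substantial difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    study: str
    name: str
    text: str
    code_frame: tuple  # (code, label) pairs, e.g. ((1, "YES"), (5, "NO"))


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def same_content(a: Question, b: Question) -> bool:
    """Assumed stand-in for the manual review of text and code frames."""
    return normalize(a.text) == normalize(b.text) and a.code_frame == b.code_frame


def build_links(base: list[Question], other: list[Question]) -> list[tuple[str, str]]:
    """Return (base_name, other_name) pairs that would be linked."""
    links = []
    linked_other = set()
    other_by_name = {q.name: q for q in other}

    # Pass 1: identically named questions stay linked only if content matches.
    unlinked_base = []
    for q in base:
        match = other_by_name.get(q.name)
        if match is not None and same_content(q, match):
            links.append((q.name, match.name))
            linked_other.add(match.name)
        else:
            unlinked_base.append(q)

    # Pass 2: link differently named questions whose content matches.
    for q in unlinked_base:
        for cand in other:
            if cand.name not in linked_other and same_content(q, cand):
                links.append((q.name, cand.name))
                linked_other.add(cand.name)
                break
    return links


yes_no = ((1, "YES"), (5, "NO"))
base = [
    Question("NCS-R", "D1", "Ever felt sad?", yes_no),
    Question("NCS-R", "D2", "How often?", ((1, "OFTEN"), (2, "RARELY"))),
    Question("NCS-R", "D3", "Age at onset?", ()),
]
other = [
    Question("NLAAS", "D1", "ever felt  sad?", yes_no),
    Question("NLAAS", "D2", "How often?", ((1, "OFTEN"), (2, "SOMETIMES"), (3, "RARELY"))),
    Question("NLAAS", "DX9", "Age at onset?", ()),
]
links = build_links(base, other)
print(links)  # [('D1', 'D1'), ('D3', 'DX9')]
```

In this example `D2` shares a name across studies but is left unlinked because its code frames differ, while `D3` and `DX9` are linked despite having different names. A recode that maps one code frame onto the other would be the step that lets a pair like `D2` be re-linked.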

The crosswalk table was used to merge data from the three studies. Blaise® metadata, information from the crosswalk table, and SAS metadata for constructed variables were combined to create question-level metadata for the CPES merged dataset.
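As a rough illustration of how a crosswalk can drive such a merge, the sketch below renames each study's variables to a harmonized name and stacks the records into one dataset. The crosswalk layout, the harmonized names (`V_SAD`, `V_LANG`), the study variable names, and the added `STUDY` column are all hypothetical. Treating the merge as a row-wise stack of respondents is likewise an assumption of this sketch, not a description of the actual CPES processing.

```python
# Hypothetical crosswalk: one row per harmonized variable, giving the
# source variable name in each study (None if the study did not ask it).
CROSSWALK = [
    {"cpes": "CASEID", "NCS-R": "CASEID", "NLAAS": "CASEID", "NSAL": "CASEID"},
    {"cpes": "V_SAD", "NCS-R": "D1", "NLAAS": "D1", "NSAL": "DEP1"},
    {"cpes": "V_LANG", "NCS-R": None, "NLAAS": "LANG1", "NSAL": None},
]


def harmonize(study, records, crosswalk):
    """Rename one study's variables to the harmonized names."""
    out = []
    for rec in records:
        row = {"STUDY": study}
        for entry in crosswalk:
            source_name = entry.get(study)
            # Variables not asked in this study are left missing (None).
            row[entry["cpes"]] = rec.get(source_name) if source_name else None
        out.append(row)
    return out


def merge_studies(datasets, crosswalk):
    """Stack the harmonized records from every study into one list."""
    merged = []
    for study, records in datasets.items():
        merged.extend(harmonize(study, records, crosswalk))
    return merged


datasets = {
    "NCS-R": [{"CASEID": 1, "D1": 5}],
    "NLAAS": [{"CASEID": 2, "D1": 1, "LANG1": 3}],
    "NSAL": [{"CASEID": 3, "DEP1": 1}],
}
merged = merge_studies(datasets, CROSSWALK)
for row in merged:
    print(row)
```

Every merged row carries the same set of harmonized variables, so a variable asked in only one study (here `V_LANG`) is simply missing for respondents from the other two.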