Collaborative Psychiatric Epidemiology Surveys (CPES)

Data Processing


Data processing includes multiple steps that tend to be sequential, although they may also have an iterative flow. CPES data processing efforts include (1) data transmission; (2) data editing and coding; (3) data file creation; and (4) data file documentation. Some of these processing steps can be taken prior to or concurrent with data collection. In CPES, capturing the data, performing edit checks, and building data files can, at least partially, occur automatically while the data are being collected.

Data transmission

Each day interviewers connected their laptops to a telephone line and dialed up an Internet service provider (ISP) to send information to, and receive information from, the central office. The purpose of the communication was twofold. First, all work performed since the previous communication was transmitted to the central office and the information updated in the master files. As part of the daily communication, interviewers reported the hours that they worked and their expenses. This information transfer allowed for real-time monitoring of interview production, sample disposition, and costs. Second, information from the central office, including program updates, transfers of sample between interviewers, and newly released sample, was transmitted electronically to interviewers.

Data editing and coding

Many data processing activities that are typically completed when collecting data via paper questionnaires were unnecessary in these studies because the questionnaires were computer administered. For example, making sure that questions were asked in the correct sequence, checking for out-of-range or inconsistent responses, and filling in the appropriate question text based on a respondent's previous answers were all controlled by the interviewing application software. Inconsistent responses that failed the program's edit checks were brought to the attention of the interviewer, who could resolve the inconsistency with the respondent during the interview, improving the quality of the data and minimizing the need for back-end editing. Although these software programs automatically made many of the decisions formerly made by interviewers using paper questionnaires, the data for each study did require some additional editing and coding. Editing operations included processing each interview through a series of programming routines that evaluated question responses and assigned codes to indicate the presence or absence of each mental health disorder assessed by the study. A number of other summary variables based on individual question items were also created in preparation for the project's analysis phase. In addition, each study included several open-ended questions, which were coded.
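The kind of edit check described above can be illustrated with a minimal sketch. It is not the interviewing software used in CPES; the variable names (`age`, `age_of_onset`), the allowed ranges, and the consistency rule are all hypothetical and chosen only to show a range check and a cross-item consistency check.

```python
from dataclasses import dataclass


@dataclass
class RangeCheck:
    """Hypothetical rule: a numeric answer must fall within [low, high]."""
    variable: str
    low: int
    high: int


def run_edit_checks(answers, range_checks):
    """Return a list of readable problems found in one interview record."""
    problems = []
    for check in range_checks:
        value = answers.get(check.variable)
        if value is None:
            continue  # item not answered, e.g. skipped by question routing
        if not check.low <= value <= check.high:
            problems.append(
                f"{check.variable}={value} is outside {check.low}-{check.high}"
            )
    # Hypothetical consistency rule: onset age cannot exceed current age.
    age = answers.get("age")
    onset = answers.get("age_of_onset")
    if age is not None and onset is not None and onset > age:
        problems.append(f"age_of_onset={onset} exceeds age={age}")
    return problems


# Assumed ranges, for illustration only.
CHECKS = [RangeCheck("age", 18, 120), RangeCheck("age_of_onset", 0, 120)]
```

In a computer-assisted interview, a non-empty result would be shown to the interviewer at the moment the answer is entered, so the discrepancy can be resolved with the respondent rather than during later editing.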

Data file creation

Data files were extracted to ASCII and converted to SAS format once data collection began and were updated throughout the data collection period. Early in the data collection phase, these files were used by project managers to identify any problems with administration of the questionnaire and to monitor response trends and patterns. Datasets were produced for the studies' principal investigators on a weekly basis to allow the investigators' staffs to perform preliminary analyses.
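As a rough illustration of working with an ASCII extract, the sketch below reads fixed-width text records into named fields. The source does not describe the layout of the CPES extracts, so the fixed-width assumption, the column positions, and the field names (`case_id`, `age`, `sex`) are all hypothetical.

```python
def parse_fixed_width(lines, layout):
    """Split each text line into named fields.

    layout is a list of (name, start, end) tuples using 0-based start
    positions and exclusive end positions.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the extract
        record = {name: line[start:end].strip() for name, start, end in layout}
        records.append(record)
    return records


# Hypothetical layout and sample lines, for illustration only.
LAYOUT = [("case_id", 0, 5), ("age", 5, 8), ("sex", 8, 9)]
SAMPLE = ["00001 342", "00002 511"]

records = parse_fixed_width(SAMPLE, LAYOUT)
```

Keeping the layout in a single table like `LAYOUT` means the same definition can drive both the parsing and the documentation of each variable's position.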

Data file documentation

A codebook and a set of companion instructions and study materials were prepared for each study. The codebook provided the information that users need to associate a variable in the data file with the corresponding question on the questionnaire and documented the characteristics of each variable in the data file, such as its format and response codes. The codebook also contained frequencies for nominal and ordinal variables and a set of basic descriptive statistics for continuous variables. For NLAAS, the HTML-compatible codebook included a facility to view each question in any of the five languages.
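The two kinds of codebook summaries mentioned above, frequencies for categorical variables and descriptive statistics for continuous ones, can be sketched as follows. This is an illustrative example using Python's standard library, not the tooling used to build the CPES codebooks; the function names and the particular statistics reported are assumptions.

```python
import statistics
from collections import Counter


def frequencies(values):
    """Count how often each response code occurs, ordered by code."""
    return dict(sorted(Counter(values).items()))


def describe(values):
    """Basic descriptive statistics for a continuous variable.

    Requires at least two values because the sample standard
    deviation is undefined for a single observation.
    """
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }
```

For example, `frequencies` would be applied to a nominal or ordinal item such as a coded response scale, and `describe` to a continuous item such as age in years.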