What are the components of a data collection in HMCA?
A data collection comprises one or more data files, plus
technical documentation that describes the data. SAS, SPSS,
and/or Stata setups are included with many collections.
Data files are often provided in multiple formats.
Every data file is supplied as an ASCII text file. Many
collections also include at least one other format, such
as Stata files, SPSS portable files, or SAS transport files
generated by the SAS XPORT engine or the SAS CPORT procedure.
SPSS portable and SAS transport files are the most common
formats besides ASCII.
Technical documentation typically includes the following:
- study description that summarizes the collection
- file manifest
- bibliography of related literature
- description of the study's methodology
- data collection instrument(s)
- data map/record layout of the ASCII data file(s)
- variable descriptions
- univariate frequencies (for most collections)
Study descriptions, file manifests,
and bibliographies of related literature are
presented as separate files. Other components of the
documentation may be bundled in a single
file or distributed among multiple files. Documentation
files are provided in Portable Document Format (PDF) and/or
as ASCII text files.
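
To illustrate how a data map/record layout relates to an ASCII
data file, here is a minimal Python sketch. It assumes a
fixed-width layout with 1-based, inclusive column positions. The
variable names, column positions, and sample records are all
hypothetical and do not come from any actual collection.

```python
# Hypothetical record layout: (variable name, start column, end column).
# Columns are 1-based and inclusive (an assumption for this sketch).
LAYOUT = [
    ("CASEID", 1, 4),
    ("AGE", 5, 6),
    ("SEX", 7, 7),
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width ASCII record into a dict of stripped strings."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


# Two made-up records that follow the hypothetical layout above.
sample_lines = ["0001342", "0002 71"]
records = [parse_record(line) for line in sample_lines]
```

Each record becomes a dictionary keyed by variable name. The
values stay as strings, so numeric conversion and missing-value
handling are left to a later step.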
The setups, which usually contain complete variable and
value labels and often include missing value declarations
or recodes, can be used to create
software-specific system files (e.g., SAS datasets) from
the ASCII data files.
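
The kind of information a setup carries can be sketched in
Python. This is an illustration only, not the syntax of an actual
SAS, SPSS, or Stata setup. The variable names, labels, and
missing-value codes below are hypothetical.

```python
# Hypothetical setup information, invented for this sketch.
VARIABLE_LABELS = {"AGE": "Age of respondent", "SEX": "Sex of respondent"}
VALUE_LABELS = {"SEX": {1: "Male", 2: "Female"}}
MISSING_VALUES = {"AGE": {99}, "SEX": {9}}


def apply_setup(record):
    """Convert raw string fields to integers, turn declared missing
    codes into None, and pair each value with its value label (if any)."""
    out = {}
    for name, raw in record.items():
        # Blank fields are treated as missing (an assumption).
        value = int(raw) if raw.strip() else None
        if value in MISSING_VALUES.get(name, set()):
            value = None
        label = VALUE_LABELS.get(name, {}).get(value)
        out[name] = (value, label)
    return out


# A made-up raw record in which AGE holds the declared missing code 99.
cleaned = apply_setup({"AGE": "99", "SEX": "2"})
```

Here the AGE code 99 is converted to None because it is declared
missing, while the SEX code 2 is kept and paired with its value
label.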