What kind of data formats does the archive distribute? Do you have SPSS Portable files? SAS transport? Stata?
We primarily distribute data files in eight data formats: three plain text formats (column-delimited
ASCII, comma-delimited ASCII, and tab-delimited ASCII), two SAS formats (SAS XPORT and CPORT files), two
SPSS formats (SPSS SAV and portable files), and the single Stata data format. Virtually every data
file is available in a plain text format. We also supply many data files in one or more of the other
formats.
Column-, comma-, and tab-delimited ASCII data files store data, including
numeric values, as lines of plain text, with one or more lines per observation (or subject or case). In the
plain text format, every character of text--each digit, letter, or other symbol--is encoded in a separate byte
in the data file. Thus, the number 133.5 occupies five bytes, the number 8 just one byte, and the string
"computer programmer" requires nineteen bytes. Many of ICPSR's plain text data files are encoded with the
ASCII character encoding system. However, some use other encodings, such as IBM PC code page 437, which
is based on ASCII but supports more characters than ASCII does. Most use the ASCII-based ISO 8859-1 or
In all three types of plain text data files, the line(s) allocated to a given observation contain the
observation's values for the file's variables. What sets the three types apart is the way the values are
demarcated on the lines.
In a column-delimited ASCII data file, each variable occupies the same byte(s) on every
observation. The bytes are usually called "columns," hence the name of this data format. For example, if a
file with one line per observation has just three variables which occupy three bytes each, then the first
variable would be located in columns 1-3, the second in columns 4-6, and the third in columns 7-9 on each
line in the data file.
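
As a minimal sketch of the layout just described, the following Python snippet slices one fixed-column line
into its three variables. The sample line and the variable names (var1, var2, var3) are made up for
illustration and do not come from any ICPSR file; a real file's column locations are given in its
documentation or setups.

```python
# Hypothetical layout: three variables of three columns each
# (columns 1-3, 4-6, and 7-9). Columns are 1-based and inclusive.
LAYOUT = {"var1": (1, 3), "var2": (4, 6), "var3": (7, 9)}


def parse_fixed_line(line, layout=LAYOUT):
    """Return a dict mapping each variable name to the text in its columns."""
    record = {}
    for name, (start, end) in layout.items():
        # Convert 1-based inclusive columns to a 0-based Python slice,
        # then drop the padding blanks around the value.
        record[name] = line[start - 1:end].strip()
    return record


sample = "133 42  7"  # made-up observation, nine columns wide
print(parse_fixed_line(sample))
```

Values shorter than their allotted columns are padded with blanks, which is why the sketch strips each
slice before storing it.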
Because importing column-delimited ASCII data files into statistical packages for analysis requires
programming expertise, ICPSR usually provides programs, called "setups," that read the files into SAS,
SPSS, or Stata. The setups also assign variable labels and usually assign value labels and define missing
values as well.
In a comma-delimited ASCII data file, the data values are separated with commas instead
of being located in fixed column locations. Thus, in this format, the length of each line varies with the
number of characters in the line's data values. For example, the first two lines of a four-variable data file
could look like this:
As with the column-delimited ASCII files, ICPSR usually provides setups to read the comma-delimited ASCII
files into SAS, SPSS, or Stata.
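
Outside of SAS, SPSS, and Stata, comma-delimited files can also be read with general-purpose tools. The
sketch below uses Python's standard csv module on two made-up lines of a four-variable file. The values are
illustrative only and are not taken from an ICPSR data file, and the function name is hypothetical.

```python
import csv
import io

# Two made-up observations with four variables each.
# The lines differ in length because the values differ in length.
sample = "1,133.5,8,computer programmer\n2,7,12,teacher\n"


def read_comma_delimited(text):
    """Return a list of rows; each row is a list of the strings on one line."""
    return list(csv.reader(io.StringIO(text)))


rows = read_comma_delimited(sample)
for row in rows:
    print(row)
```

Every value comes back as a string, so numeric variables still need to be converted before analysis.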
Tab-delimited ASCII data files are the same as comma-delimited ASCII files except that
values are delimited with a special tab control character instead of a comma. Most of these files were
created by ICPSR for use with spreadsheets, such as Excel, into which they can be easily imported. These
files can also be read into statistical packages like SAS, SPSS, and Stata. However, ICPSR rarely provides
setups for that purpose.
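
Since setups are rarely supplied for these files, a short script is one way to inspect them. This is a
minimal sketch using Python's standard csv module with the tab character as the delimiter. The sample
values and the function name are made up for illustration.

```python
import csv
import io

# Made-up observations; "\t" is the tab control character that separates values.
sample = "1\t133.5\t8\n2\t7\t12\n"


def read_tab_delimited(text):
    """Return a list of rows, splitting each line on tab characters."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


rows = read_tab_delimited(sample)
print(rows)
```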
We distribute two SAS data formats: SAS transport files generated by the SAS
CPORT procedure and SAS transport files written by the SAS XPORT engine. Both types of files contain
specially formatted SAS data sets, which contain variable labels as well as data. Many of ICPSR's SAS CPORT
files also include SAS format catalogs with value labels.
SAS CPORT files should be imported into SAS with the SAS CIMPORT procedure.
Since SAS has an engine that reads SAS XPORT files, they can be read by any SAS command that can read an
ordinary SAS data set, such as the SAS set statement or the SAS FREQ procedure. SAS XPORT files can also
be converted to standard SAS data sets with the SAS COPY procedure.
We distribute two types of SPSS data files: SPSS SAV files written by the SPSS
save command and SPSS portable files written by the SPSS export command. Both types of data files include
variable labels and usually include value labels and missing value definitions.
To load SPSS SAV files into SPSS, use the SPSS get command.
To read SPSS portable files into SPSS, use the SPSS import command.
Like the SAS and SPSS formats, Stata's proprietary data file format, which is
written by the Stata save command, is platform independent. Our Stata data files include variable labels
and usually include value labels too.
Stata data files should be loaded into Stata with the Stata use command.
Using ASCII data and setup files