How do I interpret a record from an ASCII data file?
Our data files are usually distributed as columnar ASCII files that consist of rows and columns of alphanumeric characters. Since ASCII data files are simply text files, they can be opened in any word processing program or Internet browser. However, the alphanumeric characters are not meaningful without the help of a codebook or setup files to identify the columns of the ASCII data file as particular variables.
This example illustrates how to interpret an ASCII data file for ICPSR 2737, Capital Punishment in the United States, 1973-1997.
The data file consists of 6,819 cases, or observations, which in this example are inmates who were under sentence of death or who were executed. Example 1 shows the first 10 lines of data in this file. The first observation, or line of data, is highlighted in red.
Example 1: The first case or line of data in the data file
The data file is a fixed-format data file with a logical record length of 81. This means that each line consists of exactly 81 characters. These 81 characters hold 37 variables, or data items. Example 2 illustrates that each line of data in the file is 81 characters long.
Example 2: Each record is the same length (81 characters wide)
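Because every record should be exactly 81 characters long, a quick length check can catch truncated or damaged lines before any variables are read. The following is a minimal Python sketch. The function name `check_record_lengths` is hypothetical, and it assumes that line-ending characters do not count toward the 81 and that trailing blanks have not been stripped from the file.

```python
RECORD_LENGTH = 81  # logical record length for ICPSR 2737, per the text above


def check_record_lengths(lines, expected=RECORD_LENGTH):
    """Return (line number, length) pairs for records that are not `expected` characters long.

    Assumption: line endings are excluded from the count and trailing blanks are intact.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")  # drop only the line terminator, not data blanks
        if len(record) != expected:
            problems.append((number, len(record)))
    return problems


if __name__ == "__main__":
    sample = ["x" * 81, "y" * 80]  # made-up lines, not real data
    print(check_record_lengths(sample))  # [(2, 80)]
```

To check a real copy of the file, pass an open file object instead, for example `with open(path) as f: print(check_record_lengths(f))`, where `path` is wherever you saved the data file.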
To know which columns make up a particular variable, refer to the codebook (PDF). The following examples illustrate how to read the first ten variables from this ASCII data file, beginning with the first record (row) and counting columns from left to right:
V1-ICPSR STUDY NUMBER: This variable is positioned in column locations 1 through 4 and contains the value "2737" for each record. This value represents the 4-digit ICPSR archival study number assigned to this data collection.
Example 3: Variable 1 in Columns 1-4
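Codebook column positions are numbered from 1 and include both endpoints, whereas Python string slices start at 0 and exclude the stop position. A small helper (hypothetical, not part of any ICPSR tool) makes that conversion explicit, so that columns 1 through 4 become the slice `record[0:4]`.

```python
def field(record: str, start: int, end: int) -> str:
    """Return the characters in codebook columns start..end (1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    # Shift the start down by one; the inclusive end already equals Python's exclusive stop.
    return record[start - 1:end]


if __name__ == "__main__":
    record = "273711"  # hypothetical start of a record: V1 = 2737, V2 = 1, V3 = 1
    print(field(record, 1, 4))  # 2737
    print(field(record, 5, 5))  # 1
```

Single-column variables such as V2 (column 5) and V3 (column 6) simply use the same start and end column.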
V2-ICPSR EDITION NUMBER: This variable is positioned in column location 5 and contains the value "1" for each record. This value represents the ICPSR edition number assigned to the data collection.
Example 4: Variable 2 in Column 5
V3-ICPSR PART NUMBER: This variable is positioned in column location 6 and contains the value "1" for each record. This value represents the ICPSR part number assigned to the data file within the data collection.
Example 5: Variable 3 in Column 6
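Because V1 through V3 hold the same values on every record, columns 1 through 6 should read `273711` throughout the file, which makes them a cheap way to confirm you have the right study, edition, and part. A sketch, assuming each record is available as a Python string; `check_study_identifiers` is a hypothetical name.

```python
def check_study_identifiers(records, study=2737, edition=1, part=1):
    """Return the 1-based numbers of records whose columns 1-6 differ from the expected IDs."""
    # V1 occupies columns 1-4, V2 column 5, and V3 column 6 (0-based slice [0:6]).
    expected = f"{study:04d}{edition:1d}{part:1d}"
    return [n for n, record in enumerate(records, start=1) if record[0:6] != expected]


if __name__ == "__main__":
    records = ["273711...", "273721..."]  # made-up fragments; the second has edition 2
    print(check_study_identifiers(records))  # [2]
```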
V4-ICPSR SEQUENTIAL ID: This variable is positioned in column locations 7 through 10 and contains the value "1" for the first record. This sequential case identification number uniquely identifies each record in the data file.
Example 6: Variable 4 in Columns 7-10
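A four-column field such as V4 holds the value "1" for the first record, so the number must be padded to fill the field. The walkthrough does not say whether the padding is blanks or zeros, so the hypothetical helper below accepts either. It also treats an all-blank field as missing, which is an assumption to verify against the codebook.

```python
from typing import Optional


def to_int(text: str) -> Optional[int]:
    """Convert a padded numeric field to an int; return None if the field is all blanks."""
    stripped = text.strip()  # remove leading/trailing blank padding
    # int() also accepts leading zeros in a string, such as "0001".
    return int(stripped) if stripped else None


if __name__ == "__main__":
    print(to_int("   1"), to_int("0001"), to_int("    "))  # 1 1 None
```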
V5-REPORT YEAR: This variable is positioned in column locations 11 through 14 and represents the reporting year. The first record, highlighted in red, contains the value "0", which represents a reporting year prior to 1973. The fifth record, also highlighted in red, contains the value "1973", which represents the actual year of the event.
Example 7: Variable 5 in Columns 11-14
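Because V5 uses 0 as a code rather than as a literal year, the value should be decoded before it is used in any date arithmetic. A minimal sketch covering only the two cases described above; any other special codes would have to come from the codebook.

```python
def describe_report_year(value: int) -> str:
    """Translate a V5 REPORT YEAR value into readable text."""
    if value == 0:
        return "prior to 1973"  # 0 is a code, not a calendar year
    return str(value)  # otherwise the value is the actual year


if __name__ == "__main__":
    print(describe_report_year(0))     # prior to 1973
    print(describe_report_year(1973))  # 1973
```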
V6-INMATE ID: This variable is positioned in column locations 15 through 18 and contains the value "8" for the first record. This value is the inmate identification number, which can be up to four digits long.
Example 8: Variable 6 in Columns 15-18
V7-STATE: This variable is positioned in column locations 19 through 20 and contains the value "1" for all 10 records in this example. This value represents the FIPS state code for Alabama.
Example 9: Variable 7 in Columns 19-20
V8-Q3 SEX: This variable is positioned in column location 21 and contains the value "1" for the first 10 records. This code identifies the sex of these inmates as "male".
Example 10: Variable 8 in Column 21
V9-Q4A RACE: This variable is positioned in column 22 and contains the value "2" for the first record. This code identifies the race of this inmate as "Black".
Example 11: Variable 9 in Column 22
V10-HISPANIC ORIGIN: This variable is positioned in column 23 and contains the value "2" for the first record. This code identifies the Hispanic origin of this inmate as "Non-Hispanic".
Example 12: Variable 10 in Column 23
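V7 through V10 store numeric codes that become meaningful only after they are matched against the codebook's value labels. The lookup below holds just the codes explained in this walkthrough (the dictionary and function names are hypothetical). Codes it does not contain are returned with a pointer back to the codebook rather than guessed.

```python
# Value labels taken only from the examples above; the codebook lists the full code sets.
VALUE_LABELS = {
    "V7": {1: "Alabama"},        # STATE (FIPS code)
    "V8": {1: "Male"},           # Q3 SEX
    "V9": {2: "Black"},          # Q4A RACE
    "V10": {2: "Non-Hispanic"},  # HISPANIC ORIGIN
}


def label(variable: str, code: int) -> str:
    """Return the text label for a code, or a placeholder if it is not in the table."""
    return VALUE_LABELS.get(variable, {}).get(code, f"code {code} (see codebook)")


if __name__ == "__main__":
    print(label("V9", 2))  # Black
    print(label("V9", 1))  # code 1 (see codebook)
```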
To locate the column positions for the remaining variables for this study, see the codebook for CAPITAL PUNISHMENT IN THE UNITED STATES, 1973-1997.
As this example shows, interpreting data records visually is inefficient. Commercial statistical software packages such as SAS, SPSS, and Stata can read data files like this one and subset the variables and/or cases as needed.
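Statistical packages automate the work by applying a column layout, like the one described in the codebook, to every record. The Python sketch below does the same for the first ten variables only. It assumes that all ten fields are numeric and blank-padded. The `LAYOUT` list and the sample record are constructed for illustration from the values described above and are not copied from the actual file.

```python
# (name, first column, last column): 1-based and inclusive, as in the codebook.
LAYOUT = [
    ("V1", 1, 4),     # ICPSR study number
    ("V2", 5, 5),     # ICPSR edition number
    ("V3", 6, 6),     # ICPSR part number
    ("V4", 7, 10),    # ICPSR sequential ID
    ("V5", 11, 14),   # report year (0 = prior to 1973)
    ("V6", 15, 18),   # inmate ID
    ("V7", 19, 20),   # state (FIPS code)
    ("V8", 21, 21),   # sex
    ("V9", 22, 22),   # race
    ("V10", 23, 23),  # Hispanic origin
]


def parse_record(record: str) -> dict:
    """Split one fixed-width record into named fields, converting each to int (blank -> None)."""
    values = {}
    for name, start, end in LAYOUT:
        text = record[start - 1:end].strip()  # 1-based inclusive -> 0-based half-open slice
        values[name] = int(text) if text else None
    return values


def parse_file(lines):
    """Parse every non-blank line; assumes one 81-character record per line."""
    return [parse_record(line.rstrip("\r\n")) for line in lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical record built from the values described above, padded to 81 columns.
    sample = ("2737" "1" "1" "   1" "   0" "   8" " 1" "1" "2" "2").ljust(81)
    print(parse_record(sample))
    # {'V1': 2737, 'V2': 1, 'V3': 1, 'V4': 1, 'V5': 0, 'V6': 8, 'V7': 1, 'V8': 1, 'V9': 2, 'V10': 2}
```

If pandas is available, `pandas.read_fwf` accepts a `colspecs` argument that serves the same purpose. Its intervals are 0-based and half-open, so columns 1 through 4 are written as `(0, 4)`.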
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.