How do I interpret a record from an ASCII data file?

Our data files are usually distributed as columnar ASCII files that consist of rows and columns of alphanumeric characters. Since ASCII data files are simply text files, they can be opened in any word processing program or Internet browser. However, the alphanumeric characters are not meaningful without the help of a codebook or setup files to identify the columns of the ASCII data file as particular variables.

This example illustrates how to interpret an ASCII data file for ICPSR 2737, Capital Punishment in the United States, 1973-1997.

The data file consists of 6,819 cases or observations, which in this example are inmates under sentence of death or inmates who were executed. Example 1 shows the first 10 lines of data in this file. The first observation, or line of data, is highlighted in red.

Example 1: The first case or line of data in the data file

Screen shot of columns of numbers, first row highlighted in red

The data file is a fixed-format data file with a logical record length of 81. This means that each line consists of exactly 81 characters. These 81 characters correspond to 37 variables or data items. Example 2 illustrates that each line of data in the file is 81 characters long.
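The record-length property can also be checked programmatically. The following is a minimal Python sketch, not part of the original documentation: the function name `check_record_lengths` is hypothetical, and the sample lines are placeholders rather than records from the actual study file.

```python
def check_record_lengths(lines, expected=81):
    """Return (line_number, length) for each record whose width is not `expected`."""
    mismatches = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator; trailing blanks are part of the record.
        record = line.rstrip("\r\n")
        if len(record) != expected:
            mismatches.append((number, len(record)))
    return mismatches


# Placeholder lines (not real data): two full-width records and one short one.
sample = ["x" * 81 + "\n", "y" * 81 + "\n", "z" * 80 + "\n"]
print(check_record_lengths(sample))  # prints [(3, 80)]
```

With a real file, the same function could be given an open file object, for example `check_record_lengths(open(path))`, where `path` is wherever the data file was saved. An empty result means every record has the expected width.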

Example 2: Each record is the same length (81 characters wide)

Screen shot of columns of numbers, first and last columns highlighted in yellow

In order to know which columns comprise particular variables, it is necessary to refer to the codebook (PDF 234K). The following examples illustrate how to read the first ten variables from this ASCII data file, beginning with the first record (row) and counting from left to right:

VARIABLE 1

V1-ICPSR STUDY NUMBER: This variable is positioned in column locations 1 through 4 and contains the value "2737" for each record. This value represents the 4-digit ICPSR archival study number assigned to this data collection.

Example 3: Variable 1 in Columns 1-4

Screen shot of columns of numbers, first four characters highlighted in yellow

VARIABLE 2

V2-ICPSR EDITION NUMBER: This variable is positioned in column location 5 and contains the value "1" for each record. This value represents the ICPSR edition number assigned to the data collection.

Example 4: Variable 2 in Column 5

Screen shot of columns of numbers, fifth character in each row highlighted in yellow

VARIABLE 3

V3-ICPSR PART NUMBER: This variable is positioned in column location 6 and contains the value "1" for each record. This value represents the ICPSR part number assigned to the data file within the data collection.

Example 5: Variable 3 in Column 6

Screen shot of columns of numbers, sixth character in each row highlighted in yellow

VARIABLE 4

V4-ICPSR SEQUENTIAL ID: This variable is positioned in column locations 7 through 10 and contains the value "1" for the first record. This value represents the first sequential case identification number and is used to uniquely identify a given record in the data file.

Example 6: Variable 4 in Columns 7-10

Screen shot of columns of numbers, second column highlighted in yellow

VARIABLE 5

V5-REPORT YEAR: This variable is positioned in column locations 11 through 14 and represents the reporting year. The first record, highlighted in red, contains the value "0", which represents a reporting year prior to 1973. The fifth record, also highlighted in red, contains the value "1973", which represents the actual year of the event.

Example 7: Variable 5 in Columns 11-14

Screen shot of columns of numbers, third column highlighted in yellow

VARIABLE 6

V6-INMATE ID: This variable is positioned in column locations 15 through 18 and contains the value "8" for the first record. This value represents a four-digit inmate identification number.

Example 8: Variable 6 in Columns 15-18

Screen shot of columns of numbers, fourth column highlighted in yellow

VARIABLE 7

V7-STATE: This variable is positioned in column locations 19 through 20 and contains the value "1" for all 10 records in this example. This value represents the FIPS state code for Alabama.

Example 9: Variable 7 in Columns 19-20

Screen shot of columns of numbers, first character of fifth column highlighted in yellow

VARIABLE 8

V8-Q3 SEX: This variable is positioned in column location 21 and contains the value "1" for the first 10 records. This code identifies the sex of these inmates as "male".

Example 10: Variable 8 in Column 21

Screen shot of columns of numbers, second character of fifth column highlighted in yellow

VARIABLE 9

V9-Q4A RACE: This variable is positioned in column 22 and contains the value "2" for the first record. This code identifies the race of this inmate as "Black".

Example 11: Variable 9 in Column 22

Screen shot of columns of numbers, third character of fifth column highlighted in yellow

VARIABLE 10

V10-HISPANIC ORIGIN: This variable is positioned in column 23 and contains the value "2" for the first record. This code identifies the Hispanic origin of this inmate as "Non-Hispanic".

Example 12: Variable 10 in Column 23

Screen shot of columns of numbers, fourth character of fifth column highlighted in yellow
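The column locations described above can be turned into a small parser. The following Python sketch is illustrative only. The names `LAYOUT`, `LABELS`, and `parse_record` are hypothetical, and the sample record is reconstructed from the first-record values quoted above. It assumes right-justified, blank-padded fields, and columns 24 through 81 are left blank because their contents are not described here. Only the code labels stated in this walkthrough are included.

```python
# Column locations (1-based, inclusive) for the first ten variables.
LAYOUT = [
    ("V1", 1, 4),     # ICPSR study number
    ("V2", 5, 5),     # ICPSR edition number
    ("V3", 6, 6),     # ICPSR part number
    ("V4", 7, 10),    # ICPSR sequential ID
    ("V5", 11, 14),   # report year
    ("V6", 15, 18),   # inmate ID
    ("V7", 19, 20),   # state (FIPS code)
    ("V8", 21, 21),   # sex
    ("V9", 22, 22),   # race
    ("V10", 23, 23),  # Hispanic origin
]

# Only the code meanings mentioned in the walkthrough; the codebook has the full lists.
LABELS = {
    "V7": {"1": "Alabama"},
    "V8": {"1": "male"},
    "V9": {"2": "Black"},
    "V10": {"2": "Non-Hispanic"},
}


def parse_record(record, layout=LAYOUT):
    """Slice one fixed-width record into a dict of variable name -> text value."""
    values = {}
    for name, start, end in layout:
        # Convert 1-based inclusive columns to a 0-based Python slice.
        values[name] = record[start - 1:end].strip()
    return values


# Illustrative first record (assumed padding; columns 24-81 left blank).
first_record = ("2737" "1" "1" "   1" "   0" "   8" " 1" "1" "2" "2").ljust(81)

parsed = parse_record(first_record)
print(parsed["V1"], parsed["V4"], parsed["V5"])  # prints 2737 1 0
print(LABELS["V9"][parsed["V9"]])                # prints Black
```

Keeping the layout as data rather than hard-coding slices means that the remaining variables can be added by appending their codebook column locations to the list.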

To locate the column positions for the remaining variables for this study, see the codebook for CAPITAL PUNISHMENT IN THE UNITED STATES, 1973-1997.

This example illustrates that interpreting a data record visually is inefficient. Commercial statistical software packages such as SAS, SPSS, and Stata can read the data file and subset the variables and/or cases as needed.
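To show what subsetting variables and cases means in practice, here is a minimal sketch in Python rather than in one of the packages named above. It is an illustration under stated assumptions, not a replacement for the setup files. The names `COLUMNS`, `subset`, and `make_record` are hypothetical, and the records are invented placeholders that follow the column locations for V4, V5, and V9.

```python
# Column locations (1-based, inclusive) for the variables used in this sketch.
COLUMNS = {"V4": (7, 10), "V5": (11, 14), "V9": (22, 22)}


def subset(lines, keep, where):
    """Keep the variables in `keep` for every record that satisfies `where`."""
    rows = []
    for line in lines:
        record = line.rstrip("\r\n")
        row = {
            name: record[start - 1:end].strip()
            for name, (start, end) in COLUMNS.items()
        }
        if where(row):
            rows.append({name: row[name] for name in keep})
    return rows


def make_record(seq, year, race):
    """Build a placeholder 81-character record (hypothetical values, not real data)."""
    head = "2737" + "1" + "1" + f"{seq:>4}" + f"{year:>4}" + "   8" + " 1" + "1" + f"{race}" + "2"
    return head.ljust(81)


records = [make_record(1, 0, 2), make_record(2, 1973, 1), make_record(3, 1973, 2)]

# Select cases with report year 1973 and keep only the sequential ID and race.
result = subset(records, keep=["V4", "V9"], where=lambda row: row["V5"] == "1973")
print(result)  # prints [{'V4': '2', 'V9': '1'}, {'V4': '3', 'V9': '2'}]
```

The statistical packages perform the same two steps, selecting cases by a condition and keeping a list of variables, but they also apply the variable and value labels supplied in the setup files.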

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
