[DDI-SRG] long and short integers

Joachim Wackerow joachim.wackerow at gesis.org
Thu Aug 2 02:35:48 EDT 2007


Regarding data types in DDI, the description of a data type serves two
purposes:

1. data type (or better: data description) on a general level
2. data type in terms of the statistical package, database system, or
programming language used

The second purpose seems to be important when describing data stored in
a specific form such as an SPSS system file, a DBMS, etc. It would be
used only by a future system-specific DDI sub-module.

The first purpose is important for describing data stored in a form
that is not specific to a particular package, such as rectangular (in
the logical sense) ASCII or Unicode files. The most important
characteristics are the number of decimal places for numeric variables
and the length for strings; minimum and maximum are others. Ideally, a
generator (e.g. DDI to SPSS, Stata, SQL) should be able to select the
most appropriate data types of a specific package/program when writing
data in the related form.
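
As a rough illustration of such a generator, the following Python
sketch derives an SQL column type from the general characteristics
alone. The class VariableDescription, its field names, and the function
sql_column_type are hypothetical and not part of DDI; the choice of SQL
types is one plausible convention, not a recommendation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableDescription:
    """General, package-independent characteristics (hypothetical names)."""
    name: str
    kind: str                        # "numeric" or "string"
    decimals: int = 0                # number of decimal places (numeric only)
    length: Optional[int] = None     # maximum length in characters (string only)
    minimum: Optional[float] = None  # smallest value (numeric only)
    maximum: Optional[float] = None  # largest value (numeric only)


def sql_column_type(var: VariableDescription) -> str:
    """Pick an SQL column type from the general characteristics only."""
    if var.kind == "string":
        if var.length is None:
            raise ValueError(f"{var.name}: string variables need a length")
        return f"VARCHAR({var.length})"
    if var.kind != "numeric":
        raise ValueError(f"{var.name}: unknown kind {var.kind!r}")

    bounds_known = var.minimum is not None and var.maximum is not None
    if var.decimals == 0:
        # Use a 32-bit INTEGER only if the documented range is known to fit.
        if bounds_known and -2**31 <= var.minimum and var.maximum <= 2**31 - 1:
            return "INTEGER"
        return "BIGINT"
    if not bounds_known:
        # Without a range the precision cannot be derived.
        return "DOUBLE PRECISION"
    largest = max(abs(var.minimum), abs(var.maximum))
    integer_digits = len(str(int(largest)))
    return f"DECIMAL({integer_digits + var.decimals}, {var.decimals})"
```

For instance, a variable with two decimal places and a range of 0 to
99999.99 would be written as DECIMAL(7, 2) under these assumptions.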

For example, a generator can select a short or long integer type in
Stata depending on the minimum and maximum value of a variable. Stata
itself has an optimizer for data types which can be used when space is
an issue: it analyzes the data and selects the most appropriate data
type. Similarly, a generic converter can be constructed which uses the
general characteristics of a variable or, if the metadata are not
reliable, derives these characteristics from an analysis of the data.
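
A minimal Python sketch of this selection follows. The function names
stata_integer_type and observed_range are hypothetical. The ranges are
the storable ranges of Stata's byte, int and long types as I recall
them from the Stata documentation; they should be checked against the
manual before being relied on. Falling back to double for larger values
is an assumption of this sketch.

```python
from typing import Iterable, Optional, Tuple

# Storable ranges of Stata's integer storage types, smallest type first
# (values as recalled from the Stata documentation; verify before use).
STATA_INTEGER_RANGES = [
    ("byte", -127, 100),
    ("int", -32_767, 32_740),
    ("long", -2_147_483_647, 2_147_483_620),
]


def stata_integer_type(minimum: int, maximum: int) -> str:
    """Return the smallest Stata integer type that holds [minimum, maximum]."""
    for name, low, high in STATA_INTEGER_RANGES:
        if low <= minimum and maximum <= high:
            return name
    # Assumption: values outside the range of long are stored as double.
    return "double"


def observed_range(values: Iterable[Optional[int]]) -> Tuple[int, int]:
    """Derive minimum and maximum from the data, skipping missing values (None)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-missing values to analyze")
    return min(present), max(present)
```

With reliable metadata the generator would call
stata_integer_type(minimum, maximum) directly. If the metadata are not
trusted, stata_integer_type(*observed_range(values)) derives the range
from the data first.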

The question is: does it really make sense to build system-specific
sub-modules in DDI, or does this belong more in the field of
applications? I know that, in terms of a workflow, system-specific DDI
sub-modules seem attractive. But designing these modules can cause a
lot of work, and that work depends on these systems. I would prefer to
make a distinction between the system-specific data types and a
system-specific application module (not DDI) which uses these data
types. The system-specific data types should be stored optionally, in
addition to the general characteristics of a variable, in the physical
product.

We need to identify the general characteristics, which has probably
already been done. We should carefully review whether these
characteristics are really comprehensive enough to derive the
system-specific data types from them.

In addition, a mapping table between the system-specific data types
can be useful. This table can probably describe only the core data
types and will never be complete. Nevertheless, it could perhaps be
included in the user guide.
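
To make the idea concrete, such a mapping table could look roughly like
the Python sketch below. The pairings between Stata and SQL types are
illustrative assumptions of mine, not an agreed DDI mapping, and the
names MAPPING_TABLE and translate are hypothetical. The table is
deliberately incomplete, so a lookup may find nothing.

```python
from typing import Optional

# Each row pairs roughly equivalent core data types across systems.
# Illustrative pairings only; a real table would need careful review.
MAPPING_TABLE = [
    {"stata": "int", "sql": "SMALLINT"},
    {"stata": "long", "sql": "INTEGER"},
    {"stata": "float", "sql": "REAL"},
    {"stata": "double", "sql": "DOUBLE PRECISION"},
]


def translate(type_name: str, source: str, target: str) -> Optional[str]:
    """Look up the counterpart of a type in another system, or None if unmapped."""
    for row in MAPPING_TABLE:
        if row.get(source) == type_name:
            return row.get(target)
    return None
```

Returning None for unmapped types keeps the incompleteness explicit: an
application can then fall back to the general characteristics of the
variable instead of guessing.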

Achim

