[DDI-SRG] [DDI-CDG] The dataset movie (fwd)
Reto Hadorn
reto.hadorn at sidos.unine.ch
Mon Jun 13 09:50:42 EDT 2005
Hi I-Lin,
At 11.06.2005, you wrote:
>Hi Reto,
>
>I have a couple of questions about the movie, some of which may overlap
>with the ones I asked you at IASSIST
>
>1. What is a "cat" in the sense of "This is a cat" remarks on slide 44, 91?
In his introductory talk at the conference, Jowell compared coordinating a
cross-national survey program with herding cats. In the context of the
conference, this was understandable for those who had heard that joke; in a
publication outside that context, it has to be explained or removed.
Thank you for mentioning it.
>2. The linking concept for comparisons works well in the case that you
>analyzed, the time and geography dimensions in the repeated cross-national
>survey with a standard. Looking at it more closely, it works well for the
>following reasons:
> a. In the space dimension, there is a standard that all the studies are
>compared to. Thus the pairwise linking concept works.
> b. In the time dimension, there is a natural ordering via time, so no
>standard is necessary in this case, unlike the geography. The natural time
>ordering determines a natural direction for the linking.
>
> - Q: What happens in the space dimension when there is no standard? In that
>case, what should the link source and targets be (and in what direction?), or
>is a different comparison mechanism more appropriate, perhaps a bidirectional
>link or some other comparison mechanism? Perhaps this is Ingo's assignment?
Your question refers to the so-called 'harmonization study', where
uncoordinated studies are compared. Since there is no standard, you will
not enter one, nor will you have information about variations on a
standard. You will not need them either. Having defined
- a 'harmonization study' (corresponding to the 'program study' in the
comparative-by-design case)
- references from the single studies, which are candidates for the
integration work, to the harmonization study (equivalent to the references
from the country study descriptions to the program study description in the
comparative-by-design case)
- a harmonized dataset, for the case harmonization would work...
(equivalent to the integrated dataset in the cross-national case)
- a harmonized variable in the harmonized dataset - just a name and an
identifier (the equivalent of a harmonized variable in the ... case)
- references from the harmonized variables to the candidates for
integration (analogue to the references used for integration in the ... case)
the program should be able to confront you with all the information necessary
- to decide on the comparability of the variables involved, on any level:
variable, question, data collection method, sample design, non-response
analysis, questionnaire, project summary etc. etc. and
- to decide on the appropriate definition of the harmonized variable (now,
you will have more than the name and the id)
- to document fully the choices made for this harmonization operation.
The rest will be done much as in the ... case (copying the harmonized
variable into the single datasets, computing at the single-dataset level,
and integrating into the harmonized dataset).
... Yes, you are right, you would not do that in the original datasets. At
a minimum, the program should create in the harmonization study a replica
of the original single datasets (id and reference to the original), which
will store the information about the variables as used for integration,
original or constructed. The construction will be stored in the reference
between those variables and the original ones in the original datasets. In
this manner, a harmonization study is constructed with minimal redundancy.
Actually, the variables are not compared directly, so there is no need for
any link between them. They are compared through their reference to the
same potentially harmonized variable.
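
To make the structure above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the actual program: all class, field, and method names (VariableRef, ReplicaVariable, HarmonizedVariable, HarmonizationStudy, add_candidate, compared_with) and the sample study and variable identifiers are hypothetical. It shows replica variables that keep a reference to the original plus the construction used, and candidates that are compared only through the harmonized variable they share.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class VariableRef:
    """Reference to a variable in an original, untouched dataset."""
    study_id: str
    variable_name: str


@dataclass
class ReplicaVariable:
    """Variable as used for integration, stored in the harmonization study."""
    name: str
    original: VariableRef
    construction: str = ""  # empty string: the original is used unchanged


@dataclass
class HarmonizedVariable:
    """Starts as just an id and a name; the definition is filled in later."""
    var_id: str
    name: str
    definition: str = ""
    candidates: List[ReplicaVariable] = field(default_factory=list)


@dataclass
class HarmonizationStudy:
    study_id: str
    harmonized: Dict[str, HarmonizedVariable] = field(default_factory=dict)

    def add_candidate(self, var_id: str, name: str,
                      candidate: ReplicaVariable) -> HarmonizedVariable:
        """Attach a candidate to a harmonized variable, creating it if needed."""
        hv = self.harmonized.setdefault(var_id, HarmonizedVariable(var_id, name))
        hv.candidates.append(candidate)
        return hv

    def compared_with(self, var_id: str) -> List[VariableRef]:
        """Original variables that meet only via the shared harmonized variable."""
        return [c.original for c in self.harmonized[var_id].candidates]


# Hypothetical example: two uncoordinated studies measuring age differently.
study = HarmonizationStudy("H1")
study.add_candidate("hv_age", "age",
                    ReplicaVariable("age", VariableRef("S1", "v12")))
study.add_candidate("hv_age", "age",
                    ReplicaVariable("age_from_birthyear",
                                    VariableRef("S2", "birthyr"),
                                    construction="survey_year - birthyr"))
```

Note that no link is stored between S1/v12 and S2/birthyr themselves; the originals are never modified, and the construction lives on the reference from the replica to the original.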
> - Q: You added additional coordinates "comp" and "integrated" on the time
>and space dimensions. What happens when there is more than one integrated
>dataset or semi-integrated (I think Eurobarometer produces some intermediate
>integrated datasets before the final?). I don't think these semi-integrated
>datasets should be necessarily given different time coordinates as then they
>lose their connection with the original one-time datasets.
If my understanding of your question and of the EB case is correct, this
would be a case for several versions of the integrated dataset. If you
don't think so, please define 'semi-integrated'.
> - Q: In slide 132 "The country datasets appear to grow like the branches of
>a christmas tree" I understand that for each variable, there is only at most
>one link whose source is that variable. This does not seem sufficient to
>capture all the comparisons information that might be desired. For example,
>if there is a variable V1 in T1/C3 which differs from the T1 standard but is
>identical with the variables V1 in T2/C3, T3/C3, T4/C3, that information is
>not captured in the implicit process which you have described, as adding
>those links would then result in a graph which no longer resembles a
>Christmas tree. And it seems to me you would want to capture that information
>in order to properly synthesize a cumulative time slice/longitudinal dataset
>for C3.
In the construct I describe, the similarity between several varying country
Q/V will appear as a similarity between the variations on the standard. The
information is there to be discovered. This similarity may play a role
a) while defining the candidate harmonized variables
b) while defining the computations on dataset level - maybe the code
defined for one can be used for the others of the group?
The question to answer first concerns the most appropriate way to define
that special kind of group:
- ... is it really necessary to store that commonality in the DB?
Let's suppose the answer is yes:
- should this similarity be stored as links between those variables
(quadratic growth of the number of links, since every pair needs one)
- should this be stored as a kind of group?
- should the group be defined on the Q/V themselves or on the links to the
standard, which show the similarity?
The answer depends upon what helps best for a and b above. More analysis is
needed here.
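
As a rough illustration of the last option (defining the group on the links to the standard), here is a small Python sketch. It assumes, hypothetically, that each link from a country Q/V to the standard carries a comparable description of the variation; the function names, the variation labels, and the variable identifiers are made up for the example. It also shows why direct links are costly: n similar variables need n(n-1)/2 pairwise links, against n links to the standard.

```python
from collections import defaultdict


def group_by_variation(links):
    """Group variables whose links to the standard describe the same variation.

    links: iterable of (variable_id, variation) pairs, one per link from a
    country variable to the standard. The variation label is hypothetical.
    """
    groups = defaultdict(list)
    for variable_id, variation in links:
        groups[variation].append(variable_id)
    return dict(groups)


def pairwise_link_count(n):
    """Number of direct links needed to connect n variables pairwise."""
    return n * (n - 1) // 2


# Hypothetical links: V1 in country C3 deviates the same way in four waves.
links = [
    ("T1/C3/V1", "5-point scale instead of 4"),
    ("T2/C3/V1", "5-point scale instead of 4"),
    ("T3/C3/V1", "5-point scale instead of 4"),
    ("T4/C3/V1", "5-point scale instead of 4"),
    ("T1/C1/V1", "filter question added"),
]
groups = group_by_variation(links)
```

Here the four C3 variables fall into one group without any stored link between them, whereas linking them directly would take pairwise_link_count(4) = 6 extra links.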
> So, it seems in order to capture this information, the process must be
>further complicated. Yes/no?
Why not? (sorry, I am reluctant to answer closed questions...)
Best wishes
Reto
>Quoting Wendy Thomas <wlt at pop.umn.edu>:
>
>>A number of you were at the Edinburgh conference and saw an abbreviated
>>version of the material provided here by Reto. He asked that I share this
>>with you.
>
>--
>I-Lin Kuo
>Programmer/Analyst II
>ICPSR