From reto.hadorn at sidos.unine.ch Thu Jun 9 08:40:31 2005
From: reto.hadorn at sidos.unine.ch (Reto Hadorn)
Date: Thu Jun 9 08:42:06 2005
Subject: [DDI-CDG] The dataset movie
Message-ID: <6.0.3.0.2.20050609112347.01ec9d10@webmail.unine.ch>

Hello colleagues,

Discussions in the comparative datasets group in Edinburgh have shown some difficulties in conceiving the documentation of the whole process of conducting a cross-national survey program over years. This is a requirement for the new metadata model, since we aim to cover the whole life-cycle of the data/metadata couple.

This discussion has also shown the difficulty of presenting the ideas available. The text I distributed is lengthy and abstract; the diagram is static...

So I reconsidered the conception of the presentation I had to make on Friday in the MetaDater session, to present the architecture as a process of construction rather than a finished house. As a medium, I used the movie, showing the various stages of construction.

Of course, a 20-minute presentation does not make it possible to show all the necessary details. So I have done some additional work to make the presentation more complete; some scenes will probably still be added in the future. I am distributing the rolls now, as they are, in the hope that they allow us to go deeper into the discussion and clarify what we need to have in the DDI.

The movie is not to be seen in isolation but rather together with the other documents describing the issue: the text document and the diagram already distributed. There are some inconsistencies now between the text, the diagram and the film, since every new presentation leads to a deeper understanding of the matter. I hope I will be able to make the necessary corrections to the text in the next week, so we get a better integrated package.

Oliver: I will also send this to Wendy, for distribution in the CDG.
Best wishes
Reto

Reto Hadorn
SIDOS
Swiss Information and Data Archive Service
For the Social Sciences, Neuchâtel
+41 32 721 20 03
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RCS_Movie.ppt
Type: application/octet-stream
Size: 1189376 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20050609/1aa7ccbf/RCS_Movie-0001.obj

From ikuo at umich.edu Sat Jun 11 09:22:32 2005
From: ikuo at umich.edu (I-Lin Kuo)
Date: Mon Jun 13 04:00:58 2005
Subject: [DDI-SRG] [DDI-CDG] The dataset movie (fwd)
In-Reply-To:
References:
Message-ID: <20050611092232.iay4q1zusgow4w4g@icpsr.mail.umich.edu>

Hi Reto,

I have a couple of questions about the movie, some of which may overlap with the ones I asked you at IASSIST

1. What is a "cat" in the sense of "This is a cat" remarks on slide 44, 91?

2. The linking concept for comparisons works well in the case that you analyzed, the time and geography dimensions in the repeated cross-national survey with a standard. Looking at it more closely, it works well for the following reasons:
   a. In the space dimension, there is a standard that all the studies are compared to. Thus the pairwise linking concept works.
   b. In the time dimension, there is a natural ordering via time, so no standard is necessary in this case, unlike the geography. The natural time ordering determines a natural direction for the linking.

- Q: What happens in the space dimension when there is no standard? In that case, what should the link source and targets be (and in what direction?), or is a different comparison mechanism more appropriate, perhaps a bidirectional link or some other comparison mechanism? Perhaps this is Ingo's assignment?

- Q: You added additional coordinates "comp" and "integrated" on the time and space dimensions. What happens when there is more than one integrated dataset or semi-integrated (I think Eurobarometer produces some intermediate integrated datasets before the final?).
I don't think these semi-integrated datasets should be necessarily given different time coordinates as then they lose their connection with the original one-time datasets.

- Q: In slide 132 "The country datasets appear to grow like the branches of a christmas tree" I understand that for each variable, there is only at most one link whose source is that variable. This does not seem sufficient to capture all the comparisons information that might be desired. For example, if there is a variable V1 in T1/C3 which differs from the T1 standard but is identical with the variables V1 in T2/C3, T3/C3, T4/C3, that information is not captured in the implicit process which you have described, as adding those links would then result in a graph which no longer resembles a Christmas tree. And it seems to me you would want to capture that information in order to properly synthesize a cumulative time slice/longitudinal dataset for C3.

So, it seems in order to capture this information, the process must be further complicated. Yes/no?

Quoting Wendy Thomas :

> A number of you were at the Edinburgh conference and saw an abbreviated
> version of the material provided here by Reto. He asked that I share this
> with you.

--
I-Lin Kuo
Programmer/Analyst II
ICPSR

From watteler at za.uni-koeln.de Mon Jun 13 04:33:26 2005
From: watteler at za.uni-koeln.de (Oliver Watteler)
Date: Mon Jun 13 04:35:29 2005
Subject: [DDI-CDG] Thanks for your help
Message-ID: <42AD44D6.3020408@za.uni-koeln.de>

To everyone participating in the DDI-CDG in Edinburgh!

I just wanted to take the opportunity to thank everyone for their active participation in the Comparative Data Group's meeting at IASSIST in Edinburgh. I consider it a success and I hope you agree.

I will hurry to make the meeting minutes and all material available to you (I was away on vacation the past two weeks).

Thanks again and hoping to continue the fruitful work,

yours
Oliver.
P.S.: Unfortunately I do not have Bill Bradley's private e-mail, but I want to contact him as well.

--
GESIS - Zentralarchiv für Empirische Sozialforschung (ZA)
Oliver Watteler, M.A.
Dokumentation und Archivierung / Documentation and Archiving
Bachemer Str. 40
50931 Köln
Tel.: ++49-221-47694-76
FAX : ++49-221-47694-44
Web : http://www.gesis.org/za

From reto.hadorn at sidos.unine.ch Mon Jun 13 09:50:42 2005
From: reto.hadorn at sidos.unine.ch (Reto Hadorn)
Date: Mon Jun 13 09:59:30 2005
Subject: [DDI-SRG] [DDI-CDG] The dataset movie (fwd)
In-Reply-To: <20050611092232.iay4q1zusgow4w4g@icpsr.mail.umich.edu>
References: <20050611092232.iay4q1zusgow4w4g@icpsr.mail.umich.edu>
Message-ID: <6.0.3.0.2.20050613144937.01ed5310@webmail.unine.ch>

Hi I-Lin,

At 11.06.2005, you wrote:
> Hi Reto,
>
> I have a couple of questions about the movie, some of which may overlap
> with the ones I asked you at IASSIST
>
> 1. What is a "cat" in the sense of "This is a cat" remarks on slide 44, 91?

In his introductory lecture, Jowell compared coordinating a cross-national survey program with herding cats. In the context of the conference, this was understandable for those who had heard that joke; in a publication out of that context, it has to be explained or removed. Thank you for mentioning it.

> 2. The linking concept for comparisons works well in the case that you
> analyzed, the time and geography dimensions in the repeated cross-national
> survey with a standard. Looking at it more closely, it works well for the
> following reasons:
>   a. In the space dimension, there is a standard that all the studies are
> compared to. Thus the pairwise linking concept works.
>   b. In the time dimension, there is a natural ordering via time, so no
> standard is necessary in this case, unlike the geography. The natural time
> ordering determines a natural direction for the linking.
>
> - Q: What happens in the space dimension when there is no standard?
> In that case, what should the link source and targets be (and in what
> direction?), or is a different comparison mechanism more appropriate,
> perhaps a bidirectional link or some other comparison mechanism? Perhaps
> this is Ingo's assignment?

Your question refers to the so-called 'harmonization study', where uncoordinated studies are compared. Since there is no standard, you will not enter one, nor will you have information about variations on the standard. Nor will you need it.

Having defined
- a 'harmonization study' (corresponding to the 'program study' in the comparative by design case)
- references from the single studies, which are candidates for the integration work, to the harmonization study (equivalent to the references from the country study descriptions to the program study description in the comparative by design case)
- a harmonized dataset, for the case harmonization would work... (equivalent to the integrated dataset in the cross-national case)
- a harmonized variable in the harmonized dataset - just a name and an identifier (the equivalent of a harmonized variable in the ... case)
- references from the harmonized variables to the candidates for integration (analogous to the references used for integration in the ... case)
the program should be able to confront you with all the information necessary
- to decide on the comparability of the variables involved, on any level: variable, question, data collection method, sample design, non-response analysis, questionnaire, project summary etc. etc. and
- to decide on the appropriate definition of the harmonized variable (now, you will have more than the name and the id)
- to document fully the choices made for this harmonization operation.

The rest will be done similarly as in the ... case (copy of the harmonized variable in the single datasets, computation on the single dataset level and integration into the harmonized dataset).

... Yes, you are right, you would not do that in the original datasets.
At a minimum, the program should create in the harmonization study a replica of the original single datasets (id and reference to the original), which will store the information about the variables as used for integration, original or constructed. The construction will be stored in the reference between those variables and the original ones in the original datasets. In this manner, a harmonization study is constructed with minimal redundancy.

Actually, the variables compared are not compared directly; there is no need for any link between them. They are compared through their reference to the same potentially harmonized variable.

> - Q: You added additional coordinates "comp" and "integrated" on the time
> and space dimensions. What happens when there is more than one integrated
> dataset or semi-integrated (I think Eurobarometer produces some
> intermediate integrated datasets before the final?). I don't think these
> semi-integrated datasets should be necessarily given different time
> coordinates as then they lose their connection with the original one-time
> datasets.

If my understanding of your question and of the EB case is correct, this would be a case for several versions of the integrated dataset. If you don't think so, please define 'semi-integrated'.

> - Q: In slide 132 "The country datasets appear to grow like the branches
> of a christmas tree" I understand that for each variable, there is only at
> most one link whose source is that variable. This does not seem sufficient
> to capture all the comparisons information that might be desired. For
> example, if there is a variable V1 in T1/C3 which differs from the T1
> standard but is identical with the variables V1 in T2/C3, T3/C3, T4/C3,
> that information is not captured in the implicit process which you have
> described, as adding those links would then result in a graph which no
> longer resembles a Christmas tree.
> And it seems to me you would want to capture that information in order to
> properly synthesize a cumulative time slice/longitudinal dataset for C3.

In the construct I describe, the similarity between several varying country Q/V will appear as a similarity between the variations on the standard. The information is there to be discovered. This similarity may play a role
a) while defining the candidate harmonized variables
b) while defining the computations on dataset level - maybe the code defined for one can be used for the others of the group?

The question to answer first concerns the most appropriate way to define that special kind of group:
- ... is it really necessary to store that commonality in the DB?
Let's suppose the answer is yes:
- should this similarity be stored as links between those variables (rapid growth of the number of links)?
- should this be stored as a kind of group?
- should the group be defined on the Q/V themselves or on the links to the standard, which show the similarity?

The answer depends upon what helps best for a and b above. More analysis is needed here.

> So, it seems in order to capture this information, the process must be
> further complicated. Yes/no?

Why not? (sorry, I am reluctant to answer closed questions...)

Best wishes
Reto

> Quoting Wendy Thomas :
>
>> A number of you were at the Edinburgh conference and saw an abbreviated
>> version of the material provided here by Reto. He asked that I share this
>> with you.
>
> --
> I-Lin Kuo
> Programmer/Analyst II
> ICPSR
> _______________________________________________
> DDI-CDG mailing list
> DDI-CDG@icpsr.umich.edu
> http://www.icpsr.umich.edu/mailman/listinfo/ddi-cdg

From ikuo at umich.edu Mon Jun 20 08:44:41 2005
From: ikuo at umich.edu (I-Lin Kuo)
Date: Mon Jun 20 08:48:06 2005
Subject: [DDI-SRG] [DDI-CDG] The dataset movie (fwd)
In-Reply-To: <6.0.3.0.2.20050613144937.01ed5310@webmail.unine.ch>
References: <20050611092232.iay4q1zusgow4w4g@icpsr.mail.umich.edu> <6.0.3.0.2.20050613144937.01ed5310@webmail.unine.ch>
Message-ID: <20050620084441.ermzp0nk2gc0kwgw@icpsr.mail.umich.edu>

Sorry to have taken so long to respond...

Quoting Reto Hadorn :

>> 2. The linking concept for comparisons works well in the case that you
>> analyzed, the time and geography dimensions in the repeated cross-national
>> survey with a standard. Looking at it more closely, it works well for the
>> following reasons:
>>   a. In the space dimension, there is a standard that all the studies are
>> compared to. Thus the pairwise linking concept works.
>>   b. In the time dimension, there is a natural ordering via time, so no
>> standard is necessary in this case, unlike the geography. The natural time
>> ordering determines a natural direction for the linking.
>>
>> - Q: What happens in the space dimension when there is no standard? In
>> that case, what should the link source and targets be (and in what
>> direction?), or is a different comparison mechanism more appropriate,
>> perhaps a bidirectional link or some other comparison mechanism? Perhaps
>> this is Ingo's assignment?
>
> Your question refers to the so-called 'harmonization study', where
> uncoordinated studies are compared. Since there is no standard, you
> will not enter one, nor will you have information about variations on
> the standard. Nor will you need it.
> Having defined
> - a 'harmonization study' (corresponding to the 'program study' in the
> comparative by design case)
> - references from the single studies, which are candidates for the
> integration work, to the harmonization study (equivalent to the
> references from the country study descriptions to the program study
> description in the comparative by design case)
> - a harmonized dataset, for the case harmonization would work...
> (equivalent to the integrated dataset in the cross-national case)
> - a harmonized variable in the harmonized dataset - just a name and
> an identifier (the equivalent of a harmonized variable in the ...
> case)
> - references from the harmonized variables to the candidates for
> integration (analogous to the references used for integration in the
> ... case)
> the program should be able to confront you with all the information
> necessary
> - to decide on the comparability of the variables involved, on any
> level: variable, question, data collection method, sample design,
> non-response analysis, questionnaire, project summary etc. etc. and
> - to decide on the appropriate definition of the harmonized variable
> (now, you will have more than the name and the id)
> - to document fully the choices made for this harmonization operation.
>
> The rest will be done similarly as in the ... case (copy of the
> harmonized variable in the single datasets, computation on the single
> dataset level and integration into the harmonized dataset).
>
> ... Yes, you are right, you would not do that in the original
> datasets. At a minimum, the program should create in the
> harmonization study a replica of the original single datasets (id and
> reference to the original), which will store the information about
> the variables as used for integration, original or constructed. The
> construction will be stored in the reference between those variables
> and the original ones in the original datasets.
> In this manner, a harmonization study is constructed with minimal
> redundancy.
>
> Actually, the variables compared are not compared directly; there is
> no need for any link between them. They are compared through their
> reference to the same potentially harmonized variable.

If I understand correctly, the reply implies that when neither a standard nor a harmonization exists for a collection of datasets (and when the relationship between them is not that between successive waves of a longitudinal study), there is not a need for comparison of variables. Indeed, in the scheme which you have outlined, that comparison is not possible to construct. In particular, that means for a loose collection of multi-national studies without a standard and prior to harmonization, there is no way to document variable comparisons between the studies. Is that correct?

>> - Q: You added additional coordinates "comp" and "integrated" on the time
>> and space dimensions. What happens when there is more than one integrated
>> dataset or semi-integrated (I think Eurobarometer produces some
>> intermediate integrated datasets before the final?). I don't think these
>> semi-integrated datasets should be necessarily given different time
>> coordinates as then they lose their connection with the original one-time
>> datasets.
>
> If my understanding of your question and of the EB case is correct,
> this would be a case for several versions of the integrated dataset.
> If you don't think so, please define 'semi-integrated'.

Yes, you are right. I didn't think about versioning.

>> - Q: In slide 132 "The country datasets appear to grow like the branches
>> of a christmas tree" I understand that for each variable, there is only at
>> most one link whose source is that variable. This does not seem sufficient
>> to capture all the comparisons information that might be desired.
>> For example, if there is a variable V1 in T1/C3 which differs from the T1
>> standard but is identical with the variables V1 in T2/C3, T3/C3, T4/C3,
>> that information is not captured in the implicit process which you have
>> described, as adding those links would then result in a graph which no
>> longer resembles a Christmas tree. And it seems to me you would want to
>> capture that information in order to properly synthesize a cumulative time
>> slice/longitudinal dataset for C3.
>
> In the construct I describe, the similarity between several varying
> country Q/V will appear as a similarity between the variations on the
> standard. The information is there to be discovered. This similarity

The "discovery" is the thing that I don't quite see. How do you capture or discover the similarity between variations? Capturing, to me, would imply links between the branches, in which case the graph is no longer a christmas tree. Discovery, I think, is not possible, as I was trying to illustrate in the example above.

The fundamental problem with discovery is that equality is a transitive relation, while similarity (in the sense of comparative datasets) is not. If V1 in Country 5 at time T1 is identical to the V1 in Standard at time T1, which in turn is identical to V1 in standard at time T2, which in turn is identical to V1 in country 5 at time T2, then we may infer that V1 in Country 5 at time T1 is identical to V1 in Country 5 at time T2 by transitivity. However, consider the following two examples:

1) V1/C5/T1: What is your sex?
       Male
       Female
       Other
   V1/Standard/T1: What is your sex?
       Male
       Female
   V1/C5/T2: What is your sex?
       Male
       Female
       Other
   V1/Standard/T2: What is your sex?
       Male
       Female

2) V1/C5/T1: What is your sex?
       Male
       Female
       Other
   V1/Standard/T1: What is your sex?
       Male
       Female
   V1/C5/T2: What is your sex?
       Male
       Female
       homosexual
       bisexual
   V1/Standard/T2: What is your sex?
       Male
       Female

In both examples, the description of the links in the christmas tree are the same:
- V1/Standard/T1 and V1/Standard/T2 are identical
- V1/C5/T1 and V1/Standard/T1 are identical in question wording but different in the categories.
- V1/C5/T2 and V1/Standard/T2 are identical in question wording but different in the categories.

However, in example 1, V1/C5/T1 and V1/C5/T2 are identical whereas in example 2 they differ in the categories. The identification of V1/C5/T1 and V1/C5/T2 in example 1 cannot be inferred from the above links in the christmas tree and would require some other descriptive information between V1/C5/T1 and V1/C5/T2 such as a direct link.

Another case in which this problem appears is in the comparison between the repeated multi-national case (slide 134) and the simple longitudinal study (slide 113). Ideally, if the comparison links are identified in a repeated cross-national study, then if I extract a slice of that for a single country, I should expect to be able to recreate the links between the datasets of that slice just as if I had only started with a repeated study for a single country and marked that up. But, in the same way as above, I don't think the links between the countries can be constructed from the links in the christmas tree.

From maryv at icpsr.umich.edu Wed Jun 22 08:25:24 2005
From: maryv at icpsr.umich.edu (Mary Vardigan)
Date: Wed Jun 22 08:38:36 2005
Subject: [DDI-CDG] [DDI-All] DDI Meeting Minutes
Message-ID: <6.0.1.1.2.20050622081503.01eeba98@icpsr.umich.edu>

Dear Expert Committee members,

It was great to see so many of you in Edinburgh in May.
I wanted to let you know that the minutes for the meetings are now online:

* Expert Committee meeting in Edinburgh (with a section on the Tuesday meeting of the comparative data groups requested by Ekkehard):
  http://www.icpsr.umich.edu/DDI/org/minutes/2005-05-22.html
* Steering Committee meeting in Edinburgh:
  http://www.icpsr.umich.edu/DDI/org/minutes/2005-05-23.html

Also, we are still in the process of planning an October meeting in Ann Arbor that will involve the SRG and representatives of the working groups (not the full Expert Committee). The objective of the meeting will be to incorporate the proposals of the working groups into a draft XML Schema. More information on this will be forthcoming.

Wishing you all a great summer!

Mary

Mary Vardigan
Director, Collection Delivery
Inter-university Consortium for Political and Social Research (ICPSR)
University of Michigan
P.O. Box 1248, Ann Arbor, MI 48106-1248
Phone: 734-615-7908
Fax: 734-647-8200
www.icpsr.umich.edu
_______________________________________________
DDI-All mailing list
DDI-All@icpsr.umich.edu
http://www.icpsr.umich.edu/mailman/listinfo/ddi-all

From matvey at umich.edu Wed Jun 22 08:34:54 2005
From: matvey at umich.edu (Matthew Richardson)
Date: Wed Jun 22 08:38:55 2005
Subject: [DDI-CDG] [DDI-All] Improving the DDI Web Site
Message-ID: <2147483647.1119429294@lime.icpsr.umich.edu>

We're getting ready to re-design the DDI Web site, and I would like to gather feedback from site users on how best to improve it. We're especially looking for feedback on how to build a better navigation system.

Please take a moment to answer the questions below. Detailed answers will greatly help us improve the site. Thanks for your assistance.

1) What parts of the site do you visit most often?

2) What site features work best for your needs?

3) What site features don't work well for you? I.e., what do you think most needs improvement? How would you like to see them improved?

4) What's missing from the site?
What do we not offer that you feel would be a valuable addition?

Matthew A. Richardson
Inter-university Consortium for Political and Social Research
Phone: 734.615.7901
Email: matvey@umich.edu

"Everything tires with time, and starts to seek some opposition, to save it from itself."
--Clive Barker, The Hellbound Heart

From reto.hadorn at sidos.unine.ch Tue Jun 28 09:56:38 2005
From: reto.hadorn at sidos.unine.ch (Reto Hadorn)
Date: Tue Jun 28 09:59:01 2005
Subject: [DDI-SRG] [DDI-CDG] The dataset movie (fwd)
In-Reply-To: <20050620084441.ermzp0nk2gc0kwgw@icpsr.mail.umich.edu>
References: <20050611092232.iay4q1zusgow4w4g@icpsr.mail.umich.edu> <6.0.3.0.2.20050613144937.01ed5310@webmail.unine.ch> <20050620084441.ermzp0nk2gc0kwgw@icpsr.mail.umich.edu>
Message-ID: <6.0.3.0.2.20050628141432.01f374d8@webmail.unine.ch>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RCNS_Movie_ILINKUO.ppt
Type: application/octet-stream
Size: 100864 bytes
Desc: not available
Url : http://www.icpsr.umich.edu/pipermail/ddi-cdg/attachments/20050628/86f16c5c/RCNS_Movie_ILINKUO-0001.obj
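
The non-transitivity argument in I-Lin Kuo's message of Jun 20 (the two "What is your sex?" examples) can be sketched in a few lines of Python. This is only an illustration, not part of the DDI model or of the movie: the names Variable, describe_link and tree_links are hypothetical, and the sketch assumes that a link records only which aspects differ (question wording, categories), as in the link descriptions listed in that message. If links stored the full content of each variation, the similarity might be recoverable, which seems to be the point of Reto's "there to be discovered" remark.

```python
from typing import NamedTuple


class Variable(NamedTuple):
    """Hypothetical stand-in for a question/variable (Q/V) description."""
    question: str
    categories: tuple


def describe_link(source, target):
    """Coarse link description: which aspects of source differ from target.

    Assumption: a link stores only 'identical'/'different' per aspect.
    """
    return {
        "question": "identical" if source.question == target.question else "different",
        "categories": "identical" if source.categories == target.categories else "different",
    }


Q = "What is your sex?"
standard_t1 = Variable(Q, ("Male", "Female"))
standard_t2 = Variable(Q, ("Male", "Female"))
c5_t1 = Variable(Q, ("Male", "Female", "Other"))

# Example 1: country 5 keeps the same categories at T2.
c5_t2_ex1 = Variable(Q, ("Male", "Female", "Other"))
# Example 2: country 5 uses different categories at T2.
c5_t2_ex2 = Variable(Q, ("Male", "Female", "homosexual", "bisexual"))


def tree_links(c5_t2):
    """The three christmas-tree links: standard over time, country to standard."""
    return (
        describe_link(standard_t2, standard_t1),
        describe_link(c5_t1, standard_t1),
        describe_link(c5_t2, standard_t2),
    )


# The tree looks the same in both examples...
same_tree = tree_links(c5_t2_ex1) == tree_links(c5_t2_ex2)

# ...but the direct comparison C5/T2 -> C5/T1 differs between them.
direct_ex1 = describe_link(c5_t2_ex1, c5_t1)
direct_ex2 = describe_link(c5_t2_ex2, c5_t1)

print("tree links identical in both examples:", same_tree)
print("example 1, C5/T2 vs C5/T1:", direct_ex1)
print("example 2, C5/T2 vs C5/T1:", direct_ex2)
```

Under this assumption the tree links cannot distinguish the two examples, so the C5/T1 to C5/T2 relation would need either a direct link, a group, or richer link content, which are the options listed in Reto's reply of Jun 13.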