[DDI-ADG] Fwd: Use Case No.1

Katherine McNeill-Harman mcneillh at MIT.EDU
Mon Jun 27 16:09:00 EDT 2005


I think you make very good points (I agree with most of your suggestions), 
and I will just add a couple of comments before tomorrow's meeting (given 
this, I didn't pull together a different specific study for us to 
discuss). My comments are inline below:

At 03:34 PM 6/24/2005 -0400, Sanda Ionescu wrote:



>>Hi, all.
>>
>>Mary and I thought that looking at some concrete examples might give us a 
>>better insight into the use of available geographic tags, and also reveal 
>>any gaps that we might want to try to fill in the specification.
>>
>>Starting from Kate's first set of questions
>>(1. Searching for variable information within a specific geographic 
>>coverage area at a specific level of geographic unit (yet still 
>>aggregated), e.g.
>>- data on voter registration by precinct within the United States
>>- unemployment rate by county within the United States)
>>
>>
>>we picked up a codebook that documents just this type of data. It's ICPSR 
>>study no. 9405 and I'm sending it as an attachment to this message.
>>
>>At variable level:
>>As you will see, there are 3 geographic variables in this dataset: state 
>>code (V4), county name (V5), and county code (V6).
>>Although there is no region variable, analysis by region is also 
>>possible, because the code for region is embedded in the state code (see 
>>V4). This information is included in a free-text note to the variable. 
>>Should we attempt to capture this kind of information in such a way that 
>>it becomes machine-readable? If so, how do we go about it?
>>Same question for V6: in the DDI, we would probably include just the 
>>county code in the category value tags, because that's what we have in 
>>the data. However, these codes can only be used in conjunction with the 
>>codes in V4. Do we want to encode this information in a way that will 
>>assist "machine actionability"?
>>
>>At study level (more closely related to search and discovery, and Kate's 
>>questions):
>>Trying to mark up the info with the tags provided in the V 3.0 spreadsheet:
>>1. From what I see here, geographicCover becomes a "container element" 
>>with no PCDATA allowed (?), and the actual textual information is now 
>>supposed to go in its child <text> (?), which replaces the <geogCover> 
>>from V 2.0?
>>That's okay, I suppose. So, then, we would have:
>><geographicCover><text>United States</text></geographicCover>.
>>Further down, however, it gets more confusing (at least to me it does). 
>>Looking at the way the elements are structured, it appears that 
>>countryCode, subCountryCode, geographicUnit, geographicKind and geoBndBox 
>>are all children of geographicCover, on a par with <text>. If I'm reading 
>>the spreadsheet correctly, this kind of structure runs counter to the 
>>new definition of geographicCover, which is "largest geographic extent". 
>>The elements' structure only provides for a list of various levels of 
>>coverage without indicating a hierarchy. If, for instance, we find 
>>geographicKind defined as line, or point, or polygon, how will we know 
>>whether this refers to the geographicUnit (lowest level of coverage), to 
>><text> (largest extent), to <countryCode>, or to something else?
>>
>><countryCode> - will it be an element spelling out the name of the 
>>country(ies) covered, with an attribute containing the ISO code? This is 
>>not clear in the spreadsheet. We probably need both name and code (if the 
>><text> under geographicCover were "Europe", we would need to list all the 
>>countries included, both by name and code).
>>Going back to our example above, where our largest coverage is one 
>>country, would we repeat the name of the country and then add its ISO code?
>><countryCode ISOcode="us">United States</countryCode>
>>
>><subCountryCode> - this seems to be intended for lower-than-country 
>>levels of coverage. How do we actually use it? If we have more than one 
>>subcountry level, do we just repeat the element for each level? Again, 
>>this does not allow for establishing a hierarchy among those levels 
>>(maybe we should enable some kind of nesting structure here, going down 
>>from largest extent to smallest unit (geographicUnit)?).
>>In the attached study, we have three subcountry levels: region, state, 
>>and county. Do we mention each within its own <subCountryCode> element?
>>
>><subCountryCode>region</subCountryCode>
>><subCountryCode>state</subCountryCode>
>><subCountryCode>county</subCountryCode>
>>
>>The presence of the word "code" in the name of the element suggests that 
>>we would have actual codes here, but that seems unlikely, as the codes 
>>will be listed in the variables' description or in an external document. 
>>So from here we would probably only have a link to either the geographic 
>>variable(s) or that document. If this seems right, we should enable 
>>such a link. And maybe rename this element <subCountryLevel>?

This makes sense to me, because one wouldn't indicate here the name or code 
of the county covered, but just the fact that information is available by 
county.

>>Not sure if "authority" is meant to be an attribute - probably yes. In 
>>our example, it would read "ICPSR". But if we're only listing levels 
>>here, it should accompany the codes, wherever they are - or qualify the 
>>link to the codes.

As for authority, I thought that was the body that developed and maintains 
the codes used in this system (not the author of the study). I assume you 
were using ICPSR as the author?

>>It also seems appropriate to point from the levels listed here to the 
>>variables that cover them - region and state to V4, county to V5 and V6. 
>>This is not the same thing as linking to the codes; the codes may not be 
>>embedded in the variable description, and there is not always a 
>>one-to-one match between levels, codes, and geographic variables.
>>
>>Finally, in our example, <geographicUnit> would be "county".

I'm a little uncertain about the relationship of <subCountryCode> 
to <geographicUnit>. 2.0 defines <geographicUnit> as "lowest level of 
geographic aggregation covered by the data," which is how you used it in 
the example. I agree with your point that we should provide a set of 
elements that can identify the different levels of geography at which 
information is available (and have them be hierarchical in a 
machine-readable manner, as you suggest). And I like your suggestion of 
<subCountryLevel>. That being said, do we need a separate <geographicUnit> 
element? If we were to adopt some sort of more structured set of 
<subCountryLevel> elements, presumably there would be one for county; 
could we somehow flag it to indicate that it's the lowest level available? 
Keeping <geographicUnit> as well would not mean a lot of repetition, but 
I'm not sure whether having it separate would make it any easier for 
systems to interpret.
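For illustration only, here is a minimal sketch of how nested levels, links 
to the geographic variables, and a lowest-level flag might be marked up for 
study 9405. It assumes the proposed name <subCountryLevel>; the <name> child 
and the varRef and lowest attributes are hypothetical and do not appear in 
the V 3.0 spreadsheet:

```xml
<geographicCover>
  <text>United States</text>
  <countryCode ISOcode="us">United States</countryCode>
  <!-- Hypothetical nesting: from largest extent down to smallest unit -->
  <!-- Hypothetical varRef: points to the variable(s) carrying each level -->
  <subCountryLevel varRef="V4">
    <name>region</name>
    <subCountryLevel varRef="V4">
      <name>state</name>
      <!-- Hypothetical flag marking the lowest level available -->
      <subCountryLevel varRef="V5 V6" lowest="true">
        <name>county</name>
      </subCountryLevel>
    </subCountryLevel>
  </subCountryLevel>
</geographicCover>
```

Here a system would read the hierarchy from the nesting and the lowest 
level from the flag, without needing a separate <geographicUnit>.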


>>I won't go into the actual mapping with this example.
>>
>>I will raise one last question that's not related to our example. Some 
>>studies may cover regions that are above country level (like Eastern 
>>Europe, for instance) but smaller than total coverage (or largest 
>>extent). So far, we don't seem to have a way of accounting for this kind 
>>of coverage.

Yes, and one thing that's come up in conversations is how to describe 
coverage of an area when not all parts of that area are covered (e.g., if 
the data are on Eastern Europe, but some countries in that region were not 
covered).


>>One last word about the exercise above: it obviously raises more 
>>questions than it answers, and that's precisely why we hope it will 
>>be a good starting point for our next discussion(s).
>>
>>Sanda and Mary
>>
>>
>>
>>Sanda Ionescu,
>>Research Associate
>>Inter-university Consortium for Political and Social Research (ICPSR)
>>The University of Michigan
>>P.O. Box 1248
>>Ann Arbor, MI 48106
>>
>>Phone: (734) 615-7890
>>Fax: (734) 615-7890
>>        (734) 647-8200
>_______________________________________________
>DDI-ADG mailing list
>DDI-ADG at icpsr.umich.edu
>http://www.icpsr.umich.edu/mailman/listinfo/ddi-adg

___________________________________________
Katherine McNeill-Harman
Data Services Librarian
Dewey Library for Management and Social Sciences
Massachusetts Institute of Technology
77 Massachusetts Avenue, E53-100
Cambridge, MA 02139
mcneillh at mit.edu
617-253-0787 


