[DDI-SRG] DDI 3.0 simple data dictionary

Wendy Thomas wlt at pop.umn.edu
Mon Dec 22 14:58:08 EST 2008


Mike,

Depending on the current storage structure of your metadata you could 
probably write a simple Perl script similar to the one you have to create 
a very basic data dictionary. I've attached a sample of how what you 
provided would be entered in DDI 3.0. I do recommend DDI 3.0 for those 
starting in DDI at this point because of the added features. I am familiar 
with a number of data collections within the DOJ as well as the need to 
link these to geographic files. If you expand your application of DDI at 
all, you will quickly find that these features come in handy.

A quick walkthrough of the attached XML file will help you get oriented. 
First, published DDI instances are always wrapped in a DDIInstance to make 
them consistently recognizable. While I routinely list all the schemas 
available in DDI, the only ones needed here are instance, studyunit, 
conceptualcomponent, logicalproduct, physicaldataproduct, 
physicalinstance, archive, and reusable (along with the support files for 
XHTML and Dublin Core). If you are using this only for internal purposes 
you could skip the instance and the archive sections, but this doesn't 
really save you much.

The required information in the StudyUnit consists of a brief citation 
(title required), universe reference, abstract, and purpose. The last 
two can be very brief. You need to declare the universe in a universe 
scheme, at least the top level. At the end of the instance is a brief 
identification of the responsible agency, referred to as the archive. It 
contains a reference to itself in the organization scheme. Once again this 
can be very brief, but as this is standard information it could be produced 
directly from the Perl script.

The body of the instance is broken into three parts:

LogicalProduct
Describes the record of data as a whole. As this is a simple record, it's 
pretty brief. The LogicalRecord within the DataRelationship section serves 
as the link between the information on the physical layout of 
the data record and the intellectual content.
VariableScheme contains the intellectual information on each 
variable. In your file you have declared them all to be strings, so all are 
text representations. Each variable has a Name, Label, and 
TextRepresentation with the maximum field length. Pretty basic. Other 
options are numeric, coded categories, date, etc.
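
As a rough illustration of how a script could produce the VariableScheme 
described above, here is a minimal sketch in Python (used here instead of 
Perl). The helper name build_variable and the FIELDS list are assumptions 
made for the example; the element names, attributes, and namespace URIs 
are copied from the attached instance.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the attached DDI 3.0 instance.
L_NS = "ddi:logicalproduct:3_0"
R_NS = "ddi:reusable:3_0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("l", L_NS)
ET.register_namespace("r", R_NS)


def build_variable(var_id, name, label, max_length):
    """Build one l:Variable with a text representation (hypothetical helper)."""
    variable = ET.Element(
        f"{{{L_NS}}}Variable",
        {
            "isVersionable": "true",
            "id": var_id,
            "isTemporal": "false",
            "isGeographic": "false",
            "isWeight": "false",
        },
    )
    name_el = ET.SubElement(variable, f"{{{R_NS}}}Name", {XML_LANG: "en"})
    name_el.text = name
    label_el = ET.SubElement(variable, f"{{{R_NS}}}Label", {XML_LANG: "en"})
    label_el.text = label
    representation = ET.SubElement(variable, f"{{{L_NS}}}Representation")
    ET.SubElement(
        representation,
        f"{{{L_NS}}}TextRepresentation",
        {"maxLength": str(max_length)},
    )
    return variable


# Illustrative input: (name, label, maximum field length), taken from the
# first two variables of the attached sample.
FIELDS = [
    ("CERT", "FDIC Certificate Number", 5),
    ("BRNUM", "Office Number", 4),
]

scheme = ET.Element(
    f"{{{L_NS}}}VariableScheme", {"isMaintainable": "true", "id": "VS_1"}
)
for number, (name, label, width) in enumerate(FIELDS, start=1):
    scheme.append(build_variable(f"V{number}", name, label, width))

xml_text = ET.tostring(scheme, encoding="unicode")
print(xml_text)
```

The output is not pretty-printed and has not been validated against the 
DDI 3.0 schemas, so treat it as a starting point only.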

PhysicalDataProduct
This describes the physical layout in two steps. First is the physical 
structure, where you find the link to the LogicalRecord. Once again this is 
a simple file, so there is a single physical record segment and the default 
delimiter is identified.
The RecordLayoutScheme identifies the RecordLayout, links it to the 
PhysicalSegment within the PhysicalStructure, indicates the character set 
of the file (ASCII) and the ArrayBase (1), and then lists each data item in 
the file. Each item consists of a reference to the variable, its array 
position, and its width.
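
The data item list in the RecordLayout is just as mechanical to generate. 
The following is a minimal Python sketch under the same assumptions as the 
attached sample: one DataItem per variable, array positions counted from 
the ArrayBase, and widths equal to the field lengths. The helper name 
build_data_item and the WIDTHS list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the attached DDI 3.0 instance.
P_NS = "ddi:physicaldataproduct:3_0"
R_NS = "ddi:reusable:3_0"

ET.register_namespace("p", P_NS)
ET.register_namespace("r", R_NS)

ARRAY_BASE = 1  # matches <p:ArrayBase>1</p:ArrayBase> in the sample


def build_data_item(var_id, position, width):
    """Build one p:DataItem pointing at a variable (hypothetical helper)."""
    item = ET.Element(f"{{{P_NS}}}DataItem")
    reference = ET.SubElement(
        item, f"{{{P_NS}}}VariableReference", {"isReference": "true"}
    )
    ET.SubElement(reference, f"{{{R_NS}}}ID").text = var_id
    location = ET.SubElement(item, f"{{{P_NS}}}PhysicalLocation")
    ET.SubElement(location, f"{{{P_NS}}}ArrayPosition").text = str(position)
    ET.SubElement(location, f"{{{P_NS}}}Width").text = str(width)
    return item


# Illustrative widths for V1..V3, taken from the attached sample.
WIDTHS = [5, 4, 5]

items = [
    build_data_item(f"V{index + 1}", ARRAY_BASE + index, width)
    for index, width in enumerate(WIDTHS)
]

for item in items:
    print(ET.tostring(item, encoding="unicode"))
```

The variable ids must match the ids used in the VariableScheme, since the 
VariableReference is the only link between the two sections.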

PhysicalInstance is simply a reference to the location of the physical data 
file that the metadata describes, linking to the PhysicalStructure by 
referencing the RecordLayout.

This may seem a bit convoluted, but consider that you may have multiple 
copies of a data file (multiple physicalinstances of the same file), or 
different formats of the data (different physicaldatastructures), all 
pointing back to the same intellectual description of the data. This 
structure allows you to copy and reformat your data and keep it all linked 
to the common description of the data.
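
To make the chain of references concrete, the sketch below follows it 
from a PhysicalInstance back to the LogicalRecord: PhysicalInstance to 
RecordLayout to PhysicalStructure to LogicalRecord. It is a minimal Python 
illustration, not a DDI tool. The SAMPLE document is a cut-down stand-in 
with a hypothetical <root> wrapper, and the function names are assumptions; 
the element names and ids follow the attached instance.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the attached DDI 3.0 instance.
NS = {
    "r": "ddi:reusable:3_0",
    "l": "ddi:logicalproduct:3_0",
    "p": "ddi:physicaldataproduct:3_0",
    "pi": "ddi:physicalinstance:3_0",
}

# Cut-down stand-in for the attached file; <root> is a hypothetical wrapper.
SAMPLE = """\
<root xmlns:r="ddi:reusable:3_0" xmlns:l="ddi:logicalproduct:3_0"
      xmlns:p="ddi:physicaldataproduct:3_0"
      xmlns:pi="ddi:physicalinstance:3_0">
  <l:LogicalRecord id="LR_1"/>
  <p:PhysicalStructure id="PS_1">
    <p:GrossRecordStructure id="GR_1">
      <p:LogicalRecordReference><r:ID>LR_1</r:ID></p:LogicalRecordReference>
    </p:GrossRecordStructure>
  </p:PhysicalStructure>
  <p:RecordLayout id="RL_1">
    <p:PhysicalStructureReference><r:ID>PS_1</r:ID></p:PhysicalStructureReference>
  </p:RecordLayout>
  <pi:PhysicalInstance id="PI_1">
    <pi:RecordLayoutReference><r:ID>RL_1</r:ID></pi:RecordLayoutReference>
  </pi:PhysicalInstance>
</root>
"""


def find_by_id(root, prefix, local_name, wanted_id):
    """Return the element with the given tag and id, or raise KeyError."""
    tag = f"{{{NS[prefix]}}}{local_name}"
    for element in root.iter(tag):
        if element.get("id") == wanted_id:
            return element
    raise KeyError(f"{prefix}:{local_name} with id {wanted_id!r} not found")


def referenced_id(element, path):
    """Return the text of the r:ID found under element at the given path."""
    return element.find(path, NS).text


def resolve_chain(root, instance_id):
    """Follow the references from a PhysicalInstance to its LogicalRecord."""
    instance = find_by_id(root, "pi", "PhysicalInstance", instance_id)
    layout_id = referenced_id(instance, "pi:RecordLayoutReference/r:ID")
    layout = find_by_id(root, "p", "RecordLayout", layout_id)
    structure_id = referenced_id(layout, "p:PhysicalStructureReference/r:ID")
    structure = find_by_id(root, "p", "PhysicalStructure", structure_id)
    record_id = referenced_id(
        structure, "p:GrossRecordStructure/p:LogicalRecordReference/r:ID"
    )
    # Confirm the referenced LogicalRecord actually exists.
    find_by_id(root, "l", "LogicalRecord", record_id)
    return [instance_id, layout_id, structure_id, record_id]


root = ET.fromstring(SAMPLE)
chain = resolve_chain(root, "PI_1")
print(" -> ".join(chain))
```

A second PhysicalInstance that references the same RL_1 would resolve to 
the same LogicalRecord, which is the point of keeping the layers separate.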

If you haven't joined the DDI User group yet, you should, as there will be 
training sessions in the US announced over the coming year. Also please 
contact me with any further questions you have. I have asked Achim, the 
author of the software you mentioned, to contact you upon his return from 
vacation (mid January). I hope this helped clarify what you needed for a 
basic data dictionary in DDI 3.0.

Wendy Thomas
Chair, DDI Technical Implementation Committee




Wendy L. Thomas                          Phone: +1 612.624.4389
Data Access Core Director		 Fax:   +1 612.626.8375
Minnesota Population Center              Email: wlt at pop.umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455
-------------- next part --------------
<?xml version="1.0"?>

<ddi:DDIInstance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ddi:instance:3_0 instance.xsd" xmlns:ddi="ddi:instance:3_0" xmlns:r="ddi:reusable:3_0" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:dce="ddi:dcelements:3_0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:a="ddi:archive:3_0" xmlns:g="ddi:group:3_0" xmlns:cm="ddi:comparative:3_0" xmlns:c="ddi:conceptualcomponent:3_0" xmlns:d="ddi:datacollection:3_0" xmlns:l="ddi:logicalproduct:3_0" xmlns:p="ddi:physicaldataproduct:3_0" xmlns:ds="ddi:dataset:3_0" xmlns:pi="ddi:physicalinstance:3_0" xmlns:m1="ddi:physicaldataproduct/ncube/normal:3_0" xmlns:m2="ddi:physicaldataproduct/ncube/tabular:3_0" xmlns:m3="ddi:physicaldataproduct/ncube/inline:3_0" xmlns:s="ddi:studyunit:3_0" xmlns:pr="ddi:profile:3_0" isMaintainable="true" id="datadictionary" version="1.0" versionDate="2008-12-19" agency="mpc.umn.ddi" urn="urn:ddi:3.0:Instance=datadictionary:mpc.umn.ddi[1.0]">

<s:StudyUnit isMaintainable="true" id="WLT_DD" version="2.0" versionDate="2008-12-19">

<r:Citation>

<r:Title>Sample Data Dictionary</r:Title>

</r:Citation>

<s:Abstract isIdentifiable="true" id="ABS_1"><r:Content xml:lang="en">Limited data dictionary</r:Content> </s:Abstract>

<r:UniverseReference isReference="true"><r:ID>U1</r:ID></r:UniverseReference>

<s:Purpose isIdentifiable="true" id="PUR_1"><r:Content xml:lang="en">To show it can be done</r:Content></s:Purpose>

<c:ConceptualComponent isMaintainable="true" id="CC">

<c:UniverseScheme isMaintainable="true" id="UScheme">

<c:Universe isVersionable="true" id="U1">

<c:HumanReadable xml:lang="en">BLAHBLAHBLAH in the United States</c:HumanReadable>

</c:Universe>

</c:UniverseScheme>

</c:ConceptualComponent>

<l:LogicalProduct isMaintainable="true" id="LP_1">

<l:DataRelationship isIdentifiable="true" id="DR_1"><r:Description>Single logical record</r:Description>

<l:LogicalRecord isIdentifiable="true" id="LR_1" hasLocator="false">

<l:VariablesInRecord allVariablesInLogicalProduct="true"></l:VariablesInRecord></l:LogicalRecord>

</l:DataRelationship>

<l:VariableScheme isMaintainable="true" id="VS_1">

<l:Variable isVersionable="true" id="V1" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">CERT</r:Name>

<r:Label xml:lang="en">FDIC Certificate Number</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V2" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">BRNUM</r:Name>

<r:Label xml:lang="en">Office Number</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="4">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V3" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">STCNTYBR</r:Name>

<r:Label xml:lang="en">State and County Number (Branch)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V4" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">CBSA_METROB</r:Name>

<r:Label xml:lang="en">Core Based Statistical Areas (Branch)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V5" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">RSSDID</r:Name>

<r:Label xml:lang="en">FRB ID Number</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V6" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">DOCKET</r:Name>

<r:Label xml:lang="en">OTS Docket Number</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V7" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">NAME</r:Name>

<r:Label xml:lang="en">Institution Name</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="72">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V8" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">NAMEFULL</r:Name>

<r:Label xml:lang="en">Institution Name</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="72">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V9" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">RSSDHCR</r:Name>

<r:Label xml:lang="en">FRB ID Number (Bank Holding Company)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V10" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">NAMEHCR</r:Name>

<r:Label xml:lang="en">Name of regulatory high hold (BHC)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="95">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V11" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">HCTMULT</r:Name>

<r:Label xml:lang="en">Multi-Bank Holding Company flag</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V12" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">HCTNONE</r:Name>

<r:Label xml:lang="en">No Bank Holding Company flag</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V13" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">HCTONE</r:Name>

<r:Label xml:lang="en">One Bank Holding Company flag</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V14" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">STALPHCR</r:Name>

<r:Label xml:lang="en">State Code(BHC)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="2">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V15" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">CITYHCR</r:Name>

<r:Label xml:lang="en">City (Bank Holding Company)</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="25">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V16" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">UNIT</r:Name>

<r:Label xml:lang="en">Unit Bank flag</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V17" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">REGAGNT</r:Name>

<r:Label xml:lang="en">Primary Federal Regulator</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V18" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">INSAGNT1</r:Name>

<r:Label xml:lang="en">Primary Insurance Fund</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V19" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">OAKAR</r:Name>

<r:Label xml:lang="en">OAKAR flag</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="8">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

<l:Variable isVersionable="true" id="V20" isTemporal="false" isGeographic="false" isWeight="false">

<r:Name xml:lang="en">CHRTAGNT</r:Name>

<r:Label xml:lang="en">Charter Agent Code</r:Label>

<l:Representation>

<l:TextRepresentation maxLength="5">

</l:TextRepresentation>

</l:Representation>

</l:Variable>

</l:VariableScheme>

<!-- continues through remaining variables -->

</l:LogicalProduct>

<p:PhysicalDataProduct isMaintainable="true" id="PD_1">

<p:PhysicalStructureScheme isMaintainable="true" id="PSS_1">

<p:PhysicalStructure isVersionable="true" id="PS_1">

<p:LogicalProductReference isReference="true"><r:ID>LP_1</r:ID></p:LogicalProductReference>

<p:DefaultDelimiter>Comma</p:DefaultDelimiter>

<p:GrossRecordStructure isIdentifiable="true" id="GR_1" numberOfPhysicalSegments="1">

<p:LogicalRecordReference isReference="true"><r:ID>LR_1</r:ID></p:LogicalRecordReference>

<p:PhysicalRecordSegment isIdentifiable="true" id="PHYS_1" segmentOrder="1" hasSegmentKey="false">

</p:PhysicalRecordSegment>

</p:GrossRecordStructure>

</p:PhysicalStructure>

</p:PhysicalStructureScheme>

<p:RecordLayoutScheme isMaintainable="true" id="RLS_1"> 

<p:RecordLayout isIdentifiable="true" id="RL_1">

<p:PhysicalStructureReference isReference="true" lateBound="false"><r:ID>PS_1</r:ID><p:PhysicalRecordSegmentUsed>PHYS_1</p:PhysicalRecordSegmentUsed></p:PhysicalStructureReference>

<p:CharacterSet>ASCII</p:CharacterSet>

<p:ArrayBase>1</p:ArrayBase>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V1</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>1</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V2</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>2</p:ArrayPosition><p:Width>4</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V3</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>3</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V4</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>4</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V5</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>5</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V6</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>6</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V7</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>7</p:ArrayPosition><p:Width>72</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V8</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>8</p:ArrayPosition><p:Width>72</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V9</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>9</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V10</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>10</p:ArrayPosition><p:Width>95</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V11</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>11</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V12</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>12</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V13</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>13</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V14</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>14</p:ArrayPosition><p:Width>2</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V15</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>15</p:ArrayPosition><p:Width>25</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V16</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>16</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V17</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>17</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V18</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>18</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V19</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>19</p:ArrayPosition><p:Width>8</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<p:DataItem>

<p:VariableReference isReference="true"><r:ID>V20</r:ID></p:VariableReference>

<p:PhysicalLocation><p:ArrayPosition>20</p:ArrayPosition><p:Width>5</p:Width>

</p:PhysicalLocation>

</p:DataItem>

<!-- continues through remaining variables -->

</p:RecordLayout>

</p:RecordLayoutScheme>

</p:PhysicalDataProduct>

<pi:PhysicalInstance isMaintainable="true" id="PI_1">

<pi:RecordLayoutReference isReference="true"><r:ID>RL_1</r:ID></pi:RecordLayoutReference>

<pi:DataFileIdentification isIdentifiable="true" id="FID_1">

<pi:Location>DOJ</pi:Location>

<pi:URI>filename.dat</pi:URI>

</pi:DataFileIdentification>

</pi:PhysicalInstance>

<a:Archive isMaintainable="true" id="ARCH">

<a:ArchiveSpecific>

<a:ArchiveOrganizationReference isReference="true"><r:ID>ORG_OWNER</r:ID></a:ArchiveOrganizationReference>

</a:ArchiveSpecific>

<a:OrganizationScheme isMaintainable="true" id="OS_1">

<a:Organization isVersionable="true" id="ORG_OWNER">

<a:OrganizationName>Minnesota Population Center</a:OrganizationName>

<a:Nickname>mpc.umn.ddi</a:Nickname>

</a:Organization>

</a:OrganizationScheme>

</a:Archive>

</s:StudyUnit>

</ddi:DDIInstance>




