[DDI-SRG] Single child Reference element in references
I-Lin Kuo
ikuoikuo at gmail.com
Thu Sep 20 13:26:46 EDT 2007
Hi Achim,
I'm confused by "Nevertheless it should be generally avoided to be dependent heavily from XSLT filters at central places of processing".
What are XSLT filters? Do you mean this:
http://www.caucho.com/resin-3.0/xml/xslt-filter.xtp
Or do you mean we should avoid XSLT processing in general? Also, while I
understand the statement, I don't understand the reasoning, the whys behind
it. To me, it's similar to saying "Nevertheless it should be generally
avoided to be dependent heavily upon Windows servers at central places of
processing", which, though it may be true, requires some justification. Is
the reason XSLT's performance, or that XSLT skills are not widely
available, or something else?
On 9/20/07, Joachim Wackerow <joachim.wackerow at gesis.org> wrote:
>
> Hi I-Lin,
>
> I agree that the key construction for the identification will be
> complicated and is indeed a very good candidate for optimization.
>
> The approach with the contains function does not seem bad.
>
> Nevertheless it should be generally avoided to be dependent heavily from
> XSLT filters at central places of processing. If the schema design
> allowed a better-suited alternative in this respect, I would opt for
> that.
>
> Regarding StAX: I think this can be a good approach for specialized
> applications. I'm not familiar enough with it, but I doubt whether it
> can be a general approach.
>
> Achim
>
> I-Lin Kuo wrote:
> > Hi Achim,
> >
> > I'm inclined to think this is premature optimization. A general
> > heuristic about optimization is that you profile before you
> > optimize, so that you spend your efforts on the bottlenecks rather than
> > every little thing. Not having profiled the key construction, however,
> > I'm going to guess where the bottleneck is going to be with
> > XSLT key construction.
> >
> > First, while you're right about ends-with and //*[URN|ID], you're
> > fixating on the "Reference-ness" of the element and missing the forest
> > for the trees. How about combining the two conditions into
> > //*[URN|ID][contains(local-name(), 'Reference')]? That should work, I
> > think.
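A minimal sketch of that combined test in Python, using the standard library's xml.etree.ElementTree rather than an XSLT processor so it can be run and checked directly. Assumptions: namespaces are ignored (only local names are compared), and the sample document, including an IdentifiableID that itself holds an ID child, is a hypothetical layout rather than the actual DDI 3 schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Rough equivalent of XPath local-name(): drop a '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_reference(elem: ET.Element) -> bool:
    """Mirror of the predicates [URN|ID][contains(local-name(), 'Reference')]."""
    has_urn_or_id = any(local_name(child.tag) in ("URN", "ID") for child in elem)
    return has_urn_or_id and "Reference" in local_name(elem.tag)


def find_references(root: ET.Element) -> list:
    """root.iter() visits every element, like the leading // step."""
    return [elem for elem in root.iter() if is_reference(elem)]


# Hypothetical sample: IdentifiableID has an ID child but is not a reference.
SAMPLE = (
    "<StudyUnit>"
    "<Universe><IdentifiableID><ID>U1</ID></IdentifiableID></Universe>"
    "<UniverseReference><ID>U1</ID></UniverseReference>"
    "</StudyUnit>"
)

found = find_references(ET.fromstring(SAMPLE))
print([local_name(e.tag) for e in found])  # ['UniverseReference']
```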
> >
> > From what I can see, the computational bottleneck isn't going to be the
> > identification of Reference elements (this requires only two
> > conditions; see the previous paragraph), but the construction of the
> > Reference value for the key. That involves selecting either the URN
> > value or the ID value. Selecting the URN is not difficult if no parsing
> > of the URN is necessary, and right now I don't think it is. Selecting the
> > ID involves handling inheritance of agency code and version number up
> > the ancestor:: axis from only the appropriate ancestral types
> > (Maintainable, etc.) and is a complicated pain in the @r5e. That, to me,
> > is more likely to be a bottleneck. I remember looking at DexTris's XSLT
> > stylesheets more than 6 months ago to see how they handled it, and I was
> > impressed to see that it handled it reasonably. However, I seem to recall
> > that it didn't handle agency or version. The addition of agency and
> > version complicates things greatly. In addition, we still don't have a
> > document detailing the construction of URNs and the algorithm translating
> > a URN reference to an ID reference and back. So... I would suggest
> > targeting the reference/URN system if you're looking for a performance
> > bottleneck.
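To illustrate why the ID branch is the harder part, here is a rough Python sketch of building a key value: use the URN when present, otherwise combine the ID with an agency and version inherited from the nearest qualifying ancestor. The attribute names agency and version, the INHERITING_TYPES set, and the agency:id:version key format are all hypothetical placeholders, not the DDI 3 rules (which, as noted, were not yet documented). ElementTree has no ancestor axis, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of ancestor types allowed to supply agency/version.
INHERITING_TYPES = {"StudyUnit", "UniverseScheme"}


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def child_text(elem: ET.Element, name: str):
    """Text of the first child with the given local name, or None."""
    for child in elem:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return None


def inherited(elem: ET.Element, attr: str, parents: dict):
    """Walk the ancestor axis; return attr from the nearest qualifying ancestor."""
    node = parents.get(elem)
    while node is not None:
        if local_name(node.tag) in INHERITING_TYPES and attr in node.attrib:
            return node.attrib[attr]
        node = parents.get(node)
    return None


def reference_key(ref: ET.Element, parents: dict) -> str:
    urn = child_text(ref, "URN")
    if urn:
        return urn  # assumes the URN needs no parsing
    ref_id = child_text(ref, "ID") or ""
    agency = inherited(ref, "agency", parents) or ""
    version = inherited(ref, "version", parents) or ""
    return f"{agency}:{ref_id}:{version}"  # hypothetical key format


root = ET.fromstring(
    '<StudyUnit agency="example.org" version="1.0">'
    "<UniverseReference><ID>U1</ID></UniverseReference>"
    "<VariableReference><URN>urn:example:V7</URN></VariableReference>"
    "</StudyUnit>"
)
# ElementTree has no parent axis, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
refs = [e for e in root.iter() if local_name(e.tag).endswith("Reference")]
print([reference_key(r, parents) for r in refs])
# ['example.org:U1:1.0', 'urn:example:V7']
```

Each ID-based reference costs an ancestor walk, while a URN costs a single child lookup; that fits the guess that the ID path is where the time goes, though only profiling would confirm it.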
> >
> > On 9/13/07, *Joachim Wackerow* <joachim.wackerow at gesis.org> wrote:
> >
> > Pascal and others,
> >
> > What are the reasons now for removing the ReferenceType again?
> > We should collect the pros and cons and record them in Mantis, so that
> > the decision is comprehensible later.
> > I have the impression that the decision regarding this issue depends
> > on which people are attending the meeting :) .
> >
> >
> > 106 elements (22 in reusable, 84 in others) in the current schema are
> > reference elements, i.e. they are now using ReferenceType.
> >
> > Some thoughts regarding the processing of references:
> > (I have the impression that some of these thoughts have open ends)
> >
> > References are heavily used in DDI 3.0. Therefore the processing of
> > references should be easy and clear, and should perform well.
> > Every detail will affect the complexity of the XML document, the
> > complexity of an XSLT stylesheet, and the processing performance as
> > well.
> >
> > As references and identification will be used heavily in a DDI 3
> > document, XSLT keys seem to be very important for processing both. The
> > construction of a key based on a defined element like ReferenceType or
> > IdentifiableID seems to be straightforward (the construction of the URN
> > is complicated anyway). The construction of a key for references that
> > are represented by 106 different elements is a lot of work. I don't see
> > an easy way. I'm asking myself what this would really be useful for. It
> > is not as necessary as for identifications, but without this possibility
> > I have a bad feeling. I think references are very important in DDI 3.
> > There will probably be an application requirement to identify references
> > easily. So the suggested attribute "isReference" with a fixed value
> > "true" should at least be implemented.
> >
> > Then a key must be constructed with "//*[@isReference='true']". This is
> > an XPath expression which applies a filter test to every element. It is
> > very costly to go over EVERY element when processing large documents.
> > Furthermore, filter tests are not processed especially quickly. DDI 3
> > documents will be large. This approach only makes sense with keys,
> > because then this scan is necessary only once.
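The point about keys can be sketched in Python: the expensive filter test over every element runs once while the index is built, and later lookups are dictionary hits. It is roughly what a declaration such as <xsl:key name="refs" match="*[@isReference='true']" use="ID"/> provides in XSLT 1.0; the key name refs, the use of the ID child as the key value, and the sample elements are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def build_reference_index(root: ET.Element) -> dict:
    """Single pass over EVERY element, analogous to building an xsl:key index."""
    index = defaultdict(list)
    for elem in root.iter():
        if elem.get("isReference") == "true":  # the filter test, once per element
            for child in elem:
                if local_name(child.tag) == "ID":
                    index[(child.text or "").strip()].append(elem)
    return dict(index)


root = ET.fromstring(
    "<StudyUnit>"
    '<UniverseReference isReference="true"><ID>U1</ID></UniverseReference>'
    '<QuestionReference isReference="true"><ID>Q3</ID></QuestionReference>'
    '<UniverseReference isReference="true"><ID>U1</ID></UniverseReference>'
    "</StudyUnit>"
)
index = build_reference_index(root)
# Later lookups are dictionary hits, like key('refs', 'U1'), not new scans.
print(len(index["U1"]), len(index["Q3"]))  # 2 1
```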
> >
> > When ReferenceType is not available, every referencing element needs its
> > own template, which can call a general template for processing the
> > reference details.
> >
> > What is really the problem with the current explicit reference
> > solution?
> > It can only be the size of the document; the complexity is not really
> > greater for applications.
> >
> > I was wondering if the complexity of an XSLT stylesheet could be reduced
> > by using a generic approach for references, so that a separate template
> > is not necessary for each of the 106 referencing elements. This can make
> > sense in a generic reporting tool using a data-driven approach (push
> > approach). This approach could also make use of the systematic names of
> > the referencing elements and/or a look-up table with the field-level
> > documentation.
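A sketch of the generic approach: one handler derives the referenced subject from the systematic element name (UniverseReference gives Universe) and looks it up in a table, instead of one template per referencing element. FIELD_DOCS is a hypothetical stand-in for the field-level documentation; in a push-style stylesheet the same idea would roughly be a single template matching all referencing elements and using local-name().

```python
import xml.etree.ElementTree as ET

SUFFIX = "Reference"

# Hypothetical stand-in for a look-up table built from field-level documentation.
FIELD_DOCS = {
    "Universe": "Universe of the study",
    "Question": "Question item",
}


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def referenced_subject(elem: ET.Element):
    """Derive the subject from the systematic name: UniverseReference -> Universe."""
    name = local_name(elem.tag)
    if name.endswith(SUFFIX) and len(name) > len(SUFFIX):
        return name[: -len(SUFFIX)]
    return None


def describe_reference(elem: ET.Element) -> str:
    """One generic handler for all referencing elements."""
    subject = referenced_subject(elem)
    label = FIELD_DOCS.get(subject, subject)  # fall back to the bare subject name
    ref_id = (elem.findtext("ID") or "").strip()
    return f"{label} -> {ref_id}"


root = ET.fromstring(
    "<StudyUnit>"
    "<UniverseReference><ID>U1</ID></UniverseReference>"
    "<ConceptReference><ID>C9</ID></ConceptReference>"
    "</StudyUnit>"
)
for elem in root:
    if referenced_subject(elem):
        print(describe_reference(elem))
# Universe of the study -> U1
# Concept -> C9
```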
> >
> > Thinking this one step further, a general reference container could make
> > sense, where the referenced subject is defined in an attribute or a
> > child element. For example, instead of UniverseReference,
> > Reference/ReferencedSubject=Universe.
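For comparison, the general container can carry exactly the same information as a named reference element. This hypothetical Python conversion (unnamespaced elements assumed) rewrites UniverseReference into a Reference element with a ReferencedSubject child; in that form, finding every reference becomes a single element-name test.

```python
import xml.etree.ElementTree as ET

SUFFIX = "Reference"


def to_container(elem: ET.Element) -> ET.Element:
    """Rewrite <UniverseReference>...</UniverseReference> as
    <Reference><ReferencedSubject>Universe</ReferencedSubject>...</Reference>."""
    name = elem.tag  # assumes an unnamespaced tag
    if not name.endswith(SUFFIX) or name == SUFFIX:
        raise ValueError(f"not a named reference element: {name}")
    container = ET.Element("Reference")
    subject = ET.SubElement(container, "ReferencedSubject")
    subject.text = name[: -len(SUFFIX)]
    container.extend(list(elem))  # carry URN/ID children over unchanged
    return container


old = ET.fromstring("<UniverseReference><ID>U1</ID></UniverseReference>")
new = to_container(old)
print(ET.tostring(new, encoding="unicode"))
# <Reference><ReferencedSubject>Universe</ReferencedSubject><ID>U1</ID></Reference>
```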
> >
> >
> > BTW "//*[URN|ID]" will not work. This would also catch the
> > identification elements. Something like this would be necessary:
> > //*
> > [ local-name() != 'MaintainableID' ]
> > [ local-name() != 'VersionableID' ]
> > [ local-name() != 'IdentifiableID' ]
> > [ r:URN | r:ID ]
> > This would result in five tests on every element.
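The exclusion-based expression can be mirrored the same way, again with namespaces ignored and a hypothetical sample document. Like the XPath, the sketch has to name every identification element explicitly, and it performs the three name comparisons plus the URN/ID child test on each element it visits.

```python
import xml.etree.ElementTree as ET

IDENTIFICATION_NAMES = ("MaintainableID", "VersionableID", "IdentifiableID")


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def is_reference_by_exclusion(elem: ET.Element) -> bool:
    """Mirror of //*[local-name() != ...] (three times) [r:URN | r:ID]."""
    if local_name(elem.tag) in IDENTIFICATION_NAMES:  # three name comparisons
        return False
    # the remaining two tests: a URN child or an ID child
    return any(local_name(c.tag) in ("URN", "ID") for c in elem)


root = ET.fromstring(
    "<StudyUnit>"
    "<Universe><IdentifiableID><ID>U1</ID></IdentifiableID></Universe>"
    "<UniverseReference><ID>U1</ID></UniverseReference>"
    "<Concept><VersionableID><URN>urn:example:C1</URN></VersionableID></Concept>"
    "</StudyUnit>"
)
print([local_name(e.tag) for e in root.iter() if is_reference_by_exclusion(e)])
# ['UniverseReference']
```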
> >
> > The function 'ends-with' is only available in XSLT/XPath 2.0, so XSLT 1.0
> > could no longer be used. Only one processor (Saxon) exists for XSLT
> > 2.0. This seems to be a limiting approach.
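For XSLT 1.0, a common workaround for the missing ends-with() is substring(local-name(), string-length(local-name()) - string-length('Reference') + 1) = 'Reference'. The Python sketch below only reproduces that index arithmetic (1-based positions; a start value below 1 keeps the whole string) to make the off-by-one visible. It handles integer start positions only and is not a complete model of XPath 1.0 substring(), which also rounds its arguments. The workaround keeps XSLT 1.0 usable, but it still evaluates string functions on every element visited.

```python
def xpath1_substring(s: str, start: int) -> str:
    """XPath 1.0 substring(s, start) for an integer start: positions are 1-based,
    and every character at position >= start is returned (start < 1 keeps all)."""
    return s[max(start - 1, 0):]


def xpath1_ends_with(name: str, suffix: str) -> bool:
    """substring(name, string-length(name) - string-length(suffix) + 1) = suffix"""
    start = len(name) - len(suffix) + 1
    return xpath1_substring(name, start) == suffix


for name in ("UniverseReference", "IdentifiableID", "Ref"):
    print(name, xpath1_ends_with(name, "Reference"))
# UniverseReference True
# IdentifiableID False
# Ref False
```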
> >
> > This is a general note:
> > I would prefer data-driven processing of the complex DDI documents,
> > letting the XSLT processor do the work, i.e. element-specific
> > templates, not a declarative programming style. The schema should be
> > constructed accordingly when possible.
> >
> > Any response welcome.
> >
> > I'll be available again starting at the meeting September 20.
> >
> > Achim
> >
> > I-Lin Kuo wrote:
> > > I would even go so far as to say that @isReference is redundant.
> > >
> > > I vaguely recollect that the reason given for the Reference was to
> > > identify elements of ReferenceType via a test of //*[Reference]. It
> > > didn't convince me at the time, as I think that with the elimination of
> > > the extra Reference element, //*[ends-with(local-name(),'Reference')]
> > > would work, or //*[URN|ID].
> > >
> > > On 9/7/07, *Pascal Heus* <pascal.heus at gmail.com> wrote:
> > >
> > > Achim, I-Lin:
> > > Yesterday we reviewed bug #2
> > > ( http://mantis.ddialliance.org/view.php?id=2), related to the extra
> > > */Reference element that appears under every referencing type in the
> > > current schema. There was general agreement that it is not necessary
> > > and that it should be removed or possibly replaced with a fixed
> > > @isReference attribute. Since I believe you initially requested this
> > > change, we would like to have your perspective on the issue before
> > > making a final decision.
> > > I would appreciate your prompt input, as this change impacts most of the
> > > ongoing tool development, and the earlier we can make it happen, the
> > > better.
> > > Many thanks,
> > > Pascal
> > >
> > > _______________________________________________
> > > DDI-SRG mailing list
> > > DDI-SRG at icpsr.umich.edu
> > > http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
> > >
> > >
> > >
> > >
> > > --
> > > I-Lin Kuo
> > >
> > >
> > >
> >
> >
> >
> > --
> > GESIS - German Social Science Infrastructure Services
> > http://www.gesis.org/en/
> >
> >
> >
> >
> >
> > --
> > I-Lin Kuo
>
>
> --
> GESIS - German Social Science Infrastructure Services
> http://www.gesis.org/en/
>
--
I-Lin Kuo