[DDI-SRG] Single child Reference element in references

Joachim Wackerow joachim.wackerow at gesis.org
Thu Sep 20 15:28:24 EDT 2007


Hi I-Lin,

Sorry for the confusion. By "XSLT filters" I meant XPath predicates like
[URN|ID][contains(local-name(), 'Reference')], which filter a node-set.
I did not mean that XSLT should be avoided altogether.

The reasoning is that an XSLT processor evaluates predicates more slowly
than it selects elements simply by their names (using XSLT templates
without predicates).
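
To illustrate the two selection styles, here is a minimal sketch in
Python (standard library only), not XSLT. It assumes a made-up,
namespace-free sample document: only URN, ID, IdentifiableID and
UniverseReference are names from this thread, while Study,
ConceptReference, Label and the helper function names are hypothetical.
It shows what each style selects, not how fast a given processor is.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature instance without namespaces (an assumption;
# real DDI documents are namespaced and far larger).
SAMPLE = """
<Study>
  <IdentifiableID><ID>v1</ID></IdentifiableID>
  <UniverseReference><ID>u1</ID></UniverseReference>
  <ConceptReference><URN>urn:example:c1</URN></ConceptReference>
  <Label>Reference list</Label>
</Study>
"""

# Hypothetical list of referencing element names known in advance.
REFERENCE_NAMES = ["UniverseReference", "ConceptReference"]


def local_name(elem):
    """Return the tag without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]


def by_predicate(root):
    """Mimic //*[URN|ID][contains(local-name(), 'Reference')].

    Every element in the document is visited and tested.
    """
    return [
        e for e in root.iter()
        if (e.find("URN") is not None or e.find("ID") is not None)
        and "Reference" in local_name(e)
    ]


def by_name(root, names):
    """Select elements by their known names, without a per-element filter."""
    return [e for name in names for e in root.iter(name)]


root = ET.fromstring(SAMPLE.strip())
```

Both by_predicate(root) and by_name(root, REFERENCE_NAMES) pick out the
same two referencing elements here; the difference is that the first
has to run its tests on every element, while the second relies on the
names being known beforehand.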

Hope this clarifies the issue.

Achim

I-Lin Kuo wrote:
> Hi Achim,
> 
> I'm confused by "Nevertheless it should be generally avoided to be
> dependent heavily from XSLT filters at central places of processing"
> 
> What are XSLT filters? Do you mean this:
> 
> http://www.caucho.com/resin-3.0/xml/xslt-filter.xtp
> 
> Or do you mean we should avoid XSLT processing in general? Also, while I 
> understand the statement, I don't understand the reasoning, the why's 
> behind it. To me, it's similar to saying "Nevertheless it should be 
> generally avoided to be dependent heavily upon Windows servers at 
> central places of processing" which, though it may be true, requires 
> some justification. Is the reason because of XSLT's performance, or 
> because XSLT skills are not widely available, etc.?
> 
> On 9/20/07, *Joachim Wackerow* <joachim.wackerow at gesis.org> wrote:
> 
>     Hi I-Lin,
> 
>     I agree that the key construction for the identification will be
>     complicated and is indeed a very good candidate for optimization.
> 
>     The approach with the contains function does not seem bad.
> 
>     Nevertheless it should be generally avoided to be dependent heavily from
>     XSLT filters at central places of processing. If the design of the
>     schemas would allow it to choose a more adapted alternative in this
>     sense, I would opt for that.
> 
>     Regarding StAX: I think this can be a good approach for special
>     applications. I'm not familiar enough with it, but I doubt that it
>     can be a general approach.
> 
>     Achim
> 
>     I-Lin Kuo wrote:
>      > Hi Achim,
>      >
>      > I'm inclined to think this is premature optimization. A general
>      > heuristic about optimization is that you profile first before you
>      > optimize, so that you spend your efforts on the bottlenecks rather than
>      > every little thing. Not having profiled the key construction, however,
>      > I'm going to make a guess on where the bottleneck is going to be with
>      > XSLT key construction.
>      >
>      > First, while you're right about ends-with and //*[URN|ID], you're
>      > fixating on the "Reference-ness" of the element and missing the trees
>      > for the forest. How about combining the two conditions into
>      > //*[URN|ID][contains(local-name(), 'Reference')]? That should work, I
>      > think.
>      >
>      > From what I can see, the computational bottleneck isn't going to be the
>      > identification of Reference elements (this requires only two
>      > conditions, see previous paragraph), but the construction of the
>      > Reference value for the key. That involves selecting either the URN
>      > value or the ID value. Selecting the URN is not difficult if no parsing
>      > of the URN is necessary and right now I don't think it is. Selecting the
>      > ID involves handling inheritance of agency code and version number up
>      > the ancestor:: axis from only those appropriate ancestral types
>      > (Maintainable, etc.) and is a complicated pain in the @r5e. That to me,
>      > is more likely to be a bottleneck. I remember looking at DexTris's XSLT
>      > stylesheets over 6 months to see how they handled it and was impressed
>      > to see that it handled it reasonably. However, I seem to recall that it
>      > didn't handle agency or version. The addition of agency and version
>      > complicates things greatly. In addition, we still don't have a document
>      > detailing the construction of URNs and the algorithm translating a URN
>      > reference to an ID reference and back. So... I would suggest targeting
>      > the reference/URN system if you're looking for a performance bottleneck.
>      >
>      > On 9/13/07, *Joachim Wackerow* <joachim.wackerow at gesis.org> wrote:
>      >
>      >     Pascal and others,
>      >
>      >     What are now the reasons for removing the ReferenceType again?
>      >     We should collect the pros and cons and record them in Mantis, so
>      >     that the decision is comprehensible later.
>      >     I have the impression that the decision regarding this issue
>      >     depends on which persons are attending the meeting :) .
>      >
>      >
>      >     106 elements (22 in reusable, 84 in others) in the current Schema are
>      >     reference elements, i.e. they are now using ReferenceType.
>      >
>      >     Some thoughts regarding the processing of references:
>      >     (I have the impression that some of these thoughts have open ends)
>      >
>      >     References are heavily used in DDI 3.0. Therefore the processing of
>      >     references should be easy and clear, and should have good
>      >     performance.
>      >     Every detail will affect the complexity of the XML document, the
>      >     complexity of an XSLT stylesheet, and the processing performance as
>      >     well.
>      >
>      >     As references and identification will be used heavily in a DDI 3
>      >     document, XSLT keys seem to be very important for processing both.
>      >     The construction of a key based on a defined element like
>      >     ReferenceType or IdentifiableID seems to be straightforward (the
>      >     construction of the URN is complicated anyway). The construction of
>      >     a key for references which are represented by 106 different elements
>      >     is a lot of work. I don't see an easy way. I'm asking myself what
>      >     this can really be useful for. It is not as necessary as for
>      >     identifications. But without this possibility I have a bad feeling.
>      >     I think references are so important in DDI 3.
>      >     Probably there will be an application requirement to identify
>      >     references easily. So the suggested attribute "isReference" with a
>      >     fixed value "true" should at least be implemented.
>      >
>      >     Then a key must be constructed with "//*[@isReference='true']". This
>      >     is an XPath expression which applies a filter test to every element.
>      >     It is very costly when processing large documents to go over EVERY
>      >     element. Furthermore, filter tests are not processed very quickly.
>      >     DDI 3 documents will be large. This approach only makes sense with
>      >     keys, because then this process is necessary only once.
>      >
>      >     When ReferenceType is not available, every referencing element needs
>      >     its own template, which can call a general template for processing
>      >     the reference details.
>      >
>      >     What is really the problem with the current explicit reference
>      >     solution?
>      >     It can only be the size of the document; the complexity is not really
>      >     larger for applications.
>      >
>      >     I was wondering if the complexity of an XSLT stylesheet can be reduced
>      >     by using a generic approach for references, so that a separate
>      >     template is not necessary for each of the 106 referencing elements.
>      >     This can make sense in a generic reporting tool using a data-driven
>      >     approach (push approach). This approach could also make use of the
>      >     systematic names of the referencing elements and/or a look-up table
>      >     with the field-level documentation.
>      >
>      >     Taking this one step further, a general reference container can make
>      >     sense, where the referenced subject is defined in an attribute or a
>      >     child element. For example, instead of UniverseReference,
>      >     Reference/ReferencedSubject=Universe.
>      >
>      >
>      >     BTW "//*[URN|ID]" will not work. This would also catch the
>      >     identification. Something like this would be necessary:
>      >     //*
>      >     [ local-name() != 'MaintainableID' ]
>      >     [ local-name() != 'VersionableID' ]
>      >     [ local-name() != 'IdentifiableID' ]
>      >     [ r:URN | r:ID ]
>      >     This would result in five tests on every element.
>      >
>      >     The function 'ends-with' is only available in XSLT/XPath 2.0. XSLT 1.0
>      >     cannot be used again. Only one processor (Saxon) exists for XSLT
>      >     2.0. This seems to be a limiting approach.
>      >
>      >     This is a general note:
>      >     I would prefer data-driven processing of the complex DDI documents,
>      >     letting the XSLT processor do the work, i.e. element-specific
>      >     templates, not a declarative programming style. The schema should be
>      >     constructed accordingly when possible.
>      >
>      >     Any response welcome.
>      >
>      >     I'll be available again starting at the meeting September 20.
>      >
>      >       Achim
>      >
>      >     I-Lin Kuo wrote:
>      >      > I would even go so far as to say that @isReference is redundant.
>      >      >
>      >      > I vaguely recollect that the reason given for the Reference was to
>      >      > identify elements of ReferenceType via a test of //*[Reference]. It
>      >      > didn't convince me at the time, as I think that with the
>      >      > elimination of the extra Reference element,
>      >      > //*[ends-with(local-name(),'Reference')] would work, or //*[URN|ID].
>      >      >
>      >      > On 9/7/07, *Pascal Heus* <pascal.heus at gmail.com> wrote:
>      >      >
>      >      >     Achim, I-Lin:
>      >      >     we reviewed yesterday bug #2
>      >      >     ( http://mantis.ddialliance.org/view.php?id=2) related to the
>      >      >     extra */Reference element that appears under every referencing
>      >      >     type in the current schema. We had a general agreement that this
>      >      >     is not necessary and that it should be removed or possibly
>      >      >     replaced with a fixed @isReference attribute. Since I believe you
>      >      >     initially requested this change, we would like to have your
>      >      >     perspective on the issue before making a final decision.
>      >      >     Would appreciate your prompt input as this change impacts most of
>      >      >     the ongoing tool development and the earlier we can make it
>      >      >     happen, the better.
>      >      >     many thanks
>      >      >     Pascal
>      >      >
>      >      >     _______________________________________________
>      >      >     DDI-SRG mailing list
>      >      >     DDI-SRG at icpsr.umich.edu
>      >      >     http://www.icpsr.umich.edu/mailman/listinfo/ddi-srg
>      >      >
>      >      >
>      >      >
>      >      >
>      >      > --
>      >      > I-Lin Kuo
>      >      >
>      >      >
>      >      >
>      >
>      >
>      >     --
>      >     GESIS - German Social Science Infrastructure Services
>      >     http://www.gesis.org/en/
>      >
>      >
>      >
>      >
>      >
>      > --
>      > I-Lin Kuo
> 
> 
>     --
>     GESIS - German Social Science Infrastructure Services
>     http://www.gesis.org/en/
> 
> 
> 
> 
> -- 
> I-Lin Kuo


-- 
GESIS - German Social Science Infrastructure Services
http://www.gesis.org/en/

