CIDE (2009) Marcoux

==Structure of the generic stylesheet==
 
 
The general structure of the generic stylesheet (ISG.xsl in the above examples) is as follows:
 

#Global variables initialization.
#Overall HTML structure template, matching /.
#Templates for text-only elements.
#Templates for all other elements.
#Templates for text nodes (PCDATA).
#Called templates for finding the best matching rule.
#Called templates for processing attributes.
#Called templates for processing hyperlinks.
#Called templates for exception handling.
  
 
The rules contained in the model-specific ISS file genID.iss.xml, where genID is the generic ID of the top-level element of the document, are read in Part 1 and placed in a global variable. Part 2 produces the overall HTML structure of the output, including an internal CSS stylesheet.
 
 
Parts 3, 4, and 5 are actually pairs of templates: one for block formatting and one for flowed formatting. The indentation heuristic mentioned earlier is implemented by the templates in Part 4; it determines how blocks are indented relative to one another and at which level the formatting switches from block to flowed.
 
 
 
 
The templates in Part 6 are used to determine the rule, in the ISS file, that “best” matches an element. As sketched earlier, a rule is a best match for an element E iff one of its paths matches E (i.e., is a suffix of E’s ancestral line) and no other rule specifies a longer path matching E. If two or more rules are best matches for an element, the first one (in ISS file order) is chosen.
 
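The best-match criterion can be illustrated outside XSLT. The Python function below is a minimal sketch, not the Part 6 templates themselves: it assumes each ISS rule has already been reduced to a hypothetical pair (rule identifier, list of element-name paths), and that E’s ancestral line is given as the list of element names from the document root down to E. A path matches when it is a suffix of that list; the rule with the longest matching path wins, and ties go to the rule that appears first.

```python
from typing import Optional, Sequence

# Hypothetical in-memory form of an ISS rule: an identifier plus the
# element-name paths it applies to, e.g. ("chapter", "title").
Rule = tuple[str, list[tuple[str, ...]]]


def matches(path: Sequence[str], ancestry: Sequence[str]) -> bool:
    """A path matches E iff it is a non-empty suffix of E's ancestral line."""
    n = len(path)
    return 0 < n <= len(ancestry) and tuple(ancestry[-n:]) == tuple(path)


def best_rule(rules: list[Rule], ancestry: Sequence[str]) -> Optional[str]:
    """Return the id of the best-matching rule, or None if no rule matches."""
    best_id, best_len = None, 0
    for rule_id, paths in rules:  # rules are visited in ISS file order
        longest = max((len(p) for p in paths if matches(p, ancestry)), default=0)
        if longest > best_len:  # strict '>' keeps the earliest rule on ties
            best_id, best_len = rule_id, longest
    return best_id


rules = [
    ("r1", [("title",)]),
    ("r2", [("chapter", "title")]),
    ("r3", [("section", "title")]),
]
print(best_rule(rules, ["book", "chapter", "title"]))  # r2
```

Here both r1 and r2 match a title inside a chapter, and r2 is selected because its path is longer. Scanning the rules once in file order and replacing the current choice only on a strictly longer match is what makes the first rule win a tie.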
  
  
 
The examples illustrate that peritexts can be very long. It is an essential feature of IS that there be no limit on their length. They should not be constrained lexically either. However, this is not entirely the case in the current implementation. Indeed, it is currently not possible to include some of the delimiters and placeholders for attribute guarded-passages and hyperlinks as data in the peritexts (and, in a few cases, in element content). One possible improvement would thus be to define and implement conventions for allowing all delimiters and placeholders to be included as data in peritexts (and element content).
 
 
A related issue is the validation of the syntax used for attribute guarded-passages and hyperlinks in peritexts. At the moment, no validation or error detection is performed. While this will never cause the transformation to terminate abnormally, it could yield unexpected results. Another possible improvement would thus be to implement full syntactic validation of the peritexts.
 
  
 
One of the challenges of developing a generic stylesheet in XSLT 1.0 is to maintain a one-pass approach. Addressing the weaknesses mentioned above would undoubtedly make this challenge even bigger. Switching to a two-pass approach may thus be an attractive avenue for future developments.
 
  
Adopting a two-pass approach can be done in essentially two ways: an external pipelining mechanism (such as XProc, http://www.w3.org/TR/xproc/) can be used with XSLT 1.0, or multiple passes can be handled internally, through temporary trees in XSLT 2.0 (or the node-set extension function in XSLT 1.0), which allow some pipeline-like processing. In both cases, browser integration could be non-trivial. One interesting way to exploit an external pipelining mechanism would be to have a generic stylesheet generate a model-specific stylesheet from the IS specification, then apply the generated stylesheet to the document instance to generate its IS. Another functionality that could benefit from the enhanced possibilities of multi-pass / XSLT 2.0 processing is the automatic indentation of the output. Right now, the heuristic used is fairly simple, and it can break down even in simple cases. A more sophisticated and robust heuristic should thus be developed, and this would likely be easier with multi-pass / XSLT 2.0 processing.
 
 
A question that needs to be investigated through experimentation is that of determining how much of the indentation should be automatic. In [{{CIDE lien citation|1}}], indentation was specified explicitly, in a conventional manner, in the peritexts.
 
  
 
Let us now consider the foreseeable evolutions of the IS framework and how they could impact the IS generation mechanism. Consider the output of Example 1. As textual content, it includes the following passage:
 
::''There, he met The person named Barbe-Bleue''
  
 
Note that the article ''The'' has been capitalized. Why? The answer is that it comes from a text-before segment that is sometimes located at the beginning of a sentence, where capitalization is appropriate. But capitalization in the middle of a sentence (as in Example 1) is inappropriate. The source of the problem is that, in the current framework, the same text-before segment must be used consistently, regardless of its position in a sentence.
 
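The effect can be reproduced with a small Python sketch of IS generation understood as the concatenation of peritexts and element content in document order. It is an illustration only, not the XSLT 1.0 generic stylesheet: the peritext table (keyed here by element name) and the markup are hypothetical stand-ins for Example 1, whose actual model is not shown in this section. Because the same text-before is emitted wherever the element occurs, its capital letter survives in mid-sentence.

```python
import xml.etree.ElementTree as ET

# Hypothetical peritext table: element name -> (text-before, text-after).
# Elements absent from the table get empty peritexts.
Peritexts = dict[str, tuple[str, str]]


def generate_is(elem: ET.Element, peritexts: Peritexts) -> str:
    """Concatenate peritexts and element content in document order."""
    before, after = peritexts.get(elem.tag, ("", ""))
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(generate_is(child, peritexts))
        parts.append(child.tail or "")  # text following the child inside elem
    parts.append(after)
    return "".join(parts)


# Hypothetical markup in the spirit of Example 1.
doc = ET.fromstring("<s>There, he met <person>Barbe-Bleue</person></s>")
print(generate_is(doc, {"person": ("The person named ", "")}))
# There, he met The person named Barbe-Bleue
```

In this sketch the workaround of writing the peritext all in capitals amounts to changing a single table entry, e.g. ("THE PERSON NAMED ", ""); the generation procedure itself is unchanged.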
 
Remember that in IS, the focus is not on presentation but on meaning. Since the problem at hand only affects presentation and does not hinder comprehension, it should not be considered major. Moreover, it can be alleviated by various devices, such as writing the peritext all in capitals. With Example 1, this gives the following output, which, though still unusual, is not as strange-looking as the original output:
 
  
 
[[Fichier:CIDE (2009) Marcoux fig 2.JPG|center|400px|thumb|]]
  
 
Thus, it is not clear that the framework needs to be modified to accommodate peritexts that vary according to their position in a sentence.
 
 
  
 
Other possible extensions along the same lines would include peritexts that vary with the position of the element relative to its siblings, with the number of children of the element, or with the grammatical gender of a word or expression in the content of some element or attribute. Clearly, adding any such extension to IS would complicate the IS-generation mechanism. For one thing, it might require the inclusion of additional peritexts in the IS specification of a model. Those additional peritexts would then have to be processed appropriately during IS generation. We believe that, in all cases, experimentation should be used to determine whether an extension is truly necessary or whether some workaround without extension is possible. We think extreme parsimony is of utmost importance for the evolution of IS, because the inclusion of overly powerful mechanisms could severely impair the explanatory power of the approach.
 

Version of 18 July 2016 at 10:43

Intertextual semantics generation for structured documents: a complete implementation in XSLT


 
 

 
Title
Intertextual semantics generation for structured documents: a complete implementation in XSLT
Authors
Yves Marcoux.
Affiliations
GRDS, EBSI, Université de Montréal.
In
CIDE.12 (Montréal), 2009
PDF
CIDE (2009) Marcoux.pdf.pdf
Keywords
Intertextual semantics, structured documents, markup languages, XML, XSLT, formal tag-set descriptions.
Abstract
Intertextual semantics (IS) [1] [4] assigns a natural-language meaning to marked-up documents. Whereas formal semantics aim at representing the meaning of documents for machines, IS targets humans. In the current form of the approach, the IS of a model (DTD, schema) is given by two peritexts associated with each element: a text-before and a text-after. The IS of a document is the concatenation of the peritexts and element contents in document order. We present a complete implementation, in XSLT 1.0, of IS generation. The implementation handles attributes as described in [2], and hyperlinks and local elements as described in [1]. It also indents the output for better readability, as suggested in [3], and handles the exceptional cases of unknown elements and attributes.