CIDE (2009) Marcoux: Difference between revisions

===Examples===
 
Here is the ISS file used for our examples. It is intended for a top-level element of story, and should thus be named story.iss.xml and reside in the same directory as the generic stylesheet:
 
Note that the text contributed by peritexts is typeset in italics and the text contributed by the document is typeset in normal font on a blue background. This is in keeping with the philosophy of IS, which demands that the origin of all text in the IS of a document be clearly identifiable.
  
 
The templates in Part 6 determine which rule in the ISS file “best” matches an element. As sketched earlier, a rule is a best match for an element E iff one of its paths matches E (i.e., is a suffix of E’s ancestral line) and no other rule specifies a longer path matching E. If two or more rules are best matches for an element, the first one (in ISS file order) is chosen.
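The best-match rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the stylesheet's XSLT code: the representation of a rule as a `(name, paths)` pair, the rule names, and the element names are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a rule is (name, list of paths),
# and a path is a list of element names, outermost first.
Rule = Tuple[str, List[List[str]]]


def path_matches(path: Sequence[str], ancestral_line: Sequence[str]) -> bool:
    """A path matches an element iff it is a suffix of the element's ancestral line."""
    n = len(path)
    return 0 < n <= len(ancestral_line) and list(ancestral_line[-n:]) == list(path)


def best_rule(rules: Sequence[Rule], ancestral_line: Sequence[str]) -> Optional[str]:
    """Return the name of the rule with the longest matching path.

    Ties are resolved in favor of the first rule in file order, because a
    later rule replaces the current best only if its match is strictly longer.
    """
    best_name: Optional[str] = None
    best_len = 0
    for name, paths in rules:
        longest = max((len(p) for p in paths if path_matches(p, ancestral_line)), default=0)
        if longest > best_len:
            best_name, best_len = name, longest
    return best_name


# Hypothetical rules, listed in file order.
rules: List[Rule] = [
    ("any-name", [["name"]]),
    ("person-name", [["person", "name"]]),
    ("person-name-bis", [["person", "name"]]),
]

print(best_rule(rules, ["story", "person", "name"]))
print(best_rule(rules, ["story", "place", "name"]))
```

In this sketch, an element with no matching rule yields `None`, which would correspond to the exception case for which the stylesheet includes a warning in the IS.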
 
The templates in Parts 7 and 8 process attributes and hyperlinks, respectively. Processing consists essentially of recursive search-and-replace on various delimiters and placeholders. The templates in Part 9 are called by those of Parts 3 and 4 to check for exceptions and, if needed, include the appropriate warnings in the IS produced.
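XSLT 1.0 has no built-in string replacement function, so search-and-replace is typically written as a recursive template built on `substring-before` and `substring-after`. The following Python sketch mirrors that recursion. It is an illustration only: the `[[name]]` placeholder syntax is hypothetical and is not the one used in the stylesheet.

```python
def replace_all(text: str, target: str, replacement: str) -> str:
    """Replace every occurrence of target in text, one occurrence per recursive call."""
    if not target or target not in text:
        return text
    # Split at the first occurrence, as substring-before / substring-after would.
    before, _, after = text.partition(target)
    return before + replacement + replace_all(after, target, replacement)


# Hypothetical peritext containing a placeholder for an attribute value.
peritext = "The person named [[name]] (also written [[name]])"
print(replace_all(peritext, "[[name]]", "Barbe-Bleue"))
```

Since the recursion only continues on the text after each match, the replacement string itself is never rescanned.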
  
  
==Discussion==
 
 
The examples illustrate that peritexts can be very long. It is an essential feature of IS that there be no limit on their length. They should not be constrained lexically either. However, this is not entirely the case in the current implementation. Indeed, it is currently not possible to include some of the delimiters and placeholders for attribute guarded-passages and hyperlinks as data in the peritexts (and, in a few cases, in element content). One possible improvement would thus be to define and implement conventions for allowing all delimiters and placeholders to be included as data in peritexts (and element content).
 
 
A related issue is the validation of the syntax used for attribute guarded-passages and hyperlinks in peritexts. At the moment, no validation or error detection is performed. While this will never cause the transformation to terminate abnormally, it could yield unexpected results. Another possible improvement would thus be to implement full syntactic validation of the peritexts.
 
Certain delimiters are used internally as placeholders in text variables during processing. Their use relies implicitly on certain character sequences not occurring as textual content in the processed document. This should be replaced by more robust mechanisms.
 
 
One of the challenges of developing a generic stylesheet in XSLT 1.0 is to maintain a one-pass approach. Solving the above-mentioned weaknesses would undoubtedly make this challenge even greater. Switching to a two-pass approach may thus be an attractive avenue for future developments.
 
Adopting a two-pass approach can be done in essentially two ways: an external pipelining mechanism (such as XProc <http://www.w3.org/TR/xproc/>) can be used with XSLT 1.0, or multiple passes can be handled internally in XSLT 2.0, where temporary trees can be processed directly (in XSLT 1.0, this requires a node-set extension function), which allows some pipeline-like processing. In both cases, browser integration could be non-trivial. One interesting way to exploit an external pipelining mechanism would be to have a generic stylesheet generate a model-specific stylesheet from the IS specification, then apply the generated stylesheet to the document instance to generate its IS. Another functionality that could benefit from the enhanced possibilities of multi-pass / XSLT 2.0 processing is the automatic indentation of the output. Right now, the heuristic used is fairly simple, and it can break down even on simple cases. A more sophisticated and robust heuristic should thus be developed, and this would likely be easier with multi-pass / XSLT 2.0 processing.
A question that needs to be investigated through experimentation is that of determining how much of the indentation should be automatic. In [{{CIDE lien citation|1}}], indentation was specified explicitly in a conventional manner in the peritexts.
 
 
Let us now consider the foreseeable evolutions of the IS framework and how they could impact the IS generation mechanism. Consider the output of Example 1. As textual content, it includes the following passage:
 
 
There, he met The person named Barbe-Bleue
 
  
 
Note that the article The has been capitalized. Why? The answer is that it comes from a text-before segment that is sometimes located at the beginning of a sentence, where capitalization is appropriate. But capitalization in the middle of a sentence (as in Example 1) is inappropriate. The source of the problem is that, in the current framework, the same text-before segment must be used consistently, regardless of its position in a sentence.
 
 
Remember that in IS, the focus is not on presentation but on meaning. Since the problem at hand only affects presentation and does not hinder comprehension, it should not be considered major. Moreover, it can be alleviated by various devices, such as writing the peritext all in capitals. With Example 1, this gives the following output, which, though still unusual, is not as strange-looking as the original output:
  
  
  
==Conclusion==
In this article, we presented a complete implementation, in XSLT 1.0, of the intertextual semantics (IS) generation mechanism for XML documents. The implementation is model-independent, in that a generic XSLT stylesheet reads the peritexts from an IS specification file, an XML document giving the IS specification (ISS) applicable to the document being processed. The implementation handles attributes as described in [{{CIDE lien citation|2}}], hyperlinks (in peritexts or as attribute or element content) as described in [{{CIDE lien citation|1}}], and local element definitions (in the sense of W3C schemas), also as described in [{{CIDE lien citation|1}}]. In addition, it indents the IS produced for increased readability (along the same lines as [{{CIDE lien citation|3}}], but more elaborately), and handles exceptions, that is, elements for which no peritexts are given in the IS specification and attributes unexpected by the peritexts.
 
 
After describing the format adopted for ISS files, we gave examples illustrating the functionalities of the implementation, then outlined the structure of the generic stylesheet. Finally, we discussed various aspects of the implementation, possible improvements, and the impact that foreseeable generalizations of the IS framework might have on the IS generation mechanism.
 
The current version of the stylesheet is available through <http://grds.ebsi.umontreal.ca/>. It is published under the Creative Commons “Attribution-Noncommercial-Share Alike 2.5 Canada” license <http://creativecommons.org/licenses/by-nc-sa/2.5/ca/>. We warmly encourage readers to experiment with it, look at the examples, write IS specifications for their models, either extant or under development, and send comments and suggestions.
 
   
 
   
  
==References==
 
 
[1] Marcoux, Yves. “A natural-language approach to modeling: Why is some XML so difficult to write?” Proceedings of Extreme Markup Languages 2006.

[2] Marcoux, Yves; Rizkallah, Élias. “Exploring intertextual semantics: a reflection on attributes and optionality.” Proceedings of Extreme Markup Languages 2007.

Revision as of 5 July 2016, 11:48

Intertextual semantics generation for structured documents: a complete implementation in XSLT


 
 

 
Title
Intertextual semantics generation for structured documents: a complete implementation in XSLT
Authors
Yves Marcoux.
Affiliations
GRDS, EBSI, Université de Montréal.
In
CIDE.12 (Montréal), 2009
PDF
CIDE (2009) Marcoux.pdf.pdf
Keywords
Intertextual semantics, structured documents, markup languages, XML, XSLT, formal tag-set descriptions.
Abstract
Intertextual semantics (IS) [1] [4] assigns a natural-language meaning to marked-up documents. Whereas formal semantics aim at representing the meaning of documents for machines, IS targets humans. In the current form of the approach, the IS of a model (DTD, schema) is given by two peritexts associated with each element: a text-before and a text-after. The IS of a document is the concatenation of the peritexts and element contents in document order. We present a complete implementation, in XSLT 1.0, of IS generation. The implementation handles attributes as described in [2], and hyperlinks and local elements as described in [1]. It also indents the output for better readability, as suggested in [3], and handles the exceptions constituted by unknown elements and attributes.
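The definition of the IS of a document as the concatenation of peritexts and element contents in document order can be sketched as follows. This is a minimal Python illustration, not the XSLT stylesheet: it ignores attributes, hyperlinks, indentation, and exceptions, and the element names and peritexts are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical IS specification: element name -> (text-before, text-after).
PERITEXTS: Dict[str, Tuple[str, str]] = {
    "story": ("", ""),
    "para": ("", ""),
    "person": ("The person named ", ""),
}


def intertextual_semantics(elem: ET.Element, peritexts: Dict[str, Tuple[str, str]]) -> str:
    """Concatenate text-before, content, and text-after, in document order."""
    before, after = peritexts.get(elem.tag, ("", ""))
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(intertextual_semantics(child, peritexts))
        parts.append(child.tail or "")  # document text following the child element
    parts.append(after)
    return "".join(parts)


# Hypothetical document instance.
doc = ET.fromstring(
    "<story><para>There, he met <person>Barbe-Bleue</person>.</para></story>"
)
print(intertextual_semantics(doc, PERITEXTS))
```

Because the same text-before is emitted wherever the element occurs, the sketch produces a capitalized "The" in the middle of the sentence.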

