CIDE (2009) Marcoux

 
The IS generation process is akin to styling the document with the peritexts: peritexts and element contents are concatenated as the document tree is traversed depth-first. The IS, or IS-meaning, of the document is the resulting character string. It is important to stress that, despite the similarity between styling and IS generation, IS is concerned not with presentation but with semantics.
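To make the traversal concrete, here is a minimal Python sketch (not the author's XSLT 1.0 stylesheet) that computes an IS by emitting each element's text-before, its content and its text-after in depth-first order. For simplicity it assumes that peritexts are looked up by generic ID alone; paths, attributes, hyperlinks and indentation are ignored, and the `memo` model and the function name `generate_is` are hypothetical.

```python
import xml.etree.ElementTree as ET

def generate_is(elem, peritexts):
    """Return the IS of `elem`: its text-before, its content (with the IS of
    each child computed recursively, depth-first) and its text-after."""
    # Simplification: peritexts are keyed by generic ID only; elements
    # without an entry contribute their content and nothing else.
    before, after = peritexts.get(elem.tag, ("", ""))
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(generate_is(child, peritexts))  # descend before moving on
        parts.append(child.tail or "")              # text that follows the child
    parts.append(after)
    return "".join(parts)

# Toy model and instance (hypothetical, for illustration only).
peritexts = {
    "memo": ("This is a memo. ", " End of memo."),
    "to": ("It is addressed to ", ". "),
    "body": ("Its message reads: ", ""),
}
doc = ET.fromstring("<memo><to>Ann</to><body>Meet at noon.</body></memo>")
print(generate_is(doc, peritexts))
# This is a memo. It is addressed to Ann. Its message reads: Meet at noon. End of memo.
```

The output is a plain character string, which is the point made above: the peritexts turn markup into natural-language statements rather than into visual formatting.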
 
In this article, we present a complete implementation, in XSLT 1.0, of the intertextual semantics generation mechanism. The transformation is model-independent in that it reads the peritexts from an XML document encoding the IS specification for a given model. It implements attribute handling as defined in [{{CIDE lien citation|2}}]; hyperlinks in peritexts or as attribute or element content, as described in [{{CIDE lien citation|1}}]; and local element definitions, also as described in [{{CIDE lien citation|1}}]. In addition, it indents the output for increased readability (along the same lines as [{{CIDE lien citation|3}}], but more elaborately), and it handles exceptions, namely elements for which the IS specification provides no peritext, and unexpected attributes.
  
  
==General approach==
 
As mentioned earlier, the implementation is model-independent: the same XSLT stylesheet is used to process any document. In principle, the association between elements and peritexts is determined by the model (DTD or schema) to which the document conforms. Knowing the model, the generic stylesheet can read an IS specification (ISS) file, giving the peritexts for all elements, and compute the IS of the instance.

In theory, namespace or schema-location information could be used to identify the ISS file applicable to a document. However, complications would arise from the fact that the same document can conform to different schemas and contain elements from different namespaces. Thus, it seems simpler to determine the model of a document for IS purposes independently of namespace and schema-location information. One possibility would be to point explicitly to the ISS file through a processing instruction in the document. We chose a more implicit approach, which requires no model-specific addition to the documents and proved flexible enough: the generic ID of the document element of the instance is used to form the filename of the ISS file. More specifically, the file named genID.iss.xml in the same directory as the generic stylesheet is used as the ISS file, where genID is the generic ID of the top-level element of the document.
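The file-naming convention can be sketched as follows. This is a Python illustration under stated assumptions: `iss_path_for` is a hypothetical helper, the generic ID is taken to be the element's local name (its namespace is ignored, consistent with the choice made above), and `/opt/is-stylesheets` is a made-up directory standing in for the location of the generic stylesheet.

```python
import os
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def iss_path_for(document_element, stylesheet_dir):
    """Locate the ISS file as genID.iss.xml next to the generic stylesheet,
    where genID is the generic ID of the document (top-level) element."""
    gen_id = local_name(document_element.tag)  # namespace deliberately ignored
    return os.path.join(stylesheet_dir, gen_id + ".iss.xml")

root = ET.fromstring('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text/></TEI>')
print(iss_path_for(root, "/opt/is-stylesheets"))
# prints /opt/is-stylesheets/TEI.iss.xml on POSIX systems
```

In XSLT 1.0, a call such as document(concat(local-name(/*), '.iss.xml')) would have a similar effect, since a relative URI passed to document() as a string is resolved against the base URI of the stylesheet; whether the stylesheet described here uses exactly that expression is not stated in this section.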
 
 
The processing performed by the generic stylesheet is one-pass, i.e., it takes as input the document instance and directly generates its IS. Thus, no pipelining environment is necessary. Any current browser with an XSLT 1.0 processor can be used to view the IS of documents directly, provided a link to the generic stylesheet is included in the documents, for example:
 
  
  
==IS specifications==
  
===Overview===
Essentially, an ISS file gives the peritexts (text-before and text-after segments) for all the “elements” in the model. Remember, however, that local elements (in the sense of W3C schemas) are possible [{{CIDE lien citation|1}}], so different peritexts can be assigned to elements with the same generic ID but different ancestral lines. In effect, then, a peritext is assigned not to a fixed generic ID but to a path, and applies to the elements matched by that path. A path P is said to match an element E iff E’s ancestral line ends with P, i.e., iff P is a suffix of E’s ancestral line. For example, a peritext assigned to path title applies to all elements with generic ID title, regardless of their ancestral line, but a peritext assigned to path titleStmt/title applies only to elements with generic ID title that are children of elements with generic ID titleStmt. A peritext assigned to path /TEI applies only to document (top-level) elements with generic ID TEI.<ref>Paths play a role similar to match attributes of xsl:template elements in XSLT. However, paths are deliberately much less flexible and cannot, for example, contain wildcards or refer to arbitrary XPath axes.</ref>
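The suffix definition of matching fits in a few lines of Python. The helpers `ancestral_line` and `path_matches` are hypothetical names. The sketch assumes that the ancestral line is recorded with an empty first step standing for the document root, so that an anchored path such as /TEI is a suffix only of the line of a top-level TEI element, and it compares generic IDs by local name.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def ancestral_line(elem, parents):
    """Generic IDs from the document element down to `elem`, preceded by ''
    so that the line, joined with '/', reads like '/TEI/teiHeader/...'."""
    line = []
    while elem is not None:
        line.append(local_name(elem.tag))
        elem = parents.get(elem)
    return [""] + line[::-1]

def path_matches(path, line):
    """P matches E iff P is a suffix of E's ancestral line (compared by steps)."""
    steps = path.split("/")  # '/TEI' -> ['', 'TEI'], i.e. anchored at the root
    return len(steps) <= len(line) and line[-len(steps):] == steps

# Toy instance, loosely modeled on the TEI example (not a valid TEI document).
doc = ET.fromstring("<TEI><teiHeader><titleStmt><title>A</title></titleStmt>"
                    "</teiHeader><text><title>B</title></text></TEI>")
parents = {child: parent for parent in doc.iter() for child in parent}
header_title, text_title = doc.iter("title")
print(path_matches("title", ancestral_line(text_title, parents)))              # True
print(path_matches("titleStmt/title", ancestral_line(header_title, parents)))  # True
print(path_matches("titleStmt/title", ancestral_line(text_title, parents)))    # False
```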
 
  
 
 
For convenience, peritexts can be assigned to more than one path at once. Peritexts are in fact assigned to paths in pairs, each consisting of one text-before segment and one text-after segment.
 
Peritexts can contain certain delimiters, both for handling attributes, as described in [{{CIDE lien citation|2}}], and for creating hyperlinks in the resulting IS, as described in [{{CIDE lien citation|1}}]. The hyperlink delimiters in [{{CIDE lien citation|1}}] were [ ]; however, in [{{CIDE lien citation|2}}] and here, those are used for attributes. Thus, we use {{ }} for hyperlinks.
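As an illustration only, the following Python sketch renders the {{ }} delimiters in an already concatenated IS string. It assumes that the IS is displayed as HTML and that each hyperlink shows its own URL as link text; the name `render_hyperlinks` is hypothetical, and the actual output format of the stylesheet is not specified in this section. Because the pattern is non-greedy, a link ends at the first }} after its {{.

```python
import html
import re

# Non-greedy: a link runs from '{{' to the first '}}' that follows it.
HYPERLINK = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)

def render_hyperlinks(is_text):
    """HTML-escape an IS string and turn each {{URL}} into an <a> element
    whose target and visible text are both the URL."""
    out, pos = [], 0
    for m in HYPERLINK.finditer(is_text):
        out.append(html.escape(is_text[pos:m.start()]))
        url = m.group(1).strip()
        out.append('<a href="%s">%s</a>' % (html.escape(url), html.escape(url)))
        pos = m.end()
    out.append(html.escape(is_text[pos:]))
    return "".join(out)

# '{{' ending a text-before and '}}' starting a text-after, around the content:
print(render_hyperlinks("See {{" + "http://example.org/a?x=1&y=2" + "}} for details."))
# See <a href="http://example.org/a?x=1&amp;y=2">http://example.org/a?x=1&amp;y=2</a> for details.
```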
  
===ISS files===
 
An ISS file is an XML document. All its elements and attributes belong to a specific namespace:
 
  
 
Its top-level element is an iss element, whose content is one or more rule elements. Each rule element is empty and has three mandatory attributes: paths, text-before, and text-after. The effect of a rule element is to assign the pair of peritexts given in text-before and text-after to the path, or space-delimited list of paths, given in paths.
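Reading the rules of an ISS file might look like the following Python sketch. The helper names are hypothetical. Since the ISS namespace URI is not reproduced here, the sketch matches element and attribute names by local name in any namespace; `urn:example:iss` in the usage example is a placeholder, not the real ISS namespace.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]

def attr(elem, name):
    """Value of the attribute with local name `name`, in whatever namespace."""
    for key, value in elem.attrib.items():
        if local_name(key) == name:
            return value
    raise ValueError("rule lacks mandatory attribute %r" % name)

def load_rules(iss_root):
    """Rules as (paths, text_before, text_after) tuples, in ISS file order."""
    rules = []
    for child in iss_root:
        if local_name(child.tag) == "rule":
            rules.append((attr(child, "paths").split(),  # space-delimited paths
                          attr(child, "text-before"),
                          attr(child, "text-after")))
    return rules

# 'urn:example:iss' is a placeholder, not the actual ISS namespace.
iss = ET.fromstring(
    '<iss xmlns="urn:example:iss">'
    '<rule paths="title titleStmt/title" text-before="The title is: " text-after=". "/>'
    '</iss>')
print(load_rules(iss))  # [(['title', 'titleStmt/title'], 'The title is: ', '. ')]
```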
 
 
A rule is applicable to an element iff one of its paths matches the element. If more than one rule applies to an element, the one with the most specific (longest) matching path is chosen; if more than one rule has that longest matching path, the first one (in ISS file order) is applied.
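The selection policy can be sketched in Python under an explicit assumption: the length of a path is measured in steps, with the leading / of an anchored path counting as one step (so /TEI is more specific than TEI). Rules are (paths, text_before, text_after) tuples kept in ISS file order, ancestral lines start with an empty step for the root, and `select_rule` is a hypothetical name.

```python
def path_matches(path, line):
    """P matches E iff P is a suffix of E's ancestral line (compared by steps)."""
    steps = path.split("/")
    return len(steps) <= len(line) and line[-len(steps):] == steps

def select_rule(rules, line):
    """Return the rule with the longest matching path; among rules tied on
    that length, the first in ISS file order. None if no rule applies."""
    best, best_len = None, 0
    for rule in rules:                        # rules are kept in file order
        for path in rule[0]:
            steps = len(path.split("/"))
            if path_matches(path, line) and steps > best_len:
                best, best_len = rule, steps  # strict '>' lets earlier rules win ties
    return best

rules = [
    (["title"], "Title: ", ""),
    (["titleStmt/title"], "Document title: ", ""),
    (["titleStmt/title"], "Never used: ", ""),  # loses the tie to the rule above
]
print(select_rule(rules, ["", "TEI", "teiHeader", "titleStmt", "title"])[1])  # Document title:
print(select_rule(rules, ["", "TEI", "text", "title"])[1])                    # Title:
```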
 
 
The sequences {{ and }} in peritexts are hyperlink delimiters, i.e., whatever lies between them is interpreted as a URL and converted to a hyperlink in the IS. It is possible to have {{ in a text-before and }} in the corresponding text-after, but this works only when the element contains neither sub-elements nor }} character sequences (which would be unusual in a URL). Peritexts can also contain passages “guarded” by an attribute name, such as:
 
@attribName[Some text containing exactly one @.]
 
 
Such guarded passages in peritexts are included in the resulting IS only if the guarding attribute is present on the element to which the peritext is applied. Otherwise, the entire guarded passage is omitted. When the passage is included, the actual value of the attribute is inserted in place of the @.
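A minimal Python sketch of guarded-passage expansion follows. It assumes that an attribute name consists of letters, digits, '_', '.', ':' or '-', and that the guarded text contains no ] besides its closing bracket and exactly one @; `expand_guarded` and the example peritext are hypothetical.

```python
import re

# @name[...]: the bracketed passage contains exactly one '@' and no ']'.
GUARDED = re.compile(r"@([\w.:-]+)\[([^\]@]*)@([^\]@]*)\]")

def expand_guarded(peritext, attributes):
    """Drop each guarded passage whose attribute is absent; otherwise keep
    the passage with the attribute's value substituted for its '@'."""
    def replace(m):
        name, left, right = m.groups()
        if name not in attributes:
            return ""
        return left + attributes[name] + right
    return GUARDED.sub(replace, peritext)

peritext = "A memo@date[, dated @,] follows."
print(expand_guarded(peritext, {"date": "1 June"}))  # A memo, dated 1 June, follows.
print(expand_guarded(peritext, {}))                  # A memo follows.
```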
 
The name xmlns can be used as a guarding attribute to refer to the namespace URI of an element. The guarded passage is then included only if the element belongs to a namespace.
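The xmlns case can be folded into the same mechanism by treating the namespace URI as a pseudo-attribute. In the Python sketch below, `guard_values` (a hypothetical helper) adds an xmlns entry only for elements in a namespace, relying on ElementTree's '{uri}local' tag notation. That the @ is then replaced by the namespace URI is an assumption consistent with the description above, not something it states outright.

```python
import re
import xml.etree.ElementTree as ET

GUARDED = re.compile(r"@([\w.:-]+)\[([^\]@]*)@([^\]@]*)\]")

def guard_values(elem):
    """Values available to guarded passages: the element's attributes, plus
    the pseudo-attribute 'xmlns' bound to its namespace URI, if it has one."""
    values = dict(elem.attrib)
    if elem.tag.startswith("{"):                 # ElementTree writes '{uri}local'
        values["xmlns"] = elem.tag[1:].split("}", 1)[0]
    return values

def expand_guarded(peritext, values):
    """Keep a guarded passage (with '@' replaced) only if its name is in values."""
    return GUARDED.sub(
        lambda m: m.group(2) + values[m.group(1)] + m.group(3) if m.group(1) in values else "",
        peritext)

peritext = "This element@xmlns[ belongs to namespace @]."
tei = ET.fromstring('<TEI xmlns="http://www.tei-c.org/ns/1.0"/>')
print(expand_guarded(peritext, guard_values(tei)))
# This element belongs to namespace http://www.tei-c.org/ns/1.0.
print(expand_guarded(peritext, guard_values(ET.fromstring("<memo/>"))))  # This element.
```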
 
  
  
 
 
 
   
 
   
  

Version of 5 July 2016 at 11:44

Intertextual semantics generation for structured documents: a complete implementation in XSLT


 
 

 
Title
Intertextual semantics generation for structured documents: a complete implementation in XSLT
Authors
Yves Marcoux.
Affiliations
GRDS, EBSI, Université de Montréal.
In
CIDE.12 (Montréal), 2009
PDF
CIDE (2009) Marcoux.pdf.pdf
Keywords
Intertextual semantics, structured documents, markup languages, XML, XSLT, formal tag-set descriptions.
Abstract
Intertextual semantics (IS) [1] [4] assigns a natural-language meaning to marked-up documents. Whereas formal semantics aim at representing the meaning of documents for machines, IS targets humans. In the current form of the approach, the IS of a model (DTD, schema) is given by two peritexts associated with each element: a text-before and a text-after. The IS of a document is the concatenation of the peritexts and element contents in document order. We present a complete implementation, in XSLT 1.0, of IS generation. The implementation handles attributes as described in [2], and hyperlinks and local elements as described in [1]. It also indents the output for better readability, as suggested in [3], and handles exceptions, namely unknown elements and attributes.

