CIDE (2009) Marcoux
==Introduction==
 
In a structured document (XML, [[A pour norme citée::Standard Generalized Markup Language|SGML]], etc.), what is the “meaning” of the various tags (''the markup'') present in the document? How is the meaning of the document augmented—or otherwise affected—by the presence of markup?
  
Fundamentally, there are two possible avenues for answering that question: the formal one and the informal one. One can devise a framework in which the meaning of a marked-up document is represented by a set of ''formal'' statements, for example in first-order logic. Or one can seek a framework in which the meaning of a marked-up document is represented by a set of sentences in an ''informal'' language, for example a natural language.
  
 
If automatic inferencing (through an inference engine) is the goal, then a formal approach probably has the edge. However, if some other use of the “meaning” of the document is envisioned, for example one that involves showing that meaning to humans, then the situation may be reversed.
  
''Formal Tag-Set Descriptions'' (see for example [{{CIDE lien citation|6}}], [{{CIDE lien citation|7}}] and [{{CIDE lien citation|8}}]) are an example of the approaches along the formal avenue. Intertextual semantics [{{CIDE lien citation|1}}] [{{CIDE lien citation|2}}] [{{CIDE lien citation|4}}] is an approach along the informal avenue. In intertextual semantics (IS), the meaning of a marked-up document is entirely and exclusively represented in natural language.
 
 
The intertextual semantics (IS) approach is based on the hypothesis (traces of which can be found in, among other places, the works of Wierzbicka [{{CIDE lien citation|9}}], Smedslund [{{CIDE lien citation|5}}], and even Wittgenstein [{{CIDE lien citation|10}}]) that humans ultimately “make sense” of artefacts through the use of natural language (NL), and that in designing artefacts, one should be concerned with how, how easily, and with how much ambiguity (or unambiguity) humans can derive NL from those artefacts. No matter how useful intermediate formal representations of meaning (including marked-up documents) may be for conciseness, machine processing, etc., they must ultimately be translatable (not necessarily translated) to NL, and are only ever as “meaningful” as such NL expressions of them are.
  
 
In the realm of structured (i.e., marked-up) documents, IS suggests that the creators of tag-sets (modelers) should be concerned with how markup can be translated to NL. Even if “end users” never see any marked-up document, some other humans, for example developers of processing software or archivists, will have to deal with such documents directly or indirectly, unless the documents are totally pointless. One might say this translation matters even more as the number of intermediate representations increases, because there are then more opportunities for misinterpretation.
 
   
 
   
IS proposes a mechanism by which NL passages (or whole documents) are generated from marked-up documents, according to an ''IS specification'' for the tag-set. So far, only very weak NL generation mechanisms have been explored, ''and it is extremely important that those mechanisms be weak'', because overly powerful mechanisms would “sweep under the carpet” inherent interpretation complications which IS, in contrast, seeks to uncover. In the current state of the IS framework, an IS specification takes the form of a table giving, for each element type, two NL segments: a “text-before” segment and a “text-after” segment, generically called “peritexts.”
  
 
Attributes require special attention, but a way of handling them in keeping with the spirit of IS is presented in [{{CIDE lien citation|2}}]. Peritexts may include “guarded segments”: segments guarded by an attribute name, which are included only if the corresponding attribute is specified on the element, and which can refer to the attribute value. “Local” elements (in the sense of W3C schemas) are supported, so that different peritexts can be assigned depending on the ancestors of the element.
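
To make guarded segments and local elements more concrete, here is a minimal Python sketch. It is not the encoding used in [{{CIDE lien citation|2}}] or in the XSLT implementation: the segment representation (plain strings and <code>(attribute, template)</code> pairs), the <code>{value}</code> placeholder, the path-style keys such as <code>chapter/title</code>, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding: a peritext is a list of segments.
# A plain string is always emitted. An (attribute_name, template) pair is a
# guarded segment, emitted only when that attribute is specified on the
# element; "{value}" in the template is replaced by the attribute value.
Segment = Union[str, Tuple[str, str]]
Peritext = List[Segment]


def render_peritext(peritext: Peritext, attributes: Dict[str, str]) -> str:
    """Concatenate segments, skipping guarded ones whose attribute is absent."""
    parts = []
    for segment in peritext:
        if isinstance(segment, str):
            parts.append(segment)
        else:
            name, template = segment
            if name in attributes:
                parts.append(template.replace("{value}", attributes[name]))
    return "".join(parts)


def find_peritext(spec: Dict[str, Peritext], path: List[str]) -> Optional[Peritext]:
    """Pick the entry with the longest matching ancestor context.

    `path` lists element names from the root down to the element itself.
    A key such as "chapter/title" models a local element; "title" is the
    global fallback. Returns None when no entry matches.
    """
    for start in range(len(path)):
        key = "/".join(path[start:])
        if key in spec:
            return spec[key]
    return None


# Hypothetical text-before entries for a "title" element.
TEXT_BEFORE: Dict[str, Peritext] = {
    "title": ["The title is: "],
    "chapter/title": [
        "The title of this chapter",
        ("lang", " (written in {value})"),
        " is: ",
    ],
}

local = find_peritext(TEXT_BEFORE, ["book", "chapter", "title"])
with_lang = render_peritext(local, {"lang": "fr"})
without_lang = render_peritext(local, {})
fallback = render_peritext(find_peritext(TEXT_BEFORE, ["book", "title"]), {})
```

In this sketch the guard only tests whether the attribute is present, and the most specific ancestor context wins. Whether the actual mechanism resolves local elements in exactly this way is not stated here.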
  
The IS generation process is akin to styling the document with the peritexts, concatenating peritexts and element contents as the document tree is traversed depth-first. The IS, or ''IS-meaning'', of the document is the resulting character string. It is important to stress that, in spite of the similarity between styling and the generation of the IS of a document, the preoccupations of IS are absolutely not at the presentational level, but really at the semantic level.
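
The generation process described above can be sketched in a few lines of Python (the implementation presented in this article is in XSLT 1.0; Python is used here only for illustration). The tag-set, the peritexts in <code>SPEC</code>, and the function name are hypothetical, and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical IS specification: element type -> (text-before, text-after).
SPEC: Dict[str, Tuple[str, str]] = {
    "memo": ("This is a memo. ", "End of the memo."),
    "to": ("It is addressed to ", ". "),
    "body": ("Its content is: ", " "),
}


def intertextual_semantics(element: ET.Element, spec: Dict[str, Tuple[str, str]]) -> str:
    """Return the IS of `element` via a depth-first, document-order traversal."""
    text_before, text_after = spec[element.tag]
    parts = [text_before, element.text or ""]
    for child in element:
        parts.append(intertextual_semantics(child, spec))
        parts.append(child.tail or "")  # character data following the child
    parts.append(text_after)
    return "".join(parts)


document = ET.fromstring("<memo><to>Alice</to><body>Meeting at noon.</body></memo>")
result = intertextual_semantics(document, SPEC)
# result == "This is a memo. It is addressed to Alice. "
#           "Its content is: Meeting at noon. End of the memo."
```

The output is a single character string, which matches the definition of the IS-meaning given above. Nothing in the function interprets the content; it only concatenates, which is the kind of deliberately weak mechanism the approach calls for.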
  
 
In this article, we present a complete implementation, in XSLT 1.0, of the intertextual semantics generation mechanism. The transformation is model-independent in that it reads the peritexts from an XML document encoding the IS specification for a given model. It implements attribute handling as defined in [{{CIDE lien citation|2}}]; hyperlinks in peritexts or as attribute or element content, as described in [{{CIDE lien citation|1}}]; and local element definitions, also as described in [{{CIDE lien citation|1}}]. In addition, it indents the output for increased readability (along the same lines as [{{CIDE lien citation|3}}], but more elaborately), and it handles exceptions, namely elements for which no peritext exists in the IS specification and unexpected attributes.
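
As a rough illustration of model independence, indentation, and exception handling, the following Python sketch reads peritexts from a separate XML specification and flags elements that have no entry. The <code>is-spec</code> format, the bracketed warning text, and the two-space indentation are assumptions made for this example; the format and layout used by the XSLT implementation are not shown in this section, and unexpected attributes are not handled here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical encoding of an IS specification as an XML document.
SPEC_XML = """
<is-spec>
  <element name="memo"><before>This is a memo.</before><after>End of the memo.</after></element>
  <element name="to"><before>It is addressed to:</before><after>.</after></element>
</is-spec>
"""


def load_spec(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Read (text-before, text-after) pairs from the specification document."""
    spec = {}
    for entry in ET.fromstring(xml_text).findall("element"):
        spec[entry.get("name")] = (
            entry.findtext("before", ""),
            entry.findtext("after", ""),
        )
    return spec


def generate_lines(element: ET.Element, spec: Dict[str, Tuple[str, str]], depth: int = 0) -> List[str]:
    """Return the IS as indented lines, one nesting level per element depth."""
    indent = "  " * depth
    if element.tag in spec:
        before, after = spec[element.tag]
    else:
        # Exception handling: flag the unknown element instead of failing.
        before = f"[no peritext for element '{element.tag}']"
        after = f"[end of element '{element.tag}']"
    lines = [indent + before]
    if element.text and element.text.strip():
        lines.append(indent + "  " + element.text.strip())
    for child in element:
        lines.extend(generate_lines(child, spec, depth + 1))
        if child.tail and child.tail.strip():
            lines.append(indent + "  " + child.tail.strip())
    lines.append(indent + after)
    return lines


document = ET.fromstring("<memo><to>Alice</to><cc>Bob</cc></memo>")
output = "\n".join(generate_lines(document, load_spec(SPEC_XML)))
```

Because the peritexts live in <code>SPEC_XML</code> and not in the code, the same functions work for any tag-set. Stripping whitespace around character data is a simplification made to keep the indentation readable.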

Revision as of 18 July 2016, 08:29

Intertextual semantics generation for structured documents: a complete implementation in XSLT


 
 

 
Title: Intertextual semantics generation for structured documents: a complete implementation in XSLT
Authors: Yves Marcoux
Affiliations: GRDS, EBSI, Université de Montréal
In: CIDE.12 (Montréal), 2009
PDF: CIDE (2009) Marcoux.pdf.pdf
Keywords: Intertextual semantics, structured documents, markup languages, XML, XSLT, formal tag-set descriptions.
Abstract: Intertextual semantics (IS) [1] [4] assigns marked-up documents a meaning expressed in natural language. Whereas formal semantics aim to represent the meaning of documents for machines, IS targets humans. In the current form of the approach, the IS of a model (DTD, schema) is given by two peritexts associated with each element: a text-before and a text-after. The IS of a document is the concatenation of the peritexts and element contents in document order. We present a complete implementation, in XSLT 1.0, of IS generation. The implementation handles attributes as described in [2], and hyperlinks and local elements as described in [1]. It also indents the output for better readability, as suggested in [3], and handles the exceptions raised by unknown elements and attributes.