TeiVM2, Main, Merge, bibRecord, 000117

SusTEInability of linguistic resources through feature structures

Internal identifier: 000117 (Main/Merge); previous: 000116; next: 000118

Authors: Andreas Witt [Germany]; Georg Rehm [Germany]; Erhard Hinrichs [Germany]; Timm Lehmberg [Germany]; Jens Stegmann [Germany]

Source:

Literary and linguistic computing [0268-1145]; 2009.

RBID : Francis:11-0223735

French descriptors

Pascal (Inist)
- Ressources linguistiques, Annotation de corpus, TEI, Structure de traits, Langage de balisage.

English descriptors

KwdEn :
- Corpus annotation, Feature structure, Linguistic resources, Markup language, TEI.

Abstract

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
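The two ideas in the abstract (layers linked implicitly by identical text, and annotations stored as feature structures) can be sketched as follows. This is an illustrative sketch only, not the authors' mapping procedure: the layer markup (`s`, `np`, `vp`, `w`), the helper names `primary_text`, `layers_aligned` and `to_fs`, and the choice to give a nested feature structure the name of its enclosing feature as its type are all assumptions. Only the element names `fs`, `f` and `symbol` come from the TEI feature structure tag set.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layers over the same primary text.
# The element names are invented for illustration.
SYNTAX_LAYER = "<s><np>The dog</np> <vp>barks</vp></s>"
TOKEN_LAYER = "<s><w>The</w> <w>dog</w> <w>barks</w></s>"


def primary_text(layer_xml):
    """Return the character data of one layer with all markup stripped."""
    return "".join(ET.fromstring(layer_xml).itertext())


def layers_aligned(*layers):
    """Layers count as implicitly linked only if their text is identical."""
    return len({primary_text(layer) for layer in layers}) == 1


def to_fs(fs_type, features):
    """Build a TEI-style <fs> element from a (possibly nested) dict.

    Atomic values become <symbol value="..."/>; nested dicts become
    embedded <fs> elements (typed by the feature name, an assumption).
    """
    fs = ET.Element("fs", {"type": fs_type})
    for name, value in features.items():
        f = ET.SubElement(fs, "f", {"name": name})
        if isinstance(value, dict):
            f.append(to_fs(name, value))
        else:
            ET.SubElement(f, "symbol", {"value": str(value)})
    return fs


if layers_aligned(SYNTAX_LAYER, TOKEN_LAYER):
    token = to_fs("token", {"pos": "NN", "agr": {"num": "sg", "case": "nom"}})
    print(ET.tostring(token, encoding="unicode"))
```

Comparing the stripped text of every layer is the simplest way to check the implicit link; a real converter would presumably also need to normalise whitespace and character encoding before comparing.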

Links to previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000007
to stream PascalFrancis, to step Curation: 000038
to stream PascalFrancis, to step Checkpoint: 000014

Links to Exploration step

Francis:11-0223735

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author><name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Karlsruhe</region>
<settlement type="city">Mannheim</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>General and Computational Linguistics</wicri:noRegion>
<wicri:noRegion>Tübingen University, General and Computational Linguistics</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1"><inist:fA14 i1="04"><s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>SFB Multilingualism</wicri:noRegion>
<wicri:noRegion>Hamburg University, SFB Multilingualism</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1"><inist:fA14 i1="05"><s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>Faculty of Linguistics and Literary Studies</wicri:noRegion>
<wicri:noRegion>Bielefeld University, Faculty of Linguistics and Literary Studies</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">11-0223735</idno>
<date when="2009">2009</date>
<idno type="stanalyst">FRANCIS 11-0223735 INIST</idno>
<idno type="RBID">Francis:11-0223735</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000007</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000038</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000014</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000014</idno>
<idno type="wicri:doubleKey">0268-1145:2009:Witt A:susteinability:of:linguistic</idno>
<idno type="wicri:Area/Main/Merge">000117</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author><name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Karlsruhe</region>
<settlement type="city">Mannheim</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="3">Berlin</region>
<settlement type="city">Berlin</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>General and Computational Linguistics</wicri:noRegion>
<wicri:noRegion>Tübingen University, General and Computational Linguistics</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1"><inist:fA14 i1="04"><s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>SFB Multilingualism</wicri:noRegion>
<wicri:noRegion>Hamburg University, SFB Multilingualism</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1"><inist:fA14 i1="05"><s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<wicri:noRegion>Faculty of Linguistics and Literary Studies</wicri:noRegion>
<wicri:noRegion>Bielefeld University, Faculty of Linguistics and Literary Studies</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Corpus annotation</term>
<term>Feature structure</term>
<term>Linguistic resources</term>
<term>Markup language</term>
<term>TEI</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Ressources linguistiques</term>
<term>Annotation de corpus</term>
<term>TEI</term>
<term>Structure de traits</term>
<term>Langage de balisage</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.</div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
</country>
<region><li>Bade-Wurtemberg</li>
<li>Berlin</li>
<li>District de Karlsruhe</li>
</region>
<settlement><li>Berlin</li>
<li>Mannheim</li>
</settlement>
</list>
<tree><country name="Allemagne"><region name="Bade-Wurtemberg"><name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
</region>
<name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000117 | SxmlIndent | more

Or, assuming $EXPLOR_AREA is set to the root of the TeiVM2 exploration area:

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 000117 | SxmlIndent | more
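Once the record has been extracted, it can also be read with a general-purpose XML parser. The sketch below is a hedged illustration, not part of Dilib: the function names `parse_record` and `summarize` are hypothetical, `SAMPLE` is an abbreviated copy of the record shown above, and the namespace URIs are placeholders introduced only because the record uses the `wicri:` and `inist:` prefixes without declaring them.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record does not declare the
# wicri: and inist: prefixes, so they are bound here just to make it parse.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)
WRAPPER_CLOSE = "</wrapper>"

# Abbreviated copy of the record above, for demonstration only.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author><name sortKey="Witt, Andreas">Andreas Witt</name>
<affiliation wicri:level="3"><country>Allemagne</country></affiliation></author>
<author><name sortKey="Rehm, Georg">Georg Rehm</name>
<affiliation wicri:level="3"><country>Allemagne</country></affiliation></author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""


def parse_record(record_xml):
    """Parse one <record>, wrapping it so the undeclared prefixes resolve."""
    wrapped = WRAPPER_OPEN + record_xml + WRAPPER_CLOSE
    return ET.fromstring(wrapped).find("record")


def summarize(record):
    """Return (title, [author names]) taken from the record's titleStmt."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    authors = [author.findtext("name") for author in title_stmt.findall("author")]
    return title, authors


title, authors = summarize(parse_record(SAMPLE))
print(title)
print("; ".join(authors))
```

To process real output, the text piped from `HfdSelect` could be read from standard input and passed to `parse_record` in place of `SAMPLE`, assuming that output is a single `<record>` element with no XML declaration.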

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     Francis:11-0223735
   |texte=   SusTEInability of linguistic resources through feature structures
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024

TEI exploration server
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
