SusTEInability of linguistic resources through feature structures
Internal identifier: 000038 (PascalFrancis/Curation); previous: 000037; next: 000039
Authors: Andreas Witt [Germany]; Georg Rehm [Germany]; Erhard Hinrichs [Germany]; Timm Lehmberg [Germany]; Jens Stegmann [Germany]
Source: Literary and linguistic computing [ISSN 0268-1145]; 2009.
RBID : Francis:11-0223735
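French descriptors: Ressources linguistiques; Annotation de corpus; TEI; Structure de traits; Langage de balisage
English descriptors: Corpus annotation; Feature structure; Linguistic resources; Markup language; TEI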
Abstract
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
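The abstract's two central ideas (annotations stored as feature structures, and separate layer files tied together only by identical text) can be illustrated with a minimal sketch. It is not the authors' mapping procedure: the `layer` wrapper element, the helper names, and the sample sentence and feature values are all assumptions made for illustration; only `fs`, `f`, `symbol` and `seg` are taken from the TEI vocabulary.

```python
import xml.etree.ElementTree as ET


def make_fs(fs_type, features):
    """Build a TEI-style <fs> element with one <f>/<symbol> pair per feature."""
    fs = ET.Element("fs", {"type": fs_type})
    for name, value in features.items():
        f = ET.SubElement(fs, "f", {"name": name})
        ET.SubElement(f, "symbol", {"value": value})
    return fs


def make_layer(layer_name, text, fs_type, features):
    """Wrap the shared primary text and one annotation in a layer document.

    The <layer> root is a hypothetical wrapper, not a TEI element.
    """
    root = ET.Element("layer", {"name": layer_name})
    seg = ET.SubElement(root, "seg")
    seg.text = text
    seg.append(make_fs(fs_type, features))
    return root


def layer_text(root):
    """Return the concatenated character data of a layer, ignoring markup."""
    return "".join(root.itertext())


# Two annotation layers over the same (invented) primary text.
text = "dogs bark"
syntax = make_layer("syntax", text, "phrase", {"cat": "S"})
morphology = make_layer("morphology", text, "word", {"pos": "NN", "num": "pl"})

# The layers are linked implicitly: their textual content must be identical.
same_text = layer_text(syntax) == layer_text(morphology)

print(ET.tostring(syntax, encoding="unicode"))
print("layers aligned:", same_text)
```

Because the link between layers is implicit, a check like `layer_text` is the only thing that detects drift between files; any edit to the primary text has to be applied to every layer.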
pA |
A01 | 01 | 1 | | @0 0268-1145 |
A03 | | 1 | | @0 Lit. linguist. comput. |
A05 | | | | @2 24 |
A06 | | | | @2 3 |
A08 | 01 | 1 | ENG | @1 SusTEInability of linguistic resources through feature structures |
A09 | 01 | 1 | ENG | @1 Selected papers from Text Encoding Initiative |
A11 | 01 | 1 | | @1 WITT (Andreas) |
A11 | 02 | 1 | | @1 REHM (Georg) |
A11 | 03 | 1 | | @1 HINRICHS (Erhard) |
A11 | 04 | 1 | | @1 LEHMBERG (Timm) |
A11 | 05 | 1 | | @1 STEGMANN (Jens) |
A12 | 01 | 1 | | @1 RAHTZ (Sebastian) @9 ed. |
A12 | 02 | 1 | | @1 SCHREIBMAN (Susan) @9 ed. |
A14 | 01 | | | @1 Institut für Deutsche Sprache @2 Mannheim @3 DEU @Z 1 aut. |
A14 | 02 | | | @1 Vionto GmbH @2 Berlin @3 DEU @Z 2 aut. |
A14 | 03 | | | @1 Tübingen University, General and Computational Linguistics @3 DEU @Z 3 aut. |
A14 | 04 | | | @1 Hamburg University, SFB Multilingualism @3 DEU @Z 4 aut. |
A14 | 05 | | | @1 Bielefeld University, Faculty of Linguistics and Literary Studies @3 DEU @Z 5 aut. |
A15 | 01 | | | @1 Oxford University @3 GBR @Z 1 aut. |
A15 | 02 | | | @1 Digital Humanities Observatory, Royal Irish Academy @3 IRL @Z 2 aut. |
A20 | | | | @1 363-372 |
A21 | | | | @1 2009 |
A23 | 01 | | | @0 ENG |
A43 | 01 | | | @1 INIST @2 23967 @5 354000171896560100 |
A44 | | | | @0 0000 @1 © 2011 INIST-CNRS. All rights reserved. |
A45 | | | | @0 1 p.1/4 |
A47 | 01 | 1 | | @0 11-0223735 |
A60 | | | | @1 P @2 C |
A61 | | | | @0 A |
A64 | 01 | 1 | | @0 Literary and linguistic computing |
A66 | 01 | | | @0 GBR |
A68 | 01 | 1 | FRE | @1 "SusTEInability" des ressources linguistiques par le biais de structures de traits |
A69 | 01 | 1 | FRE | @1 Sélection d'articles de la TEI |
A99 | | | | @0 5 notes |
C01 | 01 | | ENG | @0 This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data. |
C02 | 01 | L | | @0 52478 @1 XV |
C02 | 02 | L | | @0 524 |
C03 | 01 | L | FRE | @0 Ressources linguistiques @2 563 @5 01 |
C03 | 01 | L | ENG | @0 Linguistic resources @2 563 @5 01 |
C03 | 02 | L | FRE | @0 Annotation de corpus @2 NI @5 02 |
C03 | 02 | L | ENG | @0 Corpus annotation @2 NI @5 02 |
C03 | 03 | L | FRE | @0 TEI @2 NI @5 05 |
C03 | 03 | L | ENG | @0 TEI @2 NI @5 05 |
C03 | 04 | L | FRE | @0 Structure de traits @2 NI @5 06 |
C03 | 04 | L | ENG | @0 Feature structure @2 NI @5 06 |
C03 | 05 | L | FRE | @0 Langage de balisage @2 NI @5 07 |
C03 | 05 | L | ENG | @0 Markup language @2 NI @5 07 |
C07 | 01 | L | FRE | @0 Linguistique de corpus @2 NI @5 03 |
C07 | 01 | L | ENG | @0 Corpus linguistics @2 NI @5 03 |
C07 | 02 | L | FRE | @0 Linguistique informatique @2 NI @5 04 |
C07 | 02 | L | ENG | @0 Computational linguistics @2 NI @5 04 |
N21 | | | | @1 150 |
pR |
A30 | 01 | 1 | ENG | @1 TEI Consortium @2 6 @3 Maryland USA @4 2007-11 |
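The field rows above follow a regular pattern: a field tag, two indicator cells, a language cell, and a content cell holding `@`-prefixed subfields. A minimal sketch of a reader for that layout follows. It is inferred only from the rows shown here, not from any INIST specification. The function name and the returned dictionary keys are hypothetical, and it assumes a subfield code does not repeat within one row.

```python
import re


def parse_field_line(line):
    """Parse one 'TAG | i1 | i2 | lang | @x value ...' row into a dict.

    Returns None for rows that do not have exactly five cells, such as
    section labels ('pA |') or separator residue.
    """
    cells = [cell.strip() for cell in line.strip().split("|")]
    # A trailing pipe produces one empty cell at the end; drop it.
    if cells and cells[-1] == "":
        cells.pop()
    if len(cells) != 5:
        return None
    tag, i1, i2, lang, content = cells
    # Each subfield is '@' + one code character, whitespace, then its value,
    # which runs until the next ' @x ' marker or the end of the cell.
    pairs = re.findall(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)", content)
    return {"tag": tag, "i1": i1, "i2": i2, "lang": lang, "subfields": dict(pairs)}


row = "A14 | 01 | | | @1 Institut für Deutsche Sprache @2 Mannheim @3 DEU @Z 1 aut. |"
record = parse_field_line(row)
print(record["tag"], record["subfields"])
```

Returning `None` for non-field rows lets a caller filter a whole listing with a simple comprehension instead of pre-cleaning it.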
Links to previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: To go to this record in the Curation step: 000007
Links to Exploration step
Francis:11-0223735
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author><name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1"><inist:fA14 i1="04"><s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1"><inist:fA14 i1="05"><s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">11-0223735</idno>
<date when="2009">2009</date>
<idno type="stanalyst">FRANCIS 11-0223735 INIST</idno>
<idno type="RBID">Francis:11-0223735</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000007</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000038</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author><name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1"><inist:fA14 i1="03"><s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1"><inist:fA14 i1="04"><s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1"><inist:fA14 i1="05"><s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Corpus annotation</term>
<term>Feature structure</term>
<term>Linguistic resources</term>
<term>Markup language</term>
<term>TEI</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Ressources linguistiques</term>
<term>Annotation de corpus</term>
<term>TEI</term>
<term>Structure de traits</term>
<term>Langage de balisage</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0268-1145</s0>
</fA01>
<fA03 i2="1"><s0>Lit. linguist. comput.</s0>
</fA03>
<fA08 i1="01" i2="1" l="ENG"><s1>SusTEInability of linguistic resources through feature structures</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Selected papers from Text Encoding Initiative</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>WITT (Andreas)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>REHM (Georg)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>HINRICHS (Erhard)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>LEHMBERG (Timm)</s1>
</fA11>
<fA11 i1="05" i2="1"><s1>STEGMANN (Jens)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>RAHTZ (Sebastian)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>SCHREIBMAN (Susan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA14 i1="03"><s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</fA14>
<fA14 i1="04"><s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="05"><s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Oxford University</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>Digital Humanities Observatory, Royal Irish Academy</s1>
<s3>IRL</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA20><s1>363-372</s1>
</fA20>
<fA21><s1>2009</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>23967</s2>
<s5>354000171896560100</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2011 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.1/4</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>11-0223735</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA64 i1="01" i2="1"><s0>Literary and linguistic computing</s0>
</fA64>
<fA66 i1="01"><s0>GBR</s0>
</fA66>
<fA68 i1="01" i2="1" l="FRE"><s1>"SusTEInability" des ressources linguistiques par le biais de structures de traits</s1>
</fA68>
<fA69 i1="01" i2="1" l="FRE"><s1>Sélection d'articles de la TEI</s1>
</fA69>
<fA99><s0>5 notes</s0>
</fA99>
<fC01 i1="01" l="ENG"><s0>This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.</s0>
</fC01>
<fC02 i1="01" i2="L"><s0>52478</s0>
<s1>XV</s1>
</fC02>
<fC02 i1="02" i2="L"><s0>524</s0>
</fC02>
<fC03 i1="01" i2="L" l="FRE"><s0>Ressources linguistiques</s0>
<s2>563</s2>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="L" l="ENG"><s0>Linguistic resources</s0>
<s2>563</s2>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="L" l="FRE"><s0>Annotation de corpus</s0>
<s2>NI</s2>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="L" l="ENG"><s0>Corpus annotation</s0>
<s2>NI</s2>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="L" l="FRE"><s0>TEI</s0>
<s2>NI</s2>
<s5>05</s5>
</fC03>
<fC03 i1="03" i2="L" l="ENG"><s0>TEI</s0>
<s2>NI</s2>
<s5>05</s5>
</fC03>
<fC03 i1="04" i2="L" l="FRE"><s0>Structure de traits</s0>
<s2>NI</s2>
<s5>06</s5>
</fC03>
<fC03 i1="04" i2="L" l="ENG"><s0>Feature structure</s0>
<s2>NI</s2>
<s5>06</s5>
</fC03>
<fC03 i1="05" i2="L" l="FRE"><s0>Langage de balisage</s0>
<s2>NI</s2>
<s5>07</s5>
</fC03>
<fC03 i1="05" i2="L" l="ENG"><s0>Markup language</s0>
<s2>NI</s2>
<s5>07</s5>
</fC03>
<fC07 i1="01" i2="L" l="FRE"><s0>Linguistique de corpus</s0>
<s2>NI</s2>
<s5>03</s5>
</fC07>
<fC07 i1="01" i2="L" l="ENG"><s0>Corpus linguistics</s0>
<s2>NI</s2>
<s5>03</s5>
</fC07>
<fC07 i1="02" i2="L" l="FRE"><s0>Linguistique informatique</s0>
<s2>NI</s2>
<s5>04</s5>
</fC07>
<fC07 i1="02" i2="L" l="ENG"><s0>Computational linguistics</s0>
<s2>NI</s2>
<s5>04</s5>
</fC07>
<fN21><s1>150</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>TEI Consortium</s1>
<s2>6</s2>
<s3>Maryland USA</s3>
<s4>2007-11</s4>
</fA30>
</pR>
</standard>
</inist>
</record>
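The `inist` part of the record can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed fragment copied from the record above; the `summarize` helper and its output keys are hypothetical. Note that the `TEI` part uses `wicri:` and `inist:` prefixes whose namespace declarations are not shown on this page, so a namespace-aware parser would probably reject the full record as printed. That is why only the prefix-free fragment is used here.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the <inist> block shown above (prefix-free elements only).
SAMPLE = """<record><inist><standard h6="B"><pA>
<fA11 i1="01" i2="1"><s1>WITT (Andreas)</s1></fA11>
<fA11 i1="02" i2="1"><s1>REHM (Georg)</s1></fA11>
<fA20><s1>363-372</s1></fA20>
<fA21><s1>2009</s1></fA21>
</pA></standard></inist></record>"""


def summarize(xml_text):
    """Collect authors (fA11), page range (fA20) and year (fA21) from a record."""
    root = ET.fromstring(xml_text)
    authors = [field.findtext("s1") for field in root.iter("fA11")]
    return {
        "authors": authors,
        "pages": root.findtext(".//fA20/s1"),
        "year": root.findtext(".//fA21/s1"),
    }


summary = summarize(SAMPLE)
print(summary)
```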
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/PascalFrancis/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000038 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Curation/biblio.hfd -nk 000038 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien
|wiki= Wicri/Ticri
|area= TeiVM2
|flux= PascalFrancis
|étape= Curation
|type= RBID
|clé= Francis:11-0223735
|texte= SusTEInability of linguistic resources through feature structures
}}
This area was generated with Dilib version V0.6.31. Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024