TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

SusTEInability of linguistic resources through feature structures

Internal identifier: 000038 (PascalFrancis/Curation); previous: 000037; next: 000039

Authors: Andreas Witt [Germany]; Georg Rehm [Germany]; Erhard Hinrichs [Germany]; Timm Lehmberg [Germany]; Jens Stegmann [Germany]

Source:

RBID: Francis:11-0223735

French descriptors

English descriptors

Abstract

This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
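The implicit linking described in the abstract can be illustrated with a minimal Python sketch. It assumes each annotation layer is a separate XML document and checks that all layers carry exactly the same character data once the markup is stripped. The layer contents, the element names (s, np, vp, w, m) and the function names are hypothetical and are not taken from the article.

```python
import xml.etree.ElementTree as ET


def text_content(xml_string: str) -> str:
    """Return the character data of an XML document with all markup removed."""
    root = ET.fromstring(xml_string)
    return "".join(root.itertext())


def layers_align(*layers: str) -> bool:
    """Return True if every annotation layer carries exactly the same text."""
    return len({text_content(layer) for layer in layers}) == 1


# Hypothetical annotation layers over the same primary text (illustration only).
SYNTAX_LAYER = "<s><np>The dog</np> <vp>barks</vp></s>"
MORPHOLOGY_LAYER = "<text><w>The</w> <w>dog</w> <w>bark<m>s</m></w></text>"

if __name__ == "__main__":
    print(layers_align(SYNTAX_LAYER, MORPHOLOGY_LAYER))
```

Under this assumption the shared text is the only link between the files, so a character offset computed in one layer identifies the same stretch of text in every other layer.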
pA  
A01 01  1    @0 0268-1145
A03   1    @0 Lit. linguist. comput.
A05       @2 24
A06       @2 3
A08 01  1  ENG  @1 SusTEInability of linguistic resources through feature structures
A09 01  1  ENG  @1 Selected papers from Text Encoding Initiative
A11 01  1    @1 WITT (Andreas)
A11 02  1    @1 REHM (Georg)
A11 03  1    @1 HINRICHS (Erhard)
A11 04  1    @1 LEHMBERG (Timm)
A11 05  1    @1 STEGMANN (Jens)
A12 01  1    @1 RAHTZ (Sebastian) @9 ed.
A12 02  1    @1 SCHREIBMAN (Susan) @9 ed.
A14 01      @1 Institut für Deutsche Sprache @2 Mannheim @3 DEU @Z 1 aut.
A14 02      @1 Vionto GmbH @2 Berlin @3 DEU @Z 2 aut.
A14 03      @1 Tübingen University, General and Computational Linguistics @3 DEU @Z 3 aut.
A14 04      @1 Hamburg University, SFB Multilingualism @3 DEU @Z 4 aut.
A14 05      @1 Bielefeld University, Faculty of Linguistics and Literary Studies @3 DEU @Z 5 aut.
A15 01      @1 Oxford University @3 GBR @Z 1 aut.
A15 02      @1 Digital Humanities Observatory, Royal Irish Academy @3 IRL @Z 2 aut.
A20       @1 363-372
A21       @1 2009
A23 01      @0 ENG
A43 01      @1 INIST @2 23967 @5 354000171896560100
A44       @0 0000 @1 © 2011 INIST-CNRS. All rights reserved.
A45       @0 1 p.1/4
A47 01  1    @0 11-0223735
A60       @1 P @2 C
A61       @0 A
A64 01  1    @0 Literary and linguistic computing
A66 01      @0 GBR
A68 01  1  FRE  @1 "SusTEInability" des ressources linguistiques par le biais de structures de traits
A69 01  1  FRE  @1 Sélection d'articles de la TEI
A99       @0 5 notes
C01 01    ENG  @0 This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.
C02 01  L    @0 52478 @1 XV
C02 02  L    @0 524
C03 01  L  FRE  @0 Ressources linguistiques @2 563 @5 01
C03 01  L  ENG  @0 Linguistic resources @2 563 @5 01
C03 02  L  FRE  @0 Annotation de corpus @2 NI @5 02
C03 02  L  ENG  @0 Corpus annotation @2 NI @5 02
C03 03  L  FRE  @0 TEI @2 NI @5 05
C03 03  L  ENG  @0 TEI @2 NI @5 05
C03 04  L  FRE  @0 Structure de traits @2 NI @5 06
C03 04  L  ENG  @0 Feature structure @2 NI @5 06
C03 05  L  FRE  @0 Langage de balisage @2 NI @5 07
C03 05  L  ENG  @0 Markup language @2 NI @5 07
C07 01  L  FRE  @0 Linguistique de corpus @2 NI @5 03
C07 01  L  ENG  @0 Corpus linguistics @2 NI @5 03
C07 02  L  FRE  @0 Linguistique informatique @2 NI @5 04
C07 02  L  ENG  @0 Computational linguistics @2 NI @5 04
N21       @1 150
pR  
A30 01  1  ENG  @1 TEI Consortium @2 6 @3 Maryland USA @4 2007-11
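
The flat record above appears to follow a regular layout: a three-character field tag, optional indicators (occurrence number, flag, language), then subfields introduced by "@" and a one-character code. The Python sketch below reads one such line. The layout is inferred only from the lines shown here, not from an INIST specification, and the function and pattern names are hypothetical.

```python
import re

# Field tag such as A11 or C03, followed by the remainder of the line.
FIELD_RE = re.compile(r"^(?P<tag>[A-Z]\d{2})\s+(?P<rest>.*)$")
# A subfield is "@" plus a one-character code, then text up to the next subfield.
SUBFIELD_RE = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")


def parse_line(line: str):
    """Split a flat record line into (tag, indicators, subfields), or None."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        return None  # section markers such as "pA" and "pR" carry no field
    rest = match.group("rest")
    at = rest.find("@")
    head = rest if at < 0 else rest[:at]
    body = "" if at < 0 else rest[at:]
    return match.group("tag"), head.split(), dict(SUBFIELD_RE.findall(body))


if __name__ == "__main__":
    print(parse_line("A14 01      @1 Institut für Deutsche Sprache @2 Mannheim @3 DEU @Z 1 aut."))
```

Judging from the values in this record, A11 holds authors, A14 author affiliations and C03 descriptors; that reading is an assumption based on the content rather than on documentation.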

Links to Exploration step

Francis:11-0223735

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author>
<name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1">
<inist:fA14 i1="05">
<s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0223735</idno>
<date when="2009">2009</date>
<idno type="stanalyst">FRANCIS 11-0223735 INIST</idno>
<idno type="RBID">Francis:11-0223735</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000007</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000038</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">SusTEInability of linguistic resources through feature structures</title>
<author>
<name sortKey="Witt, Andreas" sort="Witt, Andreas" uniqKey="Witt A" first="Andreas" last="Witt">Andreas Witt</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Rehm, Georg" sort="Rehm, Georg" uniqKey="Rehm G" first="Georg" last="Rehm">Georg Rehm</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Hinrichs, Erhard" sort="Hinrichs, Erhard" uniqKey="Hinrichs E" first="Erhard" last="Hinrichs">Erhard Hinrichs</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Lehmberg, Timm" sort="Lehmberg, Timm" uniqKey="Lehmberg T" first="Timm" last="Lehmberg">Timm Lehmberg</name>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Stegmann, Jens" sort="Stegmann, Jens" uniqKey="Stegmann J" first="Jens" last="Stegmann">Jens Stegmann</name>
<affiliation wicri:level="1">
<inist:fA14 i1="05">
<s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Literary and linguistic computing</title>
<title level="j" type="abbreviated">Lit. linguist. comput.</title>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Corpus annotation</term>
<term>Feature structure</term>
<term>Linguistic resources</term>
<term>Markup language</term>
<term>TEI</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Ressources linguistiques</term>
<term>Annotation de corpus</term>
<term>TEI</term>
<term>Structure de traits</term>
<term>Langage de balisage</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0268-1145</s0>
</fA01>
<fA03 i2="1">
<s0>Lit. linguist. comput.</s0>
</fA03>
<fA05>
<s2>24</s2>
</fA05>
<fA06>
<s2>3</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>SusTEInability of linguistic resources through feature structures</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Selected papers from Text Encoding Initiative</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>WITT (Andreas)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>REHM (Georg)</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>HINRICHS (Erhard)</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>LEHMBERG (Timm)</s1>
</fA11>
<fA11 i1="05" i2="1">
<s1>STEGMANN (Jens)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>RAHTZ (Sebastian)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>SCHREIBMAN (Susan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Institut für Deutsche Sprache</s1>
<s2>Mannheim</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Vionto GmbH</s1>
<s2>Berlin</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA14 i1="03">
<s1>Tübingen University, General and Computational Linguistics</s1>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</fA14>
<fA14 i1="04">
<s1>Hamburg University, SFB Multilingualism</s1>
<s3>DEU</s3>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="05">
<s1>Bielefeld University, Faculty of Linguistics and Literary Studies</s1>
<s3>DEU</s3>
<sZ>5 aut.</sZ>
</fA14>
<fA15 i1="01">
<s1>Oxford University</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02">
<s1>Digital Humanities Observatory, Royal Irish Academy</s1>
<s3>IRL</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA20>
<s1>363-372</s1>
</fA20>
<fA21>
<s1>2009</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>23967</s2>
<s5>354000171896560100</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2011 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>1 p.1/4</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>11-0223735</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Literary and linguistic computing</s0>
</fA64>
<fA66 i1="01">
<s0>GBR</s0>
</fA66>
<fA68 i1="01" i2="1" l="FRE">
<s1>"SusTEInability" des ressources linguistiques par le biais de structures de traits</s1>
</fA68>
<fA69 i1="01" i2="1" l="FRE">
<s1>Sélection d'articles de la TEI</s1>
</fA69>
<fA99>
<s0>5 notes</s0>
</fA99>
<fC01 i1="01" l="ENG">
<s0>This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora is annotated using markup languages that are based on the Annotation Graph framework, the upcoming Linguistic Annotation Format ISO standard, or according to tag sets defined by or based upon the TEI guidelines. A unified representation comprises the separation of conceptually different annotation layers contained in the original corpus data (e.g. syntax, phonology, and semantics) into multiple XML files. These annotation layers are linked to each other implicitly by the identical textual content of all files. A suitable data structure for the representation of these annotations is a multi-rooted tree that again can be represented by the TEI and ISO tag set for feature structures. The mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set for feature structures as a storage and exchange format for linguistically annotated data.</s0>
</fC01>
<fC02 i1="01" i2="L">
<s0>52478</s0>
<s1>XV</s1>
</fC02>
<fC02 i1="02" i2="L">
<s0>524</s0>
</fC02>
<fC03 i1="01" i2="L" l="FRE">
<s0>Ressources linguistiques</s0>
<s2>563</s2>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="L" l="ENG">
<s0>Linguistic resources</s0>
<s2>563</s2>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="L" l="FRE">
<s0>Annotation de corpus</s0>
<s2>NI</s2>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="L" l="ENG">
<s0>Corpus annotation</s0>
<s2>NI</s2>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="L" l="FRE">
<s0>TEI</s0>
<s2>NI</s2>
<s5>05</s5>
</fC03>
<fC03 i1="03" i2="L" l="ENG">
<s0>TEI</s0>
<s2>NI</s2>
<s5>05</s5>
</fC03>
<fC03 i1="04" i2="L" l="FRE">
<s0>Structure de traits</s0>
<s2>NI</s2>
<s5>06</s5>
</fC03>
<fC03 i1="04" i2="L" l="ENG">
<s0>Feature structure</s0>
<s2>NI</s2>
<s5>06</s5>
</fC03>
<fC03 i1="05" i2="L" l="FRE">
<s0>Langage de balisage</s0>
<s2>NI</s2>
<s5>07</s5>
</fC03>
<fC03 i1="05" i2="L" l="ENG">
<s0>Markup language</s0>
<s2>NI</s2>
<s5>07</s5>
</fC03>
<fC07 i1="01" i2="L" l="FRE">
<s0>Linguistique de corpus</s0>
<s2>NI</s2>
<s5>03</s5>
</fC07>
<fC07 i1="01" i2="L" l="ENG">
<s0>Corpus linguistics</s0>
<s2>NI</s2>
<s5>03</s5>
</fC07>
<fC07 i1="02" i2="L" l="FRE">
<s0>Linguistique informatique</s0>
<s2>NI</s2>
<s5>04</s5>
</fC07>
<fC07 i1="02" i2="L" l="ENG">
<s0>Computational linguistics</s0>
<s2>NI</s2>
<s5>04</s5>
</fC07>
<fN21>
<s1>150</s1>
</fN21>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>TEI Consortium</s1>
<s2>6</s2>
<s3>Maryland USA</s3>
<s4>2007-11</s4>
</fA30>
</pR>
</standard>
</inist>
</record>
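
The record above uses the prefixes inist: and wicri: without declaring them, so a namespace-aware parser will reject it as it stands. The Python sketch below works around this by injecting placeholder namespace declarations before parsing and then reads the authors from titleStmt. The placeholder URIs, the function names and the trimmed sample are assumptions made for illustration; the real namespace URIs are not given on this page.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumption): the page does not state the real ones.
PLACEHOLDER_NS = 'xmlns:inist="urn:example:inist" xmlns:wicri="urn:example:wicri"'


def parse_record(xml_text: str) -> ET.Element:
    """Parse a record after declaring the otherwise unbound prefixes."""
    patched = xml_text.replace("<record>", f"<record {PLACEHOLDER_NS}>", 1)
    return ET.fromstring(patched)


def authors(root: ET.Element):
    """Return (last, first, country) for each author listed in titleStmt."""
    result = []
    for author in root.findall("./TEI/teiHeader/fileDesc/titleStmt/author"):
        name = author.find("name")
        country = author.findtext("affiliation/country")
        result.append((name.get("last"), name.get("first"), country))
    return result


# Trimmed sample in the shape of the record above (one author only).
SAMPLE = (
    "<record><TEI><teiHeader><fileDesc><titleStmt>"
    '<author><name first="Andreas" last="Witt">Andreas Witt</name>'
    '<affiliation wicri:level="1">'
    '<inist:fA14 i1="01"><s1>Institut für Deutsche Sprache</s1><s3>DEU</s3></inist:fA14>'
    "<country>Allemagne</country>"
    "</affiliation></author>"
    "</titleStmt></fileDesc></teiHeader></TEI></record>"
)

if __name__ == "__main__":
    print(authors(parse_record(SAMPLE)))
```

Patching the root element keeps the rest of the record untouched; a non-validating or lenient parser would be an alternative if the source text must not be modified.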

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/PascalFrancis/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000038 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Curation/biblio.hfd -nk 000038 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    PascalFrancis
   |étape=   Curation
   |type=    RBID
   |clé=     Francis:11-0223735
   |texte=   SusTEInability of linguistic resources through feature structures
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024