TEI exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Corpus Design Criteria

Internal identifier: 000336 (Istex/Curation); previous: 000335; next: 000337

Authors: Sue Atkins; Jeremy Clear; Nicholas Ostler

Source: Literary and Linguistic Computing, vol. 7, no. 1 (1992), pp. 1-16

RBID : ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C

Abstract

‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and its constituents, the texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?

URL: https://api.istex.fr/document/D0CF95D3285FF52254C40F6913E14ADA65C37E2C/fulltext/pdf
DOI: 10.1093/llc/7.1.1

Links to Exploration step

ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C

Curation

No country items

Sue Atkins
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
Jeremy Clear
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
Nicholas Ostler
<affiliation>
<mods:affiliation>Linguacubun UK</mods:affiliation>
<wicri:noCountry code="no comma">Linguacubun UK</wicri:noCountry>
</affiliation>

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Corpus Design Criteria</title>
<author>
<name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
<affiliation>
<mods:affiliation>Linguacubun UK</mods:affiliation>
<wicri:noCountry code="no comma">Linguacubun UK</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<date when="1992" year="1992">1992</date>
<idno type="doi">10.1093/llc/7.1.1</idno>
<idno type="url">https://api.istex.fr/document/D0CF95D3285FF52254C40F6913E14ADA65C37E2C/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000336</idno>
<idno type="wicri:Area/Istex/Curation">000336</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Corpus Design Criteria</title>
<author>
<name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
<affiliation>
<mods:affiliation>Linguacubun UK</mods:affiliation>
<wicri:noCountry code="no comma">Linguacubun UK</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="1992">1992</date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="1">1</biblScope>
<biblScope unit="page" to="16">16</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<idno type="DOI">10.1093/llc/7.1.1</idno>
<idno type="ArticleID">7.1.1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and its constituents, the texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?</div>
</front>
</TEI>
</record>
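
For readers who want to pull the bibliographic fields out of the record above without Dilib, here is a minimal Python sketch. It assumes the record is available as a string. The record uses the prefixes `wicri:` and `mods:` without declaring them, so a strict XML parser rejects it as shown; the sketch works around this by declaring placeholder namespace URIs on the root element. Those URIs, the function name `parse_record` and the abridged `SAMPLE` are illustrative assumptions, not part of Dilib or of the record.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URIs. The record uses the prefixes `wicri:` and
# `mods:` without declaring them, so a strict parser rejects it as-is.
NS = {"wicri": "urn:example:wicri", "mods": "urn:example:mods"}


def parse_record(xml_text):
    """Return title, authors, DOI and year from one <record> element."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    # Declare the prefixes on the root element (first occurrence only).
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    root = ET.fromstring(patched)
    file_desc = root.find("TEI/teiHeader/fileDesc")
    authors = [
        {
            "name": author.findtext("name"),
            "affiliation": author.findtext(
                "affiliation/mods:affiliation", namespaces=NS
            ),
        }
        for author in file_desc.findall("titleStmt/author")
    ]
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "authors": authors,
        "doi": file_desc.findtext("publicationStmt/idno[@type='doi']"),
        "year": file_desc.findtext("publicationStmt/date"),
    }


# Abridged copy of the record, keeping one author for brevity.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Corpus Design Criteria</title>
<author>
<name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<affiliation>
<mods:affiliation>Oxford University Press UK</mods:affiliation>
<wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<date when="1992" year="1992">1992</date>
<idno type="doi">10.1093/llc/7.1.1</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

info = parse_record(SAMPLE)
print(info["title"], info["year"], info["doi"])
```

The same function should work on the full record, since it only relies on the `titleStmt` and `publicationStmt` elements that appear there.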

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000336 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000336 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C
   |texte=   Corpus Design Criteria
}}
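
When many records need such a link, the template call can be generated rather than typed by hand. The following Python sketch renders the same `{{Explor lien}}` call from its seven values, reproducing the column alignment used above. The function name `explor_link` and its argument names are hypothetical; only the template name and its parameter names come from this page.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Render an {{Explor lien}} call with values aligned in one column."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        label = f"|{name}="
        # Pad each label to 10 characters so the values line up.
        lines.append(f"   {label:<10}{value}")
    lines.append("}}")
    return "\n".join(lines)


print(
    explor_link(
        wiki="Wicri/Ticri",
        area="TeiVM2",
        flux="Istex",
        step="Curation",
        key_type="RBID",
        key="ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C",
        text="Corpus Design Criteria",
    )
)
```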

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024