
Corpus Design Criteria

Internal identifier: 000581 (Main/Exploration); previous: 000580; next: 000582

Authors: Sue Atkins; Jeremy Clear; Nicholas Ostler

Source:

Literary and Linguistic Computing [ISSN 0268-1145]; 1992.

RBID : ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C

Abstract

‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and the constituents of it, texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?

URL:

https://api.istex.fr/document/D0CF95D3285FF52254C40F6913E14ADA65C37E2C/fulltext/pdf

DOI: 10.1093/llc/7.1.1

Affiliations:

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 000336
to stream Istex, to step Curation: 000336
to stream Istex, to step Checkpoint: 000477
to stream Main, to step Merge: 000620
to stream Main, to step Curation: 000581

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Corpus Design Criteria</title>
<author><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
</author>
<author><name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
</author>
<author><name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<date when="1992" year="1992">1992</date>
<idno type="doi">10.1093/llc/7.1.1</idno>
<idno type="url">https://api.istex.fr/document/D0CF95D3285FF52254C40F6913E14ADA65C37E2C/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000336</idno>
<idno type="wicri:Area/Istex/Curation">000336</idno>
<idno type="wicri:Area/Istex/Checkpoint">000477</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000477</idno>
<idno type="wicri:doubleKey">0268-1145:1992:Atkins S:corpus:design:criteria</idno>
<idno type="wicri:Area/Main/Merge">000620</idno>
<idno type="wicri:Area/Main/Curation">000581</idno>
<idno type="wicri:Area/Main/Exploration">000581</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Corpus Design Criteria</title>
<author><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<affiliation><wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<affiliation><wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
<affiliation><wicri:noCountry code="no comma">Linguacubun UK</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint><publisher>Oxford University Press</publisher>
<date type="published" when="1992">1992</date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="1">1</biblScope>
<biblScope unit="page" to="16">16</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<idno type="DOI">10.1093/llc/7.1.1</idno>
<idno type="ArticleID">7.1.1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract">‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and the constituents of it, texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
</noCountry>
</tree>
</affiliations>
</record>
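
As an illustration of reading such a record outside Dilib, the following is a minimal Python sketch that pulls the title, authors and DOI out of the XML. It is not part of the Wicri tooling. The record as displayed uses the `wicri:` prefix without declaring it, which a namespace-aware parser would normally reject, so the sketch binds the prefix to a placeholder URI first; that URI (`urn:example:wicri`), the function name `parse_record` and the trimmed `SAMPLE` excerpt are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the record uses the wicri: prefix without
# declaring it, so a placeholder binding is injected to satisfy the parser.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text):
    """Parse a Wicri <record> string and return a few bibliographic fields."""
    # Declare the wicri: prefix on the root element before parsing.
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1
    )
    root = ET.fromstring(patched)
    # The analytic block holds the article-level title and authors.
    analytic = root.find(".//biblStruct/analytic")
    title = analytic.findtext("title")
    authors = [name.text for name in analytic.findall("author/name")]
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Trimmed excerpt of the record shown above, kept only for illustration.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<publicationStmt><idno type="doi">10.1093/llc/7.1.1</idno></publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Corpus Design Criteria</title>
<author><name first="Sue" last="Atkins">Sue Atkins</name></author>
<author><name first="Jeremy" last="Clear">Jeremy Clear</name></author>
<author><name first="Nicholas" last="Ostler">Nicholas Ostler</name></author>
</analytic></biblStruct></sourceDesc>
</fileDesc></teiHeader></TEI></record>"""

info = parse_record(SAMPLE)
print(info["title"])
print("; ".join(info["authors"]))
print(info["doi"])
```

The same function should also accept the full record, since the extra elements (affiliations, series information, abstract) are simply ignored by the three lookups.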

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000581 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000581 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C
   |texte=   Corpus Design Criteria
}}
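
When links to many records are needed, the template call above could be generated rather than typed by hand. The following Python sketch is one way to do that, under the assumption that only the seven parameters shown above are required; the helper name `explor_link` is hypothetical and is not part of Dilib or Wicri.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Render an {{Explor lien}} template call with the parameters shown above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for key, value in fields:
        # Pad "   |key=" to 13 characters so the values line up in one column.
        lines.append(f"   |{key}=".ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Wicri/Ticri",
    area="TeiVM2",
    flux="Main",
    etape="Exploration",
    type_="RBID",
    cle="ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C",
    texte="Corpus Design Criteria",
)
print(link)
```

The French parameter names (`flux`, `étape`, `clé`, `texte`) are kept exactly as the template spells them; only the Python argument names are ASCII.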

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024

	TEI exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
