Corpus Design Criteria
Internal identifier: 000581 (Main/Exploration); previous: 000580; next: 000582
Authors: Sue Atkins; Jeremy Clear; Nicholas Ostler
Source:
- Literary and Linguistic Computing [ 0268-1145 ] ; 1992.
Abstract
‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and its constituents, the texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?
Url:
DOI: 10.1093/llc/7.1.1
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000336
- to stream Istex, to step Curation: 000336
- to stream Istex, to step Checkpoint: 000477
- to stream Main, to step Merge: 000620
- to stream Main, to step Curation: 000581
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Corpus Design Criteria</title>
<author><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
</author>
<author><name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
</author>
<author><name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<date when="1992" year="1992">1992</date>
<idno type="doi">10.1093/llc/7.1.1</idno>
<idno type="url">https://api.istex.fr/document/D0CF95D3285FF52254C40F6913E14ADA65C37E2C/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000336</idno>
<idno type="wicri:Area/Istex/Curation">000336</idno>
<idno type="wicri:Area/Istex/Checkpoint">000477</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000477</idno>
<idno type="wicri:doubleKey">0268-1145:1992:Atkins S:corpus:design:criteria</idno>
<idno type="wicri:Area/Main/Merge">000620</idno>
<idno type="wicri:Area/Main/Curation">000581</idno>
<idno type="wicri:Area/Main/Exploration">000581</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Corpus Design Criteria</title>
<author><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<affiliation><wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<affiliation><wicri:noCountry code="no comma">Oxford University Press UK</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
<affiliation><wicri:noCountry code="no comma">Linguacubun UK</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint><publisher>Oxford University Press</publisher>
<date type="published" when="1992">1992</date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="1">1</biblScope>
<biblScope unit="page" to="16">16</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">D0CF95D3285FF52254C40F6913E14ADA65C37E2C</idno>
<idno type="DOI">10.1093/llc/7.1.1</idno>
<idno type="ArticleID">7.1.1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract">‘Corpus Design Criteria’ begins (Section 1) by defining the object to be created, a corpus, and its constituents, the texts themselves, noting briefly the pragmatic constraints on the sort of documents which will actually be available, spoken as well as written. It then (Section 2) reviews the practical stages in the process of establishing a corpus, from selection of sources through to mark-up, assigning annotations to the texts assembled. This is followed by a consideration of copyright problems (Section 3). Section 4 points out the major difficulties in defining the population of texts that the corpus will sample, contrasting the sets of texts received versus those produced by a target group, and internal (linguistic) versus external (social) means of defining such groups. The next three sections look at the sets of markers which can be useful at different levels. Section 5 begins at the highest level, considering the different types of corpus there may be. Section 6 is intermediate, considering how to distinguish the different types of text occurring within a corpus. Then, for the intra-text level, Section 7 reviews considerations governing mark-up, distinguishing those markers useful for written and spoken texts. Of these three sections, Section 6 is the most fully explicit, listing twenty-nine significant attributes assignable to a text. Sections 8 and 9 turn away from the corpus design itself, to focus on its social context and function, both of the corpus design process, and of the corpus when implemented: to what extent are there now accepted standards relevant to the criteria reviewed in preceding sections? And what are the major classes of potential users and uses for corpora, both now and in the future?</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Atkins, Sue" sort="Atkins, Sue" uniqKey="Atkins S" first="Sue" last="Atkins">Sue Atkins</name>
<name sortKey="Clear, Jeremy" sort="Clear, Jeremy" uniqKey="Clear J" first="Jeremy" last="Clear">Jeremy Clear</name>
<name sortKey="Ostler, Nicholas" sort="Ostler, Nicholas" uniqKey="Ostler N" first="Nicholas" last="Ostler">Nicholas Ostler</name>
</noCountry>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000581 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000581 | SxmlIndent | more
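Outside Dilib, the record printed by the commands above can also be inspected with a general-purpose XML parser. The following is a minimal sketch in Python, not part of the Dilib toolchain. It assumes a trimmed copy of the record (the `RECORD` string and the `summarize` helper are hypothetical names introduced here for illustration). The record as displayed on this page uses the `wicri:` prefix with no visible namespace declaration, so a strict parser would likely reject the full text unless a declaration is supplied; the trimmed fragment below simply omits those parts.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record shown on this page (assumption: only the
# fields read below are kept, and all wicri:-prefixed parts are omitted).
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt><title>Corpus Design Criteria</title>
<author><name sortKey="Atkins, Sue" first="Sue" last="Atkins">Sue Atkins</name></author>
<author><name sortKey="Clear, Jeremy" first="Jeremy" last="Clear">Jeremy Clear</name></author>
<author><name sortKey="Ostler, Nicholas" first="Nicholas" last="Ostler">Nicholas Ostler</name></author>
</titleStmt>
<publicationStmt><date when="1992" year="1992">1992</date>
<idno type="doi">10.1093/llc/7.1.1</idno>
</publicationStmt></fileDesc></teiHeader></TEI></record>"""


def summarize(record_xml: str) -> dict:
    """Return title, authors, year, and DOI from a record fragment."""
    root = ET.fromstring(record_xml)
    stmt = root.find(".//titleStmt")        # title and author names
    pub = root.find(".//publicationStmt")   # date and identifiers
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": pub.find("date").get("when"),
        "doi": pub.findtext("idno[@type='doi']"),
    }


summary = summarize(RECORD)
print(summary)
```

The element paths follow the structure of the record above; a record with missing `titleStmt` or `publicationStmt` elements would need extra checks that this sketch leaves out.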
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Ticri |area= TeiVM2 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:D0CF95D3285FF52254C40F6913E14ADA65C37E2C |texte= Corpus Design Criteria }}
This area was generated with Dilib version V0.6.31.