Exploration server on TEI

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Czech National Corpus: Principles, Design, and Results

Internal identifier: 000161 (Istex/Corpus); previous: 000160; next: 000162

Author: Karel Kucera

RBID : ISTEX:1D86EA4932E758629D37D52E7F01E5307CC404E2

Abstract

This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.

DOI: 10.1093/llc/17.2.245

Links to Exploration step

ISTEX:1D86EA4932E758629D37D52E7F01E5307CC404E2

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Czech National Corpus: Principles, Design, and Results</title>
<author wicri:is="90%">
<name sortKey="Kucera, Karel" sort="Kucera, Karel" uniqKey="Kucera K" first="Karel" last="Kucera">Karel Kucera</name>
<affiliation>
<mods:affiliation>Charles University, Praha, Czech Republic</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:1D86EA4932E758629D37D52E7F01E5307CC404E2</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1093/llc/17.2.245</idno>
<idno type="url">https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000161</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">The Czech National Corpus: Principles, Design, and Results</title>
<author wicri:is="90%">
<name sortKey="Kucera, Karel" sort="Kucera, Karel" uniqKey="Kucera K" first="Karel" last="Kucera">Karel Kucera</name>
<affiliation>
<mods:affiliation>Charles University, Praha, Czech Republic</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2002-06">2002-06</date>
<biblScope unit="volume">17</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="245">245</biblScope>
<biblScope unit="page" to="257">257</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">1D86EA4932E758629D37D52E7F01E5307CC404E2</idno>
<idno type="DOI">10.1093/llc/17.2.245</idno>
<idno type="local">170245</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.</div>
</front>
</TEI>
<istex>
<corpusName>oup</corpusName>
<author>
<json:item>
<name>Karel Kucera</name>
<affiliations>
<json:string>Charles University, Praha, Czech Republic</json:string>
</affiliations>
</json:item>
</author>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.</abstract>
<qualityIndicators>
<score>6.56</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>538.874 x 697.149 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>830</abstractCharCount>
<pdfWordCount>5591</pdfWordCount>
<pdfCharCount>32977</pdfCharCount>
<pdfPageCount>14</pdfPageCount>
<abstractWordCount>130</abstractWordCount>
</qualityIndicators>
<title>The Czech National Corpus: Principles, Design, and Results</title>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>17</volume>
<publisherId>
<json:string>litlin</json:string>
</publisherId>
<pages>
<last>257</last>
<first>245</first>
</pages>
<issn>
<json:string>0268-1145</json:string>
</issn>
<issue>2</issue>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1477-4615</json:string>
</eissn>
<title>Literary and Linguistic Computing</title>
</host>
<categories>
<wos>
<json:string>LINGUISTICS</json:string>
<json:string>LITERATURE</json:string>
</wos>
</categories>
<publicationDate>2002</publicationDate>
<copyrightDate>2002</copyrightDate>
<doi>
<json:string>10.1093/llc/17.2.245</json:string>
</doi>
<id>1D86EA4932E758629D37D52E7F01E5307CC404E2</id>
<score>0.24639532</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">The Czech National Corpus: Principles, Design, and Results</title>
<respStmt xml:id="ISTEX-API" resp="Bibliographic references retrieved via GROBID" name="ISTEX-API (INIST-CNRS)"></respStmt>
<respStmt>
<resp>Bibliographic references retrieved via GROBID</resp>
<name resp="ISTEX-API">ISTEX-API (INIST-CNRS)</name>
</respStmt>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Oxford University Press</publisher>
<availability>
<p>OUP</p>
</availability>
<date>2002</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">The Czech National Corpus: Principles, Design, and Results</title>
<author>
<persName>
<forename type="first">Karel</forename>
<surname>Kucera</surname>
</persName>
<affiliation>Charles University, Praha, Czech Republic</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="pISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2002-06"></date>
<biblScope unit="volume">17</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="245">245</biblScope>
<biblScope unit="page" to="257">257</biblScope>
</imprint>
</monogr>
<idno type="istex">1D86EA4932E758629D37D52E7F01E5307CC404E2</idno>
<idno type="DOI">10.1093/llc/17.2.245</idno>
<idno type="local">170245</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2002</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.</p>
</abstract>
</profileDesc>
<revisionDesc>
<change when="2002-06">Published</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-03-15">References added</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-03-21">References added</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-07-27">References added</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus oup" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="US-ASCII"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" URI="journalpublishing.dtd" name="istex:docType"></istex:docType>
<istex:document>
<article xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">litlin</journal-id>
<journal-id journal-id-type="hwp">litlin</journal-id>
<journal-title>Literary and Linguistic Computing</journal-title>
<abbrev-journal-title abbrev-type="publisher">Lit Linguist Computing</abbrev-journal-title>
<issn pub-type="ppub">0268-1145</issn>
<issn pub-type="epub">1477-4615</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="other">170245</article-id>
<article-id pub-id-type="doi">10.1093/llc/17.2.245</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>The Czech National Corpus: Principles, Design, and Results</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Kucera</surname>
<given-names>Karel</given-names>
</name>
<xref rid="AFF1">1</xref>
</contrib>
<aff>
<target target-type="aff" id="AFF1"></target>
<label>1</label>
Charles University, Praha, Czech Republic</aff>
</contrib-group>
<pub-date pub-type="ppub">
<month>06</month>
<year>2002</year>
</pub-date>
<volume>17</volume>
<issue>2</issue>
<fpage>245</fpage>
<lpage>257</lpage>
<permissions>
<copyright-statement>Copyright Association for Literary &amp; Linguistic Computing 2002</copyright-statement>
<copyright-year>2002</copyright-year>
</permissions>
<abstract xml:lang="en">
<p>This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.</p>
</abstract>
<custom-meta-wrap>
<custom-meta>
<meta-name>hwp-legacy-fpage</meta-name>
<meta-value>245</meta-value>
</custom-meta>
<custom-meta>
<meta-name>hwp-legacy-dochead</meta-name>
<meta-value>Article</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>The Czech National Corpus: Principles, Design, and Results</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>The Czech National Corpus: Principles, Design, and Results</title>
</titleInfo>
<name type="personal">
<namePart type="given">Karel</namePart>
<namePart type="family">Kucera</namePart>
<affiliation>Charles University, Praha, Czech Republic</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article"></genre>
<originInfo>
<publisher>Oxford University Press</publisher>
<dateIssued encoding="w3cdtf">2002-06</dateIssued>
<copyrightDate encoding="w3cdtf">2002</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">This paper describes the general principles, design, and present state of the Czech National Corpus (CNC) project. The corpus has been designed to provide a firm basis for the study of both the contemporary written Czech (a goal well attainable with the present resources) and the Czech language beyond the limits of contemporary written texts (a long‐term commitment including the building of a corpus of spoken Czech and diachronic and dialectal corpora). The work on the CNC project, now in the eighth year of its official existence, has resulted in the completion of SYN2000, a 100‐million‐word corpus of contemporary written Czech, the organization of the cores of spoken, diachronic, and dialectal corpora, and the finding of workable solutions to some general theoretical problems involved in the building of these corpora.</abstract>
<relatedItem type="host">
<titleInfo>
<title>Literary and Linguistic Computing</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Lit Linguist Computing</title>
</titleInfo>
<genre type="journal">journal</genre>
<identifier type="ISSN">0268-1145</identifier>
<identifier type="eISSN">1477-4615</identifier>
<identifier type="PublisherID">litlin</identifier>
<identifier type="PublisherID-hwp">litlin</identifier>
<part>
<date>2002</date>
<detail type="volume">
<caption>vol.</caption>
<number>17</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>2</number>
</detail>
<extent unit="pages">
<start>245</start>
<end>257</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">1D86EA4932E758629D37D52E7F01E5307CC404E2</identifier>
<identifier type="DOI">10.1093/llc/17.2.245</identifier>
<identifier type="local">170245</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Copyright Association for Literary &amp; Linguistic Computing 2002</accessCondition>
<recordInfo>
<recordContentSource>OUP</recordContentSource>
</recordInfo>
</mods>
</metadata>
<enrichments>
<istex:catWosTEI uri="https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/enrichments/catWos">
<teiHeader>
<profileDesc>
<textClass>
<classCode scheme="WOS">LINGUISTICS</classCode>
<classCode scheme="WOS">LITERATURE</classCode>
</textClass>
</profileDesc>
</teiHeader>
</istex:catWosTEI>
<json:item>
<type>refBibs</type>
<uri>https://api.istex.fr/document/1D86EA4932E758629D37D52E7F01E5307CC404E2/enrichments/refBibs</uri>
</json:item>
</enrichments>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000161 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000161 | SxmlIndent | more
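
Once the record has been exported (for example by redirecting the indented output of the commands above to a file), its bibliographic fields can be read with an ordinary XML parser. The following is a minimal sketch using Python's standard library, not part of Dilib. The `SAMPLE` fragment is a simplified copy of the `<biblStruct>` element shown in the record, and the helper name `extract_citation` is hypothetical. The full record uses prefixes such as `wicri:` and `mods:` without namespace declarations, so a strict parser would probably reject it until those prefixes are declared or stripped; the fragment below leaves them out for that reason.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the <biblStruct> in the record above.
# Attributes with undeclared prefixes (wicri:, mods:) are deliberately omitted.
SAMPLE = """
<biblStruct>
  <analytic>
    <title level="a" type="main" xml:lang="en">The Czech National Corpus: Principles, Design, and Results</title>
  </analytic>
  <series>
    <title level="j">Literary and Linguistic Computing</title>
    <imprint>
      <biblScope unit="volume">17</biblScope>
      <biblScope unit="issue">2</biblScope>
      <biblScope unit="page" from="245">245</biblScope>
      <biblScope unit="page" to="257">257</biblScope>
    </imprint>
  </series>
  <idno type="DOI">10.1093/llc/17.2.245</idno>
</biblStruct>
"""


def extract_citation(xml_text):
    """Return the main citation fields of a TEI-like <biblStruct> as a dict."""
    root = ET.fromstring(xml_text.strip())

    # The first and last page live in two separate <biblScope unit="page">
    # elements, distinguished by their "from" and "to" attributes.
    first_page = root.find(".//biblScope[@from]").get("from")
    last_page = root.find(".//biblScope[@to]").get("to")

    return {
        "title": root.find(".//title[@level='a']").text,
        "journal": root.find(".//title[@level='j']").text,
        "volume": root.find(".//biblScope[@unit='volume']").text,
        "issue": root.find(".//biblScope[@unit='issue']").text,
        "pages": f"{first_page}-{last_page}",
        "doi": root.find(".//idno[@type='DOI']").text,
    }


if __name__ == "__main__":
    for field, value in extract_citation(SAMPLE).items():
        print(f"{field}: {value}")
```

The lookups rely on attribute predicates rather than element position, so they should keep working if the order of the `<biblScope>` or `<idno>` elements changes.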

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:1D86EA4932E758629D37D52E7F01E5307CC404E2
   |texte=   The Czech National Corpus: Principles, Design, and Results
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024