Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles

Internal identifier: 000605 (Main/Merge); previous: 000604; next: 000606

Authors: Qing Cai; Marc Brysbaert

Source:

RBID: PMC:2880003

Abstract

Background

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.

Methodology

Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
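The raw counts behind such a database are usually normalised by corpus size, so that norms from corpora of different sizes can be compared. The sketch below shows one way to turn word and character counts into counts per million and a log frequency. It is a minimal illustration under stated assumptions, not the authors' pipeline: the toy `films` data and the helper name `frequency_table` are hypothetical, Chinese word segmentation is assumed to have been done beforehand, and the log10(count + 1) transform is an assumption made here.

```python
from collections import Counter
import math

# Hypothetical toy stand-in for a segmented subtitle corpus: each film maps
# to a list of already-segmented words (segmentation is assumed, not shown).
films = {
    "film_a": ["我", "知道", "你", "知道"],
    "film_b": ["你", "好", "我", "不", "知道"],
}

def frequency_table(tokens):
    """Return {token: (raw count, count per million, log10(count + 1))}."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        tok: (n, n * 1_000_000 / total, math.log10(n + 1))
        for tok, n in counts.items()
    }

# Word tokens across all films.
all_words = [w for words in films.values() for w in words]
# Character tokens: every word is split into its individual characters.
all_chars = [c for w in all_words for c in w]

word_freq = frequency_table(all_words)
char_freq = frequency_table(all_chars)
print(word_freq["知道"], char_freq["知"])
```

In this toy corpus 知道 occurs 3 times out of 9 word tokens, while the character 知 occurs 3 times out of 12 character tokens. The same item therefore gets different per-million values depending on whether words or characters are counted, which is why word and character frequencies are listed separately.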

Conclusions

Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
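Contextual diversity is taken here to mean the number of films (and the percentage of all films) in which a word occurs at least once. The sketch below computes it under that assumption. The toy corpus and the helper name `contextual_diversity` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical toy corpus: film identifier -> list of segmented words.
films = {
    "film_a": ["我", "知道", "你", "知道"],
    "film_b": ["你", "好", "我", "不", "知道"],
    "film_c": ["好", "的"],
}

def contextual_diversity(corpus):
    """For each word, return (number of films containing it,
    percentage of all films containing it)."""
    cd = Counter()
    for words in corpus.values():
        # A set ensures that each film is counted at most once per word.
        cd.update(set(words))
    n_films = len(corpus)
    return {w: (k, 100.0 * k / n_films) for w, k in cd.items()}

cd = contextual_diversity(films)
print(cd["知道"], cd["的"])
```

Here 知道 has a raw count of 3 but a contextual diversity of 2, because two of its occurrences come from the same film. This is the distinction a contextual-diversity measure adds on top of a plain frequency count.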


Url:
DOI: 10.1371/journal.pone.0010729
PubMed: 20532192
PubMed Central: 2880003

Links to previous steps (curation, corpus, ...)

Link to the exploration step

PMC:2880003

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles</title>
<author>
<name sortKey="Cai, Qing" sort="Cai, Qing" uniqKey="Cai Q" first="Qing" last="Cai">Qing Cai</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brysbaert, Marc" sort="Brysbaert, Marc" uniqKey="Brysbaert M" first="Marc" last="Brysbaert">Marc Brysbaert</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20532192</idno>
<idno type="pmc">2880003</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2880003</idno>
<idno type="RBID">PMC:2880003</idno>
<idno type="doi">10.1371/journal.pone.0010729</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000183</idno>
<idno type="wicri:Area/Pmc/Curation">000183</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000142</idno>
<idno type="wicri:Area/Ncbi/Merge">000081</idno>
<idno type="wicri:Area/Ncbi/Curation">000081</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000081</idno>
<idno type="wicri:Area/Main/Merge">000605</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles</title>
<author>
<name sortKey="Cai, Qing" sort="Cai, Qing" uniqKey="Cai Q" first="Qing" last="Cai">Qing Cai</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brysbaert, Marc" sort="Brysbaert, Marc" uniqKey="Brysbaert M" first="Marc" last="Brysbaert">Marc Brysbaert</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.</p>
</sec>
<sec>
<title>Methodology</title>
<p>Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Perfetti, Ca" uniqKey="Perfetti C">CA Perfetti</name>
</author>
<author>
<name sortKey="Tan, Lh" uniqKey="Tan L">LH Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bai, X" uniqKey="Bai X">X Bai</name>
</author>
<author>
<name sortKey="Yan, G" uniqKey="Yan G">G Yan</name>
</author>
<author>
<name sortKey="Liversedge, Sp" uniqKey="Liversedge S">SP Liversedge</name>
</author>
<author>
<name sortKey="Zang, X" uniqKey="Zang X">X Zang</name>
</author>
<author>
<name sortKey="Rayner, K" uniqKey="Rayner K">K Rayner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wong, K" uniqKey="Wong K">K Wong</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Xu, R" uniqKey="Xu R">R Xu</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xiao, R" uniqKey="Xiao R">R Xiao</name>
</author>
<author>
<name sortKey="Rayson, P" uniqKey="Rayson P">P Rayson</name>
</author>
<author>
<name sortKey="Mcenery, A" uniqKey="Mcenery A">A McEnery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feng, Z" uniqKey="Feng Z">Z Feng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, H" uniqKey="Sun H">H Sun</name>
</author>
<author>
<name sortKey="Huang, J" uniqKey="Huang J">J Huang</name>
</author>
<author>
<name sortKey="Sun, D" uniqKey="Sun D">D Sun</name>
</author>
<author>
<name sortKey="Li, D" uniqKey="Li D">D Li</name>
</author>
<author>
<name sortKey="Xing, H" uniqKey="Xing H">H Xing</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, M" uniqKey="Sun M">M Sun</name>
</author>
<author>
<name sortKey="Wang, H" uniqKey="Wang H">H Wang</name>
</author>
<author>
<name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
<author>
<name sortKey="Huang, C" uniqKey="Huang C">C Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Shu, H" uniqKey="Shu H">H Shu</name>
</author>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brysbaert, M" uniqKey="Brysbaert M">M Brysbaert</name>
</author>
<author>
<name sortKey="New, B" uniqKey="New B">B New</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keuleers, E" uniqKey="Keuleers E">E Keuleers</name>
</author>
<author>
<name sortKey="Brysbaert, M" uniqKey="Brysbaert M">M Brysbaert</name>
</author>
<author>
<name sortKey="New, B" uniqKey="New B">B New</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="New, B" uniqKey="New B">B New</name>
</author>
<author>
<name sortKey="Brysbaert, M" uniqKey="Brysbaert M">M Brysbaert</name>
</author>
<author>
<name sortKey="Veronis, J" uniqKey="Veronis J">J Veronis</name>
</author>
<author>
<name sortKey="Pallier, C" uniqKey="Pallier C">C Pallier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balota, Da" uniqKey="Balota D">DA Balota</name>
</author>
<author>
<name sortKey="Yap, Mj" uniqKey="Yap M">MJ Yap</name>
</author>
<author>
<name sortKey="Cortese, Mj" uniqKey="Cortese M">MJ Cortese</name>
</author>
<author>
<name sortKey="Hutchison, Ka" uniqKey="Hutchison K">KA Hutchison</name>
</author>
<author>
<name sortKey="Kessler, B" uniqKey="Kessler B">B Kessler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Adelman, Js" uniqKey="Adelman J">JS Adelman</name>
</author>
<author>
<name sortKey="Brown, Gda" uniqKey="Brown G">GDA Brown</name>
</author>
<author>
<name sortKey="Quesada, Jf" uniqKey="Quesada J">JF Quesada</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baayen, Rh" uniqKey="Baayen R">RH Baayen</name>
</author>
<author>
<name sortKey="Piepenbrock, R" uniqKey="Piepenbrock R">R Piepenbrock</name>
</author>
<author>
<name sortKey="Van Rijn, H" uniqKey="Van Rijn H">H van Rijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baayen, Rh" uniqKey="Baayen R">RH Baayen</name>
</author>
<author>
<name sortKey="Piepenbrock, R" uniqKey="Piepenbrock R">R Piepenbrock</name>
</author>
<author>
<name sortKey="Van Rijn, H" uniqKey="Van Rijn H">H van Rijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Xiong, D" uniqKey="Xiong D">D Xiong</name>
</author>
<author>
<name sortKey="Liu, Q" uniqKey="Liu Q">Q Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, K" uniqKey="Zhang K">K Zhang</name>
</author>
<author>
<name sortKey="Liu, Q" uniqKey="Liu Q">Q Liu</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Cheng, X" uniqKey="Cheng X">X Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X Zhou</name>
</author>
<author>
<name sortKey="Marslen Wilson, W" uniqKey="Marslen Wilson W">W Marslen-Wilson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, J" uniqKey="Myers J">J Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Packard, Jl" uniqKey="Packard J">JL Packard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, S" uniqKey="Yu S">S Yu</name>
</author>
<author>
<name sortKey="Duan, H" uniqKey="Duan H">H Duan</name>
</author>
<author>
<name sortKey="Zhu, X" uniqKey="Zhu X">X Zhu</name>
</author>
<author>
<name sortKey="Sun, B" uniqKey="Sun B">B Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, S" uniqKey="Yu S">S Yu</name>
</author>
<author>
<name sortKey="Duan, H" uniqKey="Duan H">H Duan</name>
</author>
<author>
<name sortKey="Zhu, X" uniqKey="Zhu X">X Zhu</name>
</author>
<author>
<name sortKey="Sun, B" uniqKey="Sun B">B Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, R" uniqKey="Zhang R">R Zhang</name>
</author>
<author>
<name sortKey="Yasuda, K" uniqKey="Yasuda K">K Yasuda</name>
</author>
<author>
<name sortKey="Sumita, E" uniqKey="Sumita E">E Sumita</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balota, Da" uniqKey="Balota D">DA Balota</name>
</author>
<author>
<name sortKey="Cortese, Mj" uniqKey="Cortese M">MJ Cortese</name>
</author>
<author>
<name sortKey="Sergent Marshall, Sd" uniqKey="Sergent Marshall S">SD Sergent-Marshall</name>
</author>
<author>
<name sortKey="Spieler, Dh" uniqKey="Spieler D">DH Spieler</name>
</author>
<author>
<name sortKey="Yap, Mj" uniqKey="Yap M">MJ Yap</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Myers, J" uniqKey="Myers J">J Myers</name>
</author>
<author>
<name sortKey="Huang, Y" uniqKey="Huang Y">Y Huang</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</record>
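The identifiers in the `<publicationStmt>` above can be read programmatically. The following is a minimal sketch using Python's standard-library ElementTree; the helper name `identifiers` is illustrative. It parses only an excerpt copied from the record because the full record uses an `nlm:` prefix without a namespace declaration, and a namespace-aware parser such as ElementTree rejects unbound prefixes.

```python
import xml.etree.ElementTree as ET

# Excerpt copied from the <publicationStmt> of the record above.
# The full record is not parsed: its "nlm:" prefix is never declared,
# so ElementTree would raise a ParseError ("unbound prefix").
RECORD_EXCERPT = """
<publicationStmt>
  <idno type="wicri:source">PMC</idno>
  <idno type="pmid">20532192</idno>
  <idno type="pmc">2880003</idno>
  <idno type="doi">10.1371/journal.pone.0010729</idno>
  <date when="2010">2010</date>
</publicationStmt>
"""

def identifiers(xml_text):
    """Map the type attribute of each <idno> element to its text content."""
    root = ET.fromstring(xml_text.strip())
    return {idno.get("type"): idno.text for idno in root.iter("idno")}

ids = identifiers(RECORD_EXCERPT)
print(ids["doi"])
```

To process whole records, the `nlm` prefix would first need a namespace declaration (or removal) before parsing.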

To work with this document on Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000605 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 000605 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:2880003
   |texte=   SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Merge/RBID.i   -Sk "pubmed:20532192" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024