Testing hypotheses about compound stress assignment in English: a corpus-based investigation
Internal identifier: 000829 (Main/Exploration); previous: 000828; next: 000830
Authors: Ingo Plag; Gero Kunter; Sabine Lappe
Source:
- Corpus Linguistics and Linguistic Theory [ 1613-7027 ] ; 2007-12-11.
English descriptors
- Teeft :
- Agentive suffix, Algorithm, Analogical, Analogical algorithm, Analogical effects, Analogical factors, Analogical hypothesis, Analogical model, Analogical modeling, Analogical models, Argstruct morphright, Argument head compounds show, Argument structure, Argument structure effect, Argumenthead compounds, Assistant professor, Authorship relation, Baayen, Boston university radio speech corpus, Cambridge university press, Categorical rules, Celex, Celex compilers, Celex database, Celex frequencies, Celex frequency, Cobuild, Cobuild corpus, Compound, Compound constituents, Compound interpretation, Compound semantics, Compound stress, Compound stress assignment, Compound stress rule, Compound stress variability, Constituent, Constituent families, Constituent family, Constituent family information, Continuous letter strings, Copulative compounds, Corpus data, Correct predictions, Dance hall, Data computationally, Data points, Database, Deriv model, Dictionary data, Distance space, Dutch compounds, English compound stress, English language, English linguistics, Exemplar, Experimental psychology, Feature values, Fifth avenue, Final model, First element, Frequency information, Gagne, Gero kunter, Giegerich, Google, Google frequencies, Harald baayen, Head morphology, Higher frequency, Higher proportion, Hyphenated compounds, Hypothesis accuracy, Important factor, Ingo plag, John benjamins, Krott, Kunter, Language processing, Lappe, Lappe figure, Large number, Latest version, Leftward, Leftward stress, Less lexicalized compounds, Lexicalization, Lexicalization effect, Lexicalized, Lexicalized compounds, Lexicon, Liberman, Linguistic data consortium, Logistic, Logistic regression, Logistic regression analysis, Logistic regression model, Madison avenue, Many compounds, Mental lexicon, Modeling, Modifierhead compounds, More detail, Morpheme, Morphology, Morphright, Music hall, Nearest neighbors, Noun, Noun phrases, Novel compounds, Orthographic words, Other compounds, Other 
features, Other languages, Other words, Overall accuracy, Pertinent compounds, Pertinent examples, Plag, Predictive accuracy, Predictor, Present authors, Present study, Probabilistic, Proper noun, Raters, Regression, Regression analysis, Regression model, Right constituent, Right constituents, Right predictions, Right stress, Right stresses, Rightward, Rightward stress, Rightward stress assignment, Rightward stresses, Robert schreuder, Sabine lappe, Same data, Semantic, Semantic categories, Semantic entities, Semantic features, Semantic hypotheses, Semantic hypothesis, Semantic relation, Semantic relations, Significant influence, Significant predictors, Spelling, Sproat, Stress assignment, Stress pattern, Stress patterns, Stress position, Structural hypothesis, Subset, Synthetic compounds, Test item, Timbl, Timbl analysis, Traditional claims, Truck driver, Usual model simplification process, Variability, Variable compound behavior, Vast majority, Worm hole.
Abstract
This paper tests three factors that have been held to be responsible for the variable stress behavior of noun-noun constructs in English: argument structure, semantics, and analogy. In a large-scale investigation of some 4500 compounds extracted from the CELEX lexical database (Baayen et al. 1995), we show that traditional claims about noun-noun stress cannot be upheld. Argument structure plays a role only with synthetic compounds ending in the agentive suffix -er. The semantic categories and relations assumed in the literature to trigger rightward stress do not show the expected effects. As an alternative to the rule-based approaches, the data were modeled computationally and probabilistically using a memory-based analogical algorithm (TiMBL 5.1) and logistic regression, respectively. It turns out that probabilistic models and the analogical algorithm are more successful in predicting stress assignment correctly than any of the rules proposed in the literature. Furthermore, the results of the analogical modeling suggest that the left and right constituent are the most important factor in compound stress assignment. This is in line with recent findings on the semi-regular behavior of compounds in other languages.
Url:
DOI: 10.1515/CLLT.2007.012
Affiliations:
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000858
- to stream Istex, to step Curation: 000815
- to stream Istex, to step Checkpoint: 000643
- to stream Main, to step Merge: 000829
- to stream Main, to step Curation: 000829
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Testing hypotheses about compound stress assignment in English: a corpus-based investigation</title>
<author><name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</author>
<author><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
</author>
<author><name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:519348FA143EF197827D6C5F8553F9626EBC142F</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1515/CLLT.2007.012</idno>
<idno type="url">https://api.istex.fr/document/519348FA143EF197827D6C5F8553F9626EBC142F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000858</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000858</idno>
<idno type="wicri:Area/Istex/Curation">000815</idno>
<idno type="wicri:Area/Istex/Checkpoint">000643</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000643</idno>
<idno type="wicri:doubleKey">1613-7027:2007:Plag I:testing:hypotheses:about</idno>
<idno type="wicri:Area/Main/Merge">000829</idno>
<idno type="wicri:Area/Main/Curation">000829</idno>
<idno type="wicri:Area/Main/Exploration">000829</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Testing hypotheses about compound stress assignment in English: a corpus-based investigation</title>
<author><name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</author>
<author><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
</author>
<author><name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Corpus Linguistics and Linguistic Theory</title>
<title level="j" type="abbrev">Corpus Linguistics and Linguistic Theory</title>
<idno type="ISSN">1613-7027</idno>
<idno type="eISSN">1613-7035</idno>
<imprint><publisher>Walter de Gruyter</publisher>
<date type="published" when="2007-12-11">2007-12-11</date>
<biblScope unit="volume">3</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="199">199</biblScope>
<biblScope unit="page" to="232">232</biblScope>
</imprint>
<idno type="ISSN">1613-7027</idno>
</series>
<idno type="istex">519348FA143EF197827D6C5F8553F9626EBC142F</idno>
<idno type="DOI">10.1515/CLLT.2007.012</idno>
<idno type="ArticleID">cllt.3.2.199</idno>
<idno type="pdf">cllt.2007.012.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1613-7027</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Agentive suffix</term>
<term>Algorithm</term>
<term>Analogical</term>
<term>Analogical algorithm</term>
<term>Analogical effects</term>
<term>Analogical factors</term>
<term>Analogical hypothesis</term>
<term>Analogical model</term>
<term>Analogical modeling</term>
<term>Analogical models</term>
<term>Argstruct morphright</term>
<term>Argument head compounds show</term>
<term>Argument structure</term>
<term>Argument structure effect</term>
<term>Argumenthead compounds</term>
<term>Assistant professor</term>
<term>Authorship relation</term>
<term>Baayen</term>
<term>Boston university radio speech corpus</term>
<term>Cambridge university press</term>
<term>Categorical rules</term>
<term>Celex</term>
<term>Celex compilers</term>
<term>Celex database</term>
<term>Celex frequencies</term>
<term>Celex frequency</term>
<term>Cobuild</term>
<term>Cobuild corpus</term>
<term>Compound</term>
<term>Compound constituents</term>
<term>Compound interpretation</term>
<term>Compound semantics</term>
<term>Compound stress</term>
<term>Compound stress assignment</term>
<term>Compound stress rule</term>
<term>Compound stress variability</term>
<term>Constituent</term>
<term>Constituent families</term>
<term>Constituent family</term>
<term>Constituent family information</term>
<term>Continuous letter strings</term>
<term>Copulative compounds</term>
<term>Corpus data</term>
<term>Correct predictions</term>
<term>Dance hall</term>
<term>Data computationally</term>
<term>Data points</term>
<term>Database</term>
<term>Deriv model</term>
<term>Dictionary data</term>
<term>Distance space</term>
<term>Dutch compounds</term>
<term>English compound stress</term>
<term>English language</term>
<term>English linguistics</term>
<term>Exemplar</term>
<term>Experimental psychology</term>
<term>Feature values</term>
<term>Fifth avenue</term>
<term>Final model</term>
<term>First element</term>
<term>Frequency information</term>
<term>Gagne</term>
<term>Gero kunter</term>
<term>Giegerich</term>
<term>Google</term>
<term>Google frequencies</term>
<term>Harald baayen</term>
<term>Head morphology</term>
<term>Higher frequency</term>
<term>Higher proportion</term>
<term>Hyphenated compounds</term>
<term>Hypothesis accuracy</term>
<term>Important factor</term>
<term>Ingo plag</term>
<term>John benjamins</term>
<term>Krott</term>
<term>Kunter</term>
<term>Language processing</term>
<term>Lappe</term>
<term>Lappe figure</term>
<term>Large number</term>
<term>Latest version</term>
<term>Leftward</term>
<term>Leftward stress</term>
<term>Less lexicalized compounds</term>
<term>Lexicalization</term>
<term>Lexicalization effect</term>
<term>Lexicalized</term>
<term>Lexicalized compounds</term>
<term>Lexicon</term>
<term>Liberman</term>
<term>Linguistic data consortium</term>
<term>Logistic</term>
<term>Logistic regression</term>
<term>Logistic regression analysis</term>
<term>Logistic regression model</term>
<term>Madison avenue</term>
<term>Many compounds</term>
<term>Mental lexicon</term>
<term>Modeling</term>
<term>Modifierhead compounds</term>
<term>More detail</term>
<term>Morpheme</term>
<term>Morphology</term>
<term>Morphright</term>
<term>Music hall</term>
<term>Nearest neighbors</term>
<term>Noun</term>
<term>Noun phrases</term>
<term>Novel compounds</term>
<term>Orthographic words</term>
<term>Other compounds</term>
<term>Other features</term>
<term>Other languages</term>
<term>Other words</term>
<term>Overall accuracy</term>
<term>Pertinent compounds</term>
<term>Pertinent examples</term>
<term>Plag</term>
<term>Predictive accuracy</term>
<term>Predictor</term>
<term>Present authors</term>
<term>Present study</term>
<term>Probabilistic</term>
<term>Proper noun</term>
<term>Raters</term>
<term>Regression</term>
<term>Regression analysis</term>
<term>Regression model</term>
<term>Right constituent</term>
<term>Right constituents</term>
<term>Right predictions</term>
<term>Right stress</term>
<term>Right stresses</term>
<term>Rightward</term>
<term>Rightward stress</term>
<term>Rightward stress assignment</term>
<term>Rightward stresses</term>
<term>Robert schreuder</term>
<term>Sabine lappe</term>
<term>Same data</term>
<term>Semantic</term>
<term>Semantic categories</term>
<term>Semantic entities</term>
<term>Semantic features</term>
<term>Semantic hypotheses</term>
<term>Semantic hypothesis</term>
<term>Semantic relation</term>
<term>Semantic relations</term>
<term>Significant influence</term>
<term>Significant predictors</term>
<term>Spelling</term>
<term>Sproat</term>
<term>Stress assignment</term>
<term>Stress pattern</term>
<term>Stress patterns</term>
<term>Stress position</term>
<term>Structural hypothesis</term>
<term>Subset</term>
<term>Synthetic compounds</term>
<term>Test item</term>
<term>Timbl</term>
<term>Timbl analysis</term>
<term>Traditional claims</term>
<term>Truck driver</term>
<term>Usual model simplification process</term>
<term>Variability</term>
<term>Variable compound behavior</term>
<term>Vast majority</term>
<term>Worm hole</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper tests three factors that have been held to be responsible for the variable stress behavior of noun-noun constructs in English: argument structure, semantics, and analogy. In a large-scale investigation of some 4500 compounds extracted from the CELEX lexical database (Baayen et al. 1995), we show that traditional claims about noun-noun stress cannot be upheld. Argument structure plays a role only with synthetic compounds ending in the agentive suffix -er. The semantic categories and relations assumed in the literature to trigger rightward stress do not show the expected effects. As an alternative to the rule-based approaches, the data were modeled computationally and probabilistically using a memory-based analogical algorithm (TiMBL 5.1) and logistic regression, respectively. It turns out that probabilistic models and the analogical algorithm are more successful in predicting stress assignment correctly than any of the rules proposed in the literature. Furthermore, the results of the analogical modeling suggest that the left and right constituent are the most important factor in compound stress assignment. This is in line with recent findings on the semi-regular behavior of compounds in other languages.</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
<name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
<name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</noCountry>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000829 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000829 | SxmlIndent | more
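Where the Dilib tools are not at hand, the XML record shown on this page can also be read with a general-purpose parser. The following is a minimal sketch in Python, not part of Dilib. It assumes the record text is available as a string. The record does not declare the `wicri:` prefix, so the sketch wraps it in a hypothetical `wrapper` element that binds the prefix to a placeholder URI; both the wrapper and the URI are assumptions made only so the parser accepts the input. The function name `parse_record` and the abridged `SAMPLE` are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption): the record uses the wicri:
# prefix without declaring it, so a binding is supplied for the parser.
WICRI_NS = "urn:example:wicri"


def parse_record(record_xml: str) -> dict:
    """Extract title, authors and DOI from a <record> fragment."""
    # Wrap the fragment so that the wicri: prefix is bound to a namespace.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    root = ET.fromstring(wrapped)

    # The article-level description lives under <analytic>.
    analytic = root.find(".//analytic")
    title = analytic.findtext("title")
    authors = [name.text for name in analytic.findall("author/name")]

    # The DOI is stored as <idno type="doi"> inside <publicationStmt>.
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Abridged copy of the record on this page, for illustration only.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<publicationStmt><idno type="doi">10.1515/CLLT.2007.012</idno></publicationStmt>
<sourceDesc><biblStruct><analytic>
<title level="a" type="main" xml:lang="en">Testing hypotheses about compound stress assignment in English: a corpus-based investigation</title>
<author><name>Ingo Plag</name></author>
<author><name>Gero Kunter</name></author>
<author><name>Sabine Lappe</name></author>
</analytic></biblStruct></sourceDesc>
</fileDesc></teiHeader></TEI></record>"""

record = parse_record(SAMPLE)
print(record["authors"])
```

The same function should work on the full record text, since it only relies on the `analytic` and `publicationStmt` elements that appear there.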
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:519348FA143EF197827D6C5F8553F9626EBC142F |texte= Testing hypotheses about compound stress assignment in English: a corpus-based investigation }}
This area was generated with Dilib version V0.6.33.