
Testing hypotheses about compound stress assignment in English: a corpus-based investigation

Internal identifier: 000829 (Main/Exploration); previous: 000828; next: 000830


Authors: Ingo Plag; Gero Kunter; Sabine Lappe

Source:

Corpus Linguistics and Linguistic Theory [ 1613-7027 ] ; 2007-12-11.

RBID : ISTEX:519348FA143EF197827D6C5F8553F9626EBC142F

English descriptors

Teeft:
- Agentive suffix, Algorithm, Analogical, Analogical algorithm, Analogical effects, Analogical factors, Analogical hypothesis, Analogical model, Analogical modeling, Analogical models, Argstruct morphright, Argument head compounds show, Argument structure, Argument structure effect, Argumenthead compounds, Assistant professor, Authorship relation, Baayen, Boston university radio speech corpus, Cambridge university press, Categorical rules, Celex, Celex compilers, Celex database, Celex frequencies, Celex frequency, Cobuild, Cobuild corpus, Compound, Compound constituents, Compound interpretation, Compound semantics, Compound stress, Compound stress assignment, Compound stress rule, Compound stress variability, Constituent, Constituent families, Constituent family, Constituent family information, Continuous letter strings, Copulative compounds, Corpus data, Correct predictions, Dance hall, Data computationally, Data points, Database, Deriv model, Dictionary data, Distance space, Dutch compounds, English compound stress, English language, English linguistics, Exemplar, Experimental psychology, Feature values, Fifth avenue, Final model, First element, Frequency information, Gagne, Gero kunter, Giegerich, Google, Google frequencies, Harald baayen, Head morphology, Higher frequency, Higher proportion, Hyphenated compounds, Hypothesis accuracy, Important factor, Ingo plag, John benjamins, Krott, Kunter, Language processing, Lappe, Lappe figure, Large number, Latest version, Leftward, Leftward stress, Less lexicalized compounds, Lexicalization, Lexicalization effect, Lexicalized, Lexicalized compounds, Lexicon, Liberman, Linguistic data consortium, Logistic, Logistic regression, Logistic regression analysis, Logistic regression model, Madison avenue, Many compounds, Mental lexicon, Modeling, Modifierhead compounds, More detail, Morpheme, Morphology, Morphright, Music hall, Nearest neighbors, Noun, Noun phrases, Novel compounds, Orthographic words, Other compounds, Other features, Other languages, Other words, Overall accuracy, Pertinent compounds, Pertinent examples, Plag, Predictive accuracy, Predictor, Present authors, Present study, Probabilistic, Proper noun, Raters, Regression, Regression analysis, Regression model, Right constituent, Right constituents, Right predictions, Right stress, Right stresses, Rightward, Rightward stress, Rightward stress assignment, Rightward stresses, Robert schreuder, Sabine lappe, Same data, Semantic, Semantic categories, Semantic entities, Semantic features, Semantic hypotheses, Semantic hypothesis, Semantic relation, Semantic relations, Significant influence, Significant predictors, Spelling, Sproat, Stress assignment, Stress pattern, Stress patterns, Stress position, Structural hypothesis, Subset, Synthetic compounds, Test item, Timbl, Timbl analysis, Traditional claims, Truck driver, Usual model simplification process, Variability, Variable compound behavior, Vast majority, Worm hole.

Abstract

This paper tests three factors that have been held to be responsible for the variable stress behavior of noun-noun constructs in English: argument structure, semantics, and analogy. In a large-scale investigation of some 4500 compounds extracted from the CELEX lexical database (Baayen et al. 1995), we show that traditional claims about noun-noun stress cannot be upheld. Argument structure plays a role only with synthetic compounds ending in the agentive suffix -er. The semantic categories and relations assumed in the literature to trigger rightward stress do not show the expected effects. As an alternative to the rule-based approaches, the data were modeled computationally and probabilistically using a memory-based analogical algorithm (TiMBL 5.1) and logistic regression, respectively. It turns out that probabilistic models and the analogical algorithm are more successful in predicting stress assignment correctly than any of the rules proposed in the literature. Furthermore, the results of the analogical modeling suggest that the left and right constituent are the most important factor in compound stress assignment. This is in line with recent findings on the semi-regular behavior of compounds in other languages.

URL:

https://api.istex.fr/document/519348FA143EF197827D6C5F8553F9626EBC142F/fulltext/pdf

DOI: 10.1515/CLLT.2007.012

Affiliations:

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 000858
to stream Istex, to step Curation: 000815
to stream Istex, to step Checkpoint: 000643
to stream Main, to step Merge: 000829
to stream Main, to step Curation: 000829

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Testing hypotheses about compound stress assignment in English: a corpus-based investigation</title>
<author><name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</author>
<author><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
</author>
<author><name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:519348FA143EF197827D6C5F8553F9626EBC142F</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1515/CLLT.2007.012</idno>
<idno type="url">https://api.istex.fr/document/519348FA143EF197827D6C5F8553F9626EBC142F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000858</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000858</idno>
<idno type="wicri:Area/Istex/Curation">000815</idno>
<idno type="wicri:Area/Istex/Checkpoint">000643</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000643</idno>
<idno type="wicri:doubleKey">1613-7027:2007:Plag I:testing:hypotheses:about</idno>
<idno type="wicri:Area/Main/Merge">000829</idno>
<idno type="wicri:Area/Main/Curation">000829</idno>
<idno type="wicri:Area/Main/Exploration">000829</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Testing hypotheses about compound stress assignment in English: a corpus-based investigation</title>
<author><name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</author>
<author><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
</author>
<author><name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Corpus Linguistics and Linguistic Theory</title>
<title level="j" type="abbrev">Corpus Linguistics and Linguistic Theory</title>
<idno type="ISSN">1613-7027</idno>
<idno type="eISSN">1613-7035</idno>
<imprint><publisher>Walter de Gruyter</publisher>
<date type="published" when="2007-12-11">2007-12-11</date>
<biblScope unit="volume">3</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="199">199</biblScope>
<biblScope unit="page" to="232">232</biblScope>
</imprint>
<idno type="ISSN">1613-7027</idno>
</series>
<idno type="istex">519348FA143EF197827D6C5F8553F9626EBC142F</idno>
<idno type="DOI">10.1515/CLLT.2007.012</idno>
<idno type="ArticleID">cllt.3.2.199</idno>
<idno type="pdf">cllt.2007.012.pdf</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1613-7027</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Agentive suffix</term>
<term>Algorithm</term>
<term>Analogical</term>
<term>Analogical algorithm</term>
<term>Analogical effects</term>
<term>Analogical factors</term>
<term>Analogical hypothesis</term>
<term>Analogical model</term>
<term>Analogical modeling</term>
<term>Analogical models</term>
<term>Argstruct morphright</term>
<term>Argument head compounds show</term>
<term>Argument structure</term>
<term>Argument structure effect</term>
<term>Argumenthead compounds</term>
<term>Assistant professor</term>
<term>Authorship relation</term>
<term>Baayen</term>
<term>Boston university radio speech corpus</term>
<term>Cambridge university press</term>
<term>Categorical rules</term>
<term>Celex</term>
<term>Celex compilers</term>
<term>Celex database</term>
<term>Celex frequencies</term>
<term>Celex frequency</term>
<term>Cobuild</term>
<term>Cobuild corpus</term>
<term>Compound</term>
<term>Compound constituents</term>
<term>Compound interpretation</term>
<term>Compound semantics</term>
<term>Compound stress</term>
<term>Compound stress assignment</term>
<term>Compound stress rule</term>
<term>Compound stress variability</term>
<term>Constituent</term>
<term>Constituent families</term>
<term>Constituent family</term>
<term>Constituent family information</term>
<term>Continuous letter strings</term>
<term>Copulative compounds</term>
<term>Corpus data</term>
<term>Correct predictions</term>
<term>Dance hall</term>
<term>Data computationally</term>
<term>Data points</term>
<term>Database</term>
<term>Deriv model</term>
<term>Dictionary data</term>
<term>Distance space</term>
<term>Dutch compounds</term>
<term>English compound stress</term>
<term>English language</term>
<term>English linguistics</term>
<term>Exemplar</term>
<term>Experimental psychology</term>
<term>Feature values</term>
<term>Fifth avenue</term>
<term>Final model</term>
<term>First element</term>
<term>Frequency information</term>
<term>Gagne</term>
<term>Gero kunter</term>
<term>Giegerich</term>
<term>Google</term>
<term>Google frequencies</term>
<term>Harald baayen</term>
<term>Head morphology</term>
<term>Higher frequency</term>
<term>Higher proportion</term>
<term>Hyphenated compounds</term>
<term>Hypothesis accuracy</term>
<term>Important factor</term>
<term>Ingo plag</term>
<term>John benjamins</term>
<term>Krott</term>
<term>Kunter</term>
<term>Language processing</term>
<term>Lappe</term>
<term>Lappe figure</term>
<term>Large number</term>
<term>Latest version</term>
<term>Leftward</term>
<term>Leftward stress</term>
<term>Less lexicalized compounds</term>
<term>Lexicalization</term>
<term>Lexicalization effect</term>
<term>Lexicalized</term>
<term>Lexicalized compounds</term>
<term>Lexicon</term>
<term>Liberman</term>
<term>Linguistic data consortium</term>
<term>Logistic</term>
<term>Logistic regression</term>
<term>Logistic regression analysis</term>
<term>Logistic regression model</term>
<term>Madison avenue</term>
<term>Many compounds</term>
<term>Mental lexicon</term>
<term>Modeling</term>
<term>Modifierhead compounds</term>
<term>More detail</term>
<term>Morpheme</term>
<term>Morphology</term>
<term>Morphright</term>
<term>Music hall</term>
<term>Nearest neighbors</term>
<term>Noun</term>
<term>Noun phrases</term>
<term>Novel compounds</term>
<term>Orthographic words</term>
<term>Other compounds</term>
<term>Other features</term>
<term>Other languages</term>
<term>Other words</term>
<term>Overall accuracy</term>
<term>Pertinent compounds</term>
<term>Pertinent examples</term>
<term>Plag</term>
<term>Predictive accuracy</term>
<term>Predictor</term>
<term>Present authors</term>
<term>Present study</term>
<term>Probabilistic</term>
<term>Proper noun</term>
<term>Raters</term>
<term>Regression</term>
<term>Regression analysis</term>
<term>Regression model</term>
<term>Right constituent</term>
<term>Right constituents</term>
<term>Right predictions</term>
<term>Right stress</term>
<term>Right stresses</term>
<term>Rightward</term>
<term>Rightward stress</term>
<term>Rightward stress assignment</term>
<term>Rightward stresses</term>
<term>Robert schreuder</term>
<term>Sabine lappe</term>
<term>Same data</term>
<term>Semantic</term>
<term>Semantic categories</term>
<term>Semantic entities</term>
<term>Semantic features</term>
<term>Semantic hypotheses</term>
<term>Semantic hypothesis</term>
<term>Semantic relation</term>
<term>Semantic relations</term>
<term>Significant influence</term>
<term>Significant predictors</term>
<term>Spelling</term>
<term>Sproat</term>
<term>Stress assignment</term>
<term>Stress pattern</term>
<term>Stress patterns</term>
<term>Stress position</term>
<term>Structural hypothesis</term>
<term>Subset</term>
<term>Synthetic compounds</term>
<term>Test item</term>
<term>Timbl</term>
<term>Timbl analysis</term>
<term>Traditional claims</term>
<term>Truck driver</term>
<term>Usual model simplification process</term>
<term>Variability</term>
<term>Variable compound behavior</term>
<term>Vast majority</term>
<term>Worm hole</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper tests three factors that have been held to be responsible for the variable stress behavior of noun-noun constructs in English: argument structure, semantics, and analogy. In a large-scale investigation of some 4500 compounds extracted from the CELEX lexical database (Baayen et al. 1995), we show that traditional claims about noun-noun stress cannot be upheld. Argument structure plays a role only with synthetic compounds ending in the agentive suffix -er. The semantic categories and relations assumed in the literature to trigger rightward stress do not show the expected effects. As an alternative to the rule-based approaches, the data were modeled computationally and probabilistically using a memory-based analogical algorithm (TiMBL 5.1) and logistic regression, respectively. It turns out that probabilistic models and the analogical algorithm are more successful in predicting stress assignment correctly than any of the rules proposed in the literature. Furthermore, the results of the analogical modeling suggest that the left and right constituent are the most important factor in compound stress assignment. This is in line with recent findings on the semi-regular behavior of compounds in other languages.</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Kunter, Gero" sort="Kunter, Gero" uniqKey="Kunter G" first="Gero" last="Kunter">Gero Kunter</name>
<name sortKey="Lappe, Sabine" sort="Lappe, Sabine" uniqKey="Lappe S" first="Sabine" last="Lappe">Sabine Lappe</name>
<name sortKey="Plag, Ingo" sort="Plag, Ingo" uniqKey="Plag I" first="Ingo" last="Plag">Ingo Plag</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000829 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000829 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:519348FA143EF197827D6C5F8553F9626EBC142F
   |texte=   Testing hypotheses about compound stress assignment in English: a corpus-based investigation
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024

	Exploration server on music in Saarland
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
