SARS exploration server

Using evolutionary Expectation Maximization to estimate indel rates

Internal identifier: 000697 (Istex/Corpus); previous: 000696; next: 000698

Author: Ian Holmes

RBID : ISTEX:F281223AAC22F2B7DE8C59461F5F6B2675557942

Abstract

Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process.

Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates.

Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/

Contact: ihh@berkeley.edu
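
For readers unfamiliar with the TKF91 model named in the abstract, the following is a minimal sketch of the quantities that model defines for a single branch. It is not the software referred to under Availability and it does not implement the EM algorithm. It only evaluates the standard TKF91 expressions (Thorne et al., 1991) for a given insertion rate, deletion rate and branch length. The function names `tkf91_probabilities` and `equilibrium_length_prob` and the example rate values are assumptions made for illustration.

```python
import math


def tkf91_probabilities(lam, mu, t):
    """Return (alpha, beta, gamma) for the TKF91 single-residue indel model.

    lam: insertion rate, mu: deletion rate (the model requires 0 < lam < mu),
    t:   branch length (t > 0).

    alpha: probability that an ancestral residue survives over time t.
    beta:  probability of a further inserted residue after a surviving one.
    gamma: probability that a deleted residue nevertheless leaves at least
           one inserted descendant.
    """
    if not (0 < lam < mu):
        raise ValueError("TKF91 requires 0 < lam < mu")
    if t <= 0:
        raise ValueError("branch length t must be positive")
    alpha = math.exp(-mu * t)
    e = math.exp((lam - mu) * t)
    beta = lam * (1.0 - e) / (mu - lam * e)
    gamma = 1.0 - mu * beta / (lam * (1.0 - alpha))
    return alpha, beta, gamma


def equilibrium_length_prob(lam, mu, n):
    """Geometric equilibrium probability of a sequence of length n."""
    if not (0 < lam < mu):
        raise ValueError("TKF91 requires 0 < lam < mu")
    ratio = lam / mu
    return (1.0 - ratio) * ratio ** n


if __name__ == "__main__":
    # Illustrative rates only; these are not values taken from the paper.
    alpha, beta, gamma = tkf91_probabilities(lam=0.02, mu=0.03, t=1.0)
    print(f"alpha={alpha:.4f} beta={beta:.4f} gamma={gamma:.4f}")
```

Over long branches, `beta` approaches `lam / mu`, which is the parameter of the geometric equilibrium length distribution. An EM procedure such as the one described in the abstract would re-estimate `lam` and `mu` from expected event counts. That step is not shown here.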

DOI: 10.1093/bioinformatics/bti177

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Using evolutionary Expectation Maximization to estimate indel rates</title>
<author wicri:is="90%">
<name sortKey="Holmes, Ian" sort="Holmes, Ian" uniqKey="Holmes I" first="Ian" last="Holmes">Ian Holmes</name>
<affiliation>
<mods:affiliation>Department of Statistics 1 South Parks Road, Oxford OX1 3TG, UK</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F281223AAC22F2B7DE8C59461F5F6B2675557942</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1093/bioinformatics/bti177</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000697</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000697</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main">Using evolutionary Expectation Maximization to estimate indel rates</title>
<author wicri:is="90%">
<name sortKey="Holmes, Ian" sort="Holmes, Ian" uniqKey="Holmes I" first="Ian" last="Holmes">Ian Holmes</name>
<affiliation>
<mods:affiliation>Department of Statistics 1 South Parks Road, Oxford OX1 3TG, UK</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j" type="main">Bioinformatics</title>
<title level="j" type="abbrev">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="e-published">2005</date>
<date type="published">2005</date>
<biblScope unit="vol">21</biblScope>
<biblScope unit="issue">10</biblScope>
<biblScope unit="page" from="2294">2294</biblScope>
<biblScope unit="page" to="2300">2300</biblScope>
</imprint>
<idno type="ISSN">1367-4803</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1367-4803</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process. Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates. Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/ Contact: ihh@berkeley.edu</div>
</front>
</TEI>
<istex>
<corpusName>oup</corpusName>
<keywords>
<teeft>
<json:string>indel</json:string>
<json:string>algorithm</json:string>
<json:string>markov</json:string>
<json:string>coronavirus</json:string>
<json:string>phylogenetic</json:string>
<json:string>indel rate</json:string>
<json:string>markov model</json:string>
<json:string>sars</json:string>
<json:string>pairwise</json:string>
<json:string>statistic</json:string>
<json:string>thorne</json:string>
<json:string>evol</json:string>
<json:string>deletion</json:string>
<json:string>evolutionary model</json:string>
<json:string>matrix</json:string>
<json:string>mutation</json:string>
<json:string>mcmc</json:string>
<json:string>biol</json:string>
<json:string>hein</json:string>
<json:string>insertion</json:string>
<json:string>datum</json:string>
<json:string>substitution</json:string>
<json:string>substitution model</json:string>
<json:string>sequence length</json:string>
<json:string>phylogenetic tree</json:string>
<json:string>stochastic</json:string>
<json:string>multiple alignment</json:string>
<json:string>stochastic grammar</json:string>
<json:string>ancestral sequence</json:string>
<json:string>alignment</json:string>
<json:string>amino</json:string>
<json:string>pairwise amino distance matrix</json:string>
<json:string>amino acid</json:string>
<json:string>estimate indel rate</json:string>
<json:string>mortal link</json:string>
<json:string>peptidase domain</json:string>
<json:string>multiple sequence alignment</json:string>
<json:string>naive estimate</json:string>
<json:string>indel process</json:string>
<json:string>sequence evolution</json:string>
<json:string>initial sequence length</json:string>
<json:string>statistical alignment</json:string>
<json:string>secondary structure</json:string>
<json:string>markov chain</json:string>
<json:string>true indel rate</json:string>
<json:string>indel model</json:string>
<json:string>rate matrix</json:string>
<json:string>probabilistic model</json:string>
<json:string>single residue</json:string>
<json:string>coronavirus glycosurface protein</json:string>
<json:string>deletion rate</json:string>
<json:string>immortal link</json:string>
<json:string>evolutionary</json:string>
<json:string>spike</json:string>
<json:string>sars coronavirus</json:string>
<json:string>ancestral residue</json:string>
<json:string>point substitution model</json:string>
<json:string>state space</json:string>
<json:string>such model</json:string>
<json:string>pairwise alignment</json:string>
<json:string>sequence alignment</json:string>
<json:string>null state</json:string>
<json:string>lagrange multiplier</json:string>
<json:string>substitution rate</json:string>
<json:string>actual value</json:string>
<json:string>descendant sequence length</json:string>
<json:string>likelihood function</json:string>
<json:string>rate parameter</json:string>
<json:string>evolutionary history</json:string>
<json:string>point substitution process</json:string>
<json:string>insertion site</json:string>
<json:string>previous work</json:string>
<json:string>equilibrium distribution</json:string>
<json:string>sequence accession number</json:string>
<json:string>biological datum</json:string>
<json:string>unaligned sequence</json:string>
<json:string>cambridge university press</json:string>
<json:string>experimental datum</json:string>
<json:string>probability parameter</json:string>
<json:string>separate phylogenetic tree</json:string>
<json:string>alignment path</json:string>
<json:string>branch length</json:string>
<json:string>insertion rate</json:string>
<json:string>domain name</json:string>
<json:string>sars coronavirus spike surface glycoprotein</json:string>
<json:string>same protein</json:string>
<json:string>nucleic acid</json:string>
<json:string>biological sequence analysis</json:string>
<json:string>expectation maximization</json:string>
<json:string>maximum likelihood</json:string>
<json:string>evolutionary hmms</json:string>
<json:string>domain</json:string>
</teeft>
</keywords>
<author>
<json:item>
<name>Ian Holmes</name>
<affiliations>
<json:string>Department of Statistics 1 South Parks Road, Oxford OX1 3TG, UK</json:string>
</affiliations>
</json:item>
</author>
<arkIstex>ark:/67375/HXZ-CKLS1M4Q-6</arkIstex>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>other</json:string>
</originalGenre>
<abstract>Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process. Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates. Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/ Contact: ihh@berkeley.edu</abstract>
<qualityIndicators>
<score>8.555</score>
<pdfWordCount>4936</pdfWordCount>
<pdfCharCount>29589</pdfCharCount>
<pdfVersion>1.4</pdfVersion>
<pdfPageCount>7</pdfPageCount>
<pdfPageSize>595 x 841 pts</pdfPageSize>
<pdfWordsPerPage>705</pdfWordsPerPage>
<pdfText>true</pdfText>
<refBibsNative>true</refBibsNative>
<abstractWordCount>246</abstractWordCount>
<abstractCharCount>1765</abstractCharCount>
<keywordCount>0</keywordCount>
</qualityIndicators>
<title>Using evolutionary Expectation Maximization to estimate indel rates</title>
<pmid>
<json:string>15731213</json:string>
</pmid>
<genre>
<json:string>other</json:string>
</genre>
<host>
<title>Bioinformatics</title>
<language>
<json:string>unknown</json:string>
</language>
<issn>
<json:string>1367-4803</json:string>
</issn>
<eissn>
<json:string>1460-2059</json:string>
</eissn>
<publisherId>
<json:string>bioinformatics</json:string>
</publisherId>
<volume>21</volume>
<issue>10</issue>
<pages>
<first>2294</first>
<last>2300</last>
</pages>
<genre>
<json:string>journal</json:string>
</genre>
<subject>
<json:item>
<value>Phylogenetics</value>
</json:item>
</subject>
</host>
<namedEntities>
<unitex>
<date>
<json:string>2005-02-24</json:string>
</date>
<geogName></geogName>
<orgName>
<json:string>Department of Bioengineering, University of California, Berkeley, CA</json:string>
<json:string>Oxford University</json:string>
<json:string>UK Received</json:string>
<json:string>EPSRC</json:string>
</orgName>
<orgName_funder>
<json:string>EPSRC</json:string>
</orgName_funder>
<orgName_provider></orgName_provider>
<persName>
<json:string>Bob Grif</json:string>
<json:string>I.Holmes</json:string>
<json:string>Bing Yap</json:string>
<json:string>Terry Speed</json:string>
</persName>
<placeName></placeName>
<ref_url>
<json:string>http://www.biowiki.org/</json:string>
</ref_url>
<ref_bibl>
<json:string>Thorne et al., 1991</json:string>
<json:string>Miklós et al., 2004</json:string>
<json:string>Durbin et al. (1998)</json:string>
<json:string>Metzler et al., 2001</json:string>
<json:string>Holmes and Rubin, 2002</json:string>
<json:string>Press et al., 1992</json:string>
<json:string>Dempster et al., 1977</json:string>
<json:string>Kellis et al., 2003</json:string>
<json:string>Mitchison and Durbin, 1995</json:string>
<json:string>Holmes and Bruno, 2001</json:string>
<json:string>Bruno et al., 2000</json:string>
<json:string>Holmes, 2004</json:string>
<json:string>Hein et al., 2000</json:string>
<json:string>Karlin and Taylor, 1975</json:string>
<json:string>Holmes, 2003</json:string>
<json:string>Hein, 2001</json:string>
<json:string>Marra et al., 2003</json:string>
<json:string>Thorne et al. (1991)</json:string>
<json:string>Lunter and Hein, 2004</json:string>
<json:string>Thorne et al. (1996)</json:string>
<json:string>Bateman et al., 1999</json:string>
<json:string>Knudsen and Miyamoto, 2003</json:string>
<json:string>Thorne et al., 1992</json:string>
<json:string>Siepel and Haussler, 2004</json:string>
<json:string>Friedman et al., 2001</json:string>
<json:string>Durbin et al., 1998</json:string>
</ref_bibl>
<bibl></bibl>
</unitex>
</namedEntities>
<ark>
<json:string>ark:/67375/HXZ-CKLS1M4Q-6</json:string>
</ark>
<categories>
<wos>
<json:string>1 - science</json:string>
<json:string>2 - mathematical &amp; computational biology</json:string>
<json:string>2 - biotechnology &amp; applied microbiology</json:string>
<json:string>2 - biochemical research methods</json:string>
</wos>
<scienceMetrix>
<json:string>1 - applied sciences</json:string>
<json:string>2 - enabling &amp; strategic technologies</json:string>
<json:string>3 - bioinformatics</json:string>
</scienceMetrix>
<scopus>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Computational Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computational Theory and Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computer Science Applications</json:string>
<json:string>1 - Life Sciences</json:string>
<json:string>2 - Biochemistry, Genetics and Molecular Biology</json:string>
<json:string>3 - Molecular Biology</json:string>
<json:string>1 - Life Sciences</json:string>
<json:string>2 - Biochemistry, Genetics and Molecular Biology</json:string>
<json:string>3 - Biochemistry</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Statistics and Probability</json:string>
</scopus>
<inist>
<json:string>1 - sciences appliquees, technologies et medecines</json:string>
<json:string>2 - sciences biologiques et medicales</json:string>
<json:string>3 - sciences biologiques fondamentales et appliquees. psychologie</json:string>
<json:string>4 - generalites</json:string>
</inist>
</categories>
<publicationDate>2005</publicationDate>
<copyrightDate>2005</copyrightDate>
<doi>
<json:string>10.1093/bioinformatics/bti177</json:string>
</doi>
<id>F281223AAC22F2B7DE8C59461F5F6B2675557942</id>
<score>1</score>
<fulltext>
<json:item>
<extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/fulltext.pdf</uri>
</json:item>
<json:item>
<extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/bundle.zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/fulltext.tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main">Using evolutionary Expectation Maximization to estimate indel rates</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Oxford University Press</publisher>
<availability>
<licence>© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email:
<ref type="email">journals.permissions@oupjournals.org</ref>
</licence>
</availability>
<date type="published">2005</date>
<date type="Copyright" when="2005">2005</date>
</publicationStmt>
<notesStmt>
<note type="content-type" source="other" scheme="https://content-type.data.istex.fr/ark:/67375/XTP-6N5SZHKN-D">article</note>
<note type="publication-type" scheme="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</note>
</notesStmt>
<sourceDesc>
<biblStruct type="other">
<analytic>
<title level="a" type="main">Using evolutionary Expectation Maximization to estimate indel rates</title>
<author xml:id="author-0000">
<persName>
<surname>Holmes</surname>
<forename type="first">Ian</forename>
</persName>
<affiliation role="corresp">*Present address: Department of Bioengineering, University of California, Berkeley, CA 94720-1762, USA</affiliation>
</author>
<idno type="istex">F281223AAC22F2B7DE8C59461F5F6B2675557942</idno>
<idno type="ark">ark:/67375/HXZ-CKLS1M4Q-6</idno>
<idno type="other">bti177</idno>
<idno type="DOI">10.1093/bioinformatics/bti177</idno>
</analytic>
<monogr>
<title level="j" type="main">Bioinformatics</title>
<title level="j" type="abbrev">Bioinformatics</title>
<idno type="hwp">bioinfo</idno>
<idno type="nlm-ta">Bioinformatics</idno>
<idno type="publisher-id">bioinformatics</idno>
<idno type="pISSN">1367-4803</idno>
<idno type="eISSN">1460-2059</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="e-published">2005</date>
<date type="published">2005</date>
<biblScope unit="vol">21</biblScope>
<biblScope unit="issue">10</biblScope>
<biblScope unit="page" from="2294">2294</biblScope>
<biblScope unit="page" to="2300">2300</biblScope>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<encodingDesc>
<schemaRef type="ODD" url="https://xml-schema.delivery.istex.fr/tei-istex.odd"></schemaRef>
<appInfo>
<application ident="pub2tei" version="1.0.41" when="2020-04-06">
<label>pub2TEI-ISTEX</label>
<desc>A set of style sheets for converting XML documents encoded in various scientific publisher formats into a common TEI format.
<ref target="http://www.tei-c.org/">We use TEI</ref>
</desc>
</application>
</appInfo>
</encodingDesc>
<profileDesc>
<abstract xml:lang="en">
<p>
<hi rend="bold">Motivation:</hi>
The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as
<hi rend="italic">Statistical Alignment</hi>
). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process.</p>
<p>
<hi rend="bold">Results:</hi>
We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates.</p>
<p>
<hi rend="bold">Availability:</hi>
Software implementing the algorithm and the benchmark is available under GPL from
<ref type="url">http://www.biowiki.org/</ref>
</p>
<p>
<hi rend="bold">Contact:</hi>
<ref type="email">ihh@berkeley.edu</ref>
</p>
</abstract>
<textClass ana="subject">
<keywords scheme="heading">
<term>ORIGINAL PAPERS</term>
</keywords>
</textClass>
<langUsage>
<language ident="en"></language>
</langUsage>
</profileDesc>
<revisionDesc>
<change when="2020-04-06" who="#istex" xml:id="pub2tei">formatting</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/fulltext.txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus oup, element #text not found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="US-ASCII"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" URI="journalpublishing.dtd" name="istex:docType"></istex:docType>
<istex:document>
<article xml:lang="en" article-type="other">
<front>
<journal-meta>
<journal-id journal-id-type="hwp">bioinfo</journal-id>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-title>Bioinformatics</journal-title>
<abbrev-journal-title abbrev-type="publisher">Bioinformatics</abbrev-journal-title>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1460-2059</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="other">bti177</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/bti177</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>ORIGINAL PAPERS</subject>
<subj-group>
<subject>Phylogenetics</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Using evolutionary Expectation Maximization to estimate indel rates</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holmes</surname>
<given-names>Ian</given-names>
</name>
<xref rid="COR1">*</xref>
</contrib>
<aff>Department of Statistics 1 South Parks Road, Oxford OX1 3TG, UK</aff>
</contrib-group>
<author-notes>
<corresp id="COR1">
<sup>*</sup>
Present address: Department of Bioengineering, University of California, Berkeley, CA 94720-1762, USA</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>2</month>
<year>2005</year>
</pub-date>
<pub-date pub-type="ppub">
<day>15</day>
<month>5</month>
<year>2005</year>
</pub-date>
<volume>21</volume>
<issue>10</issue>
<fpage>2294</fpage>
<lpage>2300</lpage>
<history>
<date date-type="accepted">
<day>22</day>
<month>11</month>
<year>2004</year>
</date>
<date date-type="received">
<day>20</day>
<month>7</month>
<year>2004</year>
</date>
<date date-type="rev-recd">
<day>19</day>
<month>11</month>
<year>2004</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email:
<ext-link xlink:href="journals.permissions@oupjournals.org" ext-link-type="email">journals.permissions@oupjournals.org</ext-link>
</copyright-statement>
<copyright-year>2005</copyright-year>
</permissions>
<abstract xml:lang="en">
<p>
<bold>Motivation:</bold>
The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as
<italic>Statistical Alignment</italic>
). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process.</p>
<p>
<bold>Results:</bold>
We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates.</p>
<p>
<bold>Availability:</bold>
Software implementing the algorithm and the benchmark is available under GPL from
<ext-link xlink:href="http://www.biowiki.org/" ext-link-type="url">http://www.biowiki.org/</ext-link>
</p>
<p>
<bold>Contact:</bold>
<ext-link xlink:href="ihh@berkeley.edu" ext-link-type="email">ihh@berkeley.edu</ext-link>
</p>
</abstract>
<custom-meta-wrap>
<custom-meta>
<meta-name>hwp-legacy-fpage</meta-name>
<meta-value>2294</meta-value>
</custom-meta>
<custom-meta>
<meta-name>cover-date</meta-name>
<meta-value></meta-value>
</custom-meta>
<custom-meta>
<meta-name>hwp-legacy-dochead</meta-name>
<meta-value>research-article</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Using evolutionary Expectation Maximization to estimate indel rates</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>Using evolutionary Expectation Maximization to estimate indel rates</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ian</namePart>
<namePart type="family">Holmes</namePart>
<affiliation>Department of Statistics 1 South Parks Road, Oxford OX1 3TG, UK</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="other" displayLabel="other" authority="ISTEX" authorityURI="https://content-type.data.istex.fr" valueURI="https://content-type.data.istex.fr/ark:/67375/XTP-7474895G-0">other</genre>
<originInfo>
<publisher>Oxford University Press</publisher>
<dateIssued encoding="w3cdtf">2005-05-15</dateIssued>
<dateCreated encoding="w3cdtf">2004-11-22</dateCreated>
<copyrightDate encoding="w3cdtf">2005</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<abstract lang="en">Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process. Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogenous rates. Availability: Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/ Contact: ihh@berkeley.edu</abstract>
<note type="author-notes">*Present address: Department of Bioengineering, University of California, Berkeley, CA 94720-1762, USA</note>
<relatedItem type="host">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Bioinformatics</title>
</titleInfo>
<genre type="journal" authority="ISTEX" authorityURI="https://publication-type.data.istex.fr" valueURI="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</genre>
<subject>
<topic>Phylogenetics</topic>
</subject>
<identifier type="ISSN">1367-4803</identifier>
<identifier type="eISSN">1460-2059</identifier>
<identifier type="PublisherID">bioinformatics</identifier>
<identifier type="PublisherID-hwp">bioinfo</identifier>
<identifier type="PublisherID-nlm-ta">Bioinformatics</identifier>
<part>
<date>2005</date>
<detail type="volume">
<caption>vol.</caption>
<number>21</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>10</number>
</detail>
<extent unit="pages">
<start>2294</start>
<end>2300</end>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B1">
<titleInfo>
<title>Nucleic Acids Res.</title>
</titleInfo>
<note>Bateman, A., et al. (1999) Pfam 3.1: 1313 multiple alignments match the majority of proteins. Nucleic Acids Res., 27, 260–262.</note>
<part>
<date>1999</date>
<detail type="volume">
<caption>vol.</caption>
<number>27</number>
</detail>
<extent unit="pages">
<start>260</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B2">
<titleInfo>
<title>Mol. Biol. Evol.</title>
</titleInfo>
<note>Bruno, W.J., et al. (2000) Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol., 17, 189–197.</note>
<part>
<date>2000</date>
<detail type="volume">
<caption>vol.</caption>
<number>17</number>
</detail>
<extent unit="pages">
<start>189</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B3">
<titleInfo>
<title>J. R. Stat. Soc. B</title>
</titleInfo>
<note>Dempster, A.P., et al. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B, 39, 1–38.</note>
<part>
<date>1977</date>
<detail type="volume">
<caption>vol.</caption>
<number>39</number>
</detail>
<extent unit="pages">
<start>1</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B4">
<titleInfo>
<title>Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids</title>
</titleInfo>
<note>Durbin, R., Eddy, S., Krogh, A., Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK.</note>
<part>
<date>1998</date>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B5">
<titleInfo>
<title>Inferring Phylogenies</title>
</titleInfo>
<note>Felsenstein, J. (2003) Inferring Phylogenies. Sinauer Associates, Inc. ISBN 0878931775.</note>
<part>
<date>2003</date>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B6">
<titleInfo>
<title>Proceedings of the Fifth Annual International Conference on Computational Biology</title>
</titleInfo>
<note>Friedman, N., Ninio, M., Pe'er, I., Pupko, T. (2001) A structural EM algorithm for phylogenetic inference. In Proceedings of the Fifth Annual International Conference on Computational Biology. Association for Computing Machinery, New York.</note>
<part>
<date>2001</date>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B7">
<titleInfo>
<title>Pacific Symposium on Biocomputing</title>
</titleInfo>
<note>Hein, J. (2001) An algorithm for statistical alignment of sequences related by a binary tree. In Altman, R.B., Dunker, A.K., Hunter, L., Lauderdale, K., Klein, T.E. (eds), Pacific Symposium on Biocomputing. World Scientific, Singapore, pp. 179–190.</note>
<part>
<date>2001</date>
<extent unit="pages">
<start>179</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B8">
<titleInfo>
<title>J. Mol. Biol.</title>
</titleInfo>
<note>Hein, J., et al. (2000) Statistical alignment: computational properties, homology testing and goodness-of-fit. J. Mol. Biol., 302, 265–279.</note>
<part>
<date>2000</date>
<detail type="volume">
<caption>vol.</caption>
<number>302</number>
</detail>
<extent unit="pages">
<start>265</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B9">
<titleInfo>
<title>Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology</title>
</titleInfo>
<note>Holmes, I. (2003) Using guide trees to construct multiple-sequence evolutionary HMMs. In Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 147–157.</note>
<part>
<date>2003</date>
<extent unit="pages">
<start>147</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B10">
<titleInfo>
<title>BMC Bioinformatics</title>
</titleInfo>
<note>Holmes, I. (2004) A probabilistic model for the evolution of RNA structure. BMC Bioinformatics, 5.</note>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>5</number>
</detail>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B11">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<note>Holmes, I. and Bruno, W.J. (2001) Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics, 17, 803–820.</note>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>17</number>
</detail>
<extent unit="pages">
<start>803</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B12">
<titleInfo>
<title>J. Mol. Biol.</title>
</titleInfo>
<note>Holmes, I. and Rubin, G.M. (2002) An Expectation Maximization algorithm for training hidden substitution models. J. Mol. Biol., 317, 757–768.</note>
<part>
<date>2002</date>
<detail type="volume">
<caption>vol.</caption>
<number>317</number>
</detail>
<extent unit="pages">
<start>757</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B13">
<titleInfo>
<title>A First Course in Stochastic Processes</title>
</titleInfo>
<note>Karlin, S. and Taylor, H. (1975) A First Course in Stochastic Processes. Academic Press, San Diego, CA.</note>
<part>
<date>1975</date>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B14">
<titleInfo>
<title>Nature</title>
</titleInfo>
<note>Kellis, M., et al. (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423, 241–254.</note>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>423</number>
</detail>
<extent unit="pages">
<start>241</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B15">
<titleInfo>
<title>J. Mol. Biol.</title>
</titleInfo>
<note>Knudsen, B. and Miyamoto, M. (2003) Sequence alignments and pair hidden Markov models using evolutionary history. J. Mol. Biol., 333, 453–460.</note>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>333</number>
</detail>
<extent unit="pages">
<start>453</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B16">
<titleInfo>
<title>Bioinformatics</title>
</titleInfo>
<note>Lunter, G.A. and Hein, J. (2004) A nucleotide substitution model with nearest-neighbour interactions. Bioinformatics, 20 (Suppl. 1), I216–I223.</note>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>20</number>
</detail>
<extent unit="pages">
<start>I216</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B17">
<titleInfo>
<title>Science</title>
</titleInfo>
<note>Marra, M.A., et al. (2003) The genome sequence of the SARS-associated coronavirus. Science, 300, 1399–1404.</note>
<part>
<date>2003</date>
<detail type="volume">
<caption>vol.</caption>
<number>300</number>
</detail>
<extent unit="pages">
<start>1399</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B18">
<titleInfo>
<title>J. Mol. Evol.</title>
</titleInfo>
<note>Metzler, D., et al. (2001) Assessing variability by joint sampling of alignments and mutation rates. J. Mol. Evol., 53, 660–669.</note>
<part>
<date>2001</date>
<detail type="volume">
<caption>vol.</caption>
<number>53</number>
</detail>
<extent unit="pages">
<start>660</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B19">
<titleInfo>
<title>Mol. Biol. Evol.</title>
</titleInfo>
<note>Miklós, I., et al. (2004) A long indel model for evolutionary sequence alignment. Mol. Biol. Evol., 21, 529–540.</note>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>21</number>
</detail>
<extent unit="pages">
<start>529</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B20">
<titleInfo>
<title>J. Mol. Evol.</title>
</titleInfo>
<note>Mitchison, G.J. and Durbin, R. (1995) Tree-based maximal likelihood substitution matrices and hidden Markov models. J. Mol. Evol., 41, 1139–1151.</note>
<part>
<date>1995</date>
<detail type="volume">
<caption>vol.</caption>
<number>41</number>
</detail>
<extent unit="pages">
<start>1139</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B21">
<titleInfo>
<title>Numerical Recipes in C</title>
</titleInfo>
<note>Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P. (1992) Numerical Recipes in C. Cambridge University Press, Cambridge, UK.</note>
<part>
<date>1992</date>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B22">
<titleInfo>
<title>Mol. Biol. Evol.</title>
</titleInfo>
<note>Siepel, A. and Haussler, D. (2004) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol. Biol. Evol., 21, 468–488.</note>
<part>
<date>2004</date>
<detail type="volume">
<caption>vol.</caption>
<number>21</number>
</detail>
<extent unit="pages">
<start>468</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B23">
<titleInfo>
<title>J. Mol. Evol.</title>
</titleInfo>
<note>Thorne, J.L., et al. (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol., 33, 114–124.</note>
<part>
<date>1991</date>
<detail type="volume">
<caption>vol.</caption>
<number>33</number>
</detail>
<extent unit="pages">
<start>114</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B24">
<titleInfo>
<title>J. Mol. Evol.</title>
</titleInfo>
<note>Thorne, J.L., et al. (1992) Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol., 34, 3–16.</note>
<part>
<date>1992</date>
<detail type="volume">
<caption>vol.</caption>
<number>34</number>
</detail>
<extent unit="pages">
<start>3</start>
</extent>
</part>
</relatedItem>
<relatedItem type="references" displayLabel="B25">
<titleInfo>
<title>Mol. Biol. Evol.</title>
</titleInfo>
<note>Thorne, J.L., et al. (1996) Combining protein evolution and secondary structure. Mol. Biol. Evol., 13, 666–673.</note>
<part>
<date>1996</date>
<detail type="volume">
<caption>vol.</caption>
<number>13</number>
</detail>
<extent unit="pages">
<start>666</start>
</extent>
</part>
</relatedItem>
<identifier type="istex">F281223AAC22F2B7DE8C59461F5F6B2675557942</identifier>
<identifier type="ark">ark:/67375/HXZ-CKLS1M4Q-6</identifier>
<identifier type="DOI">10.1093/bioinformatics/bti177</identifier>
<identifier type="local">bti177</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org</accessCondition>
<recordInfo>
<recordContentSource authority="ISTEX" authorityURI="https://loaded-corpus.data.istex.fr" valueURI="https://loaded-corpus.data.istex.fr/ark:/67375/XBH-GTWS0RDP-M">oup</recordContentSource>
<recordOrigin>Converted from (version 1.2.10) to MODS version 3.6.</recordOrigin>
<recordCreationDate encoding="w3cdtf">2020-04-16</recordCreationDate>
</recordInfo>
</mods>
<json:item>
<extension>json</extension>
<original>false</original>
<mimetype>application/json</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/record.json</uri>
</json:item>
</metadata>
<annexes>
<json:item>
<extension>gif</extension>
<original>true</original>
<mimetype>image/gif</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/annexes.gif</uri>
</json:item>
<json:item>
<extension>jpeg</extension>
<original>true</original>
<mimetype>image/jpeg</mimetype>
<uri>https://api.istex.fr/ark:/67375/HXZ-CKLS1M4Q-6/annexes.jpeg</uri>
</json:item>
</annexes>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000697 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000697 | SxmlIndent | more
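
Where Dilib is not installed, the record can also be inspected with a general-purpose XML parser. The following is a minimal sketch in Python, not part of Dilib. It assumes the MODS elements are un-namespaced, as they appear in the record above. The `SAMPLE` string and the `list_references` helper are hypothetical names introduced for illustration; the sample reuses the host entry and the B23 reference entry from the record.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the host entry and one reference entry (B23) taken
# from the record above, wrapped in a <mods> root so it parses on its own.
SAMPLE = """<mods>
<relatedItem type="host">
<titleInfo><title>Bioinformatics</title></titleInfo>
</relatedItem>
<relatedItem type="references" displayLabel="B23">
<titleInfo><title>J. Mol. Evol.</title></titleInfo>
<part>
<date>1991</date>
<detail type="volume"><caption>vol.</caption><number>33</number></detail>
<extent unit="pages"><start>114</start></extent>
</part>
</relatedItem>
</mods>"""


def list_references(xml_text):
    """Return one dict per <relatedItem type="references"> element."""
    root = ET.fromstring(xml_text)
    refs = []
    for item in root.iter("relatedItem"):
        if item.get("type") != "references":
            continue  # skip the host journal entry
        refs.append({
            "label": item.get("displayLabel"),
            "title": item.findtext("titleInfo/title"),
            "year": item.findtext("part/date"),
            # findtext returns None when the element is absent (e.g. books)
            "volume": item.findtext("part/detail[@type='volume']/number"),
            "first_page": item.findtext("part/extent/start"),
        })
    return refs


if __name__ == "__main__":
    for ref in list_references(SAMPLE):
        print(ref)
```

To run this against the full record, one would first save the output of the `HfdSelect` command to a file and pass its contents to `list_references`. If the saved record declares XML namespaces, the element paths would need the corresponding prefixes.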

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:F281223AAC22F2B7DE8C59461F5F6B2675557942
   |texte=   Using evolutionary Expectation Maximization to estimate indel rates
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021