MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.

Internal identifier: 001D64 (PubMed/Curation); previous: 001D63; next: 001D65

Authors: Pandurang Kolekar [India]; Mohan Kale; Urmila Kulkarni-Kale

Source:

RBID: pubmed:22820020

French descriptors

English descriptors

Abstract

The data deluge in the post-genomic era demands the development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and the uncertainties associated with alignments necessitate the development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both alignment-dependent and alignment-free methods. Various alignment-free distance measures based on oligo-nucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for the relative order of components, viz. nucleotides or amino acids. A new distance measure based on the concept of the 'return time distribution' (RTD) of k-mers is proposed, which accounts for both the sequence composition and the relative order of its components. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross-validation methods. The proposed method was successfully applied for MPA of the family Flaviviridae and subtyping of Dengue viruses. It is observed that the method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy.

DOI: 10.1016/j.ympev.2012.07.003
PubMed: 22820020
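
The abstract describes the return time distribution (RTD) idea only in outline, so the following is a minimal Python sketch under stated assumptions rather than the authors' implementation. It assumes that a return time is the difference between the start positions of two successive occurrences of the same k-mer, that the "statistical parameters" are the mean and standard deviation of each k-mer's return times, and that the distance is Euclidean. The function names `return_times`, `rtd_vector` and `rtd_distance` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def return_times(seq, k):
    """Map each k-mer to the gaps between its successive occurrences.

    Assumption: a gap is the difference between start positions.
    """
    last_seen = {}
    gaps = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gaps[kmer].append(i - last_seen[kmer])
        last_seen[kmer] = i
    return gaps


def rtd_vector(seq, k, alphabet="ACGT"):
    """Feature vector holding (mean, standard deviation) per possible k-mer.

    k-mers seen fewer than twice contribute (0.0, 0.0); this is an
    illustrative choice, not something stated in the abstract.
    """
    gaps = return_times(seq, k)
    vec = []
    for letters in product(alphabet, repeat=k):
        g = gaps.get("".join(letters), [])
        vec.extend((mean(g), pstdev(g)) if g else (0.0, 0.0))
    return vec


def rtd_distance(seq_a, seq_b, k=2):
    """Euclidean distance between two RTD feature vectors (assumed form)."""
    va, vb = rtd_vector(seq_a, k), rtd_vector(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Computing `rtd_distance` for every pair of sequences would give a distance matrix of the kind the abstract feeds into neighbor-joining. Because the gaps depend on where each k-mer recurs, two sequences with the same k-mer counts but a different arrangement can still end up at a non-zero distance, which is the property the abstract contrasts with purely frequency-based measures.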


Links to Exploration step

pubmed:22820020

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.</title>
<author>
<name sortKey="Kolekar, Pandurang" sort="Kolekar, Pandurang" uniqKey="Kolekar P" first="Pandurang" last="Kolekar">Pandurang Kolekar</name>
<affiliation wicri:level="1">
<nlm:affiliation>Bioinformatics Centre, University of Pune, Pune 411 007, India. pandurang@bioinfo.net.in</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Bioinformatics Centre, University of Pune, Pune 411 007</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kale, Mohan" sort="Kale, Mohan" uniqKey="Kale M" first="Mohan" last="Kale">Mohan Kale</name>
</author>
<author>
<name sortKey="Kulkarni Kale, Urmila" sort="Kulkarni Kale, Urmila" uniqKey="Kulkarni Kale U" first="Urmila" last="Kulkarni-Kale">Urmila Kulkarni-Kale</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="RBID">pubmed:22820020</idno>
<idno type="pmid">22820020</idno>
<idno type="doi">10.1016/j.ympev.2012.07.003</idno>
<idno type="wicri:Area/PubMed/Corpus">001D64</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001D64</idno>
<idno type="wicri:Area/PubMed/Curation">001D64</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001D64</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.</title>
<author>
<name sortKey="Kolekar, Pandurang" sort="Kolekar, Pandurang" uniqKey="Kolekar P" first="Pandurang" last="Kolekar">Pandurang Kolekar</name>
<affiliation wicri:level="1">
<nlm:affiliation>Bioinformatics Centre, University of Pune, Pune 411 007, India. pandurang@bioinfo.net.in</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Bioinformatics Centre, University of Pune, Pune 411 007</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Kale, Mohan" sort="Kale, Mohan" uniqKey="Kale M" first="Mohan" last="Kale">Mohan Kale</name>
</author>
<author>
<name sortKey="Kulkarni Kale, Urmila" sort="Kulkarni Kale, Urmila" uniqKey="Kulkarni Kale U" first="Urmila" last="Kulkarni-Kale">Urmila Kulkarni-Kale</name>
</author>
</analytic>
<series>
<title level="j">Molecular phylogenetics and evolution</title>
<idno type="eISSN">1095-9513</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Cluster Analysis</term>
<term>Computational Biology</term>
<term>Data Mining</term>
<term>Dengue Virus (classification)</term>
<term>Flaviviridae (classification)</term>
<term>Genome, Viral</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence ()</term>
<term>Biologie informatique</term>
<term>Flaviviridae ()</term>
<term>Fouille de données</term>
<term>Génome viral</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Virus de la dengue ()</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>Dengue Virus</term>
<term>Flaviviridae</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Analysis</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Cluster Analysis</term>
<term>Computational Biology</term>
<term>Data Mining</term>
<term>Genome, Viral</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence</term>
<term>Biologie informatique</term>
<term>Flaviviridae</term>
<term>Fouille de données</term>
<term>Génome viral</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Virus de la dengue</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The data deluge in post-genomic era demands development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and uncertainties associated with alignments, necessitate development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both, alignment-dependant and alignment-free methods. Various alignment-free distance measures based on oligo-nucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for relative order of components viz. nucleotides or amino acids. A new distance measure, based on the concept of 'return time distribution' (RTD) of k-mers is proposed, which accounts for the sequence composition and their relative orders. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using Neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross validation methods. The proposed method was successfully applied for MPA of family Flaviviridae and subtyping of Dengue viruses. It is observed that method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">22820020</PMID>
<DateCompleted>
<Year>2012</Year>
<Month>12</Month>
<Day>21</Day>
</DateCompleted>
<DateRevised>
<Year>2012</Year>
<Month>09</Month>
<Day>17</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1095-9513</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>65</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2012</Year>
<Month>Nov</Month>
</PubDate>
</JournalIssue>
<Title>Molecular phylogenetics and evolution</Title>
<ISOAbbreviation>Mol. Phylogenet. Evol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.</ArticleTitle>
<Pagination>
<MedlinePgn>510-22</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.ympev.2012.07.003</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">S1055-7903(12)00260-6</ELocationID>
<Abstract>
<AbstractText>The data deluge in post-genomic era demands development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and uncertainties associated with alignments, necessitate development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both, alignment-dependant and alignment-free methods. Various alignment-free distance measures based on oligo-nucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for relative order of components viz. nucleotides or amino acids. A new distance measure, based on the concept of 'return time distribution' (RTD) of k-mers is proposed, which accounts for the sequence composition and their relative orders. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using Neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross validation methods. The proposed method was successfully applied for MPA of family Flaviviridae and subtyping of Dengue viruses. It is observed that method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy.</AbstractText>
<CopyrightInformation>Copyright © 2012 Elsevier Inc. All rights reserved.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Kolekar</LastName>
<ForeName>Pandurang</ForeName>
<Initials>P</Initials>
<AffiliationInfo>
<Affiliation>Bioinformatics Centre, University of Pune, Pune 411 007, India. pandurang@bioinfo.net.in</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Kale</LastName>
<ForeName>Mohan</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Kulkarni-Kale</LastName>
<ForeName>Urmila</ForeName>
<Initials>U</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2012</Year>
<Month>07</Month>
<Day>20</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Mol Phylogenet Evol</MedlineTA>
<NlmUniqueID>9304400</NlmUniqueID>
<ISSNLinking>1055-7903</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D057225" MajorTopicYN="N">Data Mining</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003716" MajorTopicYN="N">Dengue Virus</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D018067" MajorTopicYN="N">Flaviviridae</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016679" MajorTopicYN="N">Genome, Viral</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010363" MajorTopicYN="N">Pattern Recognition, Automated</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="Y">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017421" MajorTopicYN="N">Sequence Analysis</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2012</Year>
<Month>01</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2012</Year>
<Month>07</Month>
<Day>08</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2012</Year>
<Month>7</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2012</Year>
<Month>7</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2012</Year>
<Month>12</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">22820020</ArticleId>
<ArticleId IdType="pii">S1055-7903(12)00260-6</ArticleId>
<ArticleId IdType="doi">10.1016/j.ympev.2012.07.003</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D64 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001D64 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:22820020
   |texte=   Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:22820020" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021