MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

Internal identifier: 001307 (Main/Exploration); previous: 001306; next: 001308

Authors: Troy Hernandez [People's Republic of China]; Jie Yang

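Source: Journal of computational biology: a journal of computational molecular cell biology [eISSN 1557-8666]; 2016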

RBID : pubmed:27409298

Abstract

The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
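
To make the idea in the abstract concrete, here is a minimal sketch of an alignment-free vectorization that combines k-mer counts with descriptive statistics of k-mer positions, followed by a nearest-neighbor lookup. It is an illustration under stated assumptions, not the authors' generalized vector: the choice of statistics (count, mean position, position variance), the unnormalized Euclidean distance, and the function names `kmer_position_vector` and `nearest_neighbor_label` are all hypothetical, and the authors' own implementation is in R.

```python
from collections import defaultdict
from itertools import product
import math


def kmer_position_vector(seq, k=2, alphabet="ACGT"):
    """Return [count, mean position, position variance] for every k-mer.

    k-mers are enumerated in lexicographic order over `alphabet`, so the
    vector has 3 * len(alphabet)**k entries. Windows containing symbols
    outside the alphabet (e.g. N) are ignored. Assumed statistics only;
    this is not the definition used in the paper.
    """
    positions = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position

    vector = []
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        n = len(pos)
        mean = sum(pos) / n if n else 0.0
        var = sum((p - mean) ** 2 for p in pos) / n if n else 0.0
        vector.extend([float(n), mean, var])
    return vector


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_label(query, labeled, k=2):
    """1-nearest-neighbor: label of the reference closest to `query`.

    `labeled` is a list of (sequence, label) pairs.
    """
    qv = kmer_position_vector(query, k)
    closest = min(
        labeled,
        key=lambda item: euclidean(qv, kmer_position_vector(item[0], k)),
    )
    return closest[1]


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for illustration.
    references = [("AAAAAAAA", "family_a"), ("GCGCGCGC", "family_b")]
    print(nearest_neighbor_label("AAAAAAAT", references))
```

Because positions are not rescaled by genome length here, the positional terms dominate the distance for long sequences; a practical version would presumably normalize them before comparing genomes of different sizes.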

DOI: 10.1089/cmb.2013.0132
PubMed: 27409298


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.</title>
<author>
<name sortKey="Hernandez, Troy" sort="Hernandez, Troy" uniqKey="Hernandez T" first="Troy" last="Hernandez">Troy Hernandez</name>
<affiliation wicri:level="3">
<nlm:affiliation>1 Mathematical Sciences Center, Tsinghua University , Beijing, China .</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>1 Mathematical Sciences Center, Tsinghua University , Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Yang, Jie" sort="Yang, Jie" uniqKey="Yang J" first="Jie" last="Yang">Jie Yang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:27409298</idno>
<idno type="pmid">27409298</idno>
<idno type="doi">10.1089/cmb.2013.0132</idno>
<idno type="wicri:Area/PubMed/Corpus">001048</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001048</idno>
<idno type="wicri:Area/PubMed/Curation">001048</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001048</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001155</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001155</idno>
<idno type="wicri:Area/Ncbi/Merge">001698</idno>
<idno type="wicri:Area/Ncbi/Curation">001698</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001698</idno>
<idno type="wicri:Area/Main/Merge">001311</idno>
<idno type="wicri:Area/Main/Curation">001307</idno>
<idno type="wicri:Area/Main/Exploration">001307</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.</title>
<author>
<name sortKey="Hernandez, Troy" sort="Hernandez, Troy" uniqKey="Hernandez T" first="Troy" last="Hernandez">Troy Hernandez</name>
<affiliation wicri:level="3">
<nlm:affiliation>1 Mathematical Sciences Center, Tsinghua University , Beijing, China .</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>1 Mathematical Sciences Center, Tsinghua University , Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Yang, Jie" sort="Yang, Jie" uniqKey="Yang J" first="Jie" last="Yang">Jie Yang</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Genome, Viral</term>
<term>Phylogeny</term>
<term>Sequence Alignment (methods)</term>
<term>Software</term>
<term>Viruses (classification)</term>
<term>Viruses (genetics)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences (méthodes)</term>
<term>Biologie informatique (méthodes)</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Phylogénie</term>
<term>Virus (classification)</term>
<term>Virus (génétique)</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>Viruses</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Viruses</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Viral</term>
<term>Phylogeny</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Biologie informatique</term>
<term>Génome viral</term>
<term>Logiciel</term>
<term>Phylogénie</term>
<term>Virus</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code is available upon request. </div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
<settlement>
<li>Pékin</li>
</settlement>
</list>
<tree>
<noCountry>
<name sortKey="Yang, Jie" sort="Yang, Jie" uniqKey="Yang J" first="Jie" last="Yang">Jie Yang</name>
</noCountry>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Hernandez, Troy" sort="Hernandez, Troy" uniqKey="Hernandez T" first="Troy" last="Hernandez">Troy Hernandez</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001307 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001307 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:27409298
   |texte=   Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:27409298" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021