SARS exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Genomic classification using an information-based similarity index: application to the SARS coronavirus.

Internal identifier: 004B14 (Main/Merge); previous: 004B13; next: 004B15

Authors: Albert C-C Yang [United States]; Ary L. Goldberger; C-K Peng

Source: Journal of computational biology: a journal of computational molecular cell biology, 2005.

RBID: pubmed:16241900

Abstract

Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information-based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.
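The abstract describes the index only in outline. The Python sketch below is a hypothetical illustration of a rank-order, entropy-weighted distance of the kind the abstract refers to. It counts overlapping words of length `m` in each sequence and ranks them by frequency. It then sums the absolute rank differences, weighting each word by its Shannon entropy contribution in the two sequences. The function names, the default word length, the base-2 logarithm, the alphabetical tie-breaking and the normalization by `4**m - 1` are assumptions of this sketch. They are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def word_counts(seq: str, m: int) -> Counter:
    """Count overlapping words (m-tuples) in a nucleotide sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + m] for i in range(len(seq) - m + 1))


def rank_words(counts: Counter, words: list[str]) -> dict[str, int]:
    """Rank words by decreasing frequency (rank 1 = most frequent).

    Ties are broken alphabetically, an assumption of this sketch.
    """
    order = sorted(words, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(order, start=1)}


def ibs_distance(s1: str, s2: str, m: int = 3) -> float:
    """Hypothetical entropy-weighted rank-order distance between two DNA sequences.

    Returns a value in [0, 1]; 0 means identical word rankings.
    """
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    c1, c2 = word_counts(s1, m), word_counts(s2, m)
    n1 = sum(c1[w] for w in words)  # only words over ACGT are counted
    n2 = sum(c2[w] for w in words)
    if n1 == 0 or n2 == 0:
        raise ValueError("each sequence needs at least one ACGT word of length m")

    r1, r2 = rank_words(c1, words), rank_words(c2, words)

    def h(p: float) -> float:
        # Shannon entropy term -p*log2(p), with 0*log(0) taken as 0
        return -p * math.log2(p) if p > 0 else 0.0

    # Weight each word by its combined entropy contribution in both sequences
    weights = {w: h(c1[w] / n1) + h(c2[w] / n2) for w in words}
    z = sum(weights.values())
    if z == 0:
        raise ValueError("degenerate word distributions (zero entropy)")

    total = sum(abs(r1[w] - r2[w]) * weights[w] for w in words)
    # Weights sum to 1 after dividing by z, and |rank difference| <= 4**m - 1
    return total / (z * (len(words) - 1))


print(ibs_distance("ACGTACGTAC", "ACGTACGTAC", m=2))  # 0.0
```

Identical sequences give identical rankings and therefore a distance of `0.0`. In this sketch, non-ACGT symbols such as `N` drop out of the word counts rather than forming extra words.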

DOI: 10.1089/cmb.2005.12.1103
PubMed: 16241900

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author>
<name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16241900</idno>
<idno type="pmid">16241900</idno>
<idno type="doi">10.1089/cmb.2005.12.1103</idno>
<idno type="wicri:Area/PubMed/Corpus">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002486</idno>
<idno type="wicri:Area/PubMed/Curation">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002486</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002695</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002695</idno>
<idno type="wicri:Area/Ncbi/Merge">001220</idno>
<idno type="wicri:Area/Ncbi/Curation">001220</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001220</idno>
<idno type="wicri:doubleKey">1066-5277:2005:Yang A:genomic:classification:using</idno>
<idno type="wicri:Area/Main/Merge">004B14</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author>
<name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="ISSN">1066-5277</idno>
<imprint>
<date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>DNA, Mitochondrial</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Influenza A virus (genetics)</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>SARS Virus (classification)</term>
<term>SARS Virus (genetics)</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus de la grippe A (génétique)</term>
<term>Virus du SRAS (classification)</term>
<term>Virus du SRAS (génétique)</term>
<term>Évolution moléculaire</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en">
<term>DNA, Mitochondrial</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Influenza A virus</term>
<term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Virus de la grippe A</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus du SRAS</term>
<term>Évolution moléculaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information-based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.</div>
</front>
</TEI>
</record>

To process this document on Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 004B14 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 004B14 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:16241900
   |texte=   Genomic classification using an information-based similarity index: application to the SARS coronavirus.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Merge/RBID.i   -Sk "pubmed:16241900" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021