SARS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Genomic classification using an information-based similarity index: application to the SARS coronavirus.

Internal identifier: 001220 (Ncbi/Merge); previous: 001219; next: 001221

Authors: Albert C-C Yang [United States]; Ary L. Goldberger; C-K Peng

Source: Journal of Computational Biology, vol. 12, no. 8 (October 2005), pp. 1103-16

RBID: pubmed:16241900

Abstract

Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information-based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.

DOI: 10.1089/cmb.2005.12.1103
PubMed: 16241900
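
The abstract names the ingredients of the index (word rank order-frequency statistics plus information theory) but does not give the formula. The Python sketch below shows one plausible form of such an index, not the authors' exact method: sequences are cut into overlapping n-letter words, the shared words are ranked by frequency in each sequence, and the rank differences are averaged with Shannon-entropy weights. The function names, the word length `n=3`, the restriction to shared words and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def word_counts(seq, n):
    """Count overlapping n-letter words in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ranks(counts):
    """Map each word to its rank, 1 being the most frequent.

    Ties are broken alphabetically so the result is deterministic
    (an assumption of this sketch, not something stated in the abstract).
    """
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def similarity_distance(seq1, seq2, n=3):
    """Entropy-weighted rank distance between two sequences, in [0, 1].

    0 means the shared words have identical rank order in both sequences.
    """
    c1, c2 = word_counts(seq1, n), word_counts(seq2, n)
    shared = sorted(set(c1) & set(c2))
    if len(shared) < 2:
        raise ValueError("need at least two shared words to compare ranks")

    total1, total2 = sum(c1.values()), sum(c2.values())
    r1 = ranks({w: c1[w] for w in shared})
    r2 = ranks({w: c2[w] for w in shared})

    def entropy_term(p):
        # Contribution of one word to the Shannon entropy.
        return -p * log(p)

    # Weight each word by its entropy contribution in both sequences.
    weights = {
        w: entropy_term(c1[w] / total1) + entropy_term(c2[w] / total2)
        for w in shared
    }
    z = sum(weights.values())  # normalisation so the weights sum to 1

    weighted = sum(abs(r1[w] - r2[w]) * weights[w] for w in shared) / z
    # The largest possible rank difference is len(shared) - 1.
    return weighted / (len(shared) - 1)
```

Calling `similarity_distance(a, b)` on two nucleotide strings returns a value between 0 and 1; a matrix of such pairwise values could then be fed to any standard clustering or tree-building routine.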

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author>
<name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16241900</idno>
<idno type="pmid">16241900</idno>
<idno type="doi">10.1089/cmb.2005.12.1103</idno>
<idno type="wicri:Area/PubMed/Corpus">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002486</idno>
<idno type="wicri:Area/PubMed/Curation">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002486</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002695</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002695</idno>
<idno type="wicri:Area/Ncbi/Merge">001220</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author>
<name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="ISSN">1066-5277</idno>
<imprint>
<date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>DNA, Mitochondrial</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Influenza A virus (genetics)</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>SARS Virus (classification)</term>
<term>SARS Virus (genetics)</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus de la grippe A (génétique)</term>
<term>Virus du SRAS ()</term>
<term>Virus du SRAS (génétique)</term>
<term>Évolution moléculaire</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en">
<term>DNA, Mitochondrial</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Influenza A virus</term>
<term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Virus de la grippe A</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus du SRAS</term>
<term>Évolution moléculaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">16241900</PMID>
<DateCompleted>
<Year>2005</Year>
<Month>12</Month>
<Day>29</Day>
</DateCompleted>
<DateRevised>
<Year>2007</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">1066-5277</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>12</Volume>
<Issue>8</Issue>
<PubDate>
<Year>2005</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Journal of computational biology : a journal of computational molecular cell biology</Title>
<ISOAbbreviation>J. Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Genomic classification using an information-based similarity index: application to the SARS coronavirus.</ArticleTitle>
<Pagination>
<MedlinePgn>1103-16</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Yang</LastName>
<ForeName>Albert C-C</ForeName>
<Initials>AC</Initials>
<AffiliationInfo>
<Affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Goldberger</LastName>
<ForeName>Ary L</ForeName>
<Initials>AL</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Peng</LastName>
<ForeName>C-K</ForeName>
<Initials>CK</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>P41-RR13622</GrantID>
<Acronym>RR</Acronym>
<Agency>NCRR NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>P60-AG08812</GrantID>
<Acronym>AG</Acronym>
<Agency>NIA NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Comput Biol</MedlineTA>
<NlmUniqueID>9433358</NlmUniqueID>
<ISSNLinking>1066-5277</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D004272">DNA, Mitochondrial</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004272" MajorTopicYN="N">DNA, Mitochondrial</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030561" MajorTopicYN="N">Databases, Nucleic Acid</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019143" MajorTopicYN="N">Evolution, Molecular</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D009980" MajorTopicYN="N">Influenza A virus</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008969" MajorTopicYN="N">Molecular Sequence Data</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="N">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D045473" MajorTopicYN="N">SARS Virus</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2005</Year>
<Month>10</Month>
<Day>26</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2005</Year>
<Month>12</Month>
<Day>31</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2005</Year>
<Month>10</Month>
<Day>26</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">16241900</ArticleId>
<ArticleId IdType="doi">10.1089/cmb.2005.12.1103</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</noCountry>
<country name="États-Unis">
<noRegion>
<name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
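
As an illustration of how the record above can be read programmatically, here is a minimal Python sketch that pulls the PMID and the MeSH headings out of the `<pubmed>` part using the standard library. It works on a small fragment copied from the record; the function names are hypothetical. The `<TEI>` part is left out on purpose: as dumped here it uses the `wicri:` and `nlm:` prefixes without namespace declarations, which a strict XML parser is likely to reject.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the <pubmed> element from the record above.
SAMPLE = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">16241900</PMID>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D045473" MajorTopicYN="N">SARS Virus</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="N">Phylogeny</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</pubmed>"""


def pmid(xml_text):
    """Return the PubMed identifier stored under MedlineCitation/PMID."""
    root = ET.fromstring(xml_text)
    return root.findtext("MedlineCitation/PMID")


def mesh_headings(xml_text):
    """Return a list of (descriptor, [qualifiers]) pairs, in document order."""
    root = ET.fromstring(xml_text)
    headings = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.findtext("DescriptorName")
        qualifiers = [q.text for q in heading.findall("QualifierName")]
        headings.append((descriptor, qualifiers))
    return headings


if __name__ == "__main__":
    print(pmid(SAMPLE))
    for descriptor, qualifiers in mesh_headings(SAMPLE):
        print(descriptor, qualifiers)
```

The same two functions should work on the full `<pubmed>` element once it has been extracted from the record, since that part carries no namespace prefixes.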

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001220 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001220 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:16241900
   |texte=   Genomic classification using an information-based similarity index: application to the SARS coronavirus.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:16241900" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021