SrasV1, Ncbi, Merge, bibRecord, 001220

Genomic classification using an information-based similarity index: application to the SARS coronavirus.

Internal identifier: 001220 (Ncbi/Merge); previous: 001219; next: 001221

Authors: Albert C-C Yang [United States]; Ary L. Goldberger; C-K Peng

Source:

Journal of computational biology : a journal of computational molecular cell biology [ 1066-5277 ] ; 2005.

RBID : pubmed:16241900

French descriptors

English descriptors

KwdEn:
- Base Sequence; Cluster Analysis; DNA, Mitochondrial; Databases, Nucleic Acid; Evolution, Molecular; Humans; Influenza A virus (genetics); Molecular Sequence Data; Phylogeny; SARS Virus (classification); SARS Virus (genetics); Sequence Alignment; Sequence Analysis, DNA (methods).
MESH:
- chemical: DNA, Mitochondrial.
- classification: SARS Virus.
- genetics: Influenza A virus; SARS Virus.
- methods: Sequence Analysis, DNA.
- Base Sequence; Cluster Analysis; Databases, Nucleic Acid; Evolution, Molecular; Humans; Molecular Sequence Data; Phylogeny; Sequence Alignment.

Abstract

Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.
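
The abstract names the ingredients of the method (word rank order-frequency statistics and information theory) but does not give the formula. The sketch below is therefore only an illustration of how such an index could be assembled, not the authors' definition: it counts overlapping n-letter words, ranks them by frequency in each sequence, and averages the rank differences with Shannon-entropy weights. The function names, the default word length `n=3`, the tie-breaking rule, and the normalization are all assumptions made for this example.

```python
from collections import Counter
from math import log


def word_counts(seq, n):
    """Count overlapping n-letter words in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ranks(counts, vocab):
    """Rank words by descending count (1 = most frequent).

    Ties are broken alphabetically (an assumption of this sketch);
    words absent from a sequence have count 0 and rank last.
    """
    ordered = sorted(vocab, key=lambda w: (-counts.get(w, 0), w))
    return {w: r for r, w in enumerate(ordered, start=1)}


def entropy_term(count, total):
    """Shannon term -p*log(p) for one word; 0 if the word is absent."""
    if count == 0:
        return 0.0
    p = count / total
    return -p * log(p)


def rank_distance(seq_a, seq_b, n=3):
    """Entropy-weighted mean rank difference, scaled to [0, 1].

    Hypothetical formula for illustration only; 0 means identical
    rank-frequency profiles, 1 means maximally different ranks.
    """
    ca, cb = word_counts(seq_a, n), word_counts(seq_b, n)
    vocab = sorted(set(ca) | set(cb))
    if len(vocab) < 2:
        return 0.0
    ra, rb = ranks(ca, vocab), ranks(cb, vocab)
    ta, tb = sum(ca.values()), sum(cb.values())
    weights = {
        w: entropy_term(ca.get(w, 0), ta) + entropy_term(cb.get(w, 0), tb)
        for w in vocab
    }
    z = sum(weights.values())
    if z == 0:
        # Degenerate case: each sequence holds a single distinct word.
        # Fall back to uniform weights so the distance stays defined.
        weights = {w: 1.0 for w in vocab}
        z = float(len(vocab))
    weighted = sum(abs(ra[w] - rb[w]) * weights[w] for w in vocab) / z
    return weighted / (len(vocab) - 1)
```

Under these assumptions the measure needs no alignment step: two sequences of different lengths can be compared directly, because only their word-frequency profiles enter the calculation.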

DOI: 10.1089/cmb.2005.12.1103
PubMed: 16241900

Links to previous steps (curation, corpus...)

to stream PubMed, to step Corpus: 002486
to stream PubMed, to step Curation: 002486
to stream PubMed, to step Checkpoint: 002695

Links to Exploration step

pubmed:16241900

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author><name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1"><nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author><name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16241900</idno>
<idno type="pmid">16241900</idno>
<idno type="doi">10.1089/cmb.2005.12.1103</idno>
<idno type="wicri:Area/PubMed/Corpus">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002486</idno>
<idno type="wicri:Area/PubMed/Curation">002486</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002486</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002695</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">002695</idno>
<idno type="wicri:Area/Ncbi/Merge">001220</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Genomic classification using an information-based similarity index: application to the SARS coronavirus.</title>
<author><name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
<affiliation wicri:level="1"><nlm:affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215</wicri:regionArea>
<wicri:noRegion>Massachusetts 02215</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
</author>
<author><name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</author>
</analytic>
<series><title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="ISSN">1066-5277</idno>
<imprint><date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>DNA, Mitochondrial</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Influenza A virus (genetics)</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>SARS Virus (classification)</term>
<term>SARS Virus (genetics)</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus de la grippe A (génétique)</term>
<term>Virus du SRAS (classification)</term>
<term>Virus du SRAS (génétique)</term>
<term>Évolution moléculaire</term>
</keywords>
<keywords scheme="MESH" type="chemical" xml:lang="en"><term>DNA, Mitochondrial</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en"><term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Influenza A virus</term>
<term>SARS Virus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Virus de la grippe A</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Base Sequence</term>
<term>Cluster Analysis</term>
<term>Databases, Nucleic Acid</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Molecular Sequence Data</term>
<term>Phylogeny</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>ADN mitochondrial</term>
<term>Alignement de séquences</term>
<term>Analyse de regroupements</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données d'acides nucléiques</term>
<term>Données de séquences moléculaires</term>
<term>Humains</term>
<term>Phylogénie</term>
<term>Séquence nucléotidique</term>
<term>Virus du SRAS</term>
<term>Évolution moléculaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">16241900</PMID>
<DateCompleted><Year>2005</Year>
<Month>12</Month>
<Day>29</Day>
</DateCompleted>
<DateRevised><Year>2007</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article PubModel="Print"><Journal><ISSN IssnType="Print">1066-5277</ISSN>
<JournalIssue CitedMedium="Print"><Volume>12</Volume>
<Issue>8</Issue>
<PubDate><Year>2005</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Journal of computational biology : a journal of computational molecular cell biology</Title>
<ISOAbbreviation>J. Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Genomic classification using an information-based similarity index: application to the SARS coronavirus.</ArticleTitle>
<Pagination><MedlinePgn>1103-16</MedlinePgn>
</Pagination>
<Abstract><AbstractText>Measures of genetic distance based on alignment methods are confined to studying sequences that are conserved and identifiable in all organisms under study. A number of alignment-free techniques based on either statistical linguistics or information theory have been developed to overcome the limitations of alignment methods. We present a novel alignment-free approach to measuring the similarity among genetic sequences that incorporates elements from both word rank order-frequency statistics and information theory. We first validate this method on the human influenza A viral genomes as well as on the human mitochondrial DNA database. We then apply the method to study the origin of the SARS coronavirus. We find that the majority of the SARS genome is most closely related to group 1 coronaviruses, with smaller regions of matches to sequences from groups 2 and 3. The information based similarity index provides a new tool to measure the similarity between datasets based on their information content and may have a wide range of applications in the large-scale analysis of genomic databases.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Yang</LastName>
<ForeName>Albert C-C</ForeName>
<Initials>AC</Initials>
<AffiliationInfo><Affiliation>Cardiovascular Division and Margret and H.A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, Massachusetts 02215, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Goldberger</LastName>
<ForeName>Ary L</ForeName>
<Initials>AL</Initials>
</Author>
<Author ValidYN="Y"><LastName>Peng</LastName>
<ForeName>C-K</ForeName>
<Initials>CK</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y"><Grant><GrantID>P41-RR13622</GrantID>
<Acronym>RR</Acronym>
<Agency>NCRR NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant><GrantID>P60-AG08812</GrantID>
<Acronym>AG</Acronym>
<Agency>NIA NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>J Comput Biol</MedlineTA>
<NlmUniqueID>9433358</NlmUniqueID>
<ISSNLinking>1066-5277</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList><Chemical><RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D004272">DNA, Mitochondrial</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D004272" MajorTopicYN="N">DNA, Mitochondrial</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D030561" MajorTopicYN="N">Databases, Nucleic Acid</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D019143" MajorTopicYN="N">Evolution, Molecular</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D009980" MajorTopicYN="N">Influenza A virus</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D008969" MajorTopicYN="N">Molecular Sequence Data</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D010802" MajorTopicYN="N">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D045473" MajorTopicYN="N">SARS Virus</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016415" MajorTopicYN="N">Sequence Alignment</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="pubmed"><Year>2005</Year>
<Month>10</Month>
<Day>26</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2005</Year>
<Month>12</Month>
<Day>31</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2005</Year>
<Month>10</Month>
<Day>26</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">16241900</ArticleId>
<ArticleId IdType="doi">10.1089/cmb.2005.12.1103</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations><list><country><li>États-Unis</li>
</country>
</list>
<tree><noCountry><name sortKey="Goldberger, Ary L" sort="Goldberger, Ary L" uniqKey="Goldberger A" first="Ary L" last="Goldberger">Ary L. Goldberger</name>
<name sortKey="Peng, C K" sort="Peng, C K" uniqKey="Peng C" first="C-K" last="Peng">C-K Peng</name>
</noCountry>
<country name="États-Unis"><noRegion><name sortKey="Yang, Albert C C" sort="Yang, Albert C C" uniqKey="Yang A" first="Albert C-C" last="Yang">Albert C-C Yang</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
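
The `<pubmed>` part of the record above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened fragment copied from the record; the `summarize` helper and the `FRAGMENT` constant are names invented for this example. Note that, as displayed here, the TEI part uses `wicri:` and `nlm:` prefixes without visible namespace declarations, so a strict parser would likely reject the full record unless those prefixes are declared first.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the MedlineCitation element shown in the record.
FRAGMENT = """<MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">16241900</PMID>
<Article PubModel="Print">
<ArticleTitle>Genomic classification using an information-based similarity index: application to the SARS coronavirus.</ArticleTitle>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Yang</LastName>
<ForeName>Albert C-C</ForeName>
</Author>
<Author ValidYN="Y"><LastName>Goldberger</LastName>
<ForeName>Ary L</ForeName>
</Author>
<Author ValidYN="Y"><LastName>Peng</LastName>
<ForeName>C-K</ForeName>
</Author>
</AuthorList>
</Article>
<MeshHeadingList><MeshHeading><DescriptorName UI="D045473" MajorTopicYN="N">SARS Virus</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D010802" MajorTopicYN="N">Phylogeny</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>"""


def summarize(xml_text):
    """Extract PMID, title, authors, and MeSH headings from a MedlineCitation."""
    root = ET.fromstring(xml_text)
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in root.iter("Author")
    ]
    mesh = []
    for heading in root.iter("MeshHeading"):
        descriptor = heading.findtext("DescriptorName")
        qualifiers = [q.text for q in heading.findall("QualifierName")]
        # Render qualifiers the way the keyword list does: "Term (qualifier)".
        if qualifiers:
            mesh.append(f"{descriptor} ({', '.join(qualifiers)})")
        else:
            mesh.append(descriptor)
    return {
        "pmid": root.findtext("PMID"),
        "title": root.findtext("Article/ArticleTitle"),
        "authors": authors,
        "mesh": mesh,
    }


info = summarize(FRAGMENT)
print(info["pmid"], info["authors"], info["mesh"])
```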

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Ncbi/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001220 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001220 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:16241900
   |texte=   Genomic classification using an information-based similarity index: application to the SARS coronavirus.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:16241900" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021

	SARS exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
