MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.

Internal identifier: 001D07 (PubMed/Corpus); previous: 001D06; next: 001D08

Authors: Heejoon Chae; Jinwoo Park; Seong-Whan Lee; Kenneth P. Nephew; Sun Kim

Source: Nucleic acids research, 2013;41(9):4783-91

RBID : pubmed:23519616

English descriptors

Abstract

CpG islands are GC-rich regions often located in the 5' end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG island sequences in 10 mammalian genomes. As sequence similarity methods and character composition techniques such as information theory are particularly difficult to conduct, we used exact patterns in CpG island sequences and single character discrepancies to identify differences in CpG island sequences. First, by calculating genome distance based on rank correlation tests, we show that k-mer and k-flank patterns around CpG sites can be used to correctly reconstruct the phylogeny of 10 mammalian genomes. Further, we used various machine learning algorithms to demonstrate that CpG islands sequences can be characterized using k-mers. In addition, by testing a human model on the nine different mammalian genomes, we provide the first evidence that k-mer signatures are consistent with evolutionary history.
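
The abstract mentions a genome distance based on rank correlation of k-mer patterns. As a rough illustration only, the sketch below counts overlapping k-mers in two sets of sequences and takes one minus the Spearman rank correlation of the counts as a toy distance. The function names, the choice of Spearman's coefficient, and the `1 - rho` distance are assumptions made for this example; the abstract does not specify which rank correlation test or distance definition the authors used, and k-flank patterns are not covered here.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequences, k):
    """Count every overlapping k-mer made only of A/C/G/T across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions start..end
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant counts")
    return cov / (var_x * var_y) ** 0.5


def kmer_rank_distance(seqs_a, seqs_b, k=2):
    """Toy distance (an assumption of this sketch): 1 - Spearman rho of k-mer counts."""
    ca, cb = kmer_counts(seqs_a, k), kmer_counts(seqs_b, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return 1.0 - spearman([ca[m] for m in kmers], [cb[m] for m in kmers])
```

Under these assumptions, calling `kmer_rank_distance` on every pair of genomes would give a distance matrix that a standard tree-building method could consume; whether that matches the pipeline used in the paper cannot be determined from the abstract alone.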

DOI: 10.1093/nar/gkt144
PubMed: 23519616

Links to Exploration step

pubmed:23519616

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.</title>
<author>
<name sortKey="Chae, Heejoon" sort="Chae, Heejoon" uniqKey="Chae H" first="Heejoon" last="Chae">Heejoon Chae</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Park, Jinwoo" sort="Park, Jinwoo" uniqKey="Park J" first="Jinwoo" last="Park">Jinwoo Park</name>
</author>
<author>
<name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
</author>
<author>
<name sortKey="Nephew, Kenneth P" sort="Nephew, Kenneth P" uniqKey="Nephew K" first="Kenneth P" last="Nephew">Kenneth P. Nephew</name>
</author>
<author>
<name sortKey="Kim, Sun" sort="Kim, Sun" uniqKey="Kim S" first="Sun" last="Kim">Sun Kim</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2013">2013</date>
<idno type="RBID">pubmed:23519616</idno>
<idno type="pmid">23519616</idno>
<idno type="doi">10.1093/nar/gkt144</idno>
<idno type="wicri:Area/PubMed/Corpus">001D07</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001D07</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.</title>
<author>
<name sortKey="Chae, Heejoon" sort="Chae, Heejoon" uniqKey="Chae H" first="Heejoon" last="Chae">Heejoon Chae</name>
<affiliation>
<nlm:affiliation>Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Park, Jinwoo" sort="Park, Jinwoo" uniqKey="Park J" first="Jinwoo" last="Park">Jinwoo Park</name>
</author>
<author>
<name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
</author>
<author>
<name sortKey="Nephew, Kenneth P" sort="Nephew, Kenneth P" uniqKey="Nephew K" first="Kenneth P" last="Nephew">Kenneth P. Nephew</name>
</author>
<author>
<name sortKey="Kim, Sun" sort="Kim, Sun" uniqKey="Kim S" first="Sun" last="Kim">Sun Kim</name>
</author>
</analytic>
<series>
<title level="j">Nucleic acids research</title>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2013" type="published">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Artificial Intelligence</term>
<term>CpG Islands</term>
<term>Evolution, Molecular</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Mammals (classification)</term>
<term>Mammals (genetics)</term>
<term>Phylogeny</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="classification" xml:lang="en">
<term>Mammals</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Mammals</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Artificial Intelligence</term>
<term>CpG Islands</term>
<term>Evolution, Molecular</term>
<term>Humans</term>
<term>Phylogeny</term>
<term>Sequence Analysis, DNA</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">CpG islands are GC-rich regions often located in the 5' end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG island sequences in 10 mammalian genomes. As sequence similarity methods and character composition techniques such as information theory are particularly difficult to conduct, we used exact patterns in CpG island sequences and single character discrepancies to identify differences in CpG island sequences. First, by calculating genome distance based on rank correlation tests, we show that k-mer and k-flank patterns around CpG sites can be used to correctly reconstruct the phylogeny of 10 mammalian genomes. Further, we used various machine learning algorithms to demonstrate that CpG islands sequences can be characterized using k-mers. In addition, by testing a human model on the nine different mammalian genomes, we provide the first evidence that k-mer signatures are consistent with evolutionary history.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">23519616</PMID>
<DateCompleted>
<Year>2013</Year>
<Month>07</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1362-4962</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>41</Volume>
<Issue>9</Issue>
<PubDate>
<Year>2013</Year>
<Month>May</Month>
</PubDate>
</JournalIssue>
<Title>Nucleic acids research</Title>
<ISOAbbreviation>Nucleic Acids Res.</ISOAbbreviation>
</Journal>
<ArticleTitle>Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.</ArticleTitle>
<Pagination>
<MedlinePgn>4783-91</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/nar/gkt144</ELocationID>
<Abstract>
<AbstractText>CpG islands are GC-rich regions often located in the 5' end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG island sequences in 10 mammalian genomes. As sequence similarity methods and character composition techniques such as information theory are particularly difficult to conduct, we used exact patterns in CpG island sequences and single character discrepancies to identify differences in CpG island sequences. First, by calculating genome distance based on rank correlation tests, we show that k-mer and k-flank patterns around CpG sites can be used to correctly reconstruct the phylogeny of 10 mammalian genomes. Further, we used various machine learning algorithms to demonstrate that CpG islands sequences can be characterized using k-mers. In addition, by testing a human model on the nine different mammalian genomes, we provide the first evidence that k-mer signatures are consistent with evolutionary history.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Chae</LastName>
<ForeName>Heejoon</ForeName>
<Initials>H</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Park</LastName>
<ForeName>Jinwoo</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Lee</LastName>
<ForeName>Seong-Whan</ForeName>
<Initials>SW</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Nephew</LastName>
<ForeName>Kenneth P</ForeName>
<Initials>KP</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Kim</LastName>
<ForeName>Sun</ForeName>
<Initials>S</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>CA113001</GrantID>
<Acronym>CA</Acronym>
<Agency>NCI NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2013</Year>
<Month>03</Month>
<Day>21</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Nucleic Acids Res</MedlineTA>
<NlmUniqueID>0411011</NlmUniqueID>
<ISSNLinking>0305-1048</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001185" MajorTopicYN="N">Artificial Intelligence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D018899" MajorTopicYN="Y">CpG Islands</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019143" MajorTopicYN="Y">Evolution, Molecular</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D023281" MajorTopicYN="N">Genomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008322" MajorTopicYN="N">Mammals</DescriptorName>
<QualifierName UI="Q000145" MajorTopicYN="N">classification</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D010802" MajorTopicYN="N">Phylogeny</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2013</Year>
<Month>3</Month>
<Day>23</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2013</Year>
<Month>3</Month>
<Day>23</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2013</Year>
<Month>7</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">23519616</ArticleId>
<ArticleId IdType="pii">gkt144</ArticleId>
<ArticleId IdType="doi">10.1093/nar/gkt144</ArticleId>
<ArticleId IdType="pmc">PMC3643570</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nat Biotechnol. 2010 Oct;28(10):1057-68</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20944598</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2010 Oct;28(10):1049-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20944596</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2012 Jan 31;109(5):1601-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22307618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Methods Mol Biol. 2012;856:431-67</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22399470</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Genet. 2012 Jul;13(7):484-92</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22641018</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Genet. 2012 Oct;13(10):705-19</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22986265</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Epigenetics. 2012 Oct;7(10):1188-99</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22968434</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2002 Mar 19;99(6):3740-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11891299</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell Mol Life Sci. 2003 Aug;60(8):1647-58</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14504655</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Gene. 2004 May 26;333:143-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15177689</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1967 May;57(5):1394-400</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">5231746</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1987 Jul 20;196(2):261-82</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3656447</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Biol Evol. 1987 Jul;4(4):406-25</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3447015</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Comput Appl Biosci. 1991 Jul;7(3):287-93</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1913208</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1992 Feb 15;89(4):1358-62</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1741388</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genomics. 1992 Aug;13(4):1095-107</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1505946</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Jun 1;21(11):2783-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15774554</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2005;33(20):e176</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16314307</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2006 Jan 31;103(5):1412-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16432200</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Genet. 2006 Mar;2(3):e26</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16520826</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2006;7:315</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16792795</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Biostatistics. 2010 Jul;11(3):499-514</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20212320</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Dev Growth Differ. 2010 Aug;52(6):545-54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20646027</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell. 2011 May 27;145(5):773-86</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21620139</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D07 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001D07 | SxmlIndent | more
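
Once the record has been printed by one of the commands above, its fields can be read with a general-purpose XML library. The sketch below is a minimal Python example, assuming the output has the same shape as the record shown on this page. Because the TEI part uses the `nlm:` and `wicri:` prefixes without visible namespace declarations, a namespace-aware parser is likely to reject the record as a whole, so the example isolates the `<pubmed>` element first. The `RECORD` string is a shortened stand-in for the real output, and the helper names `extract_pubmed` and `summarize` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for the record printed by HfdSelect (assumption: same layout).
RECORD = """<record>
<TEI><teiHeader>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus">001D07</idno>
</teiHeader></TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">23519616</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/nar/gkt144</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Chae</LastName><ForeName>Heejoon</ForeName></Author>
<Author ValidYN="Y"><LastName>Kim</LastName><ForeName>Sun</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
</record>"""


def extract_pubmed(record_text):
    """Parse only the <pubmed> subtree, which carries no namespace prefixes."""
    match = re.search(r"<pubmed>.*?</pubmed>", record_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <pubmed> element found in the record")
    return ET.fromstring(match.group(0))


def summarize(pubmed):
    """Collect the identifiers, title, and author names from a <pubmed> element."""
    citation = pubmed.find("MedlineCitation")
    article = citation.find("Article")
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.iter("Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "title": article.findtext("ArticleTitle"),
        "authors": authors,
    }


summary = summarize(extract_pubmed(RECORD))
print(summary["pmid"], summary["doi"], "; ".join(summary["authors"]))
```

To process real output, the text read from standard input (for example via `sys.stdin.read()`) can be passed to `extract_pubmed` in place of `RECORD`.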

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:23519616
   |texte=   Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:23519616" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021