MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.

Internal identifier: 001447 (PubMed/Corpus); previous: 001446; next: 001448

Authors: Ezzeddin Kamil Mohamed Hashim; Rosni Abdullah

Source: Journal of theoretical biology (2015)

RBID : pubmed:26427337

English descriptors

Abstract

Empirical analysis on k-mer DNA has been proven as an effective tool in finding unique patterns in DNA sequences which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found unique multi-modal k-mer spectra occur in the genomes of organisms from the tetrapod clade only which includes all mammals. The multi-modality is caused by the formation of the two lowest modes where k-mers under them are referred as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets.
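
The notions the abstract relies on (k-mer counts, the k-mer spectrum, and rare k-mers containing a CG dinucleotide) can be illustrated with a minimal sketch. This is not the authors' RWC method: the function names, the toy sequence, and the fixed `max_count` cutoff are assumptions made for illustration, whereas the paper defines rare k-mers through the two lowest modes of the spectrum.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers having it."""
    return Counter(counts.values())


def rare_kmers(counts, max_count):
    """Return k-mers seen at most max_count times.

    Assumption: a fixed cutoff stands in for the mode-based definition
    used in the paper.
    """
    return sorted(kmer for kmer, n in counts.items() if n <= max_count)


def cg_fraction(kmers):
    """Fraction of the given k-mers that contain a CG dinucleotide."""
    kmers = list(kmers)
    if not kmers:
        return 0.0
    return sum("CG" in kmer for kmer in kmers) / len(kmers)


if __name__ == "__main__":
    toy = "ATATATATCGATATATAT"  # hypothetical toy sequence, not real genomic data
    counts = kmer_counts(toy, 2)
    print("spectrum:", dict(kmer_spectrum(counts)))
    rare = rare_kmers(counts, max_count=1)
    print("rare 2-mers:", rare)
    print("fraction with CG:", cg_fraction(rare))
```

On a real genome the spectrum would be computed for a larger k and plotted as a histogram; the multi-modal shape described in the abstract would show up as separate peaks in that histogram.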

DOI: 10.1016/j.jtbi.2015.09.014
PubMed: 26427337

Links to Exploration step

pubmed:26427337

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.</title>
<author>
<name sortKey="Mohamed Hashim, Ezzeddin Kamil" sort="Mohamed Hashim, Ezzeddin Kamil" uniqKey="Mohamed Hashim E" first="Ezzeddin Kamil" last="Mohamed Hashim">Ezzeddin Kamil Mohamed Hashim</name>
<affiliation>
<nlm:affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia. Electronic address: ezzeddin@usm.my.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Abdullah, Rosni" sort="Abdullah, Rosni" uniqKey="Abdullah R" first="Rosni" last="Abdullah">Rosni Abdullah</name>
<affiliation>
<nlm:affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia; National Advanced IPv6 Centre of Excellence (NAv6), School of Computer Sciences Building, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:26427337</idno>
<idno type="pmid">26427337</idno>
<idno type="doi">10.1016/j.jtbi.2015.09.014</idno>
<idno type="wicri:Area/PubMed/Corpus">001447</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001447</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.</title>
<author>
<name sortKey="Mohamed Hashim, Ezzeddin Kamil" sort="Mohamed Hashim, Ezzeddin Kamil" uniqKey="Mohamed Hashim E" first="Ezzeddin Kamil" last="Mohamed Hashim">Ezzeddin Kamil Mohamed Hashim</name>
<affiliation>
<nlm:affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia. Electronic address: ezzeddin@usm.my.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Abdullah, Rosni" sort="Abdullah, Rosni" uniqKey="Abdullah R" first="Rosni" last="Abdullah">Rosni Abdullah</name>
<affiliation>
<nlm:affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia; National Advanced IPv6 Centre of Excellence (NAv6), School of Computer Sciences Building, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of theoretical biology</title>
<idno type="eISSN">1095-8541</idno>
<imprint>
<date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Animals</term>
<term>Birds (genetics)</term>
<term>Chromosomes, Human, Pair 21 (chemistry)</term>
<term>Computational Biology</term>
<term>CpG Islands (genetics)</term>
<term>DNA (genetics)</term>
<term>Databases, Genetic</term>
<term>Fishes (genetics)</term>
<term>Humans</term>
<term>Mammals (genetics)</term>
<term>Nucleotide Motifs (genetics)</term>
<term>Promoter Regions, Genetic</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="chemistry" xml:lang="en">
<term>Chromosomes, Human, Pair 21</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Birds</term>
<term>CpG Islands</term>
<term>Fishes</term>
<term>Mammals</term>
<term>Nucleotide Motifs</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Animals</term>
<term>Computational Biology</term>
<term>Databases, Genetic</term>
<term>Humans</term>
<term>Promoter Regions, Genetic</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Empirical analysis on k-mer DNA has been proven as an effective tool in finding unique patterns in DNA sequences which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found unique multi-modal k-mer spectra occur in the genomes of organisms from the tetrapod clade only which includes all mammals. The multi-modality is caused by the formation of the two lowest modes where k-mers under them are referred as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. </div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26427337</PMID>
<DateCompleted>
<Year>2016</Year>
<Month>09</Month>
<Day>12</Day>
</DateCompleted>
<DateRevised>
<Year>2015</Year>
<Month>11</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1095-8541</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>387</Volume>
<PubDate>
<Year>2015</Year>
<Month>Dec</Month>
<Day>21</Day>
</PubDate>
</JournalIssue>
<Title>Journal of theoretical biology</Title>
<ISOAbbreviation>J. Theor. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.</ArticleTitle>
<Pagination>
<MedlinePgn>88-100</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.jtbi.2015.09.014</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">S0022-5193(15)00457-9</ELocationID>
<Abstract>
<AbstractText>Empirical analysis on k-mer DNA has been proven as an effective tool in finding unique patterns in DNA sequences which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found unique multi-modal k-mer spectra occur in the genomes of organisms from the tetrapod clade only which includes all mammals. The multi-modality is caused by the formation of the two lowest modes where k-mers under them are referred as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. </AbstractText>
<CopyrightInformation>Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Mohamed Hashim</LastName>
<ForeName>Ezzeddin Kamil</ForeName>
<Initials>EK</Initials>
<AffiliationInfo>
<Affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia. Electronic address: ezzeddin@usm.my.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Abdullah</LastName>
<ForeName>Rosni</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia; National Advanced IPv6 Centre of Excellence (NAv6), School of Computer Sciences Building, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2015</Year>
<Month>09</Month>
<Day>30</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>J Theor Biol</MedlineTA>
<NlmUniqueID>0376342</NlmUniqueID>
<ISSNLinking>0022-5193</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>9007-49-2</RegistryNumber>
<NameOfSubstance UI="D004247">DNA</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001717" MajorTopicYN="N">Birds</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D002891" MajorTopicYN="N">Chromosomes, Human, Pair 21</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="N">chemistry</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D018899" MajorTopicYN="N">CpG Islands</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004247" MajorTopicYN="N">DNA</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030541" MajorTopicYN="N">Databases, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D005399" MajorTopicYN="N">Fishes</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008322" MajorTopicYN="N">Mammals</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059372" MajorTopicYN="N">Nucleotide Motifs</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011401" MajorTopicYN="Y">Promoter Regions, Genetic</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">CGI</Keyword>
<Keyword MajorTopicYN="N">Classification</Keyword>
<Keyword MajorTopicYN="N">Genome</Keyword>
<Keyword MajorTopicYN="N">Rare-word</Keyword>
<Keyword MajorTopicYN="N">k-tuple</Keyword>
<Keyword MajorTopicYN="N">n-mer</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2015</Year>
<Month>01</Month>
<Day>01</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2015</Year>
<Month>09</Month>
<Day>10</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2015</Year>
<Month>09</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>10</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2015</Year>
<Month>10</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>9</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">26427337</ArticleId>
<ArticleId IdType="pii">S0022-5193(15)00457-9</ArticleId>
<ArticleId IdType="doi">10.1016/j.jtbi.2015.09.014</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
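
As a rough illustration of how the `<pubmed>` part of the record above can be read programmatically, the following sketch extracts the PMID, DOI, journal title, and author names with Python's standard `xml.etree.ElementTree`. The `EXCERPT` string is a shortened copy of the record and `summarize` is a hypothetical helper name. The TEI part is left out on purpose: as shown here it uses the `nlm:` and `wicri:` prefixes without namespace declarations, which a strict XML parser would reject.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> subtree from the record above.
EXCERPT = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26427337</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<Title>Journal of theoretical biology</Title>
</Journal>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.jtbi.2015.09.014</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">S0022-5193(15)00457-9</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Mohamed Hashim</LastName>
<ForeName>Ezzeddin Kamil</ForeName>
</Author>
<Author ValidYN="Y">
<LastName>Abdullah</LastName>
<ForeName>Rosni</ForeName>
</Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text):
    """Return the main bibliographic fields of a <pubmed> element as a dict."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    article = citation.find("Article")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in article.iter("Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "journal": article.findtext("Journal/Title"),
        "authors": authors,
    }


if __name__ == "__main__":
    for field, value in summarize(EXCERPT).items():
        print(f"{field}: {value}")
```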

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001447 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001447 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:26427337
   |texte=   Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:26427337" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021