Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Efficient sequential and parallel algorithms for planted motif search.

Identifieur interne : 001A68 ( PubMed/Corpus ); précédent : 001A67; suivant : 001A69

Efficient sequential and parallel algorithms for planted motif search.

Auteurs : Marius Nicolae ; Sanguthevar Rajasekaran

Source :

RBID : pubmed:24479443

English descriptors

Abstract

Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.

DOI: 10.1186/1471-2105-15-34
PubMed: 24479443

Links to Exploration step

pubmed:24479443

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Efficient sequential and parallel algorithms for planted motif search.</title>
<author>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation>
<nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:24479443</idno>
<idno type="pmid">24479443</idno>
<idno type="doi">10.1186/1471-2105-15-34</idno>
<idno type="wicri:Area/PubMed/Corpus">001A68</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A68</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Efficient sequential and parallel algorithms for planted motif search.</title>
<author>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation>
<nlm:affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</analytic>
<series>
<title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>DNA (chemistry)</term>
<term>DNA (genetics)</term>
<term>Proteins (chemistry)</term>
<term>Proteins (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Sequence Analysis, Protein (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>DNA</term>
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>DNA</term>
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">24479443</PMID>
<DateCompleted>
<Year>2014</Year>
<Month>06</Month>
<Day>15</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1471-2105</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>15</Volume>
<PubDate>
<Year>2014</Year>
<Month>Jan</Month>
<Day>31</Day>
</PubDate>
</JournalIssue>
<Title>BMC bioinformatics</Title>
<ISOAbbreviation>BMC Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficient sequential and parallel algorithms for planted motif search.</ArticleTitle>
<Pagination>
<MedlinePgn>34</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-15-34</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Nicolae</LastName>
<ForeName>Marius</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA. marius.nicolae@engr.uconn.edu.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Rajasekaran</LastName>
<ForeName>Sanguthevar</ForeName>
<Initials>S</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>R01-LM010101</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2014</Year>
<Month>01</Month>
<Day>31</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>BMC Bioinformatics</MedlineTA>
<NlmUniqueID>100965194</NlmUniqueID>
<ISSNLinking>1471-2105</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D011506">Proteins</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>9007-49-2</RegistryNumber>
<NameOfSubstance UI="D004247">DNA</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004247" MajorTopicYN="N">DNA</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="N">chemistry</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011506" MajorTopicYN="N">Proteins</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="N">chemistry</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020539" MajorTopicYN="N">Sequence Analysis, Protein</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2013</Year>
<Month>03</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2014</Year>
<Month>01</Month>
<Day>27</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2014</Year>
<Month>2</Month>
<Day>1</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>2</Month>
<Day>1</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2014</Year>
<Month>6</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">24479443</ArticleId>
<ArticleId IdType="pii">1471-2105-15-34</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-15-34</ArticleId>
<ArticleId IdType="pmc">PMC3924400</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>BMC Bioinformatics. 2011;12:410</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22024209</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Oct 1;27(19):2641-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21821665</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Res Notes. 2011 Mar 08;4:54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21385438</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2009 Oct 29;4:14</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19874606</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):544-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17975266</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2012;13 Suppl 17:S10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23281969</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2005 Jan;23(1):137-44</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15637633</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2012;7(7):e41425</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22848493</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2012;7(10):e48442</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23119020</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A68 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001A68 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:24479443
   |texte=   Efficient sequential and parallel algorithms for planted motif search.
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:24479443" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021