Efficient Algorithms for Finding the Closest l-mers in Biological Data.
Internal identifier: 000841 (PubMed/Corpus); previous: 000840; next: 000842
Authors: Xingyu Cai; Abdullah-Al Mamun; Sanguthevar Rajasekaran
Source:
- IEEE/ACM Transactions on Computational Biology and Bioinformatics [1557-9964]; 2018.
Abstract
With advances in next-generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from these voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S1, S2, …, Sm and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. That is, we want to find m l-mers X1, X2, …, Xm such that Xi is an l-mer in Si (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications, including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see, e.g., \cite{PeSz00,DBR07}). In this paper, novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.
DOI: 10.1109/TCBB.2018.2843364
PubMed: 29993557
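The closest l-mers problem stated in the abstract can be illustrated with a small exhaustive search. This is only a minimal sketch, not the algorithms proposed in the paper: the abstract does not say how the "Hamming distance among" m l-mers is aggregated, so the sum of pairwise Hamming distances is used here as an assumption, and the function names (hamming, lmers, closest_lmers) are hypothetical.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(s: str, l: int) -> list[str]:
    """All length-l substrings of s, in order of starting position."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def closest_lmers(strings: list[str], l: int) -> tuple[int, tuple[str, ...]]:
    """Pick one l-mer per input string by exhaustive search.

    ASSUMPTION: the objective is the sum of pairwise Hamming distances
    among the chosen l-mers; the abstract does not fix this choice.
    """
    candidates = [lmers(s, l) for s in strings]
    if any(not c for c in candidates):
        raise ValueError("every string must be at least l characters long")

    best_score, best_choice = None, None
    # Try every combination of one l-mer per string (exponential in m).
    for choice in product(*candidates):
        score = sum(hamming(a, b) for a, b in combinations(choice, 2))
        if best_score is None or score < best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice


if __name__ == "__main__":
    # Three toy DNA strings (m = 3) and l = 3.
    print(closest_lmers(["ACGTAC", "TTACGA", "GGACGT"], 3))
```

For m strings of length n, this search examines on the order of n^m combinations, which is why it is useful only as a reference for checking faster methods on small inputs.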
Links to Exploration step
pubmed:29993557
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Efficient Algorithms for Finding the Closest l-mers in Biological Data.</title>
<author><name sortKey="Cai, Xingyu" sort="Cai, Xingyu" uniqKey="Cai X" first="Xingyu" last="Cai">Xingyu Cai</name>
</author>
<author><name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29993557</idno>
<idno type="pmid">29993557</idno>
<idno type="doi">10.1109/TCBB.2018.2843364</idno>
<idno type="wicri:Area/PubMed/Corpus">000841</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000841</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Efficient Algorithms for Finding the Closest l-mers in Biological Data.</title>
<author><name sortKey="Cai, Xingyu" sort="Cai, Xingyu" uniqKey="Cai X" first="Xingyu" last="Cai">Xingyu Cai</name>
</author>
<author><name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
</author>
<author><name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</analytic>
<series><title level="j">IEEE/ACM transactions on computational biology and bioinformatics</title>
<idno type="eISSN">1557-9964</idno>
<imprint><date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">With the advances in the next generation sequencing technology, huge amounts of data have been and get generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from the voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S<sub>1</sub>,S<sub>2</sub>,…,S<sub>m</sub> and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. I.e., we want to find m l-mers X<sub>1</sub>,X<sub>2</sub>,…,X<sub>m</sub> such that X<sub>i</sub> is an l-mer in S<sub>i</sub> (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see e.g., \cite{PeSz00,DBR07}). In this paper novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="Publisher" Owner="NLM"><PMID Version="1">29993557</PMID>
<DateRevised><Year>2019</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet"><PubDate><Year>2018</Year>
<Month>Jun</Month>
<Day>04</Day>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficient Algorithms for Finding the Closest l-mers in Biological Data.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TCBB.2018.2843364</ELocationID>
<Abstract><AbstractText>With the advances in the next generation sequencing technology, huge amounts of data have been and get generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from the voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S<sub>1</sub>,S<sub>2</sub>,…,S<sub>m</sub> and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. I.e., we want to find m l-mers X<sub>1</sub>,X<sub>2</sub>,…,X<sub>m</sub> such that X<sub>i</sub> is an l-mer in S<sub>i</sub> (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see e.g., \cite{PeSz00,DBR07}). In this paper novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Cai</LastName>
<ForeName>Xingyu</ForeName>
<Initials>X</Initials>
</Author>
<Author ValidYN="Y"><LastName>Mamun</LastName>
<ForeName>Abdullah-Al</ForeName>
<Initials>AA</Initials>
</Author>
<Author ValidYN="Y"><LastName>Rajasekaran</LastName>
<ForeName>Sanguthevar</ForeName>
<Initials>S</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2018</Year>
<Month>06</Month>
<Day>04</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="pubmed"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>aheadofprint</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">29993557</ArticleId>
<ArticleId IdType="doi">10.1109/TCBB.2018.2843364</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000841 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000841 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:29993557 |texte= Efficient Algorithms for Finding the Closest l-mers in Biological Data. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:29993557" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.