MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Efficient Algorithms for Finding the Closest l-mers in Biological Data.

Internal identifier: 000841 (PubMed/Corpus); previous: 000840; next: 000842

Authors: Xingyu Cai; Abdullah-Al Mamun; Sanguthevar Rajasekaran

Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2018

RBID : pubmed:29993557

Abstract

With advances in next-generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from these voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S_1, S_2, …, S_m and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. I.e., we want to find m l-mers X_1, X_2, …, X_m such that X_i is an l-mer in S_i (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications, including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see e.g., \cite{PeSz00,DBR07}). In this paper novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.
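
The problem statement in the abstract can be illustrated with a small exhaustive search. This is only a baseline sketch for intuition, not the algorithms proposed in the paper. The function names (`hamming`, `lmers`, `closest_lmers`) are hypothetical. The abstract does not say how the distance "among" m l-mers is measured, so the sketch assumes the sum of pairwise Hamming distances.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(s: str, l: int) -> list[str]:
    """All length-l substrings of s, in order of starting position."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def closest_lmers(strings: list[str], l: int) -> tuple[int, tuple[str, ...]]:
    """Exhaustive search: try every choice of one l-mer per input string.

    Assumption: the distance among the chosen l-mers is scored as the sum
    of pairwise Hamming distances; the abstract does not fix the measure.
    """
    best_score, best_choice = None, None
    for choice in product(*(lmers(s, l) for s in strings)):
        score = sum(hamming(a, b) for a, b in combinations(choice, 2))
        if best_score is None or score < best_score:
            best_score, best_choice = score, choice
    if best_choice is None:
        raise ValueError("every string must contain at least one l-mer")
    return best_score, best_choice


if __name__ == "__main__":
    # Made-up toy input with m = 3 strings and l = 3.
    print(closest_lmers(["ACGTAC", "TTACGA", "GACGTT"], 3))
```

With m strings of length about n, this search examines on the order of n^m combinations, which is why faster algorithms are of interest even for m=3.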

DOI: 10.1109/TCBB.2018.2843364
PubMed: 29993557

Links to Exploration step

pubmed:29993557

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Efficient Algorithms for Finding the Closest l-mers in Biological Data.</title>
<author>
<name sortKey="Cai, Xingyu" sort="Cai, Xingyu" uniqKey="Cai X" first="Xingyu" last="Cai">Xingyu Cai</name>
</author>
<author>
<name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29993557</idno>
<idno type="pmid">29993557</idno>
<idno type="doi">10.1109/TCBB.2018.2843364</idno>
<idno type="wicri:Area/PubMed/Corpus">000841</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000841</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Efficient Algorithms for Finding the Closest l-mers in Biological Data.</title>
<author>
<name sortKey="Cai, Xingyu" sort="Cai, Xingyu" uniqKey="Cai X" first="Xingyu" last="Cai">Xingyu Cai</name>
</author>
<author>
<name sortKey="Mamun, Abdullah Al" sort="Mamun, Abdullah Al" uniqKey="Mamun A" first="Abdullah-Al" last="Mamun">Abdullah-Al Mamun</name>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</author>
</analytic>
<series>
<title level="j">IEEE/ACM transactions on computational biology and bioinformatics</title>
<idno type="eISSN">1557-9964</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">With advances in next-generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from these voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S
<sub>1</sub>
,S
<sub>2</sub>
,…,S
<sub>m</sub>
and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. I.e., we want to find m l-mers X
<sub>1</sub>
,X
<sub>2</sub>
,…,X
<sub>m</sub>
such that X
<sub>i</sub>
is an l-mer in S
<sub>i</sub>
(for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see e.g., \cite{PeSz00,DBR07}). In this paper novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">29993557</PMID>
<DateRevised>
<Year>2019</Year>
<Month>11</Month>
<Day>14</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet">
<PubDate>
<Year>2018</Year>
<Month>Jun</Month>
<Day>04</Day>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficient Algorithms for Finding the Closest l-mers in Biological Data.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TCBB.2018.2843364</ELocationID>
<Abstract>
<AbstractText>With advances in next-generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from these voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S
<sub>1</sub>
,S
<sub>2</sub>
,…,S
<sub>m</sub>
and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. I.e., we want to find m l-mers X
<sub>1</sub>
,X
<sub>2</sub>
,…,X
<sub>m</sub>
such that X
<sub>i</sub>
is an l-mer in S
<sub>i</sub>
(for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications including motif search. Algorithms for finding the closest l-mers have been used in solving the (l,d)-motif search problem (see e.g., \cite{PeSz00,DBR07}). In this paper novel algorithms are proposed for this problem for the case of . A comprehensive experimental evaluation is performed for m=3, along with a further empirical study of m=4 and 5.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Cai</LastName>
<ForeName>Xingyu</ForeName>
<Initials>X</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Mamun</LastName>
<ForeName>Abdullah-Al</ForeName>
<Initials>AA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Rajasekaran</LastName>
<ForeName>Sanguthevar</ForeName>
<Initials>S</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2018</Year>
<Month>06</Month>
<Day>04</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>aheadofprint</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29993557</ArticleId>
<ArticleId IdType="doi">10.1109/TCBB.2018.2843364</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000841 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000841 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:29993557
   |texte=   Efficient Algorithms for Finding the Closest l-mers in Biological Data.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:29993557" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021