MERS exploration server

Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.

Internal identifier: 001264 (Ncbi/Merge); previous: 001263; next: 001265

Authors: Weiming Li; Bin Ma; Kaizhong Zhang

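Source: IEEE/ACM transactions on computational biology and bioinformatics, vol. 11, no. 2, pp. 398-406 (2014 Mar-Apr)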

RBID : pubmed:26355786

Abstract

Large-scale comparison or similarity search of genomic DNA and protein sequence is of fundamental importance in modern molecular biology. To perform DNA and protein sequence similarity search efficiently, seeding (or filtration) method has been widely used where only sequences sharing a common pattern or "seed" are subject to detailed comparison. Therefore these methods trade search sensitivity with search speed. In this paper, we introduce a new seeding method, called spaced k-mer neighbors, which provides a better tradeoff between the sensitivity and speed in protein sequence similarity search. With the method of spaced k-mer neighbors, for each spaced k-mer, a set of spaced k-mers is selected as its neighbors. These pre-selected spaced k-mer neighbors are then used to detect hits between query sequence and database sequences. We propose an efficient heuristic algorithm for the spaced neighbor selection. Our computational experimental results demonstrate that the method of spaced k-mer neighbors can improve the overall tradeoff efficiency over existing seeding methods.

DOI: 10.1109/TCBB.2014.2306831
PubMed: 26355786
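
The filtration idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the seed pattern `1101`, the `SIMILAR_GROUPS` table, and the one-substitution neighbor rule are all hypothetical stand-ins chosen for illustration, whereas the paper selects neighbors with its own heuristic. The sketch only shows the general mechanism, in which each spaced k-mer of the query is expanded into a neighbor set and the neighbors are looked up in an index of database spaced k-mers.

```python
from collections import defaultdict

# Hypothetical groups of interchangeable amino acids (illustrative only;
# not taken from the paper or from any published scoring matrix).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KR"), set("DE"), set("ST")]


def spaced_kmers(seq, pattern):
    """Yield (offset, spaced k-mer); a '1' in pattern marks a sampled position."""
    sampled = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield start, "".join(seq[start + i] for i in sampled)


def neighbors(kmer):
    """Toy neighbor set: the k-mer itself plus every variant that replaces
    one residue by another residue from the same hypothetical group."""
    result = {kmer}
    for pos, residue in enumerate(kmer):
        for group in SIMILAR_GROUPS:
            if residue in group:
                for alt in group - {residue}:
                    result.add(kmer[:pos] + alt + kmer[pos + 1:])
    return result


def build_index(database, pattern):
    """Map each spaced k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for offset, kmer in spaced_kmers(seq, pattern):
            index[kmer].append((seq_id, offset))
    return index


def find_hits(query, index, pattern):
    """Return sorted (query offset, sequence id, database offset) hits found
    by looking up every neighbor of every spaced k-mer of the query."""
    hits = set()
    for q_offset, kmer in spaced_kmers(query, pattern):
        for nb in neighbors(kmer):
            for seq_id, db_offset in index.get(nb, ()):
                hits.add((q_offset, seq_id, db_offset))
    return sorted(hits)


# Usage with made-up sequences: only hits would go on to detailed comparison.
database = {"db1": "MKVLAT"}
index = build_index(database, "1101")
print(find_hits("MKVIAT", index, "1101"))
```

A larger neighbor set finds more hits (higher sensitivity) but also triggers more detailed comparisons (lower speed), which is the tradeoff the abstract refers to.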

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.</title>
<author>
<name sortKey="Li, Weiming" sort="Li, Weiming" uniqKey="Li W" first="Weiming" last="Li">Weiming Li</name>
</author>
<author>
<name sortKey="Ma, Bin" sort="Ma, Bin" uniqKey="Ma B" first="Bin" last="Ma">Bin Ma</name>
</author>
<author>
<name sortKey="Zhang, Kaizhong" sort="Zhang, Kaizhong" uniqKey="Zhang K" first="Kaizhong" last="Zhang">Kaizhong Zhang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="????">
<PubDate>
<MedlineDate>2014 Mar-Apr</MedlineDate>
</PubDate>
</date>
<idno type="RBID">pubmed:26355786</idno>
<idno type="pmid">26355786</idno>
<idno type="doi">10.1109/TCBB.2014.2306831</idno>
<idno type="wicri:Area/PubMed/Corpus">001A41</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001A41</idno>
<idno type="wicri:Area/PubMed/Curation">001A41</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001A41</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002A69</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002A69</idno>
<idno type="wicri:Area/Ncbi/Merge">001264</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.</title>
<author>
<name sortKey="Li, Weiming" sort="Li, Weiming" uniqKey="Li W" first="Weiming" last="Li">Weiming Li</name>
</author>
<author>
<name sortKey="Ma, Bin" sort="Ma, Bin" uniqKey="Ma B" first="Bin" last="Ma">Bin Ma</name>
</author>
<author>
<name sortKey="Zhang, Kaizhong" sort="Zhang, Kaizhong" uniqKey="Zhang K" first="Kaizhong" last="Zhang">Kaizhong Zhang</name>
</author>
</analytic>
<series>
<title level="j">IEEE/ACM transactions on computational biology and bioinformatics</title>
<idno type="eISSN">1557-9964</idno>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Computational Biology (methods)</term>
<term>Drosophila</term>
<term>Humans</term>
<term>Mice</term>
<term>Proteins (chemistry)</term>
<term>Proteins (genetics)</term>
<term>Sequence Analysis, Protein (methods)</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence de protéine ()</term>
<term>Animaux</term>
<term>Biologie informatique ()</term>
<term>Drosophila</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Protéines ()</term>
<term>Protéines (génétique)</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Souris</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Protéines</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, Protein</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Drosophila</term>
<term>Humans</term>
<term>Mice</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence de protéine</term>
<term>Animaux</term>
<term>Biologie informatique</term>
<term>Drosophila</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Protéines</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Souris</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Large-scale comparison or similarity search of genomic DNA and protein sequence is of fundamental importance in modern molecular biology. To perform DNA and protein sequence similarity search efficiently, seeding (or filtration) method has been widely used where only sequences sharing a common pattern or "seed" are subject to detailed comparison. Therefore these methods trade search sensitivity with search speed. In this paper, we introduce a new seeding method, called spaced k-mer neighbors, which provides a better tradeoff between the sensitivity and speed in protein sequence similarity search. With the method of spaced k-mer neighbors, for each spaced k-mer, a set of spaced k-mers is selected as its neighbors. These pre-selected spaced k-mer neighbors are then used to detect hits between query sequence and database sequences. We propose an efficient heuristic algorithm for the spaced neighbor selection. Our computational experimental results demonstrate that the method of spaced k-mer neighbors can improve the overall tradeoff efficiency over existing seeding methods. </div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26355786</PMID>
<DateCompleted>
<Year>2016</Year>
<Month>03</Month>
<Day>14</Day>
</DateCompleted>
<DateRevised>
<Year>2016</Year>
<Month>10</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>11</Volume>
<Issue>2</Issue>
<PubDate>
<MedlineDate>2014 Mar-Apr</MedlineDate>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.</ArticleTitle>
<Pagination>
<MedlinePgn>398-406</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TCBB.2014.2306831</ELocationID>
<Abstract>
<AbstractText>Large-scale comparison or similarity search of genomic DNA and protein sequence is of fundamental importance in modern molecular biology. To perform DNA and protein sequence similarity search efficiently, seeding (or filtration) method has been widely used where only sequences sharing a common pattern or "seed" are subject to detailed comparison. Therefore these methods trade search sensitivity with search speed. In this paper, we introduce a new seeding method, called spaced k-mer neighbors, which provides a better tradeoff between the sensitivity and speed in protein sequence similarity search. With the method of spaced k-mer neighbors, for each spaced k-mer, a set of spaced k-mers is selected as its neighbors. These pre-selected spaced k-mer neighbors are then used to detect hits between query sequence and database sequences. We propose an efficient heuristic algorithm for the spaced neighbor selection. Our computational experimental results demonstrate that the method of spaced k-mer neighbors can improve the overall tradeoff efficiency over existing seeding methods. </AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Li</LastName>
<ForeName>Weiming</ForeName>
<Initials>W</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Ma</LastName>
<ForeName>Bin</ForeName>
<Initials>B</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Zhang</LastName>
<ForeName>Kaizhong</ForeName>
<Initials>K</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D011506">Proteins</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D004330" MajorTopicYN="N">Drosophila</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D051379" MajorTopicYN="N">Mice</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011506" MajorTopicYN="N">Proteins</DescriptorName>
<QualifierName UI="Q000737" MajorTopicYN="Y">chemistry</QualifierName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D020539" MajorTopicYN="N">Sequence Analysis, Protein</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017386" MajorTopicYN="Y">Sequence Homology, Amino Acid</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="N">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>9</Month>
<Day>11</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>3</Month>
<Day>1</Day>
<Hour>0</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>3</Month>
<Day>15</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">26355786</ArticleId>
<ArticleId IdType="doi">10.1109/TCBB.2014.2306831</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Li, Weiming" sort="Li, Weiming" uniqKey="Li W" first="Weiming" last="Li">Weiming Li</name>
<name sortKey="Ma, Bin" sort="Ma, Bin" uniqKey="Ma B" first="Bin" last="Ma">Bin Ma</name>
<name sortKey="Zhang, Kaizhong" sort="Zhang, Kaizhong" uniqKey="Zhang K" first="Kaizhong" last="Zhang">Kaizhong Zhang</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001264 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001264 | SxmlIndent | more
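
Once a record has been printed by the commands above, its fields can also be read programmatically. The following Python sketch uses only the standard library and is not part of Dilib. The `summarize_citation` helper and the embedded `SAMPLE` string are illustrative assumptions, with `SAMPLE` abridged from the `<MedlineCitation>` element of this record. The sketch parses only that fragment because the `<TEI>` part of the record carries `wicri:` attributes whose namespace is not declared in the record as shown, which a strict XML parser may reject.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <MedlineCitation> element of this record.
SAMPLE = """<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26355786</PMID>
<Article PubModel="Print">
<ArticleTitle>Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Li</LastName><ForeName>Weiming</ForeName><Initials>W</Initials></Author>
<Author ValidYN="Y"><LastName>Ma</LastName><ForeName>Bin</ForeName><Initials>B</Initials></Author>
<Author ValidYN="Y"><LastName>Zhang</LastName><ForeName>Kaizhong</ForeName><Initials>K</Initials></Author>
</AuthorList>
</Article>
</MedlineCitation>"""


def summarize_citation(xml_text):
    """Extract the PMID, title, and author names from a MedlineCitation fragment."""
    root = ET.fromstring(xml_text)
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("PMID"),
        "title": root.findtext("Article/ArticleTitle"),
        "authors": authors,
    }


summary = summarize_citation(SAMPLE)
print(summary["pmid"], "|", "; ".join(summary["authors"]))
```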

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:26355786
   |texte=   Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:26355786" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021