MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

FSH: fast spaced seed hashing exploiting adjacent hashes.

Internal identifier: 000955 (PubMed/Corpus); previous: 000954; next: 000956

Authors: Samuele Girotto; Matteo Comin; Cinzia Pizzi

Source: Algorithms for molecular biology : AMB [1748-7188]; 2018

RBID : pubmed:29588651

Abstract

Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of k-mers can be rapidly computed by exploiting the large overlap between consecutive k-mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing.
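
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the FSH software and not the authors' code. The 2-bit nucleotide encoding, the seed notation (`1` for a position that is used, `0` for a wildcard) and both function names are assumptions made for this example. The first function recomputes the spaced seed hash from scratch at every position. The second updates a k-mer hash from the previous one, which is the overlap trick that plain k-mers allow.

```python
# Assumed 2-bit nucleotide encoding (an illustrative choice, not taken from the record).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(seq, seed):
    """Hash every position of seq against seed, from scratch each time.

    seed is a string of '1' (position is used) and '0' (wildcard).
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(seq) - len(seed) + 1):
        h = 0
        # Every used position is read again for every start position.
        for offset in care:
            h = (h << 2) | ENCODE[seq[start + offset]]
        hashes.append(h)
    return hashes


def kmer_hashes(seq, k):
    """Hash every k-mer with a rolling update: one shift, one mask, one OR."""
    if len(seq) < k:
        return []
    mask = (1 << (2 * k)) - 1
    h = 0
    for ch in seq[:k]:
        h = (h << 2) | ENCODE[ch]
    hashes = [h]
    for ch in seq[k:]:
        # Drop the oldest symbol and append the newest one.
        h = ((h << 2) & mask) | ENCODE[ch]
        hashes.append(h)
    return hashes
```

With these assumptions, `spaced_seed_hashes("ACGTA", "101")` returns `[2, 7, 8]` and `kmer_hashes("ACGTA", 3)` returns `[6, 27, 44]`. A seed with no wildcards, such as `"111"`, gives the same values as the k-mer version, but at the cost of re-reading every used position for each start position.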

DOI: 10.1186/s13015-018-0125-4
PubMed: 29588651

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">FSH: fast spaced seed hashing exploiting adjacent hashes.</title>
<author>
<name sortKey="Girotto, Samuele" sort="Girotto, Samuele" uniqKey="Girotto S" first="Samuele" last="Girotto">Samuele Girotto</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29588651</idno>
<idno type="pmid">29588651</idno>
<idno type="doi">10.1186/s13015-018-0125-4</idno>
<idno type="wicri:Area/PubMed/Corpus">000955</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000955</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">FSH: fast spaced seed hashing exploiting adjacent hashes.</title>
<author>
<name sortKey="Girotto, Samuele" sort="Girotto, Samuele" uniqKey="Girotto S" first="Samuele" last="Girotto">Samuele Girotto</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation>
<nlm:affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Patterns with wildcards in specified positions, namely
<i>spaced seeds</i>
, are increasingly used instead of
<i>k</i>
-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of
<i>k</i>
-mers can be rapidly computed by exploiting the large overlap between consecutive
<i>k</i>
-mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">29588651</PMID>
<DateRevised>
<Year>2019</Year>
<Month>11</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">1748-7188</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>13</Volume>
<PubDate>
<Year>2018</Year>
</PubDate>
</JournalIssue>
<Title>Algorithms for molecular biology : AMB</Title>
<ISOAbbreviation>Algorithms Mol Biol</ISOAbbreviation>
</Journal>
<ArticleTitle>FSH: fast spaced seed hashing exploiting adjacent hashes.</ArticleTitle>
<Pagination>
<MedlinePgn>8</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-018-0125-4</ELocationID>
<Abstract>
<AbstractText Label="Background" NlmCategory="UNASSIGNED">Patterns with wildcards in specified positions, namely
<i>spaced seeds</i>
, are increasingly used instead of
<i>k</i>
-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of
<i>k</i>
-mers can be rapidly computed by exploiting the large overlap between consecutive
<i>k</i>
-mers, spaced seeds hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing.</AbstractText>
<AbstractText Label="Results" NlmCategory="UNASSIGNED">The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of multiple spaced seeds hashing. In the experiments, our algorithm can compute the hashing values of spaced seeds with a speedup, with respect to the traditional approach, between 1.6× and 5.3×, depending on the structure of the spaced seed.</AbstractText>
<AbstractText Label="Conclusions" NlmCategory="UNASSIGNED">Spaced seed hashing is a routine task for several bioinformatics applications. FSH makes it possible to perform this task efficiently and raises the question of whether other hashing can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.</AbstractText>
<AbstractText Label="Availability" NlmCategory="UNASSIGNED">The software FSH is freely available for academic use at: https://bitbucket.org/samu661/fsh/overview.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Girotto</LastName>
<ForeName>Samuele</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</Affiliation>
<Identifier Source="ISNI">0000 0004 1757 3470</Identifier>
<Identifier Source="GRID">grid.5608.b</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Comin</LastName>
<ForeName>Matteo</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</Affiliation>
<Identifier Source="ISNI">0000 0004 1757 3470</Identifier>
<Identifier Source="GRID">grid.5608.b</Identifier>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Pizzi</LastName>
<ForeName>Cinzia</ForeName>
<Initials>C</Initials>
<Identifier Source="ORCID">0000-0002-6616-4003</Identifier>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy.</Affiliation>
<Identifier Source="ISNI">0000 0004 1757 3470</Identifier>
<Identifier Source="GRID">grid.5608.b</Identifier>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2018</Year>
<Month>03</Month>
<Day>22</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Algorithms Mol Biol</MedlineTA>
<NlmUniqueID>101265088</NlmUniqueID>
<ISSNLinking>1748-7188</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Efficient hashing</Keyword>
<Keyword MajorTopicYN="N">K-mers</Keyword>
<Keyword MajorTopicYN="N">Spaced seeds</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2017</Year>
<Month>10</Month>
<Day>31</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2018</Year>
<Month>03</Month>
<Day>12</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>3</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>3</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>3</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29588651</ArticleId>
<ArticleId IdType="doi">10.1186/s13015-018-0125-4</ArticleId>
<ArticleId IdType="pii">125</ArticleId>
<ArticleId IdType="pmc">PMC5863468</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>PLoS Comput Biol. 2016 Oct 19;12(10):e1005107</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27760124</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Dec 15;32(24):3823-3825</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27540266</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Sep 1;32(17):i538-i544</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27587672</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2002 Mar;18(3):440-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11934743</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2001 May;17(5):419-28</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11331236</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Nov 15;32(22):3492-3494</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27423894</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2015 Mar 25;16:236</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25879410</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Rep. 2016 Jan 18;6:19233</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26778510</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1990 Oct 5;215(3):403-10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2231712</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):500-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26356018</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2009 May;5(5):e1000386</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19461883</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2015 Nov 15;31(22):3584-92</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26209798</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Bioinform Comput Biol. 2004 Dec;2(4):819-42</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15617167</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2015 Jan 28;10:4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25691913</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014 Mar 03;15(3):R46</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24580807</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Sep 1;32(17):i567-i575</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27587676</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Jul 15;30(14):1991-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24700317</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Sep 1;27(17):2433-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21690104</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000955 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000955 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:29588651
   |texte=   FSH: fast spaced seed hashing exploiting adjacent hashes.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:29588651" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021