Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters
Internal identifier: 001837 (Ncbi/Merge)
Authors: David Pellow; Darya Filippova; Carl Kingsford
Source:
- Journal of Computational Biology [ISSN 1066-5277]; 2017.
DOI: 10.1089/cmb.2016.0155
PubMed: 27828710
PubMed Central: 5467106
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000D98
- to stream Pmc, to step Curation: 000D98
- to stream Pmc, to step Checkpoint: 000823
- to stream PubMed, to step Corpus: 000E99
- to stream PubMed, to step Curation: 000E99
- to stream PubMed, to step Checkpoint: 000C62
Links to Exploration step
PMC:5467106
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using <italic>k</italic>-mer Bloom Filters</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation><nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation><nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation><nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27828710</idno>
<idno type="pmc">5467106</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106</idno>
<idno type="RBID">PMC:5467106</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D98</idno>
<idno type="wicri:Area/Pmc/Curation">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D98</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000823</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000823</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:27828710</idno>
<idno type="wicri:Area/PubMed/Corpus">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000E99</idno>
<idno type="wicri:Area/PubMed/Curation">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000E99</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C62</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000C62</idno>
<idno type="wicri:Area/Ncbi/Merge">001837</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Improving Bloom Filter Performance on Sequence Data Using <italic>k</italic>-mer Bloom Filters</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation><nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation><nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation><nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint><date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Biologie informatique (méthodes)</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<p><bold>Using a sequence's <italic>k</italic>-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data.
As <italic>k</italic>-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for <italic>k</italic>-mer set storage, and Bloom filters (BFs) and their variants are used instead.
BFs reduce the memory footprint required to store millions of <italic>k</italic>-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR).
We show that, because <italic>k</italic>-mers are derived from sequencing reads, the information about <italic>k</italic>-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower.
Alternatively, we can leverage <italic>k</italic>-mer overlap information to store <italic>k</italic>-mer sets in about half the space while maintaining the original FPR.
We consider several variants of such <italic>k</italic>-mer Bloom filters (<italic>k</italic>BFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</bold>
</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Pellow, D" uniqKey="Pellow D">D. Pellow</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Benoit, G" uniqKey="Benoit G">G. Benoit</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bloom, B" uniqKey="Bloom B">B. Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Broder, A" uniqKey="Broder A">A. Broder</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R. Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heo, Y" uniqKey="Heo Y">Y. Heo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Holley, G" uniqKey="Holley G">G. Holley</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Malde, K" uniqKey="Malde K">K. Malde</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G. Marçais</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Patro, R" uniqKey="Patro R">R. Patro</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pell, J" uniqKey="Pell J">J. Pell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pellow, D" uniqKey="Pellow D">D. Pellow</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rozov, R" uniqKey="Rozov R">R. Rozov</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salikhov, K" uniqKey="Salikhov K">K. Salikhov</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Shi, H" uniqKey="Shi H">H. Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Solomon, B" uniqKey="Solomon B">B. Solomon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Song, L" uniqKey="Song L">L. Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Stranneheim, H" uniqKey="Stranneheim H">H. Stranneheim</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D" uniqKey="Wood D">D. Wood</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, Y" uniqKey="Yu Y">Y. Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zerbino, D" uniqKey="Zerbino D">D. Zerbino</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<double pmid="27828710"><pmc><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using <italic>k</italic>-mer Bloom Filters</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation><nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation><nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation><nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27828710</idno>
<idno type="pmc">5467106</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106</idno>
<idno type="RBID">PMC:5467106</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D98</idno>
<idno type="wicri:Area/Pmc/Curation">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D98</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000823</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000823</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Improving Bloom Filter Performance on Sequence Data Using <italic>k</italic>-mer Bloom Filters</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation><nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation><nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation><nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint><date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<p><bold>Using a sequence's <italic>k</italic>-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data.
As <italic>k</italic>-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for <italic>k</italic>-mer set storage, and Bloom filters (BFs) and their variants are used instead.
BFs reduce the memory footprint required to store millions of <italic>k</italic>-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR).
We show that, because <italic>k</italic>-mers are derived from sequencing reads, the information about <italic>k</italic>-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower.
Alternatively, we can leverage <italic>k</italic>-mer overlap information to store <italic>k</italic>-mer sets in about half the space while maintaining the original FPR.
We consider several variants of such <italic>k</italic>-mer Bloom filters (<italic>k</italic>BFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</bold>
</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Pellow, D" uniqKey="Pellow D">D. Pellow</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Benoit, G" uniqKey="Benoit G">G. Benoit</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bloom, B" uniqKey="Bloom B">B. Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Broder, A" uniqKey="Broder A">A. Broder</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R. Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heo, Y" uniqKey="Heo Y">Y. Heo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Holley, G" uniqKey="Holley G">G. Holley</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Malde, K" uniqKey="Malde K">K. Malde</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G. Marçais</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Patro, R" uniqKey="Patro R">R. Patro</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pell, J" uniqKey="Pell J">J. Pell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pellow, D" uniqKey="Pellow D">D. Pellow</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rozov, R" uniqKey="Rozov R">R. Rozov</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salikhov, K" uniqKey="Salikhov K">K. Salikhov</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Shi, H" uniqKey="Shi H">H. Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Solomon, B" uniqKey="Solomon B">B. Solomon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Song, L" uniqKey="Song L">L. Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Stranneheim, H" uniqKey="Stranneheim H">H. Stranneheim</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D" uniqKey="Wood D">D. Wood</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Yu, Y" uniqKey="Yu Y">Y. Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zerbino, D" uniqKey="Zerbino D">D. Zerbino</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1"><nlm:affiliation>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.</nlm:affiliation>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation wicri:level="2"><nlm:affiliation>2 Roche Sequencing Solutions, Pleasanton, California.</nlm:affiliation>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>2 Roche Sequencing Solutions, Pleasanton</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="2"><nlm:affiliation>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:27828710</idno>
<idno type="pmid">27828710</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<idno type="wicri:Area/PubMed/Corpus">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000E99</idno>
<idno type="wicri:Area/PubMed/Curation">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000E99</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C62</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000C62</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.</title>
<author><name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1"><nlm:affiliation>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.</nlm:affiliation>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation wicri:level="2"><nlm:affiliation>2 Roche Sequencing Solutions, Pleasanton, California.</nlm:affiliation>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>2 Roche Sequencing Solutions, Pleasanton</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="2"><nlm:affiliation>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint><date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN (méthodes)</term>
<term>Biologie informatique (méthodes)</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</div>
</front>
</TEI>
</pubmed>
</double>
</record>
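The abstract in the record above describes using k-mer overlap to filter out Bloom filter false positives. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' kBF code. It assumes uppercase ACGT k-mers with no reverse-complement canonicalization and a toy Bloom filter whose h bit positions are derived with `hashlib.blake2b`; the names `BloomFilter`, `kmers`, `neighbor_check_query` and `classic_fpr` are hypothetical. A query is accepted only if the k-mer and at least one k-mer overlapping it by k−1 bases (one of four right extensions or four left extensions) hit the filter. Every k-mer from a read longer than k has such a neighbor in the true set, so those k-mers are never rejected, while a false positive now survives only if one of its neighbors also hits the filter.

```python
import hashlib
import math
from typing import Iterator

# Hypothetical illustration of the k-mer overlap check; not the authors' kBF implementation.


class BloomFilter:
    """Plain Bloom filter over strings, stored as a bytearray of m bits."""

    def __init__(self, m_bits: int, num_hashes: int) -> None:
        self.m = m_bits
        self.h = num_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive h bit positions by hashing "seed:item" with BLAKE2b
        # for h different seeds (chosen for clarity, not speed).
        for seed in range(self.h):
            digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every one of the h bits is set (no false negatives).
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def neighbor_check_query(bf: BloomFilter, kmer: str) -> bool:
    """Accept kmer only if it and at least one overlapping neighbor hit the filter.

    Neighbors share k-1 bases with kmer: kmer[1:] + b (right) or b + kmer[:-1] (left).
    """
    if kmer not in bf:
        return False
    right = (kmer[1:] + b for b in "ACGT")
    left = (b + kmer[:-1] for b in "ACGT")
    return any(n in bf for n in right) or any(n in bf for n in left)


def classic_fpr(m_bits: int, n_items: int, num_hashes: int) -> float:
    """Standard approximation of a plain Bloom filter's FPR: (1 - e^(-hn/m))^h."""
    return (1.0 - math.exp(-num_hashes * n_items / m_bits)) ** num_hashes


# Usage: insert the k-mers of two toy reads, then query.
bf = BloomFilter(m_bits=2000, num_hashes=3)
for read in ["ACGTACGGTACCA", "TTGACCATGCAAT"]:
    for km in kmers(read, 5):
        bf.add(km)
print(neighbor_check_query(bf, "CGTAC"))  # True: a stored k-mer whose neighbors are also stored
```

Because the neighbor check runs only after an ordinary membership hit, its false positives are always a subset of the plain filter's; how large the reduction is depends on m, h and the data. `classic_fpr` gives the usual plain-filter estimate, roughly 0.8% at 10 bits per element with h = 7.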
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001837 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001837 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Ncbi |étape= Merge |type= RBID |clé= PMC:5467106 |texte= Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i -Sk "pubmed:27828710" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.