MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters

Internal identifier: 001837 (Ncbi/Merge); previous: 001836; next: 001838

Authors: David Pellow; Darya Filippova; Carl Kingsford

Source: Journal of Computational Biology (2017)

RBID : PMC:5467106

Abstract

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.
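The overlap idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `BloomFilter`, the helpers `kmers` and `neighbors`, and the function `overlap_query` are hypothetical names chosen for this example, and the hashing scheme (double hashing over a BLAKE2b digest) is an assumption. The sketch accepts a k-mer only if the filter reports both the k-mer itself and at least one k-mer that overlaps it by k-1 bases, which is one simple way to use overlap information to reject isolated false positives.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned for speed)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit halves
        # of a single 128-bit digest (an assumption of this sketch).
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """All overlapping substrings of length k, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def neighbors(kmer):
    """The eight k-mers that overlap `kmer` by k-1 bases (four per side)."""
    for base in "ACGT":
        yield kmer[1:] + base   # possible successor in the read
        yield base + kmer[:-1]  # possible predecessor in the read


def overlap_query(bf, kmer):
    """Report presence only if the k-mer and at least one neighbor are in bf."""
    return kmer in bf and any(n in bf for n in neighbors(kmer))


# Build a filter from one toy read and query it.
read = "ACGTTGCAAGGCTA"
bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
for km in kmers(read, 5):
    bf.add(km)
```

Every k-mer taken from a read longer than k has a true neighbor on at least one side, so `overlap_query` introduces no false negatives for such k-mers, while a random false positive must coincide with a second hit among its eight neighbors to be accepted. The extra neighbor lookups are also why queries of this kind are somewhat slower than a plain Bloom filter query.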


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106
DOI: 10.1089/cmb.2016.0155
PubMed: 27828710
PubMed Central: 5467106

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using
<italic>k</italic>
-mer Bloom Filters</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27828710</idno>
<idno type="pmc">5467106</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106</idno>
<idno type="RBID">PMC:5467106</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D98</idno>
<idno type="wicri:Area/Pmc/Curation">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D98</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000823</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000823</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:27828710</idno>
<idno type="wicri:Area/PubMed/Corpus">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000E99</idno>
<idno type="wicri:Area/PubMed/Curation">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000E99</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C62</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000C62</idno>
<idno type="wicri:Area/Ncbi/Merge">001837</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Improving Bloom Filter Performance on Sequence Data Using
<italic>k</italic>
-mer Bloom Filters</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Biologie informatique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>
<bold>Using a sequence's
<italic>k</italic>
-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As
<italic>k</italic>
-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for
<italic>k</italic>
-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of
<italic>k</italic>
-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because
<italic>k</italic>
-mers are derived from sequencing reads, the information about
<italic>k</italic>
-mer overlap in the original sequence can be used to reduce the FPR up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage
<italic>k</italic>
-mer overlap information to store
<italic>k</italic>
-mer sets in about half the space while maintaining the original FPR. We consider several variants of such
<italic>k</italic>
-mer Bloom filters (
<italic>k</italic>
BFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</bold>
</p>
</div>
</front>
</TEI>
<double pmid="27828710">
<pmc>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using
<italic>k</italic>
-mer Bloom Filters</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27828710</idno>
<idno type="pmc">5467106</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467106</idno>
<idno type="RBID">PMC:5467106</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D98</idno>
<idno type="wicri:Area/Pmc/Curation">000D98</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D98</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000823</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000823</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Improving Bloom Filter Performance on Sequence Data Using
<italic>k</italic>
-mer Bloom Filters</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>
<bold>Using a sequence's
<italic>k</italic>
-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As
<italic>k</italic>
-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for
<italic>k</italic>
-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of
<italic>k</italic>
-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because
<italic>k</italic>
-mers are derived from sequencing reads, the information about
<italic>k</italic>
-mer overlap in the original sequence can be used to reduce the FPR up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage
<italic>k</italic>
-mer overlap information to store
<italic>k</italic>
-mer sets in about half the space while maintaining the original FPR. We consider several variants of such
<italic>k</italic>
-mer Bloom filters (
<italic>k</italic>
BFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</bold>
</p>
</div>
</front>
</TEI>
</pmc>
<pubmed>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1">
<nlm:affiliation>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.</nlm:affiliation>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation wicri:level="2">
<nlm:affiliation>2 Roche Sequencing Solutions, Pleasanton, California.</nlm:affiliation>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>2 Roche Sequencing Solutions, Pleasanton</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="2">
<nlm:affiliation>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2017">2017</date>
<idno type="RBID">pubmed:27828710</idno>
<idno type="pmid">27828710</idno>
<idno type="doi">10.1089/cmb.2016.0155</idno>
<idno type="wicri:Area/PubMed/Corpus">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000E99</idno>
<idno type="wicri:Area/PubMed/Curation">000E99</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000E99</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000C62</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000C62</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.</title>
<author>
<name sortKey="Pellow, David" sort="Pellow, David" uniqKey="Pellow D" first="David" last="Pellow">David Pellow</name>
<affiliation wicri:level="1">
<nlm:affiliation>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.</nlm:affiliation>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Filippova, Darya" sort="Filippova, Darya" uniqKey="Filippova D" first="Darya" last="Filippova">Darya Filippova</name>
<affiliation wicri:level="2">
<nlm:affiliation>2 Roche Sequencing Solutions, Pleasanton, California.</nlm:affiliation>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
<wicri:cityArea>2 Roche Sequencing Solutions, Pleasanton</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kingsford, Carl" sort="Kingsford, Carl" uniqKey="Kingsford C" first="Carl" last="Kingsford">Carl Kingsford</name>
<affiliation wicri:level="2">
<nlm:affiliation>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>3 Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2017" type="published">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Biologie informatique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computer Simulation</term>
<term>Humans</term>
<term>Probability</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Biologie informatique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Probabilité</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.</div>
</front>
</TEI>
</pubmed>
</double>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001837 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001837 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:5467106
   |texte=   Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:27828710" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021