MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

AllSome Sequence Bloom Trees.

Internal identifier: 001D87 (Ncbi/Merge); previous: 001D86; next: 001D88

Authors: Chen Sun [United States]; Robert S. Harris [United States]; Rayan Chikhi [France]; Paul Medvedev [United States]

RBID : pubmed:29620920

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at a price of up to 3× memory consumption during queries. Notably, it can query a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.
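
The Sequence Bloom Tree named in the abstract can be pictured roughly as follows. This is a minimal sketch of the general idea as it is commonly described: one Bloom filter of k-mers per experiment at the leaves, unions of the children at internal nodes, and pruning of any subtree in which fewer than a fraction theta of the query k-mers are found. It is not the authors' implementation and it does not model the AllSome refinement. All names (`BloomFilter`, `Node`, `leaf`, `merge`, `query`), the filter size, the hashing scheme, and the toy sequences are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List, Optional, Set


class BloomFilter:
    """Bit vector with several hash functions (sizes chosen arbitrarily)."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 3, bits: int = 0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits  # a Python int serves as the bit vector

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        return BloomFilter(self.num_bits, self.num_hashes, self.bits | other.bits)


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Node:
    bloom: BloomFilter
    name: Optional[str] = None      # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name: str, seq: str, k: int) -> Node:
    """Build a leaf whose filter holds the k-mers of one experiment."""
    bloom = BloomFilter()
    for kmer in kmers(seq, k):
        bloom.add(kmer)
    return Node(bloom, name=name)


def merge(a: Node, b: Node) -> Node:
    """Build an internal node whose filter is the union of its children."""
    return Node(a.bloom.union(b.bloom), left=a, right=b)


def query(node: Node, query_kmers: Set[str], theta: float) -> List[str]:
    """Return leaf names where at least theta of the query k-mers are found."""
    hits = sum(1 for kmer in query_kmers if kmer in node.bloom)
    if hits < theta * len(query_kmers):
        return []  # prune: no leaf below this node can reach the threshold
    if node.name is not None:
        return [node.name]
    return query(node.left, query_kmers, theta) + query(node.right, query_kmers, theta)


# Toy usage with two made-up "experiments".
root = merge(leaf("exp_A", "ACGTACGTGGA", 5), leaf("exp_B", "TTTTTCCCCCG", 5))
print(query(root, kmers("ACGTACGT", 5), theta=0.9))
```

Pruning at internal nodes is what lets a query skip whole groups of experiments. Because Bloom filters can report false positives, the result is a list of candidate experiments rather than a guaranteed match.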

DOI: 10.1089/cmb.2017.0258
PubMed: 29620920

Links to Exploration step

pubmed:29620920

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">AllSome Sequence Bloom Trees.</title>
<author>
<name sortKey="Sun, Chen" sort="Sun, Chen" uniqKey="Sun C" first="Chen" last="Sun">Chen Sun</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Harris, Robert S" sort="Harris, Robert S" uniqKey="Harris R" first="Robert S" last="Harris">Robert S. Harris</name>
<affiliation wicri:level="2">
<nlm:affiliation>2 Department of Biology, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>2 Department of Biology, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chikhi, Rayan" sort="Chikhi, Rayan" uniqKey="Chikhi R" first="Rayan" last="Chikhi">Rayan Chikhi</name>
<affiliation wicri:level="3">
<nlm:affiliation>3 CNRS, CRIStAL, University of Lille , Lille, France .</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>3 CNRS, CRIStAL, University of Lille , Lille</wicri:regionArea>
<placeName>
<region type="region">Hauts-de-France</region>
<region type="old region">Nord-Pas-de-Calais</region>
<settlement type="city">Lille</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Medvedev, Paul" sort="Medvedev, Paul" uniqKey="Medvedev P" first="Paul" last="Medvedev">Paul Medvedev</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29620920</idno>
<idno type="pmid">29620920</idno>
<idno type="doi">10.1089/cmb.2017.0258</idno>
<idno type="wicri:Area/PubMed/Corpus">000945</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000945</idno>
<idno type="wicri:Area/PubMed/Curation">000945</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000945</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000A18</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000A18</idno>
<idno type="wicri:Area/Ncbi/Merge">001D87</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">AllSome Sequence Bloom Trees.</title>
<author>
<name sortKey="Sun, Chen" sort="Sun, Chen" uniqKey="Sun C" first="Chen" last="Sun">Chen Sun</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Harris, Robert S" sort="Harris, Robert S" uniqKey="Harris R" first="Robert S" last="Harris">Robert S. Harris</name>
<affiliation wicri:level="2">
<nlm:affiliation>2 Department of Biology, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>2 Department of Biology, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chikhi, Rayan" sort="Chikhi, Rayan" uniqKey="Chikhi R" first="Rayan" last="Chikhi">Rayan Chikhi</name>
<affiliation wicri:level="3">
<nlm:affiliation>3 CNRS, CRIStAL, University of Lille , Lille, France .</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>3 CNRS, CRIStAL, University of Lille , Lille</wicri:regionArea>
<placeName>
<region type="region">Hauts-de-France</region>
<region type="old region">Nord-Pas-de-Calais</region>
<settlement type="city">Lille</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Medvedev, Paul" sort="Medvedev, Paul" uniqKey="Medvedev P" first="Paul" last="Medvedev">Paul Medvedev</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Blood (metabolism)</term>
<term>Brain (metabolism)</term>
<term>Breast (metabolism)</term>
<term>Computational Biology (methods)</term>
<term>Databases, Nucleic Acid</term>
<term>Female</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
<term>Transcriptome</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Bases de données d'acides nucléiques</term>
<term>Biologie informatique ()</term>
<term>Encéphale (métabolisme)</term>
<term>Femelle</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Région mammaire (métabolisme)</term>
<term>Sang (métabolisme)</term>
<term>Séquençage nucléotidique à haut débit ()</term>
<term>Transcriptome</term>
</keywords>
<keywords scheme="MESH" qualifier="metabolism" xml:lang="en">
<term>Blood</term>
<term>Brain</term>
<term>Breast</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="métabolisme" xml:lang="fr">
<term>Encéphale</term>
<term>Région mammaire</term>
<term>Sang</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Databases, Nucleic Acid</term>
<term>Female</term>
<term>Humans</term>
<term>Software</term>
<term>Transcriptome</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données d'acides nucléiques</term>
<term>Biologie informatique</term>
<term>Femelle</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
<term>Transcriptome</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at a price of up to 3× memory consumption during queries. Notably, it can query a batch of 198,074 queries in &lt;8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in &lt;11 minutes.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">29620920</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>08</Month>
<Day>28</Day>
</DateCompleted>
<DateRevised>
<Year>2019</Year>
<Month>08</Month>
<Day>28</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1557-8666</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>25</Volume>
<Issue>5</Issue>
<PubDate>
<Year>2018</Year>
<Month>05</Month>
</PubDate>
</JournalIssue>
<Title>Journal of computational biology : a journal of computational molecular cell biology</Title>
<ISOAbbreviation>J. Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>AllSome Sequence Bloom Trees.</ArticleTitle>
<Pagination>
<MedlinePgn>467-479</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1089/cmb.2017.0258</ELocationID>
<Abstract>
<AbstractText>The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at a price of up to 3× memory consumption during queries. Notably, it can query a batch of 198,074 queries in &lt;8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in &lt;11 minutes.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Sun</LastName>
<ForeName>Chen</ForeName>
<Initials>C</Initials>
<AffiliationInfo>
<Affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Harris</LastName>
<ForeName>Robert S</ForeName>
<Initials>RS</Initials>
<AffiliationInfo>
<Affiliation>2 Department of Biology, Pennsylvania State University , University Park, Pennsylvania.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Chikhi</LastName>
<ForeName>Rayan</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>3 CNRS, CRIStAL, University of Lille , Lille, France .</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Medvedev</LastName>
<ForeName>Paul</ForeName>
<Initials>P</Initials>
<AffiliationInfo>
<Affiliation>1 Department of Computer Science and Engineering, Pennsylvania State University , University Park, Pennsylvania.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>4 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park , Pennsylvania.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>5 Genome Sciences Institute of the Huck, Pennsylvania State University, University Park , Pennsylvania.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2018</Year>
<Month>04</Month>
<Day>05</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Comput Biol</MedlineTA>
<NlmUniqueID>9433358</NlmUniqueID>
<ISSNLinking>1066-5277</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001769" MajorTopicYN="N">Blood</DescriptorName>
<QualifierName UI="Q000378" MajorTopicYN="N">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001921" MajorTopicYN="N">Brain</DescriptorName>
<QualifierName UI="Q000378" MajorTopicYN="N">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D001940" MajorTopicYN="N">Breast</DescriptorName>
<QualifierName UI="Q000378" MajorTopicYN="N">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D030561" MajorTopicYN="Y">Databases, Nucleic Acid</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D005260" MajorTopicYN="N">Female</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059467" MajorTopicYN="N">Transcriptome</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="Y">Bloom filters</Keyword>
<Keyword MajorTopicYN="Y">RNA-seq</Keyword>
<Keyword MajorTopicYN="Y">Sequence Bloom Trees</Keyword>
<Keyword MajorTopicYN="Y">algorithms</Keyword>
<Keyword MajorTopicYN="Y">bioinformatics</Keyword>
<Keyword MajorTopicYN="Y">data structures</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>4</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>8</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>4</Month>
<Day>6</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29620920</ArticleId>
<ArticleId IdType="doi">10.1089/cmb.2017.0258</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>France</li>
<li>États-Unis</li>
</country>
<region>
<li>Hauts-de-France</li>
<li>Nord-Pas-de-Calais</li>
<li>Pennsylvanie</li>
</region>
<settlement>
<li>Lille</li>
</settlement>
</list>
<tree>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Sun, Chen" sort="Sun, Chen" uniqKey="Sun C" first="Chen" last="Sun">Chen Sun</name>
</region>
<name sortKey="Harris, Robert S" sort="Harris, Robert S" uniqKey="Harris R" first="Robert S" last="Harris">Robert S. Harris</name>
<name sortKey="Medvedev, Paul" sort="Medvedev, Paul" uniqKey="Medvedev P" first="Paul" last="Medvedev">Paul Medvedev</name>
</country>
<country name="France">
<region name="Hauts-de-France">
<name sortKey="Chikhi, Rayan" sort="Chikhi, Rayan" uniqKey="Chikhi R" first="Rayan" last="Chikhi">Rayan Chikhi</name>
</region>
</country>
</tree>
</affiliations>
</record>
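
The `<pubmed>` part of the record uses no namespace prefixes, so it can be read with a standard XML parser. The TEI part uses `wicri:` and `nlm:` prefixes whose declarations are not shown here, and a strict parser would likely reject it as is. The following is a minimal sketch, assuming the `<pubmed>` element has been extracted into a string. The fragment is a shortened copy of the record (two of the four authors), and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> element from the record (illustration only).
PUBMED_FRAGMENT = """
<pubmed>
  <MedlineCitation Status="MEDLINE" Owner="NLM">
    <PMID Version="1">29620920</PMID>
    <Article PubModel="Print-Electronic">
      <ArticleTitle>AllSome Sequence Bloom Trees.</ArticleTitle>
      <AuthorList CompleteYN="Y">
        <Author ValidYN="Y"><LastName>Sun</LastName><ForeName>Chen</ForeName></Author>
        <Author ValidYN="Y"><LastName>Medvedev</LastName><ForeName>Paul</ForeName></Author>
      </AuthorList>
    </Article>
  </MedlineCitation>
</pubmed>
"""


def summarize(pubmed_xml: str) -> dict:
    """Extract the PMID, title, and author names from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml.strip())
    citation = root.find("MedlineCitation")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in citation.iterfind("Article/AuthorList/Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        "title": citation.findtext("Article/ArticleTitle"),
        "authors": authors,
    }


summary = summarize(PUBMED_FRAGMENT)
print(summary)
```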

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D87 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001D87 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:29620920
   |texte=   AllSome Sequence Bloom Trees.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:29620920" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021