MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Faucet: streaming de novo assembly graph construction.

Internal identifier: 000B15 (PubMed/Corpus); previous: 000B14; next: 000B16


Authors: Roye Rozov; Gil Goldshlager; Eran Halperin; Ron Shamir

Source:

RBID : pubmed:29036597

English descriptors

Abstract

We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.
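
The streaming idea described in the abstract can be illustrated with a toy sketch. This is not Faucet's implementation: it makes a single pass rather than two, stores k-mers in ordinary Python containers, and ignores reverse complements. The function names (`kmers`, `stream_graph`, `junctions`) and the working definition of a junction (a node with more than one successor or more than one predecessor) are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Iterator, Set


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def stream_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build de Bruijn graph adjacency one read at a time.

    Each read is consumed from the iterable and never stored, so the
    input can be a generator fed by a download in progress.
    """
    graph: Dict[str, Set[str]] = {}
    for read in reads:
        previous = None
        for kmer in kmers(read, k):
            graph.setdefault(kmer, set())      # make sure the node exists
            if previous is not None:
                graph[previous].add(kmer)      # edge between consecutive k-mers
            previous = kmer
    return graph


def junctions(graph: Dict[str, Set[str]]) -> Set[str]:
    """Return nodes with more than one successor or predecessor (assumed definition)."""
    in_degree = {node: 0 for node in graph}
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    return {
        node
        for node, successors in graph.items()
        if len(successors) > 1 or in_degree[node] > 1
    }


# Hypothetical input: an iterator stands in for reads arriving from a download.
toy_reads = iter(["ACGTA", "CGTC"])
toy_graph = stream_graph(toy_reads, k=3)
toy_junctions = junctions(toy_graph)
```

Because `stream_graph` only iterates over its input once, memory use in this sketch depends on the number of distinct k-mers rather than on the number of reads.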

DOI: 10.1093/bioinformatics/btx471
PubMed: 29036597

Links to Exploration step

pubmed:29036597

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Faucet: streaming de novo assembly graph construction.</title>
<author>
<name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation>
<nlm:affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation>
<nlm:affiliation>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation>
<nlm:affiliation>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation>
<nlm:affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29036597</idno>
<idno type="pmid">29036597</idno>
<idno type="doi">10.1093/bioinformatics/btx471</idno>
<idno type="wicri:Area/PubMed/Corpus">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000B15</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Faucet: streaming de novo assembly graph construction.</title>
<author>
<name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation>
<nlm:affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation>
<nlm:affiliation>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation>
<nlm:affiliation>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation>
<nlm:affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Microbiota (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Microbiota</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">29036597</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>09</Month>
<Day>26</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>34</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2018</Year>
<Month>01</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Faucet: streaming de novo assembly graph construction.</ArticleTitle>
<Pagination>
<MedlinePgn>147-154</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btx471</ELocationID>
<Abstract>
<AbstractText Label="Motivation">We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.</AbstractText>
<AbstractText Label="Results">Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14-110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.</AbstractText>
<AbstractText Label="Availability and implementation">Faucet is available at https://github.com/Shamir-Lab/Faucet.</AbstractText>
<AbstractText Label="Contact">rshamir@tau.ac.il or eranhalperin@gmail.com.</AbstractText>
<AbstractText Label="Supplementary information">Supplementary data are available at Bioinformatics online.</AbstractText>
<CopyrightInformation>© The Author 2017. Published by Oxford University Press.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Rozov</LastName>
<ForeName>Roye</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Goldshlager</LastName>
<ForeName>Gil</ForeName>
<Initials>G</Initials>
<AffiliationInfo>
<Affiliation>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Halperin</LastName>
<ForeName>Eran</ForeName>
<Initials>E</Initials>
<AffiliationInfo>
<Affiliation>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Shamir</LastName>
<ForeName>Ron</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D023281" MajorTopicYN="N">Genomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D054892" MajorTopicYN="Y">Metagenome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D064307" MajorTopicYN="N">Microbiota</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2017</Year>
<Month>04</Month>
<Day>12</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2017</Year>
<Month>07</Month>
<Day>21</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2017</Year>
<Month>10</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>9</Month>
<Day>27</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2017</Year>
<Month>10</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29036597</ArticleId>
<ArticleId IdType="pii">4004871</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btx471</ArticleId>
<ArticleId IdType="pmc">PMC5870852</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nat Methods. 2013 Jan;10(1):71-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23160280</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2010 Jun 15;26(12):i367-73</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20529929</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Jun 15;30(12):i293-301</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24931996</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014;15(11):509</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25398208</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 Feb 15;33(4):475-482</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28003256</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Jun 15;32(12):i201-i208</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27307618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 Dec 15;33(24):4024-4032</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27659452</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2015 Mar;33(3):290-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25690850</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2017 Apr 7;45(6):e43</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27924003</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Genet. 2012 Jan 08;44(2):226-32</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22231483</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2015 May 15;31(10):1674-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25609793</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Dec 15;30(24):3541-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25355787</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2013 Sep 16;8(1):22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24040893</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11504945</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2013 Apr 15;29(8):1072-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23422339</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2012 Apr 19;13 Suppl 6:S1</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22537038</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2014 Jul 25;9(7):e101271</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25062443</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 May 1;33(9):1324-1330</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28453674</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Nov 1;32(21):3215-3223</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27412092</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2012 May;19(5):455-77</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22506599</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22847406</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2017 May;27(5):824-834</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28298430</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

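# Point EXPLOR_STEP at the PubMed/Corpus step of the MersV1 exploration area
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
# Select record 000B15 from biblio.hfd, indent the XML, and page through the result
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B15 | SxmlIndent | more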

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000B15 | SxmlIndent | more
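
Outside the Dilib toolchain, the `<pubmed>` part of the record can also be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard library; the `SAMPLE` string is a shortened copy of the record above, and the `summarize` helper is a hypothetical name introduced here. It deliberately covers only the `<pubmed>` element: the TEI part uses `nlm:` and `wicri:` prefixes whose namespace declarations are not shown on this page, so a strict parser would be expected to reject it as displayed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> element from the record above (illustration only).
SAMPLE = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">29036597</PMID>
<Article PubModel="Print">
<ArticleTitle>Faucet: streaming de novo assembly graph construction.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Rozov</LastName><ForeName>Roye</ForeName></Author>
<Author ValidYN="Y"><LastName>Shamir</LastName><ForeName>Ron</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(pubmed_xml: str) -> dict:
    """Extract the PMID, title, and author names from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml)
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": root.findtext("MedlineCitation/Article/ArticleTitle"),
        "authors": authors,
    }


summary = summarize(SAMPLE)
```

In practice the input string could come from the output of the `HfdSelect` command above, after isolating the `<pubmed>` element.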

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:29036597
   |texte=   Faucet: streaming de novo assembly graph construction.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:29036597" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021