Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
Internal identifier: 000394 (Main/Exploration); previous: 000393; next: 000395
Authors: Kanak Mahadik [United States]; Christopher Wright [United States]; Milind Kulkarni [United States]; Saurabh Bagchi [United States]; Somali Chaterji [United States]
Source:
- Scientific Reports [2045-2322]; 2019.
Abstract
Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable
DOI: 10.1038/s41598-019-51284-9
PubMed: 31619717
PubMed Central: 6795807
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000461
- to stream Pmc, to step Curation: 000461
- to stream Pmc, to step Checkpoint: 000190
- to stream PubMed, to step Corpus: 000393
- to stream PubMed, to step Curation: 000393
- to stream PubMed, to step Checkpoint: 000392
- to stream Ncbi, to step Merge: 002369
- to stream Ncbi, to step Curation: 002369
- to stream Ncbi, to step Checkpoint: 002369
- to stream Main, to step Merge: 000397
- to stream Main, to step Curation: 000394
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Scalable Genome Assembly through Parallel <italic>de Bruijn</italic>
Graph Construction for Multiple <italic>k</italic>
-mers</title>
<author><name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">31619717</idno>
<idno type="pmc">6795807</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6795807</idno>
<idno type="RBID">PMC:6795807</idno>
<idno type="doi">10.1038/s41598-019-51284-9</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000461</idno>
<idno type="wicri:Area/Pmc/Curation">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000461</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000190</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000190</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:31619717</idno>
<idno type="wicri:Area/PubMed/Corpus">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000393</idno>
<idno type="wicri:Area/PubMed/Curation">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000393</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000392</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000392</idno>
<idno type="wicri:Area/Ncbi/Merge">002369</idno>
<idno type="wicri:Area/Ncbi/Curation">002369</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">002369</idno>
<idno type="wicri:Area/Main/Merge">000397</idno>
<idno type="wicri:Area/Main/Curation">000394</idno>
<idno type="wicri:Area/Main/Exploration">000394</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Scalable Genome Assembly through Parallel <italic>de Bruijn</italic>
Graph Construction for Multiple <italic>k</italic>
-mers</title>
<author><name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1"><nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName><region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series><title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint><date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p id="Par1">Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, the unavailability of highly parallel and scalable <italic>de novo</italic>
assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular <italic>de Bruijn</italic>
graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of <italic>k</italic>
-values used in the construction of de Bruijn graphs (DBG). However, this process of <italic>sequentially</italic>
iterating from small to large <italic>k</italic>
-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct <italic>k</italic>
-value in parallel. We develop an innovative mechanism to “patch” a higher <italic>k</italic>
-valued graph with contigs generated from a lower <italic>k</italic>
-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes <italic>simultaneously</italic>
. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).</p>
</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Indiana</li>
</region>
</list>
<tree><country name="États-Unis"><noRegion><name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
</noRegion>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
</country>
</tree>
</affiliations>
</record>
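The abstract in the record above describes building a de Bruijn graph for each k-value in parallel and then "patching" a higher-k graph with contigs from a lower-k graph. The following is a minimal Python sketch of that idea, not ScalaDBG's implementation. The names `build_dbg`, `build_all` and `patch` are hypothetical, the graph is a plain dictionary from each (k-1)-mer to its successors, and `patch` assumes the simplest possible reading of patching: the lower-k contigs are treated as extra reads for the higher-k graph. Threads are used only to show the structure; in CPython they will not speed up this CPU-bound work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_dbg(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix,
    where prefix and suffix are its two overlapping (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def build_all(reads, k_values, workers=None):
    """Build one graph per k-value concurrently instead of one after another.
    Each graph depends only on the reads, so the tasks are independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        graphs = pool.map(lambda k: build_dbg(reads, k), k_values)
        return dict(zip(k_values, graphs))


def patch(graph_high, contigs, k_high):
    """Assumed patching step: add the k_high-mers found in lower-k contigs
    to a copy of the higher-k graph. The input graph is left unchanged."""
    patched = {node: set(nbrs) for node, nbrs in graph_high.items()}
    for node, nbrs in build_dbg(contigs, k_high).items():
        patched.setdefault(node, set()).update(nbrs)
    return patched


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]          # toy reads, not real data
    graphs = build_all(reads, [3, 4])
    print(graphs[3])
    print(patch(graphs[4], ["TACGTT"], 4))
```

The point of the sketch is the dependency structure: graph construction for different k-values shares no state, so only the patching step needs results from a smaller k.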
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000394 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000394 | SxmlIndent | more
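Once a command like the ones above has printed the record, its identifiers can be collected with a few lines of Python. This is a minimal sketch, assuming the record text has already been captured into a string; `extract_idnos` is a hypothetical helper and not part of Dilib. It uses a regular expression rather than an XML parser because the record as shown uses prefixes such as `wicri:` and `nlm:` without namespace declarations, which strict parsers typically reject.

```python
import re

# Matches <idno type="...">value</idno>, ignoring any further attributes.
IDNO_PATTERN = re.compile(r'<idno type="([^"]+)"[^>]*>([^<]*)</idno>')


def extract_idnos(record_xml):
    """Return {idno type: [values]} for every <idno> element in the text."""
    found = {}
    for idno_type, value in IDNO_PATTERN.findall(record_xml):
        found.setdefault(idno_type, []).append(value)
    return found


if __name__ == "__main__":
    # A fragment copied from the record's publicationStmt.
    sample = (
        '<publicationStmt><idno type="wicri:source">PMC</idno>\n'
        '<idno type="pmid">31619717</idno>\n'
        '<idno type="pmc">6795807</idno>\n'
        '<idno type="doi">10.1038/s41598-019-51284-9</idno>\n'
        '</publicationStmt>'
    )
    print(extract_idnos(sample))
```

Values are kept in lists because some types, such as `wicri:explorRef`, occur several times in one record.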
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:6795807 |texte= Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:31619717" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.