MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

Internal identifier: 001699 (Ncbi/Merge); previous: 001698; next: 001700

Authors: Sara El-Metwally [Egypt]; Magdi Zakaria [Egypt]; Taher Hamza [Egypt]

Source: Bioinformatics (Oxford, England) [1367-4811]; 2016

RBID : pubmed:27412092

French descriptors

English descriptors

Abstract

The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory.
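The abstract above mentions data structures that keep the assembly graph compact in memory, and the RESULTS text in the PubMed record names Bloom filters over k-mers (fixed-length substrings of reads) as the structure LightAssembler relies on. The following is a minimal Python sketch of that general idea only. It is not the authors' implementation: the class `BloomFilter`, the helpers `kmers` and `build_filter`, the SHA-256 double hashing, and the default sizes are all assumptions made for illustration, and the sketch omits the gapped sampling, the statistical test, and the cache-oblivious layout described in the record.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: derive all bit positions from one SHA-256 digest
        # using double hashing (h1 + i * h2).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_filter(reads: Iterable[str], k: int,
                 num_bits: int = 1 << 16, num_hashes: int = 3) -> BloomFilter:
    """Insert all k-mers of all reads into a single Bloom filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom
```

The memory used is fixed by `num_bits` regardless of how many k-mers are inserted, which is the property that makes this kind of structure attractive when the full graph does not fit in memory; the price is a tunable false-positive rate on membership queries.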

DOI: 10.1093/bioinformatics/btw470
PubMed: 27412092

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.</title>
<author>
<name sortKey="El Metwally, Sara" sort="El Metwally, Sara" uniqKey="El Metwally S" first="Sara" last="El-Metwally">Sara El-Metwally</name>
<affiliation wicri:level="4">
<nlm:affiliation>Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<orgName type="university">Université de Californie du Sud</orgName>
<placeName>
<settlement type="city">Los Angeles</settlement>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zakaria, Magdi" sort="Zakaria, Magdi" uniqKey="Zakaria M" first="Magdi" last="Zakaria">Magdi Zakaria</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<wicri:noRegion>Mansoura 35516</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hamza, Taher" sort="Hamza, Taher" uniqKey="Hamza T" first="Taher" last="Hamza">Taher Hamza</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<wicri:noRegion>Mansoura 35516</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:27412092</idno>
<idno type="pmid">27412092</idno>
<idno type="doi">10.1093/bioinformatics/btw470</idno>
<idno type="wicri:Area/PubMed/Corpus">000F22</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000F22</idno>
<idno type="wicri:Area/PubMed/Curation">000F22</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000F22</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001029</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001029</idno>
<idno type="wicri:Area/Ncbi/Merge">001699</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.</title>
<author>
<name sortKey="El Metwally, Sara" sort="El Metwally, Sara" uniqKey="El Metwally S" first="Sara" last="El-Metwally">Sara El-Metwally</name>
<affiliation wicri:level="4">
<nlm:affiliation>Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<orgName type="university">Université de Californie du Sud</orgName>
<placeName>
<settlement type="city">Los Angeles</settlement>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zakaria, Magdi" sort="Zakaria, Magdi" uniqKey="Zakaria M" first="Magdi" last="Zakaria">Magdi Zakaria</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<wicri:noRegion>Mansoura 35516</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hamza, Taher" sort="Hamza, Taher" uniqKey="Hamza T" first="Taher" last="Hamza">Taher Hamza</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516</wicri:regionArea>
<wicri:noRegion>Mansoura 35516</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Genome</term>
<term>Genomics</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Animaux</term>
<term>Génome</term>
<term>Génomique</term>
<term>Humains</term>
<term>Séquençage nucléotidique à haut débit (méthodes)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>High-Throughput Nucleotide Sequencing</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Animals</term>
<term>Genome</term>
<term>Genomics</term>
<term>Humans</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Animaux</term>
<term>Génome</term>
<term>Génomique</term>
<term>Humains</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">27412092</PMID>
<DateCompleted>
<Year>2017</Year>
<Month>08</Month>
<Day>15</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>32</Volume>
<Issue>21</Issue>
<PubDate>
<Year>2016</Year>
<Month>11</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.</ArticleTitle>
<Pagination>
<MedlinePgn>3215-3223</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText Label="MOTIVATION">The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in computer memory.</AbstractText>
<AbstractText Label="RESULTS">LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION">https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.eg Supplementary information: Supplementary data are available at Bioinformatics online.</AbstractText>
<CopyrightInformation>© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>El-Metwally</LastName>
<ForeName>Sara</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zakaria</LastName>
<ForeName>Magdi</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Hamza</LastName>
<ForeName>Taher</ForeName>
<Initials>T</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>07</Month>
<Day>13</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016678" MajorTopicYN="N">Genome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D023281" MajorTopicYN="N">Genomics</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="N">Sequence Analysis, DNA</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>04</Month>
<Day>02</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>06</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>10</Month>
<Day>30</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2017</Year>
<Month>8</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>7</Month>
<Day>15</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27412092</ArticleId>
<ArticleId IdType="pii">btw470</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btw470</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Égypte</li>
</country>
<region>
<li>Californie</li>
</region>
<settlement>
<li>Los Angeles</li>
</settlement>
<orgName>
<li>Université de Californie du Sud</li>
</orgName>
</list>
<tree>
<country name="Égypte">
<region name="Californie">
<name sortKey="El Metwally, Sara" sort="El Metwally, Sara" uniqKey="El Metwally S" first="Sara" last="El-Metwally">Sara El-Metwally</name>
</region>
<name sortKey="Hamza, Taher" sort="Hamza, Taher" uniqKey="Hamza T" first="Taher" last="Hamza">Taher Hamza</name>
<name sortKey="Zakaria, Magdi" sort="Zakaria, Magdi" uniqKey="Zakaria M" first="Magdi" last="Zakaria">Magdi Zakaria</name>
</country>
</tree>
</affiliations>
</record>
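As a rough illustration of how the record above could be read programmatically, the following Python sketch extracts the PMID, title and author names from a fragment of the `<pubmed>` part using the standard-library `xml.etree.ElementTree` module. The `FRAGMENT` string and the `summarize` function are hypothetical names introduced here, and the fragment is a hand-trimmed copy of elements shown in the record. Parsing the full `<record>` this way would probably fail as displayed, because prefixes such as `wicri:` and `nlm:` appear without namespace declarations, so the sketch deliberately sticks to the prefix-free PubMed elements.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed copy of the prefix-free PubMed elements from the record.
FRAGMENT = """
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">27412092</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>El-Metwally</LastName><ForeName>Sara</ForeName></Author>
<Author ValidYN="Y"><LastName>Zakaria</LastName><ForeName>Magdi</ForeName></Author>
<Author ValidYN="Y"><LastName>Hamza</LastName><ForeName>Taher</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
"""


def summarize(xml_text: str) -> dict:
    """Return PMID, title and 'ForeName LastName' authors from a MedlineCitation."""
    root = ET.fromstring(xml_text.strip())
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iterfind("Article/AuthorList/Author")
    ]
    return {
        "pmid": root.findtext("PMID"),
        "title": root.findtext("Article/ArticleTitle"),
        "authors": authors,
    }


if __name__ == "__main__":
    info = summarize(FRAGMENT)
    print(info["pmid"], info["title"])
    print("; ".join(info["authors"]))
```

The same path expressions should apply to the `<MedlineCitation>` element of the full record once it has been isolated from the namespaced TEI part.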

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001699 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001699 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     pubmed:27412092
   |texte=   LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:27412092" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021