MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

In silico read normalization using set multi-cover optimization.

Internal identifier: 000864 (PubMed/Curation); previous: 000863; next: 000865

Authors: Dilip A. Durai [Germany]; Marcel H. Schulz [Germany]

Source: Bioinformatics (Oxford, England), 2018; 34(19): 3273-3280

RBID : pubmed:29912280

French descriptors

English descriptors

Abstract

De Bruijn graphs are a common assembly data structure for sequencing datasets. But with the advances in sequencing technologies, assembling high coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee to preserve important k-mers that form connections between regions in the graph.

DOI: 10.1093/bioinformatics/bty307
PubMed: 29912280
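
To make the set multi-cover framing in the title concrete, here is a minimal Python sketch. It is not the ORNA implementation. It treats each k-mer as an element that must be covered a required number of times and each read as a set of k-mers, then keeps a read only if it still helps some under-covered k-mer. The function names, the single greedy pass in input order, and the requirement max(1, ceil(log2(abundance))) are illustrative assumptions, not details taken from this record.

```python
import math
from collections import Counter


def kmers(read, k):
    """Return the k-mers of a read in order (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_reads(reads, k):
    """Greedy set multi-cover sketch: keep a read if it covers a k-mer
    that has not yet reached its required coverage."""
    # Abundance of every k-mer in the full dataset.
    abundance = Counter(km for read in reads for km in kmers(read, k))

    # Assumed requirement (illustrative only): logarithmic in the abundance,
    # but never below 1 so that no k-mer is lost.
    required = {km: max(1, math.ceil(math.log2(n))) for km, n in abundance.items()}

    covered = Counter()  # how often each k-mer occurs in the kept reads
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Reads shorter than k have no k-mers and are therefore dropped.
        if any(covered[km] < required[km] for km in read_kmers):
            kept.append(read)
            covered.update(read_kmers)
    return kept


if __name__ == "__main__":
    example = ["ACGT"] * 8 + ["TTTT"]
    print(normalize_reads(example, 3))
```

On eight copies of ACGT plus one TTTT read with k = 3, this keeps three copies of ACGT and the TTTT read, so every k-mer of the input still occurs in the output. A single greedy pass like this preserves all k-mers but gives no guarantee that the kept set is the smallest possible one.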

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">In silico read normalization using set multi-cover optimization.</title>
<author>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29912280</idno>
<idno type="pmid">29912280</idno>
<idno type="doi">10.1093/bioinformatics/bty307</idno>
<idno type="wicri:Area/PubMed/Corpus">000864</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000864</idno>
<idno type="wicri:Area/PubMed/Curation">000864</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000864</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">In silico read normalization using set multi-cover optimization.</title>
<author>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
<affiliation wicri:level="1">
<nlm:affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Computer Simulation</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Biologie informatique</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Computer Simulation</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Biologie informatique</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">De Bruijn graphs are a common assembly data structure for sequencing datasets. But with the advances in sequencing technologies, assembling high coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee to preserve important k-mers that form connections between regions in the graph.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">29912280</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>10</Month>
<Day>21</Day>
</DateCompleted>
<DateRevised>
<Year>2019</Year>
<Month>10</Month>
<Day>22</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>34</Volume>
<Issue>19</Issue>
<PubDate>
<Year>2018</Year>
<Month>10</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>In silico read normalization using set multi-cover optimization.</ArticleTitle>
<Pagination>
<MedlinePgn>3273-3280</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bty307</ELocationID>
<Abstract>
<AbstractText Label="Motivation">De Bruijn graphs are a common assembly data structure for sequencing datasets. But with the advances in sequencing technologies, assembling high coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee to preserve important k-mers that form connections between regions in the graph.</AbstractText>
<AbstractText Label="Results">Here, normalization is phrased as a set multi-cover problem on reads and a heuristic algorithm, Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values. It was compared to the current normalization algorithms and was found to be performing better. Normalizing error corrected data allows for more accurate assemblies compared to the normalized uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would have been missed otherwise. Finally, ORNA is a general purpose normalization algorithm that is fast and significantly reduces datasets with loss of assembly quality in between [1, 30]% depending on reduction stringency.</AbstractText>
<AbstractText Label="Availability and implementation">ORNA is available at https://github.com/SchulzLab/ORNA.</AbstractText>
<AbstractText Label="Supplementary information">Supplementary data are available at Bioinformatics online.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Durai</LastName>
<ForeName>Dilip A</ForeName>
<Initials>DA</Initials>
<AffiliationInfo>
<Affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Schulz</LastName>
<ForeName>Marcel H</ForeName>
<Initials>MH</Initials>
<AffiliationInfo>
<Affiliation>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003198" MajorTopicYN="Y">Computer Simulation</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017423" MajorTopicYN="Y">Sequence Analysis, RNA</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2017</Year>
<Month>06</Month>
<Day>19</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2018</Year>
<Month>04</Month>
<Day>18</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>6</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>10</Month>
<Day>23</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>6</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">29912280</ArticleId>
<ArticleId IdType="pii">4975418</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/bty307</ArticleId>
<ArticleId IdType="pmc">PMC6157080</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Nucleic Acids Res. 2013 May 1;41(10):e109</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23558750</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Jun 15;32(12):i192-i200</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27307617</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2010 Nov;7(11):909-12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20935650</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>F1000Res. 2015 Sep 25;4:900</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26535114</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2002 Apr;12(4):656-64</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11932250</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2012 Dec 1;28(23):3150-2</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23060610</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2012 Apr 15;28(8):1086-92</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22368243</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2013 Dec 10;110(50):E4821-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24282307</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2014 Feb 24;9(1):2</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24565280</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Jun 1;32(11):1670-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27153653</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014 Dec 21;15(12):553</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25608678</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Jun 15;32(12):i201-i208</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27307618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Protoc. 2013 Aug;8(8):1494-512</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23845962</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2011 May 15;29(7):644-52</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21572440</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2014 Nov 19;15:357</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25407910</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Front Genet. 2014 Jan 31;5:13</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24567737</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2012 Sep;22(9):1760-74</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22955987</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2012 Dec 21;338(6114):1587-93</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23258890</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Oct 15;30(20):2959-61</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24990603</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Front Genet. 2014 Feb 12;5:17</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24575122</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Gigascience. 2015 Oct 19;4:48</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26500767</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Biotechnol. 2012 Jul 10;30(7):627-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22781691</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Genet. 2013 May;14(5):333-46</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23594911</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Front Genet. 2016 Jan 11;6:361</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26793234</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014 Aug 13;15(8):429</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25116943</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2015 Jan;43(Database issue):D662-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25352552</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2017 Apr;14(4):417-419</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28263959</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22847406</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genomics. 2010 Jun;95(6):315-27</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20211242</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
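
For readers without the Dilib tools, the `<pubmed>` half of the record can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of that element; the `summarize` helper and the `FRAGMENT` constant are names invented for illustration. The `<TEI>` half, as printed here, uses the `wicri:` and `nlm:` prefixes without declaring them, so a namespace-aware parser will likely reject the full record unless those declarations are supplied.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> element from the record above.
FRAGMENT = """<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">29912280</PMID>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>34</Volume>
<Issue>19</Issue>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
</Journal>
<ArticleTitle>In silico read normalization using set multi-cover optimization.</ArticleTitle>
<Pagination>
<MedlinePgn>3273-3280</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/bty307</ELocationID>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text):
    """Extract a few bibliographic fields from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    article = root.find("MedlineCitation/Article")
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": article.findtext("ArticleTitle"),
        "journal": article.findtext("Journal/Title"),
        "volume": article.findtext("Journal/JournalIssue/Volume"),
        "issue": article.findtext("Journal/JournalIssue/Issue"),
        "pages": article.findtext("Pagination/MedlinePgn"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
    }


if __name__ == "__main__":
    for field, value in summarize(FRAGMENT).items():
        print(f"{field}: {value}")
```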

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000864 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 000864 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:29912280
   |texte=   In silico read normalization using set multi-cover optimization.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:29912280" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021