MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

In silico read normalization using set multi-cover optimization

Internal identifier: 000A98 (Main/Exploration); previous: 000A97; next: 000A99

Authors: Dilip A. Durai [Germany]; Marcel H. Schulz [Germany]

Source: Bioinformatics (ISSN 1367-4803), 2018

RBID : PMC:6157080

Abstract

Motivation

De Bruijn graphs are a common assembly data structure for sequencing datasets. However, with advances in sequencing technologies, assembling high-coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, do not guarantee the preservation of important k-mers that form connections between regions in the graph.
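
To make the concern about lost connections concrete, here is a minimal sketch in Python. It is illustrative only: the function name `de_bruijn_edges`, the toy reads, and the convention that each k-mer is an edge between its (k-1)-mer prefix and suffix are assumptions made for this example, not taken from the paper. It shows how discarding one read can remove the only k-mers linking two regions of the graph.

```python
def de_bruijn_edges(reads, k):
    """Map each k-mer to an edge between its (k-1)-mer prefix and suffix."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges


# Toy reads (hypothetical): the middle read bridges the two outer ones.
reads = ["AAACCC", "CCCGGG", "GGGTTT"]
full = de_bruijn_edges(reads, k=4)

# Dropping the middle read removes the k-mers that connect the outer regions.
reduced = de_bruijn_edges([reads[0], reads[2]], k=4)
lost = full - reduced
```

Here `lost` holds exactly the edges contributed by the bridging read, so the reduced graph splits into two disconnected parts.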

Results

Here, normalization is phrased as a set multi-cover problem on reads, and a heuristic algorithm, Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values and performed better than current normalization algorithms. Normalizing error-corrected data allows for more accurate assemblies than normalizing the uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would otherwise have been missed. Finally, ORNA is a general-purpose normalization algorithm that is fast and significantly reduces datasets, with a loss of assembly quality between 1% and 30% depending on reduction stringency.
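
The set multi-cover view of normalization can be illustrated with a small greedy sketch in Python. This is not the ORNA implementation. The function names `kmers` and `normalize`, the logarithmic per-k-mer quota, and the default `base` value are assumptions made here for illustration. Each k-mer that occurs `a` times in the input is assigned a quota of `max(1, ceil(log_base(a)))`, and a read is kept only if at least one of its k-mers has not yet met its quota.

```python
import math
from collections import Counter


def kmers(read, k):
    """Return all overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k, base=1.7):
    """Greedy set multi-cover sketch (illustrative, not the ORNA code).

    Assumption: a k-mer seen `a` times in the input should be covered
    max(1, ceil(log_base(a))) times by the reads that are kept.
    """
    abundance = Counter(km for r in reads for km in kmers(r, k))
    quota = {km: max(1, math.ceil(math.log(a, base)))
             for km, a in abundance.items()}
    covered = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Keep the read if at least one of its k-mers still needs coverage.
        if any(covered[km] < quota[km] for km in read_kmers):
            kept.append(read)
            covered.update(read_kmers)
    return kept


# Toy input (hypothetical): one highly redundant read and one unique read.
reads = ["ACGTAC"] * 10 + ["TTGACA"]
kept = normalize(reads, k=3)
```

In this toy input the redundant read is thinned out, while the unique read is kept because its k-mers are otherwise uncovered. Every k-mer of the input therefore still occurs in the reduced set, with abundant k-mers retaining proportionally higher coverage.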

Availability and implementation

ORNA is available at https://github.com/SchulzLab/ORNA.

Supplementary information

Supplementary data are available at Bioinformatics online.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157080
DOI: 10.1093/bioinformatics/bty307
PubMed: 29912280
PubMed Central: 6157080


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">
<italic>In silico</italic>
read normalization using set multi-cover optimization</title>
<author>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff1">Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff2">Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff3">Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff1">Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff2">Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">29912280</idno>
<idno type="pmc">6157080</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157080</idno>
<idno type="RBID">PMC:6157080</idno>
<idno type="doi">10.1093/bioinformatics/bty307</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000B21</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B21</idno>
<idno type="wicri:Area/Pmc/Curation">000B21</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B21</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000660</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000660</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29912280</idno>
<idno type="wicri:Area/PubMed/Corpus">000864</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000864</idno>
<idno type="wicri:Area/PubMed/Curation">000864</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000864</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000885</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000885</idno>
<idno type="wicri:Area/Ncbi/Merge">001E69</idno>
<idno type="wicri:Area/Ncbi/Curation">001E69</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001E69</idno>
<idno type="wicri:Area/Main/Merge">000B01</idno>
<idno type="wicri:Area/Main/Curation">000A98</idno>
<idno type="wicri:Area/Main/Exploration">000A98</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">
<italic>In silico</italic>
read normalization using set multi-cover optimization</title>
<author>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff1">Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff2">Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff3">Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff1">Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="bty307-aff2">Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany</nlm:aff>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Computer Simulation</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Biologie informatique</term>
<term>Simulation numérique</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology</term>
<term>Computer Simulation</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Biologie informatique</term>
<term>Simulation numérique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<sec id="s1">
<title>Motivation</title>
<p>De Bruijn graphs are a common assembly data structure for sequencing datasets. However, with advances in sequencing technologies, assembling high-coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, do not guarantee the preservation of important
<italic>k</italic>
-mers that form connections between regions in the graph.</p>
</sec>
<sec id="s2">
<title>Results</title>
<p>Here, normalization is phrased as a
<italic>set multi-cover problem</italic>
on reads and a heuristic algorithm, Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all
<italic>k</italic>
-mers and their relative
<italic>k</italic>
-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values and performed better than current normalization algorithms. Normalizing error-corrected data allows for more accurate assemblies than normalizing the uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would otherwise have been missed. Finally, ORNA is a general-purpose normalization algorithm that is fast and significantly reduces datasets, with a loss of assembly quality between 1% and 30% depending on reduction stringency.</p>
</sec>
<sec id="s3">
<title>Availability and implementation</title>
<p>ORNA is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/SchulzLab/ORNA">https://github.com/SchulzLab/ORNA</ext-link>
.</p>
</sec>
<sec id="s4">
<title>Supplementary information</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary data</xref>
are available at
<italic>Bioinformatics</italic>
online.</p>
</sec>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>Sarre (Land)</li>
</region>
<settlement>
<li>Sarrebruck</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Sarre (Land)">
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
</region>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<name sortKey="Durai, Dilip A" sort="Durai, Dilip A" uniqKey="Durai D" first="Dilip A" last="Durai">Dilip A. Durai</name>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
<name sortKey="Schulz, Marcel H" sort="Schulz, Marcel H" uniqKey="Schulz M" first="Marcel H" last="Schulz">Marcel H. Schulz</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A98 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A98 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:6157080
   |texte=   In silico read normalization using set multi-cover optimization
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29912280" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021