MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

An improved approach for reconstructing consensus repeats from short sequence reads

Internal identifier: 000298 (Pmc/Curation); previous: 000297; next: 000299

Authors: Chong Chu [United States]; Jingwen Pei [United States]; Yufeng Wu [United States]

Source: BMC Genomics, vol. 19 (Suppl 6), article 566, 2018

RBID: PMC:6101065

Abstract

Background

Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high-quality reference genomes or on existing repeat libraries. Thus, repeat analysis remains challenging for species with highly repetitive or complex genomes, which often lack good reference genomes or annotated repeat libraries. We recently developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.

Results

We compare the performance of the new method with that of REPdenovo and RepARK on human, Arabidopsis thaliana and Drosophila melanogaster short-read sequencing data. The new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially for repeats with higher divergence rates and lower copy numbers. We also apply our new method to hummingbird data, for which no repeat library is known, and it constructs many repeat elements that can be validated using PacBio long reads.

Conclusion

We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that our new method can assemble more complete repeats than REPdenovo (and also RepARK). Our new approach has been implemented as part of the REPdenovo software package, which is available for download at https://github.com/Reedwarbler/REPdenovo.
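
As a rough illustration of the k-mer idea mentioned in the abstract, the following Python sketch counts k-mers across a handful of reads and keeps those that occur much more often than average. It is not the REPdenovo implementation: the function names, the toy reads, and the min_ratio threshold are assumptions made for this example, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(counts, min_ratio):
    """Keep k-mers whose count is at least min_ratio times the mean count.

    min_ratio is a hypothetical cutoff chosen for illustration only.
    """
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {kmer: c for kmer, c in counts.items() if c >= min_ratio * mean}


# Toy reads: the first two overlap a repeated ACGT motif, the third does not.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTGGGGCC"]
counts = count_kmers(reads, k=4)
repeat_like = frequent_kmers(counts, min_ratio=2.0)
print(sorted(repeat_like))
```

In this toy input, only k-mers from the repeated motif pass the cutoff; a real pipeline would work on millions of reads and would need a dedicated k-mer counter rather than an in-memory dictionary.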


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101065
DOI: 10.1186/s12864-018-4920-6
PubMed: 30367582
PubMed Central: 6101065

Links to Exploration step

PMC:6101065

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An improved approach for reconstructing consensus repeats from short sequence reads</title>
<author>
<name sortKey="Chu, Chong" sort="Chu, Chong" uniqKey="Chu C" first="Chong" last="Chu">Chong Chu</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">000000041936754X</institution-id>
<institution-id institution-id-type="GRID">grid.38142.3c</institution-id>
<institution>Department of Biomedical Informatics, Harvard Medical School,</institution>
</institution-wrap>
10 Shattuck Street, Boston, 02115 MA USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:regionArea>10 Shattuck Street, Boston</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pei, Jingwen" sort="Pei, Jingwen" uniqKey="Pei J" first="Jingwen" last="Pei">Jingwen Pei</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0860 4915</institution-id>
<institution-id institution-id-type="GRID">grid.63054.34</institution-id>
<institution>Department of Computer Science and Engineering, University of Connecticut,</institution>
</institution-wrap>
371 Fairfield Way, Unit 2155, Storrs, 06269 CT USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:regionArea>371 Fairfield Way, Unit 2155, Storrs</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Wu, Yufeng" sort="Wu, Yufeng" uniqKey="Wu Y" first="Yufeng" last="Wu">Yufeng Wu</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0860 4915</institution-id>
<institution-id institution-id-type="GRID">grid.63054.34</institution-id>
<institution>Department of Computer Science and Engineering, University of Connecticut,</institution>
</institution-wrap>
371 Fairfield Way, Unit 2155, Storrs, 06269 CT USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:regionArea>371 Fairfield Way, Unit 2155, Storrs</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30367582</idno>
<idno type="pmc">6101065</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101065</idno>
<idno type="RBID">PMC:6101065</idno>
<idno type="doi">10.1186/s12864-018-4920-6</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000298</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000298</idno>
<idno type="wicri:Area/Pmc/Curation">000298</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000298</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">An improved approach for reconstructing consensus repeats from short sequence reads</title>
<author>
<name sortKey="Chu, Chong" sort="Chu, Chong" uniqKey="Chu C" first="Chong" last="Chu">Chong Chu</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">000000041936754X</institution-id>
<institution-id institution-id-type="GRID">grid.38142.3c</institution-id>
<institution>Department of Biomedical Informatics, Harvard Medical School,</institution>
</institution-wrap>
10 Shattuck Street, Boston, 02115 MA USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
<wicri:regionArea>10 Shattuck Street, Boston</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pei, Jingwen" sort="Pei, Jingwen" uniqKey="Pei J" first="Jingwen" last="Pei">Jingwen Pei</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0860 4915</institution-id>
<institution-id institution-id-type="GRID">grid.63054.34</institution-id>
<institution>Department of Computer Science and Engineering, University of Connecticut,</institution>
</institution-wrap>
371 Fairfield Way, Unit 2155, Storrs, 06269 CT USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:regionArea>371 Fairfield Way, Unit 2155, Storrs</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Wu, Yufeng" sort="Wu, Yufeng" uniqKey="Wu Y" first="Yufeng" last="Wu">Yufeng Wu</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0860 4915</institution-id>
<institution-id institution-id-type="GRID">grid.63054.34</institution-id>
<institution>Department of Computer Science and Engineering, University of Connecticut,</institution>
</institution-wrap>
371 Fairfield Way, Unit 2155, Storrs, 06269 CT USA</nlm:aff>
<country>United States</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:regionArea>371 Fairfield Way, Unit 2155, Storrs</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads, which outperforms an existing tool called RepARK. One major issue with REPdenovo is that it doesn’t perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Comparing with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.</p>
</sec>
<sec>
<title>Results</title>
<p>We compare the performance of the new method with REPdenovo and RepARK on Human, Arabidopsis thaliana and Drosophila melanogaster short sequencing data. And the new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially for repeats of higher divergence rates and lower copy number. We also apply our new method on Hummingbird data which doesn’t have a known repeat library, and it constructs many repeat elements that can be validated using PacBio long reads.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that our new method can assemble more complete repeats than REPdenovo (and also RepARK). Our new approach has been implemented as part of the REPdenovo software package, which is available for download at
<ext-link ext-link-type="uri" xlink:href="https://github.com/Reedwarbler/REPdenovo">https://github.com/Reedwarbler/REPdenovo</ext-link>
.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30367582</article-id>
<article-id pub-id-type="pmc">6101065</article-id>
<article-id pub-id-type="publisher-id">4920</article-id>
<article-id pub-id-type="doi">10.1186/s12864-018-4920-6</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An improved approach for reconstructing consensus repeats from short sequence reads</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Chu</surname>
<given-names>Chong</given-names>
</name>
<address>
<email>chong_chu@hms.harvard.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pei</surname>
<given-names>Jingwen</given-names>
</name>
<address>
<email>jingwen.pei@uconn.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Wu</surname>
<given-names>Yufeng</given-names>
</name>
<address>
<email>yufeng.wu@uconn.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">000000041936754X</institution-id>
<institution-id institution-id-type="GRID">grid.38142.3c</institution-id>
<institution>Department of Biomedical Informatics, Harvard Medical School,</institution>
</institution-wrap>
10 Shattuck Street, Boston, 02115 MA USA</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0860 4915</institution-id>
<institution-id institution-id-type="GRID">grid.63054.34</institution-id>
<institution>Department of Computer Science and Engineering, University of Connecticut,</institution>
</institution-wrap>
371 Fairfield Way, Unit 2155, Storrs, 06269 CT USA</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>13</day>
<month>8</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>13</day>
<month>8</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>19</volume>
<issue>Suppl 6</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.</issue-sponsor>
<elocation-id>566</elocation-id>
<permissions>
<copyright-statement>© The Author(s) 2018</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or complex genomes which often do not have good reference genomes or annotated repeat libraries. Recently we developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads, which outperforms an existing tool called RepARK. One major issue with REPdenovo is that it doesn’t perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Comparing with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method.</p>
</sec>
<sec>
<title>Results</title>
<p>We compare the performance of the new method with REPdenovo and RepARK on Human, Arabidopsis thaliana and Drosophila melanogaster short sequencing data. And the new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially for repeats of higher divergence rates and lower copy number. We also apply our new method on Hummingbird data which doesn’t have a known repeat library, and it constructs many repeat elements that can be validated using PacBio long reads.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that our new method can assemble more complete repeats than REPdenovo (and also RepARK). Our new approach has been implemented as part of the REPdenovo software package, which is available for download at
<ext-link ext-link-type="uri" xlink:href="https://github.com/Reedwarbler/REPdenovo">https://github.com/Reedwarbler/REPdenovo</ext-link>
.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Repeat elements</kwd>
<kwd>De novo genome assembly</kwd>
<kwd>Sequence analysis</kwd>
</kwd-group>
<conference>
<conf-name>13th International Symposium on Bioinformatics Research and Applications (ISBRA 2017)</conf-name>
<conf-loc>Honolulu, Hawaii, USA</conf-loc>
<conf-date>30 May - 2 June 2017</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2018</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000298 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000298 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:6101065
   |texte=   An improved approach for reconstructing consensus repeats from short sequence reads
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:30367582" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021