MersV1, Pmc, Curation, bibRecord, 000944

Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

Internal identifier: 000944 (Pmc/Curation); previous: 000943; next: 000945

Authors: Berat Z. Haznedaroglu [United States]; Darryl Reeves [United States]; Hamid Rismani-Yazdi [United States]; Jordan Peccia [United States]

Source:

BMC Bioinformatics [1471-2105]; 2012.

RBID : PMC:3489510

Abstract

Background

The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, evaluations of assembly strategies have not considered the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly.

Results

Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k-19 to k-63). For each k-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual k-mers (k-19 to k-63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented.
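
The comparison between single k-mer assemblies and the clustered assembly can be illustrated with a small set-based sketch. This is not the authors' workflow, which the abstract does not detail. The KOI sets are invented placeholders, the function names are hypothetical, and the denominator for the percentage (here, the union of all KOIs seen) is an assumption, since the abstract does not define "total KOIs".

```python
from typing import Dict, Set


def kois_missing_from_ca(
    per_kmer: Dict[int, Set[str]], ca: Set[str]
) -> Dict[int, Set[str]]:
    """For each k, return KOIs found in that single k-mer assembly but absent from the CA."""
    return {k: kois - ca for k, kois in per_kmer.items()}


def percent_missing(missing: Set[str], total: Set[str]) -> float:
    """Size of `missing` relative to `total`, expressed as a percentage."""
    return 100.0 * len(missing) / len(total) if total else 0.0


# Toy KOI sets (placeholders only, not data from the study).
per_kmer = {
    19: {"K00001", "K00002", "K00003"},
    31: {"K00001", "K00004"},
    63: {"K00001", "K00005"},
}
ca = {"K00001", "K00002", "K00004", "K00005"}

missing = kois_missing_from_ca(per_kmer, ca)
# Assumed denominator: every KOI observed in the CA or in any single k-mer assembly.
all_kois = ca.union(*per_kmer.values())

for k in sorted(missing):
    print(k, sorted(missing[k]), percent_missing(missing[k], all_kois))
```

With real data, each set would be filled from the KEGG annotation output of the corresponding assembly. Only the set difference itself carries over from this sketch.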

Conclusions

This study demonstrated that different k-mer choices result in various quantities of unique contigs per single k-mer assembly, which affects the biological information that is retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, by clustering multi-k assemblies with redundancy removal. The complete extraction of biological information in de novo transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual k-mer assemblies but not in the CA.
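
One way to picture the second requirement, identifying contigs that are present in individual k-mer assemblies but not represented in the CA, is sketched below. It assumes each assembly has already been annotated so that a contig-to-KOI mapping exists. The contig identifiers, the data and the function `recover_unique_contigs` are hypothetical, and the study's actual recovery workflow may differ.

```python
from typing import Dict, List, Set, Tuple


def recover_unique_contigs(
    annotations: Dict[int, Dict[str, str]], ca_kois: Set[str]
) -> List[Tuple[int, str, str]]:
    """Pick one contig per KOI that is annotated in a single k-mer assembly but absent from the CA.

    `annotations` maps k -> {contig_id: KOI}. Assemblies are scanned in
    increasing k, so the first contig seen for a missing KOI is kept.
    """
    recovered: List[Tuple[int, str, str]] = []
    seen: Set[str] = set()
    for k in sorted(annotations):
        for contig_id, koi in sorted(annotations[k].items()):
            if koi not in ca_kois and koi not in seen:
                recovered.append((k, contig_id, koi))
                seen.add(koi)
    return recovered


# Invented example data (placeholders only).
annotations = {
    19: {"c19_1": "K00001", "c19_2": "K00003"},
    31: {"c31_1": "K00003", "c31_2": "K00006"},
}
ca_kois = {"K00001"}

recovered = recover_unique_contigs(annotations, ca_kois)
print(recovered)
```

Keeping the first contig per missing KOI is a simplification made for this sketch. A real pipeline might instead keep the longest or best-scoring contig.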

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3489510

DOI: 10.1186/1471-2105-13-170
PubMed: 22808927
PubMed Central: 3489510

Links to previous steps (curation, corpus...)

to stream Pmc, to step Corpus: To go to this record in the Curation step: 000944

Links to Exploration step

PMC:3489510

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Optimization of <italic>de novo</italic>
 transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms</title>
<author><name sortKey="Haznedaroglu, Berat Z" sort="Haznedaroglu, Berat Z" uniqKey="Haznedaroglu B" first="Berat Z" last="Haznedaroglu">Berat Z. Haznedaroglu</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Reeves, Darryl" sort="Reeves, Darryl" uniqKey="Reeves D" first="Darryl" last="Reeves">Darryl Reeves</name>
<affiliation wicri:level="1"><nlm:aff id="I2">Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Rismani Yazdi, Hamid" sort="Rismani Yazdi, Hamid" uniqKey="Rismani Yazdi H" first="Hamid" last="Rismani-Yazdi">Hamid Rismani-Yazdi</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="I3">Now at the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Now at the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Peccia, Jordan" sort="Peccia, Jordan" uniqKey="Peccia J" first="Jordan" last="Peccia">Jordan Peccia</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">22808927</idno>
<idno type="pmc">3489510</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3489510</idno>
<idno type="RBID">PMC:3489510</idno>
<idno type="doi">10.1186/1471-2105-13-170</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000944</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000944</idno>
<idno type="wicri:Area/Pmc/Curation">000944</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000944</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Optimization of <italic>de novo</italic>
 transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms</title>
<author><name sortKey="Haznedaroglu, Berat Z" sort="Haznedaroglu, Berat Z" uniqKey="Haznedaroglu B" first="Berat Z" last="Haznedaroglu">Berat Z. Haznedaroglu</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Reeves, Darryl" sort="Reeves, Darryl" uniqKey="Reeves D" first="Darryl" last="Reeves">Darryl Reeves</name>
<affiliation wicri:level="1"><nlm:aff id="I2">Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Rismani Yazdi, Hamid" sort="Rismani Yazdi, Hamid" uniqKey="Rismani Yazdi H" first="Hamid" last="Rismani-Yazdi">Hamid Rismani-Yazdi</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="I3">Now at the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Now at the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Peccia, Jordan" sort="Peccia, Jordan" uniqKey="Peccia J" first="Jordan" last="Peccia">Jordan Peccia</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>The <italic>k</italic>
-mer hash length is a key factor affecting the output of <italic>de novo</italic>
 transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single <italic>k</italic>
-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single <italic>k</italic>
-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of <italic>k</italic>
-mer selection on the annotation output. This study provides an in-depth <italic>k</italic>
-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual <italic>k</italic>
-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual <italic>k</italic>
-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a <italic>de novo</italic>
 transcriptome assembly.</p>
</sec>
<sec><title>Results</title>
<p>Analyses of single <italic>k</italic>
-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of <italic>k</italic>
-mers (<italic>k-</italic>
19 to <italic>k-</italic>
63). For each <italic>k</italic>
-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other <italic>k</italic>
-mer assemblies. Producing a non-redundant CA of <italic>k</italic>
-mers 19 to 63 resulted in a more complete functional annotation than any single <italic>k</italic>
-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual <italic>k</italic>
-mers (<italic>k-</italic>
19 to <italic>k-</italic>
63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented.</p>
</sec>
<sec><title>Conclusions</title>
<p>This study demonstrated that different <italic>k</italic>
-mer choices result in various quantities of unique contigs per single <italic>k</italic>
-mer assembly which affects biological information that is retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, with clustering of multi-<italic>k</italic>
 assemblies with redundancy removal. The complete extraction of biological information in <italic>de novo</italic>
 transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual <italic>k</italic>
-mer assemblies but not in the CA.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Iyer, Mk" uniqKey="Iyer M">MK Iyer</name>
</author>
<author><name sortKey="Chinnaiyan, Am" uniqKey="Chinnaiyan A">AM Chinnaiyan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Martin, Ja" uniqKey="Martin J">JA Martin</name>
</author>
<author><name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="De Bruijn, Ng" uniqKey="De Bruijn N">NG De Bruijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author><name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author><name sortKey="Vingron, M" uniqKey="Vingron M">M Vingron</name>
</author>
<author><name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Robertson, G" uniqKey="Robertson G">G Robertson</name>
</author>
<author><name sortKey="Schein, J" uniqKey="Schein J">J Schein</name>
</author>
<author><name sortKey="Chiu, R" uniqKey="Chiu R">R Chiu</name>
</author>
<author><name sortKey="Corbett, R" uniqKey="Corbett R">R Corbett</name>
</author>
<author><name sortKey="Field, M" uniqKey="Field M">M Field</name>
</author>
<author><name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
<author><name sortKey="Mungall, K" uniqKey="Mungall K">K Mungall</name>
</author>
<author><name sortKey="Lee, S" uniqKey="Lee S">S Lee</name>
</author>
<author><name sortKey="Okada, Hm" uniqKey="Okada H">HM Okada</name>
</author>
<author><name sortKey="Qian, Jq" uniqKey="Qian J">JQ Qian</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author><name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author><name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author><name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author><name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
<author><name sortKey="Shi, Z" uniqKey="Shi Z">Z Shi</name>
</author>
<author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author><name sortKey="Shan, G" uniqKey="Shan G">G Shan</name>
</author>
<author><name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Grabherr, Mg" uniqKey="Grabherr M">MG Grabherr</name>
</author>
<author><name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author><name sortKey="Yassour, M" uniqKey="Yassour M">M Yassour</name>
</author>
<author><name sortKey="Levin, Jz" uniqKey="Levin J">JZ Levin</name>
</author>
<author><name sortKey="Thompson, Da" uniqKey="Thompson D">DA Thompson</name>
</author>
<author><name sortKey="Amit, I" uniqKey="Amit I">I Amit</name>
</author>
<author><name sortKey="Adiconis, X" uniqKey="Adiconis X">X Adiconis</name>
</author>
<author><name sortKey="Fan, L" uniqKey="Fan L">L Fan</name>
</author>
<author><name sortKey="Raychowdhury, R" uniqKey="Raychowdhury R">R Raychowdhury</name>
</author>
<author><name sortKey="Zeng, Q" uniqKey="Zeng Q">Q Zeng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bao, S" uniqKey="Bao S">S Bao</name>
</author>
<author><name sortKey="Jiang, R" uniqKey="Jiang R">R Jiang</name>
</author>
<author><name sortKey="Kwan, W" uniqKey="Kwan W">W Kwan</name>
</author>
<author><name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author><name sortKey="Ma, X" uniqKey="Ma X">X Ma</name>
</author>
<author><name sortKey="Song, Y Q" uniqKey="Song Y">Y-Q Song</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Narzisi, G" uniqKey="Narzisi G">G Narzisi</name>
</author>
<author><name sortKey="Mishra, B" uniqKey="Mishra B">B Mishra</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author><name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author><name sortKey="Yang, Y" uniqKey="Yang Y">Y Yang</name>
</author>
<author><name sortKey="Tang, Y" uniqKey="Tang Y">Y Tang</name>
</author>
<author><name sortKey="Shang, J" uniqKey="Shang J">J Shang</name>
</author>
<author><name sortKey="Shen, B" uniqKey="Shen B">B Shen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author><name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Surget Groba, Y" uniqKey="Surget Groba Y">Y Surget-Groba</name>
</author>
<author><name sortKey="Montoya Burgos, Ji" uniqKey="Montoya Burgos J">JI Montoya-Burgos</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author><name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kurtz, S" uniqKey="Kurtz S">S Kurtz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pertea, G" uniqKey="Pertea G">G Pertea</name>
</author>
<author><name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
<author><name sortKey="Liang, F" uniqKey="Liang F">F Liang</name>
</author>
<author><name sortKey="Antonescu, V" uniqKey="Antonescu V">V Antonescu</name>
</author>
<author><name sortKey="Sultana, R" uniqKey="Sultana R">R Sultana</name>
</author>
<author><name sortKey="Karamycheva, S" uniqKey="Karamycheva S">S Karamycheva</name>
</author>
<author><name sortKey="Lee, Y" uniqKey="Lee Y">Y Lee</name>
</author>
<author><name sortKey="White, J" uniqKey="White J">J White</name>
</author>
<author><name sortKey="Cheung, F" uniqKey="Cheung F">F Cheung</name>
</author>
<author><name sortKey="Parvizi, B" uniqKey="Parvizi B">B Parvizi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Griffiths, M" uniqKey="Griffiths M">M Griffiths</name>
</author>
<author><name sortKey="Harrison, S" uniqKey="Harrison S">S Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Horsman, M" uniqKey="Horsman M">M Horsman</name>
</author>
<author><name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author><name sortKey="Wu, N" uniqKey="Wu N">N Wu</name>
</author>
<author><name sortKey="Lan, C" uniqKey="Lan C">C Lan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pruvost, J" uniqKey="Pruvost J">J Pruvost</name>
</author>
<author><name sortKey="Van Vooren, G" uniqKey="Van Vooren G">G Van Vooren</name>
</author>
<author><name sortKey="Cogne, G" uniqKey="Cogne G">G Cogne</name>
</author>
<author><name sortKey="Legrand, J" uniqKey="Legrand J">J Legrand</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Andrews, S" uniqKey="Andrews S">S Andrews</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cox, M" uniqKey="Cox M">M Cox</name>
</author>
<author><name sortKey="Peterson, D" uniqKey="Peterson D">D Peterson</name>
</author>
<author><name sortKey="Biggs, P" uniqKey="Biggs P">P Biggs</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Garg, R" uniqKey="Garg R">R Garg</name>
</author>
<author><name sortKey="Patel, Rk" uniqKey="Patel R">RK Patel</name>
</author>
<author><name sortKey="Tyagi, Ak" uniqKey="Tyagi A">AK Tyagi</name>
</author>
<author><name sortKey="Jain, M" uniqKey="Jain M">M Jain</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Feldmeyer, B" uniqKey="Feldmeyer B">B Feldmeyer</name>
</author>
<author><name sortKey="Wheat, C" uniqKey="Wheat C">C Wheat</name>
</author>
<author><name sortKey="Krezdorn, N" uniqKey="Krezdorn N">N Krezdorn</name>
</author>
<author><name sortKey="Rotter, B" uniqKey="Rotter B">B Rotter</name>
</author>
<author><name sortKey="Pfenninger, M" uniqKey="Pfenninger M">M Pfenninger</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Moriya, Y" uniqKey="Moriya Y">Y Moriya</name>
</author>
<author><name sortKey="Itoh, M" uniqKey="Itoh M">M Itoh</name>
</author>
<author><name sortKey="Okuda, S" uniqKey="Okuda S">S Okuda</name>
</author>
<author><name sortKey="Yoshizawa, Ac" uniqKey="Yoshizawa A">AC Yoshizawa</name>
</author>
<author><name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M Kanehisa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Aoki Kinoshita, Kf" uniqKey="Aoki Kinoshita K">KF Aoki-Kinoshita</name>
</author>
<author><name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M Kanehisa</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author><name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author><name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">22808927</article-id>
<article-id pub-id-type="pmc">3489510</article-id>
<article-id pub-id-type="publisher-id">1471-2105-13-170</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-13-170</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Optimization of <italic>de novo</italic>
 transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" id="A1"><name><surname>Haznedaroglu</surname>
<given-names>Berat Z</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>berat.haznedaroglu@yale.edu</email>
</contrib>
<contrib contrib-type="author" id="A2"><name><surname>Reeves</surname>
<given-names>Darryl</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>darryl.reeves@yale.edu</email>
</contrib>
<contrib contrib-type="author" id="A3"><name><surname>Rismani-Yazdi</surname>
<given-names>Hamid</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>hrismani@mit.edu</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A4"><name><surname>Peccia</surname>
<given-names>Jordan</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>jordan.peccia@yale.edu</email>
</contrib>
</contrib-group>
<aff id="I1"><label>1</label>
Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06511, USA</aff>
<aff id="I2"><label>2</label>
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA</aff>
<aff id="I3"><label>3</label>
Now at the Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA</aff>
<pub-date pub-type="collection"><year>2012</year>
</pub-date>
<pub-date pub-type="epub"><day>18</day>
<month>7</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<fpage>170</fpage>
<lpage>170</lpage>
<history><date date-type="received"><day>25</day>
<month>2</month>
<year>2012</year>
</date>
<date date-type="accepted"><day>26</day>
<month>6</month>
<year>2012</year>
</date>
</history>
<permissions><copyright-statement>Copyright ©2012 Haznedaroglu et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Haznedaroglu et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0"><license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/13/170"></self-uri>
<abstract><sec><title>Background</title>
<p>The <italic>k</italic>
-mer hash length is a key factor affecting the output of <italic>de novo</italic>
 transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single <italic>k</italic>
-mer choices might result in the loss of unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single <italic>k</italic>
-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, the success of assembly strategies does not consider the impact of <italic>k</italic>
-mer selection on the annotation output. This study provides an in-depth <italic>k</italic>
-mer selection analysis that is focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual <italic>k</italic>
-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual <italic>k</italic>
-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a <italic>de novo</italic>
 transcriptome assembly.</p>
</sec>
<sec><title>Results</title>
<p>Analyses of single <italic>k</italic>
-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of <italic>k</italic>
-mers (<italic>k-</italic>
19 to <italic>k-</italic>
63). For each <italic>k</italic>
-mer in this window, generated assemblies contained certain unique contigs and KOIs that were not present in the other <italic>k</italic>
-mer assemblies. Producing a non-redundant CA of <italic>k</italic>
-mers 19 to 63 resulted in a more complete functional annotation than any single <italic>k</italic>
-mer assembly. However, a fraction of unique annotations remained (~0.19 to 0.27% of total KOIs) in the assemblies of individual <italic>k</italic>
-mers (<italic>k-</italic>
19 to <italic>k-</italic>
63) that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented.</p>
</sec>
<sec><title>Conclusions</title>
<p>This study demonstrated that different <italic>k</italic>
-mer choices result in various quantities of unique contigs per single <italic>k</italic>
-mer assembly which affects biological information that is retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, with clustering of multi-<italic>k</italic>
 assemblies with redundancy removal. The complete extraction of biological information in <italic>de novo</italic>
 transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual <italic>k</italic>
-mer assemblies but not in the CA.</p>
</sec>
</abstract>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000944 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000944 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:3489510
   |texte=   Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:22808927" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
