Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods

Internal identifier: 000A85 (Pmc/Corpus); previous: 000A84; next: 000A86

Authors: Guillaume Martin; Franc-Christophe Baurens; Gaëtan Droc; Mathieu Rouard; Alberto Cenci; Andrzej Kilian; Alex Hastie; Jaroslav Doležel; Jean-Marc Aury; Adriana Alberti; Françoise Carreel; Angélique D’Hont

Source:

BMC Genomics [1471-2164]; 2016.

RBID: PMC:4793746

Abstract

Background

Recent advances in genomics indicate the functional significance of the majority of genome sequences and of their long-range interactions. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (Musa acuminata).

Results

We have developed a modular bioinformatics pipeline that improves genome sequence assemblies and can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools, which rely on global parameters, these semi-automated tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. In addition, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation derived from two pre-existing annotations was projected onto the new assembly. This approach reduced the total number of Musa scaffolds from 7513 to 1532 (i.e. by 80%), while the N50 increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 Musa chromosomes, compared with 70% previously. The proportion of unknown sites (N) was reduced from 17.3% to 10.0%.
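
The scaffold-count and N50 figures above can be checked with a little arithmetic. The Python sketch below computes N50 together with the number of sequences needed to reach it (commonly called L50, which is presumably what the scaffold counts in parentheses report) and the percentage reduction in scaffold number. The function names `n50_l50` and `percent_reduction` are hypothetical illustrations, not part of the authors' pipeline, and the toy lengths are made up.

```python
from typing import Iterable, Tuple

# Hypothetical helpers for illustration only; not the authors' tooling.


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    N50: length of the shortest sequence in the smallest set of longest
    sequences whose combined length reaches at least half the total size.
    L50: number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("no sequences given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:  # half of the assembly is now covered
            return length, count
    # Never reached: the running total always ends at the full sum.
    raise AssertionError("unreachable")


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(n50_l50([100, 80, 60, 40, 20]))  # toy scaffold lengths
    print(round(percent_reduction(7513, 1532), 1))
```

With the toy lengths, half of the 300-unit total is reached after the two longest sequences, so N50 is 80 and L50 is 2. With the scaffold counts reported above, `percent_reduction(7513, 1532)` gives about 79.6%, consistent with the rounded 80% figure.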

Conclusion

The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. The bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2579-4) contains supplementary material, which is available to authorized users.

URL:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4793746

DOI: 10.1186/s12864-016-2579-4
PubMed: 26984673
PubMed Central: 4793746

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improvement of the banana “<italic>Musa acuminata</italic>
” reference sequence using NGS data and semi-automated bioinformatics methods</title>
<author><name sortKey="Martin, Guillaume" sort="Martin, Guillaume" uniqKey="Martin G" first="Guillaume" last="Martin">Guillaume Martin</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Baurens, Franc Christophe" sort="Baurens, Franc Christophe" uniqKey="Baurens F" first="Franc-Christophe" last="Baurens">Franc-Christophe Baurens</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Droc, Gaetan" sort="Droc, Gaetan" uniqKey="Droc G" first="Gaëtan" last="Droc">Gaëtan Droc</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Rouard, Mathieu" sort="Rouard, Mathieu" uniqKey="Rouard M" first="Mathieu" last="Rouard">Mathieu Rouard</name>
<affiliation><nlm:aff id="Aff2">Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Cenci, Alberto" sort="Cenci, Alberto" uniqKey="Cenci A" first="Alberto" last="Cenci">Alberto Cenci</name>
<affiliation><nlm:aff id="Aff2">Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kilian, Andrzej" sort="Kilian, Andrzej" uniqKey="Kilian A" first="Andrzej" last="Kilian">Andrzej Kilian</name>
<affiliation><nlm:aff id="Aff3">Diversity Arrays Technology, Yarralumla, Australian Capital Territory 2600 Australia</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hastie, Alex" sort="Hastie, Alex" uniqKey="Hastie A" first="Alex" last="Hastie">Alex Hastie</name>
<affiliation><nlm:aff id="Aff4">BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA 92121 USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Dolezel, Jaroslav" sort="Dolezel, Jaroslav" uniqKey="Dolezel J" first="Jaroslav" last="Doležel">Jaroslav Doležel</name>
<affiliation><nlm:aff id="Aff5">Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371 Olomouc, Czech Republic</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Aury, Jean Marc" sort="Aury, Jean Marc" uniqKey="Aury J" first="Jean-Marc" last="Aury">Jean-Marc Aury</name>
<affiliation><nlm:aff id="Aff6">Commissariat à l’Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057 Evry, France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Alberti, Adriana" sort="Alberti, Adriana" uniqKey="Alberti A" first="Adriana" last="Alberti">Adriana Alberti</name>
<affiliation><nlm:aff id="Aff6">Commissariat à l’Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057 Evry, France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Carreel, Francoise" sort="Carreel, Francoise" uniqKey="Carreel F" first="Françoise" last="Carreel">Françoise Carreel</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="D Ont, Angelique" sort="D Ont, Angelique" uniqKey="D Ont A" first="Angélique" last="D Ont">Angélique D’Hont</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26984673</idno>
<idno type="pmc">4793746</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4793746</idno>
<idno type="RBID">PMC:4793746</idno>
<idno type="doi">10.1186/s12864-016-2579-4</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000A85</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000A85</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Improvement of the banana “<italic>Musa acuminata</italic>
” reference sequence using NGS data and semi-automated bioinformatics methods</title>
<author><name sortKey="Martin, Guillaume" sort="Martin, Guillaume" uniqKey="Martin G" first="Guillaume" last="Martin">Guillaume Martin</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Baurens, Franc Christophe" sort="Baurens, Franc Christophe" uniqKey="Baurens F" first="Franc-Christophe" last="Baurens">Franc-Christophe Baurens</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Droc, Gaetan" sort="Droc, Gaetan" uniqKey="Droc G" first="Gaëtan" last="Droc">Gaëtan Droc</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Rouard, Mathieu" sort="Rouard, Mathieu" uniqKey="Rouard M" first="Mathieu" last="Rouard">Mathieu Rouard</name>
<affiliation><nlm:aff id="Aff2">Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Cenci, Alberto" sort="Cenci, Alberto" uniqKey="Cenci A" first="Alberto" last="Cenci">Alberto Cenci</name>
<affiliation><nlm:aff id="Aff2">Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Kilian, Andrzej" sort="Kilian, Andrzej" uniqKey="Kilian A" first="Andrzej" last="Kilian">Andrzej Kilian</name>
<affiliation><nlm:aff id="Aff3">Diversity Arrays Technology, Yarralumla, Australian Capital Territory 2600 Australia</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Hastie, Alex" sort="Hastie, Alex" uniqKey="Hastie A" first="Alex" last="Hastie">Alex Hastie</name>
<affiliation><nlm:aff id="Aff4">BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA 92121 USA</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Dolezel, Jaroslav" sort="Dolezel, Jaroslav" uniqKey="Dolezel J" first="Jaroslav" last="Doležel">Jaroslav Doležel</name>
<affiliation><nlm:aff id="Aff5">Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371 Olomouc, Czech Republic</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Aury, Jean Marc" sort="Aury, Jean Marc" uniqKey="Aury J" first="Jean-Marc" last="Aury">Jean-Marc Aury</name>
<affiliation><nlm:aff id="Aff6">Commissariat à l’Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057 Evry, France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Alberti, Adriana" sort="Alberti, Adriana" uniqKey="Alberti A" first="Adriana" last="Alberti">Adriana Alberti</name>
<affiliation><nlm:aff id="Aff6">Commissariat à l’Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057 Evry, France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Carreel, Francoise" sort="Carreel, Francoise" uniqKey="Carreel F" first="Françoise" last="Carreel">Françoise Carreel</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
<author><name sortKey="D Ont, Angelique" sort="D Ont, Angelique" uniqKey="D Ont A" first="Angélique" last="D Ont">Angélique D’Hont</name>
<affiliation><nlm:aff id="Aff1">CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint><date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Recent advances in genomics indicate the functional significance of the majority of genome sequences and of their long-range interactions. Because a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (<italic>Musa acuminata</italic>).</p>
</sec>
<sec><title>Results</title>
<p>We have developed a modular bioinformatics pipeline that improves genome sequence assemblies and can handle various types of data. The pipeline comprises several semi-automated tools. Unlike classical automated tools, which rely on global parameters, these semi-automated tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of <italic>Musa acuminata</italic>. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long-insert paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. In addition, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation derived from two pre-existing annotations was projected onto the new assembly. This approach reduced the total number of <italic>Musa</italic> scaffolds from 7513 to 1532 (i.e. by 80%), while the N50 increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). In total, 89.5% of the assembly was anchored to the 11 <italic>Musa</italic> chromosomes, compared with 70% previously. The proportion of unknown sites (N) was reduced from 17.3% to 10.0%.</p>
</sec>
<sec><title>Conclusion</title>
<p>The release of the <italic>Musa acuminata</italic> reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. The bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-016-2579-4) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Bolger, Me" uniqKey="Bolger M">ME Bolger</name>
</author>
<author><name sortKey="Weisshaar, B" uniqKey="Weisshaar B">B Weisshaar</name>
</author>
<author><name sortKey="Scholz, U" uniqKey="Scholz U">U Scholz</name>
</author>
<author><name sortKey="Stein, N" uniqKey="Stein N">N Stein</name>
</author>
<author><name sortKey="Usadel, B" uniqKey="Usadel B">B Usadel</name>
</author>
<author><name sortKey="Mayer, Kf" uniqKey="Mayer K">KF Mayer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Feuillet, C" uniqKey="Feuillet C">C Feuillet</name>
</author>
<author><name sortKey="Leach, Je" uniqKey="Leach J">JE Leach</name>
</author>
<author><name sortKey="Rogers, J" uniqKey="Rogers J">J Rogers</name>
</author>
<author><name sortKey="Schnable, Ps" uniqKey="Schnable P">PS Schnable</name>
</author>
<author><name sortKey="Eversole, K" uniqKey="Eversole K">K Eversole</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Michael, Tp" uniqKey="Michael T">TP Michael</name>
</author>
<author><name sortKey="Jackson, S" uniqKey="Jackson S">S Jackson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kejnovsky, E" uniqKey="Kejnovsky E">E Kejnovsky</name>
</author>
<author><name sortKey="Hawkins, J" uniqKey="Hawkins J">J Hawkins</name>
</author>
<author><name sortKey="Feschotte, C" uniqKey="Feschotte C">C Feschotte</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hahn, Mw" uniqKey="Hahn M">MW Hahn</name>
</author>
<author><name sortKey="Zhang, Sv" uniqKey="Zhang S">SV Zhang</name>
</author>
<author><name sortKey="Moyle, Lc" uniqKey="Moyle L">LC Moyle</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Vanneste, K" uniqKey="Vanneste K">K Vanneste</name>
</author>
<author><name sortKey="Maere, S" uniqKey="Maere S">S Maere</name>
</author>
<author><name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author><name sortKey="Sajjadian, S" uniqKey="Sajjadian S">S Sajjadian</name>
</author>
<author><name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Williams, Ljs" uniqKey="Williams L">LJS Williams</name>
</author>
<author><name sortKey="Tabbaa, Dg" uniqKey="Tabbaa D">DG Tabbaa</name>
</author>
<author><name sortKey="Li, N" uniqKey="Li N">N Li</name>
</author>
<author><name sortKey="Berlin, Am" uniqKey="Berlin A">AM Berlin</name>
</author>
<author><name sortKey="Shea, Tp" uniqKey="Shea T">TP Shea</name>
</author>
<author><name sortKey="Maccallum, I" uniqKey="Maccallum I">I MacCallum</name>
</author>
<author><name sortKey="Lawrence, Ms" uniqKey="Lawrence M">MS Lawrence</name>
</author>
<author><name sortKey="Drier, Y" uniqKey="Drier Y">Y Drier</name>
</author>
<author><name sortKey="Getz, G" uniqKey="Getz G">G Getz</name>
</author>
<author><name sortKey="Young, Sk" uniqKey="Young S">SK Young</name>
</author>
<author><name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author><name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author><name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dong, Y" uniqKey="Dong Y">Y Dong</name>
</author>
<author><name sortKey="Xie, M" uniqKey="Xie M">M Xie</name>
</author>
<author><name sortKey="Jiang, Y" uniqKey="Jiang Y">Y Jiang</name>
</author>
<author><name sortKey="Xiao, N" uniqKey="Xiao N">N Xiao</name>
</author>
<author><name sortKey="Du, X" uniqKey="Du X">X Du</name>
</author>
<author><name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author><name sortKey="Tosser Klopp, G" uniqKey="Tosser Klopp G">G Tosser-Klopp</name>
</author>
<author><name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author><name sortKey="Yang, S" uniqKey="Yang S">S Yang</name>
</author>
<author><name sortKey="Liang, J" uniqKey="Liang J">J Liang</name>
</author>
<author><name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author><name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author><name sortKey="Zeng, P" uniqKey="Zeng P">P Zeng</name>
</author>
<author><name sortKey="Hou, Y" uniqKey="Hou Y">Y Hou</name>
</author>
<author><name sortKey="Bian, C" uniqKey="Bian C">C Bian</name>
</author>
<author><name sortKey="Pan, S" uniqKey="Pan S">S Pan</name>
</author>
<author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author><name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
<author><name sortKey="Servin, B" uniqKey="Servin B">B Servin</name>
</author>
<author><name sortKey="Sayre, B" uniqKey="Sayre B">B Sayre</name>
</author>
<author><name sortKey="Zhu, B" uniqKey="Zhu B">B Zhu</name>
</author>
<author><name sortKey="Sweeney, D" uniqKey="Sweeney D">D Sweeney</name>
</author>
<author><name sortKey="Moore, R" uniqKey="Moore R">R Moore</name>
</author>
<author><name sortKey="Nie, W" uniqKey="Nie W">W Nie</name>
</author>
<author><name sortKey="Shen, Y" uniqKey="Shen Y">Y Shen</name>
</author>
<author><name sortKey="Zhao, R" uniqKey="Zhao R">R Zhao</name>
</author>
<author><name sortKey="Zhang, G" uniqKey="Zhang G">G Zhang</name>
</author>
<author><name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author><name sortKey="Faraut, T" uniqKey="Faraut T">T Faraut</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Levy Sakin, M" uniqKey="Levy Sakin M">M Levy-Sakin</name>
</author>
<author><name sortKey="Ebenstein, Y" uniqKey="Ebenstein Y">Y Ebenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Neely, Rk" uniqKey="Neely R">RK Neely</name>
</author>
<author><name sortKey="Deen, J" uniqKey="Deen J">J Deen</name>
</author>
<author><name sortKey="Hofkens, J" uniqKey="Hofkens J">J Hofkens</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mascher, M" uniqKey="Mascher M">M Mascher</name>
</author>
<author><name sortKey="Stein, N" uniqKey="Stein N">N Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mascher, M" uniqKey="Mascher M">M Mascher</name>
</author>
<author><name sortKey="Muehlbauer, Gj" uniqKey="Muehlbauer G">GJ Muehlbauer</name>
</author>
<author><name sortKey="Rokhsar, Ds" uniqKey="Rokhsar D">DS Rokhsar</name>
</author>
<author><name sortKey="Chapman, J" uniqKey="Chapman J">J Chapman</name>
</author>
<author><name sortKey="Schmutz, J" uniqKey="Schmutz J">J Schmutz</name>
</author>
<author><name sortKey="Barry, K" uniqKey="Barry K">K Barry</name>
</author>
<author><name sortKey="Mu Oz Amatriain, M" uniqKey="Mu Oz Amatriain M">M Muñoz-Amatriaín</name>
</author>
<author><name sortKey="Close, Tj" uniqKey="Close T">TJ Close</name>
</author>
<author><name sortKey="Wise, Rp" uniqKey="Wise R">RP Wise</name>
</author>
<author><name sortKey="Schulman, Ah" uniqKey="Schulman A">AH Schulman</name>
</author>
<author><name sortKey="Himmelbach, A" uniqKey="Himmelbach A">A Himmelbach</name>
</author>
<author><name sortKey="Mayer, Kfx" uniqKey="Mayer K">KFX Mayer</name>
</author>
<author><name sortKey="Scholz, U" uniqKey="Scholz U">U Scholz</name>
</author>
<author><name sortKey="Poland, Ja" uniqKey="Poland J">JA Poland</name>
</author>
<author><name sortKey="Stein, N" uniqKey="Stein N">N Stein</name>
</author>
<author><name sortKey="Waugh, R" uniqKey="Waugh R">R Waugh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schatz, M" uniqKey="Schatz M">M Schatz</name>
</author>
<author><name sortKey="Witkowski, J" uniqKey="Witkowski J">J Witkowski</name>
</author>
<author><name sortKey="Mccombie, Wr" uniqKey="Mccombie W">WR McCombie</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author><name sortKey="Kosack, Ds" uniqKey="Kosack D">DS Kosack</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dayarian, A" uniqKey="Dayarian A">A Dayarian</name>
</author>
<author><name sortKey="Michael, T" uniqKey="Michael T">T Michael</name>
</author>
<author><name sortKey="Sengupta, A" uniqKey="Sengupta A">A Sengupta</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
<author><name sortKey="M Kinen, V" uniqKey="M Kinen V">V Mäkinen</name>
</author>
<author><name sortKey="V Lim Ki, N" uniqKey="V Lim Ki N">N Välimäki</name>
</author>
<author><name sortKey="Ylinen, J" uniqKey="Ylinen J">J Ylinen</name>
</author>
<author><name sortKey="Ukkonen, E" uniqKey="Ukkonen E">E Ukkonen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Boetzer, M" uniqKey="Boetzer M">M Boetzer</name>
</author>
<author><name sortKey="Henkel, Cv" uniqKey="Henkel C">CV Henkel</name>
</author>
<author><name sortKey="Jansen, Hj" uniqKey="Jansen H">HJ Jansen</name>
</author>
<author><name sortKey="Butler, D" uniqKey="Butler D">D Butler</name>
</author>
<author><name sortKey="Pirovano, W" uniqKey="Pirovano W">W Pirovano</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gao, S" uniqKey="Gao S">S Gao</name>
</author>
<author><name sortKey="Sung, W K" uniqKey="Sung W">W-K Sung</name>
</author>
<author><name sortKey="Nagarajan, N" uniqKey="Nagarajan N">N Nagarajan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gritsenko, Aa" uniqKey="Gritsenko A">AA Gritsenko</name>
</author>
<author><name sortKey="Nijkamp, Jf" uniqKey="Nijkamp J">JF Nijkamp</name>
</author>
<author><name sortKey="Reinders, Mjt" uniqKey="Reinders M">MJT Reinders</name>
</author>
<author><name sortKey="De Ridder, D" uniqKey="De Ridder D">D de Ridder</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Donmez, N" uniqKey="Donmez N">N Donmez</name>
</author>
<author><name sortKey="Brudno, M" uniqKey="Brudno M">M Brudno</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Boetzer, M" uniqKey="Boetzer M">M Boetzer</name>
</author>
<author><name sortKey="Pirovano, W" uniqKey="Pirovano W">W Pirovano</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
<author><name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author><name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author><name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author><name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author><name sortKey="Yuan, J" uniqKey="Yuan J">J Yuan</name>
</author>
<author><name sortKey="He, G" uniqKey="He G">G He</name>
</author>
<author><name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author><name sortKey="Pan, Q" uniqKey="Pan Q">Q Pan</name>
</author>
<author><name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author><name sortKey="Tang, J" uniqKey="Tang J">J Tang</name>
</author>
<author><name sortKey="Wu, G" uniqKey="Wu G">G Wu</name>
</author>
<author><name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author><name sortKey="Shi, Y" uniqKey="Shi Y">Y Shi</name>
</author>
<author><name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author><name sortKey="Yu, C" uniqKey="Yu C">C Yu</name>
</author>
<author><name sortKey="Wang, B" uniqKey="Wang B">B Wang</name>
</author>
<author><name sortKey="Lu, Y" uniqKey="Lu Y">Y Lu</name>
</author>
<author><name sortKey="Han, C" uniqKey="Han C">C Han</name>
</author>
<author><name sortKey="Cheung, D" uniqKey="Cheung D">D Cheung</name>
</author>
<author><name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
<author><name sortKey="Peng, S" uniqKey="Peng S">S Peng</name>
</author>
<author><name sortKey="Xiaoqian, Z" uniqKey="Xiaoqian Z">Z Xiaoqian</name>
</author>
<author><name sortKey="Liu, G" uniqKey="Liu G">G Liu</name>
</author>
<author><name sortKey="Liao, X" uniqKey="Liao X">X Liao</name>
</author>
<author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Yang, H" uniqKey="Yang H">H Yang</name>
</author>
<author><name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author><name sortKey="Lam, T W" uniqKey="Lam T">T-W Lam</name>
</author>
<author><name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Boetzer, M" uniqKey="Boetzer M">M Boetzer</name>
</author>
<author><name sortKey="Pirovano, W" uniqKey="Pirovano W">W Pirovano</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Swain, Mt" uniqKey="Swain M">MT Swain</name>
</author>
<author><name sortKey="Tsai, Ij" uniqKey="Tsai I">IJ Tsai</name>
</author>
<author><name sortKey="Assefa, Sa" uniqKey="Assefa S">SA Assefa</name>
</author>
<author><name sortKey="Newbold, C" uniqKey="Newbold C">C Newbold</name>
</author>
<author><name sortKey="Berriman, M" uniqKey="Berriman M">M Berriman</name>
</author>
<author><name sortKey="Otto, Td" uniqKey="Otto T">TD Otto</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="D Ont, A" uniqKey="D Ont A">A D’Hont</name>
</author>
<author><name sortKey="Denoeud, F" uniqKey="Denoeud F">F Denoeud</name>
</author>
<author><name sortKey="Aury, J M" uniqKey="Aury J">J-M Aury</name>
</author>
<author><name sortKey="Baurens, F C" uniqKey="Baurens F">F-C Baurens</name>
</author>
<author><name sortKey="Carreel, F" uniqKey="Carreel F">F Carreel</name>
</author>
<author><name sortKey="Garsmeur, O" uniqKey="Garsmeur O">O Garsmeur</name>
</author>
<author><name sortKey="Noel, B" uniqKey="Noel B">B Noel</name>
</author>
<author><name sortKey="Bocs, S" uniqKey="Bocs S">S Bocs</name>
</author>
<author><name sortKey="Droc, G" uniqKey="Droc G">G Droc</name>
</author>
<author><name sortKey="Rouard, M" uniqKey="Rouard M">M Rouard</name>
</author>
<author><name sortKey="Da Silva, C" uniqKey="Da Silva C">C Da Silva</name>
</author>
<author><name sortKey="Jabbari, K" uniqKey="Jabbari K">K Jabbari</name>
</author>
<author><name sortKey="Cardi, C" uniqKey="Cardi C">C Cardi</name>
</author>
<author><name sortKey="Poulain, J" uniqKey="Poulain J">J Poulain</name>
</author>
<author><name sortKey="Souquet, M" uniqKey="Souquet M">M Souquet</name>
</author>
<author><name sortKey="Labadie, K" uniqKey="Labadie K">K Labadie</name>
</author>
<author><name sortKey="Jourda, C" uniqKey="Jourda C">C Jourda</name>
</author>
<author><name sortKey="Lengelle, J" uniqKey="Lengelle J">J Lengelle</name>
</author>
<author><name sortKey="Rodier Goud, M" uniqKey="Rodier Goud M">M Rodier-Goud</name>
</author>
<author><name sortKey="Alberti, A" uniqKey="Alberti A">A Alberti</name>
</author>
<author><name sortKey="Bernard, M" uniqKey="Bernard M">M Bernard</name>
</author>
<author><name sortKey="Correa, M" uniqKey="Correa M">M Correa</name>
</author>
<author><name sortKey="Ayyampalayam, S" uniqKey="Ayyampalayam S">S Ayyampalayam</name>
</author>
<author><name sortKey="Mckain, Mr" uniqKey="Mckain M">MR McKain</name>
</author>
<author><name sortKey="Leebens Mack, J" uniqKey="Leebens Mack J">J Leebens-Mack</name>
</author>
<author><name sortKey="Burgess, D" uniqKey="Burgess D">D Burgess</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
<author><name sortKey="Mbeguie A Mbeguie, D" uniqKey="Mbeguie A Mbeguie D">D Mbéguié-A-Mbéguié</name>
</author>
<author><name sortKey="Chabannes, M" uniqKey="Chabannes M">M Chabannes</name>
</author>
<author><name sortKey="Wicker, T" uniqKey="Wicker T">T Wicker</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jourda, C" uniqKey="Jourda C">C Jourda</name>
</author>
<author><name sortKey="Cardi, C" uniqKey="Cardi C">C Cardi</name>
</author>
<author><name sortKey="Mbeguie A Mbeguie, D" uniqKey="Mbeguie A Mbeguie D">D Mbéguié-A-Mbéguié</name>
</author>
<author><name sortKey="Bocs, S" uniqKey="Bocs S">S Bocs</name>
</author>
<author><name sortKey="Garsmeur, O" uniqKey="Garsmeur O">O Garsmeur</name>
</author>
<author><name sortKey="D Ont, A" uniqKey="D Ont A">A D’Hont</name>
</author>
<author><name sortKey="Yahiaoui, N" uniqKey="Yahiaoui N">N Yahiaoui</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Garsmeur, O" uniqKey="Garsmeur O">O Garsmeur</name>
</author>
<author><name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
<author><name sortKey="Almeida, A" uniqKey="Almeida A">A Almeida</name>
</author>
<author><name sortKey="Jourda, C" uniqKey="Jourda C">C Jourda</name>
</author>
<author><name sortKey="D Ont, A" uniqKey="D Ont A">A D’Hont</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cenci, A" uniqKey="Cenci A">A Cenci</name>
</author>
<author><name sortKey="Guignon, V" uniqKey="Guignon V">V Guignon</name>
</author>
<author><name sortKey="Roux, N" uniqKey="Roux N">N Roux</name>
</author>
<author><name sortKey="Rouard, M" uniqKey="Rouard M">M Rouard</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author><name sortKey="Hu, Q" uniqKey="Hu Q">Q Hu</name>
</author>
<author><name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author><name sortKey="Lu, C" uniqKey="Lu C">C Lu</name>
</author>
<author><name sortKey="Kuang, H" uniqKey="Kuang H">H Kuang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Golicz, Aa" uniqKey="Golicz A">AA Golicz</name>
</author>
<author><name sortKey="Schliep, M" uniqKey="Schliep M">M Schliep</name>
</author>
<author><name sortKey="Lee, Ht" uniqKey="Lee H">HT Lee</name>
</author>
<author><name sortKey="Larkum, Awd" uniqKey="Larkum A">AWD Larkum</name>
</author>
<author><name sortKey="Dolferus, R" uniqKey="Dolferus R">R Dolferus</name>
</author>
<author><name sortKey="Batley, J" uniqKey="Batley J">J Batley</name>
</author>
<author><name sortKey="Chan, C Kk" uniqKey="Chan C">C-KK Chan</name>
</author>
<author><name sortKey="Sablok, G" uniqKey="Sablok G">G Sablok</name>
</author>
<author><name sortKey="Ralph, Pj" uniqKey="Ralph P">PJ Ralph</name>
</author>
<author><name sortKey="Edwards, D" uniqKey="Edwards D">D Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sampedro, J" uniqKey="Sampedro J">J Sampedro</name>
</author>
<author><name sortKey="Guttman, M" uniqKey="Guttman M">M Guttman</name>
</author>
<author><name sortKey="Li, L C" uniqKey="Li L">L-C Li</name>
</author>
<author><name sortKey="Cosgrove, Dj" uniqKey="Cosgrove D">DJ Cosgrove</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="De Smet, R" uniqKey="De Smet R">R De Smet</name>
</author>
<author><name sortKey="Adams, Kl" uniqKey="Adams K">KL Adams</name>
</author>
<author><name sortKey="Vandepoele, K" uniqKey="Vandepoele K">K Vandepoele</name>
</author>
<author><name sortKey="Van Montagu, Mce" uniqKey="Van Montagu M">MCE Van Montagu</name>
</author>
<author><name sortKey="Maere, S" uniqKey="Maere S">S Maere</name>
</author>
<author><name sortKey="Van De Peer, Y" uniqKey="Van De Peer Y">Y Van de Peer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chain, Psg" uniqKey="Chain P">PSG Chain</name>
</author>
<author><name sortKey="Grafham, Dv" uniqKey="Grafham D">DV Grafham</name>
</author>
<author><name sortKey="Fulton, Rs" uniqKey="Fulton R">RS Fulton</name>
</author>
<author><name sortKey="Fitzgerald, Mg" uniqKey="Fitzgerald M">MG FitzGerald</name>
</author>
<author><name sortKey="Hostetler, J" uniqKey="Hostetler J">J Hostetler</name>
</author>
<author><name sortKey="Muzny, D" uniqKey="Muzny D">D Muzny</name>
</author>
<author><name sortKey="Ali, J" uniqKey="Ali J">J Ali</name>
</author>
<author><name sortKey="Birren, B" uniqKey="Birren B">B Birren</name>
</author>
<author><name sortKey="Bruce, Dc" uniqKey="Bruce D">DC Bruce</name>
</author>
<author><name sortKey="Buhay, C" uniqKey="Buhay C">C Buhay</name>
</author>
<author><name sortKey="Cole, Jr" uniqKey="Cole J">JR Cole</name>
</author>
<author><name sortKey="Ding, Y" uniqKey="Ding Y">Y Ding</name>
</author>
<author><name sortKey="Dugan, S" uniqKey="Dugan S">S Dugan</name>
</author>
<author><name sortKey="Field, D" uniqKey="Field D">D Field</name>
</author>
<author><name sortKey="Garrity, Gm" uniqKey="Garrity G">GM Garrity</name>
</author>
<author><name sortKey="Gibbs, R" uniqKey="Gibbs R">R Gibbs</name>
</author>
<author><name sortKey="Graves, T" uniqKey="Graves T">T Graves</name>
</author>
<author><name sortKey="Han, Cs" uniqKey="Han C">CS Han</name>
</author>
<author><name sortKey="Harrison, Sh" uniqKey="Harrison S">SH Harrison</name>
</author>
<author><name sortKey="Highlander, S" uniqKey="Highlander S">S Highlander</name>
</author>
<author><name sortKey="Hugenholtz, P" uniqKey="Hugenholtz P">P Hugenholtz</name>
</author>
<author><name sortKey="Khouri, Hm" uniqKey="Khouri H">HM Khouri</name>
</author>
<author><name sortKey="Kodira, Cd" uniqKey="Kodira C">CD Kodira</name>
</author>
<author><name sortKey="Kolker, E" uniqKey="Kolker E">E Kolker</name>
</author>
<author><name sortKey="Kyrpides, Nc" uniqKey="Kyrpides N">NC Kyrpides</name>
</author>
<author><name sortKey="Lang, D" uniqKey="Lang D">D Lang</name>
</author>
<author><name sortKey="Lapidus, A" uniqKey="Lapidus A">A Lapidus</name>
</author>
<author><name sortKey="Malfatti, Sa" uniqKey="Malfatti S">SA Malfatti</name>
</author>
<author><name sortKey="Markowitz, V" uniqKey="Markowitz V">V Markowitz</name>
</author>
<author><name sortKey="Metha, T" uniqKey="Metha T">T Metha</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Simkova, H" uniqKey="Simkova H">H Šimková</name>
</author>
<author><name sortKey="Cihalikova, J" uniqKey="Cihalikova J">J Číhalíková</name>
</author>
<author><name sortKey="Vrana, J" uniqKey="Vrana J">J Vrána</name>
</author>
<author><name sortKey="Lysak, M" uniqKey="Lysak M">M Lysák</name>
</author>
<author><name sortKey="Dolezel, J" uniqKey="Dolezel J">J Doležel</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author><name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Van Ooijen, Jw" uniqKey="Van Ooijen J">JW Van Ooijen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author><name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Krzywinski, M" uniqKey="Krzywinski M">M Krzywinski</name>
</author>
<author><name sortKey="Schein, J" uniqKey="Schein J">J Schein</name>
</author>
<author><name sortKey="Birol, I" uniqKey="Birol I">İ Birol</name>
</author>
<author><name sortKey="Connors, J" uniqKey="Connors J">J Connors</name>
</author>
<author><name sortKey="Gascoyne, R" uniqKey="Gascoyne R">R Gascoyne</name>
</author>
<author><name sortKey="Horsman, D" uniqKey="Horsman D">D Horsman</name>
</author>
<author><name sortKey="Jones, Sj" uniqKey="Jones S">SJ Jones</name>
</author>
<author><name sortKey="Marra, Ma" uniqKey="Marra M">MA Marra</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Anantharaman, T" uniqKey="Anantharaman T">T Anantharaman</name>
</author>
<author><name sortKey="Mishra, B" uniqKey="Mishra B">B Mishra</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nguyen, Jv" uniqKey="Nguyen J">JV Nguyen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pendleton, M" uniqKey="Pendleton M">M Pendleton</name>
</author>
<author><name sortKey="Sebra, R" uniqKey="Sebra R">R Sebra</name>
</author>
<author><name sortKey="Pang, Awc" uniqKey="Pang A">AWC Pang</name>
</author>
<author><name sortKey="Ummat, A" uniqKey="Ummat A">A Ummat</name>
</author>
<author><name sortKey="Franzen, O" uniqKey="Franzen O">O Franzen</name>
</author>
<author><name sortKey="Rausch, T" uniqKey="Rausch T">T Rausch</name>
</author>
<author><name sortKey="Stutz, Am" uniqKey="Stutz A">AM Stütz</name>
</author>
<author><name sortKey="Stedman, W" uniqKey="Stedman W">W Stedman</name>
</author>
<author><name sortKey="Anantharaman, T" uniqKey="Anantharaman T">T Anantharaman</name>
</author>
<author><name sortKey="Hastie, A" uniqKey="Hastie A">A Hastie</name>
</author>
<author><name sortKey="Dai, H" uniqKey="Dai H">H Dai</name>
</author>
<author><name sortKey="Fritz, Mh Y" uniqKey="Fritz M">MH-Y Fritz</name>
</author>
<author><name sortKey="Cao, H" uniqKey="Cao H">H Cao</name>
</author>
<author><name sortKey="Cohain, A" uniqKey="Cohain A">A Cohain</name>
</author>
<author><name sortKey="Deikus, G" uniqKey="Deikus G">G Deikus</name>
</author>
<author><name sortKey="Durrett, Re" uniqKey="Durrett R">RE Durrett</name>
</author>
<author><name sortKey="Blanchard, Sc" uniqKey="Blanchard S">SC Blanchard</name>
</author>
<author><name sortKey="Altman, R" uniqKey="Altman R">R Altman</name>
</author>
<author><name sortKey="Chin, C S" uniqKey="Chin C">C-S Chin</name>
</author>
<author><name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
<author><name sortKey="Paxinos, Ee" uniqKey="Paxinos E">EE Paxinos</name>
</author>
<author><name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
<author><name sortKey="Darnell, Rb" uniqKey="Darnell R">RB Darnell</name>
</author>
<author><name sortKey="Mccombie, Wr" uniqKey="Mccombie W">WR McCombie</name>
</author>
<author><name sortKey="Kwok, P Y" uniqKey="Kwok P">P-Y Kwok</name>
</author>
<author><name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
<author><name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author><name sortKey="Bashir, A" uniqKey="Bashir A">A Bashir</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Slater, G" uniqKey="Slater G">G Slater</name>
</author>
<author><name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author><name sortKey="Hall, Im" uniqKey="Hall I">IM Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Muggli, Md" uniqKey="Muggli M">MD Muggli</name>
</author>
<author><name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author><name sortKey="Ronen, R" uniqKey="Ronen R">R Ronen</name>
</author>
<author><name sortKey="Boucher, C" uniqKey="Boucher C">C Boucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chen, M" uniqKey="Chen M">M Chen</name>
</author>
<author><name sortKey="Presting, G" uniqKey="Presting G">G Presting</name>
</author>
<author><name sortKey="Barbazuk, Wb" uniqKey="Barbazuk W">WB Barbazuk</name>
</author>
<author><name sortKey="Goicoechea, Jl" uniqKey="Goicoechea J">JL Goicoechea</name>
</author>
<author><name sortKey="Blackmon, B" uniqKey="Blackmon B">B Blackmon</name>
</author>
<author><name sortKey="Fang, G" uniqKey="Fang G">G Fang</name>
</author>
<author><name sortKey="Kim, H" uniqKey="Kim H">H Kim</name>
</author>
<author><name sortKey="Frisch, D" uniqKey="Frisch D">D Frisch</name>
</author>
<author><name sortKey="Yu, Y" uniqKey="Yu Y">Y Yu</name>
</author>
<author><name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
<author><name sortKey="Higingbottom, S" uniqKey="Higingbottom S">S Higingbottom</name>
</author>
<author><name sortKey="Phimphilai, J" uniqKey="Phimphilai J">J Phimphilai</name>
</author>
<author><name sortKey="Phimphilai, D" uniqKey="Phimphilai D">D Phimphilai</name>
</author>
<author><name sortKey="Thurmond, S" uniqKey="Thurmond S">S Thurmond</name>
</author>
<author><name sortKey="Gaudette, B" uniqKey="Gaudette B">B Gaudette</name>
</author>
<author><name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
<author><name sortKey="Liu, J" uniqKey="Liu J">J Liu</name>
</author>
<author><name sortKey="Hatfield, J" uniqKey="Hatfield J">J Hatfield</name>
</author>
<author><name sortKey="Main, D" uniqKey="Main D">D Main</name>
</author>
<author><name sortKey="Farrar, K" uniqKey="Farrar K">K Farrar</name>
</author>
<author><name sortKey="Henderson, C" uniqKey="Henderson C">C Henderson</name>
</author>
<author><name sortKey="Barnett, L" uniqKey="Barnett L">L Barnett</name>
</author>
<author><name sortKey="Costa, R" uniqKey="Costa R">R Costa</name>
</author>
<author><name sortKey="Williams, B" uniqKey="Williams B">B Williams</name>
</author>
<author><name sortKey="Walser, S" uniqKey="Walser S">S Walser</name>
</author>
<author><name sortKey="Atkins, M" uniqKey="Atkins M">M Atkins</name>
</author>
<author><name sortKey="Hall, C" uniqKey="Hall C">C Hall</name>
</author>
<author><name sortKey="Budiman, Ma" uniqKey="Budiman M">MA Budiman</name>
</author>
<author><name sortKey="Tomkins, Jp" uniqKey="Tomkins J">JP Tomkins</name>
</author>
<author><name sortKey="Luo, M" uniqKey="Luo M">M Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gill, Ks" uniqKey="Gill K">KS Gill</name>
</author>
<author><name sortKey="Gill, Bs" uniqKey="Gill B">BS Gill</name>
</author>
<author><name sortKey="Endo, Tr" uniqKey="Endo T">TR Endo</name>
</author>
<author><name sortKey="Taylor, T" uniqKey="Taylor T">T Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hall, Se" uniqKey="Hall S">SE Hall</name>
</author>
<author><name sortKey="Kettler, G" uniqKey="Kettler G">G Kettler</name>
</author>
<author><name sortKey="Preuss, D" uniqKey="Preuss D">D Preuss</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, J" uniqKey="Wu J">J Wu</name>
</author>
<author><name sortKey="Mizuno, H" uniqKey="Mizuno H">H Mizuno</name>
</author>
<author><name sortKey="Hayashi Tsugane, M" uniqKey="Hayashi Tsugane M">M Hayashi-Tsugane</name>
</author>
<author><name sortKey="Ito, Y" uniqKey="Ito Y">Y Ito</name>
</author>
<author><name sortKey="Chiden, Y" uniqKey="Chiden Y">Y Chiden</name>
</author>
<author><name sortKey="Fujisawa, M" uniqKey="Fujisawa M">M Fujisawa</name>
</author>
<author><name sortKey="Katagiri, S" uniqKey="Katagiri S">S Katagiri</name>
</author>
<author><name sortKey="Saji, S" uniqKey="Saji S">S Saji</name>
</author>
<author><name sortKey="Yoshiki, S" uniqKey="Yoshiki S">S Yoshiki</name>
</author>
<author><name sortKey="Karasawa, W" uniqKey="Karasawa W">W Karasawa</name>
</author>
<author><name sortKey="Yoshihara, R" uniqKey="Yoshihara R">R Yoshihara</name>
</author>
<author><name sortKey="Hayashi, A" uniqKey="Hayashi A">A Hayashi</name>
</author>
<author><name sortKey="Kobayashi, H" uniqKey="Kobayashi H">H Kobayashi</name>
</author>
<author><name sortKey="Ito, K" uniqKey="Ito K">K Ito</name>
</author>
<author><name sortKey="Hamada, M" uniqKey="Hamada M">M Hamada</name>
</author>
<author><name sortKey="Okamoto, M" uniqKey="Okamoto M">M Okamoto</name>
</author>
<author><name sortKey="Ikeno, M" uniqKey="Ikeno M">M Ikeno</name>
</author>
<author><name sortKey="Ichikawa, Y" uniqKey="Ichikawa Y">Y Ichikawa</name>
</author>
<author><name sortKey="Katayose, Y" uniqKey="Katayose Y">Y Katayose</name>
</author>
<author><name sortKey="Yano, M" uniqKey="Yano M">M Yano</name>
</author>
<author><name sortKey="Matsumoto, T" uniqKey="Matsumoto T">T Matsumoto</name>
</author>
<author><name sortKey="Sasaki, T" uniqKey="Sasaki T">T Sasaki</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Droc, G" uniqKey="Droc G">G Droc</name>
</author>
<author><name sortKey="Lariviere, D" uniqKey="Lariviere D">D Larivière</name>
</author>
<author><name sortKey="Guignon, V" uniqKey="Guignon V">V Guignon</name>
</author>
<author><name sortKey="Yahiaoui, N" uniqKey="Yahiaoui N">N Yahiaoui</name>
</author>
<author><name sortKey="This, D" uniqKey="This D">D This</name>
</author>
<author><name sortKey="Garsmeur, O" uniqKey="Garsmeur O">O Garsmeur</name>
</author>
<author><name sortKey="Dereeper, A" uniqKey="Dereeper A">A Dereeper</name>
</author>
<author><name sortKey="Hamelin, C" uniqKey="Hamelin C">C Hamelin</name>
</author>
<author><name sortKey="Argout, X" uniqKey="Argout X">X Argout</name>
</author>
<author><name sortKey="Dufayard, J F" uniqKey="Dufayard J">J-F Dufayard</name>
</author>
<author><name sortKey="Lengelle, J" uniqKey="Lengelle J">J Lengelle</name>
</author>
<author><name sortKey="Baurens, F C" uniqKey="Baurens F">F-C Baurens</name>
</author>
<author><name sortKey="Cenci, A" uniqKey="Cenci A">A Cenci</name>
</author>
<author><name sortKey="Pitollat, B" uniqKey="Pitollat B">B Pitollat</name>
</author>
<author><name sortKey="D Ont, A" uniqKey="D Ont A">A D’Hont</name>
</author>
<author><name sortKey="Ruiz, M" uniqKey="Ruiz M">M Ruiz</name>
</author>
<author><name sortKey="Rouard, M" uniqKey="Rouard M">M Rouard</name>
</author>
<author><name sortKey="Bocs, S" uniqKey="Bocs S">S Bocs</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group><journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">26984673</article-id>
<article-id pub-id-type="pmc">4793746</article-id>
<article-id pub-id-type="publisher-id">2579</article-id>
<article-id pub-id-type="doi">10.1186/s12864-016-2579-4</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Improvement of the banana “<italic>Musa acuminata</italic>” reference sequence using NGS data and semi-automated bioinformatics methods</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Martin</surname>
<given-names>Guillaume</given-names>
</name>
<address><email>guillaume.martin@cirad.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Baurens</surname>
<given-names>Franc-Christophe</given-names>
</name>
<address><email>franc-christophe.baurens@cirad.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Droc</surname>
<given-names>Gaëtan</given-names>
</name>
<address><email>gaetan.droc@cirad.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Rouard</surname>
<given-names>Mathieu</given-names>
</name>
<address><email>m.rouard@cgiar.org</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Cenci</surname>
<given-names>Alberto</given-names>
</name>
<address><email>a.cenci@cgiar.org</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Kilian</surname>
<given-names>Andrzej</given-names>
</name>
<address><email>zej@diversityarrays.com</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Hastie</surname>
<given-names>Alex</given-names>
</name>
<address><email>ahastie@bionanogenomics.com</email>
</address>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Doležel</surname>
<given-names>Jaroslav</given-names>
</name>
<address><email>dolezel@ueb.cas.cz</email>
</address>
<xref ref-type="aff" rid="Aff5"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Aury</surname>
<given-names>Jean-Marc</given-names>
</name>
<address><email>jmaury@genoscope.cns.fr</email>
</address>
<xref ref-type="aff" rid="Aff6"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Alberti</surname>
<given-names>Adriana</given-names>
</name>
<address><email>aalberti@genoscope.cns.fr</email>
</address>
<xref ref-type="aff" rid="Aff6"></xref>
</contrib>
<contrib contrib-type="author"><name><surname>Carreel</surname>
<given-names>Françoise</given-names>
</name>
<address><email>francoise.carreel@cirad.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>D’Hont</surname>
<given-names>Angélique</given-names>
</name>
<address><phone>+33 (0)4 67 61 59 27</phone>
<email>dhont@cirad.fr</email>
<email>angelique.d'hont@cirad.fr</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
</contrib>
<aff id="Aff1"><label></label>
CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5 France</aff>
<aff id="Aff2"><label></label>
Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5 France</aff>
<aff id="Aff3"><label></label>
Diversity Arrays Technology, Yarralumla, Australian Capital Territory 2600 Australia</aff>
<aff id="Aff4"><label></label>
BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA 92121 USA</aff>
<aff id="Aff5"><label></label>
Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371 Olomouc, Czech Republic</aff>
<aff id="Aff6"><label></label>
Commissariat à l’Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057 Evry, France</aff>
</contrib-group>
<pub-date pub-type="epub"><day>16</day>
<month>3</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>16</day>
<month>3</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection"><year>2016</year>
</pub-date>
<volume>17</volume>
<elocation-id>243</elocation-id>
<history><date date-type="received"><day>4</day>
<month>8</month>
<year>2015</year>
</date>
<date date-type="accepted"><day>8</day>
<month>3</month>
<year>2016</year>
</date>
</history>
<permissions><copyright-statement>© Martin et al. 2016</copyright-statement>
<license license-type="OpenAccess"><license-p><bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1"><sec><title>Background</title>
<p>Recent advances in genomics indicate the functional significance of most genome sequences and of their long-range interactions. As a detailed examination of genome organization and function requires a very high-quality genome sequence, the objective of this study was to improve the reference genome assembly of banana (<italic>Musa acuminata</italic>).</p>
</sec>
<sec><title>Results</title>
<p>We developed a modular bioinformatics pipeline, which can handle various types of data, to improve genome sequence assemblies. The pipeline comprises several semi-automated tools. Unlike classical automated tools, which rely on global parameters, these semi-automated tools offer an expert mode in which the user decides on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of <italic>Musa acuminata</italic>. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super-scaffolds. Finally, a consensus gene annotation derived from two pre-existing annotations was projected onto the new assembly. This approach reduced the total <italic>Musa</italic> scaffold number from 7513 to 1532 (i.e. by 80 %), while the N50 increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). Overall, 89.5 % of the assembly was anchored to the 11 <italic>Musa</italic> chromosomes, compared with 70 % previously. Unknown sites (N) were reduced from 17.3 to 10.0 %.</p>
</sec>
<sec><title>Conclusion</title>
<p>The release of the <italic>Musa acuminata</italic>
 reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-016-2579-4) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en"><title>Keywords</title>
<kwd><italic>Musa acuminata</italic>
</kwd>
<kwd>Genome assembly</kwd>
<kwd>Bioinformatics tool</kwd>
<kwd>Paired-end sequences</kwd>
<kwd>GBS</kwd>
<kwd>Genome map</kwd>
</kwd-group>
<funding-group><award-group><funding-source><institution>CIRAD (FR)</institution>
</funding-source>
</award-group>
<award-group><funding-source><institution>CRP-RTB</institution>
</funding-source>
</award-group>
<award-group><funding-source><institution>DArT (AUS)</institution>
</funding-source>
</award-group>
<award-group><funding-source><institution>National Program of Sustainability I</institution>
</funding-source>
<award-id>LO1204</award-id>
<principal-award-recipient><name><surname>Doležel</surname>
<given-names>Jaroslav</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<custom-meta-group><custom-meta><meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body><sec id="Sec1"><title>Background</title>
<p>The first two plant genomes to be sequenced were <italic>Arabidopsis</italic> and rice. Their sequences were obtained by sequencing a minimum tiling path of bacterial artificial chromosome (BAC) clones selected from physical maps. Since then, the number of sequenced plant genomes has increased steadily each year, thanks to a considerable decrease in cost and an increase in the throughput of sequencing technologies [<xref ref-type="bibr" rid="CR1">1</xref>–<xref ref-type="bibr" rid="CR3">3</xref>]. Nowadays, most genome assemblies are produced by whole genome shotgun (WGS) sequencing using next-generation sequencing (NGS). WGS assembly is based on three main steps: i) assembling raw sequence reads into longer sequences called contigs; ii) bridging contigs using end-sequenced DNA fragments of various lengths (<italic>e.g.</italic> BACs, fosmids, plasmids, large insert size libraries) to generate scaffolds; iii) anchoring scaffolds to chromosomes using genetic mapping data to produce pseudo-molecules.</p>
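<p>Contiguity after steps i) and ii) is usually summarized by the N50 and by the number of sequences needed to reach it (often called L50), the statistics reported in the abstract (e.g. 3.0 Mb with 26 scaffolds). The Python sketch below shows the conventional computation from a list of contig or scaffold lengths; the function name <monospace>n50_l50</monospace> is illustrative and not part of the authors' pipeline.</p>

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down, accumulating covered length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        covered += length
        if 2 * covered >= total:
            return length, count


print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

<p>For five sequences of 100, 80, 60, 40 and 20 bp (300 bp in total), the two longest already cover 180 bp, i.e. at least half, so N50 is 80 bp and L50 is 2.</p>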
<p>A major challenge is to generate highly contiguous sequence assemblies from short reads in genomes characterized by sequence redundancy, a typical situation in plants. The main source of redundancy is transposable elements (TEs), which represent a large part of plant genomes (from 14 % in <italic>Arabidopsis</italic> to 80 % in wheat) (reviewed in [<xref ref-type="bibr" rid="CR4">4</xref>]). Another source of difficulty is paralogous genes [<xref ref-type="bibr" rid="CR5">5</xref>] resulting from various duplication processes, including whole genome duplications (WGD), which occurred frequently during plant evolution [<xref ref-type="bibr" rid="CR6">6</xref>], and segmental duplications of various sizes. Repeated sequences are often assembled into a single collapsed region during the assembly steps [<xref ref-type="bibr" rid="CR7">7</xref>]. Once created, a collapsed region is linked to multiple other genomic regions, leading to conflicts. Automatic assemblers then face two problematic options: either assemble anyway, at the risk of joining non-contiguous regions, or stop the assembly process prematurely. These constraints are exacerbated with short insert-size paired reads, since their inserts are too short to span long repeat elements. Conversely, scaffolding with only very large insert size libraries (e.g. BAC-end sequences) limits the integration of small scaffolds into the final assembly.</p>
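<p>Why short inserts fail to resolve repeats can be made concrete with a simplified model, assumed here for illustration only: a read pair links the unique sequences flanking a repeat only if each mate overlaps at least a minimum number of unique bases (a hypothetical anchor length) on its own side, so the insert must exceed the repeat length by two anchors. The library names, insert sizes and the 5 kb repeat below are hypothetical values.</p>

```python
def pair_can_bridge(insert_size, repeat_length, min_anchor):
    """Simplified check: can a read pair link the unique sequences flanking a repeat?

    Assumes each mate must overlap at least `min_anchor` bases of unique
    sequence on its own side of the repeat to be placed unambiguously, so
    the outer distance between mates must exceed the repeat length by at
    least two anchors.
    """
    return insert_size >= repeat_length + 2 * min_anchor


# Hypothetical libraries (insert sizes in bp) and a hypothetical 5 kb repeat.
libraries = {"short_pe": 500, "mate_pair_3kb": 3_000, "fosmid_10kb": 10_000}
repeat = 5_000
bridging = [name for name, size in libraries.items()
            if pair_can_bridge(size, repeat, min_anchor=100)]
print(bridging)  # ['fosmid_10kb']
```

<p>Real scaffolders also weigh insert-size variance and mapping quality, which this check ignores.</p>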
<p>New approaches to improve genome sequence assemblies are continuously being developed. They include longer-read sequencing and high-coverage medium and large insert size libraries [<xref ref-type="bibr" rid="CR8">8</xref>, <xref ref-type="bibr" rid="CR9">9</xref>] as well as optical maps [<xref ref-type="bibr" rid="CR10">10</xref>–<xref ref-type="bibr" rid="CR12">12</xref>], which improve the assembly of contigs into scaffolds, and genotyping by sequencing (GBS), which has been used to assemble scaffolds into pseudo-molecules [<xref ref-type="bibr" rid="CR13">13</xref>, <xref ref-type="bibr" rid="CR14">14</xref>]. Despite tremendous advances in high-throughput sequencing, sequence assembly remains a substantial endeavor [<xref ref-type="bibr" rid="CR15">15</xref>]. Several automated programs have been developed to improve draft genome sequence assemblies, such as Bambus [<xref ref-type="bibr" rid="CR16">16</xref>], SOPRA [<xref ref-type="bibr" rid="CR17">17</xref>], MIP [<xref ref-type="bibr" rid="CR18">18</xref>], SSPACE [<xref ref-type="bibr" rid="CR19">19</xref>], Opera [<xref ref-type="bibr" rid="CR20">20</xref>], GRASS [<xref ref-type="bibr" rid="CR21">21</xref>], SCARPA [<xref ref-type="bibr" rid="CR22">22</xref>], SSPACE-LongRead [<xref ref-type="bibr" rid="CR23">23</xref>], SOAPdenovo2 [<xref ref-type="bibr" rid="CR24">24</xref>], GapFiller [<xref ref-type="bibr" rid="CR25">25</xref>] and PAGIT [<xref ref-type="bibr" rid="CR26">26</xref>]. However, these programs were designed to assemble contigs into scaffolds and/or to fill unknown regions, and they operate under a compromise between the quantity and the quality of the assembly. This compromise leaves a significant proportion of misassembled, unscaffolded and unfilled regions.</p>
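<p>The quantity versus quality compromise can be illustrated with a generic link-count scaffolding step, sketched here in Python; it is not the algorithm of any of the programs cited. Read pairs whose mates map to different contigs are counted per contig pair and accepted only if they reach one global threshold (<monospace>min_links</monospace>). In the hypothetical data, contig "R" stands for a collapsed repeat linked to several unrelated contigs, and the conflict check ignores contig ends and orientation for simplicity.</p>

```python
from collections import Counter


def supported_links(pair_hits, min_links):
    """Count mate-pair links between distinct contigs and keep well-supported ones.

    pair_hits: iterable of (contig_of_mate1, contig_of_mate2) tuples, one per
    read pair (hypothetical input format).
    min_links: a single global threshold applied to every contig pair.
    """
    counts = Counter()
    for a, b in pair_hits:
        if a != b:
            # Store each pair in a canonical order so (A, B) and (B, A) merge.
            counts[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


def conflicting_contigs(links):
    """Contigs with more than two accepted partners cannot lie on one linear path."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return sorted(c for c, d in degree.items() if d > 2)


hits = ([("c1", "c2")] * 5 + [("c2", "c3")] * 2
        + [("R", "c1")] * 4 + [("R", "c4")] * 4 + [("R", "c7")] * 3)
strict = supported_links(hits, min_links=3)
print(sorted(strict))  # [('R', 'c1'), ('R', 'c4'), ('R', 'c7'), ('c1', 'c2')]
print(conflicting_contigs(strict))  # ['R']
```

<p>With <monospace>min_links=3</monospace>, the c2 to c3 join supported by only two pairs is lost (an unscaffolded region), while the collapsed repeat R still passes the threshold and links three contigs, which no single linear order can satisfy. Lowering the threshold recovers the first join but does not resolve the conflict around R.</p>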
<p>A draft genome sequence assembly of banana (<italic>Musa acuminata</italic>
, 2n = 22, 1C = 523 Mbp)<italic>,</italic>
 was produced recently using the WGS strategy [<xref ref-type="bibr" rid="CR27">27</xref>
]. The sequence was obtained from a doubled-haploid plant of cv. Pahang and represented a major step forward in understanding the structure and evolution of the banana genome [<xref ref-type="bibr" rid="CR27">27</xref>
, <xref ref-type="bibr" rid="CR28">28</xref>
]. Specific ancestral whole genome duplications were identified within the <italic>Musa</italic>
 lineage and their impact on gene fractionation and expression patterns was characterized [<xref ref-type="bibr" rid="CR29">29</xref>
]. As the first monocotyledon genome sequenced outside the Poales, it provided an essential bridge for comparative genome analysis in plants, e.g. [<xref ref-type="bibr" rid="CR27">27</xref>
, <xref ref-type="bibr" rid="CR28">28</xref>
, <xref ref-type="bibr" rid="CR30">30</xref>
–<xref ref-type="bibr" rid="CR34">34</xref>
].</p>
<p>According to the criteria outlined in [<xref ref-type="bibr" rid="CR35">35</xref>
], this genome sequence can be classified as a high-quality draft. However, there was clear room for improvement, notably by reducing the number of scaffolds (7,513) and the proportion of the assembly in scaffolds not anchored to one of the eleven chromosomes (30 % of the draft assembly). Here we describe a significant improvement of the first <italic>Musa acuminata</italic>
 draft reference genome sequence and the bioinformatics tools that we developed and used in this work. The work comprised: i) detection and correction of sequence misassemblies, ii) merging of scaffolds, and iii) integration of many previously unanchored scaffolds into the 11 pseudo-molecules. In addition, the two existing genome annotations were reconciled.</p>
</sec>
<sec id="Sec2" sec-type="materials|methods"><title>Methods</title>
<sec id="Sec3"><title>Sequence data</title>
<p>The first draft reference sequence of banana (<italic>Musa acuminata</italic>
) [<xref ref-type="bibr" rid="CR27">27</xref>
] was produced from DNA of a doubled-haploid plant of cv. ‘Pahang’ (DH-Pahang) using reads obtained by 454 sequencing (ERX166948 to ERX167027), Sanger 10 kb fosmid paired-reads (available on the Banana Genome Hub, <ext-link ext-link-type="uri" xlink:href="http://banana-genome.cirad.fr/download">http://banana-genome.cirad.fr/download</ext-link>
), Sanger BAC-end reads (available on the Banana Genome Hub, <ext-link ext-link-type="uri" xlink:href="http://banana-genome.cirad.fr/download">http://banana-genome.cirad.fr/download</ext-link>
) and 330 bp paired-end Illumina sequences (ERX179491 to ERX179503). In the present work, a 5 kb mate-pair library of DH-Pahang was constructed and sequenced on an Illumina HiSeq 2000 to 40x genome coverage. The resulting reads were trimmed and filtered according to three criteria: (1) both read ends were trimmed until a base with a quality of at least 20 was reached; (2) reads were truncated at their second unknown base (N); and (3) only reads of at least 30 bases were kept.</p>
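<p>The three read-filtering criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script actually used: the function name trim_read is hypothetical, qualities are assumed to be Phred scores supplied as integers, and the criteria are assumed to be applied in the order listed (quality trimming, then truncation at the second N, then the length filter).</p>

```python
def trim_read(seq, quals, min_q=20, min_len=30):
    """Apply the three trimming/filtering criteria to one read.

    seq   -- read sequence (str)
    quals -- per-base Phred qualities (list of int), same length as seq
    Returns the trimmed sequence, or None if the read is discarded.
    """
    start, end = 0, len(seq)
    # (1) trim both ends until a base with quality >= min_q is reached
    while end > start and min_q > quals[start]:
        start += 1
    while end > start and min_q > quals[end - 1]:
        end -= 1
    seq = seq[start:end]
    # (2) truncate the read at its second unknown base (N)
    first_n = seq.find("N")
    if first_n != -1:
        second_n = seq.find("N", first_n + 1)
        if second_n != -1:
            seq = seq[:second_n]
    # (3) keep only reads of at least min_len bases
    return seq if len(seq) >= min_len else None


# Example: low-quality bases at both ends are removed, leaving 40 good bases
read = "AC" + "G" * 40 + "TT"
quals = [5, 10] + [30] * 40 + [12, 2]
print(trim_read(read, quals))
```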
</sec>
<sec id="Sec4"><title>Single molecule mapping</title>
<p>A genome map of DH-Pahang was constructed using the BioNano Irys System (BioNano Genomics, San Diego, USA). High molecular weight (HMW) DNA was prepared according to [<xref ref-type="bibr" rid="CR36">36</xref>
]. Briefly, a liquid suspension of intact cell nuclei was prepared by mechanical homogenization of formaldehyde-fixed tissue from unopened (cigar) leaves. The nuclei in the homogenate were stained with DAPI (4′,6-diamidino-2-phenylindole), and nuclei in the G<sub>1</sub>
 phase of the cell cycle were purified by flow cytometric sorting and embedded in agarose miniplugs. HMW DNA was then purified and labeled using the IrysPrep Reagent Kit (BioNano Genomics), with fluorescent nucleotide analogs incorporated at all Nt.BspQI nicking endonuclease sites. Single molecules were linearized in nanochannel arrays and imaged. A total of 426,846 molecules (N50 of 153 kb; cumulative length of 65,719 Mb; average label density of 9.4 labels/100 kb) were generated and <italic>de novo</italic>
 assembled using an overlap-layout-consensus method. The <italic>de novo</italic>
 map assembly yielded 464 Mb with a map N50 of 715 kb.</p>
</sec>
<sec id="Sec5"><title>Genetic markers</title>
<p>Of the 268 individuals of a self-progeny of the ‘Pahang’ accession (PT-BA-00267) obtained at the CIRAD research station in Guadeloupe, 180 were genotyped using DArTseq technology [<xref ref-type="bibr" rid="CR37">37</xref>
]. A total of 9,968 co-dominant (SNP) and 16,233 dominant markers were generated using a <italic>Pst</italic>I-<italic>Mse</italic>I enzyme combination. These markers were used in addition to the 768 SSR and 497 DArT markers previously used to anchor the <italic>Musa acuminata</italic>
 genome assembly. Of the 268 individuals in the mapping population, 91 were genotyped with all marker types, 178 with both DArT and DArTseq markers, 91 with both DArTseq and SSR markers, and 176 with both DArT and SSR markers. Markers were filtered independently for each marker type according to the following criteria: at most 20 % missing data, at least 10 % heterozygous or dominant genotypes, and at least 1.5 % of genotypes in at least one homozygous state, resulting in 23,430 markers. These relatively non-stringent thresholds were chosen because of the large segregation distortions previously observed on chromosomes 1 and 4 in this segregating population [<xref ref-type="bibr" rid="CR27">27</xref>
].</p>
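<p>A minimal Python sketch of the per-marker filter described above follows. The genotype coding ('AA'/'AB'/'BB' for SNPs, '1'/'0' for presence/absence of dominant markers, '-' for missing data), the use of non-missing calls as the denominator for the heterozygous and homozygous fractions, and the function name keep_marker are assumptions made for illustration.</p>

```python
from collections import Counter


def keep_marker(calls, het_codes=("AB", "1"), hom_codes=("AA", "BB", "0"),
                missing="-", max_missing=0.20, min_het=0.10, min_hom=0.015):
    """Return True if a marker passes the three segregation filters.

    calls -- one genotype call per individual (hypothetical coding:
             'AA'/'AB'/'BB' for SNPs, '1'/'0' for dominant markers, '-' missing).
    Heterozygous/homozygous fractions use non-missing calls (an assumption).
    """
    counts = Counter(calls)
    n_called = len(calls) - counts[missing]
    if n_called == 0 or counts[missing] / len(calls) > max_missing:
        return False
    het = sum(counts[c] for c in het_codes) / n_called
    # "at least one homozygous state": the most frequent homozygous class must pass
    hom = max(counts[c] for c in hom_codes) / n_called
    return het >= min_het and hom >= min_hom


snp = ["AB"] * 45 + ["AA"] * 30 + ["BB"] * 15 + ["-"] * 10
print(keep_marker(snp))
```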
</sec>
<sec id="Sec6"><title>Gene annotation</title>
<p>Two gene annotations of the <italic>Musa acuminata</italic>
 draft genome sequence were available for the initial assembly. The first corresponded to the annotation published by [<xref ref-type="bibr" rid="CR27">27</xref>
], supplemented by approximately 1000 genes manually curated by experts before 8 December 2014 (<ext-link ext-link-type="uri" xlink:href="http://banana-genome.cirad.fr/">http://banana-genome.cirad.fr/</ext-link>
). The second was the NCBI RefSeq genome annotation released on 7 October 2014 (<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/Musa_acuminata/">ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/Musa_acuminata/</ext-link>
) and generated with the NCBI Eukaryotic Genome Annotation Pipeline.</p>
</sec>
<sec id="Sec7"><title>Bioinformatics pipeline</title>
<p>An overview of the pipeline used to improve the banana draft genome assembly is shown in Fig. <xref rid="Fig1" ref-type="fig">1</xref>
. It is divided into eight distinct modules, each corresponding to a different, optional operation. The pipeline relies on several tools that we developed, which are available in the Scaffhunter and Scaffremodler toolboxes. Scaffhunter exploits genetic mapping data and Scaffremodler exploits large insert size paired reads (LPR). These tools are described in detail in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
.<fig id="Fig1"><label>Fig. 1</label>
<caption><p>Overview of the pipeline used to improve the <italic>Musa</italic>
 draft genome sequence. Ellipses correspond to input data and <italic>grey ellipses</italic>
 indicate new data acquired for the improvement of the assembly. <italic>Boxes</italic>
 correspond to bioinformatics tools; those in <italic>blue</italic>
 are new and made available through the <italic>Scaffremodler</italic>
 and <italic>Scaffhunter</italic>
 toolboxes (see Additional file <xref rid="MOESM1" ref-type="media">1</xref>
)</p>
</caption>
<graphic xlink:href="12864_2016_2579_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<sec id="Sec8"><title>Module 1: (Re-)scaffolding of contigs</title>
<p>This module used <italic>SSPACE</italic>
 [<xref ref-type="bibr" rid="CR23">23</xref>
] and exploited large insert size paired reads (LPR) to re-scaffold the existing contigs. The scaffolding process was divided into as many steps as there were sequenced libraries with distinct insert sizes. Libraries were used in order of increasing insert size, and scaffolding parameters were optimized for each step. To prevent the accumulation of scaffolding errors, the first library was used with more stringent parameters (-a 0.5, -k 20) than the second and third ones (-a 0.7, -k 1). For the Sanger libraries (i.e. BAC-end and fosmid-end sequences), reads were mapped as single-end reads using <italic>BWA</italic>
 [<xref ref-type="bibr" rid="CR38">38</xref>
]. Reads mapping to a single location were used to reconstruct read pairs, which were stored in a tab-delimited file used as input by <italic>SSPACE</italic>
.</p>
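<p>The reconstruction of read pairs from independently mapped Sanger ends can be sketched as follows. This is an illustration only: the input tuples stand in for parsed BWA alignments, the function name reconstruct_pairs is hypothetical, and the six-column output (contig, start and end for each mate, with start greater than end for reverse-strand hits) reflects our understanding of the SSPACE tab-delimited input and should be checked against the SSPACE documentation.</p>

```python
from collections import defaultdict


def reconstruct_pairs(hits):
    """Rebuild read pairs from Sanger ends mapped independently as single-end reads.

    hits -- iterable of (read_id, mate, contig, start, end, strand) tuples, where
            mate is 1 or 2 and strand is '+' or '-' (hypothetical parsed alignments).
    Returns one tab-delimited line per pair whose two mates each have exactly one
    hit: contig1, start1, end1, contig2, start2, end2 (start greater than end marks
    a reverse-strand hit).
    """
    by_read = defaultdict(lambda: {1: [], 2: []})
    for read_id, mate, contig, start, end, strand in hits:
        by_read[read_id][mate].append((contig, start, end, strand))
    lines = []
    for read_id in sorted(by_read):
        mates = by_read[read_id]
        if len(mates[1]) != 1 or len(mates[2]) != 1:
            continue  # an unmapped or multi-mapped mate: drop the pair
        fields = []
        for contig, start, end, strand in (mates[1][0], mates[2][0]):
            if strand == "-":
                start, end = end, start
            fields += [contig, str(start), str(end)]
        lines.append("\t".join(fields))
    return lines


hits = [
    ("r1", 1, "ctgA", 100, 199, "+"), ("r1", 2, "ctgB", 500, 599, "-"),
    ("r2", 1, "ctgA", 10, 50, "+"), ("r2", 1, "ctgC", 10, 50, "+"),  # multi-mapped
    ("r2", 2, "ctgB", 1, 40, "+"),
    ("r3", 1, "ctgA", 1, 40, "+"),  # mate 2 unmapped
]
print(reconstruct_pairs(hits))
```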
</sec>
<sec id="Sec9"><title>Module 2: identification and splitting of scaffold/contig misassemblies</title>
<p>This module identified and split misassembled contigs/scaffolds using a combination of GBS genetic mapping data and LPR data. Genetic markers were grouped into linkage groups using JoinMap4.1 software [<xref ref-type="bibr" rid="CR39">39</xref>
]. No marker ordering was performed at this stage. In parallel, marker sequences were aligned to scaffolds using a consensus of <italic>BWA</italic>
, <italic>bowtie2</italic>
 [<xref ref-type="bibr" rid="CR40">40</xref>
] and <italic>BLAST</italic>
 [<xref ref-type="bibr" rid="CR41">41</xref>
], and only markers with a single hit were kept. Scaffolds harboring markers assigned to more than one linkage group were then identified. LPR aligned to these scaffolds (using bowtie2 in --very-sensitive mode) were inspected to locate the misassembly boundaries precisely; these boundaries were identified by the absence of read pairs spanning the region and an increased proportion of discordant reads. Misassembled scaffolds were then split. The complete process and tools used for this module are described in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
.</p>
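<p>The marker-based part of this module (finding scaffolds whose markers fall into more than one linkage group, and the marker interval in which the break must lie) can be sketched in Python as below. The function name and the input tuples are hypothetical, and the subsequent LPR-based refinement of the breakpoint is not shown.</p>

```python
from collections import defaultdict


def flag_chimeric_scaffolds(markers):
    """Find scaffolds carrying markers from more than one linkage group.

    markers -- iterable of (scaffold, position, linkage_group) for single-hit markers.
    Returns {scaffold: [(left, right), ...]}: for each flagged scaffold, the
    intervals between consecutive markers assigned to different linkage groups,
    i.e. the windows in which LPR evidence should be inspected.
    """
    per_scaffold = defaultdict(list)
    for scaffold, pos, lg in markers:
        per_scaffold[scaffold].append((pos, lg))
    flagged = {}
    for scaffold, hits in per_scaffold.items():
        hits.sort()
        if len({lg for _, lg in hits}) == 1:
            continue  # all markers agree: no evidence of misassembly
        flagged[scaffold] = [
            (p1, p2)
            for (p1, lg1), (p2, lg2) in zip(hits, hits[1:])
            if lg1 != lg2
        ]
    return flagged


markers = [
    ("scafA", 1000, "LG7"), ("scafA", 50000, "LG7"),
    ("scafA", 120000, "LG6"), ("scafA", 180000, "LG6"),
    ("scafB", 100, "LG1"), ("scafB", 900, "LG1"),
]
print(flag_chimeric_scaffolds(markers))
```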
</sec>
<sec id="Sec10"><title>Module 3: scaffold fusions/junctions</title>
<p>This module used LPR to identify scaffolds that should be inserted into larger ones (hereafter referred to as fusions) and scaffolds that should be joined end to end (hereafter referred to as junctions). LPR were aligned to the scaffolds using bowtie2 in --very-sensitive mode, and only LPR with a single hit were kept. Redundant LPR were removed using the MarkDuplicates tool of Picard (<ext-link ext-link-type="uri" xlink:href="http://broadinstitute.github.io/picard/">http://broadinstitute.github.io/picard/</ext-link>
). The filtered LPR were then used to identify clusters of discordant reads pointing to potential scaffold fusions and junctions. These candidates were validated manually by inspecting Circos [<xref ref-type="bibr" rid="CR42">42</xref>
] plots showing the positions of paired reads in these regions. The fusions and junctions performed were then checked by realigning LPR to the corrected scaffolds with bowtie2 (in --very-sensitive mode) and inspecting the mapped reads to ensure that each newly created junction was spanned by reads mapped in the correct orientation. The complete process and tools used for this module are described in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
.</p>
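<p>Discordant read pairs are defined by their orientation and insert size, following the categories used in the Circos plots of Figs. 2 and 3. A minimal Python sketch of such a classification for a pair mapped on a single scaffold is shown below. The insert-size window (3 to 7 kb for a 5 kb library), the expected library orientation and the approximation of the insert size by the distance between mate coordinates are assumptions; clustering of discordant pairs between scaffolds is not shown.</p>

```python
def classify_pair(pos1, strand1, pos2, strand2, expected="FR",
                  min_insert=3000, max_insert=7000):
    """Classify one uniquely mapped read pair on a single scaffold.

    pos1/pos2       -- leftmost alignment coordinates of the two mates
    strand1/strand2 -- '+' or '-'
    expected        -- library orientation ('FR' or 'RF'), to be set per library
    The insert size is approximated by the distance between the two coordinates.
    """
    if pos2 >= pos1:
        left, right = (pos1, strand1), (pos2, strand2)
    else:
        left, right = (pos2, strand2), (pos1, strand1)
    orientation = ("F" if left[1] == "+" else "R") + ("F" if right[1] == "+" else "R")
    if orientation == "FF":
        return "forward-forward"
    if orientation == "RR":
        return "reverse-reverse"
    if orientation != expected:
        return "reversed"  # fully reversed relative to the library construction
    insert = right[0] - left[0]
    if insert > max_insert:
        return "discordant-larger"
    if min_insert > insert:
        return "discordant-smaller"
    return "concordant"


print(classify_pair(100, "+", 5100, "-"))
```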
</sec>
<sec id="Sec11"><title>Module 4: scaffold gap re-estimation</title>
<p>In this module, the sizes of all remaining gaps (regions composed of Ns) were re-estimated using all paired reads (i.e. LPR, BAC-end sequences and fosmid paired reads). Paired reads were aligned to the scaffolds using bowtie2 in --very-sensitive mode for Illumina reads and BWA with the <italic>mem</italic>
 algorithm for Sanger reads. For each paired-read library, gaps were resized so that the correctly oriented read pairs spanning a gap have an insert size matching the expected median insert size of the library. At least 30 pairs were required to re-estimate a gap with the 5 kb Illumina mate-pair library, whereas at least 2 and 1 pairs were required with the 10 kb fosmid and BAC-end Sanger libraries, respectively. The complete process and tools used for this module are described in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
.</p>
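<p>One way to implement this re-estimation rule is to shift each gap by the difference between the expected median insert size and the median insert size observed across the spanning pairs. The Python sketch below assumes exactly that; the function name, the use of the median of observed inserts and the one-base floor (min_gap) are our assumptions rather than details stated for the Scaffremodler tools.</p>

```python
from statistics import median


def reestimate_gap(current_gap, observed_inserts, expected_median,
                   min_pairs, min_gap=1):
    """Re-estimate the length of one gap from the read pairs spanning it.

    current_gap      -- current number of Ns in the gap
    observed_inserts -- insert sizes of correctly oriented pairs spanning the gap,
                        measured on the current assembly (so they include current_gap)
    expected_median  -- expected median insert size of the library
    min_pairs        -- minimum number of spanning pairs (e.g. 30, 2 or 1 per library)
    min_gap          -- hypothetical floor so that a gap is never removed entirely
    """
    if min_pairs > len(observed_inserts):
        return current_gap  # not enough evidence: keep the current estimate
    shift = expected_median - median(observed_inserts)
    return max(min_gap, round(current_gap + shift))


print(reestimate_gap(1000, [4400, 4500, 4600] * 10, 5000, 30))
```

<p>For example, if 30 correctly oriented 5 kb mate pairs span a 1,000 N gap with a median observed insert of 4,500 bp, the gap is enlarged to 1,500 N; with fewer than 30 pairs the gap is left unchanged.</p>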
</sec>
<sec id="Sec12"><title>Module 5: super scaffold construction</title>
<p>This module exploited the genome map to arrange scaffolds into super scaffolds. First, the assembly FASTA file was converted into the BioNano Irys map format with Knickers (<ext-link ext-link-type="uri" xlink:href="http://www.bnxinstall.com/knickers/Knickers.htm">http://www.bnxinstall.com/knickers/Knickers.htm</ext-link>
), which performs an <italic>in silico</italic> digest of the sequence assembly with the Nt.BspQI nicking endonuclease. Only scaffolds larger than 20 kb with more than five sites were used, representing 613 scaffolds with a cumulative size of 437 Mb. Then, using BioNano’s proprietary alignment tool RefAligner [<xref ref-type="bibr" rid="CR43">43</xref>
, <xref ref-type="bibr" rid="CR44">44</xref>
], the sequence maps were compared with the Irys genome maps to find their best alignments; only sequence maps with more than five labels (i.e. Nt.BspQI nicking sites) were used for this comparison. Sequence-Irys map pairs with significant discordance, defined as more than five consecutive unaligned labels on both the sequence map and the Irys map, were flagged and removed; such pairs may represent chimeric assemblies due to sequencing errors or allelic differences. The filtered sequence maps and Irys maps were then merged with RefAligner using a <italic>p</italic>
-value threshold of 10<sup>−10</sup>
 based on [<xref ref-type="bibr" rid="CR45">45</xref>
] to create super scaffolds. The merging was iterative, with the merge order based on map similarity, and stopped when all possible pairs had been merged. A tab-delimited file locating each scaffold sequence in the merged maps was then used to group scaffolds into super scaffolds, with the original scaffolds separated by runs of Ns whose length corresponded to their expected distance in the physical map.</p>
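<p>The <italic>in silico</italic> digest and the scaffold filter can be illustrated with the Python sketch below. It is not Knickers or RefAligner, which are BioNano tools: it simply locates the Nt.BspQI recognition sequence GCTCTTC on both strands (reporting the start of the motif rather than the exact nick position) and keeps scaffolds longer than 20 kb with more than five sites. The function names are hypothetical.</p>

```python
import re

SITE = "GCTCTTC"     # Nt.BspQI recognition sequence, top strand
SITE_RC = "GAAGAGC"  # reverse complement: the same site on the bottom strand
_SITE_RE = re.compile(f"(?=({SITE}|{SITE_RC}))")


def label_positions(seq):
    """1-based start positions of Nt.BspQI sites on either strand of seq."""
    return [m.start() + 1 for m in _SITE_RE.finditer(seq.upper())]


def select_sequence_maps(scaffolds, min_length=20_000, min_labels=5):
    """Keep scaffolds longer than min_length bp carrying more than min_labels sites.

    scaffolds -- dict mapping scaffold name to sequence.
    Returns {name: label positions} for the retained scaffolds.
    """
    maps = {}
    for name, seq in scaffolds.items():
        labels = label_positions(seq)
        if len(seq) > min_length and len(labels) > min_labels:
            maps[name] = labels
    return maps


scaffolds = {
    "s1": ("GCTCTTC" + "A" * 4000) * 6,  # 24,042 bp, 6 sites: kept
    "s2": "GCTCTTC" * 10,                # too short
    "s3": "A" * 30000 + "GCTCTTC" * 2,   # too few sites
}
print(list(select_sequence_maps(scaffolds)))
```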
</sec>
<sec id="Sec13"><title>Module 6: scaffold gap closure</title>
<p>This module exploited short-insert paired reads (330 bp paired-end Illumina) to close gaps in scaffolds using the GapCloser v1.12 program [<xref ref-type="bibr" rid="CR24">24</xref>
]. At the end of this module, all scaffolds were renamed according to their length.</p>
</sec>
<sec id="Sec14"><title>Module 7: scaffold anchoring</title>
<p>This module used genetic markers from a mapping population to group, order and assemble scaffolds into pseudo-molecules. Our approach avoided constructing a genetic map and subsequently reconciling it with the scaffolds. Instead, markers already ordered by their position on each scaffold were treated as blocks: these blocks were first ordered relative to each other using a UPGMA-like approach, and this initial order was then improved by permutation testing. The process can be decomposed into four steps:<list list-type="order"><list-item><p>Marker location on scaffolds using a consensus of <italic>BWA</italic>
, <italic>bowtie2</italic>
 and <italic>BLAST</italic>,
</p>
</list-item>
<list-item><p>Pairwise linkage LOD calculation between markers using JoinMap4.1,</p>
</list-item>
<list-item><p>Calculation of an initial order using a UPGMA-like approach on the mean pairwise linkage LOD between scaffolds,</p>
</list-item>
<list-item><p>Optimization of scaffold order and orientation through scaffold permutations and re-orientations that maximize the following score:</p>
</list-item>
</list>
<disp-formula id="Equa"><alternatives><tex-math id="M1">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$ \mathrm{score}=\sum_{\substack{i,j=1\\ x_j>x_i}}^{n}\left(1-\frac{x_j-x_i}{n}\right)\mathrm{LOD}_{ij} $$\end{document}</tex-math>
<mml:math id="M2"><mml:mi mathvariant="normal">score</mml:mi>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true"><mml:munderover><mml:mo stretchy="true">∑</mml:mo>
<mml:mrow><mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:msub><mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>></mml:mo>
<mml:msub><mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mi>n</mml:mi>
</mml:munderover>
<mml:mrow><mml:mfenced close=")" open="("><mml:mrow><mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mfrac><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>x</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub><mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="normal">n</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mi mathvariant="normal">L</mml:mi>
<mml:mi mathvariant="normal">O</mml:mi>
<mml:msub><mml:mi mathvariant="normal">D</mml:mi>
<mml:mrow><mml:mi mathvariant="normal">i</mml:mi>
<mml:mi mathvariant="normal">j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:math>
<graphic xlink:href="12864_2016_2579_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where n is the number of markers in the linkage group (LG) to be ordered, x<sub>i</sub> and x<sub>j</sub> are the positions of markers i and j in the tested order, and LOD<sub>ij</sub> is the LOD score between markers i and j. To reduce computation time, and because marker order is not tested within scaffolds, i and j are always markers from different scaffolds. Scaffold sequences were then assembled into pseudo-molecules: in addition to a FASTA file containing the ordered scaffold sequences separated by 100 Ns, an AGP file locating the scaffolds in the pseudo-molecules was generated. The complete process and tools used for this module are described in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
.</p>
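<p>The score above and the permutation step can be sketched in Python as follows. This is a simplified illustration, not the Scaffhunter implementation: the UPGMA-like initial ordering is not reproduced, scaffold permutations are explored by a simple hill-climbing search over pairwise swaps and orientation flips (our assumption), and the LOD lookup table is a hypothetical data structure.</p>

```python
from itertools import combinations


def order_score(blocks, lod):
    """Score of one tested order, following the formula above.

    blocks -- scaffolds in tested order/orientation; each scaffold is a list of
              marker ids already ordered by their position on it.
    lod    -- dict mapping frozenset({m1, m2}) to the pairwise LOD of two markers.
    Only pairs of markers on different scaffolds contribute.
    """
    placed = [(marker, b) for b, block in enumerate(blocks) for marker in block]
    n = len(placed)
    score = 0.0
    # combinations() keeps the original order, so x_i is always smaller than x_j
    for (x_i, (m_i, b_i)), (x_j, (m_j, b_j)) in combinations(enumerate(placed, 1), 2):
        if b_i != b_j:
            score += (1 - (x_j - x_i) / n) * lod.get(frozenset((m_i, m_j)), 0.0)
    return score


def optimise_order(blocks, lod):
    """Hill climbing over scaffold swaps and flips; stops at a local optimum."""
    current = [list(b) for b in blocks]
    best = order_score(current, lod)
    improved = True
    while improved:
        improved = False
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            swapped = [list(b) for b in current]
            swapped[i], swapped[j] = swapped[j], swapped[i]
            candidates.append(swapped)
        for i in range(len(current)):
            flipped = [list(b) for b in current]
            flipped[i].reverse()
            candidates.append(flipped)
        for candidate in candidates:
            score = order_score(candidate, lod)
            if score > best + 1e-12:  # accept the first strict improvement
                current, best, improved = candidate, score, True
                break
    return current, best


lod = {frozenset(("a2", "b1")): 10.0, frozenset(("b1", "c1")): 10.0,
       frozenset(("a1", "b1")): 5.0, frozenset(("a2", "c1")): 3.0,
       frozenset(("a1", "c1")): 1.0}
order, score = optimise_order([["c1"], ["a1", "a2"], ["b1"]], lod)
print(order, score)
```

<p>Because reversing the whole order leaves the score unchanged, only the relative order and orientation of scaffolds are meaningful. A hill-climbing search may also stop at a local optimum, which is one reason a good initial order is useful.</p>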
</sec>
<sec id="Sec15"><title>Module 8: annotation transposition</title>
<p>This module transposed annotations from the first draft genome sequence to the new assembly. Gene annotations (provided as FASTA files of putative transcripts) were transferred to the new assembly using the Exonerate software [<xref ref-type="bibr" rid="CR46">46</xref>
] with the cdna2genome model and a maximum allowed intron size of 30 kb; Exonerate performed the genomic search and the spliced alignment in a single run. Using a custom Perl script that parsed the Exonerate output, we transferred the annotation to new GFF3 files and generated a table of sequence identifier equivalences between the two releases. Because discrepancies may occur, the script performed quality checks by comparing protein-coding sequences before and after the transfer; when they differed, it used BLASTP to align the genes exon by exon. Since two annotations were available (the annotation by [<xref ref-type="bibr" rid="CR27">27</xref>
] and the one by NCBI), both were transposed. A consensus annotation was then generated with a custom script that, for genes spanning the same genomic coordinates in both annotations, selected one of the two versions based on tags in the GFF3 files, using the intersect function of BEDTools [<xref ref-type="bibr" rid="CR47">47</xref>
].</p>
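<p>The consensus step can be illustrated with a pure-Python stand-in for the BEDTools intersect operation. In this sketch, the gene tuples, the tag name 'curated' and the rule deciding which version to keep are hypothetical: the actual GFF3 tags and selection rules used by the custom script are not specified here.</p>

```python
def overlaps(g, h):
    """True if two genes (seqid, start, end, ...) share at least one base."""
    return g[0] == h[0] and min(g[2], h[2]) >= max(g[1], h[1])


def consensus(genes_a, genes_b, preferred_tag="curated"):
    """Merge two annotations, keeping one version per overlapping locus.

    Genes are tuples (seqid, start, end, gene_id, tags), with tags a set of strings.
    For overlapping genes, annotation A wins unless an overlapping B gene carries
    preferred_tag and the A gene does not (hypothetical selection rule).
    Genes found in only one annotation are kept as they are.
    """
    kept, used_b = [], set()
    for a in genes_a:
        hits = [b for b in genes_b if overlaps(a, b)]
        a_tagged = preferred_tag in a[4]
        b_tagged = any(preferred_tag in b[4] for b in hits)
        if b_tagged and not a_tagged:
            kept.extend(b for b in hits if b[3] not in used_b)
        else:
            kept.append(a)
        used_b.update(b[3] for b in hits)
    kept.extend(b for b in genes_b if b[3] not in used_b)
    return sorted(kept, key=lambda g: (g[0], g[1]))


genes_a = [("chr01", 100, 500, "A1", set()), ("chr01", 1000, 2000, "A2", set())]
genes_b = [("chr01", 150, 480, "B1", {"curated"}),
           ("chr01", 1100, 1900, "B2", set()),
           ("chr02", 10, 90, "B3", set())]
print([g[3] for g in consensus(genes_a, genes_b)])
```

<p>The quadratic overlap search is only suitable for small examples; for whole-genome annotations, a sorted sweep or BEDTools intersect is the appropriate tool.</p>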
</sec>
</sec>
</sec>
<sec id="Sec16"><title>Results</title>
<p>The original banana, <italic>Musa acuminata</italic>
, draft reference genome assembly [<xref ref-type="bibr" rid="CR27">27</xref>
] was improved using the approach, tools and datasets summarized in Fig. <xref rid="Fig1" ref-type="fig">1</xref>. The improvement was made in eight successive steps.</p>
<sec id="Sec17"><title>Contig scaffolding</title>
<p>The original 24,425 contigs published in the first version of the <italic>Musa acuminata</italic>
 reference genome [<xref ref-type="bibr" rid="CR27">27</xref>
] were re-assembled into scaffolds using the paired-end data already used for the original version of the assembly (Sanger 10 kb fosmid paired reads and Sanger BAC-end reads) together with the new 5 kb Illumina mate-pair sequences (40x coverage). The contigs were assembled into 2,267 scaffolds with a cumulative size of 439 Mb, representing 84 % of the estimated size (523 Mb) of the DH-Pahang genome (Table <xref rid="Tab1" ref-type="table">1</xref>
). Ninety percent of the assembly was in 416 scaffolds and the N50 was 1.55 Mb. Gaps (regions composed of one or more Ns) in scaffolds represented 48.3 Mb, accounting for 11 % of the assembly.<table-wrap id="Tab1"><label>Table 1</label>
<caption><p>Statistics on scaffold assemblies</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th></th>
<th>V1 (D'Hont et al. 2012)</th>
<th>SSPACE</th>
<th>Fusion/joining/splitting/gap re-estimation</th>
<th>IRYS scaffold</th>
<th>GapCloser</th>
</tr>
</thead>
<tbody><tr><td>Scaffold number</td>
<td>7 513</td>
<td>2 267</td>
<td>1 572</td>
<td>1 532</td>
<td>1 532</td>
</tr>
<tr><td>Cumulative size (bp)</td>
<td>472 210 317</td>
<td>438 736 528</td>
<td>443 852 100</td>
<td>450 994 104</td>
<td>450 697 673</td>
</tr>
<tr><td>Unknown sites, bp (%)</td>
<td>81 728 542 (17.3)</td>
<td>48 267 272 (11.0)</td>
<td>53 378 493 (12.3)</td>
<td>60 520 497 (13.4)</td>
<td>45 175 659 (10.0)</td>
</tr>
<tr><td>N50, bp (scaffold number)</td>
<td>1 311 088 (65)</td>
<td>1 545 585 (52)</td>
<td>2 890 075 (28)</td>
<td>3 014 384 (26)</td>
<td>3 016 874 (26)</td>
</tr>
<tr><td>N80, bp (scaffold number)</td>
<td>316 579 (299)</td>
<td>370 770 (242)</td>
<td>491 628 (169)</td>
<td>578 880 (150)</td>
<td>579 793 (150)</td>
</tr>
<tr><td>N90, bp (scaffold number)</td>
<td>54 335 (647)</td>
<td>169 980 (416)</td>
<td>201 127 (305)</td>
<td>234 686 (268)</td>
<td>234 825 (267)</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
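<p>The N50, N80 and N90 values in Table 1 are assumed here to follow the usual definition: scaffolds are sorted by decreasing length, Nx is the length of the scaffold at which the cumulative length first reaches x % of the assembly, and the number in parentheses is the count of scaffolds needed to reach it. A minimal Python sketch, with a hypothetical function name:</p>

```python
def nx(lengths, fraction=0.5):
    """Return (Nx length, number of scaffolds) for the given fraction of the assembly.

    fraction=0.5 gives the N50 and the scaffold count reported in parentheses;
    fraction=0.9 gives the N90 ("90 % of the assembly was in k scaffolds").
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, 1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("fraction must not exceed 1")


lengths = [50, 100, 45, 80, 25]  # toy scaffold lengths
print(nx(lengths), nx(lengths, 0.9))
```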
</sec>
<sec id="Sec18"><title>Scaffold correction</title>
<p>First, we looked for misassembled scaffolds. A total of 33 scaffolds were identified as containing markers from different linkage groups, and thus as potentially misassembled. The misassemblies were confirmed by the presence of discordant 5 kb LPR in the corresponding regions. The 36 misassembled regions identified in these 33 scaffolds were then split, yielding a total of 2,303 scaffolds. Figure <xref rid="Fig2" ref-type="fig">2</xref>
 shows an example of a misassembled scaffold. Most of the misassembled regions (24/36) resulted from scaffolding errors, potentially due to chimeric paired reads or read misalignment. The remaining misassembled regions (12/36) resulted from contig assembly errors.<fig id="Fig2"><label>Fig. 2</label>
<caption><p>Example of evidence leading to scaffold splitting. <bold>a</bold>
 Genetic markers mapped onto scaffold21 belong to linkage group 7 (<italic>red</italic>) and linkage group 6 (<italic>blue</italic>), suggesting a chimeric misassembly. <bold>b</bold>
 Circos representation of paired-read mapping in the misassembled region, drawn using <italic>Scaffremodler</italic>’s tools. In the inner circle, links between read pairs are drawn with the following color code: <italic>grey lines</italic>
 correspond to concordant pairs (correct orientation and insert size); <italic>orange</italic>
 and <italic>red lines</italic>
 correspond to discordant pairs with smaller and larger insert sizes, respectively; <italic>purple lines</italic>
 correspond to pairs in reverse-reverse orientation, <italic>green lines</italic>
 to pairs in forward-forward orientation, and <italic>blue lines</italic>
 to pairs whose orientation is fully reversed relative to the library construction. The second circle represents the scaffold in blue, with gaps as black regions. The next circles are scatter plots with a warm-cool color code: the first shows the proportion of discordant reads in windows of one third of the expected read-pair insert size, and the outer circle shows read coverage in 100-base windows. The <italic>black arrow</italic>
 points to the misassembled region in scaffold21, which joined two regions that are not linked</p>
</caption>
<graphic xlink:href="12864_2016_2579_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p>Second, we looked for potential scaffold fusions and junctions. Based on the analysis of discordant paired reads from the 5 kb LPR with our semi-automated tools, we performed 438 scaffold fusions and 293 scaffold junctions, reducing the number of scaffolds from 2,303 to 1,572. Figure <xref rid="Fig3" ref-type="fig">3</xref>
 shows an example of the evidence that led to the fusion of scaffold1112 into scaffold24. Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1 shows the mapping of reads on the two borders of scaffold1112 after its fusion into scaffold24. Both the left and right borders were spanned by reads in the correct orientation (Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1, A and B).<fig id="Fig3"><label>Fig. 3</label>
<caption><p>Example of evidence leading to scaffold fusion. <bold>a</bold>
 Graphical representation of the paired reads that led to the identification of the fusion of scaffold1112 into scaffold24, drawn using <italic>Scaffremodler</italic>’s tools. In the inner circle, links between read pairs are drawn with the color code described in Fig. <xref rid="Fig2" ref-type="fig">2</xref>: <italic>grey</italic>
 for concordant pairs; <italic>red</italic> and <italic>orange</italic> for pairs discordant in insert size; <italic>purple</italic>, <italic>green</italic>
 and <italic>blue</italic>
 for pairs discordant in orientation. The second circle represents the scaffold in blue, with gaps as black regions. The third circle represents the proportion of discordant reads and the outer circle represents read coverage, as in Fig. <xref rid="Fig2" ref-type="fig">2</xref>. <italic>Red</italic>
 and <italic>blue</italic>
 beams linking scaffold1112 and scaffold24 identified the scaffold fusion schematized in (<bold>b</bold>). Inserting scaffold1112 into scaffold24 resolves the discordant red links and corrects the orientation of the discordant blue links</p>
</caption>
<graphic xlink:href="12864_2016_2579_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
<p>At this stage, the sizes of gaps (regions composed of Ns) within the 1,572 new scaffolds were re-estimated using the paired-read libraries sequentially, resulting in a total gap length of 53 Mb, or 12.3 % of the assembly (Table <xref rid="Tab1" ref-type="table">1</xref>
). After gap re-estimation, the cumulative size of the 1,572 scaffolds was 444 Mb. Ninety percent of the assembly was in 305 scaffolds and the N50 was 2.9 Mb.</p>
<p>Finally, the BioNano Irys genome map of DH-Pahang was used to order and orient scaffolds into super scaffolds. This step merged 72 scaffolds into 40 super scaffolds, and a total of 7.1 Mb of gap regions was added during super scaffold construction (Table <xref rid="Tab1" ref-type="table">1</xref>
). After this step, 90 % of the assembly was in 268 scaffolds and the N50 was 3.0 Mb (26 scaffolds). Gaps in scaffolds represented 60.5 Mb, or 13.4 % of the assembly.</p>
</sec>
<sec id="Sec19"><title>Gap closure</title>
<p>Gaps within the 1,532 scaffolds were then filled where possible with the GapCloser program, using the 330 bp paired-end Illumina sequencing libraries (50x) that had been generated to correct the first version of the banana <italic>Musa acuminata</italic>
 reference genome. Of the 27,691 gap regions, 9,838 were closed.</p>
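<p>Gap regions (runs of one or more Ns) and unknown sites, as counted above and in Table 1, can be tallied with a short Python sketch such as the following. Treating lowercase n as unknown and the function name gap_stats are assumptions.</p>

```python
import re

GAP_RUN = re.compile(r"[Nn]+")


def gap_stats(scaffolds):
    """Count gap regions (runs of one or more N) and unknown sites.

    scaffolds -- dict mapping scaffold name to sequence.
    Returns (number of gap regions, number of N sites, % of the assembly that is N).
    """
    regions = sites = total = 0
    for seq in scaffolds.values():
        runs = GAP_RUN.findall(seq)
        regions += len(runs)
        sites += sum(len(run) for run in runs)
        total += len(seq)
    percent = 100 * sites / total if total else 0.0
    return regions, sites, percent


print(gap_stats({"s1": "ACGTNNNNACGT", "s2": "NNACGTNACG"}))
```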
</sec>
<sec id="Sec20"><title>Final assembly</title>
<p>The final assembly (Table <xref rid="Tab1" ref-type="table">1</xref>
) consisted of 1,532 scaffolds with a cumulative size of 450.7 Mb, corresponding to 86 % of the estimated size of the DH-Pahang genome. Ninety percent of the assembly was in 267 scaffolds and the N50 was 3.0 Mb. Gaps in scaffolds represented only 45.2 Mb (10.0 % of the assembly). Twelve of these scaffolds (cumulative size of 7.2 Mb) were identified as mitochondrial DNA by BLAST searches (blastn, e-value threshold 10<sup>−100</sup>
) with the mitochondrial coding sequences of <italic>Phoenix dactylifera</italic>
 (NC_016740). The twelve mitochondrial scaffolds were removed from the final nuclear assembly.</p>
<p>To validate the improvements, we calculated the proportion of 5 kb mate pairs mapping discordantly (i.e. with a wrong insert size and/or orientation) on each assembly version. Of the 82.9 million non-redundant, uniquely mapped pairs, 16.3 million (19.7 %) mapped discordantly on the first version and 12.3 million (14.8 %) on the new assembly before gap closure. After gap closure, 9.6 million of the 80.8 million non-redundant, uniquely mapped pairs (11.9 %) mapped discordantly.</p>
</sec>
<sec id="Sec21"><title>Musa scaffold anchoring</title>
<p>Genetic markers were then used to assemble scaffolds into pseudo-molecules. Of the 23,430 selected genetic markers, the 21,851 that mapped to a unique position were grouped into 11 linkage groups. A total of 248 markers were discarded because they created local discrepancies in scaffolds that were clearly assigned to a linkage group by the majority of their markers. Markers located on small scaffolds for which no majority linkage group could be found were also discarded. The remaining 21,603 markers were used to order and orient 376 scaffolds into the 11 pseudo-molecules (Fig. <xref rid="Fig4" ref-type="fig">4</xref>
), with an average of 5.44 markers per 100 kb (Table <xref rid="Tab2" ref-type="table">2</xref>
).
<fig id="Fig4"><label>Fig. 4</label>
<caption><p>Representation of the new version of the eleven pseudo-molecules of <italic>Musa acuminata</italic>. <italic>Black</italic>
 and <italic>white boxes</italic>
 correspond to oriented and unoriented scaffolds, respectively. Genetic marker, gene and unknown sequence (‘N’) densities are shown in <italic>grey</italic>, <italic>blue</italic>
 and <italic>green</italic>, respectively, in 100 kb windows. The recombination rate (<italic>red curve</italic>) was calculated from the corrected genetic markers genotyped on 180 individuals, using a 500 kb sliding window</p>
</caption>
<graphic xlink:href="12864_2016_2579_Fig4_HTML" id="MO4"></graphic>
</fig>
<table-wrap id="Tab2"><label>Table 2</label>
<caption><p>Statistics on marker density on linkage groups</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th>Linkage group</th>
<th>Cumulative scaffold size (bp)</th>
<th>Marker number</th>
<th>Marker density (number/100 kb)</th>
</tr>
</thead>
<tbody><tr><td>chr01</td>
<td>29 067 552</td>
<td>1 384</td>
<td char="." align="char">4.76</td>
</tr>
<tr><td>chr02</td>
<td>29 509 134</td>
<td>1 502</td>
<td char="." align="char">5.09</td>
</tr>
<tr><td>chr03</td>
<td>35 017 413</td>
<td>1 920</td>
<td char="." align="char">5.48</td>
</tr>
<tr><td>chr04</td>
<td>37 104 143</td>
<td>2 489</td>
<td char="." align="char">6.71</td>
</tr>
<tr><td>chr05</td>
<td>41 848 132</td>
<td>1 924</td>
<td char="." align="char">4.60</td>
</tr>
<tr><td>chr06</td>
<td>37 589 864</td>
<td>2 234</td>
<td char="." align="char">5.94</td>
</tr>
<tr><td>chr07</td>
<td>35 025 021</td>
<td>1 744</td>
<td char="." align="char">4.98</td>
</tr>
<tr><td>chr08</td>
<td>44 883 571</td>
<td>2 728</td>
<td char="." align="char">6.08</td>
</tr>
<tr><td>chr09</td>
<td>41 302 925</td>
<td>2 136</td>
<td char="." align="char">5.17</td>
</tr>
<tr><td>chr10</td>
<td>37 671 811</td>
<td>2 023</td>
<td char="." align="char">5.37</td>
</tr>
<tr><td>chr11</td>
<td>27 952 850</td>
<td>1 519</td>
<td char="." align="char">5.43</td>
</tr>
<tr><td>Total</td>
<td>396 972 416</td>
<td>21 603</td>
<td char="." align="char">5.44</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
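<p>The marker densities in Table 2 are the marker number divided by the cumulative scaffold size expressed in units of 100 kb. The Python sketch below reproduces this arithmetic for a few rows copied from the table; the function name is hypothetical.</p>

```python
# Rows copied from Table 2: (linkage group, cumulative scaffold size in bp, marker number)
TABLE2 = [
    ("chr01", 29_067_552, 1_384),
    ("chr04", 37_104_143, 2_489),
    ("chr05", 41_848_132, 1_924),
    ("Total", 396_972_416, 21_603),
]


def marker_density(n_markers, size_bp):
    """Markers per 100 kb of anchored scaffold sequence."""
    return n_markers / (size_bp / 100_000)


for name, size, markers in TABLE2:
    print(name, round(marker_density(markers, size), 2))
```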
<p>Finally, a total of 397 Mb of genome sequence was anchored, representing 89.5 % of the nuclear genome assembly (versus 70 % in version 1) and including all scaffolds larger than 1 Mb. Each pseudo-molecule comprised 16 to 57 scaffolds, and the scaffold N50 of the pseudo-molecules ranged from 1.4 to 9.9 Mb. The mean proportion of Ns (gaps) ranged from 5.6 to 12.9 % in pseudo-molecules and was 25.1 % in unanchored scaffolds (Table <xref rid="Tab3" ref-type="table">3</xref>
). Marker linkage in ordered scaffolds can be visualized for each chromosome in Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2.<table-wrap id="Tab3"><label>Table 3</label>
<caption><p>Comparison of <italic>Musa acuminata</italic>
 pseudo-molecule assembly statistics between the first and the new versions</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th></th>
<th colspan="6">Version 1</th>
<th colspan="6">Version 2</th>
</tr>
<tr><th>Identifier</th>
<th>Cumulative scaffold size (bp)</th>
<th>Nb<sup>a</sup>
</th>
<th>Scaffold N50 (bp)</th>
<th>Nb<sup>a</sup>
</th>
<th>N in scaffolds (bp)</th>
<th>% N</th>
<th>Cumulative scaffold size (bp)</th>
<th>Nb<sup>a</sup>
</th>
<th>Scaffold N50 (bp)</th>
<th>Nb<sup>a</sup>
</th>
<th>N in scaffolds (bp)</th>
<th>% N</th>
</tr>
</thead>
<tbody><tr><td>chr01</td>
<td>27 571 529</td>
<td>22</td>
<td>2 245 470</td>
<td>4</td>
<td>3 459 727</td>
<td char="." align="char">12.5</td>
<td>29 067 552</td>
<td>30</td>
<td>1 394 891</td>
<td>2</td>
<td>2 151 480</td>
<td char="." align="char">7.4</td>
</tr>
<tr><td>chr02</td>
<td>22 052 597</td>
<td>22</td>
<td>1 755 924</td>
<td>3</td>
<td>2 961 122</td>
<td char="." align="char">13.4</td>
<td>29 509 134</td>
<td>27</td>
<td>2 676 329</td>
<td>3</td>
<td>3 555 070</td>
<td char="." align="char">12.0</td>
</tr>
<tr><td>chr03</td>
<td>30 468 307</td>
<td>22</td>
<td>3 785 391</td>
<td>3</td>
<td>3 981 002</td>
<td char="." align="char">13.1</td>
<td>35 017 413</td>
<td>31</td>
<td>9 733 574</td>
<td>2</td>
<td>2 329 119</td>
<td char="." align="char">6.7</td>
</tr>
<tr><td>chr04</td>
<td>30 050 316</td>
<td>13</td>
<td>8 856 836</td>
<td>2</td>
<td>3 343 441</td>
<td char="." align="char">11.1</td>
<td>37 104 143</td>
<td>17</td>
<td>7 838 899</td>
<td>3</td>
<td>2 076 824</td>
<td char="." align="char">5.6</td>
</tr>
<tr><td>chr05</td>
<td>29 375 369</td>
<td>21</td>
<td>2 773 165</td>
<td>4</td>
<td>3 488 635</td>
<td char="." align="char">11.9</td>
<td>41 848 132</td>
<td>52</td>
<td>2 239 696</td>
<td>5</td>
<td>3 976 084</td>
<td char="." align="char">9.5</td>
</tr>
<tr><td>chr06</td>
<td>34 896 279</td>
<td>30</td>
<td>7 330 853</td>
<td>2</td>
<td>4 472 335</td>
<td char="." align="char">12.8</td>
<td>37 589 864</td>
<td>36</td>
<td>9 841 105</td>
<td>2</td>
<td>2 328 163</td>
<td char="." align="char">6.2</td>
</tr>
<tr><td>chr07</td>
<td>28 615 304</td>
<td>22</td>
<td>5 244 634</td>
<td>3</td>
<td>4 262 894</td>
<td char="." align="char">14.9</td>
<td>35 025 021</td>
<td>31</td>
<td>6 378 715</td>
<td>3</td>
<td>4 518 654</td>
<td char="." align="char">12.9</td>
</tr>
<tr><td>chr08</td>
<td>35 437 139</td>
<td>27</td>
<td>2 556 008</td>
<td>3</td>
<td>5 002 970</td>
<td char="." align="char">14.1</td>
<td>44 883 571</td>
<td>57</td>
<td>9 906 416</td>
<td>2</td>
<td>3 821 170</td>
<td char="." align="char">8.5</td>
</tr>
<tr><td>chr09</td>
<td>34 145 263</td>
<td>37</td>
<td>1 544 587</td>
<td>6</td>
<td>5 397 793</td>
<td char="." align="char">15.8</td>
<td>41 302 925</td>
<td>39</td>
<td>2 119 922</td>
<td>3</td>
<td>3 398 494</td>
<td char="." align="char">8.2</td>
</tr>
<tr><td>chr10</td>
<td>33 662 572</td>
<td>33</td>
<td>1 266 487</td>
<td>5</td>
<td>5 753 963</td>
<td char="." align="char">17.1</td>
<td>37 671 811</td>
<td>31</td>
<td>1 798 308</td>
<td>3</td>
<td>3 318 350</td>
<td char="." align="char">8.8</td>
</tr>
<tr><td>chr11</td>
<td>25 512 624</td>
<td>15</td>
<td>7 530 813</td>
<td>2</td>
<td>2 838 651</td>
<td char="." align="char">11.1</td>
<td>27 952 850</td>
<td>16</td>
<td>7 787 879</td>
<td>2</td>
<td>1 979 175</td>
<td char="." align="char">7.1</td>
</tr>
<tr><td>Mitochondrion</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>7 218 240</td>
<td>12</td>
<td>616 199</td>
<td>4</td>
<td>37 503</td>
<td char="." align="char">0.5</td>
</tr>
</tbody>
</table>
<table-wrap-foot><p><sup>a</sup>
Scaffold number</p>
</table-wrap-foot>
</table-wrap>
</p>
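<p>Table 3 reports, for each pseudo-molecule, a scaffold N50 and the proportion of unknown (N) sites. As a reminder of how such figures are usually derived, the Python sketch below computes N50 (with the associated L50 count) from a list of scaffold lengths, and an N percentage from base counts. The function names are hypothetical and this is not the code used to build Table 3.</p>

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50).

    N50 is the length of the shortest scaffold among the longest scaffolds that
    together cover at least half of the total length; L50 is how many scaffolds
    that takes.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


def n_percent(n_sites: int, total_bp: int) -> float:
    """Percentage of unknown (N) sites, rounded to one decimal as in Table 3."""
    return round(100 * n_sites / total_bp, 1)


def count_n(sequence: str) -> int:
    """Count unknown sites, also accepting soft-masked lowercase 'n'."""
    return sequence.upper().count("N")
```

<p>For example, n50_l50([100, 80, 60, 40, 20]) returns (80, 2), and n_percent(2_151_480, 29_067_552) returns 7.4, the value reported for chr01 in version 2.</p>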
<p>Compared with the first pseudo-molecule assembly, the positions of only a few large regions were corrected, moving them from one pseudo-molecule to another (Fig. <xref rid="Fig5" ref-type="fig">5</xref>
, Additional file <xref rid="MOESM1" ref-type="media">1</xref>
: Figure S3). One major change concerned a region that was previously anchored to chromosome 1 and is now assigned to chromosome 4. These regions of chromosomes 1 and 4 display marked segregation distortion, which created pseudo-linkage [<xref ref-type="bibr" rid="CR27">27</xref>
] and hampered the anchoring of the first draft assembly, itself based on a much lower number of genetic markers. Apart from this large change, many small modifications were made, either anchoring previously unanchored small scaffolds or reordering small scaffolds. Most of these changes concerned peri-centromeric regions.<fig id="Fig5"><label>Fig. 5</label>
<caption><p>Dot plot comparison of gene order between the initial and the new versions of the <italic>Musa acuminata</italic>
 genome sequence assembly. Each <italic>dot</italic>
 represents the position of a gene in the two assembly versions, with the initial assembly on the x axis and the new one on the y axis. Breaks in the diagonal indicate differences in gene order. <italic>Red circles</italic>
 indicate the main differences and <italic>green circles</italic>
 indicate variations resulting from the approximate scaffold order in peri-centromeric regions. For instance, version 2 of the assembly corrects a major error between chromosomes 1 and 4</p>
</caption>
<graphic xlink:href="12864_2016_2579_Fig5_HTML" id="MO5"></graphic>
</fig>
</p>
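<p>The chromosome 1 to chromosome 4 reassignment visible in Fig. 5 appears as an off-diagonal block in the dot plot. A simple way to flag such blocks is to count genes for every (version 1 chromosome, version 2 chromosome) pair, as in the Python sketch below. It assumes that genes have already been matched between versions through a hypothetical version 1 to version 2 identifier table, since the locus tags themselves were renamed. It ignores within-chromosome order changes, such as the peri-centromeric reorderings, which would require a collinearity analysis.</p>

```python
from collections import Counter


def chromosome_transitions(v1: dict[str, str], v2: dict[str, str]) -> Counter:
    """Count genes for each (version-1 chromosome, version-2 chromosome) pair.

    v1 and v2 map a shared gene identifier to its chromosome in each version;
    genes absent from either version are skipped.
    """
    pairs: Counter = Counter()
    for gene, chrom1 in v1.items():
        chrom2 = v2.get(gene)
        if chrom2 is not None:
            pairs[(chrom1, chrom2)] += 1
    return pairs


def moved_blocks(pairs: Counter, min_genes: int = 1) -> dict:
    """Keep off-diagonal pairs (chromosome changed) supported by at least min_genes genes."""
    return {pair: n for pair, n in pairs.items() if pair[0] != pair[1] and n >= min_genes}


if __name__ == "__main__":
    old = {"g1": "chr01", "g2": "chr01", "g3": "chr01", "g4": "chr04"}
    new = {"g1": "chr01", "g2": "chr04", "g3": "chr04", "g4": "chr04"}
    print(moved_blocks(chromosome_transitions(old, new), min_genes=2))
```

<p>The min_genes threshold is a design choice that keeps isolated, possibly mis-mapped genes from being reported as region moves.</p>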
</sec>
<sec id="Sec22"><title>Annotation transfer</title>
<p>Two independent annotations of the initial version of the banana genome assembly were available, and both were transferred to the new assembly. Transcripts from the first published <italic>M. acuminata</italic>
 annotation [<xref ref-type="bibr" rid="CR27">27</xref>
], together with several manually curated gene annotations, were transferred to the new assembly version. Of the 36,550 predicted genes, 36,154 (98.9 %) were transferred (Table <xref rid="Tab4" ref-type="table">4</xref>
). Of these transferred genes, 540 (1.5 %) were located in unanchored scaffolds, compared with 2,927 genes (8 %) in the first version. Ninety-six genes were transferred onto the mitochondrial scaffolds. The same transfer was performed for the NCBI RefSeq genome annotation: 30,674 (99.9 %) of the 30,716 predicted genes were transferred to the new assembly version (Table <xref rid="Tab4" ref-type="table">4</xref>
).<table-wrap id="Tab4"><label>Table 4</label>
<caption><p>Statistics on annotation transfer between the first release of the assembly and the new release</p>
</caption>
<table frame="hsides" rules="groups"><thead><tr><th></th>
<th colspan="3">First release (D'Hont et al. 2012)</th>
<th colspan="4">New release (version 2)</th>
</tr>
<tr><th rowspan="2">Identifier</th>
<th>Pseudo-molecule size (bp)<sup>c</sup>
</th>
<th colspan="2">Number</th>
<th>Pseudo-molecule size (bp)<sup>c</sup>
</th>
<th colspan="3">Number</th>
</tr>
<tr><th></th>
<th>RefSeq<sup>a</sup>
</th>
<th>BGH<sup>b</sup>
</th>
<th></th>
<th>RefSeq<sup>a</sup>
</th>
<th>BGH<sup>b</sup>
</th>
<th>Consensus</th>
</tr>
</thead>
<tbody><tr><td>chr01</td>
<td>27 573 629</td>
<td>2 407</td>
<td>2 836</td>
<td>29 070 452</td>
<td>2 038</td>
<td>2 427</td>
<td>2 372</td>
</tr>
<tr><td>chr02</td>
<td>22 054 697</td>
<td>1 975</td>
<td>2 328</td>
<td>29 511 734</td>
<td>2 172</td>
<td>2 563</td>
<td>2 517</td>
</tr>
<tr><td>chr03</td>
<td>30 470 407</td>
<td>2 796</td>
<td>3 251</td>
<td>35 020 413</td>
<td>2 991</td>
<td>3 443</td>
<td>3 371</td>
</tr>
<tr><td>chr04</td>
<td>30 051 516</td>
<td>2 850</td>
<td>3 368</td>
<td>37 105 743</td>
<td>3 512</td>
<td>4 123</td>
<td>4 018</td>
</tr>
<tr><td>chr05</td>
<td>29 377 369</td>
<td>2 583</td>
<td>2 972</td>
<td>41 853 232</td>
<td>2 824</td>
<td>3 268</td>
<td>3 215</td>
</tr>
<tr><td>chr06</td>
<td>34 899 179</td>
<td>3 165</td>
<td>3 700</td>
<td>37 593 364</td>
<td>3 425</td>
<td>4 003</td>
<td>3 896</td>
</tr>
<tr><td>chr07</td>
<td>28 617 404</td>
<td>2 447</td>
<td>2 764</td>
<td>35 028 021</td>
<td>2 577</td>
<td>2 907</td>
<td>2 918</td>
</tr>
<tr><td>chr08</td>
<td>35 439 739</td>
<td>2 876</td>
<td>3 458</td>
<td>44 889 171</td>
<td>3 034</td>
<td>3 623</td>
<td>3 489</td>
</tr>
<tr><td>chr09</td>
<td>34 148 863</td>
<td>2 602</td>
<td>3 110</td>
<td>41 306 725</td>
<td>2 752</td>
<td>3 318</td>
<td>3 157</td>
</tr>
<tr><td>chr10</td>
<td>33 665 772</td>
<td>2 677</td>
<td>3 157</td>
<td>37 674 811</td>
<td>2 775</td>
<td>3 229</td>
<td>3 155</td>
</tr>
<tr><td>chr11</td>
<td>25 514 024</td>
<td>2 257</td>
<td>2 679</td>
<td>27 954 350</td>
<td>2 205</td>
<td>2 614</td>
<td>2 521</td>
</tr>
<tr><td>chrUn_random</td>
<td>141 147 818</td>
<td>2 081</td>
<td>2 927</td>
<td>46 622 217</td>
<td>344</td>
<td>540</td>
<td>543</td>
</tr>
<tr><td>Mitochondrial</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>7 218 240</td>
<td>25</td>
<td>96</td>
<td>104</td>
</tr>
<tr><td>Total</td>
<td>472 960 417</td>
<td>30 716</td>
<td>36 550</td>
<td>450 848 473</td>
<td>30 674</td>
<td>36 154</td>
<td>35 276</td>
</tr>
</tbody>
</table>
<table-wrap-foot><p><sup>a</sup>
NCBI RefSeq genome annotation released on 7 October 2014 and generated with the NCBI Eukaryotic Genome Annotation Pipeline</p>
<p><sup>b</sup>
Banana Genome Hub (BGH) annotation from [<xref ref-type="bibr" rid="CR27">27</xref>
], plus manually curated genes available in the Banana Genome Hub before 8 December 2014</p>
<p><sup>c</sup>
Including ‘N’ separating scaffolds</p>
</table-wrap-foot>
</table-wrap>
</p>
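<p>The transfer procedure itself is described in Additional file 1 and is not reproduced here. For intuition only, the Python sketch below shows the simplest form of coordinate transfer: a whole version 1 scaffold placed unchanged, possibly reverse-complemented, at a known offset in a version 2 sequence (an AGP-like placement record). The Placement record and the lift function are hypothetical. The sketch is valid only where the sequence between old and new coordinates is unchanged; gap filling and misassembly correction shift internal coordinates and would require alignment-based transfer.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Hypothetical AGP-like record: where a whole version-1 scaffold lies in version 2."""
    new_seq: str    # version-2 pseudo-molecule or scaffold name
    new_start: int  # 1-based position of the placed scaffold's leftmost base
    length: int     # scaffold length in bp (assumed unchanged between versions)
    strand: str     # "+" if kept in its original orientation, "-" if reverse-complemented


def lift(placements: dict[str, Placement], scaffold: str, start: int, end: int,
         strand: str) -> Optional[tuple[str, int, int, str]]:
    """Transfer a 1-based, inclusive feature interval from a version-1 scaffold to version 2."""
    p = placements.get(scaffold)
    if p is None:
        return None  # scaffold not placed: the feature cannot be transferred
    if p.strand == "+":
        return p.new_seq, p.new_start + start - 1, p.new_start + end - 1, strand
    # Reverse-complemented scaffold: base x maps to new_start + (length - x),
    # so the interval is mirrored and the feature strand is flipped.
    flipped = "-" if strand == "+" else "+"
    return p.new_seq, p.new_start + p.length - end, p.new_start + p.length - start, flipped
```

<p>For example, if a 50 kb scaffold is placed reversed starting at position 1,000,001 of chr04, a forward-strand feature at positions 101 to 200 lands at 1,049,801 to 1,049,900 on the reverse strand.</p>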
<p>Based on the analysis of several manually curated genes, the NCBI RefSeq genome annotation proved to be generally of better quality than the first published annotation, in particular because the first annotation over-predicted introns. In addition, the NCBI RefSeq genome annotation integrated RNA-seq data and predicted alternative transcripts. We therefore created a consensus annotation combining all the manually curated genes, the NCBI RefSeq annotation and the genes predicted by the first annotation that were missed by the RefSeq annotation pipeline. These three gene annotations can be visualized as separate tracks using JBrowse in the Banana Genome Hub. Note that because we transferred the existing annotations rather than performing a new annotation, genes that were fragmented by contig mis-joins in the first version remain fragmented in the new annotation, even where the new assembly has corrected the underlying sequence. Finally, the consensus annotation contains 35,276 predicted genes, with 34,629 (98.2 %) located on chromosomes, 543 (1.5 %) located in unanchored scaffolds and 104 (0.3 %) located in identified mitochondrial scaffolds (Table <xref rid="Tab4" ref-type="table">4</xref>
). To avoid confusion, we changed the locus tag nomenclature. For example, GSMUA_Achr5t02570_001 in version 1 became Ma05_t02680.1 in version 2.</p>
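<p>The consensus rule described above (all manually curated genes, then RefSeq genes, then first-annotation genes missed by RefSeq) can be read as a priority merge. The Python sketch below implements one such interpretation, in which a lower-priority gene is kept only if it overlaps no gene from a higher-priority set on the same sequence. The overlap criterion, which ignores strand, and all names are assumptions for illustration, not the rule actually applied to build the consensus annotation.</p>

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    seq: str     # pseudo-molecule or scaffold name
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # e.g. "manual", "refseq" or "bgh"


def overlaps(a: Gene, b: Gene) -> bool:
    """True if the two genes share at least one base on the same sequence (strand ignored)."""
    return a.seq == b.seq and a.end >= b.start and b.end >= a.start


def consensus(manual: list[Gene], refseq: list[Gene], first: list[Gene]) -> list[Gene]:
    """Priority merge: manual curation, then RefSeq, then first-annotation genes.

    Each tier is filtered only against the higher-priority tiers, so overlapping
    genes within one tier (e.g. nested genes) are all retained.
    """
    kept = list(manual)
    for tier in (refseq, first):
        higher = list(kept)  # snapshot of all higher-priority genes
        kept.extend(g for g in tier if not any(overlaps(g, h) for h in higher))
    return kept
```

<p>The pairwise comparison is quadratic and only suitable for illustration; at genome scale an interval index per sequence would be the usual choice.</p>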
</sec>
</sec>
<sec id="Sec23"><title>Discussion</title>
<p>In this work, we substantially improved the initial <italic>Musa</italic>
 nuclear draft genome assembly: the scaffold number was reduced by 80 % (from 7,513 to 1,532), the N50 value more than doubled (from 1.3 to 3.0 Mb), and the proportion of the assembly anchored to the 11 <italic>Musa</italic>
 chromosomes rose from 70 % to 89.5 %, which now contain 98.2 % of the genes. The 40 % decrease in the proportion of discordantly mapped 5 kb read pairs between the initial and the new assembly versions supports the quality of the changes that were made.</p>
<p>Adding the 5 kb Illumina mate-pair library to the scaffolding process reduced the scaffold number by 70 % (from 7,513 to 2,267) and raised the N50 from 1.3 Mb to 1.5 Mb. These results highlight the importance of medium insert size libraries during scaffolding. The scaffold fusions and junctions that we then performed further reduced the scaffold number by 30 % (from 2,267 to 1,572) and nearly doubled the N50 value, which illustrates the usefulness of the semi-automated tools we developed. Besides validating the newly established scaffolds, the BioNano Irys genome maps enabled a few additional scaffold junctions. These maps would have had a greater impact had they been available earlier in the process [<xref ref-type="bibr" rid="CR48">48</xref>
]. The gap filling step substantially reduced the proportion of gaps in the final assembly (from 17.3 % in the first assembly version to 10.0 % in the new one). The lower proportion of discordant 5 kb read pairs after gap filling than before it supports the quality of the gap closure step.</p>
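<p>The discordant read-pair proportion used above as a quality indicator can be illustrated with a simplified classifier. In the Python sketch below, a pair counts as concordant when both mates map to the same sequence with the expected relative orientation and an insert size inside a chosen window. The Pair record, the read length, the orientation code and the insert-size window are all assumptions; real analyses take these from the aligner output (for example SAM flags and mapping positions) and from the empirical insert-size distribution of the library.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Simplified alignment of a read pair (hypothetical record, not a SAM parser)."""
    seq1: str
    pos1: int    # leftmost mapped position of mate 1 (1-based)
    rev1: bool   # True if mate 1 maps to the reverse strand
    seq2: str
    pos2: int
    rev2: bool
    read_len: int = 100  # assumed read length


def orientation_and_insert(p: Pair) -> tuple[str, int]:
    """Orientation code of the left then right mate ('F' or 'R') and the outer insert size."""
    (lpos, lrev), (rpos, rrev) = sorted([(p.pos1, p.rev1), (p.pos2, p.rev2)])
    code = ("R" if lrev else "F") + ("R" if rrev else "F")
    return code, rpos + p.read_len - lpos


def is_concordant(p: Pair, min_insert: int, max_insert: int, expected: str) -> bool:
    if p.seq1 != p.seq2:
        return False  # mates on different scaffolds or pseudo-molecules
    code, insert = orientation_and_insert(p)
    return code == expected and max_insert >= insert >= min_insert


def discordant_fraction(pairs: list[Pair], min_insert: int, max_insert: int, expected: str) -> float:
    if not pairs:
        return 0.0
    bad = sum(not is_concordant(p, min_insert, max_insert, expected) for p in pairs)
    return bad / len(pairs)
```

<p>The expected orientation is a parameter because it depends on the library construction protocol. A relative decrease between two assembly versions, such as the 40 % reported above, is then (old fraction minus new fraction) divided by the old fraction.</p>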
<p>The cumulative size of the new assembly is 21.5 Mb smaller than that of the first genome assembly [<xref ref-type="bibr" rid="CR27">27</xref>
]. This reduction is mainly due to the insertion of small scaffolds into former gaps of larger scaffolds. The total assembly size, which is lower than expected, can be explained at least in part by difficulties in correctly assembling the repeated fraction of the genome (45S and 5S ribosomal DNA, transposons, retrotransposons and tandem repeats). These repeat-rich sequences are often collapsed into single regions, which reduces the total assembly size [<xref ref-type="bibr" rid="CR5">5</xref>
]. For example, 10.6 Mb of rDNA was found in the unassembled reads of DH-Pahang [<xref ref-type="bibr" rid="CR27">27</xref>
].</p>
<p>Saturating the genetic map with DArTseq markers increased the proportion of the anchored assembly from 70 to 89.5 % and the proportion of anchored genes from 92 to 98.2 %. For scaffold anchoring, the classical approach is to construct a genetic map and then anchor the scaffold assembly onto it to build pseudo-molecules. Genotyping errors, which are frequent in GBS data, can lead to marker mis-ordering in the genetic map and to conflicts between the marker order in the genetic map and in the scaffolds during anchoring. To avoid the tedious step of reconciling the genetic map with the scaffolds, we developed a method that takes advantage of markers already ordered into blocks corresponding to scaffolds. In this context, the impact of genotyping errors is reduced because the markers are already partially ordered. The newly anchored regions are mostly peri-centromeric. However, because the proportion of repeated sequences is high in these regions, the marker density is lower (Fig. <xref rid="Fig4" ref-type="fig">4</xref>
) and the recombination rate is generally very low or even suppressed [<xref ref-type="bibr" rid="CR49">49</xref>
–<xref ref-type="bibr" rid="CR52">52</xref>
]. Consequently, the scaffold order and orientation in these regions remain tentative.</p>
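<p>The benefit of working with markers already ordered within scaffolds can be illustrated with the Python sketch below, which is a simplified illustration and not the published anchoring method. The genotypes of all markers on a scaffold are collapsed into one consensus genotype per individual by majority vote, which masks isolated genotyping errors. Scaffolds are then grouped into linkage groups when the recombination fraction between their consensus genotypes does not exceed a threshold. The sketch assumes two-state genotype codes ('A' or 'B', '-' for missing) as in backcross-type or pseudo-testcross markers, linkage in coupling phase, and an illustrative threshold of 0.25; other population types would need an appropriate recombination estimator.</p>

```python
from collections import Counter
from itertools import combinations


def scaffold_consensus(markers: list[str]) -> str:
    """Collapse the ordered markers of one scaffold into a per-individual majority genotype.

    Each marker is a string with one code per individual: 'A', 'B' or '-' (missing).
    Ties are resolved in favour of the code seen first.
    """
    consensus = []
    for calls in zip(*markers):
        counts = Counter(c for c in calls if c != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


def recombination_fraction(g1: str, g2: str) -> float:
    """Share of individuals genotyped in both with different codes (0.5 if none informative)."""
    informative = [(a, b) for a, b in zip(g1, g2) if a != "-" and b != "-"]
    if not informative:
        return 0.5
    return sum(a != b for a, b in informative) / len(informative)


def linkage_groups(consensus_by_scaffold: dict[str, str], max_rf: float = 0.25) -> list[list[str]]:
    """Single-linkage grouping (union-find) of scaffolds with pairwise rf not above max_rf."""
    parent = {s: s for s in consensus_by_scaffold}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s1, s2 in combinations(consensus_by_scaffold, 2):
        rf = recombination_fraction(consensus_by_scaffold[s1], consensus_by_scaffold[s2])
        if max_rf >= rf:
            parent[find(s1)] = find(s2)
    groups: dict[str, list[str]] = {}
    for s in consensus_by_scaffold:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

<p>For instance, a scaffold carrying the markers AABBA, AABBA and ABBBA yields the consensus AABBA: the single discordant call of the third marker is outvoted, whereas in a marker-level map it could have displaced that marker.</p>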
</sec>
<sec id="Sec24"><title>Conclusion</title>
<p>The substantial improvements made to the banana reference genome sequence should have an important impact on the quality of future genetic and comparative genomic analyses. The bioinformatics methods and tools described in this work can be useful for improving draft genome assemblies in other plant species. The pipeline comprises independent modules adaptable to various data types. It can be used to improve existing assemblies or in combination with existing automated programs during <italic>de novo</italic>
 assembly. The improved version of the <italic>Musa acuminata</italic>
 genome assembly is accessible and can be downloaded in the new version of the Banana Genome Hub at <ext-link ext-link-type="uri" xlink:href="http://banana-genome.cirad.fr/">http://banana-genome.cirad.fr/</ext-link>
 [<xref ref-type="bibr" rid="CR53">53</xref>
]. Command-line versions of the tools are available on GitHub (<ext-link ext-link-type="uri" xlink:href="https://github.com/SouthGreenPlatform">https://github.com/SouthGreenPlatform</ext-link>
). Most of the modules (Modules 2, 3, 4 and 7) are also available on the South Green Galaxy platform in the <italic>Scaffhunter</italic>
 and <italic>Scaffremodler</italic>
 toolboxes (<ext-link ext-link-type="uri" xlink:href="http://galaxy.southgreen.fr/galaxy/">http://galaxy.southgreen.fr/galaxy/</ext-link>
).</p>
<sec id="Sec25"><title>Availability of supporting data</title>
<p>Datasets (contigs, scaffold assembly, pseudo-molecules, marker matrix and raw data of the genome map) are available through the Banana Genome Hub (<ext-link ext-link-type="uri" xlink:href="http://banana-genome.cirad.fr/">http://banana-genome.cirad.fr/</ext-link>
) and the 5 kb library has been deposited in the ENA read archive (ID number: ERP013665).</p>
</sec>
</sec>
</body>
<back><app-group><app id="App1"><sec id="Sec26"><title>Additional file</title>
<p><media position="anchor" xlink:href="12864_2016_2579_MOESM1_ESM.pdf" id="MOESM1"><label>Additional file 1:</label>
<caption><p>Detailed description of tools and processes used to improve the <italic>Musa acuminata</italic>
 reference sequence and additional figures. (PDF 1604 kb)</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<glossary><title>Abbreviations</title>
<def-list><def-item><term>BAC</term>
<def><p>bacterial artificial chromosome</p>
</def>
</def-item>
<def-item><term>GBS</term>
<def><p>genotyping by sequencing</p>
</def>
</def-item>
<def-item><term>HMW</term>
<def><p>high molecular weight</p>
</def>
</def-item>
<def-item><term>LPR</term>
<def><p>large insert size paired reads</p>
</def>
</def-item>
<def-item><term>NGS</term>
<def><p>next generation sequencing</p>
</def>
</def-item>
<def-item><term>TE</term>
<def><p>transposable elements</p>
</def>
</def-item>
<def-item><term>WGD</term>
<def><p>whole genome duplication</p>
</def>
</def-item>
<def-item><term>WGS</term>
<def><p>whole genome shotgun sequencing</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group><fn><p><bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn><p><bold>Authors’ contributions</bold>
</p>
<p>GM, FCB, ADH: Conceived and designed the study and wrote the manuscript. GM: Developed the bioinformatic programs and performed the analysis. AK, AA, JD, AH: Produced the sequencing data and the genome Irys map. GD, MR, AC, AK, JMA, AH, FC: Contributed to the analysis and edited the manuscript. ADH: coordinated the study. All authors read and approved the final manuscript.</p>
</fn>
</fn-group>
<ack><p>The authors thank the Diversity Arrays Technology Pty Ltd for DArTSeq genotyping, Jan Vrána and Hana Šimková for preparation of HMW DNA, and CGIAR Research Program on Roots, Tubers and Bananas (RTB) for financial support for sequencing data acquisition. We also thank the South Green Bioinformatics Platform (<ext-link ext-link-type="uri" xlink:href="http://southgreen.cirad.fr">http://southgreen.cirad.fr</ext-link>
) for providing us with computational resources. We thank Christophe Jenny for providing the Pahang segregating population from the CIRAD research station in Guadeloupe, French West Indies.</p>
</ack>
<ref-list id="Bib1"><title>References</title>
<ref id="CR1"><label>1.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bolger</surname>
<given-names>ME</given-names>
</name>
<name><surname>Weisshaar</surname>
<given-names>B</given-names>
</name>
<name><surname>Scholz</surname>
<given-names>U</given-names>
</name>
<name><surname>Stein</surname>
<given-names>N</given-names>
</name>
<name><surname>Usadel</surname>
<given-names>B</given-names>
</name>
<name><surname>Mayer</surname>
<given-names>KF</given-names>
</name>
</person-group>
<article-title>Plant genome sequencing — applications for crop improvement</article-title>
<source>Curr Opin Biotechnol.</source>
<year>2014</year>
<volume>26</volume>
<fpage>31</fpage>
<lpage>37</lpage>
<pub-id pub-id-type="doi">10.1016/j.copbio.2013.08.019</pub-id>
<pub-id pub-id-type="pmid">24679255</pub-id>
</element-citation>
</ref>
<ref id="CR2"><label>2.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Feuillet</surname>
<given-names>C</given-names>
</name>
<name><surname>Leach</surname>
<given-names>JE</given-names>
</name>
<name><surname>Rogers</surname>
<given-names>J</given-names>
</name>
<name><surname>Schnable</surname>
<given-names>PS</given-names>
</name>
<name><surname>Eversole</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Crop genome sequencing: lessons and rationales</article-title>
<source>Trends Plant Sci.</source>
<year>2011</year>
<volume>16</volume>
<fpage>77</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="doi">10.1016/j.tplants.2010.10.005</pub-id>
<pub-id pub-id-type="pmid">21081278</pub-id>
</element-citation>
</ref>
<ref id="CR3"><label>3.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Michael</surname>
<given-names>TP</given-names>
</name>
<name><surname>Jackson</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>The First 50 Plant Genomes</article-title>
<source>Plant Genome.</source>
<year>2013</year>
<volume>6</volume>
<fpage>1</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.3835/plantgenome2013.03.0001in</pub-id>
</element-citation>
</ref>
<ref id="CR4"><label>4.</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Kejnovsky</surname>
<given-names>E</given-names>
</name>
<name><surname>Hawkins</surname>
<given-names>J</given-names>
</name>
<name><surname>Feschotte</surname>
<given-names>C</given-names>
</name>
</person-group>
<person-group person-group-type="editor"><name><surname>Wendel</surname>
<given-names>JF</given-names>
</name>
<name><surname>Greilhuber</surname>
<given-names>J</given-names>
</name>
<name><surname>Dolezel</surname>
<given-names>J</given-names>
</name>
<name><surname>Leitch</surname>
<given-names>IJ</given-names>
</name>
</person-group>
<article-title>Plant Transposable Elements: Biology and Evolution</article-title>
<source>Plant Genome Diversity</source>
<year>2012</year>
<publisher-loc>Vienna</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>17</fpage>
<lpage>34</lpage>
</element-citation>
</ref>
<ref id="CR5"><label>5.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hahn</surname>
<given-names>MW</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>SV</given-names>
</name>
<name><surname>Moyle</surname>
<given-names>LC</given-names>
</name>
</person-group>
<article-title>Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations</article-title>
<source>G3 Genes Genomes Genetics.</source>
<year>2014</year>
<volume>4</volume>
<fpage>669</fpage>
<lpage>679</lpage>
<pub-id pub-id-type="pmid">24531727</pub-id>
</element-citation>
</ref>
<ref id="CR6"><label>6.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vanneste</surname>
<given-names>K</given-names>
</name>
<name><surname>Maere</surname>
<given-names>S</given-names>
</name>
<name><surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution</article-title>
<source>Philos Trans R Soc B Biol Sci.</source>
<year>2014</year>
<volume>369</volume>
<fpage>1</fpage>
<lpage>13</lpage>
<pub-id pub-id-type="doi">10.1098/rstb.2013.0353</pub-id>
</element-citation>
</ref>
<ref id="CR7"><label>7.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name><surname>Sajjadian</surname>
<given-names>S</given-names>
</name>
<name><surname>Eichler</surname>
<given-names>EE</given-names>
</name>
</person-group>
<article-title>Limitations of next-generation genome sequence assembly</article-title>
<source>Nat Methods.</source>
<year>2011</year>
<volume>8</volume>
<fpage>61</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1527</pub-id>
<pub-id pub-id-type="pmid">21102452</pub-id>
</element-citation>
</ref>
<ref id="CR8"><label>8.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mardis</surname>
<given-names>ER</given-names>
</name>
</person-group>
<article-title>A decade’s perspective on DNA sequencing technology</article-title>
<source>Nature.</source>
<year>2011</year>
<volume>470</volume>
<fpage>198</fpage>
<lpage>203</lpage>
<pub-id pub-id-type="doi">10.1038/nature09796</pub-id>
<pub-id pub-id-type="pmid">21307932</pub-id>
</element-citation>
</ref>
<ref id="CR9"><label>9.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Williams</surname>
<given-names>LJS</given-names>
</name>
<name><surname>Tabbaa</surname>
<given-names>DG</given-names>
</name>
<name><surname>Li</surname>
<given-names>N</given-names>
</name>
<name><surname>Berlin</surname>
<given-names>AM</given-names>
</name>
<name><surname>Shea</surname>
<given-names>TP</given-names>
</name>
<name><surname>MacCallum</surname>
<given-names>I</given-names>
</name>
<name><surname>Lawrence</surname>
<given-names>MS</given-names>
</name>
<name><surname>Drier</surname>
<given-names>Y</given-names>
</name>
<name><surname>Getz</surname>
<given-names>G</given-names>
</name>
<name><surname>Young</surname>
<given-names>SK</given-names>
</name>
<name><surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name><surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name><surname>Gnirke</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Paired-end sequencing of Fosmid libraries by Illumina</article-title>
<source>Genome Res.</source>
<year>2012</year>
<volume>22</volume>
<fpage>2241</fpage>
<lpage>2249</lpage>
<pub-id pub-id-type="doi">10.1101/gr.138925.112</pub-id>
<pub-id pub-id-type="pmid">22800726</pub-id>
</element-citation>
</ref>
<ref id="CR10"><label>10.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname>
<given-names>Y</given-names>
</name>
<name><surname>Xie</surname>
<given-names>M</given-names>
</name>
<name><surname>Jiang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Xiao</surname>
<given-names>N</given-names>
</name>
<name><surname>Du</surname>
<given-names>X</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>W</given-names>
</name>
<name><surname>Tosser-Klopp</surname>
<given-names>G</given-names>
</name>
<name><surname>Wang</surname>
<given-names>J</given-names>
</name>
<name><surname>Yang</surname>
<given-names>S</given-names>
</name>
<name><surname>Liang</surname>
<given-names>J</given-names>
</name>
<name><surname>Chen</surname>
<given-names>W</given-names>
</name>
<name><surname>Chen</surname>
<given-names>J</given-names>
</name>
<name><surname>Zeng</surname>
<given-names>P</given-names>
</name>
<name><surname>Hou</surname>
<given-names>Y</given-names>
</name>
<name><surname>Bian</surname>
<given-names>C</given-names>
</name>
<name><surname>Pan</surname>
<given-names>S</given-names>
</name>
<name><surname>Li</surname>
<given-names>Y</given-names>
</name>
<name><surname>Liu</surname>
<given-names>X</given-names>
</name>
<name><surname>Wang</surname>
<given-names>W</given-names>
</name>
<name><surname>Servin</surname>
<given-names>B</given-names>
</name>
<name><surname>Sayre</surname>
<given-names>B</given-names>
</name>
<name><surname>Zhu</surname>
<given-names>B</given-names>
</name>
<name><surname>Sweeney</surname>
<given-names>D</given-names>
</name>
<name><surname>Moore</surname>
<given-names>R</given-names>
</name>
<name><surname>Nie</surname>
<given-names>W</given-names>
</name>
<name><surname>Shen</surname>
<given-names>Y</given-names>
</name>
<name><surname>Zhao</surname>
<given-names>R</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>G</given-names>
</name>
<name><surname>Li</surname>
<given-names>J</given-names>
</name>
<name><surname>Faraut</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus)</article-title>
<source>Nat Biotechnol.</source>
<year>2013</year>
<volume>31</volume>
<fpage>135</fpage>
<lpage>141</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2478</pub-id>
<pub-id pub-id-type="pmid">23263233</pub-id>
</element-citation>
</ref>
<ref id="CR11"><label>11.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Levy-Sakin</surname>
<given-names>M</given-names>
</name>
<name><surname>Ebenstein</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy</article-title>
<source>Curr Opin Biotechnol.</source>
<year>2013</year>
<volume>24</volume>
<fpage>690</fpage>
<lpage>698</lpage>
<pub-id pub-id-type="doi">10.1016/j.copbio.2013.01.009</pub-id>
<pub-id pub-id-type="pmid">23428595</pub-id>
</element-citation>
</ref>
<ref id="CR12"><label>12.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neely</surname>
<given-names>RK</given-names>
</name>
<name><surname>Deen</surname>
<given-names>J</given-names>
</name>
<name><surname>Hofkens</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Optical mapping of DNA: Single-molecule-based methods for mapping genomes</article-title>
<source>Biopolymers.</source>
<year>2011</year>
<volume>95</volume>
<fpage>298</fpage>
<lpage>311</lpage>
<pub-id pub-id-type="doi">10.1002/bip.21579</pub-id>
<pub-id pub-id-type="pmid">21207457</pub-id>
</element-citation>
</ref>
<ref id="CR13"><label>13.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mascher</surname>
<given-names>M</given-names>
</name>
<name><surname>Stein</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Genetic anchoring of whole-genome shotgun assemblies</article-title>
<source>Front Genet.</source>
<year>2014</year>
<volume>5</volume>
<fpage>1</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.3389/fgene.2014.00208</pub-id>
<pub-id pub-id-type="pmid">24567736</pub-id>
</element-citation>
</ref>
<ref id="CR14"><label>14.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Mascher</surname>
<given-names>M</given-names>
</name>
<name><surname>Muehlbauer</surname>
<given-names>GJ</given-names>
</name>
<name><surname>Rokhsar</surname>
<given-names>DS</given-names>
</name>
<name><surname>Chapman</surname>
<given-names>J</given-names>
</name>
<name><surname>Schmutz</surname>
<given-names>J</given-names>
</name>
<name><surname>Barry</surname>
<given-names>K</given-names>
</name>
<name><surname>Muñoz-Amatriaín</surname>
<given-names>M</given-names>
</name>
<name><surname>Close</surname>
<given-names>TJ</given-names>
</name>
<name><surname>Wise</surname>
<given-names>RP</given-names>
</name>
<name><surname>Schulman</surname>
<given-names>AH</given-names>
</name>
<name><surname>Himmelbach</surname>
<given-names>A</given-names>
</name>
<name><surname>Mayer</surname>
<given-names>KFX</given-names>
</name>
<name><surname>Scholz</surname>
<given-names>U</given-names>
</name>
<name><surname>Poland</surname>
<given-names>JA</given-names>
</name>
<name><surname>Stein</surname>
<given-names>N</given-names>
</name>
<name><surname>Waugh</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)</article-title>
<source>Plant J.</source>
<year>2013</year>
<volume>76</volume>
<fpage>718</fpage>
<lpage>727</lpage>
<pub-id pub-id-type="doi">10.1111/tpj.12319</pub-id>
<pub-id pub-id-type="pmid">23998490</pub-id>
</element-citation>
</ref>
<ref id="CR15"><label>15.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schatz</surname>
<given-names>M</given-names>
</name>
<name><surname>Witkowski</surname>
<given-names>J</given-names>
</name>
<name><surname>McCombie</surname>
<given-names>WR</given-names>
</name>
</person-group>
<article-title>Current challenges in de novo plant genome sequencing and assembly</article-title>
<source>Genome Biol.</source>
<year>2012</year>
<volume>13</volume>
<fpage>243</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2012-13-4-243</pub-id>
<pub-id pub-id-type="pmid">22546054</pub-id>
</element-citation>
</ref>
<ref id="CR16"><label>16.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pop</surname>
<given-names>M</given-names>
</name>
<name><surname>Kosack</surname>
<given-names>DS</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Hierarchical Scaffolding With Bambus</article-title>
<source>Genome Res.</source>
<year>2004</year>
<volume>14</volume>
<fpage>149</fpage>
<lpage>159</lpage>
<pub-id pub-id-type="doi">10.1101/gr.1536204</pub-id>
<pub-id pub-id-type="pmid">14707177</pub-id>
</element-citation>
</ref>
<ref id="CR17"><label>17.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dayarian</surname>
<given-names>A</given-names>
</name>
<name><surname>Michael</surname>
<given-names>T</given-names>
</name>
<name><surname>Sengupta</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>SOPRA: Scaffolding algorithm for paired reads via statistical optimization</article-title>
<source>BMC Bioinformatics.</source>
<year>2010</year>
<volume>11</volume>
<fpage>345</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-345</pub-id>
<pub-id pub-id-type="pmid">20576136</pub-id>
</element-citation>
</ref>
<ref id="CR18"><label>18.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Salmela</surname>
<given-names>L</given-names>
</name>
<name><surname>Mäkinen</surname>
<given-names>V</given-names>
</name>
<name><surname>Välimäki</surname>
<given-names>N</given-names>
</name>
<name><surname>Ylinen</surname>
<given-names>J</given-names>
</name>
<name><surname>Ukkonen</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Fast scaffolding with small independent mixed integer programs</article-title>
<source>Bioinformatics.</source>
<year>2011</year>
<volume>27</volume>
<fpage>3259</fpage>
<lpage>3265</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr562</pub-id>
<pub-id pub-id-type="pmid">21998153</pub-id>
</element-citation>
</ref>
<ref id="CR19"><label>19.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boetzer</surname>
<given-names>M</given-names>
</name>
<name><surname>Henkel</surname>
<given-names>CV</given-names>
</name>
<name><surname>Jansen</surname>
<given-names>HJ</given-names>
</name>
<name><surname>Butler</surname>
<given-names>D</given-names>
</name>
<name><surname>Pirovano</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Scaffolding pre-assembled contigs using SSPACE</article-title>
<source>Bioinformatics.</source>
<year>2011</year>
<volume>27</volume>
<fpage>578</fpage>
<lpage>579</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq683</pub-id>
<pub-id pub-id-type="pmid">21149342</pub-id>
</element-citation>
</ref>
<ref id="CR20"><label>20.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname>
<given-names>S</given-names>
</name>
<name><surname>Sung</surname>
<given-names>W-K</given-names>
</name>
<name><surname>Nagarajan</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences</article-title>
<source>J Comput Biol.</source>
<year>2011</year>
<volume>18</volume>
<fpage>1681</fpage>
<lpage>1691</lpage>
<pub-id pub-id-type="doi">10.1089/cmb.2011.0170</pub-id>
<pub-id pub-id-type="pmid">21929371</pub-id>
</element-citation>
</ref>
<ref id="CR21"><label>21.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gritsenko</surname>
<given-names>AA</given-names>
</name>
<name><surname>Nijkamp</surname>
<given-names>JF</given-names>
</name>
<name><surname>Reinders</surname>
<given-names>MJT</given-names>
</name>
<name><surname>de Ridder</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies</article-title>
<source>Bioinformatics.</source>
<year>2012</year>
<volume>28</volume>
<fpage>1429</fpage>
<lpage>1437</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts175</pub-id>
<pub-id pub-id-type="pmid">22492642</pub-id>
</element-citation>
</ref>
<ref id="CR22"><label>22.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Donmez</surname>
<given-names>N</given-names>
</name>
<name><surname>Brudno</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>SCARPA: scaffolding reads with practical algorithms</article-title>
<source>Bioinformatics.</source>
<year>2013</year>
<volume>29</volume>
<fpage>428</fpage>
<lpage>434</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts716</pub-id>
<pub-id pub-id-type="pmid">23274213</pub-id>
</element-citation>
</ref>
<ref id="CR23"><label>23.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boetzer</surname>
<given-names>M</given-names>
</name>
<name><surname>Pirovano</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information</article-title>
<source>BMC Bioinformatics.</source>
<year>2014</year>
<volume>15</volume>
<fpage>211</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-15-211</pub-id>
<pub-id pub-id-type="pmid">24950923</pub-id>
</element-citation>
</ref>
<ref id="CR24"><label>24.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Luo</surname>
<given-names>R</given-names>
</name>
<name><surname>Liu</surname>
<given-names>B</given-names>
</name>
<name><surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name><surname>Li</surname>
<given-names>Z</given-names>
</name>
<name><surname>Huang</surname>
<given-names>W</given-names>
</name>
<name><surname>Yuan</surname>
<given-names>J</given-names>
</name>
<name><surname>He</surname>
<given-names>G</given-names>
</name>
<name><surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name><surname>Pan</surname>
<given-names>Q</given-names>
</name>
<name><surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Tang</surname>
<given-names>J</given-names>
</name>
<name><surname>Wu</surname>
<given-names>G</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name><surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name><surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Yu</surname>
<given-names>C</given-names>
</name>
<name><surname>Wang</surname>
<given-names>B</given-names>
</name>
<name><surname>Lu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Han</surname>
<given-names>C</given-names>
</name>
<name><surname>Cheung</surname>
<given-names>D</given-names>
</name>
<name><surname>Yiu</surname>
<given-names>S-M</given-names>
</name>
<name><surname>Peng</surname>
<given-names>S</given-names>
</name>
<name><surname>Zhu</surname>
<given-names>X</given-names>
</name>
<name><surname>Liu</surname>
<given-names>G</given-names>
</name>
<name><surname>Liao</surname>
<given-names>X</given-names>
</name>
<name><surname>Li</surname>
<given-names>Y</given-names>
</name>
<name><surname>Yang</surname>
<given-names>H</given-names>
</name>
<name><surname>Wang</surname>
<given-names>J</given-names>
</name>
<name><surname>Lam</surname>
<given-names>T-W</given-names>
</name>
<name><surname>Wang</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler</article-title>
<source>GigaScience.</source>
<year>2012</year>
<volume>1</volume>
<fpage>18</fpage>
<pub-id pub-id-type="doi">10.1186/2047-217X-1-18</pub-id>
<pub-id pub-id-type="pmid">23587118</pub-id>
</element-citation>
</ref>
<ref id="CR25"><label>25.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Boetzer</surname>
<given-names>M</given-names>
</name>
<name><surname>Pirovano</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Toward almost closed genomes with GapFiller</article-title>
<source>Genome Biol.</source>
<year>2012</year>
<volume>13</volume>
<fpage>R56</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2012-13-6-r56</pub-id>
<pub-id pub-id-type="pmid">22731987</pub-id>
</element-citation>
</ref>
<ref id="CR26"><label>26.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Swain</surname>
<given-names>MT</given-names>
</name>
<name><surname>Tsai</surname>
<given-names>IJ</given-names>
</name>
<name><surname>Assefa</surname>
<given-names>SA</given-names>
</name>
<name><surname>Newbold</surname>
<given-names>C</given-names>
</name>
<name><surname>Berriman</surname>
<given-names>M</given-names>
</name>
<name><surname>Otto</surname>
<given-names>TD</given-names>
</name>
</person-group>
<article-title>A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs</article-title>
<source>Nat Protoc.</source>
<year>2012</year>
<volume>7</volume>
<fpage>1260</fpage>
<lpage>1284</lpage>
<pub-id pub-id-type="doi">10.1038/nprot.2012.068</pub-id>
<pub-id pub-id-type="pmid">22678431</pub-id>
</element-citation>
</ref>
<ref id="CR27"><label>27.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>D’Hont</surname>
<given-names>A</given-names>
</name>
<name><surname>Denoeud</surname>
<given-names>F</given-names>
</name>
<name><surname>Aury</surname>
<given-names>J-M</given-names>
</name>
<name><surname>Baurens</surname>
<given-names>F-C</given-names>
</name>
<name><surname>Carreel</surname>
<given-names>F</given-names>
</name>
<name><surname>Garsmeur</surname>
<given-names>O</given-names>
</name>
<name><surname>Noel</surname>
<given-names>B</given-names>
</name>
<name><surname>Bocs</surname>
<given-names>S</given-names>
</name>
<name><surname>Droc</surname>
<given-names>G</given-names>
</name>
<name><surname>Rouard</surname>
<given-names>M</given-names>
</name>
<name><surname>Da Silva</surname>
<given-names>C</given-names>
</name>
<name><surname>Jabbari</surname>
<given-names>K</given-names>
</name>
<name><surname>Cardi</surname>
<given-names>C</given-names>
</name>
<name><surname>Poulain</surname>
<given-names>J</given-names>
</name>
<name><surname>Souquet</surname>
<given-names>M</given-names>
</name>
<name><surname>Labadie</surname>
<given-names>K</given-names>
</name>
<name><surname>Jourda</surname>
<given-names>C</given-names>
</name>
<name><surname>Lengelle</surname>
<given-names>J</given-names>
</name>
<name><surname>Rodier-Goud</surname>
<given-names>M</given-names>
</name>
<name><surname>Alberti</surname>
<given-names>A</given-names>
</name>
<name><surname>Bernard</surname>
<given-names>M</given-names>
</name>
<name><surname>Correa</surname>
<given-names>M</given-names>
</name>
<name><surname>Ayyampalayam</surname>
<given-names>S</given-names>
</name>
<name><surname>McKain</surname>
<given-names>MR</given-names>
</name>
<name><surname>Leebens-Mack</surname>
<given-names>J</given-names>
</name>
<name><surname>Burgess</surname>
<given-names>D</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
<name><surname>Mbeguie-A-Mbeguie</surname>
<given-names>D</given-names>
</name>
<name><surname>Chabannes</surname>
<given-names>M</given-names>
</name>
<name><surname>Wicker</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The banana (Musa acuminata) genome and the evolution of monocotyledonous plants</article-title>
<source>Nature.</source>
<year>2012</year>
<volume>488</volume>
<fpage>213</fpage>
<lpage>217</lpage>
<pub-id pub-id-type="doi">10.1038/nature11241</pub-id>
<pub-id pub-id-type="pmid">22801500</pub-id>
</element-citation>
</ref>
<ref id="CR28"><label>28.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jourda</surname>
<given-names>C</given-names>
</name>
<name><surname>Cardi</surname>
<given-names>C</given-names>
</name>
<name><surname>Mbéguié-A-Mbéguié</surname>
<given-names>D</given-names>
</name>
<name><surname>Bocs</surname>
<given-names>S</given-names>
</name>
<name><surname>Garsmeur</surname>
<given-names>O</given-names>
</name>
<name><surname>D’Hont</surname>
<given-names>A</given-names>
</name>
<name><surname>Yahiaoui</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Expansion of banana (Musa acuminata) gene families involved in ethylene biosynthesis and signalling after lineage-specific whole-genome duplications</article-title>
<source>New Phytol.</source>
<year>2014</year>
<volume>202</volume>
<fpage>986</fpage>
<lpage>1000</lpage>
<pub-id pub-id-type="doi">10.1111/nph.12710</pub-id>
<pub-id pub-id-type="pmid">24716518</pub-id>
</element-citation>
</ref>
<ref id="CR29"><label>29.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Garsmeur</surname>
<given-names>O</given-names>
</name>
<name><surname>Schnable</surname>
<given-names>JC</given-names>
</name>
<name><surname>Almeida</surname>
<given-names>A</given-names>
</name>
<name><surname>Jourda</surname>
<given-names>C</given-names>
</name>
<name><surname>D’Hont</surname>
<given-names>A</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Two Evolutionarily Distinct Classes of Paleopolyploidy</article-title>
<source>Mol Biol Evol.</source>
<year>2014</year>
<volume>31</volume>
<fpage>448</fpage>
<lpage>454</lpage>
<pub-id pub-id-type="doi">10.1093/molbev/mst230</pub-id>
<pub-id pub-id-type="pmid">24296661</pub-id>
</element-citation>
</ref>
<ref id="CR30"><label>30.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cenci</surname>
<given-names>A</given-names>
</name>
<name><surname>Guignon</surname>
<given-names>V</given-names>
</name>
<name><surname>Roux</surname>
<given-names>N</given-names>
</name>
<name><surname>Rouard</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Genomic analysis of NAC transcription factors in banana (Musa acuminata) and definition of NAC orthologous groups for monocots and dicots</article-title>
<source>Plant Mol Biol.</source>
<year>2014</year>
<volume>85</volume>
<fpage>63</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="doi">10.1007/s11103-013-0169-2</pub-id>
<pub-id pub-id-type="pmid">24570169</pub-id>
</element-citation>
</ref>
<ref id="CR31"><label>31.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>J</given-names>
</name>
<name><surname>Hu</surname>
<given-names>Q</given-names>
</name>
<name><surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name><surname>Lu</surname>
<given-names>C</given-names>
</name>
<name><surname>Kuang</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>P-MITE: a database for plant miniature inverted-repeat transposable elements</article-title>
<source>Nucleic Acids Res.</source>
<year>2014</year>
<volume>42</volume>
<fpage>D1176</fpage>
<lpage>D1181</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkt1000</pub-id>
<pub-id pub-id-type="pmid">24174541</pub-id>
</element-citation>
</ref>
<ref id="CR32"><label>32.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Golicz</surname>
<given-names>AA</given-names>
</name>
<name><surname>Schliep</surname>
<given-names>M</given-names>
</name>
<name><surname>Lee</surname>
<given-names>HT</given-names>
</name>
<name><surname>Larkum</surname>
<given-names>AWD</given-names>
</name>
<name><surname>Dolferus</surname>
<given-names>R</given-names>
</name>
<name><surname>Batley</surname>
<given-names>J</given-names>
</name>
<name><surname>Chan</surname>
<given-names>C-KK</given-names>
</name>
<name><surname>Sablok</surname>
<given-names>G</given-names>
</name>
<name><surname>Ralph</surname>
<given-names>PJ</given-names>
</name>
<name><surname>Edwards</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Genome-wide survey of the seagrass Zostera muelleri suggests modification of the ethylene signalling network</article-title>
<source>J Exp Bot.</source>
<year>2015</year>
<volume>66</volume>
<fpage>1489</fpage>
<lpage>1498</lpage>
<pub-id pub-id-type="doi">10.1093/jxb/eru510</pub-id>
<pub-id pub-id-type="pmid">25563969</pub-id>
</element-citation>
</ref>
<ref id="CR33"><label>33.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sampedro</surname>
<given-names>J</given-names>
</name>
<name><surname>Guttman</surname>
<given-names>M</given-names>
</name>
<name><surname>Li</surname>
<given-names>L-C</given-names>
</name>
<name><surname>Cosgrove</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Evolutionary divergence of β-expansin structure and function in grasses parallels emergence of distinctive primary cell wall traits</article-title>
<source>Plant J.</source>
<year>2015</year>
<volume>81</volume>
<fpage>108</fpage>
<lpage>120</lpage>
<pub-id pub-id-type="doi">10.1111/tpj.12715</pub-id>
<pub-id pub-id-type="pmid">25353668</pub-id>
</element-citation>
</ref>
<ref id="CR34"><label>34.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>De Smet</surname>
<given-names>R</given-names>
</name>
<name><surname>Adams</surname>
<given-names>KL</given-names>
</name>
<name><surname>Vandepoele</surname>
<given-names>K</given-names>
</name>
<name><surname>Van Montagu</surname>
<given-names>MCE</given-names>
</name>
<name><surname>Maere</surname>
<given-names>S</given-names>
</name>
<name><surname>Van de Peer</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants</article-title>
<source>Proc Natl Acad Sci.</source>
<year>2013</year>
<volume>110</volume>
<fpage>2898</fpage>
<lpage>2903</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1300127110</pub-id>
<pub-id pub-id-type="pmid">23382190</pub-id>
</element-citation>
</ref>
<ref id="CR35"><label>35.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chain</surname>
<given-names>PSG</given-names>
</name>
<name><surname>Grafham</surname>
<given-names>DV</given-names>
</name>
<name><surname>Fulton</surname>
<given-names>RS</given-names>
</name>
<name><surname>FitzGerald</surname>
<given-names>MG</given-names>
</name>
<name><surname>Hostetler</surname>
<given-names>J</given-names>
</name>
<name><surname>Muzny</surname>
<given-names>D</given-names>
</name>
<name><surname>Ali</surname>
<given-names>J</given-names>
</name>
<name><surname>Birren</surname>
<given-names>B</given-names>
</name>
<name><surname>Bruce</surname>
<given-names>DC</given-names>
</name>
<name><surname>Buhay</surname>
<given-names>C</given-names>
</name>
<name><surname>Cole</surname>
<given-names>JR</given-names>
</name>
<name><surname>Ding</surname>
<given-names>Y</given-names>
</name>
<name><surname>Dugan</surname>
<given-names>S</given-names>
</name>
<name><surname>Field</surname>
<given-names>D</given-names>
</name>
<name><surname>Garrity</surname>
<given-names>GM</given-names>
</name>
<name><surname>Gibbs</surname>
<given-names>R</given-names>
</name>
<name><surname>Graves</surname>
<given-names>T</given-names>
</name>
<name><surname>Han</surname>
<given-names>CS</given-names>
</name>
<name><surname>Harrison</surname>
<given-names>SH</given-names>
</name>
<name><surname>Highlander</surname>
<given-names>S</given-names>
</name>
<name><surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name><surname>Khouri</surname>
<given-names>HM</given-names>
</name>
<name><surname>Kodira</surname>
<given-names>CD</given-names>
</name>
<name><surname>Kolker</surname>
<given-names>E</given-names>
</name>
<name><surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name><surname>Lang</surname>
<given-names>D</given-names>
</name>
<name><surname>Lapidus</surname>
<given-names>A</given-names>
</name>
<name><surname>Malfatti</surname>
<given-names>SA</given-names>
</name>
<name><surname>Markowitz</surname>
<given-names>V</given-names>
</name>
<name><surname>Metha</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Genome Project Standards in a New Era of Sequencing</article-title>
<source>Science.</source>
<year>2009</year>
<volume>326</volume>
<fpage>236</fpage>
<lpage>237</lpage>
<pub-id pub-id-type="doi">10.1126/science.1180614</pub-id>
<pub-id pub-id-type="pmid">19815760</pub-id>
</element-citation>
</ref>
<ref id="CR36"><label>36.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Šimková</surname>
<given-names>H</given-names>
</name>
<name><surname>Číhalíková</surname>
<given-names>J</given-names>
</name>
<name><surname>Vrána</surname>
<given-names>J</given-names>
</name>
<name><surname>Lysák</surname>
<given-names>M</given-names>
</name>
<name><surname>Doležel</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Preparation of HMW DNA from Plant Nuclei and Chromosomes Isolated from Root Tips</article-title>
<source>Biol Plant.</source>
<year>2003</year>
<volume>46</volume>
<fpage>369</fpage>
<lpage>373</lpage>
<pub-id pub-id-type="doi">10.1023/A:1024322001786</pub-id>
</element-citation>
</ref>
<ref id="CR37"><label>37.</label>
<mixed-citation publication-type="other">Cruz VM. Molecular Genetic Characterization of Lesquerella New Industrial Crop Using DArTseq Markers. In Plant and Animal Genome XXI Conference, San Diego, CA, USA. Plant and Animal Genome. 2013.</mixed-citation>
</ref>
<ref id="CR38"><label>38.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname>
<given-names>H</given-names>
</name>
<name><surname>Durbin</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Fast and accurate long-read alignment with Burrows–Wheeler transform</article-title>
<source>Bioinformatics.</source>
<year>2010</year>
<volume>26</volume>
<fpage>589</fpage>
<lpage>595</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp698</pub-id>
<pub-id pub-id-type="pmid">20080505</pub-id>
</element-citation>
</ref>
<ref id="CR39"><label>39.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Van Ooijen</surname>
<given-names>JW</given-names>
</name>
</person-group>
<article-title>Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species</article-title>
<source>Genet Res.</source>
<year>2011</year>
<volume>93</volume>
<fpage>343</fpage>
<lpage>349</lpage>
<pub-id pub-id-type="doi">10.1017/S0016672311000279</pub-id>
</element-citation>
</ref>
<ref id="CR40"><label>40.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name><surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
<source>Nat Methods.</source>
<year>2012</year>
<volume>9</volume>
<fpage>357</fpage>
<lpage>359</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
<pub-id pub-id-type="pmid">22388286</pub-id>
</element-citation>
</ref>
<ref id="CR41"><label>41.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name><surname>Gish</surname>
<given-names>W</given-names>
</name>
<name><surname>Miller</surname>
<given-names>W</given-names>
</name>
<name><surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name><surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Basic local alignment search tool</article-title>
<source>J Mol Biol.</source>
<year>1990</year>
<volume>215</volume>
<fpage>403</fpage>
<lpage>410</lpage>
<pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="CR42"><label>42.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Krzywinski</surname>
<given-names>M</given-names>
</name>
<name><surname>Schein</surname>
<given-names>J</given-names>
</name>
<name><surname>Birol</surname>
<given-names>İ</given-names>
</name>
<name><surname>Connors</surname>
<given-names>J</given-names>
</name>
<name><surname>Gascoyne</surname>
<given-names>R</given-names>
</name>
<name><surname>Horsman</surname>
<given-names>D</given-names>
</name>
<name><surname>Jones</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Marra</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Circos: An information aesthetic for comparative genomics</article-title>
<source>Genome Res.</source>
<year>2009</year>
<volume>19</volume>
<fpage>1639</fpage>
<lpage>1645</lpage>
<pub-id pub-id-type="doi">10.1101/gr.092759.109</pub-id>
<pub-id pub-id-type="pmid">19541911</pub-id>
</element-citation>
</ref>
<ref id="CR43"><label>43.</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Anantharaman</surname>
<given-names>T</given-names>
</name>
<name><surname>Mishra</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>A Probabilistic Analysis of False Positives in Optical Map Alignment and Validation</article-title>
<source>Proc. of WABI</source>
<year>2001</year>
<fpage>27</fpage>
<lpage>40</lpage>
</element-citation>
</ref>
<ref id="CR44"><label>44.</label>
<element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Nguyen</surname>
<given-names>JV</given-names>
</name>
</person-group>
<source>Genomic Mapping: A Statistical and Algorithmic Analysis of the Optical Mapping System</source>
<year>2010</year>
<publisher-loc>Los Angeles, CA, USA</publisher-loc>
<publisher-name>University of Southern California</publisher-name>
</element-citation>
</ref>
<ref id="CR45"><label>45.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pendleton</surname>
<given-names>M</given-names>
</name>
<name><surname>Sebra</surname>
<given-names>R</given-names>
</name>
<name><surname>Pang</surname>
<given-names>AWC</given-names>
</name>
<name><surname>Ummat</surname>
<given-names>A</given-names>
</name>
<name><surname>Franzen</surname>
<given-names>O</given-names>
</name>
<name><surname>Rausch</surname>
<given-names>T</given-names>
</name>
<name><surname>Stütz</surname>
<given-names>AM</given-names>
</name>
<name><surname>Stedman</surname>
<given-names>W</given-names>
</name>
<name><surname>Anantharaman</surname>
<given-names>T</given-names>
</name>
<name><surname>Hastie</surname>
<given-names>A</given-names>
</name>
<name><surname>Dai</surname>
<given-names>H</given-names>
</name>
<name><surname>Fritz</surname>
<given-names>MH-Y</given-names>
</name>
<name><surname>Cao</surname>
<given-names>H</given-names>
</name>
<name><surname>Cohain</surname>
<given-names>A</given-names>
</name>
<name><surname>Deikus</surname>
<given-names>G</given-names>
</name>
<name><surname>Durrett</surname>
<given-names>RE</given-names>
</name>
<name><surname>Blanchard</surname>
<given-names>SC</given-names>
</name>
<name><surname>Altman</surname>
<given-names>R</given-names>
</name>
<name><surname>Chin</surname>
<given-names>C-S</given-names>
</name>
<name><surname>Guo</surname>
<given-names>Y</given-names>
</name>
<name><surname>Paxinos</surname>
<given-names>EE</given-names>
</name>
<name><surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<name><surname>Darnell</surname>
<given-names>RB</given-names>
</name>
<name><surname>McCombie</surname>
<given-names>WR</given-names>
</name>
<name><surname>Kwok</surname>
<given-names>P-Y</given-names>
</name>
<name><surname>Mason</surname>
<given-names>CE</given-names>
</name>
<name><surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name><surname>Bashir</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Assembly and diploid architecture of an individual human genome via single-molecule technologies</article-title>
<source>Nat Methods.</source>
<year>2015</year>
<volume>12</volume>
<fpage>780</fpage>
<lpage>786</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.3454</pub-id>
<pub-id pub-id-type="pmid">26121404</pub-id>
</element-citation>
</ref>
<ref id="CR46"><label>46.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slater</surname>
<given-names>G</given-names>
</name>
<name><surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Automated generation of heuristics for biological sequence comparison</article-title>
<source>BMC Bioinformatics.</source>
<year>2005</year>
<volume>6</volume>
<fpage>31</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-6-31</pub-id>
<pub-id pub-id-type="pmid">15713233</pub-id>
</element-citation>
</ref>
<ref id="CR47"><label>47.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname>
<given-names>AR</given-names>
</name>
<name><surname>Hall</surname>
<given-names>IM</given-names>
</name>
</person-group>
<article-title>BEDTools: a flexible suite of utilities for comparing genomic features</article-title>
<source>Bioinformatics.</source>
<year>2010</year>
<volume>26</volume>
<fpage>841</fpage>
<lpage>842</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id>
<pub-id pub-id-type="pmid">20110278</pub-id>
</element-citation>
</ref>
<ref id="CR48"><label>48.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Muggli</surname>
<given-names>MD</given-names>
</name>
<name><surname>Puglisi</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Ronen</surname>
<given-names>R</given-names>
</name>
<name><surname>Boucher</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Misassembly detection using paired-end sequence reads and optical mapping data</article-title>
<source>Bioinformatics</source>
<year>2015</year>
<volume>31</volume>
<fpage>80</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btv262</pub-id>
</element-citation>
</ref>
<ref id="CR49"><label>49.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname>
<given-names>M</given-names>
</name>
<name><surname>Presting</surname>
<given-names>G</given-names>
</name>
<name><surname>Barbazuk</surname>
<given-names>WB</given-names>
</name>
<name><surname>Goicoechea</surname>
<given-names>JL</given-names>
</name>
<name><surname>Blackmon</surname>
<given-names>B</given-names>
</name>
<name><surname>Fang</surname>
<given-names>G</given-names>
</name>
<name><surname>Kim</surname>
<given-names>H</given-names>
</name>
<name><surname>Frisch</surname>
<given-names>D</given-names>
</name>
<name><surname>Yu</surname>
<given-names>Y</given-names>
</name>
<name><surname>Sun</surname>
<given-names>S</given-names>
</name>
<name><surname>Higingbottom</surname>
<given-names>S</given-names>
</name>
<name><surname>Phimphilai</surname>
<given-names>J</given-names>
</name>
<name><surname>Phimphilai</surname>
<given-names>D</given-names>
</name>
<name><surname>Thurmond</surname>
<given-names>S</given-names>
</name>
<name><surname>Gaudette</surname>
<given-names>B</given-names>
</name>
<name><surname>Li</surname>
<given-names>P</given-names>
</name>
<name><surname>Liu</surname>
<given-names>J</given-names>
</name>
<name><surname>Hatfield</surname>
<given-names>J</given-names>
</name>
<name><surname>Main</surname>
<given-names>D</given-names>
</name>
<name><surname>Farrar</surname>
<given-names>K</given-names>
</name>
<name><surname>Henderson</surname>
<given-names>C</given-names>
</name>
<name><surname>Barnett</surname>
<given-names>L</given-names>
</name>
<name><surname>Costa</surname>
<given-names>R</given-names>
</name>
<name><surname>Williams</surname>
<given-names>B</given-names>
</name>
<name><surname>Walser</surname>
<given-names>S</given-names>
</name>
<name><surname>Atkins</surname>
<given-names>M</given-names>
</name>
<name><surname>Hall</surname>
<given-names>C</given-names>
</name>
<name><surname>Budiman</surname>
<given-names>MA</given-names>
</name>
<name><surname>Tomkins</surname>
<given-names>JP</given-names>
</name>
<name><surname>Luo</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>An Integrated Physical and Genetic Map of the Rice Genome</article-title>
<source>Plant Cell Online</source>
<year>2002</year>
<volume>14</volume>
<fpage>537</fpage>
<lpage>545</lpage>
<pub-id pub-id-type="doi">10.1105/tpc.010485</pub-id>
</element-citation>
</ref>
<ref id="CR50"><label>50.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gill</surname>
<given-names>KS</given-names>
</name>
<name><surname>Gill</surname>
<given-names>BS</given-names>
</name>
<name><surname>Endo</surname>
<given-names>TR</given-names>
</name>
<name><surname>Taylor</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Identification and high-density mapping of gene-rich regions in chromosome group 1 of wheat</article-title>
<source>Genetics</source>
<year>1996</year>
<volume>144</volume>
<fpage>1883</fpage>
<lpage>1891</lpage>
<pub-id pub-id-type="pmid">8978071</pub-id>
</element-citation>
</ref>
<ref id="CR51"><label>51.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hall</surname>
<given-names>SE</given-names>
</name>
<name><surname>Kettler</surname>
<given-names>G</given-names>
</name>
<name><surname>Preuss</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Centromere Satellites From Arabidopsis Populations: Maintenance of Conserved and Variable Domains</article-title>
<source>Genome Res.</source>
<year>2003</year>
<volume>13</volume>
<fpage>195</fpage>
<lpage>205</lpage>
<pub-id pub-id-type="doi">10.1101/gr.593403</pub-id>
<pub-id pub-id-type="pmid">12566397</pub-id>
</element-citation>
</ref>
<ref id="CR52"><label>52.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname>
<given-names>J</given-names>
</name>
<name><surname>Mizuno</surname>
<given-names>H</given-names>
</name>
<name><surname>Hayashi-Tsugane</surname>
<given-names>M</given-names>
</name>
<name><surname>Ito</surname>
<given-names>Y</given-names>
</name>
<name><surname>Chiden</surname>
<given-names>Y</given-names>
</name>
<name><surname>Fujisawa</surname>
<given-names>M</given-names>
</name>
<name><surname>Katagiri</surname>
<given-names>S</given-names>
</name>
<name><surname>Saji</surname>
<given-names>S</given-names>
</name>
<name><surname>Yoshiki</surname>
<given-names>S</given-names>
</name>
<name><surname>Karasawa</surname>
<given-names>W</given-names>
</name>
<name><surname>Yoshihara</surname>
<given-names>R</given-names>
</name>
<name><surname>Hayashi</surname>
<given-names>A</given-names>
</name>
<name><surname>Kobayashi</surname>
<given-names>H</given-names>
</name>
<name><surname>Ito</surname>
<given-names>K</given-names>
</name>
<name><surname>Hamada</surname>
<given-names>M</given-names>
</name>
<name><surname>Okamoto</surname>
<given-names>M</given-names>
</name>
<name><surname>Ikeno</surname>
<given-names>M</given-names>
</name>
<name><surname>Ichikawa</surname>
<given-names>Y</given-names>
</name>
<name><surname>Katayose</surname>
<given-names>Y</given-names>
</name>
<name><surname>Yano</surname>
<given-names>M</given-names>
</name>
<name><surname>Matsumoto</surname>
<given-names>T</given-names>
</name>
<name><surname>Sasaki</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Physical maps and recombination frequency of six rice chromosomes</article-title>
<source>Plant J.</source>
<year>2003</year>
<volume>36</volume>
<fpage>720</fpage>
<lpage>730</lpage>
<pub-id pub-id-type="doi">10.1046/j.1365-313X.2003.01903.x</pub-id>
<pub-id pub-id-type="pmid">14617072</pub-id>
</element-citation>
</ref>
<ref id="CR53"><label>53.</label>
<element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Droc</surname>
<given-names>G</given-names>
</name>
<name><surname>Larivière</surname>
<given-names>D</given-names>
</name>
<name><surname>Guignon</surname>
<given-names>V</given-names>
</name>
<name><surname>Yahiaoui</surname>
<given-names>N</given-names>
</name>
<name><surname>This</surname>
<given-names>D</given-names>
</name>
<name><surname>Garsmeur</surname>
<given-names>O</given-names>
</name>
<name><surname>Dereeper</surname>
<given-names>A</given-names>
</name>
<name><surname>Hamelin</surname>
<given-names>C</given-names>
</name>
<name><surname>Argout</surname>
<given-names>X</given-names>
</name>
<name><surname>Dufayard</surname>
<given-names>J-F</given-names>
</name>
<name><surname>Lengelle</surname>
<given-names>J</given-names>
</name>
<name><surname>Baurens</surname>
<given-names>F-C</given-names>
</name>
<name><surname>Cenci</surname>
<given-names>A</given-names>
</name>
<name><surname>Pitollat</surname>
<given-names>B</given-names>
</name>
<name><surname>D’Hont</surname>
<given-names>A</given-names>
</name>
<name><surname>Ruiz</surname>
<given-names>M</given-names>
</name>
<name><surname>Rouard</surname>
<given-names>M</given-names>
</name>
<name><surname>Bocs</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>The Banana Genome Hub</article-title>
<source>Database</source>
<year>2013</year>
<volume>2013</volume>
<fpage>1</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1093/database/bat035</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A85 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000A85 | SxmlIndent | more
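The record above follows the JATS/NLM layout visible in its reference list (`<ref>`, `<label>`, `<element-citation>`, `<pub-id pub-id-type="...">`). The following is a minimal sketch, assuming the output of one of the HfdSelect commands has been redirected to a file (for example `> record.xml`) instead of being piped to `more`, and that the saved output is well-formed, namespace-free XML. It uses Python's standard `xml.etree.ElementTree` to list each reference's DOI and PMID. The function name `summarize_refs` and the embedded sample are illustrative only and are not part of Dilib.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def summarize_refs(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Return one summary dict per <ref> element found in xml_text.

    Assumes JATS-style markup as in this record: each <ref> holds a <label>
    and an <element-citation> with optional <pub-id pub-id-type="..."> children.
    """
    root = ET.fromstring(xml_text)
    summaries = []
    # iter("ref") walks the root and all descendants, matching the tag exactly
    # (so <ref-list> itself is not matched).
    for ref in root.iter("ref"):
        citation = ref.find("element-citation")
        ids: Dict[str, str] = {}
        if citation is not None:
            for pub_id in citation.findall("pub-id"):
                ids[pub_id.get("pub-id-type")] = (pub_id.text or "").strip()
        summaries.append({
            "id": ref.get("id"),
            "label": (ref.findtext("label") or "").strip(),
            "doi": ids.get("doi"),    # None when the entry has no DOI
            "pmid": ids.get("pmid"),  # None when the entry has no PMID
        })
    return summaries


# Hypothetical sample: two entries trimmed down from the reference list above.
SAMPLE = """<ref-list>
<ref id="CR48"><label>48.</label>
<element-citation publication-type="journal">
<pub-id pub-id-type="doi">10.1093/bioinformatics/btv262</pub-id>
</element-citation>
</ref>
<ref id="CR50"><label>50.</label>
<element-citation publication-type="journal">
<pub-id pub-id-type="pmid">8978071</pub-id>
</element-citation>
</ref>
</ref-list>"""

if __name__ == "__main__":
    # Report entries that carry no PMID (here, CR48).
    for entry in summarize_refs(SAMPLE):
        if entry["pmid"] is None:
            print(entry["id"], "has no PMID")
```

On the sample it reports CR48, which carries a DOI but no PMID. Passing the contents of a saved record instead of `SAMPLE` would apply the same check to every entry in its reference list.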

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4793746
   |texte=   Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26984673" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024

	Exploration server on relations between France and Australia
	Warning: this site is under development! Warning: this site was generated by computer from raw corpora, so the information has not been validated.
