The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure

Internal identifier: 000002 (Pmc/Corpus); previous: 000001; next: 000003

Authors: Sateesh Kagale; Chushin Koh; John Nixon; Venkatesh Bollina; Wayne E. Clarke; Reetu Tuteja; Charles Spillane; Stephen J. Robinson; Matthew G. Links; Carling Clarke; Erin E. Higgins; Terry Huebert; Andrew G. Sharpe; Isobel A. P. Parkin

Source :

Nature Communications [ 2041-1723 ] ; 2014.

RBID : PMC:4015329

Abstract

Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for C. sativa and annotate 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa presents significant consequences for breeding and genetic manipulation of this industrial oil crop.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015329

DOI: 10.1038/ncomms4706
PubMed: 24759634
PubMed Central: 4015329

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">The emerging biofuel crop <italic>Camelina sativa</italic>
 retains a highly undifferentiated hexaploid genome structure</title>
<author><name sortKey="Kagale, Sateesh" sort="Kagale, Sateesh" uniqKey="Kagale S" first="Sateesh" last="Kagale">Sateesh Kagale</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Koh, Chushin" sort="Koh, Chushin" uniqKey="Koh C" first="Chushin" last="Koh">Chushin Koh</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Nixon, John" sort="Nixon, John" uniqKey="Nixon J" first="John" last="Nixon">John Nixon</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bollina, Venkatesh" sort="Bollina, Venkatesh" uniqKey="Bollina V" first="Venkatesh" last="Bollina">Venkatesh Bollina</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Clarke, Wayne E" sort="Clarke, Wayne E" uniqKey="Clarke W" first="Wayne E." last="Clarke">Wayne E. Clarke</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Tuteja, Reetu" sort="Tuteja, Reetu" uniqKey="Tuteja R" first="Reetu" last="Tuteja">Reetu Tuteja</name>
<affiliation><nlm:aff id="a3"><institution>Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway</institution>
, Galway,<country>Ireland</country>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Spillane, Charles" sort="Spillane, Charles" uniqKey="Spillane C" first="Charles" last="Spillane">Charles Spillane</name>
<affiliation><nlm:aff id="a3"><institution>Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway</institution>
, Galway,<country>Ireland</country>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Robinson, Stephen J" sort="Robinson, Stephen J" uniqKey="Robinson S" first="Stephen J." last="Robinson">Stephen J. Robinson</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Links, Matthew G" sort="Links, Matthew G" uniqKey="Links M" first="Matthew G." last="Links">Matthew G. Links</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Clarke, Carling" sort="Clarke, Carling" uniqKey="Clarke C" first="Carling" last="Clarke">Carling Clarke</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Higgins, Erin E" sort="Higgins, Erin E" uniqKey="Higgins E" first="Erin E." last="Higgins">Erin E. Higgins</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Huebert, Terry" sort="Huebert, Terry" uniqKey="Huebert T" first="Terry" last="Huebert">Terry Huebert</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Sharpe, Andrew G" sort="Sharpe, Andrew G" uniqKey="Sharpe A" first="Andrew G." last="Sharpe">Andrew G. Sharpe</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Parkin, Isobel A P" sort="Parkin, Isobel A P" uniqKey="Parkin I" first="Isobel A. P." last="Parkin">Isobel A. P. Parkin</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">24759634</idno>
<idno type="pmc">4015329</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015329</idno>
<idno type="RBID">PMC:4015329</idno>
<idno type="doi">10.1038/ncomms4706</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000002</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">The emerging biofuel crop <italic>Camelina sativa</italic>
 retains a highly undifferentiated hexaploid genome structure</title>
<author><name sortKey="Kagale, Sateesh" sort="Kagale, Sateesh" uniqKey="Kagale S" first="Sateesh" last="Kagale">Sateesh Kagale</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Koh, Chushin" sort="Koh, Chushin" uniqKey="Koh C" first="Chushin" last="Koh">Chushin Koh</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Nixon, John" sort="Nixon, John" uniqKey="Nixon J" first="John" last="Nixon">John Nixon</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bollina, Venkatesh" sort="Bollina, Venkatesh" uniqKey="Bollina V" first="Venkatesh" last="Bollina">Venkatesh Bollina</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Clarke, Wayne E" sort="Clarke, Wayne E" uniqKey="Clarke W" first="Wayne E." last="Clarke">Wayne E. Clarke</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Tuteja, Reetu" sort="Tuteja, Reetu" uniqKey="Tuteja R" first="Reetu" last="Tuteja">Reetu Tuteja</name>
<affiliation><nlm:aff id="a3"><institution>Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway</institution>
, Galway,<country>Ireland</country>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Spillane, Charles" sort="Spillane, Charles" uniqKey="Spillane C" first="Charles" last="Spillane">Charles Spillane</name>
<affiliation><nlm:aff id="a3"><institution>Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway</institution>
, Galway,<country>Ireland</country>
</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Robinson, Stephen J" sort="Robinson, Stephen J" uniqKey="Robinson S" first="Stephen J." last="Robinson">Stephen J. Robinson</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Links, Matthew G" sort="Links, Matthew G" uniqKey="Links M" first="Matthew G." last="Links">Matthew G. Links</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Clarke, Carling" sort="Clarke, Carling" uniqKey="Clarke C" first="Carling" last="Clarke">Carling Clarke</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Higgins, Erin E" sort="Higgins, Erin E" uniqKey="Higgins E" first="Erin E." last="Higgins">Erin E. Higgins</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Huebert, Terry" sort="Huebert, Terry" uniqKey="Huebert T" first="Terry" last="Huebert">Terry Huebert</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Sharpe, Andrew G" sort="Sharpe, Andrew G" uniqKey="Sharpe A" first="Andrew G." last="Sharpe">Andrew G. Sharpe</name>
<affiliation><nlm:aff id="a2"><institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Parkin, Isobel A P" sort="Parkin, Isobel A P" uniqKey="Parkin I" first="Isobel A. P." last="Parkin">Isobel A. P. Parkin</name>
<affiliation><nlm:aff id="a1"><institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Nature Communications</title>
<idno type="eISSN">2041-1723</idno>
<imprint><date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p><italic>Camelina sativa</italic>
 is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for <italic>C. sativa</italic>
 and annotate 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model <italic>Arabidopsis thaliana</italic>
. <italic>C. sativa</italic>
 represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of <italic>C. sativa</italic>
 surprisingly mirrors those of economically important amphidiploid <italic>Brassica</italic>
 crop species from lineage II as well as wheat and cotton. The three genomes of <italic>C. sativa</italic>
 show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of <italic>C. sativa</italic>
 presents significant consequences for breeding and genetic manipulation of this industrial oil crop.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Gehringer, A" uniqKey="Gehringer A">A. Gehringer</name>
</author>
<author><name sortKey="Friedt, W" uniqKey="Friedt W">W. Friedt</name>
</author>
<author><name sortKey="Luhs, W" uniqKey="Luhs W">W. Luhs</name>
</author>
<author><name sortKey="Snowdon, R J" uniqKey="Snowdon R">R. J. Snowdon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Moser, B R" uniqKey="Moser B">B. R. Moser</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Seguin Swartz, G" uniqKey="Seguin Swartz G">G. Séguin-Swartz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beilstein, M A" uniqKey="Beilstein M">M. A. Beilstein</name>
</author>
<author><name sortKey="Al Shehbaz, I A" uniqKey="Al Shehbaz I">I. A. Al-Shehbaz</name>
</author>
<author><name sortKey="Mathews, S" uniqKey="Mathews S">S. Mathews</name>
</author>
<author><name sortKey="Kellogg, E A" uniqKey="Kellogg E">E. A. Kellogg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hutcheon, C" uniqKey="Hutcheon C">C. Hutcheon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bennetzen, J L" uniqKey="Bennetzen J">J. L. Bennetzen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kumar, A" uniqKey="Kumar A">A. Kumar</name>
</author>
<author><name sortKey="Bennetzen, J L" uniqKey="Bennetzen J">J. L. Bennetzen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Parkin, I A" uniqKey="Parkin I">I. A. Parkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hu, T T" uniqKey="Hu T">T. T. Hu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Xu, X" uniqKey="Xu X">X. Xu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schmutz, J" uniqKey="Schmutz J">J. Schmutz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Paterson, A H" uniqKey="Paterson A">A. H. Paterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Brenchley, R" uniqKey="Brenchley R">R. Brenchley</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dujon, B" uniqKey="Dujon B">B. Dujon</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schranz, M E" uniqKey="Schranz M">M. E. Schranz</name>
</author>
<author><name sortKey="Lysak, M A" uniqKey="Lysak M">M. A. Lysak</name>
</author>
<author><name sortKey="Mitchell Olds, T" uniqKey="Mitchell Olds T">T. Mitchell-Olds</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cheng, F" uniqKey="Cheng F">F. Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mandakova, T" uniqKey="Mandakova T">T. Mandakova</name>
</author>
<author><name sortKey="Lysak, M A" uniqKey="Lysak M">M. A. Lysak</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Koch, M A" uniqKey="Koch M">M. A. Koch</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beilstein, M A" uniqKey="Beilstein M">M. A. Beilstein</name>
</author>
<author><name sortKey="Nagalingum, N S" uniqKey="Nagalingum N">N. S. Nagalingum</name>
</author>
<author><name sortKey="Clements, M D" uniqKey="Clements M">M. D. Clements</name>
</author>
<author><name sortKey="Manchester, S R" uniqKey="Manchester S">S. R. Manchester</name>
</author>
<author><name sortKey="Mathews, S" uniqKey="Mathews S">S. Mathews</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nagaharu, U" uniqKey="Nagaharu U">U. Nagaharu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mandakova, T" uniqKey="Mandakova T">T. Mandakova</name>
</author>
<author><name sortKey="Joly, S" uniqKey="Joly S">S. Joly</name>
</author>
<author><name sortKey="Krzywinski, M" uniqKey="Krzywinski M">M. Krzywinski</name>
</author>
<author><name sortKey="Mummenhoff, K" uniqKey="Mummenhoff K">K. Mummenhoff</name>
</author>
<author><name sortKey="Lysak, M A" uniqKey="Lysak M">M. A. Lysak</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schnable, J C" uniqKey="Schnable J">J. C. Schnable</name>
</author>
<author><name sortKey="Springer, N M" uniqKey="Springer N">N. M. Springer</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M. Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sankoff, D" uniqKey="Sankoff D">D. Sankoff</name>
</author>
<author><name sortKey="Zheng, C" uniqKey="Zheng C">C. Zheng</name>
</author>
<author><name sortKey="Zhu, Q" uniqKey="Zhu Q">Q. Zhu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Thomas, B C" uniqKey="Thomas B">B. C. Thomas</name>
</author>
<author><name sortKey="Pedersen, B" uniqKey="Pedersen B">B. Pedersen</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M. Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, H" uniqKey="Tang H">H. Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cheng, F" uniqKey="Cheng F">F. Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Langham, R J" uniqKey="Langham R">R. J. Langham</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Doyle, J J" uniqKey="Doyle J">J. J. Doyle</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gout, J F" uniqKey="Gout J">J.-F. Gout</name>
</author>
<author><name sortKey="Kahn, D" uniqKey="Kahn D">D. Kahn</name>
</author>
<author><name sortKey="Duret, L" uniqKey="Duret L">L. Duret</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li Beisson, Y" uniqKey="Li Beisson Y">Y. Li-Beisson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jackson, S" uniqKey="Jackson S">S. Jackson</name>
</author>
<author><name sortKey="Chen, Z J" uniqKey="Chen Z">Z. J. Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cusack, B P" uniqKey="Cusack B">B. P. Cusack</name>
</author>
<author><name sortKey="Wolfe, K H" uniqKey="Wolfe K">K. H. Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Blanc, G" uniqKey="Blanc G">G. Blanc</name>
</author>
<author><name sortKey="Wolfe, K H" uniqKey="Wolfe K">K. H. Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Luo, R" uniqKey="Luo R">R. Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pop, M" uniqKey="Pop M">M. Pop</name>
</author>
<author><name sortKey="Kosack, D S" uniqKey="Kosack D">D. S. Kosack</name>
</author>
<author><name sortKey="Salzberg, S L" uniqKey="Salzberg S">S. L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wu, T D" uniqKey="Wu T">T. D. Wu</name>
</author>
<author><name sortKey="Watanabe, C K" uniqKey="Watanabe C">C. K. Watanabe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Baird, N A" uniqKey="Baird N">N. A. Baird</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kurtz, S" uniqKey="Kurtz S">S. Kurtz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, S F" uniqKey="Altschul S">S. F. Altschul</name>
</author>
<author><name sortKey="Gish, W" uniqKey="Gish W">W. Gish</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W. Miller</name>
</author>
<author><name sortKey="Myers, E W" uniqKey="Myers E">E. W. Myers</name>
</author>
<author><name sortKey="Lipman, D J" uniqKey="Lipman D">D. J. Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cantarel, B L" uniqKey="Cantarel B">B. L. Cantarel</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haas, B J" uniqKey="Haas B">B. J. Haas</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salamov, A A" uniqKey="Salamov A">A. A. Salamov</name>
</author>
<author><name sortKey="Solovyev, V V" uniqKey="Solovyev V">V. V. Solovyev</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Stanke, M" uniqKey="Stanke M">M. Stanke</name>
</author>
<author><name sortKey="Tzvetkova, A" uniqKey="Tzvetkova A">A. Tzvetkova</name>
</author>
<author><name sortKey="Morgenstern, B" uniqKey="Morgenstern B">B. Morgenstern</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haas, B J" uniqKey="Haas B">B. J. Haas</name>
</author>
<author><name sortKey="Delcher, A L" uniqKey="Delcher A">A. L. Delcher</name>
</author>
<author><name sortKey="Wortman, J R" uniqKey="Wortman J">J. R. Wortman</name>
</author>
<author><name sortKey="Salzberg, S L" uniqKey="Salzberg S">S. L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Suzuki, Y" uniqKey="Suzuki Y">Y. Suzuki</name>
</author>
<author><name sortKey="Kawazu, T" uniqKey="Kawazu T">T. Kawazu</name>
</author>
<author><name sortKey="Koyama, H" uniqKey="Koyama H">H. Koyama</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Trapnell, C" uniqKey="Trapnell C">C. Trapnell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Grabherr, M G" uniqKey="Grabherr M">M. G. Grabherr</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Montgomery, D C" uniqKey="Montgomery D">D. C. Montgomery</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">Nat Commun</journal-id>
<journal-id journal-id-type="iso-abbrev">Nat Commun</journal-id>
<journal-title-group><journal-title>Nature Communications</journal-title>
</journal-title-group>
<issn pub-type="epub">2041-1723</issn>
<publisher><publisher-name>Nature Pub. Group</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">24759634</article-id>
<article-id pub-id-type="pmc">4015329</article-id>
<article-id pub-id-type="pii">ncomms4706</article-id>
<article-id pub-id-type="doi">10.1038/ncomms4706</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>The emerging biofuel crop <italic>Camelina sativa</italic>
 retains a highly undifferentiated hexaploid genome structure</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Kagale</surname>
<given-names>Sateesh</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Koh</surname>
<given-names>Chushin</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Nixon</surname>
<given-names>John</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Bollina</surname>
<given-names>Venkatesh</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Clarke</surname>
<given-names>Wayne E.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Tuteja</surname>
<given-names>Reetu</given-names>
</name>
<xref ref-type="aff" rid="a3">3</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Spillane</surname>
<given-names>Charles</given-names>
</name>
<xref ref-type="aff" rid="a3">3</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Robinson</surname>
<given-names>Stephen J.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Links</surname>
<given-names>Matthew G.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Clarke</surname>
<given-names>Carling</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Higgins</surname>
<given-names>Erin E.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Huebert</surname>
<given-names>Terry</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Sharpe</surname>
<given-names>Andrew G.</given-names>
</name>
<xref ref-type="corresp" rid="c1">a</xref>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Parkin</surname>
<given-names>Isobel A. P.</given-names>
</name>
<xref ref-type="corresp" rid="c2">b</xref>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<aff id="a1"><label>1</label>
<institution>Saskatoon Research Centre, Agriculture and Agri-Food Canada</institution>
, 107 Science Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0X2</aff>
<aff id="a2"><label>2</label>
<institution>National Research Council Canada</institution>
, 110 Gymnasium Place, Saskatoon, Saskatchewan,<country>Canada</country>
 S7N 0W9</aff>
<aff id="a3"><label>3</label>
<institution>Plant and AgriBiosciences Centre (PABC), School of Natural Sciences, National University of Ireland Galway</institution>
, Galway,<country>Ireland</country>
</aff>
</contrib-group>
<author-notes><corresp id="c1"><label>a</label>
<email>andrew.sharpe@nrc-cnrc.gc.ca</email>
</corresp>
<corresp id="c2"><label>b</label>
<email>isobel.parkin@agr.gc.ca</email>
</corresp>
</author-notes>
<pub-date pub-type="epub"><day>23</day>
<month>04</month>
<year>2014</year>
</pub-date>
<volume>5</volume>
<elocation-id>3706</elocation-id>
<history><date date-type="received"><day>06</day>
<month>01</month>
<year>2014</year>
</date>
<date date-type="accepted"><day>21</day>
<month>03</month>
<year>2014</year>
</date>
</history>
<permissions><copyright-statement>Copyright © 2014, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.</copyright-statement>
<copyright-year>2014</copyright-year>
<copyright-holder>Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><pmc-comment>author-paid</pmc-comment>
          <license-p>This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/</license-p>
</license>
</permissions>
<abstract><p><italic>Camelina sativa</italic>
 is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for <italic>C. sativa</italic>
 and annotate 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model <italic>Arabidopsis thaliana</italic>
. <italic>C. sativa</italic>
 represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of <italic>C. sativa</italic>
 surprisingly mirrors those of economically important amphidiploid <italic>Brassica</italic>
 crop species from lineage II as well as wheat and cotton. The three genomes of <italic>C. sativa</italic>
 show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of <italic>C. sativa</italic>
 presents significant consequences for breeding and genetic manipulation of this industrial oil crop.</p>
</abstract>
<abstract abstract-type="web-summary"><p><inline-graphic id="i1" xlink:href="ncomms4706-i1.jpg"></inline-graphic>
<italic>Camelina sativa</italic>
 is an oilseed crop with important industrial applications. Here, the authors sequence the <italic>C. sativa</italic>
 genome to investigate the genome organization and evolution of this species, and to provide a valuable tool for genetic engineering and potential crop improvement.</p>
</abstract>
</article-meta>
</front>
<body><p><italic>Camelina sativa</italic>
 (false flax or gold of pleasure) is a relict oilseed crop of the Crucifer family (Brassicaceae) with centres of origin in southeastern Europe and southwestern Asia. <italic>C. sativa</italic>
 was cultivated in Europe as an important oilseed crop for many centuries before being displaced by higher-yielding crops such as canola (<italic>Brassica napus</italic>
) and wheat. <italic>C. sativa</italic>
 has several agronomic advantages for production, including early maturity, low requirement for water and nutrients, adaptability to adverse environmental conditions and resistance to common cruciferous pests and pathogens<xref ref-type="bibr" rid="b1">1</xref>
<xref ref-type="bibr" rid="b2">2</xref>
<xref ref-type="bibr" rid="b3">3</xref>
. With seed oil content (36–47%)<xref ref-type="bibr" rid="b2">2</xref>
 twice that of soybean (18–22%)<xref ref-type="bibr" rid="b2">2</xref>
 and a fatty acid profile (with >90% unsaturated fatty acids) suitable for making jet fuel, biodiesel and high-value industrial lubricants, <italic>C. sativa</italic>
 has tremendous potential to serve as a viable and renewable feedstock for multiple industries. Additionally, due to exceptionally high levels of α-linolenic acid (32–40% of total oil content)<xref ref-type="bibr" rid="b2">2</xref>
, <italic>C. sativa</italic>
 oil offers an additional source of essential fatty acids. The residual essential fatty acids combined with low glucosinolate levels in <italic>C. sativa</italic>
 meal make it desirable as an animal feed. Considering the broad applications, <italic>C. sativa</italic>
 is currently being re-embraced as an industrial oil platform crop; however, due to limited availability of genetic and genomic resources, the full agronomic and breeding potential of this emerging oilseed crop remains largely unexploited.</p>
<p>Genetically, <italic>C. sativa</italic>
 is closely related to the model plant <italic>Arabidopsis thaliana</italic>
 (lineage I of the Brassicaceae) and more distantly to the important vegetable oilseed crop, canola (lineage II)<xref ref-type="bibr" rid="b4">4</xref>
. The previously estimated genome size (750 Mb)<xref ref-type="bibr" rid="b5">5</xref>
 and chromosome count (<italic>n</italic>
=20) of <italic>C. sativa</italic>
 are higher than those of most Brassicaceae species sequenced to date (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 1</xref>
). However, the genetic basis for the genome expansion of <italic>C. sativa</italic>
 is currently unknown. In many plant taxa, polyploidization and proliferation of transposable elements (TEs) are recognized as prevalent factors in plant genome expansion<xref ref-type="bibr" rid="b6">6</xref>
<xref ref-type="bibr" rid="b7">7</xref>
. Similar to the diploid progenitors of canola (<italic>B. rapa</italic>
 and <italic>B. oleracea</italic>
)<xref ref-type="bibr" rid="b8">8</xref>
<xref ref-type="bibr" rid="b9">9</xref>
, <italic>C. sativa</italic>
 is suggested to have undergone a genome triplication event<xref ref-type="bibr" rid="b5">5</xref>
. However, the evolutionary origin and mode of the polyploidization event that formed the <italic>C. sativa</italic>
 genome as well as the post-polyploidization evolutionary path leading to its diploidization are currently not understood.</p>
<p>Here we sequence a homozygous doubled haploid line of <italic>C. sativa</italic>
 and assemble 82% of the estimated genome size to decipher the genome organization of <italic>C. sativa</italic>
 and facilitate development of genetic and genomic tools essential for crop improvement. The genome sequence of <italic>C. sativa</italic>
 will provide an indispensable tool for genetic manipulation and further crop improvement.</p>
<sec disp-level="1" sec-type="results"><title>Results</title>
<sec disp-level="2"><title>Genome sequencing and assembly</title>
<p>The genome of a homozygous doubled haploid line of <italic>C. sativa</italic>
 (DH55) was sequenced using a hybrid Illumina and Roche 454 next-generation sequencing (NGS) approach (<xref ref-type="supplementary-material" rid="S1">Supplementary Note 1</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 2</xref>
). Filtered sequence data (96.53 Gb) provided 123× coverage (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 1</xref>
) of the estimated genome size of 785 Mb (<xref ref-type="supplementary-material" rid="S1">Supplementary Note 2</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 3</xref>
), which was assembled using a hierarchical assembly strategy (<xref ref-type="supplementary-material" rid="S1">Supplementary Note 1</xref>
) into 37,871 scaffolds, with a sequence span of 641.45 Mb and an N50 size of 2.16 Mb (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 2</xref>
). A high-density genetic map based on 3,575 polymorphic markers allowed 608.54 Mb of the assembled genome, represented by 588 scaffolds, to be anchored to the 20 chromosomes of <italic>C. sativa</italic>
 (<xref ref-type="fig" rid="f1">Fig. 1</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 3</xref>
), thereby producing a highly contiguous final assembly with an N50 size of >30 Mb (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 2</xref>
). The final genome assembly contains 641.45 Mb of sequence, covering 82% of the estimated genome size, 95% of which is in 20 chromosomes. A summary describing the overall features as well as completeness and contiguity of the genome assembly is provided in <xref ref-type="supplementary-material" rid="S1">Supplementary Table 4</xref>
. Comparison of the genome sequence with a set of independently assembled BAC scaffolds, expressed sequence tags (ESTs) and core eukaryotic genes (<xref ref-type="supplementary-material" rid="S1">Supplementary Note 3</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Tables 5–8</xref>
) confirmed the quality as well as near complete coverage of the euchromatic space and gene complement in the assembly.</p>
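<p>The summary figures in this paragraph can be re-derived from the quoted totals. The following is a minimal sketch, not the authors' pipeline: it assumes 1 Gb = 1,000 Mb, and the function names and the toy scaffold lengths passed to <monospace>n50</monospace> are illustrative only.</p>
<preformat preformat-type="code">
```python
"""Re-derive the assembly summary figures quoted in the text (illustrative sketch)."""

# Totals quoted in the text
FILTERED_GB = 96.53          # filtered sequence data
ESTIMATED_GENOME_MB = 785.0  # estimated genome size
ASSEMBLED_MB = 641.45        # sequence span of the final assembly
ANCHORED_MB = 608.54         # sequence anchored to the 20 chromosomes


def fold_coverage(sequence_gb, genome_mb):
    """Sequenced bases divided by genome size (assumes 1 Gb = 1,000 Mb)."""
    return sequence_gb * 1000.0 / genome_mb


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def n50(lengths):
    """Largest length L such that scaffolds at least L long hold half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    print(f"coverage: {fold_coverage(FILTERED_GB, ESTIMATED_GENOME_MB):.0f}x")
    print(f"assembled share of genome: {percent(ASSEMBLED_MB, ESTIMATED_GENOME_MB):.0f}%")
    print(f"anchored share of assembly: {percent(ANCHORED_MB, ASSEMBLED_MB):.0f}%")
    # Toy scaffold lengths (hypothetical, not from the assembly)
    print(f"toy N50: {n50([10, 8, 5, 3, 2])}")
```
</preformat>
<p>With the quoted totals, the sketch gives 123-fold coverage, 82% of the estimated genome assembled and 95% of the assembly anchored, in agreement with the values reported above.</p>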
</sec>
<sec disp-level="2"><title>Repeat annotation and gene prediction</title>
<p>Repeat annotation revealed that 28% (180.12 Mb) of the assembled <italic>C. sativa</italic>
 genome comprises TEs (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 9</xref>
). Retrotransposons were found to be the dominant class of repeat elements (19%), while DNA transposons accounted for 3% of the genome. Similar to most higher plant genomes, repetitive elements in <italic>C. sativa</italic>
 were more abundant in the vicinity of centromeres and less so in gene-dense regions (<xref ref-type="fig" rid="f1">Fig. 1</xref>
). The genome occupancy of repetitive DNA in <italic>C. sativa</italic>
 (28%) is comparable to the low abundance of TEs in <italic>A. thaliana</italic>
 (24%)<xref ref-type="bibr" rid="b10">10</xref>
 and <italic>A. lyrata</italic>
 (30%)<xref ref-type="bibr" rid="b10">10</xref>
. However, it is much smaller than in <italic>B. rapa</italic>
 (39.5%)<xref ref-type="bibr" rid="b7">7</xref>
 as well as other similar-sized plant genomes, including potato (62%)<xref ref-type="bibr" rid="b11">11</xref>
, soybean (59%)<xref ref-type="bibr" rid="b12">12</xref>
 and sorghum (62%)<xref ref-type="bibr" rid="b13">13</xref>
. Thus, unlike <italic>B. rapa</italic>
 and other angiosperm species, genome expansion in <italic>C. sativa</italic>
 has not resulted from repetitive sequence proliferation.</p>
<p>RNA-seq data (78.5 Gb) were generated from tissue samples collected at 12 different growth stages to assist with annotation of protein-coding genes (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 10</xref>
). Based on a comprehensive strategy of <italic>ab initio</italic>
 gene prediction and homology evidence from proteome data sets, ESTs and RNA-seq transcripts (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 4</xref>
), 89,418 non-redundant <italic>C. sativa</italic>
 genes were predicted, of which 4,753 (5.3%) genes encoded two or more alternatively spliced isoforms (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 4</xref>
). More than 95% (85,274) of these annotated genes were located on the pseudochromosomes, with the remainder on unanchored scaffolds. The overall gene model characteristics, such as gene length and exon–intron structure, are comparable to those of other Brassicaceae species (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 5</xref>
). The genome composition (genic, intergenic and repeat regions) of <italic>C. sativa</italic>
 is more similar to that of <italic>A. lyrata</italic>
<xref ref-type="bibr" rid="b10">10</xref>
 than <italic>A. thaliana</italic>
 (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 11</xref>
). Based on sequence identity, a total of 86,849 (97.13%) of the predicted <italic>C. sativa</italic>
 genes have homologues in the UniProt database (<xref ref-type="supplementary-material" rid="S1">Supplementary Data 1</xref>
), and RNA-seq evidence suggested that >90% of the genes were expressed (FPKM>0) in one or more developmental stages (<xref ref-type="fig" rid="f1">Fig. 1</xref>
). The genome sequence and its annotation are available along with a genome browser at <ext-link ext-link-type="uri" xlink:href="http://www.camelinadb.ca">http://www.camelinadb.ca</ext-link>
.</p>
<p>The predicted number of protein-coding genes in <italic>C. sativa</italic>
 is significantly higher than that of other currently sequenced plant genomes (<xref ref-type="fig" rid="f2">Fig. 2</xref>
, <xref ref-type="supplementary-material" rid="S1">Supplementary Table 12</xref>
). Interestingly, the gene number is similar to that predicted for bread wheat whose genome is almost 22 times larger than that of <italic>C. sativa</italic>
<xref ref-type="bibr" rid="b14">14</xref>
. The estimated total number of genes in <italic>C. sativa</italic>
 is approximately three times that of the model <italic>Arabidopsis</italic>
 species (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 11</xref>
), suggesting that the <italic>C. sativa</italic>
 genome resulted from a whole-genome triplication of a common ancestor. To determine whether the expanded gene repertoire in <italic>C. sativa</italic>
 could have arisen from an expansion of lineage-specific <italic>C. sativa</italic>
 orphan genes<xref ref-type="bibr" rid="b15">15</xref>
, we used a BLAST-based filtering approach comparing <italic>C. sativa</italic>
 genes with all sequenced plant taxa excluding the five Brassicaceae species. A total of 3,761 Brassicaceae-specific orphan genes were identified, of which 1,656 were <italic>C. sativa</italic>
–specific, which accounts for only 1.85% of annotated protein-coding genes in <italic>C. sativa</italic>
 (<xref ref-type="supplementary-material" rid="S1">Supplementary Note 4</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Tables 13 and 14</xref>
).</p>
</sec>
<sec disp-level="2"><title>Synteny and collinearity with Brassicaceae species</title>
<p>The genome sequence and gene annotations of <italic>C. sativa</italic>
 were compared with those of phylogenetically closely related Brassicaceae species (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 6</xref>
), including <italic>A. thaliana</italic>
 (model Crucifer species), <italic>A. lyrata</italic>
 (reference for ancestral karyotype) and <italic>B. rapa</italic>
 (reference for polyploidization). Chromosomal collinearity assessed through whole-genome alignments revealed a striking level of conservation between the genome sequence of <italic>C. sativa</italic>
 and the two <italic>Arabidopsis</italic>
 species (<xref ref-type="fig" rid="f3">Fig. 3a</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 7</xref>
). The longest stretches of conserved syntenic blocks were observed between the <italic>C. sativa</italic>
 and <italic>A. lyrata</italic>
 genomes (<xref ref-type="fig" rid="f3">Fig. 3a</xref>
), with syntenic regions spanning almost complete chromosomes from both of these species. Notably, every chromosome or chromosomal region in <italic>A. lyrata</italic>
 or <italic>A. thaliana</italic>
 was represented in three independent chromosomes in the <italic>C. sativa</italic>
 genome, thus providing robust evidence for a whole-genome triplication event.</p>
</sec>
<sec disp-level="2"><title>Reconstructing the three sub-genomes of hexaploid <italic>C. sativa</italic>
</title>
<p>The triplicated chromosomal segments in <italic>C. sativa</italic>
 were identified and assigned to a sub-genome using the protein-coding genes from <italic>A. thaliana</italic>
 as discrete genomic anchors to determine the corresponding syntenic orthologues (syntelogs) from <italic>C. sativa</italic>
 and the extent of the collinear conserved block environment. A syntelog matrix representing individual <italic>A. thaliana</italic>
 genes and the corresponding triplets of <italic>C. sativa</italic>
 homologues is presented in <xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
. A total of 62,277 <italic>C. sativa</italic>
 genes were found to be syntenically orthologous to <italic>A. thaliana</italic>
 genes; these genes will be referred to as ‘syntelogs’, and the remaining 27,141 genes are divided into tandem duplicates (10,792 genes) and ‘non-syntenic genes’ (16,349 genes). Syntelogs were further classified as either ‘fully retained’ (if all three homologues were retained) or ‘fractionated’ (if one or two of the homologues were lost).</p>
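<p>The classification rule described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name, the use of <monospace>None</monospace> for a lost copy and the placeholder gene identifiers are assumptions. The constants restate the counts quoted in the text so that the partition of the 89,418 annotated genes can be checked.</p>
<preformat preformat-type="code">
```python
from typing import Optional, Sequence

# Gene counts quoted in the text
TOTAL_GENES = 89_418
SYNTELOGS = 62_277
TANDEM_DUPLICATES = 10_792
NON_SYNTENIC = 16_349


def classify_triplet(copies: Sequence[Optional[str]]) -> str:
    """Classify one A. thaliana anchor gene by its C. sativa copies.

    copies holds one slot per sub-genome (Cs-G1, Cs-G2, Cs-G3);
    None marks a lost copy (an assumption of this sketch).
    """
    if len(copies) != 3:
        raise ValueError("expected one slot per sub-genome")
    retained = sum(1 for gene in copies if gene is not None)
    if retained == 3:
        return "fully retained"
    if retained == 0:
        return "no syntelog"  # the anchor has no syntenic copy at all
    return "fractionated"      # one or two copies lost


if __name__ == "__main__":
    # Placeholder identifiers, not real C. sativa gene names
    print(classify_triplet(["g1_demo", "g2_demo", "g3_demo"]))
    print(classify_triplet(["g1_demo", None, "g3_demo"]))
    # Syntelogs, tandem duplicates and non-syntenic genes partition the annotation
    print(SYNTELOGS + TANDEM_DUPLICATES + NON_SYNTENIC == TOTAL_GENES)
```
</preformat>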
<p>Comparative mapping and cytogenetic studies have suggested that most of the Brassicaceae species have evolved from an ancestral karyotype comprising 8 chromosomes and 24 conserved genomic blocks (labelled as A–X)<xref ref-type="bibr" rid="b16">16</xref>
. To decipher the nature of the ancestral karyotype of <italic>C. sativa</italic>
, its genomic block (GB) structure was elucidated based on previously defined GB intervals in <italic>A. thaliana</italic>
<xref ref-type="bibr" rid="b17">17</xref>
. As expected, for each <italic>A. thaliana</italic>
 GB three syntenic copies were detected in <italic>C. sativa</italic>
 (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 8</xref>
). Of the 24 GBs, 20 were found to be maintained in three nearly undisrupted copies, whereas the remaining four GBs (D, E, I and J) exhibited rearrangements (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 8</xref>
).</p>
<p>Utilizing the extensive synteny and collinearity between <italic>C. sativa</italic>
 and <italic>Arabidopsis</italic>
 species, GB contiguity in the ancestral karyotype, and the assumption that syntenic fragments of each <italic>C. sativa</italic>
 chromosome derive from the same ancestral chromosome, the most parsimonious path to sub-genome structure was deduced (discussed in Methods) and the triplicated sub-genomes within <italic>C. sativa</italic>
 were reconstructed (<xref ref-type="fig" rid="f3">Fig. 3b</xref>
). Accordingly, sub-genome I of <italic>C. sativa</italic>
 (Cs-G1) contains six chromosomes, while the other two sub-genomes (Cs-G2 and Cs-G3) contain seven chromosomes each (<xref ref-type="fig" rid="f3">Fig. 3b</xref>
). The three sub-genomes encode 28,274, 27,218 and 29,207 genes, respectively (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 15</xref>
), which are highly comparable to the <italic>A. thaliana</italic>
 gene complement.</p>
</sec>
<sec disp-level="2"><title>Deciphering the ancestral karyotype of <italic>C. sativa</italic>
</title>
<p>The organization of the GBs in <italic>C. sativa</italic>
 was compared with previously inferred ancestral karyotypes, including the ancestral crucifer karyotype (ACK; <italic>n</italic>
=8)<xref ref-type="bibr" rid="b16">16</xref>
, proto-calepineae karyotype (PCK; <italic>n</italic>
=7)<xref ref-type="bibr" rid="b18">18</xref>
 and translocated PCK (tPCK<xref ref-type="bibr" rid="b17">17</xref>
; <italic>n</italic>
=7; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 16</xref>
). All 16 GB associations that were defined in ACK were conserved in <italic>C. sativa</italic>
 but only 11 of 17 GB associations defined for PCK and tPCK were identified (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 16</xref>
), suggesting that <italic>C. sativa</italic>
 genome organization is more similar to the ACK (<xref ref-type="fig" rid="f3">Fig. 3c</xref>
). However, a number of additional unique GB associations observed in <italic>C. sativa</italic>
, such as D/I, E/I, N/J, Q/V, O/W and O/R have not been reported in ancestral Brassicaceae karyotypes defined so far. Only two of these novel GB associations D/I and E/I were common to all three sub-genomes within <italic>C. sativa</italic>
 (<xref ref-type="fig" rid="f3">Fig. 3b</xref>
). Additionally, Csa16, Csa7 and Csa5/9 from Cs-G1, Cs-G2 and Cs-G3, respectively, carrying these two novel GB associations display further rearrangements resulting in a common organization of GBs as J/I/D/E/I/E/I/D (<xref ref-type="fig" rid="f3">Fig. 3b</xref>
), suggesting the pre-existence of this novel GB structure in the parental karyotype from which the three sub-genomes of <italic>C. sativa</italic>
 evolved.</p>
<p>Based on the above observations, the putative diploid karyotype of <italic>C. sativa</italic>
, named dACK (derivative of ACK), comprising seven chromosomes (<xref ref-type="fig" rid="f3">Fig. 3d</xref>
) was inferred. The dACK karyotype (<italic>n</italic>
=7) comprises six ancestral chromosomes (AK1, 3, 5, 6, 7 and 8) and a chromosomal fusion (AK2/4) (<xref ref-type="fig" rid="f3">Fig. 3e</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 9</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Data 2</xref>
). Previous karyotype analyses of <italic>C. sativa</italic>
 and related <italic>Camelina</italic>
 species have detailed a range of chromosome numbers including <italic>n</italic>
=6, 7, 13 and 20 (ref. <xref ref-type="bibr" rid="b19">19</xref>
). The lower chromosome numbers are consistent with the dACK karyotype and the identified sub-genomes. The higher chromosome counts could suggest that two independent hybridization events resulted in the current hexaploid genome. However, without reference to extant diploid relatives of each sub-genome it is difficult to accurately determine the origin of <italic>C. sativa</italic>
. EST contigs derived from 454 pyrosequencing of the leaf transcriptome of five representatives of all known lower chromosome number <italic>Camelina</italic>
 species were used to derive the phylogenetic relationship between these species and the three sub-genomes (<xref ref-type="fig" rid="f4">Fig. 4</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Note 5</xref>
). The genomes of Cs-G1 and Cs-G2 are more closely related to each other than to any of the diploids assayed, which could suggest an initial tetraploidisation event of two closely related species (that is, possibly an amphidiploid), followed by an additional hybridization event through which Cs-G3 joined, resulting in a hexaploid genome.</p>
<p>Analysis of the distribution of synonymous substitutions (<italic>K</italic>
s) among the coding regions of paralogous gene pairs provided an estimate of the age of divergence of the three sub-genomes in <italic>C. sativa</italic>
. Mixture model analysis of the <italic>K</italic>
s distribution uncovered the previously documented Brassicaceae-related α, β and γ paleopolyploidy events, and revealed the presence of an additional peak at <italic>K</italic>
s=~0.09 (<xref ref-type="fig" rid="f5">Fig. 5</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 10</xref>
). Assuming an established synonymous substitution rate of 8.22 × 10<sup>−9</sup>
 substitutions/synonymous site/year for Brassicaceae species<xref ref-type="bibr" rid="b20">20</xref>
, the three genomes of <italic>C. sativa</italic>
 were estimated to have separated ~5.41 million years ago (Mya), which is comparable to the divergence time of the three mesopolyploid (functionally diploid) <italic>Brassica</italic>
 genomes (<italic>B. oleracea</italic>
, <italic>B. rapa</italic>
 and <italic>B. nigra</italic>
) that fused in all pairwise combinations to form the allopolyploid crop species <italic>B. napus</italic>
 (canola), <italic>B. juncea</italic>
 (oriental mustard) and <italic>B. carinata</italic>
 (Ethiopian mustard)<xref ref-type="bibr" rid="b21">21</xref>
. The mesopolyploid structure of the <italic>Brassica</italic>
 diploid genomes exhibits extensive reduction of chromosome number and rearrangement of ancestral chromosomal blocks. This structure is also mirrored in Australian Brassicaceae species, including <italic>Stenopetalum</italic>
 and <italic>Ballantinia</italic>
 species that diverged ~5.9 Mya (ref. <xref ref-type="bibr" rid="b22">22</xref>
). The relatively unrearranged nature of the <italic>C. sativa</italic>
 sub-genomes with respect to <italic>A. thaliana</italic>
 and <italic>A. lyrata</italic>
 stands in contrast to these observations but could reflect a more highly conserved nature of these species within the Camelineae tribe. The three sub-genomes within <italic>C. sativa</italic>
, although showing some differentiation at the nucleotide-level (2–2.5% sequence variation across the coding regions), share a similar gene complement. The hybridization of sub-genomes in <italic>C. sativa</italic>
 probably occurred relatively recently, similar to the <italic>Brassica</italic>
 crop allopolyploids, resulting in insufficient time for the differentiation of gene complement within the three sub-genomes. Comparisons of pairs of syntelogs found between the three sub-genomes and <italic>A. thaliana</italic>
 revealed almost identical <italic>K</italic>
s distributions with a major peak at 0.28 (<xref ref-type="fig" rid="f5">Fig. 5</xref>
), suggesting that the diploid parents of the triplicated <italic>C. sativa</italic>
 sub-genomes shared a common ancestor that diverged from <italic>A. thaliana</italic>
 ~17 Mya.</p>
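<p>The divergence times above follow from the standard relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. A minimal sketch under that assumption is given below; the function name is illustrative. With the rounded peak value Ks = 0.09 the relation gives about 5.47 Mya, so the reported ~5.41 Mya presumably reflects an unrounded peak position slightly below 0.09; that reading is an inference and is not stated in the text.</p>
<preformat preformat-type="code">
```python
# Synonymous substitution rate quoted in the text
# (substitutions per synonymous site per year)
RATE = 8.22e-9


def divergence_mya(ks, rate=RATE):
    """Divergence time in million years, T = Ks / (2 * rate).

    The factor of two reflects substitutions accumulating
    independently along both diverging lineages.
    """
    return ks / (2.0 * rate) / 1e6


if __name__ == "__main__":
    # Rounded peak positions quoted in the text
    print(f"sub-genome split (Ks = 0.09): {divergence_mya(0.09):.2f} Mya")
    print(f"split from A. thaliana (Ks = 0.28): {divergence_mya(0.28):.1f} Mya")
```
</preformat>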
</sec>
<sec disp-level="2"><title>Homeologous gene expression bias and genome dominance</title>
<p>After polyploidization, duplicated genomes enter an evolutionary trajectory of genetic diploidization during which the nearly identical sub-genomes differentiate via biased loss of homologous genes (fractionation), which is commonly associated with overexpression of genes from the least fractionated sub-genome (genome dominance)<xref ref-type="bibr" rid="b7">7</xref>
<xref ref-type="bibr" rid="b23">23</xref>
<xref ref-type="bibr" rid="b24">24</xref>
<xref ref-type="bibr" rid="b25">25</xref>
<xref ref-type="bibr" rid="b26">26</xref>
<xref ref-type="bibr" rid="b27">27</xref>
<xref ref-type="bibr" rid="b28">28</xref>
. Preceding genome fractionation is a period of ‘genomic shock’ when a mixture of genetic and epigenetic mechanisms is proposed to lead to neo- or subfunctionalization of duplicated genes.</p>
<p>Enumeration of syntelogs revealed that the three sub-genomes (Cs-G1, Cs-G2 and Cs-G3) of <italic>C. sativa</italic>
 have retained an almost identical number of genes (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 17</xref>
), which compares starkly with the differentially fractionated LF, MF1 and MF2 sub-genomes of <italic>B. rapa</italic>
 that have retained only 13,296, 8,891 and 7,659 genes, respectively (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 11</xref>
). However, it is noteworthy that the rate of gene loss in all three sub-genomes of <italic>C. sativa</italic>
 (5% of genes per million years (Myr)) is identical to the rate of gene loss in the MF1 sub-genome of <italic>B. rapa</italic>
 (5% of genes per Myr; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 18</xref>
), indicating that the sub-genomes of <italic>C. sativa</italic>
 despite lacking a fractionation bias are experiencing the expected exponential decay pattern of gene loss immediately following whole-genome duplication<xref ref-type="bibr" rid="b29">29</xref>
. Deletion of exonic sequences is one of the mechanisms by which genes are potentially rendered non-functional and subsequently removed from polyploid genomes<xref ref-type="bibr" rid="b26">26</xref>
. Comparing the coding sequences of homeologous genes within triplicated regions demonstrated that both the number of exons and length of coding sequences in syntelogs were highly conserved (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 19</xref>
), indicating that limited insertions or deletions have accumulated in coding sequences of <italic>C. sativa</italic>
 within the last 5.5 Myr.</p>
<p>Gene expression levels are a major determinant of the fate of duplicated copies following whole-genome duplication. It has been shown that the rate of gene loss post polyploidisation may be negatively correlated to the level of gene expression<xref ref-type="bibr" rid="b30">30</xref>
. Comparison of the expression levels of fully retained and fractionated sets of <italic>C. sativa</italic>
 genes revealed a positive correlation between retention rate of triplicated homeologues and their average expression levels (<xref ref-type="fig" rid="f6">Fig. 6a</xref>
), suggesting that highly expressed genes tend to persist longer. Evidence of genome dominance in <italic>C. sativa</italic>
 was assessed by comparing expression differences between the sub-genomes of <italic>C. sativa</italic>
 based on transcript abundance of genes across 12 different tissue types (listed in <xref ref-type="supplementary-material" rid="S1">Supplementary Table 10</xref>
). Only the fully retained homeologues were included in this analysis to avoid discrepancy due to fractionation. At sub-genome (G) and tissue-type (T) interaction (G × T) level a statistically significant difference was revealed (<italic>P</italic>
&lt;0.05; ANOVA test for interaction) for 77% (14,391 triplets) of fully retained homeologues. The genes of sub-genome Cs-G3 showed a clear expression level advantage over the other two sub-genomes (<xref ref-type="fig" rid="f6">Fig. 6b</xref>
), which could result from a two-stage polyploidisation pathway.</p>
<p>Only 4,106 of the 14,391 triplets revealed an interaction effect of considerable magnitude (STDEV (G × T) >0.25; <xref ref-type="fig" rid="f6">Fig. 6c</xref>
; <xref ref-type="table" rid="t1">Table 1</xref>
); this set was designated as the ‘interaction group’ (<xref ref-type="fig" rid="f6">Fig. 6c</xref>
). Since the differential expression profiles may result from divergent fates of the triplicated genes, further analyses determined the levels of non-functionalization (silencing), neofunctionalization (diversification) or subfunctionalization (shared and/or partitioned functions). Across all 12 tissue types, ~5% of the fully retained homeologues were found to be silenced (FPKM=0 in one or two sub-genomes) (<xref ref-type="table" rid="t1">Table 1</xref>
; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 20</xref>
) with the non-expressed homeologues being equally distributed across all three sub-genomes. Hierarchical clustering of all 12,212 genes belonging to the interaction group based on patterns of gene expression across all 12 tissue types revealed seven major clusters (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 12</xref>
). Genes belonging to each cluster were predominantly expressed in one or a few tissue types, suggesting potential tissue-specific functions. However, genes from 34% (1,316) of the homeologous triplets clustered together; with the remaining 66% being separated across two (699 triplets) or three (1,904 triplets) independent clusters. As homeologues with altered expression patterns potentially may acquire new or additional functions, the results provide evidence for functional diversification of a subset of triplicated genes within <italic>C. sativa</italic>
. The majority of the triplicated genes (78%) showed no significant differences in either expression levels or tissue specificity, which could impact manipulation of the crop phenotype.</p>
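<p>The silencing criterion and the clustering proportions above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: a homeologue is treated as silenced when its FPKM is zero in every tissue, the dictionary layout and function names are hypothetical, and the example FPKM values are invented for demonstration. The three constants restate the triplet counts quoted in the text.</p>
<preformat preformat-type="code">
```python
# Triplet counts quoted in the text for the clustering result
CLUSTERED_TOGETHER = 1_316  # all three homeologues in one cluster
SPLIT_TWO = 699             # homeologues spread over two clusters
SPLIT_THREE = 1_904         # homeologues spread over three clusters


def silent_copies(fpkm):
    """Return sub-genome names whose copy has FPKM equal to 0 in every tissue."""
    return [name for name, values in fpkm.items() if all(v == 0 for v in values)]


def is_silenced_triplet(fpkm):
    """True when one or two of the three homeologues are silent."""
    return len(silent_copies(fpkm)) in (1, 2)


def percent_clustered_together():
    """Share of clustered triplets whose three copies fall in one cluster."""
    total = CLUSTERED_TOGETHER + SPLIT_TWO + SPLIT_THREE
    return 100.0 * CLUSTERED_TOGETHER / total


if __name__ == "__main__":
    # Invented FPKM values across three tissues, for demonstration only
    demo = {"Cs-G1": [4.2, 0.0, 1.1], "Cs-G2": [0, 0, 0], "Cs-G3": [3.0, 2.5, 0.7]}
    print(silent_copies(demo))
    print(is_silenced_triplet(demo))
    print(f"{percent_clustered_together():.0f}% of triplets cluster together")
```
</preformat>
<p>The three quoted counts sum to 3,919 triplets, of which the 1,316 that cluster together make up about 34%, matching the percentage given above.</p>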
</sec>
<sec disp-level="2"><title>Oil metabolism genes</title>
<p>In the context of the importance of <italic>C. sativa</italic>
 as a biofuel crop, we examined the fractionation and expression divergence of genes encoding proteins and regulatory factors involved in acyl-lipid metabolism<xref ref-type="bibr" rid="b31">31</xref>
. More than 80% of the 736 non-redundant genes<xref ref-type="bibr" rid="b31">31</xref>
 governing various steps in acyl-lipid biosynthesis, accumulation and degradation were found to be retained in three copies (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 21</xref>
). A subset of the acyl-lipid metabolism genes (26%), mainly involved in fatty acid and triacylglycerol biosynthesis, elongation or degradation, experienced further expansion with a few genes, such as CER1 and WS (both involved in fatty acid elongation and wax biosynthesis) and LOX (involved in oxylipin metabolism), having >10 paralogues (<xref ref-type="supplementary-material" rid="S1">Supplementary Data 3</xref>
). The overall expansion of lipid metabolism gene families in <italic>C. sativa</italic>
 (217% compared with <italic>A. thaliana</italic>
) is significantly larger than in soybean (63% increase compared with <italic>A. thaliana</italic>
)<xref ref-type="bibr" rid="b12">12</xref>
. Analysis of expression divergence revealed significant differences among only 31% (181 triplets) of the fully retained lipid metabolism genes (<xref ref-type="fig" rid="f6">Fig. 6c</xref>
; <xref ref-type="table" rid="t1">Table 1</xref>
). Hierarchical clustering of genes belonging to this set revealed that only 15% of acyl-lipid metabolism genes were experiencing functional diversification (<xref ref-type="table" rid="t1">Table 1</xref>
). The higher retention of lipid metabolism genes in <italic>C. sativa</italic>
 does not necessarily reflect adaptation of an oilseed phenotype and could merely be a consequence of polyploidy; however, the larger number of genes involved in lipid metabolism in <italic>C. sativa</italic>
, with the majority showing no expression or functional divergence, suggests that complex regulatory mechanisms govern oil biosynthesis to ensure gene dosage balance. The knowledge of copy number, genomic context and regulation of oil metabolism genes will aid in the future manipulation of biofuel traits in <italic>C. sativa.</italic>
</p>
</sec>
</sec>
<sec disp-level="1" sec-type="discussion"><title>Discussion</title>
<p><italic>C. sativa</italic>
 is an excellent model that exemplifies the selection of a crop ecotype from a weedy ancestor. Without knowledge of the parental diploids it is difficult to predict the exact origin of <italic>C. sativa</italic>
; however, the strict maintenance of homologous recombination between highly syntenic sub-genomes suggests that, like many successful crop species, <italic>C. sativa</italic>
 may have been formed through inter-specific hybridization of lower chromosome number ancestors. The emerging signatures of genome dominance and functional diversification among a subset of genes in <italic>C. sativa</italic>
 are largely concordant with the characteristics of genomic shock triggered by hybridization and dosage imbalance during allopolyploid formation<xref ref-type="bibr" rid="b29">29</xref>
. The genome sequence strongly supports the hypothesis that the relatively large genome size and high gene content of <italic>C. sativa</italic>
 are the consequence of two polyploidy events from an ancestral genome similar to <italic>A. lyrata</italic>
. The minimal chromosomal rearrangements and lack of widespread or biased fractionation in the three <italic>C. sativa</italic>
 sub-genomes suggest that the hybridization of the sub-genomes occurred in quick succession and relatively recently, probably emerging on the same time scale as crops such as canola, cotton or wheat, during the rapid expansion of agricultural practices 5,000–10,000 years ago. It is remarkable that despite having nearly identical sub-genomes <italic>C. sativa</italic>
 behaves like a diploid with normal disomic inheritance. The age of divergence of the three genomes (5 Mya) may have been sufficient to preclude homeologous pairing, as has been suggested for the natural allopolyploid <italic>Arabidopsis suecica</italic>
<xref ref-type="bibr" rid="b32">32</xref>
, or <italic>C. sativa</italic>
 similar to wheat and canola may have established the diploid behaviour of the sub-genomes through genetic control of aberrant pairing<xref ref-type="bibr" rid="b32">32</xref>
.</p>
<p>Polyploidization generally evokes a myriad of genetic and epigenetic responses resulting in expression variation and novel regulatory interactions, which is thought to lead to subfunctionalization or neofunctionalization of duplicated genes<xref ref-type="bibr" rid="b29">29</xref>
<xref ref-type="bibr" rid="b33">33</xref>
<xref ref-type="bibr" rid="b34">34</xref>
. The genome Cs-G3 shows some evidence of expression dominance and a small proportion of the triplicated genes (22%) suggest functional diversification. However, the overall expression landscape of <italic>C. sativa</italic>
 supports preferential sheltering of duplicated genetic material, which would accommodate buffering of essential functions and maintenance of gene dosage balance.</p>
<p>Polyploidy has commonly been associated with increased allelic diversity, heterozygosity and fixed heterosis, contributing to increased vigour, productivity and novel phenotypic variation, with the resultant prevalence of this phenomenon among crop species. The triplicated gene repertoire of <italic>C. sativa</italic>
 may be the genetic basis of several of its desirable agronomic and oil-quality attributes. One of the challenging practical consequences of the homogeneous polyploid genetic code in <italic>C. sativa</italic>
 is that most traits will be controlled by multiple loci, which makes both traditional breeding and gene manipulation approaches more difficult. However, knowledge of the genome organization of <italic>C. sativa</italic>
 combined with ongoing efforts directed towards characterization of its transcriptome and germplasm diversity will accelerate future breeding of elite cultivars and designer oilseed lines for the biofuel and chemical industries.</p>
</sec>
<sec disp-level="1" sec-type="methods"><title>Methods</title>
<sec disp-level="2"><title>Plant material and nuclear DNA isolation</title>
<p>A homozygous doubled haploid line DH55, derived from <italic>C. sativa</italic>
 genotype SRS 933, was chosen for sequencing. For nuclei isolation, ~40 g of fresh leaf tissue from 4-week-old etiolated DH55 seedlings was homogenized in 200 ml ice-cold homogenization buffer (0.01 M Trizma base, 0.08 M KCl, 0.01 M EDTA, 1 mM spermidine, 1 mM spermine, 0.5 M sucrose plus 0.15% β-mercaptoethanol, pH 9.4–9.5). The homogenate was filtered through two layers of cheesecloth and one layer of Miracloth, and the nuclei were pelleted by centrifugation at 1,800 <italic>g</italic>
 at 4 °C for 20 min. The pellet was resuspended in wash buffer (1 × homogenization buffer plus 0.5% Triton X-100) followed by centrifugation at 1,800 <italic>g</italic>
 at 4 °C for 15 min, three times. After the final wash, the nuclei were resuspended in 10 ml lysis buffer (100 mM TrisCl, 100 mM NaCl, 50 mM EDTA, 2% SDS). High-molecular weight genomic DNA was then extracted by traditional proteinase K (0.05 mg ml<sup>−1</sup>
; 65 °C for 2 h) digestion followed by RNase A treatment, two cycles of phenol/chloroform extraction and ethanol precipitation. Quantification of genomic DNA was performed using the PicoGreen dsDNA kit (Molecular Probes).</p>
</sec>
<sec disp-level="2"><title>Library construction and sequencing</title>
<p>Genomic DNA (5–40 μg) was randomly sheared using the Covaris S2 ultrasonicator (Covaris Inc.), Hydroshear (Genomic Solutions Inc.) or gas-driven nebulizers. For Illumina sequencing, two paired-end (PE) libraries (with median insert sizes of 225 and 325 bps; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 1</xref>
) and three short-span mate-paired (MP) libraries (3, 5 and 8 kb) were constructed following the standard TruSeq DNA sample preparation and MP library preparation kit v2 (Illumina) protocols, respectively. All libraries were size-selected using the Pippin Prep automated gel electrophoresis system (Sage Science), quantified using a Bioanalyzer (Agilent) and the KAPA library quantification kit for Illumina (KAPA Biosystems) and sequenced from both ends (PE) for 100 cycles on an Illumina HiSeq 2000 instrument.</p>
<p>For 454 pyrosequencing, three medium-span MP libraries with median insert sizes of 15, 20 and 25 kb (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 1</xref>
) were constructed following the method described in the GS FLX Titanium 20-kb span MP library preparation manual from Roche. Additionally, for 454 pyrosequencing, we constructed a long-span fosmid-based 40 kb MP library using NxSeq 40-kb MP cloning kit (Lucigen) with several modifications to the manufacturer’s protocol. A detailed protocol is described in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 6</xref>
. These libraries were sequenced using a Roche 454 FLX Titanium sequencer.</p>
</sec>
<sec disp-level="2"><title>Genome assembly</title>
<p>Before assembly, all Illumina and 454 reads were filtered for adapter contamination, PCR duplicates, ambiguous residues (N’s) and low-quality regions, as described in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 7</xref>
. The initial backbone of the draft genome was assembled with Illumina reads using the <italic>de Bruijn</italic>
 graph-based SOAPdenovo (version 2.01) assembler<xref ref-type="bibr" rid="b35">35</xref>
, run with a k-mer parameter of 47 (selected after testing a range of k-mer values between 31 and 55) and with each library ranked according to insert size from smallest to largest. The gaps within assembled scaffolds were filled with the short-insert (225 and 325 bp) PE reads using GapCloser (version 1.12)<xref ref-type="bibr" rid="b35">35</xref>
. The resulting assembly consisted of a total of 39,514 contigs and short scaffolds, with a sequence span of 641.39 Mb and an N50 size of 603 kb (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 2</xref>
).</p>
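<p>The N50 statistic quoted above can be illustrated with a short sketch. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly span). The function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline.</p>
<preformat>
```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


if __name__ == "__main__":
    # Toy example: total span is 30, so the N50 is reached at the second sequence.
    print(n50([10, 8, 5, 4, 3]))
```
</preformat>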
<p>To improve the scaffold size, we used Bambus<xref ref-type="bibr" rid="b36">36</xref>
 to overlay the MP information generated by 454 pyrosequencing onto SOAPdenovo scaffolds. To achieve this, 454 MP reads of 15–40 kb span were aligned to SOAPdenovo scaffolds using a genomic mapping and alignment programme (GMAP)<xref ref-type="bibr" rid="b37">37</xref>
. The output from GMAP was used to create a Bambus-compatible GDE-formatted contig file as a source of information about scaffold links. MP links were checked for validity. Redundant or multi-mapped mates were considered invalid; additionally, mate pairs in which only one read mapped, or in which both mates mapped to a single scaffold, were also ignored. Thus, only MPs that uniquely mapped against two independent scaffolds with no overlap were considered valid. Bambus was run in a hierarchical fashion (with MP links considered in ascending order of their length) with scaffolding parameters including a redundancy level (minimum number of links required to connect two scaffolds) of 2 and a link-size error (estimated error in mate span determination) of 5%. By default, the scaffolds resulting from Bambus are potentially ambiguous as two or more contigs may occupy the same place in the genome<xref ref-type="bibr" rid="b36">36</xref>
. Such situations may occur either due to misassembled repeats, or when assembling homeologs within polyploid plant genomes. We used the ‘untangle’ utility of Bambus to disambiguate such scaffolds and generate a collection of linear scaffolds. Bambus was able to order, orient and merge 5,247 of these pre-assembled SOAPdenovo scaffolds into 3,206 superscaffolds, resulting in a greatly improved assembly with an N50 size of 2.16 Mb (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 2</xref>
).</p>
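<p>The mate-pair validity rules described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the scripts used with GMAP and Bambus: each mate pair is assumed to be represented by two lists of scaffold identifiers (one list per read), duplicate pairs and alignment coordinates are not modelled, and the function name is hypothetical.</p>
<preformat>
```python
from collections import Counter


def valid_scaffold_links(mate_pairs, min_links=2):
    """Count valid links between scaffold pairs.

    mate_pairs: iterable of (hits_read1, hits_read2); each element is the
    list of scaffold IDs to which that read aligned.
    Returns {(scaffold_a, scaffold_b): count} for scaffold pairs supported
    by at least min_links valid mate pairs (the redundancy level).
    """
    counts = Counter()
    for hits_a, hits_b in mate_pairs:
        # Skip pairs where either read is unmapped or multi-mapped.
        if len(hits_a) != 1 or len(hits_b) != 1:
            continue
        scaffold_a, scaffold_b = hits_a[0], hits_b[0]
        # Skip pairs whose mates both fall on the same scaffold.
        if scaffold_a == scaffold_b:
            continue
        counts[tuple(sorted((scaffold_a, scaffold_b)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}
```
</preformat>
<p>With a redundancy level of 2, a scaffold pair joined by a single mate pair is not reported, which mirrors the minimum-link requirement stated above.</p>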
</sec>
<sec disp-level="2"><title>High-density genetic mapping and anchoring of the genome</title>
<p>To order, orient and anchor Bambus scaffolds along the chromosome, a high-density genetic map representing 20 linkage groups was constructed using a mapping population (F6) of 96 recombinant inbred lines derived previously from a cross between the phenotypically distinct <italic>C. sativa</italic>
 cultivars Lindo and Licalla<xref ref-type="bibr" rid="b1">1</xref>
. A total of 3,575 polymorphic loci (SNPs, simple sequence repeats and insertion/deletion polymorphisms; <xref ref-type="supplementary-material" rid="S1">Supplementary Table 3</xref>
), identified by a combination of the GoldenGate genotyping assay (Illumina) and RAD (Restriction site Associated DNA)<xref ref-type="bibr" rid="b38">38</xref>
 approaches, were used to integrate Bambus superscaffolds with the genetic map.</p>
<p>To further assist with ordering and orientation of scaffolds for which there was paucity of adequate genetic recombination and markers, collinearity between <italic>C. sativa</italic>
 provisional pseudomolecules and <italic>A. thaliana</italic>
 chromosomes or <italic>A. lyrata</italic>
 scaffolds was established using NUCmer<xref ref-type="bibr" rid="b39">39</xref>
 and BLASTP<xref ref-type="bibr" rid="b40">40</xref>
. A total of 57 instances of false joins or insertions within Bambus superscaffolds were identified based on marker discontiguity and collinearity information. Such misassembled scaffolds were split and the correct position of each of the fragments was determined based on marker and collinearity information. Final scaffolds were renamed as ‘Scaffold’ and numbered sequentially based on their length from longest to shortest. The order and orientation of scaffolds within each pseudomolecule was determined based on marker order within each scaffold, and marker contiguity pattern between adjoining scaffolds. Scaffolds with too few markers were ordered and oriented using collinearity information. The final version of the draft genome representing 20 pseudochromosomes (corresponding to 20 linkage groups) and 37,398 unanchored scaffolds was collated using a custom Perl script, and the ordering and orientation information of scaffolds within each pseudochromosome was compiled in AGP files. The quality of the assembled genome was ascertained by performing several independent tests, as described in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 3</xref>
.</p>
</sec>
<sec disp-level="2"><title>Repeat annotation</title>
<p>Using the assembled <italic>C. sativa</italic>
 genome as input, a <italic>de novo</italic>
 repeat library was constructed by using RECON and RepeatScout within RepeatModeler (Version 1.05; <ext-link ext-link-type="uri" xlink:href="http://www.repeatmasker.org/RepeatModeler.html">http://www.repeatmasker.org/RepeatModeler.html</ext-link>
). To reduce potential false positives, repetitive sequences were compared (BLASTX with E-value cutoff of 1E−5) with annotated gene models in the <italic>A. thaliana</italic>
 protein database and significant non-TE hits were removed. The final consensus repeat library was used to mask the genome by RepeatMasker (Version 3.3.0; <ext-link ext-link-type="uri" xlink:href="http://www.repeatmasker.org/RMDownload.html">http://www.repeatmasker.org/RMDownload.html</ext-link>
).</p>
</sec>
<sec disp-level="2"><title>Gene annotation</title>
<p>For accurate annotation of gene models, an integrated computational approach (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 4</xref>
) based on two major annotation pipelines, Maker<xref ref-type="bibr" rid="b41">41</xref>
 and PASA<xref ref-type="bibr" rid="b42">42</xref>
, was adopted. Maker provides a simplified process for aligning ESTs and proteins to the genome, and integrates this external homology evidence with <italic>ab initio</italic>
 gene predictions to produce final gene annotations with evidence-based quality statistics. Inputs for Maker included the repeat-masked <italic>C. sativa</italic>
 genome assembly, 42,350 ESTs, a genome-guided <italic>de novo</italic>
 transcript assembly comprising 201,365 transcripts and a protein database containing annotated proteins from <italic>A. thaliana</italic>
, <italic>A. lyrata</italic>
, <italic>B. rapa</italic>
 and <italic>Thellungiella parvula</italic>
. <italic>Ab initio</italic>
 gene predictions were made by Fgenesh<xref ref-type="bibr" rid="b43">43</xref>
 and Augustus<xref ref-type="bibr" rid="b44">44</xref>
. Maker gene structure annotations were further updated by PASA using evidence from <italic>de novo</italic>
 RNA-seq assembly and Sanger/454 ESTs. Annotation updates by PASA included annotation of untranslated regions, addition of models for alternative splicing variants and gene boundary adjustments. A total of 84,071 genes were annotated by this approach, of which 7,175 genes were identified as ‘fused’ where two or more neighbouring <italic>A. thaliana</italic>
 genes aligned (BLASTN with E-value cutoff of 1E−10) to different parts of a single predicted gene model in <italic>C. sativa</italic>
. The fused genes were replaced with 12,984 alternative gene models and the output was passed through another round of PASA. By manual curation, 793 EST-only or other predictions that overlapped with gene models that had better external homology evidence support were removed. The final annotation set contained a total of 89,418 genes encoding 94,495 transcripts.</p>
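<p>The criterion for a fused gene model (alignments from two or more <italic>A. thaliana</italic> genes landing on different parts of one predicted model) can be sketched as follows. This is an illustrative simplification: it assumes the alignments have already been filtered by the E-value cutoff, it does not check that the <italic>A. thaliana</italic> genes are neighbours, and the function name and input layout are hypothetical.</p>
<preformat>
```python
def is_fused(at_hits):
    """Flag a predicted gene model as a candidate fusion.

    at_hits: list of (at_gene_id, start, end) alignment intervals on a
    single predicted gene model. The model is flagged when at least two
    different A. thaliana genes cover non-overlapping regions of it.
    """
    # Merge all intervals of the same A. thaliana gene into one span.
    spans = {}
    for gene, start, end in at_hits:
        low, high = spans.get(gene, (start, end))
        spans[gene] = (min(low, start), max(high, end))

    # Scan spans by start position; a span starting after the smallest
    # end seen so far is disjoint from at least one earlier span.
    min_end = None
    for start, end in sorted(spans.values()):
        if min_end is not None and start > min_end:
            return True
        min_end = end if min_end is None else min(min_end, end)
    return False
```
</preformat>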
</sec>
<sec disp-level="2"><title>Synteny analysis</title>
<p>Sequence homology was detected by BLASTP of the predicted proteins against <italic>A. thaliana</italic>
 proteome. BLAST hits with E-value of 1e−20 or better and within the top 40% drop from the best bit score were kept for further analysis. The chains of syntenic <italic>C. sativa</italic>
–<italic>A. thaliana</italic>
 gene pairs were computed by DAGChainer<xref ref-type="bibr" rid="b45">45</xref>
 using default parameters. In the case of a <italic>C. sativa</italic>
 gene participating in more than one syntenic chain due to duplication in the <italic>A. thaliana</italic>
 genome, the <italic>C. sativa</italic>
–<italic>A. thaliana</italic>
 pair in the weaker scoring chain was removed from the analysis. The syntelog table was generated by placing the syntenic chains onto the <italic>C. sativa</italic>
 chromosomes.</p>
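<p>The BLAST hit filter described above can be sketched as below. The sketch assumes that "within the top 40% drop from the best bit score" means a bit score of at least 60% of the best qualifying hit for the same query; this reading, the tuple layout and the function name are assumptions made for illustration.</p>
<preformat>
```python
def filter_blast_hits(hits, max_evalue=1e-20, max_drop=0.40):
    """Keep hits that pass the E-value cutoff and the bit-score drop rule.

    hits: list of (query, subject, evalue, bitscore) tuples.
    A hit is kept when its E-value is max_evalue or better and its bit
    score is no more than max_drop below the best qualifying bit score
    for the same query.
    """
    # First pass: best bit score per query among hits passing the cutoff.
    best = {}
    for query, _subject, evalue, bitscore in hits:
        if max_evalue >= evalue:
            best[query] = max(best.get(query, 0.0), bitscore)

    # Second pass: apply both criteria.
    kept = []
    for query, subject, evalue, bitscore in hits:
        if query not in best or evalue > max_evalue:
            continue
        if bitscore >= (1.0 - max_drop) * best[query]:
            kept.append((query, subject, evalue, bitscore))
    return kept
```
</preformat>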
</sec>
<sec disp-level="2"><title>Reconstruction of triplicated sub-genomes within <italic>C. sativa</italic>
</title>
<p>Considering the high level of conservation of synteny and GB contiguity between <italic>C. sativa</italic>
 and <italic>Arabidopsis</italic>
 species, the <italic>A. lyrata</italic>
 genome was utilized to represent the genome organization of individual sub-genomes within <italic>C. sativa</italic>
. Since the prevalence of inversions and intrachromosomal rearrangements is thought to be more common than interchromosomal translocations<xref ref-type="bibr" rid="b23">23</xref>
, segments of each <italic>C. sativa</italic>
 chromosome syntenic to corresponding <italic>A. lyrata</italic>
 chromosomes were assumed to be derived from the same ancestral chromosome. In the event where interchromosomal rearrangements were inferred, the order, orientation and contiguity of genes across potential adjacent segments were examined, and the most parsimonious scenario for the original segment order and chromosome assignment within each sub-genome was deduced.</p>
</sec>
<sec disp-level="2"><title>Transcriptome sequencing</title>
<p>The whole-plant transcriptome of <italic>C. sativa</italic>
 based on Illumina RNA-seq data was characterized to assist in the genome annotation process. Twelve different tissue samples were collected during both vegetative (germinating seed, cotyledon, young leaf, senescing leaf, root and stem) and reproductive (bud, flower, and early, early-mid, late-mid and late seed development) stages of the life cycle (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 10</xref>
). For each tissue type, at least three independent biological replicates were analysed. Total RNA from vegetative tissue was isolated using the RNeasy plant mini kit (Qiagen), including on-column DNase digestion, according to the manufacturer’s instructions. Total RNA from siliques was isolated using a method described by Suzuki <italic>et al.</italic>
<xref ref-type="bibr" rid="b46">46</xref>
, consisting of a two-step extraction process with high sodium extraction buffer isopropanol precipitation and LiCl precipitation, and then cleaned using the RNeasy plant mini kit (Qiagen), including on-column DNase digestion. The integrity and quantity of total RNA were assessed using the RNA 6000 Nano LabChip on the Bioanalyzer (Agilent). Sequencing libraries were constructed following the standard TruSeq RNA sample preparation guide (Illumina) and multiplexed (12 samples per lane of a flow cell); PE sequencing was performed using the Illumina HiSeq 2000 platform. A total of 78.5 Gb of raw RNA-seq data was generated. Before assembly, all reads were filtered for adapter contamination, ambiguous residues (N’s) and low-quality regions, as described in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 7</xref>
. <italic>De novo</italic>
 assembly of transcripts and expression analysis were performed using a combination of different programs, including Tophat<xref ref-type="bibr" rid="b47">47</xref>
, Cufflinks<xref ref-type="bibr" rid="b47">47</xref>
, Trinity<xref ref-type="bibr" rid="b48">48</xref>
 and PASA<xref ref-type="bibr" rid="b42">42</xref>
.</p>
<p>Clean and non-ribosomal reads from one biological replicate of each tissue sample were pooled and used for genome-guided <italic>de novo</italic>
 transcript assembly. A hybrid approach combining three different programs, including Tophat, Trinity and PASA, was employed to align RNA-seq reads to the genome (Tophat), assemble aligned reads (Trinity) and further align and assemble the Trinity-reconstructed transcripts (PASA). This approach produced 201,365 transcripts, which were used in the annotation of protein-coding genes in the <italic>C. sativa</italic>
 genome (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 4</xref>
).</p>
<p>For <italic>de novo</italic>
 transcriptome assembly, clean and non-ribosomal reads from all tissue samples (including all three biological replicates) were pooled. <italic>De novo</italic>
 assembly was carried out using Trinity with default parameters. The final assembly included 271,745 Trinity transcripts and 115,114 Trinity components, which were used for updating Maker gene structure annotation using PASA (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 4</xref>
).</p>
<p>To estimate transcript abundance, the whole-plant and tissue-wise expression of <italic>C. sativa</italic>
 genes was assessed using a Tophat- and Cufflinks-based method<xref ref-type="bibr" rid="b47">47</xref>
. For both Tophat and Cufflinks analyses, default parameters were used except that the maximum and minimum intron lengths were set at 2,500 and 20, respectively. For whole-plant-wide expression analysis of <italic>C. sativa</italic>
 genes, RNA-seq data from all tissues and biological replicates were pooled. Transcript abundance was measured as fragments per kilobase of exon per million fragments mapped (FPKM) values.</p>
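<p>The FPKM unit can be made concrete with the definitional formula. This sketch only restates the unit; Cufflinks estimates abundances with its own statistical model, so the function below is an illustration with assumed inputs rather than a reimplementation of that program.</p>
<preformat>
```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments: fragments assigned to the transcript
    exon_length_bp: summed exon length of the transcript in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript in a library of 25 million fragments.
    print(fpkm(500, 2000, 25_000_000))
```
</preformat>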
<p>For all the 18,565 fully retained <italic>Camelina sativa</italic>
 genes, the raw expression values (FPKM) from each of the 12 tissue types were transformed by adding 1 and taking the natural logarithm, in order to analyse expression divergence. All subsequent calculations were done with the transformed values. This transformation reduces the range of the data, and the residuals more frequently follow a normal distribution (determined by the distribution of <italic>P</italic>
 values for the Shapiro–Wilk test for normality), which is an assumption of ANOVA. Of the gene triplets, 18,491 had non-zero variance of expression. For the expression analyses, the mean was taken over the three replicates, for each sub-genome of origin (G) and tissue type (T) combination. The mean for each G was also obtained (over all T and replicates). Two-way ANOVAs were carried out to test for the interaction between the G effect and the T effect on expression in addition to the sum of these separate effects. The cumulative frequency of each genome demonstrating the highest expression (<xref ref-type="fig" rid="f6">Fig. 6b</xref>
) was obtained after sorting the gene triplets into increasing order of <italic>P</italic>
 value (ANOVA test for interaction, based on a sample size of 108 per gene triplet (3 genes by 12 tissue types by 3 replicates)) for the G × T effect. Of the 18,491 gene triplets, 14,391 showed a significant (<italic>P</italic>
<0.05; ANOVA test for interaction) G × T interaction effect. However, in most cases due to small statistical error the interaction was small in magnitude despite its statistical significance. The magnitude of interaction <italic>σ</italic>
<sub>i</sub>
 was estimated as the square root of the variance of the interaction in the random effects model<xref ref-type="bibr" rid="b49">49</xref>
 that considers the levels of G and T as randomly sampled. This estimate was obtained from equation (1)</p>
<p><disp-formula id="eq1"><inline-graphic id="d33e1486" xlink:href="ncomms4706-m1.jpg"></inline-graphic>
</disp-formula>
</p>
<p>where <italic>E</italic>
 stands for expectation, and the expected MS terms are the expected mean squares calculated as in ANOVA, and <italic>n</italic>
 is the number of replicates (in this case 3). The scatterplot (<xref ref-type="fig" rid="f6">Fig. 6c</xref>
) illustrates the relationship between these two measures of the interaction. Only 4,106 triplets are both statistically significant (<italic>P</italic>
<0.05; ANOVA test for interaction) and have a magnitude of interaction <italic>σ</italic>
<sub>i</sub>
>0.25. From this set, only the individual genes (12,112) that had a non-zero expression for at least one tissue type were considered for hierarchical clustering. The clustering was carried out using 1 minus the Pearson sample correlation as the distance measure, and this requires the sample variances of the means for each gene to be non-zero. These sample variances were only zero in the case that all the means were zero. The average linkage method was used for clustering. This method clusters correlated sets of expression means together regardless of the absolute magnitude of the expression levels. Using this clustering, the heatmap (<xref ref-type="supplementary-material" rid="S1">Supplementary Fig. 12</xref>
) was drawn using the statistical package R where the colours were 256 levels of rainbow, starting with red representing low values and ending with blue representing high values. The superimposed pink-coloured rectangles highlight areas of low or high expression. The breakpoints determining the <italic>y</italic>
 coordinates of the rectangles were also chosen with reference to the dendrogram for the clustering such that they separated clearly defined clusters of genes having similar expression patterns.</p>
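<p>The transformation and the interaction magnitude described above can be sketched for a single gene triplet. The sketch assumes the usual method-of-moments estimator for a balanced two-way random effects layout, in which the interaction variance is the interaction mean square minus the error mean square, divided by the number of replicates; negative estimates are truncated at zero. The function names and data layout are illustrative, and the exact form of equation (1) should be taken from the article itself.</p>
<preformat>
```python
import math


def log_transform(fpkm_value):
    """Natural logarithm of the FPKM value plus one."""
    return math.log(fpkm_value + 1.0)


def interaction_magnitude(values):
    """Estimate the G x T interaction magnitude for one gene triplet.

    values[g][t] is a list of replicate (already transformed) expression
    values for sub-genome g in tissue t; the design must be balanced.
    """
    n_g, n_t, n_rep = len(values), len(values[0]), len(values[0][0])
    # Cell means, marginal means and the grand mean.
    cell = [[sum(values[g][t]) / n_rep for t in range(n_t)] for g in range(n_g)]
    g_mean = [sum(cell[g]) / n_t for g in range(n_g)]
    t_mean = [sum(cell[g][t] for g in range(n_g)) / n_g for t in range(n_t)]
    grand = sum(g_mean) / n_g
    # Interaction and error sums of squares.
    ss_int = n_rep * sum(
        (cell[g][t] - g_mean[g] - t_mean[t] + grand) ** 2
        for g in range(n_g) for t in range(n_t)
    )
    ss_err = sum(
        (x - cell[g][t]) ** 2
        for g in range(n_g) for t in range(n_t) for x in values[g][t]
    )
    ms_int = ss_int / ((n_g - 1) * (n_t - 1))
    ms_err = ss_err / (n_g * n_t * (n_rep - 1))
    # Method-of-moments estimate of the interaction standard deviation.
    return math.sqrt(max(0.0, (ms_int - ms_err) / n_rep))
```
</preformat>
<p>For the design in this study the input would have 3 sub-genomes, 12 tissue types and 3 replicates, giving the 108 observations per triplet mentioned above.</p>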
</sec>
<sec disp-level="2"><title>Identification of orphan genes</title>
<p>A blast-based filtering approach was used to identify orphan genes in the <italic>C. sativa</italic>
 genome. BLASTP (E-value cutoff of 0.01) was used to search all predicted peptides of <italic>C. sativa</italic>
 against all 39 sequenced plant species available at phytozome.net (v9.1) excluding five Brassicaceae species. The orphan candidates were filtered out, and then BLAST searched against the NCBI nr, nt and est databases (E-value cutoff of 0.01). Orphans having significant BLAST hits with Brassicaceae species were filtered out. The species and family information of all the candidates was extracted from the NCBI taxonomy database using in-house developed scripts. Filtered candidates were further searched against the nr database using PSI-BLAST. Orphan candidates displaying InterProScan hits with non-Brassicaceae species were not considered as orphans. <italic>C. sativa</italic>
 specific orphans were extracted from the 3,761 Brassicaceae-specific orphans by BLAST search of these genes against the <italic>C. sativa</italic>
 genome.</p>
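<p>The first step of the BLAST-based filter can be sketched as a set operation. The sketch assumes that a gene remains an orphan candidate when it has no hit at the stated E-value cutoff in any non-Brassicaceae species; the later steps (NCBI database searches, PSI-BLAST and InterProScan) are not modelled, and the function name and input layout are hypothetical.</p>
<preformat>
```python
def orphan_candidates(all_genes, blast_hits, brassicaceae, max_evalue=0.01):
    """Return genes with no qualifying hit outside the Brassicaceae.

    all_genes: iterable of gene IDs that were searched
    blast_hits: iterable of (gene, subject_species, evalue)
    brassicaceae: set of species names treated as Brassicaceae
    """
    genes_with_outside_hit = {
        gene
        for gene, species, evalue in blast_hits
        if max_evalue >= evalue and species not in brassicaceae
    }
    return sorted(set(all_genes) - genes_with_outside_hit)
```
</preformat>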
</sec>
<sec disp-level="2"><title>Identification of evolutionary origins of orphan genes</title>
<p><italic>C. sativa</italic>
 orphan genes originating by gene duplication events were identified by BLASTP and BLASTN searches against all non-orphan <italic>C. sativa</italic>
 genes. Orphan genes containing non-coding or out-of-frame CDS hits were identified using BLASTN against all sequenced plant species, and hits with Brassicaceae and non-Brassicaceae species were categorized. An orphan gene was considered to be overprinted if the peptide sequence of the orphan overlapped with the CDSs of other genes (not with untranslated regions or intronic regions).</p>
</sec>
<sec disp-level="2"><title>BAC library construction and sequencing</title>
<p>A ~10-fold coverage BAC library of <italic>C. sativa</italic>
 was constructed in the pIndigoBAC vector by Bio S&T Inc. (Montreal, Canada). Young etiolated leaves from seedlings of DH55 were used as source material for BAC library construction. A subset of 768 randomly selected BACs was sequenced by Amplicon Express Inc. (Pullman, WA, USA) by Focused Genome Sequencing, a next-generation sequencing-based method that allows high-quality assembly of BAC clone sequence data using the Illumina platform.</p>
</sec>
<sec disp-level="2"><title>Phylogenetic analysis of <italic>Camelina</italic>
 species</title>
<p><italic>De novo</italic>
 transcriptome sequencing of five <italic>Camelina</italic>
 species, including <italic>C. hispida, C. rumelica</italic>
 ssp. transcaspica, <italic>C. rumelica</italic>
 ssp. Iran, <italic>C. rumelica</italic>
 ssp. USSR and <italic>C. laxa</italic>
, was performed using Roche 454 pyrosequencing (<xref ref-type="supplementary-material" rid="S1">Supplementary Table 22</xref>
). The unigene sets generated by <italic>de novo</italic>
 EST assembly were utilized in establishing a highly resolved molecular phylogeny of <italic>Camelina</italic>
 species and their relationship with the three sub-genomes of <italic>C. sativa</italic>
. Additional information is provided in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 5</xref>
.</p>
</sec>
</sec>
<sec disp-level="1"><title>Author contributions</title>
<p>S.K., A.G.S. and I.A.P.P. conceived the study. V.B., E.E.H., T.H. and C.C. performed sequencing and genetic mapping. C.K., S.K., J.N., W.E.C., M.G.L., S.J.R., R.T. and C.S. carried out assembly, bioinformatic and statistical analyses. S.K. and I.A.P.P. wrote the manuscript. All authors discussed the results and commented on the manuscript.</p>
</sec>
<sec disp-level="1"><title>Additional information</title>
<p><bold>Accession codes:</bold>
 Sequence data for <italic>Camelina sativa</italic>
 have been deposited in DDBJ/EMBL/GenBank sequence read archive under the accession codes <ext-link ext-link-type="NCBI:sra" xlink:href="SRP038024">SRP038024</ext-link>
, <ext-link ext-link-type="NCBI:sra" xlink:href="SRS558774">SRS558774</ext-link>
, <ext-link ext-link-type="NCBI:sra" xlink:href="SRS566487">SRS566487</ext-link>
 and <ext-link ext-link-type="NCBI:sra" xlink:href="SRS559344">SRS559344</ext-link>
. The genome assembly for <italic>Camelina sativa</italic>
 has been deposited in DDBJ/EMBL/GenBank nucleotide core database under the accession code <ext-link ext-link-type="DDBJ/EMBL/GenBank" xlink:href="JFZQ00000000">JFZQ00000000</ext-link>
.</p>
<p><bold>How to cite this article:</bold>
 Kagale, S. <italic>et al.</italic>
 The emerging biofuel crop <italic>Camelina sativa</italic>
 retains a highly undifferentiated hexaploid genome structure. <italic>Nat. Commun.</italic>
 5:3706 doi: 10.1038/ncomms4706 (2014).</p>
</sec>
<sec sec-type="supplementary-material" id="S1"><title>Supplementary Material</title>
<supplementary-material id="d33e18" content-type="local-data"><caption><title>Supplementary Figures, Tables, Notes and References</title>
<p>Supplementary Figures 1-13, Supplementary Tables 1-23, Supplementary Notes 1-9 and Supplementary References</p>
</caption>
<media xlink:href="ncomms4706-s1.pdf"></media>
</supplementary-material>
<supplementary-material id="d33e24" content-type="local-data"><caption><title>Supplementary Data 1</title>
<p>Distribution of orthologues of <italic>Camelina sativa</italic>
 genes in other plant genomes</p>
</caption>
<media xlink:href="ncomms4706-s2.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e33" content-type="local-data"><caption><title>Supplementary Data 2</title>
<p>A syntelog matrix representing individual <italic>Arabidopsis thaliana</italic>
 genes and the corresponding triplets of <italic>Camelina sativa</italic>
 homeologues</p>
</caption>
<media xlink:href="ncomms4706-s3.xlsx"></media>
</supplementary-material>
<supplementary-material id="d33e45" content-type="local-data"><caption><title>Supplementary Data 3</title>
<p>Expansion of acyl-lipid metabolism genes in <italic>Camelina sativa</italic>
</p>
</caption>
<media xlink:href="ncomms4706-s4.xlsx"></media>
</supplementary-material>
</sec>
</body>
<back><ack><p>The work was supported by funding from the Saskatchewan Agricultural Development Fund and by funding to Genome Prairie from the Western Economic Partnership Agreement project ‘Prairie Gold’. We thank Doug Heath at Genome Prairie for project management support and Cathy Coutu for providing pod tissue for RNA extraction. We would also like to thank Rod Snowdon (Justus Liebig University, Giessen, Germany) for providing access to the recombinant inbred mapping population.</p>
</ack>
<ref-list><ref id="b1"><mixed-citation publication-type="journal"><name><surname>Gehringer</surname>
<given-names>A.</given-names>
</name>
, <name><surname>Friedt</surname>
<given-names>W.</given-names>
</name>
, <name><surname>Luhs</surname>
<given-names>W.</given-names>
</name>
 & <name><surname>Snowdon</surname>
<given-names>R. J.</given-names>
</name>
<article-title>Genetic mapping of agronomic traits in false flax (<italic>Camelina sativa</italic>
 subsp. sativa)</article-title>
. <source>Genome</source>
<volume>49</volume>
, <fpage>1555</fpage>
–<lpage>1563</lpage>
 (<year>2006</year>
).<pub-id pub-id-type="pmid">17426770</pub-id>
</mixed-citation>
</ref>
<ref id="b2"><mixed-citation publication-type="journal"><name><surname>Moser</surname>
<given-names>B. R.</given-names>
</name>
<article-title>Biodiesel from alternative oilseed feedstocks: camelina and field pennycress</article-title>
. <source>Biofuels</source>
<volume>3</volume>
, <fpage>193</fpage>
–<lpage>209</lpage>
 (<year>2012</year>
).</mixed-citation>
</ref>
<ref id="b3"><mixed-citation publication-type="journal"><name><surname>Séguin-Swartz</surname>
<given-names>G.</given-names>
</name>
<italic>et al.</italic>
<article-title>Diseases of <italic>Camelina sativa</italic>
 (false flax)</article-title>
. <source>Can. J. Plant Pathol.</source>
<volume>31</volume>
, <fpage>375</fpage>
–<lpage>386</lpage>
 (<year>2009</year>
).</mixed-citation>
</ref>
<ref id="b4"><mixed-citation publication-type="journal"><name><surname>Beilstein</surname>
<given-names>M. A.</given-names>
</name>
, <name><surname>Al-Shehbaz</surname>
<given-names>I. A.</given-names>
</name>
, <name><surname>Mathews</surname>
<given-names>S.</given-names>
</name>
 & <name><surname>Kellogg</surname>
<given-names>E. A.</given-names>
</name>
<article-title>Brassicaceae phylogeny inferred from phytochrome A and ndhF sequence data: tribes and trichomes revisited</article-title>
. <source>Am. J. Bot.</source>
<volume>95</volume>
, <fpage>1307</fpage>
–<lpage>1327</lpage>
 (<year>2008</year>
).<pub-id pub-id-type="pmid">21632335</pub-id>
</mixed-citation>
</ref>
<ref id="b5"><mixed-citation publication-type="journal"><name><surname>Hutcheon</surname>
<given-names>C.</given-names>
</name>
<italic>et al.</italic>
<article-title>Polyploid genome of <italic>Camelina sativa</italic>
 revealed by isolation of fatty acid synthesis genes</article-title>
. <source>BMC Plant Biol.</source>
<volume>10</volume>
, <fpage>233</fpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20977772</pub-id>
</mixed-citation>
</ref>
<ref id="b6"><mixed-citation publication-type="journal"><name><surname>Bennetzen</surname>
<given-names>J. L.</given-names>
</name>
<article-title>Mechanisms and rates of genome expansion and contraction in flowering plants</article-title>
. <source>Genetica</source>
<volume>115</volume>
, <fpage>29</fpage>
–<lpage>36</lpage>
 (<year>2002</year>
).<pub-id pub-id-type="pmid">12188046</pub-id>
</mixed-citation>
</ref>
<ref id="b7"><mixed-citation publication-type="journal"><name><surname>Kumar</surname>
<given-names>A.</given-names>
</name>
 & <name><surname>Bennetzen</surname>
<given-names>J. L.</given-names>
</name>
<article-title>Plant retrotransposons</article-title>
. <source>Annu. Rev. Genet.</source>
<volume>33</volume>
, <fpage>479</fpage>
–<lpage>532</lpage>
 (<year>1999</year>
).<pub-id pub-id-type="pmid">10690416</pub-id>
</mixed-citation>
</ref>
<ref id="b8"><mixed-citation publication-type="journal"><name><surname>Wang</surname>
<given-names>X.</given-names>
</name>
<italic>et al.</italic>
<article-title>The genome of the mesopolyploid crop species <italic>Brassica rapa</italic>
</article-title>
. <source>Nat. Genet.</source>
<volume>43</volume>
, <fpage>1035</fpage>
–<lpage>1039</lpage>
 (<year>2011</year>
).<pub-id pub-id-type="pmid">21873998</pub-id>
</mixed-citation>
</ref>
<ref id="b9"><mixed-citation publication-type="journal"><name><surname>Parkin</surname>
<given-names>I. A.</given-names>
</name>
<italic>et al.</italic>
<article-title>Segmental structure of the <italic>Brassica napus</italic>
 genome based on comparative analysis with <italic>Arabidopsis thaliana</italic>
</article-title>
. <source>Genetics</source>
<volume>171</volume>
, <fpage>765</fpage>
–<lpage>781</lpage>
 (<year>2005</year>
).<pub-id pub-id-type="pmid">16020789</pub-id>
</mixed-citation>
</ref>
<ref id="b10"><mixed-citation publication-type="journal"><name><surname>Hu</surname>
<given-names>T. T.</given-names>
</name>
<italic>et al.</italic>
<article-title>The <italic>Arabidopsis lyrata</italic>
 genome sequence and the basis of rapid genome size change</article-title>
. <source>Nat. Genet.</source>
<volume>43</volume>
, <fpage>476</fpage>
–<lpage>481</lpage>
 (<year>2011</year>
).<pub-id pub-id-type="pmid">21478890</pub-id>
</mixed-citation>
</ref>
<ref id="b11"><mixed-citation publication-type="journal"><name><surname>Xu</surname>
<given-names>X.</given-names>
</name>
<italic>et al.</italic>
<article-title>Genome sequence and analysis of the tuber crop potato</article-title>
. <source>Nature</source>
<volume>475</volume>
, <fpage>189</fpage>
–<lpage>195</lpage>
 (<year>2011</year>
).<pub-id pub-id-type="pmid">21743474</pub-id>
</mixed-citation>
</ref>
<ref id="b12"><mixed-citation publication-type="journal"><name><surname>Schmutz</surname>
<given-names>J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Genome sequence of the palaeopolyploid soybean</article-title>
. <source>Nature</source>
<volume>463</volume>
, <fpage>178</fpage>
–<lpage>183</lpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20075913</pub-id>
</mixed-citation>
</ref>
<ref id="b13"><mixed-citation publication-type="journal"><name><surname>Paterson</surname>
<given-names>A. H.</given-names>
</name>
<italic>et al.</italic>
<article-title>The <italic>Sorghum bicolor</italic>
 genome and the diversification of grasses</article-title>
. <source>Nature</source>
<volume>457</volume>
, <fpage>551</fpage>
–<lpage>556</lpage>
 (<year>2009</year>
).<pub-id pub-id-type="pmid">19189423</pub-id>
</mixed-citation>
</ref>
<ref id="b14"><mixed-citation publication-type="journal"><name><surname>Brenchley</surname>
<given-names>R.</given-names>
</name>
<italic>et al.</italic>
<article-title>Analysis of the bread wheat genome using whole-genome shotgun sequencing</article-title>
. <source>Nature</source>
<volume>491</volume>
, <fpage>705</fpage>
–<lpage>710</lpage>
 (<year>2012</year>
).<pub-id pub-id-type="pmid">23192148</pub-id>
</mixed-citation>
</ref>
<ref id="b15"><mixed-citation publication-type="journal"><name><surname>Dujon</surname>
<given-names>B.</given-names>
</name>
<article-title>The yeast genome project: what did we learn?</article-title>
<source>Trends Genet.</source>
<volume>12</volume>
, <fpage>263</fpage>
–<lpage>270</lpage>
 (<year>1996</year>
).<pub-id pub-id-type="pmid">8763498</pub-id>
</mixed-citation>
</ref>
<ref id="b16"><mixed-citation publication-type="journal"><name><surname>Schranz</surname>
<given-names>M. E.</given-names>
</name>
, <name><surname>Lysak</surname>
<given-names>M. A.</given-names>
</name>
 & <name><surname>Mitchell-Olds</surname>
<given-names>T.</given-names>
</name>
<article-title>The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes</article-title>
. <source>Trends Plant Sci.</source>
<volume>11</volume>
, <fpage>535</fpage>
–<lpage>542</lpage>
 (<year>2006</year>
).<pub-id pub-id-type="pmid">17029932</pub-id>
</mixed-citation>
</ref>
<ref id="b17"><mixed-citation publication-type="journal"><name><surname>Cheng</surname>
<given-names>F.</given-names>
</name>
<italic>et al.</italic>
<article-title>Deciphering the diploid ancestral genome of the mesohexaploid <italic>Brassica rapa</italic>
</article-title>
. <source>Plant Cell</source>
<volume>25</volume>
, <fpage>1541</fpage>
–<lpage>1554</lpage>
 (<year>2013</year>
).<pub-id pub-id-type="pmid">23653472</pub-id>
</mixed-citation>
</ref>
<ref id="b18"><mixed-citation publication-type="journal"><name><surname>Mandakova</surname>
<given-names>T.</given-names>
</name>
 & <name><surname>Lysak</surname>
<given-names>M. A.</given-names>
</name>
<article-title>Chromosomal phylogeny and karyotype evolution in x=7 crucifer species (Brassicaceae)</article-title>
. <source>Plant Cell</source>
<volume>20</volume>
, <fpage>2559</fpage>
–<lpage>2570</lpage>
 (<year>2008</year>
).<pub-id pub-id-type="pmid">18836039</pub-id>
</mixed-citation>
</ref>
<ref id="b19"><mixed-citation publication-type="journal"><name><surname>Koch</surname>
<given-names>M. A.</given-names>
</name>
<italic>et al.</italic>
<article-title>BrassiBase: tools and biological resources to study characters and traits in the Brassicaceae-version 1.1</article-title>
. <source>Taxon</source>
<volume>61</volume>
, <fpage>1001</fpage>
–<lpage>1009</lpage>
 (<year>2012</year>
).</mixed-citation>
</ref>
<ref id="b20"><mixed-citation publication-type="journal"><name><surname>Beilstein</surname>
<given-names>M. A.</given-names>
</name>
, <name><surname>Nagalingum</surname>
<given-names>N. S.</given-names>
</name>
, <name><surname>Clements</surname>
<given-names>M. D.</given-names>
</name>
, <name><surname>Manchester</surname>
<given-names>S. R.</given-names>
</name>
 & <name><surname>Mathews</surname>
<given-names>S.</given-names>
</name>
<article-title>Dated molecular phylogenies indicate a Miocene origin for <italic>Arabidopsis thaliana</italic>
</article-title>
. <source>Proc. Natl Acad. Sci. USA</source>
<volume>107</volume>
, <fpage>18724</fpage>
–<lpage>18728</lpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20921408</pub-id>
</mixed-citation>
</ref>
<ref id="b21"><mixed-citation publication-type="journal"><name><surname>Nagaharu</surname>
<given-names>U.</given-names>
</name>
<article-title>Genome analysis in <italic>Brassica</italic>
 with special reference to the experimental formation of <italic>Brassica napus</italic>
 and peculiar mode of fertilization</article-title>
. <source>Jpn J. Bot.</source>
<volume>7</volume>
, <fpage>389</fpage>
–<lpage>452</lpage>
 (<year>1935</year>
).</mixed-citation>
</ref>
<ref id="b22"><mixed-citation publication-type="journal"><name><surname>Mandakova</surname>
<given-names>T.</given-names>
</name>
, <name><surname>Joly</surname>
<given-names>S.</given-names>
</name>
, <name><surname>Krzywinski</surname>
<given-names>M.</given-names>
</name>
, <name><surname>Mummenhoff</surname>
<given-names>K.</given-names>
</name>
 & <name><surname>Lysak</surname>
<given-names>M. A.</given-names>
</name>
<article-title>Fast diploidization in close mesopolyploid relatives of Arabidopsis</article-title>
. <source>Plant Cell</source>
<volume>22</volume>
, <fpage>2277</fpage>
–<lpage>2290</lpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20639445</pub-id>
</mixed-citation>
</ref>
<ref id="b23"><mixed-citation publication-type="journal"><name><surname>Schnable</surname>
<given-names>J. C.</given-names>
</name>
, <name><surname>Springer</surname>
<given-names>N. M.</given-names>
</name>
 & <name><surname>Freeling</surname>
<given-names>M.</given-names>
</name>
<article-title>Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss</article-title>
. <source>Proc. Natl Acad. Sci. USA</source>
<volume>108</volume>
, <fpage>4069</fpage>
–<lpage>4074</lpage>
 (<year>2011</year>
).<pub-id pub-id-type="pmid">21368132</pub-id>
</mixed-citation>
</ref>
<ref id="b24"><mixed-citation publication-type="journal"><name><surname>Sankoff</surname>
<given-names>D.</given-names>
</name>
, <name><surname>Zheng</surname>
<given-names>C.</given-names>
</name>
 & <name><surname>Zhu</surname>
<given-names>Q.</given-names>
</name>
<article-title>The collapse of gene complement following whole genome duplication</article-title>
. <source>BMC Genomics</source>
<volume>11</volume>
, <fpage>313</fpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20482863</pub-id>
</mixed-citation>
</ref>
<ref id="b25"><mixed-citation publication-type="journal"><name><surname>Thomas</surname>
<given-names>B. C.</given-names>
</name>
, <name><surname>Pedersen</surname>
<given-names>B.</given-names>
</name>
 & <name><surname>Freeling</surname>
<given-names>M.</given-names>
</name>
<article-title>Following tetraploidy in an <italic>Arabidopsis</italic>
 ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes</article-title>
. <source>Genome Res.</source>
<volume>16</volume>
, <fpage>934</fpage>
–<lpage>946</lpage>
 (<year>2006</year>
).<pub-id pub-id-type="pmid">16760422</pub-id>
</mixed-citation>
</ref>
<ref id="b26"><mixed-citation publication-type="journal"><name><surname>Tang</surname>
<given-names>H.</given-names>
</name>
<italic>et al.</italic>
<article-title>Altered patterns of fractionation and exon deletions in <italic>Brassica rapa</italic>
 support a two-step model of paleohexaploidy</article-title>
. <source>Genetics</source>
<volume>190</volume>
, <fpage>1563</fpage>
–<lpage>1574</lpage>
 (<year>2012</year>
).<pub-id pub-id-type="pmid">22308264</pub-id>
</mixed-citation>
</ref>
<ref id="b27"><mixed-citation publication-type="journal"><name><surname>Cheng</surname>
<given-names>F.</given-names>
</name>
<italic>et al.</italic>
<article-title>Biased gene fractionation and dominant gene expression among the subgenomes of <italic>Brassica rapa</italic>
</article-title>
. <source>PLoS ONE</source>
<volume>7</volume>
, <fpage>e36442</fpage>
 (<year>2012</year>
).<pub-id pub-id-type="pmid">22567157</pub-id>
</mixed-citation>
</ref>
<ref id="b28"><mixed-citation publication-type="journal"><name><surname>Langham</surname>
<given-names>R. J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Genomic duplication, fractionation and the origin of regulatory novelty</article-title>
. <source>Genetics</source>
<volume>166</volume>
, <fpage>935</fpage>
–<lpage>945</lpage>
 (<year>2004</year>
).<pub-id pub-id-type="pmid">15020478</pub-id>
</mixed-citation>
</ref>
<ref id="b29"><mixed-citation publication-type="journal"><name><surname>Doyle</surname>
<given-names>J. J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Evolutionary genetics of genome merger and doubling in plants</article-title>
. <source>Annu. Rev. Genet.</source>
<volume>42</volume>
, <fpage>443</fpage>
–<lpage>461</lpage>
 (<year>2008</year>
).<pub-id pub-id-type="pmid">18983261</pub-id>
</mixed-citation>
</ref>
<ref id="b30"><mixed-citation publication-type="journal"><name><surname>Gout</surname>
<given-names>J.-F.</given-names>
</name>
, <name><surname>Kahn</surname>
<given-names>D.</given-names>
</name>
 & <name><surname>Duret</surname>
<given-names>L.</given-names>
</name>
 Paramecium Post-Genomics Consortium. <article-title>The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution</article-title>
. <source>PLoS Genet.</source>
<volume>6</volume>
, <fpage>e1000944</fpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20485561</pub-id>
</mixed-citation>
</ref>
<ref id="b31"><mixed-citation publication-type="journal"><name><surname>Li-Beisson</surname>
<given-names>Y.</given-names>
</name>
<italic>et al.</italic>
<article-title>Acyl-lipid metabolism</article-title>
. <source>Arabidopsis Book</source>
<volume>11</volume>
, <fpage>e0161</fpage>
 (<year>2013</year>
).<pub-id pub-id-type="pmid">23505340</pub-id>
</mixed-citation>
</ref>
<ref id="b32"><mixed-citation publication-type="journal"><name><surname>Jackson</surname>
<given-names>S.</given-names>
</name>
 & <name><surname>Chen</surname>
<given-names>Z. J.</given-names>
</name>
<article-title>Genomic and expression plasticity of polyploidy</article-title>
. <source>Curr. Opin. Plant Biol.</source>
<volume>13</volume>
, <fpage>153</fpage>
–<lpage>159</lpage>
 (<year>2010</year>
).<pub-id pub-id-type="pmid">20031477</pub-id>
</mixed-citation>
</ref>
<ref id="b33"><mixed-citation publication-type="journal"><name><surname>Cusack</surname>
<given-names>B. P.</given-names>
</name>
 & <name><surname>Wolfe</surname>
<given-names>K. H.</given-names>
</name>
<article-title>When gene marriages don’t work out: divorce by subfunctionalization</article-title>
. <source>Trends Genet.</source>
<volume>23</volume>
, <fpage>270</fpage>
–<lpage>272</lpage>
 (<year>2007</year>
).<pub-id pub-id-type="pmid">17418444</pub-id>
</mixed-citation>
</ref>
<ref id="b34"><mixed-citation publication-type="journal"><name><surname>Blanc</surname>
<given-names>G.</given-names>
</name>
 & <name><surname>Wolfe</surname>
<given-names>K. H.</given-names>
</name>
<article-title>Functional divergence of duplicated genes formed by polyploidy during <italic>Arabidopsis</italic>
 evolution</article-title>
. <source>Plant Cell</source>
<volume>16</volume>
, <fpage>1679</fpage>
–<lpage>1691</lpage>
 (<year>2004</year>
).<pub-id pub-id-type="pmid">15208398</pub-id>
</mixed-citation>
</ref>
<ref id="b35"><mixed-citation publication-type="journal"><name><surname>Luo</surname>
<given-names>R.</given-names>
</name>
<italic>et al.</italic>
<article-title>SOAPdenovo2: an empirically improved memory-efficient short-read <italic>de novo</italic>
 assembler</article-title>
. <source>Gigascience</source>
<volume>1</volume>
, <fpage>18</fpage>
 (<year>2012</year>
).<pub-id pub-id-type="pmid">23587118</pub-id>
</mixed-citation>
</ref>
<ref id="b36"><mixed-citation publication-type="journal"><name><surname>Pop</surname>
<given-names>M.</given-names>
</name>
, <name><surname>Kosack</surname>
<given-names>D. S.</given-names>
</name>
 & <name><surname>Salzberg</surname>
<given-names>S. L.</given-names>
</name>
<article-title>Hierarchical scaffolding with Bambus</article-title>
. <source>Genome Res.</source>
<volume>14</volume>
, <fpage>149</fpage>
–<lpage>159</lpage>
 (<year>2004</year>
).<pub-id pub-id-type="pmid">14707177</pub-id>
</mixed-citation>
</ref>
<ref id="b37"><mixed-citation publication-type="journal"><name><surname>Wu</surname>
<given-names>T. D.</given-names>
</name>
 & <name><surname>Watanabe</surname>
<given-names>C. K.</given-names>
</name>
<article-title>GMAP: a genomic mapping and alignment program for mRNA and EST sequences</article-title>
. <source>Bioinformatics</source>
<volume>21</volume>
, <fpage>1859</fpage>
–<lpage>1875</lpage>
 (<year>2005</year>
).<pub-id pub-id-type="pmid">15728110</pub-id>
</mixed-citation>
</ref>
<ref id="b38"><mixed-citation publication-type="journal"><name><surname>Baird</surname>
<given-names>N. A.</given-names>
</name>
<italic>et al.</italic>
<article-title>Rapid SNP discovery and genetic mapping using sequenced RAD markers</article-title>
. <source>PLoS ONE</source>
<volume>3</volume>
, <fpage>e3376</fpage>
 (<year>2008</year>
).<pub-id pub-id-type="pmid">18852878</pub-id>
</mixed-citation>
</ref>
<ref id="b39"><mixed-citation publication-type="journal"><name><surname>Kurtz</surname>
<given-names>S.</given-names>
</name>
<italic>et al.</italic>
<article-title>Versatile and open software for comparing large genomes</article-title>
. <source>Genome Biol.</source>
<volume>5</volume>
, <fpage>R12</fpage>
 (<year>2004</year>
).<pub-id pub-id-type="pmid">14759262</pub-id>
</mixed-citation>
</ref>
<ref id="b40"><mixed-citation publication-type="journal"><name><surname>Altschul</surname>
<given-names>S. F.</given-names>
</name>
, <name><surname>Gish</surname>
<given-names>W.</given-names>
</name>
, <name><surname>Miller</surname>
<given-names>W.</given-names>
</name>
, <name><surname>Myers</surname>
<given-names>E. W.</given-names>
</name>
 & <name><surname>Lipman</surname>
<given-names>D. J.</given-names>
</name>
<article-title>Basic local alignment search tool</article-title>
. <source>J. Mol. Biol.</source>
<volume>215</volume>
, <fpage>403</fpage>
–<lpage>410</lpage>
 (<year>1990</year>
).<pub-id pub-id-type="pmid">2231712</pub-id>
</mixed-citation>
</ref>
<ref id="b41"><mixed-citation publication-type="journal"><name><surname>Cantarel</surname>
<given-names>B. L.</given-names>
</name>
<italic>et al.</italic>
<article-title>MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes</article-title>
. <source>Genome Res.</source>
<volume>18</volume>
, <fpage>188</fpage>
–<lpage>196</lpage>
 (<year>2008</year>
).<pub-id pub-id-type="pmid">18025269</pub-id>
</mixed-citation>
</ref>
<ref id="b42"><mixed-citation publication-type="journal"><name><surname>Haas</surname>
<given-names>B. J.</given-names>
</name>
<italic>et al.</italic>
<article-title>Improving the <italic>Arabidopsis</italic>
 genome annotation using maximal transcript alignment assemblies</article-title>
. <source>Nucleic Acids Res.</source>
<volume>31</volume>
, <fpage>5654</fpage>
–<lpage>5666</lpage>
 (<year>2003</year>
).<pub-id pub-id-type="pmid">14500829</pub-id>
</mixed-citation>
</ref>
<ref id="b43"><mixed-citation publication-type="journal"><name><surname>Salamov</surname>
<given-names>A. A.</given-names>
</name>
 & <name><surname>Solovyev</surname>
<given-names>V. V.</given-names>
</name>
<article-title><italic>Ab initio</italic>
 gene finding in Drosophila genomic DNA</article-title>
. <source>Genome Res.</source>
<volume>10</volume>
, <fpage>516</fpage>
–<lpage>522</lpage>
 (<year>2000</year>
).<pub-id pub-id-type="pmid">10779491</pub-id>
</mixed-citation>
</ref>
<ref id="b44"><mixed-citation publication-type="journal"><name><surname>Stanke</surname>
<given-names>M.</given-names>
</name>
, <name><surname>Tzvetkova</surname>
<given-names>A.</given-names>
</name>
 & <name><surname>Morgenstern</surname>
<given-names>B.</given-names>
</name>
<article-title>AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome</article-title>
. <source>Genome Biol.</source>
<volume>7</volume>
, <fpage>S11</fpage>
 (11–18) (<year>2006</year>
).<pub-id pub-id-type="pmid">16925833</pub-id>
</mixed-citation>
</ref>
<ref id="b45"><mixed-citation publication-type="journal"><name><surname>Haas</surname>
<given-names>B. J.</given-names>
</name>
, <name><surname>Delcher</surname>
<given-names>A. L.</given-names>
</name>
, <name><surname>Wortman</surname>
<given-names>J. R.</given-names>
</name>
 & <name><surname>Salzberg</surname>
<given-names>S. L.</given-names>
</name>
<article-title>DAGchainer: a tool for mining segmental genome duplications and synteny</article-title>
. <source>Bioinformatics</source>
<volume>20</volume>
, <fpage>3643</fpage>
–<lpage>3646</lpage>
 (<year>2004</year>
).<pub-id pub-id-type="pmid">15247098</pub-id>
</mixed-citation>
</ref>
<ref id="b46"><mixed-citation publication-type="journal"><name><surname>Suzuki</surname>
<given-names>Y.</given-names>
</name>
, <name><surname>Kawazu</surname>
<given-names>T.</given-names>
</name>
 & <name><surname>Koyama</surname>
<given-names>H.</given-names>
</name>
<article-title>RNA isolation from siliques, dry seeds, and other tissues of <italic>Arabidopsis thaliana</italic>
</article-title>
. <source>Biotechniques</source>
<volume>37</volume>
, <fpage>544</fpage>
 (<year>2004</year>
).</mixed-citation>
</ref>
<ref id="b47"><mixed-citation publication-type="journal"><name><surname>Trapnell</surname>
<given-names>C.</given-names>
</name>
<italic>et al.</italic>
<article-title>Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks</article-title>
. <source>Nat. Protoc.</source>
<volume>7</volume>
, <fpage>562</fpage>
–<lpage>578</lpage>
 (<year>2012</year>
).<pub-id pub-id-type="pmid">22383036</pub-id>
</mixed-citation>
</ref>
<ref id="b48"><mixed-citation publication-type="journal"><name><surname>Grabherr</surname>
<given-names>M. G.</given-names>
</name>
<italic>et al.</italic>
<article-title>Full-length transcriptome assembly from RNA-Seq data without a reference genome</article-title>
. <source>Nat. Biotechnol.</source>
<volume>29</volume>
, <fpage>644</fpage>
–<lpage>652</lpage>
 (<year>2011</year>
).<pub-id pub-id-type="pmid">21572440</pub-id>
</mixed-citation>
</ref>
<ref id="b49"><mixed-citation publication-type="journal"><name><surname>Montgomery</surname>
<given-names>D. C.</given-names>
</name>
<source>Design and Analysis of Experiments</source>
 3rd edn John Wiley and Sons Inc. (<year>1991</year>
).</mixed-citation>
</ref>
</ref-list>
</back>
<floats-group><fig id="f1"><label>Figure 1</label>
<caption><title>The <italic>Camelina sativa</italic>
 genome.</title>
<p>From the outside ring to the centre: (1) the twenty <italic>C. sativa</italic>
 pseudochromosomes (Chr1–20 represented on Mb scale) are shown in different colours with putative centromeric regions indicated by black bands; (2) gene expression levels (log10(average FPKM), bin=250 Kb), values range from 0 (yellow) to 3.92 (red); (3) the distribution of protein-coding regions (nucleotides per 500 Kb; orange) compared with repetitive sequences (nucleotides per 500 Kb; yellow); and (4) <italic>Ka</italic>
/<italic>Ks</italic>
 ratios (median, bin=500 kb) of syntenic (blue) and non-syntenic (green) genes. The centre shows a graphical view of the triplicated segments of annotated genes connected with lines of colours matching those for the pseudochromosomes.</p>
</caption>
<graphic xlink:href="ncomms4706-f1"></graphic>
</fig>
<fig id="f2"><label>Figure 2</label>
<caption><title>Comparison of gene number in <italic>C. sativa</italic>
 plotted against genome size with a subset of completely sequenced plant genomes (<italic>N</italic>
=35).</title>
<p>The significance test for a single outlier in regression data was performed as described in detail in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 8</xref>
 and the confidence intervals for the regression line are shown as dotted lines. Mes, <italic>Manihot esculenta</italic>
; Rco, <italic>Ricinus communis</italic>
; Lus, <italic>Linum usitatissimum</italic>
; Ptr, <italic>Populus trichocarpa</italic>
; Mtr, <italic>Medicago truncatula</italic>
; Pvu, <italic>Phaseolus vulgaris</italic>
; Gma, <italic>Glycine max</italic>
; Csa, <italic>Cucumis sativus</italic>
; Ppe, <italic>Prunus persica</italic>
; Mdo, <italic>Malus domestica</italic>
; Fve, <italic>Fragaria vesca</italic>
; Ath, <italic>Arabidopsis thaliana</italic>
; Aly, <italic>Arabidopsis lyrata</italic>
; Csa, <italic>Camelina sativa</italic>
; Cru, <italic>Capsella rubella</italic>
; Bra, <italic>Brassica rapa</italic>
; Tha, <italic>Thellungiella halophila</italic>
; Cpa, <italic>Carica papaya</italic>
; Gra, <italic>Gossypium raimondii</italic>
; Tca, <italic>Theobroma cacao</italic>
; Csi, <italic>Citrus sinensis</italic>
; Egr, <italic>Eucalyptus grandis</italic>
; Vvi, <italic>Vitis vinifera</italic>
; Stu, <italic>Solanum tuberosum</italic>
; Sly, <italic>Solanum lycopersicum</italic>
; Mgu, <italic>Mimulus guttatus</italic>
; Aco, <italic>Aquilegia coerulea</italic>
; Sbi, <italic>Sorghum bicolor</italic>
; Zma, <italic>Zea mays</italic>
; Sit, <italic>Setaria italica</italic>
; Pvi, <italic>Panicum virgatum</italic>
; Osa, <italic>Oryza sativa</italic>
; Bdi, <italic>Brachypodium distachyon</italic>
; Smo, <italic>Selaginella moellendorffii</italic>
; Ppa, <italic>Physcomitrella patens</italic>
.</p>
</caption>
<graphic xlink:href="ncomms4706-f2"></graphic>
</fig>
<fig id="f3"><label>Figure 3</label>
<caption><title>Comparative analysis and evolution of the <italic>C. sativa</italic>
 genome.</title>
<p>(<bold>a</bold>
) MUMmer plot comparing the <italic>C. sativa</italic>
 and <italic>A. lyrata</italic>
 genomes. Syntenic and collinear regions making up the three complete sub-genomes in <italic>C. sativa</italic>
 are circled in red, blue and green. (<bold>b</bold>
) Reconstruction of the three sub-genomes of <italic>C. sativa.</italic>
 Chromosome and ancestral genomic-block-level organization of the sub-genomes in <italic>C. sativa</italic>
 is shown. Based on synteny and collinearity between <italic>C. sativa</italic>
 and <italic>Arabidopsis</italic>
 species, and GB contiguity in the ancestral karyotype, pseudochromosomes were assigned to three sub-genomes in <italic>C. sativa</italic>
. Each pseudochromosome was subdivided among ancestral genomic blocks (A–X), which are coloured based on their occurrence in the ACK. (<bold>c</bold>
) The ancestral crucifer karyotype (ACK), consisting of the 24 conserved genomic blocks (A–X). (<bold>d</bold>
) The ancestral diploid karyotype (dACK, a derivative of the ACK) of <italic>C. sativa</italic>
. (<bold>e</bold>
) The presumed origin and reconstruction of the fusion chromosome (AK2/4) of the dACK.</p>
</caption>
<graphic xlink:href="ncomms4706-f3"></graphic>
</fig>
<fig id="f4"><label>Figure 4</label>
<caption><title>Phylogenetic relationship between the three sub-genomes of <italic>C. sativa</italic>
 and lower-chromosome <italic>Camelina</italic>
 species.</title>
<p>A maximum-likelihood tree produced from a supermatrix constructed using 4,867 orthologous sequences. Clade support values near nodes represent bootstrap proportions in percentages. Branch lengths represent estimated nucleotide substitutions per site.</p>
</caption>
<graphic xlink:href="ncomms4706-f4"></graphic>
</fig>
<fig id="f5"><label>Figure 5</label>
<caption><title>Age distribution of duplicated genes in <italic>C. sativa</italic>
.</title>
<p>Gaussian mixture models fitted to frequency distributions of <italic>K</italic>
s (synonymous substitution) values obtained by comparing pairs of paralogous (<italic>C. sativa</italic>
—neopolyploidy) and orthologous (Sub-genomes 1/2/3 <italic>versus A. thaliana</italic>
) genes are shown. The mixture model analysis is described in <xref ref-type="supplementary-material" rid="S1">Supplementary Note 9</xref>
 and the complete list of Gaussian components is provided in <xref ref-type="supplementary-material" rid="S1">Supplementary Table 23</xref>
.</p>
</caption>
<graphic xlink:href="ncomms4706-f5"></graphic>
</fig>
<fig id="f6"><label>Figure 6</label>
<caption><title>Gene expression dynamics reveal genome dominance and functional diversification of <italic>C. sativa</italic>
 homeologous genes.</title>
<p>(<bold>a</bold>
) Relationship between gene retention rate following whole-genome triplication and the gene expression levels. (<bold>b</bold>
) Cumulative frequency of homeologous genes belonging to the three sub-genomes within <italic>C. sativa</italic>
 with highest expression across all tissue types. <italic>P</italic>
 values (ANOVA test for interaction, <italic>N</italic>
=108 per gene triplet) were calculated for the interaction between sub-genome (G) and tissue-type (T) effects on expression. To highlight differences between sub-genomes, only the subset of the data with <italic>P</italic>
>10<sup>−10</sup>
 is shown. (<bold>c</bold>
) Scatterplot showing the magnitude of the interaction effect, calculated as the s.d. from a random effects model estimate for G × T interaction variance. Homeologous triplets were classified into three groups: no interaction (<italic>P</italic>
>0.05; ANOVA test for interaction, <italic>N</italic>
=108), negligible interaction (<italic>P</italic>
<0.05 and STDEV(G × T)<0.25) and interaction (<italic>P</italic>
<0.05 and STDEV(G × T)>0.25).</p>
</caption>
<graphic xlink:href="ncomms4706-f6"></graphic>
</fig>
<table-wrap position="float" id="t1"><label>Table 1</label>
<caption><title>Incidence of expression and functional diversification of fully retained acyl-lipid metabolism related genes in <italic>C. sativa</italic>
.</title>
</caption>
<table frame="hsides" rules="groups" border="1"><colgroup><col align="left"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="bottom"><tr><th align="left" valign="top" charoff="50"> </th>
<th align="center" valign="top" charoff="50"><bold>All</bold>
</th>
<th align="center" valign="top" charoff="50"><bold>Acyl-lipid metabolism</bold>
</th>
</tr>
</thead>
<tbody valign="top"><tr><td align="left" valign="top" charoff="50">Fully retained triplets</td>
<td align="center" valign="top" charoff="50">18,565</td>
<td align="center" valign="top" charoff="50">586</td>
</tr>
<tr><td align="left" valign="top" charoff="50">Significant expression divergence<xref ref-type="fn" rid="t1-fn1">*</xref>
</td>
<td align="char" valign="top" char="(" charoff="50">14,391 (77.5%)</td>
<td align="char" valign="top" char="(" charoff="50">497 (84.8%)</td>
</tr>
<tr><td align="left" valign="top" charoff="50">Significant interaction effect<xref ref-type="fn" rid="t1-fn2">†</xref>
</td>
<td align="char" valign="top" char="(" charoff="50">4,106 (22.1%)</td>
<td align="char" valign="top" char="(" charoff="50">181 (30.9%)</td>
</tr>
<tr><td align="left" valign="top" charoff="50">Gene silencing</td>
<td align="char" valign="top" char="(" charoff="50">900 (4.8%)</td>
<td align="char" valign="top" char="(" charoff="50">23 (3.9%)</td>
</tr>
<tr><td align="left" valign="top" charoff="50">Functional divergence</td>
<td align="char" valign="top" char="(" charoff="50">2,603 (14.0%)</td>
<td align="char" valign="top" char="(" charoff="50">90 (15.4%)</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="t1-fn1"><p><sup>*</sup>
G × T interaction; <italic>P</italic>
<0.05 (ANOVA test for interaction); sample size <italic>N</italic>
=108 per gene triplet (number of genes × number of tissue types (12) × number of replications (3)).</p>
</fn>
<fn id="t1-fn2"><p><sup>†</sup>
<italic>P</italic>
<0.05 (ANOVA test for interaction); STDEV (G × T random effects)>0.25; sample size <italic>N</italic>
=108 per gene triplet (number of genes × number of tissue types (12) × number of replications (3)).</p>
</fn>
</table-wrap-foot>
</table-wrap>
</floats-group>
</pmc>
</record>
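
As a quick consistency check on Table 1 in the record above, the percentages can be recomputed from the counts. The following is a minimal sketch, not part of the original record: the dictionary layout and the helper names `percentages` and `samples_per_triplet` are hypothetical, and the value of three genes per triplet is an assumption read from the word "triplet" in the table footnotes.

```python
# Counts copied from Table 1 of the record (fully retained triplets and subsets).
# The dictionary layout and function names are illustrative assumptions.
TABLE1 = {
    "All": {
        "total": 18565,
        "Significant expression divergence": 14391,
        "Significant interaction effect": 4106,
        "Gene silencing": 900,
        "Functional divergence": 2603,
    },
    "Acyl-lipid metabolism": {
        "total": 586,
        "Significant expression divergence": 497,
        "Significant interaction effect": 181,
        "Gene silencing": 23,
        "Functional divergence": 90,
    },
}


def percentages(column):
    """Return each category as a percentage of the fully retained triplets."""
    total = column["total"]
    return {
        name: round(100 * count / total, 1)
        for name, count in column.items()
        if name != "total"
    }


def samples_per_triplet(genes=3, tissues=12, replicates=3):
    """Sample size per gene triplet: genes x tissue types x replications.

    genes=3 is an assumption (one homeolog per sub-genome); the tissue and
    replication counts are the ones given in the table footnotes.
    """
    return genes * tissues * replicates


if __name__ == "__main__":
    for label, column in TABLE1.items():
        print(label, percentages(column))
    print("N per triplet:", samples_per_triplet())
```

Under these assumptions the recomputed values can be compared directly with the percentages printed in the table and with the sample size N=108 stated in its footnotes.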

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Bois/explor/OrangerV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000002 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000002 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Bois
   |area=    OrangerV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4015329
   |texte=   The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24759634" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OrangerV1

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 17:11:04 2016. Site generation: Wed Mar 6 18:18:32 2024
