<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">SynFind: Compiling Syntenic Regions across Any Set of Genomes on
Demand</title>
<author><name sortKey="Tang, Haibao" sort="Tang, Haibao" uniqKey="Tang H" first="Haibao" last="Tang">Haibao Tang</name>
<affiliation><nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bomhoff, Matthew D" sort="Bomhoff, Matthew D" uniqKey="Bomhoff M" first="Matthew D." last="Bomhoff">Matthew D. Bomhoff</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Briones, Evan" sort="Briones, Evan" uniqKey="Briones E" first="Evan" last="Briones">Evan Briones</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Liangsheng" sort="Zhang, Liangsheng" uniqKey="Zhang L" first="Liangsheng" last="Zhang">Liangsheng Zhang</name>
<affiliation><nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Schnable, James C" sort="Schnable, James C" uniqKey="Schnable J" first="James C." last="Schnable">James C. Schnable</name>
<affiliation><nlm:aff id="evv219-AFF3">Department of Agronomy and Horticulture, University of Nebraska, Lincoln</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Lyons, Eric" sort="Lyons, Eric" uniqKey="Lyons E" first="Eric" last="Lyons">Eric Lyons</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26560340</idno>
<idno type="pmc">4700967</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4700967</idno>
<idno type="RBID">PMC:4700967</idno>
<idno type="doi">10.1093/gbe/evv219</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000059</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">SynFind: Compiling Syntenic Regions across Any Set of Genomes on
Demand</title>
<author><name sortKey="Tang, Haibao" sort="Tang, Haibao" uniqKey="Tang H" first="Haibao" last="Tang">Haibao Tang</name>
<affiliation><nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Bomhoff, Matthew D" sort="Bomhoff, Matthew D" uniqKey="Bomhoff M" first="Matthew D." last="Bomhoff">Matthew D. Bomhoff</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Briones, Evan" sort="Briones, Evan" uniqKey="Briones E" first="Evan" last="Briones">Evan Briones</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Zhang, Liangsheng" sort="Zhang, Liangsheng" uniqKey="Zhang L" first="Liangsheng" last="Zhang">Liangsheng Zhang</name>
<affiliation><nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Schnable, James C" sort="Schnable, James C" uniqKey="Schnable J" first="James C." last="Schnable">James C. Schnable</name>
<affiliation><nlm:aff id="evv219-AFF3">Department of Agronomy and Horticulture, University of Nebraska, Lincoln</nlm:aff>
</affiliation>
</author>
<author><name sortKey="Lyons, Eric" sort="Lyons, Eric" uniqKey="Lyons E" first="Eric" last="Lyons">Eric Lyons</name>
<affiliation><nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Genome Biology and Evolution</title>
<idno type="eISSN">1759-6653</idno>
<imprint><date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>The identification of conserved syntenic regions enables discovery of predicted
locations for orthologous and homeologous genes, even when no such gene is present.
This capability means that synteny-based methods are far more effective than sequence
similarity-based methods in identifying true negatives, a necessity for studying gene
loss and gene transposition. However, the identification of syntenic regions requires
complex analyses which must be repeated for pairwise comparisons between any two
species. Therefore, as the number of published genomes increases, there is a growing
demand for scalable, simple-to-use applications to perform comparative genomic
analyses that cater to both gene family studies and genome-scale studies. We
implemented SynFind, a web-based tool that addresses this need. Given one query
genome, SynFind is capable of identifying conserved syntenic regions in any set of
target genomes. SynFind is capable of reporting per-gene information, useful for
researchers studying specific gene families, as well as genome-wide data sets of
syntenic gene and predicted gene locations, critical for researchers focused on
large-scale genomic analyses. Inference of syntenic homologs provides the basis for
correlation of functional changes around genes of interest between related
organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic
data from over 15,000 organisms from all domains of life as well as supporting
multiple releases of the same organism. SynFind makes use of a powerful job execution
framework that promises scalability and reproducibility. SynFind can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/CoGe/SynFind.pl">http://genomevolution.org/CoGe/SynFind.pl</ext-link>
. A video tutorial of SynFind
using <italic>Phytophthora</italic>
as an example is available at <ext-link ext-link-type="uri" xlink:href="http://www.youtube.com/watch?v=2Agczny9Nyc">http://www.youtube.com/watch?v=2Agczny9Nyc</ext-link>
.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Barbaglia, Am" uniqKey="Barbaglia A">AM Barbaglia</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Baxter, L" uniqKey="Baxter L">L Baxter</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bowers, Je" uniqKey="Bowers J">JE Bowers</name>
</author>
<author><name sortKey="Chapman, Ba" uniqKey="Chapman B">BA Chapman</name>
</author>
<author><name sortKey="Rong, J" uniqKey="Rong J">J Rong</name>
</author>
<author><name sortKey="Paterson, Ah" uniqKey="Paterson A">AH Paterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Byrne, Kp" uniqKey="Byrne K">KP Byrne</name>
</author>
<author><name sortKey="Wolfe, Kh" uniqKey="Wolfe K">KH Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Cai, B" uniqKey="Cai B">B Cai</name>
</author>
<author><name sortKey="Yang, X" uniqKey="Yang X">X Yang</name>
</author>
<author><name sortKey="Tuskan, Ga" uniqKey="Tuskan G">GA Tuskan</name>
</author>
<author><name sortKey="Cheng, Zm" uniqKey="Cheng Z">ZM Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chalhoub, B" uniqKey="Chalhoub B">B Chalhoub</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Charles, M" uniqKey="Charles M">M Charles</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Davidson, Rm" uniqKey="Davidson R">RM Davidson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dewey, Cn" uniqKey="Dewey C">CN Dewey</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author><name sortKey="Fredman, D" uniqKey="Fredman D">D Fredman</name>
</author>
<author><name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Engstrom, Pg" uniqKey="Engstrom P">PG Engstrom</name>
</author>
<author><name sortKey="Ho Sui, Sj" uniqKey="Ho Sui S">SJ Ho Sui</name>
</author>
<author><name sortKey="Drivenes, O" uniqKey="Drivenes O">O Drivenes</name>
</author>
<author><name sortKey="Becker, Ts" uniqKey="Becker T">TS Becker</name>
</author>
<author><name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ghiurcuta, Cg" uniqKey="Ghiurcuta C">CG Ghiurcuta</name>
</author>
<author><name sortKey="Moret, Bm" uniqKey="Moret B">BM Moret</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Green, Re" uniqKey="Green R">RE Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haudry, A" uniqKey="Haudry A">A Haudry</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heger, A" uniqKey="Heger A">A Heger</name>
</author>
<author><name sortKey="Ponting, Cp" uniqKey="Ponting C">CP Ponting</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hofberger, Ja" uniqKey="Hofberger J">JA Hofberger</name>
</author>
<author><name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author><name sortKey="Edger, Pp" uniqKey="Edger P">PP Edger</name>
</author>
<author><name sortKey="Chris Pires, J" uniqKey="Chris Pires J">J Chris Pires</name>
</author>
<author><name sortKey="Eric Schranz, M" uniqKey="Eric Schranz M">M Eric Schranz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ibarra Laclette, E" uniqKey="Ibarra Laclette E">E Ibarra-Laclette</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jin, Q" uniqKey="Jin Q">Q Jin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kielbasa, Sm" uniqKey="Kielbasa S">SM Kielbasa</name>
</author>
<author><name sortKey="Wan, R" uniqKey="Wan R">R Wan</name>
</author>
<author><name sortKey="Sato, K" uniqKey="Sato K">K Sato</name>
</author>
<author><name sortKey="Horton, P" uniqKey="Horton P">P Horton</name>
</author>
<author><name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lai, J" uniqKey="Lai J">J Lai</name>
</author>
<author><name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author><name sortKey="Messing, J" uniqKey="Messing J">J Messing</name>
</author>
<author><name sortKey="Dooner, Hk" uniqKey="Dooner H">HK Dooner</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Law, M" uniqKey="Law M">M Law</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author><name sortKey="Stoeckert, Cj" uniqKey="Stoeckert C">CJ Stoeckert</name>
</author>
<author><name sortKey="Roos, Ds" uniqKey="Roos D">DS Roos</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ling, X" uniqKey="Ling X">X Ling</name>
</author>
<author><name sortKey="He, X" uniqKey="He X">X He</name>
</author>
<author><name sortKey="Xin, D" uniqKey="Xin D">D Xin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lohr, S" uniqKey="Lohr S">S Lohr</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author><name sortKey="Pedersen, B" uniqKey="Pedersen B">B Pedersen</name>
</author>
<author><name sortKey="Kane, J" uniqKey="Kane J">J Kane</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Moreno Hagelsieb, G" uniqKey="Moreno Hagelsieb G">G Moreno-Hagelsieb</name>
</author>
<author><name sortKey="Trevino, V" uniqKey="Trevino V">V Trevino</name>
</author>
<author><name sortKey="Perez Rueda, E" uniqKey="Perez Rueda E">E Perez-Rueda</name>
</author>
<author><name sortKey="Smith, Tf" uniqKey="Smith T">TF Smith</name>
</author>
<author><name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ng, Mp" uniqKey="Ng M">MP Ng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ostlund, G" uniqKey="Ostlund G">G Ostlund</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Poyatos, Jf" uniqKey="Poyatos J">JF Poyatos</name>
</author>
<author><name sortKey="Hurst, Ld" uniqKey="Hurst L">LD Hurst</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Proost, S" uniqKey="Proost S">S Proost</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Revanna, Kv" uniqKey="Revanna K">KV Revanna</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rodelsperger, C" uniqKey="Rodelsperger C">C Rodelsperger</name>
</author>
<author><name sortKey="Dieterich, C" uniqKey="Dieterich C">C Dieterich</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
<author><name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sinha, Au" uniqKey="Sinha A">AU Sinha</name>
</author>
<author><name sortKey="Meller, J" uniqKey="Meller J">J Meller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Soderlund, C" uniqKey="Soderlund C">C Soderlund</name>
</author>
<author><name sortKey="Bomhoff, M" uniqKey="Bomhoff M">M Bomhoff</name>
</author>
<author><name sortKey="Nelson, Wm" uniqKey="Nelson W">WM Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Bowers, Je" uniqKey="Bowers J">JE Bowers</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tettelin, H" uniqKey="Tettelin H">H Tettelin</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Thrasher, A" uniqKey="Thrasher A">A Thrasher</name>
</author>
<author><name sortKey="Thain, D" uniqKey="Thain D">D Thain</name>
</author>
<author><name sortKey="Emrich, S" uniqKey="Emrich S">S Emrich</name>
</author>
<author><name sortKey="Musgrave, Z" uniqKey="Musgrave Z">Z Musgrave</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Vergara, Ia" uniqKey="Vergara I">IA Vergara</name>
</author>
<author><name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Waters, Aj" uniqKey="Waters A">AJ Waters</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wolfe, Kh" uniqKey="Wolfe K">KH Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Woodhouse, Mr" uniqKey="Woodhouse M">MR Woodhouse</name>
</author>
<author><name sortKey="Pedersen, B" uniqKey="Pedersen B">B Pedersen</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Woodhouse, Mr" uniqKey="Woodhouse M">MR Woodhouse</name>
</author>
<author><name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author><name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
<front><journal-meta><journal-id journal-id-type="nlm-ta">Genome Biol Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Genome Biol Evol</journal-id>
<journal-id journal-id-type="publisher-id">gbe</journal-id>
<journal-id journal-id-type="hwp">gbe</journal-id>
<journal-title-group><journal-title>Genome Biology and Evolution</journal-title>
</journal-title-group>
<issn pub-type="epub">1759-6653</issn>
<publisher><publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">26560340</article-id>
<article-id pub-id-type="pmc">4700967</article-id>
<article-id pub-id-type="doi">10.1093/gbe/evv219</article-id>
<article-id pub-id-type="publisher-id">evv219</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Genome Resources</subject>
</subj-group>
</article-categories>
<title-group><article-title>SynFind: Compiling Syntenic Regions across Any Set of Genomes on
Demand</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Tang</surname>
<given-names>Haibao</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF1"><sup>1</sup>
</xref>
<xref ref-type="aff" rid="evv219-AFF2"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Bomhoff</surname>
<given-names>Matthew D.</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Briones</surname>
<given-names>Evan</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Zhang</surname>
<given-names>Liangsheng</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF1"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Schnable</surname>
<given-names>James C.</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF3"><sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Lyons</surname>
<given-names>Eric</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2"><sup>2</sup>
</xref>
<xref ref-type="corresp" rid="evv219-COR1">*</xref>
</contrib>
<aff id="evv219-AFF1"><sup>1</sup>
Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</aff>
<aff id="evv219-AFF2"><sup>2</sup>
School of Plant Sciences, iPlant Collaborative, University of Arizona</aff>
<aff id="evv219-AFF3"><sup>3</sup>
Department of Agronomy and Horticulture, University of Nebraska, Lincoln</aff>
</contrib-group>
<author-notes><corresp id="evv219-COR1">*Corresponding author: E-mail:
<email>elyons.uoa@gmail.com</email>
.</corresp>
<fn id="FN1"><p><bold>Associate editor:</bold>
Kenneth Wolfe</p>
</fn>
</author-notes>
<pub-date pub-type="collection"><month>12</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub"><day>11</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>11</day>
<month>11</month>
<year>2015</year>
</pub-date>
<volume>7</volume>
<issue>12</issue>
<fpage>3286</fpage>
<lpage>3298</lpage>
<history><date date-type="accepted"><day>6</day>
<month>11</month>
<year>2015</year>
</date>
</history>
<permissions><copyright-statement>© The Author(s) 2015. Published by Oxford University Press on
behalf of the Society for Molecular Biology and Evolution.</copyright-statement>
<copyright-year>2015</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" license-type="creative-commons"><license-p>This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits
unrestricted reuse, distribution, and reproduction in any medium, provided the
original work is properly cited.</license-p>
</license>
</permissions>
<abstract><p>The identification of conserved syntenic regions enables discovery of predicted
locations for orthologous and homeologous genes, even when no such gene is present.
This capability means that synteny-based methods are far more effective than sequence
similarity-based methods in identifying true negatives, a necessity for studying gene
loss and gene transposition. However, the identification of syntenic regions requires
complex analyses which must be repeated for pairwise comparisons between any two
species. Therefore, as the number of published genomes increases, there is a growing
demand for scalable, simple-to-use applications to perform comparative genomic
analyses that cater to both gene family studies and genome-scale studies. We
implemented SynFind, a web-based tool that addresses this need. Given one query
genome, SynFind is capable of identifying conserved syntenic regions in any set of
target genomes. SynFind is capable of reporting per-gene information, useful for
researchers studying specific gene families, as well as genome-wide data sets of
syntenic gene and predicted gene locations, critical for researchers focused on
large-scale genomic analyses. Inference of syntenic homologs provides the basis for
correlation of functional changes around genes of interest between related
organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic
data from over 15,000 organisms from all domains of life as well as supporting
multiple releases of the same organism. SynFind makes use of a powerful job execution
framework that promises scalability and reproducibility. SynFind can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/CoGe/SynFind.pl">http://genomevolution.org/CoGe/SynFind.pl</ext-link>
. A video tutorial of SynFind
using <italic>Phytophthora</italic>
as an example is available at <ext-link ext-link-type="uri" xlink:href="http://www.youtube.com/watch?v=2Agczny9Nyc">http://www.youtube.com/watch?v=2Agczny9Nyc</ext-link>
.</p>
</abstract>
<kwd-group><kwd>synteny</kwd>
<kwd>homology</kwd>
<kwd>genome evolution</kwd>
<kwd>cyberinfrastructure</kwd>
</kwd-group>
<counts><page-count count="13"></page-count>
</counts>
</article-meta>
</front>
<body><sec sec-type="intro"><title>Introduction</title>
<p>Conserved synteny refers to an inferred homology relationship between genes that is
supported by a shared genomic neighborhood, and is a widely used measurement of
evolutionary divergence across all domains of life (<xref rid="evv219-B30" ref-type="bibr">Moreno-Hagelsieb et al. 2001</xref>
; <xref rid="evv219-B12" ref-type="bibr">Engstrom et al. 2007</xref>
; <xref rid="evv219-B18" ref-type="bibr">Heger and Ponting 2007</xref>
; <xref rid="evv219-B33" ref-type="bibr">Poyatos and
Hurst 2007</xref>
; <xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al.
2008</xref>
). Conserved synteny is evident when large sets of genes or genomic
features are preserved in close proximity (synteny), and often in the same order and
orientations (colinearity) (<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al.
2008</xref>
). Conserved synteny across species lays an essential foundation for
genomic research, including map-based cloning, validating predicted gene models (<xref rid="evv219-B24" ref-type="bibr">Law et al. 2015</xref>
), and identifying conserved
noncoding sequences (<xref rid="evv219-B17" ref-type="bibr">Haudry et al. 2013</xref>
).
Conserved synteny within species identifies ancient polyploidy events or other types of
large-scale genomic duplications (<xref rid="evv219-B52" ref-type="bibr">Wolfe
2001</xref>
).</p>
<p>Synteny provides an extra layer of information to confirm gene homology, and is much
more reliable than inference based on sequence similarity alone. Results from a
typical Basic Local Alignment Search Tool (BLAST) analysis do not easily indicate
whether a gene has been lost or transposed. Popular approaches based on the
reciprocal best hit neither take into account the ancestral state of a genome nor provide
much insight into the evolutionary history of a gene or gene family. More generally,
protein clustering algorithms such as OrthoMCL (<xref rid="evv219-B25" ref-type="bibr">Li et al. 2003</xref>
) and INPARANOID (<xref rid="evv219-B32" ref-type="bibr">Ostlund et al. 2010</xref>
) may be successful for single copy gene families when
evolutionary rates are constant, but can be confounded by accelerated rates of evolution
in certain gene copies, and will sometimes produce false-positive assignments of
orthology, particularly in cases of reciprocal loss of paralogous genes between species.
Positional studies that track gene movements over evolutionary time require more
gene-centric synteny tools (<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al.
2011</xref>
).</p>
<p>Curated syntenic gene sets are critical tools for deriving genome-scale patterns and
evolutionary trends, and are widely popular (<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al. 2011</xref>
; <xref rid="evv219-B3" ref-type="bibr">Baxter et al.
2012</xref>
; <xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
).
Unfortunately, construction of robust and accurate syntenic data sets requires a set of
specialized comparative genomic skills currently limited to a small number of research
groups. Until now, the primary method by which the broader research community has employed
syntenic information in its research has been through manually curated syntenic gene sets
published by these groups. Manually curated gene sets are inherently limiting because,
as a result of the lag introduced by the publication cycle, by the time a given syntenic
gene set is published, genome assemblies for new species will often have become
available, and genome assemblies, annotations, and gene identifiers will often have been
updated for existing published genomes. With genome sequence assemblies being released at an
ever-increasing pace, there is a need for tools that enable individual researchers to
rapidly identify syntenic regions between species.</p>
<p>The majority of community use of synteny data falls into one of two use
cases: 1) researchers interested in a specific gene from a specific species who want to
rapidly find the syntenic ortholog(s) of their target gene in one or more additional
species and 2) researchers who want to trace changes in the positional history of a
single gene or gene family across a population of related species. In addition to the
lag time introduced in publishing syntenic gene lists, most published lists only provide
information on conserved syntenic orthologs, but do not provide information on predicted
syntenic locations for genes where no syntenic orthologs are found. This severely limits
their utility for use case #2 above, as it strips out one of the key advantages of
syntenic analysis, the ability to identify confident sets of “true
negatives.” True negatives include both lineage specific, recently inserted genes
(also known as the “gray genome”) (<xref rid="evv219-B13" ref-type="bibr">Freeling et al. 2008</xref>
), and genes conserved at syntenic locations across
multiple species in a clade but deleted from the genomes of one or more specific
species. Many evolutionary studies require knowing whether a given gene is
truly absent or has been relocated from a genomic region (transposition). Distinguishing
transposition from gene removal is critical because potential changes in gene expression
patterns are different under these two scenarios.</p>
<p>Identification of syntenic genes has additional advantages for functional research
studies, as syntenic homologs are more likely to retain the same expression pattern than
nonsyntenic homologs (<xref rid="evv219-B10" ref-type="bibr">Dewey 2011</xref>
; <xref rid="evv219-B37" ref-type="bibr">Schnable 2015</xref>
). Orthologous genes (as
identified by OrthoMCL) at nonsyntenic locations show reduced correlation in expression
pattern between different grass species (<xref rid="evv219-B9" ref-type="bibr">Davidson
et al. 2012</xref>
). Genes captured by helitrons and relocated to a new genomic
neighborhood in maize show novel patterns of expression (<xref rid="evv219-B2" ref-type="bibr">Barbaglia et al. 2012</xref>
). Common methods of gene
transposition—transposon capture (<xref rid="evv219-B23" ref-type="bibr">Lai et
al. 2005</xref>
) and intrachromosomal recombination (<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
)—can often carry protein-coding
sequence of a gene without the associated regulatory sequences. A study in maize also
found that genes retained in syntenic positions across multiple grass species were
significantly more likely than nonsyntenic genes to produce visible mutant phenotypes
when knocked out (<xref rid="evv219-B38" ref-type="bibr">Schnable and Freeling
2011</xref>
), further highlighting the functional relevance of synteny information in
the validation of direct functional homologs.</p>
<p>As we provide a novel implementation of yet another synteny-finding tool, we offer an
overview of popular synteny-finding algorithms, including tools previously designed and
implemented by several of the authors. In general, synteny-finding algorithms can be
grouped by whether they rely on positional colinearity or positional density, by the type
of statistical features they search for (<xref rid="evv219-B14" ref-type="bibr">Ghiurcuta and Moret 2014</xref>
), and
by their definition of “syntenic block.” A list of recent synteny search
software includes iAdHore (<xref rid="evv219-B34" ref-type="bibr">Proost et al.
2012</xref>
), mGSV (<xref rid="evv219-B35" ref-type="bibr">Revanna et al.
2012</xref>
), SyMAP (<xref rid="evv219-B41" ref-type="bibr">Soderlund et al.
2011</xref>
), SynMap (<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
),
OrthoCluster (<xref rid="evv219-B48" ref-type="bibr">Vergara and Chen 2010</xref>
),
Synorth (<xref rid="evv219-B11" ref-type="bibr">Dong et al. 2009</xref>
), MCScan (<xref rid="evv219-B45" ref-type="bibr">Tang, Wang, et al. 2008</xref>
), and MCScanX (<xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
) among many others. These
synteny search tools vary greatly in the trade-offs accepted by the authors in terms
of run time, computational resource requirements, and the goal of minimizing either type I
(false positive) or type II (false negative) errors. In addition, from a pragmatic
standpoint, the tools are also distinguished by interface type (i.e., command line, web
based) and whether a given tool offers the built-in functionality to provide graphical
outputs, enabling visual proofing of results. Herein, we provide a review of major
features of recent synteny-finding software in <xref ref-type="table" rid="evv219-T1">table 1</xref>
. <table-wrap id="evv219-T1" orientation="portrait" position="float"><label>Table 1</label>
<caption><p>Comparison of Major Features of Synteny-Based Homology Detection Software</p>
</caption>
<table frame="hsides" rules="groups"><thead align="left"><tr><th rowspan="1" colspan="1">Tool</th>
<th rowspan="1" colspan="1">References</th>
<th rowspan="1" colspan="1">Interface</th>
<th rowspan="1" colspan="1">Multiple Genomes</th>
<th rowspan="1" colspan="1">Syntenic Families</th>
<th rowspan="1" colspan="1">Infer Gene Loss</th>
<th rowspan="1" colspan="1">Scoring Mode</th>
<th rowspan="1" colspan="1">Parallel Computing</th>
<th rowspan="1" colspan="1">Integration with Data</th>
</tr>
</thead>
<tbody align="left"><tr><td rowspan="1" colspan="1">ColinearScan</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B49" ref-type="bibr">Wang et al. (2006)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">Cinteny</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B40" ref-type="bibr">Sinha and Meller
(2007)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Limited (∼20)</td>
</tr>
<tr><td rowspan="1" colspan="1">MCScan</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al.
(2008)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">SynMap</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B29" ref-type="bibr">Lyons et al. (2008)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Hybrid</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">CoGe (∼25K)</td>
</tr>
<tr><td rowspan="1" colspan="1">MCMuSeC</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B26" ref-type="bibr">Ling et al. (2009)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">Synteny</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">OrthoClusterDB</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B31" ref-type="bibr">Ng et al. (2009)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1">Limited</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Limited (∼50)</td>
</tr>
<tr><td rowspan="1" colspan="1">Cyntenator</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B36" ref-type="bibr">Rodelsperger and Dieterich
(2010)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">MicroSyn</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B6" ref-type="bibr">Cai et al. (2011)</xref>
</td>
<td rowspan="1" colspan="1">GUI</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Synteny</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">SyMAP</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B41" ref-type="bibr">Soderlund et al.
(2011)</xref>
</td>
<td rowspan="1" colspan="1">GUI/Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Hybrid</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Limited (∼10)</td>
</tr>
<tr><td rowspan="1" colspan="1">MCScanX</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B50" ref-type="bibr">Wang et al. (2012)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">i-ADHoRe</td>
<td rowspan="1" colspan="1"><xref rid="evv219-B34" ref-type="bibr">Proost et al. (2012)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
<td rowspan="1" colspan="1">Both/Hybrid</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">−</td>
</tr>
<tr><td rowspan="1" colspan="1">SynFind</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Command/Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">Both</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">CoGe (∼25K)</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="evv219-TF1"><p>N<sc>ote</sc>
.—The tools published in the last 10 years are given in
the table. Symbols + and − represent yes and no, respectively.
“Scoring mode” is the optimization goal used in identifying
syntenic regions. “Colinear” requires the gene order to be
preserved; “Synteny” does not enforce conserved gene order;
“Hybrid” uses “Colinear” initially and recruits
imperfect synteny; “Both” supports both modes as program
options. “Integration with data” is a count of available genomes
for immediate use with a given tool.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>A careful evaluation of these algorithms suggested fundamental challenges that are still
not met for more general uses. First and foremost, data curation is often a significant
challenge (<xref rid="evv219-B27" ref-type="bibr">Lohr 2014</xref>
), requiring users to
convert genomic annotation files into a range of idiosyncratic file formats required by
different algorithms. Many tools are run from the command line, and often obtaining the
most accurate results from a given tool will require experimentation with a range of
settings, presenting an additional challenge to users who must develop methods of
evaluating and ranking multiple output data sets. As the number of organisms a user is
interested in comparing grows, computational time requirements will often scale
quadratically, presenting challenges for these primarily offline algorithms.</p>
<p>After working closely with researchers in the community over the past few years, it became
clear to us that the life cycle of gene synteny analysis requires running multiple algorithms
to create input homology data (different BLAST-like algorithms), adjusting parameters
on-the-fly (configurable thresholds), as well as allowing different
synteny-finding/scoring schemes (colinear vs. density) (<xref ref-type="table" rid="evv219-T1">table 1</xref>
). Following the same design principle as other CoGe
tools, we continue to adopt a cloud-based implementation that offers a one-stop solution
that combines user-configurable input data (genomes and structural annotations),
algorithms, scalable computing resources (parallelization, memory, and storage),
integrated visualization, links to additional tools for further data analysis, readily
exportable results, and reproducibility through permanent URLs.</p>
<p>Our new online method, SynFind, has a number of features not typically found in other
systems (<xref ref-type="table" rid="evv219-T1">table 1</xref>
) that reflect recent
innovations in comparative genomic analysis adopted in a few newly sequenced genomes
(<xref rid="evv219-B1" ref-type="bibr">Amborella Genome Project 2013</xref>
; <xref rid="evv219-B20" ref-type="bibr">Ibarra-Laclette et al. 2013</xref>
; <xref rid="evv219-B7" ref-type="bibr">Chalhoub et al. 2014</xref>
; <xref rid="evv219-B16" ref-type="bibr">Green et al. 2014</xref>
). SynFind identifies multiple syntenic
regions between a gene in a reference genome and a target genome, entirely independently
of whether a syntenic ortholog or paralog is present at the predicted location.
SynFind provides the option for both density and colinear scoring of syntenic regions to
address the different structural genomic changes in taxa with different evolutionary
distances and different genome assembly qualities. SynFind generates syntenic depth
tables as well as a gene presence–absence table to reveal ancient polyploidy events
and genes unique to one genome against others. Most critically, the integration with
CoGe provides instant access to thousands of genomes across all domains of life along
with CoGe’s tools to let users add new genomes, keep them private, and compare
them using SynFind as rapidly as they are released. Tight integration with up-to-date
genomic data facilitates access to computing resources, downstream visualization and
analysis tools, thereby creating an open-ended pipeline of research that facilitates
exploration of multidimensional genomic data sets that bridge evolutionary genomics and
functional genomics.</p>
</sec>
<sec sec-type="materials|methods"><title>Materials and Methods</title>
<sec><title>Synteny Score</title>
<p>SynFind processes putatively homologous gene pairs in order to extract the syntenic
blocks, using each gene as query. Gene pairs are computed from sequence similarity
search programs, such as BLAST, LASTZ, or LAST (<xref rid="evv219-B22" ref-type="bibr">Kielbasa et al. 2011</xref>
). The modular architecture of SynFind
allows the straightforward incorporation of new sequence similarity search algorithms
in the future. Although SynFind can output information for a single gene, in each
run, syntenic regions in the target genome(s) are identified for every annotated gene
in the query genome. Extra caution is taken with genes which are members of tandem
arrays (groups of homologous genes clustered together in the genome) as matches among
such genes are likely overcounted and show up as false-positive synteny blocks.
Consequently, tandem matches are reduced to a single copy in this step to avoid
seeding a synteny block inside a tandem array. The treatment of tandem arrays is
similar to the strategy used in MCScanX and i-ADHoRe (<xref rid="evv219-B34" ref-type="bibr">Proost et al. 2012</xref>
; <xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
).</p>
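<p>As an illustration only, the tandem reduction described above can be sketched in Python. The function name collapse_tandems, the max_gap cutoff, and the choice of the upstream-most gene as representative are assumptions made for this sketch and are not taken from the SynFind source.</p>

```python
def collapse_tandems(order, homologs, max_gap=10):
    """Map every gene to the representative of its tandem array.

    order    -- dict: gene name -> (chromosome, rank along the chromosome)
    homologs -- iterable of (gene_a, gene_b) homologous pairs within one genome
    max_gap  -- hypothetical cutoff: homologs at most this many genes apart
                on the same chromosome are treated as tandem duplicates
    """
    parent = {gene: gene for gene in order}

    def find(gene):
        # Follow parent links to the root, shortening the path on the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in homologs:
        chrom_a, rank_a = order[a]
        chrom_b, rank_b = order[b]
        if chrom_a == chrom_b and max_gap >= abs(rank_a - rank_b):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Assumption: keep the upstream-most gene as representative.
                keep, drop = sorted((root_a, root_b), key=lambda g: order[g])
                parent[drop] = keep
    return {gene: find(gene) for gene in order}
```

<p>Gene pairs would then be rewritten in terms of the representatives before seeding, so that a tandem array contributes at most one match to any window.</p>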
<p>To seed synteny blocks, our algorithm works by selecting a fixed number of genes up
and downstream from the query gene (<xref ref-type="fig" rid="evv219-F1">fig.
1</xref>
<italic>A</italic>
). This method is robust with respect to variation in
gene density and intergenic spacing observed across different species. All gene pairs
to a target genome between the region surrounding the gene of interest and candidate
syntenic locations in the target genome are then identified and the number of
matching gene pairs is counted as the “synteny score” (<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>B</italic>
). SynFind provides
positioning cues for visualization through genome browsers. Comparisons across sets
of homologous regions are facilitated through automated centering and truncation of
colinear panels. The middle gene of the current window or the “query” is
used as the center of the syntenic panels. The extent of syntenic gene pairs in
the current window can be used to truncate the matching panels to focus on a
particular region of interest. Finally, SynFind automatically flips sequences so
syntenic regions are visualized on the same strand for clarity. These data are useful
in automatically creating local syntenic views in CoGe for subsequent manual
validation. <fig id="evv219-F1" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 1.—</label>
<caption><p>Illustration of three key steps in SynFind. The three key steps include
(<italic>A</italic>
) extraction of genomic neighborhood,
(<italic>B</italic>
) gene pair generation and scoring of each matching
region, and (<italic>C</italic>
) identification of flankers (neighboring
gene pairs) and annotation of syntelog class.</p>
</caption>
<graphic xlink:href="evv219f1p"></graphic>
</fig>
</p>
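<p>The positioning cues described above (truncating a matching panel to the extent of its syntenic gene pairs, and flipping it when needed) can be illustrated with a short Python sketch. The function name panel_cues and the majority-vote rule for orientation are assumptions for illustration; the text does not specify how SynFind decides when to flip a region.</p>

```python
def panel_cues(anchors):
    """Derive display hints for one matching region.

    anchors -- non-empty list of (query_rank, target_rank) syntenic gene
               pairs found in the current window
    Returns (target_start, target_end, flip).
    """
    if not anchors:
        raise ValueError("need at least one anchor pair")
    targets = [t for _, t in anchors]
    # Truncate the target panel to the extent of the syntenic gene pairs.
    start, end = min(targets), max(targets)

    # Compare every two anchors: do they run in the same or opposite order?
    ordered = sorted(anchors)
    same = opposite = 0
    for i, (q1, t1) in enumerate(ordered):
        for q2, t2 in ordered[i + 1:]:
            if q2 == q1 or t2 == t1:
                continue  # ties carry no orientation information
            if t2 > t1:
                same += 1
            else:
                opposite += 1
    # Assumed rule: flip the target panel when most anchors run in reverse.
    flip = opposite > same
    return start, end, flip
```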
<p>The output of the seeding step consists of syntenic gene pairs and a score to
indicate the level of conserved synteny between their respective genomic locations.
For each target region found, the synteny score reflects the number of gene pairs
that are syntenic or colinear within the window, depending on the scoring function.
When a matching region is found, the flanking genes for the query gene are identified
and the status of the syntelog is tracked in a single letter notation—S/F/G,
following the nomenclature in <xref rid="evv219-B54" ref-type="bibr">Woodhouse et al.
(2011)</xref>
. S is “syntelog,” which means that it has a match to the
region. In this case, the match itself is used to represent the region. In contrast,
F class and G class refer to cases in which the syntelog is missing (fractionated or
moved) from the syntenic region identified in the target genome. F has both flankers
present, whereas G has only one flanker (<xref ref-type="fig" rid="evv219-F1">fig.
1</xref>
<italic>C</italic>
). G class syntenic regions are largely the result of
adjacent genomic rearrangements (inversions and translocations) in either the target
or query genome, but can also occur at the end of pseudomolecules, scaffolds, or
contigs. In the case of F or G, a flanker gene is used to represent the region as a
“proxy” to identify the approximate location of where a syntelog is
expected to reside in the target genome.</p>
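<p>The S/F/G notation can be summarized in a small Python sketch. The function name syntelog_class, its arguments, and the rule of using the nearest flanker as the proxy are assumptions for illustration; the text only states that a flanker gene serves as the proxy.</p>

```python
def syntelog_class(query_rank, anchors, query_match=None):
    """Return (class_letter, representative_gene) for one matching region.

    query_rank  -- rank of the query gene along its chromosome
    anchors     -- list of (rank_of_neighbor_in_query_genome, target_gene)
                   for neighboring genes that have a match in the region
    query_match -- target gene matching the query itself, or None
    """
    if query_match is not None:
        # S: the syntelog is present and represents the region itself.
        return "S", query_match
    upstream = [(rank, gene) for rank, gene in anchors if query_rank > rank]
    downstream = [(rank, gene) for rank, gene in anchors if rank > query_rank]
    left = max(upstream, default=None)      # closest flanker upstream
    right = min(downstream, default=None)   # closest flanker downstream
    if left is not None and right is not None:
        # F: both flankers present; assumed rule: nearest flanker is the proxy.
        nearer = left if right[0] - query_rank >= query_rank - left[0] else right
        return "F", nearer[1]
    if left is not None or right is not None:
        # G: only one flanker present; it serves as the proxy.
        proxy = left if left is not None else right
        return "G", proxy[1]
    return None, None
```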
<p>As a final validation, we recover tandem matches by checking against the original
BLAST output as the tandem matches were reduced to single copy prior to the
“seeding” step. This validation step increases the sensitivity of SynFind
for genes inside tandem arrays. A single best match among the tandem array is
selected to be the representative syntelog for a query gene, for the sake of clarity.
The source code of SynFind can be found at <ext-link ext-link-type="uri" xlink:href="https://github.com/tanghaibao/quota-alignment/blob/master/scripts/synteny_score.py">https://github.com/tanghaibao/quota-alignment/blob/master/scripts/synteny_score.py</ext-link>
(last accessed November 30, 2015).</p>
</sec>
<sec><title>Choice of Parameters: Beauty in Simplicity</title>
<p>There are a few intuitive, user-configurable parameters that adjust sensitivity or
specificity of SynFind.</p>
<sec><title>Window Size: Window Size in Number of Neighboring Genes (Default: 40)</title>
<p>Given an anchor gene, SynFind searches upstream and downstream half a window size
from the query. For example, a window size of 40 means that a total of 41 genes
are checked: The query gene, plus 20 upstream genes and 20 downstream genes (<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>A</italic>
).</p>
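<p>A minimal Python sketch of the window extraction follows, assuming genes on a chromosome are held in a positionally ordered list. The function name neighborhood is hypothetical. Because the window is counted in genes rather than base pairs, it is unaffected by gene density or intergenic spacing.</p>

```python
def neighborhood(genes, query, window=40):
    """Return the query gene plus up to window/2 genes on each side.

    genes  -- gene names on one chromosome, in positional order
    query  -- name of the query gene (must occur in genes)
    window -- window size in number of neighboring genes
    """
    half = window // 2
    i = genes.index(query)
    # Slicing truncates the window at either end of the chromosome.
    return genes[max(0, i - half): i + half + 1]
```

<p>With the default window size of 40, a gene far from a chromosome end yields 41 genes; a gene near the end of a pseudomolecule, scaffold, or contig yields fewer.</p>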
<p><italic>Minimum synteny score</italic>
: The minimum number of anchoring genes to
call a region “syntenic.”</p>
<p>The combination of “window size” and “minimum number of
genes” together controls the sensitivity and specificity of the algorithm
(<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>B</italic>
). The
default value of 4 means that a region is considered syntenic if at least 4 of the 41 genes are
syntenic. This threshold is capable of finding weakly homologous regions, such as
regions undergoing a high degree of fractionation following polyploidy. In our tests,
moving the threshold below 10% often risked false
positives due to repeats and gene transpositions.</p>
</sec>
<sec><title>Scoring Function</title>
<p>Scoring can be based on colinearity or density. For colinearity, a colinear
arrangement of syntenic genes is enforced, based on the “longest increasing
subsequence” method (<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al.
2011</xref>
). For density, we use single-linkage clustering to group gene pairs
within the window in comparison, and any arrangement of gene-pairs is tolerated.
Although colinearity is frequently used in plant genome comparisons, synteny
without requiring shared order is often the only criterion in the comparison of
insect and vertebrate genomes, due to different rates and scales of inversions and
translocations between plant and animal genomes (<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). The two different scoring
functions allow flexibility in accommodating taxa with different modes of
karyotypic evolution.</p>
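<p>The two scoring modes can be contrasted in a short Python sketch. It assumes that gene pairs are given as (query rank, target rank) tuples with one pair per query gene, that colinear runs may be in either orientation, and that density clusters are chained by a hypothetical max_dist cutoff along the target genome. None of these names or defaults are taken from the SynFind source.</p>

```python
from bisect import bisect_left


def lis_length(values):
    """Length of the longest strictly increasing subsequence."""
    tails = []  # tails[k]: smallest tail of an increasing run of length k + 1
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def colinear_score(pairs):
    """Number of gene pairs that preserve gene order in either orientation."""
    targets = [t for _, t in sorted(pairs)]
    forward = lis_length(targets)
    reverse = lis_length([-t for t in targets])  # inverted regions
    return max(forward, reverse)


def density_clusters(pairs, max_dist=20):
    """Single-linkage clustering of gene pairs along the target genome.

    Pairs whose target ranks are within max_dist of the previous pair join
    the same cluster; gene order inside a cluster is not checked.
    """
    clusters = []
    for pair in sorted(pairs, key=lambda p: p[1]):
        if clusters and max_dist >= pair[1] - clusters[-1][-1][1]:
            clusters[-1].append(pair)
        else:
            clusters.append([pair])
    return clusters
```

<p>Under density scoring the size of each cluster plays the role of the synteny score, whereas under colinear scoring only the pairs on the longest ordered chain count. In both cases a region would be retained only if its score reaches the minimum synteny score.</p>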
</sec>
<sec><title>Maximum Syntenic Depth: Limit the Number of Syntenic Regions Up To the
Specified Depth</title>
<p>This parameter is useful in lineages with shared duplication events. Enforcing the
syntenic depth allows screening of regions derived from specific evolutionary
events (<xref rid="evv219-B43" ref-type="bibr">Tang et al. 2011</xref>
). In
particular, enforcing a maximum syntenic depth of 1 between species which are
diploid relative to each other, but share one or more ancient whole-genome
duplications (WGDs) would limit the search to only orthologous regions. The
default is to output all syntenic regions found.</p>
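<p>Enforcing a maximum syntenic depth amounts to a simple filter, sketched below in Python. The function name limit_depth is hypothetical, and ranking regions by synteny score is an assumption consistent with the "best match" filtering described for the master table.</p>

```python
def limit_depth(regions, max_depth=None):
    """Keep at most max_depth regions for one query gene, best score first.

    regions   -- list of (region_id, synteny_score) tuples
    max_depth -- None keeps every region found (the default behavior)
    """
    ranked = sorted(regions, key=lambda region: region[1], reverse=True)
    return ranked if max_depth is None else ranked[:max_depth]
```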
</sec>
</sec>
<sec><title>CoGe Implementation</title>
<p>SynFind is implemented as one of the main entry points and analytical tools of CoGe.
The user interface (UI) contains two sections: One which is used to select a gene of
interest and target genomes to search for syntenic homologs, the other to specify
SynFind’s algorithms and parameters (<xref ref-type="fig" rid="evv219-F2">fig.
2</xref>
). This UI is consistent with the general look-and-feel for other CoGe
tools. CoGe’s implementation of SynFind allows users to search an arbitrary
number of genomes for syntelogs of any gene located in a genome to which the user has
access. Specifically, the genomes may be public data sets or private data
sets that are owned by or shared with the user. Target genomes to be analyzed by
SynFind are similarly specified by searching for organisms by name or taxonomic
description, and then selecting the appropriate genome (<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>A</italic>
). By repeating the name searches,
several genomes may be added to the genome list (<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>B</italic>
). Users may also select a previously saved
genome list (e.g., a list of “ten grass genomes that have been sequenced thus
far”) as a shortcut to a frequently accessed set of
species. SynFind depends on the existence of structurally annotated protein coding
gene models as a starting point for any query (<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>C</italic>
). Some “draft” genome assemblies are
released and loaded into CoGe with no available gene annotations. These genomes are
automatically detected and excluded from the genome list (with information presented
to the user as to why the genome is blocked from analysis by SynFind). In the
configuration tab, users can select which algorithm to use for generating the
homology pairs file as well as SynFind parameters: Window size, minimum number of
genes to call a region syntenic, and the scoring scheme (colinear or density) (<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>D</italic>
). <fig id="evv219-F2" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 2.—</label>
<caption><p>SynFind web UI. The web UI includes several components that users can
interact with (<italic>A</italic>
) find target genome and select target
genome version, (<italic>B</italic>
) build list of multiple target genomes,
(<italic>C</italic>
) input query gene, (<italic>D</italic>
) set SynFind
parameters.</p>
</caption>
<graphic xlink:href="evv219f2p"></graphic>
</fig>
</p>
<p>When SynFind completes its analysis, the results show a table of matching regions
along with their synteny scores and whether or not a syntenic gene was identified
(<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>A</italic>
). Additional
links are available under the table, including microsynteny analysis of the
identified regions in GEvo for validation, pairwise syntenic dotplots in SynMap,
links to raw data and intermediate data files, and a link to revisit and regenerate
the same SynFind analysis (<xref ref-type="fig" rid="evv219-F3">fig.
3</xref>
<italic>B</italic>
). <fig id="evv219-F3" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 3.—</label>
<caption><p>SynFind example output. The output of a typical SynFind search:
(<italic>A</italic>
) List of all syntenic regions found and presence of
syntelog, (<italic>B</italic>
) links for micro-synteny viewer (GEvo) and
master tables for downstream analyses, (<italic>C</italic>
) syntenic depth
table useful for evaluating syntenic coverage and WGD events.</p>
</caption>
<graphic xlink:href="evv219f3p"></graphic>
</fig>
</p>
</sec>
<sec><title>Master Syntenic Pairs Table</title>
<p>SynFind identifies syntenic regions against any set of genomes given a gene in one
genome, and curates the results in a master gene list. The pan-genome master list is
important as this file contains all the syntenic regions identified in the target
genomes for all of the genes in the query genome. The master list is a tab-delimited
table, containing all syntenic gene sets between the query and target genomes, along
with links to visualize microsynteny for each local set of regions. As a filtering
option, SynFind can also report the top <italic>N</italic>
best matches in query
genome(s), which is useful to extract only orthologous regions that are often the
best syntenic match when <italic>N</italic>
is set to 1. As a byproduct of this
master gene pairs table, SynFind reports a list of genes that are unique to some
genomes. For example, in the case of comparing a set of bacterial strains, this
feature can be used to find pathogenicity genes and phage insertions specific to one
strain against others (<xref rid="evv219-B46" ref-type="bibr">Tettelin et al.
2005</xref>
).</p>
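<p>How genome-specific genes fall out of the master table can be illustrated in Python. The in-memory layout used here (a dictionary from query gene to per-genome S/F/G calls) is a hypothetical stand-in and does not reflect the actual column layout of the tab-delimited file. The function names are likewise assumptions.</p>

```python
def presence_absence(master, genomes):
    """Turn per-gene syntelog calls into a presence/absence table.

    master  -- dict: query gene -> {target genome: "S", "F", "G", or None}
    genomes -- target genome names, fixing the column order
    """
    table = {}
    for gene, calls in master.items():
        # Only class S means that a syntelog is actually present.
        table[gene] = [calls.get(genome) == "S" for genome in genomes]
    return table


def query_specific(master, genomes):
    """Query genes with no syntelog in any of the target genomes."""
    table = presence_absence(master, genomes)
    return sorted(gene for gene, row in table.items() if not any(row))
```

<p>Runs of adjacent query-specific genes would then be candidates for strain-specific islands of the kind discussed for bacterial comparisons.</p>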
</sec>
<sec><title>Syntenic Depth</title>
<p>Syntenic depth refers to the number of syntenic regions identified in a target genome
for a given query position. SynFind calculates syntenic depth on a per gene basis and
reports these data as a histogram, showing a breakdown of how many genes are covered
in 1-, 2-, to <italic>x</italic>
-fold regions (<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>C</italic>
). Genes with a syntenic depth of zero are the
genes that lack any matching region in the target genome. A syntenic depth of one
most often reflects identification of an orthologous genomic region between two
species, whereas a syntenic depth greater than 1 most often is the result of either
paralogous or co-orthologous regions derived from whole-genome (or other large scale)
duplications. Syntenic depth provides a more consistent marker for large scale
genomic events than changes in the copy number of individual genes which are
influenced by a greater number of small scale processes (expansion and contraction of
tandem arrays, transposon capture and duplication, etc.). The proportion of genes
with a syntenic depth of at least 1 is a useful metric for evaluating the relative
completeness of genome assemblies, whereas modal and maximum syntenic depths are good
indicators for the number of paleopolyploidies in a given lineage.</p>
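<p>The syntenic depth histogram and the coverage metric mentioned above can be sketched in Python as follows. The function names and the input layout (a dictionary from query gene to its list of matching regions) are assumptions for illustration.</p>

```python
from collections import Counter


def depth_histogram(query_genes, regions_by_gene):
    """Count query genes at each syntenic depth, including depth zero.

    query_genes     -- all annotated genes in the query genome
    regions_by_gene -- dict: gene -> list of syntenic regions in the target
    """
    depths = Counter(len(regions_by_gene.get(gene, ())) for gene in query_genes)
    return dict(sorted(depths.items()))


def fraction_covered(histogram):
    """Proportion of query genes with a syntenic depth of at least 1."""
    total = sum(histogram.values())
    covered = total - histogram.get(0, 0)
    return covered / total if total else 0.0
```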
<p>Plant genomes have a rich history of genome-wide duplication events that give rise to
very high levels of syntenic depth (<xref rid="evv219-B42" ref-type="bibr">Tang,
Bowers, et al. 2008</xref>
). For example, in comparison to the
<italic>Arabidopsis</italic>
genome, both peach and grapevine genomes show
significant genome coverage of depth up to 3 (<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>C</italic>
), corresponding to the pan-rosid genome
triplication event (<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
;
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). The
syntenic depth evaluation of SynFind was employed to identify multiple degenerate
polyploidy events in the highly compact plant genome, Utricularia (Ibarra-Laclette et
al. 2013). Examples of various syntenic depth tables and their interpretation in the
context of paleopolyploidy can be found on CoGePedia (<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/r/4suf">http://genomevolution.org/r/4suf</ext-link>
, last accessed November 30,
2015).</p>
</sec>
</sec>
<sec><title>Results and Discussion</title>
<sec><title>Focused Analyses for Functionally Important Genes</title>
<p>We show that SynFind is powerful for gene-centric analyses through selected examples
based on past studies, but the usage is generally applicable to members of almost any gene
family in any set of organisms available in the CoGe database. In the past,
such comparative analyses would usually take much dedicated time and work—from
downloading and reformatting data sets, performing sequence alignment, reformatting
data again for use in synteny detection tools, identifying syntenic genes, selecting
informative visualization software for manual validation, and performing multiple
analyses to identify an optimal configuration of parameters and software
tools—all of which can now be performed within the SynFind tool in a few
clicks.</p>
<p>One natural application of SynFind is to deduce gene presence and absence across a
set of related organisms. In the context of bacterial genomics, we can infer possible
pathogenic sequences through syntenic comparisons (<xref rid="evv219-B21" ref-type="bibr">Jin et al. 2002</xref>
; <xref rid="evv219-B46" ref-type="bibr">Tettelin et al. 2005</xref>
). We used SynFind for a three-way comparison of
<italic>Shigella flexneri</italic>
2a strain 301, <italic>Escherichia
coli</italic>
K12 substrain 1655 and <italic>Escherichia coli</italic>
O157:H7
strain EDL933, in an analysis similar to the study in <xref rid="evv219-B21" ref-type="bibr">Jin et al. (2002)</xref>
. When using the <italic>S. flexneri</italic>
genome as the query, we looked for the cases where SynFind reported either proxy in
the two <italic>E. coli</italic>
genomes, that is, the genes that were missing in
their expected locations or for which expected regions could not be identified. This
has allowed us to identify <italic>Shigella</italic>
<italic>-</italic>
specific
“islands.” In particular, one 27 gene island (from
<italic>SF0294</italic>
to <italic>SF0320</italic>
) found only in the
<italic>Shigella</italic>
genome, previously termed SfII, was shown to be a
lysogenic phage insertion, by which <italic>Shigella</italic>
might have acquired
virulence (Jin et al. 2002). Other interesting genes on these
<italic>Shigella</italic>
-specific islands include <italic>ipaH</italic>
genes
(e.g., <italic>SF0722</italic>
, <italic>SF1383</italic>
, <italic>SF1880</italic>
, and
<italic>SF2610</italic>
) that shared homology with different phages (Jin et al.
2002). The SynFind link to this analysis is available: <ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/fggo">https://genomevolution.org/r/fggo</ext-link>
(last accessed November 30,
2015).</p>
<p>As our second example, we use another previously studied gene involved in the soft
grain trait in the grasses. Genes involved in the soft grain trait have been studied
extensively in wheat, including the <italic>Hardness</italic>
(<italic>Ha</italic>
)
locus and several <italic>Ha</italic>
-like genes (<xref rid="evv219-B8" ref-type="bibr">Charles et al. 2009</xref>
). SynFind analysis (Brachypodium genes
as “query,” barley, rice, and sorghum as “target”) showed
that <italic>Ha</italic>
-like genes were present in Brachypodium representing the
lineage of Pooideae, but were missing in rice and sorghum. For barley, rice and
sorghum, SynFind output displays “proxy for region” rather than a direct
syntelog (<xref ref-type="fig" rid="evv219-F4">fig. 4</xref>
<italic>A</italic>
). With
visual proofing using GEvo, we confirmed that there is a syntenic sequence match in
barley, whereas there are no matching sequences in rice and sorghum as indicated by
SynFind (<xref ref-type="fig" rid="evv219-F4">fig. 4</xref>
<italic>B</italic>
). This
suggested that the flanking regions of <italic>Ha</italic>
-like gene were relatively
intact whereas the gene itself has been lost in rice and sorghum. Alternatively, the
gene could have been inserted into this region in Brachypodium and barley. Although both
scenarios are equally likely, a previous study preferred the scenario that the gene was
lost in rice and sorghum (<xref rid="evv219-B8" ref-type="bibr">Charles et al.
2009</xref>
). With the SynFind tool, we have confirmed that the presence or absence of
the <italic>Ha</italic>
-like gene in this set of syntenic regions nicely explains the
soft wheat and barley grains versus the hard grains of rice and sorghum. <fig id="evv219-F4" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 4.—</label>
<caption><p>SynFind analysis of <italic>Ha</italic>
-like gene across Brachypodium,
barley, rice, sorghum. (<italic>A</italic>
) SynFind table output
illustrating four matching regions in the selected grasses. Result can be
regenerated: <ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/iiv4">https://genomevolution.org/r/iiv4</ext-link>
(last accessed November 30,
2015). (<italic>B</italic>
) GEvo visualization of the compiled syntenic
regions, showing the presence of a syntenic sequence in barley, and lack of
a syntenic ortholog of the <italic>Ha</italic>
-like gene in rice and sorghum. Each
panel represents a syntenic region in Brachypodium, barley, rice, and
sorghum, from top to bottom. Arrows in each panel represent gene models, and
boxes on top of the gene models are sequence matches (HSPs). For the top
Brachypodium panel, there are three tracks of HSPs, which are to barley, to
rice and to sorghum, respectively. We can conclude that the
<italic>Ha</italic>
-like gene in Brachypodium has a match in barley and no
match in rice or sorghum. Result can be regenerated: <ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/iivx">https://genomevolution.org/r/iivx</ext-link>
(last accessed November 30,
2015).</p>
</caption>
<graphic xlink:href="evv219f4p"></graphic>
</fig>
</p>
<p>In addition to the two examples shown above for the purpose of demonstration, SynFind
has enabled a number of evolutionary studies of important functional genes in diverse
lineages (<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
; <xref rid="evv219-B44" ref-type="bibr">Tang and Lyons 2012</xref>
; <xref rid="evv219-B19" ref-type="bibr">Hofberger et al. 2013</xref>
; <xref rid="evv219-B51" ref-type="bibr">Waters et al. 2013</xref>
). For example, SynFind was used to
screen regions in the <italic>Aethionema arabicum</italic>
genome displaying synteny
to genomic regions in <italic>Arabidopsis thaliana</italic>
harboring glucosinolate
biosynthesis (GS) loci (Hofberger et al. 2013). SynFind was essential in clarifying
the series of tandem duplication and WGD events that drove GS pathway expansion,
which were critical to the evolutionary success of the mustard family (Hofberger et
al. 2013). Also, SynFind was essential for proving that the genome of
<italic>Utricularia gibba</italic>
, despite its small size (82 Mb), is derived from
three sequential WGD events (Ibarra-Laclette et al. 2013).</p>
</sec>
<sec><title>Quality of Homology Assignments and Benchmark of SynFind against Competing
Tools</title>
<p>Clade-wide syntenic gene sets are useful for detecting genome-wide transposition and
deletion events (<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
;
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
), and
automation of this step could be essential in such studies. We have benchmarked
SynFind against a number of studies that typically require a substantial amount of
human curation to complete. Although the human curated gene sets are still imperfect
and subject to errors, they serve as a basis for comparing between different synteny
search tools including SynFind. In this study, we evaluate the performance of SynFind
and compare it with that of competing software, namely MCScanX and iADHoRe, which are the
two most popular state-of-the-art tools that perform well in a number of studies
(<xref rid="evv219-B34" ref-type="bibr">Proost et al. 2012</xref>
; <xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
).</p>
<p>Our first set of test data is a list of WGD duplicates from <italic>A.
thaliana</italic>
curated by <xref rid="evv219-B4" ref-type="bibr">Bowers et al.
(2003)</xref>
. This list contains a total of 5,788 gene duplicates collectively
derived from the alpha, beta, and gamma WGDs (<xref rid="evv219-B4" ref-type="bibr">Bowers et al. 2003</xref>
). Our second data set is based on comparison of yeast
genomes, using data from Yeast Gene Order Browser (YGOB) (<xref rid="evv219-B5" ref-type="bibr">Byrne and Wolfe 2005</xref>
). We were able to find 14 yeast
genomes in the CoGe system, whereas a few yeast species in YGOB were not yet released
to GenBank with structural gene annotations and therefore not included in this study.
YGOB uses “pillars” to store homology assignments (Byrne and Wolfe 2005),
which were converted to gene pairs for validation purposes. Finally, as the third
test set, we used a pan-grass synteny gene set curated by <xref rid="evv219-B39" ref-type="bibr">Schnable et al. (2012)</xref>
. Schnable et al. manually clustered
and curated gene members from rice, Brachypodium, sorghum, and maize according to
inter- and intragenomic comparisons (<xref rid="evv219-B39" ref-type="bibr">Schnable
et al. 2012</xref>
). A typical set of syntenic genes in the Schnable set contains
up to 2 rice genes, up to 2 Brachypodium genes, and up to 2 sorghum genes all derived
from the shared pan-grass WGD, and up to 4 maize genes because of an additional
maize-specific WGD. Similarly, we converted families into a list of gene pairs before
validation. The choice of these data sets is based on the availability of curated
data sets, and inclusion of gene sets with both paralogous and orthologous
relationships.</p>
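<p>The conversion of YGOB pillars and curated gene families into gene pairs can be
illustrated with a minimal Python sketch. It assumes that every unordered pair of
members within a family counts as one curated pair; the function names and gene
identifiers are hypothetical and are not taken from the SynFind or YGOB code.</p>
<preformat>
```python
from itertools import combinations


def family_to_pairs(family):
    """Return every unordered gene pair within one family (or pillar)."""
    # Deduplicate and sort so that each pair has one canonical orientation.
    genes = sorted(set(family))
    return set(combinations(genes, 2))


def families_to_pairs(families):
    """Collect the union of gene pairs over all families."""
    pairs = set()
    for family in families:
        pairs |= family_to_pairs(family)
    return pairs


# Hypothetical gene identifiers, for illustration only.
example = [["Os01g1", "Bd2g1", "Sb03g1"], ["Os05g2", "Bd2g9"]]
pairs = families_to_pairs(example)
```
</preformat>
<p>Under this assumption, a family of n members contributes n(n-1)/2 pairs, so larger
families weigh more heavily in any pair-based comparison.</p>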
<p>For SynFind, MCScanX, and iADHoRe, we computed the syntenic gene list and compared it
against the curated set, which is considered as “truth” (<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). Two metrics are
computed—“Sensitivity” (Sn) is defined as common items divided by
total items in truth set; “Purity” (Pu) is defined as common items
divided by total items in the test set and can be used to infer false-positive
discovery. SynFind consistently ranks the highest in sensitivity, recovering
63%, 75%, and 61% of the items in the truth set (<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). As a tradeoff, the purity of
SynFind results compares less favorably than that of the other tools (<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). As we have designed SynFind as a gene-centric
query tool, this benchmark reflects our focus on sensitivity—we would tolerate
some false positives but prefer to have few false negatives. Differences in the
treatments of tandem gene sets may have contributed to the nonoverlapping
members—SynFind, MCScanX, and iADHoRe may have picked a single matching gene
within the array which is not necessarily the tandem member in the curated set. <fig id="evv219-F5" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 5.—</label>
<caption><p>Comparison of SynFind, MCScanX, and iADHoRe on curated data sets.
(<italic>A</italic>
) <italic>Arabidopsis thaliana</italic>
alpha, beta,
and gamma duplicates from Bowers et al. (2003). (<italic>B</italic>
) Yeast
genomes from YGOB (Byrne and Wolfe 2005). (<italic>C</italic>
) Grass genomes
from <xref rid="evv219-B39" ref-type="bibr">Schnable et al. (2012)</xref>
.
Sn: sensitivity, defined as common items divided by total items in truth
set; Pu: Purity, defined as common items divided by total items in the test
set.</p>
</caption>
<graphic xlink:href="evv219f5p"></graphic>
</fig>
</p>
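<p>The two metrics defined above can be written as Sn = |truth ∩ test| / |truth| and
Pu = |truth ∩ test| / |test|. The following Python sketch computes both from two lists
of gene pairs. It assumes that pairs are unordered, so (a, b) and (b, a) are treated as
the same item; the function names and gene identifiers are hypothetical.</p>
<preformat>
```python
def normalize(pairs):
    """Put each pair in a canonical order so orientation does not matter."""
    return {tuple(sorted(pair)) for pair in pairs}


def sensitivity_and_purity(truth_pairs, test_pairs):
    """Return (Sn, Pu) for a test set measured against a truth set."""
    truth = normalize(truth_pairs)
    test = normalize(test_pairs)
    common = truth.intersection(test)
    # Guard against empty sets to avoid division by zero.
    sn = len(common) / len(truth) if truth else 0.0
    pu = len(common) / len(test) if test else 0.0
    return sn, pu


# Hypothetical pairs: two of four truth pairs are recovered,
# and one of three predicted pairs is absent from the truth set.
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
test = [("b1", "a1"), ("a2", "b2"), ("a3", "b9")]
sn, pu = sensitivity_and_purity(truth, test)
```
</preformat>
<p>In this toy case Sn is 2/4 and Pu is 2/3; the quantity 1 - Pu is the fraction of
predicted pairs that are not supported by the curated set.</p>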
<p>The list of predicted locations for missing genes is often a good indication of
potential loss-of-function, which could be associated with differences in phenotypic
and physiological traits between grasses, as illustrated in our <italic>Ha</italic>
example. Missing genes in one grass genome versus others could also suggest possible
misassemblies, leading to iterative improvement of genome assemblies and recovery of
missing gene fragments in genome annotation efforts (Law et al. 2015).</p>
</sec>
<sec><title>Integration with CoGe Comparative Genomics Platform</title>
<p>Integration in CoGe permits SynFind to be tightly connected to thousands of genomes
as well as to downstream analysis tools such as GEvo (<xref rid="evv219-B28" ref-type="bibr">Lyons and Freeling 2008</xref>
) and SynMap (<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
) for micro and whole-genome syntenic
analysis, respectively. The method for selecting query and target genomes loads the
same module. SynFind automatically generates links to GEvo views for gene-centric
analyses as well as SynMap views for chromosome-level analyses. The open-ended
analysis workflow provides the users with enough flexibility between tools of
different scales. In addition, CoGe’s user-data management systems let
researchers add private genomes and share them with collaborators, create lists
(notebooks) of genomes that can be imported quickly into SynFind, and automatically
record links to regenerate any analysis performed.</p>
<p>The CoGe job execution (JEX) framework facilitates parallel processing of queries
against multiple genomes by using Work Queue (<xref rid="evv219-B47" ref-type="bibr">Thrasher et al. 2012</xref>
) (<xref ref-type="fig" rid="evv219-F6">fig.
6</xref>
). When a SynFind analysis runs, each pairwise workflow consisting of
separate query-target genome pairs is submitted to CoGe’s JEX framework. The
JEX framework controls the parallel computing in processing multiple genomes (<xref ref-type="fig" rid="evv219-F6">fig. 6</xref>
). It first checks to see whether the
anticipated results file already exists and retrieves that file if it does;
otherwise, it submits the analysis for processing and subsequently caches the results
file. This system permits reusing the results of previously run analyses as well as
running multiple workflows in parallel. For example, in contrast to other gene
clustering approaches, new genomes can be incrementally added to the target list and
the CoGe server would only need to compute the missing comparisons. Overall, this
greatly improves the performance of the system in terms of the time it takes to
complete an analysis. Additionally, if a user decides to modify and rerun an
analysis, recomputation starts from the first divergent step of the analysis, while
reusing data from earlier, identically configured steps, allowing fast tweaking of
parameters. <fig id="evv219-F6" orientation="portrait" position="float"><label>F<sc>ig</sc>
. 6.—</label>
<caption><p>SynFind computational workflow as implemented on CoGe. The query genome and
target list of genomes are processed in parallel—extracting coding
sequences, building homology lists, filtering tandem repeats, and running
the SynFind algorithm. The last step assembles the processed data into a master
table. This strategy is similar to the “Map-Reduce” paradigm
used in parallel computing.</p>
</caption>
<graphic xlink:href="evv219f6p"></graphic>
</fig>
</p>
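<p>The check-then-compute caching and the parallel handling of query-target pairs
described above can be sketched in Python as follows. This is an illustration under
stated assumptions and not the CoGe JEX or Work Queue implementation: the cache key
scheme, the function names, the thread pool, and the placeholder compute step are all
hypothetical.</p>
<preformat>
```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def result_path(cache_dir, query, target, params):
    """Derive a results file name from the genome pair and its parameters."""
    key = json.dumps({"query": query, "target": target, "params": params},
                     sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.json"


def run_pair(cache_dir, query, target, params, compute):
    """Return (result, was_cached) for one query-target comparison."""
    path = result_path(cache_dir, query, target, params)
    if path.exists():
        # The anticipated results file already exists: reuse it.
        return json.loads(path.read_text()), True
    result = compute(query, target, params)
    path.write_text(json.dumps(result))  # cache for later runs
    return result, False


def run_all(cache_dir, query, targets, params, compute):
    """Process all query-target pairs in parallel, then gather the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {t: pool.submit(run_pair, cache_dir, query, t, params, compute)
                   for t in targets}
        return {t: f.result() for t, f in futures.items()}


def fake_compute(query, target, params):
    """Placeholder for the real pairwise synteny computation."""
    return {"query": query, "target": target, "window": params["window"]}


cache_dir = tempfile.mkdtemp()
params = {"window": 40}  # hypothetical parameter value
first = run_all(cache_dir, "brachypodium", ["rice", "sorghum"],
                params, fake_compute)
# Adding barley later only triggers the one missing comparison.
second = run_all(cache_dir, "brachypodium", ["rice", "sorghum", "barley"],
                 params, fake_compute)
```
</preformat>
<p>Because the parameters are part of the cache key in this sketch, changing a parameter
yields a new key and forces recomputation of that comparison, whereas identically
configured comparisons are read back from disk.</p>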
<p>The scale of analysis in comparative genomics is an important issue. Although SynMap
excels in identifying large-scale structural similarities, it lacks the gene-centric
searches needed when researchers want to study only their genes of interest across a set of
genomes. This conceptual difference is often referred to as
“macrosynteny” versus “microsynteny” analyses in comparative
genomics. Microsynteny search tools, such as SynFind, achieve higher sensitivity and
more flexibility for gene-centric research. Although SynMap is necessarily
constrained to making pairwise comparisons between genomes, SynFind can
simultaneously launch comparisons of multiple genomes. Additionally, SynFind
identifies syntenic locations even when the gene itself is absent, either as a result
of lineage-specific gene deletion or lineage-specific gene insertion. Analyses based
on SynMap output required substantial customized offline postprocessing and analysis
to generate equivalent predicted locations (<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
). Importantly, both of these tools permit on-the-fly
analyses and allow direct manipulation of parameters (e.g., higher or lower
stringency, such as window size and “score cutoff”), and are
interconnected in order to characterize and validate patterns of genome structure and
dynamics.</p>
<p>A typical exploratory workflow that we recommend would be to 1) use SynMap to
characterize genome-wide rearrangements and possibly genome duplications, 2) zoom in
on a pair of contigs or chromosomes with an interesting rearrangement or duplication
pattern, 3) select a gene to fish out additional syntenic regions using SynFind, and
4) validate putatively syntenic regions using GEvo to ensure that each region covers
the entire region of interest. In real-world applications, SynFind
and SynMap can be applied together to offer complementary views. For example, in a study
of conservation of imprinting across a set of grass taxa, gene-level comparisons were
made between syntenic genes in the genomes of maize, rice, and sorghum using the
software SynMap followed by SynFind to offer the most coverage (<xref rid="evv219-B51" ref-type="bibr">Waters et al. 2013</xref>
).</p>
</sec>
<sec><title>Scalable and Sustainable Infrastructure for Gene-Centric Evolutionary
Study</title>
<p>The SynFind algorithm addresses important limitations and challenges in the
postgenomics era. Researchers have access to large and inexpensive sequencing power
making it possible to study genetic and genomic evolution across whole clades of
species rather than being confined to individual model organisms. However, in order
to unlock the potential power of comparative genomic approaches to accelerate studies
of the origin, regulation, and function of individual genes it is necessary to enable
the broadest possible range of scientists to make direct comparisons across the
genomes of large groups of related species. Online computational resources, such as
CoGe, create ecosystems of specialized applications that are easily linked to and
from one another. Similarly, resources developed by cyberinfrastructure projects such
as the iPlant Collaborative (<xref rid="evv219-B15" ref-type="bibr">Goff et al.
2011</xref>
) and XSEDE provide computational platforms that enable scalable access
to computing and data storage resources.</p>
<p>The development of computational ecosystems which will be successful in bringing
about a democratization of bioinformatics research requires the deployment of modular
analysis pipelines that allow each new tool to exploit existing computational
resources, architectures, and curated data sets. SynFind joins the increasing list of
CoGe-powered and iPlant-enabled applications (Goff et al. 2011), which already
include GEvo, SynMap, and many others. The availability of SynFind will begin to
merge the two analytical worlds of comparative and functional genomics such that
researchers can more easily transfer system-level functional knowledge from data-rich
model organisms to the thousands of other organisms being analyzed by only a handful
of scientists. Conversely, SynFind enables comparative, in silico studies across a
wide range of species to inform the study of specific genes within model organisms,
where even today 30–34% of all genes have no annotated function (data
from <italic>Arabidopsis thaliana</italic>
, as cited in the <ext-link ext-link-type="uri" xlink:href="https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/npgi_five-year_plan_5-2014.pdf">National Plant Genome Initiative 2014 report</ext-link>
).</p>
</sec>
</sec>
<sec sec-type="conclusions"><title>Conclusions</title>
<p>SynFind fills the current gap in algorithms that perform syntenic gene queries and
compile matching sets of genomic regions on the fly. SynFind identifies all syntenic
regions to a given gene in a user-selected set of genomes, regardless of whether the
gene is still present in that region. SynFind is powered by an algorithm that calculates
a synteny score between a pair of regions. Performance-wise, SynFind has higher
sensitivity but lower purity compared with competing tools when validated against
manually curated sets. Feature-wise, SynFind contains several key functions not
typically found in existing systems (<xref ref-type="table" rid="evv219-T1">table
1</xref>
). Integrated with the CoGe online platform and powered by the iPlant
project, syntenic queries can now be performed in an interactive manner and retrieved
for downstream analyses through SynFind in a scalable and reproducible manner. SynFind
is an important tool for assessing genome dynamics including gene transpositions, impact
of genome duplications, and correlation to functional changes across a set of related
taxa of interest.</p>
</sec>
<sec><title>Data Availability</title>
<p>SynFind is available for use through a web-based interface in CoGe. Data sets used in
benchmarking SynFind with related tools are available on figshare with the following
public DOI: <list list-type="bullet"><list-item><p>Tang, Haibao (2015): SynFind supporting data: Benchmark on three curated
syntenic gene sets. figshare. <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1589735">http://dx.doi.org/10.6084/m9.figshare.1589735</ext-link>
(last accessed
November 30, 2015)</p>
</list-item>
</list>
</p>
</sec>
</body>
<back><ack><title>Acknowledgments</title>
<p>The authors thank the Fujian provincial government for a Fujian “100 Talent
Plan” award to H.T. E.L. is supported by the Gordon and Betty Moore Foundation
grant number 3383 and the National Science Foundation grant number DBI-1265383.
iPlant is supported by the National Science Foundation under grant numbers DBI-0735191
and DBI-1265383. They also thank Zhenghui Zhong for providing help in benchmarking the
performance of SynFind. They declare that they have no competing interests.</p>
</ack>
<ref-list><title>Literature Cited</title>
<ref id="evv219-B1"><mixed-citation publication-type="journal"><collab>Amborella Genome Project</collab>
.
<year>2013</year>
<article-title>The Amborella genome and the evolution of
flowering plants</article-title>
. <source>Science</source>
<volume>342</volume>
:<fpage>1241089</fpage>
.<pub-id pub-id-type="pmid">24357323</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B2"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Barbaglia</surname>
<given-names>AM</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Gene capture by
Helitron transposons reshuffles the transcriptome of maize</article-title>
.
<source>Genetics</source>
<volume>190</volume>
:<fpage>965</fpage>
–<lpage>975</lpage>
.<pub-id pub-id-type="pmid">22174072</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B3"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baxter</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Conserved
noncoding sequences highlight shared components of regulatory networks in
dicotyledonous plants</article-title>
. <source>Plant Cell</source>
<volume>24</volume>
:<fpage>3949</fpage>
–<lpage>3965</lpage>
.<pub-id pub-id-type="pmid">23110901</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B4"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bowers</surname>
<given-names>JE</given-names>
</name>
<name><surname>Chapman</surname>
<given-names>BA</given-names>
</name>
<name><surname>Rong</surname>
<given-names>J</given-names>
</name>
<name><surname>Paterson</surname>
<given-names>AH</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Unravelling angiosperm
genome evolution by phylogenetic analysis of chromosomal duplication
events</article-title>
. <source>Nature</source>
<volume>422</volume>
:<fpage>433</fpage>
–<lpage>438</lpage>
.<pub-id pub-id-type="pmid">12660784</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B5"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Byrne</surname>
<given-names>KP</given-names>
</name>
<name><surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The Yeast Gene Order
Browser: combining curated homology and syntenic context reveals gene fate in
polyploid species</article-title>
. <source>Genome Res.</source>
<volume>15</volume>
:<fpage>1456</fpage>
–<lpage>1461</lpage>
.<pub-id pub-id-type="pmid">16169922</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B6"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname>
<given-names>B</given-names>
</name>
<name><surname>Yang</surname>
<given-names>X</given-names>
</name>
<name><surname>Tuskan</surname>
<given-names>GA</given-names>
</name>
<name><surname>Cheng</surname>
<given-names>ZM</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>MicroSyn: a user
friendly tool for detection of microsynteny in a gene family</article-title>
.
<source>BMC Bioinformatics</source>
<volume>12</volume>
:<fpage>79</fpage>
.<pub-id pub-id-type="pmid">21418570</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B7"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chalhoub</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<year>2014</year>
<article-title>Early
allopolyploid evolution in the post-Neolithic <italic>Brassica napus</italic>
oilseed genome</article-title>
. <source>Science</source>
<volume>345</volume>
:<fpage>950</fpage>
–<lpage>953</lpage>
.<pub-id pub-id-type="pmid">25146293</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Charles</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Sixty million
years in evolution of soft grain trait in grasses: emergence of the softness locus
in the common ancestor of Pooideae and Ehrhartoideae, after their divergence from
Panicoideae</article-title>
. <source>Mol Biol Evol.</source>
<volume>26</volume>
:<fpage>1651</fpage>
–<lpage>1661</lpage>
.<pub-id pub-id-type="pmid">19395588</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Davidson</surname>
<given-names>RM</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Comparative
transcriptomics of three Poaceae species reveals patterns of gene expression
evolution</article-title>
. <source>Plant J.</source>
<volume>71</volume>
:<fpage>492</fpage>
–<lpage>502</lpage>
.<pub-id pub-id-type="pmid">22443345</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B10"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dewey</surname>
<given-names>CN</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Positional orthology:
putting genomic evolutionary relationships into context</article-title>
.
<source>Brief Bioinformatics</source>
<volume>12</volume>
:<fpage>401</fpage>
–<lpage>412</lpage>
.<pub-id pub-id-type="pmid">21705766</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dong</surname>
<given-names>X</given-names>
</name>
<name><surname>Fredman</surname>
<given-names>D</given-names>
</name>
<name><surname>Lenhard</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Synorth: exploring the
evolution of synteny and long-range regulatory interactions in vertebrate
genomes</article-title>
. <source>Genome Biol.</source>
<volume>10</volume>
:<fpage>R86</fpage>
.<pub-id pub-id-type="pmid">19698106</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Engstrom</surname>
<given-names>PG</given-names>
</name>
<name><surname>Ho Sui</surname>
<given-names>SJ</given-names>
</name>
<name><surname>Drivenes</surname>
<given-names>O</given-names>
</name>
<name><surname>Becker</surname>
<given-names>TS</given-names>
</name>
<name><surname>Lenhard</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Genomic regulatory
blocks underlie extensive microsynteny conservation in insects</article-title>
.
<source>Genome Res.</source>
<volume>17</volume>
:<fpage>1898</fpage>
–<lpage>1908</lpage>
.<pub-id pub-id-type="pmid">17989259</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B13"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Many or most
genes in <italic>Arabidopsis</italic>
transposed after the origin of the order
Brassicales</article-title>
. <source>Genome Res.</source>
<volume>18</volume>
:<fpage>1924</fpage>
–<lpage>1937</lpage>
.<pub-id pub-id-type="pmid">18836034</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B14"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ghiurcuta</surname>
<given-names>CG</given-names>
</name>
<name><surname>Moret</surname>
<given-names>BM</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Evaluating synteny for
improved comparative studies</article-title>
. <source>Bioinformatics</source>
<volume>30</volume>
:<fpage>i9</fpage>
–<lpage>i18</lpage>
.<pub-id pub-id-type="pmid">24932010</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B15"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Goff</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
<year>2011</year>
<article-title>The iPlant
collaborative: cyberinfrastructure for plant biology</article-title>
.
<source>Front Plant Sci.</source>
<volume>2</volume>
:<fpage>34</fpage>
.<pub-id pub-id-type="pmid">22645531</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Green</surname>
<given-names>RE</given-names>
</name>
<etal></etal>
</person-group>
<year>2014</year>
<article-title>Three crocodilian
genomes reveal ancestral patterns of evolution among archosaurs</article-title>
.
<source>Science</source>
<volume>346</volume>
:<fpage>1254449</fpage>
.<pub-id pub-id-type="pmid">25504731</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haudry</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>An atlas of over
90,000 conserved noncoding sequences provides insight into crucifer regulatory
regions</article-title>
. <source>Nat Genet.</source>
<volume>45</volume>
:<fpage>891</fpage>
–<lpage>898</lpage>
.<pub-id pub-id-type="pmid">23817568</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Heger</surname>
<given-names>A</given-names>
</name>
<name><surname>Ponting</surname>
<given-names>CP</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Evolutionary rate
analyses of orthologs and paralogs from 12 Drosophila genomes</article-title>
.
<source>Genome Res.</source>
<volume>17</volume>
:<fpage>1837</fpage>
–<lpage>1849</lpage>
.<pub-id pub-id-type="pmid">17989258</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hofberger</surname>
<given-names>JA</given-names>
</name>
<name><surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name><surname>Edger</surname>
<given-names>PP</given-names>
</name>
<name><surname>Pires</surname>
<given-names>JC</given-names>
</name>
<name><surname>Schranz</surname>
<given-names>ME</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Whole genome and tandem
duplicate retention facilitated glucosinolate pathway diversification in the
mustard family</article-title>
. <source>Genome Biol Evol.</source>
<volume>5</volume>
:<fpage>2155</fpage>
–<lpage>2173</lpage>
.<pub-id pub-id-type="pmid">24171911</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B20"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ibarra-Laclette</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>Architecture and
evolution of a minute plant genome</article-title>
. <source>Nature</source>
<volume>498</volume>
:<fpage>94</fpage>
–<lpage>98</lpage>
.<pub-id pub-id-type="pmid">23665961</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B21"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jin</surname>
<given-names>Q</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Genome sequence
of <italic>Shigella flexneri</italic>
2a: insights into pathogenicity through
comparison with genomes of <italic>Escherichia coli</italic>
K12 and
O157</article-title>
. <source>Nucleic Acids Res.</source>
<volume>30</volume>
:<fpage>4432</fpage>
–<lpage>4441</lpage>
.<pub-id pub-id-type="pmid">12384590</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B22"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kielbasa</surname>
<given-names>SM</given-names>
</name>
<name><surname>Wan</surname>
<given-names>R</given-names>
</name>
<name><surname>Sato</surname>
<given-names>K</given-names>
</name>
<name><surname>Horton</surname>
<given-names>P</given-names>
</name>
<name><surname>Frith</surname>
<given-names>MC</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Adaptive seeds tame
genomic sequence comparison</article-title>
. <source>Genome Res.</source>
<volume>21</volume>
:<fpage>487</fpage>
–<lpage>493</lpage>
.<pub-id pub-id-type="pmid">21209072</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lai</surname>
<given-names>J</given-names>
</name>
<name><surname>Li</surname>
<given-names>Y</given-names>
</name>
<name><surname>Messing</surname>
<given-names>J</given-names>
</name>
<name><surname>Dooner</surname>
<given-names>HK</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Gene movement by
Helitron transposons contributes to the haplotype variability of
maize</article-title>
. <source>Proc Natl Acad Sci U S A.</source>
<volume>102</volume>
:<fpage>9068</fpage>
–<lpage>9073</lpage>
.<pub-id pub-id-type="pmid">15951422</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B24"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Law</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2015</year>
<article-title>Automated update,
revision, and quality control of the maize genome annotations using MAKER-P
improves the B73 RefGen_v3 gene models and identifies new genes</article-title>
.
<source>Plant Physiol.</source>
<volume>167</volume>
:<fpage>25</fpage>
–<lpage>39</lpage>
.<pub-id pub-id-type="pmid">25384563</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B25"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Li</surname>
<given-names>L</given-names>
</name>
<name><surname>Stoeckert</surname>
<given-names>CJ</given-names>
<suffix>Jr</suffix>
</name>
<name><surname>Roos</surname>
<given-names>DS</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>OrthoMCL: identification
of ortholog groups for eukaryotic genomes</article-title>
. <source>Genome
Res.</source>
<volume>13</volume>
:<fpage>2178</fpage>
–<lpage>2189</lpage>
.<pub-id pub-id-type="pmid">12952885</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B26"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ling</surname>
<given-names>X</given-names>
</name>
<name><surname>He</surname>
<given-names>X</given-names>
</name>
<name><surname>Xin</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Detecting gene clusters
under evolutionary constraint in a large number of genomes</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:<fpage>571</fpage>
–<lpage>577</lpage>
.<pub-id pub-id-type="pmid">19158161</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B27"><mixed-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lohr</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2014 Aug 18</year>
<article-title>For big-data
scientists, “Janitor Work” is key hurdle to insights</article-title>
.
<italic>The New York Times</italic>
<publisher-name>New York
City</publisher-name>
<comment>Available from: <ext-link ext-link-type="uri" xlink:href="http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0">http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0</ext-link>
</comment>
.</mixed-citation>
</ref>
<ref id="evv219-B28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>How to usefully compare
homologous plant genes and chromosomes as DNA sequences</article-title>
.
<source>Plant J.</source>
<volume>53</volume>
:<fpage>661</fpage>
–<lpage>673</lpage>
.<pub-id pub-id-type="pmid">18269575</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name><surname>Pedersen</surname>
<given-names>B</given-names>
</name>
<name><surname>Kane</surname>
<given-names>J</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>The value of nonmodel
genomes and an example using SynMap within CoGe to dissect the hexaploidy that
predates the rosids</article-title>
. <source>Trop Plant Biol.</source>
<volume>1</volume>
:<fpage>181</fpage>
–<lpage>190</lpage>
.</mixed-citation>
</ref>
<ref id="evv219-B30"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Moreno-Hagelsieb</surname>
<given-names>G</given-names>
</name>
<name><surname>Trevino</surname>
<given-names>V</given-names>
</name>
<name><surname>Perez-Rueda</surname>
<given-names>E</given-names>
</name>
<name><surname>Smith</surname>
<given-names>TF</given-names>
</name>
<name><surname>Collado-Vides</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Transcription unit
conservation in the three domains of life: a perspective from <italic>Escherichia
coli</italic>
</article-title>
. <source>Trends Genet.</source>
<volume>17</volume>
:<fpage>175</fpage>
–<lpage>177</lpage>
.<pub-id pub-id-type="pmid">11275307</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ng</surname>
<given-names>MP</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>OrthoClusterDB:
an online platform for synteny blocks</article-title>
. <source>BMC
Bioinformatics</source>
<volume>10</volume>
:<fpage>192</fpage>
.<pub-id pub-id-type="pmid">19549318</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B32"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ostlund</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<year>2010</year>
<article-title>InParanoid 7: new
algorithms and tools for eukaryotic orthology analysis</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>38</volume>
:<fpage>D196</fpage>
–<lpage>D203</lpage>
.<pub-id pub-id-type="pmid">19892828</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B33"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Poyatos</surname>
<given-names>JF</given-names>
</name>
<name><surname>Hurst</surname>
<given-names>LD</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>The determinants of gene
order conservation in yeasts</article-title>
. <source>Genome Biol.</source>
<volume>8</volume>
:<fpage>R233</fpage>
.<pub-id pub-id-type="pmid">17983469</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B34"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Proost</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>i-ADHoRe
3.0—fast and sensitive detection of genomic homology in extremely large data
sets</article-title>
. <source>Nucleic Acids Res.</source>
<volume>40</volume>
:<fpage>e11</fpage>
.<pub-id pub-id-type="pmid">22102584</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B35"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Revanna</surname>
<given-names>KV</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>A web-based
multi-genome synteny viewer for customized data</article-title>
. <source>BMC
Bioinformatics</source>
<volume>13</volume>
:<fpage>190</fpage>
.<pub-id pub-id-type="pmid">22856879</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B36"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rodelsperger</surname>
<given-names>C</given-names>
</name>
<name><surname>Dieterich</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>CYNTENATOR: progressive
gene order alignment of 17 vertebrate genomes</article-title>
. <source>PLoS
One</source>
<volume>5</volume>
:<fpage>e8861</fpage>
.<pub-id pub-id-type="pmid">20126624</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B37"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schnable</surname>
<given-names>JC</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Genome evolution in
maize: from genomes back to genes</article-title>
. <source>Annu Rev Plant
Biol.</source>
<volume>66</volume>
:<fpage>329</fpage>
–<lpage>343</lpage>
.<pub-id pub-id-type="pmid">25494463</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B38"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schnable</surname>
<given-names>JC</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Genes identified by
visible mutant phenotypes show increased bias toward one of two subgenomes of
maize</article-title>
. <source>PLoS One</source>
<volume>6</volume>
:<fpage>e17855</fpage>
.<pub-id pub-id-type="pmid">21423772</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B39"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schnable</surname>
<given-names>JC</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
<name><surname>Lyons</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Genome-wide analysis of
syntenic gene deletion in the grasses</article-title>
. <source>Genome Biol
Evol.</source>
<volume>4</volume>
:<fpage>265</fpage>
–<lpage>277</lpage>
.<pub-id pub-id-type="pmid">22275519</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B40"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sinha</surname>
<given-names>AU</given-names>
</name>
<name><surname>Meller</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Cinteny: flexible
analysis and visualization of synteny and genome rearrangements in multiple
organisms</article-title>
. <source>BMC Bioinformatics</source>
<volume>8</volume>
:<fpage>82</fpage>
.<pub-id pub-id-type="pmid">17343765</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B41"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Soderlund</surname>
<given-names>C</given-names>
</name>
<name><surname>Bomhoff</surname>
<given-names>M</given-names>
</name>
<name><surname>Nelson</surname>
<given-names>WM</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>SyMAP v3.4: a turnkey
synteny system with application to plant genomes</article-title>
. <source>Nucleic
Acids Res.</source>
<volume>39</volume>
:<fpage>e68</fpage>
.<pub-id pub-id-type="pmid">21398631</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B42"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Bowers</surname>
<given-names>JE</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Synteny and
collinearity in plant genomes</article-title>
. <source>Science</source>
<volume>320</volume>
:<fpage>486</fpage>
–<lpage>488</lpage>
.<pub-id pub-id-type="pmid">18436778</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B43"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<year>2011</year>
<article-title>Screening synteny
blocks in pairwise genome comparisons through integer programming</article-title>
.
<source>BMC Bioinformatics</source>
<volume>12</volume>
:<fpage>102</fpage>
.<pub-id pub-id-type="pmid">21501495</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B44"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Lyons</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Unleashing the genome of
<italic>Brassica rapa</italic>
</article-title>
. <source>Front Plant
Sci.</source>
<volume>3</volume>
:<fpage>172</fpage>
.<pub-id pub-id-type="pmid">22866056</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B45"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Wang</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Unraveling
ancient hexaploidy through multiply-aligned angiosperm gene maps</article-title>
.
<source>Genome Res.</source>
<volume>18</volume>
:<fpage>1944</fpage>
–<lpage>1954</lpage>
.<pub-id pub-id-type="pmid">18832442</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B46"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tettelin</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Genome analysis
of multiple pathogenic isolates of <italic>Streptococcus agalactiae</italic>
:
implications for the microbial “pan-genome.”</article-title>
<source>Proc Natl Acad Sci U S A.</source>
<volume>102</volume>
:<fpage>13950</fpage>
–<lpage>13955</lpage>
.<pub-id pub-id-type="pmid">16172379</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B47"><mixed-citation publication-type="confproc"><person-group person-group-type="editor"><name><surname>Thrasher</surname>
<given-names>A</given-names>
</name>
<name><surname>Thain</surname>
<given-names>D</given-names>
</name>
<name><surname>Emrich</surname>
<given-names>S</given-names>
</name>
<name><surname>Musgrave</surname>
<given-names>Z</given-names>
</name>
</person-group>
, editors. <comment>Computational advances in bio and
medical sciences (ICCABS). 2012 IEEE 2nd International Conference on 2012 Feb
23–25. University of Las Vegas (Nevada): ICCABS</comment>
.</mixed-citation>
</ref>
<ref id="evv219-B48"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vergara</surname>
<given-names>IA</given-names>
</name>
<name><surname>Chen</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Large synteny blocks
revealed between <italic>Caenorhabditis elegans</italic>
and
<italic>Caenorhabditis briggsae</italic>
genomes using
OrthoCluster</article-title>
. <source>BMC Genomics</source>
<volume>11</volume>
:<fpage>516</fpage>
.<pub-id pub-id-type="pmid">20868500</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B49"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Statistical
inference of chromosomal homology based on gene colinearity and applications to
<italic>Arabidopsis</italic>
and rice</article-title>
. <source>BMC
Bioinformatics</source>
<volume>7</volume>
:<fpage>447</fpage>
.<pub-id pub-id-type="pmid">17038171</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B50"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>MCScanX: a
toolkit for detection and evolutionary analysis of gene synteny and
collinearity</article-title>
. <source>Nucleic Acids Res.</source>
<volume>40</volume>
:<fpage>e49</fpage>
.<pub-id pub-id-type="pmid">22217600</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B51"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Waters</surname>
<given-names>AJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>Comprehensive
analysis of imprinted genes in maize reveals allelic variation for imprinting and
limited conservation with other species</article-title>
. <source>Proc Natl Acad
Sci U S A.</source>
<volume>110</volume>
:<fpage>19639</fpage>
–<lpage>19644</lpage>
.<pub-id pub-id-type="pmid">24218619</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B52"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Yesterday’s
polyploids and the mystery of diploidization</article-title>
. <source>Nat Rev
Genet.</source>
<volume>2</volume>
:<fpage>333</fpage>
–<lpage>341</lpage>
.<pub-id pub-id-type="pmid">11331899</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B53"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Woodhouse</surname>
<given-names>MR</given-names>
</name>
<name><surname>Pedersen</surname>
<given-names>B</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Transposed genes in
<italic>Arabidopsis</italic>
are often associated with flanking
repeats</article-title>
. <source>PLoS Genet.</source>
<volume>6</volume>
:<fpage>e1000949</fpage>
.<pub-id pub-id-type="pmid">20485521</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B54"><mixed-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Woodhouse</surname>
<given-names>MR</given-names>
</name>
<name><surname>Tang</surname>
<given-names>H</given-names>
</name>
<name><surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Different gene families
in <italic>Arabidopsis thaliana</italic>
transposed in different epochs and at
different frequencies throughout the rosids</article-title>
. <source>Plant
Cell</source>
<volume>23</volume>
:<fpage>4241</fpage>
–<lpage>4253</lpage>
.<pub-id pub-id-type="pmid">22180627</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000059 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000059 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= CyberinfraV1 |flux= Pmc |étape= Corpus |type= RBID |clé= |texte= }}
This area was generated with Dilib version V0.6.25.