Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000059 (Pmc/Corpus); previous: 0000589; next: 0000600


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand</title>
<author>
<name sortKey="Tang, Haibao" sort="Tang, Haibao" uniqKey="Tang H" first="Haibao" last="Tang">Haibao Tang</name>
<affiliation>
<nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bomhoff, Matthew D" sort="Bomhoff, Matthew D" uniqKey="Bomhoff M" first="Matthew D." last="Bomhoff">Matthew D. Bomhoff</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Briones, Evan" sort="Briones, Evan" uniqKey="Briones E" first="Evan" last="Briones">Evan Briones</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Liangsheng" sort="Zhang, Liangsheng" uniqKey="Zhang L" first="Liangsheng" last="Zhang">Liangsheng Zhang</name>
<affiliation>
<nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schnable, James C" sort="Schnable, James C" uniqKey="Schnable J" first="James C." last="Schnable">James C. Schnable</name>
<affiliation>
<nlm:aff id="evv219-AFF3">Department of Agronomy and Horticulture, University of Nebraska, Lincoln</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lyons, Eric" sort="Lyons, Eric" uniqKey="Lyons E" first="Eric" last="Lyons">Eric Lyons</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26560340</idno>
<idno type="pmc">4700967</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4700967</idno>
<idno type="RBID">PMC:4700967</idno>
<idno type="doi">10.1093/gbe/evv219</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000059</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand</title>
<author>
<name sortKey="Tang, Haibao" sort="Tang, Haibao" uniqKey="Tang H" first="Haibao" last="Tang">Haibao Tang</name>
<affiliation>
<nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bomhoff, Matthew D" sort="Bomhoff, Matthew D" uniqKey="Bomhoff M" first="Matthew D." last="Bomhoff">Matthew D. Bomhoff</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Briones, Evan" sort="Briones, Evan" uniqKey="Briones E" first="Evan" last="Briones">Evan Briones</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Liangsheng" sort="Zhang, Liangsheng" uniqKey="Zhang L" first="Liangsheng" last="Zhang">Liangsheng Zhang</name>
<affiliation>
<nlm:aff id="evv219-AFF1">Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schnable, James C" sort="Schnable, James C" uniqKey="Schnable J" first="James C." last="Schnable">James C. Schnable</name>
<affiliation>
<nlm:aff id="evv219-AFF3">Department of Agronomy and Horticulture, University of Nebraska, Lincoln</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lyons, Eric" sort="Lyons, Eric" uniqKey="Lyons E" first="Eric" last="Lyons">Eric Lyons</name>
<affiliation>
<nlm:aff id="evv219-AFF2">School of Plant Sciences, iPlant Collaborative, University of Arizona</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genome Biology and Evolution</title>
<idno type="eISSN">1759-6653</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind can identify conserved syntenic regions in any set of target genomes. SynFind can report per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to genomic data from over 15,000 organisms from all domains of life and supports multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/CoGe/SynFind.pl">http://genomevolution.org/CoGe/SynFind.pl</ext-link>
. A video tutorial of SynFind using
<italic>Phytophthora</italic>
as an example is available at
<ext-link ext-link-type="uri" xlink:href="http://www.youtube.com/watch?v=2Agczny9Nyc">http://www.youtube.com/watch?v=2Agczny9Nyc</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barbaglia, Am" uniqKey="Barbaglia A">AM Barbaglia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baxter, L" uniqKey="Baxter L">L Baxter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bowers, Je" uniqKey="Bowers J">JE Bowers</name>
</author>
<author>
<name sortKey="Chapman, Ba" uniqKey="Chapman B">BA Chapman</name>
</author>
<author>
<name sortKey="Rong, J" uniqKey="Rong J">J Rong</name>
</author>
<author>
<name sortKey="Paterson, Ah" uniqKey="Paterson A">AH Paterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Byrne, Kp" uniqKey="Byrne K">KP Byrne</name>
</author>
<author>
<name sortKey="Wolfe, Kh" uniqKey="Wolfe K">KH Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cai, B" uniqKey="Cai B">B Cai</name>
</author>
<author>
<name sortKey="Yang, X" uniqKey="Yang X">X Yang</name>
</author>
<author>
<name sortKey="Tuskan, Ga" uniqKey="Tuskan G">GA Tuskan</name>
</author>
<author>
<name sortKey="Cheng, Zm" uniqKey="Cheng Z">ZM Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chalhoub, B" uniqKey="Chalhoub B">B Chalhoub</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Charles, M" uniqKey="Charles M">M Charles</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davidson, Rm" uniqKey="Davidson R">RM Davidson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dewey, Cn" uniqKey="Dewey C">CN Dewey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dong, X" uniqKey="Dong X">X Dong</name>
</author>
<author>
<name sortKey="Fredman, D" uniqKey="Fredman D">D Fredman</name>
</author>
<author>
<name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Engstrom, Pg" uniqKey="Engstrom P">PG Engstrom</name>
</author>
<author>
<name sortKey="Ho Sui, Sj" uniqKey="Ho Sui S">SJ Ho Sui</name>
</author>
<author>
<name sortKey="Drivenes, O" uniqKey="Drivenes O">O Drivenes</name>
</author>
<author>
<name sortKey="Becker, Ts" uniqKey="Becker T">TS Becker</name>
</author>
<author>
<name sortKey="Lenhard, B" uniqKey="Lenhard B">B Lenhard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghiurcuta, Cg" uniqKey="Ghiurcuta C">CG Ghiurcuta</name>
</author>
<author>
<name sortKey="Moret, Bm" uniqKey="Moret B">BM Moret</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Green, Re" uniqKey="Green R">RE Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haudry, A" uniqKey="Haudry A">A Haudry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heger, A" uniqKey="Heger A">A Heger</name>
</author>
<author>
<name sortKey="Ponting, Cp" uniqKey="Ponting C">CP Ponting</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hofberger, Ja" uniqKey="Hofberger J">JA Hofberger</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Edger, Pp" uniqKey="Edger P">PP Edger</name>
</author>
<author>
<name sortKey="Chris Pires, J" uniqKey="Chris Pires J">J Chris Pires</name>
</author>
<author>
<name sortKey="Eric Schranz, M" uniqKey="Eric Schranz M">M Eric Schranz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ibarra Laclette, E" uniqKey="Ibarra Laclette E">E Ibarra-Laclette</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jin, Q" uniqKey="Jin Q">Q Jin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kielbasa, Sm" uniqKey="Kielbasa S">SM Kielbasa</name>
</author>
<author>
<name sortKey="Wan, R" uniqKey="Wan R">R Wan</name>
</author>
<author>
<name sortKey="Sato, K" uniqKey="Sato K">K Sato</name>
</author>
<author>
<name sortKey="Horton, P" uniqKey="Horton P">P Horton</name>
</author>
<author>
<name sortKey="Frith, Mc" uniqKey="Frith M">MC Frith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lai, J" uniqKey="Lai J">J Lai</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Messing, J" uniqKey="Messing J">J Messing</name>
</author>
<author>
<name sortKey="Dooner, Hk" uniqKey="Dooner H">HK Dooner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Law, M" uniqKey="Law M">M Law</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Stoeckert, Cj" uniqKey="Stoeckert C">CJ Stoeckert</name>
</author>
<author>
<name sortKey="Roos, Ds" uniqKey="Roos D">DS Roos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ling, X" uniqKey="Ling X">X Ling</name>
</author>
<author>
<name sortKey="He, X" uniqKey="He X">X He</name>
</author>
<author>
<name sortKey="Xin, D" uniqKey="Xin D">D Xin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lohr, S" uniqKey="Lohr S">S Lohr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Pedersen, B" uniqKey="Pedersen B">B Pedersen</name>
</author>
<author>
<name sortKey="Kane, J" uniqKey="Kane J">J Kane</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moreno Hagelsieb, G" uniqKey="Moreno Hagelsieb G">G Moreno-Hagelsieb</name>
</author>
<author>
<name sortKey="Trevino, V" uniqKey="Trevino V">V Trevino</name>
</author>
<author>
<name sortKey="Perez Rueda, E" uniqKey="Perez Rueda E">E Perez-Rueda</name>
</author>
<author>
<name sortKey="Smith, Tf" uniqKey="Smith T">TF Smith</name>
</author>
<author>
<name sortKey="Collado Vides, J" uniqKey="Collado Vides J">J Collado-Vides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ng, Mp" uniqKey="Ng M">MP Ng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ostlund, G" uniqKey="Ostlund G">G Ostlund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Poyatos, Jf" uniqKey="Poyatos J">JF Poyatos</name>
</author>
<author>
<name sortKey="Hurst, Ld" uniqKey="Hurst L">LD Hurst</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Proost, S" uniqKey="Proost S">S Proost</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Revanna, Kv" uniqKey="Revanna K">KV Revanna</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodelsperger, C" uniqKey="Rodelsperger C">C Rodelsperger</name>
</author>
<author>
<name sortKey="Dieterich, C" uniqKey="Dieterich C">C Dieterich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schnable, Jc" uniqKey="Schnable J">JC Schnable</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sinha, Au" uniqKey="Sinha A">AU Sinha</name>
</author>
<author>
<name sortKey="Meller, J" uniqKey="Meller J">J Meller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soderlund, C" uniqKey="Soderlund C">C Soderlund</name>
</author>
<author>
<name sortKey="Bomhoff, M" uniqKey="Bomhoff M">M Bomhoff</name>
</author>
<author>
<name sortKey="Nelson, Wm" uniqKey="Nelson W">WM Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Bowers, Je" uniqKey="Bowers J">JE Bowers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tettelin, H" uniqKey="Tettelin H">H Tettelin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thrasher, A" uniqKey="Thrasher A">A Thrasher</name>
</author>
<author>
<name sortKey="Thain, D" uniqKey="Thain D">D Thain</name>
</author>
<author>
<name sortKey="Emrich, S" uniqKey="Emrich S">S Emrich</name>
</author>
<author>
<name sortKey="Musgrave, Z" uniqKey="Musgrave Z">Z Musgrave</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vergara, Ia" uniqKey="Vergara I">IA Vergara</name>
</author>
<author>
<name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waters, Aj" uniqKey="Waters A">AJ Waters</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wolfe, Kh" uniqKey="Wolfe K">KH Wolfe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woodhouse, Mr" uniqKey="Woodhouse M">MR Woodhouse</name>
</author>
<author>
<name sortKey="Pedersen, B" uniqKey="Pedersen B">B Pedersen</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Woodhouse, Mr" uniqKey="Woodhouse M">MR Woodhouse</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H Tang</name>
</author>
<author>
<name sortKey="Freeling, M" uniqKey="Freeling M">M Freeling</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genome Biol Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Genome Biol Evol</journal-id>
<journal-id journal-id-type="publisher-id">gbe</journal-id>
<journal-id journal-id-type="hwp">gbe</journal-id>
<journal-title-group>
<journal-title>Genome Biology and Evolution</journal-title>
</journal-title-group>
<issn pub-type="epub">1759-6653</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26560340</article-id>
<article-id pub-id-type="pmc">4700967</article-id>
<article-id pub-id-type="doi">10.1093/gbe/evv219</article-id>
<article-id pub-id-type="publisher-id">evv219</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genome Resources</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Tang</surname>
<given-names>Haibao</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="evv219-AFF2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bomhoff</surname>
<given-names>Matthew D.</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Briones</surname>
<given-names>Evan</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Liangsheng</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schnable</surname>
<given-names>James C.</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lyons</surname>
<given-names>Eric</given-names>
</name>
<xref ref-type="aff" rid="evv219-AFF2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="evv219-COR1">*</xref>
</contrib>
<aff id="evv219-AFF1">
<sup>1</sup>
Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province, China</aff>
<aff id="evv219-AFF2">
<sup>2</sup>
School of Plant Sciences, iPlant Collaborative, University of Arizona</aff>
<aff id="evv219-AFF3">
<sup>3</sup>
Department of Agronomy and Horticulture, University of Nebraska, Lincoln</aff>
</contrib-group>
<author-notes>
<corresp id="evv219-COR1">*Corresponding author: E-mail:
<email>elyons.uoa@gmail.com</email>
.</corresp>
<fn id="FN1">
<p>
<bold>Associate editor:</bold>
Kenneth Wolfe</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<month>12</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>11</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>11</day>
<month>11</month>
<year>2015</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>7</volume>
<issue>12</issue>
<fpage>3286</fpage>
<lpage>3298</lpage>
<history>
<date date-type="accepted">
<day>6</day>
<month>11</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.</copyright-statement>
<copyright-year>2015</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" license-type="creative-commons">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind can identify conserved syntenic regions in any set of target genomes. SynFind can report per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to genomic data from over 15,000 organisms from all domains of life and supports multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/CoGe/SynFind.pl">http://genomevolution.org/CoGe/SynFind.pl</ext-link>
. A video tutorial of SynFind using
<italic>Phytophthora</italic>
as an example is available at
<ext-link ext-link-type="uri" xlink:href="http://www.youtube.com/watch?v=2Agczny9Nyc">http://www.youtube.com/watch?v=2Agczny9Nyc</ext-link>
.</p>
</abstract>
<kwd-group>
<kwd>synteny</kwd>
<kwd>homology</kwd>
<kwd>genome evolution</kwd>
<kwd>cyberinfrastructure</kwd>
</kwd-group>
<counts>
<page-count count="13"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Conserved synteny refers to an inferred homology relationship between genes that is supported by their sharing a common genomic neighborhood, and is a widely used measurement of evolutionary divergence across all domains of life (
<xref rid="evv219-B30" ref-type="bibr">Moreno-Hagelsieb et al. 2001</xref>
;
<xref rid="evv219-B12" ref-type="bibr">Engstrom et al. 2007</xref>
;
<xref rid="evv219-B18" ref-type="bibr">Heger and Ponting 2007</xref>
;
<xref rid="evv219-B33" ref-type="bibr">Poyatos and Hurst 2007</xref>
;
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). Conserved synteny is evident when large sets of genes or genomic features are preserved in close proximity (synteny), and often in the same order and orientations (colinearity) (
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). Conserved synteny across species lays an essential foundation for genomic research, including map-based cloning, validating predicted gene models (
<xref rid="evv219-B24" ref-type="bibr">Law et al. 2015</xref>
), and identifying conserved noncoding sequences (
<xref rid="evv219-B17" ref-type="bibr">Haudry et al. 2013</xref>
). Conserved synteny within species identifies ancient polyploidy events or other types of large-scale genomic duplications (
<xref rid="evv219-B52" ref-type="bibr">Wolfe 2001</xref>
).</p>
<p>Synteny provides an extra layer of information to confirm gene homology, and is much more reliable than inference based on sequence similarity alone. Results from a typical Basic Local Alignment Search Tool (BLAST) analysis do not easily indicate whether there has been a gene loss or transposition. Popular approaches based on the reciprocal best hit neither take into account the ancestral state of a genome nor provide much insight into the evolutionary history of a gene or gene family. More generally, protein clustering algorithms such as OrthoMCL (
<xref rid="evv219-B25" ref-type="bibr">Li et al. 2003</xref>
) and INPARANOID (
<xref rid="evv219-B32" ref-type="bibr">Ostlund et al. 2010</xref>
) may be successful for single copy gene families when evolutionary rates are constant, but can be confounded by accelerated rates of evolution in certain gene copies, and will sometimes produce false-positive assignments of orthology, particularly in cases of reciprocal loss of paralogous genes between species. Positional studies that track gene movements over evolutionary time require more gene-centric synteny tools (
<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al. 2011</xref>
).</p>
<p>Curated syntenic gene sets are critical tools for deriving genome-scale patterns and evolutionary trends, and are widely popular (
<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al. 2011</xref>
;
<xref rid="evv219-B3" ref-type="bibr">Baxter et al. 2012</xref>
;
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
). Unfortunately, construction of robust and accurate syntenic data sets requires a set of specialized comparative genomic skills currently limited to a small number of research groups. Until now, the primary way the broader research community has employed syntenic information in its research has been through manually curated syntenic gene sets published by these groups. Manually curated gene sets are inherently limiting: because of the lag introduced by the publication cycle, by the time a given syntenic gene set is published, genome assemblies for new species will often have become available, and genome assemblies, annotations, and gene identifiers will often have been updated for existing published genomes. With genome sequence assemblies being released at an ever-increasing pace, there is a need for tools that enable individual researchers to rapidly identify syntenic regions between species.</p>
<p>The majority of community use of synteny data falls into one of two use cases: 1) researchers interested in a specific gene from a specific species who want to rapidly find the syntenic ortholog(s) of their target gene in one or more additional species and 2) researchers who want to trace changes in the positional history of a single gene or gene family across a population of related species. In addition to the lag time introduced in publishing syntenic gene lists, most published lists only provide information on conserved syntenic orthologs, and do not provide information on predicted syntenic locations for genes where no syntenic orthologs are found. This severely limits their utility for use case 2 above, as it strips out one of the key advantages of syntenic analysis: the ability to identify confident sets of “true negatives.” True negatives include both lineage-specific, recently inserted genes (also known as the “gray genome”) (
<xref rid="evv219-B13" ref-type="bibr">Freeling et al. 2008</xref>
), and genes conserved at syntenic locations across multiple species in a clade but deleted from the genomes of one or more specific species. Many evolutionary studies require the knowledge of whether a certain gene is indeed missing or relocated from a genomic region (transposition). Distinguishing transposition from gene removal is critical because potential changes in gene expression patterns are different under these two scenarios.</p>
<p>Identification of syntenic genes has additional advantages for functional research studies, as syntenic homologs are more likely to retain the same expression pattern than nonsyntenic homologs (
<xref rid="evv219-B10" ref-type="bibr">Dewey 2011</xref>
;
<xref rid="evv219-B37" ref-type="bibr">Schnable 2015</xref>
). Orthologous genes (as identified by OrthoMCL) at nonsyntenic locations show reduced correlation in expression pattern between different grass species (
<xref rid="evv219-B9" ref-type="bibr">Davidson et al. 2012</xref>
). Genes captured by helitrons and relocated to a new genomic neighborhood in maize show novel patterns of expression (
<xref rid="evv219-B2" ref-type="bibr">Barbaglia et al. 2012</xref>
). Common mechanisms of gene transposition, namely transposon capture (
<xref rid="evv219-B23" ref-type="bibr">Lai et al. 2005</xref>
) and intrachromosomal recombination (
<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
), can often carry the protein-coding sequence of a gene without the associated regulatory sequences. A study in maize also found that genes retained in syntenic positions across multiple grass species were significantly more likely than nonsyntenic genes to produce visible mutant phenotypes when knocked out (
<xref rid="evv219-B38" ref-type="bibr">Schnable and Freeling 2011</xref>
), further highlighting the functional relevance of synteny information in the validation of direct functional homologs.</p>
<p>As we provide a novel implementation of yet another synteny-finding tool, we offer an overview of popular synteny-finding algorithms, including several tools that were designed and implemented by some of the authors in the past. In general, synteny-finding algorithms can be grouped according to whether they rely on positional colinearity or positional density, what type of statistical features they search for (
<xref rid="evv219-B14" ref-type="bibr">Ghiurcuta and Moret 2014</xref>
), and their definition of “syntenic block.” A list of recent synteny search software includes i-ADHoRe (
<xref rid="evv219-B34" ref-type="bibr">Proost et al. 2012</xref>
), mGSV (
<xref rid="evv219-B35" ref-type="bibr">Revanna et al. 2012</xref>
), SyMAP (
<xref rid="evv219-B41" ref-type="bibr">Soderlund et al. 2011</xref>
), SynMap (
<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
), OrthoCluster (
<xref rid="evv219-B48" ref-type="bibr">Vergara and Chen 2010</xref>
), Synorth (
<xref rid="evv219-B11" ref-type="bibr">Dong et al. 2009</xref>
), MCScan (
<xref rid="evv219-B45" ref-type="bibr">Tang, Wang, et al. 2008</xref>
), and MCScanX (
<xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
) among many others. These synteny search tools vary greatly in the trade-offs accepted by their authors in terms of run time, computational resource requirements, and the goal of minimizing either type I (false positive) or type II (false negative) errors. In addition, from a pragmatic standpoint, the tools are also distinguished by interface type (e.g., command line or web based) and by whether a given tool offers built-in functionality to provide graphical outputs, enabling visual proofing of results. Herein, we provide a review of major features of recent synteny-finding software in
<xref ref-type="table" rid="evv219-T1">table 1</xref>
.
<table-wrap id="evv219-T1" orientation="portrait" position="float">
<label>Table 1</label>
<caption>
<p>Comparison of Major Features of Synteny-Based Homology Detection Software</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="1" colspan="1">Tool</th>
<th rowspan="1" colspan="1">References</th>
<th rowspan="1" colspan="1">Interface</th>
<th rowspan="1" colspan="1">Multiple Genomes</th>
<th rowspan="1" colspan="1">Syntenic Families</th>
<th rowspan="1" colspan="1">Infer Gene Loss</th>
<th rowspan="1" colspan="1">Scoring Mode</th>
<th rowspan="1" colspan="1">Parallel Computing</th>
<th rowspan="1" colspan="1">Integration with Data</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">ColinearScan</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B49" ref-type="bibr">Wang et al. (2006)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Cinteny</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B40" ref-type="bibr">Sinha and Meller (2007)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Limited (∼20)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MCScan</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. (2008)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">SynMap</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B29" ref-type="bibr">Lyons et al. (2008)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Hybrid</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">CoGe (∼25K)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MCMuSeC</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B26" ref-type="bibr">Ling et al. (2009)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">Synteny</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">OrthoClusterDB</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B31" ref-type="bibr">Ng et al. (2009)</xref>
</td>
<td rowspan="1" colspan="1">Web</td>
<td rowspan="1" colspan="1">Limited</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Limited (∼50)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Cyntenator</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B36" ref-type="bibr">Rodelsperger and Dieterich (2010)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">MicroSyn</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B6" ref-type="bibr">Cai et al. (2011)</xref>
</td>
<td rowspan="1" colspan="1">GUI</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Synteny</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">SyMAP</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B41" ref-type="bibr">Soderlund et al. (2011)</xref>
</td>
<td rowspan="1" colspan="1">GUI/Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Hybrid</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Limited (∼10)</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MCScanX</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B50" ref-type="bibr">Wang et al. (2012)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Colinear</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">i-ADHoRe</td>
<td rowspan="1" colspan="1">
<xref rid="evv219-B34" ref-type="bibr">Proost et al. (2012)</xref>
</td>
<td rowspan="1" colspan="1">Command</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Both/Hybrid</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">SynFind</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Command/Web</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">Both</td>
<td rowspan="1" colspan="1">+</td>
<td rowspan="1" colspan="1">CoGe (∼25K)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="evv219-TF1">
<p>N
<sc>ote</sc>
.—The tools published in the last 10 years are given in the table. Symbols + and − represent yes and no, respectively. “Scoring mode” is the optimization goal used in identifying syntenic regions. “Colinear” requires the gene order to be preserved; “Synteny” does not enforce conserved gene order; “Hybrid” uses “Colinear” initially and recruits imperfect synteny; “Both” supports both modes as program options. “Integration with data” is a count of available genomes for immediate use with a given tool.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
<p>A careful evaluation of these algorithms suggested fundamental challenges that are still not met for more general uses. First and foremost, data curation is often a significant challenge (
<xref rid="evv219-B27" ref-type="bibr">Lohr 2014</xref>
), requiring users to convert genomic annotation files into a range of idiosyncratic file formats required by different algorithms. Many tools are run from the command line, and often obtaining the most accurate results from a given tool will require experimentation with a range of settings, presenting an additional challenge to users who must develop methods of evaluating and ranking multiple output data sets. As the number of organisms a user is interested in comparing grows, computational time requirements will often scale quadratically, presenting challenges for these primarily offline algorithms.</p>
<p>After working closely with researchers in the community over the past few years, we found that the life cycle of gene synteny analysis requires running multiple algorithms to create input homology data (different BLAST-like algorithms), adjusting parameters on the fly (configurable thresholds), and allowing different synteny-finding/scoring schemes (colinear vs. density) (
<xref ref-type="table" rid="evv219-T1">table 1</xref>
). Following the same design principle as other CoGe tools, we continue to adopt a cloud-based implementation that offers a one-stop solution that combines user-configurable input data (genomes and structural annotations), algorithms, scalable computing resources (parallelization, memory, and storage), integrated visualization, links to additional tools for further data analysis, readily exportable results, and reproducibility through permanent URLs.</p>
<p>Our new online method, SynFind, has a number of features not typically found in other systems (
<xref ref-type="table" rid="evv219-T1">table 1</xref>
) that reflect recent innovations in comparative genomic analysis adopted in a few newly sequenced genomes (
<xref rid="evv219-B1" ref-type="bibr">Amborella Genome Project 2013</xref>
;
<xref rid="evv219-B20" ref-type="bibr">Ibarra-Laclette et al. 2013</xref>
;
<xref rid="evv219-B7" ref-type="bibr">Chalhoub et al. 2014</xref>
;
<xref rid="evv219-B16" ref-type="bibr">Green et al. 2014</xref>
). SynFind identifies multiple syntenic regions between a gene in a reference genome and a target genome, entirely independently of whether a syntenic ortholog or paralog is present at the predicted location. SynFind provides the option for both density and colinear scoring of syntenic regions to address the different structural genomic changes in taxa with different evolutionary distances and different genome assembly qualities. SynFind generates syntenic depth tables as well as a gene presence–absence table to reveal ancient polyploidy events and genes unique to one genome relative to the others. Most critically, the integration with CoGe provides instant access to thousands of genomes across all domains of life, along with CoGe’s tools that let users add new genomes, keep them private, and compare them using SynFind as rapidly as they are released. Tight integration of up-to-date genomic data with computing resources and downstream visualization and analysis tools creates an open-ended research pipeline that facilitates exploration of multidimensional genomic data sets bridging evolutionary genomics and functional genomics.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Synteny Score</title>
<p>SynFind processes putatively homologous gene pairs in order to extract the syntenic blocks, using each gene as query. Gene pairs are computed from sequence similarity search programs, such as BLAST, LASTZ, or LAST (
<xref rid="evv219-B22" ref-type="bibr">Kielbasa et al. 2011</xref>
). The modular architecture of SynFind allows the straightforward incorporation of new sequence similarity search algorithms in the future. Although SynFind can output information for a single gene, in each run, syntenic regions in the target genome(s) are identified for every annotated gene in the query genome. Extra caution is taken with genes that are members of tandem arrays (groups of homologous genes clustered together in the genome), as matches among such genes are likely to be overcounted and show up as false-positive synteny blocks. Consequently, tandem matches are reduced to a single copy in this step to avoid seeding a synteny block inside a tandem array. The treatment of tandem arrays is similar to the strategy used in MCScanX and i-ADHoRe (
<xref rid="evv219-B34" ref-type="bibr">Proost et al. 2012</xref>
;
<xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
).</p>
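<p>As an illustration only, the tandem reduction described above can be sketched in Python. The sketch assumes that each gene already carries a homology family label and that a tandem array is a run of immediately adjacent genes with the same label; the function name, the input layout, and the strict adjacency rule are hypothetical simplifications and are not taken from the SynFind source.</p>

```python
from itertools import groupby

def collapse_tandem_runs(ordered_genes, family_of):
    """Keep one representative per run of adjacent genes sharing a family label.

    ordered_genes: gene names in chromosomal order (hypothetical input layout).
    family_of: dict mapping gene name to a homology family label (assumed given).
    Returns the representatives and a dict from representative to run members.
    """
    representatives = []
    members = {}
    for _, run in groupby(ordered_genes, key=lambda gene: family_of[gene]):
        run = list(run)
        representatives.append(run[0])   # first gene stands in for the array
        members[run[0]] = run            # remember the members for later recovery
    return representatives, members

# Toy example: g2, g3 and g4 form a tandem array of family "B".
genes = ["g1", "g2", "g3", "g4", "g5"]
family = {"g1": "A", "g2": "B", "g3": "B", "g4": "B", "g5": "C"}
reps, members = collapse_tandem_runs(genes, family)
```

<p>Keeping the member list alongside each representative is one way to make the collapsed copies available again after seeding; tools that tolerate intervening genes inside an array would need a looser adjacency rule than the one assumed here.</p>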
<p>To seed synteny blocks, our algorithm selects a fixed number of genes upstream and downstream from the query gene (
<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>A</italic>
). This method is robust with respect to variation in gene density and intergenic spacing observed across different species. All gene pairs to a target genome between the region surrounding the gene of interest and candidate syntenic locations in the target genome are then identified and the number of matching gene pairs is counted as the “synteny score” (
<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>B</italic>
). SynFind provides positioning cues for visualization through genome browsers. Comparisons across sets of homologous regions are facilitated through automated centering and truncation of colinear panels. The middle gene of the current window, or the “query,” is used as the center of the syntenic panels. The extent of syntenic gene pairs in the current window can be used to truncate the matching panels to focus on a particular region of interest. Finally, SynFind automatically flips sequences so that syntenic regions are visualized on the same strand for clarity. These data are useful in automatically creating local syntenic views in CoGe for subsequent manual validation.
<fig id="evv219-F1" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 1.—</label>
<caption>
<p>Illustration of three key steps in SynFind. The three key steps include (
<italic>A</italic>
) extraction of genomic neighborhood, (
<italic>B</italic>
) gene pair generation and scoring of each matching region, and (
<italic>C</italic>
) identification of flankers (neighboring gene pairs) and annotation of syntelog class.</p>
</caption>
<graphic xlink:href="evv219f1p"></graphic>
</fig>
</p>
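<p>The seeding step can be illustrated with a minimal Python sketch, under stated assumptions. Genes are addressed by their rank along a chromosome, candidate regions are approximated by the densest cluster of hits on each target chromosome, and all function names and data layouts are hypothetical; the actual SynFind implementation may differ.</p>

```python
from collections import defaultdict

def window_around(gene_order, query_index, window_size=40):
    """Return the query gene plus up to window_size/2 genes on each side."""
    half = window_size // 2
    start = max(0, query_index - half)
    return gene_order[start:query_index + half + 1]

def densest_cluster(positions, span):
    """Largest number of hit positions that fit inside one span of gene ranks."""
    positions = sorted(positions)
    best, left = 0, 0
    for right, pos in enumerate(positions):
        while pos - positions[left] > span:
            left += 1
        best = max(best, right - left + 1)
    return best

def synteny_scores(window_genes, pairs, target_index, window_size=40):
    """Count matching gene pairs per target chromosome (a crude synteny score).

    pairs: dict from query gene to a list of homologous target genes.
    target_index: dict from target gene to (chromosome, rank); assumed layout.
    """
    hits = defaultdict(set)
    for gene in window_genes:
        for target in pairs.get(gene, ()):
            chrom, rank = target_index[target]
            hits[chrom].add(rank)
    return {chrom: densest_cluster(ranks, window_size) for chrom, ranks in hits.items()}

# Toy data: four window genes hit chr1 close together, one hits chr2.
gene_order = [f"q{i}" for i in range(100)]
window = window_around(gene_order, 50)
pairs = {"q48": ["t10"], "q49": ["t11"], "q50": ["t12"], "q52": ["t14"], "q55": ["u3"]}
target_index = {"t10": ("chr1", 10), "t11": ("chr1", 11), "t12": ("chr1", 12),
                "t14": ("chr1", 14), "u3": ("chr2", 3)}
scores = synteny_scores(window, pairs, target_index)
```

<p>In this toy example the window holds 41 genes, chr1 receives a score of 4, and chr2 receives a score of 1. Because the window is defined in gene ranks and not in base pairs, the sketch is insensitive to differences in intergenic spacing, which is the property the text attributes to the method.</p>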
<p>The output of the seeding step consists of syntenic gene pairs and a score to indicate the level of conserved synteny between their respective genomic locations. For each target region found, the synteny score reflects the number of gene pairs that are syntenic or colinear within the window, depending on the scoring function. When a matching region is found, the flanking genes for the query gene are identified and the status of the syntelog is tracked in a single letter notation—S/F/G, following the nomenclature in
<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al. (2011)</xref>
. S is “syntelog,” which means that the query gene has a match in the region. In this case, the match itself is used to represent the region. In contrast, F class and G class refer to cases in which the syntelog is missing (fractionated or moved) from the syntenic region identified in the target genome. F has both flankers present, whereas G has only one flanker (
<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>C</italic>
). G class syntenic regions are largely the result of adjacent genomic rearrangements (inversions and translocations) in either the target or query genome, but can also occur at the end of pseudomolecules, scaffolds, or contigs. In the case of F or G, a flanker gene is used to represent the region as a “proxy” to identify the approximate location of where a syntelog is expected to reside in the target genome.</p>
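<p>The S/F/G notation lends itself to a short Python sketch. The sketch below assumes that, for one candidate region, the set of window genes with a partner in that region is already known; the function names are hypothetical, and the choice of the upstream flanker as the proxy when both flankers exist is an assumption made for illustration only.</p>

```python
def nearest_flankers(window_genes, query_pos, matched):
    """Closest matched gene on each side of the query, or None if absent."""
    upstream = next((g for g in reversed(window_genes[:query_pos]) if g in matched), None)
    downstream = next((g for g in window_genes[query_pos + 1:] if g in matched), None)
    return upstream, downstream

def classify_region(window_genes, query_pos, matched):
    """Return (class, representative gene) for one candidate syntenic region.

    matched: set of window genes that have a partner in the target region.
    """
    query = window_genes[query_pos]
    if query in matched:
        return "S", query                      # syntelog present; it represents the region
    upstream, downstream = nearest_flankers(window_genes, query_pos, matched)
    if upstream and downstream:
        return "F", upstream                   # both flankers present; proxy is an assumption
    if upstream or downstream:
        return "G", upstream or downstream     # only one flanker present
    return None, None                          # no anchoring gene at all

window = ["a", "b", "c", "d", "e"]             # the query is "c" at position 2
```

<p>With this toy window, a matched set of {"a", "c"} is reported as class S, {"b", "e"} as class F with "b" as the proxy, and {"d"} as class G with "d" as the proxy.</p>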
<p>As a final validation, we recover tandem matches by checking against the original BLAST output as the tandem matches were reduced to single copy prior to the “seeding” step. This validation step increases the sensitivity of SynFind for genes inside tandem arrays. A single best match among the tandem array is selected to be the representative syntelog for a query gene, for the sake of clarity. The source code of SynFind can be found at
<ext-link ext-link-type="uri" xlink:href="https://github.com/tanghaibao/quota-alignment/blob/master/scripts/synteny_score.py">https://github.com/tanghaibao/quota-alignment/blob/master/scripts/synteny_score.py</ext-link>
(last accessed November 30, 2015).</p>
</sec>
<sec>
<title>Choice of Parameters: Beauty in Simplicity</title>
<p>There are a few intuitive, user-configurable parameters that adjust sensitivity or specificity of SynFind.</p>
<sec>
<title>Window Size: Window Size in Number of Neighboring Genes (Default: 40)</title>
<p>Given an anchor gene, SynFind searches upstream and downstream half a window size from the query. For example, a window size of 40 means that a total of 41 genes are checked: The query gene, plus 20 upstream genes and 20 downstream genes (
<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>A</italic>
).</p>
<p>
<italic>Minimum synteny score</italic>
: The minimum number of anchoring genes to call a region “syntenic.”</p>
<p>The combination of “window size” and “minimum number of genes” together controls the sensitivity and specificity of the algorithm (
<xref ref-type="fig" rid="evv219-F1">fig. 1</xref>
<italic>B</italic>
). The default number 4 means that a region is considered syntenic if 4 of 41 genes are syntenic. This threshold is capable of finding weakly homologous regions, such as regions undergoing a high degree of fractionation following polyploidy. In our tests, lowering the threshold below 10% of the window often risked false positives due to repeats and gene transpositions.</p>
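<p>The relationship between the two parameters is simple arithmetic, shown here as a small Python sketch with hypothetical helper names. It only restates the defaults given in the text (window size 40, minimum synteny score 4).</p>

```python
def genes_checked(window_size):
    """Query gene plus half a window on each side."""
    return 2 * (window_size // 2) + 1

def minimum_fraction(min_score, window_size):
    """Share of window genes that must be anchored to call a region syntenic."""
    return min_score / genes_checked(window_size)

default_genes = genes_checked(40)                        # 41 genes in the default window
default_fraction = round(minimum_fraction(4, 40), 3)     # 4 of 41 genes
```

<p>With the defaults, 41 genes are checked and the minimum fraction is about 0.098, that is, just under 10% of the window.</p>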
</sec>
<sec>
<title>Scoring Function</title>
<p>Scoring can be based on colinearity or density. For colinearity, a colinear arrangement of syntenic genes is enforced, based on the “longest increasing subsequence” method (
<xref rid="evv219-B54" ref-type="bibr">Woodhouse et al. 2011</xref>
). For density, we use single-linkage clustering to group gene pairs within the window in comparison, and any arrangement of gene pairs is tolerated. Although colinearity is frequently used in plant genome comparisons, synteny without requiring shared order is often the only criterion in the comparison of insect and vertebrate genomes, due to different rates and scales of inversions and translocations between plant and animal genomes (
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). The two different scoring functions allow flexibility in accommodating taxa with different modes of karyotypic evolution.</p>
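<p>The difference between the two scoring modes can be illustrated with a minimal Python sketch. Gene pairs are given as (query rank, target rank) tuples; the colinear score is the length of the longest increasing subsequence, taken over both orientations, and the density score is the size of the largest single-linkage cluster. The function names, the handling of inverted regions, and the linkage distance max_dist are assumptions for illustration and are not taken from the SynFind source.</p>

```python
from bisect import bisect_left

def _lis_length(values):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for value in values:
        i = bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)

def colinear_score(pairs):
    """Best colinear chain of (query_rank, target_rank) pairs in either orientation."""
    # Ties on the query rank are ordered so that one query gene is counted once.
    forward = [t for _, t in sorted(pairs, key=lambda p: (p[0], -p[1]))]
    reverse = [-t for _, t in sorted(pairs, key=lambda p: (p[0], p[1]))]
    return max(_lis_length(forward), _lis_length(reverse))

def density_score(pairs, max_dist=10):
    """Size of the largest single-linkage cluster; gene order is ignored."""
    pairs = list(pairs)
    unvisited = set(range(len(pairs)))
    best = 0
    while unvisited:
        stack = [unvisited.pop()]
        size = 0
        while stack:
            i = stack.pop()
            size += 1
            qi, ti = pairs[i]
            near = [j for j in unvisited
                    if max_dist >= abs(qi - pairs[j][0]) and max_dist >= abs(ti - pairs[j][1])]
            unvisited.difference_update(near)
            stack.extend(near)
        best = max(best, size)
    return best

# Five gene pairs in a locally scrambled region.
scrambled = [(1, 14), (2, 11), (3, 13), (4, 10), (5, 12)]
```

<p>For the scrambled region above, the colinear score is 3 whereas the density score is 5, which shows how density scoring tolerates local rearrangements that colinear scoring penalizes.</p>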
</sec>
<sec>
<title>Maximum Syntenic Depth: Limit the Number of Syntenic Regions Up To the Specified Depth</title>
<p>This parameter is useful in lineages with shared duplication events. Enforcing the syntenic depth allows screening of regions derived from specific evolutionary events (
<xref rid="evv219-B43" ref-type="bibr">Tang et al. 2011</xref>
). In particular, enforcing a maximum syntenic depth of 1 between species which are diploid relative to each other, but share one or more ancient whole-genome duplications (WGDs) would limit the search to only orthologous regions. The default is to output all syntenic regions found.</p>
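<p>A maximum syntenic depth can be thought of as keeping only the highest-scoring regions for each query gene. The Python sketch below assumes that regions are ranked by synteny score before truncation; the function name and the region labels are hypothetical.</p>

```python
def limit_depth(regions, max_depth=None):
    """Rank (region, score) tuples by score and keep at most max_depth of them.

    max_depth=None mirrors the default of reporting every region found.
    """
    ranked = sorted(regions, key=lambda region: region[1], reverse=True)
    return ranked if max_depth is None else ranked[:max_depth]

# Three candidate regions for one query gene (toy labels and scores).
candidates = [("chr3:120", 6), ("chr7:45", 14), ("chr1:300", 9)]
best_only = limit_depth(candidates, max_depth=1)
everything = limit_depth(candidates)
```

<p>With a depth of 1, only the region on chr7 is kept, which corresponds to retaining the single best (most likely orthologous) region.</p>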
</sec>
</sec>
<sec>
<title>CoGe Implementation</title>
<p>SynFind is implemented as one of the main entry points and analytical tools of CoGe. The user-interface (UI) contains two sections: One which is used to select a gene of interest and target genomes to search for syntenic homologs, the other to specify SynFind’s algorithms and parameters (
<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
). This UI is consistent with the general look-and-feel of other CoGe tools. CoGe’s implementation of SynFind allows users to search an arbitrary number of genomes for syntelogs of any gene located in a genome to which the user has access. Specifically, the genomes may be any public data sets, or private data sets that are owned by or shared with the user. Target genomes to be analyzed by SynFind are similarly specified by searching for organisms by name or taxonomic description, and then selecting the appropriate genome (
<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>A</italic>
). By repeating the name searches, several genomes may be added to the genome list (
<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>B</italic>
). Researchers may also select a previously saved genome list (e.g., a list of “ten grass genomes that have been sequenced thus far”) as a shortcut for researchers interested in a frequently accessed set of species. SynFind depends on the existence of structurally annotated protein coding gene models as a starting point for any query (
<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>C</italic>
). Some “draft” genome assemblies are released and loaded into CoGe with no available gene annotations. These genomes are automatically detected and excluded from the genome list (with information presented to the user as to why the genome is blocked from analysis by SynFind). In the configuration tab, users can select which algorithm to use for generating the homology pairs file as well as SynFind parameters: Window size, minimum number of genes to call a region syntenic, and the scoring scheme (colinear or density) (
<xref ref-type="fig" rid="evv219-F2">fig. 2</xref>
<italic>D</italic>
).
<fig id="evv219-F2" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 2.—</label>
<caption>
<p>SynFind web UI. The web UI includes several components that users can interact with (
<italic>A</italic>
) find target genome and select target genome version, (
<italic>B</italic>
) build list of multiple target genomes, (
<italic>C</italic>
) input query gene, (
<italic>D</italic>
) set SynFind parameters.</p>
</caption>
<graphic xlink:href="evv219f2p"></graphic>
</fig>
</p>
<p>When SynFind completes its analysis, the results show a table of matching regions along with their synteny scores and whether or not a syntenic gene was identified (
<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>A</italic>
). Additional links are available under the table, including microsynteny analysis of the identified regions in GEvo for validation, pairwise syntenic dotplots in SynMap, links to raw data and intermediate data files, and a link to revisit and regenerate the same SynFind analysis (
<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>B</italic>
).
<fig id="evv219-F3" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 3.—</label>
<caption>
<p>SynFind example output. The output of a typical SynFind search: (
<italic>A</italic>
) List of all syntenic regions found and presence of syntelog, (
<italic>B</italic>
) links for micro-synteny viewer (GEvo) and master tables for downstream analyses, (
<italic>C</italic>
) syntenic depth table useful for evaluating syntenic coverage and WGD events.</p>
</caption>
<graphic xlink:href="evv219f3p"></graphic>
</fig>
</p>
</sec>
<sec>
<title>Master Syntenic Pairs Table</title>
<p>SynFind identifies syntenic regions against any set of genomes given a gene in one genome, and curates the results in a master gene list. The pan-genome master list is important as this file contains all the syntenic regions identified in the target genomes for all of the genes in the query genome. The master list is a tab-delimited table, containing all syntenic gene sets between the query and target genomes, along with links to visualize microsynteny for each local set of regions. As a filtering option, SynFind can also report the top
<italic>N</italic>
best matches in query genome(s), which is useful for extracting only the orthologous regions, which are often the best syntenic match, when
<italic>N</italic>
is set to 1. As a byproduct of this master gene pairs table, SynFind reports a list of genes that are unique to some genomes. For example, in the case of comparing a set of bacterial strains, this feature can be used to find pathogenicity genes and phage insertions specific to one strain against others (
<xref rid="evv219-B46" ref-type="bibr">Tettelin et al. 2005</xref>
).</p>
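<p>To illustrate how such a tab-delimited table can be mined for genome-specific genes, the following Python sketch parses a toy table. The column layout used here (one row per query gene, one column per target genome, cells of the form class:gene, and a dash for no region) is entirely hypothetical and is not the documented SynFind output format; the function name is likewise an assumption.</p>

```python
import csv
import io

def genes_without_syntelogs(table_text):
    """List query genes that have no class S call in any target genome column."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)                                   # skip the header row
    unique = []
    for row in reader:
        query, calls = row[0], row[1:]
        if not any(call.startswith("S:") for call in calls):
            unique.append(query)
    return unique

# Hypothetical table: header, then one row per query gene.
toy_table = (
    "query\tgenomeA\tgenomeB\n"
    "q1\tS:a1\tS:b1\n"
    "q2\tF:a7\t-\n"
    "q3\t-\t-\n"
    "q4\t-\tS:b9\n"
)
query_specific = genes_without_syntelogs(toy_table)
```

<p>In the toy table, q2 and q3 are reported because neither has a syntelog in any target genome, although q2 still has a predicted syntenic location marked by a proxy.</p>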
</sec>
<sec>
<title>Syntenic Depth</title>
<p>Syntenic depth refers to the number of syntenic regions identified in a target genome for a given query position. SynFind calculates syntenic depth on a per gene basis and reports these data as a histogram, showing a breakdown of how many genes are covered in 1-, 2-, to
<italic>x</italic>
-fold regions (
<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>C</italic>
). Genes with a syntenic depth of zero are the genes that lack any matching region in the target genome. A syntenic depth of one most often reflects identification of an orthologous genomic region between two species, whereas a syntenic depth greater than 1 most often is the result of either paralogous or co-orthologous regions derived from whole-genome (or other large scale) duplications. Syntenic depth provides a more consistent marker for large scale genomic events than changes in the copy number of individual genes which are influenced by a greater number of small scale processes (expansion and contraction of tandem arrays, transposon capture and duplication, etc.). The proportion of genes with a syntenic depth of at least 1 is a useful metric for evaluating the relative completeness of genome assemblies, whereas modal and maximum syntenic depths are good indicators for the number of paleopolyploidies in a given lineage.</p>
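<p>The per-gene depth tally and the summary metrics mentioned above can be sketched in a few lines of Python. The input is assumed to be a mapping from each query gene to the list of syntenic regions found for it; the function names and the toy data are hypothetical.</p>

```python
from collections import Counter

def depth_histogram(query_genes, regions_by_gene):
    """Count how many query genes have 0, 1, 2, ... syntenic regions."""
    return Counter(len(regions_by_gene.get(gene, ())) for gene in query_genes)

def fraction_covered(histogram):
    """Proportion of query genes with a syntenic depth of at least 1."""
    total = sum(histogram.values())
    return (total - histogram.get(0, 0)) / total if total else 0.0

def modal_nonzero_depth(histogram):
    """Most common depth among genes that have at least one region."""
    nonzero = {depth: n for depth, n in histogram.items() if depth > 0}
    return max(nonzero, key=nonzero.get) if nonzero else 0

# Toy input: g4 has no matching region; three genes sit in 3-fold regions.
genes = ["g1", "g2", "g3", "g4", "g5"]
regions = {"g1": ["r1"], "g2": ["r1", "r2", "r3"],
           "g3": ["r4", "r5", "r6"], "g5": ["r7", "r8", "r9"]}
hist = depth_histogram(genes, regions)
```

<p>For the toy input, 80% of the genes have a depth of at least 1 and the modal nonzero depth is 3, the pattern that would be expected after a genome triplication.</p>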
<p>Plant genomes have a rich history of genome-wide duplication events that give rise to very high levels of syntenic depth (
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). For example, in comparison to the
<italic>Arabidopsis</italic>
genome, both peach and grapevine genomes show significant genome coverage of depth up to 3 (
<xref ref-type="fig" rid="evv219-F3">fig. 3</xref>
<italic>C</italic>
), corresponding to the pan-rosid genome triplication event (
<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
;
<xref rid="evv219-B42" ref-type="bibr">Tang, Bowers, et al. 2008</xref>
). The syntenic depth evaluation of SynFind was employed to identify multiple degenerate polyploidy events in the highly compact plant genome, Utricularia (Ibarra-Laclette et al. 2013). Examples of various syntenic depth tables and their interpretation in the context of paleopolyploidy can be found on CoGePedia (
<ext-link ext-link-type="uri" xlink:href="http://genomevolution.org/r/4suf">http://genomevolution.org/r/4suf</ext-link>
, last accessed November 30, 2015).</p>
</sec>
</sec>
<sec>
<title>Results and Discussion</title>
<sec>
<title>Focused Analyses for Functionally Important Genes</title>
<p>We show that SynFind is powerful for gene-centric analyses through selected examples based on past studies, but the usage is generally applicable to almost any gene family members in any set of organisms available in the CoGe database. In the past, such comparative analyses would usually take much dedicated time and work—from downloading and reformatting data sets, performing sequence alignment, reformatting data again for use in synteny detection tools, identifying syntenic genes, selecting informative visualization software for manual validation, and performing multiple analyses to identify an optimal configuration of parameters and software tools—all of which can now be performed within the SynFind tool in a few clicks.</p>
<p>One natural application of SynFind is to deduce gene presence and absence across a set of related organisms. In the context of bacterial genomics, we can infer possible pathogenic sequences through syntenic comparisons (
<xref rid="evv219-B21" ref-type="bibr">Jin et al. 2002</xref>
;
<xref rid="evv219-B46" ref-type="bibr">Tettelin et al. 2005</xref>
). We used SynFind to perform a three-way comparison of
<italic>Shigella flexneri</italic>
2a strain 301,
<italic>Escherichia coli</italic>
K12 substrain 1655 and
<italic>Escherichia coli</italic>
O157:H7 strain EDL933, in an analysis similar to the study in
<xref rid="evv219-B21" ref-type="bibr">Jin et al. (2002)</xref>
. When using the
<italic>S. flexneri</italic>
genome as the query, we looked for cases where SynFind reported only a proxy, or no region at all, in the two
<italic>E. coli</italic>
genomes, that is, the genes that were missing in their expected locations or for which expected regions could not be identified. This has allowed us to identify
<italic>Shigella</italic>
<italic>-</italic>
specific “islands.” In particular, one 27-gene island (from
<italic>SF0294</italic>
to
<italic>SF0320</italic>
) found only in the
<italic>Shigella</italic>
genome, previously termed SfII, was shown to be a lysogenic phage insertion, by which
<italic>Shigella</italic>
might have acquired virulence (Jin et al. 2002). Other interesting genes on these
<italic>Shigella</italic>
-specific islands include
<italic>ipaH</italic>
genes (e.g.,
<italic>SF0722</italic>
,
<italic>SF1383</italic>
,
<italic>SF1880</italic>
, and
<italic>SF2610</italic>
) that shared homology with different phages (Jin et al. 2002). The SynFind link to this analysis is available:
<ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/fggo">https://genomevolution.org/r/fggo</ext-link>
(last accessed November 30, 2015).</p>
<p>As our second example, we use another previously studied gene involved in the soft grain trait in the grasses. Genes involved in the soft grain trait have been studied extensively in wheat, including the
<italic>Hardness</italic>
(
<italic>Ha</italic>
) locus and several
<italic>Ha</italic>
-like genes (
<xref rid="evv219-B8" ref-type="bibr">Charles et al. 2009</xref>
). SynFind analysis (Brachypodium genes as “query,” barley, rice, and sorghum as “target”) showed that
<italic>Ha</italic>
-like genes were present in Brachypodium representing the lineage of Pooideae, but were missing in rice and sorghum. For barley, rice and sorghum, SynFind output displays “proxy for region” rather than a direct syntelog (
<xref ref-type="fig" rid="evv219-F4">fig. 4</xref>
<italic>A</italic>
). With visual proofing using GEvo, we confirmed that there is a syntenic sequence match in barley, whereas there are no matching sequences in rice and sorghum as indicated by SynFind (
<xref ref-type="fig" rid="evv219-F4">fig. 4</xref>
<italic>B</italic>
). This suggested that the flanking regions of
<italic>Ha</italic>
-like gene were relatively intact whereas the gene itself has been lost in rice and sorghum. Alternatively, the gene could have been inserted into this region in Brachypodium and barley. Although both scenarios are equally likely, a previous study preferred the scenario in which the gene was lost in rice and sorghum (
<xref rid="evv219-B8" ref-type="bibr">Charles et al. 2009</xref>
). With the SynFind tool, we have confirmed that the presence or absence of the
<italic>Ha</italic>
-like gene in this set of syntenic regions nicely explains the soft grains of wheat and barley versus the hard grains of rice and sorghum.
<fig id="evv219-F4" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 4.—</label>
<caption>
<p>SynFind analysis of
<italic>Ha</italic>
-like gene across Brachypodium, barley, rice, sorghum. (
<italic>A</italic>
) SynFind table output illustrating four matching regions in the selected grasses. Result can be regenerated:
<ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/iiv4">https://genomevolution.org/r/iiv4</ext-link>
(last accessed November 30, 2015). (
<italic>B</italic>
) GEvo visualization of the compiled syntenic regions, showing the presence of a syntenic sequence in barley and the lack of a syntenic ortholog of the
<italic>Ha</italic>
-like gene in rice and sorghum. Each panel represents a syntenic region in Brachypodium, barley, rice, and sorghum, from top to bottom. Arrows in each panel represent gene models, and boxes on top of the gene models are sequence matches (high-scoring segment pairs, HSPs). For the top Brachypodium panel, there are three tracks of HSPs, which are matches to barley, rice, and sorghum, respectively. We can conclude that the
<italic>Ha</italic>
-like gene in Brachypodium has a match in barley and no match in rice or sorghum. The result can be regenerated at
<ext-link ext-link-type="uri" xlink:href="https://genomevolution.org/r/iivx">https://genomevolution.org/r/iivx</ext-link>
(last accessed November 30, 2015).</p>
</caption>
<graphic xlink:href="evv219f4p"></graphic>
</fig>
</p>
<p>In addition to the two examples shown above for the purpose of demonstration, SynFind has enabled a number of evolutionary studies of important functional genes in diverse lineages (
<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
;
<xref rid="evv219-B44" ref-type="bibr">Tang and Lyons 2012</xref>
;
<xref rid="evv219-B19" ref-type="bibr">Hofberger et al. 2013</xref>
;
<xref rid="evv219-B51" ref-type="bibr">Waters et al. 2013</xref>
). For example, SynFind was used to screen regions in the
<italic>Aethionema arabicum</italic>
genome displaying synteny to genomic regions in
<italic>Arabidopsis thaliana</italic>
harboring glucosinolate biosynthesis (GS) loci (Hofberger et al. 2013). SynFind was essential in clarifying the series of tandem duplication and WGD events that drove GS pathway expansion, which were critical to the evolutionary success of the mustard family (Hofberger et al. 2013). Also, SynFind was essential for proving that the genome of
<italic>Utricularia gibba</italic>
, despite its small size (82 Mb), is derived from three sequential WGD events (Ibarra-Laclette et al. 2013).</p>
</sec>
<sec>
<title>Quality of Homology Assignments and Benchmark of SynFind against Competing Tools</title>
<p>Clade-wide syntenic gene sets are useful for detecting genome-wide transposition and deletion events (
<xref rid="evv219-B53" ref-type="bibr">Woodhouse et al. 2010</xref>
;
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
), and automation of this step could be essential in such studies. We have benchmarked SynFind against gene sets from a number of studies that typically required a substantial amount of human curation to complete. Although the human-curated gene sets are still imperfect and subject to errors, they serve as a basis for comparing different synteny search tools, including SynFind. In this study, we evaluate the performance of SynFind and compare it with that of competing software, including MCScanX and iADHoRe, which are the two most popular state-of-the-art tools that perform well in a number of studies (
<xref rid="evv219-B34" ref-type="bibr">Proost et al. 2012</xref>
;
<xref rid="evv219-B50" ref-type="bibr">Wang et al. 2012</xref>
).</p>
<p>Our first set of test data is a list of WGD duplicates from
<italic>A. thaliana</italic>
curated by
<xref rid="evv219-B4" ref-type="bibr">Bowers et al. (2003)</xref>
. This list contains a total of 5,788 gene duplicates collectively derived from the alpha, beta, and gamma WGDs (
<xref rid="evv219-B4" ref-type="bibr">Bowers et al. 2003</xref>
). Our second data set is based on comparison of yeast genomes, using data from Yeast Gene Order Browser (YGOB) (
<xref rid="evv219-B5" ref-type="bibr">Byrne and Wolfe 2005</xref>
). We were able to find 14 yeast genomes in the CoGe system; a few yeast species in YGOB had not yet been released to GenBank with structural gene annotations and were therefore not included in this study. YGOB uses “pillars” to store homology assignments (Byrne and Wolfe 2005), which we converted to gene pairs for validation purposes. Finally, as the third test set, we used a pan-grass synteny gene set curated by
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. (2012)</xref>
. Schnable et al. manually clustered and curated gene members from rice, Brachypodium, sorghum, and maize according to inter- and intragenomic comparisons (
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
). A typical set of syntenic genes in the Schnable set contains up to 2 rice genes, up to 2 Brachypodium genes, and up to 2 sorghum genes, all derived from the shared pan-grass WGD, and up to 4 maize genes because of an additional maize-specific WGD. Similarly, we converted these families into a list of gene pairs before validation. These data sets were chosen because curated sets were available for them and because together they include gene sets with both paralogous and orthologous relationships.</p>
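<p>The conversion of pillars or gene families into gene pairs can be illustrated with a minimal sketch. This is not the code used in the benchmark: the function name, the treatment of empty slots, and the gene identifiers are all assumptions made for illustration.</p>
<preformat>
```python
from itertools import combinations


def families_to_pairs(families):
    """Expand each family (or pillar) into its set of unordered gene pairs."""
    pairs = set()
    for family in families:
        # Drop empty slots (e.g., a gene missing from a pillar) and duplicates,
        # then sort so that each pair has one canonical orientation.
        members = sorted({gene for gene in family if gene})
        for gene_a, gene_b in combinations(members, 2):
            pairs.add((gene_a, gene_b))
    return pairs


# Hypothetical identifiers, for illustration only.
example_families = [
    ["rice_g1", "brachy_g1", "sorghum_g1"],
    ["rice_g2", None, "sorghum_g2"],
]
example_pairs = families_to_pairs(example_families)
```
</preformat>
<p>Under these assumptions, a family with three members yields three pairs, and a family with one empty slot yields a single pair.</p>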
<p>For SynFind, MCScanX, and iADHoRe, we computed the syntenic gene list and compared it against the curated set, which is considered the “truth” (
<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). Two metrics are computed: “Sensitivity” (Sn) is defined as the number of common items divided by the total number of items in the truth set; “Purity” (Pu) is defined as the number of common items divided by the total number of items in the test set, and can be used to infer false-positive discovery. SynFind consistently ranks the highest in sensitivity, recovering 63%, 75%, and 61% of the items in the truth set (
<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). As a tradeoff, the purity of SynFind results compares less favorably with that of the other tools (
<xref ref-type="fig" rid="evv219-F5">fig. 5</xref>
). As we have designed SynFind as a gene-centric query tool, this benchmark reflects our focus on sensitivity: we would tolerate some false positives but prefer to have few false negatives. Differences in the treatment of tandem gene sets may have contributed to the nonoverlapping members: SynFind, MCScanX, and iADHoRe may each have picked a single matching gene within a tandem array, which is not necessarily the tandem member in the curated set.
<fig id="evv219-F5" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 5.—</label>
<caption>
<p>Comparison of SynFind, MCScanX, and iADHoRe on curated data sets. (
<italic>A</italic>
)
<italic>Arabidopsis thaliana</italic>
alpha, beta, and gamma duplicates from Bowers et al. (2003). (
<italic>B</italic>
) Yeast genomes from YGOB (Byrne and Wolfe 2005). (
<italic>C</italic>
) Grass genomes from
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. (2012)</xref>
. Sn: sensitivity, defined as common items divided by total items in truth set; Pu: Purity, defined as common items divided by total items in the test set.</p>
</caption>
<graphic xlink:href="evv219f5p"></graphic>
</fig>
</p>
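<p>The two metrics defined above can be written out as a short sketch. It assumes that the truth and test sets are lists of unordered gene pairs; the function names and the toy pairs are hypothetical and are not taken from the benchmark data.</p>
<preformat>
```python
def normalize(pairs):
    """Sort each pair so that (a, b) and (b, a) count as the same item."""
    return {tuple(sorted(pair)) for pair in pairs}


def sensitivity_and_purity(truth_pairs, test_pairs):
    """Sn = common / total in truth set; Pu = common / total in test set."""
    truth = normalize(truth_pairs)
    test = normalize(test_pairs)
    common = truth.intersection(test)
    sn = len(common) / len(truth) if truth else 0.0
    pu = len(common) / len(test) if test else 0.0
    return sn, pu


# Toy data: 4 curated pairs, 5 predicted pairs, 3 of them shared.
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
predicted = [("b1", "a1"), ("a2", "b2"), ("a4", "b4"), ("a3", "b9"), ("a5", "b5")]
sn, pu = sensitivity_and_purity(truth, predicted)
```
</preformat>
<p>In this toy case, three of the four curated pairs are recovered (Sn = 0.75) and three of the five predictions are in the curated set (Pu = 0.6), so one minus purity gives the fraction of predictions not supported by the curated set.</p>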
<p>The list of predicted locations for missing genes is often a good indication of potential loss of function, which could be associated with differences in phenotypic and physiological traits between grasses, as illustrated in our
<italic>Ha</italic>
example. Missing genes in one grass genome versus others could also suggest possible misassemblies, leading to iterative improvement of genome assemblies and recovery of missing gene fragments in genome annotation efforts (Law et al. 2015).</p>
</sec>
<sec>
<title>Integration with CoGe Comparative Genomics Platform</title>
<p>Integration in CoGe permits SynFind to be tightly connected to thousands of genomes as well as to downstream analysis tools such as GEvo (
<xref rid="evv219-B28" ref-type="bibr">Lyons and Freeling 2008</xref>
) and SynMap (
<xref rid="evv219-B29" ref-type="bibr">Lyons et al. 2008</xref>
) for micro- and whole-genome syntenic analysis, respectively. The method for selecting query and target genomes loads the same module. SynFind automatically generates links to GEvo views for gene-centric analyses as well as SynMap views for chromosome-level analyses. The open-ended analysis workflow gives users the flexibility to move between tools that operate at different scales. In addition, CoGe’s user-data management systems let researchers add private genomes and share them with collaborators, create lists (notebooks) of genomes that can be imported quickly into SynFind, and automatically record links to regenerate any analysis performed.</p>
<p>The CoGe job execution (JEX) framework facilitates parallel processing of queries against multiple genomes by using Work Queue (
<xref rid="evv219-B47" ref-type="bibr">Thrasher et al. 2012</xref>
) (
<xref ref-type="fig" rid="evv219-F6">fig. 6</xref>
). When a SynFind analysis runs, each pairwise workflow, consisting of a separate query-target genome pair, is submitted to CoGe’s JEX framework. The JEX framework controls the parallel processing of multiple genomes (
<xref ref-type="fig" rid="evv219-F6">fig. 6</xref>
). It first checks whether the anticipated results file already exists and retrieves that file if it does; otherwise, it submits the analysis for processing and subsequently caches the results file. This system permits reusing the results of previously run analyses as well as running multiple workflows in parallel. For example, in contrast to other gene clustering approaches, new genomes can be incrementally added to the target list and the CoGe server only needs to compute the missing comparisons. Overall, this greatly reduces the time it takes to complete an analysis. Additionally, if a user decides to modify and rerun an analysis, recomputation starts from the first divergent step of the analysis while reusing data from earlier, identically configured steps, allowing fast tweaking of parameters.
<fig id="evv219-F6" orientation="portrait" position="float">
<label>F
<sc>ig</sc>
. 6.—</label>
<caption>
<p>SynFind computational workflow as implemented on CoGe. The query genome and target list of genomes are processed in parallel: extracting coding sequences, building homology lists, filtering tandem repeats, and running the SynFind algorithm. The last step assembles the processed data into a master table. This strategy is similar to the “Map-Reduce” paradigm used in parallel computing.</p>
</caption>
<graphic xlink:href="evv219f6p"></graphic>
</fig>
</p>
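<p>The check-then-compute caching and the parallel, per-genome processing described above can be sketched as follows. This is only an illustration of the general pattern, not the JEX or Work Queue API: the file naming scheme, the function names, and the use of a local thread pool are all assumptions.</p>
<preformat>
```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_pair(query, target, cache_dir, compute):
    """Return results for one query-target pair, reusing a cached file if present."""
    result_file = Path(cache_dir) / f"{query}__{target}.json"
    if result_file.exists():
        return json.loads(result_file.read_text())
    result = compute(query, target)
    result_file.write_text(json.dumps(result))  # cache for later runs
    return result


def run_analysis(query, targets, cache_dir, compute):
    """Map: process each pair in parallel. Reduce: assemble a master table."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda t: run_pair(query, t, cache_dir, compute), targets))
    return dict(zip(targets, results))


calls = []  # records which comparisons were actually computed


def toy_compute(query, target):
    """Stand-in for the real pairwise synteny computation (hypothetical)."""
    calls.append(target)
    return {"query": query, "target": target, "regions": []}


with tempfile.TemporaryDirectory() as cache:
    first = run_analysis("brachypodium", ["barley", "rice"], cache, toy_compute)
    # Adding sorghum later only triggers the one missing comparison.
    second = run_analysis("brachypodium", ["barley", "rice", "sorghum"], cache, toy_compute)
```
</preformat>
<p>In this sketch, the second run computes only the sorghum comparison and reads the barley and rice results from the cached files, which mirrors the incremental behavior described in the text.</p>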
<p>The scale of analysis in comparative genomics is an important issue. Although SynMap excels in identifying large-scale structural similarities, it lacks the gene-centric search needed when researchers just want to study their genes of interest across a set of genomes. This conceptual difference is often referred to as “macrosynteny” versus “microsynteny” analysis in comparative genomics. Microsynteny search tools, such as SynFind, achieve higher sensitivity and more flexibility for gene-centric research. Whereas SynMap is necessarily constrained to making pairwise comparisons between genomes, SynFind can simultaneously launch comparisons of multiple genomes. Additionally, SynFind identifies syntenic locations even when the gene itself is absent, either as a result of lineage-specific gene deletion or lineage-specific gene insertion. Analyses based on SynMap output required substantial customized offline postprocessing and analysis to generate equivalent predicted locations (
<xref rid="evv219-B39" ref-type="bibr">Schnable et al. 2012</xref>
). Importantly, both of these tools permit on-the-fly analyses and allow direct manipulation of parameters (e.g., higher or lower stringency, such as window size and “score cutoff”), and are interconnected in order to characterize and validate patterns of genome structure and dynamics.</p>
<p>A typical exploratory workflow that we recommend would be to 1) use SynMap to characterize genome-wide rearrangements and possibly genome duplications, 2) zoom in on a pair of contigs or chromosomes with an interesting rearrangement or duplication pattern, 3) select a gene to fish out additional syntenic regions using SynFind, and 4) validate putatively syntenic regions using GEvo to ensure that each one covers the entire region of interest. In real-world applications, SynFind and SynMap can be applied together to offer complementary views. For example, in a study of conservation of imprinting across a set of grass taxa, gene-level comparisons were made between syntenic genes in the genomes of maize, rice, and sorghum using SynMap followed by SynFind to offer the most coverage (
<xref rid="evv219-B51" ref-type="bibr">Waters et al. 2013</xref>
).</p>
</sec>
<sec>
<title>Scalable and Sustainable Infrastructure for Gene-Centric Evolutionary Study</title>
<p>The SynFind algorithm addresses important limitations and challenges in the postgenomics era. Researchers have access to large and inexpensive sequencing power, making it possible to study genetic and genomic evolution across whole clades of species rather than being confined to individual model organisms. However, to unlock the potential of comparative genomic approaches to accelerate studies of the origin, regulation, and function of individual genes, it is necessary to enable the broadest possible range of scientists to make direct comparisons across the genomes of large groups of related species. Online computational resources, such as CoGe, create ecosystems of specialized applications that are easily linked to and from one another. Similarly, resources developed by cyberinfrastructure projects such as the iPlant Collaborative (
<xref rid="evv219-B15" ref-type="bibr">Goff et al. 2011</xref>
) and XSEDE provide computational platforms that enable scalable access to computing and data storage resources.</p>
<p>Developing computational ecosystems that succeed in democratizing bioinformatics research requires the deployment of modular analysis pipelines that allow each new tool to exploit existing computational resources, architectures, and curated data sets. SynFind joins the increasing list of CoGe-powered and iPlant-enabled applications (Goff et al. 2011), which already includes GEvo, SynMap, and many others. The availability of SynFind will begin to merge the two analytical worlds of comparative and functional genomics such that researchers can more easily transfer system-level functional knowledge from data-rich model organisms to the thousands of other organisms being analyzed by only a handful of scientists. Conversely, SynFind enables comparative, in silico studies across a wide range of species to inform the study of specific genes within model organisms, where even today 30–34% of all genes have no annotated function (data from
<italic>Arabidopsis thaliana</italic>
, as cited in the
<ext-link ext-link-type="uri" xlink:href="https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/npgi_five-year_plan_5-2014.pdf">National Plant Genome Initiative 2014 report</ext-link>
).</p>
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>SynFind fills a current gap: the lack of an algorithm that performs syntenic gene queries and compiles the matching set of genomic regions on the fly. SynFind identifies all regions syntenic to a given gene in a user-selected set of genomes, regardless of whether the gene is still present in that region. SynFind is powered by an algorithm that calculates a synteny score between a pair of regions. Performance-wise, SynFind has higher sensitivity but lower purity compared with competing tools when validated against manually curated sets. Feature-wise, SynFind contains several key functions not typically found in existing systems (
<xref ref-type="table" rid="evv219-T1">table 1</xref>
). Integrated with the CoGe online platform and powered by the iPlant project, SynFind now allows syntenic queries to be performed interactively and their results retrieved for downstream analyses in a scalable and reproducible manner. SynFind is an important tool for assessing genome dynamics, including gene transpositions, the impact of genome duplications, and correlation with functional changes across a set of related taxa of interest.</p>
</sec>
<sec>
<title>Data Availability</title>
<p>SynFind is available for use through a web-based interface in CoGe. Data sets used in benchmarking SynFind with related tools are available on figshare with the following public DOI:
<list list-type="bullet">
<list-item>
<p>Tang, Haibao (2015): SynFind supporting data: Benchmark on three curated syntenic gene sets. figshare.
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1589735">http://dx.doi.org/10.6084/m9.figshare.1589735</ext-link>
(last accessed November 30, 2015)</p>
</list-item>
</list>
</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The authors thank the Fujian provincial government for a Fujian “100 Talent Plan” award to H.T. E.L. is supported by the Gordon and Betty Moore Foundation grant number 3383 and the National Science Foundation grant number DBI-1265383. iPlant is supported by the National Science Foundation under grant numbers DBI-0735191 and DBI-1265383. They also thank Zhenghui Zhong for providing help in benchmarking the performance of SynFind. They declare that they have no competing interests.</p>
</ack>
<ref-list>
<title>Literature Cited</title>
<ref id="evv219-B1">
<mixed-citation publication-type="journal">
<collab>Amborella Genome Project</collab>
.
<year>2013</year>
<article-title>The Amborella genome and the evolution of flowering plants</article-title>
.
<source>Science</source>
<volume>342</volume>
:
<fpage>1241089</fpage>
.
<pub-id pub-id-type="pmid">24357323</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barbaglia</surname>
<given-names>AM</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Gene capture by Helitron transposons reshuffles the transcriptome of maize</article-title>
.
<source>Genetics</source>
<volume>190</volume>
:
<fpage>965</fpage>
<lpage>975</lpage>
.
<pub-id pub-id-type="pmid">22174072</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baxter</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants</article-title>
.
<source>Plant Cell</source>
<volume>24</volume>
:
<fpage>3949</fpage>
<lpage>3965</lpage>
.
<pub-id pub-id-type="pmid">23110901</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bowers</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Rong</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Paterson</surname>
<given-names>AH</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events</article-title>
.
<source>Nature</source>
<volume>422</volume>
:
<fpage>433</fpage>
<lpage>438</lpage>
.
<pub-id pub-id-type="pmid">12660784</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Byrne</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species</article-title>
.
<source>Genome Res.</source>
<volume>15</volume>
:
<fpage>1456</fpage>
<lpage>1461</lpage>
.
<pub-id pub-id-type="pmid">16169922</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cai</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tuskan</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>ZM</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>MicroSyn: a user friendly tool for detection of microsynteny in a gene family</article-title>
.
<source>BMC Bioinformatics</source>
<volume>12</volume>
:
<fpage>79</fpage>
.
<pub-id pub-id-type="pmid">21418570</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chalhoub</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<year>2014</year>
<article-title>Early allopolyploid evolution in the post-Neolithic
<italic>Brassica napus</italic>
oilseed genome</article-title>
.
<source>Science</source>
<volume>345</volume>
:
<fpage>950</fpage>
<lpage>953</lpage>
.
<pub-id pub-id-type="pmid">25146293</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Charles</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>Sixty million years in evolution of soft grain trait in grasses: emergence of the softness locus in the common ancestor of Pooideae and Ehrhartoideae, after their divergence from Panicoideae</article-title>
.
<source>Mol Biol Evol.</source>
<volume>26</volume>
:
<fpage>1651</fpage>
<lpage>1661</lpage>
.
<pub-id pub-id-type="pmid">19395588</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Davidson</surname>
<given-names>RM</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution</article-title>
.
<source>Plant J.</source>
<volume>71</volume>
:
<fpage>492</fpage>
<lpage>502</lpage>
.
<pub-id pub-id-type="pmid">22443345</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dewey</surname>
<given-names>CN</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Positional orthology: putting genomic evolutionary relationships into context</article-title>
.
<source>Brief Bioinformatics</source>
<volume>12</volume>
:
<fpage>401</fpage>
<lpage>412</lpage>
.
<pub-id pub-id-type="pmid">21705766</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dong</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Fredman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lenhard</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Synorth: exploring the evolution of synteny and long-range regulatory interactions in vertebrate genomes</article-title>
.
<source>Genome Biol.</source>
<volume>10</volume>
:
<fpage>R86</fpage>
.
<pub-id pub-id-type="pmid">19698106</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Engstrom</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Ho Sui</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Drivenes</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Becker</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Lenhard</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Genomic regulatory blocks underlie extensive microsynteny conservation in insects</article-title>
.
<source>Genome Res.</source>
<volume>17</volume>
:
<fpage>1898</fpage>
<lpage>1908</lpage>
.
<pub-id pub-id-type="pmid">17989259</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Many or most genes in
<italic>Arabidopsis</italic>
transposed after the origin of the order Brassicales</article-title>
.
<source>Genome Res.</source>
<volume>18</volume>
:
<fpage>1924</fpage>
<lpage>1937</lpage>
.
<pub-id pub-id-type="pmid">18836034</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ghiurcuta</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Moret</surname>
<given-names>BM</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Evaluating synteny for improved comparative studies</article-title>
.
<source>Bioinformatics</source>
<volume>30</volume>
:
<fpage>i9</fpage>
<lpage>i18</lpage>
.
<pub-id pub-id-type="pmid">24932010</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goff</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
<year>2011</year>
<article-title>The iPlant collaborative: cyberinfrastructure for plant biology</article-title>
.
<source>Front Plant Sci.</source>
<volume>2</volume>
:
<fpage>34</fpage>
.
<pub-id pub-id-type="pmid">22645531</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Green</surname>
<given-names>RE</given-names>
</name>
<etal></etal>
</person-group>
<year>2014</year>
<article-title>Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs</article-title>
.
<source>Science</source>
<volume>346</volume>
:
<fpage>1254449</fpage>
.
<pub-id pub-id-type="pmid">25504731</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haudry</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions</article-title>
.
<source>Nat Genet.</source>
<volume>45</volume>
:
<fpage>891</fpage>
<lpage>898</lpage>
.
<pub-id pub-id-type="pmid">23817568</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heger</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ponting</surname>
<given-names>CP</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes</article-title>
.
<source>Genome Res.</source>
<volume>17</volume>
:
<fpage>1837</fpage>
<lpage>1849</lpage>
.
<pub-id pub-id-type="pmid">17989258</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hofberger</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Edger</surname>
<given-names>PP</given-names>
</name>
<name>
<surname>Chris Pires</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Eric Schranz</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Whole genome and tandem duplicate retention facilitated glucosinolate pathway diversification in the mustard family</article-title>
.
<source>Genome Biol Evol.</source>
<volume>5</volume>
:
<fpage>2155</fpage>
<lpage>2173</lpage>
.
<pub-id pub-id-type="pmid">24171911</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ibarra-Laclette</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>Architecture and evolution of a minute plant genome</article-title>
.
<source>Nature</source>
<volume>498</volume>
:
<fpage>94</fpage>
<lpage>98</lpage>
.
<pub-id pub-id-type="pmid">23665961</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>Q</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Genome sequence of
<italic>Shigella flexneri</italic>
2a: insights into pathogenicity through comparison with genomes of
<italic>Escherichia coli</italic>
K12 and O157</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>30</volume>
:
<fpage>4432</fpage>
<lpage>4441</lpage>
.
<pub-id pub-id-type="pmid">12384590</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kielbasa</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Horton</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>MC</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Adaptive seeds tame genomic sequence comparison</article-title>
.
<source>Genome Res.</source>
<volume>21</volume>
:
<fpage>487</fpage>
<lpage>493</lpage>
.
<pub-id pub-id-type="pmid">21209072</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lai</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Messing</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dooner</surname>
<given-names>HK</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Gene movement by Helitron transposons contributes to the haplotype variability of maize</article-title>
.
<source>Proc Natl Acad Sci U S A.</source>
<volume>102</volume>
:
<fpage>9068</fpage>
<lpage>9073</lpage>
.
<pub-id pub-id-type="pmid">15951422</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Law</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2015</year>
<article-title>Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes</article-title>
.
<source>Plant Physiol.</source>
<volume>167</volume>
:
<fpage>25</fpage>
<lpage>39</lpage>
.
<pub-id pub-id-type="pmid">25384563</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stoeckert</surname>
<given-names>CJ</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Roos</surname>
<given-names>DS</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>OrthoMCL: identification of ortholog groups for eukaryotic genomes</article-title>
.
<source>Genome Res.</source>
<volume>13</volume>
:
<fpage>2178</fpage>
<lpage>2189</lpage>
.
<pub-id pub-id-type="pmid">12952885</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ling</surname>
<given-names>X</given-names>
</name>
<name>
<surname>He</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Xin</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Detecting gene clusters under evolutionary constraint in a large number of genomes</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:
<fpage>571</fpage>
<lpage>577</lpage>
.
<pub-id pub-id-type="pmid">19158161</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B27">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Lohr</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2014 Aug 18</year>
<article-title>For big-data scientists, “Janitor Work” is key hurdle to insights</article-title>
.
<italic>The New York Times</italic>
<publisher-name>New York City</publisher-name>
<comment>Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0">http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=0</ext-link>
</comment>
.</mixed-citation>
</ref>
<ref id="evv219-B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>How to usefully compare homologous plant genes and chromosomes as DNA sequences</article-title>
.
<source>Plant J.</source>
<volume>53</volume>
:
<fpage>661</fpage>
<lpage>673</lpage>
.
<pub-id pub-id-type="pmid">18269575</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kane</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids</article-title>
.
<source>Trop Plant Biol.</source>
<volume>1</volume>
:
<fpage>181</fpage>
<lpage>190</lpage>
.</mixed-citation>
</ref>
<ref id="evv219-B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moreno-Hagelsieb</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Trevino</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Perez-Rueda</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>TF</given-names>
</name>
<name>
<surname>Collado-Vides</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Transcription unit conservation in the three domains of life: a perspective from
<italic>Escherichia coli</italic>
</article-title>
.
<source>Trends Genet.</source>
<volume>17</volume>
:
<fpage>175</fpage>
<lpage>177</lpage>
.
<pub-id pub-id-type="pmid">11275307</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ng</surname>
<given-names>MP</given-names>
</name>
<etal></etal>
</person-group>
<year>2009</year>
<article-title>OrthoClusterDB: an online platform for synteny blocks</article-title>
.
<source>BMC Bioinformatics</source>
<volume>10</volume>
:
<fpage>192</fpage>
.
<pub-id pub-id-type="pmid">19549318</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ostlund</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<year>2010</year>
<article-title>InParanoid 7: new algorithms and tools for eukaryotic orthology analysis</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>38</volume>
:
<fpage>D196</fpage>
<lpage>D203</lpage>
.
<pub-id pub-id-type="pmid">19892828</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Poyatos</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Hurst</surname>
<given-names>LD</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>The determinants of gene order conservation in yeasts</article-title>
.
<source>Genome Biol.</source>
<volume>8</volume>
:
<fpage>R233</fpage>
.
<pub-id pub-id-type="pmid">17983469</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Proost</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>40</volume>
:
<fpage>e11</fpage>
.
<pub-id pub-id-type="pmid">22102584</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Revanna</surname>
<given-names>KV</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>A web-based multi-genome synteny viewer for customized data</article-title>
.
<source>BMC Bioinformatics</source>
<volume>13</volume>
:
<fpage>190</fpage>
.
<pub-id pub-id-type="pmid">22856879</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodelsperger</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Dieterich</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>CYNTENATOR: progressive gene order alignment of 17 vertebrate genomes</article-title>
.
<source>PLoS One</source>
<volume>5</volume>
:
<fpage>e8861</fpage>
.
<pub-id pub-id-type="pmid">20126624</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schnable</surname>
<given-names>JC</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Genome evolution in maize: from genomes back to genes</article-title>
.
<source>Annu Rev Plant Biol.</source>
<volume>66</volume>
:
<fpage>329</fpage>
<lpage>343</lpage>
.
<pub-id pub-id-type="pmid">25494463</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schnable</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Genes identified by visible mutant phenotypes show increased bias toward one of two subgenomes of maize</article-title>
.
<source>PLoS One</source>
<volume>6</volume>
:
<fpage>e17855</fpage>
.
<pub-id pub-id-type="pmid">21423772</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schnable</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Genome-wide analysis of syntenic gene deletion in the grasses</article-title>
.
<source>Genome Biol Evol.</source>
<volume>4</volume>
:
<fpage>265</fpage>
<lpage>277</lpage>
.
<pub-id pub-id-type="pmid">22275519</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinha</surname>
<given-names>AU</given-names>
</name>
<name>
<surname>Meller</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms</article-title>
.
<source>BMC Bioinformatics</source>
<volume>8</volume>
:
<fpage>82</fpage>
.
<pub-id pub-id-type="pmid">17343765</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B41">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soderlund</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bomhoff</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>WM</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>SyMAP v3.4: a turnkey synteny system with application to plant genomes</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>39</volume>
:
<fpage>e68</fpage>
.
<pub-id pub-id-type="pmid">21398631</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B42">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bowers</surname>
<given-names>JE</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Synteny and collinearity in plant genomes</article-title>
.
<source>Science</source>
<volume>320</volume>
:
<fpage>486</fpage>
<lpage>488</lpage>
.
<pub-id pub-id-type="pmid">18436778</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B43">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<year>2011</year>
<article-title>Screening synteny blocks in pairwise genome comparisons through integer programming</article-title>
.
<source>BMC Bioinformatics</source>
<volume>12</volume>
:
<fpage>102</fpage>
.
<pub-id pub-id-type="pmid">21501495</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B44">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Unleashing the genome of
<italic>Brassica rapa</italic>
</article-title>
.
<source>Front Plant Sci.</source>
<volume>3</volume>
:
<fpage>172</fpage>
.
<pub-id pub-id-type="pmid">22866056</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B45">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps</article-title>
.
<source>Genome Res.</source>
<volume>18</volume>
:
<fpage>1944</fpage>
<lpage>1954</lpage>
.
<pub-id pub-id-type="pmid">18832442</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B46">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tettelin</surname>
<given-names>H</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Genome analysis of multiple pathogenic isolates of
<italic>Streptococcus agalactiae</italic>
: implications for the microbial “pan-genome.”</article-title>
<source>Proc Natl Acad Sci U S A.</source>
.
<volume>102</volume>
:
<fpage>13950</fpage>
<lpage>13955</lpage>
.
<pub-id pub-id-type="pmid">16172379</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B47">
<mixed-citation publication-type="confproc">
<person-group person-group-type="editor">
<name>
<surname>Thrasher</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Thain</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Emrich</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Musgrave</surname>
<given-names>Z</given-names>
</name>
</person-group>
, editors.
<comment>Computational advances in bio and medical sciences (ICCABS). 2012 IEEE 2nd International Conference on 2012 Feb 23–25. University of Las Vegas (Nevada): ICCABS</comment>
.</mixed-citation>
</ref>
<ref id="evv219-B48">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vergara</surname>
<given-names>IA</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Large synteny blocks revealed between
<italic>Caenorhabditis elegans</italic>
and
<italic>Caenorhabditis briggsae</italic>
genomes using OrthoCluster</article-title>
.
<source>BMC Genomics</source>
<volume>11</volume>
:
<fpage>516</fpage>
.
<pub-id pub-id-type="pmid">20868500</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B49">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Statistical inference of chromosomal homology based on gene colinearity and applications to
<italic>Arabidopsis</italic>
and rice</article-title>
.
<source>BMC Bioinformatics</source>
<volume>7</volume>
:
<fpage>447</fpage>
.
<pub-id pub-id-type="pmid">17038171</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B50">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<year>2012</year>
<article-title>MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>40</volume>
:
<fpage>e49</fpage>
.
<pub-id pub-id-type="pmid">22217600</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B51">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waters</surname>
<given-names>AJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2013</year>
<article-title>Comprehensive analysis of imprinted genes in maize reveals allelic variation for imprinting and limited conservation with other species</article-title>
.
<source>Proc Natl Acad Sci U S A.</source>
<volume>110</volume>
:
<fpage>19639</fpage>
<lpage>19644</lpage>
.
<pub-id pub-id-type="pmid">24218619</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B52">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wolfe</surname>
<given-names>KH</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Yesterday’s polyploids and the mystery of diploidization</article-title>
.
<source>Nat Rev Genet.</source>
<volume>2</volume>
:
<fpage>333</fpage>
<lpage>341</lpage>
.
<pub-id pub-id-type="pmid">11331899</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B53">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woodhouse</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Transposed genes in
<italic>Arabidopsis</italic>
are often associated with flanking repeats</article-title>
.
<source>PLoS Genet.</source>
<volume>6</volume>
:
<fpage>e1000949</fpage>
.
<pub-id pub-id-type="pmid">20485521</pub-id>
</mixed-citation>
</ref>
<ref id="evv219-B54">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woodhouse</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Freeling</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Different gene families in
<italic>Arabidopsis thaliana</italic>
transposed in different epochs and at different frequencies throughout the rosids</article-title>
.
<source>Plant Cell</source>
<volume>23</volume>
:
<fpage>4241</fpage>
<lpage>4253</lpage>
.
<pub-id pub-id-type="pmid">22180627</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000059  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000059  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024