Exploration server on telematics

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Comparative context analysis of codon pairs on an ORFeome scale

Internal identifier: 000258 (Pmc/Corpus); previous: 000257; next: 000259

Authors: Gabriela Moura ; Miguel Pinheiro ; Raquel Silva ; Isabel Miranda ; Vera Afreixo ; Gaspar Dias ; Adelaide Freitas ; José L. Oliveira ; Manuel As Santos

RBID: PMC:1088947

Abstract

We have developed a system for comparative codon context analysis of open reading frames in whole genomes, providing insights into the rules that govern the evolution of codon-pair context.

DOI: 10.1186/gb-2005-6-3-r28
PubMed: 15774029
PubMed Central: 1088947

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Comparative context analysis of codon pairs on an ORFeome scale</title>
<author>
<name sortKey="Moura, Gabriela" sort="Moura, Gabriela" uniqKey="Moura G" first="Gabriela" last="Moura">Gabriela Moura</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel" sort="Silva, Raquel" uniqKey="Silva R" first="Raquel" last="Silva">Raquel Silva</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miranda, Isabel" sort="Miranda, Isabel" uniqKey="Miranda I" first="Isabel" last="Miranda">Isabel Miranda</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Afreixo, Vera" sort="Afreixo, Vera" uniqKey="Afreixo V" first="Vera" last="Afreixo">Vera Afreixo</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dias, Gaspar" sort="Dias, Gaspar" uniqKey="Dias G" first="Gaspar" last="Dias">Gaspar Dias</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freitas, Adelaide" sort="Freitas, Adelaide" uniqKey="Freitas A" first="Adelaide" last="Freitas">Adelaide Freitas</name>
<affiliation>
<nlm:aff id="I3">Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L" last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel As" sort="Santos, Manuel As" uniqKey="Santos M" first="Manuel As" last="Santos">Manuel As Santos</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">15774029</idno>
<idno type="pmc">1088947</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1088947</idno>
<idno type="RBID">PMC:1088947</idno>
<idno type="doi">10.1186/gb-2005-6-3-r28</idno>
<date when="2005">2005</date>
<idno type="wicri:Area/Pmc/Corpus">000258</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000258</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Comparative context analysis of codon pairs on an ORFeome scale</title>
<author>
<name sortKey="Moura, Gabriela" sort="Moura, Gabriela" uniqKey="Moura G" first="Gabriela" last="Moura">Gabriela Moura</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel" sort="Silva, Raquel" uniqKey="Silva R" first="Raquel" last="Silva">Raquel Silva</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miranda, Isabel" sort="Miranda, Isabel" uniqKey="Miranda I" first="Isabel" last="Miranda">Isabel Miranda</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Afreixo, Vera" sort="Afreixo, Vera" uniqKey="Afreixo V" first="Vera" last="Afreixo">Vera Afreixo</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Dias, Gaspar" sort="Dias, Gaspar" uniqKey="Dias G" first="Gaspar" last="Dias">Gaspar Dias</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freitas, Adelaide" sort="Freitas, Adelaide" uniqKey="Freitas A" first="Adelaide" last="Freitas">Adelaide Freitas</name>
<affiliation>
<nlm:aff id="I3">Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L" last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="I2">Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel As" sort="Santos, Manuel As" uniqKey="Santos M" first="Manuel As" last="Santos">Manuel As Santos</name>
<affiliation>
<nlm:aff id="I1">Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genome Biology</title>
<idno type="ISSN">1465-6906</idno>
<idno type="eISSN">1465-6914</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>We have developed a system for comparative codon context analysis of open reading frames in whole genomes, providing insights into the rules that govern the evolution of codon-pair context.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genome Biol</journal-id>
<journal-title>Genome Biology</journal-title>
<issn pub-type="ppub">1465-6906</issn>
<issn pub-type="epub">1465-6914</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">15774029</article-id>
<article-id pub-id-type="pmc">1088947</article-id>
<article-id pub-id-type="publisher-id">gb-2005-6-3-r28</article-id>
<article-id pub-id-type="doi">10.1186/gb-2005-6-3-r28</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Method</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Comparative context analysis of codon pairs on an ORFeome scale</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>Moura</surname>
<given-names>Gabriela</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>gmoura@bio.ua.pt</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Pinheiro</surname>
<given-names>Miguel</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>monsanto@ieeta.pt</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Silva</surname>
<given-names>Raquel</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>rsilva@bio.ua.pt</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Miranda</surname>
<given-names>Isabel</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>imiranda@bio.ua.pt</email>
</contrib>
<contrib id="A5" contrib-type="author">
<name>
<surname>Afreixo</surname>
<given-names>Vera</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>vafreixo@mat.ua.pt</email>
</contrib>
<contrib id="A6" contrib-type="author">
<name>
<surname>Dias</surname>
<given-names>Gaspar</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>gaspar@ieeta.pt</email>
</contrib>
<contrib id="A7" contrib-type="author">
<name>
<surname>Freitas</surname>
<given-names>Adelaide</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>adelaide@mat.ua.pt</email>
</contrib>
<contrib id="A8" contrib-type="author">
<name>
<surname>Oliveira</surname>
<given-names>José L</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>jlo@ieeta.pt</email>
</contrib>
<contrib id="A9" corresp="yes" contrib-type="author">
<name>
<surname>Santos</surname>
<given-names>Manuel AS</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>msantos@bio.ua.pt</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Centre for Cell Biology, Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<aff id="I2">
<label>2</label>
Institute of Electronics and Telematics Engineering, University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<aff id="I3">
<label>3</label>
Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<pub-date pub-type="ppub">
<year>2005</year>
</pub-date>
<pub-date pub-type="epub">
<day>15</day>
<month>2</month>
<year>2005</year>
</pub-date>
<volume>6</volume>
<issue>3</issue>
<fpage>R28</fpage>
<lpage>R28</lpage>
<ext-link ext-link-type="uri" xlink:href="http://genomebiology.com/2005/6/3/R28"></ext-link>
<history>
<date date-type="received">
<day>24</day>
<month>9</month>
<year>2004</year>
</date>
<date date-type="rev-recd">
<day>25</day>
<month>11</month>
<year>2004</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>1</month>
<year>2005</year>
</date>
</history>
<copyright-statement>Copyright © 2005 Moura et al.; licensee BioMed Central Ltd.</copyright-statement>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
</license>
<abstract abstract-type="short">
<p>We have developed a system for comparative codon context analysis of open reading frames in whole genomes, providing insights into the rules that govern the evolution of codon-pair context.</p>
</abstract>
<abstract>
<p>Codon context is an important feature of gene primary structure that modulates mRNA decoding accuracy. We have developed an analytical software package and a graphical interface for comparative codon context analysis of all the open reading frames in a genome (the ORFeome). Using the complete ORFeome sequences of <italic>Saccharomyces cerevisiae</italic>, <italic>Schizosaccharomyces pombe</italic>, <italic>Candida albicans</italic> and <italic>Escherichia coli</italic>, we show that this methodology permits large-scale codon context comparisons and provides new insight into the rules that govern the evolution of codon-pair context.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The standard genetic code uses 64 codons for only 22 amino acids, including selenocysteine and pyrrolysine, whose incorporation into protein requires the reassignment of the UGA and UAG stop codons, respectively [<xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>]. This degeneracy of the genetic code has important implications for the evolution of gene primary structure, as it provides a vast array of options for building the open reading frame (ORF) sequence of any particular protein. However, synonymous codons are not used randomly when ORFs are built, which suggests that mechanistic or evolutionary constraints limit the degree of freedom in building coding sequences [<xref ref-type="bibr" rid="B3">3</xref>-<xref ref-type="bibr" rid="B6">6</xref>]. In other words, each organism uses a set of rules for building ORF sequences that restricts the total number of options provided by the degeneracy of the genetic code. These rules are only partly understood. Nevertheless, it is becoming increasingly clear that codon usage and context bias reflect the action of two main evolutionary forces: selection for mRNA decoding efficiency, and mutational drift acting indiscriminately on coding and noncoding DNA [<xref ref-type="bibr" rid="B7">7</xref>-<xref ref-type="bibr" rid="B10">10</xref>].</p>
<p>Codon usage reflects selection for translational efficiency, as highly expressed genes tend to use codons that are decoded by abundant cognate tRNAs [<xref ref-type="bibr" rid="B11">11</xref>-<xref ref-type="bibr" rid="B13">13</xref>]. Similarly, the context of a sequential pair of codons (codon-pair) is biased, but this bias appears to be linked more to decoding accuracy than to translational speed [<xref ref-type="bibr" rid="B14">14</xref>-<xref ref-type="bibr" rid="B17">17</xref>]. This suggests that the translational machinery is sensitive to the nature of the codon-pair present in the ribosomal A and P decoding sites [<xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B18">18</xref>-<xref ref-type="bibr" rid="B20">20</xref>], raising the possibility that, like codon usage, codon context may also be species specific. This possibility is supported by the fact that tRNA populations differ between species both in the number and abundance of tRNA isoacceptors for each codon family and in the pattern of modified nucleosides in the tRNAs, which also affects mRNA decoding accuracy.</p>
<p>To shed new light on the overall pattern of codon context at the species level and evaluate how codon-pair context varies between species, we have developed software and statistical methodologies for codon-pair context analysis of all the ORFs in a genome taken as a whole (the ORFeome). Because our main interest is the effect of codon context on mRNA decoding accuracy, this study focuses on the context of codon-pairs and not on long-range context effects, which, with a few exceptions, are not relevant for mRNA decoding by the ribosome. These new methodologies were tested using the complete ORFeome sequences of the eukaryotes <italic>Saccharomyces cerevisiae</italic>, <italic>Candida albicans</italic> and <italic>Schizosaccharomyces pombe</italic> and the bacterium <italic>Escherichia coli</italic>. The methodology provides robust and flexible tools for intra- and inter-ORFeome comparative codon-pair context analysis, permits identification of species-specific codon context fingerprints, and provides new insight into the role of codon context in mRNA decoding accuracy and, ultimately, into the pressure imposed by the translational machinery on the evolution of the ORFeome. The software, called Anaconda, is available at [<xref ref-type="bibr" rid="B21">21</xref>].</p>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Global analysis of codon context in yeast</title>
<p>The Anaconda bioinformatics system developed in this study identifies the start codon of an ORF and reads the ORF by moving a 'decoding window' three nucleotides at a time in the 3' direction until it encounters a stop codon. While doing so, it fixes the middle codon of the reading window and records its 5' and 3' neighbors. From these reads, Anaconda builds a 64 × 64 table of codon-pair frequencies that gives the number of times each contiguous codon pair occurs in an ORF or in an ORFeome. The overall architecture of Anaconda is described in Figure <xref ref-type="fig" rid="F1">1</xref>.</p>
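<p>The counting step described above can be illustrated with a minimal Python sketch that builds a 64 × 64 codon-pair count table from in-frame ORF sequences. This is not Anaconda's code: the names codon_pair_counts, CODONS, INDEX and STOPS are hypothetical, and the sketch assumes each input string begins at the start codon and contains only A, C, G and T/U. Pairs are read until a stop codon occupies the 5' position, so the final sense codon-stop codon pair is still counted.</p>
<preformat>
```python
from itertools import product

# Hypothetical names for illustration; not taken from Anaconda.
BASES = "UCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]  # the 64 codons
INDEX = {codon: i for i, codon in enumerate(CODONS)}      # codon -> row/column
STOPS = {"UAA", "UAG", "UGA"}


def codon_pair_counts(orfs):
    """Return a 64 x 64 count matrix: rows are 5' codons, columns are 3' codons."""
    counts = [[0] * 64 for _ in range(64)]
    for orf in orfs:
        seq = orf.upper().replace("T", "U")
        # Split the ORF into consecutive in-frame codons, starting at the start codon.
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        for left, right in zip(codons, codons[1:]):
            if left in STOPS:
                break  # stop reading once the 5' codon is a stop codon
            counts[INDEX[left]][INDEX[right]] += 1
    return counts
```
</preformat>
<p>For example, the ORF AUGGCUGCUUAA contributes one count each to the AUG-GCU, GCU-GCU and GCU-UAA cells. Summing the tables of all ORFs gives the ORFeome-wide table.</p>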
<p>The codon-pair context frequency table built by Anaconda makes it possible to apply contingency-table statistics to test whether context is significantly biased [<xref ref-type="bibr" rid="B22">22</xref>-<xref ref-type="bibr" rid="B25">25</xref>]. These tables are used to test for association between the 5' and 3' codons of each pair with the chi-square (χ<sup>2</sup>) test of independence; to identify preferred and rejected codon-pairs in the ORFeome through the analysis of adjusted residuals (Table <xref ref-type="table" rid="T1">1</xref> and Figure <xref ref-type="fig" rid="F2">2</xref>); and to construct a codon context map on an ORFeome scale (Figure <xref ref-type="fig" rid="F3">3</xref>). The Anaconda algorithm, its graphical interface and the implemented statistical methodologies were tested using the yeast <italic>S. cerevisiae</italic> ORFeome. For this, the complete ORFeome was downloaded from the yeast genome database [<xref ref-type="bibr" rid="B26">26</xref>], the adjusted residual values for the total number of codon pairs were calculated (see Materials and methods), and each residual value in a cell of the contingency table (64 rows × 64 columns) was converted into a two-color coded map (Figure <xref ref-type="fig" rid="F3">3</xref>). In this map, green represents positive values greater than +3 (herein called preferred codon-pairs) and red represents negative values lower than -3 (herein called rejected codon-pairs), according to the color scale indicated in Figure <xref ref-type="fig" rid="F3">3a</xref>. The data clearly show that each codon has a set of preferred 3' codon neighbors (green) and rejects a set of other codons (red), indicating that codon context is highly biased in <italic>S. cerevisiae</italic>. However, in a rather large number of cases, the 3' codon context is not biased, or at least is neither strongly rejected nor strongly preferred. This is indicated by the black color in the map (Figure <xref ref-type="fig" rid="F3">3</xref>) and in the histogram of the residual distribution (Figure <xref ref-type="fig" rid="F4">4</xref>). Black corresponds to residual values within the interval -3 to +3, that is, to codon contexts that do not contribute to the bias at a confidence level of 99.73% (Table <xref ref-type="table" rid="T1">1</xref> and Figure <xref ref-type="fig" rid="F2">2</xref>). The overall empirical distribution of residual values for codon context in the yeast ORFeome (Figure <xref ref-type="fig" rid="F4">4</xref>) shows that a large fraction (about 47%) of codon-pair contexts fall within the interval -3 to +3, indicating that in many cases the context may not be under strong selective constraint.</p>
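<p>The Materials and methods section is not part of this excerpt, so the following Python sketch assumes the standard adjusted (Haberman) residual for a contingency table, r_ij = (n_ij - e_ij) / sqrt(e_ij (1 - n_i./N)(1 - n_.j/N)) with e_ij = n_i. n_.j / N, alongside the Pearson χ<sup>2</sup> statistic, and labels cells with the ±3 cut-off used for the maps. It also shows that ±3 corresponds to the 99.73% two-sided coverage of a standard normal distribution. The function names are hypothetical.</p>
<preformat>
```python
import math

# Two-sided coverage of a standard normal within +/-3: about 0.9973 (99.73%).
TWO_SIDED_COVERAGE = math.erf(3 / math.sqrt(2))


def adjusted_residuals(counts):
    """Adjusted residuals and Pearson chi-square statistic for an r x c table."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(counts):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            if var > 0:
                out.append((observed - expected) / math.sqrt(var))
                chi2 += (observed - expected) ** 2 / expected
            else:
                # Empty (or all-encompassing) row/column, e.g. stop codons as 5' codons.
                out.append(0.0)
        residuals.append(out)
    return residuals, chi2


def classify(r, cutoff=3.0):
    """Map colour class: preferred (above +cutoff), rejected (below -cutoff), else neutral."""
    if r > cutoff:
        return "preferred"
    if -cutoff > r:
        return "rejected"
    return "neutral"
```
</preformat>
<p>Applied to a 64 × 64 table such as the one produced by the counting step, the residual matrix can be rendered directly as the green/black/red map, and the fraction of "neutral" cells corresponds to the proportion of contexts falling within -3 to +3.</p>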
</sec>
<sec>
<title>Codon clustering unveils unique features of codon context</title>
<p>The codon-pair context maps shown in Figure <xref ref-type="fig" rid="F3">3a,b</xref> were built using a manually predefined order of codons for both rows and columns. To better understand the full extent of the codon-pair context bias in yeast, the data were clustered using Pearson's correlation coefficient [<xref ref-type="bibr" rid="B27">27</xref>], which groups codons with similar context preferences. Using double clustering (that is, clustering both rows and columns), several distinct groups of red and green codon-pair contexts were identified in the <italic>S. cerevisiae</italic> ORFeome, showing that certain groups of codons have similar 3'-neighbor preferences (Figure <xref ref-type="fig" rid="F5">5</xref>).</p>
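<p>Clustering codons by Pearson's correlation can be sketched as average-linkage agglomerative clustering with distance 1 - r between codon residual profiles. The linkage method and the number of clusters are assumptions here; the text does not state which were used, and the function names are hypothetical. Double clustering applies the same procedure to the rows and to the columns (the transposed matrix).</p>
<preformat>
```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_rows(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles: dict mapping a codon to its row of residuals (its 3'-neighbor profile).
    Returns a sorted list of clusters, each a sorted list of codons.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or best[0] > d:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return sorted(sorted(c) for c in clusters)
```
</preformat>
<p>For column clustering, the profiles would be the columns of the residual matrix, for example obtained with zip(*matrix). A production implementation would more likely use a library routine; this version only makes the grouping criterion explicit.</p>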
<p>To identify the codons responsible for defining the highly biased subgroups (red and green clusters) and evaluate whether these could define codon-pair context rules, we zoomed in on the context subclusters. Three specific subclusters (one red and two green) were analyzed in this study (Figure <xref ref-type="fig" rid="F6">6a-c</xref>). The red subcluster shown in Figure <xref ref-type="fig" rid="F6">6a</xref> is defined by codon-pairs in which the last nucleotide of the first codon is uridine (U) and the first nucleotide of the next (3') codon is adenosine (A). As no such rule was observed for the other codon positions, that is, positions 1 and 2 or 2 and 3 of codon 1, or positions 1 and 2 or 2 and 3 of codon 2 (data not shown), the codons are clustered according to the following context rejection rule: XXU-AYY. The intensity of rejection (given by the adjusted residual itself) is not identical for all codon combinations within the subcluster. However, with the exception of the asparagine AAU and serine AGU codons, and of some others whose residual values fall within the statistically non-significant -3 to +3 interval, all other U-ending codons avoid 3'-neighbor codons starting with an A. If one assumes that fixed codons in the map (rows) represent P-site codons and 3' codons (columns) represent A-site codons, then this rule suggests that the third base of a P-site codon somehow influences the choice of the first base of the A-site codon. In other words, and assuming that context modulates decoding accuracy, <italic>S. cerevisiae</italic> codon pairs in which the first codon ends with U and the second starts with A are likely to be problematic for the ribosome during decoding.</p>
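<p>A rule such as XXU-AYY can be checked against a residual table with a short Python sketch: select every codon pair that fits the pattern (X and Y are wildcards for any base) and report the fraction that are rejected (or, for the green rules, preferred). The names matches and rule_support, and the 'AAA-CCC' label format for pairs, are hypothetical conventions for this illustration.</p>
<preformat>
```python
def matches(pair, pattern):
    """True if a codon pair such as 'GCU-AAA' fits a pattern such as 'XXU-AYY'.

    X and Y are wildcards (any base); every other character must match exactly.
    """
    return len(pair) == len(pattern) and all(
        p in "XY" or p == c for c, p in zip(pair, pattern)
    )


def rule_support(residuals, pattern, rejected=True, cutoff=3.0):
    """Fraction of pairs fitting `pattern` that are rejected (or preferred).

    residuals: dict mapping 'AAA-CCC'-style pair labels to adjusted residuals.
    """
    hits = [r for pair, r in residuals.items() if matches(pair, pattern)]
    if not hits:
        return 0.0
    if rejected:
        agree = sum(1 for r in hits if -cutoff > r)  # residual below -cutoff
    else:
        agree = sum(1 for r in hits if r > cutoff)   # residual above +cutoff
    return agree / len(hits)
```
</preformat>
<p>A support value well below 1 would reflect exceptions such as the AAU and AGU codons noted above, or pairs whose residuals fall in the -3 to +3 interval.</p>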
<p>These observations were confirmed by analyzing two green codon-pair context subclusters (good contexts). In these cases, two different clustering rules were identified, namely XXC-AYY and XXU-GYY (Figure <xref ref-type="fig" rid="F6">6b,c</xref>). As in the bad-context subcluster discussed above, these good-context subclusters contain exceptions, including red and black context cells. Nevertheless, there is a strong trend for each rule within its subcluster, indicating once more that the third base of the P-site codon influences the first base of the A-site codon. The fact that these rules are not seen for other codon positions, and that there are exceptions to them for other codon families in the overall map, excludes the possibility that the third-first base rules identified reflect dinucleotide preferences or rejections arising from DNA replication and repair ([<xref ref-type="bibr" rid="B28">28</xref>] and see later).</p>
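<p>The comparison between the third-first base junction and other codon positions can be summarized numerically by grouping residuals according to one base of the 5' codon and one base of the 3' codon. The Python sketch below is a hypothetical aid, not the method used in the text (which inspected clustered subgroups); a plain mean also ignores differences in how many pairs fall into each group.</p>
<preformat>
```python
from collections import defaultdict


def position_means(residuals, i=2, j=0):
    """Mean adjusted residual grouped by base i of the 5' codon and base j of the 3' codon.

    residuals: dict mapping 'XXX-YYY' codon-pair labels to adjusted residuals.
    The defaults (i=2, j=0) give the third-first base junction, e.g. ('U', 'A').
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pair, r in residuals.items():
        first, second = pair.split("-")
        key = (first[i], second[j])
        sums[key] += r
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```
</preformat>
<p>A strongly negative mean for ('U', 'A') at the junction, with no comparable signal for other (i, j) position combinations, would be consistent with the XXU-AYY rejection rule.</p>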
</sec>
<sec>
<title>Comparative codon context analysis</title>
<p>Because the <italic>S. cerevisiae</italic> codon-pair context map produced a clear context pattern, we wondered whether this map could represent a species-specific fingerprint, as is the case for codon usage. To test this, maps for <italic>S. pombe</italic>, <italic>C. albicans</italic> and <italic>E. coli</italic> were also constructed, with the latter used as an outgroup. Some similarities between the codon-pair context maps were immediately visible, namely a strong green diagonal line in the yeast maps (Figure <xref ref-type="fig" rid="F7">7</xref>). There are, however, important differences that become evident when the negative and positive residual values are ranked for the yeast species studied (Table <xref ref-type="table" rid="T2">2</xref>). These values represent the most negative and most positive residuals of the yeast maps and consequently give a good indication of the differences in codon context among the three yeast species. Of the 10 most positive residual values ranked in Table <xref ref-type="table" rid="T2">2</xref>, only three are common to the three yeast species, namely GAA-GAA, GGU-GGU and GCU-GCU. A similar result was obtained when the most negative values were ranked (Table <xref ref-type="table" rid="T2">2</xref>). In addition, the <italic>C. albicans</italic> genome shows a more strongly biased codon-pair context. For example, its 10th most positive residual (49.476 for ACA-ACA) is higher than the maximum residual value for <italic>S. cerevisiae</italic> and <italic>S. pombe</italic>: 45.422 for CAG-CAG and 35.086 for UCU-UCU, respectively (Table <xref ref-type="table" rid="T2">2</xref>).</p>
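<p>The ranking used for this comparison can be sketched in Python: sort each species' residuals, keep the top (or bottom) k codon pairs, and intersect the lists across species. The function names are hypothetical, and the usage below relies on toy values, not on the data of Table 2.</p>
<preformat>
```python
def top_pairs(residuals, k=10, most_positive=True):
    """The k codon pairs with the highest (or, if most_positive is False, lowest) residuals."""
    ranked = sorted(residuals, key=residuals.get, reverse=most_positive)
    return ranked[:k]


def shared_top_pairs(per_species, k=10, most_positive=True):
    """Codon pairs present in the top-k list of every species.

    per_species: dict mapping a species name to its {pair label: residual} dict.
    """
    sets = [set(top_pairs(res, k, most_positive)) for res in per_species.values()]
    return set.intersection(*sets) if sets else set()


# Toy values only, for illustration.
sp1 = {"GAA-GAA": 9.0, "CAG-CAG": 8.0, "UCU-UCU": 1.0}
sp2 = {"GAA-GAA": 7.0, "CAG-CAG": 2.0, "UCU-UCU": 8.0}
print(shared_top_pairs({"sp1": sp1, "sp2": sp2}, k=2))  # {'GAA-GAA'}
```
</preformat>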
<p>A further approach to identifying codon-pair context differences between <italic>S. cerevisiae</italic>, <italic>S. pombe</italic> and <italic>C. albicans</italic> was to overlap the complete codon context maps displayed in Figure <xref ref-type="fig" rid="F7">7</xref>. For this, maps built with the same predefined order of codons for both the 64 rows and the 64 columns were merged to construct a comparative codon-pair context map. We call this a differential codon-pair context map (DCM); each of its cells holds the absolute value (modulus) of the difference between the residuals of the overlapped cells of the two 64 × 64 context tables (Figure <xref ref-type="fig" rid="F8">8</xref>). A new color scale based on gradations of blue was used for the differential display. Using this methodology, the codon context differences between the yeast species became evident, indicating that codon context, like codon usage, is species specific (Figure <xref ref-type="fig" rid="F8">8</xref>). All three DCMs shown in Figure <xref ref-type="fig" rid="F8">8</xref> share common features, indicated by the black cells; however, the differences (blue) are clearly visible. As expected from the phylogenetic distances between the species studied, the DCMs for the pairs <italic>E. coli</italic>-<italic>S. cerevisiae</italic> and <italic>E. coli</italic>-<italic>C. albicans</italic> show many more differences than the DCM for the pair <italic>S. cerevisiae</italic>-<italic>C. albicans</italic>.</p>
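<p>Following the definition above, a DCM is the cell-by-cell absolute difference between two residual maps built with the same codon order, and ranking its cells gives comparisons such as those in Table 3. A minimal Python sketch, with hypothetical function names:</p>
<preformat>
```python
def differential_map(res_a, res_b):
    """DCM: |residual_A - residual_B| for every cell of two same-ordered maps."""
    if len(res_a) != len(res_b) or any(len(a) != len(b) for a, b in zip(res_a, res_b)):
        raise ValueError("maps must have the same shape and codon order")
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(res_a, res_b)]


def rank_differences(dcm, labels, k=10):
    """Top-k DCM cells as (5' codon, 3' codon, difference), largest first."""
    cells = [(labels[i], labels[j], v)
             for i, row in enumerate(dcm) for j, v in enumerate(row)]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]
```
</preformat>
<p>The same function applies to the low- and high-GC3 subgroup maps discussed later: if both maps were identical, every DCM cell would be 0 and the display would be entirely black.</p>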
<p>The DCMs also show that codon-pair context is more similar for the pair <italic>S. pombe</italic>-<italic>S. cerevisiae</italic> (data not shown) than for the other two yeast pairs, indicating that there are fewer differences between <italic>S. pombe</italic> and <italic>S. cerevisiae</italic> than between <italic>C. albicans</italic> and <italic>S. cerevisiae</italic>. This is surprising, considering that <italic>S. pombe</italic> diverged from <italic>S. cerevisiae</italic> 420 million years ago, whereas <italic>C. albicans</italic> diverged from the latter only 170 million years ago [<xref ref-type="bibr" rid="B29">29</xref>]. The effect of the rather strong green diagonal (codon repeats) in the <italic>C. albicans</italic> maps is also visible in the DCM (blue cells) of the <italic>C. albicans</italic>-<italic>S. cerevisiae</italic> pair (Figure <xref ref-type="fig" rid="F8">8a</xref>). To shed more light on the differences between the codon context maps of the three yeasts, codon pairs were ranked according to the absolute difference between residuals (Table <xref ref-type="table" rid="T3">3</xref>). Surprisingly, only one codon pair (CAA-CAA) is present among the 10 highest-ranked values for all three yeast-pair comparisons. Further, the differences between these three species are not only qualitative, as shown above, but also quantitative. For example, for the <italic>S. pombe</italic>-<italic>S. cerevisiae</italic> pair, the highest difference was found for the pair CAG-CAG, with a value of 27.798, whereas in the <italic>S. pombe</italic>-<italic>C. albicans</italic> map the CAA-CAA pair showed a difference of 100.639. In fact, all 10 values listed for the latter yeast pair are higher than the highest value (27.798) found for CAG-CAG in the <italic>S. pombe</italic>-<italic>S. cerevisiae</italic> map (Table <xref ref-type="table" rid="T3">3</xref>). Therefore, taken together, DCMs and residual rankings provide unique insight into codon-pair context differences, even between phylogenetically related species such as yeasts.</p>
</sec>
<sec>
<title>Contribution of mutation bias to codon-pair context</title>
<p>An important feature of the codon-pair context maps of the yeasts analyzed, but not of <italic>E. coli</italic>, is the presence of a green diagonal line (Figures <xref ref-type="fig" rid="F3">3</xref>, <xref ref-type="fig" rid="F7">7</xref>). This line implies that in those yeasts most codons prefer an identical codon on their 3' side, indicating a degree of tandem codon duplication in yeast ORFeomes. Trinucleotide repeats are characteristic of eukaryotic genomes and have been attributed to DNA polymerase slippage during genome replication [<xref ref-type="bibr" rid="B30">30</xref>]. Whether the codon duplication observed in the ORFeomes of the yeasts analyzed is a consequence of DNA replication only, or also reflects an evolutionary constraint imposed by the mRNA decoding machinery on those ORFeomes, is not yet clear and we are currently investigating this. In any case, this diagonal line is a strong feature of the yeast codon context maps, since the highest residuals (preferred pairs) occur for tandem codon repeats (Table <xref ref-type="table" rid="T2">2</xref>).</p>
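<p>The diagonal of a residual map holds the tandem codon repeats (GAA-GAA, CAG-CAG and so on), so its strength can be quantified directly. This Python sketch, with a hypothetical function name, reports the fraction of diagonal cells above the +3 cut-off and lists the diagonal cells from highest to lowest residual.</p>
<preformat>
```python
def diagonal_summary(residuals, labels, cutoff=3.0):
    """Summarize the diagonal (identical-codon pairs) of a square residual map.

    Returns (fraction of diagonal cells above +cutoff, diagonal cells sorted
    from highest to lowest residual as ('XXX-XXX', residual) tuples).
    """
    diag = [(labels[i] + "-" + labels[i], residuals[i][i]) for i in range(len(labels))]
    preferred = sum(1 for _, r in diag if r > cutoff)
    return preferred / len(diag), sorted(diag, key=lambda item: item[1], reverse=True)
```
</preformat>
<p>On a yeast map this fraction would be expected to be high, whereas the absence of a green diagonal in the <italic>E. coli</italic> map would show up as a low value.</p>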
<p>The above observations prompted us to investigate whether mutational bias also played a part in codon-pair context bias and whether such bias could be extracted from the codon-pair context maps. For this, particular attention was given to GC content because it plays a major role in codon usage [
<xref ref-type="bibr" rid="B31">31</xref>
]. An algorithm was implemented in Anaconda to calculate total %GC, %GC at codon position 1 (GC1), %GC at codon position 2 (GC2) and %GC at codon position 3 (GC3). While scanning an ORFeome, Anaconda divides ORFs into subgroups of high and low GC content. It also determines the distribution of ORFs according to their total GC and GC3 (Figure
<xref ref-type="fig" rid="F9">9a,c</xref>
). Codon-pair context maps can be built for each subgroup of ORFs and the maps compared using the DCM tool (Figures
<xref ref-type="fig" rid="F9">9b,d</xref>
and
<xref ref-type="fig" rid="F10">10</xref>
).</p>
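<p>The GC measures listed above can be sketched in Python as follows. This is a minimal sketch, not Anaconda's implementation: it assumes in-frame DNA strings, ignores a trailing partial codon, and uses two hypothetical GC3 cut-offs (low_max and high_min, in percent) to form the low- and high-GC3 subgroups.</p>

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C in a nucleotide string (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_profile(orf: str) -> dict:
    """Percent GC over the whole ORF and at codon positions 1, 2 and 3."""
    orf = orf.upper()
    usable = orf[: len(orf) - len(orf) % 3]  # ignore a trailing partial codon
    return {
        "GC": 100 * gc_fraction(usable),
        "GC1": 100 * gc_fraction(usable[0::3]),
        "GC2": 100 * gc_fraction(usable[1::3]),
        "GC3": 100 * gc_fraction(usable[2::3]),
    }


def split_by_gc3(orfs: dict, low_max: float, high_min: float) -> tuple:
    """Partition ORFs (name -> sequence) into low- and high-GC3 subgroups.

    low_max and high_min are hypothetical cut-offs (percent GC3); ORFs whose
    GC3 lies strictly between them are assigned to neither subgroup.
    """
    low, high = {}, {}
    for name, seq in orfs.items():
        gc3 = gc_profile(seq)["GC3"]
        if gc3 >= high_min:
            high[name] = seq
        elif low_max >= gc3:
            low[name] = seq
    return low, high
```

<p>For the ORF ATG-GCC-TAA, for instance, the third positions are G, C and A, so gc_profile reports a GC3 of about 66.7%.</p>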
<p>Because GC bias is better observed at the third codon position as a result of the degeneracy of the genetic code, GC3 was used to evaluate whether mutational bias contributed to the codon-pair context using the
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
ORFeomes as proof of principle. In the former, the GC3 of ORFs ranged from a minimum of 11.9% to a maximum of 76.7%; however, most ORFs fell within a narrow interval of 35-40% GC3 (Figure
<xref ref-type="fig" rid="F9">9a</xref>
). In the case of
<italic>E. coli</italic>
, the GC3 distribution is broader, ranging from a minimum of 20.0% to a maximum of 89.4%, but most ORFs have a GC3 between 50% and 60% (Figure
<xref ref-type="fig" rid="F9">9c</xref>
). This distribution made it possible to build codon-pair context maps for the low-GC3 and high-GC3 subgroups. Because differences between these two maps were expected to reveal the importance of the bias introduced by mutational drift into the codon-pair context maps, the maps were overlapped using the DCM tool. As before, the maps were built using a single colour (blue) to aid visualization of the context differences. If mutational drift did not contribute to the context bias, the codon-pair context maps of the two GC3 subgroups would be identical and would produce a black differential display map, because the difference between the absolute values of the residuals would be zero for every cell of the table of residuals.</p>
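<p>Assuming, as the wording above suggests, that each DCM cell holds the difference between the absolute values of the corresponding residuals in the two maps, the computation reduces to the Python sketch below (the function name is hypothetical).</p>

```python
def differential_map(res_a: list, res_b: list) -> list:
    """Cell-wise |res_a| - |res_b| for two residual matrices of the same shape.

    Two identical maps give an all-zero matrix, i.e. a black differential
    display map.
    """
    if len(res_a) != len(res_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(res_a, res_b)
    ):
        raise ValueError("residual matrices must have the same shape")
    return [
        [abs(x) - abs(y) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(res_a, res_b)
    ]
```

<p>Because |a| - |b| discards the sign of each residual, a codon pair whose residual flips sign with unchanged magnitude would also score zero in this sketch; sign changes therefore need a separate check.</p>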
<p>The differential display map for the low and high GC3 ORF subgroups of
<italic>S. cerevisiae </italic>
showed several differences, indicating that GC bias contributes to the codon-pair context. However, most of these differences corresponded to small deviations in the strength of the rejection or preference of the codon-pair contexts (Figure
<xref ref-type="fig" rid="F9">9b</xref>
and
<xref ref-type="fig" rid="F10">10</xref>
, see also Table
<xref ref-type="table" rid="T4">4</xref>
). In other words, the residuals had the same sign (positive or negative) in both subgroups, but their magnitude was higher in one GC3 subgroup than in the other. In some cases, an inversion of the sign of the residual was detected, indicating that the residual of the codon pair was positive in one GC3 subgroup and negative in the other (light blue in Figure
<xref ref-type="fig" rid="F9">9b</xref>
). This sign inversion provides clear evidence for the influence of GC content bias on codon-pair context. Similar results were obtained for the
<italic>E. coli </italic>
ORFeome; however, a much larger number of sign inversions of the residuals was observed in this case, indicating that GC content bias is far stronger in
<italic>E. coli </italic>
than in
<italic>S. cerevisiae </italic>
(Figures
<xref ref-type="fig" rid="F9">9d</xref>
and
<xref ref-type="fig" rid="F10">10</xref>
, see also Table
<xref ref-type="table" rid="T4">4</xref>
). The reasons for these differences and the quantitative contribution of mutational bias to codon-pair context bias are not yet fully understood and are currently being investigated. However, Anaconda already provides strong evidence for a role of mutational bias in codon-pair context.</p>
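<p>The distinction drawn above between same-sign differences in magnitude and sign inversions can be made explicit cell by cell. The Python sketch below is illustrative only: the labels and function names are hypothetical, and it compares raw residuals without applying the |d| ≥ 3 significance cut-off.</p>

```python
from collections import Counter


def classify_sign_changes(res_a: list, res_b: list) -> list:
    """Label each cell of two equally shaped residual matrices.

    'same_sign' - both residuals positive or both negative (only the
                  magnitude can differ);
    'inversion' - one residual positive and the other negative;
    'zero'      - at least one residual is exactly zero.
    """
    labels = []
    for row_a, row_b in zip(res_a, res_b):
        row = []
        for x, y in zip(row_a, row_b):
            if x == 0 or y == 0:
                row.append("zero")
            elif (x > 0) == (y > 0):
                row.append("same_sign")
            else:
                row.append("inversion")
        labels.append(row)
    return labels


def count_labels(labels: list) -> Counter:
    """Tally the labels over the whole matrix."""
    return Counter(label for row in labels for label in row)
```

<p>Comparing sign flags directly, rather than testing the product of the two residuals, avoids misclassifying very small residuals whose product underflows to zero.</p>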
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>Codon context has been extensively studied in prokaryotic, eukaryotic, mitochondrial and viral genomes, and these studies unequivocally showed that codon-pair context is biased [
<xref ref-type="bibr" rid="B9">9</xref>
,
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
-
<xref ref-type="bibr" rid="B35">35</xref>
]. However, no tool had yet been developed to display codon context data, in particular codon-pair (short-range) context, in a way that facilitates interpretation of the data and allows inter- or intra-genome context comparisons. Such a tool is essential if the putative general rules that govern codon-pair context evolution are to be unraveled. The Anaconda bioinformatics system was developed to address this problem. Using statistical methods based on contingency tables and residual analysis (see Materials and methods), specific codon-pair context patterns were unveiled and displayed as a color-coded ORFeome context map. The data highlighted codon-pair context bias in yeasts and
<italic>E. coli </italic>
and some rules that define codon-pair context patterns in yeast.</p>
<sec>
<title>Forces that shape codon-pair context</title>
<p>Studies carried out in the 1980s in
<italic>E. coli </italic>
have demonstrated that codon-pair context influences mRNA decoding accuracy and efficiency, indicating that the translational machinery imposes significant constraints on codon-pair context [
<xref ref-type="bibr" rid="B17">17</xref>
,
<xref ref-type="bibr" rid="B36">36</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
]. For example, in starved
<italic>E. coli </italic>
cells, the asparagine AAU and AAC codons are misread as lysine at high frequency [
<xref ref-type="bibr" rid="B16">16</xref>
]. Quantification of the level of lysine misincorporation at those codons and determination of the effect of the 3' nucleotide context on lysine misincorporation showed that the AAU codon is misread up to nine times more frequently than the AAC codon, and that the 3' nucleotide context (III-I context) influenced the level of misreading by as much as twofold [
<xref ref-type="bibr" rid="B16">16</xref>
]. Additional studies carried out
<italic>in vitro </italic>
in
<italic>E. coli</italic>
have also shown that ribosomes discriminate the C-ending Phe UUC and Leu CUC codons less well than the U-ending Phe UUU and Leu CUU codons, indicating that synonymous codons differ in translational accuracy [
<xref ref-type="bibr" rid="B38">38</xref>
]. Therefore, a possible role for codon-pair context is minimization of decoding error, in particular in those codons that are poorly discriminated by the ribosome.</p>
<p>In
<italic>E. coli</italic>
, over-represented codon-pairs are translated more slowly than under-represented codon-pairs, indicating that codon-pair context also influences translational speed [
<xref ref-type="bibr" rid="B14">14</xref>
]. This suggests that codon-pair context in
<italic>E. coli </italic>
is under strong selective constraints imposed by the translational machinery. Whether the context patterns now unveiled in yeast reflect similar selective constraints remains unclear. Nevertheless, the codon-pair context maps described here provide a good starting point to address this important biological question
<italic>in vivo </italic>
in yeast in a guided manner. Additional evidence for a role for selection on codon-pair context was highlighted by the negligible, or even zero, contribution of GC3 to the context bias in very frequent or very infrequent codon-pairs (strong contexts) in both
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
(Figure
<xref ref-type="fig" rid="F9">9</xref>
, Table
<xref ref-type="table" rid="T4">4</xref>
) and by a number of exceptions to the context rules that define the subclusters of codon-pairs (Figure
<xref ref-type="fig" rid="F6">6</xref>
). For example, within the XXU-AYY subcluster of rejected codon pairs (Figure
<xref ref-type="fig" rid="F6">6a</xref>
), the codon pairs AAU-AGC, AAU-AGU, AAU-AAU, AAU-AAC and the set of AGU-AGC, AGU-AGU, AGU-AAU, AGU-ACA, AGU-AUA have positive residuals, indicating that they are codon pairs preferred by the ORFeome. Similar exceptions are found within the subclusters of preferred codon pairs shown (Figure
<xref ref-type="fig" rid="F6">6b,c</xref>
). Furthermore, a detailed analysis of the overall ORFeome context map (Figure
<xref ref-type="fig" rid="F5">5</xref>
) shows that other codon-pairs violate the XXU-AYY rules, namely GGU-AUG, GGU-AUC, GGU-AUU, GGU-ACC, GGU-ACU. This supports the hypothesis that those clusters of the context map are not formed on the basis of particular dinucleotide combinations that may be related to genome mutational drift. This is further confirmed by our observation that the dinucleotide preference in the XXU-AYY, XXC-AYY and XXU-GYY codon pairs is not observed when the various positions within each codon or codon-pair are analyzed. In other words, in the codon pair X<sub>1</sub>X<sub>2</sub>X<sub>3</sub>-Y<sub>1</sub>Y<sub>2</sub>Y<sub>3</sub>, the X<sub>3</sub>-Y<sub>1</sub> preferences are not observed for the dinucleotides X<sub>1</sub>-X<sub>2</sub>, X<sub>2</sub>-X<sub>3</sub>, Y<sub>1</sub>-Y<sub>2</sub> and Y<sub>2</sub>-Y<sub>3</sub> (data not shown).</p>
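<p>The positional comparison described above can be sketched in Python: each six-nucleotide codon pair X<sub>1</sub>X<sub>2</sub>X<sub>3</sub>-Y<sub>1</sub>Y<sub>2</sub>Y<sub>3</sub> is split into its five adjacent dinucleotides, which are then tallied separately by position, so that, for example, the U-A junction of an XXU-AYY pair is counted only in the X<sub>3</sub>-Y<sub>1</sub> class. The function names are hypothetical, and because consecutive codon pairs overlap, internal codons are counted both as X and as Y.</p>

```python
from collections import Counter, defaultdict

POSITIONS = ["X1-X2", "X2-X3", "X3-Y1", "Y1-Y2", "Y2-Y3"]


def positional_dinucleotides(pair: str) -> dict:
    """Split a codon pair X1X2X3-Y1Y2Y3 (six bases) into its five
    adjacent dinucleotides, keyed by position."""
    if len(pair) != 6:
        raise ValueError("a codon pair must contain exactly six bases")
    return {label: pair[i:i + 2] for i, label in enumerate(POSITIONS)}


def tally_by_position(orf: str) -> dict:
    """Count dinucleotides separately for each position class over one ORF.

    Consecutive codon pairs overlap, so internal codons contribute to the
    X1-X2/X2-X3 classes (as X) and to the Y1-Y2/Y2-Y3 classes (as Y).
    """
    orf = orf.upper()
    cs = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    tallies = defaultdict(Counter)
    for first, second in zip(cs, cs[1:]):
        for label, dinuc in positional_dinucleotides(first + second).items():
            tallies[label][dinuc] += 1
    return dict(tallies)
```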
<p>Despite these arguments, mutational bias does influence codon-pair context [
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B39">39</xref>
-
<xref ref-type="bibr" rid="B41">41</xref>
]. Observed mutational bias reflects mutational events that act indiscriminately on all DNA sequences (coding and noncoding DNA) and is consequently a property of the genome rather than the result of selection acting within ORFs [
<xref ref-type="bibr" rid="B42">42</xref>
-
<xref ref-type="bibr" rid="B45">45</xref>
]. The data presented here are in line with those observations. For example, context maps shown in this study indicate that several of the context clusters are formed on the basis of dinucleotide context rules (III-I rule), namely the XXU-AYY, XXC-AYY, XXU-GYY (Figure
<xref ref-type="fig" rid="F6">6a-c</xref>
). As dinucleotide context is related to DNA repair and replication constraints, those clusters reflect mutational bias [
<xref ref-type="bibr" rid="B28">28</xref>
]. An important feature that highlights the influence of mutational bias on codon-pair context is GC content, in particular GC3 content. GC content has a strong influence in codon usage and in extreme cases can even drive certain codons out of ORFeomes [
<xref ref-type="bibr" rid="B46">46</xref>
,
<xref ref-type="bibr" rid="B47">47</xref>
]. The data presented here clearly show that GC3 affects codon-pair context; however, this effect is mainly visible for codon-pairs that have weak residuals (Table
<xref ref-type="table" rid="T4">4</xref>
, Figure
<xref ref-type="fig" rid="F9">9</xref>
). As strong residuals (either positive or negative) provide an indirect measure of the strength of the codon-pair association, it is likely that for extreme residuals GC3 bias introduces only noise into the analysis, whereas for residuals near the statistically nonsignificant interval (-3, +3), GC3 bias represents a major contribution to the context bias observed (Figure
<xref ref-type="fig" rid="F9">9</xref>
).</p>
<p>Apart from the cases mentioned above, other species-specific genomic features also contribute to the codon-pair context bias highlighted by Anaconda. For example, the yeast codon-pair context maps show a feature of eukaryotic genomes that is not related to mRNA translation: trinucleotide repeats, which are evident as the diagonal line in Figures
<xref ref-type="fig" rid="F3">3</xref>
and
<xref ref-type="fig" rid="F7">7</xref>
. This strongly suggests a very high frequency of tandem codon repeats (trinucleotide repeats), which are likely to arise from biased DNA replication (DNA polymerase slippage, see [
<xref ref-type="bibr" rid="B30">30</xref>
]). Whether these repeated codon-pairs improve mRNA translation efficiency or accuracy in yeast remains to be determined experimentally. As far as we are aware, there is no experimental evidence showing increased decoding accuracy or efficiency at those sites.</p>
<p>Finally, constraints imposed by protein sequences and mRNA secondary structure are also thought to influence codon context [
<xref ref-type="bibr" rid="B48">48</xref>
,
<xref ref-type="bibr" rid="B49">49</xref>
]. The context maps seem to exclude the former hypothesis because no cluster is formed as a result of selection or rejection of two adjacent amino acids. Regarding the latter constraint, the Anaconda algorithm was not designed to detect mRNA secondary structures, and consequently this question cannot be addressed at this stage.</p>
</sec>
</sec>
<sec>
<title>Conclusions</title>
<p>The Anaconda algorithm was developed with the aim of studying codon-pair context on an ORFeome scale, defining the rules that govern codon-pair context, carrying out large-scale interspecies codon-pair context comparisons and clarifying the effects of selection and mutational drift on codon-pair context. The results provide important new insight into the role of codon-pair context in mRNA decoding accuracy and efficiency, and we expect that they will allow the development of reporter genes for
<italic>in vivo </italic>
and
<italic>in vitro </italic>
quantification of codon-decoding error and translational speed. Finally, Anaconda will be a valuable tool to redesign ORFs for efficient and accurate heterologous or homologous protein expression in yeast and, eventually, in other suitable host systems.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title>Statistics</title>
<p>To study the association between contiguous codon pairs, the coding sequences analyzed by Anaconda are summarized in a 64 × 64 contingency table whose cells are mutually exclusive categories. For 3' context analysis, the rows of the table correspond to the codons in the ribosomal P-site and the columns to the codons in the A-site. For 5' context analysis the roles are inverted, so the contingency table is the transpose of the one used for 3' analysis.</p>
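<p>A minimal Python sketch of how such a table can be filled is shown below. It assumes in-frame DNA strings, reads consecutive codons as P-site/A-site pairs, skips codons containing ambiguous bases, and obtains the 5' table by transposition; the names are hypothetical and not Anaconda's API.</p>

```python
from itertools import product

# All 64 codons in DNA notation, in a fixed order used as table indices.
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def context_table(orfs, context: str = "3prime") -> list:
    """Build a 64 x 64 table of adjacent codon-pair counts.

    For context="3prime", rows index the codon in the P-site and columns the
    next codon (A-site); for context="5prime" the table is transposed.
    """
    if context not in ("3prime", "5prime"):
        raise ValueError("context must be '3prime' or '5prime'")
    table = [[0] * 64 for _ in range(64)]
    for orf in orfs:
        orf = orf.upper()
        cs = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
        for p_site, a_site in zip(cs, cs[1:]):
            # Skip pairs containing ambiguous bases (e.g. N), which have no index.
            if p_site in INDEX and a_site in INDEX:
                table[INDEX[p_site]][INDEX[a_site]] += 1
    if context == "5prime":
        table = [list(column) for column in zip(*table)]
    return table
```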
<p>A number of different mathematical methodologies have already been used to study codon context bias (for example [
<xref ref-type="bibr" rid="B9">9</xref>
,
<xref ref-type="bibr" rid="B50">50</xref>
-
<xref ref-type="bibr" rid="B52">52</xref>
]). In this study, the analysis of contingency tables and residuals (Figure
<xref ref-type="fig" rid="F3">3</xref>
) was considered appropriate, assuming a multinomial probabilistic model for the contingency table (a detailed discussion of this model in the context of genomic data can be found in [
<xref ref-type="bibr" rid="B53">53</xref>
]). In general, all these methodologies are based on
<italic>z</italic>
-score-type tests and give information about preference and rejection. They differ mainly in the probabilistic model assumed, which in most cases leads to statistics whose probability distribution is unknown. The advantage of the methodology used here is that its theory of inference is well established, yielding an analysis that is more sequential, more easily interpretable and supported by more complementary tools (for example, measures of association). In other words, this methodology was chosen because the adjusted residual values give direct information about preference and rejection relative to what would be expected by chance. Furthermore, the probability distribution under the hypothesis of independence is known analytically, so no data simulations are needed.</p>
<p>For analysis of contingency tables and residuals [
<xref ref-type="bibr" rid="B22">22</xref>
-
<xref ref-type="bibr" rid="B25">25</xref>
], given an
<italic>r </italic>
×
<italic>c </italic>
contingency table where a multinomial distribution is assumed (Table
<xref ref-type="table" rid="T5">5</xref>
), the hypothesis of independence between the variables A and B is tested using Pearson's statistic, given by:</p>
<p>
<inline-graphic xlink:href="gb-2005-6-3-r28-i1.gif"></inline-graphic>
</p>
<p>where:</p>
<p>
<inline-graphic xlink:href="gb-2005-6-3-r28-i2.gif"></inline-graphic>
</p>
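<p>The formula images are not reproduced in this text. Assuming they follow the standard notation for an <italic>r</italic> × <italic>c</italic> table with cell counts <italic>n</italic><sub><italic>ij</italic></sub>, row totals <italic>n</italic><sub><italic>i</italic>·</sub>, column totals <italic>n</italic><sub>·<italic>j</italic></sub> and grand total <italic>N</italic>, Pearson's statistic and the expected counts under independence read:</p>

```latex
X^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(n_{ij} - \hat{\mu}_{ij}\right)^{2}}{\hat{\mu}_{ij}},
\qquad
\hat{\mu}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N}
```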
<p>Under the hypothesis of independence, Pearson's statistic has an asymptotic chi-square distribution with (<italic>r</italic> - 1)(<italic>c</italic> - 1) degrees of freedom. To identify the cells in the table responsible for any rejection of independence, the adjusted residuals <italic>d</italic><sub><italic>ij</italic></sub> are calculated by:</p>
<p>
<inline-graphic xlink:href="gb-2005-6-3-r28-i3.gif"></inline-graphic>
</p>
<p>where:</p>
<p>
<inline-graphic xlink:href="gb-2005-6-3-r28-i4.gif"></inline-graphic>
</p>
<p>is the variance estimated for <italic>r</italic><sub><italic>ij</italic></sub>. Haberman [<xref ref-type="bibr" rid="B54">54</xref>] has shown that, under independence between A and B, the adjusted residuals <italic>d</italic><sub><italic>ij</italic></sub> asymptotically follow a standard normal distribution, and therefore <italic>P</italic>(-3 < <italic>d</italic><sub><italic>ij</italic></sub> < 3) ≈ 0.9973 as <italic>N</italic> → +∞. This means that, at a 99.73% confidence level, the pair (A<sub>i</sub>, B<sub>j</sub>) is considered responsible for rejection of the hypothesis of independence if |<italic>d</italic><sub><italic>ij</italic></sub>| ≥ 3. In practice, we consider an adjusted residual statistically significant if its absolute value is 3 or greater.</p>
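<p>The computations described in this section can be sketched in plain Python. This is a minimal sketch that assumes the standard forms of the statistics, since the formula images are not reproduced here; the formulas used are spelled out in the docstring. Skipping rows or columns with a zero total (for example, codons absent from an ORFeome) is an assumption of the sketch, and the function names are hypothetical.</p>

```python
import math


def adjusted_residuals(table: list) -> tuple:
    """Pearson's X^2, its degrees of freedom and Haberman's adjusted residuals.

    For cell counts n_ij with row totals n_i., column totals n_.j and total N:
      mu_ij = n_i. * n_.j / N                 (expected count under independence)
      r_ij  = (n_ij - mu_ij) / sqrt(mu_ij)    (standardized residual)
      v_ij  = (1 - n_i./N) * (1 - n_.j/N)     (estimated variance of r_ij)
      d_ij  = r_ij / sqrt(v_ij)               (adjusted residual)
    Rows or columns with a zero total are skipped (their d_ij stay 0.0) and
    are left out of the degrees of freedom.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    if n == 0:
        raise ValueError("the table contains no observations")
    chi2 = 0.0
    d = [[0.0] * len(col_tot) for _ in row_tot]
    for i, ni in enumerate(row_tot):
        for j, nj in enumerate(col_tot):
            if ni == 0 or nj == 0:
                continue
            mu = ni * nj / n
            r = (table[i][j] - mu) / math.sqrt(mu)
            chi2 += r * r
            v = (1 - ni / n) * (1 - nj / n)
            d[i][j] = r / math.sqrt(v) if v > 0 else 0.0
    used_rows = sum(1 for t in row_tot if t)
    used_cols = sum(1 for t in col_tot if t)
    dof = (used_rows - 1) * (used_cols - 1)
    return chi2, dof, d


def significant_cells(d: list, cutoff: float = 3.0) -> list:
    """Cells (i, j, d_ij) with |d_ij| >= cutoff, as used in the text."""
    return [
        (i, j, value)
        for i, row in enumerate(d)
        for j, value in enumerate(row)
        if abs(value) >= cutoff
    ]


# Standard normal probability mass within (-3, 3): approximately 0.9973.
COVERAGE = math.erf(3 / math.sqrt(2))
```

<p>For the 2 × 2 table [[10, 0], [0, 10]], the sketch gives <italic>X</italic><sup>2</sup> = 20 with 1 degree of freedom and adjusted residuals of ±2√5 ≈ ±4.47, all of which meet the |<italic>d</italic>| ≥ 3 cut-off.</p>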
<p>Additionally, to find codon context patterns in the contingency table, rows and columns can be grouped using classification methods such as cluster analysis. These patterns are determined by calculating the similarity between two vectors of the contingency table using the centred Pearson correlation coefficient and applying single linkage. The single-linkage method produces groups with a 'chaining effect': an element joins a group if it is similar to at least one of its members, so each element of a group is more 'similar' to some element of the same group than to any element of another group.</p>
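<p>The clustering step can be sketched in plain Python as below. Converting the centred Pearson correlation into a distance (1 - correlation) and stopping at a fixed number of clusters are assumptions made for illustration, since the text does not specify them; the function names are hypothetical. The input vectors would be the rows (or columns) of the table.</p>

```python
import math


def centred_pearson(u: list, v: list) -> float:
    """Centred Pearson correlation between two equal-length vectors
    (0.0 if either vector is constant)."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den else 0.0


def single_linkage(vectors: list, n_clusters: int) -> list:
    """Agglomerative single-linkage clustering with distance 1 - correlation.

    Starts from one cluster per vector and repeatedly merges the two clusters
    whose closest members are nearest, until n_clusters clusters remain.
    Returns clusters as sorted lists of vector indices.
    """
    m = len(vectors)
    dist = {
        (i, j): 1.0 - centred_pearson(vectors[i], vectors[j])
        for i in range(m)
        for j in range(i + 1, m)
    }

    def linkage(c1: list, c2: list) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(dist[(min(a, b), max(a, b))] for a in c1 for b in c2)

    clusters = [[i] for i in range(m)]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                gap = linkage(clusters[x], clusters[y])
                if best is None or best[0] > gap:
                    best = (gap, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```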
</sec>
<sec>
<title>Software</title>
<p>The architecture of the Anaconda software is based on three main modules, namely data acquisition, processing and visualization (Figure
<xref ref-type="fig" rid="F1">1</xref>
). Each module works independently of the others and can easily be replaced or updated. This component-based approach also allows new modules, or new tools within each module (such as new statistical features), to be added.</p>
<p>The acquisition and processing modules download raw data from genome databases, create a local database of usable ORFs and analyze the data using an algorithm that simulates the ribosome during mRNA decoding. Finally, they construct a database containing the processed data, which is then submitted to statistical analysis as described above. The visualization module allows the user to visualize the data matrices and gene sequences and to create filters that permit searching for specific sequence patterns defined by the user.</p>
<p>The data-acquisition module deals with genome input files, namely reading and interpreting FASTA sequences of complete or partial sets of ORFs from public or private genome databases. To ensure that the screened sequences have the best possible quality, and hence do not introduce background noise into the subsequent analyses, several quality filters are applied during reading. When the filters are activated, the data are classified according to the following criteria. Valid data consist of genes whose sequence length is a multiple of three, which start with an AUG codon and end with a UAG, UAA or UGA codon, and which satisfy any other user-defined requirements. Rejected data consist of genes whose sequences do not fulfill these requirements. The result is the separation of valid from rejected ORFs. Other parameters needed by the application, such as reference relative synonymous codon usage (RSCU) values for codon adaptation index (CAI) calculation [
<xref ref-type="bibr" rid="B55">55</xref>
], are also uploaded by this module.</p>
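<p>The basic validity filters described above can be sketched in Python as follows. The sketch converts DNA input to RNA notation (T to U) before testing and does not model the user-defined requirements; the function names and rejection messages are hypothetical.</p>

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_orf(seq: str) -> tuple:
    """Apply the length, start-codon and stop-codon filters to one sequence.

    Returns (is_valid, reason).
    """
    s = seq.upper().replace("T", "U")  # accept DNA or RNA notation
    if len(s) % 3 != 0:
        return False, "length not a multiple of three"
    if not s.startswith("AUG"):
        return False, "does not start with AUG"
    if s[-3:] not in STOP_CODONS:
        return False, "does not end with UAA, UAG or UGA"
    return True, "valid"


def split_valid_rejected(orfs: dict) -> tuple:
    """Separate ORFs (name -> sequence) into valid ORFs and rejection reasons."""
    valid, rejected = {}, {}
    for name, seq in orfs.items():
        ok, reason = classify_orf(seq)
        if ok:
            valid[name] = seq
        else:
            rejected[name] = reason
    return valid, rejected
```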
<p>The processing module is the core of the application, where the codon context analysis is performed. After prescanning the files, the user can test for the existence of significant bias in the codon context and use the residual values to explore the matrices of residuals further (see Statistics, above). The data generated are converted into a contingency table, together with the corresponding observed value of Pearson's statistic and the matrix of adjusted residuals [
<xref ref-type="bibr" rid="B25">25</xref>
].</p>
<p>After processing, the data become available to the visualization module. This module is the graphical interface and follows the file-manager paradigm, in which information is presented in hierarchical views. It offers a set of tools for several tasks, namely searching for prespecified sequence patterns, visualizing data as histograms, clustering codon context data and exporting residual values. It is also possible to visualize other information at the gene level, such as rare codons and their distribution in the ORFs, to determine their ratio relative to the total number of codons, to determine the GC% at the first, second and third codon positions, and to determine the codon adaptation index (CAI) and the effective number of codons [
<xref ref-type="bibr" rid="B55">55</xref>
,
<xref ref-type="bibr" rid="B56">56</xref>
].</p>
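<p>The CAI is commonly defined as the geometric mean, over the codons of a gene, of each codon's relative adaptiveness <italic>w</italic> (its reference RSCU divided by the largest RSCU among its synonymous codons). The Python sketch below assumes that the caller supplies the reference RSCU values and the synonymous-codon families; single-codon families such as Met and Trp are excluded, and codons with a missing or zero weight are simply skipped rather than handled as in any particular published implementation. The function names are hypothetical.</p>

```python
import math


def relative_adaptiveness(rscu: dict, families: dict) -> dict:
    """w_c = RSCU_c / max RSCU within the codon's synonymous family.

    `families` maps an amino acid to its codons; single-codon families
    (e.g. Met, Trp) are omitted because they offer no synonymous choice.
    """
    w = {}
    for codons in families.values():
        if len(codons) == 1:
            continue
        top = max(rscu[c] for c in codons)
        if not top:
            continue  # no reference usage for this family
        for c in codons:
            w[c] = rscu[c] / top
    return w


def cai(codon_list: list, w: dict) -> float:
    """Geometric mean of w over the codons of a gene that have a positive weight."""
    logs = [math.log(w[c]) for c in codon_list if w.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

<p>With reference RSCU values of 1.6 for AAA and 0.4 for AAG, <italic>w</italic>(AAA) = 1 and <italic>w</italic>(AAG) = 0.25, so a gene reading AUG-AAA-AAG has a CAI of √0.25 = 0.5 (AUG is excluded).</p>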
</sec>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>We thank FCT (Project: POCTI/BME/39030/2001), IEETA and the II-UA (CTS-12) for supporting the development of the Anaconda software. G.M. is funded by FCT grant SFRH/BPD/7195/2001 and M.P. by INFOGENMED (FP-V). M.S. is supported by an EMBO YIP Award.</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sandman</surname>
<given-names>KK</given-names>
</name>
<name>
<surname>Tardiff</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Neely</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Noren</surname>
<given-names>CJ</given-names>
</name>
</person-group>
<article-title>Revised
<italic>Escherichia coli </italic>
selenocysteine insertion requirements determined by
<italic>in vivo </italic>
screening of combinatorial libraries of SECIS variants.</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>2234</fpage>
<lpage>2241</lpage>
<pub-id pub-id-type="pmid">12682374</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkg304</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Theobald-Dietrich</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Frugier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Giege</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rudinger-Thirion</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Atypical archaeal tRNA pyrrolysine transcript behaves towards EF-Tu as a typical elongator tRNA.</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>1091</fpage>
<lpage>1096</lpage>
<pub-id pub-id-type="pmid">14872064</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkh266</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Thomas</surname>
<given-names>LK</given-names>
</name>
<name>
<surname>Dix</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>Codon choice and gene expression: synonymous codons differ in their ability to direct aminoacylated-transfer RNA binding to ribosomes
<italic>in vitro</italic>
.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1988</year>
<volume>85</volume>
<fpage>4242</fpage>
<lpage>4246</lpage>
<pub-id pub-id-type="pmid">3288988</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ikemura</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and
<italic>Escherichia coli </italic>
with reference to the abundance of isoaccepting transfer RNAs.</article-title>
<source>J Mol Biol</source>
<year>1982</year>
<volume>158</volume>
<fpage>573</fpage>
<lpage>597</lpage>
<pub-id pub-id-type="pmid">6750137</pub-id>
<pub-id pub-id-type="doi">10.1016/0022-2836(82)90250-9</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carlini</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Stephan</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>
<italic>In vivo </italic>
introduction of unpreferred synonymous codons into the
<italic>Drosophila </italic>
Adh gene results in reduced levels of ADH protein.</article-title>
<source>Genetics</source>
<year>2003</year>
<volume>163</volume>
<fpage>239</fpage>
<lpage>243</lpage>
<pub-id pub-id-type="pmid">12586711</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elf</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nilsson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tenson</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ehrenberg</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Selective charging of tRNA isoacceptors explains patterns of codon usage.</article-title>
<source>Science</source>
<year>2003</year>
<volume>300</volume>
<fpage>1718</fpage>
<lpage>1722</lpage>
<pub-id pub-id-type="pmid">12805541</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1083811</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akashi</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Synonymous codon usage in
<italic>Drosophila melanogaster </italic>
: natural selection and translational accuracy.</article-title>
<source>Genetics</source>
<year>1994</year>
<volume>136</volume>
<fpage>927</fpage>
<lpage>935</lpage>
<pub-id pub-id-type="pmid">8005445</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>Codon bias in
<italic>Escherichia coli </italic>
: the influence of codon context on mutation and selection.</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<fpage>1397</fpage>
<lpage>1404</lpage>
<pub-id pub-id-type="pmid">9060435</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/25.7.1397</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saxonov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Regularities of context-dependent codon bias in eukaryotic genes.</article-title>
<source>Nucleic Acids Res</source>
<year>2002</year>
<volume>30</volume>
<fpage>1192</fpage>
<lpage>1197</lpage>
<pub-id pub-id-type="pmid">11861911</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/30.5.1192</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McVean</surname>
<given-names>GAT</given-names>
</name>
<name>
<surname>Hurst</surname>
<given-names>GDD</given-names>
</name>
</person-group>
<article-title>Evolutionary lability of context-dependent codon bias in bacteria.</article-title>
<source>J Mol Evol</source>
<year>2000</year>
<volume>50</volume>
<fpage>264</fpage>
<lpage>275</lpage>
<pub-id pub-id-type="pmid">10754070</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duret</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>tRNA gene number and codon usage in the
<italic>C. elegans </italic>
genome are co-adapted for optimal translation of highly expressed genes.</article-title>
<source>Trends Genet</source>
<year>2000</year>
<volume>16</volume>
<fpage>287</fpage>
<lpage>289</lpage>
<pub-id pub-id-type="pmid">10858656</pub-id>
<pub-id pub-id-type="doi">10.1016/S0168-9525(00)02041-2</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ikemura</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Codon usage and tRNA content in unicellular and multicellular organisms.</article-title>
<source>Mol Biol Evol</source>
<year>1985</year>
<volume>2</volume>
<fpage>13</fpage>
<lpage>34</lpage>
<pub-id pub-id-type="pmid">3916708</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moriyama</surname>
<given-names>EN</given-names>
</name>
<name>
<surname>Powell</surname>
<given-names>JR</given-names>
</name>
</person-group>
<article-title>Codon usage bias and tRNA abundance in
<italic>Drosophila</italic>
.</article-title>
<source>J Mol Evol</source>
<year>1997</year>
<volume>45</volume>
<fpage>514</fpage>
<lpage>523</lpage>
<pub-id pub-id-type="pmid">9342399</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Heck</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Hatfield</surname>
<given-names>GW</given-names>
</name>
</person-group>
<article-title>Codon pair utilization biases influence translational elongation step times.</article-title>
<source>J Biol Chem</source>
<year>1995</year>
<volume>270</volume>
<fpage>22801</fpage>
<lpage>22806</lpage>
<pub-id pub-id-type="pmid">7559409</pub-id>
<pub-id pub-id-type="doi">10.1074/jbc.270.39.22801</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parker</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Errors and alternatives in reading the universal genetic code.</article-title>
<source>Microbiol Rev</source>
<year>1989</year>
<volume>53</volume>
<fpage>273</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="pmid">2677635</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Precup</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Parker</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Missense misreading of asparagine codons as a function of codon identity and context.</article-title>
<source>J Biol Chem</source>
<year>1987</year>
<volume>262</volume>
<fpage>11351</fpage>
<lpage>11355</lpage>
<pub-id pub-id-type="pmid">3112158</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Precup</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ulrich</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Roopnarine</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Parker</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Context specific misreading of phenylalanine codons.</article-title>
<source>Mol Gen Genet</source>
<year>1989</year>
<volume>218</volume>
<fpage>397</fpage>
<lpage>401</lpage>
<pub-id pub-id-type="pmid">2685541</pub-id>
<pub-id pub-id-type="doi">10.1007/BF00332401</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Curran</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Poole</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Tate</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>BL</given-names>
</name>
</person-group>
<article-title>Selection of aminoacyl-tRNAs at sense codons: the size of the tRNA variable loop determines whether the immediate 3' nucleotide to the codon has a context effect.</article-title>
<source>Nucleic Acids Res</source>
<year>1995</year>
<volume>23</volume>
<fpage>4104</fpage>
<lpage>4108</lpage>
<pub-id pub-id-type="pmid">7479072</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shpaer</surname>
<given-names>EG</given-names>
</name>
</person-group>
<article-title>Constraints on codon context in
<italic>Escherichia coli </italic>
genes. Their possible role in modulating the efficiency of translation.</article-title>
<source>J Mol Biol</source>
<year>1986</year>
<volume>188</volume>
<fpage>555</fpage>
<lpage>564</lpage>
<pub-id pub-id-type="pmid">3525848</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gutman</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Hatfield</surname>
<given-names>GW</given-names>
</name>
</person-group>
<article-title>Nonrandom utilization of codon pairs in
<italic>Escherichia coli</italic>
.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1989</year>
<volume>86</volume>
<fpage>3699</fpage>
<lpage>3703</lpage>
<pub-id pub-id-type="pmid">2657727</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="other">
<article-title>Functional Evolutionary Genomics Laboratory: University of Aveiro.</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.bio.ua.pt/genomica/lab"></ext-link>
</citation>
</ref>
<ref id="B22">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Bishop</surname>
<given-names>YMM</given-names>
</name>
<name>
<surname>Fienberg</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Holland</surname>
<given-names>PW</given-names>
</name>
</person-group>
<article-title>Discrete Multivariate Analysis: Theory and Practice</article-title>
<source>Cambridge, MA: MIT Press</source>
<year>1975</year>
</citation>
</ref>
<ref id="B23">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Everitt</surname>
<given-names>BS</given-names>
</name>
</person-group>
<article-title>The Analysis of Contingency Tables</article-title>
<source>New York: John Wiley and Sons</source>
<year>1997</year>
</citation>
</ref>
<ref id="B24">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Sheskin</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Handbook of Parametric and Nonparametric Statistical Procedures</article-title>
<source>London: Chapman &amp; Hall/CRC</source>
<year>2000</year>
</citation>
</ref>
<ref id="B25">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Agresti</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Categorical Data Analysis</article-title>
<source>New York: Wiley</source>
<year>2002</year>
</citation>
</ref>
<ref id="B26">
<citation citation-type="other">
<article-title>
<italic>Saccharomyces </italic>
Genome Database.</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.yeastgenome.org"></ext-link>
</citation>
</ref>
<ref id="B27">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Everitt</surname>
<given-names>BS</given-names>
</name>
</person-group>
<article-title>Cluster Analysis</article-title>
<source>New York: Arnold</source>
<year>1998</year>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nussinov</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Doublet frequencies in evolutionary distinct groups.</article-title>
<source>Nucleic Acids Res</source>
<year>1984</year>
<volume>12</volume>
<fpage>1749</fpage>
<lpage>1763</lpage>
<pub-id pub-id-type="pmid">6583663</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Massey</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Beltrao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Garey</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Tuite</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MAS</given-names>
</name>
</person-group>
<article-title>Comparative evolutionary genomics unveils the molecular mechanism of reassignment of the CTG codon in
<italic>Candida </italic>
spp.</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>544</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="pmid">12670996</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.811003</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freudenreich</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Kantrow</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Zakian</surname>
<given-names>VA</given-names>
</name>
</person-group>
<article-title>Expansion and length-dependent fragility of CTG repeats in yeast.</article-title>
<source>Science</source>
<year>1998</year>
<volume>279</volume>
<fpage>853</fpage>
<lpage>856</lpage>
<pub-id pub-id-type="pmid">9452383</pub-id>
<pub-id pub-id-type="doi">10.1126/science.279.5352.853</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sueoka</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Translation-coupled violation of parity rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position.</article-title>
<source>Gene</source>
<year>1999</year>
<volume>238</volume>
<fpage>53</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">10570983</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(99)00320-0</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fuglsang</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Patterns of context-dependent codon biases.</article-title>
<source>Biochem Biophys Res Commun</source>
<year>2003</year>
<volume>304</volume>
<fpage>86</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="pmid">12705888</pub-id>
<pub-id pub-id-type="doi">10.1016/S0006-291X(03)00530-8</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gouy</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Codon contexts in enterobacterial and coliphage genes.</article-title>
<source>Mol Biol Evol</source>
<year>1987</year>
<volume>4</volume>
<fpage>426</fpage>
<lpage>444</lpage>
<pub-id pub-id-type="pmid">3128715</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yarus</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Folley</surname>
<given-names>LS</given-names>
</name>
</person-group>
<article-title>Sense codons are found in specific contexts.</article-title>
<source>J Mol Biol</source>
<year>1985</year>
<volume>182</volume>
<fpage>529</fpage>
<lpage>540</lpage>
<pub-id pub-id-type="pmid">3892014</pub-id>
<pub-id pub-id-type="doi">10.1016/0022-2836(85)90239-6</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
</person-group>
<article-title>Codon context and protein synthesis: enhancements of the genetic code.</article-title>
<source>Biochimie</source>
<year>1994</year>
<volume>76</volume>
<fpage>351</fpage>
<lpage>354</lpage>
<pub-id pub-id-type="pmid">7849098</pub-id>
<pub-id pub-id-type="doi">10.1016/0300-9084(94)90108-2</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Carrier</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
</person-group>
<article-title>An effect of codon context on the mistranslation of UGU codons
<italic>in vitro</italic>
.</article-title>
<source>J Mol Biol</source>
<year>1984</year>
<volume>175</volume>
<fpage>29</fpage>
<lpage>38</lpage>
<pub-id pub-id-type="pmid">6374156</pub-id>
<pub-id pub-id-type="doi">10.1016/0022-2836(84)90443-1</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murgola</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Pagel</surname>
<given-names>FT</given-names>
</name>
<name>
<surname>Hijazi</surname>
<given-names>KA</given-names>
</name>
</person-group>
<article-title>Codon context effects in missense suppression.</article-title>
<source>J Mol Biol</source>
<year>1984</year>
<volume>175</volume>
<fpage>19</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="pmid">6374155</pub-id>
<pub-id pub-id-type="doi">10.1016/0022-2836(84)90442-X</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dix</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>Codon choice and gene expression: synonymous codons differ in translational accuracy.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1989</year>
<volume>86</volume>
<fpage>6888</fpage>
<lpage>6892</lpage>
<pub-id pub-id-type="pmid">2674938</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Hottes</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>L</given-names>
</name>
<name>
<surname>McAdams</surname>
<given-names>HH</given-names>
</name>
</person-group>
<article-title>Codon usage between genomes is constrained by genome-wide mutational processes.</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2004</year>
<volume>101</volume>
<fpage>3480</fpage>
<lpage>3485</lpage>
<pub-id pub-id-type="pmid">14990797</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.0307827100</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eyre-Walker</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Synonymous codon bias is related to gene length in
<italic>Escherichia coli </italic>
: selection for translational accuracy?</article-title>
<source>Mol Biol Evol</source>
<year>1996</year>
<volume>13</volume>
<fpage>864</fpage>
<lpage>872</lpage>
<pub-id pub-id-type="pmid">8754221</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Antezana</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Mammalian mutation pressure, synonymous codon choice, and mRNA degradation.</article-title>
<source>J Mol Evol</source>
<year>2003</year>
<volume>57</volume>
<fpage>694</fpage>
<lpage>701</lpage>
<pub-id pub-id-type="pmid">14745538</pub-id>
<pub-id pub-id-type="doi">10.1007/s00239-003-2519-1</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akashi</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Codon bias evolution in
<italic>Drosophila</italic>
. Population genetics of mutation-selection drift.</article-title>
<source>Gene</source>
<year>1997</year>
<volume>205</volume>
<fpage>269</fpage>
<lpage>278</lpage>
<pub-id pub-id-type="pmid">9461401</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(97)00400-9</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sueoka</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kawanishi</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>DNA G + C content of the third codon position and codon usage biases of human genes.</article-title>
<source>Gene</source>
<year>2000</year>
<fpage>53</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="pmid">11164037</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(00)00480-7</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lobry</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Sueoka</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Asymmetric directional mutation pressures in bacteria.</article-title>
<source>Genome Biol</source>
<year>2002</year>
<volume>3</volume>
<fpage>research0058.1</fpage>
<lpage>0058.14</lpage>
<pub-id pub-id-type="pmid">12372146</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2002-3-10-research0058</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knight</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Freeland</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Landweber</surname>
<given-names>LF</given-names>
</name>
</person-group>
<article-title>A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes.</article-title>
<source>Genome Biol</source>
<year>2001</year>
<volume>2</volume>
<fpage>research0010.1</fpage>
<lpage>0010.13</lpage>
<pub-id pub-id-type="pmid">11305938</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2001-2-4-research0010</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Osawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jukes</surname>
<given-names>TH</given-names>
</name>
</person-group>
<article-title>On codon reassignment.</article-title>
<source>J Mol Evol</source>
<year>1995</year>
<volume>41</volume>
<fpage>247</fpage>
<lpage>249</lpage>
<pub-id pub-id-type="pmid">7666454</pub-id>
<pub-id pub-id-type="doi">10.1007/BF00170679</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knight</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Freeland</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Landweber</surname>
<given-names>LF</given-names>
</name>
</person-group>
<article-title>Rewiring the keyboard: evolvability of the genetic code.</article-title>
<source>Nat Rev Genet</source>
<year>2001</year>
<volume>2</volume>
<fpage>49</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">11253070</pub-id>
<pub-id pub-id-type="doi">10.1038/35047500</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Puhler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kalinowski</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Comparing expression level-dependent features in codon usage with protein abundance: an analysis of 'predictive proteomics'.</article-title>
<source>Proteomics</source>
<year>2004</year>
<volume>4</volume>
<fpage>46</fpage>
<lpage>58</lpage>
<pub-id pub-id-type="pmid">14730671</pub-id>
<pub-id pub-id-type="doi">10.1002/pmic.200300501</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cohen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Skiena</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Natural selection and algorithmic design of mRNA.</article-title>
<source>J Comput Biol</source>
<year>2003</year>
<volume>10</volume>
<fpage>419</fpage>
<lpage>432</lpage>
<pub-id pub-id-type="pmid">12935336</pub-id>
<pub-id pub-id-type="doi">10.1089/10665270360688101</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boycheva</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chkodrov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>Codon pairs in the genome of
<italic>Escherichia coli</italic>
.</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>987</fpage>
<lpage>998</lpage>
<pub-id pub-id-type="pmid">12761062</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btg082</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Giddings</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Gesteland</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Atkins</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>IP</given-names>
</name>
</person-group>
<article-title>Computational identification of putative programmed translational frameshift sites.</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<fpage>1046</fpage>
<lpage>1053</lpage>
<pub-id pub-id-type="pmid">12176827</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/18.8.1046</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hooper</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
</person-group>
<article-title>Detection of genes with atypical nucleotide sequence in microbial genomes.</article-title>
<source>J Mol Evol</source>
<year>2002</year>
<volume>54</volume>
<fpage>365</fpage>
<lpage>375</lpage>
<pub-id pub-id-type="pmid">11847562</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Avery</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Henderson</surname>
<given-names>DA</given-names>
</name>
</person-group>
<article-title>Fitting Markov chain models to discrete state series such as DNA sequences.</article-title>
<source>Appl Statist</source>
<year>1999</year>
<volume>48</volume>
<fpage>53</fpage>
<lpage>61</lpage>
</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haberman</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Analysis of residuals in cross-classified tables.</article-title>
<source>Biometrics</source>
<year>1973</year>
<volume>29</volume>
<fpage>205</fpage>
<lpage>220</lpage>
</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharp</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>WH</given-names>
</name>
</person-group>
<article-title>The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications.</article-title>
<source>Nucleic Acids Res</source>
<year>1987</year>
<volume>15</volume>
<fpage>1281</fpage>
<lpage>1295</lpage>
<pub-id pub-id-type="pmid">3547335</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wright</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>The 'effective number of codons' used in a gene.</article-title>
<source>Gene</source>
<year>1990</year>
<volume>87</volume>
<fpage>23</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">2110097</pub-id>
<pub-id pub-id-type="doi">10.1016/0378-1119(90)90491-9</pub-id>
</citation>
</ref>
</ref-list>
<sec sec-type="display-objects">
<title>Figures and Tables</title>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>Architecture of the Anaconda bioinformation system. The Anaconda package contains a data-acquisition module that downloads raw data from genome databases and filters it into a local database. These data are then processed by a ribosome simulation algorithm and transferred to a 64 × 64 contingency table that lends itself to statistical analysis. The processed data are then passed to the visualization module, which provides several tools for different types of data visualization and analysis. RSCU, relative synonymous codon usage values from very highly expressed genes, required for calculating the codon adaptation index (CAI) (see [55]).</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-1"></graphic>
</fig>
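The counting step behind the 64 × 64 table in Figure 1 can be illustrated with a minimal Python sketch. It assumes in-frame ORF sequences given as plain strings and reads each adjacent codon pair as the P-site and A-site codons of a ribosome moving one codon at a time. The UCAG codon order and the name `count_codon_pairs` are illustrative assumptions, not the Anaconda implementation.

```python
from itertools import product

# Illustrative codon order (hypothetical); the Anaconda module uses its own predefined order.
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def count_codon_pairs(orfs):
    """Build a 64 x 64 table of adjacent in-frame codon pairs (hypothetical helper).

    Rows are the 5' (P-site) codon and columns the 3' (A-site) codon.
    """
    table = [[0] * 64 for _ in range(64)]
    for orf in orfs:
        seq = orf.upper().replace("T", "U")  # accept DNA or RNA input
        # Split into complete codons only; a trailing partial codon is ignored.
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        for first, second in zip(codons, codons[1:]):
            # Skip pairs containing ambiguous bases such as N.
            if first in INDEX and second in INDEX:
                table[INDEX[first]][INDEX[second]] += 1
    return table
```

Stop codons are counted as 3' neighbours here, which matches the presence of UAA among the 3' codons in Table 1.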
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>Codon context is highly biased in yeast. The bar chart shows the distribution of the adjusted residual values given in Table 1 for the 3' context of the
<italic>S. cerevisiae </italic>
CUG codon. See Table 1 legend for details.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-2"></graphic>
</fig>
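The adjusted residuals plotted in Figure 2 can be sketched in Python, assuming the standard adjusted residual for two-way contingency tables (see Haberman [54]) applied to a codon-pair count table. The function name is hypothetical, and reporting 0.0 for zero-variance cells is an assumption of this sketch rather than documented behaviour of the software.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals of a two-way contingency table (hypothetical helper; assumes a non-empty table).

    Expected count under independence: e_ij = n_i+ * n_+j / N.
    Adjusted residual: (n_ij - e_ij) / sqrt(e_ij * (1 - n_i+ / N) * (1 - n_+j / N)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        row_out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            variance = expected * (1 - row_totals[i] / total) * (1 - col_totals[j] / total)
            # Cells in an empty row or column (or a single row or column) have zero variance;
            # report 0.0 for them instead of dividing by zero.
            row_out.append((observed - expected) / math.sqrt(variance) if variance > 0 else 0.0)
        residuals.append(row_out)
    return residuals
```

For a toy 2 × 2 table `[[10, 0], [0, 10]]`, every cell receives a residual of plus or minus √20 ≈ 4.47, well outside the -3 to +3 band treated as non-significant.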
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>
<italic>S. cerevisiae </italic>
genome map of codon context. For visualization, the residuals of the 64 × 64 codon context table were converted into a color-coded map in which red represents negative values (bad context) and green positive values (good context). Values that are not statistically significant (-3 to +3) are shown in black. The color scale spans the full range of residuals for yeast codon context. Fixed codons are the P-site codons, and the 3' context refers to the A-site codons, as viewed by the ribosome simulation software module.
<bold>(a) </bold>
The complete 3' codon context map of yeast shows a green diagonal line, which indicates that most codons prefer themselves as their 3' neighbors. The map also shows that, without exception, each codon prefers a defined set of neighbors (green) and avoids others (red). The intensity of red and green indicates the strength of the preference or rejection.
<bold>(b) </bold>
Individual codon pairs can be examined by zooming into particular areas of the map (boxed in dark blue in (a)). The order of the fixed and 3' context codons shown in (b) is predefined in the software module.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-3"></graphic>
</fig>
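The color convention of Figure 3, together with the proportion of non-significant cells reported in Figure 4, can be expressed in a few lines of Python. This sketch assumes the residual map is a list of rows of floats and treats the -3 to +3 interval as inclusive; the function names are hypothetical.

```python
def context_colour(residual, threshold=3.0):
    """Colour of one cell in a codon context map (hypothetical helper)."""
    if residual > threshold:
        return "green"  # more frequent than expected: good context
    if -residual > threshold:
        return "red"  # less frequent than expected: bad context
    return "black"  # within -3 to +3: not statistically significant


def colour_map(residuals, threshold=3.0):
    """Apply the colour convention to every cell of a residual map."""
    return [[context_colour(r, threshold) for r in row] for row in residuals]


def fraction_nonsignificant(residuals, threshold=3.0):
    """Share of cells whose residual lies within -threshold to +threshold."""
    values = [r for row in residuals for r in row]
    return sum(1 for r in values if threshold >= abs(r)) / len(values)
```

With the Table 1 values, for example, `context_colour(7.436)` (CUG followed by AAA) returns `"green"`, `context_colour(-10.007)` (CUG followed by UCU) returns `"red"`, and `context_colour(2.895)` (CUG followed by CCG) returns `"black"`.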
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>Distribution of the adjusted residuals from the
<italic>S. cerevisiae </italic>
codon context map. Forty-three percent of the residuals fall within the non-significant -3 to +3 interval, indicating that a very large number of codon combinations do not contribute significantly to the rejection of independence; that is, they are neither significantly preferred nor rejected in this genome.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-4"></graphic>
</fig>
<fig position="float" id="F5">
<label>Figure 5</label>
<caption>
<p>Codon context bias is organized in discrete groups. A two-way Pearson clustering by single linkage of the codon context data highlights regions of good and bad codon context, indicating that codon context bias is highly structured. A substantial number of codons do not fall into the major clusters, indicating that their preferences and rejections are defined on a one-to-one basis. The 3' codon contexts whose residuals fall within the non-significant -3 to +3 interval are also scattered across the map, indicating that there is no cluster of codons with little or no preference for particular 3' neighbors.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-5"></graphic>
</fig>
<fig position="float" id="F6">
<label>Figure 6</label>
<caption>
<p>Codon clusters define specific codon-context rules in
<italic>S. cerevisiae</italic>
.
<bold>(a) </bold>
A major bad-context cluster is defined by codon pairs in which the wobble base of the first codon is uridine (U) and the first base of the 3' neighbor is adenosine (A). This cluster defines an XXU-AYY context rule, in which X and Y are any nucleotide. Within this cluster, some Asn and Ser codons are exceptions to the rule, as their residuals are positive (green cells).
<bold>(b,c) </bold>
Two of the good-context clusters define two distinct codon context rules, namely (b) XXC-AYY and (c) XXU-GYY. As in (a), some codons within these clusters are exceptions to the rules, and a number of codons show no particular preference or rejection (black cells).</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-6"></graphic>
</fig>
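The context rules of Figure 6 depend only on the wobble base of the 5' codon and the first base of the 3' codon, so the pairs covered by a rule such as XXU-AYY, and any exceptions within it, can be listed directly. This Python sketch assumes a residual map indexed in the same codon order as `CODONS`; the UCAG ordering, the function names, and the use of the ±3 cut-off to define an exception are assumptions.

```python
from itertools import product

# Illustrative codon order (hypothetical); the residual map must use the same order.
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]


def rule_pairs(codons, wobble, next_first):
    """Codon pairs fitting an XXN-NYY rule, e.g. wobble="U", next_first="A" for XXU-AYY."""
    return [(a, b) for a in codons for b in codons
            if a[2] == wobble and b[0] == next_first]


def rule_exceptions(residuals, codons, wobble, next_first, bad_context=True, threshold=3.0):
    """Pairs inside a rule whose significant residual has the opposite sign to the rule (hypothetical helper)."""
    index = {codon: i for i, codon in enumerate(codons)}
    found = []
    for a, b in rule_pairs(codons, wobble, next_first):
        r = residuals[index[a]][index[b]]
        # A bad-context rule is contradicted by green (positive) cells,
        # a good-context rule by red (negative) cells.
        contradicts = r > threshold if bad_context else -r > threshold
        if contradicts:
            found.append((a, b, r))
    return found
```

Each XXN-NYY rule covers 16 × 16 = 256 of the 4,096 codon pairs, so `len(rule_pairs(CODONS, "U", "A"))` is 256.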
<fig position="float" id="F7">
<label>Figure 7</label>
<caption>
<p>Codon context maps are species-specific. Comparison of the genomic codon context maps of
<italic>S. cerevisiae</italic>
,
<italic>C. albicans</italic>
,
<italic>S. pombe </italic>
and
<italic>E. coli </italic>
shows that they all differ. Although the maps share some features, the clearly visible differences indicate that each species has its own set of codon context rules. Among the shared features, the green diagonal line in the yeast maps is the most notable. This diagonal indicates that almost all codons prefer themselves as their 3' neighbors, and it is strongly marked in the
<italic>C. albicans </italic>
context map, suggesting that tandem codon repetition is very common in this species.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-7"></graphic>
</fig>
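The green diagonal discussed for Figure 7 corresponds to the cells in which a codon is followed by itself. A minimal Python sketch, assuming a square residual map whose rows and columns share one codon order, lists the codons whose tandem repeat is significantly over-represented; the function names and the +3 cut-off applied here are assumptions.

```python
def self_preferring_codons(residuals, codons, threshold=3.0):
    """Codons whose tandem pair (codon followed by itself) has a residual above threshold (hypothetical helper)."""
    # Diagonal cells of the map: the row codon and the 3' column codon are identical.
    return [codon for i, codon in enumerate(codons) if residuals[i][i] > threshold]


def diagonal_fraction(residuals, codons, threshold=3.0):
    """Share of codons that significantly prefer themselves as 3' neighbours."""
    return len(self_preferring_codons(residuals, codons, threshold)) / len(codons)
```

Comparing this fraction across species offers one simple, if coarse, way to quantify how strongly the diagonal is marked in each map.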
<fig position="float" id="F8">
<label>Figure 8</label>
<caption>
<p>Differential display maps for comparative analysis of codon context. To compare the codon context maps of different species, the order of the codons displayed in the map was fixed and the maps were overlaid using a differential display tool built into the Anaconda bioinformation system. Maps representing the context differences between
<bold>(a) </bold>
<italic>S. cerevisiae </italic>
and
<italic>C. albicans</italic>
,
<bold>(b) </bold>
<italic>E. coli </italic>
and
<italic>S. cerevisiae </italic>
and
<bold>(c) </bold>
<italic>C. albicans </italic>
and
<italic>S. cerevisiae </italic>
were obtained by calculating the absolute value of the difference between corresponding residuals of the two maps. The differences are shown in blue according to the color scale: blue cells indicate the largest context differences, and black cells represent codon pairs with similar residual values in the two species (absolute difference between residuals within the 0-15 interval). The maps show rather large differences in codon context between
<italic>E. coli </italic>
and
<italic>S. cerevisiae </italic>
or
<italic>C. albicans </italic>
and smaller differences between
<italic>S. cerevisiae </italic>
and
<italic>C. albicans</italic>
.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-8"></graphic>
</fig>
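The differential display of Figure 8 reduces to an element-wise absolute difference between two residual maps that share the same codon order. The Python sketch below assumes both maps are lists of rows of floats; the function names are hypothetical, and the 15-unit cut-off simply mirrors the black 0-15 band described in the caption.

```python
def differential_map(residuals_a, residuals_b):
    """Absolute difference between corresponding residuals of two context maps (hypothetical helper)."""
    return [[abs(ra - rb) for ra, rb in zip(row_a, row_b)]
            for row_a, row_b in zip(residuals_a, residuals_b)]


def differing_pairs(residuals_a, residuals_b, codons, cutoff=15.0):
    """Codon pairs whose absolute residual difference lies above the black 0-15 band."""
    diff = differential_map(residuals_a, residuals_b)
    return [(codons[i], codons[j], d)
            for i, row in enumerate(diff)
            for j, d in enumerate(row) if d > cutoff]
```

The same functions apply unchanged to the low-GC3 and high-GC3 maps of a single species, as in the DCMs of Figure 10.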
<fig position="float" id="F9">
<label>Figure 9</label>
<caption>
<p>GC3 distribution in the complete ORFeome of
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
and its influence on the overall codon-pair context analysis. To study the role of mutational bias in codon-pair context, the ORFeomes of both
<bold>(a,b) </bold>
<italic>S. cerevisiae </italic>
and
<bold>(c,d) </bold>
<italic>E. coli </italic>
were grouped according to the %GC3 of individual ORFs. The GC3 of the
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
ORFeomes ranged from 11.9 to 76.7% and from 20 to 89.4%, respectively. For
<italic>S. cerevisiae</italic>
, however, most ORFs had a %GC3 between 35 and 40% (light blue bar in (a)), while for
<italic>E. coli </italic>
the majority of the ORFs had a %GC3 between 50 and 60% (light blue bars in (c)). Determining the codon-pair context of the low-GC3 and high-GC3 subgroups allowed their context differences to be identified. Counting the residuals that changed sign (for example, from positive to negative) between the low-GC3 and the high-GC3 subgroup provided a quantitative measure of the effect of GC3 on codon-pair context (red bars in (b) and (d)). For both
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
GC3 bias has a strong effect on codon-pair context for weak residuals (-3 to +3), but no such effect was observed for contexts with the highest residuals (strong context), indicating that GC3 bias acts mainly on weak codon-pair contexts.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-9"></graphic>
</fig>
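The GC3 analysis of Figure 9 involves two simple computations: the %GC3 of each ORF and a count of residuals that change sign between the low-GC3 and high-GC3 maps. The Python sketch below is a minimal illustration under stated assumptions: ORFs are in-frame strings including start and stop codons, the split point between subgroups is a hypothetical parameter, and each codon pair is assigned to the weak (-3 to +3) or strong class by its low-GC3 residual. All function names are hypothetical.

```python
def gc3(orf):
    """Percentage of G or C at the third position of each complete codon (assumes at least one codon)."""
    thirds = orf.upper()[2::3]  # a trailing partial codon never reaches a third position
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


def split_by_gc3(orfs, cutoff):
    """Split ORFs into low- and high-GC3 subgroups at a hypothetical %GC3 cutoff."""
    low = [orf for orf in orfs if cutoff >= gc3(orf)]
    high = [orf for orf in orfs if gc3(orf) > cutoff]
    return low, high


def sign_changes(low_map, high_map, threshold=3.0):
    """Count codon pairs whose residual changes sign between the two subgroups."""
    counts = {"weak": 0, "strong": 0}
    for row_low, row_high in zip(low_map, high_map):
        for r_low, r_high in zip(row_low, row_high):
            if r_low * r_high >= 0:
                continue  # same sign, or at least one residual is exactly zero
            # Assumption: the low-GC3 residual decides whether the context is weak or strong.
            counts["weak" if threshold >= abs(r_low) else "strong"] += 1
    return counts
```

For example, `gc3("AUGGCCUAA")` is about 66.7, because two of the three third-position bases (G, C, A) are G or C.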
<fig position="float" id="F10">
<label>Figure 10</label>
<caption>
<p>ORFs with low and high GC3 have different codon-pair contexts. To highlight the effect of GC3 bias on codon-pair context, the context maps for the subgroups of low GC3 and high GC3 ORFs of both
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
were overlaid using the differential display codon-pair context (DCM) tool. The DCM maps for
<italic>S. cerevisiae </italic>
and
<italic>E. coli </italic>
showed significant differences (light blue cells in the DCMs), in particular in
<italic>E. coli</italic>
, indicating that GC3 bias influences codon-pair context.</p>
</caption>
<graphic xlink:href="gb-2005-6-3-r28-10"></graphic>
</fig>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>The 3' codon context of CUG in <italic>S. cerevisiae</italic></p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left">3' Codon</td>
<td align="left">Residual</td>
<td align="left">3' Codon</td>
<td align="left">Residual</td>
<td align="left">3' Codon</td>
<td align="left">Residual</td>
<td align="left">3' Codon</td>
<td align="left">Residual</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">AAA</td>
<td align="left">7.436</td>
<td align="left">ACG</td>
<td align="left">0.644</td>
<td align="left">UCU</td>
<td align="left">-10.007</td>
<td align="left">CCA</td>
<td align="left">-2.438</td>
</tr>
<tr>
<td align="left">AAG</td>
<td align="left">1.927</td>
<td align="left">CGU</td>
<td align="left">-1.809</td>
<td align="left">CUU</td>
<td align="left">1.167</td>
<td align="left">CCG</td>
<td align="left">2.895</td>
</tr>
<tr>
<td align="left">AAU</td>
<td align="left">0.397</td>
<td align="left">CGC</td>
<td align="left">2.981</td>
<td align="left">CUC</td>
<td align="left">2.18</td>
<td align="left">CAU</td>
<td align="left">2.026</td>
</tr>
<tr>
<td align="left">AAC</td>
<td align="left">2.037</td>
<td align="left">CGA</td>
<td align="left">8.258</td>
<td align="left">CUA</td>
<td align="left">5.258</td>
<td align="left">CAC</td>
<td align="left">2.642</td>
</tr>
<tr>
<td align="left">ACU</td>
<td align="left">-6.947</td>
<td align="left">CGG</td>
<td align="left">5.404</td>
<td align="left">CUG</td>
<td align="left">6.774</td>
<td align="left">CAA</td>
<td align="left">4.049</td>
</tr>
<tr>
<td align="left">ACC</td>
<td align="left">-5.239</td>
<td align="left">ACG</td>
<td align="left">-4.726</td>
<td align="left">CCU</td>
<td align="left">-1.769</td>
<td align="left">CAG</td>
<td align="left">7.105</td>
</tr>
<tr>
<td align="left">ACA</td>
<td align="left">-5.12</td>
<td align="left">AGG</td>
<td align="left">-0.666</td>
<td align="left">CCC</td>
<td align="left">8.894</td>
<td align="left">UAA</td>
<td align="left">0.22</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Positive values indicate that a 3' codon follows CUG more often than expected under a random distribution (good context), while negative values indicate that it follows CUG less often than expected (bad context). The adjusted residuals quantify the context bias: values within the -3 to +3 interval are not statistically significant (no bias). See also Figure 2.</p>
</table-wrap-foot>
</table-wrap>
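The significance rule in the Table 1 footnote can be written as a small classifier. This is only a sketch. The function name and labels are hypothetical, and values exactly at ±3 are treated as not significant, following the "-3 to +3" wording.

```python
SIGNIFICANCE_THRESHOLD = 3.0  # |residual| above this is treated as significant


def classify_context(residual, threshold=SIGNIFICANCE_THRESHOLD):
    """Label a codon-pair adjusted residual as good context, bad context or no bias."""
    if abs(residual) > threshold:
        return "good context" if residual > 0 else "bad context"
    return "no bias"


# A few 3' contexts of CUG taken from Table 1
for codon, residual in [("AAA", 7.436), ("ACU", -6.947), ("AAU", 0.397)]:
    print(codon, classify_context(residual))
```

The example labels AAA as good context, ACU as bad context and AAU as no bias.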
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Ranking of the 10 most negative and 10 most positive residual values in <italic>S. cerevisiae</italic>, <italic>S. pombe</italic> and <italic>C. albicans</italic> codon contexts</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left" colspan="2">
<italic>S. cerevisiae</italic>
</td>
<td align="left" colspan="2">
<italic>S. pombe</italic>
</td>
<td align="left" colspan="2">
<italic>C. albicans</italic>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Context</td>
<td align="left">Residual</td>
<td align="left">Context</td>
<td align="left">Residual</td>
<td align="left">Context</td>
<td align="left">Residual</td>
</tr>
<tr>
<td colspan="6">
<hr></hr>
</td>
</tr>
<tr>
<td align="left" colspan="6">Most negative values</td>
</tr>
<tr>
<td align="left">
<bold>UUU → AAG</bold>
</td>
<td align="left">
<bold>-24.58</bold>
</td>
<td align="left">GAA → CCU</td>
<td align="left">-24.159</td>
<td align="left">UUU → CCA</td>
<td align="left">-32.691</td>
</tr>
<tr>
<td align="left">GAU → AAG</td>
<td align="left">-22.487</td>
<td align="left">GAU → AAG</td>
<td align="left">-24.124</td>
<td align="left">UUC → GAA</td>
<td align="left">-31.586</td>
</tr>
<tr>
<td align="left">AUU → AAA</td>
<td align="left">-21.546</td>
<td align="left">
<bold>UUU → AAG</bold>
</td>
<td align="left">
<bold>-23.899</bold>
</td>
<td align="left">UCA → GAU</td>
<td align="left">-28.317</td>
</tr>
<tr>
<td align="left">
<bold>AUU → AAG</bold>
</td>
<td align="left">
<bold>-21.285</bold>
</td>
<td align="left">AUU → AAA</td>
<td align="left">-22.923</td>
<td align="left">
<bold>AUU → AAG</bold>
</td>
<td align="left">
<bold>-28.284</bold>
</td>
</tr>
<tr>
<td align="left">CUU → AAA</td>
<td align="left">-20.656</td>
<td align="left">UCU → AAG</td>
<td align="left">-22.334</td>
<td align="left">GGU → UUU</td>
<td align="left">-27.198</td>
</tr>
<tr>
<td align="left">UUU → AAA</td>
<td align="left">-20.563</td>
<td align="left">CUU → AAA</td>
<td align="left">-21.25</td>
<td align="left">AAC → UUA</td>
<td align="left">-26.198</td>
</tr>
<tr>
<td align="left">UCC → GAA</td>
<td align="left">-20.069</td>
<td align="left">GUU → AAA</td>
<td align="left">-21.218</td>
<td align="left">GAC → UUA</td>
<td align="left">-25.795</td>
</tr>
<tr>
<td align="left">AAG → UCU</td>
<td align="left">-19.706</td>
<td align="left">
<bold>AUU → AAG</bold>
</td>
<td align="left">
<bold>-21.08</bold>
</td>
<td align="left">
<bold>UUU → AAG</bold>
</td>
<td align="left">
<bold>-25.316</bold>
</td>
</tr>
<tr>
<td align="left">GAU → CAA</td>
<td align="left">-19.274</td>
<td align="left">UUU → AAA</td>
<td align="left">-20.704</td>
<td align="left">GGA → AAA</td>
<td align="left">-25.26</td>
</tr>
<tr>
<td align="left">GAA → CCA</td>
<td align="left">-19.155</td>
<td align="left">GAA → UCU</td>
<td align="left">-20.698</td>
<td align="left">UUC → GAU</td>
<td align="left">-24.822</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left" colspan="6">Most positive values</td>
</tr>
<tr>
<td align="left">GAU → GAU</td>
<td align="left">29.839</td>
<td align="left">CAG → CAA</td>
<td align="left">25.279</td>
<td align="left">ACA → ACA</td>
<td align="left">49.476</td>
</tr>
<tr>
<td align="left">AAG → AAG</td>
<td align="left">29.937</td>
<td align="left">GAA → GAG</td>
<td align="left">25.644</td>
<td align="left">CAC → CAC</td>
<td align="left">49.511</td>
</tr>
<tr>
<td align="left">UUG → AAA</td>
<td align="left">30.459</td>
<td align="left">AAG → AAG</td>
<td align="left">26.901</td>
<td align="left">CCA → CCA</td>
<td align="left">52.889</td>
</tr>
<tr>
<td align="left">
<bold>GAA → GAA</bold>
</td>
<td align="left">
<bold>30.573</bold>
</td>
<td align="left">CUU → CGU</td>
<td align="left">27.013</td>
<td align="left">
<bold>GAA → GAA</bold>
</td>
<td align="left">
<bold>57.356</bold>
</td>
</tr>
<tr>
<td align="left">AAG → AAA</td>
<td align="left">31.427</td>
<td align="left">
<bold>GAA → GAA</bold>
</td>
<td align="left">
<bold>28.051</bold>
</td>
<td align="left">AAG → AAA</td>
<td align="left">58.605</td>
</tr>
<tr>
<td align="left">CAG → CAA</td>
<td align="left">33.445</td>
<td align="left">AGA → AGA</td>
<td align="left">29.623</td>
<td align="left">
<bold>GCU → GCU</bold>
</td>
<td align="left">
<bold>62.611</bold>
</td>
</tr>
<tr>
<td align="left">AGA → AGA</td>
<td align="left">33.798</td>
<td align="left">AAA → AAG</td>
<td align="left">30.358</td>
<td align="left">ACC → ACC</td>
<td align="left">70.117</td>
</tr>
<tr>
<td align="left">
<bold>GGU → GGU</bold>
</td>
<td align="left">
<bold>35.979</bold>
</td>
<td align="left">
<bold>GCU → GCU</bold>
</td>
<td align="left">
<bold>32.158</bold>
</td>
<td align="left">
<bold>GGU → GGU</bold>
</td>
<td align="left">
<bold>72.48</bold>
</td>
</tr>
<tr>
<td align="left">
<bold>GCU → GCU</bold>
</td>
<td align="left">
<bold>36.231</bold>
</td>
<td align="left">
<bold>GGU → GGU</bold>
</td>
<td align="left">
<bold>33.681</bold>
</td>
<td align="left">AAC → AAC</td>
<td align="left">87.115</td>
</tr>
<tr>
<td align="left">CAG → CAG</td>
<td align="left">45.422</td>
<td align="left">UCU → UCU</td>
<td align="left">35.086</td>
<td align="left">CAA → CAA</td>
<td align="left">105.216</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Anaconda was used to analyze the codon context of the complete genomes of <italic>S. cerevisiae</italic>, <italic>S. pombe</italic> and <italic>C. albicans</italic>. All possible codon contexts were ranked by their adjusted residuals, and the 10 most negative and 10 most positive were selected as extreme examples; within each block, values are listed in ascending order. Only a small number of bad or good codon pairs (shown in bold) are shared by all three yeast species.</p>
</table-wrap-foot>
</table-wrap>
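The ranking behind Table 2 can be sketched as follows, assuming the adjusted residuals of one ORFeome are already held in a dictionary keyed by codon pair. `extreme_contexts` and `shared_contexts` are hypothetical helpers, not Anaconda functions. The second one collects the codon pairs that appear in every species' ranking, which correspond to the bold entries.

```python
def extreme_contexts(residuals, k=10):
    """Return the k most negative and k most positive (context, residual) pairs.

    `residuals` maps a codon pair such as "UUU-AAG" to its adjusted residual.
    Both lists are in ascending order of residual, as in Table 2.
    """
    ranked = sorted(residuals.items(), key=lambda item: item[1])
    return ranked[:k], ranked[-k:]


def shared_contexts(*rankings):
    """Codon pairs present in every ranking (the bold entries in Table 2)."""
    return set.intersection(*(set(ctx for ctx, _ in ranking) for ranking in rankings))


# A few S. cerevisiae values from Table 2 (k=2 keeps the output short)
sample = {
    "UUU-AAG": -24.58,
    "GAU-AAG": -22.487,
    "GAA-GAA": 30.573,
    "GCU-GCU": 36.231,
    "CAG-CAG": 45.422,
}
most_negative, most_positive = extreme_contexts(sample, k=2)
print(most_negative)
print(most_positive)
```

If a dictionary holds fewer than 2k contexts, the two lists overlap. This does not arise with the 64 × 64 codon-pair tables analyzed here.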
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>Ranking of the codon pairs with the largest absolute residual differences between <italic>S. cerevisiae</italic>, <italic>S. pombe</italic> and <italic>C. albicans</italic>, compared pairwise</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left" colspan="2">
<italic>S. pombe-S. cerevisiae</italic>
</td>
<td align="left" colspan="2">
<italic>S. pombe-C. albicans</italic>
</td>
<td align="left" colspan="2">
<italic>C. albicans-S. cerevisiae</italic>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Context</td>
<td align="left">Difference</td>
<td align="left">Context</td>
<td align="left">Difference</td>
<td align="left">Context</td>
<td align="left">Difference</td>
</tr>
<tr>
<td colspan="6">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">CAG → CAG</td>
<td align="left">27.798</td>
<td align="left">
<bold>CAA → CAA</bold>
</td>
<td align="left">
<bold>100.639</bold>
</td>
<td align="left">
<bold>CAA → CAA</bold>
</td>
<td align="left">
<bold>79.38</bold>
</td>
</tr>
<tr>
<td align="left">UUG → AAA</td>
<td align="left">25.266</td>
<td align="left">AAC → AAC</td>
<td align="left">76.716</td>
<td align="left">AAC → AAC</td>
<td align="left">62.939</td>
</tr>
<tr>
<td align="left">CUU → CGU</td>
<td align="left">25.168</td>
<td align="left">ACC → ACC</td>
<td align="left">60.208</td>
<td align="left">ACC → ACC</td>
<td align="left">50.735</td>
</tr>
<tr>
<td align="left">CAA → CAG</td>
<td align="left">24.507</td>
<td align="left">CCA → CCA</td>
<td align="left">47.603</td>
<td align="left">CCA → CCA</td>
<td align="left">39.196</td>
</tr>
<tr>
<td align="left">AAA → AAG</td>
<td align="left">23.593</td>
<td align="left">ACA → ACA</td>
<td align="left">47.359</td>
<td align="left">CAC → CAC</td>
<td align="left">39.032</td>
</tr>
<tr>
<td align="left">UUC → AAA</td>
<td align="left">22.86</td>
<td align="left">CAC → CAC</td>
<td align="left">47.175</td>
<td align="left">ACA → ACA</td>
<td align="left">39.029</td>
</tr>
<tr>
<td align="left">AAU → AAU</td>
<td align="left">22.021</td>
<td align="left">GGA → AAA</td>
<td align="left">45.043</td>
<td align="left">GGU → GGU</td>
<td align="left">36.501</td>
</tr>
<tr>
<td align="left">
<bold>CAA → CAA</bold>
</td>
<td align="left">
<bold>21.259</bold>
</td>
<td align="left">AAG → AAA</td>
<td align="left">43.994</td>
<td align="left">GGA → UUA</td>
<td align="left">35.81</td>
</tr>
<tr>
<td align="left">GUU → CUU</td>
<td align="left">21.194</td>
<td align="left">CAA → CAG</td>
<td align="left">43.927</td>
<td align="left">GGA → AAA</td>
<td align="left">29.786</td>
</tr>
<tr>
<td align="left">GAU → GAC</td>
<td align="left">19.483</td>
<td align="left">UCA → UCA</td>
<td align="left">41.533</td>
<td align="left">GUU → GAU</td>
<td align="left">29.753</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Anaconda was used to analyze the codon context of the complete genomes of <italic>S. cerevisiae</italic>, <italic>S. pombe</italic> and <italic>C. albicans</italic>. For each pair of genomes (<italic>S. pombe-S. cerevisiae</italic>; <italic>S. pombe-C. albicans</italic>; and <italic>C. albicans-S. cerevisiae</italic>), the adjusted residuals of each codon context were subtracted and the absolute value (modulus) of the difference was taken. These values were used to rank the codon contexts, and the 10 highest were selected. Of the three comparisons, <italic>S. pombe</italic> and <italic>S. cerevisiae</italic> display the smallest differences, with a maximum of 27.798 for the CAG → CAG pair. For <italic>S. pombe</italic> and <italic>C. albicans</italic>, the difference reaches 100.639 for the CAA → CAA pair. Notably, the highest difference for the former comparison is lower than the lowest difference listed for the latter (41.533). The only codon pair shared by all three comparisons (CAA → CAA) is shown in bold.</p>
</table-wrap-foot>
</table-wrap>
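The difference ranking described in the Table 3 footnote takes the absolute difference of each context's residual in two genomes and keeps the largest values. Here is a minimal sketch. `largest_differences` is a hypothetical name, and contexts missing from either dictionary are simply skipped, which is an assumption made for this sketch.

```python
def largest_differences(residuals_a, residuals_b, k=10):
    """Rank codon pairs by |residual_a - residual_b|, largest first.

    Only contexts present in both dictionaries are compared.
    """
    diffs = {
        ctx: abs(residuals_a[ctx] - residuals_b[ctx])
        for ctx in residuals_a
        if ctx in residuals_b
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical residuals for two species (illustrative values, not from the paper)
species_1 = {"CAA-CAA": 5.0, "CAG-CAG": 40.0, "AAC-AAC": 10.0}
species_2 = {"CAA-CAA": 100.0, "CAG-CAG": 30.0, "AAC-AAC": 80.0, "GGU-GGU": 12.0}
print(largest_differences(species_1, species_2, k=2))
```

Because the absolute value is taken, the ranking does not depend on the order in which the two genomes are named. For example, S. pombe-C. albicans gives the same ranking as C. albicans-S. pombe.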
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>GC3 influences codon-pair context</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="left" colspan="5">Residual interval (% of residual sign inversions)</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">ORFeome</td>
<td align="left">(-∞, -9]</td>
<td align="left">[-9, -3]</td>
<td align="left">[-3, 3]</td>
<td align="left">[3, 9]</td>
<td align="left">[9, +∞)</td>
</tr>
<tr>
<td colspan="6">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<italic>S. cerevisiae</italic>
</td>
<td align="left">0.0</td>
<td align="left">2.5</td>
<td align="left">
<bold>94.2</bold>
</td>
<td align="left">3.3</td>
<td align="left">0.0</td>
</tr>
<tr>
<td align="left">
<italic>E. coli</italic>
</td>
<td align="left">0.7</td>
<td align="left">15.2</td>
<td align="left">
<bold>67.1</bold>
</td>
<td align="left">15.0</td>
<td align="left">2.0</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>To measure the influence of GC3 bias on codon-pair context, the adjusted residuals that reversed sign (positive to negative or vice versa) between the low-GC3 and high-GC3 ORF subgroups were identified, and their percentage in each residual interval was determined. For both species, most sign inversions fall within the non-significant residual interval (-3 to +3; highlighted in bold), indicating that GC3 bias mainly affects codon pairs whose association is very weak or nonexistent.</p>
</table-wrap-foot>
</table-wrap>
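The Table 4 measure can be sketched by counting codon pairs whose low-GC3 and high-GC3 residuals have opposite signs. The sketch then reports what share of these inversions falls in each residual interval. Two points are assumptions. First, the footnote does not say which residual places a pair in an interval, so the sketch bins by a hypothetical `reference` residual (for example the whole-ORFeome value). Second, boundary values go to the lower interval, so the bins are (-∞, -9], (-9, -3], (-3, 3], (3, 9] and (9, +∞).

```python
from bisect import bisect_left

# Boundaries of the residual intervals in Table 4 (boundary values fall in the lower bin)
BIN_EDGES = [-9.0, -3.0, 3.0, 9.0]


def sign_inversion_profile(contexts):
    """Percentage of sign inversions falling in each residual interval.

    `contexts` is an iterable of (reference, low_gc3, high_gc3) residual triples,
    one per codon pair. A pair counts as inverted when its low- and high-GC3
    residuals have opposite signs; it is binned by its `reference` residual.
    """
    counts = [0] * (len(BIN_EDGES) + 1)
    for reference, low, high in contexts:
        if low * high >= 0:
            continue  # same sign (or a zero residual): no inversion
        counts[bisect_left(BIN_EDGES, reference)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * count / total for count in counts]


# Hypothetical residual triples (not data from the paper)
profile = sign_inversion_profile([
    (0.5, 1.0, -1.0),    # inverted, weak interval
    (2.0, -0.5, 0.7),    # inverted, weak interval
    (5.0, 4.0, -1.0),    # inverted, (3, 9] interval
    (12.0, 10.0, 11.0),  # same sign: not counted
])
print(profile)
```

As in Table 4, each profile sums to 100%, because the percentages describe how the inversions are distributed across the intervals.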
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption>
<p>A hypothetical <italic>r</italic> × <italic>c</italic> contingency table</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td></td>
<td align="left">B
<sub>1</sub>
</td>
<td align="left">...</td>
<td align="left">B
<sub>j</sub>
</td>
<td align="left">...</td>
<td align="left">B
<sub>c</sub>
</td>
<td align="left">Marginal total</td>
</tr>
<tr>
<td align="left">A
<sub>1</sub>
</td>
<td align="left">n
<sub>11</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>1j</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>1c</sub>
</td>
<td align="left">n
<sub>1</sub>
*</td>
</tr>
<tr>
<td align="left">...</td>
<td></td>
<td align="left">...</td>
<td></td>
<td align="left">...</td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">A
<sub>i</sub>
</td>
<td align="left">n
<sub>i1</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>ij</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>ic</sub>
</td>
<td align="left">n
<sub>i</sub>
*</td>
</tr>
<tr>
<td align="left">...</td>
<td></td>
<td align="left">...</td>
<td></td>
<td align="left">...</td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">A
<sub>r</sub>
</td>
<td align="left">n
<sub>r1</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>rj</sub>
</td>
<td align="left">...</td>
<td align="left">n
<sub>rc</sub>
</td>
<td align="left">n
<sub>r</sub>
*</td>
</tr>
<tr>
<td align="left">Marginal total</td>
<td align="left">n*
<sub>1</sub>
</td>
<td align="left">...</td>
<td align="left">n*
<sub>j</sub>
</td>
<td align="left">...</td>
<td align="left">n*
<sub>c</sub>
</td>
<td align="left">N</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
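<p>The table illustrates how the contingency tables were constructed and how the statistical methods described in the methods section were applied. One set of categories defines the rows and the other the columns; n<sub>ij</sub> is the observed count for row category A<sub>i</sub> and column category B<sub>j</sub>, the marginal totals are the row and column sums, and N is the grand total. When Anaconda analyzes the 3' context, the rows (A) correspond to the 5' codons and the columns (B) to the 3' codons of each pair, so n<sub>ij</sub> counts how often codon A<sub>i</sub> is immediately followed by codon B<sub>j</sub>.</p>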
</table-wrap-foot>
</table-wrap>
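Starting from a contingency table laid out as in Table 5, the following sketch computes the standard Haberman adjusted residual for each cell. The formula is d_ij = (n_ij - e_ij) / sqrt(e_ij (1 - n_i*/N) (1 - n*_j/N)), with expected count e_ij = n_i* n*_j / N. We assume this matches the residual described in the methods, but the code is an illustration, not Anaconda's implementation. `adjusted_residuals` is a hypothetical name, and the counts in the example are made up.

```python
import math


def adjusted_residuals(table):
    """Adjusted (Haberman) residuals for an r x c table of observed counts.

    table[i][j] holds n_ij, e.g. how often 5' codon A_i is followed by 3' codon B_j.
    Assumes every row and column total is non-zero and smaller than N.
    """
    row_totals = [sum(row) for row in table]          # n_i*
    col_totals = [sum(col) for col in zip(*table)]    # n*_j
    grand_total = sum(row_totals)                     # N

    residuals = []
    for i, row in enumerate(table):
        row_residuals = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            variance = (expected
                        * (1 - row_totals[i] / grand_total)
                        * (1 - col_totals[j] / grand_total))
            row_residuals.append((observed - expected) / math.sqrt(variance))
        residuals.append(row_residuals)
    return residuals


# Toy 2 x 2 table (hypothetical counts, not data from the paper)
for row in adjusted_residuals([[10, 20], [30, 40]]):
    print([round(value, 3) for value in row])
```

In a 2 × 2 table, all four adjusted residuals have the same magnitude (about 0.891 here) with alternating signs, which is a quick sanity check. For codon-pair context, the table would be the 64 × 64 matrix of 5' and 3' codons. Its residuals are then interpreted with the ±3 threshold used in Table 1.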
</sec>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000258 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000258 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:1088947
   |texte=   Comparative context analysis of codon pairs on an ORFeome scale
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:15774029" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024