Codon-triplet context unveils unique features of the Candida albicans protein coding genome

Internal identifier: 000247 (Pmc/Corpus); previous: 000246; next: 000248

Authors: Gabriela R. Moura; José P. Lousado; Miguel Pinheiro; Laura Carreto; Raquel M. Silva; José L. Oliveira; Manuel A. S. Santos

RBID : PMC:2244636

Abstract

Background

The evolutionary forces that determine the arrangement of synonymous codons within open reading frames and fine-tune mRNA translation efficiency are not yet understood. To tackle this question, we carried out a large-scale study of codon-triplet contexts in 11 fungal species to unravel associations between codons present at the ribosome A-, P- and E-sites during each decoding cycle.

Results

Our analysis unveiled a high bias within the context of codon-triplets, in particular a strong preference for triplets of identical codons. We also identified a surprisingly large number of codon-triplet combinations that vanished from fungal ORFeomes. Candida albicans exacerbated these features, showed an unbalanced tRNA population for decoding its pool of codons and used near-cognate decoding for a large set of codons, suggesting that unique evolutionary forces shaped the evolution of its ORFeome.

Conclusion

We have developed bioinformatics tools for large-scale analysis of codon-triplet contexts. These algorithms identified codon-triplet context biases, allowed large-scale comparative codon-triplet analysis, and identified rules governing codon-triplet context. They could also detect alterations to the standard genetic code.


DOI: 10.1186/1471-2164-8-444
PubMed: 18047667
PubMed Central: 2244636

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Codon-triplet context unveils unique features of the
<italic>Candida albicans </italic>
protein coding genome</title>
<author>
<name sortKey="Moura, Gabriela R" sort="Moura, Gabriela R" uniqKey="Moura G" first="Gabriela R" last="Moura">Gabriela R. Moura</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lousado, Jose P" sort="Lousado, Jose P" uniqKey="Lousado J" first="José P" last="Lousado">José P. Lousado</name>
<affiliation>
<nlm:aff id="I2">ESTGL, Polytechnic Institute of Viseu, 5100-074 Lamego, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="I3">Institute of Electronics and Telematics Engineering (IEETA). University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Carreto, Laura" sort="Carreto, Laura" uniqKey="Carreto L" first="Laura" last="Carreto">Laura Carreto</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel M" sort="Silva, Raquel M" uniqKey="Silva R" first="Raquel M" last="Silva">Raquel M. Silva</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L" last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="I3">Institute of Electronics and Telematics Engineering (IEETA). University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel As" sort="Santos, Manuel As" uniqKey="Santos M" first="Manuel As" last="Santos">Manuel As Santos</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">18047667</idno>
<idno type="pmc">2244636</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244636</idno>
<idno type="RBID">PMC:2244636</idno>
<idno type="doi">10.1186/1471-2164-8-444</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000247</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000247</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Codon-triplet context unveils unique features of the
<italic>Candida albicans </italic>
protein coding genome</title>
<author>
<name sortKey="Moura, Gabriela R" sort="Moura, Gabriela R" uniqKey="Moura G" first="Gabriela R" last="Moura">Gabriela R. Moura</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lousado, Jose P" sort="Lousado, Jose P" uniqKey="Lousado J" first="José P" last="Lousado">José P. Lousado</name>
<affiliation>
<nlm:aff id="I2">ESTGL, Polytechnic Institute of Viseu, 5100-074 Lamego, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="I3">Institute of Electronics and Telematics Engineering (IEETA). University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Carreto, Laura" sort="Carreto, Laura" uniqKey="Carreto L" first="Laura" last="Carreto">Laura Carreto</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel M" sort="Silva, Raquel M" uniqKey="Silva R" first="Raquel M" last="Silva">Raquel M. Silva</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L" last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="I3">Institute of Electronics and Telematics Engineering (IEETA). University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel As" sort="Santos, Manuel As" uniqKey="Santos M" first="Manuel As" last="Santos">Manuel As Santos</name>
<affiliation>
<nlm:aff id="I1">Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The evolutionary forces that determine the arrangement of synonymous codons within open reading frames and fine-tune mRNA translation efficiency are not yet understood. To tackle this question, we carried out a large-scale study of codon-triplet contexts in 11 fungal species to unravel associations between codons present at the ribosome A-, P- and E-sites during each decoding cycle.</p>
</sec>
<sec>
<title>Results</title>
<p>Our analysis unveiled a high bias within the context of codon-triplets, in particular a strong preference for triplets of identical codons. We also identified a surprisingly large number of codon-triplet combinations that vanished from fungal ORFeomes.
<italic>Candida albicans </italic>
exacerbated these features, showed an unbalanced tRNA population for decoding its pool of codons and used near-cognate decoding for a large set of codons, suggesting that unique evolutionary forces shaped the evolution of its ORFeome.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have developed bioinformatics tools for large-scale analysis of codon-triplet contexts. These algorithms identified codon-triplet context biases, allowed large-scale comparative codon-triplet analysis, and identified rules governing codon-triplet context. They could also detect alterations to the standard genetic code.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-title>BMC Genomics</journal-title>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">18047667</article-id>
<article-id pub-id-type="pmc">2244636</article-id>
<article-id pub-id-type="publisher-id">1471-2164-8-444</article-id>
<article-id pub-id-type="doi">10.1186/1471-2164-8-444</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Codon-triplet context unveils unique features of the
<italic>Candida albicans </italic>
protein coding genome</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>Moura</surname>
<given-names>Gabriela R</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>gmoura@bio.ua.pt</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Lousado</surname>
<given-names>José P</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>jlousado@sapo.pt</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Pinheiro</surname>
<given-names>Miguel</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>monsanto@ieeta.pt</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Carreto</surname>
<given-names>Laura</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>lcarreto@bio.ua.pt</email>
</contrib>
<contrib id="A5" contrib-type="author">
<name>
<surname>Silva</surname>
<given-names>Raquel M</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>rsilva@bio.ua.pt</email>
</contrib>
<contrib id="A6" contrib-type="author">
<name>
<surname>Oliveira</surname>
<given-names>José L</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>jlo@ieeta.pt</email>
</contrib>
<contrib id="A7" corresp="yes" contrib-type="author">
<name>
<surname>Santos</surname>
<given-names>Manuel AS</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>msantos@bio.ua.pt</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Biology and CESAM. University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<aff id="I2">
<label>2</label>
ESTGL, Polytechnic Institute of Viseu, 5100-074 Lamego, Portugal</aff>
<aff id="I3">
<label>3</label>
Institute of Electronics and Telematics Engineering (IEETA). University of Aveiro, 3810-193 Aveiro, Portugal</aff>
<pub-date pub-type="collection">
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>11</month>
<year>2007</year>
</pub-date>
<volume>8</volume>
<fpage>444</fpage>
<lpage>444</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2164/8/444"></ext-link>
<history>
<date date-type="received">
<day>15</day>
<month>6</month>
<year>2007</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>11</month>
<year>2007</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2007 Moura et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2007</copyright-year>
<copyright-holder>Moura et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment>Moura R Gabriela gmoura@bio.ua.pt Codon-triplet context unveils unique features of the Candida albicans protein coding genome 2007 BMC Genomics 8(1): 444-. (2007) 1471-2164(2007)8:1&lt;444&gt; urn:ISSN:1471-2164</pmc-comment>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>The evolutionary forces that determine the arrangement of synonymous codons within open reading frames and fine-tune mRNA translation efficiency are not yet understood. To tackle this question, we carried out a large-scale study of codon-triplet contexts in 11 fungal species to unravel associations between codons present at the ribosome A-, P- and E-sites during each decoding cycle.</p>
</sec>
<sec>
<title>Results</title>
<p>Our analysis unveiled a high bias within the context of codon-triplets, in particular a strong preference for triplets of identical codons. We also identified a surprisingly large number of codon-triplet combinations that vanished from fungal ORFeomes.
<italic>Candida albicans </italic>
exacerbated these features, showed an unbalanced tRNA population for decoding its pool of codons and used near-cognate decoding for a large set of codons, suggesting that unique evolutionary forces shaped the evolution of its ORFeome.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have developed bioinformatics tools for large-scale analysis of codon-triplet contexts. These algorithms identified codon-triplet context biases, allowed large-scale comparative codon-triplet analysis, and identified rules governing codon-triplet context. They could also detect alterations to the standard genetic code.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The degeneracy of the genetic code allows synthesis of identical proteins from mRNAs with rather different primary structures. Synonymous codons are nonetheless not used at equal frequencies, and this bias in synonymous codon usage is linked to tRNA abundance, codon-pair context effects, genome G + C pressure, the strength of codon-anticodon interactions, and other DNA replication, transcription and mRNA translation biases [
<xref ref-type="bibr" rid="B1">1</xref>
-
<xref ref-type="bibr" rid="B5">5</xref>
]. Interestingly, codon-pair context fine tunes mRNA decoding efficiency [
<xref ref-type="bibr" rid="B6">6</xref>
-
<xref ref-type="bibr" rid="B9">9</xref>
]. For example, in
<italic>E. coli</italic>
, altering the 3' context from G to U in the insertion sequence IS911 (A-AAA-AAG) increases frameshifting from 10% to 60% [
<xref ref-type="bibr" rid="B10">10</xref>
], while, intriguingly, the over-represented ACG-CUG codon-pair is translated more slowly than the under-represented synonymous codon-pair ACC-CUG [
<xref ref-type="bibr" rid="B9">9</xref>
].</p>
<p>These context effects suggest that codon-pairs are important modulators of mRNA translation accuracy and speed. However, codon-pairs cannot reflect the full bias imposed by the translational machinery on mRNA primary structure, since the ribosome has 3 rather than 2 decoding sites, namely the A-, P- and E-sites [
<xref ref-type="bibr" rid="B11">11</xref>
]. The A- and P-sites are directly involved in aminoacyl-tRNA (aa-tRNA) selection and translocation and, for these reasons, it is not surprising that codon-pair context influences protein synthesis fidelity. From a structural perspective, the role of the E-site, which is occupied by deacylated tRNA during exit from the ribosome [
<xref ref-type="bibr" rid="B12">12</xref>
], in mRNA decoding speed and accuracy is less clear. However, E-site occupation does influence decoding fidelity by allosterically changing the affinity of the A-site during selection of incoming aa-tRNAs [
<xref ref-type="bibr" rid="B13">13</xref>
-
<xref ref-type="bibr" rid="B15">15</xref>
]. This allosteric interaction between the E- and A-sites, together with ribosome crystallography and cryo-EM studies [
<xref ref-type="bibr" rid="B16">16</xref>
], provide strong functional and structural evidence for a critical role of the 3 tRNAs accommodated in the ribosome in decoding efficiency. In other words, the E-site is more than just an exit site for deacylated tRNAs from the ribosome. Hence, codon-triplets present in the ribosome A-, P- and E-sites are expected to play an important role in the accuracy and efficiency of mRNA translation. If so, like codon-pair context, codon-triplet context should be biased. This hypothesis is supported by the observation that non-programmed translational frameshifting and programmed translational events involve more than two consecutive codons (e.g. [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B10">10</xref>
]).</p>
<p>In a previous study, we developed software and statistical methodologies for the analysis of codon-pair contexts and identified general rules that govern such contexts [
<xref ref-type="bibr" rid="B4">4</xref>
]. Here, we extend those studies to the analysis of codon-triplet context, using several fungal ORFeomes as model systems. This study produced very large data sets and posed significant computational challenges, which prompted the development of a dedicated database for data storage and tools for data mining. We show for the first time that the context of codon-triplets is highly biased and species-specific, and we discuss the implications of trinucleotide repeats for codon-triplet context. We also explain how our approach can be used to identify non-standard mRNA decoding events and alterations to the genetic code.</p>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Tools to study codon-triplets</title>
<p>We have implemented computational algorithms and data storage facilities for the comparative analysis of codon-triplets in fungal genomes (Figure
<xref ref-type="fig" rid="F1">1</xref>
). The algorithm simulates the ribosome during decoding by reading open reading frames (ORFs) from the ATG initiation codon and moving the reading window three nucleotides at a time until a stop codon is encountered. While doing this, it records all codon-triplets, which represent the A-, P- and E-site codons during mRNA decoding. In this study, triplet counting was performed on complete sets of ORFs (ORFeomes), which were first filtered to eliminate aberrant ORFs lacking an ATG initiation codon or a TAG/TGA/TAA stop codon, or containing premature stop codons or ambiguous bases (N). The first and last triplets of each ORF were not considered, to avoid translation initiation and termination context effects [
<xref ref-type="bibr" rid="B17">17</xref>
,
<xref ref-type="bibr" rid="B18">18</xref>
].</p>
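<p>The counting procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names <monospace>split_codons</monospace> and <monospace>count_triplets</monospace> are hypothetical, the check that the ORF length is a multiple of three is an added assumption, and triplets are stored in 5' to 3' order without assigning them to specific ribosomal sites.</p>
<preformat>
```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def split_codons(orf):
    """Return the codons of an ORF, or None if the ORF fails the filters."""
    orf = orf.upper()
    # Assumption: a usable ORF is non-empty and its length is a multiple of 3.
    if not orf or len(orf) % 3 != 0:
        return None
    # Reject ambiguous bases such as N.
    if not set(orf).issubset(VALID_BASES):
        return None
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # Require an ATG start codon and a terminal stop codon.
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return None
    # Reject premature (internal) stop codons.
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return None
    return codons


def count_triplets(orfs):
    """Count codon-triplets over all valid ORFs, sliding one codon at a time."""
    counts = Counter()
    for orf in orfs:
        codons = split_codons(orf)
        if codons is None:
            continue
        triplets = list(zip(codons, codons[1:], codons[2:]))
        # Drop the first and last triplets, which contain the start and stop codons.
        counts.update(triplets[1:-1])
    return counts
```
</preformat>
<p>In this sketch the reading window advances by one codon (three nucleotides) per step, so consecutive triplets overlap by two codons, as they would during successive decoding cycles.</p>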
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>
<bold>Schematic representation of the bioinformatics system</bold>
. Gene sequences were downloaded from genome databases (Table 1) and filtered into a local database to eliminate false Open Reading Frames. Sequences were then processed by counting all codon-triplets, excluding the first and the last ones of each ORF, which have specific translation initiation and termination contexts. These data were transferred to a 3-dimensional 61 × 61 × 61 matrix and were saved as a Microsoft Access Database file. The processed data were then analyzed using Weka-3 data mining tools [19] and direct database queries. This methodology allowed us to handle very large data sets and identify differences in codon-triplet context between fungal species. These differences were finally subjected to statistical analyses.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-1"></graphic>
</fig>
<p>Since analysis of codon triplets generates a 3-dimensional 61 × 61 × 61 matrix for each ORFeome, we have used a relational database to store the processed data (Figure
<xref ref-type="fig" rid="F1">1</xref>
). These large data sets were then analyzed using data mining tools [
<xref ref-type="bibr" rid="B19">19</xref>
] and direct database queries. These studies aimed at identifying major differences in codon-triplet context between the fungal ORFeomes stored in the main database (Table
<xref ref-type="table" rid="T1">1</xref>
). A similar methodology was used to count amino acid triplets generated from the same ORFeome sequences. For this, codons were translated to the respective amino acids using standard genetic code rules or using non-standard decoding of the leucine-CTG codons as serine in
<italic>Candida albicans </italic>
and
<italic>Debaryomyces hansenii </italic>
[
<xref ref-type="bibr" rid="B20">20</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. Finally, new algorithms were implemented to count codon and amino acid repetitions on an ORFeome-wide scale. All results were compared with the values expected for a random distribution of codons, which were calculated from the frequencies of individual codons or amino acids in the genomes.</p>
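<p>One plausible reading of the random-distribution baseline is that the expected count of a triplet is the total number of triplets multiplied by the product of the frequencies of its three codons. The Python sketch below illustrates that reading only; the exact formula is not given in the text. The function names are hypothetical, and codon frequencies are estimated here from the triplet counts themselves rather than from the full ORFeome. With 61 sense codons the full table has 61 × 61 × 61 = 226,981 entries.</p>
<preformat>
```python
from collections import Counter
from itertools import product


def expected_triplet_counts(observed):
    """Expected triplet counts if codons were placed independently.

    observed maps (codon1, codon2, codon3) tuples to observed counts.
    Assumption: codon frequencies are taken from the triplets themselves.
    """
    total_triplets = sum(observed.values())
    codon_counts = Counter()
    for triplet, n in observed.items():
        for codon in triplet:
            codon_counts[codon] += n
    total_codons = sum(codon_counts.values())
    freq = {codon: n / total_codons for codon, n in codon_counts.items()}
    # Expected count = total triplets x product of the three codon frequencies.
    return {
        (a, b, c): total_triplets * freq[a] * freq[b] * freq[c]
        for a, b, c in product(freq, repeat=3)
    }


def vanished_triplets(observed, expected):
    """Triplets with a non-trivial expectation that were never observed."""
    return sorted(t for t in expected if observed.get(t, 0) == 0)
```
</preformat>
<p>Comparing the observed counts with these expectations gives a simple way to flag over-represented triplets and triplets that are absent from an ORFeome.</p>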
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Data source</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left">Species</td>
<td align="left">Site/link</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<italic>A. fumigatus</italic>
</td>
<td align="left">[42]</td>
</tr>
<tr>
<td align="left">
<italic>C. albicans</italic>
</td>
<td align="left">[43]</td>
</tr>
<tr>
<td align="left">
<italic>C. glabrata</italic>
</td>
<td align="left">[44]</td>
</tr>
<tr>
<td align="left">
<italic>D. hansenii</italic>
</td>
<td align="left">[45]</td>
</tr>
<tr>
<td align="left">
<italic>E. gossypii</italic>
</td>
<td align="left">[46]</td>
</tr>
<tr>
<td align="left">
<italic>K. lactis</italic>
</td>
<td align="left">[47]</td>
</tr>
<tr>
<td align="left">
<italic>S. bayanus</italic>
</td>
<td align="left">[48]</td>
</tr>
<tr>
<td align="left">
<italic>S. cerevisiae</italic>
</td>
<td align="left">[49]</td>
</tr>
<tr>
<td align="left">
<italic>S. mikatae</italic>
</td>
<td align="left">[48]</td>
</tr>
<tr>
<td align="left">
<italic>S. paradoxus</italic>
</td>
<td align="left">[48]</td>
</tr>
<tr>
<td align="left">
<italic>S. pombe</italic>
</td>
<td align="left">[50]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>11 ORFeome sequences were downloaded from the web sites indicated in the table between December 2005 and February 2006. Each ORFeome was scanned to detect invalid ORFs (see Methods), i.e. those lacking an ATG start codon or a TAA, TAG or TGA stop codon. ORFs containing premature stop codons or undefined nucleotides (N) were also discarded from the analysis.</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Codon-triplets in fungal genomes</title>
<p>The tools described above allowed a comparative analysis of codon-triplets in 11 fungal ORFeomes (Table
<xref ref-type="table" rid="T1">1</xref>
). Clear patterns of codon-triplet preferences and rejections were identified for each ORFeome and, as for codon-pair contexts [
<xref ref-type="bibr" rid="B4">4</xref>
], such patterns were specific to each ORFeome (Tables
<xref ref-type="table" rid="T2">2</xref>
and
<xref ref-type="table" rid="T3">3</xref>
and Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S1). This first analysis also showed that the percentage of codon-triplets that vanished from the ORFeomes was much higher than expected from random distribution of the triplets in these ORFeomes (Figure
<xref ref-type="fig" rid="F2">2A</xref>
). The percentage of vanished codon-triplets varied between 8 and 11% in most fungal ORFeomes, but was significantly lower in
<italic>Aspergillus fumigatus </italic>
(0.5%),
<italic>Eremothecium gossypii </italic>
(2.9%) and
<italic>Saccharomyces mikatae </italic>
(1.6%). The human pathogen
<italic>Candida albicans </italic>
had a higher percentage of such triplets (16.5%), and those vanished codon-triplets were also reflected at the amino acid level, since
<italic>C. albicans </italic>
was the only species where an amino acid triplet, namely Trp-Met-Trp, was absent. Conversely, analysis of the 10 most frequent codon-triplets (Figure
<xref ref-type="fig" rid="F2">2B</xref>
) showed an even distribution in these fungal ORFeomes with the exception of
<italic>C. albicans</italic>
, where the percentage of these abundant triplets increased more than 2-fold (0.45%). Overall, in
<italic>C. albicans </italic>
there was a clear over-representation of a subset of codon-triplets, namely (CAA-CAA-CAA), (GAA-GAA-GAA), (AAT-AAT-AAT) or (GAT-GAT-GAT) (Table
<xref ref-type="table" rid="T2">2</xref>
) and strong repression of another subset, namely (GAA-AAA-AAA), (AAA-AAA-AAT), (AAA-AAA-AAA) or (TTA-AAA-AAA) (Table
<xref ref-type="table" rid="T3">3</xref>
), indicating higher bias of codon-triplet usage in this human pathogen (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S1).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Ranking of the 10 most preferred codon-triplets in fungal ORFeomes</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center">
<italic>
<underline>A.fum</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>C.alb</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>C.gla</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>D.han</underline>
</italic>
</td>
<td align="center">
<italic>E.gos</italic>
</td>
<td align="center">
<italic>
<underline>K.lac</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.bay</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.cer</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.mik</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.par</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.pom</underline>
</italic>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">ACT</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAG</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">AGT</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">GAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">ACA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">CCA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">ATT</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">
<bold>GAC</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAT</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CCA</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>GAC</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAT</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>GAC</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAA</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">ACT</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">
<bold>GCG</bold>
</td>
<td align="center">CAA</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CAA</td>
<td align="center">CAA</td>
<td align="center">GAT</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">
<bold>GCG</bold>
</td>
<td align="center">CAG</td>
<td align="center">CAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CAG</td>
<td align="center">CAG</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">
<bold>
<underline>GAA</underline>
</bold>
</td>
<td align="center">
<bold>AAC</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>GCG</bold>
</td>
<td align="center">CAA</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CAA</td>
<td align="center">CAA</td>
<td align="center">GAA</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">GAG</td>
<td align="center">CAA</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">GAG</td>
<td align="center">GAT</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAT</td>
<td align="center">TCC</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">CAG</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAC</td>
<td align="center">GAA</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">CAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">ACA</td>
</tr>
<tr>
<td></td>
<td align="center">GAA</td>
<td align="center">CAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAG</td>
<td align="center">GAT</td>
<td align="center">
<bold>GAT</bold>
</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">CCA</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">GAA</td>
<td align="center">
<bold>GGT</bold>
</td>
<td align="center">AAG</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">GAG</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">
<bold>GGT</bold>
</td>
<td align="center">AAG</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">CTG</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAT</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">
<bold>GGT</bold>
</td>
<td align="center">AAA</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">CTG</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">GAA</td>
</tr>
<tr>
<td align="left">8</td>
<td align="center">AAG</td>
<td align="center">CAA</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAC</td>
<td align="center">CAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">
<bold>GAT</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">CAA</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAG</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">
<bold>GAT</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">CAG</td>
<td align="center">
<bold>CAA</bold>
</td>
<td align="center">GAT</td>
<td align="center">GAC</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>GAT</bold>
</td>
</tr>
<tr>
<td align="left">9</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAC</td>
<td align="center">GAC</td>
<td align="center">CAA</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>TCT</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAG</td>
<td align="center">CAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">
<bold>TCT</bold>
</td>
</tr>
<tr>
<td></td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">GAG</td>
<td align="center">CAG</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>TCT</bold>
</td>
</tr>
<tr>
<td align="left">10</td>
<td align="center">GAG</td>
<td align="center">
<bold>CCA</bold>
</td>
<td align="center">GAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CTG</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">CAG</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">AGT</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">
<bold>CCA</bold>
</td>
<td align="center">GAA</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CTG</td>
<td align="center">GAA</td>
<td align="center">GAG</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">CAA</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">TCA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<bold>CCA</bold>
</td>
<td align="center">GAG</td>
<td align="center">
<bold>AAT</bold>
</td>
<td align="center">CAG</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">
<bold>CAG</bold>
</td>
<td align="center">CAG</td>
<td align="center">
<bold>AAG</bold>
</td>
<td align="center">ACT</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The complete set of codon-triplets present in the 11 fungal ORFeomes studied was identified and the codon-triplets were ranked according to the difference between observed and expected values (bias). The 10 most preferred codon-triplets show that the triplets with the highest positive bias are common to all fungal ORFeomes. For example, the strongly biased GAA-GAA-GAA triplet was preferred in 10 out of 11 fungal ORFeomes (underlined). Codon-triplets containing identical codons were also frequent (bold). Interestingly, the most common feature of these preferred codon-triplets was the presence of a guanosine (G) at positions 4 and 7 of the triplet (X
<sub>1</sub>
X
<sub>2</sub>
X
<sub>3</sub>
-
<bold>Y</bold>
<sub>4</sub>
Y
<sub>5</sub>
Y
<sub>6</sub>
-
<bold>Z</bold>
<sub>7</sub>
Z
<sub>8</sub>
Z
<sub>9</sub>
), a feature shared by half of the codon-triplets shown in this table.</p>
</table-wrap-foot>
</table-wrap>
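<p>The ranking used in Tables 2 and 3 can be sketched as follows. This is a minimal illustration rather than the authors' software: the function names are hypothetical, and the expected count of each codon-triplet is assumed here to follow from independent codon frequencies, which may differ from the statistical model described in the Methods.</p>
<preformat>
```python
from collections import Counter


def split_codons(orf):
    """Split an in-frame ORF into codons, dropping a trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def triplet_bias(orfs):
    """Return {codon-triplet: observed - expected} for every observed triplet.

    Assumption of this sketch: expected counts come from independent codon
    frequencies multiplied by the total number of triplet windows.
    """
    codon_counts = Counter()
    triplet_counts = Counter()
    for orf in orfs:
        codons = split_codons(orf.upper())
        codon_counts.update(codons)
        # Sliding window of three consecutive codons.
        triplet_counts.update(zip(codons, codons[1:], codons[2:]))

    total_codons = sum(codon_counts.values())
    total_triplets = sum(triplet_counts.values())
    bias = {}
    for triplet, observed in triplet_counts.items():
        probability = 1.0
        for codon in triplet:
            probability *= codon_counts[codon] / total_codons
        bias[triplet] = observed - probability * total_triplets
    return bias


def rank_triplets(bias, n=10, preferred=True):
    """Top-n triplets by bias: most preferred first, or most rejected first."""
    return sorted(bias.items(), key=lambda item: item[1], reverse=preferred)[:n]


if __name__ == "__main__":
    toy_orfs = ["GAAGAAGAAAAA"]  # toy input, not real ORFeome data
    for triplet, value in rank_triplets(triplet_bias(toy_orfs)):
        print("-".join(triplet), round(value, 5))
```
</preformat>
<p>Calling <monospace>rank_triplets(bias, preferred=False)</monospace> returns the most rejected triplets instead, which corresponds to the ordering of Table 3.</p>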
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>Ranking of the 10 most rejected codon-triplets in fungal ORFeomes</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center">
<italic>A.fum</italic>
</td>
<td align="center">
<italic>
<underline>C.alb</underline>
</italic>
</td>
<td align="center">
<italic>C.gla</italic>
</td>
<td align="center">
<italic>
<underline>D.han</underline>
</italic>
</td>
<td align="center">
<italic>E.gos</italic>
</td>
<td align="center">
<italic>
<underline>K.lac</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.bay</underline>
</italic>
</td>
<td align="center">
<italic>
<underline>S.cer</underline>
</italic>
</td>
<td align="center">
<italic>S.mik</italic>
</td>
<td align="center">
<italic>S.par</italic>
</td>
<td align="center">
<italic>S.pom</italic>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="center">GAC</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">AAA</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">AAA</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">GCT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">CCT</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">AAG</td>
<td align="center">AAG</td>
<td align="center">AAG</td>
<td align="center">AAA</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">GAC</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAC</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAT</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">ATT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">GAA</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">AAA</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAA</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">GCT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">GAA</td>
<td align="center">AAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">TTT</td>
</tr>
<tr>
<td></td>
<td align="center">GAC</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>GGT</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">AAA</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">GAG</td>
<td align="center">
<underline>TTA</underline>
</td>
<td align="center">GAA</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">TTG</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">GCT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">TTT</td>
<td align="center">AAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">CAG</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAA</td>
<td align="center">ATT</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">GAT</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">AAG</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">TTT</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">TCT</td>
<td align="center">CTT</td>
</tr>
<tr>
<td></td>
<td align="center">GCC</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GGT</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">CAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>TTG</underline>
</td>
<td align="center">AAA</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">AAA</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">GCT</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">AAT</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">AAA</td>
<td align="center">ATT</td>
<td align="center">AAA</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">TCT</td>
<td align="center">ATT</td>
<td align="center">ATT</td>
<td align="center">TTT</td>
</tr>
<tr>
<td></td>
<td align="center">TTC</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">GGT</td>
<td align="center">
<underline>AAG</underline>
</td>
<td align="center">CAG</td>
<td align="center">AAA</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">AAA</td>
<td align="center">AAA</td>
<td align="center">TTT</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">AAA</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GGT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">GAT</td>
<td align="center">AAG</td>
<td align="center">AAA</td>
<td align="center">GAT</td>
<td align="center">AAG</td>
<td align="center">GAT</td>
</tr>
<tr>
<td></td>
<td align="center">TTC</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">
<underline>AAC</underline>
</td>
<td align="center">CTG</td>
<td align="center">AAG</td>
<td align="center">GAT</td>
<td align="center">GGT</td>
<td align="center">AAG</td>
<td align="center">TCT</td>
<td align="center">AAG</td>
</tr>
<tr>
<td align="left">8</td>
<td align="center">ATT</td>
<td align="center">
<underline>TCA</underline>
</td>
<td align="center">
<underline>AAC</underline>
</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">TTC</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
<td align="center">GAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GCG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">TTT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">TTT</td>
<td align="center">GAT</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAG</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">AAG</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">AAA</td>
<td align="center">AAA</td>
</tr>
<tr>
<td align="left">9</td>
<td align="center">GAG</td>
<td align="center">
<underline>TTA</underline>
</td>
<td align="center">AAC</td>
<td align="center">AAT</td>
<td align="center">GAG</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAA</td>
<td align="center">AAT</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">ATT</td>
</tr>
<tr>
<td></td>
<td align="center">CCT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">TTT</td>
<td align="center">TCT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAT</td>
<td align="center">TTT</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAA</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">AAG</td>
<td align="center">TTA</td>
<td align="center">AAG</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">AAT</td>
<td align="center">AAA</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">GAT</td>
</tr>
<tr>
<td align="left">10</td>
<td align="center">GAT</td>
<td align="center">
<underline>CCA</underline>
</td>
<td align="center">GAT</td>
<td align="center">GAA</td>
<td align="center">TTT</td>
<td align="center">
<underline>GAT</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAA</td>
<td align="center">TTT</td>
</tr>
<tr>
<td></td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">AAG</td>
<td align="center">CCA</td>
<td align="center">AAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>CAA</underline>
</td>
<td align="center">
<underline>ATT</underline>
</td>
<td align="center">GAT</td>
<td align="center">AAA</td>
</tr>
<tr>
<td></td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">GAA</td>
<td align="center">GAT</td>
<td align="center">GAG</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>GAA</underline>
</td>
<td align="center">
<underline>AAA</underline>
</td>
<td align="center">
<underline>AAT</underline>
</td>
<td align="center">CAA</td>
<td align="center">GTT</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The complete set of codon-triplets present in the 11 fungal ORFeomes studied was identified and the codon-triplet contexts were ranked according to the difference between observed and expected values (bias). This group of codon-triplets can be divided according to the presence or absence of runs of more than 5 consecutive adenosines (underlined). Interestingly, the most common feature of these strongly repressed codon-triplets was the presence of an adenosine (A) at positions 4 and 7 of the triplet (X
<sub>1</sub>
X
<sub>2</sub>
X
<sub>3</sub>
-
<bold>Y</bold>
<sub>4</sub>
Y
<sub>5</sub>
Y
<sub>6</sub>
-
<bold>Z</bold>
<sub>7</sub>
Z
<sub>8</sub>
Z
<sub>9</sub>
), a feature that appears in one third of the codon-triplets shown in this table.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>
<bold>Major differences in codon-triplet contexts in fungal genomes</bold>
. To characterize codon-triplet distributions in the 11 fungal species studied, we calculated the percentage of codon-triplets that did not appear in the fungal ORFeomes (panel A). Additionally, the fraction corresponding to the 10 most frequent codon-triplets was quantified (panel B). In both cases,
<italic>C. albicans </italic>
showed stronger bias in codon-triplet distribution than the other species. Bars represent observed percentages while blue dots indicate values expected from random codon-triplet distribution.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-2"></graphic>
</fig>
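<p>The two quantities plotted in Figure 2 can be computed from a table of codon-triplet counts, as in the sketch below. The function name is hypothetical, and the number of possible triplets is assumed to be 61<sup>3</sup> (sense codons only); the source does not state which denominator was used.</p>
<preformat>
```python
from collections import Counter


def triplet_distribution_summary(triplet_counts, n_codons=61, top=10):
    """Summarise a codon-triplet count table.

    n_codons=61 (sense codons only) is an assumption of this sketch.
    Returns the percentage of possible triplets never observed and the
    percentage of all triplet occurrences taken by the `top` most
    frequent triplets.
    """
    possible = n_codons ** 3
    observed_kinds = sum(1 for count in triplet_counts.values() if count > 0)
    total = sum(triplet_counts.values())
    top_total = sum(count for _, count in Counter(triplet_counts).most_common(top))
    return {
        "percent_absent": 100.0 * (possible - observed_kinds) / possible,
        "percent_top": 100.0 * top_total / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy counts for illustration only, not data from the paper.
    toy = Counter({
        ("GAA", "GAA", "GAA"): 6,
        ("AAA", "AAA", "AAA"): 3,
        ("GAT", "GAT", "GAT"): 1,
    })
    print(triplet_distribution_summary(toy))
```
</preformat>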
<p>Since codon-triplet choice was an ORFeome-specific feature that could influence mRNA decoding efficiency (see above), the stronger bias found in
<italic>C. albicans </italic>
prompted the question of whether it could be linked to mRNA translation. To shed light on this question, codon usage bias and tRNA gene copy number, which provides a direct indication of tRNA expression level, were determined at an ORFeome scale for all ORFeomes analyzed (Table
<xref ref-type="table" rid="T4">4</xref>
). In
<italic>C. albicans</italic>
, the number of tRNA genes was the lowest (131), yet the total number of codons was the second largest (2 939 109) of the ORFeome set. Consequently, the relative tRNA abundance (given by gene copy number) per codon (or per amino acid) was lower in
<italic>C. albicans </italic>
than in the other fungi (Figure
<xref ref-type="fig" rid="F3">3</xref>
). For example, the tRNA
<sup>Asn </sup>
gene had 4 copies in
<italic>C. albicans </italic>
and between 4 and 10 copies in the other species, but the total number of Asn codons was highest in
<italic>C. albicans </italic>
(total Asn codons = 201 917; data not shown). To determine whether this relative decrease in tRNA gene copy number unbalanced tRNA abundance and codon usage, we calculated the relative synonymous codon usage (RSCU) values for the entire set of codons (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
, Table S1) and the relative tRNA isoacceptor usage (RIU) values. The latter is equivalent to RSCU values [
<xref ref-type="bibr" rid="B23">23</xref>
] but, since it was calculated from tRNA gene copy number, it reports on tRNA abundance bias. That is, different RIU values within a group of isoacceptors indicate differences in the cellular levels of these tRNAs. By restricting the quotient RSCU/RIU to pairs of cognate codons and cognate tRNAs, we calculated a Decoding Adaptation Quotient (DAQ = RSCU/RIU) by averaging the quotients obtained for each codon/tRNA pair (see Methods). DAQ values close to 1 indicated that codon usage and tRNA gene copy number (tRNA abundance) were well matched (high correlation), as was the case for
<italic>A. fumigatus</italic>
,
<italic>C. glabrata</italic>
,
<italic>E. gossypii</italic>
,
<italic>K. lactis</italic>
,
<italic>S. paradoxus </italic>
or
<italic>S. pombe </italic>
(Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S2). Organisms with DAQ >> 1 often used rare tRNAs to decode frequently used codons, as was the case for
<italic>D. hansenii</italic>
,
<italic>S. bayanus</italic>
,
<italic>S. cerevisiae </italic>
or
<italic>S. mikatae</italic>
. Interestingly,
<italic>C. albicans </italic>
yielded the lowest DAQ value (0.841) of all fungi studied (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S2), indicating that this fungal pathogen prefers codons that are decoded by near-cognate rather than cognate tRNAs. The divergence of decoding preferences between
<italic>C. albicans </italic>
and the other fungi can be clearly exemplified for Asn codons (AAC and AAU). All fungi analyzed decode both codons using a single tRNA (tRNA
<sub>GUU</sub>
<sup>Asn</sup>
) and usually prefer the cognate codon AAC; however,
<italic>C. albicans </italic>
had a strong preference for AAU codons (RSCU = 1.435) over AAC (RSCU = 0.565) (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
, Table S1). That is, its most frequently used Asn codon is decoded by a near-cognate tRNA (tRNA
<sup>Asn </sup>
<sub>GUU</sub>
). This near-cognate decoding preference was also observed for the codons corresponding to the amino acids His (
<bold>CAC</bold>
,
<underline>CAU</underline>
), Asp (
<bold>GAC</bold>
,
<underline>GAU</underline>
), Gly (
<bold>GGA</bold>
,
<bold>GGC</bold>
,
<bold>GGG</bold>
,
<underline>GGU</underline>
), Tyr (
<bold>UAC</bold>
,
<underline>UAU</underline>
), Cys (
<bold>UGC</bold>
,
<underline>UGU</underline>
) and Phe (
<bold>UUC</bold>
,
<underline>UUU</underline>
), where the preferred codons in
<italic>C. albicans </italic>
(underlined, above) do not have cognate tRNAs (codons with cognate tRNAs are indicated in bold).</p>
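<p>The RSCU, RIU and DAQ calculations described above can be illustrated with a short sketch. It is written under stated assumptions and is not the authors' implementation: the function names are hypothetical, RSCU and RIU are both taken as a count divided by the mean count of its synonymous group, and a cognate codon is assumed to be the one that pairs with the anticodon by Watson-Crick pairing at all three positions. The Asn codon counts are toy values chosen to reproduce the RSCU values quoted in the text (1.435 and 0.565), and the 4 gene copies of the Asn tRNA are taken from the text.</p>
<preformat>
```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def relative_usage(counts):
    """RSCU-style normalisation: each count divided by the mean of its group."""
    mean = sum(counts.values()) / len(counts)
    return {key: value / mean for key, value in counts.items()}


def cognate_codon(anticodon):
    """Codon paired by Watson-Crick rules at all three positions (assumption)."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def decoding_adaptation_quotient(codon_counts_by_aa, trna_copies_by_aa):
    """Average RSCU/RIU over cognate codon/tRNA pairs.

    codon_counts_by_aa: {amino acid: {codon: count}}
    trna_copies_by_aa:  {amino acid: {anticodon: gene copy number}}
    """
    quotients = []
    for amino_acid, trna_copies in trna_copies_by_aa.items():
        rscu = relative_usage(codon_counts_by_aa[amino_acid])
        riu = relative_usage(trna_copies)
        for anticodon, riu_value in riu.items():
            codon = cognate_codon(anticodon)
            if codon in rscu and riu_value > 0:
                quotients.append(rscu[codon] / riu_value)
    return sum(quotients) / len(quotients)


if __name__ == "__main__":
    # Toy Asn counts giving RSCU(AAU) = 1.435 and RSCU(AAC) = 0.565.
    codon_counts = {"Asn": {"AAC": 565, "AAU": 1435}}
    # Single Asn isoacceptor (anticodon GUU) with 4 gene copies, so RIU = 1.
    trna_copies = {"Asn": {"GUU": 4}}
    print(decoding_adaptation_quotient(codon_counts, trna_copies))
```
</preformat>
<p>In this toy case the only cognate pair is AAC with the GUU anticodon, so the quotient equals the RSCU of AAC. A preference for the near-cognate codon AAU therefore pulls the quotient below 1, which is the direction of the effect reported for <italic>C. albicans</italic>.</p>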
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>tRNA gene copy number
<italic>vs </italic>
total number of codons in fungi</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td></td>
<td align="center">tRNA gene copy number</td>
<td align="center">total codons</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<italic>A.fumigatus</italic>
</td>
<td align="center">163</td>
<td align="center">3842897</td>
</tr>
<tr>
<td align="left">
<bold>
<italic>C.albicans</italic>
</bold>
</td>
<td align="center">
<bold>131</bold>
</td>
<td align="center">
<bold>2939109</bold>
</td>
</tr>
<tr>
<td align="left">
<italic>C.glabrata</italic>
</td>
<td align="center">207</td>
<td align="center">2525088</td>
</tr>
<tr>
<td align="left">
<italic>D.hansenii</italic>
</td>
<td align="center">205</td>
<td align="center">2796378</td>
</tr>
<tr>
<td align="left">
<italic>E.gossypii</italic>
</td>
<td align="center">169</td>
<td align="center">2220107</td>
</tr>
<tr>
<td align="left">
<italic>K.lactis</italic>
</td>
<td align="center">162</td>
<td align="center">2397264</td>
</tr>
<tr>
<td align="left">
<italic>S.bayanus</italic>
</td>
<td align="center">274</td>
<td align="center">1811749</td>
</tr>
<tr>
<td align="left">
<italic>S.cerevisiae</italic>
</td>
<td align="center">273</td>
<td align="center">2804657</td>
</tr>
<tr>
<td align="left">
<italic>S.mikatae</italic>
</td>
<td align="center">251</td>
<td align="center">1698131</td>
</tr>
<tr>
<td align="left">
<italic>S.paradoxus</italic>
</td>
<td align="center">200</td>
<td align="center">2132879</td>
</tr>
<tr>
<td align="left">
<italic>S.pombe</italic>
</td>
<td align="center">156</td>
<td align="center">2062840</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Complete genome sequences of the 11 fungal species studied were used to extract tRNA sequences using tRNAscan-SE (see Methods). The total number of different tRNA gene sequences retrieved was compared with the total number of codons.
<italic>C. albicans </italic>
(bold) had the smallest set of tRNA genes but the second largest ORFeome.</p>
</table-wrap-foot>
</table-wrap>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>
<bold>Relative tRNA gene copy number is lower in
<italic>C. albicans </italic>
than in other fungi</bold>
. The tRNA gene copy number was determined for each tRNA isoacceptor using tRNAscan-SE [40], and the gene copy number of each group of isoacceptors was summed and divided by the number of times the respective amino acid was present in each ORFeome. In order to carry out comparisons between ORFeomes, data obtained for individual amino acids was averaged into a single column for each organism. Values are presented as tRNA gene copy number per 100 000 cognate amino acids.
<italic>A. fumigatus </italic>
and
<italic>C. albicans </italic>
have the lowest tRNA gene copy number (tRNA abundance) per aa, while
<italic>S. bayanus </italic>
and
<italic>S. mikatae </italic>
have the highest.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-3"></graphic>
</fig>
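<p>As a rough illustration only, the values in Table 4 can be turned into a whole-ORFeome ratio of tRNA genes per 100 000 codons. This is a simpler normalization than the per-amino-acid averaging used for Figure 3, and the function name below is hypothetical, but the sketch gives the same ordering of the extreme species.</p>
<preformat><![CDATA[
```python
# Values copied from Table 4: (tRNA gene copy number, total codons).
TABLE_4 = {
    "A. fumigatus": (163, 3842897),
    "C. albicans": (131, 2939109),
    "C. glabrata": (207, 2525088),
    "D. hansenii": (205, 2796378),
    "E. gossypii": (169, 2220107),
    "K. lactis": (162, 2397264),
    "S. bayanus": (274, 1811749),
    "S. cerevisiae": (273, 2804657),
    "S. mikatae": (251, 1698131),
    "S. paradoxus": (200, 2132879),
    "S. pombe": (156, 2062840),
}


def trna_genes_per_100k_codons(table):
    """Whole-ORFeome ratio; an assumed simplification of the Figure 3 measure."""
    return {sp: 1e5 * genes / codons for sp, (genes, codons) in table.items()}


ratios = trna_genes_per_100k_codons(TABLE_4)
ranked = sorted(ratios, key=ratios.get)  # lowest relative tRNA gene number first
for species in ranked:
    print(f"{species:15s} {ratios[species]:6.2f}")
```
]]></preformat>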
<p>The relatively low number of tRNA genes in
<italic>C. albicans </italic>
suggested that either
<italic>C. albicans </italic>
regulates expression of certain tRNAs through yet unknown cis-acting elements and uses novel polIII transcriptional activators (i.e., tRNA gene copy numbers are not indicative of tRNA availability), or its mRNA translation machinery works under tRNA limitation. In order to clarify these important points we have scanned the 5'-upstream sequences of the
<italic>C. albicans </italic>
tRNA genes and searched for conserved elements that could explain tRNA up-regulation by the polIII transcriptional machinery. However, we were unable to identify such putative conserved polIII enhancers (data not shown). Therefore, one is left with the intriguing possibility that tRNA limitation and generalized near-cognate decoding may be yet another unique feature of the
<italic>C. albicans </italic>
translational machinery. This may explain the strong bias of
<italic>C. albicans </italic>
codon-triplet usage since, unlike in other species,
<italic>C. albicans </italic>
maximizes the utilization of a small subset of tRNAs to decode strongly biased codons present in the triplets. This puzzling result requires experimental confirmation through
<italic>in vivo </italic>
tRNA quantification to clarify whether tRNA limitation is a feature of the
<italic>C. albicans </italic>
translational machinery, and, more importantly, whether such putative limited tRNA availability increases decoding error [
<xref ref-type="bibr" rid="B17">17</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
,
<xref ref-type="bibr" rid="B25">25</xref>
]. We have discovered recently that ambiguous decoding of the reassigned CUG codon (serine + leucine) generates phenotypic diversity in this human pathogen [
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
] and it will be most interesting to elucidate whether
<italic>C. albicans </italic>
uses generalized mistranslation as a strategy to expose hidden phenotypic diversity. Finally, we cannot exclude that biases of codon-triplets arise from protein primary structure constraints. Indeed, our study on a genetic code alteration in
<italic>C. albicans </italic>
supports this hypothesis (see below). However, tri-peptide biases would only be relevant for this study if they were significantly different in
<italic>C. albicans </italic>
and this was not observed. Rather, the main differences between
<italic>C. albicans </italic>
and the other species were related to a limited subset of contexts with repeated codons and amino acids in consecutive positions (see below).</p>
</sec>
<sec>
<title>Strings of repeated codons</title>
<p>The high frequency of repeated codons and amino acids in fungal ORFeomes and the high percentage of triplets of identical codons and amino acids in
<italic>C. albicans </italic>
(Figure
<xref ref-type="fig" rid="F4">4A, B</xref>
) prompted us to carry out a more detailed analysis of the codon composition of such repetitions. For this, the distributions of isolated codons, identical codon-pairs, identical codon-triplets and identical codon-strings were determined (Figure
<xref ref-type="fig" rid="F5">5</xref>
). Isolated codons were underrepresented in all ORFeomes, in particular in
<italic>C. albicans </italic>
(Figure
<xref ref-type="fig" rid="F5">5A</xref>
). However, this effect was minimized in pairs of identical codons, where observed and expected (random distribution) values were similar (Figure
<xref ref-type="fig" rid="F5">5B</xref>
). In
<italic>C. albicans</italic>
, the strong under-representation of isolated codons was not visible in identical codon-pairs (Figure
<xref ref-type="fig" rid="F5">5B</xref>
), and that bias was reversed and sharply increased for repetitions of identical codon-triplets and identical codon-strings (Figure
<xref ref-type="fig" rid="F5">5C, D</xref>
). Indeed, the distribution of the latter was remarkably different between
<italic>C. albicans </italic>
and the other ORFeomes (Figure
<xref ref-type="fig" rid="F5">5D</xref>
).</p>
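<p>The classification of codons into isolated codons, identical pairs, identical triplets and longer strings can be sketched as a run-length tally. This is a minimal illustration under our own assumptions (function names are hypothetical, and codons rather than runs are counted, following the wording of the Figure 5 caption); it is not the software used in the study.</p>
<preformat><![CDATA[
```python
from collections import Counter
from itertools import groupby


def split_codons(orf):
    """Cut a nucleotide string into in-frame codons, dropping any trailing bases."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def run_length_classes(orf):
    """Count how many codons sit in runs of length 1, 2, 3 or more than 3."""
    counts = Counter()
    for _, group in groupby(split_codons(orf)):
        n = sum(1 for _ in group)
        label = {1: "isolated", 2: "pair", 3: "triplet"}.get(n, "string")
        counts[label] += n
    return counts


# Toy sequence (made up): ATG, CAA x3, GAT x2, TTT, AAA x4
toy = "ATG" + "CAA" * 3 + "GAT" * 2 + "TTT" + "AAA" * 4
print(run_length_classes(toy))
```
]]></preformat>
<p>Observed proportions obtained in this way would still have to be compared with the proportions expected under random codon order.</p>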
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>
<bold>High repetition of identical codons and amino acids in
<italic>C. albicans</italic>
</bold>
. The percentage of ORFeomes composed of identical codon-triplets was determined. The percentage of these triplets (panel A) and of their respective amino acids (panel B) was much higher in
<italic>C. albicans </italic>
than in the other species, indicating a strong bias in the distribution of repeated codons in its ORFeome (red bars). Bars represent the observed percentages while blue dots indicate expected values.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-4"></graphic>
</fig>
<fig position="float" id="F5">
<label>Figure 5</label>
<caption>
<p>
<bold>Low frequency of isolated (non-repeated) codons in
<italic>C. albicans</italic>
</bold>
. Since codon repeats were very frequent in
<italic>C. albicans </italic>
and also in the other species, we separately computed the proportions of codons that appeared isolated or in identical codon-pairs, -triplets or longer strings.
<italic>C. albicans </italic>
had lower frequency of isolated codons (non-repeated identical codons) than the other species, although there was general repression of isolated codons in all species (panel A). This bias was reversed for repetitions of 2 or more identical codons, which again was exacerbated in
<italic>C. albicans </italic>
(red bar; panels B-D).</p>
</caption>
<graphic xlink:href="1471-2164-8-444-5"></graphic>
</fig>
<p>We then analyzed the amino acid composition of the repeated codon-triplets and again strong biases were observed (Figure
<xref ref-type="fig" rid="F6">6</xref>
). Most repetitions involved the amino acids Gln, Asp, Glu, Asn and Ser, confirming previous observations in
<italic>S. cerevisiae </italic>
and higher eukaryotes [
<xref ref-type="bibr" rid="B28">28</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
]. Repetitions of the amino acids Ala, Pro, His, Thr, Gly, Lys and Arg had an intermediate representation, while those of Val, Phe, Ile and Leu were often underrepresented, and in certain ORFeomes repetitions of Tyr, Cys, Met and Trp were absent or rarely used (Figure
<xref ref-type="fig" rid="F6">6</xref>
). Of all amino acids, Gln was the most frequently repeated and was also rarely present as an isolated amino acid across all ORFeomes (blue bars, first column of Figure
<xref ref-type="fig" rid="F6">6</xref>
). Once more,
<italic>C. albicans </italic>
showed stronger bias for amino acid repetitions since the number of Gln, Asp, Glu, Asn and Ser repetitions was higher than in the other ORFeomes, while repetitions of Pro, His and Thr, which were not frequent in the other genomes, were frequent in the
<italic>C. albicans </italic>
genome (Figure
<xref ref-type="fig" rid="F6">6</xref>
).</p>
<fig position="float" id="F6">
<label>Figure 6</label>
<caption>
<p>
<bold>Specificity of amino acid repeats</bold>
. The degeneracy of the genetic code prompted us to determine whether amino acid repeats would provide a better picture of the frequency of repeated features in the fungal ORFeomes. For this, the repeats were quantified and displayed as shown. In the diagram, and for each species, the first line in each column from the top corresponds to cases in which the amino acid appeared isolated in ORFs. The second line corresponds to isolated pairs of identical amino acids and so on, so that, for each column, lines further down correspond to longer amino acid strings. As expected, amino acid repeats were biased, as indicated by the color scale used in the map, where light blue corresponds to repressed repeats and the brown color indicates preferred repeats. Yellow represents repeats whose observed and expected frequencies were similar. Amino acid repeats were amino acid specific. For example, Gln, Asp, Glu, Asn or Ser (first group on the left) formed long strings more frequently than expected (brown), while strings of Phe, Ile, and Leu were repressed in all ORFeomes (blue bars).
<italic>C. albicans </italic>
had the longest strings of almost all amino acids, in particular of Asn, Pro, His and Thr (highlighted in grey, top of the diagram).</p>
</caption>
<graphic xlink:href="1471-2164-8-444-6"></graphic>
</fig>
<p>Finally, the distribution of the above repetitions was analyzed for synonymous codons of each amino acid (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figures S3A,B). Of the most frequent amino acid repetitions, Gln used CAA codons mainly in all but
<italic>A. fumigatus</italic>
. Strings of Asn often used AAC codons, but some ORFeomes preferred AAT codons (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3A). Of the 6 serine codons, AGT, TCA and TCT were the most commonly used, while Thr repetitions used ACA or ACT but rarely ACC or ACG codons. The few Lys repetitions observed used AAG codons almost exclusively and repressed AAA codons, an effect that may be linked to strong repression of homopolymeric strings, since repetitions of CCC, GGG and TTT codons were also strongly repressed (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3B).</p>
</sec>
<sec>
<title>Composition of codon-triplets that vanished from ORFeomes</title>
<p>The high proportion of triplets that vanished from fungal ORFeomes (Figure
<xref ref-type="fig" rid="F2">2A</xref>
) prompted us to investigate whether particular codon-context trends could be identified that would explain repression of particular codon combinations. For this, the codon composition of triplets absent in each ORFeome, at the 3 different codon positions (XXX-YYY-ZZZ), was determined. No significant differences could be detected between ORFeomes or codon positions, and the results obtained with codons starting with any base (N) were redundant (data not shown). To remove this redundancy and highlight only the major effects, the data were averaged over ORFeomes, triplet positions and first bases. A well-defined pattern of preferences and rejections linked to the second and third bases of codons in absent codon-triplets (Figure
<xref ref-type="fig" rid="F7">7</xref>
) became apparent. This indicated that the first base of the codon, and the position of the codon in the triplet, did not contribute to triplet disappearance. In contrast, the second and third bases mattered: codons ending with two adenosines (NAA) were poorly associated with absent triplets, while NCC, NCG and NGN codons were strongly associated with them (Figure
<xref ref-type="fig" rid="F7">7</xref>
).</p>
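<p>The tally of absent triplets by the second and third bases of their codons can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, observed triplets are supplied as a set of codon tuples, and dividing each class by its number of sense codons is our own choice of normalization, not necessarily the averaging used for Figure 7.</p>
<preformat><![CDATA[
```python
from collections import Counter
from itertools import product

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}
SENSE = [a + b + c for a, b, c in product(BASES, repeat=3) if a + b + c not in STOPS]

# Number of sense codons in each class defined by the 2nd and 3rd bases (N = any base).
CLASS_SIZE = Counter("N" + codon[1:] for codon in SENSE)


def absent_triplets(observed):
    """Return every sense codon-triplet that never occurs in `observed`."""
    return [t for t in product(SENSE, repeat=3) if t not in observed]


def ending_profile(triplets):
    """Tally the codons of the given triplets by class, pooling triplet positions."""
    tally = Counter()
    for triplet in triplets:
        for codon in triplet:
            tally["N" + codon[1:]] += 1
    return tally


def per_codon_profile(triplets):
    """Class tallies divided by class size (an assumed normalization)."""
    tally = ending_profile(triplets)
    return {cls: tally[cls] / size for cls, size in CLASS_SIZE.items()}
```
]]></preformat>
<p>With 61 sense codons there are 61 × 61 × 61 = 226 981 possible triplets, so the exhaustive enumeration remains inexpensive.</p>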
<fig position="float" id="F7">
<label>Figure 7</label>
<caption>
<p>
<bold>Bias of codon-triplets that vanished from ORFeomes</bold>
. The number of possible codon-triplet combinations that were not present in fungal ORFeomes was surprisingly high. In order to elucidate why these triplets disappeared from ORFeomes, the respective codons were further studied, namely by counting the number of times each codon appeared in the first, second or third position of the triplets. No significant differences were found between species and between codon-triplet positions. Also, codons differing only in their first base gave redundant results. Therefore, values for all ORFeomes, triplet positions, and also for all codons starting with A, C, G or T were averaged. NCC, NCG and NGN codons were the most frequent codons in codon-triplets that were absent in fungal ORFeomes (red bars in panel B). Conversely, NAA codons were underrepresented in this group (yellow bar).</p>
</caption>
<graphic xlink:href="1471-2164-8-444-7"></graphic>
</fig>
<p>As before, CTG reassignment in
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
prompted us to investigate whether the unusual decoding of CTG codons forced the disappearance of codon-triplets. For this, absent triplets that contained CTN codons, i.e. CTA, CTC, CTG or CTT were studied (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4). Each species had its preference pattern. The CTA codon was absent mainly in triplets of
<italic>A. fumigatus</italic>
, CTC in
<italic>C. glabrata </italic>
and
<italic>Saccharomyces </italic>
sp. and CTT in
<italic>E. gossypii</italic>
. Significantly, CTG codons were the most frequent CTN codons in codon-triplets that vanished from
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
ORFeomes (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4A). Moreover, in
<italic>D. hansenii </italic>
the number of codon-triplets that vanished only from that ORFeome and lacked CTGs was two-fold higher than in the other fungi, including
<italic>C. albicans </italic>
(Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4B).</p>
</sec>
<sec>
<title>Genetic code alteration signature</title>
<p>In
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
leucine CTG codons are decoded as serine [
<xref ref-type="bibr" rid="B20">20</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. This genetic code alteration appeared approximately 272 ± 25 million years ago in the yeast ancestor and reprogrammed more than 30,000 CTGs present in its genes [
<xref ref-type="bibr" rid="B30">30</xref>
]. Such a dramatic genetic event imposed negative pressure on CTG usage and eliminated most of these codons. Interestingly, a high number of "old" leucine-CTGs were replaced by "new" serine-CTGs that evolved from mutation of serine rather than leucine codons [
<xref ref-type="bibr" rid="B30">30</xref>
]. In other words, the CTGs present in the ORFeomes of
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
are new serine codons that appeared during the last 272 ± 25 million years. Since serine codons are often present in codon repetitions while leucine codons are strongly repressed (Figure
<xref ref-type="fig" rid="F6">6</xref>
), we have taken advantage of this genetic code alteration to shed new light on the evolutionary dynamics of codon (amino acid) repetitions in yeasts. Furthermore, since leucine is hydrophobic and serine polar, we hypothesized that constraints imposed by protein structure would be visible as alterations in the context of CTG containing triplets.</p>
<p>Contexts of the NNN-Leu-NNN and NNN-Ser-NNN types were identified in the ORFeome set and the values were displayed in such a way that upstream and downstream rejected and preferred codon neighbors could be highlighted (Figure
<xref ref-type="fig" rid="F8">8</xref>
). This was carried out by determining codon neighbor combinations (upstream and downstream) that were preferred in leucine- or serine-bearing triplets (leucine and serine neighbor signatures) and computing the number of times each signature appeared above the expected threshold, when the middle codon of the triplet was CTG (Figure
<xref ref-type="fig" rid="F8">8A</xref>
). As expected, leucine and serine had clear neighborhood preferences, but this context signature was lost for CTGs in
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
(blue boxes, Figure
<xref ref-type="fig" rid="F8">8A</xref>
), which decode leucine-CTG codons as serine. In these species, CTGs had a signature that was not observed for leucine-CTA or serine-TCA codons, used as external controls (Figure
<xref ref-type="fig" rid="F8">8B, 8C</xref>
).</p>
<fig position="float" id="F8">
<label>Figure 8</label>
<caption>
<p>
<bold>Amino acid context signatures detect genetic code alterations</bold>
. In order to determine whether genetic code alterations could give rise to a specific triplet signature, the frequencies of amino acid contexts having leucine or serine in the middle position (e.g. LYS-
<bold>LEU</bold>
-ASP/LYS-
<bold>SER</bold>
-ASP) were subtracted. Whenever this difference was higher than 0.0005 or lower than -0.0005 the respective context was considered biased towards leucine or serine, respectively. These biased neighborhoods were checked for Leu/Ser-CTG, Leu-CTA and Ser-TCA codons. The expected values were calculated for all the contexts and subtracted from the observed values. In order to normalize the bias with the total pool size for each codon-context each difference was divided by the expected value [(Obs-Exp)/Exp]. The sum of the quotients of all leucine-preferred (yellow bars) and serine-preferred (red bars) neighborhoods for each ORFeome showed the global effect. As expected, leucine CTG codons (panel-A) were more frequent in leucine-preferred contexts (yellow bars) than in serine-preferred ones (red bars). However, this signature was broken in
<italic>C. albicans </italic>
and
<italic>D. hansenii</italic>
(where CTGs are decoded as serine and not leucine), since CTGs were associated with serine- rather than leucine-preferred neighbors. This trend was not detected in any other leucine or serine codon (e.g. Leu-CTA and Ser-TCA, panels B and C, respectively), indicating that genetic code alterations can be detected through codon-triplet context analysis.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-8"></graphic>
</fig>
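<p>The two steps described in the Figure 8 caption, classifying neighborhoods with the ±0.0005 cutoff and then summing (Obs-Exp)/Exp over each class, can be sketched as below. This is a simplified reading of the caption rather than the code used in the study; the function names and the dictionary layout (keys are upstream/downstream amino acid pairs) are assumptions.</p>
<preformat><![CDATA[
```python
THRESHOLD = 0.0005  # cutoff stated in the Figure 8 caption


def classify_neighborhoods(leu_freq, ser_freq, threshold=THRESHOLD):
    """Split (upstream, downstream) neighborhoods into Leu- and Ser-preferred sets.

    `leu_freq` and `ser_freq` map a neighborhood to the frequency of the
    amino acid context with Leu or Ser, respectively, in the middle position.
    """
    leu_pref, ser_pref = set(), set()
    for nb in set(leu_freq) | set(ser_freq):
        diff = leu_freq.get(nb, 0.0) - ser_freq.get(nb, 0.0)
        if diff > threshold:
            leu_pref.add(nb)
        elif diff < -threshold:
            ser_pref.add(nb)
    return leu_pref, ser_pref


def signature(observed, expected, neighborhoods):
    """Sum (Obs - Exp) / Exp over the given neighborhoods for one middle codon."""
    return sum(
        (observed.get(nb, 0.0) - expected[nb]) / expected[nb]
        for nb in neighborhoods
        if expected.get(nb, 0.0) > 0
    )
```
]]></preformat>
<p>For a codon such as CTG, a larger sum over the leucine-preferred set than over the serine-preferred set would correspond to the standard leucine signature, and the reverse to the pattern reported for the two CTG-reassigned species.</p>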
</sec>
</sec>
<sec>
<title>Discussion</title>
<sec>
<title>Context of codon-triplets</title>
<p>The close phylogenetic relationship of the fungi used in this study supports the hypothesis that, like codon-pair contexts, codon-triplet contexts are species specific. If codon-triplet contexts fine tune ribosome decoding efficiency, as we have hypothesized, then it is likely that the translation machinery of these fungal species imposes different pressure on codon-triplet context. This is in line with the finding that overexpression of genes in heterologous hosts is sometimes remarkably difficult to achieve due to differences in codon usage and other yet poorly understood translational constraints [
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
]. Moreover, the fidelity of heterologous protein synthesis is often affected by codon-pair context [
<xref ref-type="bibr" rid="B31">31</xref>
], which is also species specific [
<xref ref-type="bibr" rid="B33">33</xref>
], and, since the ribosome has 3 tRNA binding sites, one would expect that codon-triplet context specificity is indeed required to fine tune mRNA decoding efficiency.</p>
<p>Despite the species specificity found, common trends of codon-triplet context were observed in fungal ORFeomes. For example, CC- and CG-ending codons and codons containing guanosine in the middle position, i.e. NGN (Figure
<xref ref-type="fig" rid="F7">7</xref>
, red bars) were repressed in codon-triplets. Conversely, codons ending with two adenosines, i.e. NAA, were rare in codon-triplets that vanished from these fungal ORFeomes (Figure
<xref ref-type="fig" rid="F7">7</xref>
, yellow bar). The position of each codon in the triplet and the nature of the nucleotide at the first codon position, which strongly influenced codon-pair context [
<xref ref-type="bibr" rid="B4">4</xref>
] were not relevant in vanished codon-triplet contexts. Conversely, the last and middle nucleotides of each codon influenced those codon-triplet contexts (Figure
<xref ref-type="fig" rid="F7">7</xref>
). Variation in the last position of codons produced most codon usage biases because nucleotide changes at this position are often silent [
<xref ref-type="bibr" rid="B34">34</xref>
], but changes in the middle position of codons frequently result in amino acid changes in protein sequences [
<xref ref-type="bibr" rid="B35">35</xref>
]. Therefore, the apparent role of the third codon position on codon-triplet context may be linked to codon usage bias, while the role of the middle nucleotide of codons may be related to protein structure constraints. When the latter constraint was removed from our data set by considering synonymous codons only (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4), G + C pressure (
<italic>A. fumigatus </italic>
and
<italic>K. lactis</italic>
) and the
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
genetic code alteration appeared as important modulators of codon-triplet context. Indeed, G + C pressure seems to be the reason for the enrichment of vanished codon triplets in codons ending with A or T in
<italic>A. fumigatus </italic>
and in G or C in
<italic>K. lactis </italic>
(Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4A), a result that is in line with the global GC% of both ORFeomes (GC% = 54.01 and GC% = 40.10, respectively). In
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
repression of triplets containing CTG codons was clearly visible, indicating that the genetic code alteration increases discrimination of CTG-associated contexts. Codon-triplets that were absent in
<italic>D. hansenii </italic>
only (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S4B) contained twice the number of CTG-bearing codon-triplets, suggesting that non-standard decoding of CTG as serine had stronger impact in
<italic>D. hansenii </italic>
than in
<italic>C. albicans </italic>
ORFeomes.</p>
</sec>
<sec>
<title>Strings of repeated codons</title>
<p>Apart from codon-triplet contexts, our software tools detected strings of repeated codons (Figures
<xref ref-type="fig" rid="F4">4</xref>
,
<xref ref-type="fig" rid="F5">5</xref>
). Tandem codon repeats are frequent in eukaryotic protein coding DNA [
<xref ref-type="bibr" rid="B29">29</xref>
] and result from slippage of the DNA polymerase δ during genome replication [
<xref ref-type="bibr" rid="B36">36</xref>
]. Such repetitions are also present in non-coding DNA in the form of trinucleotide repeats and, therefore, are unrelated to codon decoding by the ribosome [
<xref ref-type="bibr" rid="B28">28</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
]. Indeed, they are much more frequent in non-coding sequences, which contain a greater variety of tandem repeats (especially of 1–6 bp) [
<xref ref-type="bibr" rid="B37">37</xref>
]. Despite this, one was prompted to ask whether tandem codon repeats could have a negative impact on decoding fidelity. For example, could ribosome frameshifting and drop off increase at codon strings due to depletion of tRNAs during decoding of repeated codons? If so, translation of the very high number of codon-strings in the
<italic>C. albicans </italic>
genome (Figure
<xref ref-type="fig" rid="F5">5</xref>
) would be problematic. This hypothesis was supported by the low relative abundance of tRNAs necessary to decode such repeated codon-strings (Figure
<xref ref-type="fig" rid="F3">3</xref>
). Indeed, of the 10 most preferred codon-triplets in
<italic>C. albicans</italic>
, 7 corresponded to contexts of repeated codons (Table
<xref ref-type="table" rid="T2">2</xref>
) whose decoding involves low abundance tRNAs (either cognate or near-cognate), as, for example, the above mentioned Asn codons, AAC and AAT.</p>
<p>The amino acids involved in formation of codon-strings in
<italic>C. albicans </italic>
and other organisms were identical, namely Gln, Asp, Glu, Asn and Ser [
<xref ref-type="bibr" rid="B28">28</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
] (Figure
<xref ref-type="fig" rid="F6">6</xref>
). However, in
<italic>C. albicans </italic>
Pro, His and Thr also formed repetitions that were not observed in other organisms. Also, there was codon discrimination within amino acid repetitions (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3A,B). For example, in almost all ORFeomes studied Gln-CAA was more frequent than expected while its synonymous Gln-CAG was repressed (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3A). In
<italic>C. albicans</italic>
, Thr-ACA and Thr-ACT codons were frequently used in Thr-strings, while the Thr-ACC and Thr-ACG codons were not (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3A). This preference for certain codons within amino acid runs suggests bias in DNA polymerase δ slippage or, alternatively, that identical codon-repetitions produced during genome replication were later
<italic>polished </italic>
at the 3
<sup>rd </sup>
codon position by positive pressure arising from the translation process.</p>
<p>Finally, in all ORFeomes studied, acidic amino acids were present more often than basic amino acids in amino acid runs, the hydrophobic amino acids Phe, Ile or Leu did not form repetitions (Figure
<xref ref-type="fig" rid="F6">6</xref>
) and runs of amino acids formed by homopolymeric codon strings, i.e. AAA, CCC, GGG or TTT (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
, Figure S3B) were also strongly repressed, as already observed in other eukaryotic genomes [
<xref ref-type="bibr" rid="B29">29</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
]. Since the latter corresponded to frameshifting-prone contexts [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B38">38</xref>
] it is likely that their repression is related to translation fidelity. For example, the A AAA AAG motif found in
<italic>dnaX </italic>
and in many insertion sequences of the IS3 family has been considered the most efficient heptameric -1 shift motif in
<italic>E. coli </italic>
[
<xref ref-type="bibr" rid="B10">10</xref>
].</p>
</sec>
<sec>
<title>Leucine vs serine context signatures</title>
<p>
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
non-standard serine-CTG and standard leucine-CTG codons of the other fungal ORFeomes had divergent triplet-context preferences (Figure
<xref ref-type="fig" rid="F8">8</xref>
). The Ser-CTG codons of
<italic>C. albicans </italic>
and
<italic>D. hansenii </italic>
had codon neighbors typical of serines rather than leucines, indicating that these residues have clear neighbor preferences (upstream and downstream) and that the alteration of identity of the CTG codon from leucine to serine re-shaped the context of the codon-triplets containing CTGs. This implies that sense-to-sense genetic code alterations are accompanied by alteration in the context (upstream and downstream) of the codons that change identity to maintain amino acid triplet signatures (amino acid context). This may minimize the negative impact of genetic code alterations on protein structure and indicates that triplet amino acid context signatures are efficient tools to predict genetic code alterations.</p>
</sec>
</sec>
<sec>
<title>Conclusion</title>
<p>Our methodology to study codon-triplet context permitted carrying out large scale comparative analyses of triplets in 11 fungal genomes. Like codon-pair context, codon-triplet context is biased and such bias is maximal in the main human pathogen
<italic>C. albicans</italic>
. The data unveiled the nature and extent of codon repetitions in fungal ORFeomes and identified important differences in codon repetitions between fungal species.
<italic>C. albicans </italic>
showed the highest frequency and the longest codon repetitions and used codons in some repetitions that were not found in other fungi. Interestingly, codon-triplet contexts had specific signatures that were not observed for the CUG codon, which was reassigned from leucine to serine in
<italic>C. albicans </italic>
and
<italic>D. hansenii</italic>
. Such signatures highlight genetic code alterations in newly sequenced genomes.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Three-codon contexts</title>
<p>ORFeome sequences were retrieved from NCBI Genbank, the Broad Institute and from the
<italic>Candida </italic>
Genome database (Table
<xref ref-type="table" rid="T1">1</xref>
). ORF sequences that did not start with the ATG codon, did not end with one of the 3 in-frame stop codons (TAA, TGA, TAG), or had internal stop codons or undefined bases (N) were discarded from the dataset. For data processing, we developed an algorithm that fixes the frame at the initiation codon (ATG) and reads the first 3 in-frame codons (reading window). It then moves the reading window one codon at a time in the 3' direction and records all triplets until it encounters a stop codon. The algorithm reads entire ORFeomes but discards the first and last 3 codons, which have specific contexts necessary for efficient translation initiation and termination (e.g. [
<xref ref-type="bibr" rid="B39">39</xref>
]). The results were stored in a three-dimensional array of dimension 61 × 61 × 61, represented by cod(i, j, k), where i, j and k were the codons at the first, second and third positions, respectively (Figure
<xref ref-type="fig" rid="F1">1</xref>
). The values stored in the array corresponded to the number of times that a particular triplet appeared in one ORFeome. Similarly, an array of dimension 20 × 20 × 20 was built for amino acid triplets.</p>
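<p>The ORF filtering and sliding-window counting described above can be sketched as follows. This is a minimal illustration and not the original implementation: function names are hypothetical, a dictionary-based counter replaces the 61 × 61 × 61 array, any base other than A, C, G or T is rejected (stricter than excluding N alone), and "the first and last 3 codons" is read as the first three codons plus the last three including the stop codon.</p>
<preformat><![CDATA[
```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def valid_codons(orf):
    """Return the codons of an ORF, or None if it fails the filters."""
    orf = orf.upper()
    if len(orf) % 3 or set(orf) - set("ACGT"):
        return None  # out-of-frame length or undefined bases
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if len(codons) < 2 or codons[0] != "ATG" or codons[-1] not in STOPS:
        return None  # missing start or terminal stop codon
    if any(codon in STOPS for codon in codons[:-1]):
        return None  # internal stop codon
    return codons


def count_triplets(orfs, skip=3):
    """Count codon-triplets with a window that moves one codon at a time."""
    counts = Counter()
    for orf in orfs:
        codons = valid_codons(orf)
        if codons is None:
            continue
        body = codons[skip:len(codons) - skip]  # drop initiation/termination contexts
        for i in range(len(body) - 2):
            counts[tuple(body[i:i + 3])] += 1
    return counts
```
]]></preformat>
<p>The same window applied to translated sequences would give the amino acid triplet counts.</p>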
<p>Since tandem codon repetitions were prominent in all genomes and introduced noise in the codon-triplet analysis, repetitions of more than 3 consecutive codons were excluded from the analysis during a second round of data processing. For this, the algorithm was modified as illustrated in Figure
<xref ref-type="fig" rid="F9">9</xref>
. The above methodology was also used for amino acid triplet counting. However, the ignored triplets present in strings were counted separately to evaluate the composition and length of each amino acid (codon) string. The results obtained for each ORFeome were stored as an array of
<italic>m </italic>
× 61, where
<italic>m </italic>
represents the maximum string length found in that ORFeome and the stored values correspond to the number of times each codon or amino acid appeared in sequences from 1 to
<italic>m</italic>
. The data arrays built by the algorithms described above were stored in a database to facilitate subsequent data analysis, which was performed using the Weka-3 package for data mining [
<xref ref-type="bibr" rid="B19">19</xref>
] and direct queries to the database (Figure
<xref ref-type="fig" rid="F1">1</xref>
).</p>
<fig position="float" id="F9">
<label>Figure 9</label>
<caption>
<p>
<bold>Methodology to remove regions of tandem codon repetition</bold>
. Repetitions of more than 3 consecutive codons were excluded from the analysis by analyzing four consecutive codons at each step. At each iteration, the presence of identical codons forming 4 consecutive triplets was verified and when such triplets were found the algorithm proceeded reading without counting the triplets until a different codon appeared in the ORF sequence.</p>
</caption>
<graphic xlink:href="1471-2164-8-444-9"></graphic>
</fig>
<p>Final results are shown together with expected non-biased results. To calculate the latter, we used the frequency of the respective codons or amino acids in the total ORFeome, which corresponds to the probability of their random appearance in each ORFeome. So, the expected frequency for any 3-codon context, codon1-codon2-codon3, would be the product of the frequencies of the individual codons, F(codon1)*F(codon2)*F(codon3).</p>
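<p>The expected value defined above is a product of three single-codon frequencies. The sketch below illustrates that arithmetic; the function names are hypothetical and the input is assumed to be a flat list of all codons in one ORFeome.</p>
<preformat>
```python
from collections import Counter


def codon_frequencies(codons):
    """Relative frequency of each codon in the whole ORFeome."""
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def expected_triplet_frequency(freq, triplet):
    """Expected frequency of codon1-codon2-codon3 if codons were independent."""
    codon1, codon2, codon3 = triplet
    # F(codon1) * F(codon2) * F(codon3); codons never observed contribute 0.
    return freq.get(codon1, 0.0) * freq.get(codon2, 0.0) * freq.get(codon3, 0.0)
```
</preformat>
<p>For a toy sequence with F(AAA) = 0.5 and F(GCT) = F(GAA) = 0.25, the expected frequency of AAA-GCT-GAA is 0.5 × 0.25 × 0.25 = 0.03125.</p>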
</sec>
<sec>
<title>tRNA genes</title>
<p>tRNA genes were identified using the tRNAscan-SE software package for tRNA identification and gene copy number quantification [
<xref ref-type="bibr" rid="B40">40</xref>
]. This freely available software was run as a standalone program, which we modified slightly to scan several genomes automatically. The gene copy number of each tRNA isoacceptor was calculated and compared to the total number of cognate codons present in coding sequences for each species. This provided a relative measure of tRNA availability in each organism. For this, the relative synonymous codon usage (RSCU) values were calculated for all codons, according to Sharp and Li [
<xref ref-type="bibr" rid="B23">23</xref>
]. Briefly, the RSCU of a codon (X) is the number of times it appears in a sequence (observed usage) divided by the expected usage, assuming equal usage of the synonymous codons for the corresponding amino acid (C). Therefore, RSCU values for a group of synonymous codons are all close to 1 if there is no codon usage bias, and diverge from 1 when there is codon usage bias. Since there is a strong relationship between codon usage and tRNA abundance in bacteria and in
<italic>S. cerevisiae </italic>
[
<xref ref-type="bibr" rid="B2">2</xref>
], we used a new index to determine whether this relationship was maintained in the fungal genomes under study. For this, we calculated the "relative isoacceptor usage" (RIU) values for all tRNAs, as follows:</p>
<p>
<disp-formula>
<mml:math id="M1" name="1471-2164-8-444-i1" overflow="scroll">
<mml:semantics>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>RIU</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext>i</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>j</mml:mtext>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mtext>X</mml:mtext>
<mml:mrow>
<mml:mtext>i</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>j</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msub>
<mml:mtext>n</mml:mtext>
<mml:mtext>i</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mtext>j</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mtext>n</mml:mtext>
<mml:mtext>i</mml:mtext>
</mml:msub>
</mml:mrow>
</mml:munderover>
<mml:mrow>
<mml:msub>
<mml:mtext>X</mml:mtext>
<mml:mrow>
<mml:mtext>i</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext>j</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:semantics>
</mml:math>
</disp-formula>
</p>
<p>where
<italic>X</italic>
<sub>
<italic>ij </italic>
</sub>
is the tRNA gene copy number for the
<italic>jth </italic>
anticodon for the
<italic>ith </italic>
amino acid, and
<italic>n</italic>
<sub>
<italic>i </italic>
</sub>
is the number of isoacceptors for the same amino acid. We assumed that tRNA abundance is directly proportional to tRNA gene copy number, as is the case in
<italic>S. cerevisiae </italic>
and other eukaryotes [
<xref ref-type="bibr" rid="B41">41</xref>
]. As for RSCUs, RIU values for each group of isoacceptors are similar when tRNA gene copy number is not biased, and differ when the gene copy number of the individual isoacceptors is biased.</p>
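<p>The RIU definition above divides each isoacceptor's gene copy number by the mean copy number over all isoacceptors of the same amino acid. A minimal sketch of that calculation follows; the function name and the nested-dictionary input format are assumptions made for illustration, and the copy numbers in the usage note are invented.</p>
<preformat>
```python
def relative_isoacceptor_usage(gene_copies):
    """Compute RIU values.

    gene_copies maps an amino acid to {anticodon: tRNA gene copy number}.
    Returns the same nested structure with RIU values in place of counts.
    """
    riu = {}
    for amino_acid, isoacceptors in gene_copies.items():
        total = sum(isoacceptors.values())
        if not isoacceptors or total == 0:
            continue  # RIU is undefined without any tRNA gene for this amino acid
        # Denominator of the formula: (1 / n_i) * sum over j of X_ij
        mean_copies = total / len(isoacceptors)
        riu[amino_acid] = {
            anticodon: copies / mean_copies
            for anticodon, copies in isoacceptors.items()
        }
    return riu
```
</preformat>
<p>For example, with two hypothetical glycine isoacceptors present in 6 and 2 gene copies, the mean is 4 and the RIU values are 1.5 and 0.5.</p>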
<p>Finally, a decoding adaptation quotient (DAQ), which quantifies the adaptation between codon usage and tRNA abundance, was calculated by dividing the RSCU by the RIU value of each cognate codon/tRNA pair (DAQ = RSCU/RIU). A DAQ of 1 indicates a perfect match between tRNA gene copy number and codon usage. A DAQ &gt; 1 indicates that highly used codons (high RSCU) are decoded by tRNAs whose gene copy number is low (low abundance; low RIU), and a DAQ &lt; 1 indicates that codons that are used less frequently (low RSCU) are decoded by abundant tRNAs (high gene copy number; high RIU).</p>
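<p>The two ratios can be combined as sketched below. This is an illustration only: the function names are hypothetical, the pairing of each codon with its cognate tRNA is left to the caller, and RSCU is computed as the observed count divided by the mean count over the synonymous codons of one amino acid.</p>
<preformat>
```python
def relative_synonymous_codon_usage(codon_counts, synonymous_codons):
    """RSCU for the synonymous codons of one amino acid.

    codon_counts maps codon to observed count; synonymous_codons lists the
    codons that encode the amino acid of interest.
    """
    total = sum(codon_counts.get(codon, 0) for codon in synonymous_codons)
    if total == 0:
        return {}  # amino acid not observed, RSCU undefined
    mean_count = total / len(synonymous_codons)
    return {
        codon: codon_counts.get(codon, 0) / mean_count
        for codon in synonymous_codons
    }


def decoding_adaptation_quotient(rscu, riu):
    """DAQ = RSCU / RIU for one cognate codon/tRNA pair."""
    if riu == 0:
        raise ValueError("RIU is zero: no tRNA gene for this isoacceptor")
    return rscu / riu
```
</preformat>
<p>With invented counts of 30 and 10 for two synonymous codons, the RSCU values are 1.5 and 0.5; a codon with RSCU 1.5 decoded by a tRNA with RIU 0.5 has a DAQ of 3, meaning a frequently used codon is served by a comparatively rare tRNA.</p>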
</sec>
</sec>
<sec>
<title>Authors' contributions</title>
<p>GM participated in the design of the study, performed the statistical analysis and drafted the manuscript; JPL created the software for codon triplet and codon repeats quantification; MP performed the tRNAscan-SE analysis; LC and RMS participated in the discussion of the results; JLO participated in the design of the study and coordinated the work of JPL and MP; and MASS conceived the study, and participated in its design and coordination. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional File 1</title>
<p>Supplementary figures. Supplementary figures representing: S1) histograms for codon-triplet distributions; S2) tRNA and codon usage imbalance in
<italic>C. albicans</italic>
; S3A,B) codon repeat composition; and S4) the frequency of CTN codons in codon triplets that vanished from each ORFeome.</p>
</caption>
<media xlink:href="1471-2164-8-444-S1.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional File 2</title>
<p>Supplementary table. The data provided represent the RSCU values as calculated for the
<italic>C. albicans</italic>
ORFeome.</p>
</caption>
<media xlink:href="1471-2164-8-444-S2.pdf" mimetype="application" mime-subtype="pdf">
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>This study was supported by FEDER/FCT projects POCTI/BME/39030; SAU-MMO/55476; BIA-PRO/55472; BIA-MIC/55466 and Human Frontier Science Program project RGP45/2005. We are thankful to IEETA and the II-UA (project CTS-12) for supporting the development of the ANACONDA software. GM was supported by FCT grant SFRH/BPD/7195/2001, MP by HFSP and MASS was supported by an EMBO YIP Award.</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dong</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nilsson</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Kurland</surname>
<given-names>CG</given-names>
</name>
</person-group>
<article-title>Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates</article-title>
<source>J Mol Biol</source>
<year>1996</year>
<volume>260</volume>
<fpage>649</fpage>
<lpage>663</lpage>
<pub-id pub-id-type="pmid">8709146</pub-id>
<pub-id pub-id-type="doi">10.1006/jmbi.1996.0428</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname>
<given-names>X</given-names>
</name>
</person-group>
<article-title>How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae?</article-title>
<source>Genetics</source>
<year>1998</year>
<volume>149</volume>
<fpage>37</fpage>
<lpage>44</lpage>
<pub-id pub-id-type="pmid">9584084</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanaya</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kinouchi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kudo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ikemura</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis</article-title>
<source>J Mol Evol</source>
<year>2001</year>
<volume>53</volume>
<fpage>290</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="pmid">11675589</pub-id>
<pub-id pub-id-type="doi">10.1007/s002390010219</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pinheiro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miranda</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Afreixo</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Dias</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Freitas</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Comparative context analysis of codon pairs on an ORFeome scale</article-title>
<source>Genome Biol</source>
<year>2005</year>
<volume>6</volume>
<fpage>R28</fpage>
<pub-id pub-id-type="pmid">15774029</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2005-6-3-r28</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boycheva</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chkodrov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>Codon pairs in the genome of Escherichia coli</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>987</fpage>
<lpage>998</lpage>
<pub-id pub-id-type="pmid">12761062</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btg082</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murgola</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Pagel</surname>
<given-names>FT</given-names>
</name>
<name>
<surname>Hijazi</surname>
<given-names>KA</given-names>
</name>
</person-group>
<article-title>Codon context effects in missense suppression</article-title>
<source>J Mol Biol</source>
<year>1984</year>
<volume>175</volume>
<fpage>19</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="pmid">6374155</pub-id>
<pub-id pub-id-type="doi">10.1016/0022-2836(84)90442-X</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tork</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hatin</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Rousset</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Fabret</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>The major 5' determinant in stop codon read-through involves two adjacent adenines</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>415</fpage>
<lpage>421</lpage>
<pub-id pub-id-type="pmid">14736996</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkh201</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Giddings</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Gesteland</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Atkins</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>IP</given-names>
</name>
</person-group>
<article-title>Computational identification of putative programmed translational frameshift sites</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<fpage>1046</fpage>
<lpage>1053</lpage>
<pub-id pub-id-type="pmid">12176827</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/18.8.1046</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Heck</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Hatfield</surname>
<given-names>GW</given-names>
</name>
</person-group>
<article-title>Codon pair utilization biases influence translational elongation step times</article-title>
<source>J Biol Chem</source>
<year>1995</year>
<volume>270</volume>
<fpage>22801</fpage>
<lpage>22806</lpage>
<pub-id pub-id-type="pmid">7559409</pub-id>
<pub-id pub-id-type="doi">10.1074/jbc.270.39.22801</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bertrand</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Prere</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Gesteland</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Atkins</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Fayet</surname>
<given-names>O</given-names>
</name>
</person-group>
<article-title>Influence of the stacking potential of the base 3' of tandem shift codons on -1 ribosomal frameshifting used for gene expression</article-title>
<source>RNA</source>
<year>2002</year>
<volume>8</volume>
<fpage>16</fpage>
<lpage>28</lpage>
<pub-id pub-id-type="pmid">11871658</pub-id>
<pub-id pub-id-type="doi">10.1017/S1355838202012086</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rheinberger</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Sternbach</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nierhaus</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>Three tRNA binding sites on Escherichia coli ribosomes</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1981</year>
<volume>78</volume>
<fpage>5310</fpage>
<lpage>5314</lpage>
<pub-id pub-id-type="pmid">7029532</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.78.9.5310</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wettstein</surname>
<given-names>FO</given-names>
</name>
<name>
<surname>Noll</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Binding of transfer ribonucleic acid to ribosomes engaged in protein synthesis: number and properties of ribosomal binding sites</article-title>
<source>J Mol Biol</source>
<year>1965</year>
<volume>11</volume>
<fpage>35</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="pmid">14255759</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nierhaus</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>The allosteric three-site model for the ribosomal elongation cycle: features and future</article-title>
<source>Biochemistry</source>
<year>1990</year>
<volume>29</volume>
<fpage>4997</fpage>
<lpage>5008</lpage>
<pub-id pub-id-type="pmid">2198935</pub-id>
<pub-id pub-id-type="doi">10.1021/bi00473a001</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nierhaus</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>Decoding errors and the involvement of the E-site</article-title>
<source>Biochimie</source>
<year>2006</year>
<volume>88</volume>
<fpage>1013</fpage>
<lpage>1019</lpage>
<pub-id pub-id-type="pmid">16644089</pub-id>
<pub-id pub-id-type="doi">10.1016/j.biochi.2006.02.009</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilson</surname>
<given-names>DN</given-names>
</name>
<name>
<surname>Nierhaus</surname>
<given-names>KH</given-names>
</name>
</person-group>
<article-title>The E-site story: the importance of maintaining two tRNAs on the ribosome during protein synthesis</article-title>
<source>Cell Mol Life Sci</source>
<year>2006</year>
<volume>63</volume>
<fpage>2725</fpage>
<lpage>2737</lpage>
<pub-id pub-id-type="pmid">17013564</pub-id>
<pub-id pub-id-type="doi">10.1007/s00018-006-6125-4</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Korostelev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Trakhanov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Laurberg</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Noller</surname>
<given-names>HF</given-names>
</name>
</person-group>
<article-title>Crystal structure of a 70S ribosome-tRNA complex reveals functional interactions and rearrangements</article-title>
<source>Cell</source>
<year>2006</year>
<volume>126</volume>
<fpage>1065</fpage>
<lpage>1077</lpage>
<pub-id pub-id-type="pmid">16962654</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2006.08.032</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Grosjean</surname>
<given-names>H</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
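<name>
<surname>Kirkwood</surname>
<given-names>TBL</given-names>
</name>
<name>
<surname>Rosenberger</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Galas</surname>
<given-names>DJ</given-names>
</name>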
</person-group>
<article-title>The accuracy of mRNA-tRNA recognition</article-title>
<source>Accuracy in Molecular Processes: Its Control and Relevance to Living Systems</source>
<year>1986</year>
<publisher-name>London: Chapman and Hall</publisher-name>
<fpage>83</fpage>
<lpage>126</lpage>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tate</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Poole</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Mannering</surname>
<given-names>SA</given-names>
</name>
</person-group>
<article-title>Hidden infidelities of the translational stop signal</article-title>
<source>Prog Nucleic Acid Res Mol Biol</source>
<year>1996</year>
<volume>52</volume>
<fpage>293</fpage>
<lpage>335</lpage>
<pub-id pub-id-type="pmid">8821264</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Witten</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Frank</surname>
<given-names>E</given-names>
</name>
</person-group>
<source>Data Mining: Practical machine learning tools and techniques</source>
<year>2005</year>
<publisher-name>San Francisco: Morgan Kaufmann</publisher-name>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ohama</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Mori</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Osawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ueda</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nakase</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Non-universal decoding of the leucine codon CUG in several
<italic>Candida </italic>
species</article-title>
<source>Nucleic Acids Res</source>
<year>1993</year>
<volume>21</volume>
<fpage>4039</fpage>
<lpage>4045</lpage>
<pub-id pub-id-type="pmid">8371978</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/21.17.4039</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Santos</surname>
<given-names>MAS</given-names>
</name>
<name>
<surname>Tuite</surname>
<given-names>MF</given-names>
</name>
</person-group>
<article-title>The CUG codon is decoded in vivo as serine and not leucine in Candida albicans</article-title>
<source>Nucleic Acids Res</source>
<year>1995</year>
<volume>23</volume>
<fpage>1481</fpage>
<lpage>1486</lpage>
<pub-id pub-id-type="pmid">7784200</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/23.9.1481</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sugita</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Nakase</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Nonuniversal usage of the leucine CUG codon in yeasts: Investigation of basidiomycetous yeast</article-title>
<source>J Gen Appl Microbiol</source>
<year>1999</year>
<volume>45</volume>
<fpage>193</fpage>
<lpage>197</lpage>
<pub-id pub-id-type="pmid">12501377</pub-id>
<pub-id pub-id-type="doi">10.2323/jgam.45.193</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sharp</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>WH</given-names>
</name>
</person-group>
<article-title>The codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications</article-title>
<source>Nucleic Acids Res</source>
<year>1987</year>
<volume>15</volume>
<fpage>1281</fpage>
<lpage>1295</lpage>
<pub-id pub-id-type="pmid">3547335</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/15.3.1281</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scorer</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Carrier</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Rosenberger</surname>
<given-names>RF</given-names>
</name>
</person-group>
<article-title>Amino acid misincorporation during high-level expression of mouse epidermal growth factor in Escherichia coli</article-title>
<source>Nucleic Acids Res</source>
<year>1991</year>
<volume>19</volume>
<fpage>3511</fpage>
<lpage>3516</lpage>
<pub-id pub-id-type="pmid">1852602</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/19.13.3511</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kramer</surname>
<given-names>EB</given-names>
</name>
<name>
<surname>Farabaugh</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<article-title>The frequency of translational misreading errors in E. coli is largely determined by tRNA competition</article-title>
<source>RNA</source>
<year>2007</year>
<volume>13</volume>
<fpage>87</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="pmid">17095544</pub-id>
<pub-id pub-id-type="doi">10.1261/rna.294907</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miranda</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Rocha</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Mateus</surname>
<given-names>DD</given-names>
</name>
<name>
<surname>Moura</surname>
<given-names>GR</given-names>
</name>
<name>
<surname>Carreto</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>A Genetic Code Alteration Is a Phenotype Diversity Generator in the Human Pathogen Candida albicans</article-title>
<source>PLoS ONE</source>
<year>2007</year>
<volume>2</volume>
<fpage>e996</fpage>
<pub-id pub-id-type="pmid">17912373</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pone.0000996</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gomes</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Miranda</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Moura</surname>
<given-names>GR</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Akoulitchev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>A genetic code alteration generates a proteome of high diversity in the human pathogen Candida albicans</article-title>
<source>Genome Biol</source>
<year>2007</year>
<volume>8</volume>
<fpage>R206</fpage>
<pub-id pub-id-type="pmid">17916231</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2007-8-10-r206</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Young</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Sloan</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Van Riper</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae</article-title>
<source>Genetics</source>
<year>2000</year>
<volume>154</volume>
<fpage>1053</fpage>
<lpage>1068</lpage>
<pub-id pub-id-type="pmid">10757753</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karlin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brocchieri</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Bergman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mrazek</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gentles</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>Amino acid runs in eukaryotic proteomes and disease associations</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2002</year>
<volume>99</volume>
<fpage>333</fpage>
<lpage>338</lpage>
<pub-id pub-id-type="pmid">11782551</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.012608599</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Massey</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Beltrao</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Almeida</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Garey</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Tuite</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Comparative evolutionary genomics unveils the molecular mechanism of reassignment of the CTG codon in Candida spp</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>544</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="pmid">12670996</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.811003</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Howard</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Roepe</surname>
<given-names>PD</given-names>
</name>
</person-group>
<article-title>Analysis of the antimalarial drug resistance protein Pfcrt expressed in yeast</article-title>
<source>J Biol Chem</source>
<year>2002</year>
<volume>277</volume>
<fpage>49767</fpage>
<lpage>49775</lpage>
<pub-id pub-id-type="pmid">12351620</pub-id>
<pub-id pub-id-type="doi">10.1074/jbc.M204005200</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beutler</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Gelbart</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Koziol</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Beutler</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Evolution of the genome and the genetic code: selection at the dinucleotide level by methylation and polyribonucleotide cleavage</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1989</year>
<volume>86</volume>
<fpage>192</fpage>
<lpage>196</lpage>
<pub-id pub-id-type="pmid">2463621</pub-id>
<pub-id pub-id-type="doi">10.1073/pnas.86.1.192</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pinheiro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Freitas</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Oliveira</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
</person-group>
<article-title>Computational and Statistical Methodologies for ORFeome Primary Structure Analysis</article-title>
<source>Methods Mol Biol</source>
<year>2007</year>
<volume>395</volume>
<fpage>449</fpage>
<lpage>462</lpage>
<pub-id pub-id-type="pmid">17993691</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Dutta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Das</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Codon usage in highly expressed genes of Haemophillus influenzae and Mycobacterium tuberculosis: translational selection versus mutational bias</article-title>
<source>Gene</source>
<year>1998</year>
<volume>215</volume>
<fpage>405</fpage>
<lpage>413</lpage>
<pub-id pub-id-type="pmid">9714839</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(98)00257-1</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chiusano</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Alvarez</surname>
<given-names>VF</given-names>
</name>
<name>
<surname>Di Giulio</surname>
<given-names>M</given-names>
</name>
<name>
<surname>D'Onofrio</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ammirato</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Colonna</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bernardi</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code</article-title>
<source>Gene</source>
<year>2000</year>
<volume>261</volume>
<fpage>63</fpage>
<lpage>69</lpage>
<pub-id pub-id-type="pmid">11164038</pub-id>
<pub-id pub-id-type="doi">10.1016/S0378-1119(00)00521-7</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rocha</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>Matic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Taddei</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions?</article-title>
<source>Nucleic Acids Res</source>
<year>2002</year>
<volume>30</volume>
<fpage>1886</fpage>
<lpage>1894</lpage>
<pub-id pub-id-type="pmid">11972324</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/30.9.1886</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borstnik</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pumpernik</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Tandem repeats in protein coding regions of primate genes</article-title>
<source>Genome Res</source>
<year>2002</year>
<volume>12</volume>
<fpage>909</fpage>
<lpage>915</lpage>
<pub-id pub-id-type="pmid">12045144</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.138802</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schwartz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Curran</surname>
<given-names>JF</given-names>
</name>
</person-group>
<article-title>Analyses of frameshifting at UUU-pyrimidine sites</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<fpage>2005</fpage>
<lpage>2011</lpage>
<pub-id pub-id-type="pmid">9115369</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/25.10.2005</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mottagui-Tabar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bjornsson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Isaksson</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>The second to last amino acid in the nascent peptide as a codon context determinant</article-title>
<source>EMBO J</source>
<year>1994</year>
<volume>13</volume>
<fpage>249</fpage>
<lpage>257</lpage>
<pub-id pub-id-type="pmid">8306967</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lowe</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<fpage>955</fpage>
<lpage>964</lpage>
<pub-id pub-id-type="pmid">9023104</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/25.5.955</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geiduschek</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>Tocchini-Valentini</surname>
<given-names>GP</given-names>
</name>
</person-group>
<article-title>Transcription by RNA polymerase III</article-title>
<source>Annu Rev Biochem</source>
<year>1988</year>
<volume>57</volume>
<fpage>873</fpage>
<lpage>914</lpage>
<pub-id pub-id-type="pmid">3052292</pub-id>
<pub-id pub-id-type="doi">10.1146/annurev.bi.57.070188.004301</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Aspergillus fumigatus</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Aspergillus_fumigatus/"></ext-link>
</citation>
</ref>
<ref id="B43">
<citation citation-type="other">
<article-title>Candida Genome Database</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.candidagenome.org/download/sequence/Assembly19/archived_as_released/"></ext-link>
</citation>
</ref>
<ref id="B44">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Candida glabrata</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Candida_glabrata_CBS138/"></ext-link>
</citation>
</ref>
<ref id="B45">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Debaryomyces hansenii</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Debaryomyces_hansenii_CBS767/"></ext-link>
</citation>
</ref>
<ref id="B46">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Eremothecium gossypii</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Eremothecium_gossypii/"></ext-link>
</citation>
</ref>
<ref id="B47">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Kluyveromyces lactis</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Kluyveromyces_lactis_NRRL_Y-1140/"></ext-link>
</citation>
</ref>
<ref id="B48">
<citation citation-type="other">
<article-title>The Broad Institute Database</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.broad.mit.edu/annotation/fungi/comp_yeasts/downloads.html"></ext-link>
</citation>
</ref>
<ref id="B49">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Saccharomyces cerevisiae</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Saccharomyces_cerevisiae/"></ext-link>
</citation>
</ref>
<ref id="B50">
<citation citation-type="other">
<article-title>NCBI Genbank Link for Schizosaccharomyces pombe</article-title>
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/Fungi/Schizosaccharomyces_pombe/"></ext-link>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000247 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000247 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2244636
   |texte=   Codon-triplet context unveils unique features of the Candida albicans protein coding genome
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:18047667" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024