Exploration server on telematics


Internal identifier: 000515 (Pmc/Corpus); previous: 0005149; next: 0005160



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Large Scale Comparative Codon-Pair Context Analysis Unveils General Rules that Fine-Tune Evolution of mRNA Primary Structure</title>
<author>
<name sortKey="Moura, Gabriela" sort="Moura, Gabriela" uniqKey="Moura G" first="Gabriela" last="Moura">Gabriela Moura</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Arrais, Joel" sort="Arrais, Joel" uniqKey="Arrais J" first="Joel" last="Arrais">Joel Arrais</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gomes, Ana Cristina" sort="Gomes, Ana Cristina" uniqKey="Gomes A" first="Ana Cristina" last="Gomes">Ana Cristina Gomes</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Carreto, Laura" sort="Carreto, Laura" uniqKey="Carreto L" first="Laura" last="Carreto">Laura Carreto</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freitas, Adelaide" sort="Freitas, Adelaide" uniqKey="Freitas A" first="Adelaide" last="Freitas">Adelaide Freitas</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Mathematics, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L." last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel A S" sort="Santos, Manuel A S" uniqKey="Santos M" first="Manuel A. S." last="Santos">Manuel A. S. Santos</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17786218</idno>
<idno type="pmc">1952141</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1952141</idno>
<idno type="RBID">PMC:1952141</idno>
<idno type="doi">10.1371/journal.pone.0000847</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000515</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000515</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Large Scale Comparative Codon-Pair Context Analysis Unveils General Rules that Fine-Tune Evolution of mRNA Primary Structure</title>
<author>
<name sortKey="Moura, Gabriela" sort="Moura, Gabriela" uniqKey="Moura G" first="Gabriela" last="Moura">Gabriela Moura</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pinheiro, Miguel" sort="Pinheiro, Miguel" uniqKey="Pinheiro M" first="Miguel" last="Pinheiro">Miguel Pinheiro</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Arrais, Joel" sort="Arrais, Joel" uniqKey="Arrais J" first="Joel" last="Arrais">Joel Arrais</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gomes, Ana Cristina" sort="Gomes, Ana Cristina" uniqKey="Gomes A" first="Ana Cristina" last="Gomes">Ana Cristina Gomes</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Carreto, Laura" sort="Carreto, Laura" uniqKey="Carreto L" first="Laura" last="Carreto">Laura Carreto</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freitas, Adelaide" sort="Freitas, Adelaide" uniqKey="Freitas A" first="Adelaide" last="Freitas">Adelaide Freitas</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Department of Mathematics, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oliveira, Jose L" sort="Oliveira, Jose L" uniqKey="Oliveira J" first="José L." last="Oliveira">José L. Oliveira</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Santos, Manuel A S" sort="Santos, Manuel A S" uniqKey="Santos M" first="Manuel A. S." last="Santos">Manuel A. S. Santos</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Codon usage and codon-pair context are important gene primary structure features that influence mRNA decoding fidelity. In order to identify general rules that shape codon-pair context and minimize mRNA decoding error, we have carried out a large scale comparative codon-pair context analysis of 119 fully sequenced genomes.</p>
</sec>
<sec>
<title>Methodologies/Principal Findings</title>
<p>We have developed mathematical and software tools for large-scale comparative codon-pair context analysis. These methodologies unveiled general and species-specific codon-pair context rules that govern evolution of mRNAs in the 3 domains of life. We show that evolution of bacterial and archaeal mRNA primary structure is mainly dependent on constraints imposed by the translational machinery, while in eukaryotes DNA methylation and tri-nucleotide repeats impose strong biases on codon-pair context.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The data highlight fundamental differences between prokaryotic and eukaryotic mRNA decoding rules, which are partially independent of codon usage.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-title>PLoS ONE</journal-title>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17786218</article-id>
<article-id pub-id-type="pmc">1952141</article-id>
<article-id pub-id-type="publisher-id">07-PONE-RA-01385R1</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0000847</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline">
<subject>Molecular Biology</subject>
<subject>Evolutionary Biology/Genomics</subject>
<subject>Genetics and Genomics/Bioinformatics</subject>
<subject>Genetics and Genomics/Genomics</subject>
<subject>Molecular Biology/Bioinformatics</subject>
<subject>Molecular Biology/Molecular Evolution</subject>
<subject>Molecular Biology/Translation Mechanisms</subject>
<subject>Molecular Biology/Translational Regulation</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Large Scale Comparative Codon-Pair Context Analysis Unveils General Rules that Fine-Tune Evolution of mRNA Primary Structure</article-title>
<alt-title alt-title-type="running-head">Codon-Context Evolution</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Moura</surname>
<given-names>Gabriela</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pinheiro</surname>
<given-names>Miguel</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Arrais</surname>
<given-names>Joel</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gomes</surname>
<given-names>Ana Cristina</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Carreto</surname>
<given-names>Laura</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Freitas</surname>
<given-names>Adelaide</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Oliveira</surname>
<given-names>José L.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Santos</surname>
<given-names>Manuel A. S.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="n101">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Department of Biology, Center for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Institute of Electronics and Telematics Engineering, University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Department of Mathematics, University of Aveiro, Aveiro, Portugal</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Christoffels</surname>
<given-names>Alan</given-names>
</name>
<role>Academic Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">Temasek Life Sciences Laboratory, Singapore</aff>
<author-notes>
<corresp id="n101">* To whom correspondence should be addressed. E-mail:
<email>msantos@bio.ua.pt</email>
</corresp>
<fn fn-type="con">
<p>Conceived and designed the experiments: MS GM MP JO. Performed the experiments: GM MP AG LC. Analyzed the data: GM MP JA AF JO. Contributed reagents/materials/analysis tools: GM. Wrote the paper: MS GM. Other: Developed software: JO JA MP.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>5</day>
<month>9</month>
<year>2007</year>
</pub-date>
<volume>2</volume>
<issue>9</issue>
<elocation-id>e847</elocation-id>
<history>
<date date-type="received">
<day>25</day>
<month>5</month>
<year>2007</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>7</month>
<year>2007</year>
</date>
</history>
<copyright-statement>Moura et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement>
<copyright-year>2007</copyright-year>
<abstract>
<sec>
<title>Background</title>
<p>Codon usage and codon-pair context are important gene primary structure features that influence mRNA decoding fidelity. In order to identify general rules that shape codon-pair context and minimize mRNA decoding error, we have carried out a large scale comparative codon-pair context analysis of 119 fully sequenced genomes.</p>
</sec>
<sec>
<title>Methodologies/Principal Findings</title>
<p>We have developed mathematical and software tools for large-scale comparative codon-pair context analysis. These methodologies unveiled general and species-specific codon-pair context rules that govern evolution of mRNAs in the 3 domains of life. We show that evolution of bacterial and archaeal mRNA primary structure is mainly dependent on constraints imposed by the translational machinery, while in eukaryotes DNA methylation and tri-nucleotide repeats impose strong biases on codon-pair context.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The data highlight fundamental differences between prokaryotic and eukaryotic mRNA decoding rules, which are partially independent of codon usage.</p>
</sec>
</abstract>
<counts>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>A myriad of evolutionary forces shape the primary structure of coding components (ORFs) of genomes, herein called ORFeomes. These include genome and gene duplication, chromosome rearrangements, DNA recombination, deletions and insertions, transposition of mobile elements, single nucleotide polymorphisms, nucleotide repeats and biased G+C pressure
<xref ref-type="bibr" rid="pone.0000847-Cliften1">[1]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Chen1">[4]</xref>
. Apart from these DNA replication-derived phenomena, others arising from DNA transcription, mRNA stability and translation
<xref ref-type="bibr" rid="pone.0000847-Chan1">[5]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
are also likely to fine-tune ORFeomes' primary structure, but their significance is not yet fully understood.</p>
<p>At the mRNA translation level, synonymous codon usage and codon-pair context (representing the pair of codons located in the ribosomal A- and P-sites) are expected to be under selective pressure since they affect mRNA decoding speed and accuracy
<xref ref-type="bibr" rid="pone.0000847-Berg1">[8]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Shah1">[15]</xref>
. Synonymous codon usage biases are explained mainly by G+C content and only secondarily by constraints imposed by mRNA translation variables
<xref ref-type="bibr" rid="pone.0000847-Chen1">[4]</xref>
, namely tRNA abundance, efficiency of tRNA charging, mRNA decoding efficiency (speed plus accuracy), mRNA stability and structure, gene expression, and amino acid composition
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Ogle1">[13]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Buckingham1">[16]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Curran2">[18]</xref>
. The nucleotides surrounding a codon also influence synonymous codon usage, with the strongest influence arising from the interplay between the last nucleotide of a codon and the first nucleotide of the neighboring codon (N
<sub>1</sub>
N
<sub>2</sub>
<bold>N
<sub>3</sub>
</bold>
<bold>N
<sub>1</sub>
</bold>
N
<sub>2</sub>
N
<sub>3</sub>
), the so-called N
<sub>3</sub>
-N
<sub>1</sub>
context
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Fedorov1">[19]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
. In contrast to codon usage, the forces that modulate codon-pair context, with the exception of the context of initiation and termination codons
<xref ref-type="bibr" rid="pone.0000847-Buckingham1">[16]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Tate1">[21]</xref>
, are still poorly understood. The few studies carried out to date show, however, that codon-pair context has a direct impact on missense, nonsense and frameshifting errors
<xref ref-type="bibr" rid="pone.0000847-Shah1">[15]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Murgola1">[22]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Tork1">[23]</xref>
.</p>
<p>In
<italic>E. coli</italic>
, missense error
<italic>in vivo</italic>
, under standard growth conditions, is in the order of 10
<sup>−3</sup>
to 10
<sup>−4</sup>
per codon decoded
<xref ref-type="bibr" rid="pone.0000847-Rodnina1">[24]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Kramer1">[25]</xref>
. Frameshifting and stop codon readthrough errors happen at levels of 3×10
<sup>−4</sup>
to 10
<sup>−5</sup>
and of 10
<sup>−3</sup>
to 10
<sup>−6</sup>
, respectively
<xref ref-type="bibr" rid="pone.0000847-Atkins1">[26]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Freistroffer1">[27]</xref>
. Under stress, namely amino acid starvation, these basal error rates increase significantly
<xref ref-type="bibr" rid="pone.0000847-Buckingham1">[16]</xref>
, indicating that decoding error in nature may be significantly higher than in optimal laboratory conditions. Furthermore, 30% of the newly synthesized proteins in HeLa, lymph node, L-K
<sup>b</sup>
and dendritic cells are defective ribosomal products (DRiPs) that arise from missense, frameshifting and ribosome drop off at mRNA pausing sites
<xref ref-type="bibr" rid="pone.0000847-Princiotta1">[28]</xref>
. Since protein synthesis uses 45% of the cell's ATP, a 30% DRiP rate represents a wastage of 11% of total cellular energy
<xref ref-type="bibr" rid="pone.0000847-Princiotta1">[28]</xref>
. Whether this is a common trend in all cell types is unknown; however, peptides resulting from proteasome degradation of DRiPs are a major source of peptides for MHC class I molecules, highlighting an unanticipated role of mistranslation in immune cells
<xref ref-type="bibr" rid="pone.0000847-Princiotta1">[28]</xref>
.</p>
<p>It is not yet clear whether the ribosome drops off randomly or preferentially at specific mRNA drop-off hot spots. In other words, it is important to elucidate whether the average decoding error (10
<sup>−4</sup>
to 10
<sup>−5</sup>
) is evenly distributed along mRNAs or whether it fluctuates along the mRNA and, if so, how decoding error hot spots can be identified. In order to obtain insight into these questions and identify mRNA primary structural features that influence mRNA decoding error, we have developed a software package with statistical and graphical tools to study codon-pairs corresponding to ribosomal A- and P-site codons, using genome-wide approaches (ANACONDA version 1.0)
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Pinheiro1">[29]</xref>
. ANACONDA 1.0 already allowed us to demonstrate that codon-pair context is weakly modulated by G+C pressure
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
. In the present study, we have significantly improved ANACONDA (creating its version 2.0) and used it to carry out large-scale comparative codon-pair context analysis using complete ORFeome sequences of 81 Eubacteria, 18 Archaea and 20 Eukaryota. The data show that i) codon-pair context is species-specific, ii) there are general rules governing its evolution in the three domains of life, iii) in Eubacteria and Archaea codon-pair context is mainly determined by constraints imposed by the translational machinery, while iv) in eukaryotes the emergence of DNA methylation and tri-nucleotide repeats influenced codon-pair context. The data suggest the existence of fundamental differences between prokaryotic and eukaryotic mRNA decoding rules and show that codon-pair context is partially independent of codon usage.</p>
</sec>
<sec id="s2">
<title>Results</title>
<sec id="s2a">
<title>New tools for large scale comparative codon-pair context analysis</title>
<p>The ANACONDA 1.0 algorithm developed previously
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Pinheiro1">[29]</xref>
simulates the ribosome during decoding by reading Open Reading Frame (ORF) sequences, starting at the AUG initiation codon and moving the reading window three nucleotides at a time (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1A</xref>
). While doing this, it memorizes all codon-pairs, which represent A- and P-site codons during mRNA decoding. It then builds a codon-pair contingency table (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1B</xref>
) that lends itself to statistical analysis and permits determination of the codon-pair context bias
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
. The existence of association between codon-pairs is determined through the chi-square (χ
<sup>2</sup>
) test of independence, and preferred and rejected pairs of codons are identified through the analysis of adjusted residuals for contingency tables. These pairs are then displayed in a 61×64 color-coded map, green for preferred and red for rejected pairs, that gives a global view of the codon-pair context data for any ORFeome (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1C</xref>
). ANACONDA 1.0 also clusters the data according to the context preferences and rejections (residuals values) and builds Differential Display Maps (DDM), which represent codon-pair context differences between two different ORFeomes (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2</xref>
).</p>
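<p>The counting and residual steps described above can be illustrated with a minimal Python sketch. It is not the ANACONDA code: the function names count_codon_pairs and adjusted_residuals are hypothetical, ORFs are assumed to be in frame and already validated, stop codons are assumed to follow the standard genetic code, and the statistic is the usual adjusted residual for a contingency-table cell, which is assumed here to match the one referred to in the text.</p>

```python
from collections import Counter
from math import sqrt

# Assumption: standard genetic code; a stop codon never occupies the P-site.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def count_codon_pairs(orfs):
    """Count (5' codon, 3' codon) pairs, moving one codon (3 nt) at a time."""
    counts = Counter()
    for orf in orfs:
        seq = orf.upper().replace("T", "U")
        # Split into complete codons; trailing 1-2 nucleotides are ignored.
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        for five_prime, three_prime in zip(codons, codons[1:]):
            if five_prime in STOP_CODONS:
                continue
            counts[(five_prime, three_prime)] += 1
    return counts


def adjusted_residuals(counts):
    """Return {pair: adjusted residual} for every row/column combination."""
    total = sum(counts.values())
    row_totals, col_totals = Counter(), Counter()
    for (five_prime, three_prime), n in counts.items():
        row_totals[five_prime] += n
        col_totals[three_prime] += n
    residuals = {}
    for five_prime, row_n in row_totals.items():
        for three_prime, col_n in col_totals.items():
            # Count expected if the two codons were independent.
            expected = row_n * col_n / total
            variance = expected * (1 - row_n / total) * (1 - col_n / total)
            if variance == 0:
                continue  # degenerate table with a single row or column
            observed = counts.get((five_prime, three_prime), 0)
            residuals[(five_prime, three_prime)] = (observed - expected) / sqrt(variance)
    return residuals


# Toy input, made up for illustration only.
toy_orfs = ["ATGAAAAAATAA", "ATGGGGAAATAA"]
toy_counts = count_codon_pairs(toy_orfs)
toy_residuals = adjusted_residuals(toy_counts)
```

<p>In this sketch a positive residual corresponds to a preferred (green) context and a negative residual to a rejected (red) one. Any cut-off used to color a cell would be a further assumption and is left out.</p>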
<fig id="pone-0000847-g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Flowchart of the codon-pair context analysis performed by ANACONDA.</title>
<p>A) The software selects valid ORFs from the total set available for each species (ORFeome) and counts all combinations of two consecutive codons (codon-pair context) that are present in the sequences. B) The observed values are incorporated into a contingency table in which the lines correspond to the 5′ codon (ribosome P-site) and the columns to the 3′ codon (ribosome A-site) of each pair. C) The contingency table of observed values is then compared to another table in which the values expected under independence are calculated. The cell corresponding to each pair of codons was colored in green for preferred contexts or red for rejected ones. This produces a color-coded map for the 61×64 two-codon contexts of one ORFeome. D) To aid simultaneous comparison of a large set of ORFeomes the 61×64 map is automatically converted into one single column with 3904 lines, one for each pair of codons. E) Finally, the columns that illustrate the two-codon context bias of each individual ORFeome are placed side by side, yielding a large-scale codon context comparison map. Both maps for codon context bias, i.e. the 61×64 map for a single species and the large-scale codon context comparison map can be rearranged using clustering methodologies that highlight similar codon-pair context patterns. For detailed description of statistics and software, see
<xref ref-type="sec" rid="s4">Methods</xref>
or
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Pinheiro1">[29]</xref>
.</p>
</caption>
<graphic xlink:href="pone.0000847.g001"></graphic>
</fig>
<fig id="pone-0000847-g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Codon-pair context is species specific.</title>
<p>A) Individual codon-pair context maps built for various genomes followed phylogeny, indicating that codon-pair context is species-specific. For instance, the human ORFeome map is more similar to that of chimpanzee (
<italic>Pan troglodytes</italic>
) than to the mouse (
<italic>Mus musculus</italic>
) map. B) This result was confirmed using differential display maps (DDM) that subtract two codon-pair context maps. For example,
<italic>H. sapiens</italic>
minus
<italic>M. musculus</italic>
(
<italic>H.s</italic>
vs
<italic>M.m</italic>
);
<italic>H. sapiens</italic>
minus
<italic>P. troglodytes</italic>
(
<italic>H.s</italic>
vs
<italic>P.t</italic>
). In these differential display maps major codon-pair context differences (above 15) are shown in light blue and darker maps correspond to species with more similar codon-pair context biases. In the present example, the maps of
<italic>H.s</italic>
vs
<italic>M.m</italic>
and
<italic>H.s</italic>
vs
<italic>P.t</italic>
have 6% and 1% of blue cells, respectively. C and D) The same phylogenetical relationship could be detected for bacterial ORFeomes, as exemplified for
<italic>Escherichia coli</italic>
,
<italic>Bacillus cereus</italic>
and
<italic>Salmonella typhi</italic>
. The DDM built with these species have 55% (
<italic>E.c</italic>
vs
<italic>B.c</italic>
) and 20% (
<italic>E.c</italic>
vs
<italic>S.t</italic>
) of blue cells. E) Finally, the phylogenetic relationship was maintained when the above species were clustered according to the similarities of the codon-pair context maps. The yeasts
<italic>Saccharomyces cerevisiae</italic>
and
<italic>Schizosaccharomyces pombe</italic>
were added to include an intermediate group of lower eukaryotes in the tree. Adjusted residuals are colored in the maps according to the color scale shown, so that green cells correspond to preferred and red cells to rejected contexts.</p>
</caption>
<graphic xlink:href="pone.0000847.g002"></graphic>
</fig>
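<p>The differential display maps described in the Figure 2 caption can be sketched as follows. This is an illustrative reading, not the ANACONDA implementation: the function name differential_display_map is hypothetical, the difference is assumed to be taken on adjusted residuals, the threshold of 15 is assumed to apply to the absolute difference, and a pair missing from one map is assumed to count as 0.</p>

```python
def differential_display_map(residuals_a, residuals_b, threshold=15.0):
    """Subtract two codon-pair residual maps and flag the large differences.

    Returns (differences, flagged_pairs, fraction_flagged).
    """
    pairs = set(residuals_a) | set(residuals_b)
    differences = {
        pair: residuals_a.get(pair, 0.0) - residuals_b.get(pair, 0.0)
        for pair in pairs
    }
    # Assumption: a "major" difference is one whose magnitude exceeds the threshold.
    flagged = sorted(
        pair for pair, diff in differences.items() if abs(diff) > threshold
    )
    fraction = len(flagged) / len(differences) if differences else 0.0
    return differences, flagged, fraction


# Toy residual maps, made up for illustration only.
map_a = {("AAA", "AAA"): 20.0, ("AAA", "GGG"): -3.0,
         ("GGG", "AAA"): 1.0, ("GGG", "GGG"): 4.0}
map_b = {("AAA", "AAA"): 2.0, ("AAA", "GGG"): -1.0,
         ("GGG", "AAA"): 1.5, ("GGG", "GGG"): -12.0}
diffs, flagged, fraction = differential_display_map(map_a, map_b)
```

<p>The returned fraction plays the role of the percentage of blue cells quoted in the caption: the smaller it is, the more similar the two codon-pair context maps.</p>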
<p>In an attempt to identify putative general rules that govern codon-pair context, we have carried out large-scale codon-pair comparisons using ANACONDA version 2.0. For this, new algorithms and tools were developed to convert the 61×64 codon-pair context color-coded maps into a single color-coded column containing 3904 lines, one for each cell of the 61×64 map (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1D</xref>
). ANACONDA 2.0 compared these color-coded columns, clustered the data and highlighted groups of codons that had similar pair preference and rejection patterns (
<xref ref-type="fig" rid="pone-0000847-g001">Figures 1E</xref>
<xref ref-type="fig" rid="pone-0000847-g002"></xref>
–
<xref ref-type="fig" rid="pone-0000847-g003">3</xref>
). Since the size of ORFeomes varied significantly between bacteria and eukaryotes, ANACONDA 2.0 normalized the data using the biggest ORFeome as a reference data set (
<xref ref-type="supplementary-material" rid="pone.0000847.s001">Figure S1</xref>
). This permitted direct comparisons of large and small ORFeomes and allowed the study of codon-pair context preferences (positive residual values; green color in the map) and rejections (negative residual values; red color in the map) of 119 ORFeomes of Eubacteria, Archaea and Eukarya, including the human and chimpanzee ORFeomes (
<xref ref-type="supplementary-material" rid="pone.0000847.s001">Figure S2A</xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s002"></xref>
–
<xref ref-type="supplementary-material" rid="pone.0000847.s003">C</xref>
).</p>
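<p>The conversion of a 61×64 map into a single column, and the side-by-side arrangement of several such columns, can be sketched as below. The names PAIR_ORDER, to_column and comparison_matrix are hypothetical, the standard genetic code is assumed when excluding stop codons from the 5′ position, and missing pairs are assumed to take a residual of 0. The normalization step mentioned in the text is not reproduced here because its details are not given in this section.</p>

```python
from itertools import product

BASES = "UCAG"
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"UAA", "UAG", "UGA"}
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]

# Fixed line order: 61 P-site codons x 64 A-site codons = 3904 pairs.
PAIR_ORDER = [(p, a) for p in SENSE_CODONS for a in ALL_CODONS]


def to_column(residuals, missing=0.0):
    """Flatten one {pair: residual} map into a 3904-value column."""
    return [residuals.get(pair, missing) for pair in PAIR_ORDER]


def comparison_matrix(residuals_by_species):
    """Place one column per species side by side.

    Returns (species_names, rows), where rows[i][j] is the residual of
    PAIR_ORDER[i] in species_names[j].
    """
    names = sorted(residuals_by_species)
    columns = [to_column(residuals_by_species[name]) for name in names]
    rows = [list(values) for values in zip(*columns)]
    return names, rows


# Toy data, made up for illustration only.
toy = {"species_b": {}, "species_a": {("AUG", "AAA"): 5.0}}
names, rows = comparison_matrix(toy)
```

<p>Keeping one fixed pair order for every species is what makes the columns comparable line by line; rows or columns of the resulting matrix could then be passed to any clustering routine.</p>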
</sec>
<sec id="s2b">
<title>Codon-pair context preferences are species specific</title>
<p>Codon-pair context maps showed remarkable diversity from bacteria to higher eukaryotes (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2</xref>
;
<xref ref-type="supplementary-material" rid="pone.0000847.s006">Figure S3A</xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s007"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s008"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s009"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s010"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s011"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s012"></xref>
<xref ref-type="supplementary-material" rid="pone.0000847.s013"></xref>
–
<xref ref-type="supplementary-material" rid="pone.0000847.s014">J</xref>
). For example, codon-pair context preferences of the human (
<italic>Homo sapiens</italic>
or
<italic>H.s</italic>
) and mouse (
<italic>Mus musculus</italic>
or
<italic>M.m</italic>
) ORFeomes showed several differences, which were unveiled by direct comparison of ORFeomes and construction of Differential Display Maps (DDM) (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2A,B</xref>
), as described in our previous study
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
. Conversely, the codon-pair context maps for the chimpanzee (
<italic>Pan troglodytes</italic>
or
<italic>P.t</italic>
) and human ORFeomes were remarkably similar (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2B</xref>
), which was in agreement with the high homogeneity found for codon-pair distributions of both ORFeomes (data not shown). The same trend was found in bacteria. Indeed, the
<italic>Escherichia coli</italic>
(
<italic>E.c</italic>
) ORFeome codon-pair context map was more similar to that of
<italic>Salmonella typhi</italic>
(
<italic>S.t</italic>
) than to
<italic>Bacillus cereus</italic>
(
<italic>B.c</italic>
in
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2C,D</xref>
). Clustering of the codon-pair context maps showed that codon-pair context follows rRNA phylogeny (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2E</xref>
), highlighting the possibility of using codon-pair context maps as species specific fingerprints. Furthermore, the overall correlation between the 3 domains of life was lower than that calculated within each domain, as the Spearman's correlations of the ranks (
<xref ref-type="supplementary-material" rid="pone.0000847.s018">Table S1</xref>
) showed low correlation coefficients between species of different domains of life, i.e. 0.452 for Eukarya vs Archaea, 0.450 for Eukarya vs Eubacteria and 0.500 for Archaea vs Eubacteria. In contrast, correlation coefficients calculated between species of the same domain were high, i.e. 0.988 among Eukarya (between
<italic>H. sapiens</italic>
and
<italic>P. troglodytes</italic>
), 0.823 among Archaea (between
<italic>P. abyssi</italic>
and
<italic>P. horikoshii</italic>
) and 0.959 among Eubacteria (between
<italic>E. coli</italic>
and
<italic>S. flexneri</italic>
).</p>
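<p>The rank comparison described above can be illustrated with a minimal sketch. It assumes that each ORFeome is summarized as a list of residuals given in the same codon-pair order for both species; the function names are hypothetical, and this is not the code used to produce Table S1.</p>

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors.

    Assumes x and y have equal length and that neither is constant
    (a constant vector has zero rank variance and the ratio is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

<p>Two species whose codon-pair contexts are ordered identically by residual give a coefficient of 1, whereas opposite orderings give -1; intermediate values such as those reported above indicate partial agreement of the ranks.</p>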
<p>The distribution of residual values over the entire set of ORFeomes showed that the 3 domains of life have significantly different codon-pair preferences (
<xref ref-type="table" rid="pone-0000847-t001">Tables 1</xref>
,
<xref ref-type="table" rid="pone-0000847-t002">2</xref>
). For example, the codon-pair contexts with the highest and lowest adjusted residual values included no codon-pairs common to all 3 domains of life, suggesting fundamental differences between Eukarya, Eubacteria and Archaea in codon-pair rules and in the evolutionary forces that shape ORFeome primary structure. Interestingly, 9 out of the 10 codon-pair contexts with the highest residual values (best codon-pairs) of all eukaryotic ORFeomes were pairs formed by identical codons (codon repeats) (
<xref ref-type="table" rid="pone-0000847-t001">Table 1</xref>
). The same trend was also detected when the most frequently preferred codon-pair contexts for each domain were compared (
<xref ref-type="table" rid="pone-0000847-t002">Table 2</xref>
). With this approach, codon-pair contexts shared between domains of life were identified. For example, AAU-CCA and GGC-UGU had positive residuals in Eubacteria and Archaea, ACU-AAG had negative residuals in Eukarya and Archaea, and AGA-AGA had positive residuals in Eubacteria and Eukarya. This suggested that, despite the species specificity of codon-pair context maps, at least some of the evolutionary constraints that shaped codon-pair context are conserved across species in the three domains of life.</p>
<table-wrap id="pone-0000847-t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.t001</object-id>
<label>Table 1</label>
<caption>
<title>The most biased codon-pair contexts.</title>
</caption>
<graphic id="pone-0000847-t001-1" xlink:href="pone.0000847.t001"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0000847-t001-1">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="6" align="left" rowspan="1">The 10 lowest residual values</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1">EUBACTERIA</td>
<td colspan="2" align="left" rowspan="1">ARCHAEA</td>
<td colspan="2" align="left" rowspan="1">EUKARYOTA</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Residual</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Residual</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Residual</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GCC>CUG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−308,976</td>
<td align="left" rowspan="1" colspan="1">GAC>GUC</td>
<td align="left" rowspan="1" colspan="1">−216,464</td>
<td align="left" rowspan="1" colspan="1">
<underline>CUG>GAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−135,197</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>CUG>GCG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−277,801</td>
<td align="left" rowspan="1" colspan="1">
<underline>GGC>GCC</underline>
</td>
<td align="left" rowspan="1" colspan="1">−201,205</td>
<td align="left" rowspan="1" colspan="1">GUC>GAG</td>
<td align="left" rowspan="1" colspan="1">−125,758</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>CUG>GGC</underline>
</td>
<td align="left" rowspan="1" colspan="1">−248,528</td>
<td align="left" rowspan="1" colspan="1">
<underline>CUG>GAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−187,918</td>
<td align="left" rowspan="1" colspan="1">UUU>AAG</td>
<td align="left" rowspan="1" colspan="1">−118,366</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UUC>GAG</td>
<td align="left" rowspan="1" colspan="1">−235,436</td>
<td align="left" rowspan="1" colspan="1">
<underline>GGC>GCC</underline>
</td>
<td align="left" rowspan="1" colspan="1">−183,471</td>
<td align="left" rowspan="1" colspan="1">AAU>UUA</td>
<td align="left" rowspan="1" colspan="1">−118,201</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GCC>GGC</td>
<td align="left" rowspan="1" colspan="1">−231,399</td>
<td align="left" rowspan="1" colspan="1">
<underline>CUC>GAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−178,679</td>
<td align="left" rowspan="1" colspan="1">GGC>CUG</td>
<td align="left" rowspan="1" colspan="1">−110,765</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CUG>CUC</td>
<td align="left" rowspan="1" colspan="1">−226,625</td>
<td align="left" rowspan="1" colspan="1">
<bold>CUC>GAG</bold>
</td>
<td align="left" rowspan="1" colspan="1">−176,574</td>
<td align="left" rowspan="1" colspan="1">CUC>GAG</td>
<td align="left" rowspan="1" colspan="1">−109,220</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GUG>GCG</td>
<td align="left" rowspan="1" colspan="1">−224,022</td>
<td align="left" rowspan="1" colspan="1">GAG>CUC</td>
<td align="left" rowspan="1" colspan="1">−169,707</td>
<td align="left" rowspan="1" colspan="1">
<underline>GCC>GAA</underline>
</td>
<td align="left" rowspan="1" colspan="1">−107,698</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CUC>CGC</td>
<td align="left" rowspan="1" colspan="1">−223,365</td>
<td align="left" rowspan="1" colspan="1">
<underline>CUC>GAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−148,041</td>
<td align="left" rowspan="1" colspan="1">CUC>CUG</td>
<td align="left" rowspan="1" colspan="1">−107,332</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CUG>CAG</td>
<td align="left" rowspan="1" colspan="1">−222,711</td>
<td align="left" rowspan="1" colspan="1">UUA>GAU</td>
<td align="left" rowspan="1" colspan="1">−145,409</td>
<td align="left" rowspan="1" colspan="1">
<underline>GCC>GAA</underline>
</td>
<td align="left" rowspan="1" colspan="1">−107,194</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GCC>CUG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−222,703</td>
<td align="left" rowspan="1" colspan="1">GAU>CCA</td>
<td align="left" rowspan="1" colspan="1">−141,241</td>
<td align="left" rowspan="1" colspan="1">AAA>UUU</td>
<td align="left" rowspan="1" colspan="1">−106,245</td>
</tr>
<tr>
<td colspan="6" align="left" rowspan="1">
<bold>The 10 highest residual values</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>CGA>UCG</underline>
</td>
<td align="left" rowspan="1" colspan="1">921,068</td>
<td align="left" rowspan="1" colspan="1">GUG>UUG</td>
<td align="left" rowspan="1" colspan="1">348,268</td>
<td align="left" rowspan="1" colspan="1">
<bold>AAU>AAU</bold>
</td>
<td align="left" rowspan="1" colspan="1">429,080</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GCG>AUC</underline>
</td>
<td align="left" rowspan="1" colspan="1">787,349</td>
<td align="left" rowspan="1" colspan="1">CUU>GCA</td>
<td align="left" rowspan="1" colspan="1">308,638</td>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>CAG>CAG</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">357,404</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GAU>CGC</underline>
</td>
<td align="left" rowspan="1" colspan="1">726,087</td>
<td align="left" rowspan="1" colspan="1">CUU>GAA</td>
<td align="left" rowspan="1" colspan="1">298,926</td>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>AGC>AGC</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">258,564</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GCG>AUC</underline>
</td>
<td align="left" rowspan="1" colspan="1">674,901</td>
<td align="left" rowspan="1" colspan="1">GAC>GCC</td>
<td align="left" rowspan="1" colspan="1">285,894</td>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>CAG>CAG</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">225,474</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>CGA>UCG</underline>
</td>
<td align="left" rowspan="1" colspan="1">635,246</td>
<td align="left" rowspan="1" colspan="1">CCU>GAA</td>
<td align="left" rowspan="1" colspan="1">283,549</td>
<td align="left" rowspan="1" colspan="1">
<bold>GAU>GAU</bold>
</td>
<td align="left" rowspan="1" colspan="1">217,335</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GGA>AGC</td>
<td align="left" rowspan="1" colspan="1">473,874</td>
<td align="left" rowspan="1" colspan="1">CCU>GGG</td>
<td align="left" rowspan="1" colspan="1">242,525</td>
<td align="left" rowspan="1" colspan="1">CCA>CCG</td>
<td align="left" rowspan="1" colspan="1">215,623</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GAU>CGC</underline>
</td>
<td align="left" rowspan="1" colspan="1">441,929</td>
<td align="left" rowspan="1" colspan="1">UUA>AAA</td>
<td align="left" rowspan="1" colspan="1">238,652</td>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>AGC>AGC</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">215,121</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GCG>CUG</td>
<td align="left" rowspan="1" colspan="1">429,895</td>
<td align="left" rowspan="1" colspan="1">
<bold>GCC>GCC</bold>
</td>
<td align="left" rowspan="1" colspan="1">238,422</td>
<td align="left" rowspan="1" colspan="1">
<bold>GGU>GGU</bold>
</td>
<td align="left" rowspan="1" colspan="1">215,059</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AAA>GAG</td>
<td align="left" rowspan="1" colspan="1">423,652</td>
<td align="left" rowspan="1" colspan="1">GAU>UUG</td>
<td align="left" rowspan="1" colspan="1">235,469</td>
<td align="left" rowspan="1" colspan="1">
<bold>AAG>AAG</bold>
</td>
<td align="left" rowspan="1" colspan="1">198,659</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>GCG>AUC</underline>
</td>
<td align="left" rowspan="1" colspan="1">416,940</td>
<td align="left" rowspan="1" colspan="1">GCC>GAC</td>
<td align="left" rowspan="1" colspan="1">234,454</td>
<td align="left" rowspan="1" colspan="1">
<bold>GAA>GAA</bold>
</td>
<td align="left" rowspan="1" colspan="1">198,519</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt101">
<p>In order to identify the strongest biases in codon-pair contexts, the contexts were ranked according to their residual values in Eubacteria, Archaea and Eukaryota. The 10 lowest and the 10 highest residuals obtained in each group are shown; values are written with a decimal comma. Codon-pair contexts that appeared in more than one group are underlined, while codon-pair contexts of identical codons are shown in bold. Eubacteria showed the highest codon-pair biases, since the amplitude of the adjusted residuals varied between −309 and 921. Interestingly, 9 out of the 10 highest residuals of eukaryotic ORFeomes corresponded to codon-pair contexts formed by identical codons in both positions (in bold).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="pone-0000847-t002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.t002</object-id>
<label>Table 2</label>
<caption>
<title>General codon-pair contexts.</title>
</caption>
<graphic id="pone-0000847-t002-2" xlink:href="pone.0000847.t002"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0000847-t002-2">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="6" align="left" rowspan="1">Negative codon-pair contexts</td>
</tr>
<tr>
<td colspan="2" align="left" rowspan="1">EUBACTERIA</td>
<td colspan="2" align="left" rowspan="1">ARCHAEA</td>
<td colspan="2" align="left" rowspan="1">EUKARYOTA</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Max.</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Max.</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Max.</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">AUG>UAU</td>
<td align="left" rowspan="1" colspan="1">−2,443</td>
<td align="left" rowspan="1" colspan="1">GCU>AAC</td>
<td align="left" rowspan="1" colspan="1">−20,054</td>
<td align="left" rowspan="1" colspan="1">
<underline>ACU>AAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−36,764</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UGG>GCC</td>
<td align="left" rowspan="1" colspan="1">0,000</td>
<td align="left" rowspan="1" colspan="1">
<underline>ACU>AAG</underline>
</td>
<td align="left" rowspan="1" colspan="1">−16,118</td>
<td align="left" rowspan="1" colspan="1">UCU>AAG</td>
<td align="left" rowspan="1" colspan="1">−35,027</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AUG>UGA</td>
<td align="left" rowspan="1" colspan="1">0,000</td>
<td align="left" rowspan="1" colspan="1">ACC>AAA</td>
<td align="left" rowspan="1" colspan="1">−13,191</td>
<td align="left" rowspan="1" colspan="1">AUU>AAG</td>
<td align="left" rowspan="1" colspan="1">−30,44</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GUG>UCA</td>
<td align="left" rowspan="1" colspan="1">4,769</td>
<td align="left" rowspan="1" colspan="1">UGC>GCA</td>
<td align="left" rowspan="1" colspan="1">−11,122</td>
<td align="left" rowspan="1" colspan="1">AAU>AAG</td>
<td align="left" rowspan="1" colspan="1">−27,011</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UUC>GCA</td>
<td align="left" rowspan="1" colspan="1">8,384</td>
<td align="left" rowspan="1" colspan="1">CUC>GAG</td>
<td align="left" rowspan="1" colspan="1">−10,371</td>
<td align="left" rowspan="1" colspan="1">GCU>AAG</td>
<td align="left" rowspan="1" colspan="1">−26,357</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GGC>CAA</td>
<td align="left" rowspan="1" colspan="1">10,086</td>
<td align="left" rowspan="1" colspan="1">ACC>AAG</td>
<td align="left" rowspan="1" colspan="1">−9,663</td>
<td align="left" rowspan="1" colspan="1">UUU>AAG</td>
<td align="left" rowspan="1" colspan="1">−25,784</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UUG>UAC</td>
<td align="left" rowspan="1" colspan="1">11,826</td>
<td align="left" rowspan="1" colspan="1">CAC>AAA</td>
<td align="left" rowspan="1" colspan="1">−9,257</td>
<td align="left" rowspan="1" colspan="1">CCU>AAG</td>
<td align="left" rowspan="1" colspan="1">−25,695</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GUU>AGC</td>
<td align="left" rowspan="1" colspan="1">20,281</td>
<td align="left" rowspan="1" colspan="1">CCC>AAA</td>
<td align="left" rowspan="1" colspan="1">−9,050</td>
<td align="left" rowspan="1" colspan="1">UGU>AAG</td>
<td align="left" rowspan="1" colspan="1">−25,652</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GUA>UAC</td>
<td align="left" rowspan="1" colspan="1">39,931</td>
<td align="left" rowspan="1" colspan="1">UGC>GCU</td>
<td align="left" rowspan="1" colspan="1">−6,427</td>
<td align="left" rowspan="1" colspan="1">UAU>AAG</td>
<td align="left" rowspan="1" colspan="1">−25,582</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GCG>UAC</td>
<td align="left" rowspan="1" colspan="1">76,191</td>
<td align="left" rowspan="1" colspan="1">CCU>AUG</td>
<td align="left" rowspan="1" colspan="1">−6,140</td>
<td align="left" rowspan="1" colspan="1">AGU>AAG</td>
<td align="left" rowspan="1" colspan="1">−25,04</td>
</tr>
<tr>
<td colspan="6" align="left" rowspan="1">
<bold>Positive codon-pair contexts</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Min.</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Min.</td>
<td align="left" rowspan="1" colspan="1">Context</td>
<td align="left" rowspan="1" colspan="1">Min.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UAC>AAC</td>
<td align="left" rowspan="1" colspan="1">−5,974</td>
<td align="left" rowspan="1" colspan="1">GAC>UGG</td>
<td align="left" rowspan="1" colspan="1">14,515</td>
<td align="left" rowspan="1" colspan="1">
<bold>AAG>AAG</bold>
</td>
<td align="left" rowspan="1" colspan="1">67,019</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AUG>AGU</td>
<td align="left" rowspan="1" colspan="1">−8,173</td>
<td align="left" rowspan="1" colspan="1">
<underline>GGC>UGU</underline>
</td>
<td align="left" rowspan="1" colspan="1">13,185</td>
<td align="left" rowspan="1" colspan="1">
<bold>GCU>GCU</bold>
</td>
<td align="left" rowspan="1" colspan="1">51,04</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GUU>UCU</td>
<td align="left" rowspan="1" colspan="1">−8,426</td>
<td align="left" rowspan="1" colspan="1">
<underline>AAU>CCA</underline>
</td>
<td align="left" rowspan="1" colspan="1">12,265</td>
<td align="left" rowspan="1" colspan="1">
<bold>GGU>GGU</bold>
</td>
<td align="left" rowspan="1" colspan="1">34,927</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AAA>UAG</td>
<td align="left" rowspan="1" colspan="1">−10,902</td>
<td align="left" rowspan="1" colspan="1">GGC>UGG</td>
<td align="left" rowspan="1" colspan="1">9,454</td>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>AGA>AGA</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">29,651</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<underline>AAU>CCA</underline>
</td>
<td align="left" rowspan="1" colspan="1">−11,288</td>
<td align="left" rowspan="1" colspan="1">
<bold>UGG>UGG</bold>
</td>
<td align="left" rowspan="1" colspan="1">8,653</td>
<td align="left" rowspan="1" colspan="1">AAG>AAA</td>
<td align="left" rowspan="1" colspan="1">28,187</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<underline>AGA>AGA</underline>
</bold>
</td>
<td align="left" rowspan="1" colspan="1">−0,75</td>
<td align="left" rowspan="1" colspan="1">UUC>UGG</td>
<td align="left" rowspan="1" colspan="1">8,575</td>
<td align="left" rowspan="1" colspan="1">
<bold>AAC>AAC</bold>
</td>
<td align="left" rowspan="1" colspan="1">27,524</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AGU>UUU</td>
<td align="left" rowspan="1" colspan="1">−5,871</td>
<td align="left" rowspan="1" colspan="1">GUA>AAU</td>
<td align="left" rowspan="1" colspan="1">7,486</td>
<td align="left" rowspan="1" colspan="1">
<bold>AGC>AGC</bold>
</td>
<td align="left" rowspan="1" colspan="1">26,491</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AAG>UAA</td>
<td align="left" rowspan="1" colspan="1">−6,285</td>
<td align="left" rowspan="1" colspan="1">AAC>UGC</td>
<td align="left" rowspan="1" colspan="1">6,051</td>
<td align="left" rowspan="1" colspan="1">
<bold>UCU>UCA</bold>
</td>
<td align="left" rowspan="1" colspan="1">25,624</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GGC>UCU</td>
<td align="left" rowspan="1" colspan="1">−18,712</td>
<td align="left" rowspan="1" colspan="1">ACA>ACA</td>
<td align="left" rowspan="1" colspan="1">5,273</td>
<td align="left" rowspan="1" colspan="1">CCU>CCA</td>
<td align="left" rowspan="1" colspan="1">23,884</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GGG>CAU</td>
<td align="left" rowspan="1" colspan="1">−27,619</td>
<td align="left" rowspan="1" colspan="1">UGC>CCC</td>
<td align="left" rowspan="1" colspan="1">5,243</td>
<td align="left" rowspan="1" colspan="1">
<bold>CCU>CCU</bold>
</td>
<td align="left" rowspan="1" colspan="1">23,624</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt102">
<p>In order to determine whether there are general rules for codon-pair contexts, the contexts that were negative or positive in the highest number of species were identified and sorted by the maximum (negative contexts) or minimum (positive contexts) residual value found for each context, as shown above. As a consequence, contexts that have negative maximum values or positive minimum values have the same sign in all species of each domain (general rules). Codon-pair contexts that appeared more than once are underlined, while codon-pair contexts of identical codons are shown in bold. The major preference for codon repetitions in eukaryotes is clearly visible in the dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
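<p>The selection of general negative contexts summarized in Table 2 can be sketched as follows. The sketch assumes that each context maps to a list of residuals, one per species of a domain; the function name and the tie-breaking order are illustrative assumptions rather than the authors' exact procedure.</p>

```python
def rank_negative_contexts(residuals_by_context):
    """Order contexts by how many species reject them, then by maximum residual.

    Returns tuples (context, number_of_negative_species, maximum_residual).
    A context whose maximum residual is negative is negative in every species,
    i.e. it behaves as a general rule for that domain.
    """
    rows = []
    for context, residuals in residuals_by_context.items():
        n_negative = sum(1 for r in residuals if 0 > r)
        rows.append((context, n_negative, max(residuals)))
    # Most widely rejected first; among equals, the lowest maximum first.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


def is_general_negative(row):
    """True when the maximum residual of the row is below zero."""
    return 0 > row[2]
```

<p>Positive contexts can be treated symmetrically by counting positive residuals and sorting on the minimum residual instead.</p>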
</sec>
<sec id="s2c">
<title>Context preferences exist in coding and non-coding sequences</title>
<p>A large-scale codon-pair context comparison was carried out to visualize general context patterns, using clustering tools (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
). Interestingly, a red region, corresponding to negative residual values (rejected context), appeared across the 119 ORFeomes studied (blue box in
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
). These rejected codon-pairs were of the general type NNU
<sub>3</sub>
-A
<sub>1</sub>
NN, where N represents any base. Other general patterns in the map represented either preferred codon-pairs in Archaea and Eukarya (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
; region-Y), rejected codon-pairs in Archaea and Eukarya (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
; region-Z) or strongly rejected codon-pairs in Eubacteria (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
; region-X).</p>
<fig id="pone-0000847-g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Nucleotide context preferences can be detected in total genome sequences.</title>
<p>A large-scale map for codon-pair context was produced using either the ORFeome (panel A) or the total genome (panel B) sequences of 119 species (see
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1</xref>
and Methods). The resulting patterns are either universal, i.e. present in every species, or visible only in particular phylogenetic groups. Surprisingly, most of the ORFeome patterns were also present in total genome sequences, implying that the major forces that drive the evolution of coding sequences are not necessarily connected to mRNA translation. Moreover, when a Differential Display Map (DDM) was built to compare the two maps (panel C), it became clear that eukaryotes behave more heterogeneously: they showed greater resemblance between coding and non-coding sequences (darker pattern in the DDM), but they also produced the largest differences found in the DDM (*). These differences correspond either to two-codon context rules imposed by the translational machinery, and hence specific to ORFeomes, or to genome biases that are strongly repressed in coding sequences, where they are probably associated with increased decoding error rates. ORFeomes were arranged in the map by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in
<xref ref-type="supplementary-material" rid="pone.0000847.s002">Figure S2</xref>
. Adjusted residuals are colored in the maps so that green cells correspond to preferred and red cells to rejected contexts, while in the DDM major differences (above 15) between residuals of the previous maps are shown in light blue.</p>
</caption>
<graphic xlink:href="pone.0000847.g003"></graphic>
</fig>
<p>In order to evaluate whether those general codon-pair context patterns arose from DNA replication biases, a second large-scale comparative map was built using complete genome sequences (coding + non-coding) of the 119 organisms under study. For this, ANACONDA 2.0 scanned full chromosome sequences starting at the first six nucleotides and moved the scanning window three nucleotides at each step. In this way, both coding and non-coding sequences were analyzed and the frequency of all hexanucleotides was computed regardless of the DNA strand or the reading frame of coding sequences, i.e. ORFs were scanned in whichever frame (0, +1 or +2) they happened to fall. This full genome context map (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3B</xref>
) showed patterns that were also observed in the ORFeome map (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
), confirming that DNA replication biases strongly influence codon-pair context. Since the difference between full genome (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3B</xref>
) and ORFeome (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
) codon context maps could separate global genome biases from translational biases, a DDM was built and the differences between the two were colored using a blue color scale (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3C</xref>
), as before (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2</xref>
). The DDM showed significant differences between the full genome and ORFeome maps, indicating that codon-pair context is also influenced by evolutionary forces that are not related to DNA replication biases. Interestingly, the column corresponding to eukaryotes was generally darker than the rest of the map, meaning that coding and non-coding sequences are more similar in eukaryotes (i.e. stronger influence of DNA replication biases). However, the eukaryotic region of the DDM also included the highest differences between ORFeomes and genomes (marked with * in
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3C</xref>
), suggesting that the eukaryotic translational machinery also imposes strong selective pressure on specific combinations of codons, resulting in a localized higher divergence between coding and non-coding sequences.</p>
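<p>The genome scan described above (a six-nucleotide window moved three nucleotides at a time) can be sketched as follows. This is a minimal illustration, not ANACONDA 2.0 itself; the function name, the conversion of T to U and the skipping of windows with ambiguous bases are assumptions made for the example.</p>

```python
from collections import Counter


def count_hexanucleotides(sequence, step=3):
    """Count hexanucleotides as (first triplet, second triplet) pairs.

    The window starts at the first six nucleotides and advances `step`
    nucleotides each time, without regard to strand or reading frame.
    """
    seq = sequence.upper().replace("T", "U")  # assumption: report in RNA alphabet
    counts = Counter()
    for start in range(0, len(seq) - 5, step):
        hexamer = seq[start:start + 6]
        # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
        if not set("ACGU").issuperset(hexamer):
            continue
        counts[(hexamer[:3], hexamer[3:])] += 1
    return counts
```

<p>Applied to an ORF in frame 0, this reproduces codon-pair counting; applied to a whole chromosome, the triplets fall in arbitrary frames relative to the genes, which is the situation described for the full genome map.</p>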
</sec>
<sec id="s2d">
<title>Codon-pair context is influenced by genome and mRNA translation biases</title>
<p>Since DNA replication biases are partly visible at the dinucleotide level
<xref ref-type="bibr" rid="pone.0000847-Campbell1">[30]</xref>
<xref ref-type="bibr" rid="pone.0000847-Hooper1">[32]</xref>
, we constructed individual codon-pair context maps in which rows and columns were sorted to separate P-site codons ending with a particular nucleotide (N3; rows) and A-site codons starting with a particular nucleotide (N1; columns) (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
). These two consecutive positions of the codon-pair context discriminated codon-pair preferences rather well, and this discrimination was very strong for higher eukaryotes and weak for lower eukaryotes and bacteria (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
). In order to determine whether such dinucleotide bias was linked to translational selection or to overall genome dinucleotide preferences, the dinucleotide bias was determined for the full set of 119 genomes under study (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4B</xref>
). Overall, rejection of UpA dinucleotides in the 3 domains of life was evident, a trend that corresponded to the negative codon-pair context rule (NNU
<sub>3</sub>
-A
<sub>1</sub>
NN) described above. The overall dinucleotide biases were also in agreement with the codon-pair context pattern (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
). For example, the rejection of CpG dinucleotides in higher eukaryotes (with the surprising exception of the honeybee,
<italic>Apis mellifera</italic>
) was also observed in NNC
<sub>3</sub>
-G
<sub>1</sub>
NN codon-pairs (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
). Other examples were the UpG and CpA dinucleotides, which were strongly preferred in higher eukaryotes (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4B</xref>
), a characteristic that was also reflected in codon-pair context maps (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
). Finally, the dinucleotide biases (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4B</xref>
) showed an overall preference for ApA and UpU dinucleotides. This feature originated from frequent tandem repeats of 3 or more identical bases (
<xref ref-type="supplementary-material" rid="pone.0000847.s015">Figure S4</xref>
).</p>
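<p>A dinucleotide bias of the kind discussed above can be estimated by comparing observed dinucleotide frequencies with those expected from single-base frequencies. The sketch below uses an observed/expected odds ratio, which is an assumption chosen for illustration and may differ from the measure behind Figure 4B; the function name is hypothetical.</p>

```python
from collections import Counter


def dinucleotide_odds_ratios(sequence):
    """Return observed/expected frequency ratios for each dinucleotide present.

    Expected frequency of XpY is f(X) * f(Y), computed from base composition.
    Ratios below 1 indicate under-representation, above 1 over-representation.
    Dinucleotides that never occur are absent from the result.
    """
    seq = sequence.upper().replace("T", "U")  # assumption: RNA alphabet
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_bases = len(seq)
    n_pairs = len(seq) - 1
    ratios = {}
    for pair, observed in pair_counts.items():
        expected = (base_counts[pair[0]] / n_bases) * (base_counts[pair[1]] / n_bases)
        ratios[pair] = (observed / n_pairs) / expected
    return ratios
```

<p>On a real genome, a strongly rejected dinucleotide such as UpA would be expected to yield a ratio clearly below 1, and preferred dinucleotides a ratio above 1.</p>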
<fig id="pone-0000847-g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g004</object-id>
<label>Figure 4</label>
<caption>
<title>Influence of dinucleotide bias on the codon-pair context preferences.</title>
<p>A) In order to highlight the influence of dinucleotide bias on codon-pair contexts, the maps of
<italic>H. sapiens</italic>
,
<italic>M. musculus</italic>
,
<italic>S. cerevisiae</italic>
and
<italic>E. coli</italic>
were arranged according to their (N
<sub>3</sub>
-N
<sub>1</sub>
) context. A high degree of context discrimination was achieved by these two positions in higher eukaryotes, especially for the dinucleotide CpG (blue square); however, this effect was weak in yeast, and
<italic>E. coli</italic>
showed an opposite preference pattern (green). Adjusted residuals are colored in the maps so that green cells correspond to preferred and red cells to rejected contexts. B) In order to further evaluate the role of the dinucleotide N
<sub>3</sub>
-N
<sub>1</sub>
bias on codon-pair context biases, dinucleotide preferences were determined using total genome sequences. The dinucleotide combinations with the highest bias are displayed in green (preferred dinucleotides) or red (rejected ones) and correspond to dinucleotides that appear 1% above or below the expected level, respectively. The UpA dinucleotide is strongly repressed throughout all domains of life. Other constraints imposed on ORFeomes by genome biases include the rejection of CpG dinucleotides in higher eukaryotes and the accumulation of CpA and UpG in higher eukaryotes or UpU and ApA in almost all organisms. The last preference is related to the high number of tandem repeats of more than 3 consecutive Us or As (
<xref ref-type="supplementary-material" rid="pone.0000847.s015">Figure S4</xref>
). ORFeomes were arranged in both maps by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in
<xref ref-type="supplementary-material" rid="pone.0000847.s002">Figure S2</xref>
.</p>
</caption>
<graphic xlink:href="pone.0000847.g004"></graphic>
</fig>
<p>The only universal rule detected in the large-scale comparison (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3</xref>
) contained codon pairs of the type NNU
<sub>3</sub>
-A
<sub>1</sub>
NN. Since this rule included out-of-frame stop codons, namely UAA or UAG (i.e. NNU
<sub>3</sub>
-A
<sub>1</sub>
A
<sub>2</sub>
N or NNU
<sub>3</sub>
-A
<sub>1</sub>
G
<sub>2</sub>
N), we investigated whether NNU
<sub>3</sub>
-A
<sub>1</sub>
NN rejection was related to premature translation termination. For this, we constructed a subset of codon-pair context maps in which the contexts containing out-of-frame stop codons were represented (
<xref ref-type="fig" rid="pone-0000847-g005">Figure 5</xref>
). This approach showed that NNU
<sub>3</sub>
-A
<sub>1</sub>
A
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
G
<sub>2</sub>
N type contexts were indeed the most negative in almost all ORFeomes. However, NNU
<sub>3</sub>
-G
<sub>1</sub>
A
<sub>2</sub>
N; NU
<sub>2</sub>
A
<sub>3</sub>
-A
<sub>1</sub>
NN and NU
<sub>2</sub>
G
<sub>3</sub>
-A
<sub>1</sub>
NN contexts, which also contained out-of-frame stop codons, had a majority of positive residual values (green), while NNU
<sub>3</sub>
-A
<sub>1</sub>
C
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
U
<sub>2</sub>
N contexts that did not contain out-of-frame stop codons had a majority of negative residual values (red). Since some of the positive context rules (
<xref ref-type="fig" rid="pone-0000847-g005">Figure 5</xref>
) included the dinucleotide UpA, which was rejected in the total genomes map (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4</xref>
), it is likely that dinucleotide bias is not the only cause for the rejection of codon-pair contexts. On the other hand, premature termination was not the only potential problem here, because NNU
<sub>3</sub>
-A
<sub>1</sub>
C
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
U
<sub>2</sub>
N did not correspond to out-of-frame stop codons and were also strongly rejected in ORFeomes (
<xref ref-type="fig" rid="pone-0000847-g005">Figure 5</xref>
).</p>
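<p>The classification of codon-pair contexts by out-of-frame stop codons can be illustrated with a minimal sketch. It assumes the standard genetic code (UAA, UAG and UGA as stops); the function names are hypothetical and are not part of ANACONDA. For each pattern discussed above, it checks whether every matching hexanucleotide carries a stop codon in the +1 or +2 reading frame.</p>
<preformat>
```python
from itertools import product

STOPS = {"UAA", "UAG", "UGA"}  # assumption: standard genetic code
BASES = "UCAG"

def out_of_frame_stops(codon1, codon2):
    """Return stop codons read in the +1 and +2 frames of a codon pair."""
    hexamer = codon1 + codon2
    shifted = (hexamer[1:4], hexamer[2:5])  # the two triplets spanning the junction
    return [triplet for triplet in shifted if triplet in STOPS]

def expand(pattern):
    """List every hexamer matching a pattern such as 'NNUAAN' (N = any base)."""
    choices = [BASES if symbol == "N" else symbol for symbol in pattern]
    return ["".join(combo) for combo in product(*choices)]

def always_has_stop(pattern):
    """True if every hexamer of the pattern carries an out-of-frame stop."""
    return all(out_of_frame_stops(h[:3], h[3:]) for h in expand(pattern))

# Patterns named in the text: six with out-of-frame stops, two without.
for pattern in ("NNUAAN", "NNUAGN", "NNUGAN", "NUAANN",
                "NUAGNN", "NUGANN", "NNUACN", "NNUAUN"):
    print(pattern, always_has_stop(pattern))
```
</preformat>
<p>In-frame stop codons in the first position of the pair are not filtered out in this sketch, since they do not change which patterns carry out-of-frame stops.</p>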
<fig id="pone-0000847-g005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g005</object-id>
<label>Figure 5</label>
<caption>
<title>Genome dinucleotide bias has a strong influence on codon-pair context.</title>
<p>Since the most generalized negative codon-pair context rule detected corresponds to the general expression NNU
<sub>3</sub>
-A
<sub>1</sub>
NN, which includes the out-of-frame translation termination contexts NNU
<sub>3</sub>
-A
<sub>1</sub>
A
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
G
<sub>2</sub>
N, other out-of-frame context terminators were analyzed separately. For this, the adjusted residuals of such contexts were included in an ORFeome comparison map. It was clear that NNU
<sub>3</sub>
-A
<sub>1</sub>
GN and NNU
<sub>3</sub>
-A
<sub>1</sub>
AN were indeed the most negative codon-pair contexts bearing out-of-frame stops, followed by NUA
<sub>3</sub>
-G
<sub>1</sub>
NN. The other groups of contexts tested did not generate codon-pair context rules, although some of them contained the strongly repressed UpA dinucleotide. The hypothesis that rejection of codon-pair contexts containing out-of-frame stop codons, namely NNU
<sub>3</sub>
-A
<sub>1</sub>
A
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
G
<sub>2</sub>
N, evolved to avoid premature termination was partially contradicted by the existence of similar patterns of NNU
<sub>3</sub>
-A
<sub>1</sub>
NN-type contexts that do not include any out-of-frame stops, namely NNU
<sub>3</sub>
-A
<sub>1</sub>
C
<sub>2</sub>
N and NNU
<sub>3</sub>
-A
<sub>1</sub>
U
<sub>2</sub>
N. ORFeomes were arranged in the map by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in
<xref ref-type="supplementary-material" rid="pone.0000847.s002">Figure S2</xref>
. Adjusted residuals are colored in the maps so that green cells correspond to preferred and red cells to rejected contexts.</p>
</caption>
<graphic xlink:href="pone.0000847.g005"></graphic>
</fig>
</sec>
<sec id="s2e">
<title>General codon-pair context rules</title>
<p>In order to highlight the codon-pair context preferences that were exclusive to coding sequences, the original map of ORFeomes (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3A</xref>
) was rebuilt to show in black those cells whose residual values were similar to those of identical contexts in the complete genomes map (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3B</xref>
). In this filtered map (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A</xref>
) green and red colored cells corresponded to those context residual values calculated for ORFeomes that were significantly different from those calculated for complete genomes, i.e. cells that were colored in blue in
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3C</xref>
. This large-scale comparative map allowed extraction of ORFeome-specific codon-context patterns, while the converse filtering produced a complete genomes map that permitted extraction of genome-specific patterns (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
). This approach highlighted clear codon-pair context differences between ORFeome and complete genome maps (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A,B</xref>
). Interestingly, these patterns corresponded to different sets of codon-pair contexts that could be easily described by the expressions annotated on the side of each map (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A,B</xref>
; see also
<xref ref-type="supplementary-material" rid="pone.0000847.s016">Figure S5A</xref>
,
<xref ref-type="supplementary-material" rid="pone.0000847.s017">B</xref>
for different thresholds of visualization). In almost all cases, these codon-pair context rules fixed the last nucleotide of the first codon and the first nucleotide of the second codon, confirming that (N
<sub>3</sub>
-N
<sub>1</sub>
) positions shape codon context. Remarkably, the major patterns that appeared in the filtered map for complete genomes (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
) were related to UpA-rich hexanucleotides that produce weak codon-anticodon interactions in coding regions and should hence be under differential selective pressure in both types of sequences.</p>
<fig id="pone-0000847-g006" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0000847.g006</object-id>
<label>Figure 6</label>
<caption>
<title>Some codon-pair context patterns are associated with mRNA primary structure biases.</title>
<p>A) In order to identify ORFeome-specific codon-pair context biases, the two large-scale context maps were filtered in such a way that only cells that yielded residual differences above 15 between the ORFeome and total genome sequences were shown. All other cases were colored in black (see
<xref ref-type="supplementary-material" rid="pone.0000847.s016">Figure S5</xref>
for different display thresholds). Codon-pair context patterns specific to ORFeomes are highlighted on the side of panel A. B) To visualize the patterns that appear in genomes and are absent in ORFeomes, large-scale comparative maps obtained with total genomes and ORFeomes were subtracted and only the cells that yielded differences above 15 were displayed. This highlighted patterns that are strongly preferred or repressed in coding sequences and may correspond to mistranslation hot spots.</p>
</caption>
<graphic xlink:href="pone.0000847.g006"></graphic>
</fig>
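<p>The filtering step used for Figure 6 can be sketched as follows. This is an illustration only: the dictionary layout, the function name and the use of None for a black cell are assumptions, and the residual values in the usage example are invented. A cell keeps its ORFeome residual only when it differs from the residual of the same context in the complete genome by more than the display threshold (15 in Figure 6).</p>
<preformat>
```python
def filter_map(orfeome, genome, threshold=15.0):
    """Keep residuals that differ from the genome map by more than threshold.

    Both arguments map a codon-pair context to its adjusted residual.
    Cells that are too similar are returned as None (drawn in black).
    """
    filtered = {}
    for context, value in orfeome.items():
        difference = abs(value - genome[context])
        filtered[context] = value if difference > threshold else None
    return filtered

# Invented residuals for two contexts of one species.
orfeome_row = {("GCU", "AAG"): -42.0, ("GCC", "AAG"): 6.0}
genome_row = {("GCU", "AAG"): -40.0, ("GCC", "AAG"): -12.0}
print(filter_map(orfeome_row, genome_row))
```
</preformat>
<p>Swapping the two arguments gives the converse filtering, which retains the genome residuals instead.</p>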
</sec>
</sec>
<sec id="s3">
<title>Discussion</title>
<p>Mistranslation is a poorly understood biological phenomenon which is influenced by various protein synthesis factors and mRNA primary structure features
<xref ref-type="bibr" rid="pone.0000847-Ogle1">[13]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Hooper2">[33]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Stahl1">[34]</xref>
. In order to shed new light on how the latter influences decoding error and extend previous studies carried out mainly on the effect of codon usage on mistranslation
<xref ref-type="bibr" rid="pone.0000847-Kramer1">[25]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Dix1">[35]</xref>
, we are investigating the effect of codon-pair context on decoding fidelity. Our comparative genomics approaches unveiled the effect of both genome replication and translation specific biases on codon-pair context. The few studies carried out to date on codon-pair context were unable to distinguish those two types of biases
<xref ref-type="bibr" rid="pone.0000847-Boycheva1">[12]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Gutman1">[36]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Rocha1">[38]</xref>
. Our large scale approach confirmed the importance of genomic biases but also unveiled important translational biases that shape codon-pair context and should be primary targets for mistranslation hot spots.</p>
<p>Large-scale genomic analysis, such as the one we have performed, provides a global view of mistranslation that is out of reach of analyses of single ORFeomes. Indeed, comparison of large sets of codon-pair context data unveiled the main codon-pair context patterns that exist in the 3 domains of life. Interestingly, when the most preferred or repressed codon-pair contexts of all organisms were considered (
<xref ref-type="table" rid="pone-0000847-t001">Table 1</xref>
), but also when common rules were selected (
<xref ref-type="table" rid="pone-0000847-t002">Table 2</xref>
), there was little or no overlap between the context patterns of the 3 domains of life. This suggests that genome replication and/or mRNA translation in each domain imposes specific constraints on decoding sequences, which produce different codon-pair context outcomes. Also, the phylogeny of individual species appeared as an important determinant of their codon-pair context behavior (
<xref ref-type="fig" rid="pone-0000847-g002">Figure 2</xref>
), in a similar manner to that described for codon usage bias
<xref ref-type="bibr" rid="pone.0000847-Grantham1">[39]</xref>
or dinucleotide genome signatures
<xref ref-type="bibr" rid="pone.0000847-Nakashima1">[31]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Hooper1">[32]</xref>
.</p>
<sec id="s3a">
<title>Influence of genome-wide biases on codon-pair context</title>
<p>Our observation that ORFeomes and total genomes produce similar patterns of codon-pair context (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3</xref>
) confirmed previous studies
<xref ref-type="bibr" rid="pone.0000847-Chen1">[4]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Buckingham2">[40]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-McVean1">[41]</xref>
. This implies that most sequence constraints that can be detected in coding sequences are not imposed by the translational machinery, but arise from selective pressure imposed by DNA replication and related biases. That codon usage biases were mainly due to mutational pressure and only secondarily to translational selection further confirmed the relevance of DNA replication biases on codon-pair context
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
. In this scenario, one is prompted to hypothesize that the translational process may work with sub-optimized mRNA sequences since codon-context fine-tunes decoding fidelity
<xref ref-type="bibr" rid="pone.0000847-Shah1">[15]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Murgola1">[22]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Tork1">[23]</xref>
.</p>
<p>Genomes are known to have biased dinucleotide frequencies
<xref ref-type="bibr" rid="pone.0000847-Nakashima1">[31]</xref>
, a feature that has frequently been used to produce genomic signatures of phylogenetic and taxonomic relevance
<xref ref-type="bibr" rid="pone.0000847-Nakashima1">[31]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Hooper1">[32]</xref>
. At the ORFeome level this bias influences codon usage
<xref ref-type="bibr" rid="pone.0000847-Hooper1">[32]</xref>
but may also interfere with codon-context, whenever the last nucleotide of one codon is associated with the first nucleotide of the second codon of the pair. Indeed, (N
<sub>3</sub>
–N
<sub>1</sub>
) contexts explained part of our results (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6</xref>
) and confirmed the good discrimination obtained when one ORFeome map for codon context was arranged according to the last position of the first codon and the first position of the next codon (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4A</xref>
).</p>
<p>The association bias of two consecutive nucleotides is a characteristic of genomes which results from global selective pressures acting upon DNA at the level of repair and replication mechanisms
<xref ref-type="bibr" rid="pone.0000847-Hooper1">[32]</xref>
or ecological constraints that may influence, for instance, the overall G+C content of the genome
<xref ref-type="bibr" rid="pone.0000847-Lao1">[42]</xref>
–
<xref ref-type="bibr" rid="pone.0000847-Tekaia1">[44]</xref>
. Regulatory activity acting upon the entire genome is another cause of dinucleotide bias. CpG dinucleotides, for example, are signals for DNA methylation, a mechanism commonly used by higher organisms to protect their genome from selfish DNA elements and to regulate gene expression
<xref ref-type="bibr" rid="pone.0000847-Chan1">[5]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Robertson1">[6]</xref>
. Our dinucleotide bias analysis for the 119 organisms confirmed a clear rejection of CpG methylation in coding sequences of higher eukaryotes, as would be expected, since methylated DNA becomes unavailable for transcription and hence translation
<xref ref-type="bibr" rid="pone.0000847-Chan1">[5]</xref>
. On the other hand, UpA dinucleotides are highly repressed in DNA sequences of most organisms
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Nakashima1">[31]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Beutler1">[45]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Nakashima2">[46]</xref>
. Interestingly, UpA dinucleotides are sites for preferential hydrolysis of RNA by macrophage ribonucleases
<xref ref-type="bibr" rid="pone.0000847-Beutler1">[45]</xref>
destabilizing RNA molecules
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
and should hence be avoided
<xref ref-type="bibr" rid="pone.0000847-Beutler1">[45]</xref>
. Furthermore, Duan and colleagues
<xref ref-type="bibr" rid="pone.0000847-Duan1">[7]</xref>
proposed that mRNA stability imposes strong selective pressure on synonymous codon usage and it is likely that this is also true for codon-pair context. Our data confirmed that hypothesis since NNU
<sub>3</sub>
-A
<sub>1</sub>
NN contexts were highly repressed in the 119 different genomes analyzed.</p>
</sec>
<sec id="s3b">
<title>Influence of translational biases on codon-pair context</title>
<p>As already mentioned, the only universal rule that could be detected in the 119 genomes analyzed was rejection of most codon-pair contexts of the type NNU
<sub>3</sub>
-A
<sub>1</sub>
NN (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3</xref>
). Clearly, this trend is a direct result of repression of the TpA dinucleotide in total DNA sequences (
<xref ref-type="fig" rid="pone-0000847-g003">Figure 3</xref>
). However, it was surprising that other UpA-bearing contexts did not show strong rejection. For example, NU
<sub>2</sub>
A
<sub>3</sub>
-A
<sub>1</sub>
NN contexts are mainly preferred in coding sequences (
<xref ref-type="fig" rid="pone-0000847-g005">Figure 5</xref>
) indicating strong differences between codon-pairs containing UpA dinucleotides and suggesting that translation does influence codon-pair choice.</p>
<p>When a large-scale comparison of codon-pair context excluded global genome biases (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A</xref>
) it became evident that contexts that were truly produced by translation-driven selection were grouped in negative or positive rules depending on the phylogeny of the organisms. This was in agreement with the previous observation that strongly biased codon-pair contexts were different between the 3 domains of life (
<xref ref-type="table" rid="pone-0000847-t001">Tables 1</xref>
,
<xref ref-type="table" rid="pone-0000847-t002">2</xref>
), and supported the hypothesis that differences in the translational machineries of different organisms reshape mRNA primary structure in different ways. For example, the NNC
<sub>3</sub>
-N
<sub>4</sub>
NN context pattern of higher eukaryotes (B1 and B2 in
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A</xref>
) could be explained by specific decoding rules of C-ending codons in Eukarya. Indeed, eukaryotic species translate several C-ending codons by wobble pairing rules using inosine
<xref ref-type="bibr" rid="pone.0000847-Marck1">[47]</xref>
, which recognizes A, C, or U at the wobbling position
<xref ref-type="bibr" rid="pone.0000847-Crick1">[48]</xref>
while bacterial species decode most C-ending codons with Watson-Crick C-G base pairing between codon and anticodon
<xref ref-type="bibr" rid="pone.0000847-Marck1">[47]</xref>
.</p>
<p>As to the other minor rules highlighted on the left side of
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6</xref>
, namely NNU
<sub>3</sub>
-G
<sub>1</sub>
G
<sub>2</sub>
R, NRU
<sub>3</sub>
-G
<sub>1</sub>
A
<sub>2</sub>
N and NG
<sub>2</sub>
N-NG
<sub>2</sub>
N or NC
<sub>2</sub>
N-NC
<sub>2</sub>
N, they may be related to both canonical decoding of U-ending codons in eukaryotes (A1 and A2 in
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A</xref>
) and to the existence of runs of special sets of amino acids, namely serine/proline/threonine/alanine and arginine/glycine (C in
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6A</xref>
). That contexts of repeated codons are preferred in eukaryotic genes (
<xref ref-type="table" rid="pone-0000847-t001">Tables 1</xref>
,
<xref ref-type="table" rid="pone-0000847-t002">2</xref>
) and that proline, alanine and glycine are frequently found in amino acid runs of human genes
<xref ref-type="bibr" rid="pone.0000847-Caburet1">[49]</xref>
corroborate the above hypothesis.</p>
<p>On the other hand, most of the major genomic constraints that were not present in coding sequences, namely NNU
<sub>3</sub>
-A
<sub>1</sub>
NN, NYU
<sub>3</sub>
-A
<sub>1</sub>
RN and N(U/A)
<sub>2</sub>
U
<sub>3</sub>
-U
<sub>1</sub>
(U/A)
<sub>2</sub>
N or N(U/A)
<sub>2</sub>
A
<sub>3</sub>
-A
<sub>1</sub>
(U/A)
<sub>2</sub>
N rules (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
) were associated with weak decoding interactions involving A-U codon-anticodon pairing. Moreover, these rules are produced by either strong genomic dinucleotide bias against UpA (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
, rule D) or by rejection of error-prone UA-rich codon-pair contexts in coding sequences (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
, rule F), in a clear confirmation of the additive effect of translational and non-translational selective pressures. Finally, we could also see a preference for trinucleotide repeats in non-coding sequences that was not detectable in coding regions, at least in eukaryotes (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6B</xref>
, rule E). This has already been described in primates and is related to strong mRNA primary structure constraints associated with high mRNA decoding efficiency
<xref ref-type="bibr" rid="pone.0000847-Borstnik1">[50]</xref>
.</p>
</sec>
<sec id="s3c">
<title>Conclusions</title>
<p>Codon-pair contexts are biased in ORFeomes and such bias is the result of both translation and non-translation driven processes. Indeed, translation, DNA replication/repair and cis regulatory elements act synergistically on codon-pair context. This myriad of selective pressures makes it difficult to identify codon-context biases associated with mRNA translation only. Our large scale comparative genome approach indicated that: i) there is a strong influence of non-translational selective pressures upon coding sequences, especially in eukaryotic organisms since these have a higher degree of resemblance between ORFeome and total genome biases; ii) the strongest non-translational selective pressures that could be identified were dinucleotide biases, mainly imposed by regulatory cis-elements linked to DNA methylation or mRNA stability
<xref ref-type="bibr" rid="pone.0000847-Chan1">[5]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Beutler1">[45]</xref>
, and preference for trinucleotide repeats, usually associated with DNA polymerase slippage during replication
<xref ref-type="bibr" rid="pone.0000847-Rocha2">[51]</xref>
; iii) apart from this non-translational noise, DNA coding sequences showed specific features that could be related to mRNA translation, namely repression of usage of premature termination or error-prone contexts associated with weak codon-anticodon interactions. It will now be most interesting to validate these
<italic>in silico</italic>
data
<italic>in vivo</italic>
, and identify experimentally the codon-pair contexts that are strongly selected for high mRNA decoding fidelity.</p>
</sec>
</sec>
<sec sec-type="methods" id="s4">
<title>Methods</title>
<sec id="s4a">
<title>Primary data sources</title>
<p>Nucleotide sequences of complete genomes and assembled ORFeomes were downloaded from the GenBank or Ensembl Web sites (GenBank:
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nih.gov/genomes/">ftp://ftp.ncbi.nih.gov/genomes/</ext-link>
; Ensembl:
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ensembl.org/pub/">ftp://ftp.ensembl.org/pub/</ext-link>
) between December 2005 and January 2006. These included the DNA sequences of 81 eubacterial, 18 archaeal and 20 eukaryotic species. Plasmid sequences were not included in the analysis and all chromosomal sequences from one genome were analyzed together by ANACONDA 2.0. The total set of files downloaded and used in this study is documented as supplementary data (
<xref ref-type="supplementary-material" rid="pone.0000847.s002">Figure S2</xref>
).</p>
</sec>
<sec id="s4b">
<title>Statistical analyses</title>
<p>Two-codon context bias was studied in complete ORFeome sequences using the residual analysis tools available in the software package ANACONDA 1.0 (a detailed explanation of this software can be found in
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
,
<xref ref-type="bibr" rid="pone.0000847-Pinheiro1">[29]</xref>
. ANACONDA is publicly available at
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.ua.pt/submited-papers">http://bioinformatics.ua.pt/submited-papers</ext-link>
).</p>
<p>Briefly, this methodology counts all consecutive pairs of codons and uses statistical analysis for contingency tables where a multinomial distribution is assumed (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1B</xref>
). The final result of this statistical approach is the calculation of adjusted residuals for each codon pair present in any ORFeome. The adjusted residuals give direct information about preference or rejection of these codon pairs in relation to what would be expected assuming independence of the distribution (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1B</xref>
).</p>
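<p>As an illustration of this step, the sketch below counts consecutive codon pairs and computes adjusted residuals for the resulting contingency table. It uses the usual definition of the adjusted residual, d = (n - e) / sqrt(e (1 - r/N)(1 - c/N)), where n is the observed count, e = r c / N the expected count under independence, r and c the row and column totals and N the total number of pairs. The function names are hypothetical and this is not the ANACONDA implementation.</p>
<preformat>
```python
from collections import Counter
from math import sqrt

def codon_pairs(orf):
    """Split an ORF into codons and return all consecutive codon pairs."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return list(zip(codons, codons[1:]))

def adjusted_residuals(pairs):
    """Adjusted residual of every (first codon, second codon) cell."""
    counts = Counter(pairs)
    total = sum(counts.values())
    row = Counter(first for first, _ in pairs)    # totals per first codon
    col = Counter(second for _, second in pairs)  # totals per second codon
    residuals = {}
    for first in row:
        for second in col:
            expected = row[first] * col[second] / total
            variance = (expected
                        * (1 - row[first] / total)
                        * (1 - col[second] / total))
            if variance > 0:  # skip degenerate tables with a single row or column
                observed = counts[(first, second)]
                residuals[(first, second)] = (observed - expected) / sqrt(variance)
    return residuals

print(codon_pairs("AUGGCUAAGGCUAAGUAA"))
```
</preformat>
<p>Positive residuals indicate preferred codon pairs and negative residuals indicate rejected ones, relative to independence between the two codons.</p>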
<p>Since, under independence between two consecutive codons, the adjusted residuals
<italic>d
<sub>ij</sub>
</italic>
have a standardized normal probability distribution
<xref ref-type="bibr" rid="pone.0000847-Haberman1">[52]</xref>
, we have concluded that:
<inline-formula>
<inline-graphic xlink:href="pone.0000847.e001.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, as the total number of observations is very high. This means that, for a 99.73% confidence level, an adjusted residual was statistically significant if its absolute value was greater than 3
<xref ref-type="bibr" rid="pone.0000847-Moura1">[20]</xref>
. However, this approach was based on a local analysis for each residual value. Herein, we considered a global analysis for each species and have thus constructed a simultaneous confidence region for all residual values. Since there are K = 61×64 different intervals we have introduced the Bonferroni correction to ensure an overall level of significance of α (usually α = 0.05, 0.01, 0.001). Under the Bonferroni correction, each interval is constructed at a 100×(1–α/K)% level (see, for example,
<xref ref-type="bibr" rid="pone.0000847-Simenoff1">[53]</xref>
). Therefore, an interval from −
<italic>a</italic>
to
<italic>a</italic>
at a confidence level of 100×(1−α/(61×64))% was constructed for each adjusted residual value
<italic>d
<sub>ij</sub>
</italic>
. Considering again the asymptotic normal distribution of
<italic>d
<sub>ij</sub>
</italic>
<xref ref-type="bibr" rid="pone.0000847-Haberman1">[52]</xref>
we had
<italic>a</italic>
≈4.70341 when 1–α = 0.99,
<italic>a</italic>
≈5.15350 when 1–α = 0.999,
<italic>a</italic>
≈8.16204 when 1–α = 0.01×10
<sup>−10</sup>
. Thus, we assumed that the codon-pair adjusted residuals that fall within the interval −5 to 5 were not statistically significant, for a global confidence level of 99% (colored in black in all maps shown herein).</p>
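<p>The Bonferroni-corrected threshold can be recomputed with a short sketch. It assumes symmetric two-sided intervals, so that each tail receives α/(2K); this assumption is consistent with the values quoted above but is not stated explicitly in the text. The function name is hypothetical. The lower tail is inverted and negated to avoid the loss of precision that occurs when a probability very close to 1 is represented in floating point.</p>
<preformat>
```python
from statistics import NormalDist

def bonferroni_threshold(alpha, n_intervals=61 * 64):
    """Half-width a of the interval (-a, a) under a Bonferroni correction.

    alpha is the overall significance level and n_intervals is K, the
    number of simultaneous intervals (61 x 64 codon pairs by default).
    Two-sided intervals are assumed.
    """
    tail = alpha / (2 * n_intervals)
    return -NormalDist().inv_cdf(tail)

for alpha in (0.01, 0.001):
    print(alpha, round(bonferroni_threshold(alpha), 3))
```
</preformat>
<p>With a single interval (K = 1) and α = 0.0027 the same function gives a value close to 3, the local threshold quoted above for a 99.73% confidence level.</p>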
<p>The final output of residual analysis performed by ANACONDA is a codon-pair context map for each ORFeome being studied (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1C</xref>
). These maps show one colored square for each codon-pair, the first codon corresponding to rows and the second corresponding to columns in the map. The color scale chosen for this layout determines that preferred contexts are shown in green while repressed ones appear in red (
<xref ref-type="fig" rid="pone-0000847-g001">Figure 1C</xref>
).</p>
<p>Taking advantage of the automated statistical analysis performed by ANACONDA, individual maps for all 119 ORFeomes were built (see Figure S3). In order to facilitate large-scale comparison of maps these were converted into single lines and clustered together (
<xref ref-type="fig" rid="pone-0000847-g001">Figures 1D,E</xref>
and
<xref ref-type="fig" rid="pone-0000847-g003">3</xref>
). The patterns that appear in the resulting comparative map were then characterized by the codon-pair contexts that were present in each pattern. Also, the values of the adjusted residuals calculated for each species were corrected for ORFeome size to allow direct comparisons among ORFeomes.</p>
<p>The above approach was also used to study total genome sequences of the same 119 species in order to differentiate between the effect of translational selection acting upon coding sequences alone and genome mutational biases. With the same purpose, the bias for dinucleotides was studied in total genome sequences, and shown as observed frequencies, colored in green or red whenever the result was 1% above or below the expected value, respectively (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4B</xref>
).</p>
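<p>A minimal sketch of the dinucleotide analysis is given below. Two points are assumptions rather than details taken from the text: the expected frequency of a dinucleotide is taken as the product of the frequencies of its two bases, and the 1% margin is read as a difference of 0.01 between observed and expected frequencies. The function name and the toy sequence are hypothetical, and the sequence must contain at least two bases.</p>
<preformat>
```python
from collections import Counter

def dinucleotide_bias(seq, margin=0.01):
    """Observed and expected frequency of each dinucleotide, with a label."""
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    result = {}
    for first in mono:
        for second in mono:
            observed = di[first + second] / n_di
            # assumption: independence of the two neighbouring bases
            expected = (mono[first] / n_mono) * (mono[second] / n_mono)
            excess = observed - expected
            if excess > margin:
                label = "preferred"   # shown in green
            elif -excess > margin:
                label = "rejected"    # shown in red
            else:
                label = "neutral"
            result[first + second] = (observed, expected, label)
    return result

for pair, values in sorted(dinucleotide_bias("AUAUAUAUAU").items()):
    print(pair, values)
```
</preformat>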
</sec>
</sec>
<sec sec-type="supplementary-material" id="s5">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0000847.s001">
<label>Figure S1</label>
<caption>
<p>Data normalization. In order to correct the size differences of ORFeomes, particularly between eukaryotes and non-eukaryotes, the adjusted residuals were normalized for 21 million codons, which corresponds approximately to the largest ORFeome analyzed (X. tropicalis). Normalization of codon-pair data for human chromosomes 1, 2, 3, 22 and ORFeome is displayed. The normalization effect is shown by the brightness of the maps, which is variable in non-normalized maps (above) and constant in normalized ones (below). After data normalization the differences between maps could be compared as shown in the DDM (right end of the Figure).</p>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s001.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s002">
<label>Figure S2A</label>
<caption>
<p>List of species used. All species used in the study are listed according to the download order. The database of origin and respective accession numbers are indicated. A - Eukaryotes; B - Archaea and Eubacteria; C - Eubacteria (cont.).</p>
<p>(0.54 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s002.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s003">
<label>Figure S2B</label>
<caption>
<p>(0.54 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s003.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s004">
<label>Figure S2C</label>
<caption>
<p>(0.54 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s004.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s005">
<label>Figure S3A</label>
<caption>
<p>Individual codon-pair context maps of the 119 species. The codon-pair context maps built with ANACONDA software for individual ORFeomes are shown as ordered in Suppl.
<xref ref-type="supplementary-material" rid="pone.0000847.s002">Figure S2</xref>
.</p>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s005.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s006">
<label>Figure S3B</label>
<caption>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s006.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s007">
<label>Figure S3C</label>
<caption>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s007.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s008">
<label>Figure S3D</label>
<caption>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s008.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s009">
<label>Figure S3E</label>
<caption>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s009.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s010">
<label>Figure S3F</label>
<caption>
<p>(4.32 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s010.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s011">
<label>Figure S3G</label>
<caption>
<p>(2.16 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s011.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s012">
<label>Figure S3H</label>
<caption>
<p>(2.16 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s012.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s013">
<label>Figure S3I</label>
<caption>
<p>(2.16 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s013.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s014">
<label>Figure S3J</label>
<caption>
<p>(2.16 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s014.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s015">
<label>Figure S4</label>
<caption>
<p>A and U bases are preferentially arranged in polynucleotide strings. To check whether the preference detected for AA and UU dinucleotides in total genomes (
<xref ref-type="fig" rid="pone-0000847-g004">Figure 4B</xref>
) was due to a tendency for these bases to appear as polynucleotide strings, we counted the number of times each individual base appeared isolated or in strings of two, three or more identical bases. The results point to a clear positive bias towards the accumulation of 3 or more consecutive A or U bases in total genomes.</p>
<p>(0.10 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s015.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s016">
<label>Figure S5A</label>
<caption>
<p>Codon-pair context patterns that are exclusive to ORFeomes or genomes. The filtering technique that was used to determine the biases of codon-pair contexts in coding and total sequences (
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6</xref>
) was explored further here to evaluate the strength of those biases. This was done by gradually increasing the threshold (D) applied to the residuals that differ significantly between the two maps, i.e. those that were allowed to appear in their original colors in the filtered map. As D was increased, only the major differences between ORFeomes and genomes remained visible, corresponding to differences between the residuals of the ORFeome and genome maps that stay above 15, 20, 30 or 50. The strongest rules detected by this approach correspond to those identified as B1 and B2 in the filtered map for ORFeomes (map A in
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6</xref>
), and F1, F2 and E2 in the filtered map for genomes (map B in
<xref ref-type="fig" rid="pone-0000847-g006">Figure 6</xref>
), because they are still visible when D = 50.</p>
<p>(0.55 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s016.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s017">
<label>Figure S5B</label>
<caption>
<p>(0.53 MB TIF)</p>
</caption>
<media xlink:href="pone.0000847.s017.tif" mimetype="image" mime-subtype="tiff">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0000847.s018">
<label>Table S1</label>
<caption>
<p>Codon-pair distribution similarities between the 3 domains of life. To compare the overall distribution of codon-pair contexts among the 119 organisms, we calculated Spearman's correlation coefficients between all pairs of ORFeomes, producing a triangular colored map. The 119 species were organized by domain of life and sorted alphabetically within each domain. Pairs of species that were not statistically correlated (at a significance level of 5%) are colored in grey, green cells indicate pairs of species that were highly correlated (correlation coefficient above 0.80), and blue cells correspond to the highest values found within each domain.</p>
<p>(0.32 MB XLS)</p>
</caption>
<media xlink:href="pone.0000847.s018.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="pone.0000847-Cliften1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cliften</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>After the duplication: gene loss and adaptation in Saccharomyces genomes.</article-title>
<source>Genetics</source>
<volume>172</volume>
<fpage>863</fpage>
<lpage>872</lpage>
<pub-id pub-id-type="pmid">16322519</pub-id>
</citation>
</ref>
<ref id="pone.0000847-vandeLagemaat1">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van de Lagemaat</surname>
<given-names>LN</given-names>
</name>
<name>
<surname>Gagnier</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Medstrand</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mager</surname>
<given-names>DL</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Genomic deletions and precise removal of transposable elements mediated by short identical DNA segments in primates.</article-title>
<source>Genome Res</source>
<volume>15</volume>
<fpage>1243</fpage>
<lpage>1249</lpage>
<pub-id pub-id-type="pmid">16140992</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Lin1">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>YW</given-names>
</name>
<name>
<surname>Thi</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Kuo</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>BD</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Polymorphisms associated with the DAZ genes on the human Y chromosome.</article-title>
<source>Genomics</source>
<volume>86</volume>
<fpage>431</fpage>
<lpage>438</lpage>
<pub-id pub-id-type="pmid">16085382</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Chen1">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Hottes</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>L</given-names>
</name>
<name>
<surname>McAdams</surname>
<given-names>HH</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Codon usage between genomes is constrained by genome-wide mutational processes.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>101</volume>
<fpage>3480</fpage>
<lpage>3485</lpage>
<pub-id pub-id-type="pmid">14990797</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Chan1">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Henderson</surname>
<given-names>IR</given-names>
</name>
<name>
<surname>Jacobsen</surname>
<given-names>SE</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Gardening the genome: DNA methylation in Arabidopsis thaliana.</article-title>
<source>Nat Rev Genet</source>
<volume>6</volume>
<fpage>351</fpage>
<lpage>360</lpage>
<pub-id pub-id-type="pmid">15861207</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Robertson1">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Robertson</surname>
<given-names>KD</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>DNA methylation and human disease.</article-title>
<source>Nat Rev Genet</source>
<volume>6</volume>
<fpage>597</fpage>
<lpage>610</lpage>
<pub-id pub-id-type="pmid">16136652</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Duan1">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Duan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Antezana</surname>
<given-names>MA</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Mammalian mutation pressure, synonymous codon choice, and mRNA degradation.</article-title>
<source>J Mol Evol</source>
<volume>57</volume>
<fpage>694</fpage>
<lpage>701</lpage>
<pub-id pub-id-type="pmid">14745538</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Berg1">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>Codon bias in Escherichia coli: the influence of codon context on mutation and selection.</article-title>
<source>Nucleic Acids Res</source>
<volume>25</volume>
<fpage>1397</fpage>
<lpage>1404</lpage>
<pub-id pub-id-type="pmid">9060435</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Akashi1">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Akashi</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy.</article-title>
<source>Genetics</source>
<volume>136</volume>
<fpage>927</fpage>
<lpage>935</lpage>
<pub-id pub-id-type="pmid">8005445</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Curran1">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Curran</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Yarus</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>Rates of aminoacyl-tRNA selection at 29 sense codons in vivo.</article-title>
<source>J Mol Biol</source>
<volume>209</volume>
<fpage>65</fpage>
<lpage>77</lpage>
<pub-id pub-id-type="pmid">2478714</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Percudani1">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Percudani</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ottonello</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Selection at the wobble position of codons read by the same tRNA in Saccharomyces cerevisiae.</article-title>
<source>Mol Biol Evol</source>
<volume>16</volume>
<fpage>1752</fpage>
<lpage>1762</lpage>
<pub-id pub-id-type="pmid">10605116</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Boycheva1">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boycheva</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chkodrov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>I</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Codon pairs in the genome of Escherichia coli.</article-title>
<source>Bioinformatics</source>
<volume>19</volume>
<fpage>987</fpage>
<lpage>998</lpage>
<pub-id pub-id-type="pmid">12761062</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Ogle1">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ogle</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Ramakrishnan</surname>
<given-names>V</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Structural insights into translational fidelity.</article-title>
<source>Annu Rev Biochem</source>
<volume>74</volume>
<fpage>129</fpage>
<lpage>177</lpage>
<pub-id pub-id-type="pmid">15952884</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Irwin1">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Irwin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Heck</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Hatfield</surname>
<given-names>GW</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Codon pair utilization biases influence translational elongation step times.</article-title>
<source>J Biol Chem</source>
<volume>270</volume>
<fpage>22801</fpage>
<lpage>22806</lpage>
<pub-id pub-id-type="pmid">7559409</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Shah1">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shah</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Giddings</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Gesteland</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Atkins</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Ivanov</surname>
<given-names>IP</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Computational identification of putative programmed translational frameshift sites.</article-title>
<source>Bioinformatics</source>
<volume>18</volume>
<fpage>1046</fpage>
<lpage>1053</lpage>
<pub-id pub-id-type="pmid">12176827</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Buckingham1">
<label>16</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Grosjean</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>1986</year>
<article-title>The accuracy of mRNA-tRNA recognition.</article-title>
<person-group person-group-type="editor">
<name>
<surname>Kirkwood</surname>
<given-names>TBL</given-names>
</name>
<name>
<surname>Rosenberger</surname>
<given-names>RF</given-names>
</name>
<name>
<surname>Galas</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<source>Accuracy in Molecular Processes: Its Control and Relevance to Living Systems</source>
<publisher-loc>London</publisher-loc>
<publisher-name>Chapman and Hall</publisher-name>
<fpage>83</fpage>
<lpage>126</lpage>
</citation>
</ref>
<ref id="pone.0000847-Percudani2">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Percudani</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Pavesi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ottonello</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae.</article-title>
<source>J Mol Biol</source>
<volume>268</volume>
<fpage>322</fpage>
<lpage>330</lpage>
<pub-id pub-id-type="pmid">9159473</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Curran2">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Curran</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Poole</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Tate</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Gross</surname>
<given-names>BL</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Selection of aminoacyl-tRNAs at sense codons: the size of the tRNA variable loop determines whether the immediate 3′ nucleotide to the codon has a context effect.</article-title>
<source>Nucleic Acids Res</source>
<volume>23</volume>
<fpage>4104</fpage>
<lpage>4108</lpage>
<pub-id pub-id-type="pmid">7479072</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Fedorov1">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fedorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saxonov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Regularities of context-dependent codon bias in eukaryotic genes.</article-title>
<source>Nucleic Acids Res</source>
<volume>30</volume>
<fpage>1192</fpage>
<lpage>1197</lpage>
<pub-id pub-id-type="pmid">11861911</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Moura1">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pinheiro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miranda</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Afreixo</surname>
<given-names>V</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Comparative context analysis of codon pairs on an ORFeome scale.</article-title>
<source>Genome Biol</source>
<volume>6</volume>
<fpage>R28</fpage>
<pub-id pub-id-type="pmid">15774029</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Tate1">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tate</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Poole</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Mannering</surname>
<given-names>SA</given-names>
</name>
</person-group>
<year>1996</year>
<article-title>Hidden infidelities of the translational stop signal.</article-title>
<source>Prog Nucleic Acid Res Mol Biol</source>
<volume>52</volume>
<fpage>293</fpage>
<lpage>335</lpage>
<pub-id pub-id-type="pmid">8821264</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Murgola1">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murgola</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Pagel</surname>
<given-names>FT</given-names>
</name>
<name>
<surname>Hijazi</surname>
<given-names>KA</given-names>
</name>
</person-group>
<year>1984</year>
<article-title>Codon context effects in missense suppression.</article-title>
<source>J Mol Biol</source>
<volume>175</volume>
<fpage>19</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="pmid">6374155</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Tork1">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tork</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hatin</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Rousset</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Fabret</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>The major 5′ determinant in stop codon read-through involves two adjacent adenines.</article-title>
<source>Nucleic Acids Res</source>
<volume>32</volume>
<fpage>415</fpage>
<lpage>421</lpage>
<pub-id pub-id-type="pmid">14736996</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Rodnina1">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodnina</surname>
<given-names>MV</given-names>
</name>
<name>
<surname>Wintermeyer</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Fidelity of aminoacyl-tRNA selection on the ribosome: kinetic and structural mechanisms.</article-title>
<source>Annu Rev Biochem</source>
<volume>70</volume>
<fpage>415</fpage>
<lpage>435</lpage>
<pub-id pub-id-type="pmid">11395413</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Kramer1">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kramer</surname>
<given-names>EB</given-names>
</name>
<name>
<surname>Farabaugh</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>The frequency of translational misreading errors in E. coli is largely determined by tRNA competition.</article-title>
<source>RNA</source>
<volume>13</volume>
<fpage>87</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="pmid">17095544</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Atkins1">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Atkins</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Weiss</surname>
<given-names>RB</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gesteland</surname>
<given-names>RF</given-names>
</name>
</person-group>
<year>1991</year>
<article-title>Towards a genetic dissection of the basis of triplet decoding, and its natural subversion: programmed reading frame shifts and hops.</article-title>
<source>Annu Rev Genet</source>
<volume>25</volume>
<fpage>201</fpage>
<lpage>228</lpage>
<pub-id pub-id-type="pmid">1812806</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Freistroffer1">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Freistroffer</surname>
<given-names>DV</given-names>
</name>
<name>
<surname>Kwiatkowski</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Ehrenberg</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>The accuracy of codon recognition by polypeptide release factors.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>97</volume>
<fpage>2046</fpage>
<lpage>2051</lpage>
<pub-id pub-id-type="pmid">10681447</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Princiotta1">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Princiotta</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Finzi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schuchmann</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2003</year>
<article-title>Quantitating protein synthesis, degradation, and endogenous antigen processing.</article-title>
<source>Immunity</source>
<volume>18</volume>
<fpage>343</fpage>
<lpage>354</lpage>
<pub-id pub-id-type="pmid">12648452</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Pinheiro1">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pinheiro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Afreixo</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Moura</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Freitas</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Santos</surname>
<given-names>MA</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Statistical, computational and visualization methodologies to unveil gene primary structure features.</article-title>
<source>Methods Inf Med</source>
<volume>45</volume>
<fpage>163</fpage>
<lpage>168</lpage>
<pub-id pub-id-type="pmid">16538282</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Campbell1">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Campbell</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mrazek</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Karlin</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>96</volume>
<fpage>9184</fpage>
<lpage>9189</lpage>
<pub-id pub-id-type="pmid">10430917</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Nakashima1">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nakashima</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nishikawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ooi</surname>
<given-names>T</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>Differences in dinucleotide frequencies of human, yeast, and Escherichia coli genes.</article-title>
<source>DNA Res</source>
<volume>4</volume>
<fpage>185</fpage>
<lpage>192</lpage>
<pub-id pub-id-type="pmid">9330906</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Hooper1">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hooper</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Detection of genes with atypical nucleotide sequence in microbial genomes.</article-title>
<source>J Mol Evol</source>
<volume>54</volume>
<fpage>365</fpage>
<lpage>375</lpage>
<pub-id pub-id-type="pmid">11847562</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Hooper2">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hooper</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Berg</surname>
<given-names>OG</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Gradients in nucleotide and codon usage along Escherichia coli genes.</article-title>
<source>Nucleic Acids Res</source>
<volume>28</volume>
<fpage>3517</fpage>
<lpage>3523</lpage>
<pub-id pub-id-type="pmid">10982871</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Stahl1">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stahl</surname>
<given-names>G</given-names>
</name>
<name>
<surname>McCarty</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Farabaugh</surname>
<given-names>PJ</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Ribosome structure: revisiting the connection between translational accuracy and unconventional decoding.</article-title>
<source>Trends Biochem Sci</source>
<volume>27</volume>
<fpage>178</fpage>
<lpage>183</lpage>
<pub-id pub-id-type="pmid">11943544</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Dix1">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dix</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>RC</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>Codon choice and gene expression: synonymous codons differ in translational accuracy.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>86</volume>
<fpage>6888</fpage>
<lpage>6892</lpage>
<pub-id pub-id-type="pmid">2674938</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Gutman1">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gutman</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Hatfield</surname>
<given-names>GW</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>Nonrandom utilization of codon pairs in Escherichia coli.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>86</volume>
<fpage>3699</fpage>
<lpage>3703</lpage>
<pub-id pub-id-type="pmid">2657727</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Buchan1">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchan</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Aucott</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Stansfield</surname>
<given-names>I</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>tRNA properties help shape codon pair preferences in open reading frames.</article-title>
<source>Nucleic Acids Res</source>
<volume>34</volume>
<fpage>1015</fpage>
<lpage>1027</lpage>
<pub-id pub-id-type="pmid">16473853</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Rocha1">
<label>38</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rocha</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>Danchin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Viari</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>1999</year>
<article-title>Universal replication biases in bacteria.</article-title>
<source>Mol Microbiol</source>
<volume>32</volume>
<fpage>11</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="pmid">10216855</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Grantham1">
<label>39</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grantham</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gautier</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gouy</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mercier</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Pave</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>1980</year>
<article-title>Codon catalog usage and the genome hypothesis.</article-title>
<source>Nucleic Acids Res</source>
<volume>8</volume>
<fpage>r49</fpage>
<lpage>r62</lpage>
<pub-id pub-id-type="pmid">6986610</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Buckingham2">
<label>40</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckingham</surname>
<given-names>RH</given-names>
</name>
</person-group>
<year>1990</year>
<article-title>Codon context.</article-title>
<source>Experientia</source>
<volume>46</volume>
<fpage>1126</fpage>
<lpage>1133</lpage>
<pub-id pub-id-type="pmid">2253710</pub-id>
</citation>
</ref>
<ref id="pone.0000847-McVean1">
<label>41</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>McVean</surname>
<given-names>GAT</given-names>
</name>
<name>
<surname>Hurst</surname>
<given-names>GDD</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Evolutionary lability of context-dependent codon bias in bacteria.</article-title>
<source>J Mol Evol</source>
<volume>50</volume>
<fpage>264</fpage>
<lpage>275</lpage>
<pub-id pub-id-type="pmid">10754070</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Lao1">
<label>42</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lao</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Forsdyke</surname>
<given-names>DR</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine.</article-title>
<source>Genome Res</source>
<volume>10</volume>
<fpage>228</fpage>
<lpage>236</lpage>
<pub-id pub-id-type="pmid">10673280</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Kennedy1">
<label>43</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kennedy</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>WV</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Hood</surname>
<given-names>L</given-names>
</name>
<name>
<surname>DasSarma</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence.</article-title>
<source>Genome Res</source>
<volume>11</volume>
<fpage>1641</fpage>
<lpage>1650</lpage>
<pub-id pub-id-type="pmid">11591641</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Tekaia1">
<label>44</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tekaia</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Yeramian</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Dujon</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis.</article-title>
<source>Gene</source>
<volume>297</volume>
<fpage>51</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="pmid">12384285</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Beutler1">
<label>45</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beutler</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Gelbart</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Koziol</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Beutler</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>Evolution of the genome and the genetic code: selection at the dinucleotide level by methylation and polyribonucleotide cleavage.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>86</volume>
<fpage>192</fpage>
<lpage>196</lpage>
<pub-id pub-id-type="pmid">2463621</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Nakashima2">
<label>46</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nakashima</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ota</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nishikawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ooi</surname>
<given-names>T</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Genes from nine genomes are separated into their organisms in the dinucleotide composition space.</article-title>
<source>DNA Res</source>
<volume>5</volume>
<fpage>251</fpage>
<lpage>259</lpage>
<pub-id pub-id-type="pmid">9872449</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Marck1">
<label>47</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marck</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Grosjean</surname>
<given-names>H</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features.</article-title>
<source>RNA</source>
<volume>8</volume>
<fpage>1189</fpage>
<lpage>1232</lpage>
<pub-id pub-id-type="pmid">12403461</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Crick1">
<label>48</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crick</surname>
<given-names>FH</given-names>
</name>
</person-group>
<year>1966</year>
<article-title>Codon–anticodon pairing: the wobble hypothesis.</article-title>
<source>J Mol Biol</source>
<volume>19</volume>
<fpage>548</fpage>
<lpage>555</lpage>
<pub-id pub-id-type="pmid">5969078</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Caburet1">
<label>49</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Caburet</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Vaiman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Veitia</surname>
<given-names>RA</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>A genomic basis for the evolution of vertebrate transcription factors containing amino acid runs.</article-title>
<source>Genetics</source>
<volume>167</volume>
<fpage>1813</fpage>
<lpage>1820</lpage>
<pub-id pub-id-type="pmid">15342519</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Borstnik1">
<label>50</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borstnik</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pumpernik</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Tandem repeats in protein coding regions of primate genes.</article-title>
<source>Genome Res</source>
<volume>12</volume>
<fpage>909</fpage>
<lpage>915</lpage>
<pub-id pub-id-type="pmid">12045144</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Rocha2">
<label>51</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rocha</surname>
<given-names>EP</given-names>
</name>
<name>
<surname>Matic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Taddei</surname>
<given-names>F</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Over-representation of repeats in stress response genes: a strategy to increase versatility under stressful conditions?</article-title>
<source>Nucleic Acids Res</source>
<volume>30</volume>
<fpage>1886</fpage>
<lpage>1894</lpage>
<pub-id pub-id-type="pmid">11972324</pub-id>
</citation>
</ref>
<ref id="pone.0000847-Haberman1">
<label>52</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haberman</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>1973</year>
<article-title>Analysis of residuals in cross-classified tables.</article-title>
<source>Biometrics</source>
<volume>29</volume>
<fpage>205</fpage>
<lpage>220</lpage>
</citation>
</ref>
<ref id="pone.0000847-Simenoff1">
<label>53</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Simonoff</surname>
<given-names>JS</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Analyzing categorical data.</article-title>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer-Verlag</publisher-name>
</citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure">
<p>
<bold>Funding: </bold>
This study was supported by FEDER/FCT projects POCTI/BME/39030; SAU-MMO/55476; BIA-PRO/55472; BIA-MIC/55466; PTDC/MAT/72974/2006 and Human Frontier Science Programme project RGP45/2005.</p>
</fn>
</fn-group>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000515  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000515  | SxmlIndent | more
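
Once a record has been extracted with one of the commands above, its reference list can be read with any XML parser. The following is a minimal Python sketch, not part of Dilib. The helper name `parse_refs` is hypothetical, and the inline `SAMPLE` string stands in for the extracted output (one entry copied from this record). It assumes the input is well-formed XML, which may not hold for this record given the warning at the top of the page.

```python
import xml.etree.ElementTree as ET

# Stand-in for the extracted record: one <ref> entry copied from this document.
SAMPLE = """<ref-list>
<ref id="pone.0000847-Crick1">
<label>48</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name><surname>Crick</surname><given-names>FH</given-names></name>
</person-group>
<year>1966</year>
<article-title>Codon-anticodon pairing: the wobble hypothesis.</article-title>
<source>J Mol Biol</source>
<volume>19</volume>
<fpage>548</fpage>
<lpage>555</lpage>
<pub-id pub-id-type="pmid">5969078</pub-id>
</citation>
</ref>
</ref-list>"""


def parse_refs(root):
    """Return one dict per <ref> element found under root (hypothetical helper)."""
    refs = []
    for ref in root.iter("ref"):
        # Join surname and initials for every <name> inside this reference.
        authors = [
            f"{n.findtext('surname', '')} {n.findtext('given-names', '')}".strip()
            for n in ref.iter("name")
        ]
        refs.append({
            "id": ref.get("id"),
            "label": ref.findtext("label"),
            "authors": authors,
            "year": ref.findtext("citation/year"),
            "title": ref.findtext("citation/article-title"),
            "source": ref.findtext("citation/source"),
            # Some entries (e.g. label 52) carry no PMID; the value is then None.
            "pmid": ref.findtext("citation/pub-id[@pub-id-type='pmid']"),
        })
    return refs


refs = parse_refs(ET.fromstring(SAMPLE))
for r in refs:
    print(r["label"], ", ".join(r["authors"]), r["year"], r["source"], r["pmid"])
```

To apply the same function to a saved file, `ET.parse(path).getroot()` can replace `ET.fromstring(SAMPLE)`. Prefixed elements such as `nlm:aff` require their namespace to be declared in the file for parsing to succeed.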

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024