Cyberinfrastructure Exploration Server

Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'

Internal identifier: 000217 (Pmc/Corpus)

Authors: Michelle M. Meyer; Tyler D. Ames; Daniel P. Smith; Zasha Weinberg; Michael S. Schwalbach; Stephen J. Giovannoni; Ronald R. Breaker

Source: BMC Genomics (2009)

RBID : PMC:2704228

Abstract

Background

Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of these data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of 'Candidatus Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content.

Results

To aid the discovery of RNA motifs within the marine metagenome we exploited the genomic properties of 'Cand. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches, etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from 'Cand. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs including a new riboswitch class as well as three additional likely cis-regulatory elements that precede genes encoding ribosomal proteins S2 and S12, and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data.

Conclusion

This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704228
DOI: 10.1186/1471-2164-10-268
PubMed: 19531245
PubMed Central: 2704228

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Identification of candidate structured RNAs in the marine organism '
<italic>Candidatus </italic>
Pelagibacter ubique'</title>
<author>
<name sortKey="Meyer, Michelle M" sort="Meyer, Michelle M" uniqKey="Meyer M" first="Michelle M" last="Meyer">Michelle M. Meyer</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ames, Tyler D" sort="Ames, Tyler D" uniqKey="Ames T" first="Tyler D" last="Ames">Tyler D. Ames</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Smith, Daniel P" sort="Smith, Daniel P" uniqKey="Smith D" first="Daniel P" last="Smith">Daniel P. Smith</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Weinberg, Zasha" sort="Weinberg, Zasha" uniqKey="Weinberg Z" first="Zasha" last="Weinberg">Zasha Weinberg</name>
<affiliation>
<nlm:aff id="I3">Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schwalbach, Michael S" sort="Schwalbach, Michael S" uniqKey="Schwalbach M" first="Michael S" last="Schwalbach">Michael S. Schwalbach</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Giovannoni, Stephen J" sort="Giovannoni, Stephen J" uniqKey="Giovannoni S" first="Stephen J" last="Giovannoni">Stephen J. Giovannoni</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Breaker, Ronald R" sort="Breaker, Ronald R" uniqKey="Breaker R" first="Ronald R" last="Breaker">Ronald R. Breaker</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19531245</idno>
<idno type="pmc">2704228</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704228</idno>
<idno type="RBID">PMC:2704228</idno>
<idno type="doi">10.1186/1471-2164-10-268</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000217</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Identification of candidate structured RNAs in the marine organism '
<italic>Candidatus </italic>
Pelagibacter ubique'</title>
<author>
<name sortKey="Meyer, Michelle M" sort="Meyer, Michelle M" uniqKey="Meyer M" first="Michelle M" last="Meyer">Michelle M. Meyer</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ames, Tyler D" sort="Ames, Tyler D" uniqKey="Ames T" first="Tyler D" last="Ames">Tyler D. Ames</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Smith, Daniel P" sort="Smith, Daniel P" uniqKey="Smith D" first="Daniel P" last="Smith">Daniel P. Smith</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Weinberg, Zasha" sort="Weinberg, Zasha" uniqKey="Weinberg Z" first="Zasha" last="Weinberg">Zasha Weinberg</name>
<affiliation>
<nlm:aff id="I3">Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schwalbach, Michael S" sort="Schwalbach, Michael S" uniqKey="Schwalbach M" first="Michael S" last="Schwalbach">Michael S. Schwalbach</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Giovannoni, Stephen J" sort="Giovannoni, Stephen J" uniqKey="Giovannoni S" first="Stephen J" last="Giovannoni">Stephen J. Giovannoni</name>
<affiliation>
<nlm:aff id="I4">Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Breaker, Ronald R" sort="Breaker, Ronald R" uniqKey="Breaker R" first="Ronald R" last="Breaker">Ronald R. Breaker</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of these data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of '
<italic>Candidatus </italic>
Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content.</p>
</sec>
<sec>
<title>Results</title>
<p>To aid the discovery of RNA motifs within the marine metagenome we exploited the genomic properties of '
<italic>Cand</italic>
. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches, etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from '
<italic>Cand</italic>
. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs including a new riboswitch class as well as three additional likely
<italic>cis</italic>
-regulatory elements that precede genes encoding ribosomal proteins S2 and S12, and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-title>BMC Genomics</journal-title>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19531245</article-id>
<article-id pub-id-type="pmc">2704228</article-id>
<article-id pub-id-type="publisher-id">1471-2164-10-268</article-id>
<article-id pub-id-type="doi">10.1186/1471-2164-10-268</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Identification of candidate structured RNAs in the marine organism '
<italic>Candidatus </italic>
Pelagibacter ubique'</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>Meyer</surname>
<given-names>Michelle M</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>michelle.meyer@yale.edu</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Ames</surname>
<given-names>Tyler D</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>tyler.ames@yale.edu</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Smith</surname>
<given-names>Daniel P</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>dansmith@orst.edu</email>
</contrib>
<contrib id="A4" contrib-type="author">
<name>
<surname>Weinberg</surname>
<given-names>Zasha</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>zasha.weinberg@yale.edu</email>
</contrib>
<contrib id="A5" contrib-type="author">
<name>
<surname>Schwalbach</surname>
<given-names>Michael S</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>schwalbm@onid.orst.edu</email>
</contrib>
<contrib id="A6" contrib-type="author">
<name>
<surname>Giovannoni</surname>
<given-names>Stephen J</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>steve.giovannoni@oregonstate.edu</email>
</contrib>
<contrib id="A7" corresp="yes" contrib-type="author">
<name>
<surname>Breaker</surname>
<given-names>Ronald R</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>ronald.breaker@yale.edu</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Molecular Cellular and Developmental Biology, Yale University, Box 208103, New Haven, CT 06520, USA</aff>
<aff id="I2">
<label>2</label>
Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520, USA</aff>
<aff id="I3">
<label>3</label>
Howard Hughes Medical Institute, Yale University, Box 208103, New Haven, CT 06520, USA</aff>
<aff id="I4">
<label>4</label>
Department of Microbiology, Oregon State University, Corvallis, OR 97333, USA</aff>
<pub-date pub-type="collection">
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>16</day>
<month>6</month>
<year>2009</year>
</pub-date>
<volume>10</volume>
<fpage>268</fpage>
<lpage>268</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2164/10/268"></ext-link>
<history>
<date date-type="received">
<day>6</day>
<month>1</month>
<year>2009</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>6</month>
<year>2009</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2009 Meyer et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2009</copyright-year>
<copyright-holder>Meyer et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> Meyer M Michelle michelle.meyer@yale.edu Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique' 2009BMC Genomics 10(1): 268-. (2009)1471-2164(2009)10:1&lt;268&gt;urn:ISSN:1471-2164</pmc-comment>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of these data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of '
<italic>Candidatus </italic>
Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content.</p>
</sec>
<sec>
<title>Results</title>
<p>To aid the discovery of RNA motifs within the marine metagenome we exploited the genomic properties of '
<italic>Cand</italic>
. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches, etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from '
<italic>Cand</italic>
. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs including a new riboswitch class as well as three additional likely
<italic>cis</italic>
-regulatory elements that precede genes encoding ribosomal proteins S2 and S12, and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The discovery of many RNA sequences that do not encode proteins (non-coding RNAs, or ncRNAs) and have biological functions beyond those of tRNA and rRNA has significantly expanded the known role of RNA in diverse cellular processes. Consequently, there is a growing effort to systematically identify ncRNAs using both experimental and computational techniques. Experimental approaches are typically used to identify non-coding portions of an organism's genome that are actively being transcribed. These approaches do not depend on the identification of conserved RNA sequences or secondary structures, and are therefore well suited to the discovery of unstructured or poorly conserved ncRNAs. However, experimental limitations can cause some RNAs to be missed, and the false-positive rate may be high due to "transcriptional noise" [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. Alternatively, computational methods seek to identify evidence of conserved RNA sequences and secondary structures through comparative genomics [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
]. However, such methods usually cannot be used to identify RNA motifs that may not have conserved secondary structure, are small with few base-pairing elements, or are not well-represented in genomic sequence databases.</p>
<p>Marine metagenomic sequence data are a proven resource for the discovery of novel protein diversity and have provided additional examples for thousands of previously identified open reading frames (ORFs) with no known homologs [
<xref ref-type="bibr" rid="B5">5</xref>
]. While there have been surveys conducted with the marine metagenome to discover additional examples of known ncRNAs [
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
], there have been no studies explicitly examining these data for novel RNA motifs, in part due to unique computational challenges inherent to metagenomic datasets. Specifically, the exceedingly large amount of sequence data available (~7 billion base pairs), relatively poor annotation of protein coding regions due to a high frequency of fragmentary genes that result from short sequence reads, and comparatively high sequencing error rates make metagenomic data analysis difficult [
<xref ref-type="bibr" rid="B8">8</xref>
-
<xref ref-type="bibr" rid="B10">10</xref>
].</p>
<p>To circumvent many of the challenges associated with analyzing metagenomic sequence data, we have used the genome of '
<italic>Cand</italic>
. P. ubique' HTCC 1062 as a starting point to discover new RNA motifs within the marine metagenome. Bacteria of the SAR11 clade, of which '
<italic>Cand</italic>
. P. ubique' is a representative, are found throughout the world's oceans and are the dominant aerobic heterotrophs in marine surface waters [
<xref ref-type="bibr" rid="B11">11</xref>
]. Given its numeric advantage, genes from members of the SAR11 clade are well-represented in marine metagenomic libraries with nearly 20% of sequence reads from the Global Oceanographic Survey (GOS) matching most closely to genes present in the '
<italic>Cand</italic>
. P. ubique' genome [
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B13">13</xref>
]. Only ~30% of the GOS reads could be aligned well to the 584 available reference genomes. The other predominant genera represented in the GOS data are
<italic>Prochlorococcus, Synechococcus, Burkholderia</italic>
, and
<italic>Shewanella</italic>
, none of which are closely related to '
<italic>Cand</italic>
. P. ubique'. While alignments to every reference genome were identified, they typically showed identity to regions corresponding to large, highly conserved genes [
<xref ref-type="bibr" rid="B13">13</xref>
].</p>
<p>At 1.3 million base pairs, the genome of '
<italic>Cand</italic>
. P. ubique' is the smallest known for a free-living organism, but it appears to encode nearly all the basic functions of Alphaproteobacteria cells [
<xref ref-type="bibr" rid="B14">14</xref>
]. The genome contains very little non-coding DNA, with a median intergenic region (IGR) length of 3 nucleotides. In addition, the organism has remarkably low GC content (29%). While evaluating nucleotide composition is usually not a viable method for identifying ncRNAs [
<xref ref-type="bibr" rid="B15">15</xref>
], in genomes with a strong AT bias or in organisms from hyperthermophilic environments, the higher GC content necessary to maintain a stable RNA structure may be used to identify candidate ncRNAs [
<xref ref-type="bibr" rid="B16">16</xref>
-
<xref ref-type="bibr" rid="B19">19</xref>
]. '
<italic>Cand</italic>
. P. ubique' offers an ideal opportunity to utilize nucleotide composition as its genome has very few long IGRs, which are generally low GC (23% on average).</p>
<p>In the current study we combine nucleotide composition with comparative genomics approaches to identify novel structured RNA motifs in '
<italic>Cand</italic>
. P. ubique' and the marine metagenomic data. First, we demonstrate that longer, higher GC '
<italic>Cand</italic>
. P. ubique' IGRs are much more likely to contain structured RNAs (rRNAs, tRNAs, etc.). Subsequently, we utilized the IGRs in '
<italic>Cand</italic>
. P. ubique' with similar properties that lack assigned ncRNAs as the starting point for a comparative sequence analysis strategy that takes advantage of marine metagenomic sequences. We discovered four likely structured ncRNAs including a new riboswitch class, and three other candidate
<italic>cis</italic>
-regulatory motifs. In addition we describe several other conserved IGRs that encode potential structured RNA elements.</p>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Analysis strategy</title>
<p>To identify potential ncRNAs in the genome of '
<italic>Cand</italic>
. P. ubique', all IGRs were extracted from the '
<italic>Cand</italic>
. P. ubique' genome and ranked by GC content. When '
<italic>Cand</italic>
. P. ubique' IGRs are plotted by their length and percent GC, those containing annotated RNAs (rRNAs, tRNAs, riboswitches, etc.) cluster toward the top right of the graph (Figure
<xref ref-type="fig" rid="F1">1</xref>
). This finding indicates that the vast majority of GC-enriched IGRs longer than 100 bp carry annotated ncRNAs (Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
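<p>As a rough illustration of the extraction and ranking step described above, the following Python sketch finds the gaps between consecutive annotated genes, computes their percent GC, and ranks the long ones. It is not the authors' pipeline. The function names (gc_percent, extract_igrs, rank_candidates), the 1-based inclusive coordinate convention, the treatment of the genome as linear, and the choice to keep IGRs of at least 100 nt are all assumptions made for this example.</p>
<preformat><![CDATA[
```python
from typing import List, NamedTuple, Tuple


class IGR(NamedTuple):
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive
    length: int
    gc_percent: float


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def extract_igrs(genome: str, genes: List[Tuple[int, int]]) -> List[IGR]:
    """Return the gaps between consecutive genes.

    genes holds (start, end) pairs, 1-based and inclusive (an assumed
    convention). The genome is treated as linear, so the regions before
    the first gene and after the last gene are ignored.
    """
    igrs = []
    prev_end = None
    for start, end in sorted(genes):
        if prev_end is not None and start > prev_end + 1:
            first, last = prev_end + 1, start - 1
            seq = genome[first - 1:last]
            igrs.append(IGR(first, last, len(seq), gc_percent(seq)))
        # max() keeps nested or overlapping genes from creating false gaps.
        prev_end = end if prev_end is None else max(prev_end, end)
    return igrs


def rank_candidates(igrs: List[IGR], min_length: int = 100) -> List[IGR]:
    """Keep IGRs of at least min_length nt, highest GC content first."""
    long_igrs = [igr for igr in igrs if igr.length >= min_length]
    return sorted(long_igrs, key=lambda igr: igr.gc_percent, reverse=True)
```
]]></preformat>
<p>Filtering by length before sorting by GC content gives a list ordered the same way as Table 1. Strand information and annotated RNA genes are deliberately left out of this sketch.</p>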
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>
<bold>Percent GC-content versus length of intergenic regions (IGRs) in '
<italic>Cand</italic>
. P. ubique'</bold>
. Transfer and ribosomal RNAs are as annotated by Rfam [
<xref ref-type="bibr" rid="B24">24</xref>
] and RefSeq (RefSeq accession NC_007205.1). Other structured RNAs include known riboswitches, 4.5S RNA (SRP RNA), RNase P RNA and tmRNA.</p>
</caption>
<graphic xlink:href="1471-2164-10-268-1"></graphic>
</fig>
<p>To identify additional structured RNAs that may not be annotated, we performed BLAST searches of the remaining IGRs against the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) database [
<xref ref-type="bibr" rid="B20">20</xref>
]. Table
<xref ref-type="table" rid="T1">1</xref>
lists GC enriched '
<italic>Cand</italic>
. P. ubique' IGRs longer than 100 bp and the number of BLAST hits identified with an E-value less than 10
<sup>-5 </sup>
as a measure of conservation. The average number of BLAST hits for IGRs containing tRNAs is 2158, with a standard deviation of 1282. However, the average number of BLAST hits for the '
<italic>Cand</italic>
. P. ubique' IGRs containing SAM-II riboswitches, which are significantly smaller than a tRNA and most commonly present in Alpha-, Beta- and Gammaproteobacteria, is approximately 500. Based on this analysis and the need for a relatively large number of BLAST hits for subsequent comparative sequence analysis algorithms, IGRs with greater than 200 BLAST hits were further screened for unannotated ncRNAs and misannotated protein coding sequence. This screening process revealed several misannotated protein coding sequences in addition to several known structured RNAs not previously annotated (Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
– Table
<xref ref-type="table" rid="T1">1</xref>
). The RNA motifs identified are typically very highly ranked on our list, and include tmRNA, the RNA component of the signal recognition particle (SRP), the RNase P RNA (class A), and a number of riboswitches (Table
<xref ref-type="table" rid="T1">1</xref>
).</p>
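<p>The hit-counting filter described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it supposes that the BLAST results are available in the standard 12-column tabular format (query identifier in column 1, E-value in column 11), which the text does not specify, and the names count_hits and select_conserved are hypothetical. The E-value cutoff of 1e-5 and the threshold of more than 200 hits are taken from the paragraph above.</p>
<preformat><![CDATA[
```python
from collections import Counter
from typing import Iterable, List

E_VALUE_CUTOFF = 1e-5   # hits must have an E-value below this
MIN_HITS = 200          # IGRs need more than this many hits


def count_hits(lines: Iterable[str], cutoff: float = E_VALUE_CUTOFF) -> Counter:
    """Count hits per query below the E-value cutoff.

    Assumes tab-separated BLAST output with the default 12 columns:
    query id is field 0 and E-value is field 10.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, e_value = fields[0], float(fields[10])
        if e_value < cutoff:
            counts[query_id] += 1
    return counts


def select_conserved(counts: Counter, min_hits: int = MIN_HITS) -> List[str]:
    """Return query ids with more than min_hits qualifying hits, sorted by name."""
    return sorted(q for q, n in counts.items() if n > min_hits)
```
]]></preformat>
<p>IGRs that pass this filter would then go on to the screening for unannotated ncRNAs and misannotated coding sequence, which this sketch does not attempt.</p>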
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>'
<italic>Cand</italic>
. P. ubique' IGRs longer than 100 bp ranked by GC content. IGRs containing tRNA and rRNA removed</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="center">Coordinates</td>
<td align="center">Length</td>
<td align="center">%GC</td>
<td align="center">BLAST Hits</td>
<td align="center">RNA (Strand)</td>
<td align="center">Locus Tag</td>
<td align="center">Flanking Gene Name (Strand)</td>
<td align="center">Locus Tag</td>
<td align="center">Flanking Gene Name (Strand)</td>
</tr>
</thead>
<tbody>
<tr>
<td align="center">10302–10518</td>
<td align="center">217</td>
<td align="center">48.85</td>
<td align="center">1761</td>
<td align="center">tmRNA (+)</td>
<td align="center">SAR11_0010</td>
<td align="center">
<italic>thyX </italic>
(-)</td>
<td align="center">SAR11_0011</td>
<td align="center">COG4696 (-)</td>
</tr>
<tr>
<td align="center">649763–649953</td>
<td align="center">191</td>
<td align="center">41.88</td>
<td align="center">1990</td>
<td align="center">glycine
<break></break>
riboswitch (+)</td>
<td align="center">SAR11_0664</td>
<td align="center">membrane prot. (+)</td>
<td align="center">SAR11_0666</td>
<td align="center">gcvT (+)</td>
</tr>
<tr>
<td align="center">493521–493664</td>
<td align="center">144</td>
<td align="center">36.81</td>
<td align="center">888</td>
<td align="center">4.5 S RNA
<break></break>
(SRP RNA) (+)</td>
<td align="center">SAR11_0506</td>
<td align="center">
<italic>pheA </italic>
(+)</td>
<td align="center">SAR11_0507</td>
<td align="center">
<italic>dnaX </italic>
(+)</td>
</tr>
<tr>
<td align="center">1127293–1127553</td>
<td align="center">261</td>
<td align="center">36.78</td>
<td align="center">127</td>
<td align="center">SAM-II/SAM-V riboswitch (-)</td>
<td align="center">SAR11_1129</td>
<td align="center">
<italic>bhmt </italic>
(-)</td>
<td align="center">SAR11_1730</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">564786–564910</td>
<td align="center">125</td>
<td align="center">35.2</td>
<td align="center">611</td>
<td align="center">
<italic>pntA </italic>
element (+)</td>
<td align="center">SAR11_0573</td>
<td align="center">
<italic>rpmJ </italic>
(+)</td>
<td align="center">SAR11_0574</td>
<td align="center">
<italic>pntA </italic>
(+)</td>
</tr>
<tr>
<td align="center">38796–39447</td>
<td align="center">652</td>
<td align="center">34.51</td>
<td align="center">2475</td>
<td align="center">RNase P
<break></break>
RNA (-)</td>
<td align="center">SAR11_0033</td>
<td align="center">
<italic>mraZ </italic>
(-)</td>
<td align="center">SAR11_0034</td>
<td align="center">
<italic>ybjR </italic>
(-)</td>
</tr>
<tr>
<td align="center">260190–260348</td>
<td align="center">159</td>
<td align="center">33.96</td>
<td align="center">1615</td>
<td align="center">
<italic>ffh </italic>
motif (-)</td>
<td align="center">SAR11_2356</td>
<td align="center">
<italic>ffh </italic>
(-)</td>
<td align="center">SAR11_0257</td>
<td align="center">
<italic>dapF </italic>
(+)</td>
</tr>
<tr>
<td align="center">626974–627168</td>
<td align="center">195</td>
<td align="center">33.33</td>
<td align="center">1168</td>
<td></td>
<td align="center">SAR11_0641</td>
<td align="center">
<italic>recA </italic>
(+)</td>
<td align="center">SAR11_0642</td>
<td align="center">protease (-)</td>
</tr>
<tr>
<td align="center">786467–786574</td>
<td align="center">108</td>
<td align="center">33.33</td>
<td align="center">927</td>
<td align="center">TPP
<break></break>
riboswitch (+)</td>
<td align="center">SAR11_0810</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_0811</td>
<td align="center">transporter (+)</td>
</tr>
<tr>
<td align="center">585015–585135</td>
<td align="center">121</td>
<td align="center">33.06</td>
<td align="center">41</td>
<td></td>
<td align="center">SAR11_0599</td>
<td align="center">COG1729 (+)</td>
<td align="center">SAR11_0600</td>
<td align="center">
<italic>mesj </italic>
(+)</td>
</tr>
<tr>
<td align="center">498458–498706</td>
<td align="center">249</td>
<td align="center">32.93</td>
<td align="center">2398</td>
<td align="center">glycine
<break></break>
riboswitch (-)</td>
<td align="center">SAR11_0510</td>
<td align="center">
<italic>glcB </italic>
(-)</td>
<td align="center">SAR11_0511</td>
<td align="center">
<italic>accA </italic>
(+)</td>
</tr>
<tr>
<td align="center">622388–622552</td>
<td align="center">165</td>
<td align="center">32.73</td>
<td align="center">1301</td>
<td align="center">SAR11_0636 element (+)</td>
<td align="center">SAR11_0635</td>
<td align="center">hyp. protein (-)</td>
<td align="center">SAR11_0636</td>
<td align="center">hyp. protein (+)</td>
</tr>
<tr>
<td align="center">1142870–1143031</td>
<td align="center">162</td>
<td align="center">32.1</td>
<td align="center">29</td>
<td></td>
<td align="center">SAR11_1190</td>
<td align="center">COG0659 (-)</td>
<td align="center">SAR11_1191</td>
<td align="center">HIT protein (-)</td>
</tr>
<tr>
<td align="center">159067–159166</td>
<td align="center">100</td>
<td align="center">32</td>
<td align="center">25</td>
<td></td>
<td align="center">SAR11_0156</td>
<td align="center">hyp. protein (-)</td>
<td align="center">SAR11_0157</td>
<td align="center">
<italic>ispA </italic>
(-)</td>
</tr>
<tr>
<td align="center">1292813–1292925</td>
<td align="center">113</td>
<td align="center">31.86</td>
<td align="center">57</td>
<td></td>
<td align="center">SAR11_1357</td>
<td align="center">
<italic>livF2 </italic>
(-)</td>
<td align="center">SAR11_1358</td>
<td align="center">
<italic>livG2 </italic>
(-)</td>
</tr>
<tr>
<td align="center">1120412–1120856</td>
<td align="center">445</td>
<td align="center">31.46</td>
<td align="center">66</td>
<td></td>
<td align="center">SAR11_1164</td>
<td align="center">lipoprotein (-)</td>
<td align="center">SAR11_1165</td>
<td align="center">exonuclease (+)</td>
</tr>
<tr>
<td align="center">873155–873283</td>
<td align="center">129</td>
<td align="center">31.01</td>
<td align="center">832</td>
<td align="center">
<italic>rpsB </italic>
motif (+)</td>
<td align="center">SAR11_0906</td>
<td align="center">
<italic>dnaE </italic>
(+)</td>
<td align="center">SAR11_0907</td>
<td align="center">
<italic>rpsB </italic>
(+)</td>
</tr>
<tr>
<td align="center">628285–628539</td>
<td align="center">255</td>
<td align="center">30.2</td>
<td align="center">571</td>
<td></td>
<td align="center">SAR11_0642</td>
<td align="center">protease (-)</td>
<td align="center">SAR11_0643</td>
<td align="center">
<italic>alaS </italic>
(+)</td>
</tr>
<tr>
<td align="center">1005679–1005890</td>
<td align="center">212</td>
<td align="center">30.19</td>
<td align="center">483</td>
<td align="center">SAM-V (+)</td>
<td align="center">SAR11_1029</td>
<td align="center">
<italic>rplM </italic>
(-)</td>
<td align="center">SAR11_1030</td>
<td align="center">
<italic>metY </italic>
(+)</td>
</tr>
<tr>
<td align="center">361353–361571</td>
<td align="center">219</td>
<td align="center">30.14</td>
<td align="center">76</td>
<td></td>
<td align="center">SAR11_0369</td>
<td align="center">
<italic>grpE </italic>
(-)</td>
<td align="center">SAR11_0370</td>
<td align="center">HAM1-like
<break></break>
prot. (+)</td>
</tr>
<tr>
<td align="center">1125490–1125606</td>
<td align="center">117</td>
<td align="center">29.91</td>
<td align="center">11</td>
<td></td>
<td align="center">SAR11_1171</td>
<td align="center">
<italic>ordL </italic>
(-)</td>
<td align="center">SAR11_1172</td>
<td align="center">
<italic>osmC </italic>
(-)</td>
</tr>
<tr>
<td align="center">1189853–1189956</td>
<td align="center">104</td>
<td align="center">29.81</td>
<td align="center">25</td>
<td></td>
<td align="center">SAR11_1248</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_1249</td>
<td align="center">hyp. protein (+)</td>
</tr>
<tr>
<td align="center">676100–676308</td>
<td align="center">209</td>
<td align="center">28.7</td>
<td align="center">193</td>
<td></td>
<td align="center">SAR11_0691</td>
<td align="center">hyp. protein (-)</td>
<td align="center">SAR11_0692</td>
<td align="center">
<italic>yajQ </italic>
(-)</td>
</tr>
<tr>
<td align="center">1212757–1212865</td>
<td align="center">109</td>
<td align="center">29.36</td>
<td align="center">22</td>
<td></td>
<td align="center">SAR11_1279</td>
<td align="center">membrane prot. (-)</td>
<td align="center">SAR11_1280</td>
<td align="center">hyp. protein (+)</td>
</tr>
<tr>
<td align="center">732778–732938</td>
<td align="center">161</td>
<td align="center">29.19</td>
<td align="center">446</td>
<td align="center">SAM-V (-)</td>
<td align="center">SAR11_0750</td>
<td align="center">
<italic>mmuM </italic>
(-)</td>
<td align="center">SAR11_0751</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">57720–58035</td>
<td align="center">316</td>
<td align="center">29.11</td>
<td align="center">25</td>
<td></td>
<td align="center">SAR11_0046</td>
<td align="center">autotransporter (-)</td>
<td align="center">SAR11_0047</td>
<td align="center">transcription regulator (+)</td>
</tr>
<tr>
<td align="center">120095–120215</td>
<td align="center">121</td>
<td align="center">28.93</td>
<td align="center">211</td>
<td align="center">
<italic>bablM</italic>
<break></break>
element (+)</td>
<td align="center">SAR11_0108</td>
<td align="center">
<italic>rnhB </italic>
(+)</td>
<td align="center">SAR11_0109</td>
<td align="center">
<italic>babIM </italic>
(+)</td>
</tr>
<tr>
<td align="center">762114–762332</td>
<td align="center">219</td>
<td align="center">28.31</td>
<td align="center">55</td>
<td></td>
<td align="center">SAR11_0784</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_0785</td>
<td align="center">hyp. protein (+)</td>
</tr>
<tr>
<td align="center">834435–834636</td>
<td align="center">202</td>
<td align="center">28.22</td>
<td align="center">42</td>
<td></td>
<td align="center">SAR11_0864</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_0865</td>
<td align="center">transporter (+)</td>
</tr>
<tr>
<td align="center">1164239–1164384</td>
<td align="center">146</td>
<td align="center">28.08</td>
<td align="center">0</td>
<td></td>
<td align="center">SAR11_1216</td>
<td align="center">
<italic>ecpD </italic>
(+)</td>
<td align="center">SAR11_1218</td>
<td align="center">
<italic>sigB </italic>
(+)</td>
</tr>
<tr>
<td align="center">52729–52884</td>
<td align="center">157</td>
<td align="center">28.02</td>
<td align="center">22</td>
<td></td>
<td align="center">SAR11_0042</td>
<td align="center">autotransporter (-)</td>
<td align="center">SAR11_0043</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">1297623–1297755</td>
<td align="center">133</td>
<td align="center">27.82</td>
<td align="center">480</td>
<td align="center">
<italic>rhtB </italic>
element (-)</td>
<td align="center">SAR11_1362</td>
<td align="center">
<italic>rhtB </italic>
(-)</td>
<td align="center">SAR11_1363</td>
<td align="center">hyp. protein (+)</td>
</tr>
<tr>
<td align="center">675041–675166</td>
<td align="center">126</td>
<td align="center">27.78</td>
<td align="center">205</td>
<td></td>
<td align="center">SAR11_0690</td>
<td align="center">hyp. protein (-)</td>
<td align="center">SAR11_0691</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">762678–763012</td>
<td align="center">335</td>
<td align="center">27.76</td>
<td align="center">76</td>
<td></td>
<td align="center">SAR11_0785</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_0786</td>
<td align="center">
<italic>qacH </italic>
(-)</td>
</tr>
<tr>
<td align="center">43688–43789</td>
<td align="center">102</td>
<td align="center">27.4</td>
<td align="center">570</td>
<td></td>
<td align="center">SAR11_0037</td>
<td align="center">
<italic>rpoD </italic>
(-)</td>
<td align="center">SAR11_0038</td>
<td align="center">
<italic>dnaG </italic>
(-)</td>
</tr>
<tr>
<td align="center">791867–792012</td>
<td align="center">146</td>
<td align="center">27.4</td>
<td align="center">125</td>
<td></td>
<td align="center">SAR11_0817</td>
<td align="center">
<italic>hupA </italic>
(+)</td>
<td align="center">SAR11_0818</td>
<td align="center">
<italic>amtB </italic>
(+)</td>
</tr>
<tr>
<td align="center">1132812–1132928</td>
<td align="center">117</td>
<td align="center">27.35</td>
<td align="center">10</td>
<td></td>
<td align="center">SAR11_1178</td>
<td align="center">
<italic>pstC </italic>
(-)</td>
<td align="center">SAR11_1179</td>
<td align="center">
<italic>pstS </italic>
(-)</td>
</tr>
<tr>
<td align="center">1123617–1123934</td>
<td align="center">318</td>
<td align="center">27.04</td>
<td align="center">192</td>
<td></td>
<td align="center">SAR11_1169</td>
<td align="center">hyp. protein (-)</td>
<td align="center">SAR11_1170</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">1181972–1182071</td>
<td align="center">100</td>
<td align="center">27</td>
<td align="center">77</td>
<td></td>
<td align="center">SAR11_1238</td>
<td align="center">
<italic>sfuC </italic>
(-)</td>
<td align="center">SAR11_1239</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">670506–670772</td>
<td align="center">267</td>
<td align="center">26.97</td>
<td align="center">194</td>
<td></td>
<td align="center">SAR11_0685</td>
<td align="center">
<italic>moeA </italic>
(-)</td>
<td align="center">SAR11_0686</td>
<td align="center">hyp. protein (-)</td>
</tr>
<tr>
<td align="center">1074189–1074359</td>
<td align="center">171</td>
<td align="center">26.9</td>
<td align="center">650</td>
<td align="center">
<italic>rpsL </italic>
motif (-)</td>
<td align="center">SAR11_1121</td>
<td align="center">
<italic>rpsL </italic>
(-)</td>
<td align="center">SAR11_1122</td>
<td align="center">
<italic>rpoC </italic>
(-)</td>
</tr>
<tr>
<td align="center">164139–164261</td>
<td align="center">123</td>
<td align="center">26.82</td>
<td align="center">90</td>
<td></td>
<td align="center">SAR11_0160</td>
<td align="center">COG0647G (-)</td>
<td align="center">SAR11_0161</td>
<td align="center">
<italic>groES </italic>
(+)</td>
</tr>
<tr>
<td align="center">1245732–1245856</td>
<td align="center">125</td>
<td align="center">26.4</td>
<td align="center">37</td>
<td></td>
<td align="center">SAR11_1309</td>
<td align="center">hyp. protein (+)</td>
<td align="center">SAR11_1310</td>
<td align="center">
<italic>amt </italic>
(+)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Identification of SRP RNA (4.5S RNA) [
<xref ref-type="bibr" rid="B21">21</xref>
] and RNase P RNA [
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
] was straightforward. Both are completely contained within their respective IGRs and conform to well-established consensus sequences [
<xref ref-type="bibr" rid="B24">24</xref>
]. We also easily identified a variety of RNA
<italic>cis</italic>
-regulatory elements known as riboswitches [
<xref ref-type="bibr" rid="B25">25</xref>
] including two representatives of the glycine riboswitch class [
<xref ref-type="bibr" rid="B26">26</xref>
] previously described in '
<italic>Cand</italic>
. P. ubique' [
<xref ref-type="bibr" rid="B27">27</xref>
], two class II SAM riboswitches (SAM-II) [
<xref ref-type="bibr" rid="B28">28</xref>
] and a TPP riboswitch [
<xref ref-type="bibr" rid="B29">29</xref>
,
<xref ref-type="bibr" rid="B30">30</xref>
].</p>
<p>In contrast, identification of the tmRNA [
<xref ref-type="bibr" rid="B31">31</xref>
] representative was somewhat more challenging. The tmRNA eluded identification during initial screens for two reasons. First, in the genome of '
<italic>Cand</italic>
. P. ubique' the flanking gene (
<italic>thyX</italic>
, SAR11_0010) is likely misannotated, resulting in a partial overlap of the annotated coding region with the tmRNA. While coding sequences in '
<italic>Cand</italic>
. P. ubique' often overlap by several nucleotides, an in-frame methionine at position 30 of the existing thymidylate synthase annotation is most likely the correct start site based on BLAST analysis of ThyX protein sequences. Second, the genomic sequence of the tmRNA is split and permuted relative to the mature form of the RNA in '
<italic>Cand</italic>
. P. ubique'. While this feature is shared by most other Alphaproteobacteria and by some Cyanobacteria [
<xref ref-type="bibr" rid="B32">32</xref>
], it makes identification of the RNA more difficult because the region between the two sections varies in length between 75 and 125 bp [
<xref ref-type="bibr" rid="B33">33</xref>
], and the permuted model is not currently represented in the Rfam database [
<xref ref-type="bibr" rid="B24">24</xref>
].</p>
<p>By applying length, %GC and conservation thresholds we have significantly enriched our list of IGRs for known structured RNAs. Only 4% of all IGRs in '
<italic>Cand</italic>
. P. ubique' contain known structured RNAs. Approximately 17% of IGRs greater than 100 bp contain structured RNAs, and eliminating IGRs with less than 26% GC increases this percentage to ~40%. Applying the BLAST hit threshold further increases the percentage of considered IGRs containing known structured RNAs to ~75%. However, our parameter choices do exclude 2 of the 34 IGRs (6%) containing previously known RNAs. The first is a tRNA that is found within an IGR of 98 bp. We explored lowering the 100 bp threshold. However, we identified few additional candidates, and these candidates were typically very close to the established thresholds for other parameters, further decreasing their attractiveness for comprehensive study. The second excluded example is the IGR containing a SAM-II riboswitch preceding
<italic>metX </italic>
(SAR11_0217), which failed to rank highly based on GC-enrichment. The IGR containing this riboswitch is 191 nucleotides long and 22.5% GC (ranked 121
<sup>st </sup>
in the genome based on Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). However, the SAM-II aptamer alone is 70 nucleotides long and 30% GC. An early investigation of the '
<italic>Cand</italic>
. P. ubique' genome did explore ranking the IGRs by the highest percent GC within a "sliding window" of 50 nucleotides [
<xref ref-type="bibr" rid="B19">19</xref>
]. However, this did not change the rankings of '
<italic>Cand</italic>
. P. ubique' IGRs significantly (R
<sup>2 </sup>
= 0.84, Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
). Thus, this additional level of complexity was not implemented for the final analysis.</p>
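<p>The thresholds described above can be illustrated with a minimal sketch. It assumes each IGR is available as a plain sequence string with a count of metagenomic BLAST hits; the names <italic>IGR</italic>, <italic>gc_percent</italic>, <italic>max_window_gc</italic> and <italic>passes_thresholds</italic> are hypothetical and are not taken from the original analysis. The minimum number of BLAST hits is not stated in this section, so it is left as a required argument rather than given a default.</p>
<preformat>
```python
from dataclasses import dataclass


@dataclass
class IGR:
    name: str        # hypothetical label for the intergenic region
    sequence: str    # nucleotide sequence of the IGR
    blast_hits: int  # number of metagenomic BLAST hits


def gc_percent(seq: str) -> float:
    """Percent G+C of a sequence; 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def max_window_gc(seq: str, window: int = 50) -> float:
    """Highest %GC found in any window of the given width."""
    if window >= len(seq):
        return gc_percent(seq)
    starts = range(len(seq) - window + 1)
    return max(gc_percent(seq[i:i + window]) for i in starts)


def passes_thresholds(igr: IGR, min_hits: int,
                      min_len: int = 100, min_gc: float = 26.0) -> bool:
    """Apply the length, %GC and conservation filters in turn.

    min_hits is an assumption supplied by the caller; min_len and
    min_gc follow the 100 bp and 26% GC cut-offs named in the text.
    """
    return (len(igr.sequence) >= min_len
            and gc_percent(igr.sequence) >= min_gc
            and igr.blast_hits >= min_hits)
```
</preformat>
<p>In this sketch <italic>max_window_gc</italic> corresponds to the sliding-window ranking that was explored but not used in the final analysis; it is kept separate from <italic>passes_thresholds</italic> so that either ranking can be compared on the same set of IGRs.</p>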
<p>For those IGRs that are longer than 100 bp, greater than 26% GC, and well-conserved in the marine metagenome (Table
<xref ref-type="table" rid="T1">1</xref>
) but do not contain known structured RNAs, similar sequences identified by the BLAST analysis were used as input for comparative sequence analysis algorithms employed for ncRNA discovery. For each IGR several hypothetical alignments and secondary structures were generated using a covariance model search [
<xref ref-type="bibr" rid="B34">34</xref>
]. These alignments and predicted secondary structures were then used as the starting point for homology searches of the NCBI and metagenomic sequence databases to identify additional examples [
<xref ref-type="bibr" rid="B35">35</xref>
,
<xref ref-type="bibr" rid="B36">36</xref>
]. To confirm and refine secondary-structure models and sequence alignments, all examples for a particular IGR were subsequently combined and the process repeated beginning with the covariance model search to generate an RNA secondary structure that is well-supported by a large number of representatives (100–300 unique sequences).</p>
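<p>The iterative procedure described above can be outlined as a simple loop. This is only a schematic sketch: <italic>build_model</italic> and <italic>search</italic> are hypothetical placeholders for the structure-prediction and homology-search steps, supplied by the caller, and do not correspond to the interface of any particular program. The stopping rule (no new sequences, or a maximum number of rounds) is likewise an assumption.</p>
<preformat>
```python
from typing import Callable, Iterable, Set, Tuple


def refine_motif(
    seed: Iterable[str],
    build_model: Callable[[Set[str]], object],
    search: Callable[[object], Iterable[str]],
    max_rounds: int = 10,
) -> Tuple[object, Set[str]]:
    """Alternate model building and homology search until nothing new is found."""
    members = set(seed)
    model = build_model(members)          # initial alignment and structure
    for _ in range(max_rounds):
        found = set(search(model)) - members
        if not found:                     # converged: no additional examples
            break
        members |= found                  # combine all examples for this IGR
        model = build_model(members)      # rebuild the model from the larger set
    return model, members
```
</preformat>
<p>Passing the two steps in as functions keeps the loop independent of the tools used, so the same outline applies whether the search targets a genome database or metagenomic reads.</p>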
<p>Using this strategy, we discovered candidate structured RNA elements located 5' relative to genes encoding ribosomal proteins S2 (
<italic>rpsB</italic>
) and S12 (
<italic>rpsL</italic>
), and the signal recognition particle protein (
<italic>ffh</italic>
). We also found a structured RNA element associated with genes for the methionine biosynthesis proteins
<italic>O</italic>
-acetylhomoserine (thiol)-lyase (
<italic>metY</italic>
), homocysteine
<italic>S</italic>
-methyltransferase (
<italic>mmuM</italic>
) and betaine-homocysteine methyltransferase (
<italic>bhmt</italic>
) (Figure
<xref ref-type="fig" rid="F2">2</xref>
). Moreover, we identified a series of IGRs that contain potential RNA structures that are less well-supported by the alignments. These often include highly conserved regions with few mutations, and thus offer few opportunities to observe the covariation and compatible mutations that are the hallmark of a correctly predicted RNA secondary structure (Figure
<xref ref-type="fig" rid="F3">3</xref>
). Features of these new-found candidate structured RNAs are described below.</p>
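<p>To make the distinction between covariation, compatible mutations and strict conservation concrete, a proposed base pair can be classified from the nucleotides observed at its two alignment columns. The sketch below is illustrative only. It assumes an RNA alphabet, treats G-U wobble pairs as acceptable (an assumption that can be switched off), and uses a 5% tolerance for non-canonical or missing positions; the function name and category labels are hypothetical, and the calculation used for the figures is the one given in the Methods section.</p>
<preformat>
```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(column_pairs, max_bad_fraction=0.05, allow_wobble=True):
    """Classify one proposed base pair from its (5' base, 3' base) observations.

    Gaps or any symbol outside the allowed pairs count against the tolerance.
    """
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    pairs = [(a.upper(), b.upper()) for a, b in column_pairs]
    if not pairs:
        return "no data"
    good = [p for p in pairs if p in allowed]
    bad = len(pairs) - len(good)
    if bad > max_bad_fraction * len(pairs):
        return "not classified"
    kinds = set(good)
    # Covariation: two observed pairs that differ at both positions.
    if any(p[0] != q[0] and p[1] != q[1] for p in kinds for q in kinds):
        return "covarying"
    # Compatible mutation: one side changes and pairing is maintained.
    if len(kinds) > 1:
        return "compatible"
    return "conserved"
```
</preformat>
<p>A column pair observed as G-C in some sequences and A-U in others would be labelled covarying, whereas G-C alternating with G-U would be labelled compatible, and an invariant G-C pair would be labelled conserved.</p>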
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>
<bold>Consensus sequences and structures for the four RNA motifs identified</bold>
. (A)
<italic>rpsB </italic>
motif, (B)
<italic>rpsL </italic>
motif, (C)
<italic>ffh </italic>
motif, (D) SAM-V riboswitch. See Additional files
<xref ref-type="supplementary-material" rid="S4">4</xref>
,
<xref ref-type="supplementary-material" rid="S5">5</xref>
,
<xref ref-type="supplementary-material" rid="S6">6</xref>
,
<xref ref-type="supplementary-material" rid="S7">7</xref>
for alignments of all representatives. Calculations for conservation of nucleotide identity are described in the Methods section. Proposed base pairs in which more than 5% of sequences have non-canonical pairings or missing nucleotides are not classified as covarying.</p>
</caption>
<graphic xlink:href="1471-2164-10-268-2"></graphic>
</fig>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>
<bold>The conserved sequence and secondary structure of the four candidate RNA motifs identified</bold>
. (A)
<italic>rhtB </italic>
associated element, (B)
<italic>pntA </italic>
associated element, (C)
<italic>bablM </italic>
associated element, (D) SAR11_0636 element. See Additional files
<xref ref-type="supplementary-material" rid="S8">8</xref>
,
<xref ref-type="supplementary-material" rid="S9">9</xref>
,
<xref ref-type="supplementary-material" rid="S10">10</xref>
,
<xref ref-type="supplementary-material" rid="S11">11</xref>
for alignments of all representatives. Structural notations are as in Figure 2; consensus nucleotides and covariation were computed as for Figure 2.</p>
</caption>
<graphic xlink:href="1471-2164-10-268-3"></graphic>
</fig>
</sec>
<sec>
<title>
<italic>rpsB </italic>
motif</title>
<p>We identified a likely RNA motif preceding the gene
<italic>rpsB</italic>
, which encodes ribosomal protein S2. The motif is present in both marine metagenomic sequences and most Alphaproteobacteria with the exception of most members of the Rickettsiaceae family (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
). In addition, we identified representatives in most Gammaproteobacteria, a few Epsilon-, Delta-, and Betaproteobacteria, Cyanobacteria, and some Firmicutes. In nearly all examples where the downstream genes can be determined, the motif precedes
<italic>rpsB</italic>
. However, a few precede
<italic>tsf</italic>
, which encodes elongation factor Ts (EF-Ts) and is often found in the same operon as
<italic>rpsB </italic>
[
<xref ref-type="bibr" rid="B37">37</xref>
].</p>
<p>The structure of the
<italic>rpsB </italic>
motif (Figure
<xref ref-type="fig" rid="F2">2A</xref>
) consists of a long base-paired structure (P1) capped by a three-stem junction carrying two variable-length stems (P2 and P3), both of which may be very short or absent in some representatives. The nucleotide junction between P2 and P3 (J2–3) forms a pseudoknot with the 3' extension following P1. P2 is quite short in '
<italic>Cand</italic>
. P. ubique' and consists of only three base pairs. In Cyanobacteria, Firmicutes, and most Gammaproteobacteria this pairing element is entirely absent or very short (three or fewer base pairs). In contrast, P2 is up to eleven base pairs long in some species of Alphaproteobacteria. P3 is also quite short in '
<italic>Cand</italic>
. P. ubique' with only two base pairs; however, it is typically at least four base pairs long and exceeds twelve base pairs in several species of Alpha- and Gammaproteobacteria. The pseudoknot interaction is present across all of the taxa. However, in Firmicutes it appears to consist of only three base pairs rather than the five predicted in other phylogenetic groups.</p>
<p>
<italic>Cis</italic>
-regulatory elements in the 5' untranslated regions (UTRs) of ribosomal protein encoding mRNAs have long been known [
<xref ref-type="bibr" rid="B38">38</xref>
]. Ribosomal proteins L1 [
<xref ref-type="bibr" rid="B39">39</xref>
], L4 [
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
], L10/L12 [
<xref ref-type="bibr" rid="B42">42</xref>
], L20 [
<xref ref-type="bibr" rid="B43">43</xref>
], S4 [
<xref ref-type="bibr" rid="B44">44</xref>
,
<xref ref-type="bibr" rid="B45">45</xref>
], S7 [
<xref ref-type="bibr" rid="B46">46</xref>
], S8 [
<xref ref-type="bibr" rid="B47">47</xref>
,
<xref ref-type="bibr" rid="B48">48</xref>
], S15 [
<xref ref-type="bibr" rid="B49">49</xref>
], and S1 [
<xref ref-type="bibr" rid="B50">50</xref>
] are known to bind mRNA sequences to control gene expression. All such sequences characterized to date are autoregulatory, where the mRNA is bound by a ribosomal protein encoded within the transcript [
<xref ref-type="bibr" rid="B38">38</xref>
]. Typically such sequences inhibit translation, although some regulate transcription [
<xref ref-type="bibr" rid="B41">41</xref>
,
<xref ref-type="bibr" rid="B51">51</xref>
].</p>
<p>The role of the S2 ribosomal protein in translation is not well understood. S2 binds the 30S subunit late in ribosome biogenesis and acts as a bridge between the 16S RNA and ribosomal protein S1, which is the only ribosomal protein contacting the 30S subunit through protein-protein interactions [
<xref ref-type="bibr" rid="B52">52</xref>
]. The function of S1 is similarly unclear; however it has been implicated in translating highly structured mRNAs [
<xref ref-type="bibr" rid="B53">53</xref>
], as well as in the formation of the translation initiation complex at internal ribosome binding sites [
<xref ref-type="bibr" rid="B54">54</xref>
]. Analysis of the crystal structure of the 30S subunit from
<italic>T. thermophilus </italic>
ribosome shows that S2 contacts distal regions of the 16S RNA (H26 in the body and H35–37 in the head) [
<xref ref-type="bibr" rid="B55">55</xref>
]. These regions bear no obvious resemblance to the motif we have identified. However, structural mimicry cannot be excluded. In several instances the 5' UTR of an mRNA and the ribosomal RNA bound by the same protein share similar tertiary structures despite having little or no primary or secondary structure similarity [
<xref ref-type="bibr" rid="B56">56</xref>
-
<xref ref-type="bibr" rid="B59">59</xref>
].</p>
<p>The region upstream of the ribosomal protein S2 was identified as a potential 5' UTR in a transcriptome analysis of
<italic>Escherichia coli </italic>
[
<xref ref-type="bibr" rid="B60">60</xref>
,
<xref ref-type="bibr" rid="B61">61</xref>
]. In addition, recent
<italic>in vivo </italic>
work in
<italic>E. coli </italic>
shows that the region 162 nucleotides upstream of
<italic>rpsB </italic>
controls an
<italic>rpsB-lacZ </italic>
fusion construct in response to exogenous S2 added in
<italic>trans </italic>
[
<xref ref-type="bibr" rid="B62">62</xref>
]. This work identified the conserved RNA structure upstream of
<italic>rpsB </italic>
in other Gammaproteobacteria. However, we identified a more broadly conserved motif in Alpha-, Beta-, and Deltaproteobacteria as well as Cyanobacteria and Firmicutes. In addition, the pseudoknot interaction had not previously been identified.</p>
</sec>
<sec>
<title>
<italic>rpsL </italic>
motif</title>
<p>A second putative motif in the 5' UTR of a ribosomal mRNA was identified for
<italic>rpsL </italic>
(encoding ribosomal protein S12), the first gene in a series of 22 genes encoding ribosomal proteins in '
<italic>Cand</italic>
. P. ubique' that are homologous to those in the
<italic>E. coli str, spc</italic>
, and
<italic>S10 </italic>
ribosomal operons. We identified over 900 representatives (659 unique sequences) of the motif in the marine metagenome in addition to the instance in '
<italic>Cand</italic>
. P. ubique' (Additional file
<xref ref-type="supplementary-material" rid="S5">5</xref>
). The motif is consistently identified 3' of
<italic>rpoC</italic>
, which encodes the β' subunit of RNA polymerase, and 5' of
<italic>rpsL</italic>
. The genes further downstream of
<italic>rpsL </italic>
are typically those identified in the '
<italic>Cand</italic>
. P. ubique' operon. However, because the metagenomic sequences analyzed are short, it is impossible to determine whether the entire series of ORFs is conserved. The motif occasionally precedes
<italic>rpsG </italic>
or
<italic>fusA </italic>
genes that directly follow
<italic>rpsL </italic>
in the '
<italic>Cand</italic>
. P. ubique' genome. Despite extensive searching, we identified the motif only in '
<italic>Cand</italic>
. P. ubique' and marine metagenomic sequence samples.</p>
<p>The motif consists of a bulged P1 stem connecting to a three-stem junction (Figure
<xref ref-type="fig" rid="F2">2B</xref>
). The P2 stem shows covariation throughout its length; however, the loop region is diverse both in length (3–10 nt) and sequence. Both the P1 and P3 stems show some covariation, but more positions exhibit breaks in the Watson-Crick base pairing compared with the P2 stem. The nucleotides in J2–3 are identical in nearly all examples, and the P3 loop and P1 bulge also show extensive conservation.</p>
<p>Several proteins encoded by this series of ribosomal protein genes in '
<italic>Cand</italic>
. P. ubique' have been shown to regulate ribosomal protein expression in
<italic>E. coli </italic>
[
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
,
<xref ref-type="bibr" rid="B46">46</xref>
-
<xref ref-type="bibr" rid="B48">48</xref>
,
<xref ref-type="bibr" rid="B62">62</xref>
]. The
<italic>str </italic>
ribosomal operon (encoding ribosomal proteins S12, S7, and elongation factors G and Tu) is regulated by the binding of S7 to the transcript region between the genes for S12 and S7 [
<xref ref-type="bibr" rid="B46">46</xref>
]. Similarly, the
<italic>spc </italic>
operon (encoding ribosomal proteins L14, L24, L5, S14, S8, L6, L18, S5, L30, L15, and
<italic>secY</italic>
) is regulated by S8 binding to an mRNA structure between L24 and L5 [
<xref ref-type="bibr" rid="B47">47</xref>
,
<xref ref-type="bibr" rid="B48">48</xref>
]. The eleven-gene
<italic>S10 </italic>
operon (encoding ribosomal proteins S10, L3, L4, L23, L2, S19, L22, S3, L16, L29, S17) is regulated by ribosomal protein L4 binding to a 5' UTR preceding the S10 gene [
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
].</p>
<p>The secondary structure of the motif described here does not bear any resemblance to the regulatory motifs associated with S7, S8 and L4. Additionally, the
<italic>rpsL </italic>
motif is not located at the same genomic position as any of the
<italic>E. coli </italic>
regulatory motifs. While this series of ribosomal proteins in '
<italic>Cand</italic>
. P. ubique' essentially consists of the three separate
<italic>E. coli </italic>
operons, separate regulation in this organism is unlikely as the coding regions typically overlap by a few base pairs and the largest IGR is nine nucleotides. This motif is not identified outside of '
<italic>Cand</italic>
. P. ubique' and the metagenomic data. However, given its genomic context and conserved secondary structure, the
<italic>rpsL </italic>
motif is likely a structured RNA involved with regulation of ribosomal protein expression. Considering the large number of potential candidates, we cannot predict with confidence which protein may be its binding partner.</p>
</sec>
<sec>
<title>
<italic>ffh </italic>
motif</title>
<p>We identified an RNA motif in the IGR preceding the gene
<italic>ffh </italic>
which encodes the cytoplasmic protein component of the bacterial signal recognition particle (SRP). The motif is well-conserved in metagenomic sequence samples with over 600 representatives (345 unique sequences) (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). In addition, this motif is widespread among Alphaproteobacteria, occurring in all fully sequenced representatives of the orders Rhodobacterales, Sphingomonadales, and Rhizobiales. However, the
<italic>ffh </italic>
motif does not occur in any sequenced representatives of the orders Rhodospirillales or Caulobacterales, and it is also not found in representatives of Rickettsiales other than '
<italic>Cand</italic>
. P. ubique'. In nearly all examples where the downstream genes can be identified, the motif precedes
<italic>ffh</italic>
. This transcript has been detected by several metatranscriptomics analyses of microbial small RNAs [
<xref ref-type="bibr" rid="B63">63</xref>
,
<xref ref-type="bibr" rid="B64">64</xref>
].</p>
<p>The RNA motif consists of a single bulged hairpin (Figure
<xref ref-type="fig" rid="F2">2C</xref>
). Despite this simple architecture, there is convincing covariation at all positions along the stem with the exception of the first base pair, which is always a cytosine-guanosine pair. Additionally, there is significant sequence conservation within the bulge. In particular, the two cytosine residues are found in nearly every example.</p>
<p>The signal recognition particle (SRP) is an essential RNA-protein complex conserved in all three domains of life that targets secreted proteins to the plasma membrane in eubacteria and archaea or to the endoplasmic reticulum in eukaryotes through interactions with peptide signal sequences [
<xref ref-type="bibr" rid="B21">21</xref>
]. The eubacterial SRP complex consists of the 4.5S RNA, a cytoplasmic protein (Ffh), and a receptor protein (FtsY) that targets the complex to the membrane. Ffh binds directly to a conserved portion of the 4.5S RNA known as helix 8 [
<xref ref-type="bibr" rid="B65">65</xref>
], and FtsY in turn binds Ffh [
<xref ref-type="bibr" rid="B66">66</xref>
,
<xref ref-type="bibr" rid="B67">67</xref>
]. The eukaryotic and archaeal SRPs typically consist of larger RNAs and a greater number of proteins. However, the interactions between the RNA component and the cytoplasmic protein are conserved [
<xref ref-type="bibr" rid="B68">68</xref>
].</p>
<p>How the levels of the Ffh protein and the 4.5S RNA are regulated is not fully understood. In
<italic>E. coli </italic>
the 4.5S RNA is present in excess compared to Ffh [
<xref ref-type="bibr" rid="B69">69</xref>
], and it has been shown using both depletion studies [
<xref ref-type="bibr" rid="B70">70</xref>
] and examination of a temperature sensitive
<italic>ffh </italic>
mutant in
<italic>E. coli </italic>
[
<xref ref-type="bibr" rid="B71">71</xref>
] that Ffh is significantly stabilized by its interactions with the 4.5S RNA and is rapidly degraded when not bound to the RNA. However, no regulation at the transcriptional or translational level has been described. The RNA motif identified does not appear to resemble the portion of the 4.5S RNA bound by Ffh. However, it is possible that the motif plays a role in the regulation of the
<italic>ffh </italic>
gene, especially given the widespread distribution of this motif and the precedent for
<italic>cis</italic>
-regulatory mRNA elements associated with the genes of RNA binding proteins [
<xref ref-type="bibr" rid="B72">72</xref>
].</p>
</sec>
<sec>
<title>Methionine biosynthesis associated motif</title>
<p>We identified a conserved RNA motif preceding the methionine biosynthesis genes
<italic>mmuM, metY</italic>
, and
<italic>bhmt</italic>
. This conserved sequence was previously identified as a potential regulatory region in '
<italic>Cand</italic>
. P. ubique' because proteomic studies indicate that the three genes are co-regulated [
<xref ref-type="bibr" rid="B73">73</xref>
]. We found 690 representatives (505 unique sequences) in metagenomic sequences, most of which precede
<italic>metY </italic>
(Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
). However, there are metagenomic examples that precede
<italic>bhmt</italic>
,
<italic>metH</italic>
, and
<italic>mmum</italic>
. In addition, there is a single example in the genome of
<italic>Psychroflexus torquis </italic>
ATCC 700755 (RefSeq accession NZ_AAPR0000000) also preceding
<italic>metY</italic>
.</p>
<p>The motif consists of a simple pseudoknotted structure that is typically within ten nucleotides of a start codon (Figure
<xref ref-type="fig" rid="F2">2D</xref>
). Both stems show covariation and many loop nucleotides are well-conserved. Based on the association of the motif with methionine biosynthesis genes, the coregulation of the three genes in '
<italic>Cand</italic>
. P. ubique' [
<xref ref-type="bibr" rid="B73">73</xref>
], and the prevalence of
<italic>S</italic>
-adenosylmethionine (SAM)-binding riboswitches [
<xref ref-type="bibr" rid="B74">74</xref>
], we hypothesized that the RNA was a SAM-binding riboswitch.
<italic>In vitro </italic>
biochemical characterization of the RNA has revealed that representatives of this RNA motif selectively bind SAM (M. Meyer, E. Poiata, and R. Breaker; unpublished data).</p>
<p>The RNA motif also displays some similarities to the previously described class II SAM riboswitches (SAM-II) that bind SAM and control sulfur metabolism genes in Alphaproteobacteria [
<xref ref-type="bibr" rid="B28">28</xref>
]. In particular the two RNA motifs share a similar overall pseudoknotted structure and many of the bases shown to contact the ligand in a crystal structure of the class II SAM riboswitch [
<xref ref-type="bibr" rid="B75">75</xref>
] have equivalent nucleotides in the new-found motif. Despite these similarities, the motif lacks the final 3' base-pairing element present in most SAM-II riboswitch representatives. Moreover, both paired regions in the new motif differ in length from those in the SAM-II consensus, and the loop regions outside those that bind the ligand in the SAM-II riboswitch are not well conserved. Such differences in the riboswitch aptamers for SAM-I and SAM-IV riboswitches cause representatives to be sorted into distinct collections when examined using bioinformatics search algorithms that identify common sequence and structural elements [
<xref ref-type="bibr" rid="B76">76</xref>
]. Likewise, the differences between SAM-II and the new-found motif also cause them to be sorted independently, suggesting that this is a new class of SAM-binding riboswitches that we have termed SAM-V.</p>
</sec>
<sec>
<title>Other potential RNA motifs</title>
<p>In addition to the motifs that we identified that have strong support as structured RNAs based on their alignments and distribution, we also identified several potential RNA motifs that are less well-supported. These candidate RNA motifs have fewer positions with covariation or compatible mutations and are not identified outside the genome of '
<italic>Cand</italic>
. P. ubique' and metagenomic sequences. However, they do exhibit evidence of possible RNA structure formation and our models are supported by sequence alignments from the marine metagenome.</p>
<p>The first of these motifs consists of a single bulged hairpin (Figure
<xref ref-type="fig" rid="F3">3A</xref>
). Both portions of the stem are conserved, and show indications of covariation and compatible mutations at many positions. Both the loop and the bulge are also well-conserved. The alignment consists of ~1250 representatives (919 unique sequences) from the marine metagenome and '
<italic>Cand</italic>
. P. ubique' (Additional file
<xref ref-type="supplementary-material" rid="S8">8</xref>
). In '
<italic>Cand</italic>
. P. ubique' the motif is flanked by a hypothetical protein and
<italic>rhtB </italic>
(LysE type translocator). In the metagenomic sequence, this context is largely conserved. However, the motif also appears upstream of
<italic>proC </italic>
(pyrroline-5-carboxylate reductase), as well as other genes further downstream of
<italic>rhtB </italic>
in '
<italic>Cand</italic>
. P. ubique' such as
<italic>livM </italic>
and
<italic>livK </italic>
(components of putative branched amino acid transporters). Approximately 50% of examples of this motif, including the one in '
<italic>Cand</italic>
. P. ubique', are directly followed by a poly-uridine tract of 6–9 nucleotides, so that the hairpin potentially forms a rho-independent terminator stem [
<xref ref-type="bibr" rid="B77">77</xref>
]. This feature suggests either a potential regulatory function or a conserved termination signal. However, the lower portion of the well-conserved hairpin structure also forms a fairly convincing inverted repeat sequence, which may indicate alternative functionality.</p>
<p>The second motif consists of two base-paired stems in series where the loop of the second is especially well-conserved (Figure
<xref ref-type="fig" rid="F3">3B</xref>
). The alignment includes 365 unique sequences derived from metagenomic sequences (~400 total representatives), in addition to the example in '
<italic>Cand</italic>
. P. ubique' (Additional file
<xref ref-type="supplementary-material" rid="S9">9</xref>
). In '
<italic>Cand</italic>
. P. ubique' the motif is flanked by
<italic>rpmJ</italic>
, which encodes the ribosomal protein L36, and
<italic>pntA</italic>
, which encodes the alpha subunit of a pyridine nucleotide transhydrogenase. In the marine metagenome the motif consistently precedes
<italic>pntA</italic>
, but the gene annotated directly 5' of the motif varies. Most frequently it is the 5S rRNA gene, or
<italic>rmlB </italic>
(dTDP-D-glucose 4,6-dehydratase, COG1088). The conserved position of this motif 5' of the
<italic>pntA </italic>
gene suggests a regulatory function related to
<italic>pntA</italic>
. However, there is an additional ~60 bp of sequence between the motif and the start of the gene. While this sequence is somewhat conserved at the nucleotide level, this region does not appear to have any structure supported by compatible or covarying base-pair interactions.</p>
<p>The third motif (Figure
<xref ref-type="fig" rid="F3">3C</xref>
) also consists of a set of predicted base-pairing stems in series. The sequence of the first predicted stem is very strongly conserved, with no mutations observed in any of the representatives identified. The second stem shows a few compatible mutations and the position nearest the loop frequently fails to maintain base pairing. The loops and linker regions exhibit almost no conservation. Approximately 540 representatives (314 unique sequences) were identified in the marine metagenome, and the genomic context is well conserved (Additional file
<xref ref-type="supplementary-material" rid="S10">10</xref>
). The motif occurs between
<italic>rnhB1 </italic>
(RNaseHII) and
<italic>bablM </italic>
(a site-specific DNA methylase) in the genome of '
<italic>Cand</italic>
. P. ubique' and the vast majority of metagenomic examples fall between genes annotated as
<italic>rnhB1 </italic>
and a DNA methylase.</p>
<p>The fourth motif is somewhat more complex than others in this category (Figure
<xref ref-type="fig" rid="F3">3D</xref>
). There are ~640 representatives (338 unique sequences) in the marine metagenome in addition to that in the genome of '
<italic>Cand</italic>
. P. ubique' (Additional file
<xref ref-type="supplementary-material" rid="S11">11</xref>
). Its three-stem junction carries a well conserved stem (P2) that contains two bulged regions, one of which is highly conserved. Due to this conservation, none of the base pairs are supported by covariation and only a few by compatible mutations. The other two stems (P1 and P3) are only moderately conserved, and the loop of P3 is variable in length containing between 5 and 12 nucleotides with no strong conservation. The motif occurs between two hypothetical proteins. One (SAR11_0635) is annotated as both an SOS-mediated transcriptional repressor and an S24-like peptidase depending on the database, and the other (SAR11_0636) is annotated as a SOUL heme-binding protein. In the metagenomic data, neither of these associations is strictly conserved and the annotated genes on either side vary widely. The genes annotated directly 5' to the motif are typically syntenous with those in '
<italic>Cand</italic>
. P. ubique' (i.e. predicted glycosyltransferase, SAR11_0633). The genes annotated directly 3' of the motif show even greater variation and do not seem to be syntenous with the '
<italic>Cand</italic>
. P. ubique' genome. Based on these observations, it seems likely that the RNA is not a
<italic>cis</italic>
-regulatory element, but rather could be a separately transcribed non-coding RNA.</p>
<p>Microarray studies show that transcripts for all of these genes, although not necessarily any untranslated regions, are present in '
<italic>Cand</italic>
. P. ubique' cells during both exponential growth and stationary phase. Interestingly, comparison of microarray and quantitative proteomic data (unpublished data) for
<italic>pntA </italic>
shows a ~300% increase in protein as cells enter stationary phase, in stark contrast to the corresponding 9% decrease in transcript levels. This disparity between transcript and protein expression provides further evidence for post-transcriptional regulation of the gene. Unfortunately, proteomic data are not available for RhtB and BabIM (not included in the AMT-tag library), and SAR11_0636 was never observed in the proteomic dataset, so direct comparisons are not possible for these genes.</p>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>In this study we identified structured RNAs that are conserved in both the genome of '
<italic>Cand</italic>
. P. ubique' and the marine metagenomic datasets. A few of these RNAs were assigned to previously known classes, while others are described here for the first time. Our work differs from other surveys of ncRNAs in the metagenome [
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
] in that we did not seek to identify additional examples of known motifs, but rather we sought to discover motifs not previously described. We identified three likely
<italic>cis</italic>
-regulatory protein binding motifs and a new riboswitch class, and our approach is validated by the confirmed biological function for two of the four motifs (
<italic>rpsB </italic>
motif and SAM-V riboswitch). In addition to these four RNA
<italic>cis</italic>
-regulatory elements, we also describe a series of motifs for which there is less evidence of RNA structure. While these RNA motifs are less well-supported by compatible and covarying mutations than the others we present, the structures are credible given the number of representatives identified, the degree of sequence conservation, and the thermodynamics of RNA folding.</p>
<p>There are many additional IGRs in '
<italic>Cand</italic>
. P. ubique' that contain a high percentage GC and seem highly conserved (Table
<xref ref-type="table" rid="T1">1</xref>
), yet have no discernable RNA structure. For some of these IGRs, the large number of BLAST hits is the result of many different short aligned sections of high identity within the IGR (e.g. the IGR between SAR11_0641 and SAR11_0642). By contrast, in the IGRs where we identified convincing structured RNAs there is typically a longer region of alignment with mutations distributed throughout. For several other IGRs there are a large number of BLAST hits that align but form no detectable RNA structure (e.g. the IGR between SAR11_0037 and SAR11_0038). These regions may contain RNAs that are not extensively structured (e.g. antisense RNAs that base pair to target RNAs) [
<xref ref-type="bibr" rid="B78">78</xref>
], or perhaps they are conserved protein binding sites that act at the level of DNA.</p>
<p>The parameters we used to identify IGRs for inspection were based on the properties of previously annotated RNAs and were designed to capture most structured RNAs. However, one IGR containing a known structured RNA does not meet our parameters for inspection. The IGR containing a SAM-II riboswitch preceding
<italic>metX </italic>
(SAR11_0217) failed to rank highly based on GC-enrichment. The IGR containing this riboswitch is 191 nucleotides long and 22.5% GC (ranked 121
<sup>st </sup>
in the genome based on Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
), significantly below where we arbitrarily stopped examining IGRs due to the decreasing number of convincing BLAST matches (Table
<xref ref-type="table" rid="T1">1</xref>
). However, the SAM-II aptamer alone is 70 nucleotides long and 30% GC. An early investigation of the '
<italic>Cand</italic>
. P. ubique' genome did explore ranking the IGRs by the highest percent GC within a "sliding window" of 50 nucleotides [
<xref ref-type="bibr" rid="B19">19</xref>
]. However, this did not change the rankings of '
<italic>Cand</italic>
. P. ubique' IGRs significantly (R
<sup>2 </sup>
= 0.84, Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
). Thus, this additional level of complexity was not implemented for the final analysis.</p>
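<p>The sliding-window alternative mentioned above can be illustrated with a short Python sketch. It is not the code used in this study: the function names are hypothetical, and the handling of IGRs shorter than the window (falling back to the overall percent GC) is an assumption made here for completeness.</p>
<preformat>
```python
def percent_gc(sequence):
    """Return the percentage of G and C nucleotides in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def max_window_gc(sequence, window=50):
    """Return the highest percent GC found in any window of the given length.

    Assumption: sequences no longer than the window are scored by their
    overall percent GC.
    """
    seq = sequence.upper()
    if window >= len(seq):
        return percent_gc(seq)
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])      # G+C count in the first window
    best = count
    for end in range(window, len(seq)):
        # Slide the window one nucleotide to the right.
        count += is_gc[end] - is_gc[end - window]
        best = max(best, count)
    return 100.0 * best / window
```
</preformat>
<p>Ranking IGRs by <monospace>max_window_gc</monospace> rather than <monospace>percent_gc</monospace> favors long, AT-rich regions that contain a short GC-rich element, which is the situation described for the SAM-II riboswitch IGR.</p>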
<p>In contrast to other computational genomics studies [
<xref ref-type="bibr" rid="B3">3</xref>
], we identified relatively few candidate RNAs. This is likely because there is relatively little to find in '
<italic>Cand</italic>
. P. ubique' compared with organisms that have larger genomes. The genome of '
<italic>Cand</italic>
. P. ubique' is hypothesized to be streamlined to minimize nutrient use [
<xref ref-type="bibr" rid="B14">14</xref>
,
<xref ref-type="bibr" rid="B79">79</xref>
]. Even the strong AT bias may reflect adaptation to nitrogen limitation in a nutrient poor environment because GC pairs require an additional nitrogen compared to AT base pairs. A survey examining lengths of the RNase P RNA, SRP RNA, TPP and glycine riboswitches in '
<italic>Cand</italic>
. P. ubique' compared with those in other Alphaproteobacteria showed that RNAs in '
<italic>Cand</italic>
. P. ubique' tend to have fewer nucleotides (Additional file
<xref ref-type="supplementary-material" rid="S12">12</xref>
). On average, their lengths fall more than one standard deviation below the mean for a given RNA (average Z-value of -1.12). While this result is not statistically significant, the motifs identified here further reflect this tendency. The S2 motif identified in '
<italic>Cand</italic>
. P. ubique' is among the shortest with an exceedingly short P2 stem (3 bp) and no P3 stem. The presence of RNA-based regulatory motifs in '
<italic>Cand</italic>
. P. ubique' indicates that such mechanisms can be an effective use of scarce resources, and the smaller RNAs likely reflect pressure to decrease the number of nucleotides at both the DNA and RNA level. Interestingly, ribosomal RNAs and tRNAs both showed less variation in length among Alphaproteobacteria than other structured RNAs, as well as little or no evidence of reduction in '
<italic>Cand</italic>
. P. ubique', suggesting that it is difficult to alter RNAs with functions critical for survival.</p>
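<p>The Z-value comparison of RNA lengths can be written out as a brief Python sketch. This is an illustration under stated assumptions rather than the calculation performed in this study: the function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not specify which was used.</p>
<preformat>
```python
from statistics import mean, stdev


def length_z_score(query_length, reference_lengths):
    """Z-score of one RNA length relative to the lengths of its homologs.

    Assumption: the sample standard deviation is used.
    """
    return (query_length - mean(reference_lengths)) / stdev(reference_lengths)


def average_z_score(families):
    """Average the per-family Z-scores.

    families maps an RNA family name to a tuple of
    (query_length, reference_lengths).
    """
    return mean(
        length_z_score(query, references)
        for query, references in families.values()
    )
```
</preformat>
<p>A negative value indicates an RNA that is shorter than the mean for its family; the lengths themselves would be taken from a table such as Additional file 12.</p>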
</sec>
<sec>
<title>Conclusion</title>
<p>This study increased the number of candidate structured RNAs in both '
<italic>Cand</italic>
. P. ubique' and the marine metagenome. Several of the RNAs discovered have wide phylogenetic distribution, while others can only be found through examination of metagenomic data. The combination of computational approaches used in this work is relatively simple and in principle might be applied to any organism with similar properties. This work also underscores how single completed genomes that are carefully annotated are important components in the effort toward annotating and understanding the vast amount of metagenomic data available.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Identification of candidate RNA motifs</title>
<p>Non-protein coding segments of the '
<italic>Cand</italic>
. P. ubique' genome (RefSeq accession number NC_007205.1) were computationally identified based on the RefSeq version 25 gene annotations and their sequences extracted [
<xref ref-type="bibr" rid="B80">80</xref>
]. The size and percent GC values for these regions were established. Individual sequences annotated as harboring a structured ncRNA according to the Rfam database (version 8.1) were identified [
<xref ref-type="bibr" rid="B24">24</xref>
]. Two additional sequences containing tRNAs were identified from the RefSeq annotation of the '
<italic>Cand</italic>
. P. ubique' genome, and the riboswitches were located based on alignments maintained through periodic homology searches [
<xref ref-type="bibr" rid="B81">81</xref>
].</p>
<p>As all known structured RNAs in '
<italic>Cand</italic>
. P. ubique' are present in IGRs longer than 100 bp (Fig.
<xref ref-type="fig" rid="F1">1</xref>
), we used 100 bp as the minimum size requirement for the IGRs we examined. The conservation level for each IGR was determined by the number of hits returned with an E-value less than 10
<sup>-5 </sup>
from a nucleotide BLAST analysis of the IGR against the "GOS: All Metagenomic Sequence Reads" database maintained at the CAMERA website [
<xref ref-type="bibr" rid="B20">20</xref>
]. IGRs not well-conserved in metagenomic sequence data (fewer than 200 BLAST hits) were removed from consideration. The remaining IGRs were screened for the presence of unannotated protein coding regions first through BLASTX and subsequently TBLASTX searches of the NCBI nr and nr/nt databases
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi">http://www.ncbi.nlm.nih.gov/blast/Blast.cgi</ext-link>
and TBLASTN searches of the "All Metagenomic Sequence Reads" CAMERA database. Those sequences containing a conserved protein coding region (Additional File
<xref ref-type="supplementary-material" rid="S2">2</xref>
) were excluded from further analysis.</p>
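<p>The IGR selection criteria described above can be sketched in Python as follows. This is an illustration only, not the code used in this study: the <monospace>Igr</monospace> record, the function names, and the assumption that BLAST E-values have already been parsed into a list are all hypothetical. The final ordering by percent GC reflects the GC-based prioritization of IGRs described in this work, and the screen for unannotated protein coding regions is not shown.</p>
<preformat>
```python
from dataclasses import dataclass

MIN_IGR_LENGTH = 100   # bp, minimum IGR size stated in the text
MAX_EVALUE = 1e-5      # BLAST hits must have an E-value below this cutoff
MIN_BLAST_HITS = 200   # IGRs with fewer hits are treated as poorly conserved


@dataclass
class Igr:
    name: str          # hypothetical label, e.g. the flanking locus tags
    sequence: str      # nucleotide sequence of the intergenic region
    hit_evalues: list  # E-values of BLAST hits (hypothetical parsed input)


def percent_gc(sequence):
    """Return the percentage of G and C nucleotides in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def conserved_hit_count(igr):
    """Count BLAST hits whose E-value is below the cutoff."""
    return sum(1 for evalue in igr.hit_evalues if MAX_EVALUE > evalue)


def select_candidates(igrs):
    """Keep long, well-conserved IGRs and order them by decreasing percent GC."""
    kept = [
        igr for igr in igrs
        if len(igr.sequence) >= MIN_IGR_LENGTH
        and conserved_hit_count(igr) >= MIN_BLAST_HITS
    ]
    return sorted(kept, key=lambda igr: percent_gc(igr.sequence), reverse=True)
```
</preformat>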
<p>For the remaining IGRs, all BLAST matches from the conservation analysis were collected and the sequences extended to match the length of the IGR, or to the end of the sequence read (the average trimmed sequence read is 822 bp in length [
<xref ref-type="bibr" rid="B13">13</xref>
]). This collection of sequences was then used as input for CMFinder version 0.2 [
<xref ref-type="bibr" rid="B34">34</xref>
] which created multiple sequence alignments with putative conserved secondary structures. These alignments were manually examined for features indicative of a structured RNA such as extent of covariation within predicted stems and conservation in areas outside base-paired regions. For most IGRs, several alternative structures were initially chosen for further analysis due to the high level of conservation in the sequences.</p>
<p>The alignments and hypothetical secondary structures were used to search for additional homologs in the RefSeq25 database [
<xref ref-type="bibr" rid="B80">80</xref>
] along with metagenome sequences from acid mine drainage [
<xref ref-type="bibr" rid="B82">82</xref>
], soil and whale fall [
<xref ref-type="bibr" rid="B83">83</xref>
], human gut [
<xref ref-type="bibr" rid="B84">84</xref>
,
<xref ref-type="bibr" rid="B85">85</xref>
], mouse gut [
<xref ref-type="bibr" rid="B86">86</xref>
], gutless sea worms [
<xref ref-type="bibr" rid="B87">87</xref>
], sludge [
<xref ref-type="bibr" rid="B88">88</xref>
], Global Ocean Survey scaffolds [
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B13">13</xref>
], other marine sequences [
<xref ref-type="bibr" rid="B89">89</xref>
] and termite hindgut [
<xref ref-type="bibr" rid="B90">90</xref>
].</p>
<p>Homology searches were performed using R
<sc>AVE</sc>
N
<sc>N</sc>
A version 0.2f, essentially as described previously [
<xref ref-type="bibr" rid="B35">35</xref>
,
<xref ref-type="bibr" rid="B36">36</xref>
,
<xref ref-type="bibr" rid="B91">91</xref>
,
<xref ref-type="bibr" rid="B92">92</xref>
]. For each IGR, homologs resulting from these searches were used in conjunction with the original sequences as the starting input for a second CMFinder search and the homology search process was iterated to derive a single structure, or in cases of predicted pseudoknot interactions two compatible structures, supported by the alignment.</p>
</sec>
<sec>
<title>Analysis of motifs</title>
<p>The alignments of IGRs where convincing RNA structure could be identified were manually edited using RALEE [
<xref ref-type="bibr" rid="B93">93</xref>
]. We used RNAshapes [
<xref ref-type="bibr" rid="B94">94</xref>
], CMFinder [
<xref ref-type="bibr" rid="B34">34</xref>
] and R
<sc>AVE</sc>
N
<sc>N</sc>
A [
<xref ref-type="bibr" rid="B36">36</xref>
] during these analyses. Additional homology searches were conducted using the R
<sc>AVE</sc>
N
<sc>N</sc>
A '-local' and '-global' command line options with the microbial subset of RefSeq version 25, and the metagenomic sequence databases described above. As the full RefSeq database is 3,717,469,431 nucleotides and the combined metagenomic databases total 5,529,658,033 nucleotides, several subset databases (Proteobacteria, Alphaproteobacteria, Bacteroidetes, Additional File
<xref ref-type="supplementary-material" rid="S2">2</xref>
and Global Ocean Survey Scaffolds) were used to reduce the number of false positive hits. Local searches tended to have greater success identifying homologs of motifs with variable length or optional stems.</p>
<p>For the genome context annotations, protein-coding genes were assembled from the annotations in RefSeq and from "predicted proteins" [
<xref ref-type="bibr" rid="B5">5</xref>
] in Global Ocean Survey sequences or annotated genes in IMG/M [
<xref ref-type="bibr" rid="B95">95</xref>
]. However, sequences from three metagenome projects [
<xref ref-type="bibr" rid="B85">85</xref>
,
<xref ref-type="bibr" rid="B89">89</xref>
,
<xref ref-type="bibr" rid="B90">90</xref>
] were extracted from GenBank and genes were predicted using the MetaGene program (dated Oct. 12, 2006) with default parameters [
<xref ref-type="bibr" rid="B96">96</xref>
]. Conserved protein domains were detected using the Conserved Domain Database version 2.08 [
<xref ref-type="bibr" rid="B97">97</xref>
].</p>
<p>The extent of covariation and conservation of sequences reflected in consensus diagrams (e.g. Figure
<xref ref-type="fig" rid="F2">2</xref>
) was determined as previously described [
<xref ref-type="bibr" rid="B92">92</xref>
]. Sequences were weighted to de-emphasize highly similar homologs using the GSC algorithm [
<xref ref-type="bibr" rid="B98">98</xref>
] implemented by Infernal [
<xref ref-type="bibr" rid="B35">35</xref>
]. Base pairs where both positions in the sequence alignment varied among sequences while maintaining Watson-Crick or G-U wobble base pairing were classified as covarying. Base pairs where only one of the two positions varied, again with pairing maintained, were classified as compatible mutations. If the frequency of pairs that were neither Watson-Crick nor G-U exceeded 5%, no covariation or compatible mutation was annotated.</p>
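<p>The classification rule for a single predicted base pair can be sketched in Python as shown below. This is a simplified illustration and not the implementation used to produce the consensus diagrams: the function name and return labels are hypothetical, sequences are counted equally rather than weighted with the GSC algorithm, and sequences with a gap at either position are skipped, which is an assumption.</p>
<preformat>
```python
# Watson-Crick and G-U wobble pairs, written 5' base then 3' base.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
MAX_NONCANONICAL = 0.05  # 5% threshold stated in the text


def classify_pair(left_column, right_column):
    """Classify one predicted base pair from its two alignment columns.

    Each column holds one character per aligned sequence. Unlike the
    published method, sequences are not weighted here.
    """
    pairs = [
        (a + b).upper().replace("T", "U")
        for a, b in zip(left_column, right_column)
        if a not in "-." and b not in "-."   # skip gapped sequences
    ]
    if not pairs:
        return "none"
    noncanonical = sum(1 for pair in pairs if pair not in CANONICAL)
    if noncanonical / len(pairs) > MAX_NONCANONICAL:
        return "none"
    canonical = [pair for pair in pairs if pair in CANONICAL]
    left_varies = len({pair[0] for pair in canonical}) > 1
    right_varies = len({pair[1] for pair in canonical}) > 1
    if left_varies and right_varies:
        return "covarying"
    if left_varies or right_varies:
        return "compatible"
    return "conserved"
```
</preformat>
<p>For example, columns reading G, G, C, C and C, C, G, G across four sequences would be labeled covarying, whereas G, G, G, G paired with C, C, U, U would be labeled a compatible mutation.</p>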
</sec>
</sec>
<sec>
<title>List of abbreviations</title>
<p>IGR: intergenic region; ncRNA: noncoding RNA; GOS: Global Ocean Survey; CAMERA: Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis; SRP: signal recognition particle; UTR: untranslated region; SAM:
<italic>S</italic>
-adenosylmethionine; bp: base pair.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>MMM conceived and designed the study, executed bioinformatics searches, analyzed the data, and drafted the manuscript. TDA participated in the design of the study and provided bioinformatics infrastructure. DPS conceived the study, performed proteomics searches, and revised the manuscript. ZW provided bioinformatics infrastructure and reviewed motif analysis. MSS conceived the study and revised the manuscript. SJG conceived the study and revised the manuscript. RRB participated in the design of the study, reviewed motif analysis, and revised the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>All '
<italic>Cand</italic>
. P. ubique' IGRs greater than 100 bp</bold>
. A list of all intergenic regions in '
<italic>Cand</italic>
. P. ubique' longer than 100 bp with the length, GC content and annotated RNAs indicated.</p>
</caption>
<media xlink:href="1471-2164-10-268-S1.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2</title>
<p>
<bold>Misannotated protein coding regions identified</bold>
. A list of likely misannotated protein coding regions identified in the course of this study.</p>
</caption>
<media xlink:href="1471-2164-10-268-S2.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>IGR ranking by %GC and sliding window %GC</bold>
. Comparison of ranking IGRs by %GC and an alternative ranking methodology based on a sliding window of 50 nucleotides.</p>
</caption>
<media xlink:href="1471-2164-10-268-S3.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4</title>
<p>
<bold>rpsB alignment</bold>
. Text file containing Stockholm alignment of the
<italic>rpsB </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S4.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5</title>
<p>
<bold>rpsL alignment</bold>
. Text file containing Stockholm alignment of the
<italic>rpsL </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S5.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6</title>
<p>
<bold>ffh alignment</bold>
. Text file containing Stockholm alignment of the
<italic>ffh </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S6.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S7">
<caption>
<title>Additional file 7</title>
<p>
<bold>SAMV alignment</bold>
. Text file containing Stockholm alignment of the SAM-V motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S7.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S8">
<caption>
<title>Additional file 8</title>
<p>
<bold>rhtB alignment</bold>
. Text file containing Stockholm alignment of the
<italic>rhtB </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S8.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S9">
<caption>
<title>Additional file 9</title>
<p>
<bold>pntA alignment</bold>
. Text file containing Stockholm alignment of the
<italic>pntA </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S9.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S10">
<caption>
<title>Additional file 10</title>
<p>
<bold>bablM alignment</bold>
. Text file containing Stockholm alignment of the
<italic>bablM </italic>
motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S10.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S11">
<caption>
<title>Additional file 11</title>
<p>
<bold>SAR11_0636 alignment</bold>
. Text file containing Stockholm alignment of the SAR11_0636 motif; it may be viewed in any text editor, including XEmacs with the RALEE extension or MS WordPad.</p>
</caption>
<media xlink:href="1471-2164-10-268-S11.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S12">
<caption>
<title>Additional file 12</title>
<p>
<bold>RNA motifs from Alphaproteobacteria ordered by length</bold>
. Glycine riboswitch, TPP riboswitch, SRP, and RNaseP RNAs from Alphaproteobacteria ordered by length.</p>
</caption>
<media xlink:href="1471-2164-10-268-S12.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>We thank Dr. Ming Chen Hammond for helpful discussions, N. Carriero and R. Bjornson for assisting our use of the Yale Life Sciences High Performance Computing Center (NIH grant RR19895-02), and the Pacific Northwest National Laboratory for the quantitative proteomic analysis. The work reported here was supported in part by NIH award U54AI57158 (Northeast Biodefense Center – Lipkin). M.M.M. is supported by an NIH NRSA (F32GM079974) and the Breaker Lab also receives support from the Howard Hughes Medical Institute. Portions of this work were also supported by a Marine Microbiology Initiative investigator award from the Gordon and Betty Moore Foundation.</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huttenhofer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vogel</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Experimental approaches to identify non-coding RNAs</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>635</fpage>
<lpage>646</lpage>
<pub-id pub-id-type="pmid">16436800</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altuvia</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Identification of bacterial small non-coding RNAs: experimental approaches</article-title>
<source>Curr Opin Microbiol</source>
<year>2007</year>
<volume>10</volume>
<fpage>257</fpage>
<lpage>261</lpage>
<pub-id pub-id-type="pmid">17553733</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Computational genomics of noncoding RNA genes</article-title>
<source>Cell</source>
<year>2002</year>
<volume>109</volume>
<fpage>137</fpage>
<lpage>140</lpage>
<pub-id pub-id-type="pmid">12007398</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jossinet</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>TE</given-names>
</name>
<name>
<surname>Westhof</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>RNA structure: bioinformatic analysis</article-title>
<source>Curr Opin Microbiol</source>
<year>2007</year>
<volume>10</volume>
<fpage>279</fpage>
<lpage>285</lpage>
<pub-id pub-id-type="pmid">17548241</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Remington</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Manning</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Jaroszewski</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cieplak</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mashiyama</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Joachimiak</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>van Belle</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Soergel</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Zhai</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Natarajan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Raphael</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Bafna</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eisenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dixon</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Strausberg</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Frazier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
</person-group>
<article-title>The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families</article-title>
<source>PLoS Biol</source>
<year>2007</year>
<volume>5</volume>
<fpage>e16</fpage>
<pub-id pub-id-type="pmid">17355171</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kazanov</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Vitreschak</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Gelfand</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>Abundance and functional diversity of riboswitches in microbial communities</article-title>
<source>BMC Genomics</source>
<year>2007</year>
<volume>8</volume>
<fpage>347</fpage>
<pub-id pub-id-type="pmid">17908319</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Pulukkunat</surname>
<given-names>DK</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Deciphering RNA structural diversity and systematic phylogeny from microbial metagenomes</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>2283</fpage>
<lpage>2294</lpage>
<pub-id pub-id-type="pmid">17389640</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mavromatis</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Goltsman</surname>
<given-names>E</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Rigoutsos</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Salamov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Korzeniewski</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Land</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lapidus</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Grigoriev</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
</person-group>
<article-title>Use of simulated data sets to evaluate the fidelity of metagenomic processing methods</article-title>
<source>Nat Methods</source>
<year>2007</year>
<volume>4</volume>
<fpage>495</fpage>
<lpage>500</lpage>
<pub-id pub-id-type="pmid">17468765</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Foerstner</surname>
<given-names>KU</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Get the most out of your metagenome: computational analysis of environmental sequence data</article-title>
<source>Curr Opin Microbiol</source>
<year>2007</year>
<volume>10</volume>
<fpage>490</fpage>
<lpage>498</lpage>
<pub-id pub-id-type="pmid">17936679</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>182</fpage>
<pub-id pub-id-type="pmid">18402669</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Morris</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Rappé</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Connon</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Vergin</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Siebold</surname>
<given-names>WA</given-names>
</name>
<name>
<surname>Carlson</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>SAR11 clade dominates ocean surface bacterioplankton communities</article-title>
<source>Nature</source>
<year>2002</year>
<volume>420</volume>
<fpage>806</fpage>
<lpage>810</lpage>
<pub-id pub-id-type="pmid">12490947</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Remington</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Paulsen</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Fouts</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Hoffman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Parsons</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Baden-Tillson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Pfannkoch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>HO</given-names>
</name>
</person-group>
<article-title>Environmental genome shotgun sequencing of the Sargasso Sea</article-title>
<source>Science</source>
<year>2004</year>
<volume>304</volume>
<fpage>66</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="pmid">15001713</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Hoffman</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Remington</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Beeson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Baden-Tillson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Thorpe</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Freeman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Andres-Pfannkoch</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kravitz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Utterback</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Falcón</surname>
<given-names>LI</given-names>
</name>
<name>
<surname>Souza</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Bonilla-Rosso</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Eguiarte</surname>
<given-names>LE</given-names>
</name>
<name>
<surname>Karl</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Sathyendranath</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Platt</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bermingham</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Gallardo</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Tamayo-Castillo</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ferrari</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Strausberg</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Nealson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Frazier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
</person-group>
<article-title>The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific</article-title>
<source>PLoS Biol</source>
<year>2007</year>
<volume>5</volume>
<fpage>e77</fpage>
<pub-id pub-id-type="pmid">17355176</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Tripp</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Givan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Podar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vergin</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Baptista</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bibbs</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Eads</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Noordewier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rappé</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Short</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Carrington</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Mathur</surname>
<given-names>EJ</given-names>
</name>
</person-group>
<article-title>Genome streamlining in a cosmopolitan oceanic bacterium</article-title>
<source>Science</source>
<year>2005</year>
<volume>309</volume>
<fpage>1242</fpage>
<lpage>1245</lpage>
<pub-id pub-id-type="pmid">16109880</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rivas</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Klein</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Computational identification of noncoding RNAs in
<italic>E. coli </italic>
by comparative genomics</article-title>
<source>Curr Biol</source>
<year>2001</year>
<volume>11</volume>
<fpage>1369</fpage>
<lpage>1373</lpage>
<pub-id pub-id-type="pmid">11553332</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klein</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Misulovin</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Noncoding RNA genes identified in AT-rich hyperthermophiles</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2002</year>
<volume>99</volume>
<fpage>7542</fpage>
<lpage>7547</lpage>
<pub-id pub-id-type="pmid">12032319</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Larsson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hinas</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ardell</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Kirsebom</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Virtanen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Soderbom</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>De novo search for non-coding RNA genes in the AT-rich genome of
<italic>Dictyostelium discoideum</italic>
: performance of Markov-dependent genome feature scoring</article-title>
<source>Genome Res</source>
<year>2008</year>
<volume>18</volume>
<fpage>888</fpage>
<lpage>899</lpage>
<pub-id pub-id-type="pmid">18347326</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schattner</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Searching for RNA genes using base-composition statistics</article-title>
<source>Nucleic Acids Res</source>
<year>2002</year>
<volume>30</volume>
<fpage>2076</fpage>
<lpage>2082</lpage>
<pub-id pub-id-type="pmid">11972348</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Upadhyay</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bawankar</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Malhotra</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Patankar</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>A screen for conserved sequences with biased base composition identifies noncoding RNAs in the A-T rich genome of
<italic>Plasmodium falciparum</italic>
</article-title>
<source>Mol Biochem Parasitol</source>
<year>2005</year>
<volume>144</volume>
<fpage>149</fpage>
<lpage>158</lpage>
<pub-id pub-id-type="pmid">16183147</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seshadri</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kravitz</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Smarr</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gilna</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frazier</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>CAMERA: a community resource for metagenomics</article-title>
<source>PLoS Biol</source>
<year>2007</year>
<volume>5</volume>
<fpage>e75</fpage>
<pub-id pub-id-type="pmid">17355175</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pool</surname>
<given-names>MR</given-names>
</name>
</person-group>
<article-title>Signal recognition particles in chloroplasts, bacteria, yeast and mammals</article-title>
<source>Mol Membr Biol</source>
<year>2005</year>
<volume>22</volume>
<fpage>3</fpage>
<lpage>15</lpage>
<pub-id pub-id-type="pmid">16092520</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altman</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>A view of RNase P</article-title>
<source>Mol Biosyst</source>
<year>2007</year>
<volume>3</volume>
<fpage>604</fpage>
<lpage>607</lpage>
<pub-id pub-id-type="pmid">17700860</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kazantsev</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Pace</surname>
<given-names>NR</given-names>
</name>
</person-group>
<article-title>Bacterial RNase P: a new view of an ancient enzyme</article-title>
<source>Nat Rev Microbiol</source>
<year>2006</year>
<volume>4</volume>
<fpage>729</fpage>
<lpage>740</lpage>
<pub-id pub-id-type="pmid">16980936</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Moxon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Khanna</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Rfam: annotating non-coding RNAs in complete genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>D121</fpage>
<lpage>D124</lpage>
<pub-id pub-id-type="pmid">15608160</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Winkler</surname>
<given-names>WC</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Regulation of bacterial gene expression by riboswitches</article-title>
<source>Annu Rev Microbiol</source>
<year>2005</year>
<volume>59</volume>
<fpage>487</fpage>
<lpage>517</lpage>
<pub-id pub-id-type="pmid">16153177</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mandal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barrick</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Weinberg</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Emilsson</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Ruzzo</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>A glycine-dependent riboswitch that uses cooperative binding to control gene expression</article-title>
<source>Science</source>
<year>2004</year>
<volume>306</volume>
<fpage>275</fpage>
<lpage>279</lpage>
<pub-id pub-id-type="pmid">15472076</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tripp</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Schwalbach</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Kitner</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Unique glycine-activated riboswitch linked to glycine-serine auxotrophy in SAR11</article-title>
<source>Environ Microbiol</source>
<year>2009</year>
<volume>11</volume>
<fpage>230</fpage>
<lpage>238</lpage>
<pub-id pub-id-type="pmid">19125817</pub-id>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corbino</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Barrick</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Welz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tucker</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Puskarz</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Mandal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rudnick</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Evidence for a second class of
<italic>S</italic>
-adenosylmethionine riboswitches and other regulatory RNA motifs in alpha-proteobacteria</article-title>
<source>Genome Biol</source>
<year>2005</year>
<volume>6</volume>
<fpage>R70</fpage>
<pub-id pub-id-type="pmid">16086852</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodionov</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Vitreschak</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Mironov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Gelfand</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>Comparative genomics of thiamin biosynthesis in procaryotes. New genes and regulatory mechanisms</article-title>
<source>J Biol Chem</source>
<year>2002</year>
<volume>277</volume>
<fpage>48949</fpage>
<lpage>48959</lpage>
<pub-id pub-id-type="pmid">12376536</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Winkler</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Nahvi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression</article-title>
<source>Nature</source>
<year>2002</year>
<volume>419</volume>
<fpage>952</fpage>
<lpage>956</lpage>
<pub-id pub-id-type="pmid">12410317</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moore</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Sauer</surname>
<given-names>RT</given-names>
</name>
</person-group>
<article-title>The tmRNA system for translational surveillance and ribosome rescue</article-title>
<source>Annu Rev Biochem</source>
<year>2007</year>
<volume>76</volume>
<fpage>101</fpage>
<lpage>124</lpage>
<pub-id pub-id-type="pmid">17291191</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keiler</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>KP</given-names>
</name>
</person-group>
<article-title>tmRNAs that encode proteolysis-inducing tags are found in all known bacterial genomes: A two-piece tmRNA functions in
<italic>Caulobacter</italic>
</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2000</year>
<volume>97</volume>
<fpage>7778</fpage>
<lpage>7783</lpage>
<pub-id pub-id-type="pmid">10884408</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zwieb</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gorodkin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Knudsen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Burks</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wower</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>tmRDB (tmRNA database)</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>446</fpage>
<lpage>447</lpage>
<pub-id pub-id-type="pmid">12520048</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yao</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Weinberg</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ruzzo</surname>
<given-names>WL</given-names>
</name>
</person-group>
<article-title>CMfinder–a covariance model based RNA motif finding algorithm</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>445</fpage>
<lpage>452</lpage>
<pub-id pub-id-type="pmid">16357030</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Infernal User's Guide</article-title>
<year>2009</year>
<ext-link ext-link-type="uri" xlink:href="ftp://selab.janelia.org/pub/software/infernal/Userguide.pdf"></ext-link>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weinberg</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ruzzo</surname>
<given-names>WL</given-names>
</name>
</person-group>
<article-title>Sequence-based heuristics for faster annotation of non-coding RNA families</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>35</fpage>
<lpage>39</lpage>
<pub-id pub-id-type="pmid">16267089</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>An</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bendiak</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Mamelak</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Friesen</surname>
<given-names>JD</given-names>
</name>
</person-group>
<article-title>Organization and nucleotide sequence of a new ribosomal operon in
<italic>Escherichia coli </italic>
containing the genes for ribosomal protein S2 and elongation factor Ts</article-title>
<source>Nucleic Acids Res</source>
<year>1981</year>
<volume>9</volume>
<fpage>4163</fpage>
<lpage>4172</lpage>
<pub-id pub-id-type="pmid">6272196</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zengel</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Lindahl</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Diverse mechanisms for regulating ribosomal protein synthesis in
<italic>Escherichia coli</italic>
</article-title>
<source>Prog Nucleic Acid Res Mol Biol</source>
<year>1994</year>
<volume>47</volume>
<fpage>331</fpage>
<lpage>370</lpage>
<pub-id pub-id-type="pmid">7517053</pub-id>
</citation>
</ref>
<ref id="B39">
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Gourse</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Sharrock</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Nomura</surname>
<given-names>M</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Hardesty</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kramer</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Control of ribosome synthesis in
<italic>Escherichia coli</italic>
</article-title>
<source>Structure, function and genetics of ribosomes</source>
<year>1986</year>
<publisher-name>New York: Springer</publisher-name>
<fpage>766</fpage>
<lpage>788</lpage>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yates</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Nomura</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>
<italic>E. coli </italic>
ribosomal protein L4 is a feedback regulatory protein</article-title>
<source>Cell</source>
<year>1980</year>
<volume>21</volume>
<fpage>517</fpage>
<lpage>522</lpage>
<pub-id pub-id-type="pmid">6996835</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zengel</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Mueckl</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lindahl</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Protein L4 of the
<italic>E. coli </italic>
ribosome regulates an eleven gene r protein operon</article-title>
<source>Cell</source>
<year>1980</year>
<volume>21</volume>
<fpage>523</fpage>
<lpage>535</lpage>
<pub-id pub-id-type="pmid">6157482</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnsen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Christensen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>PP</given-names>
</name>
<name>
<surname>Fiil</surname>
<given-names>NP</given-names>
</name>
</person-group>
<article-title>Autogenous control: ribosomal protein L10-L12 complex binds to the leader sequence of its mRNA</article-title>
<source>EMBO J</source>
<year>1982</year>
<volume>1</volume>
<fpage>999</fpage>
<lpage>1004</lpage>
<pub-id pub-id-type="pmid">6765237</pub-id>
</citation>
</ref>
<ref id="B43">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guillier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Allemand</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Raibaud</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dardel</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Springer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chiaruttini</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Translational feedback regulation of the gene for L35 in
<italic>Escherichia coli </italic>
requires binding of ribosomal protein L20 to two sites in its leader mRNA: a possible case of ribosomal RNA-messenger RNA molecular mimicry</article-title>
<source>RNA</source>
<year>2002</year>
<volume>8</volume>
<fpage>878</fpage>
<lpage>889</lpage>
<pub-id pub-id-type="pmid">12166643</pub-id>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jinks-Robertson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nomura</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Ribosomal protein S4 acts in trans as a translational repressor to regulate expression of the alpha operon in
<italic>Escherichia coli</italic>
</article-title>
<source>J Bacteriol</source>
<year>1982</year>
<volume>151</volume>
<fpage>193</fpage>
<lpage>202</lpage>
<pub-id pub-id-type="pmid">6211432</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grundy</surname>
<given-names>FJ</given-names>
</name>
<name>
<surname>Henkin</surname>
<given-names>TM</given-names>
</name>
</person-group>
<article-title>The rpsD gene, encoding ribosomal protein S4, is autogenously regulated in
<italic>Bacillus subtilis</italic>
</article-title>
<source>J Bacteriol</source>
<year>1991</year>
<volume>173</volume>
<fpage>4595</fpage>
<lpage>4602</lpage>
<pub-id pub-id-type="pmid">1906866</pub-id>
</citation>
</ref>
<ref id="B46">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saito</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mattheakis</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Nomura</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Post-transcriptional regulation of the str operon in
<italic>Escherichia coli</italic>. Ribosomal protein S7 inhibits coupled translation of S7 but not its independent translation</article-title>
<source>J Mol Biol</source>
<year>1994</year>
<volume>235</volume>
<fpage>111</fpage>
<lpage>124</lpage>
<pub-id pub-id-type="pmid">7507167</pub-id>
</citation>
</ref>
<ref id="B47">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cerretti</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Mattheakis</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Kearney</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Vu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Nomura</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Translational regulation of the
<italic>spc </italic>
operon in
<italic>Escherichia coli</italic>. Identification and structural analysis of the target site for S8 repressor protein</article-title>
<source>J Mol Biol</source>
<year>1988</year>
<volume>204</volume>
<fpage>309</fpage>
<lpage>329</lpage>
<pub-id pub-id-type="pmid">2464692</pub-id>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gregory</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Cahill</surname>
<given-names>PB</given-names>
</name>
<name>
<surname>Thurlow</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Zimmermann</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>Interaction of
<italic>Escherichia coli </italic>
ribosomal protein S8 with its binding sites in ribosomal RNA and messenger RNA</article-title>
<source>J Mol Biol</source>
<year>1988</year>
<volume>204</volume>
<fpage>295</fpage>
<lpage>307</lpage>
<pub-id pub-id-type="pmid">2464691</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Philippe</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Portier</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Mougel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grunberg-Manago</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ebel</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Ehresmann</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ehresmann</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Target site of
<italic>Escherichia coli </italic>
ribosomal protein S15 on its messenger RNA. Conformation and interaction with the protein</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>211</volume>
<fpage>415</fpage>
<lpage>426</lpage>
<pub-id pub-id-type="pmid">2407855</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tchufistova</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Komarova</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Boni</surname>
<given-names>IV</given-names>
</name>
</person-group>
<article-title>A key role for the mRNA leader structure in translational control of ribosomal protein S1 synthesis in gamma-proteobacteria</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>6996</fpage>
<lpage>7002</lpage>
<pub-id pub-id-type="pmid">14627832</pub-id>
</citation>
</ref>
<ref id="B51">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lindahl</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Archer</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zengel</surname>
<given-names>JM</given-names>
</name>
</person-group>
<article-title>Transcription of the
<italic>S10 </italic>
ribosomal protein operon is regulated by an attenuator in the leader</article-title>
<source>Cell</source>
<year>1983</year>
<volume>33</volume>
<fpage>241</fpage>
<lpage>248</lpage>
<pub-id pub-id-type="pmid">6380754</pub-id>
</citation>
</ref>
<ref id="B52">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaczanowska</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ryden-Aulin</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Ribosome biogenesis and the translation process in
<italic>Escherichia coli</italic>
</article-title>
<source>Microbiol Mol Biol Rev</source>
<year>2007</year>
<volume>71</volume>
<fpage>477</fpage>
<lpage>494</lpage>
<pub-id pub-id-type="pmid">17804668</pub-id>
</citation>
</ref>
<ref id="B53">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szer</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Hermoso</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Leffler</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Ribosomal protein S1 and polypeptide chain initiation in bacteria</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1975</year>
<volume>72</volume>
<fpage>2325</fpage>
<lpage>2329</lpage>
<pub-id pub-id-type="pmid">1094462</pub-id>
</citation>
</ref>
<ref id="B54">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tedin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Moll</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Grill</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Resch</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Graschopf</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gualerzi</surname>
<given-names>CO</given-names>
</name>
<name>
<surname>Blasi</surname>
<given-names>U</given-names>
</name>
</person-group>
<article-title>Translation initiation factor 3 antagonizes authentic start codon selection on leaderless mRNAs</article-title>
<source>Mol Microbiol</source>
<year>1999</year>
<volume>31</volume>
<fpage>67</fpage>
<lpage>77</lpage>
<pub-id pub-id-type="pmid">9987111</pub-id>
</citation>
</ref>
<ref id="B55">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brodersen</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Clemons</surname>
<given-names>WM</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Carter</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Wimberly</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Ramakrishnan</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Crystal structure of the 30 S ribosomal subunit from
<italic>Thermus thermophilus</italic>: structure of the proteins and their interactions with 16 S RNA</article-title>
<source>J Mol Biol</source>
<year>2002</year>
<volume>316</volume>
<fpage>725</fpage>
<lpage>768</lpage>
<pub-id pub-id-type="pmid">11866529</pub-id>
</citation>
</ref>
<ref id="B56">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Merianos</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>PB</given-names>
</name>
</person-group>
<article-title>The structure of a ribosomal protein S8/
<italic>spc </italic>
operon mRNA complex</article-title>
<source>RNA</source>
<year>2004</year>
<volume>10</volume>
<fpage>954</fpage>
<lpage>964</lpage>
<pub-id pub-id-type="pmid">15146079</pub-id>
</citation>
</ref>
<ref id="B57">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nevskaya</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tishchenko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gabdoulkhakov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nikonova</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Nikonov</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Nikulin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Platonova</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Garber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nikonov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Piendl</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Ribosomal protein L1 recognizes the same specific structural motif in its target sites on the autoregulatory mRNA and 23S rRNA</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>478</fpage>
<lpage>485</lpage>
<pub-id pub-id-type="pmid">15659579</pub-id>
</citation>
</ref>
<ref id="B58">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Scott</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>JR</given-names>
</name>
</person-group>
<article-title>Interaction of the
<italic>Bacillus stearothermophilus </italic>
ribosomal protein S15 with its 5'-translational operator mRNA</article-title>
<source>J Mol Biol</source>
<year>2001</year>
<volume>314</volume>
<fpage>413</fpage>
<lpage>422</lpage>
<pub-id pub-id-type="pmid">11846555</pub-id>
</citation>
</ref>
<ref id="B59">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stelzl</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Zengel</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Tovbina</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Walker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nierhaus</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Lindahl</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>RNA-structural mimicry in
<italic>Escherichia coli </italic>
ribosomal protein L4-dependent regulation of the S10 operon</article-title>
<source>J Biol Chem</source>
<year>2003</year>
<volume>278</volume>
<fpage>28237</fpage>
<lpage>28245</lpage>
<pub-id pub-id-type="pmid">12738792</pub-id>
</citation>
</ref>
<ref id="B60">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hershberg</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Altuvia</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Margalit</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>A survey of small RNA-encoding genes in
<italic>Escherichia coli</italic>
</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>1813</fpage>
<lpage>1820</lpage>
<pub-id pub-id-type="pmid">12654996</pub-id>
</citation>
</ref>
<ref id="B61">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tjaden</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Saxena</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Stolyar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Haynor</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Kolker</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Rosenow</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Transcriptome analysis of
<italic>Escherichia coli </italic>
using high-density oligonucleotide probe arrays</article-title>
<source>Nucleic Acids Res</source>
<year>2002</year>
<volume>30</volume>
<fpage>3732</fpage>
<lpage>3738</lpage>
<pub-id pub-id-type="pmid">12202758</pub-id>
</citation>
</ref>
<ref id="B62">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aseev</surname>
<given-names>LV</given-names>
</name>
<name>
<surname>Levandovskaya</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Tchufistova</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Scaptsova</surname>
<given-names>NV</given-names>
</name>
<name>
<surname>Boni</surname>
<given-names>IV</given-names>
</name>
</person-group>
<article-title>A new regulatory circuit in ribosomal protein operons: S2-mediated control of the rpsB-tsf expression in vivo</article-title>
<source>RNA</source>
<year>2008</year>
<volume>14</volume>
<fpage>1882</fpage>
<lpage>1894</lpage>
<pub-id pub-id-type="pmid">18648071</pub-id>
</citation>
</ref>
<ref id="B63">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frias-Lopez</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Coleman</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>DeLong</surname>
<given-names>EF</given-names>
</name>
</person-group>
<article-title>Microbial community gene expression in ocean surface waters</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2008</year>
<volume>105</volume>
<fpage>3805</fpage>
<lpage>3810</lpage>
<pub-id pub-id-type="pmid">18316740</pub-id>
</citation>
</ref>
<ref id="B64">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>DeLong</surname>
<given-names>EF</given-names>
</name>
</person-group>
<article-title>Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<fpage>266</fpage>
<lpage>269</lpage>
<pub-id pub-id-type="pmid">19444216</pub-id>
</citation>
</ref>
<ref id="B65">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batey</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Rambo</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>Lucast</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Rha</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Doudna</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>Crystal structure of the ribonucleoprotein core of the signal recognition particle</article-title>
<source>Science</source>
<year>2000</year>
<volume>287</volume>
<fpage>1232</fpage>
<lpage>1239</lpage>
<pub-id pub-id-type="pmid">10678824</pub-id>
</citation>
</ref>
<ref id="B66">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Egea</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Shan</surname>
<given-names>SO</given-names>
</name>
<name>
<surname>Napetschnig</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Savage</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Walter</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Stroud</surname>
<given-names>RM</given-names>
</name>
</person-group>
<article-title>Substrate twinning activates the signal recognition particle and its receptor</article-title>
<source>Nature</source>
<year>2004</year>
<volume>427</volume>
<fpage>215</fpage>
<lpage>221</lpage>
<pub-id pub-id-type="pmid">14724630</pub-id>
</citation>
</ref>
<ref id="B67">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Focia</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Shepotinovskaya</surname>
<given-names>IV</given-names>
</name>
<name>
<surname>Seidler</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Freymann</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Heterodimeric GTPase core of the SRP targeting complex</article-title>
<source>Science</source>
<year>2004</year>
<volume>303</volume>
<fpage>373</fpage>
<lpage>377</lpage>
<pub-id pub-id-type="pmid">14726591</pub-id>
</citation>
</ref>
<ref id="B68">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Doudna</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Batey</surname>
<given-names>RT</given-names>
</name>
</person-group>
<article-title>Structural insights into the signal recognition particle</article-title>
<source>Annu Rev Biochem</source>
<year>2004</year>
<volume>73</volume>
<fpage>539</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="pmid">15189152</pub-id>
</citation>
</ref>
<ref id="B69">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Concentrations of 4.5S RNA and Ffh protein in
<italic>Escherichia coli</italic>: the stability of Ffh protein is dependent on the concentration of 4.5S RNA</article-title>
<source>J Bacteriol</source>
<year>1994</year>
<volume>176</volume>
<fpage>7148</fpage>
<lpage>7154</lpage>
<pub-id pub-id-type="pmid">7525539</pub-id>
</citation>
</ref>
<ref id="B70">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jensen</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Effect of 4.5S RNA depletion on
<italic>Escherichia coli </italic>
protein synthesis and secretion</article-title>
<source>J Bacteriol</source>
<year>1994</year>
<volume>176</volume>
<fpage>2502</fpage>
<lpage>2506</lpage>
<pub-id pub-id-type="pmid">7513325</pub-id>
</citation>
</ref>
<ref id="B71">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Dalbey</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Phillips</surname>
<given-names>GJ</given-names>
</name>
</person-group>
<article-title>Functional analysis of the signal recognition particle in
<italic>Escherichia coli </italic>
by characterization of a temperature-sensitive ffh mutant</article-title>
<source>J Bacteriol</source>
<year>2002</year>
<volume>184</volume>
<fpage>2642</fpage>
<lpage>2653</lpage>
<pub-id pub-id-type="pmid">11976293</pub-id>
</citation>
</ref>
<ref id="B72">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batey</surname>
<given-names>RT</given-names>
</name>
</person-group>
<article-title>Structures of regulatory elements in mRNAs</article-title>
<source>Curr Opin Struct Biol</source>
<year>2006</year>
<volume>16</volume>
<fpage>299</fpage>
<lpage>306</lpage>
<pub-id pub-id-type="pmid">16707260</pub-id>
</citation>
</ref>
<ref id="B73">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sowell</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Norbeck</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Lipton</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Nicora</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Callister</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Barofsky</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Giovannoni</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Proteomic analysis of stationary phase in the marine bacterium '<italic>Candidatus</italic> Pelagibacter ubique'</article-title>
<source>Appl Environ Microbiol</source>
<year>2008</year>
<volume>74</volume>
<fpage>4091</fpage>
<lpage>4100</lpage>
<pub-id pub-id-type="pmid">18469119</pub-id>
</citation>
</ref>
<ref id="B74">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>JX</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Riboswitches that sense <italic>S</italic>-adenosylmethionine and <italic>S</italic>-adenosylhomocysteine</article-title>
<source>Biochem Cell Biol</source>
<year>2008</year>
<volume>86</volume>
<fpage>157</fpage>
<lpage>168</lpage>
<pub-id pub-id-type="pmid">18443629</pub-id>
</citation>
</ref>
<ref id="B75">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Rambo</surname>
<given-names>RP</given-names>
</name>
<name>
<surname>Van Tyne</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Batey</surname>
<given-names>RT</given-names>
</name>
</person-group>
<article-title>Structure of the SAM-II riboswitch bound to <italic>S</italic>-adenosylmethionine</article-title>
<source>Nat Struct Mol Biol</source>
<year>2008</year>
<volume>15</volume>
<fpage>177</fpage>
<lpage>182</lpage>
<pub-id pub-id-type="pmid">18204466</pub-id>
</citation>
</ref>
<ref id="B76">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weinberg</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Regulski</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Hammond</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Barrick</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ruzzo</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>The aptamer core of SAM-IV riboswitches mimics the ligand-binding site of SAM-I riboswitches</article-title>
<source>RNA</source>
<year>2008</year>
<volume>14</volume>
<fpage>822</fpage>
<lpage>828</lpage>
<pub-id pub-id-type="pmid">18369181</pub-id>
</citation>
</ref>
<ref id="B77">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henkin</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Yanofsky</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Regulation by transcription attenuation in bacteria: how RNA provides instructions for transcription termination/antitermination decisions</article-title>
<source>Bioessays</source>
<year>2002</year>
<volume>24</volume>
<fpage>700</fpage>
<lpage>707</lpage>
<pub-id pub-id-type="pmid">12210530</pub-id>
</citation>
</ref>
<ref id="B78">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aiba</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>Mechanism of RNA silencing by Hfq-binding small RNAs</article-title>
<source>Curr Opin Microbiol</source>
<year>2007</year>
<volume>10</volume>
<fpage>134</fpage>
<lpage>139</lpage>
<pub-id pub-id-type="pmid">17383928</pub-id>
</citation>
</ref>
<ref id="B79">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dufresne</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Garczarek</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Partensky</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Accelerated evolution associated with genome reduction in a free-living prokaryote</article-title>
<source>Genome Biol</source>
<year>2005</year>
<volume>6</volume>
<fpage>R14</fpage>
<pub-id pub-id-type="pmid">15693943</pub-id>
</citation>
</ref>
<ref id="B80">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pruitt</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Tatusova</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Maglott</surname>
<given-names>DR</given-names>
</name>
</person-group>
<article-title>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>D61</fpage>
<lpage>D65</lpage>
<pub-id pub-id-type="pmid">17130148</pub-id>
</citation>
</ref>
<ref id="B81">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barrick</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>The distributions, mechanisms, and structures of metabolite-binding riboswitches</article-title>
<source>Genome Biol</source>
<year>2007</year>
<volume>8</volume>
<fpage>R239</fpage>
<pub-id pub-id-type="pmid">17997835</pub-id>
</citation>
</ref>
<ref id="B82">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Ram</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Solovyev</surname>
<given-names>VV</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Rokhsar</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Banfield</surname>
<given-names>JF</given-names>
</name>
</person-group>
<article-title>Community structure and metabolism through reconstruction of microbial genomes from the environment</article-title>
<source>Nature</source>
<year>2004</year>
<volume>428</volume>
<fpage>37</fpage>
<lpage>43</lpage>
<pub-id pub-id-type="pmid">14961025</pub-id>
</citation>
</ref>
<ref id="B83">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tringe</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kobayashi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Salamov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>HW</given-names>
</name>
<name>
<surname>Podar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Short</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Mathur</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Detter</surname>
<given-names>JC</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Comparative metagenomics of microbial communities</article-title>
<source>Science</source>
<year>2005</year>
<volume>308</volume>
<fpage>554</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="pmid">15845853</pub-id>
</citation>
</ref>
<ref id="B84">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gill</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>DeBoy</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Eckburg</surname>
<given-names>PB</given-names>
</name>
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Samuel</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
<name>
<surname>Relman</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Fraser-Liggett</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>KE</given-names>
</name>
</person-group>
<article-title>Metagenomic analysis of the human distal gut microbiome</article-title>
<source>Science</source>
<year>2006</year>
<volume>312</volume>
<fpage>1355</fpage>
<lpage>1359</lpage>
<pub-id pub-id-type="pmid">16741115</pub-id>
</citation>
</ref>
<ref id="B85">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kurokawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Itoh</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kuwahara</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Oshima</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Toh</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Toyoda</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Takami</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Morita</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>VK</given-names>
</name>
<name>
<surname>Srivastava</surname>
<given-names>TP</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes</article-title>
<source>DNA Res</source>
<year>2007</year>
<volume>14</volume>
<fpage>169</fpage>
<lpage>181</lpage>
<pub-id pub-id-type="pmid">17916580</pub-id>
</citation>
</ref>
<ref id="B86">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turnbaugh</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Ley</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Mahowald</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Magrini</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>JI</given-names>
</name>
</person-group>
<article-title>An obesity-associated gut microbiome with increased capacity for energy harvest</article-title>
<source>Nature</source>
<year>2006</year>
<volume>444</volume>
<fpage>1027</fpage>
<lpage>1031</lpage>
<pub-id pub-id-type="pmid">17183312</pub-id>
</citation>
</ref>
<ref id="B87">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Woyke</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Teeling</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Huntemann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gloeckner</surname>
<given-names>FO</given-names>
</name>
<name>
<surname>Boffelli</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>IJ</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Mussmann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Amann</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bergin</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ruehland</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Dubilier</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Symbiosis insights through metagenomic analysis of a microbial consortium</article-title>
<source>Nature</source>
<year>2006</year>
<volume>443</volume>
<fpage>950</fpage>
<lpage>955</lpage>
<pub-id pub-id-type="pmid">16980956</pub-id>
</citation>
</ref>
<ref id="B88">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>García Martín</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kunin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Warnecke</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Yeates</surname>
<given-names>C</given-names>
</name>
<name>
<surname>He</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Salamov</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Dalin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Putman</surname>
<given-names>NH</given-names>
</name>
<name>
<surname>Shapiro</surname>
<given-names>HJ</given-names>
</name>
<name>
<surname>Pangilinan</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Rigoutsos</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Blackall</surname>
<given-names>LL</given-names>
</name>
<name>
<surname>McMahon</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities</article-title>
<source>Nat Biotechnol</source>
<year>2006</year>
<volume>24</volume>
<fpage>1263</fpage>
<lpage>1269</lpage>
<pub-id pub-id-type="pmid">16998472</pub-id>
</citation>
</ref>
<ref id="B89">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeLong</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Preston</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Mincer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rich</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Frigaard</surname>
<given-names>NU</given-names>
</name>
<name>
<surname>Martinez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Brito</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Karl</surname>
<given-names>DM</given-names>
</name>
</person-group>
<article-title>Community genomics among stratified microbial assemblages in the ocean's interior</article-title>
<source>Science</source>
<year>2006</year>
<volume>311</volume>
<fpage>496</fpage>
<lpage>503</lpage>
<pub-id pub-id-type="pmid">16439655</pub-id>
</citation>
</ref>
<ref id="B90">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Warnecke</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Luginbuhl</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ghassemian</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Stege</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Cayouette</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Djordjevic</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Aboushadi</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sorek</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tringe</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Podar</surname>
<given-names>M</given-names>
</name>
<name>
<surname>García Martín</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kunin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Dalevi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Madejska</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kirton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Platt</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Salamov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mikhailova</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Matson</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Ottesen</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Hernández</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Murillo</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Acosta</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Rigoutsos</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Tamayo</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Mathur</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Leadbetter</surname>
<given-names>JR</given-names>
</name>
</person-group>
<article-title>Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite</article-title>
<source>Nature</source>
<year>2007</year>
<volume>450</volume>
<fpage>560</fpage>
<lpage>565</lpage>
<pub-id pub-id-type="pmid">18033299</pub-id>
</citation>
</ref>
<ref id="B91">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Klein</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>RSEARCH: finding homologs of single structured RNA sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2003</year>
<volume>4</volume>
<fpage>44</fpage>
<pub-id pub-id-type="pmid">14499004</pub-id>
</citation>
</ref>
<ref id="B92">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weinberg</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Barrick</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Yao</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Roth</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Gore</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>JX</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Block</surname>
<given-names>KF</given-names>
</name>
<name>
<surname>Sudarsan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Neph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ruzzo</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Breaker</surname>
<given-names>RR</given-names>
</name>
</person-group>
<article-title>Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>4809</fpage>
<lpage>4819</lpage>
<pub-id pub-id-type="pmid">17621584</pub-id>
</citation>
</ref>
<ref id="B93">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>RALEE–RNA ALignment editor in Emacs</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>257</fpage>
<lpage>259</lpage>
<pub-id pub-id-type="pmid">15377506</pub-id>
</citation>
</ref>
<ref id="B94">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Steffen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Voss</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rehmsmeier</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Reeder</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Giegerich</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>RNAshapes: an integrated RNA analysis package based on abstract shapes</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>500</fpage>
<lpage>503</lpage>
<pub-id pub-id-type="pmid">16357029</pub-id>
</citation>
</ref>
<ref id="B95">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Markowitz</surname>
<given-names>VM</given-names>
</name>
<name>
<surname>Ivanova</surname>
<given-names>NN</given-names>
</name>
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Palaniappan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chu</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dalevi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>IM</given-names>
</name>
<name>
<surname>Grechkin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Dubchak</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Lykidis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mavromatis</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Hugenholtz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Kyrpides</surname>
<given-names>NC</given-names>
</name>
</person-group>
<article-title>IMG/M: a data management and analysis system for metagenomes</article-title>
<source>Nucleic Acids Res</source>
<year>2008</year>
<volume>36</volume>
<fpage>D534</fpage>
<lpage>538</lpage>
<pub-id pub-id-type="pmid">17932063</pub-id>
</citation>
</ref>
<ref id="B96">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noguchi</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Takagi</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>MetaGene: prokaryotic gene finding from environmental genome shotgun sequences</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>5623</fpage>
<lpage>5630</lpage>
<pub-id pub-id-type="pmid">17028096</pub-id>
</citation>
</ref>
<ref id="B97">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marchler-Bauer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Cherukuri</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>DeWeese-Scott</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Geer</surname>
<given-names>LY</given-names>
</name>
<name>
<surname>Gwadz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>He</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hurwitz</surname>
<given-names>DI</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Ke</surname>
<given-names>Z</given-names>
</name>
<etal></etal>
</person-group>
<article-title>CDD: a Conserved Domain Database for protein classification</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>D192</fpage>
<lpage>196</lpage>
<pub-id pub-id-type="pmid">15608175</pub-id>
</citation>
</ref>
<ref id="B98">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sonnhammer</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Chothia</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Volume changes in protein evolution</article-title>
<source>J Mol Biol</source>
<year>1994</year>
<volume>236</volume>
<fpage>1067</fpage>
<lpage>1078</lpage>
<pub-id pub-id-type="pmid">8120887</pub-id>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000217 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000217 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2704228
   |texte=   Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:19531245" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024