Cyberinfrastructure Exploration Server

Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities

Authors: Jack A. Gilbert; Dawn Field; Ying Huang; Rob Edwards; Weizhong Li; Paul Gilna; Ian Joint

RBID : PMC:2518522

Abstract

Background

Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities.

Methodology/Principal Findings

We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, ∼91% of which were novel.

Conclusions/Significance

This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities – if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.


DOI: 10.1371/journal.pone.0003042
PubMed: 18725995
PubMed Central: 2518522

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities</title>
<author>
<name sortKey="Gilbert, Jack A" sort="Gilbert, Jack A" uniqKey="Gilbert J" first="Jack A." last="Gilbert">Jack A. Gilbert</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Plymouth Marine Laboratory, Prospect Place, Plymouth, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Field, Dawn" sort="Field, Dawn" uniqKey="Field D" first="Dawn" last="Field">Dawn Field</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>NERC Centre for Ecology and Hydrology, CEH Oxford, Oxford, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Huang, Ying" sort="Huang, Ying" uniqKey="Huang Y" first="Ying" last="Huang">Ying Huang</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Edwards, Rob" sort="Edwards, Rob" uniqKey="Edwards R" first="Rob" last="Edwards">Rob Edwards</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Computer Science, San Diego State University, San Diego, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gilna, Paul" sort="Gilna, Paul" uniqKey="Gilna P" first="Paul" last="Gilna">Paul Gilna</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Joint, Ian" sort="Joint, Ian" uniqKey="Joint I" first="Ian" last="Joint">Ian Joint</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Plymouth Marine Laboratory, Prospect Place, Plymouth, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">18725995</idno>
<idno type="pmc">2518522</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2518522</idno>
<idno type="RBID">PMC:2518522</idno>
<idno type="doi">10.1371/journal.pone.0003042</idno>
<date when="2008">2008</date>
<idno type="wicri:Area/Pmc/Corpus">000313</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities</title>
<author>
<name sortKey="Gilbert, Jack A" sort="Gilbert, Jack A" uniqKey="Gilbert J" first="Jack A." last="Gilbert">Jack A. Gilbert</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Plymouth Marine Laboratory, Prospect Place, Plymouth, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Field, Dawn" sort="Field, Dawn" uniqKey="Field D" first="Dawn" last="Field">Dawn Field</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>NERC Centre for Ecology and Hydrology, CEH Oxford, Oxford, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Huang, Ying" sort="Huang, Ying" uniqKey="Huang Y" first="Ying" last="Huang">Ying Huang</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Edwards, Rob" sort="Edwards, Rob" uniqKey="Edwards R" first="Rob" last="Edwards">Rob Edwards</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Department of Computer Science, San Diego State University, San Diego, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gilna, Paul" sort="Gilna, Paul" uniqKey="Gilna P" first="Paul" last="Gilna">Paul Gilna</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Joint, Ian" sort="Joint, Ian" uniqKey="Joint I" first="Ian" last="Joint">Ian Joint</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Plymouth Marine Laboratory, Prospect Place, Plymouth, United Kingdom</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities.</p>
</sec>
<sec>
<title>Methodology/Principal Findings</title>
<p>We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, ∼91% of which were novel.</p>
</sec>
<sec>
<title>Conclusions/Significance</title>
<p>This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities – if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.</p>
</sec>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="EN">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title>PLoS ONE</journal-title>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">18725995</article-id>
<article-id pub-id-type="pmc">2518522</article-id>
<article-id pub-id-type="publisher-id">08-PONE-RA-04404R1</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0003042</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline">
<subject>Computational Biology/Metagenomics</subject>
<subject>Computational Biology/Protein Homology Detection</subject>
<subject>Ecology/Environmental Microbiology</subject>
<subject>Ecology/Global Change Ecology</subject>
<subject>Ecology/Marine and Freshwater Ecology</subject>
<subject>Genetics and Genomics/Bioinformatics</subject>
<subject>Molecular Biology/Bioinformatics</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities</article-title>
<alt-title alt-title-type="running-head">Novel Marine Metatranscripts</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Gilbert</surname>
<given-names>Jack A.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Field</surname>
<given-names>Dawn</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Huang</surname>
<given-names>Ying</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Edwards</surname>
<given-names>Rob</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Weizhong</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gilna</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Joint</surname>
<given-names>Ian</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Plymouth Marine Laboratory, Prospect Place, Plymouth, United Kingdom</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>NERC Centre for Ecology and Hydrology, CEH Oxford, Oxford, United Kingdom</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Department of Computer Science, San Diego State University, San Diego, California, United States of America</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Ahmed</surname>
<given-names>Niyaz</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">Centre for DNA Fingerprinting and Diagnostics, India</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>jagi@pml.ac.uk</email>
</corresp>
<fn fn-type="con">
<p>Conceived and designed the experiments: JAG DF IJ. Performed the experiments: JAG. Analyzed the data: JAG DF YH RAE WL PG. Contributed reagents/materials/analysis tools: JAG IJ. Wrote the paper: JAG DF WL IJ.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2008</year>
</pub-date>
<pub-date pub-type="epub">
<day>22</day>
<month>8</month>
<year>2008</year>
</pub-date>
<volume>3</volume>
<issue>8</issue>
<elocation-id>e3042</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>4</month>
<year>2008</year>
</date>
<date date-type="accepted">
<day>5</day>
<month>8</month>
<year>2008</year>
</date>
</history>
<copyright-statement>Gilbert et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement>
<copyright-year>2008</copyright-year>
<abstract>
<sec>
<title>Background</title>
<p>Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities.</p>
</sec>
<sec>
<title>Methodology/Principal Findings</title>
<p>We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, ∼91% of which were novel.</p>
</sec>
<sec>
<title>Conclusions/Significance</title>
<p>This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities – if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.</p>
</sec>
</abstract>
<counts>
<page-count count="13"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>DNA sequence based metagenomics has become a standard tool for the analysis of natural microbial communities in marine environments
<xref ref-type="bibr" rid="pone.0003042-DeLong1">[1]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Rusch1">[2]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Yooseph1">[3]</xref>
. It involves the sequencing of random community DNA from environmental samples and subsequent determination of taxonomic and protein-encoding gene diversity. However, questions of how natural bacterial assemblages respond to perturbations in environmental conditions are better answered by analysis of community mRNA than of genomic DNA
<xref ref-type="bibr" rid="pone.0003042-Handelsman1">[4]</xref>
.</p>
<p>Historically, metatranscriptomic studies have involved either the use of microarrays
<xref ref-type="bibr" rid="pone.0003042-Parro1">[5]</xref>
or mRNA-derived cDNA clone libraries
<xref ref-type="bibr" rid="pone.0003042-Poretsky1">[6]</xref>
. These approaches have produced significant insight into the metatranscriptome of different communities but have limitations when exploring the diversity of natural communities. Firstly, a microarray only gives information about those sequences for which it was designed and it is usual to screen for gene sequences that are already known (e.g. from a gene-library or metagenomic sources). Secondly, although transcript cloning avoids this problem through the random amplification and sequestering of environmental mRNA fragments, it introduces other biases; e.g. any cloned transcripts that encode toxic products or titrate host DNA-binding factors will skew the relative abundance of sequences.</p>
<p>More recently, the first metatranscriptomic studies using high-throughput sequencing technology (pyrosequencing) have been published
<xref ref-type="bibr" rid="pone.0003042-Leininger1">[7]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Urich1">[8]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
. Two studies of soil communities have sequenced total RNA for the purpose of exploring both community structure, through the analysis of ribosomal RNA (rRNA), and community function, through the study of mRNA
<xref ref-type="bibr" rid="pone.0003042-Leininger1">[7]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Urich1">[8]</xref>
. The first study of a marine microbial community metatranscriptome focused on mRNA analysis and achieved an enrichment of ∼50% mRNA
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
.</p>
<p>Ideally, if the study of mRNA is the prime purpose of a metatranscriptomic study, further enrichment is desirable. Here we present a study of a complex marine microbial metatranscriptome enriched to 99.92% mRNA
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
. Metatranscriptomes were generated from 4 samples taken at two time points in a replicated mesocosm (11,000 liters) study involving an induced phytoplankton bloom
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
. Pyrosequencing technology (GS-FLX pyrosequencer) was used to generate four metatranscriptomes and four corresponding metagenomes with an average sequence length of 215 bp from the middle and end time point in the phytoplankton blooms. This experiment provided an opportunity to obtain replicate samples to explore the use of this approach for detecting changes in the expression of genes over time. The primary focus of the mesocosm experiment was to study the response of marine microbes to the increase in ocean acidification that is resulting from dissolution of anthropogenic CO
<sub>2</sub>
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
and these results will be described in detail elsewhere. The immediate purpose of this study was to 1) demonstrate the feasibility of obtaining highly enriched samples of mRNA (>90%) from these communities, 2) determine whether differences in expression could be identified between time points of this controlled experiment
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
, and finally 3) to determine what proportion of the most highly expressed genes using such a methodology might be novel.</p>
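<p>The headline figures quoted above (a mean read length of 215 bp and 99.92% mRNA) can be re-derived from the combined totals reported in Table 1. The following Python sketch is only a consistency check, not part of the authors' pipeline. It assumes that mean lengths are truncated to whole base pairs and that every read not flagged as rRNA counts as mRNA; the function and variable names are illustrative.</p>

```python
# Combined totals as reported in Table 1 of the article.
TOTAL_BP = {"all_dna": 205_784_939, "all_mrna": 117_377_050}
TOTAL_READS = {"all_dna": 992_224, "all_mrna": 506_353}
RRNA_PERCENT_ALL_MRNA = 0.08  # "% of rRNA genes" for the combined mRNA data


def mean_read_length(total_bp, total_reads):
    """Mean read length in bp, truncated to a whole number (assumed convention)."""
    return total_bp // total_reads


def mrna_enrichment(rrna_percent):
    """Percentage of reads not flagged as rRNA, rounded to two decimals."""
    return round(100.0 - rrna_percent, 2)


combined_bp = sum(TOTAL_BP.values())
combined_reads = sum(TOTAL_READS.values())

print(combined_bp, combined_reads)
print(mean_read_length(combined_bp, combined_reads))
print(mrna_enrichment(RRNA_PERCENT_ALL_MRNA))
```

<p>Under these assumptions the combined totals reproduce the 323,161,989 bp stated in the abstract, and the same division applied to the DNA-only and mRNA-only totals gives the per-category mean lengths.</p>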
</sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<sec id="s2a">
<title>Sampling, cDNA synthesis and sequencing</title>
<p>Water samples were obtained from a replicated mesocosm study (two treatments, each in triplicate) established in coastal waters of a fjord close to Bergen, Norway (60.27°N: 5.22°E). Each mesocosm contained 11,000 L of coastal water and two of the six mesocosms were sampled for this study. To induce the phytoplankton bloom, nitrate and phosphate were added. Water samples were taken at the peak and immediately following the collapse of the phytoplankton bloom from both a high CO
<sub>2</sub>
and control mesocosm.</p>
<p>The nucleic acid extraction methodologies are briefly outlined in Gilbert
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
but are fully described here. To isolate DNA and RNA, 15 L of water from each sample was filtered through a 140 mm diameter, 1.6 µm GF/A filter (Whatman), to reduce eukaryotic cell abundance and maximize the proportion of prokaryotic cells. This filtration took only 3 minutes and the filtrate was applied directly to a 0.22 µm Sterivex filter (Millipore) to allow rapid filtration of samples (<15 minutes per sample) and so limit mRNA degradation. Following filtration, each Sterivex was pumped dry, frozen in liquid nitrogen and stored at −80°C until extraction. Total nucleic acid extraction was performed on each Sterivex using the method of Neufeld
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-Neufeld1">[12]</xref>
. Throughout the protocol, nuclease-free plastic consumables and DEPC-treated water and reagents were used to limit degradation of mRNA. Following extraction, total nucleic acids were eluted in 200 µl of nuclease-free water.</p>
<p>For metagenomic analysis, 100 µl of the total nucleic acid extraction was purified for DNA by treatment with RiboShredder™ RNase (Epicentre) following the manufacturer's instructions. Purified metagenomic DNA was quantified by nano-litre spectrophotometry, diluted with nuclease-free water to 500 ng µl
<sup>−1</sup>
and then stored at −80°C until pyrosequencing.</p>
<p>For metatranscriptomic analysis, 100 µl of the total RNA was purified using the RNA MinElute™ clean-up kit (Qiagen); β-mercaptoethanol was added to the RLT buffer. Approximate RNA concentration was determined by nano-litre spectrophotometry and checked for rRNA integrity using an Agilent bioanalyser (RNA nano6000 chip). Average RNA concentration was 2.4 mg ml
<sup>−1</sup>
. The integrity of rRNA was demonstrated by highly defined, discrete rRNA peaks, with the 23S rRNA peak being 1.5–2 times higher than the 16S rRNA peak. Fully intact rRNA is essential for subtractive hybridization because degraded rRNA molecules will not be fully subtracted from the total RNA pool.</p>
<p>DNA contamination was removed from total RNA samples by treating with the Turbo DNA-free enzyme (Ambion). 75 µg of purified total RNA was applied to the subtractive hybridization method (Microbe Express Kit, Ambion) to remove rRNA from the mRNA. Purified mRNA was eluted in 25 µl of TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA) and was further purified with the MEGAclear™ kit (Ambion) to remove small RNAs and small contaminants. Purified mRNA was eluted in 10 µl of nuclease-free water and stored at −80°C until further analysis. 0.5 µl of the purified mRNA was then checked using the Agilent bioanalyser for removal of genomic DNA and ribosomal RNAs. The mRNA concentration was estimated using the Agilent bioanalyser software to average 450 ng µl
<sup>−1</sup>
.</p>
<p>mRNA was estimated to be approximately 8% of total RNA isolated. 9.5 µl of the purified mRNA was then applied to a reverse transcription reaction using the SuperScript® III enzyme (Invitrogen) with random hexamer primers (Promega). The cDNA was treated with RiboShredder™ RNase Blend (Epicentre) to remove trace RNA contaminants. To improve the yield of cDNA, 1 µl of each sample was subjected to random amplification using the GenomiPHI™ V2 method (GE Healthcare) yielding approximately 4 µg of cDNA. GenomiPHI technology produces branched DNA molecules that are recalcitrant to the pyrosequencing methodology. Therefore amplified samples were treated with S1 nuclease using the method of Zhang
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-Zhang1">[13]</xref>
. DNA and cDNA were nebulized to produce an average size of 500 bp, then cleaned with AMPure beads (Agencourt) and sequenced using the 454 Corporation's GS-FLX instrument at the NERC-funded Advanced Genomics Facility at the University of Liverpool (
<ext-link ext-link-type="uri" xlink:href="http://www.liv.ac.uk/agf/">http://www.liv.ac.uk/agf/</ext-link>
). Extraneous sequences resulting from >1 template molecule per picotitre well were removed from the datasets (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
) as they include exact duplicates and failed sequences that are replete with uncharacterized nucleotides. Metatranscriptomic and metagenomic data sets were deposited in NCBI's Gene Expression Omnibus (GEO,
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/">http://www.ncbi.nlm.nih.gov/geo/</ext-link>
) and are accessible through GEO Series accession number GSE10119. All data are also deposited with the Short Read Archive (NCBI) under accession number SRA000266. These datasets are also available with richer annotations in ISA-TAB format
<xref ref-type="bibr" rid="pone.0003042-Sansone1">[14]</xref>
compliant with the “Minimum Information about a (Meta) Genome Sequence” (MIGS) specification
<xref ref-type="bibr" rid="pone.0003042-Field1">[15]</xref>
.</p>
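<p>The clean-up step described above, which removes exact duplicates and failed reads dominated by uncharacterized nucleotides, can be illustrated with a short Python sketch. This is not the authors' procedure or software. The function name, the input layout (pairs of read identifier and sequence) and the cutoff of 10% ambiguous bases are hypothetical choices made only for illustration.</p>

```python
def filter_reads(reads, max_n_fraction=0.1):
    """Drop exact duplicate sequences and reads dominated by ambiguous bases.

    reads: iterable of (read_id, sequence) pairs.
    max_n_fraction: hypothetical cutoff, not a value taken from the article.
    """
    seen = set()
    kept = []
    for read_id, seq in reads:
        seq = seq.upper()
        if not seq:
            continue  # skip empty reads
        if seq.count("N") / len(seq) > max_n_fraction:
            continue  # failed read: too many uncharacterized nucleotides
        if seq in seen:
            continue  # exact duplicate of a read already kept
        seen.add(seq)
        kept.append((read_id, seq))
    return kept


example = [
    ("r1", "ACGTACGT"),
    ("r2", "ACGTACGT"),  # exact duplicate of r1
    ("r3", "ACNNNNNT"),  # mostly ambiguous bases
    ("r4", "TTGACCGA"),
]
print(filter_reads(example))
```

<p>In this toy input only r1 and r4 survive. The first occurrence of each sequence is kept, so the result depends on input order for the identifier but not for the set of sequences retained.</p>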
<table-wrap id="pone-0003042-t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t001</object-id>
<label>Table 1</label>
<caption>
<title>Comparison of DNA and mRNA from samples from mid- and post-phytoplankton bloom.</title>
</caption>
<graphic id="pone-0003042-t001-1" xlink:href="pone.0003042.t001"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t001-1">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="4" align="left" rowspan="1">Mid-Bloom</td>
<td colspan="4" align="left" rowspan="1">Post-Bloom</td>
<td colspan="3" align="left" rowspan="1">Combined data</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">DNA-High CO
<sub>2</sub>
</td>
<td align="left" rowspan="1" colspan="1">mRNA-High CO
<sub>2</sub>
</td>
<td align="left" rowspan="1" colspan="1">DNA-Present Day</td>
<td align="left" rowspan="1" colspan="1">mRNA-Present Day</td>
<td align="left" rowspan="1" colspan="1">DNA-High CO
<sub>2</sub>
</td>
<td align="left" rowspan="1" colspan="1">mRNA-High CO
<sub>2</sub>
</td>
<td align="left" rowspan="1" colspan="1">DNA-Present Day</td>
<td align="left" rowspan="1" colspan="1">mRNA-Present Day</td>
<td align="left" rowspan="1" colspan="1">All DNA</td>
<td align="left" rowspan="1" colspan="1">All mRNA</td>
<td align="left" rowspan="1" colspan="1">All samples</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Total size (bp)</td>
<td align="left" rowspan="1" colspan="1">47,289,282</td>
<td align="left" rowspan="1" colspan="1">
<bold>30,567,377</bold>
</td>
<td align="left" rowspan="1" colspan="1">30,991,689</td>
<td align="left" rowspan="1" colspan="1">
<bold>38,021,523</bold>
</td>
<td align="left" rowspan="1" colspan="1">59,316,369</td>
<td align="left" rowspan="1" colspan="1">
<bold>21,805,955</bold>
</td>
<td align="left" rowspan="1" colspan="1">68,187,679</td>
<td align="left" rowspan="1" colspan="1">
<bold>26,982,195</bold>
</td>
<td align="left" rowspan="1" colspan="1">205,784,939</td>
<td align="left" rowspan="1" colspan="1">
<bold>117,377,050</bold>
</td>
<td align="left" rowspan="1" colspan="1">323,161,989</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Total No. of reads</td>
<td align="left" rowspan="1" colspan="1">209,073</td>
<td align="left" rowspan="1" colspan="1">
<bold>131,089</bold>
</td>
<td align="left" rowspan="1" colspan="1">134,915</td>
<td align="left" rowspan="1" colspan="1">
<bold>162,871</bold>
</td>
<td align="left" rowspan="1" colspan="1">344,216</td>
<td align="left" rowspan="1" colspan="1">
<bold>96,201</bold>
</td>
<td align="left" rowspan="1" colspan="1">304,020</td>
<td align="left" rowspan="1" colspan="1">
<bold>116,192</bold>
</td>
<td align="left" rowspan="1" colspan="1">992,224</td>
<td align="left" rowspan="1" colspan="1">
<bold>506,353</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,498,577</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Average length (bp)</td>
<td align="left" rowspan="1" colspan="1">226</td>
<td align="left" rowspan="1" colspan="1">
<bold>233</bold>
</td>
<td align="left" rowspan="1" colspan="1">229</td>
<td align="left" rowspan="1" colspan="1">
<bold>233</bold>
</td>
<td align="left" rowspan="1" colspan="1">172</td>
<td align="left" rowspan="1" colspan="1">
<bold>226</bold>
</td>
<td align="left" rowspan="1" colspan="1">224</td>
<td align="left" rowspan="1" colspan="1">
<bold>232</bold>
</td>
<td align="left" rowspan="1" colspan="1">207</td>
<td align="left" rowspan="1" colspan="1">
<bold>231</bold>
</td>
<td align="left" rowspan="1" colspan="1">215</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">% of rRNA genes
<xref ref-type="table-fn" rid="nt102">a</xref>
</td>
<td align="left" rowspan="1" colspan="1">0.33</td>
<td align="left" rowspan="1" colspan="1">
<bold>0.16</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.31</td>
<td align="left" rowspan="1" colspan="1">
<bold>0.1</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">
<bold>0.03</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.24</td>
<td align="left" rowspan="1" colspan="1">
<bold>0.03</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.25</td>
<td align="left" rowspan="1" colspan="1">
<bold>0.08</bold>
</td>
<td align="left" rowspan="1" colspan="1">0.16</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Absolute number of unique nucleotide sequence clusters
<xref ref-type="table-fn" rid="nt103">b</xref>
</td>
<td align="left" rowspan="1" colspan="1">170,580</td>
<td align="left" rowspan="1" colspan="1">
<bold>65,717</bold>
</td>
<td align="left" rowspan="1" colspan="1">112,459</td>
<td align="left" rowspan="1" colspan="1">
<bold>67,283</bold>
</td>
<td align="left" rowspan="1" colspan="1">257,375</td>
<td align="left" rowspan="1" colspan="1">
<bold>9,349</bold>
</td>
<td align="left" rowspan="1" colspan="1">232,729</td>
<td align="left" rowspan="1" colspan="1">
<bold>10,703</bold>
</td>
<td align="left" rowspan="1" colspan="1">630,159</td>
<td align="left" rowspan="1" colspan="1">
<bold>133,447</bold>
</td>
<td align="left" rowspan="1" colspan="1">723,050</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Normalized number of clusters
<xref ref-type="table-fn" rid="nt104">c</xref>
</td>
<td align="left" rowspan="1" colspan="1">86,791</td>
<td align="left" rowspan="1" colspan="1">
<bold>50,320</bold>
</td>
<td align="left" rowspan="1" colspan="1">84,096</td>
<td align="left" rowspan="1" colspan="1">
<bold>43,213</bold>
</td>
<td align="left" rowspan="1" colspan="1">86,996</td>
<td align="left" rowspan="1" colspan="1">
<bold>9,349</bold>
</td>
<td align="left" rowspan="1" colspan="1">87,112</td>
<td align="left" rowspan="1" colspan="1">
<bold>9,281</bold>
</td>
<td align="left" rowspan="1" colspan="1">n/a</td>
<td align="left" rowspan="1" colspan="1">
<bold>n/a</bold>
</td>
<td align="left" rowspan="1" colspan="1">n/a</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Total number of reads in top cluster</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">
<bold>988</bold>
</td>
<td align="left" rowspan="1" colspan="1">23</td>
<td align="left" rowspan="1" colspan="1">
<bold>932</bold>
</td>
<td align="left" rowspan="1" colspan="1">19</td>
<td align="left" rowspan="1" colspan="1">
<bold>2,019</bold>
</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">
<bold>2,421</bold>
</td>
<td align="left" rowspan="1" colspan="1">36</td>
<td align="left" rowspan="1" colspan="1">
<bold>4,860</bold>
</td>
<td align="left" rowspan="1" colspan="1">4,866</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Clustering: 1 sequence</td>
<td align="left" rowspan="1" colspan="1">141,340</td>
<td align="left" rowspan="1" colspan="1">
<bold>56,126</bold>
</td>
<td align="left" rowspan="1" colspan="1">94,386</td>
<td align="left" rowspan="1" colspan="1">
<bold>55,691</bold>
</td>
<td align="left" rowspan="1" colspan="1">200,569</td>
<td align="left" rowspan="1" colspan="1">
<bold>7,397</bold>
</td>
<td align="left" rowspan="1" colspan="1">183,028</td>
<td align="left" rowspan="1" colspan="1">
<bold>8,106</bold>
</td>
<td align="left" rowspan="1" colspan="1">437,149</td>
<td align="left" rowspan="1" colspan="1">
<bold>107,593</bold>
</td>
<td align="left" rowspan="1" colspan="1">494,604</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2–9 sequences</td>
<td align="left" rowspan="1" colspan="1">29,232</td>
<td align="left" rowspan="1" colspan="1">
<bold>8,824</bold>
</td>
<td align="left" rowspan="1" colspan="1">18,072</td>
<td align="left" rowspan="1" colspan="1">
<bold>10,367</bold>
</td>
<td align="left" rowspan="1" colspan="1">56,729</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,182</bold>
</td>
<td align="left" rowspan="1" colspan="1">49,681</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,677</bold>
</td>
<td align="left" rowspan="1" colspan="1">191,822</td>
<td align="left" rowspan="1" colspan="1">
<bold>23,224</bold>
</td>
<td align="left" rowspan="1" colspan="1">224,198</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">10–99 sequences</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">
<bold>652</bold>
</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,032</bold>
</td>
<td align="left" rowspan="1" colspan="1">77</td>
<td align="left" rowspan="1" colspan="1">
<bold>552</bold>
</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">
<bold>673</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,188</td>
<td align="left" rowspan="1" colspan="1">
<bold>2,011</bold>
</td>
<td align="left" rowspan="1" colspan="1">3,639</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">100+ sequences</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>115</bold>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>193</bold>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>218</bold>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>247</bold>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>619</bold>
</td>
<td align="left" rowspan="1" colspan="1">609</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SEED Subsystem hits</td>
<td align="left" rowspan="1" colspan="1">130,567</td>
<td align="left" rowspan="1" colspan="1">
<bold>75,884</bold>
</td>
<td align="left" rowspan="1" colspan="1">120,141</td>
<td align="left" rowspan="1" colspan="1">
<bold>83,076</bold>
</td>
<td align="left" rowspan="1" colspan="1">161,789</td>
<td align="left" rowspan="1" colspan="1">
<bold>16,545</bold>
</td>
<td align="left" rowspan="1" colspan="1">175,477</td>
<td align="left" rowspan="1" colspan="1">
<bold>18,315</bold>
</td>
<td align="left" rowspan="1" colspan="1">n/a</td>
<td align="left" rowspan="1" colspan="1">
<bold>n/a</bold>
</td>
<td align="left" rowspan="1" colspan="1">n/a</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Total pORFs
<xref ref-type="table-fn" rid="nt105">d</xref>
</td>
<td align="left" rowspan="1" colspan="1">419,565</td>
<td align="left" rowspan="1" colspan="1">
<bold>284,665</bold>
</td>
<td align="left" rowspan="1" colspan="1">279,061</td>
<td align="left" rowspan="1" colspan="1">
<bold>345,502</bold>
</td>
<td align="left" rowspan="1" colspan="1">532,373</td>
<td align="left" rowspan="1" colspan="1">
<bold>237,187</bold>
</td>
<td align="left" rowspan="1" colspan="1">637,896</td>
<td align="left" rowspan="1" colspan="1">
<bold>289,951</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,868,895</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,157,305</bold>
</td>
<td align="left" rowspan="1" colspan="1">3,026,200</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Unique pORFs at 95%
<xref ref-type="table-fn" rid="nt106">e</xref>
</td>
<td align="left" rowspan="1" colspan="1">358,705</td>
<td align="left" rowspan="1" colspan="1">
<bold>140,763</bold>
</td>
<td align="left" rowspan="1" colspan="1">242,317</td>
<td align="left" rowspan="1" colspan="1">
<bold>147,697</bold>
</td>
<td align="left" rowspan="1" colspan="1">435,876</td>
<td align="left" rowspan="1" colspan="1">
<bold>31,104</bold>
</td>
<td align="left" rowspan="1" colspan="1">515,266</td>
<td align="left" rowspan="1" colspan="1">
<bold>36,113</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,340,241</td>
<td align="left" rowspan="1" colspan="1">
<bold>299,898</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,571,348</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Protein clusters
<xref ref-type="table-fn" rid="nt107">f</xref>
</td>
<td align="left" rowspan="1" colspan="1">321,839</td>
<td align="left" rowspan="1" colspan="1">
<bold>120,220</bold>
</td>
<td align="left" rowspan="1" colspan="1">223,888</td>
<td align="left" rowspan="1" colspan="1">
<bold>121,547</bold>
</td>
<td align="left" rowspan="1" colspan="1">382,762</td>
<td align="left" rowspan="1" colspan="1">
<bold>16,641</bold>
</td>
<td align="left" rowspan="1" colspan="1">452,754</td>
<td align="left" rowspan="1" colspan="1">
<bold>19,259</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,083,644</td>
<td align="left" rowspan="1" colspan="1">
<bold>238,655</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,228,601</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Protein clusters
<xref ref-type="table-fn" rid="nt108">g</xref>
with similarity to:</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">
<bold>296</bold>
</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">
<bold>468</bold>
</td>
<td align="left" rowspan="1" colspan="1">29</td>
<td align="left" rowspan="1" colspan="1">
<bold>516</bold>
</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="left" rowspan="1" colspan="1">
<bold>589</bold>
</td>
<td align="left" rowspan="1" colspan="1">695</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,131</bold>
</td>
<td align="left" rowspan="1" colspan="1">2,029</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PFAM
<xref ref-type="table-fn" rid="nt109">h</xref>
</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">
<bold>19</bold>
</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">
<bold>23</bold>
</td>
<td align="left" rowspan="1" colspan="1">14</td>
<td align="left" rowspan="1" colspan="1">
<bold>56</bold>
</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">
<bold>63</bold>
</td>
<td align="left" rowspan="1" colspan="1">379</td>
<td align="left" rowspan="1" colspan="1">
<bold>76</bold>
</td>
<td align="left" rowspan="1" colspan="1">571</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TIGRfam
<xref ref-type="table-fn" rid="nt110">i</xref>
</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">366</td>
<td align="left" rowspan="1" colspan="1">
<bold>1</bold>
</td>
<td align="left" rowspan="1" colspan="1">476</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">COG
<xref ref-type="table-fn" rid="nt111">j</xref>
</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">
<bold>2</bold>
</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">18</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">17</td>
<td align="left" rowspan="1" colspan="1">
<bold>0</bold>
</td>
<td align="left" rowspan="1" colspan="1">431</td>
<td align="left" rowspan="1" colspan="1">
<bold>6</bold>
</td>
<td align="left" rowspan="1" colspan="1">572</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of novel Protein clusters
<xref ref-type="table-fn" rid="nt112">k</xref>
</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">
<bold>276</bold>
</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">
<bold>444</bold>
</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">
<bold>447</bold>
</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">
<bold>509</bold>
</td>
<td align="left" rowspan="1" colspan="1">202</td>
<td align="left" rowspan="1" colspan="1">
<bold>1,026</bold>
</td>
<td align="left" rowspan="1" colspan="1">1,287</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt101">
<p>Size, clustering and annotation data were generated by CAMERA. rRNA and subsystem hits were generated by SEED.</p>
</fn>
<fn id="nt102">
<label>a</label>
<p>Analysis of sequences against the Ribosomal Database Project II (RDP-II) and the European Ribosomal large subunit (LSU) dataset.</p>
</fn>
<fn id="nt103">
<label>b</label>
<p>Based on clustering at 95% identity over 80% length of a sequence and over 120 bp.</p>
</fn>
<fn id="nt104">
<label>c</label>
<p>For direct comparison of samples, individual rarefaction analysis, by R (
<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org/">http://www.r-project.org/</ext-link>
) and the vegan package (
<ext-link ext-link-type="uri" xlink:href="http://cc.oulu.fi/jarioksa/softhelp/vegan.html">http://cc.oulu.fi/jarioksa/softhelp/vegan.html</ext-link>
), was used to estimate the number of clusters in each sample after adjusting sample size of each dataset to be equivalent to the number of reads in the smallest dataset (mRNA – High CO
<sub>2</sub>
).</p>
</fn>
<fn id="nt105">
<label>d</label>
<p>Partial open reading frames (pORFs) were called from a six-reading-frame translation of all reads using translation table 11. Each pORF starts at the beginning of a read or at the first ATG after the previous stop codon, ends at the end of the read or at a stop codon, and contains at least 30 contiguous amino acids.</p>
</fn>
<fn id="nt106">
<label>e</label>
<p>Total pORFs clustered at 95% identity over 80% of sequence length.</p>
</fn>
<fn id="nt107">
<label>f</label>
<p>Clusters were identified by re-clustering the representative sequence of each cluster from the 95% step at 60% identity over 80% of sequence length.</p>
</fn>
<fn id="nt108">
<label>g</label>
<p>The dominant clusters (≥10 non-redundant sequences) with the exclusion of spurious pORFs.</p>
</fn>
<fn id="nt109">
<label>h</label>
<p>Protein families database.</p>
</fn>
<fn id="nt110">
<label>i</label>
<p>The Institute for Genomic Research protein database.</p>
</fn>
<fn id="nt111">
<label>j</label>
<p>NCBI clusters of orthologous groups database.</p>
</fn>
<fn id="nt112">
<label>k</label>
<p>With ≥10 non-redundant clustered sequences excluding spurious ORFs.</p>
</fn>
<fn id="nt113">
<p>n/a – not analyzed.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s2b">
<title>Clustering of DNA and mRNA and prediction of partial ORFs (pORFs)</title>
<p>Clustering analysis was performed on the raw reads and translated peptide sequences (see below) using the CD-HIT package
<xref ref-type="bibr" rid="pone.0003042-Li1">[16]</xref>
. The reads from all eight samples were clustered together with the CD-HIT-EST program. Sequences were clustered if the identity was ≥95% (on either the ++ or +− strand), the alignment was ≥40 bp long, and the alignment covered ≥80% of the length of the shorter sequence. The clustering results show the internal structure of the combined dataset, including the number of non-redundant sequences, the distribution of cluster sizes and the number of singletons. The same analysis was applied to each individual sample by counting only the sequences from that sample. Results are shown in rows 5–11 of
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
.</p>
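<p>As an illustration only, the pairwise acceptance criterion stated above can be written as a short Python check. This is a minimal sketch and not CD-HIT-EST itself: the function name and its arguments are hypothetical, and the real program also decides which pairs to align and which sequence represents each cluster.</p>
<preformat>
```python
def meets_clustering_threshold(identity, aln_len, len_a, len_b,
                               min_identity=0.95, min_aln_bp=40,
                               min_short_cov=0.80):
    """Return True if a pairwise nucleotide alignment passes all three
    thresholds quoted in the text (hypothetical helper, not part of CD-HIT).

    identity : fraction of identical positions in the alignment (0 to 1)
    aln_len  : alignment length in bp
    len_a, len_b : lengths of the two reads in bp
    """
    shorter = min(len_a, len_b)
    return (identity >= min_identity                # at least 95% identity
            and aln_len >= min_aln_bp               # at least 40 bp aligned
            and aln_len >= min_short_cov * shorter)  # at least 80% of shorter read


if __name__ == "__main__":
    # Two reads of 230 and 250 bp aligned over 200 bp at 96% identity.
    print(meets_clustering_threshold(0.96, 200, 230, 250))
```
</preformat>
<p>Under these assumptions, a pair aligned over only 100 bp of a 215 bp read would be rejected on the coverage criterion even at high identity.</p>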
<p>ORFs (including pORFs) were then predicted. Since genes cannot be reliably predicted from such short reads, we applied the methodology used in the Global Ocean Survey (GOS) study
<xref ref-type="bibr" rid="pone.0003042-Rusch1">[2]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Yooseph1">[3]</xref>
, calling pORFs from all six reading frames. As the current study had shorter reads overall than the GOS study (average of 215 bp instead of 822 bp), pORFs had to contain at least 30 amino acids. In total, 3,026,200 pORFs were detected. Six-reading-frame translation can produce many non-coding (shadow), or spurious, pORFs. However, this is less likely with short sequence data because translations from non-coding frames are usually too short (owing to the random occurrence of stop codons) to qualify as pORFs under our selected cut-off threshold.</p>
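<p>The pORF definition used here (six reading frames, translation table 11, a start at the beginning of the read or at the first ATG after a stop codon, an end at a stop codon or at the end of the read, and a minimum of 30 amino acids) can be sketched in Python as follows. This is an assumed re-implementation for illustration, not the code used in the study; the function names are hypothetical, and the amino acid assignments of translation table 11 are taken to be those of the standard genetic code.</p>
<preformat>
```python
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def call_porfs(read, min_aa=30):
    """Return the peptide sequences of all pORFs found in the six frames."""
    read = read.upper()
    strands = (read, read.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    porfs = []
    for strand in strands:
        for offset in range(3):
            segments = translate(strand[offset:]).split("*")
            for index, segment in enumerate(segments):
                if index > 0:
                    # After a stop codon, a pORF starts at the first ATG (M).
                    start = segment.find("M")
                    segment = segment[start:] if start >= 0 else ""
                if len(segment) >= min_aa:
                    porfs.append(segment)
    return porfs


if __name__ == "__main__":
    example = "TAA" + "GCT" * 5 + "ATG" + "GCT" * 35 + "TAA"
    for peptide in call_porfs(example):
        print(len(peptide), peptide)
```
</preformat>
<p>In this sketch the first segment of each frame is kept from the start of the read, whereas segments that follow a stop codon are trimmed to their first methionine, which mirrors the definition given with Table 1.</p>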
<p>The protein-coding density, according to the most recent NCBI RefSeq database for microbial organisms (
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release27.01062008.stats.txt">ftp://ftp.ncbi.nih.gov/refseq/release/release-statistics/RefSeq-release27.01062008.stats.txt</ext-link>
), is about 0.25 million amino acids per 1 million base pairs (bp). This study obtained 162 million amino acids from 323 million bp of sequence (about 0.50 million amino acids per 1 million bp, twice the reference density), which suggests that only about 50% of these pORFs were spurious. Clustering of pORFs can further help to exclude spurious pORFs, which are more likely to remain singletons.</p>
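<p>The arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch below assumes that all sequenced DNA codes for protein at the RefSeq reference density and that every amino acid above that expectation comes from a spurious pORF; both are simplifications, and the function name is hypothetical.</p>
<preformat>
```python
def spurious_fraction(observed_aa, sequenced_bp, expected_aa_per_bp=0.25):
    """Estimate the fraction of translated amino acids that exceed the
    reference coding density (hypothetical helper for illustration)."""
    expected_aa = expected_aa_per_bp * sequenced_bp
    excess_aa = max(observed_aa - expected_aa, 0.0)
    return excess_aa / observed_aa


if __name__ == "__main__":
    # Values quoted in the text: 162 million amino acids from 323 million bp.
    fraction = spurious_fraction(162e6, 323e6)
    print(f"Estimated spurious fraction: {fraction:.1%}")
```
</preformat>
<p>With the quoted totals, the expected yield is 80.75 million amino acids, so roughly half of the 162 million observed are in excess of the reference density.</p>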
<p>The pORFs were clustered with two-step CD-HIT runs. In the first step, pORFs were clustered at 95% identity over 80% of sequence length in order to identify non-redundant sequences. The non-redundant sequences were then clustered at 60% identity, over 80% of sequence length, to find clusters of homologous pORFs or protein families (see rows
<sup>d, e, f</sup>
in
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
). Here, we only use the non-redundant sequences to count the size of each cluster so that the large clusters reported in row
<sup>g</sup>
in
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
contain diverse sequences. The same clustering techniques were also applied to the data from the metatranscriptomic study of Frias-Lopez and colleagues
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
(
<xref ref-type="supplementary-material" rid="pone.0003042.s001">
<bold>Table S1</bold>
</xref>
).</p>
</sec>
<sec id="s2c">
<title>Dividing pORFs into ‘predicted’, ‘spurious’ and ‘putative’</title>
<p>The clusters of pORFs were annotated by comparison to the PFAM database (
<ext-link ext-link-type="uri" xlink:href="http://PFAM.sanger.ac.uk/">http://PFAM.sanger.ac.uk/</ext-link>
) using HMMER, the TIGRfam database (
<ext-link ext-link-type="uri" xlink:href="http://www.tigr.org/TIGRFAMs/">http://www.tigr.org/TIGRFAMs/</ext-link>
) using HMMER, and the COG database (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/COG/">http://www.ncbi.nlm.nih.gov/COG/</ext-link>
) using RPS-BLAST (reverse position-specific BLAST). All searches used an expect-value cut-off of 0.001. HMMER searches were run in fragment mode, and each hit also had to pass the trusted cut-off (TC) score. The pORFs with significant matches to these reference databases were confirmed as genes, while the pORFs that overlapped with them in a different reading frame (the shadow pORFs) were deemed spurious. From this final analysis of the 3,026,200 pORFs, 494,253 could be confirmed as predicted proteins, 459,150 were excluded as spurious pORFs, and the remainder (2,072,797) were marked as “putative proteins”. The combined predicted and putative proteins were used for subsequent analysis.</p>
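<p>The three-way division described above can be sketched as a simple labelling rule. The Python below is illustrative only and makes several assumptions: each pORF is a dictionary with hypothetical keys (read identifier, half-open nucleotide coordinates, reading frame and a flag for a significant database hit), and overlap is tested base by base on the same read.</p>
<preformat>
```python
def overlaps(a, b):
    """True if two pORFs lie on the same read and share at least one base
    (coordinates are half-open: start inclusive, end exclusive)."""
    return (a["read"] == b["read"]
            and min(a["end"], b["end"]) > max(a["start"], b["start"]))


def classify_porfs(porfs):
    """Label each pORF as 'predicted', 'spurious' or 'putative'."""
    confirmed = [p for p in porfs if p["has_hit"]]
    labels = []
    for p in porfs:
        if p["has_hit"]:
            labels.append("predicted")   # significant PFAM, TIGRfam or COG match
        elif any(p["frame"] != c["frame"] and overlaps(p, c) for c in confirmed):
            labels.append("spurious")    # shadow of a confirmed pORF
        else:
            labels.append("putative")    # no match and no confirmed overlap
    return labels


if __name__ == "__main__":
    porfs = [
        {"read": "r1", "start": 0, "end": 120, "frame": 1, "has_hit": True},
        {"read": "r1", "start": 2, "end": 110, "frame": 3, "has_hit": False},
        {"read": "r2", "start": 0, "end": 99, "frame": 1, "has_hit": False},
    ]
    print(classify_porfs(porfs))
    # The three reported categories account for every pORF detected.
    print(494253 + 459150 + 2072797 == 3026200)
```
</preformat>
<p>The comparison of every pORF against every confirmed pORF is quadratic and is used here only for clarity; grouping pORFs by read first would be the obvious refinement for a dataset of this size.</p>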
</sec>
<sec id="s2d">
<title>PCR detection of dominant orphan gene clusters in environmental DNA and mRNA</title>
<p>To validate the presence of highly expressed orphaned sequence clusters in the environment we randomly selected 27 of the most highly expressed nucleotide clusters. It was necessary to establish the presence of these sequences in both original DNA samples and cDNA samples to show they were not artefacts of cDNA amplification by GenomiPHI. To further cluster the 609 dominant nucleotide clusters (>100 sequences per cluster) for the purpose of designing PCR primers, all clusters were re-clustered at 95% identity over at least 40 base pairs. This allowed sequences with small 5′ or 3′ overlaps to be clustered together and increased the probability that the sequences assayed represented different transcripts. This reduced the number of dominant clusters from 609 to 85 (
<xref ref-type="supplementary-material" rid="pone.0003042.s002">
<bold>Table S2</bold>
</xref>
) and resulted in a significant increase in the number of clusters with more than 5,000 reads each. The maximum number of sequences in the largest cluster was 31,642 and 15 clusters now contained more than 5,000 reads. This provided a smaller pool of sequences for analysis and reduced the likelihood of amplifying similar sequences.</p>
<p>Primers were designed to screen 27 of these potential transcripts using the batch Primer3 online interface (
<ext-link ext-link-type="uri" xlink:href="http://probes.pw.usda.gov/cgi-bin/batchprimer3/batchprimer3.cgi">http://probes.pw.usda.gov/cgi-bin/batchprimer3/batchprimer3.cgi</ext-link>
), with the following conservative rules. First, we targeted the ‘representative’ sequence of each cluster (as opposed to the consensus sequence) to maximize the length of the query DNA sequence and to avoid the use of chimeric sequences that could have resulted from false assembly of the original 609 clusters. Second, we iteratively explored a range of parameters to find a rule that allowed us to create primers automatically (no manual inspection required) for all 85 loci using a single set of optimality criteria that was as stringent as possible. In the end, we took the default parameters of the interface and optimized the following parameters: annealing temperature (55°C), overall length of product (100 bp), primer size (20 bp), G+C content (50%) and minimum “maximum self-complementarity”. The exact optimality criteria used for the selection of each batch of primers are available from the authors.</p>
<p>Individual transcript sequences were amplified by PCR from the environmental DNA used for the metagenomic analyses (co-extracted with the mRNA used for the metatranscriptomic approach), purified mRNA prior to RT-PCR and cDNA prior to GenomiPHI amplification. Each of the 54 primers (
<ext-link ext-link-type="uri" xlink:href="http://nebc.nox.ac.uk/nebcfs/public/Joint/metatranscript_primers.xls">http://nebc.nox.ac.uk/nebcfs/public/Joint/metatranscript_primers.xls</ext-link>
) was diluted to a working concentration of 10 pmol µl
<sup>−1</sup>
. Approximately 10 ng of environmental DNA, cDNA or mRNA was added to a 25 µl PCR reaction with final concentrations of 1×PCR buffer (Promega), 2.5 mM MgCl
<sub>2</sub>
, 0.2 mM deoxynucleoside triphosphates (Invitrogen), 0.4 pmol of each primer, and 1 unit of
<italic>Taq</italic>
DNA polymerase (Promega). Negative controls used were
<italic>Escherichia coli</italic>
K12 genomic DNA and sterile water. Reactions were cycled with a PTC 1000 thermal cycler (MJ Research) using the following conditions: 94°C for 2 minutes; 30 cycles of 94°C for 1 minute, 55°C for 1 minute and 72°C for 2 minutes; and a final extension at 72°C for 10 minutes. Products were visualised by agarose gel electrophoresis (1.8%).</p>
</sec>
</sec>
<sec id="s3">
<title>Results and Discussion</title>
<p>We demonstrate the feasibility of conducting metatranscriptomic studies on RNA samples highly enriched for mRNA from natural microbial communities. This is the first time such a high level of enrichment has been achieved in a metatranscriptomic study (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
). Eight samples, four DNA and four mRNA, were processed, producing a total of 323,161,989 bp (117.4 Mbp of mRNA and 205.8 Mbp of DNA). This exceeds previously published metatranscriptomic studies because of the inclusion of replicated samples. By comparison, this is equivalent to 5.1% of the total bp sequenced, and 19% of the number of reads, of the recent Global Ocean Survey (GOS) sequencing effort
<xref ref-type="bibr" rid="pone.0003042-Rusch1">[2]</xref>
. Here we present an analysis of these data that confirms the high level of enrichment for mRNA and the high levels of assembly of mRNA sequences compared with the DNA of the metagenomes. We also speculate on the potential coverage of the natural metatranscriptome sampled, discuss potential biases introduced by this methodology, and provide evidence against the large-scale generation of mosaics and artefacts by GenomiPHI amplification. We then discuss the proportion of these mRNAs that match protein databases, describe the most abundant ‘known’ clusters, and compare the metatranscriptome with the metagenome and with the first published study of a marine metatranscriptome
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
.</p>
<table-wrap id="pone-0003042-t002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t002</object-id>
<label>Table 2</label>
<caption>
<title>Comparison of the methods described in the current manuscript with the three most recent methods for analysing microbial metatranscriptomes.</title>
</caption>
<graphic id="pone-0003042-t002-2" xlink:href="pone.0003042.t002"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t002-2">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Leininger et al.
<xref ref-type="bibr" rid="pone.0003042-Leininger1">[7]</xref>
; Urich et al.
<xref ref-type="bibr" rid="pone.0003042-Urich1">[8]</xref>
</td>
<td align="left" rowspan="1" colspan="1">Frias-Lopez et al.
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
</td>
<td align="left" rowspan="1" colspan="1">Gilbert et al.
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
(and this study)</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Habitat</bold>
</td>
<td align="left" rowspan="1" colspan="1">Soil (Nutrient-poor, sandy-soil)</td>
<td align="left" rowspan="1" colspan="1">Marine (oligotrophic ocean)</td>
<td align="left" rowspan="1" colspan="1">Marine (eutrophic coastal waters)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Total biological samples</bold>
</td>
<td align="left" rowspan="1" colspan="1">1 (1 metatranscriptome)</td>
<td align="left" rowspan="1" colspan="1">1 (1 metatranscriptome, 1 metagenome)</td>
<td align="left" rowspan="1" colspan="1">4 (4 metatranscriptomes, 4 metagenomes)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Total DNA/RNA (million bp)</bold>
</td>
<td align="left" rowspan="1" colspan="1">∼25.32</td>
<td align="left" rowspan="1" colspan="1">∼60.1</td>
<td align="left" rowspan="1" colspan="1">∼323.2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>RNA purification methodology</bold>
</td>
<td align="left" rowspan="1" colspan="1">Griffiths et al.
<xref ref-type="bibr" rid="pone.0003042-Griffiths1">[11]</xref>
method from 6 g of soil.</td>
<td align="left" rowspan="1" colspan="1">mirVana RNA isolation kit (Ambion) from 1 L of sea water</td>
<td align="left" rowspan="1" colspan="1">Neufeld et al.
<xref ref-type="bibr" rid="pone.0003042-Neufeld1">[12]</xref>
method and MinElute RNA cleanup (Qiagen) from 15 L of seawater</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>mRNA isolation and amplification methodology</bold>
</td>
<td align="left" rowspan="1" colspan="1">N/A
<xref ref-type="table-fn" rid="nt116">*</xref>
</td>
<td align="left" rowspan="1" colspan="1">mRNA amplification using MessageAmp II-Bacterial kit (Ambion)</td>
<td align="left" rowspan="1" colspan="1">MicrobeExpress and Megaclear kit (Ambion). GenomiPHI amplification (GE Healthcare)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>RNA sequencing</bold>
</td>
<td align="left" rowspan="1" colspan="1">GS20 pyrosequencing</td>
<td align="left" rowspan="1" colspan="1">GS20 pyrosequencing</td>
<td align="left" rowspan="1" colspan="1">GS-FLX pyrosequencing</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Average length</bold>
</td>
<td align="left" rowspan="1" colspan="1">98 bp</td>
<td align="left" rowspan="1" colspan="1">112 bp</td>
<td align="left" rowspan="1" colspan="1">215 bp</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Yield of mRNA sequences</bold>
</td>
<td align="left" rowspan="1" colspan="1">8.2%</td>
<td align="left" rowspan="1" colspan="1">47.1%</td>
<td align="left" rowspan="1" colspan="1">99.9%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Yield of orphaned sequences</bold>
</td>
<td align="left" rowspan="1" colspan="1">22% (60% of mRNA assigned tags)
<xref ref-type="table-fn" rid="nt114">1</xref>
</td>
<td align="left" rowspan="1" colspan="1">89.5%
<xref ref-type="table-fn" rid="nt115">2</xref>
</td>
<td align="left" rowspan="1" colspan="1">87%
<xref ref-type="table-fn" rid="nt115">2</xref>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt114">
<label>1</label>
<p>Based on hits to nucleotide sequences using the MG-RAST SEED database.</p>
</fn>
<fn id="nt115">
<label>2</label>
<p>Based on hits to potential open reading frames using the PFAM, TIGRfam and COG protein databases.</p>
</fn>
<fn id="nt116">
<label>*</label>
<p>Not performed, rRNA and mRNA expressly sequenced together to examine both community structure and function.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<sec id="s3a">
<title>Determining the proportion of ribosomal RNA remaining in mRNA metatranscriptomic samples</title>
<p>Both DNA and mRNA sequences were analyzed using the publicly-available SEED MG-RAST (Metagenome Rapid Annotation using Subsystem Technology,
<ext-link ext-link-type="uri" xlink:href="http://metagenomics.theseed.org">http://metagenomics.theseed.org</ext-link>
<xref ref-type="bibr" rid="pone.0003042-Overbeek1">[17]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Aziz1">[18]</xref>
), which compares input sequences against a database of metabolic systems from selected organisms. Taxonomic information for the metagenomes within SEED was obtained by comparison against three 16S rDNA databases (the Ribosomal Database Project II (RDP), Greengenes, and the European Ribosomal Database). Although rRNA comprises approximately 80–90% of total RNA in a typical bacterium
<xref ref-type="bibr" rid="pone.0003042-Wendisch1">[19]</xref>
, it averaged only 0.08% of the total number of sequences in the four combined cDNA libraries (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
). The purification was far more efficient than would be predicted for the methodology and capture-probe range of the MicrobeExpress kit (Ambion) used for the subtractive hybridisation of rRNA. One possible explanation is that the 16S rRNA probes used in the subtractive hybridisation technique hybridise to a larger proportion of the community than previously considered. While this might lead to more substantial removal, it cannot explain the near-complete removal seen in this study. A second, more likely possibility is that the multiple displacement amplification approach (GenomiPHI) used to amplify the available mRNA amplified rRNA inefficiently because the inherent secondary structure of rRNA could have inhibited the reaction (GE Healthcare technical services communication). Both possibilities should be tested further.</p>
</sec>
<sec id="s3b">
<title>Comparisons of homology between datasets</title>
<p>To determine how similar the datasets were to one another, the total nucleic acid sequences and the total partial ORFs (pORFs) of each pair of datasets were compared, giving an indication of the number of homologous sequences shared between each pair (
<xref ref-type="supplementary-material" rid="pone.0003042.s003">
<bold>Table S3</bold>
</xref>
). This demonstrated that each pair of DNA datasets shared approximately 10% to 25% of the nucleic acid sequences and 20% to 33% of the pORF sequences. This suggests that the majority of sequences were unique (singletons) to each dataset; a similar result was seen in the Global Ocean Sampling expedition when the metagenomes of different regions were compared
<xref ref-type="bibr" rid="pone.0003042-Rusch1">[2]</xref>
. The comparison between mRNA datasets showed a clear delineation between mid-bloom and post-bloom, with mid-bloom mRNA sharing ∼50% of their nucleic acid transcripts and post-bloom sharing >95% of their nucleic acid transcripts. This result was consistent when the datasets were compared between time points, with ∼50% of mid-bloom transcripts being homologous with ∼90% of post-bloom transcripts (
<xref ref-type="supplementary-material" rid="pone.0003042.s003">
<bold>Table S3</bold>
</xref>
). We postulate below that this difference could be due to an over-abundance of viral transcripts in the post-bloom environment causing the metatranscriptomes to become more homogeneous.</p>
<p>It was expected that the metatranscriptome from the post-bloom environment would be more similar to the metagenome from the post-bloom environment than to the mid-bloom metagenomes. The comparison clearly demonstrates this (
<xref ref-type="supplementary-material" rid="pone.0003042.s003">
<bold>Table S3</bold>
</xref>
). The mid-bloom metagenomes likewise had greater homology to the mid-bloom metatranscriptomes than to the post-bloom metatranscriptomes.</p>
</sec>
<sec id="s3c">
<title>Clustering of DNA and mRNA sequences confirms higher levels of assembly of mRNAs and differences between time points</title>
<p>To determine the possible level of assembly of sequence clusters, total DNA and mRNA sequences were analyzed using a metagenomic sequence analysis pipeline developed at CAMERA (Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis
<xref ref-type="bibr" rid="pone.0003042-Seshadri1">[20]</xref>
) (access to this pipeline can be arranged by contacting the corresponding author). The number of unique sequences was calculated by clustering un-assembled sequence reads as described in the
<xref ref-type="sec" rid="s2">Materials and Methods</xref>
<xref ref-type="bibr" rid="pone.0003042-Li1">[16]</xref>
.</p>
<p>As shown in
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
, an average of 79% of the DNA-derived metagenome sequences from both mid- and post-bloom samples were unique (singletons). This confirms the low coverage of the genomes in these samples and their high diversity. In contrast, the mRNA-derived sequences showed much higher levels of clustering, with an average of 45% of the sequences from the mid-bloom (time point 1) and only 9.5% of the post-bloom sequences (time point 2) being unique (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
) (calculated by dividing the total number of unique sequence clusters by the total number of sequence reads). Strikingly, only five of the 609 largest nucleotide-level mRNA clusters (those with ≥100 sequences) had an observed match in any of the DNA metagenomes. This low level of homology between mRNA sequences and the community DNA metagenome was previously noted
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
and is expected given the sparse sequence coverage for the much larger metagenome. Alternatively, the lack of homology could be overestimated, because several mRNA clusters may derive from the same transcript and a cluster covering one part of a gene will not match DNA reads covering another part. For example, an mRNA cluster from the first 25% of a given gene (average gene length ∼950 bp
<xref ref-type="bibr" rid="pone.0003042-Xu1">[21]</xref>
) would appear to have no match even if the DNA library captured the other 75% of the gene. Unfortunately, this issue cannot be resolved given the short reads currently generated by pyrosequencing.</p>
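<p>The proportion of unique sequences quoted above is a simple ratio of cluster count to read count. As a minimal sketch (the function name and the example counts are hypothetical and are not values from Table 1), the calculation could be written as:</p>
<preformat>
```python
def unique_fraction(n_clusters: int, n_reads: int) -> float:
    """Fraction of unique sequences: unique clusters divided by total reads."""
    if not n_reads > 0:
        raise ValueError("n_reads must be positive")
    return n_clusters / n_reads


# Hypothetical counts chosen only to illustrate the calculation.
example = unique_fraction(45_000, 100_000)
print(f"{example:.1%}")
```
</preformat>
<p>A value close to 1 indicates that almost every read forms its own cluster (low coverage), whereas a small value indicates that many reads collapse into few clusters.</p>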
<p>To directly compare the level of assembly (diversity) in the eight samples, individual rarefaction was used to normalize the number of clusters to an equivalent sampling effort (i.e. that of the smallest sample) (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
). The number of clusters in all four DNA samples was surprisingly uniform and was double that of the mid-bloom and nine times that of the post-bloom mRNA samples. These results show that the metatranscriptome is smaller than the metagenome, assembles better, and that the expression of genes is different for mid-bloom and post-bloom communities. Of these clusters, an average of 0.23% of mid-bloom and 2.3% of post-bloom transcript clusters included more than 100 sequences. In other words, the transcription profile became more homogeneous in the post-bloom situation.</p>
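<p>The rarefaction step described above can be illustrated with a short sketch. It assumes that each read carries a cluster label and repeatedly subsamples every sample, without replacement, to the size of the smallest sample; the function name, the toy data and the number of random draws are hypothetical and do not reproduce the software used in this study.</p>
<preformat>
```python
import random
from statistics import mean


def rarefied_cluster_count(cluster_ids, depth, n_draws=100, seed=0):
    """Mean number of distinct clusters in random subsamples of `depth` reads.

    cluster_ids is a list holding one cluster label per read.
    """
    if depth > len(cluster_ids):
        raise ValueError("depth exceeds the number of reads")
    rng = random.Random(seed)
    return mean(len(set(rng.sample(cluster_ids, depth))) for _ in range(n_draws))


# Hypothetical toy samples: one cluster label per read.
samples = {
    "dna": list(range(1000)),               # every read is its own cluster
    "mrna": [i % 50 for i in range(4000)],  # reads fall into 50 clusters
}
depth = min(len(reads) for reads in samples.values())
rarefied = {name: rarefied_cluster_count(reads, depth)
            for name, reads in samples.items()}
```
</preformat>
<p>Because every sample is reduced to the same number of reads, the resulting cluster counts can be compared directly even when the original libraries differ in size.</p>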
<p>Based on this clustering, we compared the total number of clusters found to the total number expected within a given water sample. To generate a rough estimate of potential metatranscriptome coverage, we used the approach of Poretsky
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-Poretsky1">[6]</xref>
to estimate that each water sample contained ca. 80,000 unique transcripts. This estimate is based on the observed number of dominant taxa and bacterial abundance (data not shown). This is the same order of magnitude as the number of unique mRNA sequences identified (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
) suggesting that this study may have achieved a reasonable coverage of the community metatranscriptome (in comparison, the metagenomes were vastly under-sampled). This is clearly an upper estimate, and given that the top 609 nucleotide clusters could be collapsed with less conservative clustering criteria into 85 larger clusters (see
<xref ref-type="sec" rid="s2">Materials and Methods</xref>
), the actual number of transcripts could be at least 7-fold lower. Indeed, the real value could be lower still, since one functional transcript may be represented by more than one cluster (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
). We have previously shown this for another gene,
<italic>phnA</italic>
, that encodes phosphonoacetate hydrolase; the
<italic>phnA</italic>
from one organism had twelve hits within the metagenomic data, which were spread out over the gene
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
. The clustering methodology outlined here would have assigned this one gene to six different clusters, because overlap among the 12 sequences groups them into six separate sets.</p>
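<p>How reads from a single gene can end up in several clusters can be shown with a toy model. The sketch below treats each read as a (start, end) interval along a gene and groups reads that overlap; it is an illustration under these assumptions only, not the sequence-similarity clustering used in this study, and the read positions are hypothetical.</p>
<preformat>
```python
def count_overlap_clusters(intervals):
    """Count groups of chained, overlapping (start, end) intervals."""
    clusters = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            clusters += 1                         # gap: a new cluster begins
            current_end = end
        else:
            current_end = max(current_end, end)   # read extends the cluster
    return clusters


# Hypothetical read positions (bp) along one gene: 12 reads in 6 groups.
hits = [(0, 100), (50, 150), (200, 300), (250, 340), (400, 480), (430, 520),
        (560, 650), (600, 690), (720, 800), (760, 840), (860, 930), (880, 950)]
n_clusters = count_overlap_clusters(hits)
```
</preformat>
<p>In this toy case the 12 reads form six groups separated by uncovered stretches of the gene, so a count of clusters overestimates the number of distinct transcripts.</p>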
<table-wrap id="pone-0003042-t003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t003</object-id>
<label>Table 3</label>
<caption>
<title>Top ten most abundant identifiable transcripts identified from pORF clustering.</title>
</caption>
<graphic id="pone-0003042-t003-3" xlink:href="pone.0003042.t003"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t003-3">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Rank</td>
<td align="left" rowspan="1" colspan="1">COG ID</td>
<td align="left" rowspan="1" colspan="1">No. of Seqs. (nr)</td>
<td align="left" rowspan="1" colspan="1">No. of clusters</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
<td align="left" rowspan="1" colspan="1">TIGRfam ID</td>
<td align="left" rowspan="1" colspan="1">No. of Seqs. (nr)</td>
<td align="left" rowspan="1" colspan="1">No. of clusters</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
<td align="left" rowspan="1" colspan="1">PFAM ID</td>
<td align="left" rowspan="1" colspan="1">No. of Seqs. (nr)</td>
<td align="left" rowspan="1" colspan="1">No. of clusters</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0209</td>
<td align="left" rowspan="1" colspan="1">464 (149)</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">Ribonucleotide reductase, alpha subunit</td>
<td align="left" rowspan="1" colspan="1">TIGR02505</td>
<td align="left" rowspan="1" colspan="1">526 (190)</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">ribonucleoside-triphosphate reductase, adenosylcobalamin-dependent</td>
<td align="left" rowspan="1" colspan="1">PF02407</td>
<td align="left" rowspan="1" colspan="1">27147 (815)</td>
<td align="left" rowspan="1" colspan="1">44</td>
<td align="left" rowspan="1" colspan="1">Putative viral replication protein</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">COG0443</td>
<td align="left" rowspan="1" colspan="1">330 (156)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">Molecular chaperone</td>
<td align="left" rowspan="1" colspan="1">TIGR02348</td>
<td align="left" rowspan="1" colspan="1">359 (158)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">chaperonin GroL</td>
<td align="left" rowspan="1" colspan="1">PF00910</td>
<td align="left" rowspan="1" colspan="1">15101 (595)</td>
<td align="left" rowspan="1" colspan="1">28</td>
<td align="left" rowspan="1" colspan="1">RNA helicase</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">COG0459</td>
<td align="left" rowspan="1" colspan="1">359 (158)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">Chaperonin GroEL (HSP60 family)</td>
<td align="left" rowspan="1" colspan="1">TIGR02350</td>
<td align="left" rowspan="1" colspan="1">330 (156)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">chaperone protein DnaK</td>
<td align="left" rowspan="1" colspan="1">PF00005</td>
<td align="left" rowspan="1" colspan="1">372 (212)</td>
<td align="left" rowspan="1" colspan="1">18</td>
<td align="left" rowspan="1" colspan="1">ABC transporter</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">COG0376</td>
<td align="left" rowspan="1" colspan="1">96 (46)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">Catalase (peroxidase I)</td>
<td align="left" rowspan="1" colspan="1">TIGR01369</td>
<td align="left" rowspan="1" colspan="1">214 (108)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">carbamoyl-phosphate synthase, large subunit</td>
<td align="left" rowspan="1" colspan="1">PF00004</td>
<td align="left" rowspan="1" colspan="1">326 (149)</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">ATPase family associated with various cellular activities (AAA)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">COG0458</td>
<td align="left" rowspan="1" colspan="1">214 (108)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">Carbamoylphosphate synthase large subunit</td>
<td align="left" rowspan="1" colspan="1">TIGR02188</td>
<td align="left" rowspan="1" colspan="1">236 (120)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">acetate–CoA ligase</td>
<td align="left" rowspan="1" colspan="1">PF00012</td>
<td align="left" rowspan="1" colspan="1">330 (156)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">Hsp70 protein</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">COG5265</td>
<td align="left" rowspan="1" colspan="1">236 (126)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">ABC-type transport system involved in Fe-S cluster assembly, permease and ATPase components</td>
<td align="left" rowspan="1" colspan="1">TIGR00630</td>
<td align="left" rowspan="1" colspan="1">224 (103)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">excinuclease ABC, A subunit</td>
<td align="left" rowspan="1" colspan="1">PF00118</td>
<td align="left" rowspan="1" colspan="1">359 (158)</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">TCP-1/cpn60 chaperonin family</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">COG0086</td>
<td align="left" rowspan="1" colspan="1">210 (96)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">DNA-directed RNA polymerase, beta' subunit/160 kD subunit</td>
<td align="left" rowspan="1" colspan="1">TIGR00936</td>
<td align="left" rowspan="1" colspan="1">257 (118)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">adenosylhomocysteinase</td>
<td align="left" rowspan="1" colspan="1">PF00006</td>
<td align="left" rowspan="1" colspan="1">292 (122)</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">ATP synthase alpha/beta family, nucleotide-binding domain</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">COG0178</td>
<td align="left" rowspan="1" colspan="1">224 (103)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">Excinuclease ATPase subunit</td>
<td align="left" rowspan="1" colspan="1">TIGR02013</td>
<td align="left" rowspan="1" colspan="1">219 (82)</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">DNA-directed RNA polymerase, beta subunit</td>
<td align="left" rowspan="1" colspan="1">PF00009</td>
<td align="left" rowspan="1" colspan="1">399 (148)</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">Elongation factor Tu GTP binding domain</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">COG0499</td>
<td align="left" rowspan="1" colspan="1">257 (118)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">S-adenosylhomocysteine hydrolase</td>
<td align="left" rowspan="1" colspan="1">TIGR02506</td>
<td align="left" rowspan="1" colspan="1">267 (78)</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">ribonucleoside-diphosphate reductase, alpha subunit</td>
<td align="left" rowspan="1" colspan="1">PF02867</td>
<td align="left" rowspan="1" colspan="1">326 (101)</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">Ribonucleotide reductase, barrel domain</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">COG0085</td>
<td align="left" rowspan="1" colspan="1">219 (82)</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">DNA-directed RNA polymerase, beta subunit/140 kD subunit</td>
<td align="left" rowspan="1" colspan="1">TIGR01242</td>
<td align="left" rowspan="1" colspan="1">178 (84)</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">26S proteasome subunit P45 family</td>
<td align="left" rowspan="1" colspan="1">PF00501</td>
<td align="left" rowspan="1" colspan="1">228 (116)</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">AMP-binding enzyme</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt117">
<p>This table only includes the pORF clusters with >10 non-redundant sequences. No. of Seqs. refers to the number of sequences that contribute to each cluster; the number of non-redundant sequences that contribute to each cluster is given in brackets.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3d">
<title>Evidence against potential biases in detecting naturally occurring mRNA clusters introduced by GenomiPHI</title>
<p>There are two key biases that might be introduced using the methodology presented here. Firstly, the time required to concentrate the community by filtration is longer than the half-life of mRNA, but this is true of most methods used to analyze the metatranscriptome of aquatic samples
<xref ref-type="bibr" rid="pone.0003042-Poretsky1">[6]</xref>
. Recent studies, however, have used smaller volumes (e.g. ∼1 L
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
), and the current methodology would still be effective using these smaller volumes. In the current study, the methodology was run concomitantly with other analyses that required a significant amount of DNA, e.g. fosmid library production
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
.</p>
<p>Secondly, amplification of cDNA using GenomiPHI could introduce artefactual sequences (although evidence of such a bias for transcriptome amplification does not exist
<xref ref-type="bibr" rid="pone.0003042-Francois1">[22]</xref>
). Such artefacts could include mosaic (chimeric) sequences, which could explain the large number of orphan transcripts found in this study. We therefore performed four types of subsequent analysis to attempt to validate these clusters.</p>
<p>Firstly, to generate empirical evidence of the presence of these clusters in the original water samples and to test for chimeras, a PCR analysis was performed that targeted 27 of the most highly expressed orphan clusters. PCR reactions were performed on 1) the original environmental DNA preparations, 2) unamplified cDNA and 3) mRNA (this was a negative control, since it is DNA-free). Amplification products were detected for all 27 selected target sequences in at least one of the environmental DNA samples (
<xref ref-type="supplementary-material" rid="pone.0003042.s002">
<bold>Table S2</bold>
</xref>
). None of the sequences could be detected in any of the 4 mRNA samples (negative controls), confirming the absence of DNA contamination. All 27 transcripts were found in all four cDNA samples. For the mid-bloom time points, 12 and 11 of the transcripts, respectively, were identified in the high CO
<sub>2</sub>
and control environmental DNA samples
(
<xref ref-type="supplementary-material" rid="pone.0003042.s002">
<bold>Table S2</bold>
</xref>
). Some, but not all, of these transcripts were of lower abundance when normalized to sequencing effort (data not shown).</p>
<p>Secondly, this is the first published metatranscriptomic study to include biological replicates (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
) making it possible to compare observed transcripts generated from independent samples using the same methodology. Of the four metatranscriptomes analyzed, transcript clusters showed similar abundance in
<italic>both</italic>
peak bloom samples and
<italic>both</italic>
post bloom samples (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
, 
<xref ref-type="supplementary-material" rid="pone.0003042.s002">
<bold>Table S2</bold>
</xref>
). Since all four metatranscriptomes were generated using the same mRNA enrichment methodology, this level of observed similarity of abundant transcripts would not be expected by chance and provides strong evidence that the differences seen
<italic>between</italic>
time points in both the treatment and control samples are due to biological differences in the composition of the community within the bloom (
<xref ref-type="supplementary-material" rid="pone.0003042.s002">
<bold>Table S2</bold>
</xref>
).</p>
<p>Thirdly, we compared the functional profiles of the metagenomes and metatranscriptomes (
<xref ref-type="fig" rid="pone-0003042-g001">
<bold>Fig. 1</bold>
</xref>
). All eight data sets were annotated using similarity matching against SEED subsystems
<xref ref-type="bibr" rid="pone.0003042-Overbeek1">[17]</xref>
. While this approach only validates transcripts with observable homology to genes in known subsystems, it still shows that the metatranscriptome functional profile does not significantly differ from that of the metagenome (one-way ANOSIM R = 0.271,
<italic>p</italic>
 > 0.05). For this analysis, the number of sequences with significant identity to each metabolic gene in a functional category in the SEED subsystem database was normalised to the sequencing effort for each sample (
<xref ref-type="fig" rid="pone-0003042-g001">
<bold>Fig. 1</bold>
</xref>
) and sequences which could not be annotated in this way were not included.</p>
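<p>The normalisation described above can be sketched as follows. The example assumes that, for each sample, the number of annotated sequences per subsystem is divided by the total number of reads and then expressed as a percentage of the annotated total; the function name, subsystem names and counts are hypothetical and are not data from this study.</p>
<preformat>
```python
def relative_abundance(counts, n_reads):
    """Normalise subsystem counts to sequencing effort, then convert to percentages."""
    per_read = {name: c / n_reads for name, c in counts.items()}
    total = sum(per_read.values())
    return {name: 100.0 * value / total for name, value in per_read.items()}


# Hypothetical counts for one sample; not data from this study.
counts = {"carbohydrates": 300, "protein metabolism": 500, "virulence": 200}
profile = relative_abundance(counts, n_reads=50_000)
```
</preformat>
<p>Within a single sample the division by sequencing effort cancels out once percentages are taken; it matters when the per-read values of different samples are compared before conversion to relative abundance.</p>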
<fig id="pone-0003042-g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Relative abundance of sequence types identified for each sample.</title>
<p>Number of sequences per metabolism subsystem were normalised to sequencing effort for each sample and then relative abundance for each was calculated as a percentage.</p>
</caption>
<graphic xlink:href="pone.0003042.g001"></graphic>
</fig>
<p>Fourthly, we compared the level of assembly and novelty between our four mRNA and DNA samples and that of the only previously published metatranscriptomic study of a marine microbial community
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
. All samples were translated in all six reading frames into contiguous peptides of at least 30 amino acids without a stop codon; spurious pORFs were then removed, leaving 2,567,050 predicted pORFs (see
<xref ref-type="sec" rid="s2">Materials and Methods</xref>
,
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
). These pORFs were clustered to assess the diversity of function from each sample, and were compared against known databases to provide basic annotation of known proteins and potential identification of novel pORFs. As shown in
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
, the majority of the highly clustered (≥10 non-redundant sequences per cluster) transcripts (∼94% mid-bloom and ∼87% post-bloom) were novel clusters that may represent uncharacterized proteins.</p>
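<p>The six-frame translation step described above can be sketched in a few lines. The example assumes the standard genetic code and a minimum peptide length of 30 residues, as stated in the text; the function names are hypothetical, the removal of spurious pORFs is not implemented, and this is not the pipeline used in this study.</p>
<preformat>
```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a nucleotide string; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_porfs(read, min_len=30):
    """Return stop-free peptides of at least min_len residues from all six frames."""
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]   # reverse complement
    porfs = []
    for strand in (read, reverse):
        for offset in range(3):
            for peptide in translate(strand[offset:]).split("*"):
                if len(peptide) >= min_len:
                    porfs.append(peptide)
    return porfs


# Hypothetical read: 30 alanine codons followed by a stop codon.
porfs = six_frame_porfs("GCT" * 30 + "TAA")
```
</preformat>
<p>Splitting each translated frame at stop codons yields the contiguous stop-free peptides, and the length filter discards fragments too short to be treated as pORFs.</p>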
<p>The mRNA samples from the current study yielded 1–2 orders of magnitude more novel protein clusters than their corresponding DNA samples when normalized to size (
<xref ref-type="table" rid="pone-0003042-t001">
<bold>Table 1</bold>
</xref>
). Surprisingly, this high level of diversity was actually exceeded by the previously published Frias-Lopez study
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
. Clustering of that data according to the same criteria showed that ∼98% of metatranscriptomic sequences were unique (
<xref ref-type="supplementary-material" rid="pone.0003042.s001">
<bold>Table S1</bold>
</xref>
). To directly compare the annotation of pORFs for this study with that of the Frias-Lopez study
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
, we applied the same clustering techniques for the translated proteins to their raw data (
<xref ref-type="supplementary-material" rid="pone.0003042.s001">
<bold>Table S1</bold>
</xref>
). A total of 1,826 pORF clusters containing >10 non-redundant sequences were found (DNA, rRNA and mRNA) and 865 pORF clusters remained after all rRNA clusters were removed. If the values for novel protein clusters are normalised to sequencing effort, we see that the Frias-Lopez study identified 1.8× the number of novel protein sequences per sequencing effort when compared to the current study. This phenomenon can be partially explained by the differences in read lengths between the studies (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
), as longer read lengths are more likely to be positively annotated than shorter read lengths
<xref ref-type="bibr" rid="pone.0003042-Wilson1">[23]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Lipman1">[24]</xref>
.</p>
</sec>
<sec id="s3e">
<title>Differences between sequence abundances in metatranscriptomes and metagenomes</title>
<p>The benefits of applying both metatranscriptomic and metagenomic analysis to the same biological samples include the potential to detect differential expression of mRNAs (function) between communities under different environmental conditions, while the metagenome (DNA) can also provide a frame-of-reference for the total potential of the community metatranscriptome. Using the proportion of DNA and mRNA sequences that had homology to known proteins, we were able to make phylum-level taxonomic assignments using annotations from the SEED databases
<xref ref-type="bibr" rid="pone.0003042-Aziz1">[18]</xref>
(
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
). Comparison of the 4 DNA and 4 mRNA samples shows them to be significantly different in taxonomic composition (by one-way ANOSIM analysis, R = 0.385,
<italic>p</italic>
<0.03). Despite this, comparisons of all subsets of the data failed to reveal any significant differences (perhaps due to small sample sizes – data not shown). This suggests that changes seen in mid- and post-bloom time points are due more to changes in particular genes within taxa than to large-scale changes in the abundances of phylum-level taxonomic groups. Some potential qualitative changes can be seen within these patterns that may contribute to this significant difference between DNA and mRNAs, including an increased number of transcripts from the
<italic>Bacteroidetes</italic>
phylum (an important group in macromolecule degradation) during the mid-bloom sample compared to its proportion of the same sample of DNA (Bacteroidetes was only the 4
<sup>th</sup>
most abundant metagenomic group but had the 3
<sup>rd</sup>
highest transcriptional activity) (
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
).</p>
<fig id="pone-0003042-g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Percentage taxonomic affiliation of sequences identified in each dataset by BLAST against the SEED database.</title>
<p>A – community at peak of the phytoplankton bloom (±1 SD). B - community after the phytoplankton bloom (±1 SD). Standard deviations are calculated from comparison of the different treatments. Data shown are for the high CO
<sub>2</sub>
treatment.</p>
</caption>
<graphic xlink:href="pone.0003042.g002"></graphic>
</fig>
</sec>
<sec id="s3f">
<title>The most abundant ‘known’ transcripts found in the metatranscriptome and metagenome</title>
<p>The most abundant ‘known’ pORF clusters included a large number of housekeeping genes. All pORF clusters with >10 non-redundant sequences were annotated by comparison to the PFAM, TIGRfam and COG databases (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
). Whilst the PFAM annotations yielded significant numbers of viral proteins, viral sequences were absent from both the TIGRfam and COG annotations (viral annotations are discussed in the next section).</p>
<p>Among the most abundant sequences with annotations are stress-induced chaperonin proteins (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
), which are potentially expressed in response to the low pH or high CO
<sub>2</sub>
concentration stress found in the ocean acidification mesocosm samples. This is corroborated by the distribution of sequences, with ∼60% (a 1.4× increase) of the chaperonin transcript sequences coming from the high CO
<sub>2</sub>
mesocosms. This is mirrored by the metagenomic data, in which ∼55% (a 1.2× increase) of the chaperonin gene sequences are found in the high CO
<sub>2</sub>
environment. However, it is possible that these proteins are induced while the bacteria are being filtered, and hence this could be an artefact of the sampling procedure; using smaller starting volumes should alleviate this. Neither of these proteins was identified as abundant in the dominant pORF clusters (>10 non-redundant sequences) from the study by Frias-Lopez
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
which utilised only 1 L sampling volumes and may have reduced stress on the bacteria by reducing the filtration time.</p>
<p>A range of proteins considered to be ubiquitous in cellular processes also ranked among the most abundant sequences that could be annotated. These included ribonucleotide reductase proteins (COG0209, TIGR02505, TIGR02506, PF02867), which were matched in all 3 reference databases (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
), as were proteins involved in ABC transporters, ATPase activity and AMP-binding (COG5265, TIGR00630, PF00004, PF00005, PF00006, PF00501). RNA polymerases (COG0085, COG0086, TIGR02013) were only assigned by the COG and TIGRfam annotations. Other abundant pORF clusters (>10 non-redundant sequences) encoded catalases/peroxidases (COG0376), carbamoylphosphate synthases (COG0458, TIGR01369), excinucleases (COG0178), S-adenosylhomocysteine hydrolase (COG0499, TIGR00936), acetate-CoA ligases (TIGR02188), 26S proteasome subunit P45 family protein (TIGR01242), and elongation factor Tu GTP binding domain (PF00009) (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
).</p>
</sec>
<sec id="s3g">
<title>Abundant viral sequences in the post-bloom time point samples may contribute to the large number of orphan transcript clusters found</title>
<p>At the end of any phytoplankton bloom, a substantial increase is expected in the number of expressed viral transcripts. This was observed in our post-bloom mRNA samples, in which transcripts with viral homologues were on average 24.5 times more abundant than viral DNA sequences (
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
). While free virus particles would pass through the 0.22 µm filters used, and would therefore have low sequence abundance in the post-bloom samples, infected cells would be expected to show overwhelming viral gene expression during lytic growth (
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
). The large increase in viral transcription occurred immediately after a substantial increase in bacterial abundance following the phytoplankton bloom (data not shown).</p>
<p>The high expected abundance of viruses in the post-bloom environment suggests that many of the unknown predicted proteins may be of viral origin
<xref ref-type="bibr" rid="pone.0003042-Yooseph1">[3]</xref>
. This is supported by the annotation of the dominant pORF clusters (with >10 non-redundant sequences,
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
) and
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
. The most abundant sequence that could be annotated was PF02407, Putative Viral Replication Protein, and the second most abundant was PF00910, RNA helicase, which is thought to be involved in viral infection (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
). These sequences comprise 7.7% and 5% respectively of the dominant clusters that can be annotated by comparison to the PFAM database (12.7% in total).</p>
<p>Furthermore, these two proteins are more abundant in the post-bloom environment, with ∼86% of these transcripts being found only in the post-bloom samples. Interestingly, only a single homologue for PF02407 was found in the post-bloom metagenomes. This not only confirms the results seen in
<xref ref-type="fig" rid="pone-0003042-g002">
<bold>Fig. 2</bold>
</xref>
, but also provides a clear example of the biological significance of observed differences in the ratios of transcripts and their DNA sequences.</p>
</sec>
<sec id="s3h">
<title>Further validation of metatranscriptomes and metagenomes by direct comparison to an oligotrophic ocean metatranscriptome</title>
<p>Both the validity and nature of mRNA transcripts from this experiment were explored by direct comparison with the Frias-Lopez data set (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
)
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
. There was some overlap of sequences, including both house-keeping genes and a few of the most highly expressed novel orphan clusters. However, the analysis also highlights the extensive differences between these samples, which, while both taken from the marine environment, came from two distinct marine habitats (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
).</p>
<p>Specifically, comparisons were generated using BLASTN (
<xref ref-type="table" rid="pone-0003042-t004">
<bold>Table 4</bold>
</xref>
) for three versions of the two data sets: 1) total sequences, 2) representative ntDNA sequences from each nucleotide cluster and 3) representative sequences from each pORF cluster. mRNA and DNA sequences were compared separately. Values for the Frias-Lopez cDNA following removal of the rRNA sequences were also used for comparison. The most abundant clusters of the current study were also compared to the Frias-Lopez full mRNA and DNA datasets (
<xref ref-type="table" rid="pone-0003042-t005">
<bold>Table 5</bold>
</xref>
).</p>
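<p>The counting behind such a reciprocal comparison can be illustrated with a minimal sketch. It assumes, hypothetically, that the BLASTN hits are available in the standard 12-column tab-separated format (query ID in column 1, subject ID in column 2, E-value in column 11). The function name count_homologues and the toy hit lines are illustrative only and are not taken from the original analysis.</p>

```python
def count_homologues(tabular_lines, max_evalue=1e-3):
    """Count distinct query and subject sequences with a hit below max_evalue.

    Assumes 12-column tab-separated BLAST output: column 1 is the query ID,
    column 2 the subject ID and column 11 the E-value.
    """
    queries, subjects = set(), set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if max_evalue > evalue:  # keep hits with an E-value below the threshold
            queries.add(fields[0])
            subjects.add(fields[1])
    return len(queries), len(subjects)


# Toy hits with hypothetical IDs: query = Frias-Lopez read, subject = current-study read.
toy_hits = [
    "FL_1\tGB_7\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180",
    "FL_1\tGB_9\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150",
    "FL_2\tGB_7\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
]
n_query, n_subject = count_homologues(toy_hits)
print(n_query, n_subject)
```

<p>Dividing the first count by the number of query sequences, and the second by the number of reference sequences, would give the two kinds of percentage described in the footnote to Table 4.</p>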
<table-wrap id="pone-0003042-t004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t004</object-id>
<label>Table 4</label>
<caption>
<title>BLASTN comparison of total nucleic acids, representative sequences from nucleic acid clusters and representative sequences from pORF clusters from this study and the Frias-Lopez study
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
.</title>
</caption>
<graphic id="pone-0003042-t004-4" xlink:href="pone.0003042.t004"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t004-4">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="4" align="left" rowspan="1">Gilbert et al
<xref ref-type="bibr" rid="pone.0003042-Gilbert1">[10]</xref>
and current study</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="2" align="left" rowspan="1">
<italic>DNA, DNA nuc-clusters, DNA pORF clusters</italic>
</td>
<td colspan="2" align="left" rowspan="1">
<italic>mRNA, mRNA nuc-clusters, mRNA pORF clusters</italic>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Frias-Lopez et al </bold>
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[
<bold>9</bold>
]</xref>
</td>
<td align="left" rowspan="1" colspan="1">
<italic>DNA</italic>
</td>
<td align="left" rowspan="1" colspan="1">44261 (10.7)</td>
<td align="left" rowspan="1" colspan="1">102637 (10.3)</td>
<td align="left" rowspan="1" colspan="1">19359 (4.67)</td>
<td align="left" rowspan="1" colspan="1">56835 (11.2)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>DNA nuc-clusters</italic>
</td>
<td align="left" rowspan="1" colspan="1">35575 (10.6,
<bold>8.6</bold>
)</td>
<td align="left" rowspan="1" colspan="1">59918 (9.5,
<bold>6</bold>
)</td>
<td align="left" rowspan="1" colspan="1">15564 (4.65,
<bold>3.75</bold>
)</td>
<td align="left" rowspan="1" colspan="1">11698 (8.8,
<bold>2.3</bold>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>DNA-pORF clusters</italic>
</td>
<td align="left" rowspan="1" colspan="1">59774 (15,
<bold>14.4</bold>
)</td>
<td align="left" rowspan="1" colspan="1">40002 (3.7,
<bold>4</bold>
)</td>
<td align="left" rowspan="1" colspan="1">21302 (5.5,
<bold>5.1</bold>
)</td>
<td align="left" rowspan="1" colspan="1">17672 (7.5,
<bold>3.5</bold>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>mRNA</italic>
</td>
<td align="left" rowspan="1" colspan="1">64609 (52.4)</td>
<td align="left" rowspan="1" colspan="1">18602 (1.9)</td>
<td align="left" rowspan="1" colspan="1">58123 (45)</td>
<td align="left" rowspan="1" colspan="1">2680 (0.53)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>mRNA (rRNA removed)</italic>
</td>
<td align="left" rowspan="1" colspan="1">15179 (24.1,
<bold>11.8</bold>
)</td>
<td align="left" rowspan="1" colspan="1">15598 (1.57,
<bold>1.6</bold>
)</td>
<td align="left" rowspan="1" colspan="1">13121 (20.8,
<bold>10.2</bold>
)</td>
<td align="left" rowspan="1" colspan="1">2162 (0.43,
<bold>0.42</bold>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>mRNA nuc-clusters</italic>
</td>
<td align="left" rowspan="1" colspan="1">27094 (38.7,
<bold>21.1</bold>
)</td>
<td align="left" rowspan="1" colspan="1">10689 (1.7,
<bold>1.07</bold>
)</td>
<td align="left" rowspan="1" colspan="1">22289 (31.9,
<bold>17.4</bold>
)</td>
<td align="left" rowspan="1" colspan="1">1942 (1.45,
<bold>0.38</bold>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>mRNA nuc-clusters (rRNA removed)</italic>
</td>
<td align="left" rowspan="1" colspan="1">8338 (19,
<bold>6.5</bold>
)</td>
<td align="left" rowspan="1" colspan="1">9631 (1.5,
<bold>1</bold>
)</td>
<td align="left" rowspan="1" colspan="1">6171 (14,
<bold>5</bold>
)</td>
<td align="left" rowspan="1" colspan="1">1624 (1.2,
<bold>0.3</bold>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">
<italic>mRNA-pORF clusters</italic>
</td>
<td align="left" rowspan="1" colspan="1">4330 (9,
<bold>3.4</bold>
)</td>
<td align="left" rowspan="1" colspan="1">5484 (0.5,
<bold>0.55</bold>
)</td>
<td align="left" rowspan="1" colspan="1">2372 (5,
<bold>1.8</bold>
)</td>
<td align="left" rowspan="1" colspan="1">1736 (0.7,
<bold>0.34</bold>
)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt118">
<p>For each comparison two values are given: the first is the percentage of Frias-Lopez data that is homologous to data from the current study; the second is the percentage of data from the current study that is homologous to the Frias-Lopez data. Comparisons were performed using BLASTN with the current study's dataset as the reference database and the Frias-Lopez dataset as the query. The -b and -v parameters in BLASTN were set to 40,000. For every query sequence, every similar sequence in the reference dataset is identified. Sequences from both datasets that met the criterion of an E-value below 0.001 were included. Percentage values in parentheses are calculated by dividing each value by the total number of sequences/representative sequences for each dataset. For the Frias-Lopez data: Total DNA – 414,323, Total DNA nuc-clusters – 334,940, Total DNA pORF clusters – 390,599, Total mRNA – 128,234, Total mRNA (rRNA removed) – 63,111, Total mRNA nuc-clusters – 69,948, Total mRNA nuc-clusters (rRNA removed) – 43,948, Total mRNA-pORF clusters – 46,703. For the Gilbert data: Total DNA – 992,224, Total DNA nuc-clusters – 630,159, Total DNA pORF clusters – 1,083,644, Total mRNA – 506,353, Total mRNA nuc-clusters – 133,447, Total mRNA pORF clusters – 238,655. Percentage values in bold are normalised by dividing each value by the Total DNA or Total RNA for the relevant study. Nuc-cluster refers to nucleotide clusters.</p>
</fn>
</table-wrap-foot>
</table-wrap>
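<p>As a worked illustration of the percentage calculation described in the footnote to Table 4, the following sketch recomputes a few cells from the totals quoted there. The dictionary and function names are illustrative assumptions; only the numbers come from the table and its footnote.</p>

```python
# Totals quoted in the Table 4 footnote.
FRIAS_LOPEZ = {"DNA": 414_323, "DNA nuc-clusters": 334_940}
GILBERT = {"DNA": 992_224, "DNA nuc-clusters": 630_159}


def percent(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


# Row "DNA", first two cells: 44261 (10.7) and 102637 (10.3).
fl_dna = percent(44_261, FRIAS_LOPEZ["DNA"])
gb_dna = percent(102_637, GILBERT["DNA"])

# Row "DNA nuc-clusters", first cell: 35575 (10.6, bold 8.6).
fl_clusters = percent(35_575, FRIAS_LOPEZ["DNA nuc-clusters"])
fl_clusters_norm = percent(35_575, FRIAS_LOPEZ["DNA"])  # bold value: divided by Total DNA

print(round(fl_dna, 1), round(gb_dna, 1))
print(round(fl_clusters, 1), round(fl_clusters_norm, 1))
```

<p>The plain percentage uses the total for the matching category (for example, Total DNA nuc-clusters), whereas the bold value uses the Total DNA of the same study as the denominator.</p>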
<table-wrap id="pone-0003042-t005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t005</object-id>
<label>Table 5</label>
<caption>
<title>BLASTN comparison of the reference sequences of the abundant nucleic acid clusters (>10 and >100 sequences per cluster) from the current study to the total combined mRNA and DNA sequences from the Frias-Lopez et al
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
study.</title>
</caption>
<graphic id="pone-0003042-t005-5" xlink:href="pone.0003042.t005"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t005-5">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td colspan="2" align="left" rowspan="1">Frias-Lopez et al
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
study</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">mRNA homologues (%)</td>
<td align="left" rowspan="1" colspan="1">DNA homologues (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Current Study</td>
<td align="left" rowspan="1" colspan="1">3639 nucleotide clusters (10–99 sequences)</td>
<td align="left" rowspan="1" colspan="1">107 (2.9%)</td>
<td align="left" rowspan="1" colspan="1">326 (9%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">85 nucleotide clusters (>100 sequences)</td>
<td align="left" rowspan="1" colspan="1">1 (1.2%)</td>
<td align="left" rowspan="1" colspan="1">4 (4.7%)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt119">
<p>The 3649 clusters have >10 sequences and the 85 clusters are ‘contigs’ of all 609 clusters with >100 sequences (as described in the
<xref ref-type="sec" rid="s2">Materials and Methods</xref>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>About 10% of the sequences in the two metagenomes are shared, but the shared proportion of DNA pORFs is higher (15%) for the Frias-Lopez study and considerably lower (3.7%) for the current study. Interestingly, this trend is confirmed by the DNA-mRNA comparisons in which the total proportion of DNA matches is always far lower than the proportion of mRNA (
<xref ref-type="table" rid="pone-0003042-t004">
<bold>Table 4</bold>
</xref>
). At the highest level of assembly, comparisons of pORF clusters reveal that 9% and 7.5% of the mRNAs are shared with the relevant metagenome. Smaller proportions of the pORF mRNAs of each study (5.0% and 0.7%) showed similarities to each other, suggesting that different subsets of the “potential metatranscriptome” of the two communities are expressed in the two habitats.</p>
<p>Shared sequences are expected to include housekeeping genes, as suggested by the homology between many of the largest identifiable clusters found in both studies
<bold>(</bold>
<xref ref-type="table" rid="pone-0003042-t004">
<bold>Table 4</bold>
</xref>
and
<xref ref-type="table" rid="pone-0003042-t006">
<bold>Table 6</bold>
</xref>
). Similarities among the most highly expressed abundant clusters are still very rare (
<xref ref-type="table" rid="pone-0003042-t005">
<bold>Table 5</bold>
</xref>
). This could be a result of niche-specific genes (or post-bloom specific genes in this study) and/or the heavy viral load associated with the collapsing algal bloom conditions for the current study. This viral load hypothesis is potentially supported by an anomaly seen in
<xref ref-type="table" rid="pone-0003042-t004">
<bold>Table 4</bold>
</xref>
. The relative percentage for total nucleic acid comparisons is higher than that for comparisons between nucleic acid clusters (nuc-clusters) in every analysis except those involving our mRNA nucleotide clusters (mRNA nuc-clusters). There, the relative percentage increases from 0.53% to 1.45%, and the same pattern is seen when comparing against the Frias-Lopez mRNA following removal of the rRNA, where the value increases from 0.43% to 1.2% (
<xref ref-type="table" rid="pone-0003042-t004">
<bold>Table 4</bold>
</xref>
). We hypothesise that this anomaly is caused by the majority of the Frias-Lopez mRNA homologues being singletons in our mRNA data; hence, on clustering, their contribution to the percentage calculation is more significant. This highlights the fact that the abundant sequences in our mRNA data are not abundantly expressed in the Frias-Lopez data, which is to be expected if they are viral sequences.</p>
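<p>The arithmetic behind this anomaly can be retraced with a short sketch that uses only the counts and totals given in Table 4. The variable names are illustrative. The reads-per-cluster ratio for the matching subset is only a rough indication, because the two counts come from separate BLASTN comparisons.</p>

```python
GILBERT_MRNA_READS = 506_353      # Total mRNA, current study (Table 4 footnote)
GILBERT_MRNA_CLUSTERS = 133_447   # Total mRNA nuc-clusters, current study


def percent(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


# Current-study mRNA matching Frias-Lopez mRNA (Table 4, last column).
reads_pct = percent(2_680, GILBERT_MRNA_READS)        # reported as 0.53
clusters_pct = percent(1_942, GILBERT_MRNA_CLUSTERS)  # reported as 1.45

# Average reads per cluster: whole dataset versus the matching subset.
mean_reads_all = GILBERT_MRNA_READS / GILBERT_MRNA_CLUSTERS
mean_reads_matching = 2_680 / 1_942

print(f"{reads_pct:.2f}% of reads, {clusters_pct:.2f}% of clusters")
print(f"{mean_reads_all:.2f} reads per cluster overall, "
      f"{mean_reads_matching:.2f} in the matching subset")
```

<p>Under these assumptions the matching clusters hold fewer reads on average than clusters in the dataset as a whole, which is the pattern expected if most of the matches are singletons or small clusters.</p>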
<table-wrap id="pone-0003042-t006" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0003042.t006</object-id>
<label>Table 6</label>
<caption>
<title>Top 10 most abundant annotatable transcripts from the Frias-Lopez
<italic>et al.</italic>
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
 study.</title>
</caption>
<graphic id="pone-0003042-t006-6" xlink:href="pone.0003042.t006"></graphic>
<table frame="hsides" rules="groups" alternate-form-of="pone-0003042-t006-6">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td colspan="4" align="left" rowspan="1">PFAM</td>
<td colspan="4" align="left" rowspan="1">TIGRFAM</td>
<td colspan="4" align="left" rowspan="1">COG</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PFAM ID</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">B</td>
<td align="left" rowspan="1" colspan="1">TIGRfam ID</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">B</td>
<td align="left" rowspan="1" colspan="1">COG ID</td>
<td align="left" rowspan="1" colspan="1">Annotation</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">B</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">PF00004</td>
<td align="left" rowspan="1" colspan="1">ATPase family associated with various cellular activities (AAA)</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">TIGR00485</td>
<td align="left" rowspan="1" colspan="1">Translation elongation factor Tu</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">COG4585</td>
<td align="left" rowspan="1" colspan="1">Signal transduction histidine kinase</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF01370</td>
<td align="left" rowspan="1" colspan="1">NAD dependent epimerase/dehydratase family</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR01242</td>
<td align="left" rowspan="1" colspan="1">26S proteasome subunit P45 family</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG5116</td>
<td align="left" rowspan="1" colspan="1">26S proteasome regulatory complex component</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF03143</td>
<td align="left" rowspan="1" colspan="1">Elongation factor Tu C-terminal domain</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR02639</td>
<td align="left" rowspan="1" colspan="1">ATP-dependent Clp protease ATP-binding subunit ClpA</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0050</td>
<td align="left" rowspan="1" colspan="1">GTPases - translation elongation factors</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF00521</td>
<td align="left" rowspan="1" colspan="1">DNA gyrase/topoisomerase IV, subunit A</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR00962</td>
<td align="left" rowspan="1" colspan="1">ATP synthase F1, alpha subunit</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG1222</td>
<td align="left" rowspan="1" colspan="1">ATP-dependent 26S proteasome regulatory subunit</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF03144</td>
<td align="left" rowspan="1" colspan="1">Elongation factor Tu domain 2</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR01472</td>
<td align="left" rowspan="1" colspan="1">GDP-mannose 4,6-dehydratase</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0187</td>
<td align="left" rowspan="1" colspan="1">Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV), B subunit</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF00216</td>
<td align="left" rowspan="1" colspan="1">Bacterial DNA-binding protein</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR01017</td>
<td align="left" rowspan="1" colspan="1">Ribosomal protein S4</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0568</td>
<td align="left" rowspan="1" colspan="1">DNA-directed RNA polymerase, sigma subunit (sigma70/sigma32)</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF00101</td>
<td align="left" rowspan="1" colspan="1">Ribulose bisphosphate carboxylase, small chain</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR02521</td>
<td align="left" rowspan="1" colspan="1">Type IV pilus biogenesis/stability protein PilW</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG1089</td>
<td align="left" rowspan="1" colspan="1">GDP-D-mannose dehydratase</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF00016</td>
<td align="left" rowspan="1" colspan="1">Ribulose bisphosphate carboxylase large chain, catalytic domain</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR00038</td>
<td align="left" rowspan="1" colspan="1">translation elongation factor P</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0188</td>
<td align="left" rowspan="1" colspan="1">Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV), A subunit</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF01106</td>
<td align="left" rowspan="1" colspan="1">NifU-like domain</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR00050</td>
<td align="left" rowspan="1" colspan="1">RNA methyltransferase, TrmH family, group 1</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0206</td>
<td align="left" rowspan="1" colspan="1">Cell division GTPase</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PF07719</td>
<td align="left" rowspan="1" colspan="1">Tetratricopeptide repeat</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">TIGR00065</td>
<td align="left" rowspan="1" colspan="1">Cell division protein FtsZ</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">COG0278</td>
<td align="left" rowspan="1" colspan="1">Glutaredoxin-related protein</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="nt120">
<p>Values for the Frias-Lopez et al. study are given in column B. If a homologue was identified in the current study (column A), that too is included. Numbers in columns A and B refer to the number of sequences which were assigned this particular protein annotation. Only pORF clusters with >10 non-redundant sequences were included in this analysis.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>The proportion of protein sequences from the Frias-Lopez study that could be annotated through comparison to PFAM, TIGRfam or COG was approximately 2.5% prior to removal of rRNA sequences and 4.3% following removal. This is far lower than the 36.5% of sequences from our study that could be annotated (8.5-fold more) (
<xref ref-type="table" rid="pone-0003042-t002">
<bold>Table 2</bold>
</xref>
<bold> and </bold>
<xref ref-type="supplementary-material" rid="pone.0003042.s001">
<bold>Table S1</bold>
</xref>
). When comparing the annotation of the top 10 most abundant pORF clusters (>10 non-redundant sequences) found in the Frias-Lopez study (
<xref ref-type="table" rid="pone-0003042-t006">
<bold>Table 6</bold>
</xref>
), 6 (by PFAM), 7 (by TIGRfam) and 5 (by COG) clusters are found in both studies as abundant clusters (>100 sequences per pORF cluster). For example, the 1
<sup>st</sup>
and 2
<sup>nd</sup>
most abundant PFAM annotations for the Frias-Lopez study (PF00004) are the 4
<sup>th</sup>
and 10
<sup>th</sup>
most abundant PFAM annotations for the current study (
<xref ref-type="table" rid="pone-0003042-t003">
<bold>Table 3</bold>
</xref>
<bold> & </bold>
<xref ref-type="table" rid="pone-0003042-t006">
<bold>6</bold>
</xref>
).</p>
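<p>The counts of shared annotations and the fold difference quoted above can be re-derived from the published values with a few lines of code. This is a sketch only: the column A values are transcribed from Table 6 in row order, and the variable names are illustrative.</p>

```python
# Column A (current study) values from Table 6, in row order.
TABLE6_COLUMN_A = {
    "PFAM": [13, 4, 3, 2, 2, 1, 0, 0, 0, 0],
    "TIGRfam": [6, 6, 6, 4, 3, 2, 1, 0, 0, 0],
    "COG": [0, 0, 6, 0, 4, 4, 3, 2, 0, 0],
}

# An annotation counts as shared when the current study has at least one sequence.
shared = {db: sum(1 for n in counts if n > 0)
          for db, counts in TABLE6_COLUMN_A.items()}

# Ratio of annotated proportions: current study versus Frias-Lopez (rRNA removed).
fold = 36.5 / 4.3

print(shared)
print(round(fold, 1))
```

<p>The fold difference is taken here relative to the 4.3% obtained after rRNA removal, which is the denominator that reproduces the quoted 8.5-fold figure.</p>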
</sec>
<sec id="s3i">
<title>Summary</title>
<p>The ability to assess natural metatranscriptomes of complex microbial communities under different environmental conditions represents a significant advance in our ability to link community structure with function and DNA genotypes (sequences) with corresponding phenotypes. The approach presented here expands the available methodologies for assaying metatranscriptomes with >99% enrichment from total RNA (by removal of ribosomal RNA) and demonstrates that changes in expression of transcripts can be observed between time points. The outputs of this study include a large number of novel, highly expressed sequence clusters and confirmation that the majority of these clusters are orphaned, further demonstrating the utility of this approach for discovering novel genetic capacity
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
. The computational analyses performed in this study also demonstrate the critical importance of access to public portals, namely CAMERA
<xref ref-type="bibr" rid="pone.0003042-Seshadri1">[20]</xref>
and SEED
<xref ref-type="bibr" rid="pone.0003042-Overbeek1">[17]</xref>
,
<xref ref-type="bibr" rid="pone.0003042-Aziz1">[18]</xref>
, for the processing of such vast quantities of complex data.</p>
</sec>
</sec>
<sec sec-type="supplementary-material" id="s4">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0003042.s001">
<label>Table S1</label>
<caption>
<p>Comparison of DNA and mRNA from samples collected by Frias-Lopez et al
<xref ref-type="bibr" rid="pone.0003042-FriasLopez1">[9]</xref>
.</p>
<p>(0.04 MB RTF)</p>
</caption>
<media xlink:href="pone.0003042.s001.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0003042.s002">
<label>Table S2</label>
<caption>
<p>Information about the 85 most abundant nucleotide clusters, including size, number of sequences in cluster, distribution of abundance of mRNA and DNA sequences within each cluster and the presence or absence of those clusters for which PCR amplification from environmental DNA was performed. T1B1 refers to high CO2 from the mid-bloom; T1B6 refers to present day CO2 from the mid-bloom; T2B1 refers to high CO2 from the post-bloom; T2B6 refers to present day CO2 from the post-bloom.</p>
<p>(0.27 MB RTF)</p>
</caption>
<media xlink:href="pone.0003042.s002.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0003042.s003">
<label>Table S3</label>
<caption>
<p>Number of (A) nucleotide sequence and (C) partial ORF sequence homologues found between the eight datasets from the current study. Percentage of (B) nucleotide sequence and (D) partial ORF sequence homologues found between the eight datasets from the current study. T1B1 = Mid-Bloom, High CO2. T1B6 = Mid-Bloom, Present Day. T2B1 = Post-Bloom, High CO2. T2B6 = Post-Bloom, Present Day. pORF percentages are based on total pORFs, denoted d in
<xref ref-type="table" rid="pone-0003042-t002">Table 2</xref>
.</p>
<p>(0.10 MB RTF)</p>
</caption>
<media xlink:href="pone.0003042.s003.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank Margaret Hughes and Neil Hall from the NERC / University of Liverpool Advanced Genomics Facility, and Simon Thomas and Bonnie Laverock from Plymouth Marine Laboratory, with special acknowledgement to Ludovica Marzo at CEH for her expert help in designing the PCR primers used in the study.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0003042-DeLong1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>DeLong</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Preston</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Mincer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rich</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Hallam</surname>
<given-names>SJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Community genomics among stratified microbial assemblages in the ocean's interior.</article-title>
<source>Science</source>
<volume>311</volume>
<fpage>496</fpage>
<lpage>503</lpage>
<pub-id pub-id-type="pmid">16439655</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Rusch1">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Heidelberg</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific.</article-title>
<source>PLoS Biol</source>
<volume>5</volume>
<fpage>398</fpage>
<lpage>431</lpage>
</citation>
</ref>
<ref id="pone.0003042-Yooseph1">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yooseph</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rusch</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Halpern</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>SJ</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.</article-title>
<source>PLoS Biol</source>
<volume>5</volume>
<fpage>432</fpage>
<lpage>466</lpage>
</citation>
</ref>
<ref id="pone.0003042-Handelsman1">
<label>4</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Handelsman</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tiedje</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Alvarez-Cohen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cann</surname>
<given-names>IKO</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<source>The New Science of metagenomics: revealing the secrets of our microbial planet</source>
<publisher-loc>Washington, DC</publisher-loc>
<publisher-name>The National Academies Press</publisher-name>
</citation>
</ref>
<ref id="pone.0003042-Parro1">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parro</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Moreno-Paz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gonzalez-Toril</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Analysis of environmental transcriptomes by DNA microarrays.</article-title>
<source>Env Microbiol</source>
<volume>9</volume>
<fpage>453</fpage>
<lpage>464</lpage>
<pub-id pub-id-type="pmid">17222143</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Poretsky1">
<label>6</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Poretsky</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Bano</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Buchan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>LeCleir</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kleikemper</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>Analysis of microbial gene transcripts in environmental samples.</article-title>
<source>Appl Environ Microbiol</source>
<volume>71</volume>
<fpage>4121</fpage>
<lpage>4126</lpage>
<pub-id pub-id-type="pmid">16000831</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Leininger1">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leininger</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Urich</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Schloter</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schwark</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Archaea predominate among ammonia-oxidizing prokaryotes in soils.</article-title>
<source>Nature</source>
<volume>442</volume>
<fpage>806</fpage>
<lpage>809</lpage>
<pub-id pub-id-type="pmid">16915287</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Urich1">
<label>8</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Urich</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lanzén</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Schleper</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Simultaneous Assessment of Soil Microbial Community Structure and Function through Analysis of the Meta-Transcriptome.</article-title>
<source>PLoS ONE</source>
<volume>3</volume>
<fpage>e2527</fpage>
<comment>doi:10.1371/journal.pone.0002527</comment>
<pub-id pub-id-type="pmid">18575584</pub-id>
</citation>
</ref>
<ref id="pone.0003042-FriasLopez1">
<label>9</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Frias-Lopez</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tyson</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Coleman</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Microbial community gene expression in ocean surface waters.</article-title>
<source>Proc Natl Acad Sci USA</source>
<volume>105</volume>
<fpage>3805</fpage>
<lpage>3810</lpage>
<pub-id pub-id-type="pmid">18316740</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Gilbert1">
<label>10</label>
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cooley</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kulakova</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Potential for phosphonoacetate utilisation by marine bacteria in temperate coastal waters.</article-title>
<source>Environ Microbiol</source>
<comment>Accepted 20
<sup>th</sup>
July 2008</comment>
</citation>
</ref>
<ref id="pone.0003042-Griffiths1">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths</surname>
<given-names>RI</given-names>
</name>
<name>
<surname>Whiteley</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>O'Donnell</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>MJ</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Rapid method for coextraction of DNA and RNA from natural environments for analysis of ribosomal DNA- and rRNA-based microbial community composition.</article-title>
<source>Appl Environ Microbiol</source>
<volume>66</volume>
<fpage>5488</fpage>
<lpage>5491</lpage>
<pub-id pub-id-type="pmid">11097934</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Neufeld1">
<label>12</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Neufeld</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Schäfer</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Boden</surname>
<given-names>R</given-names>
</name>
<name>
<surname>McDonald</surname>
<given-names>IR</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>Stable-isotope probing implicates Methylophaga spp and novel Gammaproteobacteria in marine methanol and methylamine metabolism.</article-title>
<source>ISME J</source>
<volume>1</volume>
<fpage>480</fpage>
<lpage>491</lpage>
<pub-id pub-id-type="pmid">18043650</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Zhang1">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Martiny</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Reppas</surname>
<given-names>NB</given-names>
</name>
<name>
<surname>Barry</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>Malek</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Sequencing genomes from single cells by polymerase cloning.</article-title>
<source>Nature Biotech</source>
<volume>24</volume>
<fpage>680</fpage>
<lpage>686</lpage>
</citation>
</ref>
<ref id="pone.0003042-Sansone1">
<label>14</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sansone</surname>
<given-names>S-A</given-names>
</name>
<name>
<surname>Rocca-Serra</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Brandizi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Brazma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The first RSBI (ISA-TAB) workshop: “can a simple format work for complex studies?”.</article-title>
<source>OMICS</source>
<volume>12</volume>
<fpage>143</fpage>
<lpage>149</lpage>
<pub-id pub-id-type="pmid">18447634</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Field1">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Garrity</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Selengut</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>Towards a richer description of our complete collection of genomes and metagenomes: the “Minimum Information about a Genome Sequence” (MIGS) specification.</article-title>
<source>Nature Biotech</source>
<volume>26</volume>
<fpage>541</fpage>
<lpage>547</lpage>
</citation>
</ref>
<ref id="pone.0003042-Li1">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.</article-title>
<source>Bioinformatics</source>
<volume>22</volume>
<fpage>1658</fpage>
<lpage>1659</lpage>
<pub-id pub-id-type="pmid">16731699</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Overbeek1">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Overbeek</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Begley</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Choudhuri</surname>
<given-names>JV</given-names>
</name>
<name>
<surname>Chuang</surname>
<given-names>HY</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<article-title>The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.</article-title>
<source>Nucleic Acids Res</source>
<volume>33</volume>
<fpage>5691</fpage>
<lpage>5702</lpage>
</citation>
</ref>
<ref id="pone.0003042-Aziz1">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aziz</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Bartels</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>DeJongh</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Disz</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<year>2008</year>
<article-title>The RAST Server: Rapid Annotations using Subsystems Technology.</article-title>
<source>BMC Genomics</source>
<volume>9</volume>
<fpage>75</fpage>
<pub-id pub-id-type="pmid">18261238</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Wendisch1">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wendisch</surname>
<given-names>VF</given-names>
</name>
<name>
<surname>Zimmer</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Khodursky</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Peter</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Cozzarelli</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<year>2001</year>
<article-title>Isolation of Escherichia coli mRNA and comparison of expression using mRNA and total RNA on DNA microarrays.</article-title>
<source>Anal Biochem</source>
<volume>290</volume>
<fpage>205</fpage>
<lpage>213</lpage>
<pub-id pub-id-type="pmid">11237321</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Seshadri1">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seshadri</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kravitz</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Smarr</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gilna</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frazier</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>CAMERA: A Community Resource for Metagenomics.</article-title>
<source>PLoS Biol</source>
<volume>5</volume>
<fpage>e75</fpage>
<pub-id pub-id-type="pmid">17355175</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Xu1">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Average Gene Length Is Highly Conserved in Prokaryotes and Eukaryotes and Diverges Only Between the Two Kingdoms.</article-title>
<source>Mol Biol Evol</source>
<volume>23</volume>
<fpage>1107</fpage>
<lpage>1108</lpage>
<pub-id pub-id-type="pmid">16611645</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Francois1">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Francois</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Garzoni</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bento</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schrenzel</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Comparison of amplification methods for transcriptomic analyses of low abundance prokaryotic RNA sources.</article-title>
<source>J Microbiol Methods</source>
<volume>68</volume>
<fpage>385</fpage>
<lpage>391</lpage>
<pub-id pub-id-type="pmid">17112614</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Wilson1">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilson</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Feil</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Lilley</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes.</article-title>
<source>PLoS ONE</source>
<volume>2(3)</volume>
<fpage>e324</fpage>
<comment>doi:10.1371/journal.pone.0000324</comment>
<pub-id pub-id-type="pmid">17389915</pub-id>
</citation>
</ref>
<ref id="pone.0003042-Lipman1">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Souvorov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Panchenko</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Tatusova</surname>
<given-names>TA</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>The relationship of protein conservation and sequence length.</article-title>
<source>BMC Evol Biol</source>
<volume>2</volume>
<fpage>20</fpage>
<pub-id pub-id-type="pmid">12410938</pub-id>
</citation>
</ref>
</ref-list>
<fn-group>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure">
<p>
<bold>Funding: </bold>
This work is supported by a grant from the Natural Environment Research Council (NE/C507902/1) and is part of the core research program of the Plymouth Marine Laboratory, a collaborative centre of NERC. Supplementary funds for pyrosequencing were from a Science Budget award from the Biodiversity Program of the Centre for Ecology and Hydrology. The project has been partially funded with US Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN266200400042C. Funding for CAMERA was provided by the Gordon and Betty Moore Foundation. No sponsors or funding agencies were involved in the design or conduct of the study; the collection, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.</p>
</fn>
</fn-group>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000313 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000313 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2518522
   |texte=   Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:18725995" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024