Cyberinfrastructure exploration server

Metagenomic analysis: the challenge of the data bonanza

Internal identifier: 000508 (Pmc/Corpus); previous: 000507; next: 000509

Authors: Chris I. Hunter; Alex Mitchell; Philip Jones; Craig McAnulla; Sebastien Pesseat; Maxim Scheremetjew; Sarah Hunter

Source: Briefings in Bioinformatics

RBID: PMC:3504930

Abstract

Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504930
DOI: 10.1093/bib/bbs020
PubMed: 22962339
PubMed Central: 3504930

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Metagenomic analysis: the challenge of the data bonanza</title>
<author>
<name sortKey="Hunter, Chris I" sort="Hunter, Chris I" uniqKey="Hunter C" first="Chris I." last="Hunter">Chris I. Hunter</name>
</author>
<author>
<name sortKey="Mitchell, Alex" sort="Mitchell, Alex" uniqKey="Mitchell A" first="Alex" last="Mitchell">Alex Mitchell</name>
</author>
<author>
<name sortKey="Jones, Philip" sort="Jones, Philip" uniqKey="Jones P" first="Philip" last="Jones">Philip Jones</name>
</author>
<author>
<name sortKey="Mcanulla, Craig" sort="Mcanulla, Craig" uniqKey="Mcanulla C" first="Craig" last="Mcanulla">Craig McAnulla</name>
</author>
<author>
<name sortKey="Pesseat, Sebastien" sort="Pesseat, Sebastien" uniqKey="Pesseat S" first="Sebastien" last="Pesseat">Sebastien Pesseat</name>
</author>
<author>
<name sortKey="Scheremetjew, Maxim" sort="Scheremetjew, Maxim" uniqKey="Scheremetjew M" first="Maxim" last="Scheremetjew">Maxim Scheremetjew</name>
</author>
<author>
<name sortKey="Hunter, Sarah" sort="Hunter, Sarah" uniqKey="Hunter S" first="Sarah" last="Hunter">Sarah Hunter</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22962339</idno>
<idno type="pmc">3504930</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504930</idno>
<idno type="RBID">PMC:3504930</idno>
<idno type="doi">10.1093/bib/bbs020</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000508</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Metagenomic analysis: the challenge of the data bonanza</title>
<author>
<name sortKey="Hunter, Chris I" sort="Hunter, Chris I" uniqKey="Hunter C" first="Chris I." last="Hunter">Chris I. Hunter</name>
</author>
<author>
<name sortKey="Mitchell, Alex" sort="Mitchell, Alex" uniqKey="Mitchell A" first="Alex" last="Mitchell">Alex Mitchell</name>
</author>
<author>
<name sortKey="Jones, Philip" sort="Jones, Philip" uniqKey="Jones P" first="Philip" last="Jones">Philip Jones</name>
</author>
<author>
<name sortKey="Mcanulla, Craig" sort="Mcanulla, Craig" uniqKey="Mcanulla C" first="Craig" last="Mcanulla">Craig McAnulla</name>
</author>
<author>
<name sortKey="Pesseat, Sebastien" sort="Pesseat, Sebastien" uniqKey="Pesseat S" first="Sebastien" last="Pesseat">Sebastien Pesseat</name>
</author>
<author>
<name sortKey="Scheremetjew, Maxim" sort="Scheremetjew, Maxim" uniqKey="Scheremetjew M" first="Maxim" last="Scheremetjew">Maxim Scheremetjew</name>
</author>
<author>
<name sortKey="Hunter, Sarah" sort="Hunter, Sarah" uniqKey="Hunter S" first="Sarah" last="Hunter">Sarah Hunter</name>
</author>
</analytic>
<series>
<title level="j">Briefings in Bioinformatics</title>
<idno type="ISSN">1467-5463</idno>
<idno type="eISSN">1477-4054</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Wooley, Jc" uniqKey="Wooley J">JC Wooley</name>
</author>
<author>
<name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
<author>
<name sortKey="Friedberg, I" uniqKey="Friedberg I">I Friedberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gilbert, J" uniqKey="Gilbert J">J Gilbert</name>
</author>
<author>
<name sortKey="Dupont, C" uniqKey="Dupont C">C Dupont</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hildebrandt, A" uniqKey="Hildebrandt A">A Hildebrandt</name>
</author>
<author>
<name sortKey="Lacorte, S" uniqKey="Lacorte S">S Lacorte</name>
</author>
<author>
<name sortKey="Barcel, D" uniqKey="Barcel D">D Barceló</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X Zhou</name>
</author>
<author>
<name sortKey="Ren, L" uniqKey="Ren L">L Ren</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Paarmann, D" uniqKey="Paarmann D">D Paarmann</name>
</author>
<author>
<name sortKey="D Ouza, M" uniqKey="D Ouza M">M D’Souza</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brady, A" uniqKey="Brady A">A Brady</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mchardy, Ac" uniqKey="Mchardy A">AC McHardy</name>
</author>
<author>
<name sortKey="Martin, Hg" uniqKey="Martin H">HG Martín</name>
</author>
<author>
<name sortKey="Tsirigos, A" uniqKey="Tsirigos A">A Tsirigos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angiuoli, Sv" uniqKey="Angiuoli S">SV Angiuoli</name>
</author>
<author>
<name sortKey="White, Jr" uniqKey="White J">JR White</name>
</author>
<author>
<name sortKey="Matalka, M" uniqKey="Matalka M">M Matalka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hunter, S" uniqKey="Hunter S">S Hunter</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
<author>
<name sortKey="Attwood, Tk" uniqKey="Attwood T">TK Attwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
<author>
<name sortKey="Mistry, J" uniqKey="Mistry J">J Mistry</name>
</author>
<author>
<name sortKey="Tate, J" uniqKey="Tate J">J Tate</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sigrist, Cja" uniqKey="Sigrist C">CJA Sigrist</name>
</author>
<author>
<name sortKey="Cerutti, L" uniqKey="Cerutti L">L Cerutti</name>
</author>
<author>
<name sortKey="De Castro, E" uniqKey="De Castro E">E de Castro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Attwood, Tk" uniqKey="Attwood T">TK Attwood</name>
</author>
<author>
<name sortKey="Bradley, P" uniqKey="Bradley P">P Bradley</name>
</author>
<author>
<name sortKey="Flower, Dr" uniqKey="Flower D">DR Flower</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lees, J" uniqKey="Lees J">J Lees</name>
</author>
<author>
<name sortKey="Yeats, C" uniqKey="Yeats C">C Yeats</name>
</author>
<author>
<name sortKey="Redfern, O" uniqKey="Redfern O">O Redfern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Selengut, Jd" uniqKey="Selengut J">JD Selengut</name>
</author>
<author>
<name sortKey="Haft, Dh" uniqKey="Haft D">DH Haft</name>
</author>
<author>
<name sortKey="Davidsen, T" uniqKey="Davidsen T">T Davidsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gribskov, M" uniqKey="Gribskov M">M Gribskov</name>
</author>
<author>
<name sortKey="Mclachlan, Ad" uniqKey="Mclachlan A">AD McLachlan</name>
</author>
<author>
<name sortKey="Eisenberg, D" uniqKey="Eisenberg D">D Eisenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, S" uniqKey="Wu S">S Wu</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerlach, W" uniqKey="Gerlach W">W Gerlach</name>
</author>
<author>
<name sortKey="Junemann, S" uniqKey="Junemann S">S Jünemann</name>
</author>
<author>
<name sortKey="Tille, F" uniqKey="Tille F">F Tille</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lingner, T" uniqKey="Lingner T">T Lingner</name>
</author>
<author>
<name sortKey="Asshauer, Kp" uniqKey="Asshauer K">KP Asshauer</name>
</author>
<author>
<name sortKey="Schreiber, F" uniqKey="Schreiber F">F Schreiber</name>
</author>
<author>
<name sortKey="Meinicke, P" uniqKey="Meinicke P">P Meinicke</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Su, X" uniqKey="Su X">X Su</name>
</author>
<author>
<name sortKey="Xu, J" uniqKey="Xu J">J Xu</name>
</author>
<author>
<name sortKey="Ning, K" uniqKey="Ning K">K Ning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angiuoli, Sv" uniqKey="Angiuoli S">SV Angiuoli</name>
</author>
<author>
<name sortKey="Matalka, M" uniqKey="Matalka M">M Matalka</name>
</author>
<author>
<name sortKey="Gussman, G" uniqKey="Gussman G">G Gussman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilkening, J" uniqKey="Wilkening J">J Wilkening</name>
</author>
<author>
<name sortKey="Wilke, A" uniqKey="Wilke A">A Wilke</name>
</author>
<author>
<name sortKey="Desai, N" uniqKey="Desai N">N Desai</name>
</author>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gilbert, J" uniqKey="Gilbert J">J Gilbert</name>
</author>
<author>
<name sortKey="Field, D" uniqKey="Field D">D Field</name>
</author>
<author>
<name sortKey="Swift, P" uniqKey="Swift P">P Swift</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Field, D" uniqKey="Field D">D Field</name>
</author>
<author>
<name sortKey="Amaral Zettler, L" uniqKey="Amaral Zettler L">L Amaral-Zettler</name>
</author>
<author>
<name sortKey="Cochrane, G" uniqKey="Cochrane G">G Cochrane</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Brief Bioinform</journal-id>
<journal-id journal-id-type="iso-abbrev">Brief. Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bib</journal-id>
<journal-id journal-id-type="hwp">bib</journal-id>
<journal-title-group>
<journal-title>Briefings in Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1467-5463</issn>
<issn pub-type="epub">1477-4054</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22962339</article-id>
<article-id pub-id-type="pmc">3504930</article-id>
<article-id pub-id-type="doi">10.1093/bib/bbs020</article-id>
<article-id pub-id-type="publisher-id">bbs020</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Papers</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Metagenomic analysis: the challenge of the data bonanza</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Hunter</surname>
<given-names>Chris I.</given-names>
</name>
<xref ref-type="bio" rid="d34e36">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mitchell</surname>
<given-names>Alex</given-names>
</name>
<xref ref-type="bio" rid="d34e49">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jones</surname>
<given-names>Philip</given-names>
</name>
<xref ref-type="bio" rid="d34e62">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>McAnulla</surname>
<given-names>Craig</given-names>
</name>
<xref ref-type="bio" rid="d34e75">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pesseat</surname>
<given-names>Sebastien</given-names>
</name>
<xref ref-type="bio" rid="d34e88">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Scheremetjew</surname>
<given-names>Maxim</given-names>
</name>
<xref ref-type="bio" rid="d34e101">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hunter</surname>
<given-names>Sarah</given-names>
</name>
<xref ref-type="bio" rid="d34e114">*</xref>
</contrib>
</contrib-group>
<author-notes>
<corresp>Corresponding author. Chris I. Hunter, EMBL Outstation European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge, UK. Tel.:
<phone>+44 (0) 1223 494 444</phone>
; Fax:
<fax>+44 (0)1223 494 468</fax>
; E-mail:
<email>chrish@ebi.ac.uk</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>11</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>7</day>
<month>9</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>7</day>
<month>9</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<issue>6</issue>
<issue-title>Special Issue: Bioinformatics approaches and tools for metagenomic analysis</issue-title>
<fpage>743</fpage>
<lpage>746</lpage>
<history>
<date date-type="received">
<day>4</day>
<month>11</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>1</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2012. Published by Oxford University Press.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/3.0">http://creativecommons.org/licenses/by-nc/3.0</ext-link>
), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.</p>
</abstract>
<kwd-group>
<kwd>metagenomics</kwd>
<kwd>next-generation sequencing (NGS)</kwd>
<kwd>high-throughput sequencing (HTS)</kwd>
<kwd>functional analysis</kwd>
<kwd>environmental bioinformatics</kwd>
</kwd-group>
<counts>
<page-count count="4"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>METAGENOMICS: A BROAD FIELD</title>
<p>The discipline of metagenomics is the study of the genetic material present in a given environment (for a detailed review of the field, see [
<xref ref-type="bibr" rid="bbs020-B1">1</xref>
,
<xref ref-type="bibr" rid="bbs020-B2">2</xref>
]). However, the term ‘metagenomics’ applies to a very broad range of technical activities, including the collection of environmental samples [
<xref ref-type="bibr" rid="bbs020-B3">3</xref>
], the extraction of deoxyribonucleic acid (DNA)/ribonucleic acid (RNA)/protein from those samples, the ever-increasing variety of technologies used for sequencing [
<xref ref-type="bibr" rid="bbs020-B4">4</xref>
] and the subsequent analysis and interpretation of the resulting data. In this article, we briefly review current practices in metagenomic sequence analysis and describe potential future developments that may affect them.</p>
</sec>
<sec>
<title>TAXONOMIC ANALYSIS AND METAGENOMICS</title>
<p>The taxonomic classification of living things has long been a central theme in biology; this is particularly true of metagenomics. Amplicon-based taxonomic studies currently dominate the field, and, at the time of writing, more than 80% of the publicly available data sets within the MG-RAST service [
<xref ref-type="bibr" rid="bbs020-B5">5</xref>
] are taxonomic analyses of the 16S rRNA marker gene. Other phylogenetic classification approaches, such as those offered by Phymm [
<xref ref-type="bibr" rid="bbs020-B6">6</xref>
] and PhyloPythia [
<xref ref-type="bibr" rid="bbs020-B7">7</xref>
], are also being used more extensively.</p>
<p>Such analyses are highly valuable, as particular phylogenetic groupings can be associated with important functions, and the diversity of a microbial community is thought to provide an indication of the resilience of the system (i.e. its ability to carry on functioning when conditions change). However, taxonomic studies may not necessarily reflect the complex biological processes that exist in an environment, as microbial genes can move horizontally between unrelated species. Consequently, the same functional gene can be present in a variety of backgrounds. Furthermore, these approaches do not take account of intra-species diversity (where organisms may gain or lose function as they adapt to a specific environment) or situations where organisms may be actively engaged in only a subset of their functional repertoire.</p>
</sec>
<sec>
<title>FUNCTIONAL ANALYSIS OF METAGENOMIC SAMPLES</title>
<p>A complementary approach is to analyze the putative functional entities (such as protein coding sequences) within the genomic and/or transcriptomic sequences from an environmental sample. This has become an increasingly realistic proposition as high-throughput sequencing has grown in power and fallen in cost; it is now feasible to sequence a representative proportion of an entire metagenome at a reasonable price. The remaining challenge is to process the massive volumes of data produced by such approaches.</p>
<p>Analysis of putative protein coding sequences typically begins with the identification and translation of open reading frames within nucleotide sequences. A minimum size constraint is usually applied, as prediction of function for very short sequences is not reliable. Frequently, pairwise sequence alignment methods, such as BLAST [
<xref ref-type="bibr" rid="bbs020-B8">8</xref>
], are then used to infer function by searching for similarity to other sequences in a reference database.</p>
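<p>As a rough illustration of the open reading frame step described above, the following Python sketch translates a nucleotide read in all six frames and keeps the stop-to-stop peptide fragments that meet a minimum length. The function names (find_orfs, translate, reverse_complement) and the default min_aa threshold of 30 residues are illustrative assumptions, not taken from any of the pipelines discussed here. Treating every stop-to-stop stretch as a candidate is one simple convention for fragmentary reads, not a full gene predictor.</p>
<preformat>
```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_aa=30):
    """Collect stop-to-stop peptide fragments of at least min_aa residues.

    All six reading frames are scanned (three per strand). The min_aa
    default is an assumed, illustrative threshold.
    """
    dna = dna.upper()
    peptides = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # '*' marks a stop codon, so splitting yields candidate fragments.
            for fragment in protein.split("*"):
                if len(fragment) >= min_aa:
                    peptides.append(fragment)
    return peptides
```
</preformat>
<p>For example, find_orfs(read, min_aa=30) returns a list of peptide strings that could then be passed to a similarity search or a signature scan. Raising min_aa discards more of the short, unreliable fragments at the cost of sensitivity.</p>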
<p>One of the original design specifications for BLAST was to provide a tool for fast comparison of sequences. Despite having been developed over 20 years ago, it is still one of the fastest sequence comparison algorithms available. Nevertheless, the sheer volume of sequence data produced during metagenomic studies means that BLAST-based analyses represent significant bottlenecks, which are unlikely to be addressed simply by scaling up computational resources [
<xref ref-type="bibr" rid="bbs020-B9">9</xref>
].</p>
</sec>
<sec>
<title>PROTEIN SIGNATURE-BASED ANALYSES</title>
<p>An alternative protein sequence analysis approach is to use computational models, known as protein signatures, of the type housed in the InterPro [
<xref ref-type="bibr" rid="bbs020-B10">10</xref>
] consortium of databases, such as Pfam [
<xref ref-type="bibr" rid="bbs020-B11">11</xref>
], PROSITE [
<xref ref-type="bibr" rid="bbs020-B12">12</xref>
], PRINTS [
<xref ref-type="bibr" rid="bbs020-B13">13</xref>
], CATH-Gene3D [
<xref ref-type="bibr" rid="bbs020-B14">14</xref>
] and TIGRFAMs [
<xref ref-type="bibr" rid="bbs020-B15">15</xref>
]. These signatures draw on multiple sequence alignments of protein families, domains and functionally important sites. By using such alignments, protein signatures are able to model the (often few) amino acid residues that are conserved in distantly related proteins and are essential for stability and function. Identifying such residues is not possible with pairwise alignment techniques, and consequently protein signatures are usually more sensitive at detecting divergent homologs [
<xref ref-type="bibr" rid="bbs020-B16">16</xref>
,
<xref ref-type="bibr" rid="bbs020-B17">17</xref>
].</p>
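<p>To make the idea concrete, the sketch below builds a toy position-specific scoring profile from a small ungapped alignment and uses it to score a query sequence. It is a deliberately simplified stand-in, assuming a uniform background distribution, a fixed pseudocount and no gaps. The member databases named above use richer models (for example, profile hidden Markov models), and the names build_profile and best_window_score are hypothetical.</p>
<preformat>
```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length, ungapped sequences.

    Assumes a uniform background frequency for every residue, which is a
    simplification made only for this illustration.
    """
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_window_score(profile, sequence):
    """Return the best ungapped match score of the profile along the sequence."""
    width = len(profile)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the alphabet (e.g. 'X') contribute a neutral 0.0.
        total = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, total)
    return best
```
</preformat>
<p>Columns in which the aligned sequences agree receive high scores for the shared residue, so a query carrying the conserved positions scores well even when the rest of its sequence has diverged. This is the property that pairwise comparison against a single sequence cannot capture.</p>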
<p>Protein signature-based sequence analysis methods offer two further important advantages over their pairwise alignment-based counterparts. As they are built to recognize specific functional entities, such as individual protein families or particular functional domains, matches to signatures are highly accurate predictors of function. This is in contrast to pairwise alignment approaches, where the only significant matches are often to other uncharacterized sequences, meaning that no functional information can be inferred. Furthermore, recent technological advances, such as the development of the HMMER3 algorithm [
<xref ref-type="bibr" rid="bbs020-B18">18</xref>
], have led to substantial performance increases in a number of protein signature-based analysis techniques, so that they can now offer fast, as well as accurate and sensitive, alternatives to BLAST.</p>
<p>A number of metagenomic analysis pipelines already use protein signatures to predict the functional characteristics of metagenomics data sets. For example, both CAMERA [
<xref ref-type="bibr" rid="bbs020-B19">19</xref>
] and WebMGA [
<xref ref-type="bibr" rid="bbs020-B20">20</xref>
] use Pfam and TIGRFAMs alongside BLAST-based approaches for functional sequence analysis. CARMA [
<xref ref-type="bibr" rid="bbs020-B21">21</xref>
] and CoMet [
<xref ref-type="bibr" rid="bbs020-B22">22</xref>
] also draw on Pfam for their analyses.</p>
<p>EMBL-EBI's recently launched resource (
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/metagenomics">http://www.ebi.ac.uk/metagenomics</ext-link>
) uses InterPro for functional characterization of metagenomic sequences. InterPro combines different types of protein signature from multiple diverse databases, providing extensive sequence coverage and fine-grained functional analyses. It also provides additional benefits, such as the association of Gene Ontology terms [
<xref ref-type="bibr" rid="bbs020-B23">23</xref>
] with signatures and inference of potential involvement in biological pathways, further augmenting the annotation of protein sequences. InterPro’s utility is expected to grow in the future as investigations into over-represented amino acid sequences in metagenomic data lead to the
<italic>in silico</italic>
identification of novel protein families and domains, which will in turn be modeled and incorporated into the InterPro Consortium’s member databases.</p>
</sec>
<sec>
<title>COMPUTATIONAL ADVANCES IN METAGENOMIC ANALYSIS: THE NEED FOR SPEED</title>
<p>Even if protein signature-based methods are used, the time taken to analyze metagenomic data currently far exceeds the time taken to produce the sequences in the first place. It is anticipated that new paradigms, such as the use of graphics processing unit (GPU) computing and cloud computing, may help to mitigate this bottleneck in the future.</p>
<p>Promising work has already begun in this area. For example, the developers of Parallel-META [
<xref ref-type="bibr" rid="bbs020-B24">24</xref>
] have reported a 10–15-fold increase in analysis speeds using a GPU rather than a central processing unit (CPU). CloVR [
<xref ref-type="bibr" rid="bbs020-B25">25</xref>
], meanwhile, provides a virtualized machine containing multiple microbial sequence analysis pipelines, including one for metagenomics. It gives the user the option to run their analysis locally or using a commercial or academic cloud.</p>
<p>The use of GPUs and other hardware-based approaches is limited by the specialist programming required to adapt software to run on these architectures. Indeed, the number of general bioinformatics applications that can be run on GPUs is still restricted because of this. Cloud computing facilities should eventually revolutionize the way metagenomics researchers work, potentially allowing even small laboratories access to vast amounts of compute power. However, there remain some drawbacks with this approach, including the relative expense of the compute (running a fully utilized compute farm is cheaper than purchasing time on a commercial cloud [
<xref ref-type="bibr" rid="bbs020-B26">26</xref>
]) and potential security issues related to transferring data into the cloud environment.</p>
</sec>
<sec>
<title>METADATA PROVIDES CONTEXT TO ANALYSIS</title>
<p>Speed is not the only important consideration in metagenomics analysis. Critical to any metagenomic study is the extent and quality of the associated metadata, as this provides context to the experiments and allows meaningful comparisons to be made between studies.</p>
<p>This is exemplified by the Western English Channel study [
<xref ref-type="bibr" rid="bbs020-B27">27</xref>
], where multiple samples have been meaningfully compared across a large time series. The collection of detailed metadata for each sample allowed the researchers to hypothesize which factors most strongly affected the species and functional variety at that site.</p>
<p>In recognition of its importance, there has recently been a community-driven shift toward a greater degree of sample contextual metadata being archived with study data, which has been largely facilitated by the Genomic Standards Consortium (GSC) [
<xref ref-type="bibr" rid="bbs020-B28">28</xref>
]. The mission statement of the GSC is to work toward the implementation of new genomic standards for metadata and methods of capturing and exchanging that metadata. It is immensely valuable to store standards-compliant metadata and the raw sequence data they describe in public repositories, as this allows future reuse and reinterpretation of these data by other scientists. For this reason, researchers are encouraged to submit metadata and raw sequence reads to the INSDC Nucleotide Archives either directly or via the EMBL-EBI metagenomics portal.</p>
</sec>
<sec>
<title>CONCLUSION: THE NEED FOR A CONSOLIDATED APPROACH TO METAGENOMICS</title>
<p>Multiple public resources already exist that allow users to view and analyze metagenomics data; however, the field still faces several challenges. It is vital that the metagenomics service providers adopt a consistent policy toward metadata, metadata standards and user access to associated raw data, so that metagenomes can be interpreted appropriately by researchers. Despite improvements to functional analysis methods (including the adoption of protein signatures for increased search performance and the optimization of algorithms such as HMMER), the expense of compute remains a barrier to the full realization of metagenomics’ potential. It is hoped that collaboration between analysis providers will lead to better exploitation of new computing paradigms to solve some of these issues.</p>
<p>
<boxed-text id="bbs020-BOX1" position="float">
<caption>
<title>Key Points</title>
</caption>
<p>
<list list-type="bullet">
<list-item>
<p>Metagenomics has historically been dominated by the taxonomic diversity approach, but next generation sequencing is changing this, with more people beginning to investigate the functional potential of an environmental sample.</p>
</list-item>
<list-item>
<p>Protein signatures are a sensitive way to identify protein families, domains and functionally important sites within protein sequence fragments.</p>
</list-item>
<list-item>
<p>High-quality contextual data are essential to allow meaningful comparisons to be made between environmental samples.</p>
</list-item>
<list-item>
<p>The EMBL-EBI metagenomics portal has recently been launched in beta. It facilitates InterPro-driven functional analysis of metagenome sequences and combines this with a metadata-rich archive of metagenomics experiments.</p>
</list-item>
</list>
</p>
</boxed-text>
</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>The authors thank Penny Hirsch for helpful discussions about this manuscript.</p>
</ack>
<bio id="d34e36">
<p>
<bold>C</bold>
<bold>hris Hunter</bold>
is a curator and bioinformatician for the European Metagenomics portal at the European Bioinformatics Institute in Cambridge, UK, and a technically competent biologist, qualified to postdoctoral level, with over 10 years of experience in genomic and genetic research.</p>
</bio>
<bio id="d34e49">
<p>
<bold>A</bold>
<bold>lex Mitchell</bold>
is a curation coordinator for the InterPro database at the European Bioinformatics Institute in Cambridge, UK. He joined EMBL-EBI in 2011. He has a DPhil in pharmacology and has over 10 years of experience in protein sequence analysis and classification.</p>
</bio>
<bio id="d34e62">
<p>
<bold>P</bold>
<bold>hilip Jones</bold>
is the software development coordinator for the InterPro database at the European Bioinformatics Institute in Cambridge, UK. He joined the EBI in 2004, initially working on PRIDE, the Proteomics Identifications Database. He holds an MSc in Software Engineering from the Open University and has over 10 years of experience in bioinformatics software development.</p>
</bio>
<bio id="d34e75">
<p>
<bold>C</bold>
<bold>raig McAnulla</bold>
is a bioinformatician for the InterPro database at the European Bioinformatics Institute in Cambridge, UK. He has a PhD in Microbiology and several years of experience in running large-scale sequence analyses.</p>
</bio>
<bio id="d34e88">
<p>
<bold>S</bold>
<bold>ebastien Pesseat</bold>
is a web developer and graphic designer for the European Bioinformatics Institute in Cambridge, UK. He joined the EMBL-EBI in 2010 after 7 years as a consultant for the UN. He holds an MSc in digital image manipulation, web technologies and multimedia from the University of Nice Sophia-Antipolis.</p>
</bio>
<bio id="d34e101">
<p>
<bold>M</bold>
<bold>axim Scheremetjew</bold>
is a software developer for the InterPro database at the European Bioinformatics Institute in Cambridge, UK, where he has worked for almost 1 year. Prior to this, he was a bioinformatician and developer at Entelechon GmbH in Germany.</p>
</bio>
<bio id="d34e114">
<p>
<bold>S</bold>
<bold>arah Hunter</bold>
is the InterPro team leader at EMBL-EBI, a post she has held since 2007. She previously worked in the pharmaceutical and biotech industries and holds an MSc in Bioinformatics from the University of Manchester.</p>
</bio>
<ref-list>
<title>References</title>
<ref id="bbs020-B1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wooley</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Godzik</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Friedberg</surname>
<given-names>I</given-names>
</name>
</person-group>
<article-title>A primer on metagenomics</article-title>
<source>PLoS Comput Biol</source>
<year>2010</year>
<volume>6</volume>
<issue>2</issue>
<fpage>e1000667</fpage>
<pub-id pub-id-type="pmid">20195499</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dupont</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Microbial metagenomics: beyond the genome</article-title>
<source>Annu Rev Marine Sci</source>
<year>2011</year>
<volume>3</volume>
<fpage>347</fpage>
<lpage>71</lpage>
</element-citation>
</ref>
<ref id="bbs020-B3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hildebrandt</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lacorte</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Barceló</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Sampling of water, soil and sediment to trace organic pollutants at a river-basin scale</article-title>
<source>Anal Bioanal Chem</source>
<year>2006</year>
<volume>386</volume>
<issue>4</issue>
<fpage>1075</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="pmid">16721562</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The next-generation sequencing technology: a technology review and future perspective</article-title>
<source>Sci China Life Sci</source>
<year>2010</year>
<volume>53</volume>
<issue>1</issue>
<fpage>44</fpage>
<lpage>57</lpage>
<pub-id pub-id-type="pmid">20596955</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Paarmann</surname>
<given-names>D</given-names>
</name>
<name>
<surname>D’Souza</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>386</fpage>
<pub-id pub-id-type="pmid">18803844</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brady</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models</article-title>
<source>Nat Methods</source>
<year>2009</year>
<volume>6</volume>
<issue>9</issue>
<fpage>673</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">19648916</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McHardy</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Martín</surname>
<given-names>HG</given-names>
</name>
<name>
<surname>Tsirigos</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Accurate phylogenetic classification of variable-length DNA fragments</article-title>
<source>Nat Methods</source>
<year>2007</year>
<volume>4</volume>
<issue>1</issue>
<fpage>63</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="pmid">17179938</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Basic local alignment search tool</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>215</volume>
<fpage>403</fpage>
<lpage>10</lpage>
<pub-id pub-id-type="pmid">2231712</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Angiuoli</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>White</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Matalka</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing</article-title>
<source>PLoS One</source>
<year>2011</year>
<volume>6</volume>
<issue>10</issue>
<fpage>e26624</fpage>
<pub-id pub-id-type="pmid">22028928</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hunter</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Attwood</surname>
<given-names>TK</given-names>
</name>
<etal></etal>
</person-group>
<article-title>InterPro: the integrative protein signature database</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<fpage>D211</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="pmid">18940856</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Finn</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Mistry</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tate</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Pfam protein families database</article-title>
<source>Nucleic Acids Res</source>
<year>2010</year>
<volume>38</volume>
<fpage>D211</fpage>
<lpage>22</lpage>
<pub-id pub-id-type="pmid">19920124</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sigrist</surname>
<given-names>CJA</given-names>
</name>
<name>
<surname>Cerutti</surname>
<given-names>L</given-names>
</name>
<name>
<surname>de Castro</surname>
<given-names>E</given-names>
</name>
<etal></etal>
</person-group>
<article-title>PROSITE, a protein domain database for functional characterization and annotation</article-title>
<source>Nucleic Acids Res</source>
<year>2010</year>
<volume>38</volume>
<fpage>D161</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="pmid">19858104</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Attwood</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Bradley</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Flower</surname>
<given-names>DR</given-names>
</name>
<etal></etal>
</person-group>
<article-title>PRINTS and its automatic supplement, prePRINTS</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>400</fpage>
<lpage>2</lpage>
<pub-id pub-id-type="pmid">12520033</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lees</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yeats</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Redfern</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene3D: merging structure and function for a thousand genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2010</year>
<volume>38</volume>
<fpage>D296</fpage>
<lpage>300</lpage>
<pub-id pub-id-type="pmid">19906693</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Selengut</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Haft</surname>
<given-names>DH</given-names>
</name>
<name>
<surname>Davidsen</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
<article-title>TIGRFAMs and genome properties: tools for the assignment of molecular function and biological process in prokaryotic genomes</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>D260</fpage>
<lpage>4</lpage>
<pub-id pub-id-type="pmid">17151080</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gribskov</surname>
<given-names>M</given-names>
</name>
<name>
<surname>McLachlan</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Eisenberg</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Profile analysis: detection of distantly related proteins</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>1987</year>
<volume>84</volume>
<issue>13</issue>
<fpage>4355</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="pmid">3474607</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Profile hidden Markov models</article-title>
<source>Bioinformatics</source>
<year>1998</year>
<volume>14</volume>
<fpage>755</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="pmid">9918945</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
</person-group>
<article-title>Accelerated profile HMM searches</article-title>
<source>PLoS Comput Biol</source>
<year>2011</year>
<volume>7</volume>
<issue>10</issue>
<fpage>e1002195</fpage>
<pub-id pub-id-type="pmid">22039361</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Community cyberinfrastructure for advanced microbial ecology research and analysis: the CAMERA resource</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<fpage>D546</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="pmid">21045053</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>WebMGA: a customizable web server for fast metagenomic sequence analysis</article-title>
<source>BMC Genomics</source>
<year>2011</year>
<volume>12</volume>
<fpage>444</fpage>
<pub-id pub-id-type="pmid">21899761</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerlach</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Jünemann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tille</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>430</fpage>
<pub-id pub-id-type="pmid">20021646</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lingner</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Asshauer</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Meinicke</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>CoMet—a web server for comparative functional profiling of metagenomes</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<fpage>W518</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">21622656</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B23">
<label>23</label>
<element-citation publication-type="journal">
<collab>The Gene Ontology Consortium</collab>
<article-title>The Gene Ontology in 2010: extensions and refinements</article-title>
<source>Nucleic Acids Res</source>
<year>2010</year>
<volume>38</volume>
<fpage>D331</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="pmid">19920128</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ning</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Parallel-META: efficient metagenomic data analysis based on high-performance computation</article-title>
<source>BMC Systems Biology</source>
<year>2012</year>
<volume>6</volume>
<issue>Suppl 1</issue>
<fpage>S16</fpage>
<pub-id pub-id-type="pmid">23046922</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Angiuoli</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Matalka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gussman</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<fpage>356</fpage>
<pub-id pub-id-type="pmid">21878105</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B26">
<label>26</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Wilkening</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wilke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Desai</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Using clouds for metagenomics: a case study</article-title>
<conf-name>IEEE Cluster 2009</conf-name>
<conf-date>2009</conf-date>
<conf-loc>New Orleans, LA</conf-loc>
</element-citation>
</ref>
<ref id="bbs020-B27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gilbert</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Swift</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The taxonomic and functional diversity of microbes at a temperate coastal site: a ‘Multi-Omic’ study of seasonal and diel temporal variation</article-title>
<source>PLoS One</source>
<year>2010</year>
<volume>5</volume>
<issue>11</issue>
<fpage>e15545</fpage>
<pub-id pub-id-type="pmid">21124740</pub-id>
</element-citation>
</ref>
<ref id="bbs020-B28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Amaral-Zettler</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Cochrane</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The genomic standards consortium</article-title>
<source>PLoS Biol</source>
<year>2011</year>
<volume>9</volume>
<issue>6</issue>
<fpage>e1001088</fpage>
<pub-id pub-id-type="pmid">21713030</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000508 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000508 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3504930
   |texte=   Metagenomic analysis: the challenge of the data bonanza
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22962339" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024