Cyberinfrastructure Exploration Server

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit</title>
<author>
<name sortKey="Kultima, Jens Roat" sort="Kultima, Jens Roat" uniqKey="Kultima J" first="Jens Roat" last="Kultima">Jens Roat Kultima</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sunagawa, Shinichi" sort="Sunagawa, Shinichi" uniqKey="Sunagawa S" first="Shinichi" last="Sunagawa">Shinichi Sunagawa</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Junhua" sort="Li, Junhua" uniqKey="Li J" first="Junhua" last="Li">Junhua Li</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chen, Weineng" sort="Chen, Weineng" uniqKey="Chen W" first="Weineng" last="Chen">Weineng Chen</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chen, Hua" sort="Chen, Hua" uniqKey="Chen H" first="Hua" last="Chen">Hua Chen</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mende, Daniel R" sort="Mende, Daniel R" uniqKey="Mende D" first="Daniel R." last="Mende">Daniel R. Mende</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Arumugam, Manimozhiyan" sort="Arumugam, Manimozhiyan" uniqKey="Arumugam M" first="Manimozhiyan" last="Arumugam">Manimozhiyan Arumugam</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pan, Qi" sort="Pan, Qi" uniqKey="Pan Q" first="Qi" last="Pan">Qi Pan</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Liu, Binghang" sort="Liu, Binghang" uniqKey="Liu B" first="Binghang" last="Liu">Binghang Liu</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Qin, Junjie" sort="Qin, Junjie" uniqKey="Qin J" first="Junjie" last="Qin">Junjie Qin</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Jun" sort="Wang, Jun" uniqKey="Wang J" first="Jun" last="Wang">Jun Wang</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bork, Peer" sort="Bork, Peer" uniqKey="Bork P" first="Peer" last="Bork">Peer Bork</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23082188</idno>
<idno type="pmc">3474746</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3474746</idno>
<idno type="RBID">PMC:3474746</idno>
<idno type="doi">10.1371/journal.pone.0047656</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000611</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit</title>
<author>
<name sortKey="Kultima, Jens Roat" sort="Kultima, Jens Roat" uniqKey="Kultima J" first="Jens Roat" last="Kultima">Jens Roat Kultima</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sunagawa, Shinichi" sort="Sunagawa, Shinichi" uniqKey="Sunagawa S" first="Shinichi" last="Sunagawa">Shinichi Sunagawa</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Junhua" sort="Li, Junhua" uniqKey="Li J" first="Junhua" last="Li">Junhua Li</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3">
<addr-line>School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chen, Weineng" sort="Chen, Weineng" uniqKey="Chen W" first="Weineng" last="Chen">Weineng Chen</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chen, Hua" sort="Chen, Hua" uniqKey="Chen H" first="Hua" last="Chen">Hua Chen</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mende, Daniel R" sort="Mende, Daniel R" uniqKey="Mende D" first="Daniel R." last="Mende">Daniel R. Mende</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Arumugam, Manimozhiyan" sort="Arumugam, Manimozhiyan" uniqKey="Arumugam M" first="Manimozhiyan" last="Arumugam">Manimozhiyan Arumugam</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pan, Qi" sort="Pan, Qi" uniqKey="Pan Q" first="Qi" last="Pan">Qi Pan</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Liu, Binghang" sort="Liu, Binghang" uniqKey="Liu B" first="Binghang" last="Liu">Binghang Liu</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Qin, Junjie" sort="Qin, Junjie" uniqKey="Qin J" first="Junjie" last="Qin">Junjie Qin</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Jun" sort="Wang, Jun" uniqKey="Wang J" first="Jun" last="Wang">Jun Wang</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bork, Peer" sort="Bork, Peer" uniqKey="Bork P" first="Peer" last="Bork">Peer Bork</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality-control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and to predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs used in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes, resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at
<ext-link ext-link-type="uri" xlink:href="http://www.bork.embl.de/mocat/">http://www.bork.embl.de/mocat/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Goecks, J" uniqKey="Goecks J">J Goecks</name>
</author>
<author>
<name sortKey="Nekrutenko, A" uniqKey="Nekrutenko A">A Nekrutenko</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Paarmann, D" uniqKey="Paarmann D">D Paarmann</name>
</author>
<author>
<name sortKey="D Ouza, M" uniqKey="D Ouza M">M D’Souza</name>
</author>
<author>
<name sortKey="Olson, R" uniqKey="Olson R">R Olson</name>
</author>
<author>
<name sortKey="Glass, Em" uniqKey="Glass E">EM Glass</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J Chen</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Altintas, I" uniqKey="Altintas I">I Altintas</name>
</author>
<author>
<name sortKey="Lin, A" uniqKey="Lin A">A Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Markowitz, Vm" uniqKey="Markowitz V">VM Markowitz</name>
</author>
<author>
<name sortKey="Chen, I Ma" uniqKey="Chen I">I-MA Chen</name>
</author>
<author>
<name sortKey="Chu, K" uniqKey="Chu K">K Chu</name>
</author>
<author>
<name sortKey="Szeto, E" uniqKey="Szeto E">E Szeto</name>
</author>
<author>
<name sortKey="Palaniappan, K" uniqKey="Palaniappan K">K Palaniappan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Angiuoli, Sv" uniqKey="Angiuoli S">SV Angiuoli</name>
</author>
<author>
<name sortKey="Matalka, M" uniqKey="Matalka M">M Matalka</name>
</author>
<author>
<name sortKey="Gussman, A" uniqKey="Gussman A">A Gussman</name>
</author>
<author>
<name sortKey="Galens, K" uniqKey="Galens K">K Galens</name>
</author>
<author>
<name sortKey="Vangala, M" uniqKey="Vangala M">M Vangala</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author>
<name sortKey="Harrington, Ed" uniqKey="Harrington E">ED Harrington</name>
</author>
<author>
<name sortKey="Foerstner, Ku" uniqKey="Foerstner K">KU Foerstner</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huson, Dh" uniqKey="Huson D">DH Huson</name>
</author>
<author>
<name sortKey="Mitra, S" uniqKey="Mitra S">S Mitra</name>
</author>
<author>
<name sortKey="Ruscheweyh, H J" uniqKey="Ruscheweyh H">H-J Ruscheweyh</name>
</author>
<author>
<name sortKey="Weber, N" uniqKey="Weber N">N Weber</name>
</author>
<author>
<name sortKey="Schuster, Sc" uniqKey="Schuster S">SC Schuster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Raes, J" uniqKey="Raes J">J Raes</name>
</author>
<author>
<name sortKey="Arumugam, M" uniqKey="Arumugam M">M Arumugam</name>
</author>
<author>
<name sortKey="Burgdorf, Ks" uniqKey="Burgdorf K">KS Burgdorf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peterson, J" uniqKey="Peterson J">J Peterson</name>
</author>
<author>
<name sortKey="Garges, S" uniqKey="Garges S">S Garges</name>
</author>
<author>
<name sortKey="Giovanni, M" uniqKey="Giovanni M">M Giovanni</name>
</author>
<author>
<name sortKey="Mcinnes, P" uniqKey="Mcinnes P">P McInnes</name>
</author>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gilbert, Ja" uniqKey="Gilbert J">JA Gilbert</name>
</author>
<author>
<name sortKey="Meyer, F" uniqKey="Meyer F">F Meyer</name>
</author>
<author>
<name sortKey="Antonopoulos, D" uniqKey="Antonopoulos D">D Antonopoulos</name>
</author>
<author>
<name sortKey="Balaji, P" uniqKey="Balaji P">P Balaji</name>
</author>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Da" uniqKey="Simpson D">DA Simpson</name>
</author>
<author>
<name sortKey="Clark, Gr" uniqKey="Clark G">GR Clark</name>
</author>
<author>
<name sortKey="Alexander, S" uniqKey="Alexander S">S Alexander</name>
</author>
<author>
<name sortKey="Silvestri, G" uniqKey="Silvestri G">G Silvestri</name>
</author>
<author>
<name sortKey="Willoughby, Ce" uniqKey="Willoughby C">CE Willoughby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gonzalez, A" uniqKey="Gonzalez A">A Gonzalez</name>
</author>
<author>
<name sortKey="Knight, R" uniqKey="Knight R">R Knight</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mende, Dr" uniqKey="Mende D">DR Mende</name>
</author>
<author>
<name sortKey="Waller, As" uniqKey="Waller A">AS Waller</name>
</author>
<author>
<name sortKey="Sunagawa, S" uniqKey="Sunagawa S">S Sunagawa</name>
</author>
<author>
<name sortKey="J Rvelin, Ai" uniqKey="J Rvelin A">AI Järvelin</name>
</author>
<author>
<name sortKey="Chan, Mm" uniqKey="Chan M">MM Chan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cox, Mp" uniqKey="Cox M">MP Cox</name>
</author>
<author>
<name sortKey="Peterson, Da" uniqKey="Peterson D">DA Peterson</name>
</author>
<author>
<name sortKey="Biggs, Pj" uniqKey="Biggs P">PJ Biggs</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author>
<name sortKey="Oren, R" uniqKey="Oren R">R Oren</name>
</author>
<author>
<name sortKey="Ast, G" uniqKey="Ast G">G Ast</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Yu, C" uniqKey="Yu C">C Yu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lam, T W" uniqKey="Lam T">T-W Lam</name>
</author>
<author>
<name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author>
<name sortKey="Gevers, D" uniqKey="Gevers D">D Gevers</name>
</author>
<author>
<name sortKey="Earl, Am" uniqKey="Earl A">AM Earl</name>
</author>
<author>
<name sortKey="Feldgarden, M" uniqKey="Feldgarden M">M Feldgarden</name>
</author>
<author>
<name sortKey="Ward, Dv" uniqKey="Ward D">DV Ward</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Zhu, H" uniqKey="Zhu H">H Zhu</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Qian, W" uniqKey="Qian W">W Qian</name>
</author>
<author>
<name sortKey="Fang, X" uniqKey="Fang X">X Fang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ciccarelli, Fd" uniqKey="Ciccarelli F">FD Ciccarelli</name>
</author>
<author>
<name sortKey="Doerks, T" uniqKey="Doerks T">T Doerks</name>
</author>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Creevey, Cj" uniqKey="Creevey C">CJ Creevey</name>
</author>
<author>
<name sortKey="Snel, B" uniqKey="Snel B">B Snel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hyatt, D" uniqKey="Hyatt D">D Hyatt</name>
</author>
<author>
<name sortKey="Chen, G L" uniqKey="Chen G">G-L Chen</name>
</author>
<author>
<name sortKey="Locascio, Pf" uniqKey="Locascio P">PF Locascio</name>
</author>
<author>
<name sortKey="Land, Ml" uniqKey="Land M">ML Land</name>
</author>
<author>
<name sortKey="Larimer, Fw" uniqKey="Larimer F">FW Larimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, W" uniqKey="Zhu W">W Zhu</name>
</author>
<author>
<name sortKey="Lomsadze, A" uniqKey="Lomsadze A">A Lomsadze</name>
</author>
<author>
<name sortKey="Borodovsky, M" uniqKey="Borodovsky M">M Borodovsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karsenti, E" uniqKey="Karsenti E">E Karsenti</name>
</author>
<author>
<name sortKey="Acinas, Sg" uniqKey="Acinas S">SG Acinas</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author>
<name sortKey="Bowler, C" uniqKey="Bowler C">C Bowler</name>
</author>
<author>
<name sortKey="De Vargas, C" uniqKey="De Vargas C">C De Vargas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cock, Pja" uniqKey="Cock P">PJA Cock</name>
</author>
<author>
<name sortKey="Fields, Cj" uniqKey="Fields C">CJ Fields</name>
</author>
<author>
<name sortKey="Goto, N" uniqKey="Goto N">N Goto</name>
</author>
<author>
<name sortKey="Heuer, Ml" uniqKey="Heuer M">ML Heuer</name>
</author>
<author>
<name sortKey="Rice, Pm" uniqKey="Rice P">PM Rice</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">23082188</article-id>
<article-id pub-id-type="pmc">3474746</article-id>
<article-id pub-id-type="publisher-id">PONE-D-12-19380</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0047656</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Biology</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Genomics</subject>
<subj-group>
<subject>Genome Analysis Tools</subject>
<subj-group>
<subject>Gene Prediction</subject>
<subject>Sequence Assembly Tools</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Comparative Genomics</subject>
<subject>Metagenomics</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group>
<subject>Ecology</subject>
<subj-group>
<subject>Community Ecology</subject>
<subj-group>
<subject>Community Assembly</subject>
<subject>Community Structure</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Microbial Ecology</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Microbiology</subject>
<subj-group>
<subject>Microbial Ecology</subject>
</subj-group>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit</article-title>
<alt-title alt-title-type="running-head">MOCAT: Metagenomics Toolkit</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<name>
<surname>Kultima</surname>
<given-names>Jens Roat</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<name>
<surname>Sunagawa</surname>
<given-names>Shinichi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Junhua</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Weineng</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Hua</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mende</surname>
<given-names>Daniel R.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Arumugam</surname>
<given-names>Manimozhiyan</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pan</surname>
<given-names>Qi</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liu</surname>
<given-names>Binghang</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Qin</surname>
<given-names>Junjie</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Jun</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bork</surname>
<given-names>Peer</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Department of Science and Technology, BGI-Shenzhen, Shenzhen, Guangdong, China</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, Guangdong, China</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Gilbert</surname>
<given-names>Jack Anthony</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Argonne National Laboratory, United States of America</addr-line>
</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>bork@embl.de</email>
</corresp>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con">
<p>Conceived and designed the experiments: JRK SS PB. Performed the experiments: JRK SS. Analyzed the data: JRK SS. Contributed reagents/materials/analysis tools: DRM MA JL WC HC QP BL JQ JW. Wrote the paper: JRK SS PB.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>10</month>
<year>2012</year>
</pub-date>
<volume>7</volume>
<issue>10</issue>
<elocation-id>e47656</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>6</month>
<year>2012</year>
</date>
<date date-type="accepted">
<day>13</day>
<month>9</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-year>2012</copyright-year>
<copyright-holder>Kultima et al</copyright-holder>
<license>
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<p>MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality-control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and to predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs used in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes, resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at
<ext-link ext-link-type="uri" xlink:href="http://www.bork.embl.de/mocat/">http://www.bork.embl.de/mocat/</ext-link>
.</p>
</abstract>
<funding-group>
<funding-statement>This work was funded by EMBL, the European Community’s Seventh Framework Programme via the MetaHIT (HEALTH-F4-2007-201052), International Human Microbiome Standards (IHMS) (HEALTH-F4-2010-261376), and Cancerbiome (ERC Advanced Grant 268985) grants. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<page-count count="6"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>The emerging field of metagenomics has enabled researchers to study the structure, dynamics, and functionality of uncultured microbial communities. Processing the vast amounts of metagenomics data usually involves quality-controlling raw sequence reads, aligning them to reference databases, and assembling them into longer contigs prior to predicting genes. Several packages are available for processing and analyzing metagenomics data, either as web- and cloud-based services or stand-alone computational pipelines
<xref ref-type="bibr" rid="pone.0047656-Goecks1">[1]</xref>
–
<xref ref-type="bibr" rid="pone.0047656-Huson1">[7]</xref>
. However, none of them currently supports the assembly and gene prediction of metagenomics data produced by the Illumina platform.</p>
<p>As exemplified by recent clinical, large-scale, and ongoing studies (e.g., the Human and Earth Microbiome Projects), the usage of high-throughput sequencing (HTS) data can be anticipated to increase considerably in terms of both data volume and scope of application
<xref ref-type="bibr" rid="pone.0047656-Qin1">[8]</xref>
–
<xref ref-type="bibr" rid="pone.0047656-Simpson1">[11]</xref>
. Thus, there is a pressing need for applications that provide standardized methods for processing HTS data in the form of pipelines
<xref ref-type="bibr" rid="pone.0047656-Gonzalez1">[12]</xref>
to facilitate comparative downstream analyses.</p>
<p>To address these issues, we have developed MOCAT, a metagenomics assembly and gene prediction toolkit for both small- and large-scale processing of metagenomic data produced by the Illumina sequencing technology.</p>
</sec>
<sec id="s2">
<title>Results and Discussion</title>
<p>The main pipeline is divided into five major steps: (i) quality trimming and filtering of raw reads, (ii) optional mapping to remove, extract, and/or quantify reads matching a reference database, (iii) assembly, (iv) assembly revision, and (v) gene prediction (
<xref ref-type="fig" rid="pone-0047656-g001">Figure 1</xref>
). Statistics from each step are summarized into multi-sheet Excel documents, as well as queryable SQLite databases. Full details of output files and statistics produced in each processing step are given in
<xref ref-type="supplementary-material" rid="pone.0047656.s007">Table S7</xref>
.</p>
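<p>The five-step structure described above can be illustrated with a minimal Python sketch. This is not MOCAT's code: the step functions, the step names, the 30-base length cutoff, and the layout of the stats table are all hypothetical placeholders, chosen only to show how exchangeable steps can be chained while per-step counts are written to a queryable SQLite database.</p>

```python
import sqlite3

# Hypothetical stand-ins for the five stages listed in the text;
# the bodies are placeholders, not the programs MOCAT actually calls.
def quality_trim(reads):
    # Keep reads of at least 30 bases (illustrative length cutoff).
    return [r for r in reads if len(r) >= 30]

def screen(reads):
    # Optional mapping step; this placeholder passes every read through.
    return list(reads)

def assemble(reads):
    # Placeholder "assembly": every distinct read becomes one contig.
    return sorted(set(reads))

def revise(contigs):
    # Placeholder assembly revision: contigs are returned unchanged.
    return list(contigs)

def predict_genes(contigs):
    # Placeholder gene prediction: one gene label per contig.
    return [f"gene_{i}" for i, _ in enumerate(contigs, start=1)]

# Steps are (name, function) pairs, so any step can be swapped out.
STEPS = [
    ("read_trim_filter", quality_trim),
    ("screen", screen),
    ("assembly", assemble),
    ("assembly_revision", revise),
    ("gene_prediction", predict_genes),
]

def run_pipeline(reads, steps=STEPS, db_path=":memory:"):
    """Run the steps in order and log input/output counts to SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats "
        "(step TEXT, n_in INTEGER, n_out INTEGER)"
    )
    data = reads
    for name, func in steps:
        result = func(data)
        conn.execute(
            "INSERT INTO stats VALUES (?, ?, ?)",
            (name, len(data), len(result)),
        )
        data = result
    conn.commit()
    return data, conn
```

<p>Because each step is an ordinary function in a list, replacing one program with another only requires changing the corresponding entry, and the stats table can then be queried with standard SQL.</p>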
<fig id="pone-0047656-g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0047656.g001</object-id>
<label>Figure 1</label>
<caption>
<title>The MOCAT data processing pipeline.</title>
<p>Metagenomic samples are collected and sequenced. The raw sequence reads are given as input to the pipeline and are processed in modular steps, resulting in metagenome assemblies and predicted genes. Arrows extending to the right from boxes indicate input to various downstream analyses. Statistics from each step are summarized into multi-sheet Excel documents, as well as queryable SQLite databases.</p>
</caption>
<graphic xlink:href="pone.0047656.g001"></graphic>
</fig>
<p>The individual processing steps in MOCAT were benchmarked using three different data sets: 124 published human gut metagenomic samples
<xref ref-type="bibr" rid="pone.0047656-Qin1">[8]</xref>
, a mock community produced by the Human Microbiome Project (HMP) with 22 species from 19 genera
<xref ref-type="bibr" rid="pone.0047656-Peterson1">[9]</xref>
, and a simulated metagenome with 100 strains from 85 species
<xref ref-type="bibr" rid="pone.0047656-Mende1">[13]</xref>
. By using this combination of host-associated, artificial, and simulated metagenomes with different taxonomic resolution, we show that MOCAT can efficiently process a variety of metagenomic samples that differ in size (0.5–16.6 Gbp), origin, and composition, owing to new developments in each of the five major steps.</p>
<sec id="s2a">
<title>i) Quality Trimming and Filtering of Raw Reads</title>
<p>Read quality trimming and filtering can greatly improve the length and accuracy of metagenomic assemblies
<xref ref-type="bibr" rid="pone.0047656-Mende1">[13]</xref>
. Therefore, in the first processing step, raw reads below specified quality and length cutoffs are trimmed or removed using either the FastX program (
<ext-link ext-link-type="uri" xlink:href="http://hannonlab.cshl.edu/fastx_toolkit/">http://hannonlab.cshl.edu/fastx_toolkit/</ext-link>
) or the DynamicTrim algorithm in the SolexaQA package
<xref ref-type="bibr" rid="pone.0047656-Cox1">[14]</xref>
. The supported FastX program removes bases with quality scores below a user-defined threshold from the 3′ end, whereas the DynamicTrim algorithm in the SolexaQA package keeps the longest contiguous read segment in which all quality scores are above the user-defined threshold
<xref ref-type="bibr" rid="pone.0047656-Cox1">[14]</xref>
. After quality trimming and filtering our three test datasets, 57–79% of the reads remained as high quality reads (
<xref ref-type="supplementary-material" rid="pone.0047656.s001">Table S1</xref>
).</p>
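<p>The difference between the two trimming strategies can be illustrated with a minimal Python sketch. It is not the FastX or SolexaQA implementation: the function names are hypothetical, quality scores are assumed to be already decoded into integers, and a base is assumed to pass when its score is greater than or equal to the threshold.</p>
<preformat preformat-type="code">
```python
def trim_three_prime(quals, threshold):
    """Return the read length kept after dropping low-quality bases from the 3' end."""
    end = len(quals)
    # Walk back from the 3' end while the last kept base is below the threshold
    # (assumption: a score equal to the threshold passes).
    while end > 0 and threshold > quals[end - 1]:
        end -= 1
    return end


def longest_high_quality_segment(quals, threshold):
    """Return (start, end) of the longest run whose scores all reach the threshold."""
    best_start, best_end = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if threshold > q:
            # A low-quality base ends the current run; the next run starts after it.
            run_start = i + 1
        elif i + 1 - run_start > best_end - best_start:
            best_start, best_end = run_start, i + 1
    return best_start, best_end
```
</preformat>
<p>For the scores 30, 30, 10, 30, 30, 30, 5, 2 and a threshold of 20, the first function keeps the first six bases (including the low-quality base at the third position), whereas the second keeps only positions four to six.</p>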
<p>Additionally, to reduce base-composition biases that commonly occur in HTS data
<xref ref-type="bibr" rid="pone.0047656-Schwartz1">[15]</xref>
, the frequency of each base at each position over all reads is calculated, and bases that exceed two standard deviations of the average base frequency within a sample are trimmed from the 5′ end of all reads. Using our test data set of 124 published human gut microbial samples, on average, the fraction of reads that could be mapped to assemblies was 1% higher when using 5′ trimmed reads, compared to non-trimmed reads (
<xref ref-type="supplementary-material" rid="pone.0047656.s002">Table S2</xref>
).</p>
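<p>One possible reading of this 5′ trimming rule is sketched below in Python. The exact criterion used by MOCAT is not spelled out here, so the sketch rests on assumptions: the function name is hypothetical, the average and standard deviation are taken per base over all read positions, and trimming stops at the first position where no base deviates by more than two standard deviations.</p>
<preformat preformat-type="code">
```python
from statistics import mean, pstdev


def five_prime_trim_length(reads, n_sd=2.0):
    """Count leading positions whose base composition deviates from the sample average."""
    length = min(len(r) for r in reads)
    # freqs[base][pos] is the fraction of reads carrying that base at that position.
    freqs = {
        base: [sum(r[pos] == base for r in reads) / len(reads) for pos in range(length)]
        for base in "ACGT"
    }
    trim = 0
    for pos in range(length):
        # Assumption: a position is biased if any base frequency lies more than
        # n_sd standard deviations away from that base's average over all positions.
        biased = any(
            abs(freqs[b][pos] - mean(freqs[b])) > n_sd * pstdev(freqs[b])
            for b in "ACGT"
        )
        if not biased:
            break
        trim = pos + 1
    return trim
```
</preformat>
<p>The returned value is the number of bases that would be removed from the 5′ end of every read in the sample under these assumptions.</p>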
<p>MOCAT also supports the FastQC package, for evaluating raw read quality statistics (
<ext-link ext-link-type="uri" xlink:href="http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc">http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc</ext-link>
).</p>
</sec>
<sec id="s2b">
<title>ii) Mapping, and Removal or Extraction of Reads Matching a Reference Database</title>
<p>In the second step, reads can be mapped to reference sequences in order to extract or remove reads from the original data set as well as to calculate base or read coverages. For example, reads from a human fecal metagenomic sample can be mapped to the provided human genome database (hg19, Genome Reference Consortium Human Reference 37) to remove reads of human origin using SOAPaligner2
<xref ref-type="bibr" rid="pone.0047656-Li1">[16]</xref>
, or reads containing adapters used for sequencing library construction can be removed using Usearch
<xref ref-type="bibr" rid="pone.0047656-Edgar1">[17]</xref>
. Reads can also be mapped to any other custom reference database to calculate base and insert coverage of reference sequences, for example to estimate the taxonomic and/or functional composition of a sample.</p>
<p>Here, we estimated the taxonomic composition of the simulated metagenome by mapping reads to the set of original reference genomes (
<xref ref-type="supplementary-material" rid="pone.0047656.s002">Table S2</xref>
in
<xref ref-type="bibr" rid="pone.0047656-Mende1">[13]</xref>
and
<xref ref-type="supplementary-material" rid="pone.0047656.s003">Table S3</xref>
) and calculating genome size-normalized base and read coverages. The Pearson and Spearman correlation coefficients between the observed and expected composition of the simulated metagenome were 0.95 and 0.90, respectively, for both base and read counts (
<xref ref-type="fig" rid="pone-0047656-g002">Figure 2</xref>
), and only 80 out of more than 30 million reads were not aligned. However, the observed abundances of genomes with very high sequence identity may deviate from the expected abundances due to reads mapping to both the genome of origin and other highly similar genomes.</p>
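<p>The genome size normalization and the comparison with expected abundances can be sketched as follows. This is an illustrative Python example rather than the code behind the calculate_coverage command; the function names and the dictionary-based input format are assumptions.</p>
<preformat preformat-type="code">
```python
def relative_abundances(mapped_bases, genome_sizes):
    """Convert per-genome mapped base counts into size-normalized relative abundances."""
    # Dividing by genome length gives an average per-base coverage for each genome.
    coverage = {g: mapped_bases[g] / genome_sizes[g] for g in mapped_bases}
    total = sum(coverage.values())
    return {g: c / total for g, c in coverage.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```
</preformat>
<p>Without the division by genome size, a large genome would appear more abundant than a small genome present at the same cell count, simply because it yields more reads.</p>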
<fig id="pone-0047656-g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0047656.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Relative abundance of each reference genome present in the simulated metagenome.</title>
<p>The abundances observed by mapping reads to reference genomes and the expected abundances correlate with a Pearson correlation coefficient of 0.95 (base and read counts). Circles represent genomes with multiple strains from one species and squares represent genomes with only one strain within the species. All but one of the observations deviating from the diagonal are strains from the same species. These strains are either over- or underrepresented because reads are mapped to other closely related strains in addition to the strain of origin. Dashed lines highlight two examples where a high sequence similarity between strains (99.9% and 98.7% for the
<italic>Synechococcus elongatus</italic>
and
<italic>Escherichia coli</italic>
strains, respectively) can result in deviations from expected abundances.</p>
</caption>
<graphic xlink:href="pone.0047656.g002"></graphic>
</fig>
<p>When estimating taxonomic composition of the HMP mock community, reads were mapped to reference sequences of the community (
<xref ref-type="supplementary-material" rid="pone.0047656.s004">Table S4</xref>
). By first removing quality filtered and trimmed reads matching known Illumina adapter sequences (
<xref ref-type="supplementary-material" rid="pone.0047656.s005">Table S5</xref>
), the percentage of bases and reads mapping to the reference genomes increased from 94.3% to 97.3%, and 95.0% to 97.6%, respectively, indicating the usefulness of a pre-screening step. The taxonomic composition estimated here is similar to the values calculated by the HMP consortium (Pearson correlation coefficient of 0.75 and 0.83 for bases and reads mapping, respectively,
<xref ref-type="fig" rid="pone-0047656-g003">Figure 3</xref>
), and also to estimates of 16S sequences using 454 sequencing presented in (Figure S14,
<xref ref-type="bibr" rid="pone.0047656-Haas1">[18]</xref>
). Experimental errors, not applicable to estimates of computationally simulated metagenomes, may explain the lower correlation in the mock community, compared to the simulated metagenome.</p>
<fig id="pone-0047656-g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0047656.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Relative abundance of each genus present in the even HMP mock community.</title>
<p>The estimated abundances using qPCR and by mapping reads to reference genomes correlate with a Pearson correlation coefficient of 0.75 (base counts) and 0.83 (read counts).</p>
</caption>
<graphic xlink:href="pone.0047656.g003"></graphic>
</fig>
</sec>
<sec id="s2c">
<title>iii) Assembly</title>
<p>In the assembly step, a new version (1.06) of SOAPdenovo
<xref ref-type="bibr" rid="pone.0047656-Li2">[19]</xref>
is used. For paired-end sequences, the insert size of each sequencing library is estimated at run-time by mapping reads to either reference marker genes
<xref ref-type="bibr" rid="pone.0047656-Ciccarelli1">[20]</xref>
prior to assembly, or assembled contigs prior to scaffolding. Similarly, Kmer sizes used for assemblies are calculated at run-time for each individual metagenome. Empirical tests on a large number of samples show that estimating the Kmer size for each sample as the closest odd number larger than or equal to half the average read length may not yield the best possible assembly, but balances assembly throughput and accuracy.</p>
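<p>The Kmer rule stated above can be written as a short Python function. This is a sketch under the "larger than or equal to" reading of the rule, and the function name is hypothetical.</p>
<preformat preformat-type="code">
```python
import math


def estimate_kmer(avg_read_length):
    """Smallest odd integer that is at least half the average read length."""
    k = math.ceil(avg_read_length / 2)
    # An even candidate is bumped to the next odd number.
    return k if k % 2 == 1 else k + 1
```
</preformat>
<p>Under this reading, an average read length of 44 or 46 bp gives a Kmer size of 23, and an average read length of 75 bp gives 39.</p>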
<p>The accuracy of metagenomic assemblies was assessed using data from the simulated metagenome and the mock community. We used the percentage of predicted complete genes aligning to the reference sequences of origin as a proxy for correctly assembled scaftigs (contigs that were extended and linked using the paired-end information of sequencing reads). For the simulated metagenome this value was 95.2% (12,385 complete genes predicted), and for the mock community 89.3% of the complete genes aligned (1,042 complete genes predicted). The lower number of predicted complete genes in the mock community may be explained by the relatively low number of high quality reads used in the assembly for this metagenome.</p>
<p>The effect of using variable Kmer sizes rather than a fixed Kmer size in the assembly step was evaluated using the 124 gut metagenomes. Estimating Kmer sizes at run-time for each individual metagenome, rather than using a fixed Kmer size across all samples, improved the number and frequency of complete gene calls as well as overall average gene length (column 1 in
<xref ref-type="table" rid="pone-0047656-t001">Table 1</xref>
).</p>
<table-wrap id="pone-0047656-t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0047656.t001</object-id>
<label>Table 1</label>
<caption>
<title>Progressive improvement of gene prediction metrics in 124 human gut metagenomes.</title>
</caption>
<alternatives>
<graphic id="pone-0047656-t001-1" xlink:href="pone.0047656.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Quality metric</td>
<td colspan="2" align="left" rowspan="1">Improvement compared to fixed kmer = 23 (%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">No assembly revision</td>
<td align="left" rowspan="1" colspan="1">Revised assembly</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Number of complete genes</td>
<td align="left" rowspan="1" colspan="1">8.1</td>
<td align="left" rowspan="1" colspan="1">10.2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of complete genes/Mbp</td>
<td align="left" rowspan="1" colspan="1">4.6</td>
<td align="left" rowspan="1" colspan="1">18.5</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Average gene length</td>
<td align="left" rowspan="1" colspan="1">1.7</td>
<td align="left" rowspan="1" colspan="1">1.8</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="nt101">
<label></label>
<p>Gene prediction metrics are improved when using an automated Kmer size in SOAPdenovo and with assembly revision (correction of base errors, short indels, and chimeric contigs), compared to a fixed Kmer size of 23 in SOAPdenovo and no assembly revision. The Kmer size is estimated as the closest odd number greater than half the average read length for a sample. Numbers reported are the percent improvement of the respective quality metric. The calculated Kmer size for each sample is given in
<xref ref-type="supplementary-material" rid="pone.0047656.s008">Table S8</xref>
.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s2d">
<title>iv) Assembly Revision</title>
<p>In the assembly revision step, a feature independent of the utilized assembly packages, MOCAT can revise existing paired-end read assemblies by aligning the reads to assembled scaftigs using the gap-tolerant BWA aligner
<xref ref-type="bibr" rid="pone.0047656-Li3">[21]</xref>
to correct for base errors and short indels, and the fast SOAPaligner2 to resolve chimeric regions. Performing assembly revision on the 124 human fecal metagenomes further improved gene prediction metrics (column 2 in
<xref ref-type="table" rid="pone-0047656-t001">Table 1</xref>
).</p>
</sec>
<sec id="s2e">
<title>v) Gene Prediction</title>
<p>Finally, protein-coding genes in the metagenomes are predicted using either the default component Prodigal
<xref ref-type="bibr" rid="pone.0047656-Hyatt1">[22]</xref>
or MetaGeneMark
<xref ref-type="bibr" rid="pone.0047656-Zhu1">[23]</xref>
. An in-depth comparison of the gene prediction software is beyond the scope of this article. However, each program has been benchmarked by its respective authors (
<ext-link ext-link-type="uri" xlink:href="http://prodigal.ornl.gov/results.php">http://prodigal.ornl.gov/results.php</ext-link>
and
<xref ref-type="bibr" rid="pone.0047656-Zhu1">[23]</xref>
). An independent comparison determined that MetaGeneMark had a higher precision and Prodigal a higher recall rate (
<ext-link ext-link-type="uri" xlink:href="http://genome.jgi.doe.gov/programs/metagenomes/benchmarks.jsf">http://genome.jgi.doe.gov/programs/metagenomes/benchmarks.jsf</ext-link>
).</p>
</sec>
<sec id="s2f">
<title>Conclusions</title>
<p>The functionality and versatility of the pipeline have been demonstrated using an artificial mock community metagenome, a simulated metagenome with 100 strains, and 124 human gut metagenomes. Based on parameter exploration and data-driven parameter optimization at run-time, the MOCAT pipeline can process metagenomes in a standardized and automated way while improving the quality of assembly and gene prediction compared to using default parameters for the supported programs. To date, MOCAT has additionally been used to process and assemble hundreds of host-associated and ocean metagenomes within the scope of the MetaHIT
<xref ref-type="bibr" rid="pone.0047656-Qin1">[8]</xref>
and TARA Oceans projects
<xref ref-type="bibr" rid="pone.0047656-Karsenti1">[24]</xref>
.</p>
</sec>
<sec id="s2g">
<title>Implementation, Availability, and Requirements</title>
<p>MOCAT is implemented in Perl and installed by extracting the package and executing one script, which downloads the default external software used by the pipeline and sets up the software. This reduces the otherwise tedious process of downloading all the individual components, a common drawback of in-house pipelines
<xref ref-type="bibr" rid="pone.0047656-Gonzalez1">[12]</xref>
. Optional components requiring a license, such as MetaGeneMark
<xref ref-type="bibr" rid="pone.0047656-Zhu1">[23]</xref>
for gene prediction, and Usearch
<xref ref-type="bibr" rid="pone.0047656-Edgar1">[17]</xref>
for removal or extraction of reads by alignment to a FASTA-formatted sequence file, require a manual download.</p>
<p>A new project is quickly set up, requiring only single- or paired-end FastQ-formatted sequencing read files
<xref ref-type="bibr" rid="pone.0047656-Cock1">[25]</xref>
for each sample in a separate directory. A project-specific configuration file with suggested default settings lets users run all processing steps up to gene prediction without additional setup, while allowing experienced users to modify the parameters and programs used in MOCAT. All of the settings are described in the MOCAT documentation.</p>
<p>A queuing system enables processing of a large number of samples in parallel. If present, MOCAT seamlessly integrates all processing steps with the SGE and PBS queuing systems. However, if no queuing system is available, MOCAT processes samples serially on the machine on which it was executed.</p>
<p>MOCAT runs on 64-bit UNIX systems and can be freely downloaded at
<ext-link ext-link-type="uri" xlink:href="http://www.bork.embl.de/mocat/">http://www.bork.embl.de/mocat/</ext-link>
. Perl version 5.8.8 or above is required. MOCAT is also available in a Virtual Machine package, which can be used to run MOCAT on a PC or a cloud-based system. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. There are no minimum hardware requirements for the pipeline itself to run; however, requirements for analyzing metagenomic datasets vary depending on the number of samples to process in parallel and the sequencing depth of each sample. To aid in determining whether local computational resources are adequate, we provide in
<xref ref-type="supplementary-material" rid="pone.0047656.s006">Table S6</xref>
and
<xref ref-type="supplementary-material" rid="pone.0047656.s008">S8</xref>
the maximum resources required to process the datasets in this article. We recommend at least 16 GB of RAM to process smaller metagenomes and 64 GB of RAM to process medium-sized metagenomes, but these requirements may vary depending on project settings and systems used. The hard disk space requirements depend on the size and number of metagenomes to analyze, but we recommend at least 500 GB of hard disk space.</p>
</sec>
</sec>
<sec sec-type="methods" id="s3">
<title>Methods</title>
<sec id="s3a">
<title>Data Sources</title>
<p>Data for the simulated metagenome is publicly available at
<ext-link ext-link-type="uri" xlink:href="http://www.bork.embl.de/~mende/simulated_data/">http://www.bork.embl.de/~mende/simulated_data/</ext-link>
<xref ref-type="bibr" rid="pone.0047656-Mende1">[13]</xref>
. This dataset consisted of simulated paired-end raw reads and 193 reference sequences (chromosomes and plasmids) from 100 genomes used to simulate this metagenome (
<xref ref-type="supplementary-material" rid="pone.0047656.s003">Table S3</xref>
). Metagenomic data for the even HMP mock community were downloaded from
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/bioproject/48475">http://www.ncbi.nlm.nih.gov/bioproject/48475</ext-link>
, and the reference sequences were downloaded from the NCBI database (
<xref ref-type="supplementary-material" rid="pone.0047656.s004">Table S4</xref>
), with the exception of
<italic>Candida albicans</italic>
, which was downloaded from
<ext-link ext-link-type="uri" xlink:href="http://www.candidagenome.org/download/sequence/C_albicans_SC5314/Assembly21/current/">http://www.candidagenome.org/download/sequence/C_albicans_SC5314/Assembly21/current/</ext-link>
. Metadata for the mock community was downloaded from
<ext-link ext-link-type="uri" xlink:href="http://www.hmpdacc.org/HMMC/">http://www.hmpdacc.org/HMMC/</ext-link>
. Datasets for the simulated metagenome and the mock community can optionally be downloaded automatically when installing the MOCAT pipeline.</p>
<p>Raw reads for the 124 human gut microbiomes were downloaded from the EBI homepage (accession number ERA000116,
<ext-link ext-link-type="uri" xlink:href="http://ftp.sra.ebi.ac.uk/vol1/ERA000/ERA000116/fastq/">http://ftp.sra.ebi.ac.uk/vol1/ERA000/ERA000116/fastq/</ext-link>
).</p>
</sec>
<sec id="s3b">
<title>Data Processing and Software Settings</title>
<p>The three datasets were processed by the
<italic>read_trim_filter</italic>
step in MOCAT with length cutoff set to
<italic>30</italic>
and quality cutoff set to
<italic>20</italic>
, using
<italic>solexaqa</italic>
for the mock community and the simulated metagenome, and
<italic>fastx</italic>
for the 124 gut metagenomes.</p>
<p>Estimated taxonomic compositions for the simulated metagenome and the mock community were calculated in three steps. First, quality trimmed and filtered reads from the mock community were screened against a FASTA-file with Illumina adapter sequences (
<xref ref-type="supplementary-material" rid="pone.0047656.s005">Table S5</xref>
), using the
<italic>screen_fastafile</italic>
option and e-value set to
<italic>0.01</italic>
. Second, screened reads from the mock community and quality trimmed and filtered reads from the simulated metagenome were mapped and filtered against the custom-made reference databases with chromosome and plasmid sequences from the 22 mock genomes (
<xref ref-type="supplementary-material" rid="pone.0047656.s004">Table S4</xref>
) and 100 genomes from the simulated metagenome (
<xref ref-type="supplementary-material" rid="pone.0047656.s002">Table S2</xref>
in
<xref ref-type="bibr" rid="pone.0047656-Mende1">[13]</xref>
and
<xref ref-type="supplementary-material" rid="pone.0047656.s003">Table S3</xref>
), respectively. This was done by executing the
<italic>screen</italic>
and
<italic>filter</italic>
commands with length cutoff set to
<italic>30</italic>
, percentage identity set to
<italic>90</italic>
, and paired_end_filtering set to
<italic>yes</italic>
for the simulated metagenome and set to
<italic>no</italic>
for the mock community. Finally, the taxonomic composition was estimated using the
<italic>calculate_coverage</italic>
command.</p>
<p>Assembly and gene prediction, on the simulated metagenome and mock community, were performed using the
<italic>assembly</italic>
(SOAPdenovo version
<italic>1.06</italic>
) and
<italic>gene_prediction</italic>
(
<italic>MetaGeneMark</italic>
) options. Quality trimmed and filtered reads from the simulated metagenome, and adapter-screened reads from the mock community, were assembled into scaftigs
<italic>60</italic>
bp or longer. Predicted complete genes were aligned to their respective metagenomes using blastall v2.2.26
<xref ref-type="bibr" rid="pone.0047656-Altschul1">[26]</xref>
(program blastn, 95% sequence identity, alignment length ≥ 90%, and e-value 0.1) and only the best hit was selected.</p>
<p>The 124 human gut microbiomes were processed with and without 5′ trimming. 5′ trimmed reads were assembled using SOAPdenovo
<italic>1.05</italic>
, using both the Kmer determined by MOCAT and a fixed Kmer size set to 23. These assemblies were revised using SOAPdenovo
<italic>1.06</italic>
using the
<italic>assembly_revision</italic>
options, and genes were predicted, with
<italic>MetaGeneMark</italic>
as selected software, on scaftigs from both assemblies and revised assemblies. The non 5′ trimmed and 5′ trimmed reads were mapped to the assembled scaftigs using the
<italic>screen</italic>
option with length cutoff
<italic>30</italic>
and quality cutoff
<italic>15</italic>
.</p>
<p>Complete commands for processing the simulated metagenome and mock community in MOCAT are bundled with the installation of the pipeline.</p>
</sec>
</sec>
<sec sec-type="supplementary-material" id="s4">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0047656.s001">
<label>Table S1</label>
<caption>
<p>
<bold>Raw and high quality read and base statistics for the three metagenomic data sets used in this study.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s001.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s002">
<label>Table S2</label>
<caption>
<p>
<bold>Comparison of mapping rates of 5′ untrimmed and 5′ trimmed reads.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s002.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s003">
<label>Table S3</label>
<caption>
<p>
<bold>Mapping used when summarizing the estimated abundances for the simulated metagenome.</bold>
</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pone.0047656.s003.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s004">
<label>Table S4</label>
<caption>
<p>
<bold>Reference sequences to which reads from the even HMP mock community were mapped.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s004.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s005">
<label>Table S5</label>
<caption>
<p>
<bold>Aligned raw reads from the mock community to known Illumina adapters.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s005.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s006">
<label>Table S6</label>
<caption>
<p>
<bold>Maximum computational resources and processing time required for each processing step, for each of the datasets used in this article.</bold>
</p>
<p>(DOC)</p>
</caption>
<media xlink:href="pone.0047656.s006.doc" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s007">
<label>Table S7</label>
<caption>
<p>
<bold>Output files and statistics from each of the processing steps in MOCAT.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s007.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0047656.s008">
<label>Table S8</label>
<caption>
<p>
<bold>Number of raw and high quality (HQ) bases and reads, calculated Kmer size, and the computational resources (RAM and HDD) required to assemble the 124 fecal metagenomic samples.</bold>
</p>
<p>(DOCX)</p>
</caption>
<media xlink:href="pone.0047656.s008.docx" mimetype="application" mime-subtype="msword">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>We wish to thank the MetaHIT consortium and members of the Bork group, especially Siegfried Schloissnig, for fruitful discussions and code improvements.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0047656-Goecks1">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Goecks</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Nekrutenko</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
(
<year>2010</year>
)
<article-title>Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences</article-title>
.
<source>Genome biology</source>
<volume>11</volume>
:
<fpage>R86</fpage>
doi:10.1186/gb-2010-11-8-r86.
<pub-id pub-id-type="pmid">20738864</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Meyer1">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Paarmann</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>D’Souza</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Olson</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Glass</surname>
<given-names>EM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes</article-title>
.
<source>BMC bioinformatics</source>
<volume>9</volume>
:
<fpage>386</fpage>
doi:10.1186/1471-2105-9-386.
<pub-id pub-id-type="pmid">18803844</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Sun1">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sun</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Altintas</surname>
<given-names>I</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource</article-title>
.
<source>Nucleic acids research</source>
<volume>39</volume>
:
<fpage>D546</fpage>
<lpage>51</lpage>
doi:10.1093/nar/gkq1102.
<pub-id pub-id-type="pmid">21045053</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Markowitz1">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Markowitz</surname>
<given-names>VM</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>I-MA</given-names>
</name>
,
<name>
<surname>Chu</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Szeto</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Palaniappan</surname>
<given-names>K</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>IMG/M: the integrated metagenome data management and comparative analysis system</article-title>
.
<source>Nucleic acids research</source>
<volume>40</volume>
:
<fpage>D123</fpage>
<lpage>9</lpage>
doi:10.1093/nar/gkr975.
<pub-id pub-id-type="pmid">22086953</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Angiuoli1">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Angiuoli</surname>
<given-names>SV</given-names>
</name>
,
<name>
<surname>Matalka</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Gussman</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Galens</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Vangala</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing</article-title>
.
<source>BMC Bioinformatics</source>
<volume>12</volume>
:
<fpage>356</fpage>
doi:10.1186/1471-2105-12-356.
<pub-id pub-id-type="pmid">21878105</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Arumugam1">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Arumugam</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Harrington</surname>
<given-names>ED</given-names>
</name>
,
<name>
<surname>Foerstner</surname>
<given-names>KU</given-names>
</name>
,
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
(
<year>2010</year>
)
<article-title>SmashCommunity: A metagenomic annotation and analysis tool</article-title>
.
<source>Bioinformatics (Oxford, England)</source>
<volume>26</volume>
:
<fpage>2977</fpage>
<lpage>2978</lpage>
doi:10.1093/bioinformatics/btq536.</mixed-citation>
</ref>
<ref id="pone.0047656-Huson1">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Huson</surname>
<given-names>DH</given-names>
</name>
,
<name>
<surname>Mitra</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Ruscheweyh</surname>
<given-names>H-J</given-names>
</name>
,
<name>
<surname>Weber</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Schuster</surname>
<given-names>SC</given-names>
</name>
(
<year>2011</year>
)
<article-title>Integrative analysis of environmental sequences using MEGAN4</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>1552</fpage>
<lpage>1560</lpage>
doi:10.1101/gr.120618.111.
<pub-id pub-id-type="pmid">21690186</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Qin1">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Qin</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Raes</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Arumugam</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Burgdorf</surname>
<given-names>KS</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>A human gut microbial gene catalogue established by metagenomic sequencing</article-title>
.
<source>Nature</source>
<volume>464</volume>
:
<fpage>59</fpage>
<lpage>65</lpage>
doi:10.1038/nature08821.
<pub-id pub-id-type="pmid">20203603</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Peterson1">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Peterson</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Garges</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Giovanni</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>McInnes</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>The NIH Human Microbiome Project</article-title>
.
<source>Genome research</source>
<volume>19</volume>
:
<fpage>2317</fpage>
<lpage>2323</lpage>
doi:10.1101/gr.096651.109.
<pub-id pub-id-type="pmid">19819907</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Gilbert1">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gilbert</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Meyer</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Antonopoulos</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Balaji</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Meeting report: the terabase metagenomics workshop and the vision of an Earth microbiome project</article-title>
.
<source>Standards in genomic sciences</source>
<volume>3</volume>
:
<fpage>243</fpage>
<lpage>248</lpage>
doi:10.4056/sigs.1433550.
<pub-id pub-id-type="pmid">21304727</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Simpson1">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Simpson</surname>
<given-names>DA</given-names>
</name>
,
<name>
<surname>Clark</surname>
<given-names>GR</given-names>
</name>
,
<name>
<surname>Alexander</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Silvestri</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Willoughby</surname>
<given-names>CE</given-names>
</name>
(
<year>2011</year>
)
<article-title>Molecular diagnosis for heterogeneous genetic diseases with targeted high-throughput DNA sequencing applied to retinitis pigmentosa</article-title>
.
<source>Journal of medical genetics</source>
<volume>48</volume>
:
<fpage>145</fpage>
<lpage>151</lpage>
doi:10.1136/jmg.2010.083568.
<pub-id pub-id-type="pmid">21147909</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Gonzalez1">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gonzalez</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Knight</surname>
<given-names>R</given-names>
</name>
(
<year>2012</year>
)
<article-title>Advancing analytical algorithms and pipelines for billions of microbial sequences</article-title>
.
<source>Current Opinion in Biotechnology</source>
<volume>23</volume>
:
<fpage>64</fpage>
<lpage>71</lpage>
doi:10.1016/j.copbio.2011.11.028.
<pub-id pub-id-type="pmid">22172529</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Mende1">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mende</surname>
<given-names>DR</given-names>
</name>
,
<name>
<surname>Waller</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Sunagawa</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Järvelin</surname>
<given-names>AI</given-names>
</name>
,
<name>
<surname>Chan</surname>
<given-names>MM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Assessment of metagenomic assembly using simulated next generation sequencing data</article-title>
.
<source>PLoS ONE</source>
<volume>7</volume>
:
<fpage>e31386</fpage>
doi:10.1371/journal.pone.0031386.
<pub-id pub-id-type="pmid">22384016</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Cox1">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cox</surname>
<given-names>MP</given-names>
</name>
,
<name>
<surname>Peterson</surname>
<given-names>DA</given-names>
</name>
,
<name>
<surname>Biggs</surname>
<given-names>PJ</given-names>
</name>
(
<year>2010</year>
)
<article-title>SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data</article-title>
.
<source>BMC Bioinformatics</source>
<volume>11</volume>
:
<fpage>485</fpage>
doi:10.1186/1471-2105-11-485.
<pub-id pub-id-type="pmid">20875133</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Schwartz1">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Oren</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Ast</surname>
<given-names>G</given-names>
</name>
(
<year>2011</year>
)
<article-title>Detection and removal of biases in the analysis of next-generation sequencing reads</article-title>
.
<source>PLoS ONE</source>
<volume>6</volume>
:
<fpage>e16685</fpage>
doi:10.1371/journal.pone.0016685.
<pub-id pub-id-type="pmid">21304912</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Li1">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Lam</surname>
<given-names>T-W</given-names>
</name>
,
<name>
<surname>Yiu</surname>
<given-names>S-M</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>SOAP2: an improved ultrafast tool for short read alignment</article-title>
.
<source>Bioinformatics (Oxford, England)</source>
<volume>25</volume>
:
<fpage>1966</fpage>
<lpage>1967</lpage>
doi:10.1093/bioinformatics/btp336.</mixed-citation>
</ref>
<ref id="pone.0047656-Edgar1">
<label>17</label>
<mixed-citation publication-type="other">Edgar CRC (2012) USEARCH 5.1. Clinical infectious diseases: an official publication of the Infectious Diseases Society of America 54: NP. doi:10.1093/cid/cir977.</mixed-citation>
</ref>
<ref id="pone.0047656-Haas1">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
,
<name>
<surname>Gevers</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Earl</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Feldgarden</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Ward</surname>
<given-names>DV</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>494</fpage>
<lpage>504</lpage>
doi:10.1101/gr.112730.110.
<pub-id pub-id-type="pmid">21212162</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Li2">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Zhu</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Qian</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Fang</surname>
<given-names>X</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>De novo assembly of human genomes with massively parallel short read sequencing</article-title>
.
<source>Genome research</source>
<volume>20</volume>
:
<fpage>265</fpage>
<lpage>272</lpage>
doi:10.1101/gr.097261.109.
<pub-id pub-id-type="pmid">20019144</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Ciccarelli1">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ciccarelli</surname>
<given-names>FD</given-names>
</name>
,
<name>
<surname>Doerks</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>von Mering</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Creevey</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Snel</surname>
<given-names>B</given-names>
</name>
,
<etal>et al</etal>
(
<year>2006</year>
)
<article-title>Toward automatic reconstruction of a highly resolved tree of life</article-title>
.
<source>Science (New York, NY)</source>
<volume>311</volume>
:
<fpage>1283</fpage>
<lpage>1287</lpage>
doi:10.1126/science.1123061.</mixed-citation>
</ref>
<ref id="pone.0047656-Li3">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
(
<year>2009</year>
)
<article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>
.
<source>Bioinformatics (Oxford, England)</source>
<volume>25</volume>
:
<fpage>1754</fpage>
<lpage>1760</lpage>
doi:10.1093/bioinformatics/btp324.</mixed-citation>
</ref>
<ref id="pone.0047656-Hyatt1">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hyatt</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>G-L</given-names>
</name>
,
<name>
<surname>Locascio</surname>
<given-names>PF</given-names>
</name>
,
<name>
<surname>Land</surname>
<given-names>ML</given-names>
</name>
,
<name>
<surname>Larimer</surname>
<given-names>FW</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Prodigal: prokaryotic gene recognition and translation initiation site identification</article-title>
.
<source>BMC Bioinformatics</source>
<volume>11</volume>
:
<fpage>119</fpage>
doi:10.1186/1471-2105-11-119.
<pub-id pub-id-type="pmid">20211023</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Zhu1">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhu</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Lomsadze</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Borodovsky</surname>
<given-names>M</given-names>
</name>
(
<year>2010</year>
)
<article-title>Ab initio gene identification in metagenomic sequences</article-title>
.
<source>Nucleic acids research</source>
<volume>38</volume>
:
<fpage>1</fpage>
<lpage>15</lpage>
doi:10.1093/nar/gkq275.
<pub-id pub-id-type="pmid">19843612</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Karsenti1">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Karsenti</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Acinas</surname>
<given-names>SG</given-names>
</name>
,
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Bowler</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>De Vargas</surname>
<given-names>C</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>A holistic approach to marine eco-systems biology</article-title>
.
<source>PLoS biology</source>
<volume>9</volume>
:
<fpage>e1001177</fpage>
doi:10.1371/journal.pbio.1001177.
<pub-id pub-id-type="pmid">22028628</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Cock1">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cock</surname>
<given-names>PJA</given-names>
</name>
,
<name>
<surname>Fields</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Goto</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Heuer</surname>
<given-names>ML</given-names>
</name>
,
<name>
<surname>Rice</surname>
<given-names>PM</given-names>
</name>
(
<year>2010</year>
)
<article-title>The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants</article-title>
.
<source>Nucleic acids research</source>
<volume>38</volume>
:
<fpage>1767</fpage>
<lpage>1771</lpage>
doi:10.1093/nar/gkp1137.
<pub-id pub-id-type="pmid">20015970</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0047656-Altschul1">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
,
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
,
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
(
<year>1990</year>
)
<article-title>Basic local alignment search tool</article-title>
.
<source>Journal of molecular biology</source>
<volume>215</volume>
:
<fpage>403</fpage>
<lpage>410</lpage>
doi:10.1016/S0022-2836(05)80360-2.
<pub-id pub-id-type="pmid">2231712</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0006110 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0006110 | SxmlIndent | more

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024