Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data

Internal identifier: 000081 (Pmc/Checkpoint); previous: 000080; next: 000082

Authors: Mohan A. V. S. K. Katta [India]; Aamir W. Khan [India]; Dadakhalandar Doddamani [India]; Mahendar Thudi [India]; Rajeev K. Varshney [India, Australia]

RBID : PMC:4604202

Abstract

The rapid popularity and adoption of next generation sequencing (NGS) approaches have generated huge volumes of data. High-throughput platforms such as Illumina HiSeq produce terabytes of raw data that require quick processing. Quality control of the data is an important step prior to downstream analyses. To address these issues, we have developed a quality control pipeline, NGS-QCbox, that scales to process hundreds or thousands of samples. Raspberry is an in-house tool, developed in C using HTSlib (v1.2.1) (http://htslib.org), for computing read- and base-level statistics. It can be used as a stand-alone application and can process both compressed and uncompressed FASTQ files. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS data efficiently and in a high-throughput manner. The pipeline runs batches of jobs in parallel using Bpipe (https://github.com/ssadedin/bpipe) and, internally, applies fine-grained task parallelization using OpenMP. It reports read and base statistics along with genome coverage and variants in a user-friendly format. The pipeline presents a simple menu-driven interface and can be used in either quick or complete mode. In addition, in quick mode the pipeline is faster than other similar existing QC pipelines and tools. The NGS-QCbox pipeline, the Raspberry tool and associated scripts are available at https://github.com/CEG-ICRISAT/NGS-QCbox and https://github.com/CEG-ICRISAT/Raspberry for rapid quality control analysis of large-scale next generation sequencing (Illumina) data.
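To make "read/base level statistics" on compressed or uncompressed FASTQ concrete, here is a minimal Python sketch. It is not Raspberry's implementation (the abstract states that Raspberry is written in C on top of HTSlib), and the abstract does not list which statistics Raspberry reports. The function names `open_fastq` and `fastq_stats`, the chosen metrics, the Phred+33 quality offset and the Q20 threshold are all assumptions made for illustration, as is the restriction to four-line FASTQ records.

```python
import gzip
import io


def open_fastq(path):
    """Open a FASTQ file as text, detecting gzip by its magic number."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # first two bytes of every gzip stream
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_stats(handle, phred_offset=33, q_threshold=20):
    """Compute simple read- and base-level statistics.

    Assumes four-line records (header, sequence, '+', qualities)
    and Phred+33 quality encoding unless told otherwise.
    """
    reads = bases = gc = n_count = q_pass = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().strip().upper()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        reads += 1
        bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        n_count += seq.count("N")
        # Count bases whose quality score meets the threshold.
        q_pass += sum(1 for ch in qual if ord(ch) - phred_offset >= q_threshold)
    return {
        "reads": reads,
        "bases": bases,
        "mean_read_length": bases / reads if reads else 0.0,
        "gc_percent": 100.0 * gc / bases if bases else 0.0,
        "n_bases": n_count,
        "bases_ge_q": q_pass,
    }


if __name__ == "__main__":
    # Two hypothetical reads, held in memory instead of a file.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGNN\n+\n!!5I\n")
    print(fastq_stats(demo))
```

For a file on disk, the two helpers combine as `with open_fastq(path) as fh: stats = fastq_stats(fh)`. Sniffing the gzip magic bytes rather than trusting the file extension lets the same call handle both compressed and uncompressed input.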


Url:
DOI: 10.1371/journal.pone.0139868
PubMed: 26460497
PubMed Central: 4604202


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data</title>
<author>
<name sortKey="Katta, Mohan A V S K" sort="Katta, Mohan A V S K" uniqKey="Katta M" first="Mohan A. V. S. K." last="Katta">Mohan A. V. S. K. Katta</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Khan, Aamir W" sort="Khan, Aamir W" uniqKey="Khan A" first="Aamir W." last="Khan">Aamir W. Khan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Doddamani, Dadakhalandar" sort="Doddamani, Dadakhalandar" uniqKey="Doddamani D" first="Dadakhalandar" last="Doddamani">Dadakhalandar Doddamani</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Thudi, Mahendar" sort="Thudi, Mahendar" uniqKey="Thudi M" first="Mahendar" last="Thudi">Mahendar Thudi</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Varshney, Rajeev K" sort="Varshney, Rajeev K" uniqKey="Varshney R" first="Rajeev K." last="Varshney">Rajeev K. Varshney</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley</wicri:regionArea>
<wicri:noRegion>Crawley</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26460497</idno>
<idno type="pmc">4604202</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4604202</idno>
<idno type="RBID">PMC:4604202</idno>
<idno type="doi">10.1371/journal.pone.0139868</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000081</idno>
<idno type="wicri:Area/Pmc/Curation">000081</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000081</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data</title>
<author>
<name sortKey="Katta, Mohan A V S K" sort="Katta, Mohan A V S K" uniqKey="Katta M" first="Mohan A. V. S. K." last="Katta">Mohan A. V. S. K. Katta</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Khan, Aamir W" sort="Khan, Aamir W" uniqKey="Khan A" first="Aamir W." last="Khan">Aamir W. Khan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Doddamani, Dadakhalandar" sort="Doddamani, Dadakhalandar" uniqKey="Doddamani D" first="Dadakhalandar" last="Doddamani">Dadakhalandar Doddamani</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Thudi, Mahendar" sort="Thudi, Mahendar" uniqKey="Thudi M" first="Mahendar" last="Thudi">Mahendar Thudi</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Varshney, Rajeev K" sort="Varshney, Rajeev K" uniqKey="Varshney R" first="Rajeev K." last="Varshney">Rajeev K. Varshney</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</nlm:aff>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley, Australia</addr-line>
</nlm:aff>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley</wicri:regionArea>
<wicri:noRegion>Crawley</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Rapid popularity and adaptation of next generation sequencing (NGS) approaches have generated huge volumes of data. High throughput platforms like Illumina HiSeq produce terabytes of raw data that requires quick processing. Quality control of the data is an important component prior to the downstream analyses. To address these issues, we have developed a quality control pipeline, NGS-QCbox that scales up to process hundreds or thousands of samples. Raspberry is an in-house tool, developed in C language utilizing HTSlib (v1.2.1) (
<ext-link ext-link-type="uri" xlink:href="http://htslib.org/">http://htslib.org</ext-link>
), for computing read/base level statistics. It can be used as stand-alone application and can process both compressed and uncompressed FASTQ format files. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) towards analyzing raw NGS data at higher efficiency and in high-throughput manner. The pipeline implements batch processing of jobs using Bpipe (
<ext-link ext-link-type="uri" xlink:href="https://github.com/ssadedin/bpipe">https://github.com/ssadedin/bpipe</ext-link>
) in parallel and internally, a fine grained task parallelization utilizing OpenMP. It reports read and base statistics along with genome coverage and variants in a user friendly format. The pipeline developed presents a simple menu driven interface and can be used in either
<italic>quick</italic>
or
<italic>complete</italic>
mode. In addition, the pipeline in
<italic>quick</italic>
mode outperforms in speed against other similar existing QC pipeline/tools. The NGS-QCbox pipeline, Raspberry tool and associated scripts are made available at the URL
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox">https://github.com/CEG-ICRISAT/NGS-QCbox</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/Raspberry">https://github.com/CEG-ICRISAT/Raspberry</ext-link>
for rapid quality control analysis of large-scale next generation sequencing (Illumina) data.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Thudi, M" uniqKey="Thudi M">M Thudi</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Jackson, Sa" uniqKey="Jackson S">SA Jackson</name>
</author>
<author>
<name sortKey="May, Gd" uniqKey="May G">GD May</name>
</author>
<author>
<name sortKey="Varshney, Rk" uniqKey="Varshney R">RK Varshney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccouch, S" uniqKey="Mccouch S">S McCouch</name>
</author>
<author>
<name sortKey="Baute, Gj" uniqKey="Baute G">GJ Baute</name>
</author>
<author>
<name sortKey="Bradeen, J" uniqKey="Bradeen J">J Bradeen</name>
</author>
<author>
<name sortKey="Bramel, P" uniqKey="Bramel P">P Bramel</name>
</author>
<author>
<name sortKey="Bretting, Pk" uniqKey="Bretting P">PK Bretting</name>
</author>
<author>
<name sortKey="Buckler, E" uniqKey="Buckler E">E Buckler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiao, Y" uniqKey="Jiao Y">Y Jiao</name>
</author>
<author>
<name sortKey="Zhao, H" uniqKey="Zhao H">H Zhao</name>
</author>
<author>
<name sortKey="Ren, L" uniqKey="Ren L">L Ren</name>
</author>
<author>
<name sortKey="Song, W" uniqKey="Song W">W Song</name>
</author>
<author>
<name sortKey="Zeng, B" uniqKey="Zeng B">B Zeng</name>
</author>
<author>
<name sortKey="Guo, J" uniqKey="Guo J">J Guo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mace, Es" uniqKey="Mace E">ES Mace</name>
</author>
<author>
<name sortKey="Tai, S" uniqKey="Tai S">S Tai</name>
</author>
<author>
<name sortKey="Gilding, Ek" uniqKey="Gilding E">EK Gilding</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Prentis, Pj" uniqKey="Prentis P">PJ Prentis</name>
</author>
<author>
<name sortKey="Bian, L" uniqKey="Bian L">L Bian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Varshney, Rk" uniqKey="Varshney R">RK Varshney</name>
</author>
<author>
<name sortKey="Song, C" uniqKey="Song C">C Song</name>
</author>
<author>
<name sortKey="Saxena, Rk" uniqKey="Saxena R">RK Saxena</name>
</author>
<author>
<name sortKey="Azam, S" uniqKey="Azam S">S Azam</name>
</author>
<author>
<name sortKey="Yu, S" uniqKey="Yu S">S Yu</name>
</author>
<author>
<name sortKey="Sharpe, Ag" uniqKey="Sharpe A">AG Sharpe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gardner, Sn" uniqKey="Gardner S">SN Gardner</name>
</author>
<author>
<name sortKey="Hall, Bg" uniqKey="Hall B">BG Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bertels, F" uniqKey="Bertels F">F Bertels</name>
</author>
<author>
<name sortKey="Silander, Ok" uniqKey="Silander O">OK Silander</name>
</author>
<author>
<name sortKey="Pachkov, M" uniqKey="Pachkov M">M Pachkov</name>
</author>
<author>
<name sortKey="Rainey, Pb" uniqKey="Rainey P">PB Rainey</name>
</author>
<author>
<name sortKey="Van Nimwegen, E" uniqKey="Van Nimwegen E">E van Nimwegen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patel, Rk" uniqKey="Patel R">RK Patel</name>
</author>
<author>
<name sortKey="Jain, M" uniqKey="Jain M">M Jain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anders, S" uniqKey="Anders S">S Anders</name>
</author>
<author>
<name sortKey="Pyl, Pt" uniqKey="Pyl P">PT Pyl</name>
</author>
<author>
<name sortKey="Huber, W" uniqKey="Huber W">W Huber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cock, Pj" uniqKey="Cock P">PJ Cock</name>
</author>
<author>
<name sortKey="Fields, Cj" uniqKey="Fields C">CJ Fields</name>
</author>
<author>
<name sortKey="Goto, N" uniqKey="Goto N">N Goto</name>
</author>
<author>
<name sortKey="Heuer, Ml" uniqKey="Heuer M">ML Heuer</name>
</author>
<author>
<name sortKey="Rice, Pm" uniqKey="Rice P">PM Rice</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sadedin, Sp" uniqKey="Sadedin S">SP Sadedin</name>
</author>
<author>
<name sortKey="Pope, B" uniqKey="Pope B">B Pope</name>
</author>
<author>
<name sortKey="Oshlack, A" uniqKey="Oshlack A">A Oshlack</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author>
<name sortKey="Hall, Im" uniqKey="Hall I">IM Hall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmieder, R" uniqKey="Schmieder R">R Schmieder</name>
</author>
<author>
<name sortKey="Edwards, R" uniqKey="Edwards R">R Edwards</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Myers, Jr" uniqKey="Myers J">JR Myers</name>
</author>
<author>
<name sortKey="Marth, Gt" uniqKey="Marth G">GT Marth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
<author>
<name sortKey="Vaughn, M" uniqKey="Vaughn M">M Vaughn</name>
</author>
<author>
<name sortKey="Mckay, S" uniqKey="Mckay S">S McKay</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Stapleton, Ae" uniqKey="Stapleton A">AE Stapleton</name>
</author>
<author>
<name sortKey="Gessler, D" uniqKey="Gessler D">D Gessler</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26460497</article-id>
<article-id pub-id-type="pmc">4604202</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0139868</article-id>
<article-id pub-id-type="publisher-id">PONE-D-15-23733</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data</article-title>
<alt-title alt-title-type="running-head">NGS-QCbox for Quality Control of NGS Data</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Katta</surname>
<given-names>Mohan A. V. S. K.</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Khan</surname>
<given-names>Aamir W.</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Doddamani</surname>
<given-names>Dadakhalandar</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Thudi</surname>
<given-names>Mahendar</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Varshney</surname>
<given-names>Rajeev K.</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref rid="cor001" ref-type="corresp">*</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley, Australia</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Wang</surname>
<given-names>Junwen</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>The University of Hong Kong, HONG KONG</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: RKV MAVSKK. Performed the experiments: MAVSKK AWK DD. Analyzed the data: MAVSKK AWK DD RKV. Contributed reagents/materials/analysis tools: RKV MAVSKK AWK DD MT. Wrote the paper: RKV MAVSKK AWK DD MT.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>r.k.varshney@cgiar.org</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>13</day>
<month>10</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>10</issue>
<elocation-id>e0139868</elocation-id>
<history>
<date date-type="received">
<day>1</day>
<month>6</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>9</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-year>2015</copyright-year>
<copyright-holder>Katta et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0139868.pdf"></self-uri>
<abstract>
<p>Rapid popularity and adaptation of next generation sequencing (NGS) approaches have generated huge volumes of data. High throughput platforms like Illumina HiSeq produce terabytes of raw data that requires quick processing. Quality control of the data is an important component prior to the downstream analyses. To address these issues, we have developed a quality control pipeline, NGS-QCbox that scales up to process hundreds or thousands of samples. Raspberry is an in-house tool, developed in C language utilizing HTSlib (v1.2.1) (
<ext-link ext-link-type="uri" xlink:href="http://htslib.org/">http://htslib.org</ext-link>
), for computing read/base level statistics. It can be used as stand-alone application and can process both compressed and uncompressed FASTQ format files. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) towards analyzing raw NGS data at higher efficiency and in high-throughput manner. The pipeline implements batch processing of jobs using Bpipe (
<ext-link ext-link-type="uri" xlink:href="https://github.com/ssadedin/bpipe">https://github.com/ssadedin/bpipe</ext-link>
) in parallel and internally, a fine grained task parallelization utilizing OpenMP. It reports read and base statistics along with genome coverage and variants in a user friendly format. The pipeline developed presents a simple menu driven interface and can be used in either
<italic>quick</italic>
or
<italic>complete</italic>
mode. In addition, the pipeline in
<italic>quick</italic>
mode outperforms in speed against other similar existing QC pipeline/tools. The NGS-QCbox pipeline, Raspberry tool and associated scripts are made available at the URL
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox">https://github.com/CEG-ICRISAT/NGS-QCbox</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/Raspberry">https://github.com/CEG-ICRISAT/Raspberry</ext-link>
for rapid quality control analysis of large-scale next generation sequencing (Illumina) data.</p>
</abstract>
<funding-group>
<funding-statement>Authors are thankful to the CGIAR Generation Challenge Program for financial support. This work has been undertaken as part of the CGIAR Research Program on Grain Legumes. ICRISAT is a member of the CGIAR Consortium.</funding-statement>
</funding-group>
<counts>
<fig-count count="2"></fig-count>
<table-count count="2"></table-count>
<page-count count="9"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>Raspberry, the inhouse tool is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/Raspberry">https://github.com/CEG-ICRISAT/Raspberry</ext-link>
The NGS-QCbox pipeline is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox">https://github.com/CEG-ICRISAT/NGS-QCbox</ext-link>
. The simulated dataset used for benchmarking is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox/blob/master/README.md#datasets-used-for-testing">https://github.com/CEG-ICRISAT/NGS-QCbox/blob/master/README.md#datasets-used-for-testing</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>Raspberry, the inhouse tool is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/Raspberry">https://github.com/CEG-ICRISAT/Raspberry</ext-link>
The NGS-QCbox pipeline is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox">https://github.com/CEG-ICRISAT/NGS-QCbox</ext-link>
. The simulated dataset used for benchmarking is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/CEG-ICRISAT/NGS-QCbox/blob/master/README.md#datasets-used-for-testing">https://github.com/CEG-ICRISAT/NGS-QCbox/blob/master/README.md#datasets-used-for-testing</ext-link>
.</p>
</notes>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>Australie</li>
<li>Inde</li>
</country>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Katta, Mohan A V S K" sort="Katta, Mohan A V S K" uniqKey="Katta M" first="Mohan A. V. S. K." last="Katta">Mohan A. V. S. K. Katta</name>
</noRegion>
<name sortKey="Doddamani, Dadakhalandar" sort="Doddamani, Dadakhalandar" uniqKey="Doddamani D" first="Dadakhalandar" last="Doddamani">Dadakhalandar Doddamani</name>
<name sortKey="Khan, Aamir W" sort="Khan, Aamir W" uniqKey="Khan A" first="Aamir W." last="Khan">Aamir W. Khan</name>
<name sortKey="Thudi, Mahendar" sort="Thudi, Mahendar" uniqKey="Thudi M" first="Mahendar" last="Thudi">Mahendar Thudi</name>
<name sortKey="Varshney, Rajeev K" sort="Varshney, Rajeev K" uniqKey="Varshney R" first="Rajeev K." last="Varshney">Rajeev K. Varshney</name>
</country>
<country name="Australie">
<noRegion>
<name sortKey="Varshney, Rajeev K" sort="Varshney, Rajeev K" uniqKey="Varshney R" first="Rajeev K." last="Varshney">Rajeev K. Varshney</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000081 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000081 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:4604202
   |texte=   NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:26460497" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024