Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

CD-HIT: accelerated for clustering the next-generation sequencing data

Internal identifier: 000361 (Ncbi/Checkpoint); previous: 000360; next: 000362

Authors: Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li

Source: Bioinformatics, 2012

RBID : PMC:3516142

Abstract

Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.

Availability: http://cd-hit.org.

Contact: liwz@sdsc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142
DOI: 10.1093/bioinformatics/bts565
PubMed: 23060610
PubMed Central: 3516142


Affiliations:


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">CD-HIT: accelerated for clustering the next-generation sequencing data</title>
<author>
<name sortKey="Fu, Limin" sort="Fu, Limin" uniqKey="Fu L" first="Limin" last="Fu">Limin Fu</name>
</author>
<author>
<name sortKey="Niu, Beifang" sort="Niu, Beifang" uniqKey="Niu B" first="Beifang" last="Niu">Beifang Niu</name>
</author>
<author>
<name sortKey="Zhu, Zhengwei" sort="Zhu, Zhengwei" uniqKey="Zhu Z" first="Zhengwei" last="Zhu">Zhengwei Zhu</name>
</author>
<author>
<name sortKey="Wu, Sitao" sort="Wu, Sitao" uniqKey="Wu S" first="Sitao" last="Wu">Sitao Wu</name>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">23060610</idno>
<idno type="pmc">3516142</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516142</idno>
<idno type="RBID">PMC:3516142</idno>
<idno type="doi">10.1093/bioinformatics/bts565</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000498</idno>
<idno type="wicri:Area/Pmc/Curation">000498</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000436</idno>
<idno type="wicri:Area/Ncbi/Merge">000361</idno>
<idno type="wicri:Area/Ncbi/Curation">000361</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000361</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">CD-HIT: accelerated for clustering the next-generation sequencing data</title>
<author>
<name sortKey="Fu, Limin" sort="Fu, Limin" uniqKey="Fu L" first="Limin" last="Fu">Limin Fu</name>
</author>
<author>
<name sortKey="Niu, Beifang" sort="Niu, Beifang" uniqKey="Niu B" first="Beifang" last="Niu">Beifang Niu</name>
</author>
<author>
<name sortKey="Zhu, Zhengwei" sort="Zhu, Zhengwei" uniqKey="Zhu Z" first="Zhengwei" last="Zhu">Zhengwei Zhu</name>
</author>
<author>
<name sortKey="Wu, Sitao" sort="Wu, Sitao" uniqKey="Wu S" first="Sitao" last="Wu">Sitao Wu</name>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<bold>Summary:</bold>
CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.</p>
<p>
<bold>Availability:</bold>
<ext-link ext-link-type="uri" xlink:href="http://cd-hit.org">http://cd-hit.org</ext-link>
.</p>
<p>
<bold>Contact:</bold>
<email>liwz@sdsc.edu</email>
</p>
<p>
<bold>Supplementary information:</bold>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/cgi/content/full/bts565/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Godzik, A" uniqKey="Godzik A">A Godzik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loong, Snk" uniqKey="Loong S">SNK Loong</name>
</author>
<author>
<name sortKey="Mishra, Sk" uniqKey="Mishra S">SK Mishra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niu, B" uniqKey="Niu B">B Niu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, J" uniqKey="Qin J">J Qin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rubinstein, R" uniqKey="Rubinstein R">R Rubinstein</name>
</author>
<author>
<name sortKey="Fiser, A" uniqKey="Fiser A">A Fiser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S Sun</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suzek, Be" uniqKey="Suzek B">BE Suzek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Turnbaugh, Pj" uniqKey="Turnbaugh P">PJ Turnbaugh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yooseph, S" uniqKey="Yooseph S">S Yooseph</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Fu, Limin" sort="Fu, Limin" uniqKey="Fu L" first="Limin" last="Fu">Limin Fu</name>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
<name sortKey="Niu, Beifang" sort="Niu, Beifang" uniqKey="Niu B" first="Beifang" last="Niu">Beifang Niu</name>
<name sortKey="Wu, Sitao" sort="Wu, Sitao" uniqKey="Wu S" first="Sitao" last="Wu">Sitao Wu</name>
<name sortKey="Zhu, Zhengwei" sort="Zhu, Zhengwei" uniqKey="Zhu Z" first="Zhengwei" last="Zhu">Zhengwei Zhu</name>
</noCountry>
</tree>
</affiliations>
</record>
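
The record above can also be read outside Dilib with a general-purpose XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `summarize` and the abridged `FRAGMENT` string are illustrative and not part of Dilib. The fragment copies only a few header elements, because the full record as shown uses an `xlink:` prefix without a visible namespace declaration, which a strict parser may reject.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record's header (hypothetical fragment for illustration;
# attribute lists are shortened and only two of the five authors are kept).
FRAGMENT = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title xml:lang="en">CD-HIT: accelerated for clustering the next-generation sequencing data</title>
      <author><name sortKey="Fu, Limin" first="Limin" last="Fu">Limin Fu</name></author>
      <author><name sortKey="Li, Weizhong" first="Weizhong" last="Li">Weizhong Li</name></author>
    </titleStmt>
    <publicationStmt>
      <idno type="pmid">23060610</idno>
      <idno type="pmc">3516142</idno>
      <idno type="doi">10.1093/bioinformatics/bts565</idno>
    </publicationStmt>
  </fileDesc>
</teiHeader>
"""


def summarize(xml_text):
    """Return the title, author names, and identifiers found in a header."""
    root = ET.fromstring(xml_text)
    # Title of the article, taken from the title statement.
    title = root.findtext(".//titleStmt/title")
    # Display names of the authors, in document order.
    authors = [name.text for name in root.findall(".//titleStmt/author/name")]
    # Map each identifier type (pmid, pmc, doi, ...) to its value.
    ids = {e.get("type"): e.text for e in root.findall(".//publicationStmt/idno")}
    return {"title": title, "authors": authors, "ids": ids}


summary = summarize(FRAGMENT)
print(summary["title"])
print("; ".join(summary["authors"]))
print(summary["ids"]["doi"])
```

Keying the identifiers by their `type` attribute keeps the sketch independent of the order in which the `idno` elements appear in the record.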

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000361 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 000361 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Ncbi
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:3516142
   |texte=   CD-HIT: accelerated for clustering the next-generation sequencing data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i   -Sk "pubmed:23060610" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024