Cyberinfrastructure Exploration Server


Internal identifier: 000278 (Pmc/Corpus); previous: 0002779; next: 0002790

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">PeakRanger: A cloud-enabled peak caller for ChIP-seq data</title>
<author>
<name sortKey="Feng, Xin" sort="Feng, Xin" uniqKey="Feng X" first="Xin" last="Feng">Xin Feng</name>
<affiliation>
<nlm:aff id="I1">Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grossman, Robert" sort="Grossman, Robert" uniqKey="Grossman R" first="Robert" last="Grossman">Robert Grossman</name>
<affiliation>
<nlm:aff id="I4">Institute for Genomics &amp; Systems Biology, The University of Chicago, Cummings Life Sciences Center 431A, 920 East 58th Street, Chicago, IL 60637, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Lincoln" sort="Stein, Lincoln" uniqKey="Stein L" first="Lincoln" last="Stein">Lincoln Stein</name>
<affiliation>
<nlm:aff id="I1">Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21554709</idno>
<idno type="pmc">3103446</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3103446</idno>
<idno type="RBID">PMC:3103446</idno>
<idno type="doi">10.1186/1471-2105-12-139</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">000278</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">PeakRanger: A cloud-enabled peak caller for ChIP-seq data</title>
<author>
<name sortKey="Feng, Xin" sort="Feng, Xin" uniqKey="Feng X" first="Xin" last="Feng">Xin Feng</name>
<affiliation>
<nlm:aff id="I1">Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grossman, Robert" sort="Grossman, Robert" uniqKey="Grossman R" first="Robert" last="Grossman">Robert Grossman</name>
<affiliation>
<nlm:aff id="I4">Institute for Genomics &amp; Systems Biology, The University of Chicago, Cummings Life Sciences Center 431A, 920 East 58th Street, Chicago, IL 60637, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Lincoln" sort="Stein, Lincoln" uniqKey="Stein L" first="Lincoln" last="Stein">Lincoln Stein</name>
<affiliation>
<nlm:aff id="I1">Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Chromatin immunoprecipitation (ChIP) coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned to handle either punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks.</p>
</sec>
<sec>
<title>Results</title>
<p>In this paper, we introduce PeakRanger, a peak-calling software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world uses of PeakRanger, including peak calling in the modENCODE project.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Compared to the other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single-processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded from the official site of the modENCODE project:
<ext-link ext-link-type="uri" xlink:href="http://www.modencode.org/software/ranger/">http://www.modencode.org/software/ranger/</ext-link>
</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Mikkelsen, Ts" uniqKey="Mikkelsen T">TS Mikkelsen</name>
</author>
<author>
<name sortKey="Ku, M" uniqKey="Ku M">M Ku</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="Issac, B" uniqKey="Issac B">B Issac</name>
</author>
<author>
<name sortKey="Lieberman, E" uniqKey="Lieberman E">E Lieberman</name>
</author>
<author>
<name sortKey="Giannoukos, G" uniqKey="Giannoukos G">G Giannoukos</name>
</author>
<author>
<name sortKey="Alvarez, P" uniqKey="Alvarez P">P Alvarez</name>
</author>
<author>
<name sortKey="Brockman, W" uniqKey="Brockman W">W Brockman</name>
</author>
<author>
<name sortKey="Kim, Tk" uniqKey="Kim T">TK Kim</name>
</author>
<author>
<name sortKey="Koche, Rp" uniqKey="Koche R">RP Koche</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, Ds" uniqKey="Johnson D">DS Johnson</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Myers, Rm" uniqKey="Myers R">RM Myers</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robertson, G" uniqKey="Robertson G">G Robertson</name>
</author>
<author>
<name sortKey="Hirst, M" uniqKey="Hirst M">M Hirst</name>
</author>
<author>
<name sortKey="Bainbridge, M" uniqKey="Bainbridge M">M Bainbridge</name>
</author>
<author>
<name sortKey="Bilenky, M" uniqKey="Bilenky M">M Bilenky</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
<author>
<name sortKey="Zeng, T" uniqKey="Zeng T">T Zeng</name>
</author>
<author>
<name sortKey="Euskirchen, G" uniqKey="Euskirchen G">G Euskirchen</name>
</author>
<author>
<name sortKey="Bernier, B" uniqKey="Bernier B">B Bernier</name>
</author>
<author>
<name sortKey="Varhol, R" uniqKey="Varhol R">R Varhol</name>
</author>
<author>
<name sortKey="Delaney, A" uniqKey="Delaney A">A Delaney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Cuddapah, S" uniqKey="Cuddapah S">S Cuddapah</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Roh, Ty" uniqKey="Roh T">TY Roh</name>
</author>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
<author>
<name sortKey="Chepelev, I" uniqKey="Chepelev I">I Chepelev</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Park, Pj" uniqKey="Park P">PJ Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Robert, F" uniqKey="Robert F">F Robert</name>
</author>
<author>
<name sortKey="Wyrick, Jj" uniqKey="Wyrick J">JJ Wyrick</name>
</author>
<author>
<name sortKey="Aparicio, O" uniqKey="Aparicio O">O Aparicio</name>
</author>
<author>
<name sortKey="Jennings, Eg" uniqKey="Jennings E">EG Jennings</name>
</author>
<author>
<name sortKey="Simon, I" uniqKey="Simon I">I Simon</name>
</author>
<author>
<name sortKey="Zeitlinger, J" uniqKey="Zeitlinger J">J Zeitlinger</name>
</author>
<author>
<name sortKey="Schreiber, J" uniqKey="Schreiber J">J Schreiber</name>
</author>
<author>
<name sortKey="Hannett, N" uniqKey="Hannett N">N Hannett</name>
</author>
<author>
<name sortKey="Kanin, E" uniqKey="Kanin E">E Kanin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Iyer, Vr" uniqKey="Iyer V">VR Iyer</name>
</author>
<author>
<name sortKey="Horak, Ce" uniqKey="Horak C">CE Horak</name>
</author>
<author>
<name sortKey="Scafe, Cs" uniqKey="Scafe C">CS Scafe</name>
</author>
<author>
<name sortKey="Botstein, D" uniqKey="Botstein D">D Botstein</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
<author>
<name sortKey="Brown, Po" uniqKey="Brown P">PO Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pepke, S" uniqKey="Pepke S">S Pepke</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lun, D" uniqKey="Lun D">D Lun</name>
</author>
<author>
<name sortKey="Sherrid, A" uniqKey="Sherrid A">A Sherrid</name>
</author>
<author>
<name sortKey="Weiner, B" uniqKey="Weiner B">B Weiner</name>
</author>
<author>
<name sortKey="Sherman, D" uniqKey="Sherman D">D Sherman</name>
</author>
<author>
<name sortKey="Galagan, J" uniqKey="Galagan J">J Galagan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blahnik, Kr" uniqKey="Blahnik K">KR Blahnik</name>
</author>
<author>
<name sortKey="Dou, L" uniqKey="Dou L">L Dou</name>
</author>
<author>
<name sortKey="O Geen, H" uniqKey="O Geen H">H O'Geen</name>
</author>
<author>
<name sortKey="Mcphillips, T" uniqKey="Mcphillips T">T McPhillips</name>
</author>
<author>
<name sortKey="Xu, X" uniqKey="Xu X">X Xu</name>
</author>
<author>
<name sortKey="Cao, Ar" uniqKey="Cao A">AR Cao</name>
</author>
<author>
<name sortKey="Iyengar, S" uniqKey="Iyengar S">S Iyengar</name>
</author>
<author>
<name sortKey="Nicolet, Cm" uniqKey="Nicolet C">CM Nicolet</name>
</author>
<author>
<name sortKey="Lud Scher, B" uniqKey="Lud Scher B">B Ludäscher</name>
</author>
<author>
<name sortKey="Korf, I" uniqKey="Korf I">I Korf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ji, H" uniqKey="Ji H">H Ji</name>
</author>
<author>
<name sortKey="Jiang, H" uniqKey="Jiang H">H Jiang</name>
</author>
<author>
<name sortKey="Ma, W" uniqKey="Ma W">W Ma</name>
</author>
<author>
<name sortKey="Johnson, D" uniqKey="Johnson D">D Johnson</name>
</author>
<author>
<name sortKey="Myers, R" uniqKey="Myers R">R Myers</name>
</author>
<author>
<name sortKey="Wong, W" uniqKey="Wong W">W Wong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jothi, R" uniqKey="Jothi R">R Jothi</name>
</author>
<author>
<name sortKey="Cuddapah, S" uniqKey="Cuddapah S">S Cuddapah</name>
</author>
<author>
<name sortKey="Barski, A" uniqKey="Barski A">A Barski</name>
</author>
<author>
<name sortKey="Cui, K" uniqKey="Cui K">K Cui</name>
</author>
<author>
<name sortKey="Zhao, K" uniqKey="Zhao K">K Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zang, Cz" uniqKey="Zang C">CZ Zang</name>
</author>
<author>
<name sortKey="Schones, De" uniqKey="Schones D">DE Schones</name>
</author>
<author>
<name sortKey="Zeng, C" uniqKey="Zeng C">C Zeng</name>
</author>
<author>
<name sortKey="Cui, Kr" uniqKey="Cui K">KR Cui</name>
</author>
<author>
<name sortKey="Zhao, Kj" uniqKey="Zhao K">KJ Zhao</name>
</author>
<author>
<name sortKey="Peng, Wq" uniqKey="Peng W">WQ Peng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fejes, A" uniqKey="Fejes A">A Fejes</name>
</author>
<author>
<name sortKey="Robertson, G" uniqKey="Robertson G">G Robertson</name>
</author>
<author>
<name sortKey="Bilenky, M" uniqKey="Bilenky M">M Bilenky</name>
</author>
<author>
<name sortKey="Varhol, R" uniqKey="Varhol R">R Varhol</name>
</author>
<author>
<name sortKey="Bainbridge, M" uniqKey="Bainbridge M">M Bainbridge</name>
</author>
<author>
<name sortKey="Jones, S" uniqKey="Jones S">S Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boyle, Ap" uniqKey="Boyle A">AP Boyle</name>
</author>
<author>
<name sortKey="Guinney, J" uniqKey="Guinney J">J Guinney</name>
</author>
<author>
<name sortKey="Crawford, Ge" uniqKey="Crawford G">GE Crawford</name>
</author>
<author>
<name sortKey="Furey, Ts" uniqKey="Furey T">TS Furey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tuteja, G" uniqKey="Tuteja G">G Tuteja</name>
</author>
<author>
<name sortKey="White, P" uniqKey="White P">P White</name>
</author>
<author>
<name sortKey="Schug, J" uniqKey="Schug J">J Schug</name>
</author>
<author>
<name sortKey="Kaestner, Kh" uniqKey="Kaestner K">KH Kaestner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Liu, T" uniqKey="Liu T">T Liu</name>
</author>
<author>
<name sortKey="Meyer, C" uniqKey="Meyer C">C Meyer</name>
</author>
<author>
<name sortKey="Eeckhoute, J" uniqKey="Eeckhoute J">J Eeckhoute</name>
</author>
<author>
<name sortKey="Johnson, D" uniqKey="Johnson D">D Johnson</name>
</author>
<author>
<name sortKey="Bernstein, B" uniqKey="Bernstein B">B Bernstein</name>
</author>
<author>
<name sortKey="Nussbaum, C" uniqKey="Nussbaum C">C Nussbaum</name>
</author>
<author>
<name sortKey="Myers, R" uniqKey="Myers R">R Myers</name>
</author>
<author>
<name sortKey="Brown, M" uniqKey="Brown M">M Brown</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rozowsky, J" uniqKey="Rozowsky J">J Rozowsky</name>
</author>
<author>
<name sortKey="Euskirchen, G" uniqKey="Euskirchen G">G Euskirchen</name>
</author>
<author>
<name sortKey="Auerbach, R" uniqKey="Auerbach R">R Auerbach</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Gibson, T" uniqKey="Gibson T">T Gibson</name>
</author>
<author>
<name sortKey="Bjornson, R" uniqKey="Bjornson R">R Bjornson</name>
</author>
<author>
<name sortKey="Carriero, N" uniqKey="Carriero N">N Carriero</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Valouev, A" uniqKey="Valouev A">A Valouev</name>
</author>
<author>
<name sortKey="Johnson, D" uniqKey="Johnson D">D Johnson</name>
</author>
<author>
<name sortKey="Sundquist, A" uniqKey="Sundquist A">A Sundquist</name>
</author>
<author>
<name sortKey="Medina, C" uniqKey="Medina C">C Medina</name>
</author>
<author>
<name sortKey="Anton, E" uniqKey="Anton E">E Anton</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
<author>
<name sortKey="Myers, R" uniqKey="Myers R">R Myers</name>
</author>
<author>
<name sortKey="Sidow, A" uniqKey="Sidow A">A Sidow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kharchenko, P" uniqKey="Kharchenko P">P Kharchenko</name>
</author>
<author>
<name sortKey="Tolstorukov, M" uniqKey="Tolstorukov M">M Tolstorukov</name>
</author>
<author>
<name sortKey="Park, P" uniqKey="Park P">P Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nix, D" uniqKey="Nix D">D Nix</name>
</author>
<author>
<name sortKey="Courdy, S" uniqKey="Courdy S">S Courdy</name>
</author>
<author>
<name sortKey="Boucher, K" uniqKey="Boucher K">K Boucher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
<author>
<name sortKey="Papachristoudis, G" uniqKey="Papachristoudis G">G Papachristoudis</name>
</author>
<author>
<name sortKey="Altshuler, Rc" uniqKey="Altshuler R">RC Altshuler</name>
</author>
<author>
<name sortKey="Gerber, Gk" uniqKey="Gerber G">GK Gerber</name>
</author>
<author>
<name sortKey="Jaakkola, Ts" uniqKey="Jaakkola T">TS Jaakkola</name>
</author>
<author>
<name sortKey="Gifford, Dk" uniqKey="Gifford D">DK Gifford</name>
</author>
<author>
<name sortKey="Mahony, S" uniqKey="Mahony S">S Mahony</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Williams, Ba" uniqKey="Williams B">BA Williams</name>
</author>
<author>
<name sortKey="Mccue, K" uniqKey="Mccue K">K McCue</name>
</author>
<author>
<name sortKey="Schaeffer, L" uniqKey="Schaeffer L">L Schaeffer</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qin, Z" uniqKey="Qin Z">Z Qin</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Shen, J" uniqKey="Shen J">J Shen</name>
</author>
<author>
<name sortKey="Maher, C" uniqKey="Maher C">C Maher</name>
</author>
<author>
<name sortKey="Hu, M" uniqKey="Hu M">M Hu</name>
</author>
<author>
<name sortKey="Kalyana Sundaram, S" uniqKey="Kalyana Sundaram S">S Kalyana-Sundaram</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Chinnaiyan, A" uniqKey="Chinnaiyan A">A Chinnaiyan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilbanks, Eg" uniqKey="Wilbanks E">EG Wilbanks</name>
</author>
<author>
<name sortKey="Facciotti, Mt" uniqKey="Facciotti M">MT Facciotti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Gribskov, M" uniqKey="Gribskov M">M Gribskov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Celniker, Se" uniqKey="Celniker S">SE Celniker</name>
</author>
<author>
<name sortKey="Dillon, Lal" uniqKey="Dillon L">LAL Dillon</name>
</author>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
<author>
<name sortKey="Gunsalus, Kc" uniqKey="Gunsalus K">KC Gunsalus</name>
</author>
<author>
<name sortKey="Henikoff, S" uniqKey="Henikoff S">S Henikoff</name>
</author>
<author>
<name sortKey="Karpen, Gh" uniqKey="Karpen G">GH Karpen</name>
</author>
<author>
<name sortKey="Kellis, M" uniqKey="Kellis M">M Kellis</name>
</author>
<author>
<name sortKey="Lai, Ec" uniqKey="Lai E">EC Lai</name>
</author>
<author>
<name sortKey="Lieb, Jd" uniqKey="Lieb J">JD Lieb</name>
</author>
<author>
<name sortKey="Macalpine, Dm" uniqKey="Macalpine D">DM MacAlpine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
<author>
<name sortKey="Marth, G" uniqKey="Marth G">G Marth</name>
</author>
<author>
<name sortKey="Abecasis, G" uniqKey="Abecasis G">G Abecasis</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, L" uniqKey="Stein L">L Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Armbrust, M" uniqKey="Armbrust M">M Armbrust</name>
</author>
<author>
<name sortKey="Fox, A" uniqKey="Fox A">A Fox</name>
</author>
<author>
<name sortKey="Griffith, R" uniqKey="Griffith R">R Griffith</name>
</author>
<author>
<name sortKey="Joseph, Ad" uniqKey="Joseph A">AD Joseph</name>
</author>
<author>
<name sortKey="Katz, Rh" uniqKey="Katz R">RH Katz</name>
</author>
<author>
<name sortKey="Konwinski, A" uniqKey="Konwinski A">A Konwinski</name>
</author>
<author>
<name sortKey="Lee, G" uniqKey="Lee G">G Lee</name>
</author>
<author>
<name sortKey="Patterson, Da" uniqKey="Patterson D">DA Patterson</name>
</author>
<author>
<name sortKey="Rabkin, A" uniqKey="Rabkin A">A Rabkin</name>
</author>
<author>
<name sortKey="Stoica, I" uniqKey="Stoica I">I Stoica</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jeffrey Dean, Sg" uniqKey="Jeffrey Dean S">SG Jeffrey Dean</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, Hh" uniqKey="He H">HH He</name>
</author>
<author>
<name sortKey="Meyer, Ca" uniqKey="Meyer C">CA Meyer</name>
</author>
<author>
<name sortKey="Shin, H" uniqKey="Shin H">H Shin</name>
</author>
<author>
<name sortKey="Bailey, St" uniqKey="Bailey S">ST Bailey</name>
</author>
<author>
<name sortKey="Wei, G" uniqKey="Wei G">G Wei</name>
</author>
<author>
<name sortKey="Wang, Q" uniqKey="Wang Q">Q Wang</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Xu, K" uniqKey="Xu K">K Xu</name>
</author>
<author>
<name sortKey="Ni, M" uniqKey="Ni M">M Ni</name>
</author>
<author>
<name sortKey="Lupien, M" uniqKey="Lupien M">M Lupien</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heintzman, Nd" uniqKey="Heintzman N">ND Heintzman</name>
</author>
<author>
<name sortKey="Hon, Gc" uniqKey="Hon G">GC Hon</name>
</author>
<author>
<name sortKey="Hawkins, Rd" uniqKey="Hawkins R">RD Hawkins</name>
</author>
<author>
<name sortKey="Kheradpour, P" uniqKey="Kheradpour P">P Kheradpour</name>
</author>
<author>
<name sortKey="Stark, A" uniqKey="Stark A">A Stark</name>
</author>
<author>
<name sortKey="Harp, Lf" uniqKey="Harp L">LF Harp</name>
</author>
<author>
<name sortKey="Ye, Z" uniqKey="Ye Z">Z Ye</name>
</author>
<author>
<name sortKey="Lee, Lk" uniqKey="Lee L">LK Lee</name>
</author>
<author>
<name sortKey="Stuart, Rk" uniqKey="Stuart R">RK Stuart</name>
</author>
<author>
<name sortKey="Ching, Cw" uniqKey="Ching C">CW Ching</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramsey, Sa" uniqKey="Ramsey S">SA Ramsey</name>
</author>
<author>
<name sortKey="Knijnenburg, Ta" uniqKey="Knijnenburg T">TA Knijnenburg</name>
</author>
<author>
<name sortKey="Kennedy, Ka" uniqKey="Kennedy K">KA Kennedy</name>
</author>
<author>
<name sortKey="Zak, De" uniqKey="Zak D">DE Zak</name>
</author>
<author>
<name sortKey="Gilchrist, M" uniqKey="Gilchrist M">M Gilchrist</name>
</author>
<author>
<name sortKey="Gold, Es" uniqKey="Gold E">ES Gold</name>
</author>
<author>
<name sortKey="Johnson, Cd" uniqKey="Johnson C">CD Johnson</name>
</author>
<author>
<name sortKey="Lampano, Ae" uniqKey="Lampano A">AE Lampano</name>
</author>
<author>
<name sortKey="Litvak, V" uniqKey="Litvak V">V Litvak</name>
</author>
<author>
<name sortKey="Navarro, G" uniqKey="Navarro G">G Navarro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
<author>
<name sortKey="Lu, Zj" uniqKey="Lu Z">ZJ Lu</name>
</author>
<author>
<name sortKey="Van Nostrand, El" uniqKey="Van Nostrand E">EL Van Nostrand</name>
</author>
<author>
<name sortKey="Cheng, C" uniqKey="Cheng C">C Cheng</name>
</author>
<author>
<name sortKey="Arshinoff, Bi" uniqKey="Arshinoff B">BI Arshinoff</name>
</author>
<author>
<name sortKey="Liu, T" uniqKey="Liu T">T Liu</name>
</author>
<author>
<name sortKey="Yip, Ky" uniqKey="Yip K">KY Yip</name>
</author>
<author>
<name sortKey="Robilotto, R" uniqKey="Robilotto R">R Robilotto</name>
</author>
<author>
<name sortKey="Rechtsteiner, A" uniqKey="Rechtsteiner A">A Rechtsteiner</name>
</author>
<author>
<name sortKey="Ikegami, K" uniqKey="Ikegami K">K Ikegami</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="product-review">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21554709</article-id>
<article-id pub-id-type="pmc">3103446</article-id>
<article-id pub-id-type="publisher-id">1471-2105-12-139</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-12-139</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>PeakRanger: A cloud-enabled peak caller for ChIP-seq data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Feng</surname>
<given-names>Xin</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>drestion@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Grossman</surname>
<given-names>Robert</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>grossman@labcomputing.org</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A3">
<name>
<surname>Stein</surname>
<given-names>Lincoln</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>lincoln.stein@gmail.com</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Biomedical Engineering, Stony Brook University, Stony Brook, NY 11794, USA</aff>
<aff id="I2">
<label>2</label>
Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA</aff>
<aff id="I3">
<label>3</label>
Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800, Toronto, ON M5G 0A3, Canada</aff>
<aff id="I4">
<label>4</label>
Institute for Genomics &amp; Systems Biology, The University of Chicago, Cummings Life Sciences Center 431A, 920 East 58th Street, Chicago, IL 60637, USA</aff>
<pub-date pub-type="collection">
<year>2011</year>
</pub-date>
<pub-date pub-type="epub">
<day>9</day>
<month>5</month>
<year>2011</year>
</pub-date>
<volume>12</volume>
<fpage>139</fpage>
<lpage>139</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>1</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>9</day>
<month>5</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2011 Feng et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2011</copyright-year>
<copyright-holder>Feng et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/12/139"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Chromatin immunoprecipitation (ChIP) coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned to handle either punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks.</p>
</sec>
<sec>
<title>Results</title>
<p>In this paper, we introduce PeakRanger, a peak-calling software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world uses of PeakRanger, including peak calling in the modENCODE project.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded from the official site of the modENCODE project:
<ext-link ext-link-type="uri" xlink:href="http://www.modencode.org/software/ranger/">http://www.modencode.org/software/ranger/</ext-link>
</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The genome-wide characterization of chromatin protein binding sites and the profiling of patterns of histone modification marks are essential for understanding the dynamics of chromatin, unraveling the transcriptional regulatory code and probing epigenetic inheritance. The main technique for performing this characterization is chromatin immunoprecipitation (ChIP) coupled with massively parallel short-read sequencing (seq) [
<xref ref-type="bibr" rid="B1">1</xref>
-
<xref ref-type="bibr" rid="B5">5</xref>
]. Unlike its predecessor ChIP-chip [
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
], ChIP-seq provides improved dynamic range and spatial resolution[
<xref ref-type="bibr" rid="B5">5</xref>
].</p>
<p>After mapping sequenced ChIP reads to the reference genome, the first critical task of ChIP-seq data analysis is to accurately identify the target binding sites or regions enriched in histone marks [
<xref ref-type="bibr" rid="B8">8</xref>
]. Since downstream analysis relies heavily on the accurate identification of such binding sites or regions, a large number of algorithms have been proposed for peak calling[
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B9">9</xref>
-
<xref ref-type="bibr" rid="B24">24</xref>
].</p>
<p>Despite the availability of such a large set of peak callers, many of these algorithms have disadvantages in real-world settings. Some algorithms have high sensitivity, but call an excessive number of false positive peaks due to low specificity. Others have the opposite problem. Another limitation of the current generation of peak callers is that many are optimized to detect either narrow punctate features, such as those generated by transcription-factor binding site experiments, or broad peaks, such as those characterized by regions of modified histones. Hence a ChIP-seq production environment may need to install and maintain two different peak calling software packages. Those algorithms that attempt to handle both types of peak typically do so at the expense of inter-peak and spatial resolution. The former is the ability to distinguish two or more closely-spaced peaks, while the latter is the ability to correctly locate the target binding site or histone modification boundaries. Both types of resolution are essential for understanding the underlying biology of chromatin dynamics. An example of how loss of resolution can affect the interpretation of ChIP-seq data is shown in Figure
<xref ref-type="fig" rid="F1">1</xref>
.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>The importance of peak caller resolution</bold>
. Some peak callers are designed to call surrounding enriched regions instead of summits. This degrades their ability to locate the site of binding events and their inter-peak resolution.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-1"></graphic>
</fig>
<p>Software usability is also an issue. Some otherwise excellent peak callers are difficult to use because they require unusual data file formats, run slowly on real-world data sets, or do not take advantage of cluster computing. Poor usability can also impede the ability of a researcher to integrate the software with other tools in an analytic pipeline.</p>
<p>Here we present our efforts to address these concerns by creating PeakRanger, a novel peak caller that is both accurate and usable. Across a series of six accuracy benchmarks and three software usability benchmarks, it compares favorably to 10 other peak callers selected from the recent literature. In addition, PeakRanger supports MapReduce based parallel computing in a cloud environment, allowing it to scale well to large data sets in high-volume applications.</p>
</sec>
<sec>
<title>Implementation</title>
<sec>
<title>Building the read coverage profile</title>
<p>The first step of peak calling is to build a read coverage profile using aligned raw reads. A key step in ChIP-seq is to shear the immunoprecipitated chromatin into fragments of 200-500 bp prior to extracting the DNA and sequencing it. Because the shear size is much larger than the small reads produced by early next-generation sequencing machines, many peak calling algorithms make use of the "shift" distance between coverage peaks defined by plus and minus strand read alignments, but this has become less useful as the read length produced by next-generation sequencers approaches the ChIP-seq DNA shear size. PeakRanger uses the same "blind-extension" strategy as PeakSeq[
<xref ref-type="bibr" rid="B18">18</xref>
] in which the shear size is provided by the user and not estimated from aligned raw reads. This choice significantly simplifies the software design and improves performance (see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
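<p>The blind-extension step can be illustrated with a minimal Python sketch. It is not PeakRanger's C++ implementation: the function name <monospace>coverage_profile</monospace>, the read representation (5' coordinate plus strand) and the use of a difference array are assumptions made for illustration only. Each read is extended from its 5' end by the user-supplied fragment size in the direction of its strand, and the extended fragments are summed into a per-base profile.</p>
<preformat><![CDATA[
```python
def coverage_profile(reads, fragment_size, chrom_length):
    """Build a per-base coverage profile by blind extension.

    reads: iterable of (five_prime, strand) tuples; five_prime is the 0-based
        coordinate of the read's 5' end, strand is '+' or '-'.
    fragment_size: user-supplied shear size in bp (never estimated from reads).
    chrom_length: length of the chromosome in bp.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (chrom_length + 1)
    for five_prime, strand in reads:
        if strand == '+':
            begin, end = five_prime, five_prime + fragment_size
        else:
            # Minus-strand fragments extend towards lower coordinates.
            begin, end = five_prime - fragment_size + 1, five_prime + 1
        begin = max(begin, 0)
        end = min(end, chrom_length)
        if begin >= end:
            continue  # fragment lies entirely outside the chromosome
        diff[begin] += 1
        diff[end] -= 1

    # A running sum over the difference array yields the coverage.
    coverage = []
    running = 0
    for step in diff[:chrom_length]:
        running += step
        coverage.append(running)
    return coverage
```
]]></preformat>
<p>In this sketch, a plus-strand read at position 2 and a minus-strand read at position 7, each extended to 4 bp on a 10 bp chromosome, overlap at positions 4 and 5, which therefore receive a coverage of 2.</p>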
</sec>
<sec>
<title>Peak Detection</title>
<p>We first identify broad regions of signal enrichment using the same algorithm as PeakSeq, which detects contiguous enrichment regions by thresholding. We then use a "summit-valley-alternator" algorithm to scan for summits within the regions identified in this first step. This algorithm starts by searching for the first summit within the region, where a summit is defined as the location that has the maximum signal value before subsequent locations drop below a pre-defined cutoff value. The cutoff is calculated by multiplying the current maximum signal value by delta, a tuning factor in the range (0, 1) that should be chosen based on the needs of users. Since the read signal in broad regions is usually noisy, we perform additional signal processing before calling summits (see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
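<p>The alternation between summit and valley searches can be sketched as follows. This is an illustrative reading of the description above, not the published implementation. The summit criterion (signal falling below delta times the running maximum) follows the text. The valley criterion (running minimum falling below delta times the current signal) and the reporting of a pending maximum at the end of the region are assumptions, and the additional signal processing mentioned above is omitted.</p>
<preformat><![CDATA[
```python
def find_summits(signal, delta):
    """Return summit positions within one enriched region.

    signal: sequence of non-negative coverage values for the region.
    delta: tuning factor in the open interval (0, 1).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in the open interval (0, 1)")

    summits = []
    seeking_summit = True
    best_pos, best_val = None, None  # running extremum of the current state

    for pos, val in enumerate(signal):
        if best_val is None:
            best_pos, best_val = pos, val
            continue
        if seeking_summit:
            if val > best_val:
                best_pos, best_val = pos, val        # new running maximum
            elif val < delta * best_val:
                summits.append(best_pos)             # drop confirms the summit
                seeking_summit = False
                best_pos, best_val = pos, val        # start tracking the valley
        else:
            if val < best_val:
                best_pos, best_val = pos, val        # new running minimum
            elif best_val < delta * val:             # assumed valley criterion
                seeking_summit = True
                best_pos, best_val = pos, val        # start tracking next summit

    # Assumption: a maximum still pending at the region end is reported.
    if seeking_summit and best_pos is not None:
        summits.append(best_pos)
    return summits
```
]]></preformat>
<p>Under these assumptions, a larger delta makes the search more sensitive: for the signal 2, 5, 9, 6, 3, 4, 8, 10, 7, a delta of 0.5 reports two summits (positions 2 and 7), whereas a delta of 0.2 reports only position 7.</p>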
</sec>
<sec>
<title>Software Engineering</title>
<p>PeakRanger is written in C++, and can be compiled on Linux, MacOS and Windows. It runs as a command-line program.</p>
</sec>
</sec>
<sec>
<title>Results</title>
<sec>
<title>Benchmarking</title>
<p>In preparation for benchmarking, we compiled a list of 17 third-party peak callers mentioned in two recent reviews [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B25">25</xref>
] plus several recently-published packages (see additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). We attempted to install and run each peak caller on a test data set, and discarded seven that either failed to install, crashed during the test run, or produced no peaks from the test data set. This reduced the number of peak callers evaluated to 11, including PeakRanger.</p>
<sec>
<title>Sensitivity benchmarks</title>
<p>To evaluate the sensitivities of the 11 algorithms, we ran them on two independent ChIP-seq datasets whose binding sites had been validated by qPCR [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B19">19</xref>
]. Peaks called by each peak caller were ranked by their confidence scores and then compared to the list of validated sites. As measured by the average recovered proportion of validated sites, PeakRanger ranks within the top group, all of which have very similar sensitivities (Figure
<xref ref-type="fig" rid="F2">2A</xref>
).</p>
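<p>The recovery measure used in this benchmark can be expressed in a few lines. The sketch below is an illustration rather than the evaluation script used in this study: the function name <monospace>recovery_curve</monospace>, the half-open interval convention and the restriction to a single chromosome are assumptions. Peaks are sorted by score, and after each additional peak the fraction of validated sites overlapped so far is recorded.</p>
<preformat><![CDATA[
```python
def recovery_curve(peaks, validated_sites):
    """Fraction of validated sites recovered by the top-k peaks, for each k.

    peaks: list of (start, end, score) tuples on one chromosome.
    validated_sites: list of (start, end) tuples on the same chromosome.
    Intervals are treated as half-open: [start, end).
    """
    if not validated_sites:
        raise ValueError("at least one validated site is required")

    ranked = sorted(peaks, key=lambda peak: peak[2], reverse=True)
    recovered = set()
    curve = []
    for start, end, _score in ranked:
        for index, (site_start, site_end) in enumerate(validated_sites):
            if start < site_end and site_start < end:  # intervals overlap
                recovered.add(index)
        curve.append(len(recovered) / len(validated_sites))
    return curve
```
]]></preformat>
<p>Plotting the returned values against the peak rank gives a curve of the kind shown in Figure 2.</p>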
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Sensitivity test using qPCR validated ChIP-Seq binding sites</bold>
. The proportion of recovered qPCR validated binding sites is shown as a function of the ranked peaks called by each peak caller. Peaks are ranked based on significance values reported. A) Test results on the GABP dataset. B) Test results on the NRSF dataset.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-2"></graphic>
</fig>
</sec>
<sec>
<title>Specificity benchmarks</title>
<p>It is more difficult to evaluate the specificity of peak calling than sensitivity because there is no gold standard set of true-negative binding sites of sufficient size to confidently evaluate specificity. To partially address this issue, we performed a specificity analysis using a previously-published synthetic dataset [
<xref ref-type="bibr" rid="B21">21</xref>
]. This data set was generated from a real-world control (no antibody) experiment that contains no binding events, which was then spiked with simulated binding site peaks. Since all peaks were generated by the author, the locations of all simulated binding sites are known and false positive peaks can thus be defined.</p>
<p>Figure
<xref ref-type="fig" rid="F3">3</xref>
graphs the true positive rate against (1 - false positive rate) for each of the peak callers at a fixed FDR of 0.01. As shown in Figure
<xref ref-type="fig" rid="F3">3</xref>
, the top group (PeakRanger, PeakSeq, GPS and MACS) shows nearly identical, good specificity and sensitivity. SPP is close to the top group. SISSRs has higher sensitivity but calls more false positives. In contrast, although CisGenome called only a few false positive peaks, it recovered fewer peaks than the top group. F-Seq, Erange and FindPeaks all had unusually high false positive rates in this test.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>The specificity test</bold>
. Peak calls of all peak callers on a semi-synthetic dataset are shown. All peak callers were configured to use an FDR cutoff of 0.01. Recall rate is plotted against (1 - false positive rate).</p>
</caption>
<graphic xlink:href="1471-2105-12-139-3"></graphic>
</fig>
</sec>
<sec>
<title>Spatial accuracy benchmark</title>
<p>Spatial accuracy measures the ability of the peak caller to correctly identify the biological binding site underlying punctate peaks. To evaluate spatial accuracy, we again used the ChIP-seq data sets for the GABP and NRSF transcription factor targets. To identify the most likely biological binding sites, we used MAST[
<xref ref-type="bibr" rid="B26">26</xref>
] and the canonical target binding site motif and corresponding position specific scoring matrices (PSSMs) to find all matches in the 200 bp surrounding regions.</p>
<p>We ran each of the peak callers on the data sets, and measured the distance between each binding site motif and the center of the closest overlapping peak call. As shown in Figure
<xref ref-type="fig" rid="F4">4</xref>
, algorithms that report peaks as single bp coordinates are much better than those that report broader regions. In particular, SPP, FindPeaks, GPS and QuEST were all tied for first place, closely followed by PeakRanger. However, the difference in spatial accuracy among the top-ranked peak callers is small.</p>
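<p>The distance measure can be sketched as below. The function name <monospace>motif_peak_distance</monospace> and the half-open interval convention are assumptions made for this illustration. A peak reported as a single base pair is represented as an interval of width 1.</p>
<preformat><![CDATA[
```python
def motif_peak_distance(motif, peaks):
    """Distance from a motif center to the center of the closest overlapping peak.

    motif: (start, end) of the motif match, half-open.
    peaks: list of (start, end) peak calls on the same chromosome, half-open.
    Returns None when no peak overlaps the motif.
    """
    motif_start, motif_end = motif
    motif_center = (motif_start + motif_end) / 2
    distances = [
        abs((peak_start + peak_end) / 2 - motif_center)
        for peak_start, peak_end in peaks
        if peak_start < motif_end and motif_start < peak_end  # overlap test
    ]
    return min(distances) if distances else None
```
]]></preformat>
<p>Collecting this value over all motif matches for one peak caller gives the distribution summarized by each box in Figure 4.</p>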
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>The spatial accuracies of peak callers</bold>
. The distance from called binding sites to the motif center is measured for A) GABP and B) NRSF. Box-and-whisker plots illustrate the distribution of these distances for each peak caller.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-4"></graphic>
</fig>
</sec>
<sec>
<title>Inter-peak resolution benchmark</title>
<p>This benchmark measures the ability of peak callers to distinguish between two closely-spaced peaks. This is a particularly difficult task for region-reporter algorithms, which tend to merge close peaks, potentially missing biologically-significant duplets. PeakRanger identifies closely-spaced summits within an enriched region by identifying local maxima within a smoothed model of coverage.</p>
<p>There are no real-world gold standard data sets for evaluating inter-peak resolution, so we adapted the semi-synthetic data set used previously for the specificity benchmarks. We created a series of derivative data sets to simulate closely spaced binding sites by generating a peak adjacent to each synthetic binding site. The inter-peak spacing varied from 200 to 500 bp across the 13 derived data sets. To compensate for changes in coverage introduced by this modification, we added the same number of reads to the control. Some peak callers, including PeakRanger, provide a "resolution mode" that seeks to discover all summits within an enriched region. For this benchmark, we set each algorithm to use resolution mode or equivalent when available, or the default settings when not.</p>
<p>As shown in Figure
<xref ref-type="fig" rid="F5">5A</xref>
, no peak caller is able to resolve closely-spaced peaks in this data set when the peak separation is less than 250 bp. In the range of 250-350 bp, FindPeaks and PeakRanger lead the group in sensitivity, but FindPeaks produces an excessive number of false positives, as shown in Figure
<xref ref-type="fig" rid="F5">5B</xref>
. The other algorithms have lower sensitivities across this range and some exhibit very high false positive rates as well. MACS crashed on the 200 bp, 400 bp and 500 bp data sets, so these data points are missing.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>Resolution test</bold>
. We called peaks on a series of semi-synthetic datasets consisting of paired peaks of increasing inter-peak separation. A) The percentage of close peaks recovered as a function of increasing inter-peak distance. B) The percentage of false positive peaks called. MACS crashed on the 200 bp, 400 bp and 500 bp datasets, so these data points are not plotted.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-5"></graphic>
</fig>
</sec>
</sec>
<sec>
<title>Usability design and performance tuning</title>
<p>Published algorithms are sometimes released in the research prototype stage, and do not have the software engineering necessary to work in a high volume, high availability setting. Ideally, a number of software engineering issues should be addressed (Table
<xref ref-type="table" rid="T1">1</xref>
). First, the software should be as fast as possible. Our experience in large projects such as the modENCODE project[
<xref ref-type="bibr" rid="B27">27</xref>
] supports the notion that a faster peak caller will significantly reduce the time to analyze and interpret ChIP-seq data, because all the downstream analyses rely on accurate peak calls and there is often a cycle in which the results of downstream analyses inform additional rounds of peak calling using different parameter sets. Second, the software should support multiple common data formats. Transforming file formats requires extra time and computing resources, and introduces a step in which programming errors can creep in. Third, the software should be easy to use and require little computing expertise from users. Finally, the software should be able to handle very large ChIP-seq data sets, given the rapid increase in next generation sequencing capacity.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Usability summary of peak callers.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th align="center">GUI</th>
<th align="center">Command line parameters input</th>
<th align="center">Data format</th>
<th align="center">Customizable input</th>
<th align="center">Automatic format detection</th>
<th align="center">Species</th>
<th align="center">Reusable configuration file</th>
<th align="center">Wiggle file generation</th>
<th align="center">No preprocessing</th>
<th align="center">Parallel processing</th>
<th align="center">Cloud parallel computing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">PeakRanger</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, SAM/BAM, BED</td>
<td align="center">Yes</td>
<td></td>
<td align="center">All</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">MACS</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, SAM/BAM, BED</td>
<td></td>
<td align="center">Yes</td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">FindPeaks</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, BED, GFF</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SPP</td>
<td></td>
<td></td>
<td align="center">Eland, Bowtie, MAQ, Arachne</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">QuEST</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, Solexa, MAQ</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td align="center">Yes</td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">GPS</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, SAM, NovoAlign, BED</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Erange</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Eland, Bowtie, Blat, BED</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">CisGenome</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td align="center">Eland, BED</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">F-Seq</td>
<td></td>
<td align="center">Yes</td>
<td align="center">BED</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td align="center">Yes</td>
<td align="center">Yes</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">SISSRs</td>
<td></td>
<td align="center">Yes</td>
<td align="center">BED</td>
<td></td>
<td></td>
<td align="center">All</td>
<td></td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="12">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">PeakSeq</td>
<td></td>
<td></td>
<td align="center">Eland</td>
<td></td>
<td></td>
<td align="center">Human</td>
<td></td>
<td align="center">Yes</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>This table summarizes software features commonly supported by existing peak callers.</p>
</table-wrap-foot>
</table-wrap>
<p>We implemented PeakRanger in the compiled C++ programming language to optimize performance. We avoided performance losses from disk I/O by keeping all working data in memory rather than in temporary files; this has the effect of trading a larger memory footprint for increased execution speed. To take advantage of modern multi-core processors, we also designed PeakRanger to use parallel processing.</p>
<p>To benchmark the performance of PeakRanger against other peak callers, we recorded the running time each required to process a typical data set. As shown in Table
<xref ref-type="table" rid="T2">2</xref>
, PeakRanger is more than twice as fast as the next fastest peak caller tested, while consuming an acceptable amount of memory.</p>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>The performance of peak callers.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Algorithms</th>
<th align="center">Elapsed time</th>
<th align="center">Maximum memory footprint</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">PeakRanger</td>
<td align="center">2m9s</td>
<td align="center">2.9G</td>
</tr>
<tr>
<td align="center">PeakSeq</td>
<td align="center">5m11s</td>
<td align="center">1.48G</td>
</tr>
<tr>
<td align="center">SISSRs</td>
<td align="center">15m18s</td>
<td align="center">0.89G</td>
</tr>
<tr>
<td align="center">FindPeaks</td>
<td align="center">19m39s</td>
<td align="center">4.2G</td>
</tr>
<tr>
<td align="center">Erange</td>
<td align="center">21m31s</td>
<td align="center">0.81G</td>
</tr>
<tr>
<td align="center">F-Seq</td>
<td align="center">23m6s</td>
<td align="center">7.27G</td>
</tr>
<tr>
<td align="center">MACS</td>
<td align="center">33m13s</td>
<td align="center">1.04G</td>
</tr>
<tr>
<td align="center">SPP</td>
<td align="center">34m59s</td>
<td align="center">1.98G</td>
</tr>
<tr>
<td align="center">QuEST</td>
<td align="center">36m51s</td>
<td align="center">4.36G</td>
</tr>
<tr>
<td align="center">CisGenome</td>
<td align="center">55m39s</td>
<td align="center">1.85G</td>
</tr>
<tr>
<td align="center">GPS</td>
<td align="center">64m18s</td>
<td align="center">4.39G</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Running time and memory footprint were recorded for all peak callers using the GABP dataset.</p>
</table-wrap-foot>
</table-wrap>
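<p>The relative running times implied by Table 2 can be checked with a short calculation. The elapsed times below are copied from the table. The helper names <monospace>to_seconds</monospace> and <monospace>relative_to</monospace> are introduced here only for illustration.</p>
<preformat><![CDATA[
```python
import re

# Elapsed times as reported in Table 2 (GABP dataset).
ELAPSED = {
    "PeakRanger": "2m9s", "PeakSeq": "5m11s", "SISSRs": "15m18s",
    "FindPeaks": "19m39s", "Erange": "21m31s", "F-Seq": "23m6s",
    "MACS": "33m13s", "SPP": "34m59s", "QuEST": "36m51s",
    "CisGenome": "55m39s", "GPS": "64m18s",
}


def to_seconds(text):
    """Convert a string such as '5m11s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized elapsed time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def relative_to(reference, table=ELAPSED):
    """Running time of every entry as a multiple of the reference entry."""
    reference_seconds = to_seconds(table[reference])
    return {name: to_seconds(value) / reference_seconds
            for name, value in table.items()}
```
]]></preformat>
<p>With these figures, PeakSeq (311 s) takes about 2.4 times as long as PeakRanger (129 s), and GPS (3858 s) about 30 times as long.</p>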
<p>To support multiple input data formats, we adopted a design shared by SPP and MACS that separates data loading from data processing. We wrote individual modules for specific data formats and let users choose the one they need. PeakRanger currently supports Bowtie[
<xref ref-type="bibr" rid="B28">28</xref>
], Eland, SAM[
<xref ref-type="bibr" rid="B29">29</xref>
] and BAM[
<xref ref-type="bibr" rid="B29">29</xref>
] formats. Other file formats can be added by writing additional importation modules. PeakRanger is also capable of exporting its results in formats suitable for data visualization, including both compressed and uncompressed versions of the UCSC Genome Browser "wiggle" format.</p>
<p>To support multiple species, peak calling packages need basic genome build information such as the names and sizes of chromosomes. For users' convenience, PeakRanger can either derive this information directly from the input files, or can be given pre-computed genome tables. Although the former mode is convenient, it does add a small amount of overhead to the execution time.</p>
<p>Although it is hard to quantify, we noted considerable variation in the difficulty of installing and configuring the various peak caller packages during our benchmarking tests. For example, some packages require the user to make changes to the source code in order to change the location of hard-coded file paths and run-time parameters. PeakRanger makes all its run-time configuration parameters available as command-line options, and also provides a reasonable set of presets for common analysis tasks. For example, PeakRanger provides "resolution mode" and "region mode", which are presets suitable for analyzing transcription factor binding sites and other punctate data on the one hand, and broad regions such as histone modifications on the other. All run-time parameters can be read from external configuration files as well, allowing parameter sets to be managed by source code control, versioned, and shared among laboratories.</p>
<p>PeakRanger does not provide a graphical user interface (GUI) such as those provided by CisGenome, USeq and Sole-Search[
<xref ref-type="bibr" rid="B10">10</xref>
]. While GUIs are convenient for casual users, they make it difficult to integrate the software into the automatic workflows needed by high-throughput laboratories, which are the target audience for PeakRanger.</p>
</sec>
<sec>
<title>Support for MapReduce</title>
<p>With the sequencing industry's rapidly increasing capacity to generate more and longer sequencing reads[
<xref ref-type="bibr" rid="B30">30</xref>
], peak calling algorithms face an exponentially growing demand for computational resources. Cloud computing[
<xref ref-type="bibr" rid="B31">31</xref>
] offers a cost-effective solution for groups that have highly variable demands for compute resources.</p>
<p>Current cloud computing infrastructures offer a highly scalable parallel computational model called MapReduce[
<xref ref-type="bibr" rid="B32">32</xref>
] which was originally designed by Google to process very large-volume datasets. We thus also implemented a MapReduce version of PeakRanger on top of the Hadoop library[
<xref ref-type="bibr" rid="B33">33</xref>
], a free open source implementation of MapReduce.</p>
<p>The Hadoop version of PeakRanger supports splitting the job by chromosomes to take advantage of the chromosome-level independence (CLI) of ChIP-seq data sets. Other ways of partitioning the genome are possible, but require additional preparation by the user.</p>
<p>Within the Hadoop framework, a PeakRanger job can be expressed as a series of "map-then-reduce" sub-jobs (Figure
<xref ref-type="fig" rid="F6">6</xref>
). PeakRanger first starts a series of mappers to map the input datasets to a set of keys. Then a Hadoop partitioner assigns keys to a set of reducers. Each individual reducer fetches the data according to the keys it receives and processes these data. In the CLI case, "map-then-reduce" becomes "split-by-chromosome-then-call-peaks", where chromosomes are used as keys. That is, we delegate the data loading and preprocessing to mappers and the peak calling to reducers. After the mappers finish splitting the data by chromosome, the partitioner assigns jobs based on the number of available reducers, and the reducers then do the actual peak calling.</p>
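<p>The "split-by-chromosome-then-call-peaks" flow can be mimicked in plain Python to make the roles of the three stages concrete. The sketch does not use the Hadoop API. The function names, the round-robin partitioning rule and the <monospace>call_peaks</monospace> callback are hypothetical stand-ins for the mapper, partitioner and reducer described above.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def map_read(read):
    """Mapper: emit (chromosome, payload) so that the chromosome is the key."""
    chrom, position, strand = read
    return chrom, (position, strand)


def partition(keys, n_reducers):
    """Partitioner stand-in: assign each key to a reducer index.

    Round-robin over the sorted keys is an assumption of this sketch.
    """
    return {key: index % n_reducers for index, key in enumerate(sorted(keys))}


def run_job(reads, n_reducers, call_peaks):
    """Run map, partition and reduce sequentially in a single process.

    call_peaks: hypothetical reducer callback taking (chromosome, reads)
        and returning the peak calls for that chromosome.
    """
    # Map phase: group the reads by their chromosome key.
    grouped = defaultdict(list)
    for read in reads:
        key, value = map_read(read)
        grouped[key].append(value)

    # Partition phase: decide which reducer handles which chromosome.
    assignment = partition(grouped.keys(), n_reducers)

    # Reduce phase: each reducer calls peaks on its own chromosomes only.
    results = {}
    for reducer in range(n_reducers):
        for chrom, assigned in assignment.items():
            if assigned == reducer:
                results[chrom] = call_peaks(chrom, grouped[chrom])
    return results
```
]]></preformat>
<p>Because each chromosome is processed independently, the reduce loop is the part that a real cluster would execute in parallel, one reducer per group of chromosomes.</p>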
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>
<bold>The programming model of Hadoop and the adaptation of PeakRanger to it</bold>
. Reads are first split by the Hadoop splitter. Mappers are then initiated to preprocess these reads by chromosome. The Hadoop partitioner then assigns the processed reads to individual reducers to call peaks. Called peaks then undergo post-call processing.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-6"></graphic>
</fig>
<p>To evaluate the performance of Hadoop-PeakRanger, we performed two benchmark tests: 1) test with fixed number of nodes and data sets of increasing size; 2) test with increasing numbers of nodes and data sets with fixed sizes.</p>
<p>Figure
<xref ref-type="fig" rid="F7">7A</xref>
demonstrates that on a fixed number of nodes with increasing data set sizes, the execution time for the Hadoop version of PeakRanger is dramatically shorter, and increases more slowly, than that of the regular single-processor version. For example, the cloud version processed a 14 Gb dataset of 192 million reads in less than 5 minutes, more than 10 times faster than the original PeakRanger.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption>
<p>
<bold>Performance of PeakRanger in cloud parallel computing</bold>
. A) test with fixed number of nodes and data sets of increasing size; B) test with increasing numbers of nodes and data sets with fixed sizes.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-7"></graphic>
</fig>
<p>In the second test, we tested how the running time scales with the increasing number of nodes (Figure
<xref ref-type="fig" rid="F7">7B</xref>
). As expected, runtime decreases rapidly until the number of nodes equals the number of chromosomes (25), after which adding additional nodes does not provide further benefit. Future versions of PeakRanger will provide alternate ways of splitting the genome to overcome this parallelization bottleneck.</p>
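<p>The plateau can be reproduced with a simple scheduling model. The sketch below is an idealization, not a measurement: it assumes that each chromosome is one indivisible task with a known cost, ignores Hadoop start-up and data transfer overheads, and uses a greedy longest-task-first assignment. The function name <monospace>makespan</monospace> and the example costs are hypothetical.</p>
<preformat><![CDATA[
```python
import heapq


def makespan(task_costs, n_nodes):
    """Estimated wall-clock time when indivisible tasks are spread over nodes.

    task_costs: processing cost of each chromosome (arbitrary time units).
    n_nodes: number of worker nodes.
    """
    if n_nodes < 1:
        raise ValueError("at least one node is required")
    loads = [0.0] * n_nodes          # min-heap of per-node total load
    heapq.heapify(loads)
    # Greedy rule: give the largest remaining task to the least loaded node.
    for cost in sorted(task_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)
```
]]></preformat>
<p>In this model the estimated time can never fall below the cost of the largest single task, so once every chromosome has its own node, additional nodes remain idle. Splitting chromosomes into smaller units would lower that floor.</p>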
<p>We plan to make both the regular and Hadoop version of PeakRanger available as public machine images in Amazon EC2 and other cloud service providers in order to facilitate its use by the research community.</p>
</sec>
<sec>
<title>Real world usage of PeakRanger</title>
<p>In this section we provide two examples of using PeakRanger in biological research settings.</p>
<sec>
<title>Characterization of broad enriched regions</title>
<p>It is common for studies of histone modifications to identify broad regions enriched in the modification of interest and then to correlate these broad regions with other biological annotations such as genes. Although this type of analysis is straightforward, it ignores the detailed internal structure of the enriched profiles, which can contain summits and valleys relating to quantitative differences in modification efficiency and/or heterogeneity within the sample.</p>
<p>Recently there have been several publications reporting biologically significant phenomena based on the internal structures of the enriched histone modification regions [
<xref ref-type="bibr" rid="B34">34</xref>
-
<xref ref-type="bibr" rid="B36">36</xref>
]. It is therefore desirable that a peak caller be able to retrieve broad enriched regions while simultaneously identifying the detailed summits within these regions. Here we demonstrate such an example using PeakRanger.</p>
<p>In the paper recently published by He et al[
<xref ref-type="bibr" rid="B34">34</xref>
], the authors found that after exposure to 5-α-dihydrotestosterone (DHT) the central nucleosome was depleted from a subpopulation of androgen receptor (AR) binding sites, leaving a pair of flanking nucleosomes. Without knowing the region structure in advance, it is difficult to identify the paired nucleosomes from the read coverage signal alone, and He et al built additional models to identify and quantify the paired binding sites.</p>
<p>We applied PeakRanger directly to the He data set, using a configuration that allowed it to find both broad enriched regions and summits within the regions. We then compared the number of summits in each enriched region before and after DHT exposure to directly identify the subpopulation of AR binding sites that have depleted central nucleosomes. In order to accomplish this objective, we configured PeakRanger to detect summits with comparable heights. As shown in Figure
<xref ref-type="fig" rid="F8">8A</xref>
, the profile plot strongly resembled that reported in the original publication, and had an average twin-peak separation of 360 bp, close to the published estimate of 370 bp. As a comparison, we repeated the same procedure using QuEST. The resulting estimated peak distance was 240 bp and the profile plot departed from the original one (Figure
<xref ref-type="fig" rid="F8">8B</xref>
). For other peak callers, since no information is available for the number of summits of an enriched region, we could not perform the same analysis.</p>
<fig id="F8" position="float">
<label>Figure 8</label>
<caption>
<p>
<bold>Estimating the peak distance from DHT sensitive subgroups</bold>
. The analysis conducted by He et al is repeated by using just peak calls generated by A) PeakRanger and B) QuEST. PeakRanger gave a much closer estimate of the twin-peak distance than QuEST.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-8"></graphic>
</fig>
</sec>
<sec>
<title>Processing modENCODE worm datasets</title>
<p>ModENCODE is a multi-center collaboration to catalogue functional elements in C. elegans and D. melanogaster [
<xref ref-type="bibr" rid="B27">27</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
], and includes more than 100 ChIP-seq data sets. PeakRanger was used by modENCODE as the standard ChIP-seq peak caller for 29 ChIP-seq experiments involving 23 C. elegans transcription factors across various developmental stages [
<xref ref-type="bibr" rid="B37">37</xref>
]. PeakRanger processed all of these data sets in less than 2 hours on a regular workstation with 8 GB of RAM and a quad-core CPU. This illustrates PeakRanger's ability to integrate into a high-throughput environment. This throughput also made collaborative analysis among different labs easier. Internal analyses indicated that the peaks produced by PeakRanger were of high quality (data not shown).</p>
</sec>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>Figure
<xref ref-type="fig" rid="F9">9</xref>
summarizes the accuracy and software engineering benchmarks discussed above, where each of the 11 peak callers examined is ranked from 0 (worst) to 10 (best) for a particular benchmark. The last column of the table is a simple sum of the ranks. No single peak caller ranks as the best on all benchmarks; in particular, algorithms with high sensitivity often have low specificity. However, PeakRanger manages a good compromise among all the performance benchmarks and ranks first in the aggregate ranking.</p>
<fig id="F9" position="float">
<label>Figure 9</label>
<caption>
<p>
<bold>Summary of benchmarks performed in this study</bold>
. For each benchmark item, peak callers are ranked and scored (see Methods). Scores range from 0 to 10, with 10 being the best. The overall rank is based on the sum of the scores across all benchmarks.</p>
</caption>
<graphic xlink:href="1471-2105-12-139-9"></graphic>
</fig>
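<p>The rank-sum summary can be sketched as follows. This is an illustration under stated assumptions, not the scoring code used for Figure 9: the caller names and benchmark values are made up, only three callers are shown (so ranks run from 0 to 2 rather than 0 to 10), and ties are broken by name.</p>
<preformat>
```python
def rank_scores(values, higher_is_better=True):
    """Rank callers on one benchmark: 0 for the worst, n - 1 for the best.

    Ties are broken by caller name, which is a simplification.
    """
    ordered = sorted(
        values,
        key=lambda name: (values[name], name),
        reverse=not higher_is_better,
    )
    return {name: rank for rank, name in enumerate(ordered)}


def aggregate(benchmarks):
    """Sum the per-benchmark ranks of every caller."""
    totals = {}
    for values, higher_is_better in benchmarks:
        ranks = rank_scores(values, higher_is_better)
        for name, rank in ranks.items():
            totals[name] = totals.get(name, 0) + rank
    return totals


# Hypothetical measurements; the second tuple item says whether a
# larger value is better for that benchmark.
benchmarks = [
    ({"callerA": 0.90, "callerB": 0.70, "callerC": 0.80}, True),     # sensitivity
    ({"callerA": 0.60, "callerB": 0.95, "callerC": 0.85}, True),     # specificity
    ({"callerA": 120.0, "callerB": 300.0, "callerC": 45.0}, False),  # run time (s)
]

totals = aggregate(benchmarks)
best = max(totals, key=totals.get)
print(totals, best)
```
</preformat>
<p>In this toy input the caller with the highest sensitivity and the caller with the highest specificity are different, and a third caller that is never worst on any benchmark obtains the highest aggregate score.</p>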
<p>The algorithm used to find the summits within enriched regions is similar to those used by QuEST and FindPeaks. To make summit detection more reliable and flexible, we enhanced it based on our experience with real ChIP-Seq datasets. In QuEST, users cannot control the sensitivity of summit detection. In comparison, PeakRanger allows users to specify the sensitivity with the -r option. We also applied an additional padding algorithm to avoid calling false positive summits. When a dataset does not have adequate sequencing depth, we pad enriched regions so that the summit detection algorithm does not call separate summits where two base pairs are separated by a region of zero read count.</p>
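<p>The effect of padding on summit calling can be illustrated with a minimal sketch. The padding rule used here (fill an interior zero-coverage gap with the smaller of its two flanking depths) and the height threshold <italic>min_ratio</italic> are assumptions chosen for illustration; they are not claimed to match PeakRanger's padding algorithm or the semantics of its -r option.</p>
<preformat>
```python
def pad_zero_gaps(coverage):
    """Fill interior runs of zero coverage with the smaller flanking depth.

    Leading and trailing zeros are left untouched. This padding rule is
    an illustrative assumption.
    """
    padded = list(coverage)
    last_covered = None  # index of the most recent non-zero position
    for i, depth in enumerate(coverage):
        if depth == 0:
            continue
        if last_covered is not None and i - last_covered > 1:
            fill = min(coverage[last_covered], depth)
            for j in range(last_covered + 1, i):
                padded[j] = fill
        last_covered = i
    return padded


def find_summits(coverage, min_ratio=0.5):
    """Return indices of local maxima at least min_ratio * max depth high.

    A plateau is reported once, at its first position.
    """
    if not coverage:
        return []
    threshold = min_ratio * max(coverage)
    n = len(coverage)
    summits = []
    for i, depth in enumerate(coverage):
        left = coverage[i - 1] if i > 0 else 0
        if depth == 0 or left >= depth or threshold > depth:
            continue
        j = i
        while n - 1 > j and coverage[j + 1] == depth:
            j += 1  # walk to the end of the plateau
        right = coverage[j + 1] if n - 1 > j else 0
        if depth > right:
            summits.append(i)
    return summits


# Sparse, made-up per-base coverage of one enriched region.
raw = [0, 3, 0, 0, 4, 0, 0, 5, 4, 0]
raw_summits = find_summits(raw)
padded_summits = find_summits(pad_zero_gaps(raw))
print(raw_summits, padded_summits)
```
</preformat>
<p>On the unpadded signal every isolated covered position looks like a local maximum; after padding, only the highest position remains a summit.</p>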
<p>PeakRanger relies on PeakSeq to detect enriched regions before the summit detection step. PeakSeq is an effective algorithm, but its original implementation limits how the algorithm can be used. We therefore modified PeakSeq significantly so that it could be integrated into PeakRanger. PeakSeq contains two separate parts: pre-processing and peak-calling. These two parts are now combined into a single module to reduce file I/O cost. We also added chromosome indexing to support other species with different numbers and names of chromosomes. The original PeakSeq runs in a single thread; we modified the related data structures to support multi-threading.</p>
<p>Although PeakRanger represents a successful compromise among multiple measures of accuracy, researchers should consider one of the other peak calling algorithms if a particular performance characteristic is the top priority. For example, if identifying the precise center of the peak is critical to an experiment, then researchers should consider GPS, QuEST, MACS, SPP or FindPeaks, all of which have better spatial accuracy than PeakRanger.</p>
<p>The current design for the Hadoop version is based on chromosome-level-independence (CLI), which limits the practical level of parallelization to the number of chromosomes in the genome. This concept can be generalized to region-level-independence (RLI) by breaking the genome into a set of arbitrary regions and calling peaks in each individual region. However, this depends on the peak calls for each region being independent of each other, a criterion that is not satisfied when an enriched region crosses a region boundary. Implementing RLI will require additional manipulation of the regions to allow for overlap between them, as well as adjustments for the changes in coverage in overlapped regions, and is deferred to future work. However, even with the current design we are able to achieve an order-of-magnitude increase in speed, which is sufficient for most practical applications.</p>
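<p>The boundary problem, and the overlap idea proposed as future work, can be sketched as follows. This is a hypothetical illustration and not part of PeakRanger or its Hadoop version: the function names, window sizes and coordinates are invented, and the coverage adjustment needed in overlapped regions is not addressed.</p>
<preformat>
```python
def overlapping_windows(chrom_length, window, overlap):
    """Split [0, chrom_length) into half-open windows sharing `overlap` bp."""
    if overlap >= window or 0 >= chrom_length:
        raise ValueError("need overlap smaller than window and a positive length")
    step = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, chrom_length)
        windows.append((start, end))
        if end == chrom_length:
            break
        start += step
    return windows


def covering_windows(region, windows):
    """Return the windows that fully contain the half-open `region`."""
    region_start, region_end = region
    return [
        w for w in windows
        if region_start >= w[0] and w[1] >= region_end
    ]


# A made-up enriched region that straddles position 400.
enriched = (380, 450)

disjoint = overlapping_windows(1000, 400, 0)
overlapped = overlapping_windows(1000, 400, 100)

print(covering_windows(enriched, disjoint))
print(covering_windows(enriched, overlapped))
```
</preformat>
<p>With disjoint windows no single window contains the enriched region, so its reads would be split between two independent tasks. With an overlap at least as long as the region, one window contains it whole, at the cost of processing the overlapped bases more than once.</p>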
</sec>
<sec>
<title>Conclusion</title>
<p>In this paper, we introduce PeakRanger, a general-purpose ChIP-seq peak calling algorithm that is optimized for accuracy, speed and ease of use. It is suitable for use in small laboratories as well as in large production centers, and can be run in a cloud environment for very high-throughput applications. The software is freely available and open source under the Artistic License 2.0. The primary download site is
<ext-link ext-link-type="uri" xlink:href="http://www.modencode.org/software/ranger/">http://www.modencode.org/software/ranger/</ext-link>
.</p>
</sec>
<sec>
<title>Availability and requirements</title>
<p>PeakRanger is released under the Artistic License 2.0. PeakRanger can be downloaded from:
<ext-link ext-link-type="uri" xlink:href="http://www.modencode.org/software/ranger/">http://www.modencode.org/software/ranger/</ext-link>
. We currently provide the full source code, as well as binaries for Linux systems. Binaries for other operating systems and an Amazon EC2 image will be available during the first quarter of 2011.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>XF designed, implemented and tested the algorithm. LS helped test the algorithm. RG provided hardware and software support for the cloud-enabled version. XF and LS wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>This file contains a detailed description of the algorithms and benchmarks</bold>
.</p>
</caption>
<media xlink:href="1471-2105-12-139-S1.PDF" mimetype="application" mime-subtype="pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>We thank Michal Sabala for his help with cloud computing environment configurations. We thank Mark Gerstein, Joel S. Rozowsky, Bradley I. Arshinoff, Guanming Wu and Zheng Zha for their comments. We thank Marc Perry, Sonja Althammer and Zheng Zha for their help with datasets. We thank Shamit Soneji, Stephen Taylor, Ian Donaldson and Jasreet Hundal for their feedback on beta releases. This project was funded by the iPlant Collaborative and a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Mikkelsen</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Ku</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Issac</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lieberman</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Giannoukos</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Alvarez</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Brockman</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>TK</given-names>
</name>
<name>
<surname>Koche</surname>
<given-names>RP</given-names>
</name>
<etal></etal>
<article-title>Genome-wide maps of chromatin state in pluripotent and lineage-committed cells</article-title>
<source>Nature</source>
<year>2007</year>
<volume>448</volume>
<issue>7153</issue>
<fpage>553</fpage>
<lpage>560</lpage>
<pub-id pub-id-type="doi">10.1038/nature06008</pub-id>
<pub-id pub-id-type="pmid">17603471</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Johnson</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
<article-title>Genome-wide mapping of in vivo protein-DNA interactions</article-title>
<source>Science</source>
<year>2007</year>
<volume>316</volume>
<issue>5830</issue>
<fpage>1497</fpage>
<lpage>1502</lpage>
<pub-id pub-id-type="doi">10.1126/science.1141319</pub-id>
<pub-id pub-id-type="pmid">17540862</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Robertson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hirst</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bainbridge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bilenky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Euskirchen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bernier</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Varhol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Delaney</surname>
<given-names>A</given-names>
</name>
<article-title>Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing</article-title>
<source>Nat Methods</source>
<year>2007</year>
<volume>4</volume>
<fpage>651</fpage>
<lpage>657</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth1068</pub-id>
<pub-id pub-id-type="pmid">17558387</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cuddapah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Roh</surname>
<given-names>TY</given-names>
</name>
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chepelev</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K</given-names>
</name>
<article-title>High-resolution profiling of histone methylations in the human genome</article-title>
<source>Cell</source>
<year>2007</year>
<volume>129</volume>
<issue>4</issue>
<fpage>823</fpage>
<lpage>837</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2007.05.009</pub-id>
<pub-id pub-id-type="pmid">17512414</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Park</surname>
<given-names>PJ</given-names>
</name>
<article-title>ChIP-seq: advantages and challenges of a maturing technology</article-title>
<source>Nat Rev Genet</source>
<year>2009</year>
<volume>10</volume>
<issue>10</issue>
<fpage>669</fpage>
<lpage>680</lpage>
<pub-id pub-id-type="doi">10.1038/nrg2641</pub-id>
<pub-id pub-id-type="pmid">19736561</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Robert</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wyrick</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Aparicio</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Jennings</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Simon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zeitlinger</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hannett</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kanin</surname>
<given-names>E</given-names>
</name>
<etal></etal>
<article-title>Genome-wide location and function of DNA binding proteins</article-title>
<source>Science</source>
<year>2000</year>
<volume>290</volume>
<issue>5500</issue>
<fpage>2306</fpage>
<lpage>2309</lpage>
<pub-id pub-id-type="doi">10.1126/science.290.5500.2306</pub-id>
<pub-id pub-id-type="pmid">11125145</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Iyer</surname>
<given-names>VR</given-names>
</name>
<name>
<surname>Horak</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Scafe</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>PO</given-names>
</name>
<article-title>Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF</article-title>
<source>Nature</source>
<year>2001</year>
<volume>409</volume>
<issue>6819</issue>
<fpage>533</fpage>
<lpage>538</lpage>
<pub-id pub-id-type="doi">10.1038/35054095</pub-id>
<pub-id pub-id-type="pmid">11206552</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Pepke</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<article-title>Computation for ChIP-seq and RNA-seq studies</article-title>
<source>Nat Methods</source>
<year>2009</year>
<volume>6</volume>
<issue>11s</issue>
<fpage>S22</fpage>
<lpage>S32</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1371</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="other">
<name>
<surname>Lun</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sherrid</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Weiner</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Galagan</surname>
<given-names>J</given-names>
</name>
<article-title>A blind deconvolution approach to high-resolution mapping of transcription factor binding sites from ChIP-seq data</article-title>
<year>2009</year>
<volume>10</volume>
<fpage>R142</fpage>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="other">
<name>
<surname>Blahnik</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Dou</surname>
<given-names>L</given-names>
</name>
<name>
<surname>O'Geen</surname>
<given-names>H</given-names>
</name>
<name>
<surname>McPhillips</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Cao</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Iyengar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nicolet</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Ludäscher</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Korf</surname>
<given-names>I</given-names>
</name>
<etal></etal>
<article-title>Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data</article-title>
<year>2009</year>
<volume>38</volume>
<fpage>e13</fpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Ji</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>W</given-names>
</name>
<article-title>An integrated software system for analyzing ChIP-chip and ChIP-seq data</article-title>
<source>Nat Biotechnol</source>
<year>2008</year>
<volume>26</volume>
<fpage>1293</fpage>
<lpage>1300</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.1505</pub-id>
<pub-id pub-id-type="pmid">18978777</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Jothi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cuddapah</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Barski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>K</given-names>
</name>
<article-title>Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data</article-title>
<source>Nucleic Acids Res</source>
<year>2008</year>
<volume>36</volume>
<fpage>5221</fpage>
<lpage>5231</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn488</pub-id>
<pub-id pub-id-type="pmid">18684996</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Zang</surname>
<given-names>CZ</given-names>
</name>
<name>
<surname>Schones</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Cui</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>WQ</given-names>
</name>
<article-title>A clustering approach for identification of enriched domains from histone modification ChIP-Seq data</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<issue>15</issue>
<fpage>1952</fpage>
<lpage>1958</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp340</pub-id>
<pub-id pub-id-type="pmid">19505939</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Fejes</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bilenky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Varhol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bainbridge</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>S</given-names>
</name>
<article-title>FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<fpage>1729</fpage>
<lpage>1730</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn305</pub-id>
<pub-id pub-id-type="pmid">18599518</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Boyle</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Guinney</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Crawford</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<article-title>F-Seq: a feature density estimator for high-throughput sequence tags</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<issue>21</issue>
<fpage>2537</fpage>
<lpage>2538</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn480</pub-id>
<pub-id pub-id-type="pmid">18784119</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Tuteja</surname>
<given-names>G</given-names>
</name>
<name>
<surname>White</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schug</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kaestner</surname>
<given-names>KH</given-names>
</name>
<article-title>Extracting transcription factor targets from ChIP-Seq data</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<issue>17</issue>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Eeckhoute</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bernstein</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Nussbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<etal></etal>
<article-title>Model-based analysis of ChIP-Seq (MACS)</article-title>
<source>Genome Biol</source>
<year>2008</year>
<volume>9</volume>
<fpage>R137</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2008-9-9-r137</pub-id>
<pub-id pub-id-type="pmid">18798982</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Rozowsky</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Euskirchen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Auerbach</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bjornson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Carriero</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
<article-title>PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls</article-title>
<source>Nat Biotechnol</source>
<year>2009</year>
<volume>27</volume>
<fpage>66</fpage>
<lpage>75</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.1518</pub-id>
<pub-id pub-id-type="pmid">19122651</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Valouev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sundquist</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Medina</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Anton</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sidow</surname>
<given-names>A</given-names>
</name>
<article-title>Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data</article-title>
<source>Nat Methods</source>
<year>2008</year>
<volume>5</volume>
<fpage>829</fpage>
<lpage>834</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1246</pub-id>
<pub-id pub-id-type="pmid">19160518</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Kharchenko</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tolstorukov</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>P</given-names>
</name>
<article-title>Design and analysis of ChIP-seq experiments for DNA-binding proteins</article-title>
<source>Nat Biotechnol</source>
<year>2008</year>
<volume>26</volume>
<fpage>1351</fpage>
<lpage>1359</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.1508</pub-id>
<pub-id pub-id-type="pmid">19029915</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Nix</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Courdy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Boucher</surname>
<given-names>K</given-names>
</name>
<article-title>Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks</article-title>
<source>BMC Bioinformatics</source>
<year>2008</year>
<volume>9</volume>
<fpage>523</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-9-523</pub-id>
<pub-id pub-id-type="pmid">19061503</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Guo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Papachristoudis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Gerber</surname>
<given-names>GK</given-names>
</name>
<name>
<surname>Jaakkola</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Gifford</surname>
<given-names>DK</given-names>
</name>
<name>
<surname>Mahony</surname>
<given-names>S</given-names>
</name>
<article-title>Discovering homotypic binding events at high spatial resolution</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>24</issue>
<fpage>3028</fpage>
<lpage>3034</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq590</pub-id>
<pub-id pub-id-type="pmid">20966006</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>McCue</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Schaeffer</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
<article-title>Mapping and quantifying mammalian transcriptomes by RNA-Seq</article-title>
<source>Nat Methods</source>
<year>2008</year>
<volume>5</volume>
<issue>7</issue>
<fpage>621</fpage>
<lpage>628</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1226</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Qin</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Maher</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kalyana-Sundaram</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chinnaiyan</surname>
<given-names>A</given-names>
</name>
<article-title>HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<issue>1</issue>
<fpage>369</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-369</pub-id>
<pub-id pub-id-type="pmid">20598134</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Wilbanks</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Facciotti</surname>
<given-names>MT</given-names>
</name>
<article-title>Evaluation of Algorithm Performance in ChIP-Seq Peak Detection</article-title>
<source>PLoS ONE</source>
<year>2010</year>
<volume>5</volume>
<issue>7</issue>
<fpage>e11471</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0011471</pub-id>
<pub-id pub-id-type="pmid">20628599</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Gribskov</surname>
<given-names>M</given-names>
</name>
<article-title>Combining evidence using p-values: application to sequence homology searches</article-title>
<source>Bioinformatics</source>
<year>1998</year>
<volume>14</volume>
<issue>1</issue>
<fpage>48</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/14.1.48</pub-id>
<pub-id pub-id-type="pmid">9520501</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Celniker</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Dillon</surname>
<given-names>LAL</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Gunsalus</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Karpen</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Kellis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lai</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Lieb</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>MacAlpine</surname>
<given-names>DM</given-names>
</name>
<etal></etal>
<article-title>Unlocking the secrets of the genome</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<issue>7249</issue>
<fpage>927</fpage>
<lpage>930</lpage>
<pub-id pub-id-type="doi">10.1038/459927a</pub-id>
<pub-id pub-id-type="pmid">19536255</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>S</given-names>
</name>
<article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<issue>3</issue>
<fpage>R25</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r25</pub-id>
<pub-id pub-id-type="pmid">19261174</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Homer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Abecasis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<collab>Genome Project Data Processing Subgroup</collab>
<article-title>The Sequence Alignment/Map format and SAMtools</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<issue>16</issue>
<fpage>2078</fpage>
<lpage>2079</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
<pub-id pub-id-type="pmid">19505943</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Stein</surname>
<given-names>L</given-names>
</name>
<article-title>The case for cloud computing in genome informatics</article-title>
<source>Genome Biol</source>
<year>2010</year>
<volume>11</volume>
<issue>5</issue>
<fpage>207</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2010-11-5-207</pub-id>
<pub-id pub-id-type="pmid">20441614</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="book">
<name>
<surname>Armbrust</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Griffith</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Joseph</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Katz</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Konwinski</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Rabkin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stoica</surname>
<given-names>I</given-names>
</name>
<etal></etal>
<source>Above the Clouds: A Berkeley View of Cloud Computing</source>
<year>2009</year>
<publisher-name>EECS Department, University of California, Berkeley</publisher-name>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="other">
<name>
<surname>Dean</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ghemawat</surname>
<given-names>S</given-names>
</name>
<article-title>MapReduce: Simplified Data Processing on Large Clusters</article-title>
<source>OSDI'04: Sixth Symposium on Operating System Design and Implementation. San Francisco, CA</source>
<year>2004</year>
<pub-id pub-id-type="pmid">21557633</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="other">
<article-title>Hadoop</article-title>
<ext-link ext-link-type="uri" xlink:href="http://hadoop.apache.org/">http://hadoop.apache.org/</ext-link>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>He</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>ST</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ni</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lupien</surname>
<given-names>M</given-names>
</name>
<etal></etal>
<article-title>Nucleosome dynamics define transcriptional enhancers</article-title>
<source>Nat Genet</source>
<year>2010</year>
<volume>42</volume>
<issue>4</issue>
<fpage>343</fpage>
<lpage>347</lpage>
<pub-id pub-id-type="doi">10.1038/ng.545</pub-id>
<pub-id pub-id-type="pmid">20208536</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Heintzman</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Hawkins</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Kheradpour</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Stark</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Harp</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>LK</given-names>
</name>
<name>
<surname>Stuart</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Ching</surname>
<given-names>CW</given-names>
</name>
<etal></etal>
<article-title>Histone modifications at human enhancers reflect global cell-type-specific gene expression</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<issue>7243</issue>
<fpage>108</fpage>
<lpage>112</lpage>
<pub-id pub-id-type="doi">10.1038/nature07829</pub-id>
<pub-id pub-id-type="pmid">19295514</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="other">
<name>
<surname>Ramsey</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Knijnenburg</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Kennedy</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Zak</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Gilchrist</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gold</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Lampano</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Litvak</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Navarro</surname>
<given-names>G</given-names>
</name>
<etal></etal>
<article-title>Genome-wide histone acetylation data improve prediction of mammalian transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<fpage>btq405</fpage>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>ZJ</given-names>
</name>
<name>
<surname>Van Nostrand</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Arshinoff</surname>
<given-names>BI</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Yip</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Robilotto</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rechtsteiner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ikegami</surname>
<given-names>K</given-names>
</name>
<etal></etal>
<article-title>Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project</article-title>
<source>Science</source>
<year>2010</year>
<volume>330</volume>
<issue>6012</issue>
<fpage>1775</fpage>
<lpage>1787</lpage>
<pub-id pub-id-type="doi">10.1126/science.1196914</pub-id>
<pub-id pub-id-type="pmid">21177976</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000278  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000278  | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024