Exploration server on telematics

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

EXCAVATOR: detecting copy number variants from whole-exome sequencing data

Internal identifier: 000345 (Pmc/Corpus); previous: 000344; next: 000346

Authors: Alberto Magi; Lorenzo Tattini; Ingrid Cifola; Romina D Urizio; Matteo Benelli; Eleonora Mangano; Cristina Battaglia; Elena Bonora; Ants Kurg; Marco Seri; Pamela Magini; Betti Giusti; Giovanni Romeo; Tommaso Pippucci; Gianluca De Bellis; Rosanna Abbate; Gian Franco Gensini

Source:

RBID : PMC:4053953

Abstract

We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
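
The abstract does not say which five copy number states are used or how a region is assigned to one. As an illustration only, and not EXCAVATOR's actual calling method, the sketch below assumes the five states correspond to 0, 1, 2, 3 and 4 copies of a diploid region and assigns a segment to the state whose expected log2 read-count ratio is nearest to the segment mean. The state labels, the `floor` value standing in for zero copies, and the function names are all hypothetical.

```python
import math

# Hypothetical labels and copy numbers: the abstract only says "five copy number states".
STATES = {
    "double deletion": 0,
    "single deletion": 1,
    "normal": 2,
    "single duplication": 3,
    "multiple amplification": 4,
}


def expected_log2_ratio(copies, ploidy=2, floor=-4.0):
    """Expected log2(test/control) read-count ratio for a given copy number."""
    if copies == 0:
        # log2(0) is undefined, so an assumed floor stands in for a full deletion.
        return floor
    return math.log2(copies / ploidy)


def call_state(segment_log2_ratio):
    """Return the label whose expected log2 ratio is nearest to the segment mean."""
    return min(
        STATES,
        key=lambda name: abs(expected_log2_ratio(STATES[name]) - segment_log2_ratio),
    )


if __name__ == "__main__":
    for ratio in (-5.0, -1.0, 0.0, 0.58, 1.0):
        print(ratio, call_state(ratio))
```

A nearest-mean rule is the simplest possible choice; a real caller would also account for noise in the segment mean, tumour purity and segment length.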


Url:
DOI: 10.1186/gb-2013-14-10-r120
PubMed: 24172663
PubMed Central: 4053953

Links to Exploration step

PMC:4053953

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">EXCAVATOR: detecting copy number variants from whole-exome sequencing data</title>
<author>
<name sortKey="Magi, Alberto" sort="Magi, Alberto" uniqKey="Magi A" first="Alberto" last="Magi">Alberto Magi</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tattini, Lorenzo" sort="Tattini, Lorenzo" uniqKey="Tattini L" first="Lorenzo" last="Tattini">Lorenzo Tattini</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Laboratory of Molecular Genetics, G. Gaslini Institute, Genoa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cifola, Ingrid" sort="Cifola, Ingrid" uniqKey="Cifola I" first="Ingrid" last="Cifola">Ingrid Cifola</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="D Urizio, Romina" sort="D Urizio, Romina" uniqKey="D Urizio R" first="Romina" last="D Urizio">Romina D Urizio</name>
<affiliation>
<nlm:aff id="I4">Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Benelli, Matteo" sort="Benelli, Matteo" uniqKey="Benelli M" first="Matteo" last="Benelli">Matteo Benelli</name>
<affiliation>
<nlm:aff id="I5">Diagnostic Genetic Unit, Careggi Hospital, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mangano, Eleonora" sort="Mangano, Eleonora" uniqKey="Mangano E" first="Eleonora" last="Mangano">Eleonora Mangano</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Battaglia, Cristina" sort="Battaglia, Cristina" uniqKey="Battaglia C" first="Cristina" last="Battaglia">Cristina Battaglia</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I6">Dipartimento di Biotecnologie Mediche e Medicina Traslazionale (BIOMETRA), University of Milan, Milan, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bonora, Elena" sort="Bonora, Elena" uniqKey="Bonora E" first="Elena" last="Bonora">Elena Bonora</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kurg, Ants" sort="Kurg, Ants" uniqKey="Kurg A" first="Ants" last="Kurg">Ants Kurg</name>
<affiliation>
<nlm:aff id="I8">Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seri, Marco" sort="Seri, Marco" uniqKey="Seri M" first="Marco" last="Seri">Marco Seri</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Magini, Pamela" sort="Magini, Pamela" uniqKey="Magini P" first="Pamela" last="Magini">Pamela Magini</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Giusti, Betti" sort="Giusti, Betti" uniqKey="Giusti B" first="Betti" last="Giusti">Betti Giusti</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Romeo, Giovanni" sort="Romeo, Giovanni" uniqKey="Romeo G" first="Giovanni" last="Romeo">Giovanni Romeo</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pippucci, Tommaso" sort="Pippucci, Tommaso" uniqKey="Pippucci T" first="Tommaso" last="Pippucci">Tommaso Pippucci</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bellis, Gianluca De" sort="Bellis, Gianluca De" uniqKey="Bellis G" first="Gianluca De" last="Bellis">Gianluca De Bellis</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Abbate, Rosanna" sort="Abbate, Rosanna" uniqKey="Abbate R" first="Rosanna" last="Abbate">Rosanna Abbate</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gensini, Gian Franco" sort="Gensini, Gian Franco" uniqKey="Gensini G" first="Gian Franco" last="Gensini">Gian Franco Gensini</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24172663</idno>
<idno type="pmc">4053953</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053953</idno>
<idno type="RBID">PMC:4053953</idno>
<idno type="doi">10.1186/gb-2013-14-10-r120</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000345</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000345</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">EXCAVATOR: detecting copy number variants from whole-exome sequencing data</title>
<author>
<name sortKey="Magi, Alberto" sort="Magi, Alberto" uniqKey="Magi A" first="Alberto" last="Magi">Alberto Magi</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tattini, Lorenzo" sort="Tattini, Lorenzo" uniqKey="Tattini L" first="Lorenzo" last="Tattini">Lorenzo Tattini</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Laboratory of Molecular Genetics, G. Gaslini Institute, Genoa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cifola, Ingrid" sort="Cifola, Ingrid" uniqKey="Cifola I" first="Ingrid" last="Cifola">Ingrid Cifola</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="D Urizio, Romina" sort="D Urizio, Romina" uniqKey="D Urizio R" first="Romina" last="D Urizio">Romina D Urizio</name>
<affiliation>
<nlm:aff id="I4">Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Benelli, Matteo" sort="Benelli, Matteo" uniqKey="Benelli M" first="Matteo" last="Benelli">Matteo Benelli</name>
<affiliation>
<nlm:aff id="I5">Diagnostic Genetic Unit, Careggi Hospital, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mangano, Eleonora" sort="Mangano, Eleonora" uniqKey="Mangano E" first="Eleonora" last="Mangano">Eleonora Mangano</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Battaglia, Cristina" sort="Battaglia, Cristina" uniqKey="Battaglia C" first="Cristina" last="Battaglia">Cristina Battaglia</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I6">Dipartimento di Biotecnologie Mediche e Medicina Traslazionale (BIOMETRA), University of Milan, Milan, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bonora, Elena" sort="Bonora, Elena" uniqKey="Bonora E" first="Elena" last="Bonora">Elena Bonora</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kurg, Ants" sort="Kurg, Ants" uniqKey="Kurg A" first="Ants" last="Kurg">Ants Kurg</name>
<affiliation>
<nlm:aff id="I8">Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seri, Marco" sort="Seri, Marco" uniqKey="Seri M" first="Marco" last="Seri">Marco Seri</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Magini, Pamela" sort="Magini, Pamela" uniqKey="Magini P" first="Pamela" last="Magini">Pamela Magini</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Giusti, Betti" sort="Giusti, Betti" uniqKey="Giusti B" first="Betti" last="Giusti">Betti Giusti</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Romeo, Giovanni" sort="Romeo, Giovanni" uniqKey="Romeo G" first="Giovanni" last="Romeo">Giovanni Romeo</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pippucci, Tommaso" sort="Pippucci, Tommaso" uniqKey="Pippucci T" first="Tommaso" last="Pippucci">Tommaso Pippucci</name>
<affiliation>
<nlm:aff id="I7">Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bellis, Gianluca De" sort="Bellis, Gianluca De" uniqKey="Bellis G" first="Gianluca De" last="Bellis">Gianluca De Bellis</name>
<affiliation>
<nlm:aff id="I3">Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Abbate, Rosanna" sort="Abbate, Rosanna" uniqKey="Abbate R" first="Rosanna" last="Abbate">Rosanna Abbate</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gensini, Gian Franco" sort="Gensini, Gian Franco" uniqKey="Gensini G" first="Gian Franco" last="Gensini">Gian Franco Gensini</name>
<affiliation>
<nlm:aff id="I1">Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Genome Biology</title>
<idno type="ISSN">1465-6906</idno>
<idno type="eISSN">1465-6914</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title></title>
<p>We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/excavatortool/">http://sourceforge.net/projects/excavatortool/</ext-link>
.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author>
<name sortKey="Coe, Bp" uniqKey="Coe B">BP Coe</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Iafrate, Aj" uniqKey="Iafrate A">AJ Iafrate</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Rivera, Mn" uniqKey="Rivera M">MN Rivera</name>
</author>
<author>
<name sortKey="Listewnik, Ml" uniqKey="Listewnik M">ML Listewnik</name>
</author>
<author>
<name sortKey="Donahoe, Pk" uniqKey="Donahoe P">PK Donahoe</name>
</author>
<author>
<name sortKey="Qi, Y" uniqKey="Qi Y">Y Qi</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tuzun, E" uniqKey="Tuzun E">E Tuzun</name>
</author>
<author>
<name sortKey="Sharp, Aj" uniqKey="Sharp A">AJ Sharp</name>
</author>
<author>
<name sortKey="Bailey, Ja" uniqKey="Bailey J">JA Bailey</name>
</author>
<author>
<name sortKey="Kaul, R" uniqKey="Kaul R">R Kaul</name>
</author>
<author>
<name sortKey="Morrison, Va" uniqKey="Morrison V">VA Morrison</name>
</author>
<author>
<name sortKey="Pertz, Lm" uniqKey="Pertz L">LM Pertz</name>
</author>
<author>
<name sortKey="Haugen, E" uniqKey="Haugen E">E Haugen</name>
</author>
<author>
<name sortKey="Hayden, H" uniqKey="Hayden H">H Hayden</name>
</author>
<author>
<name sortKey="Albertson, D" uniqKey="Albertson D">D Albertson</name>
</author>
<author>
<name sortKey="Pinkel, D" uniqKey="Pinkel D">D Pinkel</name>
</author>
<author>
<name sortKey="Olson, Mv" uniqKey="Olson M">MV Olson</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Redon, R" uniqKey="Redon R">R Redon</name>
</author>
<author>
<name sortKey="Ishikawa, S" uniqKey="Ishikawa S">S Ishikawa</name>
</author>
<author>
<name sortKey="Fitch, Kr" uniqKey="Fitch K">KR Fitch</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Perry, Gh" uniqKey="Perry G">GH Perry</name>
</author>
<author>
<name sortKey="Andrews, Td" uniqKey="Andrews T">TD Andrews</name>
</author>
<author>
<name sortKey="Fiegler, H" uniqKey="Fiegler H">H Fiegler</name>
</author>
<author>
<name sortKey="Shapero, Mh" uniqKey="Shapero M">MH Shapero</name>
</author>
<author>
<name sortKey="Carson, Ar" uniqKey="Carson A">AR Carson</name>
</author>
<author>
<name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author>
<name sortKey="Cho, Ek" uniqKey="Cho E">EK Cho</name>
</author>
<author>
<name sortKey="Dallaire, S" uniqKey="Dallaire S">S Dallaire</name>
</author>
<author>
<name sortKey="Freeman, Jl" uniqKey="Freeman J">JL Freeman</name>
</author>
<author>
<name sortKey="Gonzalez, Jr" uniqKey="Gonzalez J">JR González</name>
</author>
<author>
<name sortKey="Gratac S, M" uniqKey="Gratac S M">M Gratacòs</name>
</author>
<author>
<name sortKey="Huang, J" uniqKey="Huang J">J Huang</name>
</author>
<author>
<name sortKey="Kalaitzopoulos, D" uniqKey="Kalaitzopoulos D">D Kalaitzopoulos</name>
</author>
<author>
<name sortKey="Komura, D" uniqKey="Komura D">D Komura</name>
</author>
<author>
<name sortKey="Macdonald, Jr" uniqKey="Macdonald J">JR MacDonald</name>
</author>
<author>
<name sortKey="Marshall, Cr" uniqKey="Marshall C">CR Marshall</name>
</author>
<author>
<name sortKey="Mei, R" uniqKey="Mei R">R Mei</name>
</author>
<author>
<name sortKey="Montgomery, L" uniqKey="Montgomery L">L Montgomery</name>
</author>
<author>
<name sortKey="Nishimura, K" uniqKey="Nishimura K">K Nishimura</name>
</author>
<author>
<name sortKey="Okamura, K" uniqKey="Okamura K">K Okamura</name>
</author>
<author>
<name sortKey="Shen, F" uniqKey="Shen F">F Shen</name>
</author>
<author>
<name sortKey="Somerville, Mj" uniqKey="Somerville M">MJ Somerville</name>
</author>
<author>
<name sortKey="Tchinda, J" uniqKey="Tchinda J">J Tchinda</name>
</author>
<author>
<name sortKey="Valsesia, A" uniqKey="Valsesia A">A Valsesia</name>
</author>
<author>
<name sortKey="Woodwark, C" uniqKey="Woodwark C">C Woodwark</name>
</author>
<author>
<name sortKey="Yang, F" uniqKey="Yang F">F Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Conrad, Df" uniqKey="Conrad D">DF Conrad</name>
</author>
<author>
<name sortKey="Pinto, D" uniqKey="Pinto D">D Pinto</name>
</author>
<author>
<name sortKey="Redon, R" uniqKey="Redon R">R Redon</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Gokcumen, O" uniqKey="Gokcumen O">O Gokcumen</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Aerts, J" uniqKey="Aerts J">J Aerts</name>
</author>
<author>
<name sortKey="Andrews, Td" uniqKey="Andrews T">TD Andrews</name>
</author>
<author>
<name sortKey="Barnes, C" uniqKey="Barnes C">C Barnes</name>
</author>
<author>
<name sortKey="Campbell, P" uniqKey="Campbell P">P Campbell</name>
</author>
<author>
<name sortKey="Fitzgerald, T" uniqKey="Fitzgerald T">T Fitzgerald</name>
</author>
<author>
<name sortKey="Hu, M" uniqKey="Hu M">M Hu</name>
</author>
<author>
<name sortKey="Ihm, Ch" uniqKey="Ihm C">CH Ihm</name>
</author>
<author>
<name sortKey="Kristiansson, K" uniqKey="Kristiansson K">K Kristiansson</name>
</author>
<author>
<name sortKey="Macarthur, Dg" uniqKey="Macarthur D">DG Macarthur</name>
</author>
<author>
<name sortKey="Macdonald, Jr" uniqKey="Macdonald J">JR Macdonald</name>
</author>
<author>
<name sortKey="Onyiah, I" uniqKey="Onyiah I">I Onyiah</name>
</author>
<author>
<name sortKey="Pang, Awc" uniqKey="Pang A">AWC Pang</name>
</author>
<author>
<name sortKey="Robson, S" uniqKey="Robson S">S Robson</name>
</author>
<author>
<name sortKey="Stirrups, K" uniqKey="Stirrups K">K Stirrups</name>
</author>
<author>
<name sortKey="Valsesia, A" uniqKey="Valsesia A">A Valsesia</name>
</author>
<author>
<name sortKey="Walter, K" uniqKey="Walter K">K Walter</name>
</author>
<author>
<name sortKey="Wei, J" uniqKey="Wei J">J Wei</name>
</author>
<author>
<name sortKey="Tyler Smith, C" uniqKey="Tyler Smith C">C Tyler-Smith</name>
</author>
<author>
<name sortKey="Carter, Np" uniqKey="Carter N">NP Carter</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
<author>
<name sortKey="Hurles, Me" uniqKey="Hurles M">ME Hurles</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kidd, Jm" uniqKey="Kidd J">JM Kidd</name>
</author>
<author>
<name sortKey="Cooper, Gm" uniqKey="Cooper G">GM Cooper</name>
</author>
<author>
<name sortKey="Donahue, Wf" uniqKey="Donahue W">WF Donahue</name>
</author>
<author>
<name sortKey="Hayden, Hs" uniqKey="Hayden H">HS Hayden</name>
</author>
<author>
<name sortKey="Sampas, N" uniqKey="Sampas N">N Sampas</name>
</author>
<author>
<name sortKey="Graves, T" uniqKey="Graves T">T Graves</name>
</author>
<author>
<name sortKey="Hansen, N" uniqKey="Hansen N">N Hansen</name>
</author>
<author>
<name sortKey="Teague, B" uniqKey="Teague B">B Teague</name>
</author>
<author>
<name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author>
<name sortKey="Antonacci, F" uniqKey="Antonacci F">F Antonacci</name>
</author>
<author>
<name sortKey="Haugen, E" uniqKey="Haugen E">E Haugen</name>
</author>
<author>
<name sortKey="Zerr, T" uniqKey="Zerr T">T Zerr</name>
</author>
<author>
<name sortKey="Yamada, Na" uniqKey="Yamada N">NA Yamada</name>
</author>
<author>
<name sortKey="Tsang, P" uniqKey="Tsang P">P Tsang</name>
</author>
<author>
<name sortKey="Newman, Tl" uniqKey="Newman T">TL Newman</name>
</author>
<author>
<name sortKey="Tuzun, E" uniqKey="Tuzun E">E Tüzün</name>
</author>
<author>
<name sortKey="Cheng, Z" uniqKey="Cheng Z">Z Cheng</name>
</author>
<author>
<name sortKey="Ebling, Hm" uniqKey="Ebling H">HM Ebling</name>
</author>
<author>
<name sortKey="Tusneem, N" uniqKey="Tusneem N">N Tusneem</name>
</author>
<author>
<name sortKey="David, R" uniqKey="David R">R David</name>
</author>
<author>
<name sortKey="Gillett, W" uniqKey="Gillett W">W Gillett</name>
</author>
<author>
<name sortKey="Phelps, Ka" uniqKey="Phelps K">KA Phelps</name>
</author>
<author>
<name sortKey="Weaver, M" uniqKey="Weaver M">M Weaver</name>
</author>
<author>
<name sortKey="Saranga, D" uniqKey="Saranga D">D Saranga</name>
</author>
<author>
<name sortKey="Brand, A" uniqKey="Brand A">A Brand</name>
</author>
<author>
<name sortKey="Tao, W" uniqKey="Tao W">W Tao</name>
</author>
<author>
<name sortKey="Gustafson, E" uniqKey="Gustafson E">E Gustafson</name>
</author>
<author>
<name sortKey="Mckernan, K" uniqKey="Mckernan K">K McKernan</name>
</author>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
<author>
<name sortKey="Malig, M" uniqKey="Malig M">M Malig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccarroll, Sa" uniqKey="Mccarroll S">SA McCarroll</name>
</author>
<author>
<name sortKey="Kuruvilla, Fg" uniqKey="Kuruvilla F">FG Kuruvilla</name>
</author>
<author>
<name sortKey="Korn, Jm" uniqKey="Korn J">JM Korn</name>
</author>
<author>
<name sortKey="Cawley, S" uniqKey="Cawley S">S Cawley</name>
</author>
<author>
<name sortKey="Nemesh, J" uniqKey="Nemesh J">J Nemesh</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Shapero, Mh" uniqKey="Shapero M">MH Shapero</name>
</author>
<author>
<name sortKey="De Bakker, Piw" uniqKey="De Bakker P">PIW de Bakker</name>
</author>
<author>
<name sortKey="Maller, Jb" uniqKey="Maller J">JB Maller</name>
</author>
<author>
<name sortKey="Kirby, A" uniqKey="Kirby A">A Kirby</name>
</author>
<author>
<name sortKey="Elliott, Al" uniqKey="Elliott A">AL Elliott</name>
</author>
<author>
<name sortKey="Parkin, M" uniqKey="Parkin M">M Parkin</name>
</author>
<author>
<name sortKey="Hubbell, E" uniqKey="Hubbell E">E Hubbell</name>
</author>
<author>
<name sortKey="Webster, T" uniqKey="Webster T">T Webster</name>
</author>
<author>
<name sortKey="Mei, R" uniqKey="Mei R">R Mei</name>
</author>
<author>
<name sortKey="Veitch, J" uniqKey="Veitch J">J Veitch</name>
</author>
<author>
<name sortKey="Collins, Pj" uniqKey="Collins P">PJ Collins</name>
</author>
<author>
<name sortKey="Handsaker, R" uniqKey="Handsaker R">R Handsaker</name>
</author>
<author>
<name sortKey="Lincoln, S" uniqKey="Lincoln S">S Lincoln</name>
</author>
<author>
<name sortKey="Nizzari, M" uniqKey="Nizzari M">M Nizzari</name>
</author>
<author>
<name sortKey="Blume, J" uniqKey="Blume J">J Blume</name>
</author>
<author>
<name sortKey="Jones, Kw" uniqKey="Jones K">KW Jones</name>
</author>
<author>
<name sortKey="Rava, R" uniqKey="Rava R">R Rava</name>
</author>
<author>
<name sortKey="Daly, Mj" uniqKey="Daly M">MJ Daly</name>
</author>
<author>
<name sortKey="Gabriel, Sb" uniqKey="Gabriel S">SB Gabriel</name>
</author>
<author>
<name sortKey="Altshuler, D" uniqKey="Altshuler D">D Altshuler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sebat, J" uniqKey="Sebat J">J Sebat</name>
</author>
<author>
<name sortKey="Lakshmi, B" uniqKey="Lakshmi B">B Lakshmi</name>
</author>
<author>
<name sortKey="Troge, J" uniqKey="Troge J">J Troge</name>
</author>
<author>
<name sortKey="Alexander, J" uniqKey="Alexander J">J Alexander</name>
</author>
<author>
<name sortKey="Young, J" uniqKey="Young J">J Young</name>
</author>
<author>
<name sortKey="Lundin, P" uniqKey="Lundin P">P Lundin</name>
</author>
<author>
<name sortKey="M Ner, S" uniqKey="M Ner S">S Månér</name>
</author>
<author>
<name sortKey="Massa, H" uniqKey="Massa H">H Massa</name>
</author>
<author>
<name sortKey="Walker, M" uniqKey="Walker M">M Walker</name>
</author>
<author>
<name sortKey="Chi, M" uniqKey="Chi M">M Chi</name>
</author>
<author>
<name sortKey="Navin, N" uniqKey="Navin N">N Navin</name>
</author>
<author>
<name sortKey="Lucito, R" uniqKey="Lucito R">R Lucito</name>
</author>
<author>
<name sortKey="Healy, J" uniqKey="Healy J">J Healy</name>
</author>
<author>
<name sortKey="Hicks, J" uniqKey="Hicks J">J Hicks</name>
</author>
<author>
<name sortKey="Ye, K" uniqKey="Ye K">K Ye</name>
</author>
<author>
<name sortKey="Reiner, A" uniqKey="Reiner A">A Reiner</name>
</author>
<author>
<name sortKey="Gilliam, Tc" uniqKey="Gilliam T">TC Gilliam</name>
</author>
<author>
<name sortKey="Trask, B" uniqKey="Trask B">B Trask</name>
</author>
<author>
<name sortKey="Patterson, N" uniqKey="Patterson N">N Patterson</name>
</author>
<author>
<name sortKey="Zetterberg, A" uniqKey="Zetterberg A">A Zetterberg</name>
</author>
<author>
<name sortKey="Wigler, M" uniqKey="Wigler M">M Wigler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pang, Aw" uniqKey="Pang A">AW Pang</name>
</author>
<author>
<name sortKey="Macdonald, Jr" uniqKey="Macdonald J">JR MacDonald</name>
</author>
<author>
<name sortKey="Pinto, D" uniqKey="Pinto D">D Pinto</name>
</author>
<author>
<name sortKey="Wei, J" uniqKey="Wei J">J Wei</name>
</author>
<author>
<name sortKey="Rafiq, Ma" uniqKey="Rafiq M">MA Rafiq</name>
</author>
<author>
<name sortKey="Conrad, Df" uniqKey="Conrad D">DF Conrad</name>
</author>
<author>
<name sortKey="Park, H" uniqKey="Park H">H Park</name>
</author>
<author>
<name sortKey="Hurles, Me" uniqKey="Hurles M">ME Hurles</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Venter, Jc" uniqKey="Venter J">JC Venter</name>
</author>
<author>
<name sortKey="Kirkness, Ef" uniqKey="Kirkness E">EF Kirkness</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abecasis, Gr" uniqKey="Abecasis G">GR Abecasis</name>
</author>
<author>
<name sortKey="Altshuler, D" uniqKey="Altshuler D">D Altshuler</name>
</author>
<author>
<name sortKey="Auton, A" uniqKey="Auton A">A Auton</name>
</author>
<author>
<name sortKey="Brooks, Ld" uniqKey="Brooks L">LD Brooks</name>
</author>
<author>
<name sortKey="Durbin, Rm" uniqKey="Durbin R">RM Durbin</name>
</author>
<author>
<name sortKey="Gibbs, Ra" uniqKey="Gibbs R">RA Gibbs</name>
</author>
<author>
<name sortKey="Hurles, Me" uniqKey="Hurles M">ME Hurles</name>
</author>
<author>
<name sortKey="Mcvean, Ga" uniqKey="Mcvean G">GA McVean</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Singleton, Ab" uniqKey="Singleton A">AB Singleton</name>
</author>
<author>
<name sortKey="Farrer, M" uniqKey="Farrer M">M Farrer</name>
</author>
<author>
<name sortKey="Johnson, J" uniqKey="Johnson J">J Johnson</name>
</author>
<author>
<name sortKey="Singleton, A" uniqKey="Singleton A">A Singleton</name>
</author>
<author>
<name sortKey="Hague, S" uniqKey="Hague S">S Hague</name>
</author>
<author>
<name sortKey="Kachergus, J" uniqKey="Kachergus J">J Kachergus</name>
</author>
<author>
<name sortKey="Hulihan, M" uniqKey="Hulihan M">M Hulihan</name>
</author>
<author>
<name sortKey="Peuralinna, T" uniqKey="Peuralinna T">T Peuralinna</name>
</author>
<author>
<name sortKey="Dutra, A" uniqKey="Dutra A">A Dutra</name>
</author>
<author>
<name sortKey="Nussbaum, R" uniqKey="Nussbaum R">R Nussbaum</name>
</author>
<author>
<name sortKey="Lincoln, S" uniqKey="Lincoln S">S Lincoln</name>
</author>
<author>
<name sortKey="Crawley, A" uniqKey="Crawley A">A Crawley</name>
</author>
<author>
<name sortKey="Hanson, M" uniqKey="Hanson M">M Hanson</name>
</author>
<author>
<name sortKey="Maraganore, D" uniqKey="Maraganore D">D Maraganore</name>
</author>
<author>
<name sortKey="Adler, C" uniqKey="Adler C">C Adler</name>
</author>
<author>
<name sortKey="Cookson, Mr" uniqKey="Cookson M">MR Cookson</name>
</author>
<author>
<name sortKey="Muenter, M" uniqKey="Muenter M">M Muenter</name>
</author>
<author>
<name sortKey="Baptista, M" uniqKey="Baptista M">M Baptista</name>
</author>
<author>
<name sortKey="Miller, D" uniqKey="Miller D">D Miller</name>
</author>
<author>
<name sortKey="Blancato, J" uniqKey="Blancato J">J Blancato</name>
</author>
<author>
<name sortKey="Hardy, J" uniqKey="Hardy J">J Hardy</name>
</author>
<author>
<name sortKey="Gwinn Hardy, K" uniqKey="Gwinn Hardy K">K Gwinn-Hardy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rovelet Lecrux, A" uniqKey="Rovelet Lecrux A">A Rovelet-Lecrux</name>
</author>
<author>
<name sortKey="Hannequin, D" uniqKey="Hannequin D">D Hannequin</name>
</author>
<author>
<name sortKey="Raux, G" uniqKey="Raux G">G Raux</name>
</author>
<author>
<name sortKey="Le Meur, N" uniqKey="Le Meur N">N Le Meur</name>
</author>
<author>
<name sortKey="Laquerriere, A" uniqKey="Laquerriere A">A Laquerrière</name>
</author>
<author>
<name sortKey="Vital, A" uniqKey="Vital A">A Vital</name>
</author>
<author>
<name sortKey="Dumanchin, C" uniqKey="Dumanchin C">C Dumanchin</name>
</author>
<author>
<name sortKey="Feuillette, S" uniqKey="Feuillette S">S Feuillette</name>
</author>
<author>
<name sortKey="Brice, A" uniqKey="Brice A">A Brice</name>
</author>
<author>
<name sortKey="Vercelletto, M" uniqKey="Vercelletto M">M Vercelletto</name>
</author>
<author>
<name sortKey="Dubas, F" uniqKey="Dubas F">F Dubas</name>
</author>
<author>
<name sortKey="Frebourg, T" uniqKey="Frebourg T">T Frebourg</name>
</author>
<author>
<name sortKey="Campion, D" uniqKey="Campion D">D Campion</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, Da" uniqKey="Wheeler D">DA Wheeler</name>
</author>
<author>
<name sortKey="Srinivasan, M" uniqKey="Srinivasan M">M Srinivasan</name>
</author>
<author>
<name sortKey="Egholm, M" uniqKey="Egholm M">M Egholm</name>
</author>
<author>
<name sortKey="Shen, Y" uniqKey="Shen Y">Y Shen</name>
</author>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
<author>
<name sortKey="Mcguire, A" uniqKey="Mcguire A">A McGuire</name>
</author>
<author>
<name sortKey="He, W" uniqKey="He W">W He</name>
</author>
<author>
<name sortKey="Chen, Yj" uniqKey="Chen Y">YJ Chen</name>
</author>
<author>
<name sortKey="Makhijani, V" uniqKey="Makhijani V">V Makhijani</name>
</author>
<author>
<name sortKey="Roth, Gt" uniqKey="Roth G">GT Roth</name>
</author>
<author>
<name sortKey="Gomes, X" uniqKey="Gomes X">X Gomes</name>
</author>
<author>
<name sortKey="Tartaro, K" uniqKey="Tartaro K">K Tartaro</name>
</author>
<author>
<name sortKey="Niazi, F" uniqKey="Niazi F">F Niazi</name>
</author>
<author>
<name sortKey="Turcotte, Cl" uniqKey="Turcotte C">CL Turcotte</name>
</author>
<author>
<name sortKey="Irzyk, Gp" uniqKey="Irzyk G">GP Irzyk</name>
</author>
<author>
<name sortKey="Lupski, Jr" uniqKey="Lupski J">JR Lupski</name>
</author>
<author>
<name sortKey="Chinault, C" uniqKey="Chinault C">C Chinault</name>
</author>
<author>
<name sortKey="Song, Xz" uniqKey="Song X">XZ Song</name>
</author>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Yuan, Y" uniqKey="Yuan Y">Y Yuan</name>
</author>
<author>
<name sortKey="Nazareth, L" uniqKey="Nazareth L">L Nazareth</name>
</author>
<author>
<name sortKey="Qin, X" uniqKey="Qin X">X Qin</name>
</author>
<author>
<name sortKey="Muzny, Dm" uniqKey="Muzny D">DM Muzny</name>
</author>
<author>
<name sortKey="Margulies, M" uniqKey="Margulies M">M Margulies</name>
</author>
<author>
<name sortKey="Weinstock, Gm" uniqKey="Weinstock G">GM Weinstock</name>
</author>
<author>
<name sortKey="Gibbs, Ra" uniqKey="Gibbs R">RA Gibbs</name>
</author>
<author>
<name sortKey="Rothberg, Jm" uniqKey="Rothberg J">JM Rothberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bentley, Dr" uniqKey="Bentley D">DR Bentley</name>
</author>
<author>
<name sortKey="Balasubramanian, S" uniqKey="Balasubramanian S">S Balasubramanian</name>
</author>
<author>
<name sortKey="Swerdlow, Hp" uniqKey="Swerdlow H">HP Swerdlow</name>
</author>
<author>
<name sortKey="Smith, Gp" uniqKey="Smith G">GP Smith</name>
</author>
<author>
<name sortKey="Milton, J" uniqKey="Milton J">J Milton</name>
</author>
<author>
<name sortKey="Brown, Cg" uniqKey="Brown C">CG Brown</name>
</author>
<author>
<name sortKey="Hall, Kp" uniqKey="Hall K">KP Hall</name>
</author>
<author>
<name sortKey="Evers, Dj" uniqKey="Evers D">DJ Evers</name>
</author>
<author>
<name sortKey="Barnes, Cl" uniqKey="Barnes C">CL Barnes</name>
</author>
<author>
<name sortKey="Bignell, Hr" uniqKey="Bignell H">HR Bignell</name>
</author>
<author>
<name sortKey="Boutell, Jm" uniqKey="Boutell J">JM Boutell</name>
</author>
<author>
<name sortKey="Bryant, J" uniqKey="Bryant J">J Bryant</name>
</author>
<author>
<name sortKey="Carter, Rj" uniqKey="Carter R">RJ Carter</name>
</author>
<author>
<name sortKey="Keira Cheetham, R" uniqKey="Keira Cheetham R">R Keira Cheetham</name>
</author>
<author>
<name sortKey="Cox, Aj" uniqKey="Cox A">AJ Cox</name>
</author>
<author>
<name sortKey="Ellis, Dj" uniqKey="Ellis D">DJ Ellis</name>
</author>
<author>
<name sortKey="Flatbush, Mr" uniqKey="Flatbush M">MR Flatbush</name>
</author>
<author>
<name sortKey="Gormley, Na" uniqKey="Gormley N">NA Gormley</name>
</author>
<author>
<name sortKey="Humphray, Sj" uniqKey="Humphray S">SJ Humphray</name>
</author>
<author>
<name sortKey="Irving, Lj" uniqKey="Irving L">LJ Irving</name>
</author>
<author>
<name sortKey="Karbelashvili, Ms" uniqKey="Karbelashvili M">MS Karbelashvili</name>
</author>
<author>
<name sortKey="Kirk, Sm" uniqKey="Kirk S">SM Kirk</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author>
<name sortKey="Maisinger, Ks" uniqKey="Maisinger K">KS Maisinger</name>
</author>
<author>
<name sortKey="Murray, Lj" uniqKey="Murray L">LJ Murray</name>
</author>
<author>
<name sortKey="Obradovic, B" uniqKey="Obradovic B">B Obradovic</name>
</author>
<author>
<name sortKey="Ost, T" uniqKey="Ost T">T Ost</name>
</author>
<author>
<name sortKey="Parkinson, Ml" uniqKey="Parkinson M">ML Parkinson</name>
</author>
<author>
<name sortKey="Pratt, Mr" uniqKey="Pratt M">MR Pratt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mckernan, Kj" uniqKey="Mckernan K">KJ McKernan</name>
</author>
<author>
<name sortKey="Peckham, He" uniqKey="Peckham H">HE Peckham</name>
</author>
<author>
<name sortKey="Costa, Gl" uniqKey="Costa G">GL Costa</name>
</author>
<author>
<name sortKey="Mclaughlin, Sf" uniqKey="Mclaughlin S">SF McLaughlin</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Tsung, Ef" uniqKey="Tsung E">EF Tsung</name>
</author>
<author>
<name sortKey="Clouser, Cr" uniqKey="Clouser C">CR Clouser</name>
</author>
<author>
<name sortKey="Duncan, C" uniqKey="Duncan C">C Duncan</name>
</author>
<author>
<name sortKey="Ichikawa, Jk" uniqKey="Ichikawa J">JK Ichikawa</name>
</author>
<author>
<name sortKey="Lee, Cc" uniqKey="Lee C">CC Lee</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Ranade, Ss" uniqKey="Ranade S">SS Ranade</name>
</author>
<author>
<name sortKey="Dimalanta, Et" uniqKey="Dimalanta E">ET Dimalanta</name>
</author>
<author>
<name sortKey="Hyland, Fc" uniqKey="Hyland F">FC Hyland</name>
</author>
<author>
<name sortKey="Sokolsky, Td" uniqKey="Sokolsky T">TD Sokolsky</name>
</author>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L Zhang</name>
</author>
<author>
<name sortKey="Sheridan, A" uniqKey="Sheridan A">A Sheridan</name>
</author>
<author>
<name sortKey="Fu, H" uniqKey="Fu H">H Fu</name>
</author>
<author>
<name sortKey="Hendrickson, Cl" uniqKey="Hendrickson C">CL Hendrickson</name>
</author>
<author>
<name sortKey="Li, B" uniqKey="Li B">B Li</name>
</author>
<author>
<name sortKey="Kotler, L" uniqKey="Kotler L">L Kotler</name>
</author>
<author>
<name sortKey="Stuart, Jr" uniqKey="Stuart J">JR Stuart</name>
</author>
<author>
<name sortKey="Malek, Ja" uniqKey="Malek J">JA Malek</name>
</author>
<author>
<name sortKey="Manning, Jm" uniqKey="Manning J">JM Manning</name>
</author>
<author>
<name sortKey="Antipova, Aa" uniqKey="Antipova A">AA Antipova</name>
</author>
<author>
<name sortKey="Perez, Ds" uniqKey="Perez D">DS Perez</name>
</author>
<author>
<name sortKey="Moore, Mp" uniqKey="Moore M">MP Moore</name>
</author>
<author>
<name sortKey="Hayashibara, Kc" uniqKey="Hayashibara K">KC Hayashibara</name>
</author>
<author>
<name sortKey="Lyons, Mr" uniqKey="Lyons M">MR Lyons</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teer, Jk" uniqKey="Teer J">JK Teer</name>
</author>
<author>
<name sortKey="Mullikin, Jc" uniqKey="Mullikin J">JC Mullikin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ng, Sb" uniqKey="Ng S">SB Ng</name>
</author>
<author>
<name sortKey="Turner, Eh" uniqKey="Turner E">EH Turner</name>
</author>
<author>
<name sortKey="Robertson, Pd" uniqKey="Robertson P">PD Robertson</name>
</author>
<author>
<name sortKey="Flygare, Sd" uniqKey="Flygare S">SD Flygare</name>
</author>
<author>
<name sortKey="Bigham, Aw" uniqKey="Bigham A">AW Bigham</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Shaffer, T" uniqKey="Shaffer T">T Shaffer</name>
</author>
<author>
<name sortKey="Wong, M" uniqKey="Wong M">M Wong</name>
</author>
<author>
<name sortKey="Bhattacharjee, A" uniqKey="Bhattacharjee A">A Bhattacharjee</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
<author>
<name sortKey="Bamshad, M" uniqKey="Bamshad M">M Bamshad</name>
</author>
<author>
<name sortKey="Nickerson, Da" uniqKey="Nickerson D">DA Nickerson</name>
</author>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hormozdiari, F" uniqKey="Hormozdiari F">F Hormozdiari</name>
</author>
<author>
<name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
<author>
<name sortKey="Sahinalp, Sc" uniqKey="Sahinalp S">SC Sahinalp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
<author>
<name sortKey="Abyzov, A" uniqKey="Abyzov A">A Abyzov</name>
</author>
<author>
<name sortKey="Mu, Xj" uniqKey="Mu X">XJ Mu</name>
</author>
<author>
<name sortKey="Carriero, N" uniqKey="Carriero N">N Carriero</name>
</author>
<author>
<name sortKey="Cayting, P" uniqKey="Cayting P">P Cayting</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karakoc, E" uniqKey="Karakoc E">E Karakoc</name>
</author>
<author>
<name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author>
<name sortKey="O Oak, Bj" uniqKey="O Oak B">BJ O’Roak</name>
</author>
<author>
<name sortKey="Dennis, My" uniqKey="Dennis M">MY Dennis</name>
</author>
<author>
<name sortKey="Vives, L" uniqKey="Vives L">L Vives</name>
</author>
<author>
<name sortKey="Mark, K" uniqKey="Mark K">K Mark</name>
</author>
<author>
<name sortKey="Rieder, Mj" uniqKey="Rieder M">MJ Rieder</name>
</author>
<author>
<name sortKey="Nickerson, Da" uniqKey="Nickerson D">DA Nickerson</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ye, K" uniqKey="Ye K">K Ye</name>
</author>
<author>
<name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author>
<name sortKey="Long, Q" uniqKey="Long Q">Q Long</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
<author>
<name sortKey="Ning, Z" uniqKey="Ning Z">Z Ning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magi, A" uniqKey="Magi A">A Magi</name>
</author>
<author>
<name sortKey="Benelli, M" uniqKey="Benelli M">M Benelli</name>
</author>
<author>
<name sortKey="Yoon, S" uniqKey="Yoon S">S Yoon</name>
</author>
<author>
<name sortKey="Roviello, F" uniqKey="Roviello F">F Roviello</name>
</author>
<author>
<name sortKey="Torricelli, F" uniqKey="Torricelli F">F Torricelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yoon, S" uniqKey="Yoon S">S Yoon</name>
</author>
<author>
<name sortKey="Xuan, Z" uniqKey="Xuan Z">Z Xuan</name>
</author>
<author>
<name sortKey="Makarov, V" uniqKey="Makarov V">V Makarov</name>
</author>
<author>
<name sortKey="Ye, K" uniqKey="Ye K">K Ye</name>
</author>
<author>
<name sortKey="Sebat, J" uniqKey="Sebat J">J Sebat</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chiang, Dy" uniqKey="Chiang D">DY Chiang</name>
</author>
<author>
<name sortKey="Getz, G" uniqKey="Getz G">G Getz</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="O Elly, Mjt" uniqKey="O Elly M">MJT O’Kelly</name>
</author>
<author>
<name sortKey="Zhao, X" uniqKey="Zhao X">X Zhao</name>
</author>
<author>
<name sortKey="Carter, Sl" uniqKey="Carter S">SL Carter</name>
</author>
<author>
<name sortKey="Russ, C" uniqKey="Russ C">C Russ</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Meyerson, M" uniqKey="Meyerson M">M Meyerson</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sathirapongsasuti, Jf" uniqKey="Sathirapongsasuti J">JF Sathirapongsasuti</name>
</author>
<author>
<name sortKey="Lee, H" uniqKey="Lee H">H Lee</name>
</author>
<author>
<name sortKey="Horst, Baj" uniqKey="Horst B">BAJ Horst</name>
</author>
<author>
<name sortKey="Brunner, G" uniqKey="Brunner G">G Brunner</name>
</author>
<author>
<name sortKey="Cochran, Aj" uniqKey="Cochran A">AJ Cochran</name>
</author>
<author>
<name sortKey="Binder, S" uniqKey="Binder S">S Binder</name>
</author>
<author>
<name sortKey="Quackenbush, J" uniqKey="Quackenbush J">J Quackenbush</name>
</author>
<author>
<name sortKey="Nelson, Sf" uniqKey="Nelson S">SF Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krumm, N" uniqKey="Krumm N">N Krumm</name>
</author>
<author>
<name sortKey="Sudmant, Ph" uniqKey="Sudmant P">PH Sudmant</name>
</author>
<author>
<name sortKey="Ko, A" uniqKey="Ko A">A Ko</name>
</author>
<author>
<name sortKey="O Oak, Bj" uniqKey="O Oak B">BJ O’Roak</name>
</author>
<author>
<name sortKey="Malig, M" uniqKey="Malig M">M Malig</name>
</author>
<author>
<name sortKey="Coe, Bp" uniqKey="Coe B">BP Coe</name>
</author>
<author>
<name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author>
<name sortKey="Nickerson, Da" uniqKey="Nickerson D">DA Nickerson</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fromer, M" uniqKey="Fromer M">M Fromer</name>
</author>
<author>
<name sortKey="Moran, Jl" uniqKey="Moran J">JL Moran</name>
</author>
<author>
<name sortKey="Chambert, K" uniqKey="Chambert K">K Chambert</name>
</author>
<author>
<name sortKey="Banks, E" uniqKey="Banks E">E Banks</name>
</author>
<author>
<name sortKey="Bergen, Se" uniqKey="Bergen S">SE Bergen</name>
</author>
<author>
<name sortKey="Ruderfer, Dm" uniqKey="Ruderfer D">DM Ruderfer</name>
</author>
<author>
<name sortKey="Handsaker, Re" uniqKey="Handsaker R">RE Handsaker</name>
</author>
<author>
<name sortKey="Mccarroll, Sa" uniqKey="Mccarroll S">SA McCarroll</name>
</author>
<author>
<name sortKey="O Onovan, Mc" uniqKey="O Onovan M">MC O’Donovan</name>
</author>
<author>
<name sortKey="Owen, Mj" uniqKey="Owen M">MJ Owen</name>
</author>
<author>
<name sortKey="Kirov, G" uniqKey="Kirov G">G Kirov</name>
</author>
<author>
<name sortKey="Sullivan, Pf" uniqKey="Sullivan P">PF Sullivan</name>
</author>
<author>
<name sortKey="Hultman, Cm" uniqKey="Hultman C">CM Hultman</name>
</author>
<author>
<name sortKey="Sklar, P" uniqKey="Sklar P">P Sklar</name>
</author>
<author>
<name sortKey="Purcell, Sm" uniqKey="Purcell S">SM Purcell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Lupat, R" uniqKey="Lupat R">R Lupat</name>
</author>
<author>
<name sortKey="Amarasinghe, Kc" uniqKey="Amarasinghe K">KC Amarasinghe</name>
</author>
<author>
<name sortKey="Thompson, Er" uniqKey="Thompson E">ER Thompson</name>
</author>
<author>
<name sortKey="Doyle, Ma" uniqKey="Doyle M">MA Doyle</name>
</author>
<author>
<name sortKey="Ryland, Gl" uniqKey="Ryland G">GL Ryland</name>
</author>
<author>
<name sortKey="Tothill, Rw" uniqKey="Tothill R">RW Tothill</name>
</author>
<author>
<name sortKey="Halgamuge, Sk" uniqKey="Halgamuge S">SK Halgamuge</name>
</author>
<author>
<name sortKey="Campbell, Ig" uniqKey="Campbell I">IG Campbell</name>
</author>
<author>
<name sortKey="Gorringe, Kl" uniqKey="Gorringe K">KL Gorringe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Olshen, Ab" uniqKey="Olshen A">AB Olshen</name>
</author>
<author>
<name sortKey="Venkatraman, Es" uniqKey="Venkatraman E">ES Venkatraman</name>
</author>
<author>
<name sortKey="Lucito, R" uniqKey="Lucito R">R Lucito</name>
</author>
<author>
<name sortKey="Wigler, M" uniqKey="Wigler M">M Wigler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koboldt, Dc" uniqKey="Koboldt D">DC Koboldt</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
<author>
<name sortKey="Larson, De" uniqKey="Larson D">DE Larson</name>
</author>
<author>
<name sortKey="Shen, D" uniqKey="Shen D">D Shen</name>
</author>
<author>
<name sortKey="Mclellan, Md" uniqKey="Mclellan M">MD McLellan</name>
</author>
<author>
<name sortKey="Lin, L" uniqKey="Lin L">L Lin</name>
</author>
<author>
<name sortKey="Miller, Ca" uniqKey="Miller C">CA Miller</name>
</author>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
<author>
<name sortKey="Ding, L" uniqKey="Ding L">L Ding</name>
</author>
<author>
<name sortKey="Wilson, Rk" uniqKey="Wilson R">RK Wilson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magi, A" uniqKey="Magi A">A Magi</name>
</author>
<author>
<name sortKey="Tattini, L" uniqKey="Tattini L">L Tattini</name>
</author>
<author>
<name sortKey="Pippucci, T" uniqKey="Pippucci T">T Pippucci</name>
</author>
<author>
<name sortKey="Torricelli, F" uniqKey="Torricelli F">F Torricelli</name>
</author>
<author>
<name sortKey="Benelli, M" uniqKey="Benelli M">M Benelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harismendy, O" uniqKey="Harismendy O">O Harismendy</name>
</author>
<author>
<name sortKey="Ng, Pc" uniqKey="Ng P">PC Ng</name>
</author>
<author>
<name sortKey="Strausberg, Rl" uniqKey="Strausberg R">RL Strausberg</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Stockwell, Tb" uniqKey="Stockwell T">TB Stockwell</name>
</author>
<author>
<name sortKey="Beeson, Ky" uniqKey="Beeson K">KY Beeson</name>
</author>
<author>
<name sortKey="Schork, Nj" uniqKey="Schork N">NJ Schork</name>
</author>
<author>
<name sortKey="Murray, Ss" uniqKey="Murray S">SS Murray</name>
</author>
<author>
<name sortKey="Topol, Ej" uniqKey="Topol E">EJ Topol</name>
</author>
<author>
<name sortKey="Levy, S" uniqKey="Levy S">S Levy</name>
</author>
<author>
<name sortKey="Frazer, Ka" uniqKey="Frazer K">KA Frazer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dohm, Jc" uniqKey="Dohm J">JC Dohm</name>
</author>
<author>
<name sortKey="Lottaz, C" uniqKey="Lottaz C">C Lottaz</name>
</author>
<author>
<name sortKey="Borodina, T" uniqKey="Borodina T">T Borodina</name>
</author>
<author>
<name sortKey="Himmelbauer, H" uniqKey="Himmelbauer H">H Himmelbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hillier, Lw" uniqKey="Hillier L">LW Hillier</name>
</author>
<author>
<name sortKey="Marth, Gt" uniqKey="Marth G">GT Marth</name>
</author>
<author>
<name sortKey="Quinlan, Ar" uniqKey="Quinlan A">AR Quinlan</name>
</author>
<author>
<name sortKey="Dooling, D" uniqKey="Dooling D">D Dooling</name>
</author>
<author>
<name sortKey="Fewell, G" uniqKey="Fewell G">G Fewell</name>
</author>
<author>
<name sortKey="Barnett, D" uniqKey="Barnett D">D Barnett</name>
</author>
<author>
<name sortKey="Fox, P" uniqKey="Fox P">P Fox</name>
</author>
<author>
<name sortKey="Glasscock, Ji" uniqKey="Glasscock J">JI Glasscock</name>
</author>
<author>
<name sortKey="Hickenbotham, M" uniqKey="Hickenbotham M">M Hickenbotham</name>
</author>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Magrini, Vj" uniqKey="Magrini V">VJ Magrini</name>
</author>
<author>
<name sortKey="Richt, Rj" uniqKey="Richt R">RJ Richt</name>
</author>
<author>
<name sortKey="Sander, Sn" uniqKey="Sander S">SN Sander</name>
</author>
<author>
<name sortKey="Stewart, Da" uniqKey="Stewart D">DA Stewart</name>
</author>
<author>
<name sortKey="Stromberg, M" uniqKey="Stromberg M">M Stromberg</name>
</author>
<author>
<name sortKey="Tsung, Ef" uniqKey="Tsung E">EF Tsung</name>
</author>
<author>
<name sortKey="Wylie, T" uniqKey="Wylie T">T Wylie</name>
</author>
<author>
<name sortKey="Schedl, T" uniqKey="Schedl T">T Schedl</name>
</author>
<author>
<name sortKey="Wilson, Rk" uniqKey="Wilson R">RK Wilson</name>
</author>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magi, A" uniqKey="Magi A">A Magi</name>
</author>
<author>
<name sortKey="Benelli, M" uniqKey="Benelli M">M Benelli</name>
</author>
<author>
<name sortKey="Marseglia, G" uniqKey="Marseglia G">G Marseglia</name>
</author>
<author>
<name sortKey="Nannetti, G" uniqKey="Nannetti G">G Nannetti</name>
</author>
<author>
<name sortKey="Scordo, Mr" uniqKey="Scordo M">MR Scordo</name>
</author>
<author>
<name sortKey="Torricelli, F" uniqKey="Torricelli F">F Torricelli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benelli, M" uniqKey="Benelli M">M Benelli</name>
</author>
<author>
<name sortKey="Marseglia, G" uniqKey="Marseglia G">G Marseglia</name>
</author>
<author>
<name sortKey="Nannetti, G" uniqKey="Nannetti G">G Nannetti</name>
</author>
<author>
<name sortKey="Paravidino, R" uniqKey="Paravidino R">R Paravidino</name>
</author>
<author>
<name sortKey="Zara, F" uniqKey="Zara F">F Zara</name>
</author>
<author>
<name sortKey="Bricarelli, Fd" uniqKey="Bricarelli F">FD Bricarelli</name>
</author>
<author>
<name sortKey="Torricelli, F" uniqKey="Torricelli F">F Torricelli</name>
</author>
<author>
<name sortKey="Magi, A" uniqKey="Magi A">A Magi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lai, Wr" uniqKey="Lai W">WR Lai</name>
</author>
<author>
<name sortKey="Johnson, Md" uniqKey="Johnson M">MD Johnson</name>
</author>
<author>
<name sortKey="Kucherlapati, R" uniqKey="Kucherlapati R">R Kucherlapati</name>
</author>
<author>
<name sortKey="Park, Pj" uniqKey="Park P">PJ Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stark, M" uniqKey="Stark M">M Stark</name>
</author>
<author>
<name sortKey="Hayward, N" uniqKey="Hayward N">N Hayward</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clark, Mj" uniqKey="Clark M">MJ Clark</name>
</author>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Lam, Hyk" uniqKey="Lam H">HYK Lam</name>
</author>
<author>
<name sortKey="Karczewski, Kj" uniqKey="Karczewski K">KJ Karczewski</name>
</author>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Euskirchen, G" uniqKey="Euskirchen G">G Euskirchen</name>
</author>
<author>
<name sortKey="Butte, Aj" uniqKey="Butte A">AJ Butte</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cooper, Gm" uniqKey="Cooper G">GM Cooper</name>
</author>
<author>
<name sortKey="Coe, Bp" uniqKey="Coe B">BP Coe</name>
</author>
<author>
<name sortKey="Girirajan, S" uniqKey="Girirajan S">S Girirajan</name>
</author>
<author>
<name sortKey="Rosenfeld, Ja" uniqKey="Rosenfeld J">JA Rosenfeld</name>
</author>
<author>
<name sortKey="Vu, Th" uniqKey="Vu T">TH Vu</name>
</author>
<author>
<name sortKey="Baker, C" uniqKey="Baker C">C Baker</name>
</author>
<author>
<name sortKey="Williams, C" uniqKey="Williams C">C Williams</name>
</author>
<author>
<name sortKey="Stalker, H" uniqKey="Stalker H">H Stalker</name>
</author>
<author>
<name sortKey="Hamid, R" uniqKey="Hamid R">R Hamid</name>
</author>
<author>
<name sortKey="Hannig, V" uniqKey="Hannig V">V Hannig</name>
</author>
<author>
<name sortKey="Abdel Hamid, H" uniqKey="Abdel Hamid H">H Abdel-Hamid</name>
</author>
<author>
<name sortKey="Bader, P" uniqKey="Bader P">P Bader</name>
</author>
<author>
<name sortKey="Mccracken, E" uniqKey="Mccracken E">E McCracken</name>
</author>
<author>
<name sortKey="Niyazov, D" uniqKey="Niyazov D">D Niyazov</name>
</author>
<author>
<name sortKey="Leppig, K" uniqKey="Leppig K">K Leppig</name>
</author>
<author>
<name sortKey="Thiese, H" uniqKey="Thiese H">H Thiese</name>
</author>
<author>
<name sortKey="Hummel, M" uniqKey="Hummel M">M Hummel</name>
</author>
<author>
<name sortKey="Alexander, N" uniqKey="Alexander N">N Alexander</name>
</author>
<author>
<name sortKey="Gorski, J" uniqKey="Gorski J">J Gorski</name>
</author>
<author>
<name sortKey="Kussmann, J" uniqKey="Kussmann J">J Kussmann</name>
</author>
<author>
<name sortKey="Shashi, V" uniqKey="Shashi V">V Shashi</name>
</author>
<author>
<name sortKey="Johnson, K" uniqKey="Johnson K">K Johnson</name>
</author>
<author>
<name sortKey="Rehder, C" uniqKey="Rehder C">C Rehder</name>
</author>
<author>
<name sortKey="Ballif, Bc" uniqKey="Ballif B">BC Ballif</name>
</author>
<author>
<name sortKey="Shaffer, Lg" uniqKey="Shaffer L">LG Shaffer</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, R" uniqKey="Li R">R Li</name>
</author>
<author>
<name sortKey="Yu, C" uniqKey="Yu C">C Yu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lam, Tw" uniqKey="Lam T">TW Lam</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
<author>
<name sortKey="Marth, G" uniqKey="Marth G">G Marth</name>
</author>
<author>
<name sortKey="Abecasis, G" uniqKey="Abecasis G">G Abecasis</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mckenna, A" uniqKey="Mckenna A">A McKenna</name>
</author>
<author>
<name sortKey="Hanna, M" uniqKey="Hanna M">M Hanna</name>
</author>
<author>
<name sortKey="Banks, E" uniqKey="Banks E">E Banks</name>
</author>
<author>
<name sortKey="Sivachenko, A" uniqKey="Sivachenko A">A Sivachenko</name>
</author>
<author>
<name sortKey="Cibulskis, K" uniqKey="Cibulskis K">K Cibulskis</name>
</author>
<author>
<name sortKey="Kernytsky, A" uniqKey="Kernytsky A">A Kernytsky</name>
</author>
<author>
<name sortKey="Garimella, K" uniqKey="Garimella K">K Garimella</name>
</author>
<author>
<name sortKey="Altshuler, D" uniqKey="Altshuler D">D Altshuler</name>
</author>
<author>
<name sortKey="Gabriel, S" uniqKey="Gabriel S">S Gabriel</name>
</author>
<author>
<name sortKey="Daly, M" uniqKey="Daly M">M Daly</name>
</author>
<author>
<name sortKey="Depristo, Ma" uniqKey="Depristo M">MA DePristo</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koehler, R" uniqKey="Koehler R">R Koehler</name>
</author>
<author>
<name sortKey="Issac, H" uniqKey="Issac H">H Issac</name>
</author>
<author>
<name sortKey="Cloonan, N" uniqKey="Cloonan N">N Cloonan</name>
</author>
<author>
<name sortKey="Grimmond, Sm" uniqKey="Grimmond S">SM Grimmond</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Genome Biol</journal-id>
<journal-id journal-id-type="iso-abbrev">Genome Biol</journal-id>
<journal-title-group>
<journal-title>Genome Biology</journal-title>
</journal-title-group>
<issn pub-type="ppub">1465-6906</issn>
<issn pub-type="epub">1465-6914</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24172663</article-id>
<article-id pub-id-type="pmc">4053953</article-id>
<article-id pub-id-type="publisher-id">gb-2013-14-10-r120</article-id>
<article-id pub-id-type="doi">10.1186/gb-2013-14-10-r120</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Method</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>EXCAVATOR: detecting copy number variants from whole-exome sequencing data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" equal-contrib="yes" id="A1">
<name>
<surname>Magi</surname>
<given-names>Alberto</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>albertomagi@gmail.com</email>
</contrib>
<contrib contrib-type="author" corresp="yes" equal-contrib="yes" id="A2">
<name>
<surname>Tattini</surname>
<given-names>Lorenzo</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>lorenzotattini@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Cifola</surname>
<given-names>Ingrid</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>ingrid.cifola@itb.cnr.it</email>
</contrib>
<contrib contrib-type="author" id="A4">
<name>
<surname>D’Aurizio</surname>
<given-names>Romina</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>romina.daurizio@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A5">
<name>
<surname>Benelli</surname>
<given-names>Matteo</given-names>
</name>
<xref ref-type="aff" rid="I5">5</xref>
<email>matteo.benelli@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A6">
<name>
<surname>Mangano</surname>
<given-names>Eleonora</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>eleonora.mangano@itb.cnr.it</email>
</contrib>
<contrib contrib-type="author" id="A7">
<name>
<surname>Battaglia</surname>
<given-names>Cristina</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<xref ref-type="aff" rid="I6">6</xref>
<email>cristina.battaglia@unimi.it</email>
</contrib>
<contrib contrib-type="author" id="A8">
<name>
<surname>Bonora</surname>
<given-names>Elena</given-names>
</name>
<xref ref-type="aff" rid="I7">7</xref>
<email>elena.bonora6@unibo.it</email>
</contrib>
<contrib contrib-type="author" id="A9">
<name>
<surname>Kurg</surname>
<given-names>Ants</given-names>
</name>
<xref ref-type="aff" rid="I8">8</xref>
<email>akurg@ebc.ee</email>
</contrib>
<contrib contrib-type="author" id="A10">
<name>
<surname>Seri</surname>
<given-names>Marco</given-names>
</name>
<xref ref-type="aff" rid="I7">7</xref>
<email>marco.seri@unibo.it</email>
</contrib>
<contrib contrib-type="author" id="A11">
<name>
<surname>Magini</surname>
<given-names>Pamela</given-names>
</name>
<xref ref-type="aff" rid="I7">7</xref>
<email>pamela.magini@unibo.it</email>
</contrib>
<contrib contrib-type="author" id="A12">
<name>
<surname>Giusti</surname>
<given-names>Betti</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>betti.giusti@unifi.it</email>
</contrib>
<contrib contrib-type="author" id="A13">
<name>
<surname>Romeo</surname>
<given-names>Giovanni</given-names>
</name>
<xref ref-type="aff" rid="I7">7</xref>
<email>egf.giovanni.romeo@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A14">
<name>
<surname>Pippucci</surname>
<given-names>Tommaso</given-names>
</name>
<xref ref-type="aff" rid="I7">7</xref>
<email>tommaso.pippucci@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A15">
<name>
<surname>De Bellis</surname>
<given-names>Gianluca</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>gianluca.debellis@itb.cnr.it</email>
</contrib>
<contrib contrib-type="author" id="A16">
<name>
<surname>Abbate</surname>
<given-names>Rosanna</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>rosanna.abbate@unifi.it</email>
</contrib>
<contrib contrib-type="author" id="A17">
<name>
<surname>Gensini</surname>
<given-names>Gian Franco</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>gfgensini@gmail.com</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy</aff>
<aff id="I2">
<label>2</label>
Laboratory of Molecular Genetics, G. Gaslini Institute, Genoa, Italy</aff>
<aff id="I3">
<label>3</label>
Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy</aff>
<aff id="I4">
<label>4</label>
Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy</aff>
<aff id="I5">
<label>5</label>
Diagnostic Genetic Unit, Careggi Hospital, Florence, Italy</aff>
<aff id="I6">
<label>6</label>
Dipartimento di Biotecnologie Mediche e Medicina Traslazionale (BIOMETRA), University of Milan, Milan, Italy</aff>
<aff id="I7">
<label>7</label>
Medical Genetics Unit, Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy</aff>
<aff id="I8">
<label>8</label>
Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia</aff>
<pub-date pub-type="ppub">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>30</day>
<month>10</month>
<year>2013</year>
</pub-date>
<volume>14</volume>
<issue>10</issue>
<fpage>R120</fpage>
<lpage>R120</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>6</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>10</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013 Magi et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Magi et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://genomebiology.com/2013/14/10/R120"></self-uri>
<abstract>
<sec>
<title></title>
<p>We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/excavatortool/">http://sourceforge.net/projects/excavatortool/</ext-link>
.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Copy number variants (CNVs) are operationally defined as 50 bp or larger DNA segments [
<xref ref-type="bibr" rid="B1">1</xref>
] that are present at a variable copy number in comparison with a reference genome. CNVs have been demonstrated to be one of the main sources of genomic variation in humans [
<xref ref-type="bibr" rid="B2">2</xref>
-
<xref ref-type="bibr" rid="B10">10</xref>
] and have been shown to participate in phenotypic variation and adaptation by disrupting genes and altering gene dosage. Some CNVs are found in normal individuals, while others contribute to causing various diseases including cancer, cardiovascular disease, HIV acquisition and progression, autoimmune diseases and Alzheimer’s and Parkinson’s diseases [
<xref ref-type="bibr" rid="B11">11</xref>
,
<xref ref-type="bibr" rid="B12">12</xref>
].</p>
<p>In the last few years, several high-throughput sequencing (HTS) platforms [
<xref ref-type="bibr" rid="B13">13</xref>
-
<xref ref-type="bibr" rid="B15">15</xref>
] have emerged that, by simultaneously sequencing billions of short DNA fragments (reads), can be used to sequence a full human genome per week at a cost 400-fold less than previous methods. The development of these HTS platforms has made large-scale re-sequencing projects possible, such as the 1000 Genomes Project and the Cancer Genome Atlas, but their computational complexity still limits the routine use of whole-genome sequencing to individual smaller projects. Whole-exome sequencing (WES), which is the sequencing of all the coding regions of a genome, is a very effective alternative to whole-genome sequencing and has been successfully used to discover common and rare single nucleotide variants (SNVs), small insertions/deletions (indels) and breakpoints of structural variation [
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B17">17</xref>
].</p>
<p>Although WES is a powerful tool for investigating the great majority of genomic variants, it is unsuitable for analyzing CNVs: the sparse nature of the target and the non-uniform read-depth among captured regions make WES data unsuitable for read-pair [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B19">19</xref>
] or split-read [
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
] algorithms and make the read count (RC) approach particularly challenging [
<xref ref-type="bibr" rid="B22">22</xref>
-
<xref ref-type="bibr" rid="B24">24</xref>
]. At present, there are a few publicly available tools that can identify CNVs from WES data using the RC approach: ExomeCNV [
<xref ref-type="bibr" rid="B25">25</xref>
], CoNIFER [
<xref ref-type="bibr" rid="B26">26</xref>
], XHMM [
<xref ref-type="bibr" rid="B27">27</xref>
] and CONTRA [
<xref ref-type="bibr" rid="B28">28</xref>
].</p>
<p>ExomeCNV was the first tool implemented to detect CNVs from WES data. It uses a two-step normalization procedure to mitigate systematic biases due to GC content and mappability, and it estimates copy number values using an uncalibrated read depth. Depending upon batch effects, this can result in the algorithm reporting a significant fraction of the exome as non-diploid. ExomeCNV uses the circular binary segmentation (CBS) algorithm [
<xref ref-type="bibr" rid="B29">29</xref>
] to detect the boundaries of altered regions. CBS does not take into account the distance between adjacent exons and this can lead to it missing large and small genomic alterations in sparsely targeted regions, when applied to WES data [
<xref ref-type="bibr" rid="B30">30</xref>
]. CoNIFER and XHMM exploit singular value decomposition (SVD) and principal-component analysis (PCA) to identify and remove the principal sources of variation underlying the non-uniform read depth of captured regions. The SVD and PCA normalization procedures require the analysis of many samples at once, thus limiting their application to sequencing projects with a large number of samples.</p>
<p>CONTRA uses a base-level log-ratio strategy to remove GC content bias and correct for the library size effect. Nevertheless, it has been demonstrated that the ratio between the RCs of case and control samples is not able to remove GC content bias completely [
<xref ref-type="bibr" rid="B31">31</xref>
]. Moreover, all of these tools classify each genomic region according to a three-state classification scheme (deletion, normal and amplification), which does not discriminate between two- and single-copy deletions and between three- and multiple-copy amplifications, thus limiting the potential of RC data to predict the exact number of DNA copies.</p>
<p>To overcome the limitations of existing methods in detecting genomic regions involved in CNV using WES data, we developed a novel software package, EXCAVATOR (EXome Copy number Alterations/Variations annotATOR), which uses a RC approach. We studied the systematic biases of sequencing data causing the non-uniform read depth of captured regions and we developed a three-step normalization procedure that mitigates the effects of these biases. To take into account the sparseness of WES data throughout the genome, we developed a novel segmentation algorithm that exploits the distances between consecutive exons to improve the detection of small and large altered regions covered by few exons. Finally, we combined our normalization and segmentation methods with a calling procedure to classify each genomic region as one of five discrete copy number states and we packaged everything into the EXCAVATOR software tool.</p>
<p>We tested the EXCAVATOR pipeline by analyzing three different WES datasets: a population dataset generated by the 1000 Genomes Project Consortium and two datasets generated in our labs comprising melanoma cancer and intellectual disability samples. To evaluate its performance, we compared the results obtained by EXCAVATOR with three other state-of-the-art pipelines. Furthermore, we validated the results obtained by EXCAVATOR using copy number profiles generated by SNP array technology, demonstrating its power and versatility for discovering small and large genomic regions involved in CNVs.</p>
</sec>
<sec>
<title>Results and discussion</title>
<sec>
<title>Data biases and correction</title>
<p>To study DNA copy number variations from targeted sequencing data, we consider the mean number of reads aligned to each exon, that is the exon mean read count (EMRC). EMRC is defined as: </p>
<p>
<disp-formula id="bmcM1">
<label>(1)</label>
<mml:math id="M1" name="gb-2013-14-10-r120-i1" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mtext>EMRC</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>RC</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:math>
</disp-formula>
</p>
<p>where RC
<sub>
<italic>e</italic>
</sub>
 is the number of reads aligned to a target genomic region
<italic>e</italic>
 and
<italic>L</italic>
<sub>
<italic>e</italic>
</sub>
 is the size of that same genomic region (in base pairs). EMRC is calculated for each targeted region of the genome and gives a measure of the density of reads aligned to that particular region. To study the statistical properties and the sources of bias of EMRC data we exploited the WES data of eight individuals sequenced by the 1000 Genomes Project Consortium (NA10847, NA19131, NA19138, NA19152, NA19153, NA19159, NA19206 and NA19223); see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details.</p>
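<p>Equation 1 can be illustrated with a minimal Python sketch. The function name and the example read counts and region sizes are hypothetical and are not taken from EXCAVATOR; the sketch only restates the definition of EMRC as the read count of a targeted region divided by its size in base pairs.</p>
<preformat>
```python
def exon_mean_read_count(read_count, exon_length_bp):
    """Return EMRC_e = RC_e / L_e for one targeted region."""
    if not exon_length_bp > 0:
        raise ValueError("exon length must be positive")
    return read_count / exon_length_bp


# Hypothetical targets: (reads aligned to the region, region size in bp).
targets = [(300, 150), (90, 60), (1200, 400)]
emrc = [exon_mean_read_count(rc, length) for rc, length in targets]
print(emrc)  # [2.0, 1.5, 3.0]
```
</preformat>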
<p>First, we studied the relation between EMRC and three bias sources: the local GC content percentage, the genomic mappability and the size of the targeted regions (see Materials and methods for more details). The results of these analyses are shown in Figure 
<xref ref-type="fig" rid="F1">1</xref>
. In agreement with previous reports [
<xref ref-type="bibr" rid="B31">31</xref>
-
<xref ref-type="bibr" rid="B34">34</xref>
], we observed that EMRC is strongly correlated to the local GC content percentage: it is highest for values of GC content between 35% and 60% while it decreases at both extremes (Figure 
<xref ref-type="fig" rid="F1">1</xref>
a). As previously reported for RC data [
<xref ref-type="bibr" rid="B31">31</xref>
], we found that EMRCs are affected by genomic mappability: the larger the mappability score, the smaller the EMRC distribution variance. Moreover, mappability affects the mean number of aligned reads (Figure 
<xref ref-type="fig" rid="F1">1</xref>
c). Interestingly, our analysis indicated that the mean number of reads aligned to a targeted region of the genome is correlated to the size of that region. In particular, for exons smaller than 150 bp, we found that the EMRC value grows as a function of targeted region size, while for exons larger than 150 bp, EMRC reaches a plateau and remains constant (Figure 
<xref ref-type="fig" rid="F1">1</xref>
e). These results show that EMRC data require a normalization step before being used to detect genomic regions involved in CNVs.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>EMRC data biases, normalization and CNV prediction ability. </bold>
<bold>(a)</bold>
,
<bold>(c)</bold>
,
<bold>(e)</bold>
Correlation between EMRC data and the three bias types due to GC content percentage
<bold>(a)</bold>
, genomic mappability
<bold>(c)</bold>
 and exon size
<bold>(e)</bold>
.
<bold>(b)</bold>
,
<bold>(d)</bold>
,
<bold>(f)</bold>
The effect of the median normalization procedures on the removal of the three bias sources: GC content percentage correction
<bold>(b)</bold>
, genomic mappability correction
<bold>(d)</bold>
and exon size correction
<bold>(f)</bold>
. The upper border of the dashed lines is the 90th percentile of the EMRCs, while the lower border is the 10th percentile.
<bold>(g)</bold>
,
<bold>(h)</bold>
,
<bold>(i)</bold>
,
<bold>(j)</bold>
Histograms and boxplots summarizing the capability of EMRC data to predict the exact number of DNA copies of a CNV region.
<bold>(g)</bold>
 and
<bold>(i)</bold>
 show the prediction capability for single-sample EMRC data, while
<bold>(h)</bold>
 and
<bold>(j)</bold>
 are the prediction capability for the EMRC ratio. EMRC ratios were calculated by using the NA10847 sample as control. These calculations were performed using several broad genomic regions that were previously reported to have copy numbers equal to 0, 1, 2, 3 and 4 by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] in the eight samples from the 1000 Genomes Project.
<italic>R</italic>
 is the Pearson correlation coefficient. CNV, copy number variant; EMRC, exon mean read count.</p>
</caption>
<graphic xlink:href="gb-2013-14-10-r120-1"></graphic>
</fig>
<p>To minimize the effect of these sources of variation and make the data within and between samples comparable, we implemented a three-step bias removal procedure based on the median normalization approach introduced in [
<xref ref-type="bibr" rid="B23">23</xref>
] for the removal of the GC content effect and extended in [
<xref ref-type="bibr" rid="B31">31</xref>
] for mitigating mappability bias (see Materials and methods for more details). To evaluate the performance of the median normalization procedures described in the Materials and methods section, we applied them to the WES data of the eight samples generated by the 1000 Genomes Project Consortium. The normalized data shown in Figure 
<xref ref-type="fig" rid="F1">1</xref>
b,d,f demonstrate that median normalization approaches are able to mitigate the effect of all three bias sources, equalizing the mean level of each bin to the same master mean.</p>
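<p>The idea of median normalization can be sketched as follows. This is a simplified illustration and not the EXCAVATOR implementation: it assumes that each exon has already been assigned to a covariate bin (for example a rounded GC content percentage) and it rescales every value by the ratio between the global median and the median of its bin. The function name, the binning and the example values are assumptions made for illustration only.</p>
<preformat>
```python
from collections import defaultdict
from statistics import median


def median_normalize(emrc, covariate_bin):
    """Scale each value so that every covariate bin shares the global median."""
    global_median = median(emrc)
    by_bin = defaultdict(list)
    for value, b in zip(emrc, covariate_bin):
        by_bin[b].append(value)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    # A bin whose median is zero cannot be rescaled; it is left untouched.
    return [
        value * global_median / bin_median[b] if bin_median[b] else value
        for value, b in zip(emrc, covariate_bin)
    ]


# Hypothetical EMRC values with GC content rounded to coarse bins (percent).
emrc = [1.0, 1.2, 2.0, 2.4, 0.5, 0.7]
gc_bin = [30, 30, 50, 50, 70, 70]
normalized = median_normalize(emrc, gc_bin)
```
</preformat>
<p>Under the same assumptions, a three-step procedure would call the function once per bias source, passing in turn bins built from GC content, mappability and exon size.</p>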
<p>Since the first exon of each gene is GC richer than the final and internal exons, this bias can affect the detection of CNVs that include first exons. To investigate the capability of our normalization procedure to mitigate the first exon effect, we compared the distribution of EMRC values for first and all other exons before and after the normalization step. The results of this analysis are reported in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S1. As expected, the mean level of EMRC values for first exons is smaller than EMRC values for internal and final exons. Nevertheless, normalization allows for the removal of this difference, equalizing the mean levels of EMRC values for first exons and all other exons. Next, to understand the capability of EMRC data to predict the exact DNA copy number values of a genomic region, we examined several broad genomic regions that were previously reported to have copy numbers equal to 0, 1, 2, 3 or 4 by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] for the eight samples (see Materials and methods). In this analysis we compared the distribution and the CNV prediction capability for both single-sample EMRC data and the ratio between EMRC data from two samples.</p>
<p>The histograms in Figure 
<xref ref-type="fig" rid="F1">1</xref>
g show that for single-sample data (with the median normalized to copy number two), the EMRC distributions for genomic regions with different DNA copy number states have a significant overlap and completely fail to predict the exact number of copies, as shown in Figure 
<xref ref-type="fig" rid="F1">1</xref>
i, where the Pearson correlation coefficient calculated between the real and predicted DNA copy number values is
<italic>R </italic>
= 0.19. On the other hand, the EMRC ratio between two samples allows for a better discrimination of genomic regions with different numbers of DNA copies, as illustrated in Figures 
<xref ref-type="fig" rid="F1">1</xref>
h and 1j, where the Pearson correlation coefficient between the real and predicted DNA copy number values is
<italic>R </italic>
= 0.80. Remarkably, as shown in Figure 
<xref ref-type="fig" rid="F1">1</xref>
j, normalized EMRC ratios can distinguish even between intermediate CN ratios, such as 2/3, 3/4, 4/5 and 3/2, 4/3, 4/2, despite their overlapping distributions. For these reasons, in all the analyses we performed for this work, we decided to use the ratio between EMRC data from test and control samples to identify genomic regions involved in CNVs: in particular, we chose to use the log-transformed ratio (log
<sub>2</sub>
ratio) between test and control samples normalized with the LOWESS scatter plot normalization procedure (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details).</p>
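<p>A minimal sketch of the test-to-control ratio is given below. It assumes that both samples have already been bias corrected, centres each sample on its own median and omits the LOWESS step, so it is only an approximation of the procedure described above. Function names and input values are hypothetical, and exons with zero coverage would have to be filtered out beforehand because their logarithm is undefined.</p>
<preformat>
```python
import math
from statistics import median


def log2_ratio(test_emrc, control_emrc):
    """Per-exon log2(test/control) after centring each sample on its median."""
    test_med = median(test_emrc)
    control_med = median(control_emrc)
    return [
        math.log2((t / test_med) / (c / control_med))
        for t, c in zip(test_emrc, control_emrc)
    ]


def implied_copy_number(ratio_log2, control_copies=2):
    """Copy number implied by a log2 ratio when the control has two copies."""
    return control_copies * 2 ** ratio_log2


test = [2.0, 2.0, 1.0, 2.0, 3.0]      # hypothetical normalized EMRC values
control = [2.0, 2.0, 2.0, 2.0, 2.0]
ratios = log2_ratio(test, control)
copies = [round(implied_copy_number(r), 2) for r in ratios]
```
</preformat>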
</sec>
<sec>
<title>Segmentation and calling algorithms</title>
<p>After EMRC bias correction, we calculated the logarithm of the ratio between test and control samples (log
<sub>2 </sub>
ratio) and we sorted the data with respect to their genomic position. The obtained signal is mathematically very similar to those generated by RC analysis [
<xref ref-type="bibr" rid="B31">31</xref>
]: deletions (or amplifications) are identified as a signal decrease (or increase) across multiple consecutive targeted regions. For this reason, as in RC data analysis, the log
<sub>2 </sub>
ratios of EMRC data need to undergo a segmentation step to detect the boundaries of the genomic regions with altered DNA copy number. The only difference between RC and EMRC data is the distance between consecutive genomic regions: RCs are estimated for non-overlapping and contiguous genomic windows with predefined lengths, while EMRCs are calculated for genomic windows (corresponding to targeted regions) with different sizes and variable distances. The distance between consecutive exons within the same gene ranges from a few base pairs to 100 kb (with a median value of 1,500 bp), while the distance between consecutive genes (calculated as the distance between the final exon of a gene and the first exon of the subsequent gene) ranges from hundreds of base pairs to millions of base pairs (with a median value of 25 kb). For this reason, we can find genomic regions comprising a large number of exons as well as highly isolated genomic regions with few exons using the log
<sub>2 </sub>
ratio of EMRC profiles.</p>
<p>To take into account this peculiar characteristic of EMRC data, we extended the shifting level model (SLM) segmentation algorithm [
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B35">35</xref>
] to include the distance between consecutive exons (defined as the distance between the midpoints of consecutive exons). In SLM, sequential observations
<italic>x </italic>
= (
<italic>x</italic>
<sub>1</sub>
,…,
<italic>x</italic>
<sub>
<italic>i</italic>
</sub>
,…,
<italic>x</italic>
<sub>
<italic>N</italic>
</sub>
) are considered to be realizations of the sum of two independent stochastic processes
<italic>x</italic>
<sub>
<italic>i </italic>
</sub>
=
<italic>m</italic>
<sub>
<italic>i </italic>
</sub>
+
<italic>ε</italic>
<sub>
<italic>i</italic>
</sub>
, where
<italic>m</italic>
<sub>
<italic>i </italic>
</sub>
is the unobserved mean level and
<italic>ε</italic>
<sub>
<italic>i </italic>
</sub>
is normally distributed white noise. The mean level
<italic>m</italic>
<sub>
<italic>i </italic>
</sub>
does not change for long intervals and its duration follows a geometric distribution: the probability that
<italic>m</italic>
<sub>
<italic>i </italic>
</sub>
takes a new value at any point
<italic>i </italic>
is regulated by the parameter
<italic>η</italic>
. We included the distance between consecutive exons (
<italic>d</italic>
<sub>
<italic>i</italic>
</sub>
) in the SLM by defining the parameter
<italic>η </italic>
as: </p>
<p>
<disp-formula id="bmcM2">
<label>(2)</label>
<mml:math id="M2" name="gb-2013-14-10-r120-i2" overflow="scroll">
<mml:mo>η</mml:mo>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mtext>exp</mml:mtext>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>Norm</mml:mtext>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
</mml:math>
</disp-formula>
</p>
<p>where
<italic>d</italic>
<sub>Norm </sub>
is a distance normalization parameter. We thus obtained a heterogeneous shifting level model (HSLM) in which as the genomic distance between consecutive exons increases, so does the probability of jumping from one state to another. This feature allows the HSLM algorithm to detect both highly isolated genomic regions covered by few exons and large genomic regions covered by many exons with a comparable accuracy. A detailed description of the heterogeneous shifting level model and its algorithm is given in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
.</p>
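<p>The generative side of a shifting level model with a distance-dependent jump probability can be sketched as below. This is not the HSLM algorithm and it does not implement Equation 2: the function jump_probability is a hypothetical monotone map that only reproduces the stated property that larger gaps between exons make a change of mean level more likely, and drawing each new mean level from a zero-centred Gaussian is a further simplifying assumption. All names and parameter values are illustrative.</p>
<preformat>
```python
import math
import random


def jump_probability(distance_bp, d_norm, eta_max=0.5):
    """Illustrative map from exon distance to jump probability.

    NOT Equation 2 of the paper: a stand-in that grows with distance
    and saturates at eta_max.
    """
    return eta_max * (1.0 - math.exp(-distance_bp / d_norm))


def simulate_levels(distances_bp, d_norm, sigma_mu=1.0, sigma_eps=0.2, seed=0):
    """Draw x_i = m_i + eps_i with a distance-dependent chance of a new m_i."""
    rng = random.Random(seed)
    mean_level = 0.0
    observations = []
    for d in distances_bp:
        if jump_probability(d, d_norm) > rng.random():
            # Assumption: a new mean level is drawn from N(0, sigma_mu).
            mean_level = rng.gauss(0.0, sigma_mu)
        observations.append(mean_level + rng.gauss(0.0, sigma_eps))
    return observations


# Hypothetical gaps (bp) to the previous exon; the fourth one spans two genes.
gaps = [200, 1500, 800, 2_000_000, 300]
signal = simulate_levels(gaps, d_norm=1e5)
```
</preformat>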
<p>Once the log
<sub>2 </sub>
ratios have been segmented with the HSLM algorithm, each segment needs to be classified as a discrete copy number state. As reported in the Background section, all of the recently published tools can classify genomic regions using a three-state classification scheme (deletion, normal and amplification), which limits the potential of RC data to predict two-copy deletions and multiple-copy amplifications. To overcome these limitations, we decided to exploit the FastCall algorithm [
<xref ref-type="bibr" rid="B36">36</xref>
], which we developed to classify array-CGH (comparative genomic hybridization) data, by applying it to WES data. The FastCall algorithm can classify each segmented region using a five-state classification scheme (two-copy deletion, one-copy deletion, normal, one-copy duplication and multiple-copy amplification) and thus we can discriminate double-copy from single-copy deletions and single-copy from multiple-copy duplications (see Materials and methods for more details). All the algorithms and methods described above have been packaged in the EXCAVATOR software (see Materials and methods).</p>
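<p>To make the five-state scheme concrete, the sketch below labels a segment with the state whose expected log2 ratio is nearest to the segment mean. This naive nearest-level rule is a stand-in chosen for illustration and is not the FastCall algorithm. The expected levels assume a two-copy control, and the floor of -3.0 used for the two-copy deletion state is an arbitrary assumption, because the log2 ratio of zero copies is undefined.</p>
<preformat>
```python
import math

# Expected log2 ratios against a two-copy control. Zero copies would give
# minus infinity, so a floor of -3.0 is used here (an arbitrary assumption).
EXPECTED_LEVELS = [
    (-3.0, "two-copy deletion"),
    (math.log2(1 / 2), "one-copy deletion"),
    (math.log2(2 / 2), "normal"),
    (math.log2(3 / 2), "one-copy duplication"),
    (math.log2(4 / 2), "multiple-copy amplification"),
]


def call_state(segment_mean_log2):
    """Label a segment with the state whose expected level is nearest."""
    nearest = min(EXPECTED_LEVELS, key=lambda lv: abs(lv[0] - segment_mean_log2))
    return nearest[1]


# Hypothetical segment means produced by a segmentation step.
calls = [call_state(m) for m in (-4.2, -0.9, 0.05, 0.55, 1.7)]
```
</preformat>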
<p>To test the ability of the HSLM algorithm to detect CNVs of different sizes as a function of the distance between consecutive exons, we performed an intensive simulation based on synthetic data. Synthetic chromosomes were generated from the EMRC data of the eight samples described above and previously characterized by [
<xref ref-type="bibr" rid="B7">7</xref>
]: there were seven samples of Yoruba ancestry (NA19131, NA19138, NA19152, NA19153, NA19159, NA19206 and NA19223) and one sample of Caucasian ancestry (NA10847). The EMRC data were first corrected for the three bias sources and then the EMRC log
<sub>2 </sub>
ratio was calculated using each possible combination with one sample as control and the other seven samples as tests. To reproduce the complex architecture of exome data, we generated synthetic chromosomes using synthetic genes as building blocks. Each synthetic gene, with the exception of
<italic>g </italic>
genes (the altered genes), has a random number of exons sampled from a uniform distribution
<italic>U </italic>
(5,100) (that is, the number of exons ranges from 5 to 100). The number of exons in the altered genes is defined by the integer parameter
<italic>N </italic>
and the total number of exons in each synthetic chromosome is constrained to be 1,000. The distances between adjacent exons that belong to the same gene are sampled from a uniform distribution
<italic>U </italic>
(10,10000) (ranging from 10 to 10,000 bp), while the distance between adjacent genes is set equal to a predefined distance
<italic>D</italic>
. The DNA copy number values of each synthetic chromosome were generated by exploiting the results reported in [
<xref ref-type="bibr" rid="B7">7</xref>
]. To simulate normal copy regions, we sampled (1000 −
<italic>N</italic>
) log<sub>2</sub> ratio data from genomic regions previously predicted as two-copy in [
<xref ref-type="bibr" rid="B7">7</xref>
] for both test and control samples and to simulate one-copy (three-copy) regions, we sampled
<italic>N </italic>
 log<sub>2</sub> ratio data from regions previously predicted as one-copy (three-copy) for the test sample and two-copy for the control sample.</p>
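<p>The layout of one synthetic chromosome can be sketched as follows. The sketch covers only the exon layout (gene sizes, gaps within and between genes, and which exons are altered); sampling the signal values from real data is not reproduced. The function name is hypothetical, and two details are assumptions that the text does not specify: altered genes are placed at random positions, and the last normal gene is truncated if needed to reach exactly 1,000 exons.</p>
<preformat>
```python
import random


def synthetic_chromosome(g, n_altered, gene_gap_bp, total_exons=1000, seed=0):
    """Return (distances, altered) lists with one entry per exon.

    distances[i] is the gap in bp to the previous exon (0 for the first exon);
    altered[i] marks exons that belong to one of the g altered genes.
    """
    rng = random.Random(seed)
    # Fill the exons left over after the altered genes with normal genes
    # whose sizes are drawn from U(5, 100), bounds included.
    genes = []
    remaining = total_exons - g * n_altered
    while remaining > 0:
        size = min(rng.randint(5, 100), remaining)  # last gene may be truncated
        genes.append((size, False))
        remaining -= size
    genes += [(n_altered, True)] * g
    rng.shuffle(genes)  # random placement of altered genes: an assumption

    distances, altered = [], []
    for size, is_altered in genes:
        for k in range(size):
            if not distances:
                distances.append(0)                       # first exon
            elif k == 0:
                distances.append(gene_gap_bp)             # gap between genes (D)
            else:
                distances.append(rng.randint(10, 10000))  # gap within a gene
            altered.append(is_altered)
    return distances, altered


distances, altered = synthetic_chromosome(g=3, n_altered=10, gene_gap_bp=100_000)
```
</preformat>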
<p>We performed simulations with
<italic>g </italic>
= [ 1,2,3,4,5],
<italic>N </italic>
= [ 2,3,5,10,20,50] and
<italic>D </italic>
= [10 kb, 50 kb, 100 kb, 500 kb, 1 Mb, 5 Mb] and for all combinations of
<italic>g</italic>
,
<italic>N </italic>
and
<italic>D </italic>
we generated 1,000 synthetic chromosomes: all the synthetic datasets were analyzed using different values of the parameter
<italic>D</italic>
<sub>Norm </sub>
(10
<sup>3</sup>
, 10
<sup>4</sup>
, 10
<sup>5 </sup>
or 10
<sup>6</sup>
).</p>
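<p>The size of this simulation can be checked with a short enumeration. The variable names are hypothetical; the values are those listed above, and the last count assumes that every synthetic chromosome is segmented once for each value of the distance normalization parameter.</p>
<preformat>
```python
from itertools import product

G_VALUES = [1, 2, 3, 4, 5]                       # altered genes per chromosome
N_VALUES = [2, 3, 5, 10, 20, 50]                 # exons per altered gene
D_VALUES = [10e3, 50e3, 100e3, 500e3, 1e6, 5e6]  # distance between genes, bp
D_NORM_VALUES = [1e3, 1e4, 1e5, 1e6]             # distance normalization
CHROMOSOMES_PER_SETTING = 1000

settings = list(product(G_VALUES, N_VALUES, D_VALUES))
n_chromosomes = len(settings) * CHROMOSOMES_PER_SETTING
n_segmentation_runs = n_chromosomes * len(D_NORM_VALUES)
print(len(settings), n_chromosomes, n_segmentation_runs)  # 180 180000 720000
```
</preformat>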
<p>To assess the accuracy of HSLM in detecting CNVs at the boundaries (breakpoint detection) we computed the receiver operating characteristic (ROC) curve as in [
<xref ref-type="bibr" rid="B37">37</xref>
] and we compared its performance to that of the circular binary segmentation (CBS) algorithm [
<xref ref-type="bibr" rid="B29">29</xref>
], which has been used in other traditional packages for exome-CNV analysis, such as ExomeCNV [
<xref ref-type="bibr" rid="B25">25</xref>
] and VarScan2 [
<xref ref-type="bibr" rid="B30">30</xref>
]. The results of these analyses are summarized in Figure 
<xref ref-type="fig" rid="F2">2</xref>
a,b,c,d and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figures S2 to S49. Overall they show that our segmentation algorithm outperforms the CBS method in both sensitivity and specificity for all the alteration sizes we simulated. Panels c and d of Figure 
<xref ref-type="fig" rid="F2">2</xref>
also show that the larger the number of altered regions in a chromosome, the lower the accuracy of the CBS method. On the other hand, increasing the number of altered regions in a chromosome does not affect the global performance of HSLM. Remarkably, synthetic analysis indicates there is a difference in the accuracy of detection of genomic regions with one copy and three copies. Both CBS and HSLM detect one-copy regions with higher sensitivity than three-copy regions and this behavior can be ascribed to two main reasons. The first is numerical: the signal shift for three-copy regions (log2(3/2) = 0.58) is smaller than the signal shift for one-copy regions (log2(1/2) = − 1) and the segmentation algorithms are sensitive to the extent of this shift. The second reason lies in the fact that the variance of RC data is lower for deleted states (zero or one copy) and it proportionally increases with copy number values [
<xref ref-type="bibr" rid="B23">23</xref>
]: the larger the variance, the smaller the sensitivity of segmentation algorithms in detecting signal shifts.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Performance evaluation of the HSLM algorithm for detecting CNVs in synthetic chromosomes. </bold>
<bold>(a)</bold>
,
<bold>(b)</bold>
 ROC curves comparing the sensitivity and specificity of the HSLM and CBS algorithms in the detection of one-copy
<bold>(a)</bold>
 and three-copy CNVs
<bold>(b)</bold>
.
<bold>(c)</bold>
,
<bold>(d)</bold>
 Comparison of the HSLM and CBS algorithms when analyzing synthetic chromosomes with different numbers (
<italic>g</italic>
 = [ 1,2,3,4,5]) of one-copy
<bold>(c)</bold>
 and three-copy
<bold>(d)</bold>
 genes.
<bold>(e)</bold>
,
<bold>(f)</bold>
 Performance of the HSLM
<bold>(e)</bold>
 and CBS
<bold>(f)</bold>
 algorithms in detecting the correct breakpoint position. The
<italic>x</italic>
 axis is the distance between the predicted and the correct position. The
<italic>y</italic>
 axis is the percentage of breakpoints predicted at a given distance from the correct position.
<bold>(g)</bold>
,
<bold>(h)</bold>
,
<bold>(i)</bold>
,
<bold>(j)</bold>
 TPR and FP plots for different values of the
<italic>D</italic>
<sub>Norm</sub>
 parameter versus exon number in the segmented region.
<bold>(g)</bold>
 and
<bold>(h)</bold>
 show TPR and FP when analyzing one-copy regions.
<bold>(i)</bold>
 and
<bold>(j)</bold>
 are TPR and FP when analyzing three-copy regions. Each curve point was obtained by averaging across 5,000 simulations (1,000 synthetic chromosomes for
<italic>g</italic>
 = [ 1,2,3,4,5]). CBS, circular binary segmentation; CNV, copy number variant; FPR, false positive rate; HSLM, heterogeneous shifting level model; TPR, true positive rate.</p>
</caption>
<graphic xlink:href="gb-2013-14-10-r120-2"></graphic>
</fig>
<p>As a further test, to assess the ability of our segmentation algorithm to identify the exact breakpoint of a CNV region correctly, for each synthetic chromosome we calculated the distance (in exons) between the predicted and the correct breakpoint positions and we compared its performance with CBS. The results of these analyses are shown in the histograms of Figure 
<xref ref-type="fig" rid="F2">2</xref>
e,f, which show that HSLM correctly detects the exact position of 94% of the breakpoints on synthetic chromosomes, while CBS predicts the exact position of only 50% of them. Finally, to evaluate the capability of the HSLM and FastCall procedures in discovering CNVs, we adopted the method reported in [
<xref ref-type="bibr" rid="B23">23</xref>
] and [
<xref ref-type="bibr" rid="B22">22</xref>
]: a detected segment is considered a true positive (TP) if there is at least a 50% overlap between the detected segment and the synthetic altered region, while it is considered a false positive (FP) if there is no overlap with a synthetic altered region. Moreover, to better investigate the FP events detected by HSLM we generated synthetic chromosomes with no altered regions (
<italic>g </italic>
= 0). The true positive rate (TPR) and false positive (FP) plots reported in Figure 
<xref ref-type="fig" rid="F2">2</xref>
g,h,i,j and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figures S50 to S56 show that the larger the distance between adjacent genes (
<italic>D</italic>
) the higher the sensitivity of HSLM in detecting genomic alterations. This feature is a direct consequence of how we modeled the parameter
<italic>η </italic>
(
<italic>d</italic>
<sub>
<italic>i</italic>
</sub>
) of the HSLM (the larger the genomic distance
<italic>D</italic>
 the larger the probability of jumping from one mean level
<italic>m</italic>
<sub>
<italic>i </italic>
</sub>
to another
<italic>m</italic>
<sub>
<italic>i </italic>
+ 1</sub>
) and this allows our algorithm to detect both highly isolated genomic regions covered by few exons and large genomic regions covered by many exons with a comparable accuracy. For genomic distances
<italic>D </italic>
smaller than 500 kb, we were able to detect one-copy regions with ten exons (TPR = 0.99) and three-copy regions with 20 exons (TPR = 0.8), while for
<italic>D </italic>
≥ 1 Mb we detected one-copy regions with three exons (TPR = 0.95) and three-copy regions with ten exons (TPR = 0.8). Finally, the analysis of the synthetic chromosomes demonstrated that the
<italic>D</italic>
<sub>Norm </sub>
parameter is fundamental for modulating the resolution of our algorithm. As expected, the results shown in Figure 
<xref ref-type="fig" rid="F2">2</xref>
and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figures S50 to S55 show that the smaller the value of
<italic>D</italic>
<sub>Norm </sub>
the stronger the ability of HSLM to detect small genomic events. On the other hand, small values of the
<italic>D</italic>
<sub>Norm </sub>
also increase the total number of FP events detected. However, in terms of specificity, our method detected a very small number of FP events, the great majority of them (96%) being events that include fewer than five exons (see panels h and j of Figure 
<xref ref-type="fig" rid="F2">2</xref>
and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S56).</p>
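<p>The TP and FP definitions used above can be sketched as follows. This is an illustrative reading, not the EXCAVATOR implementation: segments are assumed to be half-open intervals in exon coordinates, the 50% overlap is assumed to be measured relative to the detected segment, and the label "partial" for segments with a smaller non-zero overlap is a hypothetical name introduced here.</p>
<preformat>
```python
def overlap_length(a, b):
    """Length of the intersection of two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_segment(detected, altered_regions, min_fraction=0.5):
    """Label a detected segment as "TP", "FP" or "partial".

    Assumption: the overlap fraction is taken relative to the length of
    the detected segment.
    """
    detected_length = detected[1] - detected[0]
    best = max((overlap_length(detected, r) for r in altered_regions), default=0)
    if best == 0:
        return "FP"       # no overlap with any synthetic altered region
    if best / detected_length >= min_fraction:
        return "TP"       # at least 50% of the segment is truly altered
    return "partial"      # some overlap, but below the threshold


altered = [(12, 30)]  # one synthetic altered region, in exon indices
print(classify_segment((10, 20), altered))
print(classify_segment((0, 5), altered))
print(classify_segment((0, 5), []))  # chromosome with g = 0
```
</preformat>
<p>With no altered regions (g = 0), every detected segment is counted as an FP, which is how such chromosomes isolate the false positive behavior.</p>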
</sec>
<sec>
<title>Population data analysis</title>
<p>To show the potential of our analysis pipeline for population genomics studies, we applied EXCAVATOR to the WES data of 20 healthy individuals (seven Utah residents (CEU) with ancestry from Northern and Western Europe, seven Japanese people (JPT) from Tokyo and six Yoruba people (YRI) from Ibadan) using the WES data of an individual of Yoruba ancestry as control (see Table 
<xref ref-type="table" rid="T1">1</xref>
). The table shows the total number of samples used as tests and controls, the enrichment kit used to capture coding sequences, the sequencing platform and the sequencing depth obtained for test and control samples.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Summary statistics of the three datasets analyzed in this paper</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left" valign="bottom">
<bold>Cohort</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Test</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Control</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Capture</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>HTS</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Mean depth</bold>
<hr></hr>
</th>
<th align="center" valign="bottom">
<bold>Mean depth</bold>
<hr></hr>
</th>
</tr>
<tr>
<th align="left"> </th>
<th align="center">
<bold>samples</bold>
</th>
<th align="center">
<bold>samples</bold>
</th>
<th align="center">
<bold>version</bold>
</th>
<th align="center">
<bold>platform</bold>
</th>
<th align="center">
<bold>on tests</bold>
</th>
<th align="center">
<bold>on controls</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">1000 Genomes Project
<hr></hr>
</td>
<td align="center" valign="bottom">20
<hr></hr>
</td>
<td align="center" valign="bottom">1
<hr></hr>
</td>
<td align="center" valign="bottom">SureSelect
<hr></hr>
</td>
<td align="center" valign="bottom">HiSeq2000
<hr></hr>
</td>
<td align="center" valign="bottom">83 ×
<hr></hr>
</td>
<td align="center" valign="bottom">107 ×
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">All Exon V2
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Melanoma
<hr></hr>
</td>
<td align="center" valign="bottom">6
<hr></hr>
</td>
<td align="center" valign="bottom">6
<hr></hr>
</td>
<td align="center" valign="bottom">SureSelect
<hr></hr>
</td>
<td align="center" valign="bottom">GA IIx
<hr></hr>
</td>
<td align="center" valign="bottom">45 ×
<hr></hr>
</td>
<td align="center" valign="bottom">41 ×
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">All Exon 50 Mb
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Intellectual disability
<hr></hr>
</td>
<td align="center" valign="bottom">2
<hr></hr>
</td>
<td align="center" valign="bottom">1
<hr></hr>
</td>
<td align="center" valign="bottom">TruSeq
<hr></hr>
</td>
<td align="center" valign="bottom">HiSeq2000
<hr></hr>
</td>
<td align="center" valign="bottom">63 ×
<hr></hr>
</td>
<td align="center" valign="bottom">65 ×
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center">Exome enrichment</td>
<td align="center"> </td>
<td align="center"> </td>
<td align="center"> </td>
</tr>
</tbody>
</table>
</table-wrap>
<p>According to the Fort Lauderdale principle for the use of unpublished data for method development, we give only the CNV regions detected on chromosome 1 and chromosome 4. Globally we detected 101 CNV events (with a median number of five CNV regions per sample), with a minimum of two regions for the NA12760 sample and a maximum of eight regions for the NA10847 sample. The mean size of these regions was approximately 135 kb, with a minimum size of approximately 5 kb in 11 samples (NA10847, NA11840, NA12717, NA12751, NA12760, NA18959, NA18973, NA19138, NA19159, NA19206 and NA19223) and a maximum size of approximately 900 kb in eight samples (NA10847, NA12249, NA12717, NA12751, NA12761, NA18973, NA18959 and NA18981). The complete list of the CNVs detected on chromosomes 1 and 4 is given in Additional file 
<xref ref-type="supplementary-material" rid="S2">2</xref>
: Table S1.</p>
<p>To evaluate the accuracy of our computational approach, we analyzed the data for the 20 healthy individuals using the other three recently published methods for CNV calling from WES data: ExomeCNV, CoNIFER and XHMM (see Materials and methods for analysis settings). As reported in the Background section, the performance of SVD and PCA methods depends on concurrently analyzing many samples, so that systematic noise becomes evident and can subsequently be removed. For this reason, to improve the accuracy of CoNIFER and XHMM, we ran these two tools on an extended set obtained by adding 80 extra samples to the 20 used with EXCAVATOR and ExomeCNV (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details). Globally we observed that the total number of CNV events detected by each of the three tools was very different (Table 
<xref ref-type="table" rid="T2">2</xref>
). On chromosomes 1 and 4 of the 20 individuals, CoNIFER detected only 9 CNV regions, XHMM 55 CNVs, while ExomeCNV identified 1,791 events (Table 
<xref ref-type="table" rid="T2">2</xref>
). Of the 9 CNV regions detected by CoNIFER, 6 (67%) are present in only one sample (rare variants) while 3 (33%) are shared by more than one sample (common variants). XHMM detected 12 rare CNVs (21.8%, 12/55) and 43 common variants (78.2%). On the other hand, the great majority of the CNV events detected by EXCAVATOR and ExomeCNV are common variants: rare variants account for about 10% of the EXCAVATOR calls (10/101) and about 5.5% of the ExomeCNV calls (99/1,791). The larger proportion of rare events detected by CoNIFER and XHMM could be related to the normalization methods that are the basis of these two computational pipelines: singular value decomposition (SVD) for CoNIFER and principal component analysis (PCA) for XHMM. PCA and SVD are eigenvalue methods used to reduce a high-dimensional dataset into fewer dimensions while retaining important information. CoNIFER and XHMM use them to determine and filter out the principal components of systematic noise. This filtering strategy can lead to the removal of common CNV signals, thus explaining the preferential detection of rare events by these methods. Conversely, ExomeCNV and EXCAVATOR analyze and normalize one sample at a time and do not suffer from this bias.</p>
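<p>The proportions of rare variants quoted above follow directly from the event counts. A minimal sketch of the arithmetic is given below; the dictionary and function names are hypothetical and serve only to reproduce the calculation.</p>
<preformat>
```python
# Event counts quoted in the text: tool -> (all events, rare events).
EVENT_COUNTS = {
    "EXCAVATOR": (101, 10),
    "XHMM": (55, 12),
    "CoNIFER": (9, 6),
    "ExomeCNV": (1791, 99),
}


def rare_percentage(total_events, rare_events):
    """Percentage of detected events that occur in a single sample."""
    return 100.0 * rare_events / total_events


for tool, (total, rare) in EVENT_COUNTS.items():
    print(f"{tool}: {rare_percentage(total, rare):.1f}% rare")
```
</preformat>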
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>
<bold>Summary of the CNV events detected by the four tools in the population data analysis</bold>
<sup>
<bold>a</bold>
</sup>
</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Sample</bold>
</th>
<th align="center">
<bold>EXCAVATOR</bold>
</th>
<th align="center">
<bold>XHMM</bold>
</th>
<th align="center">
<bold>CoNIFER</bold>
</th>
<th align="center">
<bold>ExomeCNV</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">NA10847
<hr></hr>
</td>
<td align="center" valign="bottom">8 (6-2)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">125 (122-3)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA11840
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (2-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">124 (122-2)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA12249
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">128 (128-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA12717
<hr></hr>
</td>
<td align="center" valign="bottom">6 (6-0)
<hr></hr>
</td>
<td align="center" valign="bottom">4 (4-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">113 (113-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA12751
<hr></hr>
</td>
<td align="center" valign="bottom">7 (5-2)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (2-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">119 (118-1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA12760
<hr></hr>
</td>
<td align="center" valign="bottom">2 (2-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (2-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">126 (126-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA12761
<hr></hr>
</td>
<td align="center" valign="bottom">4 (2-2)
<hr></hr>
</td>
<td align="center" valign="bottom">4 (3-1)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">206 (173-33)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18959
<hr></hr>
</td>
<td align="center" valign="bottom">6 (6-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (1-1)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">149 (134-15)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18966
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">5 (4-1)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">39 (35-4)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18967
<hr></hr>
</td>
<td align="center" valign="bottom">5 (5-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (1-1)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">21 (21-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18970
<hr></hr>
</td>
<td align="center" valign="bottom">4 (4-0)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">24 (24-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18973
<hr></hr>
</td>
<td align="center" valign="bottom">7 (7-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">91 (91-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18981
<hr></hr>
</td>
<td align="center" valign="bottom">5 (4-1)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">100 (99-1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA18999
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (2-0)
<hr></hr>
</td>
<td align="center" valign="bottom">0 (0-0)
<hr></hr>
</td>
<td align="center" valign="bottom">229 (196-33)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19131
<hr></hr>
</td>
<td align="center" valign="bottom">8 (6-2)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (3-0)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (1-1)
<hr></hr>
</td>
<td align="center" valign="bottom">30 (30-0)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19138
<hr></hr>
</td>
<td align="center" valign="bottom">5 (5-0)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (2-1)
<hr></hr>
</td>
<td align="center" valign="bottom">1 (1-0)
<hr></hr>
</td>
<td align="center" valign="bottom">48 (46-2)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19153
<hr></hr>
</td>
<td align="center" valign="bottom">5 (5-0)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (1-2)
<hr></hr>
</td>
<td align="center" valign="bottom">1 (0-1)
<hr></hr>
</td>
<td align="center" valign="bottom">26 (25-1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19159
<hr></hr>
</td>
<td align="center" valign="bottom">4 (4-0)
<hr></hr>
</td>
<td align="center" valign="bottom">5 (2-3)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (0-2)
<hr></hr>
</td>
<td align="center" valign="bottom">28 (27-1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19206
<hr></hr>
</td>
<td align="center" valign="bottom">6 (5-1)
<hr></hr>
</td>
<td align="center" valign="bottom">3 (2-1)
<hr></hr>
</td>
<td align="center" valign="bottom">1 (0-1)
<hr></hr>
</td>
<td align="center" valign="bottom">35 (33-2)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">NA19223
<hr></hr>
</td>
<td align="center" valign="bottom">7 (7-0)
<hr></hr>
</td>
<td align="center" valign="bottom">4 (3-1)
<hr></hr>
</td>
<td align="center" valign="bottom">2 (1-1)
<hr></hr>
</td>
<td align="center" valign="bottom">30 (29-1)
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Total</td>
<td align="center">101 (91-10)</td>
<td align="center">55 (43-12)</td>
<td align="center">9 (3-6)</td>
<td align="center">1,791 (1,692-99)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a</sup>
For each sample, columns show the number of all CNV events (common-rare) identified by each tool.</p>
</table-wrap-foot>
</table-wrap>
<p>To validate the results obtained by the four methods, we calculated the overlap between the four sets of genomic events and the known CNVs annotated in the Database of Genomic Variants (DGV) and in the NCBI dbVar. For each of the four algorithms, the overlap analysis was performed on all the discovered CNVs together, and on rare and common variants separately. The comparison between the calls of the four algorithms and the CNVs in DGV and dbVar was performed using two different overlap criteria: a region was considered validated if its overlap with a known CNV was greater than 10% (criterion A) or 50% (criterion B).</p>
<p>The results of these analyses are summarized in Figure 
<xref ref-type="fig" rid="F3">3</xref>
a,b,c,d. In the all-CNV and common-CNV analyses, EXCAVATOR and CoNIFER obtained the best validation rates against both DGV and dbVar under both overlap criteria, followed by XHMM and ExomeCNV. For the rare CNV analysis, CoNIFER obtained the best validation rates, followed by EXCAVATOR, XHMM and ExomeCNV. As a further step, to evaluate the sensitivity and the specificity of the four methods, we compared the four sets of calls with the CNVs previously reported by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] and Conrad
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B5">5</xref>
] in the 20 samples included in our study. Also in this case, all the comparison analyses took into account all the discovered CNVs and rare and common variants separately. Using microarray techniques, McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] detected 100 CNV events (96 common CNVs and 4 rare CNVs) overlapping coding regions (with at least three exons) on chromosomes 1 and 4 of these 20 samples, while Conrad
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B5">5</xref>
] detected 120 events (116 common and 4 rare). Of the CNV regions reported by McCarroll
<italic>et al.</italic>
, 12 out of 100, and 76 out of the 120 reported by Conrad
<italic>et al.</italic>
, could not be detected by EXCAVATOR and ExomeCNV, since the test and control samples had the same DNA copy number values for those regions. For this reason, we used the whole reference set of CNVs used by McCarroll
<italic>et al. </italic>
and Conrad
<italic>et al. </italic>
to validate the CoNIFER and XHMM results, while EXCAVATOR and ExomeCNV were validated using a reduced dataset with variants having the same copy number status in the test and control samples filtered out. The two reference sets allowed us to evaluate the precision (
<italic>P</italic>
) and recall (
<italic>R</italic>
) obtained by the four tools. For each reference set, the precision was calculated as the ratio between the number of correctly detected events (the intersection between the tool calls and the validation set calls) and the total number of events detected by a tool. The recall was calculated as the ratio between the number of correctly detected events and the total number of events in the validation set.</p>
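<p>The precision and recall definitions above, together with the F-measure (the harmonic mean of precision and recall), can be written down compactly. The sketch below is illustrative: the function names are hypothetical and the counts in the example are invented, not results from this study.</p>
<preformat>
```python
def precision_recall(n_correct, n_called, n_reference):
    """Precision and recall from event counts.

    n_correct   -- events called by the tool that are in the reference set
    n_called    -- all events called by the tool
    n_reference -- all events in the reference (validation) set
    """
    precision = n_correct / n_called if n_called else 0.0
    recall = n_correct / n_reference if n_reference else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Invented example: 50 calls, 40 of them correct, 80 reference events.
p, r = precision_recall(40, 50, 80)
print(p, r, round(f_measure(p, r), 3))
```
</preformat>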
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Summary of the results obtained by EXCAVATOR on the 1000 Genomes Project samples. </bold>
<bold>(a)</bold>
,
<bold>(b)</bold>
,
<bold>(c)</bold>
,
<bold>(d)</bold>
 Overlap between the set of CNVs detected by the four methods and the CNVs annotated in the DGV
<bold>(a, b)</bold>
 and in the NCBI dbVar
<bold>(c, d)</bold>
 with the two overlapping criteria: 10%
<bold>(a, c)</bold>
 and 50%
<bold>(b, d)</bold>
.
<bold>(e)</bold>
,
<bold>(f)</bold>
,
<bold>(g)</bold>
 Precision-recall plots of the comparison between the CNV events detected by the four methods included in this comparison and the CNVs previously reported by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] and Conrad
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B5">5</xref>
]. Light grey curves represent
<italic>F</italic>
-measure levels (harmonic mean of precision and recall).
<bold>(e)</bold>
 Results for all variants.
<bold>(f)</bold>
 Results for common CNVs.
<bold>(g)</bold>
 Results for rare CNVs.</p>
</caption>
<graphic xlink:href="gb-2013-14-10-r120-3"></graphic>
</fig>
<p>The results obtained by the four methods for the all variants (Figure 
<xref ref-type="fig" rid="F3">3</xref>
e) and common variants (Figure 
<xref ref-type="fig" rid="F3">3</xref>
f) validations are very similar. In the McCarroll dataset, CoNIFER obtained excellent results for precision followed by EXCAVATOR, XHMM and ExomeCNV. ExomeCNV was the best for recall, followed by EXCAVATOR, XHMM and CoNIFER. The high recall rate obtained by ExomeCNV is due to the large number of CNV events (see Table 
<xref ref-type="table" rid="T2">2</xref>
) detected by this tool. However, the precision for this method is very low since only a very small fraction of the 1,791 events overlap with the McCarroll dataset. In the Conrad dataset, all the methods gave poor results with the exception of our computational pipeline: EXCAVATOR outperformed the other three software packages for both precision and recall.</p>
<p>For the rare variants analysis, we observed that the PCA-based approach performs well with the McCarroll dataset (Figure 
<xref ref-type="fig" rid="F3">3</xref>
g). CoNIFER obtained high precision and moderate recall, while XHMM obtained high recall and moderate precision. On the other hand, EXCAVATOR gave very poor results: it was not able to identify any of the rare events of the McCarroll dataset, and only two out of the ten rare events detected by our method overlap with the McCarroll dataset. Conversely, for the Conrad dataset, our pipeline achieved the best trade-off between precision and recall while the other three methods completely failed the validation analysis. Taken as a whole, these results highlight that EXCAVATOR outperforms the other state-of-the-art methods considered in this comparison.</p>
</sec>
<sec>
<title>Melanoma data analysis</title>
<p>To evaluate the power of our computational approach for cancer genomics studies, we used EXCAVATOR to analyze six metastatic melanoma cell lines derived from metastatic tumor biopsies of stage IV melanoma patients, using six blood samples from healthy donors as controls (Table 
<xref ref-type="table" rid="T1">1</xref>
). Here, we aimed to test our pipeline with respect to some typical major challenges of cancer genomics analyses, such as the ability to analyze widely rearranged karyotypes, with many different copy number alterations (CNAs) that often result in significant sample diversity. Given these issues, the detection of CNAs in tumor samples and the correct quantification of their DNA copies can be particularly challenging.</p>
<p>To evaluate the accuracy and resolution of WES data in discovering CNAs of different kinds and sizes, we also performed genomic profiling of the same 12 samples using the Affymetrix 250K SNP Array platform. For each segmented region, we compared the log
<sub>2 </sub>
ratio median values obtained from WES and the SNP array and calculated their global correlation over the whole dataset. This calculation was performed considering all the segmented regions or progressively filtering out regions smaller than a threshold (which we set at 100 kb, 500 kb or 1 Mb). The results of this correlation analysis are shown in the central panels of Figure 
<xref ref-type="fig" rid="F4">4</xref>
. A strong correlation between the SNP array and WES results (
<italic>R </italic>
= 0.85) was observed for segmented regions larger than 1 Mb. Conversely, considering progressively smaller genomic regions, the correlation between the two platforms drastically decreased mainly due to the different distributions of the SNP probes and exons throughout the genome. This was confirmed by comparing the number of Affymetrix SNP probes and the number of exons that cover each segmented region (Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S57): segmented regions larger than 1 Mb comprise a comparable number of SNP probes and exons (
<italic>R </italic>
= 0.8), while segmented regions smaller than 100 kb do not (
<italic>R </italic>
 = −0.02).</p>
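<p>The size-filtered correlation analysis can be sketched as follows. This is a minimal illustration, assuming that each segmented region carries its coordinates and the median log2 ratio from both platforms, and that the reported R is a Pearson coefficient; the field names, function names and toy values are hypothetical.</p>
<preformat>
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_above_size(segments, min_size_bp=0):
    """Correlate WES and SNP array medians over segments of at least min_size_bp."""
    kept = [s for s in segments if s["end"] - s["start"] >= min_size_bp]
    wes = [s["wes_log2"] for s in kept]
    snp = [s["snp_log2"] for s in kept]
    return pearson_r(wes, snp)


# Toy segments (invented values): three large regions and one small region.
SEGMENTS = [
    {"start": 0, "end": 2_000_000, "wes_log2": -1.0, "snp_log2": -0.6},
    {"start": 0, "end": 1_500_000, "wes_log2": 0.6, "snp_log2": 0.4},
    {"start": 0, "end": 1_200_000, "wes_log2": 0.0, "snp_log2": 0.0},
    {"start": 0, "end": 50_000, "wes_log2": 0.5, "snp_log2": -0.1},
]

for threshold in (0, 100_000, 500_000, 1_000_000):
    print(threshold, round(correlation_above_size(SEGMENTS, threshold), 3))
```
</preformat>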
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Summary of the results obtained by EXCAVATOR on the melanoma dataset.</bold>
 The Circos plot summarizes all the CNV regions detected in each of the six samples by both exome-seq and SNP array analysis. On each chromosome, melanoma samples are vertically ordered (Me01, Me02, Me04, Me05, Me08, Me12), with two tracks (WES and SNP array) for each. Central panels show the global correlation calculated between the log2 ratio median values obtained from the two technologies, when considering all the segmented regions
<bold>(a)</bold>
 or segmented regions larger than 100 kb
<bold>(b)</bold>
, 500 kb
<bold>(c)</bold>
 or 1 Mb
<bold>(d)</bold>
. CNV regions are distinguished by color as two-copy deletions (red), one-copy deletions (orange), one-copy amplifications (light green) and multiple-copy amplifications (dark green). CNV, copy number variant.</p>
</caption>
<graphic xlink:href="gb-2013-14-10-r120-4"></graphic>
</fig>
<p>Another important feature emerging from this correlation analysis is the larger dynamic range provided by WES data: for genomic regions larger than 100 kb we found that the slope of the regression line was greater than 1 and it had a maximum value of 1.5 for regions larger than 1 Mb, thus indicating that over the whole dataset WES data can detect and quantify a wider range of copy number values with respect to SNP array data. The higher dynamic range of WES data is a documented advantage of this technology, which improves the ability of segmentation algorithms to detect signal shifts and the ability of calling algorithms to quantify the correct number of DNA copies. This feature is particularly relevant in cancer genomics analysis, where sample heterogeneity often hampers the detection of CNAs and the correct quantification of their DNA copy number. This is evident also in the melanoma dataset: the Circos plot (Figure 
<xref ref-type="fig" rid="F4">4</xref>
) shows all the CNAs called by WES and SNP array, for each tumor sample (for complete lists see Additional file 
<xref ref-type="supplementary-material" rid="S3">3</xref>
: Table S2 for WES and Additional file 
<xref ref-type="supplementary-material" rid="S4">4</xref>
: Table S3 for SNP array results). Although the genomic aberrations found here were globally consistent with the typical well-known melanoma signature, it is readily apparent that on some chromosomes WES and SNP array data returned different results.</p>
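<p>The dynamic range argument rests on the slope of the regression line between the two platforms. The sketch below shows how such a slope can be estimated by ordinary least squares, assuming that the WES medians are regressed on the SNP array medians; the function name and the toy values are hypothetical and were chosen only to yield a slope greater than 1.</p>
<preformat>
```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Invented medians for three regions: loss, neutral, gain.
snp_log2 = [-0.5, 0.0, 0.5]
wes_log2 = [-0.75, 0.0, 0.75]

# A slope above 1 means the WES signal spans a wider range than the array signal.
print(least_squares_slope(snp_log2, wes_log2))  # 1.5
```
</preformat>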
<p>All these results are directly related to the different dynamic range and sensitivity peculiar to these two technologies. For many chromosomes across the six tumor samples, WES data called one-copy deletions or one-copy amplifications where SNP array data returned a normal copy number state. In these cases, as shown for chromosomes 4, 7, 10 and 17 in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figures S57 to S60, the copy number data derived from both technologies showed a shift from the normal diploidy baseline. However, the WES data resulted in a greater shift than SNP array, thus allowing the classification of a region as CNA by the calling algorithm. The same phenomenon explains why, in cases where both technologies detected exactly the same CNA in terms of boundaries, the WES data was able to call multiple-copy amplifications whereas SNP array data called only one-copy gains, as seen on chromosomes 1, 5, 7 and 9 in the Circos plot (Figure 
<xref ref-type="fig" rid="F4">4</xref>
). Overall, these data demonstrated that, particularly when dealing with cancer samples, the wider dynamic range provided by WES data can be used to obtain a greater sensitivity and, consequently, a better discrimination and quantification of CNAs. Considering these properties, the combination of WES data with the EXCAVATOR pipeline improves the detection of CNAs and, consequently, the identification of potentially interesting genes affected by genomic imbalances that may deserve further investigations as candidate cancer genes. Indeed, as a proof of principle confirming the potential of our method, we observed that on chromosome 7, in three samples (Me04, Me08 and Me12), both WES and SNP array data detected the one-copy gain of a q arm typical of a melanoma signature and encompassing the
<italic>BRAF </italic>
locus on 7q34 (chr7:140433813-140624564), already known to be affected by genomic amplifications in melanoma cell lines [
<xref ref-type="bibr" rid="B38">38</xref>
]. In addition, EXCAVATOR called such a one-copy gain also in Me02 (whereas SNP array data called a normal diploidy over the whole chromosome), and a multiple-copy amplification in Me01 and Me05, where SNP array data showed only a one-copy gain. Moreover, as examples of known melanoma genes typically affected by deletions, our computational pipeline applied on WES data identified a one-copy loss in two samples (Me01 and Me04) covering the whole chromosome 10 and including the
<italic>PTEN </italic>
locus on 10q23.31 (chr10:89623195-89728532), which SNP array data completely missed. Similarly, on chromosome 17p, while for Me08 both WES and SNP array data detected a one-copy loss spanning over the
<italic>TP53 </italic>
locus on 17p13.1 (chr17:7571720-7590868), WES data were able to identify such a deletion also in Me02, whereas SNP array data returned a diploid state. These two genes are well-known tumor suppressor genes and are frequently affected by one-copy deletions in up to 40% of melanoma cell lines [
<xref ref-type="bibr" rid="B38">38</xref>
]. Such situations are visually noticeable in the Circos plot of Figure 
<xref ref-type="fig" rid="F4">4</xref>
and are reported in detail in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figures S58 to S61.</p>
<p>As a final step, since ExomeCNV was purposely developed and calibrated on cancer data, we compared its performance with that of EXCAVATOR in the analysis of the six metastatic melanoma cell lines using the six blood samples from healthy donors as controls (see Materials and methods for analysis settings). The results produced by ExomeCNV clearly indicate an overestimation of CNV events: for almost all melanoma samples, the algorithm detected more than 2 Gb of altered regions (1,950 Mb for Me01, 2,302 Mb for Me02, 2,318 Mb for Me04, 2,168 Mb for Me05, 2,265 Mb for Me08 and 2,168 Mb for Me12). This overestimation of non-diploid regions, distributed over most of the exome, arises because ExomeCNV estimates DNA copy number values using an uncalibrated read depth. Overall, these results strongly suggest that EXCAVATOR offers novel and potentially useful improvements and opportunities for cancer genomics.</p>
</sec>
<sec>
<title>Intellectual disability data analysis</title>
<p>To demonstrate the ability of our computational pipeline to detect genomic alterations involved in mental retardation, we performed whole-exome sequencing of two siblings with an intellectual disability (ID1 and ID2); see Table 
<xref ref-type="table" rid="T1">1</xref>
. To show the flexibility of our computational pipeline in combining and analyzing data generated by different laboratories, we used, as control, the WES data of a healthy individual of European descent sequenced by [
<xref ref-type="bibr" rid="B39">39</xref>
] (see Materials and methods and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details). The data were analyzed using EXCAVATOR with default parameters and the results of this analysis are shown in Additional file 
<xref ref-type="supplementary-material" rid="S5">5</xref>
: Table S4 and summarized in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S62.</p>
<p>For autosomal chromosomes, EXCAVATOR detected 29 CNV regions in the ID1 sample and 24 CNV regions in the ID2 sample, ranging from 3 kb to 1 Mb in size. To distinguish putative pathogenic CNVs from normal copy number polymorphisms, we assessed the overlap between our calls and the known CNVs annotated in the Database of Genomic Variants (DGV) using a 50% overlap criterion. We found that 22 out of 29 and 17 out of 24 regions overlap with DGV for the ID1 and ID2 samples, respectively. The CNV regions that do not overlap with DGV range from 26 kb to 1 Mb in size. In this set of CNVs, we found a large deletion on chromosome 2q11.1-2q11.2 (chr2:96780257-97833468), which is shared by the two siblings and was confirmed in both using the Affymetrix GeneChip SNP6.0 Array (Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S63). By interrogating the ISCA database [
<xref ref-type="bibr" rid="B40">40</xref>
], we found recurrent rearrangements involving this region that are indicated as pathogenic in cases with developmental delay. Seven ISCA deletions had an 87% to 100% overlap with the deletion found by EXCAVATOR and six of them were reported to be associated with ID, autism or general developmental delay, with both a
<italic>de novo </italic>
origin and parental inheritance and different pathogenetic roles (Additional file 
<xref ref-type="supplementary-material" rid="S6">6</xref>
: Table S5). Interestingly, the same genomic region (chr2:96726273-97676273) was found at a very low frequency in cases affected by developmental delay (2/15,767), while it never occurred in controls (0/8,329) [
<xref ref-type="bibr" rid="B41">41</xref>
].</p>
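<p>The 50% overlap criterion used above can be illustrated with a minimal Python sketch. The function names, the tuple layout and the coordinates are hypothetical, and measuring the overlap as a fraction of the call length (rather than reciprocally) is an assumption, since the text does not specify it.</p>
<preformat>
```python
def overlap_fraction(call, known):
    """Fraction of the call (chrom, start, end) covered by one known CNV."""
    chrom_a, start_a, end_a = call
    chrom_b, start_b, end_b = known
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    return max(shared, 0) / (end_a - start_a)


def overlaps_known_cnv(call, known_cnvs, threshold=0.5):
    """True if any known CNV covers at least `threshold` of the call."""
    return any(overlap_fraction(call, k) >= threshold for k in known_cnvs)


# Hypothetical coordinates, for illustration only (half-open intervals).
known = [("chr1", 1000, 2000), ("chr2", 500, 900)]
print(overlaps_known_cnv(("chr1", 1400, 2400), known))  # 600 of 1000 bases covered
print(overlaps_known_cnv(("chr1", 1900, 2900), known))  # 100 of 1000 bases covered
```
</preformat>
<p>Calls that fail this test against every annotated variant would form the candidate set of CNVs not described in the reference database.</p>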
<p>Within this deleted region, 21 NCBI RefSeq genes (
<italic>ADRA2B</italic>
,
<italic>ANKRD23</italic>
,
<italic>ANKRD36</italic>
,
<italic>ANKRD39</italic>
,
<italic>ARID5A</italic>
,
<italic>ASTL</italic>
,
<italic>WDR39</italic>
,
<italic>CNNM4</italic>
,
<italic>CNNM3</italic>
,
<italic>DUSP2</italic>
,
<italic>FAHD2B</italic>
,
<italic>FAM178B</italic>
,
<italic>FER1L5</italic>
,
<italic>ITPRIPL1</italic>
,
<italic>KANSL</italic>
,
<italic>LMAN2L</italic>
,
<italic>NCAPH</italic>
,
<italic>SEMA4C</italic>
,
<italic>SNRNP200</italic>
,
<italic>STARD7</italic>
and
<italic>TMEM127</italic>
) have been mapped. Moreover, 13 genes are recorded in the On-line Mendelian Inheritance in Man (OMIM) [
<xref ref-type="bibr" rid="B42">42</xref>
] catalog, some of which are associated with congenital disorders distinct from ID. Other genes are putative candidates to be defective in ID or neurodevelopmental delay:
<italic>ADRA2B</italic>
(alpha-2B-adrenergic receptor, MIM 104260) is one of the three highly homologous alpha-2-adrenergic receptors having a critical role in regulating neurotransmitter release from sympathetic nerves and from adrenergic neurons in the central nervous system and
<italic>ARID5A</italic>
(AT-rich interaction domain-containing protein 5A, MIM 61153) is a member of the ARID protein family, which might play important roles in development.</p>
<p>Overall, the detection of a recurrent 2q11.1-2q11.2 deletion in the two siblings affected by ID demonstrated that EXCAVATOR is a suitable tool for widely screening the exomes of ID patients, even for low-frequency CNVs. It also added a piece of information that possibly implicates this genomic region in susceptibility to neurocognitive defects.</p>
<p>Finally, we used the two ID samples to compare the performance of our pipeline with that of the methods mentioned in the Background section (see Materials and methods for analysis settings). Tests are described in the Population data analysis section. CoNIFER and XHMM were not able to identify any genomic regions involved in CNVs, thus confirming their limitations in analyzing small datasets comprising few samples. On the other hand, ExomeCNV detected 200 Mb (269 CNVs ranging from 36 Mb to 1 kb) and 342 Mb (245 CNVs ranging from 40 Mb to 1 kb) of genomic regions involved in CNV for the ID1 and ID2 samples, respectively. As discussed above, these results can be ascribed to the discrepancy in the total sequence read count between the case and control samples. Taken as a whole, these results show the uniqueness of our tool in the analysis of WES data for diagnosis.</p>
</sec>
<sec>
<title>Effect of mapping algorithms and read length on EXCAVATOR performance</title>
<p>To investigate the effects of alignment tools and read lengths on the global performance of our computational pipeline, we analyzed the WES data for four individuals (NA10847, NA19131, NA19152 and NA19153) generated by the 1000 Genomes Project Consortium. To study how the output of EXCAVATOR depends on the short-read aligner, we mapped reads using three of the most commonly used algorithms (BWA [
<xref ref-type="bibr" rid="B39">39</xref>
], Bowtie2 [
<xref ref-type="bibr" rid="B43">43</xref>
] and SOAP2 [
<xref ref-type="bibr" rid="B44">44</xref>
]), while to evaluate the effect of read length we cut the original 100-nucleotide-long paired-end reads of the four samples into 75-nucleotide-long and 50-nucleotide-long reads and compared the outputs (see Materials and methods for more details). Raw sequencing data were aligned to the human reference genome (hg19) and then subjected to a post-processing pipeline including Picard [
<xref ref-type="bibr" rid="B45">45</xref>
], SAMtools [
<xref ref-type="bibr" rid="B46">46</xref>
] and the Genome Analysis ToolKit [
<xref ref-type="bibr" rid="B47">47</xref>
] (see Materials and methods for more details). After the mapping pipeline, for each aligner and read length, we applied EXCAVATOR to the three samples, NA19131, NA19152 and NA19153, using NA10847 as control. First, we compared raw read count values for different aligners and read lengths. The comparison was performed by calculating the Pearson correlation coefficient between the read count values of each combination of aligner and read length. The results of these analyses are reported in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S64 and show that using different aligners with different read lengths only slightly affects the total number of reads mapped to each exon of the genome. For all read lengths investigated, Bowtie2 and BWA obtained a correlation coefficient greater than 0.99. SOAP2 had a smaller correlation coefficient than the other two algorithms; nevertheless, it was larger than 0.98 in all examined cases. To evaluate the effect of read length and mapping algorithm on the ability of EMRC data to predict the exact DNA copy number values of a genomic region, we examined several broad genomic regions previously reported to have copy numbers equal to 0, 1, 2, 3, 4, 5 or 6 by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
]. We calculated the correlation between the EMRC ratio and the absolute DNA copies predicted by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
]. The results of these analyses are reported in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Figure S65 and show that the prediction of the absolute number of DNA copies is independent of the read length and mapping algorithm: in all analyses we obtained a Pearson correlation coefficient between 0.77 and 0.79.</p>
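<p>As a minimal illustration of the comparison described above, the following Python sketch computes the Pearson correlation coefficient between per-exon read counts obtained with two aligners. The read counts and variable names are hypothetical and are not taken from the datasets analyzed here.</p>
<preformat>
```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-exon read counts from two aligners (not real data).
counts_aligner_a = [120, 80, 200, 45, 310]
counts_aligner_b = [118, 83, 195, 47, 305]
r = pearson(counts_aligner_a, counts_aligner_b)
print(round(r, 4))
```
</preformat>
<p>The same function could be applied to the EMRC ratio and the reference copy numbers of a set of regions.</p>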
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusions</title>
<p>In this work we present a novel computational method based on the RC approach to detect CNV regions starting from whole-exome sequencing data. We studied the statistical properties and systematic biases of RC targeted sequencing data and introduced a novel normalization procedure to mitigate the effects of these biases. We also demonstrated the capability of such normalized WES data to predict the exact number of DNA copies for CNV regions.</p>
<p>Furthermore, we developed a novel algorithm based on a heterogeneous hidden Markov model (HSLM), which exploits the sparseness of coding regions throughout the genome to detect both small isolated events and large alterations. Testing HSLM on synthetic data showed that it was able to detect, with comparable accuracy, large genomic regions covered by many exons as well as small genomic regions covered by few exons. Moreover, synthetic simulations were also used to compare the performance of HSLM with that of the CBS algorithm. Our results show that HSLM outperforms CBS in both sensitivity and specificity, thus improving our ability to identify small and highly isolated CNV regions covered by few exons. We also extended a method previously developed for array-CGH analysis to classify genomic regions obtained from HSLM segmentation into discrete copy number states. Finally, we packaged all these algorithms into a novel software tool named EXCAVATOR.</p>
<p>To demonstrate the usefulness and versatility of our tool in analyzing different experimental designs, we applied our computational pipeline to three WES datasets generated using different exome capture and sequencing technologies and we compared its performance with three recently published methods for CNV calling from WES data (ExomeCNV, CoNIFER and XHMM).</p>
<p>To show the potential of EXCAVATOR in population genetics studies, we analyzed 20 healthy individuals sequenced by the 1000 Genomes Project Consortium and previously genotyped with microarray technologies. Our method detected both rare and common variants, and the comparison with known CNVs from microarray studies shows that EXCAVATOR outperforms the other three pipelines in both precision and recall.</p>
<p>We tested whether our tool is applicable to cancer genomics studies by using it to identify genomic alterations in six metastatic melanoma cell lines. The results were compared with those obtained by SNP array analysis. We found considerable concordance between WES and SNP array results, and the comparison also showed that WES data have much greater sensitivity and a wider dynamic range than SNP array data for detecting deletions and amplifications. A comparison with a tool developed and calibrated for cancer data analysis (ExomeCNV) demonstrated that EXCAVATOR had better performance for both sensitivity and specificity.</p>
<p>Finally, we studied genomic alterations in two siblings affected by intellectual disability. Our tool detected a large deletion on chromosome 2, which was confirmed by SNP array analysis for both samples and is of potential pathogenic interest for this disease. None of the other methods performed as well as EXCAVATOR.</p>
<p>All of the comparative analyses we performed highlighted the versatility of our software and its ability to overcome the limitations and drawbacks of currently available state-of-the-art tools. Importantly, while the other software packages are limited to three classification states, EXCAVATOR can quantify and discriminate five copy number states, thus allowing it to distinguish one-copy from two-copy deletions and one-copy duplications from multiple-copy amplifications. Moreover, we found that ExomeCNV generates a huge number of false positive events while CoNIFER and XHMM produce a significant number of false negatives. These results are mainly ascribed to the different normalization procedures implemented in the three software packages: ExomeCNV does not take into account the discrepancy in the total sequence read count between the case and control samples, while CoNIFER and XHMM analyze many samples simultaneously to remove systematic noise. The computational pipeline we presented in this paper can be run on single samples and the results are not affected by dataset size, thus making EXCAVATOR a suitable tool for the investigation of CNVs in large-scale projects (such as the 1000 Genomes Project and the Cancer Genome Atlas) as well as in clinical research and diagnosis.</p>
</sec>
<sec sec-type="materials|methods">
<title>Materials and methods</title>
<sec>
<title>GC content and mappability</title>
<p>To calculate the GC content percentage for each exon we used the gc5Base tracks downloaded from the UCSC website [
<xref ref-type="bibr" rid="B48">48</xref>
]. gc5Base tracks give the percentage of G (guanine) and C (cytosine) bases in five-base windows. Mappability bias is due to the fact that the genome contains many repetitive elements and aligning reads to these positions leads to ambiguous mapping. We used the uniqueome data in [
<xref ref-type="bibr" rid="B49">49</xref>
] to calculate a mappability score for each exon. In this paper, the authors introduced a genomic resource to understand the uniquely mappable proportion of genomic sequences. We evaluated the uniqueness of genomic sequences using an all-against-all alignment for different word sizes. Alignments were performed with the Imagenix Sequence Alignment System (ISAS) [
<xref ref-type="bibr" rid="B50">50</xref>
]. The all-against-all alignments were performed independently for tag lengths between 25 and 90 nucleotides with varying numbers of mismatches, in both nucleotide space and color space. The results of these analyses were formatted as bigBED and bigWig files and can be downloaded from [
<xref ref-type="bibr" rid="B51">51</xref>
]. The bigWig files contain coverage values expressed as rounded integer percentiles of full coverage (for example, a value of 100 indicates that 100% of overlapping N-mers are unique and contribute to coverage of that coordinate; similarly a value of 50 indicates that 50% of overlapping N-mers are unique). A mappability score for each exon was obtained by averaging the coverage values of the nucleotides belonging to the selected exon.</p>
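<p>The averaging step described above can be sketched in Python as follows. The function name and the per-base values are hypothetical, and rescaling the percentile values to the range 0 to 1 is an assumption made to match the mappability bins used in the normalization step.</p>
<preformat>
```python
def exon_mappability(coverage, start, end):
    """Average per-base uniqueness (0-100) over the half-open interval [start, end)."""
    values = coverage[start:end]
    return sum(values) / len(values) / 100.0


# Hypothetical per-base values in the style of the bigWig track described above.
per_base = [100, 100, 50, 50, 0, 100, 100, 100]
print(exon_mappability(per_base, 0, 4))  # (100 + 100 + 50 + 50) / 4 / 100
```
</preformat>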
</sec>
<sec>
<title>Exon mean read count data normalization</title>
<p>To minimize the effect of the three sources of variation, we used a three-step bias removal procedure based on the median normalization approach introduced in [
<xref ref-type="bibr" rid="B23">23</xref>
] and in [
<xref ref-type="bibr" rid="B31">31</xref>
]. In practice, for all of the GC percentages (0,1,2,…,100
<italic>%</italic>
), all of the bins of mappability scores (0, 0.1, 0.2, …, 1) and all of the bins of exon sizes (10 bp, 20 bp, 30 bp, …) we calculated the deviation of EMRC from the exome average and then corrected each EMRC according to: </p>
<p>
<disp-formula id="bmcM3">
<label>(3)</label>
<mml:math id="M3" name="gb-2013-14-10-r120-i3" overflow="scroll">
<mml:mover accent="false">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mtext>EMRC</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo accent="true">¯</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mtext>EMRC</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>·</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="normal">X</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:math>
</disp-formula>
</p>
<p>where EMRC
<sub>
<italic>i </italic>
</sub>
is the exon mean read count of the
<italic>i</italic>
th exon,
<italic>m</italic>
<sub>X </sub>
is the median EMRC of all the exons that have the same X value (where X = [GC content, mappability score, exon size]) as the
<italic>i</italic>
th exon, and
<italic>m </italic>
is the overall median of all the exons. At the end of this procedure, the EMRC for each exon has been corrected for the three sources of bias.</p>
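<p>Equation (3) can be illustrated with a short Python sketch for a single source of bias. The function name and the toy values are hypothetical, and leaving bins with a zero median uncorrected is an assumption of this sketch rather than a documented behavior of EXCAVATOR.</p>
<preformat>
```python
from statistics import median


def median_normalize(emrc, covariate):
    """Apply EMRC_i * m / m_X, where m_X is the median EMRC of exons sharing a covariate bin."""
    overall = median(emrc)
    by_bin = {}
    for value, key in zip(emrc, covariate):
        by_bin.setdefault(key, []).append(value)
    bin_median = {key: median(values) for key, values in by_bin.items()}
    # Bins whose median is zero are left uncorrected (an assumption of this sketch).
    return [
        value * overall / bin_median[key] if bin_median[key] else value
        for value, key in zip(emrc, covariate)
    ]


# Hypothetical data: three exons at 40% GC and three at 60% GC.
emrc = [10.0, 12.0, 14.0, 20.0, 24.0, 28.0]
gc_percent = [40, 40, 40, 60, 60, 60]
corrected = median_normalize(emrc, gc_percent)
print(corrected)
```
</preformat>
<p>In a three-step procedure, the same correction would be applied in turn using the GC content, mappability and exon size bins as the covariate; the order of the steps is not fixed by this sketch.</p>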
</sec>
<sec>
<title>Copy number estimation</title>
<p>To measure the ability of EMRC data to predict the exact DNA copy number of a genomic region, we examined several broad genomic regions that were previously reported to have copy numbers equal to 0, 1, 2, 3 or 4 by McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] for the eight samples (NA10847, NA19131, NA19138, NA19152, NA19153, NA19159, NA19206 and NA19223) generated by the 1000 Genomes Project Consortium. McCarroll
<italic>et al.</italic>
[
<xref ref-type="bibr" rid="B7">7</xref>
] designed a hybrid genotyping array (Affymetrix SNP 6.0) to measure 906,600 SNPs and copy numbers at 1.8 million genomic locations simultaneously. They used this array to develop a high-resolution map of copy number variation for 270 HapMap samples. Their goal was to construct a map that was precise and accurate for the boundaries of the genomic regions affected by CNV and to determine an accurate integer copy number level for each segment in each individual. The boundaries of each CNV were determined using a hidden Markov model and the integer copy number level was estimated using quantitative PCR. For samples NA19152, NA19159, NA19131, NA19153, NA19138, NA19223, NA19206 and NA10847 they detected 191, 193, 183, 173, 172, 202, 185 and 148 CNV regions, respectively, with copy numbers equal to 0, 1, 3 or 4. The table of DNA copy numbers estimated in [
<xref ref-type="bibr" rid="B7">7</xref>
] were downloaded from the Nature Genetics website. The results shown in Figure 
<xref ref-type="fig" rid="F1">1</xref>
i,g were obtained using the EMRC data median normalized to copy number 2 of the seven samples of Yoruba ancestry for genomic regions, while the results reported in Figure 
<xref ref-type="fig" rid="F1">1</xref>
h,j were obtained using the EMRC ratio between the seven samples of Yoruba ancestry and the NA10847 sample for these genomic regions. To evaluate the linear relation between RC and CNV regions we calculated the Pearson correlation coefficient.</p>
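<p>The two ways of relating EMRC values to copy number described above can be sketched in Python. The function names and values are hypothetical; multiplying the test/control ratio by 2 assumes that the control sample is diploid in the regions considered, which is an assumption of this sketch.</p>
<preformat>
```python
from statistics import median


def copy_number_from_median(emrc):
    """Scale EMRC values so that the sample median corresponds to copy number 2."""
    m = median(emrc)
    return [2.0 * value / m for value in emrc]


def copy_number_from_ratio(test, control):
    """Scale the test/control EMRC ratio, assuming the control is diploid everywhere."""
    test_median = median(test)
    control_median = median(control)
    return [
        2.0 * (t / test_median) / (c / control_median)
        for t, c in zip(test, control)
    ]


# Hypothetical EMRC values for five regions (not real data).
test = [50.0, 100.0, 100.0, 150.0, 100.0]
control = [100.0, 100.0, 100.0, 100.0, 100.0]
print(copy_number_from_median(test))
print(copy_number_from_ratio(test, control))
```
</preformat>
<p>Dividing each sample by its own median before taking the ratio also removes differences in total read count between the test and control samples.</p>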
</sec>
<sec>
<title>Calling algorithm</title>
<p>To classify each segmented region as one of five discrete copy number states (two-copy deletion, one-copy deletion, normal, one-copy duplication or multiple-copy amplification) we used the FastCall algorithm [
<xref ref-type="bibr" rid="B36">36</xref>
], which we developed to classify array-CGH data. The FastCall calling procedure is a mixture model based algorithm, which can be used to classify each segmented region as one of five predefined copy number states: double loss, loss, neutral, gain or multiple gain. Our calling procedure models the mean of each segment as a mixture of five truncated normal distributions and can also take into account sample heterogeneity using a cellularity parameter
<italic>c</italic>
(see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details). The algorithm takes as input the mean level of each segment
<italic>m </italic>
= (
<italic>m</italic>
<sub>1</sub>
,
<italic>m</italic>
<sub>2</sub>
,…,
<italic>m</italic>
<sub>
<italic>i</italic>
</sub>
,…,
<italic>m</italic>
<sub>
<italic>N</italic>
</sub>
), identified by the HSLM algorithm and gives as output the probability that a segment (mean) belongs to a particular state.</p>
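<p>The idea of assigning each segment mean to one of five states can be illustrated with a deliberately simplified Python sketch. This is not the FastCall algorithm: it uses untruncated normal components with equal weights and a fixed, hypothetical standard deviation instead of fitted truncated normals, and the mixing formula used for the cellularity parameter is an assumption made for illustration.</p>
<preformat>
```python
import math

STATES = ["double loss", "loss", "neutral", "gain", "multiple gain"]
COPIES = [0, 1, 2, 3, 4]  # "multiple gain" is represented by four copies here


def expected_log2_ratio(copies, cellularity, floor=0.1):
    """Expected log2 ratio when a fraction `cellularity` of cells carries `copies` copies."""
    ratio = (cellularity * copies + (1.0 - cellularity) * 2.0) / 2.0
    return math.log2(max(ratio, floor))


def state_probabilities(segment_mean, cellularity=1.0, sd=0.2):
    """Posterior over the five states with equal priors and a shared standard deviation."""
    log_density = [
        -0.5 * ((segment_mean - expected_log2_ratio(k, cellularity)) / sd) ** 2
        for k in COPIES
    ]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log_density)
    weights = [math.exp(value - top) for value in log_density]
    total = sum(weights)
    return [w / total for w in weights]


def call_state(segment_mean, cellularity=1.0, sd=0.2):
    """Return the most probable state for one segment mean."""
    probs = state_probabilities(segment_mean, cellularity, sd)
    return STATES[probs.index(max(probs))]


# Hypothetical segment means (log2 ratios), as would come from a segmentation step.
for mean in (-1.0, 0.0, 0.58, 1.0):
    print(mean, call_state(mean))
```
</preformat>
<p>In this sketch a segment mean of -0.4 is closest to the neutral state when the cellularity is 1, but closest to a one-copy loss when the cellularity is set to 0.5, which illustrates why accounting for sample heterogeneity matters.</p>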
</sec>
<sec>
<title>EXCAVATOR tool</title>
<p>All the algorithms and methods described here have been packaged in the EXCAVATOR software. EXCAVATOR is a collection of Perl, Bash, R and Fortran code. Figure 
<xref ref-type="fig" rid="F5">5</xref>
is a schematic representation of EXCAVATOR’s workflow steps. It takes as input BAM files and gives as output figures for raw and normalized data, plots of segmentation and calling results and a list of detected CNVs as tab-delimited text files. The package can analyze samples with two different experimental designs: ‘pooling’ and ‘somatic’. In the pooling scheme, each test sample is compared with a pooled reference obtained by summing the total number of reads for each exon across all the control samples. In the somatic scheme, each test sample is compared with its matched control. The EXCAVATOR tool can run on any UNIX system (desktops and workstations). On a desktop computer with a 2.5-GHz CPU and 8 GB of RAM, it takes four hours to analyze ten WES samples sequenced at 60 ×. The EXCAVATOR tool is freely available from [
<xref ref-type="bibr" rid="B52">52</xref>
].</p>
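<p>The pooled reference used in the pooling scheme can be sketched in Python as follows. The function name and read counts are hypothetical; the sketch only illustrates the per-exon summation described above and is not taken from the EXCAVATOR source code.</p>
<preformat>
```python
def pooled_reference(control_counts):
    """Sum per-exon read counts across control samples (the 'pooling' scheme)."""
    return [sum(per_exon) for per_exon in zip(*control_counts)]


# Hypothetical read counts for four exons in three control samples.
controls = [
    [100, 200, 0, 50],
    [110, 190, 5, 40],
    [90, 210, 1, 60],
]
print(pooled_reference(controls))
```
</preformat>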
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>EXCAVATOR workflow.</bold>
 BAM files of both test and control samples are processed by means of SAMtools and R scripts for EMRC calculations. After EMRC calculation, EXCAVATOR corrects the data for GC content, mappability and exon size. After normalization, the normalized read counts (NRC) for each sample are organized according to the analysis mode (pooling or somatic) selected by the user: pooling mode compares one sample to a pool of normal controls, whereas somatic mode compares one sample to its corresponding normal control. Finally, HSLM and FastCall are applied to the normalized data and results are provided as tab-delimited text files (variant call format, VCF, and BED format). HSLM, heterogeneous shifting level model; RC, read count.</p>
</caption>
<graphic xlink:href="gb-2013-14-10-r120-5"></graphic>
</fig>
</sec>
<sec>
<title>Population dataset</title>
<p>The genomes of all 27 individuals were sequenced by the 1000 Genomes Project Consortium and data were downloaded from [
<xref ref-type="bibr" rid="B53">53</xref>
] as BAM files. The data were first filtered and normalized as reported in Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
and then analyzed using HSLM followed by the FastCall algorithm with default parameters (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
for more details).</p>
</sec>
<sec>
<title>Melanoma dataset</title>
<p>For the melanoma dataset, all tumor and normal samples were captured using the same target enrichment kit (Agilent SureSelect Human All Exon 50 Mb kit) and sequenced, one sample per lane, in a 76-bp paired-end GAIIx run, thus obtaining a mean depth on the target of 43 × (range 32 × to 54 ×) (see Table 
<xref ref-type="table" rid="T1">1</xref>
and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S3). Exome sequencing data are available at the Sequence Read Archive under accession ERP001844. WES reads of the 12 samples were aligned against the human reference genome hg19 by means of the BWA aligner, then filtered, normalized and analyzed by the HSLM and FastCall algorithms with default parameters (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
). Since we did not have autologous normal samples for matched controls, WES reads from the six normal blood samples were pooled and used as a common reference baseline (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
<p>The same 12 samples were profiled using the Affymetrix 250K SNP Array platform and signal intensities were acquired by the GCOS software and normalized with the CNAG software. Melanoma cell line data were compared to the common reference pool composed of the six normal blood samples. The normalized log2 ratio SNP copy number values generated for each tumor sample were segmented using the SLM segmentation algorithm and the FastCall calling procedure was used to classify all the segmented genomic regions into defined copy number states (see Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
</sec>
<sec>
<title>Intellectual disability dataset</title>
<p>The two ID samples were captured using the same Illumina Truseq Target Enrichment kit and sequenced as 100-bp paired-end reads with a mean base coverage of 63 × using the Illumina HiSeq2000 platform (see Table 
<xref ref-type="table" rid="T1">1</xref>
and Additional file 
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S4). Exome sequencing data are available at the Sequence Read Archive under accession ERP001831. The WES data of the healthy individual of European descent sequenced by [
<xref ref-type="bibr" rid="B39">39</xref>
] were generated by the same exome-capture and sequencing platform used for the two ID samples (Illumina Truseq Target Enrichment kit and the Illumina HiSeq2000 platform). Reads from the three samples were aligned against the human reference genome hg19 by the BWA aligner, then filtered, normalized and analyzed by the HSLM and FastCall algorithms with default parameters (see Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
).</p>
</sec>
<sec>
<title>Algorithm comparison</title>
<p>We compared our algorithm to three previously published software packages: ExomeCNV [
<xref ref-type="bibr" rid="B25">25</xref>
], CoNIFER [
<xref ref-type="bibr" rid="B26">26</xref>
] and XHMM [
<xref ref-type="bibr" rid="B27">27</xref>
]. We downloaded the ExomeCNV R package version 1.4 from [
<xref ref-type="bibr" rid="B54">54</xref>
]. We used ExomeCNV with default parameters: sensitivity and specificity were set at 0.9999 for exons (maximizing specificity) and 0.99 for calls (‘auc’ option), and the admixture rate was set at a value of 0.5 (although all the samples used in this work had no biological admixture, we found that this setting reduced the number of false positive calls). We downloaded CoNIFER 0.2.2 from [
<xref ref-type="bibr" rid="B55">55</xref>
]. After running the analysis with the --
<italic>plot_scree </italic>
option, we examined the components plot and decided to run the final CoNIFER analyses with the setting that removes two singular value decomposition components (--
<italic>svd </italic>
2). XHMM was downloaded from [
<xref ref-type="bibr" rid="B56">56</xref>
]. The XHMM tool was applied to the three datasets using the default parameter setting and following the instructions on [
<xref ref-type="bibr" rid="B57">57</xref>
].</p>
</sec>
<sec>
<title>Alignment algorithms and read trimming</title>
<p>Raw reads in fastq format were downloaded from [
<xref ref-type="bibr" rid="B58">58</xref>
] for each of the four samples (NA10847, NA19131, NA19152 and NA19153). As a first step, the original 100-nucleotide reads were trimmed to 75 nucleotides and 50 nucleotides using the fastx-trimmer of the FASTX Toolkit 0.0.13.1 [
<xref ref-type="bibr" rid="B59">59</xref>
]. Raw reads were then aligned to the human reference genome (hg19) using BWA, Bowtie2 and SOAP2 with default parameter settings. We downloaded BWA version 0.6.1-r104 from [
<xref ref-type="bibr" rid="B60">60</xref>
], Bowtie2 version 2.1.0 from [
<xref ref-type="bibr" rid="B61">61</xref>
] and SOAPaligner version 2.21 from [
<xref ref-type="bibr" rid="B62">62</xref>
]. The output from SOAP2aligner was converted into sequence alignment map (SAM) format exploiting the Perl soap2sam.pl script (available from [
<xref ref-type="bibr" rid="B62">62</xref>
]). SAM files were processed using Picard [
<xref ref-type="bibr" rid="B45">45</xref>
], SAMtools [
<xref ref-type="bibr" rid="B63">63</xref>
] and the Genome Analysis ToolKit (GATK) release 2.5-2 [
<xref ref-type="bibr" rid="B64">64</xref>
]. In brief, SAM files were binary compressed, sorted and indexed by SAMtools (samtools view, sort and index tools), duplicated reads were removed by Picard (with MarkDuplicates) and base quality score recalibration and local realignment around indels followed the recommended workflow of the GATK toolkit (RealignerTargetCreator, IndelRealigner, BaseRecalibrator and PrintReads).</p>
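<p>The read-trimming step can be illustrated with a small Python sketch. This is not the FASTX Toolkit: the function name is hypothetical, and keeping the first bases of each read (trimming from the 3' end) is an assumption, since the text does not state which end was trimmed.</p>
<preformat>
```python
def trim_fastq_records(lines, length):
    """Keep the first `length` bases (and qualities) of each four-line FASTQ record."""
    trimmed = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        trimmed.extend([header, sequence[:length], separator, quality[:length]])
    return trimmed


# A hypothetical 12-base read, trimmed to 8 bases.
record = ["@read1", "ACGTACGTACGT", "+", "IIIIIIIIIIII"]
print(trim_fastq_records(record, 8))
```
</preformat>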
</sec>
</sec>
<sec>
<title>Abbreviations</title>
<p>BP: Base pair; CBS: Circular binary segmentation; CGH: Comparative genomic hybridization; CNA: Copy number alteration; CNV: Copy number variant; DGV: Database of genomic variants; EMRC: Exon mean read count; FP: False positive; FPR: False positive rate; Gb: Gigabase; HSLM: Heterogeneous shifting level model; HTS: High-throughput sequencing; ID: Intellectual disability; Kb: Kilobase; Mb: Megabase; PCA: Principal-component analysis; PCR: Polymerase chain reaction; RC: Read count; ROC: Receiver operating characteristic; SAM: Sequence alignment map; SLM: Shifting level model; SNV: Single nucleotide variant; SVD: Singular value decomposition; TP: True positive; TPR: True positive rate; WES: Whole-exome sequencing.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>AM conceived and designed the basic algorithm for EXCAVATOR. LT implemented and optimized the package. IC, CB and EM conducted the melanoma dataset experiments. PM, EB and TP ran the intellectual disability experiments. AM, LT and RD carried out the comparison of the different tools. AK, BG, GDB, RA, GFG, GR and MS supervised the project and gave advice. AM, LT, IC and TP wrote the manuscript. GR, GDB, BG and MB revised the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>Supplemental methods.</bold>
 Supplemental methods for EXCAVATOR: detecting copy number variants from whole-exome sequencing data.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S1.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2: Table S1</title>
<p>The complete list of CNVs detected by EXCAVATOR on chromosomes 1 and 4 of the population dataset.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S2.xls">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3: Table S2</title>
<p>The complete list of CNAs detected by EXCAVATOR on the WES data of the melanoma dataset.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S3.xls">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4: Table S3</title>
<p>The complete list of CNAs detected by the SLM segmentation algorithm on the SNP array data of the melanoma dataset.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S4.xls">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5: Table S4</title>
<p>The complete list of CNVs detected by EXCAVATOR on the WES data of the ID dataset.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S5.xls">
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6: Table S5</title>
<p>List of the seven ISCA deletions that had an 87% to 100% overlap with the large deletion that we found in our ID samples.</p>
</caption>
<media xlink:href="gb-2013-14-10-r120-S6.xls">
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>We gratefully acknowledge financial support from the Cariplo Foundation (grant number 2006_0771, for genomic, epigenetic and transcriptional analysis of cancer by next-generation sequencing) and from the Estonian Ministry of Education and Research (grant number SF0180027s10). Matteo Benelli is supported by European Commission FP7 funding, Project CHERISH (grant agreement number 223692). Tommaso Pippucci is supported by the Italian Ministry of Health’s Young Investigators Award, Project GR-2009-1574072.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>Genome structural variation discovery and genotyping</article-title>
<source>Nat Rev Genet</source>
<year>2011</year>
<volume>12</volume>
<fpage>363</fpage>
<lpage>376</lpage>
<pub-id pub-id-type="doi">10.1038/nrg2958</pub-id>
<pub-id pub-id-type="pmid">21358748</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Iafrate</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Rivera</surname>
<given-names>MN</given-names>
</name>
<name>
<surname>Listewnik</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Donahoe</surname>
<given-names>PK</given-names>
</name>
<name>
<surname>Qi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<article-title>Detection of large-scale variation in the human genome</article-title>
<source>Nat Genet</source>
<year>2004</year>
<volume>36</volume>
<fpage>949</fpage>
<lpage>951</lpage>
<pub-id pub-id-type="doi">10.1038/ng1416</pub-id>
<pub-id pub-id-type="pmid">15286789</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Tuzun</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sharp</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Kaul</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>VA</given-names>
</name>
<name>
<surname>Pertz</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Haugen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hayden</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Albertson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Pinkel</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>MV</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>Fine-scale structural variation of the human genome</article-title>
<source>Nat Genet</source>
<year>2005</year>
<volume>37</volume>
<fpage>727</fpage>
<lpage>732</lpage>
<pub-id pub-id-type="doi">10.1038/ng1562</pub-id>
<pub-id pub-id-type="pmid">15895083</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Redon</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ishikawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fitch</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Perry</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Fiegler</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Shapero</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>EK</given-names>
</name>
<name>
<surname>Dallaire</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Freeman</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>González</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Gratacòs</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kalaitzopoulos</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Komura</surname>
<given-names>D</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Mei</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Montgomery</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Nishimura</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Okamura</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Somerville</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Tchinda</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Valsesia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Woodwark</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>F</given-names>
</name>
<etal></etal>
<article-title>Global variation in copy number in the human genome</article-title>
<source>Nature</source>
<year>2006</year>
<volume>444</volume>
<fpage>444</fpage>
<lpage>454</lpage>
<pub-id pub-id-type="doi">10.1038/nature05329</pub-id>
<pub-id pub-id-type="pmid">17122850</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Conrad</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Pinto</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Redon</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Gokcumen</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Aerts</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Barnes</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fitzgerald</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ihm</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Kristiansson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>MacArthur</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Onyiah</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Pang</surname>
<given-names>AWC</given-names>
</name>
<name>
<surname>Robson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stirrups</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Valsesia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Walter</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tyler-Smith</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>NP</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Hurles</surname>
<given-names>ME</given-names>
</name>
<collab>Wellcome Trust Case Control Consortium</collab>
<article-title>Origins and functional impact of copy number variation in the human genome</article-title>
<source>Nature</source>
<year>2010</year>
<volume>464</volume>
<fpage>704</fpage>
<lpage>712</lpage>
<pub-id pub-id-type="doi">10.1038/nature08516</pub-id>
<pub-id pub-id-type="pmid">19812545</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Kidd</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Cooper</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Donahue</surname>
<given-names>WF</given-names>
</name>
<name>
<surname>Hayden</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Sampas</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Graves</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hansen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Teague</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Antonacci</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Haugen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zerr</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Tsang</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Newman</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Tüzün</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ebling</surname>
<given-names>HM</given-names>
</name>
<name>
<surname>Tusneem</surname>
<given-names>N</given-names>
</name>
<name>
<surname>David</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gillett</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Phelps</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Weaver</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Saranga</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Brand</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tao</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Gustafson</surname>
<given-names>E</given-names>
</name>
<name>
<surname>McKernan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Malig</surname>
<given-names>M</given-names>
</name>
<etal></etal>
<article-title>Mapping and sequencing of structural variation from eight human genomes</article-title>
<source>Nature</source>
<year>2008</year>
<volume>453</volume>
<fpage>56</fpage>
<lpage>64</lpage>
<pub-id pub-id-type="doi">10.1038/nature06862</pub-id>
<pub-id pub-id-type="pmid">18451855</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>McCarroll</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Kuruvilla</surname>
<given-names>FG</given-names>
</name>
<name>
<surname>Korn</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Cawley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nemesh</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Shapero</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>de Bakker</surname>
<given-names>PIW</given-names>
</name>
<name>
<surname>Maller</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Kirby</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Elliott</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Parkin</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hubbell</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Webster</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Mei</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Veitch</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lincoln</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nizzari</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Blume</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>KW</given-names>
</name>
<name>
<surname>Rava</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Daly</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Gabriel</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>D</given-names>
</name>
<article-title>Integrated detection and population-genetic analysis of SNPs and copy number variation</article-title>
<source>Nat Genet</source>
<year>2008</year>
<volume>40</volume>
<fpage>1166</fpage>
<lpage>1174</lpage>
<pub-id pub-id-type="doi">10.1038/ng.238</pub-id>
<pub-id pub-id-type="pmid">18776908</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Sebat</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lakshmi</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Troge</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lundin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Månér</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Massa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Walker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Navin</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lucito</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Healy</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hicks</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Reiner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gilliam</surname>
<given-names>TC</given-names>
</name>
<name>
<surname>Trask</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Zetterberg</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wigler</surname>
<given-names>M</given-names>
</name>
<article-title>Large-scale copy number polymorphism in the human genome</article-title>
<source>Science</source>
<year>2004</year>
<volume>305</volume>
<fpage>525</fpage>
<lpage>528</lpage>
<pub-id pub-id-type="doi">10.1126/science.1098918</pub-id>
<pub-id pub-id-type="pmid">15273396</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Pang</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Pinto</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rafiq</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Conrad</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hurles</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Venter</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Kirkness</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<article-title>Towards a comprehensive structural variation map of an individual human genome</article-title>
<source>Genome Biol</source>
<year>2010</year>
<volume>11</volume>
<fpage>R52</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2010-11-5-r52</pub-id>
<pub-id pub-id-type="pmid">20482838</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Abecasis</surname>
<given-names>GR</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Auton</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brooks</surname>
<given-names>LD</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Hurles</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>McVean</surname>
<given-names>GA</given-names>
</name>
<collab>1000 Genomes Project Consortium</collab>
<article-title>A map of human genome variation from population-scale sequencing</article-title>
<source>Nature</source>
<year>2010</year>
<volume>467</volume>
<fpage>1061</fpage>
<lpage>1073</lpage>
<pub-id pub-id-type="doi">10.1038/nature09534</pub-id>
<pub-id pub-id-type="pmid">20981092</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Singleton</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Farrer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Singleton</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hague</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kachergus</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hulihan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Peuralinna</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dutra</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nussbaum</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lincoln</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Crawley</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hanson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Maraganore</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Adler</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Cookson</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Muenter</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Baptista</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Blancato</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hardy</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gwinn-Hardy</surname>
<given-names>K</given-names>
</name>
<article-title>alpha-synuclein locus triplication causes Parkinson’s disease</article-title>
<source>Science</source>
<year>2003</year>
<volume>302</volume>
<fpage>841</fpage>
<pub-id pub-id-type="doi">10.1126/science.1090278</pub-id>
<pub-id pub-id-type="pmid">14593171</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Rovelet-Lecrux</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hannequin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Raux</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Le Meur</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Laquerrière</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vital</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Dumanchin</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Feuillette</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brice</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vercelletto</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dubas</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Frebourg</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Campion</surname>
<given-names>D</given-names>
</name>
<article-title>APP locus duplication causes autosomal dominant early-onset Alzheimer disease with cerebral amyloid angiopathy</article-title>
<source>Nat Genet</source>
<year>2006</year>
<volume>38</volume>
<fpage>24</fpage>
<lpage>26</lpage>
<pub-id pub-id-type="doi">10.1038/ng1718</pub-id>
<pub-id pub-id-type="pmid">16369530</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Wheeler</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Srinivasan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Egholm</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>McGuire</surname>
<given-names>A</given-names>
</name>
<name>
<surname>He</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>YJ</given-names>
</name>
<name>
<surname>Makhijani</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Roth</surname>
<given-names>GT</given-names>
</name>
<name>
<surname>Gomes</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tartaro</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Niazi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Turcotte</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Irzyk</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Lupski</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Chinault</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>XZ</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Nazareth</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Qin</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Muzny</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Margulies</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Weinstock</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Rothberg</surname>
<given-names>JM</given-names>
</name>
<article-title>The complete genome of an individual by massively parallel DNA sequencing</article-title>
<source>Nature</source>
<year>2008</year>
<volume>452</volume>
<fpage>872</fpage>
<lpage>876</lpage>
<pub-id pub-id-type="doi">10.1038/nature06884</pub-id>
<pub-id pub-id-type="pmid">18421352</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Bentley</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Balasubramanian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Swerdlow</surname>
<given-names>HP</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Milton</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>KP</given-names>
</name>
<name>
<surname>Evers</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Barnes</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Bignell</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Boutell</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Bryant</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Keira Cheetham</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Ellis</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Flatbush</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Gormley</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Humphray</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Irving</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Karbelashvili</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Kirk</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Maisinger</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Murray</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Obradovic</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Ost</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Parkinson</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Pratt</surname>
<given-names>MR</given-names>
</name>
<etal></etal>
<article-title>Accurate whole human genome sequencing using reversible terminator chemistry</article-title>
<source>Nature</source>
<year>2008</year>
<volume>456</volume>
<fpage>53</fpage>
<lpage>59</lpage>
<pub-id pub-id-type="doi">10.1038/nature07517</pub-id>
<pub-id pub-id-type="pmid">18987734</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>McKernan</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Peckham</surname>
<given-names>HE</given-names>
</name>
<name>
<surname>Costa</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>McLaughlin</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tsung</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Clouser</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Duncan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ichikawa</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Ranade</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Dimalanta</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Hyland</surname>
<given-names>FC</given-names>
</name>
<name>
<surname>Sokolsky</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Sheridan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hendrickson</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kotler</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stuart</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Malek</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Manning</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Antipova</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Perez</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Hayashibara</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Lyons</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Beaudoin</surname>
<given-names>RE</given-names>
</name>
<etal></etal>
<article-title>Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding</article-title>
<source>Genome Res</source>
<year>2009</year>
<volume>19</volume>
<fpage>1527</fpage>
<lpage>1541</lpage>
<pub-id pub-id-type="doi">10.1101/gr.091868.109</pub-id>
<pub-id pub-id-type="pmid">19546169</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Teer</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Mullikin</surname>
<given-names>JC</given-names>
</name>
<article-title>Exome sequencing: the sweet spot before whole genomes</article-title>
<source>Hum Mol Genet</source>
<year>2010</year>
<volume>19</volume>
<fpage>R145</fpage>
<lpage>R151</lpage>
<pub-id pub-id-type="doi">10.1093/hmg/ddq333</pub-id>
<pub-id pub-id-type="pmid">20705737</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Ng</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Flygare</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Bigham</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shaffer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bhattacharjee</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Bamshad</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
<article-title>Targeted capture and massively parallel sequencing of 12 human exomes</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>272</fpage>
<lpage>276</lpage>
<pub-id pub-id-type="doi">10.1038/nature08250</pub-id>
<pub-id pub-id-type="pmid">19684571</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Hormozdiari</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Sahinalp</surname>
<given-names>SC</given-names>
</name>
<article-title>Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes</article-title>
<source>Genome Res</source>
<year>2009</year>
<volume>19</volume>
<fpage>1270</fpage>
<lpage>1278</lpage>
<pub-id pub-id-type="doi">10.1101/gr.088633.108</pub-id>
<pub-id pub-id-type="pmid">19447966</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<name>
<surname>Abyzov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mu</surname>
<given-names>XJ</given-names>
</name>
<name>
<surname>Carriero</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Cayting</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<article-title>PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<fpage>R23</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2009-10-2-r23</pub-id>
<pub-id pub-id-type="pmid">19236709</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Karakoc</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>O’Roak</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Vives</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Rieder</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>Detection of structural variants and indels within exome data</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<fpage>176</fpage>
<lpage>178</lpage>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Ye</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Schulz</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Long</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ning</surname>
<given-names>Z</given-names>
</name>
<article-title>Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>2865</fpage>
<lpage>2871</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp394</pub-id>
<pub-id pub-id-type="pmid">19561018</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Magi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Benelli</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Roviello</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Torricelli</surname>
<given-names>F</given-names>
</name>
<article-title>Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<fpage>e65</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr068</pub-id>
<pub-id pub-id-type="pmid">21321017</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Yoon</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Xuan</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Makarov</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sebat</surname>
<given-names>J</given-names>
</name>
<article-title>Sensitive and accurate detection of copy number variants using read depth of coverage</article-title>
<source>Genome Res</source>
<year>2009</year>
<volume>19</volume>
<fpage>1586</fpage>
<lpage>1592</lpage>
<pub-id pub-id-type="doi">10.1101/gr.092981.109</pub-id>
<pub-id pub-id-type="pmid">19657104</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Chiang</surname>
<given-names>DY</given-names>
</name>
<name>
<surname>Getz</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>O’Kelly</surname>
<given-names>MJT</given-names>
</name>
<name>
<surname>Zhao</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Russ</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Meyerson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<article-title>High-resolution mapping of copy-number alterations with massively parallel sequencing</article-title>
<source>Nat Methods</source>
<year>2009</year>
<volume>6</volume>
<fpage>99</fpage>
<lpage>103</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1276</pub-id>
<pub-id pub-id-type="pmid">19043412</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Sathirapongsasuti</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Horst</surname>
<given-names>BAJ</given-names>
</name>
<name>
<surname>Brunner</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Cochran</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Binder</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Quackenbush</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>SF</given-names>
</name>
<article-title>Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>2648</fpage>
<lpage>2654</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr462</pub-id>
<pub-id pub-id-type="pmid">21828086</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Krumm</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sudmant</surname>
<given-names>PH</given-names>
</name>
<name>
<surname>Ko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>O’Roak</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Malig</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Quinlan</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<collab>NHLBI Exome Sequencing Project</collab>
<article-title>Copy number variation detection and genotyping from exome sequence data</article-title>
<source>Genome Res</source>
<year>2012</year>
<volume>22</volume>
<fpage>1525</fpage>
<lpage>1532</lpage>
<pub-id pub-id-type="doi">10.1101/gr.138115.112</pub-id>
<pub-id pub-id-type="pmid">22585873</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Fromer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Moran</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Chambert</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Banks</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bergen</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Ruderfer</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>McCarroll</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>O’Donovan</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Owen</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Kirov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Hultman</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Sklar</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Purcell</surname>
<given-names>SM</given-names>
</name>
<article-title>Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth</article-title>
<source>Am J Hum Genet</source>
<year>2012</year>
<volume>91</volume>
<fpage>597</fpage>
<lpage>607</lpage>
<pub-id pub-id-type="doi">10.1016/j.ajhg.2012.08.005</pub-id>
<pub-id pub-id-type="pmid">23040492</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lupat</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Amarasinghe</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Doyle</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Ryland</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Tothill</surname>
<given-names>RW</given-names>
</name>
<name>
<surname>Halgamuge</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>IG</given-names>
</name>
<name>
<surname>Gorringe</surname>
<given-names>KL</given-names>
</name>
<article-title>CONTRA: copy number analysis for targeted resequencing</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>1307</fpage>
<lpage>1313</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts146</pub-id>
<pub-id pub-id-type="pmid">22474122</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Olshen</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Venkatraman</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Lucito</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wigler</surname>
<given-names>M</given-names>
</name>
<article-title>Circular binary segmentation for the analysis of array-based DNA copy number data</article-title>
<source>Biostatistics</source>
<year>2004</year>
<volume>5</volume>
<fpage>557</fpage>
<lpage>572</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxh008</pub-id>
<pub-id pub-id-type="pmid">15475419</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Koboldt</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Larson</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>D</given-names>
</name>
<name>
<surname>McLellan</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Ding</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>RK</given-names>
</name>
<article-title>VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing</article-title>
<source>Genome Res</source>
<year>2012</year>
<volume>22</volume>
<fpage>568</fpage>
<lpage>576</lpage>
<pub-id pub-id-type="doi">10.1101/gr.129684.111</pub-id>
<pub-id pub-id-type="pmid">22300766</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Magi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Tattini</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Pippucci</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Torricelli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Benelli</surname>
<given-names>M</given-names>
</name>
<article-title>Read count approach for DNA copy number variants detection</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>470</fpage>
<lpage>478</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr707</pub-id>
<pub-id pub-id-type="pmid">22199393</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Harismendy</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Strausberg</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Stockwell</surname>
<given-names>TB</given-names>
</name>
<name>
<surname>Beeson</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Schork</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Murray</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Topol</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Frazer</surname>
<given-names>KA</given-names>
</name>
<article-title>Evaluation of next generation sequencing platforms for population targeted sequencing studies</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<fpage>R32</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r32</pub-id>
<pub-id pub-id-type="pmid">19327155</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Dohm</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lottaz</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Borodina</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Himmelbauer</surname>
<given-names>H</given-names>
</name>
<article-title>Substantial biases in ultra-short read data sets from high-throughput DNA sequencing</article-title>
<source>Nucleic Acids Res</source>
<year>2008</year>
<volume>36</volume>
<fpage>e105</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn425</pub-id>
<pub-id pub-id-type="pmid">18660515</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Hillier</surname>
<given-names>LW</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>GT</given-names>
</name>
<name>
<surname>Quinlan</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Dooling</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Fewell</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Barnett</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Glasscock</surname>
<given-names>JI</given-names>
</name>
<name>
<surname>Hickenbotham</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Magrini</surname>
<given-names>VJ</given-names>
</name>
<name>
<surname>Richt</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>SN</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Stromberg</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tsung</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Wylie</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Schedl</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Mardis</surname>
<given-names>ER</given-names>
</name>
<article-title>Whole-genome sequencing and variant discovery in C. elegans</article-title>
<source>Nat Methods</source>
<year>2008</year>
<volume>5</volume>
<fpage>183</fpage>
<lpage>188</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1179</pub-id>
<pub-id pub-id-type="pmid">18204455</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Magi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Benelli</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marseglia</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nannetti</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Scordo</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Torricelli</surname>
<given-names>F</given-names>
</name>
<article-title>A shifting level model algorithm that identifies aberrations in array-CGH data</article-title>
<source>Biostatistics</source>
<year>2010</year>
<volume>11</volume>
<fpage>265</fpage>
<lpage>280</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxp051</pub-id>
<pub-id pub-id-type="pmid">19948744</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<name>
<surname>Benelli</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marseglia</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nannetti</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Paravidino</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zara</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Bricarelli</surname>
<given-names>FD</given-names>
</name>
<name>
<surname>Torricelli</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Magi</surname>
<given-names>A</given-names>
</name>
<article-title>A very fast and accurate method for calling aberrations in array-CGH data</article-title>
<source>Biostatistics</source>
<year>2010</year>
<volume>11</volume>
<fpage>515</fpage>
<lpage>518</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxq008</pub-id>
<pub-id pub-id-type="pmid">20207682</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Lai</surname>
<given-names>WR</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Kucherlapati</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>PJ</given-names>
</name>
<article-title>Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3763</fpage>
<lpage>3770</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti611</pub-id>
<pub-id pub-id-type="pmid">16081473</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Stark</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hayward</surname>
<given-names>N</given-names>
</name>
<article-title>Genome-wide loss of heterozygosity and copy number analysis in melanoma using high-density single-nucleotide polymorphism arrays</article-title>
<source>Cancer Res</source>
<year>2007</year>
<volume>67</volume>
<fpage>2632</fpage>
<lpage>2642</lpage>
<pub-id pub-id-type="doi">10.1158/0008-5472.CAN-06-4152</pub-id>
<pub-id pub-id-type="pmid">17363583</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Clark</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>HYK</given-names>
</name>
<name>
<surname>Karczewski</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Euskirchen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Butte</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<article-title>Performance comparison of exome DNA sequencing technologies</article-title>
<source>Nat Biotechnol</source>
<year>2011</year>
<volume>29</volume>
<fpage>908</fpage>
<lpage>914</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.1975</pub-id>
<pub-id pub-id-type="pmid">21947028</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="other">
<article-title>The International Standards for Cytogenomic Arrays (ISCA) Consortium</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.iscaconsortium.org">http://www.iscaconsortium.org</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Cooper</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Girirajan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rosenfeld</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Vu</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stalker</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hamid</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hannig</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Abdel-Hamid</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>P</given-names>
</name>
<name>
<surname>McCracken</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Niyazov</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Leppig</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Thiese</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hummel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gorski</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kussmann</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shashi</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Rehder</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ballif</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Shaffer</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>A copy number variation morbidity map of developmental delay</article-title>
<source>Nat Genet</source>
<year>2011</year>
<volume>43</volume>
<fpage>838</fpage>
<lpage>846</lpage>
<pub-id pub-id-type="doi">10.1038/ng.909</pub-id>
<pub-id pub-id-type="pmid">21841781</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="other">
<article-title>OMIM Database</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/omim/">http://www.ncbi.nlm.nih.gov/omim/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
<source>Nat Methods</source>
<year>2012</year>
<volume>9</volume>
<fpage>357</fpage>
<lpage>359</lpage>
<pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
<pub-id pub-id-type="pmid">22388286</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Kristiansen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<article-title>SOAP2: an improved ultrafast tool for short read alignment</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1966</fpage>
<lpage>1967</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp336</pub-id>
<pub-id pub-id-type="pmid">19497933</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="other">
<article-title>Picard Tools</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://picard.sourceforge.net">http://picard.sourceforge.net</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Homer</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Abecasis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<article-title>The sequence alignment/map format and SAMtools</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>2078</fpage>
<lpage>2079</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
<pub-id pub-id-type="pmid">19505943</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<name>
<surname>McKenna</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hanna</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Banks</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Sivachenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cibulskis</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kernytsky</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Garimella</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gabriel</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Daly</surname>
<given-names>M</given-names>
</name>
<name>
<surname>DePristo</surname>
<given-names>MA</given-names>
</name>
<article-title>The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data</article-title>
<source>Genome Res</source>
<year>2010</year>
<volume>20</volume>
<fpage>1297</fpage>
<lpage>1303</lpage>
<pub-id pub-id-type="doi">10.1101/gr.107524.110</pub-id>
<pub-id pub-id-type="pmid">20644199</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="other">
<article-title>UCSC Genome Browser</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu">http://genome.ucsc.edu</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="journal">
<name>
<surname>Koehler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Issac</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cloonan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Grimmond</surname>
<given-names>SM</given-names>
</name>
<article-title>The uniqueome: a mappability resource for short-tag sequencing</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>272</fpage>
<lpage>274</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq640</pub-id>
<pub-id pub-id-type="pmid">21075741</pub-id>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="other">
<article-title>Imagenix Sequence Alignment System</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.imagenix.com">http://www.imagenix.com</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="other">
<article-title>Uniqueome download page</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://grimmond.imb.uq.edu.au/uniqueome/downloads/">http://grimmond.imb.uq.edu.au/uniqueome/downloads/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="other">
<article-title>EXCAVATOR</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/excavatortool/">http://sourceforge.net/projects/excavatortool/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="other">
<article-title>1000 Genomes Project Consortium</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.1000genomes.org">http://www.1000genomes.org</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="other">
<article-title>ExomeCNV</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/ExomeCNV/index.html">http://cran.r-project.org/web/packages/ExomeCNV/index.html</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="other">
<article-title>CoNIFER</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://conifer.sourceforge.net">http://conifer.sourceforge.net</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="other">
<article-title>XHMM</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://atgu.mgh.harvard.edu/xhmm/">http://atgu.mgh.harvard.edu/xhmm/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="other">
<article-title>XHMM tutorial</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml">http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="other">
<article-title>1000 Genomes Project Consortium ftp site</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/">http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B59">
<mixed-citation publication-type="other">
<article-title>FASTX-Toolkit</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://hannonlab.cshl.edu/fastx_toolkit">http://hannonlab.cshl.edu/fastx_toolkit</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B60">
<mixed-citation publication-type="other">
<article-title>BWA</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/bio-bwa/files/">http://sourceforge.net/projects/bio-bwa/files/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B61">
<mixed-citation publication-type="other">
<article-title>Bowtie2</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/bowtie-bio/files/bowtie2/">http://sourceforge.net/projects/bowtie-bio/files/bowtie2/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B62">
<mixed-citation publication-type="other">
<article-title>SOAP2</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://soap.genomics.org.cn/soapaligner.html">http://soap.genomics.org.cn/soapaligner.html</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B63">
<mixed-citation publication-type="other">
<article-title>SAMtools</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://samtools.sourceforge.net/">http://samtools.sourceforge.net/</ext-link>
]</comment>
</mixed-citation>
</ref>
<ref id="B64">
<mixed-citation publication-type="other">
<article-title>The Genome Analysis Toolkit (GATK)</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.broadinstitute.org/gatk/">http://www.broadinstitute.org/gatk/</ext-link>
]</comment>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000345 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000345 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4053953
   |texte=   EXCAVATOR: detecting copy number variants from whole-exome sequencing data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24172663" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024