MERS Exploration Server

Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome

Internal identifier: 000949 (Pmc/Corpus); previous: 000948; next: 000950

Authors: Wentian Li; Jan Freudenberg; Pedro Miramontes

RBID : PMC:3927684

Abstract

Background

The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read length on mappability has been lacking. To address this question, we evaluate the k-mer distribution of the human reference genome. The k-mer frequency is determined for k ranging from 20 bp to 1000 bp.
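
As a rough illustration of the quantity evaluated here, the sketch below counts overlapping k-mers in a short toy string and reports the fraction of k-mer positions whose k-mer occurs more than once. The function names, the toy sequence, and the forward-strand-only, exact-match counting are assumptions made for illustration; they are not taken from the paper, and a whole-genome analysis would need a dedicated k-mer counter.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer on the forward strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def non_singleton_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs more than once.

    Assumption: exact matches only, forward strand only.
    Returns 0.0 when the sequence is shorter than k.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


# Hypothetical toy sequence, not genomic data.
toy = "ACGTACGTTTGACGTACGA"
for k in (2, 4, 8):
    print(k, round(non_singleton_fraction(toy, k), 3))
```

Running the loop over increasing k on a real reference sequence would give the kind of curve the abstract describes, with the non-singleton fraction expected to shrink as k grows.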

Results

We observe that the proportion of non-singleton k-mers decreases slowly with increasing k, and can be fitted by piecewise power-law functions with different exponents at different ranges of k. A slower decay at larger values of k indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of k-mers exhibit long tails with a power-law-like trend, and rank-frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.
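
A power-law decay y = a * k^b appears as a straight line on log-log axes, so each piece of a piecewise fit can be sketched as an ordinary least-squares line through the points (log k, log y). The example below is a minimal sketch under that assumption; the function names, the single break point argument, and the synthetic input values are hypothetical and do not reproduce the paper's fitting procedure or its exponents.

```python
import math


def fit_power_law(ks, ys):
    """Least-squares fit of log(y) = log(a) + b * log(k); returns (a, b).

    Needs at least two points with distinct k values.
    """
    xs = [math.log(k) for k in ks]
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(ls) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b


def fit_piecewise(ks, ys, k_break):
    """Fit separate power laws to k <= k_break and k > k_break.

    Each side of the break must contain at least two points.
    """
    low = [(k, y) for k, y in zip(ks, ys) if k <= k_break]
    high = [(k, y) for k, y in zip(ks, ys) if k > k_break]
    return fit_power_law(*zip(*low)), fit_power_law(*zip(*high))


# Synthetic data only: the exponents -0.5 and -0.2 are arbitrary, not paper results.
ks = [20, 50, 100, 200, 400, 700, 1000]
scale_high = 3.0 * 200 ** -0.3  # keeps the two pieces continuous at k = 200
ys = [3.0 * k ** -0.5 if k <= 200 else scale_high * k ** -0.2 for k in ks]

(low_a, low_b), (high_a, high_b) = fit_piecewise(ks, ys, k_break=200)
print(f"k <= 200: exponent {low_b:.2f}; k > 200: exponent {high_b:.2f}")
```

An exponent closer to zero in the upper range corresponds to the slower decay, and hence the smaller mappability gain, described for read lengths above 200 bp.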

Conclusion

Read mappability, as measured by the proportion of singletons, increases steadily up to a length scale of around 200 bp. When read length increases beyond 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read length much more slowly than linearly. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed to efficiently produce high-quality data that cover the complete human genome.


DOI: 10.1186/1471-2105-15-2
PubMed: 24386976
PubMed Central: 3927684

Links to Exploration step

PMC:3927684

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Diminishing return for increased Mappability with longer sequencing reads: implications of the
<italic>k</italic>
-mer distributions in the human genome</title>
<author>
<name sortKey="Li, Wentian" sort="Li, Wentian" uniqKey="Li W" first="Wentian" last="Li">Wentian Li</name>
<affiliation>
<nlm:aff id="I1">The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freudenberg, Jan" sort="Freudenberg, Jan" uniqKey="Freudenberg J" first="Jan" last="Freudenberg">Jan Freudenberg</name>
<affiliation>
<nlm:aff id="I1">The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miramontes, Pedro" sort="Miramontes, Pedro" uniqKey="Miramontes P" first="Pedro" last="Miramontes">Pedro Miramontes</name>
<affiliation>
<nlm:aff id="I2">Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, 04510 DF México, México</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24386976</idno>
<idno type="pmc">3927684</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3927684</idno>
<idno type="RBID">PMC:3927684</idno>
<idno type="doi">10.1186/1471-2105-15-2</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000949</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000949</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Diminishing return for increased Mappability with longer sequencing reads: implications of the
<italic>k</italic>
-mer distributions in the human genome</title>
<author>
<name sortKey="Li, Wentian" sort="Li, Wentian" uniqKey="Li W" first="Wentian" last="Li">Wentian Li</name>
<affiliation>
<nlm:aff id="I1">The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Freudenberg, Jan" sort="Freudenberg, Jan" uniqKey="Freudenberg J" first="Jan" last="Freudenberg">Jan Freudenberg</name>
<affiliation>
<nlm:aff id="I1">The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Miramontes, Pedro" sort="Miramontes, Pedro" uniqKey="Miramontes P" first="Pedro" last="Miramontes">Pedro Miramontes</name>
<affiliation>
<nlm:aff id="I2">Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, 04510 DF México, México</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read length on mappability has been lacking. To address this question, we evaluate the
<italic>k</italic>
-mer distribution of the human reference genome. The
<italic>k</italic>
-mer frequency is determined for
<italic>k</italic>
ranging from 20 bp to 1000 bp.</p>
</sec>
<sec>
<title>Results</title>
<p>We observe that the proportion of non-singleton
<italic>k</italic>
-mers decreases slowly with increasing
<italic>k</italic>
, and can be fitted by piecewise power-law functions with different exponents at different ranges of
<italic>k</italic>
. A slower decay at larger values of
<italic>k</italic>
indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of
<italic>k</italic>
-mers exhibit long tails with a power-law-like trend, and rank-frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Read mappability, as measured by the proportion of singletons, increases steadily up to a length scale of around 200 bp. When read length increases beyond 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read length much more slowly than linearly. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed to efficiently produce high-quality data that cover the complete human genome.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Rozowsky, J" uniqKey="Rozowsky J">J Rozowsky</name>
</author>
<author>
<name sortKey="Euskirchen, G" uniqKey="Euskirchen G">G Euskirchen</name>
</author>
<author>
<name sortKey="Auerbach, Rk" uniqKey="Auerbach R">RK Auerbach</name>
</author>
<author>
<name sortKey="Zhang, Zd" uniqKey="Zhang Z">ZD Zhang</name>
</author>
<author>
<name sortKey="Gibson, T" uniqKey="Gibson T">T Gibson</name>
</author>
<author>
<name sortKey="Bjornson, R" uniqKey="Bjornson R">R Bjornson</name>
</author>
<author>
<name sortKey="Carriero, N" uniqKey="Carriero N">N Carriero</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cahill, Mj" uniqKey="Cahill M">MJ Cahill</name>
</author>
<author>
<name sortKey="Koser, Cu" uniqKey="Koser C">CU Köser</name>
</author>
<author>
<name sortKey="Ross, Ne" uniqKey="Ross N">NE Ross</name>
</author>
<author>
<name sortKey="Archer, Jac" uniqKey="Archer J">JAC Archer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koehler, R" uniqKey="Koehler R">R Koehler</name>
</author>
<author>
<name sortKey="Issac, H" uniqKey="Issac H">H Issac</name>
</author>
<author>
<name sortKey="Cloonan, N" uniqKey="Cloonan N">N Cloonan</name>
</author>
<author>
<name sortKey="Grimmond, Sm" uniqKey="Grimmond S">SM Grimmond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Derrien, T" uniqKey="Derrien T">T Derrien</name>
</author>
<author>
<name sortKey="Marco Sola, M" uniqKey="Marco Sola M">M Marco Sola</name>
</author>
<author>
<name sortKey="Knowles, Dg" uniqKey="Knowles D">DG Knowles</name>
</author>
<author>
<name sortKey="Raineri, E" uniqKey="Raineri E">E Raineri</name>
</author>
<author>
<name sortKey="Ribeca, P" uniqKey="Ribeca P">P Ribeca</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, H" uniqKey="Lee H">H Lee</name>
</author>
<author>
<name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Storvall, H" uniqKey="Storvall H">H Storvall</name>
</author>
<author>
<name sortKey="Ramskold, D" uniqKey="Ramskold D">D Ramsköld</name>
</author>
<author>
<name sortKey="Sandberg, R" uniqKey="Sandberg R">R Sandberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weber, Jl" uniqKey="Weber J">JL Weber</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Green, Ed" uniqKey="Green E">ED Green</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fraenkel, As" uniqKey="Fraenkel A">AS Fraenkel</name>
</author>
<author>
<name sortKey="Gillis, J" uniqKey="Gillis J">J Gillis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stoppa Lyonnet, D" uniqKey="Stoppa Lyonnet D">D Stoppa-Lyonnet</name>
</author>
<author>
<name sortKey="Carter, Pe" uniqKey="Carter P">PE Carter</name>
</author>
<author>
<name sortKey="Meo, T" uniqKey="Meo T">T Meo</name>
</author>
<author>
<name sortKey="Tosi, M" uniqKey="Tosi M">M Tosi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Conrad, B" uniqKey="Conrad B">B Conrad</name>
</author>
<author>
<name sortKey="Antonarakis, Se" uniqKey="Antonarakis S">SE Antonarakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ohno, S" uniqKey="Ohno S">S Ohno</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nowak, Ma" uniqKey="Nowak M">MA Nowak</name>
</author>
<author>
<name sortKey="Cooke, J" uniqKey="Cooke J">J Cooke</name>
</author>
<author>
<name sortKey="Maynard Smith, J" uniqKey="Maynard Smith J">J Maynard Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fortna, A" uniqKey="Fortna A">A Fortna</name>
</author>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y Kim</name>
</author>
<author>
<name sortKey="Maclaren, E" uniqKey="Maclaren E">E MacLaren</name>
</author>
<author>
<name sortKey="Marshall, K" uniqKey="Marshall K">K Marshall</name>
</author>
<author>
<name sortKey="Hahn, G" uniqKey="Hahn G">G Hahn</name>
</author>
<author>
<name sortKey="Meltesen, L" uniqKey="Meltesen L">L Meltesen</name>
</author>
<author>
<name sortKey="Brenton, M" uniqKey="Brenton M">M Brenton</name>
</author>
<author>
<name sortKey="Hink, R" uniqKey="Hink R">R Hink</name>
</author>
<author>
<name sortKey="Burgers, S" uniqKey="Burgers S">S Burgers</name>
</author>
<author>
<name sortKey="Hernandez Boussard, T" uniqKey="Hernandez Boussard T">T Hernandez-Boussard</name>
</author>
<author>
<name sortKey="Karimpour Fard, A" uniqKey="Karimpour Fard A">A Karimpour-Fard</name>
</author>
<author>
<name sortKey="Glueck, D" uniqKey="Glueck D">D Glueck</name>
</author>
<author>
<name sortKey="Mcgavran, L" uniqKey="Mcgavran L">L McGavran</name>
</author>
<author>
<name sortKey="Berry, R" uniqKey="Berry R">R Berry</name>
</author>
<author>
<name sortKey="Pollack, J" uniqKey="Pollack J">J Pollack</name>
</author>
<author>
<name sortKey="Sikela, Jm" uniqKey="Sikela J">JM Sikela</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krakauer, Dc" uniqKey="Krakauer D">DC Krakauer</name>
</author>
<author>
<name sortKey="Plotkin, Jb" uniqKey="Plotkin J">JB Plotkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcotte, Em" uniqKey="Marcotte E">EM Marcotte</name>
</author>
<author>
<name sortKey="Pellegrini, M" uniqKey="Pellegrini M">M Pellegrini</name>
</author>
<author>
<name sortKey="Yeates, To" uniqKey="Yeates T">TO Yeates</name>
</author>
<author>
<name sortKey="Eisenberg, D" uniqKey="Eisenberg D">D Eisenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, L" uniqKey="Liu L">L Liu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author>
<name sortKey="Hu, N" uniqKey="Hu N">N Hu</name>
</author>
<author>
<name sortKey="He, Y" uniqKey="He Y">Y He</name>
</author>
<author>
<name sortKey="Pong, R" uniqKey="Pong R">R Pong</name>
</author>
<author>
<name sortKey="Lin, D" uniqKey="Lin D">D Lin</name>
</author>
<author>
<name sortKey="Lu, L" uniqKey="Lu L">L Lu</name>
</author>
<author>
<name sortKey="Law, M" uniqKey="Law M">M Law</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eisenstein, M" uniqKey="Eisenstein M">M Eisenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heiner, C" uniqKey="Heiner C">C Heiner</name>
</author>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
<author>
<name sortKey="Ashby, M" uniqKey="Ashby M">M Ashby</name>
</author>
<author>
<name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
<author>
<name sortKey="Underwood, J" uniqKey="Underwood J">J Underwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, Pf" uniqKey="Brown P">PF Brown</name>
</author>
<author>
<name sortKey="Desouza, Pv" uniqKey="Desouza P">PV deSouza</name>
</author>
<author>
<name sortKey="Mercer, Rl" uniqKey="Mercer R">RL Mercer</name>
</author>
<author>
<name sortKey="Pietra, Vj" uniqKey="Pietra V">VJ Della Pietra</name>
</author>
<author>
<name sortKey="Lao, Jc" uniqKey="Lao J">JC Lai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baayen, Rh" uniqKey="Baayen R">RH Baayen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Phoophakdee, B" uniqKey="Phoophakdee B">B Phoophakdee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Phoophakdee, B" uniqKey="Phoophakdee B">B Phoophakdee</name>
</author>
<author>
<name sortKey="Zaki, Mj" uniqKey="Zaki M">MJ Zaki</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Q" uniqKey="Li Q">Q Li</name>
</author>
<author>
<name sortKey="Yu, C" uniqKey="Yu C">C Yu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lam, Tw" uniqKey="Lam T">TW Lam</name>
</author>
<author>
<name sortKey="Kristiansen, K" uniqKey="Kristiansen K">K Kristiansen</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chu, Ht" uniqKey="Chu H">HT Chu</name>
</author>
<author>
<name sortKey="Hsiao, Wwl" uniqKey="Hsiao W">WWL Hsiao</name>
</author>
<author>
<name sortKey="Tsao, Tt" uniqKey="Tsao T">TT Tsao</name>
</author>
<author>
<name sortKey="Hsu, Df" uniqKey="Hsu D">DF Hsu</name>
</author>
<author>
<name sortKey="Chen, Cc" uniqKey="Chen C">CC Chen</name>
</author>
<author>
<name sortKey="Lee, Sa" uniqKey="Lee S">SA Lee</name>
</author>
<author>
<name sortKey="Kao, Cy" uniqKey="Kao C">CY Kao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author>
<name sortKey="Lavenier, D" uniqKey="Lavenier D">D Lavenier</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kurtz, S" uniqKey="Kurtz S">S Kurtz</name>
</author>
<author>
<name sortKey="Narechania, A" uniqKey="Narechania A">A Narechania</name>
</author>
<author>
<name sortKey="Stein, Jc" uniqKey="Stein J">JC Stein</name>
</author>
<author>
<name sortKey="Ware, D" uniqKey="Ware D">D Ware</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Melsted, P" uniqKey="Melsted P">P Melsted</name>
</author>
<author>
<name sortKey="Pritchard, Jk" uniqKey="Pritchard J">JK Pritchard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anderson, C" uniqKey="Anderson C">C Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clauset, A" uniqKey="Clauset A">A Clauset</name>
</author>
<author>
<name sortKey="Shalizi, Cr" uniqKey="Shalizi C">CR Shalizi</name>
</author>
<author>
<name sortKey="Newman, Mej" uniqKey="Newman M">MEJ Newman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharp, Aj" uniqKey="Sharp A">AJ Sharp</name>
</author>
<author>
<name sortKey="Locke, Dp" uniqKey="Locke D">DP Locke</name>
</author>
<author>
<name sortKey="Mcgrath, Sd" uniqKey="Mcgrath S">SD McGrath</name>
</author>
<author>
<name sortKey="Cheng, Z" uniqKey="Cheng Z">Z Cheng</name>
</author>
<author>
<name sortKey="Bailey, Ja" uniqKey="Bailey J">JA Bailey</name>
</author>
<author>
<name sortKey="Vallente, Ru" uniqKey="Vallente R">RU Vallente</name>
</author>
<author>
<name sortKey="Pertz, Lm" uniqKey="Pertz L">LM Pertz</name>
</author>
<author>
<name sortKey="Clark, Ra" uniqKey="Clark R">RA Clark</name>
</author>
<author>
<name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author>
<name sortKey="Segraves, R" uniqKey="Segraves R">R Segraves</name>
</author>
<author>
<name sortKey="Oseroff, Vv" uniqKey="Oseroff V">VV Oseroff</name>
</author>
<author>
<name sortKey="Albertson, Dg" uniqKey="Albertson D">DG Albertson</name>
</author>
<author>
<name sortKey="Pinkel, D" uniqKey="Pinkel D">D Pinkel</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perry, Gh" uniqKey="Perry G">GH Perry</name>
</author>
<author>
<name sortKey="Tchinda, J" uniqKey="Tchinda J">J Tchinda</name>
</author>
<author>
<name sortKey="Mcgrath, Sd" uniqKey="Mcgrath S">SD McGrath</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Picker, Sr" uniqKey="Picker S">SR Picker</name>
</author>
<author>
<name sortKey="Caceres, Am" uniqKey="Caceres A">AM Cáceres</name>
</author>
<author>
<name sortKey="Iafrate, Aj" uniqKey="Iafrate A">AJ Iafrate</name>
</author>
<author>
<name sortKey="Tyler Smith, C" uniqKey="Tyler Smith C">C Tyler-Smith</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
<author>
<name sortKey="Stone, Ac" uniqKey="Stone A">AC Stone</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Genovese, G" uniqKey="Genovese G">G Genovese</name>
</author>
<author>
<name sortKey="Handsaker, Re" uniqKey="Handsaker R">RE Handsaker</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Altemose, N" uniqKey="Altemose N">N Altemose</name>
</author>
<author>
<name sortKey="Lindgren, Am" uniqKey="Lindgren A">AM Lindgren</name>
</author>
<author>
<name sortKey="Chambert, K" uniqKey="Chambert K">K Chambert</name>
</author>
<author>
<name sortKey="Pasaniuc, B" uniqKey="Pasaniuc B">B Pasaniuc</name>
</author>
<author>
<name sortKey="Price, Al" uniqKey="Price A">AL Price</name>
</author>
<author>
<name sortKey="Reich, D" uniqKey="Reich D">D Reich</name>
</author>
<author>
<name sortKey="Morton, Cc" uniqKey="Morton C">CC Morton</name>
</author>
<author>
<name sortKey="Pollak, Mr" uniqKey="Pollak M">MR Pollak</name>
</author>
<author>
<name sortKey="Wilson, Jg" uniqKey="Wilson J">JG Wilson</name>
</author>
<author>
<name sortKey="Mccarroll, Sa" uniqKey="Mccarroll S">SA McCarroll</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, Tj" uniqKey="Wheeler T">TJ Wheeler</name>
</author>
<author>
<name sortKey="Clements, J" uniqKey="Clements J">J Clements</name>
</author>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author>
<name sortKey="Hubley, R" uniqKey="Hubley R">R Hubley</name>
</author>
<author>
<name sortKey="Jones, Ta" uniqKey="Jones T">TA Jones</name>
</author>
<author>
<name sortKey="Jurka, J" uniqKey="Jurka J">J Jurka</name>
</author>
<author>
<name sortKey="Smit, Af" uniqKey="Smit A">AF Smit</name>
</author>
<author>
<name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benson, G" uniqKey="Benson G">G Benson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Ja" uniqKey="Bailey J">JA Bailey</name>
</author>
<author>
<name sortKey="Gu, Z" uniqKey="Gu Z">Z Gu</name>
</author>
<author>
<name sortKey="Clark, Ra" uniqKey="Clark R">RA Clark</name>
</author>
<author>
<name sortKey="Reinert, K" uniqKey="Reinert K">K Reinert</name>
</author>
<author>
<name sortKey="Samonte, Rv" uniqKey="Samonte R">RV Samonte</name>
</author>
<author>
<name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author>
<name sortKey="Adams, Md" uniqKey="Adams M">MD Adams</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Li, Pw" uniqKey="Li P">PW Li</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author>
<name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author>
<name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Ja" uniqKey="Bailey J">JA Bailey</name>
</author>
<author>
<name sortKey="Yavor, Am" uniqKey="Yavor A">AM Yavor</name>
</author>
<author>
<name sortKey="Massa, Hf" uniqKey="Massa H">HF Massa</name>
</author>
<author>
<name sortKey="Trask, Bj" uniqKey="Trask B">BJ Trask</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cheung, J" uniqKey="Cheung J">J Cheung</name>
</author>
<author>
<name sortKey="Estivill, X" uniqKey="Estivill X">X Estivill</name>
</author>
<author>
<name sortKey="Khaja, R" uniqKey="Khaja R">R Khaja</name>
</author>
<author>
<name sortKey="Macdonald, Jr" uniqKey="Macdonald J">JR MacDonald</name>
</author>
<author>
<name sortKey="Lau, K" uniqKey="Lau K">K Lau</name>
</author>
<author>
<name sortKey="Tsui, Lc" uniqKey="Tsui L">LC Tsui</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Miramontes, P" uniqKey="Miramontes P">P Miramontes</name>
</author>
<author>
<name sortKey="Cocho, G" uniqKey="Cocho G">G Cocho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Miramontes, P" uniqKey="Miramontes P">P Miramontes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mansilla, R" uniqKey="Mansilla R">R Mansilla</name>
</author>
<author>
<name sortKey="Koppen, E" uniqKey="Koppen E">E Köppen</name>
</author>
<author>
<name sortKey="Cocho, G" uniqKey="Cocho G">G Cocho</name>
</author>
<author>
<name sortKey="Miramontes, P" uniqKey="Miramontes P">P Miramontes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Martinez Mekler, G" uniqKey="Martinez Mekler G">G Martínez-Mekler</name>
</author>
<author>
<name sortKey="Alvarez Martinez, R" uniqKey="Alvarez Martinez R">R Alvarez Martínez</name>
</author>
<author>
<name sortKey="Beltran Del, Rio" uniqKey="Beltran Del R">M Beltrán del Río</name>
</author>
<author>
<name sortKey="Mansilla, R" uniqKey="Mansilla R">R Mansilla</name>
</author>
<author>
<name sortKey="Miramontes, P" uniqKey="Miramontes P">P Miramontes</name>
</author>
<author>
<name sortKey="Cocho, G" uniqKey="Cocho G">G Cocho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miramontes, P" uniqKey="Miramontes P">P Miramontes</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Cocho, G" uniqKey="Cocho G">G Cocho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B Haubold</name>
</author>
<author>
<name sortKey="Pierstorff, N" uniqKey="Pierstorff N">N Pierstorff</name>
</author>
<author>
<name sortKey="Moller, F" uniqKey="Moller F">F Möller</name>
</author>
<author>
<name sortKey="Wiehe, T" uniqKey="Wiehe T">T Wiehe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Treangen, Tj" uniqKey="Treangen T">TJ Treangen</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Sosa, D" uniqKey="Sosa D">D Sosa</name>
</author>
<author>
<name sortKey="Jose, Mv" uniqKey="Jose M">MV Jose</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sindi, Ss" uniqKey="Sindi S">SS Sindi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sindi, Ss" uniqKey="Sindi S">SS Sindi</name>
</author>
<author>
<name sortKey="Hunt, Br" uniqKey="Hunt B">BR Hunt</name>
</author>
<author>
<name sortKey="Yorke, Ja" uniqKey="Yorke J">JA Yorke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gabaix, X" uniqKey="Gabaix X">X Gabaix</name>
</author>
<author>
<name sortKey="Ioannides, Ym" uniqKey="Ioannides Y">YM Ioannides</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eeckhout, J" uniqKey="Eeckhout J">J Eeckhout</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vandepoele, K" uniqKey="Vandepoele K">K Vandepoele</name>
</author>
<author>
<name sortKey="Van Roy, N" uniqKey="Van Roy N">N Van Roy</name>
</author>
<author>
<name sortKey="Staes, K" uniqKey="Staes K">K Staes</name>
</author>
<author>
<name sortKey="Speleman, F" uniqKey="Speleman F">F Speleman</name>
</author>
<author>
<name sortKey="Van Roy, F" uniqKey="Van Roy F">F van Roy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paar, V" uniqKey="Paar V">V Paar</name>
</author>
<author>
<name sortKey="Glunc I, M" uniqKey="Glunc I M">M Glunčić</name>
</author>
<author>
<name sortKey="Rosandi, M" uniqKey="Rosandi M">M Rosandić</name>
</author>
<author>
<name sortKey="Basar, I" uniqKey="Basar I">I Basar</name>
</author>
<author>
<name sortKey="Vlahovi, I" uniqKey="Vlahovi I">I Vlahović</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dumas, Lj" uniqKey="Dumas L">LJ Dumas</name>
</author>
<author>
<name sortKey="O Leness, Ms" uniqKey="O Leness M">MS O’Bleness</name>
</author>
<author>
<name sortKey="Davis, Jm" uniqKey="Davis J">JM Davis</name>
</author>
<author>
<name sortKey="Dickens, Cm" uniqKey="Dickens C">CM Dickens</name>
</author>
<author>
<name sortKey="Anderson, N" uniqKey="Anderson N">N Anderson</name>
</author>
<author>
<name sortKey="Keeney, Jg" uniqKey="Keeney J">JG Keeney</name>
</author>
<author>
<name sortKey="Jackson, J" uniqKey="Jackson J">J Jackson</name>
</author>
<author>
<name sortKey="Sikela, M" uniqKey="Sikela M">M Sikela</name>
</author>
<author>
<name sortKey="Raznahan, A" uniqKey="Raznahan A">A Raznahan</name>
</author>
<author>
<name sortKey="Giedd, J" uniqKey="Giedd J">J Giedd</name>
</author>
<author>
<name sortKey="Rapoport, J" uniqKey="Rapoport J">J Rapoport</name>
</author>
<author>
<name sortKey="Nagamani, Ss" uniqKey="Nagamani S">SS Nagamani</name>
</author>
<author>
<name sortKey="Erez, A" uniqKey="Erez A">A Erez</name>
</author>
<author>
<name sortKey="Brunetti Pierri, N" uniqKey="Brunetti Pierri N">N Brunetti-Pierri</name>
</author>
<author>
<name sortKey="Sugalski, R" uniqKey="Sugalski R">R Sugalski</name>
</author>
<author>
<name sortKey="Lupski, Jr" uniqKey="Lupski J">JR Lupski</name>
</author>
<author>
<name sortKey="Fingerlin, T" uniqKey="Fingerlin T">T Fingerlin</name>
</author>
<author>
<name sortKey="Cheung, Sw" uniqKey="Cheung S">SW Cheung</name>
</author>
<author>
<name sortKey="Sikela, Jm" uniqKey="Sikela J">JM Sikela</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, Yt" uniqKey="Chen Y">YT Chen</name>
</author>
<author>
<name sortKey="Iseli, C" uniqKey="Iseli C">C Iseli</name>
</author>
<author>
<name sortKey="Venditti, Ca" uniqKey="Venditti C">CA Venditti</name>
</author>
<author>
<name sortKey="Old, Lj" uniqKey="Old L">LJ Old</name>
</author>
<author>
<name sortKey="Simpson, Aj" uniqKey="Simpson A">AJ Simpson</name>
</author>
<author>
<name sortKey="Jongeneel, Cv" uniqKey="Jongeneel C">CV Jongeneel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dobrynin, P" uniqKey="Dobrynin P">P Dobrynin</name>
</author>
<author>
<name sortKey="Matyunina, E" uniqKey="Matyunina E">E Matyunina</name>
</author>
<author>
<name sortKey="Malov, Sv" uniqKey="Malov S">SV Malov</name>
</author>
<author>
<name sortKey="Kozlov, Ap" uniqKey="Kozlov A">AP Kozlov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giacalone, J" uniqKey="Giacalone J">J Giacalone</name>
</author>
<author>
<name sortKey="Friedes, J" uniqKey="Friedes J">J Friedes</name>
</author>
<author>
<name sortKey="Francke, U" uniqKey="Francke U">U Francke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tremblay, Dc" uniqKey="Tremblay D">DC Tremblay</name>
</author>
<author>
<name sortKey="Moseley, S" uniqKey="Moseley S">S Moseley</name>
</author>
<author>
<name sortKey="Chadwick, Bp" uniqKey="Chadwick B">BP Chadwick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schaap, M" uniqKey="Schaap M">M Schaap</name>
</author>
<author>
<name sortKey="Lemmers, R" uniqKey="Lemmers R">R Lemmers</name>
</author>
<author>
<name sortKey="Maassen, R" uniqKey="Maassen R">R Maassen</name>
</author>
<author>
<name sortKey="Van Der Vliet, Pj" uniqKey="Van Der Vliet P">PJ van der Vliet</name>
</author>
<author>
<name sortKey="Hoogerheide, Lf" uniqKey="Hoogerheide L">LF Hoogerheide</name>
</author>
<author>
<name sortKey="Van Dijk, Hk" uniqKey="Van Dijk H">HK van Dijk</name>
</author>
<author>
<name sortKey="Ba Turk, N" uniqKey="Ba Turk N">N Baştürk</name>
</author>
<author>
<name sortKey="De Knijff, P" uniqKey="De Knijff P">P de Knijff</name>
</author>
<author>
<name sortKey="Van Der Maarel, Sm" uniqKey="Van Der Maarel S">SM van der Maarel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Horakova, Ah" uniqKey="Horakova A">AH Horakova</name>
</author>
<author>
<name sortKey="Moseley, Sc" uniqKey="Moseley S">SC Moseley</name>
</author>
<author>
<name sortKey="Mclaughlin, Cr" uniqKey="Mclaughlin C">CR McLaughlin</name>
</author>
<author>
<name sortKey="Tremblay, Dc" uniqKey="Tremblay D">DC Tremblay</name>
</author>
<author>
<name sortKey="Chadwick, Bp" uniqKey="Chadwick B">BP Chadwick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smit, Af" uniqKey="Smit A">AF Smit</name>
</author>
<author>
<name sortKey="T Th, G" uniqKey="T Th G">G Tóth</name>
</author>
<author>
<name sortKey="Riggs, Ad" uniqKey="Riggs A">AD Riggs</name>
</author>
<author>
<name sortKey="Jurka, J" uniqKey="Jurka J">J Jurka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Ja" uniqKey="Bailey J">JA Bailey</name>
</author>
<author>
<name sortKey="Liu, G" uniqKey="Liu G">G Liu</name>
</author>
<author>
<name sortKey="Richler, Ee" uniqKey="Richler E">EE Richler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
<author>
<name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosenfeld, J" uniqKey="Rosenfeld J">J Rosenfeld</name>
</author>
<author>
<name sortKey="Mason, Ce" uniqKey="Mason C">CE Mason</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, Yh" uniqKey="Chen Y">YH Chen</name>
</author>
<author>
<name sortKey="Nyeo, Sl" uniqKey="Nyeo S">SL Nyeo</name>
</author>
<author>
<name sortKey="Yeh, Cy" uniqKey="Yeh C">CY Yeh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nikolaou, C" uniqKey="Nikolaou C">C Nikolaou</name>
</author>
<author>
<name sortKey="Almirantis, Y" uniqKey="Almirantis Y">Y Almirantis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xie, H" uniqKey="Xie H">H Xie</name>
</author>
<author>
<name sortKey="Hao, B" uniqKey="Hao B">B Hao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chor, B" uniqKey="Chor B">B Chor</name>
</author>
<author>
<name sortKey="Horn, D" uniqKey="Horn D">D Horn</name>
</author>
<author>
<name sortKey="Goldman, N" uniqKey="Goldman N">N Goldman</name>
</author>
<author>
<name sortKey="Levy, Y" uniqKey="Levy Y">Y Levy</name>
</author>
<author>
<name sortKey="Massingham, T" uniqKey="Massingham T">T Massingham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paszkiewicz, K" uniqKey="Paszkiewicz K">K Paszkiewicz</name>
</author>
<author>
<name sortKey="Studholme, Dj" uniqKey="Studholme D">DJ Studholme</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bradnam, Kr" uniqKey="Bradnam K">KR Bradnam</name>
</author>
<author>
<name sortKey="Fass, Jn" uniqKey="Fass J">JN Fass</name>
</author>
<author>
<name sortKey="Alexandrov, A" uniqKey="Alexandrov A">A Alexandrov</name>
</author>
<author>
<name sortKey="Baranay, P" uniqKey="Baranay P">P Baranay</name>
</author>
<author>
<name sortKey="Bechner, M" uniqKey="Bechner M">M Bechner</name>
</author>
<author>
<name sortKey="Birol, I" uniqKey="Birol I">I Birol</name>
</author>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Chapman, Ja" uniqKey="Chapman J">JA Chapman</name>
</author>
<author>
<name sortKey="Chapuis, G" uniqKey="Chapuis G">G Chapuis</name>
</author>
<author>
<name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
<author>
<name sortKey="Chitsaz, H" uniqKey="Chitsaz H">H Chitsaz</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
<author>
<name sortKey="Del Fabbro, C" uniqKey="Del Fabbro C">C Del Fabbro</name>
</author>
<author>
<name sortKey="Docking, Tr" uniqKey="Docking T">TR Docking</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Earl, D" uniqKey="Earl D">D Earl</name>
</author>
<author>
<name sortKey="Emrich, S" uniqKey="Emrich S">S Emrich</name>
</author>
<author>
<name sortKey="Fedotov, P" uniqKey="Fedotov P">P Fedotov</name>
</author>
<author>
<name sortKey="Fonseca, Na" uniqKey="Fonseca N">NA Fonseca</name>
</author>
<author>
<name sortKey="Ganapathy, G" uniqKey="Ganapathy G">G Ganapathy</name>
</author>
<author>
<name sortKey="Gibbs, Ra" uniqKey="Gibbs R">RA Gibbs</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Godzaridis, E" uniqKey="Godzaridis E">E Godzaridis</name>
</author>
<author>
<name sortKey="Goldstein, S" uniqKey="Goldstein S">S Goldstein</name>
</author>
<author>
<name sortKey="Haimel, M" uniqKey="Haimel M">M Haimel</name>
</author>
<author>
<name sortKey="Hall, G" uniqKey="Hall G">G Hall</name>
</author>
<author>
<name sortKey="Haussler, D" uniqKey="Haussler D">D Haussler</name>
</author>
<author>
<name sortKey="Hiatt, Jb" uniqKey="Hiatt J">JB Hiatt</name>
</author>
<author>
<name sortKey="Ho, Iy" uniqKey="Ho I">IY Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mu Oz, Jf" uniqKey="Mu Oz J">JF Muñoz</name>
</author>
<author>
<name sortKey="Gallo, Je" uniqKey="Gallo J">JE Gallo</name>
</author>
<author>
<name sortKey="Misas, E" uniqKey="Misas E">E Misas</name>
</author>
<author>
<name sortKey="Mcewan, Jg" uniqKey="Mcewan J">JG McEwan</name>
</author>
<author>
<name sortKey="Clay, Ok" uniqKey="Clay O">OK Clay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, D" uniqKey="Zerbino D">D Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Yuan, J" uniqKey="Yuan J">J Yuan</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Shi, Y" uniqKey="Shi Y">Y Shi</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lam, Tw" uniqKey="Lam T">TW Lam</name>
</author>
<author>
<name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Christiansen, J" uniqKey="Christiansen J">J Christiansen</name>
</author>
<author>
<name sortKey="Dyck, Jd" uniqKey="Dyck J">JD Dyck</name>
</author>
<author>
<name sortKey="Elyas, Bg" uniqKey="Elyas B">BG Elyas</name>
</author>
<author>
<name sortKey="Lilley, M" uniqKey="Lilley M">M Lilley</name>
</author>
<author>
<name sortKey="Bamforth, Js" uniqKey="Bamforth J">JS Bamforth</name>
</author>
<author>
<name sortKey="Hicks, M" uniqKey="Hicks M">M Hicks</name>
</author>
<author>
<name sortKey="Sprysak, Ka" uniqKey="Sprysak K">KA Sprysak</name>
</author>
<author>
<name sortKey="Tomaszewski, R" uniqKey="Tomaszewski R">R Tomaszewski</name>
</author>
<author>
<name sortKey="Haase, Sm" uniqKey="Haase S">SM Haase</name>
</author>
<author>
<name sortKey="Vicen Wyhony, Lm" uniqKey="Vicen Wyhony L">LM Vicen-Wyhony</name>
</author>
<author>
<name sortKey="Somerville, Mj" uniqKey="Somerville M">MJ Somerville</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Redon, R" uniqKey="Redon R">R Redon</name>
</author>
<author>
<name sortKey="Ishikawa, S" uniqKey="Ishikawa S">S Ishikawa</name>
</author>
<author>
<name sortKey="Fitch, Kr" uniqKey="Fitch K">KR Fitch</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Perry, Gh" uniqKey="Perry G">GH Perry</name>
</author>
<author>
<name sortKey="Andrews, Td" uniqKey="Andrews T">TD Andrews</name>
</author>
<author>
<name sortKey="Fiegler, H" uniqKey="Fiegler H">H Fiegler</name>
</author>
<author>
<name sortKey="Shapero, Mh" uniqKey="Shapero M">MH Shapero</name>
</author>
<author>
<name sortKey="Carson, Ar" uniqKey="Carson A">AR Carson</name>
</author>
<author>
<name sortKey="Chen, W" uniqKey="Chen W">W Chen</name>
</author>
<author>
<name sortKey="Cho, Ek" uniqKey="Cho E">EK Cho</name>
</author>
<author>
<name sortKey="Dallaire, S" uniqKey="Dallaire S">S Dallaire</name>
</author>
<author>
<name sortKey="Freeman, Jl" uniqKey="Freeman J">JL Freeman</name>
</author>
<author>
<name sortKey="Gonzalez, Jr" uniqKey="Gonzalez J">JR González</name>
</author>
<author>
<name sortKey="Gratac S, M" uniqKey="Gratac S M">M Gratacós</name>
</author>
<author>
<name sortKey="Huang, J" uniqKey="Huang J">J Huang</name>
</author>
<author>
<name sortKey="Kalaitzopoulos, D" uniqKey="Kalaitzopoulos D">D Kalaitzopoulos</name>
</author>
<author>
<name sortKey="Komura, D" uniqKey="Komura D">D Komura</name>
</author>
<author>
<name sortKey="Macdonald, Jr" uniqKey="Macdonald J">JR MacDonald</name>
</author>
<author>
<name sortKey="Marshall, Cr" uniqKey="Marshall C">CR Marshall</name>
</author>
<author>
<name sortKey="Mei, R" uniqKey="Mei R">R Mei</name>
</author>
<author>
<name sortKey="Montgomery, L" uniqKey="Montgomery L">L Montgomery</name>
</author>
<author>
<name sortKey="Nishimura, K" uniqKey="Nishimura K">K Nishimura</name>
</author>
<author>
<name sortKey="Okamura, K" uniqKey="Okamura K">K Okamura</name>
</author>
<author>
<name sortKey="Shen, F" uniqKey="Shen F">F Shen</name>
</author>
<author>
<name sortKey="Somerville, Mj" uniqKey="Somerville M">MJ Somerville</name>
</author>
<author>
<name sortKey="Tchinda, J" uniqKey="Tchinda J">J Tchinda</name>
</author>
<author>
<name sortKey="Valsesia, A" uniqKey="Valsesia A">A Valsesia</name>
</author>
<author>
<name sortKey="Woodwark, C" uniqKey="Woodwark C">C Woodwark</name>
</author>
<author>
<name sortKey="Yang, F" uniqKey="Yang F">F Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Greenway, Sc" uniqKey="Greenway S">SC Greenway</name>
</author>
<author>
<name sortKey="Pereira, Ac" uniqKey="Pereira A">AC Pereira</name>
</author>
<author>
<name sortKey="Lin, Jc" uniqKey="Lin J">JC Lin</name>
</author>
<author>
<name sortKey="Depalma, Sr" uniqKey="Depalma S">SR DePalma</name>
</author>
<author>
<name sortKey="Israel, Sj" uniqKey="Israel S">SJ Israel</name>
</author>
<author>
<name sortKey="Mesquita, Sm" uniqKey="Mesquita S">SM Mesquita</name>
</author>
<author>
<name sortKey="Ergul, E" uniqKey="Ergul E">E Ergul</name>
</author>
<author>
<name sortKey="Conta, Jh" uniqKey="Conta J">JH Conta</name>
</author>
<author>
<name sortKey="Korn, Jm" uniqKey="Korn J">JM Korn</name>
</author>
<author>
<name sortKey="Mccarroll, Sa" uniqKey="Mccarroll S">SA McCarroll</name>
</author>
<author>
<name sortKey="Gorham, Jm" uniqKey="Gorham J">JM Gorham</name>
</author>
<author>
<name sortKey="Gabriel, S" uniqKey="Gabriel S">S Gabriel</name>
</author>
<author>
<name sortKey="Altshuler, Dm" uniqKey="Altshuler D">DM Altshuler</name>
</author>
<author>
<name sortKey="Quintanilla Dieck Mde, L" uniqKey="Quintanilla Dieck Mde L">Mde L Quintanilla-Dieck</name>
</author>
<author>
<name sortKey="Artunduaga, Ma" uniqKey="Artunduaga M">MA Artunduaga</name>
</author>
<author>
<name sortKey="Eavey, Rd" uniqKey="Eavey R">RD Eavey</name>
</author>
<author>
<name sortKey="Plenge, Rm" uniqKey="Plenge R">RM Plenge</name>
</author>
<author>
<name sortKey="Shadick, Na" uniqKey="Shadick N">NA Shadick</name>
</author>
<author>
<name sortKey="Weinblatt, Me" uniqKey="Weinblatt M">ME Weinblatt</name>
</author>
<author>
<name sortKey="De Jager, Pl" uniqKey="De Jager P">PL De Jager</name>
</author>
<author>
<name sortKey="Hafler, Da" uniqKey="Hafler D">DA Hafler</name>
</author>
<author>
<name sortKey="Breitbart, Re" uniqKey="Breitbart R">RE Breitbart</name>
</author>
<author>
<name sortKey="Seidman, Jg" uniqKey="Seidman J">JG Seidman</name>
</author>
<author>
<name sortKey="Seidman, Ce" uniqKey="Seidman C">CE Seidman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Szatmari, P" uniqKey="Szatmari P">P Szatmari</name>
</author>
<author>
<name sortKey="Paterson, Ad" uniqKey="Paterson A">AD Paterson</name>
</author>
<author>
<name sortKey="Zwaigenbaum, L" uniqKey="Zwaigenbaum L">L Zwaigenbaum</name>
</author>
<author>
<name sortKey="Roberts, W" uniqKey="Roberts W">W Roberts</name>
</author>
<author>
<name sortKey="Brian, J" uniqKey="Brian J">J Brian</name>
</author>
<author>
<name sortKey="Liu, Xq" uniqKey="Liu X">XQ Liu</name>
</author>
<author>
<name sortKey="Vincent, Jb" uniqKey="Vincent J">JB Vincent</name>
</author>
<author>
<name sortKey="Skaug, Jl" uniqKey="Skaug J">JL Skaug</name>
</author>
<author>
<name sortKey="Thompson, Ap" uniqKey="Thompson A">AP Thompson</name>
</author>
<author>
<name sortKey="Senman, L" uniqKey="Senman L">L Senman</name>
</author>
<author>
<name sortKey="Feuk, L" uniqKey="Feuk L">L Feuk</name>
</author>
<author>
<name sortKey="Qian, C" uniqKey="Qian C">C Qian</name>
</author>
<author>
<name sortKey="Bryson, Se" uniqKey="Bryson S">SE Bryson</name>
</author>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
<author>
<name sortKey="Marshall, Cr" uniqKey="Marshall C">CR Marshall</name>
</author>
<author>
<name sortKey="Scherer, Sw" uniqKey="Scherer S">SW Scherer</name>
</author>
<author>
<name sortKey="Vieland, Vj" uniqKey="Vieland V">VJ Vieland</name>
</author>
<author>
<name sortKey="Bartlett, C" uniqKey="Bartlett C">C Bartlett</name>
</author>
<author>
<name sortKey="Mangin, Lv" uniqKey="Mangin L">LV Mangin</name>
</author>
<author>
<name sortKey="Goedken, R" uniqKey="Goedken R">R Goedken</name>
</author>
<author>
<name sortKey="Segre, A" uniqKey="Segre A">A Segre</name>
</author>
<author>
<name sortKey="Pericak Vance, Ma" uniqKey="Pericak Vance M">MA Pericak-Vance</name>
</author>
<author>
<name sortKey="Cuccaro, Ml" uniqKey="Cuccaro M">ML Cuccaro</name>
</author>
<author>
<name sortKey="Gilbert, Jr" uniqKey="Gilbert J">JR Gilbert</name>
</author>
<author>
<name sortKey="Wright, Hh" uniqKey="Wright H">HH Wright</name>
</author>
<author>
<name sortKey="Abramson, Rk" uniqKey="Abramson R">RK Abramson</name>
</author>
<author>
<name sortKey="Betancur, C" uniqKey="Betancur C">C Betancur</name>
</author>
<author>
<name sortKey="Bourgeron, T" uniqKey="Bourgeron T">T Bourgeron</name>
</author>
<author>
<name sortKey="Gillberg, C" uniqKey="Gillberg C">C Gillberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Girirajan, S" uniqKey="Girirajan S">S Girirajan</name>
</author>
<author>
<name sortKey="Dennis, My" uniqKey="Dennis M">MY Dennis</name>
</author>
<author>
<name sortKey="Baker, C" uniqKey="Baker C">C Baker</name>
</author>
<author>
<name sortKey="Malig, M" uniqKey="Malig M">M Malig</name>
</author>
<author>
<name sortKey="Coe, Bp" uniqKey="Coe B">BP Coe</name>
</author>
<author>
<name sortKey="Campbell, Cd" uniqKey="Campbell C">CD Campbell</name>
</author>
<author>
<name sortKey="Mark, K" uniqKey="Mark K">K Mark</name>
</author>
<author>
<name sortKey="Vu, Th" uniqKey="Vu T">TH Vu</name>
</author>
<author>
<name sortKey="Alkan, C" uniqKey="Alkan C">C Alkan</name>
</author>
<author>
<name sortKey="Cheng, Z" uniqKey="Cheng Z">Z Cheng</name>
</author>
<author>
<name sortKey="Biesecker, Lg" uniqKey="Biesecker L">LG Biesecker</name>
</author>
<author>
<name sortKey="Bernier, R" uniqKey="Bernier R">R Bernier</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mefford, Hc" uniqKey="Mefford H">HC Mefford</name>
</author>
<author>
<name sortKey="Sharp, Aj" uniqKey="Sharp A">AJ Sharp</name>
</author>
<author>
<name sortKey="Baker, C" uniqKey="Baker C">C Baker</name>
</author>
<author>
<name sortKey="Itsara, A" uniqKey="Itsara A">A Itsara</name>
</author>
<author>
<name sortKey="Jiang, Z" uniqKey="Jiang Z">Z Jiang</name>
</author>
<author>
<name sortKey="Buysse, K" uniqKey="Buysse K">K Buysse</name>
</author>
<author>
<name sortKey="Huang, S" uniqKey="Huang S">S Huang</name>
</author>
<author>
<name sortKey="Maloney, Vk" uniqKey="Maloney V">VK Maloney</name>
</author>
<author>
<name sortKey="Crolla, Ja" uniqKey="Crolla J">JA Crolla</name>
</author>
<author>
<name sortKey="Baralle, D" uniqKey="Baralle D">D Baralle</name>
</author>
<author>
<name sortKey="Collins, A" uniqKey="Collins A">A Collins</name>
</author>
<author>
<name sortKey="Mercer, C" uniqKey="Mercer C">C Mercer</name>
</author>
<author>
<name sortKey="Norga, K" uniqKey="Norga K">K Norga</name>
</author>
<author>
<name sortKey="De Ravel, T" uniqKey="De Ravel T">T de Ravel</name>
</author>
<author>
<name sortKey="Devriendt, K" uniqKey="Devriendt K">K Devriendt</name>
</author>
<author>
<name sortKey="Bongers, Em" uniqKey="Bongers E">EM Bongers</name>
</author>
<author>
<name sortKey="De Leeuw, N" uniqKey="De Leeuw N">N de Leeuw</name>
</author>
<author>
<name sortKey="Reardon, W" uniqKey="Reardon W">W Reardon</name>
</author>
<author>
<name sortKey="Gimelli, S" uniqKey="Gimelli S">S Gimelli</name>
</author>
<author>
<name sortKey="Bena, F" uniqKey="Bena F">F Bena</name>
</author>
<author>
<name sortKey="Hennekam, Rc" uniqKey="Hennekam R">RC Hennekam</name>
</author>
<author>
<name sortKey="Male, A" uniqKey="Male A">A Male</name>
</author>
<author>
<name sortKey="Gaunt, L" uniqKey="Gaunt L">L Gaunt</name>
</author>
<author>
<name sortKey="Clayton Smith, J" uniqKey="Clayton Smith J">J Clayton-Smith</name>
</author>
<author>
<name sortKey="Simonic, I" uniqKey="Simonic I">I Simonic</name>
</author>
<author>
<name sortKey="Park, Sm" uniqKey="Park S">SM Park</name>
</author>
<author>
<name sortKey="Mehta, Sg" uniqKey="Mehta S">SG Mehta</name>
</author>
<author>
<name sortKey="Nik Zainal, S" uniqKey="Nik Zainal S">S Nik-Zainal</name>
</author>
<author>
<name sortKey="Woods, Cg" uniqKey="Woods C">CG Woods</name>
</author>
<author>
<name sortKey="Firth, Hv" uniqKey="Firth H">HV Firth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brunetti Pierri, N" uniqKey="Brunetti Pierri N">N Brunetti-Pierri</name>
</author>
<author>
<name sortKey="Berg, Js" uniqKey="Berg J">JS Berg</name>
</author>
<author>
<name sortKey="Scaglia, F" uniqKey="Scaglia F">F Scaglia</name>
</author>
<author>
<name sortKey="Belmont, J" uniqKey="Belmont J">J Belmont</name>
</author>
<author>
<name sortKey="Bacino, Ca" uniqKey="Bacino C">CA Bacino</name>
</author>
<author>
<name sortKey="Sahoo, T" uniqKey="Sahoo T">T Sahoo</name>
</author>
<author>
<name sortKey="Lalani, Sr" uniqKey="Lalani S">SR Lalani</name>
</author>
<author>
<name sortKey="Graham, B" uniqKey="Graham B">B Graham</name>
</author>
<author>
<name sortKey="Lee, B" uniqKey="Lee B">B Lee</name>
</author>
<author>
<name sortKey="Shinawi, M" uniqKey="Shinawi M">M Shinawi</name>
</author>
<author>
<name sortKey="Shen, J" uniqKey="Shen J">J Shen</name>
</author>
<author>
<name sortKey="Kang, Sh" uniqKey="Kang S">SH Kang</name>
</author>
<author>
<name sortKey="Pursley, A" uniqKey="Pursley A">A Pursley</name>
</author>
<author>
<name sortKey="Lotze, T" uniqKey="Lotze T">T Lotze</name>
</author>
<author>
<name sortKey="Kennedy, G" uniqKey="Kennedy G">G Kennedy</name>
</author>
<author>
<name sortKey="Lansky Shafer, S" uniqKey="Lansky Shafer S">S Lansky-Shafer</name>
</author>
<author>
<name sortKey="Weaver, C" uniqKey="Weaver C">C Weaver</name>
</author>
<author>
<name sortKey="Roeder, Er" uniqKey="Roeder E">ER Roeder</name>
</author>
<author>
<name sortKey="Grebe, Ta" uniqKey="Grebe T">TA Grebe</name>
</author>
<author>
<name sortKey="Arnold, Gl" uniqKey="Arnold G">GL Arnold</name>
</author>
<author>
<name sortKey="Hutchison, T" uniqKey="Hutchison T">T Hutchison</name>
</author>
<author>
<name sortKey="Reimschisel, T" uniqKey="Reimschisel T">T Reimschisel</name>
</author>
<author>
<name sortKey="Amato, S" uniqKey="Amato S">S Amato</name>
</author>
<author>
<name sortKey="Geragthy, Mt" uniqKey="Geragthy M">MT Geragthy</name>
</author>
<author>
<name sortKey="Innis, Jw" uniqKey="Innis J">JW Innis</name>
</author>
<author>
<name sortKey="Obersztyn, E" uniqKey="Obersztyn E">E Obersztyn</name>
</author>
<author>
<name sortKey="Nowakowska, B" uniqKey="Nowakowska B">B Nowakowska</name>
</author>
<author>
<name sortKey="Rosengren, Ss" uniqKey="Rosengren S">SS Rosengren</name>
</author>
<author>
<name sortKey="Bader, Pi" uniqKey="Bader P">PI Bader</name>
</author>
<author>
<name sortKey="Grange, Dk" uniqKey="Grange D">DK Grange</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ikeda, M" uniqKey="Ikeda M">M Ikeda</name>
</author>
<author>
<name sortKey="Aleksic, B" uniqKey="Aleksic B">B Aleksic</name>
</author>
<author>
<name sortKey="Kirov, G" uniqKey="Kirov G">G Kirov</name>
</author>
<author>
<name sortKey="Kinoshita, Y" uniqKey="Kinoshita Y">Y Kinoshita</name>
</author>
<author>
<name sortKey="Yamanouchi, Y" uniqKey="Yamanouchi Y">Y Yamanouchi</name>
</author>
<author>
<name sortKey="Kitajima, T" uniqKey="Kitajima T">T Kitajima</name>
</author>
<author>
<name sortKey="Kawashima, K" uniqKey="Kawashima K">K Kawashima</name>
</author>
<author>
<name sortKey="Okochi, T" uniqKey="Okochi T">T Okochi</name>
</author>
<author>
<name sortKey="Kishi, T" uniqKey="Kishi T">T Kishi</name>
</author>
<author>
<name sortKey="Zaharieva, I" uniqKey="Zaharieva I">I Zaharieva</name>
</author>
<author>
<name sortKey="Owen, Mj" uniqKey="Owen M">MJ Owen</name>
</author>
<author>
<name sortKey="O Onovan, Mc" uniqKey="O Onovan M">MC O’Donovan</name>
</author>
<author>
<name sortKey="Ozaki, N" uniqKey="Ozaki N">N Ozaki</name>
</author>
<author>
<name sortKey="Iwata, N" uniqKey="Iwata N">N Iwata</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diskin, Sj" uniqKey="Diskin S">SJ Diskin</name>
</author>
<author>
<name sortKey="Hou, C" uniqKey="Hou C">C Hou</name>
</author>
<author>
<name sortKey="Glessner, Jt" uniqKey="Glessner J">JT Glessner</name>
</author>
<author>
<name sortKey="Attiyeh, Ef" uniqKey="Attiyeh E">EF Attiyeh</name>
</author>
<author>
<name sortKey="Laudenslager, M" uniqKey="Laudenslager M">M Laudenslager</name>
</author>
<author>
<name sortKey="Bosse, K" uniqKey="Bosse K">K Bosse</name>
</author>
<author>
<name sortKey="Cole, K" uniqKey="Cole K">K Cole</name>
</author>
<author>
<name sortKey="Mosse, Yp" uniqKey="Mosse Y">YP Mossé</name>
</author>
<author>
<name sortKey="Wood, A" uniqKey="Wood A">A Wood</name>
</author>
<author>
<name sortKey="Lynch, Je" uniqKey="Lynch J">JE Lynch</name>
</author>
<author>
<name sortKey="Pecor, K" uniqKey="Pecor K">K Pecor</name>
</author>
<author>
<name sortKey="Diamond, M" uniqKey="Diamond M">M Diamond</name>
</author>
<author>
<name sortKey="Winter, C" uniqKey="Winter C">C Winter</name>
</author>
<author>
<name sortKey="Wang, K" uniqKey="Wang K">K Wang</name>
</author>
<author>
<name sortKey="Kim, C" uniqKey="Kim C">C Kim</name>
</author>
<author>
<name sortKey="Geiger, Ea" uniqKey="Geiger E">EA Geiger</name>
</author>
<author>
<name sortKey="Mcgrady, Pw" uniqKey="Mcgrady P">PW McGrady</name>
</author>
<author>
<name sortKey="Blakemore, Ai" uniqKey="Blakemore A">AI Blakemore</name>
</author>
<author>
<name sortKey="London, Wb" uniqKey="London W">WB London</name>
</author>
<author>
<name sortKey="Shaikh, Th" uniqKey="Shaikh T">TH Shaikh</name>
</author>
<author>
<name sortKey="Bradfield, J" uniqKey="Bradfield J">J Bradfield</name>
</author>
<author>
<name sortKey="Grant, Sf" uniqKey="Grant S">SF Grant</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Devoto, M" uniqKey="Devoto M">M Devoto</name>
</author>
<author>
<name sortKey="Rappaport, Er" uniqKey="Rappaport E">ER Rappaport</name>
</author>
<author>
<name sortKey="Hakonarson, H" uniqKey="Hakonarson H">H Hakonarson</name>
</author>
<author>
<name sortKey="Maris, Jm" uniqKey="Maris J">JM Maris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Isrie, M" uniqKey="Isrie M">M Isrie</name>
</author>
<author>
<name sortKey="Froyen, G" uniqKey="Froyen G">G Froyen</name>
</author>
<author>
<name sortKey="Devriendt, K" uniqKey="Devriendt K">K Devriendt</name>
</author>
<author>
<name sortKey="De Ravel, T" uniqKey="De Ravel T">T de Ravel</name>
</author>
<author>
<name sortKey="Fryns, Jp" uniqKey="Fryns J">JP Fryns</name>
</author>
<author>
<name sortKey="Vermeesch, Jr" uniqKey="Vermeesch J">JR Vermeesch</name>
</author>
<author>
<name sortKey="Van Esch, H" uniqKey="Van Esch H">H Van Esch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moseley, Sc" uniqKey="Moseley S">SC Moseley</name>
</author>
<author>
<name sortKey="Rizkallah, R" uniqKey="Rizkallah R">R Rizkallah</name>
</author>
<author>
<name sortKey="Tremblay, Dc" uniqKey="Tremblay D">DC Tremblay</name>
</author>
<author>
<name sortKey="Anderson, Br" uniqKey="Anderson B">BR Anderson</name>
</author>
<author>
<name sortKey="Hurt, Mm" uniqKey="Hurt M">MM Hurt</name>
</author>
<author>
<name sortKey="Chadwick, Bp" uniqKey="Chadwick B">BP Chadwick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whibley, Ac" uniqKey="Whibley A">AC Whibley</name>
</author>
<author>
<name sortKey="Plagnol, V" uniqKey="Plagnol V">V Plagnol</name>
</author>
<author>
<name sortKey="Tarpay, Ps" uniqKey="Tarpay P">PS Tarpey</name>
</author>
<author>
<name sortKey="Abidi, F" uniqKey="Abidi F">F Abidi</name>
</author>
<author>
<name sortKey="Fullston, T" uniqKey="Fullston T">T Fullston</name>
</author>
<author>
<name sortKey="Choma, Mk" uniqKey="Choma M">MK Choma</name>
</author>
<author>
<name sortKey="Boucher, Ca" uniqKey="Boucher C">CA Boucher</name>
</author>
<author>
<name sortKey="Shepherd, L" uniqKey="Shepherd L">L Shepherd</name>
</author>
<author>
<name sortKey="Willatt, L" uniqKey="Willatt L">L Willatt</name>
</author>
<author>
<name sortKey="Parkin, G" uniqKey="Parkin G">G Parkin</name>
</author>
<author>
<name sortKey="Smith, R" uniqKey="Smith R">R Smith</name>
</author>
<author>
<name sortKey="Futreal, Pa" uniqKey="Futreal P">PA Futreal</name>
</author>
<author>
<name sortKey="Shaw, M" uniqKey="Shaw M">M Shaw</name>
</author>
<author>
<name sortKey="Boyle, J" uniqKey="Boyle J">J Boyle</name>
</author>
<author>
<name sortKey="Licata, A" uniqKey="Licata A">A Licata</name>
</author>
<author>
<name sortKey="Skinner, C" uniqKey="Skinner C">C Skinner</name>
</author>
<author>
<name sortKey="Stevenson, Re" uniqKey="Stevenson R">RE Stevenson</name>
</author>
<author>
<name sortKey="Turner, G" uniqKey="Turner G">G Turner</name>
</author>
<author>
<name sortKey="Field, M" uniqKey="Field M">M Field</name>
</author>
<author>
<name sortKey="Hackett, A" uniqKey="Hackett A">A Hackett</name>
</author>
<author>
<name sortKey="Schwartz, Ce" uniqKey="Schwartz C">CE Schwartz</name>
</author>
<author>
<name sortKey="Gecz, J" uniqKey="Gecz J">J Gecz</name>
</author>
<author>
<name sortKey="Stratton, Mr" uniqKey="Stratton M">MR Stratton</name>
</author>
<author>
<name sortKey="Raymond, Fl" uniqKey="Raymond F">FL Raymond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Honda, S" uniqKey="Honda S">S Honda</name>
</author>
<author>
<name sortKey="Hayashi, S" uniqKey="Hayashi S">S Hayashi</name>
</author>
<author>
<name sortKey="Imoto, I" uniqKey="Imoto I">I Imoto</name>
</author>
<author>
<name sortKey="Toyama, J" uniqKey="Toyama J">J Toyama</name>
</author>
<author>
<name sortKey="Okazawa, H" uniqKey="Okazawa H">H Okazawa</name>
</author>
<author>
<name sortKey="Nakagawa, E" uniqKey="Nakagawa E">E Nakagawa</name>
</author>
<author>
<name sortKey="Goto, Y" uniqKey="Goto Y">Y Goto</name>
</author>
<author>
<name sortKey="Inazawa, J" uniqKey="Inazawa J">J Inazawa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gu, W" uniqKey="Gu W">W Gu</name>
</author>
<author>
<name sortKey="Zhang, F" uniqKey="Zhang F">F Zhang</name>
</author>
<author>
<name sortKey="Lupski, Jr" uniqKey="Lupski J">JR Lupski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hong, Gf" uniqKey="Hong G">GF Hong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korbel, Jo" uniqKey="Korbel J">JO Korbel</name>
</author>
<author>
<name sortKey="Urban, Ae" uniqKey="Urban A">AE Urban</name>
</author>
<author>
<name sortKey="Affourtit, Jp" uniqKey="Affourtit J">JP Affourtit</name>
</author>
<author>
<name sortKey="Godwin, B" uniqKey="Godwin B">B Godwin</name>
</author>
<author>
<name sortKey="Grubert, F" uniqKey="Grubert F">F Grubert</name>
</author>
<author>
<name sortKey="Simons, Jf" uniqKey="Simons J">JF Simons</name>
</author>
<author>
<name sortKey="Kim, Pm" uniqKey="Kim P">PM Kim</name>
</author>
<author>
<name sortKey="Palejev, D" uniqKey="Palejev D">D Palejev</name>
</author>
<author>
<name sortKey="Carriero, Nj" uniqKey="Carriero N">NJ Carriero</name>
</author>
<author>
<name sortKey="Du, L" uniqKey="Du L">L Du</name>
</author>
<author>
<name sortKey="Taillon, Be" uniqKey="Taillon B">BE Taillon</name>
</author>
<author>
<name sortKey="Chen, Z" uniqKey="Chen Z">Z Chen</name>
</author>
<author>
<name sortKey="Tanzer, A" uniqKey="Tanzer A">A Tanzer</name>
</author>
<author>
<name sortKey="Saunders, Ac" uniqKey="Saunders A">AC Saunders</name>
</author>
<author>
<name sortKey="Chi, J" uniqKey="Chi J">J Chi</name>
</author>
<author>
<name sortKey="Yang, F" uniqKey="Yang F">F Yang</name>
</author>
<author>
<name sortKey="Carter, Np" uniqKey="Carter N">NP Carter</name>
</author>
<author>
<name sortKey="Hurles, Me" uniqKey="Hurles M">ME Hurles</name>
</author>
<author>
<name sortKey="Weissman, Sm" uniqKey="Weissman S">SM Weissman</name>
</author>
<author>
<name sortKey="Harkins, Tt" uniqKey="Harkins T">TT Harkins</name>
</author>
<author>
<name sortKey="Gerstein, Mb" uniqKey="Gerstein M">MB Gerstein</name>
</author>
<author>
<name sortKey="Egholm, M" uniqKey="Egholm M">M Egholm</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Williams, Lj" uniqKey="Williams L">LJ Williams</name>
</author>
<author>
<name sortKey="Tabbaa, Dg" uniqKey="Tabbaa D">DG Tabbaa</name>
</author>
<author>
<name sortKey="Li, N" uniqKey="Li N">N Li</name>
</author>
<author>
<name sortKey="Berlin, Am" uniqKey="Berlin A">AM Berlin</name>
</author>
<author>
<name sortKey="Shea, Tp" uniqKey="Shea T">TP Shea</name>
</author>
<author>
<name sortKey="Maccallum, I" uniqKey="Maccallum I">I Maccallum</name>
</author>
<author>
<name sortKey="Lawrence, Ms" uniqKey="Lawrence M">MS Lawrence</name>
</author>
<author>
<name sortKey="Drier, Y" uniqKey="Drier Y">Y Drier</name>
</author>
<author>
<name sortKey="Getz, G" uniqKey="Getz G">G Getz</name>
</author>
<author>
<name sortKey="Young, Sk" uniqKey="Young S">SK Young</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramachandran, P" uniqKey="Ramachandran P">P Ramachandran</name>
</author>
<author>
<name sortKey="Palidwor, Ga" uniqKey="Palidwor G">GA Palidwor</name>
</author>
<author>
<name sortKey="Porter, Cj" uniqKey="Porter C">CJ Porter</name>
</author>
<author>
<name sortKey="Perkins, Tj" uniqKey="Perkins T">TJ Perkins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodrigue, S" uniqKey="Rodrigue S">S Rodrigue</name>
</author>
<author>
<name sortKey="Materna, Ac" uniqKey="Materna A">AC Materna</name>
</author>
<author>
<name sortKey="Timberlake, Sc" uniqKey="Timberlake S">SC Timberlake</name>
</author>
<author>
<name sortKey="Blackburn, Mc" uniqKey="Blackburn M">MC Blackburn</name>
</author>
<author>
<name sortKey="Malmstrom, Rr" uniqKey="Malmstrom R">RR Malmstrom</name>
</author>
<author>
<name sortKey="Alm, Ej" uniqKey="Alm E">EJ Alm</name>
</author>
<author>
<name sortKey="Chisholm, Sw" uniqKey="Chisholm S">SW Chisholm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Magoc, T" uniqKey="Magoc T">T Magoč</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Yuan, J" uniqKey="Yuan J">J Yuan</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Xie, Y" uniqKey="Xie Y">Y Xie</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Shi, Y" uniqKey="Shi Y">Y Shi</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lam, Tw" uniqKey="Lam T">TW Lam</name>
</author>
<author>
<name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Chong, Z" uniqKey="Chong Z">Z Chong</name>
</author>
<author>
<name sortKey="Gong, Q" uniqKey="Gong Q">Q Gong</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Li, C" uniqKey="Li C">C Li</name>
</author>
<author>
<name sortKey="Tao, Y" uniqKey="Tao Y">Y Tao</name>
</author>
<author>
<name sortKey="Zheng, C" uniqKey="Zheng C">C Zheng</name>
</author>
<author>
<name sortKey="Zhai, W" uniqKey="Zhai W">W Zhai</name>
</author>
<author>
<name sortKey="Turissini, D" uniqKey="Turissini D">D Turissini</name>
</author>
<author>
<name sortKey="Cannon, Ch" uniqKey="Cannon C">CH Cannon</name>
</author>
<author>
<name sortKey="Lu, X" uniqKey="Lu X">X Lu</name>
</author>
<author>
<name sortKey="Wu, Ci" uniqKey="Wu C">CI Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Kaneko, K" uniqKey="Kaneko K">K Kaneko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernaola Galvan, P" uniqKey="Bernaola Galvan P">P Bernaola-Galván</name>
</author>
<author>
<name sortKey="Carpena, P" uniqKey="Carpena P">P Carpena</name>
</author>
<author>
<name sortKey="Roman Roldan, R" uniqKey="Roman Roldan R">R Román-Roldán</name>
</author>
<author>
<name sortKey="Oliver, Jl" uniqKey="Oliver J">JL Oliver</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arneodo, A" uniqKey="Arneodo A">A Arneodo</name>
</author>
<author>
<name sortKey="Vaillant, C" uniqKey="Vaillant C">C Vaillant</name>
</author>
<author>
<name sortKey="Audit, B" uniqKey="Audit B">B Audit</name>
</author>
<author>
<name sortKey="Argoul, F" uniqKey="Argoul F">F Argoul</name>
</author>
<author>
<name sortKey="D Ubenton Carafa, Y" uniqKey="D Ubenton Carafa Y">Y d’Aubenton-Carafa</name>
</author>
<author>
<name sortKey="Thermes, C" uniqKey="Thermes C">C Thermes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Voss, Rf" uniqKey="Voss R">RF Voss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fukushima, A" uniqKey="Fukushima A">A Fukushima</name>
</author>
<author>
<name sortKey="Ikemura, T" uniqKey="Ikemura T">T Ikemura</name>
</author>
<author>
<name sortKey="Kinouchi, M" uniqKey="Kinouchi M">M Kinouchi</name>
</author>
<author>
<name sortKey="Oshima, T" uniqKey="Oshima T">T Oshima</name>
</author>
<author>
<name sortKey="Kudo, Y" uniqKey="Kudo Y">Y Kudo</name>
</author>
<author>
<name sortKey="Mori, H" uniqKey="Mori H">H Mori</name>
</author>
<author>
<name sortKey="Kanaya, S" uniqKey="Kanaya S">S Kanaya</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Holste, D" uniqKey="Holste D">D Holste</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author>
<name sortKey="Holste, D" uniqKey="Holste D">D Holste</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huynen, M" uniqKey="Huynen M">M Huynen</name>
</author>
<author>
<name sortKey="Van Nimwegen, E" uniqKey="Van Nimwegen E">E van Nimwegen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qian, J" uniqKey="Qian J">J Qian</name>
</author>
<author>
<name sortKey="Luscombe, Nm" uniqKey="Luscombe N">NM Luscombe</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koonin, Ev" uniqKey="Koonin E">EV Koonin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Herrada, A" uniqKey="Herrada A">A Herrada</name>
</author>
<author>
<name sortKey="Euiluz, Vm" uniqKey="Euiluz V">VM Eguíluz</name>
</author>
<author>
<name sortKey="Hernandez Garcia, E" uniqKey="Hernandez Garcia E">E Hernández-García</name>
</author>
<author>
<name sortKey="Duarte, Cm" uniqKey="Duarte C">CM Duarte</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salerno, W" uniqKey="Salerno W">W Salerno</name>
</author>
<author>
<name sortKey="Havlak, P" uniqKey="Havlak P">P Havlak</name>
</author>
<author>
<name sortKey="Miller, J" uniqKey="Miller J">J Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yanai, I" uniqKey="Yanai I">I Yanai</name>
</author>
<author>
<name sortKey="Camacho, Cj" uniqKey="Camacho C">CJ Camacho</name>
</author>
<author>
<name sortKey="Delisi, C" uniqKey="Delisi C">C DeLisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teichmann, Sa" uniqKey="Teichmann S">SA Teichmann</name>
</author>
<author>
<name sortKey="Babu, Mm" uniqKey="Babu M">MM Babu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Massip, F" uniqKey="Massip F">F Massip</name>
</author>
<author>
<name sortKey="Arndt, Pf" uniqKey="Arndt P">PF Arndt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L Zhang</name>
</author>
<author>
<name sortKey="Lu, Hh" uniqKey="Lu H">HH Lu</name>
</author>
<author>
<name sortKey="Chung, Wy" uniqKey="Chung W">WY Chung</name>
</author>
<author>
<name sortKey="Yang, J" uniqKey="Yang J">J Yang</name>
</author>
<author>
<name sortKey="Li, Wh" uniqKey="Li W">WH Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ng, Sb" uniqKey="Ng S">SB Ng</name>
</author>
<author>
<name sortKey="Turner, Eh" uniqKey="Turner E">EH Turner</name>
</author>
<author>
<name sortKey="Robertson, Pd" uniqKey="Robertson P">PD Robertson</name>
</author>
<author>
<name sortKey="Flygare, Sd" uniqKey="Flygare S">SD Flygare</name>
</author>
<author>
<name sortKey="Bigham, Aw" uniqKey="Bigham A">AW Bigham</name>
</author>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Shaffer, T" uniqKey="Shaffer T">T Shaffer</name>
</author>
<author>
<name sortKey="Wong, M" uniqKey="Wong M">M Wong</name>
</author>
<author>
<name sortKey="Bhattacharjee, A" uniqKey="Bhattacharjee A">A Bhattacharjee</name>
</author>
<author>
<name sortKey="Eichler, Ee" uniqKey="Eichler E">EE Eichler</name>
</author>
<author>
<name sortKey="Bamshad, M" uniqKey="Bamshad M">M Bamshad</name>
</author>
<author>
<name sortKey="Nickerson, Da" uniqKey="Nickerson D">DA Nickerson</name>
</author>
<author>
<name sortKey="Shendure, J" uniqKey="Shendure J">J Shendure</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24386976</article-id>
<article-id pub-id-type="pmc">3927684</article-id>
<article-id pub-id-type="publisher-id">1471-2105-15-2</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-15-2</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Diminishing return for increased Mappability with longer sequencing reads: implications of the
<italic>k</italic>
-mer distributions in the human genome</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Li</surname>
<given-names>Wentian</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>wtli2012@gmail.com</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Freudenberg</surname>
<given-names>Jan</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>jfreuden@nshs.edu</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Miramontes</surname>
<given-names>Pedro</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>pmv@ciencias.unam.mx</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, USA</aff>
<aff id="I2">
<label>2</label>
Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, 04510 DF México, México</aff>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>3</day>
<month>1</month>
<year>2014</year>
</pub-date>
<volume>15</volume>
<fpage>2</fpage>
<lpage>2</lpage>
<history>
<date date-type="received">
<day>25</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>12</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2014 Li et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2014</copyright-year>
<copyright-holder>Li et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/15/2"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>The amount of non-unique sequence (non-singletons) in a genome directly affects the difficulty of read alignment to a reference assembly for high-throughput sequencing data. Although a longer read is more likely to be uniquely mapped to the reference genome, a quantitative analysis of the influence of read lengths on mappability has been lacking. To address this question, we evaluate the
<italic>k</italic>
-mer distribution of the human reference genome. The
<italic>k</italic>
-mer frequency is determined for
<italic>k</italic>
ranging from 20 bp to 1000 bp.</p>
</sec>
<sec>
<title>Results</title>
<p>We observe that the proportion of non-singleton
<italic>k</italic>
-mers decreases slowly with increasing
<italic>k</italic>
, and can be fitted by piecewise power-law functions with different exponents at different ranges of
<italic>k</italic>
. A slower decay at greater values for
<italic>k</italic>
indicates more limited gains in mappability for read lengths between 200 bp and 1000 bp. The frequency distributions of
<italic>k</italic>
-mers exhibit long tails with a power-law-like trend, and rank frequency plots exhibit a concave Zipf’s curve. The most frequent 1000-mers comprise 172 regions, which include four large stretches on chromosomes 1 and X, containing genes of biomedical relevance. Comparison with other databases indicates that the 172 regions can be broadly classified into two types: those containing LINE transposable elements and those containing segmental duplications.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Read mappability as measured by the proportion of singletons increases steadily up to the length scale around 200 bp. When read length increases above 200 bp, smaller gains in mappability are expected. Moreover, the proportion of non-singletons decreases with read length much more slowly than linearly. Even a read length of 1000 bp would not allow the unique alignment of reads for many coding regions of human genes. A mix of techniques will be needed for efficiently producing high-quality data that cover the complete human genome.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Next-generation sequencing</kwd>
<kwd>Read alignment</kwd>
<kwd>Repeat sequences</kwd>
<kwd>Genome redundancy</kwd>
<kwd>Long-tail distribution</kwd>
<kwd>
<italic>k</italic>
-mers</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>Many applications of next-generation sequencing (NGS) in human genetic and medical studies depend on the ability to uniquely align DNA reads to the human reference genome [
<xref ref-type="bibr" rid="B1">1</xref>
-
<xref ref-type="bibr" rid="B6">6</xref>
]. This, in turn, is related to the level of redundancy caused by repetitive sequences in the human genome, well known from the earlier human whole-genome shotgun sequencing [
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B8">8</xref>
], and the read length
<italic>k</italic>
. When the read length
<italic>k</italic>
is too short, it is theoretically impossible to have a reference sequence with size comparable to the human genome that does not contain any repeats of
<italic>k</italic>
bases. It has been shown using graph theory that the longest DNA sequences avoiding any repeats of
<italic>k</italic>
-mers can be constructed by packing all unique
<italic>k</italic>
-mers shifting one position at a time [
<xref ref-type="bibr" rid="B9">9</xref>
]. The number of different
<italic>k</italic>
-mer types is 4
<sup>
<italic>k</italic>
</sup>
/2 (
<italic>k</italic>
odd) or (4
<sup>
<italic>k</italic>
</sup>
+2
<sup>
<italic>k</italic>
</sup>
)/ 2 (
<italic>k</italic>
even) if both a subsequence and its reverse complement are considered to belong to the same
<italic>k</italic>
-mer type. Solving 4
<sup>
<italic>k</italic>
</sup>
/2≈3×10
<sup>9</sup>
leads to the conclusion that read length
<italic>k</italic>
must be at least 17 for all reads to be uniquely alignable to a hypothetical reference sequence that has the size of the human genome.</p>
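<p>The counting argument above can be checked numerically. The following is a minimal sketch, assuming a genome size of 3×10^9 bases; the function names are illustrative and do not come from the original study.</p>

```python
def kmer_type_count(k):
    """Number of k-mer types when a k-mer and its reverse complement are merged."""
    total = 4 ** k
    if k % 2 == 1:
        # For odd k no k-mer equals its own reverse complement.
        return total // 2
    # For even k there are 4^(k/2) = 2^k self-complementary k-mers.
    return (total + 2 ** k) // 2


def minimal_k(genome_size):
    """Smallest k whose number of k-mer types reaches genome_size."""
    k = 1
    while genome_size > kmer_type_count(k):
        k += 1
    return k


if __name__ == "__main__":
    # Assumed genome size of roughly 3 billion bases.
    print(minimal_k(3_000_000_000))
```

<p>With this assumed genome size the loop stops at k=17, because 4^16/2 is about 2.1×10^9 whereas 4^17/2 is about 8.6×10^9.</p>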
<p>However, in reality the human genome did not evolve by a first principle to be consistently compact and incompressible. Redundant sequences in the human genome have resulted from duplication, insertion of transposable elements, and tandem repeats due to replication slippage, and more than half of the human genome can be traced to repetitive transposable elements. Although locally duplicated sequences can be deleterious [
<xref ref-type="bibr" rid="B10">10</xref>
] or disease-causing [
<xref ref-type="bibr" rid="B11">11</xref>
], a certain level of redundancy is a requirement for biological novelty and adaptation [
<xref ref-type="bibr" rid="B12">12</xref>
-
<xref ref-type="bibr" rid="B14">14</xref>
]. For higher eukaryotes, a slower removal of the deleterious repeats due to low mutation rates and smaller population sizes [
<xref ref-type="bibr" rid="B15">15</xref>
] leads to a higher level of genome-wide redundancy. This in turn may lead to more protein sequences with internal repeats and perhaps new folds or new functions, as is the case for connective tissue, cytoskeletal, and muscle proteins [
<xref ref-type="bibr" rid="B16">16</xref>
].</p>
<p>Therefore,
<italic>k</italic>
=17 is a very unrealistic estimate of the minimal read length required for perfectly successful NGS read alignment. Accordingly, NGS technologies utilize reads with various larger lengths:
<italic>k</italic>
=70 for
<italic>Complete Genomics</italic>
, 35∼85 for
<italic>ABI SOLiD</italic>
, 75∼150 paired-end for
<italic>Illumina HiSeq</italic>
, 400 for
<italic>Ion Torrent PGM</italic>
, 450∼600 for
<italic>Roche 454 GS FLX Titanium XLR70</italic>
, etc. [
<xref ref-type="bibr" rid="B17">17</xref>
]. Currently, the technology is pushing towards read lengths of
<italic>k</italic>
=1000 (e.g.,
<italic>Roche 454 GS FLX Titanium XL+</italic>
) or even
<italic>k</italic>
=10000 [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B19">19</xref>
]. Needless to say, the longer the read length, the higher the chance that reads can be aligned to the reference genome. Ultimately, high quality genomes will be obtained by a mix of technologies. To find this optimal mixture, a quantitative understanding of the repeat structure of the human genome is required.</p>
<p>Our analysis of the repeat structure is different from some earlier investigations of read mappability [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
]. In these studies, the actual reads from the current sequencing technology are used. There are two shortcomings in these approaches: (i) it is impossible to extrapolate the results to read lengths beyond the current technology; (ii) a certain proportion of reads are never mappable because the corresponding regions in the reference genome are not finished. Using the existing reference genome makes it possible to treat
<italic>k</italic>
-mers as hypothetical reads whose length
<italic>k</italic>
can be arbitrarily long, and unfinished regions can be excluded from the analysis.</p>
<p>In this paper we quantitatively address the question of how alignment improves with greater read length. To this end, we artificially cut the human reference genome into overlapping
<italic>k</italic>
-windows (
<italic>k</italic>
-mers,
<italic>k</italic>
-tuples, or
<italic>k</italic>
-gram [
<xref ref-type="bibr" rid="B20">20</xref>
]), each considered to be a possible “read”, and count the number of appearances (or “tokens”, borrowing terminology from linguistics [
<xref ref-type="bibr" rid="B21">21</xref>
]) of each
<italic>k</italic>
-mer type across the full reference sequence. Those
<italic>k</italic>
-mer types that appear in the genome only once (
<italic>f</italic>
=1) are labeled singletons, and the remainder (
<italic>f</italic>
>1) are non-singletons. Intuitively, the percentage of non-singleton reads is expected to decrease with increasing read length
<italic>k</italic>
. Obtaining the functional form of this decay enables us to predict the percentage of difficult-to-align reads for longer read lengths.</p>
<p>These seemingly simple calculations already encounter a “big data” problem on a regular-sized computer. In particular, storing counts in a hash table requires a large amount of RAM. Suppose a
<italic>k</italic>
-mer needs K bytes of storage (e.g.
<italic>K</italic>
=
<italic>k</italic>
/4), a hash table to count all
<italic>k</italic>
-mers in the human genome would require 3
<italic>K</italic>
GByte of RAM, which quickly becomes impractical when
<italic>k</italic>
is greater than 100. Using a solution that is similar to other applications where the hard disk [
<xref ref-type="bibr" rid="B22">22</xref>
-
<xref ref-type="bibr" rid="B24">24</xref>
] or computing time [
<xref ref-type="bibr" rid="B25">25</xref>
] is traded for RAM, we use a new public-domain program, DSK, which uses the less expensive hard disk and longer CPU time to compensate for a lack of RAM [
<xref ref-type="bibr" rid="B26">26</xref>
]. Other efficient
<italic>k</italic>
-mer count procedures have been proposed in [
<xref ref-type="bibr" rid="B27">27</xref>
-
<xref ref-type="bibr" rid="B29">29</xref>
].</p>
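<p>The memory estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: about 3×10^9 k-mers in the genome, 2 bits per base (hence K=k/4 bytes per key), and no allowance for the counts themselves or for hash-table overhead. The names are hypothetical.</p>

```python
# Assumed number of k-mers to store, roughly one per genome position.
GENOME_KMERS = 3_000_000_000


def key_bytes(k):
    """Bytes needed to store one k-mer at 2 bits per base (K = k/4)."""
    return k / 4


def hash_table_gbytes(k, n_kmers=GENOME_KMERS):
    """Approximate RAM in GByte for the keys alone, i.e. about 3K GByte."""
    return n_kmers * key_bytes(k) / 1e9


if __name__ == "__main__":
    for k in (20, 100, 1000):
        print(k, hash_table_gbytes(k))
```

<p>Under these assumptions, k=100 already requires about 75 GByte for the keys alone, which exceeds the 48 GByte of RAM reported in the Methods.</p>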
<p>The mathematical relationship between the fraction of non-singleton
<italic>k</italic>
-mers and
<italic>k</italic>
predicts the fraction of putative reads that can be mapped uniquely. Another statistic of interest is the distribution of
<italic>k</italic>
-mer frequencies when
<italic>k</italic>
is fixed at a given value. This distribution has a head and a tail, a head for low frequency
<italic>k</italic>
-mers (including singletons), and a tail for high frequency
<italic>k</italic>
-mers. When these distributions exhibit long tails [
<xref ref-type="bibr" rid="B30">30</xref>
] and power-law-like trends [
<xref ref-type="bibr" rid="B31">31</xref>
], thus fitting a straight line in log-log scale, the head end is best characterized by the frequency distribution [
<xref ref-type="bibr" rid="B21">21</xref>
], whereas the tail end is better characterized by the rank-frequency distribution commonly related to Zipf’s law in quantitative linguistics [
<xref ref-type="bibr" rid="B32">32</xref>
]. Our analysis of these distributions provides information on the level of redundancy in the human genome at various scales.</p>
<p>The identification of regions in the human genome that cannot be uniquely mapped by reads (which can be called “non-uniqueome” following the term “uniqueome” used in [
<xref ref-type="bibr" rid="B3">3</xref>
]) is important in any NGS-based study. These regions may contribute the most to false-positive and false-negative variant calls. These may also be hotspots for structural variations such as copy-number-variation [
<xref ref-type="bibr" rid="B33">33</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
]. We will specifically examine the location of some of these regions at the
<italic>k</italic>
=1000 level.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Genome sequence data</title>
<p>The human reference genome GRCh37 (hg19) was downloaded from UCSC’s Genome Browser (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/">http://genome.ucsc.edu/</ext-link>
). The intermittent strings of N’s (marking unfinished basepairs that cannot be sequenced with the applied technology [
<xref ref-type="bibr" rid="B35">35</xref>
]) are used to partition the 22 autosomes and 2 sex chromosomes into 322 subsequences, and
<italic>k</italic>
-mers overlapping two chromosome partitions are not allowed.</p>
<p>For an additional analysis on repeat-filtered sequences, strings of lowercase letters in the reference genome (which mark repetitive sequences identified by the RepeatMasker program,
<ext-link ext-link-type="uri" xlink:href="http://www.repeatmasker.org/">http://www.repeatmasker.org/</ext-link>
) are used to partition the genome into 3,456,905 subsequences with all transposable elements removed.</p>
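<p>The two partitioning steps can be sketched as follows. This is an illustration only, assuming a soft-masked sequence held in memory as a Python string; the helper names are hypothetical and the authors' actual preprocessing is not described at this level of detail.</p>

```python
import re


def split_on_n(sequence):
    """Split a chromosome into finished subsequences at runs of N."""
    return [part for part in re.split(r"[Nn]+", sequence) if part]


def split_on_repeats(sequence):
    """Split at runs of lowercase (soft-masked) letters and at runs of N."""
    return [part for part in re.split(r"[a-zN]+", sequence) if part]


def kmers_within_partitions(parts, k):
    """Yield k-mers that lie entirely inside one partition."""
    for part in parts:
        for i in range(len(part) - k + 1):
            yield part[i:i + k]


if __name__ == "__main__":
    toy = "ACGTNNNNGGCCacgtTTAA"
    print(split_on_n(toy))
    print(split_on_repeats(toy))
    print(list(kmers_within_partitions(split_on_n("ACGTNNGGC"), 3)))
```

<p>Because k-mers are enumerated separately within each partition, no k-mer spans a gap of N's or, in the repeat-filtered case, a masked repeat.</p>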
<p>We further use the database
<italic>Dfam</italic>
version 1.2 (May 2013) (
<ext-link ext-link-type="uri" xlink:href="http://dfam.janelia.org/">http://dfam.janelia.org/</ext-link>
) [
<xref ref-type="bibr" rid="B36">36</xref>
] to annotate genomic regions by repeat sequences.
<italic>Dfam</italic>
contains the genomic locations of more than a thousand (1132) transposable element (TE) subfamily types. A hit is recorded whenever our genomic region overlaps with a TE.
<italic>Dfam</italic>
also provides information on tandem repeats by the program Tandem Repeat Finder [
<xref ref-type="bibr" rid="B37">37</xref>
].</p>
<p>Segmental duplication annotation of the human genome, which is based either on unusually high read coverage of whole-genome shotgun sequence segments from Celera Genomics [
<xref ref-type="bibr" rid="B38">38</xref>
], or on a self-alignment by BLAST [
<xref ref-type="bibr" rid="B39">39</xref>
] on the RepeatMasker filtered genome (“fuguization”) [
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
], is obtained from the Segmental Dups track (“Duplications of > 1000 bases of non-RepeatMasker sequence”) at Genome Browser (
<ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=genomicSuperDups">http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=genomicSuperDups</ext-link>
).</p>
</sec>
<sec>
<title>Counting
<italic>k</italic>
-mers</title>
<p>A
<italic>k</italic>
-mer type includes both the direct and the reverse complement substring; AAGC/GCTT is an example of such a 4-mer type. We use a state-of-the-art
<italic>k</italic>
-mer counting program DSK [
<xref ref-type="bibr" rid="B26">26</xref>
] (
<ext-link ext-link-type="uri" xlink:href="http://minia.genouest.org/dsk/">http://minia.genouest.org/dsk/</ext-link>
), version 1.5031 (March 26, 2013). Most of the DSK calculations were carried out on a Linux computer with 48 GByte of RAM and around 900 GByte of disk space, except for the calculation at
<italic>k</italic>
=1000 which was run on another Linux computer with the same RAM but 30 TByte of disk space. The parameter setting of DSK was determined by a trial-and-error process. The output of the DSK program consists of a list of
<italic>k</italic>
-mers. The BLAT program from UCSC’s Genome Browser is used to map frequent
<italic>k</italic>
-mers back to the reference genome.</p>
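<p>For short sequences, the definition of a k-mer type can be illustrated with an in-memory counter. This is a toy sketch, not the disk-based DSK algorithm; the function names are hypothetical, and the non-singleton fraction is computed over tokens, which is an assumption about how the proportion is normalized.</p>

```python
from collections import Counter

# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer type by the lexicographically smaller strand."""
    return min(kmer, reverse_complement(kmer))


def count_kmer_types(sequence, k):
    """Count tokens of each k-mer type over all overlapping windows."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        counts[canonical(sequence[i:i + k])] += 1
    return counts


def non_singleton_token_fraction(counts):
    """Fraction of tokens that belong to types with f greater than 1."""
    total = sum(counts.values())
    repeated = sum(f for f in counts.values() if f > 1)
    return repeated / total


if __name__ == "__main__":
    counts = count_kmer_types("AAGCTT", 4)
    print(counts, non_singleton_token_fraction(counts))
```

<p>In the toy sequence AAGCTT, the windows AAGC and GCTT belong to the same 4-mer type, so that type has f=2, while the self-complementary AGCT is a singleton.</p>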
</sec>
<sec>
<title>Frequency distribution, rank frequency plot, and data fitting</title>
<p>Suppose a
<italic>k</italic>
-mer type appears in the genome
<italic>f</italic>
times (
<italic>f</italic>
is the frequency, or copy number); the frequency distribution (FD) is the number of
<italic>k</italic>
-mer types with frequency
<italic>f</italic>
. Individual
<italic>k</italic>
-mer types can be ranked by their
<italic>f</italic>
, highest
<italic>f</italic>
ranks number 1, second highest
<italic>f</italic>
ranks number 2, etc. The ranked
<italic>f</italic>
’s of
<italic>k</italic>
-mer types as a function of rank
<italic>r</italic>
is the rank-frequency distribution (RFD).</p>
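<p>Both distributions follow directly from a table of per-type counts. The sketch below assumes the counts are available as a Python dictionary mapping each k-mer type to its frequency f; the function names and the toy data are illustrative only.</p>

```python
from collections import Counter


def frequency_distribution(counts):
    """Map each frequency f to the number of k-mer types with that f."""
    return dict(sorted(Counter(counts.values()).items()))


def rank_frequency_distribution(counts):
    """List of (rank, f) pairs, with rank 1 for the highest f."""
    ranked = sorted(counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))


if __name__ == "__main__":
    # Hypothetical toy counts for four k-mer types.
    toy_counts = {"AAGC": 2, "AGCT": 1, "ACGT": 1, "AAAA": 5}
    print(frequency_distribution(toy_counts))
    print(rank_frequency_distribution(toy_counts))
```

<p>The FD resolves the low-frequency head (many types share f=1 or f=2), whereas the RFD gives each high-frequency type its own point, which is why the two views complement each other.</p>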
<p>The functions used here in fitting the RFD can all be expressed as linear regressions, including the Weibull function: log(
<italic>f</italic>
)∼ log(log((max(
<italic>r</italic>
)+1)/
<italic>r</italic>
)) [
<xref ref-type="bibr" rid="B42">42</xref>
]; quadratic logarithmic: log(
<italic>f</italic>
)∼ log(
<italic>r</italic>
)+(log
<italic>r</italic>
)
<sup>2</sup>
[
<xref ref-type="bibr" rid="B43">43</xref>
]; and reverse Beta: log(
<italic>r</italic>
)∼ log(
<italic>f</italic>
)+ log(max(
<italic>f</italic>
)+1-
<italic>f</italic>
). The latter function is derived from the Beta rank function [
<xref ref-type="bibr" rid="B44">44</xref>
-
<xref ref-type="bibr" rid="B46">46</xref>
] by reversing the
<italic>f</italic>
and
<italic>r</italic>
. All linear regressions are carried out by the
<italic>R</italic>
function
<italic>lm</italic>
(
<ext-link ext-link-type="uri" xlink:href="http://www.r-project.org/">http://www.r-project.org/</ext-link>
).</p>
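<p>The three regressions can be set up as ordinary least-squares problems. The sketch below uses a small pure-Python solver in place of the R function lm, purely for illustration; it assumes natural logarithms and a list of frequencies already sorted in descending order, so that rank r runs from 1 to n. All function names are hypothetical.</p>

```python
import math


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def weibull_fit(freqs):
    """log(f) ~ log(log((max(r)+1)/r))."""
    n = len(freqs)
    x = [[math.log(math.log((n + 1) / r))] for r in range(1, n + 1)]
    return ols(x, [math.log(f) for f in freqs])


def quadratic_log_fit(freqs):
    """log(f) ~ log(r) + (log r)^2."""
    n = len(freqs)
    x = [[math.log(r), math.log(r) ** 2] for r in range(1, n + 1)]
    return ols(x, [math.log(f) for f in freqs])


def reverse_beta_fit(freqs):
    """log(r) ~ log(f) + log(max(f)+1-f)."""
    fmax = max(freqs)
    x = [[math.log(f), math.log(fmax + 1 - f)] for f in freqs]
    return ols(x, [math.log(r) for r in range(1, len(freqs) + 1)])


if __name__ == "__main__":
    # Hypothetical ranked frequencies, highest first.
    toy = [50, 20, 10, 5, 2, 1]
    print(weibull_fit(toy), quadratic_log_fit(toy), reverse_beta_fit(toy))
```

<p>Each function returns the intercept followed by the slope coefficients. Solving the normal equations directly is adequate for small illustrations; a QR-based routine such as the one behind lm is numerically more robust for real data.</p>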
</sec>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Percentage of non-singleton reads vs. read length: piece-wise power-law function</title>
<p>In Figure
<xref ref-type="fig" rid="F1">1</xref>
we show the percentage of non-singleton reads/tokens (
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
) as a function of
<italic>k</italic>
-mer length
<italic>k</italic>
in log-log scale. The
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
is 28.35% at
<italic>k</italic>
=20, 8.16% at
<italic>k</italic>
=50, 4.26% at
<italic>k</italic>
=80, 3.40% at
<italic>k</italic>
=100, 2.44% at
<italic>k</italic>
=150, 1.33% at
<italic>k</italic>
=400, 1.18% at
<italic>k</italic>
=500, and 0.82% at
<italic>k</italic>
=1000. If
<italic>k</italic>
is shorter than the “shortest unique substring” length, which is 11 in the human genome [
<xref ref-type="bibr" rid="B47">47</xref>
], singletons do not exist (i.e.,
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
=100
<italic>%</italic>
).</p>
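<p>To make the token-based definition concrete, here is a small sketch that computes the non-singleton proportion for a toy sequence. The function name and the toy string are hypothetical, and reverse complements are not merged, which is a simplifying assumption.</p>

```python
from collections import Counter

def non_singleton_fraction(seq, k):
    """Fraction of k-mer tokens whose k-mer type occurs more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return sum(f for f in counts.values() if f > 1) / total

toy = "ACGTACGTAAAC"  # hypothetical toy sequence, not genome data
for k in (3, 5, 6):
    print(k, non_singleton_fraction(toy, k))
```

<p>In this toy case the fraction falls to zero once k exceeds the longest repeated substring, which is the small-scale analogue of the decay shown in Figure 1.</p>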
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Proportion of non-singleton</bold>
<bold>
<italic>k</italic>
</bold>
<bold>-mers/tokens in the human genome (24 chromosomes) as a function of</bold>
<bold>
<italic>k</italic>
</bold>
<bold> (in log-log scale).</bold>
Circles (o) show the results for all finished base pairs, and crosses (x) show the results for RepeatMasker-filtered sequences. Pluses (+) show the results when unfinished sequences (234 Mbase) are counted as non-singletons.</p>
</caption>
<graphic xlink:href="1471-2105-15-2-1"></graphic>
</fig>
<p>Visual inspection of the trend suggests fitting the data with a piecewise power-law function. We fit the points in the
<italic>k</italic>
=20-80 and
<italic>k</italic>
=200-1000 ranges separately by linear regressions in the log-log scale: log10
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
=
<italic>a</italic>
+
<italic>b</italic>
log10
<italic>k</italic>
(or log
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
∼ log
<italic>k</italic>
). The fitted
<inline-formula>
<mml:math id="M1" name="1471-2105-15-2-i1" overflow="scroll">
<mml:mo>(</mml:mo>
<mml:mi>â</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>^</mml:mo>
</mml:mover>
<mml:mo>)</mml:mo>
</mml:math>
</inline-formula>
is (1.58366, -1.5478) and (-0.4371, -0.5495) for the two segments, equivalent to
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
=38.34/
<italic>k</italic>
<sup>1.548</sup>
and
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
=0.365/
<italic>k</italic>
<sup>0.55</sup>
. The steep decay in the first segment shows a stronger increase of the amount of uniquely mappable sequences with read length, which implies that obtaining read lengths of at least around 100 is more cost-efficient with respect to reducing the amount of non-mappable reads. Of course, longer reads have extra benefits such as more robust alignments in the presence of polymorphisms or the ability to determine the length of longer repeat polymorphisms. The power-law function also indicates that the reduction of non-specific, difficult-to-align reads with longer read length is not linear.</p>
<p>If we assume our fitting function can be extrapolated to larger
<italic>k</italic>
’s for which a direct analysis of
<italic>k</italic>
-mer frequencies is restricted by computational constraints, the proportion of non-singleton reads can be predicted. For example, this leads to the prediction of a 0.2% non-singleton rate at the 10kb read length.</p>
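<p>The fitted coefficients quoted above can be turned into a small calculator, sketched below. The function and constant names are hypothetical. The sketch assumes that the fitted p_ns is a fraction rather than a percentage, which is what the quoted values at k=1000 suggest, and the 10 kb value is an extrapolation outside the fitted range.</p>

```python
def p_ns_fit(k, a, b):
    """Power-law fit log10(p_ns) = a + b * log10(k), returned as a fraction."""
    return 10 ** a * k ** b

SHORT = (1.58366, -1.5478)  # coefficients reported for the k = 20-80 segment
LONG = (-0.4371, -0.5495)   # coefficients reported for the k = 200-1000 segment

for k in (50, 80):
    print(k, round(100 * p_ns_fit(k, *SHORT), 2), "% (short-k segment)")
for k in (500, 1000, 10000):
    print(k, round(100 * p_ns_fit(k, *LONG), 2), "% (long-k segment)")
```

<p>Comparing the printed values with the observed percentages listed earlier gives a quick sense of how closely each segment follows its fitted line.</p>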
<p>It is known that repetitive sequences create a considerable obstacle in NGS alignment [
<xref ref-type="bibr" rid="B48">48</xref>
]. Though TE’s may exhibit subtle correlation with functional units in the genome [
<xref ref-type="bibr" rid="B49">49</xref>
], it is generally assumed that their biological role is indirect. Accordingly, we also looked at the non-singleton
<italic>k</italic>
-mer percentages in RepeatMasker filtered sequences (Figure
<xref ref-type="fig" rid="F1">1</xref>
). As expected, the percentage of uniquely mappable sequence is much higher than in the all-inclusive sequence for short
<italic>k</italic>
-mers (e.g.
<italic>k</italic>
<100). Interestingly, the differences between the two disappear for longer
<italic>k</italic>
-mers (e.g.
<italic>k</italic>
=500). A note of caution is that 89% of these RepeatMasker-filtered subsequences are shorter than 1kb, making the statistics less reliable at longer
<italic>k</italic>
’s.</p>
</sec>
<sec>
<title>Maximum
<italic>k</italic>
-mer frequency decreases with
<italic>k</italic>
slowly</title>
<p>Another measure of the level of redundancy at length scale
<italic>k</italic>
is the maximum frequency (max(
<italic>f</italic>
)) of
<italic>k</italic>
-mer types. For example, base A/T homopolymers of length 20 appear most often with 898,647 copies; at
<italic>k</italic>
=400, AT repeats have a higher copy number (
<italic>f</italic>
=150) than other 400-mers; the max(
<italic>f</italic>
) for
<italic>k</italic>
=1000 is equal to 24 for a sequence which is not filtered by the RepeatMasker. The max(
<italic>f</italic>
) as a function of
<italic>k</italic>
is shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
in log-log scale.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Maximum frequencies of</bold>
<bold>
<italic>k</italic>
</bold>
<bold>-mers as a function of</bold>
<bold>
<italic>k</italic>
</bold>
<bold> (in log-log scale).</bold>
Circles (o) show the results for all finished bases, and crosses (x) show the results for RepeatMasker-filtered bases.</p>
</caption>
<graphic xlink:href="1471-2105-15-2-2"></graphic>
</fig>
<p>For RepeatMasker-filtered sequences, max(
<italic>f</italic>
) quickly decays below 100 and then falls only slowly, indicating that RepeatMasker usually finds shorter repeats. At
<italic>k</italic>
= 200–500, the
<italic>k</italic>
-mer with the max(
<italic>f</italic>
) ∼ 50 is a low-complexity sequence, with internal repeats of GGGGGGAACAGCGACAC/GTGTCCGCTGTTCCCCCC. Despite its high prevalence, this low-complexity sequence is not masked by RepeatMasker in the human reference genome.</p>
<p>Fitting the linear regression model log10 max(
<italic>f</italic>
)=
<italic>a</italic>
+
<italic>b</italic>
log10
<italic>k</italic>
(or log max(
<italic>f</italic>
)∼ log
<italic>k</italic>
) leads to (
<italic>a</italic>
,
<italic>b</italic>
)= (8.99, -2.62). Extrapolating this regression to longer
<italic>k</italic>
’s predicts that at
<italic>k</italic>
=2724, max(
<italic>f</italic>
) = 1. This prediction should be viewed with caution as max(
<italic>f</italic>
) is mainly determined by “outlier” events and is thus not reproducible in principle; moreover, the linear function in Figure
<xref ref-type="fig" rid="F2">2</xref>
does not fit the data perfectly. Any extrapolation, exemplified by both Figure
<xref ref-type="fig" rid="F1">1</xref>
and Figure
<xref ref-type="fig" rid="F2">2</xref>
, is based on the assumption that the function fitted in the observed range continues to hold outside that range. There is no guarantee that this assumption is true in the present case.</p>
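<p>The extrapolated crossing point follows from setting the fitted line to zero, as in this short sketch (the function name is hypothetical). With the rounded coefficients quoted in the text the result is roughly 2700, close to but not identical with the reported 2724, presumably because the reported value was obtained from unrounded coefficients.</p>

```python
def k_at_max_f_one(a, b):
    """Solve a + b * log10(k) = 0, i.e. max(f) = 1, for k."""
    return 10 ** (-a / b)

k_star = k_at_max_f_one(8.99, -2.62)  # rounded coefficients from the text
print(round(k_star))
```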
</sec>
<sec>
<title>Frequency distributions at fixed k values exhibit power-law-like trend</title>
<p>The frequency distribution (FD) describes the distribution of
<italic>k</italic>
-mer types according to their copy numbers in the genome. When plotted in log-log scale, low-frequency
<italic>k</italic>
-mer types and the less redundant portion of the sequence are highlighted. Figure
<xref ref-type="fig" rid="F3">3</xref>
shows five FDs at
<italic>k</italic>
=30, 50, 150, 500, and 1000 in log-log scale. The FDs at
<italic>k</italic>
=30 and 50 span a wider frequency range, and the power-law trend is obvious.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Frequency distributions of</bold>
<bold>
<italic>k</italic>
</bold>
<bold>-mers at</bold>
<bold>
<italic>k</italic>
</bold>
<bold> = 30, 50, 150, 500, and 1000 (in log-log scale).</bold>
The distributions for
<italic>k</italic>
-mers in repeat-filtered sequences at
<italic>k</italic>
=50, 150, 500 are shown in grey lines.</p>
</caption>
<graphic xlink:href="1471-2105-15-2-3"></graphic>
</fig>
<p>A similar FD for
<italic>k</italic>
=40 in the human genome was shown in [
<xref ref-type="bibr" rid="B50">50</xref>
,
<xref ref-type="bibr" rid="B51">51</xref>
], and a slope of -2.3 in linear regression (in log-log scale) in the
<italic>f</italic>
= 3–500 range was reported. When we fit the
<italic>k</italic>
=50 FD by linear regression in log-log scale, a very similar fitting slope value is obtained (-2.38, for
<italic>f</italic>
= 3-200). However, it is clear from Figure
<xref ref-type="fig" rid="F3">3</xref>
that the slopes are steeper for
<italic>k</italic>
=150 (-2.7 for
<italic>f</italic>
= 2-100),
<italic>k</italic>
=500 (-3.5 for
<italic>f</italic>
= 2-40), and
<italic>k</italic>
=1000 (-5.3 for
<italic>f</italic>
= 2–19, or -5.9 from
<italic>f</italic>
= 2-9), indicating that the slope is not a universal parameter.</p>
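<p>The slope estimates above depend on the frequency window used in the regression. The following sketch fits a log-log slope over a chosen window, using a synthetic frequency distribution that follows an exact power law. The function name and the synthetic data are hypothetical and serve only to show the mechanics.</p>

```python
import math

def loglog_slope(fd, f_min, f_max):
    """Slope of log10(count) against log10(f) for integer f in [f_min, f_max]."""
    pts = [(math.log10(f), math.log10(n)) for f, n in fd.items()
           if f in range(f_min, f_max + 1) and n > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return sxy / sxx

# Synthetic FD with an exact exponent of -2.38 (illustration only).
synthetic_fd = {f: 1e9 * f ** -2.38 for f in range(1, 501)}
slope = loglog_slope(synthetic_fd, 3, 200)
print(round(slope, 2))
```

<p>On real data the fitted slope changes with the window, which is why each slope quoted above is reported together with its range of f.</p>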
<p>From the short read alignment perspective, the long tail at the high copy-numbers shows that many sequences cannot be uniquely mapped at smaller
<italic>k</italic>
values (e.g.
<italic>k</italic>
=30, 50). However, the tail is much shortened at
<italic>k</italic>
=1000. As expected, the tails for RepeatMasker-filtered sequences at various
<italic>k</italic>
values are much shorter (Figure
<xref ref-type="fig" rid="F3">3</xref>
, grey lines).</p>
</sec>
<sec>
<title>Rank-frequency distributions at fixed k values mostly follow a concave curve in log-log scale</title>
<p>Although a rank-frequency distribution (RFD) can be converted to cumulative FD [
<xref ref-type="bibr" rid="B42">42</xref>
], in log-log scale, it zooms into the high-frequency tail of the frequency distribution. Figure
<xref ref-type="fig" rid="F4">4</xref>
shows five RFDs at
<italic>k</italic>
’s from 30 to 1000. While the RFD at
<italic>k</italic>
=30 may maintain a power-law or piecewise power-law trend, those at larger
<italic>k</italic>
values become more concave. This concave Zipf’s curve is commonly observed in city size distributions [
<xref ref-type="bibr" rid="B52">52</xref>
,
<xref ref-type="bibr" rid="B53">53</xref>
].</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Rank-frequency distributions for</bold>
<bold>
<italic>k</italic>
</bold>
<bold>-mers at</bold>
<bold>
<italic>k</italic>
</bold>
<bold> = 30, 50, 150, 500, and 1000 (in log-log scale).</bold>
The corresponding rank-frequency distributions for RepeatMasker-filtered sequences at
<italic>k</italic>
=30, 50, 150, 500 are shown in grey lines.</p>
</caption>
<graphic xlink:href="1471-2105-15-2-4"></graphic>
</fig>
<p>For RFDs deviating from the Zipf’s law, functions with two parameters may be used to account for the concave or convex shape of the curve in log-log scale [
<xref ref-type="bibr" rid="B42">42</xref>
]. We found that the quadratic logarithmic function, but not the Weibull function, fits the RFDs well (Figure
<xref ref-type="fig" rid="F5">5</xref>
). The Beta rank function usually exhibits “S” shapes [
<xref ref-type="bibr" rid="B45">45</xref>
], whereas the RFD in Figure
<xref ref-type="fig" rid="F4">4</xref>
shows a “Z” shape. This motivated us to use a novel reverse Beta function to fit the data (Figure
<xref ref-type="fig" rid="F5">5</xref>
). The “Z” shaped log-log RFD means that if the power-law function is the default functional relationship between frequency and rank, frequencies of the intermediately-ranked
<italic>k</italic>
-mers decrease faster than the two tails. The “S” shaped log-log RFD implies the opposite.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>Fitting rank-frequency distribution of</bold>
<bold>
<italic>k</italic>
</bold>
<bold>-mers at k = 30, 50, 150, 500 using three functions.</bold>
Red: quadratic logarithmic (log
<italic>f</italic>
∼ log(
<italic>r</italic>
)+(log(
<italic>r</italic>
))
<sup>2</sup>
,
<italic>f</italic>
: frequency of a
<italic>k</italic>
-mer type,
<italic>r</italic>
: rank of a
<italic>k</italic>
-mer type, and the ∼ symbol represents linear regression); blue: reverse Beta rank function (log(
<italic>r</italic>
)∼ log(
<italic>f</italic>
)+ log(max(
<italic>f</italic>
)+1-
<italic>f</italic>
)); Orange: Weibull function (log(
<italic>f</italic>
)∼ log(log((max(
<italic>r</italic>
)+1)/
<italic>r</italic>
))).</p>
</caption>
<graphic xlink:href="1471-2105-15-2-5"></graphic>
</fig>
</sec>
<sec>
<title>Mapping
<bold>
<italic>f≥10</italic>
</bold>
1000-mers to the reference genome</title>
<p>For
<italic>k</italic>
=1000, there are 6107
<italic>k</italic>
-mer types with frequency
<italic>f</italic>
greater than or equal to 10. Because these are overlapping
<italic>k</italic>
-mers, they are mapped to only 172 chromosomal regions, each of a few kb (the 172 locations, number of high-frequency 1000-mers, and the distance from the left-neighboring chromosome regions are included in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S1).</p>
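<p>The collapse of thousands of overlapping 1000-mers into a few regions can be sketched as an interval merge. The text does not state how the merging was done, so the function name, the half-open coordinates and the optional gap parameter below are assumptions for illustration.</p>

```python
def merge_into_regions(starts, k, max_gap=0):
    """Merge k-mer start positions on one chromosome into covering regions.

    Each k-mer covers the half-open interval [start, start + k). Intervals
    that overlap, touch, or lie within max_gap bases are merged.
    """
    regions = []
    for s in sorted(starts):
        end = s + k
        if regions and regions[-1][1] + max_gap >= s:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([s, end])
    return [tuple(r) for r in regions]

# Hypothetical start positions of high-frequency 1000-mers.
regions = merge_into_regions([0, 1, 2, 5000, 5001], k=1000)
print(regions)
```

<p>Consecutive k-mers that are shifted by one base share k-1 bases, so a run of n such k-mers yields a single region of length k+n-1.</p>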
<p>A total of 70 out of these 172 regions (or about 41%) are clustered in four larger stretches on chromosomes 1 and X and contain long tandem repeats (60 and 70 kbases on chromosome 1q21.1 and 1q21.2, and 41 and 56 kbases on Xq23 and Xq24). The two stretches on chromosome 1 contain copies of the neuroblastoma breakpoint family genes (
<italic>NBPF</italic>
) [
<xref ref-type="bibr" rid="B54">54</xref>
-
<xref ref-type="bibr" rid="B56">56</xref>
]. The Xq24 region contains cancer/testis antigen family genes (
<italic>CT47A</italic>
) [
<xref ref-type="bibr" rid="B57">57</xref>
,
<xref ref-type="bibr" rid="B58">58</xref>
], whereas the Xq23 region has no genes, but contains the macrosatellite
<italic>DXZ4</italic>
[
<xref ref-type="bibr" rid="B59">59</xref>
-
<xref ref-type="bibr" rid="B61">61</xref>
] which exhibits periodic appearance of other functional elements, such as H3K27Ac or H3K4me2 [
<xref ref-type="bibr" rid="B62">62</xref>
] histone modification marks.</p>
<p>Besides these long stretches, 39 out of 172 regions (or 23%) overlap with 34 genes:
<italic>ZNF3850, EPHA3, COL6A6, CD38, KCNIP4, FRAS1, ANTXR2, HSD17B11, FAM190A, DKK2, FBXL7, AK123816, FAM153A, FAM65B, LAMA2, MYCT1, NOD1, TPST1, PSD3, KCNB2, NR4A3, C9orf171, CACNA1B, DLG2, CCDC67, UACA, HOMER2, SMG1, CDH13, PRKCA, LILRA2, TTC28, MTMR8</italic>
, and
<italic>SLC25A43</italic>
. Obtaining high quality data on genetic variants in these genes is therefore likely to remain a challenge even with longer reads.</p>
<p>The distribution of transposable elements in the 172 regions is analyzed using the
<italic>Dfam</italic>
database. Interestingly, the 1q21.1, 1q21.2, and Xq23 regions discussed above do not overlap with any transposable elements. The Xq24 region contains a subfamily of Alu, AluSc8 (length ∼ 304, with mismatch-included copy number in the human genome ∼ 24000). Outside the four long stretches, however, almost all regions overlap with LINE-1 retrotransposons [
<xref ref-type="bibr" rid="B63">63</xref>
] (98/102, or 96%; 98/172, or 57%). Among these, the dominant LINE-1 subfamily is L1P1_orf2 (84/102, or 82%; 84/172 or 49%). The length of L1P1_orf2 is roughly 2174, and its mismatch-included copy number in the human genome is more than 16000.</p>
<p>Other LINE-1 subfamilies overlapping these regions include L1P1_5end, L1M2_5end, L1PA2_3end, and L1ME3G_3end. Three regions also overlap with a DNA transposon, Tigger3d. All transposable element information for these regions is listed in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S1. Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S1 also lists tandem repeats, such as TG-, AC-, or TTTA-repeats. Unlike transposable elements, these tandem repeats comprise a very small proportion of the regions.</p>
<p>The Segmental Duplications Track in the Genome Browser provides repeat information that is different from the transposable elements. These repeats are usually large (> 1-15kb), and the information is obtained either from whole-genome shotgun sequencing reads, independently of the reference genome, or from the reference genome itself by self-alignment. We have listed the overlaps between our 172 regions and those in the Segmental Duplications Track in Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S1. Reassuringly, the four large regions on chromosomes 1 and X overlap with the previously identified segmentally duplicated regions, even though the methodologies of the two approaches are very different.</p>
<p>By inspecting Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
: Table S1, it can be seen that the 172 regions either contain LINE transposable elements or overlap with the segmental duplication track. The large stretch on Xq24 overlaps with both the segmental duplication track and transposable elements. However, the transposable element it contains is the Alu element, which is a SINE rather than a LINE. Possible connections between segmental duplication and Alu elements have been discussed before [
<xref ref-type="bibr" rid="B64">64</xref>
], and it is possible that the Alu element appeared in this region before the onset of duplication.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<sec>
<title>Long
<italic>k</italic>
-mers in the reference genome as surrogate for sequencing reads</title>
<p>The
<italic>k</italic>
-mer distribution has many applications in sequence analysis, such as measuring similarity between two genomes [
<xref ref-type="bibr" rid="B65">65</xref>
], correcting sequencing error [
<xref ref-type="bibr" rid="B66">66</xref>
], finding repeat structures [
<xref ref-type="bibr" rid="B67">67</xref>
], and determining the feasibility of gene patents [
<xref ref-type="bibr" rid="B68">68</xref>
]. In many applications, only short
<italic>k</italic>
-mers are considered to be relevant, such as
<italic>k</italic>
=6 [
<xref ref-type="bibr" rid="B69">69</xref>
],
<italic>k</italic>
≤7 [
<xref ref-type="bibr" rid="B70">70</xref>
],
<italic>k</italic>
=8 [
<xref ref-type="bibr" rid="B71">71</xref>
],
<italic>k</italic>
=11 [
<xref ref-type="bibr" rid="B72">72</xref>
]. This paper essentially uses long
<italic>k</italic>
-mers taken from the reference genome as a surrogate for reads from future NGS technologies. Computationally speaking, counting long
<italic>k</italic>
-mers is more challenging and we are not aware of any prior publications on the long
<italic>k</italic>
-mer distributions in the human genome for
<italic>k</italic>
as long as 1000.</p>
<p>As compared to other papers on mappability of genome sequencing reads [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
], our more theoretical approach has the advantage of being able to discuss long reads (e.g.
<italic>k</italic>
=1000) where such data is not available from the current NGS technology. Our approach also separates the two causes of poor mappability: one due to the unfinished sequence in the reference genome and another due to the redundancy in the finished sequences. The unfinished bases are mainly located in the centromeres, short arms of acrocentric chromosomes and other heterochromatic regions, and are rich in repetitive sequences. If we always treat these unfinished sequences (234 Mbases in total) as non-singletons regardless of
<italic>k</italic>
,
<italic>p</italic>
<sub>
<italic>ns</italic>
</sub>
would flatten out around 0.1 (see Figure
<xref ref-type="fig" rid="F1">1</xref>
).</p>
</sec>
<sec>
<title>A baseline knowledge of redundancy of the human genome at length
<italic>k</italic>
level</title>
<p>Figures
<xref ref-type="fig" rid="F1">1</xref>
,
<xref ref-type="fig" rid="F2">2</xref>
and
<xref ref-type="fig" rid="F3">3</xref>
provide a baseline knowledge of the redundancy of the human genome at the
<italic>k</italic>
-mer level. Our results give a quantitative description of the effect of read length
<italic>k</italic>
on the mappability of reads from the finished region of the human genome.</p>
<p>Reference assembly is easier than
<italic>de novo</italic>
assembly, and our approach does not directly apply to
<italic>de novo</italic>
sequencing “assemblability”. However, mappability and assemblability are closely related, as repetitive sequences cause problems in both situations [
<xref ref-type="bibr" rid="B73">73</xref>
]. The current
<italic>de novo</italic>
assemblies still do not perform consistently [
<xref ref-type="bibr" rid="B74">74</xref>
,
<xref ref-type="bibr" rid="B75">75</xref>
] and a quantitative assessment of the impact of repetitive sequences on reference assembly could be a useful piece of information for
<italic>de novo</italic>
assembly as well. Note that some discussion on
<italic>k</italic>
-mer-based assembly actually refers to
<italic>k</italic>
<sup>′</sup>
-mers (
<italic>k</italic>
<sup>′</sup>
≪
<italic>k</italic>
) [
<xref ref-type="bibr" rid="B76">76</xref>
,
<xref ref-type="bibr" rid="B77">77</xref>
].</p>
</sec>
<sec>
<title>Highly redundant regions at
<italic>k</italic>
= 1000 level and copy-number-variation regions</title>
<p>The chromosome 1 and X regions, which we have identified as containing 1000-mers with at least 10 copies, are discussed in the literature as regions with common copy-number variations (CNVs). CNVs in the 1q21.1 region, if not
<italic>NBPF</italic>
-specific, have been linked to congenital cardiac defects [
<xref ref-type="bibr" rid="B78">78</xref>
-
<xref ref-type="bibr" rid="B80">80</xref>
], autism [
<xref ref-type="bibr" rid="B81">81</xref>
,
<xref ref-type="bibr" rid="B82">82</xref>
], mental retardation [
<xref ref-type="bibr" rid="B83">83</xref>
], head size abnormalities [
<xref ref-type="bibr" rid="B84">84</xref>
], schizophrenia [
<xref ref-type="bibr" rid="B85">85</xref>
,
<xref ref-type="bibr" rid="B86">86</xref>
], and neuroblastoma [
<xref ref-type="bibr" rid="B87">87</xref>
]. With so many abnormalities mapped to this region, these are collectively called the chromosome 1q21.1 duplication syndrome in the Online Mendelian Inheritance in Man (OMIM 612475).</p>
<p>The Xq23 region, if not macrosatellite
<italic>DXZ4</italic>
specific, has been identified as a likely CNV region linked to developmental and behavioral problems [
<xref ref-type="bibr" rid="B88">88</xref>
]. Chromatin configuration at
<italic>DXZ4</italic>
region is reported to differ between male melanoma cells and normal skin cells [
<xref ref-type="bibr" rid="B89">89</xref>
]. The Xq24 and the
<italic>CT47A</italic>
gene are listed as a region of structural variants associated with intellectual disability [
<xref ref-type="bibr" rid="B90">90</xref>
] and mental retardation [
<xref ref-type="bibr" rid="B91">91</xref>
].</p>
<p>A well-known mechanism for CNV formation is non-allelic homologous recombination (NAHR) between repetitive elements [
<xref ref-type="bibr" rid="B92">92</xref>
]. More copies of a repetitive sequence provide more opportunities for NAHR to occur, resulting in a natural connection between repetitive sequences and CNV. The fact that simple counting of 1000-mer frequencies leads to CNV regions with medical implications indicates that understanding the
<italic>k</italic>
-mer distribution is an important part of genomic analyses.</p>
<p>Although the four highlighted large regions also appear in the Segmental Duplication track for >1000 bp RepeatMasker-filtered sequences in the UCSC Genome Browser, the two methodologies are somewhat different. Here, the reference genome is the starting point, the length scale is capped at 1000 bp, no mismatches are allowed, and only high copy numbers (≥10) are considered. In the SegDup track, the reference may or may not be used (in the latter case, raw reads are the starting point), the length scale has a lower limit of one or a few kb, mismatches are allowed, and low copy numbers (e.g. 2) are allowed. These differences suggest a strategy in which our approach is used to check the consistency of the reference genome with raw read data.</p>
</sec>
<sec>
<title>Discussion of extensions to next-generation-sequencing data</title>
<p>In a realistic setting of NGS, there are sequencing errors and single-nucleotide polymorphisms (SNP); alignment to the reference genome may allow mismatches; and paired-end/mate-pair strategies are widely adopted [
<xref ref-type="bibr" rid="B93">93</xref>
-
<xref ref-type="bibr" rid="B96">96</xref>
]. It is a daunting challenge to provide a definitive answer under these situations [
<xref ref-type="bibr" rid="B4">4</xref>
] for long
<italic>k</italic>
-mer lengths such as
<italic>k</italic>
=1000. Some concepts in this paper, e.g., the
<italic>k</italic>
-mer frequency distribution in Figure
<xref ref-type="fig" rid="F3">3</xref>
, cannot be used if mismatches are considered.</p>
<p>We can, however, speculate about some consequences when practical complications are introduced. Suppose a DNA fragment (of length
<italic>k</italic>
) is split into two ends (of length
<italic>k</italic>
<sup>′</sup>
≤
<italic>k</italic>
/2 each) which are to be sequenced, and an insert (of length
<italic>k</italic>
-2
<italic>k</italic>
<sup>′</sup>
). At
<italic>k</italic>
<sup>′</sup>
=
<italic>k</italic>
/2, one is essentially sequencing the whole DNA fragment, and aligning two
<italic>k</italic>
<sup>′</sup>
-mers next to each other is equivalent to aligning a 2
<italic>k</italic>
<sup>′</sup>
-mer. The result in Figure
<xref ref-type="fig" rid="F1">1</xref>
implies that the proportion of non-mappable reads/tokens decreases with
<italic>k</italic>
<sup>′</sup>
as 1/(2
<italic>k</italic>
<sup>′</sup>
)
<sup>
<italic>b</italic>
</sup>
. When
<italic>k</italic>
≫2
<italic>k</italic>
<sup>′</sup>
, aligning two paired-end
<italic>k</italic>
<sup>′</sup>
-mers is more likely to be unique than when the two
<italic>k</italic>
<sup>′</sup>
-mers are next to each other, as the correlation between two
<italic>k</italic>
<sup>′</sup>
-mers decreases with distance [
<xref ref-type="bibr" rid="B97">97</xref>
]. We may speculate that the proportion of non-uniquely-mapped reads as a function of
<italic>k</italic>
<sup>′</sup>
and
<italic>k</italic>
is: ∼
<italic>f</italic>
(
<italic>k</italic>
-2
<italic>k</italic>
<sup>′</sup>
)/(2
<italic>k</italic>
<sup>′</sup>
)
<sup>
<italic>b</italic>
</sup>
, where the unknown function
<italic>f</italic>
(
<italic>k</italic>
-2
<italic>k</italic>
<sup>′</sup>
) is 1 if
<italic>k</italic>
=2
<italic>k</italic>
<sup>′</sup>
, and decreases with
<italic>k</italic>
-2
<italic>k</italic>
<sup>′</sup>
.</p>
<p>There have been recent attempts to fill in the sequence of inserts between two ends in the paired-end strategy [
<xref ref-type="bibr" rid="B98">98</xref>
-
<xref ref-type="bibr" rid="B101">101</xref>
]. A typical example would consider a segment length
<italic>k</italic>
of 600-800 bp, and read length
<italic>k</italic>
<sup>′</sup>
of 100 bp [
<xref ref-type="bibr" rid="B101">101</xref>
]. We can then consider the best-case scenario, in which the sequence of the whole segment of length
<italic>k</italic>
can be determined. This will merely shift the length scale from twice the read length (2
<italic>k</italic>
<sup>′</sup>
) to the segment length (
<italic>k</italic>
), and all our results still apply.</p>
<p>The effects of sequencing errors, single-nucleotide polymorphisms, and alignments that allow mismatches can be discussed in the framework of
<italic>k</italic>
-mer space (with reverse complement). The observed
<italic>k</italic>
-mers in the human genome form a subspace of the
<italic>k</italic>
-mer space, and a link between two
<italic>k</italic>
-mers is established when the Hamming distance between the two is 1. Sequencing errors and polymorphisms either generate a new
<italic>k</italic>
-mer in this subspace, or move along a link to a previously existing
<italic>k</italic>
-mer. If new
<italic>k</italic>
-mers are generated, links between
<italic>k</italic>
-mers will be recalculated. One can argue that sequencing error and polymorphism would have less impact if the error/mutation does not lead to the creation of a new
<italic>k</italic>
-mer, or, even when a new
<italic>k</italic>
-mer is created, if the new
<italic>k</italic>
-mer does not have new links to other
<italic>k</italic>
-mers. In the case where sequencing errors and polymorphisms generate two or more mutations, links between
<italic>k</italic>
-mers with both 1- and 2-Hamming distances should be considered. The framework of discussion is similar, though more complicated.</p>
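<p>The link structure described above can be sketched on a toy set of k-mers. All names below are hypothetical, reverse complements are ignored, and the all-pairs comparison is only practical for small examples, so this illustrates the idea rather than a method used in this paper.</p>

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def hamming_links(kmers, d=1):
    """All unordered pairs of distinct observed k-mers at Hamming distance d."""
    unique = sorted(set(kmers))
    return [(a, b) for a, b in combinations(unique, 2) if hamming(a, b) == d]

def classify_substitution(observed, mutated):
    """Report whether a mutated k-mer already exists or is a new k-mer."""
    return "existing" if mutated in set(observed) else "new"

observed = ["ACGT", "ACGA", "TCGA", "GGGG"]  # toy observed k-mer subspace
links = hamming_links(observed)
print(links)
print(classify_substitution(observed, "ACGA"))  # substitution onto an observed k-mer
print(classify_substitution(observed, "ACGC"))  # substitution creating a new k-mer
```

<p>Calling hamming_links with d=2 gives the additional links that would matter when two mutations per k-mer are allowed.</p>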
</sec>
<sec>
<title>Long-tails and the regime of diminishing return of longer reads</title>
<p>Our analysis shows that all distributions discussed in this paper are better viewed in log-log scale, indicating the presence of power-law-like distributions or long tails. This has been observed in the past for other genomic distributions, such as correlation function [
<xref ref-type="bibr" rid="B97">97</xref>
,
<xref ref-type="bibr" rid="B102">102</xref>
-
<xref ref-type="bibr" rid="B104">104</xref>
], power spectrum of base composition [
<xref ref-type="bibr" rid="B105">105</xref>
-
<xref ref-type="bibr" rid="B108">108</xref>
], frequency distribution of gene or protein family size [
<xref ref-type="bibr" rid="B109">109</xref>
-
<xref ref-type="bibr" rid="B112">112</xref>
], sizes of ultraconserved regions [
<xref ref-type="bibr" rid="B113">113</xref>
], and in models with duplications [
<xref ref-type="bibr" rid="B114">114</xref>
-
<xref ref-type="bibr" rid="B117">117</xref>
]. Ongoing duplications increase the copy number geometrically, which explains the presence of long-tails.</p>
<p>A consequence of the long-tail in Figure
<xref ref-type="fig" rid="F1">1</xref>
is that with increasing read (or
<italic>k</italic>
-mer) lengths, the proportion of reads that cannot be mapped to a unique genomic region (within the finished sequences) decreases as a power-law function, as compared to a linear or exponential function. Numerically, if not economically, this defines a regime of diminishing return. It is important to emphasize that we have only directly observed a diminishing return in the range of 200-1000 bp. This diminishing return may extend beyond 1kb, until it reaches a point of accelerating return if the read length is longer than the size of any segmental duplication region (which can be 200kb for gene-containing duplications [
<xref ref-type="bibr" rid="B118">118</xref>
]). The use of paired-end strategy usually does not increase the length scale by orders of magnitude, thus it may still be confined to the diminishing return regime. To assess the economic return with NGS technology with longer reads, other factors should be considered, such as the choice of less redundant target regions such as the exome [
<xref ref-type="bibr" rid="B119">119</xref>
], read length and sequencing error tradeoff, and the overall cost of longer-read sequencing.</p>
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusion</title>
<p>We have established that, up to 1000 bases, the proportion of reads that cannot be uniquely mapped decreases more slowly than linearly with read length, when non-mappability is measured as the proportion of non-singletons in the human reference genome. The slow decrease is similar to other observed long-tail distributions in genomics. Anticipating that the highest-quality human genome sequences will be obtained by a combination of various technologies, the analysis of
<italic>k</italic>
-mer distribution at different scales is a prominent factor for determining how these technologies can be optimally combined. We also identified the most redundant 1000-mers in the human genome, which include the region responsible for the chromosome 1q21.1 duplication syndrome, as well as other regions which are rich in segmental duplication and macrosatellites.</p>
</sec>
<sec>
<title>Availability of support data</title>
<p>The data set supporting the results of this article is included within the article and its additional file.</p>
</sec>
<sec>
<title>Abbreviations</title>
<p>BLAT: BLAST-like alignment tool; BLAST: Basic local alignment search tool; CPU: Central processing unit; CNV: Copy number variations; DSK: Disk streaming of k-mers; DNA: Deoxyribonucleic acid; FD: Frequency distribution; GRCh37: Genome reference consortium human (build) 37; LINE: Long interspersed elements; NAHR: Non-allelic homologous recombinations; NGS: Next-generation sequencing; OMIM: Online Mendelian Inheritance in Man; RAM: Random-access memory; RFD: Rank-frequency distribution; SINE: Short interspersed elements; SNP: Single nucleotide polymorphism; TE: Transposable elements; UCSC: University of California at Santa Cruz.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>WL conceived the study and contributed to the analysis of the data. JF carried out the mapping of redundant 1000-mers to the reference genome. PM carried out the fitting of the rank-frequency distribution. WL, JF, and PM contributed to drafting the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>The additional file includes the supplementary Table S1: 172 chromosome locations with high-frequency (</bold>
<bold>
<italic>f</italic>
</bold>
<bold>
<italic></italic>
</bold>
<bold> 10) 1000-mers.</bold>
</p>
</caption>
<media xlink:href="1471-2105-15-2-S1.pdf">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>We thank Oliver Clay, Andrew Shih, Astero Provata, and Yannis Almirantis for discussions, and the authors of DSK for responding promptly to our inquiries and fixing bugs. WL acknowledges support from the Robert S Boas Center for Genomics and Human Genetics, and JF was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health under Award Number 1R03AR063340.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Rozowsky</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Euskirchen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Auerbach</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>ZD</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bjornson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Carriero</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<article-title>
<bold>PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls</bold>
</article-title>
<source>Nat Biotech</source>
<year>2009</year>
<volume>27</volume>
<fpage>66</fpage>
<lpage>75</lpage>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Cahill</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Köser</surname>
<given-names>CU</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>NE</given-names>
</name>
<name>
<surname>Archer</surname>
<given-names>JAC</given-names>
</name>
<article-title>
<bold>Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies</bold>
</article-title>
<source>PLoS ONE</source>
<year>2010</year>
<volume>5</volume>
<fpage>e11518</fpage>
<pub-id pub-id-type="pmid">20634954</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Koehler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Issac</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cloonan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Grimmond</surname>
<given-names>SM</given-names>
</name>
<article-title>
<bold>The uniqueome: a mappability resource for short-tag sequencing</bold>
</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>272</fpage>
<lpage>274</lpage>
<pub-id pub-id-type="pmid">21075741</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Derrien</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Marco Sola</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Knowles</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Raineri</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ribeca</surname>
<given-names>P</given-names>
</name>
<collab>Estellé J</collab>
<article-title>
<bold>Fast computation and applications of genome mappability</bold>
</article-title>
<source>PLoS ONE</source>
<year>2012</year>
<volume>7</volume>
<fpage>e30377</fpage>
<pub-id pub-id-type="pmid">22276185</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Schatz</surname>
<given-names>MC</given-names>
</name>
<article-title>
<bold>Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score</bold>
</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>2097</fpage>
<lpage>2105</lpage>
<pub-id pub-id-type="pmid">22668792</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Storvall</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Ramsköld</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sandberg</surname>
<given-names>R</given-names>
</name>
<article-title>
<bold>Efficient and comprehensive representation of uniqueness for next-generation sequencing by minimum unique length analyses</bold>
</article-title>
<source>PLoS ONE</source>
<year>2013</year>
<volume>8</volume>
<fpage>e53822</fpage>
<pub-id pub-id-type="pmid">23349747</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Weber</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<article-title>
<bold>Human whole-genome shotgun sequencing</bold>
</article-title>
<source>Genome Res</source>
<year>1997</year>
<volume>7</volume>
<fpage>401</fpage>
<lpage>409</lpage>
<pub-id pub-id-type="pmid">9149936</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Green</surname>
<given-names>ED</given-names>
</name>
<article-title>
<bold>Strategies for the systematic sequencing of complex genomes</bold>
</article-title>
<source>Nat Rev Genet</source>
<year>2001</year>
<volume>2</volume>
<fpage>573</fpage>
<lpage>583</lpage>
<pub-id pub-id-type="pmid">11483982</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Fraenkel</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Gillis</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Appendix II. Proof that sequences of A, C, G, and T can be assembled to produce chains of ultimate length avoiding repetitions everywhere</bold>
</article-title>
<source>Prog Nucl Acids Res Mol Biol</source>
<year>1966</year>
<volume>5</volume>
<fpage>343</fpage>
<lpage>348</lpage>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Stoppa-Lyonnet</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Meo</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tosi</surname>
<given-names>M</given-names>
</name>
<article-title>
<bold>Clusters of intragenic Alu repeats predispose the human C1 inhibitor locus to deleterious rearrangements</bold>
</article-title>
<source>Proc Natl Acad Sci</source>
<year>1990</year>
<volume>87</volume>
<fpage>1551</fpage>
<lpage>1555</lpage>
<pub-id pub-id-type="pmid">2154751</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Conrad</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Antonarakis</surname>
<given-names>SE</given-names>
</name>
<article-title>
<bold>Gene duplication: a drive for phenotypic diversity and cause of human disease</bold>
</article-title>
<source>Ann Rev Genomics Hum Genet</source>
<year>2007</year>
<volume>8</volume>
<fpage>17</fpage>
<lpage>35</lpage>
<pub-id pub-id-type="pmid">17386002</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="book">
<name>
<surname>Ohno</surname>
<given-names>S</given-names>
</name>
<source>Evolution by Gene Duplication</source>
<year>1970</year>
<publisher-name>New York: Springer-Verlag</publisher-name>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Nowak</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Cooke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Maynard Smith</surname>
<given-names>J</given-names>
</name>
<collab>Boerlijst</collab>
<article-title>
<bold>Evolution of genetic redundancy</bold>
</article-title>
<source>Nature</source>
<year>1997</year>
<volume>388</volume>
<fpage>167</fpage>
<lpage>171</lpage>
<pub-id pub-id-type="pmid">9217155</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Fortna</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>MacLaren</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Meltesen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Brenton</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hink</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Burgers</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hernandez-Boussard</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Karimpour-Fard</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Glueck</surname>
<given-names>D</given-names>
</name>
<name>
<surname>McGavran</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Berry</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Pollack</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sikela</surname>
<given-names>JM</given-names>
</name>
<article-title>
<bold>Lineage-specific gene duplication and loss in human and great ape evolution</bold>
</article-title>
<source>PLoS Biol</source>
<year>2004</year>
<volume>2</volume>
<fpage>E207</fpage>
<pub-id pub-id-type="pmid">15252450</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Krakauer</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Plotkin</surname>
<given-names>JB</given-names>
</name>
<article-title>
<bold>Redundancy, antiredundancy, and the robustness of genomes</bold>
</article-title>
<source>Proc Natl Acad Sci</source>
<year>2002</year>
<volume>99</volume>
<fpage>1405</fpage>
<lpage>1409</lpage>
<pub-id pub-id-type="pmid">11818563</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Marcotte</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Pellegrini</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yeates</surname>
<given-names>TO</given-names>
</name>
<name>
<surname>Eisenberg</surname>
<given-names>D</given-names>
</name>
<article-title>
<bold>A census of protein repeats</bold>
</article-title>
<source>J Mol Biol</source>
<year>1999</year>
<volume>293</volume>
<fpage>151</fpage>
<lpage>160</lpage>
<pub-id pub-id-type="pmid">10512723</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>N</given-names>
</name>
<name>
<surname>He</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Pong</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Law</surname>
<given-names>M</given-names>
</name>
<article-title>
<bold>Comparison of next-generation sequencing systems</bold>
</article-title>
<source>J Biomed Biotech</source>
<year>2012</year>
<volume>2012</volume>
<fpage>251364</fpage>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Eisenstein</surname>
<given-names>M</given-names>
</name>
<article-title>
<bold>Companies ‘going long’ generate sequencing buzz at Marco Island (news)</bold>
</article-title>
<source>Nat Biotech</source>
<year>2013</year>
<volume>31</volume>
<fpage>265</fpage>
<lpage>266</lpage>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Heiner</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ashby</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Underwood</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Greater than 10 kb read lengths routine when sequencing with Pacific Biosciences’ XL release</bold>
</article-title>
<source>J Biomol Tech</source>
<year>2013</year>
<volume>24(suppl)</volume>
<fpage>S43</fpage>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Brown</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>deSouza</surname>
<given-names>PV</given-names>
</name>
<name>
<surname>Mercer</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Pietra</surname>
<given-names>VJ</given-names>
</name>
<name>
<surname>Lao</surname>
<given-names>JC</given-names>
</name>
<article-title>
<bold>Class-based n-gram models of natural languages</bold>
</article-title>
<source>J Comp Linguist</source>
<year>1992</year>
<volume>18</volume>
<fpage>467</fpage>
<lpage>479</lpage>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="book">
<name>
<surname>Baayen</surname>
<given-names>RH</given-names>
</name>
<source>Word Frequency Distribution</source>
<year>2001</year>
<publisher-name>Dordrecht: Kluwer Academic Publishers</publisher-name>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="other">
<name>
<surname>Phoophakdee</surname>
<given-names>B</given-names>
</name>
<article-title>
<bold>TRELLIS: genome-size disk-based suffix tree indexing algorithm</bold>
</article-title>
<source>Ph.D. Thesis, Rensselaer Polytechnic Institute, Troy, NY</source>
<year>2007</year>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Phoophakdee</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zaki</surname>
<given-names>MJ</given-names>
</name>
<article-title>
<bold>TRELLIS+: an effective approach for indexing genome-scale sequences using suffix trees</bold>
</article-title>
<source>Pacif Sym Biocomp</source>
<year>2008</year>
<volume>2008</volume>
<fpage>90</fpage>
<lpage>101</lpage>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>Kristiansen</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<collab>Y SM</collab>
<article-title>
<bold>SOAP2: an improved ultrafast tool for short read alignment</bold>
</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1966</fpage>
<lpage>1967</lpage>
<pub-id pub-id-type="pmid">19497933</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Chu</surname>
<given-names>HT</given-names>
</name>
<name>
<surname>Hsiao</surname>
<given-names>WWL</given-names>
</name>
<name>
<surname>Tsao</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Kao</surname>
<given-names>CY</given-names>
</name>
<article-title>
<bold>SeqEntropy: genome-wide assessment of repeats for short read sequencing</bold>
</article-title>
<source>PLoS ONE</source>
<year>2013</year>
<volume>8</volume>
<fpage>e59484</fpage>
<pub-id pub-id-type="pmid">23544073</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Rizk</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lavenier</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Chikhi</surname>
<given-names>R</given-names>
</name>
<article-title>
<bold>DSK, k-mer counting with very low memory usage</bold>
</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>652</fpage>
<lpage>653</lpage>
<pub-id pub-id-type="pmid">23325618</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Kurtz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Narechania</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Ware</surname>
<given-names>D</given-names>
</name>
<article-title>
<bold>A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes</bold>
</article-title>
<source>BMC Genomics</source>
<year>2008</year>
<volume>9</volume>
<fpage>517</fpage>
<pub-id pub-id-type="pmid">18976482</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Marçais</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
<article-title>
<bold>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</bold>
</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>764</fpage>
<lpage>770</lpage>
<pub-id pub-id-type="pmid">21217122</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Melsted</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pritchard</surname>
<given-names>JK</given-names>
</name>
<article-title>
<bold>Efficient counting of k-mers in DNA sequences using a bloom filter</bold>
</article-title>
<source>BMC Bioinfo</source>
<year>2011</year>
<volume>12</volume>
<fpage>333</fpage>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="book">
<name>
<surname>Anderson</surname>
<given-names>C</given-names>
</name>
<source>The Long Tail: Why the Future of Business is Selling Less of More</source>
<year>2006</year>
<publisher-name>New York: Hyperion</publisher-name>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Clauset</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Shalizi</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Newman</surname>
<given-names>MEJ</given-names>
</name>
<article-title>
<bold>Power-law distributions in empirical data</bold>
</article-title>
<source>SIAM Rev</source>
<year>2009</year>
<volume>51</volume>
<fpage>661</fpage>
<lpage>703</lpage>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="book">
<name>
<surname>Zipf</surname>
<given-names>GK</given-names>
</name>
<source>Human Behavior and the Principle of Least Effort</source>
<year>1949</year>
<publisher-name>Addison-Wesley</publisher-name>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Sharp</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Locke</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>McGrath</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Bailey</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Vallente</surname>
<given-names>RU</given-names>
</name>
<name>
<surname>Pertz</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Segraves</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Oseroff</surname>
<given-names>VV</given-names>
</name>
<name>
<surname>Albertson</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Pinkel</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>
<bold>Segmental duplications and copy-number variation in the human genome</bold>
</article-title>
<source>Am J Hum Genet</source>
<year>2005</year>
<volume>77</volume>
<fpage>78</fpage>
<lpage>88</lpage>
<pub-id pub-id-type="pmid">15918152</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Perry</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Tchinda</surname>
<given-names>J</given-names>
</name>
<name>
<surname>McGrath</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Picker</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Cáceres</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Iafrate</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Tyler-Smith</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Stone</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<article-title>
<bold>Hotspots for copy number variation in chimpanzees and humans</bold>
</article-title>
<source>Proc Natl Acad Sci</source>
<year>2006</year>
<volume>103</volume>
<fpage>8006</fpage>
<lpage>8011</lpage>
<pub-id pub-id-type="pmid">16702545</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Genovese</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Altemose</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lindgren</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Chambert</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Pasaniuc</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Reich</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Morton</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Pollak</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>McCarroll</surname>
<given-names>SA</given-names>
</name>
<article-title>
<bold>Using population admixture to help complete maps of the human genome</bold>
</article-title>
<source>Nat Genet</source>
<year>2013</year>
<volume>45</volume>
<fpage>406</fpage>
<lpage>414</lpage>
<pub-id pub-id-type="pmid">23435088</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<name>
<surname>Wheeler</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Clements</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Hubley</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Jurka</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Smit</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Finn</surname>
<given-names>RD</given-names>
</name>
<article-title>
<bold>Dfam: a database of repetitive DNA based on profile hidden Markov models</bold>
</article-title>
<source>Nucleic Acids Res</source>
<year>2013</year>
<volume>41</volume>
<fpage>D70</fpage>
<lpage>D82</lpage>
<pub-id pub-id-type="pmid">23203985</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Benson</surname>
<given-names>G</given-names>
</name>
<article-title>
<bold>Tandem repeats finder: a program to analyze DNA sequences</bold>
</article-title>
<source>Nucleic Acids Res</source>
<year>1999</year>
<volume>27</volume>
<fpage>573</fpage>
<lpage>580</lpage>
<pub-id pub-id-type="pmid">9862982</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Bailey</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Reinert</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Samonte</surname>
<given-names>RV</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>
<bold>Recent segmental duplications in the human genome</bold>
</article-title>
<source>Science</source>
<year>2002</year>
<volume>297</volume>
<fpage>1003</fpage>
<lpage>1007</lpage>
<pub-id pub-id-type="pmid">12169732</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Gish</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
<article-title>
<bold>Basic local alignment search tool</bold>
</article-title>
<source>J Mol Biol</source>
<year>1990</year>
<volume>215</volume>
<fpage>403</fpage>
<lpage>410</lpage>
<pub-id pub-id-type="pmid">2231712</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Bailey</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Yavor</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Massa</surname>
<given-names>HF</given-names>
</name>
<name>
<surname>Trask</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>
<bold>Segmental duplications: organization and impact within the current human genome project assembly</bold>
</article-title>
<source>Genome Res</source>
<year>2001</year>
<volume>11</volume>
<fpage>1005</fpage>
<lpage>1017</lpage>
<pub-id pub-id-type="pmid">11381028</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Cheung</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Estivill</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Khaja</surname>
<given-names>R</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tsui</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<article-title>
<bold>Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence</bold>
</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>4</volume>
<fpage>R25</fpage>
<pub-id pub-id-type="pmid">12702206</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miramontes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cocho</surname>
<given-names>G</given-names>
</name>
<article-title>
<bold>Fitting ranked linguistic data with two-parameter functions</bold>
</article-title>
<source>Entropy</source>
<year>2010</year>
<volume>12</volume>
<fpage>1743</fpage>
<lpage>1764</lpage>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Miramontes</surname>
<given-names>P</given-names>
</name>
<article-title>
<bold>Fitting ranked English and Spanish letter frequency distribution in US and Mexican presidential speeches</bold>
</article-title>
<source>J Quant Linguist</source>
<year>2011</year>
<volume>18</volume>
<fpage>337</fpage>
<lpage>358</lpage>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Mansilla</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Köppen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Cocho</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Miramontes</surname>
<given-names>P</given-names>
</name>
<article-title>
<bold>On the behavior of journal impact factor rank-order distribution</bold>
</article-title>
<source>J Informetrics</source>
<year>2007</year>
<volume>1</volume>
<fpage>155</fpage>
<lpage>160</lpage>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<name>
<surname>Martínez-Mekler</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Alvarez Martínez</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Beltrán del Río</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mansilla</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miramontes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cocho</surname>
<given-names>G</given-names>
</name>
<article-title>
<bold>Universality of rank-ordering distributions in the arts and sciences</bold>
</article-title>
<source>PLoS ONE</source>
<year>2009</year>
<volume>4</volume>
<fpage>e4791</fpage>
<pub-id pub-id-type="pmid">19277122</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="other">
<name>
<surname>Miramontes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Cocho</surname>
<given-names>G</given-names>
</name>
<article-title>
<bold>Some critical support for power laws and their variations</bold>
</article-title>
<comment>arXiv preprint. arXiv:nlin.AO/1204.3124, 2012</comment>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<name>
<surname>Haubold</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Pierstorff</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Möller</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Wiehe</surname>
<given-names>T</given-names>
</name>
<article-title>
<bold>Genome comparison without alignment using shortest unique substrings</bold>
</article-title>
<source>BMC Bioinfo</source>
<year>2005</year>
<volume>6</volume>
<fpage>123</fpage>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="journal">
<name>
<surname>Treangen</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>
<bold>Repetitive DNA and next-generation sequencing: computational challenges and solutions</bold>
</article-title>
<source>Nat Rev Genet</source>
<year>2012</year>
<volume>13</volume>
<fpage>36</fpage>
<lpage>46</lpage>
<pub-id pub-id-type="pmid">22124482</pub-id>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sosa</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jose</surname>
<given-names>MV</given-names>
</name>
<article-title>
<bold>Human repetitive sequence densities are mostly negatively correlated with R/Y-based nucleosome-positioning motifs and positively correlated with W/S-based motifs</bold>
</article-title>
<source>Genomics</source>
<year>2013</year>
<volume>101</volume>
<fpage>125</fpage>
<lpage>133</lpage>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="other">
<name>
<surname>Sindi</surname>
<given-names>SS</given-names>
</name>
<article-title>
<bold>Describing and Modeling Repetitive Sequences in DNA</bold>
</article-title>
<comment>Ph.D. thesis, Univ. of Maryland; 2006</comment>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="journal">
<name>
<surname>Sindi</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Hunt</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Yorke</surname>
<given-names>JA</given-names>
</name>
<article-title>
<bold>Duplication count distributions in DNA sequences</bold>
</article-title>
<source>Phys Rev E</source>
<year>2008</year>
<volume>78</volume>
<fpage>061912</fpage>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="book">
<name>
<surname>Gabaix</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Ioannides</surname>
<given-names>YM</given-names>
</name>
<person-group person-group-type="editor">Henderson V, Thisse JF</person-group>
<article-title>
<bold>The evolution of city size distributions</bold>
</article-title>
<source>Handbook of Regional and Urban Economics</source>
<year>2004</year>
<publisher-name>North-Holland</publisher-name>
</mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="journal">
<name>
<surname>Eeckhout</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Gibrat’s law for (all) cities</bold>
</article-title>
<source>Am Econ Rev</source>
<year>2004</year>
<volume>94</volume>
<fpage>1429</fpage>
<lpage>1451</lpage>
</mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="journal">
<name>
<surname>Vandepoele</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Van Roy</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Staes</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Speleman</surname>
<given-names>F</given-names>
</name>
<name>
<surname>van Roy</surname>
<given-names>F</given-names>
</name>
<article-title>
<bold>A novel gene family NBPF: intricate structure generated by gene duplications during primate evolution</bold>
</article-title>
<source>Mol Biol Evol</source>
<year>2005</year>
<volume>22</volume>
<fpage>2265</fpage>
<lpage>2274</lpage>
<pub-id pub-id-type="pmid">16079250</pub-id>
</mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="journal">
<name>
<surname>Paar</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Glunčić</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rosandić</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Basar</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Vlahović</surname>
<given-names>I</given-names>
</name>
<article-title>
<bold>Intragene higher order repeats in neuroblastoma breakpoint family genes distinguish humans from chimpanzees</bold>
</article-title>
<source>Mol Biol Evol</source>
<year>2011</year>
<volume>28</volume>
<fpage>1877</fpage>
<lpage>1892</lpage>
<pub-id pub-id-type="pmid">21273634</pub-id>
</mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="journal">
<name>
<surname>Dumas</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>O’Bleness</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Dickens</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Keeney</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sikela</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Raznahan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Giedd</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rapoport</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nagamani</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Erez</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brunetti-Pierri</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Sugalski</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lupski</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Fingerlin</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Sikela</surname>
<given-names>JM</given-names>
</name>
<article-title>
<bold>DUF1220-domain copy number implicated in human brain-size pathology and evolution</bold>
</article-title>
<source>Am J Hum Genet</source>
<year>2012</year>
<volume>91</volume>
<fpage>444</fpage>
<lpage>454</lpage>
<pub-id pub-id-type="pmid">22901949</pub-id>
</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="journal">
<name>
<surname>Chen</surname>
<given-names>YT</given-names>
</name>
<name>
<surname>Iseli</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Venditti</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Old</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Simpson</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Jongeneel</surname>
<given-names>CV</given-names>
</name>
<article-title>
<bold>Identification of a new cancer/testis gene family, CT47, among expressed multicopy genes on the human X chromosome</bold>
</article-title>
<source>Genes Chromosomes Cancer</source>
<year>2006</year>
<volume>45</volume>
<fpage>392</fpage>
<lpage>400</lpage>
<pub-id pub-id-type="pmid">16382448</pub-id>
</mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="journal">
<name>
<surname>Dobrynin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Matyunina</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Malov</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Kozlov</surname>
<given-names>AP</given-names>
</name>
<article-title>
<bold>The novelty of human cancer/testis antigen encoding genes in evolution</bold>
</article-title>
<source>Int J Genomics</source>
<year>2013</year>
<volume>2013</volume>
<fpage>105108</fpage>
<pub-id pub-id-type="pmid">23691492</pub-id>
</mixed-citation>
</ref>
<ref id="B59">
<mixed-citation publication-type="journal">
<name>
<surname>Giacalone</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Friedes</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Francke</surname>
<given-names>U</given-names>
</name>
<article-title>
<bold>A novel GC-rich human macrosatellite VNTR in Xq24 is differentially methylated on active and inactive X chromosomes</bold>
</article-title>
<source>Nat Genet</source>
<year>1992</year>
<volume>1</volume>
<fpage>137</fpage>
<lpage>143</lpage>
<pub-id pub-id-type="pmid">1302007</pub-id>
</mixed-citation>
</ref>
<ref id="B60">
<mixed-citation publication-type="journal">
<name>
<surname>Tremblay</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Moseley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chadwick</surname>
<given-names>BP</given-names>
</name>
<article-title>
<bold>Variation in array size, monomer composition and expression of the macrosatellite DXZ4</bold>
</article-title>
<source>PLoS ONE</source>
<year>2011</year>
<volume>6</volume>
<fpage>e18969</fpage>
<pub-id pub-id-type="pmid">21544201</pub-id>
</mixed-citation>
</ref>
<ref id="B61">
<mixed-citation publication-type="journal">
<name>
<surname>Schaap</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lemmers</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Maassen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>van der Vliet</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Hoogerheide</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>van Dijk</surname>
<given-names>HK</given-names>
</name>
<name>
<surname>Baştürk</surname>
<given-names>N</given-names>
</name>
<name>
<surname>de Knijff</surname>
<given-names>P</given-names>
</name>
<name>
<surname>van der Maarel</surname>
<given-names>SM</given-names>
</name>
<article-title>
<bold>Genome-wide analysis of macrosatellite repeat copy number variation in worldwide populations: evidence for differences and commonalities in size distributions and size restrictions</bold>
</article-title>
<source>BMC Genomics</source>
<year>2013</year>
<volume>14</volume>
<fpage>143</fpage>
<pub-id pub-id-type="pmid">23496858</pub-id>
</mixed-citation>
</ref>
<ref id="B62">
<mixed-citation publication-type="journal">
<name>
<surname>Horakova</surname>
<given-names>AH</given-names>
</name>
<name>
<surname>Moseley</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>McLaughlin</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Tremblay</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Chadwick</surname>
<given-names>BP</given-names>
</name>
<article-title>
<bold>The macrosatellite DXZ4 mediates CTCF-dependent long-range intrachromosomal interactions on the human inactive X chromosome</bold>
</article-title>
<source>Hum Mol Genet</source>
<year>2012</year>
<volume>21</volume>
<fpage>4367</fpage>
<lpage>4377</lpage>
<pub-id pub-id-type="pmid">22791747</pub-id>
</mixed-citation>
</ref>
<ref id="B63">
<mixed-citation publication-type="journal">
<name>
<surname>Smit</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Tóth</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Riggs</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Jurka</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences</bold>
</article-title>
<source>J Mol Biol</source>
<year>1995</year>
<volume>246</volume>
<fpage>401</fpage>
<lpage>417</lpage>
<pub-id pub-id-type="pmid">7877164</pub-id>
</mixed-citation>
</ref>
<ref id="B64">
<mixed-citation publication-type="journal">
<name>
<surname>Bailey</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>
<bold>An Alu transposition model for the origin and expansion of human segmental duplications</bold>
</article-title>
<source>Am J Hum Genet</source>
<year>2003</year>
<volume>73</volume>
<fpage>823</fpage>
<lpage>834</lpage>
<pub-id pub-id-type="pmid">14505274</pub-id>
</mixed-citation>
</ref>
<ref id="B65">
<mixed-citation publication-type="journal">
<name>
<surname>Edgar</surname>
<given-names>RC</given-names>
</name>
<article-title>
<bold>MUSCLE: a multiple sequence alignment method with reduced time and space complexity</bold>
</article-title>
<source>BMC Bioinfo</source>
<year>2004</year>
<volume>5</volume>
<fpage>113</fpage>
</mixed-citation>
</ref>
<ref id="B66">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Schröder</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>B</given-names>
</name>
<article-title>
<bold>Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data</bold>
</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>308</fpage>
<lpage>315</lpage>
<pub-id pub-id-type="pmid">23202746</pub-id>
</mixed-citation>
</ref>
<ref id="B67">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
<article-title>
<bold>Estimating the repeat structure and length of DNA sequences using l-tuples</bold>
</article-title>
<source>Genome Res</source>
<year>2003</year>
<volume>13</volume>
<fpage>1916</fpage>
<lpage>1922</lpage>
<pub-id pub-id-type="pmid">12902383</pub-id>
</mixed-citation>
</ref>
<ref id="B68">
<mixed-citation publication-type="journal">
<name>
<surname>Rosenfeld</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>CE</given-names>
</name>
<article-title>
<bold>Pervasive sequence patents cover the entire human genome</bold>
</article-title>
<source>Genome Med</source>
<year>2013</year>
<volume>5</volume>
<fpage>27</fpage>
<pub-id pub-id-type="pmid">23522065</pub-id>
</mixed-citation>
</ref>
<ref id="B69">
<mixed-citation publication-type="journal">
<name>
<surname>Chen</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Nyeo</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Yeh</surname>
<given-names>CY</given-names>
</name>
<article-title>
<bold>Model for the distributions of k-mers in DNA sequences</bold>
</article-title>
<source>Phys Rev E</source>
<year>2005</year>
<volume>72</volume>
<fpage>011908</fpage>
</mixed-citation>
</ref>
<ref id="B70">
<mixed-citation publication-type="journal">
<name>
<surname>Nikolaou</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Almirantis</surname>
<given-names>Y</given-names>
</name>
<article-title>
<bold>‘Word’ preference in the genomic text and genome evolution: different modes of n-tuplet usage in coding and noncoding sequences</bold>
</article-title>
<source>J Mol Evol</source>
<year>2005</year>
<volume>61</volume>
<fpage>23</fpage>
<lpage>35</lpage>
<pub-id pub-id-type="pmid">16059753</pub-id>
</mixed-citation>
</ref>
<ref id="B71">
<mixed-citation publication-type="book">
<name>
<surname>Xie</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>B</given-names>
</name>
<source>Visualization of K-tuple distribution in procaryote complete genomes and their randomized counterparts</source>
<year>2002</year>
<publisher-name>Los Alamitos: IEEE Computer Society Press</publisher-name>
</mixed-citation>
</ref>
<ref id="B72">
<mixed-citation publication-type="journal">
<name>
<surname>Chor</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Horn</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Goldman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Massingham</surname>
<given-names>T</given-names>
</name>
<article-title>
<bold>Genomic DNA k-mer spectra: models and modalities</bold>
</article-title>
<source>Genome Biol</source>
<year>2009</year>
<volume>10</volume>
<fpage>R108</fpage>
<pub-id pub-id-type="pmid">19814784</pub-id>
</mixed-citation>
</ref>
<ref id="B73">
<mixed-citation publication-type="journal">
<name>
<surname>Paszkiewicz</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Studholme</surname>
<given-names>DJ</given-names>
</name>
<article-title>
<bold>De novo assembly of short sequence reads</bold>
</article-title>
<source>Brief Bioinfo</source>
<year>2010</year>
<volume>11</volume>
<fpage>457</fpage>
<lpage>472</lpage>
</mixed-citation>
</ref>
<ref id="B74">
<mixed-citation publication-type="other">
<name>
<surname>Bradnam</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Fass</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Alexandrov</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Baranay</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Bechner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Birol</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Boisvert</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Chapuis</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chikhi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chitsaz</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Corbeil</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Del Fabbro</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Docking</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Earl</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Emrich</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fedotov</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fonseca</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Ganapathy</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gibbs</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Gnerre</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Godzaridis</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Goldstein</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Haimel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Haussler</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hiatt</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>IY</given-names>
</name>
<collab>Chou</collab>
<etal></etal>
<article-title>
<bold>Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species</bold>
</article-title>
<comment>arXiv preprint. arXiv:q-bio.GN/1301.5406, 2013</comment>
</mixed-citation>
</ref>
<ref id="B75">
<mixed-citation publication-type="journal">
<name>
<surname>Muñoz</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Gallo</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Misas</surname>
<given-names>E</given-names>
</name>
<name>
<surname>McEwan</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Clay</surname>
<given-names>OK</given-names>
</name>
<article-title>
<bold>The eukaryotic genome, its reads, and the unfinished assembly</bold>
</article-title>
<source>FEBS Lett</source>
<year>2013</year>
<volume>587</volume>
<fpage>2090</fpage>
<lpage>2093</lpage>
<pub-id pub-id-type="pmid">23727201</pub-id>
</mixed-citation>
</ref>
<ref id="B76">
<mixed-citation publication-type="journal">
<name>
<surname>Zerbino</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
<article-title>
<bold>Velvet: Algorithms for de novo short read assembly using de Bruijn graphs</bold>
</article-title>
<source>Genome Res</source>
<year>2008</year>
<volume>18</volume>
<fpage>821</fpage>
<lpage>829</lpage>
<pub-id pub-id-type="pmid">18349386</pub-id>
</mixed-citation>
</ref>
<ref id="B77">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>R</given-names>
</name>
<article-title>
<bold>COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly</bold>
</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>2870</fpage>
<lpage>2874</lpage>
<pub-id pub-id-type="pmid">23044551</pub-id>
</mixed-citation>
</ref>
<ref id="B78">
<mixed-citation publication-type="journal">
<name>
<surname>Christiansen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Dyck</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Elyas</surname>
<given-names>BG</given-names>
</name>
<name>
<surname>Lilley</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bamforth</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Hicks</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sprysak</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Tomaszewski</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Haase</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Vicen-Wyhony</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Somerville</surname>
<given-names>MJ</given-names>
</name>
<article-title>
<bold>Chromosome 1q21.1 contiguous gene deletion is associated with congenital heart disease</bold>
</article-title>
<source>Circ Res</source>
<year>2004</year>
<volume>94</volume>
<fpage>1429</fpage>
<lpage>1435</lpage>
<pub-id pub-id-type="pmid">15117819</pub-id>
</mixed-citation>
</ref>
<ref id="B79">
<mixed-citation publication-type="journal">
<name>
<surname>Redon</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ishikawa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fitch</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Perry</surname>
<given-names>GH</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Fiegler</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Shapero</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Cho</surname>
<given-names>EK</given-names>
</name>
<name>
<surname>Dallaire</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Freeman</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>González</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Gratacós</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kalaitzopoulos</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Komura</surname>
<given-names>D</given-names>
</name>
<name>
<surname>MacDonald</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Mei</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Montgomery</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Nishimura</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Okamura</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Somerville</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Tchinda</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Valsesia</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Woodwark</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>F</given-names>
</name>
<etal></etal>
<article-title>
<bold>Global variation in copy number in the human genome</bold>
</article-title>
<source>Nature</source>
<year>2006</year>
<volume>444</volume>
<fpage>444</fpage>
<lpage>454</lpage>
<pub-id pub-id-type="pmid">17122850</pub-id>
</mixed-citation>
</ref>
<ref id="B80">
<mixed-citation publication-type="journal">
<name>
<surname>Greenway</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>DePalma</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Israel</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Mesquita</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Ergul</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Conta</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Korn</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>McCarroll</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Gorham</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Gabriel</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Altshuler</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Quintanilla-Dieck</surname>
<given-names>Mde L</given-names>
</name>
<name>
<surname>Artunduaga</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Eavey</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Plenge</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Shadick</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Weinblatt</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>De Jager</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Hafler</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Breitbart</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Seidman</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Seidman</surname>
<given-names>CE</given-names>
</name>
<article-title>
<bold>De novo copy number variants identify new genes and loci in isolated sporadic tetralogy of Fallot</bold>
</article-title>
<source>Nat Genet</source>
<year>2009</year>
<volume>41</volume>
<fpage>931</fpage>
<lpage>935</lpage>
<pub-id pub-id-type="pmid">19597493</pub-id>
</mixed-citation>
</ref>
<ref id="B81">
<mixed-citation publication-type="journal">
<name>
<surname>Szatmari</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Paterson</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Zwaigenbaum</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Roberts</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Brian</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>XQ</given-names>
</name>
<name>
<surname>Vincent</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Skaug</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Senman</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Feuk</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Qian</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bryson</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Scherer</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Vieland</surname>
<given-names>VJ</given-names>
</name>
<name>
<surname>Bartlett</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Mangin</surname>
<given-names>LV</given-names>
</name>
<name>
<surname>Goedken</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Segre</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pericak-Vance</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Cuccaro</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Gilbert</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Abramson</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Betancur</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bourgeron</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Gillberg</surname>
<given-names>C</given-names>
</name>
<collab>Autism Genome Project Consortium</collab>
<etal></etal>
<article-title>
<bold>Mapping autism risk loci using genetic linkage and chromosomal rearrangements</bold>
</article-title>
<source>Nat Genet</source>
<year>2007</year>
<volume>39</volume>
<fpage>319</fpage>
<lpage>328</lpage>
<pub-id pub-id-type="pmid">17322880</pub-id>
</mixed-citation>
</ref>
<ref id="B82">
<mixed-citation publication-type="journal">
<name>
<surname>Girirajan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Dennis</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Malig</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Coe</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>CD</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Vu</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Alkan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Biesecker</surname>
<given-names>LG</given-names>
</name>
<name>
<surname>Bernier</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<article-title>
<bold>Refinement and discovery of new hotspots of copy-number variation associated with autism spectrum disorder</bold>
</article-title>
<source>Am J Hum Genet</source>
<year>2013</year>
<volume>92</volume>
<fpage>221</fpage>
<lpage>237</lpage>
<pub-id pub-id-type="pmid">23375656</pub-id>
</mixed-citation>
</ref>
<ref id="B83">
<mixed-citation publication-type="journal">
<name>
<surname>Mefford</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Sharp</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Itsara</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Buysse</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Maloney</surname>
<given-names>VK</given-names>
</name>
<name>
<surname>Crolla</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Baralle</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mercer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Norga</surname>
<given-names>K</given-names>
</name>
<name>
<surname>de Ravel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Devriendt</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Bongers</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>de Leeuw</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Reardon</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Gimelli</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bena</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Hennekam</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Male</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gaunt</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Clayton-Smith</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Simonic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Mehta</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Nik-Zainal</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Woods</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Firth</surname>
<given-names>HV</given-names>
</name>
<etal></etal>
<article-title>
<bold>Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes</bold>
</article-title>
<source>New Eng J Med</source>
<year>2008</year>
<volume>359</volume>
<fpage>1685</fpage>
<lpage>1699</lpage>
<pub-id pub-id-type="pmid">18784092</pub-id>
</mixed-citation>
</ref>
<ref id="B84">
<mixed-citation publication-type="journal">
<name>
<surname>Brunetti-Pierri</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Berg</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Scaglia</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Belmont</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bacino</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Sahoo</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lalani</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Graham</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Shinawi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Pursley</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lotze</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kennedy</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lansky-Shafer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Weaver</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Roeder</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Grebe</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Arnold</surname>
<given-names>GL</given-names>
</name>
<name>
<surname>Hutchison</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Reimschisel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Amato</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Geraghty</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Innis</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Obersztyn</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Nowakowska</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rosengren</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>PI</given-names>
</name>
<name>
<surname>Grange</surname>
<given-names>DK</given-names>
</name>
<etal></etal>
<article-title>
<bold>Recurrent reciprocal 1q21.1 deletions and duplications associated with microcephaly or macrocephaly and developmental and behavioral abnormalities</bold>
</article-title>
<source>Nat Genet</source>
<year>2008</year>
<volume>40</volume>
<fpage>1466</fpage>
<lpage>1471</lpage>
<pub-id pub-id-type="pmid">19029900</pub-id>
</mixed-citation>
</ref>
<ref id="B85">
<mixed-citation publication-type="journal">
<collab>The International Schizophrenia Consortium</collab>
<article-title>
<bold>Rare chromosomal deletions and duplications increase risk of schizophrenia</bold>
</article-title>
<source>Nature</source>
<year>2008</year>
<volume>455</volume>
<fpage>237</fpage>
<lpage>241</lpage>
<pub-id pub-id-type="pmid">18668038</pub-id>
</mixed-citation>
</ref>
<ref id="B86">
<mixed-citation publication-type="journal">
<name>
<surname>Ikeda</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Aleksic</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kirov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kinoshita</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yamanouchi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kitajima</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kawashima</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Okochi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kishi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Zaharieva</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Owen</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>O’Donovan</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Ozaki</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Iwata</surname>
<given-names>N</given-names>
</name>
<article-title>
<bold>Copy number variation in schizophrenia in the Japanese population</bold>
</article-title>
<source>Biol Psych</source>
<year>2010</year>
<volume>67</volume>
<fpage>283</fpage>
<lpage>286</lpage>
</mixed-citation>
</ref>
<ref id="B87">
<mixed-citation publication-type="journal">
<name>
<surname>Diskin</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Glessner</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Attiyeh</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Laudenslager</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bosse</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Cole</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mossé</surname>
<given-names>YP</given-names>
</name>
<name>
<surname>Wood</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lynch</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Pecor</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Diamond</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Winter</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Geiger</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>McGrady</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Blakemore</surname>
<given-names>AI</given-names>
</name>
<name>
<surname>London</surname>
<given-names>WB</given-names>
</name>
<name>
<surname>Shaikh</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Bradfield</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Devoto</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rappaport</surname>
<given-names>ER</given-names>
</name>
<name>
<surname>Hakonarson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Maris</surname>
<given-names>JM</given-names>
</name>
<article-title>
<bold>Copy number variation at 1q21.1 associated with neuroblastoma</bold>
</article-title>
<source>Nature</source>
<year>2009</year>
<volume>459</volume>
<fpage>987</fpage>
<lpage>991</lpage>
<pub-id pub-id-type="pmid">19536264</pub-id>
</mixed-citation>
</ref>
<ref id="B88">
<mixed-citation publication-type="journal">
<name>
<surname>Isrie</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Froyen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Devriendt</surname>
<given-names>K</given-names>
</name>
<name>
<surname>de Ravel</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Fryns</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Vermeesch</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Van Esch</surname>
<given-names>H</given-names>
</name>
<article-title>
<bold>Sporadic male patients with intellectual disability: contribution of X-chromosome copy number variants</bold>
</article-title>
<source>Euro J Med Genet</source>
<year>2012</year>
<volume>55</volume>
<fpage>577</fpage>
<lpage>585</lpage>
</mixed-citation>
</ref>
<ref id="B89">
<mixed-citation publication-type="journal">
<name>
<surname>Moseley</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Rizkallah</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tremblay</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Hurt</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Chadwick</surname>
<given-names>BP</given-names>
</name>
<article-title>
<bold>YY1 associates with the macrosatellite DXZ4 on the inactive X chromosome and binds with CTCF to a hypomethylated form in some male carcinomas</bold>
</article-title>
<source>Nucleic Acids Res</source>
<year>2012</year>
<volume>40</volume>
<fpage>1596</fpage>
<lpage>1608</lpage>
<pub-id pub-id-type="pmid">22064860</pub-id>
</mixed-citation>
</ref>
<ref id="B90">
<mixed-citation publication-type="journal">
<name>
<surname>Whibley</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Plagnol</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Tarpey</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Abidi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Fullston</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Choma</surname>
<given-names>MK</given-names>
</name>
<name>
<surname>Boucher</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Shepherd</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Willatt</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Parkin</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Futreal</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Shaw</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Boyle</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Licata</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Skinner</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stevenson</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hackett</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Gecz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stratton</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Raymond</surname>
<given-names>FL</given-names>
</name>
<article-title>
<bold>Fine-scale survey of X chromosome copy number variants and indels underlying intellectual disability</bold>
</article-title>
<source>Am J Hum Genet</source>
<year>2010</year>
<volume>87</volume>
<fpage>173</fpage>
<lpage>188</lpage>
<pub-id pub-id-type="pmid">20655035</pub-id>
</mixed-citation>
</ref>
<ref id="B91">
<mixed-citation publication-type="journal">
<name>
<surname>Honda</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hayashi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Imoto</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Toyama</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Okazawa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nakagawa</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Goto</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Inazawa</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Copy-number variations on the X chromosome in Japanese patients with mental retardation detected by array-based comparative genomic hybridization analysis</bold>
</article-title>
<source>J Hum Genet</source>
<year>2010</year>
<volume>55</volume>
<fpage>590</fpage>
<lpage>599</lpage>
<pub-id pub-id-type="pmid">20613765</pub-id>
</mixed-citation>
</ref>
<ref id="B92">
<mixed-citation publication-type="journal">
<name>
<surname>Gu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Lupski</surname>
<given-names>JR</given-names>
</name>
<article-title>
<bold>Mechanisms for human genomic rearrangement</bold>
</article-title>
<source>PathoGenet</source>
<year>2008</year>
<volume>1</volume>
<fpage>4</fpage>
</mixed-citation>
</ref>
<ref id="B93">
<mixed-citation publication-type="journal">
<name>
<surname>Hong</surname>
<given-names>GF</given-names>
</name>
<article-title>
<bold>A method for sequencing single-stranded cloned DNA in both directions</bold>
</article-title>
<source>Biosci Rep</source>
<year>1981</year>
<volume>1</volume>
<fpage>243</fpage>
<lpage>252</lpage>
<pub-id pub-id-type="pmid">6271278</pub-id>
</mixed-citation>
</ref>
<ref id="B94">
<mixed-citation publication-type="journal">
<name>
<surname>Korbel</surname>
<given-names>JO</given-names>
</name>
<name>
<surname>Urban</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Affourtit</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Godwin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Grubert</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Simons</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Palejev</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Carriero</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Du</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Taillon</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Tanzer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Saunders</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Chi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Carter</surname>
<given-names>NP</given-names>
</name>
<name>
<surname>Hurles</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Weissman</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Harkins</surname>
<given-names>TT</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Egholm</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<article-title>
<bold>Paired-end mapping reveals extensive structural variation in the human genome</bold>
</article-title>
<source>Science</source>
<year>2007</year>
<volume>318</volume>
<fpage>420</fpage>
<lpage>426</lpage>
<pub-id pub-id-type="pmid">17901297</pub-id>
</mixed-citation>
</ref>
<ref id="B95">
<mixed-citation publication-type="journal">
<name>
<surname>Williams</surname>
<given-names>LJ</given-names>
</name>
<name>
<surname>Tabbaa</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Berlin</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Shea</surname>
<given-names>TP</given-names>
</name>
<name>
<surname>MacCallum</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>Drier</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Getz</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>SK</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gnirke</surname>
<given-names>A</given-names>
</name>
<article-title>
<bold>Paired-end sequencing of Fosmid libraries by Illumina</bold>
</article-title>
<source>Genome Res</source>
<year>2012</year>
<volume>22</volume>
<fpage>2241</fpage>
<lpage>2249</lpage>
<pub-id pub-id-type="pmid">22800726</pub-id>
</mixed-citation>
</ref>
<ref id="B96">
<mixed-citation publication-type="journal">
<name>
<surname>Ramachandran</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Palidwor</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Porter</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Perkins</surname>
<given-names>TJ</given-names>
</name>
<article-title>
<bold>MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data</bold>
</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>444</fpage>
<lpage>450</lpage>
<pub-id pub-id-type="pmid">23300135</pub-id>
</mixed-citation>
</ref>
<ref id="B97">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<article-title>
<bold>The study of correlation structures of DNA sequences: a critical review</bold>
</article-title>
<source>Comput Chem</source>
<year>1997</year>
<volume>21</volume>
<fpage>257</fpage>
<lpage>271</lpage>
<pub-id pub-id-type="pmid">9415988</pub-id>
</mixed-citation>
</ref>
<ref id="B98">
<mixed-citation publication-type="journal">
<name>
<surname>Rodrigue</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Materna</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Timberlake</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Blackburn</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Malmstrom</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Alm</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Chisholm</surname>
<given-names>SW</given-names>
</name>
<article-title>
<bold>Unlocking short read sequencing for metagenomics</bold>
</article-title>
<source>PLoS ONE</source>
<year>2010</year>
<volume>5</volume>
<fpage>e11840</fpage>
<pub-id pub-id-type="pmid">20676378</pub-id>
</mixed-citation>
</ref>
<ref id="B99">
<mixed-citation publication-type="journal">
<name>
<surname>Magoč</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
<article-title>
<bold>FLASH: fast length adjustment of short reads to improve genome assemblies</bold>
</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>2957</fpage>
<lpage>2963</lpage>
<pub-id pub-id-type="pmid">21903629</pub-id>
</mixed-citation>
</ref>
<ref id="B100">
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>TW</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>R</given-names>
</name>
<article-title>
<bold>COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly</bold>
</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>2870</fpage>
<lpage>2874</lpage>
<pub-id pub-id-type="pmid">23044551</pub-id>
</mixed-citation>
</ref>
<ref id="B101">
<mixed-citation publication-type="journal">
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Chong</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Gong</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Zhai</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Turissini</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cannon</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>CI</given-names>
</name>
<article-title>
<bold>Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology</bold>
</article-title>
<source>BMC Genomics</source>
<year>2013</year>
<volume>14</volume>
<fpage>711</fpage>
<pub-id pub-id-type="pmid">24134808</pub-id>
</mixed-citation>
</ref>
<ref id="B102">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Kaneko</surname>
<given-names>K</given-names>
</name>
<article-title>
<bold>Long-range correlation and partial 1/f</bold>
<sup>
<bold>
<italic>α</italic>
</bold>
</sup>
<bold> spectrum in a noncoding DNA sequence</bold>
</article-title>
<source>Euro Phys Lett</source>
<year>1992</year>
<volume>17</volume>
<fpage>655</fpage>
<lpage>660</lpage>
</mixed-citation>
</ref>
<ref id="B103">
<mixed-citation publication-type="journal">
<name>
<surname>Bernaola-Galván</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Carpena</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Román-Roldán</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Oliver</surname>
<given-names>JL</given-names>
</name>
<article-title>
<bold>Study of statistical correlations in DNA sequences</bold>
</article-title>
<source>Gene</source>
<year>2002</year>
<volume>300</volume>
<fpage>105</fpage>
<lpage>115</lpage>
<pub-id pub-id-type="pmid">12468092</pub-id>
</mixed-citation>
</ref>
<ref id="B104">
<mixed-citation publication-type="journal">
<name>
<surname>Arneodo</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vaillant</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Audit</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Argoul</surname>
<given-names>F</given-names>
</name>
<name>
<surname>d’Aubenton-Carafa</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Thermes</surname>
<given-names>C</given-names>
</name>
<article-title>
<bold>Multi-scale coding of genomic information: from DNA sequence to genome structure and function</bold>
</article-title>
<source>Phys Rep</source>
<year>2011</year>
<volume>498</volume>
<fpage>45</fpage>
<lpage>188</lpage>
</mixed-citation>
</ref>
<ref id="B105">
<mixed-citation publication-type="journal">
<name>
<surname>Voss</surname>
<given-names>RF</given-names>
</name>
<article-title>
<bold>Evolution of long-range fractal correlations and 1/f noise in DNA base sequences</bold>
</article-title>
<source>Phys Rev Lett</source>
<year>1992</year>
<volume>68</volume>
<fpage>3805</fpage>
<lpage>3808</lpage>
<pub-id pub-id-type="pmid">10045801</pub-id>
</mixed-citation>
</ref>
<ref id="B106">
<mixed-citation publication-type="journal">
<name>
<surname>Fukushima</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ikemura</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kinouchi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Oshima</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kudo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Mori</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kanaya</surname>
<given-names>S</given-names>
</name>
<article-title>
<bold>Periodicity in prokaryotic and eukaryotic genomes identified by power spectrum analysis</bold>
</article-title>
<source>Gene</source>
<year>2002</year>
<volume>300</volume>
<fpage>203</fpage>
<lpage>211</lpage>
<pub-id pub-id-type="pmid">12468102</pub-id>
</mixed-citation>
</ref>
<ref id="B107">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Holste</surname>
<given-names>D</given-names>
</name>
<article-title>
<bold>Spectral analysis of guanine and cytosine fluctuations of mouse genomic DNA</bold>
</article-title>
<source>Fluc Noise Lett</source>
<year>2004</year>
<volume>4</volume>
<fpage>L453</fpage>
<lpage>L464</lpage>
</mixed-citation>
</ref>
<ref id="B108">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Holste</surname>
<given-names>D</given-names>
</name>
<article-title>
<bold>Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome</bold>
</article-title>
<source>Phys Rev E</source>
<year>2005</year>
<volume>71</volume>
<fpage>041910</fpage>
</mixed-citation>
</ref>
<ref id="B109">
<mixed-citation publication-type="journal">
<name>
<surname>Huynen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>van Nimwegen</surname>
<given-names>E</given-names>
</name>
<article-title>
<bold>The frequency distribution of gene family sizes in complete genomes</bold>
</article-title>
<source>Mol Biol Evol</source>
<year>1998</year>
<volume>15</volume>
<fpage>583</fpage>
<lpage>589</lpage>
<pub-id pub-id-type="pmid">9580988</pub-id>
</mixed-citation>
</ref>
<ref id="B110">
<mixed-citation publication-type="journal">
<name>
<surname>Qian</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Luscombe</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
<article-title>
<bold>Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model</bold>
</article-title>
<source>J Mol Biol</source>
<year>2001</year>
<volume>313</volume>
<fpage>673</fpage>
<lpage>681</lpage>
<pub-id pub-id-type="pmid">11697896</pub-id>
</mixed-citation>
</ref>
<ref id="B111">
<mixed-citation publication-type="journal">
<name>
<surname>Koonin</surname>
<given-names>EV</given-names>
</name>
<article-title>
<bold>Are there laws of genome evolution?</bold>
</article-title>
<source>PLoS Comp Biol</source>
<year>2011</year>
<volume>7</volume>
<fpage>e1002173</fpage>
</mixed-citation>
</ref>
<ref id="B112">
<mixed-citation publication-type="journal">
<name>
<surname>Herrada</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eguíluz</surname>
<given-names>VM</given-names>
</name>
<name>
<surname>Hernández-García</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Duarte</surname>
<given-names>CM</given-names>
</name>
<article-title>
<bold>Scaling properties of protein family phylogenies</bold>
</article-title>
<source>BMC Evol Biol</source>
<year>2011</year>
<volume>11</volume>
<fpage>155</fpage>
<pub-id pub-id-type="pmid">21645345</pub-id>
</mixed-citation>
</ref>
<ref id="B113">
<mixed-citation publication-type="journal">
<name>
<surname>Salerno</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Havlak</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments</bold>
</article-title>
<source>Proc Natl Acad Sci</source>
<year>2006</year>
<volume>103</volume>
<fpage>13121</fpage>
<lpage>13125</lpage>
<pub-id pub-id-type="pmid">16924100</pub-id>
</mixed-citation>
</ref>
<ref id="B114">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
<article-title>
<bold>Expansion-modification systems: a model for spatial 1/f spectra</bold>
</article-title>
<source>Phys Rev A</source>
<year>1991</year>
<volume>43</volume>
<fpage>5240</fpage>
<lpage>5260</lpage>
<pub-id pub-id-type="pmid">9904836</pub-id>
</mixed-citation>
</ref>
<ref id="B115">
<mixed-citation publication-type="journal">
<name>
<surname>Yanai</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Camacho</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>DeLisi</surname>
<given-names>C</given-names>
</name>
<article-title>
<bold>Predictions of gene family distributions in microbial genomes: evolution by gene duplication and modification</bold>
</article-title>
<source>Phys Rev Lett</source>
<year>2000</year>
<volume>85</volume>
<fpage>2641</fpage>
<lpage>2644</lpage>
<pub-id pub-id-type="pmid">10978127</pub-id>
</mixed-citation>
</ref>
<ref id="B116">
<mixed-citation publication-type="journal">
<name>
<surname>Teichmann</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Babu</surname>
<given-names>MM</given-names>
</name>
<article-title>
<bold>Gene regulatory network growth by duplication</bold>
</article-title>
<source>Nat Genet</source>
<year>2004</year>
<volume>36</volume>
<fpage>492</fpage>
<lpage>496</lpage>
<pub-id pub-id-type="pmid">15107850</pub-id>
</mixed-citation>
</ref>
<ref id="B117">
<mixed-citation publication-type="journal">
<name>
<surname>Massip</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Arndt</surname>
<given-names>PF</given-names>
</name>
<article-title>
<bold>Neutral evolution of duplicated DNA: an evolutionary stick-breaking process causes scale-invariant behavior</bold>
</article-title>
<source>Phys Rev Lett</source>
<year>2013</year>
<volume>110</volume>
<fpage>148101</fpage>
<pub-id pub-id-type="pmid">25167038</pub-id>
</mixed-citation>
</ref>
<ref id="B118">
<mixed-citation publication-type="journal">
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>HH</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>WY</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>WH</given-names>
</name>
<article-title>
<bold>Patterns of segmental duplication in the human genome</bold>
</article-title>
<source>Mol Biol Evol</source>
<year>2005</year>
<volume>22</volume>
<fpage>135</fpage>
<lpage>141</lpage>
<pub-id pub-id-type="pmid">15371527</pub-id>
</mixed-citation>
</ref>
<ref id="B119">
<mixed-citation publication-type="journal">
<name>
<surname>Ng</surname>
<given-names>SB</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Flygare</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Bigham</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shaffer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bhattacharjee</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Eichler</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Bamshad</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nickerson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Shendure</surname>
<given-names>J</given-names>
</name>
<article-title>
<bold>Targeted capture and massively parallel sequencing of 12 human exomes</bold>
</article-title>
<source>Nature</source>
<year>2009</year>
<volume>461</volume>
<fpage>272</fpage>
<lpage>276</lpage>
<pub-id pub-id-type="pmid">19684571</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000949 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000949 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3927684
   |texte=   Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24386976" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021