MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

G+C content dominates intrinsic nucleosome occupancy

Internal identifier: 000A86 ( Pmc/Corpus ); previous: 000A85; next: 000A87

Authors: Desiree Tillo; Timothy R. Hughes

Source:

RBID : PMC:2808325

Abstract

Background

The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome in vitro. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.
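
As an illustration of the kind of input the Kaplan model is described as using, the sketch below counts all overlapping 5-mers in each nucleosome-sized window of a sequence. It is a minimal sketch, not the Kaplan model itself: the 147-bp window is the size quoted in the article body, and the function names (`kmer_counts`, `tiled_kmer_counts`) and the one-base step are assumptions made for illustration.

```python
from collections import Counter

WINDOW = 147  # nucleosome-sized window in bp (size quoted in the article body)
K = 5         # k-mer length used by the model described above


def kmer_counts(window_seq, k=K):
    """Count every overlapping k-mer in a single window (hypothetical helper)."""
    seq = window_seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tiled_kmer_counts(genome_seq, window=WINDOW, step=1, k=K):
    """Yield (start, Counter) for each full window along the sequence.

    The one-base step is an assumption; any tiling step could be used.
    """
    for start in range(0, len(genome_seq) - window + 1, step):
        yield start, kmer_counts(genome_seq[start:start + window], k)
```

Each window yields a table of up to 4^5 = 1,024 distinct 5-mer counts, which gives a sense of why a model built on all of them is hard to interpret feature by feature.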

Results

We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy in vitro and in vivo in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining ~50% of the variation in nucleosome occupancy in vitro.
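
The two features singled out above can be computed directly from sequence. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, motif counting is single-stranded and overlapping, and the weights passed to `linear_score` are made-up placeholders rather than coefficients fitted in the paper.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def tetramer_frequency(seq, motif="AAAA"):
    """Overlapping occurrences of motif divided by the number of start positions.

    Counting on one strand only is an assumption of this sketch.
    """
    seq = seq.upper()
    positions = range(len(seq) - len(motif) + 1)
    if len(positions) == 0:
        return 0.0
    hits = sum(1 for i in positions if seq.startswith(motif, i))
    return hits / len(positions)


def linear_score(features, weights, intercept=0.0):
    """Linear combination of named features; the weights are supplied by the caller."""
    return intercept + sum(weights[name] * value for name, value in features.items())


# Placeholder weights chosen only to show the call pattern, not fitted values.
example_window = "AAAACCCC"
example_features = {
    "gc": gc_content(example_window),
    "aaaa": tetramer_frequency(example_window),
}
example_score = linear_score(example_features, {"gc": 2.0, "aaaa": -1.0})
```

In practice the features would be computed per nucleosome-sized window and the weights estimated by regression against measured occupancy; the full 14-feature set is not reproduced here.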

Conclusions

Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.


Url:
DOI: 10.1186/1471-2105-10-442
PubMed: 20028554
PubMed Central: 2808325

Links to Exploration step

PMC:2808325

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">G+C content dominates intrinsic nucleosome occupancy</title>
<author>
<name sortKey="Tillo, Desiree" sort="Tillo, Desiree" uniqKey="Tillo D" first="Desiree" last="Tillo">Desiree Tillo</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hughes, Timothy R" sort="Hughes, Timothy R" uniqKey="Hughes T" first="Timothy R" last="Hughes">Timothy R. Hughes</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, Canada</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20028554</idno>
<idno type="pmc">2808325</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808325</idno>
<idno type="RBID">PMC:2808325</idno>
<idno type="doi">10.1186/1471-2105-10-442</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000A86</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000A86</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">G+C content dominates intrinsic nucleosome occupancy</title>
<author>
<name sortKey="Tillo, Desiree" sort="Tillo, Desiree" uniqKey="Tillo D" first="Desiree" last="Tillo">Desiree Tillo</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hughes, Timothy R" sort="Hughes, Timothy R" uniqKey="Hughes T" first="Timothy R" last="Hughes">Timothy R. Hughes</name>
<affiliation>
<nlm:aff id="I1">Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, Canada</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome
<italic>in vitro</italic>
. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.</p>
</sec>
<sec>
<title>Results</title>
<p>We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo </italic>
in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining ~50% of the variation in nucleosome occupancy
<italic>in vitro</italic>
.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Luger, K" uniqKey="Luger K">K Luger</name>
</author>
<author>
<name sortKey="Mader, Aw" uniqKey="Mader A">AW Mader</name>
</author>
<author>
<name sortKey="Richmond, Rk" uniqKey="Richmond R">RK Richmond</name>
</author>
<author>
<name sortKey="Sargent, Df" uniqKey="Sargent D">DF Sargent</name>
</author>
<author>
<name sortKey="Richmond, Tj" uniqKey="Richmond T">TJ Richmond</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, W" uniqKey="Lee W">W Lee</name>
</author>
<author>
<name sortKey="Tillo, D" uniqKey="Tillo D">D Tillo</name>
</author>
<author>
<name sortKey="Bray, N" uniqKey="Bray N">N Bray</name>
</author>
<author>
<name sortKey="Morse, Rh" uniqKey="Morse R">RH Morse</name>
</author>
<author>
<name sortKey="Davis, Rw" uniqKey="Davis R">RW Davis</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Nislow, C" uniqKey="Nislow C">C Nislow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groth, A" uniqKey="Groth A">A Groth</name>
</author>
<author>
<name sortKey="Rocha, W" uniqKey="Rocha W">W Rocha</name>
</author>
<author>
<name sortKey="Verreault, A" uniqKey="Verreault A">A Verreault</name>
</author>
<author>
<name sortKey="Almouzni, G" uniqKey="Almouzni G">G Almouzni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, B" uniqKey="Li B">B Li</name>
</author>
<author>
<name sortKey="Carey, M" uniqKey="Carey M">M Carey</name>
</author>
<author>
<name sortKey="Workman, Jl" uniqKey="Workman J">JL Workman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yuan, Gc" uniqKey="Yuan G">GC Yuan</name>
</author>
<author>
<name sortKey="Liu, Yj" uniqKey="Liu Y">YJ Liu</name>
</author>
<author>
<name sortKey="Dion, Mf" uniqKey="Dion M">MF Dion</name>
</author>
<author>
<name sortKey="Slack, Md" uniqKey="Slack M">MD Slack</name>
</author>
<author>
<name sortKey="Wu, Lf" uniqKey="Wu L">LF Wu</name>
</author>
<author>
<name sortKey="Altschuler, Sj" uniqKey="Altschuler S">SJ Altschuler</name>
</author>
<author>
<name sortKey="Rando, Oj" uniqKey="Rando O">OJ Rando</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, Ck" uniqKey="Lee C">CK Lee</name>
</author>
<author>
<name sortKey="Shibata, Y" uniqKey="Shibata Y">Y Shibata</name>
</author>
<author>
<name sortKey="Rao, B" uniqKey="Rao B">B Rao</name>
</author>
<author>
<name sortKey="Strahl, Bd" uniqKey="Strahl B">BD Strahl</name>
</author>
<author>
<name sortKey="Lieb, Jd" uniqKey="Lieb J">JD Lieb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernstein, Be" uniqKey="Bernstein B">BE Bernstein</name>
</author>
<author>
<name sortKey="Liu, Cl" uniqKey="Liu C">CL Liu</name>
</author>
<author>
<name sortKey="Humphrey, El" uniqKey="Humphrey E">EL Humphrey</name>
</author>
<author>
<name sortKey="Perlstein, Eo" uniqKey="Perlstein E">EO Perlstein</name>
</author>
<author>
<name sortKey="Schreiber, Sl" uniqKey="Schreiber S">SL Schreiber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaplan, N" uniqKey="Kaplan N">N Kaplan</name>
</author>
<author>
<name sortKey="Moore, Ik" uniqKey="Moore I">IK Moore</name>
</author>
<author>
<name sortKey="Fondufe Mittendorf, Y" uniqKey="Fondufe Mittendorf Y">Y Fondufe-Mittendorf</name>
</author>
<author>
<name sortKey="Gossett, Aj" uniqKey="Gossett A">AJ Gossett</name>
</author>
<author>
<name sortKey="Tillo, D" uniqKey="Tillo D">D Tillo</name>
</author>
<author>
<name sortKey="Field, Y" uniqKey="Field Y">Y Field</name>
</author>
<author>
<name sortKey="Leproust, Em" uniqKey="Leproust E">EM LeProust</name>
</author>
<author>
<name sortKey="Hughes, Tr" uniqKey="Hughes T">TR Hughes</name>
</author>
<author>
<name sortKey="Lieb, Jd" uniqKey="Lieb J">JD Lieb</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sekinger, Ea" uniqKey="Sekinger E">EA Sekinger</name>
</author>
<author>
<name sortKey="Moqtaderi, Z" uniqKey="Moqtaderi Z">Z Moqtaderi</name>
</author>
<author>
<name sortKey="Struhl, K" uniqKey="Struhl K">K Struhl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Yh" uniqKey="Wang Y">YH Wang</name>
</author>
<author>
<name sortKey="Amirhaeri, S" uniqKey="Amirhaeri S">S Amirhaeri</name>
</author>
<author>
<name sortKey="Kang, S" uniqKey="Kang S">S Kang</name>
</author>
<author>
<name sortKey="Wells, Rd" uniqKey="Wells R">RD Wells</name>
</author>
<author>
<name sortKey="Griffith, Jd" uniqKey="Griffith J">JD Griffith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ozsolak, F" uniqKey="Ozsolak F">F Ozsolak</name>
</author>
<author>
<name sortKey="Song, Js" uniqKey="Song J">JS Song</name>
</author>
<author>
<name sortKey="Liu, Xs" uniqKey="Liu X">XS Liu</name>
</author>
<author>
<name sortKey="Fisher, De" uniqKey="Fisher D">DE Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cao, H" uniqKey="Cao H">H Cao</name>
</author>
<author>
<name sortKey="Widlund, Hr" uniqKey="Widlund H">HR Widlund</name>
</author>
<author>
<name sortKey="Simonsson, T" uniqKey="Simonsson T">T Simonsson</name>
</author>
<author>
<name sortKey="Kubista, M" uniqKey="Kubista M">M Kubista</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
<author>
<name sortKey="Travers, Aa" uniqKey="Travers A">AA Travers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Suter, B" uniqKey="Suter B">B Suter</name>
</author>
<author>
<name sortKey="Schnappauf, G" uniqKey="Schnappauf G">G Schnappauf</name>
</author>
<author>
<name sortKey="Thoma, F" uniqKey="Thoma F">F Thoma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Yh" uniqKey="Wang Y">YH Wang</name>
</author>
<author>
<name sortKey="Gellibolian, R" uniqKey="Gellibolian R">R Gellibolian</name>
</author>
<author>
<name sortKey="Shimizu, M" uniqKey="Shimizu M">M Shimizu</name>
</author>
<author>
<name sortKey="Wells, Rd" uniqKey="Wells R">RD Wells</name>
</author>
<author>
<name sortKey="Griffith, J" uniqKey="Griffith J">J Griffith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Calladine, Cr" uniqKey="Calladine C">CR Calladine</name>
</author>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sivolob, Av" uniqKey="Sivolob A">AV Sivolob</name>
</author>
<author>
<name sortKey="Khrapunov, Sn" uniqKey="Khrapunov S">SN Khrapunov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brukner, I" uniqKey="Brukner I">I Brukner</name>
</author>
<author>
<name sortKey="Sanchez, R" uniqKey="Sanchez R">R Sanchez</name>
</author>
<author>
<name sortKey="Suck, D" uniqKey="Suck D">D Suck</name>
</author>
<author>
<name sortKey="Pongor, S" uniqKey="Pongor S">S Pongor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ponomarenko, Jv" uniqKey="Ponomarenko J">JV Ponomarenko</name>
</author>
<author>
<name sortKey="Ponomarenko, Mp" uniqKey="Ponomarenko M">MP Ponomarenko</name>
</author>
<author>
<name sortKey="Frolov, As" uniqKey="Frolov A">AS Frolov</name>
</author>
<author>
<name sortKey="Vorobyev, Dg" uniqKey="Vorobyev D">DG Vorobyev</name>
</author>
<author>
<name sortKey="Overton, Gc" uniqKey="Overton G">GC Overton</name>
</author>
<author>
<name sortKey="Kolchanov, Na" uniqKey="Kolchanov N">NA Kolchanov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ioshikhes, I" uniqKey="Ioshikhes I">I Ioshikhes</name>
</author>
<author>
<name sortKey="Bolshoy, A" uniqKey="Bolshoy A">A Bolshoy</name>
</author>
<author>
<name sortKey="Derenshteyn, K" uniqKey="Derenshteyn K">K Derenshteyn</name>
</author>
<author>
<name sortKey="Borodovsky, M" uniqKey="Borodovsky M">M Borodovsky</name>
</author>
<author>
<name sortKey="Trifonov, En" uniqKey="Trifonov E">EN Trifonov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Satchwell, Sc" uniqKey="Satchwell S">SC Satchwell</name>
</author>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
<author>
<name sortKey="Travers, Aa" uniqKey="Travers A">AA Travers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ioshikhes, Ip" uniqKey="Ioshikhes I">IP Ioshikhes</name>
</author>
<author>
<name sortKey="Albert, I" uniqKey="Albert I">I Albert</name>
</author>
<author>
<name sortKey="Zanton, Sj" uniqKey="Zanton S">SJ Zanton</name>
</author>
<author>
<name sortKey="Pugh, Bf" uniqKey="Pugh B">BF Pugh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Fondufe Mittendorf, Y" uniqKey="Fondufe Mittendorf Y">Y Fondufe-Mittendorf</name>
</author>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
<author>
<name sortKey="Thastrom, A" uniqKey="Thastrom A">A Thastrom</name>
</author>
<author>
<name sortKey="Field, Y" uniqKey="Field Y">Y Field</name>
</author>
<author>
<name sortKey="Moore, Ik" uniqKey="Moore I">IK Moore</name>
</author>
<author>
<name sortKey="Wang, Jp" uniqKey="Wang J">JP Wang</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Field, Y" uniqKey="Field Y">Y Field</name>
</author>
<author>
<name sortKey="Kaplan, N" uniqKey="Kaplan N">N Kaplan</name>
</author>
<author>
<name sortKey="Fondufe Mittendorf, Y" uniqKey="Fondufe Mittendorf Y">Y Fondufe-Mittendorf</name>
</author>
<author>
<name sortKey="Moore, Ik" uniqKey="Moore I">IK Moore</name>
</author>
<author>
<name sortKey="Sharon, E" uniqKey="Sharon E">E Sharon</name>
</author>
<author>
<name sortKey="Lubling, Y" uniqKey="Lubling Y">Y Lubling</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peckham, He" uniqKey="Peckham H">HE Peckham</name>
</author>
<author>
<name sortKey="Thurman, Re" uniqKey="Thurman R">RE Thurman</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Stamatoyannopoulos, Ja" uniqKey="Stamatoyannopoulos J">JA Stamatoyannopoulos</name>
</author>
<author>
<name sortKey="Noble, Ws" uniqKey="Noble W">WS Noble</name>
</author>
<author>
<name sortKey="Struhl, K" uniqKey="Struhl K">K Struhl</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yuan, Gc" uniqKey="Yuan G">GC Yuan</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schwartz, S" uniqKey="Schwartz S">S Schwartz</name>
</author>
<author>
<name sortKey="Meshorer, E" uniqKey="Meshorer E">E Meshorer</name>
</author>
<author>
<name sortKey="Ast, G" uniqKey="Ast G">G Ast</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miele, V" uniqKey="Miele V">V Miele</name>
</author>
<author>
<name sortKey="Vaillant, C" uniqKey="Vaillant C">C Vaillant</name>
</author>
<author>
<name sortKey="D Aubenton Carafa, Y" uniqKey="D Aubenton Carafa Y">Y d'Aubenton-Carafa</name>
</author>
<author>
<name sortKey="Thermes, C" uniqKey="Thermes C">C Thermes</name>
</author>
<author>
<name sortKey="Grange, T" uniqKey="Grange T">T Grange</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tolstorukov, My" uniqKey="Tolstorukov M">MY Tolstorukov</name>
</author>
<author>
<name sortKey="Choudhary, V" uniqKey="Choudhary V">V Choudhary</name>
</author>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
<author>
<name sortKey="Zhurkin, Vb" uniqKey="Zhurkin V">VB Zhurkin</name>
</author>
<author>
<name sortKey="Park, Pj" uniqKey="Park P">PJ Park</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tolstorukov, My" uniqKey="Tolstorukov M">MY Tolstorukov</name>
</author>
<author>
<name sortKey="Colasanti, Av" uniqKey="Colasanti A">AV Colasanti</name>
</author>
<author>
<name sortKey="Mccandlish, Dm" uniqKey="Mccandlish D">DM McCandlish</name>
</author>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
<author>
<name sortKey="Zhurkin, Vb" uniqKey="Zhurkin V">VB Zhurkin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dohm, Jc" uniqKey="Dohm J">JC Dohm</name>
</author>
<author>
<name sortKey="Lottaz, C" uniqKey="Lottaz C">C Lottaz</name>
</author>
<author>
<name sortKey="Borodina, T" uniqKey="Borodina T">T Borodina</name>
</author>
<author>
<name sortKey="Himmelbauer, H" uniqKey="Himmelbauer H">H Himmelbauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Valouev, A" uniqKey="Valouev A">A Valouev</name>
</author>
<author>
<name sortKey="Ichikawa, J" uniqKey="Ichikawa J">J Ichikawa</name>
</author>
<author>
<name sortKey="Tonthat, T" uniqKey="Tonthat T">T Tonthat</name>
</author>
<author>
<name sortKey="Stuart, J" uniqKey="Stuart J">J Stuart</name>
</author>
<author>
<name sortKey="Ranade, S" uniqKey="Ranade S">S Ranade</name>
</author>
<author>
<name sortKey="Peckham, H" uniqKey="Peckham H">H Peckham</name>
</author>
<author>
<name sortKey="Zeng, K" uniqKey="Zeng K">K Zeng</name>
</author>
<author>
<name sortKey="Malek, Ja" uniqKey="Malek J">JA Malek</name>
</author>
<author>
<name sortKey="Costa, G" uniqKey="Costa G">G Costa</name>
</author>
<author>
<name sortKey="Mckernan, K" uniqKey="Mckernan K">K McKernan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barbic, A" uniqKey="Barbic A">A Barbic</name>
</author>
<author>
<name sortKey="Zimmer, Dp" uniqKey="Zimmer D">DP Zimmer</name>
</author>
<author>
<name sortKey="Crothers, Dm" uniqKey="Crothers D">DM Crothers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rice, Pa" uniqKey="Rice P">PA Rice</name>
</author>
<author>
<name sortKey="Correll, Cc" uniqKey="Correll C">CC Correll</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gardiner Garden, M" uniqKey="Gardiner Garden M">M Gardiner-Garden</name>
</author>
<author>
<name sortKey="Frommer, M" uniqKey="Frommer M">M Frommer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thiery, Jp" uniqKey="Thiery J">JP Thiery</name>
</author>
<author>
<name sortKey="Macaya, G" uniqKey="Macaya G">G Macaya</name>
</author>
<author>
<name sortKey="Bernardi, G" uniqKey="Bernardi G">G Bernardi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Aerts, S" uniqKey="Aerts S">S Aerts</name>
</author>
<author>
<name sortKey="Thijs, G" uniqKey="Thijs G">G Thijs</name>
</author>
<author>
<name sortKey="Dabrowski, M" uniqKey="Dabrowski M">M Dabrowski</name>
</author>
<author>
<name sortKey="Moreau, Y" uniqKey="Moreau Y">Y Moreau</name>
</author>
<author>
<name sortKey="De Moor, B" uniqKey="De Moor B">B De Moor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Efron, B" uniqKey="Efron B">B Efron</name>
</author>
<author>
<name sortKey="Hastie, T" uniqKey="Hastie T">T Hastie</name>
</author>
<author>
<name sortKey="Johnstone, I" uniqKey="Johnstone I">I Johnstone</name>
</author>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thastrom, A" uniqKey="Thastrom A">A Thastrom</name>
</author>
<author>
<name sortKey="Bingham, Lm" uniqKey="Bingham L">LM Bingham</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">20028554</article-id>
<article-id pub-id-type="pmc">2808325</article-id>
<article-id pub-id-type="publisher-id">1471-2105-10-442</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-10-442</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>G+C content dominates intrinsic nucleosome occupancy</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Tillo</surname>
<given-names>Desiree</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>desiree.tillo@utoronto.ca</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A2">
<name>
<surname>Hughes</surname>
<given-names>Timothy R</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>t.hughes@utoronto.ca</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada</aff>
<aff id="I2">
<label>2</label>
Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5S 3E1, Canada</aff>
<pub-date pub-type="collection">
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>22</day>
<month>12</month>
<year>2009</year>
</pub-date>
<volume>10</volume>
<fpage>442</fpage>
<lpage>442</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>6</month>
<year>2009</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>12</month>
<year>2009</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2009 Tillo and Hughes; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2009</copyright-year>
<copyright-holder>Tillo and Hughes; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/10/442"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome
<italic>in vitro</italic>
. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.</p>
</sec>
<sec>
<title>Results</title>
<p>We find that a simple linear combination of only 14 simple DNA sequence attributes (G+C content, two transformations of dinucleotide composition, and the frequency of eleven 4-bp sequences) explains nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo </italic>
in a manner comparable to the Kaplan model. G+C content and frequency of AAAA are the most important features. G+C content is dominant, alone explaining ~50% of the variation in nucleosome occupancy
<italic>in vitro</italic>
.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our findings provide a dramatically simplified means to predict and understand intrinsic nucleosome occupancy. G+C content may dominate because it both reduces frequency of poly-A-like stretches and correlates with many other DNA structural characteristics. Since G+C content is enriched or depleted at many types of features in diverse eukaryotic genomes, our results suggest that variation in nucleotide composition may have a widespread and direct influence on chromatin structure.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The genomes of eukaryotes are packaged into nucleosomes, each composed of approximately 147 base pairs of double-stranded DNA wrapped around an octamer of the highly conserved histone subunits [
<xref ref-type="bibr" rid="B1">1</xref>
]. Histones are the most abundant DNA binding proteins in the cell, and occupy ~80% of the yeast genome
<italic>in vivo</italic>
[
<xref ref-type="bibr" rid="B2">2</xref>
]. In the past few decades, it has become clear that the biological roles of nucleosomes extend far beyond simple DNA packaging, to include replication, DNA repair, recombination, and transcriptional regulation[
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
]. Active regulatory sequences are often depleted of nucleosomes[
<xref ref-type="bibr" rid="B5">5</xref>
-
<xref ref-type="bibr" rid="B7">7</xref>
], presumably because of steric hindrance between nucleosomes and most other DNA-binding proteins. The interplay between histones, DNA, and other DNA-binding proteins is therefore critical to the orchestration of transcription and other functions of the genome.</p>
<p>In
<italic>S. cerevisiae</italic>
, studies examining the relative incorporation of yeast genomic DNA into nucleosomes
<italic>in vitro </italic>
have demonstrated that nucleosome depletion at promoters is to a large extent programmed into the DNA sequence[
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B9">9</xref>
]. These experiments were conducted using chicken[
<xref ref-type="bibr" rid="B8">8</xref>
] or human[
<xref ref-type="bibr" rid="B9">9</xref>
] histones, which, when assembled onto yeast genomic DNA, adopted a configuration that closely resembles that of yeast nucleosomes
<italic>in vivo</italic>
. These results therefore also indicate that the sequence preferences of nucleosomes are likely to be broadly conserved across Eukarya.</p>
<p>To fully understand the function and evolution of gene regulation and genome packaging, it will be essential to understand the sequence preferences of nucleosomes. A variety of sequence cues have been shown to influence nucleosome sequence preference. These include nucleosome-positioning [
<xref ref-type="bibr" rid="B10">10</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
] and nucleosome-excluding [
<xref ref-type="bibr" rid="B12">12</xref>
-
<xref ref-type="bibr" rid="B15">15</xref>
] sequences, as well as many local structural features that describe the overall deformability, curvature and flexibility of double-stranded DNA [
<xref ref-type="bibr" rid="B16">16</xref>
-
<xref ref-type="bibr" rid="B19">19</xref>
] that could affect nucleosome occupancy and arrangement at particular sites in the genome. Methods to predict nucleosome positioning and occupancy from sequence have often relied on periodic dinucleotide patterns found in collections of nucleosomal sequences from both
<italic>in vivo </italic>
and
<italic>in vitro </italic>
experiments[
<xref ref-type="bibr" rid="B20">20</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
] and these patterns can explain a fraction of nucleosome positions
<italic>in vivo</italic>
[
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
]. However, analyses of sequences highly enriched in nucleosome-occupied and nucleosome-depleted regions in genome-scale and genome-wide data sets have highlighted the importance of nucleosome-excluding sequences, in particular poly-dA/dT tracts[
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
-
<xref ref-type="bibr" rid="B27">27</xref>
], and incorporation of these features into models of nucleosome occupancy has markedly improved prediction accuracy [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
-
<xref ref-type="bibr" rid="B26">26</xref>
]. Some of these studies have also noted that the observed nucleosome occupancy
<italic>in vivo </italic>
correlates with and can be predicted by base composition (G+C content)[
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B25">25</xref>
,
<xref ref-type="bibr" rid="B28">28</xref>
] and other structural features of DNA [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
], many of which, on their own, correlate with base composition. However, these observations were based on
<italic>in vivo </italic>
nucleosome occupancy, and did not directly demonstrate intrinsic nucleosome sequence preference.</p>
<p>Kaplan et al.[
<xref ref-type="bibr" rid="B8">8</xref>
] showed recently that a probabilistic model (hereafter referred to as the "Kaplan model") using the composition of all 5-mers within a 147-base tiling window accurately predicts nucleosome occupancy across an entire genome
<italic>in vitro</italic>
. The Kaplan model should inherently capture the effects of both base composition and aspects of large-scale structural properties which are thought to depend primarily on dinucleotide content[
<xref ref-type="bibr" rid="B19">19</xref>
]. However, the relative contributions of individual sequence features and properties are not readily apparent from the Kaplan model, which contains 2,294 parameters. To our knowledge, no systematic assessment exists of the impact of individual nucleosome excluding/attracting sequences on intrinsic nucleosome preference on a genomic scale, nor has there been an examination of which features are redundant or dispensable in a combined model.</p>
<p>Here we used Lasso[
<xref ref-type="bibr" rid="B30">30</xref>
], an L1-regularized linear regression algorithm, to derive a greatly simplified model for intrinsic nucleosome sequence preference. We used Lasso because: (1) model generation is fast for large data sets (compared to other machine-learning approaches, such as SVM); (2) Lasso performs subset selection, such that, given a set of highly correlated features, it weights those that have the greatest impact and sets the other feature weights to 0, thereby reducing the number of features in the final model; and (3) the end result is a simple linear equation containing an easily interpreted weight for each feature. In our analysis, we obtained very similar models regardless of training/test divisions of the yeast genome, and we selected for further analysis one model that contains only 14 features and has predictive capacity nearly identical to the Kaplan model. Although the 14 feature model is trained on the Kaplan
<italic>in vitro </italic>
data, it performs comparably to or better than the best previous models on
<italic>in vivo </italic>
data in both yeast and
<italic>C. elegans</italic>
. The 14 feature model is heavily dependent on G+C and poly-A content, with G+C having the highest independent correlation with measured nucleosome occupancy. We suggest possible explanations and implications of the strong association between G+C content and intrinsic nucleosome occupancy.</p>
</sec>
<sec>
<title>Results and Discussion</title>
<p>We first performed a feature selection step to identify which sequence features known or believed to influence nucleosome occupancy or positioning correlate with or are strongly associated with the
<italic>in vitro </italic>
nucleosome data of Kaplan et al.[
<xref ref-type="bibr" rid="B8">8</xref>
].
<bold>Table S1 </bold>
(Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
) lists the 171 features tested and the results of the tests. The features included: (a) mononucleotide frequency (i.e. G+C content); (b) predicted DNA structural characteristics (each calculated from the dinucleotide content using a simple linear formula[
<xref ref-type="bibr" rid="B19">19</xref>
]); (c) nucleosome positioning and excluding sequences from the literature[
<xref ref-type="bibr" rid="B10">10</xref>
-
<xref ref-type="bibr" rid="B15">15</xref>
]; and (d) the frequency of 4-bp sequences over a 150-bp window. We used 4-mers instead of 5-mers (as in the Kaplan model) in order to limit the number of features, and to obtain inputs that correlate independently with nucleosome occupancy (since each 4-mer occurs more frequently than nucleosomes, on average). We identified 130 features that we deemed to be associated with
<italic>in vitro </italic>
nucleosome occupancy across the yeast genome (see
<bold>Methods</bold>
), including representatives of all categories (a-d) above (
<bold>Table S1 </bold>
[Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
]).</p>
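<p>The window-based features described above (G+C content and 4-mer frequency over a 150-bp window) can be illustrated with a minimal Python sketch. The function names, the counting of 4-mers on the given strand only, and the one-base step between windows are assumptions made for illustration; they are not taken from the authors' implementation.</p>

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers on the given strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def window_features(genome: str, window: int = 150, kmers=("AAAA",)):
    """Yield (start, G+C fraction, {4-mer: count}) for each tiling window.

    Assumption: windows advance one base at a time and only the listed
    4-mers are reported; a sequence shorter than `window` yields nothing.
    """
    for start in range(len(genome) - window + 1):
        segment = genome[start:start + window]
        counts = kmer_counts(segment, k=4)
        yield start, gc_fraction(segment), {m: counts[m.upper()] for m in kmers}
```

<p>Each yielded tuple corresponds to one candidate row of a feature matrix; any normalization applied before regression is not shown here.</p>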
<p>We then used Lasso to learn linear models that relate these 130 features to the Kaplan et al. data. We created eleven different models, using eleven different random samples of 1,000,000 genomic positions selected from subsets of the yeast genome as training data, each with 10-fold internal cross-validation (Lasso itself chooses the number of coefficients using a cross-validation procedure within the training data). In each case, Lasso assigned nonzero weights to a similar set of features (Figures
<xref ref-type="fig" rid="F1">1</xref>
and S1-3 [Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
]), each of which yielded a roughly comparable correlation to test data. This result indicates that the model chosen is not strongly dependent on the subset of the data used for training. From among the models, we chose the model trained on chromosomes 1-9 for further analysis, on the basis that (a) it was an arbitrary selection, being the first model sorted numerically, and (b) it has 14 features, which is the median number among the eleven runs. Hereafter, we refer to this model as the "14 feature model", the formula for which is given in the
<bold>Methods </bold>
section.</p>
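<p>To illustrate why Lasso drives the weights of weak or redundant features to exactly zero, the following is a minimal pure-Python sketch of L1-penalized least squares solved by cyclic coordinate descent. This is a generic textbook solver chosen for illustration, not necessarily the one used in this study; the function names, the absence of an intercept (inputs are assumed to be centered), and the fixed iteration count are assumptions. Selection of the penalty by cross-validation is not shown.</p>

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; return 0.0 if |value| does not exceed lam."""
    if value > lam:
        return value - lam
    if -value > lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam: float, n_iter: int = 200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    X is a list of rows (one list of feature values per sample) and y is a
    list of responses. Both are assumed centered, so no intercept is fitted.
    """
    n = len(X)
    p = len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # a constant-zero feature cannot receive a weight
            # Covariance of feature j with the residual after adding back
            # feature j's own current contribution.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w
```

<p>The soft-thresholding step is what produces exact zeros: a feature whose covariance with the current residual does not exceed the penalty receives a weight of 0 and drops out of the model.</p>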
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Model feature weights selected by Lasso for eleven different training data sets</bold>
. Chromosomes from which 1,000,000 random nucleotide positions were taken are given at bottom. Correlation coefficients are given in the middle, using a test set that does not include any of the random nucleotide positions used in the training set. The top panel is a zoom-in of the 16 features that were weighted in more than half of the eleven runs. Weights do not directly reflect importance or proportion of the data that a feature explains, because features are unit-normalized prior to analysis, and can have dissimilar distributions.</p>
</caption>
<graphic xlink:href="1471-2105-10-442-1"></graphic>
</fig>
<p>The 14 feature model explains a large majority of the variation in nucleosome occupancy over the yeast genome in the Kaplan et al.
<italic>in vitro </italic>
data[
<xref ref-type="bibr" rid="B8">8</xref>
] (R = 0.86 over the test set) (Figure
<xref ref-type="fig" rid="F2">2A, B</xref>
). This correlation is near the level of experimental reproducibility reported by Kaplan et al. (R = 0.92), and similar to that of the Kaplan model that learned 2,294 parameters (R = 0.89)[
<xref ref-type="bibr" rid="B8">8</xref>
]. We note that our models with substantially more than 14 features have correlations with the
<italic>in vitro </italic>
data as high as 0.88 (Figure
<xref ref-type="fig" rid="F1">1</xref>
and S1 [Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
]). The 14 feature model also correlates significantly with
<italic>in vivo </italic>
nucleosome occupancy in yeast (grown in glucose)[
<xref ref-type="bibr" rid="B8">8</xref>
] (Figure
<xref ref-type="fig" rid="F2">2C</xref>
) (R = 0.72, Spearman P &lt; 2.2 × 10
<sup>-308</sup>
). The Kaplan model has a correlation coefficient of 0.74 over the same test data. Thus, the 14 feature model encapsulates the vast majority of the information in the Kaplan model. Indeed, the correlation between the 14 feature model and the Kaplan model over the entire yeast genome is 0.88 (Figure
<xref ref-type="fig" rid="F2">2D</xref>
).</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Performance of a 14 feature linear model of intrinsic nucleosome sequence preference</bold>
. (
<bold>A</bold>
) Scatter plot vs. test set (yeast chromosomes 10-16), shown as a heat-map. Axis values are log
<sub>2 </sub>
normalized nucleosome occupancy (see
<bold>Methods</bold>
). (
<bold>B</bold>
) Model scores (probabilistic[
<xref ref-type="bibr" rid="B8">8</xref>
] and linear) and
<italic>in vivo </italic>
and
<italic>in vitro </italic>
nucleosome occupancy[
<xref ref-type="bibr" rid="B8">8</xref>
] within a 20 kb region of chromosome 14. (
<bold>C</bold>
) and (
<bold>D</bold>
) Correlation of the 14 feature model score with measured
<italic>in vivo </italic>
nucleosome occupancy in yeast (C) and with the Kaplan model across chr10-16 (test set) (D).</p>
</caption>
<graphic xlink:href="1471-2105-10-442-2"></graphic>
</fig>
<p>In order to further benchmark our model, we compared the performance of the 14 feature model with published models[
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
-
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
] on other
<italic>in vitro </italic>
and
<italic>in vivo </italic>
nucleosome occupancy data sets, using Pearson correlation between predicted and actual data. These results are summarized in Table
<xref ref-type="table" rid="T1">1</xref>
. In all cases, the 14 feature model has performance comparable to the Kaplan model and to another model (the Field model) from the same lab with a similar number of parameters as the Kaplan model[
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
]. Since the 14 feature model is trained on Illumina/Solexa sequencing data, which may have inherent biases[
<xref ref-type="bibr" rid="B33">33</xref>
], it is important to note that it also correlates well with an
<italic>in vivo </italic>
nucleosome organization from a tiling array study in yeast[
<xref ref-type="bibr" rid="B2">2</xref>
] and a sequencing-based study in
<italic>C. elegans </italic>
that was normalized using naked genomic DNA processed in the same fashion as the nucleosomal DNA[
<xref ref-type="bibr" rid="B34">34</xref>
], performing the best out of all models tested on the latter data set. Thus, our model is comparable to the Kaplan model on multiple data sets, including those generated
<italic>in vivo</italic>
, using other methods, and/or in an organism distantly related to yeast.</p>
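<p>The benchmarking metric used here, Pearson correlation between predicted and measured occupancy, can be sketched as follows. The function name is an assumption for illustration. Returning NaN when either input is constant is one plausible convention for the case in which a model assigns the same score to every sequence, where the correlation is undefined.</p>

```python
import math


def pearson_r(predicted, measured) -> float:
    """Pearson correlation between two equal-length numeric sequences.

    Returns NaN when either input has zero variance, since the
    correlation is then undefined.
    """
    if len(predicted) != len(measured) or 2 > len(predicted):
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    s_pm = sum((a - mean_p) * (b - mean_m) for a, b in zip(predicted, measured))
    s_pp = sum((a - mean_p) ** 2 for a in predicted)
    s_mm = sum((b - mean_m) ** 2 for b in measured)
    if s_pp == 0 or s_mm == 0:
        return float("nan")
    return s_pm / math.sqrt(s_pp * s_mm)
```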
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Comparison of nucleosome occupancy prediction models on different data sets</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Summary</th>
<th align="center" colspan="6">Performance (Pearson R)</th>
<th align="center">Correlation with %G+C (Yeast, 150 bp windows)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td align="center">
<bold>Synthetic oligonucleotides (Microarray) </bold>
[
<xref ref-type="bibr" rid="B8">8</xref>
]</td>
<td align="center">
<bold>Synthetic oligonucleotides (Sequencing) </bold>
[
<xref ref-type="bibr" rid="B8">8</xref>
]</td>
<td align="center">
<bold>Yeast
<italic>in vitro </italic>
</bold>
[
<xref ref-type="bibr" rid="B8">8</xref>
]</td>
<td align="center">
<bold>Yeast
<italic>in vivo </italic>
</bold>
[
<xref ref-type="bibr" rid="B2">2</xref>
]</td>
<td align="center">
<bold>
<italic>C. elegans </italic>
adjusted nucleosome coverage </bold>
[
<xref ref-type="bibr" rid="B34">34</xref>
]</td>
<td align="center">
<bold>
<italic>C. elegans </italic>
normalized occupancy </bold>
[
<xref ref-type="bibr" rid="B34">34</xref>
]</td>
<td></td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Kaplan et al., 2009[
<xref ref-type="bibr" rid="B8">8</xref>
]</td>
<td align="left">Probabilistic model based on
<italic>in vitro </italic>
5-mer preferences and periodic dinucleotide signal.</td>
<td align="center">
<bold>0.51*</bold>
</td>
<td align="center">
<bold>0.45*</bold>
</td>
<td align="center">
<bold>0.89*</bold>
</td>
<td align="center">
<bold>0.34</bold>
</td>
<td align="center">
<bold>0.47*</bold>
</td>
<td align="center">
<bold>0.61*</bold>
</td>
<td align="center">0.87</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Lasso model (this study)</td>
<td align="left">See
<bold>Methods</bold>
.</td>
<td align="center">
<bold>0.44</bold>
</td>
<td align="center">
<bold>0.41</bold>
</td>
<td align="center">
<bold>0.86*</bold>
</td>
<td align="center">
<bold>0.38*</bold>
</td>
<td align="center">
<bold>0.49*</bold>
</td>
<td align="center">
<bold>0.66*</bold>
</td>
<td align="center">0.85</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Field et al., 2008[
<xref ref-type="bibr" rid="B24">24</xref>
]</td>
<td align="left">Probabilistic model based on 5-mer preferences measured
<italic>in vivo </italic>
(yeast) and periodic dinucleotide signals.</td>
<td align="center">
<bold>0.47*</bold>
</td>
<td align="center">
<bold>0.45*</bold>
</td>
<td align="center">
<bold>0.74</bold>
</td>
<td align="center">
<bold>0.39*</bold>
</td>
<td align="center">
<bold>0.46*</bold>
</td>
<td align="center">
<bold>0.61*</bold>
</td>
<td align="center">0.64</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">%G+C</td>
<td align="left">The percentage of guanine and cytosine bases in a DNA sequence.</td>
<td align="center">0.53*</td>
<td align="center">0.49*</td>
<td align="center">0.78*</td>
<td align="center">0.25</td>
<td align="center">0.42</td>
<td align="center">0.47</td>
<td align="center">1</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Lasso model[
<xref ref-type="bibr" rid="B2">2</xref>
]</td>
<td align="left">Linear regression model trained on
<italic>in vivo </italic>
nucleosome occupancy data. Uses DNA structural parameters, excluding sequences and transcription factor binding sites (ABF1, REB1, and STB2) as inputs.</td>
<td align="center">0.23</td>
<td align="center">0.22</td>
<td align="center">
<bold>0.63</bold>
</td>
<td align="center">
<bold>0.45*</bold>
</td>
<td align="center">
<bold>0.38</bold>
</td>
<td align="center">
<bold>0.5</bold>
</td>
<td align="center">0.55</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Peckham et al., 2007[
<xref ref-type="bibr" rid="B25">25</xref>
]</td>
<td align="left">SVM classifier trained on overrepresented k-mers (k = 1-6) found in nucleosome-occupied and nucleosome-depleted sequences determined from
<italic>in vivo </italic>
yeast data.</td>
<td align="center">
<bold>0.43</bold>
</td>
<td align="center">
<bold>0.39</bold>
</td>
<td align="center">
<bold>0.48</bold>
</td>
<td align="center">0.22</td>
<td align="center">0.29</td>
<td align="center">0.33</td>
<td align="center">0.57</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Yuan and Liu, 2008[
<xref ref-type="bibr" rid="B26">26</xref>
]</td>
<td align="left">Computes predicted nucleosome occupancy based on periodic dinucleotide signals found in nucleosomal and linker DNA sequences determined from
<italic>in vitro </italic>
and
<italic>in vivo </italic>
experiments in yeast.</td>
<td align="center">0.02</td>
<td align="center">0.05</td>
<td align="center">0.35</td>
<td align="center">
<bold>0.27</bold>
</td>
<td align="center">
<bold>0.36</bold>
</td>
<td align="center">
<bold>0.48</bold>
</td>
<td align="center">0.30</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Miele et al., 2008[
<xref ref-type="bibr" rid="B29">29</xref>
]</td>
<td align="left">Computes free energy landscape of nucleosome formation using an estimation of dinucleotide-dependent DNA flexibility and intrinsic curvature.</td>
<td align="center">
<bold>0.32</bold>
</td>
<td align="center">
<bold>0.26</bold>
</td>
<td align="center">0.38</td>
<td align="center">0.22</td>
<td align="center">0.21</td>
<td align="center">0.25</td>
<td align="center">0.49</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Segal et al., 2006[
<xref ref-type="bibr" rid="B23">23</xref>
]
<break></break>
Downloaded January 2007</td>
<td align="left">Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from
<italic>in vitro </italic>
selection experiments.</td>
<td align="center">NaN</td>
<td align="center">NaN</td>
<td align="center">0.05</td>
<td align="center">0.09</td>
<td align="center">0.05</td>
<td align="center">0.05</td>
<td align="center">0.07</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Ioshikhes et al., 2006[
<xref ref-type="bibr" rid="B22">22</xref>
]</td>
<td align="left">Computes the correlation of periodic AA/TT dinucleotide motifs in a given sequence with those found in a set of 204 eukaryotic and viral nucleosomal sequences determined through
<italic>in vivo </italic>
and
<italic>in vitro </italic>
experiments[
<xref ref-type="bibr" rid="B20">20</xref>
].</td>
<td align="center">-0.03</td>
<td align="center">-0.03</td>
<td align="center">0.01</td>
<td align="center">0.07</td>
<td align="center">-0.03</td>
<td align="center">-0.01</td>
<td align="center">0.01</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Tolstorukov et al., 2007, 2008[
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
]</td>
<td align="left">Estimates the dinucleotide-dependent cost of deformation caused by threading a given sequence on a template comprising the path of DNA found on the experimentally determined structure of the nucleosome core particle.</td>
<td align="center">0.01</td>
<td align="center">0.004</td>
<td align="center">0</td>
<td align="center">-0.001</td>
<td align="center">-0.001</td>
<td align="center">-0.001</td>
<td align="center">-0.0003</td>
</tr>
<tr>
<td colspan="9">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Segal et al., 2006[
<xref ref-type="bibr" rid="B23">23</xref>
]
<break></break>
Downloaded August 2009</td>
<td align="left">Probabilistic model trained on yeast data, using a position specific scoring matrix derived from a collection of nucleosome-bound sequences obtained from
<italic>in vitro </italic>
selection experiments.</td>
<td align="center">NaN</td>
<td align="center">NaN</td>
<td align="center">-0.2</td>
<td align="center">0.001</td>
<td align="center">-0.06</td>
<td align="center">-0.05</td>
<td align="center">-0.21</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Pearson correlation is shown as a performance metric. Nucleosome occupancy was predicted in yeast using only sequence from the test set (chr10-16) and chromosome III in
<italic>C. elegans</italic>
. "NaN" indicates that a score of "0" was obtained for each sequence (since this model[
<xref ref-type="bibr" rid="B23">23</xref>
] requires the sequence be > 150 bp in length). Models are sorted by their average rank in performance. Asterisks (*) and text in
<bold>bold </bold>
denote the top three and top 50% performing models for each data set, respectively.</p>
</table-wrap-foot>
</table-wrap>
<p>The results from this comparison also confirm that models that combine aperiodic signals perform much better at predicting nucleosome occupancy than models based primarily on periodic dinucleotide signals[
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
]. The one exception is the model of Yuan and Liu[
<xref ref-type="bibr" rid="B26">26</xref>
], which is based on periodic dinucleotide signals in nucleosomal and linker sequences identified using wavelet analysis. We note, however, that the dinucleotide features with most predictive power and the highest regression coefficients in the Yuan and Liu model have frequencies at the single base scale (i.e. have a length scale of 1)[
<xref ref-type="bibr" rid="B26">26</xref>
], suggesting that aperiodic dinucleotide composition is, perhaps unintentionally, a major component.</p>
<p>The most critical features in the 14 feature model are G+C content and frequency of AAAA, on the basis of two criteria. First, these two features correlate highly with nucleosome occupancy
<italic>in vitro </italic>
(R = 0.71 and 0.63, respectively), independently of all other features (Figure
<xref ref-type="fig" rid="F3">3A-C</xref>
). Second, a procedure in which we iteratively removed the least critical feature(s) of the model (i.e. those with the least influence on the basis of re-trained model performance after their removal) resulted in AAAA and G+C being the last two components removed (data not shown). A two-feature linear model (trained on G+C and AAAA) retained a correlation on test data of 0.72, only a marginal improvement over G+C alone (Figure
<xref ref-type="fig" rid="F3">3D</xref>
). From this analysis, we conclude that G+C content independently accounts for approximately half of the variation in intrinsic nucleosome occupancy (R
<sup>2 </sup>
= 0.71
<sup>2 </sup>
= 0.50). We note that the Kaplan model weights for individual 5-mers also scale highly with G+C content (R = 0.78, Spearman P = 3.33 × 10
<sup>-284</sup>
; data not shown) and that the scores assigned by the Kaplan model to 147-base windows across the yeast genome correlate highly with G+C content (R = 0.87, Figure
<xref ref-type="fig" rid="F3">3E</xref>
). Table
<xref ref-type="table" rid="T1">1</xref>
shows that other models that correlate highly with G+C content (Table
<xref ref-type="table" rid="T1">1</xref>
, last column) perform well at predicting nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo</italic>
, and that G+C content itself is a good predictor in all data sets (Table
<xref ref-type="table" rid="T1">1</xref>
): in all data sets examined, %G+C had a higher correlation than the majority of published models tested at predicting nucleosome occupancy.</p>
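<p>As a brief worked check of the arithmetic above, assuming the usual identity that the fraction of variance explained by a linear fit equals the squared Pearson correlation between predicted and observed values, the single-feature and two-feature correlations reported in the text give:</p>

```latex
R^{2}_{\mathrm{G+C}} = (0.71)^{2} = 0.5041 \approx 0.50,
\qquad
R^{2}_{\mathrm{G+C,\,AAAA}} = (0.72)^{2} = 0.5184 \approx 0.52
```

<p>On this reading, adding AAAA to G+C content raises the explained variance on the test data by roughly 0.014.</p>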
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Correlation of each of the 14 features with nucleosome occupancy</bold>
. (
<bold>A</bold>
) Graphic illustration of the correlation of each of the 14 sequence features with nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo </italic>
across the yeast genome (data from Kaplan et al.[
<xref ref-type="bibr" rid="B8">8</xref>
]). (
<bold>B</bold>
-
<bold>D</bold>
) Scatter plots showing performance of linear models on test set using only G+C content (B), AAAA occurrence (C), or both (D) as inputs. (
<bold>E</bold>
) Kaplan model score vs. proportion of G+C over all 150 bp tiling windows in the yeast genome.</p>
</caption>
<graphic xlink:href="1471-2105-10-442-3"></graphic>
</fig>
<p>We next sought to understand why these 14 features are repeatedly retained in linear models (Figure
<xref ref-type="fig" rid="F1">1</xref>
). Manual inspection of the components of the 14 feature model suggests a small number of overarching themes. All 11 of the 4-mers are A/T rich (eight are entirely A/T), and models of DNA structure suggest that they should retain some of the structural character of poly-A sequences (data not shown). Poly-A stretches are believed to exclude nucleosomes because they are both rigid and bent, making them less compatible with the extreme bending required for nucleosome formation, regardless of their local sequence context[
<xref ref-type="bibr" rid="B14">14</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
,
<xref ref-type="bibr" rid="B35">35</xref>
]. Sequences high in G+C will tend to lack these (and related) sequences, which may partly explain why G+C content has high overall predictive value; however, it is possible for sequences to be both G+C rich and contain small nucleosome excluding sequences, which would negatively impact nucleosome formation, explaining why a variety of poly-A-like 4-mers are retained in the model.</p>
<p>The importance of G+C may also be explained by the fact that this single parameter affects virtually all aspects of DNA structure. Indeed, the two overall DNA structural properties selected (propeller twist, which describes angular displacement of bases in a pair relative to each other, and slide, which describes lateral translation of base pairs relative to each other), both correlate well with G+C content when calculated as an average over a tiling window using dinucleotide tables[
<xref ref-type="bibr" rid="B19">19</xref>
] (data not shown). These and the majority of other DNA structural properties also correlate either positively or negatively with both G+C content and nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo </italic>
(Figure
<xref ref-type="fig" rid="F4">4</xref>
and data not shown). Thus, the 14 feature model is also likely to be dominated by G+C because this parameter influences a large number of structural attributes of DNA, perhaps most critically propeller twist and slide, which may also be sufficiently important that their deviations from simple G+C content cause them to be retained in the Lasso regression. There is prior evidence for the importance of one of these features in nucleosome formation: Poly-A and related sequences are rigid and bent precisely because they are high in propeller twist, resulting in a continuous network of bifurcated hydrogen bonds[
<xref ref-type="bibr" rid="B36">36</xref>
].</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Correlation of DNA structural parameters, calculated as the average over a 150-base window, with nucleosome occupancy
<italic>in vitro </italic>
and
<italic>in vivo</italic>
</bold>
. Calculations were made using dinucleotide and other coefficients obtained from the PROPERTY database
<ext-link ext-link-type="uri" xlink:href="http://srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-newId+-lib+PROPERTY">http://srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-newId+-lib+PROPERTY</ext-link>
. Nucleosome occupancy data are from Kaplan et al.[
<xref ref-type="bibr" rid="B8">8</xref>
] and Lee et al.[
<xref ref-type="bibr" rid="B2">2</xref>
]. Pearson correlation is shown.</p>
</caption>
<graphic xlink:href="1471-2105-10-442-4"></graphic>
</fig>
<p>To gain more direct evidence for separability between G+C content and poly-A sequences as determinants of intrinsic nucleosome occupancy, we examined G+C content and poly-A sequences in an independent data set in the Kaplan et al. paper, in which nucleosomes were assembled with synthetic 150-mer sequences designed to have a broader range of unusual sequence attributes than are present in the yeast genome. Because the synthetic 150-mer nucleosome occupancy data were described by Kaplan et al. as noisier than the yeast genomic DNA occupancy data[
<xref ref-type="bibr" rid="B8">8</xref>
], due to two rounds of PCR required in the experiment, we first confirmed that the synthetic 150-mer data set displays the same global trends with respect to DNA structural parameters as does yeast genomic DNA, both
<italic>in vitro </italic>
and
<italic>in vivo </italic>
(Figure
<xref ref-type="fig" rid="F4">4</xref>
). We then asked whether G+C content and poly-A sequences act independently by examining the effect of one variable while holding the other within a narrow range. Figure
<xref ref-type="fig" rid="F5">5A</xref>
and
<xref ref-type="fig" rid="F5">5B</xref>
show that these parameters do act independently to a considerable degree; G+C has a major effect even if there are no poly-A tracts of length greater than three, and poly-A tracts have a clear effect even if placed in a 150-mer with neutral G+C content. We note that the behaviour at the extremes of G+C content in Figure
<xref ref-type="fig" rid="F5">5A</xref>
is inconsistent with the dependence of G+C shown in Figure
<xref ref-type="fig" rid="F3">3B</xref>
; however, there are very few data points at the extremes (Figure
<xref ref-type="fig" rid="F5">5A</xref>
). The
<italic>in vivo </italic>
relevance of these extremes may be very small: there are no nucleosome-sized sequence windows in yeast that are greater than 80% or less than 20% G+C, and the same is nearly true in much larger genomes (e.g. human; Figure
<xref ref-type="fig" rid="F5">5A</xref>
). Even human CpG islands are only 66% G+C on average. CpG-like sequences[
<xref ref-type="bibr" rid="B37">37</xref>
] among the ~27,000 oligonucleotides in this analysis[
<xref ref-type="bibr" rid="B8">8</xref>
] do have high intrinsic nucleosome occupancy overall, even if they contain poly-A sequence (Figure
<xref ref-type="fig" rid="F5">5C</xref>
).</p>
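<p>The idea of examining one variable while holding the other within a narrow range can be sketched as below: oligonucleotides are restricted to a neutral G+C band (45-55%) and then grouped by their longest poly-A run. The function names and the input layout (pairs of sequence and log2 occupancy ratio) are assumptions for illustration. The sketch counts runs of A on the given strand only; whether runs of T should also count is not specified in the text.</p>

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def max_run(seq: str, base: str = "A") -> int:
    """Length of the longest uninterrupted run of `base` in seq."""
    best = run = 0
    for ch in seq.upper():
        run = run + 1 if ch == base else 0
        best = max(best, run)
    return best


def mean_by_polya_length(oligos, low: float = 0.45, high: float = 0.55):
    """Mean log2 occupancy ratio per maximum poly-A length.

    `oligos` is an iterable of (sequence, log2_ratio) pairs; only
    sequences whose G+C fraction lies within [low, high] are kept.
    """
    groups = defaultdict(list)
    for seq, value in oligos:
        if high >= gc_fraction(seq) >= low:
            groups[max_run(seq, "A")].append(value)
    return {length: sum(vals) / len(vals) for length, vals in sorted(groups.items())}
```

<p>The complementary analysis (binning by G+C content among oligonucleotides with no poly-A run longer than three) follows the same pattern with the roles of the two variables exchanged.</p>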
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>Relative nucleosome preference of different subsets of synthetic 150-mers</bold>
. (
<bold>A</bold>
) and (
<bold>B</bold>
) Dependence of relative nucleosome preference (as log
<sub>2</sub>
(occupancy ratio)) on G+C content (A) and maximum poly-A length (B). Oligonucleotides categorized as "Neutral %G+C" in (B) are those with 45-55% G+C. Graph below shows the frequency of the selected attribute in the oligonucleotides analyzed, and also the human and yeast genomes. (
<bold>C</bold>
) Dependence of relative occupancy on poly-A content and CpG status. Poly-A containing oligonucleotides are defined as containing at least four consecutive adenine bases. CpG oligonucleotides are defined as having a G+C content ≥50%, with an observed/expected CpG ratio ≥0.6 (Obs/Exp CpG = Number of CpG * N/(num G * num C), where N = length of sequence[
<xref ref-type="bibr" rid="B37">37</xref>
]). The sequencing readout (rather than array readout) data from the Kaplan et al. paper were used in this analysis. On all box plots, whiskers indicate 10
<sup>th </sup>
and 90
<sup>th </sup>
percentiles.</p>
</caption>
<graphic xlink:href="1471-2105-10-442-5"></graphic>
</fig>
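<p>The CpG and poly-A definitions given in the caption of Figure 5 can be written out as a short Python sketch. The function names are assumptions for illustration; the thresholds (G+C content of at least 50%, observed/expected CpG ratio of at least 0.6, and at least four consecutive adenines) are those stated in the caption. Returning a ratio of 0 for sequences lacking C or G is a convention chosen here to avoid division by zero.</p>

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (number of CpG * N) / (num C * num G)."""
    seq = seq.upper()
    num_c = seq.count("C")
    num_g = seq.count("G")
    if num_c == 0 or num_g == 0:
        return 0.0  # convention for sequences in which no CpG is possible
    # "CG" cannot overlap itself, so str.count gives the full count.
    return seq.count("CG") * len(seq) / (num_c * num_g)


def is_cpg_like(seq: str, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """True if G+C fraction and Obs/Exp CpG both meet the thresholds."""
    if not seq:
        return False
    upper = seq.upper()
    gc = (upper.count("G") + upper.count("C")) / len(upper)
    return gc >= min_gc and cpg_obs_exp(upper) >= min_ratio


def has_poly_a(seq: str, min_len: int = 4) -> bool:
    """True if seq contains at least min_len consecutive adenines."""
    return "A" * min_len in seq.upper()
```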
<p>Our model confirms and extends previous indications that G+C content is a major determinant of nucleosome sequence preference, demonstrating the importance of G+C content for intrinsic nucleosome occupancy. We propose that G+C content represents a "summary feature" that both biases against poly-A-like tracts and encapsulates multiple DNA structural attributes. The 14 feature model we derive provides an extremely simple means to assess the intrinsic preference for nucleosomes to form on a given segment of DNA. Moreover, it can be used to evaluate
<italic>why </italic>
the segment has an intrinsic preference, in comparison to other sequences; the expected distribution of values for all of the model features in random sequence or across a genome is easily determined. We note that the 14 feature model does not contain any periodic component; Kaplan et al. also found that periodic signal added little to the probabilistic model[
<xref ref-type="bibr" rid="B8">8</xref>
]. We previously proposed that the predominant role of this signal may be to reinforce local translational or rotational settings[
<xref ref-type="bibr" rid="B2">2</xref>
], and we emphasize that our 14 feature model does not explicitly predict either nucleosome positioning or translational settings, nor does it account for steric effects. Nonetheless, the model scores closely mirror actual
<italic>in vitro </italic>
occupancy data obtained for the entire yeast genome, and also have strong correlations to
<italic>in vivo </italic>
nucleosome occupancy in yeast and
<italic>C. elegans </italic>
as shown in Figure
<xref ref-type="fig" rid="F2">2</xref>
and Table
<xref ref-type="table" rid="T1">1</xref>
, with correlations similar to or stronger than those of any previous model or algorithm, and much higher than those of most previous approaches, particularly those that rely solely on periodic signals.</p>
<p>Finally, we note that the role of G+C content as a major determinant of nucleosome occupancy has important implications for genome organisation. Our analysis indicates that, in yeast, simple nucleotide composition plays a direct role in nucleosome exclusion, and presumably in the demarcation of promoters. Local biases in nucleotide composition have been reported in other eukaryotes, including CpG islands[
<xref ref-type="bibr" rid="B37">37</xref>
], isochores[
<xref ref-type="bibr" rid="B38">38</xref>
], and transcription start sites[
<xref ref-type="bibr" rid="B39">39</xref>
]. It will be of interest to examine how variation in base content impacts nucleosome occupancy and chromatin structure in other genomes, whether there are functional consequences, and how the intrinsic nucleosome formation signals interact with overlapping regulatory signals in the genome.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have constructed a simple predictive model of intrinsic nucleosome occupancy in which base composition (G+C content) is a major component. G+C content may be a dominant feature because it correlates with many structural properties of DNA, and also reduces the frequency of poly-A-like stretches. Since local variations in G+C content occur at many types of features in diverse eukaryotic genomes, our findings suggest that nucleotide composition may have a widespread and direct influence on chromatin structure.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Data sets</title>
<p>We converted the average nucleosome occupancy measurements from yeast (
<italic>in vitro </italic>
and
<italic>in vivo</italic>
)[
<xref ref-type="bibr" rid="B8">8</xref>
] to log
<sub>2 </sub>
scale. We also used the
<italic>in vivo </italic>
nucleosome occupancy measurements from a tiling array study in yeast[
<xref ref-type="bibr" rid="B2">2</xref>
], and measurements from an
<italic>in vivo </italic>
map of nucleosome occupancy in
<italic>C. elegans</italic>
[
<xref ref-type="bibr" rid="B34">34</xref>
] using both the "adjusted nucleosome occupancy" values (in which nucleosomal DNA was normalized with respect to micrococcal-nuclease treated genomic DNA), and raw nucleosome coverage, applying the same normalization method found in[
<xref ref-type="bibr" rid="B8">8</xref>
]. For this, we calculate a "normalized nucleosome occupancy" measure for each base pair by taking the log
<sub>2 </sub>
ratio between the base pair's total occupancy and the genome-wide mean occupancy. We then set the genomic average to zero by subtracting the new genome-wide mean from the value at each base pair.</p>
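<p>The normalization described above can be sketched as follows. The function name and the pseudocount are assumptions introduced here for illustration (the pseudocount only guards against taking the logarithm of zero coverage); the original normalization is described in [8] and may handle zero coverage differently.</p>
<preformat><![CDATA[
```python
import math


def normalize_occupancy(coverage, pseudocount=1.0):
    """Log2 ratio of per-base occupancy to the genome-wide mean, re-centred at zero.

    coverage: per-base-pair occupancy values for the whole genome.
    pseudocount: hypothetical offset so that zero coverage has a finite log.
    """
    shifted = [c + pseudocount for c in coverage]
    mean_cov = sum(shifted) / len(shifted)
    # Step 1: log2 ratio between each base pair and the genomic mean.
    log_ratio = [math.log2(c / mean_cov) for c in shifted]
    # Step 2: subtract the new genome-wide mean so the average is zero.
    new_mean = sum(log_ratio) / len(log_ratio)
    return [v - new_mean for v in log_ratio]
```
]]></preformat>
<p>Because the second step subtracts the mean of the log ratios, the output always averages to zero regardless of the pseudocount chosen.</p>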
</sec>
<sec>
<title>Derivation of linear model</title>
<p>We downloaded a MATLAB version of the Lasso algorithm[
<xref ref-type="bibr" rid="B30">30</xref>
,
<xref ref-type="bibr" rid="B40">40</xref>
]. Given a set of predictors (e.g. sequence features), and an outcome measurement (e.g. log
<sub>2 </sub>
<italic>in vitro </italic>
nucleosome occupancy data), Lasso generates a linear model ŷ = β<sub>1</sub>x<sub>1</sub> + β<sub>2</sub>x<sub>2</sub> + ... + β<sub>n</sub>x<sub>n</sub>, where the output ŷ is the nucleosome occupancy prediction for a given base position, and β<sub>1..n</sub> are the weights for the features (x<sub>1..n</sub>) calculated at that position. The Lasso algorithm imposes a constraint on the sum of the absolute values of the weights, such that only the most important features are given non-zero weights. Input features are listed in Table
<xref ref-type="table" rid="T1">1</xref>
and were selected following[
<xref ref-type="bibr" rid="B2">2</xref>
] (but excluding transcription factor binding sequences, which are not relevant to intrinsic nucleosome sequence preferences). Briefly, for each base, we calculated the average of each structural and base composition feature in a 75-base window centered on this base; here, a 75-base window was used because it approximates the number of central basepairs (67-71 bp) bound by the histone-fold domains of the H3
<sub>2</sub>
H4
<sub>2 </sub>
tetramer of the histone octamer[
<xref ref-type="bibr" rid="B1">1</xref>
], which, in turn, dominates the free energy of histone-DNA interactions
<italic>in vitro</italic>
[
<xref ref-type="bibr" rid="B41">41</xref>
]. The frequency of sequence motifs (4-mer copy number/frequency, poly-dA/dT tract length, and nucleosome positioning and excluding sequence occurrence) was scored on both strands in 150-base windows (75 bp on the left, 74 on the right) centered on this base, because we anticipated that specific sequences would be nucleosome-excluding, and would have such an activity over the full length of the nucleosomal DNA.</p>
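<p>The two window sizes described above can be illustrated with a short Python sketch. It is not the original implementation: the helper names are hypothetical, G+C content is expressed as a fraction, motif occurrences are counted with overlaps, and positions too close to a sequence end simply return None. All of these are assumptions.</p>
<preformat><![CDATA[
```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def window(seq, center, left, right):
    """Bases from center-left to center+right inclusive, or None near the ends."""
    start, end = center - left, center + right + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def count_overlapping(seq, motif):
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def gc_content(seq, center):
    """G+C fraction in the 75-base window centred on `center` (37 + 1 + 37)."""
    w = window(seq, center, 37, 37)
    if w is None:
        return None
    return (w.count("G") + w.count("C")) / len(w)


def motif_count_both_strands(seq, center, motif):
    """Motif copies on both strands of the 150-base window (75 left, 74 right).

    Note: a palindromic motif such as ATAT is counted once per strand.
    """
    w = window(seq, center, 75, 74)
    if w is None:
        return None
    return count_overlapping(w, motif) + count_overlapping(reverse_complement(w), motif)
```
]]></preformat>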
<p>For Lasso, we found that an initial reduction in feature space (to ~130 features) gave more stable results. We therefore selected input features as follows: for 4-mer frequency and nucleosome excluding/positioning motifs, we retained those with an AUROC (area under the receiver operating characteristic curve) ≤0.45 or > 0.54. To calculate the AUROC for each sequence motif, we first sorted each 150-base sequence by
<italic>in vitro </italic>
occupancy, and used the presence or absence of the sequence feature to define positive and negative instances. For the base composition and dinucleotide feature models, we calculated the Pearson correlation to the measured
<italic>in vitro </italic>
nucleosome occupancy, and retained those with an absolute correlation > 0.10. We then ran Lasso on the selected sequence features, training on 1,000,000 randomly selected data points from chromosomes 1-9 (or other sets of chromosomes as indicated), which had been standardized to have mean zero and unit variance (for mathematical reasons and numerical stability), and selected the optimal weights by means of 10-fold internal cross-validation. The 14 feature linear model is as follows (note that these values are different from those shown in Figure
<xref ref-type="fig" rid="F1">1</xref>
because we have compensated for the unit-normalization step that Lasso incorporates;
<bold>Figure S2-3 </bold>
(Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
) show the equivalent of Figure
<xref ref-type="fig" rid="F1">1</xref>
and S1 (Additional File
<xref ref-type="supplementary-material" rid="S1">1</xref>
) but with unit-normalization removed):</p>
<p>Intrinsic sequence preference = 1.67175 × G+C content + 0.145742 × propeller twist + 1.31928 × slide - 0.10549 × freqAAAA - 0.07628 × freqAAAT - 0.03006 × freqAAGT - 0.05055 × freqAATA - 0.02564 × freqAATT - 0.02154 × freqAGAA - 0.03949 × freqATAA - 0.02354 × freqATAT - 0.03214 × freqATTA - 0.03314 × freqGAAA - 0.0334 × freqTATA + 1.788022</p>
<p>Where 4-mer occurrence is calculated in 150 bp windows, and G+C content, propeller twist and slide are calculated in 75 bp windows as described above. Propeller twist and slide were calculated as averages over all dinucleotides from tables found in[
<xref ref-type="bibr" rid="B19">19</xref>
]:
<disp-formula>
<graphic xlink:href="1471-2105-10-442-i1.gif"></graphic>
</disp-formula>
</p>
<p>Where
<italic>S(i) </italic>
is the structural feature (propeller twist, slide) score for the dinucleotide at position
<italic>i</italic>
.</p>
<p>For propeller twist and slide, the full equations are (from the PROPERTY database[
<xref ref-type="bibr" rid="B19">19</xref>
]
<ext-link ext-link-type="uri" xlink:href="http://srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-newId+-lib+PROPERTY">http://srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-newId+-lib+PROPERTY</ext-link>
):</p>
<p>Average propeller twist = (-17.3 × freqAA - 6.7 × freqAC - 14.3 × freqAG - 16.9 × freqAT - 8.6 × freqCA - 12.8 × freqCC - 11.2 × freqCG - 14.3 × freqCT - 15.1 × freqGA - 11.7 × freqGC - 12.8 × freqGG - 6.7 × freqGT - 11.1 × freqTA - 15.1 × freqTC - 8.6 × freqTG - 17.3 × freqTT)/75</p>
<p>Average slide = (-0.03 × freqAA - 0.13 × freqAC + 0.47 × freqAG - 0.37 × freqAT + 1.46 × freqCA + 0.6 × freqCC + 0.63 × freqCG + 0.47 × freqCT - 0.07 × freqGA + 0.29 × freqGC + 0.6 × freqGG - 0.13 × freqGT + 0.74 × freqTA - 0.07 × freqTC + 1.46 × freqTG - 0.03 × freqTT)/75</p>
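<p>Taken together, the equations above can be evaluated for a single 150-base window as in the following Python sketch. The coefficients and dinucleotide tables are copied from the text; everything else is an assumption made for illustration, in particular the function names, the use of a G+C fraction (rather than a count) in the central 75 bases, and overlapping 4-mer counts on both strands of the 150-base window.</p>
<preformat><![CDATA[
```python
PROPELLER_TWIST = {
    "AA": -17.3, "AC": -6.7, "AG": -14.3, "AT": -16.9,
    "CA": -8.6, "CC": -12.8, "CG": -11.2, "CT": -14.3,
    "GA": -15.1, "GC": -11.7, "GG": -12.8, "GT": -6.7,
    "TA": -11.1, "TC": -15.1, "TG": -8.6, "TT": -17.3,
}
SLIDE = {
    "AA": -0.03, "AC": -0.13, "AG": 0.47, "AT": -0.37,
    "CA": 1.46, "CC": 0.6, "CG": 0.63, "CT": 0.47,
    "GA": -0.07, "GC": 0.29, "GG": 0.6, "GT": -0.13,
    "TA": 0.74, "TC": -0.07, "TG": 1.46, "TT": -0.03,
}
WEIGHTS_4MER = {
    "AAAA": -0.10549, "AAAT": -0.07628, "AAGT": -0.03006, "AATA": -0.05055,
    "AATT": -0.02564, "AGAA": -0.02154, "ATAA": -0.03949, "ATAT": -0.02354,
    "ATTA": -0.03214, "GAAA": -0.03314, "TATA": -0.0334,
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_both_strands(seq, motif):
    """Overlapping motif copies on the given strand plus its reverse complement."""
    strands = (seq, seq.translate(_COMPLEMENT)[::-1])
    last = len(seq) - len(motif) + 1
    return sum(s.startswith(motif, i) for s in strands for i in range(last))


def average_structure(core, table):
    """Sum of dinucleotide scores over the 75-base core, divided by 75 as in the text."""
    return sum(table[core[i:i + 2]] for i in range(len(core) - 1)) / 75


def intrinsic_preference(seq150):
    """Score one 150-base window (A, C, G, T only) with the 14 feature model."""
    seq150 = seq150.upper()
    if len(seq150) != 150:
        raise ValueError("expected a 150-base window")
    core = seq150[38:113]  # 75 bases centred on index 75 (75 left, 74 right)
    gc = (core.count("G") + core.count("C")) / len(core)  # assumed to be a fraction
    score = (1.67175 * gc
             + 0.145742 * average_structure(core, PROPELLER_TWIST)
             + 1.31928 * average_structure(core, SLIDE)
             + 1.788022)
    for motif, weight in WEIGHTS_4MER.items():
        score += weight * count_both_strands(seq150, motif)
    return score
```
]]></preformat>
<p>Under these assumptions a G+C-rich window scores higher than an A-rich one, in line with the signs of the weights; the absolute scale should be checked against the published predictions before any reuse.</p>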
<p>We predicted nucleosome occupancy in yeast and
<italic>C. elegans </italic>
genomes using the model scored on 150-base windows surrounding each data point in both
<italic>in vitro </italic>
and
<italic>in vivo </italic>
nucleosome maps [
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
] at 1 bp intervals.</p>
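<p>The feature pre-selection step described earlier in this section (AUROC for motif features, Pearson correlation for composition features) can be sketched as below. This is an illustrative reading, not the original MATLAB code: the function names are hypothetical, and the AUROC is computed as the rank-based (Mann-Whitney) probability that a window containing the motif has higher occupancy than a window lacking it.</p>
<preformat><![CDATA[
```python
import math


def auroc(occupancy, has_motif):
    """Rank-based AUROC; tied occupancy values receive their average rank."""
    order = sorted(range(len(occupancy)), key=lambda i: occupancy[i])
    ranks = [0.0] * len(occupancy)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and occupancy[order[j + 1]] == occupancy[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for flag in has_motif if flag)
    n_neg = len(has_motif) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")
    rank_sum = sum(r for r, flag in zip(ranks, has_motif) if flag)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def keep_motif_feature(auc, low=0.45, high=0.54):
    """Retain motifs that are clearly depleted or enriched in occupied windows."""
    return auc <= low or auc > high


def keep_composition_feature(r, cutoff=0.10):
    """Retain composition features with absolute correlation above the cutoff."""
    return abs(r) > cutoff
```
]]></preformat>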
</sec>
<sec>
<title>Comparison of nucleosome occupancy prediction models</title>
<p>We obtained nucleosome prediction software from the authors' website
<ext-link ext-link-type="uri" xlink:href="http://genie.weizmann.ac.il/software/nucleo_exe.html">http://genie.weizmann.ac.il/software/nucleo_exe.html</ext-link>
[
<xref ref-type="bibr" rid="B8">8</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
], and used the P
<sub>occ </sub>
or "average occupancy" measure. For other models[
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B29">29</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
], we requested the code from the authors. An implementation of the nucleosome positioning sequence scoring metric[
<xref ref-type="bibr" rid="B22">22</xref>
] was obtained from Dr. G.C. Yuan. Scores for the model described in[
<xref ref-type="bibr" rid="B25">25</xref>
] on all sequence data sets tested were provided by Yair Field. For all models, with the exception of the Peckham SVM[
<xref ref-type="bibr" rid="B25">25</xref>
], we predicted nucleosome occupancy across the yeast test set used for the Lasso model derived in this study (chromosomes 10-16),
<italic>C. elegans </italic>
chrIII, and synthetic 150-mer oligonucleotides (both microarray and sequencing data sets)[
<xref ref-type="bibr" rid="B8">8</xref>
], using default parameters for all models. In the case of the Peckham SVM[
<xref ref-type="bibr" rid="B25">25</xref>
], which assigns a score to every 50 bp sequence, the score for a 150-base window was calculated by averaging the scores of all 50 bp sequences contained in that window, for all sequences analyzed.</p>
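<p>The averaging used for the 50 bp scores can be written as a short helper. The sketch assumes, hypothetically, that a score is available for the 50-mer starting at every base position; the name window_average and this indexing convention are not taken from the original software.</p>
<preformat><![CDATA[
```python
def window_average(scores_50bp, start, window=150, unit=50):
    """Mean score of all unit-length sequences lying fully inside the window.

    scores_50bp[i] is assumed to be the score of the 50-mer starting at base i.
    A 150-base window starting at `start` contains 101 such 50-mers.
    """
    last = start + window - unit
    if start < 0 or last >= len(scores_50bp):
        raise ValueError("window extends beyond the scored region")
    contained = scores_50bp[start:last + 1]
    return sum(contained) / len(contained)
```
]]></preformat>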
</sec>
</sec>
<sec>
<title>Authors' contributions</title>
<p>DT performed the analyses. TRH coordinated the study. TRH and DT contributed to the preparation of the manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>Supplementary data and figures</bold>
. Contains Table S1, and Figures S1-3.</p>
</caption>
<media xlink:href="1471-2105-10-442-S1.PDF" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>DT is supported by an Alexander Graham Bell Canada Graduate Scholarship. This work was partially supported by funding to TRH from the Canadian Institutes of Health Research, Genome Canada through the Ontario Genomics Institute and the Ontario Research Fund, the Howard Hughes Medical Institute, and the Canadian Institute for Advanced Research. We thank Guocheng Yuan, Michael Tolstorukov, Thierry Grange, and Yair Field for providing assistance with their nucleosome prediction software. We are grateful to Eran Segal, Jason Lieb, and Jon Widom for helpful conversations and critical evaluation of the manuscript.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Luger</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Mader</surname>
<given-names>AW</given-names>
</name>
<name>
<surname>Richmond</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Sargent</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Richmond</surname>
<given-names>TJ</given-names>
</name>
<article-title>Crystal structure of the nucleosome core particle at 2.8 Å resolution</article-title>
<source>Nature</source>
<year>1997</year>
<volume>389</volume>
<issue>6648</issue>
<fpage>251</fpage>
<lpage>260</lpage>
<pub-id pub-id-type="doi">10.1038/38444</pub-id>
<pub-id pub-id-type="pmid">9305837</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Tillo</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bray</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Morse</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>RW</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Nislow</surname>
<given-names>C</given-names>
</name>
<article-title>A high-resolution atlas of nucleosome occupancy in yeast</article-title>
<source>Nat Genet</source>
<year>2007</year>
<volume>39</volume>
<issue>10</issue>
<fpage>1235</fpage>
<lpage>1244</lpage>
<pub-id pub-id-type="doi">10.1038/ng2117</pub-id>
<pub-id pub-id-type="pmid">17873876</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Groth</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rocha</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Verreault</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Almouzni</surname>
<given-names>G</given-names>
</name>
<article-title>Chromatin challenges during DNA replication and repair</article-title>
<source>Cell</source>
<year>2007</year>
<volume>128</volume>
<issue>4</issue>
<fpage>721</fpage>
<lpage>733</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2007.01.030</pub-id>
<pub-id pub-id-type="pmid">17320509</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Carey</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Workman</surname>
<given-names>JL</given-names>
</name>
<article-title>The role of chromatin during transcription</article-title>
<source>Cell</source>
<year>2007</year>
<volume>128</volume>
<issue>4</issue>
<fpage>707</fpage>
<lpage>719</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2007.01.015</pub-id>
<pub-id pub-id-type="pmid">17320508</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Yuan</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>YJ</given-names>
</name>
<name>
<surname>Dion</surname>
<given-names>MF</given-names>
</name>
<name>
<surname>Slack</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>Altschuler</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Rando</surname>
<given-names>OJ</given-names>
</name>
<article-title>Genome-scale identification of nucleosome positions in S. cerevisiae</article-title>
<source>Science</source>
<year>2005</year>
<volume>309</volume>
<issue>5734</issue>
<fpage>626</fpage>
<lpage>630</lpage>
<pub-id pub-id-type="doi">10.1126/science.1112178</pub-id>
<pub-id pub-id-type="pmid">15961632</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>CK</given-names>
</name>
<name>
<surname>Shibata</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Rao</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Strahl</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>Lieb</surname>
<given-names>JD</given-names>
</name>
<article-title>Evidence for nucleosome depletion at active regulatory regions genome-wide</article-title>
<source>Nat Genet</source>
<year>2004</year>
<volume>36</volume>
<issue>8</issue>
<fpage>900</fpage>
<lpage>905</lpage>
<pub-id pub-id-type="doi">10.1038/ng1400</pub-id>
<pub-id pub-id-type="pmid">15247917</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Humphrey</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Perlstein</surname>
<given-names>EO</given-names>
</name>
<name>
<surname>Schreiber</surname>
<given-names>SL</given-names>
</name>
<article-title>Global nucleosome occupancy in yeast</article-title>
<source>Genome Biol</source>
<year>2004</year>
<volume>5</volume>
<issue>9</issue>
<fpage>R62</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2004-5-9-r62</pub-id>
<pub-id pub-id-type="pmid">15345046</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Kaplan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>IK</given-names>
</name>
<name>
<surname>Fondufe-Mittendorf</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Gossett</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Tillo</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>LeProust</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Hughes</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Lieb</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Widom</surname>
<given-names>J</given-names>
</name>
<article-title>The DNA-encoded nucleosome organization of a eukaryotic genome</article-title>
<source>Nature</source>
<year>2009</year>
<volume>458</volume>
<issue>7236</issue>
<fpage>362</fpage>
<lpage>366</lpage>
<pub-id pub-id-type="doi">10.1038/nature07667</pub-id>
<pub-id pub-id-type="pmid">19092803</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Sekinger</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Moqtaderi</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Struhl</surname>
<given-names>K</given-names>
</name>
<article-title>Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast</article-title>
<source>Mol Cell</source>
<year>2005</year>
<volume>18</volume>
<issue>6</issue>
<fpage>735</fpage>
<lpage>748</lpage>
<pub-id pub-id-type="doi">10.1016/j.molcel.2005.05.003</pub-id>
<pub-id pub-id-type="pmid">15949447</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Amirhaeri</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Griffith</surname>
<given-names>JD</given-names>
</name>
<article-title>Preferential nucleosome assembly at DNA triplet repeats from the myotonic dystrophy gene</article-title>
<source>Science</source>
<year>1994</year>
<volume>265</volume>
<issue>5172</issue>
<fpage>669</fpage>
<lpage>671</lpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Ozsolak</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Song</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>XS</given-names>
</name>
<name>
<surname>Fisher</surname>
<given-names>DE</given-names>
</name>
<article-title>High-throughput mapping of the chromatin structure of human promoters</article-title>
<source>Nat Biotechnol</source>
<year>2007</year>
<volume>25</volume>
<issue>2</issue>
<fpage>244</fpage>
<lpage>248</lpage>
<pub-id pub-id-type="doi">10.1038/nbt1279</pub-id>
<pub-id pub-id-type="pmid">17220878</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Cao</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Widlund</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Simonsson</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kubista</surname>
<given-names>M</given-names>
</name>
<article-title>TGGA repeats impair nucleosome formation</article-title>
<source>J Mol Biol</source>
<year>1998</year>
<volume>281</volume>
<issue>2</issue>
<fpage>253</fpage>
<lpage>260</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1998.1925</pub-id>
<pub-id pub-id-type="pmid">9698546</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Drew</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Travers</surname>
<given-names>AA</given-names>
</name>
<article-title>DNA bending and its relation to nucleosome positioning</article-title>
<source>J Mol Biol</source>
<year>1985</year>
<volume>186</volume>
<issue>4</issue>
<fpage>773</fpage>
<lpage>790</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(85)90396-1</pub-id>
<pub-id pub-id-type="pmid">3912515</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Suter</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Schnappauf</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Thoma</surname>
<given-names>F</given-names>
</name>
<article-title>Poly(dA.dT) sequences exist as rigid DNA structures in nucleosome-free yeast promoters in vivo</article-title>
<source>Nucleic Acids Res</source>
<year>2000</year>
<volume>28</volume>
<issue>21</issue>
<fpage>4083</fpage>
<lpage>4089</lpage>
<pub-id pub-id-type="doi">10.1093/nar/28.21.4083</pub-id>
<pub-id pub-id-type="pmid">11058103</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Gellibolian</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Shimizu</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Griffith</surname>
<given-names>J</given-names>
</name>
<article-title>Long CCG triplet repeat blocks exclude nucleosomes: a possible mechanism for the nature of fragile sites in chromosomes</article-title>
<source>J Mol Biol</source>
<year>1996</year>
<volume>263</volume>
<issue>4</issue>
<fpage>511</fpage>
<lpage>516</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1996.0593</pub-id>
<pub-id pub-id-type="pmid">8918933</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Calladine</surname>
<given-names>CR</given-names>
</name>
<name>
<surname>Drew</surname>
<given-names>HR</given-names>
</name>
<article-title>Principles of sequence-dependent flexure of DNA</article-title>
<source>J Mol Biol</source>
<year>1986</year>
<volume>192</volume>
<issue>4</issue>
<fpage>907</fpage>
<lpage>918</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(86)90036-7</pub-id>
<pub-id pub-id-type="pmid">3586013</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Sivolob</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Khrapunov</surname>
<given-names>SN</given-names>
</name>
<article-title>Translational positioning of nucleosomes on DNA: the role of sequence-dependent isotropic DNA bending stiffness</article-title>
<source>J Mol Biol</source>
<year>1995</year>
<volume>247</volume>
<issue>5</issue>
<fpage>918</fpage>
<lpage>931</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1994.0190</pub-id>
<pub-id pub-id-type="pmid">7723041</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Brukner</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Sanchez</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Suck</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Pongor</surname>
<given-names>S</given-names>
</name>
<article-title>Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides</article-title>
<source>EMBO J</source>
<year>1995</year>
<volume>14</volume>
<issue>8</issue>
<fpage>1812</fpage>
<lpage>1818</lpage>
<pub-id pub-id-type="pmid">7737131</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Ponomarenko</surname>
<given-names>JV</given-names>
</name>
<name>
<surname>Ponomarenko</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Frolov</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Vorobyev</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Overton</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Kolchanov</surname>
<given-names>NA</given-names>
</name>
<article-title>Conformational and physicochemical DNA features specific for transcription factor binding sites</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<issue>7-8</issue>
<fpage>654</fpage>
<lpage>668</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/15.7.654</pub-id>
<pub-id pub-id-type="pmid">10487873</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Ioshikhes</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Bolshoy</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Derenshteyn</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Borodovsky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Trifonov</surname>
<given-names>EN</given-names>
</name>
<article-title>Nucleosome DNA sequence pattern revealed by multiple alignment of experimentally mapped sequences</article-title>
<source>J Mol Biol</source>
<year>1996</year>
<volume>262</volume>
<issue>2</issue>
<fpage>129</fpage>
<lpage>139</lpage>
<pub-id pub-id-type="doi">10.1006/jmbi.1996.0503</pub-id>
<pub-id pub-id-type="pmid">8831784</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Satchwell</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Drew</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Travers</surname>
<given-names>AA</given-names>
</name>
<article-title>Sequence periodicities in chicken nucleosome core DNA</article-title>
<source>J Mol Biol</source>
<year>1986</year>
<volume>191</volume>
<issue>4</issue>
<fpage>659</fpage>
<lpage>675</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(86)90452-3</pub-id>
<pub-id pub-id-type="pmid">3806678</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Ioshikhes</surname>
<given-names>IP</given-names>
</name>
<name>
<surname>Albert</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Zanton</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Pugh</surname>
<given-names>BF</given-names>
</name>
<article-title>Nucleosome positions predicted through comparative genomics</article-title>
<source>Nat Genet</source>
<year>2006</year>
<volume>38</volume>
<issue>10</issue>
<fpage>1210</fpage>
<lpage>1215</lpage>
<pub-id pub-id-type="doi">10.1038/ng1878</pub-id>
<pub-id pub-id-type="pmid">16964265</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Fondufe-Mittendorf</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Thastrom</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>IK</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Widom</surname>
<given-names>J</given-names>
</name>
<article-title>A genomic code for nucleosome positioning</article-title>
<source>Nature</source>
<year>2006</year>
<volume>442</volume>
<issue>7104</issue>
<fpage>772</fpage>
<lpage>778</lpage>
<pub-id pub-id-type="doi">10.1038/nature04979</pub-id>
<pub-id pub-id-type="pmid">16862119</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Field</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kaplan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Fondufe-Mittendorf</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>IK</given-names>
</name>
<name>
<surname>Sharon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lubling</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Widom</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
<article-title>Distinct modes of regulation by chromatin encoded through nucleosome positioning signals</article-title>
<source>PLoS Comput Biol</source>
<year>2008</year>
<volume>4</volume>
<issue>11</issue>
<fpage>e1000216</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1000216</pub-id>
<pub-id pub-id-type="pmid">18989395</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Peckham</surname>
<given-names>HE</given-names>
</name>
<name>
<surname>Thurman</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Noble</surname>
<given-names>WS</given-names>
</name>
<name>
<surname>Struhl</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
<article-title>Nucleosome positioning signals in genomic DNA</article-title>
<source>Genome Res</source>
<year>2007</year>
<volume>17</volume>
<issue>8</issue>
<fpage>1170</fpage>
<lpage>1177</lpage>
<pub-id pub-id-type="doi">10.1101/gr.6101007</pub-id>
<pub-id pub-id-type="pmid">17620451</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Yuan</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
<article-title>Genomic sequence is highly predictive of local nucleosome depletion</article-title>
<source>PLoS Comput Biol</source>
<year>2008</year>
<volume>4</volume>
<issue>1</issue>
<fpage>e13</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0040013</pub-id>
<pub-id pub-id-type="pmid">18225943</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Widom</surname>
<given-names>J</given-names>
</name>
<article-title>Poly(dA:dT) tracts: major determinants of nucleosome organization</article-title>
<source>Curr Opin Struct Biol</source>
<year>2009</year>
<volume>19</volume>
<issue>1</issue>
<fpage>65</fpage>
<lpage>71</lpage>
<pub-id pub-id-type="doi">10.1016/j.sbi.2009.01.004</pub-id>
<pub-id pub-id-type="pmid">19208466</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Schwartz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Meshorer</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ast</surname>
<given-names>G</given-names>
</name>
<article-title>Chromatin organization marks exon-intron structure</article-title>
<source>Nature structural &amp; molecular biology</source>
<year>2009</year>
<volume>16</volume>
<issue>9</issue>
<fpage>990</fpage>
<lpage>995</lpage>
<pub-id pub-id-type="doi">10.1038/nsmb.1659</pub-id>
<pub-id pub-id-type="pmid">19684600</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Miele</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Vaillant</surname>
<given-names>C</given-names>
</name>
<name>
<surname>d'Aubenton-Carafa</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Thermes</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Grange</surname>
<given-names>T</given-names>
</name>
<article-title>DNA physical properties determine nucleosome occupancy from yeast to fly</article-title>
<source>Nucleic acids research</source>
<year>2008</year>
<volume>36</volume>
<issue>11</issue>
<fpage>3746</fpage>
<lpage>3756</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn262</pub-id>
<pub-id pub-id-type="pmid">18487627</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<article-title>Regression shrinkage and selection via the Lasso</article-title>
<source>Journal of the Royal Statistical Society Series B-Methodological</source>
<year>1996</year>
<volume>58</volume>
<issue>1</issue>
<fpage>267</fpage>
<lpage>288</lpage>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Tolstorukov</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Choudhary</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
<name>
<surname>Zhurkin</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>PJ</given-names>
</name>
<article-title>nuScore: a web-interface for nucleosome positioning predictions</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<issue>12</issue>
<fpage>1456</fpage>
<lpage>1458</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btn212</pub-id>
<pub-id pub-id-type="pmid">18445607</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Tolstorukov</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Colasanti</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>McCandlish</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Olson</surname>
<given-names>WK</given-names>
</name>
<name>
<surname>Zhurkin</surname>
<given-names>VB</given-names>
</name>
<article-title>A novel roll-and-slide mechanism of DNA folding in chromatin: implications for nucleosome positioning</article-title>
<source>Journal of molecular biology</source>
<year>2007</year>
<volume>371</volume>
<issue>3</issue>
<fpage>725</fpage>
<lpage>738</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmb.2007.05.048</pub-id>
<pub-id pub-id-type="pmid">17585938</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Dohm</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Lottaz</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Borodina</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Himmelbauer</surname>
<given-names>H</given-names>
</name>
<article-title>Substantial biases in ultra-short read data sets from high-throughput DNA sequencing</article-title>
<source>Nucleic acids research</source>
<year>2008</year>
<volume>36</volume>
<issue>16</issue>
<fpage>e105</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn425</pub-id>
<pub-id pub-id-type="pmid">18660515</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Valouev</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ichikawa</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tonthat</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Stuart</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ranade</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Peckham</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Malek</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Costa</surname>
<given-names>G</given-names>
</name>
<name>
<surname>McKernan</surname>
<given-names>K</given-names>
</name>
<article-title>A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning</article-title>
<source>Genome research</source>
<year>2008</year>
<volume>18</volume>
<issue>7</issue>
<fpage>1051</fpage>
<lpage>1063</lpage>
<pub-id pub-id-type="doi">10.1101/gr.076463.108</pub-id>
<pub-id pub-id-type="pmid">18477713</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Barbic</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zimmer</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Crothers</surname>
<given-names>DM</given-names>
</name>
<article-title>Structural origins of adenine-tract bending</article-title>
<source>Proceedings of the National Academy of Sciences of the United States of America</source>
<year>2003</year>
<volume>100</volume>
<issue>5</issue>
<fpage>2369</fpage>
<lpage>2373</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.0437877100</pub-id>
<pub-id pub-id-type="pmid">12586860</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="book">
<name>
<surname>Rice</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Correll</surname>
<given-names>CC</given-names>
</name>
<source>Protein-nucleic acid interactions: structural biology</source>
<year>2008</year>
<publisher-name>Cambridge: Royal Society of Chemistry</publisher-name>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Gardiner-Garden</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Frommer</surname>
<given-names>M</given-names>
</name>
<article-title>CpG islands in vertebrate genomes</article-title>
<source>J Mol Biol</source>
<year>1987</year>
<volume>196</volume>
<issue>2</issue>
<fpage>261</fpage>
<lpage>282</lpage>
<pub-id pub-id-type="doi">10.1016/0022-2836(87)90689-9</pub-id>
<pub-id pub-id-type="pmid">3656447</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Thiery</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Macaya</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Bernardi</surname>
<given-names>G</given-names>
</name>
<article-title>An analysis of eukaryotic genomes by density gradient centrifugation</article-title>
<source>J Mol Biol</source>
<year>1976</year>
<volume>108</volume>
<issue>1</issue>
<fpage>219</fpage>
<lpage>235</lpage>
<pub-id pub-id-type="doi">10.1016/S0022-2836(76)80104-0</pub-id>
<pub-id pub-id-type="pmid">826643</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Aerts</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thijs</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Dabrowski</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Moreau</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>De Moor</surname>
<given-names>B</given-names>
</name>
<article-title>Comprehensive analysis of the base composition around the transcription start site in Metazoa</article-title>
<source>BMC Genomics</source>
<year>2004</year>
<volume>5</volume>
<issue>1</issue>
<fpage>34</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-5-34</pub-id>
<pub-id pub-id-type="pmid">15171795</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Efron</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hastie</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Johnstone</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<article-title>Least Angle Regression</article-title>
<source>Annals of Statistics</source>
<year>2004</year>
<volume>32</volume>
<issue>2</issue>
<fpage>407</fpage>
<lpage>499</lpage>
<pub-id pub-id-type="doi">10.1214/009053604000000067</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Thastrom</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bingham</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Widom</surname>
<given-names>J</given-names>
</name>
<article-title>Nucleosomal locations of dominant DNA sequence motifs for histone-DNA interactions and nucleosome positioning</article-title>
<source>Journal of molecular biology</source>
<year>2004</year>
<volume>338</volume>
<issue>4</issue>
<fpage>695</fpage>
<lpage>709</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmb.2004.03.032</pub-id>
<pub-id pub-id-type="pmid">15099738</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A86 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000A86 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:2808325
   |texte=   G+C content dominates intrinsic nucleosome occupancy
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:20028554" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021