Exploration server on telematics

Warning: this site is under development!
Warning: this site was generated by computer from raw corpora.
The information has therefore not been validated.

The exceptional genomic word symmetry along DNA sequences

Internal identifier: 000109 (Pmc/Corpus)

Authors: Vera Afreixo; João M. O. S. Rodrigues; Carlos A. C. Bastos; Raquel M. Silva

Source:

RBID : PMC:4738807

Abstract

Background

Chargaff’s second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. Global exceptional symmetry has been detected in both long and short genomes.
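The independence argument in the second sentence can be made explicit. Write a word as w = b_1 b_2 ... b_k, let \bar{b} denote the complementary nucleotide of b and p_b its single-nucleotide probability, and assume, as the sentence does, that nucleotides occur independently. The reverse complement w' reads the complemented bases in reverse order, so:

```latex
\[
P(w) = \prod_{i=1}^{k} p_{b_i}, \qquad
P(w') = \prod_{i=1}^{k} p_{\bar{b}_{k+1-i}} = \prod_{i=1}^{k} p_{\bar{b}_i}
\]
\[
p_A = p_T,\; p_C = p_G
\;\Longrightarrow\; p_{\bar{b}} = p_b \text{ for every } b
\;\Longrightarrow\; P(w') = P(w)
\]
```

For example, P(AAC) = p_A^2 p_C and P(GTT) = p_G p_T^2 coincide whenever p_A = p_T and p_C = p_G. Oligonucleotide parity therefore only counts as exceptional when it goes beyond what this independence model already predicts.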

Results

To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group.

Conclusions

We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.


Url:
DOI: 10.1186/s12859-016-0905-0
PubMed: 26842742
PubMed Central: 4738807

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The exceptional genomic word symmetry along DNA sequences</title>
<author>
<name sortKey="Afreixo, Vera" sort="Afreixo, Vera" uniqKey="Afreixo V" first="Vera" last="Afreixo">Vera Afreixo</name>
<affiliation>
<nlm:aff id="Aff1">Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff3">Department of Medical Sciences and Institute of Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rodrigues, Joao M O S" sort="Rodrigues, Joao M O S" uniqKey="Rodrigues J" first="João M. O. S." last="Rodrigues">João M. O. S. Rodrigues</name>
<affiliation>
<nlm:aff id="Aff2">Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bastos, Carlos A C" sort="Bastos, Carlos A C" uniqKey="Bastos C" first="Carlos A. C." last="Bastos">Carlos A. C. Bastos</name>
<affiliation>
<nlm:aff id="Aff2">Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel M" sort="Silva, Raquel M" uniqKey="Silva R" first="Raquel M." last="Silva">Raquel M. Silva</name>
<affiliation>
<nlm:aff id="Aff3">Department of Medical Sciences and Institute of Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26842742</idno>
<idno type="pmc">4738807</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4738807</idno>
<idno type="RBID">PMC:4738807</idno>
<idno type="doi">10.1186/s12859-016-0905-0</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000109</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000109</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">The exceptional genomic word symmetry along DNA sequences</title>
<author>
<name sortKey="Afreixo, Vera" sort="Afreixo, Vera" uniqKey="Afreixo V" first="Vera" last="Afreixo">Vera Afreixo</name>
<affiliation>
<nlm:aff id="Aff1">Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff3">Department of Medical Sciences and Institute of Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rodrigues, Joao M O S" sort="Rodrigues, Joao M O S" uniqKey="Rodrigues J" first="João M. O. S." last="Rodrigues">João M. O. S. Rodrigues</name>
<affiliation>
<nlm:aff id="Aff2">Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bastos, Carlos A C" sort="Bastos, Carlos A C" uniqKey="Bastos C" first="Carlos A. C." last="Bastos">Carlos A. C. Bastos</name>
<affiliation>
<nlm:aff id="Aff2">Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Silva, Raquel M" sort="Silva, Raquel M" uniqKey="Silva R" first="Raquel M." last="Silva">Raquel M. Silva</name>
<affiliation>
<nlm:aff id="Aff3">Department of Medical Sciences and Institute of Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff4">IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Chargaff’s second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. Global exceptional symmetry has been detected in both long and short genomes.</p>
</sec>
<sec>
<title>Results</title>
<p>To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Chargaff, E" uniqKey="Chargaff E">E Chargaff</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karkas, Jd" uniqKey="Karkas J">JD Karkas</name>
</author>
<author>
<name sortKey="Rudner, R" uniqKey="Rudner R">R Rudner</name>
</author>
<author>
<name sortKey="Chargaff, E" uniqKey="Chargaff E">E Chargaff</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Forsdyke, Dr" uniqKey="Forsdyke D">DR Forsdyke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qi, D" uniqKey="Qi D">D Qi</name>
</author>
<author>
<name sortKey="Cuticchia, Aj" uniqKey="Cuticchia A">AJ Cuticchia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kong, Sg" uniqKey="Kong S">SG Kong</name>
</author>
<author>
<name sortKey="Fan, Wl" uniqKey="Fan W">WL Fan</name>
</author>
<author>
<name sortKey="Chen, Hd" uniqKey="Chen H">HD Chen</name>
</author>
<author>
<name sortKey="Hsu, Zt" uniqKey="Hsu Z">ZT Hsu</name>
</author>
<author>
<name sortKey="Zhou, N" uniqKey="Zhou N">N Zhou</name>
</author>
<author>
<name sortKey="Zheng, B" uniqKey="Zheng B">B Zheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Sh" uniqKey="Zhang S">SH Zhang</name>
</author>
<author>
<name sortKey="Huang, Yz" uniqKey="Huang Y">YZ Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Forsdyke, Dr" uniqKey="Forsdyke D">DR Forsdyke</name>
</author>
<author>
<name sortKey="Bell, Sj" uniqKey="Bell S">SJ Bell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baisnee, Pf" uniqKey="Baisnee P">PF Baisnée</name>
</author>
<author>
<name sortKey="Hampson, S" uniqKey="Hampson S">S Hampson</name>
</author>
<author>
<name sortKey="Baldi, P" uniqKey="Baldi P">P Baldi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Albrecht Buehler, G" uniqKey="Albrecht Buehler G">G Albrecht-Buehler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lobry, Jr" uniqKey="Lobry J">JR Lobry</name>
</author>
<author>
<name sortKey="Lobry, C" uniqKey="Lobry C">C Lobry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Powdel, B" uniqKey="Powdel B">B Powdel</name>
</author>
<author>
<name sortKey="Satapathy, S" uniqKey="Satapathy S">S Satapathy</name>
</author>
<author>
<name sortKey="Kumar, A" uniqKey="Kumar A">A Kumar</name>
</author>
<author>
<name sortKey="Jha, P" uniqKey="Jha P">P Jha</name>
</author>
<author>
<name sortKey="Buragohain, A" uniqKey="Buragohain A">A Buragohain</name>
</author>
<author>
<name sortKey="Borah, M" uniqKey="Borah M">M Borah</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afreixo, V" uniqKey="Afreixo V">V Afreixo</name>
</author>
<author>
<name sortKey="Rodrigues, Jamos" uniqKey="Rodrigues J">JAMOS Rodrigues</name>
</author>
<author>
<name sortKey="Bastos, Cac" uniqKey="Bastos C">CAC Bastos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cozzi, P" uniqKey="Cozzi P">P Cozzi</name>
</author>
<author>
<name sortKey="Milanesi, L" uniqKey="Milanesi L">L Milanesi</name>
</author>
<author>
<name sortKey="Bernardi, G" uniqKey="Bernardi G">G Bernardi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Agresti, A" uniqKey="Agresti A">A Agresti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holm, S" uniqKey="Holm S">S Holm</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26842742</article-id>
<article-id pub-id-type="pmc">4738807</article-id>
<article-id pub-id-type="publisher-id">905</article-id>
<article-id pub-id-type="doi">10.1186/s12859-016-0905-0</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>The exceptional genomic word symmetry along DNA sequences</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Afreixo</surname>
<given-names>Vera</given-names>
</name>
<address>
<email>vera@ua.pt</email>
</address>
<xref ref-type="aff" rid="Aff1"></xref>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rodrigues</surname>
<given-names>João M. O. S.</given-names>
</name>
<address>
<email>cbastos@ua.pt</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bastos</surname>
<given-names>Carlos A. C.</given-names>
</name>
<address>
<email>cbastos@ua.pt</email>
</address>
<xref ref-type="aff" rid="Aff2"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Silva</surname>
<given-names>Raquel M.</given-names>
</name>
<address>
<email>raquelsilva@ua.pt</email>
</address>
<xref ref-type="aff" rid="Aff3"></xref>
<xref ref-type="aff" rid="Aff4"></xref>
</contrib>
<aff id="Aff1">
<label></label>
Department of Mathematics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</aff>
<aff id="Aff2">
<label></label>
Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</aff>
<aff id="Aff3">
<label></label>
Department of Medical Sciences and Institute of Biomedicine – iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal, Campus Universitário de Santiago, Aveiro, Portugal</aff>
<aff id="Aff4">
<label></label>
IEETA-Institute of Electronic Engineering and Informatics of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>3</day>
<month>2</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>3</day>
<month>2</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>17</volume>
<elocation-id>59</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>9</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>1</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>© Afreixo et al. 2016</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Chargaff’s second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligonucleotides. Global exceptional symmetry has been detected in both long and short genomes.</p>
</sec>
<sec>
<title>Results</title>
<p>To explore the exceptional genomic word symmetry along the genome sequences, we propose a sliding window method to extract the values of exceptional symmetry (for all words or by word groups). We compare the exceptional symmetry effect size distribution in all human chromosomes against control scenarios (positive and negative controls), testing the differences and performing a residual analysis. We explore local exceptional symmetry in equivalent composition word groups, and find that the behaviour of the local exceptional symmetry depends on the word group.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We conclude that the exceptional symmetry is a local phenomenon in genome sequences, with distinct characteristics along the sequence of each chromosome. The local exceptional symmetry along the genomic sequences shows outlying segments, and those segments have high biological annotation density.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Exceptional symmetry</kwd>
<kwd>Genome</kwd>
<kwd>Chargaff’s second parity rule</kwd>
<kwd>Window analysis</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source>
<institution>Institute for Biomedicine - iBiMED</institution>
</funding-source>
<award-id>UID/BIM/04501/2013</award-id>
<principal-award-recipient>
<name>
<surname>Afreixo</surname>
<given-names>Vera</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Chargaff’s first parity rule states that, in any double-stranded DNA molecule, the numbers of complementary nucleotides are exactly equal (A = T and C = G) [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Chargaff’s second parity rule states that those quantities are almost equal in a single strand of DNA [
<xref ref-type="bibr" rid="CR2">2</xref>
–
<xref ref-type="bibr" rid="CR4">4</xref>
], and this phenomenon holds in almost all living organisms.</p>
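<p>As an illustration only (not the authors’ code), the second parity rule can be checked on a single strand by comparing complementary nucleotide counts; ratios close to 1 indicate parity. The function name and the toy sequence below are hypothetical, and ignoring symbols other than A, C, G and T is an assumption of this sketch.</p>

```python
from collections import Counter


def parity_ratios(strand: str) -> dict:
    """Ratios A/T and C/G for a single DNA strand.

    Under Chargaff's second parity rule both ratios are expected to be
    close to 1 in long genomic sequences. Symbols other than A, C, G and T
    (e.g. N) are ignored; a missing denominator yields NaN.
    """
    counts = Counter(base for base in strand.upper() if base in "ACGT")
    return {
        "A/T": counts["A"] / counts["T"] if counts["T"] else float("nan"),
        "C/G": counts["C"] / counts["G"] if counts["G"] else float("nan"),
    }


print(parity_ratios("ACGTTGCAAGCT"))  # {'A/T': 1.0, 'C/G': 1.0}
```

<p>For the toy sequence both ratios are exactly 1; on real chromosomes the rule predicts values near 1 rather than exactly 1.</p>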
<p>The extension of the second parity rule to oligonucleotides is also known as the single strand symmetry phenomenon. It states that, in each DNA strand, the frequency of an oligonucleotide should be similar to that of its reverse complement [
<xref ref-type="bibr" rid="CR5">5</xref>
–
<xref ref-type="bibr" rid="CR8">8</xref>
]. It is not known why this parity is needed, and there is no consensus explanation for the occurrence of the single strand symmetry phenomenon. Several hypotheses relate it to evolutionary processes, for example: the stem-loop hypothesis [
<xref ref-type="bibr" rid="CR9">9</xref>
]; the duplication followed by inversion hypothesis [
<xref ref-type="bibr" rid="CR10">10</xref>
]; the inversion and inverted transposition hypothesis [
<xref ref-type="bibr" rid="CR11">11</xref>
]; the no strand bias hypothesis [
<xref ref-type="bibr" rid="CR12">12</xref>
]; and symmetry as an original trait of the primordial genome [
<xref ref-type="bibr" rid="CR8">8</xref>
].</p>
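<p>The single strand symmetry comparison can be illustrated with a small sketch that pairs each word with its reverse complement and reports both counts. The function names and the counts in the example are made up for illustration; the sketch does not reproduce any analysis from the paper.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every base (A-T, C-G) and reverse the order."""
    return word.translate(COMPLEMENT)[::-1]


def symmetry_pairs(counts: dict) -> list:
    """List (w, n_w, w', n_w') once per reverse complementary pair.

    Self-symmetric words (w == w') are skipped because they trivially
    match themselves. Words missing from `counts` count as 0.
    """
    rows, seen = [], set()
    for w in sorted(counts):
        rc = reverse_complement(w)
        if w == rc or w in seen:
            continue
        seen.update((w, rc))
        rows.append((w, counts.get(w, 0), rc, counts.get(rc, 0)))
    return rows


print(symmetry_pairs({"AAC": 12, "GTT": 11, "ACG": 5, "CGT": 6}))
# [('AAC', 12, 'GTT', 11), ('ACG', 5, 'CGT', 6)]
```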
<p>Powdel et al. [
<xref ref-type="bibr" rid="CR13">13</xref>
] studied the symmetry phenomenon in non-overlapping DNA segments of a fixed size. They analysed the frequency distributions of the local abundance of oligonucleotides along a single DNA strand and found that the distributions of reverse complementary oligonucleotides tend to be statistically similar. Afreixo et al. [
<xref ref-type="bibr" rid="CR14">14</xref>
] introduced a new symmetry measure, which quantifies the extent to which the frequency of an oligonucleotide is more similar to the frequency of its reverse complement than to the frequencies of other oligonucleotides with equivalent composition. They also identified several word groups with strong exceptional symmetry. Here, we apply this measure along the genome to find regions with a very strong exceptional symmetry effect and to characterize its non-uniform behaviour. We observed exceptional symmetry throughout the human genome. Moreover, some regions showed outlying exceptional symmetry, and these regions are enriched in annotated protein-coding genes.</p>
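<p>A local analysis of this kind needs the sequence to be cut into segments. The sketch below, offered as an assumption rather than the procedure of Powdel et al. or of this paper, yields fixed-size windows advanced by a chosen step (a step equal to the window size gives non-overlapping segments, a smaller step gives a sliding window) and counts a word and its reverse complement inside each window. Function names, the word and the parameters are hypothetical.</p>

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def windows(seq: str, size: int, step: int):
    """Yield (start, segment) for windows of `size` bases advanced by `step`.

    step == size gives non-overlapping segments; a smaller step gives a
    sliding window. A trailing segment shorter than `size` is dropped.
    """
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def local_pair_counts(seq: str, word: str, size: int, step: int) -> list:
    """(start, n_w, n_w') per window, counting words with overlap inside it."""
    rc = reverse_complement(word)
    k = len(word)
    result = []
    for start, seg in windows(seq, size, step):
        counts = Counter(seg[i:i + k] for i in range(len(seg) - k + 1))
        result.append((start, counts[word], counts[rc]))
    return result


print(local_pair_counts("AACGTTAACC", "AAC", size=5, step=5))
# [(0, 1, 0), (5, 1, 0)]
```

<p>In the example, the occurrence of GTT that straddles the boundary between the two windows is counted in neither, which illustrates one side effect of cutting the sequence into segments.</p>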
</sec>
<sec id="Sec2">
<title>Methods</title>
<sec id="Sec3">
<title>Materials</title>
<p>We analysed the whole human genome (reference assembly build 37.3), available from the website of the National Center for Biotechnology Information (NCBI). Each chromosome was processed as a separate sequence, and words were counted with overlap. We also generated random control sequences, designed to mimic selected features of each human chromosome and containing the same number of base pairs as the corresponding chromosome (see ‘
<xref rid="Sec5" ref-type="sec">Control experiments</xref>
’ subsection).</p>
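<p>Counting words with overlap means that a word is read starting at every position of a sequence, so a sequence of length L yields L - k + 1 words of length k. A minimal sketch, with a hypothetical function name and assuming that words containing symbols other than A, C, G and T (such as the N runs in assembly gaps) are simply skipped, which the text does not specify:</p>

```python
from collections import Counter


def count_words_overlapping(seq: str, k: int) -> Counter:
    """Count the length-k word starting at every position (overlapping counts).

    Words containing symbols other than A, C, G, T are skipped; this
    filtering rule is an assumption of the sketch.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word).issubset("ACGT"):
            counts[word] += 1
    return counts


print(count_words_overlapping("AAAA", 2))  # Counter({'AA': 3})
```

<p>Calling the function once per chromosome, as described above, ensures that no word spans two chromosomes.</p>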
<p>We obtained the coding sequences (cds file) for all the transcripts of the human genome (release 75) from Ensembl (
<ext-link ext-link-type="uri" xlink:href="http://www.ensembl.org/">http://www.ensembl.org/</ext-link>
), for use in classifying regions as coding or non-coding.</p>
</sec>
<sec id="Sec4">
<title>Exceptional genomic word symmetry</title>
<p>In previous work, we proposed the concept of exceptional genomic word symmetry, both within equivalent composition groups (ECGs) and globally [
<xref ref-type="bibr" rid="CR14">14</xref>
]. Exceptional symmetry is a refinement of Chargaff’s second parity rule that highlights the words whose frequencies of occurrence are similar to those of their reverse complements but dissimilar to the frequencies of occurrence of other words with
<italic>equivalent composition</italic>
. Words of equal length are defined to have equivalent composition if they contain the same total number of A and T nucleotides.</p>
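<p>To make the definition concrete, the following sketch (function name hypothetical) groups all 4<sup>k</sup> words of length k by their number m of A or T nucleotides; for k = 2 it reproduces the groups G<sub>0</sub>, G<sub>1</sub> and G<sub>2</sub> listed below, up to the order of the words in G<sub>1</sub>.</p>

```python
from itertools import product


def equivalent_composition_groups(k: int) -> dict:
    """Map m -> words of length k with exactly m nucleotides A or T.

    Words are generated in lexicographic order, so each group is sorted.
    """
    groups = {m: [] for m in range(k + 1)}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        groups[word.count("A") + word.count("T")].append(word)
    return groups


groups = equivalent_composition_groups(2)
print(groups[0])  # ['CC', 'CG', 'GC', 'GG']
print(groups[1])  # ['AC', 'AG', 'CA', 'CT', 'GA', 'GT', 'TC', 'TG']
print(groups[2])  # ['AA', 'AT', 'TA', 'TT']
```

<p>Each group G<sub>m</sub> contains C(k, m)·2<sup>k</sup> words, because each of the m A/T positions can hold A or T and each remaining position C or G; for k = 2 this gives 4, 8 and 4 words.</p>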
<p>Some words are equal to their reverse complement; we call these self symmetric words (SSWs). We also define a symmetric word pair as the set composed of a word
<italic>w</italic>
and its reverse complement
<italic>w</italic>
<sup>′</sup>
, with (
<italic>w</italic>
<sup>′</sup>
)
<sup>′</sup>
=
<italic>w</italic>
.</p>
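<p>A sketch of these definitions, with hypothetical function names: the reverse complement complements each base and reverses the order, and enumerating all words of length k separates the self symmetric words from the symmetric word pairs.</p>

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """w': complement each base (A-T, C-G) and reverse the order."""
    return word.translate(COMPLEMENT)[::-1]


def classify_words(k: int):
    """Split all 4**k words into self symmetric words and symmetric word pairs."""
    ssw, pairs = [], []
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        rc = reverse_complement(w)
        if w == rc:
            ssw.append(w)
        elif rc > w:  # record each pair once, lexicographically smaller word first
            pairs.append((w, rc))
    return ssw, pairs


ssw, pairs = classify_words(2)
print(ssw)         # ['AT', 'CG', 'GC', 'TA']
print(len(pairs))  # 6
```

<p>For odd k the list of self symmetric words is empty, since the middle nucleotide would have to equal its own complement; for k = 3 all 64 words fall into 32 symmetric word pairs.</p>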
<p>Let
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
denote a set of words with equivalent composition, i.e. words containing the same number (
<italic>m</italic>
) of
<italic>A</italic>
s +
<italic>T</italic>
s. For words of length
<italic>k</italic>
=2, the ECGs are:
<italic>G</italic>
<sub>0</sub>
={
<italic>C</italic>
<italic>C</italic>
,
<italic>C</italic>
<italic>G</italic>
,
<italic>G</italic>
<italic>C</italic>
,
<italic>G</italic>
<italic>G</italic>
};
<italic>G</italic>
<sub>1</sub>
={
<italic>A</italic>
<italic>C</italic>
,
<italic>A</italic>
<italic>G</italic>
,
<italic>C</italic>
<italic>A</italic>
,
<italic>G</italic>
<italic>A</italic>
,
<italic>C</italic>
<italic>T</italic>
,
<italic>G</italic>
<italic>T</italic>
,
<italic>T</italic>
<italic>C</italic>
,
<italic>T</italic>
<italic>G</italic>
} and
<italic>G</italic>
<sub>2</sub>
={
<italic>A</italic>
<italic>A</italic>
,
<italic>A</italic>
<italic>T</italic>
,
<italic>T</italic>
<italic>A</italic>
,
<italic>T</italic>
<italic>T</italic>
}. The proposed exceptional symmetry measure for
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
is given by
<disp-formula id="Equ1">
<label>(1)</label>
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \mathit{VR}(G_{m})= \sqrt{\frac{{X^{2}_{u}}(G_{m})/df_{u}(G_{m})+\epsilon}{{X^{2}_{s}}(G_{m})/df_{s}(G_{m}) +\epsilon}}, \;\;df_{s}>0 $$ \end{document}</tex-math>
<mml:math id="M2">
<mml:mi mathvariant="italic">VR</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>/</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>ε</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>/</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>ε</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
<mml:mspace width="2.77626pt"></mml:mspace>
<mml:mspace width="2.77626pt"></mml:mspace>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
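<p>Equation (1) translates directly into code once the two discrepancy statistics and their degrees of freedom are known. In the sketch below they are plain inputs; the function name and the default value of ε are hypothetical choices, not values taken from the paper.</p>

```python
import math


def exceptional_symmetry_vr(x2_u, df_u, x2_s, df_s, eps=1e-6):
    """Equation (1): sqrt((X2_u/df_u + eps) / (X2_s/df_s + eps)), with df_s > 0.

    Large values mean the words of G_m differ from one another much more
    than each word differs from its reverse complement.
    """
    if not df_s > 0:
        raise ValueError("Equation (1) requires df_s > 0")
    return math.sqrt((x2_u / df_u + eps) / (x2_s / df_s + eps))


print(round(exceptional_symmetry_vr(40.0, 4, 1.0, 1, eps=0.0), 4))  # 3.1623
```

<p>With ε = 0, a uniformity discrepancy of 40 on 4 degrees of freedom against a symmetry discrepancy of 1 on 1 degree of freedom gives VR = √10 ≈ 3.16, i.e. the words of the group differ from each other far more than from their reverse complements.</p>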
<p>where
<inline-formula id="IEq1">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${X^{2}_{s}}(G_{m})$\end{document}</tex-math>
<mml:math id="M4">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
is used to evaluate the discrepancy between the frequencies of the two words in each symmetric word pair of
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
, and
<inline-formula id="IEq2">
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${X^{2}_{u}}(G_{m})$\end{document}</tex-math>
<mml:math id="M6">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
to evaluate the variability within
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
words (discrepancy from uniformity). To define these measures, we use the following notation:
<list list-type="bullet">
<list-item>
<p>
<italic>N</italic>
<sub>
<italic>m</italic>
</sub>
the number of elements in
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
.</p>
</list-item>
<list-item>
<p>
<inline-formula id="IEq3">
<alternatives>
<tex-math id="M7">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$N^{\mathit {SSW}}_{m}$\end{document}</tex-math>
<mml:math id="M8">
<mml:msubsup>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">SSW</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq3.gif"></inline-graphic>
</alternatives>
</inline-formula>
the number of elements in
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
which are self symmetric words.</p>
</list-item>
<list-item>
<p>
<inline-formula id="IEq4">
<alternatives>
<tex-math id="M9">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${N^{0}_{m}}$\end{document}</tex-math>
<mml:math id="M10">
<mml:msubsup>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq4.gif"></inline-graphic>
</alternatives>
</inline-formula>
the number of symmetric word pairs in
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
, excluding the SSWs, such that both words in the pair are absent from the nucleotide sequence under study.</p>
</list-item>
<list-item>
<p>
<italic>n</italic>
<sub>
<italic>w</italic>
</sub>
the frequency of occurrence of word
<italic>w</italic>
in a nucleotide sequence.</p>
</list-item>
<list-item>
<p>
<italic>n</italic>
<sub>
<italic>m</italic>
</sub>
the total number of occurrences of words from group
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
in a nucleotide sequence.</p>
</list-item>
</list>
</p>
<p>The discrepancy measures for equivalent composition group
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
can be described by
<disp-formula id="Equa">
<alternatives>
<tex-math id="M11">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$X^{2}_{s}(G_{m})=\left\{\begin{array}{ll}\frac{1}{2}\sum_{w \in G_{m} \wedge n_{w}+n_{w'}\not= 0}{\frac{(n_{w}-n_{w'})^{2}}{n_{w}+n_{w'}}}, &n_{m}\not=0\\ 0,& n_{m}=0,\end{array}\right.$$ \end{document}</tex-math>
<mml:math id="M12">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
<mml:munder>
<mml:mrow>
<mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>∧</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>≠</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>≠</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<disp-formula id="Equb">
<alternatives>
<tex-math id="M13">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$X^{2}_{u}(G_{m})=\left\{\begin{array}{ll} -n_{m}+N_{m}\sum_{w\in G_{m}}{\frac{{n_{w}^{2}}}{n_{m}}}, & n_{m}\not=0\\ 0,& n_{m}=0.\end{array}\right.$$ \end{document}</tex-math>
<mml:math id="M14">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:munder>
<mml:mrow>
<mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>≠</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equb.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
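<p>As an illustration only (not the authors' implementation), the two discrepancy measures above can be computed from the word counts of a single group. The Python sketch below assumes that the counts of one group are given as a dictionary mapping every word of the group, including absent words, to its number of occurrences, and that the reverse complement is obtained by complementing and then reversing the word; the function names are hypothetical. The second function is named after the fact that its expression equals Pearson's chi-square statistic for equal word frequencies within the group.</p>

```python
# Illustrative sketch (hypothetical helper names): the discrepancy measures
# X2_s(G_m) and X2_u(G_m) for one equivalent composition group.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return w', the reverse complement of an ACGT word."""
    return word.translate(COMPLEMENT)[::-1]


def x2_symmetry(counts):
    """X2_s(G_m): half the sum, over words w of G_m with n_w + n_w' != 0,
    of (n_w - n_w')^2 / (n_w + n_w').

    Each reverse-complement pair is visited twice (once from each word),
    which the factor 1/2 compensates for. Self-symmetric words contribute
    nothing because n_w == n_w'.
    """
    if sum(counts.values()) == 0:  # n_m = 0
        return 0.0
    total = 0.0
    for word, n_w in counts.items():
        n_rc = counts.get(reverse_complement(word), 0)
        if n_w + n_rc != 0:
            total += (n_w - n_rc) ** 2 / (n_w + n_rc)
    return total / 2


def x2_uniformity(counts):
    """X2_u(G_m) = -n_m + N_m * sum_w n_w^2 / n_m, and 0 when n_m = 0."""
    n_m = sum(counts.values())  # total occurrences of words of G_m
    big_n_m = len(counts)       # N_m, number of words in G_m
    if n_m == 0:
        return 0.0
    return -n_m + big_n_m * sum(n * n for n in counts.values()) / n_m
```

<p>For a hypothetical dinucleotide group {CC, CG, GC, GG} with counts 10, 2, 4 and 6, the first function returns 1.0 (only the pair CC/GG contributes, since CG and GC are self-symmetric) and the second returns 140/22 ≈ 6.36.</p>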
<p>Since an SSW is its own reverse complement and therefore never deviates from symmetry, we introduce here an adjustment to the degrees of freedom proposed in [
<xref ref-type="bibr" rid="CR14">14</xref>
],
<disp-formula id="Equc">
<alternatives>
<tex-math id="M15">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$df_{u}(G_{m})=\left\{\begin{array}{ll} N_{m}-2,& n_{m}>0 \\ -1,& n_{m}=0. \end{array}\right.$$ \end{document}</tex-math>
<mml:math id="M16">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equc.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
and
<disp-formula id="Equd">
<alternatives>
<tex-math id="M17">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$df_{s}(G_{m})=\left(N_{m}-N^{\mathit{SSW}}_{m}-2{N^{0}_{m}}\right)/2-1 $$ \end{document}</tex-math>
<mml:math id="M18">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">SSW</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
<mml:msubsup>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equd.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
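<p>The adjusted degrees of freedom can be sketched in the same way. In this hypothetical Python helper, which again assumes a dictionary of counts covering every word of the group, self-symmetric words are detected as words equal to their reverse complement, and the absent pairs are reverse-complement pairs of two distinct words that both have zero count.</p>

```python
# Illustrative sketch (hypothetical helper names): df_u(G_m) and df_s(G_m).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return w', the reverse complement of an ACGT word."""
    return word.translate(COMPLEMENT)[::-1]


def df_u(counts):
    """df_u(G_m) = N_m - 2 if n_m > 0, and -1 if n_m = 0."""
    return len(counts) - 2 if sum(counts.values()) > 0 else -1


def df_s(counts):
    """df_s(G_m) = (N_m - N_m^SSW - 2 * N_m^0) / 2 - 1."""
    big_n_m = len(counts)
    n_ssw = sum(1 for w in counts if reverse_complement(w) == w)
    # Non-self-symmetric words that are absent together with their reverse
    # complement; each such pair is met twice, hence the integer division.
    absent_words = sum(
        1 for w, n in counts.items()
        if n == 0 and reverse_complement(w) != w
        and counts.get(reverse_complement(w), 0) == 0
    )
    n_zero_pairs = absent_words // 2
    return (big_n_m - n_ssw - 2 * n_zero_pairs) / 2 - 1
```

<p>For a hypothetical group {AC, GT, AG, CT, TC, GA, TG, CA} with counts 3, 1, 0, 0, 2, 2, 5 and 0, there are no self-symmetric words and one absent pair (AG/CT), so <italic>df</italic><sub><italic>u</italic></sub> = 6 and <italic>df</italic><sub><italic>s</italic></sub> = (8 − 0 − 2)/2 − 1 = 2.</p>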
<p>According to the exceptional symmetry concept, if
<italic>V</italic>
<italic>R</italic>
(
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
)≈1, there is no exceptional symmetry, but if
<italic>V</italic>
<italic>R</italic>
(
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
)≫1, there is exceptional symmetry.</p>
<p>To measure the global exceptional symmetry, we use
<disp-formula id="Equ2">
<label>(2)</label>
<alternatives>
<tex-math id="M19">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \mathit{VR}= \sqrt{\frac{{X^{2}_{u}}/df_{u}+\epsilon}{{X^{2}_{s}}/df_{s}+\epsilon}}, \;\;df_{s}>0 $$ \end{document}</tex-math>
<mml:math id="M20">
<mml:mi mathvariant="italic">VR</mml:mi>
<mml:mo>=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>ε</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>ε</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:msqrt>
<mml:mo>,</mml:mo>
<mml:mspace width="2.77626pt"></mml:mspace>
<mml:mspace width="2.77626pt"></mml:mspace>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<inline-formula id="IEq5">
<alternatives>
<tex-math id="M21">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${X^{2}_{i}}=\sum _{m \in \{0,\ldots,k\}}{{X^{2}_{i}}(G_{m})}$\end{document}</tex-math>
<mml:math id="M22">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mo>{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:munder>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq5.gif"></inline-graphic>
</alternatives>
</inline-formula>
,
<inline-formula id="IEq6">
<alternatives>
<tex-math id="M23">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$df_{i}=-1+\sum _{m \in \{0,\ldots,k\}}{\left (df_{i}(G_{m})+1\right)}$\end{document}</tex-math>
<mml:math id="M24">
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mo>{</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:munder>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>f</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq6.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<italic>i</italic>
∈{
<italic>s</italic>
,
<italic>u</italic>
}, and ε is a small positive constant that keeps VR defined when the symmetry discrepancy is zero.</p>
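<p>Combining the groups into the global measure (2) only requires summing the per-group statistics. The sketch below is hypothetical: it takes one tuple of precomputed per-group values for each group <italic>G</italic><sub><italic>m</italic></sub>, and the default value of ε is an arbitrary choice made here, not a value taken from the study.</p>

```python
import math


def global_vr(groups, eps=1e-9):
    """Global VR from per-group statistics (illustrative sketch).

    `groups` holds one (x2_u, x2_s, df_u, df_s) tuple per group G_m,
    m = 0, ..., k. Following the definitions in the text:
      X2_i  = sum over m of X2_i(G_m)
      df_i  = -1 + sum over m of (df_i(G_m) + 1),  for i in {s, u}
    `eps` plays the role of epsilon; its default is an assumption.
    """
    x2_u = sum(g[0] for g in groups)
    x2_s = sum(g[1] for g in groups)
    df_u = -1 + sum(g[2] + 1 for g in groups)
    df_s = -1 + sum(g[3] + 1 for g in groups)
    if not df_s > 0:
        raise ValueError("VR is only defined for df_s > 0")
    return math.sqrt((x2_u / df_u + eps) / (x2_s / df_s + eps))
```

<p>For example, two hypothetical groups with per-group values (8, 2, 2, 0) and (12, 1, 3, 1), in the order of the tuple, give totals of 20 and 3 and degrees of freedom 6 and 2, so that with ε = 0 the result is √((20/6)/(3/2)) ≈ 1.49.</p>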
<p>The exceptional genomic word symmetry values were determined in all non-overlapping sub-chromosomal regions (windows) of several sizes: 1000 bp, 2000 bp and 5000 bp, and these sizes multiplied by successive powers of 10, up to the size of the chromosome. The starting window size (1000 bp) was chosen taking into account the maximum word length under study (k = 10) and the expected number of words in each ECG under a uniform word distribution: we required an expected count of at least one word in the smallest ECGs,
<italic>G</italic>
<sub>0</sub>
and
<italic>G</italic>
<sub>
<italic>k</italic>
</sub>
. However, note that for large
<italic>k</italic>
, the shorter windows (1000 bp and 2000 bp) may not contain enough word occurrences in the smallest ECGs to provide a reliable estimate of
<italic>V</italic>
<italic>R</italic>
(
<italic>G</italic>
<sub>
<italic>m</italic>
</sub>
).</p>
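<p>A minimal Python sketch of the windowing step follows. It is an illustration under stated assumptions rather than the authors' code: a trailing fragment shorter than the window is dropped, and words containing characters other than A, C, G and T (such as the N of assembly gaps) are skipped; neither choice is specified in the text, and the function names are hypothetical.</p>

```python
from collections import Counter


def window_sizes(chromosome_length, base=(1000, 2000, 5000)):
    """1000, 2000 and 5000 bp, and these sizes times 10, 100, ...,
    keeping only sizes that do not exceed the chromosome length."""
    sizes = []
    scale = 1
    while chromosome_length >= base[0] * scale:
        sizes.extend(b * scale for b in base if chromosome_length >= b * scale)
        scale *= 10
    return sizes


def windows(sequence, size):
    """Consecutive non-overlapping windows of `size` bp; a trailing
    fragment shorter than `size` is dropped (an assumption)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def word_counts(window, k):
    """Count overlapping words of length k inside one window, skipping
    words that contain characters other than A, C, G, T."""
    counts = Counter()
    for i in range(len(window) - k + 1):
        word = window[i:i + k]
        if set(word).issubset("ACGT"):
            counts[word] += 1
    return counts
```

<p>For a chromosome of 60 000 bp, for instance, the candidate sizes are 1000, 2000, 5000, 10 000, 20 000 and 50 000 bp.</p>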
</sec>
<sec id="Sec5">
<title>Control experiments</title>
<p>To produce a negative control (without exceptional symmetry), we generated two types of random scenarios:
<list list-type="bullet">
<list-item>
<p>random (rnd): nucleotides are generated independently, using the nucleotide composition of the corresponding human chromosome as input. Since complementary nucleotides occur with slightly different frequencies, the expected probabilities of a word and its reverse complement are not equal; nevertheless, there are words in the same ECG (e.g.
<italic>ATT</italic>
,
<italic>TAT</italic>
,
<italic>TTA</italic>
) with equal expected probabilities.</p>
<p>
<bold>Input:</bold>
nucleotide probabilities (
<italic>π</italic>
<sub>
<italic>A</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>C</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>G</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>T</italic>
</sub>
, where
<italic>π</italic>
<sub>
<italic>w</italic>
</sub>
denotes the probability of
<italic>w</italic>
).</p>
</list-item>
<list-item>
<p>random symmetric (sym): nucleotides are generated independently, using as input a composition in which complementary nucleotides have the same probability. In this scenario, all words in the same ECG have the same expected probability.</p>
<p>
<bold>Input:</bold>
nucleotide probabilities (
<italic>π</italic>
<sub>
<italic>A</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>C</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>G</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>T</italic>
</sub>
, subject to
<inline-formula id="IEq7">
<alternatives>
<tex-math id="M25">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${\pi _{w}}={\pi _{w^{\prime }}}$\end{document}</tex-math>
<mml:math id="M26">
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq7.gif"></inline-graphic>
</alternatives>
</inline-formula>
with
<italic>w</italic>
∈{
<italic>A</italic>
,
<italic>C</italic>
,
<italic>G</italic>
,
<italic>T</italic>
}).</p>
</list-item>
</list>
</p>
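<p>The two negative controls can be sketched as independent draws from a nucleotide distribution. In this hypothetical Python sketch, the probabilities are estimated from the ACGT counts of a chromosome, and the input for sym is obtained by averaging each nucleotide probability with that of its complement; the averaging rule is an assumption, since the text only requires equal probabilities for complementary nucleotides.</p>

```python
import random
from collections import Counter


def nucleotide_probabilities(sequence):
    """Relative frequencies of A, C, G, T (other characters are ignored)."""
    counts = Counter(c for c in sequence if c in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def symmetrize(probs):
    """Input for the sym scenario (assumed rule): pi_A = pi_T and
    pi_C = pi_G, each set to the mean of the two complementary values."""
    at = (probs["A"] + probs["T"]) / 2
    cg = (probs["C"] + probs["G"]) / 2
    return {"A": at, "C": cg, "G": cg, "T": at}


def iid_sequence(length, probs, seed=None):
    """Sequence of independent nucleotides drawn with probabilities `probs`."""
    rng = random.Random(seed)
    weights = [probs[b] for b in "ACGT"]
    return "".join(rng.choices("ACGT", weights=weights, k=length))
```

<p>Under these assumptions, iid_sequence fed with the chromosome's own probabilities would mimic rnd, and feeding it the output of symmetrize would mimic sym.</p>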
<p>To produce a positive control (with exceptional symmetry for
<italic>k</italic>
=2), we generated two types of random scenarios:
<list list-type="bullet">
<list-item>
<p>random with first-order dependence (mrnd): sequences are generated by a first-order Markov chain, using the nucleotide and dinucleotide composition of the human chromosome as input.</p>
<p>
<bold>Input:</bold>
matrix of nucleotide transition probabilities (
<inline-formula id="IEq8">
<alternatives>
<tex-math id="M27">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathbf {P}=\,[\pi _{K_{1}K_{2}}/\pi _{K_{1}}]$\end{document}</tex-math>
<mml:math id="M28">
<mml:mi mathvariant="bold">P</mml:mi>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq8.gif"></inline-graphic>
</alternatives>
</inline-formula>
with
<italic>K</italic>
<sub>1</sub>
,
<italic>K</italic>
<sub>2</sub>
∈{
<italic>A</italic>
,
<italic>C</italic>
,
<italic>G</italic>
,
<italic>T</italic>
}) and initial probabilities (
<italic>π</italic>
<sub>
<italic>A</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>C</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>G</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>T</italic>
</sub>
).</p>
</list-item>
<list-item>
<p>random exceptional symmetric with first-order dependence (msym): sequences are generated by a first-order Markov chain, using as input the nucleotide and dinucleotide composition of the human chromosome, with equal composition imposed on reverse complementary dinucleotides.</p>
<p>
<bold>Input:</bold>
matrix of nucleotide transition probabilities (
<inline-formula id="IEq9">
<alternatives>
<tex-math id="M29">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathbf {P}=\,[\pi _{K_{1}K_{2}}/\pi _{K_{1}}]$\end{document}</tex-math>
<mml:math id="M30">
<mml:mi mathvariant="bold">P</mml:mi>
<mml:mo>=</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mo>[</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>]</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq9.gif"></inline-graphic>
</alternatives>
</inline-formula>
with
<italic>K</italic>
<sub>1</sub>
,
<italic>K</italic>
<sub>2</sub>
∈{
<italic>A</italic>
,
<italic>C</italic>
,
<italic>G</italic>
,
<italic>T</italic>
}, subject to
<inline-formula id="IEq10">
<alternatives>
<tex-math id="M31">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${\pi _{w}}={\pi _{w^{\prime }}}$\end{document}</tex-math>
<mml:math id="M32">
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq10.gif"></inline-graphic>
</alternatives>
</inline-formula>
with
<italic>w</italic>
∈{
<italic>A</italic>
<italic>A</italic>
,
<italic>A</italic>
<italic>C</italic>
,…,
<italic>T</italic>
<italic>T</italic>
}) and initial probabilities (
<italic>π</italic>
<sub>
<italic>A</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>C</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>G</italic>
</sub>
,
<italic>π</italic>
<sub>
<italic>T</italic>
</sub>
, subject to
<inline-formula id="IEq11">
<alternatives>
<tex-math id="M33">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${\pi _{w}}={\pi _{w^{\prime }}}$\end{document}</tex-math>
<mml:math id="M34">
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>π</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>w</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>′</mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq11.gif"></inline-graphic>
</alternatives>
</inline-formula>
with
<italic>w</italic>
∈{
<italic>A</italic>
,
<italic>C</italic>
,
<italic>G</italic>
,
<italic>T</italic>
}).</p>
</list-item>
</list>
</p>
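<p>The positive controls can likewise be sketched as a first-order Markov chain. This hypothetical Python sketch estimates the dinucleotide probabilities from overlapping dinucleotides, takes the transition probability from <italic>K</italic><sub>1</sub> to <italic>K</italic><sub>2</sub> as the dinucleotide probability divided by the total probability of dinucleotides starting with <italic>K</italic><sub>1</sub> (one way to estimate the ratio in the matrix <bold>P</bold> above), and builds the msym input by averaging each dinucleotide probability with that of its reverse complement. The averaging rule and the estimator are assumptions; the initial nucleotide distribution is passed in separately.</p>

```python
import random
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dinucleotide_probabilities(sequence):
    """Relative frequencies of the 16 overlapping ACGT dinucleotides."""
    counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = sum(n for d, n in counts.items() if set(d).issubset(BASES))
    return {a + b: counts[a + b] / total for a in BASES for b in BASES}


def symmetrize_dinucleotides(pi2):
    """msym input (assumed rule): pi_w = pi_w' for every dinucleotide w,
    each set to the mean of w and its reverse complement w'."""
    return {w: (p + pi2[w.translate(COMPLEMENT)[::-1]]) / 2 for w, p in pi2.items()}


def markov_sequence(length, pi2, initial, seed=None):
    """First-order Markov chain: the first base is drawn from `initial`;
    each next base is drawn from P[prev][b] = pi2[prev + b] / pi_prev,
    where pi_prev is the row sum (probability of prev as first base)."""
    rng = random.Random(seed)
    seq = rng.choices(BASES, weights=[initial[b] for b in BASES])
    for _ in range(length - 1):
        prev = seq[-1]
        pi_prev = sum(pi2[prev + b] for b in BASES)
        row = [pi2[prev + b] / pi_prev for b in BASES]
        seq.append(rng.choices(BASES, weights=row)[0])
    return "".join(seq)
```

<p>Under these assumptions, markov_sequence driven by the chromosome's dinucleotide probabilities would mimic mrnd, while passing the symmetrized dinucleotide probabilities together with complement-symmetric initial probabilities would mimic msym.</p>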
</sec>
<sec id="Sec6">
<title>Coding region classification</title>
<p>We extracted from the Ensembl cds file the start and end positions of all known coding sequences whose gene biotype was “protein coding”. For genes with multiple transcripts, the gene start position was taken as the minimum start position over all transcripts of that gene, and the gene end position as the maximum end position over the same transcripts. For each chromosome, given a word length
<italic>k</italic>
and window size, windows that overlap a gene were labeled as coding neighbourhood windows, and windows that do not overlap any gene were labeled as non-coding windows.</p>
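<p>The labelling rule can be illustrated with a short Python sketch. The input format (one (gene, start, end) tuple per transcript, with 1-based inclusive coordinates) and the function names are assumptions made for the example; the sketch collapses transcripts into gene spans and then tests each non-overlapping window for overlap with any span.</p>

```python
def gene_spans(transcripts):
    """Collapse transcripts into one span per gene: (min start, max end).
    `transcripts` is an iterable of (gene_id, start, end) tuples."""
    spans = {}
    for gene, start, end in transcripts:
        if gene in spans:
            s, e = spans[gene]
            spans[gene] = (min(s, start), max(e, end))
        else:
            spans[gene] = (start, end)
    return list(spans.values())


def label_windows(chrom_length, window_size, spans):
    """Label each full non-overlapping window 'coding neighbourhood' if it
    overlaps at least one gene span, and 'non-coding' otherwise.
    Coordinates are 1-based and inclusive (an assumption)."""
    labels = []
    for w_start in range(1, chrom_length - window_size + 2, window_size):
        w_end = w_start + window_size - 1
        overlaps = any(end >= w_start and w_end >= start for start, end in spans)
        labels.append("coding neighbourhood" if overlaps else "non-coding")
    return labels
```

<p>The linear scan over genes keeps the sketch short; for whole chromosomes, sorting the spans and sweeping them together with the windows would be faster.</p>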
</sec>
<sec id="Sec7">
<title>Isochores region classification</title>
<p>We used the IsoSegmenter program [
<xref ref-type="bibr" rid="CR15">15</xref>
] with default parameters to classify the human genome into isochore families (L1, L2, H1, H2 and H3). For each chromosome, given a word length
<italic>k</italic>
and window size, windows fully included in an isochore were labeled with the corresponding isochore family. Windows spanning more than one isochore were discarded.</p>
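<p>The isochore labelling can be sketched similarly. This hypothetical Python sketch assumes the isochores are available as (start, end, family) tuples in 1-based inclusive coordinates; this is an assumed input format, not a description of the IsoSegmenter output.</p>

```python
def isochore_labels(chrom_length, window_size, isochores):
    """Label each full non-overlapping window with the family of the
    isochore that fully contains it; windows not contained in a single
    isochore get None and would be discarded downstream.
    `isochores` is a list of (start, end, family) tuples (assumed format)."""
    labels = []
    for w_start in range(1, chrom_length - window_size + 2, window_size):
        w_end = w_start + window_size - 1
        family = None
        for start, end, fam in isochores:
            if w_start >= start and end >= w_end:
                family = fam
                break
        labels.append(family)
    return labels
```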
</sec>
<sec id="Sec8">
<title>DNA segmentation procedure</title>
<p>To assess the association between the local exceptional symmetry values and their biological relevance, we propose a threshold-based method that segments DNA into regions of high and low exceptional symmetry.</p>
<p>To perform the DNA segmentation on the exceptional symmetry profile (the sequence of exceptional symmetry values, also referred to as
<italic>V</italic>
<italic>R</italic>
sequences), we need to choose an appropriate window size and word length. The window size and word length that show the widest diversity of local behaviours along the sequence are the most promising for segmentation. Therefore, to explore the variability of local behaviours, we assess strict stationarity using the Kolmogorov-Smirnov (KS) statistic.</p>
<p>To assess stationarity, and to find the window size and word length that show the strongest lack of stationarity, we propose the following procedure:
<list list-type="bullet">
<list-item>
<p>the
<italic>V</italic>
<italic>R</italic>
chromosome sequence (the sequence of
<italic>V</italic>
<italic>R</italic>
values in each chromosome) is divided into successive non-overlapping subsequences (
<italic>V</italic>
<italic>R</italic>
subsequences) of fixed length (50, 100 or 200 values);</p>
</list-item>
<list-item>
<p>for each word length and for each window size, we compute the KS statistic between the
<italic>V</italic>
<italic>R</italic>
distribution of each subsequence and the
<italic>V</italic>
<italic>R</italic>
distribution of its complete chromosome sequence;</p>
</list-item>
<list-item>
<p>to characterise the lack of stationarity in each exceptional symmetry experiment (defined by the window size and the word length), we compute the average of all KS statistics obtained from the
<italic>V</italic>
<italic>R</italic>
subsequences;</p>
</list-item>
<list-item>
<p>we choose the window size and word length of the exceptional symmetry experiment with the highest average KS statistic.</p>
</list-item>
</list>
</p>
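<p>The stationarity screen can be sketched in plain Python. The two-sample KS statistic is computed directly as the largest gap between the two empirical distribution functions, evaluated at the pooled observed values, where that supremum is attained. The input layout (a dictionary from (window size, word length) to one VR sequence per chromosome) and the function names are assumptions.</p>

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    # The step functions only change at observed values, so the supremum
    # is attained at one of the pooled sample points.
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )


def mean_ks(vr_by_chromosome, sub_length):
    """Average KS statistic over all non-overlapping VR subsequences of
    `sub_length` values, each compared with its whole chromosome sequence."""
    stats = []
    for vr in vr_by_chromosome:
        for i in range(0, len(vr) - sub_length + 1, sub_length):
            stats.append(ks_statistic(vr[i:i + sub_length], vr))
    return mean(stats)


def most_nonstationary(experiments, sub_length):
    """Return the (window_size, k) key with the highest average KS value."""
    return max(experiments, key=lambda key: mean_ks(experiments[key], sub_length))
```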
<p>To perform the DNA segmentation:
<list list-type="bullet">
<list-item>
<p>we determined the quartiles of the
<italic>V</italic>
<italic>R</italic>
chromosome sequence;</p>
</list-item>
<list-item>
<p>we calculated the outlier threshold as the third quartile (
<italic>Q</italic>
<sub>3</sub>
) plus 1.5 times the interquartile range (
<italic>I</italic>
<italic>Q</italic>
<italic>R</italic>
):
<italic>Q</italic>
<sub>3</sub>
+1.5∗
<italic>I</italic>
<italic>Q</italic>
<italic>R</italic>
;</p>
</list-item>
<list-item>
<p>we identified the windows with
<italic>V</italic>
<italic>R</italic>
≥
<italic>Q</italic>
<sub>3</sub>
+1.5∗
<italic>I</italic>
<italic>Q</italic>
<italic>R</italic>
as the regions with very high local exceptional symmetry (outlying regions), and the remaining windows, with
<italic>V</italic>
<italic>R</italic>
below this threshold, as the regions without very high local exceptional symmetry (non-outlying regions).</p>
</list-item>
</list>
</p>
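<p>A minimal sketch of the thresholding step follows. Quartile conventions differ between software packages; using the linear-interpolation ("inclusive") method of Python's statistics.quantiles is an assumption here. The final helper, which merges consecutive windows with the same label into segments, is an illustrative addition.</p>

```python
from statistics import quantiles


def outlier_threshold(vr_values):
    """Q3 + 1.5 * IQR of a chromosome's VR sequence ('inclusive'
    quartiles, i.e. linear interpolation; an assumed convention)."""
    q1, _, q3 = quantiles(vr_values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def segment(vr_values):
    """True for outlying windows (VR >= threshold), False otherwise."""
    threshold = outlier_threshold(vr_values)
    return [vr >= threshold for vr in vr_values]


def runs(flags):
    """Merge consecutive windows with the same label into
    (first_window, last_window, is_outlying) segments."""
    segments = []
    for i, flag in enumerate(flags):
        if segments and segments[-1][2] == flag:
            segments[-1] = (segments[-1][0], i, flag)
        else:
            segments.append((i, i, flag))
    return segments
```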
</sec>
<sec id="Sec9">
<title>Functional annotation enrichments</title>
<p>Using BioMart, we extracted the annotation information from the Homo sapiens genes (GRCh37.p13) dataset of the Ensembl Genes database. To examine the functional annotation enrichment of outlying versus non-outlying regions, we computed the annotation density ratio (
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
) for each chromosome, defined by
<disp-formula id="Equ3">
<label>(3)</label>
<alternatives>
<tex-math id="M35">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \mathit{ADR}=\frac{\frac{n^{A}_{\text{outlying segments}}}{\sum{\text{outlying segments length}}}}{\frac{n^{A}_{\text{non-outlying segments}}}{\sum{\text{non-outlying segments length}}}} $$ \end{document}</tex-math>
<mml:math id="M36">
<mml:mi mathvariant="italic">ADR</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>outlying segments</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo>∑</mml:mo>
<mml:mtext>outlying segments length</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mtext>non-outlying segments</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mrow>
<mml:mo>∑</mml:mo>
<mml:mtext>non-outlying segments length</mml:mtext>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mfrac>
</mml:math>
<graphic xlink:href="12859_2016_905_Article_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<inline-formula id="IEq12">
<alternatives>
<tex-math id="M37">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}${n^{A}_{S}}$\end{document}</tex-math>
<mml:math id="M38">
<mml:msubsup>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq12.gif"></inline-graphic>
</alternatives>
</inline-formula>
denotes the number of annotations in the subset
<italic>S</italic>
. We used the chi-square test to evaluate whether the annotations are equally distributed between the two subsets. To better evaluate the differences between the subsets, we used adjusted residual analysis. Under the homogeneity hypothesis, the adjusted residuals approximately follow a standard normal distribution [
<xref ref-type="bibr" rid="CR16">16</xref>
].</p>
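<p>The density ratio (3) and one possible form of the homogeneity test are sketched below. The test assumes, as the null hypothesis, that annotations fall into the two subsets in proportion to their total lengths; this reading, the input format and the function names are assumptions. With two subsets the test has one degree of freedom, its p-value can be obtained from the complementary error function, and the squared adjusted residual equals the chi-square statistic.</p>

```python
import math


def adr(n_out, len_out, n_non, len_non):
    """Annotation density ratio (3): annotations per bp in outlying
    segments divided by annotations per bp in non-outlying segments."""
    return (n_out / len_out) / (n_non / len_non)


def homogeneity_test(n_out, len_out, n_non, len_non):
    """Chi-square test with expected counts proportional to segment
    length (assumed null) and the adjusted residual of the outlying
    subset. Returns (chi2, p_value, adjusted_residual)."""
    n = n_out + n_non
    p = len_out / (len_out + len_non)  # share of sequence that is outlying
    expected = (n * p, n * (1 - p))
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_out, n_non), expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    residual = (n_out - n * p) / math.sqrt(n * p * (1 - p))
    return chi2, p_value, residual
```

<p>For example, with hypothetical inputs of 30 annotations in 1000 bp of outlying segments and 100 annotations in 9000 bp of non-outlying segments, ADR = 2.7 and the adjusted residual of the outlying subset is about 4.97.</p>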
</sec>
</sec>
<sec id="Sec10">
<title>Results and discussion</title>
<p>In this study, we analysed local exceptional word symmetry in the complete human genome, considering words of length up to 10 in all human chromosomes. We performed a sliding window analysis in terms of exceptional symmetry (
<italic>V</italic>
<italic>R</italic>
). We obtained, when possible, results for the following window sizes: 10
<sup>
<italic>l</italic>
</sup>
, 2×10
<sup>
<italic>l</italic>
</sup>
, 5×10
<sup>
<italic>l</italic>
</sup>
base pairs, with
<italic>l</italic>
∈{3,4,5,6,7,8}.</p>
<p>We performed our analysis using five ACGT sequence types: real human chromosomes and the corresponding simulated sequences generated under four distinct random scenarios. For each window size and word length, we determined the exceptional symmetry (
<italic>V</italic>
<italic>R</italic>
) and symmetry
<inline-formula id="IEq13">
<alternatives>
<tex-math id="M39">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left ({X^{2}_{s}}\right)$\end{document}</tex-math>
<mml:math id="M40">
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12859_2016_905_Article_IEq13.gif"></inline-graphic>
</alternatives>
</inline-formula>
values. Each experiment was then summarized by the median and the median absolute deviation of these values.</p>
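<p>The per-experiment summary can be sketched with Python's statistics module. The sketch assumes that each experiment yields one VR value per window. It also reports the median absolute deviation unscaled, without the 1.4826 consistency factor, since the section does not say which convention was used. The function name is hypothetical.</p>

```python
import statistics


def median_and_mad(values):
    """Median and unscaled median absolute deviation of per-window values."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, mad
```

<p>Both statistics are robust to a small number of extreme windows. This matters here because the VR profiles contain strongly outlying segments.</p>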
<p>To evaluate the effect of chromosome type, window size and word length on the local exceptional symmetry behaviour, we used the median of the window
<italic>V</italic>
<italic>R</italic>
values of each ACGT sequence (human chromosomes or the corresponding random sequences).</p>
<p>Figure
<xref rid="Fig1" ref-type="fig">1</xref>
shows five box plots, one for each sequence type. The local exceptional symmetry in the human genome is clearly higher than in the random scenarios generated without exceptional symmetry (rnd and sym). Overall, it is similar to that of the random sequences generated with first-order Markov models (mrnd and msym).
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Local exceptional symmetry by sequence type in the human genome. Box plots comparing the local exceptional symmetry values (
<italic>V</italic>
<italic>R</italic>
median), pooling the results for all chromosomes, window lengths and word lengths, grouped by sequence type: human chromosomes (human), random scenarios with first-order Markov structure (without exact symmetry: mrnd; with exact symmetry for k = 2: msym), and random scenarios assuming nucleotide independence (without exact symmetry: rnd; with exact symmetry for k = 1: sym). The local exceptional symmetry in the human genome and in the positive control experiments (mrnd and msym) is higher than in the random scenarios generated without exceptional symmetry (rnd and sym)</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
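<p>The four random scenarios can be illustrated with a short simulation sketch. This is not the authors' generator, whose details are not given in this section. The sketch assumes the following:</p>
<list list-type="bullet">
<list-item><p>rnd draws nucleotides independently, with the frequencies of the real chromosome;</p></list-item>
<list-item><p>sym does the same after forcing p(A) = p(T) and p(C) = p(G) (exact symmetry for k = 1);</p></list-item>
<list-item><p>mrnd follows a first-order Markov chain, with transition probabilities estimated from the chromosome;</p></list-item>
<list-item><p>msym uses transition probabilities estimated after averaging each dinucleotide count with that of its reverse complement, so that the simulated dinucleotide frequencies approximately match their reverse complements (symmetry for k = 2).</p></list-item>
</list>
<p>All function names are hypothetical.</p>

```python
import random
from collections import Counter

BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def base_freqs(seq, symmetric=False):
    """Nucleotide frequencies; symmetric=True forces p(A)=p(T) and p(C)=p(G)."""
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    p = {b: counts[b] / total for b in BASES}
    if symmetric:
        p = {b: (p[b] + p[COMP[b]]) / 2 for b in BASES}
    return p


def transition_probs(seq, symmetric=False):
    """First-order transition probabilities estimated from dinucleotide counts.

    symmetric=True averages the count of each dinucleotide xy with that of its
    reverse complement (comp(y) comp(x)) before normalising each row.
    """
    pairs = Counter(zip(seq, seq[1:]))
    counts = {
        (x, y): (pairs[(x, y)] + pairs[(COMP[y], COMP[x])]) / 2 if symmetric
        else pairs[(x, y)]
        for x in BASES for y in BASES
    }
    probs = {}
    for x in BASES:
        row_total = sum(counts[(x, y)] for y in BASES)
        probs[x] = {y: counts[(x, y)] / row_total for y in BASES}
    return probs


def iid_sequence(p, length, rng):
    """rnd / sym scenario: independent draws with probabilities p."""
    return "".join(rng.choices(BASES, weights=[p[b] for b in BASES], k=length))


def markov_sequence(probs, p0, length, rng):
    """mrnd / msym scenario: first-order Markov chain started from p0."""
    seq = rng.choices(BASES, weights=[p0[b] for b in BASES])
    for _ in range(length - 1):
        row = probs[seq[-1]]
        seq += rng.choices(BASES, weights=[row[b] for b in BASES])
    return "".join(seq)
```

<p>A fixed seed, as in random.Random(0), makes each simulated control reproducible.</p>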
<p>Local exceptional symmetry does not differ significantly between chromosomes (Kruskal–Wallis test,
<italic>p</italic>
≫0.1). Figure
<xref rid="Fig2" ref-type="fig">2</xref>
compares the local exceptional symmetry of the individual human chromosomes using box plots; the similarity between chromosomes is readily apparent.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Local exceptional symmetry by human chromosome. Box plots comparing the local exceptional symmetry values (
<italic>V</italic>
<italic>R</italic>
median) by human chromosomes for all window and word lengths. The results are similar across all chromosomes</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
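<p>The Kruskal–Wallis statistic used above can be computed as in the following pure-Python sketch. It uses average ranks for ties and the usual tie correction. Each group would hold, for instance, the VR median values of one chromosome. The function name is hypothetical. The p-value follows from a chi-square distribution with g − 1 degrees of freedom for g groups, which is also what scipy.stats.kruskal reports.</p>

```python
from collections import defaultdict


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(v for values in groups for v in values)
    n = len(pooled)
    ranks_of = defaultdict(list)            # value -> 1-based ranks it occupies
    for rank, v in enumerate(pooled, start=1):
        ranks_of[v].append(rank)
    avg_rank = {v: sum(r) / len(r) for v, r in ranks_of.items()}
    rank_sums = (sum(avg_rank[v] for v in values) for values in groups)
    h = 12.0 / (n * (n + 1)) * sum(
        s ** 2 / len(values) for s, values in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    # Tie correction; undefined if every observation has the same value
    ties = sum(len(r) ** 3 - len(r) for r in ranks_of.values())
    return h / (1.0 - ties / (n ** 3 - n))
```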
<p>Figure
<xref rid="Fig3" ref-type="fig">3</xref>
plots the median of the local exceptional symmetry values by word length, using the data for all chromosomes and all window sizes. Excluding
<italic>k</italic>
=2, both the local exceptional symmetry in the human genome and its dispersion decrease with increasing word length. The effect of word length on local exceptional symmetry differs significantly between the human and random scenarios (random and symmetric random). As expected, for shorter word lengths the random scenarios with first-order dependence structure (mrnd and msym) yield higher local exceptional symmetry values than the human chromosomes, but for
<italic>k</italic>
≥7 the human chromosomes surpass these positive controls. In Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
the results of all chromosomes are combined, but the same behaviour is observed in each chromosome individually.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Local exceptional symmetry by word length. Line plot comparing the median of local exceptional symmetry values (
<italic>V</italic>
<italic>R</italic>
median) by word length, for human and random scenarios. The sym and rnd scenarios overlap. The local exceptional symmetry in the human genome decreases with increasing word length, excluding
<italic>k</italic>
=2. The effect of word length in local exceptional symmetry is significantly different in human and random scenarios. For
<italic>k</italic>
≥7 the local exceptional symmetry in the human genome surpasses that obtained by positive control scenarios</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
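<p>The sketch below gives some context on how word length interacts with window size. For a given k, it counts the reverse-complement word pairs and the palindromic words (words equal to their own reverse complement, which exist only for even k). It also compares the number of overlapping k-mers in one window with the number of possible k-mers. For example, a 20,000 bp window contains 19,994 overlapping 7-mers, against 16,384 possible 7-mers. The helper names are hypothetical.</p>

```python
from itertools import product

# str.translate table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Reverse complement of an uppercase ACGT word."""
    return word.translate(COMPLEMENT)[::-1]


def rc_pair_counts(k):
    """(reverse-complement word pairs, palindromic words) among all k-mers."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    palindromes = sum(1 for w in words if w == reverse_complement(w))
    return (len(words) - palindromes) // 2, palindromes


def words_per_window(window, k):
    """Overlapping k-mers in one window of the given length vs possible k-mers."""
    return window - k + 1, 4 ** k
```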
<sec id="Sec11">
<title>Window size effect</title>
<p>Figure
<xref rid="Fig4" ref-type="fig">4</xref>
shows the local exceptional symmetry values by window size. As expected, in the presence of exceptional symmetry, the local exceptional symmetry values increase with window size.
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>Local exceptional symmetry by window length. Line plot comparing the median of local exceptional symmetry values (
<italic>V</italic>
<italic>R</italic>
median) by window length, for human and random scenarios. The local exceptional symmetry increases with increasing window length, except for the negative control scenarios (sym and rnd)</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
<p>The random sequences msym and mrnd were generated with forced exceptional symmetry under stationary behaviour, and their local exceptional symmetry values increase with window size. For the random sequences without exceptional symmetry (sym and rnd), the local
<italic>V</italic>
<italic>R</italic>
values are nearly constant (see Fig.
<xref rid="Fig4" ref-type="fig">4</xref>
).</p>
<p>All human chromosomes exhibit increasing exceptional symmetry with increasing window length. In general, this behaviour is similar to that of the random sequences with first-order Markov structure, although the first-order Markov sequences reach higher values than the human sequences.</p>
</sec>
<sec id="Sec12">
<title>Local exceptional symmetry stationarity</title>
<p>To find the window size and word length with the highest potential to reveal distinct local behaviour along the sequence, we explored stationarity using the procedure described previously. Figure
<xref rid="Fig5" ref-type="fig">5</xref>
presents a heat map of the Kolmogorov–Smirnov statistic by word length and window size in the human genome, obtained with
<italic>V</italic>
<italic>R</italic>
subsequences of length 200. The results obtained with
<italic>V</italic>
<italic>R</italic>
subsequences of length 50 and 100 are similar (not shown). The human genome shows non-stationary local exceptional symmetry behaviour: the local results differ from the global ones. The maximum value is observed for
<italic>k</italic>
=7 and a window size of 20,000 bp, and the second highest for
<italic>k</italic>
=6 and a window size of 10,000 bp.
<fig id="Fig5">
<label>Fig. 5</label>
<caption>
<p>Heat map for Kolmogorov-Smirnov statistics. Kolmogorov-Smirnov statistics by word length and window size, using
<italic>V</italic>
<italic>R</italic>
subsequences of length 200, for the complete human genome. The human genome shows non-stationary local exceptional symmetry behaviour, with maximum value for
<italic>k</italic>
=7 and window length equal to 20,000 bp</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig5_HTML" id="MO5"></graphic>
</fig>
</p>
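<p>The Kolmogorov–Smirnov statistic can be sketched as the largest vertical distance between two empirical distribution functions. The stationarity procedure itself is described in the Methods section and is not reproduced here. As a hypothetical stand-in, blockwise_ks compares each block of 200 consecutive VR values with the whole VR profile of a chromosome and reports the largest statistic. Both function names and the aggregation by maximum are assumptions.</p>

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:   # the supremum is attained at one of the sample points
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def blockwise_ks(profile, block_len=200):
    """Largest D between consecutive blocks of a VR profile and the whole profile.

    Hypothetical aggregation; requires len(profile) >= block_len.
    """
    blocks = [profile[i:i + block_len]
              for i in range(0, len(profile) - block_len + 1, block_len)]
    return max(ks_statistic(block, profile) for block in blocks)
```

<p>For a stationary profile the blocks resemble the whole profile and D stays small. A trend or regime change along the chromosome inflates D.</p>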
</sec>
<sec id="Sec13">
<title>Local exceptional ECG symmetry</title>
<p>As
<italic>G</italic>
<sub>0</sub>
and
<italic>G</italic>
<sub>
<italic>k</italic>
</sub>
are the sets with the fewest elements, higher variability in
<italic>V</italic>
<italic>R</italic>
results is expected, and this was confirmed in all sequences under study (results not shown). We verified that almost all human ECGs have higher
<italic>V</italic>
<italic>R</italic>
values than the random scenarios.</p>
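<p>A short count illustrates why G_0 and G_k are the smallest groups. It assumes, for illustration only, that the ECG G_i collects the k-mers containing exactly i letters from one complementary pair (here C and G); the formal definition is given in the Methods section and is not reproduced here. Choosing the i positions and then one of two letters at each of the k positions gives |G_i| = C(k, i) · 2^k words. This is smallest, 2^k words, for i = 0 and i = k. For k = 7, G_0 and G_7 therefore contain 128 words each, against 4,480 in G_3 and G_4. The helper name is hypothetical.</p>

```python
from itertools import product
from math import comb


def ecg_sizes(k, letters="CG"):
    """Number of k-mers with exactly i letters from the given set, for i = 0..k.

    Hypothetical grouping used only to illustrate group sizes.
    """
    sizes = [0] * (k + 1)
    for word in product("ACGT", repeat=k):
        sizes[sum(ch in letters for ch in word)] += 1
    return sizes


# Enumeration agrees with the closed form C(k, i) * 2**k
k = 7
assert ecg_sizes(k) == [comb(k, i) * 2 ** k for i in range(k + 1)]
```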
<p>Figure
<xref rid="Fig6" ref-type="fig">6</xref>
compares the local ECG exceptional symmetry results of the human and random sequences for word length 7 and window length 20,000 bp. In the human genome, the ECG
<italic>G</italic>
<sub>7</sub>
has the highest local exceptional symmetry values (and dispersion). Surprisingly, the human
<italic>G</italic>
<sub>0</sub>
has lower median
<italic>V</italic>
<italic>R</italic>
values than the random sequences that incorporate exceptional symmetry.
<fig id="Fig6">
<label>Fig. 6</label>
<caption>
<p>Local exceptional symmetry by ECG. Line plot comparing local exceptional symmetry median values by ECG for
<italic>k</italic>
=7 and 20,000 bp window length in the human genome and random scenarios. ECG
<italic>G</italic>
<sub>7</sub>
has the highest local exceptional symmetry values and
<italic>G</italic>
<sub>0</sub>
has the lowest</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig6_HTML" id="MO6"></graphic>
</fig>
</p>
</sec>
<sec id="Sec14">
<title>Segmentation</title>
<p>We have observed exceptional symmetry throughout the genome, including coding and non-coding regions. Figure
<xref rid="Fig7" ref-type="fig">7</xref>
shows the chromosome median exceptional symmetry values for
<italic>k</italic>
=7 and window length 20,000 bp, divided into two sets: the coding neighbourhood windows (70,646
<italic>V</italic>
<italic>R</italic>
subsequences), and the non-coding windows (72,543
<italic>V</italic>
<italic>R</italic>
subsequences). The coding neighbourhood windows show significantly higher
<italic>V</italic>
<italic>R</italic>
values than non-coding windows (
<italic>p</italic>
<0.001, z-test). However, the effect size of the difference is small (Cohen’s
<italic>d</italic>
≈0.2). Figure
<xref rid="Fig8" ref-type="fig">8</xref>
presents box plots comparing the local exceptional symmetry median values for
<italic>k</italic>
=7 and 20,000 bp window length in the five isochore families: L1, L2, H1, H2 and H3. The differences in local exceptional symmetry between H and L isochores are large and significant (
<italic>p</italic>
<0.001,
<italic>d</italic>
>0.8).
<fig id="Fig7">
<label>Fig. 7</label>
<caption>
<p>Chromosome median exceptional symmetry values. Box plots comparing the local exceptional symmetry median values by chromosome in coding neighbourhood windows and in non-coding windows. The analysis was performed with
<italic>k</italic>
=7 and 20,000 bp window length. Chromosome 19 has the highest median
<italic>V</italic>
<italic>R</italic>
value in both window sets. The coding neighbourhood windows show significantly higher
<italic>V</italic>
<italic>R</italic>
values than non-coding windows</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig7_HTML" id="MO7"></graphic>
</fig>
<fig id="Fig8">
<label>Fig. 8</label>
<caption>
<p>Chromosome median exceptional symmetry values. Box plots comparing the local exceptional symmetry median values by chromosome in five isochore families: L1, L2, H1, H2 and H3. The analysis was performed with
<italic>k</italic>
=7 and 20,000 bp window length. Chromosome 19 has outlying median
<italic>V</italic>
<italic>R</italic>
values in all isochore families. The H isochores show significantly higher
<italic>V</italic>
<italic>R</italic>
values than L isochores</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig8_HTML" id="MO8"></graphic>
</fig>
</p>
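<p>The two effect measures quoted above can be sketched as follows: a large-sample two-sided z-test for the difference in mean VR between two window sets, and Cohen's d with a pooled standard deviation. The sketch assumes these standard forms, since the section does not give the exact formulas. The function name is hypothetical. With tens of thousands of windows per set, as here, even a small effect (d ≈ 0.2) yields a very small p-value.</p>

```python
import math
import statistics


def z_test_and_cohens_d(a, b):
    """Two-sided large-sample z-test and Cohen's d (pooled SD) for mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    z = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    p = math.erfc(abs(z) / math.sqrt(2))                      # two-sided p-value
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    return z, p, d
```

<p>Reporting d alongside the p-value separates statistical significance, which grows with sample size, from the magnitude of the difference.</p>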
<p>Additionally, several windows show strongly outlying behaviour; to identify them, we applied the outlier detection procedure described previously. As an example, Fig.
<xref rid="Fig9" ref-type="fig">9</xref>
shows the local exceptional symmetry results for chromosome 1. Table
<xref rid="Tab1" ref-type="table">1</xref>
shows the percentage of outlying segments by chromosome. In all chromosomes, the percentage of outliers is below 10 % (the maximum, 9.1 %, occurs in chromosome 7).
<fig id="Fig9">
<label>Fig. 9</label>
<caption>
<p>Chromosome 1 segmentation. Chromosome 1
<italic>V</italic>
<italic>R</italic>
results for
<italic>k</italic>
=7 and 20,000 bp window size. The horizontal line shows the threshold for segmentation into very high local exceptional symmetry regions (outliers) and complementary regions (non-outliers)</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig9_HTML" id="MO9"></graphic>
</fig>
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Description of the outlying segments by human chromosome for
<italic>k</italic>
=7 and a window size of 20,000 bp</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Chr</th>
<th align="left">1</th>
<th align="left">2</th>
<th align="left">3</th>
<th align="left">4</th>
<th align="left">5</th>
<th align="left">6</th>
<th align="left">7</th>
<th align="left">8</th>
<th align="left">9</th>
<th align="left">10</th>
<th align="left">11</th>
<th align="left">12</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Outlying segments %</td>
<td align="left">4.8</td>
<td align="left">6.1</td>
<td align="left">6.4</td>
<td align="left">7.6</td>
<td align="left">7.7</td>
<td align="left">6.5</td>
<td align="left">9.1</td>
<td align="left">6.8</td>
<td align="left">4.7</td>
<td align="left">5.3</td>
<td align="left">6.0</td>
<td align="left">6.5</td>
</tr>
<tr>
<td align="left">
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
</td>
<td align="left">3.5</td>
<td align="left">3.9</td>
<td align="left">4.1</td>
<td align="left">4.3</td>
<td align="left">3.8</td>
<td align="left">3.6</td>
<td align="left">2.9</td>
<td align="left">3.5</td>
<td align="left">4.0</td>
<td align="left">3.3</td>
<td align="left">3.3</td>
<td align="left">6.9</td>
</tr>
<tr>
<td align="left">
<italic>χ</italic>
<sup>2</sup>
test (p-value)</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
</tr>
<tr>
<td align="left">
<italic>ϕ</italic>
× 100</td>
<td align="left">9</td>
<td align="left">9</td>
<td align="left">10</td>
<td align="left">14</td>
<td align="left">13</td>
<td align="left">14</td>
<td align="left">12</td>
<td align="left">14</td>
<td align="left">13</td>
<td align="left">12</td>
<td align="left">13</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Chr</td>
<td align="left">13</td>
<td align="left">14</td>
<td align="left">15</td>
<td align="left">16</td>
<td align="left">17</td>
<td align="left">18</td>
<td align="left">19</td>
<td align="left">20</td>
<td align="left">21</td>
<td align="left">22</td>
<td align="left">X</td>
<td align="left">Y</td>
</tr>
<tr>
<td align="left">Outlying segments %</td>
<td align="left">5.6</td>
<td align="left">4.5</td>
<td align="left">4.4</td>
<td align="left">3.5</td>
<td align="left">3.0</td>
<td align="left">6.7</td>
<td align="left">1.5</td>
<td align="left">5.5</td>
<td align="left">5.0</td>
<td align="left">2.6</td>
<td align="left">8.6</td>
<td align="left">7.0</td>
</tr>
<tr>
<td align="left">
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
</td>
<td align="left">5.5</td>
<td align="left">4.3</td>
<td align="left">3.8</td>
<td align="left">3.1</td>
<td align="left">2.2</td>
<td align="left">4.5</td>
<td align="left">1.2</td>
<td align="left">2.5</td>
<td align="left">5.2</td>
<td align="left">2.5</td>
<td align="left">3.8</td>
<td align="left">1.5</td>
</tr>
<tr>
<td align="left">
<italic>χ</italic>
<sup>2</sup>
test (p-value)</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">**</td>
<td align="left">0.003</td>
<td align="left">**</td>
<td align="left">1</td>
<td align="left">**</td>
<td align="left">0.038</td>
</tr>
<tr>
<td align="left">
<italic>ϕ</italic>
× 100</td>
<td align="left">16</td>
<td align="left">15</td>
<td align="left">11</td>
<td align="left">7</td>
<td align="left">6</td>
<td align="left">14</td>
<td align="left">8</td>
<td align="left">9</td>
<td align="left">15</td>
<td align="left">8</td>
<td align="left">14</td>
<td align="left">19</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The p-values are adjusted using the Holm–Bonferroni method</p>
<p>
<sup>**</sup>
means that the p-value is lower than 0.001</p>
<p>
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
- annotation density ratio;
<italic>χ</italic>
<sup>2</sup>
test - p-value of chi-square test;
<italic>ϕ</italic>
- phi measure</p>
</table-wrap-foot>
</table-wrap>
</p>
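<p>The segmentation into outlying and non-outlying regions can be sketched as a threshold rule followed by merging consecutive flagged windows. The outlier detection procedure is defined in the Methods section and is not reproduced here. Purely for illustration, this sketch assumes a Tukey-style upper fence Q3 + 1.5 · IQR on the VR values of one chromosome. It also reports the percentage of windows flagged as outlying. The function names and the constant 1.5 are assumptions.</p>

```python
import statistics


def upper_fence(values, c=1.5):
    """Tukey-style upper fence Q3 + c * IQR (assumed threshold rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + c * (q3 - q1)


def segment(values, threshold):
    """Merge consecutive windows above threshold into outlying segments.

    Returns half-open window-index intervals and the percentage of windows
    flagged as outlying.
    """
    flags = [v > threshold for v in values]
    segments, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments, 100.0 * sum(flags) / len(flags)
```

<p>If windows are consecutive and non-overlapping (again an assumption), multiplying the interval bounds by the window length converts them to base-pair coordinates.</p>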
</sec>
<sec id="Sec15">
<title>Annotation results</title>
<p>To characterize the chromosome features associated with the outlying segments of local exceptional symmetry, we performed annotation enrichment analyses. Table
<xref rid="Tab1" ref-type="table">1</xref>
presents the annotation density ratio (
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
, Eq.
<xref rid="Equ3" ref-type="">3</xref>
) of the outlying segments vs non-outlying segments by chromosome, as defined by the DNA segmentation procedure described in the ‘
<xref rid="Sec2" ref-type="sec">Methods</xref>
’ section. We observe that in the human genome the
<italic>A</italic>
<italic>D</italic>
<italic>R</italic>
values are higher than 1 for all chromosomes, meaning that annotation density is higher in outlying than in non-outlying segments (mean 3.6, standard deviation 1.2). Table
<xref rid="Tab1" ref-type="table">1</xref>
also shows the p-values of the chi-square test for homogeneity of annotation types between outlying and non-outlying segments. The p-values were adjusted using the Holm–Bonferroni method [
<xref ref-type="bibr" rid="CR17">17</xref>
]. Almost all chromosomes display significant differences in annotation between segment types (outlying vs non-outlying). In chromosome 22, however, the difference is not significant (adjusted p-value equal to 1), possibly because of the low percentage of outlying segments (2.6 %) and the chromosome size. Still, a dissimilarity between the annotations of outlying and non-outlying segments is present in all chromosomes, with the phi measure (
<italic>ϕ</italic>
) ranging from 0.06 to 0.19.</p>
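<p>The Holm–Bonferroni adjustment applied to the per-chromosome chi-square p-values can be written in a few lines. The sketch follows Holm's step-down procedure:</p>
<list list-type="order">
<list-item><p>sort the m p-values in increasing order;</p></list-item>
<list-item><p>multiply the j-th smallest by m − j + 1;</p></list-item>
<list-item><p>enforce monotonicity with a running maximum;</p></list-item>
<list-item><p>cap the result at 1.</p></list-item>
</list>
<p>The cap is why an adjusted p-value can be exactly 1, as reported for chromosome 22 in Table 1. The function name is hypothetical.</p>

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by increasing p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):                      # rank 0 is the smallest p
        candidate = min(1.0, (m - rank) * pvalues[i])     # multiply by m - j + 1
        running_max = max(running_max, candidate)         # keep adjusted p monotone
        adjusted[i] = running_max
    return adjusted
```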
<p>Figure
<xref rid="Fig10" ref-type="fig">10</xref>
presents a heat map of the adjusted residuals from the homogeneity analysis of gene type annotation, using data from all chromosomes. The counts of protein-coding gene annotations in outlying segments are significantly larger than expected (adjusted residual equal to 37.7). In non-outlying segments, by contrast, long intergenic non-coding RNA (lincRNA), microRNA (miRNA), antisense and pseudogene annotations predominate (adjusted residuals equal to 22.0, 9.6, 15.5 and 20.3, respectively).
<fig id="Fig10">
<label>Fig. 10</label>
<caption>
<p>Association residual analysis between gene type annotation and segment type. Heat map of adjusted residuals by each gene and segment type for all human chromosomes data. The counts of protein-coding gene annotations in outlying segments are significantly larger than expected, whereas in non-outlying segments long intergenic non-coding RNAs (lincRNA), microRNAs (miRNAs), antisense and pseudogene annotations predominate</p>
</caption>
<graphic xlink:href="12859_2016_905_Fig10_HTML" id="MO10"></graphic>
</fig>
</p>
</sec>
</sec>
<sec id="Sec16" sec-type="conclusion">
<title>Conclusion</title>
<p>The local exceptional symmetry profile provides a numerical signature along genomic sequences. The proposed procedure to analyse local exceptional symmetry in the human genome can be applied to any genomic sequence as a segmentation procedure and also as a genomic signature. The results obtained in this work suggest that for the human genome there is an optimal word length and window size to explore the local exceptional symmetry (7 and 20,000 bp, respectively).</p>
<p>The local exceptional symmetry in the human genome differs markedly from that of the random scenarios (with either independent symbols or first-order Markov structure), showing, as expected, non-stationary behaviour. Globally, the human genome exhibits high local exceptional symmetry values. For some word lengths these are lower than the values for the positive control experiments, but they are higher than the values for the negative control experiments.</p>
<p>The global statistical pattern (location and dispersion) obtained from the exceptional symmetry profiles is present in all chromosomes of the human genome. The local profile is chromosome-specific, and the regions with very high exceptional symmetry values are strongly associated with the presence of protein-coding genes, although non-coding regions also present exceptional symmetry. Additionally, the local exceptional symmetry values are positively correlated with GC content, as captured by the isochore families.</p>
</sec>
</body>
<back>
<fn-group>
<fn>
<p>
<bold>Competing interests</bold>
</p>
<p>The authors declare that they have no competing interests.</p>
</fn>
<fn>
<p>
<bold>Authors’ contributions</bold>
</p>
<p>VA idea for study conception/procedures, statistical analysis of local exceptional symmetry values, BioMart data acquisition, interpretation of data results and writing paper. JMOSR coding and optimization of programming procedures to obtain local exceptional symmetry values and critically revising the paper. CACB critically discussing the procedures presented, generating random sequences and critically revising the paper. RMS critically discussing the use of genomic annotation to evaluate the local exceptional symmetry profiles and critically revising the paper. All authors read and approved the final manuscript.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>This work was supported by Portuguese funds through the iBiMED - Institute of Biomedicine, IEETA - Institute of Electronics and Telematics Engineering of Aveiro and the Portuguese Foundation for Science and Technology (“FCT–Fundação para a Ciência e a Tecnologia”), within projects: COMPETE/FEDER UID/BIM/04501/2013 and PEst-OE/EEI/UI0127/2014.</p>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chargaff</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Chemical specificity of nucleic acids and mechanism of their enzymatic degradation</article-title>
<source>Experientia</source>
<year>1950</year>
<volume>6</volume>
<issue>6</issue>
<fpage>201</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1007/BF02173653</pub-id>
<pub-id pub-id-type="pmid">15421335</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2</label>
<mixed-citation publication-type="other">Rudner R, Karkas JD, Chargaff E. Proc Nat Acad Sci USA. 1968; 60(2):630–5.</mixed-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karkas</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Rudner</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Chargaff</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Separation of B. subtilis DNA into complementary strands. II. Template functions and composition as determined by transcription with RNA polymerase</article-title>
<source>Proc Nat Acad Sci USA</source>
<year>1968</year>
<volume>60</volume>
<issue>3</issue>
<fpage>915</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.60.3.915</pub-id>
<pub-id pub-id-type="pmid">4970113</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<mixed-citation publication-type="other">Rudner R, Karkas JD, Chargaff E. Proc Nat Acad Sci USA. 1968; 60(3):921–2.</mixed-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Forsdyke</surname>
<given-names>DR</given-names>
</name>
</person-group>
<source>Evolutionary Bioinformatics</source>
<year>2011</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Qi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cuticchia</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>Compositional symmetries in complete genomes</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<issue>6</issue>
<fpage>557</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/17.6.557</pub-id>
<pub-id pub-id-type="pmid">11395434</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kong</surname>
<given-names>SG</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>HD</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>ZT</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Inverse symmetry in complete genomes and whole-genome inverse duplication</article-title>
<source>PLoS ONE</source>
<year>2009</year>
<volume>4</volume>
<issue>11</issue>
<fpage>7553</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0007553</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>YZ</given-names>
</name>
</person-group>
<article-title>Limited contribution of stem-loop potential to symmetry of single-stranded genomic DNA</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>4</issue>
<fpage>478</fpage>
<lpage>85</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp703</pub-id>
<pub-id pub-id-type="pmid">20031973</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Forsdyke</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Bell</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Purine loading, stem-loops and Chargaff’s second parity rule: a discussion of the application of elementary principles to early chemical observations</article-title>
<source>Appl Bioinformatics</source>
<year>2004</year>
<volume>3</volume>
<issue>1</issue>
<fpage>3</fpage>
<lpage>8</lpage>
<pub-id pub-id-type="doi">10.2165/00822942-200403010-00002</pub-id>
<pub-id pub-id-type="pmid">16323961</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baisnée</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Hampson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Baldi</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Why are complementary DNA strands symmetric?</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<issue>8</issue>
<fpage>1021</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/18.8.1021</pub-id>
<pub-id pub-id-type="pmid">12176825</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Albrecht-Buehler</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Inversions and inverted transpositions as the basis for an almost universal “format” of genome sequences</article-title>
<source>Genomics</source>
<year>2007</year>
<volume>90</volume>
<fpage>297</fpage>
<lpage>305</lpage>
<pub-id pub-id-type="doi">10.1016/j.ygeno.2007.05.010</pub-id>
<pub-id pub-id-type="pmid">17582735</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lobry</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Lobry</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Evolution of DNA base composition under no-strand-bias conditions when the substitution rates are not constant</article-title>
<source>Mol Biol Evol</source>
<year>1999</year>
<volume>16</volume>
<fpage>719</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="doi">10.1093/oxfordjournals.molbev.a026156</pub-id>
<pub-id pub-id-type="pmid">10368950</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Powdel</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Satapathy</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Jha</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Buragohain</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Borah</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff’s second parity rule)</article-title>
<source>DNA Res</source>
<year>2009</year>
<volume>16</volume>
<fpage>325</fpage>
<lpage>43</lpage>
<pub-id pub-id-type="doi">10.1093/dnares/dsp021</pub-id>
<pub-id pub-id-type="pmid">19861381</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Afreixo</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Rodrigues</surname>
<given-names>JMOS</given-names>
</name>
<name>
<surname>Bastos</surname>
<given-names>CAC</given-names>
</name>
</person-group>
<article-title>Analysis of single-strand exceptional word symmetry in the human genome: new measures</article-title>
<source>Biostatistics</source>
<year>2015</year>
<volume>16</volume>
<issue>2</issue>
<fpage>209</fpage>
<lpage>21</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxu041</pub-id>
<pub-id pub-id-type="pmid">25190514</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cozzi</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Milanesi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Bernardi</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Segmenting the human genome into isochores</article-title>
<source>Evol Bioinformatics</source>
<year>2015</year>
<volume>11</volume>
<fpage>253</fpage>
<lpage>61</lpage>
</element-citation>
</ref>
<ref id="CR16">
<label>16</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Agresti</surname>
<given-names>A</given-names>
</name>
</person-group>
<source>Categorical Data Analysis</source>
<year>2002</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>Wiley</publisher-name>
</element-citation>
</ref>
<ref id="CR17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holm</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>A simple sequentially rejective multiple test procedure</article-title>
<source>Scand J Stat</source>
<year>1979</year>
<volume>6</volume>
<issue>2</issue>
<fpage>65</fpage>
<lpage>70</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000109 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000109 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4738807
   |texte=   The exceptional genomic word symmetry along DNA sequences
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26842742" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024