MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Estimating the Entropy of DNA Sequences

Internal identifier: 000926 (Istex/Curation); previous: 000925; next: 000927

Authors: Armin O. Schmitt [Germany]; Hanspeter Herzel [Germany]

RBID : ISTEX:52BD28F5F3B717D185388B831C873D341ECDEF5D

Abstract

Abstract: The Shannon entropy is a standard measure for the order state of symbol sequences, such as, for example, DNA sequences. In order to incorporate correlations between symbols, the entropy of n-mers (consecutive strands of n symbols) has to be determined. Here, an assay is presented to estimate such higher order entropies (block entropies) for DNA sequences when the actual number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed using elementary statistical principles: The theorem of asymptotic equi-distribution and the Maximum Entropy Principle. Constraints are set to force the constructed distributions to adopt features which are characteristic for the real probability distribution. From the many solutions compatible with these constraints the one with the highest entropy is the most likely one according to the Maximum Entropy Principle. An algorithm performing this procedure is expounded. It is tested by applying it to various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein-Barr virus, are presented and compared with those of other information carriers (texts, computer source code, music). It seems as if DNA sequences possess much more freedom in the combination of the symbols of their alphabet than written language or computer source codes.
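
To make the notion of block entropy concrete, here is a minimal Python sketch of the plain frequency-count ("plug-in") estimate of the n-mer entropy. It is not the maximum-entropy reconstruction described in the abstract, only the naive baseline. The function name `block_entropy` and the toy sequence are assumptions introduced for illustration; they do not come from the paper.

```python
from collections import Counter
from math import log2


def block_entropy(sequence: str, n: int) -> float:
    """Plug-in (naive) estimate of the n-mer block entropy, in bits.

    Illustrative helper only; the name and signature are hypothetical.
    """
    if n < 1 or n > len(sequence):
        raise ValueError("n must be between 1 and len(sequence)")
    # Count every overlapping n-mer (window of n consecutive symbols).
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    # H_n = -sum_i p_i * log2(p_i), with p_i the relative frequency of n-mer i.
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    toy = "ACGT" * 8  # hypothetical periodic toy sequence, not real data
    for n in (1, 2, 3):
        print(n, block_entropy(toy, n))
```

A sequence of length N contains only N - n + 1 windows, so this estimate can never exceed log2(N - n + 1) bits, whereas a four-letter alphabet allows up to 2n bits. This is the undersampling regime the abstract refers to, in which the number of observations is small compared with the number of possible outcomes and the naive estimate falls below the true entropy.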

DOI: 10.1006/jtbi.1997.0493

Links to Exploration step

ISTEX:52BD28F5F3B717D185388B831C873D341ECDEF5D

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Estimating the Entropy of DNA Sequences</title>
<author>
<name sortKey="Schmitt, Armin O" sort="Schmitt, Armin O" uniqKey="Schmitt A" first="Armin O." last="Schmitt">Armin O. Schmitt</name>
<affiliation wicri:level="1">
<mods:affiliation>MPI für molekulare Genetik, Ihnestraße 73, D-14195, Berlin, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>MPI für molekulare Genetik, Ihnestraße 73, D-14195, Berlin</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Herzel, Hanspeter" sort="Herzel, Hanspeter" uniqKey="Herzel H" first="Hanspeter" last="Herzel">Hanspeter Herzel</name>
<affiliation wicri:level="1">
<mods:affiliation>Institut für Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, D-10115, Berlin, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institut für Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, D-10115, Berlin</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:52BD28F5F3B717D185388B831C873D341ECDEF5D</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1006/jtbi.1997.0493</idno>
<idno type="url">https://api.istex.fr/ark:/67375/6H6-9S6W0QGK-2/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000926</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000926</idno>
<idno type="wicri:Area/Istex/Curation">000926</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Estimating the Entropy of DNA Sequences</title>
<author>
<name sortKey="Schmitt, Armin O" sort="Schmitt, Armin O" uniqKey="Schmitt A" first="Armin O." last="Schmitt">Armin O. Schmitt</name>
<affiliation wicri:level="1">
<mods:affiliation>MPI für molekulare Genetik, Ihnestraße 73, D-14195, Berlin, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>MPI für molekulare Genetik, Ihnestraße 73, D-14195, Berlin</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Herzel, Hanspeter" sort="Herzel, Hanspeter" uniqKey="Herzel H" first="Hanspeter" last="Herzel">Hanspeter Herzel</name>
<affiliation wicri:level="1">
<mods:affiliation>Institut für Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, D-10115, Berlin, Germany</mods:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institut für Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, D-10115, Berlin</wicri:regionArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Journal of Theoretical Biology</title>
<title level="j" type="abbrev">YJTBI</title>
<idno type="ISSN">0022-5193</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">188</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="369">369</biblScope>
<biblScope unit="page" to="377">377</biblScope>
</imprint>
<idno type="ISSN">0022-5193</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0022-5193</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="Teeft" xml:lang="en">
<term>Absolute values</term>
<term>Academic press</term>
<term>Amino</term>
<term>Amino acid</term>
<term>Amino acids</term>
<term>Average occurrence</term>
<term>Average values</term>
<term>Base pairs</term>
<term>Block entropies</term>
<term>Block entropy</term>
<term>Block length</term>
<term>Chromosome</term>
<term>Complete sequence</term>
<term>Computer languages</term>
<term>Computer source code</term>
<term>Computer source codes</term>
<term>Constraint</term>
<term>Continuous function</term>
<term>Correction method</term>
<term>Entropy</term>
<term>Entropy value</term>
<term>Entropy values</term>
<term>Equal probability</term>
<term>Expectation value</term>
<term>Experimental values</term>
<term>Frequent words</term>
<term>Genetic code</term>
<term>Gray code</term>
<term>Herzel</term>
<term>Highest entropy</term>
<term>Highest ranks</term>
<term>Identical fragments</term>
<term>Information theory</term>
<term>Literary texts</term>
<term>Maximal entropy</term>
<term>Maximal value</term>
<term>Maximum entropy</term>
<term>Maximum entropy principle</term>
<term>McMillan theorem</term>
<term>Model string</term>
<term>Finite</term>
<term>Finite sample</term>
<term>Finite size</term>
<term>Other hand</term>
<term>Possible combinations</term>
<term>Previous section</term>
<term>Probability distribution</term>
<term>Probability distributions</term>
<term>Protein sequences</term>
<term>Random positions</term>
<term>Random sequence</term>
<term>Random sequences</term>
<term>Real sequences</term>
<term>Relative frequencies</term>
<term>Repetitive sequences</term>
<term>Same length</term>
<term>Small changes</term>
<term>Small samples</term>
<term>Solitons fractals</term>
<term>Standard words</term>
<term>Symbol sequences</term>
<term>Theoretical values</term>
<term>Virus genome</term>
<term>Yeast</term>
<term>Yeast chromosome</term>
<term>Yeast sequence</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The Shannon entropy is a standard measure for the order state of symbol sequences, such as, for example, DNA sequences. In order to incorporate correlations between symbols, the entropy of n-mers (consecutive strands of n symbols) has to be determined. Here, an assay is presented to estimate such higher order entropies (block entropies) for DNA sequences when the actual number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed using elementary statistical principles: The theorem of asymptotic equi-distribution and the Maximum Entropy Principle. Constraints are set to force the constructed distributions to adopt features which are characteristic for the real probability distribution. From the many solutions compatible with these constraints the one with the highest entropy is the most likely one according to the Maximum Entropy Principle. An algorithm performing this procedure is expounded. It is tested by applying it to various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein-Barr virus, are presented and compared with those of other information carriers (texts, computer source code, music). It seems as if DNA sequences possess much more freedom in the combination of the symbols of their alphabet than written language or computer source codes.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000926 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000926 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:52BD28F5F3B717D185388B831C873D341ECDEF5D
   |texte=   Estimating the Entropy of DNA Sequences
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021