SARS exploration server

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information has therefore not been validated.

Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.

Internal identifier: 001150 (Ncbi/Checkpoint); previous: 001149; next: 001151

Authors: Tiee-Jian Wu [Taiwan]; Ying-Hsueh Huang; Lung-An Li

Source: Bioinformatics (Oxford, England), 2005 (ISSN 1367-4803)

RBID : pubmed:16144805

Abstract

Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences.
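
The abstract names a word-based measure, SK-LD, without giving its formula. As a rough illustration only, the sketch below compares two DNA sequences through the frequencies of their overlapping words of length k and averages the two directed Kullback-Leibler divergences. The function names, the pseudocount, and the 1/2 normalization are assumptions of this sketch and may differ from the definition used in the paper.

```python
import math
from collections import Counter
from itertools import product


def word_frequencies(seq, k, pseudocount=1.0):
    """Relative frequencies of all 4**k DNA words of length k in seq.

    Overlapping words are counted. Words containing letters other than
    A, C, G, T are ignored. The pseudocount is an assumption of this
    sketch; it keeps every frequency strictly positive so that the
    logarithms below are defined.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    return {w: (counts[w] + pseudocount) / total for w in words}


def symmetric_kl(seq_a, seq_b, k, pseudocount=1.0):
    """Average of KL(p || q) and KL(q || p) over k-word frequencies.

    Uses the identity KL(p||q) + KL(q||p) = sum((p - q) * log(p / q)).
    The 1/2 factor is a convention chosen for this sketch.
    """
    p = word_frequencies(seq_a, k, pseudocount)
    q = word_frequencies(seq_b, k, pseudocount)
    return 0.5 * sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)
```

Calling `symmetric_kl(x, y, k=3)` compares two sequences on their 3-word frequencies. The word size `k` is the parameter whose choice the abstract discusses: larger values distinguish more words but leave fewer observations per word for a given window size.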

DOI: 10.1093/bioinformatics/bti658
PubMed: 16144805


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.</title>
<author>
<name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Statistics, National Cheng-Kung University, Tainan, Taiwan. tjwu@stat.ncku.edu.tw</nlm:affiliation>
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Statistics, National Cheng-Kung University, Tainan</wicri:regionArea>
<wicri:noRegion>Tainan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
</author>
<author>
<name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16144805</idno>
<idno type="pmid">16144805</idno>
<idno type="doi">10.1093/bioinformatics/bti658</idno>
<idno type="wicri:Area/PubMed/Corpus">002552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002552</idno>
<idno type="wicri:Area/PubMed/Curation">002552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002552</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002593</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002593</idno>
<idno type="wicri:Area/Ncbi/Merge">001150</idno>
<idno type="wicri:Area/Ncbi/Curation">001150</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001150</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.</title>
<author>
<name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Statistics, National Cheng-Kung University, Tainan, Taiwan. tjwu@stat.ncku.edu.tw</nlm:affiliation>
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Statistics, National Cheng-Kung University, Tainan</wicri:regionArea>
<wicri:noRegion>Tainan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
</author>
<author>
<name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="ISSN">1367-4803</idno>
<imprint>
<date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Computers</term>
<term>DNA (chemistry)</term>
<term>Databases, Genetic</term>
<term>Databases, Protein</term>
<term>Escherichia coli (genetics)</term>
<term>Genes, Bacterial</term>
<term>Genome</term>
<term>Humans</term>
<term>Lipoprotein Lipase (genetics)</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Oligonucleotide Array Sequence Analysis</term>
<term>Oligonucleotide Probes (chemistry)</term>
<term>Open Reading Frames</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>SARS Virus (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Shigella flexneri (genetics)</term>
<term>Software</term>
<term>Species Specificity</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN ()</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Bases de données de protéines</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique ()</term>
<term>Cadres ouverts de lecture</term>
<term>Escherichia coli (génétique)</term>
<term>Gènes bactériens</term>
<term>Génome</term>
<term>Humains</term>
<term>Lipoprotein lipase (génétique)</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
<term>Ordinateurs</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Shigella flexneri (génétique)</term>
<term>Simulation numérique</term>
<term>Sondes oligonucléotidiques ()</term>
<term>Spécificité d'espèce</term>
<term>Séquençage par oligonucléotides en batterie</term>
<term>Virus du SRAS (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>DNA</term>
<term>Oligonucleotide Probes</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Escherichia coli</term>
<term>Lipoprotein Lipase</term>
<term>SARS Virus</term>
<term>Shigella flexneri</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Escherichia coli</term>
<term>Lipoprotein lipase</term>
<term>Shigella flexneri</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Computer Simulation</term>
<term>Computers</term>
<term>Databases, Genetic</term>
<term>Databases, Protein</term>
<term>Genes, Bacterial</term>
<term>Genome</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Oligonucleotide Array Sequence Analysis</term>
<term>Open Reading Frames</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>Software</term>
<term>Species Specificity</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>ADN</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données de protéines</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cadres ouverts de lecture</term>
<term>Gènes bactériens</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
<term>Ordinateurs</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Simulation numérique</term>
<term>Sondes oligonucléotidiques</term>
<term>Spécificité d'espèce</term>
<term>Séquençage par oligonucléotides en batterie</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Taïwan</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
<name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</noCountry>
<country name="Taïwan">
<noRegion>
<name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
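
The record above uses the prefixes `wicri:` and `nlm:` without declaring them, so a strict XML parser would reject it as written. The following is a minimal sketch of one way to read it with the Python standard library, assuming placeholder namespace URIs (the real ones are not given in this page) and the hypothetical helper names `parse_record` and `summarize`.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs: the record does not declare these namespaces, so the
# values below are assumptions made only to satisfy the parser.
ASSUMED_NS = {
    "wicri": "urn:example:wicri",
    "nlm": "urn:example:nlm",
}


def parse_record(text):
    """Parse a <record> string after declaring the assumed prefixes."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in ASSUMED_NS.items())
    # Only the first <record> start tag is patched.
    patched = text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect title, author names, PMID and DOI from the TEI header."""
    header = root.find("./TEI/teiHeader/fileDesc")
    title = header.findtext("./titleStmt/title")
    authors = [n.text for n in header.findall("./titleStmt/author/name")]
    # Later idno elements with a repeated type overwrite earlier ones.
    ids = {i.get("type"): i.text
           for i in header.findall("./publicationStmt/idno")}
    return {
        "title": title,
        "authors": authors,
        "pmid": ids.get("pmid"),
        "doi": ids.get("doi"),
    }
```

With the record text held in a string `xml_text`, `summarize(parse_record(xml_text))` returns a small dictionary with the title, the author names, and the PubMed and DOI identifiers.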

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001150 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 001150 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Ncbi
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:16144805
   |texte=   Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i   -Sk "pubmed:16144805" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021