Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.
Internal identifier: 001150 (Ncbi/Checkpoint); previous: 001149; next: 001151
Authors: Tiee-Jian Wu [Taiwan]; Ying-Hsueh Huang; Lung-An Li
Source:
- Bioinformatics (Oxford, England) [ 1367-4803 ] ; 2005.
French descriptors
- KwdFr :
- ADN, Algorithmes, Analyse de séquence d'ADN, Bases de données de protéines, Bases de données génétiques, Biologie informatique, Cadres ouverts de lecture, Escherichia coli (génétique), Gènes bactériens, Génome, Humains, Lipoprotein lipase (génétique), Logiciel, Modèles génétiques, Modèles statistiques, Ordinateurs, Phylogénie, Reconnaissance automatique des formes, Shigella flexneri (génétique), Simulation numérique, Sondes oligonucléotidiques, Spécificité d'espèce, Séquençage par oligonucléotides en batterie, Virus du SRAS (génétique).
- MESH :
- génétique : Escherichia coli, Lipoprotein lipase, Shigella flexneri, Virus du SRAS.
- ADN, Algorithmes, Analyse de séquence d'ADN, Bases de données de protéines, Bases de données génétiques, Biologie informatique, Cadres ouverts de lecture, Gènes bactériens, Génome, Humains, Logiciel, Modèles génétiques, Modèles statistiques, Ordinateurs, Phylogénie, Reconnaissance automatique des formes, Simulation numérique, Sondes oligonucléotidiques, Spécificité d'espèce, Séquençage par oligonucléotides en batterie.
English descriptors
- KwdEn :
- Algorithms, Computational Biology (methods), Computer Simulation, Computers, DNA (chemistry), Databases, Genetic, Databases, Protein, Escherichia coli (genetics), Genes, Bacterial, Genome, Humans, Lipoprotein Lipase (genetics), Models, Genetic, Models, Statistical, Oligonucleotide Array Sequence Analysis, Oligonucleotide Probes (chemistry), Open Reading Frames, Pattern Recognition, Automated, Phylogeny, SARS Virus (genetics), Sequence Analysis, DNA (methods), Shigella flexneri (genetics), Software, Species Specificity.
- MESH :
- chemical, chemistry : DNA, Oligonucleotide Probes.
- genetics : Escherichia coli, Lipoprotein Lipase, SARS Virus, Shigella flexneri.
- methods : Computational Biology, Sequence Analysis, DNA.
- Algorithms, Computer Simulation, Computers, Databases, Genetic, Databases, Protein, Genes, Bacterial, Genome, Humans, Models, Genetic, Models, Statistical, Oligonucleotide Array Sequence Analysis, Open Reading Frames, Pattern Recognition, Automated, Phylogeny, Software, Species Specificity.
Abstract
Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences.
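As a rough illustration of the word-based approach the abstract describes, the sketch below compares the k-word frequency distributions of two DNA sequences with a symmetrized Kullback-Leibler divergence. It is not the authors' implementation: the function names, the pseudocount used to avoid zero frequencies, and the scaling (a plain sum of the two directed divergences) are all assumptions, and the paper's exact SK-LD definition and its window handling are not given in this record.

```python
from collections import Counter
from itertools import product
from math import log

ALPHABET = "ACGT"


def word_frequencies(seq, k, pseudocount=1.0):
    """Smoothed relative frequencies of all 4**k overlapping k-words in seq.

    The pseudocount is an assumption of this sketch; it keeps every
    frequency strictly positive so the logarithms below are defined.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Words containing symbols outside ACGT are ignored.
    total = sum(counts[w] for w in words) + pseudocount * len(words)
    return {w: (counts[w] + pseudocount) / total for w in words}


def symmetric_kl(seq_a, seq_b, k):
    """KL(p || q) + KL(q || p) over k-word frequencies (hypothetical helper).

    The two directed divergences add up to sum((p - q) * log(p / q)).
    """
    p = word_frequencies(seq_a, k)
    q = word_frequencies(seq_b, k)
    return sum((p[w] - q[w]) * log(p[w] / q[w]) for w in p)


if __name__ == "__main__":
    a = "ACGTACGTACGTACGT"
    b = "AAAAAAAACCCCCCCC"
    for k in (1, 2, 3):
        print(k, symmetric_kl(a, b, k))
```

Running the loop for several values of k shows how the measured dissimilarity depends on the word size, which is the sensitivity that the guideline mentioned in the abstract addresses.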
DOI: 10.1093/bioinformatics/bti658
PubMed: 16144805
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 002552
- to stream PubMed, to step Curation: 002552
- to stream PubMed, to step Checkpoint: 002593
- to stream Ncbi, to step Merge: 001150
- to stream Ncbi, to step Curation: 001150
Links to Exploration step
pubmed:16144805
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.</title>
<author><name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Statistics, National Cheng-Kung University, Tainan, Taiwan. tjwu@stat.ncku.edu.tw</nlm:affiliation>
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Statistics, National Cheng-Kung University, Tainan</wicri:regionArea>
<wicri:noRegion>Tainan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
</author>
<author><name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16144805</idno>
<idno type="pmid">16144805</idno>
<idno type="doi">10.1093/bioinformatics/bti658</idno>
<idno type="wicri:Area/PubMed/Corpus">002552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002552</idno>
<idno type="wicri:Area/PubMed/Curation">002552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002552</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002593</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">002593</idno>
<idno type="wicri:Area/Ncbi/Merge">001150</idno>
<idno type="wicri:Area/Ncbi/Curation">001150</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001150</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences.</title>
<author><name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
<affiliation wicri:level="1"><nlm:affiliation>Department of Statistics, National Cheng-Kung University, Tainan, Taiwan. tjwu@stat.ncku.edu.tw</nlm:affiliation>
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Statistics, National Cheng-Kung University, Tainan</wicri:regionArea>
<wicri:noRegion>Tainan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
</author>
<author><name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</author>
</analytic>
<series><title level="j">Bioinformatics (Oxford, England)</title>
<idno type="ISSN">1367-4803</idno>
<imprint><date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Computational Biology (methods)</term>
<term>Computer Simulation</term>
<term>Computers</term>
<term>DNA (chemistry)</term>
<term>Databases, Genetic</term>
<term>Databases, Protein</term>
<term>Escherichia coli (genetics)</term>
<term>Genes, Bacterial</term>
<term>Genome</term>
<term>Humans</term>
<term>Lipoprotein Lipase (genetics)</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Oligonucleotide Array Sequence Analysis</term>
<term>Oligonucleotide Probes (chemistry)</term>
<term>Open Reading Frames</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>SARS Virus (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Shigella flexneri (genetics)</term>
<term>Software</term>
<term>Species Specificity</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN ()</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Bases de données de protéines</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique ()</term>
<term>Cadres ouverts de lecture</term>
<term>Escherichia coli (génétique)</term>
<term>Gènes bactériens</term>
<term>Génome</term>
<term>Humains</term>
<term>Lipoprotein lipase (génétique)</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
<term>Ordinateurs</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Shigella flexneri (génétique)</term>
<term>Simulation numérique</term>
<term>Sondes oligonucléotidiques ()</term>
<term>Spécificité d'espèce</term>
<term>Séquençage par oligonucléotides en batterie</term>
<term>Virus du SRAS (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en"><term>DNA</term>
<term>Oligonucleotide Probes</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Escherichia coli</term>
<term>Lipoprotein Lipase</term>
<term>SARS Virus</term>
<term>Shigella flexneri</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Escherichia coli</term>
<term>Lipoprotein lipase</term>
<term>Shigella flexneri</term>
<term>Virus du SRAS</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Computational Biology</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Computer Simulation</term>
<term>Computers</term>
<term>Databases, Genetic</term>
<term>Databases, Protein</term>
<term>Genes, Bacterial</term>
<term>Genome</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Oligonucleotide Array Sequence Analysis</term>
<term>Open Reading Frames</term>
<term>Pattern Recognition, Automated</term>
<term>Phylogeny</term>
<term>Software</term>
<term>Species Specificity</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>ADN</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Bases de données de protéines</term>
<term>Bases de données génétiques</term>
<term>Biologie informatique</term>
<term>Cadres ouverts de lecture</term>
<term>Gènes bactériens</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
<term>Ordinateurs</term>
<term>Phylogénie</term>
<term>Reconnaissance automatique des formes</term>
<term>Simulation numérique</term>
<term>Sondes oligonucléotidiques</term>
<term>Spécificité d'espèce</term>
<term>Séquençage par oligonucléotides en batterie</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Several measures of DNA sequence dissimilarity have been developed. The purpose of this paper is 3-fold. Firstly, we compare the performance of several word-based or alignment-based methods. Secondly, we give a general guideline for choosing the window size and determining the optimal word sizes for several word-based measures at different window sizes. Thirdly, we use a large-scale simulation method to simulate data from the distribution of SK-LD (symmetric Kullback-Leibler discrepancy). These simulated data can be used to estimate the degree of dissimilarity beta between any pair of DNA sequences.</div>
</front>
</TEI>
<affiliations><list><country><li>Taïwan</li>
</country>
</list>
<tree><noCountry><name sortKey="Huang, Ying Hsueh" sort="Huang, Ying Hsueh" uniqKey="Huang Y" first="Ying-Hsueh" last="Huang">Ying-Hsueh Huang</name>
<name sortKey="Li, Lung An" sort="Li, Lung An" uniqKey="Li L" first="Lung-An" last="Li">Lung-An Li</name>
</noCountry>
<country name="Taïwan"><noRegion><name sortKey="Wu, Tiee Jian" sort="Wu, Tiee Jian" uniqKey="Wu T" first="Tiee-Jian" last="Wu">Tiee-Jian Wu</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001150 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 001150 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= SrasV1 |flux= Ncbi |étape= Checkpoint |type= RBID |clé= pubmed:16144805 |texte= Optimal word sizes for dissimilarity measures and estimation of the degree of dissimilarity between DNA sequences. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i -Sk "pubmed:16144805" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd \
       | NlmPubMed2Wicri -a SrasV1
This area was generated with Dilib version V0.6.33.