MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

On optimizing distance-based similarity search for biological databases.

Internal identifier: 002E92 (Main/Exploration); previous: 002E91; next: 002E93

Authors: Rui Mao [United States]; Weijia Xu; Smriti Ramakrishnan; Glen Nuckolls; Daniel P. Miranker

Source: Proceedings. IEEE Computational Systems Bioinformatics Conference, 2005

RBID : pubmed:16447992

French descriptors

English descriptors

Abstract

Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high-dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high-dimensional and deserve different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic that seeks centers and show that it outperforms the most widely used corner-seeking heuristic. Similarly, we develop a data partitioning approach that is sensitive to the actual data distribution, in lieu of median splits.
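
As a rough illustration of the general idea behind distance-based (pivot) indexing mentioned in the abstract, the following minimal Python sketch uses Hamming distance on DNA k-mers and a single pivot to prune candidates through the triangle inequality. It is not the authors' MVP-tree implementation or their pivot selection heuristic: the function names (`hamming`, `range_search`), the single hand-picked pivot, and the sample k-mers are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length k-mers differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def range_search(
    data: Sequence[str],
    pivot: str,
    pivot_dists: Sequence[int],
    query: str,
    radius: int,
    dist: Callable[[str, str], int] = hamming,
) -> List[str]:
    """Return items within `radius` of `query`, skipping items the pivot rules out.

    `pivot_dists[i]` must hold the precomputed distance dist(data[i], pivot).
    """
    dq = dist(query, pivot)
    hits = []
    for item, dp in zip(data, pivot_dists):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x).
        # If this lower bound already exceeds the radius, x cannot match,
        # so the (possibly expensive) distance computation is skipped.
        if abs(dq - dp) > radius:
            continue
        if dist(query, item) <= radius:
            hits.append(item)
    return hits


# Hypothetical toy data set; the pivot is chosen by hand, not by any heuristic.
kmers = ["ACGTA", "ACGTT", "TTTTT", "ACCTA", "GGGGG"]
pivot = "ACGTA"
pivot_dists = [hamming(k, pivot) for k in kmers]
result = range_search(kmers, pivot, pivot_dists, "ACGTC", 1)
```

In this sketch the distances to the pivot are computed once, when the data set is indexed, and each query then pays for one distance to the pivot plus the distances to the candidates that survive pruning. How much is pruned depends on which pivot is chosen, which is the kind of choice the heuristics named in the abstract address.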

DOI: 10.1109/csb.2005.42
PubMed: 16447992


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">On optimizing distance-based similarity search for biological databases.</title>
<author>
<name sortKey="Mao, Rui" sort="Mao, Rui" uniqKey="Mao R" first="Rui" last="Mao">Rui Mao</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Sciences, Center for Computational Biology and Bioinformatics, University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233, USA. rmao@cs.utexas.edu</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Sciences, Center for Computational Biology and Bioinformatics, University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233</wicri:regionArea>
<placeName>
<region type="state">Texas</region>
<settlement type="city">Austin (Texas)</settlement>
</placeName>
<orgName type="university">Université du Texas à Austin</orgName>
</affiliation>
</author>
<author>
<name sortKey="Xu, Weijia" sort="Xu, Weijia" uniqKey="Xu W" first="Weijia" last="Xu">Weijia Xu</name>
</author>
<author>
<name sortKey="Ramakrishnan, Smriti" sort="Ramakrishnan, Smriti" uniqKey="Ramakrishnan S" first="Smriti" last="Ramakrishnan">Smriti Ramakrishnan</name>
</author>
<author>
<name sortKey="Nuckolls, Glen" sort="Nuckolls, Glen" uniqKey="Nuckolls G" first="Glen" last="Nuckolls">Glen Nuckolls</name>
</author>
<author>
<name sortKey="Miranker, Daniel P" sort="Miranker, Daniel P" uniqKey="Miranker D" first="Daniel P" last="Miranker">Daniel P. Miranker</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2005">2005</date>
<idno type="RBID">pubmed:16447992</idno>
<idno type="pmid">16447992</idno>
<idno type="doi">10.1109/csb.2005.42</idno>
<idno type="wicri:Area/PubMed/Corpus">002282</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002282</idno>
<idno type="wicri:Area/PubMed/Curation">002282</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002282</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002190</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002190</idno>
<idno type="wicri:Area/Ncbi/Merge">000399</idno>
<idno type="wicri:Area/Ncbi/Curation">000399</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000399</idno>
<idno type="wicri:doubleKey">1551-7497:2005:Mao R:on:optimizing:distance</idno>
<idno type="wicri:Area/Main/Merge">002F22</idno>
<idno type="wicri:Area/Main/Curation">002E92</idno>
<idno type="wicri:Area/Main/Exploration">002E92</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">On optimizing distance-based similarity search for biological databases.</title>
<author>
<name sortKey="Mao, Rui" sort="Mao, Rui" uniqKey="Mao R" first="Rui" last="Mao">Rui Mao</name>
<affiliation wicri:level="4">
<nlm:affiliation>Department of Computer Sciences, Center for Computational Biology and Bioinformatics, University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233, USA. rmao@cs.utexas.edu</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Sciences, Center for Computational Biology and Bioinformatics, University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233</wicri:regionArea>
<placeName>
<region type="state">Texas</region>
<settlement type="city">Austin (Texas)</settlement>
</placeName>
<orgName type="university">Université du Texas à Austin</orgName>
</affiliation>
</author>
<author>
<name sortKey="Xu, Weijia" sort="Xu, Weijia" uniqKey="Xu W" first="Weijia" last="Xu">Weijia Xu</name>
</author>
<author>
<name sortKey="Ramakrishnan, Smriti" sort="Ramakrishnan, Smriti" uniqKey="Ramakrishnan S" first="Smriti" last="Ramakrishnan">Smriti Ramakrishnan</name>
</author>
<author>
<name sortKey="Nuckolls, Glen" sort="Nuckolls, Glen" uniqKey="Nuckolls G" first="Glen" last="Nuckolls">Glen Nuckolls</name>
</author>
<author>
<name sortKey="Miranker, Daniel P" sort="Miranker, Daniel P" uniqKey="Miranker D" first="Daniel P" last="Miranker">Daniel P. Miranker</name>
</author>
</analytic>
<series>
<title level="j">Proceedings. IEEE Computational Systems Bioinformatics Conference</title>
<idno type="ISSN">1551-7497</idno>
<imprint>
<date when="2005" type="published">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Database Management Systems</term>
<term>Databases, Genetic</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Pattern Recognition, Automated (methods)</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Analysis (methods)</term>
<term>Sequence Homology</term>
<term>User-Computer Interface</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences ()</term>
<term>Analyse de séquence ()</term>
<term>Bases de données génétiques</term>
<term>Intelligence artificielle</term>
<term>Interface utilisateur</term>
<term>Mémorisation et recherche des informations ()</term>
<term>Reconnaissance automatique des formes ()</term>
<term>Similitude de séquences</term>
<term>Systèmes de gestion de bases de données</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Database Management Systems</term>
<term>Databases, Genetic</term>
<term>Sequence Homology</term>
<term>User-Computer Interface</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence</term>
<term>Bases de données génétiques</term>
<term>Intelligence artificielle</term>
<term>Interface utilisateur</term>
<term>Mémorisation et recherche des informations</term>
<term>Reconnaissance automatique des formes</term>
<term>Similitude de séquences</term>
<term>Systèmes de gestion de bases de données</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types, protein k-mers with the metric PAM model, DNA k-mers with Hamming distance and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional, and deserving different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic seeking centers and show it outperforms the most widely used corner seeking heuristic. Similarly, we develop a data partitioning approach sensitive to the actual data distribution in lieu of median splits.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Texas</li>
</region>
<settlement>
<li>Austin (Texas)</li>
</settlement>
<orgName>
<li>Université du Texas à Austin</li>
</orgName>
</list>
<tree>
<noCountry>
<name sortKey="Miranker, Daniel P" sort="Miranker, Daniel P" uniqKey="Miranker D" first="Daniel P" last="Miranker">Daniel P. Miranker</name>
<name sortKey="Nuckolls, Glen" sort="Nuckolls, Glen" uniqKey="Nuckolls G" first="Glen" last="Nuckolls">Glen Nuckolls</name>
<name sortKey="Ramakrishnan, Smriti" sort="Ramakrishnan, Smriti" uniqKey="Ramakrishnan S" first="Smriti" last="Ramakrishnan">Smriti Ramakrishnan</name>
<name sortKey="Xu, Weijia" sort="Xu, Weijia" uniqKey="Xu W" first="Weijia" last="Xu">Weijia Xu</name>
</noCountry>
<country name="États-Unis">
<region name="Texas">
<name sortKey="Mao, Rui" sort="Mao, Rui" uniqKey="Mao R" first="Rui" last="Mao">Rui Mao</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002E92 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002E92 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:16447992
   |texte=   On optimizing distance-based similarity search for biological databases.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:16447992" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021