Improving Mining Quality by Exploiting Data Dependency
Internal identifier: 001211 (Main/Exploration); previous: 001210; next: 001212
Authors: Fang Chu [United States]; Yizhou Wang [United States]; Carlo Zaniolo [United States]; Stott Parker [United States]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2005.
Abstract
The usefulness of the results produced by data mining methods can be critically impaired by several factors such as (1) low quality of data, including errors due to contamination, or incompleteness due to limited bandwidth for data acquisition, and (2) inadequacy of the data model for capturing complex probabilistic relationships in data. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. Therefore, dependencies among data can be successfully exploited to remedy the problems mentioned above. In this paper, we propose a unified approach to improving mining quality using Markov networks as the data model to exploit local dependencies. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, so as to clean the data, to infer missing values, or to improve the mining results from a model that ignores these dependencies. To illustrate the benefits and great generality of the technique, we present its application to three challenging problems: (i) cost-efficient sensor probing, (ii) enhancing protein function predictions, and (iii) sequence data denoising.
DOI: 10.1007/11430919_57
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000B72
- to stream Istex, to step Curation: 000B57
- to stream Istex, to step Checkpoint: 000B20
- to stream Main, to step Merge: 001247
- to stream Main, to step Curation: 001211
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Mining Quality by Exploiting Data Dependency</title>
<author><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
</author>
<author><name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
</author>
<author><name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
</author>
<author><name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:AFBCB3524B957D24F9E051BE8B43D64EA51EBD54</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11430919_57</idno>
<idno type="url">https://api.istex.fr/document/AFBCB3524B957D24F9E051BE8B43D64EA51EBD54/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B72</idno>
<idno type="wicri:Area/Istex/Curation">000B57</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B20</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Chu F:improving:mining:quality</idno>
<idno type="wicri:Area/Main/Merge">001247</idno>
<idno type="wicri:Area/Main/Curation">001211</idno>
<idno type="wicri:Area/Main/Exploration">001211</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Improving Mining Quality by Exploiting Data Dependency</title>
<author><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">AFBCB3524B957D24F9E051BE8B43D64EA51EBD54</idno>
<idno type="DOI">10.1007/11430919_57</idno>
<idno type="ChapterID">57</idno>
<idno type="ChapterID">Chap57</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The usefulness of the results produced by data mining methods can be critically impaired by several factors such as (1) low quality of data, including errors due to contamination, or incompleteness due to limited bandwidth for data acquisition, and (2) inadequacy of the data model for capturing complex probabilistic relationships in data. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. Therefore, dependencies among data can be successfully exploited to remedy the problems mentioned above. In this paper, we propose a unified approach to improving mining quality using Markov networks as the data model to exploit local dependencies. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, so as to clean the data, to infer missing values, or to improve the mining results from a model that ignores these dependencies. To illustrate the benefits and great generality of the technique, we present its application to three challenging problems: (i) cost-efficient sensor probing, (ii) enhancing protein function predictions, and (iii) sequence data denoising.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
</region>
<name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
<name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
<name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001211 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001211 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:AFBCB3524B957D24F9E051BE8B43D64EA51EBD54 |texte= Improving Mining Quality by Exploiting Data Dependency }}
This area was generated with Dilib version V0.6.32.