Improving Mining Quality by Exploiting Data Dependency

Internal identifier: 001211 (Main/Exploration); previous: 001210; next: 001212

Authors: Fang Chu [United States]; Yizhou Wang [United States]; Carlo Zaniolo [United States]; Stott Parker [United States]

Source:

Lecture Notes in Computer Science [0302-9743]; 2005.

RBID: ISTEX:AFBCB3524B957D24F9E051BE8B43D64EA51EBD54

Abstract

The usefulness of the results produced by data mining methods can be critically impaired by several factors such as (1) low quality of data, including errors due to contamination, or incompleteness due to limited bandwidth for data acquisition, and (2) inadequacy of the data model for capturing complex probabilistic relationships in data. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. Therefore, dependencies among data can be successfully exploited to remedy the problems mentioned above. In this paper, we propose a unified approach to improving mining quality using Markov networks as the data model to exploit local dependencies. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, so as to clean the data, to infer missing values, or to improve the mining results from a model that ignores these dependencies. To illustrate the benefits and great generality of the technique, we present its application to three challenging problems: (i) cost-efficient sensor probing, (ii) enhancing protein function predictions, and (iii) sequence data denoising.
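
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of sum-product belief propagation on a small chain-shaped Markov network of binary sensors. It is not the authors' implementation: the function name `chain_marginals`, the `coupling` parameter, the chain topology, and the example readings are all assumptions made for this example. On a chain, message passing gives exact marginals, which makes it easy to see how a node with no reading of its own can be inferred from its neighbours.

```python
def chain_marginals(evidence, coupling=0.8):
    """Sum-product belief propagation on a chain of binary variables.

    evidence[i] = (phi_i(0), phi_i(1)) is the local potential of node i.
    coupling is the pairwise potential for two neighbours agreeing
    (1 - coupling for disagreeing); the value is a hypothetical choice.
    """
    n = len(evidence)
    psi = [[coupling, 1.0 - coupling], [1.0 - coupling, coupling]]

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # forward[i]: message reaching node i from its left neighbour.
    forward = [[1.0, 1.0] for _ in range(n)]
    for i in range(1, n):
        left = [evidence[i - 1][a] * forward[i - 1][a] for a in (0, 1)]
        forward[i] = normalize(
            [sum(left[a] * psi[a][b] for a in (0, 1)) for b in (0, 1)]
        )

    # backward[i]: message reaching node i from its right neighbour.
    backward = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        right = [evidence[i + 1][b] * backward[i + 1][b] for b in (0, 1)]
        backward[i] = normalize(
            [sum(psi[a][b] * right[b] for b in (0, 1)) for a in (0, 1)]
        )

    # Belief at each node: local evidence times both incoming messages.
    return [
        normalize([evidence[i][x] * forward[i][x] * backward[i][x] for x in (0, 1)])
        for i in range(n)
    ]


# Hypothetical readings: the middle sensor is missing (uniform evidence),
# while both neighbours report state 0 with high confidence.
readings = [(0.9, 0.1), (0.5, 0.5), (0.9, 0.1)]
marginals = chain_marginals(readings)
print(marginals[1])
```

In this toy setting the middle node's belief is pulled toward state 0 purely by its neighbours, which is the kind of missing-value inference the abstract describes. Networks with loops would need iterative (loopy) message passing, which this sketch does not cover.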

URL:

https://api.istex.fr/document/AFBCB3524B957D24F9E051BE8B43D64EA51EBD54/fulltext/pdf

DOI: 10.1007/11430919_57

Links to previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000B72
to stream Istex, to step Curation: 000B57
to stream Istex, to step Checkpoint: 000B20
to stream Main, to step Merge: 001247
to stream Main, to step Curation: 001211

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving Mining Quality by Exploiting Data Dependency</title>
<author><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
</author>
<author><name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
</author>
<author><name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
</author>
<author><name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:AFBCB3524B957D24F9E051BE8B43D64EA51EBD54</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11430919_57</idno>
<idno type="url">https://api.istex.fr/document/AFBCB3524B957D24F9E051BE8B43D64EA51EBD54/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B72</idno>
<idno type="wicri:Area/Istex/Curation">000B57</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B20</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Chu F:improving:mining:quality</idno>
<idno type="wicri:Area/Main/Merge">001247</idno>
<idno type="wicri:Area/Main/Curation">001211</idno>
<idno type="wicri:Area/Main/Exploration">001211</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Improving Mining Quality by Exploiting Data Dependency</title>
<author><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, 90095, Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">AFBCB3524B957D24F9E051BE8B43D64EA51EBD54</idno>
<idno type="DOI">10.1007/11430919_57</idno>
<idno type="ChapterID">57</idno>
<idno type="ChapterID">Chap57</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The usefulness of the results produced by data mining methods can be critically impaired by several factors such as (1) low quality of data, including errors due to contamination, or incompleteness due to limited bandwidth for data acquisition, and (2) inadequacy of the data model for capturing complex probabilistic relationships in data. Fortunately, a wide spectrum of applications exhibit strong dependencies between data samples. For example, the readings of nearby sensors are generally correlated, and proteins interact with each other when performing crucial functions. Therefore, dependencies among data can be successfully exploited to remedy the problems mentioned above. In this paper, we propose a unified approach to improving mining quality using Markov networks as the data model to exploit local dependencies. Belief propagation is used to efficiently compute the marginal or maximum posterior probabilities, so as to clean the data, to infer missing values, or to improve the mining results from a model that ignores these dependencies. To illustrate the benefits and great generality of the technique, we present its application to three challenging problems: (i) cost-efficient sensor probing, (ii) enhancing protein function predictions, and (iii) sequence data denoising.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
</region>
<name sortKey="Chu, Fang" sort="Chu, Fang" uniqKey="Chu F" first="Fang" last="Chu">Fang Chu</name>
<name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<name sortKey="Parker, Stott" sort="Parker, Stott" uniqKey="Parker S" first="Stott" last="Parker">Stott Parker</name>
<name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<name sortKey="Wang, Yizhou" sort="Wang, Yizhou" uniqKey="Wang Y" first="Yizhou" last="Wang">Yizhou Wang</name>
<name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
<name sortKey="Zaniolo, Carlo" sort="Zaniolo, Carlo" uniqKey="Zaniolo C" first="Carlo" last="Zaniolo">Carlo Zaniolo</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001211 | SxmlIndent | more

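Or, equivalently, assuming EXPLOR_AREA points to the area root ($WICRI_ROOT/Ticri/CIDE/explor/OcrV1):

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001211 | SxmlIndent | more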

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:AFBCB3524B957D24F9E051BE8B43D64EA51EBD54
   |texte=   Improving Mining Quality by Exploiting Data Dependency
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
