Learning Top-k Transformation Rules
Internal identifier: 001114 (Istex/Curation)
Authors: Sunanda Patro [Australia]; Wei Wang [Australia]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2011.
Abstract
Record linkage identifies multiple records that refer to the same entity even when they are not bit-wise identical, which makes it an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence cannot identify complex coreferent records. This seriously limits their effectiveness. In this work, we propose an automatic method that extracts the top-k high-quality transformation rules from a given set of possibly coreferent record pairs. Our algorithm performs careful local analyses of each record pair to generate candidate rules, and then chooses the top-k rules according to a scoring function. Extensive experiments on real datasets show that the proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.
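The abstract describes the pipeline only at a high level: analyse each record pair locally, generate candidate rules, then keep the top-k rules under a scoring function. The Python sketch below illustrates that overall shape under loudly stated assumptions. The "local analysis" is a hypothetical token alignment using the standard library's `difflib.SequenceMatcher`, and the "scoring function" is simply a rule's support (the number of pairs that propose it). Neither is the authors' actual method, and `candidate_rules` and `top_k_rules` are hypothetical names.

```python
from collections import Counter
from difflib import SequenceMatcher


def candidate_rules(a: str, b: str) -> set[tuple[str, str]]:
    """Propose rules lhs -> rhs from one record pair.

    Hypothetical local analysis: align the two lower-cased token
    sequences and treat every non-matching 'replace' block as a
    candidate transformation rule.
    """
    ta, tb = a.lower().split(), b.lower().split()
    rules = set()
    matcher = SequenceMatcher(a=ta, b=tb, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            rules.add((" ".join(ta[i1:i2]), " ".join(tb[j1:j2])))
    return rules


def top_k_rules(
    pairs: list[tuple[str, str]], k: int
) -> list[tuple[tuple[str, str], int]]:
    """Score each rule by its support (assumed scoring) and return the k best."""
    support = Counter()
    for a, b in pairs:
        # candidate_rules returns a set, so each pair adds at most 1 per rule.
        support.update(candidate_rules(a, b))
    # Descending support; ties broken lexicographically for a stable order.
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical example pairs, not data from the paper.
    pairs = [
        ("Univ of New South Wales", "University of New South Wales"),
        ("Dept of Computer Science, Univ of Sydney",
         "Department of Computer Science, University of Sydney"),
        ("Univ of Melbourne", "University of Melbourne"),
    ]
    for (lhs, rhs), count in top_k_rules(pairs, k=2):
        print(f"{lhs} -> {rhs} (support {count})")
```

On these toy pairs the sketch ranks `univ -> university` (support 3) above `dept -> department` (support 1). A real system would need a richer scoring function than raw support, for example one that penalises rules that also fire on non-coreferent pairs.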
Url: https://api.istex.fr/document/0E53B146AE762B16D3A5D89E42E870FCD55FC2D6/fulltext/pdf
DOI: 10.1007/978-3-642-23088-2_12
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001184
Links to Exploration step
ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Learning Top-k Transformation Rules</title>
<author><name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<affiliation wicri:level="1"><mods:affiliation>University of New South Wales, Australia</mods:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: sunandap@cse.unsw.edu.au</mods:affiliation>
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
<author><name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<affiliation wicri:level="1"><mods:affiliation>University of New South Wales, Australia</mods:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: weiw@cse.unsw.edu.au</mods:affiliation>
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23088-2_12</idno>
<idno type="url">https://api.istex.fr/document/0E53B146AE762B16D3A5D89E42E870FCD55FC2D6/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001184</idno>
<idno type="wicri:Area/Istex/Curation">001114</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Learning Top-k Transformation Rules</title>
<author><name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<affiliation wicri:level="1"><mods:affiliation>University of New South Wales, Australia</mods:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: sunandap@cse.unsw.edu.au</mods:affiliation>
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
<author><name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<affiliation wicri:level="1"><mods:affiliation>University of New South Wales, Australia</mods:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: weiw@cse.unsw.edu.au</mods:affiliation>
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<idno type="DOI">10.1007/978-3-642-23088-2_12</idno>
<idno type="ChapterID">12</idno>
<idno type="ChapterID">Chap12</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Record linkage identifies multiple records that refer to the same entity even when they are not bit-wise identical, which makes it an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence cannot identify complex coreferent records. This seriously limits their effectiveness. In this work, we propose an automatic method that extracts the top-k high-quality transformation rules from a given set of possibly coreferent record pairs. Our algorithm performs careful local analyses of each record pair to generate candidate rules, and then chooses the top-k rules according to a scoring function. Extensive experiments on real datasets show that the proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.</div>
</front>
</TEI>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001114 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 001114 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Istex |étape= Curation |type= RBID |clé= ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6 |texte= Learning Top-k Transformation Rules }}
This area was generated with Dilib version V0.6.32.