Exploration server on relations between France and Australia

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Exploiting Locality of Wikipedia Links in Entity Ranking

Internal identifier: 008824 (Main/Curation); previous: 008823; next: 008825

Authors: Jovan Pehcevski [France]; Anne-Marie Vercoustre [France]; James A. Thom [Australia]

RBID : ISTEX:BEA42901DF0CAD3A11948E18F556920A3DC02697

Abstract

Abstract: Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking entities from the Wikipedia XML document collection. Our approach utilises the known categories and the link structure of Wikipedia, and more importantly, exploits link co-occurrences to improve the effectiveness of entity ranking. Using the broad context of a full Wikipedia page as a baseline, we evaluate two different algorithms for identifying narrow contexts around the entity examples: one that uses predefined types of elements such as paragraphs, lists and tables; and another that dynamically identifies the contexts by utilising the underlying XML document structure. Our experiments demonstrate that the locality of Wikipedia links can be exploited to significantly improve the effectiveness of entity ranking.
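
The idea of link co-occurrence in narrow contexts can be illustrated with a minimal sketch. It is not the authors' implementation: the `link` element, its `target` attribute, the tag names in `STATIC_CONTEXT_TAGS`, and the function `cooccurrence_counts` are all assumptions chosen for illustration. The sketch treats each paragraph, list, or table as a static context and counts, for every other link target, how many such contexts it shares with an entity example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical element names treated as predefined ("static") narrow contexts.
STATIC_CONTEXT_TAGS = {"p", "list", "table"}


def cooccurrence_counts(page_xml, example_targets, context_tags=STATIC_CONTEXT_TAGS):
    """Count, per candidate link target, the contexts it shares with an entity example."""
    root = ET.fromstring(page_xml)
    counts = Counter()
    for element in root.iter():
        if element.tag not in context_tags:
            continue
        # Collect every link target found anywhere inside this context element.
        targets = {link.get("target") for link in element.iter("link")}
        targets.discard(None)
        # Only contexts that mention at least one entity example contribute.
        if targets & example_targets:
            for target in targets - example_targets:
                counts[target] += 1
    return counts


# Toy page using the assumed markup; "France" plays the role of the entity example.
page = """<article>
  <p><link target="France"/> and <link target="Germany"/> use the <link target="Euro"/>.</p>
  <p><link target="Japan"/> uses the <link target="Yen"/>.</p>
  <list><item><link target="France"/></item><item><link target="Italy"/></item></list>
</article>"""

counts = cooccurrence_counts(page, {"France"})
print(counts)
```

In this toy page, links that appear in the same paragraph or list as the example are counted, while links in unrelated paragraphs are ignored. Using the whole page as a single context would instead count every link on the page, which corresponds to the broad full-page baseline mentioned in the abstract.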

DOI: 10.1007/978-3-540-78646-7_25

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Exploiting Locality of Wikipedia Links in Entity Ranking</title>
<author>
<name sortKey="Pehcevski, Jovan" sort="Pehcevski, Jovan" uniqKey="Pehcevski J" first="Jovan" last="Pehcevski">Jovan Pehcevski</name>
</author>
<author>
<name sortKey="Vercoustre, Anne Marie" sort="Vercoustre, Anne Marie" uniqKey="Vercoustre A" first="Anne-Marie" last="Vercoustre">Anne-Marie Vercoustre</name>
</author>
<author>
<name sortKey="Thom, James A" sort="Thom, James A" uniqKey="Thom J" first="James A." last="Thom">James A. Thom</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:BEA42901DF0CAD3A11948E18F556920A3DC02697</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1007/978-3-540-78646-7_25</idno>
<idno type="url">https://api.istex.fr/document/BEA42901DF0CAD3A11948E18F556920A3DC02697/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002358</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">002358</idno>
<idno type="wicri:Area/Istex/Curation">002358</idno>
<idno type="wicri:Area/Istex/Checkpoint">001191</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">001191</idno>
<idno type="wicri:doubleKey">0302-9743:2008:Pehcevski J:exploiting:locality:of</idno>
<idno type="wicri:Area/Main/Merge">009085</idno>
<idno type="wicri:Area/Main/Curation">008824</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Exploiting Locality of Wikipedia Links in Entity Ranking</title>
<author>
<name sortKey="Pehcevski, Jovan" sort="Pehcevski, Jovan" uniqKey="Pehcevski J" first="Jovan" last="Pehcevski">Jovan Pehcevski</name>
<affiliation wicri:level="1">
<country xml:lang="fr">France</country>
<wicri:regionArea>INRIA, Rocquencourt</wicri:regionArea>
<wicri:noRegion>Rocquencourt</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Vercoustre, Anne Marie" sort="Vercoustre, Anne Marie" uniqKey="Vercoustre A" first="Anne-Marie" last="Vercoustre">Anne-Marie Vercoustre</name>
<affiliation wicri:level="1">
<country xml:lang="fr">France</country>
<wicri:regionArea>INRIA, Rocquencourt</wicri:regionArea>
<wicri:noRegion>Rocquencourt</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Thom, James A" sort="Thom, James A" uniqKey="Thom J" first="James A." last="Thom">James A. Thom</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>RMIT University, Melbourne</wicri:regionArea>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2008</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Anchor text</term>
<term>Broad context</term>
<term>Category score</term>
<term>Category similarity score</term>
<term>Consistent performance improvement</term>
<term>Czech republic</term>
<term>Document collection</term>
<term>Dynamic contexts</term>
<term>Entity</term>
<term>Entity disambiguation</term>
<term>Entity example</term>
<term>Entity examples</term>
<term>Entity page</term>
<term>Entity recognition</term>
<term>Euro</term>
<term>Euro page</term>
<term>European countries</term>
<term>Evaluation measures</term>
<term>External knowledge</term>
<term>Full page</term>
<term>Full page context</term>
<term>Fullpage statl statr dyncre</term>
<term>Global score</term>
<term>Good entity page</term>
<term>Inex</term>
<term>Information retrieval</term>
<term>Initial zettair score</term>
<term>International conference</term>
<term>Joint conference</term>
<term>Linear combination</term>
<term>Link analysis</term>
<term>Linkrank</term>
<term>Linkrank function</term>
<term>Linkrank module</term>
<term>Linkrank score</term>
<term>Links</term>
<term>Module</term>
<term>Narrow contexts</term>
<term>Open source search engines</term>
<term>Optimal value</term>
<term>Optimal values</term>
<term>Pehcevski</term>
<term>Performance scores</term>
<term>Positive impact</term>
<term>Query</term>
<term>Ranking entities</term>
<term>Relevance assessments</term>
<term>Relevant entities</term>
<term>Retrieval</term>
<term>Retrieving entities</term>
<term>Rmit university</term>
<term>Search engine</term>
<term>Second task</term>
<term>Static contexts</term>
<term>Target entities</term>
<term>Target entity</term>
<term>Target entity page</term>
<term>Test collection</term>
<term>Topic title</term>
<term>Vercoustre</term>
<term>Wikipedia</term>
<term>Wikipedia categories</term>
<term>Wikipedia links</term>
<term>Wikipedia page</term>
<term>Wikipedia pages</term>
<term>Zettair</term>
</keywords>
<keywords scheme="Teeft" xml:lang="en">
<term>Algorithm</term>
<term>Anchor text</term>
<term>Broad context</term>
<term>Category score</term>
<term>Category similarity score</term>
<term>Consistent performance improvement</term>
<term>Czech republic</term>
<term>Document collection</term>
<term>Dynamic contexts</term>
<term>Entity</term>
<term>Entity disambiguation</term>
<term>Entity example</term>
<term>Entity examples</term>
<term>Entity page</term>
<term>Entity recognition</term>
<term>Euro</term>
<term>Euro page</term>
<term>European countries</term>
<term>Evaluation measures</term>
<term>External knowledge</term>
<term>Full page</term>
<term>Full page context</term>
<term>Fullpage statl statr dyncre</term>
<term>Global score</term>
<term>Good entity page</term>
<term>Inex</term>
<term>Information retrieval</term>
<term>Initial zettair score</term>
<term>International conference</term>
<term>Joint conference</term>
<term>Linear combination</term>
<term>Link analysis</term>
<term>Linkrank</term>
<term>Linkrank function</term>
<term>Linkrank module</term>
<term>Linkrank score</term>
<term>Links</term>
<term>Module</term>
<term>Narrow contexts</term>
<term>Open source search engines</term>
<term>Optimal value</term>
<term>Optimal values</term>
<term>Pehcevski</term>
<term>Performance scores</term>
<term>Positive impact</term>
<term>Query</term>
<term>Ranking entities</term>
<term>Relevance assessments</term>
<term>Relevant entities</term>
<term>Retrieval</term>
<term>Retrieving entities</term>
<term>Rmit university</term>
<term>Search engine</term>
<term>Second task</term>
<term>Static contexts</term>
<term>Target entities</term>
<term>Target entity</term>
<term>Target entity page</term>
<term>Test collection</term>
<term>Topic title</term>
<term>Vercoustre</term>
<term>Wikipedia</term>
<term>Wikipedia categories</term>
<term>Wikipedia links</term>
<term>Wikipedia page</term>
<term>Wikipedia pages</term>
<term>Zettair</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Euro</term>
<term>Conférence internationale</term>
<term>Moteur de recherche</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking entities from the Wikipedia XML document collection. Our approach utilises the known categories and the link structure of Wikipedia, and more importantly, exploits link co-occurrences to improve the effectiveness of entity ranking. Using the broad context of a full Wikipedia page as a baseline, we evaluate two different algorithms for identifying narrow contexts around the entity examples: one that uses predefined types of elements such as paragraphs, lists and tables; and another that dynamically identifies the contexts by utilising the underlying XML document structure. Our experiments demonstrate that the locality of Wikipedia links can be exploited to significantly improve the effectiveness of entity ranking.</div>
</front>
</TEI>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Main/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 008824 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Curation/biblio.hfd -nk 008824 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Main
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:BEA42901DF0CAD3A11948E18F556920A3DC02697
   |texte=   Exploiting Locality of Wikipedia Links in Entity Ranking
}}

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024