SisAgriV1, Istex, Corpus, bibRecord, 001245

A random walk on an ontology: Using thesaurus structure for automatic subject indexing

Internal identifier: 001245 (Istex/Corpus); previous: 001244; next: 001246

Authors: Craig Willis; Robert M. Losee

Source:

Journal of the American Society for Information Science and Technology [ISSN 1532-2882]; 2013-07.

RBID : ISTEX:AE3F4B8CB26A03C3C3E220357B22647CB463B9E8

Abstract

Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high‐energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
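
The abstract describes ranking candidate concepts with a weighted random walk over thesaurus relationships. The following is only a minimal sketch of that general idea, not the authors' algorithm: it uses a generic random walk with restart, and the function name `weighted_random_walk`, the relation weights, the restart probability, and the tiny example thesaurus are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_random_walk(edges, seeds, relation_weights, restart=0.15, iterations=50):
    """Rank thesaurus concepts with a random walk with restart (illustrative only).

    edges: iterable of (source, relation, target) triples, e.g. ("maize", "BT", "cereals").
    seeds: dict mapping concepts matched in a document to their match counts.
    relation_weights: dict mapping a relation label to a non-negative edge weight.
    """
    total_seed = sum(seeds.values())
    if total_seed <= 0:
        raise ValueError("at least one matched concept is required")

    # Build weighted out-links; relations with zero or unknown weight are ignored.
    out_links = defaultdict(dict)
    nodes = set(seeds)
    for source, relation, target in edges:
        nodes.update((source, target))
        weight = relation_weights.get(relation, 0.0)
        if weight > 0:
            out_links[source][target] = out_links[source].get(target, 0.0) + weight

    # The restart distribution is concentrated on the matched concepts.
    prior = {node: seeds.get(node, 0.0) / total_seed for node in nodes}
    score = dict(prior)

    for _ in range(iterations):
        nxt = {node: restart * prior[node] for node in nodes}
        for node in nodes:
            mass = (1.0 - restart) * score[node]
            links = out_links.get(node)
            if not links:
                # Concepts without outgoing links hand their mass back to the seeds.
                for other in nodes:
                    nxt[other] += mass * prior[other]
                continue
            total = sum(links.values())
            for target, weight in links.items():
                nxt[target] += mass * weight / total
        score = nxt

    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mini-thesaurus: BT = broader term, NT = narrower term, RT = related term.
example_edges = [
    ("maize", "BT", "cereals"), ("cereals", "NT", "maize"),
    ("wheat", "BT", "cereals"), ("cereals", "NT", "wheat"),
    ("maize", "RT", "corn oil"), ("corn oil", "RT", "maize"),
]
example_seeds = {"maize": 2, "wheat": 1}            # concepts matched in a document
example_weights = {"BT": 1.0, "NT": 0.5, "RT": 0.25}  # assumed, not from the article

ranking = weighted_random_walk(example_edges, example_seeds, example_weights)
```

In this toy setup, "cereals" is never matched in the document but still receives score through the broader-term links of the two matched concepts, which is the kind of structural contribution the abstract refers to. The actual weighting scheme and evaluation are described in the article itself.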

Url:

https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/pdf

DOI: 10.1002/asi.22853

Links to Exploration step

ISTEX:AE3F4B8CB26A03C3C3E220357B22647CB463B9E8

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
<author><name sortKey="Willis, Craig" sort="Willis, Craig" uniqKey="Willis C" first="Craig" last="Willis">Craig Willis</name>
<affiliation><mods:affiliation>Graduate School of Library and Information Science, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, IL, 61820, Champaign</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: willis8@illinois.edu</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Losee, Robert M" sort="Losee, Robert M" uniqKey="Losee R" first="Robert M." last="Losee">Robert M. Losee</name>
<affiliation><mods:affiliation>School of Information and Library Science, University of North Carolina, 216 Lenoir Drive, 302 Manning Hall, NC, Chapel Hill</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: losee@unc.edu</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:AE3F4B8CB26A03C3C3E220357B22647CB463B9E8</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1002/asi.22853</idno>
<idno type="url">https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001245</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001245</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
<author><name sortKey="Willis, Craig" sort="Willis, Craig" uniqKey="Willis C" first="Craig" last="Willis">Craig Willis</name>
<affiliation><mods:affiliation>Graduate School of Library and Information Science, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, IL, 61820, Champaign</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: willis8@illinois.edu</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Losee, Robert M" sort="Losee, Robert M" uniqKey="Losee R" first="Robert M." last="Losee">Robert M. Losee</name>
<affiliation><mods:affiliation>School of Information and Library Science, University of North Carolina, 216 Lenoir Drive, 302 Manning Hall, NC, Chapel Hill</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: losee@unc.edu</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Journal of the American Society for Information Science and Technology</title>
<title level="j" type="abbrev">J Am Soc Inf Sci Tec</title>
<idno type="ISSN">1532-2882</idno>
<idno type="eISSN">1532-2890</idno>
<imprint><publisher>Blackwell Publishing Ltd</publisher>
<date type="published" when="2013-07">2013-07</date>
<biblScope unit="volume">64</biblScope>
<biblScope unit="issue">7</biblScope>
<biblScope unit="page" from="1330">1330</biblScope>
<biblScope unit="page" to="1344">1344</biblScope>
</imprint>
<idno type="ISSN">1532-2882</idno>
</series>
<idno type="istex">AE3F4B8CB26A03C3C3E220357B22647CB463B9E8</idno>
<idno type="DOI">10.1002/asi.22853</idno>
<idno type="ArticleID">ASI22853</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1532-2882</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract">Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high‐energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.</div>
</front>
</TEI>
<istex><corpusName>wiley</corpusName>
<author><json:item><name>Craig Willis</name>
<affiliations><json:string>Graduate School of Library and Information Science, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, IL, 61820, Champaign</json:string>
<json:string>E-mail: willis8@illinois.edu</json:string>
</affiliations>
</json:item>
<json:item><name>Robert M. Losee</name>
<affiliations><json:string>School of Information and Library Science, University of North Carolina, 216 Lenoir Drive, 302 Manning Hall, NC, Chapel Hill</json:string>
<json:string>E-mail: losee@unc.edu</json:string>
</affiliations>
</json:item>
</author>
<subject><json:item><lang><json:string>eng</json:string>
</lang>
<value>automatic indexing</value>
</json:item>
<json:item><lang><json:string>eng</json:string>
</lang>
<value>ontologies</value>
</json:item>
<json:item><lang><json:string>eng</json:string>
</lang>
<value>thesauri</value>
</json:item>
</subject>
<articleId><json:string>ASI22853</json:string>
</articleId>
<language><json:string>eng</json:string>
</language>
<originalGenre><json:string>article</json:string>
</originalGenre>
<abstract>Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high‐energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.</abstract>
<qualityIndicators><score>8.5</score>
<pdfVersion>1.4</pdfVersion>
<pdfPageSize>610.651 x 790.254 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<abstractCharCount>1698</abstractCharCount>
<pdfWordCount>9558</pdfWordCount>
<pdfCharCount>61080</pdfCharCount>
<pdfPageCount>15</pdfPageCount>
<abstractWordCount>254</abstractWordCount>
</qualityIndicators>
<title>A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
<genre><json:string>article</json:string>
</genre>
<host><volume>64</volume>
<publisherId><json:string>ASI</json:string>
</publisherId>
<pages><total>15</total>
<last>1344</last>
<first>1330</first>
</pages>
<issn><json:string>1532-2882</json:string>
</issn>
<issue>7</issue>
<subject><json:item><value>thesauri</value>
</json:item>
<json:item><value>subject indexing</value>
</json:item>
<json:item><value>automatic categorization</value>
</json:item>
<json:item><value>hierarchical models</value>
</json:item>
<json:item><value>weighting</value>
</json:item>
<json:item><value>RESEARCH ARTICLE</value>
</json:item>
</subject>
<genre><json:string>journal</json:string>
</genre>
<language><json:string>unknown</json:string>
</language>
<eissn><json:string>1532-2890</json:string>
</eissn>
<title>Journal of the American Society for Information Science and Technology</title>
<doi><json:string>10.1002/(ISSN)1532-2890</json:string>
</doi>
</host>
<categories><wos><json:string>social science</json:string>
<json:string>information science & library science</json:string>
<json:string>science</json:string>
<json:string>computer science, information systems</json:string>
</wos>
<scienceMetrix><json:string>economic &amp; social sciences</json:string>
<json:string>social sciences</json:string>
<json:string>information &amp; library sciences</json:string>
</scienceMetrix>
</categories>
<publicationDate>2013</publicationDate>
<copyrightDate>2013</copyrightDate>
<doi><json:string>10.1002/asi.22853</json:string>
</doi>
<id>AE3F4B8CB26A03C3C3E220357B22647CB463B9E8</id>
<score>0.32865158</score>
<fulltext><json:item><extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/pdf</uri>
</json:item>
<json:item><extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/tei"><teiHeader><fileDesc><titleStmt><title level="a" type="main" xml:lang="en">A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
</titleStmt>
<publicationStmt><authority>ISTEX</authority>
<publisher>Blackwell Publishing Ltd</publisher>
<availability><p>Copyright © 2013 ASIS&amp;T</p>
</availability>
<date>2012-12-19</date>
</publicationStmt>
<notesStmt><note>Institute of Museum and Library Services (IMLS) - No. LG‐07‐08‐0120‐08;</note>
</notesStmt>
<sourceDesc><biblStruct type="inbook"><analytic><title level="a" type="main" xml:lang="en">A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
<author xml:id="author-1"><persName><forename type="first">Craig</forename>
<surname>Willis</surname>
</persName>
<email>willis8@illinois.edu</email>
<affiliation>Graduate School of Library and Information Science, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, IL, 61820, Champaign</affiliation>
</author>
<author xml:id="author-2"><persName><forename type="first">Robert M.</forename>
<surname>Losee</surname>
</persName>
<email>losee@unc.edu</email>
<affiliation>School of Information and Library Science, University of North Carolina, 216 Lenoir Drive, 302 Manning Hall, NC, Chapel Hill</affiliation>
</author>
</analytic>
<monogr><title level="j">Journal of the American Society for Information Science and Technology</title>
<title level="j" type="abbrev">J Am Soc Inf Sci Tec</title>
<idno type="pISSN">1532-2882</idno>
<idno type="eISSN">1532-2890</idno>
<idno type="DOI">10.1002/(ISSN)1532-2890</idno>
<imprint><publisher>Blackwell Publishing Ltd</publisher>
<date type="published" when="2013-07"></date>
<biblScope unit="volume">64</biblScope>
<biblScope unit="issue">7</biblScope>
<biblScope unit="page" from="1330">1330</biblScope>
<biblScope unit="page" to="1344">1344</biblScope>
</imprint>
</monogr>
<idno type="istex">AE3F4B8CB26A03C3C3E220357B22647CB463B9E8</idno>
<idno type="DOI">10.1002/asi.22853</idno>
<idno type="ArticleID">ASI22853</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><creation><date>2012-12-19</date>
</creation>
<langUsage><language ident="en">en</language>
</langUsage>
<abstract><p>Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high‐energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.</p>
</abstract>
<textClass><keywords scheme="keyword"><list><head>keywords</head>
<item><term>automatic indexing</term>
</item>
<item><term>ontologies</term>
</item>
<item><term>thesauri</term>
</item>
</list>
</keywords>
</textClass>
<textClass><keywords scheme="Journal Subject"><list><head>index-terms</head>
<item><term>thesauri</term>
</item>
<item><term>subject indexing</term>
</item>
<item><term>automatic categorization</term>
</item>
<item><term>hierarchical models</term>
</item>
<item><term>weighting</term>
</item>
</list>
</keywords>
</textClass>
<textClass><keywords scheme="Journal Subject"><list><head>article-category</head>
<item><term>RESEARCH ARTICLE</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc><change when="2012-04-30">Received</change>
<change when="2012-10-01">Registration</change>
<change when="2012-12-19">Created</change>
<change when="2013-07">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item><extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/document/AE3F4B8CB26A03C3C3E220357B22647CB463B9E8/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata><istex:metadataXml wicri:clean="Wiley, elements deleted: body"><istex:xmlDeclaration>version="1.0" encoding="UTF-8" standalone="yes"</istex:xmlDeclaration>
<istex:document><component type="serialArticle" version="2.0" xml:id="asi22853" xml:lang="en"><header><publicationMeta level="product"><doi origin="wiley">10.1002/(ISSN)1532-2890</doi>
<issn type="print">1532-2882</issn>
<issn type="electronic">1532-2890</issn>
<idGroup><id type="product" value="ASI"></id>
</idGroup>
<titleGroup><title sort="JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY" type="main">Journal of the American Society for Information Science and Technology</title>
<title type="short">J Am Soc Inf Sci Tec</title>
</titleGroup>
</publicationMeta>
<publicationMeta level="part" position="07107"><doi>10.1002/asi.2013.64.issue-7</doi>
<copyright ownership="thirdParty">Copyright © 2013 ASIS&amp;T</copyright>
<numberingGroup><numbering number="64" type="journalVolume">64</numbering>
<numbering type="journalIssue">7</numbering>
</numberingGroup>
<coverDate startDate="2013-07">July 2013</coverDate>
</publicationMeta>
<publicationMeta level="unit" position="40" status="forIssue" type="article"><doi>10.1002/asi.22853</doi>
<idGroup><id type="unit" value="ASI22853"></id>
</idGroup>
<countGroup><count number="15" type="pageTotal"></count>
</countGroup>
<titleGroup><title type="tocHeading1">RESEARCH ARTICLES</title>
<title type="articleCategory">RESEARCH ARTICLE</title>
</titleGroup>
<copyright ownership="thirdParty">© 2013 ASIS&amp;T</copyright>
<eventGroup><event agent="bestset" date="2012-12-19" type="xmlCreated"></event>
<event date="2012-04-30" type="manuscriptReceived"></event>
<event date="2012-10-01" type="manuscriptRevised"></event>
<event date="2012-10-01" type="manuscriptAccepted"></event>
<event type="publishedOnlineEarlyUnpaginated" date="2013-05-22"></event>
<event type="firstOnline" date="2013-05-22"></event>
<event type="publishedOnlineFinalForm" date="2013-06-04"></event>
<event type="xmlConverted" agent="Converter:WILEY_ML3G_TO_WILEY_ML3GV2 version:3.8.8" date="2014-01-06"></event>
<event type="xmlConverted" agent="Converter:WML3G_To_WML3G version:4.6.4 mode:FullText" date="2015-10-01"></event>
</eventGroup>
<numberingGroup><numbering type="pageFirst">1330</numbering>
<numbering type="pageLast">1344</numbering>
</numberingGroup>
<subjectInfo><subject href="http://psi.asis.org/digital/thesauri">thesauri</subject>
<subject href="http://psi.asis.org/digital/subject+indexing">subject indexing</subject>
<subject href="http://psi.asis.org/digital/automatic+categorization">automatic categorization</subject>
<subject href="http://psi.asis.org/digital/hierarchical+models">hierarchical models</subject>
<subject href="http://psi.asis.org/digital/weighting">weighting</subject>
</subjectInfo>
<linkGroup><link type="toTypesetVersion" href="file:ASI.ASI22853.pdf"></link>
</linkGroup>
</publicationMeta>
<contentMeta><titleGroup><title type="short">A Random Walk on an Ontology: Using Thesaurus Structure for Automatic Subject Indexing</title>
<title type="main">A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
</titleGroup>
<creators><creator affiliationRef="#asi22853-aff-0001" creatorRole="author" xml:id="asi22853-cr-0001"><personName><givenNames>Craig</givenNames>
<familyName>Willis</familyName>
</personName>
<contactDetails><email>willis8@illinois.edu</email>
</contactDetails>
</creator>
<creator affiliationRef="#asi22853-aff-0002" creatorRole="author" xml:id="asi22853-cr-0002"><personName><givenNames>Robert M.</givenNames>
<familyName>Losee</familyName>
</personName>
<contactDetails><email>losee@unc.edu</email>
</contactDetails>
</creator>
</creators>
<affiliationGroup><affiliation xml:id="asi22853-aff-0001"><orgDiv>Graduate School of Library and Information Science</orgDiv>
<orgName>University of Illinois at Urbana‐Champaign</orgName>
<address><street>501 E. Daniel Street</street>
<city>Champaign</city>
<countryPart>IL</countryPart>
<postCode>61820</postCode>
</address>
</affiliation>
<affiliation xml:id="asi22853-aff-0002"><orgDiv>School of Information and Library Science</orgDiv>
<orgName>University of North Carolina</orgName>
<address><street>216 Lenoir Drive, 302 Manning Hall</street>
<city>Chapel Hill</city>
<countryPart>NC</countryPart>
</address>
</affiliation>
</affiliationGroup>
<keywordGroup type="author"><keyword xml:id="asi22853-kwd-0001">automatic indexing</keyword>
<keyword xml:id="asi22853-kwd-0002">ontologies</keyword>
<keyword xml:id="asi22853-kwd-0003">thesauri</keyword>
</keywordGroup>
<fundingInfo><fundingAgency>Institute of Museum and Library Services (IMLS)</fundingAgency>
<fundingNumber>LG‐07‐08‐0120‐08</fundingNumber>
</fundingInfo>
<abstractGroup> <abstract type="main"><p>Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (<fc>AGROVOC</fc>
 [UN Food and Agriculture Organization], high‐energy physics taxonomy [<fc>HEP</fc>
], National Agricultural Library Thesaurus [<fc>NALT</fc>
], and medical subject headings [<fc>MeSH</fc>
]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (<fc>AP</fc>
) of 9% for <fc>HEP</fc>
, 11% for <fc>MeSH</fc>
, 35% for <fc>NALT</fc>
, and 37% for <fc>AGROVOC</fc>
. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.</p>
</abstract>
</abstractGroup>
 </contentMeta>
</header>
</component>
</istex:document>
</istex:metadataXml>
<mods version="3.6"><titleInfo lang="en"><title>A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
</titleInfo>
<titleInfo type="abbreviated" lang="en"><title>A Random Walk on an Ontology: Using Thesaurus Structure for Automatic Subject Indexing</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en"><title>A random walk on an ontology: Using thesaurus structure for automatic subject indexing</title>
</titleInfo>
<name type="personal"><namePart type="given">Craig</namePart>
<namePart type="family">Willis</namePart>
<affiliation>Graduate School of Library and Information Science, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, IL, 61820, Champaign</affiliation>
<affiliation>E-mail: willis8@illinois.edu</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Robert M.</namePart>
<namePart type="family">Losee</namePart>
<affiliation>School of Information and Library Science, University of North Carolina, 216 Lenoir Drive, 302 Manning Hall, NC, Chapel Hill</affiliation>
<affiliation>E-mail: losee@unc.edu</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="article" displayLabel="article"></genre>
<originInfo><publisher>Blackwell Publishing Ltd</publisher>
<dateIssued encoding="w3cdtf">2013-07</dateIssued>
<dateCreated encoding="w3cdtf">2012-12-19</dateCreated>
<dateCaptured encoding="w3cdtf">2012-04-30</dateCaptured>
<dateValid encoding="w3cdtf">2012-10-01</dateValid>
<copyrightDate encoding="w3cdtf">2013</copyrightDate>
</originInfo>
<language><languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription><internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract>Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre‐indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high‐energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus‐centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.</abstract>
<note type="funding">Institute of Museum and Library Services (IMLS) - No. LG‐07‐08‐0120‐08; </note>
<subject><genre>keywords</genre>
<topic>automatic indexing</topic>
<topic>ontologies</topic>
<topic>thesauri</topic>
</subject>
<relatedItem type="host"><titleInfo><title>Journal of the American Society for Information Science and Technology</title>
</titleInfo>
<titleInfo type="abbreviated"><title>J Am Soc Inf Sci Tec</title>
</titleInfo>
<genre type="journal">journal</genre>
<subject><genre>index-terms</genre>
<topic authorityURI="http://psi.asis.org/digital/thesauri">thesauri</topic>
<topic authorityURI="http://psi.asis.org/digital/subject+indexing">subject indexing</topic>
<topic authorityURI="http://psi.asis.org/digital/automatic+categorization">automatic categorization</topic>
<topic authorityURI="http://psi.asis.org/digital/hierarchical+models">hierarchical models</topic>
<topic authorityURI="http://psi.asis.org/digital/weighting">weighting</topic>
</subject>
<subject><genre>article-category</genre>
<topic>RESEARCH ARTICLE</topic>
</subject>
<identifier type="ISSN">1532-2882</identifier>
<identifier type="eISSN">1532-2890</identifier>
<identifier type="DOI">10.1002/(ISSN)1532-2890</identifier>
<identifier type="PublisherID">ASI</identifier>
<part><date>2013</date>
<detail type="volume"><caption>vol.</caption>
<number>64</number>
</detail>
<detail type="issue"><caption>no.</caption>
<number>7</number>
</detail>
<extent unit="pages"><start>1330</start>
<end>1344</end>
<total>15</total>
</extent>
</part>
</relatedItem>
<identifier type="istex">AE3F4B8CB26A03C3C3E220357B22647CB463B9E8</identifier>
<identifier type="DOI">10.1002/asi.22853</identifier>
<identifier type="ArticleID">ASI22853</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Copyright © 2013 ASIS&amp;T</accessCondition>
<recordInfo><recordContentSource>WILEY</recordContentSource>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Agronomie/explor/SisAgriV1/Data/Istex/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001245 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 001245 | SxmlIndent | more
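
The commands above print the indented record. As displayed on this page, the record uses prefixes such as `wicri:` and `mods:` without visible namespace declarations, so a strict XML parser may reject it. As a hedged sketch only (the helper `idno_values` is hypothetical and not part of Dilib), identifiers can be pulled out of the printed text with a regular expression:

```python
import re


def idno_values(record_xml: str, idno_type: str) -> list[str]:
    """Return the text of every <idno> element whose type attribute equals idno_type."""
    pattern = re.compile(
        r'<idno\b[^>]*\btype="' + re.escape(idno_type) + r'"[^>]*>([^<]*)</idno>'
    )
    return pattern.findall(record_xml)


# A short excerpt copied from the record shown on this page.
sample = (
    '<publicationStmt><idno type="wicri:source">ISTEX</idno>\n'
    '<idno type="doi">10.1002/asi.22853</idno>\n'
    '</publicationStmt>'
)
doi = idno_values(sample, "doi")
```

A regular expression is a pragmatic choice for flat elements like `<idno>`; it is not a general XML parser and would not handle nested or escaped content.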

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Agronomie
   |area=    SisAgriV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:AE3F4B8CB26A03C3C3E220357B22647CB463B9E8
   |texte=   A random walk on an ontology: Using thesaurus structure for automatic subject indexing
}}

This area was generated with Dilib version V0.6.28.
Data generation: Wed Mar 29 00:06:34 2017. Site generation: Tue Mar 12 12:44:16 2024

	Système d'information stratégique et agriculture (exploration server)
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
