SGML exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Is searching full text more effective than searching abstracts?

Internal identifier: 000033 (Pmc/Checkpoint); previous: 000032; next: 000034

Authors: Jimmy Lin [United States]

Source: BMC Bioinformatics

RBID : PMC:2695361

Abstract

Background

With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine.

Results

Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.
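
As an illustration of the span-versus-article comparison described above, the following is a minimal Python sketch, not the author's implementation. It assumes a standard Okapi BM25 formulation with the common defaults k1 = 1.2 and b = 0.75, a whitespace tokenizer, and a simple linear interpolation between the whole-article score and the best span score. The abstract does not state how evidence was combined, so the `weight` parameter, the `rank_articles` helper, and the toy `ARTICLES` collection are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory collection of indexing units."""

    def __init__(self, units, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(u)) for u in units]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        # Document frequency: each term is counted once per unit.
        self.df = Counter(t for d in self.docs for t in d)

    def idf(self, term):
        n = len(self.docs)
        return math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))

    def score(self, query, i):
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc[term]
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total


def rank_articles(articles, query, weight=0.5):
    """Rank articles by interpolating article-level and best-span scores."""
    ids = list(articles)
    article_index = BM25([" ".join(articles[a]) for a in ids])
    spans = [(a, s) for a in ids for s in articles[a]]
    span_index = BM25([s for _, s in spans])
    combined = {}
    for i, a in enumerate(ids):
        best_span = max(
            span_index.score(query, j)
            for j, (owner, _) in enumerate(spans)
            if owner == a
        )
        combined[a] = weight * article_index.score(query, i) + (1 - weight) * best_span
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical toy collection: each article is a list of paragraph-sized spans.
ARTICLES = {
    "A": ["gene expression in yeast cells", "protein folding pathways in yeast"],
    "B": ["clinical trial of a new drug", "statistical analysis of outcomes"],
}

print(rank_articles(ARTICLES, "yeast protein"))
```

Setting `weight` to 1.0 reduces the sketch to whole-article retrieval and 0.0 to span-only retrieval, which makes it easy to compare the three configurations on the same collection.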

Conclusion

Users searching full text are more likely to find relevant articles than searching only abstracts. This finding affirms the value of full text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly-growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.
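
To make the MapReduce remark concrete, here is a small single-process Python sketch of how inverted-index construction can be organized as map, shuffle, and reduce steps. It is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are hypothetical, tokenization is plain whitespace splitting, and a real deployment would run these phases on a cluster framework rather than in one process.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit (term, (doc_id, term_frequency)) pairs for one document."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    return [(term, (doc_id, tf)) for term, tf in counts.items()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, postings):
    """Produce the sorted postings list for one term."""
    return term, sorted(postings)


def build_index(docs):
    """Run the three phases sequentially over a dict of doc_id -> text."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    return dict(reduce_phase(t, ps) for t, ps in shuffle(pairs).items())


# Hypothetical two-document collection.
index = build_index({"d1": "full text search", "d2": "abstract search search"})
print(index["search"])
```

Because each call to `map_phase` depends on a single document and each call to `reduce_phase` on a single term, both phases can in principle be spread across machines, which is the property the conclusion appeals to.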


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695361
DOI: 10.1186/1471-2105-10-46
PubMed: 19192280
PubMed Central: 2695361


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Is searching full text more effective than searching abstracts?</title>
<author>
<name sortKey="Lin, Jimmy" sort="Lin, Jimmy" uniqKey="Lin J" first="Jimmy" last="Lin">Jimmy Lin</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<nlm:aff id="I2">The iSchool, University of Maryland, College Park, Maryland, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The iSchool, University of Maryland, College Park, Maryland</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19192280</idno>
<idno type="pmc">2695361</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2695361</idno>
<idno type="RBID">PMC:2695361</idno>
<idno type="doi">10.1186/1471-2105-10-46</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000124</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000124</idno>
<idno type="wicri:Area/Pmc/Curation">000124</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000124</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000033</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000033</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Is searching full text more effective than searching abstracts?</title>
<author>
<name sortKey="Lin, Jimmy" sort="Lin, Jimmy" uniqKey="Lin J" first="Jimmy" last="Lin">Jimmy Lin</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<nlm:aff id="I2">The iSchool, University of Maryland, College Park, Maryland, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>The iSchool, University of Maryland, College Park, Maryland</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE
<sup>® </sup>
abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined:
<italic>bm25 </italic>
and the ranking algorithm implemented in the open-source Lucene search engine.</p>
</sec>
<sec>
<title>Results</title>
<p>Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Users searching full text are more likely to find relevant articles than searching only abstracts. This finding affirms the value of full text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly-growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title>BMC Bioinformatics</journal-title>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">19192280</article-id>
<article-id pub-id-type="pmc">2695361</article-id>
<article-id pub-id-type="publisher-id">1471-2105-10-46</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-10-46</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Is searching full text more effective than searching abstracts?</article-title>
</title-group>
<contrib-group>
<contrib id="A1" corresp="yes" contrib-type="author">
<name>
<surname>Lin</surname>
<given-names>Jimmy</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>jimmylin@umd.edu</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA</aff>
<aff id="I2">
<label>2</label>
The iSchool, University of Maryland, College Park, Maryland, USA</aff>
<pub-date pub-type="collection">
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>3</day>
<month>2</month>
<year>2009</year>
</pub-date>
<volume>10</volume>
<fpage>46</fpage>
<lpage>46</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/10/46"></ext-link>
<history>
<date date-type="received">
<day>2</day>
<month>10</month>
<year>2008</year>
</date>
<date date-type="accepted">
<day>3</day>
<month>2</month>
<year>2009</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2009 Lin; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2009</copyright-year>
<copyright-holder>Lin; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> Lin Jimmy jimmylin@umd.edu Is searching full text more effective than searching abstracts? 2009BMC Bioinformatics 10(1): 46-. (2009)1471-2105(2009)10:1&lt;46&gt;urn:ISSN:1471-2105</pmc-comment>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE
<sup>® </sup>
abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined:
<italic>bm25 </italic>
and the ranking algorithm implemented in the open-source Lucene search engine.</p>
</sec>
<sec>
<title>Results</title>
<p>Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Users searching full text are more likely to find relevant articles than searching only abstracts. This finding affirms the value of full text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly-growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.</p>
</sec>
</abstract>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Maryland</li>
</region>
<settlement>
<li>College Park (Maryland)</li>
</settlement>
<orgName>
<li>Université du Maryland</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="Maryland">
<name sortKey="Lin, Jimmy" sort="Lin, Jimmy" uniqKey="Lin J" first="Jimmy" last="Lin">Jimmy Lin</name>
</region>
<name sortKey="Lin, Jimmy" sort="Lin, Jimmy" uniqKey="Lin J" first="Jimmy" last="Lin">Jimmy Lin</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000033 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000033 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:2695361
   |texte=   Is searching full text more effective than searching abstracts?
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:19192280" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a SgmlV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021