Parkinson's disease in France (exploration server)


Performance evaluation of unified medical language system®'s synonyms expansion to query PubMed

Internal identifier: 000695 (Pmc/Corpus); previous: 000694; next: 000696

Authors: Nicolas Griffon; Wiem Chebil; Laetitia Rollin; Gaetan Kerdelhue; Benoit Thirion; Jean-François Gehanno; Stéfan Jacques Darmoni

Source :

RBID : PMC:3309945

Abstract

Background

PubMed is the main point of access to medical literature on the Internet. To enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System® (UMLS) synonyms, i.e. all the terms gathered under one Concept Unique Identifier.

Methods

This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance.

Results

Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34).

Conclusions

This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful for retrieving recent scientific advances.


Url:
DOI: 10.1186/1472-6947-12-12
PubMed: 22376010
PubMed Central: 3309945

Links to Exploration step

PMC:3309945

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Performance evaluation of unified medical language system
<sup>®</sup>
's synonyms expansion to query PubMed</title>
<author>
<name sortKey="Griffon, Nicolas" sort="Griffon, Nicolas" uniqKey="Griffon N" first="Nicolas" last="Griffon">Nicolas Griffon</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chebil, Wiem" sort="Chebil, Wiem" uniqKey="Chebil W" first="Wiem" last="Chebil">Wiem Chebil</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Institute of Management, University of Sousse, Sousse, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rollin, Laetitia" sort="Rollin, Laetitia" uniqKey="Rollin L" first="Laetitia" last="Rollin">Laetitia Rollin</name>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kerdelhue, Gaetan" sort="Kerdelhue, Gaetan" uniqKey="Kerdelhue G" first="Gaetan" last="Kerdelhue">Gaetan Kerdelhue</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thirion, Benoit" sort="Thirion, Benoit" uniqKey="Thirion B" first="Benoit" last="Thirion">Benoit Thirion</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gehanno, Jean Francois" sort="Gehanno, Jean Francois" uniqKey="Gehanno J" first="Jean-François" last="Gehanno">Jean-François Gehanno</name>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Darmoni, Stefan Jacques" sort="Darmoni, Stefan Jacques" uniqKey="Darmoni S" first="Stéfan Jacques" last="Darmoni">Stéfan Jacques Darmoni</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22376010</idno>
<idno type="pmc">3309945</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3309945</idno>
<idno type="RBID">PMC:3309945</idno>
<idno type="doi">10.1186/1472-6947-12-12</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000695</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000695</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Performance evaluation of unified medical language system
<sup>®</sup>
's synonyms expansion to query PubMed</title>
<author>
<name sortKey="Griffon, Nicolas" sort="Griffon, Nicolas" uniqKey="Griffon N" first="Nicolas" last="Griffon">Nicolas Griffon</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Chebil, Wiem" sort="Chebil, Wiem" uniqKey="Chebil W" first="Wiem" last="Chebil">Wiem Chebil</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Institute of Management, University of Sousse, Sousse, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rollin, Laetitia" sort="Rollin, Laetitia" uniqKey="Rollin L" first="Laetitia" last="Rollin">Laetitia Rollin</name>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kerdelhue, Gaetan" sort="Kerdelhue, Gaetan" uniqKey="Kerdelhue G" first="Gaetan" last="Kerdelhue">Gaetan Kerdelhue</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Thirion, Benoit" sort="Thirion, Benoit" uniqKey="Thirion B" first="Benoit" last="Thirion">Benoit Thirion</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gehanno, Jean Francois" sort="Gehanno, Jean Francois" uniqKey="Gehanno J" first="Jean-François" last="Gehanno">Jean-François Gehanno</name>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Darmoni, Stefan Jacques" sort="Darmoni, Stefan Jacques" uniqKey="Darmoni S" first="Stéfan Jacques" last="Darmoni">Stéfan Jacques Darmoni</name>
<affiliation>
<nlm:aff id="I1">CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">TIBS, Rouen University, LITIS EA 4108, Rouen, France</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Medical Informatics and Decision Making</title>
<idno type="eISSN">1472-6947</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>PubMed is the main point of access to medical literature on the Internet. To enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System® (UMLS) synonyms, i.e. all the terms gathered under one Concept Unique Identifier.</p>
</sec>
<sec>
<title>Methods</title>
<p>This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance.</p>
</sec>
<sec>
<title>Results</title>
<p>Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34).</p>
</sec>
<sec>
<title>Conclusions</title>
<p>This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful for retrieving recent scientific advances.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Nelson, Sj" uniqKey="Nelson S">SJ Nelson</name>
</author>
<author>
<name sortKey="Johnson, Wd" uniqKey="Johnson W">WD Johnson</name>
</author>
<author>
<name sortKey="Humphreys, Bl" uniqKey="Humphreys B">BL Humphreys</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Herskovic, Jr" uniqKey="Herskovic J">JR Herskovic</name>
</author>
<author>
<name sortKey="Tanaka, Ly" uniqKey="Tanaka L">LY Tanaka</name>
</author>
<author>
<name sortKey="Hersh, W" uniqKey="Hersh W">W Hersh</name>
</author>
<author>
<name sortKey="Bernstam, Ev" uniqKey="Bernstam E">EV Bernstam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoogendam, A" uniqKey="Hoogendam A">A Hoogendam</name>
</author>
<author>
<name sortKey="Stalenhoef, Af" uniqKey="Stalenhoef A">AF Stalenhoef</name>
</author>
<author>
<name sortKey="Robbe, Pf" uniqKey="Robbe P">PF Robbé</name>
</author>
<author>
<name sortKey="Overbeke, Aj" uniqKey="Overbeke A">AJ Overbeke</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thirion, B" uniqKey="Thirion B">B Thirion</name>
</author>
<author>
<name sortKey="Robu, I" uniqKey="Robu I">I Robu</name>
</author>
<author>
<name sortKey="Darmoni, Sj" uniqKey="Darmoni S">SJ Darmoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, M" uniqKey="Huang M">M Huang</name>
</author>
<author>
<name sortKey="Neveol, A" uniqKey="Neveol A">A Névéol</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grosjean, J" uniqKey="Grosjean J">J Grosjean</name>
</author>
<author>
<name sortKey="Merabti, T" uniqKey="Merabti T">T Merabti</name>
</author>
<author>
<name sortKey="Dahamna, B" uniqKey="Dahamna B">B Dahamna</name>
</author>
<author>
<name sortKey="Kergourlay, I" uniqKey="Kergourlay I">I Kergourlay</name>
</author>
<author>
<name sortKey="Thirion, B" uniqKey="Thirion B">B Thirion</name>
</author>
<author>
<name sortKey="Soualmia, Lf" uniqKey="Soualmia L">LF Soualmia</name>
</author>
<author>
<name sortKey="Darmoni, Sj" uniqKey="Darmoni S">SJ Darmoni</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hersh, W" uniqKey="Hersh W">W Hersh</name>
</author>
<author>
<name sortKey="Buckley, C" uniqKey="Buckley C">C Buckley</name>
</author>
<author>
<name sortKey="Leone, Tj" uniqKey="Leone T">TJ Leone</name>
</author>
<author>
<name sortKey="Hickam, D" uniqKey="Hickam D">D Hickam</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, B" uniqKey="Chen B">B Chen</name>
</author>
<author>
<name sortKey="Zaebst, D" uniqKey="Zaebst D">D Zaebst</name>
</author>
<author>
<name sortKey="Seel, L" uniqKey="Seel L">L Seel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hersh, W" uniqKey="Hersh W">W Hersh</name>
</author>
<author>
<name sortKey="Price, S" uniqKey="Price S">S Price</name>
</author>
<author>
<name sortKey="Donohoe, L" uniqKey="Donohoe L">L Donohoe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
<author>
<name sortKey="Kim, W" uniqKey="Kim W">W Kim</name>
</author>
<author>
<name sortKey="Wilbur, Wj" uniqKey="Wilbur W">WJ Wilbur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Funk, Me" uniqKey="Funk M">ME Funk</name>
</author>
<author>
<name sortKey="Reid, Ca" uniqKey="Reid C">CA Reid</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Darmoni, Sj" uniqKey="Darmoni S">SJ Darmoni</name>
</author>
<author>
<name sortKey="Pereira, S" uniqKey="Pereira S">S Pereira</name>
</author>
<author>
<name sortKey="Neveol, A" uniqKey="Neveol A">A Névéol</name>
</author>
<author>
<name sortKey="Massari, P" uniqKey="Massari P">P Massari</name>
</author>
<author>
<name sortKey="Dahamna, B" uniqKey="Dahamna B">B Dahamna</name>
</author>
<author>
<name sortKey="Letord, C" uniqKey="Letord C">C Letord</name>
</author>
<author>
<name sortKey="Kedelhue, G" uniqKey="Kedelhue G">G Kedelhué</name>
</author>
<author>
<name sortKey="Piot, J" uniqKey="Piot J">J Piot</name>
</author>
<author>
<name sortKey="Derville, A" uniqKey="Derville A">A Derville</name>
</author>
<author>
<name sortKey="Thirion, B" uniqKey="Thirion B">B Thirion</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Islamaj Do An, R" uniqKey="Islamaj Do An R">R Islamaj Doğan</name>
</author>
<author>
<name sortKey="Murray, Gc" uniqKey="Murray G">GC Murray</name>
</author>
<author>
<name sortKey="Neveol, A" uniqKey="Neveol A">A Névéol</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Med Inform Decis Mak</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Med Inform Decis Mak</journal-id>
<journal-title-group>
<journal-title>BMC Medical Informatics and Decision Making</journal-title>
</journal-title-group>
<issn pub-type="epub">1472-6947</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22376010</article-id>
<article-id pub-id-type="pmc">3309945</article-id>
<article-id pub-id-type="publisher-id">1472-6947-12-12</article-id>
<article-id pub-id-type="doi">10.1186/1472-6947-12-12</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Performance evaluation of unified medical language system
<sup>®</sup>
's synonyms expansion to query PubMed</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes" id="A1">
<name>
<surname>Griffon</surname>
<given-names>Nicolas</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>nicolas.griffon@chu-rouen.fr</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A2">
<name>
<surname>Chebil</surname>
<given-names>Wiem</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>wiem.chebil@yahoo.fr</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A3">
<name>
<surname>Rollin</surname>
<given-names>Laetitia</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>laetitia.rollin@chu-rouen.fr</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A4">
<name>
<surname>Kerdelhue</surname>
<given-names>Gaetan</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>gaetan.kerdelhue@chu-rouen.fr</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A5">
<name>
<surname>Thirion</surname>
<given-names>Benoit</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>benoit.thirion@chu-rouen.fr</email>
</contrib>
<contrib contrib-type="author" equal-contrib="yes" id="A6">
<name>
<surname>Gehanno</surname>
<given-names>Jean-François</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>jean-francois.gehanno@chu-rouen.fr</email>
</contrib>
<contrib contrib-type="author" corresp="yes" equal-contrib="yes" id="A7">
<name>
<surname>Darmoni</surname>
<given-names>Stéfan Jacques</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>stefan.darmoni@chu-rouen.fr</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
CISMeF, Rouen University Hospital, Cour Leschevin, Porte 21, 3ème étage. 1 rue de Germont, 76031 Rouen Cedex, France</aff>
<aff id="I2">
<label>2</label>
TIBS, Rouen University, LITIS EA 4108, Rouen, France</aff>
<aff id="I3">
<label>3</label>
Institute of Management, University of Sousse, Sousse, Tunisia</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>29</day>
<month>2</month>
<year>2012</year>
</pub-date>
<volume>12</volume>
<fpage>12</fpage>
<lpage>12</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>9</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>29</day>
<month>2</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2012 Griffon et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Griffon et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1472-6947/12/12"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>PubMed is the main point of access to medical literature on the Internet. To enhance the performance of its information retrieval tools, primarily for non-indexed citations, the authors propose a method: expanding users' queries using Unified Medical Language System® (UMLS) synonyms, i.e. all the terms gathered under one Concept Unique Identifier.</p>
</sec>
<sec>
<title>Methods</title>
<p>This method was evaluated using queries constructed to emphasize the differences between this new method and the current PubMed automatic term mapping. Four experts assessed citation relevance.</p>
</sec>
<sec>
<title>Results</title>
<p>Using UMLS, we were able to retrieve new citations in 45.5% of queries, which implies a small increase in recall. The new strategy led to a heterogeneous 23.7% mean increase in non-indexed citations retrieved. Of these, 82% had been published less than 4 months earlier. The overall mean precision was 48.4% but differed according to the evaluators, ranging from 36.7% to 88.1% (inter-rater agreement was poor: kappa = 0.34).</p>
</sec>
<sec>
<title>Conclusions</title>
<p>This study highlights the need for specific search tools for each type of user and use case. The proposed strategy may be useful for retrieving recent scientific advances.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>The most important tool for accessing the medical literature is the PubMed search engine, which provides access to more than 20 million biomedical citations. Most of these citations come from the MEDLINE bibliographic database, which uses the MeSH thesaurus for indexing [
<xref ref-type="bibr" rid="B1">1</xref>
]. Other citations, i.e. OLDMEDLINE, out of scope or recent citations, are not indexed at the time of user query [
<xref ref-type="bibr" rid="B2">2</xref>
]. The most comprehensive way to find citations in MEDLINE is to use the MeSH thesaurus.</p>
<p>Because one third of Medline queries are performed by members of the general public [
<xref ref-type="bibr" rid="B3">3</xref>
] and furthermore because most health professionals [
<xref ref-type="bibr" rid="B4">4</xref>
] are not aware of this thesaurus, they run free-text queries, as they do when using Google™. This allows searching the entire PubMed collection but does not exploit the work done by the National Library of Medicine (NLM) in building the MeSH thesaurus and indexing millions of citations. Consequently, the US National Center for Biotechnology Information has developed several techniques, collectively called Automatic Term Mapping (ATM), to map end-user queries to the MeSH thesaurus and other search field descriptors (e.g. author name, publication name, etc.) [
<xref ref-type="bibr" rid="B5">5</xref>
]. The primary aim of ATM is to improve retrieval from structured information by searching indexes (mostly the MeSH terms used to index citations in MEDLINE) rather than only the free text. Almost nothing is done to enhance the search for recent, not yet indexed citations. This is a limiting factor because 1) these citations contain the most recent scientific discoveries and 2) they are the first returned by PubMed, which displays recent articles first by default.</p>
<p>Although PubMed ATM query is continuously improved, a recent review [
<xref ref-type="bibr" rid="B6">6</xref>
] counted 28 different entities that have devoted themselves to developing Web tools that help users quickly and effectively search and retrieve relevant publications in MEDLINE. This highlights the need for alternative ways of searching the medical literature. Thirion et al. [
<xref ref-type="bibr" rid="B7">7</xref>
] have shown that it is possible to improve ATM's performance, mainly in precision (tools available with the Doc'CISMeF search engine
<ext-link ext-link-type="uri" xlink:href="http://www.cismef.org">http://www.cismef.org</ext-link>
) using MeSH synonyms. The aim of this paper was to propose an extension to this previous optimization, using Unified Medical Language System
<sup>® </sup>
(UMLS) synonyms, and to assess its performance.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>MeSH &amp; UMLS</title>
<p>MeSH is the terminology, covering the whole field of medicine, used by the NLM for indexing MEDLINE citations. Each MeSH descriptor is named by a preferred term and may have entry terms, or synonyms; e.g. "myocardial infarction" is the preferred term of a MeSH descriptor, whereas "myocardial infarct", "infarct, myocardial", etc. are its entry terms.</p>
<p>The UMLS contains a metathesaurus gathering many health terminologies/ontologies (T/O), like MeSH. For each T/O, each term is assigned to one or more concepts in UMLS. We defined as UMLS synonyms all the different terms from different T/O gathered under the same UMLS concept (same Concept Unique Identifier), e.g. "myocardial infarction" from the MeSH, "myocardial infarction" from the WHO Adverse Reaction Terminology (WHO-ART) and "heart attack" from the WHO-ART, etc. are UMLS synonyms as they are within the same UMLS concept.</p>
</sec>
<sec>
<title>Queries</title>
<p>In April 2011, when the ATM matched query terms with a MeSH term, the resulting modified query differed depending on whether the query terms matched the preferred term, an entry term or a UMLS synonym [
<xref ref-type="bibr" rid="B5">5</xref>
]. If it was the preferred term, the resulting modified query was:
<italic>q1 = "preferred term"[MeSH term] OR "preferred term"[all fields] OR ("word
<sub>1 </sub>
of preferred term"[All Fields] AND "word
<sub>2 </sub>
of preferred term"[All Fields] AND </italic>
etc.) (Table
<xref ref-type="table" rid="T1">1</xref>
). If it was an entry term or a UMLS synonym, the resulting modified query was:
<italic>q2 = "preferred term"[MeSH term] OR "preferred term"[all fields] OR ("word
<sub>1 </sub>
of preferred term"[All Fields] AND "word
<sub>2 </sub>
of preferred term"[All Fields] AND </italic>
etc.)
<italic>OR "entry term"[All Fields] OR ("word
<sub>1 </sub>
of entry term"[All Fields] AND "word
<sub>2 </sub>
of entry term"[All Fields] AND </italic>
etc.) (Table
<xref ref-type="table" rid="T1">1</xref>
).</p>
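<p>The two expansion patterns above can be sketched in a few lines of Python. This is only an illustration of the q1 and q2 templates as described in the text, not PubMed's actual ATM implementation; the function names all_fields_clause and atm_style_query are hypothetical, and the clause order follows the examples in Table 1.</p>
<preformat>
```python
def all_fields_clause(term):
    """Return the word-by-word AND group followed by the exact phrase, both in [All Fields]."""
    words = term.split()
    word_group = " AND ".join(f'"{w}"[All Fields]' for w in words)
    return f'({word_group}) OR "{term}"[All Fields]'


def atm_style_query(preferred_term, matched_term=None):
    """Build q1 (preferred term matched) or q2 (entry term or synonym matched).

    Hypothetical helper: it only mirrors the templates given in the text.
    """
    query = f'"{preferred_term}"[MeSH Terms] OR {all_fields_clause(preferred_term)}'
    if matched_term and matched_term != preferred_term:
        # q2 case: the user's own wording is appended to the preferred-term clauses.
        query += f" OR {all_fields_clause(matched_term)}"
    return query


q1 = atm_style_query("myocardial infarction")
q2 = atm_style_query("myocardial infarction", "myocardial infarct")
print(q1)
print(q2)
```
</preformat>
<p>With these inputs, the sketch prints the q1 and q2 strings given for "myocardial infarction" and "myocardial infarct" in Table 1.</p>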
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Examples of query</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Query</th>
<th></th>
<th align="center">Query syntax</th>
</tr>
<tr>
<th></th>
<th colspan="2">
<hr></hr>
</th>
</tr>
<tr>
<th></th>
<th align="center">Preferred term</th>
<th align="center">Entry term or UMLS synonym</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<bold>User</bold>
</td>
<td align="center">
<bold>Myocardial infarction</bold>
</td>
<td align="center">
<bold>Myocardial infarct</bold>
</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">q1</td>
<td align="center">"myocardial infarction"[MeSH Terms] OR ("myocardial"[All Fields] AND "infarction"[All Fields]) OR "myocardial infarction"[All Fields]</td>
<td></td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">q2</td>
<td></td>
<td align="center">"myocardial infarction"[MeSH Terms] OR ("myocardial"[All Fields] AND "infarction"[All Fields]) OR "myocardial infarction"[All Fields] OR ("myocardial"[All Fields] AND "infarct"[All Fields]) OR "myocardial infarct"[All Fields]</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">q3</td>
<td align="center" colspan="2">"myocardial infarction"[MeSH Terms] OR (("infarct, myocardial"[TIAB] OR "infarction, myocardial"[TIAB] OR "myocardial infarcts"[TIAB] OR "myocardial infarct"[TIAB] OR "myocardial infarction"[TIAB] OR "infarcts, myocardial"[TIAB] OR "myocardial infarctions"[TIAB] OR "infarctions, myocardial"[TIAB]) NOT MEDLINE[SB])</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">q4</td>
<td align="center" colspan="2">"myocardial infarction"[MeSH Terms] OR (("infarct, myocardial"[TIAB] OR
<underline>"heart attack"[TIAB]</underline>
OR "infarction, myocardial"[TIAB] OR "myocardial infarcts"[TIAB] OR "myocardial infarct"[TIAB] OR "myocardial infarction"[TIAB] OR
<underline>"myocardial infarction, nos"[TIAB] </underline>
OR "infarcts, myocardial"[TIAB] OR "myocardial infarctions"[TIAB] OR "infarctions, myocardial"[TIAB]) NOT (MEDLINE[SB] OR OldMedline[SB]))</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">q5</td>
<td align="center" colspan="2">q4 NOT q3</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>TIAB: title or abstract; SB: subset; UMLS synonyms are underlined</p>
</table-wrap-foot>
</table-wrap>
<p>However, these queries were not the same as those described by Thirion et al. [
<xref ref-type="bibr" rid="B7">7</xref>
], as the word tokenization had only been added recently.</p>
<p>The improvement made by Thirion et al. consisted in limiting noise in MEDLINE and increasing recall in non-indexed PubMed subsets. When a MeSH term was used in a query, this improvement resulted in the retrieval of: 1) citations indexed with this same MeSH term in MEDLINE and 2) non-indexed citations containing any entry term for this MeSH term in their title or abstract. The corresponding query was:
<italic>q3 = "preferred term"[MeSH term] OR (("preferred term"[TIAB] OR "entry term
<sub>1</sub>
"[TIAB] OR "entry term
<sub>2</sub>
"[TIAB] OR...) NOT Medline[SB]) </italic>
(Table
<xref ref-type="table" rid="T1">1</xref>
). In contrast to the PubMed ATM, the strategy proposed by Thirion et al. yields the same query whether the user's query contains a preferred term or an entry term.</p>
<p>In the current study, we propose a new strategy in order to increase recall: adding to the mapped queries all the UMLS synonyms with "ORs":
<italic>q4 = "preferred term"[MeSH term] OR (("preferred term"[TIAB] OR "entry term
<sub>1</sub>
"[TIAB] OR "entry term
<sub>2</sub>
"[TIAB] OR... OR "UMLS synonym
<sub>1</sub>
"[TIAB] OR "UMLS synonym
<sub>2</sub>
"[TIAB] OR...) NOT (Medline[SB] OR OldMedline[SB])) </italic>
(Table
<xref ref-type="table" rid="T1">1</xref>
). Excluding the OldMedline subset allows this query to focus on Pre-MEDLINE citations ("as supplied by publisher" and "in process" citations), which have not yet been manually indexed by NLM curators. Non-indexed citations are not necessarily the latest citations. Nevertheless, according to the NLM customer service [
<xref ref-type="bibr" rid="B8">8</xref>
], time to index varies greatly among the works that MEDLINE indexes. According to their recent statistical analysis, indexing is completed within 30 days of receipt for 25% of citations, within 60 days for 50%, and within 90 days for 75%. Furthermore, 82% of the Pre-MEDLINE citations evaluated in this study were published in 2011 and 11% in 2010. When several UMLS synonyms had the same spelling, the spelling was added to the mapped query only once. For technical reasons, we limited this list of synonyms to those included in the Health Multi-Terminology Portal
<ext-link ext-link-type="uri" xlink:href="http://pts.chu-rouen.fr">http://pts.chu-rouen.fr</ext-link>
[
<xref ref-type="bibr" rid="B9">9</xref>
]: SNOMED CT, SNOMED intl, ICD-10, WHO-ART, WHO-ICF, WHO-ICPC2, LOINC, MedDRA, FMA and MEDLINEPlus.</p>
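<p>The q3, q4 and q5 templates can be sketched as follows. This is a minimal illustration that assumes the entry terms and UMLS synonyms are already available as Python lists (here they are typed by hand, not read from MeSH or the UMLS). The function names are hypothetical, the term order is arbitrary, and treating spellings as identical regardless of letter case is an assumption of this sketch.</p>
<preformat>
```python
def tiab_block(terms):
    """OR together exact-phrase title/abstract clauses, keeping each spelling once."""
    seen = set()
    clauses = []
    for term in terms:
        key = term.lower()  # assumption: spellings are compared case-insensitively
        if key not in seen:
            seen.add(key)
            clauses.append(f'"{term}"[TIAB]')
    return " OR ".join(clauses)


def build_q3(preferred, entry_terms):
    """MeSH-indexed citations, plus entry terms searched outside MEDLINE."""
    terms = [preferred] + list(entry_terms)
    return f'"{preferred}"[MeSH Terms] OR (({tiab_block(terms)}) NOT MEDLINE[SB])'


def build_q4(preferred, entry_terms, umls_synonyms):
    """Same as q3, with UMLS synonyms added and OldMedline excluded as well."""
    terms = [preferred] + list(entry_terms) + list(umls_synonyms)
    return (f'"{preferred}"[MeSH Terms] OR (({tiab_block(terms)}) '
            "NOT (MEDLINE[SB] OR OldMedline[SB]))")


def build_q5(q4, q3):
    """Citations retrieved by the new strategy only."""
    return f"({q4}) NOT ({q3})"


preferred = "myocardial infarction"
entry_terms = ["myocardial infarct", "infarction, myocardial"]
# The last synonym repeats the preferred term's spelling and is therefore skipped.
umls_synonyms = ["heart attack", "myocardial infarction, nos", "Myocardial Infarction"]

q3 = build_q3(preferred, entry_terms)
q4 = build_q4(preferred, entry_terms, umls_synonyms)
q5 = build_q5(q4, q3)
print(q5)
```
</preformat>
<p>Building q5 as a Boolean difference keeps the comparison inside a single PubMed query, so the citations contributed by the UMLS synonyms can be counted or reviewed directly.</p>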
</sec>
<sec>
<title>Evaluation</title>
<p>For a quantitative assessment, the number of recent citations retrieved only by the new strategy was computed and compared to the total number of recent citations retrieved.</p>
<p>To evaluate qualitative changes induced by this modification of mapping, we built Boolean queries based on MeSH terms:
<italic>q5 = q4 NOT q3 </italic>
(Table
<xref ref-type="table" rid="T1">1</xref>
). We selected 20 of the most frequently used MeSH Descriptors (according to the 2011 MEDLINE Baseline Repository data available at
<ext-link ext-link-type="uri" xlink:href="http://mbr.nlm.nih.gov/Download/index.shtml#MeSH">http://mbr.nlm.nih.gov/Download/index.shtml#MeSH</ext-link>
) from the MeSH Diseases category (C) for which q5 returned citations. The C (Diseases) tree of the MeSH thesaurus was chosen for its potential impact on daily health care. Two medical librarians (BT and GK) and two physicians (LR and NG) manually assessed the relevance of the top 20 answers for each query after a careful reading of the title and abstract. Relevance was rated on a three-level scale used in other standard Information Retrieval test sets [
<xref ref-type="bibr" rid="B10">10</xref>
]: bad, partial or full relevance.</p>
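<p>The relation q5 = q4 NOT q3 can be read either as a Boolean query string or as a set difference over the retrieved identifiers. The sketch below shows both views; the function names and the identifiers are hypothetical and serve only as an illustration.</p>
<preformat>
```python
def combine_not(q4, q3):
    """Compose q5 = q4 NOT q3 as a single Boolean query string."""
    return "({}) NOT ({})".format(q4, q3)


def new_citations(ids_q4, ids_q3):
    """Set view of the same operation: identifiers retrieved by q4 but not by q3."""
    return set(ids_q4) - set(ids_q3)


# Toy identifiers, not real PMIDs.
q3_hits = {101, 102, 103}
q4_hits = {101, 102, 103, 104, 105}
print(sorted(new_citations(q4_hits, q3_hits)))  # [104, 105]
print(combine_not("q4", "q3"))  # (q4) NOT (q3)
```
</preformat>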
<p>Three factors might affect the number of citations retrieved: (a) the number of sons in the MeSH hierarchy, (b) the number of MeSH synonyms, and (c) the number of UMLS synonyms. These factors were recorded and any association was evaluated using Spearman's correlation.</p>
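<p>As a reminder of what Spearman's correlation measures, the sketch below computes it as the Pearson correlation of the ranks, with tied values receiving their mean rank. The function names and the input values are made up for illustration and are not study data; the statistical software actually used for this test is not stated here.</p>
<preformat>
```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: number of synonyms vs. number of new citations.
synonyms = [1, 4, 9, 20]
new_hits = [5, 20, 100, 400]
print(spearman(synonyms, new_hits))  # 1.0 for a strictly increasing relation
```
</preformat>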
<p>Evaluators' agreement was measured using kappa statistics (SAS Macro MKAPPA [
<xref ref-type="bibr" rid="B11">11</xref>
]). Precision was computed at two levels of relevance: counting only fully relevant citations, or counting both fully and partially relevant citations. Both were then computed for each evaluator and compared using the Friedman test and the Chi
<sup>2 </sup>
test.</p>
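<p>The two precision levels can be illustrated with a short sketch. The labels "bad", "partial" and "full" follow the three-level scale described above; the function name and the toy ratings are assumptions for illustration only.</p>
<preformat>
```python
def precision(ratings, include_partial=False):
    """Fraction of assessed citations counted as relevant.

    With include_partial=False only "full" counts as relevant;
    with include_partial=True both "partial" and "full" count.
    """
    relevant = {"full", "partial"} if include_partial else {"full"}
    hits = sum(1 for rating in ratings if rating in relevant)
    return hits / len(ratings)


# Toy ratings from one hypothetical evaluator, not study data.
toy = ["full", "partial", "bad", "full", "bad"]
print(precision(toy))                        # 0.4
print(precision(toy, include_partial=True))  # 0.6
```
</preformat>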
</sec>
</sec>
<sec>
<title>Results</title>
<p>Table
<xref ref-type="table" rid="T2">2</xref>
summarizes the results for the 43 queries we had to run in order to obtain 20 citations for each of 20 queries. For the other 23 queries, q5 returned no results: expanding the query with UMLS synonyms added no further citations. The new strategy led to a heterogeneous increase in non-indexed citations retrieved, 23.7% overall (from 0 to 9,876 new citations per query). None of the three tested factors (number of sons in the MeSH hierarchy, of MeSH synonyms and of UMLS synonyms) was significantly correlated with the number of citations retrieved or with precision.</p>
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Increase in non-indexed citations retrieved with the new method</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Query term</th>
<th align="center">New citations retrieved with q5</th>
<th align="center">Non-indexed citations retrieved with q3</th>
<th align="center">Increase in non-indexed citations retrieved (q5 / q3)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">Neoplasms</td>
<td align="center">46</td>
<td align="center">1871</td>
<td align="center">2.5%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Hypertension</td>
<td align="center">23</td>
<td align="center">10547</td>
<td align="center">0.2%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Myocardial infarction</td>
<td align="center">155</td>
<td align="center">5298</td>
<td align="center">2.9%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Coronary disease</td>
<td align="center">41</td>
<td align="center">5397</td>
<td align="center">0.8%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Asthma</td>
<td align="center">133</td>
<td align="center">4149</td>
<td align="center">3.2%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Obesity</td>
<td align="center">379</td>
<td align="center">7552</td>
<td align="center">5.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Liver neoplasms</td>
<td align="center">641</td>
<td align="center">266</td>
<td align="center">241.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Diabetes Mellitus</td>
<td align="center">9876</td>
<td align="center">5033</td>
<td align="center">196.2%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Inflammation</td>
<td align="center">361</td>
<td align="center">13019</td>
<td align="center">2.8%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Heart Failure</td>
<td align="center">272</td>
<td align="center">6042</td>
<td align="center">4.5%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Kidney Failure, Chronic</td>
<td align="center">81</td>
<td align="center">372</td>
<td align="center">21.8%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Alcoholism</td>
<td align="center">295</td>
<td align="center">713</td>
<td align="center">41.4%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Epilepsy</td>
<td align="center">2470</td>
<td align="center">3256</td>
<td align="center">75.9%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Tuberculosis</td>
<td align="center">1238</td>
<td align="center">6752</td>
<td align="center">18.3%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Liver cirrhosis</td>
<td align="center">155</td>
<td align="center">1983</td>
<td align="center">7.8%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Kidney Diseases</td>
<td align="center">2667</td>
<td align="center">1095</td>
<td align="center">243.6%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Cross Infection</td>
<td align="center">167</td>
<td align="center">1255</td>
<td align="center">13.3%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Parkinson Disease</td>
<td align="center">411</td>
<td align="center">396</td>
<td align="center">103.8%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Lymphoma</td>
<td align="center">144</td>
<td align="center">3939</td>
<td align="center">3.7%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Hypersensitivity</td>
<td align="center">159</td>
<td align="center">1357</td>
<td align="center">11.7%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Breast neoplasms</td>
<td align="center">0</td>
<td align="center">11</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Lung neoplasms</td>
<td align="center">0</td>
<td align="center">23</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Skin neoplasms</td>
<td align="center">0</td>
<td align="center">10</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Melanoma</td>
<td align="center">0</td>
<td align="center">2491</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">HIV infections</td>
<td align="center">0</td>
<td align="center">142</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Brain Neoplasms</td>
<td align="center">0</td>
<td align="center">10</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Prostatic Neoplasms</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Arthritis, Rheumatoid</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Neoplasm Metastasis</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Occupational Diseases</td>
<td align="center">0</td>
<td align="center">136</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Neoplasm Recurrence, Local</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Substance-Related Disorders</td>
<td align="center">0</td>
<td align="center">25</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Pregnancy Complications</td>
<td align="center">0</td>
<td align="center">84</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Tuberculosis, Pulmonary</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Genetic Predisposition to Disease</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Wounds and Injuries</td>
<td align="center">0</td>
<td align="center">8</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Diabetes Mellitus, Type 1</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Ovarian Neoplasms</td>
<td align="center">0</td>
<td align="center">24</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Uterine Cervical Neoplasms</td>
<td align="center">0</td>
<td align="center">1</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Arrhythmias, Cardiac</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Pancreatic Neoplasms</td>
<td align="center">0</td>
<td align="center">30</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Colorectal Neoplasms</td>
<td align="center">0</td>
<td align="center">14</td>
<td align="center">0.0%</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Lupus Erythematosus, Systemic</td>
<td align="center">0</td>
<td align="center">0</td>
<td align="center">-</td>
</tr>
<tr>
<td colspan="4">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">Total</td>
<td align="center">19714</td>
<td align="center">83341</td>
<td align="center">23.7%</td>
</tr>
</tbody>
</table>
</table-wrap>
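<p>The last column of Table 2 is consistent with dividing the q5 count by the q3 count. The sketch below recomputes a few rows under that reading; the function name is hypothetical and the counts are copied from Table 2.</p>
<preformat>
```python
def percent_increase(new_with_q5, baseline_q3):
    """Increase in non-indexed citations, in percent, rounded to one decimal.

    Returns None when q3 retrieved nothing (shown as "-" in Table 2).
    """
    if baseline_q3 == 0:
        return None
    return round(100 * new_with_q5 / baseline_q3, 1)


# (q5 count, q3 count) pairs taken from Table 2.
rows = {
    "Neoplasms": (46, 1871),
    "Liver neoplasms": (641, 266),
    "Diabetes Mellitus": (9876, 5033),
    "Prostatic Neoplasms": (0, 0),
    "Total": (19714, 83341),
}
for term, (q5_count, q3_count) in rows.items():
    print(term, percent_increase(q5_count, q3_count))
```
</preformat>
<p>Under this reading, the 23.7% figure is the ratio of the two column totals rather than the average of the per-query percentages.</p>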
<p>For the 20 studied MeSH Descriptors, inter-rater agreement was poor: the multi-rater kappa was 0.34. The results of the relevance evaluation are summarized in Table
<xref ref-type="table" rid="T3">3</xref>
. The mean precision for fully relevant citations was 48.4%, CI
<sub>95% </sub>
= [45.8-50.9], but this figure masks discrepancies between evaluators: three evaluators (BT, GK and NG) found a full-relevance precision of around 40% (43.7%, 36.7% and 37.4%, respectively) and one (LR) found 75%. Precision is somewhat higher when partially relevant citations are also counted, with a similar pattern in which LR rated citations as relevant more often than the other evaluators: mean precision for partial and full relevance was 59.8%, CI
<sub>95% </sub>
= [57.3-62.2]. BT, GK and NG found a precision of about 50% (50.1%, 51.7% and 48.2%, respectively) whereas LR found 88.1%. Differences between evaluators were significant (p < 0.001, Friedman test). Precision also differed significantly between MeSH terms (data not shown, p < 0.001, Chi
<sup>2 </sup>
test): for 8 MeSH terms the full-relevance precision was higher than 0.5, and for 8 MeSH terms the partial and full relevance precision was less than 0.5.</p>
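<p>For readers unfamiliar with multi-rater agreement, the sketch below computes Fleiss' kappa, one common multi-rater statistic. Whether the SAS macro used in this study computes exactly this variant is an assumption; the function name and the toy counts are illustrative and are not study data.</p>
<preformat>
```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j.

    Assumption: every item was rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_items * n_raters
    # Share of all ratings that fall in each category.
    category_share = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    # Agreement observed on each item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items
    expected = sum(p * p for p in category_share)
    return (observed - expected) / (1 - expected)


# Toy table: 3 citations, 4 raters, categories (bad, partial, full).
perfect = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
print(fleiss_kappa(perfect))  # 1.0: all raters agree on every citation
```
</preformat>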
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>Precision by expert and relevance level</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Expert</th>
<th align="center" colspan="2">Precision [CI95%]</th>
</tr>
<tr>
<th></th>
<th colspan="2">
<hr></hr>
</th>
</tr>
<tr>
<th></th>
<th align="center">Partial and Full relevance</th>
<th align="center">Full relevance</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">LR</td>
<td align="center">88.1% [84.9-91.3]</td>
<td align="center">75.0% [70.7-79.3]</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">NG</td>
<td align="center">48.2% [43.3-53.2]</td>
<td align="center">37.4% [32.6-42.1]</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">BT</td>
<td align="center">50.1% [45.0-55.2]</td>
<td align="center">43.7% [38.6-48.7]</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">GK</td>
<td align="center">51.7% [46.7-56.7]</td>
<td align="center">36.7% [31.8-41.5]</td>
</tr>
<tr>
<td colspan="3">
<hr></hr>
</td>
</tr>
<tr>
<td align="center">All</td>
<td align="center">59.8% [57.3-62.2]</td>
<td align="center">48.4% [45.8-50.9]</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Discussion</title>
<p>Enhancing information retrieval is one possible use of UMLS [
<xref ref-type="bibr" rid="B12">12</xref>
]. The new strategy led to a slight increase in non-indexed citation retrieval (23.7%), with a precision very similar to that observed in previous reports on PubMed performance: Thirion et al. [
<xref ref-type="bibr" rid="B7">7</xref>
] showed a precision of 54.5%; Lu et al. [
<xref ref-type="bibr" rid="B13">13</xref>
], for normal use of PubMed, found a mean rank precision for the top 20 results between 40% and 55%.</p>
<p>Nevertheless, this study has some limitations. First, the absence of a control group makes the results difficult to interpret; however, their consistency with the literature suggests that there was no major bias. Second, we used queries based on a single MeSH term from the "disease" tree (C); it remains unknown whether the results would be similar for terms from other MeSH trees, for queries including several MeSH terms, or for queries combining MeSH terms and keywords. Third, the results of the expansion proposed here vary greatly between queries. The three factors tested were not significantly correlated with precision, but a manual qualitative assessment of the results suggested the following:</p>
<p>(a) Some UMLS synonyms provide very good results (e.g. "hepatoma" for "liver neoplasms", "Nephropathy" for "Kidney Diseases"), probably because they are very similar to a son of the MeSH Descriptor.</p>
<p>(b) Some UMLS synonyms are ambiguous acronyms that generate a lot of noise (e.g. TB for tuberculosis).</p>
<p>(c) Some MeSH descriptors correspond to frequent confounding factors (e.g. "hypertension", "obesity"). The retrieved citations report results adjusted for these factors, but the factors are not the real subject of the citations (mean precision for fully relevant citations: 21.1% and 23%, respectively).</p>
<p>We also tried to explain the number of citations newly retrieved by the q5 query, which varies from 0 to 9,876 (for Diabetes Mellitus; see Table
<xref ref-type="table" rid="T2">2</xref>
).</p>
<p>(a) The number of UMLS synonyms varies greatly, from 0 to 38, with a median of 10 (data not shown). This natural source of variation was not confirmed by correlation tests. The difference must be more qualitative:</p>
<p>(b) Some UMLS synonyms do not add any value in information retrieval (e.g. synonyms ending in ", NOS" will not retrieve any citation).</p>
<p>Fourth, relevance was assessed on the title and abstract alone, not on the full text of the article. Although this could have introduced a bias, it seems to us more pragmatic, as most end users select relevant citations based on title and abstract alone. Lastly, the poor inter-rater agreement measured here (kappa = 0.34) suggests that we do not really know what we are measuring, even if this is common for this type of study [
<xref ref-type="bibr" rid="B14">14</xref>
]. This poor kappa score, together with the surprising distribution of results, highlights differences between users. The improvement proposed here is probably not of interest to some users but may be of interest to others. Based on this study, we have implemented the following three procedures to query MEDLINE via PubMed in the InfoRoute tool, the French Infobutton (URL: inforoute.churouen.fr) [
<xref ref-type="bibr" rid="B15">15</xref>
]:</p>
<p>(a) The classical PubMed ATM</p>
<p>(b) The previous procedure developed by Thirion et al. (semantic expansion with MeSH Entry terms)</p>
<p>(c) The current procedure (semantic expansion with UMLS synonyms)</p>
<p>These three procedures suit different types of users. Users expecting the most exhaustive results, even at the cost of some noise, should use the last one; such users want to maximize recall.</p>
<p>Lu et al. [
<xref ref-type="bibr" rid="B6">6</xref>
] reviewed 28 different ways to access MEDLINE citations. The search strategy we propose could possibly be the 29
<sup>th</sup>
. However, unlike other teams' strategies to improve PubMed information retrieval, the ones developed by our team modify the ATM and are therefore applicable within the PubMed interface. As a result, there is no need to integrate and update the MEDLINE bibliographic database in our information system.</p>
<p>Considering the huge number of citations retrieved by each q3 query (frequently tens of thousands, data not shown), the increased number of recent citations retrieved may not lead to a large increase in recall. Nevertheless, the proposed strategy is based on the following assertion: a citation that has been indexed, but not with a given MeSH term, does not have to be retrieved, whatever semantic expansion is used.</p>
<p>Based on this, the new strategy only retrieves new citations that do not belong to MEDLINE, which itself represents more than ¾ of PubMed citations. We observed a 23.7% increase in recall for the citations targeted by the new strategy, which is not insignificant for users, especially those searching for recent scientific advances. This improvement mainly concerns new citations (82% of the citations retrieved by q5 had been published less than 4 months earlier). Furthermore, these citations, ranked first by PubMed, may be of great interest to PubMed users, who frequently do not read beyond the top 20 answers [
<xref ref-type="bibr" rid="B16">16</xref>
].</p>
<p>In contrast to PubMed, we assumed that when end users search for a disease name in PubMed, they do not add synonyms, out of laxity or unawareness. It could be useful to add the sons' preferred terms, the sons' entry terms and the sons' UMLS synonyms to the query, combined with "OR". This would possibly increase recall and the proportion of queries retrieving additional citations (20 of 43 in this study), and decrease precision. However, it would drastically increase query size and resource needs, which are already quite substantial.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The expansion of queries using UMLS synonyms may not be of interest to all PubMed users, but could be quite useful when seeking exhaustiveness (review, meta-analysis, etc.) as well as when searching for the latest scientific citations. This study highlights the need for specific search tools for each type of user and use case.</p>
</sec>
<sec>
<title>Abbreviations</title>
<p>ATM: Automatic term mapping; FMA: Foundational model for anatomy; ICD10: International classification of diseases: tenth revision; ICF: International Classification of Functioning: Disability and Health; ICPC2: International Classification of Primary Care: Second edition; LOINC: Logical Observation Identifiers Names and Codes; MedDRA: Medical Dictionary for Regulatory Activities; MeSH: Medical subject heading; NLM: National library of medicine; SB: Subset; SNOMED CT: Systematized nomenclature of medicine clinical terms; SNOMED intl: Systematized nomenclature of medicine international; TIAB: Title or abstract; UMLS: Unified medical language system; WHO: World health organization.</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>JFG and SJD formulated the idea of this study, designed it and participated in writing the draft. NG evaluated citations' relevance, performed the statistical analysis and wrote the draft. LR, GK and BT evaluated citations' relevance and participated in writing. WC built the queries, retrieved the results and participated in writing the draft. All authors read and approved the final manuscript.</p>
</sec>
<sec>
<title>Pre-publication history</title>
<p>The pre-publication history for this paper can be accessed here:</p>
<p>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1472-6947/12/12/prepub">http://www.biomedcentral.com/1472-6947/12/12/prepub</ext-link>
</p>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>The authors thank Richard Medeiros, Rouen University Hospital medical educator, for editing this manuscript.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="book">
<name>
<surname>Nelson</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>WD</given-names>
</name>
<name>
<surname>Humphreys</surname>
<given-names>BL</given-names>
</name>
<person-group person-group-type="editor">Bean CA, Green R</person-group>
<article-title>Relationship in medical subject headings</article-title>
<source>Relationships in the Organization of Knowledge</source>
<year>2001</year>
<publisher-name>New York: Kluwer Academic Publishers</publisher-name>
<fpage>171</fpage>
<lpage>84</lpage>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="other">
<article-title>What's the Difference Between MEDLINE
<sup>® </sup>
and PubMed
<sup>®</sup>
?</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html">http://www.nlm.nih.gov/pubs/factsheets/dif_med_pub.html</ext-link>
Accessed in 17 February 2012.]</comment>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Herskovic</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Tanaka</surname>
<given-names>LY</given-names>
</name>
<name>
<surname>Hersh</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Bernstam</surname>
<given-names>EV</given-names>
</name>
<article-title>A day in the life of PubMed: Analysis of a typical day's query log</article-title>
<source>J Am Med Inform Assoc</source>
<year>2007</year>
<volume>14</volume>
<issue>2</issue>
<fpage>212</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1197/jamia.M2191</pub-id>
<pub-id pub-id-type="pmid">17213501</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Hoogendam</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stalenhoef</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Robbé</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Overbeke</surname>
<given-names>AJ</given-names>
</name>
<article-title>Analysis of queries sent to PubMed at the point of care: observation of search behaviour in a medical teaching hospital</article-title>
<source>BMC Med Inform Decis Mak</source>
<year>2008</year>
<volume>8</volume>
<fpage>42</fpage>
<pub-id pub-id-type="doi">10.1186/1472-6947-8-42</pub-id>
<pub-id pub-id-type="pmid">18816391</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="other">
<collab>How PubMed works</collab>
<article-title>Automatic Term Mapping</article-title>
<comment>[
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.How_PubMed_works_aut">http://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.How_PubMed_works_aut</ext-link>
Accessed in 17 February 2012.]</comment>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<article-title>PubMed and beyond: a survey of web tools for searching biomedical literature</article-title>
<source>Database</source>
<year>2011</year>
<volume>2011</volume>
<fpage>baq036</fpage>
<pub-id pub-id-type="doi">10.1093/database/baq036</pub-id>
<pub-id pub-id-type="pmid">21245076</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Thirion</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Robu</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Darmoni</surname>
<given-names>SJ</given-names>
</name>
<article-title>Optimization of the PubMed Automatic Term Mapping</article-title>
<source>Stud Health Technol Inform</source>
<year>2009</year>
<volume>150</volume>
<fpage>238</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="pmid">19745304</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Huang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Névéol</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<article-title>Recommending MeSH terms for annotating biomedical articles</article-title>
<source>J Am Med Inform Assoc</source>
<year>2011</year>
<volume>18</volume>
<issue>5</issue>
<fpage>660</fpage>
<lpage>7</lpage>
<comment>Epub 2011 May 25</comment>
<pub-id pub-id-type="doi">10.1136/amiajnl-2010-000055</pub-id>
<pub-id pub-id-type="pmid">21613640</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Grosjean</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Merabti</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Dahamna</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kergourlay</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Soualmia</surname>
<given-names>LF</given-names>
</name>
<name>
<surname>Darmoni</surname>
<given-names>SJ</given-names>
</name>
<article-title>Health multi-terminology portal: a semantic added-value for patient safety</article-title>
<source>Stud Health Technol Inform</source>
<year>2011</year>
<volume>166</volume>
<fpage>129</fpage>
<lpage>38</lpage>
<pub-id pub-id-type="pmid">21685618</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<name>
<surname>Hersh</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Buckley</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Leone</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Hickam</surname>
<given-names>D</given-names>
</name>
<article-title>OHSUmed: An interactive retrieval evaluation and new large test collection for research</article-title>
<source>Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval: 1994</source>
<year>1994</year>
<publisher-name>Springer-Verlag New York</publisher-name>
<fpage>192</fpage>
<lpage>201</lpage>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="other">
<name>
<surname>Chen</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zaebst</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Seel</surname>
<given-names>L</given-names>
</name>
<article-title>A Macro to Calculate Kappa Statistics for Categorizations by Multiple Raters</article-title>
<source>Proceedings of the 30th SAS User Group International conference: April 2005; Philadelphia</source>
<fpage>155</fpage>
<lpage>30</lpage>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="other">
<name>
<surname>Hersh</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Donohoe</surname>
<given-names>L</given-names>
</name>
<article-title>Assessing thesaurus based query expansion using the UMLS metathesaurus</article-title>
<source>Proceedings of AMIA symposium: November 4-8, 2000; Los Angeles</source>
<fpage>344</fpage>
<lpage>8</lpage>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Wilbur</surname>
<given-names>WJ</given-names>
</name>
<article-title>Evaluation of Query Expansion Using MeSH in PubMed</article-title>
<source>Inf Retr Boston</source>
<year>2009</year>
<volume>12</volume>
<issue>1</issue>
<fpage>69</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="pmid">19774223</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Funk</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Reid</surname>
<given-names>CA</given-names>
</name>
<article-title>Indexing consistency in MEDLINE</article-title>
<source>Bulletin of the Medical Library Association</source>
<year>1983</year>
<volume>71</volume>
<issue>2</issue>
<fpage>176</fpage>
<lpage>83</lpage>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="book">
<name>
<surname>Darmoni</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Pereira</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Névéol</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Massari</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Dahamna</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Letord</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kerdelhué</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Piot</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Derville</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Thirion</surname>
<given-names>B</given-names>
</name>
<article-title>French Infobutton: an academic and... business perspective</article-title>
<source>AMIA Symp</source>
<year>2008</year>
<publisher-name>IOS Press</publisher-name>
<fpage>920</fpage>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="other">
<name>
<surname>Islamaj Doğan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Murray</surname>
<given-names>GC</given-names>
</name>
<name>
<surname>Névéol</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
<article-title>Understanding PubMed user search behavior through log analysis</article-title>
<source>Database</source>
<year>2009</year>
<fpage>bap018</fpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/ParkinsonFranceV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000695 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000695 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    ParkinsonFranceV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3309945
   |texte=   Performance evaluation of unified medical language system®'s synonyms expansion to query PubMed
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22376010" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a ParkinsonFranceV1 

Wicri

This area was generated with Dilib version V0.6.29.
Data generation: Wed May 17 19:46:39 2017. Site generation: Mon Mar 4 15:48:15 2024