LINNAEUS: A species name identification system for biomedical literature
Internal identifier: 000148 (Pmc/Corpus)
Authors: Martin Gerner; Goran Nenadic; Casey M. Bergman
Source: BMC Bioinformatics [1471-2105]; 2010.
Abstract
The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.
In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers.
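The dictionary-based matching described above can be illustrated with a minimal sketch. This is not the LINNAEUS implementation: it assumes a tiny, hypothetical term dictionary (`SPECIES`), uses a plain character trie as a simple deterministic automaton, and keeps the longest match on word boundaries. The helper names `build_trie` and `find_mentions` are made up for illustration, and the ambiguity-resolution heuristics mentioned in the abstract are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lowercase term -> NCBI Taxonomy identifier.
SPECIES: Dict[str, int] = {
    "homo sapiens": 9606,
    "human": 9606,
    "mus musculus": 10090,
    "mouse": 10090,
    "escherichia coli": 562,
}

# Multi-character key, so it can never collide with a single-character edge.
END = "$id"


def build_trie(terms: Dict[str, int]) -> dict:
    """Build a character trie; terminal nodes store the taxonomy identifier."""
    root: dict = {}
    for term, tax_id in terms.items():
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = tax_id
    return root


def find_mentions(text: str, trie: dict) -> List[Tuple[int, int, int]]:
    """Return (start, end, tax_id) for the longest match starting at each word boundary."""
    lowered = text.lower()  # case-insensitive matching (assumes ASCII-like text)
    mentions: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(lowered):
        # Only start a match at the beginning of a word.
        if i > 0 and lowered[i - 1].isalnum():
            i += 1
            continue
        node, j, best = trie, i, None
        while j < len(lowered) and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Accept a terminal node only if the match ends on a word boundary.
            if END in node and (j == len(lowered) or not lowered[j].isalnum()):
                best = (i, j, node[END])
        if best:
            mentions.append(best)
            i = best[1]  # resume scanning after the matched mention
        else:
            i += 1
    return mentions


TRIE = build_trie(SPECIES)
```

For example, `find_mentions("Human and Mus musculus samples; humans excluded.", TRIE)` reports "Human" and "Mus musculus" but skips "humans", because the toy dictionary has no plural entry and partial-word matches are rejected.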
LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at
Url:
DOI: 10.1186/1471-2105-11-85
PubMed: 20149233
PubMed Central: 2836304
Links to Exploration step
PMC:2836304
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000148 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000148 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Pmc |étape= Corpus |type= RBID |clé= PMC:2836304 |texte= LINNAEUS: A species name identification system for biomedical literature }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i -Sk "pubmed:20149233" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd \
  | NlmPubMed2Wicri -a OcrV1
This area was generated with Dilib version V0.6.32.