NetiNeti: discovery of scientific names from text using machine learning methods
Internal identifier: 000143 (Ncbi/Merge)
Authors: Lakshmi Manohar Akella [United States]; Catherine N. Norton [United States]; Holly Miller [United States]
Source:
- BMC Bioinformatics [1471-2105]; 2012.
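English descriptors
- KwdEn: Animals; Artificial Intelligence; Classification; Data Mining; MEDLINE; PubMed
- MESH: Animals; Artificial Intelligence; Classification; Data Mining; MEDLINE; PubMed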
Abstract
A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.
We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine-learning-based approach for recognizing scientific names in text, including the discovery of new species names; it also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify them, based on structural features of the candidates and features derived from their contexts. NetiNeti can also use this contextual information to disambiguate scientific names from other names. We evaluated NetiNeti on legacy biodiversity texts and on biomedical literature (MEDLINE). On a 600-page biodiversity book manually marked up by an annotator, NetiNeti performs better (precision = 98.9%, recall = 70.5%) than a popular dictionary-based approach (precision = 97.5%, recall = 54.3%). On a small set of PubMed Central full-text articles annotated with scientific names, precision and recall are 98.5% and 96.2%, respectively. Run over the full MEDLINE database, NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records. NetiNeti also identifies almost all of the new species names mentioned in web pages.
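The Results paragraph describes two stages: permissive rules that propose candidate names, followed by a probabilistic classifier that scores each candidate using structural features of the name and features of its context. The Python sketch below illustrates that division of labour under stated assumptions: the single binomial-shaped rule, the feature names (`genus_suffix`, `latin_ending`, `prev`, `next`, ...), the helper functions, and the toy training sentences are all hypothetical, and multinomial naive Bayes stands in as one example of a probabilistic classifier. It is not NetiNeti's implementation, and it does not handle trinomials, misspellings, or OCR variants.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical candidate rules (not NetiNeti's actual rules): a capitalized or
# abbreviated genus-like token followed by a lowercase epithet-like token.
GENUS = re.compile(r"[A-Z][a-z]+|[A-Z]\.")
EPITHET = re.compile(r"[a-z]{3,}")
LATIN_ENDINGS = ("a", "ae", "i", "is", "um", "us")  # illustrative, far from complete


def tokenize(text):
    # Keep abbreviated genera such as "E." as one token; drop other punctuation.
    return re.findall(r"[A-Z]\.|[A-Za-z]+", text)


def candidate_positions(tokens):
    """Stage 1: indices i where tokens[i] and tokens[i + 1] look like a binomial."""
    return [i for i in range(len(tokens) - 1)
            if GENUS.fullmatch(tokens[i]) and EPITHET.fullmatch(tokens[i + 1])]


def features(tokens, i):
    """Structural features of the candidate plus its neighbouring words."""
    genus, epithet = tokens[i], tokens[i + 1]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 2].lower() if i + 2 < len(tokens) else "</s>"
    return [
        f"genus_suffix={genus[-2:]}",
        f"epithet_suffix={epithet[-2:]}",
        f"latin_ending={epithet.endswith(LATIN_ENDINGS)}",
        f"prev={prev_word}",
        f"next={next_word}",
    ]


class NaiveBayes:
    """Multinomial naive Bayes over feature strings with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for feats, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)

    def log_score(self, feats, label):
        # Unnormalized log P(label) + sum over features of log P(feature | label).
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for f in feats:
            if f in self.vocab:  # features never seen in training are ignored
                score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def classify(self, feats):
        return max(self.class_counts, key=lambda label: self.log_score(feats, label))


def make_examples(labeled_sentences):
    """Turn (sentence, set of true names) pairs into (features, is_name) pairs."""
    examples = []
    for text, names in labeled_sentences:
        tokens = tokenize(text)
        for i in candidate_positions(tokens):
            is_name = f"{tokens[i]} {tokens[i + 1]}" in names
            examples.append((features(tokens, i), is_name))
    return examples


def train_model(labeled_sentences):
    model = NaiveBayes()
    model.train(make_examples(labeled_sentences))
    return model


def find_names(text, model):
    """Stage 2: keep only the candidates the classifier labels as names."""
    tokens = tokenize(text)
    return [f"{tokens[i]} {tokens[i + 1]}" for i in candidate_positions(tokens)
            if model.classify(features(tokens, i))]


# Toy, hand-made training data (hypothetical; not the paper's corpora).
TRAIN = [
    ("Specimens of Mus musculus were collected", {"Mus musculus"}),
    ("We isolated Escherichia coli from soil", {"Escherichia coli"}),
    ("Cultures of Bacillus subtilis were grown", {"Bacillus subtilis"}),
    ("The cat sat on the mat", set()),
    ("The dog ran home", set()),
    ("The results were clear", set()),
]
```

For example, `find_names("The bird ate Arabidopsis thaliana", train_model(TRAIN))` returns `["Arabidopsis thaliana"]`: both "The bird" and "Arabidopsis thaliana" pass the candidate rule, but only the second survives the classifier, even though neither its genus nor its epithet appears in the training data. With this little data the decisions are fragile; the sketch only shows why loose rules plus a learned filter can find names that a fixed dictionary would miss.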
We present NetiNeti, a machine-learning-based approach for identifying and discovering scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
Url:
DOI: 10.1186/1471-2105-13-211
PubMed: 22913485
PubMed Central: 3542245
Links to previous steps (curation, corpus, ...)
- to stream Pmc, to step Corpus: 000097
- to stream Pmc, to step Curation: 000097
- to stream Pmc, to step Checkpoint: 000097
- to stream PubMed, to step Corpus: 000026
- to stream PubMed, to step Curation: 000026
- to stream PubMed, to step Checkpoint: 000026
Links to Exploration step
PMC:3542245
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I2">Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">22913485</idno>
<idno type="pmc">3542245</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542245</idno>
<idno type="RBID">PMC:3542245</idno>
<idno type="doi">10.1186/1471-2105-13-211</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000097</idno>
<idno type="wicri:Area/Pmc/Curation">000097</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000097</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/PubMed/Corpus">000026</idno>
<idno type="wicri:Area/PubMed/Curation">000026</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000026</idno>
<idno type="wicri:Area/Ncbi/Merge">000143</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">NetiNeti: discovery of scientific names from text using machine learning methods</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I2">Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.</p>
</sec>
<sec><title>Results</title>
<p>We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine-learning-based approach for recognizing scientific names in text, including the discovery of new species names; it also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify them, based on structural features of the candidates and features derived from their contexts. NetiNeti can also use this contextual information to disambiguate scientific names from other names. We evaluated NetiNeti on legacy biodiversity texts and on biomedical literature (MEDLINE). On a 600-page biodiversity book manually marked up by an annotator, NetiNeti performs better (precision = 98.9%, recall = 70.5%) than a popular dictionary-based approach (precision = 97.5%, recall = 54.3%). On a small set of PubMed Central full-text articles annotated with scientific names, precision and recall are 98.5% and 96.2%, respectively. Run over the full MEDLINE database, NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records. NetiNeti also identifies almost all of the new species names mentioned in web pages.</p>
</sec>
<sec><title>Conclusions</title>
<p>We present NetiNeti, a machine-learning-based approach for identifying and discovering scientific names. The system implementing the approach can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://namefinding.ubio.org">http://namefinding.ubio.org</ext-link>.
</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Poon, H" uniqKey="Poon H">H Poon</name>
</author>
<author><name sortKey="Vanderwende, L" uniqKey="Vanderwende L">L Vanderwende</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gerner, M" uniqKey="Gerner M">M Gerner</name>
</author>
<author><name sortKey="Nenadic, G" uniqKey="Nenadic G">G Nenadic</name>
</author>
<author><name sortKey="Bergman, Cm" uniqKey="Bergman C">CM Bergman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kappeler, T" uniqKey="Kappeler T">T Kappeler</name>
</author>
<author><name sortKey="Kaljurand, K" uniqKey="Kaljurand K">K Kaljurand</name>
</author>
<author><name sortKey="Rinaldi, F" uniqKey="Rinaldi F">F Rinaldi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author><name sortKey="Plake, C" uniqKey="Plake C">C Plake</name>
</author>
<author><name sortKey="Leaman, R" uniqKey="Leaman R">R Leaman</name>
</author>
<author><name sortKey="Schroeder, M" uniqKey="Schroeder M">M Schroeder</name>
</author>
<author><name sortKey="Gonzalez, G" uniqKey="Gonzalez G">G Gonzalez</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hanisch, D" uniqKey="Hanisch D">D Hanisch</name>
</author>
<author><name sortKey="Fundel, K" uniqKey="Fundel K">K Fundel</name>
</author>
<author><name sortKey="Mevissen, Ht" uniqKey="Mevissen H">HT Mevissen</name>
</author>
<author><name sortKey="Zimmer, R" uniqKey="Zimmer R">R Zimmer</name>
</author>
<author><name sortKey="Fluck, J" uniqKey="Fluck J">J Fluck</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Matthews, M" uniqKey="Matthews M">M Matthews</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Borthwick, A" uniqKey="Borthwick A">A Borthwick</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chieu, Hl" uniqKey="Chieu H">HL Chieu</name>
</author>
<author><name sortKey="Ng, Ht" uniqKey="Ng H">HT Ng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Cooper, J" uniqKey="Cooper J">J Cooper</name>
</author>
<author><name sortKey="Kirk, Pm" uniqKey="Kirk P">PM Kirk</name>
</author>
<author><name sortKey="Pyle, Rl" uniqKey="Pyle R">RL Pyle</name>
</author>
<author><name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Remsen, D" uniqKey="Remsen D">D Remsen</name>
</author>
<author><name sortKey="Marino, Wa" uniqKey="Marino W">WA Marino</name>
</author>
<author><name sortKey="Norton, C" uniqKey="Norton C">C Norton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Leary, Pr" uniqKey="Leary P">PR Leary</name>
</author>
<author><name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
<author><name sortKey="Norton, Cn" uniqKey="Norton C">CN Norton</name>
</author>
<author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Page, Rd" uniqKey="Page R">RD Page</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Koning, D" uniqKey="Koning D">D Koning</name>
</author>
<author><name sortKey="Sarkar, I" uniqKey="Sarkar I">I Sarkar</name>
</author>
<author><name sortKey="Mortiz, T" uniqKey="Mortiz T">T Moritz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sautter, G" uniqKey="Sautter G">G Sautter</name>
</author>
<author><name sortKey="Bohm, K" uniqKey="Bohm K">K Böhm</name>
</author>
<author><name sortKey="Agosti, D" uniqKey="Agosti D">D Agosti</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hopcroft, Je" uniqKey="Hopcroft J">JE Hopcroft</name>
</author>
<author><name sortKey="Motwani, R" uniqKey="Motwani R">R Motwani</name>
</author>
<author><name sortKey="Ullman, Jd" uniqKey="Ullman J">JD Ullman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Okazaki, N" uniqKey="Okazaki N">N Okazaki</name>
</author>
<author><name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Plake, C" uniqKey="Plake C">C Plake</name>
</author>
<author><name sortKey="Schiemann, T" uniqKey="Schiemann T">T Schiemann</name>
</author>
<author><name sortKey="Pankalla, M" uniqKey="Pankalla M">M Pankalla</name>
</author>
<author><name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author><name sortKey="Leser, U" uniqKey="Leser U">U Leser</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author><name sortKey="Arregui, M" uniqKey="Arregui M">M Arregui</name>
</author>
<author><name sortKey="Gaudan, S" uniqKey="Gaudan S">S Gaudan</name>
</author>
<author><name sortKey="Kirsch, H" uniqKey="Kirsch H">H Kirsch</name>
</author>
<author><name sortKey="Jimeno, A" uniqKey="Jimeno A">A Jimeno</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Grover, C" uniqKey="Grover C">C Grover</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
<author><name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Rish, I" uniqKey="Rish I">I Rish</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mitchell, Tm" uniqKey="Mitchell T">TM Mitchell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Domingos, P" uniqKey="Domingos P">P Domingos</name>
</author>
<author><name sortKey="Pazzani, M" uniqKey="Pazzani M">M Pazzani</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beeferman, D" uniqKey="Beeferman D">D Beeferman</name>
</author>
<author><name sortKey="Berger, A" uniqKey="Berger A">A Berger</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ratnaparkhi, A" uniqKey="Ratnaparkhi A">A Ratnaparkhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rosenfeld, R" uniqKey="Rosenfeld R">R Rosenfeld</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nigam, K" uniqKey="Nigam K">K Nigam</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
<author><name sortKey="Mccallum, A" uniqKey="Mccallum A">A McCallum</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Berger, Al" uniqKey="Berger A">AL Berger</name>
</author>
<author><name sortKey="Dellapietra, Sa" uniqKey="Dellapietra S">SA DellaPietra</name>
</author>
<author><name sortKey="Dellapietra, Vj" uniqKey="Dellapietra V">VJ DellaPietra</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dellapietra, S" uniqKey="Dellapietra S">S DellaPietra</name>
</author>
<author><name sortKey="Dellapietra, V" uniqKey="Dellapietra V">V DellaPietra</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Darroch, Jn" uniqKey="Darroch J">JN Darroch</name>
</author>
<author><name sortKey="Ratcliff, D" uniqKey="Ratcliff D">D Ratcliff</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nocedal, J" uniqKey="Nocedal J">J Nocedal</name>
</author>
<author><name sortKey="Wright, S" uniqKey="Wright S">S Wright</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Malouf, R" uniqKey="Malouf R">R Malouf</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Goodrich, Bsg" uniqKey="Goodrich B">BSG Goodrich</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Abbott, Rt" uniqKey="Abbott R">RT Abbott</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Quinlan, Jr" uniqKey="Quinlan J">JR Quinlan</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<double pmid="22913485"><pmc><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I2">Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">22913485</idno>
<idno type="pmc">3542245</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542245</idno>
<idno type="RBID">PMC:3542245</idno>
<idno type="doi">10.1186/1471-2105-13-211</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000097</idno>
<idno type="wicri:Area/Pmc/Curation">000097</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000097</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">NetiNeti: discovery of scientific names from text using machine learning methods</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="I2">Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Present address: Sears Holdings Corporation, Hoffman Estates, IL 60179</wicri:regionArea>
<placeName><region type="state">Illinois</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
<affiliation wicri:level="2"><nlm:aff id="I1">MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.</p>
</sec>
<sec><title>Results</title>
<p>We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine-learning-based approach for recognizing scientific names in text, including the discovery of new species names; it also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify them, based on structural features of the candidates and features derived from their contexts. NetiNeti can also use this contextual information to disambiguate scientific names from other names. We evaluated NetiNeti on legacy biodiversity texts and on biomedical literature (MEDLINE). On a 600-page biodiversity book manually marked up by an annotator, NetiNeti performs better (precision = 98.9%, recall = 70.5%) than a popular dictionary-based approach (precision = 97.5%, recall = 54.3%). On a small set of PubMed Central full-text articles annotated with scientific names, precision and recall are 98.5% and 96.2%, respectively. Run over the full MEDLINE database, NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records. NetiNeti also identifies almost all of the new species names mentioned in web pages.</p>
</sec>
<sec><title>Conclusions</title>
<p>We present NetiNeti, a machine-learning-based approach for identifying and discovering scientific names. The system implementing the approach can be accessed at
<ext-link ext-link-type="uri" xlink:href="http://namefinding.ubio.org">http://namefinding.ubio.org</ext-link>.
</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Poon, H" uniqKey="Poon H">H Poon</name>
</author>
<author><name sortKey="Vanderwende, L" uniqKey="Vanderwende L">L Vanderwende</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gerner, M" uniqKey="Gerner M">M Gerner</name>
</author>
<author><name sortKey="Nenadic, G" uniqKey="Nenadic G">G Nenadic</name>
</author>
<author><name sortKey="Bergman, Cm" uniqKey="Bergman C">CM Bergman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kappeler, T" uniqKey="Kappeler T">T Kappeler</name>
</author>
<author><name sortKey="Kaljurand, K" uniqKey="Kaljurand K">K Kaljurand</name>
</author>
<author><name sortKey="Rinaldi, F" uniqKey="Rinaldi F">F Rinaldi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author><name sortKey="Plake, C" uniqKey="Plake C">C Plake</name>
</author>
<author><name sortKey="Leaman, R" uniqKey="Leaman R">R Leaman</name>
</author>
<author><name sortKey="Schroeder, M" uniqKey="Schroeder M">M Schroeder</name>
</author>
<author><name sortKey="Gonzalez, G" uniqKey="Gonzalez G">G Gonzalez</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hanisch, D" uniqKey="Hanisch D">D Hanisch</name>
</author>
<author><name sortKey="Fundel, K" uniqKey="Fundel K">K Fundel</name>
</author>
<author><name sortKey="Mevissen, Ht" uniqKey="Mevissen H">HT Mevissen</name>
</author>
<author><name sortKey="Zimmer, R" uniqKey="Zimmer R">R Zimmer</name>
</author>
<author><name sortKey="Fluck, J" uniqKey="Fluck J">J Fluck</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Matthews, M" uniqKey="Matthews M">M Matthews</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Borthwick, A" uniqKey="Borthwick A">A Borthwick</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Chieu, Hl" uniqKey="Chieu H">HL Chieu</name>
</author>
<author><name sortKey="Ng, Ht" uniqKey="Ng H">HT Ng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Cooper, J" uniqKey="Cooper J">J Cooper</name>
</author>
<author><name sortKey="Kirk, Pm" uniqKey="Kirk P">PM Kirk</name>
</author>
<author><name sortKey="Pyle, Rl" uniqKey="Pyle R">RL Pyle</name>
</author>
<author><name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Remsen, D" uniqKey="Remsen D">D Remsen</name>
</author>
<author><name sortKey="Marino, Wa" uniqKey="Marino W">WA Marino</name>
</author>
<author><name sortKey="Norton, C" uniqKey="Norton C">C Norton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Leary, Pr" uniqKey="Leary P">PR Leary</name>
</author>
<author><name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
<author><name sortKey="Norton, Cn" uniqKey="Norton C">CN Norton</name>
</author>
<author><name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author><name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Page, Rd" uniqKey="Page R">RD Page</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Koning, D" uniqKey="Koning D">D Koning</name>
</author>
<author><name sortKey="Sarkar, I" uniqKey="Sarkar I">I Sarkar</name>
</author>
<author><name sortKey="Mortiz, T" uniqKey="Mortiz T">T Moritz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sautter, G" uniqKey="Sautter G">G Sautter</name>
</author>
<author><name sortKey="Bohm, K" uniqKey="Bohm K">K Böhm</name>
</author>
<author><name sortKey="Agosti, D" uniqKey="Agosti D">D Agosti</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hopcroft, Je" uniqKey="Hopcroft J">JE Hopcroft</name>
</author>
<author><name sortKey="Motwani, R" uniqKey="Motwani R">R Motwani</name>
</author>
<author><name sortKey="Ullman, Jd" uniqKey="Ullman J">JD Ullman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Okazaki, N" uniqKey="Okazaki N">N Okazaki</name>
</author>
<author><name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Plake, C" uniqKey="Plake C">C Plake</name>
</author>
<author><name sortKey="Schiemann, T" uniqKey="Schiemann T">T Schiemann</name>
</author>
<author><name sortKey="Pankalla, M" uniqKey="Pankalla M">M Pankalla</name>
</author>
<author><name sortKey="Hakenberg, J" uniqKey="Hakenberg J">J Hakenberg</name>
</author>
<author><name sortKey="Leser, U" uniqKey="Leser U">U Leser</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author><name sortKey="Arregui, M" uniqKey="Arregui M">M Arregui</name>
</author>
<author><name sortKey="Gaudan, S" uniqKey="Gaudan S">S Gaudan</name>
</author>
<author><name sortKey="Kirsch, H" uniqKey="Kirsch H">H Kirsch</name>
</author>
<author><name sortKey="Jimeno, A" uniqKey="Jimeno A">A Jimeno</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Grover, C" uniqKey="Grover C">C Grover</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author><name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
<author><name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Rish, I" uniqKey="Rish I">I Rish</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mitchell, Tm" uniqKey="Mitchell T">TM Mitchell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Domingos, P" uniqKey="Domingos P">P Domingos</name>
</author>
<author><name sortKey="Pazzani, M" uniqKey="Pazzani M">M Pazzani</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beeferman, D" uniqKey="Beeferman D">D Beeferman</name>
</author>
<author><name sortKey="Berger, A" uniqKey="Berger A">A Berger</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ratnaparkhi, A" uniqKey="Ratnaparkhi A">A Ratnaparkhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rosenfeld, R" uniqKey="Rosenfeld R">R Rosenfeld</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nigam, K" uniqKey="Nigam K">K Nigam</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
<author><name sortKey="Mccallum, A" uniqKey="Mccallum A">A McCallum</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Berger, Al" uniqKey="Berger A">AL Berger</name>
</author>
<author><name sortKey="Dellapietra, Sa" uniqKey="Dellapietra S">SA DellaPietra</name>
</author>
<author><name sortKey="Dellapietra, Vj" uniqKey="Dellapietra V">VJ DellaPietra</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dellapietra, S" uniqKey="Dellapietra S">S DellaPietra</name>
</author>
<author><name sortKey="Dellapietra, V" uniqKey="Dellapietra V">V DellaPietra</name>
</author>
<author><name sortKey="Lafferty, J" uniqKey="Lafferty J">J Lafferty</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Darroch, Jn" uniqKey="Darroch J">JN Darroch</name>
</author>
<author><name sortKey="Ratcliff, D" uniqKey="Ratcliff D">D Ratcliff</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nocedal, J" uniqKey="Nocedal J">J Nocedal</name>
</author>
<author><name sortKey="Wright, S" uniqKey="Wright S">S Wright</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Malouf, R" uniqKey="Malouf R">R Malouf</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Goodrich, Bsg" uniqKey="Goodrich B">BSG Goodrich</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Abbott, Rt" uniqKey="Abbott R">RT Abbott</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Quinlan, Jr" uniqKey="Quinlan J">JR Quinlan</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods.</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:affiliation>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA. manohar.akella@gmail.com</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="doi">10.1186/1471-2105-13-211</idno>
<idno type="RBID">pubmed:22913485</idno>
<idno type="pmid">22913485</idno>
<idno type="wicri:Area/PubMed/Corpus">000026</idno>
<idno type="wicri:Area/PubMed/Curation">000026</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000026</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods.</title>
<author><name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2"><nlm:affiliation>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA. manohar.akella@gmail.com</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
</author>
<author><name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
</author>
</analytic>
<series><title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.</div>
</front>
</TEI>
</pubmed>
</double>
</record>