OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

NetiNeti: discovery of scientific names from text using machine learning methods.

Internal identifier: 000027 (PubMed/Checkpoint); previous: 000026; next: 000028

Authors: Lakshmi Manohar Akella [United States]; Catherine N. Norton; Holly Miller

Source: BMC Bioinformatics [1471-2105]; 2012

RBID : pubmed:22913485

English descriptors: Animals; Artificial Intelligence; Classification; Data Mining; MEDLINE; PubMed

Abstract

A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.

DOI: 10.1186/1471-2105-13-211
PubMed: 22913485
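
The RESULTS part of the abstract, available in the XML record of this page, reports precision and recall separately for each evaluation. As a rough way to compare those pairs with a single number, the sketch below derives the F1 score (the harmonic mean of precision and recall). The F1 values are computed here and are not reported by the authors; the function name `f1_score` and the labels in `REPORTED` are illustrative choices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the RESULTS part of the abstract.
REPORTED = {
    "NetiNeti, 600-page biodiversity book": (0.989, 0.705),
    "Dictionary-based approach, same book": (0.975, 0.543),
    "NetiNeti, PubMed Central full-text articles": (0.985, 0.962),
}

# Derived values: the abstract itself does not report F1.
F1 = {label: f1_score(p, r) for label, (p, r) in REPORTED.items()}

if __name__ == "__main__":
    for label, value in F1.items():
        print(f"{label}: F1 = {value:.1%}")
```

On the biodiversity book, the gap between the two approaches comes almost entirely from recall (70.5% against 54.3%), which is why the derived F1 values differ far more than the two precision figures do.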


Links to Exploration step

pubmed:22913485

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods.</title>
<author>
<name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2">
<nlm:affiliation>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA. manohar.akella@gmail.com</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
</author>
<author>
<name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="doi">10.1186/1471-2105-13-211</idno>
<idno type="RBID">pubmed:22913485</idno>
<idno type="pmid">22913485</idno>
<idno type="wicri:Area/PubMed/Corpus">000026</idno>
<idno type="wicri:Area/PubMed/Curation">000026</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000026</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods.</title>
<author>
<name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
<affiliation wicri:level="2">
<nlm:affiliation>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA. manohar.akella@gmail.com</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
</author>
<author>
<name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
</author>
</analytic>
<series>
<title level="j">BMC bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Animals</term>
<term>Artificial Intelligence</term>
<term>Classification</term>
<term>Data Mining</term>
<term>MEDLINE</term>
<term>PubMed</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">22913485</PMID>
<DateCreated>
<Year>2013</Year>
<Month>01</Month>
<Day>11</Day>
</DateCreated>
<DateCompleted>
<Year>2013</Year>
<Month>03</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised>
<Year>2015</Year>
<Month>02</Month>
<Day>23</Day>
</DateRevised>
<Article PubModel="Electronic">
<Journal>
<ISSN IssnType="Electronic">1471-2105</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>13</Volume>
<PubDate>
<Year>2012</Year>
</PubDate>
</JournalIssue>
<Title>BMC bioinformatics</Title>
<ISOAbbreviation>BMC Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>NetiNeti: discovery of scientific names from text using machine learning methods.</ArticleTitle>
<Pagination>
<MedlinePgn>211</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/1471-2105-13-211</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages.</AbstractText>
<AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Akella</LastName>
<ForeName>Lakshmi Manohar</ForeName>
<Initials>LM</Initials>
<AffiliationInfo>
<Affiliation>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA, USA. manohar.akella@gmail.com</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Norton</LastName>
<ForeName>Catherine N</ForeName>
<Initials>CN</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Miller</LastName>
<ForeName>Holly</ForeName>
<Initials>H</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>R01 LM009725</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2012</Year>
<Month>08</Month>
<Day>22</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>BMC Bioinformatics</MedlineTA>
<NlmUniqueID>100965194</NlmUniqueID>
<ISSNLinking>1471-2105</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2005;6 Suppl 1:S14</RefSource>
<PMID Version="1">15960826</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Syst Biol. 2006 Jun;55(3):367-73</RefSource>
<PMID Version="1">16861205</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2006 Oct 1;22(19):2444-5</RefSource>
<PMID Version="1">16870931</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2006 Dec 15;22(24):3089-95</RefSource>
<PMID Version="1">17050571</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2007;8:158</RefSource>
<PMID Version="1">17511869</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2007 Jun 1;23(11):1434-6</RefSource>
<PMID Version="1">17392332</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Trends Ecol Evol. 2010 Dec;25(12):686-91</RefSource>
<PMID Version="1">20961649</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2008 Jan 15;24(2):296-8</RefSource>
<PMID Version="1">18006544</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2008 Aug 15;24(16):i126-132</RefSource>
<PMID Version="1">18689813</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2008;9 Suppl 11:S6</RefSource>
<PMID Version="1">19025692</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2010 Mar 1;26(5):661-7</RefSource>
<PMID Version="1">20053840</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Bioinformatics. 2010;11:85</RefSource>
<PMID Version="1">20149233</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Brief Bioinform. 2007 Sep;8(5):347-57</RefSource>
<PMID Version="1">17704120</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D000818">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D001185">Artificial Intelligence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D002965">Classification</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D057225">Data Mining</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D016239">MEDLINE</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D039781">PubMed</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">PMC3542245</OtherID>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2010</Year>
<Month>10</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2012</Year>
<Month>8</Month>
<Day>6</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
<Year>2012</Year>
<Month>8</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2012</Year>
<Month>8</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2012</Year>
<Month>8</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2013</Year>
<Month>3</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pii">1471-2105-13-211</ArticleId>
<ArticleId IdType="doi">10.1186/1471-2105-13-211</ArticleId>
<ArticleId IdType="pubmed">22913485</ArticleId>
<ArticleId IdType="pmc">PMC3542245</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Massachusetts</li>
</region>
</list>
<tree>
<noCountry>
<name sortKey="Miller, Holly" sort="Miller, Holly" uniqKey="Miller H" first="Holly" last="Miller">Holly Miller</name>
<name sortKey="Norton, Catherine N" sort="Norton, Catherine N" uniqKey="Norton C" first="Catherine N" last="Norton">Catherine N. Norton</name>
</noCountry>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Akella, Lakshmi Manohar" sort="Akella, Lakshmi Manohar" uniqKey="Akella L" first="Lakshmi Manohar" last="Akella">Lakshmi Manohar Akella</name>
</region>
</country>
</tree>
</affiliations>
</record>
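
The record above uses the prefixes `wicri:` and `nlm:` without declaring them, so a namespace-aware XML parser is likely to reject it as displayed. The following is a minimal sketch, not part of Dilib, of one way to read such a record with Python's standard library: it wraps the text in an element that binds the prefixes to placeholder URIs. The URIs, the `wrapper` element, and the function names are assumptions made for illustration, and `RECORD` is an abridged excerpt of the record shown above.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the record above; most elements are omitted.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">NetiNeti: discovery of scientific names from text using machine learning methods.</title>
<author>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1186/1471-2105-13-211</idno>
<idno type="pmid">22913485</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<Article PubModel="Electronic">
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Akella</LastName><ForeName>Lakshmi Manohar</ForeName></Author>
<Author ValidYN="Y"><LastName>Norton</LastName><ForeName>Catherine N</ForeName></Author>
<Author ValidYN="Y"><LastName>Miller</LastName><ForeName>Holly</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
</record>"""

# Placeholder URIs (hypothetical): they only make the prefixes resolvable.
NAMESPACES = {"wicri": "urn:example:wicri", "nlm": "urn:example:nlm"}


def parse_record(text):
    """Parse a record whose namespace prefixes are not declared in the text."""
    declarations = " ".join(f'xmlns:{p}="{u}"' for p, u in NAMESPACES.items())
    # Prefix bindings on the wrapper are inherited by the enclosed record.
    wrapped = f"<wrapper {declarations}>{text}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record):
    """Collect the title, identifiers and author names from a parsed record."""
    idnos = {idno.get("type"): idno.text for idno in record.iter("idno")}
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in record.iter("Author")
    ]
    return {
        "title": record.findtext(".//titleStmt/title"),
        "doi": idnos.get("doi"),
        "pmid": idnos.get("pmid"),
        "authors": authors,
    }


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Wrapping the text keeps the record itself unmodified, which may be preferable to stripping the prefixes when the output has to be compared with the original.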

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000027 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000027 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:22913485
   |texte=   NetiNeti: discovery of scientific names from text using machine learning methods.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:22913485" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024