Cyberinfrastructure exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Applications of Natural Language Processing in Biodiversity Science

Internal identifier: 000323 (Ncbi/Merge); previous: 000322; next: 000324

Authors: Anne E. Thessen [United States]; Hong Cui [United States]; Dmitry Mozzherin [United States]

RBID : PMC:3364545

Abstract

Centuries of biological knowledge are contained in the massive body of scientific literature, written for human readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.


DOI: 10.1155/2012/391574
PubMed: 22685456
PubMed Central: 3364545

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Applications of Natural Language Processing in Biodiversity Science</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Cui, Hong" sort="Cui, Hong" uniqKey="Cui H" first="Hong" last="Cui">Hong Cui</name>
<affiliation wicri:level="2">
<nlm:aff id="I2">School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719</wicri:regionArea>
<placeName>
<region type="state">Arizona</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Mozzherin, Dmitry" sort="Mozzherin, Dmitry" uniqKey="Mozzherin D" first="Dmitry" last="Mozzherin">Dmitry Mozzherin</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22685456</idno>
<idno type="pmc">3364545</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3364545</idno>
<idno type="RBID">PMC:3364545</idno>
<idno type="doi">10.1155/2012/391574</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000693</idno>
<idno type="wicri:Area/Pmc/Curation">000693</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000440</idno>
<idno type="wicri:Area/Ncbi/Merge">000323</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Applications of Natural Language Processing in Biodiversity Science</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Cui, Hong" sort="Cui, Hong" uniqKey="Cui H" first="Hong" last="Cui">Hong Cui</name>
<affiliation wicri:level="2">
<nlm:aff id="I2">School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719</wicri:regionArea>
<placeName>
<region type="state">Arizona</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Mozzherin, Dmitry" sort="Mozzherin, Dmitry" uniqKey="Mozzherin D" first="Dmitry" last="Mozzherin">Dmitry Mozzherin</name>
<affiliation wicri:level="2">
<nlm:aff id="I1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Advances in Bioinformatics</title>
<idno type="ISSN">1687-8027</idno>
<idno type="eISSN">1687-8035</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Centuries of biological knowledge are contained in the massive body of scientific literature, written for human readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Wuethrich, B" uniqKey="Wuethrich B">B Wuethrich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bradshaw, We" uniqKey="Bradshaw W">WE Bradshaw</name>
</author>
<author>
<name sortKey="Holzapfel, Cm" uniqKey="Holzapfel C">CM Holzapfel</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thessen, Ae" uniqKey="Thessen A">AE Thessen</name>
</author>
<author>
<name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hey, A" uniqKey="Hey A">A Hey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heidorn, Pb" uniqKey="Heidorn P">PB Heidorn</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vollmar, A" uniqKey="Vollmar A">A Vollmar</name>
</author>
<author>
<name sortKey="Macklin, Ja" uniqKey="Macklin J">JA Macklin</name>
</author>
<author>
<name sortKey="Ford, L" uniqKey="Ford L">L Ford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schofield, Pn" uniqKey="Schofield P">PN Schofield</name>
</author>
<author>
<name sortKey="Eppig, J" uniqKey="Eppig J">J Eppig</name>
</author>
<author>
<name sortKey="Huala, E" uniqKey="Huala E">E Huala</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groth, P" uniqKey="Groth P">P Groth</name>
</author>
<author>
<name sortKey="Gibson, A" uniqKey="Gibson A">A Gibson</name>
</author>
<author>
<name sortKey="Velterop, J" uniqKey="Velterop J">J Velterop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kalfatovic, M" uniqKey="Kalfatovic M">M Kalfatovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tang, X" uniqKey="Tang X">X Tang</name>
</author>
<author>
<name sortKey="Heidorn, P" uniqKey="Heidorn P">P Heidorn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
<author>
<name sortKey="Selden, P" uniqKey="Selden P">P Selden</name>
</author>
<author>
<name sortKey="Boufford, D" uniqKey="Boufford D">D Boufford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taylor, A" uniqKey="Taylor A">A Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miyao, Y" uniqKey="Miyao Y">Y Miyao</name>
</author>
<author>
<name sortKey="Sagae, K" uniqKey="Sagae K">K Sagae</name>
</author>
<author>
<name sortKey="S Tre, R" uniqKey="S Tre R">R Sætre</name>
</author>
<author>
<name sortKey="Matsuzaki, T" uniqKey="Matsuzaki T">T Matsuzaki</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J Tsujii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Humphreys, K" uniqKey="Humphreys K">K Humphreys</name>
</author>
<author>
<name sortKey="Demetriou, G" uniqKey="Demetriou G">G Demetriou</name>
</author>
<author>
<name sortKey="Gaizauskas, R" uniqKey="Gaizauskas R">R Gaizauskas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gaizauskas, R" uniqKey="Gaizauskas R">R Gaizauskas</name>
</author>
<author>
<name sortKey="Demetriou, G" uniqKey="Demetriou G">G Demetriou</name>
</author>
<author>
<name sortKey="Artymiuk, Pj" uniqKey="Artymiuk P">PJ Artymiuk</name>
</author>
<author>
<name sortKey="Willett, P" uniqKey="Willett P">P Willett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Divoli, A" uniqKey="Divoli A">A Divoli</name>
</author>
<author>
<name sortKey="Attwood, Tk" uniqKey="Attwood T">TK Attwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corney, Dpa" uniqKey="Corney D">DPA Corney</name>
</author>
<author>
<name sortKey="Buxton, Bf" uniqKey="Buxton B">BF Buxton</name>
</author>
<author>
<name sortKey="Langdon, Wb" uniqKey="Langdon W">WB Langdon</name>
</author>
<author>
<name sortKey="Jones, Dt" uniqKey="Jones D">DT Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, H" uniqKey="Chen H">H Chen</name>
</author>
<author>
<name sortKey="Sharp, Bm" uniqKey="Sharp B">BM Sharp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, X" uniqKey="Zhou X">X Zhou</name>
</author>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
<author>
<name sortKey="Hu, X" uniqKey="Hu X">X Hu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rebholz Schuhmann, D" uniqKey="Rebholz Schuhmann D">D Rebholz-Schuhmann</name>
</author>
<author>
<name sortKey="Kirsch, H" uniqKey="Kirsch H">H Kirsch</name>
</author>
<author>
<name sortKey="Arregui, M" uniqKey="Arregui M">M Arregui</name>
</author>
<author>
<name sortKey="Gaudan, S" uniqKey="Gaudan S">S Gaudan</name>
</author>
<author>
<name sortKey="Riethoven, M" uniqKey="Riethoven M">M Riethoven</name>
</author>
<author>
<name sortKey="Stoehr, P" uniqKey="Stoehr P">P Stoehr</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, Zz" uniqKey="Hu Z">ZZ Hu</name>
</author>
<author>
<name sortKey="Mani, I" uniqKey="Mani I">I Mani</name>
</author>
<author>
<name sortKey="Hermoso, V" uniqKey="Hermoso V">V Hermoso</name>
</author>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H Liu</name>
</author>
<author>
<name sortKey="Wu, Ch" uniqKey="Wu C">CH Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Demaine, J" uniqKey="Demaine J">J Demaine</name>
</author>
<author>
<name sortKey="Martin, J" uniqKey="Martin J">J Martin</name>
</author>
<author>
<name sortKey="Wei, L" uniqKey="Wei L">L Wei</name>
</author>
<author>
<name sortKey="De Bruijn, B" uniqKey="De Bruijn B">B De Bruijn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lease, M" uniqKey="Lease M">M Lease</name>
</author>
<author>
<name sortKey="Charniak, E" uniqKey="Charniak E">E Charniak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pyysalo, S" uniqKey="Pyysalo S">S Pyysalo</name>
</author>
<author>
<name sortKey="Salakoski, T" uniqKey="Salakoski T">T Salakoski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rimell, L" uniqKey="Rimell L">L Rimell</name>
</author>
<author>
<name sortKey="Clark, S" uniqKey="Clark S">S Clark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koning, D" uniqKey="Koning D">D Koning</name>
</author>
<author>
<name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
<author>
<name sortKey="Moritz, T" uniqKey="Moritz T">T Moritz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Akella, Lm" uniqKey="Akella L">LM Akella</name>
</author>
<author>
<name sortKey="Norton, Cn" uniqKey="Norton C">CN Norton</name>
</author>
<author>
<name sortKey="Miller, H" uniqKey="Miller H">H Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerner, M" uniqKey="Gerner M">M Gerner</name>
</author>
<author>
<name sortKey="Nenadic, G" uniqKey="Nenadic G">G Nenadic</name>
</author>
<author>
<name sortKey="Bergman, Cm" uniqKey="Bergman C">CM Bergman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Naderi, N" uniqKey="Naderi N">N Naderi</name>
</author>
<author>
<name sortKey="Kappler, T" uniqKey="Kappler T">T Kappler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abascal, R" uniqKey="Abascal R">R Abascal</name>
</author>
<author>
<name sortKey="Sanchez, Ja" uniqKey="Sanchez J">JA Sánchez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krauthammer, M" uniqKey="Krauthammer M">M Krauthammer</name>
</author>
<author>
<name sortKey="Rzhetsky, A" uniqKey="Rzhetsky A">A Rzhetsky</name>
</author>
<author>
<name sortKey="Morozov, P" uniqKey="Morozov P">P Morozov</name>
</author>
<author>
<name sortKey="Friedman, C" uniqKey="Friedman C">C Friedman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lenzi, L" uniqKey="Lenzi L">L Lenzi</name>
</author>
<author>
<name sortKey="Frabetti, F" uniqKey="Frabetti F">F Frabetti</name>
</author>
<author>
<name sortKey="Facchin, F" uniqKey="Facchin F">F Facchin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nasr, A" uniqKey="Nasr A">A Nasr</name>
</author>
<author>
<name sortKey="Rambow, O" uniqKey="Rambow O">O Rambow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leaman, R" uniqKey="Leaman R">R Leaman</name>
</author>
<author>
<name sortKey="Gonzalez, G" uniqKey="Gonzalez G">G Gonzalez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schroder, M" uniqKey="Schroder M">M Schröder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Witten, Ih" uniqKey="Witten I">IH Witten</name>
</author>
<author>
<name sortKey="Frank, E" uniqKey="Frank E">E Frank</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blaschke, C" uniqKey="Blaschke C">C Blaschke</name>
</author>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L Hirschman</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jimeno Yepes, A" uniqKey="Jimeno Yepes A">A Jimeno-Yepes</name>
</author>
<author>
<name sortKey="Aronson, Ar" uniqKey="Aronson A">AR Aronson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freeland, C" uniqKey="Freeland C">C Freeland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kornai, A" uniqKey="Kornai A">A Kornai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kornai, A" uniqKey="Kornai A">A Kornai</name>
</author>
<author>
<name sortKey="Mohiuddin, K" uniqKey="Mohiuddin K">K Mohiuddin</name>
</author>
<author>
<name sortKey="Connell, Sd" uniqKey="Connell S">SD Connell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freeland, C" uniqKey="Freeland C">C Freeland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Willis, A" uniqKey="Willis A">A Willis</name>
</author>
<author>
<name sortKey="King, D" uniqKey="King D">D King</name>
</author>
<author>
<name sortKey="Morse, D" uniqKey="Morse D">D Morse</name>
</author>
<author>
<name sortKey="Dil, A" uniqKey="Dil A">A Dil</name>
</author>
<author>
<name sortKey="Lyal, C" uniqKey="Lyal C">C Lyal</name>
</author>
<author>
<name sortKey="Roberts, D" uniqKey="Roberts D">D Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bapst, F" uniqKey="Bapst F">F Bapst</name>
</author>
<author>
<name sortKey="Ingold, R" uniqKey="Ingold R">R Ingold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weitzman, Al" uniqKey="Weitzman A">AL Weitzman</name>
</author>
<author>
<name sortKey="Lyal, Chc" uniqKey="Lyal C">CHC Lyal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rees, T" uniqKey="Rees T">T Rees</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sautter, G" uniqKey="Sautter G">G Sautter</name>
</author>
<author>
<name sortKey="Bohm, K" uniqKey="Bohm K">K Böhm</name>
</author>
<author>
<name sortKey="Agosti, D" uniqKey="Agosti D">D Agosti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Settles, B" uniqKey="Settles B">B Settles</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pavlopoulos, Ga" uniqKey="Pavlopoulos G">GA Pavlopoulos</name>
</author>
<author>
<name sortKey="Pafilis, E" uniqKey="Pafilis E">E Pafilis</name>
</author>
<author>
<name sortKey="Kuhn, M" uniqKey="Kuhn M">M Kuhn</name>
</author>
<author>
<name sortKey="Hooper, Sd" uniqKey="Hooper S">SD Hooper</name>
</author>
<author>
<name sortKey="Schneider, R" uniqKey="Schneider R">R Schneider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pafilis, E" uniqKey="Pafilis E">E Pafilis</name>
</author>
<author>
<name sortKey="O Onoghue, Si" uniqKey="O Onoghue S">SI O’Donoghue</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuhn, M" uniqKey="Kuhn M">M Kuhn</name>
</author>
<author>
<name sortKey="Von Mering, C" uniqKey="Von Mering C">C von Mering</name>
</author>
<author>
<name sortKey="Campillos, M" uniqKey="Campillos M">M Campillos</name>
</author>
<author>
<name sortKey="Jensen, Lj" uniqKey="Jensen L">LJ Jensen</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Balhoff, Jp" uniqKey="Balhoff J">JP Balhoff</name>
</author>
<author>
<name sortKey="Dahdul, Wm" uniqKey="Dahdul W">WM Dahdul</name>
</author>
<author>
<name sortKey="Kothari, Cr" uniqKey="Kothari C">CR Kothari</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dahdul, Wm" uniqKey="Dahdul W">WM Dahdul</name>
</author>
<author>
<name sortKey="Balhoff, Jp" uniqKey="Balhoff J">JP Balhoff</name>
</author>
<author>
<name sortKey="Engeman, J" uniqKey="Engeman J">J Engeman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sautter, G" uniqKey="Sautter G">G Sautter</name>
</author>
<author>
<name sortKey="Bohm, K" uniqKey="Bohm K">K Böhm</name>
</author>
<author>
<name sortKey="Agosti, D" uniqKey="Agosti D">D Agosti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leary, Pr" uniqKey="Leary P">PR Leary</name>
</author>
<author>
<name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
<author>
<name sortKey="Norton, Cn" uniqKey="Norton C">CN Norton</name>
</author>
<author>
<name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author>
<name sortKey="Sarkar, In" uniqKey="Sarkar I">IN Sarkar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Okazaki, N" uniqKey="Okazaki N">N Okazaki</name>
</author>
<author>
<name sortKey="Ananiadou, S" uniqKey="Ananiadou S">S Ananiadou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bontcheva, K" uniqKey="Bontcheva K">K Bontcheva</name>
</author>
<author>
<name sortKey="Tablan, V" uniqKey="Tablan V">V Tablan</name>
</author>
<author>
<name sortKey="Maynard, D" uniqKey="Maynard D">D Maynard</name>
</author>
<author>
<name sortKey="Cunningham, H" uniqKey="Cunningham H">H Cunningham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cunningham, H" uniqKey="Cunningham H">H Cunningham</name>
</author>
<author>
<name sortKey="Maynard, D" uniqKey="Maynard D">D Maynard</name>
</author>
<author>
<name sortKey="Bontcheva, K" uniqKey="Bontcheva K">K Bontcheva</name>
</author>
<author>
<name sortKey="Tablan, V" uniqKey="Tablan V">V Tablan</name>
</author>
<author>
<name sortKey="Ursu, C" uniqKey="Ursu C">C Ursu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fitzpatrick, E" uniqKey="Fitzpatrick E">E Fitzpatrick</name>
</author>
<author>
<name sortKey="Bachenko, J" uniqKey="Bachenko J">J Bachenko</name>
</author>
<author>
<name sortKey="Hindle, D" uniqKey="Hindle D">D Hindle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, M" uniqKey="Wood M">M Wood</name>
</author>
<author>
<name sortKey="Lydon, S" uniqKey="Lydon S">S Lydon</name>
</author>
<author>
<name sortKey="Tablan, V" uniqKey="Tablan V">V Tablan</name>
</author>
<author>
<name sortKey="Maynard, D" uniqKey="Maynard D">D Maynard</name>
</author>
<author>
<name sortKey="Cunningham, H" uniqKey="Cunningham H">H Cunningham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, L" uniqKey="Chen L">L Chen</name>
</author>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H Liu</name>
</author>
<author>
<name sortKey="Friedman, C" uniqKey="Friedman C">C Friedman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Kim, W" uniqKey="Kim W">W Kim</name>
</author>
<author>
<name sortKey="Hatzivassiloglou, V" uniqKey="Hatzivassiloglou V">V Hatzivassiloglou</name>
</author>
<author>
<name sortKey="Wilbur, Wj" uniqKey="Wilbur W">WJ Wilbur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chang, Jt" uniqKey="Chang J">JT Chang</name>
</author>
<author>
<name sortKey="Schutze, H" uniqKey="Schutze H">H Schutze</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wren, Jd" uniqKey="Wren J">JD Wren</name>
</author>
<author>
<name sortKey="Garner, Hr" uniqKey="Garner H">HR Garner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lydon, S" uniqKey="Lydon S">S Lydon</name>
</author>
<author>
<name sortKey="Wood, M" uniqKey="Wood M">M Wood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taylor, A" uniqKey="Taylor A">A Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Radford, Ae" uniqKey="Radford A">AE Radford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diederich, J" uniqKey="Diederich J">J Diederich</name>
</author>
<author>
<name sortKey="Fortuner, R" uniqKey="Fortuner R">R Fortuner</name>
</author>
<author>
<name sortKey="Milton, J" uniqKey="Milton J">J Milton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, M" uniqKey="Wood M">M Wood</name>
</author>
<author>
<name sortKey="Lydon, S" uniqKey="Lydon S">S Lydon</name>
</author>
<author>
<name sortKey="Tablan, V" uniqKey="Tablan V">V Tablan</name>
</author>
<author>
<name sortKey="Maynard, D" uniqKey="Maynard D">D Maynard</name>
</author>
<author>
<name sortKey="Cunningham, H" uniqKey="Cunningham H">H Cunningham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
<author>
<name sortKey="Heidorn, Pb" uniqKey="Heidorn P">PB Heidorn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, Q" uniqKey="Wei Q">Q Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soderland, S" uniqKey="Soderland S">S Soderland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cui, H" uniqKey="Cui H">H Cui</name>
</author>
<author>
<name sortKey="Singaram, S" uniqKey="Singaram S">S Singaram</name>
</author>
<author>
<name sortKey="Janning, A" uniqKey="Janning A">A Janning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mabee, Pm" uniqKey="Mabee P">PM Mabee</name>
</author>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Cronk, Q" uniqKey="Cronk Q">Q Cronk</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="review-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Adv Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Adv Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">ABI</journal-id>
<journal-title-group>
<journal-title>Advances in Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1687-8027</issn>
<issn pub-type="epub">1687-8035</issn>
<publisher>
<publisher-name>Hindawi Publishing Corporation</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22685456</article-id>
<article-id pub-id-type="pmc">3364545</article-id>
<article-id pub-id-type="doi">10.1155/2012/391574</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Applications of Natural Language Processing in Biodiversity Science</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Thessen</surname>
<given-names>Anne E.</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cui</surname>
<given-names>Hong</given-names>
</name>
<xref ref-type="aff" rid="I2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mozzherin</surname>
<given-names>Dmitry</given-names>
</name>
<xref ref-type="aff" rid="I1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group>
<aff id="I1">
<sup>1</sup>
Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA</aff>
<aff id="I2">
<sup>2</sup>
School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA</aff>
<author-notes>
<corresp id="cor1">*Anne E. Thessen:
<email>athessen@mbl.edu</email>
</corresp>
<fn fn-type="other">
<p>Academic Editor: Jörg Hakenberg</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>22</day>
<month>5</month>
<year>2012</year>
</pub-date>
<volume>2012</volume>
<elocation-id>391574</elocation-id>
<history>
<date date-type="received">
<day>4</day>
<month>11</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>2</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2012 Anne E. Thessen et al.</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access">
<license-p>This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Centuries of biological knowledge are contained in the massive body of scientific literature, written for human readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life-wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.</p>
</abstract>
</article-meta>
</front>
<floats-group>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>The long tail of biology. Data quantity, digitization, and openness can be described using a hyperbolic (hollow) curve with a small number of providers providing large quantities of data, and a large number of individuals providing small quantities of data.</p>
</caption>
<graphic xlink:href="ABI2012-391574.001"></graphic>
</fig>
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>A reference system architecture for an example IE system. Numbers correspond to the text.</p>
</caption>
<graphic xlink:href="ABI2012-391574.002"></graphic>
</fig>
<fig id="fig3" position="float">
<label>Figure 3</label>
<caption>
<p>An example of shallow parsing. Words and a sentence (S) are recognized. Then, the sentence is parsed into noun phrases (NP), verbs (V), and verb phrases (VP).</p>
</caption>
<graphic xlink:href="ABI2012-391574.003"></graphic>
</fig>
<fig id="fig4" position="float">
<label>Figure 4</label>
<caption>
<p>Shallow versus deep parsing. The shallow parsing result was produced by GENIA Tagger (
<ext-link ext-link-type="uri" xlink:href="http://text0.mib.man.ac.uk/software/geniatagger/">http://text0.mib.man.ac.uk/software/geniatagger/</ext-link>
). The deep parsing result was produced by Enju Parser for Biomedical Domain (
<ext-link ext-link-type="uri" xlink:href="http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html">http://www-tsujii.is.s.u-tokyo.ac.jp/enju/demo.html</ext-link>
). GENIA Tagger and Enju Parser are products of the Tsujii Laboratory of the University of Tokyo and are optimized for the biomedical domain. Both parsing results contain errors: for example, “obovate” should be an ADJP (adjective phrase), but GENIA Tagger chunked it as a VP (verb phrase), and “blade” is a noun, but Enju Parser tagged it as a verb (VBD). This is not to criticize the tools, but to point out that language differences between domains can have a significant impact on the performance of NLP tools. Parsers trained for a general domain produce erroneous results on morphological descriptions [
<xref ref-type="bibr" rid="B28">16</xref>
].</p>
</caption>
<graphic xlink:href="ABI2012-391574.004"></graphic>
</fig>
<fig id="fig5" position="float">
<label>Figure 5</label>
<caption>
<p>Extraction result from a descriptive sentence.</p>
</caption>
<graphic xlink:href="ABI2012-391574.005"></graphic>
</fig>
<fig id="figbox1" position="float">
<label>Box 1</label>
<graphic xlink:href="ABI2012-391574.006"></graphic>
</fig>
<fig id="figbox2" position="float">
<label>Box 2</label>
<graphic xlink:href="ABI2012-391574.007"></graphic>
</fig>
<fig id="figbox3" position="float">
<label>Box 3</label>
<graphic xlink:href="ABI2012-391574.008"></graphic>
</fig>
<fig id="figbox4" position="float">
<label>Box 4</label>
<graphic xlink:href="ABI2012-391574.009"></graphic>
</fig>
<table-wrap id="tab1" position="float">
<label>Table 1</label>
<caption>
<p>An example template for morphological character extraction, from Tang and Heidorn [
<xref ref-type="bibr" rid="B13">13</xref>
].</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Template slots</th>
<th align="center" rowspan="1" colspan="1">Extracted information</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Genus</td>
<td align="center" rowspan="1" colspan="1">Pellaea</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Species</td>
<td align="center" rowspan="1" colspan="1">mucronata</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Distribution</td>
<td align="center" rowspan="1" colspan="1">Nev. Calif.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf shape</td>
<td align="center" rowspan="1" colspan="1">ovate-deltate</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf margin</td>
<td align="center" rowspan="1" colspan="1">dentate</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf apex</td>
<td align="center" rowspan="1" colspan="1">mucronate</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf base</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf arrangement</td>
<td align="center" rowspan="1" colspan="1">clustered</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Blade dimension</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Leaf color</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Fruit/nut shape</td>
<td align="center" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab2" position="float">
<label>Table 2</label>
<caption>
<p>Information extraction tasks outlined by the MUCs and their descriptions.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Task</th>
<th align="center" rowspan="1" colspan="1">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Named entity</td>
<td align="center" rowspan="1" colspan="1">Extracts names of entities</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Coreference</td>
<td align="center" rowspan="1" colspan="1">Links references to the same entity</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Template element</td>
<td align="center" rowspan="1" colspan="1">Extracts descriptors of entities</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Template relation</td>
<td align="center" rowspan="1" colspan="1">Extracts relationships between entities</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Scenario template</td>
<td align="center" rowspan="1" colspan="1">Extracts events</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab3" position="float">
<label>Table 3</label>
<caption>
<p>Existing IE systems for biology [
<xref ref-type="bibr" rid="B70">17</xref>–
<xref ref-type="bibr" rid="B78">26</xref>
]. </p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">System</th>
<th align="left" rowspan="1" colspan="1">Approach</th>
<th align="left" rowspan="1" colspan="1">Structure of Text</th>
<th align="left" rowspan="1" colspan="1">Knowledge input</th>
<th align="left" rowspan="1" colspan="1">Application domain</th>
<th align="left" rowspan="1" colspan="1">Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">AkanePPI</td>
<td align="left" rowspan="1" colspan="1">shallow parsing</td>
<td align="left" rowspan="1" colspan="1">sentence-split, tokenized, and annotated</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">protein interactions</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B70">17</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">EMPathIE</td>
<td align="left" rowspan="1" colspan="1">pattern matching</td>
<td align="left" rowspan="1" colspan="1">text</td>
<td align="left" rowspan="1" colspan="1">EMP database</td>
<td align="left" rowspan="1" colspan="1">enzymes</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B71">18</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PASTA</td>
<td align="left" rowspan="1" colspan="1">pattern matching</td>
<td align="left" rowspan="1" colspan="1">text</td>
<td align="left" rowspan="1" colspan="1">biological lexicons</td>
<td align="left" rowspan="1" colspan="1">protein structure</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B72">19</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BioIE</td>
<td align="left" rowspan="1" colspan="1">pattern matching</td>
<td align="left" rowspan="1" colspan="1">xml</td>
<td align="left" rowspan="1" colspan="1">dictionary of terms</td>
<td align="left" rowspan="1" colspan="1">biomedicine</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B73">20</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BioRAT</td>
<td align="left" rowspan="1" colspan="1">pattern matching, sub-language driven</td>
<td align="left" rowspan="1" colspan="1">xml, html, text, or asn.1; also handles full-length pdf papers (converted to text)</td>
<td align="left" rowspan="1" colspan="1">dictionary for protein and gene names, dictionary for interactions, and synonyms; text pattern template</td>
<td align="left" rowspan="1" colspan="1">biomedicine</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B74">21</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Chilibot</td>
<td align="left" rowspan="1" colspan="1">shallow parsing</td>
<td align="left" rowspan="1" colspan="1">not specified in the paper; could be xml, html, text, or asn.1</td>
<td align="left" rowspan="1" colspan="1">nomenclature dictionary</td>
<td align="left" rowspan="1" colspan="1">biomedicine</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B75">22</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Dragon Toolkit</td>
<td align="left" rowspan="1" colspan="1">mixed syntactic semantic</td>
<td align="left" rowspan="1" colspan="1">text</td>
<td align="left" rowspan="1" colspan="1">domain ontologies</td>
<td align="left" rowspan="1" colspan="1">genomics</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B79">23</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">EBIMed</td>
<td align="left" rowspan="1" colspan="1">pattern matching</td>
<td align="left" rowspan="1" colspan="1">xml</td>
<td align="left" rowspan="1" colspan="1">dictionary of terms</td>
<td align="left" rowspan="1" colspan="1">biomedicine</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B76">24</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">iProLINK</td>
<td align="left" rowspan="1" colspan="1">shallow parsing</td>
<td align="left" rowspan="1" colspan="1">text</td>
<td align="left" rowspan="1" colspan="1">protein name dictionary, ontology, and annotated corpora</td>
<td align="left" rowspan="1" colspan="1">proteins</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B77">25</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LitMiner</td>
<td align="left" rowspan="1" colspan="1">mixed syntactic semantic</td>
<td align="left" rowspan="1" colspan="1">web documents</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Drosophila research</td>
<td align="left" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B78">26</xref>
]</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="tab4" position="float">
<label>Table 4</label>
<caption>
<p>Performance metrics for the name recognition and morphological character extraction algorithms reviewed. Recall and precision values may not be directly comparable between the different algorithms. NA: not available [
<xref ref-type="bibr" rid="B80">30</xref>
]. </p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Tool</th>
<th align="center" rowspan="1" colspan="1">Recall</th>
<th align="center" rowspan="1" colspan="1">Precision</th>
<th align="center" rowspan="1" colspan="1">Test Corpora</th>
<th align="center" rowspan="1" colspan="1">Reference</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">TaxonGrab</td>
<td align="center" rowspan="1" colspan="1">>94%</td>
<td align="center" rowspan="1" colspan="1">>96%</td>
<td align="center" rowspan="1" colspan="1">Vol. 1 Birds of the Belgian Congo by Chapin</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B44">31</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">FAT</td>
<td align="center" rowspan="1" colspan="1">40.2%</td>
<td align="center" rowspan="1" colspan="1">84.0%</td>
<td align="center" rowspan="1" colspan="1">American Seashells by Abbott</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B27">32</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Taxon Finder</td>
<td align="center" rowspan="1" colspan="1">54.3%</td>
<td align="center" rowspan="1" colspan="1">97.5%</td>
<td align="center" rowspan="1" colspan="1">American Seashells by Abbott</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B27">32</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Neti Neti</td>
<td align="center" rowspan="1" colspan="1">70.5%</td>
<td align="center" rowspan="1" colspan="1">98.9%</td>
<td align="center" rowspan="1" colspan="1">American Seashells by Abbott</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B27">32</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LINNAEUS</td>
<td align="center" rowspan="1" colspan="1">94.3%</td>
<td align="center" rowspan="1" colspan="1">97.1%</td>
<td align="center" rowspan="1" colspan="1">LINNAEUS gold standard data set</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B47">33</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Organism Tagger</td>
<td align="center" rowspan="1" colspan="1">94.0%</td>
<td align="center" rowspan="1" colspan="1">95.0%</td>
<td align="center" rowspan="1" colspan="1">LINNAEUS gold standard data set</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B49">34</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">X-tract</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">Flora of North America</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B61">35</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Worldwide Botanical Knowledge Base</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">Flora of China</td>
<td align="center" rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://wwbota.free.fr/">http://wwbota.free.fr/</ext-link>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Terminator</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">NA</td>
<td align="center" rowspan="1" colspan="1">16 nematode descriptions</td>
<td align="center" rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.math.ucdavis.edu/~milton/genisys/terminator.html">http://www.math.ucdavis.edu/~milton/genisys/terminator.html</ext-link>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">MultiFlora</td>
<td align="center" rowspan="1" colspan="1">mid 60%</td>
<td align="center" rowspan="1" colspan="1">mid 70%</td>
<td align="center" rowspan="1" colspan="1">Descriptions of Ranunculus spp. from six Floras</td>
<td align="center" rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://intranet.cs.man.ac.uk/ai/public/MultiFlora/MF1.html">http://intranet.cs.man.ac.uk/ai/public/MultiFlora/MF1.html</ext-link>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">MARTT</td>
<td align="center" rowspan="1" colspan="1">98.0%</td>
<td align="center" rowspan="1" colspan="1">58.0%</td>
<td align="center" rowspan="1" colspan="1">Flora of North America and Flora of China</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B80">30</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">WHISK</td>
<td align="center" rowspan="1" colspan="1">33.33% to 79.65%</td>
<td align="center" rowspan="1" colspan="1">72.52% to 100%</td>
<td align="center" rowspan="1" colspan="1">Flora of North America</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B13">13</xref>
]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CharaParser</td>
<td align="center" rowspan="1" colspan="1">90.0%</td>
<td align="center" rowspan="1" colspan="1">91.0%</td>
<td align="center" rowspan="1" colspan="1">Flora of North America</td>
<td align="center" rowspan="1" colspan="1">[
<xref ref-type="bibr" rid="B68">36</xref>
]</td>
</tr>
</tbody>
</table>
</table-wrap>
</floats-group>
</pmc>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Arizona</li>
<li>Massachusetts</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
</region>
<name sortKey="Cui, Hong" sort="Cui, Hong" uniqKey="Cui H" first="Hong" last="Cui">Hong Cui</name>
<name sortKey="Mozzherin, Dmitry" sort="Mozzherin, Dmitry" uniqKey="Mozzherin D" first="Dmitry" last="Mozzherin">Dmitry Mozzherin</name>
</country>
</tree>
</affiliations>
</record>
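
Table 4 in the record above reports recall and precision for each tool. As a reminder of what those two figures measure, here is a minimal Python sketch. The function name `precision_recall` and the counts in the usage lines are hypothetical illustrations, not values taken from any of the cited evaluations.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw counts.

    precision = correct extractions / all extractions made by the tool
    recall    = correct extractions / all items present in the gold standard
    """
    extracted = true_positives + false_positives
    expected = true_positives + false_negatives
    # Guard against division by zero when nothing was extracted or expected.
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts: a tool marks 80 strings as taxonomic names,
    # 72 of them correctly, in a text whose gold standard holds 100 names.
    precision, recall = precision_recall(72, 8, 28)
    print(f"precision={precision:.2%} recall={recall:.2%}")
```

Because each evaluation in Table 4 uses its own test corpus and gold standard, the counts behind these ratios differ from tool to tool, which is one reason the caption warns that the values may not be directly comparable.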

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000323 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000323 | SxmlIndent | more
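
Once the record has been printed as XML, its tables can be read with a short script. The following is a minimal sketch, not part of Dilib: the function name `table_rows` is hypothetical, and it assumes the record is well-formed XML (with any namespace prefix such as `xlink` declared) and that tables follow the `table-wrap` / `tr` / `td` layout seen in this record.

```python
import xml.etree.ElementTree as ET


def table_rows(xml_text, table_id):
    """Return the body rows of the <table-wrap> with the given id.

    Each row is a list of cell strings. Text nested inside child elements
    (for example <xref> citations) is included, with whitespace collapsed.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for wrap in root.iter("table-wrap"):
        if wrap.get("id") != table_id:
            continue
        for tr in wrap.iter("tr"):
            cells = [" ".join("".join(td.itertext()).split())
                     for td in tr.findall("td")]
            if cells:  # header rows contain only <th>, so they are skipped
                rows.append(cells)
    return rows


if __name__ == "__main__":
    # Small inline sample shaped like Table 2 of this record.
    sample = """<record><table-wrap id="tab2"><table>
      <thead><tr><th>Task</th><th>Description</th></tr></thead>
      <tbody>
        <tr><td>Named entity</td><td>Extracts names of entities</td></tr>
        <tr><td>Coreference</td><td>Links references to the same entity</td></tr>
      </tbody></table></table-wrap></record>"""
    for row in table_rows(sample, "tab2"):
        print(" | ".join(row))
```

How the XML text reaches `table_rows` (a saved file, or the captured output of the command above) is left open here, since it depends on the local Dilib setup.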

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:3364545
   |texte=   Applications of Natural Language Processing in Biodiversity Science
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:22685456" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024