Serveur d'exploration sur l'OCR

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information it contains has therefore not been validated.

DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

Internal identifier: 000010 (Pmc/Curation); previous: 000009; next: 000011

DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

Authors: Xu-Cheng Yin [People's Republic of China]; Chun Yang [People's Republic of China]; Wei-Yi Pei [People's Republic of China]; Haixia Man [People's Republic of China]; Jun Zhang [People's Republic of China]; Erik Learned-Miller [United States]; Hong Yu [United States]

Source :

RBID : PMC:4423993

Abstract

Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.


Url:
DOI: 10.1371/journal.pone.0126200
PubMed: 25951377
PubMed Central: 4423993

Links to previous steps (curation, corpus...)


Links to Exploration step

PMC:4423993

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures</title>
<author>
<name sortKey="Yin, Xu Cheng" sort="Yin, Xu Cheng" uniqKey="Yin X" first="Xu-Cheng" last="Yin">Xu-Cheng Yin</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Yang, Chun" sort="Yang, Chun" uniqKey="Yang C" first="Chun" last="Yang">Chun Yang</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pei, Wei Yi" sort="Pei, Wei Yi" uniqKey="Pei W" first="Wei-Yi" last="Pei">Wei-Yi Pei</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Man, Haixia" sort="Man, Haixia" uniqKey="Man H" first="Haixia" last="Man">Haixia Man</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>School of Foreign Studies, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Foreign Studies, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Jun" sort="Zhang, Jun" uniqKey="Zhang J" first="Jun" last="Zhang">Jun Zhang</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Learned Miller, Erik" sort="Learned Miller, Erik" uniqKey="Learned Miller E" first="Erik" last="Learned-Miller">Erik Learned-Miller</name>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>School of Computer Science, University of Massachusetts Amherst, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Computer Science, University of Massachusetts Amherst, MA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Yu, Hong" sort="Yu, Hong" uniqKey="Yu H" first="Hong" last="Yu">Hong Yu</name>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>School of Computer Science, University of Massachusetts Amherst, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Computer Science, University of Massachusetts Amherst, MA</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff004">
<addr-line>Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25951377</idno>
<idno type="pmc">4423993</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4423993</idno>
<idno type="RBID">PMC:4423993</idno>
<idno type="doi">10.1371/journal.pone.0126200</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000010</idno>
<idno type="wicri:Area/Pmc/Curation">000010</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures</title>
<author>
<name sortKey="Yin, Xu Cheng" sort="Yin, Xu Cheng" uniqKey="Yin X" first="Xu-Cheng" last="Yin">Xu-Cheng Yin</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Yang, Chun" sort="Yang, Chun" uniqKey="Yang C" first="Chun" last="Yang">Chun Yang</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pei, Wei Yi" sort="Pei, Wei Yi" uniqKey="Pei W" first="Wei-Yi" last="Pei">Wei-Yi Pei</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Man, Haixia" sort="Man, Haixia" uniqKey="Man H" first="Haixia" last="Man">Haixia Man</name>
<affiliation wicri:level="1">
<nlm:aff id="aff002">
<addr-line>School of Foreign Studies, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Foreign Studies, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Jun" sort="Zhang, Jun" uniqKey="Zhang J" first="Jun" last="Zhang">Jun Zhang</name>
<affiliation wicri:level="1">
<nlm:aff id="aff001">
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Learned Miller, Erik" sort="Learned Miller, Erik" uniqKey="Learned Miller E" first="Erik" last="Learned-Miller">Erik Learned-Miller</name>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>School of Computer Science, University of Massachusetts Amherst, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Computer Science, University of Massachusetts Amherst, MA</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Yu, Hong" sort="Yu, Hong" uniqKey="Yu H" first="Hong" last="Yu">Hong Yu</name>
<affiliation wicri:level="1">
<nlm:aff id="aff003">
<addr-line>School of Computer Science, University of Massachusetts Amherst, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of Computer Science, University of Massachusetts Amherst, MA</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff004">
<addr-line>Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA, USA</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes
<bold>D</bold>
e
<bold>TEXT</bold>
: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the
<bold>D</bold>
e
<bold>TEXT</bold>
data and make available evaluation protocols for
<bold>D</bold>
e
<bold>TEXT</bold>
. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area.
<bold>D</bold>
e
<bold>TEXT</bold>
is publicly available for downloading at
<ext-link ext-link-type="uri" xlink:href="http://prir.ustb.edu.cn/DeTEXT/">http://prir.ustb.edu.cn/DeTEXT/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Shatkay, H" uniqKey="Shatkay H">H Shatkay</name>
</author>
<author>
<name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
<author>
<name sortKey="Blostein, D" uniqKey="Blostein D">D Blostein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Lee, M" uniqKey="Lee M">M Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hearst, Ma" uniqKey="Hearst M">MA Hearst</name>
</author>
<author>
<name sortKey="Divoli, A" uniqKey="Divoli A">A Divoli</name>
</author>
<author>
<name sortKey="Guturu, H" uniqKey="Guturu H">H Guturu</name>
</author>
<author>
<name sortKey="Ksikes, A" uniqKey="Ksikes A">A Ksikes</name>
</author>
<author>
<name sortKey="Nakov, P" uniqKey="Nakov P">P Nakov</name>
</author>
<author>
<name sortKey="Wooldridge, Ma" uniqKey="Wooldridge M">MA Wooldridge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Qian, Y" uniqKey="Qian Y">Y Qian</name>
</author>
<author>
<name sortKey="Murphy, R" uniqKey="Murphy R">R Murphy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, S" uniqKey="Xu S">S Xu</name>
</author>
<author>
<name sortKey="Mccusker, J" uniqKey="Mccusker J">J McCusker</name>
</author>
<author>
<name sortKey="Krauthammer, M" uniqKey="Krauthammer M">M Krauthammer</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahmed, A" uniqKey="Ahmed A">A Ahmed</name>
</author>
<author>
<name sortKey="Arnold, A" uniqKey="Arnold A">A Arnold</name>
</author>
<author>
<name sortKey="Coelho, L" uniqKey="Coelho L">L Coelho</name>
</author>
<author>
<name sortKey="Kangas, J" uniqKey="Kangas J">J Kangas</name>
</author>
<author>
<name sortKey="Sheikh, As" uniqKey="Sheikh A">AS Sheikh</name>
</author>
<author>
<name sortKey="Xing, E" uniqKey="Xing E">E Xing</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F Liu</name>
</author>
<author>
<name sortKey="Ramesh, B" uniqKey="Ramesh B">B Ramesh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bockhorst, J" uniqKey="Bockhorst J">J Bockhorst</name>
</author>
<author>
<name sortKey="Conroy, J" uniqKey="Conroy J">J Conroy</name>
</author>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
<author>
<name sortKey="O Eary, D" uniqKey="O Eary D">D O’Leary</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lopez, L" uniqKey="Lopez L">L Lopez</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Arighi, C" uniqKey="Arighi C">C Arighi</name>
</author>
<author>
<name sortKey="Tudor, C" uniqKey="Tudor C">C Tudor</name>
</author>
<author>
<name sortKey="Torri, M" uniqKey="Torri M">M Torri</name>
</author>
<author>
<name sortKey="Huang, H" uniqKey="Huang H">H Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F Liu</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hua, X" uniqKey="Hua X">X Hua</name>
</author>
<author>
<name sortKey="Liu, W" uniqKey="Liu W">W Liu</name>
</author>
<author>
<name sortKey="Zhang, H" uniqKey="Zhang H">H Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yi, C" uniqKey="Yi C">C Yi</name>
</author>
<author>
<name sortKey="Tian, Y" uniqKey="Tian Y">Y Tian</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, K" uniqKey="Kim K">K Kim</name>
</author>
<author>
<name sortKey="Jung, K" uniqKey="Jung K">K Jung</name>
</author>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cao, Yg" uniqKey="Cao Y">YG Cao</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F Liu</name>
</author>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F Liu</name>
</author>
<author>
<name sortKey="Antiean, L" uniqKey="Antiean L">L Antiean</name>
</author>
<author>
<name sortKey="Cao, Y" uniqKey="Cao Y">Y Cao</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yi, C" uniqKey="Yi C">C Yi</name>
</author>
<author>
<name sortKey="Tian, Y" uniqKey="Tian Y">Y Tian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pan, Xf" uniqKey="Pan X">XF Pan</name>
</author>
<author>
<name sortKey="Hou, X" uniqKey="Hou X">X Hou</name>
</author>
<author>
<name sortKey="Liu, Cl" uniqKey="Liu C">CL Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, C" uniqKey="Shi C">C Shi</name>
</author>
<author>
<name sortKey="Wang, C" uniqKey="Wang C">C Wang</name>
</author>
<author>
<name sortKey="Xiao, B" uniqKey="Xiao B">B Xiao</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Gao, S" uniqKey="Gao S">S Gao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koo, H" uniqKey="Koo H">H Koo</name>
</author>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yin, Xc" uniqKey="Yin X">XC Yin</name>
</author>
<author>
<name sortKey="Yin, X" uniqKey="Yin X">X Yin</name>
</author>
<author>
<name sortKey="Huang, K" uniqKey="Huang K">K Huang</name>
</author>
<author>
<name sortKey="Hao, Hw" uniqKey="Hao H">HW Hao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weinman, J" uniqKey="Weinman J">J Weinman</name>
</author>
<author>
<name sortKey="Learned Miller, E" uniqKey="Learned Miller E">E Learned-Miller</name>
</author>
<author>
<name sortKey="Hanson, A" uniqKey="Hanson A">A Hanson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yin, Xc" uniqKey="Yin X">XC Yin</name>
</author>
<author>
<name sortKey="Yang, C" uniqKey="Yang C">C Yang</name>
</author>
<author>
<name sortKey="Pei, Wy" uniqKey="Pei W">WY Pei</name>
</author>
<author>
<name sortKey="Hao, Hw" uniqKey="Hao H">HW Hao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wolf, C" uniqKey="Wolf C">C Wolf</name>
</author>
<author>
<name sortKey="Jolion, J" uniqKey="Jolion J">J Jolion</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author>
<name sortKey="Ramesh, Bp" uniqKey="Ramesh B">BP Ramesh</name>
</author>
<author>
<name sortKey="Yu, H" uniqKey="Yu H">H Yu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bada, M" uniqKey="Bada M">M Bada</name>
</author>
<author>
<name sortKey="Eckert, M" uniqKey="Eckert M">M Eckert</name>
</author>
<author>
<name sortKey="Evans, D" uniqKey="Evans D">D Evans</name>
</author>
<author>
<name sortKey="Garcia, K" uniqKey="Garcia K">K Garcia</name>
</author>
<author>
<name sortKey="Shipley, K" uniqKey="Shipley K">K Shipley</name>
</author>
<author>
<name sortKey="Sitnikov, D" uniqKey="Sitnikov D">D Sitnikov</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25951377</article-id>
<article-id pub-id-type="pmc">4423993</article-id>
<article-id pub-id-type="publisher-id">PONE-D-14-57296</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0126200</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures</article-title>
<alt-title alt-title-type="running-head">DeTEXT</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Yin</surname>
<given-names>Xu-Cheng</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yang</surname>
<given-names>Chun</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pei</surname>
<given-names>Wei-Yi</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Man</surname>
<given-names>Haixia</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Jun</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Learned-Miller</surname>
<given-names>Erik</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Hong</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>School of Foreign Studies, University of Science and Technology Beijing, Beijing, China</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>School of Computer Science, University of Massachusetts Amherst, MA, USA</addr-line>
</aff>
<aff id="aff004">
<label>4</label>
<addr-line>Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA, USA</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Ranganathan</surname>
<given-names>Shoba</given-names>
</name>
<role>Academic Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Macquarie University, AUSTRALIA</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: XCY ELM HY. Performed the experiments: CY WYP JZ. Analyzed the data: XCY WYP JZ. Contributed reagents/materials/analysis tools: WYP JZ. Wrote the paper: XCP CY HM HY ELM.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>xuchengyin@ustb.edu.cn</email>
(XCY);
<email>hong.yu@umassmed.edu</email>
(HY)</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>7</day>
<month>5</month>
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>5</issue>
<elocation-id>e0126200</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>12</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>3</month>
<year>2015</year>
</date>
</history>
<permissions>
<license xlink:href="https://creativecommons.org/publicdomain/zero/1.0/">
<license-p>This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the
<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/publicdomain/zero/1.0/">Creative Commons CC0</ext-link>
public domain dedication</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0126200.pdf"></self-uri>
<abstract>
<p>Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes
<bold>D</bold>
e
<bold>TEXT</bold>
: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the
<bold>D</bold>
e
<bold>TEXT</bold>
data and make available evaluation protocols for
<bold>D</bold>
e
<bold>TEXT</bold>
. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area.
<bold>D</bold>
e
<bold>TEXT</bold>
is publicly available for downloading at
<ext-link ext-link-type="uri" xlink:href="http://prir.ustb.edu.cn/DeTEXT/">http://prir.ustb.edu.cn/DeTEXT/</ext-link>
.</p>
</abstract>
<funding-group>
<funding-statement>Xu-Cheng Yin's work was partially supported by the National Natural Science Foundation of China (61105018, 61473036). The research reported in this publication was supported in part by the National Institutes of Health, the National Institute of General Medical Sciences, under award number 5R01GM095476, and the National Center for Advancing Translational Sciences under award number UL1TR000161. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="6"></fig-count>
<table-count count="8"></table-count>
<page-count count="19"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All relevant data are within the paper.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All relevant data are within the paper.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>Figures are ubiquitous in biomedical literature, and they represent important biomedical knowledge.
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1</xref>
shows some representative biomedical figures and their embedded text. The sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Consequently, during the last few years, figure classification, retrieval and mining have garnered significant attention in the biomedical research communities [
<xref rid="pone.0126200.ref001" ref-type="bibr">1</xref>
<xref rid="pone.0126200.ref012" ref-type="bibr">12</xref>
]. Since text frequently appears in figures, automatically extracting such figure text may assist the task of mining information from figures. Little research, however, has specifically explored automated text extraction from biomedical figures.</p>
<fig id="pone.0126200.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Representative biomedical figures and their texts.</title>
<p>(a) experimental results (gene sequence), (b) research models, and (c) biomedical objects.</p>
</caption>
<graphic xlink:href="pone.0126200.g001"></graphic>
</fig>
<p>The structured literature image finder (SLIF) system applies an existing optical character recognition (OCR) system to recognize figure text and identify potential image pointers. SLIF then parses text and figures in biomedical literature by matching image pointers in images and captions [
<xref rid="pone.0126200.ref007" ref-type="bibr">7</xref>
]. Other researchers have also applied existing OCR tools to extract figure text and then incorporate the figure text for applications, e.g., image and document retrieval [
<xref rid="pone.0126200.ref005" ref-type="bibr">5</xref>
,
<xref rid="pone.0126200.ref011" ref-type="bibr">11</xref>
]. Kim and Yu developed algorithms to improve the performance of an existing off-the-shelf OCR tool for specifically recognizing biomedical figure text [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
].</p>
<p>Benchmark datasets have proved an invaluable resource in developing automated systems for text detection and reading. Many publicly available image datasets have had major impacts in text detection and recognition from scene images, e.g., MSRA-I [
<xref rid="pone.0126200.ref013" ref-type="bibr">13</xref>
], KIST [
<xref rid="pone.0126200.ref014" ref-type="bibr">14</xref>
], SVT [
<xref rid="pone.0126200.ref015" ref-type="bibr">15</xref>
], NEOCR [
<xref rid="pone.0126200.ref016" ref-type="bibr">16</xref>
], OSTD [
<xref rid="pone.0126200.ref017" ref-type="bibr">17</xref>
], IIIT5K Word [
<xref rid="pone.0126200.ref018" ref-type="bibr">18</xref>
], MSRA-II [
<xref rid="pone.0126200.ref019" ref-type="bibr">19</xref>
], and USTB-SV1K [
<xref rid="pone.0126200.ref020" ref-type="bibr">20</xref>
]. Using the annotated datasets as the ground truth, the International Conference on Document Analysis and Recognition (ICDAR) has held several international technical competitions on text extraction from scene images and born-digital figures by releasing a series of public benchmark datasets, i.e., ICDAR Robust Reading Competitions 2003 [
<xref rid="pone.0126200.ref021" ref-type="bibr">21</xref>
], 2005 [
<xref rid="pone.0126200.ref022" ref-type="bibr">22</xref>
], 2011 [
<xref rid="pone.0126200.ref023" ref-type="bibr">23</xref>
,
<xref rid="pone.0126200.ref024" ref-type="bibr">24</xref>
], and 2013 [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
]. Similarly, efforts to build benchmark datasets and create common ground for evaluation, including the GENIA corpus [
<xref rid="pone.0126200.ref026" ref-type="bibr">26</xref>
], the TREC Genomics [
<xref rid="pone.0126200.ref027" ref-type="bibr">27</xref>
], the BioCreative challenges [
<xref rid="pone.0126200.ref028" ref-type="bibr">28</xref>
], and the i2b2 challenges [
<xref rid="pone.0126200.ref029" ref-type="bibr">29</xref>
], have been significant in biomedical natural language processing research.</p>
<p>Many technologies and systems for text detection and recognition have been widely investigated and developed in the open domain for common complex images, e.g., scene images and born-digital pictures [
<xref rid="pone.0126200.ref030" ref-type="bibr">30</xref>
]. Specifically, text detection and recognition in natural scene images is a recent hot topic in the fields of Document Analysis and Recognition, Computer Vision, and Machine Learning. First, various scene text detection methods, including sliding window based methods [
<xref rid="pone.0126200.ref026" ref-type="bibr">26</xref>
,
<xref rid="pone.0126200.ref031" ref-type="bibr">31</xref>
], connected component based methods [
<xref rid="pone.0126200.ref017" ref-type="bibr">17</xref>
,
<xref rid="pone.0126200.ref032" ref-type="bibr">32</xref>
,
<xref rid="pone.0126200.ref033" ref-type="bibr">33</xref>
] and hybrid methods [
<xref rid="pone.0126200.ref034" ref-type="bibr">34</xref>
], have been proposed and applied in the literature. Recently, Maximally Stable Extremal Regions (MSERs) or Extremal Regions (ERs) based methods have been the focus of many methods [
<xref rid="pone.0126200.ref035" ref-type="bibr">35</xref>
<xref rid="pone.0126200.ref038" ref-type="bibr">38</xref>
]. Moreover, Yin’s [
<xref rid="pone.0126200.ref038" ref-type="bibr">38</xref>
] and Kim’s [
<xref rid="pone.0126200.ref037" ref-type="bibr">37</xref>
] MSER based methods won first place in both the “Text Localization in Real Scenes” competition at ICDAR 2013 [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
] and the ICDAR 2011 [
<xref rid="pone.0126200.ref024" ref-type="bibr">24</xref>
] Robust Reading Competition.</p>
<p>There are also significant research efforts on scene word recognition, e.g., recognition frameworks by exploiting bottom-up and top-down cues [
<xref rid="pone.0126200.ref018" ref-type="bibr">18</xref>
], recognition methods with language models [
<xref rid="pone.0126200.ref039" ref-type="bibr">39</xref>
,
<xref rid="pone.0126200.ref040" ref-type="bibr">40</xref>
], and recognition approaches with probabilistic graphical models [
<xref rid="pone.0126200.ref041" ref-type="bibr">41</xref>
]. Specifically, “PhotoOCR”, which won first place in “Word Recognition in Real Scenes” at ICDAR 2013 [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
], is built on character classification with deep neural networks and language modeling with massive training data [
<xref rid="pone.0126200.ref042" ref-type="bibr">42</xref>
]. Finally, there are also some works on end-to-end scene text recognition, e.g., word spotting based systems [
<xref rid="pone.0126200.ref043" ref-type="bibr">43</xref>
], efficient character detection and recognition based systems [
<xref rid="pone.0126200.ref035" ref-type="bibr">35</xref>
,
<xref rid="pone.0126200.ref044" ref-type="bibr">44</xref>
], and hybrid recognition systems [
<xref rid="pone.0126200.ref045" ref-type="bibr">45</xref>
].</p>
<p>Unlike images in the open domain, biomedical figures are highly complex and therefore present unique challenges [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
]. For example, as shown in Figs
<xref ref-type="fig" rid="pone.0126200.g001">1</xref>
and
<xref ref-type="fig" rid="pone.0126200.g002">2</xref>
, biomedical figures typically have complex layouts, small font sizes, short text, domain-specific text (e.g., gene sequences), and complex symbols, so their overall complexity is high. As shown in
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
, figure text not only appears in a complex layout but also includes color text and irregular text arrangement. Consequently, conventional OCR technologies and systems, which are typically trained on simpler open-domain document images, cannot handle these challenges unique to biomedical figures. Moreover, without a high-quality benchmark dataset, it would be difficult to develop and compare different techniques for extracting figure text.</p>
<fig id="pone.0126200.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g002</object-id>
<label>Fig 2</label>
<caption>
<title>An example biomedical figure with a complex layout, color text, and irregular text arrangement.</title>
</caption>
<graphic xlink:href="pone.0126200.g002"></graphic>
</fig>
<p>In FigTExT [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
], Kim and Yu constructed a gold standard (dataset) for developing and testing figure text detection and recognition. This dataset comprises 382 biomedical figures from 70 full-text articles randomly selected from PubMed Central. However, the dataset has significant limitations. First, it is not publicly available. Second, the authors annotated only the ground truth text in figures, without the corresponding locations or other related information in the image. Therefore, it cannot be used as a benchmark to evaluate the performance of text detection and recognition technologies as is done in the Document Analysis and Recognition (DAR) literature, e.g., the series of ICDAR Robust Reading Competitions.</p>
<p>As a result, following the general strategies in DAR, in this paper we report the development of
<bold>D
<sc>e</sc>
TEXT</bold>
: A database for evaluating text extraction from biomedical literature figures. Due to the complexity of biomedical figures,
<bold>D
<sc>e</sc>
TEXT</bold>
can be used as a common ground to evaluate text detection and recognition algorithms for complex images.</p>
<p>The contributions of this work are as follows.
<bold>D
<sc>e</sc>
TEXT</bold>
is the first figure-text annotation of biomedical literature. Given the importance of biomedical literature and of the experimental evidence captured in its figures, the potential impact of
<bold>D
<sc>e</sc>
TEXT</bold>
is huge.
<bold>D
<sc>e</sc>
TEXT</bold>
is large and representative. It comprises close to ten thousand annotated text regions from hundreds of full-text biomedical articles. The annotation is rich and comprehensive. Our annotation guideline extends the existing guideline used in the open domain (e.g., the ICDAR Robust Reading Competition [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
]). In our annotation, figures are annotated with not only each text region’s orientation, location, and ground truth text, but also its image quality. Finally,
<bold>D
<sc>e</sc>
TEXT</bold>
(
<ext-link ext-link-type="uri" xlink:href="http://prir.ustb.edu.cn/DeTEXT/">http://prir.ustb.edu.cn/DeTEXT/</ext-link>
) is open access, and we will make the fully annotated data available to the public.</p>
<p>Moreover, compared to the datasets in the literature,
<bold>D
<sc>e</sc>
TEXT</bold>
exhibits various new types of text-region features, typically including blurred text, small characters, color text, and complex backgrounds and layouts. There are also specific challenges arising from the textual complexity of biomedical figures, in which large numbers of short words, domain terms, uppercase strings, and irregularly arranged text are embedded.</p>
<p>In summary,
<bold>D
<sc>e</sc>
TEXT</bold>
is the first public image dataset for text detection, recognition, and retrieval in biomedical literature figures, and it can be used as a benchmark dataset for fair comparison and technique improvement. Large-scale image-text annotation efforts, including TREC (trec.nist.gov) and CLEF (
<ext-link ext-link-type="uri" xlink:href="http://www.clef-initiative.eu">www.clef-initiative.eu</ext-link>
), have had a significant impact on the research community. In addition to the benchmark dataset itself, we will also make freely available our
<bold>D
<sc>e</sc>
TEXT</bold>
annotation tool, another contribution to the research community.</p>
</sec>
<sec sec-type="materials|methods" id="sec002">
<title>Methods</title>
<p>In the following, we first describe how we selected figures. We then introduce the annotation guideline and the annotation tool and describe our annotation process. Finally, we present several dataset division strategies and the evaluation protocols.</p>
<sec id="sec003">
<title>A Collection of Representative Open-Access Biomedical Figures</title>
<p>In order to have an impact on research,
<bold>D
<sc>e</sc>
TEXT</bold>
must be publicly available and free of licensing issues. We therefore selected open-access full-text articles and their figures from PubMed Central (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed">http://www.ncbi.nlm.nih.gov/pubmed</ext-link>
). In order for
<bold>D
<sc>e</sc>
TEXT</bold>
to be representative, we maximized both the number of annotated figures and the number of full-text articles contributing figures to
<bold>D
<sc>e</sc>
TEXT</bold>
. To do so, we first randomly selected 100 articles and took one randomly chosen figure from each. We then randomly selected a further article and added all of its figures to
<bold>D
<sc>e</sc>
TEXT</bold>
. We repeated this process until we reached 500, the total number of figures in
<bold>D
<sc>e</sc>
TEXT</bold>
. In this way, an additional 188 articles were included, bringing the total to 288 articles.</p>
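The selection procedure described above can be summarized in a short script. The following Python sketch is only an illustration of that protocol; the articles mapping, the random seed, and the helper structure are assumptions for this sketch, not part of DeTEXT.

import random

def sample_figures(articles, target=500, seed=0):
    # articles: assumed mapping from article id to its list of figure ids.
    rng = random.Random(seed)
    ids = list(articles)
    rng.shuffle(ids)
    # Step 1: pick 100 random articles and one randomly chosen figure from each.
    first_100, remaining = ids[:100], ids[100:]
    figures = [rng.choice(articles[a]) for a in first_100]
    # Step 2: repeatedly pick a further article and add all of its figures,
    # until the target number of figures (500 in DeTEXT) is reached.
    extra_articles = 0
    for a in remaining:
        if len(figures) >= target:
            break
        figures.extend(articles[a])
        extra_articles += 1
    return figures[:target], extra_articles  # the paper reports 188 such extra articles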
</sec>
<sec id="sec004">
<title>Annotation Guideline</title>
<p>We initially followed the existing guideline for image text annotation (for detection and recognition) in the open domain (e.g., the ICDAR Robust Reading Competition [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
]). However, we found this guideline limited: it only requires annotating image text with location and ground truth text information. Figures published in the biomedical domain are complex, and studies have shown that many of them are of poor quality [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
]. Moreover, some text (e.g., the mention of gene or protein names) is more semantically rich than others (e.g., panel markers) [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
]. We therefore annotate not only each text region’s location, orientation, and ground truth text, but also its image quality.</p>
<p>Following the annotation guideline [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
], we annotate each text region’s location and orientation with four vertices, i.e., the left-top (LT), top-right (TR), right-bottom (RB), and bottom-left (BL) points of the text region. Some text regions can have different orientations (one example is illustrated in
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
). We also annotate an orientation attribute for every text region: the “horizontal/oriented” flag indicates whether the text region is aligned horizontally (0) or is oriented (or vertical) (1).</p>
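To make the annotation scheme concrete, a single annotated text region could be represented as in the Python sketch below; the class and field names are illustrative only and do not reflect the released file format.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class TextRegion:
    # Four vertices of the region as (x, y) points: left-top (LT),
    # top-right (TR), right-bottom (RB), and bottom-left (BL).
    lt: Tuple[int, int]
    tr: Tuple[int, int]
    rb: Tuple[int, int]
    bl: Tuple[int, int]
    orientation: int   # 0 = horizontal, 1 = oriented (or vertical)
    text: str          # ground truth text of the region
    quality: str       # e.g., "normal", "blurry", "small", "color", ...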
<p>We found many text regions are fragmented. An example is illustrated in
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(c)</xref>
with single characters “A”, “B”, “C”, and “D”, which usually label panels of the image and do not carry the semantic meaning of the figure content. We therefore define two additional requirements for text region inclusion. First, an annotated text region must incorporate at least one word, where a “word” is a set of several aligned, closely spaced characters. Most text regions are horizontal; a few have other orientations (including vertical). Second, a word must be at least two characters long to be annotated.</p>
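The two inclusion requirements amount to a simple filter over candidate words; the sketch below is illustrative only, and the function name is hypothetical.

def annotatable_words(words):
    # words: candidate word strings found in one text region, where a "word"
    # is a set of several aligned, closely spaced characters.
    # Requirement 2: only words of length >= 2 are annotated.
    kept = [w for w in words if len(w) >= 2]
    # Requirement 1: the region is annotated only if at least one word remains.
    return kept if kept else None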
<p>We also made changes to how ground truth text is annotated for a text region. In biomedical literature figures, the text is typically complex and often incorporates uncommon symbols. For example, a chemical formula comprises digits, uppercase letters, superscript or subscript characters, and specific symbols. Accurately identifying the location of superscript and subscript characters poses a significant challenge for human annotators. For consistent annotations, we annotate only the ground truth text of superscript or subscript characters and leave out their location information, as illustrated in
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
. Another reason for skipping the location of superscript and subscript characters is that most web-based full-text articles and documents in database or information retrieval systems provide only the text of such characters, without superscript or subscript locations. We annotate the location of all other types of characters in figure text.</p>
<p>We assess image quality (e.g., blurring and noise) from the perspective of how difficult it would be for a human to detect and recognize the text in the annotated region. For every text region, we assign one of the following image quality types: “normal”, “blurry”, “small”, “color”, “short”, “complex_background”, “complex_symbol”, or “specific_text” (see the descriptions of these “difficulty” challenges in Section “
<xref ref-type="sec" rid="sec014">Discussion</xref>
”).</p>
</sec>
<sec id="sec005">
<title>Annotation Tool</title>
<p>We developed an annotation tool for annotating
<bold>D
<sc>e</sc>
TEXT</bold>
and made it freely available from
<ext-link ext-link-type="uri" xlink:href="http://prir.ustb.edu.cn/DeTEXT/">http://prir.ustb.edu.cn/DeTEXT/</ext-link>
. We implemented the tool in C# (Microsoft Visual Studio 2012) for the 32-bit Windows platform.
<xref ref-type="fig" rid="pone.0126200.g003">Fig 3</xref>
shows the front-end interface of the annotation tool. The figure and its annotated text regions are shown on the left. The annotation information (e.g., text and locations) is shown on the right, where “folderpath” opens a directory of figures to be annotated, and “back” and “next” browse to the previous and next figures. Functions for displaying the figure (zoom in and out) are also on the right. “Page 1” shows the annotation information for the entire figure, and “Page 2” displays detailed annotation information for each text region, including the region’s location and orientation, ground truth text, and difficulty (image quality). In “Page 1”, “write_pic” starts the annotation procedure. To annotate a text region, press the right mouse button at the top-left corner of the region and drag to the bottom-right corner. “Page 2” then pops up, and the corresponding text region information can be entered.</p>
<fig id="pone.0126200.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g003</object-id>
<label>Fig 3</label>
<caption>
<title>The annotation tool for D
<sc>e</sc>
TEXT.</title>
<p>The figure and its annotated text regions are shown to the left. The annotated information (e.g., text and locations) is shown to the right. Functions for displaying the figure (zoom in and out), etc, are also shown to the right.</p>
</caption>
<graphic xlink:href="pone.0126200.g003"></graphic>
</fig>
<p>With our annotation tool, each figure in the database corresponds to a ground truth file (a “.txt” file storing the annotation information), in which each line records the information for one annotated text region. The format of the ground truth file (e.g., “ex.txt”) is illustrated in
<xref ref-type="fig" rid="pone.0126200.g004">Fig 4</xref>
.</p>
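A minimal reader for such per-figure ground truth files might look like the sketch below. The field order assumed here (eight vertex coordinates, an orientation flag, a difficulty tag, and the ground truth text) is an assumption for illustration; the actual layout should be checked against Fig 4 and the released files.

def read_ground_truth(path):
    # Assumed line layout: x1 y1 x2 y2 x3 y3 x4 y4 orientation difficulty text...
    regions = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.strip().split()
            if len(parts) < 11:
                continue  # skip empty or malformed lines
            coords = [int(v) for v in parts[:8]]
            vertices = list(zip(coords[0::2], coords[1::2]))  # LT, TR, RB, BL
            orientation = int(parts[8])   # 0 = horizontal, 1 = oriented
            difficulty = parts[9]         # e.g., "normal", "blurry", "small"
            text = " ".join(parts[10:])   # ground truth text (may contain spaces)
            regions.append((vertices, orientation, difficulty, text))
    return regions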
<fig id="pone.0126200.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g004</object-id>
<label>Fig 4</label>
<caption>
<title>An example for the annotation information.</title>
<p>Each figure in the database corresponds to a ground truth file (we use a “.txt” file to store the annotation information), in which each line records the information of the text in the corresponding text region.</p>
</caption>
<graphic xlink:href="pone.0126200.g004"></graphic>
</fig>
</sec>
<sec id="sec006">
<title>Annotation Process</title>
<p>Six annotators, all of whom are computer science graduate students in pattern recognition and image processing, completed the annotation of
<bold>D
<sc>e</sc>
TEXT</bold>
. We performed the annotation in two consecutive iterations. The 500 figures in the database were randomly divided into five 100-figure subsets. In the first iteration, five students each independently annotated one subset. In the second iteration, each student checked one subset annotated by another student and resolved any conflicts. Our initial annotation was an iterative process during which we refined the annotation guideline and updated the annotated data accordingly, so we do not report agreement for that phase. Instead, to measure inter-annotator agreement, we asked an additional annotator, who followed the updated guideline, to independently annotate 10 figures randomly selected from the entire database (500 figures), and we measured inter-annotator agreement on those 10 figures.</p>
</sec>
<sec id="sec007">
<title>Inter-Annotator Agreement Metrics</title>
<p>For inter-annotator agreement on the ground truth text, we simply calculated the overlap between the two annotations. For inter-annotator agreement on text location, we followed a metric commonly used in DAR [
<xref rid="pone.0126200.ref021" ref-type="bibr">21</xref>
]. Specifically, we compute the matching (overlapping) score between two regions, i.e.,
<italic>S</italic>
<sub>1</sub>
and
<italic>S</italic>
<sub>2</sub>
,
<disp-formula id="pone.0126200.e001">
<alternatives>
<graphic xlink:href="pone.0126200.e001.jpg" id="pone.0126200.e001g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M1">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>h</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>×</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>∩</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>S</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
where
<italic>S</italic>
<sub>1</sub>
and
<italic>S</italic>
<sub>2</sub>
are the regions in the original annotation and the re-annotation respectively, and
<italic>Area</italic>
is the area of the (rectangular) region. If the two text regions from the two annotations overlap substantially, i.e.,
<italic>fMatch</italic>
(
<italic>S</italic>
<sub>1</sub>
,
<italic>S</italic>
<sub>2</sub>
) ≥ 85%, we consider the two regions to have the same location (i.e., annotation agreement on the location).</p>
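As a concrete illustration, the matching score above can be computed for axis-aligned rectangles in a few lines; this sketch assumes regions are given as (x_min, y_min, x_max, y_max) tuples, a representation chosen here only for illustration.

def f_match(s1, s2):
    # s1, s2: axis-aligned rectangles as (x_min, y_min, x_max, y_max).
    def area(r):
        return max(0, r[2] - r[0]) * max(0, r[3] - r[1])
    # Intersection rectangle; empty when the regions do not overlap.
    inter = (max(s1[0], s2[0]), max(s1[1], s2[1]),
             min(s1[2], s2[2]), min(s1[3], s2[3]))
    inter_area = area(inter) if inter[0] < inter[2] and inter[1] < inter[3] else 0
    denom = area(s1) + area(s2)
    return 2.0 * inter_area / denom if denom else 0.0

# Two annotated regions are treated as having the same location
# when f_match(s1, s2) >= 0.85.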
</sec>
<sec id="sec008">
<title>D
<sc>e</sc>
TEXT Subsets Division</title>
<p>In the image community, a high quality annotation such as
<bold>D
<sc>e</sc>
TEXT</bold>
can be used as ground truth to evaluate different technologies. In order to present a fair universal evaluation database with
<bold>D
<sc>e</sc>
TEXT</bold>
, we present several dataset division strategies. First, we provide a public release of
<bold>D
<sc>e</sc>
TEXT</bold>
that contains all collected figures. Second, following common practice in the Document Analysis and Recognition field, we also divided the entire
<bold>D
<sc>e</sc>
TEXT</bold>
into three non-overlapping subsets: training, validation, and testing. We also support cross-validation, another popular strategy for using the dataset.</p>
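For reference, a reproducible three-way split of the figure list could be produced as follows; the split proportions and the random seed are illustrative assumptions and do not correspond to an official DeTEXT partition.

import random

def split_dataset(figure_ids, train=0.6, val=0.2, seed=0):
    # Shuffle a copy of the figure identifiers reproducibly.
    ids = list(figure_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    # The remaining figures form the test set.
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])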
</sec>
<sec id="sec009">
<title>Evaluation Protocols of D
<sc>e</sc>
TEXT</title>
<p>There are a variety of evaluation protocols for text detection and recognition in images, most of which are based on overlap-ratio and accuracy protocols. Here, for text detection and recognition in biomedical literature figures, we follow the evaluation strategies used in the series of ICDAR Robust Reading Competitions: 2003 [
<xref rid="pone.0126200.ref021" ref-type="bibr">21</xref>
], 2005 [
<xref rid="pone.0126200.ref022" ref-type="bibr">22</xref>
], 2011 [
<xref rid="pone.0126200.ref023" ref-type="bibr">23</xref>
,
<xref rid="pone.0126200.ref024" ref-type="bibr">24</xref>
], and 2013 [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
]. Specifically, we recommend the text detection and word recognition evaluation protocols used in the ICDAR 2011 Robust Reading Competition (ICDAR2011) and the end-to-end text recognition evaluation protocol used in the ICDAR 2003 Robust Reading Competition for evaluating methods and systems on our
<bold>D
<sc>e</sc>
TEXT</bold>
dataset.</p>
<p>
<italic>Text detection evaluation</italic>
(with ICDAR2011 [
<xref rid="pone.0126200.ref024" ref-type="bibr">24</xref>
] protocol, DetEval [
<xref rid="pone.0126200.ref046" ref-type="bibr">46</xref>
]): This protocol comprises area-overlap and object-level evaluation. DetEval is also provided as a software toolbox, publicly available at
<ext-link ext-link-type="uri" xlink:href="http://liris.cnrs.fr/christian.wolf/software/deteval/index.html">http://liris.cnrs.fr/christian.wolf/software/deteval/index.html</ext-link>
. First, from the two sets
<italic>D</italic>
and
<italic>G</italic>
of detected rectangles (regions) and ground truth rectangles, we can construct two recall and precision matrices
<italic>σ</italic>
and
<italic>τ</italic>
of the area overlap where the rows of the matrices correspond to the ground truth rectangles and the columns correspond to the detected rectangles [
<xref rid="pone.0126200.ref047" ref-type="bibr">47</xref>
]. Here, the values of the
<italic>i</italic>
<sup>
<italic>th</italic>
</sup>
row and
<italic>j</italic>
<sup>
<italic>th</italic>
</sup>
column of these two matrices are
<disp-formula id="pone.0126200.e002">
<alternatives>
<graphic xlink:href="pone.0126200.e002.jpg" id="pone.0126200.e002g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M2">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>R</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
<disp-formula id="pone.0126200.e003">
<alternatives>
<graphic xlink:href="pone.0126200.e003.jpg" id="pone.0126200.e003g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M3">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>τ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>R</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
where
<italic>Area</italic>
is the area of the rectangle region, and the two-argument form denotes the area of the intersection of the two rectangles. Then, two rectangles are considered matched if
<disp-formula id="pone.0126200.e004">
<alternatives>
<graphic xlink:href="pone.0126200.e004.jpg" id="pone.0126200.e004g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M4">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>σ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>></mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>8</mml:mn>
<mml:mo>,</mml:mo>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:msub>
<mml:mi>τ</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>></mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
By supporting one-to-one, one-to-many, and many-to-one matches between ground-truth objects and detections, this evaluation strategy handles over-split and over-merged detections [
<xref rid="pone.0126200.ref046" ref-type="bibr">46</xref>
]. Based on this matching strategy, the recall and precision measures in one image can be defined as
<disp-formula id="pone.0126200.e005">
<alternatives>
<graphic xlink:href="pone.0126200.e005.jpg" id="pone.0126200.e005g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M5">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>G</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
<disp-formula id="pone.0126200.e006">
<alternatives>
<graphic xlink:href="pone.0126200.e006.jpg" id="pone.0126200.e006g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M6">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>D</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
where
<italic>Match</italic>
<sub>
<italic>G</italic>
</sub>
and
<italic>Match</italic>
<sub>
<italic>D</italic>
</sub>
are matching functions that account for the different types of matches. They are defined as
<disp-formula id="pone.0126200.e007">
<alternatives>
<graphic xlink:href="pone.0126200.e007.jpg" id="pone.0126200.e007g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M7">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>G</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>matches</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>a</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>single</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>detected</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangle</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>does</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>not</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>match</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>any</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>detected</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangle</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>matches</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>several</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>(</mml:mtext>
<mml:mi>k</mml:mi>
<mml:mtext>)</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>detected</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangles</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo></mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
<disp-formula id="pone.0126200.e008">
<alternatives>
<graphic xlink:href="pone.0126200.e008.jpg" id="pone.0126200.e008g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M8">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>D</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>matches</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>a</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>single</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>ground truth</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangle</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>does</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>not</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>match</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>any</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>ground truth</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangle</mml:mtext>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd></mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mtext>if</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:msub>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>matches</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>against</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>several</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>(</mml:mtext>
<mml:mi>k</mml:mi>
<mml:mtext>)</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>ground truth</mml:mtext>
<mml:mspace width="4.pt"></mml:mspace>
<mml:mtext>rectangles</mml:mtext>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
<mml:mo></mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
where
<italic>f</italic>
<sub>
<italic>sc</italic>
</sub>
(
<italic>k</italic>
) is set as a constant (0.8). In the case of
<italic>N</italic>
images with
<inline-formula id="pone.0126200.e009">
<mml:math id="M9">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>G</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mi>N</mml:mi>
</mml:msup>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula id="pone.0126200.e010">
<mml:math id="M10">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>D</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>=</mml:mo>
<mml:mo stretchy="false">{</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mn>1</mml:mn>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>.</mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mi>N</mml:mi>
</mml:msup>
<mml:mo stretchy="false">}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
, text region recall and precision are defined as
<disp-formula id="pone.0126200.e011">
<alternatives>
<graphic xlink:href="pone.0126200.e011.jpg" id="pone.0126200.e011g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M11">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mover>
<mml:mi>G</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>,</mml:mo>
<mml:mover>
<mml:mi>D</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>G</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
<disp-formula id="pone.0126200.e012">
<alternatives>
<graphic xlink:href="pone.0126200.e012.jpg" id="pone.0126200.e012g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M12">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mover>
<mml:mi>G</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>,</mml:mo>
<mml:mover>
<mml:mi>D</mml:mi>
<mml:mo>¯</mml:mo>
</mml:mover>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mi>M</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>c</mml:mi>
<mml:msub>
<mml:mi>h</mml:mi>
<mml:mi>D</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mi>D</mml:mi>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mi>G</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>r</mml:mi>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>t</mml:mi>
<mml:mi>p</mml:mi>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mi>k</mml:mi>
</mml:msup>
<mml:mo>|</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
Finally, the f-score, the harmonic mean of precision and recall, is computed as
<disp-formula id="pone.0126200.e013">
<alternatives>
<graphic xlink:href="pone.0126200.e013.jpg" id="pone.0126200.e013g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M13">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<p>Note that for rotated text regions, we first convert the rotated rectangle into an axis-aligned (horizontal) rectangle and then apply this protocol for evaluation.</p>
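<p>As a concrete illustration of the matching procedure above, the following Python sketch computes the area-recall matrix <italic>σ</italic>, the area-precision matrix <italic>τ</italic>, and the resulting recall, precision and f-score. It is a simplified sketch that only handles one-to-one matches; for one-to-many and many-to-one matches (scored with <italic>f</italic><sub><italic>sc</italic></sub>(<italic>k</italic>) = 0.8) and for the official scoring, the DetEval toolbox linked above should be used.</p>
<preformat>
# Simplified sketch of the DetEval-style detection evaluation (one-to-one
# matches only); rectangles are (x_min, y_min, x_max, y_max) tuples.

def rect_area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def intersection_area(g, d):
    w = min(g[2], d[2]) - max(g[0], d[0])
    h = min(g[3], d[3]) - max(g[1], d[1])
    return max(0.0, w) * max(0.0, h)

def evaluate_detection(ground_truth, detections, tr=0.8, tp=0.4):
    matched_gt, matched_det = set(), set()
    for i, g in enumerate(ground_truth):
        for j, d in enumerate(detections):
            inter = intersection_area(g, d)
            sigma = inter / rect_area(g) if rect_area(g) else 0.0  # area recall
            tau = inter / rect_area(d) if rect_area(d) else 0.0    # area precision
            if sigma > tr and tau > tp:
                matched_gt.add(i)
                matched_det.add(j)
    recall = len(matched_gt) / len(ground_truth) if ground_truth else 0.0
    precision = len(matched_det) / len(detections) if detections else 0.0
    if precision == 0.0 or recall == 0.0:
        return recall, precision, 0.0
    f_score = 1.0 / (0.5 / precision + 0.5 / recall)  # harmonic mean
    return recall, precision, f_score
</preformat>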
<p>
<italic>Word recognition evaluation</italic>
(with ICDAR 2011 [
<xref rid="pone.0126200.ref024" ref-type="bibr">24</xref>
] protocol): Word recognition is typically evaluated by
<disp-formula id="pone.0126200.e014">
<alternatives>
<graphic xlink:href="pone.0126200.e014.jpg" id="pone.0126200.e014g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M14">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>/</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
where
<italic>C</italic>
and
<italic>G</italic>
are the set of correctly recognized words and the ground truth word set, respectively.</p>
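<p>This accuracy can be computed in a few lines, for example as in the following sketch. Whether matching is case-sensitive is left to the competition protocol; exact string equality over aligned word lists is assumed here purely for illustration.</p>
<preformat>
# Minimal sketch of word-recognition accuracy: |C| / |G|, where a word is
# counted as correct when the recognized string equals the ground truth.
# Assumes recognized_words and ground_truth_words are aligned lists.
def word_accuracy(recognized_words, ground_truth_words):
    correct = sum(1 for rec, gt in zip(recognized_words, ground_truth_words)
                  if rec == gt)
    return correct / len(ground_truth_words) if ground_truth_words else 0.0
</preformat>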
<p>
<italic>End-to-end text recognition evaluation</italic>
(with ICDAR 2003 [
<xref rid="pone.0126200.ref021" ref-type="bibr">21</xref>
] protocol): This protocol uses the standard measures of precision, recall and f-score to evaluate the performance of an end-to-end system. It rates the quality of the match between a target and an estimated rectangle and defines a strict notion of match between target and estimated words: the rectangles must have a match score greater than 0.5 and the word text must match exactly. The match score between two bounding rectangles of text objects is defined as the ratio between their intersection area and the area of the minimum bounding rectangle containing both. Suppose
<italic>M</italic>
,
<italic>D</italic>
and
<italic>G</italic>
are the set of correctly recognized and location-matched text regions, the set of all detected regions, and the set of ground truth regions, respectively; precision and recall are then defined as
<disp-formula id="pone.0126200.e015">
<alternatives>
<graphic xlink:href="pone.0126200.e015.jpg" id="pone.0126200.e015g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M15">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>/</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>D</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>,</mml:mo>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mspace width="3.33333pt"></mml:mspace>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>M</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>/</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>G</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
and f-score is correspondingly computed as
<disp-formula id="pone.0126200.e016">
<alternatives>
<graphic xlink:href="pone.0126200.e016.jpg" id="pone.0126200.e016g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M16">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>P</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo>/</mml:mo>
<mml:mi>R</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>l</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
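<p>The end-to-end match criterion above can be sketched as follows: the match score is the intersection area divided by the area of the smallest rectangle enclosing both boxes, and a detection counts only when this score exceeds 0.5 and the recognized word matches the ground truth exactly. The greedy one-to-one pairing used below is a simplification of this sketch rather than part of the protocol.</p>
<preformat>
# Sketch of the ICDAR 2003 end-to-end evaluation; rectangles are
# (x_min, y_min, x_max, y_max) and each entry is a (rectangle, word) pair.

def match_score(r1, r2):
    iw = min(r1[2], r2[2]) - max(r1[0], r2[0])
    ih = min(r1[3], r2[3]) - max(r1[1], r2[1])
    inter = max(0.0, iw) * max(0.0, ih)
    # Minimum bounding rectangle containing both rectangles.
    bw = max(r1[2], r2[2]) - min(r1[0], r2[0])
    bh = max(r1[3], r2[3]) - min(r1[1], r2[1])
    return inter / (bw * bh) if bw * bh else 0.0

def end_to_end_scores(detections, ground_truth):
    matched, used = 0, set()
    for d_rect, d_word in detections:
        for k, (g_rect, g_word) in enumerate(ground_truth):
            if k in used:
                continue
            if d_word == g_word and match_score(d_rect, g_rect) > 0.5:
                matched += 1
                used.add(k)
                break
    precision = matched / len(detections) if detections else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 1.0 / (0.5 / precision + 0.5 / recall)
</preformat>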
<p>Similar to the evaluation on the important figure text in [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
], we can also evaluate text detection, word recognition, and end-to-end text recognition on the subset of figure text marked as important according to its role in the full article. Moreover, in D
<sc>e</sc>
TEXT, we can likewise measure the performance of text detection, word recognition, and end-to-end recognition methods on subsets of the figure text grouped by the difficulty categories reflecting the image quality of the figures.</p>
</sec>
</sec>
<sec sec-type="results" id="sec010">
<title>Results</title>
<sec id="sec011">
<title>Inter-Annotator Agreement</title>
<p>
<xref ref-type="table" rid="pone.0126200.t001">Table 1</xref>
shows the annotation agreement results (i.e., the same location by
<italic>fMatch</italic>
(
<italic>S</italic>
<sub>1</sub>
,
<italic>S</italic>
<sub>2</sub>
) ≥ 85% and the same annotated text in both annotations) for the 10 double-annotated figures (see the subsection “Annotation Process” above). Using the first-run annotation as the standard, we found that the agreement of the second-run annotation is over 97% for both ground truth text and location. As shown in
<xref ref-type="table" rid="pone.0126200.t001">Table 1</xref>
, the text and location agreement percentages are identical and are calculated as
<disp-formula id="pone.0126200.e017">
<alternatives>
<graphic xlink:href="pone.0126200.e017.jpg" id="pone.0126200.e017g" mimetype="image" position="anchor" orientation="portrait"></graphic>
<mml:math id="M17">
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mfrac>
<mml:mn>176</mml:mn>
<mml:mrow>
<mml:mo movablelimits="true" form="prefix">max</mml:mo>
<mml:mo>{</mml:mo>
<mml:mn>181</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>189</mml:mn>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:mfrac>
<mml:mo>=</mml:mo>
<mml:mn>97</mml:mn>
<mml:mo>.</mml:mo>
<mml:mn>24</mml:mn>
<mml:mo>%</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
</alternatives>
</disp-formula>
</p>
<table-wrap id="pone.0126200.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t001</object-id>
<label>Table 1</label>
<caption>
<title>The annotation agreement of the 10 figures randomly selected.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t001g" xlink:href="pone.0126200.t001"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">Original annotations</th>
<th align="left" rowspan="1" colspan="1">Re-annotations</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Number of text regions</td>
<td align="left" rowspan="1" colspan="1">181</td>
<td align="left" rowspan="1" colspan="1">189</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of text regions which have
<italic>the same annotated text</italic>
in both annotations</td>
<td align="left" rowspan="1" colspan="1">176</td>
<td align="left" rowspan="1" colspan="1">176</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Number of text regions which have
<italic>the same location</italic>
in both annotations</td>
<td align="left" rowspan="1" colspan="1">176</td>
<td align="left" rowspan="1" colspan="1">176</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>We manually analyzed the inconsistent annotations. A few examples are shown in
<xref ref-type="fig" rid="pone.0126200.g005">Fig 5</xref>
, in which thin red boxes mark annotations on which both runs agree, while thick blue and thick red boxes mark disagreements, representing the original annotation and the re-annotation respectively.
<xref ref-type="fig" rid="pone.0126200.g005">Fig 5</xref>
also shows cases where ground truth text differs.</p>
<fig id="pone.0126200.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g005</object-id>
<label>Fig 5</label>
<caption>
<title>Disagreed examples between the original annotation and the re-annotation, where thick blue and red boxes are text regions with inconsistent annotations.</title>
</caption>
<graphic xlink:href="pone.0126200.g005"></graphic>
</fig>
<p>There are two main reasons for the disagreement, corresponding to two types of text regions: text with low image quality and text containing domain-specific terms. First, although the overall image quality is reasonable, some text regions are blurry and small and may therefore be overlooked by annotators. In addition, domain-specific terms in the biomedical literature (e.g., “INSpr” and “bGHpA” in
<xref ref-type="fig" rid="pone.0126200.g005">Fig 5</xref>
) are also challenging. Despite these challenges, the overall agreement is high, and we therefore consider
<bold>D
<sc>e</sc>
TEXT</bold>
a high-quality annotated corpus for biomedical figures.</p>
</sec>
<sec id="sec012">
<title>Data Statistics</title>
<p>As described previously,
<bold>D
<sc>e</sc>
TEXT</bold>
comprises a total of 500 open-access, publicly available figures appearing in 288 full-text articles randomly selected from PubMed Central.
<bold>D
<sc>e</sc>
TEXT</bold>
contains a total of 9308 finely annotated text regions. It is a large-scale dataset for text extraction from images and figures; by comparison, many publicly available open-domain image datasets (e.g., the ICDAR Robust Reading Competition datasets) contain only about 2000 text (word) regions.
<xref ref-type="table" rid="pone.0126200.t002">Table 2</xref>
shows the annotation statistics by different text regions, and
<xref ref-type="fig" rid="pone.0126200.g006">Fig 6</xref>
shows region samples of different categories. As shown in
<xref ref-type="table" rid="pone.0126200.t002">Table 2</xref>
, “short” is the most common region type, accounting for 46.8% (4,354/9,308) of all annotated text regions. “Normal” is the second most common, accounting for 37.8% (3,519/9,308). “Small”, “blurry”, “color”, “complex_background”, “complex_symbol”, and “specific_text” account for the remaining text regions.</p>
<table-wrap id="pone.0126200.t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t002</object-id>
<label>Table 2</label>
<caption>
<title>Statistics of text (word) regions and figures with different categories.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t002g" xlink:href="pone.0126200.t002"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Text region category</th>
<th align="left" rowspan="1" colspan="1">NO. of regions (%)</th>
<th align="left" rowspan="1" colspan="1">NO. of figures (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Normal</td>
<td align="char" char="." rowspan="1" colspan="1">3519 (37.8%)</td>
<td align="char" char="." rowspan="1" colspan="1">424 (84.8%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Small</td>
<td align="char" char="." rowspan="1" colspan="1">2419 (26.0%)</td>
<td align="char" char="." rowspan="1" colspan="1">151 (30.2%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Blurry</td>
<td align="char" char="." rowspan="1" colspan="1">1118 (12.0%)</td>
<td align="char" char="." rowspan="1" colspan="1">65 (13.0%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Color</td>
<td align="char" char="." rowspan="1" colspan="1">293 (3.1%)</td>
<td align="char" char="." rowspan="1" colspan="1">39 (7.8%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Short</td>
<td align="char" char="." rowspan="1" colspan="1">4354 (46.8%)</td>
<td align="char" char="." rowspan="1" colspan="1">379 (75.8%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex_background</td>
<td align="char" char="." rowspan="1" colspan="1">670 (7.2%)</td>
<td align="char" char="." rowspan="1" colspan="1">86 (17.2%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex_symbol</td>
<td align="char" char="." rowspan="1" colspan="1">240 (2.6%)</td>
<td align="char" char="." rowspan="1" colspan="1">75 (15.0%)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Specific_text</td>
<td align="char" char="." rowspan="1" colspan="1">74 (0.8%)</td>
<td align="char" char="." rowspan="1" colspan="1">14 (2.8%)</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<fig id="pone.0126200.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Region samples of different categories.</title>
</caption>
<graphic xlink:href="pone.0126200.g006"></graphic>
</fig>
<p>We further counted the number of text regions belonging to multiple categories as shown in
<xref ref-type="table" rid="pone.0126200.t003">Table 3</xref>
. The most common combination is “small”+“short”, followed by “small”+“blurry” and “blurry”+“short”.</p>
<table-wrap id="pone.0126200.t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t003</object-id>
<label>Table 3</label>
<caption>
<title>Statistics of text (word) regions and figures with combination of categories.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t003g" xlink:href="pone.0126200.t003"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Combination of region categories</th>
<th align="left" rowspan="1" colspan="1">NO. of regions</th>
<th align="left" rowspan="1" colspan="1">NO. of figures</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">short, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">71</td>
<td align="left" rowspan="1" colspan="1">18</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, short</td>
<td align="left" rowspan="1" colspan="1">1786</td>
<td align="left" rowspan="1" colspan="1">126</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">complex_background, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">23</td>
<td align="left" rowspan="1" colspan="1">9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">color, short</td>
<td align="left" rowspan="1" colspan="1">96</td>
<td align="left" rowspan="1" colspan="1">22</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, blurry</td>
<td align="left" rowspan="1" colspan="1">858</td>
<td align="left" rowspan="1" colspan="1">47</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, blurry, short</td>
<td align="left" rowspan="1" colspan="1">485</td>
<td align="left" rowspan="1" colspan="1">33</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">short, complex_background</td>
<td align="left" rowspan="1" colspan="1">279</td>
<td align="left" rowspan="1" colspan="1">48</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">blurry, short</td>
<td align="left" rowspan="1" colspan="1">603</td>
<td align="left" rowspan="1" colspan="1">44</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">19</td>
<td align="left" rowspan="1" colspan="1">9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">color, specific_text</td>
<td align="left" rowspan="1" colspan="1">35</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, blurry, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">5</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, complex_background</td>
<td align="left" rowspan="1" colspan="1">106</td>
<td align="left" rowspan="1" colspan="1">13</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">blurry, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">14</td>
<td align="left" rowspan="1" colspan="1">7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, short, complex_background</td>
<td align="left" rowspan="1" colspan="1">47</td>
<td align="left" rowspan="1" colspan="1">8</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">color, complex_background</td>
<td align="left" rowspan="1" colspan="1">81</td>
<td align="left" rowspan="1" colspan="1">16</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">color, short, complex_background</td>
<td align="left" rowspan="1" colspan="1">24</td>
<td align="left" rowspan="1" colspan="1">9</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, color, short</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">color, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, color</td>
<td align="left" rowspan="1" colspan="1">28</td>
<td align="left" rowspan="1" colspan="1">7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, blurry, complex_background</td>
<td align="left" rowspan="1" colspan="1">43</td>
<td align="left" rowspan="1" colspan="1">4</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, blurry, short, complex_background</td>
<td align="left" rowspan="1" colspan="1">9</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">short, complex_background, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, short, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">blurry, complex_background, complex_symbol</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">blurry, short, complex_background</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">3</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">small, color, complex_background</td>
<td align="left" rowspan="1" colspan="1">15</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">complex_background, specific_text</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">2</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>We also annotated orientation attributes (“horizontal/oriented”) for every text region. As shown in
<xref ref-type="table" rid="pone.0126200.t004">Table 4</xref>
, over 9% (847/9,308) of all annotated text regions have rotated text.
<xref ref-type="table" rid="pone.0126200.t004">Table 4</xref>
also shows that some figures contain both horizontal and oriented text regions (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
for a typical example).</p>
<table-wrap id="pone.0126200.t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t004</object-id>
<label>Table 4</label>
<caption>
<title>Statistics of text (word) regions with orientation attributes.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t004g" xlink:href="pone.0126200.t004"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Orientation attribute</th>
<th align="left" rowspan="1" colspan="1">NO. of regions</th>
<th align="left" rowspan="1" colspan="1">NO. of figures</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Horizontal</td>
<td align="left" rowspan="1" colspan="1">8461</td>
<td align="left" rowspan="1" colspan="1">492</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Oriented</td>
<td align="left" rowspan="1" colspan="1">847</td>
<td align="left" rowspan="1" colspan="1">268</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Total</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<italic>9308</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<italic>500</italic>
</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>Biomedical figures can be classified into five different types (i.e., Gel-image, Image-of-thing, Graph, Model, and Mix) [
<xref rid="pone.0126200.ref048" ref-type="bibr">48</xref>
].
<xref ref-type="table" rid="pone.0126200.t005">Table 5</xref>
shows the distribution of figures across these types. Here,
<italic>Gel-image</italic>
consists of gel images (e.g., DNA, RNA and protein);
<italic>Image-of-thing</italic>
refers to pictures of real objects such as cells, tissues, organs, and equipment;
<italic>Graph</italic>
consists of bar charts, column charts, line charts, plots and other drawn graphs;
<italic>Model</italic>
demonstrates a biological process, a chemical or cellular structure, or an algorithm framework; and
<italic>Mix</italic>
refers to a figure that incorporates two or more other figure types. In
<bold>D
<sc>e</sc>
TEXT</bold>
, there are 16, 46, 232, 124, and 82 images for
<italic>Gel-image</italic>
,
<italic>Image-of-thing</italic>
,
<italic>Graph</italic>
,
<italic>Model</italic>
, and
<italic>Mix</italic>
respectively, which is sufficient to represent the general situations encountered in text extraction from different types of biomedical figures.</p>
<table-wrap id="pone.0126200.t005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t005</object-id>
<label>Table 5</label>
<caption>
<title>Statistics of biomedical figures with five different types.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t005g" xlink:href="pone.0126200.t005"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">
<italic>Gel-image</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Image-of-thing</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Graph</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Model</italic>
</th>
<th align="left" rowspan="1" colspan="1">
<italic>Mix</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">NO. of figures</td>
<td align="left" rowspan="1" colspan="1">16</td>
<td align="left" rowspan="1" colspan="1">46</td>
<td align="left" rowspan="1" colspan="1">232</td>
<td align="left" rowspan="1" colspan="1">124</td>
<td align="left" rowspan="1" colspan="1">82</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
<sec id="sec013">
<title>Data Subsets for Evaluation</title>
<p>First, researchers can download the entire
<bold>D
<sc>e</sc>
TEXT</bold>
dataset with its 500 figures, and these resources may be altered, amended or annotated in any way to facilitate related research.</p>
<p>Second, we divided the dataset into three separate non-overlapping subsets: training, validation, and testing. Details are shown in
<xref ref-type="table" rid="pone.0126200.t006">Table 6</xref>
.</p>
<table-wrap id="pone.0126200.t006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t006</object-id>
<label>Table 6</label>
<caption>
<title>Training, validation, and testing sets of D
<sc>e</sc>
TEXT.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t006g" xlink:href="pone.0126200.t006"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Subset</th>
<th align="left" rowspan="1" colspan="1">NO. of figures</th>
<th align="left" rowspan="1" colspan="1">NO. of articles</th>
<th align="left" rowspan="1" colspan="1">Remarks</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Training set</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">Select one figure for each article.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Validation set</td>
<td align="left" rowspan="1" colspan="1">100</td>
<td align="left" rowspan="1" colspan="1">45</td>
<td align="left" rowspan="1" colspan="1">Randomly select 45 articles and include all common figures in these articles from the remaining dataset without the training set.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Testing set</td>
<td align="left" rowspan="1" colspan="1">300</td>
<td align="left" rowspan="1" colspan="1">143</td>
<td align="left" rowspan="1" colspan="1">The remaining subset after selecting the validation set.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<italic>Total</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<italic>500</italic>
</td>
<td align="left" rowspan="1" colspan="1">
<italic>288</italic>
</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>The training set comprises 100 figures from 100 articles (one figure per article), maximizing the number of both figures and articles used for training. The validation set is composed of 100 figures from 45 articles randomly selected from the remaining data after the training set is constructed. The testing set is what remains after the training and validation sets are constructed; it comprises 300 figures from 143 articles.</p>
<p>Similar to the entire dataset (in
<xref ref-type="table" rid="pone.0126200.t003">Table 3</xref>
, we also present the annotation statistics of text regions and figures by category for the three non-overlapping subsets (training, validation, and testing) in
<xref ref-type="table" rid="pone.0126200.t007">Table 7</xref>
. From
<xref ref-type="table" rid="pone.0126200.t007">Table 7</xref>
, we can see that the training, validation, and testing sets have similar distributions of regions and figures across the text region categories (i.e., the challenges for text recognition).</p>
<table-wrap id="pone.0126200.t007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t007</object-id>
<label>Table 7</label>
<caption>
<title>Statistics of text regions and figures with different categories on the training, validation, and testing sets.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t007g" xlink:href="pone.0126200.t007"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2" colspan="1">Text region category</th>
<th align="left" rowspan="1" colspan="1">NO. of regions</th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">NO. of figures</th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">Training</th>
<th align="left" rowspan="1" colspan="1">Validation</th>
<th align="left" rowspan="1" colspan="1">Testing</th>
<th align="left" rowspan="1" colspan="1">Training</th>
<th align="left" rowspan="1" colspan="1">Validation</th>
<th align="left" rowspan="1" colspan="1">Testing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Normal</td>
<td align="left" rowspan="1" colspan="1">731</td>
<td align="left" rowspan="1" colspan="1">597</td>
<td align="left" rowspan="1" colspan="1">2191</td>
<td align="left" rowspan="1" colspan="1">76</td>
<td align="left" rowspan="1" colspan="1">83</td>
<td align="left" rowspan="1" colspan="1">265</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Small</td>
<td align="left" rowspan="1" colspan="1">703</td>
<td align="left" rowspan="1" colspan="1">483</td>
<td align="left" rowspan="1" colspan="1">1233</td>
<td align="left" rowspan="1" colspan="1">37</td>
<td align="left" rowspan="1" colspan="1">36</td>
<td align="left" rowspan="1" colspan="1">78</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Blurry</td>
<td align="left" rowspan="1" colspan="1">638</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">472</td>
<td align="left" rowspan="1" colspan="1">28</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">36</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Color</td>
<td align="left" rowspan="1" colspan="1">52</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">230</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="left" rowspan="1" colspan="1">3</td>
<td align="left" rowspan="1" colspan="1">29</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Short</td>
<td align="left" rowspan="1" colspan="1">964</td>
<td align="left" rowspan="1" colspan="1">780</td>
<td align="left" rowspan="1" colspan="1">2610</td>
<td align="left" rowspan="1" colspan="1">81</td>
<td align="left" rowspan="1" colspan="1">63</td>
<td align="left" rowspan="1" colspan="1">235</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex_background</td>
<td align="left" rowspan="1" colspan="1">270</td>
<td align="left" rowspan="1" colspan="1">126</td>
<td align="left" rowspan="1" colspan="1">294</td>
<td align="left" rowspan="1" colspan="1">24</td>
<td align="left" rowspan="1" colspan="1">15</td>
<td align="left" rowspan="1" colspan="1">47</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex_symbol</td>
<td align="left" rowspan="1" colspan="1">112</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="left" rowspan="1" colspan="1">128</td>
<td align="left" rowspan="1" colspan="1">33</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">42</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Specific_text</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">5</td>
<td align="left" rowspan="1" colspan="1">7</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>Third, for the cross-validation strategy, we take all of the images (i.e., the entire
<bold>D
<sc>e</sc>
TEXT</bold>
database) and perform k-fold cross-validation; with 5 folds, each fold uses 400 figures for training and 100 for testing. Accordingly, we constructed 5-fold and 10-fold cross-validation splits, which are publicly available at
<ext-link ext-link-type="uri" xlink:href="http://prir.ustb.edu.cn/DeTEXT/">http://prir.ustb.edu.cn/DeTEXT/</ext-link>
.</p>
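<p>The official fold assignments are distributed with the dataset at the address above; purely for illustration, a 5-fold partition of the 500 figures into 400 training and 100 testing figures per fold could be generated as in the following sketch.</p>
<preformat>
# Illustrative k-fold split (the official DeTEXT folds should be used for
# comparable results); figure_ids is any list of 500 figure identifiers.
import random

def k_fold_splits(figure_ids, k=5, seed=0):
    ids = list(figure_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k folds of roughly equal size
    for held_out in range(k):
        test = folds[held_out]
        train = [f for i, fold in enumerate(folds) if i != held_out for f in fold]
        yield train, test  # with k=5 and 500 figures: 400 train, 100 test
</preformat>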
<p>Finally, according to the categories of biomedical images (i.e., Gel-image, Image-of-thing, Graph, Model, and Mix),
<bold>D
<sc>e</sc>
TEXT</bold>
is also grouped into these 5 image categories, i.e., 5 subsets, so that a single image type can be selected for evaluation.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec014">
<title>Discussion</title>
<p>Throughout the
<bold>D
<sc>e</sc>
TEXT</bold>
annotation, we found unique challenges for automatically detecting text from figures. As shown in Tables
<xref ref-type="table" rid="pone.0126200.t002">2</xref>
,
<xref ref-type="table" rid="pone.0126200.t003">3</xref>
and
<xref ref-type="table" rid="pone.0126200.t004">4</xref>
, only 37.8% of text regions are normal. In most cases, text is small (26.0%), blurry (12.0%), short (46.8%), embedded in a complex background (7.2%), rotated (9.1%), or affected by a combination of these challenges. For example, as shown in
<xref ref-type="table" rid="pone.0126200.t003">Table 3</xref>
, 19.2% (1,786/9,308) of figure text regions are both small and short, and 9.2% (858/9,308) are both small and blurry. All these issues pose significant challenges for figure text recognition, and most conventional OCR technologies would likely fail. In the following, we discuss challenges arising from image quality and complex images, in both the open domain and the biomedical domain, as well as challenges from the text itself in the biomedical domain. Finally, we also discuss the size of
<bold>D
<sc>e</sc>
TEXT</bold>
, and outline possible future research directions.</p>
<sec id="sec015">
<title>Image Quality, Complex Images and Complex Background</title>
<p>We believe that figure image quality poses significant challenges for automatic text detection and recognition. In addition, complex images share many common challenges arising from environmental complexity, flexible acquisition conditions, and text variation [
<xref rid="pone.0126200.ref030" ref-type="bibr">30</xref>
]: background complexity, blurring and degradation, varying text aspect ratios, diverse text fonts, and image distortion.</p>
<p>Biomedical literature figures are sometimes displayed at low resolution. In a low-resolution image, text typically consists of blurry, small-size characters. In our annotation (training, validation, and testing) datasets, about a quarter of the figures contain blurry text and/or small-size characters (see examples in
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1</xref>
).</p>
<p>Layout complexity is one of the characteristics of biomedical figures. As shown in Figs
<xref ref-type="fig" rid="pone.0126200.g001">1</xref>
and
<xref ref-type="fig" rid="pone.0126200.g002">2</xref>
, figures are composed of different objects, including experimental results, research models, and biomedical objects with different targets, patterns, and presentations. Consequently, they form a complex figure layout. For example,
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
simultaneously contains biomedical objects, experimental results, different graphs, and rotated and colored text. This complex layout is a major challenge not only for image processing but also for text extraction.</p>
<p>In summary, challenges from image quality and complex images in both the open domain and the specific domain mainly include blurred text, small-size characters, color text, and complex backgrounds and layouts, which are described in detail below.</p>
<p>Blurred text (“blurry”): Because of file-size limits or incorrect handling of the figure itself, blurred figures are common, and blurring degrades the quality of the text image. Blurring and degradation typically reduce character sharpness and introduce touching characters (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(B)</xref>
), which makes text detection, character segmentation, and word recognition very difficult.</p>
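<p>As a rough illustration of how blurred figures can be flagged automatically, the sketch below uses the common variance-of-Laplacian heuristic; this is not the DeTEXT annotation procedure, and the threshold value and file name are arbitrary assumptions.</p>
<preformat>
# Minimal sketch: flag blurry figures with the variance-of-Laplacian heuristic.
# A generic rule of thumb, not the annotation procedure used for DeTEXT;
# the threshold is an arbitrary assumption and the file name is hypothetical.
import cv2

def is_blurry(image_path, threshold=100.0):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness < threshold

print(is_blurry("figure_001.jpg"))
</preformat>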
<p>Small-size character (“small”): Generally, literature figures have limited space for text insertion and presentation. Consequently, authors often use a small font size when embedding text. Small font size, however, often lowers both image quality and contrast, as in
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(B)</xref>
, serving as a main source of error. Moreover, figures occasionally contain oversized characters. Characters of various fonts and sizes exhibit large within-class variations and can form many pattern subspaces, making accurate segmentation and recognition difficult.</p>
<p>Color image / text (“color”): In order to present information and objects clearly and distinctly, figures often contain color text and/or color backgrounds (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
). Color variation introduces challenges in text localization, segmentation and recognition.</p>
<p>Complex background and layout (“complex_background”): Biomedical literature figures contain many experimental results, research models, and biomedical objects with different representations, and text and image content are frequently intertwined (examples are given in Figs
<xref ref-type="fig" rid="pone.0126200.g001">1(a)</xref>
,
<xref ref-type="fig" rid="pone.0126200.g001">1(B)</xref>
and
<xref ref-type="fig" rid="pone.0126200.g002">2</xref>
). These objects and their embedded text contribute to the layout complexity and make it difficult to localize and segment text.</p>
</sec>
<sec id="sec016">
<title>Text Complexity</title>
<p>In the specific domain of biomedical figures, there are large numbers of short words, domain terms, uppercase strings, irregularly arranged text, etc. This text complexity also brings several significant challenges for figure text recognition. For example, irregular text arrangement is a common characteristic of biomedical figures (see Figs
<xref ref-type="fig" rid="pone.0126200.g001">1</xref>
and
<xref ref-type="fig" rid="pone.0126200.g002">2</xref>
). A figure is a precise, concise description of one idea in a paper. Within a figure’s limited space, text is arranged with a wide range of sizes, orientations, and locations.</p>
<p>In summary, challenges from the text itself in the specific domain mainly include short words, complex symbols, specific text, and oriented text, which are described in detail below.</p>
<p>Short word (“short”): There are plenty of short words (two or three characters) in figures (see Figs
<xref ref-type="fig" rid="pone.0126200.g001">1(c)</xref>
and
<xref ref-type="fig" rid="pone.0126200.g002">2</xref>
). Words of two or three characters are difficult to group and classify in the text detection stage. Moreover, some noise regions have structures and appearances similar to short words.</p>
<p>Complex symbol (“complex_symbol”): Biomedical literature figures contain plenty of text with complex and domain-specific symbols, e.g., chemical formulas, molecular notation, and abbreviations (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
). A chemical formula is typically composed of digits, uppercase letters, superscript or subscript characters, and specific symbols. Besides posing a major challenge for character and word recognition, such text is also very difficult for layout analysis and text detection.</p>
<p>Specific text (“specific_text”): Several kinds of domain-specific text appear in biomedical figures. The two most common are gene sequences and linked terms [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
]. A gene sequence is composed of several characters and is typically shown in a table (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(a)</xref>
). However, the spacing between characters varies from small to large, which makes it very difficult to detect and locate the text region of the whole sequence. Yet a whole gene sequence unit is very important, and has a high priority, for figure retrieval and text mining.</p>
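<p>One simple way to recover a whole sequence region despite variable spacing is to merge per-character boxes with a gap tolerance derived from the character size; the sketch below is only an illustration under that assumption, not the annotation or detection method used here.</p>
<preformat>
# Minimal sketch: merge per-character boxes (x, y, w, h) on one table row into a
# single sequence region, tolerating variable horizontal gaps. The gap rule
# (gap no larger than twice the median character height) is an illustrative assumption.
from statistics import median

def merge_sequence_boxes(boxes):
    boxes = sorted(boxes, key=lambda b: b[0])
    gap_limit = 2 * median(h for _, _, _, h in boxes)
    groups, current = [], [boxes[0]]
    for box in boxes[1:]:
        prev = current[-1]
        gap = box[0] - (prev[0] + prev[2])
        if gap <= gap_limit:
            current.append(box)
        else:
            groups.append(current)
            current = [box]
    groups.append(current)
    return groups  # each group spans one candidate sequence region

print(len(merge_sequence_boxes([(0, 0, 10, 12), (14, 0, 10, 12), (60, 0, 10, 12)])))  # 2
</preformat>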
<p>Another issue is rotated (oriented) text. Multi-oriented text is often embedded in literature figures for compact representation and clean arrangement. Two common cases are vertical text along the Y-axis (
<xref ref-type="fig" rid="pone.0126200.g003">Fig 3</xref>
), and long oriented text along the X-axis in plot and histogram figures (
<xref ref-type="fig" rid="pone.0126200.g005">Fig 5</xref>
). However, most existing methods have focused on detecting horizontal or near-horizontal text in images and figures, because detecting multi-oriented text is challenging. The fundamental difficulty is that the text line alignment feature can no longer be used to regularize the line construction process, while most current clustering- or rule-based methods rely on exactly this information for character grouping and line construction [
<xref rid="pone.0126200.ref017" ref-type="bibr">17</xref>
,
<xref rid="pone.0126200.ref034" ref-type="bibr">34</xref>
,
<xref rid="pone.0126200.ref038" ref-type="bibr">38</xref>
,
<xref rid="pone.0126200.ref044" ref-type="bibr">44</xref>
] because bottom alignment is the key and most stable feature of text lines [
<xref rid="pone.0126200.ref038" ref-type="bibr">38</xref>
]. Another challenge is that, for arbitrary orientations, it is complicated to determine the necessary empirical rules and to train robust character and text classifiers for text detection and recognition.</p>
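<p>To make the alignment problem concrete, the sketch below estimates a candidate line’s orientation directly from character centroids instead of assuming bottom alignment; it is an illustrative alternative, not one of the cited detection methods.</p>
<preformat>
# Minimal sketch: estimate a text line's orientation from character centroids,
# rather than relying on bottom alignment. Illustrative only; not one of the
# cited detection methods.
import numpy as np

def line_orientation_degrees(centroids):
    """centroids: (x, y) centers of the characters in one candidate line."""
    pts = np.asarray(centroids, dtype=float)
    pts -= pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)  # principal direction
    dx, dy = vt[0]
    return float(np.degrees(np.arctan2(dy, dx)) % 180.0)  # fold the 180-degree ambiguity

print(line_orientation_degrees([(0, 0), (10, 5), (20, 10)]))  # about 26.6
</preformat>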
<p>
<xref ref-type="table" rid="pone.0126200.t008">Table 8</xref>
summarizes all aforementioned common and notable challenges (“difficulties”) for text detection and recognition from biomedical literature figures.</p>
<table-wrap id="pone.0126200.t008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0126200.t008</object-id>
<label>Table 8</label>
<caption>
<title>Challenges for text detection and recognition from biomedical literature figures.</title>
</caption>
<alternatives>
<graphic id="pone.0126200.t008g" xlink:href="pone.0126200.t008"></graphic>
<table frame="box" rules="all" border="0">
<colgroup span="1">
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
<col align="left" valign="top" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Challenges</th>
<th align="left" rowspan="1" colspan="1">Sub Categorization</th>
<th align="left" rowspan="1" colspan="1">Difficulty</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="4" colspan="1">From image quality and complex images</td>
<td align="left" rowspan="1" colspan="1">Blurred text</td>
<td align="left" rowspan="1" colspan="1">“blurry” (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(b)</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Small-size character</td>
<td align="left" rowspan="1" colspan="1">“small” (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(b)</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Color image / text</td>
<td align="left" rowspan="1" colspan="1">“color” (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex background and layout</td>
<td align="left" rowspan="1" colspan="1">“complex_background” (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="4" colspan="1">From text complexity</td>
<td align="left" rowspan="1" colspan="1">Short word</td>
<td align="left" rowspan="1" colspan="1">“short” (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(c)</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Complex symbol</td>
<td align="left" rowspan="1" colspan="1">“complex_symbol” (see
<xref ref-type="fig" rid="pone.0126200.g004">Fig 4</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Specific text</td>
<td align="left" rowspan="1" colspan="1">“specific_text” (see
<xref ref-type="fig" rid="pone.0126200.g001">Fig 1(a)</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Oriented text</td>
<td align="left" rowspan="1" colspan="1">“oriented” (see
<xref ref-type="fig" rid="pone.0126200.g002">Fig 2</xref>
)</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
</sec>
<sec id="sec017">
<title>Database Size and Annotation Effort</title>
<p>As described previously,
<bold>D
<sc>e</sc>
TEXT</bold>
comprises a total of 9,308 text regions from 500 figures in 288 full-text articles. A significant amount of annotation work has been undertaken in the biomedical domain. For example, two highly successful text-based evaluation efforts, the BioCreAtIvE (
<ext-link ext-link-type="uri" xlink:href="http://biocreative.sourceforge.net/index.html">http://biocreative.sourceforge.net/index.html</ext-link>
) and the i2b2 (
<ext-link ext-link-type="uri" xlink:href="https://www.i2b2.org/">https://www.i2b2.org/</ext-link>
), both rely on annotated corpora on the scale of a hundred or a few hundred documents. A five-year annotation effort supported by the NIH resulted in 97 annotated full-text articles [
<xref rid="pone.0126200.ref049" ref-type="bibr">49</xref>
]. We have also demonstrated that careful annotation of a few hundred articles or fewer can lead to meaningful biomedical knowledge discoveries [
<xref rid="pone.0126200.ref010" ref-type="bibr">10</xref>
]. Since biomedical images can be classified mainly into five types [
<xref rid="pone.0126200.ref048" ref-type="bibr">48</xref>
], and thousands of text regions have been annotated across these image types, we are confident that the size of our annotated dataset is sufficient for a benchmark.</p>
</sec>
<sec id="sec018">
<title>Future Work</title>
<p>Hundreds of millions of figures are available in the biomedical literature, representing important experimental evidence. Since text appears richly in figures, text extraction (detection and recognition) from figures is an important step toward figure-text applications and figure mining in the biomedical literature. Consequently, one direction for future work is to develop automated systems that detect and recognize text in biomedical figures. Unlike images in the open domain, biomedical figures are highly complex and therefore present unique challenges. D
<sc>e</sc>
TEXT provides a high-quality benchmark dataset for exploring automated text extraction from biomedical figures in both the biomedical informatics and the document analysis and recognition communities. Another possible direction is biomedical figure search that combines information from the figure caption, the full-text article, and the text embedded in the figure itself. Again, D
<sc>e</sc>
TEXT, together with its full-text articles, provides a good resource for investigating such topics in both the biomedical informatics and information retrieval fields.</p>
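<p>As a toy illustration of the figure-search idea, the sketch below ranks figures by a weighted match of query terms against the caption, the full-text article, and the extracted figure text; the fields, weights, and scoring are illustrative assumptions only.</p>
<preformat>
# Minimal sketch: rank figures by a weighted combination of query-term matches in
# the caption, the full-text article, and the text extracted from the figure.
# Fields, weights, and the scoring rule are illustrative assumptions.
def score_figure(query_terms, caption, article_text, figure_text, weights=(2.0, 0.5, 1.5)):
    fields = (caption.lower(), article_text.lower(), figure_text.lower())
    return sum(w * sum(term.lower() in f for term in query_terms)
               for w, f in zip(weights, fields))

figures = {
    "fig1": ("Western blot of p53 expression", "full text ...", "p53 GAPDH"),
    "fig2": ("Survival curve", "full text ...", "months p value"),
}
query = ["p53"]
ranked = sorted(figures, key=lambda k: score_figure(query, *figures[k]), reverse=True)
print(ranked)  # ['fig1', 'fig2']
</preformat>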
</sec>
<sec id="sec019">
<title>Conclusion</title>
<p>In this paper, we released the first public image dataset for biomedical literature figure text detection and recognition,
<bold>D
<sc>e</sc>
TEXT</bold>
: a Database for
<bold>E</bold>
valuating
<bold>TEXT</bold>
-extraction from biomedical literature figures. Similar to the figure dataset in FigTExT [
<xref rid="pone.0126200.ref009" ref-type="bibr">9</xref>
] but with a larger number of figures and articles,
<bold>D
<sc>e</sc>
TEXT</bold>
is composed of 500 typical biomedical literature figures drawn from 288 full-text articles randomly selected from PubMed Central. Moreover, similar to the image dataset in the recent ICDAR Robust Reading Competition [
<xref rid="pone.0126200.ref025" ref-type="bibr">25</xref>
] but with much richer information, images in
<bold>D
<sc>e</sc>
TEXT</bold>
are annotated not only with each text region’s orientation, location, and ground-truth text, but also with image-quality attributes that are essential for technology study, error analysis, and application investigation. We also recommended text detection and word recognition evaluation protocols for our
<bold>D
<sc>e</sc>
TEXT</bold>
dataset. The next tasks are to detect and recognize figure text in this dataset, and to retrieve biomedical literature figures using figure text extraction. We hope our continued efforts will help to improve figure classification, retrieval, and mining in the literature.</p>
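<p>For orientation only, a generic area-overlap matching sketch for text detection evaluation is given below; it is a simplified illustration and not the specific protocol recommended for DeTEXT.</p>
<preformat>
# Minimal sketch: generic IoU-based matching of detected boxes against ground-truth
# boxes (x1, y1, x2, y2). A simplified illustration, not the DeTEXT protocol.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, threshold=0.5):
    matched = sum(any(iou(d, g) >= threshold for g in ground_truth) for d in detections)
    precision = matched / len(detections) if detections else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    return precision, recall

print(precision_recall([(0, 0, 10, 10)], [(1, 1, 10, 10)]))  # (1.0, 1.0)
</preformat>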
</sec>
</sec>
</body>
<back>
<ack>
<p>We are grateful to the academic editor (Prof. Shoba Ranganathan) and the anonymous reviewers for their constructive comments. The funders of this research had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0126200.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Shatkay</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Chen</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Blostein</surname>
<given-names>D</given-names>
</name>
(
<year>2006</year>
)
<article-title>Integrating image data into biomedical text categorization</article-title>
.
<source>Bioinformatics</source>
<volume>14</volume>
:
<fpage>446</fpage>
<lpage>453</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btl235">10.1093/bioinformatics/btl235</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Lee</surname>
<given-names>M</given-names>
</name>
(
<year>2006</year>
)
<article-title>Accessing bioscience images from abstract sentences</article-title>
.
<source>Bioinformatics</source>
<volume>14</volume>
:
<fpage>547</fpage>
<lpage>556</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btl261">10.1093/bioinformatics/btl261</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hearst</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Divoli</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Guturu</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Ksikes</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Nakov</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Wooldridge</surname>
<given-names>MA</given-names>
</name>
,
<etal>et al</etal>
(
<year>2007</year>
)
<article-title>BioText Search Engine: beyond abstract search</article-title>
.
<source>Bioinformatics</source>
<volume>23</volume>
:
<fpage>2196</fpage>
<lpage>2197</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btm301">10.1093/bioinformatics/btm301</ext-link>
</comment>
<pub-id pub-id-type="pmid">17545178</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Qian</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Murphy</surname>
<given-names>R</given-names>
</name>
(
<year>2008</year>
)
<article-title>Improved recognition of figures containing fluorescence microscope images in online journal articles using graphical models</article-title>
.
<source>Bioinformatics</source>
<volume>24</volume>
:
<fpage>569</fpage>
<lpage>576</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btm561">10.1093/bioinformatics/btm561</ext-link>
</comment>
<pub-id pub-id-type="pmid">18033795</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref005">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Xu</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>McCusker</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Krauthammer</surname>
<given-names>M</given-names>
</name>
(
<year>2008</year>
)
<article-title>Yale Image Finder (YIF): a new search engine for retrieving biomedical images</article-title>
.
<source>Bioinformatics</source>
<volume>24</volume>
:
<fpage>1968</fpage>
<lpage>1970</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btn340">10.1093/bioinformatics/btn340</ext-link>
</comment>
<pub-id pub-id-type="pmid">18614584</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref006">
<label>6</label>
<mixed-citation publication-type="other">Ahmed A, Xing E, Cohen W, Murphy R (2009) Structured correspondence topic models for mining captioned figures in biological literature. In: ACM International Conference on Knowledge Discovery and Data Mining. pp. 39–47.</mixed-citation>
</ref>
<ref id="pone.0126200.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ahmed</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Arnold</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Coelho</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Kangas</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Sheikh</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Xing</surname>
<given-names>E</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Structured literature image finder: Parsing text and figures in biomedical literature</article-title>
.
<source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
<volume>8</volume>
:
<fpage>151</fpage>
<lpage>154</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.websem.2010.04.002">10.1016/j.websem.2010.04.002</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Ramesh</surname>
<given-names>B</given-names>
</name>
(
<year>2010</year>
)
<article-title>Automatic figure ranking and user interfacing for intelligent figure search</article-title>
.
<source>PLoS ONE</source>
<volume>5</volume>
:
<fpage>e12983</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0012983">10.1371/journal.pone.0012983</ext-link>
</comment>
<pub-id pub-id-type="pmid">20949102</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2011</year>
)
<article-title>Figure text extraction in biomedical literature</article-title>
.
<source>PLoS ONE</source>
<volume>6</volume>
:
<fpage>e15338</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0015338">10.1371/journal.pone.0015338</ext-link>
</comment>
<pub-id pub-id-type="pmid">21249186</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bockhorst</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Conroy</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>O’Leary</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2012</year>
)
<article-title>Beyond captions: Linking figures with abstract sentences in biomedical articles</article-title>
.
<source>PLoS ONE</source>
<volume>7</volume>
:
<fpage>e39618</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0039618">10.1371/journal.pone.0039618</ext-link>
</comment>
<pub-id pub-id-type="pmid">22815711</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref011">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lopez</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Arighi</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Tudor</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Torri</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Huang</surname>
<given-names>H</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>A framework for biomedical figure segmentation towards image-based document retrieval</article-title>
.
<source>BMC Systems Biology</source>
<volume>7</volume>
:
<fpage>S8</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1752-0509-7-S4-S8">10.1186/1752-0509-7-S4-S8</ext-link>
</comment>
<pub-id pub-id-type="pmid">24565394</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2014</year>
)
<article-title>Learning to rank figures within a biomedical article</article-title>
.
<source>PLoS ONE</source>
<volume>9</volume>
:
<fpage>e61567</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0061567">10.1371/journal.pone.0061567</ext-link>
</comment>
<pub-id pub-id-type="pmid">24625719</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hua</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>W</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>H</given-names>
</name>
(
<year>2004</year>
)
<article-title>An automatic performance evaluation protocol for video text detection algorithms</article-title>
.
<source>IEEE Trans Circuits and Systems for Video Technology</source>
<volume>14</volume>
:
<fpage>498</fpage>
<lpage>507</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TCSVT.2004.825538">10.1109/TCSVT.2004.825538</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref014">
<label>14</label>
<mixed-citation publication-type="other">Lee S, Cho M, Jung K, Kim J (2010) Scene text extraction with edge constraint and text collinearity. In: International Conference on Pattern Recognition. pp. 3983–3986.</mixed-citation>
</ref>
<ref id="pone.0126200.ref015">
<label>15</label>
<mixed-citation publication-type="other">Wang K, Belongie S (2010) Word spotting in the wild. In: European Conference on Computer Vision. pp. 591–604.</mixed-citation>
</ref>
<ref id="pone.0126200.ref016">
<label>16</label>
<mixed-citation publication-type="other">Nagy R, Dicker A, Meyer-Wegener K (2011) NEOCR: A configurable dataset for natural image text recognition. In: International Workshop on Camera-Based Document Analysis and Recognition. pp. 150–163.</mixed-citation>
</ref>
<ref id="pone.0126200.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yi</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Tian</surname>
<given-names>Y</given-names>
</name>
(
<year>2011</year>
)
<article-title>Text string detection from natural scenes by structure-based partition and grouping</article-title>
.
<source>IEEE Trans Image Processing</source>
<volume>20</volume>
:
<fpage>2594</fpage>
<lpage>2605</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TIP.2011.2126586">10.1109/TIP.2011.2126586</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref018">
<label>18</label>
<mixed-citation publication-type="other">Mishra A, Alahari K, Jawahar C (2012) Top-down and bottom-up cues for scene text recognition. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref019">
<label>19</label>
<mixed-citation publication-type="other">Yao C, Zhang X, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref020">
<label>20</label>
<mixed-citation publication-type="other">Yin XC, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Analysis and Machine Intelligence, preprint.</mixed-citation>
</ref>
<ref id="pone.0126200.ref021">
<label>21</label>
<mixed-citation publication-type="other">Lucas S, Panaretos A, Sosa L, Tang A, Wong S, Young R (2003) ICDAR 2003 Robust Reading Competitions. In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref022">
<label>22</label>
<mixed-citation publication-type="other">Lucas S (2005) ICDAR 2005 text locating competition results. In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref023">
<label>23</label>
<mixed-citation publication-type="other">Karatzas D, Mestre S, Mas J, Nourbakhsh F, Roy P (2011) ICDAR 2011 Robust Reading Competition—Challenge 1: Reading text in born-digital images (web and email). In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref024">
<label>24</label>
<mixed-citation publication-type="other">Shahab A, Shafait F, Dengel A (2011) ICDAR 2011 Robust Reading Competition—Challenge 2: Reading text in scene images. In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref025">
<label>25</label>
<mixed-citation publication-type="other">Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda L, Mestre S, et al. (2013) ICDAR 2013 Robust Reading Competition. In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Jung</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
(
<year>2003</year>
)
<article-title>Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm</article-title>
.
<source>IEEE Trans Pattern Analysis and Machine Intelligence</source>
<volume>25</volume>
:
<fpage>1631</fpage>
<lpage>1639</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TPAMI.2003.1251157">10.1109/TPAMI.2003.1251157</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref027">
<label>27</label>
<mixed-citation publication-type="other">Hersh W, Cohen A, Roberts P, Rekapalli H (2006) Trec 2006 genomics track overview. In: The Fifteenth Text Retrieval Conference (TREC 2006). pp. 52–78.</mixed-citation>
</ref>
<ref id="pone.0126200.ref028">
<label>28</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cao</surname>
<given-names>YG</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2010</year>
)
<article-title>An IR-aided machine learning framework for the biocreative ii.5 challenge</article-title>
.
<source>IEEE/ACM Trans Computational Biology and Bioinformatics</source>
<volume>7</volume>
:
<fpage>454</fpage>
<lpage>461</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TCBB.2010.56">10.1109/TCBB.2010.56</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref029">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Antiean</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Cao</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2010</year>
)
<article-title>Lancet: a high precision medication event extraction system for clinical text</article-title>
.
<source>J Am Med Inform Assoc</source>
<volume>17</volume>
:
<fpage>563</fpage>
<lpage>567</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1136/jamia.2010.004077">10.1136/jamia.2010.004077</ext-link>
</comment>
<pub-id pub-id-type="pmid">20819865</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref030">
<label>30</label>
<mixed-citation publication-type="other">Ye Q, Doermann D (2014) Text detection and recognition in imagery: a survey. submitted to IEEE Trans Pattern Analysis and Machine Intelligence.</mixed-citation>
</ref>
<ref id="pone.0126200.ref031">
<label>31</label>
<mixed-citation publication-type="other">Chen X, Yuille A (2004) Detecting and reading text in natural scenes. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref032">
<label>32</label>
<mixed-citation publication-type="other">Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yi</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Tian</surname>
<given-names>Y</given-names>
</name>
(
<year>2012</year>
)
<article-title>Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification</article-title>
.
<source>IEEE Trans Image Processing</source>
<volume>21</volume>
:
<fpage>4256</fpage>
<lpage>4268</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TIP.2012.2199327">10.1109/TIP.2012.2199327</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pan</surname>
<given-names>XF</given-names>
</name>
,
<name>
<surname>Hou</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>CL</given-names>
</name>
(
<year>2011</year>
)
<article-title>A hybrid approach to detect and localize texts in natural scene images</article-title>
.
<source>IEEE Trans Image Processing</source>
<volume>20</volume>
:
<fpage>800</fpage>
<lpage>813</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TIP.2010.2070803">10.1109/TIP.2010.2070803</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref035">
<label>35</label>
<mixed-citation publication-type="other">Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref036">
<label>36</label>
<mixed-citation publication-type="journal">
<name>
<surname>Shi</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Xiao</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Gao</surname>
<given-names>S</given-names>
</name>
(
<year>2013</year>
)
<article-title>Scene text detection using graph model built upon maximally stable extremal regions</article-title>
.
<source>Pattern Recognition Letters</source>
<volume>34</volume>
:
<fpage>107</fpage>
<lpage>116</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.patrec.2012.09.019">10.1016/j.patrec.2012.09.019</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Koo</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
(
<year>2013</year>
)
<article-title>Scene text detection via connected component clustering and nontext filtering</article-title>
.
<source>IEEE Trans Image Processing</source>
<volume>22</volume>
:
<fpage>2296</fpage>
<lpage>2305</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TIP.2013.2249082">10.1109/TIP.2013.2249082</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Yin</surname>
<given-names>XC</given-names>
</name>
,
<name>
<surname>Yin</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Huang</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Hao</surname>
<given-names>HW</given-names>
</name>
(
<year>2014</year>
)
<article-title>Robust text detection in natural scene images</article-title>
.
<source>IEEE Trans Pattern Analysis and Machine Intelligence</source>
<volume>36</volume>
:
<fpage>970</fpage>
<lpage>983</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TPAMI.2013.182">10.1109/TPAMI.2013.182</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref039">
<label>39</label>
<mixed-citation publication-type="journal">
<name>
<surname>Weinman</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Learned-Miller</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Hanson</surname>
<given-names>A</given-names>
</name>
(
<year>2009</year>
)
<article-title>Scene text recognition using similarity and a lexicon with sparse belief propagation</article-title>
.
<source>IEEE Trans Pattern Analysis and Machine Intelligence</source>
<volume>31</volume>
:
<fpage>1733</fpage>
<lpage>1746</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/TPAMI.2009.38">10.1109/TPAMI.2009.38</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref040">
<label>40</label>
<mixed-citation publication-type="other">Field J, Learned-Miller E (2013) Improving open-vocabulary scene text recognition. In: International Conference on Document Analysis and Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref041">
<label>41</label>
<mixed-citation publication-type="other">Shi C, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: International Conference on Computer Vision and Pattern Recognition.</mixed-citation>
</ref>
<ref id="pone.0126200.ref042">
<label>42</label>
<mixed-citation publication-type="other">Bissacco A, Cummins M, Netzer Y, Neven H (2013) Photoocr: Reading text in uncontrolled conditions. In: International Conference on Computer Vision.</mixed-citation>
</ref>
<ref id="pone.0126200.ref043">
<label>43</label>
<mixed-citation publication-type="other">Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: International Conference on Computer Vision.</mixed-citation>
</ref>
<ref id="pone.0126200.ref044">
<label>44</label>
<mixed-citation publication-type="other">Neumann L, Matas J (2013) Scene text localization and recognition with oriented stroke detection. In: International Conference on Computer Vision.</mixed-citation>
</ref>
<ref id="pone.0126200.ref045">
<label>45</label>
<mixed-citation publication-type="book">
<name>
<surname>Yin</surname>
<given-names>XC</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Pei</surname>
<given-names>WY</given-names>
</name>
,
<name>
<surname>Hao</surname>
<given-names>HW</given-names>
</name>
(
<year>2014</year>
)
<chapter-title>Effective end-to-end scene text recognition</chapter-title>
<source>Technical Reports</source>
,
<publisher-name>University of Science and Technology Beijing</publisher-name>
.</mixed-citation>
</ref>
<ref id="pone.0126200.ref046">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wolf</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Jolion</surname>
<given-names>J</given-names>
</name>
(
<year>2006</year>
)
<article-title>Object count/area graphs for the evaluation of object detection and segmentation algorithms</article-title>
.
<source>International Journal of Document Analysis and Recognition</source>
<volume>28</volume>
:
<fpage>280</fpage>
<lpage>296</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s10032-006-0014-0">10.1007/s10032-006-0014-0</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref047">
<label>47</label>
<mixed-citation publication-type="other">Liang J, Phillips I, Haralick R (1997) Performance evaluation of document layout analysis algorithms on the uw data set. In: SPIE International Conference on Document Recognitoin IV. pp. 149–160.</mixed-citation>
</ref>
<ref id="pone.0126200.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Ramesh</surname>
<given-names>BP</given-names>
</name>
,
<name>
<surname>Yu</surname>
<given-names>H</given-names>
</name>
(
<year>2011</year>
)
<article-title>Automatic figure classification in bioscience literature</article-title>
.
<source>J Biomed Inform</source>
<volume>44</volume>
:
<fpage>848</fpage>
<lpage>858</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.jbi.2011.05.003">10.1016/j.jbi.2011.05.003</ext-link>
</comment>
<pub-id pub-id-type="pmid">21645638</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0126200.ref049">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bada</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Eckert</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Evans</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Garcia</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Shipley</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Sitnikov</surname>
<given-names>D</given-names>
</name>
,
<etal>et al</etal>
(
<year>2012</year>
)
<article-title>Concept annotation in the CRAFT corpus</article-title>
.
<source>BMC Bioinformatics</source>
<volume>13</volume>
:
<fpage>161</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-13-161">10.1186/1471-2105-13-161</ext-link>
</comment>
<pub-id pub-id-type="pmid">22776079</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000010 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000010 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4423993
   |texte=   DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:25951377" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024