A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images

Internal identifier: 000616 (Main/Merge); previous: 000615; next: 000617

Authors: Songhua Xu [United States]; Michael Krauthammer [United States]

Source :

Journal of Biomedical Informatics [1532-0464]; 2010.

RBID : PMC:3265968

English descriptors

KwdEn :
- Algorithms, Image Interpretation, Computer-Assisted (methods), Information Storage and Retrieval (methods), Pattern Recognition, Automated (methods).
MESH :
- methods : Image Interpretation, Computer-Assisted, Information Storage and Retrieval, Pattern Recognition, Automated.
- Algorithms.

Abstract

There is growing interest in extending literature mining to the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is then used to improve biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We evaluate the algorithm on a set of manually labeled, randomly selected biomedical images and compare its performance against other state-of-the-art text detection algorithms. We demonstrate that a projection histogram-based approach is well suited to text detection in biomedical images, achieving an F score of 0.60, and that it performs better than comparable approaches. Further, we show that applying the algorithm iteratively boosts overall detection performance. A C++ implementation of our algorithm is freely available for academic use upon email request.
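
To make the term "projection histogram" concrete, here is a minimal Python sketch. It is not the authors' C++ implementation, and it performs only a single row-then-column pass; the pivoting and iterative refinement mentioned in the abstract are not reproduced. The function names (`projection`, `runs`, `candidate_boxes`), the binary-image representation, and the `min_count` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a binary image as rows of 0/1 values,
# where 1 marks a foreground (ink) pixel.
Image = List[List[int]]


def projection(image: Image, axis: int) -> List[int]:
    """Count foreground pixels per row (axis=0) or per column (axis=1)."""
    if not image:
        return []
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def runs(profile: List[int], min_count: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile >= min_count."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def candidate_boxes(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split the image into row bands, then each band into column spans.

    Returns half-open (top, bottom, left, right) boxes.
    """
    boxes = []
    for top, bottom in runs(projection(image, 0)):
        band = image[top:bottom]
        for left, right in runs(projection(band, 1)):
            boxes.append((top, bottom, left, right))
    return boxes


# Hypothetical toy image: two blobs on rows 1-2 and one blob on row 4.
demo = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(candidate_boxes(demo))
```

Empty rows and columns in the histograms act as separators, so each box encloses a connected band of ink. A real detector would still need to decide which boxes contain text.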

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265968

DOI: 10.1016/j.jbi.2010.09.006
PubMed: 20887803
PubMed Central: 3265968

Links to Exploration step

PMC:3265968

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author><name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName><region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">20887803</idno>
<idno type="pmc">3265968</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265968</idno>
<idno type="RBID">PMC:3265968</idno>
<idno type="doi">10.1016/j.jbi.2010.09.006</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000087</idno>
<idno type="wicri:Area/Pmc/Curation">000087</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000153</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/PubMed/Corpus">000039</idno>
<idno type="wicri:Area/PubMed/Curation">000039</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000039</idno>
<idno type="wicri:Area/Ncbi/Merge">000087</idno>
<idno type="wicri:Area/Ncbi/Curation">000087</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000087</idno>
<idno type="wicri:doubleKey">1532-0464:2010:Xu S:a:new:pivoting</idno>
<idno type="wicri:Area/Main/Merge">000616</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author><name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName><region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of Biomedical Informatics</title>
<idno type="ISSN">1532-0464</idno>
<idno type="eISSN">1532-0480</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Pattern Recognition, Automated (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p id="P2">There is interest to expand the reach of literature mining to include the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating the performance on a set of manually labeled random biomedical images, and compare the performance against other state-of-the-art text detection algorithms. In this paper, we demonstrate that a projection histogram-based text detection approach is well suited for text detection in biomedical images, with a performance of F score of .60. The approach performs better than comparable approaches for text detection. Further, we show that the iterative application of the algorithm is boosting overall detection performance. A C++ implementation of our algorithm is freely available through email request for academic use.</p>
</div>
</front>
</TEI>
</record>
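
As a hedged illustration of how the identifiers in this record could be read programmatically, the Python sketch below parses a small excerpt of the `<publicationStmt>` element with the standard library. The excerpt is trimmed by hand and the helper name `idno_map` is hypothetical. The full record as displayed uses `wicri:` and `nlm:` prefixes with no visible namespace declarations, so a namespace-aware parser would likely reject it unless those declarations are supplied.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed excerpt of the record's <publicationStmt>; only a few
# <idno> elements are kept for illustration.
excerpt = """<publicationStmt>
<idno type="pmid">20887803</idno>
<idno type="pmc">3265968</idno>
<idno type="doi">10.1016/j.jbi.2010.09.006</idno>
<idno type="wicri:Area/Main/Merge">000616</idno>
</publicationStmt>"""


def idno_map(xml_text: str) -> dict:
    """Map each <idno> element's type attribute to its text content."""
    root = ET.fromstring(xml_text)
    return {el.get("type"): el.text for el in root.iter("idno")}


ids = idno_map(excerpt)
print(ids["doi"], ids["pmid"], ids["wicri:Area/Main/Merge"])
```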

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000616 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 000616 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:3265968
   |texte=   A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Merge/RBID.i   -Sk "pubmed:20887803" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server

Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.