A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images
Internal identifier: 000616 (Main/Merge); previous: 000615; next: 000617
Authors: Songhua Xu [United States]; Michael Krauthammer [United States]
Source:
- Journal of Biomedical Informatics [1532-0464]; 2010.
English descriptors
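- KwdEn: Algorithms; Image Interpretation, Computer-Assisted (methods); Information Storage and Retrieval (methods); Pattern Recognition, Automated (methods)
- MESH:
  - methods: Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Pattern Recognition, Automated
  - Algorithms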
Abstract
There is interest in expanding the reach of literature mining to include the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is then used to improve biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating its performance on a set of manually labeled, randomly selected biomedical images, and compare it against other state-of-the-art text detection algorithms. We demonstrate that a projection histogram-based approach is well suited for text detection in biomedical images, achieving an F score of 0.60. The approach performs better than comparable approaches for text detection. Further, we show that iterative application of the algorithm boosts overall detection performance. A C++ implementation of our algorithm is freely available for academic use by email request.
Url:
DOI: 10.1016/j.jbi.2010.09.006
PubMed: 20887803
PubMed Central: 3265968
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000087
- to stream Pmc, to step Curation: 000087
- to stream Pmc, to step Checkpoint: 000153
- to stream PubMed, to step Corpus: 000039
- to stream PubMed, to step Curation: 000039
- to stream PubMed, to step Checkpoint: 000039
- to stream Ncbi, to step Merge: 000087
- to stream Ncbi, to step Curation: 000087
- to stream Ncbi, to step Checkpoint: 000087
Links to Exploration step
PMC:3265968
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author><name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName><region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">20887803</idno>
<idno type="pmc">3265968</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265968</idno>
<idno type="RBID">PMC:3265968</idno>
<idno type="doi">10.1016/j.jbi.2010.09.006</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000087</idno>
<idno type="wicri:Area/Pmc/Curation">000087</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000153</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/PubMed/Corpus">000039</idno>
<idno type="wicri:Area/PubMed/Curation">000039</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000039</idno>
<idno type="wicri:Area/Ncbi/Merge">000087</idno>
<idno type="wicri:Area/Ncbi/Curation">000087</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000087</idno>
<idno type="wicri:doubleKey">1532-0464:2010:Xu S:a:new:pivoting</idno>
<idno type="wicri:Area/Main/Merge">000616</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author><name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName><region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2"><nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series><title level="j">Journal of Biomedical Informatics</title>
<idno type="ISSN">1532-0464</idno>
<idno type="eISSN">1532-0480</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Pattern Recognition, Automated (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p id="P2">There is interest in expanding the reach of literature mining to include the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is then used to improve biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating its performance on a set of manually labeled, randomly selected biomedical images, and compare it against other state-of-the-art text detection algorithms. We demonstrate that a projection histogram-based approach is well suited for text detection in biomedical images, achieving an F score of 0.60. The approach performs better than comparable approaches for text detection. Further, we show that iterative application of the algorithm boosts overall detection performance. A C++ implementation of our algorithm is freely available for academic use by email request.</p>
</div>
</front>
</TEI>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000616 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 000616 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= PMC:3265968 |texte= A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Merge/RBID.i -Sk "pubmed:20887803" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Merge/biblio.hfd \
       | NlmPubMed2Wicri -a OcrV1
This area was generated with Dilib version V0.6.32.