OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A new pivoting and iterative text detection algorithm for biomedical images.

Internal identifier: 000039 (PubMed/Corpus); previous: 000038; next: 000040

Authors: Songhua Xu; Michael Krauthammer

Source: Journal of biomedical informatics [1532-0480]; 2010; 43(6): 924-31

RBID : pubmed:20887803

English descriptors

Abstract

There is interest to expand the reach of literature mining to include the analysis of biomedical images, which often contain a paper's key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating the performance on a set of manually labeled random biomedical images, and compare the performance against other state-of-the-art text detection algorithms. We demonstrate that our projection histogram-based text detection approach is well suited for text detection in biomedical images, and that the iterative application of the algorithm boosts performance to an F score of .60. We provide a C++ implementation of our algorithm freely available for academic use.

DOI: 10.1016/j.jbi.2010.09.006
PubMed: 20887803
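
The abstract above names projection histograms as the basis of the algorithm but gives no details. As a rough illustration of the general idea only, and not the authors' pivoting and iterative method, the sketch below computes a row projection of a tiny binary image, finds the bands of rows that contain foreground pixels, and then projects one band onto the column axis to split it further. The function names, the toy image, and the zero threshold are all assumptions made for this example.

```python
from typing import List, Tuple

def row_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]

def column_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]

def runs_above(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a new run begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the current run ends
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs

# Hypothetical 5x6 binary image: 1 = foreground pixel, 0 = background.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]

row_profile = row_projection(image)
row_bands = runs_above(row_profile)

# Second pass: split the first horizontal band along the column axis.
top, bottom = row_bands[0]
column_runs_in_band = runs_above(column_projection(image[top:bottom]))

print(row_profile, row_bands, column_runs_in_band)
```

Alternating the two projection directions on each resulting region is one simple way to refine candidate regions step by step; how the published algorithm actually iterates is not stated in this record.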

Links to Exploration step

pubmed:20887803

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A new pivoting and iterative text detection algorithm for biomedical images.</title>
<author>
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation>
<nlm:affiliation>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="doi">10.1016/j.jbi.2010.09.006</idno>
<idno type="RBID">pubmed:20887803</idno>
<idno type="pmid">20887803</idno>
<idno type="wicri:Area/PubMed/Corpus">000039</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">A new pivoting and iterative text detection algorithm for biomedical images.</title>
<author>
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation>
<nlm:affiliation>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
</author>
</analytic>
<series>
<title level="j">Journal of biomedical informatics</title>
<idno type="eISSN">1532-0480</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Pattern Recognition, Automated (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">There is interest to expand the reach of literature mining to include the analysis of biomedical images, which often contain a paper's key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating the performance on a set of manually labeled random biomedical images, and compare the performance against other state-of-the-art text detection algorithms. We demonstrate that our projection histogram-based text detection approach is well suited for text detection in biomedical images, and that the iterative application of the algorithm boosts performance to an F score of .60. We provide a C++ implementation of our algorithm freely available for academic use.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">20887803</PMID>
<DateCreated>
<Year>2010</Year>
<Month>11</Month>
<Day>24</Day>
</DateCreated>
<DateCompleted>
<Year>2011</Year>
<Month>03</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised>
<Year>2016</Year>
<Month>04</Month>
<Day>29</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1532-0480</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>43</Volume>
<Issue>6</Issue>
<PubDate>
<Year>2010</Year>
<Month>Dec</Month>
</PubDate>
</JournalIssue>
<Title>Journal of biomedical informatics</Title>
<ISOAbbreviation>J Biomed Inform</ISOAbbreviation>
</Journal>
<ArticleTitle>A new pivoting and iterative text detection algorithm for biomedical images.</ArticleTitle>
<Pagination>
<MedlinePgn>924-31</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.jbi.2010.09.006</ELocationID>
<Abstract>
<AbstractText>There is interest to expand the reach of literature mining to include the analysis of biomedical images, which often contain a paper's key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating the performance on a set of manually labeled random biomedical images, and compare the performance against other state-of-the-art text detection algorithms. We demonstrate that our projection histogram-based text detection approach is well suited for text detection in biomedical images, and that the iterative application of the algorithm boosts performance to an F score of .60. We provide a C++ implementation of our algorithm freely available for academic use.</AbstractText>
<CopyrightInformation>Copyright © 2010 Elsevier Inc. All rights reserved.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Xu</LastName>
<ForeName>Songhua</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Krauthammer</LastName>
<ForeName>Michael</ForeName>
<Initials>M</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>1R01LM009956</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>5K22LM009255</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>K22 LM009255</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>K22 LM009255-03</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>R01 LM009956</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>R01 LM009956-01A1</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
<Grant>
<GrantID>T15 LM007056</GrantID>
<Acronym>LM</Acronym>
<Agency>NLM NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D023362">Evaluation Studies</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>09</Month>
<Day>29</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Biomed Inform</MedlineTA>
<NlmUniqueID>100970413</NlmUniqueID>
<ISSNLinking>1532-0464</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2006 Jul 15;22(14):e446-53</RefSource>
<PMID Version="1">16873506</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>BMC Biol. 2006;4:25</RefSource>
<PMID Version="1">16884545</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>PLoS Genet. 2006 Oct 13;2(10):e166</RefSource>
<PMID Version="1">17040129</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biomed Inform. 2007 Jun;40(3):270-81</RefSource>
<PMID Version="1">17084109</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2008 Feb 15;24(4):569-76</RefSource>
<PMID Version="1">18033795</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2008 Sep 1;24(17):1968-70</RefSource>
<PMID Version="1">18614584</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D000465">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D007090">Image Interpretation, Computer-Assisted</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D016247">Information Storage and Retrieval</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D010363">Pattern Recognition, Automated</DescriptorName>
<QualifierName MajorTopicYN="N" UI="Q000379">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">NIHMS241943</OtherID>
<OtherID Source="NLM">PMC3265968</OtherID>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2009</Year>
<Month>11</Month>
<Day>10</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2010</Year>
<Month>9</Month>
<Day>6</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2010</Year>
<Month>9</Month>
<Day>8</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="aheadofprint">
<Year>2010</Year>
<Month>9</Month>
<Day>29</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>10</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>10</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2011</Year>
<Month>3</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pii">S1532-0464(10)00138-3</ArticleId>
<ArticleId IdType="doi">10.1016/j.jbi.2010.09.006</ArticleId>
<ArticleId IdType="pubmed">20887803</ArticleId>
<ArticleId IdType="pmc">PMC3265968</ArticleId>
<ArticleId IdType="mid">NIHMS241943</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
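
As a minimal sketch of how identifiers could be read from the `<pubmed>` part of the record above, the following uses Python's standard `xml.etree.ElementTree` on a trimmed copy of that element. The fragment is an abridged excerpt made for this example, not the full record. Note that the TEI part uses an `nlm:` prefix with no namespace declaration in the excerpt shown, so a namespace-aware parser is likely to reject the complete `<record>` unless that prefix is declared or removed first.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element from the record (assumption: trimmed
# by hand for illustration; element and attribute names are unchanged).
fragment = """<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">20887803</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>A new pivoting and iterative text detection algorithm for biomedical images.</ArticleTitle>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pii">S1532-0464(10)00138-3</ArticleId>
<ArticleId IdType="doi">10.1016/j.jbi.2010.09.006</ArticleId>
<ArticleId IdType="pubmed">20887803</ArticleId>
<ArticleId IdType="pmc">PMC3265968</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>"""

root = ET.fromstring(fragment)

# Simple path lookups relative to the <pubmed> root element.
pmid = root.findtext("MedlineCitation/PMID")
title = root.findtext("MedlineCitation/Article/ArticleTitle")

# Map each identifier type (pii, doi, pubmed, pmc) to its value.
ids = {el.get("IdType"): el.text for el in root.iter("ArticleId")}

print(pmid, ids["doi"], ids["pmc"])
```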

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000039 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000039 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:20887803
   |texte=   A new pivoting and iterative text detection algorithm for biomedical images.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:20887803" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024