OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Binarization of color document images via luminance and saturation color features.

Internal identifier: 000054 (PubMed/Curation); previous: 000053; next: 000055

Authors: Chun-Ming Tsai [Taiwan]; Hsi-Jian Lee

Source: IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 434-51, 2002

RBID : pubmed:18244645

Abstract

This paper presents a novel binarization algorithm for color document images. Conventional thresholding methods do not produce satisfactory binarization results for documents with close or mixed foreground colors and background colors. Initially, statistical image features are extracted from the luminance distribution. Then, a decision-tree based binarization method is proposed, which selects various color features to binarize color document images. First, if the document image colors are concentrated within a limited range, saturation is employed. Second, if the image foreground colors are significant, luminance is adopted. Third, if the image background colors are concentrated within a limited range, luminance is also applied. Fourth, if the total number of pixels with low luminance (less than 60) is limited, saturation is applied; else both luminance and saturation are employed. Our experiments include 519 color images, most of which are uniform invoice and name-card document images. The proposed binarization method generates better results than other available methods in shape and connected-component measurements. Also, the binarization method obtains higher recognition accuracy in a commercial OCR system than other comparable methods.
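
The four-branch selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives only the luminance cutoff of 60, so the `ImageStats` fields, the `low_count_limit` parameter, the reading of "limited" as "at or below a limit", and the assumption that the branches are tested in the order listed are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# The only numeric value given in the abstract: "low luminance (less than 60)".
LOW_LUMINANCE_CUTOFF = 60


@dataclass
class ImageStats:
    """Hypothetical summary of the luminance statistics (field names are assumptions)."""
    colors_concentrated: bool      # document colors lie within a limited range
    foreground_significant: bool   # foreground colors are significant
    background_concentrated: bool  # background colors lie within a limited range
    low_luminance_count: int       # number of pixels with luminance below the cutoff


def count_low_luminance(luminances: Sequence[int]) -> int:
    """Count pixels whose luminance is strictly below the cutoff."""
    return sum(1 for y in luminances if y < LOW_LUMINANCE_CUTOFF)


def select_features(stats: ImageStats, low_count_limit: int) -> Tuple[str, ...]:
    """Return the color feature(s) to binarize with, testing the branches in order.

    low_count_limit is a hypothetical threshold; the abstract does not state
    how "limited" is quantified.
    """
    if stats.colors_concentrated:
        return ("saturation",)
    if stats.foreground_significant:
        return ("luminance",)
    if stats.background_concentrated:
        return ("luminance",)
    if stats.low_luminance_count <= low_count_limit:
        return ("saturation",)
    return ("luminance", "saturation")


if __name__ == "__main__":
    stats = ImageStats(False, False, False, count_low_luminance([10, 59, 60, 200]))
    print(select_features(stats, low_count_limit=1))
```

The sketch covers only the feature-selection step; how the statistics are extracted and how the chosen feature is thresholded are not specified in the abstract and are left out.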

DOI: 10.1109/TIP.2002.999677
PubMed: 18244645

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Binarization of color document images via luminance and saturation color features.</title>
<author>
<name sortKey="Tsai, Chun Ming" sort="Tsai, Chun Ming" uniqKey="Tsai C" first="Chun-Ming" last="Tsai">Chun-Ming Tsai</name>
<affiliation wicri:level="1">
<nlm:affiliation>Dept. of Comput. Sci. and Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan, R. O. C. chunming@csie.nctu.edu.tw</nlm:affiliation>
<country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2002">2002</date>
<idno type="doi">10.1109/TIP.2002.999677</idno>
<idno type="RBID">pubmed:18244645</idno>
<idno type="pmid">18244645</idno>
<idno type="wicri:Area/PubMed/Corpus">000054</idno>
<idno type="wicri:Area/PubMed/Curation">000054</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Binarization of color document images via luminance and saturation color features.</title>
<author>
<name sortKey="Tsai, Chun Ming" sort="Tsai, Chun Ming" uniqKey="Tsai C" first="Chun-Ming" last="Tsai">Chun-Ming Tsai</name>
<affiliation wicri:level="1">
<nlm:affiliation>Dept. of Comput. Sci. and Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan, R. O. C. chunming@csie.nctu.edu.tw</nlm:affiliation>
<country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
</analytic>
<series>
<title level="j">IEEE transactions on image processing : a publication of the IEEE Signal Processing Society</title>
<idno type="ISSN">1057-7149</idno>
<imprint>
<date when="2002" type="published">2002</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a novel binarization algorithm for color document images. Conventional thresholding methods do not produce satisfactory binarization results for documents with close or mixed foreground colors and background colors. Initially, statistical image features are extracted from the luminance distribution. Then, a decision-tree based binarization method is proposed, which selects various color features to binarize color document images. First, if the document image colors are concentrated within a limited range, saturation is employed. Second, if the image foreground colors are significant, luminance is adopted. Third, if the image background colors are concentrated within a limited range, luminance is also applied. Fourth, if the total number of pixels with low luminance (less than 60) is limited, saturation is applied; else both luminance and saturation are employed. Our experiments include 519 color images, most of which are uniform invoice and name-card document images. The proposed binarization method generates better results than other available methods in shape and connected-component measurements. Also, the binarization method obtains higher recognition accuracy in a commercial OCR system than other comparable methods.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
<PMID Version="1">18244645</PMID>
<DateCreated>
<Year>2008</Year>
<Month>02</Month>
<Day>04</Day>
</DateCreated>
<DateCompleted>
<Year>2009</Year>
<Month>12</Month>
<Day>16</Day>
</DateCompleted>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">1057-7149</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>11</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2002</Year>
</PubDate>
</JournalIssue>
<Title>IEEE transactions on image processing : a publication of the IEEE Signal Processing Society</Title>
<ISOAbbreviation>IEEE Trans Image Process</ISOAbbreviation>
</Journal>
<ArticleTitle>Binarization of color document images via luminance and saturation color features.</ArticleTitle>
<Pagination>
<MedlinePgn>434-51</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TIP.2002.999677</ELocationID>
<Abstract>
<AbstractText>This paper presents a novel binarization algorithm for color document images. Conventional thresholding methods do not produce satisfactory binarization results for documents with close or mixed foreground colors and background colors. Initially, statistical image features are extracted from the luminance distribution. Then, a decision-tree based binarization method is proposed, which selects various color features to binarize color document images. First, if the document image colors are concentrated within a limited range, saturation is employed. Second, if the image foreground colors are significant, luminance is adopted. Third, if the image background colors are concentrated within a limited range, luminance is also applied. Fourth, if the total number of pixels with low luminance (less than 60) is limited, saturation is applied; else both luminance and saturation are employed. Our experiments include 519 color images, most of which are uniform invoice and name-card document images. The proposed binarization method generates better results than other available methods in shape and connected-component measurements. Also, the binarization method obtains higher recognition accuracy in a commercial OCR system than other comparable methods.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Tsai</LastName>
<ForeName>Chun-Ming</ForeName>
<Initials>CM</Initials>
<AffiliationInfo>
<Affiliation>Dept. of Comput. Sci. and Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan, R. O. C. chunming@csie.nctu.edu.tw</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Lee</LastName>
<ForeName>Hsi-Jian</ForeName>
<Initials>HJ</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>IEEE Trans Image Process</MedlineTA>
<NlmUniqueID>9886191</NlmUniqueID>
<ISSNLinking>1057-7149</ISSNLinking>
</MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="pubmed">
<Year>2008</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2008</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>9</Hour>
<Minute>1</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2008</Year>
<Month>2</Month>
<Day>5</Day>
<Hour>9</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="doi">10.1109/TIP.2002.999677</ArticleId>
<ArticleId IdType="pubmed">18244645</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
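
As an illustration of how the `<pubmed>` part of the record above might be read programmatically, here is a short sketch using Python's standard `xml.etree.ElementTree`. The `FRAGMENT` string is a trimmed copy of elements from the record, and the `summarize` helper is a hypothetical name. Only the `<pubmed>` subtree is parsed, because the TEI part uses the `wicri:` and `nlm:` prefixes without declaring them in the record as shown, which a namespace-aware parser would likely reject.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <pubmed> element from the record (illustration only).
FRAGMENT = """
<pubmed>
<MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
<PMID Version="1">18244645</PMID>
<Article PubModel="Print">
<ArticleTitle>Binarization of color document images via luminance and saturation color features.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TIP.2002.999677</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Tsai</LastName><ForeName>Chun-Ming</ForeName></Author>
<Author ValidYN="Y"><LastName>Lee</LastName><ForeName>Hsi-Jian</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize(pubmed_xml: str) -> dict:
    """Extract a few bibliographic fields from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml.strip())
    return {
        "pmid": root.findtext(".//PMID"),
        "title": root.findtext(".//ArticleTitle"),
        "doi": root.findtext(".//ELocationID[@EIdType='doi']"),
        "authors": [
            f"{a.findtext('ForeName')} {a.findtext('LastName')}"
            for a in root.iter("Author")
        ],
    }


if __name__ == "__main__":
    for key, value in summarize(FRAGMENT).items():
        print(f"{key}: {value}")
```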

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000054 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 000054 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:18244645
   |texte=   Binarization of color document images via luminance and saturation color features.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:18244645" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024