Italic detection and rectification

Internal identifier: 000435 (PascalFrancis/Curation); previous: 000434; next: 000436

Authors: Kuo-Chin Fan [Taiwan]; Chien-Hsiang Huang [Taiwan]

Source:

Journal of information science and engineering [1016-2364]; 2007.

RBID : Pascal:07-0170808

French descriptors

Pascal (Inist)
- Reconnaissance caractère, Reconnaissance forme, Traitement image, Système expert, Classification, Reconnaissance optique caractère, Rectification, Cisaillement, Phrase, Analyse statistique, Approche probabiliste, Base connaissance.
Wicri:
- topic : Classification.

English descriptors

KwdEn:
- Character recognition, Classification, Expert system, Image processing, Knowledge base, Optical character recognition, Pattern recognition, Probabilistic approach, Rectification, Sentence, Shear, Statistical analysis.

Abstract

This paper proposes a novel italic detection and rectification method that does not require prior character recognition. An italic-style character can be obtained by applying a shear transformation to its corresponding non-italic character. Traditional italic detection methods must operate on at least a word, a sentence, or even a whole paragraph. The merit of the proposed method is that it operates directly on a single character, so more accurate statistical information can be obtained. The rationale is that the difference in certain features derived from italic-style characters will be canceled after shear transformation, whereas the difference will be more pronounced for non-italic (normal-style) characters. In the proposed approach, the virtual strokes embedded in the character image under consideration are extracted first. Then a reverse transformation is applied to the character image. The 26 uppercase and 26 lowercase letters are classified into three classes based on the structural information of the extracted virtual strokes. Italic and non-italic characters can then be distinguished using the classification rule devised for each class of characters. Finally, the exact shear angle of each identified italic character is calculated so that a more accurate reverse shear transformation can rectify it into normal (non-italic) style, which facilitates the later OCR task. Experiments were conducted on 50 document images containing mixed italic and normal-style characters. Accuracy rates of 99.59% for italic-style characters and 99.85% for normal-style characters were achieved. The experimental results verify the validity of the proposed method in distinguishing italic from non-italic characters.
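
The abstract's central idea, that an italic character is a sheared version of its upright counterpart and can be rectified by the reverse shear, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the point-list representation of a stroke, and the 15-degree angle are assumptions chosen for illustration, and the virtual-stroke extraction and class-specific rules described in the abstract are not reproduced here.

```python
import math

def shear_points(points, angle_deg):
    """Apply a horizontal shear x' = x + y * tan(angle) to (x, y) points.

    y is assumed to be measured upward from the baseline, so points higher
    in the character are pushed further to the right for a positive angle.
    """
    k = math.tan(math.radians(angle_deg))
    return [(x + k * y, y) for x, y in points]

def rectify_points(points, angle_deg):
    """Undo a horizontal shear of the given angle (the reverse shear)."""
    return shear_points(points, -angle_deg)

# Hypothetical vertical stroke of an upright character: x is constant.
upright_stroke = [(2.0, float(y)) for y in range(5)]

# Shearing slants the stroke, as in an italic character.
italic_stroke = shear_points(upright_stroke, 15.0)

# The reverse shear with the same angle restores the upright stroke.
restored = rectify_points(italic_stroke, 15.0)
```

In this sketch the shear angle is known in advance; in the method summarized above it has to be calculated from the character image before rectification.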

A01	`01`	`1`		`@0 1016-2364`
A03		`1`		`@0 J. inf. sci. eng.`
A05				`@2 23`
A06				`@2 2`
A08	`01`	`1`	`ENG`	`@1 Italic detection and rectification`
A11	`01`	`1`		`@1 FAN (Kuo-Chin)`
A11	`02`	`1`		`@1 HUANG (Chien-Hsiang)`
A14	`01`			`@1 Institute of Computer Science and Information Engineering National Central University @2 Chungli, 320 @3 TWN @Z 1 aut. @Z 2 aut.`
A20				`@1 403-419`
A21				`@1 2007`
A23	`01`			`@0 ENG`
A43	`01`			`@1 INIST @2 26861 @5 354000147194220040`
A44				`@0 0000 @1 © 2007 INIST-CNRS. All rights reserved.`
A45				`@0 12 ref.`
A47	`01`	`1`		`@0 07-0170808`
A60				`@1 P`
A61				`@0 A`
A64	`01`	`1`		`@0 Journal of information science and engineering`
A66	`01`			`@0 TWN`
C01	`01`		`ENG`	@0 In this paper, a novel italic detection and rectification method without the prerequisite of character recognition is proposed. An italic style character can be obtained by performing shear transformation on its corresponding non-italic style character. Traditional italic detection methods have to be operated at least on the word, sentence or even the whole paragraph. The merit of the proposed method is that it can be operated directly on a single character so that more accurate statistical information can be obtained. The rationale of our proposed method is that the difference of certain features derived from italic style characters after shear transformation will be canceled, whereas the difference will be more obvious for non-italic style (normal style) characters. In our proposed approach, the virtual strokes embedded in the considered character image are extracted first. Then, reverse transformation is operated on the considered character image. The 26 upper and 26 lower alphabets are classified into three classes based on the structural information of the extracted virtual strokes. The italic and non-italic style characters can then be distinguished based on the classification rule devised for each class of characters. Last, the exact shear angle of the identified italic character is calculated to perform more accurate reverse shear transformation to rectify the italic style character into normal (non-italic) style character to facilitate the later OCR task. Experiments were conducted on 50 document images with mixed italic and normal style characters. Satisfactory accuracy rate 99.59% for italic style characters and 99.85% for normal style characters are achieved. Experimental results verify the validity of our proposed method in distinguishing italic and non-italic style characters.
C02	`01`	`X`		`@0 001D02C03`
C03	`01`	`X`	`FRE`	`@0 Reconnaissance caractère @5 06`
C03	`01`	`X`	`ENG`	`@0 Character recognition @5 06`
C03	`01`	`X`	`SPA`	`@0 Reconocimiento carácter @5 06`
C03	`02`	`X`	`FRE`	`@0 Reconnaissance forme @5 07`
C03	`02`	`X`	`ENG`	`@0 Pattern recognition @5 07`
C03	`02`	`X`	`SPA`	`@0 Reconocimiento patrón @5 07`
C03	`03`	`X`	`FRE`	`@0 Traitement image @5 08`
C03	`03`	`X`	`ENG`	`@0 Image processing @5 08`
C03	`03`	`X`	`SPA`	`@0 Procesamiento imagen @5 08`
C03	`04`	`X`	`FRE`	`@0 Système expert @5 09`
C03	`04`	`X`	`ENG`	`@0 Expert system @5 09`
C03	`04`	`X`	`SPA`	`@0 Sistema experto @5 09`
C03	`05`	`X`	`FRE`	`@0 Classification @5 10`
C03	`05`	`X`	`ENG`	`@0 Classification @5 10`
C03	`05`	`X`	`SPA`	`@0 Clasificación @5 10`
C03	`06`	`X`	`FRE`	`@0 Reconnaissance optique caractère @5 11`
C03	`06`	`X`	`ENG`	`@0 Optical character recognition @5 11`
C03	`06`	`X`	`SPA`	`@0 Reconocimento óptico de caracteres @5 11`
C03	`07`	`X`	`FRE`	`@0 Rectification @5 18`
C03	`07`	`X`	`ENG`	`@0 Rectification @5 18`
C03	`07`	`X`	`SPA`	`@0 Rectificación @5 18`
C03	`08`	`X`	`FRE`	`@0 Cisaillement @5 19`
C03	`08`	`X`	`ENG`	`@0 Shear @5 19`
C03	`08`	`X`	`SPA`	`@0 Cizalladura @5 19`
C03	`09`	`X`	`FRE`	`@0 Phrase @5 20`
C03	`09`	`X`	`ENG`	`@0 Sentence @5 20`
C03	`09`	`X`	`SPA`	`@0 Frase @5 20`
C03	`10`	`X`	`FRE`	`@0 Analyse statistique @5 23`
C03	`10`	`X`	`ENG`	`@0 Statistical analysis @5 23`
C03	`10`	`X`	`SPA`	`@0 Análisis estadístico @5 23`
C03	`11`	`X`	`FRE`	`@0 Approche probabiliste @5 24`
C03	`11`	`X`	`ENG`	`@0 Probabilistic approach @5 24`
C03	`11`	`X`	`SPA`	`@0 Enfoque probabilista @5 24`
C03	`12`	`X`	`FRE`	`@0 Base connaissance @5 25`
C03	`12`	`X`	`ENG`	`@0 Knowledge base @5 25`
C03	`12`	`X`	`SPA`	`@0 Base conocimiento @5 25`
N21				`@1 121`
N44	`01`			`@1 OTO`
N82				`@1 OTO`

Links to previous steps (curation, corpus...)

To stream PascalFrancis, to step Corpus: 000351

Links to Exploration step

Pascal:07-0170808

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Italic detection and rectification</title>
<author><name sortKey="Fan, Kuo Chin" sort="Fan, Kuo Chin" uniqKey="Fan K" first="Kuo-Chin" last="Fan">Kuo-Chin Fan</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute of Computer Science and Information Engineering National Central University</s1>
<s2>Chungli, 320</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
</affiliation>
</author>
<author><name sortKey="Huang, Chien Hsiang" sort="Huang, Chien Hsiang" uniqKey="Huang C" first="Chien-Hsiang" last="Huang">Chien-Hsiang Huang</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute of Computer Science and Information Engineering National Central University</s1>
<s2>Chungli, 320</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0170808</idno>
<date when="2007">2007</date>
<idno type="stanalyst">PASCAL 07-0170808 INIST</idno>
<idno type="RBID">Pascal:07-0170808</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000351</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000435</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Italic detection and rectification</title>
<author><name sortKey="Fan, Kuo Chin" sort="Fan, Kuo Chin" uniqKey="Fan K" first="Kuo-Chin" last="Fan">Kuo-Chin Fan</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute of Computer Science and Information Engineering National Central University</s1>
<s2>Chungli, 320</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
</affiliation>
</author>
<author><name sortKey="Huang, Chien Hsiang" sort="Huang, Chien Hsiang" uniqKey="Huang C" first="Chien-Hsiang" last="Huang">Chien-Hsiang Huang</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute of Computer Science and Information Engineering National Central University</s1>
<s2>Chungli, 320</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Journal of information science and engineering</title>
<title level="j" type="abbreviated">J. inf. sci. eng.</title>
<idno type="ISSN">1016-2364</idno>
<imprint><date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Journal of information science and engineering</title>
<title level="j" type="abbreviated">J. inf. sci. eng.</title>
<idno type="ISSN">1016-2364</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Classification</term>
<term>Expert system</term>
<term>Image processing</term>
<term>Knowledge base</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Rectification</term>
<term>Sentence</term>
<term>Shear</term>
<term>Statistical analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Système expert</term>
<term>Classification</term>
<term>Reconnaissance optique caractère</term>
<term>Rectification</term>
<term>Cisaillement</term>
<term>Phrase</term>
<term>Analyse statistique</term>
<term>Approche probabiliste</term>
<term>Base connaissance</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Classification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, a novel italic detection and rectification method without the prerequisite of character recognition is proposed. An italic style character can be obtained by performing shear transformation on its corresponding non-italic style character. Traditional italic detection methods have to be operated at least on the word, sentence or even the whole paragraph. The merit of the proposed method is that it can be operated directly on a single character so that more accurate statistical information can be obtained. The rationale of our proposed method is that the difference of certain features derived from italic style characters after shear transformation will be canceled, whereas the difference will be more obvious for non-italic style (normal style) characters. In our proposed approach, the virtual strokes embedded in the considered character image are extracted first. Then, reverse transformation is operated on the considered character image. The 26 upper and 26 lower alphabets are classified into three classes based on the structural information of the extracted virtual strokes. The italic and non-italic style characters can then be distinguished based on the classification rule devised for each class of characters. Last, the exact shear angle of the identified italic character is calculated to perform more accurate reverse shear transformation to rectify the italic style character into normal (non-italic) style character to facilitate the later OCR task. Experiments were conducted on 50 document images with mixed italic and normal style characters. Satisfactory accuracy rate 99.59% for italic style characters and 99.85% for normal style characters are achieved. Experimental results verify the validity of our proposed method in distinguishing italic and non-italic style characters.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>1016-2364</s0>
</fA01>
<fA03 i2="1"><s0>J. inf. sci. eng.</s0>
</fA03>
<fA05><s2>23</s2>
</fA05>
<fA06><s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG"><s1>Italic detection and rectification</s1>
</fA08>
<fA11 i1="01" i2="1"><s1>FAN (Kuo-Chin)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>HUANG (Chien-Hsiang)</s1>
</fA11>
<fA14 i1="01"><s1>Institute of Computer Science and Information Engineering National Central University</s1>
<s2>Chungli, 320</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA14>
<fA20><s1>403-419</s1>
</fA20>
<fA21><s1>2007</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>26861</s2>
<s5>354000147194220040</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2007 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>12 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>07-0170808</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Journal of information science and engineering</s0>
</fA64>
<fA66 i1="01"><s0>TWN</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>In this paper, a novel italic detection and rectification method without the prerequisite of character recognition is proposed. An italic style character can be obtained by performing shear transformation on its corresponding non-italic style character. Traditional italic detection methods have to be operated at least on the word, sentence or even the whole paragraph. The merit of the proposed method is that it can be operated directly on a single character so that more accurate statistical information can be obtained. The rationale of our proposed method is that the difference of certain features derived from italic style characters after shear transformation will be canceled, whereas the difference will be more obvious for non-italic style (normal style) characters. In our proposed approach, the virtual strokes embedded in the considered character image are extracted first. Then, reverse transformation is operated on the considered character image. The 26 upper and 26 lower alphabets are classified into three classes based on the structural information of the extracted virtual strokes. The italic and non-italic style characters can then be distinguished based on the classification rule devised for each class of characters. Last, the exact shear angle of the identified italic character is calculated to perform more accurate reverse shear transformation to rectify the italic style character into normal (non-italic) style character to facilitate the later OCR task. Experiments were conducted on 50 document images with mixed italic and normal style characters. Satisfactory accuracy rate 99.59% for italic style characters and 99.85% for normal style characters are achieved. Experimental results verify the validity of our proposed method in distinguishing italic and non-italic style characters.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Character recognition</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Reconocimiento carácter</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Reconnaissance forme</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Pattern recognition</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Reconocimiento patrón</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Traitement image</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Image processing</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Procesamiento imagen</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Système expert</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Expert system</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Sistema experto</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Classification</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Classification</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Clasificación</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Reconocimento óptico de caracteres</s0>
<s5>11</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Rectification</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Rectification</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Rectificación</s0>
<s5>18</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Cisaillement</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Shear</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Cizalladura</s0>
<s5>19</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Phrase</s0>
<s5>20</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Sentence</s0>
<s5>20</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Frase</s0>
<s5>20</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Analyse statistique</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Statistical analysis</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Análisis estadístico</s0>
<s5>23</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Approche probabiliste</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>Probabilistic approach</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Enfoque probabilista</s0>
<s5>24</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Base connaissance</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Knowledge base</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Base conocimiento</s0>
<s5>25</s5>
</fC03>
<fN21><s1>121</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
</inist>
</record>
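
As a hedged illustration of how the INIST fields in this record might be read programmatically, the following Python sketch parses a short excerpt of the `<pA>` block with the standard-library `xml.etree.ElementTree` module. The excerpt is copied from the record above and trimmed to a few fields; the `summarize` function and the keys of its result are hypothetical names. Note that the TEI part of the full record uses `wicri:` and `inist:` prefixes with no namespace declaration shown here, so a strict XML parser may reject the record as a whole; the sketch therefore works on the prefix-free INIST fields only.

```python
import xml.etree.ElementTree as ET

# Excerpt of the <pA> block of the record (a few fields only).
FRAGMENT = """<pA>
<fA08 i1="01" i2="1" l="ENG"><s1>Italic detection and rectification</s1></fA08>
<fA11 i1="01" i2="1"><s1>FAN (Kuo-Chin)</s1></fA11>
<fA11 i1="02" i2="1"><s1>HUANG (Chien-Hsiang)</s1></fA11>
<fC03 i1="08" i2="X" l="FRE"><s0>Cisaillement</s0><s5>19</s5></fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Shear</s0><s5>19</s5></fC03>
</pA>"""

def summarize(fragment):
    """Collect the title, the authors and the English descriptors."""
    root = ET.fromstring(fragment)
    return {
        # fA08/s1 holds the article title in this record.
        "title": root.findtext("fA08/s1"),
        # Each fA11 element holds one author name in its s1 child.
        "authors": [e.text for e in root.findall("fA11/s1")],
        # fC03 descriptors are repeated per language; keep l="ENG" only.
        "keywords_en": [e.findtext("s0") for e in root.findall("fC03[@l='ENG']")],
    }

summary = summarize(FRAGMENT)
```

The same path expressions should apply to the remaining `fC03` entries, since each descriptor in the record follows the same `s0`/`s5` layout.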

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Curation

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000435 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Curation/biblio.hfd -nk 000435 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Curation
   |type=    RBID
   |clé=     Pascal:07-0170808
   |texte=   Italic detection and rectification
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated by automated means from raw corpora. The information has therefore not been validated.
