Optimal feature extraction for bilingual OCR
Internal identifier: 000620 (PascalFrancis/Corpus); previous: 000619; next: 000621
Authors: D. Dhanya; A. G. Ramakrishnan
Source: Lecture notes in computer science [0302-9743]; 2002.
RBID: Pascal:03-0249356
French descriptors
Pascal (Inist): Maximisation fonction, Reconnaissance forme, Reconnaissance caractère, Reconnaissance optique caractère, Transformation linéaire, Transformation ondelette, Analyse composante principale, Extraction forme, Procédé extraction, Multilinguisme, Extraction caractéristique, Bilinguisme.
English descriptors
KwdEn: Bilingualism, Character recognition, Extraction process, Feature extraction, Function maximization, Linear transformation, Multilingualism, Optical character recognition, Pattern extraction, Pattern recognition, Principal component analysis, Wavelet transformation.
Abstract
Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages, whose alphabet sets are large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the best set of features cannot be determined through any quantitative measure, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both scripts into subsets by extracting certain spatial and structural features. Three kinds of features, namely geometric moments, DCT-based features and wavelet-transform-based features, are extracted from the grouped symbols, and a linear transformation is applied to them for efficient representation in the feature space. The transformation is obtained by maximizing certain criterion functions. Three techniques, principal component analysis, maximization of Fisher's ratio and maximization of a divergence measure, have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and that the transformations yield an appreciable rise in recognition accuracy.
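To make the idea of a criterion-maximizing linear transformation concrete, the following is a minimal sketch of one of the three techniques named in the abstract, maximization of Fisher's ratio. It is not the authors' implementation: it assumes only two classes and two features per symbol, for which the maximizing direction has the textbook closed form w = Sw^{-1}(mu_a - mu_b), where Sw is the pooled within-class scatter matrix. The sample points and all function names (`fisher_direction`, `fisher_ratio`, `CLASS_A`, `CLASS_B`) are hypothetical and chosen for illustration only.

```python
# Illustrative two-class, two-feature sketch; not the paper's implementation.
# All names and sample values below are assumptions made for this example.


def mean(points):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, mu):
    """2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Direction w = Sw^{-1} (mu_a - mu_b), which maximizes Fisher's ratio."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Pooled within-class scatter matrix Sw.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # Apply the explicit inverse of the 2x2 matrix Sw to the mean difference.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det,
    ]


def fisher_ratio(w, class_a, class_b):
    """Squared distance between projected class means over projected scatter."""
    proj_a = [w[0] * x + w[1] * y for x, y in class_a]
    proj_b = [w[0] * x + w[1] * y for x, y in class_b]
    m_a = sum(proj_a) / len(proj_a)
    m_b = sum(proj_b) / len(proj_b)
    within = sum((v - m_a) ** 2 for v in proj_a) + sum((v - m_b) ** 2 for v in proj_b)
    return (m_a - m_b) ** 2 / within


# Hypothetical feature vectors for two symbol classes.
CLASS_A = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
CLASS_B = [(2.0, 0.0), (3.0, 1.0), (4.0, 1.0), (5.0, 3.0)]

if __name__ == "__main__":
    w = fisher_direction(CLASS_A, CLASS_B)
    print("Fisher direction:", w)
    print("ratio along w:      ", fisher_ratio(w, CLASS_A, CLASS_B))
    print("ratio along x-axis: ", fisher_ratio([1.0, 0.0], CLASS_A, CLASS_B))
    print("ratio along y-axis: ", fisher_ratio([0.0, 1.0], CLASS_A, CLASS_B))
```

Projecting onto the computed direction should give a larger ratio than projecting onto either raw feature axis, which is the sense in which such a transformation can represent the classes more efficiently. A multi-class, higher-dimensional version would require a generalized eigenvalue problem instead of this closed form.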
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
pA |
A01 | 01 | 1 | | @0 0302-9743 |
A05 | | | | @2 2423 |
A08 | 01 | 1 | ENG | @1 Optimal feature extraction for bilingual OCR |
A09 | 01 | 1 | ENG | @1 DAS 2002 : document analysis systems V : Princeton NJ, 19-21 August 2002 |
A11 | 01 | 1 | | @1 DHANYA (D.) |
A11 | 02 | 1 | | @1 RAMAKRISHNAN (A. G.) |
A12 | 01 | 1 | | @1 LOPRESTI (Daniel) @9 ed. |
A12 | 02 | 1 | | @1 JIANYING HU @9 ed. |
A12 | 03 | 1 | | @1 KASHI (Ramanujan) @9 ed. |
A14 | 01 | | | @1 Department of Electrical Engineering, Indian Institute of Science @2 Bangalore @3 IND @Z 1 aut. @Z 2 aut. |
A20 | | | | @1 25-36 |
A21 | | | | @1 2002 |
A23 | 01 | | | @0 ENG |
A26 | 01 | | | @0 3-540-44068-2 |
A43 | 01 | | | @1 INIST @2 16343 @5 354000108470940030 |
A44 | | | | @0 0000 @1 © 2003 INIST-CNRS. All rights reserved. |
A45 | | | | @0 8 ref. |
A47 | 01 | 1 | | @0 03-0249356 |
A60 | | | | @1 P @2 C |
A61 | | | | @0 A |
A64 | 01 | 1 | | @0 Lecture notes in computer science |
A66 | 01 | | | @0 DEU |
C01 | 01 | | ENG | @0 Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the determination of the best set of features that could be used cannot be ascertained through any quantitative measures, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both the scripts into subsets by the extraction of certain spatial and structural features. Three features viz geometric moments, DCT based features and Wavelet transform based features are extracted from the grouped symbols and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques: Principal component analysis, maximization of Fisher's ratio and maximization of divergence measure have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations. |
C02 | 01 | X | | @0 001D02C03 |
C03 | 01 | X | FRE | @0 Maximisation fonction @5 01 |
C03 | 01 | X | ENG | @0 Function maximization @5 01 |
C03 | 01 | X | SPA | @0 Maximización función @5 01 |
C03 | 02 | X | FRE | @0 Reconnaissance forme @5 02 |
C03 | 02 | X | ENG | @0 Pattern recognition @5 02 |
C03 | 02 | X | SPA | @0 Reconocimiento patrón @5 02 |
C03 | 03 | X | FRE | @0 Reconnaissance caractère @5 03 |
C03 | 03 | X | ENG | @0 Character recognition @5 03 |
C03 | 03 | X | SPA | @0 Reconocimiento carácter @5 03 |
C03 | 04 | X | FRE | @0 Reconnaissance optique caractère @5 04 |
C03 | 04 | X | ENG | @0 Optical character recognition @5 04 |
C03 | 04 | X | SPA | @0 Reconocimento óptico de caracteres @5 04 |
C03 | 05 | X | FRE | @0 Transformation linéaire @5 05 |
C03 | 05 | X | ENG | @0 Linear transformation @5 05 |
C03 | 05 | X | SPA | @0 Transformación lineal @5 05 |
C03 | 06 | X | FRE | @0 Transformation ondelette @5 06 |
C03 | 06 | X | ENG | @0 Wavelet transformation @5 06 |
C03 | 06 | X | SPA | @0 Transformación ondita @5 06 |
C03 | 07 | X | FRE | @0 Analyse composante principale @5 07 |
C03 | 07 | X | ENG | @0 Principal component analysis @5 07 |
C03 | 07 | X | SPA | @0 Análisis componente principal @5 07 |
C03 | 08 | X | FRE | @0 Extraction forme @5 08 |
C03 | 08 | X | ENG | @0 Pattern extraction @5 08 |
C03 | 08 | X | SPA | @0 Extracción forma @5 08 |
C03 | 09 | X | FRE | @0 Procédé extraction @5 09 |
C03 | 09 | X | ENG | @0 Extraction process @5 09 |
C03 | 09 | X | SPA | @0 Procedimiento extracción @5 09 |
C03 | 10 | X | FRE | @0 Multilinguisme @5 10 |
C03 | 10 | X | ENG | @0 Multilingualism @5 10 |
C03 | 10 | X | SPA | @0 Multilinguismo @5 10 |
C03 | 11 | 1 | FRE | @0 Extraction caractéristique @5 11 |
C03 | 11 | 1 | ENG | @0 Feature extraction @5 11 |
C03 | 12 | X | FRE | @0 Bilinguisme @5 12 |
C03 | 12 | X | ENG | @0 Bilingualism @5 12 |
C03 | 12 | X | SPA | @0 Bilingüismo @5 12 |
N21 | | | | @1 160 |
N82 | | | | @1 PSI |
pR |
A30 | 01 | 1 | ENG | @1 IAPR workshop on document analysis systems @2 5 @3 Princeton NJ USA @4 2002-08-19 |
Inist format (server)
NO : | PASCAL 03-0249356 INIST |
ET : | Optimal feature extraction for bilingual OCR |
AU : | DHANYA (D.); RAMAKRISHNAN (A. G.); LOPRESTI (Daniel); JIANYING HU; KASHI (Ramanujan) |
AF : | Department of Electrical Engineering, Indian Institute of Science/Bangalore/India (1 aut., 2 aut.) |
DT : | Serial publication; Conference; Analytic level |
SO : | Lecture notes in computer science; ISSN 0302-9743; Germany; Da. 2002; Vol. 2423; Pp. 25-36; Bibl. 8 ref. |
LA : | English |
EA : | Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the determination of the best set of features that could be used cannot be ascertained through any quantitative measures, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both the scripts into subsets by the extraction of certain spatial and structural features. Three features viz geometric moments, DCT based features and Wavelet transform based features are extracted from the grouped symbols and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques: Principal component analysis, maximization of Fisher's ratio and maximization of divergence measure have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations. |
CC : | 001D02C03 |
FD : | Maximisation fonction; Reconnaissance forme; Reconnaissance caractère; Reconnaissance optique caractère; Transformation linéaire; Transformation ondelette; Analyse composante principale; Extraction forme; Procédé extraction; Multilinguisme; Extraction caractéristique; Bilinguisme |
ED : | Function maximization; Pattern recognition; Character recognition; Optical character recognition; Linear transformation; Wavelet transformation; Principal component analysis; Pattern extraction; Extraction process; Multilingualism; Feature extraction; Bilingualism |
SD : | Maximización función; Reconocimiento patrón; Reconocimiento carácter; Reconocimento óptico de caracteres; Transformación lineal; Transformación ondita; Análisis componente principal; Extracción forma; Procedimiento extracción; Multilinguismo; Bilingüismo |
LO : | INIST-16343.354000108470940030 |
ID : | 03-0249356 |
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Optimal feature extraction for bilingual OCR</title>
<author><name sortKey="Dhanya, D" sort="Dhanya, D" uniqKey="Dhanya D" first="D." last="Dhanya">D. Dhanya</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Indian Institute of Science</s1>
<s2>Bangalore</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ramakrishnan, A G" sort="Ramakrishnan, A G" uniqKey="Ramakrishnan A" first="A. G." last="Ramakrishnan">A. G. Ramakrishnan</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Indian Institute of Science</s1>
<s2>Bangalore</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">03-0249356</idno>
<date when="2002">2002</date>
<idno type="stanalyst">PASCAL 03-0249356 INIST</idno>
<idno type="RBID">Pascal:03-0249356</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000620</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Optimal feature extraction for bilingual OCR</title>
<author><name sortKey="Dhanya, D" sort="Dhanya, D" uniqKey="Dhanya D" first="D." last="Dhanya">D. Dhanya</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Indian Institute of Science</s1>
<s2>Bangalore</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ramakrishnan, A G" sort="Ramakrishnan, A G" uniqKey="Ramakrishnan A" first="A. G." last="Ramakrishnan">A. G. Ramakrishnan</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Indian Institute of Science</s1>
<s2>Bangalore</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="2002">2002</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Bilingualism</term>
<term>Character recognition</term>
<term>Extraction process</term>
<term>Feature extraction</term>
<term>Function maximization</term>
<term>Linear transformation</term>
<term>Multilingualism</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Principal component analysis</term>
<term>Wavelet transformation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Maximisation fonction</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Transformation linéaire</term>
<term>Transformation ondelette</term>
<term>Analyse composante principale</term>
<term>Extraction forme</term>
<term>Procédé extraction</term>
<term>Multilinguisme</term>
<term>Extraction caractéristique</term>
<term>Bilinguisme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the determination of the best set of features that could be used cannot be ascertained through any quantitative measures, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both the scripts into subsets by the extraction of certain spatial and structural features. Three features viz geometric moments, DCT based features and Wavelet transform based features are extracted from the grouped symbols and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques: Principal component analysis, maximization of Fisher's ratio and maximization of divergence measure have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0302-9743</s0>
</fA01>
<fA05><s2>2423</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>Optimal feature extraction for bilingual OCR</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>DAS 2002 : document analysis systems V : Princeton NJ, 19-21 August 2002</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>DHANYA (D.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>RAMAKRISHNAN (A. G.)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>LOPRESTI (Daniel)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>JIANYING HU</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>KASHI (Ramanujan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Department of Electrical Engineering, Indian Institute of Science</s1>
<s2>Bangalore</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA14>
<fA20><s1>25-36</s1>
</fA20>
<fA21><s1>2002</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA26 i1="01"><s0>3-540-44068-2</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>16343</s2>
<s5>354000108470940030</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2003 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>8 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>03-0249356</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA64 i1="01" i2="1"><s0>Lecture notes in computer science</s0>
</fA64>
<fA66 i1="01"><s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the determination of the best set of features that could be used cannot be ascertained through any quantitative measures, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both the scripts into subsets by the extraction of certain spatial and structural features. Three features viz geometric moments, DCT based features and Wavelet transform based features are extracted from the grouped symbols and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques: Principal component analysis, maximization of Fisher's ratio and maximization of divergence measure have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Maximisation fonction</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Function maximization</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Maximización función</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Reconnaissance forme</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Pattern recognition</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Reconocimiento patrón</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Character recognition</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Reconocimiento carácter</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Reconocimento óptico de caracteres</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Transformation linéaire</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Linear transformation</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Transformación lineal</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Transformation ondelette</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Wavelet transformation</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Transformación ondita</s0>
<s5>06</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Analyse composante principale</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Principal component analysis</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Análisis componente principal</s0>
<s5>07</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Extraction forme</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Pattern extraction</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Extracción forma</s0>
<s5>08</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Procédé extraction</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Extraction process</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Procedimiento extracción</s0>
<s5>09</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Multilinguisme</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Multilingualism</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Multilinguismo</s0>
<s5>10</s5>
</fC03>
<fC03 i1="11" i2="1" l="FRE"><s0>Extraction caractéristique</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="1" l="ENG"><s0>Feature extraction</s0>
<s5>11</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Bilinguisme</s0>
<s5>12</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Bilingualism</s0>
<s5>12</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Bilingüismo</s0>
<s5>12</s5>
</fC03>
<fN21><s1>160</s1>
</fN21>
<fN82><s1>PSI</s1>
</fN82>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>IAPR workshop on document analysis systems</s1>
<s2>5</s2>
<s3>Princeton NJ USA</s3>
<s4>2002-08-19</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 03-0249356 INIST</NO>
<ET>Optimal feature extraction for bilingual OCR</ET>
<AU>DHANYA (D.); RAMAKRISHNAN (A. G.); LOPRESTI (Daniel); JIANYING HU; KASHI (Ramanujan)</AU>
<AF>Department of Electrical Engineering, Indian Institute of Science/Bangalore/Inde (1 aut., 2 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 2002; Vol. 2423; Pp. 25-36; Bibl. 8 ref.</SO>
<LA>Anglais</LA>
<EA>Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages whose alphabet set is large. It is expected that the complexity of the feature extraction process increases with the number of classes. Though the determination of the best set of features that could be used cannot be ascertained through any quantitative measures, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of both the scripts into subsets by the extraction of certain spatial and structural features. Three features viz geometric moments, DCT based features and Wavelet transform based features are extracted from the grouped symbols and a linear transformation is performed on them for the purpose of efficient representation in the feature space. The transformation is obtained by the maximization of certain criterion functions. Three techniques: Principal component analysis, maximization of Fisher's ratio and maximization of divergence measure have been employed to estimate the transformation matrix. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and there is an appreciable rise in the recognition accuracy as a result of the transformations.</EA>
<CC>001D02C03</CC>
<FD>Maximisation fonction; Reconnaissance forme; Reconnaissance caractère; Reconnaissance optique caractère; Transformation linéaire; Transformation ondelette; Analyse composante principale; Extraction forme; Procédé extraction; Multilinguisme; Extraction caractéristique; Bilinguisme</FD>
<ED>Function maximization; Pattern recognition; Character recognition; Optical character recognition; Linear transformation; Wavelet transformation; Principal component analysis; Pattern extraction; Extraction process; Multilingualism; Feature extraction; Bilingualism</ED>
<SD>Maximización función; Reconocimiento patrón; Reconocimiento carácter; Reconocimento óptico de caracteres; Transformación lineal; Transformación ondita; Análisis componente principal; Extracción forma; Procedimiento extracción; Multilinguismo; Bilingüismo</SD>
<LO>INIST-16343.354000108470940030</LO>
<ID>03-0249356</ID>
</server>
</inist>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000620 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000620 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien
|wiki= Ticri/CIDE
|area= OcrV1
|flux= PascalFrancis
|étape= Corpus
|type= RBID
|clé= Pascal:03-0249356
|texte= Optimal feature extraction for bilingual OCR
}}
This area was generated with Dilib version V0.6.32. Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024