OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Date of Birth Extraction Using Precise Shallow Parsing

Internal identifier: 000162 (PascalFrancis/Corpus); previous: 000161; next: 000163

Authors: Ray Pereda; Kazem Taghva

Source :

RBID : Pascal:10-0429703

French descriptors

English descriptors

Abstract

This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
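To make the idea of pattern-based extraction concrete, here is a minimal sketch in Python. It is not the authors' program: this record does not describe their rules, so the cue phrases, the two date layouts, and the names `DOB_PATTERN` and `extract_dob` are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern, for illustration only; the record does not
# give the rules used in the paper.
MONTHS = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)
DOB_PATTERN = re.compile(
    # A cue phrase announcing a birth date, then an optional separator.
    r"\b(?:date\s+of\s+birth|DOB|born(?:\s+on)?)\s*[:\-]?\s*"
    # The date itself: "March 3, 1957" or "04/12/1962".
    r"(?P<date>" + MONTHS + r"\s+\d{1,2},\s*\d{4}|\d{1,2}/\d{1,2}/\d{2,4})",
    re.IGNORECASE,
)


def extract_dob(text):
    """Return every date string that directly follows a birth-date cue."""
    return [m.group("date") for m in DOB_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(extract_dob("Date of Birth: March 3, 1957"))
    # A single OCR substitution ("Birtb" for "Birth") defeats the cue.
    print(extract_dob("Date of Birtb: March 3, 1957"))
```

A strict pattern of this kind rarely fires on unrelated text, but one misrecognized character in the cue or in the date is enough to lose the match, which is consistent with the sensitivity to OCR errors that the abstract reports.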

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0277-786X
A02 01      @0 PSISDG
A03   1    @0 Proc. SPIE Int. Soc. Opt. Eng.
A05       @2 7534
A08 01  1  ENG  @1 Date of Birth Extraction Using Precise Shallow Parsing
A09 01  1  ENG  @1 Document recognition and retrieval XVII : 19-21 January 2010, San Jose, California, United States
A11 01  1    @1 PEREDA (Ray)
A11 02  1    @1 TAGHVA (Kazem)
A12 01  1    @1 LIKFORMAN-SULEM (Laurence) @9 ed.
A12 02  1    @1 AGAM (Gady) @9 ed.
A14 01      @1 Information Science Research Institute University of Nevada, Las Vegas @2 Las Vegas, NV 89154-4021 @3 USA @Z 1 aut. @Z 2 aut.
A18 01  1    @1 SPIE @3 USA @9 org-cong.
A18 02  1    @1 IS&T @3 USA @9 org-cong.
A18 03  1    @1 Institut TELECOM @3 FRA @9 org-cong.
A20       @2 753406.1-753406.7
A21       @1 2010
A23 01      @0 ENG
A25 01      @1 SPIE @2 Bellingham WA
A26 01      @0 978-0-8194-7927-3
A43 01      @1 INIST @2 21760 @5 354000174683810050
A44       @0 0000 @1 © 2010 INIST-CNRS. All rights reserved.
A45       @0 10 ref.
A47 01  1    @0 10-0429703
A60       @1 P @2 C
A61       @0 A
A64 01  1    @0 Proceedings of SPIE, the International Society for Optical Engineering
A66 01      @0 USA
C01 01    ENG  @0 This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
C02 01  3    @0 001B00A30C
C02 02  3    @0 001B40B30S
C02 03  X    @0 001D04A05A
C02 04  X    @0 001D04A03
C03 01  3  FRE  @0 Reconnaissance forme @5 61
C03 01  3  ENG  @0 Pattern recognition @5 61
C03 02  X  FRE  @0 Recherche documentaire @5 62
C03 02  X  ENG  @0 Document retrieval @5 62
C03 02  X  SPA  @0 Búsqueda documental @5 62
C03 03  X  FRE  @0 Analyse syntaxique @5 63
C03 03  X  ENG  @0 Syntactic analysis @5 63
C03 03  X  SPA  @0 Análisis sintáxico @5 63
C03 04  3  FRE  @0 Implémentation @5 64
C03 04  3  ENG  @0 Implementation @5 64
C03 05  3  FRE  @0 Reconnaissance optique caractère @5 65
C03 05  3  ENG  @0 Optical character recognition @5 65
C03 06  X  FRE  @0 Précision élevée @5 66
C03 06  X  ENG  @0 High precision @5 66
C03 06  X  SPA  @0 Precisión elevada @5 66
C03 07  X  FRE  @0 Extraction information @5 67
C03 07  X  ENG  @0 Information extraction @5 67
C03 07  X  SPA  @0 Extracción información @5 67
C03 08  3  FRE  @0 0130C @4 INC @5 83
C03 09  3  FRE  @0 4230S @4 INC @5 91
C07 01  X  FRE  @0 Traitement information @5 68
C07 01  X  ENG  @0 Information processing @5 68
C07 01  X  SPA  @0 Procesamiento información @5 68
N21       @1 277
N44 01      @1 OTO
N82       @1 OTO
pR  
A30 01  1  ENG  @1 Document recognition and retrieval @2 17 @3 San Jose CA USA @4 2010
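The field lines above follow a regular layout: a field tag, optional indicators and language code, then one or more subfields, each introduced by `@` and a one-character code. The sketch below splits one such line into its parts. It assumes only the layout visible in this record, not the full Inist Standard specification, and `parse_field` is a hypothetical helper name.

```python
import re

# A subfield starts with "@", a one-character code, then whitespace.
# Assumption: "@" does not occur inside subfield values.
SUBFIELD = re.compile(r"@(\w)\s+")


def parse_field(line):
    """Split one field line into (tag, indicators, [(code, value), ...]).

    Expects a non-empty line that starts with a field tag.
    """
    # With one capture group, re.split returns
    # [head, code, value, code, value, ...].
    head, *rest = SUBFIELD.split(line)
    parts = head.split()
    tag, indicators = parts[0], parts[1:]
    codes = rest[0::2]
    values = [value.strip() for value in rest[1::2]]
    # Pairs are kept in a list because a code may repeat (see @Z in A14).
    return tag, indicators, list(zip(codes, values))


if __name__ == "__main__":
    print(parse_field("A12 02  1    @1 AGAM (Gady) @9 ed."))
```

Returning a list of pairs rather than a dictionary preserves repeated subfield codes, such as the two `@Z` entries on the A14 affiliation line.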

Inist format (server)

NO : PASCAL 10-0429703 INIST
ET : Date of Birth Extraction Using Precise Shallow Parsing
AU : PEREDA (Ray); TAGHVA (Kazem); LIKFORMAN-SULEM (Laurence); AGAM (Gady)
AF : Information Science Research Institute University of Nevada, Las Vegas/Las Vegas, NV 89154-4021/Etats-Unis (1 aut., 2 aut.)
DT : Publication en série; Congrès; Niveau analytique
SO : Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; Etats-Unis; Da. 2010; Vol. 7534; 753406.1-753406.7; Bibl. 10 ref.
LA : Anglais
EA : This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.
CC : 001B00A30C; 001B40B30S; 001D04A05A; 001D04A03
FD : Reconnaissance forme; Recherche documentaire; Analyse syntaxique; Implémentation; Reconnaissance optique caractère; Précision élevée; Extraction information; 0130C; 4230S
FG : Traitement information
ED : Pattern recognition; Document retrieval; Syntactic analysis; Implementation; Optical character recognition; High precision; Information extraction
EG : Information processing
SD : Búsqueda documental; Análisis sintáxico; Precisión elevada; Extracción información
LO : INIST-21760.354000174683810050
ID : 10-0429703

Links to Exploration step

Pascal:10-0429703

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Date of Birth Extraction Using Precise Shallow Parsing</title>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0429703</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0429703 INIST</idno>
<idno type="RBID">Pascal:10-0429703</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000162</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Date of Birth Extraction Using Precise Shallow Parsing</title>
<author>
<name sortKey="Pereda, Ray" sort="Pereda, Ray" uniqKey="Pereda R" first="Ray" last="Pereda">Ray Pereda</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Document retrieval</term>
<term>High precision</term>
<term>Implementation</term>
<term>Information extraction</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Syntactic analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance forme</term>
<term>Recherche documentaire</term>
<term>Analyse syntaxique</term>
<term>Implémentation</term>
<term>Reconnaissance optique caractère</term>
<term>Précision élevée</term>
<term>Extraction information</term>
<term>0130C</term>
<term>4230S</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0277-786X</s0>
</fA01>
<fA02 i1="01">
<s0>PSISDG</s0>
</fA02>
<fA03 i2="1">
<s0>Proc. SPIE Int. Soc. Opt. Eng.</s0>
</fA03>
<fA05>
<s2>7534</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG">
<s1>Date of Birth Extraction Using Precise Shallow Parsing</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval XVII : 19-21 January 2010, San Jose, California, United States</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>PEREDA (Ray)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>TAGHVA (Kazem)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>LIKFORMAN-SULEM (Laurence)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>AGAM (Gady)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Information Science Research Institute University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1">
<s1>SPIE</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1">
<s1>IS&amp;T</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="03" i2="1">
<s1>Institut TELECOM</s1>
<s3>FRA</s3>
<s9>org-cong.</s9>
</fA18>
<fA20>
<s2>753406.1-753406.7</s2>
</fA20>
<fA21>
<s1>2010</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA25 i1="01">
<s1>SPIE</s1>
<s2>Bellingham WA</s2>
</fA25>
<fA26 i1="01">
<s0>978-0-8194-7927-3</s0>
</fA26>
<fA43 i1="01">
<s1>INIST</s1>
<s2>21760</s2>
<s5>354000174683810050</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2010 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>10 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>10-0429703</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.</s0>
</fC01>
<fC02 i1="01" i2="3">
<s0>001B00A30C</s0>
</fC02>
<fC02 i1="02" i2="3">
<s0>001B40B30S</s0>
</fC02>
<fC02 i1="03" i2="X">
<s0>001D04A05A</s0>
</fC02>
<fC02 i1="04" i2="X">
<s0>001D04A03</s0>
</fC02>
<fC03 i1="01" i2="3" l="FRE">
<s0>Reconnaissance forme</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="3" l="ENG">
<s0>Pattern recognition</s0>
<s5>61</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Recherche documentaire</s0>
<s5>62</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Document retrieval</s0>
<s5>62</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Búsqueda documental</s0>
<s5>62</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Analyse syntaxique</s0>
<s5>63</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Syntactic analysis</s0>
<s5>63</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Análisis sintáxico</s0>
<s5>63</s5>
</fC03>
<fC03 i1="04" i2="3" l="FRE">
<s0>Implémentation</s0>
<s5>64</s5>
</fC03>
<fC03 i1="04" i2="3" l="ENG">
<s0>Implementation</s0>
<s5>64</s5>
</fC03>
<fC03 i1="05" i2="3" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>65</s5>
</fC03>
<fC03 i1="05" i2="3" l="ENG">
<s0>Optical character recognition</s0>
<s5>65</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Précision élevée</s0>
<s5>66</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>High precision</s0>
<s5>66</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Precisión elevada</s0>
<s5>66</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Extraction information</s0>
<s5>67</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Information extraction</s0>
<s5>67</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Extracción información</s0>
<s5>67</s5>
</fC03>
<fC03 i1="08" i2="3" l="FRE">
<s0>0130C</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="09" i2="3" l="FRE">
<s0>4230S</s0>
<s4>INC</s4>
<s5>91</s5>
</fC03>
<fC07 i1="01" i2="X" l="FRE">
<s0>Traitement information</s0>
<s5>68</s5>
</fC07>
<fC07 i1="01" i2="X" l="ENG">
<s0>Information processing</s0>
<s5>68</s5>
</fC07>
<fC07 i1="01" i2="X" l="SPA">
<s0>Procesamiento información</s0>
<s5>68</s5>
</fC07>
<fN21>
<s1>277</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval</s1>
<s2>17</s2>
<s3>San Jose CA USA</s3>
<s4>2010</s4>
</fA30>
</pR>
</standard>
<server>
<NO>PASCAL 10-0429703 INIST</NO>
<ET>Date of Birth Extraction Using Precise Shallow Parsing</ET>
<AU>PEREDA (Ray); TAGHVA (Kazem); LIKFORMAN-SULEM (Laurence); AGAM (Gady)</AU>
<AF>Information Science Research Institute University of Nevada, Las Vegas/Las Vegas, NV 89154-4021/Etats-Unis (1 aut., 2 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; Etats-Unis; Da. 2010; Vol. 7534; 753406.1-753406.7; Bibl. 10 ref.</SO>
<LA>Anglais</LA>
<EA>This paper presents the implementation and evaluation of a pattern-based program to extract date of birth information from OCR text. Although the program finds date of birth information with high precision and recall, this type of information extraction task seems to be negatively impacted by OCR errors.</EA>
<CC>001B00A30C; 001B40B30S; 001D04A05A; 001D04A03</CC>
<FD>Reconnaissance forme; Recherche documentaire; Analyse syntaxique; Implémentation; Reconnaissance optique caractère; Précision élevée; Extraction information; 0130C; 4230S</FD>
<FG>Traitement information</FG>
<ED>Pattern recognition; Document retrieval; Syntactic analysis; Implementation; Optical character recognition; High precision; Information extraction</ED>
<EG>Information processing</EG>
<SD>Búsqueda documental; Análisis sintáxico; Precisión elevada; Extracción información</SD>
<LO>INIST-21760.354000174683810050</LO>
<ID>10-0429703</ID>
</server>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000162 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000162 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:10-0429703
   |texte=   Date of Birth Extraction Using Precise Shallow Parsing
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024