OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Data extraction from form images

Internal identifier: 002F19 (Istex/Corpus); previous: 002F18; next: 002F20

Authors: F. Cesarini; M. Gori; S. Marinai; G. Soda

Source :

RBID : ISTEX:2ED2F837534801DCF39D5ADFAF2F044D26A17A55

Abstract

Abstract: In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.
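
The hypothesize-and-verify idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a much-simplified form model in which each node is an instruction field with a label, a reference position, and the offset of its information field, and it assumes registration is a pure translation. The names `ModelField`, `Candidate`, `register`, and `locate_information_fields` are hypothetical, and the detection of instruction-field candidates (done with connectionist models in the paper) is taken as given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ModelField:
    """Node of a simplified (hypothetical) form model."""
    label: str          # identity of the instruction field
    position: Point     # instruction-field position in the reference form
    info_offset: Point  # information field position relative to the instruction field


@dataclass(frozen=True)
class Candidate:
    """Instruction-field candidate detected in the input image."""
    label: str
    position: Point


def verify(model: List[ModelField], candidates: List[Candidate],
           shift: Point, tol: float) -> int:
    """Count model fields with a same-label candidate near the predicted position."""
    support = 0
    for field in model:
        px = field.position[0] + shift[0]
        py = field.position[1] + shift[1]
        if any(c.label == field.label
               and abs(c.position[0] - px) <= tol
               and abs(c.position[1] - py) <= tol
               for c in candidates):
            support += 1
    return support


def register(model: List[ModelField], candidates: List[Candidate],
             tol: float = 5.0, min_support: int = 2) -> Optional[Point]:
    """Hypothesize a translation from every label-compatible pair and keep
    the hypothesis confirmed by the most model fields (assumed criterion)."""
    best_shift: Optional[Point] = None
    best_support = 0
    for field in model:
        for cand in candidates:
            if cand.label != field.label:
                continue
            # Hypothesis: this candidate is this model field.
            shift = (cand.position[0] - field.position[0],
                     cand.position[1] - field.position[1])
            # Verification: check the hypothesis against all other fields.
            support = verify(model, candidates, shift, tol)
            if support > best_support:
                best_shift, best_support = shift, support
    return best_shift if best_support >= min_support else None


def locate_information_fields(model: List[ModelField],
                              shift: Point) -> Dict[str, Point]:
    """Predict information-field positions once the form is registered."""
    return {
        f.label: (f.position[0] + shift[0] + f.info_offset[0],
                  f.position[1] + shift[1] + f.info_offset[1])
        for f in model
    }


# Illustrative data only: three model fields and four detections, one spurious.
model = [
    ModelField("name", (10.0, 10.0), (50.0, 0.0)),
    ModelField("date", (10.0, 40.0), (50.0, 0.0)),
    ModelField("total", (10.0, 90.0), (60.0, 0.0)),
]
candidates = [
    Candidate("name", (13.0, 8.0)),
    Candidate("date", (13.0, 38.0)),
    Candidate("total", (12.0, 89.0)),
    Candidate("date", (200.0, 300.0)),  # spurious detection
]
shift = register(model, candidates)
if shift is not None:
    print(locate_information_fields(model, shift))
```

In this sketch, a spurious detection produces a hypothesis that few other fields confirm, so it loses to the consistent one. A fuller attributed relational graph would store the spatial relations explicitly as edge attributes rather than leaving them implicit in the reference positions.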

Url:
DOI: 10.1007/BFb0049141

Links to Exploration step

ISTEX:2ED2F837534801DCF39D5ADFAF2F044D26A17A55

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Data extraction from form images</title>
<author>
<name sortKey="Cesarini, F" sort="Cesarini, F" uniqKey="Cesarini F" first="F." last="Cesarini">F. Cesarini</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gori, M" sort="Gori, M" uniqKey="Gori M" first="M." last="Gori">M. Gori</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Marinai, S" sort="Marinai, S" uniqKey="Marinai S" first="S." last="Marinai">S. Marinai</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: simone@mcculloch.ing.unifi.it</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Soda, G" sort="Soda, G" uniqKey="Soda G" first="G." last="Soda">G. Soda</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:2ED2F837534801DCF39D5ADFAF2F044D26A17A55</idno>
<date when="1995" year="1995">1995</date>
<idno type="doi">10.1007/BFb0049141</idno>
<idno type="url">https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002F19</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Data extraction from form images</title>
<author>
<name sortKey="Cesarini, F" sort="Cesarini, F" uniqKey="Cesarini F" first="F." last="Cesarini">F. Cesarini</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gori, M" sort="Gori, M" uniqKey="Gori M" first="M." last="Gori">M. Gori</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Marinai, S" sort="Marinai, S" uniqKey="Marinai S" first="S." last="Marinai">S. Marinai</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: simone@mcculloch.ing.unifi.it</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Soda, G" sort="Soda, G" uniqKey="Soda G" first="G." last="Soda">G. Soda</name>
<affiliation>
<mods:affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>1995</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">2ED2F837534801DCF39D5ADFAF2F044D26A17A55</idno>
<idno type="DOI">10.1007/BFb0049141</idno>
<idno type="ChapterID">42</idno>
<idno type="ChapterID">Chap42</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.</div>
</front>
</TEI>
<istex>
<corpusName>springer</corpusName>
<author>
<json:item>
<name>F. Cesarini</name>
<affiliations>
<json:string>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</json:string>
</affiliations>
</json:item>
<json:item>
<name>M. Gori</name>
<affiliations>
<json:string>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</json:string>
</affiliations>
</json:item>
<json:item>
<name>S. Marinai</name>
<affiliations>
<json:string>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</json:string>
<json:string>E-mail: simone@mcculloch.ing.unifi.it</json:string>
</affiliations>
</json:item>
<json:item>
<name>G. Soda</name>
<affiliations>
<json:string>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</json:string>
</affiliations>
</json:item>
</author>
<language>
<json:string>eng</json:string>
</language>
<abstract>Abstract: In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular the model and the algorithms we described are used to process forms in which the information fields can not be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in an holistic way, by using connectionist models.</abstract>
<qualityIndicators>
<score>5.401</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>439.208 x 662.424 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>663</abstractCharCount>
<pdfWordCount>4201</pdfWordCount>
<pdfCharCount>22669</pdfCharCount>
<pdfPageCount>11</pdfPageCount>
<abstractWordCount>100</abstractWordCount>
</qualityIndicators>
<title>Data extraction from form images</title>
<genre.original>
<json:string>ReviewPaper</json:string>
</genre.original>
<chapterId>
<json:string>42</json:string>
<json:string>Chap42</json:string>
</chapterId>
<genre>
<json:string>conference [eBooks]</json:string>
</genre>
<serie>
<editor>
<json:item>
<name>Gerhard Goos</name>
</json:item>
<json:item>
<name>Juris Hartmanis</name>
</json:item>
<json:item>
<name>Jan van Leeuwen</name>
</json:item>
<json:item>
<name>W. Brauer</name>
</json:item>
<json:item>
<name>D. Greis</name>
</json:item>
<json:item>
<name>J. Stoer</name>
</json:item>
</editor>
<issn>
<json:string>0302-9743</json:string>
</issn>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1611-3349</json:string>
</eissn>
<title>Lecture Notes in Computer Science</title>
<copyrightDate>1995</copyrightDate>
</serie>
<host>
<editor>
<json:item>
<name>Norman Revell</name>
</json:item>
<json:item>
<name>A Min Tjoa</name>
</json:item>
</editor>
<subject>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Database Management</value>
</json:item>
<json:item>
<value>Artificial Intelligence (incl. Robotics)</value>
</json:item>
<json:item>
<value>Information Systems Applications (incl.Internet)</value>
</json:item>
<json:item>
<value>Business Information Systems</value>
</json:item>
<json:item>
<value>Information Storage and Retrieval</value>
</json:item>
</subject>
<isbn>
<json:string>978-3-540-60303-0</json:string>
</isbn>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1611-3349</json:string>
</eissn>
<title>Database and Expert Systems Applications</title>
<genre.original>
<json:string>Proceedings</json:string>
</genre.original>
<bookId>
<json:string>3540603034</json:string>
</bookId>
<volume>978</volume>
<pages>
<last>448</last>
<first>438</first>
</pages>
<issn>
<json:string>0302-9743</json:string>
</issn>
<genre>
<json:string>Book Series</json:string>
</genre>
<eisbn>
<json:string>978-3-540-44790-0</json:string>
</eisbn>
<copyrightDate>1995</copyrightDate>
<doi>
<json:string>10.1007/BFb0049099</json:string>
</doi>
</host>
<publicationDate>1995</publicationDate>
<copyrightDate>1995</copyrightDate>
<doi>
<json:string>10.1007/BFb0049141</json:string>
</doi>
<id>2ED2F837534801DCF39D5ADFAF2F044D26A17A55</id>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Data extraction from form images</title>
<respStmt xml:id="ISTEX-API" resp="Références bibliographiques récupérées via GROBID" name="ISTEX-API (INIST-CNRS)"></respStmt>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<availability>
<p>SPRINGER</p>
</availability>
<date>1995</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Data extraction from form images</title>
<author>
<persName>
<forename type="first">F.</forename>
<surname>Cesarini</surname>
</persName>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
</author>
<author>
<persName>
<forename type="first">M.</forename>
<surname>Gori</surname>
</persName>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
</author>
<author>
<persName>
<forename type="first">S.</forename>
<surname>Marinai</surname>
</persName>
<email>simone@mcculloch.ing.unifi.it</email>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
</author>
<author>
<persName>
<forename type="first">G.</forename>
<surname>Soda</surname>
</persName>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
</author>
</analytic>
<monogr>
<title level="m">Database and Expert Systems Applications</title>
<title level="m" type="sub">6th International Conference, DEXA '95 London, United Kingdom, September 4–8, 1995 Proceedings</title>
<idno type="pISBN">978-3-540-60303-0</idno>
<idno type="eISBN">978-3-540-44790-0</idno>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="DOI">10.1007/BFb0049099</idno>
<idno type="BookID">3540603034</idno>
<idno type="BookTitleID">42656</idno>
<idno type="BookVolumeNumber">978</idno>
<idno type="BookChapterCount">62</idno>
<editor>
<persName>
<forename type="first">Norman</forename>
<surname>Revell</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">A</forename>
<forename type="first">Min</forename>
<surname>Tjoa</surname>
</persName>
</editor>
<imprint>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<date type="published" when="1995"></date>
<biblScope unit="volume">978</biblScope>
<biblScope unit="page" from="438">438</biblScope>
<biblScope unit="page" to="448">448</biblScope>
</imprint>
</monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<editor>
<persName>
<forename type="first">Gerhard</forename>
<surname>Goos</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">Juris</forename>
<surname>Hartmanis</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">Jan</forename>
<surname>van Leeuwen</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">W.</forename>
<surname>Brauer</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">D.</forename>
<surname>Greis</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">J.</forename>
<surname>Stoer</surname>
</persName>
</editor>
<biblScope>
<date>1995</date>
</biblScope>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="seriesId">558</idno>
</series>
<idno type="istex">2ED2F837534801DCF39D5ADFAF2F044D26A17A55</idno>
<idno type="DOI">10.1007/BFb0049141</idno>
<idno type="ChapterID">42</idno>
<idno type="ChapterID">Chap42</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>1995</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Abstract: In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.</p>
</abstract>
<textClass>
<keywords scheme="Book Subject Collection">
<list>
<label>SUCO11645</label>
<item>
<term>Computer Science</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Book Subject Group">
<list>
<label>I</label>
<label>I18024</label>
<label>I21017</label>
<label>I18040</label>
<label>W26007</label>
<label>I18032</label>
<item>
<term>Computer Science</term>
</item>
<item>
<term>Database Management</term>
</item>
<item>
<term>Artificial Intelligence (incl. Robotics)</term>
</item>
<item>
<term>Information Systems Applications (incl.Internet)</term>
</item>
<item>
<term>Business Information Systems</term>
</item>
<item>
<term>Information Storage and Retrieval</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="1995">Published</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-3-20">References added</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Springer, Publisher found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//Springer-Verlag//DTD A++ V2.4//EN" URI="http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd" name="istex:docType"></istex:docType>
<istex:document>
<Publisher>
<PublisherInfo>
<PublisherName>Springer Berlin Heidelberg</PublisherName>
<PublisherLocation>Berlin, Heidelberg</PublisherLocation>
</PublisherInfo>
<Series>
<SeriesInfo TocLevels="0">
<SeriesID>558</SeriesID>
<SeriesPrintISSN>0302-9743</SeriesPrintISSN>
<SeriesElectronicISSN>1611-3349</SeriesElectronicISSN>
<SeriesTitle Language="En">Lecture Notes in Computer Science</SeriesTitle>
<SeriesAbbreviatedTitle>Lect Notes Comput Sci</SeriesAbbreviatedTitle>
</SeriesInfo>
<SeriesHeader>
<EditorGroup>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>Gerhard</GivenName>
<FamilyName>Goos</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>Juris</GivenName>
<FamilyName>Hartmanis</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>Jan</GivenName>
<Particle>van</Particle>
<FamilyName>Leeuwen</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>W.</GivenName>
<FamilyName>Brauer</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>D.</GivenName>
<FamilyName>Greis</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>J.</GivenName>
<FamilyName>Stoer</FamilyName>
</EditorName>
</Editor>
</EditorGroup>
</SeriesHeader>
<Book Language="En">
<BookInfo MediaType="eBook" BookProductType="Proceedings" Language="En" NumberingStyle="Unnumbered" TocLevels="0">
<BookID>3540603034</BookID>
<BookTitle>Database and Expert Systems Applications</BookTitle>
<BookSubTitle>6th International Conference, DEXA '95 London, United Kingdom, September 4–8, 1995 Proceedings</BookSubTitle>
<BookVolumeNumber>978</BookVolumeNumber>
<BookDOI>10.1007/BFb0049099</BookDOI>
<BookTitleID>42656</BookTitleID>
<BookPrintISBN>978-3-540-60303-0</BookPrintISBN>
<BookElectronicISBN>978-3-540-44790-0</BookElectronicISBN>
<BookChapterCount>62</BookChapterCount>
<BookCopyright>
<CopyrightHolderName>Springer-Verlag</CopyrightHolderName>
<CopyrightYear>1995</CopyrightYear>
</BookCopyright>
<BookSubjectGroup>
<BookSubject Code="I" Type="Primary">Computer Science</BookSubject>
<BookSubject Code="I18024" Priority="1" Type="Secondary">Database Management</BookSubject>
<BookSubject Code="I21017" Priority="2" Type="Secondary">Artificial Intelligence (incl. Robotics)</BookSubject>
<BookSubject Code="I18040" Priority="3" Type="Secondary">Information Systems Applications (incl.Internet)</BookSubject>
<BookSubject Code="W26007" Priority="4" Type="Secondary">Business Information Systems</BookSubject>
<BookSubject Code="I18032" Priority="5" Type="Secondary">Information Storage and Retrieval</BookSubject>
<SubjectCollection Code="SUCO11645">Computer Science</SubjectCollection>
</BookSubjectGroup>
</BookInfo>
<BookHeader>
<EditorGroup>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>Norman</GivenName>
<FamilyName>Revell</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>A</GivenName>
<GivenName>Min</GivenName>
<FamilyName>Tjoa</FamilyName>
</EditorName>
</Editor>
</EditorGroup>
</BookHeader>
<Chapter ID="Chap42" Language="En">
<ChapterInfo ChapterType="ReviewPaper" ContainsESM="No" NumberingStyle="Unnumbered" TocLevels="0">
<ChapterID>42</ChapterID>
<ChapterDOI>10.1007/BFb0049141</ChapterDOI>
<ChapterSequenceNumber>42</ChapterSequenceNumber>
<ChapterTitle Language="En">Data extraction from form images</ChapterTitle>
<ChapterFirstPage>438</ChapterFirstPage>
<ChapterLastPage>448</ChapterLastPage>
<ChapterCopyright>
<CopyrightHolderName>Springer-Verlag</CopyrightHolderName>
<CopyrightYear>1995</CopyrightYear>
</ChapterCopyright>
<ChapterHistory>
<OnlineDate>
<Year>2006</Year>
<Month>2</Month>
<Day>1</Day>
</OnlineDate>
</ChapterHistory>
<ChapterGrants Type="Regular">
<MetadataGrant Grant="OpenAccess"></MetadataGrant>
<AbstractGrant Grant="OpenAccess"></AbstractGrant>
<BodyPDFGrant Grant="Restricted"></BodyPDFGrant>
<BodyHTMLGrant Grant="Restricted"></BodyHTMLGrant>
<BibliographyGrant Grant="Restricted"></BibliographyGrant>
<ESMGrant Grant="Restricted"></ESMGrant>
</ChapterGrants>
<ChapterContext>
<SeriesID>558</SeriesID>
<BookID>3540603034</BookID>
<BookTitle>Database and Expert Systems Applications</BookTitle>
</ChapterContext>
</ChapterInfo>
<ChapterHeader>
<AuthorGroup>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>F.</GivenName>
<FamilyName>Cesarini</FamilyName>
</AuthorName>
<Contact>
<Fax>+39 55 4796363</Fax>
</Contact>
</Author>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>M.</GivenName>
<FamilyName>Gori</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>S.</GivenName>
<FamilyName>Marinai</FamilyName>
</AuthorName>
<Contact>
<Email>simone@mcculloch.ing.unifi.it</Email>
</Contact>
</Author>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>G.</GivenName>
<FamilyName>Soda</FamilyName>
</AuthorName>
</Author>
<Affiliation ID="Aff1">
<OrgDivision>Dipartimento di Sistemi e Informatica</OrgDivision>
<OrgName>Università di Firenze</OrgName>
<OrgAddress>
<Street>Via S.Marta,3</Street>
<Postcode>50139</Postcode>
<City>Firenze</City>
<Country>Italia</Country>
</OrgAddress>
</Affiliation>
</AuthorGroup>
<Abstract ID="Abs1" Language="En">
<Heading>Abstract</Heading>
<Para>In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.</Para>
</Abstract>
<KeywordGroup Language="En">
<Heading>Keywords</Heading>
<Keyword>Attributed Relational Graphs</Keyword>
<Keyword>Document Registration</Keyword>
<Keyword>Form Processing</Keyword>
<Keyword>Layout Description</Keyword>
</KeywordGroup>
</ChapterHeader>
<NoBody></NoBody>
</Chapter>
</Book>
</Series>
</Publisher>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Data extraction from form images</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>Data extraction from form images</title>
</titleInfo>
<name type="personal">
<namePart type="given">F.</namePart>
<namePart type="family">Cesarini</namePart>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">M.</namePart>
<namePart type="family">Gori</namePart>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">S.</namePart>
<namePart type="family">Marinai</namePart>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
<affiliation>E-mail: simone@mcculloch.ing.unifi.it</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">G.</namePart>
<namePart type="family">Soda</namePart>
<affiliation>Dipartimento di Sistemi e Informatica, Università di Firenze, Via S.Marta,3, 50139, Firenze, Italia</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="conference [eBooks]" displayLabel="ReviewPaper"></genre>
<originInfo>
<publisher>Springer Berlin Heidelberg</publisher>
<place>
<placeTerm type="text">Berlin, Heidelberg</placeTerm>
</place>
<dateIssued encoding="w3cdtf">1995</dateIssued>
<copyrightDate encoding="w3cdtf">1995</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">Abstract: In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located only by their position on the page, but can also be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.</abstract>
<relatedItem type="host">
<titleInfo>
<title>Database and Expert Systems Applications</title>
<subTitle>6th International Conference, DEXA '95 London, United Kingdom, September 4–8, 1995 Proceedings</subTitle>
</titleInfo>
<name type="personal">
<namePart type="given">Norman</namePart>
<namePart type="family">Revell</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">A</namePart>
<namePart type="given">Min</namePart>
<namePart type="family">Tjoa</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="Book Series" displayLabel="Proceedings"></genre>
<originInfo>
<copyrightDate encoding="w3cdtf">1995</copyrightDate>
<issuance>monographic</issuance>
</originInfo>
<subject>
<genre>Book Subject Collection</genre>
<topic authority="SpringerSubjectCodes" authorityURI="SUCO11645">Computer Science</topic>
</subject>
<subject>
<genre>Book Subject Group</genre>
<topic authority="SpringerSubjectCodes" authorityURI="I">Computer Science</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18024">Database Management</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I21017">Artificial Intelligence (incl. Robotics)</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18040">Information Systems Applications (incl.Internet)</topic>
<topic authority="SpringerSubjectCodes" authorityURI="W26007">Business Information Systems</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18032">Information Storage and Retrieval</topic>
</subject>
<identifier type="DOI">10.1007/BFb0049099</identifier>
<identifier type="ISBN">978-3-540-60303-0</identifier>
<identifier type="eISBN">978-3-540-44790-0</identifier>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="BookTitleID">42656</identifier>
<identifier type="BookID">3540603034</identifier>
<identifier type="BookChapterCount">62</identifier>
<identifier type="BookVolumeNumber">978</identifier>
<part>
<date>1995</date>
<detail type="volume">
<number>978</number>
<caption>vol.</caption>
</detail>
<extent unit="pages">
<start>438</start>
<end>448</end>
</extent>
</part>
<recordInfo>
<recordOrigin>Springer-Verlag, 1995</recordOrigin>
</recordInfo>
</relatedItem>
<relatedItem type="series">
<titleInfo>
<title>Lecture Notes in Computer Science</title>
</titleInfo>
<name type="personal">
<namePart type="given">Gerhard</namePart>
<namePart type="family">Goos</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Juris</namePart>
<namePart type="family">Hartmanis</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jan</namePart>
<namePart type="family">van Leeuwen</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">W.</namePart>
<namePart type="family">Brauer</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">D.</namePart>
<namePart type="family">Greis</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">J.</namePart>
<namePart type="family">Stoer</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<copyrightDate encoding="w3cdtf">1995</copyrightDate>
<issuance>serial</issuance>
</originInfo>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="SeriesID">558</identifier>
<recordInfo>
<recordOrigin>Springer-Verlag, 1995</recordOrigin>
</recordInfo>
</relatedItem>
<identifier type="istex">2ED2F837534801DCF39D5ADFAF2F044D26A17A55</identifier>
<identifier type="DOI">10.1007/BFb0049141</identifier>
<identifier type="ChapterID">42</identifier>
<identifier type="ChapterID">Chap42</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Springer-Verlag, 1995</accessCondition>
<recordInfo>
<recordContentSource>SPRINGER</recordContentSource>
<recordOrigin>Springer-Verlag, 1995</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments>
<istex:refBibTEI uri="https://api.istex.fr/document/2ED2F837534801DCF39D5ADFAF2F044D26A17A55/enrichments/refBib">
<teiHeader></teiHeader>
<text>
<front></front>
<body></body>
<back>
<listBibl>
<biblStruct xml:id="b0">
<analytic>
<title level="a" type="main">Learning in Multilayered Networks Used as Autoassociators</title>
<author>
<persName>
<forename type="first">M</forename>
<surname>Bianchini</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">P</forename>
<surname>Frasconi</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Gori</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">IEEE Transactions on Neural Networks</title>
<imprint>
<biblScope unit="volume">6</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="512" to="515"></biblScope>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b1">
<analytic>
<title level="a" type="main">A Hybrid System for Locating Low Level Graphic Items</title>
<author>
<persName>
<forename type="first">F</forename>
<surname>Cesarini</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Gori</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Marinai</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">G</forename>
<surname>Soda</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">To appear in Proceedings of the First IAPR Workshop on Graphic Recognition</title>
<imprint>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b2">
<analytic>
<title level="a" type="main">A System for Data Extraction from Forms of Known Class</title>
<author>
<persName>
<forename type="first">F</forename>
<surname>Cesarini</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Gori</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Marinai</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">G</forename>
<surname>Soda</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">To appear in Proceedings of the 3rd International Conference on Document Analysis and Recognition</title>
<meeting>
<address>
<addrLine>Montreal</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b3">
<analytic>
<title level="a" type="main">The Processing of Form Documents</title>
<author>
<persName>
<forename type="first">D</forename>
<forename type="middle">S</forename>
<surname>Doermann</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">A</forename>
<surname>Rosenfeld</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of International Conference on Document Analysis and Recognition</title>
<meeting>International Conference on Document Analysis and Recognition</meeting>
<imprint>
<date type="published" when="1993"></date>
<biblScope unit="page" from="497" to="501"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b4">
<analytic>
<title level="a" type="main">An Image Understanding System using Attributed Symbolic Representation and Inexact Graph-matching</title>
<author>
<persName>
<forename type="first">M</forename>
<forename type="middle">A</forename>
<surname>Eshera</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">K</forename>
<forename type="middle">S</forename>
<surname>Fu</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">IEEE Transactions on PAMI</title>
<imprint>
<biblScope unit="volume">8</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="604" to="617"></biblScope>
<date type="published" when="1986"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b5">
<monogr>
<title level="m" type="main">NIST Form-based Handprint Recognition System</title>
<author>
<persName>
<forename type="first">M</forename>
<forename type="middle">D</forename>
<surname>Garris</surname>
</persName>
</author>
<imprint>
<date type="published" when="1994-07"></date>
</imprint>
</monogr>
<note>NISTIR. 5469. U.S. Department of Commerce. Technology Administration. National Institute of Standards and Technology</note>
</biblStruct>
<biblStruct xml:id="b6">
<monogr>
<title level="m" type="main">Object Recognition by Computer, the Role of Geometric Constraints</title>
<author>
<persName>
<forename type="first">W</forename>
<forename type="middle">E L</forename>
<surname>Grimson</surname>
</persName>
</author>
<imprint>
<date type="published" when="1990"></date>
<publisher>MIT Press</publisher>
<pubPlace>Cambridge</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b7">
<analytic>
<title level="a" type="main">Multi-domain Document Layout Understanding</title>
<author>
<persName>
<forename type="first">S</forename>
<forename type="middle">W</forename>
<surname>Lam</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<forename type="middle">N</forename>
<surname>Srihari</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of International Conference on Document Analysis and Recognition</title>
<meeting>International Conference on Document Analysis and Recognition</meeting>
<imprint>
<date type="published" when="1991"></date>
<biblScope unit="page" from="112" to="120"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b8">
<analytic>
<title level="a" type="main">An Adaptive Approach to Document Classification and Understanding</title>
<author>
<persName>
<forename type="first">S</forename>
<forename type="middle">W</forename>
<surname>Lam</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the IAPR Workshop on Document Analysis Systems</title>
<meeting>IAPR Workshop on Document Analysis Systems
<address>
<addrLine>Kaiserslautern, Germany</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="1994-10"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b9">
<analytic>
<title level="a" type="main">Document Processing for Automatic Knowledge Acquisition</title>
<author>
<persName>
<forename type="first">Y</forename>
<forename type="middle">Y</forename>
<surname>Tang</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<forename type="middle">D</forename>
<surname>Yan</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<forename type="middle">Y</forename>
<surname>Suen</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
<imprint>
<biblScope unit="volume">6</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="3" to="20"></biblScope>
<date type="published" when="1994"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b10">
<analytic>
<title level="a" type="main">Form Understanding System Based on Form Description Language</title>
<author>
<persName>
<forename type="first">C</forename>
<forename type="middle">D</forename>
<surname>Yan</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">Y</forename>
<forename type="middle">Y</forename>
<surname>Tang</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<forename type="middle">Y</forename>
<surname>Suen</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of International Conference on Document Analysis and Recognition</title>
<meeting>International Conference on Document Analysis and Recognition</meeting>
<imprint>
<date type="published" when="1991"></date>
<biblScope unit="page" from="283" to="293"></biblScope>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</back>
</text>
</istex:refBibTEI>
</enrichments>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002F19 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 002F19 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:2ED2F837534801DCF39D5ADFAF2F044D26A17A55
   |texte=   Data extraction from form images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024