
Improving Multimedia Retrieval with a Video OCR

Internal identifier: 000D13 (Main/Exploration); previous: 000D12; next: 000D14

Authors: Dipanjan Das [United States]; Datong Chen [United States]; Alexander G. Hauptmann [United States]

Source:

Proceedings electronic imaging science and technology

RBID : Pascal:08-0426688

French descriptors

Pascal (Inist)
- Communication multimédia, Reconnaissance optique caractère, Recherche information, Requête, Reconnaissance caractère, Evaluation performance, Reconnaissance automatique, Reconnaissance parole, Reconnaissance forme, Traitement parole.

English descriptors

KwdEn:
- Automatic recognition, Character recognition, Information retrieval, Multimedia communication, Optical character recognition, Pattern recognition, Performance evaluation, Query, Speech processing, Speech recognition.

Abstract

We present a set of experiments with a video OCR system (VOCR) tailored for video information retrieval and establish its importance in multimedia search in general and for some specific queries in particular. The system, inspired by existing work on text detection and recognition in images, has been developed using techniques involving detailed analysis of video frames to produce candidate text regions. The text regions are then binarized and sent to a commercial OCR engine, resulting in ASCII text, which is finally used to create search indexes. The system is evaluated using the TRECVID data. We compare the system's performance from an information retrieval perspective with another VOCR developed using multi-frame integration and empirically demonstrate that deep analysis of individual video frames results in better video retrieval. We also evaluate the effect of various textual sources on multimedia retrieval by combining the VOCR outputs with automatic speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR sources outperforms the other system by a large margin. For search queries that involve named entities, especially people's names, the VOCR system even outperforms speech transcripts, demonstrating that source selection for particular query types is essential.
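The abstract outlines a pipeline in which text regions are binarized, passed to an OCR engine, and the resulting text is indexed for search. The following is only a minimal sketch of the binarization and indexing steps, not the authors' method: the fixed global threshold, the whitespace tokenizer, and the names `binarize`, `build_index`, and `shot_texts` are all assumptions made for illustration, and the OCR step itself is left out (the shot texts are assumed to be its output).

```python
from collections import defaultdict


def binarize(region, threshold=128):
    """Map a grayscale region (rows of 0-255 ints) to 0/1 pixels.

    Assumption: a fixed global threshold; the paper does not say
    which binarization technique it uses.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in region]


def build_index(shot_texts):
    """Build an inverted index: lowercase token -> sorted list of shot ids."""
    index = defaultdict(set)
    for shot_id, text in shot_texts.items():
        for token in text.lower().split():
            index[token].add(shot_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical 2x3 grayscale text region.
region = [[12, 200, 190], [30, 220, 15]]
print(binarize(region))

# Hypothetical OCR output per video shot.
shot_texts = {"shot_1": "BREAKING NEWS", "shot_2": "news update"}
index = build_index(shot_texts)
print(index["news"])
```

A query term is then looked up in `index` to find the shots whose on-screen text contains it.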

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000266
to stream PascalFrancis, to step Curation: 000518
to stream PascalFrancis, to step Checkpoint: 000235
to stream Main, to step Merge: 000D25
to stream Main, to step Curation: 000D13

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Improving Multimedia Retrieval with a Video OCR</title>
<author><name sortKey="Das, Dipanjan" sort="Das, Dipanjan" uniqKey="Das D" first="Dipanjan" last="Das">Dipanjan Das</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Language Technologies Institute, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Datong Chen" sort="Datong Chen" uniqKey="Datong Chen" last="Datong Chen">DATONG CHEN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Computer Science Department, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Hauptmann, Alexander G" sort="Hauptmann, Alexander G" uniqKey="Hauptmann A" first="Alexander G." last="Hauptmann">Alexander G. Hauptmann</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Language Technologies Institute, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">08-0426688</idno>
<date when="2008">2008</date>
<idno type="stanalyst">PASCAL 08-0426688 INIST</idno>
<idno type="RBID">Pascal:08-0426688</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000266</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000518</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000235</idno>
<idno type="wicri:Area/Main/Merge">000D25</idno>
<idno type="wicri:Area/Main/Curation">000D13</idno>
<idno type="wicri:Area/Main/Exploration">000D13</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Improving Multimedia Retrieval with a Video OCR</title>
<author><name sortKey="Das, Dipanjan" sort="Das, Dipanjan" uniqKey="Das D" first="Dipanjan" last="Das">Dipanjan Das</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Language Technologies Institute, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Datong Chen" sort="Datong Chen" uniqKey="Datong Chen" last="Datong Chen">DATONG CHEN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Computer Science Department, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Hauptmann, Alexander G" sort="Hauptmann, Alexander G" uniqKey="Hauptmann A" first="Alexander G." last="Hauptmann">Alexander G. Hauptmann</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Language Technologies Institute, Carnegie Mellon University</s1>
<s2>Pittsburgh, PA</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Pennsylvanie</region>
<settlement type="city">Pittsburgh</settlement>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings electronic imaging science and technology</title>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings electronic imaging science and technology</title>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic recognition</term>
<term>Character recognition</term>
<term>Information retrieval</term>
<term>Multimedia communication</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Query</term>
<term>Speech processing</term>
<term>Speech recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Communication multimédia</term>
<term>Reconnaissance optique caractère</term>
<term>Recherche information</term>
<term>Requête</term>
<term>Reconnaissance caractère</term>
<term>Evaluation performance</term>
<term>Reconnaissance automatique</term>
<term>Reconnaissance parole</term>
<term>Reconnaissance forme</term>
<term>Traitement parole</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We present a set of experiments with a video OCR system (VOCR) tailored for video information retrieval and establish its importance in multimedia search in general and for some specific queries in particular. The system, inspired by existing work on text detection and recognition in images, has been developed using techniques involving detailed analysis of video frames to produce candidate text regions. The text regions are then binarized and sent to a commercial OCR engine, resulting in ASCII text, which is finally used to create search indexes. The system is evaluated using the TRECVID data. We compare the system's performance from an information retrieval perspective with another VOCR developed using multi-frame integration and empirically demonstrate that deep analysis of individual video frames results in better video retrieval. We also evaluate the effect of various textual sources on multimedia retrieval by combining the VOCR outputs with automatic speech recognition (ASR) transcripts. For general search queries, the VOCR system coupled with ASR sources outperforms the other system by a large margin. For search queries that involve named entities, especially people's names, the VOCR system even outperforms speech transcripts, demonstrating that source selection for particular query types is essential.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Pennsylvanie</li>
</region>
<settlement><li>Pittsburgh</li>
</settlement>
<orgName><li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="Pennsylvanie"><name sortKey="Das, Dipanjan" sort="Das, Dipanjan" uniqKey="Das D" first="Dipanjan" last="Das">Dipanjan Das</name>
</region>
<name sortKey="Datong Chen" sort="Datong Chen" uniqKey="Datong Chen" last="Datong Chen">DATONG CHEN</name>
<name sortKey="Hauptmann, Alexander G" sort="Hauptmann, Alexander G" uniqKey="Hauptmann A" first="Alexander G." last="Hauptmann">Alexander G. Hauptmann</name>
</country>
</tree>
</affiliations>
</record>
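
The record above can also be read without Dilib. The sketch below is a hedged illustration using Python's standard `xml.etree.ElementTree`. The record uses the `wicri:` and `inist:` prefixes without declaring them, so a strict parser would likely reject it as it stands; the sketch therefore wraps the record in an element that binds those prefixes to placeholder URIs. The URIs, the wrapper element, and the function names are assumptions, and `SAMPLE` is an abridged fragment shaped like the record, not the full record.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the record never declares the
# wicri: and inist: prefixes, so they are bound here only to allow parsing.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)
WRAPPER_CLOSE = "</wrapper>"


def parse_record(record_xml):
    """Parse a <record> string after binding the undeclared prefixes."""
    return ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)


def extract_terms(record_xml, scheme):
    """Return the <term> texts of the <keywords> block with the given scheme."""
    root = parse_record(record_xml)
    path = ".//keywords[@scheme='%s']/term" % scheme
    return [term.text for term in root.findall(path)]


# Abridged fragment with the same shape as the record.
SAMPLE = """<record><TEI><teiHeader>
<fileDesc><titleStmt><author><affiliation wicri:level="4">
<country>États-Unis</country></affiliation></author></titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Information retrieval</term>
</keywords>
</textClass></profileDesc>
</teiHeader></TEI></record>"""

print(extract_terms(SAMPLE, "KwdEn"))
```

Passing `"Pascal"` as the scheme would select the French descriptor block in the full record instead.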

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D13 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D13 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:08-0426688
   |texte=   Improving Multimedia Retrieval with a Video OCR
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
