Compilation of dictionaries for semantic attribute analysis of television news captions
Internal identifier: 001810 (Main/Merge); previous: 001809; next: 001811
Authors: Ichiro Ide [Japan]; Reiko Hamada [Japan]; Shuichi Sakai [Japan]; Hidehiko Tanaka [Japan]
Source:
- Systems and Computers in Japan [0882-1666]; 2003-11-15.
English descriptors
- KwdEn: caption, dictionary, indexing, semantic attribute, suffix noun
Abstract
With the increase in the amount of video broadcast daily, there is a growing need to store video systematically for future reuse and retrieval. From the viewpoint of importance and usability, news videos are particularly desirable to index. For adequate automatic indexing based on the text information in the video, the simple index extraction and annotation methods widely used in conventional approaches are not sufficient; index candidates must be selected with reference to their semantic attributes. The purpose of this study is to compile the dictionaries needed to analyze the semantic attributes of captions (noun phrases) in TV news videos. We describe how words are extracted from text corpora and a thesaurus on the basis of specified conditions and stored in the dictionaries. The quality of the dictionaries is examined by analyzing the semantic attributes of words appearing in actual news videos, and the results are presented. In evaluation experiments that combined the compiled dictionaries with an existing proper noun dictionary and temporal noun dictionary, a recall of 79 to 93% and a precision of 41 to 71% were obtained. Although the precision is low, we conclude that the compiled dictionaries are of practical use for indexing, since recall is more important for that purpose. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 32–44, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10417
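The abstract evaluates dictionary-based assignment of semantic attributes by recall and precision. The sketch below is an illustration only and is not the authors' method. It assumes a hypothetical suffix-noun dictionary (suffix noun is one of this record's keywords) in which the final noun of a Japanese caption decides the semantic attribute, for example 首相 "prime minister" or 市 "city". The dictionary entries, attribute labels, captions and gold labels are all invented placeholders, not the paper's dictionaries or data. Longest-suffix matching is likewise an assumed lookup strategy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical suffix-noun dictionary (suffix -> semantic attribute).
# Entries and labels are illustrative placeholders, not the paper's data.
SUFFIX_DICT: Dict[str, str] = {
    "首相": "person",        # "prime minister"
    "氏": "person",          # honorific "Mr./Ms."
    "市": "location",        # "city"
    "駅": "location",        # "station"
    "銀行": "organization",  # "bank"
    "社": "organization",    # "company"
}


def tag_attribute(phrase: str, suffix_dict: Dict[str, str]) -> Optional[str]:
    """Return the attribute of the longest dictionary suffix ending `phrase`, else None."""
    # Check longer suffixes first so that a longer entry takes priority.
    for suffix in sorted(suffix_dict, key=len, reverse=True):
        if phrase.endswith(suffix):
            return suffix_dict[suffix]
    return None


def precision_recall(predicted: List[Optional[str]], gold: List[Optional[str]],
                     attribute: str) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for one attribute."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == attribute and g == attribute)
    fp = sum(1 for p, g in pairs if p == attribute and g != attribute)
    fn = sum(1 for p, g in pairs if p != attribute and g == attribute)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy caption noun phrases with hand-assigned (hypothetical) gold attributes.
captions = ["小泉首相", "田中氏", "横浜市", "東京駅", "日本銀行", "明治神宮", "八坂神社"]
gold = ["person", "person", "location", "location", "organization", "location", "location"]
predicted = [tag_attribute(c, SUFFIX_DICT) for c in captions]

for attr in ("person", "location", "organization"):
    p, r = precision_recall(predicted, gold, attr)
    print(f"{attr}: precision={p:.2f} recall={r:.2f}")
```

In this toy run, the one-character suffix 社 also matches 八坂神社 (a shrine), which lowers precision for `organization`. Meanwhile 明治神宮 matches no entry, which lowers recall for `location`. Short, productive suffixes trade precision for coverage, the same trade-off the abstract accepts when it favours recall for indexing.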
Url: https://api.istex.fr/document/C0C0323E3979ABF70A5D71C305052E35B494F8AA/fulltext/pdf
DOI: 10.1002/scj.10417
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000736
- to stream Istex, to step Curation: 000728
- to stream Istex, to step Checkpoint: 000F06
Links to Exploration step
ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
<author><name sortKey="Ide, Ichiro" sort="Ide, Ichiro" uniqKey="Ide I" first="Ichiro" last="Ide">Ichiro Ide</name>
</author>
<author><name sortKey="Hamada, Reiko" sort="Hamada, Reiko" uniqKey="Hamada R" first="Reiko" last="Hamada">Reiko Hamada</name>
</author>
<author><name sortKey="Sakai, Shuichi" sort="Sakai, Shuichi" uniqKey="Sakai S" first="Shuichi" last="Sakai">Shuichi Sakai</name>
</author>
<author><name sortKey="Tanaka, Hidehiko" sort="Tanaka, Hidehiko" uniqKey="Tanaka H" first="Hidehiko" last="Tanaka">Hidehiko Tanaka</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA</idno>
<date when="2003" year="2003">2003</date>
<idno type="doi">10.1002/scj.10417</idno>
<idno type="url">https://api.istex.fr/document/C0C0323E3979ABF70A5D71C305052E35B494F8AA/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000736</idno>
<idno type="wicri:Area/Istex/Curation">000728</idno>
<idno type="wicri:Area/Istex/Checkpoint">000F06</idno>
<idno type="wicri:doubleKey">0882-1666:2003:Ide I:compilation:of:dictionaries</idno>
<idno type="wicri:Area/Main/Merge">001810</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
<author><name sortKey="Ide, Ichiro" sort="Ide, Ichiro" uniqKey="Ide I" first="Ichiro" last="Ide">Ichiro Ide</name>
<affiliation wicri:level="3"><country xml:lang="fr">Japon</country>
<wicri:regionArea>National Institute of Informatics, Tokyo</wicri:regionArea>
<placeName><settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Hamada, Reiko" sort="Hamada, Reiko" uniqKey="Hamada R" first="Reiko" last="Hamada">Reiko Hamada</name>
<affiliation wicri:level="3"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Engineering, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName><settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Sakai, Shuichi" sort="Sakai, Shuichi" uniqKey="Sakai S" first="Shuichi" last="Sakai">Shuichi Sakai</name>
<affiliation wicri:level="3"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Information Science and Technology, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName><settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Tanaka, Hidehiko" sort="Tanaka, Hidehiko" uniqKey="Tanaka H" first="Hidehiko" last="Tanaka">Hidehiko Tanaka</name>
<affiliation wicri:level="3"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Graduate School of Information Science and Technology, The University of Tokyo, Tokyo</wicri:regionArea>
<placeName><settlement type="city">Tokyo</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Systems and Computers in Japan</title>
<title level="j" type="abbrev">Syst. Comp. Jpn.</title>
<idno type="ISSN">0882-1666</idno>
<idno type="eISSN">1520-684X</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2003-11-15">2003-11-15</date>
<biblScope unit="volume">34</biblScope>
<biblScope unit="issue">12</biblScope>
<biblScope unit="page" from="32">32</biblScope>
<biblScope unit="page" to="44">44</biblScope>
</imprint>
<idno type="ISSN">0882-1666</idno>
</series>
<idno type="istex">C0C0323E3979ABF70A5D71C305052E35B494F8AA</idno>
<idno type="DOI">10.1002/scj.10417</idno>
<idno type="ArticleID">SCJ10417</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0882-1666</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>caption</term>
<term>dictionary</term>
<term>indexing</term>
<term>semantic attribute</term>
<term>suffix noun</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">With the increase in the amount of video that is broadcast daily, there is an increasing need for storage of video in a systematic way for future reuse and retrieval. In particular, from the viewpoint of importance and usability, it is desirable to index news videos. For adequate automatic indexing based on the text information in the video, it is not sufficient to apply the simple index extraction and annotation methods which have been widely used in conventional methods. It is important to select index candidates with reference to semantic attributes. The purpose of this study is to compile dictionaries which are needed for analyzing the semantic attributes of captions (noun phrases) in TV news videos. We describe the process by which words are extracted from text corpora and a thesaurus for storage on the basis of specified conditions. The quality of the dictionaries is examined by analysis of the semantic attributes of the words appearing in actual news videos, and the results are presented. In evaluation experiments in which an existing proper noun dictionary and temporal noun dictionary were combined and used, a recall of 79 to 93% and a precision of 41 to 71% were obtained. Although the precision is low in this result, it is concluded that the compiled dictionaries are of practical use for indexing since the recall is more important in that case. © 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(12): 32–44, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10417</div>
</front>
</TEI>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001810 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001810 | SxmlIndent | more
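Where Dilib is not installed, the XML record can also be read with a general-purpose XML library. The sketch below runs Python's standard `xml.etree.ElementTree` on a trimmed excerpt of the record. It is not part of Dilib. The record uses the `wicri:` prefix without declaring it, so a strict parser rejects the raw text. The sketch therefore binds the prefix to a namespace URI that is a hypothetical placeholder, not an official Wicri namespace. The helper names `parse_record` and `summarize` are also illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Trimmed excerpt of the record (structure kept, most fields omitted).
RECORD = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Compilation of dictionaries for semantic attribute analysis of television news captions</title>
<author><name first="Ichiro" last="Ide">Ichiro Ide</name></author>
<author><name first="Reiko" last="Hamada">Reiko Hamada</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1002/scj.10417</idno>
<idno type="wicri:Area/Main/Merge">001810</idno></publicationStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>caption</term><term>dictionary</term></keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

# Placeholder URI (hypothetical) bound to the undeclared "wicri:" prefix.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Declare the wicri prefix on the root <record> element, then parse."""
    fixed = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(fixed)


def summarize(root: ET.Element) -> Dict[str, object]:
    """Pull title, author names, DOI and KwdEn keywords out of the teiHeader."""
    header = root.find("TEI/teiHeader")
    if header is None:
        raise ValueError("no TEI/teiHeader element found")
    title = header.findtext("fileDesc/titleStmt/title")
    authors: List[str] = [n.text for n in header.findall("fileDesc/titleStmt/author/name")]
    doi = header.findtext("fileDesc/publicationStmt/idno[@type='doi']")
    keywords = [t.text for t in header.findall("profileDesc/textClass/keywords/term")]
    return {"title": title, "authors": authors, "doi": doi, "keywords": keywords}


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Applied to the full record, `summarize` should return all four authors and the five KwdEn keywords. Elements and attributes carrying the `wicri:` prefix, such as `wicri:regionArea`, are parsed under the placeholder namespace. The paths above never select them.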
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:C0C0323E3979ABF70A5D71C305052E35B494F8AA |texte= Compilation of dictionaries for semantic attribute analysis of television news captions }}
This area was generated with Dilib version V0.6.32.