SGML exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Text Mining via information extraction

Internal identifier: 000193 (PascalFrancis/Corpus); previous: 000192; next: 000194

Authors: R. Feldman; Y. Aumann; M. Fresko; O. Liphstat; B. Rosenfeld; Y. Schler

Source :

RBID : Pascal:99-0549347

French descriptors

English descriptors

Abstract

Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0302-9743
A05       @2 1704
A08 01  1  ENG  @1 Text Mining via information extraction
A09 01  1  ENG  @1 PKDD'99 : principles of data mining and knowledge discovery : Prague, 15-18 September 1999
A11 01  1    @1 FELDMAN (R.)
A11 02  1    @1 AUMANN (Y.)
A11 03  1    @1 FRESKO (M.)
A11 04  1    @1 LIPHSTAT (O.)
A11 05  1    @1 ROSENFELD (B.)
A11 06  1    @1 SCHLER (Y.)
A12 01  1    @1 ZYTKOW (Jan M.) @9 ed.
A12 02  1    @1 RAUCH (Jan) @9 ed.
A14 01      @1 Department of Mathematics and Computer Science, Bar-Ilan University @2 Ramat-Gan @3 ISR @Z 1 aut. @Z 2 aut. @Z 3 aut. @Z 4 aut. @Z 5 aut. @Z 6 aut.
A20       @1 165-173
A21       @1 1999
A23 01      @0 ENG
A26 01      @0 3-540-66490-4
A43 01      @1 INIST @2 16343 @5 354000084589530180
A44       @0 0000 @1 © 1999 INIST-CNRS. All rights reserved.
A45       @0 7 ref.
A47 01  1    @0 99-0549347
A60       @1 P @2 C
A61       @0 A
A64 01  1    @0 Lecture notes in computer science
A66 01      @0 DEU
A66 02      @0 USA
C01 01    ENG  @0 Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
C02 01  X    @0 001D02B07D
C03 01  X  FRE  @0 Système intelligent @5 01
C03 01  X  ENG  @0 Intelligent system @5 01
C03 01  X  SPA  @0 Sistema inteligente @5 01
C03 02  X  FRE  @0 Système information @5 02
C03 02  X  ENG  @0 Information system @5 02
C03 02  X  SPA  @0 Sistema información @5 02
C03 03  X  FRE  @0 Traitement information @5 03
C03 03  X  ENG  @0 Information processing @5 03
C03 03  X  SPA  @0 Procesamiento información @5 03
C03 04  X  FRE  @0 Traitement document @5 04
C03 04  X  ENG  @0 Document processing @5 04
C03 04  X  SPA  @0 Tratamiento documento @5 04
C03 05  X  FRE  @0 Extraction information @5 05
C03 05  X  ENG  @0 Information extraction @5 05
C03 05  X  SPA  @0 Extracción información @5 05
C03 06  X  FRE  @0 Règle association @4 INC @5 82
N21       @1 355
pR  
A30 01  1  ENG  @1 Principles of data mining and knowledge discovery. European conference @2 3 @3 Prague CZE @4 1999-09-15
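
The field lines above follow a regular layout: a tag, optional indicators and a language code, then one or more subfields introduced by `@` and a one-character code. The following is a minimal sketch of how such lines could be split into parts. It assumes this layout holds for every line, which is inferred from this record only and not from the Inist documentation, and it handles the textual rendering shown on this page, not the ISO 2709 exchange format itself. The function name `parse_field` is hypothetical.

```python
import re

# One subfield: "@", a one-character code, then text up to the next marker.
SUBFIELD = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")


def parse_field(line):
    """Return (tag, qualifiers, subfields) for one displayed field line.

    Hypothetical helper: the layout is inferred from the record above.
    """
    head, marker, rest = line.strip().partition("@")
    parts = head.split()
    if not parts:
        return None  # blank line
    tag, qualifiers = parts[0], parts[1:]
    subfields = SUBFIELD.findall(marker + rest)
    return tag, qualifiers, subfields


# Lines copied from the record above.
sample = [
    "pA  ",
    "A11 01  1    @1 FELDMAN (R.)",
    "A12 01  1    @1 ZYTKOW (Jan M.) @9 ed.",
    "C03 01  X  FRE  @0 Système intelligent @5 01",
]
parsed = [parse_field(line) for line in sample]
for entry in parsed:
    print(entry)
```

A value that itself contained an `@` followed by a single character and a space would be split incorrectly; no such value occurs in this record.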

Inist format (server)

NO : PASCAL 99-0549347 INIST
ET : Text Mining via information extraction
AU : FELDMAN (R.); AUMANN (Y.); FRESKO (M.); LIPHSTAT (O.); ROSENFELD (B.); SCHLER (Y.); ZYTKOW (Jan M.); RAUCH (Jan)
AF : Department of Mathematics and Computer Science, Bar-Ilan University/Ramat-Gan/Israël (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut.)
DT : Publication en série; Congrès; Niveau analytique
SO : Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1704; Pp. 165-173; Bibl. 7 ref.
LA : Anglais
EA : Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
CC : 001D02B07D
FD : Système intelligent; Système information; Traitement information; Traitement document; Extraction information; Règle association
ED : Intelligent system; Information system; Information processing; Document processing; Information extraction
SD : Sistema inteligente; Sistema información; Procesamiento información; Tratamiento documento; Extracción información
LO : INIST-16343.354000084589530180
ID : 99-0549347
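
The server format above is a list of `KEY : value` lines, with multi-valued fields separated by semicolons. As a rough sketch, assuming every key is exactly two uppercase letters (true of this record, not checked against the Inist documentation), the lines could be read into a dictionary as follows. The names `parse_server_record` and `split_values` are hypothetical.

```python
def parse_server_record(text):
    """Map two-letter field codes to their raw string values."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        # Assumption: a field line starts with a two-letter uppercase code.
        if sep and len(key) == 2 and key.isupper():
            record[key] = value.strip()
    return record


def split_values(value):
    """Split a multi-valued field such as AU, FD, ED or SD on semicolons."""
    return [item.strip() for item in value.split(";") if item.strip()]


# Lines copied from the record above.
SAMPLE = """NO : PASCAL 99-0549347 INIST
ET : Text Mining via information extraction
LA : Anglais
ED : Intelligent system; Information system; Information processing; Document processing; Information extraction
ID : 99-0549347"""

record = parse_server_record(SAMPLE)
english_descriptors = split_values(record["ED"])
print(record["ET"])
print(english_descriptors)
```

`split_values` should only be applied to list-like fields. The abstract (`EA`) contains semicolons as ordinary punctuation and is best kept as a single string.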

Links to Exploration step

Pascal:99-0549347

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Text Mining via information extraction</title>
<author>
<name sortKey="Feldman, R" sort="Feldman, R" uniqKey="Feldman R" first="R." last="Feldman">R. Feldman</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Aumann, Y" sort="Aumann, Y" uniqKey="Aumann Y" first="Y." last="Aumann">Y. Aumann</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Fresko, M" sort="Fresko, M" uniqKey="Fresko M" first="M." last="Fresko">M. Fresko</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Liphstat, O" sort="Liphstat, O" uniqKey="Liphstat O" first="O." last="Liphstat">O. Liphstat</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Rosenfeld, B" sort="Rosenfeld, B" uniqKey="Rosenfeld B" first="B." last="Rosenfeld">B. Rosenfeld</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Schler, Y" sort="Schler, Y" uniqKey="Schler Y" first="Y." last="Schler">Y. Schler</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">99-0549347</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 99-0549347 INIST</idno>
<idno type="RBID">Pascal:99-0549347</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000193</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Text Mining via information extraction</title>
<author>
<name sortKey="Feldman, R" sort="Feldman, R" uniqKey="Feldman R" first="R." last="Feldman">R. Feldman</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Aumann, Y" sort="Aumann, Y" uniqKey="Aumann Y" first="Y." last="Aumann">Y. Aumann</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Fresko, M" sort="Fresko, M" uniqKey="Fresko M" first="M." last="Fresko">M. Fresko</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Liphstat, O" sort="Liphstat, O" uniqKey="Liphstat O" first="O." last="Liphstat">O. Liphstat</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Rosenfeld, B" sort="Rosenfeld, B" uniqKey="Rosenfeld B" first="B." last="Rosenfeld">B. Rosenfeld</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Schler, Y" sort="Schler, Y" uniqKey="Schler Y" first="Y." last="Schler">Y. Schler</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint>
<date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Document processing</term>
<term>Information extraction</term>
<term>Information processing</term>
<term>Information system</term>
<term>Intelligent system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Système intelligent</term>
<term>Système information</term>
<term>Traitement information</term>
<term>Traitement document</term>
<term>Extraction information</term>
<term>Règle association</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent.
Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0302-9743</s0>
</fA01>
<fA05>
<s2>1704</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG">
<s1>Text Mining via information extraction</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>PKDD'99 : principles of data mining and knowledge discovery : Prague, 15-18 September 1999</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>FELDMAN (R.)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>AUMANN (Y.)</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>FRESKO (M.)</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>LIPHSTAT (O.)</s1>
</fA11>
<fA11 i1="05" i2="1">
<s1>ROSENFELD (B.)</s1>
</fA11>
<fA11 i1="06" i2="1">
<s1>SCHLER (Y.)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>ZYTKOW (Jan M.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>RAUCH (Jan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</fA14>
<fA20>
<s1>165-173</s1>
</fA20>
<fA21>
<s1>1999</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA26 i1="01">
<s0>3-540-66490-4</s0>
</fA26>
<fA43 i1="01">
<s1>INIST</s1>
<s2>16343</s2>
<s5>354000084589530180</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 1999 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>7 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>99-0549347</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Lecture notes in computer science</s0>
</fA64>
<fA66 i1="01">
<s0>DEU</s0>
</fA66>
<fA66 i1="02">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D02B07D</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Système intelligent</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Intelligent system</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Sistema inteligente</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Système information</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Information system</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Sistema información</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Traitement information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Information processing</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Procesamiento información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Traitement document</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Document processing</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Tratamiento documento</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Extraction information</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Information extraction</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Extracción información</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Règle association</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21>
<s1>355</s1>
</fN21>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>Principles of data mining and knowledge discovery. European conference</s1>
<s2>3</s2>
<s3>Prague CZE</s3>
<s4>1999-09-15</s4>
</fA30>
</pR>
</standard>
<server>
<NO>PASCAL 99-0549347 INIST</NO>
<ET>Text Mining via information extraction</ET>
<AU>FELDMAN (R.); AUMANN (Y.); FRESKO (M.); LIPHSTAT (O.); ROSENFELD (B.); SCHLER (Y.); ZYTKOW (Jan M.); RAUCH (Jan)</AU>
<AF>Department of Mathematics and Computer Science, Bar-Ilan University/Ramat-Gan/Israël (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1704; Pp. 165-173; Bibl. 7 ref.</SO>
<LA>Anglais</LA>
<EA>Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</EA>
<CC>001D02B07D</CC>
<FD>Système intelligent; Système information; Traitement information; Traitement document; Extraction information; Règle association</FD>
<ED>Intelligent system; Information system; Information processing; Document processing; Information extraction</ED>
<SD>Sistema inteligente; Sistema información; Procesamiento información; Tratamiento documento; Extracción información</SD>
<LO>INIST-16343.354000084589530180</LO>
<ID>99-0549347</ID>
</server>
</inist>
</record>
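
The `<standard>` part of the XML record can be queried with an ordinary XML library. The sketch below uses Python's `xml.etree.ElementTree` on a shortened fragment copied from the record, and pulls out the author names (`fA11`/`s1`) and the English descriptors (`fC03` with `l="ENG"`). It is only an illustration: the meaning assigned to these fields is inferred from this record. The fragment is deliberately limited to elements without a prefix, because the full record as shown uses an `inist:` prefix with no visible namespace declaration, which a namespace-aware parser would likely reject.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the <standard> section above (two authors, one descriptor).
STANDARD_XML = """<pA>
<fA11 i1="01" i2="1">
<s1>FELDMAN (R.)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>AUMANN (Y.)</s1>
</fA11>
<fC03 i1="05" i2="X" l="FRE">
<s0>Extraction information</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Information extraction</s0>
<s5>05</s5>
</fC03>
</pA>"""

root = ET.fromstring(STANDARD_XML)

# Each fA11 element holds one author name in its s1 subfield.
authors = [field.findtext("s1") for field in root.findall("fA11")]

# fC03 elements carry descriptors; the l attribute gives the language.
english_terms = [field.findtext("s0") for field in root.findall("fC03[@l='ENG']")]

print(authors)
print(english_terms)
```

Reading authors from `fA11` rather than from the `<AU>` field of the `<server>` section keeps the two editors (`fA12`, marked `ed.`) separate from the six authors.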

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000193 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000193 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:99-0549347
   |texte=   Text Mining via information extraction
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021