Text Mining via information extraction

Internal identifier: 000193 (PascalFrancis/Corpus); previous: 000192; next: 000194

Authors: R. Feldman; Y. Aumann; M. Fresko; O. Liphstat; B. Rosenfeld; Y. Schler

Source :

Lecture notes in computer science [ 0302-9743 ] ; 1999.

RBID : Pascal:99-0549347

French descriptors

Pascal (Inist)
- Système intelligent, Système information, Traitement information, Traitement document, Extraction information, Règle association.

English descriptors

KwdEn :
- Document processing, Information extraction, Information processing, Information system, Intelligent system.

Abstract

Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
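The abstract describes knowledge discovery running on event and entity labels attached to each document, with a taxonomy supplying higher-level entities. The following is only a rough sketch of that general idea, not Textoscope's actual algorithm: the label sets, the taxonomy entries, and the function names `co_occurrences` and `roll_up` are all hypothetical, and simple pair counting stands in for whatever discovery tools the system really uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy: maps an extracted label to a higher-level category.
TAXONOMY = {
    "CompanyA": "Company",
    "CompanyB": "Company",
    "CompanyC": "Company",
    "merger": "Event",
    "acquisition": "Event",
}

# Hypothetical output of an extraction step: one label set per document.
DOCUMENT_LABELS = [
    {"CompanyA", "CompanyB", "merger"},
    {"CompanyA", "CompanyC", "acquisition"},
    {"CompanyA", "CompanyB", "acquisition"},
]


def co_occurrences(label_sets, min_support=2):
    """Count label pairs that appear together in at least min_support documents."""
    counts = Counter()
    for labels in label_sets:
        # Sorting makes each pair a canonical (smaller, larger) tuple.
        counts.update(combinations(sorted(labels), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def roll_up(label_sets, taxonomy):
    """Replace each label by its taxonomy category; unknown labels are kept."""
    return [{taxonomy.get(label, label) for label in labels} for labels in label_sets]


frequent_pairs = co_occurrences(DOCUMENT_LABELS)
category_pairs = co_occurrences(roll_up(DOCUMENT_LABELS, TAXONOMY))
```

Counting at the category level after `roll_up` shows why a taxonomy helps: patterns too sparse at the level of individual entities can become frequent once the entities are grouped.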

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 0302-9743`
A05				`@2 1704`
A08	`01`	`1`	`ENG`	`@1 Text Mining via information extraction`
A09	`01`	`1`	`ENG`	`@1 PKDD'99 : principles of data mining and knowledge discovery : Prague, 15-18 September 1999`
A11	`01`	`1`		`@1 FELDMAN (R.)`
A11	`02`	`1`		`@1 AUMANN (Y.)`
A11	`03`	`1`		`@1 FRESKO (M.)`
A11	`04`	`1`		`@1 LIPHSTAT (O.)`
A11	`05`	`1`		`@1 ROSENFELD (B.)`
A11	`06`	`1`		`@1 SCHLER (Y.)`
A12	`01`	`1`		`@1 ZYTKOW (Jan M.) @9 ed.`
A12	`02`	`1`		`@1 RAUCH (Jan) @9 ed.`
A14	`01`			`@1 Department of Mathematics and Computer Science, Bar-Ilan University @2 Ramat-Gan @3 ISR @Z 1 aut. @Z 2 aut. @Z 3 aut. @Z 4 aut. @Z 5 aut. @Z 6 aut.`
A20				`@1 165-173`
A21				`@1 1999`
A23	`01`			`@0 ENG`
A26	`01`			`@0 3-540-66490-4`
A43	`01`			`@1 INIST @2 16343 @5 354000084589530180`
A44				`@0 0000 @1 © 1999 INIST-CNRS. All rights reserved.`
A45				`@0 7 ref.`
A47	`01`	`1`		`@0 99-0549347`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Lecture notes in computer science`
A66	`01`			`@0 DEU`
A66	`02`			`@0 USA`
C01	`01`		`ENG`	@0 Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
C02	`01`	`X`		`@0 001D02B07D`
C03	`01`	`X`	`FRE`	`@0 Système intelligent @5 01`
C03	`01`	`X`	`ENG`	`@0 Intelligent system @5 01`
C03	`01`	`X`	`SPA`	`@0 Sistema inteligente @5 01`
C03	`02`	`X`	`FRE`	`@0 Système information @5 02`
C03	`02`	`X`	`ENG`	`@0 Information system @5 02`
C03	`02`	`X`	`SPA`	`@0 Sistema información @5 02`
C03	`03`	`X`	`FRE`	`@0 Traitement information @5 03`
C03	`03`	`X`	`ENG`	`@0 Information processing @5 03`
C03	`03`	`X`	`SPA`	`@0 Procesamiento información @5 03`
C03	`04`	`X`	`FRE`	`@0 Traitement document @5 04`
C03	`04`	`X`	`ENG`	`@0 Document processing @5 04`
C03	`04`	`X`	`SPA`	`@0 Tratamiento documento @5 04`
C03	`05`	`X`	`FRE`	`@0 Extraction information @5 05`
C03	`05`	`X`	`ENG`	`@0 Information extraction @5 05`
C03	`05`	`X`	`SPA`	`@0 Extracción información @5 05`
C03	`06`	`X`	`FRE`	`@0 Règle association @4 INC @5 82`
N21				`@1 355`

A30	`01`	`1`	`ENG`	`@1 Principles of data mining and knowledge discovery. European conference @2 3 @3 Prague CZE @4 1999-09-15`
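Each line of the listing above carries a field tag, optional indicators, and a content cell made of `@`-prefixed subfields. As a minimal sketch of how such a line could be split, assuming the cells are tab-separated and wrapped in backticks as they appear on this page (the function name `parse_field` and the regular expression are illustrative, not part of Dilib or of the Inist format definition):

```python
import re

# A subfield is "@" + one-character code + value, running up to the next subfield.
SUBFIELD = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")


def parse_field(line):
    """Split one listing line into (tag, indicators, subfields)."""
    cells = [cell.strip("` ") for cell in line.split("\t")]
    tag, middle, content = cells[0], cells[1:-1], cells[-1]
    indicators = [cell for cell in middle if cell]  # drop empty cells
    # A list of pairs, because a code such as Z may repeat within one field.
    subfields = SUBFIELD.findall(content)
    return tag, indicators, subfields


editor = parse_field("A12\t`01`\t`1`\t\t`@1 ZYTKOW (Jan M.) @9 ed.`")
affiliation = parse_field("A14\t`01`\t\t\t`@1 Bar-Ilan University @2 Ramat-Gan @3 ISR @Z 1 aut. @Z 2 aut.`")
```

Subfields are returned as a list of `(code, value)` pairs rather than a dictionary so that repeated codes, like the `@Z` author links in field A14, are preserved in order.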

Inist format (server)

NO :	PASCAL 99-0549347 INIST
ET :	Text Mining via information extraction
AU :	FELDMAN (R.); AUMANN (Y.); FRESKO (M.); LIPHSTAT (O.); ROSENFELD (B.); SCHLER (Y.); ZYTKOW (Jan M.); RAUCH (Jan)
AF :	Department of Mathematics and Computer Science, Bar-Ilan University/Ramat-Gan/Israël (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1704; Pp. 165-173; Bibl. 7 ref.
LA :	Anglais
EA :	Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.
CC :	001D02B07D
FD :	Système intelligent; Système information; Traitement information; Traitement document; Extraction information; Règle association
ED :	Intelligent system; Information system; Information processing; Document processing; Information extraction
SD :	Sistema inteligente; Sistema información; Procesamiento información; Tratamiento documento; Extracción información
LO :	INIST-16343.354000084589530180
ID :	99-0549347

Links to Exploration step

Pascal:99-0549347

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Text Mining via information extraction</title>
<author><name sortKey="Feldman, R" sort="Feldman, R" uniqKey="Feldman R" first="R." last="Feldman">R. Feldman</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Aumann, Y" sort="Aumann, Y" uniqKey="Aumann Y" first="Y." last="Aumann">Y. Aumann</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Fresko, M" sort="Fresko, M" uniqKey="Fresko M" first="M." last="Fresko">M. Fresko</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Liphstat, O" sort="Liphstat, O" uniqKey="Liphstat O" first="O." last="Liphstat">O. Liphstat</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Rosenfeld, B" sort="Rosenfeld, B" uniqKey="Rosenfeld B" first="B." last="Rosenfeld">B. Rosenfeld</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Schler, Y" sort="Schler, Y" uniqKey="Schler Y" first="Y." last="Schler">Y. Schler</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">99-0549347</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 99-0549347 INIST</idno>
<idno type="RBID">Pascal:99-0549347</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000193</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Text Mining via information extraction</title>
<author><name sortKey="Feldman, R" sort="Feldman, R" uniqKey="Feldman R" first="R." last="Feldman">R. Feldman</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Aumann, Y" sort="Aumann, Y" uniqKey="Aumann Y" first="Y." last="Aumann">Y. Aumann</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Fresko, M" sort="Fresko, M" uniqKey="Fresko M" first="M." last="Fresko">M. Fresko</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Liphstat, O" sort="Liphstat, O" uniqKey="Liphstat O" first="O." last="Liphstat">O. Liphstat</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Rosenfeld, B" sort="Rosenfeld, B" uniqKey="Rosenfeld B" first="B." last="Rosenfeld">B. Rosenfeld</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Schler, Y" sort="Schler, Y" uniqKey="Schler Y" first="Y." last="Schler">Y. Schler</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Document processing</term>
<term>Information extraction</term>
<term>Information processing</term>
<term>Information system</term>
<term>Intelligent system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système intelligent</term>
<term>Système information</term>
<term>Traitement information</term>
<term>Traitement document</term>
<term>Extraction information</term>
<term>Règle association</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent.
Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0302-9743</s0>
</fA01>
<fA05><s2>1704</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>Text Mining via information extraction</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>PKDD'99 : principles of data mining and knowledge discovery : Prague, 15-18 September 1999</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>FELDMAN (R.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>AUMANN (Y.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>FRESKO (M.)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>LIPHSTAT (O.)</s1>
</fA11>
<fA11 i1="05" i2="1"><s1>ROSENFELD (B.)</s1>
</fA11>
<fA11 i1="06" i2="1"><s1>SCHLER (Y.)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>ZYTKOW (Jan M.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>RAUCH (Jan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Department of Mathematics and Computer Science, Bar-Ilan University</s1>
<s2>Ramat-Gan</s2>
<s3>ISR</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
</fA14>
<fA20><s1>165-173</s1>
</fA20>
<fA21><s1>1999</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA26 i1="01"><s0>3-540-66490-4</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>16343</s2>
<s5>354000084589530180</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 1999 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>7 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>99-0549347</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Lecture notes in computer science</s0>
</fA64>
<fA66 i1="01"><s0>DEU</s0>
</fA66>
<fA66 i1="02"><s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02B07D</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Système intelligent</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Intelligent system</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Sistema inteligente</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Système information</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Information system</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Sistema información</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Traitement information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Information processing</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Procesamiento información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Traitement document</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Document processing</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Tratamiento documento</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Extraction information</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Information extraction</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Extracción información</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Règle association</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21><s1>355</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>Principles of data mining and knowledge discovery. European conference</s1>
<s2>3</s2>
<s3>Prague CZE</s3>
<s4>1999-09-15</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 99-0549347 INIST</NO>
<ET>Text Mining via information extraction</ET>
<AU>FELDMAN (R.); AUMANN (Y.); FRESKO (M.); LIPHSTAT (O.); ROSENFELD (B.); SCHLER (Y.); ZYTKOW (Jan M.); RAUCH (Jan)</AU>
<AF>Department of Mathematics and Computer Science, Bar-Ilan University/Ramat-Gan/Israël (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1704; Pp. 165-173; Bibl. 7 ref.</SO>
<LA>Anglais</LA>
<EA>Knowledge Discovery in Databases (KDD), also known as data mining, focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Given a collection of text documents, most approaches to text mining perform knowledge-discovery operations on labels associated with each document. At one extreme, these labels are keywords that represent the results of non-trivial keyword-labeling processes, and, at the other extreme, these labels are nothing more than a list of the words within the documents of interest. This paper presents an intermediate approach, one that we call text mining via information extraction, in which knowledge discovery takes place on a more focused collection of events and phrases that are extracted from and label each document. These events plus additional higher-level entities are then organized in a hierarchical taxonomy and are used in the knowledge discovery process. This approach was implemented in the Textoscope system. Textoscope consists of a document retrieval module which converts retrieved documents from their native formats into SGML documents used by Textoscope; an information extraction engine, which is based on a powerful attribute grammar which is augmented by a rich background knowledge; a taxonomy-creation tool by which the user can help specify higher-level entities that inform the knowledge-discovery process; and a set of knowledge-discovery tools for the resulting event-labeled documents. We evaluate our approach on a collection of newswire stories extracted by Textoscope's own agent. Our results confirm that Text Mining via information extraction serves as an accurate and powerful technique by which to manage knowledge encapsulated in large document collections.</EA>
<CC>001D02B07D</CC>
<FD>Système intelligent; Système information; Traitement information; Traitement document; Extraction information; Règle association</FD>
<ED>Intelligent system; Information system; Information processing; Document processing; Information extraction</ED>
<SD>Sistema inteligente; Sistema información; Procesamiento información; Tratamiento documento; Extracción información</SD>
<LO>INIST-16343.354000084589530180</LO>
<ID>99-0549347</ID>
</server>
</inist>
</record>
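The TEI header of the record above holds the title, author names, and keyword lists in a regular structure. The sketch below shows one way to read those values with Python's standard `xml.etree.ElementTree`. It runs on a shortened fragment that keeps only a few elements from the record. The full record uses `inist:`-prefixed elements with no visible namespace declaration, so a strict XML parser would probably reject it until such a declaration is supplied. The function name `summarize` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Shortened fragment of the record; inist:-prefixed elements are left out.
FRAGMENT = """
<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Text Mining via information extraction</title>
<author><name first="R." last="Feldman">R. Feldman</name></author>
<author><name first="Y." last="Aumann">Y. Aumann</name></author>
</titleStmt></fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Document processing</term>
<term>Information extraction</term></keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Extraction information</term></keywords>
</textClass></profileDesc>
</teiHeader></TEI></record>
"""


def summarize(xml_text):
    """Return the title, author last names, and keywords grouped by language."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext(".//titleStmt/title")
    authors = [name.get("last") for name in root.findall(".//titleStmt/author/name")]
    keywords = {}
    for group in root.iter("keywords"):
        lang = group.get(XML_NS + "lang")
        keywords[lang] = [term.text for term in group.findall("term")]
    return {"title": title, "authors": authors, "keywords": keywords}


summary = summarize(FRAGMENT)
```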

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000193 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000193 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:99-0549347
   |texte=   Text Mining via information extraction
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021

	SGML exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
