Exploration server on the visibility of Le Havre

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Extraction and clustering for regularities identification : application to dialogues analysis

Internal identifier: 000096 (Hal/Checkpoint); previous: 000095; next: 000097

Author: Zacharie Ales [France]

RBID : Hal:tel-01165590

Abstract

In the context of dialogue analysis, a corpus of dialogues can be represented as a set of annotation arrays encoding the dialogue utterances. To identify frequently used dialogue schemes, we design a two-step methodology in which recurrent patterns are first extracted and then partitioned into homogeneous classes constituting the regularities. Two methods are developed to extract recurrent patterns: LPCA-DC and SABRE. The former is an adaptation of a dynamic programming algorithm, whereas the latter is obtained from a formal model of the problem of extracting local alignments from a pair of annotation arrays. The partitioning of recurrent patterns is carried out using various heuristics from the literature as well as two original formulations of the K-partitioning problem as mixed integer linear programs. Through a polyhedral study of a polyhedron associated with these formulations, facets are characterized (in particular 2-chorded cycle inequalities, 2-partition inequalities and general clique inequalities). These theoretical results make it possible to design an efficient cutting-plane algorithm. We developed a decision support software tool called VIESA, which implements these methods and allowed their evaluation in two experiments conducted by a psychologist. As a result, regularities corresponding to dialogical strategies that previous manual extractions had failed to identify are obtained.

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Extraction and clustering for regularities identification : application to dialogues analysis</title>
<title xml:lang="fr">Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues</title>
<author>
<name sortKey="Ales, Zacharie" sort="Ales, Zacharie" uniqKey="Ales Z" first="Zacharie" last="Ales">Zacharie Ales</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:tel-01165590</idno>
<idno type="halId">tel-01165590</idno>
<idno type="halUri">https://tel.archives-ouvertes.fr/tel-01165590</idno>
<idno type="url">https://tel.archives-ouvertes.fr/tel-01165590</idno>
<date when="2014-11-28">2014-11-28</date>
<idno type="wicri:Area/Hal/Corpus">000121</idno>
<idno type="wicri:Area/Hal/Curation">000121</idno>
<idno type="wicri:Area/Hal/Checkpoint">000096</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Extraction and clustering for regularities identification : application to dialogues analysis</title>
<title xml:lang="fr">Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues</title>
<author>
<name sortKey="Ales, Zacharie" sort="Ales, Zacharie" uniqKey="Ales Z" first="Zacharie" last="Ales">Zacharie Ales</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>Combinatorial optimization</term>
<term>Data mining</term>
<term>Regularity extraction</term>
</keywords>
<keywords scheme="mix" xml:lang="fr">
<term>Approche polyèdrale</term>
<term>Extraction de régularités</term>
<term>K-partitionnement</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In the context of dialogue analysis, a corpus of dialogues can be represented as a set of annotation arrays encoding the dialogue utterances. To identify frequently used dialogue schemes, we design a two-step methodology in which recurrent patterns are first extracted and then partitioned into homogeneous classes constituting the regularities. Two methods are developed to extract recurrent patterns: LPCA-DC and SABRE. The former is an adaptation of a dynamic programming algorithm, whereas the latter is obtained from a formal model of the problem of extracting local alignments from a pair of annotation arrays. The partitioning of recurrent patterns is carried out using various heuristics from the literature as well as two original formulations of the K-partitioning problem as mixed integer linear programs. Through a polyhedral study of a polyhedron associated with these formulations, facets are characterized (in particular 2-chorded cycle inequalities, 2-partition inequalities and general clique inequalities). These theoretical results make it possible to design an efficient cutting-plane algorithm. We developed a decision support software tool called VIESA, which implements these methods and allowed their evaluation in two experiments conducted by a psychologist. As a result, regularities corresponding to dialogical strategies that previous manual extractions had failed to identify are obtained.</div>
</front>
</TEI>
<hal api="V3">
<titleStmt>
<title xml:lang="en">Extraction and clustering for regularities identification : application to dialogues analysis</title>
<title xml:lang="fr">Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues</title>
<author role="aut">
<persName>
<forename type="first">Zacharie</forename>
<forename type="middle">Ales</forename>
<surname>Ales</surname>
</persName>
<email></email>
<idno type="idhal">zacharie-ales</idno>
<idno type="halauthor">1179118</idno>
<idno type="IdRef">http://www.idref.fr/182408558</idno>
<idno type="ORCID">http://orcid.org/0000-0003-4602-2638</idno>
<affiliation ref="#struct-23832"></affiliation>
<affiliation ref="#struct-90"></affiliation>
</author>
<editor role="depositor">
<persName>
<forename>ABES</forename>
<surname>STAR</surname>
</persName>
<email>thelec@abes.fr</email>
</editor>
</titleStmt>
<editionStmt>
<edition n="v1" type="current">
<date type="whenSubmitted">2015-06-19 14:55:08</date>
<date type="whenModified">2016-02-09 20:19:15</date>
<date type="whenReleased">2015-06-19 15:32:00</date>
<date type="whenProduced">2014-11-28</date>
<date type="whenEndEmbargoed">2015-06-19</date>
<ref type="file" target="https://tel.archives-ouvertes.fr/tel-01165590/document">
<date notBefore="2015-06-19"></date>
</ref>
<ref type="file" subtype="author" n="1" target="https://tel.archives-ouvertes.fr/tel-01165590/file/Zacharie_ALES.pdf">
<date notBefore="2015-06-19"></date>
</ref>
</edition>
<respStmt>
<resp>contributor</resp>
<name key="131274">
<persName>
<forename>ABES</forename>
<surname>STAR</surname>
</persName>
<email>thelec@abes.fr</email>
</name>
</respStmt>
</editionStmt>
<publicationStmt>
<distributor>CCSD</distributor>
<idno type="halId">tel-01165590</idno>
<idno type="halUri">https://tel.archives-ouvertes.fr/tel-01165590</idno>
<idno type="halBibtex">ales:tel-01165590</idno>
<idno type="halRefHtml">Mathématiques générales [math.GM]. INSA de Rouen, 2014. Français. &lt;NNT : 2014ISAM0015&gt;</idno>
<idno type="halRef">Mathématiques générales [math.GM]. INSA de Rouen, 2014. Français. &lt;NNT : 2014ISAM0015&gt;</idno>
</publicationStmt>
<seriesStmt>
<idno type="stamp" n="STAR">STAR - Dépôt national des thèses électroniques</idno>
<idno type="stamp" n="UNIV-LEHAVRE">Université du Havre</idno>
<idno type="stamp" n="UNIV-ROUEN">Université de Rouen</idno>
<idno type="stamp" n="INSMI">CNRS-INSMI - INstitut des Sciences Mathématiques et de leurs Interactions</idno>
<idno type="stamp" n="LITIS">Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</idno>
<idno type="stamp" n="THESES-EN-LIGNE-DAGROCAMPUS-OUEST">Thèses d'AGROCAMPUS OUEST</idno>
<idno type="stamp" n="COMUE-NORMANDIE">Normandie Université</idno>
<idno type="stamp" n="LMI-ROUEN">Laboratoire de Mathématiques de l'INSA Rouen - Normandie Université - INSA Rouen</idno>
</seriesStmt>
<notesStmt></notesStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Extraction and clustering for regularities identification : application to dialogues analysis</title>
<title xml:lang="fr">Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues</title>
<author role="aut">
<persName>
<forename type="first">Zacharie</forename>
<forename type="middle">Ales</forename>
<surname>Ales</surname>
</persName>
<idno type="idHal">zacharie-ales</idno>
<idno type="halAuthorId">1179118</idno>
<idno type="IdRef">http://www.idref.fr/182408558</idno>
<idno type="ORCID">http://orcid.org/0000-0003-4602-2638</idno>
<affiliation ref="#struct-23832"></affiliation>
<affiliation ref="#struct-90"></affiliation>
</author>
</analytic>
<monogr>
<idno type="nnt">2014ISAM0015</idno>
<imprint>
<date type="dateDefended">2014-11-28</date>
</imprint>
<authority type="institution">INSA de Rouen</authority>
<authority type="school">Ecole Doctorale Sciences Physiques Mathématiques et de l'Information pour l'ingénieur (Saint-Etienne-du-Rouvray, Seine-Maritime)</authority>
<authority type="supervisor">Laurent Vercouter</authority>
<authority type="supervisor">Christian Gout</authority>
<authority type="jury">Colin De La Higuera [Président]</authority>
<authority type="jury">Martine Labbé [Rapporteur]</authority>
<authority type="jury">Ali Ridha Mahjoub [Rapporteur]</authority>
<authority type="jury">Olivier Pietquin [Rapporteur]</authority>
<authority type="jury">Arnaud Knippel</authority>
<authority type="jury">Alexandre Pauchet</authority>
</monogr>
</biblStruct>
</sourceDesc>
<profileDesc>
<langUsage>
<language ident="fr">French</language>
</langUsage>
<textClass>
<keywords scheme="author">
<term xml:lang="en">Combinatorial optimization</term>
<term xml:lang="en">Regularity extraction</term>
<term xml:lang="en">Data mining</term>
<term xml:lang="fr">Extraction de régularités</term>
<term xml:lang="fr">K-partitionnement</term>
<term xml:lang="fr">Approche polyèdrale</term>
</keywords>
<classCode scheme="halDomain" n="math.math-gm">Mathematics [math]/General Mathematics [math.GM]</classCode>
<classCode scheme="halDomain" n="info.info-cl">Computer Science [cs]/Computation and Language [cs.CL]</classCode>
<classCode scheme="halTypology" n="THESE">Theses</classCode>
</textClass>
<abstract xml:lang="en">In the context of dialogue analysis, a corpus of dialogues can be represented as a set of annotation arrays encoding the dialogue utterances. To identify frequently used dialogue schemes, we design a two-step methodology in which recurrent patterns are first extracted and then partitioned into homogeneous classes constituting the regularities. Two methods are developed to extract recurrent patterns: LPCA-DC and SABRE. The former is an adaptation of a dynamic programming algorithm, whereas the latter is obtained from a formal model of the problem of extracting local alignments from a pair of annotation arrays. The partitioning of recurrent patterns is carried out using various heuristics from the literature as well as two original formulations of the K-partitioning problem as mixed integer linear programs. Through a polyhedral study of a polyhedron associated with these formulations, facets are characterized (in particular 2-chorded cycle inequalities, 2-partition inequalities and general clique inequalities). These theoretical results make it possible to design an efficient cutting-plane algorithm. We developed a decision support software tool called VIESA, which implements these methods and allowed their evaluation in two experiments conducted by a psychologist. As a result, regularities corresponding to dialogical strategies that previous manual extractions had failed to identify are obtained.</abstract>
<abstract xml:lang="fr">Dans le cadre de l’aide à l’analyse de dialogues, un corpus de dialogues peut être représenté par un ensemble de tableaux d’annotations encodant les différents énoncés des dialogues. Afin d’identifier des schémas dialogiques mis en oeuvre fréquemment, nous définissons une méthodologie en deux étapes : extraction de motifs récurrents, puis partitionnement de ces motifs en classes homogènes constituant ces régularités. Deux méthodes sont développées afin de réaliser l’extraction de motifs récurrents : LPCADC et SABRE. La première est une adaptation d’un algorithme de programmation dynamique tandis que la seconde est issue d’une modélisation formelle du problème d’extraction d’alignements locaux dans un couple de tableaux d’annotations. Le partitionnement de motifs récurrents est réalisé par diverses heuristiques de la littérature ainsi que deux formulations originales du problème de K-partitionnement sous la forme de programmes linéaires en nombres entiers. Lors d’une étude polyèdrale, nous caractérisons des facettes d’un polyèdre associé à ces formulations (notamment les inégalités de 2-partitions, les inégalités 2-chorded cycles et les inégalités de clique généralisées). Ces résultats théoriques permettent la mise en place d’un algorithme de plans coupants résolvant efficacement le problème. Nous développons le logiciel d’aide à la décision VIESA, mettant en oeuvre ces différentes méthodes et permettant leur évaluation au cours de deux expérimentations réalisées par un expert psychologue. Des régularités correspondant à des stratégies dialogiques que des extractions manuelles n’avaient pas permis d’obtenir sont ainsi identifiées.</abstract>
</profileDesc>
</hal>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/France/explor/LeHavreV1/Data/Hal/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000096 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Hal/Checkpoint/biblio.hfd -nk 000096 | SxmlIndent | more
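
Without Dilib, the record can also be read with a general-purpose XML parser. The following is a minimal sketch in Python using only the standard library. The function name keywords_by_language and the inline FRAGMENT constant are illustrative assumptions and are not part of Dilib or Wicri. The fragment is copied from the <textClass> element of the TEI header above. It is used instead of the full record because the full record relies on the wicri: and hal: prefixes, whose declarations do not appear in the text shown here, so a namespace-aware parser would be expected to reject it as is.

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is predefined by the XML specification; ElementTree
# exposes the xml:lang attribute under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Fragment copied from the <textClass> element of the TEI header above.
FRAGMENT = """<textClass>
<keywords scheme="mix" xml:lang="en">
<term>Combinatorial optimization</term>
<term>Data mining</term>
<term>Regularity extraction</term>
</keywords>
<keywords scheme="mix" xml:lang="fr">
<term>Approche polyèdrale</term>
<term>Extraction de régularités</term>
<term>K-partitionnement</term>
</keywords>
</textClass>"""


def keywords_by_language(xml_text):
    """Return {language code: [terms]} for every <keywords> element.

    Hypothetical helper written for this illustration only.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for keywords in root.iter("keywords"):
        # "und" (undetermined) is used when no xml:lang attribute is present.
        lang = keywords.get(XML_LANG, "und")
        terms = [term.text.strip() for term in keywords.findall("term") if term.text]
        result.setdefault(lang, []).extend(terms)
    return result


if __name__ == "__main__":
    for lang, terms in keywords_by_language(FRAGMENT).items():
        print(lang, terms)
```

Grouping by xml:lang mirrors the way the TEI header separates its English and French keyword lists; a record whose keywords carry the language on each <term> instead, as in the <hal> part above, would need the attribute read from the term rather than from its parent.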

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/France
   |area=    LeHavreV1
   |flux=    Hal
   |étape=   Checkpoint
   |type=    RBID
   |clé=     Hal:tel-01165590
   |texte=   Extraction and clustering for regularities identification : application to dialogues analysis
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 14:37:02 2016. Site generation: Tue Mar 5 08:25:07 2024