Classification structurée pour l'apprentissage par renforcement inverse
Internal identifier: 000065 (PascalFrancis/Corpus)
Authors: Edouard Klein; Bilal Piot; Matthieu Geist; Olivier Pietquin
Source: Revue d'intelligence artificielle [ISSN 0992-499X]; 2013.
Abstract
This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
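The abstract compresses the SCIRL idea into one sentence: score each action a in state s by θ·μ_E(s, a), where μ_E is the expert's feature expectation, train a multiclass classifier on the expert's own state-action pairs, and read the learned θ back as the weights of a reward R(s, a) = θ·φ(s, a). The sketch below is a minimal illustration of that principle, not the authors' implementation: it assumes per-action block features, estimates μ_E with a single-trajectory Monte-Carlo heuristic (the kind of heuristic the abstract alludes to), and uses a plain multiclass perceptron as the classifier.

```python
import numpy as np

def block_phi(phi_s, a, n_actions):
    """phi(s, a): state features copied into the block of action a."""
    d = len(phi_s)
    out = np.zeros(n_actions * d)
    out[a * d:(a + 1) * d] = phi_s
    return out

def expert_mu(traj, n_actions, gamma):
    """Heuristic estimate of mu_E(s_t, a) from one expert trajectory
    (a list of (phi_s, action) pairs): phi(s_t, a) plus the discounted
    expert continuation from t+1 onward."""
    d = len(traj[0][0])
    # G[t] = sum_{k>=t} gamma^(k-t) * phi(s_k, a_k), with G[T] = 0
    G = np.zeros((len(traj) + 1, n_actions * d))
    for t in reversed(range(len(traj))):
        phi_s, a = traj[t]
        G[t] = block_phi(phi_s, a, n_actions) + gamma * G[t + 1]
    samples = []
    for t, (phi_s, a) in enumerate(traj):
        mu = np.stack([block_phi(phi_s, b, n_actions) + gamma * G[t + 1]
                       for b in range(n_actions)])
        samples.append((mu, a))  # mu[b] approximates mu_E(s_t, b)
    return samples

def scirl(trajectories, n_actions, gamma=0.9, epochs=50):
    """Multiclass perceptron whose score for action a is theta . mu_E(s, a);
    the learned theta parameterizes the reward R(s, a) = theta . phi(s, a)."""
    data = [x for traj in trajectories
            for x in expert_mu(traj, n_actions, gamma)]
    theta = np.zeros(data[0][0].shape[1])
    for _ in range(epochs):
        for mu, a in data:
            a_hat = int(np.argmax(mu @ theta))
            if a_hat != a:
                theta += mu[a] - mu[a_hat]  # perceptron update
    return theta
```

Given the returned θ, the reward θ·φ(s, a) can be handed to any standard RL solver; the paper's point is precisely that no such solving step is needed inside SCIRL itself.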
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
Inist format (server)

| Field | Value |
|---|---|
| NO | PASCAL 13-0216741 INIST |
| FT | Classification structurée pour l'apprentissage par renforcement inverse |
| ET | (Structured classification for inverse reinforcement learning) |
| AU | KLEIN (Edouard); PIOT (Bilal); GEIST (Matthieu); PIETQUIN (Olivier); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier) |
| AF | LORIA - équipe ABC/Nancy/France (1 aut.); Supélec - Groupe de recherche IMS-MaLIS/Metz/France (1 aut., 2 aut., 3 aut., 4 aut.); UMI 2958 (GeorgiaTech-CNRS)/Metz/France (2 aut., 4 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.) |
| DT | Publication en série; Niveau analytique |
| SO | Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 155-169 [16 p.]; Abs. anglais; Bibl. 1 p. |
| LA | Français |
| EA | This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator. |
| CC | 001D02C02; 001D15C |
| FD | Classification; Structure donnée; Apprentissage renforcé; Paramétrisation; Simulateur; Récompense; Politique; Automobile; Conduite véhicule; Structure interne; Algorithme apprentissage; Problème inverse; Problème direct; Méthode heuristique |
| ED | Classification; Data structure; Reinforcement learning; Parameterization; Simulator; Reward; Policy; Motor car; Vehicle driving; Internal structure; Learning algorithm; Inverse problem; Direct problem; Heuristic method |
| SD | Clasificación; Estructura datos; Aprendizaje reforzado; Parametrización; Simulador; Recompensa; Política; Automóvil; Conducción vehículo; Estructura interna; Algoritmo aprendizaje; Problema inverso; Problema directo; Método heurístico |
| LO | INIST-21320.354000173351010010 |
| ID | 13-0216741 |
Links to Exploration step: Pascal:13-0216741

The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Classification structurée pour l'apprentissage par renforcement inverse</title>
<author><name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - équipe ABC</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Piot, Bilal" sort="Piot, Bilal" uniqKey="Piot B" first="Bilal" last="Piot">Bilal Piot</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Pietquin, Olivier" sort="Pietquin, Olivier" uniqKey="Pietquin O" first="Olivier" last="Pietquin">Olivier Pietquin</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216741</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216741 INIST</idno>
<idno type="RBID">Pascal:13-0216741</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000065</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Classification structurée pour l'apprentissage par renforcement inverse</title>
<author><name sortKey="Klein, Edouard" sort="Klein, Edouard" uniqKey="Klein E" first="Edouard" last="Klein">Edouard Klein</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - équipe ABC</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Piot, Bilal" sort="Piot, Bilal" uniqKey="Piot B" first="Bilal" last="Piot">Bilal Piot</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Pietquin, Olivier" sort="Pietquin, Olivier" uniqKey="Pietquin O" first="Olivier" last="Pietquin">Olivier Pietquin</name>
<affiliation><inist:fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Classification</term>
<term>Data structure</term>
<term>Direct problem</term>
<term>Heuristic method</term>
<term>Internal structure</term>
<term>Inverse problem</term>
<term>Learning algorithm</term>
<term>Motor car</term>
<term>Parameterization</term>
<term>Policy</term>
<term>Reinforcement learning</term>
<term>Reward</term>
<term>Simulator</term>
<term>Vehicle driving</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Classification</term>
<term>Structure donnée</term>
<term>Apprentissage renforcé</term>
<term>Paramétrisation</term>
<term>Simulateur</term>
<term>Récompense</term>
<term>Politique</term>
<term>Automobile</term>
<term>Conduite véhicule</term>
<term>Structure interne</term>
<term>Algorithme apprentissage</term>
<term>Problème inverse</term>
<term>Problème direct</term>
<term>Méthode heuristique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclasse classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0992-499X</s0>
</fA01>
<fA03 i2="1"><s0>Rev. intell. artif.</s0>
</fA03>
<fA05><s2>27</s2>
</fA05>
<fA06><s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE"><s1>Classification structurée pour l'apprentissage par renforcement inverse</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE"><s1>Apprentissage par renforcement et planification adaptative</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>KLEIN (Edouard)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>PIOT (Bilal)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>GEIST (Matthieu)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>PIETQUIN (Olivier)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>ZANUTTINI (Bruno)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>LAURENT (Guillaume)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>BUFFET (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>LORIA - équipe ABC</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>Supélec - Groupe de recherche IMS-MaLIS</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="03"><s1>UMI 2958 (GeorgiaTech-CNRS)</s1>
<s2>Metz</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Greyc/UCBN</s1>
<s2>Caen</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>Institut FEMTO-ST/ENSMM</s1>
<s2>Besançon</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03"><s1>LORIA/INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA20><s2>151, 155-169 [16 p.]</s2>
</fA20>
<fA21><s1>2013</s1>
</fA21>
<fA23 i1="01"><s0>FRE</s0>
</fA23>
<fA24 i1="01"><s0>eng</s0>
</fA24>
<fA43 i1="01"><s1>INIST</s1>
<s2>21320</s2>
<s5>354000173351010010</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2013 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>13-0216741</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01"><s0>FRA</s0>
</fA66>
<fA68 i1="01" i2="1" l="ENG"><s1>Structured classification for inverse reinforcement learning</s1>
</fA68>
<fC01 i1="01" l="ENG"><s0>This paper adresses the inverse reinforcement learning (IRL) problem, that is inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclasse classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most of existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C02</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D15C</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Classification</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Classification</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Clasificación</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Structure donnée</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Data structure</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Estructura datos</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Paramétrisation</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Parameterization</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Parametrización</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Simulateur</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Simulator</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Simulador</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Récompense</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Reward</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Recompensa</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Politique</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Policy</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Política</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Automobile</s0>
<s5>20</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Motor car</s0>
<s5>20</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Automóvil</s0>
<s5>20</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Conduite véhicule</s0>
<s5>21</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Vehicle driving</s0>
<s5>21</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Conducción vehículo</s0>
<s5>21</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Structure interne</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Internal structure</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Estructura interna</s0>
<s5>23</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Algorithme apprentissage</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>Learning algorithm</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Algoritmo aprendizaje</s0>
<s5>24</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Problème inverse</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Inverse problem</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Problema inverso</s0>
<s5>25</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Problème direct</s0>
<s5>26</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG"><s0>Direct problem</s0>
<s5>26</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA"><s0>Problema directo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE"><s0>Méthode heuristique</s0>
<s5>27</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG"><s0>Heuristic method</s0>
<s5>27</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA"><s0>Método heurístico</s0>
<s5>27</s5>
</fC03>
<fN21><s1>203</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
<server><NO>PASCAL 13-0216741 INIST</NO>
<FT>Classification structurée pour l'apprentissage par renforcement inverse</FT>
<ET>(Structured classification for inverse reinforcement learning)</ET>
<AU>KLEIN (Edouard); PIOT (Bilal); GEIST (Matthieu); PIETQUIN (Olivier); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)</AU>
<AF>LORIA - équipe ABC/Nancy/France (1 aut.); Supélec - Groupe de recherche IMS-MaLIS/Metz/France (1 aut., 2 aut., 3 aut., 4 aut.); UMI 2958 (GeorgiaTech-CNRS)/Metz/France (2 aut., 4 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 155-169 [16 p.]; Abs. anglais; Bibl. 1 p.</SO>
<LA>Français</LA>
<EA>This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.</EA>
<CC>001D02C02; 001D15C</CC>
<FD>Classification; Structure donnée; Apprentissage renforcé; Paramétrisation; Simulateur; Récompense; Politique; Automobile; Conduite véhicule; Structure interne; Algorithme apprentissage; Problème inverse; Problème direct; Méthode heuristique</FD>
<ED>Classification; Data structure; Reinforcement learning; Parameterization; Simulator; Reward; Policy; Motor car; Vehicle driving; Internal structure; Learning algorithm; Inverse problem; Direct problem; Heuristic method</ED>
<SD>Clasificación; Estructura datos; Aprendizaje reforzado; Parametrización; Simulador; Recompensa; Política; Automóvil; Conducción vehículo; Estructura interna; Algoritmo aprendizaje; Problema inverso; Problema directo; Método heurístico</SD>
<LO>INIST-21320.354000173351010010</LO>
<ID>13-0216741</ID>
</server>
</inist>
</record>
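To work with the XML record above programmatically rather than through Dilib, something like the sketch below can pull out the title, keywords, and abstract. It assumes the record has been saved as record.xml (a hypothetical file name); note that the inist: prefix is never bound to a namespace anywhere in the document, so a recovering parser (here lxml's) is used instead of the stricter standard-library xml.etree.

```python
from lxml import etree

# The inist: prefix is undeclared in the record, so ask libxml2 to
# recover from the namespace error instead of aborting the parse.
parser = etree.XMLParser(recover=True)
tree = etree.parse("record.xml", parser)  # hypothetical file name

title = tree.findtext(".//title")               # first <title> element
keywords = [t.text for t in tree.iter("term")]  # KwdEn + Pascal terms
abstract = tree.findtext('.//div[@type="abstract"]')

print(title)
print(keywords)
print(abstract)
```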
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000065 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000065 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= PascalFrancis |étape= Corpus |type= RBID |clé= Pascal:13-0216741 |texte= Classification structurée pour l'apprentissage par renforcement inverse }}
This area was generated with Dilib version V0.6.33.