Strategies d'échantillonnage pour l'apprentissage par renforcement batch
Internal identifier: 000063 (PascalFrancis/Corpus); previous: 000062; next: 000064
Authors: Raphael Fonteneau; Susan A. Murphy; Louis Wehenkel; Damien Ernst
Source:
- Revue d'intelligence artificielle [0992-499X]; 2013.
Abstract
We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
Inist format (server)
NO: PASCAL 13-0216765 INIST
FT: Strategies d'échantillonnage pour l'apprentissage par renforcement batch
ET: (Sampling strategies for batch mode reinforcement learning)
AU: FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)
AF: Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)
DT: Publication en série; Niveau analytique
SO: Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4
LA: Français
EA: We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
CC: 001D02C; 001D02D08; 001D02D05
FD: Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .
ED: Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space
SD: Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado
LO: INIST-21320.354000173351010020
ID: 13-0216765
Links to Exploration step
Pascal:13-0216765
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216765</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216765 INIST</idno>
<idno type="RBID">Pascal:13-0216765</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000063</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Action</term>
<term>Active system</term>
<term>Artificial intelligence</term>
<term>Learning algorithm</term>
<term>Model matching</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimal policy</term>
<term>Reinforcement learning</term>
<term>Sampling</term>
<term>State space</term>
<term>State space method</term>
<term>Supervised learning</term>
<term>System identification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Apprentissage renforcé</term>
<term>Intelligence artificielle</term>
<term>Action</term>
<term>Système actif</term>
<term>Apprentissage supervisé</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Politique optimale</term>
<term>Echantillonnage</term>
<term>Algorithme apprentissage</term>
<term>Identification système</term>
<term>Ajustement modèle</term>
<term>Méthode espace état</term>
<term>Espace état</term>
<term>.</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0992-499X</s0>
</fA01>
<fA03 i2="1"><s0>Rev. intell. artif.</s0>
</fA03>
<fA05><s2>27</s2>
</fA05>
<fA06><s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE"><s1>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE"><s1>Apprentissage par renforcement et planification adaptative</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>FONTENEAU (Raphael)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>MURPHY (Susan A.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>WEHENKEL (Louis)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>ERNST (Damien)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>ZANUTTINI (Bruno)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>LAURENT (Guillaume)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>BUFFET (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Greyc/UCBN</s1>
<s2>Caen</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>Institut FEMTO-ST/ENSMM</s1>
<s2>Besançon</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03"><s1>LORIA/INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA20><s2>151, 171-194 [25 p.]</s2>
</fA20>
<fA21><s1>2013</s1>
</fA21>
<fA23 i1="01"><s0>FRE</s0>
</fA23>
<fA24 i1="01"><s0>eng</s0>
</fA24>
<fA43 i1="01"><s1>INIST</s1>
<s2>21320</s2>
<s5>354000173351010020</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2013 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.1/4</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>13-0216765</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01"><s0>FRA</s0>
</fA66>
<fA68 i1="01" i2="1" l="ENG"><s1>Sampling strategies for batch mode reinforcement learning</s1>
</fA68>
<fC01 i1="01" l="ENG"><s0>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D02D08</s0>
</fC02>
<fC02 i1="03" i2="X"><s0>001D02D05</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Intelligence artificielle</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Artificial intelligence</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Inteligencia artificial</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Acción</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Système actif</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Active system</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Sistema activo</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Apprentissage supervisé</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Supervised learning</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Aprendizaje supervisado</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Commande optimale</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Optimal control</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Control óptimo</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Contrôle optimal</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Optimal control (mathematics)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Control óptimo (matemáticas)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Politique optimale</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Optimal policy</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Política óptima</s0>
<s5>21</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Echantillonnage</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Sampling</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Muestreo</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Algorithme apprentissage</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Learning algorithm</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Algoritmo aprendizaje</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Identification système</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>System identification</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Identificación sistema</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Ajustement modèle</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Model matching</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Ajustamiento modelo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Méthode espace état</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG"><s0>State space method</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA"><s0>Método espacio estado</s0>
<s5>27</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE"><s0>Espace état</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG"><s0>State space</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA"><s0>Espacio estado</s0>
<s5>28</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE"><s0>.</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21><s1>203</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
<server><NO>PASCAL 13-0216765 INIST</NO>
<FT>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</FT>
<ET>(Sampling strategies for batch mode reinforcement learning)</ET>
<AU>FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)</AU>
<AF>Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4</SO>
<LA>Français</LA>
<EA>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</EA>
<CC>001D02C; 001D02D08; 001D02D05</CC>
<FD>Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .</FD>
<ED>Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space</ED>
<SD>Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado</SD>
<LO>INIST-21320.354000173351010020</LO>
<ID>13-0216765</ID>
</server>
</inist>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000063 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000063 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= PascalFrancis |étape= Corpus |type= RBID |clé= Pascal:13-0216765 |texte= Strategies d'échantillonnage pour l'apprentissage par renforcement batch }}
This area was generated with Dilib version V0.6.33.