Exploration server on computer science research in Lorraine

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information is therefore not validated.

Strategies d'échantillonnage pour l'apprentissage par renforcement batch

Internal identifier: 000063 (PascalFrancis/Corpus); previous: 000062; next: 000064


Authors: Raphael Fonteneau; Susan A. Murphy; Louis Wehenkel; Damien Ernst

Source:

RBID: Pascal:13-0216765

French descriptors

English descriptors

Abstract

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
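For illustration only, here is a minimal toy sketch (not the authors' algorithm) of the "falsification-driven" experiment selection the abstract describes: candidate state-action pairs are scored by how likely a new sample there is to overturn the current hypothesis about the optimal policy, approximated here by a simple heuristic that trades off model uncertainty against the margin of the current greedy action. All names (Q, uncertainty, falsification_score) are hypothetical.

# Illustrative sketch, assuming a tabular setting with a current Q estimate
# and a per-pair uncertainty measure (e.g. disagreement between fitted models).
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
Q = rng.normal(size=(n_states, n_actions))                        # current value estimates
uncertainty = rng.uniform(0.0, 1.0, size=(n_states, n_actions))   # stand-in for model disagreement

def falsification_score(s, a):
    # A sample at (s, a) is deemed informative when its uncertainty is large
    # relative to the margin by which the current greedy action beats a,
    # i.e. when it could plausibly change the current policy hypothesis.
    margin = Q[s].max() - Q[s, a]
    return uncertainty[s, a] - margin

candidates = [(s, a) for s in range(n_states) for a in range(n_actions)]
best = max(candidates, key=lambda sa: falsification_score(*sa))
print("next experiment (state, action):", best)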

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0992-499X
A03   1    @0 Rev. intell. artif.
A05       @2 27
A06       @2 2
A08 01  1  FRE  @1 Strategies d'échantillonnage pour l'apprentissage par renforcement batch
A09 01  1  FRE  @1 Apprentissage par renforcement et planification adaptative
A11 01  1    @1 FONTENEAU (Raphael)
A11 02  1    @1 MURPHY (Susan A.)
A11 03  1    @1 WEHENKEL (Louis)
A11 04  1    @1 ERNST (Damien)
A12 01  1    @1 ZANUTTINI (Bruno) @9 ed.
A12 02  1    @1 LAURENT (Guillaume) @9 ed.
A12 03  1    @1 BUFFET (Olivier) @9 ed.
A14 01      @1 Université de Liège @3 BEL @Z 1 aut. @Z 3 aut. @Z 4 aut.
A14 02      @1 Université du Michigan @3 USA @Z 2 aut.
A15 01      @1 Greyc/UCBN @2 Caen @3 FRA @Z 1 aut.
A15 02      @1 Institut FEMTO-ST/ENSMM @2 Besançon @3 FRA @Z 2 aut.
A15 03      @1 LORIA/INRIA @2 Nancy @3 FRA @Z 3 aut.
A20       @2 151, 171-194 [25 p.]
A21       @1 2013
A23 01      @0 FRE
A24 01      @0 eng
A43 01      @1 INIST @2 21320 @5 354000173351010020
A44       @0 0000 @1 © 2013 INIST-CNRS. All rights reserved.
A45       @0 1 p.1/4
A47 01  1    @0 13-0216765
A60       @1 P
A61       @0 A
A64 01  1    @0 Revue d'intelligence artificielle
A66 01      @0 FRA
A68 01  1  ENG  @1 Sampling strategies for batch mode reinforcement learning
C01 01    ENG  @0 We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
C02 01  X    @0 001D02C
C02 02  X    @0 001D02D08
C02 03  X    @0 001D02D05
C03 01  X  FRE  @0 Apprentissage renforcé @5 06
C03 01  X  ENG  @0 Reinforcement learning @5 06
C03 01  X  SPA  @0 Aprendizaje reforzado @5 06
C03 02  X  FRE  @0 Intelligence artificielle @5 07
C03 02  X  ENG  @0 Artificial intelligence @5 07
C03 02  X  SPA  @0 Inteligencia artificial @5 07
C03 03  X  FRE  @0 Action @5 08
C03 03  X  ENG  @0 Action @5 08
C03 03  X  SPA  @0 Acción @5 08
C03 04  X  FRE  @0 Système actif @5 09
C03 04  X  ENG  @0 Active system @5 09
C03 04  X  SPA  @0 Sistema activo @5 09
C03 05  X  FRE  @0 Apprentissage supervisé @5 18
C03 05  X  ENG  @0 Supervised learning @5 18
C03 05  X  SPA  @0 Aprendizaje supervisado @5 18
C03 06  X  FRE  @0 Commande optimale @5 19
C03 06  X  ENG  @0 Optimal control @5 19
C03 06  X  SPA  @0 Control óptimo @5 19
C03 07  X  FRE  @0 Contrôle optimal @5 20
C03 07  X  ENG  @0 Optimal control (mathematics) @5 20
C03 07  X  SPA  @0 Control óptimo (matemáticas) @5 20
C03 08  X  FRE  @0 Politique optimale @5 21
C03 08  X  ENG  @0 Optimal policy @5 21
C03 08  X  SPA  @0 Política óptima @5 21
C03 09  X  FRE  @0 Echantillonnage @5 23
C03 09  X  ENG  @0 Sampling @5 23
C03 09  X  SPA  @0 Muestreo @5 23
C03 10  X  FRE  @0 Algorithme apprentissage @5 24
C03 10  X  ENG  @0 Learning algorithm @5 24
C03 10  X  SPA  @0 Algoritmo aprendizaje @5 24
C03 11  X  FRE  @0 Identification système @5 25
C03 11  X  ENG  @0 System identification @5 25
C03 11  X  SPA  @0 Identificación sistema @5 25
C03 12  X  FRE  @0 Ajustement modèle @5 26
C03 12  X  ENG  @0 Model matching @5 26
C03 12  X  SPA  @0 Ajustamiento modelo @5 26
C03 13  X  FRE  @0 Méthode espace état @5 27
C03 13  X  ENG  @0 State space method @5 27
C03 13  X  SPA  @0 Método espacio estado @5 27
C03 14  X  FRE  @0 Espace état @5 28
C03 14  X  ENG  @0 State space @5 28
C03 14  X  SPA  @0 Espacio estado @5 28
C03 15  X  FRE  @0 . @4 INC @5 82
N21       @1 203
N44 01      @1 OTO
N82       @1 OTO

Inist format (server)

NO : PASCAL 13-0216765 INIST
FT : Strategies d'échantillonnage pour l'apprentissage par renforcement batch
ET : (Sampling strategies for batch mode reinforcement learning)
AU : FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)
AF : Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)
DT : Publication en série; Niveau analytique
SO : Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4
LA : Français
EA : We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
CC : 001D02C; 001D02D08; 001D02D05
FD : Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .
ED : Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space
SD : Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado
LO : INIST-21320.354000173351010020
ID : 13-0216765

Links to Exploration step

Pascal:13-0216765

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author>
<name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216765</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216765 INIST</idno>
<idno type="RBID">Pascal:13-0216765</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000063</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author>
<name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Action</term>
<term>Active system</term>
<term>Artificial intelligence</term>
<term>Learning algorithm</term>
<term>Model matching</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimal policy</term>
<term>Reinforcement learning</term>
<term>Sampling</term>
<term>State space</term>
<term>State space method</term>
<term>Supervised learning</term>
<term>System identification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Apprentissage renforcé</term>
<term>Intelligence artificielle</term>
<term>Action</term>
<term>Système actif</term>
<term>Apprentissage supervisé</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Politique optimale</term>
<term>Echantillonnage</term>
<term>Algorithme apprentissage</term>
<term>Identification système</term>
<term>Ajustement modèle</term>
<term>Méthode espace état</term>
<term>Espace état</term>
<term>.</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0992-499X</s0>
</fA01>
<fA03 i2="1">
<s0>Rev. intell. artif.</s0>
</fA03>
<fA05>
<s2>27</s2>
</fA05>
<fA06>
<s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE">
<s1>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE">
<s1>Apprentissage par renforcement et planification adaptative</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>FONTENEAU (Raphael)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>MURPHY (Susan A.)</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>WEHENKEL (Louis)</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>ERNST (Damien)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>ZANUTTINI (Bruno)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>LAURENT (Guillaume)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1">
<s1>BUFFET (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01">
<s1>Greyc/UCBN</s1>
<s2>Caen</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02">
<s1>Institut FEMTO-ST/ENSMM</s1>
<s2>Besançon</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03">
<s1>LORIA/INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA20>
<s2>151, 171-194 [25 p.]</s2>
</fA20>
<fA21>
<s1>2013</s1>
</fA21>
<fA23 i1="01">
<s0>FRE</s0>
</fA23>
<fA24 i1="01">
<s0>eng</s0>
</fA24>
<fA43 i1="01">
<s1>INIST</s1>
<s2>21320</s2>
<s5>354000173351010020</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2013 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>1 p.1/4</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>13-0216765</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01">
<s0>FRA</s0>
</fA66>
<fA68 i1="01" i2="1" l="ENG">
<s1>Sampling strategies for batch mode reinforcement learning</s1>
</fA68>
<fC01 i1="01" l="ENG">
<s0>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D02C</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001D02D08</s0>
</fC02>
<fC02 i1="03" i2="X">
<s0>001D02D05</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Apprentissage renforcé</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Reinforcement learning</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Aprendizaje reforzado</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Intelligence artificielle</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Artificial intelligence</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Inteligencia artificial</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Acción</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Système actif</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Active system</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Sistema activo</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Apprentissage supervisé</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Supervised learning</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Aprendizaje supervisado</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Commande optimale</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Optimal control</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Control óptimo</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Contrôle optimal</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Optimal control (mathematics)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Control óptimo (matemáticas)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Politique optimale</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Optimal policy</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Política óptima</s0>
<s5>21</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Echantillonnage</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Sampling</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Muestreo</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Algorithme apprentissage</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Learning algorithm</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Algoritmo aprendizaje</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Identification système</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>System identification</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Identificación sistema</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Ajustement modèle</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Model matching</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Ajustamiento modelo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE">
<s0>Méthode espace état</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG">
<s0>State space method</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA">
<s0>Método espacio estado</s0>
<s5>27</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE">
<s0>Espace état</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG">
<s0>State space</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA">
<s0>Espacio estado</s0>
<s5>28</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE">
<s0>.</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21>
<s1>203</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
<server>
<NO>PASCAL 13-0216765 INIST</NO>
<FT>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</FT>
<ET>(Sampling strategies for batch mode reinforcement learning)</ET>
<AU>FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)</AU>
<AF>Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4</SO>
<LA>Français</LA>
<EA>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</EA>
<CC>001D02C; 001D02D08; 001D02D05</CC>
<FD>Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .</FD>
<ED>Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space</ED>
<SD>Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado</SD>
<LO>INIST-21320.354000173351010020</LO>
<ID>13-0216765</ID>
</server>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000063 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000063 | SxmlIndent | more
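As suggested by their names, HfdSelect presumably retrieves the record keyed 000063 from the biblio.hfd base, SxmlIndent re-indents the resulting XML, and more pages the output; EXPLOR_STEP and EXPLOR_AREA are environment variables pointing to the corpus directory.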

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:13-0216765
   |texte=   Strategies d'échantillonnage pour l'apprentissage par renforcement batch
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022