Strategies d'échantillonnage pour l'apprentissage par renforcement batch
Internal identifier: 000063 (PascalFrancis/Corpus); previous: 000062; next: 000064
Authors: Raphael Fonteneau; Susan A. Murphy; Louis Wehenkel; Damien Ernst
Source:
- Revue d'intelligence artificielle [0992-499X]; 2013.
Abstract
We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
Inist format (server)
NO: PASCAL 13-0216765 INIST
FT: Strategies d'échantillonnage pour l'apprentissage par renforcement batch
ET: (Sampling strategies for batch mode reinforcement learning)
AU: FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)
AF: Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)
DT: Publication en série; Niveau analytique
SO: Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4
LA: Français
EA: We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.
CC: 001D02C; 001D02D08; 001D02D05
FD: Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .
ED: Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space
SD: Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado
LO: INIST-21320.354000173351010020
ID: 13-0216765
Links to Exploration step
Pascal:13-0216765
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216765</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216765 INIST</idno>
<idno type="RBID">Pascal:13-0216765</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000063</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Strategies d'échantillonnage pour l'apprentissage par renforcement batch</title>
<author><name sortKey="Fonteneau, Raphael" sort="Fonteneau, Raphael" uniqKey="Fonteneau R" first="Raphael" last="Fonteneau">Raphael Fonteneau</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Murphy, Susan A" sort="Murphy, Susan A" uniqKey="Murphy S" first="Susan A." last="Murphy">Susan A. Murphy</name>
<affiliation><inist:fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wehenkel, Louis" sort="Wehenkel, Louis" uniqKey="Wehenkel L" first="Louis" last="Wehenkel">Louis Wehenkel</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Ernst, Damien" sort="Ernst, Damien" uniqKey="Ernst D" first="Damien" last="Ernst">Damien Ernst</name>
<affiliation><inist:fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Action</term>
<term>Active system</term>
<term>Artificial intelligence</term>
<term>Learning algorithm</term>
<term>Model matching</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimal policy</term>
<term>Reinforcement learning</term>
<term>Sampling</term>
<term>State space</term>
<term>State space method</term>
<term>Supervised learning</term>
<term>System identification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Apprentissage renforcé</term>
<term>Intelligence artificielle</term>
<term>Action</term>
<term>Système actif</term>
<term>Apprentissage supervisé</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Politique optimale</term>
<term>Echantillonnage</term>
<term>Algorithme apprentissage</term>
<term>Identification système</term>
<term>Ajustement modèle</term>
<term>Méthode espace état</term>
<term>Espace état</term>
<term>.</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0992-499X</s0>
</fA01>
<fA03 i2="1"><s0>Rev. intell. artif.</s0>
</fA03>
<fA05><s2>27</s2>
</fA05>
<fA06><s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE"><s1>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE"><s1>Apprentissage par renforcement et planification adaptative</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>FONTENEAU (Raphael)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>MURPHY (Susan A.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>WEHENKEL (Louis)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>ERNST (Damien)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>ZANUTTINI (Bruno)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>LAURENT (Guillaume)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>BUFFET (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Université de Liège</s1>
<s3>BEL</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>Université du Michigan</s1>
<s3>USA</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Greyc/UCBN</s1>
<s2>Caen</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>Institut FEMTO-ST/ENSMM</s1>
<s2>Besançon</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03"><s1>LORIA/INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA20><s2>151, 171-194 [25 p.]</s2>
</fA20>
<fA21><s1>2013</s1>
</fA21>
<fA23 i1="01"><s0>FRE</s0>
</fA23>
<fA24 i1="01"><s0>eng</s0>
</fA24>
<fA43 i1="01"><s1>INIST</s1>
<s2>21320</s2>
<s5>354000173351010020</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2013 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.1/4</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>13-0216765</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01"><s0>FRA</s0>
</fA66>
<fA68 i1="01" i2="1" l="ENG"><s1>Sampling strategies for batch mode reinforcement learning</s1>
</fA68>
<fC01 i1="01" l="ENG"><s0>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D02D08</s0>
</fC02>
<fC02 i1="03" i2="X"><s0>001D02D05</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Intelligence artificielle</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Artificial intelligence</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Inteligencia artificial</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Action</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Acción</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Système actif</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Active system</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Sistema activo</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Apprentissage supervisé</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Supervised learning</s0>
<s5>18</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Aprendizaje supervisado</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Commande optimale</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Optimal control</s0>
<s5>19</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Control óptimo</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Contrôle optimal</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Optimal control (mathematics)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Control óptimo (matemáticas)</s0>
<s5>20</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Politique optimale</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Optimal policy</s0>
<s5>21</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Política óptima</s0>
<s5>21</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Echantillonnage</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Sampling</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Muestreo</s0>
<s5>23</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Algorithme apprentissage</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Learning algorithm</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Algoritmo aprendizaje</s0>
<s5>24</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Identification système</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>System identification</s0>
<s5>25</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Identificación sistema</s0>
<s5>25</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Ajustement modèle</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Model matching</s0>
<s5>26</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Ajustamiento modelo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Méthode espace état</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG"><s0>State space method</s0>
<s5>27</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA"><s0>Método espacio estado</s0>
<s5>27</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE"><s0>Espace état</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG"><s0>State space</s0>
<s5>28</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA"><s0>Espacio estado</s0>
<s5>28</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE"><s0>.</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21><s1>203</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
<server><NO>PASCAL 13-0216765 INIST</NO>
<FT>Strategies d'échantillonnage pour l'apprentissage par renforcement batch</FT>
<ET>(Sampling strategies for batch mode reinforcement learning)</ET>
<AU>FONTENEAU (Raphael); MURPHY (Susan A.); WEHENKEL (Louis); ERNST (Damien); ZANUTTINI (Bruno); LAURENT (Guillaume); BUFFET (Olivier)</AU>
<AF>Université de Liège/Belgique (1 aut., 3 aut., 4 aut.); Université du Michigan/Etats-Unis (2 aut.); Greyc/UCBN/Caen/France (1 aut.); Institut FEMTO-ST/ENSMM/Besançon/France (2 aut.); LORIA/INRIA/Nancy/France (3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2013; Vol. 27; No. 2; 151, 171-194 [25 p.]; Abs. anglais; Bibl. 1 p.1/4</SO>
<LA>Français</LA>
<EA>We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.</EA>
<CC>001D02C; 001D02D08; 001D02D05</CC>
<FD>Apprentissage renforcé; Intelligence artificielle; Action; Système actif; Apprentissage supervisé; Commande optimale; Contrôle optimal; Politique optimale; Echantillonnage; Algorithme apprentissage; Identification système; Ajustement modèle; Méthode espace état; Espace état; .</FD>
<ED>Reinforcement learning; Artificial intelligence; Action; Active system; Supervised learning; Optimal control; Optimal control (mathematics); Optimal policy; Sampling; Learning algorithm; System identification; Model matching; State space method; State space</ED>
<SD>Aprendizaje reforzado; Inteligencia artificial; Acción; Sistema activo; Aprendizaje supervisado; Control óptimo; Control óptimo (matemáticas); Política óptima; Muestreo; Algoritmo aprendizaje; Identificación sistema; Ajustamiento modelo; Método espacio estado; Espacio estado</SD>
<LO>INIST-21320.354000173351010020</LO>
<ID>13-0216765</ID>
</server>
</inist>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000063 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000063 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= PascalFrancis |étape= Corpus |type= RBID |clé= Pascal:13-0216765 |texte= Strategies d'échantillonnage pour l'apprentissage par renforcement batch }}
This area was generated with Dilib version V0.6.33.