Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information is therefore not validated.

The factored policy-gradient planner

Internal identifier: 000271 (PascalFrancis/Corpus); previous: 000270; next: 000272

The factored policy-gradient planner

Authors: Olivier Buffet; Douglas Aberdeen

Source:

RBID: Pascal:09-0228244

French descriptors

English descriptors

Abstract

We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.
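
The central mechanism in the abstract, optimising a parameterised and factored policy by gradient ascent rather than dynamic programming, can be illustrated with a short REINFORCE-style sketch. This is a generic illustration and not the authors' FPG implementation: the simulator interface step(state, action), the feature map phi, and the linear soft-max parameterisation are assumptions made here for concreteness.

# Minimal REINFORCE-style sketch of policy-gradient planning (illustrative;
# not the FPG code). Assumed: a simulator step(state, action) -> (state,
# reward, done) and a feature map phi(state) -> vector of length d.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def episode(theta, phi, step, s0, n_actions, horizon=100):
    """Sample one trajectory; return (sum of grad-log-probs, return)."""
    s, grad, ret = s0, np.zeros_like(theta), 0.0
    for _ in range(horizon):
        probs = softmax(theta @ phi(s))      # one score per action
        a = np.random.choice(n_actions, p=probs)
        g = -np.outer(probs, phi(s))         # grad log pi(a|s) for a
        g[a] += phi(s)                       # linear soft-max policy
        grad += g
        s, r, done = step(s, a)
        ret += r                             # e.g. -1 per step, bonus at goal
        if done:
            break
    return grad, ret

def train(theta, phi, step, s0, n_actions, lr=0.01, iters=1000):
    """Gradient ascent on expected return (probability of success and/or
    negative steps to goal, depending on the reward chosen)."""
    for _ in range(iters):
        grad, ret = episode(theta, phi, step, s0, n_actions)
        theta += lr * ret * grad
    return theta

As the abstract notes, it is the combination of function approximation and factorisation of the policy (a parameter vector per action rather than a table over states) that keeps memory use low enough for complex domains.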

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0004-3702
A02 01      @0 AINTBB
A03   1    @0 Artif. intell.
A05       @2 173
A06       @2 5-6
A08 01  1  ENG  @1 The factored policy-gradient planner
A09 01  1  ENG  @1 Advances in Automated Plan Generation
A11 01  1    @1 BUFFET (Olivier)
A11 02  1    @1 ABERDEEN (Douglas)
A12 01  1    @1 FOX (Maria) @9 ed.
A12 02  1    @1 THIEBAUX (Sylvie) @9 ed.
A14 01      @1 LORIA-INRIA, Nancy University @2 Nancy @3 FRA @Z 1 aut.
A14 02      @1 Google Inc @2 Zurich @3 CHE @Z 2 aut.
A15 01      @1 Department of Computer and Information Sciences, University of Strathclyde @2 Glasgow @3 GBR @Z 1 aut.
A15 02      @1 Computer Sciences Laboratory & NICTA, Australian National University @2 Canberra @3 AUS @Z 2 aut.
A20       @1 722-747
A21       @1 2009
A23 01      @0 ENG
A43 01      @1 INIST @2 15159 @5 354000184847340080
A44       @0 0000 @1 © 2009 INIST-CNRS. All rights reserved.
A45       @0 53 ref.
A47 01  1    @0 09-0228244
A60       @1 P
A61       @0 A
A64 01  1    @0 Artificial intelligence
A66 01      @0 NLD
C01 01    ENG  @0 We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.
C02 01  X    @0 001D02A05
C02 02  X    @0 001D02C
C03 01  X  FRE  @0 Planification @5 06
C03 01  X  ENG  @0 Planning @5 06
C03 01  X  SPA  @0 Planificación @5 06
C03 02  X  FRE  @0 Apprentissage renforcé @5 07
C03 02  X  ENG  @0 Reinforcement learning @5 07
C03 02  X  SPA  @0 Aprendizaje reforzado @5 07
C03 03  X  FRE  @0 Métrique @5 18
C03 03  X  ENG  @0 Metric @5 18
C03 03  X  SPA  @0 Métrico @5 18
C03 04  X  FRE  @0 Approche probabiliste @5 23
C03 04  X  ENG  @0 Probabilistic approach @5 23
C03 04  X  SPA  @0 Enfoque probabilista @5 23
C03 05  X  FRE  @0 Modélisation @5 24
C03 05  X  ENG  @0 Modeling @5 24
C03 05  X  SPA  @0 Modelización @5 24
C03 06  X  FRE  @0 Fonction discrète @5 25
C03 06  X  ENG  @0 Discrete function @5 25
C03 06  X  SPA  @0 Función discreta @5 25
C03 07  X  FRE  @0 Programmation dynamique @5 26
C03 07  X  ENG  @0 Dynamic programming @5 26
C03 07  X  SPA  @0 Programación dinámica @5 26
C03 08  X  FRE  @0 Programmation stochastique @5 27
C03 08  X  ENG  @0 Stochastic programming @5 27
C03 08  X  SPA  @0 Programación estocástica @5 27
C03 09  X  FRE  @0 Recherche locale @5 28
C03 09  X  ENG  @0 Local search @5 28
C03 09  X  SPA  @0 Busca local @5 28
C03 10  X  FRE  @0 Approximation d'une fonction @4 CD @5 96
C03 10  X  ENG  @0 Function approximation @4 CD @5 96
C03 10  X  SPA  @0 Aproximación de funciones @4 CD @5 96
N21       @1 166
N44 01      @1 OTO
N82       @1 OTO
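
Every line in the pA listing above follows the same shape: a field tag (A01, C03, ...), optional occurrence and indicator tokens, an optional language code, and a sequence of @-prefixed subfields. As a purely illustrative sketch of that grammar (tag semantics are defined by the Inist documentation referenced above, not inferred here), a minimal Python parser could read one line as follows:

import re

# Hypothetical parser for Inist-standard lines such as:
#   "C03 01  X  FRE  @0 Planification @5 06"
# Shape inferred from the listing above: tag, occurrence/indicator tokens,
# optional language code, then "@<key> <value>" subfields.
LINE = re.compile(r"^(?P<tag>[A-Z]\d{2})\s+(?P<head>[^@]*?)\s*(?P<subs>@.*)$")
SUB = re.compile(r"@(?P<key>\w)\s+(?P<val>[^@]*)")

def parse_field(line):
    m = LINE.match(line.strip())
    if not m:
        return None
    subfields = {s.group("key"): s.group("val").strip()
                 for s in SUB.finditer(m.group("subs"))}
    return {"tag": m.group("tag"),
            "indicators": m.group("head").split(),
            "subfields": subfields}

print(parse_field("C03 01  X  FRE  @0 Planification @5 06"))
# -> {'tag': 'C03', 'indicators': ['01', 'X', 'FRE'],
#     'subfields': {'0': 'Planification', '5': '06'}}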

Inist format (server)

NO : PASCAL 09-0228244 INIST
ET : The factored policy-gradient planner
AU : BUFFET (Olivier); ABERDEEN (Douglas); FOX (Maria); THIEBAUX (Sylvie)
AF : LORIA-INRIA, Nancy University/Nancy/France (1 aut.); Google Inc/Zurich/Suisse (2 aut.); Department of Computer and Information Sciences, University of Strathclyde/Glasgow/Royaume-Uni (1 aut.); Computer Sciences Laboratory & NICTA, Australian National University/Canberra/Australie (2 aut.)
DT : Publication en série; Niveau analytique
SO : Artificial intelligence; ISSN 0004-3702; Coden AINTBB; Pays-Bas; Da. 2009; Vol. 173; No. 5-6; Pp. 722-747; Bibl. 53 ref.
LA : Anglais
EA : We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.
CC : 001D02A05; 001D02C
FD : Planification; Apprentissage renforcé; Métrique; Approche probabiliste; Modélisation; Fonction discrète; Programmation dynamique; Programmation stochastique; Recherche locale; Approximation d'une fonction
ED : Planning; Reinforcement learning; Metric; Probabilistic approach; Modeling; Discrete function; Dynamic programming; Stochastic programming; Local search; Function approximation
SD : Planificación; Aprendizaje reforzado; Métrico; Enfoque probabilista; Modelización; Función discreta; Programación dinámica; Programación estocástica; Busca local; Aproximación de funciones
LO : INIST-15159.354000184847340080
ID : 09-0228244

Links to Exploration step

Pascal:09-0228244

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The factored policy-gradient planner</title>
<author>
<name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation>
<inist:fA14 i1="01">
<s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Aberdeen, Douglas" sort="Aberdeen, Douglas" uniqKey="Aberdeen D" first="Douglas" last="Aberdeen">Douglas Aberdeen</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0228244</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0228244 INIST</idno>
<idno type="RBID">Pascal:09-0228244</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000271</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">The factored policy-gradient planner</title>
<author>
<name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation>
<inist:fA14 i1="01">
<s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Aberdeen, Douglas" sort="Aberdeen, Douglas" uniqKey="Aberdeen D" first="Douglas" last="Aberdeen">Douglas Aberdeen</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Artificial intelligence</title>
<title level="j" type="abbreviated">Artif. intell.</title>
<idno type="ISSN">0004-3702</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Artificial intelligence</title>
<title level="j" type="abbreviated">Artif. intell.</title>
<idno type="ISSN">0004-3702</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Discrete function</term>
<term>Dynamic programming</term>
<term>Function approximation</term>
<term>Local search</term>
<term>Metric</term>
<term>Modeling</term>
<term>Planning</term>
<term>Probabilistic approach</term>
<term>Reinforcement learning</term>
<term>Stochastic programming</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Planification</term>
<term>Apprentissage renforcé</term>
<term>Métrique</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Fonction discrète</term>
<term>Programmation dynamique</term>
<term>Programmation stochastique</term>
<term>Recherche locale</term>
<term>Approximation d'une fonction</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0004-3702</s0>
</fA01>
<fA02 i1="01">
<s0>AINTBB</s0>
</fA02>
<fA03 i2="1">
<s0>Artif. intell.</s0>
</fA03>
<fA05>
<s2>173</s2>
</fA05>
<fA06>
<s2>5-6</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>The factored policy-gradient planner</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Advances in Automated Plan Generation</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>BUFFET (Olivier)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>ABERDEEN (Douglas)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>FOX (Maria)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>THIEBAUX (Sylvie)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01">
<s1>Department of Computer and Information Sciences, University of Strathclyde</s1>
<s2>Glasgow</s2>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02">
<s1>Computer Sciences Laboratory &amp; NICTA, Australian National University</s1>
<s2>Canberra</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA20>
<s1>722-747</s1>
</fA20>
<fA21>
<s1>2009</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>15159</s2>
<s5>354000184847340080</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2009 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>53 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>09-0228244</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Artificial intelligence</s0>
</fA64>
<fA66 i1="01">
<s0>NLD</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D02A05</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001D02C</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Planification</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Planning</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Planificación</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Apprentissage renforcé</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Reinforcement learning</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Aprendizaje reforzado</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Métrique</s0>
<s5>18</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Metric</s0>
<s5>18</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Métrico</s0>
<s5>18</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Approche probabiliste</s0>
<s5>23</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Probabilistic approach</s0>
<s5>23</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Enfoque probabilista</s0>
<s5>23</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Modélisation</s0>
<s5>24</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Modeling</s0>
<s5>24</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Modelización</s0>
<s5>24</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Fonction discrète</s0>
<s5>25</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Discrete function</s0>
<s5>25</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Función discreta</s0>
<s5>25</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Programmation dynamique</s0>
<s5>26</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Dynamic programming</s0>
<s5>26</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Programación dinámica</s0>
<s5>26</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Programmation stochastique</s0>
<s5>27</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Stochastic programming</s0>
<s5>27</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Programación estocástica</s0>
<s5>27</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Recherche locale</s0>
<s5>28</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Local search</s0>
<s5>28</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Busca local</s0>
<s5>28</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Approximation d'une fonction</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Function approximation</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Aproximación de funciones</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fN21>
<s1>166</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
<server>
<NO>PASCAL 09-0228244 INIST</NO>
<ET>The factored policy-gradient planner</ET>
<AU>BUFFET (Olivier); ABERDEEN (Douglas); FOX (Maria); THIEBAUX (Sylvie)</AU>
<AF>LORIA-INRIA, Nancy University/Nancy/France (1 aut.); Google Inc/Zurich/Suisse (2 aut.); Department of Computer and Information Sciences, University of Strathclyde/Glasgow/Royaume-Uni (1 aut.); Computer Sciences Laboratory &amp; NICTA, Australian National University/Canberra/Australie (2 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Artificial intelligence; ISSN 0004-3702; Coden AINTBB; Pays-Bas; Da. 2009; Vol. 173; No. 5-6; Pp. 722-747; Bibl. 53 ref.</SO>
<LA>Anglais</LA>
<EA>We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</EA>
<CC>001D02A05; 001D02C</CC>
<FD>Planification; Apprentissage renforcé; Métrique; Approche probabiliste; Modélisation; Fonction discrète; Programmation dynamique; Programmation stochastique; Recherche locale; Approximation d'une fonction</FD>
<ED>Planning; Reinforcement learning; Metric; Probabilistic approach; Modeling; Discrete function; Dynamic programming; Stochastic programming; Local search; Function approximation</ED>
<SD>Planificación; Aprendizaje reforzado; Métrico; Enfoque probabilista; Modelización; Función discreta; Programación dinámica; Programación estocástica; Busca local; Aproximación de funciones</SD>
<LO>INIST-15159.354000184847340080</LO>
<ID>09-0228244</ID>
</server>
</inist>
</record>
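
For programmatic use, the bibliographic core of the record can be extracted from the TEI header with a standard XML parser. Below is a minimal sketch using Python's ElementTree; it assumes the inist: prefix is bound to a namespace in the full export, since the excerpt above omits that declaration, and it only touches prefix-free TEI elements.

# Illustrative extraction from the TEI header (assumes the full export
# declares the "inist" namespace; only prefix-free elements are touched).
import xml.etree.ElementTree as ET

def summarise(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": sorted({n.text for n in root.iter("name")}),
        "rbid": next(i.text for i in root.iter("idno")
                     if i.get("type") == "RBID"),
    }

# Expected, for this record:
# {'title': 'The factored policy-gradient planner',
#  'authors': ['Douglas Aberdeen', 'Olivier Buffet'],
#  'rbid': 'Pascal:09-0228244'}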

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000271 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000271 | SxmlIndent | more
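
The same pipeline can also be scripted; here is a sketch assuming HfdSelect and SxmlIndent are on the PATH and EXPLOR_STEP is exported as shown above:

# Wrapper around the HfdSelect | SxmlIndent pipeline shown above.
import os, subprocess

def fetch_record(key="000271"):
    biblio = os.path.join(os.environ["EXPLOR_STEP"], "biblio.hfd")
    selected = subprocess.run(["HfdSelect", "-h", biblio, "-nk", key],
                              capture_output=True, text=True, check=True)
    pretty = subprocess.run(["SxmlIndent"], input=selected.stdout,
                            capture_output=True, text=True, check=True)
    return pretty.stdout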

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:09-0228244
   |texte=   The factored policy-gradient planner
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022