The factored policy-gradient planner
Internal identifier: 000271 (PascalFrancis/Corpus); previous: 000270; next: 000272
Authors: Olivier Buffet; Douglas Aberdeen
Source: Artificial intelligence [0004-3702]; 2009.
French descriptors: Pascal (Inist)
English descriptors: KwdEn
Abstract
We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.
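The core idea in the abstract — optimising a parameterised policy by gradient ascent on a success-based objective, rather than by dynamic programming — can be illustrated with a minimal REINFORCE-style sketch. This is a toy illustration under stated assumptions (a single chain-to-goal task, a sigmoid policy with one parameter per state), not the FPG planner itself, which factors the policy across eligible actions and uses function approximation.

```python
import math
import random

random.seed(0)

N = 5                # chain states 0..N, goal at state N
theta = [0.0] * N    # one logit per non-goal state: P(move right | s) = sigmoid(theta[s])

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_episode(theta, max_steps=50):
    """Sample a trajectory; reward 1 if the goal is reached, else 0."""
    s = 0
    traj = []
    for _ in range(max_steps):
        p_right = sigmoid(theta[s])
        a = 1 if random.random() < p_right else 0
        traj.append((s, a))
        s = min(s + 1, N) if a == 1 else max(s - 1, 0)
        if s == N:
            return traj, 1.0
    return traj, 0.0

alpha = 0.5  # step size (illustrative choice)
for _ in range(300):
    traj, R = run_episode(theta)
    # REINFORCE update: theta += alpha * R * grad log pi(a | s)
    for s, a in traj:
        p = sigmoid(theta[s])
        grad_log = (1.0 - p) if a == 1 else -p
        theta[s] += alpha * R * grad_log

# Estimate the trained policy's probability of success.
success = sum(run_episode(theta)[1] for _ in range(100)) / 100
```

The gradient ascent needs only the sampled trajectory and its return, which is what gives policy-gradient methods their low memory footprint compared with dynamic programming over the full state space.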
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
Inist format (server)
NO : | PASCAL 09-0228244 INIST |
---|---|
ET : | The factored policy-gradient planner |
AU : | BUFFET (Olivier); ABERDEEN (Douglas); FOX (Maria); THIEBAUX (Sylvie) |
AF : | LORIA-INRIA, Nancy University/Nancy/France (1 aut.); Google Inc/Zurich/Suisse (2 aut.); Department of Computer and Information Sciences, University of Strathclyde/Glasgow/Royaume-Uni (1 aut.); Computer Sciences Laboratory & NICTA, Australian National University/Canberra/Australie (2 aut.) |
DT : | Publication en série; Niveau analytique |
SO : | Artificial intelligence; ISSN 0004-3702; Coden AINTBB; Pays-Bas; Da. 2009; Vol. 173; No. 5-6; Pp. 722-747; Bibl. 53 ref. |
LA : | Anglais |
EA : | We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition. |
CC : | 001D02A05; 001D02C |
FD : | Planification; Apprentissage renforcé; Métrique; Approche probabiliste; Modélisation; Fonction discrète; Programmation dynamique; Programmation stochastique; Recherche locale; Approximation d'une fonction |
ED : | Planning; Reinforcement learning; Metric; Probabilistic approach; Modeling; Discrete function; Dynamic programming; Stochastic programming; Local search; Function approximation |
SD : | Planificación; Aprendizaje reforzado; Métrico; Enfoque probabilista; Modelización; Función discreta; Programación dinámica; Programación estocástica; Busca local; Aproximación de funciones |
LO : | INIST-15159.354000184847340080 |
ID : | 09-0228244 |
Links to Exploration step
Pascal:09-0228244

The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">The factored policy-gradient planner</title>
<author><name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Aberdeen, Douglas" sort="Aberdeen, Douglas" uniqKey="Aberdeen D" first="Douglas" last="Aberdeen">Douglas Aberdeen</name>
<affiliation><inist:fA14 i1="02"><s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">09-0228244</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0228244 INIST</idno>
<idno type="RBID">Pascal:09-0228244</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000271</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">The factored policy-gradient planner</title>
<author><name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Aberdeen, Douglas" sort="Aberdeen, Douglas" uniqKey="Aberdeen D" first="Douglas" last="Aberdeen">Douglas Aberdeen</name>
<affiliation><inist:fA14 i1="02"><s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Artificial intelligence</title>
<title level="j" type="abbreviated">Artif. intell.</title>
<idno type="ISSN">0004-3702</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Artificial intelligence</title>
<title level="j" type="abbreviated">Artif. intell.</title>
<idno type="ISSN">0004-3702</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Discrete function</term>
<term>Dynamic programming</term>
<term>Function approximation</term>
<term>Local search</term>
<term>Metric</term>
<term>Modeling</term>
<term>Planning</term>
<term>Probabilistic approach</term>
<term>Reinforcement learning</term>
<term>Stochastic programming</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Planification</term>
<term>Apprentissage renforcé</term>
<term>Métrique</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Fonction discrète</term>
<term>Programmation dynamique</term>
<term>Programmation stochastique</term>
<term>Recherche locale</term>
<term>Approximation d'une fonction</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0004-3702</s0>
</fA01>
<fA02 i1="01"><s0>AINTBB</s0>
</fA02>
<fA03 i2="1"><s0>Artif. intell.</s0>
</fA03>
<fA05><s2>173</s2>
</fA05>
<fA06><s2>5-6</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG"><s1>The factored policy-gradient planner</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Advances in Automated Plan Generation</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>BUFFET (Olivier)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>ABERDEEN (Douglas)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>FOX (Maria)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>THIEBAUX (Sylvie)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>LORIA-INRIA, Nancy University</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>Google Inc</s1>
<s2>Zurich</s2>
<s3>CHE</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Department of Computer and Information Sciences, University of Strathclyde</s1>
<s2>Glasgow</s2>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>Computer Sciences Laboratory &amp; NICTA, Australian National University</s1>
<s2>Canberra</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA20><s1>722-747</s1>
</fA20>
<fA21><s1>2009</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>15159</s2>
<s5>354000184847340080</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2009 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>53 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>09-0228244</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Artificial intelligence</s0>
</fA64>
<fA66 i1="01"><s0>NLD</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02A05</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D02C</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Planification</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Planning</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Planificación</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Métrique</s0>
<s5>18</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Metric</s0>
<s5>18</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Métrico</s0>
<s5>18</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Approche probabiliste</s0>
<s5>23</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Probabilistic approach</s0>
<s5>23</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Enfoque probabilista</s0>
<s5>23</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Modélisation</s0>
<s5>24</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Modeling</s0>
<s5>24</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Modelización</s0>
<s5>24</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Fonction discrète</s0>
<s5>25</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Discrete function</s0>
<s5>25</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Función discreta</s0>
<s5>25</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Programmation dynamique</s0>
<s5>26</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Dynamic programming</s0>
<s5>26</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Programación dinámica</s0>
<s5>26</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Programmation stochastique</s0>
<s5>27</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Stochastic programming</s0>
<s5>27</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Programación estocástica</s0>
<s5>27</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Recherche locale</s0>
<s5>28</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Local search</s0>
<s5>28</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Busca local</s0>
<s5>28</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Approximation d'une fonction</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Function approximation</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Aproximación de funciones</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fN21><s1>166</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
<server><NO>PASCAL 09-0228244 INIST</NO>
<ET>The factored policy-gradient planner</ET>
<AU>BUFFET (Olivier); ABERDEEN (Douglas); FOX (Maria); THIEBAUX (Sylvie)</AU>
<AF>LORIA-INRIA, Nancy University/Nancy/France (1 aut.); Google Inc/Zurich/Suisse (2 aut.); Department of Computer and Information Sciences, University of Strathclyde/Glasgow/Royaume-Uni (1 aut.); Computer Sciences Laboratory &amp; NICTA, Australian National University/Canberra/Australie (2 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Artificial intelligence; ISSN 0004-3702; Coden AINTBB; Pays-Bas; Da. 2009; Vol. 173; No. 5-6; Pp. 722-747; Bibl. 53 ref.</SO>
<LA>Anglais</LA>
<EA>We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allow us to tackle complex domains. This factored policy gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-IPC, the PP version of the planner which has been successful in the probabilistic track of the fifth international planning competition.</EA>
<CC>001D02A05; 001D02C</CC>
<FD>Planification; Apprentissage renforcé; Métrique; Approche probabiliste; Modélisation; Fonction discrète; Programmation dynamique; Programmation stochastique; Recherche locale; Approximation d'une fonction</FD>
<ED>Planning; Reinforcement learning; Metric; Probabilistic approach; Modeling; Discrete function; Dynamic programming; Stochastic programming; Local search; Function approximation</ED>
<SD>Planificación; Aprendizaje reforzado; Métrico; Enfoque probabilista; Modelización; Función discreta; Programación dinámica; Programación estocástica; Busca local; Aproximación de funciones</SD>
<LO>INIST-15159.354000184847340080</LO>
<ID>09-0228244</ID>
</server>
</inist>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000271 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000271 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= PascalFrancis |étape= Corpus |type= RBID |clé= Pascal:09-0228244 |texte= The factored policy-gradient planner }}
This area was generated with Dilib version V0.6.33.