Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

l1-penalized projected Bellman residual

Internal identifier: 005457 (Hal/Corpus)

Authors: Matthieu Geist; Bruno Scherrer

RBID: Hal:hal-00644507

Abstract

We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists of combining the Least-Squares Temporal Difference (LSTD) algorithm with $\ell_1$-regularization, which has proven effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an $\ell_1$-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an $\ell_1$-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which suggests easy extensions to other penalties.
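The reduction to a supervised problem can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes linear features stored row-wise in `phi` (sampled states $s_i$) and `phi_next` (next states $s'_i$, with actions chosen by the evaluated policy, so the samples may come from another behavior policy), rewards `rewards`, discount `gamma`, and the empirical projection $\Pi = \Phi(\Phi^\top\Phi)^{-1}\Phi^\top$. Because $\Pi\Phi\theta = \Phi\theta$, setting $X = \Pi(\Phi - \gamma\Phi')$ and $y = \Pi r$ turns the penalized residual $\|\Pi(r + \gamma\Phi'\theta) - \Phi\theta\|^2 + \lambda\|\theta\|_1$ into the Lasso objective $\|y - X\theta\|^2 + \lambda\|\theta\|_1$. The function names, the synthetic data, and the solver (ISTA, i.e. proximal gradient) are hypothetical illustrative choices. The sketch solves for a single value of $\lambda$ rather than tracing the regularization path.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma):
    """Plain LSTD: solve A theta = b, A = Phi^T (Phi - gamma Phi'), b = Phi^T r."""
    a = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(a, b)


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (shrinks each entry towards zero by t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def l1_pbr(phi, phi_next, rewards, gamma, lam, n_iter=5000):
    """Minimize ||Pi (r + gamma Phi' theta) - Phi theta||^2 + lam * ||theta||_1.

    With X = Pi (Phi - gamma Phi') and y = Pi r this is the Lasso
    ||y - X theta||^2 + lam * ||theta||_1, solved here with ISTA.
    """
    # Empirical least-squares projection onto the span of the features (n x n).
    proj = phi @ np.linalg.solve(phi.T @ phi, phi.T)
    x = proj @ (phi - gamma * phi_next)
    y = proj @ rewards
    # The gradient of ||y - X theta||^2 is 2 * sigma_max(X)^2 Lipschitz; step = 1/L.
    step = 1.0 / (2.0 * np.linalg.norm(x, ord=2) ** 2)
    theta = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * x.T @ (y - x @ theta)
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta


# Hypothetical synthetic transitions, for illustration only.
rng = np.random.default_rng(0)
n_samples, n_features, gamma = 200, 5, 0.9
phi = rng.normal(size=(n_samples, n_features))       # features of s_i
phi_next = rng.normal(size=(n_samples, n_features))  # features of s'_i
rewards = phi @ np.array([1.5, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=n_samples)

theta_lstd = lstd(phi, phi_next, rewards, gamma)
theta_l1 = l1_pbr(phi, phi_next, rewards, gamma, lam=50.0)
print("LSTD:  ", np.round(theta_lstd, 3))
print("l1-PBR:", np.round(theta_l1, 3))
```

With `lam=0.0` and an invertible LSTD matrix, the minimizer makes the projected residual vanish, so it coincides with the plain LSTD solution. Increasing `lam` shrinks the $\ell_1$ norm of $\theta$ and sets coefficients exactly to zero, which is the feature-selection effect.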

Links to Exploration step

Hal:hal-00644507

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">l1-penalized projected Bellman residual</title>
<author>
<name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation>
<hal:affiliation type="laboratory" xml:id="struct-26305" status="VALID">
<orgName>SUPELEC-Campus Metz</orgName>
<desc>
<address>
<addrLine>2 rue Edouard Belin 57070 Metz</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.metz.supelec.fr/metz/</ref>
</desc>
<listRelation>
<relation active="#struct-300812" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300812" type="direct">
<org type="institution" xml:id="struct-300812" status="VALID">
<orgName>SUPELEC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Scherrer, Bruno" sort="Scherrer, Bruno" uniqKey="Scherrer B" first="Bruno" last="Scherrer">Bruno Scherrer</name>
<affiliation>
<hal:affiliation type="researchteam" xml:id="struct-2355" status="OLD">
<idno type="RNSR">200218290B</idno>
<orgName>Autonomous intelligent machine</orgName>
<orgName type="acronym">MAIA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/maia</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
<relation active="#struct-2496" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-2496" type="direct">
<org type="laboratory" xml:id="struct-2496" status="OLD">
<orgName>INRIA Lorraine</orgName>
<desc>
<address>
<addrLine>615 rue du Jardin Botanique 54600 Villers-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre-de-recherche-inria/nancy-grand-est</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-00644507</idno>
<idno type="halId">hal-00644507</idno>
<idno type="halUri">https://hal.inria.fr/hal-00644507</idno>
<idno type="url">https://hal.inria.fr/hal-00644507</idno>
<date when="2011-09-09">2011-09-09</date>
<idno type="wicri:Area/Hal/Corpus">005457</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">l1-penalized projected Bellman residual</title>
<author>
<name sortKey="Geist, Matthieu" sort="Geist, Matthieu" uniqKey="Geist M" first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation>
<hal:affiliation type="laboratory" xml:id="struct-26305" status="VALID">
<orgName>SUPELEC-Campus Metz</orgName>
<desc>
<address>
<addrLine>2 rue Edouard Belin 57070 Metz</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.metz.supelec.fr/metz/</ref>
</desc>
<listRelation>
<relation active="#struct-300812" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300812" type="direct">
<org type="institution" xml:id="struct-300812" status="VALID">
<orgName>SUPELEC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Scherrer, Bruno" sort="Scherrer, Bruno" uniqKey="Scherrer B" first="Bruno" last="Scherrer">Bruno Scherrer</name>
<affiliation>
<hal:affiliation type="researchteam" xml:id="struct-2355" status="OLD">
<idno type="RNSR">200218290B</idno>
<orgName>Autonomous intelligent machine</orgName>
<orgName type="acronym">MAIA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/maia</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
<relation active="#struct-2496" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-2496" type="direct">
<org type="laboratory" xml:id="struct-2496" status="OLD">
<orgName>INRIA Lorraine</orgName>
<desc>
<address>
<addrLine>615 rue du Jardin Botanique 54600 Villers-lès-Nancy</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre-de-recherche-inria/nancy-grand-est</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists of combining the Least-Squares Temporal Difference (LSTD) algorithm with $\ell_1$-regularization, which has proven effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an $\ell_1$-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an $\ell_1$-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which suggests easy extensions to other penalties.</div>
</front>
</TEI>
<hal api="V3">
<titleStmt>
<title xml:lang="en">l1-penalized projected Bellman residual</title>
<author role="aut">
<persName>
<forename type="first">Matthieu</forename>
<surname>Geist</surname>
</persName>
<email>Matthieu.Geist@Supelec.fr</email>
<idno type="idhal">matthieu-geist</idno>
<idno type="halauthor">357433</idno>
<affiliation ref="#struct-26305"></affiliation>
</author>
<author role="aut">
<persName>
<forename type="first">Bruno</forename>
<surname>Scherrer</surname>
</persName>
<email>scherrer@loria.fr</email>
<ptr type="url" target="http://www.loria.fr/~scherrer"></ptr>
<idno type="idhal">bruno-scherrer</idno>
<idno type="halauthor">662428</idno>
<idno type="IdRef">http://www.idref.fr/073360708</idno>
<affiliation ref="#struct-2355"></affiliation>
</author>
<editor role="depositor">
<persName>
<forename>Bruno</forename>
<surname>Scherrer</surname>
</persName>
<email>bruno.scherrer@inria.fr</email>
</editor>
</titleStmt>
<editionStmt>
<edition n="v1" type="current">
<date type="whenSubmitted">2011-11-24 15:16:07</date>
<date type="whenModified">2016-05-18 08:55:32</date>
<date type="whenReleased">2011-11-25 10:19:20</date>
<date type="whenProduced">2011-09-09</date>
<date type="whenEndEmbargoed">2011-11-24</date>
<ref type="file" target="https://hal.inria.fr/hal-00644507/document">
<date notBefore="2011-11-24"></date>
</ref>
<ref type="file" subtype="author" n="1" target="https://hal.inria.fr/hal-00644507/file/gs_ewrl_l1_cr.pdf">
<date notBefore="2011-11-24"></date>
</ref>
</edition>
<respStmt>
<resp>contributor</resp>
<name key="103149">
<persName>
<forename>Bruno</forename>
<surname>Scherrer</surname>
</persName>
<email>bruno.scherrer@inria.fr</email>
</name>
</respStmt>
</editionStmt>
<publicationStmt>
<distributor>CCSD</distributor>
<idno type="halId">hal-00644507</idno>
<idno type="halUri">https://hal.inria.fr/hal-00644507</idno>
<idno type="halBibtex">geist:hal-00644507</idno>
<idno type="halRefHtml">European Workshop on Reinforcement Learning (EWRL 11), Sep 2011, Athens, Greece. 2011</idno>
<idno type="halRef">European Workshop on Reinforcement Learning (EWRL 11), Sep 2011, Athens, Greece. 2011</idno>
</publicationStmt>
<seriesStmt>
<idno type="stamp" n="CNRS">CNRS - Centre national de la recherche scientifique</idno>
<idno type="stamp" n="INRIA">INRIA - Institut National de Recherche en Informatique et en Automatique</idno>
<idno type="stamp" n="INPL">Institut National Polytechnique de Lorraine</idno>
<idno type="stamp" n="LORIA2">Publications du LORIA</idno>
<idno type="stamp" n="INRIA-NANCY-GRAND-EST">INRIA Nancy - Grand Est</idno>
<idno type="stamp" n="SUPELEC">SUPELEC</idno>
<idno type="stamp" n="SUP_IMS" p="SUPELEC">IMS - Equipe Information, Multimodalité et Signal</idno>
<idno type="stamp" n="LORIA-CSAI" p="LORIA">Systèmes complexes et intelligence artificielle</idno>
<idno type="stamp" n="LORIA">LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications</idno>
<idno type="stamp" n="INRIA2">INRIA 2</idno>
<idno type="stamp" n="INRIA-LORRAINE">INRIA Nancy - Grand Est</idno>
<idno type="stamp" n="LABO-LORIA-SET" p="LORIA">LABO-LORIA-SET</idno>
<idno type="stamp" n="UNIV-LORRAINE">Université de Lorraine</idno>
</seriesStmt>
<notesStmt>
<note type="audience" n="2">International</note>
<note type="invited" n="0">No</note>
<note type="popular" n="0">No</note>
<note type="peer" n="1">Yes</note>
<note type="proceedings" n="1">Yes</note>
</notesStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">l1-penalized projected Bellman residual</title>
<author role="aut">
<persName>
<forename type="first">Matthieu</forename>
<surname>Geist</surname>
</persName>
<email>Matthieu.Geist@Supelec.fr</email>
<idno type="idHal">matthieu-geist</idno>
<idno type="halAuthorId">357433</idno>
<affiliation ref="#struct-26305"></affiliation>
</author>
<author role="aut">
<persName>
<forename type="first">Bruno</forename>
<surname>Scherrer</surname>
</persName>
<email>scherrer@loria.fr</email>
<ptr type="url" target="http://www.loria.fr/~scherrer"></ptr>
<idno type="idHal">bruno-scherrer</idno>
<idno type="halAuthorId">662428</idno>
<idno type="IdRef">http://www.idref.fr/073360708</idno>
<affiliation ref="#struct-2355"></affiliation>
</author>
</analytic>
<monogr>
<meeting>
<title>European Workshop on Reinforcement Learning (EWRL 11)</title>
<date type="start">2011-09-09</date>
<date type="end">2011-09-11</date>
<settlement>Athens</settlement>
<country key="GR">Greece</country>
</meeting>
<imprint>
<date type="datePub">2011</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
<profileDesc>
<langUsage>
<language ident="en">English</language>
</langUsage>
<textClass>
<classCode scheme="halDomain" n="info.info-ai">Computer Science [cs]/Artificial Intelligence [cs.AI]</classCode>
<classCode scheme="halTypology" n="COMM">Conference papers</classCode>
</textClass>
<abstract xml:lang="en">We consider the task of feature selection for value function approximation in reinforcement learning. A promising approach consists of combining the Least-Squares Temporal Difference (LSTD) algorithm with $\ell_1$-regularization, which has proven effective in the supervised learning community. This has been done recently with the LARS-TD algorithm, which replaces the projection operator of LSTD with an $\ell_1$-penalized projection and solves the corresponding fixed-point problem. However, this approach is not guaranteed to be correct in the general off-policy setting. We take a different route by adding an $\ell_1$-penalty term to the projected Bellman residual, which requires weaker assumptions while offering comparable performance. However, this comes at the cost of a higher computational complexity if only part of the regularization path is computed. Nevertheless, our approach reduces to a supervised learning problem, which suggests easy extensions to other penalties.</abstract>
</profileDesc>
</hal>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Hal/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 005457 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Hal/Corpus/biblio.hfd -nk 005457 | SxmlIndent | more
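Without a Dilib installation, the record can also be read with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions, not a Dilib equivalent. `RECORD` is a trimmed excerpt of the XML above, and `HAL_NS` is a hypothetical placeholder URI, not an official HAL namespace. The record uses the `hal:` prefix without declaring it, so a namespace-aware parser rejects it as-is. The sketch therefore binds the prefix on the root `<record>` element before parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URI, used only to bind the undeclared "hal:" prefix.
HAL_NS = "urn:example:hal"

# Trimmed excerpt of the record above (a subset of its elements, for illustration).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">l1-penalized projected Bellman residual</title>
<author>
<name first="Matthieu" last="Geist">Matthieu Geist</name>
<affiliation>
<hal:affiliation type="laboratory" xml:id="struct-26305" status="VALID">
<orgName>SUPELEC-Campus Metz</orgName>
</hal:affiliation>
</affiliation>
</author>
<author>
<name first="Bruno" last="Scherrer">Bruno Scherrer</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="halId">hal-00644507</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(text):
    """Declare the 'hal:' prefix on the first <record> tag, then parse."""
    fixed = text.replace("<record>", f'<record xmlns:hal="{HAL_NS}">', 1)
    return ET.fromstring(fixed)


def summarize(root):
    """Return (title, author surnames, HAL id, affiliation names)."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    surnames = [name.get("last") for name in title_stmt.findall("author/name")]
    hal_id = root.findtext(".//publicationStmt/idno[@type='halId']")
    # Prefixed elements are matched by their expanded "{uri}local" name.
    orgs = [aff.findtext("orgName") for aff in root.iter(f"{{{HAL_NS}}}affiliation")]
    return title, surnames, hal_id, orgs


title, surnames, hal_id, orgs = summarize(parse_record(RECORD))
print(title, surnames, hal_id, orgs)
```

On the full record, `root.iter` would also find the repeated affiliations under `sourceDesc`, because the TEI header lists each author twice.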

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Hal
   |étape=   Corpus
   |type=    RBID
   |clé=     Hal:hal-00644507
   |texte=   l1-penalized projected Bellman residual
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022