InforLorV4, PascalFrancis, Corpus, bibRecord, 000440

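Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving [Reinforcement learning for randomly observable factored Markov decision processes: an experimental study of parallel Q-Learning applied to the maze and New York Driving problems]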

Internal identifier: 000440 (PascalFrancis/Corpus); previous: 000439; next: 000441

Authors: Guillaume J. Laurent; Emmanuel Piat

Source:

Revue d'intelligence artificielle [ISSN 0992-499X]; 2006.

RBID : Pascal:06-0329055

French descriptors

Pascal (Inist)
- Système incertain, Parallélisation, Allocation dynamique, Navigation, Espace état, Apprentissage renforcé, Observable, Labyrinthe, Décision Markov, Dynamique processus, Récompense, Processus Markov, Grande dimension, Méthode espace état.

English descriptors

KwdEn :
- Dynamic allocation, Labyrinth, Large dimension, Markov decision, Markov process, Navigation, Observable, Parallelization, Process dynamics, Reinforcement learning, Reward, State space, State space method, Uncertain system.

Abstract

This paper presents experimental results obtained with an original architecture capable of generic learning in randomly observable factored Markov decision processes (ROFMDPs). It first describes the theoretical framework of ROFMDPs and how the algorithm works, in particular its parallelization principle and its dynamic reward allocation process. The architecture is then applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture learns a good, generic policy despite the large state spaces of both systems.
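For readers unfamiliar with the Q-Learning named in the title, the following is a minimal sketch of the standard tabular Q-learning update on gridworld-style states. It is not the parallel architecture or the reward allocation scheme of the paper, whose details the abstract does not give. The function name `q_update`, the `ACTIONS` tuple, the grid coordinates, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a gridworld agent (an assumption, not from the paper).
ACTIONS = ("up", "down", "left", "right")


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard tabular Q-learning update in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Q-table with every unseen (state, action) pair initialised to 0.0.
q = defaultdict(float)

# Stepping from cell (0, 1) into a rewarding cell (0, 2): reward 1.
first = q_update(q, (0, 1), "right", 1.0, (0, 2), ACTIONS)

# Stepping from (0, 0) to (0, 1): no reward, but the discounted value
# learned for (0, 1) propagates one cell back.
second = q_update(q, (0, 0), "right", 0.0, (0, 1), ACTIONS)

print(first, second)
```

The second call shows how value propagates backwards from a rewarding transition. A full learner would wrap this update in an exploration loop and handle terminal states, both of which are omitted here.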

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 0992-499X`
A03		`1`		`@0 Rev. intell. artif.`
A05				`@2 20`
A06				`@2 2-3`
A08	`01`	`1`	`FRE`	`@1 Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving`
A09	`01`	`1`	`FRE`	`@1 Décision et planification dans l'incertain`
A11	`01`	`1`		`@1 LAURENT (Guillaume J.)`
A11	`02`	`1`		`@1 PIAT (Emmanuel)`
A12	`01`	`1`		`@1 CHARPILLET (F.) @9 ed.`
A12	`02`	`1`		`@1 GARCIA (F.) @9 ed.`
A12	`03`	`1`		`@1 PERNY (Patrice) @9 ed.`
A12	`04`	`1`		`@1 SIGAUD (Olivier) @9 ed.`
A14	`01`			`@1 Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary @2 25000 Besançon @3 FRA @Z 1 aut. @Z 2 aut.`
A15	`01`			`@1 LORIA-INRIA @2 Nancy @3 FRA @Z 1 aut.`
A15	`02`			`@1 INRA-MIA @2 Toulouse @3 FRA @Z 2 aut.`
A15	`03`			`@1 LIP6 @2 Paris @3 FRA @Z 3 aut. @Z 4 aut.`
A20				`@1 275-309`
A21				`@1 2006`
A23	`01`			`@0 FRE`
A24	`01`			`@0 eng`
A43	`01`			`@1 INIST @2 21320 @5 354000142556580050`
A44				`@0 0000 @1 © 2006 INIST-CNRS. All rights reserved.`
A45				`@0 1 p.1/2`
A47	`01`	`1`		`@0 06-0329055`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Revue d'intelligence artificielle`
A66	`01`			`@0 FRA`
C01	`01`		`ENG`	`@0 This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.`
C02	`01`	`X`		`@0 001D01A08`
C02	`02`	`X`		`@0 001D02C`
C03	`01`	`X`	`FRE`	`@0 Système incertain @5 01`
C03	`01`	`X`	`ENG`	`@0 Uncertain system @5 01`
C03	`01`	`X`	`SPA`	`@0 Sistema incierto @5 01`
C03	`02`	`X`	`FRE`	`@0 Parallélisation @5 06`
C03	`02`	`X`	`ENG`	`@0 Parallelization @5 06`
C03	`02`	`X`	`SPA`	`@0 Paralelización @5 06`
C03	`03`	`X`	`FRE`	`@0 Allocation dynamique @5 07`
C03	`03`	`X`	`ENG`	`@0 Dynamic allocation @5 07`
C03	`03`	`X`	`SPA`	`@0 Asignación dinámica @5 07`
C03	`04`	`X`	`FRE`	`@0 Navigation @5 08`
C03	`04`	`X`	`ENG`	`@0 Navigation @5 08`
C03	`04`	`X`	`SPA`	`@0 Navegación @5 08`
C03	`05`	`X`	`FRE`	`@0 Espace état @5 09`
C03	`05`	`X`	`ENG`	`@0 State space @5 09`
C03	`05`	`X`	`SPA`	`@0 Espacio estado @5 09`
C03	`06`	`X`	`FRE`	`@0 Apprentissage renforcé @5 10`
C03	`06`	`X`	`ENG`	`@0 Reinforcement learning @5 10`
C03	`06`	`X`	`SPA`	`@0 Aprendizaje reforzado @5 10`
C03	`07`	`X`	`FRE`	`@0 Observable @5 18`
C03	`07`	`X`	`ENG`	`@0 Observable @5 18`
C03	`07`	`X`	`SPA`	`@0 Observable @5 18`
C03	`08`	`X`	`FRE`	`@0 Labyrinthe @5 19`
C03	`08`	`X`	`ENG`	`@0 Labyrinth @5 19`
C03	`08`	`X`	`SPA`	`@0 Laberinto @5 19`
C03	`09`	`X`	`FRE`	`@0 Décision Markov @5 20`
C03	`09`	`X`	`ENG`	`@0 Markov decision @5 20`
C03	`09`	`X`	`SPA`	`@0 Decisión Markov @5 20`
C03	`10`	`X`	`FRE`	`@0 Dynamique processus @5 21`
C03	`10`	`X`	`ENG`	`@0 Process dynamics @5 21`
C03	`10`	`X`	`SPA`	`@0 Dinámica proceso @5 21`
C03	`11`	`X`	`FRE`	`@0 Récompense @5 22`
C03	`11`	`X`	`ENG`	`@0 Reward @5 22`
C03	`11`	`X`	`SPA`	`@0 Recompensa @5 22`
C03	`12`	`X`	`FRE`	`@0 Processus Markov @5 23`
C03	`12`	`X`	`ENG`	`@0 Markov process @5 23`
C03	`12`	`X`	`SPA`	`@0 Proceso Markov @5 23`
C03	`13`	`X`	`FRE`	`@0 Grande dimension @5 24`
C03	`13`	`X`	`ENG`	`@0 Large dimension @5 24`
C03	`13`	`X`	`SPA`	`@0 Gran dimensión @5 24`
C03	`14`	`X`	`FRE`	`@0 Méthode espace état @5 25`
C03	`14`	`X`	`ENG`	`@0 State space method @5 25`
C03	`14`	`X`	`SPA`	`@0 Método espacio estado @5 25`
N21				`@1 212`
N44	`01`			`@1 OTO`
N82				`@1 OTO`

A30	`01`	`1`	`FRE`	`@1 Décision dynamique et planification dans l'incertain. Journée @3 Paris FRA @4 2004-05-07`
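In the field dump above, each field's content is a run of subfields introduced by `@` plus a one-character code (`@0`, `@1`, `@Z`, ...), and a code may repeat within a field. A minimal sketch of splitting such a content string into (code, value) pairs follows. The function name `parse_subfields` and the regular expression are assumptions for illustration, not part of any Inist or Dilib tool.

```python
import re

# A subfield marker: "@", one word character as the code, then whitespace.
# Assumption: values never contain an "@x " sequence themselves.
SUBFIELD_MARKER = re.compile(r"@(\w)\s+")


def parse_subfields(content):
    """Split an Inist-style field content into a list of (code, value) pairs.

    A list is used rather than a dict because codes such as "Z" can repeat.
    """
    parts = SUBFIELD_MARKER.split(content)
    # parts[0] is whatever precedes the first marker (normally empty);
    # after that, codes and values alternate.
    codes = parts[1::2]
    values = parts[2::2]
    return [(code, value.strip()) for code, value in zip(codes, values)]


# Content of field A15 (occurrence 03) from the record above.
affiliation = parse_subfields("@1 LIP6 @2 Paris @3 FRA @Z 3 aut. @Z 4 aut.")
issn = parse_subfields("@0 0992-499X")

print(affiliation)
print(issn)
```

The resulting codes line up with the `<s1>`, `<s2>`, `<s3>` and `<sZ>` elements used for the same field in the XML view of the record.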

Inist format (server)

NO :	PASCAL 06-0329055 INIST
FT :	Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving
AU :	LAURENT (Guillaume J.); PIAT (Emmanuel); CHARPILLET (F.); GARCIA (F.); PERNY (Patrice); SIGAUD (Olivier)
AF :	Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary/25000 Besançon/France (1 aut., 2 aut.); LORIA-INRIA/Nancy/France (1 aut.); INRA-MIA/Toulouse/France (2 aut.); LIP6/Paris/France (3 aut., 4 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2006; Vol. 20; No. 2-3; Pp. 275-309; Abs. anglais; Bibl. 1 p.1/2
LA :	Français
EA :	This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.
CC :	001D01A08; 001D02C
FD :	Système incertain; Parallélisation; Allocation dynamique; Navigation; Espace état; Apprentissage renforcé; Observable; Labyrinthe; Décision Markov; Dynamique processus; Récompense; Processus Markov; Grande dimension; Méthode espace état
ED :	Uncertain system; Parallelization; Dynamic allocation; Navigation; State space; Reinforcement learning; Observable; Labyrinth; Markov decision; Process dynamics; Reward; Markov process; Large dimension; State space method
SD :	Sistema incierto; Paralelización; Asignación dinámica; Navegación; Espacio estado; Aprendizaje reforzado; Observable; Laberinto; Decisión Markov; Dinámica proceso; Recompensa; Proceso Markov; Gran dimensión; Método espacio estado
LO :	INIST-21320.354000142556580050
ID :	06-0329055
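The server format above is a flat list of `TAG : value` lines in which multi-valued fields are separated by semicolons. A minimal sketch of reading it into a dictionary follows. The helper names `parse_server_record` and `split_values` are hypothetical, and the sample reuses a few lines of this record.

```python
# A few lines of the record above; "\t" is the tab that follows "TAG :".
SAMPLE = """\
NO :\tPASCAL 06-0329055 INIST
AU :\tLAURENT (Guillaume J.); PIAT (Emmanuel); CHARPILLET (F.); GARCIA (F.); PERNY (Patrice); SIGAUD (Olivier)
LA :\tFrançais
CC :\t001D01A08; 001D02C
ID :\t06-0329055
"""


def parse_server_record(text):
    """Map each two-letter tag to its raw value string."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside a value
        # (for example in the FT title field) are preserved.
        tag, sep, value = line.partition(":")
        if sep:
            record[tag.strip()] = value.strip()
    return record


def split_values(value):
    """Split a semicolon-separated field into its individual entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


record = parse_server_record(SAMPLE)
codes = split_values(record["CC"])
names = split_values(record["AU"])

print(record["NO"])
print(codes)
print(names)
```

Note that the `AU` field lists the two authors followed by the four volume editors, so a consumer that needs authors only would have to cross-check against fields A11 and A12 of the standard format.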

Links to Exploration step

Pascal:06-0329055

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</title>
<author><name sortKey="Laurent, Guillaume J" sort="Laurent, Guillaume J" uniqKey="Laurent G" first="Guillaume J." last="Laurent">Guillaume J. Laurent</name>
<affiliation><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Piat, Emmanuel" sort="Piat, Emmanuel" uniqKey="Piat E" first="Emmanuel" last="Piat">Emmanuel Piat</name>
<affiliation><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">06-0329055</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 06-0329055 INIST</idno>
<idno type="RBID">Pascal:06-0329055</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000440</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</title>
<author><name sortKey="Laurent, Guillaume J" sort="Laurent, Guillaume J" uniqKey="Laurent G" first="Guillaume J." last="Laurent">Guillaume J. Laurent</name>
<affiliation><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Piat, Emmanuel" sort="Piat, Emmanuel" uniqKey="Piat E" first="Emmanuel" last="Piat">Emmanuel Piat</name>
<affiliation><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Dynamic allocation</term>
<term>Labyrinth</term>
<term>Large dimension</term>
<term>Markov decision</term>
<term>Markov process</term>
<term>Navigation</term>
<term>Observable</term>
<term>Parallelization</term>
<term>Process dynamics</term>
<term>Reinforcement learning</term>
<term>Reward</term>
<term>State space</term>
<term>State space method</term>
<term>Uncertain system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système incertain</term>
<term>Parallélisation</term>
<term>Allocation dynamique</term>
<term>Navigation</term>
<term>Espace état</term>
<term>Apprentissage renforcé</term>
<term>Observable</term>
<term>Labyrinthe</term>
<term>Décision Markov</term>
<term>Dynamique processus</term>
<term>Récompense</term>
<term>Processus Markov</term>
<term>Grande dimension</term>
<term>Méthode espace état</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0992-499X</s0>
</fA01>
<fA03 i2="1"><s0>Rev. intell. artif.</s0>
</fA03>
<fA05><s2>20</s2>
</fA05>
<fA06><s2>2-3</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE"><s1>Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE"><s1>Décision et planification dans l'incertain</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>LAURENT (Guillaume J.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>PIAT (Emmanuel)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>CHARPILLET (F.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>GARCIA (F.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>PERNY (Patrice)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="04" i2="1"><s1>SIGAUD (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>LORIA-INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>INRA-MIA</s1>
<s2>Toulouse</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03"><s1>LIP6</s1>
<s2>Paris</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA15>
<fA20><s1>275-309</s1>
</fA20>
<fA21><s1>2006</s1>
</fA21>
<fA23 i1="01"><s0>FRE</s0>
</fA23>
<fA24 i1="01"><s0>eng</s0>
</fA24>
<fA43 i1="01"><s1>INIST</s1>
<s2>21320</s2>
<s5>354000142556580050</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2006 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.1/2</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>06-0329055</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01"><s0>FRA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D01A08</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D02C</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Système incertain</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Uncertain system</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Sistema incierto</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Parallélisation</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Parallelization</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Paralelización</s0>
<s5>06</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Allocation dynamique</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Dynamic allocation</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Asignación dinámica</s0>
<s5>07</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Navigation</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Navigation</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Navegación</s0>
<s5>08</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Espace état</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>State space</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Espacio estado</s0>
<s5>09</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>10</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Labyrinthe</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Labyrinth</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Laberinto</s0>
<s5>19</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Décision Markov</s0>
<s5>20</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Markov decision</s0>
<s5>20</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Decisión Markov</s0>
<s5>20</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Dynamique processus</s0>
<s5>21</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Process dynamics</s0>
<s5>21</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Dinámica proceso</s0>
<s5>21</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Récompense</s0>
<s5>22</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>Reward</s0>
<s5>22</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Recompensa</s0>
<s5>22</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Processus Markov</s0>
<s5>23</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Markov process</s0>
<s5>23</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Proceso Markov</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Grande dimension</s0>
<s5>24</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG"><s0>Large dimension</s0>
<s5>24</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA"><s0>Gran dimensión</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE"><s0>Méthode espace état</s0>
<s5>25</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG"><s0>State space method</s0>
<s5>25</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA"><s0>Método espacio estado</s0>
<s5>25</s5>
</fC03>
<fN21><s1>212</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
<pR><fA30 i1="01" i2="1" l="FRE"><s1>Décision dynamique et planification dans l'incertain. Journée</s1>
<s3>Paris FRA</s3>
<s4>2004-05-07</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 06-0329055 INIST</NO>
<FT>Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</FT>
<AU>LAURENT (Guillaume J.); PIAT (Emmanuel); CHARPILLET (F.); GARCIA (F.); PERNY (Patrice); SIGAUD (Olivier)</AU>
<AF>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary/25000 Besançon/France (1 aut., 2 aut.); LORIA-INRIA/Nancy/France (1 aut.); INRA-MIA/Toulouse/France (2 aut.); LIP6/Paris/France (3 aut., 4 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2006; Vol. 20; No. 2-3; Pp. 275-309; Abs. anglais; Bibl. 1 p.1/2</SO>
<LA>Français</LA>
<EA>This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.</EA>
<CC>001D01A08; 001D02C</CC>
<FD>Système incertain; Parallélisation; Allocation dynamique; Navigation; Espace état; Apprentissage renforcé; Observable; Labyrinthe; Décision Markov; Dynamique processus; Récompense; Processus Markov; Grande dimension; Méthode espace état</FD>
<ED>Uncertain system; Parallelization; Dynamic allocation; Navigation; State space; Reinforcement learning; Observable; Labyrinth; Markov decision; Process dynamics; Reward; Markov process; Large dimension; State space method</ED>
<SD>Sistema incierto; Paralelización; Asignación dinámica; Navegación; Espacio estado; Aprendizaje reforzado; Observable; Laberinto; Decisión Markov; Dinámica proceso; Recompensa; Proceso Markov; Gran dimensión; Método espacio estado</SD>
<LO>INIST-21320.354000142556580050</LO>
<ID>06-0329055</ID>
</server>
</inist>
</record>
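The keyword lists in the XML above are grouped by `scheme` and carry an `xml:lang` attribute. A minimal sketch of collecting them with Python's standard `xml.etree.ElementTree` follows, run on a shortened fragment of the `<textClass>` element. The function name `keywords_by_scheme` is an assumption. The full record is not parsed here because it uses an `inist:` prefix with no visible namespace declaration, which a strict XML parser would likely reject.

```python
import xml.etree.ElementTree as ET

# Shortened fragment of the <textClass> element from the record above
# (only the first two terms of each scheme are kept).
FRAGMENT = """<textClass><keywords scheme="KwdEn" xml:lang="en"><term>Dynamic allocation</term>
<term>Labyrinth</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système incertain</term>
<term>Parallélisation</term>
</keywords>
</textClass>"""

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keywords_by_scheme(xml_text):
    """Return {scheme: (language, [terms])} for every <keywords> element."""
    root = ET.fromstring(xml_text)
    result = {}
    for keywords in root.iter("keywords"):
        terms = [term.text for term in keywords.findall("term")]
        result[keywords.get("scheme")] = (keywords.get(XML_LANG), terms)
    return result


keywords = keywords_by_scheme(FRAGMENT)
print(keywords)
```

Keeping the language next to the terms makes it straightforward to select, for example, only the English descriptors for indexing.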

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000440 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000440 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:06-0329055
   |texte=   Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022

	Exploration server for computer science research in Lorraine
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
