InforLorV4, Main, Exploration, bibRecord, 005581

Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving

Internal identifier: 005581 (Main/Exploration); previous: 005580; next: 005582

Authors: Guillaume J. Laurent [France]; Emmanuel Piat [France]

Source:

Revue d'intelligence artificielle [ISSN 0992-499X]; 2006.

RBID : Pascal:06-0329055

French descriptors

Pascal (Inist)
- Système incertain, Parallélisation, Allocation dynamique, Navigation, Espace état, Apprentissage renforcé, Observable, Labyrinthe, Décision Markov, Dynamique processus, Récompense, Processus Markov, Grande dimension, Méthode espace état.

English descriptors

KwdEn:
- Dynamic allocation, Labyrinth, Large dimension, Markov decision, Markov process, Navigation, Observable, Parallelization, Process dynamics, Reinforcement learning, Reward, State space, State space method, Uncertain system.

Abstract

This paper presents experimental results obtained with an original architecture that performs generic learning for randomly observable factored Markov decision processes (ROFMDPs). First, the paper describes the theoretical framework of ROFMDPs and how the algorithm works, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture makes it possible to learn a good and generic policy despite the large dimensions of the state spaces of both systems.

Links to previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000440
to stream PascalFrancis, to step Curation: 000593
to stream PascalFrancis, to step Checkpoint: 000326
to stream Main, to step Merge: 005731
to stream Main, to step Curation: 005581

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</title>
<author><name sortKey="Laurent, Guillaume J" sort="Laurent, Guillaume J" uniqKey="Laurent G" first="Guillaume J." last="Laurent">Guillaume J. Laurent</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Bourgogne-Franche-Comté</region>
<region type="old region" nuts="2">Franche-Comté</region>
<settlement type="city">Besançon</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Piat, Emmanuel" sort="Piat, Emmanuel" uniqKey="Piat E" first="Emmanuel" last="Piat">Emmanuel Piat</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Bourgogne-Franche-Comté</region>
<region type="old region" nuts="2">Franche-Comté</region>
<settlement type="city">Besançon</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">06-0329055</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 06-0329055 INIST</idno>
<idno type="RBID">Pascal:06-0329055</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000440</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000593</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000326</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000326</idno>
<idno type="wicri:doubleKey">0992-499X:2006:Laurent G:apprentissage:par:renforcement</idno>
<idno type="wicri:Area/Main/Merge">005731</idno>
<idno type="wicri:Area/Main/Curation">005581</idno>
<idno type="wicri:Area/Main/Exploration">005581</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving</title>
<author><name sortKey="Laurent, Guillaume J" sort="Laurent, Guillaume J" uniqKey="Laurent G" first="Guillaume J." last="Laurent">Guillaume J. Laurent</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Bourgogne-Franche-Comté</region>
<region type="old region" nuts="2">Franche-Comté</region>
<settlement type="city">Besançon</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Piat, Emmanuel" sort="Piat, Emmanuel" uniqKey="Piat E" first="Emmanuel" last="Piat">Emmanuel Piat</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Laboratoire d'Automatique de Besançon - UMR CNRS 6596 24 rue Alain Savary</s1>
<s2>25000 Besançon</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Bourgogne-Franche-Comté</region>
<region type="old region" nuts="2">Franche-Comté</region>
<settlement type="city">Besançon</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Dynamic allocation</term>
<term>Labyrinth</term>
<term>Large dimension</term>
<term>Markov decision</term>
<term>Markov process</term>
<term>Navigation</term>
<term>Observable</term>
<term>Parallelization</term>
<term>Process dynamics</term>
<term>Reinforcement learning</term>
<term>Reward</term>
<term>State space</term>
<term>State space method</term>
<term>Uncertain system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système incertain</term>
<term>Parallélisation</term>
<term>Allocation dynamique</term>
<term>Navigation</term>
<term>Espace état</term>
<term>Apprentissage renforcé</term>
<term>Observable</term>
<term>Labyrinthe</term>
<term>Décision Markov</term>
<term>Dynamique processus</term>
<term>Récompense</term>
<term>Processus Markov</term>
<term>Grande dimension</term>
<term>Méthode espace état</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents experimental results obtained with an original architecture that can do generic learning for randomly observable factored Markov decision process (ROFMDP). First, the paper describes the theoretical framework of ROFMDP and the working of this algorithm, in particular the parallelization principle and the dynamic reward allocation process. Then, the architecture is applied to two navigation problems (gridworld and New York Driving). The tests show that the architecture allows to learn a good and generic policy in spite of the large dimensions of the state spaces of both systems.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Bourgogne-Franche-Comté</li>
<li>Franche-Comté</li>
</region>
<settlement><li>Besançon</li>
</settlement>
</list>
<tree><country name="France"><region name="Bourgogne-Franche-Comté"><name sortKey="Laurent, Guillaume J" sort="Laurent, Guillaume J" uniqKey="Laurent G" first="Guillaume J." last="Laurent">Guillaume J. Laurent</name>
</region>
<name sortKey="Piat, Emmanuel" sort="Piat, Emmanuel" uniqKey="Piat E" first="Emmanuel" last="Piat">Emmanuel Piat</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 005581 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 005581 | SxmlIndent | more
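
Outside Dilib, the record can also be read with a general-purpose XML library. The following is a minimal sketch in Python, not part of Dilib: the names `EXCERPT` and `terms_by_scheme` are hypothetical, and the input is a shortened excerpt of the `<textClass>` element from the record above. The full record as displayed uses the `wicri:` and `inist:` prefixes without visible namespace declarations, so a namespace-aware parser would likely reject it unless those prefixes are declared first; the excerpt avoids that by containing only the predeclared `xml:` prefix.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the <textClass> element from the record above
# (three terms per scheme, kept only for illustration).
EXCERPT = """<textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Dynamic allocation</term>
<term>Labyrinth</term>
<term>Reinforcement learning</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Allocation dynamique</term>
<term>Labyrinthe</term>
<term>Apprentissage renforcé</term>
</keywords>
</textClass>"""


def terms_by_scheme(xml_text):
    """Return a dict mapping each keyword scheme to its list of terms."""
    root = ET.fromstring(xml_text)
    return {
        keywords.get("scheme"): [term.text for term in keywords.findall("term")]
        for keywords in root.iter("keywords")
    }


if __name__ == "__main__":
    # Print one line per scheme, terms separated by commas.
    for scheme, terms in terms_by_scheme(EXCERPT).items():
        print(scheme + ": " + ", ".join(terms))
```

Grouping by the `scheme` attribute mirrors the way the page separates the French (Pascal) and English (KwdEn) descriptor lists.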

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:06-0329055
   |texte=   Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre : Étude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022

	Exploration server on computer science research in Lorraine
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
