Etude de différentes combinaisons de comportements adaptatives
Internal identifier:
000439 (PascalFrancis/Corpus);
previous:
000438;
next:
000440
Etude de différentes combinaisons de comportements adaptatives
Authors: Olivier Buffet;
Alain Dutech;
Francois Charpillet
Source:
Revue d'intelligence artificielle [ 0992-499X ] ; 2006.
RBID : Pascal:06-0329056
French descriptors
English descriptors
Abstract
This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves.
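The combination mechanism the abstract describes — known basic behaviors merged adaptively, with the tuning of the combination learned so as to maximize payoff — can be illustrated with a minimal, hypothetical sketch. The softmax mixing, the reward-weighted update rule, and the names `combine` and `update_weights` are illustrative assumptions, not the architectures evaluated in the article:

```python
import math

def combine(behaviors, weights, obs):
    """Mix the action preferences of several basic behaviors into one
    softmax policy (illustrative sketch, not the article's method)."""
    n_actions = len(behaviors[0](obs))
    scores = [sum(w * b(obs)[a] for w, b in zip(weights, behaviors))
              for a in range(n_actions)]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def update_weights(weights, behaviors, obs, action, reward, lr=0.1):
    """Hypothetical reward-weighted update: behaviors whose preferences
    match the rewarded action gain weight, the others lose it."""
    probs = combine(behaviors, weights, obs)
    new_weights = []
    for w, b in zip(weights, behaviors):
        expected_pref = sum(p * b(obs)[a] for a, p in enumerate(probs))
        grad = reward * (b(obs)[action] - expected_pref)
        new_weights.append(w + lr * grad)
    return new_weights
```

Each basic behavior is a function from an observation to a vector of action preferences; repeated calls to `update_weights` shift the mixture toward the behaviors that earn reward, which is the "learning the tuning of the combination" idea in miniature.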
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
pA |
A01 | 01 | 1 | | @0 0992-499X |
---|
A03 | | 1 | | @0 Rev. intell. artif. |
---|
A05 | | | | @2 20 |
---|
A06 | | | | @2 2-3 |
---|
A08 | 01 | 1 | FRE | @1 Etude de différentes combinaisons de comportements adaptatives |
---|
A09 | 01 | 1 | FRE | @1 Décision et planification dans l'incertain |
---|
A11 | 01 | 1 | | @1 BUFFET (Olivier) |
---|
A11 | 02 | 1 | | @1 DUTECH (Alain) |
---|
A11 | 03 | 1 | | @1 CHARPILLET (Francois) |
---|
A12 | 01 | 1 | | @1 CHARPILLET (F.) @9 ed. |
---|
A12 | 02 | 1 | | @1 GARCIA (F.) @9 ed. |
---|
A12 | 03 | 1 | | @1 PERNY (Patrice) @9 ed. |
---|
A12 | 04 | 1 | | @1 SIGAUD (Olivier) @9 ed. |
---|
A14 | 01 | | | @1 LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239 @2 54506 Vandoeuvre-lès-Nancy @3 FRA @Z 1 aut. @Z 2 aut. @Z 3 aut. |
---|
A14 | 02 | | | @1 National ICT Australia & The Australian National University RSISE Building 115 - ANU/ @2 Canberra ACT 0200 @3 AUS @Z 1 aut. |
---|
A15 | 01 | | | @1 LORIA-INRIA @2 Nancy @3 FRA @Z 1 aut. |
---|
A15 | 02 | | | @1 INRA-MIA @2 Toulouse @3 FRA @Z 2 aut. |
---|
A15 | 03 | | | @1 LIP6 @2 Paris @3 FRA @Z 3 aut. @Z 4 aut. |
---|
A20 | | | | @1 311-343 |
---|
A21 | | | | @1 2006 |
---|
A23 | 01 | | | @0 FRE |
---|
A24 | 01 | | | @0 eng |
---|
A43 | 01 | | | @1 INIST @2 21320 @5 354000142556580060 |
---|
A44 | | | | @0 0000 @1 © 2006 INIST-CNRS. All rights reserved. |
---|
A45 | | | | @0 1 p.3/4 |
---|
A47 | 01 | 1 | | @0 06-0329056 |
---|
A60 | | | | @1 P @2 C |
---|
A61 | | | | @0 A |
---|
A64 | 01 | 1 | | @0 Revue d'intelligence artificielle |
---|
A66 | 01 | | | @0 FRA |
---|
C01 | 01 | | ENG | @0 This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves. |
---|
C02 | 01 | X | | @0 001D02C |
---|
C02 | 02 | X | | @0 001D01A08 |
---|
C03 | 01 | X | FRE | @0 Système incertain @5 06 |
---|
C03 | 01 | X | ENG | @0 Uncertain system @5 06 |
---|
C03 | 01 | X | SPA | @0 Sistema incierto @5 06 |
---|
C03 | 02 | X | FRE | @0 Apprentissage renforcé @5 07 |
---|
C03 | 02 | X | ENG | @0 Reinforcement learning @5 07 |
---|
C03 | 02 | X | SPA | @0 Aprendizaje reforzado @5 07 |
---|
C03 | 03 | X | FRE | @0 Long terme @5 08 |
---|
C03 | 03 | X | ENG | @0 Long term @5 08 |
---|
C03 | 03 | X | SPA | @0 Largo plazo @5 08 |
---|
C03 | 04 | X | FRE | @0 Motivation @5 09 |
---|
C03 | 04 | X | ENG | @0 Motivation @5 09 |
---|
C03 | 04 | X | SPA | @0 Motivación @5 09 |
---|
C03 | 05 | X | FRE | @0 Agent intelligent @5 10 |
---|
C03 | 05 | X | ENG | @0 Intelligent agent @5 10 |
---|
C03 | 05 | X | SPA | @0 Agente inteligente @5 10 |
---|
C03 | 06 | X | FRE | @0 Observable @5 18 |
---|
C03 | 06 | X | ENG | @0 Observable @5 18 |
---|
C03 | 06 | X | SPA | @0 Observable @5 18 |
---|
C03 | 07 | X | FRE | @0 Décision Markov @5 19 |
---|
C03 | 07 | X | ENG | @0 Markov decision @5 19 |
---|
C03 | 07 | X | SPA | @0 Decisión Markov @5 19 |
---|
C03 | 08 | X | FRE | @0 Processus Markov @5 23 |
---|
C03 | 08 | X | ENG | @0 Markov process @5 23 |
---|
C03 | 08 | X | SPA | @0 Proceso Markov @5 23 |
---|
C03 | 09 | X | FRE | @0 Modélisation @5 24 |
---|
C03 | 09 | X | ENG | @0 Modeling @5 24 |
---|
C03 | 09 | X | SPA | @0 Modelización @5 24 |
---|
C03 | 10 | X | FRE | @0 Méthode adaptative @5 25 |
---|
C03 | 10 | X | ENG | @0 Adaptive method @5 25 |
---|
C03 | 10 | X | SPA | @0 Método adaptativo @5 25 |
---|
N21 | | | | @1 212 |
---|
N44 | 01 | | | @1 OTO |
---|
N82 | | | | @1 OTO |
---|
|
pR |
A30 | 01 | 1 | FRE | @1 Décision dynamique et planification dans l'incertain. Journée @3 Paris FRA @4 2004-05-07 |
---|
|
Inist format (server)
NO : | PASCAL 06-0329056 INIST |
FT : | Etude de différentes combinaisons de comportements adaptatives |
AU : | BUFFET (Olivier); DUTECH (Alain); CHARPILLET (Francois); CHARPILLET (F.); GARCIA (F.); PERNY (Patrice); SIGAUD (Olivier) |
AF : | LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239/54506 Vandoeuvre-lès-Nancy/France (1 aut., 2 aut., 3 aut.); National ICT Australia & The Australian National University RSISE Building 115 - ANU//Canberra ACT 0200/Australie (1 aut.); LORIA-INRIA/Nancy/France (1 aut.); INRA-MIA/Toulouse/France (2 aut.); LIP6/Paris/France (3 aut., 4 aut.) |
DT : | Publication en série; Congrès; Niveau analytique |
SO : | Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2006; Vol. 20; No. 2-3; Pp. 311-343; Abs. anglais; Bibl. 1 p.3/4 |
LA : | Français |
EA : | This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves. |
CC : | 001D02C; 001D01A08 |
FD : | Système incertain; Apprentissage renforcé; Long terme; Motivation; Agent intelligent; Observable; Décision Markov; Processus Markov; Modélisation; Méthode adaptative |
ED : | Uncertain system; Reinforcement learning; Long term; Motivation; Intelligent agent; Observable; Markov decision; Markov process; Modeling; Adaptive method |
SD : | Sistema incierto; Aprendizaje reforzado; Largo plazo; Motivación; Agente inteligente; Observable; Decisión Markov; Proceso Markov; Modelización; Método adaptativo |
LO : | INIST-21320.354000142556580060 |
ID : | 06-0329056 |
Links to Exploration step
Pascal:06-0329056
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="fr" level="a">Etude de différentes combinaisons de comportements adaptatives</title>
<author><name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="02"><s1>National ICT Australia & The Australian National University RSISE Building 115 - ANU/</s1>
<s2>Canberra ACT 0200</s2>
<s3>AUS</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Dutech, Alain" sort="Dutech, Alain" uniqKey="Dutech A" first="Alain" last="Dutech">Alain Dutech</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Charpillet, Francois" sort="Charpillet, Francois" uniqKey="Charpillet F" first="Francois" last="Charpillet">Francois Charpillet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">06-0329056</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 06-0329056 INIST</idno>
<idno type="RBID">Pascal:06-0329056</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000439</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="fr" level="a">Etude de différentes combinaisons de comportements adaptatives</title>
<author><name sortKey="Buffet, Olivier" sort="Buffet, Olivier" uniqKey="Buffet O" first="Olivier" last="Buffet">Olivier Buffet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
<affiliation><inist:fA14 i1="02"><s1>National ICT Australia & The Australian National University RSISE Building 115 - ANU/</s1>
<s2>Canberra ACT 0200</s2>
<s3>AUS</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Dutech, Alain" sort="Dutech, Alain" uniqKey="Dutech A" first="Alain" last="Dutech">Alain Dutech</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Charpillet, Francois" sort="Charpillet, Francois" uniqKey="Charpillet F" first="Francois" last="Charpillet">Francois Charpillet</name>
<affiliation><inist:fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptive method</term>
<term>Intelligent agent</term>
<term>Long term</term>
<term>Markov decision</term>
<term>Markov process</term>
<term>Modeling</term>
<term>Motivation</term>
<term>Observable</term>
<term>Reinforcement learning</term>
<term>Uncertain system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système incertain</term>
<term>Apprentissage renforcé</term>
<term>Long terme</term>
<term>Motivation</term>
<term>Agent intelligent</term>
<term>Observable</term>
<term>Décision Markov</term>
<term>Processus Markov</term>
<term>Modélisation</term>
<term>Méthode adaptative</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0992-499X</s0>
</fA01>
<fA03 i2="1"><s0>Rev. intell. artif.</s0>
</fA03>
<fA06><s2>2-3</s2>
</fA06>
<fA08 i1="01" i2="1" l="FRE"><s1>Etude de différentes combinaisons de comportements adaptatives</s1>
</fA08>
<fA09 i1="01" i2="1" l="FRE"><s1>Décision et planification dans l'incertain</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>BUFFET (Olivier)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>DUTECH (Alain)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>CHARPILLET (Francois)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>CHARPILLET (F.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>GARCIA (F.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>PERNY (Patrice)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="04" i2="1"><s1>SIGAUD (Olivier)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239</s1>
<s2>54506 Vandoeuvre-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>National ICT Australia & The Australian National University RSISE Building 115 - ANU/</s1>
<s2>Canberra ACT 0200</s2>
<s3>AUS</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>LORIA-INRIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02"><s1>INRA-MIA</s1>
<s2>Toulouse</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03"><s1>LIP6</s1>
<s2>Paris</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</fA15>
<fA20><s1>311-343</s1>
</fA20>
<fA21><s1>2006</s1>
</fA21>
<fA23 i1="01"><s0>FRE</s0>
</fA23>
<fA24 i1="01"><s0>eng</s0>
</fA24>
<fA43 i1="01"><s1>INIST</s1>
<s2>21320</s2>
<s5>354000142556580060</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2006 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.3/4</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>06-0329056</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA64 i1="01" i2="1"><s0>Revue d'intelligence artificielle</s0>
</fA64>
<fA66 i1="01"><s0>FRA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>001D01A08</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Système incertain</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Uncertain system</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Sistema incierto</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Apprentissage renforcé</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Reinforcement learning</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Aprendizaje reforzado</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Long terme</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Long term</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Largo plazo</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Motivation</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Motivation</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Motivación</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Agent intelligent</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Intelligent agent</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Agente inteligente</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Observable</s0>
<s5>18</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Décision Markov</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Markov decision</s0>
<s5>19</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Decisión Markov</s0>
<s5>19</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Processus Markov</s0>
<s5>23</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Markov process</s0>
<s5>23</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Proceso Markov</s0>
<s5>23</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Modélisation</s0>
<s5>24</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Modeling</s0>
<s5>24</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Modelización</s0>
<s5>24</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Méthode adaptative</s0>
<s5>25</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Adaptive method</s0>
<s5>25</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Método adaptativo</s0>
<s5>25</s5>
</fC03>
<fN21><s1>212</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
<pR><fA30 i1="01" i2="1" l="FRE"><s1>Décision dynamique et planification dans l'incertain. Journée</s1>
<s3>Paris FRA</s3>
<s4>2004-05-07</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 06-0329056 INIST</NO>
<FT>Etude de différentes combinaisons de comportements adaptatives</FT>
<AU>BUFFET (Olivier); DUTECH (Alain); CHARPILLET (Francois); CHARPILLET (F.); GARCIA (F.); PERNY (Patrice); SIGAUD (Olivier)</AU>
<AF>LORIA - INRIA-Lorraine / Campus Scientifique - B.P. 239/54506 Vandoeuvre-lès-Nancy/France (1 aut., 2 aut., 3 aut.); National ICT Australia & The Australian National University RSISE Building 115 - ANU//Canberra ACT 0200/Australie (1 aut.); LORIA-INRIA/Nancy/France (1 aut.); INRA-MIA/Toulouse/France (2 aut.); LIP6/Paris/France (3 aut., 4 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Revue d'intelligence artificielle; ISSN 0992-499X; France; Da. 2006; Vol. 20; No. 2-3; Pp. 311-343; Abs. anglais; Bibl. 1 p.3/4</SO>
<LA>Français</LA>
<EA>This article focuses on the automated synthesis of agents in an uncertain environment, working in the setting of Reinforcement Learning, and more precisely of Partially Observable Markov Decision Processes. The agents (with no model of their environment and no short-term memory) are facing multiple motivations/goals simultaneously, a problem related to the field of Action Selection. We propose and evaluate various Action Selection architectures. They all combine already known basic behaviors in an adaptive manner, by learning the tuning of the combination, so as to maximize the agent's payoff. The logical continuation of this work is to automate the selection and design of the basic behaviors themselves.</EA>
<CC>001D02C; 001D01A08</CC>
<FD>Système incertain; Apprentissage renforcé; Long terme; Motivation; Agent intelligent; Observable; Décision Markov; Processus Markov; Modélisation; Méthode adaptative</FD>
<ED>Uncertain system; Reinforcement learning; Long term; Motivation; Intelligent agent; Observable; Markov decision; Markov process; Modeling; Adaptive method</ED>
<SD>Sistema incierto; Aprendizaje reforzado; Largo plazo; Motivación; Agente inteligente; Observable; Decisión Markov; Proceso Markov; Modelización; Método adaptativo</SD>
<LO>INIST-21320.354000142556580060</LO>
<ID>06-0329056</ID>
</server>
</inist>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000439 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000439 | SxmlIndent | more
To link to this page in the Wicri network
{{Explor lien
|wiki= Wicri/Lorraine
|area= InforLorV4
|flux= PascalFrancis
|étape= Corpus
|type= RBID
|clé= Pascal:06-0329056
|texte= Etude de différentes combinaisons de comportements adaptatives
}}
This area was generated with Dilib version V0.6.33. Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022