When quoting this document, please refer to the following
URN: urn:nbn:de:0030-drops-6372
URL: http://drops.dagstuhl.de/opus/volltexte/2006/637/


Ryabko, Daniil ; Hutter, Marcus

Learning in Reactive Environments with Arbitrary Dependence



Abstract

In reinforcement learning, the task for an agent is to attain the best possible asymptotic reward when the true generating environment is unknown but belongs to a known countable family of environments. This task generalises the sequence prediction problem, in which the environment does not react to the behaviour of the agent. Solomonoff induction solves the sequence prediction problem for any countable class of measures; however, it is easy to see that such a result is impossible for reinforcement learning: not every countable class of environments can be learnt. We find sufficient conditions on the class of environments under which an agent exists that attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
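The mixture idea behind Solomonoff induction, which the abstract contrasts with the reactive setting, can be illustrated on a toy problem. The sketch below is a hypothetical illustration (not the authors' construction): it predicts the next bit of a binary sequence by Bayes-mixing a finite class of i.i.d. Bernoulli "environments", standing in for the countable classes of measures discussed above. All names and parameters are illustrative.

```python
# Toy Bayesian mixture predictor over a finite class of Bernoulli
# environments (an illustrative stand-in for a countable class of
# measures; not the construction from the paper).

def mixture_predict(history, thetas, prior):
    """Posterior-weighted probability that the next bit is 1."""
    # Likelihood of the observed history under each candidate environment.
    likelihoods = []
    for theta in thetas:
        like = 1.0
        for bit in history:
            like *= theta if bit == 1 else (1.0 - theta)
        likelihoods.append(like)
    # Posterior weights: prior times likelihood, normalised.
    weights = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Mixture prediction: posterior-weighted average of the candidates.
    return sum(w * theta for w, theta in zip(weights, thetas))

if __name__ == "__main__":
    thetas = [0.1, 0.5, 0.9]       # candidate environments in the class
    prior = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the class
    history = [1, 1, 1, 0, 1, 1]   # observed sequence so far
    print(round(mixture_predict(history, thetas, prior), 3))
```

As more data arrives, the posterior concentrates on the environment closest to the true one, which is exactly the property that fails to transfer to reactive environments without further conditions on the class.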

BibTeX - Entry

@InProceedings{ryabko_et_al:DSP:2006:637,
  author =	{Daniil Ryabko and Marcus Hutter},
  title =	{Learning in Reactive Environments with Arbitrary Dependence},
  booktitle =	{Kolmogorov Complexity and Applications},
  year =	{2006},
  editor =	{Marcus Hutter  and Wolfgang Merkle and Paul M.B. Vitanyi},
  number =	{06051},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Internationales Begegnungs- und Forschungszentrum f{\"u}r Informatik (IBFI), Schloss Dagstuhl, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2006/637},
  annote =	{Keywords: Reinforcement learning, asymptotic average value, self-optimizing policies, (non) Markov decision processes}
}

Keywords: Reinforcement learning, asymptotic average value, self-optimizing policies, (non) Markov decision processes
Seminar: 06051 - Kolmogorov Complexity and Applications
Issue Date: 2006
Date of publication: 31.07.2006

