Title: Partially observable Markov decision processes with reward information
Abstract: In a partially observable Markov decision process (POMDP), if the reward can be observed at each step, then the observed reward history contains information about the unknown state. This information, in addition to the information contained in the observation history, can be used to update the state probability distribution. The policy thus obtained is called a reward-information policy (RI-policy); an optimal RI-policy performs no worse than any standard optimal policy that depends only on the observation history. This observation leads to four different problem formulations for POMDPs, depending on whether the reward function is known and whether the reward at each step is observable.
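The abstract describes folding the observed reward into the usual POMDP belief update. Below is a minimal sketch of one such update step, assuming a deterministic reward function R[s, a]; the array names (T, O, R) and the function itself are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def belief_update_with_reward(b, a, o, r, T, O, R, tol=1e-9):
    """One belief-update step that also uses the observed reward (a sketch).

    b : (S,)        current belief over hidden states
    a : int         action taken
    o : int         observation received
    r : float       reward observed at this step
    T : (S, A, S)   transition probabilities T[s, a, s']
    O : (S, A, Z)   observation probabilities O[s', a, o]
    R : (S, A)      assumed deterministic reward R[s, a]
    """
    # Reward information: keep only predecessor states whose reward
    # under action a matches the observed reward r.
    reward_mask = np.isclose(R[:, a], r, atol=tol).astype(float)
    filtered = b * reward_mask              # b(s) * 1[R(s, a) = r]
    # Standard POMDP prediction and observation correction.
    predicted = filtered @ T[:, a, :]       # sum_s b(s) T(s' | s, a)
    unnorm = predicted * O[:, a, o]         # times O(o | s', a)
    z = unnorm.sum()
    if z == 0.0:
        raise ValueError("Observed (o, r) has zero probability under the model")
    return unnorm / z
```

An ordinary (non-RI) belief update is the same computation with the reward mask removed; the mask is where the extra state information from the reward history enters, which is why an optimal RI-policy can do no worse than an optimal policy using observations alone.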
Publication Year: 2004
Publication Date: 2004-01-01
Language: en
Type: article
Indexed In: Crossref
Access and Citation
Cited By Count: 6