[PhD thesis] Information-Oriented Planning with ρ-POMDPs

Supervision: Olivier Buffet & Vincent Thomas
firstname.lastname@loria.fr
https://members.loria.fr/olivier.buffet/ / https://members.loria.fr/vincent.thomas/
Team/Lab: LARSEN / LORIA (UMR 7503) / INRIA / Université de Lorraine / CNRS
Location: Nancy, France
Keywords: Artificial Intelligence, Partially Observable Markov Decision Processes (POMDPs), Information-Oriented Control, Monte Carlo Tree Search (MCTS).

Background

Automated planning aims at deciding which sequence of actions to perform so as to control the state of a system, the performance criterion typically being a function of the states and actions. This criterion can, for example, induce a trade-off between reaching some states, avoiding others, and minimizing action costs. When driving a car, the objective is to reach some location while minimizing fuel costs and avoiding situations that could lead to a car failure (e.g., a flat tire) or an accident. Uncertainties (e.g., unpredictable failures or weather conditions) can be due to stochastic dynamics or to partial observability (sensors providing only limited and noisy access to the system state), and often lead to the decision-theoretic setting of Partially Observable Markov Decision Processes (POMDPs) [2].
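
For reference, and following common notation (see, e.g., [4]), a POMDP is described by a tuple ⟨S, A, Z, T, O, r, γ⟩: S, A and Z are the state, action and observation sets; T(s′ | s, a) and O(z | s′, a) are the transition and observation models; r(s, a) is the reward function; and γ ∈ [0, 1) is the discount factor.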

We are here concerned with information-oriented problems, where the performance criterion depends on the knowledge about the system state. Examples include a robot that needs to localize itself, a museum surveillance system that tracks visitors, or a medical system that monitors the state of a patient. ρ-POMDPs have been introduced to formalize such problems; the only, but key, difference with POMDPs is the wider family of possible objective functions, which now depend on the belief (a probability distribution over states) in addition to the state and action.
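
As a toy illustration only (none of the above references prescribes a particular implementation), the following Python sketch shows a tabular Bayesian belief update together with a classical belief-dependent reward, the negative entropy of the belief; the array layout of T and O is an assumption made for this example.

    import numpy as np

    def belief_update(b, a, z, T, O):
        # Bayes filter for a tabular POMDP. Assumed layout (this sketch only):
        #   T[a][s, s'] = Pr(s' | s, a),  O[a][s', z] = Pr(z | s', a).
        b_pred = b @ T[a]            # prediction: sum_s b(s) T(s' | s, a)
        b_new = O[a][:, z] * b_pred  # correction by the observation likelihood
        return b_new / b_new.sum()   # normalization

    def rho_neg_entropy(b):
        # Example belief-dependent reward: negative entropy of the belief,
        # maximal (zero) when the agent is certain about the state.
        p = b[b > 0]
        return float(np.sum(p * np.log(p)))

    # Tiny usage example: 2 states, 1 action, 2 observations.
    T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
    O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
    b = belief_update(np.array([0.5, 0.5]), 0, 1, T, O)
    print(b, rho_neg_entropy(b))

Note that this negative-entropy criterion is convex in the belief, which is precisely the kind of objective function handled by the first family of algorithms below [1].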

The first algorithms proposed to solve ρ-POMDPs rely on the same schemes as point-based POMDP solvers [4]. They assume either (i) a convex criterion [1], which allows approximating a ρ-POMDP as a POMDP, or (ii) a Lipschitz-continuous criterion [3], which led to new types of value-function approximators. In both cases an ε-optimal solution is provably obtained in finite time.
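
In both settings, the object being approximated is the optimal value function over beliefs, which satisfies a Bellman equation differing from the usual POMDP one only through the belief-dependent reward ρ (in LaTeX notation, with b^{a,z} the Bayesian update of belief b after action a and observation z):

    V^*(b) = \max_{a \in A} \Big[ \rho(b, a) + \gamma \sum_{z \in Z} \Pr(z \mid b, a) \, V^*\big(b^{a,z}\big) \Big]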

Objectives

The objective of this PhD thesis is to pursue this line of research, in particular with the aim of obtaining efficient tools for robotic applications.

While we remain interested in point-based algorithms for approximating the value function [4], the focus of this PhD thesis will be on Monte Carlo planning [5], i.e., using a generative model to simulate, and thus evaluate, policies. Preliminary work on this Monte Carlo setting has already allowed us to identify three research directions. The first one concerns the particle filters incorporated in the proposed algorithm, whose particle sets would need to grow progressively without introducing any bias (see the sketch after this paragraph for the basic filtering step). The second one is the handling of continuous state, action and observation spaces [6], an essential requirement for robotic systems. The third one is to identify classes of problems that are amenable to different types of algorithms, e.g., depending on whether rewards are sparse or not. Of course, other lines of research are likely to appear as the work progresses.
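
To fix ideas, here is a minimal sketch of the rejection-based particle filtering step used by Monte Carlo planners such as POMCP [5] to maintain an approximate belief; the interface gen_model(s, a) -> (s_next, z, r) is hypothetical, assumed only for this illustration.

    import random

    def pf_update(particles, a, z, gen_model, n_target, max_tries=100000):
        # Belief update from a generative model only (no explicit T or O):
        # simulate steps and keep the particles whose simulated observation
        # matches the one actually received.
        survivors = []
        for _ in range(max_tries):
            if len(survivors) >= n_target:
                break
            s = random.choice(particles)        # draw a state from the belief
            s_next, z_sim, _ = gen_model(s, a)  # simulate one step
            if z_sim == z:                      # rejection test
                survivors.append(s_next)
        return survivors  # may fall short of n_target if z is unlikely

Such rejection schemes require discrete observations and degenerate when the received observation is unlikely, which is one motivation both for growing particle sets without bias (first direction) and for the weighted particle filters used with continuous spaces in [6] (second direction).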

References

[1] M. Araya-López, O. Buffet, V. Thomas, and F. Charpillet. “A POMDP Extension with Belief-dependent Rewards”. In: Advances in Neural Information Processing Systems 23 (NIPS-10). 2010.
[2] K. Åström. “Optimal control of Markov processes with incomplete state information”. In: Journal of Mathematical Analysis and Applications 10.1 (1965). ISSN: 0022-247X.
[3] M. Fehr, O. Buffet, V. Thomas, and J. Dibangoye. “ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions”. In: Advances in Neural Information Processing Systems 31 (NIPS-18). 2018.
[4] G. Shani, J. Pineau, and R. Kaplow. “A survey of point-based POMDP solvers”. In: JAAMAS 27.1 (2013).
[5] D. Silver and J. Veness. “Monte-Carlo Planning in Large POMDPs”. In: Advances in Neural Information Processing Systems 23 (NIPS-10). 2010.
[6] Z. N. Sunberg and M. J. Kochenderfer. “Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces”. In: ICAPS-18. 2018.

Requirements

We are looking for excellent candidates with a strong interest in automated planning, both from a theoretical (mathematics) and a practical (programming) point of view. Knowledge of English is mandatory.

How to apply

Deadline: May 1st, 2019 (Midnight Paris time)
Applications are to be sent as soon as possible.

Upload your application file on https://jobs.inria.fr (via https://jobs.inria.fr/public/classic/en/offres/2019-01556) as a single PDF or ZIP file, and also send it by email to both supervisors. Your file should contain the following documents:

  1. Your CV;
  2. A cover/motivation letter describing your interest in this topic;
  3. A short (max one page) description of your Master thesis (or equivalent) or of the work in progress if not yet completed;
  4. Your degree certificates and transcripts for your Bachelor's and Master's degrees (or for the last 5 years of study);
  5. Your Master thesis (or equivalent) if already completed, and your publications if any (none are expected); if possible, provide web links to these documents rather than attaching them.

In addition, a recommendation letter from the person who supervises or supervised your Master thesis (or research project or internship) should be sent by that person directly to both supervisors.
