In this thesis, we use Markov decision models to investigate high-level decision-making (task-level planning) for robotics along two axes: robot-robot collaboration and human-robot collaboration.
In robot-robot collaboration (RRC), we study the decision problems faced by multiple robots collaborating to achieve a shared goal, and we model such problems with the decentralized partially observable Markov decision process (Dec-POMDP) framework. We then propose two novel algorithms for solving Dec-POMDPs. The first algorithm (Inf-JESP) finds Nash equilibrium solutions by iteratively building a best-response policy for each agent until no improvement can be made; to handle infinite-horizon Dec-POMDPs, each agent's policy is represented as a finite-state controller. The second algorithm (MC-JESP) extends Inf-JESP to generative models, which allows scaling to larger problems. Through experiments, we demonstrate that our methods are competitive with existing Dec-POMDP solvers.
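To make the iterated best-response idea concrete, the toy sketch below applies the same principle to an invented two-agent common-payoff matrix game with full observability, rather than to an actual Dec-POMDP with finite-state controllers; the payoff matrix, action sets, and starting point are purely illustrative.

```python
# Toy illustration of the iterated best-response principle behind
# Inf-JESP, on a hypothetical two-agent common-payoff matrix game
# (not an actual Dec-POMDP with finite-state controllers).

# Shared payoff R[a0][a1]: both agents maximize the same value.
R = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]

def best_response(agent, other_action):
    """Best action for `agent` when the other agent's action is fixed."""
    if agent == 0:
        return max(range(3), key=lambda a: R[a][other_action])
    return max(range(3), key=lambda a: R[other_action][a])

# Iterate best responses from an arbitrary joint action until neither
# agent can improve unilaterally, i.e., a Nash equilibrium is reached.
joint = [0, 2]
improved = True
while improved:
    improved = False
    for i in (0, 1):
        br = best_response(i, joint[1 - i])
        if br != joint[i]:
            joint[i] = br
            improved = True

print("equilibrium:", joint, "value:", R[joint[0]][joint[1]])
```

From the joint action (0, 2), the loop converges to (2, 2) with value 2 rather than the global optimum (0, 0) with value 4, illustrating that iterated best response reaches a Nash equilibrium that may only be locally optimal. MC-JESP keeps this overall scheme but, presumably, derives each best response from trajectories sampled through a generative model rather than from an explicit Dec-POMDP model.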
In human-robot collaboration (HRC), only the robot can be controlled, and it faces uncertainty about the human's objectives and the behaviors they induce. We therefore address the challenge of deriving robot policies for HRC that are robust to these uncertainties about human behavior.
In this direction, we first discuss possible mental models for describing the human in an HRC task. We then propose a general approach to derive, automatically and without prior knowledge, a model of human behavior based on the assumption that the human could also control the robot.
Building on this model, we design two algorithms for computing robust robot policies, both of which rely on solving a robot POMDP whose state includes the human's internal state. The first algorithm operates offline and produces a complete robot policy for use during execution. The second is an online method that plans the robot's action at each time step during execution. Unlike the offline approach, the online method only requires a generative model and can thus scale to larger problems. We evaluate both algorithms in a simulated environment with synthetic and real humans, and observe that our methods provide robust robot decisions despite the uncertainty over human objectives and behaviors.
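As a minimal sketch of the key modeling idea, the snippet below maintains a Bayesian belief over the human's hidden internal state (here, an unknown goal) from observed human actions. In the thesis this belief is part of the robot's POMDP state and the human behavior model is derived automatically; the goals, actions, and likelihoods below are invented for illustration.

```python
# Hypothetical sketch: tracking a belief over the human's internal
# state (an unknown goal) from observed human actions. Goals, actions,
# and the likelihood model are invented stand-ins for a derived human
# behavior model.

GOALS = ("goal_A", "goal_B")

# P(human action | goal): a stand-in for the derived human model.
LIKELIHOOD = {
    "goal_A": {"left": 0.8, "right": 0.2},
    "goal_B": {"left": 0.3, "right": 0.7},
}

def update_belief(belief, human_action):
    """Bayes rule: posterior over goals after observing one human action."""
    posterior = {g: belief[g] * LIKELIHOOD[g][human_action] for g in GOALS}
    norm = sum(posterior.values())
    return {g: p / norm for g, p in posterior.items()}

belief = {g: 1.0 / len(GOALS) for g in GOALS}  # uniform prior
for action in ("left", "left", "right"):
    belief = update_belief(belief, action)
    print(action, {g: round(p, 3) for g, p in belief.items()})
```

A robot policy that conditions on such a belief can hedge against the remaining uncertainty about the human; an online planner of the kind described above would only need to sample human actions from a generative model rather than enumerate these likelihoods explicitly.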
In summary, our research on RRC provides a foundation for building best-response policies in a partially observable, multi-agent setting, which serves as an important intermediate step toward addressing HRC problems. Moreover, in each contribution we provide a more flexible algorithm that only requires a generative model.