# PhD Proposal 2023 : Combining machine learning techniques and metaheuristics for solving combinatorial optimization problems

The thesis will be conducted in the Optimist team of Loria under the supervision of

Wahiba Ramdane Cherif-Khettaf (ramdanec@loria.fr)

Ammar Oulamara (oulamara@loria.fr)

__Deadline for application:__ 10 May 2023

__Expected skills __

– Research Master’s degree or engineering degree in at least one of the following fields: Computer Science, Applied Mathematics, Operations Research, Artificial Intelligence and Data Science.

– Experience in programming

__Application documents __

Detailed CV

Master 1 and Master 2 academic year results

Motivation letter

Any recommendations, or names of referees to contact.

__Description of the topic__

Combinatorial optimisation has applications in many fields such as transport, production, mobility, etc. Most combinatorial optimization problems are difficult to solve (e.g., traveling salesman problem, packing problem, scheduling problems), and their solution often relies on the development of specific heuristics to find efficient solutions without guaranteeing optimality, and often these heuristics require adaptation when a restriction is added to the problem. Metaheuristics offer an alternative to overcome some of the limitations of heuristics, since they have the advantage of being generic, and of allowing to visit a large space of feasible solutions by alternating the processes of exploration (diversification) and exploitation (intensification). Metaheuristics can be used for any optimisation problem, it is nevertheless difficult to guarantee the efficiency of the results. The notion of efficiency here generally refers to two metrics: the speed of execution and the quality of the found solutions. The speed of execution refers to the computation time, while the quality looks at the gap between the best solution found and the actual optimum. Naturally, if most of the search time is used in feasible regions that do not contain at least good solutions, the quality of the metaheuristic can hardly be better. Expertise and intuition are then needed to integrate rules in the search phase to guide the metaheuristics into the most promising regions. Machine learning (ML) techniques can address the design of rules, integrating them into the search process to discover new rules and globally accelerate the search process of promising feasible regions. Thus, machine learning has a double purpose. The first one aims at replacing some computations in metaheuristics by a fast approximation, where learning can be used to build such approximations as an example of estimating the quality of a solution in a generic way, without the need to build new explicit algorithms or to use the explicit algorithms of metaheuristics which are computationally intensive. The second objective is to explore the search space and learn from this exploration the most efficient behavior. The integration of machine learning in solving optimization problems is quite recent and research in this field has not yet led to spectacular results, nevertheless the first studies are very promising.

In this thesis, we are interested in the association of metaheuristics with Reinforcement Learning (RL) [1] [2] [6] to guide decisions in the search space. Recent results show the interest of hybridizing metaheuristics and RL to solve combinatorial problems such as routing problems [3] [4] or scheduling problems [5]. The ML-Optimisation association aims at replacing some computations in metaheuristics by a fast approximation, and aims at exploring the search space and learning from this exploration the best performing behaviour. In the case of searching for a fast approximation, the use of ML allows to quickly exclude unpromising solutions, while in the case of searching for new decision rules reinforcement learning is suitable since in learning a policy where the agent maximises performance, matching the reward signal with the optimisation goal allows the agent to explore promising feasible regions.

The aim of this thesis is to build a hybrid scheme that fits into this idea of cooperation between metaheuristics and ML/RL. The aim is to develop ML/RL models that are able to learn from the problem data the best strategies for finding promising regions (exploration) and the optimisation algorithms focus on the part of finding the best solution of this feasible region (exploitation). The development of a hybrid scheme will initially focus on optimisation problems related to vehicle touring applications.

**References**

[1] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.

[2] Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial

optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.

[3] Mohammadreza Nazari, Afshin Oroojlooy, Lawrence V. Snyder, Martin Takáč. Reinforcement Learning for Solving the Vehicle Routing Problem. arXiv:1802.04240, 2018.

[4] Peng B , Zhang Y, Lü Z, Cheng T.C.E, Glover F, A learning-based memetic algorithm for the multiple vehicle pickup and delivery problem with LIFO loading, Computer and Industrial Engineering, 2020

[5] Chen R, Yang B, Li S, Wang S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem, Computer and Industrial Engineering, 2020

[6] Seyyedabbasi, A. (2023). A reinforcement learning-based metaheuristic algorithm for solving global optimization problems. *Advances in Engineering Software*, *178*, 103411.