[PhD] Reinforcement Learning and Spiking Neurons : from Models to Hardware



BISCUIT team, Loria
Supervision: Bernard Girau (HDR), Alain Dutech (HDR)
Bernard.Girau@loria.fr & Alain.Dutech@loria.fr
Last modified on May 6, 2019



The BISCUIT [1] team of the Loria [2] laboratory studies computational paradigms where calculations are adaptive, distributed and decentralized, carried out by populations of simple computing units that communicate mainly with their close neighbors. These properties are compatible with the implementation of unsupervised – but not un-guided – self-organization principles to tackle difficult problems such as situated cognitive computation, autonomous robotics, adaptive allocation of computation resources, etc.

These characteristics also make it possible to make better use of the emerging so-called "neuromorphic" processors (IBM TrueNorth, Intel Loihi, etc.). These processors are based on neuro-inspired principles that respect the constraints of the paradigms we study, and they can benefit from the self-organization mechanisms (unsupervised but guided) that we are developing, both for applications and for the management of neuromorphic resources. This is why the BISCUIT team is committed to designing unsupervised and guided learning architectures and algorithms for spatialized and decentralized computing populations, while remaining as close as possible to the constraints and characteristics of the hardware. The proposed doctoral thesis is an additional step in that direction.

[1] Bio-Inspired Situated Cellular and Unconventional Information Technology, http://biscuit.loria.fr/
[2] www.loria.fr



The main goal of this thesis is to explore mechanisms at the crossroads of neuro-inspired computation and reinforcement learning, within the framework of so-called neuromorphic architectures.

The general framework of reinforcement learning (RL) (Sutton and Barto, 1998) provides a theoretical basis for decision making under uncertainty. Within the BISCUIT team, it is our main avenue for exploring how to guide self-organization processes. Classical RL algorithms are mainly designed for discrete and centralized settings, which are not very compatible with our computational paradigms. Similarly, the mechanisms currently used in deep reinforcement learning rely on gradient descent and cannot accommodate our constraints of decentralization and absence of supervision.

This is why we want to further explore a learning mechanism from the connectionist world, where decentralization and population coding are easier to achieve. This mechanism is called Spike-Timing-Dependent Plasticity (STDP) (Markram et al., 1997; Bi and Poo, 1998). It is an unsupervised learning rule that can be modulated, for example by a reinforcement signal, as in classical reinforcement learning. Several studies have already been carried out in this direction (see, for example, Florian, 2007; El-Laithy and Bogdan, 2011), but such works are still few in number. Moreover, they were conducted independently of current advances in neuromorphic processors, some of which are very recent (the Intel "Loihi" chip; Davies et al., 2018). Yet a strong trend in recent neuromorphic processors is precisely the on-chip implementation of configurable STDP mechanisms, which makes the implemented models adaptable while relying on decentralized and local learning rules. The purpose of this thesis is therefore to study this family of algorithms, taking into account the computational paradigms studied by the BISCUIT team and the emergence of neuromorphic circuits. Specifically, the doctoral student will:

  • conduct a literature review on STDP and RL;
  • explore the capacity of existing STDP models modulated by RL to allow the learning of our neural models (mainly self-organizing maps and dynamical neural fields);
  • propose adaptations of these algorithms compatible with the constraints imposed by neuro-morphic processors;
  • adapt these algorithms to the team’s key issues, in particular the decentralized dynamic control of the allocation of computing resources on neuromorphic processors.
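
As an illustration of the reward-modulated STDP idea discussed above, the following minimal Python sketch simulates a single synapse. Pair-based STDP changes are first accumulated in a decaying eligibility trace, and the weight only changes when a reward signal arrives. This is a generic textbook-style formulation given under stated assumptions: it is not the algorithm of Florian (2007) or of El-Laithy and Bogdan (2011), nor the learning rule of any particular neuromorphic chip. The function name simulate_r_stdp, all parameter names and all default values are hypothetical and chosen for illustration only.

```python
import math


def simulate_r_stdp(pre_spikes, post_spikes, rewards, w0=0.5,
                    a_plus=0.01, a_minus=0.012,
                    tau_plus=20.0, tau_minus=20.0, tau_e=200.0,
                    lr=1.0, dt=1.0, w_min=0.0, w_max=1.0):
    """Simulate one synapse under reward-modulated pair-based STDP.

    All names and default values are illustrative assumptions.
    pre_spikes, post_spikes: sequences of 0/1, one entry per time step.
    rewards: sequence of floats, one entry per time step.
    Returns the list of weights after each time step.
    """
    x_pre = 0.0   # trace of recent presynaptic spikes
    y_post = 0.0  # trace of recent postsynaptic spikes
    elig = 0.0    # eligibility trace storing pending STDP changes
    w = w0
    weights = []
    for pre, post, r in zip(pre_spikes, post_spikes, rewards):
        # Exponential decay of all traces over one time step.
        x_pre *= math.exp(-dt / tau_plus)
        y_post *= math.exp(-dt / tau_minus)
        elig *= math.exp(-dt / tau_e)
        # Pre-before-post pairing: potentiation stored in the eligibility trace.
        if post:
            elig += a_plus * x_pre
        # Post-before-pre pairing: depression stored in the eligibility trace.
        if pre:
            elig -= a_minus * y_post
        # Update the spike traces after the pairing terms were computed.
        if pre:
            x_pre += 1.0
        if post:
            y_post += 1.0
        # The weight changes only when the reward signal is non-zero.
        w = min(w_max, max(w_min, w + lr * r * elig * dt))
        weights.append(w)
    return weights


if __name__ == "__main__":
    steps = 50
    pre = [0] * steps
    post = [0] * steps
    rew = [0.0] * steps
    pre[10] = 1    # presynaptic spike
    post[15] = 1   # postsynaptic spike shortly afterwards
    rew[20] = 1.0  # delayed positive reward
    print(simulate_r_stdp(pre, post, rew)[-1])
```

Every quantity used in the update (the two spike traces, the eligibility trace and the weight) is local to the synapse, and the reward is a single scalar broadcast to all synapses. This locality is what makes rules of this family candidates for decentralized, on-chip learning.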


Working conditions and desired skills

The doctoral student will be welcomed at the Loria in Nancy, France. He or she will work under the supervision of Alain Dutech and Bernard Girau. Scientific collaboration with other team members is expected, as well as more general scientific discussions and collaborations with other members of the laboratory. The expected duration of the doctorate is three years.

In addition to advanced master's level computer science skills, we expect solid foundations in the associated mathematical concepts (in particular probability and differential equations). The candidate should have an interest in the design of digital circuits and in artificial intelligence. Finally, the candidate, who holds a Master's degree in computer science or equivalent, must be creative, curious and autonomous. The team will provide a set of programming tools and all the support necessary for the technical aspects of the work, which will allow the doctoral student to focus on the scientific questions. Being comfortable with software design is also required; code will be developed under Linux.


  • Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24):10464–10472.
  • Davies, M., Srinivasa, N., Lin, T., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P., Imam, N., Jain, S., Liao, Y., Lin, C., Lines, A., Liu, R., Mathaikutty, D., McCoy, S., Paul, A., Tse, J., Venkataramanan, G., Weng, Y., Wild, A., Yang, Y., and Wang, H. (2018). Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99.
  • El-Laithy, K. and Bogdan, M. (2011). A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses. Computational Intelligence and Neuroscience, 2011.
  • Florian, R. V. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19(6):1468–1502.
  • Markram, H., Lübke, J., Frotscher, M., and Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275(5297):213–215.
  • Sutton, R. and Barto, A. (1998). Reinforcement Learning. Bradford Book, MIT Press, Cambridge, MA.

How to apply

Deadline: May 20th, 2019 (Midnight Paris time)
Applications are to be sent as soon as possible.

Send a file with the following components to both supervisors:

  1. Your CV;
  2. A cover/motivation letter describing your interest in this topic;
  3. A short (max one page) description of your Master thesis (or equivalent) or of the work in progress if not yet completed;
  4. Your degree certificates and transcripts for Bachelor and Master (or the last 5 years);
  5. Your Master thesis (or equivalent) if it is already completed, and your publications if any (you are not expected to have any); web links to these documents are preferred where possible.

In addition, one recommendation letter from the person who supervises or supervised your Master thesis (or research project or internship) should be sent directly by its author to both supervisors.
