Consensus-Based Peer-to-Peer Machine Learning

This PhD position is offered by the COAST team of the Inria Nancy Grand Est research centre. Inria is the French national public research institute dedicated to digital science and technology. COAST is one of the European research groups working on distributed systems and trustworthy large-scale collaborative systems (https://team.inria.fr/coast/). The thesis will be conducted in collaboration with the SYNALP team of LORIA in Nancy.

Contacts
François Charoy (francois.charoy@loria.fr)

Context of the PhD Thesis

Data privacy concerns can limit the training of high-quality machine learning models. Recent approaches such as federated learning have shown [1,3] that it is possible to train models on personal data without sharing that data. This has triggered a large body of new research [2,5].
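
To make this concrete, here is a minimal sketch of federated averaging in the style of FedAvg. It assumes a toy one-parameter linear model y = w * x and three hypothetical clients; the helper names (local_update, federated_round), the data and the hyperparameters are illustrative assumptions, not part of the proposed thesis. Each client trains on its own samples and only sends back a weight, which a coordinator averages in proportion to dataset size.

```python
# Hypothetical FedAvg-style sketch on a toy 1-D linear model y = w * x.
# Client data, learning rate, epochs and number of rounds are illustrative assumptions.

def local_update(w, data, lr=0.05, epochs=5):
    """Run a few epochs of full-batch gradient descent on one client's private data."""
    for _ in range(epochs):
        # Gradient of 0.5 * mean((w * x - y) ** 2) with respect to w.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients, lr=0.05, epochs=5):
    """One round: every client starts from the global weight and trains locally;
    the coordinator then averages the returned weights, weighted by dataset size.
    Only model weights are exchanged, never the (x, y) samples themselves."""
    total = sum(len(data) for data in clients)
    return sum(len(data) * local_update(w_global, data, lr, epochs) for data in clients) / total


# Three hypothetical clients whose private data all follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(round(w, 4))  # 2.0
```

The coordinator in federated_round is the central participant discussed in the next paragraph: it receives every update and owns the resulting model.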

However, federated learning is still a centralized approach in which one participant owns and controls the model. More recently, Lalitha et al. [4] proposed an approach to peer-to-peer federated learning. Although appealing, this proposal is not very resistant to Byzantine attacks: it relies on a trust relation between nodes that is not well defined.
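
A decentralized alternative removes the coordinator: peers repeatedly average their parameters with their neighbours on a communication graph. The sketch below is a generic gossip-averaging illustration, not the algorithm of Lalitha et al. [4]; the ring topology, the values and the attack_value parameter are hypothetical. It also shows why plain averaging is fragile: a single Byzantine node that keeps broadcasting the same value eventually pulls every honest node to that value.

```python
# Hypothetical gossip (consensus) averaging sketch; not the method of Lalitha et al. [4].

def gossip_round(values, neighbors, byzantine=frozenset(), attack_value=100.0):
    """Honest nodes replace their value by the mean of their own and their neighbours' values.
    Byzantine nodes ignore the protocol and keep broadcasting attack_value."""
    new_values = {}
    for node, value in values.items():
        if node in byzantine:
            new_values[node] = attack_value
        else:
            group = [value] + [values[peer] for peer in neighbors[node]]
            new_values[node] = sum(group) / len(group)
    return new_values


def run(values, neighbors, rounds, byzantine=frozenset()):
    """Apply gossip_round repeatedly and return the final values."""
    for _ in range(rounds):
        values = gossip_round(values, neighbors, byzantine)
    return values


# Four nodes on a ring 0-1-2-3-0; each node holds one local parameter value.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

honest = run(start, ring, rounds=50)
attacked = run(start, ring, rounds=200, byzantine={3})
print({n: round(v, 3) for n, v in honest.items()})    # every node reaches the mean, 2.5
print({n: round(v, 3) for n, v in attacked.items()})  # node 3 drags everyone to 100.0
```

On this ring every node has the same degree, so the averaging weights are symmetric and an all-honest run converges to the global mean; with node 3 misbehaving, the only fixed point of the honest nodes' updates is 100.0.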

Our goal is to propose a framework that supports peer-to-peer decentralized learning, where nodes keep control of their data and where each node's contribution to the model is taken into account and rewarded. It is related to permissioned decentralized machine learning, where participants are authenticated and trusted. The set of participants can also evolve as participants join and leave the federation. A typical example would be healthcare organisations that agree to collaborate to train a model on their respective annotated data. It could also be banks that agree to train a model to detect different kinds of fraud, or logistics and industrial services that train models to optimize production or the transportation of goods. As in federated learning, participants need to preserve data privacy and to be sure that their partners are faithfully contributing to the common endeavor.

Objectives of the PhD Thesis

The objective of the thesis is to answer the following questions:

  • How can the contribution of a node to the “progress of a model” be evaluated in a fully decentralized way, and what does such progress mean? The evaluation should reflect the effort a node makes to provide high-quality data and to compute the new version of the model. It should be resistant to tampering and poisoning, and it must rely on a consensus among the partners in the collaboration.

  • How can a fair, trustworthy, efficient and fault-tolerant collaboration to train high-quality models be guaranteed in a decentralized environment? The protocol should ensure that participants contribute fairly and that contracts are enforced in a decentralized, trustworthy way. This is expected to be a multi-round game. Partners need incentives to participate, and the kinds of attacks that may occur on the network must be understood.
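
For the first question, one simple way to picture a decentralized, tamper-resistant contribution measure is sketched below, under strong assumptions: contributions are scalar updates of a toy model y = w * x, each evaluator scores every contributor by leave-one-out on its own private validation data, and the partners agree on the median of the reports. Leave-one-out and the median rule are illustrative choices (Shapley-value estimates are a common alternative), and all names and data are hypothetical.

```python
# Hypothetical consensus-based contribution scoring; leave-one-out and the median
# rule are illustrative choices, not the method the thesis will develop.
from statistics import median


def mse(w, data):
    """Mean squared error of the toy model y = w * x on a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def aggregate(updates):
    """Plain average of the contributors' scalar updates."""
    return sum(updates.values()) / len(updates)


def loo_report(updates, validation):
    """One evaluator's report: for each contributor, how much the evaluator's validation loss
    increases when that contributor's update is left out. Positive means the node helped."""
    full_loss = mse(aggregate(updates), validation)
    report = {}
    for node in updates:
        others = {k: v for k, v in updates.items() if k != node}
        report[node] = mse(aggregate(others), validation) - full_loss
    return report


def consensus_scores(reports):
    """Per-contributor median of the evaluators' reports. As long as fewer than half of the
    reports are dishonest, each score stays within the range of the honest reports."""
    return {node: median(r[node] for r in reports) for node in reports[0]}


updates = {"A": 2.1, "B": 1.9, "C": 8.0}  # C submits a poisoned update (true weight is 2)
reports = [
    loo_report(updates, [(1.0, 2.0), (2.0, 4.0)]),  # evaluator 1's private validation data
    loo_report(updates, [(0.5, 1.0), (3.0, 6.0)]),  # evaluator 2's private validation data
    {"A": -50.0, "B": -50.0, "C": 1000.0},          # evaluator 3 lies to promote C
]
scores = consensus_scores(reports)
print({k: round(v, 2) for k, v in scores.items()})  # {'A': 11.76, 'B': 13.26, 'C': -10.0}
```

The poisoned update from C still receives a negative score even though evaluator 3 reports a large positive value for it. Leave-one-out only measures the effect on the final model; it does not capture the computational effort a node spent, which the question also asks about.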
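
For the second question, a building block of many decentralized trust protocols is a quorum rule: with n = 3f + 1 partners, at most f of which are Byzantine, a result is accepted only when 2f + 1 partners vouch for the same value. The sketch below applies that rule to commit the aggregated model of one training round, assuming each partner publishes a SHA-256 digest of the model it computed. The helpers digest and commit_round are hypothetical simplifications, not a full Byzantine fault-tolerant protocol: protocols such as PBFT additionally rely on authenticated messages and view changes.

```python
# Hypothetical quorum check for committing one training round (only the voting rule
# of BFT protocols; no signatures, view changes or retransmission).
import hashlib
import json
from collections import Counter


def digest(model):
    """Deterministic fingerprint of a model given as a dict of named float parameters."""
    return hashlib.sha256(json.dumps(model, sort_keys=True).encode("utf-8")).hexdigest()


def commit_round(votes, n):
    """votes maps each partner to the digest of the aggregated model it computed.
    Returns the committed digest, or None if no digest gathers 2f + 1 matching votes."""
    f = (n - 1) // 3  # number of Byzantine partners tolerated
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= 2 * f + 1 else None


good = digest({"w": 2.0, "b": 0.1})
bad = digest({"w": 9.9, "b": 0.1})

# Four partners (f = 1): three honest partners agree, one Byzantine partner disagrees.
print(commit_round({"p1": good, "p2": good, "p3": good, "p4": bad}, n=4) == good)  # True
# A 2-2 split cannot reach the quorum of 3, so the round is not committed.
print(commit_round({"p1": good, "p2": good, "p3": bad, "p4": bad}, n=4))  # None
```

Because 2f + 1 is a strict majority of 3f + 1, at most one digest in a given set of votes can reach the quorum, so a single tally never commits two different models.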

References

[1] Konečný, Jakub, et al. “Federated optimization: Distributed machine learning for on-device intelligence.” arXiv preprint arXiv:1610.02527 (2016).

[2] Yang, Qiang, et al. “Federated machine learning: Concept and applications.” ACM Transactions on Intelligent Systems and Technology (TIST) 10.2 (2019): 1-19.

[3] Konečný, Jakub, et al. “Federated learning: Strategies for improving communication efficiency.” arXiv preprint arXiv:1610.05492 (2016).

[4] Lalitha, Anusha, et al. “Peer-to-peer federated learning on graphs.” arXiv preprint arXiv:1901.11173 (2019).

[5] Kairouz, Peter, et al. “Advances and open problems in federated learning.” arXiv preprint arXiv:1912.04977 (2019).

Skills

The student is expected to develop mathematical formulations of a node's contribution to peer-to-peer machine learning training and of trust management among nodes. We also expect the developed techniques and protocols to be implemented. Ideal candidates should have the following skills:

  • Programming experience in Python.
  • Good knowledge of distributed algorithms and distributed systems.
  • Good knowledge of machine learning.
  • Experience with a machine learning framework (PyTorch or TensorFlow) would be a plus.

Most importantly, we seek highly motivated students. A master's degree in Computer Science or Computer Engineering is required.

Benefits

  • Subsidised catering service
  • Partially-reimbursed public transport
