  • This event has passed

PhD defense: Bishnu Sarker

23 April 2021 @ 14:00 - 16:00

Bishnu Sarker (CAPSID) will defend his thesis, entitled "On Graph-based Approaches for Protein Function Annotation and Knowledge Discovery". The defense will take place on Friday, 23 April 2021 at 14:00. Due to the health situation, it will be held online.

Abstract

Owing to recent advances in genomic sequencing technologies, the number of protein entries in public databases is growing exponentially. It is important to harness this huge amount of data to describe living things at the molecular level, which is essential for understanding human disease processes and accelerating drug discovery. A prerequisite, however, is that all of these proteins be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. Today, only a small fraction of proteins are functionally annotated and reviewed by expert curators, because manual curation is expensive and slow. Developing automatic protein function annotation tools is the way forward to reduce the gap between annotated and unannotated proteins and to predict reliable annotations for unknown proteins. Many tools of this type already exist, but none is fully satisfactory. We observed that only a few consider graph-based approaches and the domain composition of proteins. Indeed, domains are conserved regions across protein sequences of the same family. In this thesis, we design and evaluate graph-based approaches to automatic protein function annotation, and we explore the impact of domain architecture on protein function.

The first part is dedicated to protein function annotation using a domain similarity graph and a neighborhood-based label propagation technique. We present GrAPFI (Graph-based Automatic Protein Function Inference), which automatically annotates proteins with enzymatic functions (EC numbers) and GO terms from a protein-domain similarity graph. We validate the performance of GrAPFI using six reference proteomes from UniProtKB/Swiss-Prot and compare its results with state-of-the-art EC prediction approaches. We find that GrAPFI achieves better accuracy and comparable or better coverage.

The second part of the dissertation deals with representation learning for biological entities. We first focus on a neural network-based word embedding technique. We formulate the annotation task as a text classification task: we build a corpus in which each protein is a sentence composed of its domains, and we learn a fixed-dimensional vector representation for each protein. We then focus on learning representations from a heterogeneous biological network. We build a knowledge graph integrating different sources of information related to proteins and their functions, and we formulate function annotation as a link prediction task between proteins and GO terms. We propose Prot-A-GAN, a machine-learning model inspired by Generative Adversarial Networks (GANs), to learn vector representations of biological entities from the protein knowledge graph. We observe that Prot-A-GAN gives promising results in associating appropriate functions with query proteins.

In conclusion, this thesis revisits the crucial problem of large-scale automatic protein function annotation in the light of innovative artificial intelligence techniques. It opens up wide perspectives, in particular for the use of knowledge graphs, which, thanks to progress in data science, are now available in many fields other than protein annotation.
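
As a rough illustration of neighborhood-based label propagation on a domain similarity graph, the sketch below scores candidate labels for a query protein by letting similar annotated proteins vote with a weight equal to their similarity. It is not the GrAPFI implementation: the Jaccard similarity, the `min_similarity` threshold, the function names, and the toy domain and label identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two collections of domain identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(query_domains, annotated, min_similarity=0.3):
    """Rank labels for a query protein by similarity-weighted neighbour votes.

    annotated maps a protein id to a (domain set, label set) pair.
    Neighbours below min_similarity (an assumed cut-off) are ignored.
    Returns (label, score) pairs, highest score first; scores sum to 1.
    """
    scores = defaultdict(float)
    for domains, labels in annotated.values():
        weight = jaccard(query_domains, domains)
        if weight < min_similarity:
            continue  # not a neighbour in the similarity graph
        for label in labels:
            scores[label] += weight
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = ((label, score / total) for label, score in scores.items())
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Toy data with placeholder identifiers (not real domains or EC/GO labels).
annotated = {
    "P1": ({"dom_A", "dom_B"}, {"label_X"}),
    "P2": ({"dom_A", "dom_B", "dom_C"}, {"label_X"}),
    "P3": ({"dom_C", "dom_D"}, {"label_Y"}),
}

if __name__ == "__main__":
    print(propagate_labels({"dom_A", "dom_B"}, annotated))
    print(propagate_labels({"dom_C"}, annotated))
```

In this toy setting, a query sharing both of its domains with P1 and P2 inherits only their label, while a query containing only `dom_C` receives votes from P2 and P3 and ranks the label of its closer neighbour first.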

Composition of the jury:

Reviewers:

    Christine Brun: Research Director, CNRS, Inserm-University of Marseille, France.
    Mohamed Elati: Professor, University of Lille, France.

Examiners:

    Anne Boyer: Professor, University of Lorraine, France.
    Albert Montresor: Professor, University of Trento, Italy.

Supervisors:

    David W. Ritchie (until Sept. 2019): Research Director, Inria, Nancy, France.
    Marie-Dominique Devignes (from Sept. 2019): Associate Researcher, CNRS, Nancy, France.
    Sabeur Aridhi: Associate Professor, University of Lorraine, France.

Details

Date:
23 April 2021
Time:
14:00 - 16:00

Venue

Teams