OrphaMine: a tool to learn more about rare diseases

29 February 2016


Like Dr. House in the series of the same name, doctors sometimes come up against enigmas when faced with certain pathologies. The aim of the OrphaMine platform developed as part of the ANR Hybride project is to increase specialists’ understanding of rare diseases. The platform has been tested in-house and will soon be offered to a broader panel made up of doctors, researchers and representatives of the pharmaceutical industry.

Source: Marine Loyen, Citizen Press, 21/12/2015
Yannick Toussaint is a researcher and was one of the founding members of the ORPAILLEUR team in 1998. For the Hybride project, he obtained a four-year grant from the French National Research Agency (ANR), which was extended by a year to help his team finish work on the OrphaMine platform.

Chedy Raïssi is also a researcher. He has been a member of the ORPAILLEUR team at the Nancy/Eastern France branch of the National Institute for Research in Computer Science and Control (Inria) since 2009, working alongside Yannick Toussaint on the OrphaMine platform.

How did the OrphaMine platform begin?

YT: We realized that it was difficult for specialists in rare diseases to gather all the existing knowledge in one place. Over 8,000 diseases have been identified, listed and characterized, but in reality there are more than 15,000. Certain pathologies can also develop very differently from one patient to another, so diagnosis can take several years. The number of sufferers is limited, which means few doctors are interested. Our aim is to give specialists a platform where they can access all available knowledge on these rare pathologies. One of our sources is the bibliographic database Medline, which catalogues millions of medical texts. The platform is designed to extract the most relevant data from this mass of texts.

Which methods do you use?

YT: On the Hybride project, we work with teams from the French National Institute of Health and Medical Research (Inserm) and with the Greyc* and MoDyCo** laboratories. We study the texts to extract the right information. This is akin to machine learning: our algorithms need to be able to recognize words and understand whether they refer to an illness, a symptom, a bacterium or a type of treatment. Next, to extract the information, we programme the software to recognize the syntax of a sentence so that it understands the links between words (causality, opposition, etc.), even in complex sentences. For example, the link between an illness and a symptom can be expressed in different ways, and there can be very subtle nuances of meaning. The MoDyCo team helps a lot on this point.
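As a rough illustration of the two steps described above (recognizing what a word refers to, then linking an illness to a symptom), here is a minimal Python sketch. It is not the Hybride project's method: it swaps the learning algorithms and syntactic analysis for a plain dictionary lookup and same-sentence co-occurrence, and the lexicon, the function names and the example sentence are all assumptions made for the illustration.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); a real system
# would learn these categories from annotated medical texts.
LEXICON = {
    "cystic fibrosis": "illness",
    "chronic cough": "symptom",
    "pseudomonas aeruginosa": "bacterium",
    "antibiotics": "treatment",
}


def tag_entities(sentence):
    """Return (position, term, category) for each lexicon term found in the sentence."""
    lowered = sentence.lower()
    found = []
    for term, category in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((match.start(), term, category))
    return sorted(found)  # sorted by position in the sentence


def illness_symptom_links(sentence):
    """Pair every illness with every symptom mentioned in the same sentence.

    Co-occurrence is a crude stand-in for real syntactic analysis: it cannot
    tell causality from opposition or negation.
    """
    entities = tag_entities(sentence)
    illnesses = [term for _, term, cat in entities if cat == "illness"]
    symptoms = [term for _, term, cat in entities if cat == "symptom"]
    return [(illness, symptom) for illness in illnesses for symptom in symptoms]


sentence = "Patients with cystic fibrosis often present with a chronic cough."
links = illness_symptom_links(sentence)
print(links)  # [('cystic fibrosis', 'chronic cough')]
```

A sentence such as "cystic fibrosis rarely causes a chronic cough" would produce the same pair, which is exactly why the syntactic step described in the interview matters.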

CR: In concrete terms, we draw up a network of interconnected data using medical texts as our starting point. An illness is linked to several symptoms and to the genes involved, for example. This first step enables us to obtain a representation of existing knowledge. I then work on detecting the hidden patterns and links within this data network using data mining algorithms. Here is an example: our data network shows us that illness A is linked with gene B, that gene B is present in mice with certain variations and that direct interactions exist between proteins encoded by the mouse genome and this gene. It would then be in doctors’ interest to make a closer study of the equivalent proteins identified in humans to increase their knowledge of the illness concerned. My work helps highlight these links.
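The example of illness A and gene B can be pictured as a small graph in which a chain of links leads from an illness to proteins worth studying. The Python sketch below is only an illustration under stated assumptions: the node names, the relation labels and the use of a breadth-first search are placeholders chosen here, not the data mining algorithms the team actually uses.

```python
from collections import deque

# Hypothetical edges (source, relation, target) mirroring the example above.
EDGES = [
    ("illness A", "is linked with", "gene B"),
    ("gene B", "has a mouse counterpart", "mouse gene B"),
    ("mouse gene B", "encodes", "mouse protein P"),
    ("mouse protein P", "interacts with", "mouse protein Q"),
    ("mouse protein Q", "has a human equivalent", "human protein Q"),
]


def build_graph(edges):
    """Store each edge in both directions so links can be followed either way."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search: return the shortest list of (relation, node) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(relation, neighbour)]))
    return None  # no chain of links connects the two nodes


graph = build_graph(EDGES)
chain = find_chain(graph, "illness A", "human protein Q")
for relation, node in chain:
    print(f"{relation} -> {node}")
```

In a network built from millions of texts, such chains are far too numerous to inspect by hand, which is where pattern detection comes in.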

How does the platform run in real terms?

YT: We still have work to do on making our platform accessible and easy to use for doctors and, eventually, the general public. Our aim is that they should be able to use the software to ask a question about a rare illness. For example, they can enter the name of an illness to find out all the symptoms associated with it, or enter their patients’ symptoms. The more symptoms there are, the smaller the group of possible illnesses will be.
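The two kinds of question described above, and the way each additional symptom narrows the list of candidates, can be sketched with plain set operations. The Python example below is a minimal illustration that assumes a tiny hand-written table of placeholder illnesses and symptoms; it does not reflect OrphaMine's actual data or interface.

```python
# Hypothetical illness-to-symptoms table (placeholder names, not real data).
ILLNESS_SYMPTOMS = {
    "illness A": {"fever", "joint pain", "rash"},
    "illness B": {"fever", "fatigue"},
    "illness C": {"joint pain", "rash", "fatigue"},
}


def symptoms_of(illness):
    """First kind of query: list the symptoms associated with an illness."""
    return sorted(ILLNESS_SYMPTOMS.get(illness, set()))


def candidate_illnesses(observed_symptoms):
    """Second kind of query: keep the illnesses that show every observed symptom."""
    observed = set(observed_symptoms)
    return sorted(
        name
        for name, symptoms in ILLNESS_SYMPTOMS.items()
        if observed <= symptoms  # subset test: all observed symptoms are present
    )


print(symptoms_of("illness B"))                # ['fatigue', 'fever']
print(candidate_illnesses(["fever"]))          # ['illness A', 'illness B']
print(candidate_illnesses(["fever", "rash"]))  # ['illness A']
```

Each extra symptom can only keep or shrink the candidate set, never enlarge it, which matches the narrowing effect described in the answer.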

CR: We have already presented our platform to doctors working in several hospitals who give us regular feedback on our work. A doctor from the University Hospital of Nancy works with our team to refine our results. The panel of users will be enlarged in January to include doctors, researchers and representatives of the pharmaceutical industry.

*Greyc: Computer Science Laboratory at the University of Caen Normandy, France

**MoDyCo: The “Models, Dynamics, Corpora” linguistics laboratory made up of researchers from Paris West University-Nanterre-La Défense and the CNRS.


OrphaMine platform demo