PhD Thesis Position: Online hate speech against migrants

Deadline to apply: May 1st, 2018

According to the 2017 International Migration Report, the number of international migrants worldwide has continued to grow rapidly in recent years, reaching 258 million in 2017, up from 220 million in 2010 and 173 million in 2000. In 2017, 64 per cent of all international migrants worldwide, equal to 165 million people, lived in high-income countries; 78 million of them were residing in Europe. Since 2000, Germany and France have figured among the countries hosting the largest numbers of international migrants. A key reason why EU leaders have struggled to take a decisive and coherent approach to the refugee crisis has been the high level of public anxiety about immigration and asylum across Europe. Indeed, across the EU, attitudes towards asylum and immigration have hardened in recent years because of (Berri et al., 2015): (i) the increase in the number and visibility of migrants, (ii) the economic crisis and the austerity policies enacted since the 2008 global financial crisis, and (iii) the role of the mass media in influencing public and elite political attitudes towards asylum and migration. Refugees and migrants tend to be framed negatively as a problem, potentially nourishing hate speech against them.

Indeed, the BRICkS – Building Respect on the Internet by Combating Hate Speech – EU project has revealed a significant increase in the use of hate speech towards immigrants and minorities, who are often blamed for current economic and social problems. The participatory web and social media seem to accelerate this tendency, which is accentuated by the rapid online spread of fake news that often reinforces online violence towards migrants. Based on existing research, Carla Schieb and Mike Preuss (2016) highlight that hate speech deepens prejudice and stereotypes in a society (Citron & Norton, 2011). It also has a detrimental effect on the mental health and emotional well-being of targeted groups, especially of targeted individuals (Festl & Quandt, 2013), and is a source of harm in general for those under attack (Waldron, 2012), particularly when it culminates in violent acts incited by hateful speech. Such violent hate crimes may erupt in the aftermath of certain key events, e.g. anti-Muslim hate crimes in response to the 9/11 terrorist attacks (King & Sutton, 2013).

Hate speech and fake news are not, of course, just problems of our times. Hate speech has always been part of antisocial behavior such as bullying or stalking (Delgado & Stefancic, 2014); “trapped”, emotional, unverified and/or biased contents have always existed (Dauphin, 2002; Froissart, 2002, 2004; Lebre, 2014) and need to be understood on an anthropological level as reflections of people’s fears, anxieties or fantasies. They reveal what Marc Angenot calls a certain “state of society” (Angenot, 1978; 1989; 2006). Indeed, according to this author, the analysis of specific situated discourses sheds light on some of the topoi – common premises and patterns – that characterize public doxa. This “gnoseological” perspective reveals the ways in which visions of the “world” can be systematically schematized in linguistic materials at a certain moment.

Within this context, the PhD project jointly proposed by Crem and Loria aims to analyse hate speech towards migrants in social media, and more particularly on Twitter. It seeks to provide answers to the following questions:
– What are the representations of migrants as they emerge in hate speech on Twitter?
– What themes are they associated with?
– What can the latter tell us about the “state” of our society, in the sense previously given to this term by Marc Angenot?

Secondary questions will also be addressed in order to refine the main results:
– What is the origin of these messages? (individual accounts, political party accounts, bots, etc.)
– What is the circulation of these messages? (reactions, retweets, interactions, etc.)
– Can we measure the emotional dimension of these messages? Based on which indicators?
– Can a scale be established to measure the intensity of hate in speech?

More and more audio, video and text content appears on the Internet each day. About 300 hours of multimedia are uploaded per minute. In these multimedia sources, manual content retrieval is difficult or impossible. The classical approach to spoken content retrieval from multimedia documents is automatic text retrieval, and automatic text classification is one of the most widely used technologies for this purpose. In text classification, text documents are usually represented in a so-called vector space and then assigned to predefined classes through supervised machine learning. Each document is represented as a numerical vector, which is computed from the words of the document. How to represent the terms numerically in an appropriate way is a basic problem in text classification tasks and directly affects classification accuracy. Sometimes the classes cannot be defined in advance. In this case, unsupervised machine learning is used, and the challenge consists in finding underlying structures in unlabeled data. We will use these methodologies to perform one of the important tasks of text classification: automatic hate speech detection.
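
As an illustration of the vector-space idea described above, the following minimal sketch represents each document as a term-frequency vector and assigns it to the predefined class whose average vector is closest in cosine similarity. Everything in it is an assumption made for the example: the training sentences are invented placeholders (not real tweets), the labels "hostile" and "neutral" are hypothetical, and a nearest-centroid rule stands in for whatever supervised learner the project will actually use.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocabulary(documents):
    """Map every distinct word of the corpus to a vector index."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocab):
    """Represent a document as a term-frequency vector over the vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts.get(word, 0)) for word in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def train_centroids(documents, labels, vocab):
    """Supervised step: average the vectors of each predefined class."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(documents, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(vectorize(doc, vocab)):
            acc[i] += value
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(text, centroids, vocab):
    """Assign the class whose centroid is most similar to the document."""
    vec = vectorize(text, vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy, invented training data (placeholders, not real messages).
train_docs = [
    "they should all be sent away",
    "send them away they are a threat",
    "volunteers welcome new arrivals at the station",
    "the city opens a welcome center for new arrivals",
]
train_labels = ["hostile", "hostile", "neutral", "neutral"]

vocab = build_vocabulary(train_docs)
centroids = train_centroids(train_docs, train_labels, vocab)
prediction = classify("a threat they should be sent away", centroids, vocab)
print(prediction)
```

Raw word counts treat every word as unrelated to every other, which is one reason the choice of numerical representation matters so much for accuracy.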

Developments in neural networks (Mikolov et al., 2013a) led to a renewed interest in the field of distributional semantics, more specifically in learning word embeddings (representations of words in a continuous space). Computational efficiency was a major factor in popularizing word embeddings. Word embeddings capture syntactic as well as semantic properties of words (Mikolov et al., 2013b). As a result, they have outperformed several other word vector representations on different tasks (Baroni et al., 2014).
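
To make the notion of a continuous word space concrete, here is a small sketch in which related words lie close together under cosine similarity. The three-dimensional vectors are invented by hand for the example; real embeddings are learned from large corpora and typically have hundreds of dimensions, so the values and the helper name `most_similar` are assumptions, not outputs of any trained model.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "migrant": [0.9, 0.1, 0.0],
    "refugee": [0.8, 0.2, 0.1],
    "border": [0.5, 0.5, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(word, embeddings):
    """Return the other word whose vector is closest in cosine similarity."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

nearest = most_similar("migrant", EMBEDDINGS)
print(nearest)
```

With these toy values, "migrant" is nearest to "refugee" and far from "banana", which is the kind of semantic proximity that a bag-of-words representation cannot express.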

Our methodology for hate speech classification will build on recent approaches to text classification with neural networks and word embeddings. In this context, fully connected feed-forward networks (Iyyer et al., 2015; Nam et al., 2014), Convolutional Neural Networks (CNN) (Kim, 2014; Johnson and Zhang, 2015) and also Recurrent/Recursive Neural Networks (RNN) (Dong et al., 2014) have been applied. On the one hand, the approaches based on CNN and RNN capture rich compositional information and have outperformed previous state-of-the-art results in text classification; on the other hand, they are computationally intensive and require careful hyperparameter selection and/or regularization (Dai and Le, 2015).
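
The simplest member of the feed-forward family mentioned above can be sketched as follows: the embeddings of a message's words are averaged, and the result is passed to a single sigmoid output unit trained by stochastic gradient descent. This is a much-reduced illustration, not the model the thesis will develop: the two-dimensional embeddings, the token lists and the labels (1 for hostile, 0 for neutral) are all invented, and the published models add hidden layers, dropout and learned embeddings.

```python
import math

# Hypothetical 2-dimensional word embeddings, invented for this sketch.
EMBEDDINGS = {
    "threat": [1.0, 0.0],
    "invade": [0.9, 0.1],
    "welcome": [0.0, 1.0],
    "help": [0.1, 0.9],
    "people": [0.5, 0.5],
}

def average_embedding(words):
    """Represent a message as the mean of the vectors of its known words."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        return [0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class from one sigmoid output unit."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit the output unit with stochastic gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict_proba(features, weights, bias) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Toy, invented training messages given as token lists (1 = hostile, 0 = neutral).
docs = [["people", "invade"], ["threat", "people"],
        ["welcome", "people"], ["help", "people"]]
labels = [1, 1, 0, 0]

samples = [average_embedding(doc) for doc in docs]
weights, bias = train(samples, labels)
score_hostile = predict_proba(average_embedding(["threat", "invade"]), weights, bias)
score_neutral = predict_proba(average_embedding(["welcome", "help"]), weights, bias)
print(score_hostile > 0.5, score_neutral < 0.5)
```

Averaging discards word order, which is the compositional information that CNN and RNN models recover at a higher computational cost.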

This thesis aims to propose concepts, analyses and software components (Hate Speech Domain Specific Analysis and related software tools in connection with migrants in social media) to bridge the gap between conceptual requirements and multi-source information from social media. Automatic hate speech detection software will be tested on the modeling of various hate speech phenomena, and its domain relevance will be assessed with both partners. The language of the analysed messages will be primarily French, although links with other languages (including messages written in English) may appear throughout the analysis.

This PhD project complies with the Impact OLKi (Open Language and Knowledge for Citizens) framework because:
– It is centred on language.
– It aims to implement new methods to study and extract knowledge from linguistic data (indicators, scales of measurement).
– It opens perspectives to produce technical solutions (applications, etc.) for citizens and digital platforms, to better control the potential negative use of language data.

Scientific challenges:
– to study and extract knowledge from linguistic data that concern hate speech towards migrants in social media;
– to better understand hate speech as a social phenomenon, based on the data extracted and analysed;
– to propose and assess new methods based on Deep Learning for the automatic detection of documents containing hate speech. This will make it possible to set up a protocol for managing online hate speech.

Keywords: hate speech, migrants, social media, natural language processing.
Doctoral school: Computer Science (IAEM)
Principal supervisor: Irina Illina, Assistant Professor in Computer Science, Loria
Co-supervisors:
Angeliki Monnier, Professor in Information-Communication, Crem
Dominique Fohr, Research Scientist, CNRS, Loria

Angenot M (1978) Fonctions narratives et maximes idéologiques. Orbis Litterarum 33: 95-100.
Angenot M (1989) 1889 : un état du discours social. Montréal : Préambule.
Angenot M (2006) Théorie du discours social. Notions de topographie des discours et de coupures cognitives. COnTEXTES.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). “Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors”. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Volume 1, pages 238-247.
Berri M, Garcia-Blanco I, Moore K (2015), Press coverage of the Refugee and Migrant Crisis in the EU: A Content Analysis of five European Countries, Report prepared for the United Nations High Commissioner for Refugees, Cardiff School of Journalism, Media and Cultural Studies.
Chouliaraki L, Georgiou M and Zaborowski R (2017), The European “migration crisis” and the media: A cross-European press content analysis. The London School of Economics and Political Science, London, UK.
Citron, D. K., Norton, H. L. (2011), “Intermediaries and hate speech: Fostering digital citizenship for our information age”, Boston University Law Review, 91, 1435.
Dai, A. M. and Le, Q. V. (2015). “Semi-supervised sequence learning”. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3061-3069. Curran Associates, Inc.
Dauphin F (2002), Rumeurs électroniques : synergie entre technologie et archaïsme. Sociétés 76 : 71-87.
Delgado R., Stefancic J. (2014), “Hate speech in cyberspace”, Wake Forest Law Review, 49.
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., and Xu, K. (2014). “Adaptive recursive neural network for target-dependent Twitter sentiment classification”. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL, Baltimore, MD, USA, Volume 2, pages 49-54.
Festl R., Quandt T (2013), Social relations and cyberbullying: The influence of individual and structural attributes on victimization and perpetration via the internet, Human Communication Research, 39(1), 101–126.
Froissart P (2002) Les images rumorales, une nouvelle imagerie populaire sur Internet. Bry-Sur-Marne : INA.
Froissart P (2004) Des images rumorales en captivité : émergence d’une nouvelle catégorie de rumeur sur les sites de référence sur Internet. Protée 32(3) : 47-55.
Johnson, R. and Zhang, T. (2015). “Effective use of word order for text categorization with convolutional neural networks”. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103-112.
Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daumé, H. (2015). “Deep unordered composition rivals syntactic methods for text classification”. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1681-1691.
Kim, Y. (2014). “Convolutional neural networks for sentence classification”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746-1751.
King R. D., Sutton G. M. (2013). High times for hate crimes: Explaining the temporal clustering of hate-motivated offending. Criminology, 51 (4), 871–894.
Lebre J (2014) Des idées partout : à propos du partage des hoaxes entre droite et extrême droite. Lignes 45: 153-162.
Mikolov, T., Yih, W.-t., and Zweig, G. (2013a). “Linguistic regularities in continuous space word representations”. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746-751.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013b). “Distributed representations of words and phrases and their compositionality”. In Advances in Neural Information Processing Systems, 26, pages 3111-3119. Curran Associates, Inc.
Nam, J., Kim, J., Loza Mencía, E., Gurevych, I., and Fürnkranz, J. (2014). “Large-scale multi-label text classification – revisiting neural networks”. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD-14), Part 2, volume 8725, pages 437-452.
Schieb C, Preuss M (2016), Governing Hate Speech by Means of Counter Speech on Facebook, 66th ICA Annual Conference, Fukuoka, Japan.
United Nations (2018), International Migration Report 2017. Highlights, New York, Department of Economic and Social Affairs.
Waldron J. (2012), The harm in hate speech, Harvard University Press.

Candidates are required to provide the following documents in a single PDF or ZIP file:

  • CV

  • A cover/motivation letter describing their interest in the topic

  • Degree certificates and transcripts for Bachelor and Master (or the last 5 years)

  • Master thesis (or equivalent) if it is already completed, or a description of the work in progress, otherwise

  • The publications (or web links) of the candidate, if any (it is not expected that they have any)

In addition, one recommendation letter from the person who supervises or supervised the Master thesis (or research project or internship) should be sent directly by its author to the prospective PhD advisor.
