The 2nd study session on social network data, organized by the LUE project OLKi, will take place on Monday 18th October at the Loria Amphitheatre.
Please register before 16th October: http://enquetes.univ-lorraine.fr/index.php/891174
9:30 – 9:45 Welcome coffee
9:45 – 12:00 Presentations & discussion
Marie FLESCH (ATILF) – Internet language and gender: an intersectional study of a corpus of Reddit comments
Tomara GOTKOVA and Nikolay CHEPURNYKH (ATILF) – Public Perception and Usage of Environmental Vocabulary: Building and Exploiting a Thematic Social Media Corpus
Aman SINHA (IECL & ATILF) – C-Net: Contextual Network for Sarcasm Detection
Nicolas ZAMPIERI (LORIA) – Multiword Expression Features for Automatic Hate Speech Detection
Dana RUITER (Saarland University) – Title to be announced
ABSTRACTS
Marie FLESCH (ATILF) – Internet language and gender: an intersectional study of a corpus of Reddit comments
This quantitative sociolinguistic study explores the dynamics of language and gender in a 20-million-word corpus of Reddit comments written by 1044 (mostly American) internet users. It focuses on 11 variables, including emojis, emoticons, acronyms, apostrophe omissions, and phonetic spellings. It adopts an intersectional perspective by taking into account cisgender, transgender, and non-binary gender identities, and by examining the interaction of gender with age and ethnicity.
Tomara GOTKOVA and Nikolay CHEPURNYKH (ATILF) – Public Perception and Usage of Environmental Vocabulary: Building and Exploiting a Thematic Social Media Corpus
We will present our interdisciplinary research, whose main objective is the study of the core vocabulary of the environment in English, with one specific issue in mind: the interplay between domain-specific usage of this vocabulary and its usage in ordinary discourse found on the Internet, with special attention paid to social networks. Our presentation will cover the following points: technical aspects of building a reference corpus using data extracted from Twitter and Reddit; normalization of raw corpus data; qualitative linguistic analysis of environmental terms (carbon, carbon dioxide, …) in the corpus.
Aman SINHA (IECL-ATILF) – C-Net: Contextual Network for Sarcasm Detection
Automatic sarcasm detection in conversations is a difficult task. Classifying an utterance as sarcastic or not in isolation can be futile, since the sarcastic nature of a sentence most often relies heavily on its context. This paper presents our proposed model, C-Net, which takes the contextual information of a sentence in a sequential manner to classify it as sarcastic or non-sarcastic. Our model shows competitive performance in the Sarcasm Detection shared task organised on CodaLab under the workshop on Figurative Language Processing at ACL 2020, achieving a 75.0% F1-score on the Twitter dataset and a 66.3% F1-score on the Reddit dataset.