Next D4 Seminar will take place on Friday, December 7th at 10am in room A008.
Philippe Muller (IRIT, Toulouse) will give a presentation entitled “Sentential distributional semantics: Learning semantic sentence representations and their compositions”.
(Joint work with Damien Sileo and Tim van de Cruys)
Distributional semantics aims at automatically representing textual semantic content based on observations from a large representative corpus. There is a large body of work on lexical distributional semantics, based on the assumption that words appearing in similar contexts should have similar semantic representations. This popularized the representation of words as vectors in a semantic space.
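The lexical assumption above can be illustrated with a minimal sketch: build co-occurrence count vectors from a toy corpus and compare them with cosine similarity (the corpus and the sentence-level notion of "context" here are illustrative simplifications, not the specific models discussed in the talk).

```python
from collections import Counter
from math import sqrt

# Toy corpus; in practice a large representative corpus is used.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the market",
]

# Each word is represented by counts of the words appearing in the
# same sentence -- a crude stand-in for its distributional context.
vectors = {}
for sentence in corpus:
    words = sentence.split()
    for w in words:
        ctx = vectors.setdefault(w, Counter())
        for c in words:
            if c != w:
                ctx[c] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# "cat" and "dog" share contexts (the, sat, on), so they end up
# closer to each other than "cat" is to "stocks".
print(cosine(vectors["cat"], vectors["dog"])
      > cosine(vectors["cat"], vectors["stocks"]))  # → True
```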
More recently, a lot of effort in the NLP field has been devoted to building similar representations for sentences, or even larger textual elements.
This raises several questions: how can sentence representations be built from word representations in vector spaces, preferably in a compositional manner, and how can these representations be guided so that they capture important semantic aspects at the sentence level?
Arguably, sequential compositional models such as recurrent neural networks offer a simple composition at the lexical level that can be used in supervised settings to make accurate predictions in text classification, while building a representation of the sentential context in their internal state. Such representations are however specific to each task, and researchers have tried to find ways of building so-called "universal" sentence representations, or more exactly transferable representations. In this perspective, several settings have been proposed that evoke supervised distributional approaches at the word level, with auxiliary tasks that could induce semantically relevant representations at the sentence level: for instance, predicting whether two sentences follow each other in a text, or whether one is a consequence of the other. These tasks in turn must compose the two sentences in a way that allows for the learning of their relationship. Composition of representations is also important in all tasks that involve predicting a relation between a pair of textual elements: sentence similarity, entailment, discourse relations.
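One common way of composing a pair of sentence vectors before a relation classifier (for similarity, entailment, or discourse relations) is to concatenate the two vectors with their element-wise interactions, as popularized by InferSent-style models; the sketch below illustrates that scheme only, not the more expressive compositions presented in the talk.

```python
# Compose two sentence vectors u and v into a feature vector for a
# downstream relation classifier: [u; v; |u - v|; u * v].
def compose(u, v):
    diff = [abs(a - b) for a, b in zip(u, v)]   # element-wise difference
    prod = [a * b for a, b in zip(u, v)]        # element-wise product
    return u + v + diff + prod

u = [0.2, -0.5, 0.8]
v = [0.1, -0.4, 0.9]
features = compose(u, v)
print(len(features))  # → 12, i.e. 4 times the sentence dimension
```

The difference and product terms give the classifier direct access to per-dimension agreement between the two sentences, which plain concatenation does not.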
The compositions considered in NLP are often quite superficial; we will present more expressive compositions inspired by Statistical Relational Learning. Moreover, we propose an unsupervised training task to induce sentence representations, based on predicting discourse connections between sentences in a large corpus.
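The unsupervised signal described above can be sketched as follows: collect adjacent sentence pairs linked by an explicit discourse marker, and use the marker as the label to predict (the marker list and the extraction heuristic here are illustrative assumptions, not the authors' exact setup).

```python
# Hypothetical, simplified list of explicit discourse connectives.
CONNECTIVES = {"however", "therefore", "moreover", "because"}

def extract_pairs(sentences):
    """Turn adjacent sentences joined by a connective into labeled
    training pairs (first sentence, second sentence, connective)."""
    pairs = []
    for s1, s2 in zip(sentences, sentences[1:]):
        first_word = s2.split()[0].lower().rstrip(",")
        if first_word in CONNECTIVES:
            rest = s2.split(" ", 1)[1]  # drop the connective itself
            pairs.append((s1, rest, first_word))
    return pairs

text = [
    "The model performed well on short sentences.",
    "However, it struggled with long-range dependencies.",
]
print(extract_pairs(text))
```

A model trained to predict the (removed) connective from the two sentence representations must learn a composition that captures their discourse relation, with no manual annotation required.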