Post-doc position
 
Open Science and Reproducible Research on Distributed Systems

To apply, send CV (including pointers to main publications) and motivation letter to lucas.nussbaum@loria.fr

Executive summary:
The goal of this post-doc position is to contribute to the opening of research on distributed systems by solving the challenges that must be overcome to allow Open Science and Reproducible Research in this field: improving the description and publication of experiments and their results, facilitating the analysis of results, etc. The work will be applied to AlGorille's tools for experimentation on distributed systems.

Key skills needed:
Mandatory:

Appreciated:

Research team name: AlGorille (leader: Martin Quinson)
Location: LORIA, Nancy, France
Contact: Lucas Nussbaum <lucas.nussbaum@loria.fr>
Keywords: experimentation, distributed systems, open science, reproducible research

Context

Distributed systems research

Distributed systems such as grids, clusters, peer-to-peer systems, high-performance supercomputers, cloud computing infrastructures or desktop computing environments enjoy ever-increasing popularity. Distributed applications (such as decentralized data sharing solutions, games, high-traffic web applications or scientific computations) are executed routinely on these systems.

By nature, the resulting environments and applications are extremely complex and dynamic because they aggregate thousands of elements that are heterogeneous and shared among several users. This makes these systems very challenging to study, test, and evaluate. Computer scientists traditionally study their systems a priori by reasoning theoretically about the constituents and their interactions. But the complexity of these systems makes this methodology nearly impossible, which explains why most studies are done a posteriori through experiments.

Experimentation in distributed systems research

Three main methodologies exist to experiment with computer systems: real-scale, simulation and emulation. Real-scale (or in situ) experimentation consists in executing the real application under study on an experimental platform like Grid’5000 (a large-scale experimental platform in France, composed of more than 1600 machines). Conversely, with simulation, both the application and the environment are replaced by models, and the interactions between the two models are computed by a simulator. Emulation can be seen as an intermediate approach where the real application is executed within a synthetic environment. Typically, one will use a homogeneous cluster of machines as an execution environment, and use an emulation layer to reproduce the complex conditions found on the real Internet.

The AlGorille team is deeply involved in all of these methodologies. It has a leadership role in the world-leading Grid’5000 testbed and SimGrid simulator, and is the sole developer of the Distem emulator. We see these methodologies as complementary approaches to work on the different steps of the scientific workflow: ideas are first matured into algorithms in the simulator, before they are implemented as prototypes and tested on a real-scale testbed. Once the prototype is known to work, an emulator is used to evaluate the behavior of the prototype under various experimental conditions.

Open Science and Reproducible Research

The Open Science movement, which emerged in computational sciences, aims at conducting research in the spirit of free and open source software. It encompasses the publication of software, data and results in order to enable a detailed understanding of the experimental processes and the results obtained. It also favors the reproduction of experiments and results, and a general increase in the quality of experiments.

Description

The goal of this post-doc position is to contribute to the opening of research on distributed systems, that is, to solve the challenges that must be overcome to allow Open Science and Reproducible Research in this field. This will be done by focusing on the experimentation methodologies and tools actively worked on in the AlGorille team, but the solutions designed will aim to be more generic.

Typically, the work will consist in contributions to AlGorille's tools and testbeds (SimGrid, Distem, XPFlow, Grid'5000, etc.) to enable higher-quality experiments by working on specific challenges such as:

This work will be done in collaboration with other teams in order to be applied to real experiment campaigns, by leveraging AlGorille’s participation in the INRIA large-scale action Hemera (https://www.grid5000.fr/mediawiki/index.php/Hemera), which involves 25 research teams in France.