DataRedux (ANR)

DataRedux is an ANR-funded project that focuses on developing radically new methods for reducing the complexity of large networked datasets, in order to feed effective and realistic data-driven models of spreading phenomena. Many rich datasets on the actions and interactions of individuals have recently become available. They are commonly encoded as networked systems, arise from heterogeneous sources with details at different scales and resolutions, and may contain geographical and temporal information as well as metadata. These outstanding sources of information and knowledge fuel a wide spectrum of data-driven numerical simulations of dynamical processes. Data alone, however, even in huge amounts, do not easily translate into knowledge or predictive models. The rich and diverse information they contain raises crucial challenges for their analysis, representation and interpretation, for the extraction of meaningful structures, and for their integration into data-driven models. In this context, DataRedux puts forward an innovative framework to reduce the complexity of networked data while preserving its richness, by working at intermediate scales (“mesoscales”). Our objective is to achieve a fundamental breakthrough in the theoretical understanding and representation of rich and complex networked datasets, for use in predictive data-driven models that support decision making and provide actionable insights.

ANR (France), (2019-2023), 780K euros

SoBigData++ (H2020)

SoBigData++ strives to deliver a distributed, pan-European, multidisciplinary research infrastructure for big social data analytics, coupled with the consolidation of a cross-disciplinary European research community, aimed at using social mining and big data to understand the complexity of our contemporary, globally interconnected society. It builds on SoBigData, the predecessor project that began this construction in 2015. As it grows into an advanced community, SoBigData++ will strengthen its tools and services to empower researchers and innovators through a platform for the design and execution of large-scale social mining experiments. The platform will be open to users with diverse backgrounds, accessible on the project cloud (aligned with EOSC), and will also exploit supercomputing facilities. Pushing the FAIR principles further, SoBigData++ will make social mining experiments easier to design, adjust and repeat for domain experts who are not data scientists. SoBigData++ will grow from a starting community of pioneers into a wide and diverse scientific movement, capable of empowering the next generation of responsible social data scientists engaged in the grand societal challenges laid out in its exploratories: Societal Debates and Online Misinformation, Sustainable Cities for Citizens, Demography, Economics & Finance 2.0, Migration Studies, Sport Data Science, Social Impact of Artificial Intelligence and Explainable Machine Learning. SoBigData++ will move from awareness of ethical and legal challenges to concrete tools that operationalise ethics through value-sensitive design, incorporating values and norms for privacy protection, fairness, transparency and pluralism. Finally, SoBigData++ will deliver an accelerator of data-driven innovation that facilitates collaboration with industry on joint pilot projects, and will consolidate a research infrastructure (RI) ready for the ESFRI Roadmap and sustained by a SoBigData Association.

H2020 Infrastructure project (2019-2023), 10M euros

ACADEMICS (IDEX Lyon)

The project ACADEMICS (mAChine LeArning & Data sciEnce for coMplex and dynamICal modelS) will combine Machine Learning (ML) and Data Science (DS) for scientific research in two challenging directions: (a) Computing and information processing – developing new theoretical frameworks and learning algorithms adapted to difficult scientific contexts involving heterogeneous, irregular, error-prone, dynamic and complex data, while taking prior knowledge into account whenever it is relevant; and (b) Learning complex and dynamic models – leveraging the synergy between ML and DS to devise data-driven models in two scientific domains, climate modeling and the quantitative understanding of social systems. Focusing on these two case studies, the project will tackle the key issue of how to learn intricate models from numerous, heterogeneous and dynamic data.

The project is funded by IDEX Lyon with 1.2M euros for the period 2018-2021. Participating laboratories are LP and LIP from ENS Lyon, LIRIS from Université Lyon 1, and LabHC from Université Jean Monnet.

DyLNet (ANR)

The aim of the DyLNet project (Language Dynamics, Linguistic Learning, and Sociability at Preschool) is to observe and characterize the relations between child socialization and oral language learning during the preschool period, through an innovative multidisciplinary approach that combines work in language acquisition, sociolinguistics and network science. The project implemented a large-scale sociolinguistic experiment following about 150 children and teaching staff at a socially mixed preschool. Physical proximity and verbal interactions between participants are recorded using RFID sensor technology, which captures inter-individual proximity every 5 seconds and verbal interactions continuously. The experiment runs for one week every month over a period of 3 years. In particular, the task will be to examine the influence of the children’s social relations on their language development and, equally, the influence of language on these social relations.

The DyLNet project is financed by ANR with 650K euros (16-CE28-0013) for the period 2016-2020. Participating groups are Lidilem – University of Grenoble Alpes, DANTE – ENS Lyon/Inria, LSE – University of Grenoble Alpes, Ethos – University of Rennes 1, and the – University of Orleans.

SoSweet (ANR)

The goal of SoSweet is to provide a detailed understanding of the dynamic links between individuals, social structure, and language variation and change, through the study of the synchronic variation and diachronic evolution of the variety of French observed on Twitter. Within the project, we collect and analyse a corpus of 600 million tweets combined with the social network of their 5 million users, complemented by socio-demographic data. The SoSweet project adopts a strongly interdisciplinary position at the crossroads of social media linguistics, sociolinguistics, natural language processing (NLP) and network science.

The SoSweet project is funded by ANR (15-CE38-0011) with 635K euros for the period 2015-2020. The project consortium is formed by the ICAR and DANTE teams from ENS Lyon/Inria, ALMAnaCH – Inria Paris, and Lidilem – Université Grenoble Alpes.

HOTNet (IXXI)

The purpose of the HOTNet (Higher-order representation of temporal networks) project is to develop a pipeline for embedding temporal networks that captures the higher-order correlations relevant to dynamical processes. We propose to move away from the straightforward representation of temporal networks as successions of static networks, and instead focus on representations that better reflect higher-order neighbourhoods and temporal paths. The project plans to develop a framework that learns, from this representation, an embedding sufficient to estimate the outcome of spreading processes that might take place on top of the original network.

This is a small-scale collaborative project funded by the IXXI Complex Systems Institute to foster collaboration between MK and Laetitia Gauvin (ISI Torino) for the period 2019-2021.

Earlier Projects

MOTIf (STIC AmSud)

The general goal of the MOTIf project (Mobile phone sensing of human dynamics in techno-social environment) is to understand, model, and predict individual behavior embedded in social and technological environments. We aim to understand the spatiotemporal patterns of individuals' service usage in order to learn when, where, and what people are doing; to uncover the fine-grained sociodemographic structure of society; and to see how the demographic characteristics of individuals in a social network correlate with the dynamics of their egocentric and global network evolution.

The MOTIf project was funded by STIC AmSud with about 81K euros for the period 2018-2020. It is a collaborative project to foster collaboration between France and Latin American countries. Participating groups are DANTE – ENS Lyon/Inria (France), Grandata (USA-Argentina), and the Universidade Federal de Minas Gerais, LNCC, and Pontifícia Universidade Católica de Minas Gerais (Brazil).

LIAISON (Inria)

LIAISON is an exploratory project that aims to develop unsupervised deep learning approaches to infer the correlations and patterns that exist between dynamic linguistic variables, the mesoscopic and dynamic structure of social networks, and their socio-economic attributes. This interdisciplinary project is positioned at the crossroads of Natural Language Processing, Network Science, Data Science and Machine Learning.

The LIAISON project was funded by Inria within the PRE framework with 50K euros for the period 2017-2018.


This project aimed to cover both the basics of and recent advances in Network Science by organizing a series of workshops that brought together world-renowned experts from mathematics, physics, signal processing, computer science, social science, epidemiology and linguistics to discuss and deepen our understanding of the interplay between the structure, evolution, and coupled dynamical processes of complex networks.

The project was funded by Labex MILYON in 2016 with 90K euros.