Data Mining and Big Data Analytics

Data mining and big data analytics are core subjects in data science that aim to develop methods for examining large, multivariate datasets. Their common purpose is to uncover hidden patterns, unknown correlations and other information that is useful for making better decisions. In this course we will introduce methods of data acquisition and the concepts of data mining, machine learning and big data analytics. We will cover the key data mining methods of clustering, classification and pattern mining, together with practical tools for their execution. We will also demonstrate the application of these tools to real datasets, to show how they can help us analyse the digital traces of human activities at societal scale and understand and forecast many complex socio-economic phenomena. The course will have a hands-on approach, with homework assignments, practical classes and the development of a project. Students are free to work in any computer language or network software with which they feel most comfortable. However, all examples and sample code in class will be provided in Python and Jupyter notebooks, so the use of Python is strongly encouraged.

Course webpage

Digital Data Collection Methods

In the age of the digital data revolution, the collection of human behavioral datasets is a very important issue, and the appropriate design of collection methods requires thorough training. Researchers commonly take data as given at the outset, but without control over the data collection pipeline one can never be sure about intrinsic biases, hidden correlations or unrepresentative sampling. All of these can introduce misleading noise or undermine any conclusion drawn from data-driven observations. The aim of this course is to provide proper training in the methodology of digital data collection, so that students understand how to translate a scientific hypothesis into a data collection pipeline that precisely measures the question at hand with the least possible noise and environmental effects. During the course we will learn in depth about the latest techniques for collecting individual or collective human behavioral data using tracking, monitoring or crawling methods, or transactional data techniques. We will also learn how to design digital surveys, collect online questionnaires and set up controlled online social experiments. All these methods are at the forefront of computational social science and are pivotal for the coming generation of researchers and data scientists working on related questions. The course will have a hands-on approach, with homework assignments, practical classes and the development of a project.

Course webpage