European Integrated Infrastructure for Social Mining and Big Data Analytics (SoBigData++)

The CIS is taking part in this four-year European project, which begins on 1 January 2020.

SoBigData++ strives to deliver a distributed, pan-European, multidisciplinary research infrastructure for big social data analytics, together with a consolidated cross-disciplinary European research community that uses social mining and big data to understand the complexity of our contemporary, globally interconnected society.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871042.

SoBigData++ is funded under H2020 Call INFRAIA-01-2018-2019 Integrating Activities for Advanced Communities. Area: Mathematics and ICT. Topic: Distributed, multidisciplinary European infrastructure on Big Data and social data mining.

SoBigData++ is coordinated by Fosca Giannotti, ISTI-CNR, Pisa, Italy.

SoBigData++ builds on SoBigData, the predecessor project that began this work in 2015. As an advanced community, SoBigData++ will strengthen its tools and services to empower researchers and innovators through a platform for designing and running large-scale social mining experiments. The platform will be open to users with diverse backgrounds, accessible on the project cloud (aligned with EOSC) and able to exploit supercomputing facilities. Pushing the FAIR principles further, SoBigData++ will make social mining experiments easier to design, adjust and repeat for domain experts who are not data scientists.

SoBigData++ will grow from a starting community of pioneers into a wide and diverse scientific movement, capable of empowering the next generation of responsible social data scientists engaged in the grand societal challenges set out in its exploratories: Societal Debates and Online Misinformation, Sustainable Cities for Citizens, Demography, Economics & Finance 2.0, Migration Studies, Sport Data Science, Social Impact of Artificial Intelligence and Explainable Machine Learning.

SoBigData++ will move from awareness of ethical and legal challenges to concrete tools that operationalise ethics through value-sensitive design, incorporating values and norms for privacy protection, fairness, transparency and pluralism. SoBigData++ will also deliver an accelerator of data-driven innovation that facilitates collaboration with industry on joint pilot projects, and will consolidate a research infrastructure (RI) ready for the ESFRI Roadmap and sustained by a SoBigData Association.

Tasks allocated to the CIS

At the CIS, Tommaso Venturini, Axel Meunier, Francesca Musiani and Mélanie Dulong de Rosnay are participating in this project.

T4.3 Datathons. Task leader: CNRS (4 months). Participants: CNR, IMT, AALTO, ETHZ

We will organize a series of at least four Datathons that bring together young, bright minds in small dedicated groups, providing complementary theoretical and practical skills to visualise and analyse social big data questions that address important societal problems. All Datathons will be supported by the Operational Ethics and Law Board (see T2.1) so that ethical and legal aspects are included in the Datathon activities.

T2.3 High Level Advisories Board. Task leader: LUH. Participants: TUDelft, CNR, UNIPI, URV, CNRS (6 months), SSSA

The board's activities will form a second step, once research, awareness conferences and the experience of the T2.1 board have been collected. Draft opinions, use cases and guidelines prepared in T2.2 will be submitted to this high-level board for discussion. The High Level Advisories Board will consist of members of the consortium (core part) as well as external experts selected for their expertise in specific topics. The board will be chaired by J. van den Hoven, professor of ethics at TUDelft, while Professor Dr. Tina Krügel, IT law professor at LUH, and Giovanni Comandé, legal informatics professor at Scuola Superiore Sant’Anna, will act as vice-chairs and legal counsels. This task will produce an annual white paper to be published and disseminated through different channels (see T3.2, T3.4 and T5.1) to have an impact on European society at different levels.

T8.7 Visual Analytics services design and integration. Task leader: UvA. Participants: CNRS (2 months), CNR

The task will plan, design and integrate methods in line with the SoBigData++ services design and integration, focusing on Visual Analytics services. It applies interactive visualisation techniques to two parts of the analytical process: the first is inspection, and the second is the consolidation of analytical results. The rationale for this approach is not a lack but an abundance of potentially useful results generated with different data sources, methods or settings. The consolidation process includes comparing, evaluating and making sense of partial results, and generating a top-level view of the analysis.

T10.1 Societal Debates and Misinformation Analysis. Task leader: USFD. Participants: IMT, BSC, CNR, ETHZ, UT, CNRS (6 months), UNIPI, UvA, CSD, CEU

By analysing discussions on social media and newspaper articles, this exploratory aims to develop methods and datasets for studying online public debates in (near) real time and at scale, e.g. during election campaigns or on controversial topics such as vaccination, abortion or LGBT rights. The starting point will be the identification of key themes and points of view in these debates. The discussion on these topics will then be analysed and visualised, and its evolution will be assessed across time and space (i.e. across different countries or regions). The central focus will be misinformation, for which we will develop new methods for detecting, analysing and tracking online misinformation and propaganda across social media platforms, countries and time. A key aim is to improve the accuracy of these methods through more data, experimentation with semi-supervised and unsupervised methods, and integration of the latest advances in deep learning. We will also study the effect of different social relationships on opinion formation. A multidisciplinary approach will be adopted, going beyond computer science to also include social and political scientists, as well as end-users and practitioners such as the Centre for Study of Democracy (CSD), which will focus specifically on Russian propaganda and misinformation in Eastern and Central Europe. T10.1 will result in new tools for the infrastructure, enabling researchers from outside the consortium to work on these topics.