There will be four incredible keynotes at IDA 2022:
- Dominique Lavenier (CNRS, Fr) will talk about data storage on DNA;
- Cynthia Liem (TU Delft, NL) will talk about validation and validity in data processing pipelines;
- Michèle Sebag (CNRS, Fr) will talk about causal explanations;
- Julia Stoyanovich (NYU Center for Data Science, USA) will talk about building data equity systems.
From Machine Learning to Causal Modelling
The AI wave faces a shock as learning algorithms are democratized and used outside of research labs. On the one hand, the accuracy of learned models critically depends on whether they are used in the same context as the one in which they were learned. On the other hand, using learned models to support decision-making incurs multiple risks, ranging from inefficiency to unethical outcomes.
Causal modelling aims at building more reliable models that avoid common drawbacks of causality-blind approaches: it seeks to identify which variables cause which others, and how; such causal mechanisms are, by definition, independent of the data acquisition process.
The talk will survey the state of the art in causal modelling and present some applications in the domain of human resources.
Bio: Michèle Sebag is a senior researcher at CNRS and Université Paris-Saclay. With a background in mathematics, she first worked in industry before joining the French National Center for Scientific Research (CNRS). She is head of the Machine Learning and Optimization team in the Lab of Computer Science at Université Paris-Saclay, and co-head, with Marc Schoenauer, of the Inria team TAU (TAckling the Underspecified, referring to the many under-specified issues at the core of Artificial Intelligence). Her research interests include causal modelling, deep learning, and applications of machine learning for society (health, hiring, social sciences). She is an elected European AI Fellow and a member of the French Academy of Technology.
(Thursday 21st 9:00 am) Dominique Lavenier, DR CNRS, Univ. Rennes, IRISA/Inria, GenScale Team
Data Storage on DNA
The talk will first explain the principle of storing digital data on DNA molecules. It will describe the different steps of the process, including the biotechnological operations, and their interactions and impacts on IT processing. It will also highlight the different challenges and perspectives of this new field of research. Finally, we will present the dnarXiv project which aims to experiment with various strategies to read and write information on DNA.
Bio: Dominique Lavenier is a senior CNRS (French National Center for Scientific Research) researcher and a member of the GenScale bioinformatics team at IRISA/Inria, Rennes, which he created in 2012. His current research interests include bioinformatics, data structures, genomics, parallelism, optimization algorithms, processing-in-memory, and DNA storage. He is currently leading the dnarXiv project, whose goal is to explore various strategies for archiving information on DNA.
(Thursday 21st 3:50 pm) Julia Stoyanovich, NYU Center for Data Science, USA
Building Data Equity Systems
(Friday 22nd 2:00 pm) Cynthia C. S. Liem, Delft University of Technology, The Netherlands
Working as intended? On validation and validity in data processing pipelines
In the current era of big data, we can acquire and analyze more data than ever, but this data is often unstructured and messy, and measurement procedures may not have been optimal. More fundamentally, in many human-focused use cases we may not be able to fully articulate what to measure and where, even though we have a good intuitive sense of what constitutes an intended or unintended outcome.
To make this even more challenging, there are multiple examples of techniques that were ‘working as intended’ according to traditional metrics and evaluation procedures, but that, in actual deployed systems, turned out not to, even causing societally problematic outcomes.
In this keynote presentation, I will discuss how my interest in this topic was triggered in the music domain, and illustrate how my team currently seeks to more broadly gain confidence in computational measurement procedures, drawing inspiration from metrology, psychometric validity, software testing, and metascientific and open science practices. In this, I will address both tensions and connection opportunities between design, science and engineering methodologies, and argue why I feel these particularly need to be addressed when working on societally impactful applications.
Bio: Dr Cynthia C. S. Liem is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests focus on helping people discover new interests and content that would not trivially be retrieved in music and multimedia collections, assessing questions of validation and validity in data science, and fostering trustworthy and responsible AI applications when human-interpreted data is involved.
She initiated and co-coordinated the European research projects PHENICX (2013-2016) and TROMPA (2018-2021), focusing on technological enrichment of digital musical heritage, and participated as technical partner in an ERASMUS+ education innovation project on Big Data for Psychological Assessment. She gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach, Researcher-in-Residence 2018 at the National Library of The Netherlands, general chair of the ISMIR 2019 conference, and keynote speaker at the RecSys 2021 conference.
Presently, she co-leads the Future Libraries Lab with the National Library of The Netherlands, is track leader of the Trustworthy AI track in the AI for Fintech lab with the ING bank, holds a TU Delft Education Fellowship on Responsible AI teaching, and is a member of the Dutch Young Academy.