14/04/25 – Seminar by Angela Bonifati, Professor at LIRIS-Lyon 1:

Seminar/congress/conference

ERIC Laboratory, 9:30 am, room K071

Abstract of the presentation

One of the key processes of data science pipelines is data preparation, which aims to clean and curate the data for the subsequent analytical and inference steps. Data preparation deals with the errors and conflicts introduced into the input datasets during data collection and acquisition. These errors, such as violations of business rules, typos, missing values, replicated entries and abnormal features, are of different kinds depending on the nature of the data, ranging from structured data to graph-shaped data and time series. If these errors are left in the data, they can propagate to the results of data science and AI processes and hamper their efficiency and trustworthiness.

My talk will present some of our results on enhancing the quality of querying and inference tasks in data science operating on different kinds of heterogeneous data. Among other things, we focus on real-life healthcare applications and provide domain experts with useful AI-assisted data management techniques that can facilitate their diagnoses and analyses. First, inconsistency-aware annotations can quantify the quality of structured data input to analytical processes. These annotations are further exploited during query processing in order to enhance the output of queries with inconsistency degrees. Second, feature-based similarities among time series corresponding to patients' signals help to better identify groups of patients and to assess their risks for a particular disease. Last but not least, violations of graph constraints can be addressed with human-guided feedback, which leads to better accuracy of the repair algorithms for graph-shaped data.
