
Laboratoire ERIC, Room K071, at 10:30 a.m.
Abstract: Claims data, including Electronic Health Record (EHR) data, constitute vast datasets collected by healthcare institutions (hospitals, insurance systems, etc.) throughout patient care. These datasets are readily available and contain valuable real-world medical information about large populations. For this reason, epidemiologists increasingly use them in studies that support public health decision-making. However, several properties of these data make them difficult to exploit fully: the data are multimodal, some components are unstructured (e.g., medical reports), the structured components require medical knowledge for proper interpretation, and the records are longitudinal in nature.
In this presentation, I will discuss how advances in artificial intelligence can help epidemiologists analyze these data, and I will detail two key approaches. First, I will illustrate how two different machine learning techniques can effectively leverage the longitudinal nature of the data. In particular, I will present methods we have developed for unsupervised analysis of longitudinal care trajectories (temporal tensor decomposition and temporal sequence clustering). Then, I will explore methods from the Semantic Web field to address semantic gaps within these datasets and discuss how they could revolutionize epidemiological research using claims data. Finally, I will present ongoing multidisciplinary projects focused on the ethical use of AI in this context.