The defense will take place on Tuesday, September 13 at 2:00 pm, in the salle des colloques of the Palais Hirsch at Université Lumière Lyon 2, on the Berges du Rhône campus.
Abstract: Since the development of writing 5000 years ago, human-generated data has been produced at an ever-increasing pace. This rate has been greatly influenced by technical innovations such as clay tablets, papyrus, paper, the printing press, and more recently the Internet. At the same time, new methods designed to handle and archive these growing information flows have emerged: clay archives (Nippur, Mari), early libraries (Alexandria, Rome’s Tabularia, Athens’ Metroon), religious scriptoria (abbeys, monasteries), modern libraries and, more recently, machine learning. Each of these archival methods aims at easing information retrieval.
Nowadays, archiving is no longer enough. The amount of data generated daily is beyond human comprehension and calls for new information retrieval strategies. Instead of referencing every single piece of data as in traditional archival techniques, a more relevant approach consists in understanding the overall ideas conveyed in data flows. To spot such general tendencies, a precise comprehension of the underlying data generation mechanisms is required. In the rich literature tackling this problem, the question of information interaction remains nearly unexplored. Specifically, few works have explored the influence of earlier human-generated data on later data creation mechanisms. In this manuscript, we develop a panel of new machine learning methods that explore this specific aspect of online data generation.
First, we investigate the frequency of such interactions. Building on recent advances made in Stochastic Block Modelling, we explore the role of interactions in several social networks. We find that interactions are rare in these datasets.
Then, we ask how interactions evolve over time. Earlier pieces of data should not have an everlasting influence on later data generation mechanisms; for instance, an ad may exert a short-term influence on buying behaviors, but would have no influence on them a decade later. We model this using recent advances in dynamic network inference, applied to social media datasets. We conclude that interactions are brief and that their intensity typically decays in an exponential fashion.
Finally, as an answer to the previous points, we design a framework that jointly models rare and brief interactions. To do so, we exploit a recent bridge between Dirichlet processes and point processes. We improve on this advance and discuss the more general Dirichlet-Point processes. We argue that this new class of models is well suited to modelling brief and sparse interactions. We conduct a large-scale application on Reddit and find that interactions play a minor role in this dataset.
From a broader perspective, our work results in a collection of highly flexible models and in a rethinking of core concepts of machine learning. Consequently, we open a range of novel perspectives both in terms of real-world applications and in terms of technical contributions to machine learning.