September 2021 - PhD offer: Mixed data #temporal #clustering for modelling longitudinal surveys

In many areas of the humanities and social sciences, studies are based on questionnaires completed by participants, often several times over the study period. Researchers then analyse these questionnaires to identify typical behaviours within the population under study. However, the statistical analysis of such questionnaires is far from simple, for two main reasons.

First, the answers are often of different types: nominal categorical (for example "What is your socio-professional category?"), ordinal categorical (for example "What is your level of satisfaction: bad, average, good?"), quantitative ("What is your age?") and textual (free-text answers to open questions). The analysis of such mixed data is an active research problem in statistics and machine learning. In the absence of an established solution, practitioners often transform the data so that all variables are of the same type. This approach is unsatisfactory, since it introduces either a bias or a significant loss of information.

The second scientific obstacle is modelling how the answers evolve over time. Currently, each phase of the survey is analysed independently. Researchers then try, a posteriori, to link these separate analyses by looking for similar typical behaviours from one phase to the next. Ideally, a single model of temporal evolution would account for all the responses to the questionnaires jointly. Such an analysis would reveal typical patterns of temporal evolution, which are precisely what researchers in the humanities and social sciences wish to study.

This thesis will therefore provide a complete tool for analysing questionnaires repeated over time. Its core will be the development of a statistical model and the associated inference algorithms. The PhD student will also implement these methods as an R package, so that researchers in the humanities and social sciences can easily apply them.
