In collaboration with EDF.
Context. Clustering is the task of organizing similar objects into meaningful groups. With the big data phenomenon, modern data are now high dimensional and/or heterogeneous. This raises new challenges, and there is a need to develop new clustering methods adapted to such data. In particular, we are interested in developing a clustering algorithm that can work with two types of data: quantitative data and functional data. Functional data are observations obtained when we observe a quantity over a continuum, and they are represented by curves. A typical application of such an algorithm is clustering EDF customers according to both household electricity consumption (curves) and additional information about the household (number of occupants, date of the building, …).
Subject. The goal of the internship is to develop a clustering algorithm for functional and quantitative data. The main missions are:
- to study recent developments in clustering methods for mixed data,
- to develop a model on the basis of an original idea proposed by the internship supervisor,
- to test this model on simulated data and on data provided by EDF.
A publication presenting the model will be written during the internship. The candidate should have strong skills in statistical learning, machine learning and R programming.