Data lake for big data analytics and governance
Monday, 9 December 2019, 11:00, Room K71
Over the past decade, requirements on data volume, velocity, variety, and veracity have changed considerably. The term "big data" groups these new requirements. Traditional approaches such as data warehouses have great difficulty storing and analyzing big data. We propose a new solution called a data lake, which uses distributed storage and computing systems to store and analyze data in any format without a predefined schema.
Data governance is another challenge inside a data lake, because unstructured or semi-structured data is not usable without an efficient and comprehensive metadata system.
In this presentation, we propose an implementation of a data lake based on the Hadoop ecosystem, together with a framework for managing metadata inside the data lake.