Feature selection for smarter data analysis

12 December 2016 - 12:00pm to 1:00pm

Speaker(s):

Andrea Mariello, PhD student, ICT International Doctoral School, University of Trento, Italy

Venue:

Room 1.1/2, IMDEA Networks Institute, Avda. del Mar Mediterráneo 22, 28918 Leganés – Madrid

Organization:

NETCOM Research Group (Telematics Engineering Department, UC3M); IMDEA Networks Institute

Nowadays, we are experiencing a growing interest in data science, a relatively new discipline at the intersection of Statistical Learning, Engineering and Operations Research, in which practitioners develop and use techniques and algorithms to extract useful insights from an increasing number of huge collections of data. However, the real challenge is not only to find proper ways to deal with the volume of data but also to cope with their velocity, variety, veracity and value (the so-called 5 Vs of Big Data). The majority of datasets are characterized by a large number of high-dimensional patterns, such as those found in genetics, chemistry, finance, etc. Others are also characterized by a high level of noise or by missing values.

Feature selection, a form of dimensionality reduction, is a subfield of data science whose objective is to provide methods for:

decreasing the convergence time of learning algorithms;
automatically extracting the most relevant information, that is, the information providing the highest value;
making complex models simpler for better generalization and human interpretability;
reducing measurement acquisition costs for those systems dealing with data streams;
compressing big datasets, by retaining only the most informative features.

In this talk we explore some of the most successful techniques for feature selection, with the main focus on recent methods that use information-theoretic concepts such as entropy and mutual information.
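To give a flavor of how entropy and mutual information can drive feature selection, here is a minimal sketch of a univariate filter: each discrete feature is scored by its mutual information with the target, I(X;Y) = H(X) + H(Y) - H(X,Y), and features are ranked by that score. It is only an illustration and not necessarily one of the methods covered in the talk; the function names (`entropy`, `mutual_information`, `rank_features`) and the toy data are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences of equal length."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_features(features, target):
    """Return (name, score) pairs sorted by decreasing mutual information with the target."""
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): one feature copies the target, one is a noisy copy,
# and one is statistically independent of the target.
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "unrelated": [0, 1, 0, 1, 0, 1, 1, 0],
    "noisy": [0, 0, 1, 1, 0, 1, 1, 0],
    "copy_of_target": [0, 0, 1, 1, 0, 1, 0, 1],
}

ranking = rank_features(features, target)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A filter like this scores each feature in isolation, so it cannot detect redundancy among features or features that are informative only in combination. It also assumes discrete values; continuous features would first need binning or a different estimator.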

About Andrea Mariello

In 2008, I was honored with the title of "Alfiere del Lavoro" by the President of the Italian Republic, being the first among the top 25 high school students in Italy in 2004-2008. In 2011, I received a BSc in Information Engineering cum laude from the University of Salento, with a thesis on parallel and distributed computing systems. Then I received an MSc in Computer Engineering cum laude from the same university in 2013, with a thesis on high-performance computing titled "Big Data Analytics for Climate Change", developed in collaboration with the Euro-Mediterranean Center on Climate Change (CMCC). I worked as a computer scientist at CMCC until 2015 and I have been one of the core developers of Ophidia, an innovative and open-source big data analytics platform designed for e-science. I am currently a PhD student at the ICT International Doctoral School of the University of Trento and a member of the LION lab.

This event will be conducted in English


