DevConf.CZ 2020 has ended
Saturday, January 25 • 11:00am - 11:55am
Data cleaning: when less is more

In today's ML world we gather and analyze enormous amounts of data. But how do we cope when there is too much information, i.e. too many variables? We can use grid search to select variables for us, but that process is computationally very expensive. In this talk I will show various strategies for variable selection and how to combine them into data cleaning pipelines. I will cover univariate variable selection, PCA (Principal Component Analysis) and penalized regression techniques. The talk will give you practical tips on how to get the most out of your data instead of getting lost in variables.
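
As a rough illustration of the first strategy named in the abstract, univariate variable selection scores each variable against the target on its own and keeps only the best-scoring ones. The sketch below is not taken from the talk: it assumes absolute Pearson correlation as the score, uses synthetic data, and the names `pearson`, `select_k_best`, `signal_a` and `noise_1` are hypothetical. In practice a library such as scikit-learn (for example its `SelectKBest` class) would usually be used instead.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_k_best(columns, target, k):
    """Return the names of the k columns with the largest |correlation| to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic data (an assumption for illustration): the target depends only on
# "signal_a" and "signal_b"; the remaining columns are pure noise.
rng = random.Random(0)
n = 200
columns = {
    name: [rng.gauss(0, 1) for _ in range(n)]
    for name in ["signal_a", "signal_b", "noise_1", "noise_2", "noise_3"]
}
target = [
    3 * a - 2 * b + rng.gauss(0, 0.1)
    for a, b in zip(columns["signal_a"], columns["signal_b"])
]

selected = select_k_best(columns, target, k=2)
print(selected)
```

Each variable is scored once, so the cost grows linearly with the number of variables, in contrast to a grid search over variable subsets. The trade-off is that variables are judged in isolation, so interactions between them are not taken into account.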

Speakers
Anastazie Sedláková

Data scientist, Freelancer
I am a data scientist, programming course lecturer and mom of two. Together with my husband, I organize programming courses (sedlakovi.org). My background is in statistical genetics. During my PhD, I learned to program and then changed fields completely - to work with financial...



Saturday January 25, 2020 11:00am - 11:55am CET
A112 Faculty of Information Technology Brno University of Technology, Božetěchova, Brno-Královo Pole, Czechia