Thursday, September 11, 2014

Kernel Methods Aren’t Dead Yet

The other night I gave a talk at the Data Science MD meetup group about using Kernel Methods on larger datasets. I have the slides up here. Public speaking is really outside my comfort zone, but it's something I want to do and get better at. Overall it was a great experience, and it was my first time giving a talk about Machine Learning in front of a large group. I'm quite sure I left a good deal of room for improvement.

The majority of my talk was about methods for approximating the solution of a support vector machine. Specifically, I covered two approaches: forming approximate features and training a linear solver on them, and performing bounded kernel learning using projection and support vector merging. The motivation is that these approximate methods are good enough to find the parameters we want to use, after which a slower, more exact solver can be run on the final parameters chosen.
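To make the "approximate features" idea concrete, here is a minimal NumPy sketch, assuming the Gaussian (RBF) kernel \(k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)\) and the random Fourier feature construction used by Random Kitchen Sinks. Each input is mapped to \(z(x) = \sqrt{2/D}\cos(W^\top x + b)\) with random frequencies \(W\) and phases \(b\), so that \(z(x) \cdot z(y) \approx k(x, y)\). Any linear solver can then be trained on \(z(x)\). The helper names (`sample_rff_params`, `rff_transform`, `rbf_kernel`) and the synthetic data are hypothetical illustrations, not code from the talk.

```python
import numpy as np

def sample_rff_params(n_dims, n_features, sigma, rng):
    """Draw random frequencies and phases once; reuse them for train and test data."""
    # For the RBF kernel with width sigma, frequencies are drawn from N(0, I / sigma^2)
    W = rng.normal(scale=1.0 / sigma, size=(n_dims, n_features))
    # Random phases, uniform on [0, 2*pi)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b

def rff_transform(X, W, b):
    """Map each row x of X to z(x) = sqrt(2/D) * cos(W^T x + b)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def rbf_kernel(X, Y, sigma):
    """Exact Gaussian kernel matrix, used here only for comparison."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # hypothetical synthetic data
sigma = 2.0
K_exact = rbf_kernel(X, X, sigma)

# The approximation error shrinks as the number of random features D grows
errors = {}
for D in (10, 100, 1000, 10000):
    W, b = sample_rff_params(X.shape[1], D, sigma, rng)
    Z = rff_transform(X, W, b)
    errors[D] = np.abs(Z @ Z.T - K_exact).max()
    print(f"D={D:>5}: max |z(x).z(y) - k(x, y)| = {errors[D]:.3f}")
```

The point for parameter search is that the cost of training on \(z(x)\) is controlled by \(D\) rather than by the number of training points, so a small \(D\) gives a cheap but rough approximation of the kernel machine.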

Part of what motivated this idea is that many people feel they have to use the exact same algorithm for every step. There is simply no reason for that, and grid search is a perfect example. If a pair of parameters C and \(\sigma \) is going to perform badly with the exact solver, it isn't going to do any better with an approximate solution. So the approximate solution is more than good enough to filter these pairs out.

One could envision a tiered approach. In the first stages, very fast but low-quality approximations whittle the set of all parameters to try down to a smaller set. Then a more accurate approximation cuts out more. This continues until we feel that a 'best' pair can be selected, and the most accurate (or exact) solver gives us the final model.
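A sketch of that tiered search follows, under several stated assumptions. The data is a hypothetical synthetic 2-D "circle" problem. To stay dependency-free, the SVM is swapped for a regularized least-squares (ridge) classifier: the approximate tiers train it on random Fourier features, and the final stage uses exact kernel ridge regression, with C acting as the inverse ridge penalty. The tier sizes (20, then 200 features), the keep fractions, and all function names are arbitrary illustrative choices.

```python
import itertools
import numpy as np

def rff(X, W, b):
    """Random Fourier features approximating the RBF kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def rbf_kernel(X, Y, sigma):
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def approx_score(C, sigma, D, data, rng):
    """Validation accuracy of a ridge classifier on D random features.
    Assumption: ridge regression stands in for the SVM solver."""
    Xtr, ytr, Xva, yva = data
    W = rng.normal(scale=1.0 / sigma, size=(Xtr.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Ztr, Zva = rff(Xtr, W, b), rff(Xva, W, b)
    # Linear solve in feature space: (Z^T Z + I/C) w = Z^T y
    w = np.linalg.solve(Ztr.T @ Ztr + np.eye(D) / C, Ztr.T @ ytr)
    return np.mean(np.sign(Zva @ w) == yva)

def exact_score(C, sigma, data):
    """Validation accuracy of exact kernel ridge: (K + I/C) alpha = y."""
    Xtr, ytr, Xva, yva = data
    K = rbf_kernel(Xtr, Xtr, sigma)
    alpha = np.linalg.solve(K + np.eye(len(Xtr)) / C, ytr)
    return np.mean(np.sign(rbf_kernel(Xva, Xtr, sigma) @ alpha) == yva)

def tiered_search(grid, data, tiers, rng):
    """Each tier scores every surviving (C, sigma) with D random features
    and keeps the top `keep` fraction; the exact solver picks the winner."""
    survivors = list(grid)
    for D, keep in tiers:
        ranked = sorted(survivors,
                        key=lambda p: approx_score(*p, D, data, rng),
                        reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * keep))]
    return max(survivors, key=lambda p: exact_score(*p, data))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(600, 2))
y = np.where((X**2).sum(1) < 0.5, 1.0, -1.0)  # circle: not linearly separable
data = (X[:400], y[:400], X[400:], y[400:])
grid = list(itertools.product([0.1, 1.0, 10.0, 100.0],   # C values
                              [0.01, 0.1, 0.5, 1.0, 10.0]))  # sigma values
best = tiered_search(grid, data, tiers=[(20, 0.5), (200, 0.4)], rng=rng)
print("best (C, sigma):", best, "validation accuracy:", exact_score(*best, data))
```

Here all 20 pairs are only ever scored with the cheapest 20-feature approximation, 10 survive to the 200-feature tier, and just 4 pay for the \(O(n^3)\) exact solve. In a real setting the same structure would apply with an actual SVM solver in place of the ridge stand-in.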

My other motivation for the talk is what I perceive as SVMs dying out. A lot of people now seem to use either only linear models or Neural Networks. Both certainly have their place, but I feel kernel methods are more easily applied than Neural Networks while having significantly wider applicability than simple linear models. Kernel methods are one of many tools in the toolbox, and right now they're just collecting dust when they could be solving problems.

For me personally, it was good experience to talk to a wider audience with a broader range of skill sets. The number of blank stares I got from the audience makes me think I may have lost a few people. So in future versions I'm going to try to add more slides that give the intuition for what's going on. I'm not sure how best to give the intuition for Random Kitchen Sinks if the mathy version doesn't work, so that will take some thought.

I was also quite nervous, and went through my slides a bit too fast. I try hard not to be that guy who reads his slides word for word, and in doing so I forgot to talk about some things that weren't in my slides. Hopefully more practice will decrease the nervousness in the future.