ICML 2015 - day one
It's ICML time again and this year the conference is in sunny and beautiful Lille, France.
Day one has served up some tasty papers.
First up, the keynote from Léon Bottou was good - a look at the practical difficulties in using Machine Learning (ML) and how they may (or will) drive change going forward. The question is how the wider "conventional" software engineering community can integrate ML. He referenced Machine Learning: The High-Interest Credit Card of Technical Debt, which should be mandatory reading for all ML coders, from beginner to master.
The problems in a nutshell:
1. In ML, data is as important as code: if I give you my trained model and you run it on your data, performance (accuracy) is often terrible. We're not used to that in software engineering (for example, we expect core classes in the Java and .NET SDKs to work as advertised, and they do).
2. Leading on from #1, the layers of abstraction we take for granted in software engineering and mechanical engineering do not exist yet in ML. In some cases all I can do is take your cool idea contained in a paper and re-implement it from scratch for my problem and data set. That's a step backwards from regular software engineering, where we can simply read up on an improvement (a better search algorithm, a faster database), download it and start using it.
Bottou's keynote didn't really offer solutions (the problems are fundamental and ubiquitous throughout ML) but he did make one statement which resonated:
Ask paper authors to critique their own work.
I would add to this and ask authors to point to failed approaches and experiments. Papers often resemble perfect jewels and are crafted as such to pass the peer review process for conferences and journals. But every research approach / idea has corner cases and weak points. An objective appraisal of these would make papers so much more valuable. I'd also ask for open data sets and reproducible code but one step at a time :)
Next we had the first best paper award - A Nearly-Linear Time Framework for Graph-Structured Sparsity from Hegde, Indyk and Schmidt: the elegance in providing a generalised model for multiple hitherto separate models was clear to see here.
For the remainder of the day I attended Deep Learning I and Deep Learning Computations, both chaired by Yoshua Bengio. Papers of note:
Applying ML to MOOCs
In Learning Program Embeddings to Propagate Feedback on Student Code, the motivation is clearly worthwhile: building on MOOCs, widen CS education by using ML to extend and augment the human effort of graders. The improvement over existing methods (edit distance, unit tests) was impressive. Essentially, recursive neural networks (RNNs) map from (common and embedded) preconditions to postconditions more accurately than edit distance and without the "hard fail" that unit tests must necessarily apply, so students can be given relevant feedback on their assignments.
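For context, the edit-distance baseline mentioned above can be sketched as a plain Levenshtein distance over token sequences. This is a minimal illustration only: the whitespace tokenisation and the two sample submissions are hypothetical and not taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical student submissions, tokenised by whitespace.
reference = "for i in range ( n ) : move ( )".split()
student = "for i in range ( n + 1 ) : move ( )".split()
distance = edit_distance(reference, student)
```

A small distance like this one says nothing about whether the two programs behave alike, which is the gap a learned representation of program behaviour is trying to close.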
Better solutions than LSTM?
In Gated Feedback Recurrent Neural Networks, the rich research vein currently being mined for recurrent nets in general continues. They are simpler than LSTM and in some cases can give better performance. Outside of that I'm not really using them - I know they're in Keras, so I plan to tweak my models soon to see if they work better than LSTM for my datasets.
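One rough way to quantify "simpler than LSTM", assuming the comparison is with the gated recurrent unit (GRU), is to count parameters per layer. The sketch below uses the standard textbook formulations (no peephole connections); the function names are made up for illustration, and library implementations can differ slightly, for example by adding extra bias terms.

```python
def gated_rnn_params(input_dim, hidden_dim, n_blocks):
    """Each block has an input matrix, a recurrent matrix and a bias vector."""
    per_block = hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim
    return n_blocks * per_block


def lstm_params(input_dim, hidden_dim):
    # input, forget and output gates plus the candidate cell update
    return gated_rnn_params(input_dim, hidden_dim, 4)


def gru_params(input_dim, hidden_dim):
    # update and reset gates plus the candidate state
    return gated_rnn_params(input_dim, hidden_dim, 3)


lstm_count = lstm_params(128, 256)
gru_count = gru_params(128, 256)
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.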
Generating new candidate architectures
In An Empirical Exploration of Recurrent Network Architectures, the core idea is to mutate some seed architectures (LSTM and GRU) to try to find a better architecture variant, or perhaps a whole new architecture. The search space was explored first by filtering on toy tasks and then "graduating" good candidates to more advanced tasks such as XML prediction and Penn Treebank. My question here would be: is there not an inherent bias in seeding the search space with variants of LSTM and GRU, i.e. how different can the candidates truly be without becoming unviable? It would be interesting to apply the same methodology to a more varied candidate pool.
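The mutate, filter and graduate loop described above might look roughly like the sketch below. Everything in it is hypothetical: architectures are reduced to tuples of op names, the op vocabulary is invented, and the two scoring functions are cheap stand-ins for what would really be full training runs on a toy task and a harder task.

```python
import random

OPS = ["tanh", "sigmoid", "relu", "add", "mul", "minus_one"]  # hypothetical op vocabulary


def mutate(arch, rng):
    """Return a copy of arch with one randomly chosen op replaced by a different one."""
    pos = rng.randrange(len(arch))
    choices = [op for op in OPS if op != arch[pos]]
    return arch[:pos] + (rng.choice(choices),) + arch[pos + 1:]


def search(seeds, toy_score, hard_score, threshold, rounds, rng):
    """Mutate candidates from a pool; only those clearing the toy task get the expensive evaluation."""
    pool = list(seeds)
    graduated = {}
    for _ in range(rounds):
        candidate = mutate(rng.choice(pool), rng)
        if toy_score(candidate) >= threshold:              # cheap filter
            graduated[candidate] = hard_score(candidate)   # expensive evaluation
            pool.append(candidate)                         # survivors can be mutated further
    return graduated


# Placeholder scores standing in for real training runs.
def toy_score(arch):
    return sum(op in ("tanh", "sigmoid", "mul") for op in arch) / len(arch)


def hard_score(arch):
    return len(set(arch)) / len(arch)


seeds = [("sigmoid", "sigmoid", "tanh", "mul"), ("sigmoid", "tanh", "add", "mul")]
results = search(seeds, toy_score, hard_score, threshold=0.75, rounds=200, rng=random.Random(0))
```

Because every candidate is a single-op mutation of something already in the pool, the sketch also makes the seeding concern visible: the search can only drift away from its seeds one small step at a time.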
Hashing + Dark Knowledge
The Compressing Neural Networks with the Hashing Trick paper was nice. In particular I liked the addition right at the end where the authors combined their methodology with a competing yet completely orthogonal approach (Dark Knowledge from Caruana / Hinton) to sweep the board in their selected metric - a suggestion that the presenter attributed to one of the ICML reviewers!
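As a rough illustration of the general weight-sharing idea behind the hashing trick, the sketch below backs a large "virtual" weight matrix with a much smaller parameter vector by hashing each (row, column) position to a bucket. It is a simplified sketch under my own assumptions, not the paper's implementation: the hash function, the class name and the parameter values are all illustrative, and details such as how gradients are shared across a bucket are left out.

```python
import hashlib


def bucket(i, j, n_buckets, layer_seed=0):
    """Deterministically map weight position (i, j) to one of n_buckets shared parameters."""
    key = f"{layer_seed}:{i}:{j}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


class HashedLayer:
    """A rows x cols 'virtual' weight matrix backed by only len(params) real parameters."""

    def __init__(self, rows, cols, params):
        self.rows, self.cols, self.params = rows, cols, params

    def weight(self, i, j):
        # Look up the shared parameter for this position; nothing is stored per position.
        return self.params[bucket(i, j, len(self.params))]

    def forward(self, x):
        return [sum(self.weight(i, j) * x[j] for j in range(self.cols))
                for i in range(self.rows)]


# 32 virtual weights backed by 3 stored parameters (illustrative values).
layer = HashedLayer(rows=4, cols=8, params=[0.1, -0.2, 0.3])
y = layer.forward([1.0] * 8)
```

The compression comes from storing only the small parameter vector; the mapping from positions to buckets is recomputed from the hash whenever a weight is needed.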
Finally, for something very different, the IBM guys at TJ Watson presented some hybrid hardware-algorithm work: Deep Learning with Limited Numerical Precision. That work on its own is nice, but I was most intrigued by their forward-looking roadmap - expanding to look at CMOS < 10 nm, carbon-based vs silicon-based logic and new memory architectures. It's clear that the current explosion in deep learning was made possible by the advent of GPUs and algorithms that parallelise well onto GPUs, so the next generation of hardware could have a similar impact.
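To give a feel for what training with limited numerical precision involves, here is a minimal sketch of rounding a value onto a fixed-point grid, either to the nearest grid point or stochastically. Stochastic rounding is one technique often discussed for low-precision training; treating it as representative of the work above is my assumption, and the function name, bit width and sample values are illustrative. Saturation and clipping to a word length are not modelled.

```python
import math
import random


def to_fixed_point(x, frac_bits, rng, stochastic=True):
    """Round x onto a grid with spacing 2**-frac_bits."""
    scale = 1 << frac_bits
    scaled = x * scale
    lower = math.floor(scaled)
    if stochastic:
        # Round up with probability equal to the fractional remainder,
        # so the result is correct in expectation.
        rounded = lower + (1 if rng.random() < scaled - lower else 0)
    else:
        rounded = math.floor(scaled + 0.5)  # round to nearest
    return rounded / scale


rng = random.Random(0)
samples = [to_fixed_point(0.3, 2, rng) for _ in range(10000)]
mean_stochastic = sum(samples) / len(samples)
nearest = to_fixed_point(0.3, 2, rng, stochastic=False)
```

With two fractional bits, 0.3 always rounds to 0.25 under round-to-nearest, whereas the stochastic version returns 0.25 or 0.5 and averages out close to 0.3 over many draws.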
Questions posed by Schmidhuber
It would be remiss of me not to mention the questions asked by Jürgen Schmidhuber in both of the Deep Learning sessions. I think the detail of what is going on is known only to the various participants / protagonists, but I point the interested reader to these posts to get a flavour of the argument:
Initial post by Schmidhuber on the Nature paper:
The reply from Yann LeCun in the comments is worth reading as a counterpoint (I don't see a way to link to it directly; please let me know if there is one).
During the sessions, Bengio also answered the question posed.
Taking one concrete example, Schmidhuber asked the authors of the Empirical Exploration of Recurrent Network Architectures paper to review their work vs this paper from ICANN 2009. I've read the Empirical Exploration paper but not the 2009 paper, so I cannot comment either way.
There are two more examples (GRU and the hashing paper) where the same question was posed and I'll post them as I find them.
PS I'm working on a comment facility for this blog. In the meantime, please tweet to me using @humphreysheil if you have any comments, thanks!