Humphrey Sheil - Blog

Deep learning and machine learning

14 Jul 2015

This blog post is a more informal discourse on a formal exercise I'm currently engaged in - the background literature review for my PhD. As such, it is very link-rich - the journey I am documenting here is intrinsically made up of the work of others. This post may also be interpreted as an attempt to convince my supervisors Prof. Omer Rana and Prof. Ronan Reilly that I am actually getting stuff done instead of reading papers and debugging Torch7 / Theano code. But this is not the case :)

Machine Learning - background

The field of Machine Learning is highly active at present. Week by week, month by month, previous records are tumbling and then tumbling again (although rules are being broken along the way). Libraries implement algorithms that are optimised for GPU hardware using Nvidia's CUDA, and researchers can either buy hardware from Nvidia or rent machines in the cloud an hour at a time. Coders at home or in small companies can now train very large models on extremely large data sets - something only a much larger company could have contemplated just 3 - 4 years ago.

Why is this important? Well, the simplest definition of Machine Learning is software that improves its own performance over time when measured against a specified task. That sentence describes a breed of software very different to what we are all used to at present - operating systems, browsers and most applications do not meaningfully modify their behaviour based on "experience" - they repeatably do exactly the same thing over and over again. Having software that is trainable - that can learn - is an immensely valuable capability as we will see later on.

In my opinion, Machine Learning represents one of the grand challenges of our time, up there with understanding the fabric of reality or what life itself is.

Machine Learning != Deep Learning (except it kind of is right now)

Machine Learning is not Deep Learning (it's a superset of it) but recently (and especially in mainstream media) the two terms have become synonymous. Deep Learning is a branch of Machine Learning that stems from Artificial Neural Networks - in my opinion the most interesting of all ML branches because of biological plausibility (a mapping to a similar function or characteristic in the human brain) and the roadmap / treasure trove of ideas it provides to researchers. ANNs have been wildly popular and deeply unpopular by turn since the 1960s, as first hype outstripped reality and then practitioners caught up again (Minsky and Papert's devaluing of ANNs in the 1969 book Perceptrons still echoes through the field today even though it was refuted).

Referencing Minsky here is appropriate - until recently, talking about symbolic logic as implemented in Prolog and Lisp was as acceptable as (or even more acceptable than) talking about neural networks. But like neural nets, Prolog and Lisp have fallen out of fashion. From a software engineering point of view however, they are seductive. Prolog for example deals with easy to understand facts and rules as follows:


When we "consult" the system with questions, Prolog uses resolution refutation to answer them, so we get:

 ?- likes(mary,food). 

But Prolog is slow and the rules / facts databases became unmanageable. Prolog / Lisp suffered their own AI winter in Japan as non-specialised hardware outstripped the elegant dedicated hardware (they were basically outperformed by Moore's Law). Recently, researchers at Google have built deep learning systems focusing on the same space which display promising results, but their inner workings are not as obvious / clear as an (albeit simple) Prolog equivalent. That's neither a good nor a bad thing - in a nutshell it is a restatement of the old Symbolic vs Connectionist debate. In practice, hard results count and right now neural networks are winning that debate.

The following schematic shows the concentric grouping / subsets of Artificial Intelligence to provide a truer picture of the relationship between deep learning and machine learning.

Figure 1. A schematic depicting the relationships between different sub-fields in Artificial Intelligence, from Bengio's book.

Neural networks have their problems - among them the Problem of Opacity. Although visualisation tools exist, for the most part neural networks look like, and are, large lattices of floating point numbers. The casual observer will make no more sense of them than of the human brain, which isn't much good when trying to integrate a neural network into a working software system.

So what is a neural network?

Haykin's canonical text defines a neural network as follows:

"A massively parallel distributed processor made up of simple processing units that has a natural propensity for storing experiential knowledge and making it available for use".

Those simple processing units are neurons. The standard model of a neuron was proposed by McCulloch and Pitts in 1943 and hasn't really changed much since. A neuron accepts a vector of inputs X; each input is multiplied by a weight, an optional bias b (not depicted in the figure) is added, and the neuron generates an output or activation by applying some function to that weighted sum (many different functions are used depending on the characteristics desired). The diagram below depicts this stylised neuron.

Figure 2. The standard (McCulloch-Pitts) model of a single neuron.
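
As a minimal sketch of the neuron described above, the computation is a weighted sum of the inputs plus the bias, passed through an activation function. The names `neuron`, `sigmoid` and `step` are my own for illustration, and the logistic sigmoid is just one of the many activation functions in use:

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Hard threshold, closer in spirit to the original 1943 formulation."""
    return 1.0 if z >= 0 else 0.0


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Illustrative values only: the weighted sum here is exactly 0, so sigmoid gives 0.5.
output = neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.25], bias=-0.75)
print(output)  # 0.5
```

With the `step` activation, weights of 1.0 and a bias of -1.5, the same function behaves like a logical AND gate, which is a handy way to sanity-check the arithmetic by hand.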

Real-world neural networks have many thousands of these neurons, and together they provide the store of experiential knowledge and, most usefully of all, the ability to understand new, unseen data in the context of previous data already seen. GPUs are ideally suited to training neural networks as they too are massively parallel, as the diagram below shows.

Figure 3. The Titan X GPU from Nvidia with 3,072 CUDA cores.

A personal history of neural networks

I first encountered neural networks in 1997 - my final year project combined SNNS (an ancestor of Torch7 / Theano) and Simderella to create a 6 DoF robotic arm that "learned by seeing". We taught the network to nest cups - just as a child would do. In fact, as the network trained, it displayed a progression of complexity in nesting strategies (linear -> pre-stacking) exactly as Piaget documented for real children. This project was also intended to be a stepping stone to further work inspired by the development of Broca's area in the human brain, whereby if we pre-train or condition a neural network on simple tasks, it is more successful at learning more complex tasks. Pre-training / conditioning is an important heuristic to consider when training deep nets (see below). In hindsight, I suspect that I over-trained the hell out of this network for my end of project demo to ensure accuracy in finding the cups on the simple 20x20 retina I used!

In 1997 (subjectively speaking), neural networks were seen as elegant from a computer science perspective but suspiciously close to psychology, and software engineering could harvest much better (and more understandable) results from techniques like Case-based Reasoning, inference, decision trees and so on.

In 2003, my M.Sc. again focused on neural networks but almost as a side-show to the main event (distributed computing). The primary thrust of this work was to show how a cluster of heterogeneous compute nodes could effectively replicate a very well-known benchmark, "Predicting the secondary structure of globular proteins using neural network models" by Qian and Sejnowski (1988), with near linear speed-up using a tuple-based architecture based on Linda from David Gelernter. As an aside, this work used only feed-forward neural networks - recurrent networks, and certainly LSTM, would produce better results due to their ability to retain information across the input sequence of amino acids.

In 2003, urban myths existed about investment banks using neural nets to predict the stock market and to monitor portfolio risks. If true, then I put it to the reader that the events of 2008 demonstrate that these neural networks may have found an unfortunate series of local minima.

In 2006, the world changed. Geoffrey Hinton, Yoshua Bengio and Yann LeCun started stacking neural networks in layers - each layer extracted important / relevant features from its input and passed these features to the layer above it. In addition, focus shifted to a generative model from a straightforward classifier - in effect the main task of the network became generating the input data at increasing levels of abstraction (via stacks) so that useful features were extracted. These were significant breakthroughs, and the timing was fortuitous - on the hardware side, Nvidia (and AMD, but OpenCL is all but dead for deep learning) were making GPU cards available as computing resources accessed via CUDA. Even today, we are still in a golden age because of this three-way confluence:

  1. A core idea (neural networks) with a training algorithm (backpropagation: Rumelhart, Hinton and Williams) that is intrinsically amenable to parallelisation.
  2. Intrinsically parallel hardware (GPUs) to train these neural networks on.
  3. More and larger data sets (both labeled and unlabeled) to provide as input to neural networks.

In March 2013, Hinton joined Google and in December 2013, LeCun joined Facebook. Both companies, along with Baidu and others, are investing heavily in deep learning to automatically categorise, understand and translate rich content on the web - text, images, video, speech. Why you would do this is obvious - social platforms become more relevant, mobile phones become more powerful and useful, ads become more targeted, Ecommerce sites deliver better search results, recommendations are genuinely useful - the possibilities are endless. And simply having Google and Facebook working in this area is responsible for a lot of the current (both accurate and inaccurate) mainstream media coverage on Artificial Intelligence.

This leads us nicely onto what neural networks are typically used for.

Practical applications

Simply put, neural networks extract features and patterns from data they are trained on - often features and patterns that a human will not see. Then, a trained neural network can be presented with new, unseen data and make predictions. This generalisation step is very useful, especially for Recurrent Neural Networks which can absorb data over time, i.e. encode temporal information or sequences. Neural networks are getting awfully good at:

  • Image parsing (edge detection, object detection and classification)
  • Scene understanding
  • Speech recognition
  • Natural Language Processing (NLP)
  • Language translation
  • Playing simple video games
  • Classification and clustering of multi-label systems
  • Prediction of temporal sequences.

For specific use cases, ANNs sometimes can and will be out-performed, often by simpler / older techniques. It is always wise to construct a problem such that it admits the use of multiple ML techniques so that peer to peer comparisons can be made. Nevertheless, in computer vision and sequence translation in particular, deep learning approaches dominate the various leaderboards.

More generally, if a neural network N can be trained on a system S and the events E occurring in that system from time step 0 up to the current time step t, i.e. E = {e0, ..., et}, and can then provide reasonable to good predictions of what will happen at e(t+1), e(t+2) etc., then this has wide applicability - in Ecommerce, personalised medicine, inventory management - you name it, there is a need for this kind of prediction. This is a fundamental attraction of neural networks (and often overlooked / undervalued versus understanding an image and its elements).
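
One hedged way to picture this setup: before any network sees the data, the event history is typically cut into (recent events, next event) training pairs. The sketch below assumes a fixed-size look-back window; the helper `make_windows` and the window length of 3 are illustrative choices of mine, not something prescribed by any particular library.

```python
def make_windows(events, window=3):
    """Turn [e0, ..., et] into (history, next_event) training pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for i in range(len(events) - window):
        history = events[i:i + window]   # the inputs the network sees
        target = events[i + window]      # the event it should learn to predict
        pairs.append((history, target))
    return pairs


events = ["e0", "e1", "e2", "e3", "e4"]
pairs = make_windows(events, window=3)
for history, target in pairs:
    print(history, "->", target)
```

A feed-forward net would consume each `history` as one fixed-length input, whereas a recurrent net could instead be fed the events one step at a time and carry its own state forward.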

Neural network frailties

Make no mistake, neural nets are frustrating things to work with. Neural networks are:

  1. Difficult to train (inordinate length of time, average end-performance achieved).
  2. Remarkably sensitive to how the inputs are presented so that the right features can be extracted (ICLR is an entire conference devoted to Representation Learning) - Bengio, Courville and Vincent have compiled a nice review on the intricacies.
  3. Sensitive to the initial state of the weights, which can have a dramatic (positive or negative) impact on end-performance.
  4. Amenable to "rules of thumb" which are sprinkled throughout the literature and discovered / rediscovered / reinvented: for example - curriculum learning, reversing input sequences, tuning the learning rate. This is not a complete list by any means.

Despite all of this, neural networks in various forms have demonstrated class-leading results in many different domains. That is why they are interesting and that is why researchers and companies persist with them in my opinion. Their complex, esoteric nature is more than outweighed by the results they yield.

"State of the nation" in deep learning - ICLR 2015

It's instructive to look at the subject matter presented at the very latest conferences in a research field to see what the current "state of the nation" of that field is. Machine Learning is no different, and the last conference I attended was ICLR 2015 in San Diego, California just a few weeks ago. As already noted in other reviews of ICLR, all of the papers presented are available on arXiv, which is fantastic. And it's great to see the authors of those papers present their work and to get to talk to them directly in the poster sessions. There are many other excellent conferences - NIPS, KDD, IJCAI - I chose ICLR as it is freshest in my mind!

Firstly, the datasets used to date are starting to struggle.

Secondly, researchers are looking for common reference points against which to grade the latest systems under development. The "toy tasks" work is a nice example of this, with a clear lineage from curriculum learning. A paper under review for ICLR 2015, but I guess not accepted, was "Recurrent neural network regularization". What I like most about this paper is the code on GitHub - a great way to match up data in the paper with reproducible output.

"Fast Convolutional Nets With fbfft: A GPU Performance Evaluation" demonstrated how GPUs can be driven even harder when training deep nets, and there is already another version of this library in the works (hint: use 16-bit floats instead of 32!). It's not hard to imagine Nvidia producing dedicated hardware for the deep learning community in the future if further divergent optimisations are identified.

Finally, a personal favourite of mine purely from an intuition perspective is Memory Networks. It has low biological plausibility (or certainly this aspect is not explored in the paper) but high practical application. We can easily imagine a variant of this architecture which learned to access business facts held in SQL / NoSQL databases for example, or to access Prolog-like facts and rules.

The commoditisation of Machine Learning

Machine Learning has crossed a rubicon of sorts and it is now seen as a must-have for business applications. The traditional applications of speech and computer vision will continue, but ML will also become ubiquitous in POBAs (Plain Old Business Applications). This commoditisation of ML is already happening:

  1. Microsoft (this year)
  2. Google (2011, little adoption)
  3. Amazon (this year)

Many smaller services also exist and we can expect to see consolidation over time as winners emerge. For those engineers who want to consume ML locally, there are libraries such as:

  1. WEKA
  2. SparkML

The bane of ML practitioners is hyperparameter selection (aka the god parameter - the config flag that makes your beautiful neural network work brilliantly or little better than a random guess). In fact, Bergstra and Bengio demonstrated that randomly sampling hyperparameters is often a "good enough" strategy, as algorithms are typically sensitive to just one or two hyperparameters. More recently, Snoek et al. applied Bayesian optimisation with Gaussian process priors to further refine this approach (I guess I'd call it intelligent randomness plus learning). Spearmint, the implementation they built, has been spun off into a startup with a nice API. Given that some deep nets take ~two weeks to train, any quicker path to the optimal hyperparameters is advantageous.
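
To make the random search idea concrete, here is a minimal sketch rather than anyone's published implementation. The function `random_search`, the search space and the stand-in objective `toy_validation_error` are all hypothetical; in real use the objective would train a network and return its validation error, and learning rates are often sampled on a log scale rather than uniformly.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters uniformly at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw every hyperparameter independently from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    """Stand-in for 'train a network and measure validation error'.

    Deliberately sensitive to learning_rate and almost blind to momentum.
    """
    return (params["learning_rate"] - 0.1) ** 2 + 0.001 * params["momentum"]


space = {"learning_rate": (0.0, 1.0), "momentum": (0.0, 1.0)}
best_params, best_score = random_search(toy_validation_error, space, n_trials=200)
print(best_params, best_score)
```

Because the toy objective only really cares about one of its two hyperparameters, every random trial tests a fresh value of the parameter that matters, which is the intuition behind random sampling beating a coarse grid.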

Future directions in the field

It is clear that the software tooling around Machine Learning is pretty immature when compared to the theory. Researchers write and publish papers - not production-strength code. So comparing Machine Learning to other branches of Computer Science (operating systems, networking, data storage, search) shows that a lot of work needs to be done in this area.

At the risk of inordinate bet-hedging, the next breakthrough will either come from an existing approach (see how popular Long Short Term Memory aka LSTM from Schmidhuber and Hochreiter became in 2014 / 2015 after 15 years in the wilderness) or new thinking, probably inspired by a biological construct. Geoffrey Hinton's current work on capsules / manifold learning is a good example of this. It would certainly seem promising for researchers to see what ideas / techniques exist in the literature that were abandoned due to computational intractability which might now be more tractable!

It is possible that all we have to do is stack networks ever higher (900+ layers deep?) as per the Highway Networks paper from Srivastava, Greff and Schmidhuber to keep advancing, but my sense is that there is only so much distance left in the "deep" approach. Anyway, anecdotally six layers appear optimum at present (but GoogLeNet uses 22 or 27 depending on how you count them).

It is a little strange that Backpropagation combined with Stochastic Gradient Descent (SGD) is still the best / canonical learning algorithm used. It has certainly stood the test of time since 1986 and is very GPU-friendly. Current research seems focused on network architecture but it seems inevitable that a resurgence in learning algorithms will also take place.
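
For readers who have not seen the algorithm written down, here is a deliberately tiny sketch: one sigmoid neuron trained with stochastic gradient descent on the logical OR function. With a single neuron the "backward pass" is just one application of the chain rule; in a multi-layer network the same error term is propagated back layer by layer. The function names, learning rate and epoch count are illustrative assumptions of mine.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def train_neuron(samples, epochs=2000, learning_rate=0.5, seed=0):
    """Fit one sigmoid neuron with stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": update on one sample at a time, in random order
        for inputs, target in data:
            # Forward pass.
            output = predict(weights, bias, inputs)
            # Backward pass (chain rule) for the squared error 0.5 * (output - target)^2.
            delta = (output - target) * output * (1.0 - output)
            # Gradient step on every parameter.
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias


# Logical OR as a toy, linearly separable task.
or_samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias = train_neuron(or_samples)
for inputs, target in or_samples:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

Each update touches only one sample and a handful of multiply-adds, which hints at why the algorithm maps so well onto batched, parallel GPU arithmetic.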

Most neural nets are trained by a teacher (supervised learning) - we reward the net with a low error for good outputs and penalise it for bad outputs. This approach works well enough, but requires nicely-labelled data, often a luxury or simply not possible to attain. Google DeepMind are championing reinforcement learning as a way to develop systems that transfer well across problems (in their seminal paper to date, a single network architecture learned to play dozens of Atari games from raw pixels).

Finally, it is reasonable to expect a reduction in complexity of neural networks, if not in theory then certainly in practice and usage. Using Minimum Description Length or Vapnik–Chervonenkis dimension as a measure, we are interested in building the simplest, most compact neural network (concretely, with the fewest parameters) to solve a given problem or task. Simpler networks will also train more quickly - a very nice benefit in practice.
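
As a rough illustration of "fewest parameters", the sketch below counts the weights and biases in a fully connected feed-forward network. The helper `count_parameters` and both layer layouts are hypothetical, and a raw parameter count is only a crude proxy for measures like MDL or VC dimension.

```python
def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected feed-forward net."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Two hypothetical nets for a 20x20 input (400 values) and 10 outputs.
wide = count_parameters([400, 100, 100, 10])
narrow = count_parameters([400, 30, 10])
print(wide, narrow)  # 51210 12340
```

If both networks solved the task equally well, the second would be preferred under this measure: roughly a quarter of the parameters to store, train and explain.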

Recap / summary

This ended up being a much longer blog post than I intended! Part historical journey, part overview on Machine Learning with a noticeable deep learning bent, it is a much more informal (and perhaps more readable) literature review than the one I'm currently also writing up for my PhD.

The pace of innovation in this field is high - there are four / five major conferences each year and each conference brings new announcements and breakthroughs. There is a real confluence of software engineering, fast, scalable hardware and good theories at present.

Deep learning / neural networks might not be the single unifying theory to bring all of AI together (there are simply too many unknowns and previous failures), but I suspect that it will have a profound influence on the development of AI for at least the next decade.

Further reading

If you're still interested in Machine Learning (why would you not be?!) I think it's fair to say that you simply cannot do enough reading in this field - the foundations are well-established but the frontiers are being pushed and challenged every day. This blog post contains many, many interesting links and in addition to that I would strongly recommend the Reddit ML AMAs from key thought leaders / practitioners in the space if you want to learn more. They are (in no priority order - they all make for excellent, insightful reading):

  1. Andrew Ng and Adam Coates (4/15/2015)
  2. Jürgen Schmidhuber (3/4/2015)
  3. Geoffrey Hinton (11/10/2014)
  4. Michael Jordan (9/10/2014)
  5. Yann LeCun (5/15/2014)
  6. Yoshua Bengio (2/27/2014)

