Humphrey Sheil - Blog

Gorila: Google Reinforcement Learning Architecture

17 Jul 2015

Today (Sat 11th July) is the second day of the Deep Learning workshop here at ICML 2015 in Lille. I liked one session in particular as I think it offers a good glimpse of how the industrialisation of Machine Learning is shaping up. Or, put another way, it sits at the white-hot nexus where experimental ML meets practical software engineering.

Gorila (General Reinforcement Learning Architecture)

The talk today was by David Silver - Head of the Reinforcement Learning team at Google DeepMind - and Arun Nair from the Applied team at DeepMind, on Gorila (General Reinforcement Learning Architecture). This separation of concerns (research vs application) makes a lot of sense when productionising Machine Learning - you wouldn't ask a pure research scientist to make their code ready for production - so the teams are stratified from research through to production.

Silver gave a keynote on the same topic at ICLR in San Diego in May but, as you'll see from those slides, that talk focused more on the general benefits of Reinforcement Learning (RL) than on Gorila itself.

In summary, Gorila looks to be a generalisation of the well-known DistBelief (Jeff Dean et al.), extending it from feed-forward supervised learning to reinforcement learning.

What is Reinforcement Learning?

From an ML perspective, Reinforcement Learning has some nice properties over supervised learning, but it is also harder to implement successfully. Supervised or semi-supervised learning hinges on two key properties in particular:

  1. Having well-formed labelled data to tell your network in the training phase whether it got the task right or wrong. Labelled data is often hard to get and expensive to scale as we need humans to do the labelling.
  2. Having a well-defined Teacher module and being able to construct your objective function in such a way that you can even have a Teacher module.

In unsupervised learning, there are no labels and no teacher. The most common example of unsupervised learning would be clustering a data set based on properties intrinsic to the data that the algorithm can infer without outside help.

Reinforcement Learning (RL) is different again to both supervised and unsupervised learning. RL asks a different question - can the network figure out how to take one or more actions now to achieve a reward or payout that is potentially far off (i.e. t steps in the future)? This delayed reward scenario is much harder to train for, as we may have a large number of steps to count back over, and we also need to solve the credit assignment problem: multiple actions chosen by the network combine to realise the goal, so we have to work out how much each one contributed. There is no teacher module and very little labelled data - we only need to be able to measure the outcome of actions on the environment.

Mathematically, the network is asked to learn the policy that achieves the best outcome by picking the best action for each state of the environment that the actor finds itself in, i.e. to solve the Bellman optimality equation (derived from dynamic programming) that underpins Q-learning.
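To make the Q-learning idea concrete, here is a minimal tabular sketch. It is not DeepMind's code: the toy corridor environment, the function name `q_learning` and all hyperparameters are my own assumptions, and a plain table stands in for the deep Q network. The reward only arrives on the final step, so the earlier states have to learn their value by propagating it backwards through the update rule.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Actions: 0 = left, 1 = right. The only reward (1.0) is paid on reaching
    the rightmost cell, so it is delayed by several steps from the start.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action choice; ties are broken towards "right".
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # Bellman optimality target: r + gamma * max_a' Q(s', a'),
            # with no bootstrap term once the episode has ended.
            target = r if s2 == goal else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

The learned values for "right" fall off by a factor of gamma per step away from the goal, which is the credit assignment problem in miniature: each earlier action gets a discounted share of the eventual reward.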

The easiest way to visualise this is by playing video games - which is also a great way to ensure copious coverage in the mainstream media :) The DeepMind team train their networks on 49 games from the Atari 2600 - Seaquest, Tennis, Boxing etc.

Gorila itself

Figure 1: A schematic from David Silver's ICML 2015 presentation on Reinforcement Learning. Used with permission.

What's interesting for me about Gorila is how much it feels like MapReduce from Dean et al. or BigTable from Chang et al. In both of those cases, a hard problem (using a heterogeneous compute cluster efficiently, storing and querying very large data sets) was solved by a new framework designed right from inception to scale to levels not previously encountered.

The four key components of Gorila are:

  • Actor (there are many of these and they correspond to video game players, users of a service etc.)
  • Replay memory (this was a key insight to improve performance of the RL system and enable the Q-learning task to be learned)
  • Learner (parallelised - so can generate many more gradients than the previous iteration)
  • The Q network or model itself (distributed using DistBelief - capacity to process many more networks in parallel)
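How the four components fit together can be sketched in a single process. To be clear, this is my own illustration and not the Gorila implementation: every class and function name is hypothetical, a small table stands in for the distributed Q network, and plain loops stand in for the parallel actors and learners. Actors generate experience with a copy of the parameters, the replay memory stores it, learners turn sampled experience into gradients, and a parameter server applies them.

```python
import random
from collections import deque

GOAL = 4  # hypothetical corridor: states 0..4, reward only on reaching state 4

def step(state, action):
    """Toy environment (an assumption): action 0 moves left, 1 moves right."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

class ReplayMemory:
    """Bounded store of (s, a, r, s2, done) transitions written by the actors."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)
    def add(self, transition):
        self.buffer.append(transition)
    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

class ParameterServer:
    """Holds the Q 'network' (here just a table) and applies incoming gradients."""
    def __init__(self, lr):
        self.q = [[0.0, 0.0] for _ in range(GOAL + 1)]
        self.lr = lr
    def snapshot(self):
        return [row[:] for row in self.q]
    def apply(self, grads):
        for (s, a), g in grads:
            # Average over the batch so repeated transitions cannot overshoot.
            self.q[s][a] -= self.lr * g / len(grads)

def actor_episode(q, memory, rng, epsilon=0.2, max_steps=50):
    """Actor: act epsilon-greedily with its copy of Q and log every transition."""
    s = 0
    for _ in range(max_steps):
        greedy = 1 if q[s][1] >= q[s][0] else 0
        a = rng.randrange(2) if rng.random() < epsilon else greedy
        s2, r, done = step(s, a)
        memory.add((s, a, r, s2, done))
        if done:
            break
        s = s2

def learner_gradients(q, memory, rng, batch_size=16, gamma=0.9):
    """Learner: sample replayed transitions, return gradients of 0.5 * (Q - target)^2."""
    grads = []
    for s, a, r, s2, done in memory.sample(batch_size, rng):
        target = r if done else r + gamma * max(q[s2])
        grads.append(((s, a), q[s][a] - target))
    return grads

def train(rounds=300, n_actors=4, n_learners=2, seed=0):
    rng = random.Random(seed)
    server, memory = ParameterServer(lr=0.5), ReplayMemory(capacity=1000)
    for _ in range(rounds):
        for _ in range(n_actors):    # sequential stand-in for parallel actors
            actor_episode(server.snapshot(), memory, rng)
        for _ in range(n_learners):  # sequential stand-in for parallel learners
            server.apply(learner_gradients(server.snapshot(), memory, rng))
    return server.q
```

Calling `train()` returns the learned table. The point of the split is that acting and learning only communicate through the replay memory and the parameter server, which is what makes it natural to scale each of them out independently.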


The implications of all of this are fairly obvious but worth noting nonetheless for their importance:

  1. The team reported very significant improvements in both performance and wall clock training time. Gorila (v2) beat v1 (the Nature DQN) on 41 out of 49 Atari games, with a 2x improvement on 22 games and a 5x improvement on 11, and on 25 games it is better than a human player. Training time is reduced from ~2 weeks to ~1 day. So the speed of iteration, going from pretty basic research to iteration two, is fast (less than one year by my reckoning).

  2. Google are building the same infrastructure around ML as they have around other problems (Gorila is to Reinforcement Learning as MapReduce is to task parallelisation as BigTable is to data storage). History then tells us that sooner or later there will be an open-source equivalent and eventually using reinforcement learning will be commonplace in software (whereas today it is esoteric, even within the deep learning community itself).

  3. Reinforcement Learning has the potential for far wider practical usage than just video games, and systems like Gorila pull it towards this future. If you imagine that we are all actors or protagonists inside Google services like YouTube, AdWords etc., then it becomes quite realistic to use RL to pick the best ads to show me, recommend new content for me to watch etc.

Finally, the paper that Silver and Nair referenced during the talk is now available on arXiv!

Thanks to David Silver for constructive feedback on this blog post.

