April 09, 2021 – by Simone Ulmer 

When graphics processing units (GPUs) from the gaming industry began to be installed in conventional supercomputers more than ten years ago, the research field of artificial intelligence began to gain momentum. GPUs finally made it possible to run the computationally intensive "training programs" of agents (the software algorithms) at scale. And unlike the early days, when an algorithm had to prescribe exactly how to distinguish a picture of a dog from a picture of a cat, it became much easier to develop more general algorithms that can be applied to a variety of problems. These algorithms, known as reinforcement learning (RL) algorithms, learn by interacting with their environment and optimise themselves via the positive or negative feedback they receive during those interactions.
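
This loop can be made concrete with a toy example. The sketch below (illustrative only, not code from the study) shows tabular Q-learning, one of the simplest RL algorithms: an agent in a five-cell corridor is given no rules about what to do and improves purely from the reward feedback its actions produce.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends an episode
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q[s][a] estimates the long-term reward of taking action a in cell s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def pick_action(state):
    # Explore occasionally; otherwise act greedily, breaking ties at random.
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    best = max(Q[state])
    return random.choice([i for i, q in enumerate(Q[state]) if q == best])

for episode in range(500):
    state = 0
    for _ in range(100):                    # cap the episode length
        a = pick_action(state)
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Reward feedback near the goal gradually propagates backwards.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][a])
        state = next_state
        if state == N_STATES - 1:
            break

print(Q)  # after training, "step right" has the higher value in every cell
```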

Algorithm’s potential may be limited by human decisions

Today, RL algorithms enable countless advanced technologies from speech recognition all the way to autonomous robots. Researchers go to great lengths to develop the underlying learning algorithms so that they can be used as generally as possible and not just for a specific application. However, it is often not actually known what the best learning algorithm for a problem would be, or in which context it would work best.

"Current learning algorithms are limited by the researcher's ability to make the right design decisions," says Louis Kirsch of the renowned Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) at the Università della Svizzera italiana (USI) and La Scuola universitaria professionale della Svizzera italiana (SUPSI).

Kirsch is a doctoral student under Jürgen Schmidhuber, the scientific director of IDSIA and professor of artificial intelligence. Schmidhuber is one of the world's leading researchers in the field and pioneered the underlying method of meta learning (learning to learn) more than 30 years ago. Building on that work, Kirsch wants to overcome this limitation of conventional RL by combining meta learning with reinforcement learning. Meta learning, he explains, enables an algorithm developed by the researcher to generate new learning algorithms in intermediate steps, without direct human influence. The further design decisions therefore lie with the machine rather than with the human.
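
As a deliberately simple illustration of learning to learn (a hypothetical toy setting, not the group's method): an outer loop evaluates candidate learning algorithms, here reduced to nothing but the step size of a gradient-descent rule, by how well an inner learner trained with each candidate performs. The design decision is thereby made by the machine instead of the human.

```python
import random

def inner_learn(lr, steps=50):
    """Inner loop: a learner minimises f(w) = (w - 3)^2 with step size lr."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return (w - 3.0) ** 2   # final loss; lower means a better learner

# Outer loop: the meta-learner compares candidate learning algorithms
# (parameterised here only by their step size) instead of a human
# picking the step size by hand.
best_lr, best_loss = None, float("inf")
for _ in range(100):
    lr = 10 ** random.uniform(-4, 0)        # sample a candidate algorithm
    loss = inner_learn(lr)
    if loss < best_loss:
        best_lr, best_loss = lr, loss

print(f"meta-learned step size: {best_lr:.4f}, final loss: {best_loss:.2e}")
```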

Interacting with the environment and learning by itself

"Current research on meta learning has so far mainly focused on learning specialised learning algorithms," says Kirsch. This means that the automatically generated learning algorithms can only solve specific problems that their original algorithms were trained on and can therefore only be applied to very similar problems. But by combining meta learning with reinforcement learning, the "agent" learns to react to the environment and thus "new situations", while at the same time he learns how to learn itself. 

In order to make these algorithms more general and universally applicable, Schmidhuber's research group turned to biological evolution for inspiration. In the course of evolution, the experience of many learning individuals resulted in a "general learning algorithm" that is written down in our genes and underlies broadly applicable abilities, such as being able to learn to ride a bike.

In the study by Kirsch, Schmidhuber, and their colleague Sjoerd van Steenkiste, the method, called MetaGenRL, attempts to achieve such basic capabilities by having RL agents train and learn in different environments. MetaGenRL then distils the agents' experiences and generates a simplified learning algorithm that combines all of them. This general learning algorithm is the analogue of the genetic code, so to speak, and decides in further steps how to learn new things.
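
The sketch below caricatures this pipeline under strong simplifying assumptions. In the paper the distilled object is a learned neural objective function that is meta-trained with gradients; here, purely for illustration, a shared update rule with two coefficients is distilled by random search from agents learning in several toy environments and then applied to an unseen one.

```python
import random

def make_env(target):
    """Toy 'environment': reward is higher the closer w is to target."""
    return lambda w: -(w - target) ** 2

def train_agent(env, rule, steps=100, eps=0.1):
    """An agent improves w using only reward feedback and the shared rule.
    The rule's two coefficients scale a finite-difference reward signal."""
    k1, k2 = rule
    w = 0.0
    for _ in range(steps):
        # Estimate the reward slope from two probes (no access to env internals).
        g = (env(w + eps) - env(w - eps)) / (2 * eps)
        w += k1 * g + k2            # the distilled, general update rule
    return env(w)                   # final reward achieved with this rule

meta_train_envs = [make_env(t) for t in (-2.0, 1.0, 4.0)]

# Outer loop: distil one rule that works across all meta-training envs.
best_rule, best_score = None, float("-inf")
for _ in range(300):
    rule = (random.uniform(0, 0.5), random.uniform(-0.1, 0.1))
    score = sum(train_agent(env, rule) for env in meta_train_envs)
    if score > best_score:
        best_rule, best_score = rule, score

# Generalisation test: apply the distilled rule to an unseen environment.
held_out = make_env(7.0)
print("rule:", best_rule, "held-out reward:", train_agent(held_out, best_rule))
```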

"The GPU-accelerated architecture of 'Piz Daint' is very well suited for the matrix-matrix multiplication required for this, which is used extensively in deep learning and deep reinforcement learning," says Kirsch. "In the case of meta learning, it is necessary not only to learn with a learning algorithm, but also to optimise over it — that means trying out different learning algorithms automatically to find the best one." Furthermore, the researcher says, this task can be widely distributed over the large number of nodes of "Piz Daint".

The researchers write that MetaGenRL generalises to new environments, even ones completely different from those used for meta-training. For example, the trained learning algorithm can control robot models it has never encountered. In some cases, these generated algorithms already outperform human-designed RL algorithms.


Reference:

Kirsch L, van Steenkiste S & Schmidhuber J: Improving Generalization in Meta Reinforcement Learning using Learned Objectives. ICLR 2020, https://arxiv.org/abs/1910.04098

This article may be used on other media and online portals provided the copyright conditions are observed.