What’s Reinforcement Learning?

Dongchan Year
2 min read · Sep 25, 2022


Questions are more important than answers.

Machine learning comprises three main categories:

  • supervised learning
  • unsupervised learning
  • reinforcement learning

Recall that in supervised learning, we rely on labeled training examples provided by a supervisor (a human expert), and the goal is to train a model that generalizes well to unseen, unlabeled test examples. In other words, the supervised model should learn to assign the same labels or values to a given input as the human expert would.
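
As a concrete (if tiny) illustration, here is a minimal supervised-learning sketch in Python. It assumes scikit-learn, which this article does not mention, and the dataset and model are arbitrary choices for demonstration:

```python
# A minimal supervised-learning sketch (scikit-learn assumed; the dataset
# and model choices are illustrative, not from the article).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # labeled examples from a "supervisor"
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # learn from labeled training data
print(model.score(X_test, y_test))     # generalization to unseen examples
```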

On the other hand, in unsupervised learning, the goal is to learn or capture the underlying structure of a dataset, as in clustering and dimensionality reduction, or to learn how to generate new, synthetic training examples that follow a similar underlying distribution.
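
Again as an illustrative sketch only (scikit-learn assumed, and the two-blob dataset is made up): a clustering algorithm discovers structure from points that carry no labels at all:

```python
# A minimal unsupervised-learning sketch: clustering without labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two unlabeled blobs of 2-D points; the algorithm never sees a label.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster assignments found from structure alone
```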

Reinforcement learning (RL) is substantially different from supervised and unsupervised learning, which is why RL is often regarded as the “third category of machine learning.”

The key element that distinguishes RL from other subfields of machine learning, such as supervised and unsupervised learning, is that RL is centered on the concept of learning by interaction: the model learns from interactions with an environment so as to maximize a reward function.

While maximizing a reward function is related to minimizing the cost function in supervised learning, the correct labels for a sequence of actions are not known or defined upfront in RL. Instead, they must be learned through interactions with the environment in order to achieve a desired outcome, such as winning a game. In RL, the model (also called an agent) interacts with its environment and, in doing so, generates a sequence of interactions that together form an episode. Through these interactions, the agent collects a series of rewards determined by the environment. These rewards can be positive or negative, and sometimes they are not disclosed to the agent until the end of an episode.
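
This interaction loop can be made concrete with a short sketch. It assumes the gymnasium package and its CartPole-v1 environment, neither of which this article mentions, and it uses a random policy as a stand-in agent:

```python
# A minimal sketch of the agent–environment interaction loop
# (gymnasium assumed; the random policy is a placeholder agent).
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)

episode_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder: a random agent
    observation, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward            # rewards come from the environment
    done = terminated or truncated      # the episode ends here

print(f"Total reward for this episode: {episode_reward}")
env.close()
```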

For example, imagine that we want to teach a computer to play chess and win against human players. The labels (rewards) for the computer’s individual moves are not known until the end of the game, because during the game itself we don’t know whether a particular move will lead to winning or losing.

Furthermore, in the chess example, the input is the current configuration of the board, that is, the arrangement of the individual pieces. Given the enormous number of possible inputs (the states of the system), it is impossible to label each configuration or state as positive or negative in advance.

This is the essence of RL. In RL, we do not (and often cannot) teach an agent, whether a computer program or a robot, how to do things; we only specify what we want it to achieve. Then, based on the outcome of a particular trial, we determine rewards according to the agent’s success or failure.
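
To show how rewards alone can drive learning, here is a toy sketch of tabular Q-learning, one standard RL algorithm (this article itself names none). The agent walks a five-cell corridor and is rewarded only upon reaching the goal, mirroring the delayed rewards discussed above:

```python
# A toy Q-learning sketch (Q-learning is one common RL algorithm; the
# corridor environment is made up for illustration). The agent only
# receives a reward on the final step of each episode.
import numpy as np

n_states, n_actions = 5, 2           # positions 0..4; actions: 0=left, 1=right
goal = n_states - 1
Q = np.zeros((n_states, n_actions))  # value estimates, learned from interaction
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != goal:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == goal else 0.0  # reward only at the end
        # Q-learning update: bootstrap from the best action in the next state.
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state

print(Q)
```

After training, the learned Q-values favor moving right in every state, even though no individual move was ever labeled as correct; the agent inferred this from the delayed reward alone.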
