What is Reinforcement Learning?

Reinforcement Learning (RL) is a computational approach to understanding and automating goal-oriented learning and decision making. In the context of artificial intelligence, RL is a type of machine learning that trains an agent using a system of reward and punishment.

Reinforcement Learning uses a formal framework that defines the interaction between a learning agent and its environment in terms of states, actions, and rewards. The agent receives a reward for a desirable action and a penalty for an undesirable one, and its goal is to maximize the total reward it collects over time. Reinforcement learning problems therefore involve learning what to do, that is, how to map situations to actions so as to maximize a numerical reward signal.
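The interaction described above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than part of any library: the Corridor environment, its reward values (+1.0 at the goal, -0.1 per other step), and the two policies are hypothetical names chosen for the sketch.

```python
import random


class Corridor:
    """Hypothetical 1-D environment: states 0..4, with the goal at state 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (walls clamp the move)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # reward at the goal, small penalty for every other step
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent picks an action in this state
        state, reward, done = env.step(action)   # environment answers with state and reward
        total += reward
        if done:
            break
    return total


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([0, 1])


print("always right:", run_episode(Corridor(), always_right))
print("random      :", run_episode(Corridor(), random_policy))
```

The loop shows only the interaction itself: the agent sees a state, chooses an action, and receives a reward and the next state. No learning happens yet; a learning agent would use the rewards to improve its policy.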

How is RL different from other ML approaches?

RL is distinguished from other computational approaches by its emphasis on learning by an agent from direct interaction with its environment, without relying on exemplary supervision or complete models of the environment.

Reinforcement learning differs from supervised learning in that supervised training data comes with an answer key, so the model is trained on the correct answers themselves. In reinforcement learning there is no answer key; the agent decides what to do to perform the given task and, in the absence of a labeled training dataset, must learn from its own experience. Reinforcement learning is also different from unsupervised learning, which is typically about finding hidden structure in unlabelled data. RL tries to maximize a reward signal rather than to find hidden structure.

Reinforcement Learning is the first field to seriously address the computational issues that arise when learning from interaction with an environment in order to achieve long-term goals.

Elements of RL:

  • Agent: the learner that performs actions according to the policy it follows
  • Action: a move the agent makes in the environment
  • Reward: the feedback the agent receives from the environment for each action
  • Environment: the world in which the agent exists and acts
  • State: the situation the agent is currently in
  • Goal: to maximize the rewards the agent collects
  • Policy: a mapping from states of the environment to the actions to take in those states
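
To make these elements concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm (chosen here for illustration; the text above does not prescribe it). The five-state corridor, the reward values, and the hyperparameters are assumptions made for this example only.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (hypothetical environment)
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learn action values from reward alone, with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Policy: a mapping from each non-terminal state to the action with the highest value
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)}
print(policy)
```

Each element from the list appears here: step is the environment, the loop in train is the agent, q_table holds what it has learned about states and actions, and policy is the resulting state-to-action mapping.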

According to Richard Sutton: “I believe that in some sense reinforcement learning is the future of AI. Reinforcement learning is the best representative of the idea that an intelligent system must be able to learn on its own, without constant supervision. An AI has to be able to tell for itself if it is right or wrong. Only in this way can it scale to really large amounts of knowledge and general skill.”

Applications of RL:

Reinforcement Learning arose as a method for training artificial neural networks “by experience” rather than “by examples”. Applications of RL to high-dimensional control problems such as robotics have been the subject of much research, and RL can be used to build products for industrial automation.

Predictive Maintenance:

The main aim of predictive maintenance in industrial plants is the timely prediction of undesirable future operating regimes, based on the available history of on-line measurements of key variables. If an exact model of the plant were available, it could be used directly to predict the plant's future states and hence its upcoming working regime. However, identifying an adequate model of a real industrial plant is usually a hard task. Experienced plant operators are nevertheless able to predict future alarm situations from real-time measurements alone. Similarly, in RL terms, a well-trained critic can predict future rewards or losses from the available sensory measurements alone, without needing an adequate plant model.
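
The idea of a critic that predicts future losses without a plant model can be sketched with TD(0) value prediction, a standard model-free method (used here as an illustration, not as the method of any particular plant system). The wear states, the 0.3 degradation probability, and the failure cost of -10 are hypothetical; the simulator only stands in for real measurements, and the critic never reads its internals.

```python
import random

FAILURE = 3   # terminal state; states 0..2 are increasing levels of wear (hypothetical)
GAMMA = 0.9   # discount factor for future losses


def simulate_step(state, rng):
    """Stand-in for the real plant: wear advances with probability 0.3 (assumed)."""
    next_state = state + 1 if rng.random() < 0.3 else state
    reward = -10.0 if next_state == FAILURE else 0.0  # loss when the plant fails
    return next_state, reward


def train_critic(episodes=2000, alpha=0.05, seed=0):
    """TD(0): learn the expected discounted future loss of each state from samples only."""
    rng = random.Random(seed)
    value = [0.0] * (FAILURE + 1)  # value[FAILURE] stays 0: nothing follows a failure
    for _ in range(episodes):
        state = 0
        while state != FAILURE:
            next_state, reward = simulate_step(state, rng)
            td_target = reward + GAMMA * value[next_state]
            value[state] += alpha * (td_target - value[state])
            state = next_state
    return value


values = train_critic()
for wear_level in range(FAILURE):
    print(f"wear level {wear_level}: predicted future loss {values[wear_level]:.2f}")
```

The critic only ever sees observed transitions (state, reward, next state), yet its predictions become more negative as the wear level approaches failure, which is the kind of early warning the paragraph above describes.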

Dynamic Pricing:

Maximizing profit has been one of the main goals of every trader since the early days of commerce. However, pricing policies that customers perceive as unfair are among the most damaging perceptions a company can create, and may result in long-term losses. Reinforcement Learning can help balance the fairness of a deal against profit in order to cultivate customer trust. RL learns from recent experience, adapting the pricing policy to complex market environments, so it can balance fairness and profit in real time in a complex and fuzzy environment of price fluctuations, across a diverse range of financial products. RL suits this kind of problem because it learns by trial and error while interacting with the environment rather than from labeled data, i.e. no prior knowledge about how the environment works is necessary. With RL we can also have transparency on how prices are set and convey this measure of fairness to the customer, thereby improving trust in the service.
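
The trial-and-error aspect can be sketched with an epsilon-greedy bandit, one of the simplest RL-style learners. This is a deliberately reduced illustration: the candidate prices and purchase probabilities are made-up numbers, the reward is revenue only, and fairness is not modeled (it would have to be added to the reward signal).

```python
import random

PRICES = [5.0, 10.0, 15.0, 20.0]
# Hypothetical market behaviour, hidden from the learner: chance a customer buys at each price
BUY_PROBABILITY = [0.9, 0.7, 0.3, 0.1]


def customer_buys(price_index, rng):
    """Stand-in for the real market: the learner sees only the outcome."""
    return rng.random() < BUY_PROBABILITY[price_index]


def learn_prices(rounds=20000, epsilon=0.2, seed=0):
    """Estimate the average revenue of each price purely from observed sales."""
    rng = random.Random(seed)
    estimates = [0.0] * len(PRICES)
    counts = [0] * len(PRICES)
    for _ in range(rounds):
        if rng.random() < epsilon:
            i = rng.randrange(len(PRICES))                            # try a random price
        else:
            i = max(range(len(PRICES)), key=lambda k: estimates[k])   # use the best so far
        revenue = PRICES[i] if customer_buys(i, rng) else 0.0
        counts[i] += 1
        estimates[i] += (revenue - estimates[i]) / counts[i]          # running average
    return estimates


estimates = learn_prices()
best_price = PRICES[max(range(len(PRICES)), key=lambda k: estimates[k])]
print("estimated revenue per price:", [round(e, 2) for e in estimates])
print("best price found:", best_price)
```

The learner is never told the purchase probabilities; it discovers which price earns the most by offering prices and observing what happens, which is the contrast with labeled data drawn in the paragraph above.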

Resources for RL:

Frameworks and Packages

RL-Glue – “RL-Glue (Reinforcement Learning Glue) provides a standard interface that allows you to connect reinforcement learning agents, environments, and experiment programs together, even if they are written in different languages.”

Gym (OpenAI) – “Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.”

RL4J (DL4J) – “RL4J is a reinforcement learning framework integrated with deeplearning4j and released under an Apache 2.0 open-source license.”

TensorForce (Reinforce.io) – “A TensorFlow library for applied reinforcement learning.”