Even if you only sporadically follow the topic of AI and the methods of machine learning, I’m almost certain you’ve come across the term “reinforcement learning”. It is one of several families of machine learning methods, and what sets it apart is how it learns: instead of training on pre-labeled examples, the machine builds a model that delivers a ‘good’ outcome based on rewards and penalties it collects through trial and error. A great example of reinforcement learning is the model built by DeepMind. Over time, and with no prior knowledge other than that the objective was to increase the score, it learned to play the classic Atari video game Breakout. In this video game, you move a paddle to bounce a ball; whenever the ball hits a tile, the tile is destroyed and the player is rewarded with points.
In terms of reinforcement learning, what is a reward? It is whatever the model is built to optimize – in the case of Atari Breakout, scoring more points while playing the game.
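To make that concrete, here is a minimal sketch of what the reward signal looks like in a Breakout-style setting. The score trace is invented for illustration; the point is simply that the per-step reward is the change in game score, and the sum of those rewards is what the model tries to maximize.

```python
def step_rewards(score_history):
    """Turn a running game score into per-step rewards (score deltas)."""
    return [b - a for a, b in zip(score_history, score_history[1:])]

# Hypothetical score trace: tiles are worth points, most steps score nothing.
scores = [0, 0, 7, 7, 7, 14, 14]
rewards = step_rewards(scores)   # most steps give a reward of 0
total_return = sum(rewards)      # this total is what the model maximizes
```

Note that the agent is never told *how* to score – it only sees this stream of numbers and has to work out which of its actions produced them.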
Maybe you’re thinking that’s easy: you just need to learn to control the ball with the paddle. Unfortunately, it’s a little more complex. First, through trial and error, the model has to learn how to control the paddle at all. Then, as it moves the paddle around, the ball will occasionally hit it by accident, and the model learns that the ball hitting the paddle is sometimes a good thing – sometimes, because the ball may bounce off the paddle without hitting a tile, which means no instant reward. Next, depending on the ball’s position, the model has to learn to predict where to move the paddle in order to be rewarded more often, and so on and so forth.
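The “trial and error” part is usually implemented with a rule like epsilon-greedy: mostly pick the action that currently looks best, but sometimes act at random, so accidental successes (like the ball happening to hit the paddle) can be discovered in the first place. A minimal sketch, with invented value estimates for three paddle actions:

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: try a random action
    # exploit: take the action with the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)

rng = random.Random(0)
q = [0.1, 0.9, 0.3]  # hypothetical value estimates for LEFT / STAY / RIGHT
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
```

Early in training the value estimates are meaningless, so exploration dominates what the agent does; as the estimates improve, exploiting them pays off more and more.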
The difficulty is also that the reward doesn’t always immediately follow an action. It might take 10 seconds or 15 actions to receive the reward, because the ball won’t always hit a tile even when it hits the paddle. The model therefore has to be robust enough to develop some notion of strategy. This leads us to one of the hardest problems in applying machine learning: reinforcement learning needs a lot of training – several thousand games, just to learn to play Breakout. And this knowledge can’t be transferred to similar games, like Pong, another ancient classic. Each game must be learned entirely from scratch.
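The standard way RL handles such delayed rewards is discounting: the return credited to a step is its own reward plus the (exponentially shrunk) rewards that follow, so earlier actions still get some credit for setting a reward up. A small sketch of that calculation, with an invented reward sequence and an arbitrary discount factor:

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute the discounted return for every step, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # G_t = r_t + gamma * G_{t+1}
        returns[t] = running
    return returns

# Four steps of nothing, then a tile finally breaks: the earlier steps
# still receive (exponentially smaller) credit for that eventual reward.
rs = [0.0, 0.0, 0.0, 0.0, 7.0]
print(discounted_returns(rs))
```

The discount factor gamma controls how far that credit reaches back – set it close to 1 and the agent cares about long-term consequences, set it low and only near-immediate rewards matter.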
If you think about it within our domain of marketing information systems, there are lots of use cases where a system that optimizes against a reward and produces recommendations would be very desirable. Take, for instance, a system that recommends which product benefits should be promoted to optimize sales in a target consumer group. Imagine you have a certain set of criteria you want to focus on for your campaign: price, ingredients, packaging, brand image, etc. Now, envision a model that helps you select the right criteria, drawing on all the research you have and on every successful and unsuccessful concept test your marketing team has ever executed, to recommend the most promising course of action.
Compared with the Atari game, there is one major difference: there are most likely not enough training examples, and unfortunately there is also no simulator (a “game of marketing”, so to speak) to generate more. Reinforcement learning alone won’t work here without adding a special method like transfer learning. It’ll be crucial to enable the machine to transfer what it learned in one business situation to another – which is essentially what we humans call “experience”.
At Market Logic, we are currently working on exactly that: enabling our marketing assistant to learn how to make such predictions based on real feedback from the business, and then to transfer the learned models from one “situation” to another in order to provide better and faster recommendations.