Alpha Zero and Reinforcement Learning: What Does This Mean for Marketing?

Once the stuff of science fiction, artificial intelligence is becoming a part of everyday life, advancing more rapidly than experts predicted just a few years ago. For example, in 2015, an AI program called AlphaGo beat a professional human player at Go, an ancient and extremely sophisticated board game that is more complex than chess and hugely popular in Asia and among gaming circles. The following year, the same program beat an 18-time world champion in an event heralded as being a decade ahead of its time.

However, the next iteration of AlphaGo, AlphaGo Zero (referred to here as Alpha Zero), not only surpassed human players but also beat AlphaGo itself 100 games to 0. The key advance from AlphaGo to Alpha Zero was that it learned without human game data: instead of studying expert matches, it trained entirely through reinforcement learning by playing against itself. Reinforcement learning allows machines and software agents to determine the ideal behavior in a specific context automatically, through trial and error. When the algorithmic agent reaches a desirable outcome, it receives a reward known as the reinforcement signal. By repeating this loop of action and reward, a reinforcement learning system continually improves itself without human intervention.
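To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent. It is far simpler than the self-play system behind Alpha Zero and is not that algorithm; it only illustrates how a reward signal can steer an agent toward better actions. The function name, the success probabilities, and the parameter values are all assumptions chosen for illustration.

```python
import random


def run_trial_and_error(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: learns by trial and error which action pays off most."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action has been tried
    values = [0.0] * n_actions  # running average reward observed per action

    for _ in range(steps):
        # Occasionally explore a random action; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment answers with the reinforcement signal (1 = success, 0 = failure).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental mean: nudge the estimate for this action toward the new reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


# Hypothetical success probabilities for three actions; the agent never sees these directly.
values, counts = run_trial_and_error([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("Estimated value per action:", [round(v, 2) for v in values])
print("Action the agent prefers:", best_action)
```

The agent is never told which action is best. It discovers this from rewards alone, which is the core idea the rest of this article builds on.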

Alpha Zero is built on a general-purpose algorithm that has produced superhuman AI agents for complex, strategic environments such as Go and chess. For the Alpha Zero algorithm to function, its environment must meet three criteria: it must be discrete (a finite set of possible moves), deterministic (each move has a predictable result), and offer perfect information (nothing about the current state is hidden).

What does reinforcement learning mean for marketing?

One of the primary goals of targeted marketing is determining the ideal offer value. What offer, at the lowest cost to the company, is enough to sway a consumer to make a purchase? Which combination of offer value and redemption rate maximizes profit while ensuring that purchases are not being unnecessarily subsidized?
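One simple way to frame this trade-off is as expected profit per shopper: the chance that the shopper buys at a given offer, multiplied by the margin left after the discount. The sketch below works through that arithmetic. The margin, the offer values, and the purchase rates are hypothetical numbers invented for illustration, not RevTrax data, and the formula is a simplifying assumption rather than a description of any production model.

```python
def expected_profit(margin, offer_value, purchase_rate):
    """Expected profit per shopper: chance of purchase times margin left after the offer."""
    return purchase_rate * (margin - offer_value)


# Hypothetical inputs: a $10.00 margin and an assumed purchase rate for each offer value.
MARGIN = 10.00
purchase_rate_by_offer = {
    0.00: 0.05,  # no offer: some shoppers buy anyway
    1.00: 0.08,
    2.00: 0.12,
    3.00: 0.15,
    5.00: 0.18,  # a deep discount wins more purchases but leaves little margin
}

profits = {
    offer: expected_profit(MARGIN, offer, rate)
    for offer, rate in purchase_rate_by_offer.items()
}
best_offer = max(profits, key=profits.get)

for offer, profit in profits.items():
    print(f"offer ${offer:.2f} -> expected profit ${profit:.2f} per shopper")
print(f"best offer under these assumptions: ${best_offer:.2f}")
```

With these made-up numbers the $3.00 offer comes out ahead at about $1.05 per shopper, while the $5.00 offer converts more shoppers but earns less. In practice the purchase rates are unknown, which is exactly what a learning system would need to estimate.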

The existing method of determining which digital offer to use, which requires a client to decide on a business rule, is problematic. Not only is it difficult to calculate the best data flow, but offers are also hard to individualize, cannot be adjusted in real time, and go untracked, so it is impossible to accurately measure the value and ROI of the effort.

A company that can accurately determine the ideal offer value for each shopper would have an enormous competitive advantage: it could maximize value and measurable ROI, reduce waste and overall spend, and appeal to individual shoppers with real-time results.

Using the principles of reinforcement learning, RevTrax is currently developing a solution that acts as an offer-serving agent, distributing digital promotions to individual shoppers at the ideal offer level to maximize profit for the client. RevTrax presents the application with a discrete, deterministic, perfect-information environment and a single goal: to determine the ideal offer for the client.

An AI-powered promotions application like this would put reinforcement learning to work in an innovative marketing setting. AI agents would study shopper characteristics, discern patterns in behavior, and use these to determine the ideal sales price for that shopper or group of shoppers. With no human input, this solution would be able to use existing offer parameters to designate the best offer for both the consumer and the company.

When an offer is redeemed, data would be sent back to the RevTrax AI solution, which would incorporate the results into rich data analytics for even greater precision on future ideal offers.
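The serve-then-learn loop described here can be sketched in a few lines. The example below uses Thompson sampling, a standard technique for this kind of choice among a fixed set of options. It was picked for illustration only and is not a description of how the RevTrax solution works. The class name `OfferAgent`, the margin, the offer values, and the simulated redemption rates are all hypothetical.

```python
import random


class OfferAgent:
    """Thompson-sampling sketch: serves an offer, then learns from redemption feedback."""

    def __init__(self, offers, margin, seed=None):
        self.offers = list(offers)
        self.margin = margin
        self.rng = random.Random(seed)
        # Beta(1, 1) prior per offer, stored as [redeemed + 1, not redeemed + 1].
        self.stats = {offer: [1, 1] for offer in self.offers}

    def choose_offer(self):
        # Draw a plausible redemption rate for each offer, then pick the highest sampled profit.
        def sampled_profit(offer):
            alpha, beta = self.stats[offer]
            return self.rng.betavariate(alpha, beta) * (self.margin - offer)

        return max(self.offers, key=sampled_profit)

    def record_result(self, offer, redeemed):
        # Feedback step: fold the observed outcome into that offer's counts.
        self.stats[offer][0 if redeemed else 1] += 1

    def estimated_rate(self, offer):
        alpha, beta = self.stats[offer]
        return alpha / (alpha + beta)


# Simulated shoppers with hypothetical redemption rates that the agent cannot see.
true_rates = {1.00: 0.08, 2.00: 0.12, 3.00: 0.15}
agent = OfferAgent(true_rates, margin=10.00, seed=42)
shoppers = random.Random(7)

for _ in range(5000):
    offer = agent.choose_offer()
    redeemed = shoppers.random() < true_rates[offer]
    agent.record_result(offer, redeemed)

for offer in agent.offers:
    print(f"offer ${offer:.2f}: estimated redemption rate {agent.estimated_rate(offer):.3f}")
```

Each redemption, or lack of one, sharpens the agent's estimate for that offer, so later choices lean toward the offers that have earned the most so far while still testing the alternatives now and then.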

In its early application, an AI-powered offer solution would easily determine the best offer from a defined set of available options. However, in future iterations, additional techniques such as network graph theory could be applied to select the best value from a range of potential offers.

By leveraging AI and machine learning, RevTrax is able to bring reinforcement learning to incentive technology. As this tool is further developed, RevTrax can develop business rules and train algorithmic agents to successfully predict the optimal offer level and redemption rate to maximize company profit. When the solution is fully optimized, RevTrax clients will be able to reap the benefits of AI and machine learning in their marketing promotions.



