RLvSL: Reinforcement Learning via Supervised Learning
Contract Details
Programme Type: SIXTH FRAMEWORK PROGRAMME
Programme Acronym: FP6
Contract Type: MARIE CURIE ACTIONS-INTERNATIONAL RE-INTEGRATION GRANTS
Contract No: MIRG-CT-2006-044980
Start Date: 2006-12-01
End Date: 2008-12-01
Project Status: completed
Budget for TUC: 80,000 euros
Role for TUC: prime
Principal Investigator for TUC: Michail G. Lagoudakis
Project Description
The field of machine learning develops learning paradigms and algorithms that allow systems to learn some desired functionality on their own. Supervised learning is learning with a teacher: some authoritative source provides a finite set of correct examples, and the learner generalises from those examples to a correct function over the entire input space. An example from human learning would be learning correct spelling by observing correctly-spelled words. Supervised learning focuses on two static learning problems: classification, where the learner induces a correct assignment of inputs to one of several classes, and regression, where the learner infers the correct values of a numerical function over its entire domain. In both cases, learning is based on a limited, finite set of correct classification or regression training examples.
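
As a concrete (if toy) illustration of the two problems, the following Python sketch trains a classifier and a regressor on a handful of examples and generalises to unseen inputs. scikit-learn and the decision-tree learners are arbitrary assumptions made for the sake of the example, not tools prescribed by the project.

```python
# A minimal sketch of the two static supervised learning problems.
# scikit-learn and decision trees are arbitrary choices for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[0.0], [0.2], [0.8], [1.0]])       # training inputs

# Classification: induce an assignment of inputs to discrete classes.
y_class = np.array([0, 0, 1, 1])                 # class labels
clf = DecisionTreeClassifier().fit(X, y_class)
print(clf.predict([[0.1], [0.9]]))               # generalises: [0 1]

# Regression: infer the values of a numerical function over its domain.
y_num = np.array([0.05, 0.21, 0.79, 0.98])       # noisy samples of f(x) = x
reg = DecisionTreeRegressor().fit(X, y_num)
print(reg.predict([[0.15]]))                     # predicts a value near 0.2
```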

Reinforcement learning, on the other hand, is learning by trial and error; there is no teacher, and the learner interacts directly with its environment to acquire information. The learner makes decisions arbitrarily and occasionally receives a numerical score (reinforcement signal) for its overall behaviour. This score does not indicate correct or incorrect actions, but it can be used to reinforce good decision making and discourage bad decision making. An example from human learning would be the process of learning how to balance and ride a bicycle (falls incur negative scores). Reinforcement learning focuses on two interactive learning problems within the scope of decision making: prediction, where the learner estimates the quality of a fixed control policy, and control, where the learner infers a good control policy. In both cases, learning is based on training data collected through interaction between the learning agent and its environment.
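
To make the control problem concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain. The environment, the parameters, and the choice of Q-learning itself are illustrative assumptions, not systems or algorithms specific to the project.

```python
# Trial-and-error learning: tabular Q-learning on a toy 5-state chain.
# Reward (the reinforcement signal) is given only for reaching the end.
import random

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.95, 0.1     # step size, discount, exploration

def step(s, a):
    """Environment: move left or right along the chain."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    for _ in range(20):
        # Mostly exploit the current estimates, occasionally explore.
        a = (random.randrange(n_actions) if random.random() < epsilon
             else max(range(n_actions), key=lambda i: Q[s][i]))
        s2, r = step(s, a)
        # Temporal-difference update of the action-value estimate.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Learned policy: expected to choose "right" (1) in every state.
print([max(range(n_actions), key=lambda i: Q[s][i]) for s in range(n_states)])
```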

These two learning paradigms have been researched mostly independently. Recent advances in supervised learning have demonstrated outstanding, near-optimal generalisation performance, whereas reinforcement learning has not reached the same level of applicability to real-world problems. This project investigates the potential of using supervised learning technology to advance reinforcement learning. Preliminary results have shown that it is possible to incorporate supervised learning algorithms within the inner loops of several reinforcement learning algorithms and thereby reduce one problem to the other. This synergy opens the door to a variety of promising combinations. The proposed research will establish the criteria under which this reduction is possible, investigate viable combinations, propose novel algorithms, assess their potential, and apply them to real problems of practical interest to demonstrate their effectiveness.
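
The sketch below shows, in hedged form, what such a reduction can look like: in the spirit of the classifier-based policy iteration studied in the related publications listed further down (but simplified, and not the project's algorithm verbatim), states are labelled with the action that wins Monte Carlo rollouts under the current policy, and a standard classifier is trained on those labels to represent the improved policy. The one-dimensional environment, the SVM choice, and all parameters are made-up placeholders.

```python
# Sketch: reducing policy improvement (RL) to classification (SL).
# Environment, parameters, and the SVM are illustrative assumptions.
import random
import numpy as np
from sklearn.svm import SVC

GAMMA, HORIZON, ROLLOUTS = 0.95, 20, 5
ACTIONS = [0, 1]                                   # move down / move up

def env_step(s, a):
    """Toy 1-D environment: reward is highest near s = 1.0."""
    s2 = s + (0.05 if a == 1 else -0.05) + random.gauss(0.0, 0.01)
    return s2, -abs(1.0 - s2)

def q_rollout(s, a, policy):
    """Monte Carlo estimate of Q(s, a): take a, then follow the policy."""
    total = 0.0
    for _ in range(ROLLOUTS):
        x, r = env_step(s, a)
        ret, disc = r, 1.0
        for _ in range(HORIZON):
            disc *= GAMMA
            x, r = env_step(x, policy(x))
            ret += disc * r
        total += ret
    return total / ROLLOUTS

policy = lambda s: random.choice(ACTIONS)          # initial arbitrary policy
for _ in range(2):                                 # two policy-iteration steps
    states = [random.uniform(0.0, 2.0) for _ in range(50)]
    # The supervised training set: each state labelled with its best action.
    labels = [max(ACTIONS, key=lambda a: q_rollout(s, a, policy))
              for s in states]
    clf = SVC().fit(np.array(states).reshape(-1, 1), labels)
    policy = lambda s, c=clf: int(c.predict([[s]])[0])

print(policy(0.5), policy(1.5))                    # expect: 1 0
```

The point of the sketch is that the inner loop is an ordinary classification problem, so any improvement in the classifier carries over directly to the quality of the learned policy.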

The technological potential of the proposed research is enormous. Two respected learning paradigms are joined, allowing a direct transfer of knowledge between them: advances in supervised learning will immediately yield similar advances in reinforcement learning. In broader terms, this can be seen as a major step forward. Research nowadays has become so specialised that innovation in one field rarely finds its way into another; researchers are therefore doomed to "reinventing the wheel" whenever the need arises, instead of drawing on solutions already invented by colleagues in a related field. The proposed research demonstrates how researchers can benefit each other by building bridges across disciplines.

Reinforcement learning is by nature interdisciplinary, in the sense that it finds applications in robotics, automatic control, combinatorial optimization, networking, signal processing, dialogue management, and numerous other fields. Advances in reinforcement learning can only widen this breadth of applications and strengthen the ties between different fields. The proposed research is expected to enhance reinforcement learning in novel ways and therefore enable further interdisciplinarity.

Project Coordination and Principal Investigator: Michail G. Lagoudakis

Official Project Web Site: http://www.lagoudakis.gr/RLvSL

Related Publications
  • Rexakis I., Lagoudakis M.: Directed Exploration of Policy Space using Support Vector Classifiers, Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011), Paris, France, April 2011, pp. 112-119.
    Publication Type: Conference Publications
  • Rachelson E., Lagoudakis M.: On the Locality of Action Domination in Sequential Decision Making, Proceedings of the 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, FL, USA, January 2010.
    Publication Type: Conference Publications
  • Pazis J., Lagoudakis M.: Binary Action Search for Learning Continuous-Action Control Policies, Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, Quebec, Canada, June 2009, pp. 793-800.
    [video of the presentation at ICML]
    Publication Type: Conference Publications
  • Pazis J., Lagoudakis M.: Learning Continuous-Action Control Policies, Proceedings of the 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Nashville, TN, USA, March 2009, pp. 169-176.
    Publication Type: Conference Publications
  • Rexakis I., Lagoudakis M.: Classifier-Based Policy Representation, Proceedings of the 2008 IEEE International Conference on Machine Learning and Applications (ICMLA'08), San Diego, CA, USA, December 2008, pp. 91-98.
    Publication Type: Conference Publications
  • Dimitrakakis C., Lagoudakis M.: Rollout Sampling Approximate Policy Iteration, Machine Learning 72 (3), 2008, pp. 157-171.
    Publication Type: Journal Publications
  • Dimitrakakis C., Lagoudakis M.: Rollout Sampling Approximate Policy Iteration (Extended Abstract), Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2008), Antwerp, Belgium, September 2008, p. 7.
    Publication Type: Workshop Proceedings
  • Dimitrakakis C., Lagoudakis M.: Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration, Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL'08), Lille, France, June 2008, pp. 27-40.
    Publication Type: Conference Publications