InteLLigence
 
RLvSL: Reinforcement Learning via Supervised Learning
Contract Information
Programme Type: SIXTH FRAMEWORK PROGRAMME
Programme Acronym: SIXTH FRAMEWORK PROGRAMME
Contract Type: MARIE CURIE ACTIONS-INTERNATIONAL RE-INTEGRATION GRANTS
Contract Number: MIRG-CT-2006-044980
Start Date: 2006-12-01
End Date: 2008-12-01
Project Status: completed
Laboratory Budget: 80,000 euros
Laboratory Role: coordinator
Laboratory Principal Investigator: Michail G. Lagoudakis
Project Description
The field of machine learning develops learning paradigms and algorithms that allow systems to learn some desired functionality on their own. Supervised learning is learning with a teacher: some authoritative source provides a finite set of correct examples, and the learner generalises from these examples to a correct function over the entire input space. An example from human learning would be learning correct spelling by observing correctly spelled words. Supervised learning focuses on two static learning problems: classification, where the learner induces a correct assignment of inputs to one of many classes, and regression, where the learner infers the correct values of a numerical function over its entire domain. In both cases, learning is based on a limited, finite set of correct classification or regression training examples.
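
As a rough illustration of the two supervised problems described above, the following Python sketch learns a classifier and a regression function from a handful of correct examples and then queries both on inputs that were not in the training set. The data, the function names, and the choice of learners (1-nearest-neighbour and a least-squares line) are assumptions made for this example only; they are not taken from the project.

```python
# Illustrative sketch only: data, names, and learners are hypothetical.

def nearest_neighbour_classifier(examples):
    """Classification: label a new input with the label of the closest
    training example (1-nearest-neighbour on a single numeric feature)."""
    def classify(x):
        _, label = min(examples, key=lambda ex: abs(ex[0] - x))
        return label
    return classify


def least_squares_line(examples):
    """Regression: fit y = a * x + b to (x, y) pairs by ordinary least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# A finite set of correct examples supplied by the "teacher".
labelled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
classify = nearest_neighbour_classifier(labelled)

measured = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on the line y = 2x + 1
predict = least_squares_line(measured)

# Both learners are queried on inputs they never saw during training.
print(classify(3.0), classify(7.5))
print(predict(10.0))
```

In both cases the learner sees only a few examples, yet answers queries anywhere in the input domain, which is the generalisation step the paragraph refers to.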

Reinforcement learning, on the other hand, is learning by trial and error; there is no teacher, and the learner interacts directly with its environment to acquire information. The learner makes decisions arbitrarily and occasionally receives a numerical score (reinforcement signal) for its overall behaviour. This score does not indicate correct or incorrect actions, but it can be used to reinforce good decision making and discourage bad decision making. An example from human learning would be the process of learning how to balance and ride a bicycle (falls incur negative scores). Reinforcement learning focuses on two interactive learning problems within the scope of decision making: prediction, where the learner estimates the quality of a fixed control policy, and control, where the learner infers a good control policy. In both cases, learning is based on training data collected through interaction between the learning agent and its environment.
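
The prediction and control problems can be sketched on a toy task. The corridor environment below, its rewards, the discount factor, and all function names are hypothetical choices made for illustration; Monte Carlo evaluation and tabular Q-learning are standard textbook methods used here as stand-ins, not the algorithms developed in the project.

```python
import random

GAMMA = 0.9
ACTIONS = (-1, +1)            # move left or right along a corridor of states 0..4
START, FALL, GOAL = 2, 0, 4   # hypothetical layout


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == FALL:
        return nxt, -1.0, True   # a "fall" incurs a negative score
    return nxt, 0.0, False


def rollout_return(policy, rng):
    """Discounted return of one episode that follows `policy` from START."""
    state, total, discount, done = START, 0.0, 1.0, False
    while not done:
        state, reward, done = step(state, policy(state, rng))
        total += discount * reward
        discount *= GAMMA
    return total


def evaluate(policy, episodes, rng):
    """Prediction: Monte Carlo estimate of a fixed policy's value at START."""
    return sum(rollout_return(policy, rng) for _ in range(episodes)) / episodes


def q_learning(episodes, rng, alpha=0.1):
    """Control: tabular Q-learning driven by uniformly random actions."""
    q = {(s, a): 0.0 for s in range(1, 4) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


rng = random.Random(0)   # seeded so that runs are repeatable


def always_right(state, rng):
    return +1


def random_policy(state, rng):
    return rng.choice(ACTIONS)


value_right = evaluate(always_right, 10, rng)
value_random = evaluate(random_policy, 5000, rng)
q = q_learning(2000, rng)
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, 4)}
print(value_right, value_random, greedy)
```

Note that the learner is never told which action is correct in any state; it only observes scores at the end of each episode, and the greedy policy is read off the learned action values.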

These two learning paradigms have been researched mostly independently. Recent advances in supervised learning have demonstrated outstanding, near-optimal generalisation performance, whereas reinforcement learning has not reached the same level of applicability to real-world problems. This research proposal investigates the potential of using supervised learning technology to advance reinforcement learning. Preliminary results have shown that it is possible to incorporate supervised learning algorithms within the inner loops of several reinforcement learning algorithms and thereby reduce one problem to the other. This synergy opens the door to a variety of promising combinations. The proposed research will establish the criteria under which this reduction is possible, investigate viable combinations, propose novel algorithms, assess their potential, and apply them to real problems of practical interest to demonstrate their effectiveness.
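
One way to picture a supervised learner sitting in the inner loop of a reinforcement learning algorithm is rollout-based policy iteration with a classifier as the policy. The sketch below is a minimal illustration under stated assumptions: the corridor environment, the sampled training states, the horizon, and the 1-nearest-neighbour classifier are all hypothetical, and the code is not claimed to reproduce any specific algorithm from the project.

```python
GAMMA = 0.9
ACTIONS = (-1, +1)
LEFT_EDGE, TARGET, RIGHT_EDGE = 0, 5, 10   # hypothetical corridor layout


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = state + action
    if nxt == TARGET:
        return nxt, 1.0, True
    if nxt in (LEFT_EDGE, RIGHT_EDGE):
        return nxt, -1.0, True
    return nxt, 0.0, False


def rollout_q(state, action, policy, horizon=50):
    """Estimate Q(state, action): take `action` once, then follow `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward, done = step(state, action)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        action = policy(state)
    return total


def train_classifier(examples):
    """Supervised learner (1-nearest-neighbour) mapping states to actions."""
    def policy(state):
        _, action = min(examples, key=lambda ex: abs(ex[0] - state))
        return action
    return policy


def classification_policy_iteration(train_states, policy, max_iterations=20):
    """Alternate rollout-based labelling with classifier training."""
    for _ in range(max_iterations):
        # Label each sampled state with the action whose rollout looks best.
        examples = [
            (s, max(ACTIONS, key=lambda a: rollout_q(s, a, policy)))
            for s in train_states
        ]
        if all(policy(s) == a for s, a in examples):
            break   # the policy already agrees with its own improvement
        policy = train_classifier(examples)   # the supervised inner step
    return policy


def always_left(state):
    return -1


# Start from a poor policy and train on four sampled states only.
learned = classification_policy_iteration([1, 3, 7, 9], always_left)
print({s: learned(s) for s in range(1, 10) if s != TARGET})
```

The classifier is trained on states 1, 3, 7, and 9 only, so the actions it returns for states 2, 4, 6, and 8 come from generalisation. In a design of this kind, a better supervised learner can be substituted for `train_classifier` without touching the rest of the loop.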

The technological potential of the proposed research is enormous. Two respected learning paradigms are joined together, allowing a direct transfer of knowledge between them. Advances in supervised learning will immediately yield similar advances in reinforcement learning. This can be seen as a major breakthrough in wider terms. Research nowadays has become so specialised that innovation in one field rarely finds its way into another field and becomes useful there. As a result, researchers are doomed to "reinventing the wheel" whenever needs arise, instead of drawing on solutions already invented by their colleagues in a related field. The proposed research demonstrates how researchers can benefit each other by building bridges across disciplines.

Reinforcement learning is by nature interdisciplinary, in the sense that it finds applications in robotics, automatic control, combinatorial optimization, networking, signal processing, dialogue management, and numerous other fields. Advances in reinforcement learning can only widen the breadth of applications and strengthen the ties between different fields. The proposed research is expected to enhance reinforcement learning in previously unknown ways and therefore allow further interdisciplinarity.

Project Coordination and Principal Investigator: Michail G. Lagoudakis

Official Project Website: http://www.lagoudakis.gr/RLvSL

Related Publications
  • Rexakis I., Lagoudakis M.: Directed Exploration of Policy Space using Support Vector Classifiers, Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2011), Paris, France, April 2011, pp. 112-119.
    Publication Type: Conference Papers
  • Rachelson E., Lagoudakis M.: On the Locality of Action Domination in Sequential Decision Making, Proceedings of the 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Ft. Lauderdale, FL, USA, January 2010.
    Publication Type: Conference Papers
  • Pazis J., Lagoudakis M.: Binary Action Search for Learning Continuous-Action Control Policies, Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, Quebec, Canada, June 2009, pp. 793-800.
    Publication Type: Conference Papers
  • Pazis J., Lagoudakis M.: Learning Continuous-Action Control Policies, Proceedings of the 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Nashville, TN, USA, March 2009, pp. 169-176.
    Publication Type: Conference Papers
  • Rexakis I., Lagoudakis M.: Classifier-Based Policy Representation, Proceedings of the 2008 IEEE International Conference on Machine Learning and Applications (ICMLA'08), San Diego, CA, USA, December 2008, pp. 91-98.
    Publication Type: Conference Papers
  • Dimitrakakis C., Lagoudakis M.: Rollout Sampling Approximate Policy Iteration, (Extended Abstract) Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2008), Antwerp, Belgium, September 2008, p. 7.
    Publication Type: Symposium Papers
  • Dimitrakakis C., Lagoudakis M.: Rollout Sampling Approximate Policy Iteration, Machine Learning 72 (3), 2008, pp. 157-171.
    Publication Type: Journal Articles
  • Dimitrakakis C., Lagoudakis M.: Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration, Proceedings of the 8th European Workshop on Reinforcement Learning (EWRL'08), Lille, France, June 2008, pp. 27-40.
    Publication Type: Conference Papers