Deep Reinforcement Learning


This course is taken almost verbatim from CS 294-112: Deep Reinforcement Learning, Sergey Levine's course at UC Berkeley. We follow his course's formulation and selection of papers, with his permission. This is a section of CS 6101, Exploration of Computer Science Research, at NUS. CS 6101 is a 4-modular-credit, pass/fail module for new incoming graduate programme students to obtain background in an area with an instructor's support. It is designed as a "lab rotation" to familiarize students with the methods and ways of research in a particular research area.


24 students in 11 teams

Guest Registration

1378 guests registered



RL for self-driving cars with the CARLA simulator


A curriculum-based approach to learning a universal policy

Due to a mismatch between simulation and real-world domains (commonly termed the 'reality gap'), robotic agents trained in simulation often fail to perform successfully when transferred to the real world. This problem may be addressed through domain randomisation, in which a single universal policy is trained to adapt to different environmental dynamics. However, existing methods do not prescribe a priori the order in which tasks are presented to the learner. We present a curriculum-based approach where the difficulty level of training corresponds to the number of variables that are randomised. Experiments are conducted on the inverted pendulum, a classic control problem.
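The curriculum idea above can be sketched as follows: difficulty level k means the first k environment parameters are randomised while the rest stay at their defaults. The parameter names and ranges below are illustrative placeholders, not the project's actual values.

```python
import random

# Hypothetical pendulum parameters and randomisation ranges (illustrative only).
PARAM_RANGES = {
    "mass":    (0.5, 2.0),
    "length":  (0.5, 1.5),
    "damping": (0.0, 0.2),
}
DEFAULTS = {"mass": 1.0, "length": 1.0, "damping": 0.05}

def sample_env_params(difficulty, rng=random):
    """Randomise the first `difficulty` parameters; keep the rest at defaults."""
    params = dict(DEFAULTS)
    for name in list(PARAM_RANGES)[:difficulty]:
        lo, hi = PARAM_RANGES[name]
        params[name] = rng.uniform(lo, hi)
    return params

# Curriculum: raise difficulty from 0 (no randomisation) to all parameters.
for difficulty in range(len(PARAM_RANGES) + 1):
    params = sample_env_params(difficulty)
    # ... train the universal policy on environments built from `params` ...
```

Each stage trains on the distribution defined by the current difficulty before the next parameter is added to the randomised set.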



Stock Price Prediction with RL

Using reinforcement learning to predict stock price movements



Solving TSP and Knapsack problems with RL



Inverse reinforcement learning for NLP



Atari MCTS simulation


Deep Q Network Flappy Bird

An implementation of a Deep Q-Network (DQN) agent that learns to play Flappy Bird.
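The core of any DQN implementation is the Bellman target used to train the Q-network; a minimal sketch (with a toy batch, not the project's code) might look like this:

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets r + gamma * max_a' Q(s', a'), zeroed at episode end."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Toy batch of 2 transitions with 2 actions (e.g. flap / no-op in Flappy Bird).
rewards = np.array([1.0, -1.0])
next_q  = np.array([[0.5, 2.0], [0.0, 0.0]])
dones   = np.array([0.0, 1.0])
targets = dqn_targets(rewards, next_q, dones)  # [1 + 0.99*2, -1] = [2.98, -1.0]
```

The Q-network is then regressed toward these targets, typically with the `next_q` values coming from a periodically updated target network.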


Imitation Learning

This project explores reinforcement learning by implementing and deploying imitation learning algorithms such as direct behavior cloning and DAgger.
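The distinction between the two algorithms can be sketched in a few lines: behavior cloning fits a policy once on expert data, while DAgger repeatedly runs the learner, asks the expert to relabel the states the learner visits, and refits on the aggregated dataset. Everything below (the expert, the "classifier", the rollout) is a toy stand-in, not the project's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(state):
    """Stand-in expert: act 1 when the first state feature is positive."""
    return int(state[0] > 0)

def fit_classifier(states, actions):
    """Toy behaviour-cloning 'fit': a threshold between the labelled classes.
    A real implementation would train a neural network here."""
    pos = [s[0] for s, a in zip(states, actions) if a == 1]
    neg = [s[0] for s, a in zip(states, actions) if a == 0]
    thr = (min(pos) + max(neg)) / 2 if pos and neg else 0.0
    return lambda s: int(s[0] > thr)

def rollout(policy, n=20):
    """Visit states under the current policy (environment mocked as noise)."""
    return [rng.normal(size=2) for _ in range(n)]

# Behavior cloning: fit once on expert-labelled data.
states = rollout(lambda s: 0)
actions = [expert_policy(s) for s in states]
policy = fit_classifier(states, actions)

# DAgger: aggregate expert labels on states the *learner* visits, then refit.
for _ in range(3):
    new_states = rollout(policy)
    states += new_states
    actions += [expert_policy(s) for s in new_states]
    policy = fit_classifier(states, actions)
```

The key point DAgger addresses is distribution shift: the learner is trained on states from its own rollouts rather than only on the expert's trajectories.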


Model-based Reinforcement Learning

Model-based reinforcement learning refers to learning a model of the environment by taking actions and observing the results, including the next state and the immediate reward, and using that model to indirectly learn the optimal behavior. The model predicts the outcome of an action and is used to replace or supplement interaction with the environment when learning the optimal policy. Model-based reinforcement learning consists of two main parts: learning a dynamics model, and using a controller to plan and execute actions that minimize a cost function. In this project, we explore model-based reinforcement learning in terms of dynamics models and control logic.
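The two parts can be sketched with a random-shooting controller, one common choice of control logic in model-based RL: sample candidate action sequences, roll each through the learned dynamics model, and execute the first action of the cheapest sequence (MPC-style). The dynamics and cost functions below are illustrative stand-ins, not the project's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_dynamics(state, action):
    """Stand-in for a learned model f(s, a) -> s' (here: simple linear dynamics)."""
    return state + 0.1 * action

def cost(state, action):
    """Cost to minimise: distance to the origin plus a small control penalty."""
    return float(np.sum(state**2) + 0.01 * np.sum(action**2))

def random_shooting(state, horizon=5, n_candidates=100):
    """Sample action sequences, evaluate them through the model,
    and return the first action of the best sequence."""
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1, 1, size=(horizon, state.shape[0]))
        s, total = state.copy(), 0.0
        for a in seq:
            total += cost(s, a)
            s = learned_dynamics(s, a)
        if total < best_cost:
            best_cost, best_action = total, seq[0]
    return best_action

a = random_shooting(np.array([1.0, -1.0]))
```

After executing the chosen action, the controller replans from the new state, and the observed transition can be added to the dataset used to refit the dynamics model.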



Policy Gradients for Reinforcement Learning tasks

Using policy gradient algorithms to solve RL tasks such as CartPole and LunarLander, with a neural-network baseline implementation.
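The role of the baseline can be sketched compactly: the REINFORCE gradient weights each step's score-function term by the return minus a baseline, which reduces variance without introducing bias. This is a generic sketch of the estimator, not the project's network code.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_{k>=t} gamma^(k-t) * r_k."""
    G, out = 0.0, np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out

def reinforce_gradient(log_prob_grads, returns, baselines):
    """REINFORCE with baseline: mean over steps of
    grad_theta log pi(a_t|s_t) * (G_t - b(s_t)).
    `baselines` would come from the neural-network value estimate."""
    advantages = returns - baselines
    return np.mean(log_prob_grads * advantages[:, None], axis=0)
```

With a perfect baseline (b equal to the return), the advantage, and hence the gradient estimate, goes to zero, which is exactly the variance-reduction effect the baseline network is trained for.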



Reinforcement Learning in Pong Game

This project implements reinforcement learning for Pong, one of the classic Atari games, using OpenAI Gym as the environment.