Machine Learning


This module introduces basic concepts and algorithms in machine learning and neural networks. The main reason for studying computational learning is to make better use of powerful computers to learn knowledge (or regularities) from raw data. The ultimate objective is to build self-learning systems that relieve humans of some of their already-too-many programming tasks. At the end of the course, students are expected to be familiar with the theories and paradigms of computational learning, and to be capable of implementing basic learning systems.


160 students in 27 teams


Generating Word Embeddings for Singlish to be used in Sentiment Analysis

"Hi, how are you?" This sentence may be simple for us to understand, yet it is incomprehensible to a computer. Our project explores how we can model the everyday language of Singapore (Singlish) using a mathematical representation. One common way to represent words is to use vectors that capture their semantic meaning; such a vector representation is known as a word embedding. After modelling the Singlish words, we performed an extrinsic evaluation of the embeddings by running sentiment analysis on sentences, to see how well the embeddings capture the semantic meaning of the Singlish words.

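As a rough illustration of how word vectors can capture semantic meaning, the sketch below compares words by the cosine similarity of their embeddings. The three-dimensional vectors and the example words are made up for illustration; they are not taken from the project's trained embeddings, and the helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: real embeddings would be learned
# from a Singlish corpus and have far more dimensions).
toy_embeddings = {
    "shiok": [0.9, 0.8, 0.1],
    "good": [0.8, 0.9, 0.2],
    "sian": [-0.7, -0.6, 0.3],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


nearest = most_similar("shiok", toy_embeddings)
```

With these toy vectors, words pointing in a similar direction score close to 1, while words with opposite meaning score below 0. A downstream sentiment classifier can then use the vectors as input features.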

Real-world Image Recognition for Multiple Human Attributes

Nowadays, there is no shortage of image and video data of people. In Singapore, the government has installed more than 80,000 police cameras. However, it is practically impossible to manually watch all the video recordings and understand what is happening or has happened. We present a modern approach that allows computers to recognize multiple human attributes from images. A Convolutional Neural Network (CNN) model is trained to predict a sequence of descriptive attributes, such as gender, age, and clothing.


Diagnosing Pneumonia With Chest X-Ray Image

Pneumonia is an infectious disease and a leading cause of mortality globally. Timely and accurate diagnosis is crucial in preventing the spread of the disease. We aim to find the most suitable machine learning model for diagnosing pneumonia and to determine pre-processing techniques that improve our models’ accuracy, which will help to expedite the diagnosis process.


Toxic Comments Classification

With the evolution of technology, platforms such as social media that allow the communication of personal thoughts and feelings are increasingly prevalent. However, this degree of freedom is associated with problems such as promoting hate, hurling abuse anonymously and cyber-bullying, resulting in a toxic online community. Hence, this project aims to build a multi-headed model that distinguishes toxic comments on Wikipedia from clean ones and identifies the types of toxicity present.


Predicting Stock Volatility with News Headlines

The stock market is a multi-agent environment made up of human and computer traders. Over the past decade, algorithmic trading has become increasingly popular, and new developments in AI have resulted in algorithms that mimic the decision-making process of human traders better than ever before. One of the most popular approaches is to look at how news impacts the stock market. We compare three ways of processing headlines to better understand which feature is best at predicting the volatility of a stock. These features are the raw text, the extracted events and the sentiment of the news headlines.


The Unbinding of Isaac: Clearing Dungeons with Deep Q-Network

Since DeepMind proposed playing Atari with deep reinforcement learning in 2013, many researchers have attempted to reproduce their results. In this project, we will reproduce the Deep Q-Network (DQN) on a dungeon crawler game, “The Binding of Isaac”, which is a pixel-style game like the Atari games but is more challenging for human players.
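For readers unfamiliar with DQN, the following minimal sketch shows two generic ingredients of Q-learning that DQN builds on: the one-step temporal-difference target and epsilon-greedy action selection. It is an illustration only, not the team's implementation; the function names, the discount factor and the example Q-values are assumptions, and a real DQN would obtain the Q-values from a neural network fed with game frames.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r if the episode ended, else r + gamma * max_a Q(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
# Hypothetical Q-values for three actions in some game state.
q_values = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
```

During training, the network's prediction for the chosen action is regressed towards `target`, and epsilon is usually decayed over time so that the agent explores less as its estimates improve.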


Restore the Archive - Using Neural Networks to Remove Distortions in Scanned Documents

We use neural networks to help remove artifacts and distortions from scanned documents, ideally restoring them to their original quality.


Identifying Salt Deposits Beneath the Earth's Surface

We are trying to identify salt deposits in seismic images, because drilling into areas of salt is dangerous for oil and gas companies. The problem is a per-pixel classification (segmentation) problem, with an output of 1 (salt) or 0 (rock) for each pixel of the image. We found that it is not the color of a pixel but rather its contrast with neighbouring pixels that determines whether the pixel is salt. Therefore, the key is to identify salt-rock boundaries within each image, which makes the problem analogous to an edge detection problem.
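The idea that contrast with neighbouring pixels matters more than absolute color can be sketched with a simple hand-written filter. This is only an illustration of the edge-detection analogy, not the team's model; the function names, the toy image and the threshold are assumptions.

```python
def contrast_map(image):
    """For each pixel, the largest absolute difference to its 4-connected neighbours."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    diff = abs(image[r][c] - image[nr][nc])
                    out[r][c] = max(out[r][c], diff)
    return out


def boundary_mask(image, threshold):
    """Mark pixels whose neighbour contrast reaches the threshold (1 = boundary, 0 = not)."""
    return [[1 if v >= threshold else 0 for v in row] for row in contrast_map(image)]


# Hypothetical grayscale patch: a dark region on the left, a bright region on the right.
toy_image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
mask = boundary_mask(toy_image, threshold=100)
```

Here only the two columns on either side of the dark-to-bright transition are marked, regardless of how bright each region is on its own. A learned model would have to discover such boundary cues from labelled seismic images rather than from a fixed threshold.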


Revolutionising the Fight Against Pneumonia with ML

We applied and compared the deep learning models YOLOv3, RetinaNet and Mask R-CNN for the detection of pneumonia lung opacities on chest X-rays.


College Analytica

$1.4 trillion!! That's the amount of student debt in the US. This amount keeps increasing, and so does the number of defaults on student loans. How can this be solved? We feel that the first step is to have an idea of where you stand. To make this possible, we developed a machine-learning-powered website based on a Gradient Boosting and Regressor Chain model to predict your future earnings and loan repayment rates.


Rossmann Store Sales Challenge

Predicting the sales performance of retail companies would be useful in making good investment decisions about those companies. Traditional econometric methods, such as time series analysis and multivariate linear regression, are simple, but they also have strong model assumptions, low flexibility, and a limited scope into the past. After learning about neural networks in CS3244 lessons, we wanted to study how to apply recurrent neural network models to capture time series effects in the data. We trained models using Long Short-Term Memory, Gated Recurrent Units, and ensemble methods with bagging. With the models trained, we assessed and compared their prediction accuracy and computational feasibility, and thereby reinforced our understanding of how to select the optimal model for various types of data.


Fraud Detection

Despite the implementation of fraud analytics and Europay, MasterCard and Visa (EMV) technology, credit card fraud rates have risen over the years, due in part to the increasing prevalence of cashless payments. In this study, we compare the effectiveness of traditional machine learning and deep learning methods in detecting fraudulent credit card transactions, and develop a suitable model capable of detecting fraud in real time. The findings from this comparative study could help credit card companies improve their fraud detection technology, and could be extended to detect other types of fraud.




Pneumonia is the largest cause of death in children worldwide. However, detecting pneumonia can be challenging because other lung conditions also appear as increased lung opacity on chest radiographs. Automating the detection of potential pneumonia cases can ultimately save more lives. We built a Mask R-CNN model to locate lung opacities on chest radiographs and detect the visual signal for pneumonia. We also applied some advanced neural network techniques to improve our model's performance. The project shows how deep learning can be applied to medical images.


Advanced Regression Techniques: Predicting Iowa House prices Kaggle Competition

Predict sales prices and practice feature engineering, random forests (RFs), and gradient boosting.


Predicting foot traffic using weather and time data

Foot traffic has traditionally been an important consideration for those involved in advertising, business siting and city planning. Using sensor and weather data provided by the city of Melbourne, we studied the feasibility of predicting footfall in various locations of Melbourne. We applied linear regression and neural network models to the problem, and investigated the relative importance of each feature, as well as the use of an initial embedding layer. Our results showed that time was the most important feature, and we were able to achieve a reasonable accuracy with our final regression model, thus demonstrating the feasibility of this problem.


Comparative Study of Machine Learning Models for Image Classification of Fashion Items

There is a wide variety of supervised machine learning algorithms, each with its own inspirations and roots, advantages and disadvantages. We seek to explore these algorithms in detail to gain a deeper understanding of them and of how they perform compared to each other for image classification. To do that, we compared the models’ ability to classify images of fashion items and identified which of them are best suited to several situations that require emphasis on different aspects of performance. Finally, using the insights we gained, we developed an original model, JoNet-0, which achieved better accuracy than the models we had previously implemented.


Reading Comprehension On Lecture Notes

This project explores the application of open-domain Question Answering (QA) to learning materials and contributes a lecture note dataset, called LNQA, annotated with question-answer pairs. Our approach is to improve the overall pipeline of lecture note reading comprehension, which involves context retrieval (finding the relevant slides) and text reading (identifying the correct information). Experiments show that initializing our text reader model with a version pre-trained on SQuAD significantly improves its performance on the much more limited lecture note dataset, compared with both training from scratch and running inference directly with the pre-trained model. Narrowing down the search space by specifying the departments of questions also helps improve document retriever results, so we examine state-of-the-art sentence classifiers for predicting the departments of questions.



Expedia Hotel Recommendation

A Kaggle competition project that aims to predict the top hotel choices for users of Expedia.com and recommend them, given user details such as personal information, destinations and number of travellers.


Fake News Detection

The Internet plays a major role in this technological era. Along with it comes the widespread circulation of fake news, which can be hard to spot at first glance. This has great potential to cause massive influence not only in the political realm but also in many other sectors, such as the financial markets. Hence, readers in this day and age need the ability to detect fake news before drawing false insights from it.


Classifying Pneumonia with CXR Images

Pneumonia is a serious medical condition that is often not identified immediately due to the overwhelming number of chest radiographs (CXRs) that doctors have to interpret. This project therefore explores how machine learning can be applied to automate the interpretation process in order to prioritize and expedite doctors' diagnoses.


Detecting and Classifying Lung Diseases using X-ray images

In this project, we use chest X-ray images to detect the presence of lung diseases and to classify the images with diseases into 14 different types of lung disease.


Positive, Negative, or Neutral

Polarity-based sentiment analysis is a natural language task that predicts whether a given sentence has a positive, negative, or neutral tone. In our work, we build a model to perform sentiment analysis. This involves experimentation to find the set of word embeddings and hyperparameters that gives a model that generalizes well to unseen sentences.



Creating a Pokemon Master through reinforcement learning


Cell Counting From Microscopic Images



Component Regularization for Domain-Specific Image Classification


NBA Shot Prediction

This project aims to predict whether a shooter can make a certain shot, given a set of relevant features of a basketball shot (e.g. shot clock, game clock, and number of dribbles made by the shooting player before the shot).


Comparative Study of Machine Learning Techniques on Musculoskeletal Abnormality Detection

Recent studies have revealed a growing demand and public expectation for radiology services. As a result, problems such as a mismatch between available staff and demand, as well as potential hazards in management, have been observed in hospitals. This challenge can potentially be tackled by introducing machine learning techniques to augment human diagnosis. This project aims to produce a holistic comparison of existing machine learning approaches to detecting abnormal radiographs. The results of this study can serve as a reference guide for researchers to evaluate existing approaches and design new solutions according to constraints on resources, time and expected performance.

