160 students in 27 teams
This module introduces basic concepts and algorithms in machine learning and neural networks. The main reason for studying computational learning is to make better use of powerful computers to learn knowledge (or regularities) from raw data. The ultimate objective is to build self-learning systems that relieve humans of some of the already-too-many programming tasks. At the end of the course, students are expected to be familiar with the theories and paradigms of computational learning, and capable of implementing basic learning systems.
"Hi, how are you?" This sentence may be simple for us to understand, yet it is incomprehensible to a computer. Our project explores how we can model the everyday language of Singapore (Singlish) using mathematical representations. One common way to represent words is as vectors that capture their semantic meanings; such a vector representation is known as a word embedding. After modeling Singlish words, we evaluated the resulting embeddings extrinsically by performing sentiment analysis on sentences, to see how well the embeddings capture the semantic meaning of Singlish words.
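As a toy illustration of the embedding idea (not the project's actual pipeline, which learned embeddings from a Singlish corpus), even raw co-occurrence counts give words used in similar contexts similar vectors; all tokens below are hypothetical examples:

```python
import numpy as np

# Toy tokenized "Singlish-style" corpus (hypothetical example sentences).
corpus = [
    ["makan", "already", "lah"],
    ["eat", "already", "lah"],
    ["this", "one", "shiok", "sia"],
    ["this", "one", "nice", "sia"],
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-word co-occurrence counts within a +/-1 window;
# each row acts as a crude embedding vector for that word.
co = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                co[idx[w], idx[sent[j]]] += 1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# "makan" and "eat" share contexts, so their vectors align; "makan" and "sia" do not.
sim_syn = cosine(co[idx["makan"]], co[idx["eat"]])
sim_unrel = cosine(co[idx["makan"]], co[idx["sia"]])
print(sim_syn, sim_unrel)
```

Learned embeddings such as Word2Vec refine this same intuition by training the vectors instead of counting.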
Nowadays, there is no shortage of image and video data of human beings. In Singapore, the government has installed more than 80,000 police cameras. However, it is practically impossible to manually watch all the video recordings and understand what is happening or has happened. We present a modern approach that allows computers to recognize multiple human attributes from images: a Convolutional Neural Network (CNN) model is trained to predict a sequence of descriptive attributes, such as gender, age, and clothing.
Pneumonia is an infectious disease and a leading cause of mortality globally. Timely and accurate diagnosis is crucial in preventing the spread of the disease. We aim to explore the most suitable machine learning model for diagnosing pneumonia and to determine pre-processing techniques that improve our models' accuracy, helping to expedite the diagnosis process.
With the evolution of technology, platforms such as social media that allow the communication of personal thoughts and feelings are increasingly prevalent. However, this degree of freedom comes with problems such as the promotion of hate, anonymous abuse, and cyber-bullying, resulting in a toxic online community. Hence, this project aims to build a multi-headed model to distinguish toxic Wikipedia comments from clean ones, and to identify the types of toxicity present.
The stock market is a multi-agent environment composed of human and computer traders. Over the past decade, algorithmic trading has become increasingly popular, and new developments in AI have produced algorithms that mimic the decision-making process of human traders better than ever before. One of the most popular approaches is to look at how news impacts the stock market. We compare three ways of representing news headlines to understand which feature best predicts the volatility of a stock: the raw text, extracted events, and the sentiment of the headlines.
Since DeepMind proposed playing Atari with deep reinforcement learning in 2013, many researchers have attempted to reproduce their results. In this project, we reproduce a Deep Q-Network (DQN) on a dungeon crawler game, The Binding of Isaac, which is a pixel-style game like the Atari games but more challenging for human players.
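The heart of DQN is the Bellman update its network is trained toward. A minimal sketch, assuming a tabular Q in place of DeepMind's convolutional network over game frames:

```python
import numpy as np

# Tabular stand-in for the Q-network: Q[state, action] (sizes are illustrative).
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
gamma, lr = 0.99, 0.1   # discount factor and learning rate

def dqn_style_update(s, a, r, s_next, done):
    # Bellman target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += lr * (target - Q[s, a])

# One transition: take action 1 in state 0, receive reward 1.0, reach state 2.
dqn_style_update(s=0, a=1, r=1.0, s_next=2, done=False)
print(Q[0, 1])
```

The full DQN replaces the table with a network trained by gradient descent on this same target, plus experience replay and a separate target network for stability.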
We try to identify salt deposits in seismic images, because drilling into areas of salt is dangerous for oil and gas companies. The task is a pixel-wise classification problem with output 1 (salt) or 0 (rock) for each pixel of the image. We found that it is not the color of a pixel but rather its color contrast with neighbouring pixels that determines whether the pixel is salt. Therefore, the key is to identify salt-rock boundaries within each image, making the problem analogous to an edge detection problem.
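The contrast intuition can be sketched minimally with a gradient filter (the image and threshold below are illustrative; the project's model learns the boundary rather than hand-tuning it):

```python
import numpy as np

# Toy "seismic" image: a dark region (rock) meeting a bright region (salt).
img = np.array([
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
], dtype=float)

# Horizontal gradient magnitude: large where a pixel contrasts with its right neighbour.
grad = np.abs(np.diff(img, axis=1))
boundary = grad > 40            # True only at the salt-rock edge

print(boundary.any(axis=0))     # the edge sits between columns 2 and 3
```

An edge-detection view like this is why absolute intensity alone fails: both regions are uniform internally, and only the transition carries signal.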
$1.4 trillion! That's the amount of student debt in the US. This amount keeps increasing, and so does the number of defaults on student loans. How can this be solved? We feel that the first step is to have an idea of where you stand. To make this possible, we developed a machine-learning-powered website, based on a Gradient Boosting and Regressor Chain model, that predicts your future earnings and loan repayment rates.
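A regressor chain handles related targets by feeding one target's prediction in as a feature for the next. A minimal sketch with least-squares base models standing in for gradient boosting (the data and target relationship below are synthetic):

```python
import numpy as np

# Synthetic borrower features and two linked targets (earnings, repayment rate).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y_earn = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y_repay = 0.3 * y_earn + X[:, 0] + rng.normal(scale=0.1, size=200)

def fit_linear(A, y):
    # Least-squares fit with a bias column (stand-in for a gradient-boosted base model).
    A1 = np.c_[A, np.ones(len(A))]
    w, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return lambda B: np.c_[B, np.ones(len(B))] @ w

f1 = fit_linear(X, y_earn)                    # stage 1: predict earnings
f2 = fit_linear(np.c_[X, f1(X)], y_repay)     # stage 2: chain the prediction in as a feature

pred2 = f2(np.c_[X, f1(X)])
print(np.corrcoef(pred2, y_repay)[0, 1])      # close to 1 on this toy data
```

The chaining matters when the second target depends on the first, as repayment rates plausibly depend on earnings.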
Predicting the sales performance of retail companies is useful for making good investment decisions about a company. Traditional econometric methods, such as time series analysis and multivariate linear regression, are simple, but they also come with strong model assumptions, low flexibility, and a limited scope into the past. After learning about neural networks in CS3244, we studied how to apply recurrent neural network models to capture time series effects in the data. We trained models using Long Short-Term Memory, Gated Recurrent Units, and ensemble methods with bagging. With the trained models, we assessed and compared their prediction accuracy and computational feasibility, thereby reinforcing our understanding of how to select the optimal model for various types of data.
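Before any recurrent model can be trained, the sales series must be framed as supervised examples: sliding windows of past observations predicting the next value. A minimal sketch (the function name and lookback length are illustrative):

```python
import numpy as np

def make_windows(series, lookback):
    # Each row of X is `lookback` consecutive past values; y is the value that follows.
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

sales = np.arange(10, dtype=float)        # toy monthly sales series
X, y = make_windows(sales, lookback=3)
print(X.shape, y.shape)                   # (7, 3) and (7,)
print(X[0], y[0])                         # [0. 1. 2.] -> 3.0
```

An LSTM or GRU then consumes each window step by step, which is how these models extend the limited lookback of classical regressions.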
Despite the implementation of fraud analytics and Europay, MasterCard and Visa (EMV) technology, credit card fraud rates have risen over the years, due in part to the increasing prevalence of cashless payments. In this study, we compare the effectiveness of traditional supervised learning methods and deep learning methods in detecting fraudulent credit card transactions via an appropriate performance metric. The findings from this comparative study could help credit card companies improve their fraud detection technology, and could be extended to detecting other types of fraud.
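The choice of metric matters because fraud data is heavily imbalanced. A toy example (the 1% fraud rate is illustrative) shows plain accuracy rewarding a model that flags nothing:

```python
import numpy as np

# 1,000 transactions, 1% fraudulent.
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1
y_pred = np.zeros(1000, dtype=int)   # degenerate model: predicts "not fraud" for everything

accuracy = (y_pred == y_true).mean()
tp = int(((y_pred == 1) & (y_true == 1)).sum())
recall = tp / int((y_true == 1).sum())

print(accuracy, recall)   # 99% accurate, yet catches zero fraud
```

This is why metrics such as recall, precision, or area under the precision-recall curve are more appropriate than raw accuracy for comparing fraud detectors.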
Pneumonia is the largest cause of death in children worldwide. However, detecting pneumonia can be challenging because other lung conditions also appear as increased lung opacity on chest radiographs. Automating the detection of potential pneumonia cases can ultimately save more lives. We built a Mask R-CNN model to locate lung opacities on chest radiographs and detect the visual signal for pneumonia, and tapped into some advanced neural network techniques to improve its performance. This demonstrates how deep learning can be applied to medical images.
Foot traffic has traditionally been an important consideration for those involved in advertising, business siting and city planning. Using sensor and weather data provided by the city of Melbourne, we studied the feasibility of predicting footfall in various locations of Melbourne. We applied linear regression and neural network models to the problem, and investigated the relative importance of each feature, as well as the use of an initial embedding layer. Our results showed that time was the most important feature, and we were able to achieve a reasonable accuracy with our final regression model, thus demonstrating the feasibility of this problem.
There is a wide variety of supervised machine learning algorithms, each with its own inspirations and roots, advantages and disadvantages. We seek to explore these algorithms in detail to gain a deeper understanding of them and of how they perform relative to each other on image classification. To do so, we compared the models' ability to classify images of fashion items and identified which are best suited to situations that emphasize different aspects of performance. Finally, using the insights we acquired, we developed an original model, JoNet-0, which achieved better accuracy than the models we had previously implemented.
This project explores the application of open-domain Question Answering (QA) to learning materials, contributing a lecture note dataset, called LNQA, annotated with question-answer pairs. Our approach improves the overall pipeline of lecture note reading comprehension, which involves context retrieving (finding the relevant slides) and text reading (identifying the correct information). Experiments show that initializing our text reader model with a version pre-trained on SQuAD significantly improves its performance on the much more limited lecture note dataset, compared with both training from scratch and inferring directly from the pre-trained model. Narrowing the search space by specifying the department a question belongs to also improves document retriever results, so we additionally examine state-of-the-art sentence classifiers for predicting the departments of questions.
The Internet plays a major role in this technological era. Along with it comes the spread of fake news, which can be hard to detect at first glance. This has great potential to cause massive influence, not only in the political realm but also in many other sectors such as the financial markets. Hence, readers in this day and age need the ability to detect fake news before drawing false insights from it.
Pneumonia is a serious medical condition that is often not identified immediately due to the overwhelming number of chest radiographs (CXRs) that doctors have to interpret. Thus, this project explores how machine learning can be applied to automate the interpretation process in order to prioritize and expedite doctors' diagnoses.
Polarity-based sentiment analysis is a natural language task that predicts whether a given sentence has a positive, negative, or neutral tone. In our work, we build a model to perform sentiment analysis. This involves experimenting to find the word embeddings and hyperparameters that give a model that generalizes well to unseen sentences.
Recent studies have revealed growing demand and public expectations for radiology services. As a result, hospitals have observed problems such as a mismatch between human labor and demand, as well as potential hazards in management. This challenge can potentially be tackled by introducing machine learning techniques to augment human diagnosis. This project aims to produce a holistic comparison of existing machine learning paradigms for detecting abnormal radiographs. The results of this study can serve as a reference guide for researchers to evaluate existing approaches and design new solutions according to constraints on resources, time, and expected performance.