Machine Learning Approaches for Improving the Usability of Big Data
Dr W Pang
Prof P Edwards
Prof G M Coghill
Applications accepted all year round
Self-Funded PhD Students Only
The term ‘Big Data’ refers to massive amounts of complex data, whether structured or unstructured, real-time or historical. Over the last few years, several important characteristics of Big Data have received much attention: volume, velocity, and variety.
However, another important aspect of Big Data is their usability: are they good enough to be used for further analysis? The quality (or veracity) of Big Data may be degraded as the amount of dirty and imperfect data increases. Such imperfections can take many forms, for instance missing values, noise, systematic errors, and human errors.
Data pre-processing has been widely used in data mining tasks, but in the context of Big Data it becomes extremely challenging. More sophisticated and effective approaches are urgently needed to deal with the quality issues surrounding Big Data and to improve their usability.
In this project we will investigate how machine learning (e.g. classification, deep learning, clustering) can be used to distinguish good data from bad and, more importantly, to improve the usability of Big Data.
More specifically, we will investigate the following related topics:
(1) How can we extract salient features with respect to data quality so that these features can be used to train a machine learning model?
(2) How can we develop machine learning models specifically for evaluating data quality?
(3) How can we use machine learning to facilitate data imputation, with the aim of improving the usability of Big Data?
(4) Can bio-inspired computing and deep learning be used to develop such machine learning systems?
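As a purely illustrative sketch of what topics (1) and (3) might involve at the smallest scale, the Python snippet below computes two simple data-quality features for a numeric column (missing-value rate and outlier rate) and fills missing values with the column mean. The function names, the use of None to mark missing values, and the z-score threshold of 3 are assumptions made for this example only; they are not the project's method, and mean imputation stands in for the machine-learning-based imputation the project would investigate.

```python
from statistics import mean, pstdev


def column_quality(values, z_threshold=3.0):
    """Return simple quality features for one numeric column.

    Missing entries are represented by None (an assumption of this sketch).
    Raises ValueError if the column is empty or has no observed values.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values")
    missing_rate = 1 - len(present) / len(values)
    mu = mean(present)
    sigma = pstdev(present)
    # Count observed values whose z-score exceeds the threshold.
    outliers = sum(
        1 for v in present if sigma > 0 and abs(v - mu) / sigma > z_threshold
    )
    return {
        "missing_rate": missing_rate,
        "outlier_rate": outliers / len(present),
    }


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values")
    fill = mean(present)
    return [fill if v is None else v for v in values]


column = [1.0, 2.0, None, 3.0]
features = column_quality(column)
imputed = impute_mean(column)
```

Features of this kind could serve as inputs to a model that scores data quality, while a learned imputation model would replace the mean-based fill used here.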
In this project we will collaborate with SMEs to solve their real-world problems, which will give the student valuable experience of working with industry.
The successful candidate should have, or expect to have, an Honours Degree at 2.1 or above (or equivalent) in Computer Science or related disciplines.
Essential knowledge: machine learning basics; programming experience in Java, Python, Ruby, or Scala.
Desirable knowledge: data quality; information governance.
There is no funding attached to this project; it is for self-funded students only.
This project is advertised in relation to the research areas of the discipline of Computing Science. Formal applications can be completed online: http://www.abdn.ac.uk/postgraduate/apply. You should apply for PhD in Computing Science, to ensure that your application is passed to the correct College for processing. NOTE CLEARLY THE NAME OF THE SUPERVISOR and EXACT PROJECT TITLE ON THE APPLICATION FORM. Applicants are limited to applying for a maximum of 2 projects. Any further applications received will be automatically withdrawn.
Informal enquiries, accompanied by a copy of your curriculum vitae and a cover letter, can be made to Dr W Pang ([email protected]). All general enquiries should be directed to the Graduate School Admissions Unit ([email protected]).