
  Utilizing Data Augmentation Techniques for Applying Machine Learning Techniques in Survival Analysis with Limited Data Availability


   School of Science, Engineering and Environment

This project is no longer listed on FindAPhD.com and may not be available.

  Dr Kaveh Kiani, Dr Taha Mansouri | Applications accepted all year round | Self-Funded PhD Students Only

About the Project

Information on this PhD research area can be found further down this page, after the details of the Widening Participation Scholarship given immediately below.

Applications for this PhD research are welcomed from anyone worldwide, but UK candidates (or those eligible for UK fees) also have the opportunity to apply for a widening participation scholarship.

Widening Participation Scholarship: Any UK candidate (or anyone eligible for UK fees) is invited to apply. Our scholarships seek to increase participation from groups currently under-represented within research. Priority will be given to students who meet the widening participation criteria and to graduates of the University of Salford. For more information about widening participation, follow this link: https://www.salford.ac.uk/postgraduate-research/fees. [Scroll down the page until you reach the heading “PhD widening participation scholarships”.] Please note: we accept applications all year, but the deadline for applying for the widening participation scholarships in 2024 is 28 March 2024. All candidates who wish to apply for the MPhil or PhD widening participation scholarship will first need to apply for and be accepted onto a research degree programme. As long as you have submitted your completed application for the September/October 2024 intake by 28 February 2024 and you qualify for UK fees, you will be sent a very short scholarship application form. This form must be returned by 28 March 2024. Applicants who miss this date must either wait until the next round or opt for the self-funded PhD route.

-----------------

Project description:

Introduction:

Survival analysis is a subfield of statistics that deals with predicting the time until a specific event occurs, such as the failure of a machine, the occurrence of a disease, or the death of a patient. In recent years, machine learning techniques have been used in survival analysis to improve prediction accuracy, but these methods require large amounts of data. In real-world scenarios, however, data availability is often limited for reasons such as ethical and privacy concerns, rare events, or the high cost of data collection. Hence, it is essential to explore methods for performing survival analysis with limited data availability.

One of the potential solutions to this problem is to use data augmentation techniques, which are methods to generate new synthetic data points based on the existing data. Data augmentation can increase the size of the dataset, improve the generalization ability of the model, and mitigate the effect of overfitting. In this project, we aim to explore the use of data augmentation techniques for applying machine learning techniques in survival analysis with limited data availability.

Background:

Survival analysis is a widely used statistical technique in various fields, such as engineering, medicine, social sciences, and economics. The primary goal of survival analysis is to model the hazard function, which represents the instantaneous rate at which the event occurs at a given time, conditional on the event not having occurred before that time. The most common methods for survival analysis are the Kaplan-Meier estimator, the Cox proportional hazards model, and the accelerated failure time model. These methods have been widely used and studied, but they require a sufficient amount of data to obtain reliable estimates.
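
As an illustration of the first of these methods, the following is a minimal sketch of the Kaplan-Meier estimator using only NumPy. The function name `kaplan_meier` and the toy data are assumptions made for this example and are not part of the project description; a real analysis would more likely rely on an established survival analysis library.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return the distinct event times and the estimated survival
    probability just after each of them.

    times  : observed time for each subject (event or censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])  # sorted distinct event times
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        observed = np.sum((times == t) & events)  # events exactly at t
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Toy data (hypothetical): five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_times, km_survival = kaplan_meier(toy_times, toy_events)
# km_times is [2, 3, 5]; km_survival is approximately [0.8, 0.6, 0.3]
```

Censored subjects never trigger a drop in the curve, but they still count in the at-risk set until their censoring time, which is how the estimator uses partial information.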

Machine learning techniques have been shown to improve the prediction accuracy of survival analysis, especially when dealing with high-dimensional data or complex relationships between variables. The most common machine learning methods for survival analysis are random forests, support vector machines, neural networks, and Bayesian models. However, these methods require large amounts of data to avoid overfitting and achieve good performance.

Data augmentation techniques have been widely used in machine learning to improve the generalization ability of the model and mitigate the effect of overfitting. Data augmentation can be performed in various ways, such as adding noise; rotating, scaling, cropping, and flipping images; or generating new synthetic samples based on the existing data. Data augmentation has been shown to be effective in image classification, natural language processing, and speech recognition tasks, but its application in survival analysis is relatively limited.
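
The simplest of these ideas, adding noise, can be carried over to tabular survival data roughly as follows. This is only a sketch under stated assumptions: the function `jitter_augment`, its parameters `n_copies` and `noise_scale`, and the choice to perturb covariates while leaving the observed time and the event indicator untouched are all illustrative and do not come from the project description.

```python
import numpy as np

def jitter_augment(X, times, events, n_copies=2, noise_scale=0.05, seed=0):
    """Append noisy copies of the covariates to the original data.

    Each copy adds Gaussian noise scaled to a fraction (noise_scale) of the
    per-feature standard deviation. Times and event indicators are copied
    unchanged (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    feature_std = X.std(axis=0)
    X_parts, t_parts, e_parts = [X], [times], [events]
    for _ in range(n_copies):
        noise = rng.normal(0.0, 1.0, size=X.shape) * noise_scale * feature_std
        X_parts.append(X + noise)
        t_parts.append(times)
        e_parts.append(events)
    return np.vstack(X_parts), np.concatenate(t_parts), np.concatenate(e_parts)
```

With `n_copies=2`, a dataset of n rows grows to 3n rows, with the original rows first. Whether such small perturbations actually help a survival model is exactly the kind of question the project sets out to investigate.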

Objectives:

The main objectives of this project are:

  • To explore the use of data augmentation techniques for improving the performance of machine learning methods in survival analysis with limited data availability.
  • To accommodate various censoring mechanisms in the augmented data.
  • To compare the performance of different data augmentation techniques, such as synthetic minority oversampling technique (SMOTE), generative adversarial networks (GANs), and Bayesian data augmentation.
  • To evaluate the impact of different data augmentation techniques on the model's accuracy, precision, recall, and F1-score.
  • To investigate the effect of the amount of data augmentation on the model's performance and determine the optimal level of data augmentation.
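
To make the censoring objective concrete, here is a minimal SMOTE-style sketch adapted to survival data. It is not standard SMOTE and not the project's method: the function `smote_like_survival`, the restriction of neighbours to samples with the same event status, and the interpolation of the observed time alongside the covariates are assumptions chosen for illustration.

```python
import numpy as np

def smote_like_survival(X, times, events, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by interpolating between a sample
    and one of its k nearest neighbours with the same event status.

    events must contain only 0 (censored) and 1 (event observed).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)

    # Only samples whose status group has at least two members can be used.
    counts = np.bincount(events, minlength=2)
    eligible = np.flatnonzero(counts[events] >= 2)
    if eligible.size == 0:
        raise ValueError("need at least two samples sharing an event status")

    indices = np.arange(len(X))
    new_X, new_t, new_e = [], [], []
    for _ in range(n_new):
        i = rng.choice(eligible)
        same = np.flatnonzero((events == events[i]) & (indices != i))
        dist = np.linalg.norm(X[same] - X[i], axis=1)
        neighbours = same[np.argsort(dist)[:k]]
        j = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        new_X.append(X[i] + lam * (X[j] - X[i]))
        new_t.append(times[i] + lam * (times[j] - times[i]))
        new_e.append(events[i])  # synthetic sample keeps the shared status
    return np.array(new_X), np.array(new_t), np.array(new_e)
```

Interpolating only within a status group keeps every synthetic record interpretable as either an observed event or a censored observation. Other censoring mechanisms may call for different rules.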

Methodology:

The project will be conducted in several stages:

  • Data collection: We will collect real-world datasets with survival outcomes and limited data availability from various sources, such as healthcare, engineering, and social sciences. We will preprocess the data to handle missing values and to remove outliers and irrelevant features.
  • Data augmentation: We will apply various data augmentation techniques to the original dataset to generate new synthetic samples. We will use SMOTE, GANs, and Bayesian data augmentation methods to create new data points.
  • Feature engineering: We will perform feature engineering to extract relevant features from the data and transform them into a suitable format for machine learning algorithms.
  • Model selection: We will compare the performance of different machine learning algorithms, such as random forests, support vector, … and will recommend the best solution to the problem.
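
One practical point when combining the augmentation and model comparison stages above is to augment only the training portion of the data, so that models are evaluated on real, unmodified records. The sketch below illustrates this under stated assumptions: `split_then_augment`, `test_fraction`, and the placeholder augmenter `duplicate_rows` are hypothetical names introduced here, not part of the project description.

```python
import numpy as np

def split_then_augment(X, times, events, augment_fn, test_fraction=0.25, seed=0):
    """Hold out a test set first, then augment only the training set.

    augment_fn takes (X, times, events) and returns an augmented triple.
    """
    X = np.asarray(X, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = max(1, int(round(test_fraction * len(X))))
    test_idx, train_idx = order[:n_test], order[n_test:]
    train = augment_fn(X[train_idx], times[train_idx], events[train_idx])
    test = (X[test_idx], times[test_idx], events[test_idx])
    return train, test

def duplicate_rows(X, t, e):
    """Placeholder augmenter: repeats every training row once."""
    return np.vstack([X, X]), np.concatenate([t, t]), np.concatenate([e, e])

# Toy data (hypothetical): eight subjects with two covariates each.
demo_X = np.arange(16, dtype=float).reshape(8, 2)
demo_times = np.arange(1.0, 9.0)
demo_events = np.array([1, 0, 1, 1, 0, 1, 0, 1])
(train_X, train_t, train_e), (test_X, test_t, test_e) = split_then_augment(
    demo_X, demo_times, demo_events, duplicate_rows
)
```

Splitting before augmenting prevents synthetic near-copies of test records from leaking into training, which would otherwise inflate the measured performance.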