High-throughput sequencing technologies have enabled the characterization of the genomic landscape of many tumours (1,2). This has led to the discovery of many genes that, when mutated, promote tumorigenesis and metastasis and affect patient outcome; these genes include the well-known tumour suppressor TP53 and the oncogenes KRAS and MYC.
However, during the course of the disease, cancer cells continuously acquire mutations. This gives rise to subpopulations of tumour cells with different molecular phenotypes, some of which resist treatment. This is particularly true for blood cancers such as multiple myeloma, where very small subpopulations survive treatment and usually cause relapse; the small population of malignant cells that persists after treatment is referred to as minimal residual disease (MRD).
Little is known about the mutations acquired by cancer cells in multiple myeloma MRD. This is mostly due to the small number of cells (~500) that persist in patients, which results in a low amount of DNA available for sequencing. Custom library preparation protocols have been developed to make exome sequencing possible, but they usually introduce artefacts, such as amplification errors and uneven coverage, that make rare variants impossible to detect with standard variant calling methods.
The student will develop new variant calling methods for rare variants in blood cancer from low-coverage exome sequencing data. He/she will develop both generative methods (hierarchical Bayesian models and variational models) and deep learning approaches (convolutional neural networks, autoencoders); particular attention will be paid to developing models that account for biases introduced by experimental protocols, such as whole genome amplification. The student will also learn how to write reproducible sequencing analysis pipelines and research software.
In collaboration with the Department of Hemato-oncology at the University of Ostrava, we have already generated exome data for 24 patients with multiple myeloma MRD, which we aim to analyse in order to identify therapeutic targets.
The ideal candidate has a background in mathematics, computer science, computer engineering, statistics, physics or a related field. He/she is strongly motivated to develop a competitive profile in machine learning and cancer genomics, and enjoys working in a fast-paced environment.
1. Stracquadanio, G., Wang, X., Wallace, M. D., Grawenda, A. M., Zhang, P., Hewitt, J., … Bond, G. L. (2016). The importance of p53 pathway genetics in inherited and somatic cancer genomes. Nature Reviews Cancer, 16(4), 251–265. doi:10.1038/nrc.2016.15
2. Fanfani, V., Citi, L., Harris, A. L., Pezzella, F., & Stracquadanio, G. (2019). Gene-level heritability analysis explains the polygenic architecture of cancer. bioRxiv. doi:10.1101/599753