The genome of every organism encodes the genes and regulatory elements required for building and maintaining cells. However, the minimal set of genomic elements necessary and sufficient for cellular life is still unknown. Moreover, the concept of minimality is context dependent; for example, a genome can be minimal with respect to the number of genes, the number of chromosomes, or the total amount of DNA.
Here, we are interested in finding the minimal set of genes, both to better understand how eukaryotic cells work and to learn how to engineer new ones. However, finding a minimal genome by combinatorially deleting multiple genes is not feasible, as the number of experiments grows exponentially with the number of genes. Instead, we will use Saccharomyces cerevisiae 2.0 (Sc2.0) as a model organism (1); its synthetic genome encodes an evolutionary system called Synthetic Chromosome Recombination and Modification by LoxP-mediated Evolution (SCRaMbLE).
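To make the combinatorial argument concrete, the short sketch below counts how many distinct strains would have to be built to test every deletion of up to k genes. The function name and the figure of roughly 6,000 genes for yeast are illustrative assumptions, not values taken from the project description.

```python
from math import comb

def deletion_experiments(n_genes, max_deleted):
    """Count the distinct gene sets of size 1..max_deleted that could be deleted."""
    return sum(comb(n_genes, k) for k in range(1, max_deleted + 1))

# Assumed, approximate gene count for yeast (illustrative only).
N_GENES = 6000

for k in (1, 2, 3):
    print(k, deletion_experiments(N_GENES, k))
```

Allowing any number of deletions gives 2^n - 1 candidate strains for n genes, which is why exhaustive deletion is out of reach even for modest n.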
The SCRaMbLE system enables genome minimization in vivo by deleting or inverting random genomic regions upon expression of the Cre protein. By sequencing populations of SCRaMbLEd genomes, we aim to reconstruct the minimal set of genes required for life (2).
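The deletion and inversion events described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the consortium's model: the chromosome is a list of signed segment IDs, every boundary between segments is treated as a possible recombination site, and deletions and inversions are assumed equally likely. The function name `scramble` and all parameters are hypothetical.

```python
import random

def scramble(segments, n_events, rng):
    """Apply up to n_events random deletions or inversions to signed segment IDs."""
    genome = list(segments)
    for _ in range(n_events):
        if not genome:
            break  # nothing left to rearrange
        # Pick two distinct breakpoints; i < j, so the region is never empty.
        i, j = sorted(rng.sample(range(len(genome) + 1), 2))
        if rng.random() < 0.5:
            # Deletion: drop the segments between the breakpoints.
            del genome[i:j]
        else:
            # Inversion: reverse the order and flip each segment's orientation.
            genome[i:j] = [-s for s in reversed(genome[i:j])]
    return genome

# Ten segments, all initially in forward (+) orientation.
print(scramble(range(1, 11), n_events=5, rng=random.Random(0)))
```

A negative ID marks a segment that ended up in inverted orientation, and a missing ID marks a deleted one. Recovering such rearrangements from sequencing reads, rather than simulating them, is the inference problem the project addresses.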
The student will develop machine learning methods to identify changes in genome structure from next-generation sequencing data and to predict putative minimal genomes in silico, ultimately generating hypotheses that can be tested in the lab. He/she will study both deep learning (e.g. CNNs, LSTMs) and generative models (e.g. hierarchical Bayesian models, variational networks), and learn how to write reproducible sequencing analysis pipelines and research software. The student will have the chance to collaborate with other members of the Sc2.0 consortium and to present his/her work at the annual meetings.
The ideal candidate has a background in mathematics, computer science, computer engineering, statistics, physics or a related field. He/she is strongly motivated to develop a competitive profile in machine learning, data science, sequence analysis and synthetic genomics while working in a fast-paced environment.
1. Richardson, S. M., Mitchell, L. A., Stracquadanio, G., Yang, K., Dymond, J. S., DiCarlo, J. E., … Bader, J. S. (2017). Design of a synthetic yeast genome. Science, 355(6329), 1040‑1044. doi:10.1126/science.aaf4557
2. Shen, Y., Stracquadanio, G., Wang, Y., Yang, K., Mitchell, L. A., Xue, Y., … Bader, J. S. (2016). SCRaMbLE generates designed combinatorial stochastic diversity in synthetic chromosomes. Genome Research, 26(1), 36‑49. doi:10.1101/gr.193433.115