Mass spectrometry (MS)-based proteomics is the method of choice for characterizing proteins in order to understand biological functions and processes, elucidate signaling networks, discover disease biomarkers in humans and identify key genes underlying important traits in plants. Computational methods for proteomics play an essential role in interpreting MS data and generating biological insights, but their potential remains to be fully exploited.
In plant proteomics experiments in particular, fewer than 20% of the high-quality MS/MS spectra acquired can be meaningfully interpreted. This largely reflects the limited sensitivity of current computational methods and the lack of complete, accurate and concise protein isoform sequence information.
Here we propose to address these critical issues in proteomics by developing a novel probability-based computational approach for Peptide Spectrum Match (PSM), which transforms mass information into sequence information and improves peptide identification by increasing accuracy, sensitivity, and the capability to resolve mass-shift events and assess false discoveries.
Outline plan for student work for the first 12 – 18 months
1-6 Mo Survey the literature on the principles and methods of mass spectrometry-based shotgun proteomics. The student will develop in-depth knowledge of the technology and sample preparation, and carry out literature reviews on computational methods for proteomics data processing, including peak detection, peptide spectrum matching, searching for post-translational modifications (PTMs) and statistical methods for false-positive control.
7-12 Mo Learn to use a variety of tools for basic bioinformatic analysis. The student will learn to pre-process large-scale proteomics data, perform proper quality control, and understand and master various tools for peak detection and for peptide, protein and PTM identification. The student will also learn to carry out customized analyses in R/Python and to use shell scripts under Unix, and will start collecting and processing relevant data. We can make use of the extensive shotgun proteomics data for Arabidopsis and barley available in the Hemsley lab and the Waugh lab.
13-18 Mo Prototype a novel method that transforms mass information into sequence information and improves peptide identification. The first step is to process the raw MS/MS spectrum, calculate the theoretical fragment masses for a given peptide, and identify all matches between the theoretical fragment masses and the processed peaks in the spectrum. In the second step, the mass information from all matched peaks will be transformed into sequence information and mapped onto the peptide with a probability. In the third step, the fragment ion sequences will be mapped onto the peptide and mass-shift events, such as PTMs, will be identified in the unmapped regions. Finally, the probability that the peptide sequence is the correct match can be estimated from the probabilities of the observed b and y ions.
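As an illustration of the first step only, the following is a minimal Python sketch that computes singly charged b and y fragment m/z values for an unmodified peptide and matches them against observed peaks. It is not the proposed method: the function names (fragment_mz, match_peaks), the 0.02 m/z tolerance and the example peak list are hypothetical, and the sketch assumes standard monoisotopic residue masses with no PTMs, neutral losses or higher charge states.

```python
from bisect import bisect_left

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide):
    """Return singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    ions = {}
    running = 0.0
    for i in range(1, n):              # b ions: N-terminal prefixes
        running += masses[i - 1]
        ions[f"b{i}"] = running + PROTON
    running = 0.0
    for i in range(1, n):              # y ions: C-terminal suffixes plus water
        running += masses[n - i]
        ions[f"y{i}"] = running + WATER + PROTON
    return ions


def match_peaks(ions, peak_mz, tol=0.02):
    """Map each ion to the first observed peak within +/- tol (hypothetical tolerance)."""
    peaks = sorted(peak_mz)
    matches = {}
    for name, mz in ions.items():
        j = bisect_left(peaks, mz - tol)   # first peak >= lower bound of window
        if j < len(peaks) and peaks[j] <= mz + tol:
            matches[name] = peaks[j]
    return matches


# Hypothetical usage with an invented peak list.
ions = fragment_mz("PEPTIDE")
matched = match_peaks(ions, [148.06, 227.10, 500.0])
```

The matched ions are the input to the later steps, in which matched masses would be converted to sequence information and scored; those steps are deliberately left out here because their design is the subject of the project.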
Description of how the student will have the opportunity to provide intellectual input into the direction of the project.
Every step of developing this novel method will involve extensive exploration of methodologies. Multiple computational methods and strategies can achieve the same goal, so it is important for the student to understand the underlying principles of these methods and to choose the appropriate one while being clear about its limitations. Each step of developing the method can also give rise to an individual research focus: the student can choose an element of special interest or importance, expand its scope and pursue it in greater depth.
The studentship is funded under the James Hutton Institute/University Joint PhD programme, for a four-year study period, in this case with the University of Dundee. Applicants should have a first-class honours degree in a relevant subject or a 2.1 honours degree plus Masters (or equivalent). Shortlisted candidates will be interviewed in Jan/Feb 2019. A more detailed plan of the studentship is available to candidates upon application. Funding is available for European applicants, but worldwide applicants who possess suitable self-funding are also invited to apply.
1. Cristiane P. G. Calixto, Wenbin Guo, Allan James, Nikoleta Tzioutziou, Juan Entizne, Paige Panter, Heather Knight, Hugh Nimmo, Runxuan Zhang and John Brown, "Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome", The Plant Cell, accepted, 2018.
2. Wenbin Guo, Cristiane P. G. Calixto, John W.S. Brown, Runxuan Zhang, "TSIS: an R package to infer alternative splicing isoform switches for time series data", Bioinformatics, https://doi.org/10.1093/bioinformatics/btx411, 2017.
3. Wenbin Guo, Cristiane P. G. Calixto, Nikoleta A. Tzioutziou, Ping Lin, Robbie Waugh, John W.S. Brown, Runxuan Zhang, "Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size", BMC Systems Biology, 11:62, 2017.
4. Runxuan Zhang, Alun Barton, Julie Brittenden, Jeffrey T.-J. Huang and Daniel Crowther, "Evaluation for computational platforms of LC-MS based label-free quantitative proteomics: A global view", Journal of Proteomics & Bioinformatics, vol 3(9): 260-265, 2010.
5. Monica A. Grobei, Ermir Qeli, Erich Brunner, Hubert Rehrauer, Runxuan Zhang, Bernd Roschitzki, Konrad Basler, Christian H. Ahrens and Ueli Grossniklaus, "Deterministic protein inference for shotgun proteomics data provides new insights into Arabidopsis pollen development and function", Genome Research, 19: 1786-1800, 2009.