
  Emergent Properties of Large Databases


   Faculty of Engineering, Computing and the Environment

   Applications accepted all year round  Self-Funded PhD Students Only

About the Project

The emergence of unusual or unexpected distributions in large data samples has given rise to several well-known laws, including Zipf’s Law in textual analysis [1], the Pareto distribution in the measurement of wealth [2], Benford’s Law in the distribution of first digits in real-world measurements [3], and Chargaff’s Second Parity Rule in genetics [4]. More recently, powerful arguments based on a synthesis of classical statistical mechanics and information theory have shown how such distributions, which are scale independent, can arise naturally in what appear to be otherwise unrelated systems or areas of study [5]. What these systems have in common is that they can be represented by symbols that carry no intrinsic meaning other than their ability to be distinguished from each other.

In this project we propose to explore further the emergence of such global properties in appropriate databases, beginning with publicly accessible databases of proteins. The aim is to gain insight into departures from the expected equilibrium distributions, and to determine whether metrics can be devised that quantify the size of a departure from equilibrium. The outcomes of this work are expected to have applications in a broad range of areas, including molecular biology and biology more broadly, ecology, economics, organisational structures and beyond.
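As an illustration only, the sketch below shows one way a departure metric of this kind might be computed. It assumes the Kullback-Leibler divergence as the metric and a uniform distribution over the 20 standard amino acids as a placeholder reference. Neither choice comes from the project description, the equilibrium distribution studied in [5] is not reproduced here, and the sequence is made-up toy data. The function names `symbol_frequencies` and `kl_divergence` are hypothetical.

```python
import math
from collections import Counter


def symbol_frequencies(symbols):
    """Return the relative frequency of each distinct symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def kl_divergence(observed, reference):
    """Kullback-Leibler divergence D(observed || reference) in bits.

    Both arguments map symbols to probabilities. Every symbol with a
    non-zero observed probability must also appear in `reference`.
    """
    return sum(
        p * math.log2(p / reference[s])
        for s, p in observed.items()
        if p > 0
    )


# Toy data: a made-up string of one-letter amino-acid codes, not a real protein.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
observed = symbol_frequencies(toy_sequence)

# Placeholder reference (an assumption): uniform over the 20 standard amino acids.
alphabet = "ACDEFGHIKLMNPQRSTVWY"
reference = {a: 1 / len(alphabet) for a in alphabet}

departure = kl_divergence(observed, reference)
print(f"departure from reference: {departure:.3f} bits")
```

A value of zero would mean the observed frequencies match the reference exactly, and larger values indicate a larger departure. In practice the reference would be replaced by whatever equilibrium distribution the analysis predicts.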

The project will require strong analytical and computational skills, and would particularly suit a graduate in one of the following areas: Computer Science, Theoretical Physics or Physics, Theoretical Chemistry, Mathematics or Applied Mathematics, Economics, or equivalent. Background knowledge in this area is not required, but enthusiasm and the patience to work with large datasets are essential, along with a drive to produce efficient code, so a programming background in any language, or range of languages, would be helpful.



References

1. Kawamura K and Hatano N. 2002 Universality of Zipf's law. J. Phys. Soc. Jpn 71, 1211–1213. doi:10.1143/JPSJ.71.1211
2. Pareto distribution. Wikipedia. https://en.wikipedia.org/wiki/Pareto_distribution (accessed 14.01.2022)
3. Benford F. 1938 The law of anomalous numbers. Proc. Am. Philos. Soc. 78(4), 551–572.
4. Chargaff E, Lipshitz R and Green C. 1952 Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J. Biol. Chem. 195(1), 155–160. doi:10.1016/S0021-9258(19)50884-5
5. Hatton L and Warr G. 2019 Strong evidence of an information-theoretical conservation principle linking all discrete systems. Royal Society Open Science 6(10), 191101. ISSN (online) 2054-5703
