The emergence of unusual or unexpected distributions in large data samples has given rise to several well-known laws, including Zipf’s Law in textual analysis, the Pareto distribution in the measurement of wealth, Benford’s Law in the distribution of first digits in real-world measurements, and Chargaff’s Second Parity Rule in genetics. More recently, powerful arguments based on a synthesis of classical statistical mechanics and information theory have shown how such distributions, which are scale independent, can arise naturally in what appear to be otherwise unrelated systems or areas of study. What these systems have in common is their capacity to be represented by symbols that carry no intrinsic meaning other than their ability to be distinguished from each other.
In this project we propose to explore further the emergence of such global properties in appropriate databases, beginning with publicly accessible databases of proteins. The aim is to gain insight into departures from the expected equilibrium distributions, and to determine whether metrics can be produced that quantify the size of those departures. The outcomes of this work are expected to have applications in a broad range of areas, including molecular biology, economics, ecology, biology, organisational structures and beyond.
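As an illustration only, one candidate metric of this kind is the Kullback-Leibler divergence between the observed symbol frequencies and a reference distribution. The sketch below is a minimal example under stated assumptions: the reference is taken to be uniform over the alphabet, which is a placeholder and not the equilibrium distribution the project would actually derive, and the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def symbol_frequencies(sequences):
    """Pool all sequences and return the relative frequency of each symbol."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def kl_divergence_bits(observed, reference):
    """Kullback-Leibler divergence D(observed || reference), in bits.

    Every observed symbol must also appear in `reference`
    with non-zero probability.
    """
    return sum(p * log2(p / reference[s]) for s, p in observed.items() if p > 0)


def departure_from_uniform(sequences, alphabet):
    """Size of the departure from a uniform reference (an assumed placeholder)."""
    observed = symbol_frequencies(sequences)
    reference = {s: 1 / len(alphabet) for s in alphabet}
    return kl_divergence_bits(observed, reference)


# Toy data over a hypothetical four-symbol alphabet.
balanced = departure_from_uniform(["ACDE", "EDCA"], "ACDE")
skewed = departure_from_uniform(["AAAA"], "ACDE")
```

A value of zero means the observed frequencies match the reference exactly, and larger values indicate a larger departure. In practice the uniform reference would be replaced by whatever equilibrium distribution the underlying theory predicts.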
The project will require strong analytical and computational skills, and would particularly suit a graduate in one of the following areas: Computer Science, Theoretical Physics or Physics, Theoretical Chemistry, Mathematics or Applied Mathematics, Economics, or equivalent. Background knowledge of this area is not required, but enthusiasm and the patience to work with large datasets are essential, along with a drive to produce efficient code. A programming background in any language, or range of languages, would therefore be helpful.