High dimensional data comes with computational, statistical, inferential, geometric, and interpretational challenges in general. However, the inherent difficulty of these problems may vary hugely across problem instances and particular data sets. We have begun developing a theory for high dimensional data analytics to better understand and capture this effect in terms of some hidden low complexity geometry. Using random dimensionality reduction in a purely analytic role, we are able to provide tighter, dimension-free risk guarantees that exploit naturally occurring structures in some high dimensional learning problems. These results have also shown potential to explain, and to draw new connections between, some previously existing successful machine learning algorithms.
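As an illustrative sketch only (the Gaussian construction, dimensions, and tolerances below are our own choices for demonstration, not taken from the publications listed later), random dimensionality reduction in the Johnson-Lindenstrauss style maps high dimensional points to a much lower dimensional space while approximately preserving pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 300           # n points in d dimensions, projected down to k

X = rng.normal(size=(n, d))       # synthetic high-dimensional data

# Gaussian random projection, scaled so squared norms are preserved in expectation
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R                         # compressed representation

def pdist(A):
    """All pairwise Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

D_hi, D_lo = pdist(X), pdist(Y)
mask = ~np.eye(n, dtype=bool)     # exclude zero self-distances
# worst-case multiplicative distortion of pairwise distances
distortion = np.abs(D_lo[mask] / D_hi[mask] - 1).max()
```

With these (hypothetical) sizes the distortion is small despite the projection discarding 70% of the coordinates, which is the kind of structure-preserving compression the theory exploits analytically.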
The purpose of this project is to develop new algorithms from this theory, and test their applicability in practice. In particular:
(1) To exploit the explanatory power of random dimensionality reduction more widely in high dimensional data analytics
(2) To develop efficient implementations of algorithms suggested by the theoretical guarantees
(3) To develop applications in high dimensional data problems, such as (but not limited to) text mining and image understanding, in collaboration with Turing Research Fellows
(4) To investigate the applicability of such methods to privacy preserving data analytics in collaboration with Turing Research Fellows, especially the use of random dimensionality reduction to obfuscate individual data items in predictive data analytics and in aggregation based exploratory data analysis.
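A minimal sketch of the obfuscation idea in objective (4), under our own illustrative assumptions (shared Gaussian projection matrix, synthetic data; this is not the project's actual privacy mechanism): because the projection is linear and many-to-one, individual records cannot be recovered exactly, yet simple aggregates commute with the projection.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 500, 50                     # k < d: the map is not invertible
X = rng.normal(size=(n, d))                # rows = individual data items

R = rng.normal(size=(d, k)) / np.sqrt(k)   # shared random projection matrix
Y = X @ R                                  # each row is an obfuscated item:
                                           # the original record is hidden among
                                           # infinitely many preimages

# Aggregation commutes with the linear projection: the mean of the
# projected items equals the projection of the mean.
mean_then_project = X.mean(axis=0) @ R
project_then_mean = Y.mean(axis=0)
```

Aggregation based analyses can thus operate entirely in the compressed space, which is the setting objective (4) proposes to investigate.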
The above objectives contribute to furthering and testing our theory of high dimensional learning and high dimensional data analytics. At the same time, successful outcomes have practical value. For instance, we expect to create algorithms whose time complexity depends on the complexity of the inherent structure of the problem rather than the dimension of its initial representation. We expect such results and tools will be of use beyond this specific project.
The project complements our ongoing EPSRC-funded project "FORGING: Fortuitous Geometries and Compressive Learning" (https://gtr.ukri.org/projects?ref=EP%2FP004245%2F1).
The successful candidate will have excellent analytical and problem-solving skills and strong programming skills, evidenced by an excellent MSc/MSci degree in a numerate subject (Mathematics or Computer Science), and should be passionate about pursuing ambitious research at the interface of Statistics, Algorithms, Signal Processing, and Forensics.
A. Kaban. Dimension-free error bounds from random projections. 33rd AAAI Conference on Artificial Intelligence (AAAI 2019), to appear.
A. Kaban, Y. Thummanusarn. Tighter guarantees for the compressive multi-layer perceptron. 7th International Conference on the Theory and Practice of Natural Computing (TPNC18), Dublin, Ireland, December 12-14, 2018.
A. Kaban, R.J. Durrant. Structure-aware error bounds for linear classification with the zero-one loss. arXiv:1709.09782
A. Kaban. On Compressive Ensemble Induced Regularization: How Close is the Finite Ensemble Precision Matrix to the Infinite Ensemble? The 28th International Conference on Algorithmic Learning Theory (ALT 2017), Kyoto University, Japan, 15-17 October 2017.
R.J. Durrant, A. Kaban. Random projections as regularizers: Learning a linear discriminant from fewer observations than dimensions. Machine Learning 99(2), pp. 257-286, 2015.