
Statistical inference for entropy, divergences and Rényi information


   Cardiff School of Mathematics

Applications accepted all year round. Self-funded PhD students only.

About the Project

Estimating entropy and divergence (Shannon and Kullback-Leibler) is a central problem in image processing, with many applications in image compression, segmentation, calibration and registration. Mutual information, which is closely related to Shannon entropy and Kullback-Leibler divergence, is a widely used measure of similarity between images. The entropy of a random variable is the expected information gained by observing a realisation of that random variable. To estimate the entropy of a random vector, the naïve approach is to partition the sample space into a finite number of cells, estimate the density in each cell by the proportion of sample points falling into that cell, and then estimate the entropy by that of the associated empirical density function. For non-uniform distributions, fixed partitions lead to low occupancy numbers in sparse regions and poor resolution in dense regions.

In a seminal paper, Kozachenko & Leonenko (1987) proposed an alternative approach to entropy estimation, based on the expected distance between a point and its nearest neighbour in the sample. Nearest neighbour relations define an adaptive partition of the sample space, which allows control of the number of points in each spatial cell, and hence of the computation time of the algorithm. This method was generalized to k-th nearest neighbour statistics by Goria, Leonenko, Mergel and Novi Inverardi (2005). The method makes it possible to work with high-dimensional data, that is, to compare images not only in terms of their local pixel intensities or colours, but also through their spatial characteristics, using the properties of neighbouring pixels; this in turn yields much more efficient methods for comparing images.
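As a rough illustration of the nearest-neighbour idea, the sketch below implements one common form of the k-th nearest neighbour entropy estimator, using only the Python standard library. The function names (`digamma_int`, `log_unit_ball_volume`, `knn_entropy`) are hypothetical, the neighbour search is brute force, and the normalisation log(N - 1) - ψ(k) is an assumption about which variant is meant; it is not taken verbatim from the cited papers.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma function at a positive integer: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def log_unit_ball_volume(d):
    """Log of the volume of the unit Euclidean ball in R^d."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def knn_entropy(sample, k=1):
    """k-th nearest neighbour estimate of Shannon entropy (in nats).

    `sample` is a list of equal-length tuples. The search is O(N^2),
    and the sample is assumed to contain no duplicate points
    (a zero distance would make the logarithm undefined).
    """
    n = len(sample)
    d = len(sample[0])
    log_dist_sum = 0.0
    for i, x in enumerate(sample):
        # Distances from x to every other sample point, in increasing order.
        dists = sorted(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return (d * log_dist_sum / n + log_unit_ball_volume(d)
            + math.log(n - 1) - digamma_int(k))


if __name__ == "__main__":
    random.seed(0)
    # Two-dimensional standard normal sample; true entropy is 1 + log(2*pi).
    data = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(400)]
    print("estimate:", knn_entropy(data, k=3))
    print("true    :", 1.0 + math.log(2.0 * math.pi))
```

With k = 1 the correction term -ψ(1) equals Euler's constant. A tree-based neighbour search would be the natural replacement for the brute-force loop on larger samples.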

Relative entropy and mutual information extend the notion of entropy to two or more random variables. Evans (2008a) showed that estimators based only on nearest neighbour relations in the marginal space can be computationally expensive, and that computational efficiency can be maintained by considering nearest neighbour relations in the joint probability space. Leonenko et al. (2008, 2010) presented a more general class of estimators for Rényi entropy and divergence, and showed that these estimators satisfy a strong law of large numbers. Evans (2008b) showed that this also holds for a broad class of nearest-neighbour statistics; see also Wang, Kulkarni and Verdú (2009).
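For orientation, the standard textbook definitions of the quantities named above can be written as follows. The notation (densities f and g, order q, and the symbols H_1, H_q^*, D and I) is chosen here for illustration and may differ from that of the cited papers.

```latex
\begin{align*}
H_1(f) &= -\int f(x)\,\log f(x)\,dx
  && \text{(Shannon entropy)} \\
H_q^*(f) &= \frac{1}{1-q}\,\log \int f(x)^q\,dx, \quad q>0,\ q\neq 1
  && \text{(R\'enyi entropy of order } q\text{)} \\
D(f\,\|\,g) &= \int f(x)\,\log\frac{f(x)}{g(x)}\,dx
  && \text{(Kullback-Leibler divergence)} \\
I(X;Y) &= H_1(f_X) + H_1(f_Y) - H_1(f_{X,Y})
  && \text{(mutual information)}
\end{align*}
```

As q tends to 1, H_q^*(f) tends to H_1(f), and q = 2 gives the quadratic Rényi entropy, -log ∫ f(x)² dx.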

Leonenko and Seleznev (2010) proposed a new method of estimating ε-entropy and quadratic Rényi entropy for both discrete and continuous densities, based on the number of coincident (or ε-close) vector observations in the corresponding sample; see also Kallberg, Leonenko and Seleznev (2014) for an extension to dependent data. They developed a consistency and asymptotic distribution theory for this scheme, based on the theory of U-statistics.
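The counting idea can be sketched as follows, assuming a continuous density and Euclidean distance: the fraction of ε-close pairs is a U-statistic, and dividing it by the volume of an ε-ball gives an estimate of ∫ f², whose negative logarithm is the quadratic Rényi entropy. The function names are hypothetical and the normalisation is a simplified reading of the approach, not the authors' exact estimator.

```python
import math
import random
from itertools import combinations


def close_pair_fraction(sample, eps):
    """Fraction of unordered pairs of observations within distance eps.

    With eps = 0 this counts coincident observations (the discrete case).
    """
    pairs = 0
    close = 0
    for x, y in combinations(sample, 2):
        pairs += 1
        if math.dist(x, y) <= eps:
            close += 1
    return close / pairs


def quadratic_renyi_entropy(sample, eps):
    """Estimate -log(integral of f^2) for a continuous density, with eps > 0.

    Raises ValueError if no pair of points is eps-close (log of zero).
    """
    d = len(sample[0])
    # Log-volume of a Euclidean ball of radius eps in R^d.
    log_ball = (d * math.log(eps) + 0.5 * d * math.log(math.pi)
                - math.lgamma(0.5 * d + 1.0))
    q = close_pair_fraction(sample, eps)
    return -(math.log(q) - log_ball)


if __name__ == "__main__":
    random.seed(1)
    # Uniform sample on [0, 1]: the integral of f^2 is 1, so the entropy is 0.
    data = [(random.random(),) for _ in range(400)]
    print("estimate:", quadratic_renyi_entropy(data, eps=0.05))
```

The choice of ε trades bias against variance: a smaller ε reduces smoothing bias but leaves fewer close pairs to count.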

In a series of papers by Prof. Leonenko and his collaborators ([4,7,9,11]), an analogue of Rényi entropy was constructed and studied in relation to so-called multifractal analysis, which is very important for applications in turbulence and in other areas of physics where multifractal behaviour is typical of real data. They have recently developed a new form of multifractal measures, multifractal spectra and Rényi functions, related mainly to log-hyperbolic distributions and constructed from multifractal products of stochastic processes. These multifractal measures possess further desirable properties, such as a natural form of the singularity spectrum and dependence structure. The paper of Denisov and Leonenko (2011) contains a complete proof for the Rényi functions in two schemes related to multifractal products of geometric Ornstein-Uhlenbeck processes driven by Lévy noise.

One of the main aims of the project is to develop an asymptotic theory for nearest neighbour estimators of Shannon and Rényi information, and in particular to prove asymptotic normality using the ideas of the paper [11].

The project will also consider statistical methods for ε-entropy and quadratic Rényi entropy in the case of dependent data; see [12].

For recent developments, see [13,14].

For further details, please contact Prof. Nikolai Leonenko.


Funding Notes

We are interested in pursuing this project and welcome applications if you are self-funded or have funding from other sources, including government sponsorship or your employer.

References

1. Kozachenko, L. F. and Leonenko, N. N. (1987) Sample estimate of entropy of a random vector. Problems of Information Transmission 23, 95-101.
2. Goria, M. N., Leonenko, N. N., Mergel, V. V. and Novi Inverardi, P. L. (2005) On a class of random vector entropy estimators and its applications in testing statistical hypotheses. Journal of Nonparametric Statistics 17 (3), 277-297.
3. Leonenko, N. N., Pronzato, L. and Savani, V. (2008) A class of Rényi information estimators for multidimensional densities. Annals of Statistics 36 (5), 2153-2182. Corrections: Annals of Statistics 38 (6), 3837-3838 (2010).
4. Anh, V. V., Leonenko, N. N. and Shieh, N.-R. (2008) Multifractality of products of geometric Ornstein-Uhlenbeck type processes. Advances in Applied Probability 40 (4), 1129-1156.
5. Evans, D. (2008a) A computationally efficient estimator for mutual information. Proceedings of the Royal Society A 464 (2093), 1203-1215.
6. Evans, D. (2008b) A law of large numbers for nearest neighbour statistics. Proceedings of the Royal Society A 464 (2100), 3175-3195.
7. Anh, V. V., Leonenko, N. N. and Shieh, N.-R. (2009) Multifractal scaling of products of birth-death processes. Bernoulli 15 (2), 508-531.
8. Wang, Q., Kulkarni, S. R. and Verdú, S. (2009) Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Transactions on Information Theory 55 (5), 2392-2405.
9. Anh, V. V., Leonenko, N. N., Shieh, N.-R. and Taufer, E. (2010) Simulation of multifractal products of Ornstein-Uhlenbeck type processes. Nonlinearity 23, 823-843.
10. Leonenko, N. N. and Seleznev, O. (2010) Statistical inference for ε-entropy and quadratic Rényi entropy. Journal of Multivariate Analysis 101, 1981-1994.
11. Penrose, M. D. and Yukich, J. E. (2013) Limit theory for point processes in manifolds. Annals of Applied Probability 23 (6), 2161-2211.
12. Kallberg, D., Leonenko, N. and Seleznev, O. (2014) Statistical estimation of quadratic Rényi entropy for a stationary m-dependent sequence. Journal of Nonparametric Statistics 26 (2), 385-411.
13. Berrett, T. B., Samworth, R. J. and Yuan, M. (2019) Efficient multivariate entropy estimation via k-nearest neighbour distances. Annals of Statistics 47 (1), 288-318.
14. Bulinski, A. V. and Dimitrov, D. (2019) Statistical estimation of Shannon entropy. Acta Mathematica Sinica 35 (1), 17-46.
