The field of Data Stream Mining is concerned with the analytics of high-velocity Big Data Streams. A data stream is a potentially infinite sequence of data instances generated in real time. Thus applications such as data mining can read the sequence only once, using limited computing and storage resources. Predictive analytics is one of the most important types of data mining techniques, where the value of an unknown variable in a dataset is predicted. For example, imagine a sequence of Twitter posts that is generated in real time. One application could be to predict whether a tweet is related to a specific topic, e.g. politics. A data stream predictor would learn a model that can then be applied to new tweets in order to predict whether they are related to politics. A particular challenge here is building data mining models that automatically adapt to changes in the pattern encoded in the stream (concept drift). In the example, a concept drift could be caused by “breaking news” related to politics that shifts the topics being discussed on Twitter. Further example applications include detecting performance bottlenecks in computer networks and forecasting traffic congestion in smart cities.
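To make the tweet example more concrete, the following is a minimal illustrative sketch of a single-pass predictor that tests on each instance before training on it and resets itself when its recent error rate rises. It is not one of the methods this project will develop: the perceptron learner, the windowed error check, and all names and parameters (StreamingPerceptron, prequential_run, toy_stream, window, threshold) are assumptions chosen for illustration, and the stream is synthetic.

```python
from collections import deque


class StreamingPerceptron:
    """Single-pass linear classifier over bag-of-words features (illustrative)."""

    def __init__(self):
        # Note: the vocabulary is unbounded here; a real stream learner
        # would cap it (for example with feature hashing).
        self.weights = {}
        self.bias = 0

    def predict(self, text):
        words = text.lower().split()
        score = self.bias + sum(self.weights.get(w, 0) for w in words)
        return 1 if score > 0 else 0

    def learn(self, text, label):
        # Perceptron rule: update only when the current prediction is wrong.
        error = label - self.predict(text)  # -1, 0 or +1
        if error != 0:
            for w in text.lower().split():
                self.weights[w] = self.weights.get(w, 0) + error
            self.bias += error


def prequential_run(stream, window=5, threshold=0.5):
    """Test-then-train over the stream, reading each instance exactly once.

    Drift handling is a crude assumption: if the error rate over the last
    `window` instances exceeds `threshold`, the model is discarded and
    relearned from scratch.
    """
    model = StreamingPerceptron()
    recent_errors = deque(maxlen=window)
    drift_positions = []
    seen = correct = 0
    for position, (text, label) in enumerate(stream):
        mistake = int(model.predict(text) != label)  # test first
        seen += 1
        correct += 1 - mistake
        recent_errors.append(mistake)
        if len(recent_errors) == window and sum(recent_errors) / window > threshold:
            drift_positions.append(position)
            model = StreamingPerceptron()  # forget the old concept
            recent_errors.clear()
        model.learn(text, label)  # then train
    accuracy = correct / seen if seen else 0.0
    return accuracy, drift_positions


def toy_stream(n_per_phase=100):
    """Synthetic stream of (text, is_politics) pairs with one abrupt drift.

    The labels are flipped artificially in the second phase to simulate
    a sudden change of concept.
    """
    for election_label in (1, 0):
        for i in range(n_per_phase):
            if i % 2 == 0:
                yield "election vote today", election_label
            else:
                yield "football match today", 1 - election_label


if __name__ == "__main__":
    accuracy, drift_positions = prequential_run(toy_stream())
    print(f"prequential accuracy: {accuracy:.2f}")
    print(f"drift signalled at positions: {drift_positions}")
```

The test-then-train loop reflects the constraint that each instance can be read only once: every tweet is first used to evaluate the current model and only then to update it, so no data needs to be stored beyond the small window of recent errors.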
The aim of this PhD project is to develop new, cutting-edge predictive analytics methods and algorithms for Big Data Streams that can forecast events ahead of time and adapt to concept drift. The project is run in collaboration with an industry partner that will contribute real-world case studies and data stream processing infrastructure.