This project aims to develop machine learning techniques for detecting anomalous traffic in Internet of Things (IoT) devices, especially in smart homes. With billions of IoT devices in use worldwide, their security has become an important problem. Intrusion Detection Systems (IDS) can be deployed in IoT systems to protect devices against attacks. Network attacks usually change network traffic in characteristic ways, producing anomalous traffic, and an IDS can detect the symptoms of an attack through these traffic anomalies. Many recent works use deep learning in IDS to detect attacks efficiently. In this project, a smart home environment consisting of different smart devices will be set up, and simulated attacks will be carried out against it. Both normal and abnormal traffic data will be collected and used for the research. Because attacks are rare in a smart home environment, the collected data will be treated as imbalanced data.
This project will investigate a new way of applying machine learning to anomalous traffic detection that combines supervised and unsupervised learning. To detect behavioural changes in IoT devices, both clustering and supervised learning will be applied to the collected traffic data. Clustering will group IoT traffic attributes in order to characterize devices’ behaviour. Based on the clustered data, deep learning models will be designed and built to classify traffic as normal or abnormal. The models will be trained with hyper-parameter tuning to achieve the best performance, and their performance will then be evaluated. Furthermore, a new IDS architecture that combines clustering and classification will be designed for smart home security. This architecture will provide a new framework for monitoring and analysing anomalous traffic from smart home devices. The main research questions in this project are:
- Does the combination of clustering and classification models perform better than pure classification models?
- How can efficient machine learning models be developed that combine supervised and unsupervised learning to reduce the impact of imbalanced data?
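
As a rough illustration of the combined approach described above, the following minimal Python sketch clusters traffic feature vectors with k-means, appends each sample's cluster membership to its features, and trains a class-weighted classifier on the result. Everything in it is an assumption made for illustration only: the two features (scaled packet rate and scaled mean packet size), the synthetic data, the choice of three clusters, and the use of logistic regression as a lightweight stand-in for the deep learning models the project proposes. The inverse-frequency class weighting is one common way of handling imbalanced data, not necessarily the method the project will adopt.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means with seeded random initialisation; returns k centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def augment(p, centroids):
    """Append a one-hot cluster indicator to the raw features."""
    one_hot = [0.0] * len(centroids)
    one_hot[nearest(p, centroids)] = 1.0
    return list(p) + one_hot

def train_weighted_logreg(X, y, epochs=300, lr=0.1, seed=0):
    """Logistic regression trained by SGD with inverse-frequency class weights."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # The rare class (attacks) gets a proportionally larger weight.
    weight = {1: len(y) / (2 * n_pos), 0: len(y) / (2 * n_neg)}
    w, b = [0.0] * len(X[0]), 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            pred = 1 / (1 + math.exp(-z))
            g = weight[y[i]] * (pred - y[i])
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict(x, w, b):
    """Return 1 (abnormal) if the linear score is positive, else 0 (normal)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Synthetic, purely hypothetical traffic: [scaled packet rate, scaled packet size].
rng = random.Random(42)
def noisy(cx, cy, n):
    return [[rng.gauss(cx, 0.03), rng.gauss(cy, 0.03)] for _ in range(n)]

normal = noisy(0.1, 0.2, 50) + noisy(0.3, 0.8, 45)  # two normal device behaviours
attack = noisy(0.9, 0.1, 5)                         # rare flood-like traffic
points = normal + attack
labels = [0] * len(normal) + [1] * len(attack)      # imbalanced: 95 vs 5

# Stage 1 (unsupervised): characterise behaviour by clustering.
centroids = kmeans(points, k=3)
# Stage 2 (supervised): classify using raw features plus cluster membership.
X = [augment(p, centroids) for p in points]
w, b = train_weighted_logreg(X, labels)
```

A new observation would be classified with `predict(augment(sample, centroids), w, b)`. Comparing this pipeline against the same classifier trained on the raw features alone is one simple way to approach the first research question on a small scale.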