With the emergence of Internet of Things (IoT) technology, human-computer interaction applications have become increasingly popular. For example, indoor surveillance monitors the status of people and objects in a monitored area. Unattended stores allow people to shop without a checkout counter. Interactive robots provide a better user experience by understanding user behaviour. A smart home can strengthen security control when it can identify a person performing certain actions. By understanding the interaction between people and their environment, more applications can be developed to create personalized services and reduce labour costs.
In this project, we aim to address the limitations of camera-based surveillance systems by using IoT technologies as a third eye. Cameras, acting as human eyes, have long been used for monitoring and surveillance. However, camera-based surveillance applications are limited by line of sight, overlap, lighting conditions, and similar factors. For example, public CCTV cameras may have difficulty identifying criminals wearing masks. Here, IoT devices such as sensors, skeleton tracking devices, thermal imaging cameras, and drones will become the third eye, making future systems more intelligent by fusing multiple senses through modern Artificial Intelligence (AI)/Deep Learning (DL) technologies. The purpose of the project is to address the challenges of moving from “single sense to multi-senses”, i.e., mimicking the five basic human senses (touch, sight, hearing, smell, and taste). Specifically, this Ph.D. project will tackle a range of data fusion problems, including:
· Fusing skeleton and inertial data
· Fusing video trajectories and wearable-sensor trajectories
· Fusing RGB, skeleton, and motion data
· Time series data analysis
· Image data processing
· … and many more