This project explores a Multimodal Large Language Model (MLLM) framework that enables social robots to interpret interaction contexts from multiple input modalities, such as vision, language, and audio, and to respond to users through multiple communication channels, such as speech, gestures, and images.
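As a rough illustration of the intended perceive-reason-respond flow, a minimal sketch is given below. It is not the project's actual API: every class, function, and field name here is hypothetical, and the MLLM call is stubbed out with a canned reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Perception:
    """Hypothetical bundle of multimodal inputs observed by the robot."""
    image_description: Optional[str] = None  # e.g. output of a vision module
    transcript: Optional[str] = None         # e.g. output of speech recognition
    text: Optional[str] = None               # e.g. direct text input


@dataclass
class RobotResponse:
    """Hypothetical multimodal response produced by the framework."""
    speech: str = ""
    gesture: Optional[str] = None            # symbolic gesture label, e.g. "wave"
    image: Optional[str] = None              # path or description of an image to show


def build_prompt(p: Perception) -> str:
    """Fuse whichever modalities are available into one textual context for the MLLM."""
    parts = []
    if p.image_description:
        parts.append(f"[VISION] {p.image_description}")
    if p.transcript:
        parts.append(f"[AUDIO] {p.transcript}")
    if p.text:
        parts.append(f"[TEXT] {p.text}")
    return "\n".join(parts)


def query_mllm(prompt: str) -> RobotResponse:
    """Placeholder for the real MLLM call; returns a canned response in this sketch."""
    return RobotResponse(speech="Hello! How can I help you today?", gesture="wave")


def interaction_step(p: Perception) -> RobotResponse:
    """One perceive -> reason -> respond cycle of the framework."""
    return query_mllm(build_prompt(p))


if __name__ == "__main__":
    obs = Perception(
        image_description="A person approaches the robot and smiles.",
        transcript="Hi there!",
    )
    print(interaction_step(obs))
```

In such a design, the fusion step and the output channels (speech synthesis, gesture execution, image display) would be handled by dedicated modules around the MLLM, which this sketch only hints at.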