Challenge
Human-computer interaction can be reinvented through natural interfaces built on the ways we are used to interacting, including spoken language and natural hand or body gestures. The challenge is to address the underlying audio-visual sensing, representation, and machine learning problems, and to build intelligent automatic recognition interfaces for custom-tailored vocabularies, grammars, and actions.
Solution
This is a multimodal audio-gesture recognition software-as-a-service for websites and apps, which enables natural-interface user control via gestures and/or spoken commands. You may define your own vocabulary and grammar and easily map these commands to the corresponding machine actions. For automatic recognition, it relies on our server-side recognition system, which returns the recognized voice-gesture command, as in the sketch below.
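The following is a minimal client-side sketch of how a web app might use such a service: it defines an application vocabulary, sends captured audio and skeleton cues to the recognition server, and dispatches the action bound to the recognized command. The endpoint URL, payload fields, and command names are illustrative assumptions, not the actual service API.

```typescript
// Hypothetical client-side integration sketch. The endpoint, payload shape,
// and command names are assumptions for illustration only.

type Command = "open_menu" | "scroll_down" | "go_back";

// Application-defined mapping from recognized commands to UI actions.
const actions: Record<Command, () => void> = {
  open_menu: () => document.querySelector("nav")?.classList.add("open"),
  scroll_down: () => window.scrollBy({ top: 400, behavior: "smooth" }),
  go_back: () => history.back(),
};

// Send captured audio and skeleton cues to the recognition server and
// dispatch the action bound to the recognized command.
async function recognizeAndDispatch(audio: Blob, skeletonFrames: number[][]): Promise<void> {
  const body = new FormData();
  body.append("audio", audio);
  body.append("skeleton", JSON.stringify(skeletonFrames));
  body.append("vocabulary", JSON.stringify(Object.keys(actions)));

  const response = await fetch("https://recognizer.example.com/api/recognize", {
    method: "POST",
    body,
  });
  const { command } = (await response.json()) as { command: Command };

  actions[command]?.();
}
```

In this arrangement the website only handles capture and action mapping, while recognition itself stays on the server side.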
Results
The resulting system employs standard SDKs, e.g. the Kinect SDK, to acquire the skeleton and audio modalities and sends the audio and/or visual cues to our server, which is responsible for multimodal recognition. Our state-of-the-art multimodal recognition framework recognizes speech and gestures and applies multimodal fusion to produce the most robust final recognized gesture-voice command. These commands are mapped to the intended software actions fitting the application's needs. Applications to date include an assistive communication interface for people with movement disabilities, e.g. quadriplegia, and naturally-enabled website navigation.
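To illustrate the fusion step, the sketch below shows one common way to combine the two modalities: late fusion of per-command scores from separate speech and gesture recognizers using a weighted log-linear rule. The score format and weighting scheme are assumptions for illustration; the actual fusion method used by the server is not detailed here.

```typescript
// Illustrative late-fusion sketch: combine per-command scores from the speech
// and gesture recognizers and return the top-scoring command. The weighting
// scheme and score format are assumptions, not the server's actual algorithm.

type Scores = Record<string, number>; // command -> posterior probability

function fuseModalities(speech: Scores, gesture: Scores, speechWeight = 0.6): string {
  const gestureWeight = 1 - speechWeight;
  let best = "";
  let bestScore = -Infinity;

  for (const command of Object.keys(speech)) {
    // Weighted log-linear combination; a small floor avoids log(0).
    const score =
      speechWeight * Math.log(speech[command] ?? 1e-9) +
      gestureWeight * Math.log(gesture[command] ?? 1e-9);
    if (score > bestScore) {
      bestScore = score;
      best = command;
    }
  }
  return best;
}

// Example: speech strongly favors "scroll_down", gesture mildly agrees.
const command = fuseModalities(
  { open_menu: 0.1, scroll_down: 0.8, go_back: 0.1 },
  { open_menu: 0.3, scroll_down: 0.5, go_back: 0.2 },
);
console.log(command); // "scroll_down"
```

Fusing at the score level in this way lets one modality compensate when the other is noisy, which is the practical benefit of combining voice and gesture commands.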