Silent Speech Interfaces

Project Overview

This area aims to provide a research basis for tools that enable non-verbal speech interfaces. We built deep neural network architectures to predict formant values from mid-sagittal tongue contours captured with ultrasound imaging; the predicted formants were then fed into a speech synthesizer to produce vowels. This work sheds light on synthesizing speech from articulatory gestures and contributes to restoring communication for patients who have undergone laryngectomy.

Another project in this area maps hand gestures to speech sounds via a cyber glove. Because this approach removes speech production from the vocal tract entirely, it allows us to observe speech motor control independently of the particular body parts involved. Hand movements such as wrist flexion and finger abduction are mapped onto acoustic and other parameters, such as F1 and F2, which then serve as input to a speech synthesizer to produce vowels. We have also investigated the role of visual feedback in learning to use this gesture-to-vowel synthesizer.
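
As a rough illustration of the first pipeline, the sketch below shows a small convolutional network that regresses F1 and F2 from a single ultrasound frame. The architecture, input size, and the stand-in data used here are illustrative assumptions, not the published Ultra2Speech model (see Saha et al., 2020 below).

```python
# Minimal sketch (assumptions: 64x64 grayscale frames, a toy CNN, MSE training).
# Not the Ultra2Speech architecture; only meant to show the image-to-formant setup.
import torch
import torch.nn as nn

class FormantRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 2),                    # outputs: [F1, F2] in Hz
        )

    def forward(self, x):                        # x: (batch, 1, 64, 64) ultrasound frames
        return self.head(self.features(x))

model = FormantRegressor()
frames = torch.randn(8, 1, 64, 64)               # stand-in for preprocessed ultrasound frames
targets = torch.tensor([[700.0, 1200.0]] * 8)    # stand-in ground-truth formant values
loss = nn.MSELoss()(model(frames), targets)      # train with any standard optimizer
```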
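
For the glove-based project, the following sketch maps two hypothetical sensor readings (wrist flexion and finger abduction angles) linearly onto F1 and F2 and then synthesizes a steady vowel with two cascaded formant resonators. The sensor ranges, formant ranges, bandwidths, and the linear mapping are assumptions chosen for illustration, not the mapping used in our studies.

```python
# Minimal sketch (assumed sensor ranges, formant ranges, and bandwidths).
import numpy as np
from scipy.signal import lfilter

FS = 16000  # sample rate in Hz

def gesture_to_formants(wrist_flexion_deg, finger_abduction_deg):
    """Linearly map glove angles to formant frequencies (illustrative ranges)."""
    f1 = np.interp(wrist_flexion_deg, [0, 90], [300, 800])      # F1 ~ vowel height
    f2 = np.interp(finger_abduction_deg, [0, 30], [900, 2300])  # F2 ~ vowel backness
    return f1, f2

def resonator(signal, freq, bw, fs=FS):
    """Second-order digital resonator (Klatt-style formant filter), unity gain at DC."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    b = [1 - 2 * r * np.cos(theta) + r ** 2]
    a = [1, -2 * r * np.cos(theta), r ** 2]
    return lfilter(b, a, signal)

def synthesize_vowel(f1, f2, dur=0.5, f0=120):
    """Impulse-train source filtered by cascaded F1 and F2 resonators."""
    n = int(dur * FS)
    source = np.zeros(n)
    source[:: int(FS / f0)] = 1.0                # simple glottal impulse train
    out = resonator(resonator(source, f1, 80), f2, 120)
    return out / np.max(np.abs(out))

f1, f2 = gesture_to_formants(wrist_flexion_deg=70, finger_abduction_deg=5)
vowel = synthesize_vowel(f1, f2)                 # write to a WAV file to listen
```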

Selected Publications / Conferences

Liu, Y., Saha, P., Shamei, A., Gick, B., & Fels, S. (2020). Mapping a Continuous Vowel Space to Hand Gestures. Canadian Acoustics, 48(1).

Liu, Y., Saha, P., & Gick, B. (2020). Visual Feedback and Self-monitoring in Speech Learning via Hand Movement. Abstract accepted at the 179th Meeting of the Acoustical Society of America.

*Saha, P., Liu, Y., Gick, B., & Fels, S. (2020, October). Ultra2Speech - A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 473-482). Springer, Cham.

*This project won the Young Scientist Award at MICCAI 2020.
