Open problems in the silent speech interface (SSI) review set.
This page distills the current review database into the blockers that keep coming back. It is a research agenda, not a ranking and not a prediction.
Every item below stays tied to source review pages, so the agenda remains traceable to the records.
Recurring gaps
The shortest agenda from the current database is simple: shrink the hardware, widen the vocabulary, prove live latency, and show that the systems survive more users and more conditions.
Wearable hardware is still too awkward for daily use
SilentSpeller still needs a custom-fitted in-mouth retainer, SottoVoce still uses a bulky under-jaw ultrasound probe and runs at ~2.61 s latency, and NasoVoce still leaves fully streaming operation and physiological calibration for future work.
SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography · SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
Open vocabulary is still weak
The current reviews still describe letter-only text entry, four-command ultrasound control, small closed vocabularies, and high open-vocabulary error rates as the main language-coverage blockers.
SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography · SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency · End-to-end Silent Speech Recognition with Acoustic Sensing · Digital Voicing of Silent Speech
Real-time interaction is still not proven
Live deployment evidence is still missing: SottoVoce is too slow for real time, FastLTS is not a live camera-to-audio system, and NasoVoce still treats streaming as future work.
SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction · FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis · Video-Driven Speech Reconstruction using Generative Adversarial Networks
Generalization across users and conditions remains thin
Several papers still rely on speaker-dependent training, small cohorts, or controlled corpora, so cross-user robustness and in-the-wild behavior remain unproven.
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks · Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks · Video-Driven Speech Reconstruction using Generative Adversarial Networks · End-to-end Silent Speech Recognition with Acoustic Sensing