Research agenda: 4 recurring gaps

Open problems in the silent speech interface (SSI) review set.

This page distills the current review database into the blockers that keep coming back. It is a research agenda, not a ranking and not a prediction.

Every item below stays tied to source review pages, so the agenda remains traceable to the records.

Recurring gaps

The shortest agenda from the current database is simple: shrink the hardware, widen the vocabulary, prove live latency, and show that the systems survive more users and more conditions.

wearability · 3 source reviews

Wearable hardware is still too awkward for daily use

SilentSpeller still needs a custom-fitted in-mouth retainer, SottoVoce still uses a bulky under-jaw probe with ~2.61 s latency, and NasoVoce still leaves fully streaming operation and physiological calibration for future work.

SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography · SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction

vocabulary · 5 source reviews

Open vocabulary is still weak

The current reviews still describe letter-only text entry, four-command ultrasound control, small closed vocabularies, and high open-vocabulary error rates as the main language-coverage blockers.

SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography · SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency · End-to-end Silent Speech Recognition with Acoustic Sensing · Digital Voicing of Silent Speech

latency · 4 source reviews

Real-time interaction is still not proven

Low-latency claims still lack live deployment evidence: SottoVoce is too slow for real-time use, FastLTS is not a live camera-to-audio system, and NasoVoce still treats streaming as future work.

SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks · NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction · FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis · Video-Driven Speech Reconstruction using Generative Adversarial Networks

generalization · 4 source reviews

Generalization across users and conditions remains thin

Several papers still rely on speaker-dependent training, small cohorts, or controlled corpora, so cross-user robustness and in-the-wild behavior remain unproven.

Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks · Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks · Video-Driven Speech Reconstruction using Generative Adversarial Networks · End-to-end Silent Speech Recognition with Acoustic Sensing