Major silent speech approaches compared.
The table below draws only on the expert review records. It compares sensing, evidence strength, evaluation, practicality, and open questions without imposing a ranking.
Each row links back to the source review page so every claim stays traceable.
Comparison snapshot
This page uses only fields already stored in the review database. Missing fields are shown as "Not stated in review" instead of being filled in by guesswork.
| Paper | Evidence strength | Sensing modality | Evaluation setting | Practicality | Open questions |
|---|---|---|---|---|---|
| SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography | Confidence high · 8 evidence records | electropalatography · SmartPalate custom dental retainer with 124 capacitive electrodes sampled at 100 Hz, connected by wire or wirelessly to a processing device. | Offline isolated word recognition with 10-fold cross-validation; reserve testing on 100 unseen words; seated vs. walking phrase recognition; live interactive text entry with push-to-talk interface and edit gestures. | High for privacy-sensitive communication and hands-busy users; useful where speech is socially inappropriate and users can manage oral hardware. | user_independence; comfort; social_acceptability; broader_symbol_input · The system targets discreet text entry, explicitly trading away the naturalness of silent speech for reliability; it is not a conversational silent speech system. · Recognition confusions occur mainly for letters with similar palatograms, especially EE-sound letters (B/P, D/T/Z). Strong user dependence; user-independent recognition remains poor. |
| SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks | Confidence high · 7 evidence records | ultrasound · 3.5 MHz convex ultrasound probe attached under the jaw, with ultrasound images captured to a display monitor and stored as digitized video. | Quantitative smart speaker success rates, word error rates with Google speech-to-text, and qualitative user adaptation observations. | Medium as a systems design contribution and research direction; low-to-medium as a directly deployable interface in its reported form. | real_time_interaction; open_vocabulary; speaker_independence; wearable_ultrasound · Prototype supports only a small, fixed command vocabulary with speaker-dependent training; no demonstration of open vocabulary or continuous real-time interaction. · Speaker-dependent training; latency unsuitable for real-time use (2.61 s per utterance); differences between silent and voiced articulation require user adaptation; bulky hardware; potential unknown safety issues with continuous ultrasound emission; small vocabulary size. |
| NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction | Confidence high · 4 evidence records | acoustic; vibration; multimodal · MEMS microphone (Syntiant SPH0141LM4H-1) and MEMS vibration sensor (Syntiant V2S200D) integrated in smart glasses nose pads, providing synchronized PDM output. | Quantitative ASR accuracy (WER, CER) on held-out data, objective perceptual quality metrics (PESQ, STOI), MUSHRA subjective ratings with 50 evaluators, and qualitative in-the-wild recordings in four real-world environments. | High for wearable AI voice agents: it addresses sensor placement, noise robustness, perceptual quality, and practical evaluation in diverse contexts. | continuous_streaming; adaptive_sensor_gating; longitudinal_wearability; physiological_variability · Targets low-audibility whispered speech, not fully silent speech without any acoustic leakage; assumes the user covers the mouth with a hand for privacy. · Fusion model is not fully streaming; whispered vibration signals remain weak, limiting enhancement quality; performance under extreme noise favors vibration sensor input only at very low SNR. |
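
The placeholder rule stated above the table (missing fields appear as "Not stated in review" rather than being guessed) could be implemented roughly as follows. This is a minimal sketch, not the page's actual rendering code: the field names in `FIELDS`, the `render_row` helper, and the dictionary-shaped record are all assumptions, since the review database schema is not shown here.

```python
# Hypothetical field names: the real review database schema is not shown on this page.
FIELDS = [
    "paper",
    "evidence_strength",
    "sensing_modality",
    "evaluation_setting",
    "practicality",
    "open_questions",
]
PLACEHOLDER = "Not stated in review"


def render_row(record: dict) -> str:
    """Render one review record as a Markdown table row without inventing values."""
    cells = []
    for field in FIELDS:
        value = record.get(field)
        # Treat absent keys, None, and blank strings as missing.
        text = str(value).strip() if value is not None else ""
        if text:
            # Escape pipes so cell text cannot break the table layout.
            cells.append(text.replace("|", "\\|"))
        else:
            cells.append(PLACEHOLDER)
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    # A record with only two stored fields; the rest fall back to the placeholder.
    print(render_row({"paper": "Example paper", "sensing_modality": "ultrasound"}))
```

Keeping the fallback in one place means a missing value can only ever surface as the placeholder string, which keeps every populated cell traceable to a stored field.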
Source reviews
SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography
SilentSpeller is a strong, rigorously tested SSI system that reframes silent speech as silent spelling, enabling large vocabulary, live text entry, and walking robustness with in-mouth electropalatography sensors.
SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks
A solid proof of concept that reconstructs speech audio from ultrasound for controlling unmodified smart speakers, showcasing important system design insight despite prototype limitations in latency, hardware bulk, and speaker dependency.
NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
A strong deployment-focused speech interface leveraging a novel nose-pad dual-sensor configuration and multimodal fusion to enable robust low-audibility speech interaction with AI under noise, backed by extensive evaluation.