Digital Voicing of Silent Speech
Core EMG silent speech interface (SSI) paper with real gains from target transfer.
Reading guidance
- Verdict
- full-text draft · priority high · confidence high
- Why it matters
- Strong EMG SSI paper showing that silent EMG can produce intelligible speech if training explicitly handles the mismatch between silent and vocalized articulations.
- What to trust
- Basis: full text. Coverage: high. 3 evidence records back the review.
- What is weak
- Speaker-dependent setup with substantial data collection and still-high WER in open-vocabulary conditions. Open-vocabulary outputs remain far from production quality, and all results come from the authors' data collection setup. Requires facial EMG instrumentation and a speaker-specific training pipeline. Silent EMG to speech reconstruction only. Overclaim risk: medium.
- Read before
- SSI review rubric
- Read next
- SSI archive
Axes
- Task
- speech-reconstruction
- Modality
- surface EMG
- Hardware
- facial EMG electrodes
- Body site
- face; jaw; throat
- Output
- speech-audio
- Vocabulary
- closed-vocabulary dates/times plus open-vocabulary book text
- Metrics
- Closed-vocabulary human WER reaches 3.6% with a 94% relative error reduction from the strongest baseline; open-vocabulary human WER drops from 95.1% to 74.8%; automatic open-vocabulary WER drops from 91.2% to 68.0%
- Evaluation mode
- human transcription WER, automatic ASR WER, and ablations on data size and electrode subsets
- Review confidence
- high
- Overclaim risk
- medium
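To put the Metrics figures on a common scale, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, and relative error reduction as (baseline - improved) / baseline. The function names `wer` and `relative_reduction` are illustrative and not taken from the paper, and the paper's own scoring may normalize text differently. Only the open-vocabulary numbers quoted under Metrics are used as inputs.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Open-vocabulary WER figures quoted in the Metrics entry (percent).
human_gain = relative_reduction(95.1, 74.8)
auto_gain = relative_reduction(91.2, 68.0)
print(f"human: {human_gain:.1%}, automatic: {auto_gain:.1%}")
```

On the quoted figures this gives roughly a 21% relative reduction for human transcription and 25% for automatic ASR in the open-vocabulary condition, far smaller than the 94% reported for the closed vocabulary.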
Expert take
The full text backs a strong SSI claim: the target-transfer pipeline is the key step that makes silent EMG viable for speech reconstruction instead of merely hoping a model trained on vocalized EMG will transfer. The closed-vocabulary result is genuinely strong, and the open-vocabulary numbers remain difficult but materially better than baseline. The main remaining weakness is generalization: the setup is speaker-dependent, performance in open vocabulary is still rough, and the system depends on relatively heavy data collection and alignment machinery.
True value
Strong EMG SSI paper showing that silent EMG can produce intelligible speech if training explicitly handles the mismatch between silent and vocalized articulations.
What changed
Canon before
Prior EMG-to-speech work largely trained on vocalized EMG and transferred poorly to silent EMG because aligned speech targets were missing.
Delta from canon
Introduces target transfer, CCA alignment, and predicted-audio refinement so silent EMG can train a speech generator directly.
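The target-transfer idea can be sketched as follows, assuming dynamic time warping (DTW) over raw per-frame features as the alignment step. This is an illustrative simplification and not the authors' implementation: the names `dtw_path` and `transfer_targets` are hypothetical, the feature arrays are toy data, and the CCA projection and predicted-audio refinement mentioned above are omitted.

```python
import numpy as np


def dtw_path(x: np.ndarray, y: np.ndarray) -> list[tuple[int, int]]:
    """Return the DTW alignment path between x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between all frames of x and y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def transfer_targets(silent_emg, vocal_emg, vocal_audio):
    """Warp vocalized audio features onto the silent recording's timeline."""
    path = dtw_path(silent_emg, vocal_emg)
    out = np.zeros((len(silent_emg), vocal_audio.shape[1]))
    counts = np.zeros(len(silent_emg))
    for i, j in path:
        out[i] += vocal_audio[j]
        counts[i] += 1
    # Average when one silent frame aligns to several vocalized frames.
    return out / counts[:, None]


# Toy example: the silent recording lingers on the first and last frames.
vocal_emg = np.array([[0.0], [1.0], [2.0]])
silent_emg = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
vocal_audio = np.array([[10.0], [20.0], [30.0]])
targets = transfer_targets(silent_emg, vocal_emg, vocal_audio)
```

The result has one audio target per silent EMG frame, which is what lets a speech generator train directly on silent recordings instead of relying on transfer from vocalized EMG.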
Position in field
Core EMG silent-speech reconstruction paper with clear methodological relevance to SSI.
Evidence
“Our solution is to adopt a target-transfer approach, where audio output targets are transferred from vocalized recordings to silent recordings of the same utterances.”
author_claim · Abstract · confidence 0.99
“Our solution is to adopt a target-transfer approach, where audio output targets are transferred from vocalized recordings to silent recordings of the same utterances.”
actual_novelty · 3.2 Audio Target Transfer · confidence 0.98
“Note that during testing, we use only the silent EMG recordings E_S, so the vocalized recordings of the test utterances are unused.”
metric · 4.1 Closed Vocabulary Condition · confidence 0.98
Limits
Technical limits
Speaker-dependent setup with substantial data collection and still-high WER in open-vocabulary conditions.
Evaluation limits
Open-vocabulary outputs remain far from production quality, and all results come from the authors' data collection setup.
Deployment limits
Requires facial EMG instrumentation and a speaker-specific training pipeline.
Scope limits
Silent EMG to speech reconstruction only.