2020 · arXiv / imported corpus page · Field expert review · confidence high

Digital Voicing of Silent Speech

David Gaddy, Dan Klein

arXiv

Core EMG SSI paper with real gains from target transfer.

Verdict: full-text draft · Priority: high · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Why it matters: Strong EMG SSI paper showing that silent EMG can produce intelligible speech if training explicitly handles the mismatch between silent and vocalized articulations.
What to trust: Basis: full text. Coverage: high. 3 evidence records back the review.
What is weak: Speaker-dependent setup with substantial data collection and still-high WER in open-vocabulary conditions. Open-vocabulary outputs remain far from production quality, and all results come from the authors' data collection setup. Requires facial EMG instrumentation and a speaker-specific training pipeline. Silent EMG to speech reconstruction only. Overclaim risk: medium.
Read before: SSI review rubric
Read next: SSI archive

Axes

Task: speech-reconstruction
Modality: surface emg
Hardware: facial EMG electrodes
Body site: face; jaw; throat
Output: speech-audio
Vocabulary: closed-vocabulary dates and times, plus open-vocabulary book text
Metrics: closed-vocabulary human WER reaches 3.6%, a 94% relative error reduction over the strongest baseline; open-vocabulary human WER drops from 95.1% to 74.8%; automatic open-vocabulary WER drops from 91.2% to 68.0%
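The 94% figure is a relative reduction, i.e. (baseline WER - new WER) / baseline WER. The open-vocabulary lines give absolute numbers only, so the minimal Python sketch below applies that same formula to them. The helper name `relative_reduction` is ours, not the paper's, and the only inputs are the WERs quoted above.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's word errors removed: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Open-vocabulary WERs (percent) as quoted in the Metrics line: (baseline, new).
open_vocab = {
    "human transcription": (95.1, 74.8),
    "automatic ASR": (91.2, 68.0),
}

for label, (before, after) in open_vocab.items():
    print(f"{label}: {relative_reduction(before, after):.1%} relative reduction")
```

Run as-is, this prints about 21.3% for human transcription and 25.4% for automatic ASR. The open-vocabulary gain is real, but it is far smaller in relative terms than the closed-vocabulary 94%.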
Evaluation mode: human transcription WER, automatic ASR WER, and ablations on data size and electrode subsets
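Both the human and automatic evaluations score transcripts by word error rate. The page does not describe the scoring script. The sketch below uses the standard definition, WER = (substitutions + deletions + insertions) / reference word count, computed as a word-level Levenshtein distance. It assumes plain whitespace tokenization with no text normalization, which real scoring pipelines usually add. The function name `word_error_rate` is our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization and no normalization (hypothetical
    simplification; the paper's exact scoring setup is not given here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words (starts as the empty-reference row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Toy date/time-style sentence, invented for illustration.
print(word_error_rate("set an alarm for seven thirty", "set alarm for seven fifty"))
```

Here one deletion ("an") and one substitution ("thirty" to "fifty") give 2 errors over 6 reference words. Because insertions count, WER is unbounded above and can exceed 100%.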
Review confidence: high
Overclaim risk: medium

Expert take

The full text backs a strong SSI claim: target transfer is the key step that makes silent EMG viable for speech reconstruction, rather than hoping that a model trained only on vocalized EMG will transfer. The closed-vocabulary result is genuinely strong. The open-vocabulary numbers are still poor in absolute terms but materially better than the baseline. The main remaining weakness is generalization: the setup is speaker-dependent, open-vocabulary performance is still rough, and the system depends on relatively heavy data collection and alignment machinery.

What changed

Canon before

Prior EMG-to-speech work largely trained on vocalized EMG and transferred poorly to silent EMG, because silent recordings have no time-aligned audio to serve as training targets.

Delta from canon

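Introduces target transfer, in which audio targets from vocalized recordings are transferred onto silent recordings of the same utterances. CCA-based alignment and refinement with predicted audio improve those transfers, so that a speech generator can be trained directly on silent EMG.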
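This page names target transfer but does not spell out the aligner. The following is a minimal sketch, assuming dynamic time warping (DTW) over frame-level distances between silent and vocalized EMG features. Each silent frame then takes the audio feature of the vocalized frame it aligns to. All function names are hypothetical, and Euclidean distance is a placeholder. The CCA and predicted-audio refinements named above would change what `dist` compares, for example CCA-projected EMG features or predicted versus recorded audio. This sketch does not implement them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Placeholder frame distance (assumption, not the paper's exact metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(silent: Sequence[Frame], vocalized: Sequence[Frame],
             dist: Callable[[Frame, Frame], float] = euclidean) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic alignment between two frame sequences (classic DTW)."""
    n, m = len(silent), len(vocalized)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(silent[i - 1], vocalized[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the final cell to recover (silent_idx, vocalized_idx) pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],  # diagonal listed first, wins ties
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def transfer_targets(silent_emg, vocalized_emg, vocalized_audio, dist=euclidean):
    """Give each silent EMG frame the audio feature of its aligned vocalized frame."""
    targets = [None] * len(silent_emg)
    for s_idx, v_idx in dtw_path(silent_emg, vocalized_emg, dist):
        if targets[s_idx] is None:  # our choice: keep the first match per silent frame
            targets[s_idx] = vocalized_audio[v_idx]
    return targets


# Hypothetical 1-D features: the silent utterance holds its middle frame longer.
silent = [[0.0], [1.0], [1.0], [2.0]]
vocal = [[0.0], [1.0], [2.0]]
print(transfer_targets(silent, vocal, ["a0", "a1", "a2"]))
```

On this toy input the alignment is [(0, 0), (1, 1), (2, 1), (3, 2)], so the transferred targets are ["a0", "a1", "a1", "a2"]. The silent utterance's lengthened frame reuses the matching vocalized audio frame. Keeping `dist` as a parameter is what lets a refined alignment reuse the same DTW routine.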

Position in field

Core EMG silent-speech reconstruction paper with clear methodological relevance to SSI.

Evidence

“ Our solution is to adopt a target-transfer approach, where audio output targets are transferred from vocalized recordings to silent recordings of the same utterances. ”

author_claim · Abstract · confidence 0.99

“ Our solution is to adopt a target-transfer approach, where audio output targets are transferred from vocalized recordings to silent recordings of the same utterances. ”

actual_novelty · 3.2 Audio Target Transfer · confidence 0.98

“ Note that during testing, we use only the silent EMG recordings E_S, so the vocalized recordings of the test utterances are unused. ”

metric · 4.1 Closed Vocabulary Condition · confidence 0.98

Limits

Technical limits

Speaker-dependent setup with substantial data collection and still-high WER in open-vocabulary conditions.

Evaluation limits

Open-vocabulary outputs remain far from production quality, and all results come from the authors' data collection setup.

Deployment limits

Requires facial EMG instrumentation and a speaker-specific training pipeline.

Scope limits

Silent EMG to speech reconstruction only.