2020 · arXiv / imported corpus page · Field expert review · confidence high

End-to-end Silent Speech Recognition with Acoustic Sensing

Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

arXiv

Strong mobile-friendly acoustic SSI paper.

Verdict: full-text draft · Priority: high · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Verdict: full-text draft · priority high · confidence high
Why it matters: Real SSI contribution: it shows commodity acoustic sensing can support non-invasive silent speech recognition beyond same-user laboratory memorization.
What to trust: Basis: full text. Coverage: high. 3 evidence records back the review.
What is weak: Vocabulary remains small, evaluation is on a collected research dataset, and neither practical latency nor always-on robustness is established. Only 54 sentences are covered, and the unseen-sentence split still operates inside that limited corpus design. The approach is promising for smart devices, but field robustness, power cost, and broader-vocabulary behavior remain unresolved. Scope is limited to acoustic silent speech recognition from lip movements. Overclaim risk: medium.
Read before: SSI review rubric
Read next: SSI archive

Axes

Task: speech-recognition
Modality: acoustic
Hardware: smartphone speaker and microphone
Body site: lip
Output: text
Vocabulary: 54 fixed sentences with unseen-sentence split
Metrics: WER is 2.6% in domain-dependent testing, 8.4% average in domain-independent testing, and 8.1% in unseen-sentence testing; the worst unseen-sentence WER shown in the Top-10 list is 18.2%
Evaluation mode: domain-dependent, domain-independent, unseen-sentence, and CTC comparison WER evaluation
Review confidence: high
Overclaim risk: medium
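
For readers comparing the WER figures above, the following is a minimal sketch of word error rate as it is commonly defined: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes the paper uses this standard definition; the review does not record the exact scoring script, and the function name and example sentences are hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
print(word_error_rate("turn on the light", "turn off the light"))
```

Because the denominator is the reference length, a per-sentence WER such as the 18.2% worst case can sit well above the corpus-level average when the sentence is short.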

Expert take

The full text supports a substantial SSI claim: with phase and double-delta features plus an attention decoder, the system recognizes silent sentences from active acoustic reflections across users and environments. The domain-independent and unseen-sentence results (8.4% and 8.1% WER) are strong enough to make the modality credible, especially given the non-invasive smartphone-style hardware. The remaining limitation is scope: the dataset is still small, vocabulary coverage is only 54 sentences, and the paper does not demonstrate real-world always-on deployment.
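
To make the "phase and double-delta" feature idea concrete, here is a minimal sketch that stacks a phase sequence with its first-order and second-order differences. It assumes a single-channel phase track and uses a plain backward difference with zero padding; the paper's actual delta window and feature layout are not recorded in this review, so the function names and the difference scheme are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def first_difference(x: Sequence[float]) -> List[float]:
    """Backward difference, padded with 0.0 at t=0 so the length is preserved."""
    if not x:
        return []
    return [0.0] + [x[t] - x[t - 1] for t in range(1, len(x))]


def phase_with_deltas(phase: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return one (phase, delta, double_delta) triple per time step.

    Assumption: delta is the difference of phase and double-delta is the
    difference of delta; this is a generic scheme, not the paper's recipe.
    """
    delta = first_difference(phase)
    double_delta = first_difference(delta)
    return list(zip(phase, delta, double_delta))


# Hypothetical phase track with steadily increasing velocity.
for frame in phase_with_deltas([0.0, 1.0, 3.0, 6.0]):
    print(frame)
```

The delta terms expose how fast the reflected phase is changing, which tracks lip motion rather than lip position; note that the zero padding makes the first one or two frames of the difference channels unreliable.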

True value

Real SSI contribution: it shows commodity acoustic sensing can support non-invasive silent speech recognition beyond same-user laboratory memorization.

What changed

Canon before

Acoustic sensing for silent speech mostly targeted simpler gesture-style lip sensing and often relied on hand-crafted pipelines rather than sentence-level end-to-end recognition.

Delta from canon

Builds a sentence-level silent speech recognizer from reflected acoustic phase features and shows cross-domain and unseen-sentence performance on a smartphone-like setup.

Position in field

Core acoustic-sensing SSI paper focused on mobile-compatible silent speech recognition.

Evidence

“ […]tion method, which uses the inaudible acoustic signals generated by smart devices for lip-reading. ” “ We make sure that the training data and the testing data are collected from different users and […] ”

author_claim · 5. CONCLUSION · confidence 0.98

“ The WER under domain-dependent, domain-independent, and unseen sentence tests are 2.6%, 8.4%, and 8.1%, respectively, demonstrating the feasibility and effec[…] ” “ Unseen sentences test: We also evaluate the performance in translating unseen sentences (sentences not in the training set). ”

metric · 5. CONCLUSION · confidence 0.98

“ In the training phase, we perform data augmentation for each lip commands with 10 different scaling factor α, meaning that the number of samples increases 10 times the original one. ” “ […] which can shorten long-term dependencies between the beginning of the signal stream and sentence labels, as shown in [23]. We denote the final output matrix as O = [o_N, o_{N-1}, ..., o_1], […] ”

validation_scope · 4.3. Evaluation and Performance · confidence 0.96
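
The augmentation described in the quote above can be sketched as follows. This is only an illustration: the review does not record what α scales, so the sketch assumes α is a time-axis stretch factor applied by linear interpolation, and the ten α values are hypothetical. Each input sample yields one copy per α, so ten factors give ten times as many samples.

```python
from typing import List, Sequence, Tuple


def time_scale(signal: Sequence[float], alpha: float) -> List[float]:
    """Resample `signal` to about len(signal) * alpha points (linear interpolation)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(signal)
    if n < 2:
        return list(signal)
    m = max(2, round(n * alpha))
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)  # fractional index into the original signal
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def augment(
    dataset: Sequence[Tuple[Sequence[float], str]], alphas: Sequence[float]
) -> List[Tuple[List[float], str]]:
    """Produce one scaled copy of every (signal, label) pair per alpha."""
    return [(time_scale(sig, a), label) for sig, label in dataset for a in alphas]


# Hypothetical scaling factors; the paper's actual values are not given here.
alphas = [0.8 + 0.05 * i for i in range(10)]
samples = [([0.0, 1.0, 2.0, 3.0], "hello")]
print(len(augment(samples, alphas)))
```

If α instead scales amplitude or another signal property in the paper, only `time_scale` would change; the tenfold expansion in `augment` stays the same.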

Limits

Technical limits

Vocabulary remains small, evaluation is on a collected research dataset, and neither practical latency nor always-on robustness is established.

Evaluation limits

Only 54 sentences are covered, and the unseen-sentence split still operates inside that limited corpus design.

Deployment limits

Promising for smart devices, but field robustness, power cost, and broader-vocabulary behavior remain unresolved.

Scope limits

Covers acoustic silent speech recognition from lip movements only.