2022 · CHI '22 · Kimura expert review · confidence high

SilentSpeller: Towards mobile, hands-free, silent speech text entry using electropalatography

Naoki Kimura, Tan Gemicioglu, Jonathan Womack, Richard Li, Yuhui Zhao, Abdelkareem Bedri, Zixiong Su, Alex Olwal, Jun Rekimoto, Thad Starner

SilentSpeller is a strong, rigorously tested SSI system that reframes silent speech as silent spelling, enabling large vocabulary, live text entry, and walking robustness with in-mouth electropalatography sensors.

Verdict: full-text draft · Priority: high · Confidence: high · Basis: full text + existing expert seed · Coverage: high

Reading guidance

Verdict: full-text draft · priority high · confidence high
Why it matters: The paper's main contribution is the novel problem reframing from silent speech to silent spelling using an electropalatography retainer, which yields practical live text entry over large vocabularies including unseen words, with empirical validation of robustness to walking and comparison to mainstream mobile text input.
What to trust: Basis: full text + existing expert seed. Coverage: high. 8 evidence records back the review.
What is weak: Recognition confusions occur mainly between letters with similar palatograms, especially letters whose names share the "ee" sound (B/P, D/T/Z). The recognizer is strongly user-dependent; user-independent recognition remains poor. Offline tuning experiments rely on data from only two main participants; the live text entry and walking tests include seven users, but under constrained phrase tasks; the vocabulary is limited to English letters and space, without punctuation or capitalization. The system requires a custom-fitted in-mouth SmartPalate retainer with 124 electrodes and a wired interface (a partially wireless prototype now exists). The retainer remains obtrusive, and the per-user training limits scalability and ease of deployment. The system targets discreet text entry and explicitly trades away the naturalness of silent speech for reliability; it is not a conversational silent speech system. Overclaim risk: low-medium.
Read before: SSI review rubric
Read next: SottoVoce: An Ultrasound Imaging-Based Silent Speech Interaction Using Deep Neural Networks

Axes

Task: text-entry using silent spelling
Modality: electropalatography
Hardware: SmartPalate custom dental retainer with 124 capacitive electrodes sampled at 100 Hz, connected wired or wireless to processing device.
Body site: palate; tongue
Output: text
Vocabulary: Dictionary-based silent spelling with triletter HMM decoding, phrase composition, and bigram correction
Metrics: Offline HMM accuracy: ~97% character, 92% word; unseen-word offline test: 94.5% character, 85.5% word; phrase recognition: ~97.5% character accuracy walking vs 96.5% seated; live text entry: average 37 words per minute (wpm) at 87% accuracy, best participant 53 wpm at 91%.
Evaluation mode: Offline isolated word recognition with 10-fold cross-validation; held-out testing on 100 unseen words; seated vs walking phrase recognition; live interactive text entry with push-to-talk interface and edit gestures.
Review confidence: high
Overclaim risk: low-medium
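
To make the Metrics axis easier to sanity-check, here is a minimal sketch of how character accuracy, isolated-word accuracy, and words per minute are conventionally computed in text-entry and speech-recognition work. It assumes edit-distance-based accuracy (1 minus Levenshtein distance over reference length), exact-match word accuracy, and the common five-characters-per-word wpm convention; this card does not state the paper's exact formulas, and all function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - (edit distance / reference length)."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


def word_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Assumed definition: fraction of isolated words recognized exactly."""
    return sum(ref == hyp for ref, hyp in pairs) / len(pairs)


def words_per_minute(transcribed: str, seconds: float) -> float:
    """Text-entry convention: one 'word' is five characters, and the first
    character is excluded because timing starts when it is entered."""
    return (len(transcribed) - 1) / seconds * 60 / 5
```

Under these definitions a single wrong letter in a five-letter word costs 20 points of character accuracy but the whole word in word accuracy, which is consistent with word accuracy sitting below character accuracy throughout the reported numbers.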

Expert take

SilentSpeller offers a carefully validated alternative to classic silent speech interfaces by changing the recognition task from continuous silent speech to discrete silent spelling. This reframing produces a more structured signal that, together with electropalatography sensors and a user-dependent HMM recognizer, supports a large vocabulary of 1164 words at high offline accuracy (~97% character, 92% word). The system supports live interactive text entry at around 37 wpm with 87% accuracy, and phrase recognition holds up while walking (~97.5% character accuracy vs 96.5% seated), which suggests tolerance to motion artifacts. Limitations include the need for custom dental impressions, obtrusive in-mouth hardware, strong user dependence (user-independent recognition remains poor), and open questions about social acceptability. Overall, the work advances SSI toward viable mobile, hands-free text entry in privacy-sensitive or hands-busy scenarios, and it backs its claims with extensive empirical evidence.

True value

The paper's main contribution is the novel problem reframing from silent speech to silent spelling using an electropalatography retainer, which yields practical live text entry over large vocabularies including unseen words, with empirical validation of robustness to walking and comparison to mainstream mobile text input.

What changed

Canon before

Most silent speech interfaces were limited to small vocabularies (~100 words), stationary use, and offline, non-interactive experiments, with little evidence for practical live text entry.

Delta from canon

The key change is reframing the task from silent speech recognition to silent spelling recognition, allowing a larger vocabulary (1164 offline words), robust unseen-word generalization, tolerance to walking motion, and live hands-free text entry at reasonable speeds (~37 wpm average).
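
One way to see why spelling generalizes to unseen words is that the recognizer's units are letters in context rather than whole words. The sketch below expands a word into triletter units by analogy with triphones, using an HTK-style `left-center+right` label and a `sil` boundary token. Both the label format and the boundary token are assumptions for illustration, not details confirmed by this card.

```python
def triletters(word: str, boundary: str = "sil") -> list[str]:
    """Expand a word into context-dependent letter units (assumed notation)."""
    letters = [boundary, *word.lower(), boundary]
    return [
        f"{letters[i - 1]}-{letters[i]}+{letters[i + 1]}"
        for i in range(1, len(letters) - 1)
    ]


def unit_coverage(train_words: list[str], new_word: str) -> float:
    """Fraction of a new word's triletter units already seen in training."""
    seen = {unit for word in train_words for unit in triletters(word)}
    units = triletters(new_word)
    if not units:
        return 0.0
    return sum(unit in seen for unit in units) / len(units)


# Example: "cat" becomes three context-dependent letter units.
print(triletters("cat"))  # ['sil-c+a', 'c-a+t', 'a-t+sil']
```

A word that never appeared in training can still be decoded if most of its units did, which is the intuition behind the unseen-word result; `unit_coverage` is a hypothetical helper for estimating that overlap, not a measure the paper reports.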

Position in field

One of the clearest practical SSI task-reframing papers to date; less natural than silent speech but more usable for mobile text entry.

Evidence

“ SilentSpeller achieves an average of 94% accuracy (86% word). [...] easily recognized than silent speech, allowing larger vocabularies (1164 words in this work) and on-the-go interaction. ”

author_claim · 1 INTRODUCTION · confidence 1.00

“ Table 3: Average 10-fold, cross-validation, user-dependent word accuracy on 2328 isolated words, 1164 unique, using HMMs and deep learning Transformers. Table row, character (word) accuracy: Transformer 37% (9.1%), 34% (8.8%). [...] For the SilentSpeller use cases of silent text entry while mobile, or for people with movement disorders, one or two hours of training data is quite reasonable, especially since such use cases may often use a limited vocabulary [28, 38]. ”

metric · TUNING MODELS · confidence 1.00

“ Figure 1: a) A SilentSpeller user wears the SmartPalate retainer whose 124 electrodes sense the position of the tongue at 100 Hz. ”

fact · 3.2 SMARTPALATE · confidence 1.00

“ Table 3: Average 10-fold, cross-validation, user-dependent word accuracy on 2328 isolated words, 1164 unique, using HMMs and deep learning Transformers. Table row, character (word) accuracy: Transformer 37% (9.1%), 34% (8.8%). [...] For the SilentSpeller use cases of silent text entry while mobile, or for people with movement disorders, one or two hours of training data is quite reasonable, especially since such use cases may often use a limited vocabulary [28, 38]. ”

validation_scope · TUNING MODELS · confidence 1.00
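
For readers checking the validation scope, a 10-fold split over the 2328 isolated-word recordings can be sketched as follows. This is a generic shuffled k-fold partition. How the authors actually built their folds is not stated in this card, for example whether the two recordings of each unique word were kept in the same fold, so treat the procedure and the function name as assumptions.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads any remainder so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each recording is tested exactly once across the 10 folds.
splits = list(k_fold_indices(2328, k=10))
```

With 2328 recordings of 1164 unique words, a plain random split like this one can place one recording of a word in training and the other in testing, which is why the separate 100-word unseen test matters for the generalization claim.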

“ The current system can be made wearable for testing; Figure 1a shows such a system constructed using a Vufine head worn display, the Smart Palate, and the support hardware in a backpack. ”

deployment_claim · 3.2 SMARTPALATE · confidence 1.00

“ Table 3: Average 10-fold, cross-validation, user-dependent word accuracy on 2328 isolated words, 1164 unique, using HMMs and deep learning Transformers. Table row, character (word) accuracy: Transformer 37% (9.1%), 34% (8.8%). [...] For the SilentSpeller use cases of silent text entry while mobile, or for people with movement disorders, one or two hours of training data is quite reasonable, especially since such use cases may often use a limited vocabulary [28, 38]. ”

deployment_claim · 4.3 TUNING USER DEPENDENT RECOGNIZERS · confidence 1.00

“ We have reason to be optimistic: the tongue is relatively isolated from the mechanical shock of walking (otherwise, voiced speech while walking would not be possible) and SilentSpeller’s electrode array fits snugly in the mouth such [...] words from the 107 phrases collected during the seated condition. Similarly, the recognizer for the seated condition was trained with the 2328 dictionary words plus the 556 words from the 107 phrases collected during the walking condition. ”

deployment_claim · 8 DISCUSSION · confidence 1.00

“ [...] of 124 binary electrode values is projected to the top 16 principal components. [...] Similarly, SilentSpeller focuses on recognizing the 26 letters of the alphabet, [...] ”

fact · 3.3 RECOGNIZER PIPELINE · confidence 1.00
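
The pipeline fact above (124 binary electrode values projected to the top 16 principal components) can be illustrated with a small PCA sketch. It uses power iteration with deflation in pure Python so that it runs without dependencies; the authors' actual implementation is not described in this card, and a real pipeline would more likely use a numerical library. Only the constants 124 and 16 come from the quoted text; all function names are illustrative.

```python
import random

N_ELECTRODES = 124  # from the quoted pipeline description
N_COMPONENTS = 16   # from the quoted pipeline description


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _unit(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def fit_pca(frames, k=N_COMPONENTS, iters=100, seed=0):
    """Return (mean, components) for the top-k principal directions.

    frames: list of equal-length 0/1 vectors, one per sensor sample.
    Needs at least two frames. Slow for large inputs (pure Python).
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in frames]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = _unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = _matvec(cov, v)
            if sum(x * x for x in w) ** 0.5 < 1e-12:
                break  # no variance left in the deflated matrix
            v = _unit(w)
        eigval = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflate: remove the variance explained by this direction.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return mean, components


def project(frame, mean, components):
    """Map one electrode frame to its k-dimensional feature vector."""
    return [sum(c[j] * (frame[j] - mean[j]) for j in range(len(mean)))
            for c in components]
```

In use, `fit_pca` would be run once on a user's training frames, and `project` would then turn each 100 Hz frame of `N_ELECTRODES` binary values into `N_COMPONENTS` features for the recognizer.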

Limits

Technical limits

Recognition confusions occur mainly between letters with similar palatograms, especially letters whose names share the "ee" sound (B/P, D/T/Z). The recognizer is strongly user-dependent; user-independent recognition remains poor.

Evaluation limits

Offline experiments rely on data from only two main participants for tuning; live text entry and walking tests include seven users but under constrained phrase tasks; vocabulary is limited to English letters and space without punctuation or capitalization.

Deployment limits

The system requires a custom-fitted in-mouth SmartPalate retainer with 124 electrodes and a wired interface (a partially wireless prototype now exists). The retainer remains obtrusive, and the per-user training limits scalability and ease of deployment.

Scope limits

The system targets discreet text entry, explicitly trading away naturalness of silent speech for reliability; not a conversational silent speech system.