IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases
This paper introduces FERASEC, a radar-specific feature extraction algorithm that enables what the authors present as the first contactless, phoneme-level silent speech recognition with IR-UWB radar. It reaches 86.47% vowel and 81.59% consonant accuracy, well above raw-signal baselines, and marks a key step toward practical silent speech interfaces.
Reading guidance
- Verdict
- full-text draft · priority high · confidence high
- Why it matters
- A radar-specific feature extraction method and system design that enable contactless phoneme-level silent speech recognition, a milestone beyond previous word-only radar SSR demos, evaluated on data from 20 participants with classifier comparisons.
- What to trust
- Basis: full text. Coverage: high. 9 evidence records back the review.
- What is weak
- Performance depends on articulator alignment; with the upper radar, the teeth can block the signal and obscure tongue motion; FERASEC depends on effective clutter mitigation and engineered transforms. The evaluation is a controlled lab study with prompted, closed-set isolated items and short phrases and limited speaker diversity (mostly Korean speakers, with some American English speakers); open-vocabulary, spontaneous, and conversational speech were not tested. Daily use would need robust handling of user position, mitigation of signal blockage, and hardware integration into small devices. Overclaim risk: medium.
- Read before
- SSI review rubric
- Read next
- SSI archive
Axes
- Task
- speech-recognition
- Modality
- radar
- Hardware
- Upper and lower IR-UWB radar sensors: the upper uses patch antennas facing the lips; the lower uses sinuous antennas below the chin.
- Body site
- lip
- Output
- text
- Vocabulary
- 8 vowels, 11 consonants, 25 words, and 12 phrases
- Metrics
- Classification accuracy, evaluated via leave-one-out cross-validation across 20 repetitions per phoneme/word/phrase: 86.47% for vowels, 81.59% for consonants, 88.95% for words, and 96.88% for phrases, using FERASEC + DNN-HMM on the upper radar.
- Evaluation mode
- Leave-one-out cross-validation across repeated prompted articulation; separate experiments per speech unit type and radar position.
- Review confidence
- high
- Overclaim risk
- medium
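The evaluation mode listed above (leave-one-out cross-validation over repeated prompted articulations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the nearest-centroid classifier is a simple stand-in for the paper's DNN-HMM, and the held-out unit is a single sample, since this card does not say whether the paper holds out a repetition, a session, or a participant.

```python
import random


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (NOT the paper's DNN-HMM): choose the class
    whose mean feature vector is closest to the query."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for held_out in range(len(features)):
        train_x = features[:held_out] + features[held_out + 1:]
        train_y = labels[:held_out] + labels[held_out + 1:]
        predicted = nearest_centroid_predict(train_x, train_y, features[held_out])
        if predicted == labels[held_out]:
            correct += 1
    return correct / len(features)


def make_synthetic_dataset(num_classes=8, reps=20, dim=4, noise=0.3, seed=0):
    """Hypothetical data: 8 classes x 20 repetitions, mirroring the vowel set size."""
    rng = random.Random(seed)
    features, labels = [], []
    for label in range(num_classes):
        center = [rng.uniform(-3, 3) for _ in range(dim)]
        for _ in range(reps):
            features.append([c + rng.gauss(0, noise) for c in center])
            labels.append(label)
    return features, labels


features, labels = make_synthetic_dataset()
accuracy = leave_one_out_accuracy(features, labels)
print(f"LOOCV accuracy on synthetic data: {accuracy:.2%}")
```

With 20 repetitions per item, each fold trains on almost all of the data, so the estimate has low bias. It can still be optimistic if repetitions from the same speaker and session appear on both sides of the split.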
Expert take
This work presents the first successful demonstration of contactless phoneme-level silent speech recognition using IR-UWB radar. It introduces FERASEC, a radar-specific feature extraction method that transforms raw 2D radar frames into abbreviated envelopes, capturing articulator movements despite noise and clutter. Evaluations on 8 vowels, 11 consonants, 25 words, and 12 phrases show classification accuracies of 86.47% for vowels and 81.59% for consonants using DNN-HMM classifiers with the radar antenna positioned near the lips, outperforming baseline methods and prior radar SSR studies. Comparing the two radar placements shows that the upper, lip-facing radar better captures the lip motions needed for phoneme discrimination. Raw radar data and end-to-end deep learning on raw input perform poorly (<50%), whereas the engineered FERASEC features enable robust recognition. Controlled, prompted speech data from 20 participants give strong support for the phoneme recognition claim, although conversational and open-vocabulary scenarios remain untested. The study also discusses practical deployment issues, such as logic that helps users align with the sensor and possible integration into consumer devices, marking a significant advance toward real-world contactless silent speech interfaces.
True value
A radar-specific feature extraction method and system design that enable contactless phoneme-level silent speech recognition, a milestone beyond previous word-only radar SSR demos, evaluated on data from 20 participants with classifier comparisons.
What changed
Canon before
Prior radar-based SSR work was limited to small word-level demos, with weak or unproven contactless phoneme-level recognition.
Delta from canon
Introduces FERASEC and demonstrates meaningful contactless IR-UWB radar SSR accuracy on phonemes, words, and phrases, and considers radar antenna placement and alignment-aiding logic.
Position in field
Core contactless SSI work establishing radar phoneme recognition feasibility with pragmatic sensor placement and aiding algorithms.
Evidence
“ to FMCW radar-based SSR up to the present date. Leveraging the high-performance potential of IR-UWB radar, our study aims to accomplish phoneme-level speech recognition. […] tactless radar-based SSR of phonemes, including both vowels and consonants. • A novel speech feature extraction algorithm was pro- ”
author_claim · ABSTRACT · confidence 0.95
“ SPEECH STIMULI AND PARTICIPANTS All speech stimuli (8 vowels, 11 consonants, 25 words, and 12 phrases) used in this study are based on [9]. […] mary focus of our study and more challenging than word or sentence recognition, were evaluated with data from all 20 participants. ”
fact · A SPEECH STIMULI · confidence 0.99
“ MOTIVATION OF FERASEC The motivation behind our feature extraction algorithm, which converts the frame set into an abbreviated envelope of the concatenated frames, is rooted in the inefficacy of raw IR-UWB radar data as a representation (feature) for SSR. […] $z_k = v_k - \bar{v}$ (13), $\bar{v} = \frac{1}{\lfloor MN/D \rfloor} \sum v_k$ (14) ”
actual_novelty · A PROPOSED FEATURE EXTRACTION ALGORITHM · confidence 0.98
“ Each frame in the set represents the normalized signal amplitudes corresponding to 256 fast-time indices that indicate the target distance from the radar. Therefore, the information regarding articulator movements […] ⌊M/4⌋. Two individual one-dimensional speech feature sequences are generated by applying the same transformation algorithm to the raw frame set and its clutter-reduced frame set. ”
actual_novelty · A PROPOSED FEATURE EXTRACTION ALGORITHM · confidence 0.98
“ SPEECH STIMULI AND PARTICIPANTS All speech stimuli (8 vowels, 11 consonants, 25 words, and 12 phrases) used in this study are based on [9]. […] mary focus of our study and more challenging than word or sentence recognition, were evaluated with data from all 20 participants. ”
validation_scope · A SPEECH STIMULI · confidence 0.98
“ Our study achieved average classification accuracies of 86.47%, 81.59%, 88.95%, and 96.88% for the vowels, consonants, words, and phrases, respectively. […] back vowel /o/ can be attributed to its requirement of distinct lip protrusion, which can be effectively detected by the upper ”
metric · HMM · confidence 0.99
“ Since our study suggests a single radar placed in front of the lips for IR-UWB radar-based SSR, potential use cases are illustrated in Fig. […] Before developing the proposed feature extraction algorithm (i.e., FERASEC), we attempted end-to-end deep learning, a method that does not rely on explicitly engineered features, ”
metric · 1 Necessity of Developing a Feature Extraction Algorithm for IR · confidence 0.98
“ To implement an IR-UWB radar-based contactless SSR system, Shin and Seo [28] proposed a method to extract the distance and correlation amplitude from raw radar measurements as speech features for SSR. […] without the need to place sensors inside the oral cavity. Although these techniques are more convenient than the aforementioned ones, they have some shortcomings. ”
deployment_claim · BASED SSR STUDIES · confidence 0.95
“ 3 or 4 during silent pronunciation, it remains challenging to define and extract suitable speech features from this data that facilitate the recognition of phonemes within the silently uttered sentence “How are you doing?”. […] “Good-bye,” “I don’t know,” and “What happened?” Twenty participants (13 males and 7 females), aged between 20 and 28, were recruited for the experiment. ”
deployment_claim · 2 IR · confidence 0.95
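The mean-removal step quoted in the FERASEC evidence record (equations 13 and 14) can be sketched in a few lines. This is an illustration of that single step only, not the full FERASEC pipeline. The summation limits are not visible in the excerpt, so the sketch assumes the sum runs over the whole sequence and that the sequence length equals the ⌊MN/D⌋ normalizer. The function name and sample values are hypothetical.

```python
def mean_center(v):
    """Subtract the sequence mean from every element: z_k = v_k - v_bar.

    Assumption: len(v) plays the role of floor(MN/D) in equation (14).
    """
    v_bar = sum(v) / len(v)
    return [v_k - v_bar for v_k in v]


# Hypothetical envelope samples, chosen only to make the arithmetic easy to follow.
envelope = [1.0, 2.0, 3.0, 6.0]
centered = mean_center(envelope)
print(centered)
```

Removing the mean discards the constant offset of the sequence, so later stages see only the variation around it.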
Limits
Technical limits
Performance depends on articulator alignment; with the upper radar, the teeth can block the signal and obscure tongue motion; FERASEC depends on effective clutter mitigation and engineered transforms.
Evaluation limits
Controlled lab study with prompted closed-set items and limited speaker diversity (mostly Korean speakers, with some American English speakers); no open-vocabulary or conversational speech was tested.
Deployment limits
Needs robust handling of user position, mitigation of signal blockage (e.g. by the teeth), and hardware integration into small devices for daily use.
Scope limits
Closed-set, prompted recognition of isolated items and short phrases; no open vocabulary, spontaneous speech, or broader speaker variability was tested.
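To put the closed-set accuracies in context, the short sketch below compares each reported figure with the chance level for its vocabulary size. It assumes balanced classes (consistent with the 20 repetitions per item noted under Metrics) and uniform random guessing. The dictionary and function names are illustrative.

```python
# Vocabulary sizes and reported accuracies (percent) as listed in this review.
vocabulary_sizes = {"vowels": 8, "consonants": 11, "words": 25, "phrases": 12}
reported_accuracy = {"vowels": 86.47, "consonants": 81.59, "words": 88.95, "phrases": 96.88}


def chance_accuracy(num_classes):
    """Accuracy (percent) of uniform random guessing over balanced classes."""
    return 100.0 / num_classes


for unit, size in vocabulary_sizes.items():
    chance = chance_accuracy(size)
    margin = reported_accuracy[unit] - chance
    print(f"{unit:<10} chance {chance:5.2f}%  reported {reported_accuracy[unit]:5.2f}%  margin {margin:5.2f} pts")
```

The margins over chance are large for every unit type. Chance is also the only baseline a closed set guarantees, so these figures do not predict performance on open-vocabulary or spontaneous speech.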