Acoustic
This page lists the papers in the current SSI review database that carry the `modality:acoustic` tag.
The list below includes every paper page that currently carries this modality tag.
Papers
NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
A strong deployment-focused speech interface leveraging a novel nose-pad dual-sensor configuration and multimodal fusion to enable robust low-audibility speech interaction with AI under noise, backed by extensive evaluation.
Distributed pressure matching strategy using diffusion adaptation
Presents distributed rootless pressure matching for personal sound zones, validated in simulation; not an SSI paper.
Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts
Strong acoustic ASR paper proposing confidence-weighted frame adaptation plus temporal consistency regularization for stable test-time adaptation under wild acoustic conditions, yielding substantial WER improvements across noise, accents, and singing datasets.
Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data
Strong clinical fluency regression method validated on noisy read speech from aphasia patients; outside core SSI modalities and use-cases.
Exploring how a Generative AI interprets music
A thorough interpretability analysis reveals that MusicVAE uses only a few dozen latent dimensions to encode music, with pitch and rhythm strongly represented in the first two, but the work has no direct relevance to silent speech interfaces.
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Strong AVS result, outside SSI: the useful idea is audio-conditioned decoder queries plus dynamic mask prediction.
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
The real gain is not 'diffusion' alone but aligned conditioning plus guidance that pushes synchronization very hard.
Conditional Generation of Audio from Video via Foley Analogies
The paper matters because it gives V2A generation a controllable exemplar, not because it beats every timing baseline.
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Strong whisper-conversion paper, but it remains whisper-based rather than truly silent SSI.
Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation
Strong causal PSE paper, not SSI. The pVAD-guided loss is the part that holds up under full-text reading.
An Anchor-Free Detector for Continuous Speech Keyword Spotting
Strong CSKWS paper, not SSI. The detection framing and unknown class are the points that hold up in full text.
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Strong evidence that silence segments in HuBERT representations uniquely store speaker information, improving SID accuracy when silence is augmented; an analytical SSL probing paper outside the silent speech interface field.
Listen only to me! How well can target speech extraction handle false alarms?
Strong paper on false-alarm handling in TSE, but the wrong domain to count as SSI progress.
Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Sound classification paper, not SSI.
SA-SDR: A novel loss function for separation of meeting style data
Elegant loss fix, not SSI.
Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits
Useful separation engineering, not silent speech.
HTMD-Net: A Hybrid Masking-Denoising Approach to Time-Domain Monaural Singing Voice Separation
Solid time-domain music vocal separation paper with a novel hybrid masking-denoising design showing improved silent-segment suppression; not relevant to SSI applications.
End-to-end Silent Speech Recognition with Acoustic Sensing
Strong mobile-friendly acoustic SSI paper.
X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network
Strong time-domain target-speaker extraction using speaker verification and innovative training; improves robustness to an absent target speaker but remains speech extraction, not silent speech.
Listening to Sounds of Silence for Speech Denoising
Strong denoising work, not SSI.
End-to-End Speaker-Dependent Voice Activity Detection
Strong target-speaker VAD paper, not SSI.
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
Strong AV speech survey, not an SSI system paper.
CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application
Strong mobile speech-processing app paper, not SSI.
Learning Frame Level Attention for Environmental Sound Classification
Strong ESC paper, but outside SSI.
Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed
This work delivers an improved waveform source separation model combined with a novel remix-based semi-supervised learning scheme using unlabeled music. Though not related to silent speech, it narrows the gap to spectrogram-based methods on music separation benchmarks.
Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
The proposed frame-level attention integrated within a convolutional recurrent network effectively improves environmental sound classification accuracy on ESC benchmarks by focusing on informative temporal frames while suppressing irrelevant or silent ones.
All-neural online source separation, counting, and diarization for meeting analysis
Strong online diarization/separation paper, but outside SSI.
Audio Spectrogram Factorization for Classification of Telephony Signals below the Auditory Threshold
Strong telephony anti-spam paper, not SSI.
Cross-modal Embeddings for Video and Audio Retrieval
Useful multimodal retrieval baseline, not SSI.
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Strong audiovisual speech separation and enhancement work that uses face video for speaker-dependent masking; not a silent speech interface paper.