modality:acoustic 30 pages 30 reviewed 0 imported

Acoustic

This page groups the papers in the current SSI review database that carry the `modality:` tag `modality:acoustic`.

The list below includes every paper page that currently carries this modality tag.

Papers

reviewed | CHI '26 / arXiv | 2026

NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction

Jun Rekimoto, Yu Nishimura, Bojian Yang

A strong deployment-focused speech interface leveraging a novel nose-pad dual-sensor configuration and multimodal fusion to enable robust low-audibility speech interaction with AI under noise, backed by extensive evaluation.

reviewed | arXiv / imported corpus page | 2023

Distributed pressure matching strategy using diffusion adaptation

Mengfei Zhang, Junqing Zhang, Jie Chen, Cédric Richard

Presents a distributed rootless pressure-matching method for personal sound zones, validated in simulation only; not an SSI paper.

reviewed | arXiv / imported corpus page | 2023

Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts

Andy Clark

Strong acoustic ASR paper proposing confidence-weighted frame adaptation plus temporal consistency regularization for stable test-time adaptation under wild acoustic conditions, yielding substantial WER improvements across noise, accents, and singing datasets.

reviewed | arXiv / imported corpus page | 2023

Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data

Lionel Fontan, Typhanie Prince, Aleksandra Nowakowska, Halima Sahraoui, Silvia Martínez‐Ferreiro

Strong clinical fluency regression method validated on noisy read speech from aphasia patients; outside core SSI modalities and use-cases.

reviewed | arXiv / imported corpus page | 2023

Exploring how a Generative AI interprets music

Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn, Verónica Sanz

A thorough interpretability analysis reveals that MusicVAE uses only a few dozen latent dimensions to encode music, with pitch and rhythm strongly represented in the first two; the work has no direct relevance to silent speech interfaces.

reviewed | arXiv / imported corpus page | 2023

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

Strong AVS result, outside SSI: the useful idea is audio-conditioned decoder queries plus dynamic mask prediction.

reviewed | arXiv / imported corpus page | 2023

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao

The real gain is not 'diffusion' alone but aligned conditioning plus guidance that pushes synchronization very hard.

reviewed | arXiv / imported corpus page | 2023

Conditional Generation of Audio from Video via Foley Analogies

Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens

The paper matters because it gives V2A generation a controllable exemplar, not because it beats every timing baseline.

reviewed | Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23–28, 2023 | 2023

WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions

Jun Rekimoto

Strong whisper-conversion paper, but it remains whisper-based rather than truly silent SSI.

reviewed | arXiv / imported corpus page | 2022

Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Hassan Taherian, Şefik Emre Eskimez, Takuya Yoshioka

Strong causal PSE paper, not SSI. The pVAD-guided loss is the part that holds up under full-text reading.

reviewed | arXiv / imported corpus page | 2022

An Anchor-Free Detector for Continuous Speech Keyword Spotting

Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Strong CSKWS paper, not SSI. The detection framing and unknown class are the points that hold up in full text.

reviewed | arXiv / imported corpus page | 2022

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information

Chi-Luen Feng, Po‐Chun Hsu, Hung-yi Lee

Strong evidence that silence segments in HuBERT representations uniquely store speaker information, improving SID accuracy when silence is augmented; an analytical SSL probing paper outside the silent speech interface field.

reviewed | arXiv / imported corpus page | 2022

Listen only to me! How well can target speech extraction handle false alarms?

Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Kateřina Žmolíková, Hiroshi Satō, Tomohiro Nakatani

Strong paper on false-alarm handling in TSE, but the wrong domain if someone tries to count it as SSI progress.

reviewed | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 561-565 | 2022

Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals

Xingyu Chen, Qiushi Zhu, Jie Zhang, Li-Rong Dai

Sound classification paper, not SSI.

reviewed | arXiv / imported corpus page | 2021

SA-SDR: A novel loss function for separation of meeting style data

Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb‐Umbach

Elegant loss fix, not SSI.

reviewed | arXiv / imported corpus page | 2021

Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits

Qingjian Lin, Lin Yang, Xuyang Wang, Luyuan Xie, Jia Chen, Junjie Wang

Useful separation engineering, not silent speech.

reviewed | arXiv / imported corpus page | 2021

HTMD-Net: A Hybrid Masking-Denoising Approach to Time-Domain Monaural Singing Voice Separation

Christos Garoufis, Athanasia Zlatintsi, Petros Maragos

Solid time-domain music vocal separation paper with a novel hybrid masking-denoising design showing improved silent-segment suppression; not relevant to SSI applications.