2020 · arXiv / imported corpus page · Field expert review · confidence high

Listening to Sounds of Silence for Speech Denoising

Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

Strong denoising work, not a silent-speech interface (SSI) paper.

Verdict: full-text draft · Priority: medium · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Verdict: full-text draft · priority medium · confidence high
Why it matters: A strong speech-denoising paper with careful ablations and broad evaluation, but it is not a silent-speech interface paper.
What to trust: Basis: full text. Coverage: high. 4 evidence records back the review.
What is weak: The method assumes speech with natural pauses and still requires accurate silent-interval detection to realize the full gain. Real-world tests are qualitative because clean references are unavailable, and the benchmark comparison is centered on denoising corpora rather than SSI tasks. No articulatory sensing, silent communication, or SSI deployment claim is present. Single-channel speech denoising only. Overclaim risk: low.
Read before: SSI review rubric
Read next: SSI archive

Axes

Task: speech-enhancement
Modality: mono speech audio with detected silent intervals
Hardware: microphone
Output: speech-audio
Metrics: On VoiceBank-DEMAND the model reports PESQ 3.16 and STOI 0.98; silent-interval detection reaches F1 0.869 / accuracy 0.918 on DEMAND and F1 0.807 / accuracy 0.873 on AudioSet.
Evaluation mode: multi-dataset denoising benchmark plus silent-interval detection metrics, ablations, VoiceBank-DEMAND comparison, and real-world tests
Review confidence: high
Overclaim risk: low
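
The detection figures listed under Metrics (precision, recall, F1, accuracy) can be illustrated with a minimal Python sketch. It pools per-frame silent/non-silent labels into a confusion count and derives the four scores. The function name `detection_metrics`, the per-frame boolean labels, and the pooling over frames are assumptions made for illustration; this review does not state how the paper aggregates its scores across clips.

```python
def detection_metrics(predicted, reference):
    """Frame-level scores for silent-interval detection.

    predicted, reference: equal-length sequences of booleans,
    where True marks a frame labeled as silent (hypothetical layout).
    """
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    pairs = [(bool(p), bool(r)) for p, r in zip(predicted, reference)]
    tp = sum(1 for p, r in pairs if p and r)          # silent, detected
    fp = sum(1 for p, r in pairs if p and not r)      # speech flagged as silent
    fn = sum(1 for p, r in pairs if not p and r)      # silent frame missed
    tn = sum(1 for p, r in pairs if not p and not r)  # speech, left alone

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
predicted = [True, True, False, False, True, False, True, False]
reference = [True, False, False, False, True, True, True, False]
print(detection_metrics(predicted, reference))  # all four values are 0.75 here
```

Reporting accuracy alongside F1 is informative because the two diverge when silent frames are rare: a detector that never predicts silence can still score a high accuracy while its recall and F1 drop to zero.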

Expert take

The full text makes the contribution clearer than the abstract alone: the value is not merely that silence exists, but that silent-interval supervision materially improves both interval detection and downstream denoising. Table 2 shows each proposed component matters, and Table 3 shows the method is competitive on VoiceBank-DEMAND despite being designed for harsher SNR ranges. That said, it remains microphone-based speech enhancement rather than silent-speech decoding.

True value

A strong speech-denoising paper with careful ablations and broad evaluation, but it is not a silent-speech interface paper.

What changed

Canon before

Speech denoisers typically operate on the noisy speech signal directly and do not use naturally occurring pauses as explicit noise probes.

Delta from canon

This paper turns silent-interval detection into a supervisory signal for a two-step denoising pipeline.
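
The idea of using pauses as noise probes in a two-step pipeline can be sketched in a few lines of Python. This is not the paper's learned model: it swaps in a classical broadband Wiener-style gain, takes the silent-frame mask as given rather than detecting it, and works on non-overlapping time-domain frames. The names `frame_powers` and `denoise_two_step`, the frame length, and the gain floor are all assumptions chosen for illustration.

```python
import math
import random


def frame_powers(signal, frame_len):
    """Mean power of each complete, non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(s * s for s in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def denoise_two_step(noisy, silent_mask, frame_len=160, gain_floor=0.05):
    """Step 1: estimate noise power from frames flagged as silent.
    Step 2: attenuate every frame with the gain 1 - noise_power / frame_power.

    silent_mask holds one boolean per complete frame (True = no speech).
    Samples after the last complete frame are returned unchanged.
    """
    powers = frame_powers(noisy, frame_len)
    if len(silent_mask) != len(powers):
        raise ValueError("silent_mask needs one entry per complete frame")
    silent_powers = [p for p, is_silent in zip(powers, silent_mask) if is_silent]
    if not silent_powers:
        raise ValueError("need at least one silent frame to probe the noise")
    noise_power = sum(silent_powers) / len(silent_powers)

    out = list(noisy)
    for i, p in enumerate(powers):
        gain = max(1.0 - noise_power / p, gain_floor) if p > 0 else gain_floor
        for j in range(i * frame_len, (i + 1) * frame_len):
            out[j] = noisy[j] * gain
    return out, noise_power


# Synthetic check: five noise-only frames followed by five frames of tone plus noise.
random.seed(0)
frame_len, n_frames = 160, 10
silent_mask = [i < 5 for i in range(n_frames)]
clean = [
    0.0 if silent_mask[n // frame_len] else math.sin(2 * math.pi * 440 * n / 16000)
    for n in range(n_frames * frame_len)
]
noisy = [c + random.gauss(0.0, 0.1) for c in clean]
denoised, noise_power = denoise_two_step(noisy, silent_mask, frame_len)
print(f"estimated noise power: {noise_power:.4f}")
```

The sketch also makes the dependence on detection quality concrete: if speech frames are wrongly flagged as silent, the noise estimate is inflated and the gain suppresses speech along with the noise.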

Position in field

Adjacent speech-enhancement work outside SSI.

Evidence

“ We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. ”

author_claim · Abstract · confidence 0.97

“
Noise Dataset | Method | Precision | Recall | F1 | Accuracy
DEMAND | Baseline-thres | 0.533 | 0.718 | 0.612 | 0.706
DEMAND | VAD | 0.797 | 0.432 | 0.558 | 0.783
DEMAND | Ours | 0.876 | 0.866 | 0.869 | 0.918
Audioset | Baseline-thres | 0.536 | 0.731 | 0.618 | 0.708
Audioset | VAD | 0.736 | 0.227 | 0.338 | 0.728
Audioset | Ours | 0.794 | 0.822 | 0.807 | 0.873
”

metric · Table 1 · confidence 0.96

“ Nevertheless, trained and tested under the same setting, our method is highly competitive to the best of those methods under every metric, as shown in Table 3. ”

metric · Table 3 · confidence 0.96

“ We also test our method against real-world data. ”

limitation · 4.6 Tests on real-world data · confidence 0.92

Limits

Technical limits

The method assumes speech with natural pauses and still requires accurate silent-interval detection to realize the full gain.

Evaluation limits

Real-world tests are qualitative because clean references are unavailable, and the benchmark comparison is centered on denoising corpora rather than SSI tasks.

Deployment limits

No articulatory sensing, silent communication, or SSI deployment claim is present.

Scope limits

Single-channel speech denoising only.