
2020 · arXiv / imported corpus page · Field expert review · confidence high

Listening to Sounds of Silence for Speech Denoising

Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, Changxi Zheng

Strong denoising work, not SSI.

Verdict: full-text draft · Priority: medium · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Verdict
full-text draft · priority medium · confidence high
Why it matters
A strong speech-denoising paper with careful ablations and broad evaluation, but it is not a silent-speech interface paper.
What to trust
Basis: full text. Coverage: high. 4 evidence records back the review.
What is weak
The method assumes speech with natural pauses and needs accurate silent-interval detection to realize its full gain. Real-world tests are qualitative because clean references are unavailable, and the benchmark comparison centers on denoising corpora rather than SSI tasks. The paper makes no articulatory-sensing, silent-communication, or SSI deployment claim, and it covers single-channel speech denoising only. Overclaim risk: low.
Read before
SSI review rubric
Read next
SSI archive

Axes

Task
speech-enhancement
Modality
mono speech audio with detected silent intervals
Hardware
microphone
Output
speech-audio
Metrics
On VoiceBank-DEMAND the model reports PESQ 3.16 and STOI 0.98; silent-interval detection reaches F1 0.869 / accuracy 0.918 on DEMAND and F1 0.807 / accuracy 0.873 on AudioSet.
Evaluation mode
multi-dataset denoising benchmark plus silent-interval detection metrics, ablations, VoiceBank-DEMAND comparison, and real-world tests
Review confidence
high
Overclaim risk
low

Expert take

The full text makes the contribution clearer than the abstract alone: the value is not merely that silence exists, but that silent-interval supervision materially improves both interval detection and downstream denoising. Table 2 shows each proposed component matters, and Table 3 shows the method is competitive on VoiceBank-DEMAND despite being designed for harsher SNR ranges. That said, it remains microphone-based speech enhancement rather than silent-speech decoding.

True value

A strong speech-denoising paper with careful ablations and broad evaluation, but it is not a silent-speech interface paper.

What changed

Canon before

Speech denoisers typically treat speech regions directly and do not use naturally occurring pauses as explicit noise probes.

Delta from canon

This paper turns silent-interval detection into a supervisory signal for a two-step denoising pipeline.
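
The noise-probe idea can be illustrated with a classical stand-in. The sketch below is not the paper's learned detector or noise estimator: it marks frames as silent with a hand-set energy threshold and averages their power as a crude noise estimate. The function names, the non-overlapping framing, and the threshold value are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_silent_frames(samples, frame_len, threshold):
    """Flag a frame as silent when its energy falls below a hand-set threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return [energy < threshold for energy in frame_energies(samples, frame_len)]


def noise_power_from_silence(samples, frame_len, threshold):
    """Average the energy of the silent frames as a rough noise-power estimate.

    Returns None when no frame is flagged as silent, which mirrors the
    dependence on natural pauses noted in the review.
    """
    silent = [e for e in frame_energies(samples, frame_len) if e < threshold]
    if not silent:
        return None
    return sum(silent) / len(silent)


# Toy signal: a quiet frame, a loud (speech-like) frame, another quiet frame.
toy_signal = [0.01, -0.01, 0.01, -0.01,
              0.5, -0.5, 0.5, -0.5,
              0.02, -0.02, 0.02, -0.02]
silent_mask = detect_silent_frames(toy_signal, frame_len=4, threshold=0.01)
noise_power = noise_power_from_silence(toy_signal, frame_len=4, threshold=0.01)
```

A fixed threshold of this kind breaks down when the noise is as loud as the speech, which is one reason a learned detector is attractive in the low-SNR settings the paper targets.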

Position in field

Adjacent speech-enhancement work outside SSI.

Evidence

“ We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. ”

author_claim · Abstract · confidence 0.97

Noise Dataset   Method           Precision   Recall   F1      Accuracy
DEMAND          Baseline-thres   0.533       0.718    0.612   0.706
DEMAND          VAD              0.797       0.432    0.558   0.783
DEMAND          Ours             0.876       0.866    0.869   0.918
Audioset        Baseline-thres   0.536       0.731    0.618   0.708
Audioset        VAD              0.736       0.227    0.338   0.728
Audioset        Ours             0.794       0.822    0.807   0.873

metric · Table 1 · confidence 0.96
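
For readers unfamiliar with the four detection metrics in Table 1, the following is a minimal sketch of how they can be computed from binary per-frame labels. It assumes that "silent" is the positive class and that scoring is done frame by frame over a whole recording; the paper's exact evaluation protocol (for example, per-clip averaging) is not stated in this review, so the sketch is not expected to reproduce the table's numbers. The function name `interval_metrics` is hypothetical.

```python
def interval_metrics(predicted, reference):
    """Frame-level precision, recall, F1 and accuracy for binary silence labels.

    Both inputs are equal-length sequences of 0/1, where 1 marks a silent frame
    (treated here as the positive class, which is an assumption).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # silent, correctly found
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # speech flagged as silent
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # silent frame missed
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # speech, correctly kept

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
scores = interval_metrics(
    predicted=[1, 1, 0, 0, 1, 0, 1, 0],
    reference=[1, 0, 0, 0, 1, 1, 1, 0],
)
```

Read this way, the VAD rows in Table 1 pair high precision with low recall: the frames it flags as silent are mostly right, but it misses most silent frames.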

“ Nevertheless, trained and tested under the same setting, our method is highly competitive to the best of those methods under every metric, as shown in Table 3. ”

metric · Table 3 · confidence 0.96

“ We also test our method against real-world data. ”

limitation · 4.6 Tests on real-world data · confidence 0.92

Limits

Technical limits

The method assumes speech with natural pauses and still requires accurate silent-interval detection to realize the full gain.

Evaluation limits

Real-world tests are qualitative because clean references are unavailable, and the benchmark comparison is centered on denoising corpora rather than SSI tasks.

Deployment limits

No articulatory sensing, silent communication, or SSI deployment claim is present.

Scope limits

Single-channel speech denoising only.