2020 · arXiv / imported corpus page · Field expert review · confidence high

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Kang, Ya‐Hsin Lai, Kai-Chun Liu, Szu‐Wei Fu, Syu‐Siang Wang, Yu Tsao

Strong mobile speech-processing app paper, not SSI.

Verdict: full-text draft · Priority: medium · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Verdict: full-text draft · priority medium · confidence high
Why it matters: Solid mobile speech-processing integration paper, but it is not an SSI contribution.
What to trust: Basis: full text. Coverage: high. 3 evidence records back the review.
What is weak: Inference is cloud-backed rather than fully on-device, and the evaluation is task-specific, so the app is neither a silent-speech interface nor a low-resource on-device solution. Metrics are speech-processing centric and do not establish real-world robustness beyond the tested SNR and scene settings. Scope is audible speech enhancement, adaptation, and noise conversion only. Overclaim risk: medium.
Read before: SSI review rubric
Read next: SSI archive

Axes

Task: speech-enhancement
Modality: acoustic
Hardware: microphone
Output: speech-audio
Metrics: Model adaptation improved STOI by 5.06%, 2.94%, and 5.84% and PESQ by a relative 12.48%, 3.32%, and 11.24% for MA(N), MA(S), and MA(N+S), respectively, over the FCN baseline; the machine-evaluation summary reports BNC accuracy above 90%, with CCR dropping when enhanced speech replaces clean speech
Evaluation mode: objective speech metrics, human listening tests, acoustic-scene classification, and ASR-based machine evaluation
Review confidence: high
Overclaim risk: medium

Expert take

The full text supports a real integration contribution: CITISEN exposes speech enhancement, model adaptation, and background-noise conversion through a mobile application rather than only as isolated models. The results are meaningful for speech-processing deployment, especially the consistent STOI/PESQ gains from adaptation and the >90% BNC scene-accuracy summary. But the scope is audible speech enhancement and noise conversion, not silent-speech sensing or reconstruction, so it should not be presented as an SSI advance.

True value

Solid mobile speech-processing integration paper, but it is not an SSI contribution.

What changed

Canon before

Speech-enhancement work often reported model gains without integrating enhancement, adaptation, and controllable background conversion into a user-facing mobile workflow.

Delta from canon

Packages enhancement, personalized adaptation, and background-noise conversion into a mobile app backed by cloud inference.

Position in field

Speech-processing mobile application adjacent to SSI only through assistive speech enhancement.

Evidence

“ INDEX TERMS speech enhancement, model adaptation, background noise conversion, deep learning, mobile application. ”

author_claim · ABSTRACT · confidence 0.98

“ 5.06%, 2.94%, and 5.84% in terms of STOI, and relative improvements of 12.48%, 3.32%, and 11.24%, in terms of PESQ, respectively, as compared to the baseline. ”

metric · TABLE 6. Average STOI and PESQ scores for different SE models over -2, 0, · confidence 0.97
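
To make the percentages in this record easier to interpret, here is a minimal sketch of how a relative improvement over a baseline is typically computed. It assumes the reported figures are relative changes of the adapted model's score against the baseline score, as the word "relative" in the quoted passage suggests. The function name and the example scores are hypothetical and are not taken from the paper's tables.

```python
def relative_improvement(baseline: float, adapted: float) -> float:
    """Return the relative change of `adapted` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (adapted - baseline) / baseline * 100.0


# Hypothetical scores for illustration only; NOT values from the paper.
hypothetical_baseline_stoi = 0.80
hypothetical_adapted_stoi = 0.84

gain = relative_improvement(hypothetical_baseline_stoi, hypothetical_adapted_stoi)
print(f"relative STOI gain: {gain:.2f}%")
```

Read this way, a 5% relative STOI gain on a baseline of 0.80 is an absolute change of only 0.04, which is worth keeping in mind when weighing the reported percentages.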

“ CITISEN USER INTERFACE AND USAGE. CITISEN has four pages: “speech enhancement,” “background noise conversion,” “uploading,” and “recording,” as … ” / “ … studies [79], [80] have shown that some level of noise contained in the referenced target can also lead to an effective reconstruction of the clean waveform in an SE system. ”

deployment_claim · V. CONCLUSION · confidence 0.95

Limits

Technical limits

Cloud-backed inference and task-specific evaluation mean the app is not a silent-speech or low-resource on-device solution.

Evaluation limits

Metrics are speech-processing centric and do not establish broad real-world robustness beyond the tested SNR and scene settings.
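
To show what a "tested SNR setting" means in practice, the following is a minimal sketch of mixing a speech signal with noise at a chosen SNR. It is a generic illustration, not the paper's data pipeline: the helper names, the synthetic sine "speech", and the Gaussian noise are all assumptions, and only the -2 dB and 0 dB levels come from the (truncated) Table 6 caption quoted above.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the target noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [scale * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins (assumptions): a 220 Hz tone as "speech", Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

for snr_db in (-2, 0):
    mixture, scaled_noise = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, scaled_noise), 6))
```

Results obtained on a fixed grid of such levels say little about SNRs or noise types outside the grid, which is the limitation noted here.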

Deployment limits

Demonstrated as a mobile speech app, but not as a silent-speech interface and not fully on-device.

Scope limits

Audible speech enhancement, adaptation, and noise conversion only.