2018 · arXiv / imported corpus page · Field expert review · confidence high

Audio Spectrogram Factorization for Classification of Telephony Signals below the Auditory Threshold

Iroro Orife, Shane Walker, Jason Flaks

arXiv

Strong telephony anti-SPAM paper, not SSI.

Verdict: full-text draft · Priority: low · Confidence: high · Basis: full text · Coverage: high

Reading guidance

Verdict: full-text draft · priority low · confidence high
Why it matters: This is a solid production telephony SPAM classifier for sub-audible dead-air calls, not a silent speech interface paper.
What to trust: Basis: full text. Coverage: high. 4 evidence records back the review.
What is weak: No speech content is decoded; the method only separates telephony SPAM from HAM and depends on this specific low-amplitude robocall regime. The evidence is from a proprietary and class-imbalanced labeled set rather than a public benchmark with reproducible splits. Useful only inside comparable VoIP anti-SPAM systems with similar latency and business costs for false positives. Outside SSI scope. Overclaim risk: Low for telephony anti-SPAM claims, but high if the paper is presented as an SSI contribution.
Read before: SSI review rubric
Read next: SSI archive

Axes

Task: audio classification / telephony SPAM detection
Modality: telephony call audio
Hardware: VoIP / telephony call audio captured from the production call stack
Output: labels
Metrics: Table 1 reports Random Forest precision 83.82%, recall 63.27%, accuracy 90.40%. That beats both reported linear SVC baselines on precision and accuracy; recall is well above Linear SVC with SGD (52.56%) but marginally below plain Linear SVC (63.60%). The paper also notes 10,000 to 33,000 silent calls per day during traffic-pumping attacks and a strict two-second screening window.
Evaluation mode: Cross-validation on labeled telephony SPAM / HAM calls with a deployment-minded two-second latency budget before call bridging.
Review confidence: high
Overclaim risk: Low for telephony anti-SPAM claims, but high if the paper is presented as an SSI contribution.
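
For readers comparing the Table 1 figures, the sketch below shows how precision, recall, and accuracy follow from confusion-matrix counts, and why a class-imbalanced set can show high accuracy alongside modest recall. The counts are hypothetical and chosen only for illustration; the paper's confusion matrix is not reproduced in this review, and the function name `classification_metrics` is an assumption of this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts.

    tp: SPAM calls flagged as SPAM      fp: HAM calls flagged as SPAM
    fn: SPAM calls passed as HAM        tn: HAM calls passed as HAM
    """
    precision = tp / (tp + fp)                  # how trustworthy a SPAM flag is
    recall = tp / (tp + fn)                     # share of SPAM actually caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall agreement with labels
    return precision, recall, accuracy


# Hypothetical, imbalanced example: 100 SPAM calls among 500 labeled calls.
precision, recall, accuracy = classification_metrics(tp=60, fp=10, fn=40, tn=390)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
# prints "precision=0.8571 recall=0.6000 accuracy=0.9000"
```

In this made-up case the classifier misses 40% of SPAM calls yet is 90% accurate, because the HAM majority dominates the accuracy term. The same effect is one reason the accuracy figure should be read together with precision and recall.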

Expert take

The full text supports a real deployment story: the system uses only the first two seconds of a call, factorizes the spectrogram with SVD, and chooses a 100-tree Random Forest because precision matters more than recall in a call-routing business. Table 1 gives the key numbers: 83.82% precision, 63.27% recall, and 90.40% accuracy. That is meaningful for screening dead-air robocalls at scale, but nothing in the paper recovers linguistic content or builds a silent speech interface (SSI). It should stay in the archive only as a clearly labeled distractor or adjacent audio-classification reference.
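
The feature-extraction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 8 kHz sample rate, the frame and hop sizes, the Hann window, and the choice of the leading `k` singular values as the feature vector are all assumptions of this example, and the helper names `magnitude_spectrogram` and `svd_features` are hypothetical.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft over each frame, then transpose to (frequency bins, time frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T


def svd_features(signal, sample_rate=8000, seconds=2.0, k=5):
    """Factorize the spectrogram of the first `seconds` of audio with SVD.

    Returns the k largest singular values (assumed feature choice).
    """
    clip = signal[: int(sample_rate * seconds)]  # two-second screening window
    spec = magnitude_spectrogram(clip)
    singular_values = np.linalg.svd(spec, compute_uv=False)  # sorted descending
    return singular_values[:k]


# Synthetic stand-in for a very low-amplitude "dead air" call (3 s of faint noise).
rng = np.random.default_rng(0)
call = 1e-4 * rng.standard_normal(3 * 8000)
features = svd_features(call)
print(features.shape)  # (5,)
```

Feature vectors of this kind could then be passed to a tree ensemble, for example scikit-learn's `RandomForestClassifier(n_estimators=100)`, to match the 100-tree setup the review describes. Only the first two seconds of the signal influence the features, which mirrors the screening window before call bridging.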

True value

This is a solid production telephony SPAM classifier for sub-audible dead-air calls, not a silent speech interface paper.

What changed

Canon before

Telephony anti-SPAM systems typically relied on signaling, metadata, or higher-energy acoustic cues rather than sub-audible dead-air classification.

Delta from canon

The paper reframes one robocall defense problem as low-amplitude spectrogram classification within a two-second production screening window.

Position in field

Outside SSI scope; best understood as adjacent low-amplitude audio classification rather than speech-interface research.

Evidence

“ We propose a technique to classify so-called “dead air” or “silent” SPAM calls based on features derived from factorizing the caller audio spectrogram. ”

author_claim · Abstract · confidence 1.00

“ Given the first two seconds of a call, we demonstrate the efficacy of a Random Forest classifier trained on these features to classify SPAM at production scale. ”

fact · 1 Introduction · confidence 1.00

“ Model                  Precision (%)  Recall (%)  Accuracy (%)
Linear SVC             58.57          63.60       84.04
Linear SVC with SGD    81.85          52.56       76.31
Random Forest          83.82          63.27       90.40 ”

“ Evaluating the system performance entails comparing the model’s predicted value to ground truth. ”

metric · Table 1. Classifier performance · confidence 1.00

“ During a TP attack, an extra 10,000 to 33,000 “silent” calls per day may be handled by the telephony stack. ”

validation_scope · 5 Experiments · confidence 1.00

Limits

Technical limits

No speech content is decoded; the method only separates telephony SPAM from HAM and depends on this specific low-amplitude robocall regime.

Evaluation limits

The evidence is from a proprietary and class-imbalanced labeled set rather than a public benchmark with reproducible splits.

Deployment limits

Useful only inside comparable VoIP anti-SPAM systems with similar latency and business costs for false positives.

Scope limits

Outside SSI scope.