
2018 · arXiv / imported corpus page · Field expert review · confidence medium

Proactive Security: Embedded AI Solution for Violent and Abusive Speech Recognition

Christopher Shulby, Leonardo Pombal, Vitor Jordão, Guilherme Ziolle, Bruno Martho, Antônio Postal, Thiago Prochnow

An embedded smartphone NLP classifier detects violent and abusive speech with about 87.5% validation accuracy using established methods. It is unrelated to silent speech interfaces, but it is a strong practical application in safety alerting.

Verdict: full-text draft · Priority: medium · Confidence: medium · Basis: full text + structured benchmark + summary · Coverage: high

Reading guidance

Verdict
full-text draft · priority medium · confidence medium
Why it matters
Demonstrates embedded violent-speech detection on a smartphone, with a small model footprint and phrase-level NLP classification, enabling proactive silent alerts in real time.
What to trust
Basis: full text + structured benchmark + summary. Coverage: high. 8 evidence records back the review.
What is weak
Continuous listening depends on the Android SpeechRecognizer API, which must be restarted, emits periodic beeps (so it is not fully silent), can be killed by the OS, and limits battery life to about 15 hours of continuous use. Evaluation uses a small 1200-phrase dataset, split randomly into 70% training and 30% validation; metrics are reported only on that internal validation set, with no external or real-world user study. Scope is limited to detecting violent and abusive phrases in Brazilian Portuguese, with no general speech recognition and no silent or articulatory input. Overclaim risk: medium.

Axes

Task
speech-recognition
Modality
microphone
Hardware
smartphone microphone
Output
text
Vocabulary
Mixed; includes both violent and non-violent phrases; binary features for bag-of-words; numeric vector embeddings for word2vec
Metrics
Validation accuracy of 79% (bag-of-words) and 87.5% (word embeddings), with F1 scores of 0.78 and 0.87 respectively; the confusion matrices show the false-positive rate falling from 26% to 6% with embeddings.
Evaluation mode
Proof-of-concept product evaluation with offline corpus split for training and validation.
Review confidence
medium
Overclaim risk
medium
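The accuracy, F1, and false-positive figures listed under Metrics all derive from a 2x2 confusion matrix. The sketch below shows the standard per-true-class definitions. The counts are hypothetical and are not taken from the paper, and the excerpts do not state whether the paper's quoted rates are normalized per true class or per predicted class, so the definitions used here are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Raw counts for a binary violent / non-violent classifier."""
    tp: int  # violent phrases flagged as violent
    fp: int  # non-violent phrases flagged as violent
    tn: int  # non-violent phrases left unflagged
    fn: int  # violent phrases missed


def summarize(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, F1 and false-positive rate.

    Assumes every denominator is non-zero (both classes present and
    at least one positive prediction).
    """
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # also called the true-positive rate
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": c.fp / (c.fp + c.tn),
    }


# Hypothetical counts for a 360-phrase validation split (not from the paper).
example = ConfusionCounts(tp=150, fp=20, tn=160, fn=30)
metrics = summarize(example)
```

For a safety-alerting application the false-positive rate matters as much as accuracy, since every false alarm reaches a human contact; that is why the drop reported with embeddings is the more relevant number.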

Expert take

This paper addresses embedded detection of violent and abusive speech from smartphone audio using NLP-based classifiers. It applies SVMs to bag-of-words and word-embedding features, with SMOTE to correct class imbalance, on a curated 1200-sentence corpus in Brazilian Portuguese. The system runs on Android and uses the native speech recognition API to convert audio to text before classification. Embeddings improve on bag-of-words in both accuracy (87.5% vs. 79%) and F1 (0.87 vs. 0.78). While practical and novel as a silent alerting application, the approach is not silent speech interface technology, and it remains limited by dataset size, OS constraints (microphone access and keeping a persistent service alive), battery consumption, and language scope. Overall, it is a focused contribution to safety monitoring via embedded NLP rather than to silent speech recognition, and it shows promising deployment feasibility with the technical and dataset limitations typical of a proof-of-concept embedded classifier.
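As an illustration of the binary ("boolean") bag-of-words features mentioned above, here is a minimal sketch. The tokenizer, the function names, and the English example phrases are all assumptions made for illustration; the paper's corpus is Brazilian Portuguese and its exact preprocessing is not described in the excerpts.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (a simplistic assumption)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_vocabulary(sentences: list[str]) -> list[str]:
    """Sorted list of every distinct token seen in the training sentences."""
    return sorted({tok for s in sentences for tok in tokenize(s)})


def binary_bow(sentence: str, vocabulary: list[str]) -> list[int]:
    """1 if the vocabulary word occurs in the sentence, else 0.

    Repeated words still map to 1, and words outside the vocabulary
    are ignored.
    """
    present = set(tokenize(sentence))
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical English stand-ins for training phrases.
train = ["give me your phone", "call me later", "give me the money now"]
vocab = build_vocabulary(train)
vector = binary_bow("give me the phone phone", vocab)
```

Vectors of this kind would then be passed to an SVM. Because each dimension only records presence, two phrases with different wording but similar meaning share no signal, which is one plausible reason the embedding features perform better.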

True value

An embedded violent-speech detector that runs on a smartphone with a small model footprint and uses phrase-level NLP classification to trigger proactive silent alerts in real time.

What changed

Canon before

Violent-speech detection was typically keyword-based or reliant on panic buttons requiring user action.

Delta from canon

Replaces keyword or panic-button triggers with NLP-driven phrase classification, using SVMs on bag-of-words and word-embedding features, and augments the minority class with SMOTE to balance the training data.
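The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below is a simplified standard-library version written for illustration; it is not the paper's implementation, and the 2-D vectors, `k`, and the seed are hypothetical.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean)."""
    return sorted(others, key=lambda o: math.dist(point, o))[:k]


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical 2-D minority-class vectors (for example, reduced embeddings).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6)
```

Every synthetic point lies on a segment between two real minority samples, so the method adds no information beyond what the original samples span. With a corpus this small, oversampling should be applied to the training split only, so that synthetic neighbors of validation phrases do not leak into training.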

Position in field

A safety-oriented embedded speech classification system outside silent speech interfaces.

Evidence

“ We propose an embedded artificial intelligence solution, using natural language and speech processing technology, to silently alert someone who can help in this situation. … the highest absolute number of murders in the world in 2012 [2] with 47 thousand murders; that represents 13 in each 100 murders which occurred in the entire world. ”

author_claim · Abstract · confidence 0.95

“ The dataset was built using real violent and abusive situations, found on websites [21] (public police occurrences, FBI, threat emails, investigation data) and scientific articles [22], … Capture: To be able to predict a potentially violent situation, the application needs to capture the audio from the environment and process it using the trained model. ”

validation_scope · III. PROPOSED SOLUTION · confidence 0.95

“ The boolean technique was used to create the feature vectors. Firstly, the dataset was split into two parts: the training set, used to train our SVM classifier, and the test set, to validate it. … true-positive rate was 73%; the false-positive rate was: 26%; the true negative rate was: 86%; and the false-negative rate was: 14%. ”

metric · IV. RESULTS · confidence 0.95

“ In Figure 2, the confusion matrix for the word embeddings method is presented, where the true-positive rate was 94%; the false-positive rate was: … techniques used for Word Embeddings and SVM and attempts to improve accuracy with a minimalistic approach to data augmentation with SMOTE. ”

metric · IV. RESULTS · confidence 0.95

“ … with the microphone should also run uninterruptedly. The native Android speech recognition service, SpeechRecognizer, was used to collect the audio and transcribe it into … Since the data for BP was rather sparse, some phrases were then augmented using the creative license of the developer team or automatic phrase creation with JSGF scripts. ”

deployment_claim · III. PROPOSED SOLUTION · confidence 0.90

“ Currently, we were able to simulate lives tests showing a battery duration of about 15 hours under continuous use. … and support... ”

limitation · VI. FUTURE WORK · confidence 0.90

“ The solution should also be expanded to multiple languages, since until now it’s been developed only for BP, creating new training sets for each scenario and applying the same model for identifying violent and abusive speech. ”

limitation · VI. FUTURE WORK · confidence 0.85

“ The final embedded solution has a small footprint of less than 10 MB. … according to the Statistical Learning Theory and large-margin bounds [18], [19]. ”

actual_novelty · III. PROPOSED SOLUTION · confidence 0.90

Limits

Technical limits

Relies on the Android SpeechRecognizer API, which complicates continuous listening because of enforced beeps and the OS killing the service; effectiveness is limited by the small dataset and the single-language scope.

Evaluation limits

Evaluation is limited to a small 1200-phrase dataset, split randomly into 70% training and 30% validation; performance metrics are given only on this internal validation set, and no external or real-world user study is reported.
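To make the scale of this evaluation concrete, the sketch below performs a seeded 70/30 random split over 1200 stand-in phrase IDs. The function name, the seed, and the use of integer IDs are assumptions for illustration; the excerpts do not say how the paper's split was seeded or whether it was stratified by class.

```python
import random


def random_split(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of items and cut it into train and validation parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in IDs for a 1200-phrase corpus; the real phrases are not reproduced.
phrase_ids = list(range(1200))
train_ids, val_ids = random_split(phrase_ids)
```

The validation part holds only 360 phrases, so a single split gives one fairly noisy estimate. Repeating the split with several seeds, or using k-fold cross-validation, would show how much the reported figures vary.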

Deployment limits

Relies on Android SpeechRecognizer, which requires restarting, drains the battery in about 15 hours of continuous use, and is not fully silent because of periodic beeps; offers no silent or articulatory input methods; limited to Brazilian Portuguese.

Scope limits

Limited to detecting violent and abusive phrases in Brazilian Portuguese; no general speech recognition or silent speech function.