Using LLM-Generated Data To Create A Roman Urdu Scam Call Detector

Sameed Irfan; Aswad Sheeraz; Muhammad Hasnain

Authors

Sameed Irfan, Department of Computer Science, Bloomfield Hall College, Multan, Pakistan
Aswad Sheeraz, Department of Computer Science, Beaconhouse College Program, Multan, Pakistan
Muhammad Hasnain, Department of Computer Science, Beaconhouse College Program, Multan, Pakistan

Keywords:

LLM, Scam Call Detection, Machine Learning, Training Models with Synthetic Data, Urdu Scam Call Detector

Abstract

Scam calls are on the rise, with losses expected to exceed $1 trillion globally in 2024. While Machine Learning has been effective in countering scam calls, the dominant models still have clear shortcomings. Most models can only detect scam calls in a single language, and LLM-based solutions, though they can be multilingual, are impractical because of the resources they require. Furthermore, scam call tactics change constantly, so many models quickly become outdated. To address these challenges, this paper proposes a framework in which a model is trained on LLM-generated data, allowing for a multilingual and easy-to-update dataset. To test the accuracy of such models, a small dataset of human-written scam and non-scam call dialogues was used. The model, trained on synthetic data and tested on real-world scam call data, achieved on average over 90% accuracy and F1 score.
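The train-on-synthetic, test-on-human-written protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the classifier is a small bag-of-words Naive Bayes chosen only as a stand-in, the Roman Urdu sentences are invented placeholders rather than data from the paper, and all function names (`train_nb`, `predict_nb`, `accuracy_and_f1`) are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; Roman Urdu needs no special script handling.
    return text.lower().split()

def train_nb(samples):
    """Fit a bag-of-words Naive Bayes. samples: (text, label), 1 = scam, 0 = not scam."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive (scam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

# Placeholder "LLM-generated" training dialogues (invented for illustration).
synthetic_train = [
    ("aap ka atm card block ho gaya hai apna pin code batayein", 1),
    ("mubarak ho aap ne inaam jeeta hai apna otp code bhejein", 1),
    ("kal shaam ko khana khane ghar aa jana", 0),
    ("meeting ka waqt kal subah das baje hai", 0),
]
# Placeholder "human-written" test dialogues (invented for illustration).
human_test = [
    ("apna pin code abhi batayein warna account block ho jayega", 1),
    ("aaj shaam ko ghar aa jana", 0),
]

model = train_nb(synthetic_train)
y_true = [label for _, label in human_test]
y_pred = [predict_nb(model, text) for text, _ in human_test]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

Keeping the training and test sets from different sources, as here, is what lets the reported scores reflect performance on real calls rather than on the generator's own style.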
