Resource Aware Financial Text Classification Using Hybrid Vectorizers and Tree Based Models
Students & Supervisors
Student Authors
Supervisors
Abstract
Detecting financial fraud in corporate filings is challenging because of the scale and variability of textual disclosures. This study proposes a resource-aware hybrid framework that integrates traditional machine learning classifiers with transformer-based models and uses adversarial training to simulate real-world obfuscation. Five vectorization methods and five classifiers were systematically evaluated on accuracy, execution time, memory usage, CPU utilization, inference latency, and robustness. XGBoost combined with TF-IDF achieved perfect classification (F1 = 1.00; AUC-ROC = 1.00) with under four seconds of training time, 1.8 GB of RAM, 35% CPU utilization, and 11 ms per-sample inference latency. It also obtained the highest composite resource-accuracy score, confirming its suitability for large-scale deployment. Compared with transformer-only baselines, this configuration reduced the memory footprint by 72% and training time by a factor of 12 to 15 while maintaining perfect detection. Although transformer-based models captured richer semantic information, they incurred significantly higher computational costs without improving classification performance. These findings highlight the practicality of lightweight vectorizers for financial fraud detection and establish a benchmark for future resource-efficient and domain-robust implementations.
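The resource metrics reported in the abstract (training time and memory use alongside accuracy) can be illustrated with a small measurement harness. The sketch below is not the authors' pipeline: it assumes a pure-Python TF-IDF with whitespace tokenization, smoothed IDF, and L2 normalization, and it uses the standard-library `time` and `tracemalloc` modules to record wall-clock time and peak Python-level memory. The function names `tfidf_fit_transform` and `profile` and the three sample sentences are hypothetical. CPU utilization and the classifier itself are outside its scope.

```python
import math
import time
import tracemalloc
from collections import Counter


def tfidf_fit_transform(docs):
    """Return (vocabulary, rows); each row maps term -> L2-normalized TF-IDF weight."""
    # Assumption: lowercase whitespace tokenization, chosen only for illustration.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so that document length does not dominate the weights.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return sorted(idf), rows


def profile(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical toy corpus; real filings would be far longer.
docs = [
    "revenue recognized before delivery of goods",
    "revenue recognized after delivery of goods",
    "auditor resigned before annual report",
]
(vocab, rows), seconds, peak = profile(tfidf_fit_transform, docs)
print(f"{len(vocab)} terms, {seconds * 1000:.2f} ms, peak {peak / 1024:.1f} KiB")
```

The same `profile` wrapper could be placed around the fit and predict calls of any vectorizer and classifier pair, which gives one way to collect comparable time and memory figures across configurations. Note that `tracemalloc` only tracks allocations made through Python's allocator, so memory used inside native libraries would need a process-level measurement instead.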
Keywords
Publication Details
- Type of Publication:
- Conference Name: 3rd International Conference on Big Data, IoT and Machine Learning (BIM 2025)
- Date of Conference: 25/09/2025
- Venue: Dhaka International University, Bangladesh
- Organizer: Springer, DIU