A Transformer-Based BanglaBERT Approach for Detecting Harmful Content in Bengali Social Media Landscape
Students & Supervisors
Student Authors
Supervisors
Abstract
Social media connect individuals across vast geographical distances, fostering free expression. However, this freedom can be undermined by online harassment, often manifested in comment sections, which causes emotional distress, reputational damage, and even physical harm. Automated systems have been developed to detect and remove such harmful content, but in the pursuit of higher accuracy they often overlook concerns of bias and loss. This study addresses cyberbullying detection in Bangla, a widely spoken language. We use BanglaBERT (small) for the classification task on a large, publicly available dataset of 44,001 comments. After analyzing the data, we applied rigorous data-cleaning procedures along with pre-processing techniques that removed non-linguistic artefacts. Our proposed model achieved binary and multi-level classification accuracies of 94.91% and 90%, respectively, while limiting overfitting and maintaining minimal loss. This performance opens a new avenue for future research in low-resource scenarios for real-world applications.
Keywords
Publication Details
- Type of Publication:
- Conference Name: 3rd International Conference on Big Data, IoT and Machine Learning (BIM 2025)
- Date of Conference: 25/09/2025 - 25/09/2025
- Venue: Dhaka International University (DIU)
- Organizer: Department of CSE and EEE, DIU