BRDD: A Transformer-Based Approach for Region-Specific Dialect Detection in Banglish Using Pretrained Embeddings

Students & Supervisors

Student Authors
Hasin Almas Sifat
Bachelor of Science in Computer Science & Engineering, FST
Koushik Biswas Arko
Bachelor of Science in Computer Science & Engineering, FST
Supervisors
Md. Mortuza Ahmmed
Associate Professor, Faculty, FST

Abstract

Detecting regional dialects in Banglish (romanized Bangla) is challenging because the text is noisy, romanization is inconsistent, and linguistic features overlap across regions. In this paper, we introduce BRDD (Bangla Romanized Dialect Detector), a Transformer-based model that classifies Banglish text by regional dialect, covering regions of Bangladesh such as Barishal, Chittagong, Mymensingh, Noakhali, and Sylhet. BRDD builds on pretrained word embeddings and a fine-tuned Transformer encoder that processes the romanized text, which helps it handle spelling variation and inconsistent romanization. For classification, we propose a prototypical classifier that assigns each sentence embedding to a region-specific prototype, encouraging the model to separate subtle regional features. We also apply data augmentation that simulates romanization noise, which improves robustness to spelling variation and diverse writing styles. Experiments on a custom Banglish dialect dataset show that BRDD outperforms traditional methods, with significant improvements in accuracy and macro-F1 score. The model is robust to noisy input and interpretable, which makes it useful for real-world applications such as social media monitoring, regional text classification, and other Bangla NLP tasks. By combining pretrained embeddings with a Transformer architecture, BRDD advances multilingual NLP for under-resourced languages.
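The two mechanisms named in the abstract, a prototype-based classifier and romanization-noise augmentation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean distance, the mean-of-embeddings prototypes, the two-dimensional toy embeddings, and the character swap table are all assumptions made for illustration. In BRDD the embeddings would come from the fine-tuned Transformer encoder.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(embeddings, labels):
    """Average the sentence embeddings of each region into one prototype.

    Assumption: a prototype is the plain mean of its class embeddings.
    """
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: mean_vector(vectors) for label, vectors in grouped.items()}


def classify(embedding, prototypes):
    """Return the region whose prototype is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(embedding, prototypes[label]))


def add_romanization_noise(text, swaps, rate, rng):
    """Replace each character found in `swaps` with probability `rate`.

    `swaps` is a hypothetical table of alternative spellings, not a
    linguistically validated one.
    """
    return "".join(
        swaps[ch] if ch in swaps and rng.random() < rate else ch
        for ch in text
    )


# Toy 2-D "sentence embeddings" standing in for encoder outputs.
train_embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
train_labels = ["Sylhet", "Sylhet", "Noakhali", "Noakhali"]

prototypes = build_prototypes(train_embeddings, train_labels)
print(classify([0.1, 0.1], prototypes))

# Made-up swap table; a fixed seed keeps the augmentation reproducible.
toy_swaps = {"o": "u", "e": "i"}
print(add_romanization_noise("tomar", toy_swaps, rate=0.5, rng=random.Random(0)))
```

Classifying by nearest prototype also gives a simple route to interpretability, since each prediction can be explained by its distance to every region's prototype.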

Keywords

Banglish Dialect Detection, Pretrained Embeddings, Bangla NLP, Multilingual NLP

Publication Details

  • Type of Publication:
  • Conference Name: 7th International Conference on Integrated Sciences (ICIS) 2025
  • Date of Conference: 25/10/2025
  • Venue: Eastern University Campus, Ashulia Model Town, Dhaka, Bangladesh
  • Organizer: Eastern University, Bangladesh