
A Lightweight Data Mining Approach for Fake Review Detection: An Integrated Textual and Behavioral Analysis

Students & Supervisors

Student Authors
Kazi Abdullah Jarif
Bachelor of Science in Computer Science & Engineering, FST
Mst. Nadiya Noor
Bachelor of Science in Computer Science & Engineering, FST
Moizuddin Mohammad Mujahid Rashid
Bachelor of Science in Computer Science & Engineering, FST
Supervisors
Victor Stany Rozario
Assistant Professor, Special Assistant [cs], FST

Abstract

Customer reviews play a significant role in establishing a seller’s credibility in an era of rapidly expanding online shopping. They strongly influence purchasing decisions and shape the reputation of both the seller and the product. This matters especially for small e-commerce sites, which usually lack the resources and sophisticated technology needed to detect fake reviews. The goal of this study is to create a cost-effective data mining method for evaluating the reliability of customer reviews by combining contextual and textual data. Using the popular Amazon Fine Food Reviews dataset, we present a practical and efficient method that integrates review text with contextual information, including posting time, day of the week, review length, and rating score. On these combined contextual and text-based cues, we compare the performance of three models: Logistic Regression, Random Forest Classifier, and XGBoost. XGBoost obtained the highest test accuracy at 87.87%, ahead of Random Forest Classifier at 86.14% and Logistic Regression at 84.97%. Based on this comparison, we selected XGBoost as the final model for its superior detection performance, which offers a valuable and lightweight solution for real-world use. The proposed approach is applicable to reviews across diverse product categories and review sites, providing a strong defense against reputational attacks and helping to ensure that genuine customer feedback continues to inform purchase decisions.
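The contextual cues named in the abstract (posting time, day of the week, review length, rating score) can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names `Time`, `Score`, and `Text` are assumed to follow the public Amazon Fine Food Reviews schema, the timestamp is assumed to be Unix seconds interpreted in UTC, review length is assumed to be a word count, and `contextual_features` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def contextual_features(review: dict) -> dict:
    """Derive the four contextual cues from one review record.

    Assumes (not confirmed by the paper) that `Time` is a Unix timestamp
    in seconds, `Score` is the star rating, and `Text` is the review body.
    """
    posted = datetime.fromtimestamp(review["Time"], tz=timezone.utc)
    return {
        "hour": posted.hour,                             # posting time (hour of day, UTC)
        "day_of_week": posted.weekday(),                 # Monday = 0 ... Sunday = 6
        "review_length": len(review["Text"].split()),    # length in words (assumption)
        "rating": int(review["Score"]),                  # rating score
    }


# Hypothetical record, made up for illustration only.
sample = {
    "Time": 1303862400,
    "Score": 5,
    "Text": "Good quality dog food, my labrador loves it.",
}
features = contextual_features(sample)
print(features)
```

Features of this kind would then be placed alongside a vectorized representation of the review text before being passed to a classifier such as Logistic Regression, Random Forest, or XGBoost; how the study performs that combination is not specified in the abstract.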

Keywords

Customer Reviews, Deceptive Reviews, Data Mining, Random Forest Classifier, Contextual Feature, XGBoost

Publication Details

  • Type of Publication:
  • Conference Name: 3rd International Conference on Big Data, IoT and Machine Learning (BIM 2025)
  • Date of Conference: 25/09/2025
  • Venue: Dhaka International University, Bangladesh
  • Organizer: Department of CSE and EEE, DIU, Bangladesh