Optimizing Feature Selection and Model Accuracy on Large Datasets: A Hybrid Approach with TabMap
Students & Supervisors
Student Authors
Supervisors
Abstract
This paper proposes a hybrid approach to improve the classification of large-scale tabular datasets by integrating feature selection with spatial transformation through TabMap, combined with convolutional neural networks (CNNs). Traditional machine learning (ML) models struggle to capture complex feature dependencies in high-dimensional tabular data, which limits their predictive power and interpretability. In this work, tabular data is transformed into spatially meaningful two-dimensional images using TabMap, preserving feature correlations and enabling CNNs to exploit spatial locality for better pattern recognition. The approach is applied to a large freshwater quality dataset containing 4.5 million instances and 23 physicochemical features, such as pH and iron concentration. After the data is preprocessed through imputation, normalization, and outlier handling, traditional ML models, such as Random Forest and XGBoost, are compared with CNN models trained on TabMap images. Additionally, feature selection methods, including Information Gain-based thresholding and Correlation-Based Feature Selection (CBFS), among others, are applied to enhance model efficiency. The results demonstrate that CNNs trained on TabMap 2D grids outperform the classical methods in both accuracy and F1-score. Moreover, feature selection reduces dimensionality and improves model interpretability, highlighting the importance of integrating these techniques. The findings provide insight into the potential of spatially transformed data and feature selection for large-scale environmental data classification and suggest future research directions to improve scalability, categorical data handling, and model generalization.
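As an illustration of the Information Gain-based thresholding mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes continuous features are discretized with equal-width bins before the gain is computed; the function names, the bin count, the threshold value, and the toy pH and iron readings are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin indices (assumed discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(values, labels, n_bins):
    """H(labels) minus the entropy of labels conditioned on the binned feature."""
    groups = {}
    for b, y in zip(equal_width_bins(values, n_bins), labels):
        groups.setdefault(b, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold, n_bins):
    """Keep features whose information gain meets the threshold."""
    gains = {name: information_gain(vals, labels, n_bins)
             for name, vals in columns.items()}
    selected = [name for name, g in gains.items() if g >= threshold]
    return selected, gains


# Hypothetical toy data: six samples, two features, binary quality label.
columns = {
    "pH": [6.1, 6.3, 6.2, 8.4, 8.6, 8.5],
    "iron": [0.2, 0.9, 0.5, 0.3, 0.8, 0.6],
}
labels = [0, 0, 0, 1, 1, 1]

selected, gains = select_features(columns, labels, threshold=0.5, n_bins=2)
print(selected, gains)
```

In this toy setup the pH readings separate the two classes cleanly while the iron readings do not, so only pH passes the threshold. On a dataset of the size described here, a vectorized library implementation would be the practical choice; the pure-Python version is only meant to make the selection rule explicit.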
Keywords
Publication Details
- Type of Publication:
- Conference Name: 3rd International Conference on Big Data, IoT and Machine Learning
- Date of Conference: 25/09/2025
- Venue: Dhaka International University, Bangladesh.
- Organizer: Department of CSE and EEE, DIU