Cover Letter

Journal: Data Science
Manuscript Title: Detecting Algorithmic Bias in Turkish E-Commerce Reviews: A Systematic Comparison of Supervised, Lexicon-Based, and Unsupervised Polarity Analysis Methods


Dear Editors,

I am pleased to submit our manuscript, "Detecting Algorithmic Bias in Turkish E-Commerce Reviews: A Systematic Comparison of Supervised, Lexicon-Based, and Unsupervised Polarity Analysis Methods," for consideration for publication in The Data Science Journal.
The manuscript makes the following contributions to the data science community and the field of text mining:

1. **Novel R Package (released on July 22, 2025)**: We use 'shoppingwords', a comprehensive R package available on CRAN that provides specialized tools for Turkish e-commerce text analysis, including domain-specific stopwords, sentiment phrases, and annotated review datasets.

2. **Methodological Innovation**: Our work presents a systematic framework for detecting sentiment-rating mismatches in Turkish e-commerce reviews, addressing a critical challenge in recommendation systems where users strategically manipulate ratings.

3. **Comprehensive Benchmarking**: We provide an extensive comparison of supervised classifiers (Logistic Regression, Random Forest, SVM, XGBoost), unsupervised methods (k-means, hierarchical clustering, LDA), and Turkish-specific lexicon-based approaches (BERTurk, SentiTurkNet, TRSAV1, HUMIR).

4. **Reproducible Research**: All code, data, and analysis pipelines are available through the accompanying R package and supplementary materials and are fully reproducible, in line with the journal's standards.

5. **Multilingual NLP Contribution**: Our work addresses the underrepresentation of Turkish language resources in NLP research and provides valuable insights for other morphologically rich languages.

The study reveals that approximately 2.4% of 5-star reviews contain negative sentiment, a systematic bias in which users assign high ratings despite critical feedback to boost review visibility. Our results demonstrate that Random Forest classifiers combined with rating metadata achieve the best performance (F1 score: 0.846, accuracy: 0.936), while k-means clustering provides a robust unsupervised alternative.

This research connects practical e-commerce analytics with methodological advances in text mining, offering both theoretical insights and practical tools for the R community. We believe it aligns well with the journal's focus on innovative applications in data science and will be of interest to researchers working in e-commerce analytics, multilingual NLP, and bias detection in algorithmic systems.

All authors have approved the manuscript and its submission to The Data Science Journal. This work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

Thank you for considering our work. We look forward to your response.

Sincerely,

Betul Kan-Kilinc, PhD
Department of Statistics
Eskisehir Technical University
Yunus Emre Campus, 26470, Eskisehir, Turkiye
Email_1: bkan@eskisehir.edu.tr
Email_2: betul.kankilinc@duke.edu
Phone: +90 222 321 7943