Gender bias in machine learning for sentiment analysis

Mike Thelwall (School of Mathematics and Computer Science, University of Wolverhampton, Wolverhampton, UK)

Online Information Review

ISSN: 1468-4527

Article publication date: 11 June 2018

Abstract

Purpose

The purpose of this paper is to investigate whether machine learning induces gender biases in the sense of results that are more accurate for male authors or for female authors. It also investigates whether training separate male and female variants could improve the accuracy of machine learning for sentiment analysis.

Design/methodology/approach

This paper uses three ratings-balanced sets of restaurant and hotel reviews to train algorithms with and without gender selection.
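The comparison this design implies, scoring one classifier separately on female-authored and male-authored reviews, can be illustrated with a minimal sketch. The abstract does not specify the classifiers or the exact accuracy measure, so the function name `accuracy_by_group`, the exact-match accuracy metric and the toy ratings below are all assumptions made for illustration, not the paper's method or data.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: exact-match accuracy} for predictions split by group.

    Hypothetical helper for illustration; not taken from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        if truth == pred:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy example (invented values): star ratings, predicted ratings,
# and the author gender attached to each review.
y_true = [1, 5, 3, 4, 2, 5]
y_pred = [1, 5, 2, 4, 2, 1]
gender = ["f", "f", "m", "f", "m", "m"]

scores = accuracy_by_group(y_true, y_pred, gender)
for group, accuracy in sorted(scores.items()):
    print(f"{group}: {accuracy:.2f}")
```

A gap between the two per-group scores on held-out reviews is the kind of difference the paper treats as gender bias; the same helper could be reused to compare a model trained on mixed-gender data with one trained on same-gender data.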

Findings

Accuracy is higher on female-authored reviews than on male-authored reviews for all data sets, so applications of sentiment analysis using mixed-gender data sets will over-represent the opinions of women. Training on same-gender data improves performance less than having additional data from both genders.

Practical implications

End users of sentiment analysis should be aware that its small gender biases can affect the conclusions drawn from it and apply correction factors when necessary. Users of systems that incorporate sentiment analysis should be aware that performance will vary by author gender. Developers do not need to create gender-specific algorithms unless they have more training data than their system can cope with.

Originality/value

This is the first demonstration of gender bias in machine learning sentiment analysis.

Citation

Thelwall, M. (2018), "Gender bias in machine learning for sentiment analysis", Online Information Review, Vol. 42 No. 3, pp. 343-354. https://doi.org/10.1108/OIR-05-2017-0153

Publisher

Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited