Rabiger, Stefan and Gezici, Gizem and Saygın, Yücel and Spiliopoulou, Myra (2018) Predicting worker disagreement for more effective crowd labeling. In: 5th IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA), Turin, Italy
Official URL: http://dx.doi.org/10.1109/DSAA.2018.00028
Abstract
Crowdsourcing is a popular mechanism for labeling tasks that produce large training corpora. However, producing a reliable crowd-labeled training corpus is challenging and resource-consuming. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by the inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, but once workers encounter them, they lead to disagreement in the labeling, a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of the experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification.
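The workflow described in the abstract (train a disagreement predictor on a small seed, then route the remaining documents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the disagreement measure (share of workers outside the majority label), the token-averaging predictor, the `threshold` value, the function names, and the toy tweets.

```python
from collections import Counter, defaultdict

def disagreement(labels):
    """Share of workers who did not pick the majority label (assumed measure)."""
    counts = Counter(labels)
    return 1.0 - counts.most_common(1)[0][1] / len(labels)

def train_predictor(seed):
    """Fit a toy predictor on the seed: mean observed disagreement per token.

    seed is a list of (text, worker_labels) pairs. Returns the per-token
    scores and the seed-wide mean, used as a fallback for unseen tokens.
    """
    sums, counts, total = defaultdict(float), defaultdict(int), 0.0
    for text, labels in seed:
        d = disagreement(labels)
        total += d
        for tok in set(text.lower().split()):
            sums[tok] += d
            counts[tok] += 1
    token_scores = {t: sums[t] / counts[t] for t in sums}
    return token_scores, total / len(seed)

def predict(model, text):
    """Predicted disagreement: average score of the known tokens in text."""
    token_scores, prior = model
    known = [token_scores[t] for t in set(text.lower().split()) if t in token_scores]
    return sum(known) / len(known) if known else prior

def route(model, corpus, threshold=0.3):
    """Split the corpus into documents to label and documents to check first."""
    to_label, to_check = [], []
    for text in corpus:
        (to_check if predict(model, text) >= threshold else to_label).append(text)
    return to_label, to_check

# Hypothetical seed: tweets with the labels several workers assigned.
seed = [
    ("great phone love it", ["pos", "pos", "pos"]),
    ("terrible battery hate it", ["neg", "neg", "neg"]),
    ("yeah right great service", ["pos", "neg", "neg"]),
    ("sick beat", ["pos", "neg", "pos", "neg"]),
]
model = train_predictor(seed)
to_label, to_check = route(model, ["love it", "yeah right"])
```

With this toy seed, "love it" is sent to the workers, while "yeah right" is held back for an ambiguity check. A real predictor would use richer features and a learned model; the sketch only shows where the routing decision sits in the workflow.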
Item Type: | Papers in Conference Proceedings |
---|---|
Uncontrolled Keywords: | worker disagreement; crowdsourcing; dataset quality; label reliability; tweet ambiguity |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Computer Science & Eng.; Faculty of Engineering and Natural Sciences |
Depositing User: | Yücel Saygın |
Date Deposited: | 31 Jul 2019 11:14 |
Last Modified: | 12 Jun 2023 15:24 |
URI: | https://research.sabanciuniv.edu/id/eprint/38750 |