Filtering With the Crowd: CrowdScreen Revisited

Authors: Benoit Groz, Ezra Levin, Isaac Meilijson, Tova Milo



File: LIPIcs.ICDT.2016.12.pdf (0.74 MB, 18 pages)

Cite As

Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo. Filtering With the Crowd: CrowdScreen Revisited. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)
https://doi.org/10.4230/LIPIcs.ICDT.2016.12

Abstract

Filtering a set of items based on a set of properties that can be verified by humans is a common application of crowdsourcing. When workers are error-prone, each item is presented to multiple users to limit the probability of misclassification. Since the crowd is a relatively expensive resource, minimizing the number of questions per item can result in significant savings. Several algorithms addressing this minimization problem have been presented in the CrowdScreen framework by Parameswaran et al. However, those algorithms do not scale well and therefore cannot be used in scenarios where high accuracy is required despite high user error rates. The goal of this paper is thus to devise algorithms that can cope with such situations. To achieve this, we provide new theoretical insights into the problem and then use them to develop a new, efficient algorithm. We also propose novel optimizations for the CrowdScreen algorithms that improve their scalability. We complement our theoretical study with an experimental evaluation of the algorithms on a large set of synthetic parameters as well as real-life crowdsourcing scenarios, demonstrating the advantages of our solution.
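
To make the setting concrete, here is a minimal, illustrative Python sketch of a Wald-style sequential probability ratio test (SPRT) for filtering a single item from noisy yes/no worker answers: workers are asked one at a time until the likelihood ratio between "the item passes the filter" and "the item fails" crosses a threshold derived from the target error probabilities. This is not the paper's algorithm; the worker accuracy values, thresholds, and the ask_worker callback are hypothetical placeholders.

    import random

    def sprt_filter(ask_worker, p_yes_if_pass=0.8, p_yes_if_fail=0.2,
                    alpha=0.05, beta=0.05, max_questions=50):
        """Classify one item by repeatedly asking crowd workers a yes/no question.

        ask_worker() returns True ("yes") or False ("no").
        p_yes_if_pass / p_yes_if_fail are hypothetical worker-behaviour values:
        the probability of a "yes" answer when the item truly passes / fails.
        alpha / beta are the target false-positive / false-negative probabilities.
        """
        # Wald's stopping thresholds on the ratio P(answers | pass) / P(answers | fail).
        upper = (1 - beta) / alpha   # declare "pass" once the ratio exceeds this
        lower = beta / (1 - alpha)   # declare "fail" once the ratio drops below this

        ratio = 1.0
        for _ in range(max_questions):
            if ask_worker():                     # a "yes" answer
                ratio *= p_yes_if_pass / p_yes_if_fail
            else:                                # a "no" answer
                ratio *= (1 - p_yes_if_pass) / (1 - p_yes_if_fail)
            if ratio >= upper:
                return "pass"
            if ratio <= lower:
                return "fail"
        # Question budget exhausted: fall back to the more likely hypothesis.
        return "pass" if ratio >= 1.0 else "fail"

    # Toy usage: simulate workers answering about an item that truly passes.
    if __name__ == "__main__":
        random.seed(0)
        print(sprt_filter(lambda: random.random() < 0.8))

The strategies studied in the CrowdScreen framework optimize such stopping rules more carefully under a per-item question budget; the abstract's concern is that computing those optimized strategies does not scale when high accuracy is required despite high worker error rates.
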
Keywords
  • CrowdSourcing
  • filtering
  • algorithms
  • SPRT
  • hypothesis testing

References

  1. Ittai Abraham, Omar Alonso, Vasilis Kandylas, and Aleksandrs Slivkins. Adaptive crowdsourcing algorithms for the bandit survey problem. To appear in JMLR W&CP, 30, 2013.
  2. T. W. Anderson. A modification of the sequential probability ratio test to reduce the sample size. The Annals of Math. Stat., 31(1):165-197, 1960. URL: http://www.jstor.org/stable/2237502.
  3. Michael S. Bernstein, David R. Karger, Robert C. Miller, and Joel Brandt. Analytic methods for optimizing realtime crowdsourcing. In Collective Intelligence, 2012.
  4. Rubi Boim, Ohad Greenshpan, Tova Milo, Slava Novgorodov, Neoklis Polyzotis, and Wang Chiew Tan. Asking the right questions in crowd data sourcing. In ICDE, pages 1261-1264, 2012. URL: http://dx.doi.org/10.1109/ICDE.2012.122.
  5. Peng Dai, Mausam, and Daniel S. Weld. Decision-theoretic control of crowd-sourced workflows. In AAAI, 2010.
  6. Nilesh N. Dalvi, Aditya G. Parameswaran, and Vibhor Rastogi. Minimizing uncertainty in pipelines. In NIPS, pages 2951-2959, 2012.
  7. Jacques Dutka. The incomplete beta function - a historical profile. Archive for History of Exact Sciences, 24(1):11-29, 1981. URL: http://dx.doi.org/10.1007/BF00327713.
  8. Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, pages 61-72, 2011. URL: http://dx.doi.org/10.1145/1989323.1989331.
  9. Peter Frazier and Angela J. Yu. Sequential hypothesis testing under stochastic deadlines. In NIPS, 2007.
  10. Jinyang Gao, Xuan Liu, Beng Chin Ooi, Haixun Wang, and Gang Chen. An online cost sensitive decision-making method in crowdsourcing systems. In SIGMOD, pages 217-228, 2013. URL: http://dx.doi.org/10.1145/2463676.2465307.
  11. Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo. Filtering with the crowd: CrowdScreen revisited. URL: https://hal.archives-ouvertes.fr/view/index/docid/1239458.
  12. Haim Kaplan, Ilia Lotosh, Tova Milo, and Slava Novgorodov. Answering planning queries with the crowd. PVLDB, 6(9):697-708, 2013.
  13. David R. Karger, Sewoong Oh, and Devavrat Shah. Efficient crowdsourcing for multi-class labeling. In SIGMETRICS, pages 81-92, 2013. URL: http://dx.doi.org/10.1145/2465529.2465761.
  14. Donald E. Knuth. The Art of Computer Programming, Volume IV, draft of Section 7.2.1.6. Addison-Wesley, 2004.
  15. Walter Lehmacher and Gernot Wassmer. Adaptive sample size calculations in group sequential trials. Biometrics, 55(4):1286-1290, 1999. URL: http://dx.doi.org/10.1111/j.0006-341X.1999.01286.x.
  16. Christopher H. Lin, Mausam, and Daniel S. Weld. Dynamically switching between synergistic workflows for crowdsourcing. In AAAI, 2012.
  17. Qiang Liu, Jian Peng, and Alexander T. Ihler. Variational inference for crowdsourcing. In NIPS, pages 701-709, 2012.
  18. Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, and Sewoong Oh. Counting with the crowd. PVLDB, 6(2):109-120, 2012.
  19. Adam Marcus, Eugene Wu, David R. Karger, Samuel Madden, and Robert C. Miller. Human-powered sorts and joins. PVLDB, 5(1):13-24, 2011.
  20. Adam Marcus, Eugene Wu, Samuel Madden, and Robert C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211-214, 2011.
  21. Aditya G. Parameswaran. Personal communication.
  22. Aditya G. Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh, and Jennifer Widom. CrowdScreen: algorithms for filtering data with humans. In SIGMOD, pages 361-372, 2012. URL: http://dx.doi.org/10.1145/2213836.2213878.
  23. A. Wald. Sequential tests of statistical hypotheses. The Annals of Math. Stat., 16(2):117-186, 1945. URL: http://www.jstor.org/stable/2235829.
  24. Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. The multidimensional wisdom of crowds. In NIPS, pages 2424-2432, 2010.
  25. Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier R. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, pages 2035-2043, 2009.