Filtering With the Crowd: CrowdScreen Revisited

Authors: Benoit Groz, Ezra Levin, Isaac Meilijson, Tova Milo



Document Identifiers

  • DOI: 10.4230/LIPIcs.ICDT.2016.12

Author Details

Benoit Groz
Ezra Levin
Isaac Meilijson
Tova Milo

Cite As

Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo. Filtering With the Crowd: CrowdScreen Revisited. In 19th International Conference on Database Theory (ICDT 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 48, pp. 12:1-12:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.ICDT.2016.12

Abstract

Filtering a set of items, based on a set of properties that can be verified by humans, is a common application of crowdsourcing. When the workers are error-prone, each item is presented to multiple users to limit the probability of misclassification. Since the crowd is a relatively expensive resource, minimizing the number of questions per item can result in significant savings. Several algorithms to address this minimization problem were presented in the CrowdScreen framework by Parameswaran et al. However, those algorithms do not scale well and therefore cannot be used in scenarios where high accuracy is required despite high user error rates. The goal of this paper is thus to devise algorithms that can cope with such situations. To achieve this, we provide new theoretical insights into the problem and then use them to develop a new, efficient algorithm. We also propose novel optimizations for the algorithms of CrowdScreen that improve their scalability. We complement our theoretical study with an experimental evaluation of the algorithms on a large set of synthetic parameters as well as real-life crowdsourcing scenarios, demonstrating the advantages of our solution.
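
The filtering strategies studied here can be viewed as sequential hypothesis tests in the spirit of Wald's SPRT [23]: keep asking workers about an item until the accumulated evidence is strong enough to classify it, or a question budget is exhausted. The sketch below only illustrates this general idea and is not the algorithm of the paper; the worker error rates (fp, fn), the error targets (alpha, beta), the question budget, and the ask_worker callback are all illustrative assumptions.

```python
import math
import random


def sprt_filter(ask_worker, fp=0.2, fn=0.2, alpha=0.05, beta=0.05, max_questions=20):
    """Sequentially query workers about a single item until the accumulated
    evidence crosses an SPRT-style threshold or the question budget runs out.

    ask_worker() -> bool  # hypothetical callback returning one worker's yes/no vote
    fp, fn                # assumed worker false-positive / false-negative rates
    alpha, beta           # target misclassification probabilities of the test
    """
    upper = math.log((1 - beta) / alpha)   # decide "PASS" once the log-likelihood ratio exceeds this
    lower = math.log(beta / (1 - alpha))   # decide "FAIL" once it drops below this
    llr = 0.0
    for _ in range(max_questions):
        if ask_worker():                   # worker says the item satisfies the property
            llr += math.log((1 - fn) / fp)
        else:                              # worker says it does not
            llr += math.log(fn / (1 - fp))
        if llr >= upper:
            return "PASS"
        if llr <= lower:
            return "FAIL"
    # Budget exhausted: fall back to the sign of the accumulated evidence.
    return "PASS" if llr >= 0 else "FAIL"


if __name__ == "__main__":
    # Simulate workers with a 20% error rate on an item that truly passes the filter.
    random.seed(0)
    item_passes = True
    decision = sprt_filter(lambda: random.random() < (0.8 if item_passes else 0.2))
    print(decision)  # expected: PASS after a handful of questions
```

Capping the number of questions per item (max_questions) mirrors the bounded-budget setting the paper addresses, where the cost of crowd queries motivates minimizing questions per item.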

Keywords
  • CrowdSourcing
  • filtering
  • algorithms
  • SPRT
  • hypothesis testing

References

  1. Ittai Abraham, Omar Alonso, Vasilis Kandylas, and Aleksandrs Slivkins. Adaptive crowdsourcing algorithms for the bandit survey problem. To appear in JMLR W&CP, 30, 2013.
  2. T. W. Anderson. A modification of the sequential probability ratio test to reduce the sample size. The Annals of Math. Stat., 31(1):165-197, 1960. URL: http://www.jstor.org/stable/2237502.
  3. Michael S. Bernstein, David R. Karger, Robert C. Miller, and Joel Brandt. Analytic methods for optimizing realtime crowdsourcing. In Collective Intelligence, 2012.
  4. Rubi Boim, Ohad Greenshpan, Tova Milo, Slava Novgorodov, Neoklis Polyzotis, and Wang Chiew Tan. Asking the right questions in crowd data sourcing. In ICDE, pages 1261-1264, 2012. URL: http://dx.doi.org/10.1109/ICDE.2012.122.
  5. Peng Dai, Mausam, and Daniel S. Weld. Decision-theoretic control of crowd-sourced workflows. In AAAI, 2010.
  6. Nilesh N. Dalvi, Aditya G. Parameswaran, and Vibhor Rastogi. Minimizing uncertainty in pipelines. In NIPS, pages 2951-2959, 2012.
  7. Jacques Dutka. The incomplete beta function - a historical profile. Archive for History of Exact Sciences, 24(1):11-29, 1981. URL: http://dx.doi.org/10.1007/BF00327713.
  8. Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. CrowdDB: answering queries with crowdsourcing. In SIGMOD, pages 61-72, 2011. URL: http://dx.doi.org/10.1145/1989323.1989331.
  9. Peter Frazier and Angela J. Yu. Sequential hypothesis testing under stochastic deadlines. In NIPS, 2007.
  10. Jinyang Gao, Xuan Liu, Beng Chin Ooi, Haixun Wang, and Gang Chen. An online cost sensitive decision-making method in crowdsourcing systems. In SIGMOD, pages 217-228, 2013. URL: http://dx.doi.org/10.1145/2463676.2465307.
  11. Benoit Groz, Ezra Levin, Isaac Meilijson, and Tova Milo. Filtering with the crowd: Crowdscreen revisited. https://hal.archives-ouvertes.fr/view/index/docid/1239458.
  12. Haim Kaplan, Ilia Lotosh, Tova Milo, and Slava Novgorodov. Answering planning queries with the crowd. PVLDB, 6(9):697-708, 2013.
  13. David R. Karger, Sewoong Oh, and Devavrat Shah. Efficient crowdsourcing for multi-class labeling. In SIGMETRICS, pages 81-92, 2013. URL: http://dx.doi.org/10.1145/2465529.2465761.
  14. Donald E. Knuth. The Art of Computer Programming, Volume IV, draft of 7.2.1.6. Addison-Wesley, 2004.
  15. Walter Lehmacher and Gernot Wassmer. Adaptive sample size calculations in group sequential trials. Biometrics, 55(4):1286-1290, 1999. URL: http://dx.doi.org/10.1111/j.0006-341X.1999.01286.x.
  16. Christopher H. Lin, Mausam, and Daniel S. Weld. Dynamically switching between synergistic workflows for crowdsourcing. In AAAI, 2012.
  17. Qiang Liu, Jian Peng, and Alexander T. Ihler. Variational inference for crowdsourcing. In NIPS, pages 701-709, 2012.
  18. Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, and Sewoong Oh. Counting with the crowd. PVLDB, 6(2):109-120, 2012.
  19. Adam Marcus, Eugene Wu, David R. Karger, Samuel Madden, and Robert C. Miller. Human-powered sorts and joins. PVLDB, 5(1):13-24, 2011.
  20. Adam Marcus, Eugene Wu, Samuel Madden, and Robert C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211-214, 2011.
  21. Aditya G. Parameswaran. Personal Communication.
  22. Aditya G. Parameswaran, Hector Garcia-Molina, Hyunjung Park, Neoklis Polyzotis, Aditya Ramesh, and Jennifer Widom. Crowdscreen: algorithms for filtering data with humans. In SIGMOD, pages 361-372, 2012. URL: http://dx.doi.org/10.1145/2213836.2213878.
  23. A. Wald. Sequential tests of statistical hypotheses. The Annals of Math. Stat., 16(2):117-186, 1945. URL: http://www.jstor.org/stable/2235829.
  24. Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona. The multidimensional wisdom of crowds. In NIPS, pages 2424-2432, 2010.
  25. Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier R. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, pages 2035-2043, 2009.