InterPoll: Crowd-Sourced Internet Polls

Livshits, Benjamin; Mytkowicz, Todd

doi:10.4230/LIPIcs.SNAPL.2015.156

Abstract

Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little help to the survey-maker or pollster in obtaining statistically significant results free of even the most obvious selection biases.
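Selection bias arises when the people who answer a poll differ systematically from the population of interest. For instance, younger users might be over-represented among the crowd workers who take the poll. One standard correction is post-stratification. Respondents are grouped into strata such as age bands, and each stratum's average answer is reweighted by that stratum's known share of the population. The Python sketch below assumes those shares are known, for example from census data. The function name `poststratified_mean`, the strata, and the data are hypothetical. This is a generic survey-weighting technique, not necessarily the bias correction InterPoll itself performs.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def poststratified_mean(responses: Iterable[Tuple[str, float]],
                        population_share: Dict[str, float]) -> float:
    """Reweight per-stratum sample means by known population shares.

    responses: (stratum, answer) pairs, e.g. ("18-29", 1.0) for a "yes".
    population_share: fraction of the target population in each stratum;
    the shares must sum to 1. Each stratum with a nonzero share needs at
    least one respondent, otherwise its mean is undefined.
    """
    if not math.isclose(sum(population_share.values()), 1.0):
        raise ValueError("population shares must sum to 1")
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stratum, answer in responses:
        totals[stratum] += answer
        counts[stratum] += 1
    estimate = 0.0
    for stratum, share in population_share.items():
        if share == 0:
            continue
        if counts[stratum] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        # Weight this stratum's sample mean by its population share.
        estimate += share * totals[stratum] / counts[stratum]
    return estimate


# Toy data: young respondents are over-represented in the sample.
sample = [("18-29", 1.0)] * 6 + [("18-29", 0.0)] * 2 + [("65+", 1.0), ("65+", 0.0)]
shares = {"18-29": 0.4, "65+": 0.6}  # hypothetical population shares
raw = sum(answer for _, answer in sample) / len(sample)
print(round(raw, 3), round(poststratified_mean(sample, shares), 3))  # 0.7 0.6
```

In this toy sample, 80% of respondents are young, but the young make up only 40% of the assumed population. The raw "yes" rate of 0.7 therefore overstates support. Reweighting brings the estimate down to 0.6.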

This paper proposes InterPoll, a platform for programming crowd-sourced polls. Pollsters express polls as embedded LINQ queries, and the runtime correctly reasons about the uncertainty in those polls, polling only as many people as required to meet statistical guarantees. To optimize the cost of polls, InterPoll performs query optimization, bias correction, and power analysis. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social, and political science questions.
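Polling "only as many people as required to meet statistical guarantees" is a sequential-sampling problem. One classical technique for it is Wald's sequential probability ratio test (SPRT). The Python sketch below uses SPRT to decide between two hypotheses about the fraction of "yes" answers, asking one respondent at a time. The names `sprt_poll` and `ask`, the hypotheses `p0`/`p1`, and the respondent budget are hypothetical. This is not InterPoll's API. The paper's own mechanism, which combines LINQ queries over uncertain values with power analysis, may differ.

```python
import math
import random
from typing import Callable, Tuple


def sprt_poll(ask: Callable[[], bool], p0: float, p1: float,
              alpha: float = 0.05, beta: float = 0.05,
              max_respondents: int = 10_000) -> Tuple[str, int]:
    """Poll one respondent at a time until Wald's SPRT reaches a decision.

    Tests H0: p = p0 against H1: p = p1, where p is the true fraction of
    "yes" answers. alpha and beta are the target false-positive and
    false-negative rates. Returns (decision, respondents_polled), where
    decision is "H0", "H1", or "undecided" if the budget is exhausted.
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("expected 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)     # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))     # crossing below accepts H0
    yes_step = math.log(p1 / p0)             # LLR contribution of a "yes"
    no_step = math.log((1 - p1) / (1 - p0))  # LLR contribution of a "no"
    llr = 0.0
    for n in range(1, max_respondents + 1):
        llr += yes_step if ask() else no_step
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", max_respondents


# Toy usage: simulated respondents who answer "yes" with probability 0.6.
rng = random.Random(42)
decision, polled = sprt_poll(lambda: rng.random() < 0.6, p0=0.5, p1=0.6)
print(decision, polled)
```

Each answer only updates a running log-likelihood ratio, so polling stops as soon as the evidence is strong enough in either direction. Under Wald's approximations, `alpha` and `beta` bound the two error rates. SPRT also typically needs fewer respondents on average than a fixed-size test with the same error rates. By contrast, recomputing a confidence interval after every answer and stopping once it looks significant would not carry that guarantee.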

This paper highlights some of the existing challenges and how InterPoll is designed to address most of them.

We also summarize the work we have done so far and outline directions for future work.

