Agnostic Learning with Unknown Utilities

Authors: Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt




File

LIPIcs.ITCS.2021.55.pdf
  • Filesize: 0.6 MB
  • 20 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.ITCS.2021.55

Author Details

Kush Bhatia
  • University of California at Berkeley, CA, USA
Peter L. Bartlett
  • University of California at Berkeley, CA, USA
Anca D. Dragan
  • University of California at Berkeley, CA, USA
Jacob Steinhardt
  • University of California at Berkeley, CA, USA

Cite As

Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, and Jacob Steinhardt. Agnostic Learning with Unknown Utilities. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 55:1-55:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.ITCS.2021.55

Abstract

Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems, though, the utility of a decision depends on the underlying context x and the decision y; for instance, misclassifying a stop sign is worse than misclassifying a road-side postbox. However, directly incorporating these utilities into the learning objective is often infeasible, since they can be quite complex and difficult for humans to specify.
We formally study this as agnostic learning with unknown utilities: given a dataset S = {x_1, …, x_n} in which each data point x_i is drawn from some unknown distribution 𝒟_x, the objective of the learner is to output a function f from some class of decision functions ℱ with small excess risk. This risk measures the performance of the output predictor f relative to the best predictor in the class ℱ under the unknown underlying utility u^*: 𝒳×𝒴 → [0,1]. The utility u^* is not assumed to have any specific structure and may be an arbitrary bounded function. This raises the question of whether learning is even possible in our setup, given that a generalizable estimate of the utility u^* may not be obtainable from finitely many samples. Surprisingly, we show that estimating the utilities of only the sampled points in S suffices to learn a decision function that generalizes well.
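To make the objective concrete, the following is a minimal formalization consistent with the definitions above (the shorthand L(f) is ours, not taken from the paper): writing

  L(f) = 𝔼_{x∼𝒟_x}[u^*(x, f(x))]

for the expected utility of a decision function f ∈ ℱ, the excess risk of the learner's output f is

  sup_{g∈ℱ} L(g) − L(f),

so the guarantee asks that the expected utility of f nearly match that of the best decision function in ℱ.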
With this insight, we study mechanisms for eliciting information from human experts that allow a learner to estimate the utilities u^* on the set S. While humans find it difficult to provide utility values directly and reliably, it is often easier for them to provide comparison feedback based on these utilities. We show that, unlike in the realizable setup, vanilla comparison queries, in which humans compare a pair of decisions for a single input x, are insufficient. We introduce a family of elicitation mechanisms that generalizes comparisons, called the k-comparison oracle, which enables the learner to ask for comparisons across k different inputs x at once. We show that the excess risk in our agnostic learning framework decreases at a rate of O(1/k) with such queries. This result brings out an interesting accuracy-elicitation trade-off: as the order k of the oracle increases, the comparison queries become harder to elicit from humans but allow for more accurate learning.
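To illustrate the elicitation mechanism, here is a minimal sketch of a k-comparison query in Python (our rendering: the function name k_comparison_oracle and the aggregation of utilities by summation are illustrative assumptions, not taken from the paper):

    from typing import Callable, Sequence

    def k_comparison_oracle(
        u_star: Callable[[object, object], float],  # unknown utility u^*: X × Y → [0,1]
        xs: Sequence[object],                       # k inputs drawn from the sample S
        ys_a: Sequence[object],                     # first tuple of k candidate decisions
        ys_b: Sequence[object],                     # second tuple of k candidate decisions
    ) -> int:
        """Report which tuple of decisions the expert prefers on the k inputs.

        Returns +1 if ys_a has higher aggregate utility than ys_b on xs,
        -1 if lower, and 0 on a tie. With k = 1 this reduces to the vanilla
        single-input comparison query.
        """
        total_a = sum(u_star(x, y) for x, y in zip(xs, ys_a))
        total_b = sum(u_star(x, y) for x, y in zip(xs, ys_b))
        return (total_a > total_b) - (total_a < total_b)

Note that such an oracle returns only ordinal information; the O(1/k) rate above quantifies how much aggregate comparisons of this kind reveal about the utilities on S.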

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning
  • Computing methodologies → Active learning settings
Keywords
  • agnostic learning
  • learning by comparisons
  • utility estimation
  • active learning

