Agnostic Learning with Unknown Utilities

Authors: Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, and Jacob Steinhardt




File

LIPIcs.ITCS.2021.55.pdf
  • Filesize: 0.6 MB
  • 20 pages

Document Identifiers

  • DOI: 10.4230/LIPIcs.ITCS.2021.55

Author Details

Kush Bhatia
  • University of California at Berkeley, CA, USA
Peter L. Bartlett
  • University of California at Berkeley, CA, USA
Anca D. Dragan
  • University of California at Berkeley, CA, USA
Jacob Steinhardt
  • University of California at Berkeley, CA, USA

Cite As

Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, and Jacob Steinhardt. Agnostic Learning with Unknown Utilities. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 55:1-55:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ITCS.2021.55

Abstract

Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems, though, the utility of a decision depends on the underlying context x and decision y; for instance, misclassifying a stop sign is worse than misclassifying a road-side postbox. However, directly incorporating these utilities into the learning objective is often infeasible since they can be quite complex and difficult for humans to specify. We formally study this as agnostic learning with unknown utilities: given a dataset S = {x_1, …, x_n} where each data point x_i ∼ 𝒟_x from some unknown distribution 𝒟_x, the objective of the learner is to output a function f in some class of decision functions ℱ with small excess risk. This risk measures the performance of the output predictor f with respect to the best predictor in the class ℱ under the unknown underlying utility u^*: 𝒳×𝒴 ↦ [0,1]. The utility u^* is not assumed to have any specific structure and is allowed to be any bounded function. This raises the question of whether learning is even possible in our setup, given that a generalizable estimate of the utility u^* might not be obtainable from finitely many samples. Surprisingly, we show that estimating the utilities of only the sampled points in S suffices to learn a decision function which generalizes well. With this insight, we study mechanisms for eliciting information from human experts which allow a learner to estimate the utilities u^* on the set S. While humans find it difficult to provide utility values directly and reliably, it is often easier for them to provide comparison feedback based on these utilities. We show that, unlike in the realizable setup, vanilla comparison queries, in which humans compare a pair of decisions for a single input x, are insufficient. We introduce a family of elicitation mechanisms generalizing comparisons, called the k-comparison oracle, which enables the learner to ask for comparisons across k different inputs x at once. We show that the excess risk in our agnostic learning framework decreases at a rate of O(1/k) with such queries. This result brings out an interesting accuracy-elicitation trade-off: as the order k of the oracle increases, the comparative queries become harder to elicit from humans but allow for more accurate learning.
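To make the setup concrete, below is a minimal Python sketch of the pipeline the abstract describes: the learner estimates utilities only on the sampled points S via comparison-style queries, then selects the decision function in a (finite) class that maximizes the estimated utility on S. This is an illustration under simplifying assumptions, not the paper's algorithm or analysis; the names (KComparisonOracle, estimate_utilities, select_decision_function), the rank-based estimation heuristic, and the toy utility are all hypothetical, and the random batches here are not forced to use k distinct inputs as the paper's oracle does.

```python
# Hedged sketch of the "estimate utilities on S only, then pick the best f in F" idea.
import random
from typing import Callable, Sequence, Tuple


class KComparisonOracle:
    """Hypothetical stand-in for a human expert: given up to k (x, y) pairs,
    return the index of the pair with the highest hidden utility u*(x, y)."""

    def __init__(self, true_utility: Callable[[float, int], float], k: int):
        self.u = true_utility  # hidden from the learner; used only to answer queries
        self.k = k

    def best_of(self, pairs: Sequence[Tuple[float, int]]) -> int:
        assert len(pairs) <= self.k
        return max(range(len(pairs)), key=lambda i: self.u(*pairs[i]))


def estimate_utilities(oracle, sample, decisions, num_queries=500):
    """Crude rank-style estimate of u* on the sample only: repeatedly ask the
    oracle to pick the best of k random (x, y) pairs and credit the winner."""
    pairs = [(x, y) for x in sample for y in decisions]
    scores = {p: 0.0 for p in pairs}
    for _ in range(num_queries):
        batch = random.sample(pairs, oracle.k)
        scores[batch[oracle.best_of(batch)]] += 1.0
    top = max(scores.values()) or 1.0
    return {p: s / top for p, s in scores.items()}  # normalize scores into [0, 1]


def select_decision_function(function_class, sample, estimates):
    """Pick the f with the highest average estimated utility on the sampled points."""
    def empirical_utility(f):
        return sum(estimates[(x, f(x))] for x in sample) / len(sample)
    return max(function_class, key=empirical_utility)


if __name__ == "__main__":
    # Toy instance: binary decisions, threshold rules on the line as the class,
    # and an asymmetric utility that the learner never observes directly.
    random.seed(0)
    sample = [random.uniform(-1.0, 1.0) for _ in range(20)]  # x_i ~ D_x
    decisions = [0, 1]

    def u_star(x, y):  # unknown to the learner, known only to the oracle
        return 1.0 if (x > 0) == bool(y) else 0.3 * abs(x)

    oracle = KComparisonOracle(u_star, k=4)
    function_class = [lambda x, t=t / 10: int(x > t) for t in range(-10, 11)]
    estimates = estimate_utilities(oracle, sample, decisions)
    f_hat = select_decision_function(function_class, sample, estimates)
    print("f_hat(0.5) =", f_hat(0.5), " f_hat(-0.5) =", f_hat(-0.5))
```

The point of the sketch is only the structure of the interaction: utility information is elicited purely through comparisons on the finite sample, and the quality of what can be learned this way is what the paper quantifies, with the excess risk shrinking at rate O(1/k) in the order k of the comparison oracle.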

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning
  • Computing methodologies → Active learning settings
Keywords
  • agnostic learning
  • learning by comparisons
  • utility estimation
  • active learning

