Document

**Published in:** LIPIcs, Volume 256, 4th Symposium on Foundations of Responsible Computing (FORC 2023)

Suppose we are given two datasets: a labeled dataset and unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor?
The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is r₁ away from the empirical distribution over the labeled dataset and r₂ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.

Pranjal Awasthi, Christopher Jung, and Jamie Morgenstern. Distributionally Robust Data Join. In 4th Symposium on Foundations of Responsible Computing (FORC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 256, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{awasthi_et_al:LIPIcs.FORC.2023.10, author = {Awasthi, Pranjal and Jung, Christopher and Morgenstern, Jamie}, title = {{Distributionally Robust Data Join}}, booktitle = {4th Symposium on Foundations of Responsible Computing (FORC 2023)}, pages = {10:1--10:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-272-3}, ISSN = {1868-8969}, year = {2023}, volume = {256}, editor = {Talwar, Kunal}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2023.10}, URN = {urn:nbn:de:0030-drops-179311}, doi = {10.4230/LIPIcs.FORC.2023.10}, annote = {Keywords: Distributionally Robust Optimization, Semi-Supervised Learning, Learning Theory} }

Document

**Published in:** LIPIcs, Volume 215, 13th Innovations in Theoretical Computer Science Conference (ITCS 2022)

We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples (x,y). This means that the resulting estimates correctly predict various statistics of the labels y not just marginally - as averaged over the sequence of examples - but also conditionally on x ∈ G for any G belonging to an arbitrary intersecting collection of groups 𝒢.
We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from [Hébert-Johnson et al., 2018]. The second is variance and higher moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from [Jung et al., 2021]. Finally, we define a new notion of prediction interval multivalidity, and give an algorithm for finding prediction intervals which satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a similar problem as conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.

Varun Gupta, Christopher Jung, Georgy Noarov, Mallesh M. Pai, and Aaron Roth. Online Multivalid Learning: Means, Moments, and Prediction Intervals. In 13th Innovations in Theoretical Computer Science Conference (ITCS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 215, pp. 82:1-82:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gupta_et_al:LIPIcs.ITCS.2022.82, author = {Gupta, Varun and Jung, Christopher and Noarov, Georgy and Pai, Mallesh M. and Roth, Aaron}, title = {{Online Multivalid Learning: Means, Moments, and Prediction Intervals}}, booktitle = {13th Innovations in Theoretical Computer Science Conference (ITCS 2022)}, pages = {82:1--82:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-217-4}, ISSN = {1868-8969}, year = {2022}, volume = {215}, editor = {Braverman, Mark}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2022.82}, URN = {urn:nbn:de:0030-drops-156785}, doi = {10.4230/LIPIcs.ITCS.2022.82}, annote = {Keywords: Uncertainty Estimation, Calibration, Online Learning} }

Document

**Published in:** LIPIcs, Volume 192, 2nd Symposium on Foundations of Responsible Computing (FORC 2021)

We consider settings in which the right notion of fairness is not captured by simple mathematical definitions (such as equality of error rates across groups), but might be more complex and nuanced and thus require elicitation from individual or collective stakeholders. We introduce a framework in which pairs of individuals can be identified as requiring (approximately) equal treatment under a learned model, or requiring ordered treatment such as "applicant Alice should be at least as likely to receive a loan as applicant Bob". We provide a provably convergent and oracle efficient algorithm for learning the most accurate model subject to the elicited fairness constraints, and prove generalization bounds for both accuracy and fairness. This algorithm can also combine the elicited constraints with traditional statistical fairness notions, thus "correcting" or modifying the latter by the former. We report preliminary findings of a behavioral study of our framework using human-subject fairness constraints elicited on the COMPAS criminal recidivism dataset.

Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, and Zhiwei Steven Wu. An Algorithmic Framework for Fairness Elicitation. In 2nd Symposium on Foundations of Responsible Computing (FORC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 192, pp. 2:1-2:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{jung_et_al:LIPIcs.FORC.2021.2, author = {Jung, Christopher and Kearns, Michael and Neel, Seth and Roth, Aaron and Stapleton, Logan and Wu, Zhiwei Steven}, title = {{An Algorithmic Framework for Fairness Elicitation}}, booktitle = {2nd Symposium on Foundations of Responsible Computing (FORC 2021)}, pages = {2:1--2:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-187-0}, ISSN = {1868-8969}, year = {2021}, volume = {192}, editor = {Ligett, Katrina and Gupta, Swati}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2021.2}, URN = {urn:nbn:de:0030-drops-138701}, doi = {10.4230/LIPIcs.FORC.2021.2}, annote = {Keywords: Fairness, Fairness Elicitation} }

Document

**Published in:** LIPIcs, Volume 156, 1st Symposium on Foundations of Responsible Computing (FORC 2020)

When selecting locations for a set of centers, standard clustering algorithms may place unfair burden on some individuals and neighborhoods. We formulate a fairness concept that takes local population densities into account. In particular, given k centers to locate and a population of size n, we define the "neighborhood radius" of an individual i as the minimum radius of a ball centered at i that contains at least n/k individuals. Our objective is to ensure that each individual has a center that is within at most a small constant factor of her neighborhood radius.
We present several theoretical results: We show that optimizing this factor is NP-hard; we give an approximation algorithm that guarantees a factor of at most 2 in all metric spaces; and we prove matching lower bounds in some metric spaces. We apply a variant of this algorithm to real-world address data, showing that it is quite different from standard clustering algorithms and outperforms them on our objective function and balances the load between centers more evenly.

Christopher Jung, Sampath Kannan, and Neil Lutz. Service in Your Neighborhood: Fairness in Center Location. In 1st Symposium on Foundations of Responsible Computing (FORC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 156, pp. 5:1-5:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{jung_et_al:LIPIcs.FORC.2020.5, author = {Jung, Christopher and Kannan, Sampath and Lutz, Neil}, title = {{Service in Your Neighborhood: Fairness in Center Location}}, booktitle = {1st Symposium on Foundations of Responsible Computing (FORC 2020)}, pages = {5:1--5:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-142-9}, ISSN = {1868-8969}, year = {2020}, volume = {156}, editor = {Roth, Aaron}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2020.5}, URN = {urn:nbn:de:0030-drops-120215}, doi = {10.4230/LIPIcs.FORC.2020.5}, annote = {Keywords: Fairness, Clustering, Facility Location} }

Document

**Published in:** LIPIcs, Volume 151, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)

We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the conditional distribution on datasets induced by the transcript of the interaction is close to its expectation on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to the expectation of the query on the conditional distribution. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the conditional distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work.
An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.

Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Moshe Shenfeld. A New Analysis of Differential Privacy’s Generalization Guarantees. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 31:1-31:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{jung_et_al:LIPIcs.ITCS.2020.31, author = {Jung, Christopher and Ligett, Katrina and Neel, Seth and Roth, Aaron and Sharifi-Malvajerdi, Saeed and Shenfeld, Moshe}, title = {{A New Analysis of Differential Privacy’s Generalization Guarantees}}, booktitle = {11th Innovations in Theoretical Computer Science Conference (ITCS 2020)}, pages = {31:1--31:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-134-4}, ISSN = {1868-8969}, year = {2020}, volume = {151}, editor = {Vidick, Thomas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2020.31}, URN = {urn:nbn:de:0030-drops-117165}, doi = {10.4230/LIPIcs.ITCS.2020.31}, annote = {Keywords: Differential Privacy, Adaptive Data Analysis, Transfer Theorem} }