# Research

## 2024

- An Embedding Framework for the Design and Analysis of Consistent Polyhedral Surrogates
  *Jessie Finocchiaro*, Rafael M. Frongillo, and Bo Waggoner. *Journal of Machine Learning Research*, 2024.
  We formalize and study the natural approach of designing convex surrogate loss functions via embeddings, for problems such as classification, ranking, or structured prediction. In this approach, one embeds each of the finitely many predictions (e.g. rankings) as a point in $\mathbb{R}^d$, assigns the original loss values to these points, and "convexifies" the loss in some way to obtain a surrogate. We establish a strong connection between this approach and polyhedral (piecewise-linear convex) surrogate losses: every discrete loss is embedded by some polyhedral loss, and every polyhedral loss embeds some discrete loss. Moreover, an embedding gives rise to a consistent link function as well as linear surrogate regret bounds. Our results are constructive, as we illustrate with several examples. In particular, our framework gives succinct proofs of consistency or inconsistency for various polyhedral surrogates in the literature, and for inconsistent surrogates, it further reveals the discrete losses for which these surrogates are consistent. We go on to show additional structure of embeddings, such as the equivalence of embedding and matching Bayes risks, and the equivalence of various notions of non-redundancy. Using these results, we establish that indirect elicitation, a necessary condition for consistency, is also sufficient when working with polyhedral surrogates.
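
  The embedding approach above can be illustrated, in its simplest binary case, with the familiar hinge loss; the following sketch is my own illustration, not code from the paper:

  ```python
  # Illustrative sketch (my code, not the paper's): the binary hinge loss is a
  # polyhedral surrogate that embeds 0-1 loss, with labels y in {-1, +1}
  # embedded at the real points -1 and +1 and the sign function as the link.
  def hinge(u, y):
      return max(0.0, 1.0 - u * y)

  def expected_hinge(u, p):
      """Expected hinge loss of report u when P(y = +1) = p."""
      return p * hinge(u, +1) + (1 - p) * hinge(u, -1)

  p = 0.7
  grid = [i / 100 - 2.0 for i in range(401)]        # candidate reports u in [-2, 2]
  u_star = min(grid, key=lambda u: expected_hinge(u, p))
  link = 1 if u_star >= 0 else -1                    # sign link back to a label

  print(u_star)  # the minimizer sits at the embedded point +1
  print(link)    # the linked prediction agrees with the more likely label
  ```

  The minimizer of the expected surrogate lands exactly on an embedded point, and the link recovers the Bayes-optimal report for 0-1 loss, which is the pattern the embedding framework generalizes.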

- Using Property Elicitation to Understand the Impacts of Fairness Constraints
  *Jessie Finocchiaro*. 2024.
  Predictive algorithms are often trained by optimizing some loss function, to which regularization functions are added to impose a penalty for violating constraints. As expected, the addition of such regularization functions can change the minimizer of the objective. It is not well understood which regularizers change the minimizer of the loss, and, when the minimizer does change, how it changes. We use property elicitation to take first steps towards understanding the joint relationship between the loss and regularization functions and the optimal decision for a given problem instance. In particular, we give a necessary and sufficient condition on loss and regularizer pairs for when a property changes with the addition of the regularizer, and examine some regularizers standard in the fair machine learning literature that satisfy this condition. We empirically demonstrate how algorithmic decision-making changes as a function of both data distribution changes and the hardness of the constraints.

- Trading off Consistency and Dimensionality of Convex Surrogates for the Mode
  Enrique Nueve, Dhamma Kimpara, Bo Waggoner, and 1 more author. 2024.
  In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n - 1$ in order to design a consistent surrogate loss that leads to the "correct" classification, regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n - 1$ dimensions is often intractable. We investigate ways to trade off surrogate loss dimension, the number of problem instances, and restricting the region of consistency in the simplex for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes into the vertices of convex polytopes in a low-dimensional surrogate space. We show that full-dimensional subsets of the simplex exist around each point mass distribution for which consistency holds, but also, with fewer than $n - 1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs: the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result to check whether consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding $n = 2^d$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $n/2$ dimensions over the whole simplex.
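
  A minimal sketch of the unit-cube embedding mentioned in the abstract, in my own illustrative code (the coordinate-rounding decoder below is an assumption for the example, not the paper's construction):

  ```python
  # Illustrative sketch (my code): embed n = 2^d outcomes as the vertices of
  # the d-dimensional unit cube, so a surrogate report in [0,1]^d can be
  # decoded to an outcome by rounding each coordinate to the nearest vertex.
  def cube_embedding(d):
      """Map outcome index i in {0, ..., 2^d - 1} to its binary-expansion vertex in {0,1}^d."""
      return {i: tuple((i >> k) & 1 for k in range(d)) for i in range(2 ** d)}

  def decode(report, embedding):
      """Round a surrogate report in [0,1]^d to the nearest cube vertex's outcome."""
      vertex = tuple(1 if x >= 0.5 else 0 for x in report)
      inverse = {v: i for i, v in embedding.items()}
      return inverse[vertex]

  emb = cube_embedding(3)             # 8 outcomes embedded in 3 dimensions
  print(emb[5])                       # (1, 0, 1): the binary expansion of 5
  print(decode((0.9, 0.2, 0.7), emb)) # rounds to vertex (1, 0, 1), outcome 5
  ```

  The point of the trade-off in the paper is that this representation uses only $d = \log_2 n$ dimensions rather than $n - 1$, at the cost of consistency holding only on restricted regions of the simplex.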

- Participatory Objective Design via Preference Elicitation
  Ali Shirali, *Jessie Finocchiaro*, and Rediet Abebe. 2024.
  In standard resource allocation problems, the designer sets the objective, such as utilitarian social welfare, that captures a societal goal, and solves for the optimal allocation subject to fairness and item availability constraints. The participants, on the other hand, specify their preferences for the items being allocated, e.g., through stating how they rank the items or expressing their cardinal utility for each item. The objective function, which guides the overall allocation, is therefore determined by the designer in a top-down manner, whereas participants can only express their preferences for the items. This standard preference elicitation stage limits participants' ability to express preferences for the overall allocation, such as the level of inequality, and to influence the overall objective function. In this work, we examine whether it is possible to use this bottom-up preference elicitation stage to enable participants to express not only their preferences for individual items but also their preferences for the overall allocation, thereby indirectly influencing the objective function. We examine this question using a well-studied resource allocation problem where $m$ divisible items must be allocated to $n$ agents, who express their cardinal utilities over the items. The designer aims to optimize for the sum of the agents' utilities for the items they receive. In particular, this utilitarian objective is agnostic to the overall level of inequality. We consider a setting where an agent's true utility is a function not only of their preferences for the items, but also of the overall level of inequality. We model this using a popular social preference model from behavioral economics by Fehr and Schmidt, in which agents can express levels of inequality aversion.
  We conduct a theoretical examination of this problem and show that there can be large gains in social welfare if the designer uses this richer inequality-aware preference model instead of the standard inequality-agnostic preference model. Further, if we take the standard inequality-agnostic welfare as the benchmark, we show that the relative loss of welfare can be tightly bounded: it is independent of the number of agents and linear in the level of inequality aversion. With further assumptions on the preferences, we provide strictly tighter, distribution-free, and parametric bounds on the loss of welfare. We also discuss the worst-case drop in inequality-agnostic utility an agent might incur as a consequence of a designer allocating items using the inequality-averse preferences. We conclude with a discussion of possible designs to elicit the preferences of strategic agents over the goods and fairness. Taken together, our results argue for potentially large gains that can be obtained from using the richer social preference model and demonstrate the relatively minor losses from using the standard model, highlighting a promising avenue for using preference elicitation to empower participants to influence the overall objective function.
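
  The Fehr and Schmidt social preference model referenced above has a standard closed form, penalizing both disadvantageous and advantageous inequality; a minimal sketch (my code, with illustrative parameter values):

  ```python
  # Fehr-Schmidt (1999) inequality-averse utility, as commonly stated:
  # agent i's utility is their own payoff minus an "envy" penalty (others
  # ahead of i, weighted by alpha) and a "guilt" penalty (others behind i,
  # weighted by beta), each averaged over the other n - 1 agents.
  def fehr_schmidt_utility(i, x, alpha, beta):
      """Inequality-averse utility for agent i given the payoff vector x."""
      n = len(x)
      envy = sum(max(x[j] - x[i], 0) for j in range(n) if j != i)
      guilt = sum(max(x[i] - x[j], 0) for j in range(n) if j != i)
      return x[i] - alpha * envy / (n - 1) - beta * guilt / (n - 1)

  # Two agents splitting payoffs (3, 1): the better-off agent is discounted
  # by guilt, the worse-off agent by envy. Parameter values are illustrative.
  print(fehr_schmidt_utility(0, [3, 1], alpha=0.5, beta=0.25))  # 3 - 0.25*2 = 2.5
  print(fehr_schmidt_utility(1, [3, 1], alpha=0.5, beta=0.25))  # 1 - 0.5*2 = 0.0
  ```

  Setting `alpha = beta = 0` recovers the standard inequality-agnostic utility, which is how the two preference models compared in the paper relate.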

- Robustness of voting mechanisms to external information in expectation
  Yiling Chen and *Jessie Finocchiaro*. 2024.
  Analyses of voting algorithms often overlook informational externalities shaping individual votes. For example, pre-polling information often skews voters towards candidates who may not be their top choice, but who they believe would be a worthwhile recipient of their vote. In this work, we aim to understand the role of external information in voting outcomes. We study this by analyzing (1) the probability that voting outcomes align with external information, and (2) the effect of external information on the total utility across voters, or social welfare. In practice, voting mechanisms elicit coarse information about voter utilities, such as ordinal preferences, which initially prevents us from directly analyzing the effect of informational externalities with standard voting mechanisms. To overcome this, we present an intermediary mechanism for learning how preferences change with external information which does not require eliciting full cardinal preferences. With this tool in hand, we find that voting mechanisms are generally more likely to select the alternative most favored by the external information, and when external information reflects the population's true preferences, social welfare increases in expectation.

## 2023

- Online platforms and the fair exposure problem under homophily
  Jakob Schoeffer, Alexander Ritchie, Keziah Naggita, and 3 more authors. *In Proceedings of the AAAI Conference on Artificial Intelligence*, 2023.
  In the wake of increasing political extremism, online platforms have been criticized for contributing to polarization. One line of criticism has focused on echo chambers and the recommended content served to users by these platforms. In this work, we introduce the fair exposure problem: given limited intervention power of the platform, the goal is to enforce balance in the spread of content (e.g., news articles) among two groups of users through constraints similar to those imposed by the Fairness Doctrine in the United States in the past. Groups are characterized by different affiliations (e.g., political views) and have different preferences for content. We develop a stylized framework that models intra- and intergroup content propagation under homophily, and we formulate the platform's decision as an optimization problem that aims at maximizing user engagement, potentially under fairness constraints. Our main notion of fairness requires that each group see a mixture of their preferred and non-preferred content, encouraging information diversity. Promoting such information diversity is often viewed as desirable and a potential means for breaking out of harmful echo chambers. We study the solutions to both the fairness-agnostic and fairness-aware problems. We prove that a fairness-agnostic approach inevitably leads to group-homogeneous targeting by the platform. This is only partially mitigated by imposing fairness constraints: we show that there exist optimal fairness-aware solutions which target one group with different types of content and the other group with only one type that is not necessarily the group's most preferred. Finally, using simulations with real-world data, we study the system dynamics and quantify the price of fairness.

- Characterizing and Improving the Robustness of Predict-Then-Optimize Frameworks
  Sonja Johnson-Yu, *Jessie Finocchiaro*, Kai Wang, and 4 more authors. *In International Conference on Decision and Game Theory for Security*, 2023.
  Optimization tasks situated in incomplete information settings are often preceded by a prediction problem to estimate the missing information; past work shows the traditional predict-then-optimize (PTO) framework can be improved by training a predictive model with respect to the optimization task through a PTO paradigm called decision-focused learning. Little is known, however, about the performance of traditional PTO and decision-focused learning when exposed to adversarial label drift. We provide modifications of traditional PTO and decision-focused learning that attempt to improve robustness by anticipating label drift. When the predictive model is perfectly expressive, we cast these learning problems as Stackelberg games. With these games, we provide a necessary condition for when anticipating label drift can improve the performance of a PTO algorithm: if performance can be improved, then the downstream optimization objective must be asymmetric. We then bound the loss of decision quality in the presence of adversarial label drift to show there may exist a strict gap between the performance of the two algorithms. We verify our theoretical findings empirically in two asymmetric and two symmetric settings. These experimental results demonstrate that robustified decision-focused learning is generally more robust to adversarial label drift than both robust and traditional PTO.

## 2022

- Consistent polyhedral surrogates for top-k classification and variants
  *Jessie Finocchiaro*, Rafael Frongillo, Emma Goodwill, and 1 more author. *In International Conference on Machine Learning*, 2022.
  Top-k classification is a generalization of multiclass classification used widely in information retrieval, image classification, and other extreme classification settings. Several hinge-like (piecewise-linear) surrogates have been proposed for the problem, yet all are either non-convex or inconsistent. For the proposed hinge-like surrogates that are convex (i.e., polyhedral), we apply the recent embedding framework of Finocchiaro et al. (2019; 2022) to determine the prediction problem for which the surrogate is consistent. These problems can all be interpreted as variants of top-k classification, which may be better aligned with some applications. We leverage this analysis to derive constraints on the conditional label distributions under which these proposed surrogates become consistent for top-k. It has been further suggested that every convex hinge-like surrogate must be inconsistent for top-k. Yet, we use the same embedding framework to give the first consistent polyhedral surrogate for this problem.

- The structured abstain problem and the Lovász hinge
  *Jessie Finocchiaro*, Rafael M. Frongillo, and Enrique Nueve. *In Conference on Learning Theory*, 2022.
  The Lovász hinge is a convex surrogate recently proposed for structured binary classification, in which $k$ binary predictions are made simultaneously and the error is judged by a submodular set function. Despite its wide usage in image segmentation and related problems, its consistency has remained open. We resolve this open question, showing that the Lovász hinge is inconsistent for its desired target unless the set function is modular. Leveraging a recent embedding framework, we instead derive the target loss for which the Lovász hinge is consistent. This target, which we call the structured abstain problem, allows one to abstain on any subset of the $k$ predictions. We derive two link functions, each of which is consistent for all submodular set functions simultaneously.
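
  The Lovász hinge is built from the Lovász extension of the submodular set function; a minimal sketch of the extension itself, in my own illustrative code (not the paper's):

  ```python
  # Lovász extension of a set function f (with f(empty set) = 0), evaluated
  # at s in R^k: sort coordinates in decreasing order and sum the marginal
  # gains of f along the resulting chain of prefix sets, weighted by s.
  def lovasz_extension(f, s):
      """f maps frozensets of indices {0,...,k-1} to reals; s is a length-k list."""
      order = sorted(range(len(s)), key=lambda i: -s[i])
      value, prefix = 0.0, frozenset()
      prev = f(prefix)
      for i in order:
          prefix = prefix | {i}
          cur = f(prefix)
          value += s[i] * (cur - prev)  # marginal gain of adding index i
          prev = cur
      return value

  card = lambda S: len(S)          # modular: extension reduces to sum(s)
  cover = lambda S: min(len(S), 1) # submodular: extension is the max coordinate

  print(lovasz_extension(card, [0.3, -0.2, 0.9]))   # approx 1.0 = sum(s)
  print(lovasz_extension(cover, [0.3, -0.2, 0.9]))  # 0.9 = max(s)
  ```

  The modular case illustrates the dichotomy in the paper: when the set function is modular, the extension (and hence the hinge) decomposes coordinate-wise.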

- Designing Consistent and Convex Surrogates for General Prediction Tasks
  *Jessie Finocchiaro*. *University of Colorado at Boulder*, 2022.
  Supervised machine learning algorithms are often predicated on the minimization of loss functions, which measure the error of a given prediction against a ground-truth label. The choice of loss function to minimize corresponds to a summary statistic of the underlying data distribution that is learned in this process. Historically, loss function design has often been ad hoc, and often results in losses that are not actually statistically consistent with respect to the target prediction task. This work focuses on the design of losses that are simultaneously convex, consistent with respect to a target prediction task, and efficient in the dimension of the prediction space. We provide frameworks to construct such losses in both discrete prediction and continuous estimation settings, as well as tools to lower bound the prediction dimension for certain classes of consistent convex losses. We apply our results throughout to understand prediction tasks such as high-confidence classification, top-k prediction, variance estimation, conditional value at risk, and ratios of expectations.

## 2021

- Bridging machine learning and mechanism design towards algorithmic fairness
  *Jessie Finocchiaro*, Roland Maio, Faidra Monachou, and 4 more authors. *In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency*, 2021.
  Decision-making systems increasingly orchestrate our world: how to intervene on the algorithmic components to build fair and equitable systems is therefore a question of utmost importance, one that is substantially complicated by the context-dependent nature of fairness and discrimination. Modern decision-making systems that involve allocating resources or information to people (e.g., school choice, advertising) incorporate machine-learned predictions in their pipelines, raising concerns about potential strategic behavior or constrained allocation, concerns usually tackled in the context of mechanism design. Although both machine learning and mechanism design have developed frameworks for addressing issues of fairness and equity, in some complex decision-making systems, neither framework is individually sufficient. In this paper, we develop the position that building fair decision-making systems requires overcoming these limitations which, we argue, are inherent to each field. Our ultimate objective is to build an encompassing framework that cohesively bridges the individual frameworks of mechanism design and machine learning. We begin to lay the groundwork towards this goal by comparing the perspective each discipline takes on fair decision-making, teasing out the lessons each field has taught and can teach the other, and highlighting application domains that require a strong collaboration between these disciplines.

- Unifying lower bounds on prediction dimension of convex surrogates
  *Jessie Finocchiaro*, Rafael M. Frongillo, and Bo Waggoner. *Advances in Neural Information Processing Systems*, 2021.
  Given a prediction task, understanding when one can and cannot design a consistent convex surrogate loss, particularly a low-dimensional one, is an important and active area of machine learning research. The prediction task may be given as a target loss, as in classification and structured prediction, or simply as a (conditional) statistic of the data, as in risk measure estimation. These two scenarios typically involve different techniques for designing and analyzing surrogate losses. We unify these settings using tools from property elicitation, and give a general lower bound on prediction dimension. Our lower bound tightens existing results in the case of discrete predictions, showing that previous calibration-based bounds can largely be recovered via property elicitation. For continuous estimation, our lower bound resolves an open problem on estimating measures of risk and uncertainty.

## 2020

- Embedding dimension of polyhedral losses
  *Jessie Finocchiaro*, Rafael M. Frongillo, and Bo Waggoner. *In Conference on Learning Theory*, 2020.
  A common technique in supervised learning with discrete losses, such as 0-1 loss, is to optimize a convex surrogate loss over $\mathbb{R}^d$, calibrated with respect to the original loss. In particular, recent work has investigated embedding the original predictions (e.g. labels) as points in $\mathbb{R}^d$, showing an equivalence to using polyhedral surrogates. In this work, we study the notion of the embedding dimension of a given discrete loss: the minimum dimension $d$ such that an embedding exists. We characterize $d$-embeddability for all $d$, with a particularly tight characterization for $d = 1$ (embedding into the real line), and useful necessary conditions for $d > 1$ in the form of a quadratic feasibility program. We illustrate our results with novel lower bounds for abstain loss.

- A Decomposition of a Complete Graph with a Hole
  Roxanne Back, Alejandra Brewer Castano, Rachel Galindo, and 1 more author. *Open Journal of Discrete Mathematics*, 2020.
  In the field of design theory, the most well-known design is a Steiner Triple System. In general, a $G$-design on $H$ is an edge-disjoint decomposition of $H$ into isomorphic copies of $G$. In a Steiner Triple System, a complete graph is decomposed into triangles. In this paper we let $H$ be a complete graph with a hole and $G$ be a complete graph on four vertices minus one edge, also referred to as $K_4 - e$. A complete graph with a hole, $K_d + v$, consists of a complete graph on $d$ vertices, $K_d$, and a set $V$ of $v$ independent vertices, where each vertex in $V$ is adjacent to each vertex in $K_d$. When $d$ is even, we give two constructions for the decomposition of a complete graph with a hole into copies of $K_4 - e$: the Alpha-Delta Construction and the Alpha-Beta-Delta Construction. By restricting $d$ and $v$ so that $v = 2(d - 1) - 5a$, we are able to resolve both of these cases for a subset of $K_d + v$ using difference methods and 1-factors.
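
  Since $K_4 - e$ has 5 edges, any $K_4 - e$ decomposition must satisfy a basic divisibility condition on the edge count of $K_d + v$; the following quick check is my own illustration, not the paper's construction:

  ```python
  # Necessary edge-count condition (my illustration): K_d + v has
  # C(d, 2) + d*v edges, and each copy of K4 - e uses exactly 5 of them,
  # so an edge-disjoint decomposition requires the total to be divisible by 5.
  from math import comb

  def edge_count(d, v):
      """Number of edges of the complete graph with a hole, K_d + v."""
      return comb(d, 2) + d * v

  def passes_edge_condition(d, v):
      return edge_count(d, v) % 5 == 0

  print(edge_count(6, 4))            # 15 + 24 = 39 edges
  print(passes_edge_condition(6, 4)) # False: 39 is not divisible by 5
  print(passes_edge_condition(5, 2)) # True: 10 + 10 = 20 edges
  ```

  This is only a necessary condition; the constructions in the paper are what establish when decompositions actually exist.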

- Evolutionary Optimization of Cooperative Strategies for the Iterated Prisoner’s Dilemma
  *Jessie Finocchiaro* and H. David Mathias. *IEEE Transactions on Games*, 2020.
  The Iterated Prisoner's Dilemma (IPD) has been studied in fields as diverse as economics, computer science, psychology, politics, and environmental studies. This is due, in part, to the intriguing property that its Nash Equilibrium is not globally optimal. Typically treated as a single-objective problem, a player's goal is to maximize their own score. In some work, minimizing the opponent's score is an additional objective. Here, we explore the role of explicitly optimizing for mutual cooperation in IPD player performance. We implement a genetic algorithm in which each member of the population evolves using one of four multi-objective fitness functions: selfish, communal, cooperative, and selfless, the last three of which use a cooperative metric as an objective. As a control, we also consider two single-objective fitness functions. We explore the role of representation in evolving cooperation by implementing four representations for evolving players. Finally, we evaluate the effect of noise on the evolution of cooperative behaviors. Testing our evolved players in tournaments in which a player's own score is the sole metric, we find that players evolved with mutual cooperation as an objective are very competitive. Thus, learning to play nicely with others is a successful strategy for maximizing personal reward.
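
  A minimal sketch of iterated play with the standard IPD payoffs (my own illustrative code; the genetic algorithm and evolved representations from the paper are omitted):

  ```python
  # Illustrative IPD sketch (my code). Standard payoffs: mutual cooperation
  # R = 3, mutual defection P = 1, temptation T = 5, sucker's payoff S = 0.
  PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
            ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

  def play(strategy_a, strategy_b, rounds=10):
      """Iterate the game; each strategy maps the opponent's last move (or None) to a move."""
      score_a = score_b = 0
      last_a = last_b = None
      for _ in range(rounds):
          move_a, move_b = strategy_a(last_b), strategy_b(last_a)
          pa, pb = PAYOFF[(move_a, move_b)]
          score_a, score_b = score_a + pa, score_b + pb
          last_a, last_b = move_a, move_b
      return score_a, score_b

  tit_for_tat = lambda last: 'C' if last is None else last  # cooperate, then mirror
  always_defect = lambda last: 'D'

  print(play(tit_for_tat, tit_for_tat))    # (30, 30): sustained mutual cooperation
  print(play(tit_for_tat, always_defect))  # (9, 14): exploited once, then mutual defection
  ```

  The gap between these two outcomes is the tension the paper studies: mutual cooperation yields higher total reward than the defection equilibrium, which is why optimizing for it explicitly can still produce competitive players.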

## 2019

- An embedding framework for consistent polyhedral surrogates
  *Jessie Finocchiaro*, Rafael Frongillo, and Bo Waggoner. *Advances in Neural Information Processing Systems*, 2019.
  We formalize and study the natural approach of designing convex surrogate loss functions via embeddings, for problems such as classification, ranking, or structured prediction. In this approach, one embeds each of the finitely many predictions (e.g. rankings) as a point in $\mathbb{R}^d$, assigns the original loss values to these points, and "convexifies" the loss in some way to obtain a surrogate. We establish a strong connection between this approach and polyhedral (piecewise-linear convex) surrogate losses. Given any polyhedral loss L, we give a construction of a link function through which L is a consistent surrogate for the loss it embeds. Conversely, we show how to construct a consistent polyhedral surrogate for any given discrete loss. Our framework yields succinct proofs of consistency or inconsistency of various polyhedral surrogates in the literature, and for inconsistent surrogates, it further reveals the discrete losses for which these surrogates are consistent. We show some additional structure of embeddings, such as the equivalence of embedding and matching Bayes risks, and the equivalence of various notions of non-redundancy. Using these results, we establish that indirect elicitation, a necessary condition for consistency, is also sufficient when working with polyhedral surrogates.

- Evolving cooperation for the iterated prisoner’s dilemma
  *Jessie Finocchiaro* and H. David Mathias. *In Proceedings of the Genetic and Evolutionary Computation Conference Companion*, 2019.
  The Iterated Prisoner's Dilemma (IPD) is an intriguing problem for which the Nash Equilibrium is not globally optimal. Typically treated as a single-objective problem, a player's goal is to maximize their own score. In some work, minimizing the opponent's score has been added as an additional objective. We explore the role of mutual cooperation in IPD player performance. We implement a genetic algorithm in which the population is divided into four multi-objective sub-populations: selfish, communal, cooperative, and selfless, the last three of which use a measure of mutual cooperation as an objective. Game play occurs among all members, without regard to sub-population, while crossover and selection occur only within a sub-population. Testing is against a population of well-known strategies and is single objective, using only self score. We find that players evolved to cooperate perform very well, in some cases dominating the competition. Thus, learning to play nicely with others is a successful strategy for maximizing personal reward.

## 2018

- Convex elicitation of continuous properties
  *Jessie Finocchiaro* and Rafael M. Frongillo. *Advances in Neural Information Processing Systems*, 2018.
  A property or statistic of a distribution is said to be elicitable if it can be expressed as the minimizer of some loss function in expectation. Recent work shows that continuous real-valued properties are elicitable if and only if they are identifiable, meaning the set of distributions with the same property value can be described by linear constraints. From a practical standpoint, one may ask for which such properties do there exist convex loss functions. In this paper, in a finite-outcome setting, we show that in fact essentially every elicitable real-valued property can be elicited by a convex loss function. Our proof is constructive, and leads to convex loss functions for new properties.
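
  Elicitability can be illustrated with the classic example that squared loss elicits the mean; a minimal sketch (my code, using an arbitrary three-outcome distribution chosen for the example):

  ```python
  # Illustrative sketch (my code): a property is elicitable if it is the
  # minimizer of some expected loss. The mean is elicited by squared loss,
  # which is also convex in the report r.
  outcomes = [0.0, 1.0, 4.0]
  probs = [0.5, 0.25, 0.25]

  def expected_sq_loss(r):
      return sum(p * (r - y) ** 2 for p, y in zip(probs, outcomes))

  # Minimize the expected loss over a fine grid of reports in [0, 4].
  grid = [i / 1000 for i in range(4001)]
  r_star = min(grid, key=expected_sq_loss)
  mean = sum(p * y for p, y in zip(probs, outcomes))

  print(r_star)  # 1.25: the grid minimizer
  print(mean)    # 1.25: 0.5*0 + 0.25*1 + 0.25*4, the distribution's mean
  ```

  Identifiability is visible here too: the first-order condition of the expected squared loss is linear in the distribution, which is the shape of constraint the elicitability characterization above refers to.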

## 2017

- Egocentric height estimation
  *Jessie Finocchiaro*, Aisha Urooj Khan, and Ali Borji. *In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)*, 2017.
  Egocentric, or first-person, vision, which became popular in recent years with the emergence of wearable technology, differs from exocentric (third-person) vision in some distinguishable ways, one of which is that the camera wearer is generally not visible in the video frames. Recent work has been done on action and object recognition in egocentric videos, as well as on biometric extraction from first-person videos. Height estimation can be a useful feature for both soft biometrics and object tracking. Here, we propose a method of estimating the height of an egocentric camera without any calibration or reference points. We used both traditional computer vision approaches and deep learning in order to determine the visual cues that result in the best height estimation. We introduce a framework inspired by two-stream networks, comprising two convolutional neural networks, one based on spatial information and one based on information given by optical flow in a frame. Given an egocentric video as input to the framework, our model yields a height estimate as output. We also incorporate late fusion to learn a combination of temporal and spatial cues. Comparing our model with other methods used as baselines, we achieve height estimates for videos with a mean average error of 14.04 cm over a range of 103 cm of data, and classification accuracy for relative height (tall, medium, or short) up to 93.75%, where chance level is 33%.