Published in Physical Review D, 2015
In this paper we use Dirac’s method to impose the Lorenz gauge condition in a general four-dimensional conformally flat spacetime and find that there is no particle production. We show that in cosmological spacetimes with dimension D≠4 there will be particle production when the scale factor changes, and we calculate the particle production due to a sudden change.
Recommended citation: Jesse C. Cresswell and Dan N. Vollick. Lorenz gauge quantization in conformally flat spacetimes. Phys. Rev. D 91, 084008, 2015 https://journals.aps.org/prd/abstract/10.1103/PhysRevD.91.084008
Published in Journal of High Energy Physics, 2017
Kinematic space can be used as an intermediate step in the AdS/CFT dictionary and lends itself naturally to the description of diffeomorphism invariant quantities. In this work conical defect spacetimes are considered and a duality is established between partial OPE blocks and bulk fields integrated over individual geodesics, minimal or non-minimal.
Recommended citation: Jesse C. Cresswell, Amanda W. Peet. Kinematic space for conical defects. JHEP 11 (2017) 155 https://link.springer.com/article/10.1007/JHEP11(2017)155
Published in Physical Review A, 2018
The growth of entanglement in an initially separable state, as measured by the purity of subsystems, can be characterized by a timescale that takes a universal form for any Hamiltonian. We show that the same timescale governs the growth of entanglement for all Rényi entropies. Since the family of Rényi entropies completely characterizes the entanglement of a pure bipartite state, our timescale is a universal feature of bipartite entanglement, depending only on the interaction Hamiltonian and the initial state.
Recommended citation: Jesse C. Cresswell. Universal entanglement timescale for Rényi entropies. Phys. Rev. A 97 022317, 2018 https://journals.aps.org/pra/abstract/10.1103/PhysRevA.97.022317
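As a small illustration of the quantities named in this abstract, the sketch below computes the purity and the Rényi entropies of a subsystem from the eigenvalues of its reduced density matrix. It is not taken from the paper; the function names and the example spectra are assumptions made for illustration.

```python
import math


def purity(eigenvalues):
    """Purity Tr(rho^2), given the eigenvalues of a reduced density matrix."""
    return sum(p * p for p in eigenvalues)


def renyi_entropy(eigenvalues, alpha):
    """Renyi entropy S_alpha = log(sum_i p_i^alpha) / (1 - alpha), in nats.

    alpha == 1 is treated as the limiting case (the von Neumann entropy).
    Zero eigenvalues are skipped because they contribute nothing.
    """
    probs = [p for p in eigenvalues if p > 0.0]
    if alpha == 1:
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical spectra: a product state and a maximally entangled two-qubit state.
product_spectrum = [1.0, 0.0]
bell_spectrum = [0.5, 0.5]

for name, spectrum in [("product", product_spectrum), ("bell", bell_spectrum)]:
    entropies = [renyi_entropy(spectrum, a) for a in (1, 2, 3)]
    print(name, purity(spectrum), entropies)
```

For the maximally entangled spectrum every Rényi entropy equals log 2, and for the product state they all vanish. The second Rényi entropy is the negative logarithm of the purity, which is the link between the purity and the Rényi family.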
Published in Physical Review A, 2019
Negativity is an entanglement monotone frequently used to quantify entanglement in bipartite states. We develop techniques in the calculus of complex, patterned matrices and use them to conduct a perturbative analysis of negativity in terms of arbitrary variations of the density operator. Our methods are well suited to study the growth and decay of entanglement in a wide range of physical systems, including the generic linear growth of entanglement in many-body systems, and have broad relevance to many functions of quantum states and observables.
Recommended citation: Jesse C. Cresswell, Ilan Tzitrin, and Aaron Z. Goldberg. Perturbative expansion of entanglement negativity using patterned matrix calculus. Phys. Rev. A 99 012322, 2019 https://journals.aps.org/pra/abstract/10.1103/PhysRevA.99.012322
Published in Journal of High Energy Physics, 2019
We study the holographic duality between boundary OPE blocks and geodesic integrated bulk fields in quotients of AdS3 dual to excited CFT states. The quotient geometries exhibit non-minimal geodesics between pairs of spacelike separated boundary points which modify the OPE block duality. We decompose OPE blocks into quotient invariant operators and propose a duality with bulk fields integrated over individual geodesics, minimal or non-minimal.
Recommended citation: Jesse C. Cresswell, Ian T. Jardine, and Amanda W. Peet. Holographic relations for OPE blocks in excited states. JHEP 03 (2019) 058 https://link.springer.com/article/10.1007/JHEP03(2019)058
Published in University of Toronto Doctoral Thesis, 2019
In this thesis we apply techniques from quantum information theory to study quantum gravity within the framework of the anti-de Sitter / conformal field theory correspondence (AdS/CFT). We present refinements of a duality between operator product expansion (OPE) blocks in the CFT, and geodesic integrated fields in AdS. Working with excited states within AdS3/CFT2, we show how the OPE block decomposes into more fine-grained CFT observables that are dual to AdS fields integrated over non-minimal geodesics. Additionally, this thesis contains results on the dynamics of entanglement measures for general quantum systems. Results are presented for the family of quantum Rényi entropies and entanglement negativity.
Recommended citation: Jesse C. Cresswell, Quantum Information Approaches to Quantum Gravity. University of Toronto Doctoral Thesis https://tspace.library.utoronto.ca/handle/1807/97354
Published in Journal of Physics A: Mathematical and Theoretical, 2020
There exist quantum entangled states for which all local actions on one subsystem can be equivalently realized by actions on another, thereby possessing operational symmetry. We characterize the states for which this fundamental property of entanglement does and does not hold, including multipartite and mixed states, and draw connections to quantum steering, envariance, the Reeh–Schlieder theorem, and classical entanglement.
Recommended citation: Ilan Tzitrin, Aaron Z. Goldberg, and Jesse C. Cresswell, Operational symmetries of entangled states. J. Phys. A: Math. Theor. 53 095304, 2020 https://iopscience.iop.org/article/10.1088/1751-8121/ab6fc9
Published in International Conference on Learning Representations 2021, 2021
We introduce cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that optimal cumulative accessibility functions are monotonic and can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality.
Recommended citation: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg. C-Learning: Horizon-Aware Cumulative Accessibility Estimation. International Conference on Learning Representations https://openreview.net/forum?id=W3Wf_wKmqm9
Published in Advances in Neural Information Processing Systems, 2021
Generative modelling allows us to learn patterns in data and generate novel examples that are similar to real ones. Normalizing flows are one technique in machine learning for accomplishing this; however, they cannot directly model the space where realistic data lives. We show that composing a normalizing flow with a conformal embedding can model the data space, and demonstrate the effectiveness of this approach on real-world data sets.
Recommended citation: Brendan Leigh Ross and Jesse C. Cresswell. Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows. In Advances in Neural Information Processing Systems, volume 34, 2021 https://proceedings.neurips.cc/paper/2021/hash/dfd786998e082758be12670d856df755-Abstract.html
Published in Nature Scientific Reports, 2022
We conduct a case study of applying a differentially private federated learning framework for the analysis of histopathology images, the largest and perhaps most complex medical images. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
Recommended citation: Mohammed Adnan, Shivam Kalra, Jesse C. Cresswell, Graham W. Taylor, and Hamid R. Tizhoosh. Federated Learning and Differential Privacy for Medical Image Analysis. Nature Scientific Reports, 12, 1953, 2022 https://www.nature.com/articles/s41598-022-05539-7
Published in Transactions on Machine Learning Research, 2022
The manifold hypothesis states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. We investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Jesse C. Cresswell, and Anthony L. Caterini. Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. TMLR 2022 https://openreview.net/forum?id=0nEZCVshxS
Published in NeurIPS 2022 Workshop on Federated Learning: Recent Advances and New Challenges, 2022
In the traditional federated learning setting, a central server coordinates a network of clients to train one global model, but may serve many clients poorly due to data heterogeneity. We present a decentralized framework, FedeRiCo, where each client can learn as much or as little from other clients as is optimal for its local data distribution. Based on expectation-maximization, FedeRiCo estimates the utilities of other participants’ models on each client’s data so that everyone can select the right collaborators for learning.
Recommended citation: Yi Sui, Junfeng Wen, Yenson Lau, Brendan Leigh Ross, and Jesse C. Cresswell. Find Your Friends: Personalized Federated Learning with the Right Collaborators. NeurIPS 2022 Workshop on Federated Learning: Recent Advances and New Challenges https://arxiv.org/abs/2210.06597
Published in NeurIPS 2022 Workshop on Machine Learning and the Physical Sciences, 2022
Precision measurements and new physics searches at the Large Hadron Collider require efficient simulations of particle propagation and interactions within the detectors including calorimeter showers. However, the high-dimensional representation of showers belies the relative simplicity and structure of the underlying physical laws. We propose modelling calorimeter showers first by learning their manifold structure, and then estimating the density of data across this manifold. Learning manifold structure reduces the dimensionality of the data, which enables fast training and generation when compared with competing methods.
Recommended citation: Jesse C. Cresswell, Brendan Leigh Ross, Gabriel Loaiza-Ganem, Humberto Reyes-Gonzalez, Marco Letizia, and Anthony L. Caterini. CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds. NeurIPS 2022 Workshop on Machine Learning and the Physical Sciences. https://arxiv.org/abs/2211.15380
Published in NeurIPS 2022 Workshop on Understanding Deep Learning Through Empirical Falsification, 2022
Likelihood-based deep generative models exhibit pathological behaviour as a consequence of using high-dimensional densities to model data with low-dimensional structure. In this paper we propose two methodologies to remove the dimensionality mismatch during training. Our first approach is based on Tweedie’s formula, and the second on models which take the variance of added noise as a conditional input. We show that surprisingly, while well motivated, these approaches only sporadically improve performance over not adding noise, and that other methods of addressing the dimensionality mismatch are more empirically adequate.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Luhuan Wu, John P. Cunningham, Jesse C. Cresswell, and Anthony L. Caterini. Denoising Deep Generative Models. NeurIPS 2022 Workshop on Understanding Deep Learning Through Empirical Falsification. https://arxiv.org/abs/2212.01265
Published in International Conference on Learning Representations 2023, 2023
One of the most widely used techniques for private model training, differentially private stochastic gradient descent (DPSGD), frequently intensifies disparate impact on groups within data. In this work we study the fine-grained causes of unfairness in DPSGD and identify gradient misalignment due to inequitable gradient clipping as the most significant source.
Recommended citation: Maria S. Esipova, Atiyeh Ashari Ghomi, Yaqiao Luo, and Jesse C. Cresswell. Disparate Impact in Differential Privacy from Gradient Misalignment. International Conference on Learning Representations 2023 https://openreview.net/forum?id=qLOaeRvteqbx
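The clipping step referred to in this abstract can be sketched as follows. This is a generic illustration of per-sample gradient clipping with Gaussian noise, as DPSGD is commonly described, and not the authors' implementation; the function names, the toy gradients, and the `noise_multiplier` parameter are assumptions.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is noise_multiplier * max_norm per coordinate.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


# Hypothetical toy batch: one sample with a large gradient, one with a small one.
grads = [[3.0, 0.0], [0.0, 0.5]]
unclipped_mean = [sum(g[i] for g in grads) / len(grads) for i in range(2)]
clipped_mean = dpsgd_gradient(grads, max_norm=1.0, noise_multiplier=0.0,
                              rng=random.Random(0))
print(unclipped_mean, clipped_mean)
```

With a clipping norm of 1, only the first sample is shrunk, so the clipped average [0.5, 0.25] points in a different direction than the unclipped average [1.5, 0.25]. Noise is switched off here only to make that change of direction visible.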
Published in International Conference on Learning Representations 2023, 2023
The manifold hypothesis states that data lies on an unknown manifold of low intrinsic dimension. We argue that this hypothesis does not properly capture the low-dimensional structure typically present in data, and we put forth the union of manifolds hypothesis, which accommodates the existence of non-constant intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, and show that classes with higher intrinsic dimensions are harder to classify.
Recommended citation: Bradley C.A. Brown, Anthony L. Caterini, Brendan Leigh Ross, Jesse C. Cresswell, and Gabriel Loaiza-Ganem. Verifying the Union of Manifolds Hypothesis for Image Data. International Conference on Learning Representations 2023. https://openreview.net/forum?id=Rvee9CAX4fi
Published in Nature Communications, 2023
We propose a communication-efficient scheme for decentralized federated learning called ProxyFL, or proxy-based federated learning. Each participant in ProxyFL maintains two models, a private model, and a publicly shared proxy model designed to protect the participant’s privacy.
Recommended citation: Shivam Kalra, Junfeng Wen, Jesse C. Cresswell, Maksims Volkovs, and Hamid R. Tizhoosh. Decentralized federated learning through proxy model sharing. Nature Communications 14, 2899, 2023. https://www.nature.com/articles/s41467-023-38569-4
Published in Advances in Neural Information Processing Systems, 2023
We study image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. We conduct the largest human experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations, and that diffusion models are unfairly punished by common metrics based on Inception. We show that DINOv2-ViT-L/14 is the best alternative to Inception.
Recommended citation: George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In Advances in Neural Information Processing Systems, volume 36, 2023 https://proceedings.neurips.cc/paper_files/paper/2023/hash/0bc795afae289ed465a65a3b4b1f4eb7-Abstract-Conference.html
Published in Transactions on Machine Learning Research, 2024
Natural data is often constrained to a low dimensional manifold. We propose to model the data manifold implicitly as the set of zeros of a neural network. To learn the data distribution on the manifold, we introduce the constrained energy-based model, which uses a constrained variant of Langevin dynamics to train and sample within the learned manifold. The resulting model can be manipulated with an arithmetic of manifolds which allows practitioners to take unions and intersections of model manifolds.
Recommended citation: Brendan Leigh Ross, Gabriel Loaiza-Ganem, Anthony L. Caterini, and Jesse C. Cresswell. Neural Implicit Manifold Learning for Topology-Aware Generative Modelling. TMLR 2024 https://openreview.net/forum?id=lTOku838Zv
Published in International Conference on Learning Representations 2024, 2024
Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines.
Recommended citation: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs. Self-supervised Representation Learning from Random Data Projectors. International Conference on Learning Representations 2024 https://openreview.net/forum?id=EpYnZpDpsQ
Published in Computer Vision and Pattern Recognition Conference 2024, 2024
The goal of multimodal alignment is to learn a single latent space that is shared between multimodal inputs. We surmise that existing unimodal encoders pre-trained on large amounts of unimodal data should provide an effective bootstrap to create multimodal models from unimodal ones at much lower costs. We therefore propose FuseMix, a multimodal augmentation scheme that operates on the latent spaces of arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal alignment, we achieve competitive performance in both image-text and audio-text retrieval, with orders of magnitude less compute and data: for example, we outperform CLIP on the Flickr30K text-to-image retrieval task with ∼600× fewer GPU days and ∼80× fewer image-text pairs.
Recommended citation: Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, and Maksims Volkovs. Data-Efficient Multimodal Fusion on a Single GPU. Computer Vision and Pattern Recognition Conference 2024 https://openaccess.thecvf.com/content/CVPR2024/html/Vouitsis_Data-Efficient_Multimodal_Fusion_on_a_Single_GPU_CVPR_2024_paper.html
Published in Transactions on Machine Learning Research, 2024
Differential privacy and randomized smoothing respectively provide certifiable guarantees against privacy and adversarial attacks on machine learning models; however, it is not well understood how implementing either defense impacts the other. We argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously, and provide a framework for integrating certified robustness through randomized smoothing into differentially private model training.
Recommended citation: Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, and Nicholas Papernot. Augment then Smooth: Reconciling Differential Privacy with Certified Robustness. TMLR 2024 https://openreview.net/forum?id=YN0IcnXqsr
Published in International Conference on Machine Learning 2024, 2024
In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee.
Recommended citation: Jesse C. Cresswell, Yi Sui, Bhargava Kumar, and Noël Vouitsis. Conformal Prediction Sets Improve Human Decision Making. International Conference on Machine Learning 2024 https://openreview.net/forum?id=4CO45y7Mlv
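For readers unfamiliar with conformal prediction sets, here is a minimal sketch of the standard split conformal procedure for classification. It is a textbook-style illustration rather than the experimental pipeline of the paper; the function names, the score `1 - p(true class)`, and the toy calibration data are assumptions.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold computed from a held-out calibration set.

    The nonconformity score is 1 - (probability assigned to the true class).
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile, expressed as a 1-indexed rank.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include every class
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration data: seven examples whose true class is index 0.
cal_probs = [[p, 1.0 - p] for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3)]
cal_labels = [0] * 7
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

print(prediction_set([0.9, 0.05, 0.05], threshold))  # confident prediction
print(prediction_set([0.5, 0.45, 0.05], threshold))  # uncertain prediction
```

The confident input yields a single-class set while the uncertain one yields two classes, which is the sense in which larger sets signal greater uncertainty while providing alternatives.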
Published in International Conference on Machine Learning 2024, 2024
Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM.
Recommended citation: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem. A Geometric Explanation of the Likelihood OOD Detection Paradox. International Conference on Machine Learning 2024 https://openreview.net/forum?id=EVMzCKLpdD
Published in ICML 2024 Workshop on AI for Science, 2024
Novel machine learning methods for tabular data generation are often developed on small datasets which do not match the scale required for scientific applications. We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models on tabular data, which proved to be extremely memory intensive, even on tiny datasets. In this work, we conduct a critical analysis of the existing implementation from an engineering perspective, and show that these limitations are not fundamental to the method; with better implementation it can be scaled to datasets 370x larger than previously used. We also propose algorithmic improvements that can further benefit resource usage and model performance, including multi-output trees which are well-suited to generative modeling. Finally, we present results on large-scale scientific datasets derived from experimental particle physics as part of the Fast Calorimeter Simulation Challenge.
Recommended citation: Jesse C. Cresswell, Taewoo Kim. Scaling Up Diffusion and Flow-based XGBoost Models. ICML 2024 Workshop on AI for Science https://arxiv.org/abs/2408.16046
Published in ICML 2024 Workshop on Foundation Models in the Wild, 2024
Large-scale vision models have become integral in many applications due to their unprecedented performance and versatility across downstream tasks. However, the robustness of these foundation models has primarily been explored for a single task, namely image classification. The vulnerability of other common vision tasks, such as semantic segmentation and depth estimation, remains largely unknown. We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks. Our attacks operate in the encoder embedding space and at the downstream task output level. In both cases, current state-of-the-art adversarial fine-tuning techniques tested only for classification significantly degrade clean and robust performance on other tasks. Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly. We discuss potential strategies for more robust foundation vision models across diverse downstream tasks.
Recommended citation: Antoni Kowalczuk, Jan Dubiński, Atiyeh Ashari Ghomi, Yi Sui, George Stein, Jiapeng Wu, Jesse C. Cresswell, Franziska Boenisch, Adam Dziedzic. Robust Self-Supervised Learning Across Diverse Downstream Tasks. ICML 2024 Workshop on Foundation Models in the Wild https://openreview.net/forum?id=U2nyqFbnRF
Published in Transactions on Machine Learning Research, 2024
In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, Jesse C. Cresswell. Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections. TMLR 2024 https://openreview.net/forum?id=a90WpmSi0I
Published in arXiv preprint, 2024
We present the results of the “Fast Calorimeter Simulation Challenge 2022” - the CaloChallenge. We study state-of-the-art generative models on four calorimeter shower datasets of increasing dimensionality, ranging from a few hundred voxels to a few tens of thousands of voxels. The 31 individual submissions span a wide range of current popular generative architectures, including Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Diffusion models, and models based on Conditional Flow Matching. We compare all submissions in terms of quality of generated calorimeter showers, as well as shower generation time and model size. To assess the quality we use a broad range of different metrics including differences in 1-dimensional histograms of observables, KPD/FPD scores, AUCs of binary classifiers, and the log-posterior of a multiclass classifier. The results of the CaloChallenge provide the most complete and comprehensive survey of cutting-edge approaches to calorimeter fast simulation to date. In addition, our work provides a uniquely detailed perspective on the important problem of how to evaluate generative models. As such, the results presented here should be applicable for other domains that use generative AI and require fast and faithful generation of samples in a large phase space.
Recommended citation: Claudius Krause et al. CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation. arXiv preprint 2410.21611 https://arxiv.org/abs/2410.21611
Published in Advances in Neural Information Processing Systems, 2024
High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum – i.e. the dimension of the submanifold it belongs to – is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models, i.e. diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide a LID estimator which addresses all the aforementioned deficiencies. Our estimator, called FLIPD, is compatible with all popular DMs, and outperforms existing baselines on LID estimation benchmarks. We also apply FLIPD on natural images where the true LID is unknown. Compared to competing estimators, FLIPD exhibits a higher correlation with non-LID measures of complexity, better matches a qualitative assessment of complexity, and is the only estimator to remain tractable with high-resolution images at the scale of Stable Diffusion.
Recommended citation: Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C. Cresswell, Gabriel Loaiza-Ganem. A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models. In Advances in Neural Information Processing Systems, volume 37, 2024 https://openreview.net/forum?id=nd8Q4a8aWl
Published in NeurIPS 2024 Workshop on Self-Supervised Learning - Theory and Practice, 2024
Meta-learning represents a strong class of approaches for solving few-shot learning tasks. Nonetheless, recent research suggests that simply pre-training a generic encoder can potentially surpass meta-learning algorithms. In this paper, we first discuss the reasons why meta-learning fails to stand out in these few-shot learning experiments, and hypothesize that it is due to the few-shot learning tasks lacking diversity. Furthermore, we propose DRESS, a task-agnostic Disentangled REpresentation-based Self-Supervised meta-learning approach that enables fast model adaptation on highly diversified few-shot learning tasks. Specifically, DRESS utilizes disentangled representation learning to create self-supervised tasks that can fuel the meta-training process. We validate the effectiveness of DRESS through experiments on few-shot classification tasks on datasets with multiple factors of variation. Through this paper, we advocate for a re-examination of proper setups for task adaptation studies, and aim to reignite interest in the potential of meta-learning for solving few-shot learning tasks via disentangled representations.
Recommended citation: Wei Cui, Yi Sui, Jesse C. Cresswell, Keyvan Golestan. DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks. NeurIPS 2024 Workshop on Self-Supervised Learning - Theory and Practice https://openreview.net/forum?id=AguQIV9CeN
Published in NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning, 2024
Although diffusion models can generate remarkably high-quality samples, they are intrinsically bottlenecked by their expensive iterative sampling procedure. Consistency models (CMs) have recently emerged as a promising diffusion model distillation method, reducing the cost of sampling by generating high fidelity samples in just a few iterations. Consistency model distillation aims to solve the probability flow ordinary differential equation (ODE) defined by an existing diffusion model. CMs are not directly trained to minimize error against an ODE solver, rather they use a more computationally tractable objective. As a way to study how effectively CMs solve the probability flow ODE, and the effect that any induced error has on the quality of generated samples, we introduce Direct CMs, which directly minimize this error. Intriguingly, we find that Direct CMs reduce the ODE solving error compared to CMs but also result in significantly worse sample quality, calling into question why exactly CMs work well in the first place. Full training and evaluation code will be made publicly available.
Recommended citation: Noël Vouitsis, Rasa Hosseinzadeh, Brendan Leigh Ross, Valentin Villecroze, Satya Krishna Gorti, Jesse C. Cresswell, Gabriel Loaiza-Ganem. Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples. NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning https://openreview.net/forum?id=2p4ES8QPUi
Published in arXiv preprint, 2025
Large-scale data processing is increasingly done using distributed computing frameworks like Apache Spark, which have a considerable number of configurable parameters that affect runtime performance. For optimal performance, these parameters must be tuned to the specific job being run. Tuning commonly requires multiple executions to collect runtime information for updating parameters. This is infeasible for ad hoc queries that are run once or infrequently. Zero-execution tuning, where parameters are automatically set before a job’s first run, can provide significant savings for all types of applications, but is more challenging since runtime information is not available. In this work, we propose a novel method for zero-execution tuning of Spark configurations based on retrieval. Our method achieves 93.3% of the runtime improvement of state-of-the-art one-execution optimization, entirely avoiding the slow initial execution using default settings. The shift to zero-execution tuning results in a lower cumulative runtime over the first 140 runs, and provides the largest benefit for ad hoc and analytical queries which only need to be executed once. We release the largest and most comprehensive suite of Spark query datasets, optimal configurations, and runtime information, which will promote future development of zero-execution tuning methods.
Recommended citation: Raunaq Suri, Ilan Gofman, Guangwei Yu, Jesse C. Cresswell. Zero-Execution Retrieval-Augmented Configuration Tuning of Spark Applications. arXiv preprint 2503.03826 https://arxiv.org/abs/2503.03826
Published in International Conference on Learning Representations 2025, 2025
Although conformal prediction is a promising method for quantifying the uncertainty of machine learning models, the prediction sets it outputs are not inherently actionable. Many applications require a single output to act on, not several. To overcome this, prediction sets can be provided to a human who then makes an informed decision. In any such system it is crucial to ensure the fairness of outcomes across protected groups, and researchers have proposed that Equalized Coverage be used as the standard for fairness. By conducting experiments with human participants, we demonstrate that providing prediction sets can increase the unfairness of their decisions. Disquietingly, we find that providing sets that satisfy Equalized Coverage actually increases unfairness compared to marginal coverage. Instead of equalizing coverage, we propose to equalize set sizes across groups which empirically leads to more fair outcomes.
Recommended citation: Jesse C. Cresswell, Bhargava Kumar, Yi Sui, and Mouloud Belbahri. Conformal Prediction Sets Can Cause Disparate Impact. ICLR 2025 https://openreview.net/forum?id=fZK6AQXlUU
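For readers unfamiliar with the quantities named in the abstract above, here is a minimal sketch of standard split conformal prediction with per-group coverage and mean set size. It is not the paper's experimental pipeline: the nonconformity score (one minus the predicted probability), the function names, and the tiny inputs are all illustrative assumptions.

```python
import math

def conformal_quantile(scores, alpha):
    """Conformal quantile of calibration scores at rank ceil((n + 1) * (1 - alpha))."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank in the sorted scores
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return sorted(scores)[rank - 1]

def prediction_set(probs, qhat):
    """Labels whose assumed score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

def coverage_and_size_by_group(prob_rows, labels, groups, qhat):
    """Empirical coverage and mean prediction-set size for each group."""
    stats = {}
    for probs, y, g in zip(prob_rows, labels, groups):
        pred = prediction_set(probs, qhat)
        hits, total_size, count = stats.get(g, (0, 0, 0))
        stats[g] = (hits + (y in pred), total_size + len(pred), count + 1)
    return {
        g: {"coverage": hits / count, "mean_size": total_size / count}
        for g, (hits, total_size, count) in stats.items()
    }
```

Comparing the `coverage` entries across groups corresponds to checking Equalized Coverage, while comparing `mean_size` corresponds to the set-size criterion the abstract proposes instead.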
Published in International Conference on Learning Representations 2025, 2025
As deep generative models have progressed, recent work has shown them to be capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. To better understand this phenomenon, we propose the manifold memorization hypothesis (MMH), a geometric framework which leverages the manifold hypothesis into a clear language in which to reason about memorization. We propose to analyze memorization in terms of the relationship between the dimensionalities of (i) the ground truth data manifold and (ii) the manifold learned by the model. This framework provides a formal standard for “how memorized” a datapoint is and systematically categorizes memorized data into two types: memorization driven by overfitting and memorization driven by the underlying data distribution. By analyzing prior work in the context of the MMH, we explain and unify assorted observations in the literature. We empirically validate the MMH using synthetic data and image datasets up to the scale of Stable Diffusion, developing new tools for detecting and preventing generation of memorized samples in the process.
Recommended citation: Brendan Leigh Ross, Hamidreza Kamkari, Zhaoyan Liu, Tongzi Wu, George Stein, Gabriel Loaiza-Ganem, Jesse C. Cresswell. A Geometric Framework for Understanding Memorization in Generative Models. ICLR 2025 https://openreview.net/forum?id=aZ1gNJu8wO
Published in ICLR 2025 Workshop on Bidirectional Human-AI Alignment, 2025
Trustworthy AI encompasses many aspirational aspects for aligning AI systems with human values, including fairness, privacy, robustness, explainability, and uncertainty quantification. However, efforts to enhance one aspect often introduce unintended trade-offs that negatively impact others, making it challenging to improve all aspects simultaneously. In this position paper, we review notable approaches to these five aspects and systematically consider every pair, detailing the negative interactions that can arise. For example, applying differential privacy to model training can amplify biases in the data, undermining fairness. Drawing on these findings, we take the position that addressing trustworthiness along each axis in isolation is insufficient. Instead, research on Trustworthy AI must account for intersectionality between aspects and adopt a holistic view across all relevant axes at once. To illustrate our perspective, we provide guidance on how researchers can work towards integrated trustworthiness, a case study on how intersectionality applies to the financial industry, and alternative views to our position.
Recommended citation: Jesse C. Cresswell. Trustworthy AI Must Account for Intersectionality. ICLR 2025 Workshop on Bidirectional Human-AI Alignment https://arxiv.org/abs/2504.07170
Published in 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025
Text-to-SQL generation enables non-experts to interact with databases via natural language. Recent advances rely on large closed-source models like GPT-4 that present challenges in accessibility, privacy, and latency. To address these issues, we focus on developing small, efficient, and open-source text-to-SQL models. We demonstrate the benefits of sampling multiple candidate SQL generations and propose our method, MSc-SQL, to critique them using associated metadata. Our sample critiquing model evaluates multiple outputs simultaneously, achieving state-of-the-art performance compared to other open-source models while remaining competitive with larger models at a much lower cost.
Recommended citation: Satya Krishna Gorti, Ilan Gofman, Zhaoyan Liu, Jiapeng Wu, Noël Vouitsis, Guangwei Yu, Jesse C. Cresswell, Rasa Hosseinzadeh. MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation. NAACL 2025 https://aclanthology.org/2025.naacl-long.107/
Published in arXiv preprint, 2025
Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, which limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model’s textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.
Recommended citation: Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell. Textual Bayes: Quantifying Uncertainty in LLM-Based Systems. arXiv preprint: 2506.10060 https://arxiv.org/abs/2506.10060
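As background for the abstract above, the following is a toy sketch of the standard Metropolis-Hastings accept/reject step applied to a discrete set of text "parameters". It is not MHLP: the candidate prompts, their posterior weights, and the uniform proposal are all hypothetical stand-ins, and no LLM is involved.

```python
import math
import random

# Hypothetical candidate prompts with made-up posterior weights (illustrative only).
POSTERIOR = {"prompt A": 0.6, "prompt B": 0.3, "prompt C": 0.1}

def log_target(prompt):
    """Log of the (assumed) posterior weight of a prompt."""
    return math.log(POSTERIOR[prompt])

def propose(prompt, rng):
    """Stand-in proposal: a uniform draw over the other candidates."""
    return rng.choice([p for p in POSTERIOR if p != prompt])

def log_q(src, dst):
    """Log-probability of proposing dst from src under the uniform proposal."""
    return -math.log(len(POSTERIOR) - 1)

def metropolis_hastings(init, n_steps, rng):
    """Run a Metropolis-Hastings chain and return every visited state."""
    state, chain = init, [init]
    for _ in range(n_steps):
        cand = propose(state, rng)
        # Log acceptance ratio: target ratio times the reverse/forward proposal ratio.
        log_ratio = (log_target(cand) - log_target(state)
                     + log_q(cand, state) - log_q(state, cand))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            state = cand
        chain.append(state)
    return chain

chain = metropolis_hastings("prompt C", 20000, random.Random(0))
freq = {p: chain.count(p) / len(chain) for p in POSTERIOR}
print(freq)  # visit frequencies should roughly track the assumed posterior weights
```

The proposal-ratio term cancels here because the uniform proposal is symmetric; it is kept in the code because an asymmetric proposal would need it.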
Published in SIGIR Conference on Research and Development in Information Retrieval 2025, 2025
Existing research on Retrieval-Augmented Generation (RAG) primarily focuses on improving overall question-answering accuracy, often overlooking the quality of sub-claims within generated responses. Recent methods that attempt to improve RAG trustworthiness, such as auto-evaluation metrics, often lack probabilistic guarantees or require ground truth answers, failing to provide reliable and scalable assessment. To address these limitations, we propose Conformal-RAG, a novel framework inspired by recent applications of conformal prediction in large language models (LLMs). Conformal-RAG leverages conformal prediction and internal information from the RAG mechanism to offer statistical guarantees on response quality. It ensures conditional coverage (potentially spanning multiple sub-domains) without requiring manual calibration of conformal sets, making it suitable for complex RAG applications. Compared to existing RAG auto-evaluation methods, Conformal-RAG offers statistical guarantees on the quality of refined sub-claims, ensuring response reliability without needing ground truth answers. Additionally, our experiments demonstrate that by leveraging RAG internal information, Conformal-RAG retains more high-quality sub-claims from the response while maintaining the same reliability guarantee as naïve adaptations of conformal prediction in LLMs. Specifically, Conformal-RAG retains 60% more high-quality sub-claims in biography generation tasks and 20% more in medication question-answering tasks.
Recommended citation: Naihe Feng, Yi Sui, Shiyi Hou, Jesse C. Cresswell, Ga Wu. Response Quality Assessment for Retrieval-Augmented Generation via Conditional Conformal Factuality. SIGIR 2025 https://dl.acm.org/doi/10.1145/3726302.3730244
Published in NeurIPS 2025 Workshop on Regulatable ML, 2025
Large language models (LLMs) have shown promising performance on tasks that require reasoning, such as text-to-SQL, code generation, and debugging. However, regulatory frameworks with strict privacy requirements constrain their integration into sensitive systems. State-of-the-art LLMs are also proprietary, costly, and resource-intensive, making local deployment impractical. Consequently, utilizing such LLMs often requires sharing data with third-party providers, raising privacy concerns and risking noncompliance with regulations. Although fine-tuned small language models (SLMs) can outperform LLMs on certain tasks and be deployed locally to mitigate privacy concerns, they underperform on more complex tasks such as text-to-SQL translation. In this work, we introduce MaskSQL, a text-to-SQL framework that utilizes abstraction as a privacy protection mechanism to mask sensitive information in LLM prompts. Unlike redaction, which removes content entirely, or generalization, which broadens tokens, abstraction retains essential information while discarding unnecessary details, striking an effective privacy-utility balance for the text-to-SQL task. Moreover, by providing mechanisms to control the privacy-utility tradeoff, MaskSQL facilitates adoption across a broader range of use cases. Our experimental results show that MaskSQL outperforms leading SLM-based text-to-SQL models and achieves performance approaching state-of-the-art LLM-based models, while preserving privacy.
Recommended citation: Sepideh Abedini, Shubhankar Mohapatra, D. B. Emerson, Masoumeh Shafieinejad, Jesse C. Cresswell, Xi He. MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction. arXiv preprint: 2509.23459 https://arxiv.org/abs/2509.23459
Published in arXiv preprint, 2025
Self-supervised representation learning (SSRL) has demonstrated remarkable empirical success, yet its underlying principles remain insufficiently understood. While recent works attempt to unify SSRL methods by examining their information-theoretic objectives or summarizing their heuristics for preventing representation collapse, architectural elements like the predictor network, stop-gradient operation, and statistical regularizer are often viewed as empirically motivated additions. In this paper, we adopt a first-principles approach and investigate whether the learning objective of an SSRL algorithm dictates its possible optimization strategies and model design choices. In particular, by starting from a variational mutual information (MI) lower bound, we derive two training paradigms, namely Self-Distillation MI (SDMI) and Joint MI (JMI), each imposing distinct structural constraints and covering a set of existing SSRL algorithms. SDMI inherently requires alternating optimization, making stop-gradient operations theoretically essential. In contrast, JMI admits joint optimization through symmetric architectures without such components. Under the proposed formulation, predictor networks in SDMI and statistical regularizers in JMI emerge as tractable surrogates for the MI objective. We show that many existing SSRL methods are specific instances or approximations of these two paradigms. This paper provides a theoretical explanation behind the choices of different architectural components of existing SSRL methods, beyond heuristic conveniences.
Recommended citation: Akhlaqur Rahman Sabby, Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu. Self-Supervised Representation Learning as Mutual Information Maximization. arXiv preprint: 2510.01345 https://arxiv.org/abs/2510.01345
Published in arXiv preprint, 2025
Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-world RAG systems, there are many potential causes for erroneous outputs. Understanding the range of errors that can occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can occur in realistic RAG systems, examples of each, and practical advice for addressing them. Additionally, we curate a dataset of erroneous RAG responses annotated by error types. We then propose an auto-evaluation method aligned with our taxonomy that can be used in practice to track and address errors during development.
Recommended citation: Kin Kwan Leung, Mouloud Belbahri, Yi Sui, Alex Labach, Xueying Zhang, Stephen Rose, Jesse C. Cresswell. Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems. arXiv preprint: 2510.13975 https://arxiv.org/abs/2510.13975
Published in Advances in Neural Information Processing Systems, 2025
Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out-of-the-box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model does not require any further training or tuning and takes a step toward automated causal inference.
Recommended citation: Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, Rahul G. Krishnan. CausalPFN: Amortized Causal Effect Estimation via In-Context Learning. In Advances in Neural Information Processing Systems, volume 38, 2025 https://arxiv.org/abs/2506.07918
Published in Advances in Neural Information Processing Systems 2025, 2025
Automatic summarization systems have advanced rapidly with large language models (LLMs), yet they still lack reliable guarantees on inclusion of critical content in high-stakes domains like healthcare, law, and finance. In this work, we introduce Conformal Importance Summarization, the first framework for importance-preserving summary generation which uses conformal prediction to provide rigorous, distribution-free coverage guarantees. By calibrating thresholds on sentence-level importance scores, we enable extractive document summarization with user-specified coverage and recall rates over critical content. Our method is model-agnostic, requires only a small calibration set, and seamlessly integrates with existing black-box LLMs. Experiments on established summarization benchmarks demonstrate that Conformal Importance Summarization achieves the theoretically assured information coverage rate. Our work suggests that Conformal Importance Summarization can be combined with existing techniques to achieve reliable, controllable automatic summarization, paving the way for safer deployment of AI summarization tools in critical applications. Code is provided as supplementary material and will be released upon publication.
Recommended citation: Bruce Kuwahara, Chen-Yuan Lin, Xiao Shi Huang, Kin Kwan Leung, Jullian Arta Yapeter, Ilya Stanevich, Felipe Perez, Jesse C. Cresswell. Document Summarization with Conformal Importance Guarantees. In Advances in Neural Information Processing Systems, volume 38, 2025 https://arxiv.org/abs/2509.20461
Published in Advances in Neural Information Processing Systems, 2025
Tabular data is one of the most ubiquitous sources of information worldwide, spanning a wide variety of domains. This inherent heterogeneity has slowed the development of Tabular Foundation Models (TFMs) capable of fast generalization to unseen datasets. In-Context Learning (ICL) has recently emerged as a promising solution for TFMs, enabling dynamic adaptation to new tasks without additional tuning. While many studies have attempted to re-purpose large language models for tabular ICL, they have had limited success, so recent works have focused on developing tabular-specific foundation models. In this work, we propose an approach to combine ICL-based retrieval with self-supervised learning to train tabular foundation models. We also investigate the utility of real vs. synthetic data for model pre-training, and show that real data can contain useful signal not easily captured in synthetic training. Specifically, we show that incorporating real data during the pre-training phase can lead to significantly faster training and better downstream generalization to unseen data. Our resulting model, TabDPT, achieves top performance on both regression (CTR23) and classification (CC18) benchmarks. Importantly, we also demonstrate that with our pre-training procedure, scaling both model and data size leads to consistent performance improvements that follow power laws. This echoes scaling laws in LLMs and other foundation models, and suggests that Internet-scale TFMs can be achievable.
Recommended citation: Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, Maksims Volkovs. TabDPT: An Open Tabular Foundation Model. In Advances in Neural Information Processing Systems, volume 38, 2025 https://arxiv.org/abs/2410.18164
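To illustrate what "follow power laws" means in the abstract above, here is a minimal sketch that fits y = a * x^b by ordinary least squares in log-log space. The function name and the synthetic data points are assumptions made for illustration; no numbers here come from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear regression on (log x, log y); returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # The slope of the log-log line is the exponent b.
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
         / sum((u - mean_x) ** 2 for u in log_x))
    # The intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic example: a loss that decays as 2 * size**(-0.5).
sizes = [1, 10, 100, 1000]
losses = [2 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, losses)
print(a, b)
```

On a log-log plot such data falls on a straight line, which is the usual visual check for power-law scaling.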
Published:
Due to the sensitive nature of medical data, hospitals are unable to merge their datasets to develop models. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.