Published in Physical Review D, 2015
In this paper we use Dirac’s method to impose the Lorenz gauge condition in a general four-dimensional conformally flat spacetime and find that there is no particle production. We show that in cosmological spacetimes with dimension D≠4 there will be particle production when the scale factor changes, and we calculate the particle production due to a sudden change.
Recommended citation: Jesse C. Cresswell and Dan N. Vollick. Lorenz gauge quantization in conformally flat spacetimes. Phys. Rev. D 91, 084008, 2015 https://journals.aps.org/prd/abstract/10.1103/PhysRevD.91.084008
Published in Journal of High Energy Physics, 2017
Kinematic space can be used as an intermediate step in the AdS/CFT dictionary and lends itself naturally to the description of diffeomorphism invariant quantities. In this work conical defect spacetimes are considered and a duality is established between partial OPE blocks and bulk fields integrated over individual geodesics, minimal or non-minimal.
Recommended citation: Jesse C. Cresswell, Amanda W. Peet. Kinematic space for conical defects. JHEP 11 (2017) 155 https://link.springer.com/article/10.1007/JHEP11(2017)155
Published in Physical Review A, 2018
The growth of entanglement in an initially separable state, as measured by the purity of subsystems, can be characterized by a timescale that takes a universal form for any Hamiltonian. We show that the same timescale governs the growth of entanglement for all Rényi entropies. Since the family of Rényi entropies completely characterizes the entanglement of a pure bipartite state, our timescale is a universal feature of bipartite entanglement, depending only on the interaction Hamiltonian and the initial state.
Recommended citation: Jesse C. Cresswell. Universal entanglement timescale for Rényi entropies. Phys. Rev. A 97 022317, 2018 https://journals.aps.org/pra/abstract/10.1103/PhysRevA.97.022317
Published in Physical Review A, 2019
Negativity is an entanglement monotone frequently used to quantify entanglement in bipartite states. We develop techniques in the calculus of complex, patterned matrices and use them to conduct a perturbative analysis of negativity in terms of arbitrary variations of the density operator. Our methods are well suited to study the growth and decay of entanglement in a wide range of physical systems, including the generic linear growth of entanglement in many-body systems, and have broad relevance to many functions of quantum states and observables.
Recommended citation: Jesse C. Cresswell, Ilan Tzitrin, and Aaron Z. Goldberg. Perturbative expansion of entanglement negativity using patterned matrix calculus. Phys. Rev. A 99 012322, 2019 https://journals.aps.org/pra/abstract/10.1103/PhysRevA.99.012322
Published in Journal of High Energy Physics, 2019
We study the holographic duality between boundary OPE blocks and geodesic integrated bulk fields in quotients of AdS3 dual to excited CFT states. The quotient geometries exhibit non-minimal geodesics between pairs of spacelike separated boundary points which modify the OPE block duality. We decompose OPE blocks into quotient invariant operators and propose a duality with bulk fields integrated over individual geodesics, minimal or non-minimal.
Recommended citation: Jesse C. Cresswell, Ian T. Jardine, and Amanda W. Peet. Holographic relations for OPE blocks in excited states. JHEP 03 (2019) 058 https://link.springer.com/article/10.1007/JHEP03(2019)058
Published in University of Toronto Doctoral Thesis, 2019
In this thesis we apply techniques from quantum information theory to study quantum gravity within the framework of the anti-de Sitter / conformal field theory correspondence (AdS/CFT). We present refinements of a duality between operator product expansion (OPE) blocks in the CFT, and geodesic integrated fields in AdS. Working with excited states within AdS3/CFT2, we show how the OPE block decomposes into more fine-grained CFT observables that are dual to AdS fields integrated over non-minimal geodesics. Additionally, this thesis contains results on the dynamics of entanglement measures for general quantum systems. Results are presented for the family of quantum Rényi entropies and entanglement negativity.
Recommended citation: Jesse C. Cresswell, Quantum Information Approaches to Quantum Gravity. University of Toronto Doctoral Thesis https://tspace.library.utoronto.ca/handle/1807/97354
Published in Journal of Physics A: Mathematical and Theoretical, 2020
There exist quantum entangled states for which all local actions on one subsystem can be equivalently realized by actions on another, thereby possessing operational symmetry. We characterize the states for which this fundamental property of entanglement does and does not hold, including multipartite and mixed states, and draw connections to quantum steering, envariance, the Reeh–Schlieder theorem, and classical entanglement.
Recommended citation: Ilan Tzitrin, Aaron Z. Goldberg, and Jesse C. Cresswell, Operational symmetries of entangled states. J. Phys. A: Math. Theor. 53 095304, 2020 https://iopscience.iop.org/article/10.1088/1751-8121/ab6fc9
Published in International Conference on Learning Representations 2021, 2021
We introduce cumulative accessibility functions, which measure the reachability of a goal from a given state within a specified horizon. We show that optimal cumulative accessibility functions are monotonic and can trade off speed and reliability in goal-reaching by suggesting multiple paths to a single goal depending on the provided horizon. We show that our method outperforms state-of-the-art goal-reaching algorithms in success rate, sample complexity, and path optimality.
Recommended citation: Panteha Naderian, Gabriel Loaiza-Ganem, Harry J. Braviner, Anthony L. Caterini, Jesse C. Cresswell, Tong Li, Animesh Garg. C-Learning: Horizon-Aware Cumulative Accessibility Estimation. International Conference on Learning Representations https://openreview.net/forum?id=W3Wf_wKmqm9
Published in Advances in Neural Information Processing Systems, 2021
Generative modelling allows us to learn patterns in data and generate novel examples that are similar to real ones. Normalizing flows are one technique in machine learning for accomplishing this; however, they cannot directly model the space where realistic data lives. We show that composing a normalizing flow with a conformal embedding can model the data space, and demonstrate the effectiveness of this approach on real-world data sets.
Recommended citation: Brendan Leigh Ross and Jesse C. Cresswell. Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows. In Advances in Neural Information Processing Systems, volume 34, 2021 https://proceedings.neurips.cc/paper/2021/hash/dfd786998e082758be12670d856df755-Abstract.html
Published in Nature Scientific Reports, 2022
We conduct a case study of applying a differentially private federated learning framework for the analysis of histopathology images, the largest and perhaps most complex medical images. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
Recommended citation: Mohammed Adnan, Shivam Kalra, Jesse C. Cresswell, Graham W. Taylor, and Hamid R. Tizhoosh. Federated Learning and Differential Privacy for Medical Image Analysis. Nature Scientific Reports, 12, 1953, 2022 https://www.nature.com/articles/s41598-022-05539-7
Published in Transactions on Machine Learning Research, 2022
The manifold hypothesis states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. We investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Jesse C. Cresswell, and Anthony L. Caterini. Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. TMLR 2022 https://openreview.net/forum?id=0nEZCVshxS
Published in NeurIPS 2022 Workshop on Federated Learning: Recent Advances and New Challenges, 2022
In the traditional federated learning setting, a central server coordinates a network of clients to train one global model, but may serve many clients poorly due to data heterogeneity. We present a decentralized framework, FedeRiCo, where each client can learn as much or as little from other clients as is optimal for its local data distribution. Based on expectation-maximization, FedeRiCo estimates the utilities of other participants’ models on each client’s data so that everyone can select the right collaborators for learning.
Recommended citation: Yi Sui, Junfeng Wen, Yenson Lau, Brendan Leigh Ross, and Jesse C. Cresswell. Find Your Friends: Personalized Federated Learning with the Right Collaborators. NeurIPS 2022 Workshop on Federated Learning: Recent Advances and New Challenges https://arxiv.org/abs/2210.06597
Published in NeurIPS 2022 Workshop on Machine Learning and the Physical Sciences, 2022
Precision measurements and new physics searches at the Large Hadron Collider require efficient simulations of particle propagation and interactions within the detectors including calorimeter showers. However, the high-dimensional representation of showers belies the relative simplicity and structure of the underlying physical laws. We propose modelling calorimeter showers first by learning their manifold structure, and then estimating the density of data across this manifold. Learning manifold structure reduces the dimensionality of the data, which enables fast training and generation when compared with competing methods.
Recommended citation: Jesse C. Cresswell, Brendan Leigh Ross, Gabriel Loaiza-Ganem, Humberto Reyes-Gonzalez, Marco Letizia, and Anthony L. Caterini. CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds. NeurIPS 2022 Workshop on Machine Learning and the Physical Sciences. https://arxiv.org/abs/2211.15380
Published in NeurIPS 2022 Workshop on Understanding Deep Learning Through Empirical Falsification, 2022
Likelihood-based deep generative models exhibit pathological behaviour as a consequence of using high-dimensional densities to model data with low-dimensional structure. In this paper we propose two methodologies to remove the dimensionality mismatch during training. Our first approach is based on Tweedie’s formula, and the second on models which take the variance of added noise as a conditional input. We show that surprisingly, while well motivated, these approaches only sporadically improve performance over not adding noise, and that other methods of addressing the dimensionality mismatch are more empirically adequate.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Luhuan Wu, John P. Cunningham, Jesse C. Cresswell, and Anthony L. Caterini. Denoising Deep Generative Models. NeurIPS 2022 Workshop on Understanding Deep Learning Through Empirical Falsification. https://arxiv.org/abs/2212.01265
Published in International Conference on Learning Representations 2023, 2023
One of the most widely used techniques for private model training, differentially private stochastic gradient descent (DPSGD), frequently intensifies disparate impact on groups within data. In this work we study the fine-grained causes of unfairness in DPSGD and identify gradient misalignment due to inequitable gradient clipping as the most significant source.
Recommended citation: Maria S. Esipova, Atiyeh Ashari Ghomi, Yaqiao Luo, and Jesse C. Cresswell. Disparate Impact in Differential Privacy from Gradient Misalignment. International Conference on Learning Representations 2023 https://openreview.net/forum?id=qLOaeRvteqbx
Published in International Conference on Learning Representations 2023, 2023
The manifold hypothesis states that data lies on an unknown manifold of low intrinsic dimension. We argue that this hypothesis does not properly capture the low-dimensional structure typically present in data, and we put forth the union of manifolds hypothesis, which accommodates the existence of non-constant intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, and show that classes with higher intrinsic dimensions are harder to classify.
Recommended citation: Bradley C.A. Brown, Anthony L. Caterini, Brendan Leigh Ross, Jesse C. Cresswell, and Gabriel Loaiza-Ganem. Verifying the Union of Manifolds Hypothesis for Image Data. International Conference on Learning Representations 2023. https://openreview.net/forum?id=Rvee9CAX4fi
Published in Nature Communications, 2023
We propose a communication-efficient scheme for decentralized federated learning called ProxyFL, or proxy-based federated learning. Each participant in ProxyFL maintains two models, a private model, and a publicly shared proxy model designed to protect the participant’s privacy.
Recommended citation: Shivam Kalra, Junfeng Wen, Jesse C. Cresswell, Maksims Volkovs, and Hamid R. Tizhoosh. Decentralized federated learning through proxy model sharing. Nature Communications 14, 2899, 2023. https://www.nature.com/articles/s41467-023-38569-4
Published in Advances in Neural Information Processing Systems, 2023
We study image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. We conduct the largest human experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations, and that diffusion models are unfairly punished by common metrics based on Inception. We show that DINOv2-ViT-L/14 is the best alternative to Inception.
Recommended citation: George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem. Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models. In Advances in Neural Information Processing Systems, volume 36, 2023 https://proceedings.neurips.cc/paper_files/paper/2023/hash/0bc795afae289ed465a65a3b4b1f4eb7-Abstract-Conference.html
Published in Transactions on Machine Learning Research, 2024
Natural data is often constrained to a low dimensional manifold. We propose to model the data manifold implicitly as the set of zeros of a neural network. To learn the data distribution on the manifold, we introduce the constrained energy-based model, which uses a constrained variant of Langevin dynamics to train and sample within the learned manifold. The resulting model can be manipulated with an arithmetic of manifolds which allows practitioners to take unions and intersections of model manifolds.
Recommended citation: Brendan Leigh Ross, Gabriel Loaiza-Ganem, Anthony L. Caterini, and Jesse C. Cresswell. Neural Implicit Manifold Learning for Topology-Aware Generative Modelling. TMLR 2024 https://openreview.net/forum?id=lTOku838Zv
Published in International Conference on Learning Representations 2024, 2024
Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines.
Recommended citation: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs. Self-supervised Representation Learning from Random Data Projectors. International Conference on Learning Representations 2024 https://openreview.net/forum?id=EpYnZpDpsQ
Published in Computer Vision and Pattern Recognition Conference 2024, 2024
The goal of multimodal alignment is to learn a single latent space that is shared between multimodal inputs. We surmise that existing unimodal encoders pre-trained on large amounts of unimodal data should provide an effective bootstrap to create multimodal models from unimodal ones at much lower costs. We therefore propose FuseMix, a multimodal augmentation scheme that operates on the latent spaces of arbitrary pre-trained unimodal encoders. Using FuseMix for multimodal alignment, we achieve competitive performance in both image-text and audio-text retrieval, with orders of magnitude less compute and data: for example, we outperform CLIP on the Flickr30K text-to-image retrieval task with ∼600× fewer GPU days and ∼80× fewer image-text pairs.
Recommended citation: Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti, Valentin Villecroze, Jesse C. Cresswell, Guangwei Yu, Gabriel Loaiza-Ganem, and Maksims Volkovs. Data-Efficient Multimodal Fusion on a Single GPU. Computer Vision and Pattern Recognition Conference 2024 https://openaccess.thecvf.com/content/CVPR2024/html/Vouitsis_Data-Efficient_Multimodal_Fusion_on_a_Single_GPU_CVPR_2024_paper.html
Published in Transactions on Machine Learning Research, 2024
Differential privacy and randomized smoothing respectively provide certifiable guarantees against privacy and adversarial attacks on machine learning models; however, it is not well understood how implementing either defense impacts the other. We argue that it is possible to achieve both privacy guarantees and certified robustness simultaneously, and provide a framework for integrating certified robustness through randomized smoothing into differentially private model training.
Recommended citation: Jiapeng Wu, Atiyeh Ashari Ghomi, David Glukhov, Jesse C. Cresswell, Franziska Boenisch, and Nicholas Papernot. Augment then Smooth: Reconciling Differential Privacy with Certified Robustness. TMLR 2024 https://openreview.net/forum?id=YN0IcnXqsr
Published in International Conference on Machine Learning 2024, 2024
In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee.
Recommended citation: Jesse C. Cresswell, Yi Sui, Bhargava Kumar, and Noël Vouitsis. Conformal Prediction Sets Improve Human Decision Making. International Conference on Machine Learning 2024 https://openreview.net/forum?id=4CO45y7Mlv
Published in International Conference on Machine Learning 2024, 2024
Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM.
Recommended citation: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem. A Geometric Explanation of the Likelihood OOD Detection Paradox. International Conference on Machine Learning 2024 https://openreview.net/forum?id=EVMzCKLpdD
Published in Transactions on Machine Learning Research, 2024
In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way.
Recommended citation: Gabriel Loaiza-Ganem, Brendan Leigh Ross, Rasa Hosseinzadeh, Anthony L. Caterini, Jesse C. Cresswell. Deep Generative Models through the Lens of the Manifold Hypothesis: A Survey and New Connections. TMLR 2024 https://openreview.net/forum?id=a90WpmSi0I
Published in ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling, 2024
As deep generative models have progressed, recent work has shown them to be capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. To better understand this phenomenon, we propose the manifold memorization hypothesis (MMH), a geometric framework which leverages the manifold hypothesis into a clear language in which to reason about memorization. We propose to analyze memorization in terms of the relationship between the dimensionalities of (i) the ground truth data manifold and (ii) the manifold learned by the model. This framework provides a formal standard for “how memorized” a datapoint is and systematically categorizes memorized data into two types: memorization driven by overfitting and memorization driven by the underlying data distribution. By analyzing prior work in the context of the MMH, we explain and unify assorted observations in the literature. We empirically validate the MMH using synthetic data and image datasets up to the scale of Stable Diffusion, developing new tools for detecting and preventing generation of memorized samples in the process.
Recommended citation: Brendan Leigh Ross, Hamidreza Kamkari, Zhaoyan Liu, Tongzi Wu, George Stein, Gabriel Loaiza-Ganem, Jesse C. Cresswell. A Geometric Framework for Understanding Memorization in Generative Models. ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling https://openreview.net/forum?id=sGHeIefdvL
Published in ICML 2024 Workshop on AI for Science, 2024
Novel machine learning methods for tabular data generation are often developed on small datasets which do not match the scale required for scientific applications. We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models on tabular data, which proved to be extremely memory intensive, even on tiny datasets. In this work, we conduct a critical analysis of the existing implementation from an engineering perspective, and show that these limitations are not fundamental to the method; with better implementation it can be scaled to datasets 370x larger than previously used. We also propose algorithmic improvements that can further benefit resource usage and model performance, including multi-output trees which are well-suited to generative modeling. Finally, we present results on large-scale scientific datasets derived from experimental particle physics as part of the Fast Calorimeter Simulation Challenge.
Recommended citation: Jesse C. Cresswell, Taewoo Kim. Scaling Up Diffusion and Flow-based XGBoost Models. ICML 2024 Workshop on AI for Science https://arxiv.org/abs/2408.16046
Published in ICML 2024 Workshop on Foundation Models in the Wild, 2024
Large-scale vision models have become integral in many applications due to their unprecedented performance and versatility across downstream tasks. However, the robustness of these foundation models has primarily been explored for a single task, namely image classification. The vulnerability of other common vision tasks, such as semantic segmentation and depth estimation, remains largely unknown. We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks. Our attacks operate in the encoder embedding space and at the downstream task output level. In both cases, current state-of-the-art adversarial fine-tuning techniques tested only for classification significantly degrade clean and robust performance on other tasks. Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly. We discuss potential strategies for more robust foundation vision models across diverse downstream tasks.
Recommended citation: Antoni Kowalczuk, Jan Dubiński, Atiyeh Ashari Ghomi, Yi Sui, George Stein, Jiapeng Wu, Jesse C. Cresswell, Franziska Boenisch, Adam Dziedzic. Robust Self-Supervised Learning Across Diverse Downstream Tasks. ICML 2024 Workshop on Foundation Models in the Wild https://openreview.net/forum?id=U2nyqFbnRF
Published in arXiv preprint, 2024
Although conformal prediction is a promising method for quantifying the uncertainty of machine learning models, the prediction sets it outputs are not inherently actionable. Many applications require a single output to act on, not several. To overcome this, prediction sets can be provided to a human who then makes an informed decision. In any such system it is crucial to ensure the fairness of outcomes across protected groups, and researchers have proposed that Equalized Coverage be used as the standard for fairness. By conducting experiments with human participants, we demonstrate that providing prediction sets can increase the unfairness of their decisions. Disquietingly, we find that providing sets that satisfy Equalized Coverage actually increases unfairness compared to marginal coverage. Instead of equalizing coverage, we propose to equalize set sizes across groups which empirically leads to more fair outcomes.
Recommended citation: Jesse C. Cresswell, Bhargava Kumar, Yi Sui, and Mouloud Belbahri. Conformal Prediction Sets Can Cause Disparate Impact. arXiv preprint 2410.01888 https://arxiv.org/abs/2410.01888
Published in arXiv preprint, 2024
The challenges faced by neural networks on tabular data are well-documented and have hampered the progress of tabular foundation models. Techniques leveraging in-context learning (ICL) have shown promise here, allowing for dynamic adaptation to unseen data. ICL can provide predictions for entirely new datasets without further training or hyperparameter tuning, therefore providing very fast inference when encountering a novel task. However, scaling ICL for tabular data remains an issue: approaches based on large language models cannot efficiently process numeric tables, and tabular-specific techniques have not been able to effectively harness the power of real data to improve performance and generalization. We are able to overcome these challenges by training tabular-specific ICL-based architectures on real data with self-supervised learning and retrieval, combining the best of both worlds. Our resulting model – the Tabular Discriminative Pre-trained Transformer (TabDPT) – achieves state-of-the-art performance on the CC18 (classification) and CTR23 (regression) benchmarks with no task-specific fine-tuning, demonstrating the adaptability and speed of ICL once the model is pre-trained. TabDPT also demonstrates strong scaling as both model size and amount of available data increase, pointing towards future improvements simply through the curation of larger tabular pre-training datasets and training larger models.
Recommended citation: Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony L. Caterini. TabDPT: Scaling Tabular Foundation Models. arXiv preprint 2410.18164 https://arxiv.org/abs/2410.18164
Published in arXiv preprint, 2024
We present the results of the “Fast Calorimeter Simulation Challenge 2022” - the CaloChallenge. We study state-of-the-art generative models on four calorimeter shower datasets of increasing dimensionality, ranging from a few hundred voxels to a few tens of thousands of voxels. The 31 individual submissions span a wide range of current popular generative architectures, including Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Diffusion models, and models based on Conditional Flow Matching. We compare all submissions in terms of quality of generated calorimeter showers, as well as shower generation time and model size. To assess the quality we use a broad range of different metrics including differences in 1-dimensional histograms of observables, KPD/FPD scores, AUCs of binary classifiers, and the log-posterior of a multiclass classifier. The results of the CaloChallenge provide the most complete and comprehensive survey of cutting-edge approaches to calorimeter fast simulation to date. In addition, our work provides a uniquely detailed perspective on the important problem of how to evaluate generative models. As such, the results presented here should be applicable for other domains that use generative AI and require fast and faithful generation of samples in a large phase space.
Recommended citation: Claudius Krause et al. CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation. arXiv preprint 2410.21611 https://arxiv.org/abs/2410.21611
Published in Advances in Neural Information Processing Systems, 2024
High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum – i.e. the dimension of the submanifold it belongs to – is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models, i.e. diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide a LID estimator which addresses all the aforementioned deficiencies. Our estimator, called FLIPD, is compatible with all popular DMs, and outperforms existing baselines on LID estimation benchmarks. We also apply FLIPD on natural images where the true LID is unknown. Compared to competing estimators, FLIPD exhibits a higher correlation with non-LID measures of complexity, better matches a qualitative assessment of complexity, and is the only estimator to remain tractable with high-resolution images at the scale of Stable Diffusion.
Recommended citation: Hamidreza Kamkari, Brendan Leigh Ross, Rasa Hosseinzadeh, Jesse C. Cresswell, Gabriel Loaiza-Ganem. A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models. In Advances in Neural Information Processing Systems, volume 37, 2024 https://openreview.net/forum?id=nd8Q4a8aWl
Published in NeurIPS 2024 Workshop on Self-Supervised Learning - Theory and Practice, 2024
Meta-learning represents a strong class of approaches for solving few-shot learning tasks. Nonetheless, recent research suggests that simply pre-training a generic encoder can potentially surpass meta-learning algorithms. In this paper, we first discuss the reasons why meta-learning fails to stand out in these few-shot learning experiments, and hypothesize that it is due to the few-shot learning tasks lacking diversity. Furthermore, we propose DRESS, a task-agnostic Disentangled REpresentation-based Self-Supervised meta-learning approach that enables fast model adaptation on highly diversified few-shot learning tasks. Specifically, DRESS utilizes disentangled representation learning to create self-supervised tasks that can fuel the meta-training process. We validate the effectiveness of DRESS through experiments on few-shot classification tasks on datasets with multiple factors of variation. Through this paper, we advocate for a re-examination of proper setups for task adaptation studies, and aim to reignite interest in the potential of meta-learning for solving few-shot learning tasks via disentangled representations.
Recommended citation: Wei Cui, Yi Sui, Jesse C. Cresswell, Keyvan Golestan. DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks. NeurIPS 2024 Workshop on Self-Supervised Learning - Theory and Practice https://openreview.net/forum?id=AguQIV9CeN
Published in NeurIPS 2024 Workshop on Table Representation Learning, 2024
Text-to-SQL generation enables non-experts to interact with databases via natural language. Recent advances rely on large closed-source models like GPT-4 that present challenges in accessibility, privacy, and latency. To address these issues, we focus on developing small, efficient, and open-source text-to-SQL models. We demonstrate the benefits of sampling multiple candidate SQL generations and propose our method, MSc-SQL, to critique them using associated metadata. Our sample critiquing model evaluates multiple outputs simultaneously, achieving state-of-the-art performance compared to other open-source models while remaining competitive with larger models at a much lower cost. Full code of our method will be released for the camera-ready version.
Recommended citation: Satya Krishna Gorti, Ilan Gofman, Zhaoyan Liu, Jiapeng Wu, Noël Vouitsis, Guangwei Yu, Jesse C. Cresswell, Rasa Hosseinzadeh. MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation. NeurIPS 2024 Workshop on Table Representation Learning https://openreview.net/forum?id=RubZlwPv6D
Published in NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning, 2024
Although diffusion models can generate remarkably high-quality samples, they are intrinsically bottlenecked by their expensive iterative sampling procedure. Consistency models (CMs) have recently emerged as a promising diffusion model distillation method, reducing the cost of sampling by generating high-fidelity samples in just a few iterations. Consistency model distillation aims to solve the probability flow ordinary differential equation (ODE) defined by an existing diffusion model. CMs are not directly trained to minimize error against an ODE solver; rather, they use a more computationally tractable objective. As a way to study how effectively CMs solve the probability flow ODE, and the effect that any induced error has on the quality of generated samples, we introduce Direct CMs, which directly minimize this error. Intriguingly, we find that Direct CMs reduce the ODE solving error compared to CMs but also result in significantly worse sample quality, calling into question why exactly CMs work well in the first place. Full training and evaluation code will be made publicly available.
Recommended citation: Noël Vouitsis, Rasa Hosseinzadeh, Brendan Leigh Ross, Valentin Villecroze, Satya Krishna Gorti, Jesse C. Cresswell, Gabriel Loaiza-Ganem. Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples. NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning https://openreview.net/forum?id=2p4ES8QPUi
Published:
Due to the sensitive nature of medical data, hospitals are unable to merge their datasets to develop models. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.