Research and Projects

An overview of my major projects and publications

Research Projects

The following is a list of major research projects completed during my Ph.D. and in my own time.

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Train-time data poisoning attacks compromise machine learning models by introducing adversarial examples during training, causing misclassification. We propose universal data purification methods using a stochastic transform, Ψ(x), implemented via iterative Langevin dynamics of Energy-Based Models (EBMs) and Denoising Diffusion Probabilistic Models (DDPMs). Our approach purifies poisoned data with minimal impact on classifier generalization and achieves state-of-the-art defense performance without requiring specific knowledge of the attack or classifier.

Paper Github Medium Blog
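The core of the purification transform Ψ(x) is iterative Langevin dynamics: repeatedly stepping a sample down an energy gradient while injecting Gaussian noise, which pulls poisoned points back toward the learned data manifold. A minimal sketch, using a toy Gaussian energy in place of the paper's trained EBM (the `grad_energy` callable, step count, and step size here are illustrative, not the published settings):

```python
import numpy as np

def langevin_purify(x, grad_energy, steps=500, eps=1e-2, rng=None):
    """Stochastic purification transform Psi(x): iterative Langevin
    dynamics.  Each step moves x along the negative energy gradient
    and adds Gaussian noise scaled by sqrt(eps)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        x += -0.5 * eps * grad_energy(x) + np.sqrt(eps) * rng.standard_normal(x.shape)
    return x

# Toy energy E(x) = ||x - mu||^2 / 2, so grad E(x) = x - mu; the dynamics
# drift a "poisoned" point started far from mu back toward the mode.
mu = np.zeros(4)
poisoned = mu + 5.0
purified = langevin_purify(poisoned, lambda x: x - mu, rng=np.random.default_rng(0))
```

The same loop applies unchanged to an image batch with a neural energy network supplying `grad_energy`; only the energy model, not the dynamics, changes.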

Causal Structural Hypothesis Testing and Data Generation Models

In this work, we introduce CSHTEST and CSVHTEST, novel architectures for causal model hypothesis testing and data generation. These models use non-parametric, structural causal knowledge and approximate a causal model's functional relationships using deep neural networks. The architectures are tested on extensive simulated DAGs, a synthetic pendulum dataset, and a real-world medical trauma dataset to show practical use for causal inference.

Paper/Video Github
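The key idea is that a hypothesised DAG acts as a mask on a structural model: each variable is generated only from its hypothesised parents, and competing DAGs can be compared by how well their restricted models fit held-out data. A toy sketch with a linear structural model standing in for the deep networks used in CSHTEST (the DAG, weights, and chain X0 → X1 → X2 are invented for illustration):

```python
import numpy as np

# Hypothesised DAG over 3 variables: X0 -> X1 -> X2.
# A[i, j] = 1 means variable i is a parent of variable j.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

def generate(A, weights, n, rng):
    """Sample n rows from a linear SCM whose structure is masked by DAG A.
    Assumes columns are already in topological order."""
    d = A.shape[0]
    X = np.zeros((n, d))
    for j in range(d):
        parents = A[:, j].astype(bool)
        X[:, j] = X[:, parents] @ weights[parents, j] + rng.standard_normal(n)
    return X

rng = np.random.default_rng(0)
W = A * 2.0                      # parent-to-child coefficients
data = generate(A, W, n=1000, rng=rng)
```

In the actual architectures the per-node functions are neural networks rather than linear maps, so the structural (parent-mask) constraint is kept while the functional form stays non-parametric.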

Towards Composable Distributions of Latent Space Augmentations

We propose a composable framework for latent space image augmentation, based on the Variational Autoencoder (VAE) architecture, that allows for easy combination of multiple augmentations through linear transformations within the latent space. This method explores losses and augmentation latent geometry to ensure transformations are composable and invertible, enabling combinations or inversions, and effectively constrains the VAE's bottleneck to preserve specific augmentation variances and image features. We demonstrate the effectiveness of our approach on the MNIST dataset, showing improved control and geometric interpretability of the latent space compared to standard and Conditional VAEs.

Paper
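When each augmentation acts on a latent code z as an affine map T(z) = A z + b, composing augmentations is just composing affine maps, and any augmentation can be undone with the inverse map. A minimal sketch of that composability and invertibility in a toy 4-dimensional latent space (the specific matrices are invented placeholders, not learned transforms from the paper):

```python
import numpy as np

def compose(A1, b1, A2, b2):
    """Affine map equal to applying (A1, b1) first, then (A2, b2)."""
    return A2 @ A1, A2 @ b1 + b2

rng = np.random.default_rng(0)
z = rng.standard_normal(4)                               # a latent code

A1, b1 = np.diag([1.0, -1.0, 1.0, 1.0]), np.zeros(4)     # e.g. a "flip" direction
A2, b2 = np.eye(4), np.array([0.5, 0.0, 0.0, 0.0])       # e.g. a "shift" direction

Ac, bc = compose(A1, b1, A2, b2)
z_both = Ac @ z + bc                                     # combined augmentation
z_undone = np.linalg.inv(Ac) @ (z_both - bc)             # inverted: recovers z
```

Decoding `z_both` would yield the image with both augmentations applied; the framework's losses are what make the decoder respect these latent-space maps.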

De-Biasing Generative Models using Counterfactual Methods

The Causal Counterfactual Generative Model (CCGM) is a VAE-based framework with a partially trainable causal layer that learns causal relationships without compromising reconstruction fidelity, allowing for bias analysis, interventions, and scenario simulations. Our method combines a causal latent space VAE model with modifications for causal fidelity, can generate de-biased datasets from biased training data, and offers finer control over the causal layer. Initial experiments demonstrate high fidelity in generating images and tabular data aligned with the causal framework.

Paper
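A simple way to picture the causal layer is a linear structural model over latent factors, z = Aᵀz + ε, solved in closed form as z = (I − Aᵀ)⁻¹ε; an intervention clamps one factor and severs its incoming edges before decoding, which is how biased relationships can be cut. A toy sketch under those assumptions (the two-factor graph and the 0.8 edge weight are invented for illustration, and the real causal layer sits inside a trained VAE):

```python
import numpy as np

# Linear causal layer: factor 0 (e.g. a biased attribute) drives factor 1.
A = np.array([[0.0, 0.8],
              [0.0, 0.0]])

def causal_layer(eps, A):
    """Solve z = A.T @ z + eps, i.e. z = (I - A.T)^-1 @ eps."""
    return np.linalg.solve(np.eye(len(eps)) - A.T, eps)

def do_intervention(eps, A, idx, value):
    """do(z_idx = value): clamp the factor and remove its incoming edges."""
    A_cut = A.copy()
    A_cut[:, idx] = 0.0
    eps_cut = eps.copy()
    eps_cut[idx] = value
    return causal_layer(eps_cut, A_cut)

eps = np.array([1.0, 0.0])
z = causal_layer(eps, A)                         # bias propagates: z = [1.0, 0.8]
z_do = do_intervention(eps, A, idx=0, value=0.0) # severed: z_do = [0.0, 0.0]
```

Generating a dataset from intervened latents rather than observed ones is the mechanism by which de-biased data can be produced from a biased training distribution.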

Academic Publications

Bhat, Sunay Gajanan. (2024) Robust Modeling through Causal Priors and Data Purification in Machine Learning. PhD thesis, UCLA.

Bhat, Sunay, Jiang, Jeffrey, Pooladzandi, Omead, Branch, Alexander, Pottie, Gregory. (2024) PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics. arXiv preprint arXiv:2405.18627.

Pooladzandi, Omead, Jiang, Jeffrey, Bhat, Sunay, Pottie, Gregory. (2024) PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models. arXiv preprint arXiv:2405.19376.

Pooladzandi, Omead, Jiang, Jeffrey, Bhat, Sunay, Pottie, Gregory. (2023) Towards Composable Distributions of Latent Space Augmentations. arXiv preprint arXiv:2303.03462.

Bhat, Sunay Gajanan, Pooladzandi, Omead, Jiang, Jeffrey, Pottie, Gregory. (2022) Causal Structural Hypothesis Testing and Data Generation Models. Proceedings of the NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research.

Bhat, Sunay, Jiang, Jeffrey, Pooladzandi, Omead, Pottie, Gregory. (2022) De-Biasing Generative Models Using Counterfactual Methods. arXiv preprint arXiv:2207.01575.