UCF researchers had 12 papers accepted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), which will be held at the Vancouver Convention Center from Tuesday, December 10, through Sunday, December 15.
The conference was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Along with the conference is a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.
The h5-index is the h-index for articles published in the last five complete years. According to Google Scholar Metrics, NeurIPS ranks 7th overall and 1st in the Artificial Intelligence subcategory by h5-index.
You can access the CRCV Publications Page and Aii Publications Page for enhanced search capabilities.
Wu, Junyi; Wang, Haoxuan; Shang, Yuzhang; Shah, Mubarak; Yan, Yan
PTQ4DiT: Post-training Quantization for Diffusion Transformers Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Wu2024,
title = {PTQ4DiT: Post-training Quantization for Diffusion Transformers},
author = {Junyi Wu and Haoxuan Wang and Yuzhang Shang and Mubarak Shah and Yan Yan},
url = {https://nips.cc/virtual/2024/poster/95445
https://arxiv.org/pdf/2405.16005
https://github.com/adreamwu/PTQ4DiT},
year = {2024},
date = {2024-12-13},
urldate = {2024-12-13},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {The recent introduction of Diffusion Transformers (DiTs) has demonstrated exceptional capabilities in image generation by using a different backbone architecture, departing from traditional U-Nets and embracing the scalable nature of transformers. Despite their advanced capabilities, the wide deployment of DiTs, particularly for real-time applications, is currently hampered by considerable computational demands at the inference stage. Post-training Quantization (PTQ) has emerged as a fast and data-efficient solution that can significantly reduce computation and memory footprint by using low-bit weights and activations. However, its applicability to DiTs has not yet been explored and faces non-trivial difficulties due to the unique design of DiTs. In this paper, we propose PTQ4DiT, a specifically designed PTQ method for DiTs. We discover two primary quantization challenges inherent in DiTs, notably the presence of salient channels with extreme magnitudes and the temporal variability in distributions of salient activation over multiple timesteps. To tackle these challenges, we propose Channel-wise Salience Balancing (CSB) and Spearman's ρ-guided Salience Calibration (SSC). CSB leverages the complementarity property of channel magnitudes to redistribute the extremes, alleviating quantization errors for both activations and weights. SSC extends this approach by dynamically adjusting the balanced salience to capture the temporal variations in activation. Additionally, to eliminate extra computational costs caused by PTQ4DiT during inference, we design an offline re-parameterization strategy for DiTs. Experiments demonstrate that our PTQ4DiT successfully quantizes DiTs to 8-bit precision (W8A8) while preserving comparable generation ability and further enables effective quantization to 4-bit weight precision (W4A8) for the first time.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
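The channel-balancing idea behind CSB can be made concrete with a small sketch. The snippet below is a hypothetical illustration of the complementarity property, not the authors' implementation: for a linear layer Y = XW, dividing activation channel c by a scale s_c while multiplying the matching weight row by s_c leaves the output mathematically unchanged, yet flattens extreme channel magnitudes on both sides before quantization. The function name and scaling rule are assumptions.

import numpy as np

def balance_channel_salience(X, W, eps=1e-8):
    # Hypothetical sketch: equalize per-channel activation and weight
    # ranges so neither side carries extreme magnitudes into quantization.
    act_max = np.abs(X).max(axis=0) + eps      # per-input-channel activation range
    wgt_max = np.abs(W).max(axis=1) + eps      # per-input-channel weight range
    s = np.sqrt(act_max / wgt_max)             # balance the two ranges
    return X / s, W * s[:, None]

X = np.random.randn(16, 64) * np.r_[50.0, np.ones(63)]   # one salient channel
W = np.random.randn(64, 32)
Xb, Wb = balance_channel_salience(X, W)
assert np.allclose(X @ W, Xb @ Wb)             # the layer output is unchanged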
Modi, Rajat; Rawat, Yogesh Singh
Asynchronous Perception Machine for Test Time Training Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Modi2024,
title = {Asynchronous Perception Machine for Test Time Training},
author = {Rajat Modi and Yogesh Singh Rawat},
url = {https://www.crcv.ucf.edu/wp-content/uploads/2018/11/apm_final.pdf
https://www.crcv.ucf.edu/person/rawat/},
year = {2024},
date = {2024-12-11},
urldate = {2024-12-11},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {In this work, we propose Asynchronous Perception Machine (APM), a computationally-efficient architecture for test-time-training (TTT). APM can process patches of an image one at a time in any order asymmetrically and still encode semantic-awareness. We demonstrate APM's ability to recognize out-of-distribution images without dataset-specific pre-training, and its competitive classification performance over existing TTT approaches. To perform TTT, APM just distills test sample's representation once. APM possesses a unique property: it can learn using just this single representation and starts predicting semantically-aware features. We demonstrate APM's potential application beyond test-time-training: APM can scale up to a dataset of 2D images and yield semantic-clusterings in a single forward pass. APM also provides first empirical evidence of GLOM's insight, i.e. percept is really a field. Therefore, APM helps us converge towards an implementation which can do both interpolation and perception on a shared-connectionist hardware. Our codebase has been provided for review and will be made publicly-available. "It now appears that some of the ideas in GLOM could be made to work." https://www.technologyreview.com/2021/04/16/1021871/geoffrey-hinton-glom-godfather-ai-neural-networks/},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
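The "distills test sample's representation once" step can be pictured with a generic test-time-training sketch. This is a hedged illustration under assumed components (a frozen extractor supplying a 768-d representation and a small student MLP); none of the names below come from the APM codebase.

import torch
import torch.nn as nn

# Hypothetical single-sample distillation: fit the student once to the test
# sample's representation, after which it predicts features for that sample.
student = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
optimizer = torch.optim.SGD(student.parameters(), lr=1e-2)

def distill_once(sample_repr, steps=10):
    z = torch.randn_like(sample_repr)          # fixed input code for this sample
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(student(z), sample_repr)
        loss.backward()
        optimizer.step()

distill_once(torch.randn(1, 768))              # placeholder test representation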
Lim, Hui Xian Grace; Cui, Xuanming; Rawat, Yogesh Singh; Lim, Ser-Nam
AirSketch: Generative Motion to Sketch Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Lim2024,
title = {AirSketch: Generative Motion to Sketch},
author = {Hui Xian Grace Lim and Xuanming Cui and Yogesh Singh Rawat and Ser-Nam Lim },
url = {https://arxiv.org/pdf/2407.08906
https://www.crcv.ucf.edu/person/rawat/},
year = {2024},
date = {2024-12-09},
urldate = {2024-07-12},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
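The augmentation-based self-supervised procedure can be sketched at the data level: corrupt clean sketches so they resemble noisy hand-tracking traces, then train the controllable diffusion model to map the noisy condition back to the clean target. The pair generator below is a plausible stand-in; the jitter and drop parameters are assumptions, not values from the paper.

import numpy as np

def make_training_pair(strokes, jitter=0.02, drop=0.1, seed=0):
    # Simulate a noisy tracking trace from a clean sketch: perturb stroke
    # points and drop a fraction, yielding a (condition, target) pair.
    rng = np.random.default_rng(seed)
    noisy = []
    for s in strokes:                          # each stroke: (n, 2) points in [0, 1]
        s = s + rng.normal(0.0, jitter, s.shape)
        noisy.append(s[rng.random(len(s)) > drop])
    return noisy, strokes                      # (noisy condition, clean target)

clean = [np.linspace([0.1, 0.1], [0.9, 0.9], 50)]    # a toy diagonal stroke
noisy, target = make_training_pair(clean)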
Morafah, Mahdi; Kungurtsev, Vyacheslav; Chang, Hojin Matthew; Chen, Chen; Lin, Bill
Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Morafah2024,
title = {Towards Diverse Device Heterogeneous Federated Learning via Task Arithmetic Knowledge Integration},
author = {Mahdi Morafah and Vyacheslav Kungurtsev and Hojin Matthew Chang and Chen Chen and Bill Lin},
url = {https://arxiv.org/pdf/2409.18461
https://mmorafah.github.io/takflpage/
https://github.com/MMorafah/TAKFL},
year = {2024},
date = {2024-12-09},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Federated Learning (FL) has emerged as a promising paradigm for collaborative machine learning, while preserving user data privacy. Despite its potential, standard FL algorithms lack support for diverse heterogeneous device prototypes, which vary significantly in model and dataset sizes—from small IoT devices to large workstations. This limitation is only partially addressed by existing knowledge distillation (KD) techniques, which often fail to transfer knowledge effectively across a broad spectrum of device prototypes with varied capabilities. This failure primarily stems from two issues: the dilution of informative logits from more capable devices by those from less capable ones, and the use of a single integrated logits as the distillation target across all devices, which neglects their individual learning capacities and the unique contributions of each device. To address these challenges, we introduce TAKFL, a novel KD-based framework that treats the knowledge transfer from each device prototype’s ensemble as a separate task, independently distilling each to preserve its unique contributions and avoid dilution. TAKFL also incorporates a KD-based self-regularization technique to mitigate the issues related to the noisy and unsupervised ensemble distillation process. To integrate the separately distilled knowledge, we introduce an adaptive task arithmetic knowledge integration process, allowing each student model to customize the knowledge integration for optimal performance. Additionally, we present theoretical results demonstrating the effectiveness of task arithmetic in transferring knowledge across heterogeneous device prototypes with varying capacities. Comprehensive evaluations of our method across both computer vision (CV) and natural language processing (NLP) tasks demonstrate that TAKFL achieves state-of-the-art results in a variety of datasets and settings, significantly outperforming existing KD-based methods. Our code is released at https://github.com/MMorafah/TAKFL.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
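The task-arithmetic integration step admits a compact sketch: treat each device prototype's distilled knowledge as a weight-space delta (a "task vector") and let each student merge the deltas with its own coefficients rather than averaging them into one diluted target. A minimal illustration with dict-of-arrays parameters follows; TAKFL's adaptive choice of coefficients is not shown.

import numpy as np

def integrate_task_vectors(base_params, task_vectors, coeffs):
    # Task arithmetic: add each separately distilled delta with its own
    # coefficient, preserving each prototype's contribution.
    merged = {k: v.copy() for k, v in base_params.items()}
    for tv, lam in zip(task_vectors, coeffs):
        for name, delta in tv.items():
            merged[name] += lam * delta
    return merged

base = {"w": np.zeros((4, 4))}
tvs = [{"w": np.ones((4, 4))}, {"w": -np.ones((4, 4))}]
merged = integrate_task_vectors(base, tvs, coeffs=[0.7, 0.2])   # net +0.5 per entry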
Wang, Lei; Bian, Jieming; Zhang, Letian; Chen, Chen; Xu, Jie
Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Wang2024c,
title = {Taming Cross-Domain Representation Variance in Federated Prototype Learning with Heterogeneous Data Domains},
author = {Lei Wang and Jieming Bian and Letian Zhang and Chen Chen and Jie Xu},
url = {https://arxiv.org/pdf/2403.09048
https://arxiv.org/abs/2403.09048},
year = {2024},
date = {2024-12-09},
urldate = {2024-12-09},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Federated learning (FL) allows collaborative machine learning training without sharing private data. While most FL methods assume identical data domains across clients, real-world scenarios often involve heterogeneous data domains. Federated Prototype Learning (FedPL) addresses this issue, using mean feature vectors as prototypes to enhance model generalization. However, existing FedPL methods create the same number of prototypes for each client, leading to cross-domain performance gaps and disparities for clients with varied data distributions. To mitigate cross-domain feature representation variance, we introduce FedPLVM, which establishes variance-aware dual-level prototypes clustering and employs a novel α-sparsity prototype loss. The dual-level prototypes clustering strategy creates local clustered prototypes based on private data features, then performs global prototypes clustering to reduce communication complexity and preserve local data privacy. The α-sparsity prototype loss aligns samples from underrepresented domains, enhancing intra-class similarity and reducing inter-class similarity. Evaluations on Digit-5, Office-10, and DomainNet datasets demonstrate our method's superiority over existing approaches.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
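The dual-level prototype clustering is straightforward to sketch with scikit-learn's KMeans: each client clusters its own features and ships only the centers, and the server then clusters the pooled centers into global prototypes. Class conditioning and FedPLVM's variance-aware weighting are omitted; this is an illustrative reduction, not the paper's algorithm.

import numpy as np
from sklearn.cluster import KMeans

def dual_level_prototypes(client_feats, k_local=4, k_global=8, seed=0):
    # Level 1 (client): cluster private features locally; only the cluster
    # centers are shared, so raw data never leaves the client.
    local = [KMeans(n_clusters=k_local, n_init=10, random_state=seed)
             .fit(f).cluster_centers_ for f in client_feats]
    # Level 2 (server): cluster the pooled local prototypes, keeping
    # communication proportional to k_local per client.
    return (KMeans(n_clusters=k_global, n_init=10, random_state=seed)
            .fit(np.vstack(local)).cluster_centers_)

clients = [np.random.randn(200, 16) + i for i in range(3)]   # toy feature sets
global_protos = dual_level_prototypes(clients)               # shape (8, 16)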
Zhu, Liyun; Wang, Lei; Raj, Arjun; Gedeon, Tom; Chen, Chen
Advancing Video Anomaly Detection: A Concise Review and a New Dataset Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Zhu2024,
title = {Advancing Video Anomaly Detection: A Concise Review and a New Dataset},
author = {Liyun Zhu and Lei Wang and Arjun Raj and Tom Gedeon and Chen Chen},
url = {https://arxiv.org/pdf/2402.04857
https://arxiv.org/abs/2402.04857
https://msad-dataset.github.io/},
year = {2024},
date = {2024-12-09},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Video Anomaly Detection (VAD) finds widespread applications in security surveillance, traffic monitoring, industrial monitoring, and healthcare. Despite extensive research efforts, there remains a lack of concise reviews that provide insightful guidance for researchers. Such reviews would serve as quick references to grasp current challenges, research trends, and future directions. In this paper, we present such a review, examining models and datasets from various perspectives. We emphasize the critical relationship between model and dataset, where the quality and diversity of datasets profoundly influence model performance, and dataset development adapts to the evolving needs of emerging approaches. Our review identifies practical issues, including the absence of comprehensive datasets with diverse scenarios. To address this, we introduce a new dataset, Multi-Scenario Anomaly Detection (MSAD), comprising 14 distinct scenarios captured from various camera views. Our dataset has diverse motion patterns and challenging variations, such as different lighting and weather conditions, providing a robust foundation for training superior models. We conduct an in-depth analysis of recent representative models using MSAD and highlight its potential in addressing the challenges of detecting anomalies across diverse and evolving surveillance scenarios. Our dataset is available here.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Wang, Yue; Sun, Zhongchang; Zou, Shaofeng
A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Wang2024,
title = {A Unified Principle of Pessimism for Offline Reinforcement Learning under Model Mismatch},
author = {Yue Wang and Zhongchang Sun and Shaofeng Zou},
url = {https://nips.cc/virtual/2024/poster/94438},
year = {2024},
date = {2024-12-12},
urldate = {2024-12-12},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {In this paper, we address the challenges of offline reinforcement learning (RL) under model mismatch, where the agent aims to optimize its performance through an offline dataset that may not accurately represent the deployment environment. We identify two primary challenges under the setting: inaccurate model estimation due to limited data and performance degradation caused by the model mismatch between the dataset-collecting environment and the target deployment one. To tackle these issues, we propose a unified principle of pessimism using distributionally robust Markov decision processes. We carefully construct a robust MDP with a single uncertainty set to tackle both data sparsity and model mismatch, and demonstrate that the optimal robust policy enjoys a near-optimal sub-optimality gap under the target environment across three widely used uncertainty models: total variation, χ^2 divergence, and KL divergence. Our results improve upon or match the state-of-the-art performance under the total variation and KL divergence models, and provide the first result for the χ^2 divergence model.},
keywords = {NeurIPS},
pubstate = {published},
tppubtype = {conference}
}
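Under the total-variation uncertainty model, the pessimistic backup has a simple closed form: the adversary moves up to β probability mass from the highest-value next state toward the lowest, so the worst-case expectation is the empirical one minus β times the span of V (clipped at min V). The tabular sketch below illustrates that single ingredient only, not the paper's full construction of one uncertainty set covering both data sparsity and mismatch.

import numpy as np

def pessimistic_backup(V, P_hat, R, beta, gamma=0.99):
    # Robust Bellman backup under a TV ball of radius beta around P_hat.
    # Shapes: V (S,), P_hat (S, A, S), R (S, A).
    worst = P_hat @ V - beta * (V.max() - V.min())   # adversarial expectation
    worst = np.maximum(worst, V.min())               # cannot drop below min(V)
    return (R + gamma * worst).max(axis=1)           # greedy over actions

S, A = 5, 2
V, P_hat, R = np.zeros(S), np.full((S, A, S), 1.0 / S), np.random.rand(S, A)
for _ in range(200):                                 # value iteration to a fixed point
    V = pessimistic_backup(V, P_hat, R, beta=0.1)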
Nguyen, Tri; Ibrahim, Shahana; Fu, Xiao
Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Nguyen2024,
title = {Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom},
author = {Tri Nguyen and Shahana Ibrahim and Xiao Fu},
url = {https://nips.cc/virtual/2024/poster/95831},
year = {2024},
date = {2024-12-12},
urldate = {2024-12-12},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {The generation of label noise is often modeled as a process involving a probability transition matrix (often interpreted as the annotator confusion matrix) imposed onto the ground-truth label distribution. Under this model, rectifying the label noise and learning the target classifier boil down to identifying the confusion matrix. This line of work demonstrated appealing empirical performance, yet identifiability of the model was mostly established by assuming an instance-invariant confusion matrix. Having an (occasionally) instance-dependent confusion matrix across data samples is apparently more realistic, but inevitably introduces outliers to the model. Our interest lies in confusion matrix-based noisy label learning with such outliers taken into consideration. We begin with pointing out that under the model of interest, detecting the outliers in the presence of a single confusion matrix is fundamentally insufficient. Then, we prove that by employing a crowdsourcing strategy involving multiple annotators, a carefully designed loss function can detect the outliers and identify the desired classifier under reasonable conditions. Our development builds upon a link between the noisy label model and a column-corrupted matrix factorization model---which turns out attesting to the importance of crowdsourced data annotation. Experiments show that our learning scheme substantially improves the outlier detection probability and the learned neural systems' testing accuracy.},
keywords = {NeurIPS},
pubstate = {published},
tppubtype = {conference}
}
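The noise model described here is easy to write down as a generative process: annotator m corrupts the true label y through its confusion matrix A_m, with P(annotator m reports j | true class i) = A_m[i, j]. The toy simulator below instantiates that model; all names are illustrative, and the instance-dependent outliers studied in the paper are not simulated.

import numpy as np

def simulate_crowd_labels(true_labels, confusions, seed=0):
    # confusions: list of M row-stochastic (K, K) matrices, one per annotator.
    # Returns an (N, M) array of noisy crowd labels drawn from each A_m.
    rng = np.random.default_rng(seed)
    K = confusions[0].shape[0]
    return np.array([[rng.choice(K, p=A[y]) for A in confusions]
                     for y in true_labels])

K = 3
annotator = 0.8 * np.eye(K) + 0.2 / K                # mostly-correct annotator
noisy = simulate_crowd_labels(np.array([0, 1, 2]), [annotator, annotator])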
Chakraborty, Souradip; Ghosal, Soumya Suvra; Yin, Ming; Manocha, Dinesh; Wang, Mengdi; Bedi, Amrit Singh; Huang, Furong
Transfer Q-star: Principled Decoding for LLM Alignment Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Chakraborty2024b,
title = {Transfer Q-star: Principled Decoding for LLM Alignment},
author = {Souradip Chakraborty and Soumya Suvra Ghosal and Ming Yin and Dinesh Manocha and Mengdi Wang and Amrit Singh Bedi and Furong Huang},
url = {https://neurips.cc/virtual/2024/poster/96588},
year = {2024},
date = {2024-12-12},
urldate = {2024-12-12},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward r, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function (Q∗), which is often unavailable in practice. Hence, prior SoTA methods either approximate this Q∗ using Qπsft (derived from the reference SFT model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer Q∗, which implicitly estimates the optimal value function for a target reward r through a baseline model ρBL aligned with a baseline reward rBL (which can be different from the target reward r). Theoretical analyses of Transfer Q∗ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference SFT model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.},
keywords = {NeurIPS},
pubstate = {published},
tppubtype = {conference}
}
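The decoding-time alignment this line of work relies on can be sketched in a few lines: reweight the SFT model's next-token distribution by an estimated optimal Q, i.e. π(a|s) ∝ π_sft(a|s) · exp(Q(s, a)/α). In Transfer Q∗ that Q is implicitly estimated through a baseline-aligned model; the q_values placeholder below merely stands in for such an estimate.

import numpy as np

def aligned_next_token(logp_sft, q_values, alpha=1.0):
    # Decoding-time alignment: combine SFT log-probabilities with a
    # token-level Q estimate; alpha controls deviation from the SFT model.
    logits = logp_sft + q_values / alpha
    probs = np.exp(logits - logits.max())            # numerically stable softmax
    return probs / probs.sum()

vocab = 8
logp_sft = np.log(np.full(vocab, 1.0 / vocab))       # toy uniform SFT distribution
q_values = np.random.rand(vocab)                     # placeholder Q(s, a) estimate
probs = aligned_next_token(logp_sft, q_values, alpha=0.5)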
Bornstein, Marco; Bedi, Amrit Singh; Mohamed, Abdirisak; Huang, Furong
FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding? Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Bornstein2024,
title = {FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?},
author = {Marco Bornstein and Amrit Singh Bedi and Abdirisak Mohamed and Furong Huang},
url = {https://nips.cc/virtual/2024/poster/95703},
year = {2024},
date = {2024-12-12},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat its way out of contributing to federated training. In an effort to make free-riding-averse federated mechanisms truthful, and consequently less prone to breaking down in practice, we propose FACT. FACT is the first federated mechanism that: (1) eliminates federated free riding by using a penalty system, (2) ensures agents provide truthful information by creating a competitive environment, and (3) encourages agent participation by offering better performance than training alone. Empirically, FACT avoids free-riding when agents are untruthful, and reduces agent loss by over 4x.},
keywords = {NeurIPS},
pubstate = {published},
tppubtype = {conference}
}
Chen, Ziang; Liu, Jialin; Chen, Xiaohan; Wang, Xinshang; Yin, Wotao
Rethinking the Capacity of Graph Neural Networks for Branching Strategy Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.
@conference{Chen2024,
title = {Rethinking the Capacity of Graph Neural Networks for Branching Strategy},
author = {Ziang Chen and Jialin Liu and Xiaohan Chen and Xinshang Wang and Wotao Yin},
url = {https://neurips.cc/virtual/2024/poster/95991
https://arxiv.org/pdf/2402.07099},
year = {2024},
date = {2024-12-11},
publisher = {Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS)},
abstract = {Graph neural networks (GNNs) have been widely used to predict properties and heuristics of mixed-integer linear programs (MILPs) and hence accelerate MILP solvers. This paper investigates the capacity of GNNs to represent strong branching (SB), the most effective yet computationally expensive heuristic employed in the branch-and-bound algorithm. In the literature, message-passing GNN (MP-GNN), as the simplest GNN structure, is frequently used as a fast approximation of SB, and we find that not all MILPs' SB can be represented with MP-GNN. We precisely define a class of "MP-tractable" MILPs for which MP-GNNs can accurately approximate SB scores. Particularly, we establish a universal approximation theorem: for any data distribution over the MP-tractable class, there always exists an MP-GNN that can approximate the SB score with arbitrarily high accuracy and arbitrarily high probability, which lays a theoretical foundation of the existing works on imitating SB with MP-GNN. For MILPs without MP-tractability, unfortunately, a similar result is impossible, which can be illustrated by two MILP instances with different SB scores that cannot be distinguished by any MP-GNN, regardless of the number of parameters. Recognizing this, we explore another GNN structure called the second-order folklore GNN (2-FGNN) that overcomes this limitation, and the aforementioned universal approximation theorem can be extended to the entire MILP space using 2-FGNN, regardless of MP-tractability. A small-scale numerical experiment is conducted to directly validate our theoretical findings.},
keywords = {NeurIPS},
pubstate = {published},
tppubtype = {conference}
}
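The MP-GNN studied here runs on the standard bipartite encoding of a MILP: variable nodes on one side, constraint nodes on the other, linked by the coefficient matrix. One generic message-passing round in that style is sketched below; the weights, nonlinearity, and residual update are common choices assumed for illustration, not the paper's construction.

import numpy as np

def mp_gnn_layer(h_var, h_con, A, W_v, W_c):
    # One round on the variable-constraint bipartite graph.
    # A: (n_con, n_var) coefficient matrix; h_var, h_con: node embeddings.
    m_con = np.tanh(A @ h_var @ W_c)                 # constraints gather from variables
    m_var = np.tanh(A.T @ m_con @ W_v)               # variables gather from constraints
    return h_var + m_var, h_con + m_con              # residual update

rng = np.random.default_rng(0)
n_con, n_var, d = 6, 10, 16
h_var, h_con = rng.normal(size=(n_var, d)), rng.normal(size=(n_con, d))
A = rng.normal(size=(n_con, n_var))
h_var, h_con = mp_gnn_layer(h_var, h_con, A,
                            0.1 * rng.normal(size=(d, d)),
                            0.1 * rng.normal(size=(d, d)))
# A readout over h_var would then score each variable, imitating SB.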
Csaba, Botos; Zhang, Wenxuan; Müller, Matthias; Lim, Ser-Nam; Torr, Philip; Bibi, Adel
Label Delay in Online Continual Learning Conference
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024.