Machine Learning
See recent articles
Showing new listings for Tuesday, 26 November 2024
- [1] arXiv:2411.15173 [pdf, html, other]
-
Title: Decentralizing Test-time Adaptation under Heterogeneous Data StreamsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
While Test-Time Adaptation (TTA) has shown promise in addressing distribution shifts between training and testing data, its effectiveness diminishes with heterogeneous data streams due to uniform target estimation. As previous attempts merely stabilize model fine-tuning over time to handle continually changing environments, they fundamentally assume a homogeneous target domain at any moment, leaving the intrinsic real-world data heterogeneity unresolved. This paper delves into TTA under heterogeneous data streams, moving beyond current model-centric limitations. By revisiting TTA from a data-centric perspective, we discover that decomposing samples into Fourier space facilitates an accurate data separation across different frequency levels. Drawing from this insight, we propose a novel Frequency-based Decentralized Adaptation (FreDA) framework, which transitions data from globally heterogeneous to locally homogeneous in Fourier space and employs decentralized adaptation to manage diverse distribution this http URL, we devise a novel Fourier-based augmentation strategy to assist in decentralizing adaptation, which individually enhances sample quality for capturing each type of distribution shifts. Extensive experiments across various settings (corrupted, natural, and medical environments) demonstrate the superiority of our proposed framework over the state-of-the-arts.
- [2] arXiv:2411.15178 [pdf, html, other]
-
Title: Harnessing Scale and Physics: A Multi-Graph Neural Operator Framework for PDEs on Arbitrary GeometriesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Partial Differential Equations (PDEs) underpin many scientific phenomena, yet traditional computational approaches often struggle with complex, nonlinear systems and irregular geometries. This paper introduces the \textbf{AMG} method, a \textbf{M}ulti-\textbf{G}raph neural operator approach designed for efficiently solving PDEs on \textbf{A}rbitrary geometries. AMG leverages advanced graph-based techniques and dynamic attention mechanisms within a novel GraphFormer architecture, enabling precise management of diverse spatial domains and complex data interdependencies. By constructing multi-scale graphs to handle variable feature frequencies and a physics graph to encapsulate inherent physical properties, AMG significantly outperforms previous methods, which are typically limited to uniform grids. We present a comprehensive evaluation of AMG across six benchmarks, demonstrating its consistent superiority over existing state-of-the-art models. Our findings highlight the transformative potential of tailored graph neural operators in surmounting the challenges faced by conventional PDE solvers. Our code and datasets are available on \url{this https URL}.
- [3] arXiv:2411.15179 [pdf, html, other]
-
Title: Random Forest-Supervised Manifold AlignmentComments: 4 pages, 3 figures, Accepted at MMAI 2024 (BigData 2024)Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Manifold alignment is a type of data fusion technique that creates a shared low-dimensional representation of data collected from multiple domains, enabling cross-domain learning and improved performance in downstream tasks. This paper presents an approach to manifold alignment using random forests as a foundation for semi-supervised alignment algorithms, leveraging the model's inherent strengths. We focus on enhancing two recently developed alignment graph-based by integrating class labels through geometry-preserving proximities derived from random forests. These proximities serve as a supervised initialization for constructing cross-domain relationships that maintain local neighborhood structures, thereby facilitating alignment. Our approach addresses a common limitation in manifold alignment, where existing methods often fail to generate embeddings that capture sufficient information for downstream classification. By contrast, we find that alignment models that use random forest proximities or class-label information achieve improved accuracy on downstream classification tasks, outperforming single-domain baselines. Experiments across multiple datasets show that our method typically enhances cross-domain feature integration and predictive performance, suggesting that random forest proximities offer a practical solution for tasks requiring multimodal data alignment.
- [4] arXiv:2411.15180 [pdf, html, other]
-
Title: Multi-layer matrix factorization for cancer subtyping using full and partial multi-omics datasetSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Cancer, with its inherent heterogeneity, is commonly categorized into distinct subtypes based on unique traits, cellular origins, and molecular markers specific to each type. However, current studies primarily rely on complete multi-omics datasets for predicting cancer subtypes, often overlooking predictive performance in cases where some omics data may be missing and neglecting implicit relationships across multiple layers of omics data integration. This paper introduces Multi-Layer Matrix Factorization (MLMF), a novel approach for cancer subtyping that employs multi-omics data clustering. MLMF initially processes multi-omics feature matrices by performing multi-layer linear or nonlinear factorization, decomposing the original data into latent feature representations unique to each omics type. These latent representations are subsequently fused into a consensus form, on which spectral clustering is performed to determine subtypes. Additionally, MLMF incorporates a class indicator matrix to handle missing omics data, creating a unified framework that can manage both complete and incomplete multi-omics data. Extensive experiments conducted on 10 multi-omics cancer datasets, both complete and with missing values, demonstrate that MLMF achieves results that are comparable to or surpass the performance of several state-of-the-art approaches.
- [5] arXiv:2411.15182 [pdf, html, other]
-
Title: Forecasting Application Counts in Talent Acquisition Platforms: Harnessing Multimodal Signals using LMsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
As recruitment and talent acquisition have become more and more competitive, recruitment firms have become more sophisticated in using machine learning (ML) methodologies for optimizing their day to day activities. But, most of published ML based methodologies in this area have been limited to the tasks like candidate matching, job to skill matching, job classification and normalization. In this work, we discuss a novel task in the recruitment domain, namely, application count forecasting, motivation of which comes from designing of effective outreach activities to attract qualified applicants. We show that existing auto-regressive based time series forecasting methods perform poorly for this task. Henceforth, we propose a multimodal LM-based model which fuses job-posting metadata of various modalities through a simple encoder. Experiments from large real-life datasets from CareerBuilder LLC show the effectiveness of the proposed method over existing state-of-the-art methods.
- [6] arXiv:2411.15185 [pdf, html, other]
-
Title: Hybrid Gaussian Process Regression with Temporal Feature Extraction for Partially Interpretable Remaining Useful Life Interval Prediction in Aeroengine PrognosticsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The estimation of Remaining Useful Life (RUL) plays a pivotal role in intelligent manufacturing systems and Industry 4.0 technologies. While recent advancements have improved RUL prediction, many models still face interpretability and compelling uncertainty modeling challenges. This paper introduces a modified Gaussian Process Regression (GPR) model for RUL interval prediction, tailored for the complexities of manufacturing process development. The modified GPR predicts confidence intervals by learning from historical data and addresses uncertainty modeling in a more structured way. The approach effectively captures intricate time-series patterns and dynamic behaviors inherent in modern manufacturing systems by coupling GPR with deep adaptive learning-enhanced AI process models. Moreover, the model evaluates feature significance to ensure more transparent decision-making, which is crucial for optimizing manufacturing processes. This comprehensive approach supports more accurate RUL predictions and provides transparent, interpretable insights into uncertainty, contributing to robust process development and management.
- [7] arXiv:2411.15189 [pdf, html, other]
-
Title: Order Is All You Need for Categorical Data ClusteringSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Categorical data composed of nominal valued attributes are ubiquitous in knowledge discovery and data mining tasks. Due to the lack of well-defined metric space, categorical data distributions are difficult to intuitively understand. Clustering is a popular technique suitable for data analysis. However, the success of clustering often relies on reasonable distance metrics, which happens to be what categorical data naturally lack. Therefore, the cluster analysis of categorical data is considered a critical but challenging problem. This paper introduces the new finding that the order relation among attribute values is the decisive factor in clustering accuracy, and is also the key to understanding the categorical data clusters. To automatically obtain the orders, we propose a new learning paradigm that allows joint learning of clusters and the orders. It turns out that clustering with order learning achieves superior clustering accuracy, and the learned orders provide intuition for understanding the cluster distribution of categorical data. Extensive experiments with statistical evidence and case studies have verified the effectiveness of the new ``order is all you need'' insight and the proposed method.
- [8] arXiv:2411.15191 [pdf, html, other]
-
Title: Tailoring the Hyperparameters of a Wide-Kernel Convolutional Neural Network to Fit Different Bearing Fault Vibration DatasetsComments: 71 pages, 14 figures, 7 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
State-of-the-art algorithms are reported to be almost perfect at distinguishing the vibrations arising from healthy and damaged machine bearings, according to benchmark datasets at least. However, what about their application to new data? In this paper, we are able to confirm that neural networks for bearing fault detection can be crippled by incorrect hyperparameterisation, and also that the correct hyperparameter settings can actually change when transitioning to new data. The paper weaves together multiple methods to explain the behaviour of the hyperparameters of a wide-kernel convolutional neural network and how to set them. Since guidance already exists for generic hyperparameters like minibatch size, we focus on how to set architecture-specific hyperparameters such as the width of the convolutional kernels, a topic which might otherwise be obscure. We reflect different data properties by fusing information from seven different benchmark datasets, and our results show that the kernel size in the first layer in particular is sensitive to changes in the data. Looking deeper, we use manipulated copies of one dataset in an attempt to spot why the kernel size sometimes needs to change. The relevance of sampling rate is studied by using different levels of resampling, and spectral content is studied by increasingly filtering out high frequencies. At the end of our paper we conclude by stating clear guidance on how to set the hyperparameters of our neural network architecture.
- [9] arXiv:2411.15194 [pdf, html, other]
-
Title: Guiding Word Equation Solving using Graph Neural Networks (Extended Technical Report)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Logic in Computer Science (cs.LO)
This paper proposes a Graph Neural Network-guided algorithm for solving word equations, based on the well-known Nielsen transformation for splitting equations. The algorithm iteratively rewrites the first terms of each side of an equation, giving rise to a tree-like search space. The choice of path at each split point of the tree significantly impacts solving time, motivating the use of Graph Neural Networks (GNNs) for efficient split decision-making. Split decisions are encoded as multi-classification tasks, and five graph representations of word equations are introduced to encode their structural information for GNNs. The algorithm is implemented as a solver named DragonLi. Experiments are conducted on artificial and real-world benchmarks. The algorithm performs particularly well on satisfiable problems. For single word \mbox{equations}, DragonLi can solve significantly more problems than well-established string solvers. For the conjunction of multiple word equations, DragonLi is competitive with state-of-the-art string solvers.
- [10] arXiv:2411.15197 [pdf, html, other]
-
Title: K-means Derived Unsupervised Feature Selection using Improved ADMMSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Feature selection is important for high-dimensional data analysis and is non-trivial in unsupervised learning problems such as dimensionality reduction and clustering. The goal of unsupervised feature selection is finding a subset of features such that the data points from different clusters are well separated. This paper presents a novel method called K-means Derived Unsupervised Feature Selection (K-means UFS). Unlike most existing spectral analysis based unsupervised feature selection methods, we select features using the objective of K-means. We develop an alternating direction method of multipliers (ADMM) to solve the NP-hard optimization problem of our K-means UFS model. Extensive experiments on real datasets show that our K-means UFS is more effective than the baselines in selecting features for clustering.
- [11] arXiv:2411.15203 [pdf, other]
-
Title: Multimodal large language model for wheat breeding: a new exploration of smart breedingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
UAV remote sensing technology has become a key technology in crop breeding, which can achieve high-throughput and non-destructive collection of crop phenotyping data. However, the multidisciplinary nature of breeding has brought technical barriers and efficiency challenges to knowledge mining. Therefore, it is important to develop a smart breeding goal tool to mine cross-domain multimodal data. Based on different pre-trained open-source multimodal large language models (MLLMs) (e.g., Qwen-VL, InternVL, Deepseek-VL), this study used supervised fine-tuning (SFT), retrieval-augmented generation (RAG), and reinforcement learning from human feedback (RLHF) technologies to inject cross-domain knowledge into MLLMs, thereby constructing multiple multimodal large language models for wheat breeding (WBLMs). The above WBLMs were evaluated using the newly created evaluation benchmark in this study. The results showed that the WBLM constructed using SFT, RAG and RLHF technologies and InternVL2-8B has leading performance. Then, subsequent experiments were conducted using the WBLM. Ablation experiments indicated that the combination of SFT, RAG, and RLHF technologies can improve the overall generation performance, enhance the generated quality, balance the timeliness and adaptability of the generated answer, and reduce hallucinations and biases. The WBLM performed best in wheat yield prediction using cross-domain data (remote sensing, phenotyping, weather, germplasm) simultaneously, with R2 and RMSE of 0.821 and 489.254 kg/ha, respectively. Furthermore, the WBLM can generate professional decision support answers for phenotyping estimation, environmental stress assessment, target germplasm screening, cultivation technique recommendation, and seed price query tasks.
- [12] arXiv:2411.15204 [pdf, html, other]
-
Title: Label Distribution Shift-Aware Prediction Refinement for Test-Time AdaptationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Test-time adaptation (TTA) is an effective approach to mitigate performance degradation of trained models when encountering input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify the presence of class-wise confusion patterns commonly observed across different covariate shifts. Based on this observation, we introduce label Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), a novel TTA method that refines the predictions by focusing on class-wise confusion patterns. DART trains a prediction refinement module during an intermediate time by exposing it to several batches with diverse class distributions using the training dataset. This module is then used during test time to detect and correct class distribution shifts, significantly improving pseudo-label accuracy for test data. Our method exhibits 5-18% gains in accuracy under label distribution shifts on CIFAR-10C, without any performance degradation when there is no label distribution shift. Extensive experiments on CIFAR, PACS, OfficeHome, and ImageNet benchmarks demonstrate DART's ability to correct inaccurate predictions caused by test-time distribution shifts. This improvement leads to enhanced performance in existing TTA methods, making DART a valuable plug-in tool.
- [13] arXiv:2411.15206 [pdf, html, other]
-
Title: Self-Supervised Conditional Distribution Learning on GraphsComments: 8 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Graph contrastive learning (GCL) has shown promising performance in semisupervised graph classification. However, existing studies still encounter significant challenges in GCL. First, successive layers in graph neural network (GNN) tend to produce more similar node embeddings, while GCL aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism of GNNs and the contrastive learning of negative pairs via intraviews. Second, leveraging the diversity and quantity of data provided by graph-structured data augmentations while preserving intrinsic semantic information is challenging. In this paper, we propose a self-supervised conditional distribution learning (SSCDL) method designed to learn graph representations from graph-structured data for semisupervised graph classification. Specifically, we present an end-to-end graph representation learning model to align the conditional distributions of weakly and strongly augmented features over the original features. This alignment effectively reduces the risk of disrupting intrinsic semantic information through graph-structured data augmentation. To avoid conflict between the message-passing mechanism and contrastive learning of negative pairs, positive pairs of node representations are retained for measuring the similarity between the original features and the corresponding weakly augmented features. Extensive experiments with several benchmark graph datasets demonstrate the effectiveness of the proposed SSCDL method.
- [14] arXiv:2411.15208 [pdf, html, other]
-
Title: M2oE: Multimodal Collaborative Expert Peptide ModelComments: accepted by bibm 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Peptides are biomolecules comprised of amino acids that play an important role in our body. In recent years, peptides have received extensive attention in drug design and synthesis, and peptide prediction tasks help us better search for functional peptides. Typically, we use the primary sequence and structural information of peptides for model encoding. However, recent studies have focused more on single-modal information (structure or sequence) for prediction without multi-modal approaches. We found that single-modal models are not good at handling datasets with less information in that particular modality. Therefore, this paper proposes the M2oE multi-modal collaborative expert peptide model. Based on previous work, by integrating sequence and spatial structural information, employing expert model and Cross-Attention Mechanism, the model's capabilities are balanced and improved. Experimental results indicate that the M2oE model performs excellently in complex task predictions.
- [15] arXiv:2411.15209 [pdf, html, other]
-
Title: Quantized symbolic time series approximationSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Time series are ubiquitous in numerous science and engineering domains, e.g., signal processing, bioinformatics, and astronomy. Previous work has verified the efficacy of symbolic time series representation in a variety of engineering applications due to its storage efficiency and numerosity reduction. The most recent symbolic aggregate approximation technique, ABBA, has been shown to preserve essential shape information of time series and improve downstream applications, e.g., neural network inference regarding prediction and anomaly detection in time series.
Motivated by the emergence of high-performance hardware which enables efficient computation for low bit-width representations, we present a new quantization-based ABBA symbolic approximation technique, QABBA, which exhibits improved storage efficiency while retaining the original speed and accuracy of symbolic reconstruction. We prove an upper bound for the error arising from quantization and discuss how the number of bits should be chosen to balance this with other errors.
An application of QABBA with large language models (LLMs) for time series regression is also presented, and its utility is investigated. By representing the symbolic chain of patterns on time series, QABBA not only avoids the training of embedding from scratch, but also achieves a new state-of-the-art on Monash regression dataset. The symbolic approximation to the time series offers a more efficient way to fine-tune LLMs on the time series regression task which contains various application domains. We further present a set of extensive experiments performed across various well-established datasets to demonstrate the advantages of the QABBA method for symbolic approximation. - [16] arXiv:2411.15210 [pdf, html, other]
-
Title: Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual AttacksSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
As deep learning models are increasingly deployed in safety-critical applications, evaluating their vulnerabilities to adversarial perturbations is essential for ensuring their reliability and trustworthiness. Over the past decade, a large number of white-box adversarial robustness evaluation methods (i.e., attacks) have been proposed, ranging from single-step to multi-step methods and from individual to ensemble methods. Despite these advances, challenges remain in conducting meaningful and comprehensive robustness evaluations, particularly when it comes to large-scale testing and ensuring evaluations reflect real-world adversarial risks. In this work, we focus on image classification models and propose a novel individual attack method, Probability Margin Attack (PMA), which defines the adversarial margin in the probability space rather than the logits space. We analyze the relationship between PMA and existing cross-entropy or logits-margin-based attacks, and show that PMA can outperform the current state-of-the-art individual methods. Building on PMA, we propose two types of ensemble attacks that balance effectiveness and efficiency. Furthermore, we create a million-scale dataset, CC1M, derived from the existing CC3M dataset, and use it to conduct the first million-scale white-box adversarial robustness evaluation of adversarially-trained ImageNet models. Our findings provide valuable insights into the robustness gaps between individual versus ensemble attacks and small-scale versus million-scale evaluations.
- [17] arXiv:2411.15211 [pdf, html, other]
-
Title: LightLLM: A Versatile Large Language Model for Predictive Light SensingComments: 15 pages, 14 figures, 5 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while being fine-tuned through the addition of lightweight, trainable components, allowing the model to adapt to new tasks without altering its original parameters. This approach enables flexible adaptation of LLM to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving 4.4x improvement in localization accuracy and 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.
- [18] arXiv:2411.15212 [pdf, html, other]
-
Title: Effective Analog ICs Floorplanning with Relational Graph Neural Networks and Reinforcement LearningComments: 7 pages, 7 figures, Accepted at DATE25Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Analog integrated circuit (IC) floorplanning is typically a manual process with the placement of components (devices and modules) planned by a layout engineer. This process is further complicated by the interdependence of floorplanning and routing steps, numerous electric and layout-dependent constraints, as well as the high level of customization expected in analog design. This paper presents a novel automatic floorplanning algorithm based on reinforcement learning. It is augmented by a relational graph convolutional neural network model for encoding circuit features and positional constraints. The combination of these two machine learning methods enables knowledge transfer across different circuit designs with distinct topologies and constraints, increasing the \emph{generalization ability} of the solution. Applied to $6$ industrial circuits, our approach surpassed established floorplanning techniques in terms of speed, area and half-perimeter wire length. When integrated into a \emph{procedural generator} for layout completion, overall layout time was reduced by $67.3\%$ with a $8.3\%$ mean area reduction compared to manual layout.
- [19] arXiv:2411.15214 [pdf, html, other]
-
Title: Urban Region Embeddings from Service-Specific Mobile Traffic DataSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
With the advent of advanced 4G/5G mobile networks, mobile phone data collected by operators now includes detailed, service-specific traffic information with high spatio-temporal resolution. In this paper, we leverage this type of data to explore its potential for generating high-quality representations of urban regions. To achieve this, we present a methodology for creating urban region embeddings from service-specific mobile traffic data, employing a temporal convolutional network-based autoencoder, transformers, and learnable weighted sum models to capture key urban features. In the extensive experimental evaluation conducted using a real-world dataset, we demonstrate that the embeddings generated by our methodology effectively capture urban characteristics. Specifically, our embeddings are compared against those of a state-of-the-art competitor across two downstream tasks. Additionally, through clustering techniques, we investigate how well the embeddings produced by our methodology capture the temporal dynamics and characteristics of the underlying urban regions. Overall, this work highlights the potential of service-specific mobile traffic data for urban research and emphasizes the importance of making such data accessible to support public innovation.
- [20] arXiv:2411.15215 [pdf, html, other]
-
Title: S$^2$ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation LearningMingze Yin, Hanjing Zhou, Jialu Wu, Yiheng Zhu, Yuxuan Zhan, Zitai Kong, Hongxia Xu, Chang-Yu Hsieh, Jintai Chen, Tingjun Hou, Jian WuSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Antibodies safeguard our health through their precise and potent binding to specific antigens, demonstrating promising therapeutic efficacy in the treatment of numerous diseases, including COVID-19. Recent advancements in biomedical language models have shown the great potential to interpret complex biological structures and functions. However, existing antibody specific models have a notable limitation that they lack explicit consideration for antibody structural information, despite the fact that both 1D sequence and 3D structure carry unique and complementary insights into antibody behavior and functionality. This paper proposes Sequence-Structure multi-level pre-trained Antibody Language Model (S$^2$ALM), combining holistic sequential and structural information in one unified, generic antibody foundation model. We construct a hierarchical pre-training paradigm incorporated with two customized multi-level training objectives to facilitate the modeling of comprehensive antibody representations. S$^2$ALM's representation space uncovers inherent functional binding mechanisms, biological evolution properties and structural interaction patterns. Pre-trained over 75 million sequences and 11.7 million structures, S$^2$ALM can be adopted for diverse downstream tasks: accurately predicting antigen-antibody binding affinities, precisely distinguishing B cell maturation stages, identifying antibody crucial binding positions, and specifically designing novel coronavirus-binding antibodies. Remarkably, S$^2$ALM outperforms well-established and renowned baselines and sets new state-of-the-art performance across extensive antibody specific understanding and generation tasks. S$^2$ALM's ability to model comprehensive and generalized representations further positions its potential to advance real-world therapeutic antibody development, potentially addressing unmet academic, industrial, and clinical needs.
- [21] arXiv:2411.15216 [pdf, html, other]
-
Title: Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance ConstraintSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Imbalanced data distributions are prevalent in real-world scenarios, posing significant challenges in both imbalanced classification and imbalanced regression tasks. They often cause deep learning models to overfit in areas of high sample density (many-shot regions) while underperforming in areas of low sample density (few-shot regions). This characteristic restricts the utility of deep learning models in various sectors, notably healthcare, where areas with few-shot data hold greater clinical relevance. While recent studies have shown the benefits of incorporating distribution information in imbalanced classification tasks, such strategies are rarely explored in imbalanced regression. In this paper, we address this issue by introducing a novel loss function, termed Dist Loss, designed to minimize the distribution distance between the model's predictions and the target labels in a differentiable manner, effectively integrating distribution information into model training. Dist Loss enables deep learning models to regularize their output distribution during training, effectively enhancing their focus on few-shot regions. We have conducted extensive experiments across three datasets spanning computer vision and healthcare: IMDB-WIKI-DIR, AgeDB-DIR, and ECG-Ka-DIR. The results demonstrate that Dist Loss effectively mitigates the negative impact of imbalanced data distribution on model performance, achieving state-of-the-art results in sparse data regions. Furthermore, Dist Loss is easy to integrate, complementing existing methods.
- [22] arXiv:2411.15217 [pdf, html, other]
-
Title: LPLgrad: Optimizing Active Learning Through Gradient Norm Sample Selection and Auxiliary Model TrainingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Machine learning models are increasingly being utilized across various fields and tasks due to their outstanding performance and strong generalization capabilities. Nonetheless, their success hinges on the availability of large volumes of annotated data, the creation of which is often labor-intensive, time-consuming, and expensive. Many active learning (AL) approaches have been proposed to address these challenges, but they often fail to fully leverage the information from the core phases of AL, such as training on the labeled set and querying new unlabeled samples. To bridge this gap, we propose a novel AL approach, Loss Prediction Loss with Gradient Norm (LPLgrad), designed to quantify model uncertainty effectively and improve the accuracy of image classification tasks. LPLgrad operates in two distinct phases: (i) {\em Training Phase} aims to predict the loss for input features by jointly training a main model and an auxiliary model. Both models are trained on the labeled data to maximize the efficiency of the learning process, an aspect often overlooked in previous AL methods. This dual-model approach enhances the ability to extract complex input features and learn intrinsic patterns from the data effectively; (ii) {\em Querying Phase} that quantifies the uncertainty of the main model to guide sample selection. This is achieved by calculating the gradient norm of the entropy values for samples in the unlabeled dataset. Samples with the highest gradient norms are prioritized for labeling and subsequently added to the labeled set, improving the model's performance with minimal labeling effort. Extensive evaluations on real-world datasets demonstrate that the LPLgrad approach outperforms state-of-the-art methods by order of magnitude in terms of accuracy on a small number of labeled images, yet achieving comparable training and querying times in multiple image classification tasks.
- [23] arXiv:2411.15220 [pdf, html, other]
-
Title: Sampling with Adaptive Variance for Multimodal DistributionsComments: 26 pages, 6 figuresSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)
We propose and analyze a class of adaptive sampling algorithms for multimodal distributions on a bounded domain, which share a structural resemblance to the classic overdamped Langevin dynamics. We first demonstrate that this class of linear dynamics with adaptive diffusion coefficients and vector fields can be interpreted and analyzed as weighted Wasserstein gradient flows of the Kullback--Leibler (KL) divergence between the current distribution and the target Gibbs distribution, which directly leads to the exponential convergence of both the KL and $\chi^2$ divergences, with rates depending on the weighted Wasserstein metric and the Gibbs potential. We then show that a derivative-free version of the dynamics can be used for sampling without gradient information of the Gibbs potential and that for Gibbs distributions with nonconvex potentials, this approach could achieve significantly faster convergence than the classical overdamped Langevin dynamics. A comparison of the mean transition times between local minima of a nonconvex potential further highlights the better efficiency of the derivative-free dynamics in sampling.
- [24] arXiv:2411.15221 [pdf, html, other]
-
Title: Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and ChemistryYoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary, Zizhang Chen, Min-Hsueh Chiu, Judith Clymo, Kedar Dabhadkar, Nathan Daelman, Archit Datar, Matthew L. Evans, Maryam Ghazizade Fard, Giuseppe Fisicaro, Abhijeet Sadashiv Gangan, Janine George, Jose D. Cojal Gonzalez, Michael Götte, Ankur K. Gupta, Hassan Harb, Pengyu Hong, Abdelrahman Ibrahim, Ahmed Ilyas, Alishba Imran, Kevin Ishimwe, Ramsey Issa, Kevin Maik Jablonka, Colin Jones, Tyler R. Josephson, Greg Juhasz, Sarthak Kapoor, Rongda Kang, Ghazal Khalighinejad, Sartaaj Khan, Sascha Klawohn, Suneel Kuman, Alvin Noe Ladines, Sarom Leang, Magdalena Lederbauer, Sheng-Lun Mark Liao, Hao Liu, Xuefeng Liu, Stanley Lo, Sandeep Madireddy, Piyush Ranjan Maharana, Shagun Maheshwari, Soroush Mahjoubi, José A. Márquez, Rob Mills, Trupti Mohanty, Bernadette Mohr, Seyed Mohamad Moosavi, Alexander Moßhammer, Amirhossein D. Naghdi, Aakash Naik, Oleksandr Narykov, Hampus Näsström, Xuan Vu Nguyen, Xinyi Ni, Dana O'Connor, Teslim Olayiwola, Federico Ottomano, Aleyna Beste Ozhan, Sebastian Pagel, Chiku Parida, Jaehee Park, Vraj Patel, Elena Patyukova, Martin Hoffmann Petersen, Luis Pinto, José M. Pizarro, Dieter Plessers, Tapashree Pradhan, Utkarsh Pratiush, Charishma Puli, Andrew Qin, Mahyar Rajabi, Francesco Ricci, Elliot Risch, Martiño Ríos-GarcíaComments: 98 pagesSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Chemical Physics (physics.chem-ph)
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping custom applications in scientific research.
- [25] arXiv:2411.15222 [pdf, html, other]
-
Title: Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial DistillationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Language-conditioned robotic learning has significantly enhanced robot adaptability by enabling a single model to execute diverse tasks in response to verbal commands. Despite these advancements, security vulnerabilities within this domain remain largely unexplored. This paper addresses this gap by proposing a novel adversarial prompt attack tailored to language-conditioned robotic models. Our approach involves crafting a universal adversarial prefix that induces the model to perform unintended actions when added to any original prompt. We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain due to the inherent robustness of discretized robotic action spaces. To overcome this challenge, we propose to optimize adversarial prefixes based on continuous action representations, circumventing the discretization process. Additionally, we identify the beneficial impact of intermediate features on adversarial attacks and leverage the negative gradient of intermediate self-attention features to further enhance attack efficacy. Extensive experiments on VIMA models across 13 robot manipulation tasks validate the superiority of our method over existing approaches and demonstrate its transferability across different model variants.
- [26] arXiv:2411.15223 [pdf, html, other]
-
Title: An accuracy improving method for advertising click through rate prediction based on enhanced xDeepFM modelComments: 12 pages, 7 figures, 3 tablesSubjects: Machine Learning (cs.LG)
Advertising click-through rate (CTR) prediction aims to forecast the probability that a user will click on an advertisement in a given context, thus providing enterprises with decision support for product ranking and ad placement. However, CTR prediction faces challenges such as data sparsity and class imbalance, which adversely affect model training effectiveness. Moreover, most current CTR prediction models fail to fully explore the associations among user history, interests, and target advertisements from multiple perspectives, neglecting important information at different levels. To address these issues, this paper proposes an improved CTR prediction model based on the xDeepFM architecture. By integrating a multi-head attention mechanism, the model can simultaneously focus on different aspects of feature interactions, enhancing its ability to learn intricate patterns without significantly increasing computational complexity. Furthermore, replacing the linear model with a Factorization Machine (FM) model improves the handling of high-dimensional sparse data by flexibly capturing both first-order and second-order feature interactions. Experimental results on the Criteo dataset demonstrate that the proposed model outperforms other state-of-the-art methods, showing significant improvements in both AUC and Logloss metrics. This enhancement facilitates better mining of implicit relationships between features and improves the accuracy of advertising CTR prediction.
- [27] arXiv:2411.15224 [pdf, html, other]
-
Title: Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear TransformationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Despite the growing interest in Mamba architecture as a potential replacement for Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we introduce two key insights-driven strategies for PEFT in Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of Mamba architecture, then expected to play a primary role in transfer learning, our findings reveal that Projectors -- not SSMs -- are the predominant contributors to transfer learning, and (2) Based on our observation that adapting pretrained Projectors to new tasks can be effectively approximated through a near-diagonal linear transformation, we propose a novel PEFT method specialized to Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL focuses on optimizing only diagonal-centric linear transformation matrices, without directly fine-tuning the pretrained Projector weights. This targeted approach allows efficient task adaptation, utilizing less than 1% of the total parameters, and exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
- [28] arXiv:2411.15231 [pdf, html, other]
-
Title: IterIS: Iterative Inference-Solving Alignment for LoRA MergingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Low-rank adaptations (LoRA) are widely used to fine-tune large models across various domains for specific downstream tasks. While task-specific LoRAs are often available, concerns about data privacy and intellectual property can restrict access to training data, limiting the acquisition of a multi-task model through gradient-based training. In response, LoRA merging presents an effective solution by combining multiple LoRAs into a unified adapter while maintaining data privacy. Prior works on LoRA merging primarily frame it as an optimization problem, yet these approaches face several limitations, including the rough assumption about input features utilized in optimization, massive sample requirements, and the unbalanced optimization objective. These limitations can significantly degrade performance. To address these, we propose a novel optimization-based method, named IterIS: 1) We formulate LoRA merging as an advanced optimization problem to mitigate the rough assumption. Additionally, we employ an iterative inference-solving framework in our algorithm. It can progressively refine the optimization objective for improved performance. 2) We introduce an efficient regularization term to reduce the need for massive sample requirements (requiring only 1-5% of the unlabeled samples compared to prior methods). 3) We utilize adaptive weights in the optimization objective to mitigate potential unbalances in LoRA merging process. Our method demonstrates significant improvements over multiple baselines and state-of-the-art methods in composing tasks for text-to-image diffusion, vision-language models, and large language models. Furthermore, our layer-wise algorithm can achieve convergence with minimal steps, ensuring efficiency in both memory and computation.
- [29] arXiv:2411.15235 [pdf, html, other]
-
Title: CODE-CL: COnceptor-Based Gradient Projection for DEep Continual LearningComments: 10 pages, 2 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Continual learning, or the ability to progressively integrate new concepts, is fundamental to intelligent beings, enabling adaptability in dynamic environments. In contrast, artificial deep neural networks face the challenge of catastrophic forgetting when learning new tasks sequentially. To alleviate the problem of forgetting, recent approaches aim to preserve essential weight subspaces for previous tasks by limiting updates to orthogonal subspaces via gradient projection. While effective, this approach can lead to suboptimal performance, particularly when tasks are highly correlated. In this work, we introduce COnceptor-based gradient projection for DEep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a computational model inspired by neuroscience, to more flexibly handle highly correlated tasks. CODE-CL encodes directional importance within the input space of past tasks, allowing new knowledge integration in directions modulated by $1-S$, where $S$ represents the direction's relevance for prior tasks. Additionally, we analyze task overlap using conceptor-based representations to identify highly correlated tasks, facilitating efficient forward knowledge transfer through scaled projection within their intersecting subspace. This strategy enhances flexibility, allowing learning in correlated tasks without significantly disrupting previous knowledge. Extensive experiments on continual learning image classification benchmarks validate CODE-CL's efficacy, showcasing superior performance with minimal forgetting, outperforming most state-of-the-art methods.
- [30] arXiv:2411.15240 [pdf, other]
-
Title: Is Attention All You Need For Actigraphy? Foundation Models of Wearable Accelerometer Data for Mental Health ResearchComments: Supplementary material can be found at the end of the documentSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Quantitative Methods (q-bio.QM)
Wearable accelerometry (actigraphy) has provided valuable data for clinical insights since the 1970s and is increasingly important as wearable devices continue to become widespread. The effectiveness of actigraphy in research and clinical contexts is heavily dependent on the modeling architecture utilized. To address this, we developed the Pretrained Actigraphy Transformer (PAT)--the first pretrained and fully attention-based model designed specifically to handle actigraphy. PAT was pretrained on actigraphy from 29,307 participants in NHANES, enabling it to deliver state-of-the-art performance when fine-tuned across various actigraphy prediction tasks in the mental health domain, even in data-limited scenarios. For example, when trained to predict benzodiazepine usage using actigraphy from only 500 labeled participants, PAT achieved an 8.8 percentage-point AUC improvement over the best baseline. With fewer than 2 million parameters and built-in model explainability, PAT is robust yet easy to deploy in health research settings.
GitHub: this https URL - [31] arXiv:2411.15242 [pdf, html, other]
-
Title: The Zamba2 Suite: Technical ReportPaolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren MillidgeComments: 21/11/24 initial uploadSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state of the art performance against the leading open-weights models of their class, while achieving substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, optimizing its architecture, training and annealing datasets, and training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series as well as instruction-tuned variants that are strongly competitive against comparable instruct-tuned models of their class. We additionally open-source the pretraining dataset, which we call Zyda-2, used to train the Zamba2 series of models. The models and datasets used in this work are openly available at this https URL
- [32] arXiv:2411.15247 [pdf, html, other]
-
Title: Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate RewardSubjects: Machine Learning (cs.LG)
Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards, including non-differentiable ones, is feasible with reinforcement learning (RL) techniques, enabling flexible model alignment. However, applying existing RL methods to timestep-distilled DMs is challenging for ultra-fast ($\le2$-step) image generation. Our analysis suggests several limitations of policy-based RL methods such as PPO or DPO toward this goal. Based on the insights, we propose fine-tuning DMs with learned differentiable surrogate rewards. Our method, named LaSRO, learns surrogate reward models in the latent space of SDXL to convert arbitrary rewards into differentiable ones for efficient reward gradient guidance. LaSRO leverages pre-trained latent DMs for reward modeling and specifically targets image generation $\le2$ steps for reward optimization, enhancing generalizability and efficiency. LaSRO is effective and stable for improving ultra-fast image generation with different reward objectives, outperforming popular RL methods including PPO and DPO. We further show LaSRO's connection to value-based RL, providing theoretical insights. See our webpage at this https URL.
- [33] arXiv:2411.15250 [pdf, html, other]
-
Title: TPLogAD: Unsupervised Log Anomaly Detection Based on Event Templates and Key ParametersSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Log-system is an important mechanism for recording the runtime status and events of Web service systems, and anomaly detection in logs is an effective method of detecting problems. However, manual anomaly detection in logs is inefficient, error-prone, and unrealistic. Existing log anomaly detection methods either use the indexes of event templates, or form vectors by embedding the fixed string part of the template as a sentence, or use time parameters for sequence analysis. However, log entries often contain features and semantic information that cannot be fully represented by these methods, resulting in missed and false alarms. In this paper, we propose TPLogAD, a universal unsupervised method for analyzing unstructured logs, which performs anomaly detection based on event templates and key parameters. The itemplate2vec and para2vec included in TPLogAD are two efficient and easy-to-implement semantic representation methods for logs, detecting anomalies in event templates and parameters respectively, which has not been achieved in previous work. Additionally, TPLogAD can avoid the interference of log diversity and dynamics on anomaly detection. Our experiments on four public log datasets show that TPLogAD outperforms existing log anomaly detection methods.
- [34] arXiv:2411.15254 [pdf, html, other]
-
Title: A Unified Energy Management Framework for Multi-Timescale Forecasting in Smart GridsComments: Submitted to PES GM 2025Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Accurate forecasting of the electrical load, such as the magnitude and the timing of peak power, is crucial to successful power system management and implementation of smart grid strategies like demand response and peak shaving. In multi-time-scale optimization scheduling, rolling optimization is a common solution. However, rolling optimization needs to consider the coupling of different optimization objectives across time scales. It is challenging to accurately capture the mid- and long-term dependencies in time series data. This paper proposes Multi-pofo, a multi-scale power load forecasting framework, that captures such dependency via a novel architecture equipped with a temporal positional encoding layer. To validate the effectiveness of the proposed model, we conduct experiments on real-world electricity load data. The experimental results show that our approach outperforms compared to several strong baseline methods.
- [35] arXiv:2411.15257 [pdf, html, other]
-
Title: The Explabox: Model-Agnostic Machine Learning Transparency & AnalysisComments: 5 pages, 3 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
We present the Explabox: an open-source toolkit for transparent and responsible machine learning (ML) model development and usage. Explabox aids in achieving explainable, fair and robust models by employing a four-step strategy: explore, examine, explain and expose. These steps offer model-agnostic analyses that transform complex 'ingestibles' (models and data) into interpretable 'digestibles'. The toolkit encompasses digestibles for descriptive statistics, performance metrics, model behavior explanations (local and global), and robustness, security, and fairness assessments. Implemented in Python, Explabox supports multiple interaction modes and builds on open-source packages. It empowers model developers and testers to operationalize explainability, fairness, auditability, and security. The initial release focuses on text data and models, with plans for expansion. Explabox's code and documentation are available open-source at this https URL.
- [36] arXiv:2411.15272 [pdf, html, other]
-
Title: Curriculum-enhanced GroupDRO: Challenging the Norm of Avoiding Curriculum Learning in Subpopulation Shift SetupsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In subpopulation shift scenarios, a Curriculum Learning (CL) approach would only serve to imprint the model weights, early on, with the easily learnable spurious correlations featured. To the best of our knowledge, none of the current state-of-the-art subpopulation shift approaches employ any kind of curriculum. To overcome this, we design a CL approach aimed at initializing the model weights in an unbiased vantage point in the hypothesis space which sabotages easy convergence towards biased hypotheses during the final optimization based on the entirety of the available data. We hereby propose a Curriculum-enhanced Group Distributionally Robust Optimization (CeGDRO) approach, which prioritizes the hardest bias-confirming samples and the easiest bias-conflicting samples, leveraging GroupDRO to balance the initial discrepancy in terms of difficulty. We benchmark our proposed method against the most popular subpopulation shift datasets, showing an increase over the state-of-the-art results across all scenarios, up to 6.2% on Waterbirds.
- [37] arXiv:2411.15279 [pdf, html, other]
-
Title: Don't Mesh with Me: Generating Constructive Solid Geometry Instead of Meshes by Fine-Tuning a Code-Generation LLMSubjects: Machine Learning (cs.LG); Graphics (cs.GR)
While recent advancements in machine learning, such as LLMs, are revolutionizing software development and creative industries, they have had minimal impact on engineers designing mechanical parts, which remains largely a manual process. Existing approaches to generate 3D geometry most commonly use meshes as a 3D representation. While meshes are suitable for assets in video games or animations, they lack sufficient precision and adaptability for mechanical engineering purposes. This paper introduces a novel approach for the generation of 3D geometry that generates surface-based Constructive Solid Geometry (CSG) by leveraging a code-generation LLM. First, we create a dataset of 3D mechanical parts represented as code scripts by converting Boundary Representation geometry (BREP) into CSG-based Python scripts. Second, we create annotations in natural language using GPT-4. The resulting dataset is used to fine-tune a code-generation LLM. The fine-tuned LLM can complete geometries based on positional input and natural language in a plausible way, demonstrating geometric understanding.
- [38] arXiv:2411.15281 [pdf, html, other]
-
Title: ElastiFormer: Learned Redundancy Reduction in Transformer via Self-DistillationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We introduce ElastiFormer, a post-training technique that adapts pretrained Transformer models into an elastic counterpart with variable inference time compute. ElastiFormer introduces small routing modules (as low as .00006% additional trainable parameters) to dynamically selects subsets of network parameters and input tokens to be processed by each layer of the pretrained network in an inputdependent manner. The routing modules are trained using self-distillation losses to minimize the differences between the output of the pretrained-model and their elastic counterparts. As ElastiFormer makes no assumption regarding the modality of the pretrained Transformer model, it can be readily applied to all modalities covering causal language modeling, image modeling as well as visual-language modeling tasks. We show that 20% to 50% compute saving could be achieved for different components of the transformer layer, which could be further reduced by adding very low rank LoRA weights (rank 1) trained via the same distillation objective. Finally, by comparing routing trained on different subsets of ImageNet, we show that ElastiFormer is robust against the training domain.
- [39] arXiv:2411.15285 [pdf, html, other]
-
Title: Forecasting Unseen Points of Interest Visits Using Context and Proximity PriorsComments: 2024 IEEE International Conference on Big Data workshop BSD 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Understanding human mobility behavior is crucial for numerous applications, including crowd management, location-based recommendations, and the estimation of pandemic spread. Machine learning models can predict the Points of Interest (POIs) that individuals are likely to visit in the future by analyzing their historical visit patterns. Previous studies address this problem by learning a POI classifier, where each class corresponds to a POI. However, this limits their applicability to predict a new POI that was not in the training data, such as the opening of new restaurants. To address this challenge, we propose a model designed to predict a new POI outside the training data as long as its context is aligned with the user's interests. Unlike existing approaches that directly predict specific POIs, our model first forecasts the semantic context of potential future POIs, then combines this with a proximity-based prior probability distribution to determine the exact POI. Experimental results on real-world visit data demonstrate that our model outperforms baseline methods that do not account for semantic contexts, achieving a 17% improvement in accuracy. Notably, as new POIs are introduced over time, our model remains robust, exhibiting a lower decline rate in prediction accuracy compared to existing methods.
- [40] arXiv:2411.15290 [pdf, html, other]
-
Title: GreenMachine: Automatic Design of Zero-Cost Proxies for Energy-Efficient NASComments: Submitted to CVPR 2025Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Artificial Intelligence (AI) has driven innovations and created new opportunities across various sectors. However, leveraging domain-specific knowledge often requires automated tools to design and configure models effectively. In the case of Deep Neural Networks (DNNs), researchers and practitioners usually resort to Neural Architecture Search (NAS) approaches, which are resource- and time-intensive, requiring the training and evaluation of numerous candidate architectures. This raises sustainability concerns, particularly due to the high energy demands involved, creating a paradox: the pursuit of the most effective model can undermine sustainability goals. To mitigate this issue, zero-cost proxies have emerged as a promising alternative. These proxies estimate a model's performance without the need for full training, offering a more efficient approach. This paper addresses the challenges of model evaluation by automatically designing zero-cost proxies to assess DNNs efficiently. Our method begins with a randomly generated set of zero-cost proxies, which are evolved and tested using the NATS-Bench benchmark. We assess the proxies' effectiveness using both randomly sampled and stratified subsets of the search space, ensuring they can differentiate between low- and high-performing networks and enhance generalizability. Results show our method outperforms existing approaches on the stratified sampling strategy, achieving strong correlations with ground truth performance, including a Kendall correlation of 0.89 on CIFAR-10 and 0.77 on CIFAR-100 with NATS-Bench-SSS and a Kendall correlation of 0.78 on CIFAR-10 and 0.71 on CIFAR-100 with NATS-Bench-TSS.
- [41] arXiv:2411.15292 [pdf, html, other]
-
Title: Influence functions and regularity tangents for efficient active learningComments: 33 pages, 4 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
In this paper we describe an efficient method for providing a regression model with a sense of curiosity about its data. In the field of machine learning, our framework for representing curiosity is called Active Learning, which means automatically choosing data points for which to query labels in the semisupervised setting. The methods we propose are based on computing a "regularity tangent" vector that can be calculated (with only a constant slow-down) together with the model's parameter vector during training. We then take the inner product of this tangent vector with the gradient vector of the model's loss at a given data point to obtain a measure of the influence of that point on the complexity of the model. There is only a single regularity tangent vector, of the same dimension as the parameter vector. Thus, in the proposed technique, once training is complete, evaluating our "curiosity" about a potential query data point can be done as quickly as calculating the model's loss gradient at that point. The new vector only doubles the amount of storage required by the model. We show that the quantity computed by our technique is an example of an "influence function", and that it measures the expected squared change in model complexity incurred by up-weighting a given data point. We propose a number of ways for using this quantity to choose new training data for a model in the framework of active learning.
- [42] arXiv:2411.15328 [pdf, html, other]
-
Title: Dependence Induced RepresentationsJournal-ref: 2024 60th Annual Allerton Conference on Communication, Control, and ComputingSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the problem of learning feature representations from a pair of random variables, where we focus on the representations that are induced by their dependence. We provide sufficient and necessary conditions for such dependence induced representations, and illustrate their connections to Hirschfeld--Gebelein--Rényi (HGR) maximal correlation functions and minimal sufficient statistics. We characterize a large family of loss functions that can learn dependence induced representations, including cross entropy, hinge loss, and their regularized variants. In particular, we show that the features learned from this family can be expressed as the composition of a loss-dependent function and the maximal correlation function, which reveals a key connection between representations learned from different losses. Our development also gives a statistical interpretation of the neural collapse phenomenon observed in deep classifiers. Finally, we present the learning design based on the feature separation, which allows hyperparameter tuning during inference.
- [43] arXiv:2411.15331 [pdf, html, other]
-
Title: GeoScatt-GNN: A Geometric Scattering Transform-Based Graph Neural Network Model for Ames Mutagenicity PredictionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)
This paper tackles the pressing challenge of mutagenicity prediction by introducing three ground-breaking approaches. First, it showcases the superior performance of 2D scattering coefficients extracted from molecular images, compared to traditional molecular descriptors. Second, it presents a hybrid approach that combines geometric graph scattering (GGS), Graph Isomorphism Networks (GIN), and machine learning models, achieving strong results in mutagenicity prediction. Third, it introduces a novel graph neural network architecture, MOLG3-SAGE, which integrates GGS node features into a fully connected graph structure, delivering outstanding predictive accuracy. Experimental results on the ZINC dataset demonstrate significant improvements, emphasizing the effectiveness of blending 2D and geometric scattering techniques with graph neural networks. This study illustrates the potential of GNNs and GGS for mutagenicity prediction, with broad implications for drug discovery and chemical safety assessment.
- [44] arXiv:2411.15370 [pdf, html, other]
-
Title: Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay BuffersGautham Vasan, Mohamed Elsayed, Alireza Azimi, Jiamin He, Fahim Shariar, Colin Bellinger, Martha White, A. Rupam MahmoodComments: In The Thirty-eighth Annual Conference on Neural Information Processing Systems. Source code at this https URL and companion video at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Modern deep policy gradient methods achieve effective performance on simulated robotic tasks, but they all require large replay buffers or expensive batch updates, or both, making them incompatible for real systems with resource-limited computers. We show that these methods fail catastrophically when limited to small replay buffers or during incremental learning, where updates only use the most recent sample without batch updates or a replay buffer. We propose a novel incremental deep policy gradient method -- Action Value Gradient (AVG) and a set of normalization and scaling techniques to address the challenges of instability in incremental learning. On robotic simulation benchmarks, we show that AVG is the only incremental method that learns effectively, often achieving final performance comparable to batch policy gradient methods. This advancement enabled us to show for the first time effective deep reinforcement learning with real robots using only incremental updates, employing a robotic manipulator and a mobile robot.
- [45] arXiv:2411.15375 [pdf, html, other]
-
Title: AdamZ: An Enhanced Optimisation Method for Neural Network TrainingIlia Zaznov (Department of Computer Science, University of Reading, Reading, UK), Atta Badii (Department of Computer Science, University of Reading, Reading, UK), Alfonso Dufour (ICMA Centre, Henley Business School, University of Reading, Reading, UK), Julian Kunkel (Department of Computer Science/GWDG, University of Göttingen, Goettingen, Germany)Comments: 13 pages, 9 figures, 3 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)
AdamZ is an advanced variant of the Adam optimiser, developed to enhance convergence efficiency in neural network training. This optimiser dynamically adjusts the learning rate by incorporating mechanisms to address overshooting and stagnation, that are common challenges in optimisation. Specifically, AdamZ reduces the learning rate when overshooting is detected and increases it during periods of stagnation, utilising hyperparameters such as overshoot and stagnation factors, thresholds, and patience levels to guide these adjustments. While AdamZ may lead to slightly longer training times compared to some other optimisers, it consistently excels in minimising the loss function, making it particularly advantageous for applications where precision is critical. Benchmarking results demonstrate the effectiveness of AdamZ in maintaining optimal learning rates, leading to improved model performance across diverse tasks.
- [46] arXiv:2411.15380 [pdf, html, other]
-
Title: Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data ProcessingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep learning models often require specially designed architectures to process data of different dimensions, such as 1D time series, 2D images, and 3D volumetric data. Existing bidirectional models mainly focus on sequential data, making it difficult to scale effectively to higher dimensions. To address this issue, we propose a novel multi-dimensional bidirectional neural network architecture, named Nd-BiMamba2, which efficiently handles 1D, 2D, and 3D data. Nd-BiMamba2 is based on the Mamba2 module and introduces innovative bidirectional processing mechanisms and adaptive padding strategies to capture bidirectional information in multi-dimensional data while maintaining computational efficiency. Unlike existing methods that require designing specific architectures for different dimensional data, Nd-BiMamba2 adopts a unified architecture with a modular design, simplifying development and maintenance costs. To verify the portability and flexibility of Nd-BiMamba2, we successfully exported it to ONNX and TorchScript and tested it on different hardware platforms (e.g., CPU, GPU, and mobile devices). Experimental results show that Nd-BiMamba2 runs efficiently on multiple platforms, demonstrating its potential in practical applications. The code is open-source: this https URL
- [47] arXiv:2411.15385 [pdf, html, other]
-
Title: Gradient dynamics for low-rank fine-tuning beyond kernelsSubjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
LoRA has emerged as one of the de facto methods for fine-tuning foundation models with low computational cost and memory footprint. The idea is to only train a low-rank perturbation to the weights of a pre-trained model, given supervised data for a downstream task. Despite its empirical sucess, from a mathematical perspective it remains poorly understood what learning mechanisms ensure that gradient descent converges to useful low-rank perturbations.
In this work we study low-rank fine-tuning in a student-teacher setting. We are given the weights of a two-layer base model $f$, as well as i.i.d. samples $(x,f^*(x))$ where $x$ is Gaussian and $f^*$ is the teacher model given by perturbing the weights of $f$ by a rank-1 matrix. This generalizes the setting of generalized linear model (GLM) regression where the weights of $f$ are zero.
When the rank-1 perturbation is comparable in norm to the weight matrix of $f$, the training dynamics are nonlinear. Nevertheless, in this regime we prove under mild assumptions that a student model which is initialized at the base model and trained with online gradient descent will converge to the teacher in $dk^{O(1)}$ iterations, where $k$ is the number of neurons in $f$. Importantly, unlike in the GLM setting, the complexity does not depend on fine-grained properties of the activation's Hermite expansion. We also prove that in our setting, learning the teacher model "from scratch'' can require significantly more iterations. - [48] arXiv:2411.15403 [pdf, html, other]
-
Title: Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated LearningSubjects: Machine Learning (cs.LG)
Substantial efforts have been devoted to alleviating the impact of the long-tailed class distribution in federated learning. In this work, we observe an interesting phenomenon that weak classes consistently exist even for class-balanced learning. These weak classes, different from the minority classes in the previous works, are inherent to data and remain fairly consistent for various network structures and learning paradigms. The inherent inter-class accuracy discrepancy can reach over 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets, even when the class distribution is balanced both globally and locally. In this study, we empirically analyze the potential reason for this phenomenon. Furthermore, a class-specific partial knowledge distillation method is proposed to improve the model's classification accuracy for weak classes. In this approach, knowledge transfer is initiated upon the occurrence of specific misclassifications within certain weak classes. Experimental results show that the accuracy of weak classes can be improved by 10.7%, reducing the inherent interclass discrepancy effectively.
- [49] arXiv:2411.15404 [pdf, html, other]
-
Title: A Comparative Analysis of Transformer and LSTM Models for Detecting Suicidal Ideation on RedditComments: 23rd IEEE International Conference on Machine Learning and Applications, ICMLA 2024 (camera-ready)Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Suicide is a critical global health problem involving more than 700,000 deaths yearly, particularly among young adults. Many people express their suicidal thoughts on social media platforms such as Reddit. This paper evaluates the effectiveness of the deep learning transformer-based models BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA and various Long Short-Term Memory (LSTM) based models in detecting suicidal ideation from user posts on Reddit. Toward this objective, we curated an extensive dataset from diverse subreddits and conducted linguistic, topic modeling, and statistical analyses to ensure data quality. Our results indicate that each model could reach high accuracy and F1 scores, but among them, RoBERTa emerged as the most effective model with an accuracy of 93.22% and F1 score of 93.14%. An LSTM model that uses attention and BERT embeddings performed as the second best, with an accuracy of 92.65% and an F1 score of 92.69%. Our findings show that transformer-based models have the potential to improve suicide ideation detection, thereby providing a path to develop robust mental health monitoring tools from social media. This research, therefore, underlines the undeniable prospect of advanced techniques in Natural Language Processing (NLP) while improving suicide prevention efforts.
- [50] arXiv:2411.15422 [pdf, html, other]
-
Title: Learning a local trading strategy: deep reinforcement learning for grid-scale renewable energy integrationComments: Accepted to HICSS58Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)
Variable renewable generation increases the challenge of balancing power supply and demand. Grid-scale batteries co-located with generation can help mitigate this misalignment. This paper explores the use of reinforcement learning (RL) for operating grid-scale batteries co-located with solar power. Our results show RL achieves an average of 61% (and up to 96%) of the approximate theoretical optimal (non-causal) operation, outperforming advanced control methods on average. Our findings suggest RL may be preferred when future signals are hard to predict. Moreover, RL has two significant advantages compared to simpler rules-based control: (1) that solar energy is more effectively shifted towards high demand periods, and (2) increased diversity of battery dispatch across different locations, reducing potential ramping issues caused by super-position of many similar actions.
- [51] arXiv:2411.15458 [pdf, html, other]
-
Title: TANGNN: a Concise, Scalable and Effective Graph Neural Networks with Top-m Attention Mechanism for Graph Representation LearningComments: The code and ArXivNet dataset are available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In the field of deep learning, Graph Neural Networks (GNNs) and Graph Transformer models, with their outstanding performance and flexible architectural designs, have become leading technologies for processing structured data, especially graph data. Traditional GNNs often face challenges in capturing information from distant vertices effectively. In contrast, Graph Transformer models are particularly adept at managing long-distance node relationships. Despite these advantages, Graph Transformer models still encounter issues with computational and storage efficiency when scaled to large graph datasets. To address these challenges, we propose an innovative Graph Neural Network (GNN) architecture that integrates a Top-m attention mechanism aggregation component and a neighborhood aggregation component, effectively enhancing the model's ability to aggregate relevant information from both local and extended neighborhoods at each layer. This method not only improves computational efficiency but also enriches the node features, facilitating a deeper analysis of complex graph structures. Additionally, to assess the effectiveness of our proposed model, we have applied it to citation sentiment prediction, a novel task previously unexplored in the GNN field. Accordingly, we constructed a dedicated citation network, ArXivNet. In this dataset, we specifically annotated the sentiment polarity of the citations (positive, neutral, negative) to enable in-depth sentiment analysis. Our approach has shown superior performance across a variety of tasks including vertex classification, link prediction, sentiment prediction, graph regression, and visualization. It outperforms existing methods in terms of effectiveness, as demonstrated by experimental results on multiple datasets.
- [52] arXiv:2411.15527 [pdf, html, other]
-
Title: Haar-Laplacian for directed graphsComments: 30 pages, 9 figures, 4 tablesSubjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
This paper introduces a novel Laplacian matrix aiming to enable the construction of spectral convolutional networks and to extend the signal processing applications for directed graphs. Our proposal is inspired by a Haar-like transformation and produces a Hermitian matrix which is not only in one-to-one relation with the adjacency matrix, preserving both direction and weight information, but also enjoys desirable additional properties like scaling robustness, sensitivity, continuity, and directionality. We take a theoretical standpoint and support the conformity of our approach with the spectral graph theory. Then, we address two use-cases: graph learning (by introducing HaarNet, a spectral graph convolutional network built with our Haar-Laplacian) and graph signal processing. We show that our approach gives better results in applications like weight prediction and denoising on directed graphs.
- [53] arXiv:2411.15558 [pdf, html, other]
-
Title: Reassessing Layer Pruning in LLMs: New Insights and MethodsYao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei ZhuSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Although large language models (LLMs) have achieved remarkable success across various domains, their considerable scale necessitates substantial computational resources, posing significant challenges for deployment in resource-constrained environments. Layer pruning, as a simple yet effective compression method, removes layers of a model directly, reducing computational overhead. However, what are the best practices for layer pruning in LLMs? Are sophisticated layer selection metrics truly effective? Does the LoRA (Low-Rank Approximation) family, widely regarded as a leading method for pruned model fine-tuning, truly meet expectations when applied to post-pruning fine-tuning? To answer these questions, we dedicate thousands of GPU hours to benchmarking layer pruning in LLMs and gaining insights across multiple dimensions. Our results demonstrate that a simple approach, i.e., pruning the final 25\% of layers followed by fine-tuning the \texttt{lm\_head} and the remaining last three layer, yields remarkably strong performance. Following this guide, we prune Llama-3.1-8B-It and obtain a model that outperforms many popular LLMs of similar size, such as ChatGLM2-6B, Vicuna-7B-v1.5, Qwen1.5-7B and Baichuan2-7B. We release the optimal model weights on Huggingface, and the code is available on GitHub.
- [54] arXiv:2411.15590 [pdf, html, other]
-
Title: From Complexity to Parsimony: Integrating Latent Class Analysis to Uncover Multimodal Learning Patterns in Collaborative LearningLixiang Yan, Dragan Gašević, Linxuan Zhao, Vanessa Echeverria, Yueqiao Jin, Roberto Martinez-MaldonadoSubjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Methodology (stat.ME)
Multimodal Learning Analytics (MMLA) leverages advanced sensing technologies and artificial intelligence to capture complex learning processes, but integrating diverse data sources into cohesive insights remains challenging. This study introduces a novel methodology for integrating latent class analysis (LCA) within MMLA to map monomodal behavioural indicators into parsimonious multimodal ones. Using a high-fidelity healthcare simulation context, we collected positional, audio, and physiological data, deriving 17 monomodal indicators. LCA identified four distinct latent classes: Collaborative Communication, Embodied Collaboration, Distant Interaction, and Solitary Engagement, each capturing unique monomodal patterns. Epistemic network analysis compared these multimodal indicators with the original monomodal indicators and found that the multimodal approach was more parsimonious while offering higher explanatory power regarding students' task and collaboration performances. The findings highlight the potential of LCA in simplifying the analysis of complex multimodal data while capturing nuanced, cross-modality behaviours, offering actionable insights for educators and enhancing the design of collaborative learning interventions. This study proposes a pathway for advancing MMLA, making it more parsimonious and manageable, and aligning with the principles of learner-centred education.
- [55] arXiv:2411.15616 [pdf, html, other]
-
Title: A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data SegmentationComments: Accepted in CODS-COMAD 2024Subjects: Machine Learning (cs.LG)
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift, a phenomenon where discrepancies between historical training data and future test data lead to significant performance degradation and operational inefficiencies. Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data, and focus primarily on either covariate drift or concept drift. These methods face issues such as high resource demands, inability to manage all types of drifts effectively, and neglecting the valuable context that historical data can provide. We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness. This paper introduces an advanced framework that integrates the strengths of data-centric approaches with adaptive management of both covariate and concept drift in a scalable and efficient manner. Our framework employs sophisticated data segmentation techniques to identify optimal data batches that accurately reflect test data patterns. These data batches are then utilized for training on test data, ensuring that the models remain relevant and accurate over time. By leveraging the advantages of both data segmentation and scalable drift management, our solution ensures robust model accuracy and operational efficiency in large-scale ML deployments. It also minimizes resource consumption and computational overhead by selecting and utilizing relevant data subsets, leading to significant cost savings. Experimental results on classification task on real-world and synthetic datasets show our approach improves model accuracy while reducing operational costs and latency. This practical solution overcomes inefficiencies in current methods, providing a robust, adaptable, and scalable approach.
- [56] arXiv:2411.15638 [pdf, html, other]
-
Title: Learning state and proposal dynamics in state-space models using differentiable particle filters and neural networksSubjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
State-space models are a popular statistical framework for analysing sequential data. Within this framework, particle filters are often used to perform inference on non-linear state-space models. We introduce a new method, StateMixNN, that uses a pair of neural networks to learn the proposal distribution and transition distribution of a particle filter. Both distributions are approximated using multivariate Gaussian mixtures. The component means and covariances of these mixtures are learnt as outputs of learned functions. Our method is trained targeting the log-likelihood, thereby requiring only the observation series, and combines the interpretability of state-space models with the flexibility and approximation power of artificial neural networks. The proposed method significantly improves recovery of the hidden state in comparison with the state-of-the-art, showing greater improvement in highly non-linear scenarios.
- [57] arXiv:2411.15645 [pdf, html, other]
-
Title: MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine TreeSubjects: Machine Learning (cs.LG)
Mathematical reasoning has proven to be a critical yet challenging task for large language models (LLMs), as they often struggle with complex multi-step problems. To address these limitations, we introduce the Monte Carlo Nash Equilibrium Self-Refine Tree (MC-NEST) algorithm, an enhancement of the Monte Carlo Tree Self-Refine (MCTSr) approach. By integrating Nash Equilibrium strategies with LLM-based self-refinement and self-evaluation processes, MC-NEST aims to improve decision-making for complex mathematical reasoning tasks. This method ensures balanced exploration and exploitation of potential solutions, leveraging Upper Confidence Bound (UCT) scores and various selection policies. Through iterative critique and refinement, MC-NEST enhances the reasoning capabilities of LLMs, particularly for problems requiring strategic decision-making. Comparative analysis reveals that GPT-4o, equipped with MC-NEST using an Importance Sampling Policy, achieved superior accuracy in domains such as Number Theory and Geometry. These results suggest that both LLMs GPT-4o and Phi-3-mini can benefit from MC-NEST, with iterative self-refinement proving especially effective in expanding the reasoning capacity and problem-solving performance of LLMs. We evaluate the effectiveness of MC-NEST on challenging Olympiad-level benchmarks, demonstrating its potential to significantly boost complex mathematical reasoning performance in LLMs.
- [58] arXiv:2411.15655 [pdf, html, other]
-
Title: Machine Learning-based sEMG Signal Classification for Hand Gesture RecognitionComments: IEEE BIBM 2024Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
EMG-based hand gesture recognition uses electromyographic~(EMG) signals to interpret and classify hand movements by analyzing electrical activity generated by muscle contractions. It has wide applications in prosthesis control, rehabilitation training, and human-computer interaction. Using electrodes placed on the skin, the EMG sensor captures muscle signals, which are processed and filtered to reduce noise. Numerous feature extraction and machine learning algorithms have been proposed to extract and classify muscle signals to distinguish between various hand gestures. This paper aims to benchmark the performance of EMG-based hand gesture recognition using novel feature extraction methods, namely, fused time-domain descriptors, temporal-spatial descriptors, and wavelet transform-based features, combined with the state-of-the-art machine and deep learning models. Experimental investigations on the Grabmyo dataset demonstrate that the 1D Dilated CNN performed the best with an accuracy of $97\%$ using fused time-domain descriptors such as power spectral moments, sparsity, irregularity factor and waveform length ratio. Similarly, on the FORS-EMG dataset, random forest performed the best with an accuracy of $94.95\%$ using temporal-spatial descriptors (which include time domain features along with additional features such as coefficient of variation (COV), and Teager-Kaiser energy operator (TKEO)).
- [59] arXiv:2411.15671 [pdf, html, other]
-
Title: Best of Both Worlds: Advantages of Hybrid Graph Sequence ModelsSubjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Modern sequence models (e.g., Transformers, linear RNNs, etc.) emerged as dominant backbones of recent deep learning frameworks, mainly due to their efficiency, representational power, and/or ability to capture long-range dependencies. Adopting these sequence models for graph-structured data has recently gained popularity as the alternative to Message Passing Neural Networks (MPNNs). There is, however, a lack of a common foundation about what constitutes a good graph sequence model, and a mathematical description of the benefits and deficiencies in adopting different sequence models for learning on graphs. To this end, we first present Graph Sequence Model (GSM), a unifying framework for adopting sequence models for graphs, consisting of three main steps: (1) Tokenization, which translates the graph into a set of sequences; (2) Local Encoding, which encodes local neighborhoods around each node; and (3) Global Encoding, which employs a scalable sequence model to capture long-range dependencies within the sequences. This framework allows us to understand, evaluate, and compare the power of different sequence model backbones in graph tasks. Our theoretical evaluations of the representation power of Transformers and modern recurrent models through the lens of global and local graph tasks show that there are both negative and positive sides for both types of models. Building on this observation, we present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences, and then employs a hybrid architecture of Transformer to encode these sequences. Our theoretical and experimental results support the design of GSM++, showing that GSM++ outperforms baselines in most benchmark evaluations.
- [60] arXiv:2411.15674 [pdf, html, other]
-
Title: Quantile deep learning models for multi-step ahead time series predictionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistical Finance (q-fin.ST); Methodology (stat.ME)
Uncertainty quantification is crucial in time series prediction, and quantile regression offers a valuable mechanism for uncertainty quantification which is useful for extreme value forecasting. Although deep learning models have been prominent in multi-step ahead prediction, the development and evaluation of quantile deep learning models have been limited. We present a novel quantile regression deep learning framework for multi-step time series prediction. In this way, we elevate the capabilities of deep learning models by incorporating quantile regression, thus providing a more nuanced understanding of predictive values. We provide an implementation of prominent deep learning models for multi-step ahead time series prediction and evaluate their performance under high volatility and extreme conditions. We include multivariate and univariate modelling, strategies and provide a comparison with conventional deep learning models from the literature. Our models are tested on two cryptocurrencies: Bitcoin and Ethereum, using daily close-price data and selected benchmark time series datasets. The results show that integrating a quantile loss function with deep learning provides additional predictions for selected quantiles without a loss in the prediction accuracy when compared to the literature. Our quantile model has the ability to handle volatility more effectively and provides additional information for decision-making and uncertainty quantification through the use of quantiles when compared to conventional deep learning models.
- [61] arXiv:2411.15675 [pdf, html, other]
-
Title: Can a Large Language Model Learn Matrix Functions In Context?Subjects: Machine Learning (cs.LG)
Large Language Models (LLMs) have demonstrated the ability to solve complex tasks through In-Context Learning (ICL), where models learn from a few input-output pairs without explicit fine-tuning. In this paper, we explore the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition. Our experiments show that while LLMs perform comparably to traditional models such as Stochastic Gradient Descent (SGD) based Linear Regression and Neural Networks (NN) for simpler tasks, they outperform these models on more complex tasks, particularly in the case of top-k Singular Values. Furthermore, LLMs demonstrate strong scalability, maintaining high accuracy even as the matrix size increases. Additionally, we found that LLMs can achieve high accuracy with minimal prior examples, converging quickly and avoiding the overfitting seen in classical models. These results suggest that LLMs could provide an efficient alternative to classical methods for solving high-dimensional problems. Future work will focus on extending these findings to larger matrices and more complex matrix operations while exploring the effect of using different numerical representations in ICL.
- [62] arXiv:2411.15692 [pdf, html, other]
-
Title: DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent CollaborationSubjects: Machine Learning (cs.LG)
Recent advancements in Large Language Models (LLMs) have opened new avenues for accelerating drug discovery processes. Despite their potential, several critical challenges remain unsolved, particularly in translating theoretical ideas into practical applications within the highly specialized field of pharmaceutical research, limiting practitioners from leveraging the latest AI development in drug discovery. To this end, we introduce DrugAgent, a multi-agent framework aimed at automating machine learning (ML) programming in drug discovery. DrugAgent incorporates domain expertise by identifying specific requirements and building domain-specific tools, while systematically exploring different ideas to find effective solutions. A preliminary case study demonstrates DrugAgent's potential to overcome key limitations LLMs face in drug discovery, moving toward AI-driven innovation. For example, DrugAgent is able to complete the ML programming pipeline end-to-end, from data acquisition to performance evaluation for the ADMET prediction task, and finally select the best model, where the random forest model achieves an F1 score of 0.92 when predicting absorption using the PAMPA dataset.
- [63] arXiv:2411.15716 [pdf, html, other]
-
Title: Tackling Data Heterogeneity in Federated Time Series ForecastingSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Although substantial progress has been made in time series forecasting, most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices (e.g., sensors, wearables) to a central cloud server. However, this paradigm has overloaded communication networks and raised privacy concerns. Federated learning, a popular privacy-preserving technique, enables collaborative model training across distributed data sources. However, directly applying federated learning to time series forecasting often yields suboptimal results, as time series data generated by different devices are inherently heterogeneous. In this paper, we propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers. Specifically, Fed-TREND generates two types of synthetic data. The first type of synthetic data captures the representative distribution information from clients' uploaded model updates and enhances clients' local training consensus. The second kind of synthetic data extracts long-term influence insights from global model update trajectories and is used to refine the global model after aggregation. Fed-TREND is compatible with most time series forecasting models and can be seamlessly integrated into existing federated learning frameworks to improve prediction performance. Extensive experiments on eight datasets, using several federated learning baselines and four popular time series forecasting models, demonstrate the effectiveness and generalizability of Fed-TREND.
- [64] arXiv:2411.15717 [pdf, other]
-
Title: Learning Algorithm Hyperparameters for Fast Parametric Convex OptimizationSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
We introduce a machine-learning framework to learn the hyperparameter sequence of first-order methods (e.g., the step sizes in gradient descent) to quickly solve parametric convex optimization problems. Our computational architecture amounts to running fixed-point iterations where the hyperparameters are the same across all parametric instances and consists of two phases. In the first step-varying phase the hyperparameters vary across iterations, while in the second steady-state phase the hyperparameters are constant across iterations. Our learned optimizer is flexible in that it can be evaluated on any number of iterations and is guaranteed to converge to an optimal solution. To train, we minimize the mean square error to a ground truth solution. In the case of gradient descent, the one-step optimal step size is the solution to a least squares problem, and in the case of unconstrained quadratic minimization, we can compute the two and three-step optimal solutions in closed-form. In other cases, we backpropagate through the algorithm steps to minimize the training objective after a given number of steps. We show how to learn hyperparameters for several popular algorithms: gradient descent, proximal gradient descent, and two ADMM-based solvers: OSQP and SCS. We use a sample convergence bound to obtain generalization guarantees for the performance of our learned algorithm for unseen data, providing both lower and upper bounds. We showcase the effectiveness of our method with many examples, including ones from control, signal processing, and machine learning. Remarkably, our approach is highly data-efficient in that we only use $10$ problem instances to train the hyperparameters in all of our examples.
- [65] arXiv:2411.15721 [pdf, other]
-
Title: Research on Effectiveness Evaluation and Optimization of Baseball Teaching Method Based on Machine LearningSubjects: Machine Learning (cs.LG)
In modern physical education, data-driven evaluation methods have gradually attracted attention, especially the quantitative prediction of students' sports performance through machine learning model. The purpose of this study is to use a variety of machine learning models to regress and predict students' comprehensive scores in baseball training, so as to evaluate the effectiveness of the current baseball teaching methods and put forward targeted training optimization suggestions. We set up a model and evaluate the performance of students by collecting many characteristics, such as hitting times, running times and batting. The experimental results show that K-Neighbors Regressor and Gradient Boosting Regressor are excellent in comprehensive prediction accuracy and stability, and the R score and error index are significantly better than other models. In addition, through the analysis of feature importance, it is found that cumulative hits and cumulative runs are the key factors affecting students' comprehensive scores. Based on the results of this study, this paper puts forward some suggestions on optimizing training strategies to help students get better performance in baseball training. The results show that the data-driven teaching evaluation method can effectively support physical education and promote personalized and refined teaching plan design.
- [66] arXiv:2411.15743 [pdf, html, other]
-
Title: Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Time series forecasting is critical in numerous real-world applications, requiring accurate predictions of future values based on observed patterns. While traditional forecasting techniques work well in in-domain scenarios with ample data, they struggle when data is scarce or not available at all, motivating the emergence of zero-shot and few-shot learning settings. Recent advancements often leverage large-scale foundation models for such tasks, but these methods require extensive data and compute resources, and their performance may be hindered by ineffective learning from the available training set. This raises a fundamental question: What factors influence effective learning from data in time series forecasting? Toward addressing this, we propose using Fourier analysis to investigate how models learn from synthetic and real-world time series data. Our findings reveal that forecasters commonly suffer from poor learning from data with multiple frequencies and poor generalization to unseen frequencies, which impedes their predictive performance. To alleviate these issues, we present a novel synthetic data generation framework, designed to enhance real data or replace it completely by creating task-specific frequency information, requiring only the sampling rate of the target data. Our approach, Freq-Synth, improves the robustness of both foundation as well as nonfoundation forecast models in zero-shot and few-shot settings, facilitating more reliable time series forecasting under limited data scenarios.
- [67] arXiv:2411.15764 [pdf, html, other]
-
Title: LLM Online Spatial-temporal Signal Reconstruction Under NoiseSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
This work introduces the LLM Online Spatial-temporal Reconstruction (LLM-OSR) framework, which integrates Graph Signal Processing (GSP) and Large Language Models (LLMs) for online spatial-temporal signal reconstruction. The LLM-OSR utilizes a GSP-based spatial-temporal signal handler to enhance graph signals and employs LLMs to predict missing values based on spatiotemporal patterns. The performance of LLM-OSR is evaluated on traffic and meteorological datasets under varying Gaussian noise levels. Experimental results demonstrate that utilizing GPT-4-o mini within the LLM-OSR is accurate and robust under Gaussian noise conditions. The limitations are discussed along with future research insights, emphasizing the potential of combining GSP techniques with LLMs for solving spatiotemporal prediction tasks.
- [68] arXiv:2411.15795 [pdf, html, other]
-
Title: Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimizationSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Adaptive gradient methods have been increasingly adopted by deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate the F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in the overall training time by up to 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.
- [69] arXiv:2411.15805 [pdf, html, other]
-
Title: Benchmarking Active Learning for NILMSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Non-intrusive load monitoring (NILM) focuses on disaggregating total household power consumption into appliance-specific usage. Many advanced NILM methods are based on neural networks that typically require substantial amounts of labeled appliance data, which can be challenging and costly to collect in real-world settings. We hypothesize that appliance data from all households does not uniformly contribute to NILM model improvements. Thus, we propose an active learning approach to selectively install appliance monitors in a limited number of houses. This work is the first to benchmark the use of active learning for strategically selecting appliance-level data to optimize NILM performance. We first develop uncertainty-aware neural networks for NILM and then install sensors in homes where disaggregation uncertainty is highest. Benchmarking our method on the publicly available Pecan Street Dataport dataset, we demonstrate that our approach significantly outperforms a standard random baseline and achieves performance comparable to models trained on the entire dataset. Using this approach, we achieve comparable NILM accuracy with approximately 30% of the data, and for a fixed number of sensors, we observe up to a 2x reduction in disaggregation errors compared to random sampling.
- [70] arXiv:2411.15806 [pdf, other]
-
Title: Broad Critic Deep Actor Reinforcement Learning for Continuous ControlComments: 7 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In the domain of continuous control, deep reinforcement learning (DRL) demonstrates promising results. However, the dependence of DRL on deep neural networks (DNNs) results in the demand for extensive data and increased computational complexity. To address this issue, a novel hybrid architecture for actor-critic reinforcement learning (RL) algorithms is introduced. The proposed architecture integrates the broad learning system (BLS) with DNN, aiming to merge the strengths of both distinct architectural paradigms. Specifically, the critic network is implemented using BLS, while the actor network is constructed with a DNN. For the estimations of the critic network parameters, ridge regression is employed, and the parameters of the actor network are optimized through gradient descent. The effectiveness of the proposed algorithm is evaluated by applying it to two classic continuous control tasks, and its performance is compared with the widely recognized deep deterministic policy gradient (DDPG) algorithm. Numerical results show that the proposed algorithm is superior to the DDPG algorithm in terms of computational efficiency, along with an accelerated learning trajectory. Application of the proposed algorithm in other actor-critic RL algorithms is suggested for investigation in future studies.
- [71] arXiv:2411.15831 [pdf, html, other]
-
Title: Efficient and Private: Memorisation under differentially private parameter-efficient fine-tuning in language modelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fine-tuning large language models (LLMs) for specific tasks introduces privacy risks, as models may inadvertently memorise and leak sensitive training data. While Differential Privacy (DP) offers a solution to mitigate these risks, it introduces significant computational and performance trade-offs, particularly with standard fine-tuning approaches. Previous work has primarily focused on full-parameter updates, which are computationally intensive and may not fully leverage DPs potential in large models. In this work, we address these shortcomings by investigating Parameter-Efficient Fine-Tuning (PEFT) methods under DP constraints. We show that PEFT methods achieve comparable performance to standard fine-tuning while requiring fewer parameters and significantly reducing privacy leakage. Furthermore, we incorporate a data poisoning experiment involving intentional mislabelling to assess model memorisation and directly measure privacy risks. Our findings indicate that PEFT methods not only provide a promising alternative but also serve as a complementary approach for privacy-preserving, resource-efficient fine-tuning of LLMs.
- [72] arXiv:2411.15844 [pdf, html, other]
-
Title: Unveiling the Superior Paradigm: A Comparative Study of Source-Free Domain Adaptation and Unsupervised Domain AdaptationComments: Under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
In domain adaptation, there are two popular paradigms: Unsupervised Domain Adaptation (UDA), which aligns distributions using source data, and Source-Free Domain Adaptation (SFDA), which leverages pre-trained source models without accessing source data. Evaluating the superiority of UDA versus SFDA is an open and timely question with significant implications for deploying adaptive algorithms in practical applications. In this study, we demonstrate through predictive coding theory and extensive experiments on multiple benchmark datasets that SFDA generally outperforms UDA in real-world scenarios. Specifically, SFDA offers advantages in time efficiency, storage requirements, targeted learning objectives, reduced risk of negative transfer, and increased robustness against overfitting. Notably, SFDA is particularly effective in mitigating negative transfer when there are substantial distribution discrepancies between source and target domains. Additionally, we introduce a novel data-model fusion scenario, where data sharing among stakeholders varies (e.g., some provide raw data while others provide only models), and reveal that traditional UDA and SFDA methods do not fully exploit their potential in this context. To address this limitation and capitalize on the strengths of SFDA, we propose a novel weight estimation method that effectively integrates available source data into multi-SFDA (MSFDA) approaches, thereby enhancing model performance within this scenario. This work provides a thorough analysis of UDA versus SFDA and advances a practical approach to model adaptation across diverse real-world environments.
- [73] arXiv:2411.15847 [pdf, html, other]
-
Title: FedQP: Towards Accurate Federated Learning using Quadratic Programming Guided MutationComments: SEKE 2024, 6 pagesSubjects: Machine Learning (cs.LG)
Due to the advantages of privacy-preserving, Federated Learning (FL) is widely used in distributed machine learning systems. However, existing FL methods suffer from low-inference performance caused by data heterogeneity. Specifically, due to heterogeneous data, the optimization directions of different local models vary greatly, making it difficult for the traditional FL method to get a generalized global model that performs well on all clients. As one of the state-of-the-art FL methods, the mutation-based FL method attempts to adopt a stochastic mutation strategy to guide the model training towards a well-generalized area (i.e., flat area in the loss landscape). Specifically, mutation allows the model to shift within the solution space, providing an opportunity to escape areas with poor generalization (i.e., sharp area). However, the stochastic mutation strategy easily results in diverse optimal directions of mutated models, which limits the performance of the existing mutation-based FL method. To achieve higher performance, this paper proposes a novel mutation-based FL approach named FedQP, utilizing a quadratic programming strategy to regulate the mutation directions wisely. By biasing the model mutation towards the direction of gradient update rather than traditional random mutation, FedQP can effectively guide the model to optimize towards a well-generalized area (i.e., flat area). Experiments on multiple well-known datasets show that our quadratic programming-guided mutation strategy effectively improves the inference accuracy of the global model in various heterogeneous data scenarios.
- [74] arXiv:2411.15866 [pdf, html, other]
-
Title: Ruppert-Polyak averaging for Stochastic Order OracleSubjects: Machine Learning (cs.LG)
Black-box optimization, a rapidly growing field, faces challenges due to limited knowledge of the objective function's internal mechanisms. One promising approach to address this is the Stochastic Order Oracle Concept. This concept, similar to other Order Oracle Concepts, relies solely on relative comparisons of function values without requiring access to the exact values. This paper presents a novel, improved estimation of the covariance matrix for the asymptotic convergence of the Stochastic Order Oracle Concept. Our work surpasses existing research in this domain by offering a more accurate estimation of asymptotic convergence rate. Finally, numerical experiments validate our theoretical findings, providing strong empirical support for our proposed approach.
- [75] arXiv:2411.15876 [pdf, html, other]
-
Title: An Extensive Study on D2C: Overfitting Remediation in Deep Learning Using a Decentralized ApproachComments: 17 PagesSubjects: Machine Learning (cs.LG)
Overfitting remains a significant challenge in deep learning, often arising from data outliers, noise, and limited training data. To address this, we propose Divide2Conquer (D2C), a novel technique to mitigate overfitting. D2C partitions the training data into multiple subsets and trains identical models independently on each subset. To balance model generalization and subset-specific learning, the model parameters are periodically aggregated and averaged during training. This process enables the learning of robust patterns while minimizing the influence of outliers and noise. Empirical evaluations on benchmark datasets across diverse deep-learning tasks demonstrate that D2C significantly enhances generalization performance, particularly with larger datasets. Our analysis includes evaluations of decision boundaries, loss curves, and other performance metrics, highlighting D2C's effectiveness both as a standalone technique and in combination with other overfitting reduction methods. We further provide a rigorous mathematical justification for D2C's underlying principles and examine its applicability across multiple domains. Finally, we explore the trade-offs associated with D2C and propose strategies to address them, offering a holistic view of its strengths and limitations. This study establishes D2C as a versatile and effective approach to combating overfitting in deep learning. Our codes are publicly available at: this https URL.
- [76] arXiv:2411.15878 [pdf, html, other]
-
Title: ExAL: An Exploration Enhanced Adversarial Learning AlgorithmSubjects: Machine Learning (cs.LG)
Adversarial learning is critical for enhancing model robustness, aiming to defend against adversarial attacks that jeopardize machine learning systems. Traditional methods often lack efficient mechanisms to explore diverse adversarial perturbations, leading to limited model resilience. Inspired by game-theoretic principles, where adversarial dynamics are analyzed through frameworks like Nash equilibrium, exploration mechanisms in such setups allow for the discovery of diverse strategies, enhancing system robustness. However, existing adversarial learning methods often fail to incorporate structured exploration effectively, reducing their ability to improve model defense comprehensively. To address these challenges, we propose a novel Exploration-enhanced Adversarial Learning Algorithm (ExAL), leveraging the Exponentially Weighted Momentum Particle Swarm Optimizer (EMPSO) to generate optimized adversarial perturbations. ExAL integrates exploration-driven mechanisms to discover perturbations that maximize impact on the model's decision boundary while preserving structural coherence in the data. We evaluate the performance of ExAL on the MNIST Handwritten Digits and Blended Malware datasets. Experimental results demonstrate that ExAL significantly enhances model resilience to adversarial attacks by improving robustness through adversarial learning.
- [77] arXiv:2411.15891 [pdf, other]
-
Title: From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and RewardsSubjects: Machine Learning (cs.LG)
Large Language Models (LLMs) and Reinforcement Learning (RL) are two powerful approaches for building autonomous agents. However, due to limited understanding of the game environment, agents often resort to inefficient exploration and trial-and-error, struggling to develop long-term strategies or make decisions. We propose a method that extracts experience from interaction records to model the underlying laws of the game environment, using these experience as internal motivation to guide agents. These experience, expressed in language, are highly flexible and can either assist agents in reasoning directly or be transformed into rewards for guiding training. Our evaluation results in Crafter demonstrate that both RL and LLM agents benefit from these experience, leading to improved overall performance.
- [78] arXiv:2411.15893 [pdf, html, other]
-
Title: Distribution-aware Online Continual Learning for Urban Spatio-Temporal ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Urban spatio-temporal (ST) forecasting is crucial for various urban applications such as intelligent scheduling and trip planning. Previous studies focus on modeling ST correlations among urban locations in offline settings, which often neglect the non-stationary nature of urban ST data, particularly, distribution shifts over time. This oversight can lead to degraded performance in real-world scenarios. In this paper, we first analyze the distribution shifts in urban ST data, and then introduce DOST, a novel online continual learning framework tailored for ST data characteristics. DOST employs an adaptive ST network equipped with a variable-independent adapter to address the unique distribution shifts at each urban location dynamically. Further, to accommodate the gradual nature of these shifts, we also develop an awake-hibernate learning strategy that intermittently fine-tunes the adapter during the online phase to reduce computational overhead. This strategy integrates a streaming memory update mechanism designed for urban ST sequential data, enabling effective network adaptation to new patterns while preventing catastrophic forgetting. Experimental results confirm DOST's superiority over state-of-the-art models on four real-world datasets, providing online forecasts within an average of 0.1 seconds and achieving a 12.89% reduction in forecast errors compared to baseline models.
- [79] arXiv:2411.15894 [pdf, other]
-
Title: Navigating the Effect of Parametrization for Dimensionality ReductionComments: NeurIPS 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Parametric dimensionality reduction methods have gained prominence for their ability to generalize to unseen datasets, an advantage that traditional approaches typically lack. Despite their growing popularity, there remains a prevalent misconception among practitioners about the equivalence in performance between parametric and non-parametric methods. Here, we show that these methods are not equivalent -- parametric methods retain global structure but lose significant local details. To explain this, we provide evidence that parameterized approaches lack the ability to repulse negative pairs, and the choice of loss function also has an impact. Addressing these issues, we developed a new parametric method, ParamRepulsor, that incorporates Hard Negative Mining and a loss function that applies a strong repulsive force. This new method achieves state-of-the-art performance on local structure preservation for parametric methods without sacrificing the fidelity of global structural representation. Our code is available at this https URL.
- [80] arXiv:2411.15919 [pdf, html, other]
-
Title: Enhancing Symbolic Regression and Universal Physics-Informed Neural Networks with Dimensional AnalysisSubjects: Machine Learning (cs.LG)
We present a new method for enhancing symbolic regression for differential equations via dimensional analysis, specifically Ipsen's and Buckingham pi methods. Since symbolic regression often suffers from high computational costs and overfitting, non-dimensionalizing datasets reduces the number of input variables, simplifies the search space, and ensures that derived equations are physically meaningful. As our main contribution, we integrate Ipsen's method of dimensional analysis with Universal Physics-Informed Neural Networks. We also combine dimensional analysis with the AI Feynman symbolic regression algorithm to show that dimensional analysis significantly improves the accuracy of the recovered equation. The results demonstrate that transforming data into a dimensionless form significantly decreases computation time and improves accuracy of the recovered hidden term. For algebraic equations, using the Buckingham pi theorem reduced complexity, allowing the AI Feynman model to converge faster with fewer data points and lower error rates. For differential equations, Ipsen's method was combined with Universal Physics-Informed Neural Networks (UPINNs) to identify hidden terms more effectively. These findings suggest that integrating dimensional analysis with symbolic regression can significantly lower computational costs, enhance model interpretability, and increase accuracy, providing a robust framework for automated discovery of governing equations in complex systems when data is limited.
- [81] arXiv:2411.15920 [pdf, html, other]
-
Title: An AutoML-based approach for Network Intrusion DetectionNana Kankam Gyimah, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi, Robert Akinie, Methusela Sulle, Denis Ruganuza, Benibo Izison, Arthur MukwayaSubjects: Machine Learning (cs.LG)
In this paper, we present an automated machine learning (AutoML) approach for network intrusion detection, leveraging a stacked ensemble model developed using the MLJAR AutoML framework. Our methodology combines multiple machine learning algorithms, including LightGBM, CatBoost, and XGBoost, to enhance detection accuracy and robustness. By automating model selection, feature engineering, and hyperparameter tuning, our approach reduces the manual overhead typically associated with traditional machine learning methods. Extensive experimentation on the NSL-KDD dataset demonstrates that the stacked ensemble model outperforms individual models, achieving high accuracy and minimizing false positives. Our findings underscore the benefits of using AutoML for network intrusion detection, as the AutoML-driven stacked ensemble achieved the highest performance with 90\% accuracy and an 89\% F1 score, outperforming individual models like Random Forest (78\% accuracy, 78\% F1 score), XGBoost and CatBoost (both 80\% accuracy, 80\% F1 score), and LightGBM (78\% accuracy, 78\% F1 score), providing a more adaptable and efficient solution for network security applications.
- [82] arXiv:2411.15931 [pdf, other]
-
Title: Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy MaximizationComments: 19 pages including appendix, 5 figuresSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Applications (stat.AP); Machine Learning (stat.ML)
A number of different architectures and loss functions have been applied to the problem of self-supervised learning (SSL), with the goal of developing embeddings that provide the best possible pre-training for as-yet-unknown, lightly supervised downstream tasks. One of these SSL criteria is to maximize the entropy of a set of embeddings in some compact space. But the goal of maximizing the embedding entropy often depends--whether explicitly or implicitly--upon high dimensional entropy estimates, which typically perform poorly in more than a few dimensions. In this paper, we motivate an effective entropy maximization criterion (E2MC), defined in terms of easy-to-estimate, low-dimensional constraints. We demonstrate that using it to continue training an already-trained SSL model for only a handful of epochs leads to a consistent and, in some cases, significant improvement in downstream performance. We perform careful ablation studies to show that the improved performance is due to the proposed add-on criterion. We also show that continued pre-training with alternative criteria does not lead to notable improvements, and in some cases, even degrades performance.
- [83] arXiv:2411.15944 [pdf, html, other]
-
Title: Customer Lifetime Value Prediction with Uncertainty Estimation Using Monte Carlo DropoutComments: 9 pages, 3 figuresSubjects: Machine Learning (cs.LG)
Accurately predicting customer Lifetime Value (LTV) is crucial for companies to optimize their revenue strategies. Traditional deep learning models for LTV prediction are effective but typically provide only point estimates and fail to capture model uncertainty in modeling user behaviors. To address this limitation, we propose a novel approach that enhances the architecture of purely neural network models by incorporating the Monte Carlo Dropout (MCD) framework. We benchmarked the proposed method using data from one of the most downloaded mobile games in the world, and demonstrated a substantial improvement in predictive Top 5\% Mean Absolute Percentage Error compared to existing state-of-the-art methods. Additionally, our approach provides confidence metric as an extra dimension for performance evaluation across various neural network models, facilitating more informed business decisions.
- [84] arXiv:2411.15945 [pdf, html, other]
-
Title: Understanding Machine Learning Paradigms through the Lens of Statistical Thermodynamics: A tutorialStar (Xinxin)LiuComments: 19 pagesSubjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Statistics Theory (math.ST); Chemical Physics (physics.chem-ph)
This tutorial investigates the convergence of statistical mechanics and learning theory, elucidating the potential enhancements in machine learning methodologies through the integration of foundational principles from physics. The tutorial delves into advanced techniques like entropy, free energy, and variational inference which are utilized in machine learning, illustrating their significant contributions to model efficiency and robustness. By bridging these scientific disciplines, we aspire to inspire newer methodologies in researches, demonstrating how an in-depth comprehension of physical systems' behavior can yield more effective and dependable machine learning models, particularly in contexts characterized by uncertainty.
- [85] arXiv:2411.15951 [pdf, other]
-
Title: Partial Identifiability and Misspecification in Inverse Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. This problem is difficult, for several reasons. First of all, there are typically multiple reward functions which are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Secondly, in order to infer $R$ from $\pi$, an IRL algorithm must have a *behavioural model* of how $\pi$ relates to $R$. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that the behavioural model in practice will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully characterise and quantify the ambiguity of the reward function for all of the behavioural models that are most common in the current IRL literature. We also provide necessary and sufficient conditions that describe precisely how the observed demonstrator policy may differ from each of the standard behavioural models before that model leads to faulty inferences about the reward function $R$. In addition to this, we introduce a cohesive framework for reasoning about partial identifiability and misspecification in IRL, together with several formal tools that can be used to easily derive the partial identifiability and misspecification robustness of new IRL models, or analyse other kinds of reward learning algorithms.
- [86] arXiv:2411.15958 [pdf, other]
-
Title: Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of NoiseEnea Monzio Compagnoni, Tianlin Liu, Rustem Islamov, Frank Norbert Proske, Antonio Orvieto, Aurelien LucchiComments: An earlier version, titled 'SDEs for Adaptive Methods: The Role of Noise' and dated May 2024, is available on OpenReviewSubjects: Machine Learning (cs.LG)
Despite the vast empirical evidence supporting the efficacy of adaptive optimization methods in deep learning, their theoretical understanding is far from complete. This work introduces novel SDEs for commonly used adaptive optimizers: SignSGD, RMSprop(W), and Adam(W). These SDEs offer a quantitatively accurate description of these optimizers and help illuminate an intricate relationship between adaptivity, gradient noise, and curvature. Our novel analysis of SignSGD highlights a noteworthy and precise contrast to SGD in terms of convergence speed, stationary distribution, and robustness to heavy-tail noise. We extend this analysis to AdamW and RMSpropW, for which we observe that the role of noise is much more complex. Crucially, we support our theoretical analysis with experimental evidence by verifying our insights: this includes numerically integrating our SDEs using Euler-Maruyama discretization on various neural network architectures such as MLPs, CNNs, ResNets, and Transformers. Our SDEs accurately track the behavior of the respective optimizers, especially when compared to previous SDEs derived for Adam and RMSprop. We believe our approach can provide valuable insights into best training practices and novel scaling rules.
- [87] arXiv:2411.15972 [pdf, html, other]
-
Title: Stability properties of gradient flow dynamics for the symmetric low-rank matrix factorization problemComments: 10 pages, 3 figuresSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
The symmetric low-rank matrix factorization serves as a building block in many learning tasks, including matrix recovery and training of neural networks. However, despite a flurry of recent research, the dynamics of its training via non-convex factorized gradient-descent-type methods is not fully understood especially in the over-parameterized regime where the fitted rank is higher than the true rank of the target matrix. To overcome this challenge, we characterize equilibrium points of the gradient flow dynamics and examine their local and global stability properties. To facilitate a precise global analysis, we introduce a nonlinear change of variables that brings the dynamics into a cascade connection of three subsystems whose structure is simpler than the structure of the original system. We demonstrate that the Schur complement to a principal eigenspace of the target matrix is governed by an autonomous system that is decoupled from the rest of the dynamics. In the over-parameterized regime, we show that this Schur complement vanishes at an $O(1/t)$ rate, thereby capturing the slow dynamics that arises from excess parameters. We utilize a Lyapunov-based approach to establish exponential convergence of the other two subsystems. By decoupling the fast and slow parts of the dynamics, we offer new insight into the shape of the trajectories associated with local search algorithms and provide a complete characterization of the equilibrium points and their global stability properties. Such an analysis via nonlinear control techniques may prove useful in several related over-parameterized problems.
- [88] arXiv:2411.15997 [pdf, html, other]
-
Title: Ensuring Fair LLM Serving Amid Diverse ApplicationsRedwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan RajmohanSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe proposes application-characteristic aware request throttling coupled with a weighted service counter based scheduling technique to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate FairServe's superior performance compared to the state-of-the-art method in ensuring fairness. We are actively working on deploying our system in production, expecting to benefit millions of customers world-wide.
- [89] arXiv:2411.16003 [pdf, html, other]
-
Title: eFedLLM: Efficient LLM Inference Based on Federated LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) herald a transformative era in artificial intelligence (AI). However, the expansive scale of data and parameters of LLMs requires high-demand computational and memory resources, restricting their accessibility to a broader range of users and researchers. This paper introduces an effective approach that enhances the operational efficiency and affordability of LLM inference. By utilizing transformer-based federated learning (FL) with model-parallel distributed training, our model efficiently distributes the computational loads and memory requirements across a network of participants. This strategy permits users, especially those with limited resources to train state-of-the-art LLMs collaboratively. We also innovate an incentive mechanism within the FL framework, rewarding constructive contributions and filtering out malicious activities, thereby safeguarding the integrity and reliability of the training process. Concurrently, we leverage memory hierarchy strategies and Singular Value Decomposition (SVD) on weight matrices to boost computational and memory efficiencies further. Our results, derived from formulaic analyses and numerical calculations, demonstrate significant optimization of resource use and democratize access to cutting-edge LLMs, ensuring that a wide scale of users can both contribute to and benefit from these advanced models.
- [90] arXiv:2411.16019 [pdf, html, other]
-
Title: M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective SchedulingSubjects: Machine Learning (cs.LG)
Recent advancements in reinforcement learning (RL) for analog circuit optimization have demonstrated significant potential for improving sample efficiency and generalization across diverse circuit topologies and target specifications. However, there are challenges such as high computational overhead, the need for bespoke models for each circuit. To address them, we propose M3, a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling. The Mamba architecture, known as a strong alternative to the transformer architecture, enables multi-circuit optimization with distinct parameters and target specifications. The effective scheduling strategy enhances sample efficiency by adjusting crucial MBRL training parameters. To the best of our knowledge, M3 is the first method for multi-circuit optimization by leveraging both the Mamba architecture and a MBRL with effective scheduling. As a result, it significantly improves sample efficiency compared to existing RL methods.
- [91] arXiv:2411.16030 [pdf, html, other]
-
Title: Binary Search with Distributional PredictionsMichael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Aidin Niaparast, Sergei VassilvitskiiSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Algorithms with (machine-learned) predictions is a powerful framework for combining traditional worst-case algorithms with modern machine learning. However, the vast majority of work in this space assumes that the prediction itself is non-probabilistic, even if it is generated by some stochastic process (such as a machine learning system). This is a poor fit for modern ML, particularly modern neural networks, which naturally generate a distribution. We initiate the study of algorithms with distributional predictions, where the prediction itself is a distribution. We focus on one of the simplest yet fundamental settings: binary search (or searching a sorted array). This setting has one of the simplest algorithms with a point prediction, but what happens if the prediction is a distribution? We show that this is a richer setting: there are simple distributions where using the classical prediction-based algorithm with any single prediction does poorly. Motivated by this, as our main result, we give an algorithm with query complexity $O(H(p) + \log \eta)$, where $H(p)$ is the entropy of the true distribution $p$ and $\eta$ is the earth mover's distance between $p$ and the predicted distribution $\hat p$. This also yields the first distributionally-robust algorithm for the classical problem of computing an optimal binary search tree given a distribution over target keys. We complement this with a lower bound showing that this query complexity is essentially optimal (up to constants), and experiments validating the practical usefulness of our algorithm.
- [92] arXiv:2411.16035 [pdf, html, other]
-
Title: Predicting Emergent Capabilities by FinetuningSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
A fundamental open challenge in modern LLM scaling is the lack of understanding around emergent capabilities. In particular, language model pretraining loss is known to be highly predictable as a function of compute. However, downstream capabilities are far less predictable -- sometimes even exhibiting emergent jumps -- which makes it challenging to anticipate the capabilities of future models. In this work, we first pose the task of emergence prediction: given access to current LLMs that have random few-shot accuracy on a task, can we predict whether future models (GPT-N+1) will have non-trivial accuracy on that task? We then discover a simple insight for this problem: finetuning LLMs on a given task can shift the point in scaling at which emergence occurs towards less capable models. To operationalize this insight, we can finetune LLMs with varying amounts of data and fit a parametric function that predicts when emergence will occur (i.e., "emergence laws"). We validate this approach using four standard NLP benchmarks where large-scale open-source LLMs already demonstrate emergence (MMLU, GSM8K, CommonsenseQA, and CoLA). Using only small-scale LLMs, we find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged. Finally, we present a case study of two realistic uses for emergence prediction.
- [93] arXiv:2411.16063 [pdf, html, other]
-
Title: VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics PredictionSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
In-Context Operator Networks (ICONs) are models that learn operators across different types of PDEs using a few-shot, in-context approach. Although they show successful generalization to various PDEs, existing methods treat each data point as a single token, and suffer from computational inefficiency when processing dense data, limiting their application in higher spatial dimensions. In this work, we propose Vision In-Context Operator Networks (VICON), incorporating a vision transformer architecture that efficiently processes 2D functions through patch-wise operations. We evaluated our method on three fluid dynamics datasets, demonstrating both superior performance (reducing scaled $L^2$ error by $40\%$ and $61.6\%$ for two benchmark datasets for compressible flows, respectively) and computational efficiency (requiring only one-third of the inference time per frame) in long-term rollout predictions compared to the current state-of-the-art sequence-to-sequence model with fixed timestep prediction: Multiple Physics Pretraining (MPP). Compared to MPP, our method preserves the benefits of in-context operator learning, enabling flexible context formation when dealing with insufficient frame counts or varying timestep values.
- [94] arXiv:2411.16073 [pdf, html, other]
-
Title: Soft-TransFormers for Continual LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Inspired by Well-initialized Lottery Ticket Hypothesis (WLTH), which provides suboptimal fine-tuning solutions, we propose a novel fully fine-tuned continual learning (CL) method referred to as Soft-TransFormers (Soft-TF). Soft-TF sequentially learns and selects an optimal soft-network or subnetwork for each task. During sequential training in CL, Soft-TF jointly optimizes the weights of sparse layers to obtain task-adaptive soft (real-valued) networks or subnetworks (binary masks), while keeping the well-pre-trained layer parameters frozen. In inference, the identified task-adaptive network of Soft-TF masks the parameters of the pre-trained network, mapping to an optimal solution for each task and minimizing Catastrophic Forgetting (CF) - the soft-masking preserves the knowledge of the pre-trained network. Extensive experiments on Vision Transformer (ViT) and CLIP demonstrate the effectiveness of Soft-TF, achieving state-of-the-art performance across various CL scenarios, including Class-Incremental Learning (CIL) and Task-Incremental Learning (TIL), supported by convergence theory.
- [95] arXiv:2411.16081 [pdf, html, other]
-
Title: Exploring the Generalization Capabilities of AID-based Bi-level OptimizationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bi-level optimization has achieved considerable success in contemporary machine learning applications, especially for given proper hyperparameters. However, due to the two-level optimization structure, commonly, researchers focus on two types of bi-level optimization methods: approximate implicit differentiation (AID)-based and iterative differentiation (ITD)-based approaches. ITD-based methods can be readily transformed into single-level optimization problems, facilitating the study of their generalization capabilities. In contrast, AID-based methods cannot be easily transformed similarly but must stay in the two-level structure, leaving their generalization properties enigmatic. In this paper, although the outer-level function is nonconvex, we ascertain the uniform stability of AID-based methods, which achieves similar results to a single-level nonconvex problem. We conduct a convergence analysis for a carefully chosen step size to maintain stability. Combining the convergence and stability results, we give the generalization ability of AID-based bi-level optimization methods. Furthermore, we carry out an ablation study of the parameters and assess the performance of these methods on real-world tasks. Our experimental results corroborate the theoretical findings, demonstrating the effectiveness and potential applications of these methods.
- [96] arXiv:2411.16085 [pdf, other]
-
Title: Cautious Optimizers: Improving Training with One Line of CodeSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM)
AdamW has been the default optimizer for transformer pretraining. For many years, our community searches for faster and more stable optimizers with only constraint positive outcomes. In this work, we propose a \textbf{single-line modification in Pytorch} to any momentum-based optimizer, which we rename Cautious Optimizer, e.g. C-AdamW and C-Lion. Our theoretical result shows that this modification preserves Adam's Hamiltonian function and it does not break the convergence guarantee under the Lyapunov analysis. In addition, a whole new family of optimizers is revealed by our theoretical insight. Among them, we pick the simplest one for empirical experiments, showing speed-up on Llama and MAE pretraining up to $1.47\times$. Code is available at this https URL
- [97] arXiv:2411.16094 [pdf, html, other]
-
Title: Very Basics of Tensors with Graphical Notations: Unfolding, Calculations, and DecompositionsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP); Machine Learning (stat.ML)
Tensor network diagram (graphical notation) is a useful tool that graphically represents multiplications between multiple tensors using nodes and edges. Using the graphical notation, complex multiplications between tensors can be described simply and intuitively, and it also helps to understand the essence of tensor products. In fact, most of matrix/tensor products including inner product, outer product, Hadamard product, Kronecker product, and Khatri-Rao product can be written in graphical notation. These matrix/tensor operations are essential building blocks for the use of matrix/tensor decompositions in signal processing and machine learning. The purpose of this lecture note is to learn the very basics of tensors and how to represent them in mathematical symbols and graphical notation. Many papers using tensors omit these detailed definitions and explanations, which can be difficult for the reader. I hope this note will be of help to such readers.
- [98] arXiv:2411.16095 [pdf, html, other]
-
Title: LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding StrategyPeng Cui (1), Yiming Yang (2), Fusheng Jin (1), Siyuan Tang (2), Yunli Wang (2), Fukang Yang (2), Yalong Jia (2), Qingpeng Cai (2), Fei Pan (2), Changcheng Li (2), Peng Jiang (2) ((1) Beijing Institute of Technology, (2) Kuaishou Technology)Comments: 10 pages, 8 figures, 6 tablesSubjects: Machine Learning (cs.LG)
In online advertising, once an ad campaign is deployed, the automated bidding system dynamically adjusts the bidding strategy to optimize Cost Per Action (CPA) based on the number of ad conversions. For ads with a long conversion delay, relying solely on the real-time tracked conversion number as a signal for bidding strategy can significantly overestimate the current CPA, leading to conservative bidding strategies. Therefore, it is crucial to predict the number of long-delayed conversions. Nonetheless, it is challenging to predict ad conversion numbers through traditional regression methods due to the wide range of ad conversion numbers. Previous regression works have addressed this challenge by transforming regression problems into bucket classification problems, achieving success in various scenarios. However, specific challenges arise when predicting the number of ad conversions: 1) The integer nature of ad conversion numbers exacerbates the discontinuity issue in one-hot hard labels; 2) The long-tail distribution of ad conversion numbers complicates tail data prediction. In this paper, we propose the Long-Delayed Ad Conversions Prediction model for bidding strategy (LDACP), which consists of two sub-modules. To alleviate the issue of discontinuity in one-hot hard labels, the Bucket Classification Module with label Smoothing method (BCMS) converts one-hot hard labels into non-normalized soft labels, then fits these soft labels by minimizing classification loss and regression loss. To address the challenge of predicting tail data, the Value Regression Module with Proxy labels (VRMP) uses the prediction bias of aggregated pCTCVR as proxy labels. Finally, a Mixture of Experts (MoE) structure integrates the predictions from BCMS and VRMP to obtain the final predicted ad conversion number.
- [99] arXiv:2411.16102 [pdf, html, other]
-
Title: BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware BatchingSubjects: Machine Learning (cs.LG)
Offline batch inference, which leverages the flexibility of request batching to achieve higher throughput and lower costs, is becoming more popular for latency-insensitive applications. Meanwhile, recent progress in model capability and modality makes requests more diverse in compute and memory demands, creating unique opportunities for throughput improvement by resource overlapping. However, a request schedule that maximizes resource overlapping can conflict with the schedule that maximizes prefix sharing, a widely-used performance optimization, causing sub-optimal inference throughput. We present BlendServe, a system that maximizes resource utilization of offline batch inference by combining the benefits of resource overlapping and prefix sharing using a resource-aware prefix tree. BlendServe exploits the relaxed latency requirements in offline batch inference to reorder and overlap requests with varied resource demands while ensuring high prefix sharing. We evaluate BlendServe on a variety of synthetic multi-modal workloads and show that it provides up to $1.44\times$ throughput boost compared to widely-used industry standards, vLLM and SGLang.
- [100] arXiv:2411.16105 [pdf, html, other]
-
Title: Adaptive Circuit Behavior and Generalization in Mechanistic InterpretabilityComments: 10 pages, 8 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the abilities of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether the models generalization results from reusing the same circuit components, the components behaving differently, or the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well-studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism that explains this which we term S2 Hacking. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
- [101] arXiv:2411.16110 [pdf, html, other]
-
Title: FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training DataComments: Accepted at WACV 2025. Supplementary material included after references. 17 pages, 7 figures, 14 tablesSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
While the mainstream research in anomaly detection has mainly followed the one-class classification, practical industrial environments often incur noisy training data due to annotation errors or lack of labels for new or refurbished products. To address these issues, we propose a novel learning-based approach for fully unsupervised anomaly detection with unlabeled and potentially contaminated training data. Our method is motivated by two observations, that i) the pairwise feature distances between the normal samples are on average likely to be smaller than those between the anomaly samples or heterogeneous samples and ii) pairs of features mutually closest to each other are likely to be homogeneous pairs, which hold if the normal data has smaller variance than the anomaly data. Building on the first observation that nearest-neighbor distances can distinguish between confident normal samples and anomalies, we propose a pseudo-labeling strategy using an iteratively reconstructed memory bank (IRMB). The second observation is utilized as a new loss function to promote class-homogeneity between mutually closest pairs thereby reducing the ill-posedness of the task. Experimental results on two public industrial anomaly benchmarks and semantic anomaly examples validate the effectiveness of FUN-AD across different scenarios and anomaly-to-normal ratios. Our code is available at this https URL.
- [102] arXiv:2411.16127 [pdf, html, other]
-
Title: DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUsSubjects: Machine Learning (cs.LG); Performance (cs.PF)
Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to $7.0\times$ over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of $2.16\times$ in end-to-end training compared to the popular GNN computing framework DGL.
- [103] arXiv:2411.16133 [pdf, html, other]
-
Title: Context Awareness Gate For Retrieval Augmented GenerationSubjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Retrieval Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions. Previous research has predominantly focused on improving the accuracy and quality of retrieved data chunks to enhance the overall performance of the generation pipeline. However, despite ongoing advancements, the critical issue of retrieving irrelevant information -- which can impair the ability of the model to utilize its internal knowledge effectively -- has received minimal attention. In this work, we investigate the impact of retrieving irrelevant information in open-domain question answering, highlighting its significant detrimental effect on the quality of LLM outputs. To address this challenge, we propose the Context Awareness Gate (CAG) architecture, a novel mechanism that dynamically adjusts the LLMs' input prompt based on whether the user query necessitates external context retrieval. Additionally, we introduce the Vector Candidates method, a core mathematical component of CAG that is statistical, LLM-independent, and highly scalable. We further examine the distributions of relationships between contexts and questions, presenting a statistical analysis of these distributions. This analysis can be leveraged to enhance the context retrieval process in Retrieval Augmented Generation (RAG) systems.
- [104] arXiv:2411.16139 [pdf, html, other]
-
Title: Beyond Task Vectors: Selective Task Arithmetic Based on Importance MetricsComments: Under ReviewSubjects: Machine Learning (cs.LG)
Pretrained models have revolutionized deep learning by enabling significant performance improvements across a wide range of tasks, leveraging large-scale, pre-learned knowledge representations. However, deploying these models in real-world multi-task learning (MTL) scenarios poses substantial challenges, primarily due to high computational costs and inefficiencies in inference. Traditional approaches such as pruning, quantization, and knowledge distillation have been explored to mitigate these issues, but they often fall short in fully addressing the complexities of multi-task environments. This paper introduces \textbf{\underline{S}}elective \textbf{\underline{T}}ask \textbf{\underline{A}}rithmetic \underline{\textbf{(STA)}}, a training-free framework designed to enhance multi-task performance through task-specific parameter fusion. STA addresses three key challenges: (i) \textbf{Parameter importance diversity: } Recognizing that different tasks relie on distinct parameters, STA employs a loss-sensitive parameter importance metric derived from a first-order Taylor expansion to accurately measure the importance of parameters for each task. (ii) \textbf{Over-reliance on hyperparameter tuning: }By enhancing the sparsity of task vectors through parameter importance metrics, STA reduces the need for extensive hyperparameter tuning, thereby improving the generalization and robustness of the model. (iii) \textbf{Neglect of other abilities in task arithmetic: } Previous works have largely overlooked the potential for more precise task forgetting. STA leverages its parameter importance metric to achieve more controlled and effective task forgetting, minimizing the impact of noisy elements that can degrade model performance. Experimental results demonstrate that STA achieves superior multi-task performance across benchmarks and excellent performance in task forgetting.
- [105] arXiv:2411.16142 [pdf, html, other]
-
Title: Causal Adjacency Learning for Spatiotemporal Prediction Over GraphsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Spatiotemporal prediction over graphs (STPG) is crucial for transportation systems. In existing STPG models, an adjacency matrix is an important component that captures the relations among nodes over graphs. However, most studies calculate the adjacency matrix by directly memorizing the data, such as distance- and correlation-based matrices. These adjacency matrices do not consider potential pattern shift for the test data, and may result in suboptimal performance if the test data has a different distribution from the training one. This issue is known as the Out-of-Distribution generalization problem. To address this issue, in this paper we propose a Causal Adjacency Learning (CAL) method to discover causal relations over graphs. The learned causal adjacency matrix is evaluated on a downstream spatiotemporal prediction task using real-world graph data. Results demonstrate that our proposed adjacency matrix can capture the causal relations, and using our learned adjacency matrix can enhance prediction performance on the OOD test data, even though causal learning is not conducted in the downstream task.
- [106] arXiv:2411.16145 [pdf, html, other]
-
Title: Local Intrinsic Dimensionality for Dynamic Graph EmbeddingsSubjects: Machine Learning (cs.LG)
The notion of local intrinsic dimensionality (LID) has important theoretical implications and practical applications in the fields of data mining and machine learning. Recent research efforts indicate that LID measures defined for graphs can improve graph representational learning methods based on random walks. In this paper, we discuss how NC-LID, a LID measure designed for static graphs, can be adapted for dynamic networks. Focusing on dynnode2vec as the most representative dynamic graph embedding method based on random walks, we examine correlations between NC-LID and the intrinsic quality of 10 real-world dynamic network embeddings. The obtained results show that NC-LID can be used as a good indicator of nodes whose embedding vectors do not tend to preserve temporal graph structure well. Thus, our empirical findings constitute the first step towards LID-aware dynamic graph embedding methods.
- [107] arXiv:2411.16154 [pdf, html, other]
-
Title: DeDe: Detecting Backdoor Samples for SSL Encoders via DecodersComments: 12 pagesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data. However, it is found to be susceptible to backdoor attacks merely via polluting a small portion of training data. The victim encoders mismatch triggered inputs with target embeddings, e.g., match the triggered cat input to an airplane embedding, such that the downstream tasks are affected to misbehave when the trigger is activated. Emerging backdoor attacks have shown great threats in different SSL paradigms such as contrastive learning and CLIP, while few research is devoted to defending against such attacks. Besides, the existing ones fall short in detecting advanced stealthy backdoors. To address the limitations, we propose a novel detection mechanism, DeDe, which detects the activation of the backdoor mapping with the cooccurrence of victim encoder and trigger inputs. Specifically, DeDe trains a decoder for the SSL encoder on an auxiliary dataset (can be out-of-distribution or even slightly poisoned), such that for any triggered input that misleads to the target embedding, the decoder outputs an image significantly different from the input. We empirically evaluate DeDe on both contrastive learning and CLIP models against various types of backdoor attacks, and demonstrate its superior performance over SOTA detection methods in both upstream detection performance and ability of preventing backdoors in downstream tasks.
- [108] arXiv:2411.16155 [pdf, html, other]
-
Title: Graph Adapter of EEG Foundation Models for Parameter Efficient Fine TuningComments: Under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
In diagnosing mental diseases from electroencephalography (EEG) data, neural network models such as Transformers have been employed to capture temporal dynamics. Additionally, it is crucial to learn the spatial relationships between EEG sensors, for which Graph Neural Networks (GNNs) are commonly used. However, fine-tuning large-scale complex neural network models simultaneously to capture both temporal and spatial features increases computational costs due to the more significant number of trainable parameters. It causes the limited availability of EEG datasets for downstream tasks, making it challenging to fine-tune large models effectively. We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach to address these challenges. EGA is integrated into pre-trained temporal backbone models as a GNN-based module and fine-tuned itself alone while keeping the backbone model parameters frozen. This enables the acquisition of spatial representations of EEG signals for downstream tasks, significantly reducing computational overhead and data requirements. Experimental evaluations on healthcare-related downstream tasks of Major Depressive Disorder and Abnormality Detection demonstrate that our EGA improves performance by up to 16.1% in the F1-score compared with the backbone BENDR model.
- [109] arXiv:2411.16158 [pdf, html, other]
-
Title: MixPE: Quantization and Hardware Co-design for Efficient LLM InferenceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Transformer-based large language models (LLMs) have achieved remarkable success as model sizes continue to grow, yet their deployment remains challenging due to significant computational and memory demands. Quantization has emerged as a promising solution, and state-of-the-art quantization algorithms for LLMs introduce the need for mixed-precision matrix multiplication (mpGEMM), where lower-precision weights are multiplied with higher-precision activations. Despite its benefits, current hardware accelerators such as GPUs and TPUs lack native support for efficient mpGEMM, leading to inefficient dequantization operations in the main sequential loop. To address this limitation, we introduce MixPE, a specialized mixed-precision processing element designed for efficient low-bit quantization in LLM inference. MixPE leverages two key innovations to minimize dequantization overhead and unlock the full potential of low-bit quantization. First, recognizing that scale and zero point are shared within each quantization group, we propose performing dequantization after per-group mpGEMM, significantly reducing dequantization overhead. Second, instead of relying on conventional multipliers, MixPE utilizes efficient shift\&add operations for multiplication, optimizing both computation and energy efficiency. Our experimental results demonstrate that MixPE surpasses the state-of-the-art quantization accelerators by $2.6\times$ speedup and $1.4\times$ energy reduction.
- [110] arXiv:2411.16167 [pdf, html, other]
-
Title: BadSFL: Backdoor Attack against Scaffold Federated LearningXingshuo Han, Xiang Lan, Haozhao Wang, Shengmin Xu, Shen Ren, Jason Zeng, Ming Wu, Michael Heinrich, Tianwei ZhangSubjects: Machine Learning (cs.LG)
Federated learning (FL) enables the training of deep learning models on distributed clients to preserve data privacy. However, this learning paradigm is vulnerable to backdoor attacks, where malicious clients can upload poisoned local models to embed backdoors into the global model, leading to attacker-desired predictions. Existing backdoor attacks mainly focus on FL with independently and identically distributed (IID) scenarios, while real-world FL training data are typically non-IID. Current strategies for non-IID backdoor attacks suffer from limitations in maintaining effectiveness and durability. To address these challenges, we propose a novel backdoor attack method, \name, specifically designed for the FL framework using the scaffold aggregation algorithm in non-IID settings. \name leverages a Generative Adversarial Network (GAN) based on the global model to complement the training set, achieving high accuracy on both backdoor and benign samples. It utilizes a specific feature as the backdoor trigger to ensure stealthiness, and exploits the Scaffold's control variate to predict the global model's convergence direction, ensuring the backdoor's persistence. Extensive experiments on three benchmark datasets demonstrate the high effectiveness, stealthiness, and durability of \name. Notably, our attack remains effective over 60 rounds in the global model and up to 3 times longer than existing baseline attacks after stopping the injection of malicious updates.
- [111] arXiv:2411.16200 [pdf, html, other]
-
Title: Neural Network-based High-index Saddle Dynamics Method for Searching Saddle Points and Solution LandscapeSubjects: Machine Learning (cs.LG)
The high-index saddle dynamics (HiSD) method is a powerful approach for computing saddle points and solution landscape. However, its practical applicability is constrained by the need for the explicit energy function expression. To overcome this challenge, we propose a neural network-based high-index saddle dynamics (NN-HiSD) method. It utilizes neural network-based surrogate model to approximates the energy function, allowing the use of the HiSD method in the cases where the energy function is either unavailable or computationally expensive. We further enhance the efficiency of the NN-HiSD method by incorporating momentum acceleration techniques, specifically Nesterov's acceleration and the heavy-ball method. We also provide a rigorous convergence analysis of the NN-HiSD method. We conduct numerical experiments on systems with and without explicit energy functions, specifically including the alanine dipeptide model and bacterial ribosomal assembly intermediates for the latter, demonstrating the effectiveness and reliability of the proposed method.
- [112] arXiv:2411.16201 [pdf, html, other]
-
Title: Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language ModelsSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
High-quality video-text preference data is crucial for Multimodal Large Language Models (MLLMs) alignment. However, existing preference data is very scarce. Obtaining VQA preference data for preference training is costly, and manually annotating responses is highly unreliable, which could result in low-quality pairs. Meanwhile, AI-generated responses controlled by temperature adjustment lack diversity. To address these issues, we propose a high-quality VQA preference dataset, called \textit{\textbf{M}ultiple \textbf{M}ultimodal \textbf{A}rtificial \textbf{I}ntelligence \textbf{P}reference Datasets in \textbf{V}QA} (\textbf{MMAIP-V}), which is constructed by sampling from the response distribution set and using an external scoring function for response evaluation. Furthermore, to fully leverage the preference knowledge in MMAIP-V and ensure sufficient optimization, we propose \textit{\textbf{Iter}ative \textbf{W}eak-to-\textbf{S}trong \textbf{R}einforcement \textbf{L}earning from \textbf{AI} \textbf{F}eedback for video MLLMs} (\textbf{Iter-W2S-RLAIF}), a framework that gradually enhances MLLMs' alignment capabilities by iteratively updating the reference model and performing parameter extrapolation. Finally, we propose an unbiased and information-complete evaluation scheme in VQA evaluation. Experiments demonstrate that MMAIP-V is beneficial for MLLMs in preference learning and Iter-W2S-RLAIF fully exploits the alignment information in MMAIP-V. We believe that the proposed automatic VQA preference data generation pipeline based on AI feedback can greatly promote future work in the MLLMs alignment. \textbf{Code and dataset are available} \href{this https URL}{MMAIP-V\_Iter-W2S-RLAIF-702F}.
- [113] arXiv:2411.16206 [pdf, html, other]
-
Title: Batch Bayesian Optimization via Expected Subspace ImprovementSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Extending Bayesian optimization to batch evaluation can enable the designer to make the most use of parallel computing technology. Most of current batch approaches use artificial functions to simulate the sequential Bayesian optimization algorithm's behavior to select a batch of points for parallel evaluation. However, as the batch size grows, the accumulated error introduced by these artificial functions increases rapidly, which dramatically decreases the optimization efficiency of the algorithm. In this work, we propose a simple and efficient approach to extend Bayesian optimization to batch evaluation. Different from existing batch approaches, the idea of the new approach is to draw a batch of subspaces of the original problem and select one acquisition point from each subspace. To achieve this, we propose the expected subspace improvement criterion to measure the amount of the improvement that a candidate point can achieve within a certain subspace. By optimizing these expected subspace improvement functions simultaneously, we can get a batch of query points for expensive evaluation. Numerical experiments show that our proposed approach can achieve near-linear speedup when compared with the sequential Bayesian optimization algorithm, and performs very competitively when compared with eight state-of-the-art batch algorithms. This work provides a simple yet efficient approach for batch Bayesian optimization. A Matlab implementation of our approach is available at this https URL
- [114] arXiv:2411.16260 [pdf, html, other]
-
Title: Unraveling Arithmetic in Large Language Models: The Role of Algebraic StructuresSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting, which decomposes complex reasoning into step-by-step solutions. This approach has enabled significant advancements, as evidenced by performance on benchmarks like GSM8K and MATH. However, the mechanisms underlying LLMs' ability to perform arithmetic in a single step of CoT remain poorly understood. Existing studies debate whether LLMs encode numerical values or rely on symbolic reasoning, while others explore attention and multi-layered processing in arithmetic tasks. In this work, we propose that LLMs learn arithmetic by capturing algebraic structures, such as \emph{Commutativity} and \emph{Identity} properties. Since these structures are observable through input-output relationships, they can generalize to unseen data. We empirically demonstrate that LLMs can learn algebraic structures using a custom dataset of arithmetic problems. Our findings indicate that leveraging algebraic structures can enhance the LLMs' arithmetic capabilities, offering insights into improving their arithmetic performance.
- [115] arXiv:2411.16278 [pdf, html, other]
-
Title: Even Sparser Graph TransformersHamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David Woodruff, Danica SutherlandSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph Transformers excel in long-range dependency modeling, but generally require quadratic memory complexity in the number of nodes in an input graph, and hence have trouble scaling to large graphs. Sparse attention variants such as Exphormer can help, but may require high-degree augmentations to the input graph for good performance, and do not attempt to sparsify an already-dense input graph. As the learned attention mechanisms tend to use few of these edges, such high-degree connections may be unnecessary. We show (empirically and with theoretical backing) that attention scores on graphs are usually quite consistent across network widths, and use this observation to propose a two-stage procedure, which we call Spexphormer: first, train a narrow network on the full augmented graph. Next, use only the active connections to train a wider network on a much sparser graph. We establish theoretical conditions when a narrow network's attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets.
- [116] arXiv:2411.16285 [pdf, html, other]
-
Title: A Graph Neural Architecture Search Approach for Identifying Bots in Social MediaGeorgios Tzoumanekas, Michail Chatzianastasis, Loukas Ilias, George Kiokes, John Psarras, Dimitris AskounisSubjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Social media platforms, including X, Facebook, and Instagram, host millions of daily users, giving rise to bots-automated programs disseminating misinformation and ideologies with tangible real-world consequences. While bot detection in platform X has been the area of many deep learning models with adequate results, most approaches neglect the graph structure of social media relationships and often rely on hand-engineered architectures. Our work introduces the implementation of a Neural Architecture Search (NAS) technique, namely Deep and Flexible Graph Neural Architecture Search (DFG-NAS), tailored to Relational Graph Convolutional Neural Networks (RGCNs) in the task of bot detection in platform X. Our model constructs a graph that incorporates both the user relationships and their metadata. Then, DFG-NAS is adapted to automatically search for the optimal configuration of Propagation and Transformation functions in the RGCNs. Our experiments are conducted on the TwiBot-20 dataset, constructing a graph with 229,580 nodes and 227,979 edges. We study the five architectures with the highest performance during the search and achieve an accuracy of 85.7%, surpassing state-of-the-art models. Our approach not only addresses the bot detection challenge but also advocates for the broader implementation of NAS models in neural network design automation.
- [117] arXiv:2411.16298 [pdf, html, other]
-
Title: Evaluating Rank-N-Contrast: Continuous and Robust Representations for RegressionSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This document is a replication of the original "Rank-N-Contrast" (arXiv:2210.01189v2) paper published in 2023. This evaluation is done for academic purposes. Deep regression models often fail to capture the continuous nature of sample orders, creating fragmented representations and suboptimal performance. To address this, we reproduced the Rank-N-Contrast (RNC) framework, which learns continuous representations by contrasting samples by their rankings in the target space. Our study validates RNC's theoretical and empirical benefits, including improved performance and robustness. We extended the evaluation to an additional regression dataset and conducted robustness tests using a holdout method, where a specific range of continuous data was excluded from the training set. This approach assessed the model's ability to generalise to unseen data and achieve state-of-the-art performance. This replication study validates the original findings and broadens the understanding of RNC's applicability and robustness.
- [118] arXiv:2411.16303 [pdf, html, other]
-
Title: Understanding Generalization of Federated Learning: the Trade-off between Model Stability and OptimizationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices while keeping their local data private. However, FL often faces challenges due to data heterogeneity, leading to inconsistent local optima among clients. These inconsistencies can cause unfavorable convergence behavior and generalization performance degradation. Existing studies mainly describe this issue through \textit{convergence analysis}, focusing on how well a model fits training data, or through \textit{algorithmic stability}, which examines the generalization gap. However, neither approach precisely captures the generalization performance of FL algorithms, especially for neural networks. In this paper, we introduce the first generalization dynamics analysis framework in federated optimization, highlighting the trade-offs between model stability and optimization. Through this framework, we show how the generalization of FL algorithms is affected by the interplay of algorithmic stability and optimization. This framework applies to standard federated optimization and its advanced versions, like server momentum. We find that fast convergence from large local steps or accelerated momentum enlarges stability but obtains better generalization performance. Our insights into these trade-offs can guide the practice of future algorithms for better generalization.
- [119] arXiv:2411.16315 [pdf, html, other]
-
Title: Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent VariablesSubjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection often assume the absence of latent variables and rely on learning the global network structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when our primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.
- [120] arXiv:2411.16342 [pdf, html, other]
-
Title: A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network InferenceComments: Accepted for ASP-DAC 2025Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis. However, the irregularity of graph data poses unique challenges for efficient computation, leading to the development of specialized GNN accelerator architectures that surpass traditional CPU and GPU performance. Despite this, the structural diversity of input graphs results in varying performance across different GNN accelerators, depending on their dataflows. This variability in performance due to differing dataflows and graph properties remains largely unexplored, limiting the adaptability of GNN accelerators. To address this, we propose a data-driven framework for dataflow-aware latency prediction in GNN inference. Our approach involves training regressors to predict the latency of executing specific graphs on particular dataflows, using simulations on synthetic graphs. Experimental results indicate that our regressors can predict the optimal dataflow for a given graph with up to 91.28% accuracy and a Mean Absolute Percentage Error (MAPE) of 3.78%. Additionally, we introduce an online scheduling algorithm that uses these regressors to enhance scheduling decisions. Our experiments demonstrate that this algorithm achieves up to $3.17\times$ speedup in mean completion time and $6.26\times$ speedup in mean execution time compared to the best feasible baseline across all datasets.
- [121] arXiv:2411.16346 [pdf, html, other]
-
Title: Towards Foundation Models for Critical Care Time SeriesManuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar RätschComments: Accepted for Oral Presentation at AIM-FM Workshop at NeurIPS 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data - such as vital signs, lab results, and treatments in critical care - remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multi-variate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.
- [122] arXiv:2411.16349 [pdf, html, other]
-
Title: Machine learning for cerebral blood vessels' malformationsIrem Topal, Alexander Cherevko, Yuri Bugay, Maxim Shishlenin, Jean Barbier, Deniz Eroglu, Édgar Roldán, Roman BelousovComments: 14 pages, 6 main figures, 5 supplementary figures, 2 supplementary tablesSubjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Quantitative Methods (q-bio.QM)
Cerebral aneurysms and arteriovenous malformations are life-threatening hemodynamic pathologies of the brain. While surgical intervention is often essential to prevent fatal outcomes, it carries significant risks both during the procedure and in the postoperative period, making the management of these conditions highly challenging. Parameters of cerebral blood flow, routinely monitored during medical interventions, could potentially be utilized in machine learning-assisted protocols for risk assessment and therapeutic prognosis. To this end, we developed a linear oscillatory model of blood velocity and pressure for clinical data acquired from neurosurgical operations. Using the method of Sparse Identification of Nonlinear Dynamics (SINDy), the parameters of our model can be reconstructed online within milliseconds from a short time series of the hemodynamic variables. The identified parameter values enable automated classification of the blood-flow pathologies by means of logistic regression, achieving an accuracy of 73 %. Our results demonstrate the potential of this model for both diagnostic and prognostic applications, providing a robust and interpretable framework for assessing cerebral blood vessel conditions.
- [123] arXiv:2411.16422 [pdf, other]
-
Title: Turbofan Engine Remaining Useful Life (RUL) Prediction Based on Bi-Directional Long Short-Term Memory (BLSTM)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
The aviation industry is rapidly evolving, driven by advancements in technology. Turbofan engines used in commercial aerospace are very complex systems. The majority of turbofan engine components are susceptible to degradation over the life of their operation. Turbofan engine degradation has an impact to engine performance, operability, and reliability. Predicting accurate remaining useful life (RUL) of a commercial turbofan engine based on a variety of complex sensor data is of paramount importance for the safety of the passengers, safety of flight, and for cost effective operations. That is why it is essential for turbofan engines to be monitored, controlled, and maintained. RUL predictions can either come from model-based or data-based approaches. The model-based approach can be very expensive due to the complexity of the mathematical models and the deep expertise that is required in the domain of physical systems. The data-based approach is more frequently used nowadays thanks to the high computational complexity of computers, the advancements in Machine Learning (ML) models, and advancements in sensors. This paper is going to be focused on Bi-Directional Long Short-Term Memory (BLSTM) models but will also provide a benchmark of several RUL prediction databased models. The proposed RUL prediction models are going to be evaluated based on engine failure prediction benchmark dataset Commercial Modular Aero-Propulsion System Simulation (CMAPSS). The CMAPSS dataset is from NASA which contains turbofan engine run to failure events.
- [124] arXiv:2411.16427 [pdf, html, other]
-
Title: Unsupervised Event Outlier Detection in Continuous TimeSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Event sequence data record the occurrences of events in continuous time. Event sequence forecasting based on temporal point processes (TPPs) has been extensively studied, but outlier or anomaly detection, especially without any supervision from humans, is still underexplored. In this work, we develop, to the best our knowledge, the first unsupervised outlier detection approach to detecting abnormal events. Our novel unsupervised outlier detection framework is based on ideas from generative adversarial networks (GANs) and reinforcement learning (RL). We train a 'generator' that corrects outliers in the data with a 'discriminator' that learns to discriminate the corrected data from the real data, which may contain outliers. A key insight is that if the generator made a mistake in the correction, it would generate anomalies that are different from the anomalies in the real data, so it serves as data augmentation for the discriminator learning. Different from typical GAN-based outlier detection approaches, our method employs the generator to detect outliers in an online manner. The experimental results show that our method can detect event outliers more accurately than the state-of-the-art approaches.
- [125] arXiv:2411.16442 [pdf, html, other]
-
Title: TIFeD: a Tiny Integer-based Federated learning algorithm with Direct feedback alignmentJournal-ref: Proceedings of the Third International Conference on AI-ML Systems, 2023, pp. 1-8Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Training machine and deep learning models directly on extremely resource-constrained devices is the next challenge in the field of tiny machine learning. The related literature in this field is very limited, since most of the solutions focus only on on-device inference or model adaptation through online learning, leaving the training to be carried out on external Cloud services. An interesting technological perspective is to exploit Federated Learning (FL), which allows multiple devices to collaboratively train a shared model in a distributed way. However, the main drawback of state-of-the-art FL algorithms is that they are not suitable for running on tiny devices. For the first time in the literature, in this paper we introduce TIFeD, a Tiny Integer-based Federated learning algorithm with Direct Feedback Alignment (DFA) entirely implemented by using an integer-only arithmetic and being specifically designed to operate on devices with limited resources in terms of memory, computation and energy. Besides the traditional full-network operating modality, in which each device of the FL setting trains the entire neural network on its own local data, we propose an innovative single-layer TIFeD implementation, which enables each device to train only a portion of the neural network model and opens the door to a new way of distributing the learning procedure across multiple devices. The experimental results show the feasibility and effectiveness of the proposed solution. The proposed TIFeD algorithm, with its full-network and single-layer implementations, is made available to the scientific community as a public repository.
- [126] arXiv:2411.16458 [pdf, html, other]
-
Title: On the Reconstruction of Training Data from Group Invariant NetworksSubjects: Machine Learning (cs.LG)
Reconstructing training data from trained neural networks is an active area of research with significant implications for privacy and explainability. Recent advances have demonstrated the feasibility of this process for several data types. However, reconstructing data from group-invariant neural networks poses distinct challenges that remain largely unexplored. This paper addresses this gap by first formulating the problem and discussing some of its basic properties. We then provide an experimental evaluation demonstrating that conventional reconstruction techniques are inadequate in this scenario. Specifically, we observe that the resulting data reconstructions gravitate toward symmetric inputs on which the group acts trivially, leading to poor-quality results. Finally, we propose two novel methods aiming to improve reconstruction in this setup and present promising preliminary experimental results. Our work sheds light on the complexities of reconstructing data from group invariant neural networks and offers potential avenues for future research in this domain.
- [127] arXiv:2411.16462 [pdf, html, other]
-
Title: Lion Cub: Minimizing Communication Overhead in Distributed LionSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck. While gradient compression techniques have been explored for SGD and Adam, the Lion optimizer has the distinct advantage that its update vectors are the output of a sign operation, enabling straightforward quantization. However, simply compressing updates for communication and using techniques like majority voting fails to lead to end-to-end speedups due to inefficient communication algorithms and reduced convergence. We analyze three factors critical to distributed learning with Lion: optimizing communication methods, identifying effective quantization methods, and assessing the necessity of momentum synchronization. Our findings show that quantization techniques adapted to Lion and selective momentum synchronization can significantly reduce communication costs while maintaining convergence. We combine these into Lion Cub, which enables up to 5x speedups in end-to-end training compared to Lion. This highlights Lion's potential as a communication-efficient solution for distributed training.
- [128] arXiv:2411.16477 [pdf, html, other]
-
Title: Distributed Online Optimization with Stochastic Agent AvailabilitySubjects: Machine Learning (cs.LG)
Motivated by practical federated learning settings where clients may not be always available, we investigate a variant of distributed online optimization where agents are active with a known probability $p$ at each time step, and communication between neighboring agents can only take place if they are both active. We introduce a distributed variant of the FTRL algorithm and analyze its network regret, defined through the average of the instantaneous regret of the active agents. Our analysis shows that, for any connected communication graph $G$ over $N$ agents, the expected network regret of our FTRL variant after $T$ steps is at most of order $(\kappa/p^2)\min\big\{\sqrt{N},N^{1/4}/\sqrt{p}\big\}\sqrt{T}$, where $\kappa$ is the condition number of the Laplacian of $G$. We then show that similar regret bounds also hold with high probability. Moreover, we show that our notion of regret (average-case over the agents) is essentially equivalent to the standard notion of regret (worst-case over agents), implying that our bounds are not significantly improvable when $p=1$. Our theoretical results are supported by experiments on synthetic datasets.
- [129] arXiv:2411.16478 [pdf, html, other]
-
Title: Distributed, communication-efficient, and differentially private estimation of KL divergenceComments: 28 pages, 5 figuresSubjects: Machine Learning (cs.LG); Databases (cs.DB)
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings sharing such information can be undesirable (e.g., for privacy concerns) or infeasible (e.g., for high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. We explore parameter settings that optimize the accuracy of the algorithm catering to each of the settings; these provide sub-variations that are applicable to real-world tasks, addressing different context- and application-specific trust level requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
- [130] arXiv:2411.16502 [pdf, html, other]
-
Title: Interpreting Language Reward Models via Contrastive ExplanationsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reward models (RMs) are a crucial component in the alignment of large language models' (LLMs) outputs with human values. RMs approximate human preferences over possible LLM responses to the same prompt by predicting and comparing reward scores. However, as they are typically modified versions of LLMs with scalar output heads, RMs are large black boxes whose predictions are not explainable. More transparent RMs would enable improved trust in the alignment of LLMs. In this work, we propose to use contrastive explanations to explain any binary response comparison made by an RM. Specifically, we generate a diverse set of new comparisons similar to the original one to characterise the RM's local behaviour. The perturbed responses forming the new comparisons are generated to explicitly modify manually specified high-level evaluation attributes, on which analyses of RM behaviour are grounded. In quantitative experiments, we validate the effectiveness of our method for finding high-quality contrastive explanations. We then showcase the qualitative usefulness of our method for investigating global sensitivity of RMs to each evaluation attribute, and demonstrate how representative examples can be automatically extracted to explain and compare behaviours of different RMs. We see our method as a flexible framework for RM explanation, providing a basis for more interpretable and trustworthy LLM alignment.
- [131] arXiv:2411.16525 [pdf, other]
-
Title: Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and EfficiencySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are prompt tuning on \textit{single-head} transformers with only a \textit{single} self-attention layer: (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest possible transformers are universal approximators for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-$dL$ and -in-$(1/\epsilon)$ lower bound on the required soft-prompt tokens for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the \textit{soft-prompt-induced} keys and queries, and provide an upper bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear time prompt tuning inference algorithms. These fundamental limits provide important necessary conditions for designing expressive and efficient prompt tuning methods for practitioners.
- [132] arXiv:2411.16532 [pdf, html, other]
-
Title: Continual Deep Reinforcement Learning with Task-Agnostic Policy DistillationComments: Accepted for publication in Scientific ReportsSubjects: Machine Learning (cs.LG)
Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)-(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: this https URL.
- [133] arXiv:2411.16549 [pdf, other]
-
Title: Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model TrainingComments: 66 pages, 3 figuresSubjects: Machine Learning (cs.LG)
We investigate the transformer's capability for in-context learning (ICL) to simulate the training process of deep models. Our key contribution is providing a positive example of using a transformer to train a deep neural network by gradient descent in an implicit fashion via ICL. Specifically, we provide an explicit construction of a $(2N+4)L$-layer transformer capable of simulating $L$ gradient descent steps of an $N$-layer ReLU network through ICL. We also give the theoretical guarantees for the approximation within any given error and the convergence of the ICL gradient descent. Additionally, we extend our analysis to the more practical setting using Softmax-based transformers. We validate our findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks. The results show that ICL performance matches that of direct training.
- [134] arXiv:2411.16550 [pdf, html, other]
-
Title: Representation Collapsing Problems in Vector QuantizationComments: 13 pages, under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Vector quantization is a technique in machine learning that discretizes continuous representations into a set of discrete vectors. It is widely employed in tokenizing data representations for large language models, diffusion models, and other generative models. Despite its prevalence, the characteristics and behaviors of vector quantization in generative models remain largely underexplored. In this study, we investigate representation collapse in vector quantization - a critical degradation where codebook tokens or latent embeddings lose their discriminative power by converging to a limited subset of values. This collapse fundamentally compromises the model's ability to capture diverse data patterns. By leveraging both synthetic and real datasets, we identify the severity of each type of collapses and triggering conditions. Our analysis reveals that restricted initialization and limited encoder capacity result in tokens collapse and embeddings collapse. Building on these findings, we propose potential solutions aimed at mitigating each collapse. To the best of our knowledge, this is the first comprehensive study examining representation collapsing problems in vector quantization.
- [135] arXiv:2411.16554 [pdf, html, other]
-
Title: Generating Out-Of-Distribution Scenarios Using Language ModelsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The deployment of autonomous vehicles controlled by machine learning techniques requires extensive testing in diverse real-world environments, robust handling of edge cases and out-of-distribution scenarios, and comprehensive safety validation to ensure that these systems can navigate safely and effectively under unpredictable conditions. Addressing Out-Of-Distribution (OOD) driving scenarios is essential for enhancing safety, as OOD scenarios help validate the reliability of the models within the vehicle's autonomy stack. However, generating OOD scenarios is challenging due to their long-tailed distribution and rarity in urban driving dataset. Recently, Large Language Models (LLMs) have shown promise in autonomous driving, particularly for their zero-shot generalization and common-sense reasoning capabilities. In this paper, we leverage these LLM strengths to introduce a framework for generating diverse OOD driving scenarios. Our approach uses LLMs to construct a branching tree, where each branch represents a unique OOD scenario. These scenarios are then simulated in the CARLA simulator using an automated framework that aligns scene augmentation with the corresponding textual descriptions. We evaluate our framework through extensive simulations, and assess its performance via a diversity metric that measures the richness of the scenarios. Additionally, we introduce a new "OOD-ness" metric, which quantifies how much the generated scenarios deviate from typical urban driving conditions. Furthermore, we explore the capacity of modern Vision-Language Models (VLMs) to interpret and safely navigate through the simulated OOD scenarios. Our findings offer valuable insights into the reliability of language models in addressing OOD scenarios within the context of urban driving.
- [136] arXiv:2411.16567 [pdf, other]
-
Title: Enhancing Few-Shot Learning with Integrated Data and GAN Model ApproachesSubjects: Machine Learning (cs.LG)
This paper presents an innovative approach to enhancing few-shot learning by integrating data augmentation with model fine-tuning in a framework designed to tackle the challenges posed by small-sample data. Recognizing the critical limitations of traditional machine learning models that require large datasets-especially in fields such as drug discovery, target recognition, and malicious traffic detection-this study proposes a novel strategy that leverages Generative Adversarial Networks (GANs) and advanced optimization techniques to improve model performance with limited data. Specifically, the paper addresses the noise and bias issues introduced by data augmentation methods, contrasting them with model-based approaches, such as fine-tuning and metric learning, which rely heavily on related datasets. By combining Markov Chain Monte Carlo (MCMC) sampling and discriminative model ensemble strategies within a GAN framework, the proposed model adjusts generative and discriminative distributions to simulate a broader range of relevant data. Furthermore, it employs MHLoss and a reparameterized GAN ensemble to enhance stability and accelerate convergence, ultimately leading to improved classification performance on small-sample images and structured datasets. Results confirm that the MhERGAN algorithm developed in this research is highly effective for few-shot learning, offering a practical solution that bridges data scarcity with high-performing model adaptability and generalization.
- [137] arXiv:2411.16591 [pdf, html, other]
-
Title: Adversarial Attacks for Drift DetectionSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Concept drift refers to the change of data distributions over time. While drift poses a challenge for learning models, requiring their continual adaption, it is also relevant in system monitoring to detect malfunctions, system failures, and unexpected behavior. In the latter case, the robust and reliable detection of drifts is imperative. This work studies the shortcomings of commonly used drift detection schemes. We show how to construct data streams that are drifting without being detected. We refer to those as drift adversarials. In particular, we compute all possible adversairals for common detection schemes and underpin our theoretical findings with empirical evaluations.
- [138] arXiv:2411.16615 [pdf, html, other]
-
Title: Graph Pooling with Local Cluster SelectionComments: 10 pages, 4 figuresSubjects: Machine Learning (cs.LG)
Graph poolings in GNNs are a family of operations which take graphs as inputs and produce coarsened graphs as output. Modern graph poolings are trainable and closely related to GNNs, which learn to pool graphs under different assumptions. Though there are various assumptions, the procedure of generating pooled graphs is relatively similar and limited. This work formalizes a novel procedure of pooling graphs, along with a graph pooling approach for average situations.
- [139] arXiv:2411.16644 [pdf, html, other]
-
Title: Exploring Discrete Flow Matching for 3D De Novo Molecule GenerationComments: Presented at the NeurIPS 2024 Machine Learning for Structural Biology WorkshopSubjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Flow matching is a recently proposed generative modeling framework that has achieved impressive performance on a variety of tasks including those on biomolecular structures. The seminal flow matching framework was developed only for continuous data. However, de novo molecular design tasks require generating discrete data such as atomic elements or sequences of amino acid residues. Several discrete flow matching methods have been proposed recently to address this gap. In this work we benchmark the performance of existing discrete flow matching methods for 3D de novo small molecule generation and provide explanations of their differing behavior. As a result we present FlowMol-CTMC, an open-source model that achieves state of the art performance for 3D de novo design with fewer learnable parameters than existing methods. Additionally, we propose the use of metrics that capture molecule quality beyond local chemical valency constraints and towards higher-order structural motifs. These metrics show that even though basic constraints are satisfied, the models tend to produce unusual and potentially problematic functional groups outside of the training data distribution. Code and trained models for reproducing this work are available at \url{this https URL}.
New submissions (showing 139 of 139 entries)
- [140] arXiv:2411.15142 (cross-list from cond-mat.soft) [pdf, html, other]
-
Title: Data-driven Modeling of Granular Chains with Modern Koopman TheorySubjects: Soft Condensed Matter (cond-mat.soft); Machine Learning (cs.LG); Dynamical Systems (math.DS)
Externally driven dense packings of particles can exhibit nonlinear wave phenomena that are not described by effective medium theory or linearized approximate models. Such nontrivial wave responses can be exploited to design sound-focusing/scrambling devices, acoustic filters, and analog computational units. At high amplitude vibrations or low confinement pressures, the effect of nonlinear particle contacts becomes increasingly noticeable, and the interplay of nonlinearity, disorder, and discreteness in the system gives rise to remarkable properties, particularly useful in designing structures with exotic properties. In this paper, we build upon the data-driven methods in dynamical system analysis and show that the Koopman spectral theory can be applied to granular crystals, enabling their phase space analysis beyond the linearizable regime and without recourse to any approximations considered in the previous works. We show that a deep neural network can map the dynamics to a latent space where the essential nonlinearity of the granular system unfolds into a high-dimensional linear space. As a proof of concept, we use data from numerical simulations of a two-particle system and evaluate the accuracy of the trajectory predictions under various initial conditions. By incorporating data from experimental measurements, our proposed framework can directly capture the underlying dynamics without imposing any assumptions about the physics model. Spectral analysis of the trained surrogate system can help bridge the gap between the simulation results and the physical realization of granular crystals and facilitate the inverse design of materials with desired behaviors.
- [141] arXiv:2411.15144 (cross-list from eess.SP) [pdf, other]
-
Title: Physically Parameterized Differentiable MUSIC for DoA Estimation with Uncalibrated ArraysBaptiste Chatelier (INSA Rennes, IETR, MERCE-France), José Miguel Mateos-Ramos, Vincent Corlay (MERCE-France), Christian Häger, Matthieu Crussière (INSA Rennes, IETR), Henk Wymeersch, Luc Le Magoarou (INSA Rennes, IETR)Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
Direction of arrival (DoA) estimation is a common sensing problem in radar, sonar, audio, and wireless communication systems. It has gained renewed importance with the advent of the integrated sensing and communication paradigm. To fully exploit the potential of such sensing systems, it is crucial to take into account potential hardware impairments that can negatively impact the obtained performance. This study introduces a joint DoA estimation and hardware impairment learning scheme following a model-based approach. Specifically, a differentiable version of the multiple signal classification (MUSIC) algorithm is derived, allowing efficient learning of the considered impairments. The proposed approach supports both supervised and unsupervised learning strategies, showcasing its practical potential. Simulation results indicate that the proposed method successfully learns significant inaccuracies in both antenna locations and complex gains. Additionally, the proposed method outperforms the classical MUSIC algorithm in the DoA estimation task.
- [142] arXiv:2411.15145 (cross-list from cs.CY) [pdf, html, other]
-
Title: Opportunities of Reinforcement Learning in South Africa's Just TransitionComments: Accepted at the Southern African Conference for Artificial Intelligence Research 2024Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
South Africa stands at a crucial juncture, grappling with interwoven socio-economic challenges such as poverty, inequality, unemployment, and the looming climate crisis. The government's Just Transition framework aims to enhance climate resilience, achieve net-zero greenhouse gas emissions by 2050, and promote social inclusion and poverty eradication. According to the Presidential Commission on the Fourth Industrial Revolution, artificial intelligence technologies offer significant promise in addressing these challenges. This paper explores the overlooked potential of Reinforcement Learning (RL) in supporting South Africa's Just Transition. It examines how RL can enhance agriculture and land-use practices, manage complex, decentralised energy networks, and optimise transportation and logistics, thereby playing a critical role in achieving a just and equitable transition to a low-carbon future for all South Africans. We provide a roadmap as to how other researchers in the field may be able to contribute to these pressing problems.
- [143] arXiv:2411.15190 (cross-list from cs.CR) [pdf, html, other]
-
Title: Transforming Triple-Entry Accounting with Machine Learning: A Path to Enhanced Transparency Through AnalyticsSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Triple Entry (TE) is an accounting method that utilizes three accounts or 'entries' to record each transaction, rather than the conventional double-entry bookkeeping system. Existing studies have found that TE accounting, with its additional layer of verification and disclosure of inter-organizational relationships, could help improve transparency in complex financial and supply chain transactions such as blockchain. Machine learning (ML) presents a promising avenue to augment the transparency advantages of TE accounting. By automating some of the data collection and analysis needed for TE bookkeeping, ML techniques have the potential to make this more transparent accounting method scalable for large organizations with complex international supply chains, further enhancing the visibility and trustworthiness of financial reporting. By leveraging ML algorithms, anomalies within distributed ledger data can be swiftly identified, flagging potential instances of fraud or errors. Furthermore, by delving into transaction relationships over time, ML can untangle intricate webs of transactions, shedding light on obscured dealings and adding an investigative dimension. This paper aims to demonstrate the interaction between TE and ML and how they can leverage transparency levels.
- [144] arXiv:2411.15195 (cross-list from cs.CL) [pdf, other]
-
Title: Graph Neural Network-Based Entity Extraction and Relationship Reasoning in Complex Knowledge GraphsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This study proposed a knowledge graph entity extraction and relationship reasoning algorithm based on a graph neural network, using a graph convolutional network and graph attention network to model the complex structure in the knowledge graph. By building an end-to-end joint model, this paper achieves efficient recognition and reasoning of entities and relationships. In the experiment, this paper compared the model with a variety of deep learning algorithms and verified its superiority through indicators such as AUC, recall rate, precision rate, and F1 value. The experimental results show that the model proposed in this paper performs well in all indicators, especially in complex knowledge graphs, it has stronger generalization ability and stability. This provides strong support for further research on knowledge graphs and also demonstrates the application potential of graph neural networks in entity extraction and relationship reasoning.
- [145] arXiv:2411.15199 (cross-list from cs.CV) [pdf, html, other]
-
Title: Adaptively Controllable Diffusion Model for Efficient Conditional Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
With the development of artificial intelligence, more and more attention has been put onto generative models, which represent the creativity, a very important aspect of intelligence. In recent years, diffusion models have been studied and proven to be more reasonable and effective than previous methods. However, common diffusion frameworks suffer from controllability problems. Although extra conditions have been considered by some work to guide the diffusion process for a specific target generation, it only controls the generation result but not its process. In this work, we propose a new adaptive framework, $\textit{Adaptively Controllable Diffusion (AC-Diff) Model}$, to automatically and fully control the generation process, including not only the type of generation result but also the length and parameters of the generation process. Both inputs and conditions will be first fed into a $\textit{Conditional Time-Step (CTS) Module}$ to determine the number of steps needed for a generation. Then according to the length of the process, the diffusion rate parameters will be estimated through our $\textit{Adaptive Hybrid Noise Schedule (AHNS) Module}$. We further train the network with the corresponding adaptive sampling mechanism to learn how to adjust itself according to the conditions for the overall performance improvement. To enable its practical applications, AC-Diff is expected to largely reduce the average number of generation steps and execution time while maintaining the same performance as done in the literature diffusion models.
- [146] arXiv:2411.15200 (cross-list from cs.CV) [pdf, html, other]
-
Title: Deep Learning-Based Classification of Hyperkinetic Movement Disorders in ChildrenComments: 59 pages, 20 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Hyperkinetic movement disorders (HMDs) in children, including dystonia (abnormal twisting) and chorea (irregular, random movements), pose significant diagnostic challenges due to overlapping clinical features. The prevalence of dystonia ranges from 2 to 50 per million, and chorea from 5 to 10 per 100,000. These conditions are often diagnosed with delays averaging 4.75 to 7.83 years. Traditional diagnostic methods depend on clinical history and expert physical examinations, but specialized tests are ineffective due to the complex pathophysiology of these disorders. This study develops a neural network model to differentiate between dystonia and chorea from video recordings of paediatric patients performing motor tasks. The model integrates a Graph Convolutional Network (GCN) to capture spatial relationships and Long Short-Term Memory (LSTM) networks to account for temporal dynamics. Attention mechanisms were incorporated to improve model interpretability. The model was trained and validated on a dataset of 50 videos (31 chorea-predominant, 19 dystonia-predominant) collected under regulatory approval from Guy's and St Thomas' NHS Foundation Trust. The model achieved 85% accuracy, 81% sensitivity, and 88% specificity at 15 frames per second. Attention maps highlighted the model's ability to correctly identify involuntary movement patterns, with misclassifications often due to occluded body parts or subtle movement variations. This work demonstrates the potential of deep learning to improve the accuracy and efficiency of HMD diagnosis and could contribute to more reliable, interpretable clinical tools.
- [147] arXiv:2411.15202 (cross-list from physics.ao-ph) [pdf, html, other]
-
Title: A Comparison of Machine Learning Algorithms for Predicting Sea Surface Temperature in the Great Barrier Reef RegionSubjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
Predicting Sea Surface Temperature (SST) in the Great Barrier Reef (GBR) region is crucial for the effective management of its fragile ecosystems. This study provides a rigorous comparative analysis of several machine learning techniques to identify the most effective method for SST prediction in this area. We evaluate the performance of ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest, and Extreme Gradient Boosting (XGBoost) algorithms. Our results reveal that while LASSO and ridge regression perform well, Random Forest and XGBoost significantly outperform them in terms of predictive accuracy, as evidenced by lower Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Prediction Error (RMSPE). Additionally, XGBoost demonstrated superior performance in minimizing Kullback- Leibler Divergence (KLD), indicating a closer alignment of predicted probability distributions with actual observations. These findings highlight the efficacy of using ensemble methods, particularly XGBoost, for predicting sea surface temperatures, making them valuable tools for climatological and environmental modeling.
- [148] arXiv:2411.15207 (cross-list from cs.CV) [pdf, html, other]
-
Title: Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-trainingComments: 15 pages, 2 figures, accepted by BMVC'24Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Recent advancements in vision-language pre-training via contrastive learning have significantly improved performance across computer vision tasks. However, in the medical domain, obtaining multimodal data is often costly and challenging due to privacy, sensitivity, and annotation complexity. To mitigate data scarcity while boosting model performance, we introduce \textbf{Uni-Mlip}, a unified self-supervision framework specifically designed to enhance medical vision-language pre-training. Uni-Mlip seamlessly integrates cross-modality, uni-modality, and fused-modality self-supervision techniques at the data-level and the feature-level. Additionally, Uni-Mlip tailors uni-modal image self-supervision to accommodate the unique characteristics of medical images. Our experiments across datasets of varying scales demonstrate that Uni-Mlip significantly surpasses current state-of-the-art methods in three key downstream tasks: image-text retrieval, image classification, and visual question answering (VQA).
- [149] arXiv:2411.15230 (cross-list from cs.AI) [pdf, html, other]
-
Title: A No Free Lunch Theorem for Human-AI CollaborationSubjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
The gold standard in human-AI collaboration is complementarity -- when combined performance exceeds both the human and algorithm alone. We investigate this challenge in binary classification settings where the goal is to maximize 0-1 accuracy. Given two or more agents who can make calibrated probabilistic predictions, we show a "No Free Lunch"-style result. Any deterministic collaboration strategy (a function mapping calibrated probabilities into binary classifications) that does not essentially always defer to the same agent will sometimes perform worse than the least accurate agent. In other words, complementarity cannot be achieved "for free." The result does suggest one model of collaboration with guarantees, where one agent identifies "obvious" errors of the other agent. We also use the result to understand the necessary conditions enabling the success of other collaboration techniques, providing guidance to human-AI collaboration.
- [150] arXiv:2411.15236 (cross-list from cs.CV) [pdf, html, other]
-
Title: Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention MapsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In text-to-image diffusion models, the cross-attention map of each text token indicates the specific image regions attended. Comparing these maps of syntactically related tokens provides insights into how well the generated image reflects the text prompt. For example, in the prompt, "a black car and a white clock", the cross-attention maps for "black" and "car" should focus on overlapping regions to depict a black car, while "car" and "clock" should not. Incorrect overlapping in the maps generally produces generation flaws such as missing objects and incorrect attribute binding. Our study makes the key observations investigating this issue in the existing text-to-image models:(1) the similarity in text embeddings between different tokens -- used as conditioning inputs -- can cause their cross-attention maps to focus on the same image regions; and (2) text embeddings often fail to faithfully capture syntactic relations already within text attention maps. As a result, such syntactic relationships can be overlooked in cross-attention module, leading to inaccurate image generation. To address this, we propose a method that directly transfers syntactic relations from the text attention maps to the cross-attention module via a test-time optimization. Our approach leverages this inherent yet unexploited information within text attention maps to enhance image-text semantic alignment across diverse prompts, without relying on external guidance.
- [151] arXiv:2411.15237 (cross-list from cs.CV) [pdf, other]
-
Title: Stain-Invariant Representation for Tissue Classification in Histology ImagesJournal-ref: In 27th Conference on Medical Image Understanding and Analysis 2023 (p. 242)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The process of digitising histology slides involves multiple factors that can affect a whole slide image's (WSI) final appearance, including the staining protocol, scanner, and tissue type. This variability constitutes a domain shift and results in significant problems when training and testing deep learning (DL) algorithms in multi-cohort settings. As such, developing robust and generalisable DL models in computational pathology (CPath) remains an open challenge. In this regard, we propose a framework that generates stain-augmented versions of the training images using stain matrix perturbation. Thereafter, we employed a stain regularisation loss to enforce consistency between the feature representations of the source and augmented images. Doing so encourages the model to learn stain-invariant and, consequently, domain-invariant feature representations. We evaluate the performance of the proposed model on cross-domain multi-class tissue type classification of colorectal cancer images and have achieved improved performance compared to other state-of-the-art methods.
- [152] arXiv:2411.15253 (cross-list from eess.IV) [pdf, other]
-
Title: Unsupervised Machine Learning for Osteoporosis Diagnosis Using Singh Index Clustering on Hip RadiographsVimaladevi Madhivanan, Kalavakonda Vijaya, Abhay Lal, Senthil Rithika, Shamala Karupusamy Subramaniam, Mohamed SameerSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Osteoporosis, a prevalent condition among the aging population worldwide, is characterized by diminished bone mass and altered bone structure, increasing susceptibility to fractures. It poses a significant and growing global public health challenge over the next decade. Diagnosis typically involves Dual-energy X-ray absorptiometry to measure bone mineral density, yet its mass screening utility is limited. The Singh Index (SI) provides a straightforward, semi-quantitative means of osteoporosis diagnosis through plain hip radiographs, assessing trabecular patterns in the proximal femur. Although cost-effective and accessible, manual SI calculation is time-intensive and requires expertise. This study aims to automate SI identification from radiographs using machine learning algorithms. An unlabelled dataset of 838 hip X-ray images from Indian adults aged 20-70 was utilized. A custom convolutional neural network architecture was developed for feature extraction, demonstrating superior performance in cluster homogeneity and heterogeneity compared to established models. Various clustering algorithms categorized images into six SI grade clusters, with comparative analysis revealing only two clusters with high Silhouette Scores for promising classification. Further scrutiny highlighted dataset imbalance and emphasized the importance of image quality and additional clinical data availability. The study suggests augmenting X-ray images with patient clinical data and reference images, alongside image pre-processing techniques, to enhance diagnostic accuracy. Additionally, exploring semi-supervised and self-supervised learning methods may mitigate labelling challenges associated with large datasets.
- [153] arXiv:2411.15255 (cross-list from eess.IV) [pdf, html, other]
-
Title: OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure CorrectionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Exposure correction is a fundamental problem in computer vision and image processing. Recently, frequency domain-based methods have achieved impressive improvement, yet they still struggle with complex real-world scenarios under extreme exposure conditions. This is due to the local convolutional receptive fields failing to model long-range dependencies in the spectrum, and the non-generative learning paradigm being inadequate for retrieving lost details from severely degraded regions. In this paper, we propose Omnidirectional Spectral Mamba (OSMamba), a novel exposure correction network that incorporates the advantages of state space models and generative diffusion models to address these limitations. Specifically, OSMamba introduces an omnidirectional spectral scanning mechanism that adapts Mamba to the frequency domain to capture comprehensive long-range dependencies in both the amplitude and phase spectra of deep image features, hence enhancing illumination correction and structure recovery. Furthermore, we develop a dual-domain prior generator that learns from well-exposed images to generate a degradation-free diffusion prior containing correct information about severely under- and over-exposed regions for better detail restoration. Extensive experiments on multiple-exposure and mixed-exposure datasets demonstrate that the proposed OSMamba achieves state-of-the-art performance both quantitatively and qualitatively.
- [154] arXiv:2411.15265 (cross-list from cs.CV) [pdf, html, other]
-
Title: Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAIComments: 19 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Gradient-based methods are a prototypical family of explainability techniques, especially for image-based models. Nonetheless, they have several shortcomings in that they (1) require white-box access to models, (2) are vulnerable to adversarial attacks, and (3) produce attributions that lie off the image manifold, leading to explanations that are not actually faithful to the model and do not align well with human perception. To overcome these challenges, we introduce Derivative-Free Diffusion Manifold-Constrainted Gradients (FreeMCG), a novel method that serves as an improved basis for explainability of a given neural network than the traditional gradient. Specifically, by leveraging ensemble Kalman filters and diffusion models, we derive a derivative-free approximation of the model's gradient projected onto the data manifold, requiring access only to the model's outputs. We demonstrate the effectiveness of FreeMCG by applying it to both counterfactual generation and feature attribution, which have traditionally been treated as distinct tasks. Through comprehensive evaluation on both tasks, counterfactual explanation and feature attribution, we show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
- [155] arXiv:2411.15267 (cross-list from stat.ML) [pdf, html, other]
-
Title: Proportional infinite-width infinite-depth limit for deep linear neural networksSubjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Probability (math.PR)
We study the distributional properties of linear neural networks with random parameters in the context of large networks, where the number of layers diverges in proportion to the number of neurons per layer. Prior works have shown that in the infinite-width regime, where the number of neurons per layer grows to infinity while the depth remains fixed, neural networks converge to a Gaussian process, known as the Neural Network Gaussian Process. However, this Gaussian limit sacrifices descriptive power, as it lacks the ability to learn dependent features and produce output correlations that reflect observed labels. Motivated by these limitations, we explore the joint proportional limit in which both depth and width diverge but maintain a constant ratio, yielding a non-Gaussian distribution that retains correlations between outputs. Our contribution extends previous works by rigorously characterizing, for linear activation functions, the limiting distribution as a nontrivial mixture of Gaussians.
- [156] arXiv:2411.15269 (cross-list from eess.IV) [pdf, html, other]
-
Title: MambaIRv2: Attentive State Space RestorationComments: Technical reportSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration. In this work, we propose MambaIRv2, which equips Mamba with the non-causal modeling ability similar to ViTs to reach the attentive state space restoration model. Specifically, the proposed attentive state-space equation allows to attend beyond the scanned sequence and facilitate image unfolding with just one single scan. Moreover, we further introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show our MambaIRv2 outperforms SRFormer by \textbf{even 0.35dB} PSNR for lightweight SR even with \textbf{9.3\% less} parameters and suppresses HAT on classic SR by \textbf{up to 0.29dB}. Code is available at \url{this https URL}.
- [157] arXiv:2411.15270 (cross-list from cs.CL) [pdf, html, other]
-
Title: BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation TechniquesComments: Accepted in ACAI 2024Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Sentence-level embedding is essential for various tasks that require understanding natural language. Many studies have explored such embeddings for high-resource languages like English. However, low-resource languages like Bengali (a language spoken by almost two hundred and thirty million people) are still under-explored. This work introduces two lightweight sentence transformers for the Bangla language, leveraging a novel cross-lingual knowledge distillation approach. This method distills knowledge from a pre-trained, high-performing English sentence transformer. Proposed models are evaluated across multiple downstream tasks, including paraphrase detection, semantic textual similarity (STS), and Bangla hate speech detection. The new method consistently outperformed existing Bangla sentence transformers. Moreover, the lightweight architecture and shorter inference time make the models highly suitable for deployment in resource-constrained environments, making them valuable for practical NLP applications in low-resource languages.
- [158] arXiv:2411.15284 (cross-list from cs.CV) [pdf, other]
-
Title: When Spatial meets Temporal in Action RecognitionComments: Research reportSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Video action recognition has made significant strides, but challenges remain in effectively using both spatial and temporal information. While existing methods often focus on either spatial features (e.g., object appearance) or temporal dynamics (e.g., motion), they rarely address the need for a comprehensive integration of both. Capturing the rich temporal evolution of video frames, while preserving their spatial details, is crucial for improving accuracy. In this paper, we introduce the Temporal Integration and Motion Enhancement (TIME) layer, a novel preprocessing technique designed to incorporate temporal information. The TIME layer generates new video frames by rearranging the original sequence, preserving temporal order while embedding $N^2$ temporally evolving frames into a single spatial grid of size $N \times N$. This transformation creates new frames that balance both spatial and temporal information, making them compatible with existing video models. When $N=1$, the layer captures rich spatial details, similar to existing methods. As $N$ increases ($N\geq2$), temporal information becomes more prominent, while the spatial information decreases to ensure compatibility with model inputs. We demonstrate the effectiveness of the TIME layer by integrating it into popular action recognition models, such as ResNet-50, Vision Transformer, and Video Masked Autoencoders, for both RGB and depth video data. Our experiments show that the TIME layer enhances recognition accuracy, offering valuable insights for video processing tasks.
- [159] arXiv:2411.15306 (cross-list from math.ST) [pdf, html, other]
-
Title: Heavy-tailed Contamination is Easier than Adversarial ContaminationSubjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
A large body of work in the statistics and computer science communities dating back to Huber (Huber, 1960) has led to statistically and computationally efficient outlier-robust estimators. Two particular outlier models have received significant attention: the adversarial and heavy-tailed models. While the former models outliers as the result of a malicious adversary manipulating the data, the latter relaxes distributional assumptions on the data allowing outliers to naturally occur as part of the data generating process. In the first setting, the goal is to develop estimators robust to the largest fraction of outliers while in the second, one seeks estimators to combat the loss of statistical efficiency, where the dependence on the failure probability is paramount.
Despite these distinct motivations, the algorithmic approaches to both these settings have converged, prompting questions on the relationship between the models. In this paper, we investigate and provide a principled explanation for this phenomenon. First, we prove that any adversarially robust estimator is also resilient to heavy-tailed outliers for any statistical estimation problem with i.i.d data. As a corollary, optimal adversarially robust estimators for mean estimation, linear regression, and covariance estimation are also optimal heavy-tailed estimators. Conversely, for arguably the simplest high-dimensional estimation task of mean estimation, we construct heavy-tailed estimators whose application to the adversarial setting requires any black-box reduction to remove almost all the outliers in the data. Taken together, our results imply that heavy-tailed estimation is likely easier than adversarially robust estimation opening the door to novel algorithmic approaches for the heavy-tailed setting. Additionally, confidence intervals obtained for adversarially robust estimation also hold with high-probability. - [160] arXiv:2411.15315 (cross-list from quant-ph) [pdf, html, other]
-
Title: Lie-Equivariant Quantum Graph Neural NetworksJogi Suda Neto, Roy T. Forestano, Sergei Gleyzer, Kyoungchul Kong, Konstantin T. Matchev, Katia MatchevaComments: 10 pages, 5 figures, accepted to the Machine Learning with New Compute Paradigms (MLNCP) Workshop at NeurIPS 2024Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph)
Discovering new phenomena at the Large Hadron Collider (LHC) involves the identification of rare signals over conventional backgrounds. Thus binary classification tasks are ubiquitous in analyses of the vast amounts of LHC data. We develop a Lie-Equivariant Quantum Graph Neural Network (Lie-EQGNN), a quantum model that is not only data efficient, but also has symmetry-preserving properties. Since Lorentz group equivariance has been shown to be beneficial for jet tagging, we build a Lorentz-equivariant quantum GNN for quark-gluon jet discrimination and show that its performance is on par with its classical state-of-the-art counterpart LorentzNet, making it a viable alternative to the conventional computing paradigm.
- [161] arXiv:2411.15350 (cross-list from cs.RO) [pdf, html, other]
-
Title: Dynamic Tube MPC: Learning Tube Dynamics with Massively Parallel Simulation for Robust Safety in PracticeComments: Submitted to ICRA 2025Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Safe navigation of cluttered environments is a critical challenge in robotics. It is typically approached by separating the planning and tracking problems, with planning executed on a reduced order model to generate reference trajectories, and control techniques used to track these trajectories on the full order dynamics. Inevitable tracking error necessitates robustification of the nominal plan to ensure safety; in many cases, this is accomplished via worst-case bounding, which ignores the fact that some trajectories of the planning model may be easier to track than others. In this work, we present a novel method leveraging massively parallel simulation to learn a dynamic tube representation, which characterizes tracking performance as a function of actions taken by the planning model. Planning model trajectories are then optimized such that the dynamic tube lies in the free space, allowing a balance between performance and safety to be traded off in real time. The resulting Dynamic Tube MPC is applied to the 3D hopping robot ARCHER, enabling agile and performant navigation of cluttered environments, and safe collision-free traversal of narrow corridors.
- [162] arXiv:2411.15351 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]
-
Title: Accelerating CALPHAD-based Phase Diagram Predictions in Complex Alloys Using Universal Machine Learning Potentials: Opportunities and ChallengesComments: 11 pages, 8 figuresSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Accurate phase diagram prediction is crucial for understanding alloy thermodynamics and advancing materials design. While traditional CALPHAD methods are robust, they are resource-intensive and limited by experimentally assessed data. This work explores the use of machine learning interatomic potentials (MLIPs) such as M3GNet, CHGNet, MACE, SevenNet, and ORB to significantly accelerate phase diagram calculations by using the Alloy Theoretic Automated Toolkit (ATAT) to map calculations of the energies and free energies of atomistic systems to CALPHAD-compatible thermodynamic descriptions. Using case studies including Cr-Mo, Cu-Au, and Pt-W, we demonstrate that MLIPs, particularly ORB, achieve computational speedups exceeding three orders of magnitude compared to DFT while maintaining phase stability predictions within acceptable accuracy. Extending this approach to liquid phases and ternary systems like Cr-Mo-V highlights its versatility for high-entropy alloys and complex chemical spaces. This work demonstrates that MLIPs, integrated with tools like ATAT within a CALPHAD framework, provide an efficient and accurate framework for high-throughput thermodynamic modeling, enabling rapid exploration of novel alloy systems. While many challenges remain to be addressed, the accuracy of some of these MLIPs (ORB in particular) are on the verge of paving the way toward high-throughput generation of CALPHAD thermodynamic descriptions of multi-component, multi-phase alloy systems.
- [163] arXiv:2411.15364 (cross-list from cs.DS) [pdf, html, other]
-
Title: Exploring Facets of Language Generation in the LimitComments: 24 pagesSubjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
The recent work of Kleinberg and Mullainathan [KM24] provides a concrete model for language generation in the limit: given a sequence of examples from an unknown target language, the goal is to generate new examples from the target language such that no incorrect examples are generated beyond some point. In sharp contrast to strong negative results for the closely related problem of language identification, they establish positive results for language generation in the limit for all countable collections of languages. Follow-up work by Raman and Tewari [RT24] studies bounds on the number of distinct inputs required by an algorithm before correct language generation is achieved -- namely, whether this is a constant for all languages in the collection (uniform generation) or a language-dependent constant (non-uniform generation).
We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. However, while the generation algorithm of [KM24] can be implemented using membership queries, we show that any algorithm cannot non-uniformly generate even for collections of just two languages, using only membership queries.
We also formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation, and show a strong negative result for exhaustive generation. Our result shows that a tradeoff between validity and breadth is inherent for generation in the limit. Finally, inspired by algorithms that can choose to obtain feedback, we consider a model of uniform generation with feedback, completely characterizing language collections for which such uniform generation with feedback is possible in terms of a complexity measure of the collection. - [164] arXiv:2411.15368 (cross-list from cs.SE) [pdf, html, other]
-
Title: The Power of Types: Exploring the Impact of Type Checking on Neural Bug Detection in Dynamically Typed LanguagesComments: Accepted by ICSE'25 Research TrackSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Programming Languages (cs.PL)
Motivation: Automated bug detection in dynamically typed languages such as Python is essential for maintaining code quality. The lack of mandatory type annotations in such languages can lead to errors that are challenging to identify early with traditional static analysis tools. Recent progress in deep neural networks has led to increased use of neural bug detectors. In statically typed languages, a type checker is integrated into the compiler and thus taken into consideration when the neural bug detector is designed for these languages.
Problem: However, prior studies overlook this aspect during the training and testing of neural bug detectors for dynamically typed languages. When an optional type checker is used, assessing existing neural bug detectors on bugs easily detectable by type checkers may impact their performance estimation. Moreover, including these bugs in the training set of neural bug detectors can shift their detection focus toward the wrong type of bugs.
Contribution: We explore the impact of type checking on various neural bug detectors for variable misuse bugs, a common type targeted by neural bug detectors. Existing synthetic and real-world datasets are type-checked to evaluate the prevalence of type-related bugs. Then, we investigate how type-related bugs influence the training and testing of the neural bug detectors.
Findings: Our findings indicate that existing bug detection datasets contain a significant proportion of type-related bugs. Building on this insight, we discover integrating the neural bug detector with a type checker can be beneficial, especially when the code is annotated with types. Further investigation reveals neural bug detectors perform better on type-related bugs than other bugs. Moreover, removing type-related bugs from the training data helps improve neural bug detectors' ability to identify bugs beyond the scope of type checkers. - [165] arXiv:2411.15372 (cross-list from cs.CL) [pdf, html, other]
-
Title: Transforming NLU with Babylon: A Case Study in Development of Real-time, Edge-Efficient, Multi-Intent Translation System for Automated Drive-Thru OrderingComments: 12 pages, 3 figures, 2 tablesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Real-time conversational AI agents face challenges in performing Natural Language Understanding (NLU) in dynamic, outdoor environments like automated drive-thru systems. These settings require NLU models to handle background noise, diverse accents, and multi-intent queries while operating under strict latency and memory constraints on edge devices. Additionally, robustness to errors from upstream Automatic Speech Recognition (ASR) is crucial, as ASR outputs in these environments are often noisy. We introduce Babylon, a transformer-based architecture that tackles NLU as an intent translation task, converting natural language inputs into sequences of regular language units ('transcodes') that encode both intents and slot information. This formulation allows Babylon to manage multi-intent scenarios in a single dialogue turn. Furthermore, Babylon incorporates an LSTM-based token pooling mechanism to preprocess phoneme sequences, reducing input length and optimizing for low-latency, low-memory edge deployment. This also helps mitigate inaccuracies in ASR outputs, enhancing system robustness. While this work focuses on drive-thru ordering, Babylon's design extends to similar noise-prone scenarios, for e.g. ticketing kiosks. Our experiments show that Babylon achieves significantly better accuracy-latency-memory footprint trade-offs over typically employed NMT models like Flan-T5 and BART, demonstrating its effectiveness for real-time NLU in edge deployment settings.
- [166] arXiv:2411.15378 (cross-list from eess.IV) [pdf, html, other]
-
Title: Improved Background Estimation for Gas Plume Identification in Hyperspectral ImagesComments: 13 pages, 10 figures, submitted and under review to IEEE Transactions on Geoscience and Remote SensingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Longwave infrared (LWIR) hyperspectral imaging can be used for many tasks in remote sensing, including detecting and identifying effluent gases by LWIR sensors on airborne platforms. Once a potential plume has been detected, it needs to be identified to determine exactly what gas or gases are present in the plume. During identification, the background underneath the plume needs to be estimated and removed to reveal the spectral characteristics of the gas of interest. Current standard practice is to use ``global" background estimation, where the average of all non-plume pixels is used to estimate the background for each pixel in the plume. However, if this global background estimate does not model the true background under the plume well, then the resulting signal can be difficult to identify correctly. The importance of proper background estimation increases when dealing with weak signals, large libraries of gases of interest, and with uncommon or heterogeneous backgrounds. In this paper, we propose two methods of background estimation, in addition to three existing methods, and compare each against global background estimation to determine which perform best at estimating the true background radiance under a plume, and for increasing identification confidence using a neural network classification model. We compare the different methods using 640 simulated plumes. We find that PCA is best at estimating the true background under a plume, with a median of 18,000 times less MSE compared to global background estimation. Our proposed K-Nearest Segments algorithm improves median neural network identification confidence by 53.2%.
- [167] arXiv:2411.15386 (cross-list from cs.AI) [pdf, html, other]
-
Title: Inducing Human-like Biases in Moral Reasoning Language ModelsComments: Accepted to the 2nd Workshop on Unifying Representations in Neural Models (UniReps) at NeurIPS 2024Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
In this work, we study the alignment (BrainScore) of large language models (LLMs) fine-tuned for moral reasoning on behavioral data and/or brain data of humans performing the same task. We also explore if fine-tuning several LLMs on the fMRI data of humans performing moral reasoning can improve the BrainScore. We fine-tune several LLMs (BERT, RoBERTa, DeBERTa) on moral reasoning behavioral data from the ETHICS benchmark [Hendrycks et al., 2020], on the moral reasoning fMRI data from Koster-Hale et al. [2013], or on both. We study both the accuracy on the ETHICS benchmark and the BrainScores between model activations and fMRI data. While larger models generally performed better on both metrics, BrainScores did not significantly improve after fine-tuning.
- [168] arXiv:2411.15388 (cross-list from cs.CV) [pdf, html, other]
-
Title: A Constrast-Agnostic Method for Ultra-High Resolution Claustrum SegmentationChiara Mauri, Ryan Fritz, Jocelyn Mora, Benjamin Billot, Juan Eugenio Iglesias, Koen Van Leemput, Jean Augustinack, Douglas N GreveComments: 14 pages, 10 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
The claustrum is a band-like gray matter structure located between putamen and insula whose exact functions are still actively researched. Its sheet-like structure makes it barely visible in in vivo Magnetic Resonance Imaging (MRI) scans at typical resolutions and neuroimaging tools for its study, including methods for automatic segmentation, are currently very limited. In this paper, we propose a contrast- and resolution-agnostic method for claustrum segmentation at ultra-high resolution (0.35 mm isotropic); the method is based on the SynthSeg segmentation framework (Billot et al., 2023), which leverages the use of synthetic training intensity images to achieve excellent generalization. In particular, SynthSeg requires only label maps to be trained, since corresponding intensity images are synthesized on the fly with random contrast and resolution. We trained a deep learning network for automatic claustrum segmentation, using claustrum manual labels obtained from 18 ultra-high resolution MRI scans (mostly ex vivo). We demonstrated the method to work on these 18 high resolution cases (Dice score = 0.632, mean surface distance = 0.458 mm, and volumetric similarity = 0.867 using 6-fold Cross Validation (CV)), and also on in vivo T1-weighted MRI scans at typical resolutions (~1 mm isotropic). We also demonstrated that the method is robust in a test-retest setting and when applied to multimodal imaging (T2-weighted, Proton Density and quantitative T1 scans). To the best of our knowledge this is the first accurate method for automatic ultra-high resolution claustrum segmentation, which is robust against changes in contrast and resolution. The method is released at this https URL and as part of the neuroimaging package Freesurfer (Fischl, 2012).
- [169] arXiv:2411.15399 (cross-list from cs.PF) [pdf, html, other]
-
Title: Less is More: Optimizing Function Calling for LLM Execution on Edge DevicesComments: Accepted at DATE 2025Subjects: Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
The advanced function-calling capabilities of foundation models open up new possibilities for deploying agents to perform complex API tasks. However, managing large amounts of data and interacting with numerous APIs makes function calling hardware-intensive and costly, especially on edge devices. Current Large Language Models (LLMs) struggle with function calling at the edge because they cannot handle complex inputs or manage multiple tools effectively. This results in low task-completion accuracy, increased delays, and higher power consumption. In this work, we introduce Less-is-More, a novel fine-tuning-free function-calling scheme for dynamic tool selection. Our approach is based on the key insight that selectively reducing the number of tools available to LLMs significantly improves their function-calling performance, execution time, and power efficiency on edge devices. Experimental results with state-of-the-art LLMs on edge hardware show agentic success rate improvements, with execution time reduced by up to 70% and power consumption by up to 40%.
- [170] arXiv:2411.15405 (cross-list from cs.CL) [pdf, other]
-
Title: ML-SPEAK: A Theory-Guided Machine Learning Method for Studying and Predicting Conversational Turn-taking PatternsComments: 64 pages, 9 figuresSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Predicting team dynamics from personality traits remains a fundamental challenge for the psychological sciences and team-based organizations. Understanding how team composition generates team processes can significantly advance team-based research along with providing practical guidelines for team staffing and training. Although the Input-Process-Output (IPO) model has been useful for studying these connections, the complex nature of team member interactions demands a more dynamic approach. We develop a computational model of conversational turn-taking within self-organized teams that can provide insight into the relationships between team member personality traits and team communication dynamics. We focus on turn-taking patterns between team members, independent of content, which can significantly influence team emergent states and outcomes while being objectively measurable and quantifiable. As our model is trained on conversational data from teams of given trait compositions, it can learn the relationships between individual traits and speaking behaviors and predict group-wide patterns of communication based on team trait composition alone. We first evaluate the performance of our model using simulated data and then apply it to real-world data collected from self-organized student teams. In comparison to baselines, our model is more accurate at predicting speaking turn sequences and can reveal new relationships between team member traits and their communication patterns. Our approach offers a more data-driven and dynamic understanding of team processes. By bridging the gap between individual personality traits and team communication patterns, our model has the potential to inform theories of team processes and provide powerful insights into optimizing team staffing and training.
- [171] arXiv:2411.15418 (cross-list from q-bio.BM) [pdf, html, other]
-
Title: SPRINT Enables Interpretable and Ultra-Fast Virtual Screening against Thousands of ProteomesAndrew T. McNutt, Abhinav K. Adduri, Caleb N. Ellington, Monica T. Dayao, Eric P. Xing, Hosein Mohimani, David R. KoesComments: Machine Learning for Structural Biology Workshop, NeurIPS 2024Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Virtual screening of small molecules against protein targets can accelerate drug discovery and development by predicting drug-target interactions (DTIs). However, structure-based methods like molecular docking are too slow to allow for broad proteome-scale screens, limiting their application in screening for off-target effects or new molecular mechanisms. Recently, vector-based methods using protein language models (PLMs) have emerged as a complementary approach that bypasses explicit 3D structure modeling. Here, we develop SPRINT, a vector-based approach for screening entire chemical libraries against whole proteomes for DTIs and novel mechanisms of action. SPRINT improves on prior work by using a self-attention based architecture and structure-aware PLMs to learn drug-target co-embeddings for binder prediction, search, and retrieval. SPRINT achieves SOTA enrichment factors in virtual screening on LIT-PCBA and DTI classification benchmarks, while providing interpretability in the form of residue-level attention maps. In addition to being both accurate and interpretable, SPRINT is ultra-fast: querying the whole human proteome against the ENAMINE Real Database (6.7B drugs) for the 100 most likely binders per protein takes 16 minutes. SPRINT promises to enable virtual screening at an unprecedented scale, opening up new opportunities for in silico drug repurposing and development. SPRINT is available on the web as ColabScreen: this https URL
- [172] arXiv:2411.15477 (cross-list from cs.CL) [pdf, html, other]
-
Title: Towards Robust Evaluation of Unlearning in LLMs via Data TransformationsAbhinav Joshi, Shaswati Saha, Divyaksh Shukla, Sriram Vema, Harsh Jhamtani, Manas Gaur, Ashutosh ModiComments: Accepted at EMNLP 2024 Findings; 21 pages (5 page main content + references + appendix)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Large Language Models (LLMs) have shown to be a great success in a wide range of applications ranging from regular NLP-based use cases to AI agents. LLMs have been trained on a vast corpus of texts from various sources; despite the best efforts during the data pre-processing stage while training the LLMs, they may pick some undesirable information such as personally identifiable information (PII). Consequently, in recent times research in the area of Machine Unlearning (MUL) has become active, the main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering from performance loss on regular tasks. In this work, we examine the robustness of the existing MUL techniques for their ability to enable leakage-proof forgetting in LLMs. In particular, we examine the effect of data transformation on forgetting, i.e., is an unlearned LLM able to recall forgotten information if there is a change in the format of the input? Our findings on the TOFU dataset highlight the necessity of using diverse data formats to quantify unlearning in LLMs more reliably.
- [173] arXiv:2411.15490 (cross-list from cs.CV) [pdf, html, other]
-
Title: Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain AugmentationJunhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul YeSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.
- [174] arXiv:2411.15519 (cross-list from q-fin.RM) [pdf, html, other]
-
Title: Risk Management with Feature-Enriched Generative Adversarial Networks (FE-GAN)Subjects: Risk Management (q-fin.RM); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper investigates the application of Feature-Enriched Generative Adversarial Networks (FE-GAN) in financial risk management, with a focus on improving the estimation of Value at Risk (VaR) and Expected Shortfall (ES). FE-GAN enhances existing GANs architectures by incorporating an additional input sequence derived from preceding data to improve model performance. Two specialized GANs models, the Wasserstein Generative Adversarial Network (WGAN) and the Tail Generative Adversarial Network (Tail-GAN), were evaluated under the FE-GAN framework. The results demonstrate that FE-GAN significantly outperforms traditional architectures in both VaR and ES estimation. Tail-GAN, leveraging its task-specific loss function, consistently outperforms WGAN in ES estimation, while both models exhibit similar performance in VaR estimation. Despite these promising results, the study acknowledges limitations, including reliance on highly correlated temporal data and restricted applicability to other domains. Future research directions include exploring alternative input generation methods, dynamic forecasting models, and advanced neural network architectures to further enhance GANs-based financial risk estimation.
- [175] arXiv:2411.15540 (cross-list from cs.CV) [pdf, html, other]
-
Title: Optical-Flow Guided Prompt Optimization for Coherent Video GenerationComments: project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
While text-to-video diffusion models have made significant strides, many still face challenges in generating videos with temporal consistency. Within diffusion frameworks, guidance techniques have proven effective in enhancing output quality during inference; however, applying these methods to video diffusion models introduces additional complexity of handling computations across entire sequences. To address this, we propose a novel framework called MotionPrompt that guides the video generation process via optical flow. Specifically, we train a discriminator to distinguish optical flow between random pairs of frames from real videos and generated ones. Given that prompts can influence the entire video, we optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs. This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content. We demonstrate the effectiveness of our approach across various models.
- [176] arXiv:2411.15557 (cross-list from cs.CV) [pdf, html, other]
-
Title: LAGUNA: LAnguage Guided UNsupervised Adaptation with structured spacesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Unsupervised domain adaptation remains a critical challenge in enabling the knowledge transfer of models across unseen domains. Existing methods struggle to balance the need for domain-invariant representations with preserving domain-specific features, which is often due to alignment approaches that impose the projection of samples with similar semantics close in the latent space despite their drastic domain differences. We introduce \mnamelong, a novel approach that shifts the focus from aligning representations in absolute coordinates to aligning the relative positioning of equivalent concepts in latent spaces. \mname defines a domain-agnostic structure upon the semantic/geometric relationships between class labels in language space and guides adaptation, ensuring that the organization of samples in visual space reflects reference inter-class relationships while preserving domain-specific characteristics. %We empirically demonstrate \mname's superiority in domain adaptation tasks across four diverse images and video datasets. Remarkably, \mname surpasses previous works in 18 different adaptation scenarios across four diverse image and video datasets with average accuracy improvements of +3.32% on DomainNet, +5.75% in GeoPlaces, +4.77% on GeoImnet, and +1.94% mean class accuracy improvement on EgoExo4D.
- [177] arXiv:2411.15584 (cross-list from cs.CV) [pdf, html, other]
-
Title: FLD+: Data-efficient Evaluation Metric for Generative ModelsComments: 13 pages, 10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
We introduce a new metric to assess the quality of generated images that is more reliable, data-efficient, compute-efficient, and adaptable to new domains than the previous metrics, such as Fréchet Inception Distance (FID). The proposed metric is based on normalizing flows, which allows for the computation of density (exact log-likelihood) of images from any domain. Thus, unlike FID, the proposed Flow-based Likelihood Distance Plus (FLD+) metric exhibits strongly monotonic behavior with respect to different types of image degradations, including noise, occlusion, diffusion steps, and generative model size. Additionally, because normalizing flow can be trained stably and efficiently, FLD+ achieves stable results with two orders of magnitude fewer images than FID (which requires more images to reliably compute Fréchet distance between features of large samples of real and generated images). We made FLD+ computationally even more efficient by applying normalizing flows to features extracted in a lower-dimensional latent space instead of using a pre-trained network. We also show that FLD+ can easily be retrained on new domains, such as medical images, unlike the networks behind previous metrics -- such as InceptionNetV3 pre-trained on ImageNet.
- [178] arXiv:2411.15602 (cross-list from cs.CV) [pdf, html, other]
-
Title: Enhancing Object Detection Accuracy in Autonomous Vehicles Using Synthetic DataComments: 7 Pages, 7 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The rapid progress in machine learning models has significantly boosted the potential for real-world applications such as autonomous vehicles, disease diagnoses, and recognition of emergencies. The performance of many machine learning models depends on the nature and size of the training data sets. These models often face challenges due to the scarcity, noise, and imbalance in real-world data, limiting their performance. Nonetheless, high-quality, diverse, relevant and representative training data is essential to build accurate and reliable machine learning models that adapt well to real-world scenarios.
It is hypothesised that well-designed synthetic data can improve the performance of a machine learning algorithm. This work aims to create a synthetic dataset and evaluate its effectiveness to improve the prediction accuracy of object detection systems. This work considers autonomous vehicle scenarios as an illustrative example to show the efficacy of synthetic data. The effectiveness of these synthetic datasets in improving the performance of state-of-the-art object detection models is explored. The findings demonstrate that incorporating synthetic data improves model performance across all performance matrices.
Two deep learning systems, System-1 (trained on real-world data) and System-2 (trained on a combination of real and synthetic data), are evaluated using the state-of-the-art YOLO model across multiple metrics, including accuracy, precision, recall, and mean average precision. Experimental results revealed that System-2 outperformed System-1, showing a 3% improvement in accuracy, along with superior performance in all other metrics. - [179] arXiv:2411.15618 (cross-list from q-bio.BM) [pdf, html, other]
-
Title: Accelerated Hydration Site Localization and Thermodynamic ProfilingSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Water plays a fundamental role in the structure and function of proteins and other biomolecules. The thermodynamic profile of water molecules surrounding a protein are critical for ligand binding and recognition. Therefore, identifying the location and thermodynamic behavior of relevant water molecules is important for generating and optimizing lead compounds for affinity and selectivity to a given target. Computational methods have been developed to identify these hydration sites, but are largely limited to simplified models that fail to capture multi-body interactions, or dynamics-based methods that rely on extensive sampling. Here we present a method for fast and accurate localization and thermodynamic profiling of hydration sites for protein structures. The method is based on a geometric deep neural network trained on a large, novel dataset of explicit water molecular dynamics simulations. We confirm the accuracy and robustness of our model on experimental data and demonstrate it's utility on several case studies.
- [180] arXiv:2411.15624 (cross-list from stat.ML) [pdf, html, other]
-
Title: Trans-Glasso: A Transfer Learning Approach to Precision Matrix EstimationComments: 49 pages, 7 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Precision matrix estimation is essential in various fields, yet it is challenging when samples for the target study are limited. Transfer learning can enhance estimation accuracy by leveraging data from related source studies. We propose Trans-Glasso, a two-step transfer learning method for precision matrix estimation. First, we obtain initial estimators using a multi-task learning objective that captures shared and unique features across studies. Then, we refine these estimators through differential network estimation to adjust for structural differences between the target and source precision matrices. Under the assumption that most entries of the target precision matrix are shared with source matrices, we derive non-asymptotic error bounds and show that Trans-Glasso achieves minimax optimality under certain conditions. Extensive simulations demonstrate Trans Glasso's superior performance compared to baseline methods, particularly in small-sample settings. We further validate Trans-Glasso in applications to gene networks across brain tissues and protein networks for various cancer subtypes, showcasing its effectiveness in biological contexts. Additionally, we derive the minimax optimal rate for differential network estimation, representing the first such guarantee in this area.
- [181] arXiv:2411.15643 (cross-list from eess.SY) [pdf, html, other]
-
Title: On the Boundary Feasibility for PDE Control with Neural OperatorsComments: 27 pages, 5 figures, 8 tablesSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
The physical world dynamics are generally governed by underlying partial differential equations (PDEs) with unknown analytical forms in science and engineering problems. Neural network based data-driven approaches have been heavily studied in simulating and solving PDE problems in recent years, but it is still challenging to move forward from understanding to controlling the unknown PDE dynamics. PDE boundary control instantiates a simplified but important problem by only focusing on PDE boundary conditions as the control input and output. However, current model-free PDE controllers cannot ensure the boundary output satisfies some given user-specified safety constraint. To this end, we propose a safety filtering framework to guarantee the boundary output stays within the safe set for current model-free controllers. Specifically, we first introduce a general neural boundary control barrier function (BCBF) to ensure the feasibility of the trajectorywise constraint satisfaction of boundary output. Based on a neural operator modeling the transfer function from boundary control input to output trajectories, we show that the change in the BCBF depends linearly on the change in input boundary, so quadratic programming-based safety filtering can be done for pre-trained model-free controllers. Extensive experiments under challenging hyperbolic, parabolic and Navier-Stokes PDE dynamics environments validate the effectiveness of the proposed method in achieving better general performance and boundary constraint satisfaction compared to the model-free controller baselines.
- [182] arXiv:2411.15647 (cross-list from q-bio.PE) [pdf, other]
-
Title: Circuit design in biology and machine learning. II. Anomaly detectionSubjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG)
Anomaly detection is a well-established field in machine learning, identifying observations that deviate from typical patterns. The principles of anomaly detection could enhance our understanding of how biological systems recognize and respond to atypical environmental inputs. However, this approach has received limited attention in analyses of cellular and physiological circuits. This study builds on machine learning techniques -- such as dimensionality reduction, boosted decision trees, and anomaly classification -- to develop a conceptual framework for biological circuits. One problem is that machine learning circuits tend to be unrealistically large for use by cellular and physiological systems. I therefore focus on minimal circuits inspired by machine learning concepts, reduced to cellular scale. Through illustrative models, I demonstrate that small circuits can provide useful classification of anomalies. The analysis also shows how principles from machine learning -- such as temporal and atemporal anomaly detection, multivariate signal integration, and hierarchical decision-making cascades -- can inform hypotheses about the design and evolution of cellular circuits. This interdisciplinary approach enhances our understanding of cellular circuits and highlights the universal nature of computational strategies across biological and artificial systems.
- [183] arXiv:2411.15656 (cross-list from eess.IV) [pdf, other]
-
Title: Machine-agnostic Automated Lumbar MRI Segmentation using a Cascaded Model Based on Generative NeuronsPromit Basak, Rusab Sarmun, Saidul Kabir, Israa Al-Hashimi, Enamul Hoque Bhuiyan, Anwarul Hasan, Muhammad Salman Khan, Muhammad E. H. ChowdhuryComments: 19 Pages, 11 Figures, Expert Systems with Applications, 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Automated lumbar spine segmentation is very crucial for modern diagnosis systems. In this study, we introduce a novel machine-agnostic approach for segmenting lumbar vertebrae and intervertebral discs from MRI images, employing a cascaded model that synergizes an ROI detection and a Self-organized Operational Neural Network (Self-ONN)-based encoder-decoder network for segmentation. Addressing the challenge of diverse MRI modalities, our methodology capitalizes on a unique dataset comprising images from 12 scanners and 34 subjects, enhanced through strategic preprocessing and data augmentation techniques. The YOLOv8 medium model excels in ROI extraction, achieving an excellent performance of 0.916 mAP score. Significantly, our Self-ONN-based model, combined with a DenseNet121 encoder, demonstrates excellent performance in lumbar vertebrae and IVD segmentation with a mean Intersection over Union (IoU) of 83.66%, a sensitivity of 91.44%, and Dice Similarity Coefficient (DSC) of 91.03%, as validated through rigorous 10-fold cross-validation. This study not only showcases an effective approach to MRI segmentation in spine-related disorders but also sets the stage for future advancements in automated diagnostic tools, emphasizing the need for further dataset expansion and model refinement for broader clinical applicability.
- [184] arXiv:2411.15661 (cross-list from cs.CL) [pdf, html, other]
-
Title: Improving Next Tokens via Second-Last Predictions with Generate and RefineSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Autoregressive language models like GPT aim at predicting next tokens, while autoencoding models such as BERT are trained on tasks such as predicting masked tokens. We train a decoder only architecture for predicting the second last token for a sequence of tokens. Our approach yields higher computational training efficiency than BERT-style models by employing a structured deterministic approach towards masking tokens. We use our model to improve the next token predictions of a standard GPT by combining both predictions in a ``generate-then-refine'' approach. We show on different variants of GPT-2 and different datasets that (not unexpectedly) second last token predictions are much more accurate, i.e., more than 15\% higher accuracy than ordinary next token predictors. The ``generate-then-refine'' approach also demonstrates notable improvements in next-token predictions, yielding smaller yet consistent and significant gains.
- [185] arXiv:2411.15664 (cross-list from cs.DC) [pdf, html, other]
-
Title: Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the CloudComments: 12 pages, 7 figures, TUM Cloud Computing SeminarSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
This review report discusses the cold start latency in serverless inference and existing solutions. It particularly reviews the ServerlessLLM method, a system designed to address the cold start problem in serverless inference for large language models. Traditional serverless approaches struggle with high latency due to the size of LLM checkpoints and the overhead of initializing GPU resources. ServerlessLLM introduces a multitier checkpoint loading system, leveraging underutilized GPU memory and storage to reduce startup times by 6--8x compared to existing methods. It also proposes live inference migration and a startup-time-optimized model scheduler, ensuring efficient resource allocation and minimizing delays. This system significantly improves performance and scalability in serverless environments for LLM workloads. Besides ServerlessLLM, several other methods from recent research literature, including Rainbowcake, are reviewed in this paper. Further discussions explore how FaaS providers tackle cold starts and the possible future scopes.
- [186] arXiv:2411.15669 (cross-list from cs.DS) [pdf, html, other]
-
Title: Implicit High-Order Moment Tensor Estimation and Learning Latent Variable ModelsComments: Abstract shortened due to arxiv requirementsSubjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the task of learning latent-variable models. An obstacle towards designing efficient algorithms for such models is the necessity of approximating moment tensors of super-constant degree. Motivated by such applications, we develop a general efficient algorithm for implicit moment tensor computation. Our algorithm computes in $\mathrm{poly}(d, k)$ time a succinct approximate description of tensors of the form $M_m=\sum_{i=1}^{k}w_iv_i^{\otimes m}$, for $w_i\in\mathbb{R}_+$--even for $m=\omega(1)$--assuming there exists a polynomial-size arithmetic circuit whose expected output on an appropriate samplable distribution is equal to $M_m$, and whose covariance on this input is bounded. Our framework broadly generalizes the work of~\cite{LL21-opt} which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians.
By leveraging our general algorithm, we obtain the first polynomial-time learners for the following models.
* Mixtures of Linear Regressions. We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time algorithm for this task. The previously best algorithm has super-polynomial complexity in $k$.
* Learning Mixtures of Spherical Gaussians. We give a $\mathrm{poly}(d, k, 1/\epsilon)$-time density estimation algorithm, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. Prior algorithms incur super-polynomial complexity in $k$. We also give a $\mathrm{poly}(d, k, 1/\epsilon)$-time parameter estimation algorithm, under the {\em optimal} mean separation of $\Omega(\log^{1/2}(k/\epsilon))$.
* PAC Learning Sums of ReLUs. We give a learner with complexity $\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/\epsilon)}$. This is the first algorithm for this task that runs in $\mathrm{poly}(d, k)$ time for subconstant values of $\epsilon = o_{k, d}(1)$. - [187] arXiv:2411.15672 (cross-list from cs.CR) [pdf, html, other]
-
Title: IRSKG: Unified Intrusion Response System Knowledge Graph Ontology for Cyber DefenseComments: 10 pages, 8 figuresSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cyberattacks are becoming increasingly difficult to detect and prevent due to their sophistication. In response, Autonomous Intelligent Cyber-defense Agents (AICAs) are emerging as crucial solutions. One prominent AICA agent is the Intrusion Response System (IRS), which is critical for mitigating threats after detection. IRS uses several Tactics, Techniques, and Procedures (TTPs) to mitigate attacks and restore the infrastructure to normal operations. Continuous monitoring of the enterprise infrastructure is an essential TTP the IRS uses. However, each system serves different purposes to meet operational needs. Integrating these disparate sources for continuous monitoring increases pre-processing complexity and limits automation, eventually prolonging critical response time for attackers to exploit. We propose a unified IRS Knowledge Graph ontology (IRSKG) that streamlines the onboarding of new enterprise systems as a source for the AICAs. Our ontology can capture system monitoring logs and supplemental data, such as a rules repository containing the administrator-defined policies to dictate the IRS responses. Besides, our ontology permits us to incorporate dynamic changes to adapt to the evolving cyber-threat landscape. This robust yet concise design allows machine learning models to train effectively and recover a compromised system to its desired state autonomously with explainability.
- [188] arXiv:2411.15684 (cross-list from q-bio.BM) [pdf, html, other]
-
Title: Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide SequencingSubjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Data-Independent Acquisition (DIA) was introduced to improve sensitivity to cover all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is not very clear how useful DIA data is for de novo peptide sequencing as the DIA data are marred with coeluted peptides, high noises, and varying data quality. We present a new deep learning method DIANovo, and address each of these difficulties, and improves the previous established system DeepNovo-DIA by from 25% to 81%, averaging 48%, for amino acid recall, and by from 27% to 89%, averaging 57%, for peptide recall, by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria about when DIA data could be used for de novo peptide sequencing and when not to by providing a comparison between DDA and DIA, in both de novo and database search mode. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. However, with Orbitrap Astral, DIA consistently outperforms DDA due to narrow window mode enabled. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.
- [189] arXiv:2411.15706 (cross-list from cs.CV) [pdf, html, other]
-
Title: Fixing the Perspective: A Critical Examination of Zero-1-to-3Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Novel view synthesis is a fundamental challenge in image-to-3D generation, requiring the generation of target view images from a set of conditioning images and their relative poses. While recent approaches like Zero-1-to-3 have demonstrated promising results using conditional latent diffusion models, they face significant challenges in generating consistent and accurate novel views, particularly when handling multiple conditioning images. In this work, we conduct a thorough investigation of Zero-1-to-3's cross-attention mechanism within the Spatial Transformer of the diffusion 2D-conditional UNet. Our analysis reveals a critical discrepancy between Zero-1-to-3's theoretical framework and its implementation, specifically in the processing of image-conditional context. We propose two significant improvements: (1) a corrected implementation that enables effective utilization of the cross-attention mechanism, and (2) an enhanced architecture that can leverage multiple conditional views simultaneously. Our theoretical analysis and preliminary results suggest potential improvements in novel view synthesis consistency and accuracy.
- [190] arXiv:2411.15734 (cross-list from cs.CL) [pdf, html, other]
-
Title: Development of Pre-Trained Transformer-based Models for the Nepali LanguageSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Transformer-based pre-trained language models have dominated the field of Natural Language Processing (NLP) for quite some time now. However, the Nepali language, spoken by approximately 32 million people worldwide, remains significantly underrepresented in this domain. This underrepresentation is primarily attributed to the scarcity of monolingual data corpora and limited available resources for the Nepali language. While existing efforts have predominantly concentrated on basic encoder-based models, there is a notable gap in the exploration of decoder-based architectures. To address this gap, we have collected 27.5 GB of Nepali text data, approximately 2.4x larger than any previously available Nepali language corpus. Leveraging this data, we pre-trained three different models i.e., BERT, RoBERTa, and GPT-2, exclusively for the Nepali Language. Furthermore, we performed instruction tuning and explored its potential for monolingual Nepali data, providing a foundation for future research. Our models outperformed the existing best model by 2 points on Nep-gLUE benchmark, scoring 95.60 and also outperformed existing models on text generation tasks, demonstrating improvements in both understanding and generating Nepali text.
- [191] arXiv:2411.15737 (cross-list from cs.AI) [pdf, html, other]
-
Title: TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language ModelsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification (MTSC). Effective adaptation of LLMs for MTSC necessitates informative data representations. Existing LLM-based methods directly encode embeddings for time series within the latent space of LLMs from scratch to align with semantic space of LLMs. Despite their effectiveness, we reveal that these methods conceal three inherent bottlenecks: (1) they struggle to encode temporal and channel-specific information in a lossless manner, both of which are critical components of multivariate time series; (2) it is much difficult to align the learned representation space with the semantic space of the LLMs; (3) they require task-specific retraining, which is both computationally expensive and labor-intensive. To bridge these gaps, we propose TableTime, which reformulates MTSC as a table understanding task. Specifically, TableTime introduces the following strategies: (1) convert multivariate time series into a tabular form, thus minimizing information loss to the greatest extent; (2) represent tabular time series in text format to achieve natural alignment with the semantic space of LLMs; (3) design a reasoning framework that integrates contextual text information, neighborhood assistance, multi-path inference and problem decomposition to enhance the reasoning ability of LLMs and realize zero-shot classification. Extensive experiments performed on 10 publicly representative datasets from UEA archive verify the superiorities of the TableTime.
- [192] arXiv:2411.15741 (cross-list from cs.CV) [pdf, html, other]
-
Title: Proceedings of the 6th International Workshop on Reading Music SystemsComments: Proceedings edited by Jorge Calvo-Zaragoza, Alexander Pacha and Elona ShatriSubjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music.
These are the proceedings of the 6th International Workshop on Reading Music Systems, held Online on November 22nd 2024. - [193] arXiv:2411.15763 (cross-list from cs.CV) [pdf, html, other]
-
Title: Integrating Deep Metric Learning with Coreset for Active Learning in 3D SegmentationComments: To be published in NeurIPS 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning has seen remarkable advancements in machine learning, yet it often demands extensive annotated data. Tasks like 3D semantic segmentation impose a substantial annotation burden, especially in domains like medicine, where expert annotations drive up the cost. Active learning (AL) holds great potential to alleviate this annotation burden in 3D medical segmentation. The majority of existing AL methods, however, are not tailored to the medical domain. While weakly-supervised methods have been explored to reduce annotation burden, the fusion of AL with weak supervision remains unexplored, despite its potential to significantly reduce annotation costs. Additionally, there is little focus on slice-based AL for 3D segmentation, which can also significantly reduce costs in comparison to conventional volume-based AL. This paper introduces a novel metric learning method for Coreset to perform slice-based active learning in 3D medical segmentation. By merging contrastive learning with inherent data groupings in medical imaging, we learn a metric that emphasizes the relevant differences in samples for training 3D medical segmentation models. We perform comprehensive evaluations using both weak and full annotations across four datasets (medical and non-medical). Our findings demonstrate that our approach surpasses existing active learning techniques on both weak and full annotations and obtains superior performance with low-annotation budgets which is crucial in medical imaging. Source code for this project is available in the supplementary materials and on GitHub: this https URL.
- [194] arXiv:2411.15769 (cross-list from math.OC) [pdf, html, other]
-
Title: Gradient Norm Regularization Second-Order Algorithms for Solving Nonconvex-Strongly Concave Minimax ProblemsSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we study second-order algorithms for solving nonconvex-strongly concave minimax problems, which have attracted much attention in recent years in many fields, especially in machine learning. We propose a gradient norm regularized trust region (GRTR) algorithm to solve nonconvex-strongly concave minimax problems, where the objective function of the trust region subproblem in each iteration uses a regularized version of the Hessian matrix, and the regularization coefficient and the radius of the ball constraint are proportional to the square root of the gradient norm. The iteration complexity of the proposed GRTR algorithm to obtain an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point is proved to be upper bounded by $\tilde{\mathcal{O}}(\rho^{0.5}\kappa^{1.5}\epsilon^{-3/2})$, where $\rho$ and $\kappa$ are the Lipschitz constant of the Jacobian matrix and the condition number of the objective function respectively, which matches the best known iteration complexity of second-order methods for solving nonconvex-strongly concave minimax problems. We further propose a Levenberg-Marquardt algorithm with a gradient norm regularization coefficient and use the negative curvature direction to correct the iteration direction (LMNegCur), which does not need to solve the trust region subproblem at each iteration. We also prove that the LMNegCur algorithm achieves an $\mathcal{O}(\epsilon,\sqrt{\epsilon})$-second-order stationary point within $\tilde{\mathcal{O}}(\rho^{0.5}\kappa^{1.5}\epsilon^{-3/2})$ number of iterations. Numerical results show the efficiency of both proposed algorithms.
- [195] arXiv:2411.15801 (cross-list from cs.MM) [pdf, html, other]
-
Title: A review on Machine Learning based User-Centric Multimedia Streaming TechniquesComments: Computer CommunicationsSubjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The multimedia content and streaming are a major means of information exchange in the modern era and there is an increasing demand for such services. This coupled with the advancement of future wireless networks B5G/6G and the proliferation of intelligent handheld mobile devices, has facilitated the availability of multimedia content to heterogeneous mobile users. Apart from the conventional video, the 360$^o$ videos have gained popularity with the emerging virtual reality applications. All formats of videos (conventional and 360$^o$) undergo processing, compression, and transmission across dynamic wireless channels with restricted bandwidth to facilitate the streaming services. This causes video impairments, leading to quality degradation and poses challenges in delivering good Quality-of-Experience (QoE) to the viewers. The QoE is a prominent subjective quality measure to assess multimedia services. This requires end-to-end QoE evaluation. Efficient multimedia streaming techniques can improve the service quality while dealing with dynamic network and end-user challenges. A paradigm shift in user-centric multimedia services is envisioned with a focus on Machine Learning (ML) based QoE modeling and streaming strategies. This survey paper presents a comprehensive overview of the overall and continuous, time varying QoE modeling for the purpose of QoE management in multimedia services. It also examines the recent research on intelligent and adaptive multimedia streaming strategies, with a special emphasis on ML based techniques for video (conventional and 360$^o$) streaming. This paper discusses the overall and continuous QoE modeling to optimize the end-user viewing experience, efficient video streaming with a focus on user-centric strategies, associated datasets for modeling and streaming, along with existing shortcoming and open challenges.
- [196] arXiv:2411.15804 (cross-list from cs.CL) [pdf, html, other]
-
Title: LoRA-Mini : Adaptation Matrices Decomposition and Selective TrainingComments: 11 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The rapid advancements in large language models (LLMs) have revolutionized natural language processing, creating an increased need for efficient, task-specific fine-tuning methods. Traditional fine-tuning of LLMs involves updating a large number of parameters, which is computationally expensive and memory-intensive. Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters. However, while LoRA reduces the number of trainable parameters, LoRA modules still create significant storage challenges. We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts, with only the two inner matrices being trainable. This approach achieves upto a 20x reduction compared to standard LoRA in the number of trainable parameters while preserving performance levels comparable to standard LoRA, addressing both computational and storage efficiency in LLM fine-tuning.
- [197] arXiv:2411.15813 (cross-list from cond-mat.dis-nn) [pdf, html, other]
-
Title: Lattice $\phi^{4}$ field theory as a multi-agent system of financial marketsComments: Code is available from this https URLSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Multiagent Systems (cs.MA); High Energy Physics - Lattice (hep-lat)
We introduce a $\phi^{4}$ lattice field theory with frustrated dynamics as a multi-agent system to reproduce stylized facts of financial markets such as fat-tailed distributions of returns and clustered volatility. Each lattice site, represented by a continuous degree of freedom, corresponds to an agent experiencing a set of competing interactions which influence its decision to buy or sell a given stock. These interactions comprise a cooperative term, which signifies that the agent should imitate the behavior of its neighbors, and a fictitious field, which compels the agent instead to conform with the opinion of the majority or the minority. To introduce the competing dynamics we exploit the Markov field structure to pursue a constructive decomposition of the $\phi^{4}$ probability distribution which we recompose with a Ferrenberg-Swendsen acceptance or rejection sampling step. We then verify numerically that the multi-agent $\phi^{4}$ field theory produces behavior observed on empirical data from the FTSE 100 London Stock Exchange index. We conclude by discussing how the presence of continuous degrees of freedom within the $\phi^{4}$ lattice field theory enables a representational capacity beyond that possible with multi-agent systems derived from Ising models.
- [198] arXiv:2411.15821 (cross-list from cs.CL) [pdf, other]
-
Title: Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?Comments: 10 pages, 4 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This study investigates the relative impact of training data quality versus quantity on the performance of small language models (SLMs), utilizing the TinyStories dataset for empirical analysis. Analysis of dataset variations with respect to size (25% and 50% of the original size) and duplication (controlled rates of 25%, 50%, 75%, and 100%) were performed. Model performance was evaluated based on the validation loss, accuracy, and perplexity metrics. Results indicate training data quality plays a more significant role in the overall performance of SLMs, especially given scale of this experiment. Minimal duplication positively impacted model accuracy (+0.87% increase in accuracy at 25% duplication) without significantly increasing perplexity (+0.52% increase going from 0% to 25% duplication) but excessive duplication led to pronounced performance degradation (-40% drop in accuracy at 100% duplication). The implications of this exploration extend beyond just model performance; training large-scale models imposes significant financial and computational burdens, which can be prohibitive for organizations, individuals, and the public at large, especially in developing countries. Additionally, the energy consumption associated with large-scale training raises environmental concerns. Understanding the relative importance of data quality versus quantity could democratize AI technology, making advanced models more accessible and sustainable for all.
- [199] arXiv:2411.15843 (cross-list from cs.CV) [pdf, html, other]
-
Title: Unveil Inversion and Invariance in Flow Transformer for Versatile Image EditingPengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu WangSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents. However, the prevailing diffusion inversion performs deficiently in flow-based models, and the invariance control cannot reconcile diverse rigid and non-rigid editing tasks. To address these, we systematically analyze the \textbf{inversion and invariance} control based on the flow transformer. Specifically, we unveil that the Euler inversion shares a similar structure to DDIM yet is more susceptible to the approximation error. Thus, we propose a two-stage inversion to first refine the velocity estimation and then compensate for the leftover error, which pivots closely to the model prior and benefits editing. Meanwhile, we propose the invariance control that manipulates the text features within the adaptive layer normalization, connecting the changes in the text prompt to image semantics. This mechanism can simultaneously preserve the non-target contents while allowing rigid and non-rigid manipulation, enabling a wide range of editing types such as visual text, quantity, facial expression, etc. Experiments on versatile scenarios validate that our framework achieves flexible and accurate editing, unlocking the potential of the flow transformer for versatile image editing.
- [200] arXiv:2411.15913 (cross-list from cs.SD) [pdf, html, other]
-
Title: A Training-Free Approach for Music Style Transfer with Latent Diffusion ModelsComments: Codes will be released upon acceptanceSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Music style transfer, while offering exciting possibilities for personalized music generation, often requires extensive training or detailed textual descriptions. This paper introduces a novel training-free approach leveraging pre-trained Latent Diffusion Models (LDMs). By manipulating the self-attention features of the LDM, we effectively transfer the style of reference music onto content music without additional training. Our method achieves superior style transfer and melody preservation compared to existing methods. This work opens new creative avenues for personalized music generation.
- [201] arXiv:2411.15925 (cross-list from cs.CV) [pdf, html, other]
-
Title: Making Images from Images: Interleaving Denoising and TransformationSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Simply by rearranging the regions of an image, we can create a new image of any subject matter. The definition of regions is user definable, ranging from regularly and irregularly-shaped blocks, concentric rings, or even individual pixels. Our method extends and improves recent work in the generation of optical illusions by simultaneously learning not only the content of the images, but also the parameterized transformations required to transform the desired images into each other. By learning the image transforms, we allow any source image to be pre-specified; any existing image (e.g. the Mona Lisa) can be transformed to a novel subject. We formulate this process as a constrained optimization problem and address it through interleaving the steps of image diffusion with an energy minimization step. Unlike previous methods, increasing the number of regions actually makes the problem easier and improves results. We demonstrate our approach in both pixel and latent spaces. Creative extensions, such as using infinite copies of the source image and employing multiple source images, are also given.
- [202] arXiv:2411.15982 (cross-list from cs.AR) [pdf, html, other]
-
Title: Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data FormatComments: To appear in 2025 IEEE International Symposium on High-Performance Computer Architecture (HPCA 2025)Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintaining accuracy. However, this shifts the energy and latency bottlenecks towards the FP activations that are associated with costly memory accesses and computations. Existing LLM accelerators focus primarily on computation optimizations, overlooking the potential of jointly optimizing FP computations and data movement, particularly for the dominant FP-INT GeMM operations in LLM inference.
To address these challenges, we investigate the sensitivity of activation precision across various LLM modules and its impact on overall model accuracy. Based on our findings, we first propose the Anda data type: an adaptive data format with group-shared exponent bits and dynamic mantissa bit allocation. Secondly, we develop an iterative post-training adaptive precision search algorithm that optimizes the bit-width for different LLM modules to balance model accuracy, energy efficiency, and inference speed. Lastly, a suite of hardware optimization techniques is proposed to maximally exploit the benefits of the Anda format. These include a bit-plane-based data organization scheme, Anda-enhanced processing units with bit-serial computation, and a runtime bit-plane Anda compressor to simultaneously optimize storage, computation, and memory footprints. Our evaluations on FPINT GeMM operations show that Anda achieves a 2.4x speedup, 4.0x area efficiency, and 3.1x energy efficiency improvement on average for popular LLMs including OPT, LLaMA, and LLaMA-2 series over the GPU-like FP-FP baseline. Anda demonstrates strong adaptability across various application scenarios, accuracy requirements, and system performance, enabling efficient LLM inference across a wide range of deployment scenarios. - [203] arXiv:2411.15998 (cross-list from cs.AI) [pdf, other]
-
Title: PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision MakingJonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu HuComments: Published at Language Gamification Workshop 2024 @ NeurIPSSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Effective extraction of the world knowledge in LLMs for complex decision-making tasks remains a challenge. We propose a framework PIANIST for decomposing the world model into seven intuitive components conducive to zero-shot LLM generation. Given only the natural language description of the game and how input observations are formatted, our method can generate a working world model for fast and efficient MCTS simulation. We show that our method works well on two different games that challenge the planning and decision making skills of the agent for both language and non-language based action taking, without any training on domain-specific training data or explicitly defined world model.
- [204] arXiv:2411.16033 (cross-list from hep-th) [pdf, html, other]
-
Title: Generative AI for Brane Configurations, Tropical Coamoeba and 4d N=1 Quiver Gauge TheoriesComments: 21 pages, 8 figures, 1 tableSubjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); Mathematical Physics (math-ph); Algebraic Geometry (math.AG)
We introduce a generative AI model to obtain Type IIB brane configurations that realize toric phases of a family of 4d N=1 supersymmetric gauge theories. These 4d N=1 quiver gauge theories are worldvolume theories of a D3-brane probing a toric Calabi-Yau 3-fold. The Type IIB brane configurations that realize this family of 4d N=1 theories are known as brane tilings and are given by the tropical coamoeba projection of the mirror curve associated with the toric Calabi-Yau 3-fold. The shape of the mirror curve and its coamoeba projection, as well as the corresponding Type IIB brane configuration and the toric phase of the 4d N=1 theory, all depend on the complex structure moduli parameterizing the mirror curve. We train a generative AI model, a conditional variational autoencoder (CVAE), that takes a choice of complex structure moduli as input and generates the corresponding tropical coamoeba. This enables us not only to obtain a high-resolution representation of the entire phase space for a family of brane tilings corresponding to the same toric Calabi-Yau 3-fold, but also to continuously track the movements of the mirror curve and individual branes in the corresponding Type IIB brane configurations during phase transitions associated with Seiberg duality.
- [205] arXiv:2411.16043 (cross-list from eess.SP) [pdf, html, other]
-
Title: Downlink MIMO Channel Estimation from Bits: Recoverability and AlgorithmSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
In frequency division duplex (FDD) massive MIMO systems, a major challenge lies in acquiring the downlink channel state information}\ (CSI) at the base station (BS) from limited feedback sent by the user equipment (UE). To tackle this fundamental task, our contribution is twofold: First, a simple feedback framework is proposed, where a compression and Gaussian dithering-based quantization strategy is adopted at the UE side, and then a maximum likelihood estimator (MLE) is formulated at the BS side. Recoverability of the MIMO channel under the widely used double directional model is established. Specifically, analyses are presented for two compression schemes -- showing one being more overhead-economical and the other computationally lighter at the UE side. Second, to realize the MLE, an alternating direction method of multipliers (ADMM) algorithm is proposed. The algorithm is carefully designed to integrate a sophisticated harmonic retrieval (HR) solver as subroutine, which turns out to be the key of effectively tackling this hard MLE this http URL numerical experiments are conducted to validate the efficacy of our approach.
- [206] arXiv:2411.16052 (cross-list from hep-th) [pdf, html, other]
-
Title: Machine-learning emergent spacetime from linear response in future tabletop quantum gravity experimentsComments: 24 pages, 10 figuresSubjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG)
We introduce a novel interpretable Neural Network (NN) model designed to perform precision bulk reconstruction under the AdS/CFT correspondence. According to the correspondence, a specific condensed matter system on a ring is holographically equivalent to a gravitational system on a bulk disk, through which tabletop quantum gravity experiments may be possible as reported in arXiv:2211.13863. The purpose of this paper is to reconstruct a higher-dimensional gravity metric from the condensed matter system data via machine learning using the NN. Our machine reads spatially and temporarily inhomogeneous linear response data of the condensed matter system, and incorporates a novel layer that implements the Runge-Kutta method to achieve better numerical control. We confirm that our machine can let a higher-dimensional gravity metric be automatically emergent as its interpretable weights, using a linear response of the condensed matter system as data, through supervised machine learning. The developed method could serve as a foundation for generic bulk reconstruction, i.e., a practical solution to the AdS/CFT correspondence, and would be implemented in future tabletop quantum gravity experiments.
- [207] arXiv:2411.16080 (cross-list from cs.CV) [pdf, html, other]
-
Title: Boosting 3D Object Generation through PBR MaterialsComments: Accepted to SIGGRAPH Asia 2024 Conference PapersSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Automatic 3D content creation has gained increasing attention recently, due to its potential in various applications such as video games, film industry, and AR/VR. Recent advancements in diffusion models and multimodal models have notably improved the quality and efficiency of 3D object generation given a single RGB image. However, 3D objects generated even by state-of-the-art methods are still unsatisfactory compared to human-created assets. Considering only textures instead of materials makes these methods encounter challenges in photo-realistic rendering, relighting, and flexible appearance editing. And they also suffer from severe misalignment between geometry and high-frequency texture details. In this work, we propose a novel approach to boost the quality of generated 3D objects from the perspective of Physics-Based Rendering (PBR) materials. By analyzing the components of PBR materials, we choose to consider albedo, roughness, metalness, and bump maps. For albedo and bump maps, we leverage Stable Diffusion fine-tuned on synthetic data to extract these values, with novel usages of these fine-tuned models to obtain 3D consistent albedo UV and bump UV for generated objects. In terms of roughness and metalness maps, we adopt a semi-automatic process to provide room for interactive adjustment, which we believe is more practical. Extensive experiments demonstrate that our model is generally beneficial for various state-of-the-art generation methods, significantly boosting the quality and realism of their generated 3D objects, with natural relighting effects and substantially improved geometry.
- [208] arXiv:2411.16086 (cross-list from cs.DC) [pdf, html, other]
-
Title: HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge PlatformsComments: 7 pages, 8 figures, 1 table, and 1 algorithm. The manuscript is accepted to be published in 28th Design, Automation and Test in Europe Conference (IEEE DATE, 2025)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Edge inference techniques partition and distribute Deep Neural Network (DNN) inference tasks among multiple edge nodes for low latency inference, without considering the core-level heterogeneity of edge nodes. Further, default DNN inference frameworks also do not fully utilize the resources of heterogeneous edge nodes, resulting in higher inference latency. In this work, we propose a hierarchical DNN partitioning strategy (HiDP) for distributed inference on heterogeneous edge nodes. Our strategy hierarchically partitions DNN workloads at both global and local levels by considering the core-level heterogeneity of edge nodes. We evaluated our proposed HiDP strategy against relevant distributed inference techniques over widely used DNN models on commercial edge devices. On average our strategy achieved 38% lower latency, 46% lower energy, and 56% higher throughput in comparison with other relevant approaches.
- [209] arXiv:2411.16098 (cross-list from physics.ao-ph) [pdf, html, other]
-
Title: Global spatio-temporal downscaling of ERA5 precipitation through generative AISubjects: Atmospheric and Oceanic Physics (physics.ao-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Geophysics (physics.geo-ph)
The spatial and temporal distribution of precipitation has a significant impact on human lives by determining freshwater resources and agricultural yield, but also rainfall-driven hazards like flooding or landslides. While the ERA5 reanalysis dataset provides consistent long-term global precipitation information that allows investigations of these impacts, it lacks the resolution to capture the high spatio-temporal variability of precipitation. ERA5 misses intense local rainfall events that are crucial drivers of devastating flooding - a critical limitation since extreme weather events become increasingly frequent. Here, we introduce spateGAN-ERA5, the first deep learning based spatio-temporal downscaling of precipitation data on a global scale. SpateGAN-ERA5 uses a conditional generative adversarial neural network (cGAN) that enhances the resolution of ERA5 precipitation data from 24 km and 1 hour to 2 km and 10 minutes, delivering high-resolution rainfall fields with realistic spatio-temporal patterns and accurate rain rate distribution including extremes. Its computational efficiency enables the generation of a large ensemble of solutions, addressing uncertainties inherent to the challenges of downscaling. Trained solely on data from Germany and validated in the US and Australia considering diverse climate zones, spateGAN-ERA5 demonstrates strong generalization indicating a robust global applicability. SpateGAN-ERA5 fulfils a critical need for high-resolution precipitation data in hydrological and meteorological research, offering new capabilities for flood risk assessment, AI-enhanced weather forecasting, and impact modelling to address climate-driven challenges worldwide.
- [210] arXiv:2411.16120 (cross-list from cs.AI) [pdf, html, other]
-
Title: Why the Agent Made that Decision: Explaining Deep Reinforcement Learning with Vision MasksSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Due to the inherent lack of transparency in deep neural networks, it is challenging for deep reinforcement learning (DRL) agents to gain trust and acceptance from users, especially in safety-critical applications such as medical diagnosis and military operations. Existing methods for explaining an agent's decision either require to retrain the agent using models that support explanation generation or rely on perturbation-based techniques to reveal the significance of different input features in the decision making process. However, retraining the agent may compromise its integrity and performance, while perturbation-based methods have limited performance and lack knowledge accumulation or learning capabilities. Moreover, since each perturbation is performed independently, the joint state of the perturbed inputs may not be physically meaningful. To address these challenges, we introduce $\textbf{VisionMask}$, a standalone explanation model trained end-to-end to identify the most critical regions in the agent's visual input that can explain its actions. VisionMask is trained in a self-supervised manner without relying on human-generated labels. Importantly, its training does not alter the agent model, hence preserving the agent's performance and integrity. We evaluate VisionMask on Super Mario Bros (SMB) and three Atari games. Compared to existing methods, VisionMask achieves a 14.9% higher insertion accuracy and a 30.08% higher F1-Score in reproducing original actions from the selected visual explanations. We also present examples illustrating how VisionMask can be used for counterfactual analysis.
- [211] arXiv:2411.16121 (cross-list from stat.ML) [pdf, html, other]
-
Title: DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized MixingComments: Under review in Elsevier ArraySubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In recent years, the growth of data across various sectors, including healthcare, security, finance, and education, has created significant opportunities for analysis and informed decision-making. However, these datasets often contain sensitive and personal information, which raises serious privacy concerns. Protecting individual privacy is crucial, yet many existing machine learning and data publishing algorithms struggle with high-dimensional data, facing challenges related to computational efficiency and privacy preservation. To address these challenges, we introduce an effective data publishing algorithm \emph{DP-CDA}. Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure formal privacy guarantees. Our comprehensive privacy accounting shows that DP-CDA provides a stronger privacy guarantee compared to existing methods, allowing for better utility while maintaining strict level of privacy. To evaluate the effectiveness of DP-CDA, we examine the accuracy of predictive models trained on the synthetic data, which serves as a measure of dataset utility. Importantly, we identify an optimal order of mixing that balances privacy guarantee with predictive accuracy. Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
- [212] arXiv:2411.16156 (cross-list from cs.CV) [pdf, html, other]
-
Title: VideoOrion: Tokenizing Object Dynamics in VideosSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present VideoOrion, a Video Large Language Model (Video-LLM) that explicitly captures the key semantic information in videos--the spatial-temporal dynamics of objects throughout the videos. VideoOrion employs expert vision models to extract object dynamics through a detect-segment-track pipeline, encoding them into a set of object tokens by aggregating spatial-temporal object features. Our method addresses the persistent challenge in Video-LLMs of efficiently compressing high-dimensional video data into semantic tokens that are comprehensible to LLMs. Compared to prior methods which resort to downsampling the original video or aggregating visual tokens using resamplers, leading to information loss and entangled semantics, VideoOrion not only offers a more natural and efficient way to derive compact, disentangled semantic representations but also enables explicit object modeling of video content with minimal computational cost. Moreover, the introduced object tokens naturally allow VideoOrion to accomplish video-based referring tasks. Experimental results show that VideoOrion can learn to make good use of the object tokens, and achieves competitive results on both general video question answering and video-based referring benchmarks.
- [213] arXiv:2411.16162 (cross-list from cs.CV) [pdf, html, other]
-
Title: Sparse patches adversarial attacks via extrapolating point-wise informationComments: AdvML-Frontiers 24: The 3nd Workshop on New Frontiers in Adversarial Machine Learning, NeurIPS 24Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Sparse and patch adversarial attacks were previously shown to be applicable in realistic settings and are considered a security risk to autonomous systems. Sparse adversarial perturbations constitute a setting in which the adversarial perturbations are limited to affecting a relatively small number of points in the input. Patch adversarial attacks denote the setting where the sparse attacks are limited to a given structure, i.e., sparse patches with a given shape and number. However, previous patch adversarial attacks do not simultaneously optimize multiple patches' locations and perturbations. This work suggests a novel approach for sparse patches adversarial attacks via point-wise trimming dense adversarial perturbations. Our approach enables simultaneous optimization of multiple sparse patches' locations and perturbations for any given number and shape. Moreover, our approach is also applicable for standard sparse adversarial attacks, where we show that it significantly improves the state-of-the-art over multiple extensive settings. A reference implementation of the proposed method and the reported experiments is provided at \url{this https URL}
- [214] arXiv:2411.16195 (cross-list from math.NA) [pdf, html, other]
-
Title: On the Robustness of the Successive Projection AlgorithmComments: 23 pagesSubjects: Numerical Analysis (math.NA); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
The successive projection algorithm (SPA) is a workhorse algorithm to learn the $r$ vertices of the convex hull of a set of $(r-1)$-dimensional data points, a.k.a. a latent simplex, which has numerous applications in data science. In this paper, we revisit the robustness to noise of SPA and several of its variants. In particular, when $r \geq 3$, we prove the tightness of the existing error bounds for SPA and for two more robust preconditioned variants of SPA. We also provide significantly improved error bounds for SPA, by a factor proportional to the conditioning of the $r$ vertices, in two special cases: for the first extracted vertex, and when $r \leq 2$. We then provide further improvements for the error bounds of a translated version of SPA proposed by Arora et al. (''A practical algorithm for topic modeling with provable guarantees'', ICML, 2013) in two special cases: for the first two extracted vertices, and when $r \leq 3$. Finally, we propose a new more robust variant of SPA that first shifts and lifts the data points in order to minimize the conditioning of the problem. We illustrate our results on synthetic data.
- [215] arXiv:2411.16196 (cross-list from cs.CV) [pdf, html, other]
-
Title: Learn from Foundation Model: Fruit Detection Model without Manual AnnotationComments: 17 pages, 12 figures, conference or other essential infoSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Recent breakthroughs in large foundation models have enabled the possibility of transferring knowledge pre-trained on vast datasets to domains with limited data availability. Agriculture is one of the domains that lacks sufficient data. This study proposes a framework to train effective, domain-specific, small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism is utilized to distill compact, edge-deployable models from SDM, enhancing both inference speed and perception accuracy. The complete method, termed SDM-D (Segmentation-Description-Matching-Distilling), demonstrates strong performance across various fruit detection tasks object detection, semantic segmentation, and instance segmentation) without manual annotation. It nearly matches the performance of models trained with abundant labels. Notably, SDM-D outperforms open-set detection methods such as Grounding SAM and YOLO-World on all tested fruit detection datasets. Additionally, we introduce MegaFruits, a comprehensive fruit segmentation dataset encompassing over 25,000 images, and all code and datasets are made publicly available at this https URL.
- [216] arXiv:2411.16227 (cross-list from eess.IV) [pdf, html, other]
-
Title: EigenHearts: Cardiac Diseases Classification Using EigenFaces ApproachNourelhouda Groun, Maria Villalba-Orero, Lucia Casado-Martin, Enrique Lara-Pezzi, Eusebio Valero, Soledad Le Clainche, Jesus Garicano-MenaComments: 16 pages, 9 figures, 3 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In the realm of cardiovascular medicine, medical imaging plays a crucial role in accurately classifying cardiac diseases and making precise diagnoses. However, the field faces significant challenges when integrating data science techniques, as a significant volume of images is required for these techniques. As a consequence, it is necessary to investigate different avenues to overcome this challenge. In this contribution, we offer an innovative tool to conquer this limitation. In particular, we delve into the application of a well recognized method known as the EigenFaces approach to classify cardiac diseases. This approach was originally motivated for efficiently representing pictures of faces using principal component analysis, which provides a set of eigenvectors (aka eigenfaces), explaining the variation between face images. As this approach proven to be efficient for face recognition, it motivated us to explore its efficiency on more complicated data bases. In particular, we integrate this approach, with convolutional neural networks (CNNs) to classify echocardiography images taken from mice in five distinct cardiac conditions (healthy, diabetic cardiomyopathy, myocardial infarction, obesity and TAC hypertension). Performing a preprocessing step inspired from the eigenfaces approach on the echocardiography datasets, yields sets of pod modes, which we will call eigenhearts. To demonstrate the proposed approach, we compare two testcases: (i) supplying the CNN with the original images directly, (ii) supplying the CNN with images projected into the obtained pod modes. The results show a substantial and noteworthy enhancement when employing SVD for pre-processing, with classification accuracy increasing by approximately 50%.
- [217] arXiv:2411.16229 (cross-list from stat.ML) [pdf, html, other]
-
Title: Effective Non-Random Extreme Learning MachineSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The Extreme Learning Machine (ELM) is a growing statistical technique widely applied to regression problems. In essence, ELMs are single-layer neural networks where the hidden layer weights are randomly sampled from a specific distribution, while the output layer weights are learned from the data. Two of the key challenges with this approach are the architecture design, specifically determining the optimal number of neurons in the hidden layer, and the method's sensitivity to the random initialization of hidden layer weights.
This paper introduces a new and enhanced learning algorithm for regression tasks, the Effective Non-Random ELM (ENR-ELM), which simplifies the architecture design and eliminates the need for random hidden layer weight selection. The proposed method incorporates concepts from signal processing, such as basis functions and projections, into the ELM framework. We introduce two versions of the ENR-ELM: the approximated ENR-ELM and the incremental ENR-ELM. Experimental results on both synthetic and real datasets demonstrate that our method overcomes the problems of traditional ELM while maintaining comparable predictive performance. - [218] arXiv:2411.16234 (cross-list from hep-ph) [pdf, html, other]
-
Title: Flow Annealed Importance Sampling Bootstrap meets Differentiable Particle PhysicsComments: Accepted at the 'Machine Learning and the Physical Sciences 2024' workshop at NeurIPSSubjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an)
High-energy physics requires the generation of large numbers of simulated data samples from complex but analytically tractable distributions called matrix elements. Surrogate models, such as normalizing flows, are gaining popularity for this task due to their computational efficiency. We adopt an approach based on Flow Annealed importance sampling Bootstrap (FAB) that evaluates the differentiable target density during training and helps avoid the costly generation of training data in advance. We show that FAB reaches higher sampling efficiency with fewer target evaluations in high dimensions in comparison to other methods.
- [219] arXiv:2411.16246 (cross-list from stat.ML) [pdf, html, other]
-
Title: Efficient pooling of predictions via kernel embeddingsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Probabilistic predictions are probability distributions over the set of possible outcomes. Such predictions quantify the uncertainty in the outcome, making them essential for effective decision making. By combining multiple predictions, the information sources used to generate the predictions are pooled, often resulting in a more informative forecast. Probabilistic predictions are typically combined by linearly pooling the individual predictive distributions; this encompasses several ensemble learning techniques, for example. The weights assigned to each prediction can be estimated based on their past performance, allowing more accurate predictions to receive a higher weight. This can be achieved by finding the weights that optimise a proper scoring rule over some training data. By embedding predictions into a Reproducing Kernel Hilbert Space (RKHS), we illustrate that estimating the linear pool weights that optimise kernel-based scoring rules is a convex quadratic optimisation problem. This permits an efficient implementation of the linear pool when optimally combining predictions on arbitrary outcome domains. This result also holds for other combination strategies, and we additionally study a flexible generalisation of the linear pool that overcomes some of its theoretical limitations, whilst allowing an efficient implementation within the RKHS framework. These approaches are compared in an application to operational wind speed forecasts, where this generalisation is found to offer substantial improvements upon the traditional linear pool.
- [220] arXiv:2411.16251 (cross-list from cs.CL) [pdf, html, other]
-
Title: Transparent Neighborhood Approximation for Text Classifier ExplanationComments: IEEE DSAA'24Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Recent literature highlights the critical role of neighborhood construction in deriving model-agnostic explanations, with a growing trend toward deploying generative models to improve synthetic instance quality, especially for explaining text classifiers. These approaches overcome the challenges in neighborhood construction posed by the unstructured nature of texts, thereby improving the quality of explanations. However, the deployed generators are usually implemented via neural networks and lack inherent explainability, sparking arguments over the transparency of the explanation process itself. To address this limitation while preserving neighborhood quality, this paper introduces a probability-based editing method as an alternative to black-box text generators. This approach generates neighboring texts by implementing manipulations based on in-text contexts. Substituting the generator-based construction process with recursive probability-based editing, the resultant explanation method, XPROB (explainer with probability-based editing), exhibits competitive performance according to the evaluation conducted on two real-world datasets. Additionally, XPROB's fully transparent and more controllable construction process leads to superior stability compared to the generator-based explainers.
- [221] arXiv:2411.16267 (cross-list from eess.SY) [pdf, html, other]
-
Title: Local Bayesian Optimization for Controller Tuning with Crash ConstraintsComments: Published in at-AutomatisierungstechnikJournal-ref: von Rohr, Alexander, Stenger, David, Scheurenberg, Dominik and Trimpe, Sebastian. "Local Bayesian optimization for controller tuning with crash constraints" at - Automatisierungstechnik, vol. 72, no. 4, 2024, pp. 281-292Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Controller tuning is crucial for closed-loop performance but often involves manual adjustments. Although Bayesian optimization (BO) has been established as a data-efficient method for automated tuning, applying it to large and high-dimensional search spaces remains challenging. We extend a recently proposed local variant of BO to include crash constraints, where the controller can only be successfully evaluated in an a-priori unknown feasible region. We demonstrate the efficiency of the proposed method through simulations and hardware experiments. Our findings showcase the potential of local BO to enhance controller performance and reduce the time and resources necessary for tuning.
- [222] arXiv:2411.16273 (cross-list from eess.SY) [pdf, html, other]
-
Title: Deep Learning for Motion Classification in Ankle Exoskeletons Using Surface EMG and IMU SignalsSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Ankle exoskeletons have garnered considerable interest for their potential to enhance mobility and reduce fall risks, particularly among the aging population. The efficacy of these devices relies on accurate real-time prediction of the user's intended movements through sensor-based inputs. This paper presents a novel motion prediction framework that integrates three Inertial Measurement Units (IMUs) and eight surface Electromyography (sEMG) sensors to capture both kinematic and muscular activity data. A comprehensive set of activities, representative of everyday movements in barrier-free environments, was recorded for the purpose. Our findings reveal that Convolutional Neural Networks (CNNs) slightly outperform Long Short-Term Memory (LSTM) networks on a dataset of five motion tasks, achieving classification accuracies of $96.5 \pm 0.8 \%$ and $87.5 \pm 2.9 \%$, respectively. Furthermore, we demonstrate the system's proficiency in transfer learning, enabling accurate motion classification for new subjects using just ten samples per class for finetuning. The robustness of the model is demonstrated by its resilience to sensor failures resulting in absent signals, maintaining reliable performance in real-world scenarios. These results underscore the potential of deep learning algorithms to enhance the functionality and safety of ankle exoskeletons, ultimately improving their usability in daily life.
- [223] arXiv:2411.16301 (cross-list from cs.CV) [pdf, html, other]
-
Title: DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design GenerationComments: 32 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Interior design is a complex and creative discipline involving aesthetics, functionality, ergonomics, and materials science. Effective solutions must meet diverse requirements, typically producing multiple deliverables such as renderings and design drawings from various perspectives. Consequently, interior design processes are often inefficient and demand significant creativity. With advances in machine learning, generative models have emerged as a promising means of improving efficiency by creating designs from text descriptions or sketches. However, few generative works focus on interior design, leading to substantial discrepancies between outputs and practical needs, such as differences in size, spatial scope, and the lack of controllable generation quality. To address these challenges, we propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation. Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone. We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transfer-based alignment module to enforce view consistency. Simultaneously, we construct an interior design-specific dataset, DesignHelper, consisting of over 400 solutions across more than 15 spatial types and 15 design styles. This dataset helps fine-tune DiffDesign. Extensive experiments conducted on various benchmark datasets demonstrate the effectiveness and robustness of DiffDesign.
- [224] arXiv:2411.16305 (cross-list from cs.CL) [pdf, html, other]
-
Title: Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog SystemsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Task-oriented Dialog (ToD) systems have to solve multiple subgoals to accomplish user goals, whereas feedback is often obtained only at the end of the dialog. In this work, we propose SUIT (SUbgoal-aware ITerative Training), an iterative training approach for improving ToD systems. We sample dialogs from the model we aim to improve and determine subgoals that contribute to dialog success using distant supervision to obtain high quality training samples. We show how this data improves supervised fine-tuning or, alternatively, preference learning results. SUIT is able to iteratively generate more data instead of relying on fixed static sets. SUIT reaches new state-of-the-art performance on a popular ToD benchmark.
- [225] arXiv:2411.16313 (cross-list from cs.AI) [pdf, html, other]
-
Title: CATP-LLM: Empowering Large Language Models for Cost-Aware Tool PlanningComments: In submissionSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Utilizing large language models (LLMs) for tool planning has emerged as a promising avenue for developing general AI systems, where LLMs automatically schedule external tools (e.g. vision models) to tackle complex tasks based on task descriptions. To push this paradigm toward practical applications, it is crucial for LLMs to consider tool execution costs (e.g. execution time) for tool planning. Unfortunately, prior studies overlook the tool execution costs, leading to the generation of expensive plans of which the costs outweigh task performance. To fill this gap, we propose the Cost-Aware Tool Planning with LLMs (CATP-LLM) framework, which for the first time provides a coherent design to empower LLMs for cost-aware tool planning. Specifically, CATP-LLM incorporates a tool planning language to enhance the LLM to generate non-sequential plans of multiple branches for efficient concurrent tool execution and cost reduction. Moreover, it further designs a cost-aware offline reinforcement learning algorithm to fine-tune the LLM to optimize the performance-cost trade-off in tool planning. In lack of public cost-related datasets, we further present OpenCATP, the first platform for cost-aware planning evaluation. Experiments on OpenCATP show that CATP-LLM outperforms GPT-4 even when using Llama2-7B as its backbone, with the average improvement of 28.2%-30.2% higher plan performance and 24.7%-45.8% lower costs even on the challenging planning tasks. The codes of CATP-LLM and OpenCATP will be publicly available.
- [226] arXiv:2411.16339 (cross-list from astro-ph.SR) [pdf, html, other]
-
Title: Solaris: A Foundation Model of the SunSubjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Space Physics (physics.space-ph)
Foundation models have demonstrated remarkable success across various scientific domains, motivating our exploration of their potential in solar physics. In this paper, we present Solaris, the first foundation model for forecasting the Sun's atmosphere. We leverage 13 years of full-disk, multi-wavelength solar imagery from the Solar Dynamics Observatory, spanning a complete solar cycle, to pre-train Solaris for 12-hour interval forecasting. Solaris is built on a large-scale 3D Swin Transformer architecture with 109 million parameters. We demonstrate Solaris' ability to generalize by fine-tuning on a low-data regime using a single wavelength (1700 Å), that was not included in pre-training, outperforming models trained from scratch on this specific wavelength. Our results indicate that Solaris can effectively capture the complex dynamics of the solar atmosphere and transform solar forecasting.
- [227] arXiv:2411.16370 (cross-list from cs.CV) [pdf, html, other]
-
Title: A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image SegmentationComments: 20 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Advancements in image segmentation play an integral role within the greater scope of Deep Learning-based computer vision. Furthermore, their widespread applicability in critical real-world tasks has given rise to challenges related to the reliability of such algorithms. Hence, uncertainty quantification has been extensively studied within this context, enabling expression of model ignorance (epistemic uncertainty) or data ambiguity (aleatoric uncertainty) to prevent uninformed decision making. Due to the rapid adoption of Convolutional Neural Network (CNN)-based segmentation models in high-stake applications, a substantial body of research has been published on this very topic, causing its swift expansion into a distinct field. This work provides a comprehensive overview of probabilistic segmentation by discussing fundamental concepts in uncertainty that govern advancements in the field as well as the application to various tasks. We identify that quantifying aleatoric and epistemic uncertainty approximates Bayesian inference w.r.t. to either latent variables or model parameters, respectively. Moreover, literature on both uncertainties trace back to four key applications; (1) to quantify statistical inconsistencies in the annotation process due ambiguous images, (2) correlating prediction error with uncertainty, (3) expanding the model hypothesis space for better generalization, and (4) active learning. Then, a discussion follows that includes an overview of utilized datasets for each of the applications and comparison of the available methods. We also highlight challenges related to architectures, uncertainty-based active learning, standardization and benchmarking, and recommendations for future work such as methods based on single forward passes and models that appropriately leverage volumetric data.
- [228] arXiv:2411.16396 (cross-list from quant-ph) [pdf, html, other]
-
Title: Statistical inference for quantum singular modelsComments: 57 pages, 8 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Algebraic Geometry (math.AG); Machine Learning (stat.ML)
Deep learning has seen substantial achievements, with numerical and theoretical evidence suggesting that singularities of statistical models are considered a contributing factor to its performance. From this remarkable success of classical statistical models, it is naturally expected that quantum singular models will play a vital role in many quantum statistical tasks. However, while the theory of quantum statistical models in regular cases has been established, theoretical understanding of quantum singular models is still limited. To investigate the statistical properties of quantum singular models, we focus on two prominent tasks in quantum statistical inference: quantum state estimation and model selection. In particular, we base our study on classical singular learning theory and seek to extend it within the framework of Bayesian quantum state estimation. To this end, we define quantum generalization and training loss functions and give their asymptotic expansions through algebraic geometrical methods. The key idea of the proof is the introduction of a quantum analog of the likelihood function using classical shadows. Consequently, we construct an asymptotically unbiased estimator of the quantum generalization loss, the quantum widely applicable information criterion (QWAIC), as a computable model selection metric from given measurement outcomes.
- [229] arXiv:2411.16421 (cross-list from cs.CV) [pdf, html, other]
-
Title: Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and TasksSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper presents the Digital Typhoon Dataset V2, a new version of the longest typhoon satellite image dataset for 40+ years aimed at benchmarking machine learning models for long-term spatio-temporal data. The new addition in Dataset V2 is tropical cyclone data from the southern hemisphere, in addition to the northern hemisphere data in Dataset V1. Having data from two hemispheres allows us to ask new research questions about regional differences across basins and hemispheres. We also discuss new developments in representations and tasks of the dataset. We first introduce a self-supervised learning framework for representation learning. Combined with the LSTM model, we discuss performance on intensity forecasting and extra-tropical transition forecasting tasks. We then propose new tasks, such as the typhoon center estimation task. We show that an object detection-based model performs better for stronger typhoons. Finally, we study how machine learning models can generalize across basins and hemispheres, by training the model on the northern hemisphere data and testing it on the southern hemisphere data. The dataset is publicly available at \url{this http URL} and \url{this https URL}.
- [230] arXiv:2411.16437 (cross-list from cs.CV) [pdf, html, other]
-
Title: Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial AttackComments: Accepted at Safe Generative AI Workshop (NeurIPS 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The growing demand for customized visual content has led to the rise of personalized text-to-image (T2I) diffusion models. Despite their remarkable potential, they pose significant privacy risk when misused for malicious purposes. In this paper, we propose a novel and efficient adversarial attack method, Concept Protection by Selective Attention Manipulation (CoPSAM) which targets only the cross-attention layers of a T2I diffusion model. For this purpose, we carefully construct an imperceptible noise to be added to clean samples to get their adversarial counterparts. This is obtained during the fine-tuning process by maximizing the discrepancy between the corresponding cross-attention maps of the user-specific token and the class-specific token, respectively. Experimental validation on a subset of CelebA-HQ face images dataset demonstrates that our approach outperforms existing methods. Besides this, our method presents two important advantages derived from the qualitative evaluation: (i) we obtain better protection results for lower noise levels than our competitors; and (ii) we protect the content from unauthorized use thereby protecting the individual's identity from potential misuse.
- [231] arXiv:2411.16466 (cross-list from cs.CV) [pdf, html, other]
-
Title: No Identity, no problem: Motion through detection for people trackingComments: Accepted in TMLR November 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Tracking-by-detection has become the de facto standard approach to people tracking. To increase robustness, some approaches incorporate re-identification using appearance models and regressing motion offset, which requires costly identity annotations. In this paper, we propose exploiting motion clues while providing supervision only for the detections, which is much easier to do. Our algorithm predicts detection heatmaps at two different times, along with a 2D motion estimate between the two images. It then warps one heatmap using the motion estimate and enforces consistency with the other one. This provides the required supervisory signal on the motion without the need for any motion annotations. In this manner, we couple the information obtained from different images during training and increase accuracy, especially in crowded scenes and when using low frame-rate sequences. We show that our approach delivers state-of-the-art results for single- and multi-view multi-target tracking on the MOT17 and WILDTRACK datasets.
- [232] arXiv:2411.16475 (cross-list from eess.SY) [pdf, html, other]
-
Title: NonSysId: A nonlinear system identification package with improved model term selection for NARMAX modelsComments: 14 pages, 7 figuresSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
System identification involves constructing mathematical models of dynamic systems using input-output data, enabling analysis and prediction of system behaviour in both time and frequency domains. This approach can model the entire system or capture specific dynamics within it. For meaningful analysis, it is essential for the model to accurately reflect the underlying system's behaviour. This paper introduces NonSysId, an open-sourced MATLAB software package designed for nonlinear system identification, specifically focusing on NARMAX models. The software incorporates an advanced term selection methodology that prioritises on simulation (free-run) accuracy while preserving model parsimony. A key feature is the integration of iterative Orthogonal Forward Regression (iOFR) with Predicted Residual Sum of Squares (PRESS) statistic-based term selection, facilitating robust model generalisation without the need for a separate validation dataset. Furthermore, techniques for reducing computational overheads are implemented. These features make NonSysId particularly suitable for real-time applications such as structural health monitoring, fault diagnosis, and biomedical signal processing, where it is a challenge to capture the signals under consistent conditions, resulting in limited or no validation data.
- [233] arXiv:2411.16483 (cross-list from cond-mat.mtrl-sci) [pdf, html, other]
-
Title: Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End ApproachComments: 8 pages, 3 figuresSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Predicting electronic band structures from crystal structures is crucial for understanding structure-property correlations in materials science. First-principles approaches are accurate but computationally intensive. Recent years, machine learning (ML) has been extensively applied to this field, while existing ML models predominantly focus on band gap predictions or indirect band structure estimation via solving predicted Hamiltonians. An end-to-end model to predict band structure accurately and efficiently is still lacking. Here, we introduce a graph Transformer-based end-to-end approach that directly predicts band structures from crystal structures with high accuracy. Our method leverages the continuity of the k-path and treat continuous bands as a sequence. We demonstrate that our model not only provides accurate band structure predictions but also can derive other properties (such as band gap, band center, and band dispersion) with high accuracy. We verify the model performance on large and diverse datasets.
- [234] arXiv:2411.16498 (cross-list from cs.CV) [pdf, html, other]
-
Title: Multi-Resolution Generative Modeling of Human Motion from Limited DataComments: 1O pages, 7 figures, published in European Conference on Visual Media Production CVMP 24Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
We present a generative model that learns to synthesize human motion from limited training sequences. Our framework provides conditional generation and blending across multiple temporal resolutions. The model adeptly captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our model contains a set of generative and adversarial networks, along with embedding modules, each tailored for generating motions at specific frame rates while exerting control over their content and details. Notably, our approach also extends to the synthesis of co-speech gestures, demonstrating its ability to generate synchronized gestures from speech inputs, even with limited paired data. Through direct synthesis of SMPL pose parameters, our approach avoids test-time adjustments to fit human body meshes. Experimental results showcase our model's ability to achieve extensive coverage of training examples, while generating diverse motions, as indicated by local and global diversity metrics.
- [235] arXiv:2411.16509 (cross-list from cs.MS) [pdf, html, other]
-
Title: Jaya R Package -- A Parameter-Free Solution for Advanced Single and Multi-Objective OptimizationSubjects: Mathematical Software (cs.MS); Machine Learning (cs.LG)
The Jaya R package offers a robust and versatile implementation of the parameter-free Jaya optimization algorithm, suitable for solving both single-objective and multi-objective optimization problems. By integrating advanced features such as constraint handling, adaptive population management, Pareto front tracking for multi-objective trade-offs, and parallel processing for computational efficiency, the package caters to a wide range of optimization challenges. Its intuitive design and flexibility allow users to solve complex, real-world problems across various domains. To demonstrate its practical utility, a case study on energy modeling explores the optimization of renewable energy shares, showcasing the package's ability to minimize carbon emissions and costs while enhancing system reliability. The Jaya R package is an invaluable tool for researchers and practitioners seeking efficient and adaptive optimization solutions.
- [236] arXiv:2411.16556 (cross-list from astro-ph.IM) [pdf, html, other]
-
Title: Anomaly Detection and RFI Classification with Unsupervised Learning in Narrowband Radio Technosignature SearchesBen Jacobson-Bell, Steve Croft, Carmen Choza, Alex Andersson, Daniel Bautista, Vishal Gajjar, Matthew Lebofsky, David H. E. MacMahon, Caleb Painter, Andrew P. V. SiemionComments: 20 pages, 14 figures, submitted to AJSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
The search for radio technosignatures is an anomaly detection problem: candidate signals represent needles of interest in the proverbial haystack of radio-frequency interference (RFI). Current search frameworks find an enormity of false-positive signals, especially in large surveys, requiring manual follow-up to a sometimes prohibitive degree. Unsupervised learning provides an algorithmic way to winnow the most anomalous signals from the chaff, as well as group together RFI signals that bear morphological similarities. We present GLOBULAR (Grouping Low-frequency Observations By Unsupervised Learning After Reduction) clustering, a signal processing method that uses HDBSCAN to reduce the false-positive rate and isolate outlier signals for further analysis. When combined with a standard narrowband signal detection and spatial filtering pipeline, such as turboSETI, GLOBULAR clustering offers significant improvements in the false-positive rate over the standard pipeline alone, suggesting dramatic potential for the amelioration of manual follow-up requirements for future large surveys. By removing RFI signals in regions of high spectral occupancy, GLOBULAR clustering may also enable the detection of signals missed by the standard pipeline. We benchmark our method against the Choza et al. (2024) turboSETI-only search of 97 nearby galaxies at L-band, demonstrating a false-positive hit reduction rate of 93.1% and a false-positive event reduction rate of 99.3%.
- [237] arXiv:2411.16560 (cross-list from quant-ph) [pdf, html, other]
-
Title: Quantum Circuit Training with Growth-Based ArchitecturesComments: 14 pages, 8 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
This study introduces growth-based training strategies that incrementally increase parameterized quantum circuit (PQC) depth during training, mitigating overfitting and managing model complexity dynamically. We develop three distinct methods: Block Growth, Sequential Feature Map Growth, and Interleave Feature Map Growth, which add reuploader blocks to PQCs adaptively, expanding the accessible frequency spectrum of the model in response to training needs. This approach enables PQCs to achieve more stable convergence and generalization, even in noisy settings. We evaluate our methods on regression tasks and the 2D Laplace equation, demonstrating that dynamic growth methods outperform traditional, fixed-depth approaches, achieving lower final losses and reduced variance between runs. These findings underscore the potential of growth-based PQCs for quantum scientific machine learning (QSciML) applications, where balancing expressivity and stability is essential.
- [238] arXiv:2411.16579 (cross-list from cs.CL) [pdf, html, other]
-
Title: Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time SupervisionZhiheng Xi, Dingwen Yang, Jixuan Huang, Jiafu Tang, Guanyu Li, Yiwen Ding, Wei He, Boyang Hong, Shihan Do, Wenyu Zhan, Xiao Wang, Rui Zheng, Tao Ji, Xiaowei Shi, Yitao Zhai, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Zuxuan Wu, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Yu-Gang JiangComments: PreprintSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Training large language models (LLMs) to spend more time thinking and reflection before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model during both test-time and train-time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of $76,321$ responses paired with step-level feedback. Fine-tuning language models with this dataset enables them to generate natural language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time, especially when scaling up inference-time computation. Motivated by these findings, we introduce the critique-based supervision to the actor's self-training process, and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take the preliminary step to explore training self-talk reasoning models via critique supervision and showcase its potential. Our code and datasets are at \href{this https URL}{this https URL}.
- [239] arXiv:2411.16586 (cross-list from stat.ML) [pdf, html, other]
-
Title: Alpha Entropy Search for New Information-based Bayesian OptimizationComments: 31 pages, 12 figures, 3 tables, Journal KBSSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Bayesian optimization (BO) methods based on information theory have obtained state-of-the-art results in several tasks. These techniques heavily rely on the Kullback-Leibler (KL) divergence to compute the acquisition function. In this work, we introduce a novel information-based class of acquisition functions for BO called Alpha Entropy Search (AES). AES is based on the {\alpha}-divergence, that generalizes the KL divergence. Iteratively, AES selects the next evaluation point as the one whose associated target value has the highest level of the dependency with respect to the location and associated value of the global maximum of the optimization problem. Dependency is measured in terms of the {\alpha}-divergence, as an alternative to the KL divergence. Intuitively, this favors the evaluation of the objective function at the most informative points about the global maximum. The {\alpha}-divergence has a free parameter {\alpha}, which determines the behavior of the divergence, trading-off evaluating differences between distributions at a single mode, and evaluating differences globally. Therefore, different values of {\alpha} result in different acquisition functions. AES acquisition lacks a closed-form expression. However, we propose an efficient and accurate approximation using a truncated Gaussian distribution. In practice, the value of {\alpha} can be chosen by the practitioner, but here we suggest to use a combination of acquisition functions obtained by simultaneously considering a range of values of {\alpha}. We provide an implementation of AES in BOTorch and we evaluate its performance in both synthetic, benchmark and real-world experiments involving the tuning of the hyper-parameters of a deep neural network. These experiments show that the performance of AES is competitive with respect to other information-based acquisition functions such as JES, MES or PES.
- [240] arXiv:2411.16598 (cross-list from cs.CR) [pdf, html, other]
-
Title: Unlocking The Potential of Adaptive Attacks on Diffusion-Based PurificationSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Diffusion-based purification (DBP) is a defense against adversarial examples (AEs), amassing popularity for its ability to protect classifiers in an attack-oblivious manner and resistance to strong adversaries with access to the defense. Its robustness has been claimed to ensue from the reliance on diffusion models (DMs) that project the AEs onto the natural distribution. We revisit this claim, focusing on gradient-based strategies that back-propagate the loss gradients through the defense, commonly referred to as ``adaptive attacks". Analytically, we show that such an optimization method invalidates DBP's core foundations, effectively targeting the DM rather than the classifier and restricting the purified outputs to a distribution over malicious samples instead. Thus, we reassess the reported empirical robustness, uncovering implementation flaws in the gradient back-propagation techniques used thus far for DBP. We fix these issues, providing the first reliable gradient library for DBP and demonstrating how adaptive attacks drastically degrade its robustness. We then study a less efficient yet stricter majority-vote setting where the classifier evaluates multiple purified copies of the input to make its decision. Here, DBP's stochasticity enables it to remain partially robust against traditional norm-bounded AEs. We propose a novel adaptation of a recent optimization method against deepfake watermarking that crafts systemic malicious perturbations while ensuring imperceptibility. When integrated with the adaptive attack, it completely defeats DBP, even in the majority-vote setup. Our findings prove that DBP, in its current state, is not a viable defense against AEs.
- [241] arXiv:2411.16600 (cross-list from cs.DS) [pdf, html, other]
-
Title: Approximation Algorithms for Combinatorial Optimization with PredictionsSubjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
We initiate a systematic study of utilizing predictions to improve over approximation guarantees of classic algorithms, without increasing the running time. We propose a systematic method for a wide class of optimization problems that ask to select a feasible subset of input items of minimal (or maximal) total weight. This gives simple (near-)linear time algorithms for, e.g., Vertex Cover, Steiner Tree, Min-Weight Perfect Matching, Knapsack, and Clique. Our algorithms produce optimal solutions when provided with perfect predictions and their approximation ratios smoothly degrade with increasing prediction error. With small enough prediction error we achieve approximation guarantees that are beyond reach without predictions in the given time bounds, as exemplified by the NP-hardness and APX-hardness of many of the above problems. Although we show our approach to be optimal for this class of problems as a whole, there is a potential for exploiting specific structural properties of individual problems to obtain improved bounds; we demonstrate this on the Steiner Tree problem. We conclude with an empirical evaluation of our approach.
- [242] arXiv:2411.16627 (cross-list from cs.RO) [pdf, html, other]
-
Title: Inference-Time Policy Steering through Human InteractionsYanwei Wang, Lirui Wang, Yilun Du, Balakumar Sundaralingam, Xuning Yang, Yu-Wei Chao, Claudia Perez-D'Arpino, Dieter Fox, Julie ShahSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Generative policies trained with human demonstrations can autonomously accomplish multimodal, long-horizon tasks. However, during inference, humans are often removed from the policy execution loop, limiting the ability to guide a pre-trained policy towards a specific sub-goal or trajectory shape among multiple predictions. Naive human intervention may inadvertently exacerbate distribution shift, leading to constraint violations or execution failures. To better align policy output with human intent without inducing out-of-distribution errors, we propose an Inference-Time Policy Steering (ITPS) framework that leverages human interactions to bias the generative sampling process, rather than fine-tuning the policy on interaction data. We evaluate ITPS across three simulated and real-world benchmarks, testing three forms of human interaction and associated alignment distance metrics. Among six sampling strategies, our proposed stochastic sampling with diffusion policy achieves the best trade-off between alignment and distribution shift. Videos are available at this https URL.
- [243] arXiv:2411.16645 (cross-list from cs.IR) [pdf, html, other]
-
Title: Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that MattersSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
In the area of recommender systems, the vast majority of research efforts is spent on developing increasingly sophisticated recommendation models, also using increasingly more computational resources. Unfortunately, most of these research efforts target a very small set of application domains, mostly e-commerce and media recommendation. Furthermore, many of these models are never evaluated with users, let alone put into practice. The scientific, economic and societal value of much of these efforts by scholars therefore remains largely unclear. To achieve a stronger positive impact resulting from these efforts, we posit that we as a research community should more often address use cases where recommender systems contribute to societal good (RS4Good). In this opinion piece, we first discuss a number of examples where the use of recommender systems for problems of societal concern has been successfully explored in the literature. We then proceed by outlining a paradigmatic shift that is needed to conduct successful RS4Good research, where the key ingredients are interdisciplinary collaborations and longitudinal evaluation approaches with humans in the loop.
- [244] arXiv:2411.16646 (cross-list from cs.CL) [pdf, html, other]
-
Title: Self-Generated Critiques Boost Reward Modeling for Language ModelsYue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui HouComments: 20 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF). However, current reward models mainly produce scalar scores and struggle to incorporate critiques in a natural language format. We hypothesize that predicting both critiques and the scalar reward would improve reward modeling ability. Motivated by this, we propose Critic-RM, a framework that improves reward models using self-generated critiques without extra supervision. Critic-RM employs a two-stage process: generating and filtering high-quality critiques, followed by joint fine-tuning on reward prediction and critique generation. Experiments across benchmarks show that Critic-RM improves reward modeling accuracy by 3.7%-7.3% compared to standard reward models and LLM judges, demonstrating strong performance and data efficiency. Additional studies further validate the effectiveness of generated critiques in rectifying flawed reasoning steps with 2.5%-3.2% gains in improving reasoning accuracy.
- [245] arXiv:2411.16658 (cross-list from stat.ML) [pdf, html, other]
-
Title: Fast training of large kernel models with delayed projectionsComments: arXiv admin note: text overlap with arXiv:2302.02605Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes--a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD) allowing the training of much larger models than was previously feasible, pushing the practical limits of kernel-based learning. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating drastic training speed up over the existing methods while maintaining comparable or better classification accuracy.
- [246] arXiv:2411.16663 (cross-list from stat.ML) [pdf, html, other]
-
Title: Gaussian Process Priors for Boundary Value Problems of Linear Partial Differential EquationsComments: 25 pages, 19 figures. Code available at $\href{this https URL}{\text{this https URL}}$. The paper and all ancillary files are released under CC-BYSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Commutative Algebra (math.AC); Numerical Analysis (math.NA)
Solving systems of partial differential equations (PDEs) is a fundamental task in computational science, traditionally addressed by numerical solvers. Recent advancements have introduced neural operators and physics-informed neural networks (PINNs) to tackle PDEs, achieving reduced computational costs at the expense of solution quality and accuracy. Gaussian processes (GPs) have also been applied to linear PDEs, with the advantage of always yielding precise solutions. In this work, we propose Boundary Ehrenpreis-Palamodov Gaussian Processes (B-EPGPs), a novel framework for constructing GP priors that satisfy both general systems of linear PDEs with constant coefficients and linear boundary conditions. We explicitly construct GP priors for representative PDE systems with practical boundary conditions. Formal proofs of correctness are provided and empirical results demonstrating significant accuracy improvements over state-of-the-art neural operator approaches.
- [247] arXiv:2411.16666 (cross-list from stat.ML) [pdf, html, other]
-
Title: CatNet: Effective FDR Control in LSTM with Gaussian Mirrors and SHAP Feature ImportanceSubjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistical Finance (q-fin.ST)
We introduce CatNet, an algorithm that effectively controls False Discovery Rate (FDR) and selects significant features in LSTM with the Gaussian Mirror (GM) method. To evaluate the feature importance of LSTM in time series, we introduce a vector of the derivative of the SHapley Additive exPlanations (SHAP) to measure feature importance. We also propose a new kernel-based dependence measure to avoid multicollinearity in the GM algorithm, to make a robust feature selection with controlled FDR. We use simulated data to evaluate CatNet's performance in both linear models and LSTM models with different link functions. The algorithm effectively controls the FDR while maintaining a high statistical power in all cases. We also evaluate the algorithm's performance in different low-dimensional and high-dimensional cases, demonstrating its robustness in various input dimensions. To evaluate CatNet's performance in real world applications, we construct a multi-factor investment portfolio to forecast the prices of S\&P 500 index components. The results demonstrate that our model achieves superior predictive accuracy compared to traditional LSTM models without feature selection and FDR control. Additionally, CatNet effectively captures common market-driving features, which helps informed decision-making in financial markets by enhancing the interpretability of predictions. Our study integrates of the Gaussian Mirror algorithm with LSTM models for the first time, and introduces SHAP values as a new feature importance metric for FDR control methods, marking a significant advancement in feature selection and error control for neural networks.
- [248] arXiv:2411.16680 (cross-list from cs.CV) [pdf, html, other]
-
Title: Quark: Real-time, High-resolution, and General Neural View SynthesisJohn Flynn, Michael Broxton, Lukas Murmann, Lucy Chai, Matthew DuVall, Clément Godard, Kathryn Heal, Srinivas Kaza, Stephen Lombardi, Xuan Luo, Supreeth Achar, Kira Prabhu, Tiancheng Sun, Lynn Tsai, Ryan OverbeckComments: SIGGRAPH Asia 2024 camera ready version; project page this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
We present a novel neural algorithm for performing high-quality, high-resolution, real-time novel view synthesis. From a sparse set of input RGB images or videos streams, our network both reconstructs the 3D scene and renders novel views at 1080p resolution at 30fps on an NVIDIA A100. Our feed-forward network generalizes across a wide variety of datasets and scenes and produces state-of-the-art quality for a real-time method. Our quality approaches, and in some cases surpasses, the quality of some of the top offline methods. In order to achieve these results we use a novel combination of several key concepts, and tie them together into a cohesive and effective algorithm. We build on previous works that represent the scene using semi-transparent layers and use an iterative learned render-and-refine approach to improve those layers. Instead of flat layers, our method reconstructs layered depth maps (LDMs) that efficiently represent scenes with complex depth and occlusions. The iterative update steps are embedded in a multi-scale, UNet-style architecture to perform as much compute as possible at reduced resolution. Within each update step, to better aggregate the information from multiple input views, we use a specialized Transformer-based network component. This allows the majority of the per-input image processing to be performed in the input image space, as opposed to layer space, further increasing efficiency. Finally, due to the real-time nature of our reconstruction and rendering, we dynamically create and discard the internal 3D geometry for each frame, generating the LDM for each view. Taken together, this produces a novel and effective algorithm for view synthesis. Through extensive evaluation, we demonstrate that we achieve state-of-the-art quality at real-time rates. Project page: this https URL
Cross submissions (showing 109 of 109 entries)
- [249] arXiv:2008.12410 (replaced) [pdf, html, other]
-
Title: Deep sr-DDL: Deep Structurally Regularized Dynamic Dictionary Learning to Integrate Multimodal and Dynamic Functional Connectomics data for Multidimensional Clinical CharacterizationsNiharika Shimona D'Souza, Mary Beth Nebel, Deana Crocetti, Nicholas Wymbs, Joshua Robinson, Stewart H. Mostofsky, Archana VenkataramanSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
We propose a novel integrated framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract biomarkers of brain connectivity predictive of behavior. Our framework couples a generative model of the connectomics data with a deep network that predicts behavioral scores. The generative component is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time varying subject-specific loadings. We use the DTI tractography to regularize this matrix factorization and learn anatomically informed functional connectivity profiles. The deep component of our framework is an LSTM-ANN block, which uses the temporal evolution of the subject-specific sr-DDL loadings to predict multidimensional clinical characterizations. Our joint optimization strategy collectively estimates the basis networks, the subject-specific time-varying loadings, and the neural network weights. We validate our framework on a dataset of neurotypical individuals from the Human Connectome Project (HCP) database to map to cognition and on a separate multi-score prediction task on individuals diagnosed with Autism Spectrum Disorder (ASD) in a five-fold cross validation setting. Our hybrid model outperforms several state-of-the-art approaches at clinical outcome prediction and learns interpretable multimodal neural signatures of brain organization.
- [250] arXiv:2012.12311 (replaced) [pdf, other]
-
Title: Unboxing Engagement in YouTube Influencer Videos: An Attention-Based ApproachComments: 47 pages, Online AppendixSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Influencer marketing videos have surged in popularity, yet significant gaps remain in understanding the relationships between video features and engagement. This challenge is intensified by the complexities of interpreting unstructured data. While deep learning models effectively leverage raw unstructured data to predict engagement, they often function as black boxes with limited interpretability, particularly when human validation is hindered by the absence of a known ground truth. To address this issue, we develop an 'interpretable deep learning framework' that provides insights into the relationships captured by the models. Inspired by visual attention in print advertising, our interpretation approach uses measures of model attention to video features, eliminating spurious associations through a two-step process and identifying a subset of relationships for formal causal testing. This approach is versatile, as it applies across well-known attention mechanisms - additive attention, scaled dot-product attention, and gradient-based attention - when analyzing text, audio, or video image data. We apply our framework to YouTube influencer videos, linking video features to measures of shallow and deep engagement developed based on the dual-system framework of thinking. Our findings guide influencers in prioritizing the design of video features associated with deep engagement sentiment.
- [251] arXiv:2208.09449 (replaced) [pdf, html, other]
-
Title: A Novel Plug-and-Play Approach for Adversarially Robust GeneralizationSubjects: Machine Learning (cs.LG)
In this work, we propose a robust framework that employs adversarially robust training to safeguard the ML models against perturbed testing data. Our contributions can be seen from both computational and statistical perspectives. Firstly, from a computational/optimization point of view, we derive the ready-to-use exact solution for several widely used loss functions with a variety of norm constraints on adversarial perturbation for various supervised and unsupervised ML problems, including regression, classification, two-layer neural networks, graphical models, and matrix completion. The solutions are either in closed-form, or an easily tractable optimization problem such as 1-D convex optimization, semidefinite programming, difference of convex programming or a sorting-based algorithm. Secondly, from statistical/generalization viewpoint, using some of these results, we derive novel bounds of the adversarial Rademacher complexity for various problems, which entails new generalization bounds. Thirdly, we perform some sanity-check experiments on real-world datasets for supervised problems such as regression and classification, as well as for unsupervised problems such as matrix completion and learning graphical models, with very little computational overhead.
- [252] arXiv:2210.15294 (replaced) [pdf, html, other]
-
Title: Modeling Inter-Dependence Between Time and Mark in Multivariate Temporal Point ProcessesComments: Accepted in CIKM-2022Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Temporal Point Processes (TPP) are probabilistic generative frameworks. They model discrete event sequences localized in continuous time. Generally, real-life events reveal descriptive information, known as marks. Marked TPPs model time and marks of the event together for practical relevance. Conditioned on past events, marked TPPs aim to learn the joint distribution of the time and the mark of the next event. For simplicity, conditionally independent TPP models assume time and marks are independent given event history. They factorize the conditional joint distribution of time and mark into the product of individual conditional distributions. This structural limitation in the design of TPP models hurt the predictive performance on entangled time and mark interactions. In this work, we model the conditional inter-dependence of time and mark to overcome the limitations of conditionally independent models. We construct a multivariate TPP conditioning the time distribution on the current event mark in addition to past events. Besides the conventional intensity-based models for conditional joint distribution, we also draw on flexible intensity-free TPP models from the literature. The proposed TPP models outperform conditionally independent and dependent models in standard prediction tasks. Our experimentation on various datasets with multiple evaluation metrics highlights the merit of the proposed approach.
- [253] arXiv:2301.07806 (replaced) [pdf, html, other]
-
Title: Federated Automatic DifferentiationComments: 39 pages, 13 figures. To appear in JMLR 25 (2024)Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Symbolic Computation (cs.SC)
Federated learning (FL) is a general framework for learning across an axis of group partitioned data (heterogeneous clients) while preserving data privacy, under the orchestration of a central server. FL methods often compute gradients of loss functions purely locally (ie. entirely at each client, or entirely at the server), typically using automatic differentiation (AD) techniques. We propose a federated automatic differentiation (FAD) framework that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology. In other words, FAD computes derivatives across communication boundaries. We show, in analogy with traditional AD, that FAD may be implemented using various accumulation modes, which introduce distinct computation-communication trade-offs and systems requirements. Further, we show that a broad class of federated computations is closed under these various modes of FAD, implying in particular that if the original computation can be implemented using privacy-preserving primitives, its derivative may be computed using only these same primitives. We then show how FAD can be used to create algorithms that dynamically learn components of the algorithm itself. In particular, we show that FedAvg-style algorithms can exhibit significantly improved performance by using FAD to adjust the server optimization step automatically, or by using FAD to learn weighting schemes for computing weighted averages across clients.
- [254] arXiv:2302.09649 (replaced) [pdf, html, other]
-
Title: Weakly Supervised Label Learning FlowsComments: Accepted as a full length research article by Neural NetworksSubjects: Machine Learning (cs.LG)
Supervised learning usually requires a large amount of labelled data. However, attaining ground-truth labels is costly for many tasks. Alternatively, weakly supervised methods learn with cheap weak signals that only approximately label some data. Many existing weakly supervised learning methods learn a deterministic function that estimates labels given the input data and weak signals. In this paper, we develop label learning flows (LLF), a general framework for weakly supervised learning problems. Our method is a generative model based on normalizing flows. The main idea of LLF is to optimize the conditional likelihoods of all possible labelings of the data within a constrained space defined by weak signals. We develop a training method for LLF that trains the conditional flow inversely and avoids estimating the labels. Once a model is trained, we can make predictions with a sampling algorithm. We apply LLF to three weakly supervised learning problems. Experiment results show that our method outperforms many baselines we compare against.
- [255] arXiv:2308.08949 (replaced) [pdf, html, other]
-
Title: A Dual-Perspective Approach to Evaluating Feature Attribution MethodsComments: Accepted by Transactions on Machine Learning Research (11/2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
- [256] arXiv:2310.15290 (replaced) [pdf, html, other]
-
Title: Reliable Generation of Privacy-preserving Synthetic Electronic Health Record Time Series via Diffusion ModelsSubjects: Machine Learning (cs.LG)
Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR de-identification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHR. This study aims to overcome these challenges by generating realistic and privacy-preserving synthetic electronic health records (EHRs) time series efficiently. We introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six databases: Medical Information Mart for Intensive Care III and IV (MIMIC-III/IV), the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yields a lower discriminative accuracy compared to other baseline methods, indicating the proposed method can generate data with less privacy risk. The proposed diffusion-model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates the downstream medical data analysis. Our numerical results show the superiority of the proposed method over all other existing methods.
- [257] arXiv:2310.17748 (replaced) [pdf, other]
-
Title: OrionBench: Benchmarking Time Series Generative Models in the Service of the End-UserComments: This work is accepted by IEEE BigData 2024Subjects: Machine Learning (cs.LG)
Time series anomaly detection is a vital task in many domains, including patient monitoring in healthcare, forecasting in finance, and predictive maintenance in energy industries. This has led to a proliferation of anomaly detection methods, including deep learning-based methods. Benchmarks are essential for comparing the performances of these models as they emerge, in a fair, rigorous, and reproducible approach. Although several benchmarks for comparing models have been proposed, these usually rely on a one-time execution over a limited set of datasets, with comparisons restricted to a few models. We propose OrionBench: an end-user centric, continuously maintained benchmarking framework for unsupervised time series anomaly detection models. Our framework provides universal abstractions to represent models, hyperparameter standardization, extensibility to add new pipelines and datasets, pipeline verification, and frequent releases with published updates of the benchmark. We demonstrate how to use OrionBench, and the performance of pipelines across 17 releases published over the course of four years. We also walk through two real scenarios we experienced with OrionBench that highlight the importance of continuous benchmarking for unsupervised time series anomaly detection.
- [258] arXiv:2310.18681 (replaced) [pdf, html, other]
-
Title: DySurv: dynamic deep learning model for survival analysis with conditional variational inferenceSubjects: Machine Learning (cs.LG)
Machine learning applications for longitudinal electronic health records often forecast the risk of events at fixed time points, whereas survival analysis achieves dynamic risk prediction by estimating time-to-event distributions. Here, we propose a novel conditional variational autoencoder-based method, DySurv, which uses a combination of static and longitudinal measurements from electronic health records to estimate the individual risk of death dynamically. DySurv directly estimates the cumulative risk incidence function without making any parametric assumptions on the underlying stochastic process of the time-to-event. We evaluate DySurv on 6 time-to-event benchmark datasets in healthcare, as well as 2 real-world intensive care unit (ICU) electronic health records (EHR) datasets extracted from the eICU Collaborative Research (eICU) and the Medical Information Mart for Intensive Care database (MIMIC-IV). DySurv outperforms other existing statistical and deep learning approaches to time-to-event analysis across concordance and other metrics. It achieves time-dependent concordance of over 60% in the eICU case. It is also over 12% more accurate and 22% more sensitive than in-use ICU scores like Acute Physiology and Chronic Health Evaluation (APACHE) and Sequential Organ Failure Assessment (SOFA) scores. The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets. Our interdisciplinary framework successfully incorporates deep learning, survival analysis, and intensive care to create a novel method for time-to-event prediction from longitudinal health records. We test our method on several held-out test sets from a variety of healthcare datasets and compare it to existing in-use clinical risk scoring benchmarks.
- [259] arXiv:2311.07324 (replaced) [pdf, html, other]
-
Title: Data-Aware Gradient Compression for FL in Communication-Constrained Mobile ComputingComments: This article has been accepted for publication in IEEE Transactions on Mobile Computing. Citation information: DOI this https URLSubjects: Machine Learning (cs.LG)
Federated Learning (FL) in mobile environments faces significant communication bottlenecks. Gradient compression has proven as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, it encounters severe performance drops in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is therefore a promising solution. This work derives the convergence rate of distributed SGD with non-uniform compression, which reveals the intricate relationship between model convergence and the compression ratios applied to individual workers. Accordingly, we frame the relative compression ratio assignment as an $n$-variable chi-squared nonlinear optimization problem, constrained by a limited communication budget. We propose DAGC-R, which assigns conservative compression to workers handling larger data volumes. Recognizing the computational limitations of mobile devices, we propose the DAGC-A, which is computationally less demanding and enhances the robustness of compression in non-IID scenarios. Our experiments confirm that the DAGC-R and DAGC-A can speed up the training speed by up to $25.43\%$ and $16.65\%$ compared to the uniform compression respectively, when dealing with highly imbalanced data volume distribution and restricted communication.
- [260] arXiv:2312.00486 (replaced) [pdf, html, other]
-
Title: REDUCR: Robust Data Downsampling Using Class Priority ReweightingComments: PreprintSubjects: Machine Learning (cs.LG)
Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training cost, online batch selection techniques have been developed to choose the most informative datapoints. However, these techniques can suffer from poor worst-class generalization performance due to class imbalance and distributional shifts. This work introduces REDUCR, a robust and efficient data downsampling method that uses class priority reweighting. REDUCR reduces the training data while preserving worst-class generalization performance. REDUCR assigns priority weights to datapoints in a class-aware manner using an online learning algorithm. We demonstrate the data efficiency and robust performance of REDUCR on vision and text classification tasks. On web-scraped datasets with imbalanced class distributions, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.
- [261] arXiv:2312.01472 (replaced) [pdf, html, other]
-
Title: BenchMARL: Benchmarking Multi-Agent Reinforcement LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: this https URL
- [262] arXiv:2312.06254 (replaced) [pdf, other]
-
Title: Modyn: Data-Centric Machine Learning Pipeline OrchestrationMaximilian Böther, Ties Robroek, Viktor Gsteiger, Robin Holzinger, Xianzhe Ma, Pınar Tözün, Ana KlimovicComments: accepted at SIGMOD'25; 30 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
In real-world machine learning (ML) pipelines, datasets are continuously growing. Models must incorporate this new training data to improve generalization and adapt to potential distribution shifts. The cost of model retraining is proportional to how frequently the model is retrained and how much data it is trained on, which makes the naive approach of retraining from scratch each time impractical.
We present Modyn, a data-centric end-to-end machine learning platform. Modyn's ML pipeline abstraction enables users to declaratively describe policies for continuously training a model on a growing dataset. Modyn pipelines allow users to apply data selection policies (to reduce the number of data points) and triggering policies (to reduce the number of trainings). Modyn executes and orchestrates these continuous ML training pipelines. The system is open-source and comes with an ecosystem of benchmark datasets, models, and tooling. We formally discuss how to measure the performance of ML pipelines by introducing the concept of composite models, enabling fair comparison of pipelines with different data selection and triggering policies. We empirically analyze how various data selection and triggering policies impact model accuracy, and also show that Modyn enables high throughput training with sample-level data selection. - [263] arXiv:2312.06519 (replaced) [pdf, html, other]
-
Title: A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph SamplingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Graph neural networks (GNNs) face significant challenges with class imbalance, leading to biased inference results. To address this issue in heterogeneous graphs, we propose a novel framework that combines Graph Neural Network (GNN) and Generative Adversarial Network (GAN) to enhance classification for underrepresented node classes. The framework incorporates an advanced edge generation and selection module, enabling the simultaneous creation of synthetic nodes and edges through adversarial learning. Unlike previous methods, which predominantly focus on homogeneous graphs due to the difficulty of representing heterogeneous graph structures in matrix form, this approach is specifically designed for heterogeneous data. Existing solutions often rely on pre-trained models to incorporate synthetic nodes, which can lead to optimization inconsistencies and mismatches in data representation. Our framework avoids these pitfalls by generating data that aligns closely with the inherent graph topology and attributes, ensuring a more cohesive integration. Evaluations on multiple real-world datasets demonstrate the method's superiority over baseline models, particularly in tasks focused on identifying minority node classes, with notable improvements in performance metrics such as F-score and AUC-PRC score. These findings highlight the potential of this approach for addressing critical challenges in the field.
- [264] arXiv:2312.09852 (replaced) [pdf, html, other]
-
Title: Learning Distributions on Manifolds with Free-Form FlowsComments: NeurIPS 2024Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose Manifold Free-Form Flows (M-FFF), a simple new generative model for data on manifolds. The existing approaches to learning a distribution on arbitrary manifolds are expensive at inference time, since sampling requires solving a differential equation. Our method overcomes this limitation by sampling in a single function evaluation. The key innovation is to optimize a neural network via maximum likelihood on the manifold, possible by adapting the free-form flow framework to Riemannian manifolds. M-FFF is straightforwardly adapted to any manifold with a known projection. It consistently matches or outperforms previous single-step methods specialized to specific manifolds. It is typically two orders of magnitude faster than multi-step methods based on diffusion or flow matching, achieving better likelihoods in several experiments. We provide our code at this https URL.
- [265] arXiv:2312.13927 (replaced) [pdf, html, other]
-
Title: On the Convergence of Loss and Uncertainty-based Active Learning AlgorithmsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.
- [266] arXiv:2312.17679 (replaced) [pdf, html, other]
-
Title: Data Augmentation for Supervised Graph Outlier Detection via Latent Diffusion ModelsComments: LoG 2024. Source code available at this https URLSubjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
A fundamental challenge confronting supervised graph outlier detection algorithms is the prevalent problem of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential in data augmentation for supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation for mitigating class imbalance in supervised Graph Outlier detection via latent Diffusion Models. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. The case study further demonstrated the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it at PyPI: this https URL.
- [267] arXiv:2401.01599 (replaced) [pdf, html, other]
-
Title: Generalization Error Curves for Analytic Spectral Algorithms under Power-law DecaySubjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
The generalization error curve of certain kernel regression method aims at determining the exact order of generalization error with various source condition, noise level and choice of the regularization parameter rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we could sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training the wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.
- [268] arXiv:2401.10895 (replaced) [pdf, html, other]
-
Title: AI in Supply Chain Risk Assessment: A Systematic Literature Review and Bibliometric AnalysisSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Supply chain risk assessment (SCRA) has witnessed a profound evolution through the integration of artificial intelligence (AI) and machine learning (ML) techniques, revolutionizing predictive capabilities and risk mitigation strategies. The significance of this evolution stems from the critical role of robust risk management strategies in ensuring operational resilience and continuity within modern supply chains. Previous reviews have outlined established methodologies but have overlooked emerging AI/ML techniques, leaving a notable research gap in understanding their practical implications within SCRA. This paper conducts a systematic literature review combined with a comprehensive bibliometric analysis. We meticulously examined 1,439 papers and derived key insights from a select group of 51 articles published between 2015 and 2024. The review fills this research gap by addressing pivotal research questions and exploring existing AI/ML techniques, methodologies, findings, and future trajectories, thereby providing a more encompassing view of the evolving landscape of SCRA. Our study unveils the transformative impact of AI/ML models, such as Random Forest, XGBoost, and hybrids, in substantially enhancing precision within SCRA. It underscores adaptable post-COVID strategies, advocating for resilient contingency plans and aligning with evolving risk landscapes. Significantly, this review surpasses previous examinations by accentuating emerging AI/ML techniques and their practical implications within SCRA. Furthermore, it highlights the contributions through a comprehensive bibliometric analysis, revealing publication trends, influential authors, and highly cited articles.
- [269] arXiv:2401.11202 (replaced) [pdf, html, other]
-
Title: PartIR: Composing SPMD Partitioning Strategies for Machine LearningSami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel WeeSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN partitioning system. PartIR is focused on an incremental approach to rewriting and is hardware-and-runtime agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level programmer-issued partitioning tactics, which can be both manual and automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance..
- [270] arXiv:2402.01302 (replaced) [pdf, html, other]
-
Title: A Unified Framework for Center-based Clustering of Distributed DataComments: 49 pages, 9 figures, 7 tablesSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
We develop a family of distributed center-based clustering algorithms that work over networks of users. In the proposed scenario, users contain a local dataset and communicate only with their immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-$\mathcal{F}_\rho$), is parametrized by $\rho \geq 1$, controling the proximity of users' center estimates, with $\mathcal{F}$ determining the clustering loss. Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss. Specialized to popular clustering losses like $K$-means and Huber loss, DGC-$\mathcal{F}_\rho$ gives rise to novel distributed clustering algorithms DGC-KM$_\rho$ and DGC-HL$_\rho$, while novel clustering losses based on Logistic and Fair functions lead to DGC-LL$_\rho$ and DGC-FL$_\rho$. We provide a unified analysis and establish several strong results, under mild assumptions. First, we show that the sequence of centers generated by the methods converges to a well-defined notion of fixed point, under any center initialization and value of $\rho$. Second, we prove that, as $\rho$ increases, the family of fixed points produced by DGC-$\mathcal{F}_\rho$ converges to a notion of consensus fixed points. We show that consensus fixed points of DGC-$\mathcal{F}_{\rho}$ are equivalent to fixed points of gradient clustering over the full data, guaranteeing a clustering of the full data is produced. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points. Extensive numerical experiments on synthetic and real data confirm our theoretical findings, show strong performance of our methods and demonstrate the usefulness and wide range of potential applications of our general framework, such as outlier detection.
- [271] arXiv:2402.01964 (replaced) [pdf, html, other]
-
Title: Scalable and Efficient Temporal Graph Representation Learning via Forward Recent SamplingComments: Learning on Graphs Conference (LoG 2024)Subjects: Machine Learning (cs.LG)
Temporal graph representation learning (TGRL) is essential for modeling dynamic systems in real-world networks. However, traditional TGRL methods, despite their effectiveness, often face significant computational challenges and inference delays due to the inefficient sampling of temporal neighbors. Conventional sampling methods typically involve backtracking through the interaction history of each node. In this paper, we propose a novel TGRL framework, No-Looking-Back (NLB), which overcomes these challenges by introducing a forward recent sampling strategy. This strategy eliminates the need to backtrack through historical interactions by utilizing a GPU-executable, size-constrained hash table for each node. The hash table records a down-sampled set of recent interactions, enabling rapid query responses with minimal inference latency. The maintenance of this hash table is highly efficient, operating with $O(1)$ complexity. Fully compatible with GPU processing, NLB maximizes programmability, parallelism, and power efficiency. Empirical evaluations demonstrate that NLB not only matches or surpasses state-of-the-art methods in accuracy for tasks like link prediction and node classification across six real-world datasets but also achieves 1.32-4.40x faster training, 1.2-7.94x greater energy efficiency, and 1.63-12.95x lower inference latency compared to competitive baselines. The link to the code: this https URL.
- [272] arXiv:2402.02429 (replaced) [pdf, html, other]
-
Title: Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement LearningComments: 25 pages, 8 figures, 7 tables. TLDR: We propose a novel information theoretic framework of the context-based offline meta-RL paradigm, which unifies several mainstream methods and leads to two robust algorithm implementationsJournal-ref: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)Subjects: Machine Learning (cs.LG)
As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among which, context-based OMRL (COMRL) as a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
- [273] arXiv:2402.05396 (replaced) [pdf, other]
-
Title: TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation LearningGangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor PrasannaComments: IPDPS 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.
- [274] arXiv:2402.14081 (replaced) [pdf, html, other]
-
Title: Motion Code: Robust Time Series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes LearningComments: 20 pages, 5 figures, 4 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Despite extensive research, time series classification and forecasting on noisy data remain highly challenging. The main difficulties lie in finding suitable mathematical concepts to describe time series and effectively separate noise from the true signals. Unlike traditional methods treating time series as static vectors or fixed sequences, we propose a novel framework that views each time series, regardless of length, as a realization of a continuous-time stochastic process. This mathematical approach captures dependencies across timestamps and detects hidden, time-varying signals within the noise. However, real-world data often involves multiple distinct dynamics, making it insufficient to model the entire process with a single stochastic model. To address this, we assign each dynamic a unique signature vector and introduce the concept of "most informative timestamps" to infer a sparse approximation of the individual dynamics from these vectors. The resulting model, called Motion Code, includes parameters that fully capture diverse underlying dynamics in an integrated manner, enabling simultaneous classification and forecasting of time series. Extensive experiments on noisy datasets, including real-world Parkinson's disease sensor tracking, demonstrate Motion Code's strong performance against established benchmarks for time series classification and forecasting.
- [275] arXiv:2402.19472 (replaced) [pdf, html, other]
-
Title: Efficient Lifelong Model Evaluation in an Era of Rapid ProgressComments: Accepted as a conference paper at NeurIPS'24Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Standardized benchmarks drive progress in machine learning. However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. These benchmarks introduce a major challenge: the high cost of evaluating a growing number of models across very large sample sets. To address this challenge, we introduce an efficient framework for model evaluation, Sort & Search (S&S)}, which reuses previously evaluated models by leveraging dynamic programming algorithms to selectively rank and sub-select test samples. To test our approach at scale, we create Lifelong-CIFAR10 and Lifelong-ImageNet, containing 1.69M and 1.98M test samples for classification. Extensive empirical evaluations across over 31,000 models demonstrate that S&S achieves highly-efficient approximate accuracy measurement, reducing compute cost from 180 GPU days to 5 GPU hours (about 1000x reduction) on a single A100 GPU, with low approximation error and memory cost of <100MB. Our work also highlights issues with current accuracy prediction metrics, suggesting a need to move towards sample-level evaluation metrics. We hope to guide future research by showing our method's bottleneck lies primarily in generalizing Sort beyond a single rank order and not in improving Search.
- [276] arXiv:2403.07187 (replaced) [pdf, html, other]
-
Title: UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal AdaptationComments: TMLR 2024; ICML 2024 AI for Science Workshop (Spotlight)Subjects: Machine Learning (cs.LG)
We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to developing unified neural operators for diverse families of spatiotemporal PDEs from various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using a FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding and computationally expensive, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models using 4 times less data and 26 times less compute. Meanwhile, it is capable of few-shot transfer to unseen PDE families and coefficients.
- [277] arXiv:2403.09613 (replaced) [pdf, html, other]
-
Title: Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured trainingComments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024), VancouverSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs finetuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before encountering them again. This behavior occurs even though the documents are never presented in context together. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we demonstrate a new mechanism by which over-parametrized neural networks can recover from catastrophic interference and uncover new insights into training over-parameterized networks in cyclically structured environments.
- [278] arXiv:2403.10842 (replaced) [pdf, other]
-
Title: Twin Transformer using Gated Dynamic Learnable Attention mechanism for Fault Detection and Diagnosis in the Tennessee Eastman ProcessSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fault detection and diagnosis (FDD) is a crucial task for ensuring the safety and efficiency of industrial processes. We propose a novel FDD methodology for the Tennessee Eastman Process (TEP), a widely used benchmark for chemical process control. The model employs two separate Transformer branches, enabling independent processing of input data and potential extraction of diverse information. A novel attention mechanism, Gated Dynamic Learnable Attention (GDLAttention), is introduced which integrates a gating mechanism and dynamic learning capabilities. The gating mechanism modulates the attention weights, allowing the model to focus on the most relevant parts of the input. The dynamic learning approach adapts the attention strategy during training, potentially leading to improved performance. The attention mechanism uses a bilinear similarity function, providing greater flexibility in capturing complex relationships between query and key vectors. In order to assess the effectiveness of our approach, we tested it against 21 and 18 distinct fault scenarios in TEP, and compared its performance with several established FDD techniques. The outcomes indicate that the method outperforms others in terms of accuracy, false alarm rate, and misclassification rate. This underscores the robustness and efficacy of the approach for FDD in intricate industrial processes.
- [279] arXiv:2404.10296 (replaced) [pdf, html, other]
-
Title: Interpolating neural network: A novel unification of machine learning and interpolation theoryChanwook Park, Sourav Saha, Jiachen Guo, Hantao Zhang, Xiaoyu Xie, Miguel A. Bessa, Dong Qian, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam LiuComments: 13 pages, 4 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Artificial intelligence (AI) has revolutionized software development, shifting from task-specific codes (Software 1.0) to neural network-based approaches (Software 2.0). However, applying this transition in engineering software presents challenges, including low surrogate model accuracy, the curse of dimensionality in inverse design, and rising complexity in physical simulations. We introduce an interpolating neural network (INN), grounded in interpolation theory and tensor decomposition, to realize Engineering Software 2.0 by advancing data training, partial differential equation solving, and parameter calibration. INN offers orders of magnitude fewer trainable/solvable parameters for comparable model accuracy than traditional multi-layer perceptron (MLP) or physics-informed neural networks (PINN). Demonstrated in metal additive manufacturing, INN rapidly constructs an accurate surrogate model of Laser Powder Bed Fusion (L-PBF) heat transfer simulation, achieving sub-10-micrometer resolution for a 10 mm path in under 15 minutes on a single GPU. This makes a transformative step forward across all domains essential to engineering software.
- [280] arXiv:2404.16789 (replaced) [pdf, html, other]
-
Title: Continual Learning of Large Language Models: A Comprehensive SurveyHaizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Zifeng Wang, Sayna Ebrahimi, Hao WangComments: 44 pages, 2 figures, 4 tables; Work in progressSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at this https URL.
- [281] arXiv:2404.17801 (replaced) [pdf, other]
-
Title: Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning ApproachesComments: research paper (29 pages, 20 figures)Subjects: Machine Learning (cs.LG)
Combustion instability in gas turbines and rocket engines, as one of the most challenging problems in combustion research, arises from the complex interactions among flames, which are also influenced by chemical reactions, heat and mass transfer, and acoustics. Identifying and understanding combustion instability is essential to ensure the safe and reliable operation of many combustion systems, where exploring and classifying the dynamical behaviors of complex flame systems is a core take. To facilitate fundamental studies, the present work concerns dynamical mode recognition of coupled flame oscillators made of flickering buoyant diffusion flames, which have gained increasing attention in recent years but are not sufficiently understood. The time series data of flame oscillators are generated by fully validated reacting flow simulations. Due to limitations of expertise-based models, a data-driven approach is adopted. In this study, a nonlinear dimensional reduction model of variational autoencoder (VAE) is used to project the simulation data onto a 2-dimensional latent space. Based on the phase trajectories in latent space, both supervised and unsupervised classifiers are proposed for datasets with well known labeling and without, respectively. For labeled datasets, we establish the Wasserstein-distance-based classifier (WDC) for mode recognition; for unlabeled datasets, we develop a novel unsupervised classifier (GMM-DTWC) combining dynamic time warping (DTW) and Gaussian mixture model (GMM). Through comparing with conventional approaches for dimensionality reduction and classification, the proposed supervised and unsupervised VAE-based approaches exhibit a prominent performance for distinguishing dynamical modes, implying their potential extension to dynamical mode recognition of complex combustion problems.
- [282] arXiv:2405.01125 (replaced) [pdf, other]
-
Title: Lipschitz constant estimation for general neural network architectures using control toolsSubjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
This paper is devoted to the estimation of the Lipschitz constant of general neural network architectures using semidefinite programming. For this purpose, we interpret neural networks as time-varying dynamical systems, where the $k$-th layer corresponds to the dynamics at time $k$. A key novelty with respect to prior work is that we use this interpretation to exploit the series interconnection structure of feedforward neural networks with a dynamic programming recursion. Nonlinearities, such as activation functions and nonlinear pooling layers, are handled with integral quadratic constraints. If the neural network contains signal processing layers (convolutional or state space model layers), we realize them as 1-D/2-D/N-D systems and exploit this structure as well. We distinguish ourselves from related work on Lipschitz constant estimation by more extensive structure exploitation (scalability) and a generalization to a large class of common neural network architectures. To show the versatility and computational advantages of our method, we apply it to different neural network architectures trained on MNIST and CIFAR-10.
- [283] arXiv:2405.02631 (replaced) [pdf, html, other]
-
Title: Unsupervised machine learning for data-driven rock mass classification: addressing limitations in existing systems using drilling dataComments: 55 pages, 21 figures. Includes ancillary interactive versions of some figuresSubjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Systems and Control (eess.SY)
Rock mass classification systems are crucial for assessing stability and risk in underground construction globally and guiding support and excavation design. However, these systems, developed primarily in the 1970s, lack access to modern high-resolution data and advanced statistical techniques, limiting their effectiveness as decision-support systems. We outline these limitations and describe how a data-driven system, based on drilling data, can overcome them. Using statistical information extracted from thousands of MWD-data values in one-meter sections of a tunnel profile, acting as a signature of the rock mass, we demonstrate that well-defined clusters can form a foundational basis for various classification systems. Representation learning was used to reduce the dimensionality of 48-value vectors via a nonlinear manifold learning technique (UMAP) and linear principal component analysis (PCA) to enhance clustering. Unsupervised machine learning methods (HDBSCAN, Agglomerative Clustering, K-means) clustered the data, with hyperparameters optimised through multi-objective Bayesian optimisation. Domain knowledge improved clustering by adding extra features to core MWD-data clusters. We structured and correlated these clusters with physical rock properties, including rock type and quality, and analysed cumulative distributions of key MWD-parameters to determine if clusters meaningfully differentiate rock masses. The ability of MWD data to form distinct rock mass clusters suggests substantial potential for future classification systems using this objective, data-driven methodology, minimising human bias.
- [284] arXiv:2405.04111 (replaced) [pdf, html, other]
-
Title: Adaptive Least Mean pth Power Graph Neural NetworksSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
In the presence of impulsive noise, and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining adaptive filter and graph neural network for online graph signal estimation. LMP-GNN retains the advantage of adaptive filtering in handling noise and missing observations as well as the online update capability. The incorporated graph neural network within the LMP-GNN can train and update filter parameters online instead of predefined filter parameters in previous methods, outputting more accurate prediction results. The adaptive update scheme of the LMP-GNN follows the solution of a $l_p$-norm optimization, rooting to the minimum dispersion criterion, and yields robust estimation results for time-varying graph signals under impulsive noise. A special case of LMP-GNN named the Sign-GNN is also provided and analyzed, Experiment results on two real-world datasets of temperature graph and traffic graph under four different noise distributions prove the effectiveness and robustness of our proposed LMP-GNN.
- [285] arXiv:2405.08036 (replaced) [pdf, html, other]
-
Title: POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement LearningComments: The first two authors contributed equally to this work. Under reviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns h QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
- [286] arXiv:2405.16498 (replaced) [pdf, html, other]
-
Title: On Sequential Loss Approximation for Continual LearningSubjects: Machine Learning (cs.LG)
We introduce for continual learning Autodiff Quadratic Consolidation (AQC), which approximates the previous loss function with a quadratic function, and Neural Consolidation (NC), which approximates the previous loss function with a neural network. Although they are not scalable to large neural networks, they can be used with a fixed pre-trained feature extractor. We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results, unless combined with replay. We find that for small datasets, quadratic approximation of the previous loss function leads to poor results, even with full Hessian computation, and NC could significantly improve the predictive performance, while for large datasets, when used with a fixed pre-trained feature extractor, AQC provides superior predictive performance. We also find that using tanh-output features can improve the predictive performance of AQC. In particular, in class-incremental Split MNIST, when a Convolutional Neural Network (CNN) with tanh-output features is pre-trained on EMNIST Letters and used as a fixed pre-trained feature extractor, AQC can achieve predictive performance comparable to joint training.
- [287] arXiv:2405.16623 (replaced) [pdf, html, other]
-
Title: Graph neural networks with configuration cross-attention for tensor compilersDmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. PughSubjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Performance (cs.PF)
With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% of the reliable baseline to 67.4% of TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
- [288] arXiv:2405.18979 (replaced) [pdf, html, other]
-
Title: MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution ShiftsComments: The three first authors contributed equallySubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias, especially under the natural shift. In this work, we first study the relationship between logits and generalization performance from the view of low-density separation assumption. Our findings motivate our proposed method MaNo which (1) applies a data-dependent normalization on the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between the provided score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts. The code is available at \url{this https URL}.
- [289] arXiv:2405.20114 (replaced) [pdf, html, other]
-
Title: Towards Faster Decentralized Stochastic Optimization with Communication CompressionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Communication efficiency has garnered significant attention as it is considered the main bottleneck for large-scale decentralized Machine Learning applications in distributed and federated settings. In this regime, clients are restricted to transmitting small amounts of quantized information to their neighbors over a communication graph. Numerous endeavors have been made to address this challenging problem by developing algorithms with compressed communication for decentralized non-convex optimization problems. Despite considerable efforts, the current results suffer from various issues such as non-scalability with the number of clients, requirements for large batches, or bounded gradient assumption. In this paper, we introduce MoTEF, a novel approach that integrates communication compression with Momentum Tracking and Error Feedback. Our analysis demonstrates that MoTEF achieves most of the desired properties, and significantly outperforms existing methods under arbitrary data heterogeneity. We provide numerical experiments to validate our theoretical findings and confirm the practical superiority of MoTEF.
- [290] arXiv:2405.20543 (replaced) [pdf, html, other]
-
Title: Towards a General Recipe for Combinatorial Optimization with Multi-Filter GNNsComments: In Proceedings of the Third Learning on Graphs Conference (LoG 2024, Oral); 20 pages, 2 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM)
Graph neural networks (GNNs) have achieved great success for a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce GCON, a novel GNN architecture that leverages a complex filter bank and localized attention mechanisms to solve CO problems on graphs. We show how our method differentiates itself from prior GNN-based CO solvers and how it can be effectively applied to the maximum cut, minimum dominating set, and maximum clique problems in a unsupervised learning setting. GCON is competitive across all tasks and consistently outperforms other specialized GNN-based approaches, and is on par with the powerful Gurobi solver on the max-cut problem. We provide an open-source implementation of our work at this https URL.
- [291] arXiv:2406.08074 (replaced) [pdf, html, other]
-
Title: A Concept-Based Explainability Framework for Large Multimodal ModelsComments: NeurIPS 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advancements towards the interpretability of these models, understanding internal representations of LMMs remains largely a mystery. In this paper, we present a novel framework for the interpretation of LMMs. We propose a dictionary learning based approach, applied to the representation of tokens. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are well semantically grounded in both vision and text. Thus we refer to these as ``multi-modal concepts''. We qualitatively and quantitatively evaluate the results of the learnt concepts. We show that the extracted multimodal concepts are useful to interpret representations of test samples. Finally, we evaluate the disentanglement between different concepts and the quality of grounding concepts visually and textually. Our implementation is publicly available.
- [292] arXiv:2406.08638 (replaced) [pdf, html, other]
-
Title: Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell DataSubjects: Machine Learning (cs.LG)
Single-cell technologies enable comprehensive profiling of diverse immune cell-types through the measurement of multiple genes or proteins per individual cell. In order to translate immune signatures assayed from blood or tissue into powerful diagnostics, machine learning approaches are often employed to compute immunological summaries or per-sample featurizations, which can be used as inputs to models for outcomes of interest. Current supervised learning approaches for computing per-sample representations are trained only to accurately predict a single outcome and do not take into account relevant additional clinical features or covariates that are likely to also be measured for each sample. Here, we introduce a novel approach for incorporating measured covariates in optimizing model parameters to ultimately specify per-sample encodings that accurately affect both immune signatures and additional clinical information. Our introduced method CytoCoSet is a set-based encoding method for learning per-sample featurizations, which formulates a loss function with an additional triplet term penalizing samples with similar covariates from having disparate embedding results in per-sample representations. Overall, incorporating clinical covariates enables the learning of encodings for each individual sample that ultimately improve prediction of clinical outcome.
- [293] arXiv:2407.07082 (replaced) [pdf, other]
-
Title: Can Learned Optimization Make Reinforcement Learning Less Difficult?Comments: Neurips 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whether learned optimization can help overcome these problems. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties. We show that our parameterization is flexible enough to enable meta-learning in diverse learning contexts, including the ability to use stochasticity for exploration. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures.
- [294] arXiv:2407.12404 (replaced) [pdf, html, other]
-
Title: Analyzing the Generalization and Reliability of Steering VectorsDaniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adria Garriga-Alonso, Robert KirkSubjects: Machine Learning (cs.LG)
Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. In-distribution, steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input, presenting a challenge for the widespread use of steering vectors. Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt, resulting in them failing to generalise well. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties of applying steering vectors to guide models' behaviour at scale.
- [295] arXiv:2407.19198 (replaced) [pdf, html, other]
-
Title: Towards the Dynamics of a DNN Learning Symbolic InteractionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
This study proves the two-phase dynamics of a deep neural network (DNN) learning interactions. Despite the long disappointing view of the faithfulness of post-hoc explanation of a DNN, a series of theorems have been proven in recent years to show that for a given input sample, a small set of interactions between input variables can be considered as primitive inference patterns that faithfully represent a DNN's detailed inference logic on that sample. Particularly, Zhang et al. have observed that various DNNs all learn interactions of different complexities in two distinct phases, and this two-phase dynamics well explains how a DNN changes from under-fitting to over-fitting. Therefore, in this study, we mathematically prove the two-phase dynamics of interactions, providing a theoretical mechanism for how the generalization power of a DNN changes during the training process. Experiments show that our theory well predicts the real dynamics of interactions on different DNNs trained for various tasks.
- [296] arXiv:2407.20786 (replaced) [pdf, other]
-
Title: Be aware of overfitting by hyperparameter optimization!Comments: 19 pages, 5 TablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study we showed that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be calculated using pre-set hyperparameters, reducing the computational effort by around 10,000 times. We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of smiles called Transformer CNN. We show that across all analyzed sets using exactly the same protocol, Transformer CNN provided better results than graph-based methods for 26 out of 28 pairwise comparisons by using only a tiny fraction of time as compared to other methods. Last but not least we stressed the importance of comparing calculation results using exactly the same statistical measures.
- [297] arXiv:2408.02688 (replaced) [pdf, other]
-
Title: A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training dataBenedikt Barthel Sorensen, Leonardo Zepeda-Núñez, Ignacio Lopez-Gomez, Zhong Yi Wan, Rob Carver, Fei Sha, Themistoklis SapsisSubjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Atmospheric and Oceanic Physics (physics.ao-ph); Fluid Dynamics (physics.flu-dyn)
Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range scales, and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for applications of long-term risk assessment, such as the quantification of extreme weather risk due to climate change. While data-driven modeling offers some promise of alleviating these obstacles, the scarcity of high-quality simulations results in limited available data to train such models, which is often compounded by the lack of stability for long-horizon simulations. As such, the computational, algorithmic, and data restrictions generally imply that the probability of rare extreme events is not accurately captured. In this work we present a general strategy for training neural network models to non-intrusively correct under-resolved long-time simulations of chaotic systems. The approach is based on training a post-processing correction operator on under-resolved simulations nudged towards a high-fidelity reference. This enables us to learn the dynamics of the underlying system directly, which allows us to use very little training data, even when the statistics thereof are far from converged. Additionally, through the use of probabilistic network architectures we are able to leverage the uncertainty due to the limited training data to further improve extrapolation capabilities. We apply our framework to severely under-resolved simulations of quasi-geostrophic flow and demonstrate its ability to accurately predict the anisotropic statistics over time horizons more than 30 times longer than the data seen in training.
- [298] arXiv:2408.05160 (replaced) [pdf, html, other]
-
Title: Federated Hypergraph Learning: Hyperedge Completion with Local Differential PrivacySubjects: Machine Learning (cs.LG)
As the volume and complexity increase, graph-structured data commonly need to be split and stored across distributed systems. To enable data mining on subgraphs within these distributed systems, federated graph learning has been proposed, allowing collaborative training of Graph Neural Networks (GNNs) across clients without sharing raw node features. However, when dealing with graph structures that involve high-order relationships between nodes, known as hypergraphs, existing federated graph learning methods are less effective. In this study, we introduce FedHGL, an innovative federated hypergraph learning algorithm. FedHGL is designed to collaboratively train a comprehensive hypergraph neural network across multiple clients, facilitating mining tasks on subgraphs of a hypergraph where relationships are not merely pairwise. To address the high-order information loss between subgraphs caused by distributed storage, we introduce a pre-propagation hyperedge completion operation before the federated training process. In this pre-propagation step, cross-client feature aggregation is performed and distributed at the central server to ensure that this information can be utilized by the clients. Furthermore, by incorporating local differential privacy (LDP) mechanisms, we ensure that the original node features are not disclosed during this aggregation process. Experimental results on seven real-world datasets confirm the effectiveness of our approach and demonstrate its performance advantages over traditional federated graph learning methods.
- [299] arXiv:2408.08448 (replaced) [pdf, html, other]
-
Title: Exploring Cross-model Neuronal Correlations in the Context of Predicting Model Performance and GeneralizabilitySubjects: Machine Learning (cs.LG)
As Artificial Intelligence (AI) models are increasingly integrated into critical systems, the need for a robust framework to establish the trustworthiness of AI is increasingly paramount. While collaborative efforts have established conceptual foundations for such a framework, there remains a significant gap in developing concrete, technically robust methods for assessing AI model quality and performance. A critical drawback in the traditional methods for assessing the validity and generalizability of models is their dependence on internal developer datasets, rendering it challenging to independently assess and verify their performance claims. This paper introduces a novel approach for assessing a newly trained model's performance based on another known model by calculating correlation between neural networks. The proposed method evaluates correlations by determining if, for each neuron in one network, there exists a neuron in the other network that produces similar output. This approach has implications for memory efficiency, allowing for the use of smaller networks when high correlation exists between networks of different sizes. Additionally, the method provides insights into robustness, suggesting that if two highly correlated networks are compared and one demonstrates robustness when operating in production environments, the other is likely to exhibit similar robustness. This contribution advances the technical toolkit for responsible AI, supporting more comprehensive and nuanced evaluations of AI models to ensure their safe and effective deployment. Code is available at this https URL.
- [300] arXiv:2408.13689 (replaced) [pdf, other]
-
Title: Decentralised Variational Inference Frameworks for Multi-object Tracking on Sensor Networks: Additional NotesSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper tackles the challenge of multi-sensor multi-object tracking by proposing various decentralised Variational Inference (VI) schemes that match the tracking performance of centralised sensor fusion with only local message exchanges among neighboring sensors. We first establish a centralised VI sensor fusion scheme as a benchmark and analyse the limitations of its decentralised counterpart, which requires sensors to await consensus at each VI iteration. Therefore, we propose a decentralised gradient-based VI framework that optimises the Locally Maximised Evidence Lower Bound (LM-ELBO) instead of the standard ELBO, which reduces the parameter search space and enables faster convergence, making it particularly beneficial for decentralised tracking. This proposed framework is inherently self-evolving, improving with advancements in decentralised optimisation techniques for convergence guarantees and efficiency. Further, we enhance the convergence speed of proposed decentralised schemes using natural gradients and gradient tracking strategies. Results verify that our decentralised VI schemes are empirically equivalent to centralised fusion in tracking performance. Notably, the decentralised natural gradient VI method is the most communication-efficient, with communication costs comparable to suboptimal decentralised strategies while delivering notably higher tracking accuracy.
- [301] arXiv:2409.06927 (replaced) [pdf, html, other]
-
Title: Representation TuningComments: 10 pages, 7 figures, 6 tablesSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Activation engineering is becoming increasingly popular as a means of online control of large language models (LLMs). In this work, we extend the idea of inference-time steering with vectors that represent a behavioral direction of interest to tuning those vectors directly into the model, obviating the need for online control. First, we identify activation vectors related to honesty in an open-source LLM (Llama-2-13b-chat). Next, we demonstrate that model output can be made more or less honest by adding positive or negative multiples of these vectors to residual stream activations during generation. Then, we show that a similar effect can be achieved by fine-tuning the vectors directly into the model, by use of a dual loss function based on the cosine similarity of residual stream activations to the vectors combined with a standard token-based loss ("representation tuning"). Finally, we compare the generations in response to honesty-probing prompts from the resulting models to those from models fine-tuned with a token-based loss alone, and to those from the untuned model subjected to online steering. Overall, fine-tuning the vectors into the models using the cosine similarity plus token loss showed a stronger effect than online steering, and generalized better than using the standard loss, suggesting the potential utility of this approach as a safety measure. Code and data are available at this https URL. Tuned models are available at this https URL.
- [302] arXiv:2409.14590 (replaced) [pdf, html, other]
-
Title: Explainable AI needs formal notions of explanation correctnessSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The use of machine learning (ML) in critical domains such as medicine poses risks and requires regulation. One requirement is that decisions of ML systems in high-risk applications should be human-understandable. The field of "explainable artificial intelligence" (XAI) seemingly addresses this need. However, in its current form, XAI is unfit to provide quality control for ML; it itself needs scrutiny. Popular XAI methods cannot reliably answer important questions about ML models, their training data, or a given test input. We recapitulate results demonstrating that popular XAI methods systematically attribute importance to input features that are independent of the prediction target. This limits their utility for purposes such as model and data (in)validation, model improvement, and scientific discovery. We argue that the fundamental reason for this limitation is that current XAI methods do not address well-defined problems and are not evaluated against objective criteria of explanation correctness. Researchers should formally define the problems they intend to solve first and then design methods accordingly. This will lead to notions of explanation correctness that can be theoretically verified and objective metrics of explanation performance that can be assessed using ground-truth data.
- [303] arXiv:2409.15100 (replaced) [pdf, html, other]
-
Title: Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored ClippingComments: This is the full version of the paper, and the appendix contains a complete convergence analysis under non-convex conditionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Leveraging over-the-air computations for model aggregation is an effective approach to cope with the communication bottleneck in federated edge learning. By exploiting the superposition properties of multi-access channels, this approach facilitates an integrated design of communication and computation, thereby enhancing system privacy while reducing implementation costs. However, the inherent electromagnetic interference in radio channels often exhibits heavy-tailed distributions, giving rise to exceptionally strong noise in globally aggregated gradients that can significantly deteriorate the training performance. To address this issue, we propose a novel gradient clipping method, termed Median Anchored Clipping (MAC), to combat the detrimental effects of heavy-tailed noise. We also derive analytical expressions for the convergence rate of model training with analog over-the-air federated learning under MAC, which quantitatively demonstrates the effect of MAC on training performance. Extensive experimental results show that the proposed MAC algorithm effectively mitigates the impact of heavy-tailed noise, hence substantially enhancing system robustness.
- [304] arXiv:2410.00509 (replaced) [pdf, html, other]
-
Title: Learning Personalized Treatment Decisions in Precision Medicine: Disentangling Treatment Assignment Bias in Counterfactual Outcome Prediction and Biomarker IdentificationMichael Vollenweider, Manuel Schürch, Chiara Rohrer, Gabriele Gut, Michael Krauthammer, Andreas WickiComments: 9 pages, 5 figures, ML4H conference 2024Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Quantitative Methods (q-bio.QM)
Precision medicine has the potential to tailor treatment decisions to individual patients using machine learning (ML) and artificial intelligence (AI), but it faces significant challenges due to complex biases in clinical observational data and the high-dimensional nature of biological data. This study models various types of treatment assignment biases using mutual information and investigates their impact on ML models for counterfactual prediction and biomarker identification. Unlike traditional counterfactual benchmarks that rely on fixed treatment policies, our work focuses on modeling different characteristics of the underlying observational treatment policy in distinct clinical settings. We validate our approach through experiments on toy datasets, semi-synthetic tumor cancer genome atlas (TCGA) data, and real-world biological outcomes from drug and CRISPR screens. By incorporating empirical biological mechanisms, we create a more realistic benchmark that reflects the complexities of real-world data. Our analysis reveals that different biases lead to varying model performances, with some biases, especially those unrelated to outcome mechanisms, having minimal effect on prediction accuracy. This highlights the crucial need to account for specific biases in clinical observational data in counterfactual ML model development, ultimately enhancing the personalization of treatment decisions in precision medicine.
- [305] arXiv:2410.01405 (replaced) [pdf, html, other]
-
Title: On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep EncodingSubjects: Machine Learning (cs.LG)
Looped Transformers offer advantages in parameter efficiency and Turing completeness. However, their expressive power for function approximation and approximation rate remains underexplored. In this paper, we establish approximation rates of Looped Transformers by defining the concept of the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. That is, the analysis prompts us to incorporate scaling parameters for each loop, conditioned on timestep encoding. Experimental results demonstrate that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding architecture.
- [306] arXiv:2410.07610 (replaced) [pdf, html, other]
-
Title: CSA: Data-efficient Mapping of Unimodal Features to Multimodal FeaturesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Multimodal encoders like CLIP excel in tasks such as zero-shot image classification and cross-modal retrieval. However, they require excessive training data. We propose canonical similarity analysis (CSA), which uses two unimodal encoders to replicate multimodal encoders using limited data. CSA maps unimodal features into a multimodal space, using a new similarity score to retain only the multimodal information. CSA only involves the inference of unimodal encoders and a cubic-complexity matrix decomposition, eliminating the need for extensive GPU-based model training. Experiments show that CSA outperforms CLIP while requiring $300,000\times$ fewer multimodal data pairs and $6\times$ fewer unimodal data for ImageNet classification and misinformative news captions detection. CSA surpasses the state-of-the-art method to map unimodal features to multimodal features. We also demonstrate the ability of CSA with modalities beyond image and text, paving the way for future modality pairs with limited paired multimodal data but abundant unpaired unimodal data, such as lidar and text.
- [307] arXiv:2410.08511 (replaced) [pdf, html, other]
-
Title: Distributionally robust self-supervised learning for tabular dataComments: TRL Workshop@NeurIPS2024Subjects: Machine Learning (cs.LG)
Machine learning (ML) models trained using Empirical Risk Minimization (ERM) often exhibit systematic errors on specific subpopulations of tabular data, known as error slices. Learning robust representation in presence of error slices is challenging, especially in self-supervised settings during the feature reconstruction phase, due to high cardinality features and the complexity of constructing error sets. Traditional robust representation learning methods are largely focused on improving worst group performance in supervised setting in computer vision, leaving a gap in approaches tailored for tabular data. We address this gap by developing a framework to learn robust representation in tabular data during self-supervised pre-training. Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations. This paper applies the Just Train Twice (JTT) and Deep Feature Reweighting (DFR) methods during the pre-training phase for tabular data. These methods fine-tune the ERM pre-trained model by up-weighting error-prone samples or creating balanced datasets for specific categorical features. This results in specialized models for each feature, which are then used in an ensemble approach to enhance downstream classification performance. This methodology improves robustness across slices, thus enhancing overall generalization performance. Extensive experiments across various datasets demonstrate the efficacy of our approach. The code is available: \url{this https URL}.
- [308] arXiv:2410.08633 (replaced) [pdf, html, other]
-
Title: Transformers Provably Solve Parity Efficiently with Chain of ThoughtComments: NeurIPS 2024 M3L WorkshopSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This work provides the first theoretical analysis of training transformers to solve complex problems by recursively generating intermediate states, analogous to fine-tuning for chain-of-thought (CoT) reasoning. We consider training a one-layer transformer to solve the fundamental $k$-parity problem, extending the work on RNNs by Wies et al. (2023). We establish three key results: (1) any finite-precision gradient-based algorithm, without intermediate supervision, requires substantial iterations to solve parity with finite samples. (2) In contrast, when intermediate parities are incorporated into the loss function, our model can learn parity in one gradient update when aided by \emph{teacher forcing}, where ground-truth labels of the reasoning chain are provided at each generation step. (3) Even without teacher forcing, where the model must generate CoT chains end-to-end, parity can be learned efficiently if augmented data is employed to internally verify the soundness of intermediate steps. Our findings, supported by numerical experiments, show that task decomposition and stepwise reasoning naturally arise from optimizing transformers with CoT; moreover, self-consistency checking can improve multi-step reasoning ability, aligning with empirical studies of CoT.
- [309] arXiv:2410.08989 (replaced) [pdf, html, other]
-
Title: Zeroth-Order Fine-Tuning of LLMs in Random SubspacesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model's parameter dimension$\unicode{x2013}$a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks.
- [310] arXiv:2410.11444 (replaced) [pdf, html, other]
-
Title: A Theoretical Survey on Foundation ModelsComments: 63 pages, 16 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Understanding the inner mechanisms of black-box foundation models (FMs) is essential yet challenging in artificial intelligence and its applications. Over the last decade, the long-running focus has been on their explainability, leading to the development of post-hoc explainable methods to rationalize the specific decisions already made by black-box FMs. However, these explainable methods have certain limitations in terms of faithfulness and resource requirement. Consequently, a new class of interpretable methods should be considered to unveil the underlying mechanisms of FMs in an accurate, comprehensive, heuristic, and resource-light way. This survey aims to review those interpretable methods that comply with the aforementioned principles and have been successfully applied to FMs. These methods are deeply rooted in machine learning theory, covering the analysis of generalization performance, expressive capability, and dynamic behavior. They provide a thorough interpretation of the entire workflow of FMs, ranging from the inference capability and training dynamics to their ethical implications. Ultimately, drawing upon these interpretations, this review identifies the next frontier research directions for FMs.
- [311] arXiv:2410.15526 (replaced) [pdf, html, other]
-
Title: SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM TrainingJinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen TaoComments: Accepted by NeurIPS 2024Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as the growing training overhead and memory usage. Distributed training, particularly through Sharded Data Parallelism (ShardedDP) which partitions optimizer states among workers, has emerged as a crucial technique to mitigate training time and memory usage. Yet, a major challenge in the scalability of ShardedDP is the intensive communication of weights and gradients. While compression techniques can alleviate this issue, they often result in worse accuracy. Driven by this limitation, we propose SDP4Bit (Toward 4Bit Communication Quantization in Sharded Data Parallelism for LLM Training), which effectively reduces the communication of weights and gradients to nearly 4 bits via two novel techniques: quantization on weight differences, and two-level gradient smooth quantization. Furthermore, SDP4Bit presents an algorithm-system co-design with runtime optimization to minimize the computation overhead of compression. In addition to the theoretical guarantees of convergence, we empirically evaluate the accuracy of SDP4Bit on the pre-training of GPT models with up to 6.7 billion parameters, and the results demonstrate a negligible impact on training loss. Furthermore, speed experiments show that SDP4Bit achieves up to 4.08$\times$ speedup in end-to-end throughput on a scale of 128 GPUs.
- [312] arXiv:2410.15714 (replaced) [pdf, html, other]
-
Title: Offline reinforcement learning for job-shop scheduling problemsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent advances in deep learning have shown significant potential for solving combinatorial optimization problems in real-time. Unlike traditional methods, deep learning can generate high-quality solutions efficiently, which is crucial for applications like routing and scheduling. However, existing approaches like deep reinforcement learning (RL) and behavioral cloning have notable limitations, with deep RL suffering from slow learning and behavioral cloning relying solely on expert actions, which can lead to generalization issues and neglect of the optimization objective. This paper introduces a novel offline RL method designed for combinatorial optimization problems with complex constraints, where the state is represented as a heterogeneous graph and the action space is variable. Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions. We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks, achieving superior performance compared to state-of-the-art techniques.
- [313] arXiv:2410.17628 (replaced) [pdf, html, other]
-
Title: Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel MetricSubjects: Machine Learning (cs.LG)
Robustness is a critical aspect of machine learning models. Existing robustness evaluation approaches often lack theoretical generality or rely heavily on empirical assessments, limiting insights into the structural factors contributing to robustness. Moreover, theoretical robustness analysis is not applicable for direct comparisons between models. To address these challenges, we propose $\textit{TopoLip}$, a metric based on layer-wise analysis that bridges topological data analysis and Lipschitz continuity for robustness evaluation. TopoLip provides a unified framework for both theoretical and empirical robustness comparisons across different architectures or configurations, and it reveals how model parameters influence the robustness of models. Using TopoLip, we demonstrate that attention-based models typically exhibit smoother transformations and greater robustness compared to convolution-based models, as validated through theoretical analysis and adversarial tasks. Our findings establish a connection between architectural design, robustness, and topological properties.
- [314] arXiv:2410.19631 (replaced) [pdf, html, other]
-
Title: Efficient Biological Data Acquisition through Inference Set DesignSubjects: Machine Learning (cs.LG)
In drug discovery, highly automated high-throughput laboratories are used to screen a large number of compounds in search of effective drugs. These experiments are expensive, so one might hope to reduce their cost by experimenting on a subset of the compounds, and predicting the outcomes of the remaining experiments. In this work, we model this scenario as a sequential subset selection problem: we aim to select the smallest set of candidates in order to achieve some desired level of accuracy for the system as a whole. Our key observation is that, if there is heterogeneity in the difficulty of the prediction problem across the input space, selectively obtaining the labels for the hardest examples in the acquisition pool will leave only the relatively easy examples to remain in the inference set, leading to better overall system performance. We call this mechanism inference set design, and propose the use of a confidence-based active learning solution to prune out these challenging examples. Our algorithm includes an explicit stopping criterion that stops running the experiments when it is sufficiently confident that the system has reached the target performance. Our empirical studies on image and molecular datasets, as well as a real-world large-scale biological assay, show that active learning for inference set design leads to significant reduction in experimental cost while retaining high system performance.
- [315] arXiv:2410.20483 (replaced) [pdf, html, other]
-
Title: Improving Decision SparsityComments: Accepted to 38th Conference on Neural Information Processing Systems (NeurIPS 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Sparsity is a central aspect of interpretability in machine learning. Typically, sparsity is measured in terms of the size of a model globally, such as the number of variables it uses. However, this notion of sparsity is not particularly relevant for decision-making; someone subjected to a decision does not care about variables that do not contribute to the decision. In this work, we dramatically expand a notion of decision sparsity called the Sparse Explanation Value(SEV) so that its explanations are more meaningful. SEV considers movement along a hypercube towards a reference point. By allowing flexibility in that reference and by considering how distances along the hypercube translate to distances in feature space, we can derive sparser and more meaningful explanations for various types of function classes. We present cluster-based SEV and its variant tree-based SEV, introduce a method that improves credibility of explanations, and propose algorithms that optimize decision sparsity in machine learning models.
- [316] arXiv:2410.22594 (replaced) [pdf, html, other]
-
Title: Gaussian Derivative Change-point Detection for Early Warnings of Industrial System FailuresSubjects: Machine Learning (cs.LG)
An early warning of future system failure is essential for conducting predictive maintenance and enhancing system availability. This paper introduces a three-step framework for assessing system health to predict imminent system breakdowns. First, the Gaussian Derivative Change-Point Detection (GDCPD) algorithm is proposed for detecting changes in the high-dimensional feature space. GDCPD conducts a multivariate Change-Point Detection (CPD) by implementing Gaussian derivative processes for identifying change locations on critical system features, as these changes eventually will lead to system failure. To assess the significance of these changes, Weighted Mahalanobis Distance (WMD) is applied in both offline and online analyses. In the offline setting, WMD helps establish a threshold that determines significant system variations, while in the online setting, it facilitates real-time monitoring, issuing alarms for potential future system breakdowns. Utilizing the insights gained from the GDCPD and monitoring scheme, Long Short-Term Memory (LSTM) network is then employed to estimate the Remaining Useful Life (RUL) of the system. The experimental study of a real-world system demonstrates the effectiveness of the proposed methodology in accurately forecasting system failures well before they occur. By integrating CPD with real-time monitoring and RUL prediction, this methodology significantly advances system health monitoring and early warning capabilities.
- [317] arXiv:2411.01140 (replaced) [pdf, html, other]
-
Title: Privacy-Preserving Federated Learning with Differentially Private Hyperdimensional ComputingComments: 28 Pages, 10 FiguresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Federated Learning (FL) is essential for efficient data exchange in Internet of Things (IoT) environments, as it trains Machine Learning (ML) models locally and shares only model updates. However, FL is vulnerable to privacy threats like model inversion and membership inference attacks, which can expose sensitive training data. To address these privacy concerns, Differential Privacy (DP) mechanisms are often applied. Yet, adding DP noise to black-box ML models degrades performance, especially in dynamic IoT systems where continuous, lifelong FL learning accumulates excessive noise over time. To mitigate this issue, we introduce Federated HyperDimensional computing with Privacy-preserving (FedHDPrivacy), an eXplainable Artificial Intelligence (XAI) framework that combines the neuro-symbolic paradigm with DP. FedHDPrivacy carefully manages the balance between privacy and performance by theoretically tracking cumulative noise from previous rounds and adding only the necessary incremental noise to meet privacy requirements. In a real-world case study involving in-process monitoring of manufacturing machining operations, FedHDPrivacy demonstrates robust performance, outperforming standard FL frameworks-including Federated Averaging (FedAvg), Federated Stochastic Gradient Descent (FedSGD), Federated Proximal (FedProx), Federated Normalized Averaging (FedNova), and Federated Adam (FedAdam)-by up to 38%. FedHDPrivacy also shows potential for future enhancements, such as multimodal data fusion.
- [318] arXiv:2411.06299 (replaced) [pdf, other]
-
Title: Intelligent Fault Diagnosis of Type and Severity in Low-Frequency, Low Bit-Depth SignalsSubjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
This study focuses on Intelligent Fault Diagnosis (IFD) in rotating machinery utilizing a single microphone and a data-driven methodology, effectively diagnosing 42 classes of fault types and severities. The research leverages sound data from the imbalanced MaFaulDa dataset, aiming to strike a balance between high performance and low resource consumption. The testing phase encompassed a variety of configurations, including sampling, quantization, signal normalization, silence removal, Wiener filtering, data scaling, windowing, augmentation, and classifier tuning using XGBoost. Through the analysis of time, frequency, mel-frequency, and statistical features, we achieved an impressive accuracy of 99.54% and an F-Beta score of 99.52% with just 6 boosting trees at an 8 kHz, 8-bit configuration. Moreover, when utilizing only MFCCs along with their first- and second-order deltas, we recorded an accuracy of 97.83% and an F-Beta score of 97.67%. Lastly, by implementing a greedy wrapper approach, we obtained a remarkable accuracy of 96.82% and an F-Beta score of 98.86% using 50 selected features, nearly all of which were first- and second-order deltas of the MFCCs.
- [319] arXiv:2411.08758 (replaced) [pdf, html, other]
-
Title: ScaleNet: Scale Invariance Learning in Directed GraphsComments: Scale invariance in node classification is demonstrated and applied in graph transformation to develop ScaleNet, which achieves state-of-the-art performance on both homophilic and heterophilic directed graphsSubjects: Machine Learning (cs.LG)
Graph Neural Networks (GNNs) have advanced relational data analysis but lack invariance learning techniques common in image classification. In node classification with GNNs, it is actually the ego-graph of the center node that is classified. This research extends the scale invariance concept to node classification by drawing an analogy to image processing: just as scale invariance being used in image classification to capture multi-scale features, we propose the concept of ``scaled ego-graphs''. Scaled ego-graphs generalize traditional ego-graphs by replacing undirected single-edges with ``scaled-edges'', which are ordered sequences of multiple directed edges. We empirically assess the performance of the proposed scale invariance in graphs on seven benchmark datasets, across both homophilic and heterophilic structures. Our scale-invariance-based graph learning outperforms inception models derived from random walks by being simpler, faster, and more accurate. The scale invariance explains inception models' success on homophilic graphs and limitations on heterophilic graphs. To ensure applicability of inception model to heterophilic graphs as well, we further present ScaleNet, an architecture that leverages multi-scaled features. ScaleNet achieves state-of-the-art results on five out of seven datasets (four homophilic and one heterophilic) and matches top performance on the remaining two, demonstrating its excellent applicability. This represents a significant advance in graph learning, offering a unified framework that enhances node classification across various graph types. Our code is available at this https URL.
- [320] arXiv:2411.10254 (replaced) [pdf, html, other]
-
Title: Uncertainty in Supply Chain Digital Twins: A Quantum-Classical Hybrid ApproachAbdullah Abdullah, Fannya Ratana Sandjaja, Ayesha Abdul Majeed, Gyan Wickremasinghe, Karen Rafferty, Vishal SharmaSubjects: Machine Learning (cs.LG)
This study investigates uncertainty quantification (UQ) using quantum-classical hybrid machine learning (ML) models for applications in complex and dynamic fields, such as attaining resiliency in supply chain digital twins and financial risk assessment. Although quantum feature transformations have been integrated into ML models for complex data tasks, a gap exists in determining their impact on UQ within their hybrid architectures (quantum-classical approach). This work applies existing UQ techniques for different models within a hybrid framework, examining how quantum feature transformation affects uncertainty propagation. Increasing qubits from 4 to 16 shows varied model responsiveness to outlier detection (OD) samples, which is a critical factor for resilient decision-making in dynamic environments. This work shows how quantum computing techniques can transform data features for UQ, particularly when combined with traditional methods.
- [321] arXiv:2411.11304 (replaced) [pdf, html, other]
-
Title: Toward Personalized Federated Node Classification in One-shot CommunicationComments: Work in progressSubjects: Machine Learning (cs.LG)
Federated Graph Learning (FGL) has emerged as a promising paradigm for breaking data silos in distributed private graphs data management. In practical scenarios involving complex and heterogeneous distributed graph data, personalized Federated Graph Learning (pFGL) aims to enhance model utility by training personalized models tailored to individual client needs, rather than relying on a universal global model. However, existing pFGL methods often require numerous communication rounds under heterogeneous client graphs, leading to significant security concerns and communication overhead. While One-shot Federated Learning (OFL) addresses these issues by enabling collaboration in a single round, existing OFL methods are designed for image-based tasks and ineffective for graph data, leaving a critical gap in the field. Additionally, personalized models often suffer from bias, failing to generalize effectively to minority data. To address these challenges, we propose the first one-shot personalized federated graph learning method for node classification, compatible with the Secure Aggregation protocol for privacy preservation. Specifically, for effective graph learning in a single communication round, our method estimates and aggregates class-wise feature distribution statistics to construct a global pseudo-graph on the server, facilitating the training of a global graph model. Moreover, to mitigate bias, we introduce a two-stage personalized training approach that adaptively balances local personal information and global insights from the pseudo-graph, improving both personalization and generalization. Extensive experiments conducted on 8 multi-scale graph datasets demonstrate that our method significantly outperforms state-of-the-art baselines across various settings.
- [322] arXiv:2411.11910 (replaced) [pdf, other]
-
Title: AIGS: Generating Science from AI-Powered Automated FalsificationComments: Pre-print. 35 pages. Official website: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Rapid development of artificial intelligence has drastically accelerated the development of scientific discovery. Trained with large-scale observation data, deep neural networks extract the underlying patterns in an end-to-end manner and assist human researchers with highly-precised predictions in unseen scenarios. The recent rise of Large Language Models (LLMs) and the empowered autonomous agents enable scientists to gain help through interaction in different stages of their research, including but not limited to literature review, research ideation, idea implementation, and academic writing. However, AI researchers instantiated by foundation model empowered agents with full-process autonomy are still in their infancy. In this paper, we study $\textbf{AI-Generated Science}$ (AIGS), where agents independently and autonomously complete the entire research process and discover scientific laws. By revisiting the definition of scientific research, we argue that $\textit{falsification}$ is the essence of both human research process and the design of an AIGS system. Through the lens of falsification, prior systems attempting towards AI-Generated Science either lack the part in their design, or rely heavily on existing verification engines that narrow the use in specialized domains. In this work, we propose Baby-AIGS as a baby-step demonstration of a full-process AIGS system, which is a multi-agent system with agents in roles representing key research process. By introducing FalsificationAgent, which identify and then verify possible scientific discoveries, we empower the system with explicit falsification. Experiments on three tasks preliminarily show that Baby-AIGS could produce meaningful scientific discoveries, though not on par with experienced human researchers. Finally, we discuss on the limitations of current Baby-AIGS, actionable insights, and related ethical issues in detail.
- [323] arXiv:2411.11940 (replaced) [pdf, html, other]
-
Title: Introducing Milabench: Benchmarking Accelerators for AIPierre Delaunay, Xavier Bouthillier, Olivier Breuleux, Satya Ortiz-Gagné, Olexa Bilaniuk, Fabrice Normandin, Arnaud Bergeron, Bruno Carrez, Guillaume Alain, Soline Blanc, Frédéric Osterrath, Joseph Viviano, Roger Creus-Castanyer Darshan Patil, Rabiul Awal, Le ZhangSubjects: Machine Learning (cs.LG)
AI workloads, particularly those driven by deep learning, are introducing novel usage patterns to high-performance computing (HPC) systems that are not comprehensively captured by standard HPC benchmarks. As one of the largest academic research centers dedicated to deep learning, Mila identified the need to develop a custom benchmarking suite to address the diverse requirements of its community, which consists of over 1,000 researchers. This report introduces Milabench, the resulting benchmarking suite. Its design was informed by an extensive literature review encompassing 867 papers, as well as surveys conducted with Mila researchers. This rigorous process led to the selection of 26 primary benchmarks tailored for procurement evaluations, alongside 16 optional benchmarks for in-depth analysis. We detail the design methodology, the structure of the benchmarking suite, and provide performance evaluations using GPUs from NVIDIA, AMD, and Intel. The Milabench suite is open source and can be accessed at this http URL.
- [324] arXiv:2411.12858 (replaced) [pdf, html, other]
-
Title: CDI: Copyrighted Data Identification in Diffusion ModelsComments: Coda available at this https URLSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Diffusion Models (DMs) benefit from large and diverse datasets for their training. Since this data is often scraped from the Internet without permission from the data owners, this raises concerns about copyright and intellectual property protections. While (illicit) use of data is easily detected for training samples perfectly re-created by a DM at inference time, it is much harder for data owners to verify if their data was used for training when the outputs from the suspect DM are not close replicas. Conceptually, membership inference attacks (MIAs), which detect if a given data point was used during training, present themselves as a suitable tool to address this challenge. However, we demonstrate that existing MIAs are not strong enough to reliably determine the membership of individual images in large, state-of-the-art DMs. To overcome this limitation, we propose CDI, a framework for data owners to identify whether their dataset was used to train a given DM. CDI relies on dataset inference techniques, i.e., instead of using the membership signal from a single data point, CDI leverages the fact that most data owners, such as providers of stock photography, visual media companies, or even individual artists, own datasets with multiple publicly exposed data points which might all be included in the training of a given DM. By selectively aggregating signals from existing MIAs and using new handcrafted methods to extract features for these datasets, feeding them to a scoring model, and applying rigorous statistical testing, CDI allows data owners with as little as 70 data points to identify with a confidence of more than 99% whether their data was used to train a given DM. Thereby, CDI represents a valuable tool for data owners to claim illegitimate use of their copyrighted data.
- [325] arXiv:2411.12948 (replaced) [pdf, html, other]
-
Title: Machine learned reconstruction of tsunami dynamics from sparse observationsSubjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
We investigate the use of the Senseiver, a transformer neural network designed for sparse sensing applications, to estimate full-field surface height measurements of tsunami waves from sparse observations. The model is trained on a large ensemble of simulated data generated via a shallow water equations solver, which we show to be a faithful reproduction for the underlying dynamics by comparison to historical events. We train the model on a dataset consisting of 8 tsunami simulations whose epicenters correspond to historical USGS earthquake records, and where the model inputs are restricted to measurements obtained at actively deployed buoy locations. We test the Senseiver on a dataset consisting of 8 simulations not included in training, demonstrating its capability for extrapolation. The results show remarkable resolution of fine scale phase and amplitude features from the true field, provided that at least a few of the sensors have obtained a non-zero signal. Throughout, we discuss which forecasting techniques can be improved by this method, and suggest ways in which the flexibility of the architecture can be leveraged to incorporate arbitrary remote sensing data (eg. HF Radar and satellite measurements) as well as investigate optimal sensor placements.
- [326] arXiv:2411.13951 (replaced) [pdf, html, other]
-
Title: A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time SeriesComments: Submitted to the IEEE Transactions on Reliability journalSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Benchmarking anomaly detection approaches for multivariate time series is challenging due to the lack of high-quality datasets. Current publicly available datasets are too small, not diverse and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a small selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data.
- [327] arXiv:1904.01047 (replaced) [pdf, html, other]
-
Title: Dynamically Optimal Treatment AllocationComments: 92 pagesSubjects: Econometrics (econ.EM); Machine Learning (cs.LG)
Dynamic decisions are pivotal to economic policy making. We show how existing evidence from randomized control trials can be utilized to guide personalized decisions in challenging dynamic environments with budget and capacity constraints. Recent advances in reinforcement learning now enable the solution of many complex, real-world problems for the first time. We allow for restricted classes of policy functions and prove that their regret decays at rate n^(-0.5), the same as in the static case. Applying our methods to job training, we find that by exploiting the problem's dynamic structure, we achieve significantly higher welfare compared to static approaches.
- [328] arXiv:1911.08708 (replaced) [pdf, html, other]
-
Title: Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective MappingUttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Kyra Kapsaskis, Kurt Gray, Aniket Bera, Dinesh ManochaComments: 18 pages, 5 figures, 3 tablesJournal-ref: In ECCV 2020, Lecture Notes in Computer Science, vol 12355, SpringerSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data and represented as sequences of 3D poses. Given the motion on each joint in the pose at each time step extracted from 3D pose sequences, we hierarchically pool these joint motions in a bottom-up manner in the encoder, following the kinematic chains in the human body. We also constrain the latent embeddings of the encoder to contain the space of psychologically-motivated affective features underlying the gaits. We train the decoder to reconstruct the motions per joint per time step in a top-down manner from the latent embeddings. For the annotated data, we also train a classifier to map the latent embeddings to emotion labels. Our semi-supervised approach achieves a mean average precision of 0.84 on the Emotion-Gait benchmark dataset, which contains both labeled and unlabeled gaits collected from multiple sources. We outperform current state-of-art algorithms for both emotion recognition and action recognition from 3D gaits by 7%--23% on the absolute. More importantly, we improve the average precision by 10%--50% on the absolute on classes that each makes up less than 25% of the labeled part of the Emotion-Gait benchmark dataset.
- [329] arXiv:2007.06827 (replaced) [pdf, html, other]
-
Title: Early stopping and polynomial smoothing in regression with reproducing kernelsComments: A shortened version of the articleSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
In this paper, we study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with the gradient descent and (iterative) kernel ridge regression algorithms. We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. This method enjoys only one assumption on the regression function: it belongs to a reproducing kernel Hilbert space (RKHS). The proposed rule is proved to be minimax-optimal over different types of kernel spaces, including finite-rank and Sobolev smoothness classes. The proof is derived from the fixed-point analysis of the localized Rademacher complexities, which is a standard technique for obtaining optimal rates in the nonparametric regression literature. In addition to that, we present simulation results on artificial datasets that show the comparable performance of the designed rule with respect to other stopping rules such as the one determined by V-fold cross-validation.
- [330] arXiv:2105.07513 (replaced) [pdf, html, other]
-
Title: Decision Making with Differential Privacy under a Fairness LensComments: Winner of the 2022 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies ( this https URL). This is an extended version of the homonymous one, accepted at IJCAI-21Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Agencies, such as the U.S. Census Bureau, release data sets and statistics about groups of individuals that are used as input to a number of critical decision processes. To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data. This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective. {The paper shows that, when the decisions take as input differentially private data}, the noise added to achieve privacy disproportionately impacts some groups over others. The paper analyzes the reasons for these disproportionate impacts and proposes guidelines to mitigate these effects. The proposed approaches are evaluated on critical decision problems that use differentially private census data.
- [331] arXiv:2108.00262 (replaced) [pdf, html, other]
-
Title: Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression LearningComments: 11 pages, 4 figures, 2 tables. Proceedings of the 29th ACM International Conference on Multimedia, October 20-24, 2021, Virtual Event, ChinaJournal-ref: In Proceedings of the 29th ACM International Conference on Multimedia, 2021Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions. Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences. We leverage the Mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective cues. We design an affective encoder using multi-scale spatial-temporal graph convolutions to transform 3D pose sequences into latent, pose-based affective features. We use our affective encoder in both our generator, where it learns affective features from the seed poses to guide the gesture synthesis, and our discriminator, where it enforces the synthesized gestures to contain the appropriate affective expressions. We perform extensive evaluations on two benchmark datasets for gesture synthesis from the speech, the TED Gesture Dataset and the GENEA Challenge 2020 Dataset. Compared to the best baselines, we improve the mean absolute joint error by 10--33%, the mean acceleration difference by 8--58%, and the Fréchet Gesture Distance by 21--34%. We also conduct a user study and observe that compared to the best current baselines, around 15.28% of participants indicated our synthesized gestures appear more plausible, and around 16.32% of participants felt the gestures had more appropriate affective expressions aligned with the speech.
- [332] arXiv:2205.12751 (replaced) [pdf, other]
-
Title: Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under ParallelizationSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We consider the problem of minimizing the sum of two convex functions. One of those functions has Lipschitz-continuous gradients, and can be accessed via stochastic oracles, whereas the other is "simple". We provide a Bregman-type algorithm with accelerated convergence in function values to a ball containing the minimum. The radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe achieving acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/ \sqrt{\epsilon})$ iterations, by only accessing gradients of the original function and a linear maximization oracle with $O(1/\sqrt{\epsilon})$ computing units in parallel. We illustrate this fast convergence on synthetic numerical experiments.
- [333] arXiv:2212.10802 (replaced) [pdf, html, other]
-
Title: BTS: Bifold Teacher-Student in Semi-Supervised Learning for Indoor Two-Room Presence Detection Under Time-Varying CSIComments: Accepted by IEEE IoT-JSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
In recent years, indoor human presence detection based on supervised learning (SL) and channel state information (CSI) has attracted much attention. However, existing studies that rely on spatial information of CSI are susceptible to environmental changes which degrade prediction accuracy. Moreover, SL-based methods require time-consuming data labeling for retraining models. Therefore, it is imperative to design a continuously monitored model using a semi-supervised learning (SSL) based scheme. In this paper, we conceive a bifold teacher-student (BTS) learning approach for indoor human presence detection in an adjoining two-room scenario. The proposed SSL-based primal-dual teacher-student network intelligently learns spatial and temporal features from labeled and unlabeled CSI datasets. Additionally, the enhanced penalized loss function leverages entropy and distance measures to distinguish drifted data, i.e., features of new datasets affected by time-varying effects and altered from the original distribution. Experimental results demonstrate that the proposed BTS system accomplishes an averaged accuracy of around 98% after retraining the model with unlabeled data. BTS can sustain an accuracy of 93% under the changed layout and environments. Furthermore, BTS outperforms existing SSL-based models in terms of the highest detection accuracy of around 98% while achieving the asymptotic performance of SL-based methods.
- [334] arXiv:2301.13306 (replaced) [pdf, html, other]
-
Title: Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing DynamicsComments: Appeared at COLT 2024. Numerical experiments added since Jun'24 versionSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and return-on-investment constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm uses only bandit feedback and can be used with the first- or second-price auction, as well as with any "intermediate" auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achieved by any allocation. This holds whether or not the bidding dynamics converges to an equilibrium.
- [335] arXiv:2304.10740 (replaced) [pdf, html, other]
-
Title: Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data StreamsSubjects: General Finance (q-fin.GN); Machine Learning (cs.LG)
Knowing which factors are significant in credit rating assignment leads to better decision-making. However, the focus of the literature thus far has been mostly on structured data, and fewer studies have addressed unstructured or multi-modal datasets. In this paper, we present an analysis of the most effective architectures for the fusion of deep learning models for the prediction of company credit rating classes, by using structured and unstructured datasets of different types. In these models, we tested different combinations of fusion strategies with different deep learning models, including CNN, LSTM, GRU, and BERT. We studied data fusion strategies in terms of level (including early and intermediate fusion) and techniques (including concatenation and cross-attention). Our results show that a CNN-based multi-modal model with two fusion strategies outperformed other multi-modal techniques. In addition, by comparing simple architectures with more complex ones, we found that more sophisticated deep learning models do not necessarily produce the highest performance; however, if attention-based models are producing the best results, cross-attention is necessary as a fusion strategy. Finally, our comparison of rating agencies on short-, medium-, and long-term performance shows that Moody's credit ratings outperform those of other agencies like Standard & Poor's and Fitch Ratings.
- [336] arXiv:2306.00353 (replaced) [pdf, html, other]
-
Title: Constructing Semantics-Aware Adversarial Examples with a Probabilistic PerspectiveComments: 21 pages, 9 figuresSubjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
We propose a probabilistic perspective on adversarial examples, allowing us to embed subjective understanding of semantics as a distribution into the process of generating adversarial examples, in a principled manner. Despite significant pixel-level modifications compared to traditional adversarial attacks, our method preserves the overall semantics of the image, making the changes difficult for humans to detect. This extensive pixel-level modification enhances our method's ability to deceive classifiers designed to defend against adversarial attacks. Our empirical findings indicate that the proposed methods achieve higher success rates in circumventing adversarial defense mechanisms, while remaining difficult for human observers to detect.
- [337] arXiv:2306.01646 (replaced) [pdf, html, other]
-
Title: Auditing for Human ExpertiseComments: 30 pages, 10 figures. Appeared in the proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). 11/2024 replacement fixes typo in the definition of $τ_k$, as pointed out by Liuquan NieSubjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises a natural question whether human experts add value which could not be captured by an algorithmic predictor. We develop a statistical framework under which we can pose this question as a natural hypothesis test. Indeed, as our framework highlights, detecting human expertise is more subtle than simply comparing the accuracy of expert predictions to those made by a particular learning algorithm. Instead, we propose a simple procedure which tests whether expert predictions are statistically independent from the outcomes of interest after conditioning on the available inputs (`features'). A rejection of our test thus suggests that human experts may add value to any algorithm trained on the available data, and has direct implications for whether human-AI `complementarity' is achievable in a given prediction task. We highlight the utility of our procedure using admissions data collected from the emergency department of a large academic hospital system, where we show that physicians' admit/discharge decisions for patients with acute gastrointestinal bleeding (AGIB) appear to be incorporating information that is not available to a standard algorithmic screening tool. This is despite the fact that the screening tool is arguably more accurate than physicians' discretionary decisions, highlighting that -- even absent normative concerns about accountability or interpretability -- accuracy is insufficient to justify algorithmic automation.
- [338] arXiv:2306.17775 (replaced) [pdf, html, other]
-
Title: Practical and Asymptotically Exact Conditional Sampling in Diffusion ModelsComments: Code: this https URLJournal-ref: NeurIPS 2023Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models through simulating a set of weighted particles. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and in conditional image generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
- [339] arXiv:2307.13484 (replaced) [pdf, other]
-
Title: Rational kernel-based interpolation for complex-valued frequency response functionsSubjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Numerical Analysis (math.NA)
This work is concerned with the kernel-based approximation of a complex-valued function from data, where the frequency response function of a partial differential equation in the frequency domain is of particular interest. In this setting, kernel methods are employed more and more frequently, however, standard kernels do not perform well. Moreover, the role and mathematical implications of the underlying pair of kernels, which arises naturally in the complex-valued case, remain to be addressed. We introduce new reproducing kernel Hilbert spaces of complex-valued functions, and formulate the problem of complex-valued interpolation with a kernel pair as minimum norm interpolation in these spaces. Moreover, we combine the interpolant with a low-order rational function, where the order is adaptively selected based on a new model selection criterion. Numerical results on examples from different fields, including electromagnetics and acoustic examples, illustrate the performance of the method, also in comparison to available rational approximation methods.
- [340] arXiv:2307.13533 (replaced) [pdf, html, other]
-
Title: Generalizable data-driven turbulence closure modeling on unstructured grids with differentiable physicsSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
Differentiable physical simulators are proving to be valuable tools for developing data-driven models in computational fluid dynamics (CFD). These simulators enable end-to-end training of machine learning (ML) models embedded within CFD solvers. This paradigm enables novel algorithms which combine the generalization power and low cost of physics-based simulations with the flexibility and automation of deep learning methods. In this study, we introduce a framework for embedding deep learning models within a generic finite element solver to solve the Navier-Stokes equations, specifically applying this approach to learn a subgrid scale closure with a graph neural network (GNN). We validate our method for flow over a backwards-facing step and test its performance on novel geometries, demonstrating the ability to generalize to novel geometries without sacrificing stability. Additionally, we show that our GNN-based closure model may be learned in a data-limited scenario by interpreting closure modeling as a solver-constrained optimization. Our end-to-end learning paradigm demonstrates a viable pathway for physically consistent and generalizable data-driven closure modeling across complex geometries.
- [341] arXiv:2308.07977 (replaced) [pdf, html, other]
-
Title: Dynamic Attention-Guided Diffusion for Image Super-ResolutionComments: Brian B. Moser and Stanislav Frolov contributed equallySubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Diffusion models in image Super-Resolution (SR) treat all image regions uniformly, which risks compromising the overall image quality by potentially introducing artifacts during denoising of less-complex regions. To address this, we propose ``You Only Diffuse Areas'' (YODA), a dynamic attention-guided diffusion process for image SR. YODA selectively focuses on spatial regions defined by attention maps derived from the low-resolution images and the current denoising time step. This time-dependent targeting enables a more efficient conversion to high-resolution outputs by focusing on areas that benefit the most from the iterative refinement process, i.e., detail-rich objects. We empirically validate YODA by extending leading diffusion-based methods SR3, DiffBIR, and SRDiff. Our experiments demonstrate new state-of-the-art performances in face and general SR tasks across PSNR, SSIM, and LPIPS metrics. As a side effect, we find that YODA reduces color shift issues and stabilizes training with small batches.
- [342] arXiv:2310.04407 (replaced) [pdf, html, other]
-
Title: Policy-Gradient Training of Language Models for RankingSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss itself is heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating a LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for end-to-end training of retrieval models as part of larger decision systems via policy gradient, with little reliance on complex heuristics, and it effectively unifies the training objective with downstream decision-making quality. We conduct extensive experiments on various text retrieval benchmarks. The results demonstrate that when the training objective aligns with the evaluation setup, Neural PG-RANK yields remarkable in-domain performance improvement, with substantial out-of-domain generalization to some critical datasets employed in downstream question answering tasks.
- [343] arXiv:2310.12815 (replaced) [pdf, html, other]
-
Title: Formalizing and Benchmarking Prompt Injection Attacks and DefensesComments: Published in USENIX Security Symposium 2024; the model sizes for closed-source models are from blog postsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at this https URL.
- [344] arXiv:2311.01469 (replaced) [pdf, html, other]
-
Title: Leveraging Language Models to Detect GreenwashingSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In recent years, climate change repercussions have increasingly captured public interest. Consequently, corporations are emphasizing their environmental efforts in sustainability reports to bolster their public image. Yet, the absence of stringent regulations in review of such reports allows potential greenwashing. In this study, we introduce a novel preliminary methodology to train a language model on generated labels for greenwashing risk. Our primary contributions encompass: developing a preliminary mathematical formulation to quantify greenwashing risk, a fine-tuned ClimateBERT model for this problem, and a comparative analysis of results. On a test set comprising of sustainability reports, our best model achieved an average accuracy score of 86.34% and F1 score of 0.67, demonstrating that our proof-of-concept methodology shows a promising direction of exploration for this task.
- [345] arXiv:2311.02058 (replaced) [pdf, html, other]
-
Title: LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill DiscoveryComments: ICRA 2024Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical robot to continuously and efficiently learn to solve new manipulation tasks throughout its lifespan. The core idea behind LOTUS is constructing an ever-growing skill library from a sequence of new tasks with a small number of human demonstrations. LOTUS starts with a continual skill discovery process using an open-vocabulary vision model, which extracts skills as recurring patterns presented in unsegmented demonstrations. Continual skill discovery updates existing skills to avoid catastrophic forgetting of previous tasks and adds new skills to solve novel tasks. LOTUS trains a meta-controller that flexibly composes various skills to tackle vision-based manipulation tasks in the lifelong learning process. Our comprehensive experiments show that LOTUS outperforms state-of-the-art baselines by over 11% in success rate, showing its superior knowledge transfer ability compared to prior methods. More results and videos can be found on the project website: this https URL.
- [346] arXiv:2311.03489 (replaced) [pdf, other]
-
Title: Leveraging High-Level Synthesis and Large Language Models to Generate, Simulate, and Deploy a Uniform Random Number Generator Hardware DesignComments: The random number generator design that this article describes has bugs that need to be fixed before this article can be republishedSubjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Programming Languages (cs.PL)
We present a new high-level synthesis methodology for using large language model tools to generate hardware designs. The methodology uses exclusively open-source tools excluding the large language model. As a case study, we use our methodology to generate a permuted congruential random number generator design with a wishbone interface. We verify the functionality and quality of the random number generator design using large language model-generated simulations and the Dieharder randomness test suite. We document all the large language model chat logs, Python scripts, Verilog scripts, and simulation results used in the case study. We believe that our method of hardware design generation coupled with the open source silicon 130 nm design tools will revolutionize application-specific integrated circuit design. Our methodology significantly lowers the bar to entry when building domain-specific computing accelerators for the Internet of Things and proof of concept prototypes for later fabrication in more modern process nodes.
- [347] arXiv:2311.11827 (replaced) [pdf, html, other]
-
Title: Few-shot Multispectral Segmentation with Representations Generated by Reinforcement LearningComments: Accepted to BMVC 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The task of segmentation of multispectral images, which are images with numerous channels or bands, each capturing a specific range of wavelengths of electromagnetic radiation, has been previously explored in contexts with large amounts of labeled data. However, these models tend not to generalize well to datasets of smaller size. In this paper, we propose a novel approach for improving few-shot segmentation performance on multispectral images using reinforcement learning to generate representations. These representations are generated as mathematical expressions between channels and are tailored to the specific class being segmented. Our methodology involves training an agent to identify the most informative expressions using a small dataset, which can include as few as a single labeled sample, updating the dataset using these expressions, and then using the updated dataset to perform segmentation. Due to the limited length of the expressions, the model receives useful representations without any added risk of overfitting. We evaluate the effectiveness of our approach on samples of several multispectral datasets and demonstrate its effectiveness in boosting the performance of segmentation algorithms in few-shot contexts. The code is available at this https URL.
- [348] arXiv:2311.17093 (replaced) [pdf, html, other]
-
Title: A Unified Approach to Semi-Supervised Out-of-Distribution DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
One of the early weaknesses identified in deep neural networks trained for image classification tasks, was their inability to provide low confidence predictions on out-of-distribution (OOD) data, that was significantly different from the in-distribution (ID) data used to train them. Representation learning, where neural networks are trained in specific ways that improve their ability to detect OOD examples, has emerged as a promising direction to solving this problem. However, these approaches require long training times, and can be computationally inefficient at detecting OOD examples. Recent developments in Vision Transformer (ViT) foundation models$\unicode{x2013}$large networks trained on large and diverse datasets with self-supervised approaches$\unicode{x2013}$also show strong performance in OOD detection, and could potentially address some of these challenges. This paper presents Mixture of Exemplars (MoLAR), an approach that provides a unified way of tackling OOD detection challenges in both supervised and semi-supervised settings$\unicode{x2013}$that is designed to be trained with a frozen, pretrained foundation model backbone. MoLAR is efficient to train, and provides strong OOD performance when only comparing the distance of OOD examples to the exemplars, a small set of images chosen to be representative of the dataset. As a result, determining if an image is OOD with MoLAR is no more expensive than classifying an image. Extensive experiments demonstrate the superior OOD detection performance of MoLAR in comparison to comparable approaches, and also the strong performance of MoLAR in semi-supervised settings.
- [349] arXiv:2311.17434 (replaced) [pdf, other]
-
Title: GSE: Group-wise Sparse and Explainable Adversarial AttacksSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Optimization and Control (math.OC)
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2-$quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2-$norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
- [350] arXiv:2311.18838 (replaced) [pdf, html, other]
-
Title: Dataset Distillation via Curriculum Data Synthesis in Large Data EraComments: TMLR 2024 Camera-ready Version. Code and distilled ImageNet-21K dataset are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Dataset distillation or condensation aims to generate a smaller but representative subset from a large dataset, which allows a model to be trained more efficiently, meanwhile evaluating on the original testing data distribution to achieve decent performance. Previous decoupled methods like SRe$^2$L simply use a unified gradient update scheme for synthesizing data from Gaussian noise, while, we notice that the initial several update iterations will determine the final outline of synthesis, thus an improper gradient update strategy may dramatically affect the final generation quality. To address this, we introduce a simple yet effective global-to-local gradient refinement approach enabled by curriculum data augmentation ($\texttt{CDA}$) during data synthesis. The proposed framework achieves the current published highest accuracy on both large-scale ImageNet-1K and 21K with 63.2% under IPC (Images Per Class) 50 and 36.1% under IPC 20, using a regular input resolution of 224$\times$224 with faster convergence speed and less synthetic time. The proposed model outperforms the current state-of-the-art methods like SRe$^2$L, TESLA, and MTT by more than 4% Top-1 accuracy on ImageNet-1K/21K and for the first time, reduces the gap to its full-data training counterparts to less than absolute 15%. Moreover, this work represents the inaugural success in dataset distillation on the larger-scale ImageNet-21K dataset under the standard 224$\times$224 resolution. Our code and distilled ImageNet-21K dataset of 20 IPC, 2K recovery budget are available at this https URL.
- [351] arXiv:2312.04398 (replaced) [pdf, other]
-
Title: Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-TuningComments: 25 pages, 7 figures, accepted by the 103rd Transportation Research Board (TRB) Annual Meeting, under review by Transportation Research Record: Journal of the Transportation Research BoardSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
- [352] arXiv:2312.12641 (replaced) [pdf, html, other]
-
Title: Robust Point Matching with Distance ProfilesSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
We show the outlier robustness and noise stability of practical matching procedures based on distance profiles. Although the idea of matching points based on invariants like distance profiles has a long history in the literature, there has been little understanding of the theoretical properties of such procedures, especially in the presence of outliers and noise. We provide a theoretical analysis showing that under certain probabilistic settings, the proposed matching procedure is successful with high probability even in the presence of outliers and noise. We demonstrate the performance of the proposed method using a real data example and provide simulation studies to complement the theoretical findings. Lastly, we extend the concept of distance profiles to the abstract setting and connect the proposed matching procedure to the Gromov-Wasserstein distance and its lower bound, with a new sample complexity result derived based on the properties of distance profiles. As a result, we contribute to the literature by providing theoretical underpinnings of the matching procedures based on invariants like distance profiles, which have been widely used in practice but have rarely been analyzed theoretically.
- [353] arXiv:2401.07496 (replaced) [pdf, html, other]
-
Title: Efficient Wireless Federated Learning via Low-Rank Gradient FactorizationComments: 6 pages, 3 figures, 23 references, submittedSubjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
This paper presents a novel gradient compression method for federated learning (FL) in wireless systems. The proposed method centers on a low-rank matrix factorization strategy for local gradient compression that is based on one iteration of a distributed Jacobi successive convex approximation (SCA) at each FL round. The low-rank approximation obtained at one round is used as a "warm start" initialization for Jacobi SCA in the next FL round. A new protocol termed over-the-air low-rank compression (Ota-LC) incorporating this gradient compression method with over-the-air computation and error feedback is shown to have lower computation cost and lower communication overhead, while guaranteeing the same inference performance, as compared with existing benchmarks. As an example, when targeting a test accuracy of 70% on the Cifar-10 dataset, Ota-LC reduces total communication costs by at least 33% compared to benchmark schemes.
- [354] arXiv:2401.09622 (replaced) [pdf, html, other]
-
Title: Is Hyper-Parameter Optimization Different for Software Analytics?Comments: v3, major revisionsSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Yes. SE data can have "smoother" boundaries between classes (compared to traditional AI data sets). To be more precise, the magnitude of the second derivative of the loss function found in SE data is typically much smaller. A new hyper-parameter optimizer, called SMOOTHIE, can exploit this idiosyncrasy of SE data. We compare SMOOTHIE and a state-of-the-art AI hyper-parameter optimizer on three tasks: (a) GitHub issue lifetime prediction (b) detecting static code warnings false alarm; (c) defect prediction. For completeness, we also show experiments on some standard AI datasets. SMOOTHIE runs faster and predicts better on the SE data--but ties on non-SE data with the AI tool. Hence we conclude that SE data can be different to other kinds of data; and those differences mean that we should use different kinds of algorithms for our data. To support open science and other researchers working in this area, all our scripts and datasets are available on-line at this https URL.
- [355] arXiv:2402.00261 (replaced) [pdf, html, other]
-
Title: Understanding Neural Network Systems for Image Analysis using Vector Spaces and Inverse MapsComments: Accepted to IEEE's Southwest Symposium on Image Analysis and Interpretation 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
There is strong interest in developing mathematical methods that can be used to understand complex neural networks used in image analysis. In this paper, we introduce techniques from Linear Algebra to model neural network layers as maps between signal spaces. First, we demonstrate how signal spaces can be used to visualize weight spaces and convolutional layer kernels. We also demonstrate how residual vector spaces can be used to further visualize information lost at each layer. Second, we study invertible networks using vector spaces for computing input images that yield specific outputs. We demonstrate our approach on two invertible networks and ResNet18.
- [356] arXiv:2402.04616 (replaced) [pdf, html, other]
-
Title: Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge DistillationComments: Accepted by WSDM 2025Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Transferring the reasoning capability from stronger large language models (LLMs) to smaller ones has been quite appealing, as smaller LLMs are more flexible to deploy with less expense. Among the existing solutions, knowledge distillation stands out due to its outstanding efficiency and generalization. However, existing methods suffer from several drawbacks, including limited knowledge diversity and the lack of rich contextual information. To solve the problems and facilitate the learning of compact language models, we propose TinyLLM, a new knowledge distillation paradigm to learn a small student LLM from multiple large teacher LLMs. In particular, we encourage the student LLM to not only generate the correct answers but also understand the rationales behind these answers. Given that different LLMs possess diverse reasoning skills, we guide the student model to assimilate knowledge from various teacher LLMs. We further introduce an in-context example generator and a teacher-forcing Chain-of-Thought strategy to ensure that the rationales are accurate and grounded in contextually appropriate scenarios. Extensive experiments on six datasets across two reasoning tasks demonstrate the superiority of our method. Results show that TinyLLM can outperform large teacher LLMs significantly, despite a considerably smaller model size. The source code is available at: this https URL.
- [357] arXiv:2402.07588 (replaced) [pdf, html, other]
-
Title: Understanding Model Selection For Learning In Strategic EnvironmentsComments: NeurIPS 2024Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
The deployment of ever-larger machine learning models reflects a growing consensus that the more expressive the model class one optimizes over$\unicode{x2013}$and the more data one has access to$\unicode{x2013}$the more one can improve performance. As models get deployed in a variety of real-world scenarios, they inevitably face strategic environments. In this work, we consider the natural question of how the interplay of models and strategic interactions affects the relationship between performance at equilibrium and the expressivity of model classes. We find that strategic interactions can break the conventional view$\unicode{x2013}$meaning that performance does not necessarily monotonically improve as model classes get larger or more expressive (even with infinite data). We show the implications of this result in several contexts including strategic regression, strategic classification, and multi-agent reinforcement learning. In particular, we show that each of these settings admits a Braess' paradox-like phenomenon in which optimizing over less expressive model classes allows one to achieve strictly better equilibrium outcomes. Motivated by these examples, we then propose a new paradigm for model selection in games wherein an agent seeks to choose amongst different model classes to use as their action set in a game.
- [358] arXiv:2402.09122 (replaced) [pdf, html, other]
-
Title: Weighted-Sum Gaussian Process Latent Variable ModelsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This work develops a Bayesian non-parametric approach to signal separation where the signals may vary according to latent variables. Our key contribution is to augment Gaussian Process Latent Variable Models (GPLVMs) for the case where each data point comprises the weighted sum of a known number of pure component signals, observed across several input locations. Our framework allows arbitrary non-linear variations in the signals while being able to incorporate useful priors for the linear weights, such as summing-to-one. Our contributions are particularly relevant to spectroscopy, where changing conditions may cause the underlying pure component signals to vary from sample to sample. To demonstrate the applicability to both spectroscopy and other domains, we consider several applications: a near-infrared spectroscopy dataset with varying temperatures, a simulated dataset for identifying flow configuration through a pipe, and a dataset for determining the type of rock from its reflectance.
- [359] arXiv:2402.09721 (replaced) [pdf, html, other]
-
Title: Generalized Principal-Agent Problem with a Learning AgentSubjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Classic principal-agent problems such as Stackelberg games, contract design, and Bayesian persuasion, often assume that the agent is able to best respond to the principal's committed strategy. We study repeated generalized principal-agent problems under the assumption that the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal. We reduce this problem to a one-shot generalized principal-agent problem where the agent approximately best responds. Using this reduction, we show that: (1) If the agent uses contextual no-regret learning algorithms with regret $\mathrm{Reg}(T)$, then the principal can guarantee utility at least $U^* - \Theta\big(\sqrt{\tfrac{\mathrm{Reg}(T)}{T}}\big)$, where $U^*$ is the principal's optimal utility in the classic model with a best-responding agent. (2) If the agent uses contextual no-swap-regret learning algorithms with swap-regret $\mathrm{SReg}(T)$, then the principal cannot obtain utility more than $U^* + O(\frac{\mathrm{SReg(T)}}{T})$. But (3) if the agent uses mean-based learning algorithms (which can be no-regret but not no-swap-regret), then the principal can sometimes do significantly better than $U^*$. These results not only refine previous results in Stackelberg games and contract design, but also lead to new results for Bayesian persuasion with a learning agent and all generalized principal-agent problems where the agent does not have private information.
- [360] arXiv:2402.10754 (replaced) [pdf, html, other]
-
Title: LLMDFA: Analyzing Dataflow in Code with Large Language ModelsComments: 26 pages, 15 figures, 6 tables, NeurIPS 2024 posterSubjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Software Engineering (cs.SE)
Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we decompose the problem into several subtasks and introduce a series of novel strategies. Specifically, we leverage LLMs to synthesize code that outsources delicate reasoning to external expert tools, such as using a parsing library to extract program values of interest and invoking an automated theorem prover to validate path feasibility. Additionally, we adopt a few-shot chain-of-thought prompting to summarize dataflow facts in individual functions, aligning the LLMs with the program semantics of small code snippets to mitigate hallucinations. We evaluate LLMDFA on synthetic programs to detect three representative types of bugs and on real-world Android applications for customized bug detection. On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35. We have open-sourced LLMDFA at this https URL.
- [361] arXiv:2402.14337 (replaced) [pdf, html, other]
-
Title: How Ambiguous are the Rationales for Natural Language Reasoning? A Simple Approach to Handling Rationale UncertaintyComments: Coling2025Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Rationales behind answers not only explain model decisions but boost language models to reason well on complex reasoning tasks. However, obtaining impeccable rationales is often impossible. Besides, it is non-trivial to estimate the degree to which the rationales are faithful enough to encourage model performance. Thus, such reasoning tasks often compel models to output correct answers under undesirable rationales and are sub-optimal compared to what the models are fully capable of. In this work, we propose how to deal with imperfect rationales causing aleatoric uncertainty. We first define the ambiguous rationales with entropy scores of given rationales, using model prior beliefs as informativeness. We then guide models to select one of two different reasoning models according to the ambiguity of rationales. We empirically argue that our proposed method produces robust performance superiority against the adversarial quality of rationales and low-resource settings.
- [362] arXiv:2402.15402 (replaced) [pdf, other]
-
Title: Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure PriorSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
We focus on the task of unknown object rearrangement, where a robot is supposed to re-configure the objects into a desired goal configuration specified by an RGB-D image. Recent works explore unknown object rearrangement systems by incorporating learning-based perception modules. However, they are sensitive to perception error, and pay less attention to task-level performance. In this paper, we aim to develop an effective system for unknown object rearrangement amidst perception noise. We theoretically reveal that the noisy perception impacts grasp and place in a decoupled way, and show such a decoupled structure is valuable to improve task optimality. We propose GSP, a dual-loop system with the decoupled structure as prior. For the inner loop, we learn a see policy for self-confident in-hand object matching. For the outer loop, we learn a grasp policy aware of object matching and grasp capability guided by task-level rewards. We leverage the foundation model CLIP for object matching, policy learning and self-termination. A series of experiments indicate that GSP can conduct unknown object rearrangement with higher completion rates and fewer steps.
- [363] arXiv:2402.19101 (replaced) [pdf, html, other]
-
Title: Effective Two-Stage Knowledge Transfer for Multi-Entity Cross-Domain RecommendationJianyu Guan, Zongming Yin, Tianyi Zhang, Leihui Chen, Yin Zhang, Fei Huang, Jufeng Chen, Shuguang HanSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
In recent years, the recommendation content on e-commerce platforms has become increasingly rich -- a single user feed may contain multiple entities, such as selling products, short videos, and content posts. To deal with the multi-entity recommendation problem, an intuitive solution is to adopt the shared-network-based architecture for joint training. The idea is to transfer the extracted knowledge from one type of entity (source entity) to another (target entity). However, different from the conventional same-entity cross-domain recommendation, multi-entity knowledge transfer encounters several important issues: (1) data distributions of the source entity and target entity are naturally different, making the shared-network-based joint training susceptible to the negative transfer issue, (2) more importantly, the corresponding feature schema of each entity is not exactly aligned (e.g., price is an essential feature for selling product while missing for content posts), making the existing methods no longer appropriate. Recent researchers have also experimented with the pre-training and fine-tuning paradigm. Again, they only consider the scenarios with the same entity type and feature systems, which is inappropriate in our case. To this end, we design a pre-training & fine-tuning based Multi-entity Knowledge Transfer framework called MKT. MKT utilizes a multi-entity pre-training module to extract transferable knowledge across different entities. In particular, a feature alignment module is first applied to scale and align different feature schemas. Afterward, a couple of knowledge extractors are employed to extract the common and entity-specific knowledge. In the end, the extracted common knowledge is adopted for target entity model training. Through extensive offline and online experiments, we demonstrated the superiority of MKT over multiple State-Of-The-Art methods.
- [364] arXiv:2403.03949 (replaced) [pdf, html, other]
-
Title: Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust ManipulationComments: Project page: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Imitation learning methods need significant human supervision to learn policies robust to changes in object poses, physical disturbances, and visual distractors. Reinforcement learning, on the other hand, can explore the environment autonomously to learn robust behaviors but may require impractical amounts of unsafe real-world data collection. To learn performant, robust policies without the burden of unsafe real-world data collection or extensive human supervision, we propose RialTo, a system for robustifying real-world imitation learning policies via reinforcement learning in "digital twin" simulation environments constructed on the fly from small amounts of real-world data. To enable this real-to-sim-to-real pipeline, RialTo proposes an easy-to-use interface for quickly scanning and constructing digital twins of real-world environments. We also introduce a novel "inverse distillation" procedure for bringing real-world demonstrations into simulated environments for efficient fine-tuning, with minimal human intervention and engineering required. We evaluate RialTo across a variety of robotic manipulation problems in the real world, such as robustly stacking dishes on a rack, placing books on a shelf, and six other tasks. RialTo increases (over 67%) in policy robustness without requiring extensive human data collection. Project website and videos at this https URL
- [365] arXiv:2403.09603 (replaced) [pdf, html, other]
-
Title: Optimistic Verifiable Training by Controlling Hardware NondeterminismComments: 11 pages, 5 figures, Neural Information Processing Systems (NeurIPS) 2024,Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The increasing compute demands of AI systems have led to the emergence of services that train models on behalf of clients lacking necessary resources. However, ensuring correctness of training and guarding against potential training-time attacks, such as data poisoning and backdoors, poses challenges. Existing works on verifiable training largely fall into two classes: proof-based systems, which are difficult to scale, and ``optimistic'' methods that consider a third-party auditor who can replicate the training process and contest the trainer. A key challenge with the latter is that nondeterminism between GPU types during training prevents exact replication of the training process, resulting in schemes that are non-robust. We propose a method that combines training in a higher precision than the target, rounding after intermediate computations, and sharing rounding decisions based on an adaptive thresholding procedure, to successfully control for nondeterminism. Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at this https URL.
- [366] arXiv:2403.10786 (replaced) [pdf, html, other]
-
Title: ContourDiff: Unpaired Image-to-Image Translation with Structural Consistency for Medical ImagingYuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. MazurowskiSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Preserving object structure through image-to-image translation is crucial, particularly in applications such as medical imaging (e.g., CT-to-MRI translation), where downstream clinical and machine learning applications will often rely on such preservation. However, typical image-to-image translation algorithms prioritize perceptual quality with respect to output domain features over the preservation of anatomical structures. To address these challenges, we first introduce a novel metric to quantify the structural bias between domains which must be considered for proper translation. We then propose ContourDiff, a novel image-to-image translation algorithm that leverages domain-invariant anatomical contour representations of images to preserve the anatomical structures during translation. These contour representations are simple to extract from images, yet form precise spatial constraints on their anatomical content. ContourDiff applies an input image contour representation as a constraint at every sampling step of a diffusion model trained in the output domain, ensuring anatomical content preservation for the output image. We evaluate our method on challenging lumbar spine and hip-and-thigh CT-to-MRI translation tasks, via (1) the performance of segmentation models trained on translated images applied to real MRIs, and (2) the foreground FID and KID of translated images with respect to real MRIs. Our method outperforms other unpaired image translation methods by a significant margin across almost all metrics and scenarios. Moreover, it achieves this without the need to access any input domain information during training.
- [367] arXiv:2403.13565 (replaced) [pdf, html, other]
-
Title: AdaTrans: Feature-wise and Sample-wise Adaptive Transfer Learning for High-dimensional RegressionSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
We consider the transfer learning problem in the high dimensional linear regression setting, where the feature dimension is larger than the sample size. To learn transferable information, which may vary across features or the source samples, we propose an adaptive transfer learning method that can detect and aggregate the feature-wise (F-AdaTrans) or sample-wise (S-AdaTrans) transferable structures. We achieve this by employing a fused-penalty, coupled with weights that can adapt according to the transferable structure. To choose the weight, we propose a theoretically informed, data-driven procedure, enabling F-AdaTrans to selectively fuse the transferable signals with the target while filtering out non-transferable signals, and S-AdaTrans to obtain the optimal combination of information transferred from each source sample. We show that, with appropriately chosen weights, F-AdaTrans achieves a convergence rate close to that of an oracle estimator with a known transferable structure, and S-AdaTrans recovers existing near-minimax optimal rates as a special case. The effectiveness of the proposed method is validated using both simulation and real data, demonstrating favorable performance compared to the existing methods.
- [368] arXiv:2403.13802 (replaced) [pdf, html, other]
-
Title: ZigMa: A DiT-style Zigzag Mamba Diffusion ModelVincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Schusterbauer, Björn OmmerComments: ECCV 2024 Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the lack of consideration for spatial continuity in the scan scheme of Mamba. Secondly, building upon this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba, which outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ $1024\times 1024$ and UCF101, MultiModal-CelebA-HQ, and MS COCO $256\times 256$ . Code will be released at this https URL
- [369] arXiv:2404.08509 (replaced) [pdf, html, other]
-
Title: Efficient Interactive LLM Serving with Proxy Model-based Sequence Length PredictionHaoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Başar, Ravishankar K. IyerComments: Accepted at AIOps'24Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs) have been driving a new wave of interactive AI applications across numerous domains. However, efficiently serving LLM inference requests is challenging due to their unpredictable execution times originating from the autoregressive nature of generative models. Existing LLM serving systems exploit first-come-first-serve (FCFS) scheduling, suffering from head-of-line blocking issues. To address the non-deterministic nature of LLMs and enable efficient interactive LLM serving, we present a speculative shortest-job-first (SSJF) scheduler that uses a light proxy model to predict LLM output sequence lengths. Our open-source SSJF implementation does not require changes to memory management or batching strategies. Evaluations on real-world datasets and production workload traces show that SSJF reduces average job completion times by 30.5-39.6% and increases throughput by 2.2-3.6x compared to FCFS schedulers, across no batching, dynamic batching, and continuous batching settings.
- [370] arXiv:2404.10630 (replaced) [pdf, html, other]
-
Title: HLAT: High-quality Large Language Model Pre-trained on AWS TrainiumHaozheng Fan, Hao Zhou, Guangtai Huang, Parameswaran Raman, Xinwei Fu, Gaurav Gupta, Dhananjay Ram, Yida Wang, Jun HuanSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Getting large language models (LLMs) to perform well on the downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices in addition to a stable distributed training framework to accelerate the training. The growing number of applications leveraging AI/ML led to a scarcity of the expensive conventional accelerators (such as GPUs), which emphasizes the need for the alternative specialized-accelerators that are scalable and cost-efficient. AWS Trainium is the second-generation machine learning accelerator purposely built for training large deep learning models. However, training LLMs with billions of parameters on AWS Trainium is challenging due to its relatively nascent software ecosystem. In this paper, we showcase HLAT: a family of 7B and 70B decoder-only LLMs pre-trained using 4096 AWS Trainium accelerators over 1.8 trillion tokens. The performance of HLAT is benchmarked against popular open source models including LLaMA and OpenLLaMA, which have been trained on NVIDIA GPUs and Google TPUs, respectively. On various evaluation tasks, we show that HLAT achieves model quality on par with the baselines of similar model size. We also open-source all the training scripts and configurations of HLAT (this https URL) and share the best practice of using the NeuronX Distributed Training (NxDT), a customized distributed training library for AWS Trainium. Our work demonstrates that AWS Trainium powered by NxDT is able to successfully pre-train state-of-the-art LLM models with high performance and cost-effectiveness.
- [371] arXiv:2404.15269 (replaced) [pdf, other]
-
Title: Aligning LLM Agents by Learning Latent Preference from User EditsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE that infers a description of the user's latent preference based on historic edit data. The inferred user preference descriptions are used to define prompts for generating responses in the future. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preference improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex, subtle, and vary based on context, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user preference for a given context based on user edits. In the future, CIPHER retrieves inferred preferences from the k-closest contexts in the history, and forms an aggregate preference for response generation. We introduce two interactive environments -- summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while only having a small overhead in LLM query cost. Our analysis reports that user preferences learned by CIPHER show significant similarity to the ground truth latent preferences.
- [372] arXiv:2405.00332 (replaced) [pdf, html, other]
-
Title: A Careful Examination of Large Language Model Performance on Grade School ArithmeticHugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Charlotte Zhuang, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer YueComments: 2024 NeurIPS Camera Ready (Datasets and Benchmarks Track)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k benchmark, the gold standard for measuring elementary mathematical reasoning. We ensure that the two benchmarks are comparable across important metrics such as human solve rates, number of steps in solution, answer magnitude, and more. When evaluating leading open- and closed-source LLMs on GSM1k, we observe accuracy drops of up to 8%, with several families of models showing evidence of systematic overfitting across almost all model sizes. Further analysis suggests a positive relationship (Spearman's r^2 = 0.36) between a model's probability of generating an example from GSM8k and its performance gap between GSM8k and GSM1k, suggesting that some models may have partially memorized GSM8k. Nevertheless, many models, especially those on the frontier, show minimal signs of overfitting, and all models broadly demonstrate generalization to novel math problems guaranteed to not be in their training data.
- [373] arXiv:2405.02968 (replaced) [pdf, html, other]
-
Title: CoverLib: Classifiers-equipped Experience Library by Iterative Problem Distribution Coverage Maximization for Domain-tuned Motion PlanningSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Library-based methods are known to be very effective for fast motion planning by adapting an experience retrieved from a precomputed library. This article presents CoverLib, a principled approach for constructing and utilizing such a library. CoverLib iteratively adds an experience-classifier-pair to the library, where each classifier corresponds to an adaptable region of the experience within the problem space. This iterative process is an active procedure, as it selects the next experience based on its ability to effectively cover the uncovered region. During the query phase, these classifiers are utilized to select an experience that is expected to be adaptable for a given problem. Experimental results demonstrate that CoverLib effectively mitigates the trade-off between plannability and speed observed in global (e.g. sampling-based) and local (e.g. optimization-based) methods. As a result, it achieves both fast planning and high success rates over the problem domain. Moreover, due to its adaptation-algorithm-agnostic nature, CoverLib seamlessly integrates with various adaptation methods, including nonlinear programming-based and sampling-based algorithms.
- [374] arXiv:2405.10313 (replaced) [pdf, html, other]
-
Title: How Far Are We From AGI: Are LLMs All We Need?Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing studies have reviewed specific advancements in AI and proposed potential paths to AGI, such as large language models (LLMs), they fall short of providing a thorough exploration of AGI's definitions, objectives, and developmental trajectories. Unlike previous survey papers, this work goes beyond summarizing LLMs by addressing key questions about our progress toward AGI and outlining the strategies essential for its realization through comprehensive analysis, in-depth discussions, and novel insights. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.
- [375] arXiv:2405.11143 (replaced) [pdf, html, other]
-
Title: OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF FrameworkSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at \url{this https URL}.
- [376] arXiv:2405.12085 (replaced) [pdf, html, other]
-
Title: Noise-tolerant learnability of shallow quantum circuits from statistics and the cost of quantum pseudorandomnessComments: 21+7 pages, 1 figure, 1 tableSubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
This work studies the learnability of quantum circuits in the near term. We show the natural robustness of quantum statistical queries for learning quantum processes and provide an efficient way to benchmark global depolarizing noise from statistics, which gives us a powerful framework for developing noise-tolerant algorithms. We adapt a learning algorithm for constant-depth quantum circuits to the quantum statistical query setting with a small overhead in the query complexity. We prove average-case lower bounds for learning random quantum circuits of logarithmic and higher depths within diamond distance with statistical queries. Finally, we prove that pseudorandom unitaries (PRUs) cannot be constructed using circuits of constant depth by constructing an efficient distinguisher and proving a new variation of the quantum no-free lunch theorem.
- [377] arXiv:2405.13962 (replaced) [pdf, html, other]
-
Title: Robust Generative Learning with Lipschitz-Regularized $\alpha$-Divergences Allows Minimal Assumptions on Target DistributionsComments: 43 pages, 6 figures and 2 tables in the main textSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This paper demonstrates the robustness of Lipschitz-regularized $\alpha$-divergences as objective functionals in generative modeling, showing they enable stable learning across a wide range of target distributions with minimal assumptions. We establish that these divergences remain finite under a mild condition-that the source distribution has a finite first moment-regardless of the properties of the target distribution, making them adaptable to the structure of target distributions. Furthermore, we prove the existence and finiteness of their variational derivatives, which are essential for stable training of generative models such as GANs and gradient flows. For heavy-tailed targets, we derive necessary and sufficient conditions that connect data dimension, $\alpha$, and tail behavior to divergence finiteness, that also provide insights into the selection of suitable $\alpha$'s. We also provide the first sample complexity bounds for empirical estimations of these divergences on unbounded domains. As a byproduct, we obtain the first sample complexity bounds for empirical estimations of these divergences and the Wasserstein-1 metric with group symmetry on unbounded domains. Numerical experiments confirm that generative models leveraging Lipschitz-regularized $\alpha$-divergences can stably learn distributions in various challenging scenarios, including those with heavy tails or complex, low-dimensional, or fractal support, all without any prior knowledge of the structure of target distributions.
- [378] arXiv:2405.15925 (replaced) [pdf, other]
-
Title: MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion SegmentationComments: 11 pages, 8 figures, journal paper is accepted by Exploration of MedicineSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Skin lesion segmentation is key for early skin cancer detection. Challenges in automatic segmentation from dermoscopic images include variations in color, texture, and artifacts of indistinct lesion boundaries. Deep learning methods like CNNs and U-Net have shown promise in addressing these issues. To further aid early diagnosis, especially on mobile devices with limited computing power, we present MUCM-Net. This efficient model combines Mamba State-Space Models with our UCM-Net architecture for improved feature learning and segmentation. MUCM-Net's Mamba-UCM Layer is optimized for mobile deployment, offering high accuracy with low computational needs. Tested on ISIC datasets, it outperforms other methods in accuracy and computational efficiency, making it a scalable tool for early detection in settings with limited resources. Our MUCM-Net source code is available for research and collaboration, supporting advances in mobile health diagnostics and the fight against skin cancer. In order to facilitate accessibility and further research in the field, the MUCM-Net source code is this https URL
- [379] arXiv:2405.19644 (replaced) [pdf, html, other]
-
Title: EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery VideosComments: Early accepted by MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases all captured using an egocentric camera attached to the surgeon's head. In addition to video, the EgoSurgery-Phase offers eye gaze. As far as we know, it is the first real open surgery video dataset for surgical phase recognition publicly available. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., surgical field), in our GGMAE, the gaze information acts as an empirical semantic richness prior to guiding the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves the previous state-of-the-art recognition method (6.4% in Jaccard) and the masked autoencoder-based method (3.1% in Jaccard) on EgoSurgery-Phase.
- [380] arXiv:2406.03095 (replaced) [pdf, html, other]
-
Title: EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery VideosSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand-bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection.
- [381] arXiv:2406.11148 (replaced) [pdf, html, other]
-
Title: Few-Shot Recognition via Stage-Wise Retrieval-Augmented FinetuningComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Few-shot recognition (FSR) aims to train a classification model with only a few labeled examples of each concept concerned by a downstream task, where data annotation cost can be prohibitively high. We develop methods to solve FSR by leveraging a pretrained Vision-Language Model (VLM). We particularly explore retrieval-augmented learning (RAL), which retrieves data from the VLM's pretraining set to learn better models for serving downstream tasks. RAL has been widely studied in zero-shot recognition but remains under-explored in FSR. Although applying RAL to FSR may seem straightforward, we observe interesting and novel challenges and opportunities. First, somewhat surprisingly, finetuning a VLM on a large amount of retrieved data underperforms state-of-the-art zero-shot methods. This is due to the imbalanced distribution of retrieved data and its domain gaps with the few-shot examples in the downstream task. Second, more surprisingly, we find that simply finetuning a VLM solely on few-shot examples significantly outperforms previous FSR methods, and finetuning on the mix of retrieved and few-shot data yields even better results. Third, to mitigate the imbalanced distribution and domain gap issues, we propose Stage-Wise retrieval-Augmented fineTuning (SWAT), which involves end-to-end finetuning on mixed data in the first stage and retraining the classifier on the few-shot data in the second stage. Extensive experiments on nine popular benchmarks demonstrate that SWAT significantly outperforms previous methods by $>$6% accuracy.
- [382] arXiv:2406.12123 (replaced) [pdf, html, other]
-
Title: ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for StrokeJingxi Xu, Runsheng Wang, Siqi Shang, Ava Chen, Lauren Winterbottom, To-Liang Hsu, Wenxi Chen, Khondoker Ahmed, Pedro Leandro La Rotta, Xinyue Zhu, Dawn M. Nilsen, Joel Stein, Matei CiocarlieComments: 8 pages; accepted to RA-L in November, 2024Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos, source code, and additional information can be found at this https URL.
- [383] arXiv:2406.12901 (replaced) [pdf, html, other]
-
Title: Interpretable machine learning approach for electron antineutrino selection in a large liquid scintillator detectorA. Gavrikov, V. Cerrone, A. Serafini, R. Brugnera, A. Garfagnini, M. Grassi, B. Jelmini, L. Lastrucci, S. Aiello, G. Andronico, V. Antonelli, A. Barresi, D. Basilico, M. Beretta, A. Bergnoli, M. Borghesi, A. Brigatti, R. Bruno, A. Budano, B. Caccianiga, A. Cammi, R. Caruso, D. Chiesa, C. Clementi, S. Dusini, A. Fabbri, G. Felici, F. Ferraro, M. G. Giammarchi, N. Giudice, R. M. Guizzetti, N. Guardone, C. Landini, I. Lippi, S. Loffredo, L. Loi, P. Lombardi, C. Lombardo, F. Mantovani, S. M. Mari, A. Martini, L. Miramonti, M. Montuschi, M. Nastasi, D. Orestano, F. Ortica, A. Paoloni, E. Percalli, F. Petrucci, E. Previtali, G. Ranucci, A. C. Re, M. Redchuck, B. Ricci, A. Romani, P. Saggese, G. Sava, C. Sirignano, M. Sisti, L. Stanco, E. Stanescu Farilla, V. Strati, M. D. C. Torri, A. Triossi, C. Tuvé, C. Venettacci, G. Verde, L. VotanoComments: This is a post-peer-review, pre-copyedit version of an article published in Phys. Lett. B. The final published version is available online: this https URLJournal-ref: Physics Letters B 860, 139141 (2025)Subjects: Instrumentation and Detectors (physics.ins-det); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)
Several neutrino detectors, KamLAND, Daya Bay, Double Chooz, RENO, and the forthcoming large-scale JUNO, rely on liquid scintillator to detect reactor antineutrino interactions. In this context, inverse beta decay represents the golden channel for antineutrino detection, providing a pair of correlated events, thus a strong experimental signature to distinguish the signal from a variety of backgrounds. However, given the low cross-section of antineutrino interactions, the development of a powerful event selection algorithm becomes imperative to achieve effective discrimination between signal and backgrounds. In this study, we introduce a machine learning (ML) model to achieve this goal: a fully connected neural network as a powerful signal-background discriminator for a large liquid scintillator detector. We demonstrate, using the JUNO detector as an example, that, despite the already high efficiency of a cut-based approach, the presented ML model can further improve the overall event selection efficiency. Moreover, it allows for the retention of signal events at the detector edges that would otherwise be rejected because of the overwhelming amount of background events in that region. We also present the first interpretable analysis of the ML approach for event selection in reactor neutrino experiments. This method provides insights into the decision-making process of the model and offers valuable information for improving and updating traditional event selection approaches.
- [384] arXiv:2406.13352 (replaced) [pdf, html, other]
-
Title: AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM AgentsComments: Updated version after fixing a bug in the Llama implementation and updating the travel suiteSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
AI agents aim to solve complex tasks by combining text-based reasoning with external tool calls. Unfortunately, AI agents are vulnerable to prompt injection attacks where data returned by external tools hijacks the agent to execute malicious tasks. To measure the adversarial robustness of AI agents, we introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data. To capture the evolving nature of attacks and defenses, AgentDojo is not a static test suite, but rather an extensible environment for designing and evaluating new agent tasks, defenses, and adaptive attacks. We populate the environment with 97 realistic tasks (e.g., managing an email client, navigating an e-banking website, or making travel bookings), 629 security test cases, and various attack and defense paradigms from the literature. We find that AgentDojo poses a challenge for both attacks and defenses: state-of-the-art LLMs fail at many tasks (even in the absence of attacks), and existing prompt injection attacks break some security properties but not all. We hope that AgentDojo can foster research on new design principles for AI agents that solve common tasks in a reliable and robust manner.. We release the code for AgentDojo at this https URL.
- [385] arXiv:2406.16525 (replaced) [pdf, html, other]
-
Title: OAL: Enhancing OOD Detection Using Latent DiffusionSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Numerous Out-of-Distribution (OOD) detection algorithms have been developed to identify unknown samples or objects in real-world model deployments. Outlier Exposure (OE) algorithms, a subset of these methods, typically employ auxiliary datasets to train OOD detectors, enhancing the reliability of their predictions. While previous methods have leveraged Stable Diffusion (SD) to generate pixel-space outliers, these can complicate network optimization. We propose an Outlier Aware Learning (OAL) framework, which synthesizes OOD training data directly in the latent space. To regularize the model's decision boundary, we introduce a mutual information-based contrastive learning approach that amplifies the distinction between In-Distribution (ID) and collected OOD features. The efficacy of this contrastive learning technique is supported by both theoretical analysis and empirical results. Furthermore, we integrate knowledge distillation into our framework to preserve in-distribution classification accuracy. The combined application of contrastive learning and knowledge distillation substantially improves OOD detection performance, enabling OAL to outperform other OE methods by a considerable margin. Source code is available at: \url{this https URL}.
- [386] arXiv:2407.04541 (replaced) [pdf, html, other]
-
Title: PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit PostsComments: Accepted at ICPR 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We introduce PoPreRo, the first dataset for Popularity Prediction of Romanian posts collected from Reddit. The PoPreRo dataset includes a varied compilation of post samples from five distinct subreddits of Romania, totaling 28,107 data samples. Along with our novel dataset, we introduce a set of competitive models to be used as baselines for future research. Interestingly, the top-scoring model achieves an accuracy of 61.35% and a macro F1 score of 60.60% on the test set, indicating that the popularity prediction task on PoPreRo is very challenging. Further investigations based on few-shot prompting the Falcon-7B Large Language Model also point in the same direction. We thus believe that PoPreRo is a valuable resource that can be used to evaluate models on predicting the popularity of social media posts in Romanian. We release our dataset at this https URL.
- [387] arXiv:2407.14985 (replaced) [pdf, html, other]
-
Title: Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining DataXinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang WangComments: updated 10-page versionSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The impressive capabilities of large language models (LLMs) have sparked debate over whether these models genuinely generalize to unseen tasks or predominantly rely on memorizing vast amounts of pretraining data. To explore this issue, we introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraining data frequency. To effectively capture task-specific pretraining data frequency, we propose a novel task-gram language model, which is built by counting the co-occurrence of semantically related $n$-gram pairs from task inputs and outputs in the pretraining corpus. Using the Pythia models trained on the Pile dataset, we evaluate four distinct tasks: machine translation, factual question answering, world knowledge understanding, and math reasoning. Our findings reveal varying levels of memorization, with the strongest effect observed in factual question answering. Furthermore, while model performance improves across all tasks as LLM size increases, only factual question answering shows an increase in memorization, whereas machine translation and reasoning tasks exhibit greater generalization, producing more novel outputs. This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks, providing a scalable method for analyzing large pretraining corpora in greater depth.
- [388] arXiv:2407.20620 (replaced) [pdf, html, other]
-
Title: Accelerated forward-backward and Douglas-Rachford splitting dynamicsComments: 11 pages; 2 figuresSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
We examine convergence properties of continuous-time variants of accelerated Forward-Backward (FB) and Douglas-Rachford (DR) splitting algorithms for nonsmooth composite optimization problems. When the objective function is given by the sum of a quadratic and a nonsmooth term, we establish accelerated sublinear and exponential convergence rates for convex and strongly convex problems, respectively. Moreover, for FB splitting dynamics, we demonstrate that accelerated exponential convergence rate carries over to general strongly convex problems. In our Lyapunov-based analysis we exploit the variable-metric gradient interpretations of FB and DR splittings to obtain smooth Lyapunov functions that allow us to establish accelerated convergence rates. We provide computational experiments to demonstrate the merits and the effectiveness of our analysis.
- [389] arXiv:2407.21565 (replaced) [pdf, html, other]
-
Title: Multi-agent reinforcement learning for the control of three-dimensional Rayleigh-B\'enard convectionComments: Submitted to the special issue titled 'Machine Learning for Fluid Dynamics' in the journal Flow, Turbulence and Combusion. 39 pages and 20 figuresSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
Deep reinforcement learning (DRL) has found application in numerous use-cases pertaining to flow control. Multi-agent RL (MARL), a variant of DRL, has shown to be more effective than single-agent RL in controlling flows exhibiting locality and translational invariance. We present, for the first time, an implementation of MARL-based control of three-dimensional Rayleigh-Bénard convection (RBC). Control is executed by modifying the temperature distribution along the bottom wall divided into multiple control segments, each of which acts as an independent agent. Two regimes of RBC are considered at Rayleigh numbers $\mathrm{Ra}=500$ and $750$. Evaluation of the learned control policy reveals a reduction in convection intensity by $23.5\%$ and $8.7\%$ at $\mathrm{Ra}=500$ and $750$, respectively. The MARL controller converts irregularly shaped convective patterns to regular straight rolls with lower convection that resemble flow in a relatively more stable regime. We draw comparisons with proportional control at both $\mathrm{Ra}$ and show that MARL is able to outperform the proportional controller. The learned control strategy is complex, featuring different non-linear segment-wise actuator delays and actuation magnitudes. We also perform successful evaluations on a larger domain than used for training, demonstrating that the invariant property of MARL allows direct transfer of the learnt policy.
- [390] arXiv:2408.03330 (replaced) [pdf, html, other]
-
Title: Modeling Latent Neural Dynamics with Gaussian Process Switching Linear Dynamical SystemsComments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Understanding how the collective activity of neural populations relates to computation and ultimately behavior is a key goal in neuroscience. To this end, statistical methods which describe high-dimensional neural time series in terms of low-dimensional latent dynamics have played a fundamental role in characterizing neural systems. Yet, what constitutes a successful method involves two opposing criteria: (1) methods should be expressive enough to capture complex nonlinear dynamics, and (2) they should maintain a notion of interpretability often only warranted by simpler linear models. In this paper, we develop an approach that balances these two objectives: the Gaussian Process Switching Linear Dynamical System (gpSLDS). Our method builds on previous work modeling the latent state evolution via a stochastic differential equation whose nonlinear dynamics are described by a Gaussian process (GP-SDEs). We propose a novel kernel function which enforces smoothly interpolated locally linear dynamics, and therefore expresses flexible -- yet interpretable -- dynamics akin to those of recurrent switching linear dynamical systems (rSLDS). Our approach resolves key limitations of the rSLDS such as artifactual oscillations in dynamics near discrete state boundaries, while also providing posterior uncertainty estimates of the dynamics. To fit our models, we leverage a modified learning objective which improves the estimation accuracy of kernel hyperparameters compared to previous GP-SDE fitting approaches. We apply our method to synthetic data and data recorded in two neuroscience experiments and demonstrate favorable performance in comparison to the rSLDS.
- [391] arXiv:2408.07435 (replaced) [pdf, other]
-
Title: Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systemsJulian Ruddick, Glenn Ceusters, Gilles Van Kriekinge, Evgenii Genov, Cedric De Cauwer, Thierry Coosemans, Maarten MessagieComments: Accepted version Energy and AI: this https URLSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Recent advancements in machine learning based energy management approaches, specifically reinforcement learning with a safety layer (OptLayerPolicy) and a metaheuristic algorithm generating a decision tree control policy (TreeC), have shown promise. However, their effectiveness has only been demonstrated in computer simulations. This paper presents the real-world validation of these methods, comparing against model predictive control and simple rule-based control benchmark. The experiments were conducted on the electrical installation of 4 reproductions of residential houses, which all have their own battery, photovoltaic and dynamic load system emulating a non-controllable electrical load and a controllable electric vehicle charger. The results show that the simple rules, TreeC, and model predictive control-based methods achieved similar costs, with a difference of only 0.6%. The reinforcement learning based method, still in its training phase, obtained a cost 25.5\% higher to the other methods. Additional simulations show that the costs can be further reduced by using a more representative training dataset for TreeC and addressing errors in the model predictive control implementation caused by its reliance on accurate data from various sources. The OptLayerPolicy safety layer allows safe online training of a reinforcement learning agent in the real-world, given an accurate constraint function formulation. The proposed safety layer method remains error-prone, nonetheless, it is found beneficial for all investigated methods. The TreeC method, which does require building a realistic simulation for training, exhibits the safest operational performance, exceeding the grid limit by only 27.1 Wh compared to 593.9 Wh for reinforcement learning.
- [392] arXiv:2408.14778 (replaced) [pdf, html, other]
-
Title: GPU-Accelerated Counterfactual Regret MinimizationSubjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Counterfactual regret minimization is a family of algorithms of no-regret learning dynamics capable of solving large-scale imperfect information games. We propose implementing this algorithm as a series of dense and sparse matrix and vector operations, thereby making it highly parallelizable for a graphical processing unit, at a cost of higher memory usage. Our experiments show that our implementation performs up to about 244.5 times faster than OpenSpiel's Python implementation and, on an expanded set of games, up to about 114.2 times faster than OpenSpiel's C++ implementation and the speedup becomes more pronounced as the size of the game being solved grows.
- [393] arXiv:2409.03757 (replaced) [pdf, html, other]
-
Title: Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. Our evaluation spans seven vision foundation encoders, including image-based, video-based, and 3D foundation models. We evaluate these models in four tasks: Vision-Language Scene Reasoning, Visual Grounding, Segmentation, and Registration, each focusing on different aspects of scene understanding. Our evaluations yield key findings: DINOv2 demonstrates superior performance, video models excel in object-level tasks, diffusion models benefit geometric tasks, and language-pretrained models show unexpected limitations in language-related tasks. These insights challenge some conventional understandings, provide novel perspectives on leveraging visual foundation models, and highlight the need for more flexible encoder selection in future vision-language and scene-understanding tasks. Code: this https URL
- [394] arXiv:2409.04923 (replaced) [pdf, html, other]
-
Title: Single-snapshot machine learning for super-resolution of turbulenceComments: To appear in Journal of Fluid MechanicsSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Modern machine-learning techniques are generally considered data-hungry. However, this may not be the case for turbulence as each of its snapshots can hold more information than a single data file in general machine-learning settings. This study asks the question of whether nonlinear machine-learning techniques can effectively extract physical insights even from as little as a {\it single} snapshot of turbulent flow. As an example, we consider machine-learning-based super-resolution analysis that reconstructs a high-resolution field from low-resolution data for two examples of two-dimensional isotropic turbulence and three-dimensional turbulent channel flow. First, we reveal that a carefully designed machine-learning model trained with flow tiles sampled from only a single snapshot can reconstruct vortical structures across a range of Reynolds numbers for two-dimensional decaying turbulence. Successful flow reconstruction indicates that nonlinear machine-learning techniques can leverage scale-invariance properties to learn turbulent flows. We also show that training data of turbulent flows can be cleverly collected from a single snapshot by considering characteristics of rotation and shear tensors. Second, we perform the single-snapshot super-resolution analysis for turbulent channel flow, showing that it is possible to extract physical insights from a single flow snapshot even with inhomogeneity. The present findings suggest that embedding prior knowledge in designing a model and collecting data is important for a range of data-driven analyses for turbulent flows. More broadly, this work hopes to stop machine-learning practitioners from being wasteful with turbulent flow data.
- [395] arXiv:2409.10392 (replaced) [pdf, html, other]
-
Title: TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based ClusteringSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process users data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which users privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach called TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. Firstly, clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation among clients whose data for a specific class may have not been enough during the training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPLF efficient in terms of both accuracy and communication cost. The TPFL results were compared with 6 other baseline methods; namely FedAvg, FedProx, FLIS DC, FLIS HC, IFCA and FedTM. The results demonstrated that TPFL performance better than baseline methods with 98.94% accuracy on MNIST, 98.52% accuracy on FashionMNIST and 91.16% accuracy on FEMNIST dataset.
- [396] arXiv:2409.17201 (replaced) [pdf, html, other]
-
Title: Immersion and Invariance-based Coding for Privacy-Preserving Federated LearningSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Federated learning (FL) has emerged as a method to preserve privacy in collaborative distributed learning. In FL, clients train AI models directly on their devices rather than sharing data with a centralized server, which can pose privacy risks. However, it has been shown that despite FL's partial protection of local data privacy, information about clients' data can still be inferred from shared model updates during training. In recent years, several privacy-preserving approaches have been developed to mitigate this privacy leakage in FL, though they often provide privacy at the cost of model performance or system efficiency. Balancing these trade-offs presents a significant challenge in implementing FL schemes. In this manuscript, we introduce a privacy-preserving FL framework that combines differential privacy and system immersion tools from control theory. The core idea is to treat the optimization algorithms used in standard FL schemes (e.g., gradient-based algorithms) as a dynamical system that we seek to immerse into a higher-dimensional system (referred to as the target optimization algorithm). The target algorithm's dynamics are designed such that, first, the model parameters of the original algorithm are immersed in its parameters; second, it operates on distorted parameters; and third, it converges to an encoded version of the true model parameters from the original algorithm. These encoded parameters can then be decoded at the server to retrieve the original model parameters. We demonstrate that the proposed privacy-preserving scheme can be tailored to offer any desired level of differential privacy for both local and global model parameters, while maintaining the same accuracy and convergence rate as standard FL algorithms.
- [397] arXiv:2409.17273 (replaced) [pdf, other]
-
Title: An Integrated Deep Learning Framework for Effective Brain Tumor Localization, Segmentation, and Classification from Magnetic Resonance ImagesComments: 36 pages, 27 figures, 5 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Tumors in the brain result from abnormal cell growth within the brain tissue, arising from various types of brain cells. When left undiagnosed, they lead to severe neurological deficits such as cognitive impairment, motor dysfunction, and sensory loss. As the tumor grows, it causes an increase in intracranial pressure, potentially leading to life-threatening complications such as brain herniation. Therefore, early detection and treatment are necessary to manage the complications caused by such tumors to slow down their growth. Numerous works involving deep learning (DL) and artificial intelligence (AI) are being carried out to assist physicians in early diagnosis by utilizing the scans obtained through Magnetic Resonance Imaging (MRI). Our research proposes DL frameworks for localizing, segmenting, and classifying the grade of these gliomas from MRI images to solve this critical issue. In our localization framework, we enhance the LinkNet framework with a VGG19- inspired encoder architecture for improved multimodal tumor feature extraction, along with spatial and graph attention mechanisms to refine feature focus and inter-feature relationships. Following this, we integrated the SeResNet101 CNN model as the encoder backbone into the LinkNet framework for tumor segmentation, which achieved an IoU Score of 96%. To classify the segmented tumors, we combined the SeResNet152 feature extractor with an Adaptive Boosting classifier, which yielded an accuracy of 98.53%. Our proposed models demonstrated promising results, with the potential to advance medical AI by enabling early diagnosis and providing more accurate treatment options for patients.
- [398] arXiv:2409.18737 (replaced) [pdf, html, other]
-
Title: MemFusionMap: Working Memory Fusion for Online Vectorized HD Map ConstructionComments: Accepted to WACV 2025Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
High-definition (HD) maps provide environmental information for autonomous driving systems and are essential for safe planning. While existing methods with single-frame input achieve impressive performance for online vectorized HD map construction, they still struggle with complex scenarios and occlusions. We propose MemFusionMap, a novel temporal fusion model with enhanced temporal reasoning capabilities for online HD map construction. Specifically, we contribute a working memory fusion module that improves the model's memory capacity to reason across a history of frames. We also design a novel temporal overlap heatmap to explicitly inform the model about the temporal overlap information and vehicle trajectory in the Bird's Eye View space. By integrating these two designs, MemFusionMap significantly outperforms existing methods while also maintaining a versatile design for scalability. We conduct extensive evaluation on open-source benchmarks and demonstrate a maximum improvement of 5.4% in mAP over state-of-the-art methods. The project page for MemFusionMap is this https URL
- [399] arXiv:2409.19458 (replaced) [pdf, html, other]
-
Title: Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation ApproachComments: 17 pagesSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from $n$ auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. The key challenge of this problem is that not all auxiliary tasks are useful to improve the performance of the target task. Thus, choosing the right subset of auxiliary tasks is crucial. Conventional subset selection methods, such as forward and backward stepwise selection, are unsuitable for LM fine-tuning because they require repeated training on subsets of auxiliary tasks. This paper introduces a new algorithm to estimate model fine-tuning performances without repeated training. Our algorithm first performs multitask training using the data of all the tasks to obtain a meta initialization. Then, we approximate the model fine-tuning loss of a subset using functional values and gradients from the meta initialization. Empirically, we find that this gradient-based approximation holds with remarkable accuracy for twelve transformer-based LMs. Thus, we can now estimate fine-tuning performances on CPUs within a few seconds. Finally, we fine-tune the pretrained base model for once on the selected subset of tasks. We conduct extensive experiments to validate this approach, delivering a speedup of $30\times$ over conventional subset selection while incurring only $1\%$ error of the true fine-tuning performances. In downstream evaluations involving both instruction tuning and chain-of-thought fine-tuning, this loss-based selection approach improves over prior gradient or representation similarity-based methods for subset selection by up to $3.8\%$.
- [400] arXiv:2409.20520 (replaced) [pdf, html, other]
-
Title: Accelerating Non-Maximum Suppression: A Graph Theory PerspectiveSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Non-maximum suppression (NMS) is an indispensable post-processing step in object detection. With the continuous optimization of network models, NMS has become the ``last mile'' to enhance the efficiency of object detection. This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure. Consequently, we propose two optimization methods, namely QSI-NMS and BOE-NMS. The former is a fast recursive divide-and-conquer algorithm with negligible mAP loss, and its extended version (eQSI-NMS) achieves optimal complexity of $\mathcal{O}(n\log n)$. The latter, concentrating on the locality of NMS, achieves an optimization at a constant level without an mAP loss penalty. Moreover, to facilitate rapid evaluation of NMS methods for researchers, we introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods. Taking the YOLOv8-N model on MS COCO 2017 as the benchmark setup, our method QSI-NMS provides $6.2\times$ speed of original NMS on the benchmark, with a $0.1\%$ decrease in mAP. The optimal eQSI-NMS, with only a $0.3\%$ mAP decrease, achieves $10.7\times$ speed. Meanwhile, BOE-NMS exhibits $5.1\times$ speed with no compromise in mAP.
- [401] arXiv:2410.03736 (replaced) [pdf, other]
-
Title: CliMB: An AI-enabled Partner for Clinical Predictive ModelingComments: * Evgeny Saveliev and Tim Schubert contributed equally to this workSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Despite its significant promise and continuous technical advances, real-world applications of artificial intelligence (AI) remain limited. We attribute this to the "domain expert-AI-conundrum": while domain experts, such as clinician scientists, should be able to build predictive models such as risk scores, they face substantial barriers in accessing state-of-the-art (SOTA) tools. While automated machine learning (AutoML) has been proposed as a partner in clinical predictive modeling, many additional requirements need to be fulfilled to make machine learning accessible for clinician scientists.
To address this gap, we introduce CliMB, a no-code AI-enabled partner designed to empower clinician scientists to create predictive models using natural language. CliMB guides clinician scientists through the entire medical data science pipeline, thus empowering them to create predictive models from real-world data in just one conversation. CliMB also creates structured reports and interpretable visuals. In evaluations involving clinician scientists and systematic comparisons against a baseline GPT-4, CliMB consistently demonstrated superior performance in key areas such as planning, error prevention, code execution, and model performance. Moreover, in blinded assessments involving 45 clinicians from diverse specialties and career stages, more than 80% preferred CliMB over GPT-4. Overall, by providing a no-code interface with clear guidance and access to SOTA methods in the fields of data-centric AI, AutoML, and interpretable ML, CliMB empowers clinician scientists to build robust predictive models.
The proof-of-concept version of CliMB is available as open-source software on GitHub: this https URL. - [402] arXiv:2410.04996 (replaced) [pdf, other]
-
Title: Assumption-Lean Post-Integrated Inference with Negative Control OutcomesComments: 22 pages for the main text, 27 pages for the appendix, 8 figures for the main text, 5 figures for the appendixSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP); Machine Learning (stat.ML)
Data integration methods aim to extract low-dimensional embeddings from high-dimensional outcomes to remove unwanted variations, such as batch effects and unmeasured covariates, across heterogeneous datasets. However, multiple hypothesis testing after integration can be biased due to data-dependent processes. We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using negative control outcomes. Leveraging causal interpretations, we derive nonparametric identifiability of the direct effects, which motivates our semiparametric inference method. Our method extends to projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. These estimands remain statistically meaningful under model misspecifications and with error-prone embeddings. We provide bias quantifications and finite-sample linear expansions with uniform concentration bounds. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification, facilitating data-adaptive estimation with machine learning algorithms. Our proposal is evaluated with random forests through simulations and analysis of single-cell CRISPR perturbed datasets with potential unmeasured confounders.
- [403] arXiv:2410.07404 (replaced) [pdf, html, other]
-
Title: Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation ModelsComments: Accepted at the Intrinsically Motivated Open-ended Learning workshop at NeurIPS 2024Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Exploration remains a significant challenge in reinforcement learning, especially in environments where extrinsic rewards are sparse or non-existent. The recent rise of foundation models, such as CLIP, offers an opportunity to leverage pretrained, semantically rich embeddings that encapsulate broad and reusable knowledge. In this work we explore the potential of these foundation models not just to drive exploration, but also to analyze the critical role of the episodic novelty term in enhancing exploration effectiveness of the agent. We also investigate whether providing the intrinsic module with complete state information -- rather than just partial observations -- can improve exploration, despite the difficulties in handling small variations within large state spaces. Our experiments in the MiniGrid domain reveal that intrinsic modules can effectively utilize full state information, significantly increasing sample efficiency while learning an optimal policy. Moreover, we show that the embeddings provided by foundation models are sometimes even better than those constructed by the agent during training, further accelerating the learning process, especially when coupled with the episodic novelty term to enhance exploration.
- [404] arXiv:2410.07838 (replaced) [pdf, html, other]
-
Title: Minority-Focused Text-to-Image Generation via Prompt OptimizationComments: 20 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for producing high-quality generations. To address this, we present a novel framework to counter the high-density-focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that can encourage the emergence of desired properties during inference while preserving semantic contents of user-provided prompts. We subsequently tailor this generic prompt optimizer into a specialized solver that promotes the generation of minority features by incorporating a carefully-crafted likelihood objective. Our comprehensive experiments, conducted across various types of T2I models, demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers.
- [405] arXiv:2410.10384 (replaced) [pdf, html, other]
-
Title: Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to OptimalSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence. To overcome this, we introduce Length scale Balancing (LB) - a novel approach, aggregating multiple base surrogate models with varying length scales. LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation. We formally derive a cumulative regret bound of LB and compare it with the regret of an oracle BO algorithm using the optimal length scale. Denoting the factor by which the regret bound of A-GP-UCB was away from oracle as $g(T)$, we show that LB is only $\log g(T)$ away from oracle regret. We also empirically evaluate our algorithm on synthetic and real-world benchmarks and show it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.
- [406] arXiv:2410.12822 (replaced) [pdf, html, other]
-
Title: AVID: Adapting Video Diffusion Models to World ModelsComments: Project Webpage: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Large-scale generative models have achieved remarkable success in a number of domains. However, for sequential decision-making problems, such as robotics, action-labelled data is often scarce and therefore scaling-up foundation models for decision-making remains a challenge. A potential solution lies in leveraging widely-available unlabelled videos to train world models that simulate the consequences of actions. If the world model is accurate, it can be used to optimize decision-making in downstream tasks. Image-to-video diffusion models are already capable of generating highly realistic synthetic videos. However, these models are not action-conditioned, and the most powerful models are closed-source which means they cannot be finetuned. In this work, we propose to adapt pretrained video diffusion models to action-conditioned world models, without access to the parameters of the pretrained model. Our approach, AVID, trains an adapter on a small domain-specific dataset of action-labelled videos. AVID uses a learned mask to modify the intermediate outputs of the pretrained model and generate accurate action-conditioned videos. We evaluate AVID on video game and real-world robotics data, and show that it outperforms existing baselines for diffusion model adaptation.1 Our results demonstrate that if utilized correctly, pretrained video models have the potential to be powerful tools for embodied AI.
- [407] arXiv:2410.12830 (replaced) [pdf, html, other]
-
Title: Incorporating Metabolic Information into LLMs for Anomaly Detection in Clinical Time-SeriesJournal-ref: NeurIPS 2024 Workshop on Time Series in the Age of Large ModelsSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Anomaly detection in clinical time-series holds significant potential in identifying suspicious patterns in different biological parameters. In this paper, we propose a targeted method that incorporates the clinical domain knowledge into LLMs to improve their ability to detect anomalies. We introduce the Metabolism Pathway-driven Prompting (MPP) method, which integrates the information about metabolic pathways to better capture the structural and temporal changes in biological samples. We applied our method for doping detection in sports, focusing on steroid metabolism, and evaluated using real-world data from athletes. The results show that our method improves anomaly detection performance by leveraging metabolic context, providing a more nuanced and accurate prediction of suspicious samples in athletes' profiles.
- [408] arXiv:2410.15201 (replaced) [pdf, html, other]
-
Title: Learning the Rolling Penny DynamicsSubjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Mathematical Physics (math-ph)
We consider learning the dynamics of a typical nonholonomic system -- the rolling penny. A nonholonomic system is a system subject to nonholonomic constraints. Unlike a holonomic constraints, a nonholonomic constraint does not define a submanifold on the configuration space. Therefore, the inverse problem of finding the constraints has to involve the tangent space. This paper discusses how to learn the dynamics, as well as the constraints for such a system, given the data set of discrete trajectories on the tangent bundle $TQ$.
- [409] arXiv:2410.15851 (replaced) [pdf, html, other]
-
Title: R2I-rPPG: A Robust Region of Interest Selection Method for Remote Photoplethysmography to Extract Heart RateComments: preprintSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
The COVID-19 pandemic has underscored the need for low-cost, scalable approaches to measuring contactless vital signs, either during initial triage at a healthcare facility or virtual telemedicine visits. Remote photoplethysmography (rPPG) can accurately estimate heart rate (HR) when applied to close-up videos of healthy volunteers in well-lit laboratory settings. However, results from such highly optimized laboratory studies may not be readily translated to healthcare settings. One significant barrier to the practical application of rPPG in health care is the accurate localization of the region of interest (ROI). Clinical or telemedicine visits may involve sub-optimal lighting, movement artifacts, variable camera angle, and subject distance. This paper presents an rPPG ROI selection method based on 3D facial landmarks and patient head yaw angle. We then demonstrate the robustness of this ROI selection method when coupled to the Plane-Orthogonal-to-Skin (POS) rPPG method when applied to videos of patients presenting to an Emergency Department for respiratory complaints. Our results demonstrate the effectiveness of our proposed approach in improving the accuracy and robustness of rPPG in a challenging clinical environment.
- [410] arXiv:2410.19193 (replaced) [pdf, html, other]
-
Title: Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social MediaComments: Work still in progress. Accepted as Extended Abstract Poster at LoG Conference 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Disinformation on social media poses both societal and technical challenges, requiring robust detection systems. While previous studies have integrated textual information into propagation networks, they have yet to fully leverage the advancements in Transformer-based language models for high-quality contextual text representations. This work addresses this gap by incorporating Transformer-based textual features into Graph Neural Networks (GNNs) for fake news detection. We demonstrate that contextual text representations enhance GNN performance, achieving 33.8% relative improvement in Macro F1 over models without textual features and 9.3% over static text representations. We further investigate the impact of different feature sources and the effects of noisy data augmentation. We expect our methodology to open avenues for further research, and we made code publicly available.
- [411] arXiv:2410.19843 (replaced) [pdf, html, other]
-
Title: Artificial intelligence for partial differential equations in computational mechanics: A reviewYizheng Wang, Jinshuai Bai, Zhongya Lin, Qimin Wang, Cosmin Anitescu, Jia Sun, Mohammad Sadegh Eshaghi, Yuantong Gu, Xi-Qiao Feng, Xiaoying Zhuang, Timon Rabczuk, Yinghua LiuSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
In recent years, Artificial intelligence (AI) has become ubiquitous, empowering various fields, especially integrating artificial intelligence and traditional science (AI for Science: Artificial intelligence for science), which has attracted widespread attention. In AI for Science, using artificial intelligence algorithms to solve partial differential equations (AI for PDEs: Artificial intelligence for partial differential equations) has become a focal point in computational mechanics. The core of AI for PDEs is the fusion of data and partial differential equations (PDEs), which can solve almost any PDEs. Therefore, this article provides a comprehensive review of the research on AI for PDEs, summarizing the existing algorithms and theories. The article discusses the applications of AI for PDEs in computational mechanics, including solid mechanics, fluid mechanics, and biomechanics. The existing AI for PDEs algorithms include those based on Physics-Informed Neural Networks (PINNs), Deep Energy Methods (DEM), Operator Learning, and Physics-Informed Neural Operator (PINO). AI for PDEs represents a new method of scientific simulation that provides approximate solutions to specific problems using large amounts of data, then fine-tuning according to specific physical laws, avoiding the need to compute from scratch like traditional algorithms. Thus, AI for PDEs is the prototype for future foundation models in computational mechanics, capable of significantly accelerating traditional numerical algorithms.
- [412] arXiv:2411.02998 (replaced) [pdf, html, other]
-
Title: Accelerating Task Generalisation with Multi-Level Hierarchical OptionsComments: 10 pages, under review for ICLR 2025Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Creating reinforcement learning agents that generalise effectively to new tasks is a key challenge in AI research. This paper introduces Fracture Cluster Options (FraCOs), a multi-level hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks. FraCOs identifies patterns in agent behaviour and forms options based on the expected future usefulness of those patterns, enabling rapid adaptation to new tasks. In tabular settings, FraCOs demonstrates effective transfer and improves performance as it grows in hierarchical depth. We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments. Our results show that FraCOs achieves higher in-distribution and out-of-distribution performance than competitors.
- [413] arXiv:2411.04389 (replaced) [pdf, html, other]
-
Title: Approximate FW Algorithm with a novel DMO method over Graph-structured Support SetSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
In this project, we reviewed a paper that deals graph-structured convex optimization (GSCO) problem with the approximate Frank-Wolfe (FW) algorithm. We analyzed and implemented the original algorithm and introduced some extensions based on that. Then we conducted experiments to compare the results and concluded that our backtracking line-search method effectively reduced the number of iterations, while our new DMO method (Top-g+ optimal visiting) did not make satisfying enough improvements.
- [414] arXiv:2411.06601 (replaced) [pdf, html, other]
-
Title: OffLight: An Offline Multi-Agent Reinforcement Learning Framework for Traffic Signal ControlSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Efficient traffic control (TSC) is essential for urban mobility, but traditional systems struggle to handle the complexity of real-world traffic. Multi-agent Reinforcement Learning (MARL) offers adaptive solutions, but online MARL requires extensive interactions with the environment, making it costly and impractical. Offline MARL mitigates these challenges by using historical traffic data for training but faces significant difficulties with heterogeneous behavior policies in real-world datasets, where mixed-quality data complicates learning. We introduce OffLight, a novel offline MARL framework designed to handle heterogeneous behavior policies in TSC datasets. To improve learning efficiency, OffLight incorporates Importance Sampling (IS) to correct for distributional shifts and Return-Based Prioritized Sampling (RBPS) to focus on high-quality experiences. OffLight utilizes a Gaussian Mixture Variational Graph Autoencoder (GMM-VGAE) to capture the diverse distribution of behavior policies from local observations. Extensive experiments across real-world urban traffic scenarios show that OffLight outperforms existing offline RL methods, achieving up to a 7.8% reduction in average travel time and 11.2% decrease in queue length. Ablation studies confirm the effectiveness of OffLight's components in handling heterogeneous data and improving policy performance. These results highlight OffLight's scalability and potential to improve urban traffic management without the risks of online learning.
- [415] arXiv:2411.07180 (replaced) [pdf, other]
-
Title: Counterfactual Generation from Language ModelsComments: A preprintSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery -- e.g., model ablations or manipulation of linear subspaces tied to specific concepts -- to intervene on these models. To understand the impact of interventions precisely, it is useful to examine counterfactuals -- e.g., how a given sentence would have appeared had it been generated by the model following a specific intervention. We highlight that counterfactual reasoning is conceptually distinct from interventions, as articulated in Pearl's causal hierarchy. Based on this observation, we propose a framework for generating true string counterfactuals by reformulating language models as Generalized Structural-equation. Models using the Gumbel-max trick. This allows us to model the joint distribution over original strings and their counterfactuals resulting from the same instantiation of the sampling noise. We develop an algorithm based on hindsight Gumbel sampling that allows us to infer the latent noise variables and generate counterfactuals of observed strings. Our experiments demonstrate that the approach produces meaningful counterfactuals while at the same time showing that commonly used intervention techniques have considerable undesired side effects.
- [416] arXiv:2411.08460 (replaced) [pdf, html, other]
-
Title: Trap-MID: Trapdoor-based Defense against Model Inversion AttacksComments: Accepted by Neural Information Processing Systems (NeurIPS) 2024Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Model Inversion (MI) attacks pose a significant threat to the privacy of Deep Neural Networks by recovering training data distribution from well-trained models. While existing defenses often rely on regularization techniques to reduce information leakage, they remain vulnerable to recent attacks. In this paper, we propose the Trapdoor-based Model Inversion Defense (Trap-MID) to mislead MI attacks. A trapdoor is integrated into the model to predict a specific label when the input is injected with the corresponding trigger. Consequently, this trapdoor information serves as the "shortcut" for MI attacks, leading them to extract trapdoor triggers rather than private data. We provide theoretical insights into the impacts of trapdoor's effectiveness and naturalness on deceiving MI attacks. In addition, empirical experiments demonstrate the state-of-the-art defense performance of Trap-MID against various MI attacks without the requirements for extra data or large computational overhead. Our source code is publicly available at this https URL.
- [417] arXiv:2411.10096 (replaced) [pdf, html, other]
-
Title: Neural Port-Hamiltonian Models for Nonlinear Distributed Control: An Unconstrained Parametrization ApproachComments: The paper has 15 pages, and has been submitted for a possible publication. arXiv admin note: text overlap with arXiv:2403.17785Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The control of large-scale cyber-physical systems requires optimal distributed policies relying solely on limited communication with neighboring agents. However, computing stabilizing controllers for nonlinear systems while optimizing complex costs remains a significant challenge. Neural Networks (NNs), known for their expressivity, can be leveraged to parametrize control policies that yield good performance. However, NNs' sensitivity to small input changes poses a risk of destabilizing the closed-loop system. Many existing approaches enforce constraints on the controllers' parameter space to guarantee closed-loop stability, leading to computationally expensive optimization procedures. To address these problems, we leverage the framework of port-Hamiltonian systems to design continuous-time distributed control policies for nonlinear systems that guarantee closed-loop stability and finite $\mathcal{L}_2$ or incremental $\mathcal{L}_2$ gains, independent of the optimzation parameters of the controllers. This eliminates the need to constrain parameters during optimization, allowing the use of standard techniques such as gradient-based methods. Additionally, we discuss discretization schemes that preserve the dissipation properties of these controllers for implementation on embedded systems. The effectiveness of the proposed distributed controllers is demonstrated through consensus control of non-holonomic mobile robots subject to collision avoidance and averaged voltage regulation with weighted power sharing in DC microgrids.
- [418] arXiv:2411.11681 (replaced) [pdf, html, other]
-
Title: PSPO*: An Effective Process-supervised Policy Optimization for Reasoning AlignmentComments: Our code can be found at this https URLSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Process supervision enhances the performance of large language models in reasoning tasks by providing feedback at each step of chain-of-thought reasoning. However, due to the lack of effective process supervision methods, even advanced large language models are prone to logical errors and redundant reasoning. We claim that the effectiveness of process supervision significantly depends on both the accuracy and the length of reasoning chains. Moreover, we identify that these factors exhibit a nonlinear relationship with the overall reward score of the reasoning process. Inspired by these insights, we propose a novel process supervision paradigm, PSPO*, which systematically outlines the workflow from reward model training to policy optimization, and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop the PSPO-WRS, which considers the number of reasoning steps in determining reward scores and utilizes an adjusted Weibull distribution for nonlinear reward shaping. Experimental results on six mathematical reasoning datasets demonstrate that PSPO-WRS consistently outperforms current mainstream models.
- [419] arXiv:2411.12070 (replaced) [pdf, html, other]
-
Title: Autoassociative Learning of Structural Representations for Modeling and Classification in Medical ImagingComments: 16 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning architectures based on convolutional neural networks tend to rely on continuous, smooth features. While this characteristics provides significant robustness and proves useful in many real-world tasks, it is strikingly incompatible with the physical characteristic of the world, which, at the scale in which humans operate, comprises crisp objects, typically representing well-defined categories. This study proposes a class of neurosymbolic systems that learn by reconstructing the observed images in terms of visual primitives and are thus forced to form high-level, structural explanations of them. When applied to the task of diagnosing abnormalities in histological imaging, the method proved superior to a conventional deep learning architecture in terms of classification accuracy, while being more transparent.
- [420] arXiv:2411.13615 (replaced) [pdf, other]
-
Title: A Deep Learning Approach to Predict the Fall [of Price] of Cryptocurrency Long Before its Actual FallComments: 22 pages, 3 figuresSubjects: Statistical Finance (q-fin.ST); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In modern times, the cryptocurrency market is one of the world's most rapidly rising financial markets. The cryptocurrency market is regarded to be more volatile and illiquid than traditional markets such as equities, foreign exchange, and commodities. The risk of this market creates an uncertain condition among the investors. The purpose of this research is to predict the magnitude of the risk factor of the cryptocurrency market. Risk factor is also called volatility. Our approach will assist people who invest in the cryptocurrency market by overcoming the problems and difficulties they experience. Our approach starts with calculating the risk factor of the cryptocurrency market from the existing parameters. In twenty elements of the cryptocurrency market, the risk factor has been predicted using different machine learning algorithms such as CNN, LSTM, BiLSTM, and GRU. All of the models have been applied to the calculated risk factor parameter. A new model has been developed to predict better than the existing models. Our proposed model gives the highest RMSE value of 1.3229 and the lowest RMSE value of 0.0089. Following our model, it will be easier for investors to trade in complicated and challenging financial assets like bitcoin, Ethereum, dogecoin, etc. Where the other existing models, the highest RMSE was 14.5092, and the lower was 0.02769. So, the proposed model performs much better than models with proper generalization. Using our approach, it will be easier for investors to trade in complicated and challenging financial assets like Bitcoin, Ethereum, and Dogecoin.
- [421] arXiv:2411.14449 (replaced) [pdf, other]
-
Title: Unlearn to Relearn Backdoors: Deferred Backdoor Functionality Attacks on Deep Learning ModelsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Deep learning models are vulnerable to backdoor attacks, where adversaries inject malicious functionality during training that activates on trigger inputs at inference time. Extensive research has focused on developing stealthy backdoor attacks to evade detection and defense mechanisms. However, these approaches still have limitations that leave the door open for detection and mitigation due to their inherent design to cause malicious behavior in the presence of a trigger. To address this limitation, we introduce Deferred Activated Backdoor Functionality (DABF), a new paradigm in backdoor attacks. Unlike conventional attacks, DABF initially conceals its backdoor, producing benign outputs even when triggered. This stealthy behavior allows DABF to bypass multiple detection and defense methods, remaining undetected during initial inspections. The backdoor functionality is strategically activated only after the model undergoes subsequent updates, such as retraining on benign data. DABF attacks exploit the common practice in the life cycle of machine learning models to perform model updates and fine-tuning after initial deployment. To implement DABF attacks, we approach the problem by making the unlearning of the backdoor fragile, allowing it to be easily cancelled and subsequently reactivate the backdoor functionality. To achieve this, we propose a novel two-stage training scheme, called DeferBad. Our extensive experiments across various fine-tuning scenarios, backdoor attack types, datasets, and model architectures demonstrate the effectiveness and stealthiness of DeferBad.
- [422] arXiv:2411.15098 (replaced) [pdf, html, other]
-
Title: OminiControl: Minimal and Universal Control for Diffusion TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models. At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone and process them with its flexible multi-modal attention processors. Unlike existing methods, which rely heavily on additional encoder modules with complex architectures, OminiControl (1) effectively and efficiently incorporates injected image conditions with only ~0.1% additional parameters, and (2) addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions such as edges, depth, and more. Remarkably, these capabilities are achieved by training on images generated by the DiT itself, which is particularly beneficial for subject-driven generation. Extensive evaluations demonstrate that OminiControl outperforms existing UNet-based and DiT-adapted models in both subject-driven and spatially-aligned conditional generation. Additionally, we release our training dataset, Subjects200K, a diverse collection of over 200,000 identity-consistent images, along with an efficient data synthesis pipeline to advance research in subject-consistent generation.