Soft-TransFormers for Continual Learning

Kang, Haeyong; Yoo, Chang D.

arXiv:2411.16073 (cs)

[Submitted on 25 Nov 2024]


Abstract: Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), which provides suboptimal fine-tuning solutions, we propose a novel fully fine-tuned continual learning (CL) method referred to as Soft-TransFormers (Soft-TF). Soft-TF sequentially learns and selects an optimal soft-network or subnetwork for each task. During sequential training in CL, Soft-TF jointly optimizes the weights of sparse layers to obtain task-adaptive soft (real-valued) networks or subnetworks (binary masks), while keeping the well-pre-trained layer parameters frozen. At inference, the identified task-adaptive network of Soft-TF masks the parameters of the pre-trained network, mapping to an optimal solution for each task and minimizing Catastrophic Forgetting (CF); the soft-masking preserves the knowledge of the pre-trained network. Extensive experiments on Vision Transformer (ViT) and CLIP demonstrate the effectiveness of Soft-TF, which achieves state-of-the-art performance across various CL scenarios, including Class-Incremental Learning (CIL) and Task-Incremental Learning (TIL), and is supported by convergence theory.
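The masking idea in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the single linear layer, the all-ones mask initialisation, the squared-error objective, the top-magnitude binarisation rule, and every name below (`W_pretrained`, `train_task_mask`, `binarize`, `make_task`) are assumptions made purely for illustration. It shows only the general pattern of learning one real-valued mask per task over frozen weights and selecting the matching mask at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one frozen pre-trained layer (random values, illustration only).
W_pretrained = rng.normal(size=(4, 3))
W_snapshot = W_pretrained.copy()
W_pretrained.setflags(write=False)  # guard against accidental in-place updates


def forward(x, mask):
    """Apply the layer with frozen weights modulated element-wise by a mask."""
    return x @ (W_pretrained * mask)


def loss(x, y, mask):
    """Mean over samples of half the squared error."""
    return 0.5 * np.mean(np.sum((forward(x, mask) - y) ** 2, axis=1))


def train_task_mask(x, y, steps=200, lr=0.02):
    """Fit only a soft (real-valued) mask by gradient descent (assumed recipe)."""
    mask = np.ones(W_pretrained.shape)  # start identical to the pre-trained layer
    for _ in range(steps):
        err = forward(x, mask) - y
        grad_effective = x.T @ err / len(x)   # gradient w.r.t. W_pretrained * mask
        mask -= lr * grad_effective * W_pretrained  # chain rule through the product
    return mask


def binarize(mask, keep_ratio):
    """Assumed binary variant: keep the largest-magnitude fraction of entries."""
    k = max(1, int(round(keep_ratio * mask.size)))
    threshold = np.sort(np.abs(mask), axis=None)[-k]
    return (np.abs(mask) >= threshold).astype(mask.dtype)


def make_task(seed):
    """Toy regression task whose targets come from a hidden re-weighting of W."""
    task_rng = np.random.default_rng(seed)
    x = task_rng.normal(size=(32, 4))
    hidden_mask = task_rng.uniform(0.0, 2.0, size=W_pretrained.shape)
    return x, x @ (W_pretrained * hidden_mask)


# Sequential training: each task gets its own mask; earlier masks are never touched.
tasks = {t: make_task(seed=100 + t) for t in range(2)}
task_masks = {t: train_task_mask(x, y) for t, (x, y) in tasks.items()}

# Inference for a known task id selects that task's mask.
x0, y0 = tasks[0]
prediction_task0 = forward(x0, task_masks[0])
```

Because each task's mask is stored separately and the shared weights are never written, training on a later task cannot change the predictions for an earlier one in this toy setting. How Soft-TF handles task identity in the class-incremental case is not described in the abstract and is not modelled here.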

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.16073 [cs.LG]
	(or arXiv:2411.16073v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.16073

Submission history

From: Haeyong Kang
[v1] Mon, 25 Nov 2024 03:52:47 UTC (1,034 KB)
