SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference

Shin, Jiho; Yang, Hoeseok; Yi, Youngmin

arXiv:2411.12692 (cs)

[Submitted on 19 Nov 2024]

Abstract: Leveraging sparsity is crucial for optimizing large language model inference. However, modern LLMs employing SiLU as their activation function exhibit minimal activation sparsity. Recent research has proposed replacing SiLU with ReLU to induce significant activation sparsity and has shown that, with fine-tuning, downstream task accuracy does not degrade. However, taking full advantage of this sparsity has required training a predictor to estimate it. In this paper, we introduce SparseInfer, a simple, lightweight, and training-free predictor of activation sparsity for ReLUfied LLMs, in which activation sparsity is predicted by comparing only the sign bits of inputs and weights. To compensate for possible prediction inaccuracy, the predictor's conservativeness can be tuned adaptively, which also serves as a control knob for optimizing LLM inference. The proposed method achieves faster inference than the state of the art, with negligible accuracy loss of within 1%p.
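
The sign-bit idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the exact decision rule, so the majority vote over sign agreements, the function names, and the `margin` parameter (standing in for the conservativeness knob) are all assumptions made for illustration.

```python
# Illustrative sketch only: the voting rule and the `margin` knob are
# assumptions, not the rule used in the paper.

def sign_vote(x, w_row):
    """Count products w_i * x_i that would be positive vs. negative,
    looking only at the signs of the operands."""
    pos = neg = 0
    for xi, wi in zip(x, w_row):
        if xi == 0 or wi == 0:
            continue  # a zero product contributes nothing to the sum
        if (xi < 0) == (wi < 0):
            pos += 1  # same sign: positive product
        else:
            neg += 1  # different sign: negative product
    return pos, neg


def predict_inactive(x, w_row, margin=1.0):
    """Guess that ReLU(dot(w_row, x)) is zero when negative products
    outnumber positive ones by more than `margin` (hypothetical knob;
    a larger margin skips fewer rows, i.e. is more conservative)."""
    pos, neg = sign_vote(x, w_row)
    return neg > margin * pos


def sparse_relu_matvec(W, x, margin=1.0):
    """Compute ReLU(W @ x), skipping rows predicted to be inactive."""
    out = []
    for row in W:
        if predict_inactive(x, row, margin):
            out.append(0.0)  # skipped: no multiply-accumulate performed
        else:
            out.append(max(0.0, sum(wi * xi for wi, xi in zip(row, x))))
    return out


x = [1.0, -2.0, 3.0, -1.0]
W = [
    [-1.0, 1.0, -1.0, 1.0],  # every sign disagrees with x
    [1.0, -1.0, 1.0, -1.0],  # every sign agrees with x
    [5.0, 1.0, -0.1, 0.1],   # mostly disagrees, but one large weight dominates
]
print(sparse_relu_matvec(W, x))              # default margin
print(sparse_relu_matvec(W, x, margin=4.0))  # more conservative
```

In this toy input the third row is wrongly skipped at the default margin, because sign counts ignore magnitudes; raising `margin` makes the predictor compute that row instead. This is the kind of accuracy versus speed trade-off that a conservativeness setting controls.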

Subjects:	Performance (cs.PF)
Cite as:	arXiv:2411.12692 [cs.PF]
	(or arXiv:2411.12692v1 [cs.PF] for this version)
	https://doi.org/10.48550/arXiv.2411.12692

Submission history

From: Jiho Shin [view email]
[v1] Tue, 19 Nov 2024 17:59:12 UTC (7,426 KB)
