FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

Fei Zuo, Xiaoyan Xi, Quanyi Zeng, Feiyu Wang, Ho Fai Leung (arXiv:2604.20913v1, cs.LG)

Abstract: Large language models are increasingly deployed on CPU-only platforms, where memory bandwidth is the primary bottleneck for autoregressive generation. Weight quantization to four bits or below reduces memory pressure, yet existing systems still dequantize weights and perform floating-point multiplications, limiting the achievable gains. Ternary weight […]
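The abstract is cut off by the feed, but its setup points at the core idea behind multiplication-free inference: if every weight is constrained to the ternary set {-1, 0, +1}, each product w * x collapses to -x, 0, or +x, so the matrix-vector inner loop needs only additions, subtractions, and skips. The sketch below is a minimal, unoptimized illustration of that principle under stated assumptions (a dense int8 encoding of the trits and a hypothetical `ternary_matvec` function); it is not the paper's fused kernel, which would additionally decode packed sub-byte weights on the fly to stay bandwidth-bound.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative ternary matrix-vector product. Weights take values in
 * {-1, 0, +1}, stored here as one int8_t each for clarity; a real
 * kernel would use a packed 2-bit (or ~1.6-bit) encoding to cut
 * memory traffic. Because the only weight values are -1, 0, and +1,
 * no multiplications are performed: each term is an add, a subtract,
 * or a skip. */
static void ternary_matvec(const int8_t *W, /* rows x cols, values in {-1,0,1} */
                           const float *x,  /* activations, length cols */
                           float *y,        /* output, length rows */
                           size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        const int8_t *w = W + r * cols;
        for (size_t c = 0; c < cols; ++c) {
            if (w[c] > 0)      acc += x[c];  /* weight +1: add      */
            else if (w[c] < 0) acc -= x[c];  /* weight -1: subtract */
            /* weight 0: contributes nothing, skip entirely */
        }
        y[r] = acc;
    }
}

int main(void)
{
    /* Tiny 2x3 example: y = W x with ternary W. */
    const int8_t W[6] = {  1, 0, -1,
                          -1, 1,  1 };
    const float  x[3] = { 0.5f, 2.0f, -1.0f };
    float        y[2];

    ternary_matvec(W, x, y, 2, 3);
    printf("%f %f\n", y[0], y[1]);  /* expect 1.5 and 0.5 */
    return 0;
}
```

In a production CPU kernel, this scalar loop would instead be vectorized with masked SIMD add/subtract operations and fused with the weight-unpacking step, so that each quantized weight is read from memory exactly once; the title's "fused ternary kernels" presumably refers to that combination, though the truncated abstract does not spell out the details.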