arXiv:2604.20937v1 Announce Type: new Abstract: Video Large Language Models (Video LLMs) incur high inference latency due to the large number of visual tokens fed to the LLM. To address this, training-free visual token pruning has emerged as a way to reduce computational cost; however, existing methods are validated primarily on Multiple-Choice Question Answering (MCQA) benchmarks, where coarse…
Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs
Kibum Kim, Jiwan Kim, Kyle Min, Yueqi Wang, Jinyoung Moon, Julian McAuley, Chanyoung Park · arXiv cs.LG
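The truncated abstract describes training-free visual token pruning only at a high level. As a rough illustration of the general idea, the sketch below keeps the visual tokens that receive the most text-to-visual attention and drops the rest before they reach the LLM. This is a minimal sketch under assumed inputs: the function name, the scoring rule (mean cross-attention from text queries), and the keep ratio are illustrative choices, not the paper's sink-token-aware method.

```python
# Minimal sketch of training-free visual token pruning (illustrative only,
# not the paper's sink-token-aware method).
import torch

def prune_visual_tokens(visual_tokens: torch.Tensor,
                        attn_weights: torch.Tensor,
                        keep_ratio: float = 0.25) -> torch.Tensor:
    """Keep the visual tokens that receive the most attention.

    visual_tokens: (num_visual, dim) visual token embeddings.
    attn_weights:  (num_text, num_visual) attention from text queries to
                   visual keys, e.g. averaged over the heads of one layer.
    keep_ratio:    fraction of visual tokens to retain.
    """
    # Score each visual token by the average attention it receives.
    scores = attn_weights.mean(dim=0)                # (num_visual,)
    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    # Keep the top-k tokens, re-sorted to preserve their original order.
    keep_idx = scores.topk(k).indices.sort().values
    return visual_tokens[keep_idx]

# Toy usage: 1,024 visual tokens reduced to 256 before entering the LLM.
vis = torch.randn(1024, 768)
attn = torch.rand(32, 1024)
pruned = prune_visual_tokens(vis, attn, keep_ratio=0.25)
print(pruned.shape)  # torch.Size([256, 768])
```

In a real pipeline, attn_weights would typically be read out of one of the LLM's attention layers rather than supplied separately. Note that plain attention-score pruning like this can be skewed by attention sinks, tokens that attract large attention mass without carrying content, which is the failure mode the paper's title suggests it addresses.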