arXiv:2604.21432v1 Announce Type: new Abstract: In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with actions tend to decrease over time. This decay is caused either by actions executed in the past (e.g., a user may get bored when songs of the same genre are recommended over and over) or by an external factor (e.g., content becomes
A single algorithm for both restless and rested rotting bandits
Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko · arXiv stat.ML
This article was sourced from arXiv stat.ML's RSS feed. Visit the original for the complete story.