Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

Michael O'Herlihy, Rosa Català · arXiv cs.AI

arXiv:2604.20972v1 Announce Type: new

Abstract: Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize valid decisions while mischaracterizing ambiguity as error -- a failure mode we term the Agreement Trap.
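The failure mode the abstract describes can be illustrated with a minimal sketch (the toy policy, case names, and labels below are hypothetical, not taken from the paper): when several decisions are each consistent with the governing policy, raw agreement with a single human label counts a valid alternative decision as an error, while a policy-consistency check does not.

```python
# Toy illustration of the Agreement Trap (hypothetical example, not from the paper).
# A "policy" maps each case to the set of decisions that are logically defensible.
policy = {
    "case_1": {"remove"},            # only one defensible decision
    "case_2": {"remove", "warn"},    # genuinely ambiguous: two defensible decisions
}

human_labels = {"case_1": "remove", "case_2": "remove"}
model_labels = {"case_1": "remove", "case_2": "warn"}

# Standard evaluation: fraction of cases where the model matches the human label.
agreement = sum(model_labels[c] == human_labels[c] for c in policy) / len(policy)

# Defensibility-style evaluation: fraction of model decisions consistent with policy.
defensible = sum(model_labels[c] in policy[c] for c in policy) / len(policy)

print(agreement)   # 0.5 -- the valid "warn" on case_2 is scored as an error
print(defensible)  # 1.0 -- both decisions are consistent with the policy
```

The gap between the two scores is the trap: agreement mischaracterizes the ambiguity of `case_2` as model error, even though the model's decision is defensible under the policy.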