Before I wrote a single line of RealDataAgentBench, I spent time doing something most benchmark builders skip: I mapped out what each major model was actually known to be good at and where each one quietly fell apart. The observation that started everything was simple: no single model dominates across all dimensions. Every model has a superpower.

Every model has a blind spot. And no existing bench