Even the best self-driving cars keep getting confused and crashing into stuff, sometimes with fatal results. Here’s a thought: maybe the simulations being used to test them aren’t wacky enough. A team of researchers has unveiled a new benchmark for testing autonomous vehicles that subjects them to all sorts of “unseen” and truly random scenarios — such as an elephant lumbering across a city street.
“Why did the elephant cross the road? To expose how fragile your model is,” Andreas Geiger, head of the Autonomous Vision Group at the University of Tübingen in Germany, and coauthor of a new preprint paper, wrote in a LinkedIn post. In footage provided by the researchers, a simulated AV veers into one of these gentle giants, mowing it down.
In another test, the car stops in front of — and then slams into, for some reason — a playground slide sitting in the middle of the road. The car also gets foiled by a Looney Tunes-style wall painted to look like the road ahead (a ruse that’s confounded real-life self-driving cars). Without the added context, you could mistake these for clips of a hacker trolling players in “GTA Online.” But they serve a serious purpose, according to Geiger.
“There’s a relatively quiet but serious problem in autonomous driving research: most models are trained and evaluated not on the same exact data, but on the same scenarios,” he said. “What looks like strong benchmark performance may just be strong memorization.” Geiger’s new benchmark, dubbed Fail2Drive, is designed to address that. It introduces heaps of so-called out-of-distribution scenarios into an open source simulator for AV research called CARLA, which is widely used in the industry.
Some of the scenarios are as outlandish as crosswalk-abiding elephants, but others aren’t. One depicted in the video shows a firetruck parked in a road, which the car brainlessly smashes into at full speed. (And that aside, AVs have been involved in the deaths of numerous animals.) When Geiger and his team tested autonomous driving models using Fail2Drive, they uncovered a worrying discrepancy. On average, the models’ success rates dropped by 22.8 percent, “highlighting fundamental robustness concerns in current approaches,” he wrote.
Whether or not this proves to be an effective way of preparing AVs for the chaos of driving in the real world, it could at least save the lives of a few wayward elephants.

More on self-driving: Elon Musk Admits He Lied to Tesla Customers’ Faces for Years About Self-Driving
