
Solving CartPole Without Gradients: Simulated Annealing

Berkan Sesen·Dev.to·Thursday, April 23, 2026·1 min read

In the previous post, we solved CartPole using the Cross-Entropy Method: sample 200 candidate policies, keep the best 40, refit a Gaussian, repeat. It worked beautifully, reaching a perfect score of 500 in 50 iterations. But 200 candidates per iteration means 10,000 total episode evaluations.
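The CEM loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the previous post. It assumes a linear threshold policy over the four CartPole observations (our reading of "four good numbers"). It also includes a hand-rolled copy of the classic cart-pole dynamics, so it runs without Gymnasium. The function names, the starting mean and standard deviation, and the small `sigma` floor are all hypothetical choices made for this sketch.

```python
import math
import random
from statistics import mean, pstdev

# Classic cart-pole constants, matching Gymnasium's CartPole-v1 defaults.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = POLE_MASS * HALF_POLE_LENGTH
FORCE_MAG, TAU = 10.0, 0.02            # push strength (N) and time step (s)
X_LIMIT = 2.4                          # cart position bound (m)
THETA_LIMIT = 12 * 2 * math.pi / 360   # pole angle bound (12 degrees, in radians)
MAX_STEPS = 500                        # episode cap, hence a "perfect" score of 500


def run_episode(weights, rng):
    """Play one episode with a linear threshold policy; return the total reward."""
    state = [rng.uniform(-0.05, 0.05) for _ in range(4)]  # x, x_dot, theta, theta_dot
    for step in range(MAX_STEPS):
        # Assumed policy: push right if weights . state > 0, otherwise push left.
        push_right = sum(w * s for w, s in zip(weights, state)) > 0
        force = FORCE_MAG if push_right else -FORCE_MAG
        x, x_dot, theta, theta_dot = state
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            HALF_POLE_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS)
        )
        x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
        # Explicit Euler update, as in Gymnasium's default integrator.
        state = [x + TAU * x_dot, x_dot + TAU * x_acc,
                 theta + TAU * theta_dot, theta_dot + TAU * theta_acc]
        if abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT:
            return step + 1  # +1 reward per step, including the one that failed
    return MAX_STEPS


def cross_entropy_method(n_iter=50, pop_size=200, n_elite=40, seed=0):
    """CEM over the 4 policy weights with a diagonal Gaussian search distribution."""
    rng = random.Random(seed)
    mu, sigma = [0.0] * 4, [1.0] * 4    # assumed starting distribution
    best_score, evaluations = 0, 0
    for _ in range(n_iter):
        # 1. Sample candidate policies from the current Gaussian.
        population = [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
                      for _ in range(pop_size)]
        # 2. Score each candidate on one episode.
        scored = [(run_episode(w, rng), w) for w in population]
        evaluations += pop_size
        # 3. Keep the top n_elite candidates.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elites = [w for _, w in scored[:n_elite]]
        best_score = max(best_score, scored[0][0])
        # 4. Refit mean and std per weight; the small floor keeps sigma from collapsing to 0.
        mu = [mean(col) for col in zip(*elites)]
        sigma = [pstdev(col) + 1e-3 for col in zip(*elites)]
    return mu, best_score, evaluations
```

With the defaults (`n_iter=50`, `pop_size=200`, `n_elite=40`), the loop spends 200 × 50 = 10,000 episodes. That is the budget questioned in the next paragraph. Each candidate is scored on a single episode, so `best_score` is a noisy estimate. The `1e-3` floor on `sigma` is one common guard against the search distribution shrinking too early. It is an assumption of this sketch, not something the post specifies.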

That got me wondering: do we really need a population of 200 to find four good numbers? The original code


This article was sourced from Dev.to's RSS feed. Visit the original for the complete story.
