GRASP: Making Long-Horizon Planning with World Models Practical
Introduction
Modern world models—learned simulators that predict future observations from actions—have grown remarkably powerful. They can forecast long sequences in high-dimensional visual spaces and generalize across tasks, resembling general-purpose simulators rather than narrow predictors. However, wielding these models for effective planning remains a challenge, especially over long horizons. The optimization often becomes ill-conditioned, gets trapped in local minima, or is undermined by high-dimensional latent spaces. In this article, we explore a new approach called GRASP (Gradient-based Ascent for Robust Sampling and Planning), which redesigns gradient-based planning to make it robust for long-term decision-making.

The Challenge of Long-Horizon Planning
Why Traditional Planning Fails
Standard gradient-based planners optimize a sequence of actions by backpropagating through a world model. This works well for short horizons but breaks down when the planning horizon extends. The gradient signal becomes noisy or vanishes, the loss landscape develops sharp ravines, and the high dimensionality of latent states amplifies these issues. Moreover, greedy local improvements often overlook strategic long-term consequences.
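The gradient decay can be made concrete with a toy model. In the sketch below (the 2-D linear dynamics and contraction factor are illustrative assumptions, not taken from GRASP), a contractive learned model shrinks the loss gradient geometrically as it is backpropagated, so early actions receive a far weaker signal than late ones:

```python
import numpy as np

# Illustrative contractive dynamics standing in for a learned model:
# s_{t+1} = A s_t + B a_t, with the spectral radius of A below 1.
A = 0.9 * np.eye(2)
B = np.array([[1.0], [0.5]])

def action_grad_norms(horizon):
    """Norm of dL/da_t for a terminal loss, computed by backprop."""
    g = np.ones(2)                           # stand-in terminal gradient dL/ds_H
    norms = np.empty(horizon)
    for t in reversed(range(horizon)):
        norms[t] = np.linalg.norm(B.T @ g)   # dL/da_t = B^T g
        g = A.T @ g                          # one backward step: g <- A^T g
    return norms

norms = action_grad_norms(30)
# norms[0] is tiny relative to norms[-1]: early actions get little signal.
```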
The Role of World Models
A world model, as we define it, predicts future states given the current state and action. Formally, it approximates P_θ(s_{t+1} | s_{t−h:t}, a_t): the distribution over the next state, conditioned on a window of h past states and the current action. These models are typically learned from data and can serve as differentiable simulators for planning. Yet even with an accurate model, the planning procedure itself introduces fragility.
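For concreteness, here is a minimal sketch of the standard planning procedure: a toy linear model stands in for the learned P_θ (with no history window, h = 0), and an action sequence is optimized by backpropagating a terminal cost through the rollout. The dynamics, cost, and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Toy differentiable world model standing in for the learned P_theta:
# s_{t+1} = A s_t + B a_t (position and velocity; the action nudges velocity).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
goal = np.array([1.0, 0.0])

def rollout(s0, actions):
    """Roll the model forward from s0; return the visited states."""
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    return states

def plan(s0, horizon=20, steps=300, lr=1.0):
    """Shooting planner: optimize the action sequence by backpropagating
    the terminal cost 0.5 * ||s_H - goal||^2 through the rollout."""
    actions = np.zeros((horizon, 1))
    for _ in range(steps):
        states = rollout(s0, actions)
        grad_s = states[-1] - goal           # d(cost)/d(s_H)
        grads = np.zeros_like(actions)
        for t in reversed(range(horizon)):
            grads[t] = B.T @ grad_s          # d(cost)/d(a_t)
            grad_s = A.T @ grad_s            # propagate one step back
        actions -= lr * grads
    return actions, rollout(s0, actions)[-1]

actions, final_state = plan(np.zeros(2))
```

Even in this well-behaved linear setting, every update needs a full sequential rollout and a full backward sweep; the failure modes described above appear once the model is nonlinear, high-dimensional, and long-horizon.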
GRASP: A Robust Gradient-Based Planner
GRASP tackles these problems with three key innovations that together make gradient-based planning practical for long horizons.
1. Virtual State Lifting
Instead of processing one time step at a time, GRASP lifts the entire trajectory into a set of virtual states—one per future time step—that are optimized in parallel. This parallelization removes the sequential bottleneck and allows the gradient to propagate uniformly across the horizon, avoiding the decay that plagues step-by-step methods.
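One way to realize such a lifting, sketched here under illustrative assumptions (a linear toy model and a collocation-style objective; the exact GRASP formulation may differ), is to treat future states as free variables, penalize their one-step disagreement with the model, and update every time step in parallel:

```python
import numpy as np

# Same illustrative linear model standing in for the learned dynamics.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
goal = np.array([1.0, 0.0])

def lifted_plan(s0, T=8, steps=5000, lr=0.2):
    """Jointly optimize virtual states z_1..z_T and actions a_0..a_{T-1}.
    Each step's gradient touches only neighboring steps, so all time
    steps are updated in parallel with no backprop through time."""
    Z = np.zeros((T, 2))                    # virtual states, one per step
    U = np.zeros((T, 1))                    # actions
    for _ in range(steps):
        prev = np.vstack([s0, Z[:-1]])      # z_0..z_{T-1}, with z_0 = s0 fixed
        R = Z - (prev @ A.T + U @ B.T)      # consistency residuals r_t
        gZ = R.copy()                       # d/dz_{t+1} of 0.5 * ||r_t||^2
        gZ[:-1] -= R[1:] @ A                # d/dz_t of 0.5 * ||r_{t+1}||^2
        gZ[-1] += Z[-1] - goal              # terminal-cost gradient
        gU = -R @ B                         # d/da_t of 0.5 * ||r_t||^2
        Z -= lr * gZ
        U -= lr * gU
    prev = np.vstack([s0, Z[:-1]])
    R = Z - (prev @ A.T + U @ B.T)
    cost = 0.5 * float((R ** 2).sum() + ((Z[-1] - goal) ** 2).sum())
    return Z, U, cost

Z, U, cost = lifted_plan(np.zeros(2))
```

Because each virtual state interacts only with its immediate neighbors, no gradient has to traverse the whole horizon, avoiding the step-by-step decay described above.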
2. Stochastic Exploration in State Space
To escape poor local minima, GRASP injects controlled stochasticity directly into the state iterates during optimization. This is not noise in the actions, but in the predicted states themselves, which helps the planner explore diverse trajectories and avoid premature convergence.
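The mechanism can be illustrated on a 1-D stand-in for a single state iterate (the double-well objective, noise schedule, and seed are illustrative assumptions): plain gradient descent from a poor initialization settles into the shallower basin, while annealed Gaussian perturbations of the iterate let the optimizer explore before committing:

```python
import numpy as np

# Double-well stand-in for a nonconvex planning objective over a state
# iterate z: two basins, and the left one (near z = -1) is lower.
f = lambda z: (z ** 2 - 1.0) ** 2 + 0.3 * z
df = lambda z: 4.0 * z * (z ** 2 - 1.0) + 0.3

def descend(z, steps=2000, lr=0.01, sigma0=0.0, rng=None):
    """Gradient descent on f with optional annealed noise on the iterate.
    Noise is injected into z itself and decays to zero by 80% of the run,
    leaving a noise-free phase so the iterate settles into a basin."""
    for k in range(steps):
        z -= lr * df(z)
        if sigma0 > 0.0:
            sigma = max(0.0, sigma0 * (1.0 - k / (0.8 * steps)))
            z += sigma * rng.standard_normal()
    return z

z_plain = descend(0.9)   # deterministic descent: stays in the right basin
z_noisy = descend(0.9, sigma0=0.5, rng=np.random.default_rng(0))
```

With the noise, the iterate can hop between basins early on and may settle in the lower one; without it, descent from z = 0.9 deterministically ends in the shallower right basin.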

3. Gradient Reshaping for Clean Action Signals
When gradients flow through high-dimensional vision models, they can become brittle—especially the “state-to-action” gradients. GRASP reshapes these gradients to give clean, actionable signals to the action sequence while bypassing the noisy gradients from pixel-level predictions. This separation stabilizes updates and makes optimization more reliable.
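A simple instance of such reshaping is per-timestep gradient normalization: keep each step's descent direction but discard its raw magnitude, so that noisy pixel-level gradients can neither dominate nor vanish. This is an illustrative assumption about the mechanism, not necessarily the exact transform GRASP uses:

```python
import numpy as np

def reshape_action_grads(raw_grads, eps=1e-8):
    """Per-timestep gradient normalization: keep each step's descent
    direction, discard its raw magnitude. Rows are d(cost)/d(a_t)."""
    raw = np.asarray(raw_grads, dtype=float)
    norms = np.linalg.norm(raw, axis=-1, keepdims=True)
    return raw / (norms + eps)

# State-to-action gradients whose scales span nine orders of magnitude,
# as can happen when they pass through a pixel-level predictor.
g = np.array([[1e-6, 0.0],
              [0.0, 1e3],
              [3.0, 4.0]])
shaped = reshape_action_grads(g)
# Every row now has (approximately) unit norm with its direction intact.
```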
Results and Implications
Empirical Validation
In experiments across several continuous control tasks with visual observations, GRASP consistently outperforms baseline planners, especially as horizon lengths increase. It achieves higher success rates, lower cumulative costs, and better sample efficiency in planning.
Broader Impact
The ability to plan reliably over long horizons opens the door to using learned world models as true simulators for reinforcement learning, robotics, and autonomous systems. GRASP makes it feasible to leverage powerful models without the fragility that previously limited their deployment.
Conclusion
GRASP introduces a principled way to make gradient-based planning robust for long horizons. By combining virtual state lifting, stochastic exploration in state space, and gradient reshaping, it overcomes the fundamental challenges of optimizing in high-dimensional latent spaces. As world models continue to scale, techniques like GRASP will be essential to translate predictive power into effective control. For more details, see the full paper: Gradient-based Planning for World Models at Longer Horizons (with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar).