SDAX: Unsupervised Skill Discovery As eXploration

for Learning Agile Locomotion

CoRL 2025

Seungeun Rho*, Kartik Garg*, Morgan Byrd, Sehoon Ha

Georgia Institute of Technology

* co-first authors


Motivation

How can we enable robots to learn agile locomotion skills without manual engineering effort?

     (1) No excessive reward engineering
     (2) No curriculum learning
     (3) No reference trajectories

We leverage unsupervised skill discovery as high-level exploration to discover skills that can solve the task, and apply bi-level optimization to automatically balance exploration against task completion.


Method Overview


We train a skill-conditioned policy π(a|s,z) that optimizes both task completion and skill diversity. The policy is parameterized by neural network weights θ, which we optimize using:
θ* = argmax_θ J_{task+div} = argmax_θ E_{π_θ} [ Σ_{t=0}^{∞} γ^t ( r_t^{task} + λ r_t^{div} ) ]

The key component is the learnable balancing parameter λ, which automatically adjusts the weight between task completion and skill diversity.

Figure: bi-level optimization of π_θ and λ. The task reward provides the gradient signal for training λ, while the sum of both reward sources provides the gradient signal for optimizing π_θ.
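As a minimal sketch of the objective above, the snippet below computes the discounted return of the combined reward r^task + λ r^div for one toy rollout. The reward values, horizon, and discount factor are illustrative assumptions, not numbers from the paper.

```python
import numpy as np

def combined_reward(r_task, r_div, lam):
    """Weighted sum of task and diversity rewards used to train pi_theta.

    r_task : task reward from the environment
    r_div  : intrinsic diversity reward from the skill-discovery module
    lam    : learnable balancing parameter lambda
    """
    return r_task + lam * r_div

# Hypothetical 4-step rollout: discount the combined reward over time.
gamma = 0.99
r_task = np.array([0.0, 0.1, 0.2, 1.0])  # sparse-ish task reward
r_div = np.array([0.5, 0.4, 0.3, 0.2])   # diversity bonus
lam = 0.3

discounts = gamma ** np.arange(len(r_task))
J = np.sum(discounts * combined_reward(r_task, r_div, lam))
```

In practice r^div would come from a skill-discovery objective (e.g., a mutual-information or distance-based intrinsic reward) rather than a fixed array.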

Bi-Level Optimization

While the policy parameters θ optimize the combined objective, λ is trained using only the task reward:
λ* = argmax_λ J_task

Since we cannot directly compute ∇λJtask, we apply the chain rule:
∇_λ J_task = ∇_θ J_task · ∇_λ θ

This can be expanded into our final tractable form:
∇_λ J_task ≈ α ( A_task ∇_{θ'} log π_{θ'}(a|s,z) ) · ( A_div ∇_θ log π_θ(a|s,z) )

This ensures λ increases when diversity helps task performance and decreases when it hinders progress.
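The λ update above can be sketched as a dot product between two advantage-weighted score-function gradients. Everything here is a toy stand-in: the gradients, advantages, and learning rates are randomly generated placeholders, and the mean over a batch is one plausible estimator, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample score-function gradients grad log pi(a|s,z),
# under the pre-update policy (theta) and the post-update policy (theta').
g_theta = rng.normal(size=(128, 16))        # before the inner policy update
g_theta_prime = rng.normal(size=(128, 16))  # after the inner policy update

A_task = rng.normal(size=128)  # task advantages
A_div = rng.normal(size=128)   # diversity advantages
alpha = 1e-3                   # inner-loop policy learning rate
eta = 1e-2                     # lambda learning rate

# grad_lambda J_task ~ alpha * <A_task * grad log pi_theta', A_div * grad log pi_theta>
grad_lam = alpha * np.mean(
    np.sum((A_task[:, None] * g_theta_prime) * (A_div[:, None] * g_theta), axis=1)
)

lam = 0.3
lam = lam + eta * grad_lam  # lambda grows when diversity helps the task
```

The sign of the dot product is what matters: when the diversity gradient points in the same direction as the task gradient, grad_lam is positive and λ increases.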


Results

We evaluate our framework on challenging locomotion tasks requiring distinctive control strategies. Our method successfully learns the necessary motor skills for all tasks, outperforming task-only baselines and demonstrating that incorporating diversity rewards helps in learning agile locomotion skills. Skill discovery functions as a high-level exploration module, enabling agents to systematically probe diverse strategies and rapidly identify the contact patterns and momentum profiles needed for task completion.

Crawl: Sim / Real

Leap: Sim / Real

Jump: Sim / Real

Positive Skill Collapse

As training progresses, a growing number of skill vectors become capable of solving the task: Leap ~97%, Climb ~65%, Crawl ~60%. This suggests that once a viable solution is discovered, different skill vectors converge into similar behaviors. This phenomenon is facilitated by task rewards: when a skill finds a successful solution, learning propagates to other skill-conditioned behaviors through the shared policy network. We term this "positive collapse" of skills, which is beneficial because it mitigates the issue of selecting the right skill.
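One way to quantify positive collapse is to measure the fraction of sampled skill vectors whose rollouts solve the task. The sketch below assumes unit-norm skill vectors and a hypothetical `evaluate_skill` callable; none of these names or choices come from the paper's evaluation code.

```python
import numpy as np

def skill_success_rate(evaluate_skill, n_skills=100, skill_dim=8,
                       threshold=0.9, seed=0):
    """Fraction of sampled skill vectors z whose rollout solves the task.

    evaluate_skill(z) -> success rate in [0, 1] for rollouts conditioned on z.
    All names here are illustrative, not the authors' actual evaluation code.
    """
    rng = np.random.default_rng(seed)
    zs = rng.normal(size=(n_skills, skill_dim))
    zs /= np.linalg.norm(zs, axis=1, keepdims=True)  # unit-norm skills
    successes = [evaluate_skill(z) >= threshold for z in zs]
    return float(np.mean(successes))

# Toy stand-in evaluator: success depends on the first skill coordinate.
rate = skill_success_rate(lambda z: 1.0 if z[0] > -0.25 else 0.0)
```

Tracking this rate over training would reveal the reported trend: an increasing share of the skill space converging onto task-solving behaviors.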

Skills converging to successful behaviors

Diverse Terrain Adaptation

To evaluate the robustness of the crawling policy trained with our skill discovery approach, we deployed it on different terrains including wood and rubber mat surfaces. The robot successfully crawled under obstacles in both settings, demonstrating its ability to generalize across varying surface conditions and terrain properties.

Robust crawling across wood and rubber mat surfaces

Wall-Jump: Learning Super Agile Tasks

We pushed our method to its limits by introducing a new task named wall-jump. It requires the robot to perform a sequence of highly agile motions, including running, jumping, flipping, and landing, in a specific order. Task guidance alone was not sufficient for the agent to perform the wall-jump. By exploring diverse orientations through skill discovery, our method acquired the specific orientation needed to kick off the wall, achieving successful wall-jumps with much higher task returns.

Super Agile Walljump Task

Paper and Code

Latest version: here
Conference on Robot Learning (CoRL) 2025


BibTeX

@article{rho2025unsupervised,
  title={Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion},
  author={Rho, Seungeun and Garg, Kartik and Byrd, Morgan and Ha, Sehoon},
  journal={arXiv preprint arXiv:2508.08982},
  year={2025}
}

Contact

If you have any questions, please feel free to contact Seungeun Rho.