LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts

Abstract

Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts.

We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing.

LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations.

We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that a policy trained with our method supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.

Learned Stunt Skills

MiniHop

A small vertical hop reaching approximately 32 cm in height and 50 cm in distance. Demonstrated on real hardware.

LargeHop

A larger vertical hop reaching about 56 cm in height and 80 cm in distance. Demonstrated on real hardware.

ThreePointTurn

A planar two-step pivot turn that requires precise balance adjustments. Demonstrated on real hardware.

Backflip (Simulation)

A full vertical backward flip executed while moving forward. The guideline is generated via trajectory optimization with a simplified two-mass model.

DriftTurn (Simulation)

A planar left drift-turn maneuver of roughly 90 degrees involving controlled rear-wheel slip.

Generalization to Quadrupeds

To demonstrate the generality of our framework, we applied LineRides to a quadruped platform and trained five diverse stunt skills: MultiHop, TurnLeft, Circle, WallJump, and Backflip. The quadruped successfully acquires all five skills using the same guideline-based specification, suggesting that the framework is not limited to bicycle robots alone.

Method Overview

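LineRides enables robots to learn stunt behaviors from a simple user-provided line (spatial guideline) and a sequence of key-orientations. The framework addresses three key challenges (infeasible guidelines, temporal ambiguity, and underspecified motion details) with the following components: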

Margin Parameter: Allows controlled deviation from potentially infeasible guidelines, providing the policy with flexibility to satisfy physical constraints.
Distance-based Progress: Uses the robot's cumulative traveled distance as a progression measure along the guideline, enabling consistent target selection without explicit timing.
Key-orientations: Specify desired base orientations at selected waypoints for fine-grained motion control, supporting both position-based and sequence-based formulations.
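
The first two components above can be illustrated with a minimal sketch. It is not the authors' implementation: the polyline representation of the guideline, the function names (cumulative_lengths, update_traveled, target_at_distance, margin_error), and the linear penalty outside the margin are all assumptions made for illustration. The idea shown is that the tracking target is looked up by the robot's cumulative traveled distance rather than by time, and that deviations smaller than the margin incur no tracking error.

```python
import math

def cumulative_lengths(points):
    """Arc length from the first guideline point to each guideline point."""
    lengths = [0.0]
    for p0, p1 in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.dist(p0, p1))
    return lengths

def update_traveled(traveled, prev_pos, pos):
    """Accumulate the distance the robot base moved during one control step."""
    return traveled + math.dist(prev_pos, pos)

def target_at_distance(points, lengths, traveled):
    """Guideline point at arc length `traveled`, clamped to the guideline ends."""
    s = min(max(traveled, 0.0), lengths[-1])
    for i in range(1, len(points)):
        if s <= lengths[i]:
            seg = lengths[i] - lengths[i - 1]
            t = 0.0 if seg == 0.0 else (s - lengths[i - 1]) / seg
            return tuple(a + t * (b - a) for a, b in zip(points[i - 1], points[i]))
    return points[-1]

def margin_error(position, target, margin):
    """Assumed error shape: zero inside the margin, linear growth outside it."""
    return max(0.0, math.dist(position, target) - margin)

# Hypothetical guideline: 1 m forward, then 1 m straight up (physically infeasible).
guideline = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
lengths = cumulative_lengths(guideline)
traveled = update_traveled(1.0, (1.0, 0.0, 0.0), (1.0, 0.0, 0.5))
target = target_at_distance(guideline, lengths, traveled)
```

Because the target depends only on how far the robot has moved, a slow and a fast execution of the same stunt select the same sequence of targets, which is the sense in which no explicit timing is needed.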

The framework trains a single end-to-end policy that supports both driving mode and stunt mode, enabling seamless transitions between normal operation and stunt execution.
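
The position-based and sequence-based key-orientation formulations listed above are not detailed on this page, so the following sketch reflects one possible reading rather than the authors' definition. It assumes that a position-based key-orientation becomes active when the robot's progress along the guideline is near a chosen arc length, and that a sequence-based key-orientation advances to the next entry once the current one has been reached. A single pitch angle stands in for the full base orientation, and all names and tolerances are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def position_based_target(key_orients, progress, window):
    """key_orients: list of (arc_length, pitch) pairs.

    Returns the pitch of the first key-orientation whose arc length lies
    within `window` of the current progress, or None if none is active.
    """
    for s, pitch in key_orients:
        if abs(progress - s) <= window:
            return pitch
    return None

def sequence_based_step(pitches, index, current_pitch, tol):
    """Advance to the next key-orientation once the current one is reached."""
    if index < len(pitches) and abs(angle_diff(current_pitch, pitches[index])) <= tol:
        index += 1
    return index

# Hypothetical backflip-like sequence: quarter, half, and three-quarter rotation.
flip = [-math.pi / 2, -math.pi, -3 * math.pi / 2]
idx = sequence_based_step(flip, 0, -1.5, 0.2)            # first key reached
idx_hold = sequence_based_step(flip, idx, -1.5, 0.2)     # second key not yet reached
idx_next = sequence_based_step(flip, idx, math.pi - 0.05, 0.2)  # wrapped angle near -pi
```

Under this reading, the sequence-based form suits motions such as a flip, where the order of orientations matters more than where along the guideline each one occurs.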

BibTeX

coming soon...