Dog Trot

Learning Agile Robotic Locomotion Skills by Imitating Animals

Action Space

Box(-2 \(\pi\),2 \(\pi\), (12,), float32)

Observation Space

Box(_, _, (160,), float32)

import

gymnasium.make('QuadrupedLocomotion-DogTrot-v0')

Description

This quadruped locomotion environment is based on the work of Xue Bin Peng et al., who showed that by imitating real-world reference motion data, a diverse repertoire of behaviors can be learned with a single learning-based approach. The quadruped is a legged robot with 18 degrees of freedom (DoF): 3 actuated DoF per leg and 6 under-actuated DoF for the torso. The objective is to learn skills by imitating real animals. The observation is a reference motion of the desired skill, captured with motion capture. The policy enables the quadruped to reproduce the skill in the real world using the 12 actuated degrees of freedom of the legs. The controller runs at a control frequency of 30 Hz.

Action Space

The actions specify target rotations for PD controllers at each joint. There are 12 controllable DoFs, three for each leg, so each action entry corresponds to a target rotation for a single DoF. The action space is a Box(-2 \(\pi\), 2 \(\pi\), (12,), float32).

For smooth motions, the PD targets are passed through a low-pass filter before being applied to the robot.
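As an illustration, the snippet below samples a random set of PD targets and applies it for one control step. This is a minimal sketch that assumes the environment id and registration import shown on this page and the standard Gymnasium reset/step API:

import gymnasium as gym
import rl_perf.domains.quadruped_locomotion  # registers the environment ids

env = gym.make('QuadrupedLocomotion-DogTrot-v0', enable_rendering=False)
obs, info = env.reset()
# One action = 12 PD target rotations (3 per leg), in radians.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()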

Observation Space

The observation space of this environment consists of the reference motion of the skills that need to be learned.
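A quick way to see what the policy receives is to reset the environment and inspect the returned vector; this short sketch assumes the same environment id and import as above:

import gymnasium as gym
import rl_perf.domains.quadruped_locomotion

env = gym.make('QuadrupedLocomotion-DogTrot-v0', enable_rendering=False)
obs, info = env.reset()
# The observation is a flat float32 vector of length 160 describing the
# reference motion of the skill being imitated.
print(obs.shape)  # (160,)
env.close()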

Rewards

The reward function is designed to encourage the policy to track a sequence of poses from the reference motion. Using reference motions avoids the need to design skill-specific reward functions, enabling a common framework for learning a diverse array of behaviors. The reward at each step is:

\[r_t = 0.5r^p_t + 0.05 r^v_t + 0.2 r^e_t + 0.15r^{rp}_t + 0.1r^{rv}_t\]

Where the pose reward \(r^p_t\) reflects the difference between the reference joint rotations, \(\hat q^j_t\), and the robot joint rotations, \(q^j_t\):

\[r^p_t = exp(-5 \sum_{j}||\hat q^j_t - q^j_t||^2)\]

The velocity reward \(r^v_t\) reflects the difference between the reference joint velocities, \(\hat{ \dot{q}}^j_t\), and the robot joint velocities, \(\dot{q}^j_t\):

\[r^v_t = exp(-0.1 \sum_{j}||\hat{ \dot{q}}^j_t - \dot{q}^j_t||^2)\]

The end-effector reward \(r^e_t\) encourages the robot to track the reference end-effector positions, where \(x^e_t\) is the relative 3D position of end-effector \(e\) with respect to the root:

\[r^e_t = exp(-40 \sum_{e}||\hat{x}^e_t - x^e_t||^2)\]

Finally, the root pose reward \(r^{rp}_t\) and the root velocity reward \(r^{rv}_t\) encourage the robot to track the reference root motion. \(x^{root}_t\) is the root’s global position and \(\dot x^{root}_t\) is the root’s linear velocity; \(q^{root}_t\) is the root’s global rotation and \(\dot q^{root}_t\) is the root’s angular velocity:

\[\begin{split}r^{rp}_t = exp(-20 ||\hat{x}^{root}_t -x^{root}_t||^2 - 10 || \hat{q}^{root}_t - q^{root}_t || ^2) \\ r^{rv}_t = exp(-20 ||\hat{\dot{x}}^{root}_t -\dot{x}^{root}_t||^2 - 10 || \hat{\dot{q}}^{root}_t - \dot{q}^{root}_t || ^2)\end{split}\]
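For reference, the reward can also be written out in code. The following numpy sketch simply mirrors the formulas above; the function name and its argument names are hypothetical, and the environment computes the actual reward internally:

import numpy as np

def imitation_reward(q, q_ref, dq, dq_ref, x_ee, x_ee_ref,
                     x_root, x_root_ref, q_root, q_root_ref,
                     dx_root, dx_root_ref, dq_root, dq_root_ref):
    # q, dq: robot joint rotations/velocities; *_ref: reference-motion values.
    # x_ee: end-effector positions relative to the root, shape (num_ee, 3).
    r_pose = np.exp(-5.0 * np.sum((q_ref - q) ** 2))
    r_vel = np.exp(-0.1 * np.sum((dq_ref - dq) ** 2))
    r_ee = np.exp(-40.0 * np.sum((x_ee_ref - x_ee) ** 2))
    r_root_pose = np.exp(-20.0 * np.sum((x_root_ref - x_root) ** 2)
                         - 10.0 * np.sum((q_root_ref - q_root) ** 2))
    r_root_vel = np.exp(-20.0 * np.sum((dx_root_ref - dx_root) ** 2)
                        - 10.0 * np.sum((dq_root_ref - dq_root) ** 2))
    # Weighted sum as in the reward equation above.
    return (0.5 * r_pose + 0.05 * r_vel + 0.2 * r_ee
            + 0.15 * r_root_pose + 0.1 * r_root_vel)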

Starting State

When resetting the environment, the robot’s position in the world is reset. The initial state of the reference motion varies with every reset to encourage a robust policy, so the starting state differs between episodes.

Episode End

The episode ends under two conditions: either the task is completed, or the robot reaches an unsafe condition.

Arguments

When creating the quadruped locomotion environment, we can pass several keyword arguments such as enable_rendering, mode, and num_parallel_envs.

import gymnasium as gym
import rl_perf.domains.quadruped_locomotion

env = gym.make('QuadrupedLocomotion-v0', enable_rendering=False, ...)

| Parameter | Type | Default | Description |
|---|---|---|---|
| enable_rendering | bool | False | If True, the environment will be rendered. |
| mode | str | "train" | Can be either "train" or "test"; this sets the default of enable_randomizer, so the dynamics randomizer is automatically disabled in "test" mode. |
| num_parallel_envs | int | None | Number of parallel MPI workers. Most likely, you will not use this parameter. |
| enable_randomizer | bool | None | If True, the dynamics of the robot are randomized. Defaults to True if the mode is "train", else False. Most likely, you will not use this parameter. |
| robot_class | Class | laikago.Laikago | Provide a class rather than an instance. Most likely, you will not use this parameter. |
| trajectory_generator |  | LaikagoPoseOffsetGenerator() | A trajectory generator that can potentially modify the action and observation. Expected to have get_action and get_observation interfaces. Most likely, you will not use this parameter. |
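For example, to evaluate a trained policy without dynamics randomization, the environment can be created in test mode. This is a sketch using only the keyword arguments documented above, and it assumes (per the table) that the randomizer defaults to off in "test" mode:

import gymnasium as gym
import rl_perf.domains.quadruped_locomotion

# "test" mode leaves the dynamics randomizer disabled by default.
env = gym.make('QuadrupedLocomotion-DogTrot-v0',
               enable_rendering=True,
               mode='test')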

Next, the reset method allows several options.

env.reset(seed, options)

The options should be passed as a dict (see the example after the table below). The possible options are:

| Parameter | Type | Default | Description |
|---|---|---|---|
| initial_motor_angles | list | None | A list of floats: the desired joint angles after reset. If None, the robot will use its built-in values. |
| reset_duration | float | 0.0 | The time (in seconds) needed to rotate all motors to the desired initial values. |
| reset_visualization_camera | bool | True | Whether to reset the debug visualization camera on reset. |
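A reset call with explicit options might look as follows; the values shown are simply the documented defaults, and the seed argument follows the standard Gymnasium reset signature:

import gymnasium as gym
import rl_perf.domains.quadruped_locomotion

env = gym.make('QuadrupedLocomotion-DogTrot-v0', enable_rendering=False)
obs, info = env.reset(
    seed=0,
    options={
        'initial_motor_angles': None,        # use the robot's built-in pose
        'reset_duration': 0.0,               # move motors to the pose instantly
        'reset_visualization_camera': True,  # reset the debug camera
    },
)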

Version History

  • v0: Initial version release