
G1-23dof Velocity Task Training Configuration Specification

2026-04-12 · 6 min read


Document version: V1.1
Robot platform: Unitree G1-23dof
Simulation framework: IsaacLab 2.3.0 + IsaacSim 5.1.0
Training algorithm: RSL-RL PPO
Date written: 2026-04-12
Task ID: Unitree-G1-23dof-Velocity

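1. Task Overview

1.1 Task Objective

The Velocity task is the basic locomotion task for the G1-23dof humanoid robot. Its objective is:

Track the commanded linear velocity $(v_x, v_y)$ and yaw rate $\omega_z$ while walking stably on flat or mixed terrain.

This task is the baseline reference for all higher-level tasks (CPG-Flat, Fusion, Following, and others).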

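1.2 Task Characteristics

Property	Description
Task type	Velocity tracking
Terrain	Mixed terrain (50% flat + other textures)
Sensors	Height Scanner (187 rays: a 17 × 11 grid at 0.1 m resolution over 1.6 m × 1.0 m, end points included) + Contact Force
Action space	23-DOF joint position control
Episode length	20 s
Parallel environments	4096 (training) / 32 (inference)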

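2. Environment Configuration

2.1 Simulation Parameters

decimation = 4              # physics steps per policy step; control rate: 200 Hz / 4 = 50 Hz
episode_length_s = 20.0    # episode duration: 20 s
sim_dt = 0.005             # physics step: 5 ms (200 Hz)
render_interval = 4        # render every 4 physics steps (no rendering when headless)
physics_material = RigidBodyMaterialCfg(
    static_friction=1.0,
    dynamic_friction=1.0,
)
gpu_max_rigid_patch_count = 10 * 2**15  # PhysX GPU buffer size for rigid-body contact patches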
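The timing relationships implied by these parameters can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the three values listed above; the derived variable names are illustrative and not part of the configuration.

```python
sim_dt = 0.005            # physics step in seconds (from the config above)
decimation = 4            # physics steps per policy action
episode_length_s = 20.0   # episode duration in seconds

physics_hz = 1.0 / sim_dt                  # physics rate, about 200 Hz
control_dt = decimation * sim_dt           # policy step, about 0.02 s
control_hz = 1.0 / control_dt              # control rate, about 50 Hz
steps_per_episode = round(episode_length_s / control_dt)  # policy steps per episode

print(f"physics {physics_hz:.0f} Hz, control {control_hz:.0f} Hz, "
      f"{steps_per_episode} policy steps per episode")
```

With a 5 ms physics step and a decimation of 4, the policy acts at 50 Hz and a 20 s episode spans 1000 policy steps.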

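2.2 Terrain Configuration

COBBLESTONE_ROAD_CFG = TerrainGeneratorCfg(
    size=(8.0, 8.0),           # size of each sub-terrain tile (m)
    border_width=20.0,         # border width around the terrain grid (m)
    num_rows=9,                # rows (difficulty levels)
    num_cols=21,               # columns (terrain variants)
    horizontal_scale=0.1,      # horizontal resolution: 10 cm
    vertical_scale=0.005,      # vertical resolution: 5 mm
    slope_threshold=0.75,      # slopes steeper than this are converted to vertical surfaces
    difficulty_range=(0.0, 1.0),
    sub_terrains={
        "flat": MeshPlaneTerrainCfg(proportion=0.5),
    },
)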

2.3 Scene Sensor Configuration

Height Scanner

height_scanner = RayCasterCfg(
    prim_path="{ENV_REGEX_NS}/Robot/torso_link",
    offset=OffsetCfg(pos=(0.0, 0.0, 20.0)),  # offset: rays are emitted from 20 m above the attachment link
    ray_alignment="yaw",                       # rays follow the yaw angle only
    pattern_cfg=patterns.GridPatternCfg(
        resolution=0.1,   # grid resolution: 10 cm
        size=[1.6, 1.0], # scan area: 1.6 m × 1.0 m
    ),
    mesh_prim_paths=["/World/ground"],
    update_period=decimation * sim_dt,  # 4 × 0.005 s = 0.02 s, i.e. 50 Hz
)
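The number of height samples follows from the grid size and resolution. The sketch below assumes the grid includes both end points along each axis, which is how IsaacLab's grid pattern is understood to behave; `grid_ray_count` is a hypothetical helper written for this illustration, not an IsaacLab function.

```python
def grid_ray_count(size, resolution):
    """Count rays in a rectangular grid whose end points are both included."""
    nx = round(size[0] / resolution) + 1  # samples along x
    ny = round(size[1] / resolution) + 1  # samples along y
    return nx, ny, nx * ny

nx, ny, total = grid_ray_count(size=(1.6, 1.0), resolution=0.1)
print(f"{nx} x {ny} = {total} rays")
```

Under that assumption the scanner produces a 17 × 11 grid, so a height-scan observation built from it would have 187 entries.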

Contact Force Sensor

contact_forces = ContactSensorCfg(
    prim_path="{ENV_REGEX_NS}/Robot/.*",  # all rigid bodies of the robot
    history_length=3,                     # history length (used for contact timing computations)
    track_air_time=True,                   # track air time
    update_period=sim_dt,                  # updated every physics step (200 Hz)
)

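3. Observation Space

3.1 Policy Observations (Actor)

Term	Dim	Scale	Noise	Description
`base_ang_vel`	3D	0.2	±0.2	Base angular velocity
`projected_gravity`	3D	1.0	±0.05	Projected gravity vector
`velocity_commands`	3D	1.0	None	Commanded velocity $(v_x, v_y, \omega_z)$
`joint_pos_rel`	23D	1.0	±0.01	Joint positions relative to the default pose
`joint_vel_rel`	23D	0.05	±1.5	Joint velocities
`last_action`	23D	1.0	None	Previous action

Total: 78D per frame, with a 5-frame history window (5 × 78 = 390D if the history is flattened)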

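3.2 Critic Observations (Privileged)

Term	Dim	Scale	Noise	Description
`base_lin_vel`	3D	1.0	None	Base linear velocity (privileged information)
`base_ang_vel`	3D	0.2	None	Base angular velocity
`projected_gravity`	3D	1.0	None	Projected gravity vector
`velocity_commands`	3D	1.0	None	Commanded velocity
`joint_pos_rel`	23D	1.0	None	Joint positions
`joint_vel_rel`	23D	0.05	None	Joint velocities
`last_action`	23D	1.0	None	Previous action

Total: 81D (3 + 3 + 3 + 3 + 23 + 23 + 23; no history window)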

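3.3 Observation Configuration Parameters

def __post_init__(self):
    self.history_length = 5        # keep 5 frames of history
    self.enable_corruption = True  # enable observation corruption (noise) for robustness
    self.concatenate_terms = True  # concatenate all terms into one flat vector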
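The observation sizes can be tallied directly from the term tables in Sections 3.1 and 3.2. This sketch assumes the 5-frame history applies only to the policy group and is flattened into one vector; the dictionary and variable names are illustrative.

```python
# Per-term dimensions taken from the policy observation table.
POLICY_TERMS = {
    "base_ang_vel": 3,
    "projected_gravity": 3,
    "velocity_commands": 3,
    "joint_pos_rel": 23,
    "joint_vel_rel": 23,
    "last_action": 23,
}
# The critic sees the same terms plus the privileged base linear velocity.
CRITIC_TERMS = {"base_lin_vel": 3, **POLICY_TERMS}
HISTORY_LENGTH = 5  # assumed to apply to the policy group only

policy_frame_dim = sum(POLICY_TERMS.values())
policy_stacked_dim = policy_frame_dim * HISTORY_LENGTH
critic_dim = sum(CRITIC_TERMS.values())

print(policy_frame_dim, policy_stacked_dim, critic_dim)
```

The tally gives 78 values per policy frame, 390 once five frames are stacked, and 81 for the critic.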

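4. Action Space

4.1 Action Configuration

JointPositionAction = mdp.JointPositionActionCfg(
    asset_name="robot",
    joint_names=[".*"],      # control all joints
    scale=0.25,              # action scale: output × 0.25 = joint offset (rad)
    use_default_offset=True, # use the robot's default pose as the offset reference
)

4.2 Action Notes

Type: joint position control
Scale: a policy output in [-1, 1] maps to an offset in [-0.25, +0.25] rad around the default pose
Controlled joints: all 23 joints (legs 12 + waist 1 + arms 10)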
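The mapping from policy output to joint target can be written out explicitly. This is a minimal sketch of the scale-and-offset rule described above; `joint_targets` and the three example default angles are hypothetical and stand in for the robot's 23-joint default pose.

```python
def joint_targets(actions, default_pos, scale=0.25):
    """Map raw policy outputs to joint position targets: default + scale * action."""
    return [q0 + scale * a for a, q0 in zip(actions, default_pos)]

# Hypothetical default angles (rad) for three joints, for illustration only.
default_pos = [0.0, -0.1, 0.3]
targets = joint_targets([1.0, -1.0, 0.0], default_pos)
print(targets)
```

A zero action therefore holds the default pose, and an action of ±1 moves a joint 0.25 rad away from it.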

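5. Reward Function Design

5.1 Reward Groups

The reward terms fall into 5 groups: Task, Base Motion, Joint Motion, Posture, and Feet.

5.1.1 Task Group (Core Task Rewards)

Term	Weight	Function	Parameters	Physical meaning
`track_lin_vel_xy`	1.0	`track_lin_vel_xy_yaw_frame_exp`	`std=0.5`	Tracks the XY-plane linear velocity with an exponential kernel on the error
`track_ang_vel_z`	0.5	`track_ang_vel_z_exp`	`std=0.5`	Tracks the yaw rate
`alive`	0.15	`is_alive`	—	Per-step survival reward; keeps the policy from learning to fall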

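5.1.2 Base Motion Group (Base Motion Penalties)

Term	Weight	Function	Physical meaning
`base_linear_velocity`	-2.0	`lin_vel_z_l2`	Penalizes vertical base velocity (prevents hopping and bouncing)
`base_angular_velocity`	-0.05	`ang_vel_xy_l2`	Penalizes roll and pitch angular velocity (prevents violent tumbling)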

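5.1.3 Joint Motion Group (Joint Motion Penalties)

Term	Weight	Function	Physical meaning
`joint_vel`	-0.001	`joint_vel_l2`	Penalizes large joint velocities (saves energy)
`joint_acc`	-2.5e-7	`joint_acc_l2`	Penalizes joint acceleration (smoother motion)
`action_rate`	-0.05	`action_rate_l2`	Penalizes the action change rate (prevents joint jitter)
`dof_pos_limits`	-5.0	`joint_pos_limits`	Penalizes joints that exceed their position limits
`energy`	-2e-5	`energy`	Penalizes motor power, $|\mathrm{qvel}| \times |\mathrm{qfrc}|$ (joint speed times joint torque magnitude)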

5.1.4 Posture Group (Posture Constraints)

Term	Weight	Function	Target joints	Physical meaning
`joint_deviation_arms`	-0.1	`joint_deviation_l1`	Arm joints (shoulder/elbow/wrist)	Keeps the arms hanging naturally
`joint_deviation_waists`	-1.0	`joint_deviation_l1`	waist_yaw	Keeps the waist from swinging violently
`joint_deviation_legs`	-1.0	`joint_deviation_l1`	hip_roll, hip_yaw	Limits excessive lateral hip deviation
`flat_orientation_l2`	-5.0	`flat_orientation_l2`	—	Penalizes base tilt (keeps the base level)
`base_height`	-10.0	`base_height_l2`	`target=0.78m`	Penalizes height deviation from 0.78 m

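5.1.5 Feet Group (Foot and Gait Rewards)

Term	Weight	Function	Parameters	Physical meaning
`gait`	0.5	`feet_gait`	`period=0.8s, offset=[0.0, 0.5], threshold=0.55`	Rewards a periodic, alternating left-right gait
`feet_slide`	-0.2	`feet_slide`	—	Penalizes feet sliding on the ground
`feet_clearance`	1.0	`foot_clearance_reward`	`target=0.1m, std=0.05`	Rewards a moderate foot-lift height
`undesired_contacts`	-1.0	`undesired_contacts`	`threshold=1`	Penalizes ground contact by any body other than the ankles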

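5.2 Reward Term Details

`track_lin_vel_xy_yaw_frame_exp`

# Meaning: track the XY linear velocity in the yaw-aligned frame
# Exponential kernel: exp(-error² / σ²); IsaacLab divides by std² with no factor of 2
reward = exp(-||v_cmd - v_current||² / 0.5²)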
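The shape of this kernel is easier to judge with numbers. The sketch below assumes the IsaacLab-style form, where the squared error is divided by `std**2`; it operates on plain Python tuples rather than batched tensors, and the function name is illustrative.

```python
import math

def track_lin_vel_xy_exp(cmd_xy, vel_xy, std=0.5):
    """Exponential tracking reward on the squared XY velocity error."""
    err_sq = (cmd_xy[0] - vel_xy[0]) ** 2 + (cmd_xy[1] - vel_xy[1]) ** 2
    return math.exp(-err_sq / std**2)

perfect = track_lin_vel_xy_exp((1.0, 0.0), (1.0, 0.0))
half_speed = track_lin_vel_xy_exp((1.0, 0.0), (0.5, 0.0))
print(f"perfect tracking: {perfect:.3f}, 0.5 m/s error: {half_speed:.3f}")
```

With `std=0.5`, an error of 0.5 m/s already reduces the term to exp(-1), roughly 37% of its maximum, so the reward is fairly sharp around the commanded velocity.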

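`feet_gait`

# Meaning: periodic alternating-gait reward
# period=0.8s: desired gait period of 0.8 s
# offset=[0.0, 0.5]: the left and right feet are 50% out of phase (alternating steps)
# threshold=0.55: phase threshold for the contact check; a foot is expected on the ground while its phase is below 0.55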
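One way to read these parameters is as a phase clock per foot. The sketch below is an assumption about how such a reward can be built, not the repository's code: each foot gets a phase in [0, 1), stance is expected while the phase is below `threshold`, and the reward counts feet whose measured contact matches the expectation.

```python
def feet_gait_reward(t, contacts, period=0.8, offsets=(0.0, 0.5), threshold=0.55):
    """Count feet whose contact state matches the expected stance/swing phase."""
    global_phase = (t % period) / period
    reward = 0
    for offset, in_contact in zip(offsets, contacts):
        leg_phase = (global_phase + offset) % 1.0
        expect_stance = leg_phase < threshold   # stance for the first 55% of the cycle
        reward += int(expect_stance == in_contact)
    return reward

# At t = 0.24 s the left foot is expected in stance and the right foot in swing.
print(feet_gait_reward(0.24, contacts=(True, False)))
print(feet_gait_reward(0.24, contacts=(False, True)))
```

Because the threshold is slightly above 0.5, the two stance windows overlap briefly, which leaves room for a short double-support phase in each cycle.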

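`foot_clearance_reward`

# Meaning: foot-lift height reward
# target=0.1m: preferred foot clearance of 10 cm
# tanh_mult=2.0: multiplier on the hyperbolic-tangent term; strengthens the gradient in the high-value region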

5.3 Reward Weight Summary

Group	Positive total	Negative total
Task	+1.65	—
Base Motion	—	-2.05
Joint Motion	—	≈-5.05
Posture	—	-17.1
Feet	+1.5	-1.2
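The group totals can be recomputed from the per-term weights in Section 5.1. This is a small bookkeeping sketch; the weights are copied from the tables above and `split_totals` is a hypothetical helper.

```python
REWARD_WEIGHTS = {
    "Task": {"track_lin_vel_xy": 1.0, "track_ang_vel_z": 0.5, "alive": 0.15},
    "Base Motion": {"base_linear_velocity": -2.0, "base_angular_velocity": -0.05},
    "Joint Motion": {"joint_vel": -0.001, "joint_acc": -2.5e-7, "action_rate": -0.05,
                     "dof_pos_limits": -5.0, "energy": -2e-5},
    "Posture": {"joint_deviation_arms": -0.1, "joint_deviation_waists": -1.0,
                "joint_deviation_legs": -1.0, "flat_orientation_l2": -5.0,
                "base_height": -10.0},
    "Feet": {"gait": 0.5, "feet_slide": -0.2, "feet_clearance": 1.0,
             "undesired_contacts": -1.0},
}

def split_totals(weights):
    """Return (sum of positive weights, sum of negative weights) for one group."""
    positive = sum(w for w in weights.values() if w > 0)
    negative = sum(w for w in weights.values() if w < 0)
    return positive, negative

totals = {group: split_totals(terms) for group, terms in REWARD_WEIGHTS.items()}
for group, (pos, neg) in totals.items():
    print(f"{group:12s} +{pos:.2f} {neg:.2f}")
```

Note that these are sums of weights only; the actual contribution of each term also depends on the magnitude of the underlying signal.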

6. Hyperparameter Configuration

6.1 PPO Algorithm Hyperparameters

BasePPORunnerCfg(
    #  rollout
    num_steps_per_env=24,        # collect 24 steps per environment per iteration
    max_iterations=50000,        # at most 50000 iterations

    #  model
    policy=RslRlPpoActorCriticCfg(
        init_noise_std=1.0,       # initial action noise standard deviation
        actor_hidden_dims=[512, 256, 128],  # Actor MLP: policy obs → 512 → 256 → 128 → 23 actions
        critic_hidden_dims=[512, 256, 128], # Critic MLP: critic obs → 512 → 256 → 128 → 1 value
        activation="elu",        # ELU activation
    ),

    #  algorithm
    algorithm=RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,      # value loss coefficient
        use_clipped_value_loss=True,
        clip_param=0.2,           # PPO clipping parameter ε
        entropy_coef=0.01,        # entropy regularization coefficient (encourages exploration)
        num_learning_epochs=5,    # 5 PPO epochs per update
        num_mini_batches=4,       # 4 mini-batches per epoch
        learning_rate=1e-3,       # learning rate 1×10⁻³
        schedule="adaptive",      # adaptive learning-rate schedule
        gamma=0.99,               # discount factor
        lam=0.95,                # GAE lambda
        desired_kl=0.01,         # target KL divergence (the adaptive schedule lowers the learning rate when KL runs well above it and raises it when KL runs well below)
        max_grad_norm=1.0,       # gradient clipping threshold
    ),
)
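The rollout and update sizes implied by this configuration follow from simple multiplication. The sketch below combines the values above with the 4096 training environments from Section 1.2; the derived names are illustrative.

```python
num_envs = 4096              # training environments (Section 1.2)
num_steps_per_env = 24
num_learning_epochs = 5
num_mini_batches = 4
max_iterations = 50000

# Transitions gathered in one rollout across all environments.
transitions_per_iteration = num_envs * num_steps_per_env
# Samples in each mini-batch when the rollout is split evenly.
mini_batch_size = transitions_per_iteration // num_mini_batches
# Optimizer steps taken on each rollout.
gradient_steps_per_iteration = num_learning_epochs * num_mini_batches
# Upper bound on transitions over the whole run.
total_transitions = transitions_per_iteration * max_iterations

print(transitions_per_iteration, mini_batch_size,
      gradient_steps_per_iteration, total_transitions)
```

Each iteration therefore trains on 98,304 transitions in mini-batches of 24,576, taking 20 gradient steps, and a full 50,000-iteration run consumes about 4.9 billion transitions.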

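6.2 Hyperparameter Effects

Hyperparameter	Default	Effect of increasing	Effect of decreasing
`num_steps_per_env`	24	Lower variance, longer wait between updates	Higher variance, shorter wait between updates
`clip_param`	0.2	Larger policy updates allowed, possibly unstable	More conservative policy updates, slower learning
`entropy_coef`	0.01	More exploration, may lower final performance	Less exploration, may converge faster
`learning_rate`	1e-3	Faster convergence, possibly unstable	More stable, slower convergence
`init_noise_std`	1.0	More initial exploration	Less initial exploration
`num_mini_batches`	4	Smaller mini-batches: noisier gradients, lower memory per update	Larger mini-batches: steadier gradients, higher memory per update
`desired_kl`	0.01	More aggressive policy updates	More conservative updates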
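The interaction between `desired_kl` and the `adaptive` schedule can be illustrated with a small rule. This is a sketch under stated assumptions: the 2× and 0.5× KL bands, the factor 1.5, and the bounds 1e-5 and 1e-2 are modeled on commonly seen RSL-RL settings and should be checked against the installed version; `adapt_learning_rate` is a hypothetical name.

```python
def adapt_learning_rate(lr, kl, desired_kl=0.01, factor=1.5,
                        lr_min=1e-5, lr_max=1e-2):
    """Shrink the learning rate when KL is too high, grow it when KL is too low."""
    if kl > 2.0 * desired_kl:
        return max(lr_min, lr / factor)   # policy moved too far: slow down
    if 0.0 < kl < 0.5 * desired_kl:
        return min(lr_max, lr * factor)   # policy barely moved: speed up
    return lr                             # inside the band: keep the rate

lr = 1e-3
print(adapt_learning_rate(lr, kl=0.03))   # above the band
print(adapt_learning_rate(lr, kl=0.002))  # below the band
print(adapt_learning_rate(lr, kl=0.01))   # inside the band
```

Under this rule a larger `desired_kl` widens the band upward, so the learning rate stays high for longer and updates become more aggressive.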

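7. Curriculum

7.1 Terrain Curriculum

terrain_levels = CurrTerm(func=mdp.terrain_levels_vel)

The terrain difficulty adjusts automatically based on how well the robot performs at its current level: a high success rate promotes it to a harder level, and frequent failures demote it.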
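A promotion and demotion rule of this kind can be sketched as follows. The sketch assumes a distance-based success proxy similar to the one in stock IsaacLab velocity tasks (promote after walking more than half a tile, demote after covering less than half the commanded distance); the repository's `terrain_levels_vel` may differ, and the clamping at the top level is a simplification.

```python
def update_terrain_level(level, distance_walked, commanded_speed,
                         terrain_size=8.0, episode_length_s=20.0, max_level=8):
    """Return the next terrain level for one environment at episode end."""
    move_up = distance_walked > terrain_size / 2
    move_down = (not move_up) and (
        distance_walked < commanded_speed * episode_length_s * 0.5
    )
    level += int(move_up) - int(move_down)
    return min(max(level, 0), max_level)   # 9 rows give levels 0..8

print(update_terrain_level(3, distance_walked=5.0, commanded_speed=0.5))  # promoted
print(update_terrain_level(3, distance_walked=1.0, commanded_speed=0.5))  # demoted
```

The `terrain_size` and `max_level` defaults correspond to the 8 m tiles and 9 rows in Section 2.2.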

7.2 Velocity Command Curriculum

lin_vel_cmd_levels = CurrTerm(mdp.lin_vel_cmd_levels)

The commanded velocity range widens as training progresses, from low speed (-0.1, 0.1) up to the full range (-0.5, 1.0).
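The widening of the command range can be pictured as repeated clamped expansion. In this sketch the step of 0.1 m/s per expansion is an assumption, as is the idea that each expansion is triggered once tracking is good enough; only the start and end ranges come from the text above.

```python
def expand_range(current, limit, step=0.1):
    """Widen (low, high) by `step` on each side without exceeding `limit`."""
    low, high = current
    new_low = round(max(limit[0], low - step), 3)    # rounding avoids float drift
    new_high = round(min(limit[1], high + step), 3)
    return (new_low, new_high)

command_range = (-0.1, 0.1)     # initial low-speed range
full_range = (-0.5, 1.0)        # final range
expansions = 0
while command_range != full_range:
    command_range = expand_range(command_range, full_range)
    expansions += 1

print(command_range, expansions)
```

Because the range is asymmetric, the backward limit is reached after 4 expansions while the forward limit needs 9.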

8. Termination Conditions

Term	Function	Parameters	Trigger
`time_out`	`mdp.time_out`	—	Episode reaches 20 s
`base_height`	`root_height_below_minimum`	`minimum_height=0.2m`	Base height < 0.2 m (fall)
`bad_orientation`	`bad_orientation`	`limit_angle=0.8rad ≈ 46°`	Base tilt > 46° (severe tilt)
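The three checks can be combined in a few lines. This sketch assumes the tilt angle is recovered from the z component of the unit gravity vector expressed in the base frame (-1 when upright); `check_terminations` is a hypothetical helper, not the environment's own code.

```python
import math

def check_terminations(elapsed_s, base_height, gravity_z_b,
                       episode_length_s=20.0, minimum_height=0.2, limit_angle=0.8):
    """Return the names of all termination terms triggered by the given state."""
    # Tilt between the base z-axis and world up; clamp guards against rounding.
    tilt = math.acos(max(-1.0, min(1.0, -gravity_z_b)))
    triggered = []
    if elapsed_s >= episode_length_s:
        triggered.append("time_out")
    if base_height < minimum_height:
        triggered.append("base_height")
    if tilt > limit_angle:
        triggered.append("bad_orientation")
    return triggered

print(check_terminations(1.0, 0.78, -1.0))                 # upright, nothing fires
print(check_terminations(1.0, 0.78, -math.cos(0.9)))       # tilted by 0.9 rad
print(f"limit in degrees: {math.degrees(0.8):.1f}")
```

The degree conversion confirms that 0.8 rad is about 45.8°, which the table rounds to 46°.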

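9. Event Randomization (Domain Randomization)

9.1 Startup Events

Event	Function	Parameters	Description
Physics material	`randomize_rigid_body_material`	`friction: 0.3~1.0`	Friction coefficient randomization
Base mass	`randomize_rigid_body_mass`	`+(-1.0~+3.0) kg`	Base mass perturbation (additive offset)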

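9.2 Reset Events

Event	Function	Parameters	Description
Base pose	`reset_root_state_uniform`	`x,y: ±0.5m, yaw: ±π`	Random initial position and heading
Base velocity	—	All zero	Zero initial velocity
Joint position	`reset_joints_by_scale`	`1.0~1.0` (default pose)	Joints reset to the default pose
Joint velocity	`reset_joints_by_scale`	`-1.0~1.0`	Random initial joint velocity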

9.3 Interval Events

Event	Function	Parameters	Description
`push_robot`	`push_by_setting_velocity`	`interval=5s, v_xy: ±0.5 m/s`	Every 5 s, applies a random horizontal push by setting the base velocity
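The ranges in Sections 9.1 to 9.3 can be sampled with a seeded generator to see what one randomized environment might look like. This is an illustrative sketch only; it assumes uniform sampling for every range, and `sample_randomization` and its dictionary keys are hypothetical names.

```python
import math
import random

def sample_randomization(rng):
    """Draw one set of randomized parameters from the ranges listed above."""
    return {
        "friction": rng.uniform(0.3, 1.0),                    # startup
        "base_mass_offset_kg": rng.uniform(-1.0, 3.0),        # startup, additive
        "reset_xy_m": (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)),
        "reset_yaw_rad": rng.uniform(-math.pi, math.pi),
        "push_vel_xy_mps": (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)),
    }

rng = random.Random(0)   # fixed seed so the draw is reproducible
sample = sample_randomization(rng)
for name, value in sample.items():
    print(name, value)
```

In the simulator these draws happen per environment, so 4096 parallel environments see 4096 different combinations.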

10. Training Commands

# Full training run
source ~/miniconda3/etc/profile.d/conda.sh && \
conda activate unitree_sim_env && \
source /home/robot/0_lerobot/IsaacLab/_isaac_sim/setup_conda_env.sh && \
python scripts/rsl_rl/train.py \
    --task Unitree-G1-23dof-Velocity \
    --num_envs 4096 \
    --max_iterations 50000 \
    --headless

# Inference / validation
./unitree_rl_lab.sh -p --task Unitree-G1-23dof-Velocity

11. Appendix: Cross-Task Comparison

11.1 Observation Space: Velocity vs CPG-Flat vs Fusion

Task	Policy obs dim	Critic obs dim	Vision sensors
Velocity	78D (5-frame history)	81D (privileged)	None
CPG-Flat	82D (+4D CPG phase)	78D	None
Fusion V0	6030D (Depth+LiDAR)	241D (Height)	Depth(48×64) + LiDAR(8×360)

11.2 Reward Term Counts

Task	Total reward terms	Task group	Reg group	Safety group
Velocity	17	3	9	2
CPG-Flat	19 (+2 CPG-specific)	3	11	2
Fusion V11	27	22	—	5

11.3 Action Space Comparison

Task	Action interface	Scale	Special design
Velocity	JointPosition	0.25	None
CPG-Flat	CPG-Residual	0.25	CPG baseline + residual
Fusion	JointPosition	0.25	None

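Version History

Version	Date	Changes	Author
V1.1	2026-04-12	Added Section 11: Velocity/CPG/Fusion cross-task comparison	AI Assistant
V1.0	2026-04-12	Initial version: complete Velocity configuration specification	AI Assistant

This document was compiled with AI assistance from the source code of the unitree_lab_locomotion repository.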
