
How to Plan Whole-Body Motion in ~1 ms

Autolife (24 DoF) performing a pick-and-place under whole-body planning (5× preview). Full documentation and benchmarks on the Autolife-Planning docs.

A mobile manipulator is expected to cross a room, squeeze through a doorway, reach over a countertop, and place a cup on a shelf, all as a single coordinated motion. The planner that feeds this behavior has to reason about the base, the torso, and the arm simultaneously. Any navigate-then-reach decomposition gives up reachable workspace, and whole-body data is what modern learned policies (VLAs, diffusion policies, world models) need to train on at scale.

The problem is structurally hard for two reasons. A modest platform like Fetch already lives in \(\mathbb{R}^{11}\); a bimanual platform like Autolife reaches 24 DoF with head and torso, and in that regime sampling-based planners are dominated by collision-check cost. On top of the dimensionality, the subspaces are heterogeneous: the arm is a holonomic Euclidean space, but a non-holonomic base cannot slide sideways, and its local metric is Reeds-Shepp or Dubins rather than Euclidean. Dropping the two into a single RRT without accounting for this gives edges that are kinematically valid but dynamically infeasible, and a distance metric under which arm motion looks "free" relative to base motion (the planner waves the arm instead of driving).

Despite the problem's importance, the community has not converged on a single answer. This post is a set of engineering notes from our exploration: the existing approaches, where the bottleneck actually sits, and the architecture we settled on, which pairs OMPL [1] as the front end with VAMP [2] as a vectorized FK and collision backend, extended for subgroup, manifold, cost-space, and non-holonomic planning. The same stack runs on Autolife (24 DoF) and Fetch (11 DoF) at millisecond-level query times that, to our knowledge, are faster than any other whole-body planner in the literature.

What we measure

All numbers below are single-search wall-clock times with path simplification and interpolation disabled, on a consumer CPU (no GPU). 20 runs per cell, tabletop point-cloud scene. Collision checking is VAMP with a spherized URDF; non-holonomic base uses Reeds-Shepp steering.

Configuration        DoF   RRT-Connect median   P95
Autolife left arm      7   1.1 ms               1.1 ms
Autolife dual arm     14   1.1 ms               1.2 ms
Autolife full body    24   1.1 ms               1.1 ms
Fetch whole body      11   2.1 ms

The surprise is that a 24-DoF whole-body query is not noticeably slower than a 7-DoF arm query: collision checking, not dimension, is what the wall-clock time was really measuring. A single check costs about 3.2 μs including the Python boundary, which yields ~0.31 million checks per second, and batched checking adds a further 8.5× throughput on all-valid workloads.
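As a sanity check on these numbers, the implied per-query check budget is easy to work out (values taken from this section, not new measurements):

```python
# Back-of-envelope budget implied by the numbers above.
per_check_s = 3.2e-6      # one collision check, incl. Python boundary
median_plan_s = 1.1e-3    # RRT-Connect median, Autolife 24 DoF

checks_per_second = 1.0 / per_check_s           # ~312,500 checks/s
checks_per_query = median_plan_s / per_check_s  # ~344 checks if the query
                                                # were pure collision checking
print(f"{checks_per_second / 1e6:.2f} M checks/s, "
      f"~{checks_per_query:.0f} checks per median query")
```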

Read the comparison below as an order-of-magnitude sketch, not a head-to-head benchmark. We did not reproduce any of the methods, and we did not run them on the same queries we ran ourselves. Each entry for the other methods is the number its own paper reports on its own scenes, robots, and hardware. We care only about significant-figure differences and about the hardware each approach requires at inference time.

Method   Approach                              Time     Hardware
Ours     Autolife 24 DoF · RRT-Connect + VAMP  1.1 ms   CPU
REMANI   hierarchical search + opt.            ~30 ms   CPU
cuRobo   batched traj. optimization            ~53 ms   GPU
NRP      neural sampler + SBMP                 ~300 ms  GPU
Our entry is the measured 1.1 ms median on Autolife 24 DoF with a tabletop point-cloud scene. The other entries are approximate numbers from each paper's own reports: REMANI [7] on the order of tens of milliseconds, cuRobo [6] ~53 ms on a workstation GPU, NRP [4] in the hundreds of milliseconds. The hardware column shows what each method requires at inference time. Different scenes, different robots, different metrics: an order-of-magnitude sketch, not a benchmark.

Two things stand out from this comparison. First, once collision checking is vectorized, a classical CPU-only sampling-based planner sits roughly one to two orders of magnitude below GPU-optimization and learned-sampler pipelines on the problems those papers report; the significant figures line up with the rest of this post's argument. Second, the hardware column matters for deployment: mobile manipulators often do not ship with a discrete GPU, so a planner that needs one is a constraint on the whole platform.

How the community has attacked it

The textbook answer: concatenate all DoFs into one vector and run a classical sampling-based planner. RRT, RRT-Connect, PRM, or BIT* on the product space [3]. For a holonomic robot this is actually tractable, but without careful engineering it is slow, and the bottleneck is almost always collision checking rather than sampling or nearest-neighbor search. For a non-holonomic base the story is worse: the base and the arm live under different metrics and steering functions, so a uniform sampler and an \(SE(2) \times \mathbb{R}^{n}\)-style Euclidean distance are both inappropriate, and the planner produces long, oscillatory, non-optimal trajectories.

Learned samplers: keep the sampling-based skeleton, replace the uniform sampler with a neural one. Neural Randomized Planner (NRP) [4] by Yunfan Lu, David Hsu, Panpan Cai and colleagues pioneered this for whole-body mobile manipulation: a local neural sampler proposes promising configurations, and the global sampling-based planner stitches these local distributions together by exploiting geometric similarity between neighborhoods. It transfers zero-shot to a real robot in novel household scenes, a genuinely strong result. Xu, Wong, Fei Gao and colleagues at the ZJU FAST Lab pushed this further with a primitive-based truncated diffusion trajectory generator [5] for differential-drive mobile manipulators, using the diffusion posterior as a distribution-aware sampler. The catch for both is that the learned component is tied to a specific embodiment and environment distribution, so retargeting to a new robot requires new data and new training, and the speed gain at inference does not push below the collision-check-dominated floor of a well-engineered classical planner. It also depends on how representative the training data is, a familiar failure mode when the deployment scene drifts from the training distribution.

Trajectory optimization: give up sampling, cast planning as batched GPU optimization. NVIDIA's cuRobo [6] solves many-seed trajectory optimization with gradient descent, L-BFGS and MPPI on GPU, and reports 53 ms average planning times on a workstation: excellent for short-horizon, locally-smooth problems and a strong baseline for arm planning. The limitation we care about here is long-horizon, room-scale whole-body queries: MPPI-style samplers need reasonable initializations, and warm-started optimization does not recover well from deep local minima in cluttered environments; and the GPU requirement is a deployment constraint we often cannot accept on mobile platforms.

Hierarchical decomposition: split the problem. Plan the base in 2D, then plan the arm in 7D. REMANI [7], by Wu, Fei Gao, Boyu Zhou and colleagues, is a strong representative: an environment-adaptive search over the base plus spatial-temporal optimization over the arm, capable of generating high-quality trajectories in real time. What we lose is probabilistic completeness in the joint configuration space. The decomposition implicitly assumes there exists a base path such that the arm can, afterwards, plan for the required end-effector trajectory. When that assumption breaks (narrow passages requiring simultaneous arm retraction and base rotation, or reachability-constrained grasps where the base has to tuck in a particular way), a hierarchical planner can report infeasibility for problems that are in fact feasible in the joint space.

VAMP: the most important recent change is not an algorithm but a data structure. Rewrite FK and collision checking for SIMD. VAMP [2] (Vectorized Sampling-Based Motion Planning) by Wil Thomason, Zachary Kingston and Lydia Kavraki rewrote forward kinematics and collision checking around a struct-of-arrays layout that maps cleanly onto SIMD lanes, and pushed motion planning for a Panda arm from milliseconds into the tens of microseconds on a single CPU core. Their Collision-Affording Point Tree answers point-cloud collision queries in under 10 ns on average. This is the development that changes the design space. Once collision checking is no longer the bottleneck, a classical sampling-based planner on the full configuration space can match or beat learned and GPU-optimization approaches, while keeping probabilistic-completeness and generality. The question becomes an engineering one: how do you wire an OMPL-style planner front end to a VAMP-style backend, and how do you extend the combination to handle subgroups, constraint manifolds, and non-holonomic base motion?
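The layout idea is easy to illustrate. The sketch below is our own illustration, not VAMP's actual code: a batch of configurations, each represented as spherized link positions, is checked against a point cloud in a few vectorized NumPy ops, where VAMP does the equivalent in SIMD C++ with a CAPT instead of brute force.

```python
import numpy as np

def batch_sphere_check(centers, radii, cloud):
    """centers: (K, S, 3) sphere centers for K configurations of an
    S-sphere robot; radii: (S,); cloud: (N, 3) obstacle points.
    Returns (K,) bool: True = configuration is collision-free."""
    # (K, S, N) distances from every sphere of every configuration
    # to every cloud point, in one vectorized pass
    d = np.linalg.norm(centers[:, :, None, :] - cloud[None, None, :, :],
                       axis=-1)
    # a configuration collides if any of its spheres contains any point
    return ~(d < radii[None, :, None]).any(axis=(1, 2))

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 3))        # obstacle points
centers = rng.uniform(-1.0, 1.0, size=(64, 10, 3))   # 64 configs, 10 spheres
free = batch_sphere_check(centers, np.full(10, 0.05), cloud)  # (64,) bool
```

The struct-of-arrays point is that the whole batch shares one set of array operations; nothing in the inner loop is per-configuration Python.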

Our architecture: OMPL front end × VAMP backend

The design decision that underlies the rest of this post is: keep the classical, general, probabilistically-complete sampling-based skeleton, and put the engineering effort into making its bottlenecks disappear.

[Architecture diagram] Inputs (start, goal, residual \(F(q)\), cost \(c(q)\), point cloud) feed the OMPL front end (RRT-Connect, BIT*, PRM, AIT*; CompoundStateSpace / ProjectedStateSpace). A bridge (spherized URDF, CasADi code-gen) forwards each "is \(q\) valid?" query to the VAMP backend (SIMD forward kinematics, vectorized collision, CAPT point-cloud queries in < 10 ns). Output: a whole-body path in ~1 ms.
A thin bridge (spherized URDF, CasADi code-gen for residuals and costs) lets the OMPL front end delegate every state-validity query to VAMP's vectorized backend. The planner proposes a configuration \(q\); VAMP answers valid? in microseconds.

OMPL [1] is the front end for the same reason nobody rewrites their own BLAS: 26 planners (RRT, RRT-Connect, RRT*, PRM, BIT*, AIT*, KPIECE, EST, SBL, ...), a well-factored C-space abstraction, goal regions, projections, and the ProjectedStateSpace / AtlasStateSpace machinery that manifold planning needs. What OMPL does not have is a SIMD-aware collision checker: its StateValidityChecker is called one state at a time, and under typical FCL/MoveIt plumbing a single check runs tens to hundreds of microseconds, which dominates every other cost on a 24-DoF robot. VAMP is what we plug in behind it: a single check drops to ~3 μs including the Python binding, batched checks hit 8.5× on top of that, and with the bottleneck gone RRT-Connect returns whole-body paths in ~1.1 ms median across 7-, 14-, and 24-DoF configurations of Autolife. A thin bridge handles the practical glue — spherized URDF links so FK-then-collision stays vectorizable, direct point-cloud ingestion through VAMP's Collision-Affording Point Tree (no mesh-to-SDF round trip), and warm-start paths so the data-generation loop can amortize one hard query across many similar ones.
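OMPL calls the checker one state at a time, but edge (motion) validation is a natural place to exploit the batched path. A sketch, assuming a VAMP-style batched validity function `is_free_batch` (a stand-in name, not the actual binding):

```python
import numpy as np

def edge_valid(q_a, q_b, is_free_batch, res=0.05):
    """Validate a straight-line edge by interpolating at resolution `res`
    and handing the whole batch of states to the backend in one call."""
    q_a, q_b = np.asarray(q_a, float), np.asarray(q_b, float)
    n = max(2, int(np.ceil(np.linalg.norm(q_b - q_a) / res)) + 1)
    ts = np.linspace(0.0, 1.0, n)
    states = q_a[None, :] + ts[:, None] * (q_b - q_a)[None, :]  # (n, dof)
    return bool(is_free_batch(states).all())

# toy backend: a state is free while its first joint stays below 0.5
ok = edge_valid([0.0, 0.0], [0.4, 0.0], lambda S: S[:, 0] <= 0.5)  # True
```

One batched call per edge is where the 8.5× batch throughput actually pays off, since tree extension validates many interpolated states at once.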

Engineering takeaway. Most whole-body planning papers compare algorithms at a fixed backend. Switching the backend is a far larger lever than switching the algorithm.

Planning modes we found we needed

5.1 Subgroup planning

In practice very few queries actually need all 24 DoFs. A tabletop pick with the left arm needs 7; a dual-arm handover needs 14; a whole-body reach that shifts the base and torso needs the full 24. Running a 24-DoF planner on a problem that really lives in 7 DoFs is exactly the "probability of sampling the useful subspace" trap.

We expose this as subgroups: a subgroup is a named subset of joint indices that the planner varies, while the remaining joints are frozen at their current values. Crucially, collision checking still runs over the full body, so a frozen arm still avoids collisions. We measure roughly 2 to 5× speedup on 7-DoF queries compared to running them on the full 24-DoF state.

planner = create_planner(
    robot="autolife",
    subgroup="autolife_left_arm",   # 7 DoF, full-body collision
    backend="vamp",
)
path = planner.plan(start, goal, pointcloud=pc)

Subgroup planning on Autolife. The active subgroup (here the left arm, 7 DoF) is the only thing the planner varies; the full 24-DoF body still participates in collision checking.
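Under the hood the mechanics are simple; here is a sketch (the `LEFT_ARM` index layout is hypothetical, chosen only for illustration):

```python
import numpy as np

def embed(q_sub, q_full_frozen, idx):
    """Write a subgroup sample back into the frozen full-body state."""
    q = q_full_frozen.copy()
    q[idx] = q_sub
    return q

LEFT_ARM = list(range(4, 11))      # hypothetical joint-index layout
q_frozen = np.zeros(24)            # current full-body configuration
q_sample = np.full(7, 0.3)         # 7-DoF sample from the planner
q_check = embed(q_sample, q_frozen, LEFT_ARM)
# the full 24-DoF q_check is what gets sent to VAMP, so the frozen
# joints still participate in collision checking
```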

5.2 Manifold planning

A lot of real tasks live on constraint manifolds: keep the end-effector on a plane while wiping a table, pin it to a rail while drawing a line, lock the orientation while carrying a glass of water. All three ask the planner to stay on the zero-set of some residual map \(F(q) = 0\), an implicitly-defined submanifold \(\mathcal{M} \subset \mathcal{C}\) of the ambient configuration space.

How the projection works. A uniform sample \(\hat{q}\) from the ambient space will almost never satisfy \(F(\hat{q}) = 0\). We pull it onto the manifold with a damped Newton iteration on the residual:

$q_{k+1} = q_k - J(q_k)^{+}\, F(q_k), \qquad J(q_k) = \frac{\partial F}{\partial q}\bigg|_{q_k}$

where \(J^{+}\) is the Moore-Penrose pseudoinverse. We iterate until \(\|F(q)\|_\infty < \varepsilon\) or a retry budget is exhausted; success returns the projected point, failure rejects the sample. Both \(F\) and \(J\) are CasADi symbolic expressions that we code-generate to a shared object at start-up, so each Newton step costs a few hundred nanoseconds instead of the tens of microseconds a Python/autograd evaluation would take. OMPL's ProjectedStateSpace calls this projection on every extend and every interpolated edge point, so the tree only ever contains states that satisfy \(F(q) = 0\) to tolerance.
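To make the iteration concrete, here is a toy version on a 2-link planar arm whose tip height is pinned, the same shape as the plane constraint below. This is our own minimal sketch with an analytic Jacobian; the real stack evaluates code-generated CasADi functions instead.

```python
import numpy as np

L1, L2, TARGET_Z = 1.0, 1.0, 0.8   # link lengths, pinned tip height

def F(q):
    # residual: tip height minus target (zero on the manifold)
    return np.array([L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]) - TARGET_Z])

def J(q):
    # analytic Jacobian dF/dq, shape (1, 2)
    return np.array([[L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                      L2 * np.cos(q[0] + q[1])]])

def project(q, tol=1e-8, max_iter=50):
    q = np.array(q, dtype=float)
    for _ in range(max_iter):
        r = F(q)
        if np.max(np.abs(r)) < tol:
            return q                        # on the manifold to tolerance
        q = q - np.linalg.pinv(J(q)) @ r    # Moore-Penrose Newton step
    return None                             # budget exhausted: reject sample

q_proj = project([0.3, 0.4])                # converges in a few iterations
```

Because the pseudoinverse gives the minimum-norm step, the projected point stays close to the ambient sample, which keeps the projected sampler's distribution reasonable.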

In the snippets below p_x, p_y, p_z, R = forward_kinematics(q) are the end-effector position components and rotation, and starred quantities (\(R^{*}\), \(x^{*}\), \(y^{*}\), \(z^{*}\)) are reference values captured at the start pose.

Plane constraint. End-effector \(z\) pinned at 0.8 m; the rest of the body moves freely on the manifold.
Residual
$F(q) = p_z(q) - 0.8$
Code
residual = p_z - 0.8
Plane + obstacle. Same residual as above, but a sphere obstacle forces the tree to route around it while staying on the manifold.
Residual
$F(q) = p_z(q) - 0.8$
Code
residual = p_z - 0.8
Orientation lock. Two rotation columns are pinned, fixing the full gripper orientation (the carrying-a-glass-of-water case).
Residual
$F(q) = \begin{bmatrix} R_{:,1} - R^{*}_{:,1} \\ R_{:,2} - R^{*}_{:,2} \end{bmatrix} \in \mathbb{R}^{6}$
Code
residual = ca.vertcat(
    R[:, 0] - R_ref[:, 0],
    R[:, 1] - R_ref[:, 1],
)
Horizontal rail. \(y\) and \(z\) pinned plus orientation locked; the gripper slides along a horizontal line.
Residual
$F(q) = \begin{bmatrix} p_y - y^{*} \\ p_z - z^{*} \\ R_{:,1} - R^{*}_{:,1} \\ R_{:,2} - R^{*}_{:,2} \end{bmatrix} \in \mathbb{R}^{8}$
Code
residual = ca.vertcat(
    p_y - y_ref,
    p_z - z_ref,
    R[:, 0] - R_ref[:, 0],
    R[:, 1] - R_ref[:, 1],
)
Vertical rail. \(x\) and \(y\) pinned plus orientation locked; the gripper slides along a vertical line (the drawing-on-a-wall case).
Residual
$F(q) = \begin{bmatrix} p_x - x^{*} \\ p_y - y^{*} \\ R_{:,1} - R^{*}_{:,1} \\ R_{:,2} - R^{*}_{:,2} \end{bmatrix} \in \mathbb{R}^{8}$
Code
residual = ca.vertcat(
    p_x - x_ref,
    p_y - y_ref,
    R[:, 0] - R_ref[:, 0],
    R[:, 1] - R_ref[:, 1],
)

5.3 Cost-space planning

Sometimes we want a preference, not a hard constraint: "prefer the end-effector near its home height", "prefer small base rotations". We attach scalar CasADi cost functions that get integrated along edges (trapezoidally), and let OMPL's AO planners (RRT*, BIT*) minimize a line-integral objective combining path length and state cost:

$J(\gamma) = \int_\gamma \|\dot{\gamma}(s)\|_2\, ds + \lambda \int_\gamma c(\gamma(s))\, ds$

This is the same infrastructure as manifold planning (CasADi, code-gen, shared objects), with "residual = 0" replaced by "minimize \(\int c(q)\,ds\)". In practice we use manifold planning when physics demands it and cost-space when we have strong preferences that should nonetheless be violated if necessary to remain feasible. The snippets below reuse p_x, p_y, p_z, R = forward_kinematics(q) from the previous section.

Cost plane. Quadratic penalty pulls the end-effector toward home height, but the planner is free to deviate when geometry demands it.
Cost
$c(q) = (p_z(q) - 0.8)^2$
Code
expr = (p_z - 0.8) ** 2
Soft orientation preference. Encourages home orientation while allowing free translation and orientation violation when needed.
Cost
$c(q) = \|R_{:,1} - R^{*}_{:,1}\|^2 + \|R_{:,2} - R^{*}_{:,2}\|^2$
Code
expr = (
    ca.sumsqr(R[:, 0] - R_ref[:, 0])
    + ca.sumsqr(R[:, 1] - R_ref[:, 1])
)
Soft horizontal rail. Multi-component residual as a cost rather than a hard constraint; the path prefers the rail but can leave it.
Cost
$c(q) = (p_y - y^{*})^2 + (p_z - z^{*})^2 + \|R_{:,1} - R^{*}_{:,1}\|^2 + \|R_{:,2} - R^{*}_{:,2}\|^2$
Code
expr = (
    (p_y - y_ref) ** 2
    + (p_z - z_ref) ** 2
    + ca.sumsqr(R[:, 0] - R_ref[:, 0])
    + ca.sumsqr(R[:, 1] - R_ref[:, 1])
)
Soft vertical rail. Same idea, different axis; useful when the "rail" is a preference about surface contact rather than a rigid constraint.
Cost
$c(q) = (p_x - x^{*})^2 + (p_y - y^{*})^2 + \|R_{:,1} - R^{*}_{:,1}\|^2 + \|R_{:,2} - R^{*}_{:,2}\|^2$
Code
expr = (
    (p_x - x_ref) ** 2
    + (p_y - y_ref) ** 2
    + ca.sumsqr(R[:, 0] - R_ref[:, 0])
    + ca.sumsqr(R[:, 1] - R_ref[:, 1])
)
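The trapezoidal edge integration these cost expressions feed into can be sketched as follows (illustrative NumPy; the real stack evaluates the code-generated \(c(q)\) inside OMPL's optimization objective):

```python
import numpy as np

def edge_cost(q_a, q_b, c, n=16):
    """Trapezoidal line integral of state cost c along a straight edge:
    sum_i 0.5 * (c(q_i) + c(q_{i+1})) * ||q_{i+1} - q_i||."""
    ts = np.linspace(0.0, 1.0, n)
    qs = q_a[None, :] + ts[:, None] * (q_b - q_a)[None, :]
    cs = np.array([c(q) for q in qs])
    seg = np.linalg.norm(np.diff(qs, axis=0), axis=1)
    return float(np.sum(0.5 * (cs[:-1] + cs[1:]) * seg))

# sanity check: constant unit cost over an edge of length 2 integrates to 2.0
total = edge_cost(np.array([0.0, 0.0]), np.array([2.0, 0.0]), lambda q: 1.0)
```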

5.4 Non-holonomic base: multi-level planning on a fiber bundle

Why not just flatten base and arm into one vector. The obvious thing to try is a single RRT-Connect tree living in the product space \(SE(2) \times \mathbb{R}^{k}\) with a weighted compound metric and a per-component steering function (Reeds-Shepp on the base, linear on the arm). We implemented it, and for Fetch it is the wrong choice. With 8 arm DoFs against 3 base DoFs, nearest-neighbor lookup is dominated by the arm even after metric re-weighting, and uniform product sampling spends most of its budget wiggling the arm inside each \(SE(2)\) cell before the tree ever makes it across the room. The symptom is exactly the one the introduction warned about: the arm waves while the base barely moves.
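The imbalance is easy to see numerically (a toy illustration with unit weights, not Fetch's actual metric):

```python
import numpy as np

# a real base advance: 10 cm of translation plus a small heading change
base_step = np.array([0.10, 0.0, 0.05])
# modest, uninformative jitter of 0.2 rad on each of the 8 arm joints
arm_jitter = np.full(8, 0.2)

d_base = np.linalg.norm(base_step)    # ~0.11
d_arm = np.linalg.norm(arm_jitter)    # ~0.57
# under a flat Euclidean product metric the arm term dominates ~5x,
# so nearest-neighbor queries favor "arm-similar" nodes over base progress
```

Re-weighting the metric shrinks this ratio but cannot fix the sampling problem: uniform product samples still spend most of their volume on the 8 arm dimensions.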

What we actually use: OMPL multilevel planning (fiber bundles). The Fetch stack decomposes the whole-body configuration into a two-level bundle

$\underbrace{SE(2)}_{\text{base level } \mathcal{B}_0} \;\hookrightarrow\; \underbrace{SE(2) \times \mathbb{R}^{k}}_{\text{total space } \mathcal{B}_1 = \mathcal{Q}}$

where the projection \(\pi : \mathcal{B}_1 \to \mathcal{B}_0\) drops the arm joints and the fiber \(\pi^{-1}(b)\) over each base pose \(b \in SE(2)\) is the \(k\)-dimensional arm joint space. Planning proceeds level by level: QRRT (Orthey & Toussaint, bundle-space RRT) grows a tree on the base level \(SE(2)\) first, using Dubins (or Reeds-Shepp) steering and whole-body collision checks, then lifts base states into the total space by sampling arm configurations on the fiber. The upper-level tree inherits the lower-level base path as structure, so exploration spends its samples on the fiber (arm) rather than re-discovering base connectivity. OMPL calls this the XRN_X_SE2 projection; the two planners we expose are QRRT/QRRT* and QMP/QMP*.

Base steering: Dubins by default, Reeds-Shepp on request. The base level uses DubinsStateSpace by default: forward-only, 6 curve families, asymmetric distance. Because Dubins curves structurally forbid reverse motion, no reverse-cost penalty is needed anywhere downstream: the planner simply cannot propose a back-up. Setting allow_reverse=True swaps the base space for ReedsSheppStateSpace with the shortest-of-48 curve families, for scenes that genuinely require reversing (tight re-orientations against a wall, etc.).

Lifting. Each base edge is a bounded-curvature Dubins/RS curve. When the upper level lifts a base vertex into the total space, it attaches arm configurations to samples along that curve; every interpolated \((x, y, \theta, a)\) on the lifted edge is handed to VAMP for whole-body collision checking. So the base motion the robot can actually execute is fixed by the lower level, and the upper level only decides how the arm is allowed to move along it.
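A sketch of the lift (the names `base_curve` and `is_free` are stand-ins for the Dubins edge and the VAMP whole-body check, not the actual internals):

```python
import numpy as np

def lift_edge(base_curve, arm_a, arm_b, is_free, n=20):
    """Lift a fixed lower-level base curve into the total space by
    attaching interpolated arm configurations along it, checking each
    (x, y, theta, q_arm) state for whole-body collision."""
    for t in np.linspace(0.0, 1.0, n):
        x, y, th = base_curve(t)                 # executable base motion
        arm = (1.0 - t) * arm_a + t * arm_b      # fiber interpolation
        q = np.concatenate(([x, y, th], arm))    # total-space state
        if not is_free(q):
            return False                         # reject this lift
    return True

ok = lift_edge(lambda t: (t, 0.0, 0.0),          # toy straight base curve
               np.zeros(8), np.ones(8), lambda q: True)
```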

planner = create_planner(
    "fetch_whole_body",
    config=PlannerConfig(planner_name="rrtc", time_limit=0.5),
    pointcloud=cloud,
)
# Internally: Dubins base level  ->  Compound(Dubins + R^8) total space,
# planned with QRRT (bundle-space RRT). Upper-level "rrtc" selects the
# QRRT variant; "rrtstar" selects QRRT*.
planner.set_base_bounds(-4.0, 4.0, -4.0, 4.0)
path = planner.plan(start, goal)   # 11-DoF whole-body path

Why multi-level, not flat compound

  1. Sample budget goes to the right dimensions. On a flat \(SE(2) \times \mathbb{R}^{8}\) tree, nearly every extension perturbs the arm without moving the base. On the bundle, base connectivity is established on a 3-dimensional lower level where samples are cheap and base-goal bias is effective; the arm is only explored once there is a base path to hang it on.
  2. Steering stays admissible. The Dubins/RS steering function is a complete local planner on \(SE(2)\) (Reeds & Shepp, 1990; Laumond 1998): any two base poses are connected by a shortest bounded-curvature curve, with length continuous in the endpoints. Dropping it to the lower level means every base edge is something the differential-drive base can actually drive, and the upper level never has to reason about non-holonomic feasibility.
  3. Whole-body collision stays honest. Collision checks at both levels run on the full 11-DoF configuration, via VAMP. At the base level the arm is held at the current configuration (or a representative tuck); at the total-space level the arm is the lifted value. Nothing is ever checked against a disembodied footprint.

Probabilistic completeness on the bundle

QRRT is probabilistically complete on \(\mathcal{Q}_{\text{free}}\) under the usual bundle-space conditions (Orthey, Sucan & Toussaint, 2018; Orthey & Toussaint, 2020): the projection \(\pi\) is continuous, both levels are sampled with positive density on open sets, and the per-level local planners are admissible. Our stack satisfies these by construction — Dubins/RS is a complete local planner on \(SE(2)\), Euclidean interpolation is complete on \(\mathbb{R}^{k}\), and VAMP performs exact discretized collision checks on each lifted edge.

This is not the same guarantee as a hierarchical base-then-arm decomposition like REMANI (§ 3). A hierarchical planner commits to a base path and then asks whether an arm plan exists on top of it: if no feasible arm plan exists for that specific base path, the query is declared infeasible even when another base path would have worked. QRRT avoids this because the upper level can backtrack through the base tree: if lifting fails on one base vertex, the planner tries the next one, and the search remains probabilistically complete over the joint space, not just the base subspace.

Holonomic fallback. For an omnidirectional base the same architecture degrades gracefully: we swap the Dubins/RS lower level for a standard \(SE(2)\) space with Euclidean metric and straight-line steering, and the bundle structure and QRRT planner above it are unchanged.

Reflections

A few things we did not expect going in:

  • Dimensionality is a red herring once collision checking is fast. We expected a 24-DoF plan to be meaningfully slower than a 7-DoF plan; it is not.
  • The right abstraction for constraints is a compiler, not a class hierarchy. CasADi plus code-gen for residuals and costs gave us the flexibility of a framework with the speed of hand-written C. Adding a new task-space constraint is ~30 lines.
  • Non-holonomic handling belongs in the state-space hierarchy, not the algorithm. For Fetch, a flat RRT-Connect on \(SE(2) \times \mathbb{R}^{8}\) with a compound metric was the obvious first try, and it wastes its sample budget on the arm. Dropping to a two-level fiber bundle — Dubins on \(SE(2)\) at the lower level, arm as the fiber, QRRT across the two — recovered the whole-body coupling without any changes to the underlying algorithm.
  • Learned samplers remain valuable, but the value shifts. They used to close a 10 to 100× gap against classical planners. With VAMP, that gap is much smaller, and the interesting use of learned samplers is no longer "be faster than RRT" but "propose globally-informed initial seeds for AO planners or for warm-starting a batch of similar queries".

We release both stacks as reference implementations: Autolife-Planning for the 24-DoF bimanual platform, and our VAMP fork in AdaCompNUS/vamp with the rls-digital-twin companion for Fetch. We hope they are useful as starting points.

The bigger meta-point is that there is still a lot of performance left on the table in classical planning, and a lot of it is unlocked by the SIMD and memory-layout work that VAMP pioneered. If you are designing a whole-body planner today, we think the right first question is not "which algorithm" but "how many microseconds does my collision check cost".

Citation

If you found this perspective useful, please cite:

@misc{whole-body-motion-planning-blog,
  author    = {Hu Tianrun},
  title     = {How to Plan Whole-Body Motion in ~1 ms},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://h-tr.github.io/blog/posts/whole-body-motion-planning.html}
}

References

  1. Șucan, I. A., Moll, M., & Kavraki, L. E. (2012). The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 19(4), 72–82. [Website]
  2. Thomason, W., Kingston, Z., & Kavraki, L. E. (2024). Motions in Microseconds via Vectorized Sampling-Based Planning. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 8749–8756). [arXiv]
  3. LaValle, S. M. (2006). Planning Algorithms. Cambridge University Press.
  4. Lu, Y., Ma, Y., Hsu, D., & Cai, P. (2024). Neural Randomized Planning for Whole-Body Robot Motion. arXiv:2405.11317. [arXiv]
  5. Xu, L., Wong, C., Zhong, Y., Lin, J., Hou, J., & Gao, F. (2026). Primitive-based Truncated Diffusion for Efficient Trajectory Generation of Differential-Drive Mobile Manipulators. arXiv:2604.04166. [arXiv]
  6. Sundaralingam, B., Hari, S. K. S., Fishman, A., Garrett, C., Van Wyk, K., Blukis, V., Millane, A., Oleynikova, H., Handa, A., Ramos, F., Ratliff, N., & Fox, D. (2023). CuRobo: Parallelized Collision-Free Minimum-Jerk Robot Motion Generation. NVIDIA Technical Report. [Website]
  7. Wu, C., Wang, R., Song, M., Gao, F., Mei, J., & Zhou, B. (2024). Real-time Whole-body Motion Planning for Mobile Manipulators Using Environment-adaptive Search and Spatial-temporal Optimization. In IEEE International Conference on Robotics and Automation (ICRA) (pp. 1369–1375). [Website]