Blog

How to Plan Whole-Body Motion in ~1 ms
April 17, 2026
Room-scale mobile manipulation demands a planner that reasons over the base, torso, and arm at once, yet the field has not converged on how to do it fast. This post walks through the prior art (neural samplers, cuRobo-style GPU optimization, hierarchical decoupling), argues that vectorized collision checking changes the design space, and presents our OMPL front end × VAMP backend architecture with subgroup, manifold, cost-space, and non-holonomic extensions — deployed on Autolife (24 DoF) and Fetch (11 DoF) at millisecond planning times.
The Grasping Gap in the Wild
March 28, 2026
Pick and place is simultaneously considered too trivial to demo and too hard to deploy reliably. Our recent work on visibility-aware mobile grasping revealed structural gaps in grasping: 43.5% of grasp failures come from kinematically infeasible grasps that score high on geometric quality. This blog presents the failure analysis, per-object breakdowns, and pipeline data that explain why grasping falls apart outside the tabletop, and what to do about it.
Why is mobile manipulation so hard?
December 29, 2025
Mobile manipulation is inherently a holistic system with deep spatial and temporal coupling, yet our engineering capabilities force us to decompose it into modules and hierarchies. That decomposition is "lossy": information is dropped at every module boundary. But is that the end? What can we do about it?
When You're Too Lazy to Beat Level 2: Teaching AI to Play Water Sort
December 24, 2025
That infamous "Mindful Pouring" game has a level 2 that's driving everyone crazy. Instead of actually solving it like a normal person, I decided to over-engineer the problem with PDDL and Fast Downward. Because why solve puzzles manually when you can write symbolic planning domains? Also featuring: what happens when some blocks are hidden (spoiler: chaos).
Unified Whole-Body MPC Controller for Holistic Mobile Manipulators
April 6, 2025
In this blog, we introduce a unified Model Predictive Control (MPC) trajectory following framework for holistic mobile manipulators. Our whole-body controller coordinates both the arm and mobile base using a single optimization problem, enabling smoother trajectory following and incorporating Control Barrier Functions (CBFs) for base obstacle avoidance.
Beyond the Papers: Scaling Laws and Data Requirements in End-to-End Robotics
March 31, 2025
After implementing ACT, Diffusion Policy, and 3D Diffusion Policy for various manipulation tasks, I've discovered significant gaps between research papers and real-world performance. This blog examines the overfitting problems in current approaches and explores a critical question: Can scaling laws from language models apply to robotics?
Can we do the Stochastic Modeling in Visual Place Recognition?
February 1, 2025
Inspired by VAEs, I had this thought: "Hey, if uncertainty modeling works so well there, why not try it in other areas?" Since I'm pretty familiar with visual place recognition and noticed almost no one was doing this kind of stochastic modeling there, I thought it'd be a perfect playground for experimentation. This blog records my journey, from how I formulated the problem to what happened when I actually tried to make it work. Spoiler alert: things didn't go quite as planned, but the insights were fascinating!
Understanding the Dynamic Balance in Variational Autoencoders
January 18, 2025
This blog explores the fascinating antagonistic process in Variational Autoencoders (VAEs) between the encoder's predicted variance and decoder quality. We examine how the variance adapts throughout training, creating an automatic curriculum that balances reconstruction accuracy with latent space exploration. The post includes mathematical proofs and visualizations to illustrate this dynamic equilibrium.
Building Stable and Consistent Robot Control for Learning
December 23, 2024
This blog introduces a system for stable and consistent robot control that combines velocity control, delta pose representations, and null-space optimization. The framework ensures smooth motion generation and frame-independent movements, producing high-quality training data for advanced learning algorithms like Diffusion Policies and reinforcement learning models.
Contact-Rich Manipulation and Contact Planning
April 23, 2026
Robotic manipulation through hybrid, nonsmooth, combinatorially rich contact has become the central challenge linking trajectory optimization, discrete planning, deep reinforcement learning, and tactile sensing. This survey covers seven themes (contact models, contact-implicit trajectory optimization, mode and sequence planning, learning-based skill acquisition, tactile feedback, dexterous multi-finger manipulation, and sim-to-real transfer) across 2018 to 2026, arguing that the most effective modern methods are not pure paradigms but hybrids that match implicit and explicit contact reasoning to the structure of the task.
Multi-Modal Scene Understanding in Dynamic Environments
April 23, 2026
Autonomous systems must perceive dynamic, partially observable scenes through heterogeneous sensors that individually fail in predictable ways. This survey covers five interlocking themes (dynamic scene representation, observation fusion under uncertainty, temporal observation history, uncertainty quantification and propagation, and robust perception in unknown environments) across 2018 to 2026, arguing that bird's-eye-view has quietly become the unifying abstraction of modern autonomous perception, while the principled propagation of uncertainty from sensors to decisions remains the field's most consequential open problem.
Planning as Inference in Probabilistic Models
April 23, 2026
Planning can be cast as posterior inference in a probabilistic graphical model, with optimality variables conditioning a trajectory distribution that concentrates on high-reward behavior. This survey covers four pillars (control-as-inference foundations, active inference and expected free energy, maximum entropy reinforcement learning, and model-based probabilistic planning) across 2006 to 2026, arguing that the choice of approximate inference scheme is not a computational detail but a constitutive design decision that shapes the epistemic and pragmatic behavior of the resulting agent.
Grasp Planning for Robotic Manipulation
April 23, 2026
Grasp planning, in mature form, is rarely a single-shot problem of pose synthesis. The hard computational work sits upstream, in the pushes, slides, regrasps, handovers, and role assignments that make a grasp feasible in the first place. This survey covers five themes (pre-grasp non-prehensile manipulation, regrasping and handover, bimanual role assignment, integrated task and motion planning, and learning-based sequential manipulation under constraints) across 2016 to 2026, arguing that enabling actions have moved from the periphery of grasp planning to its center and that the most capable recent systems combine foundation-model task decomposition with classical planning for sequencing and learned policies for contact-rich execution.
Programming Paradigms in Robotics
April 23, 2026
Robotics has quietly absorbed a large share of the vocabulary of software engineering (object-centric encapsulation, modular libraries, separation of concerns, abstraction hierarchies) and wrapped it around learned perception, skills, and control. This survey covers object-centric representations, modular skill libraries, and decoupled layered architectures across 2015 to 2026, arguing that the field has solved decomposition but still lacks principled composition operators, and that progress on composition will determine whether robotics inherits the compositional payoff that software engineering has enjoyed for half a century.
Fast Whole-Body Planners and Upstream Architecture
April 23, 2026
Millisecond-scale whole-body motion planning is not merely a performance milestone. It is an architectural parameter that reorganizes the stack above it. This survey covers fast planners, task decomposition, learned samplers, base-placement and IK pre-configuration, and planning-perception-action feedback loops across 2018 to 2026, arguing that once per-query cost drops to a few milliseconds the planner migrates from periphery to center, serving as a feasibility oracle for task planners, a training-data engine for neural policies, and the inner loop of reactive controllers.
Robotic Tool Use and Non-Prehensile Manipulation
April 23, 2026
Beyond grasping sits a vast space of manipulation strategies that most robots still cannot handle. Pushing, sliding, pivoting, tossing, and wielding tools share a common computational core: reasoning about contact, friction, and function across chains of interacting bodies. This survey covers four themes (planning and control, affordance learning, learning-based approaches, and integrated tool use with non-prehensile strategies) across 2014 to 2026, arguing that representation, not algorithms or hardware, is the dominant factor shaping progress in robotic tool use.
Active Perception in Mobile Manipulation
April 22, 2026
Mobile manipulation robots must perceive their environment to act, but passive perception fails in cluttered, occluded, dynamic scenes. This survey covers five paradigms for action-for-perception (viewpoint planning, exploration, interactive perception, active tactile sensing, and learned perception-action policies) across 2016 to 2026, arguing that the convergence of foundation models, neural implicit representations, and high-fidelity simulation is creating the conditions for a unified treatment where robots reason fluidly across looking, exploring, interacting, and touching.
Chain-of-Thought Reliability
April 22, 2026
Chain-of-thought prompting has become a cornerstone of modern LLM deployment, but the reasoning chains these models produce may not faithfully reflect the computation that actually determines their outputs. This survey covers four interconnected dimensions of reliability (correctness, faithfulness, robustness, and verification) across 2022 to 2026, arguing that the persuasiveness of a reasoning chain is not evidence of its reliability, and that closing the gap between the appearance and the reality of reliable reasoning is the field's central unsolved problem.
Bridging the Simulation-to-Reality Gap
April 23, 2026
Policies trained in simulation routinely fail on physical hardware, and the gap is not a single problem but a composite of dynamics, sensor, appearance, actuator, and action-space errors. This survey traces the 2018 to 2026 arc across locomotion, navigation, manipulation, and mobile manipulation, covering domain randomization, system identification, teacher-student adaptation, residual learning, real-to-sim-to-real pipelines, and continual deployment-time adaptation.
Formal Logic and Reasoning Frameworks for Robotic Systems
March 31, 2026
Robotics sits at the intersection of discrete symbolic reasoning and continuous physical dynamics, a regime that strains every logical formalism. This survey covers six families of tools (PDDL, LTL/STL, probabilistic programming, SMT solvers, ontologies, behavior trees) with formal definitions, worked robotics examples, and practical software guidance. We also offer guidance on which kinds of tasks suit which kind of reasoning tool.
Structured Representation for Generalizable Manipulation Skill Modeling
March 30, 2026
The choice of structured representation is the decisive factor determining whether a manipulation system generalizes or merely memorizes. This survey covers six families of approaches — from dense keypoints and affordances to scene graphs, object-centric decompositions, and neuro-symbolic predicates — across 58 papers spanning 2016–2026, arguing that hybrid structured representations consistently outperform end-to-end approaches of much larger scale.
LLM and VLM for Task and Motion Planning
March 29, 2026
Can foundation models replace, augment, or bypass the manual engineering bottleneck in TAMP? This survey covers five families of approaches — from LLMs as task planners to VLAs as end-to-end action models — across the full arc from 2022 to early 2026, and argues that foundation models work best as translators, heuristics, and constraint proposers within architectures that retain classical guarantees.