1. Introduction
Simulation has become the dominant substrate for training learned robot controllers. Physics engines and rendering pipelines can generate millions of interaction episodes at a fraction of the cost and risk of physical experimentation, enabling reinforcement learning and imitation learning to scale to behaviors that would be infeasible to acquire on hardware alone [81, 51, 54, 68]. Yet a persistent and well-documented discrepancy, the simulation-to-reality gap, continues to separate policies that perform well in simulation from those that succeed on physical platforms. Closing the gap, or circumventing it through clever training and deployment strategies, is arguably the central bottleneck for deploying learning-based robotics at scale [29, 57, 62].
The urgency of this problem has intensified in the 2018 to 2026 period for three converging reasons. First, reinforcement learning algorithms and computing infrastructure have matured to the point where simulation-trained policies can solve tasks of genuine practical complexity. Dexterous in-hand reorientation [4], Rubik's cube solving on the Shadow Dexterous Hand [1], agile quadrupedal locomotion over challenging terrain [44, 43, 54, 17], perceptive locomotion in natural environments [54], and champion-level autonomous drone racing [40] all demonstrate that simulation-trained controllers now reach or exceed human expert performance on meaningful tasks. Second, the proliferation of high-fidelity, GPU-accelerated simulators such as MuJoCo [81], Isaac Gym [51], Habitat [70], AI2-THOR [42], and ProcTHOR [20] has lowered the barrier to simulation-based research and expanded the community studying sim-to-real transfer [29, 57]. Third, deployment pressure from logistics, manufacturing, healthcare, and service robotics is elevating transfer reliability from an academic curiosity to an engineering requirement [23, 74, 63].
This survey addresses the following research question: what methods and frameworks have been proposed to bridge the sim-to-real transfer gap in robotics, specifically for locomotion, navigation, manipulation, and mobile manipulation tasks, and how effective are they? We survey the literature from 2018 to 2026, while incorporating foundational earlier work where it remains directly influential on contemporary methods. Our scope encompasses domain randomization and adaptation, system identification and calibration, task-specific transfer strategies for locomotion [77, 32, 44, 43, 54, 72, 65, 24], navigation [69, 39, 70, 42, 20, 3, 11, 71], manipulation [4, 1, 7, 36, 52, 59, 64, 87, 82], and mobile manipulation [76, 25, 26, 27, 85, 84, 45], as well as emerging paradigms, including foundation-model-based methods and real-to-sim-to-real pipelines, that seek to reshape or bypass the gap entirely [9, 82, 8, 60, 31, 46].
The review is organized as follows. Section 2 establishes definitions and scope boundaries. Section 3 examines domain randomization and adaptation as a cross-cutting methodological family, tracing its evolution from heuristic practice to theoretically grounded methodology. Section 4 addresses sim-to-real transfer for locomotion and legged robots. Section 5 covers navigation-specific transfer methods. Section 6 surveys manipulation transfer. Section 7 treats the emerging area of mobile manipulation, where navigation and manipulation gaps compound. Section 8 provides cross-cutting analysis of trends and tensions shared across themes. Section 9 identifies open problems and specific future directions. Section 10 concludes.
Central thesis. The sim-to-real gap is not a single problem but a composite of distinct, diagnosable error sources (dynamics, sensors, visual appearance, actuators, action space abstractions, computational constraints). The most effective transfer pipelines diagnose the dominant gap sources for a specific task and apply targeted methodological tools, rather than relying on any single technique as a universal remedy. The field is converging on a hybrid strategy that combines targeted fidelity for well-understood gap sources with broad randomization for poorly characterized ones, and is migrating from one-time zero-shot transfer toward continual adaptation during deployment.
2. Background and Definitions
2.1 The Reality Gap
The reality gap refers to the aggregate discrepancy between a simulated training environment and the physical deployment environment that causes degraded performance when a simulation-trained policy is executed on real hardware. The term was coined in the evolutionary robotics literature of the 1990s, where controllers evolved inside a simulator for the Khepera and octopod platforms frequently failed on the corresponding hardware [34, 33, 47]. Three decades later the problem persists, though the analytical vocabulary has sharpened. Following diagnostic taxonomies developed in the locomotion literature [5, 6, 41] and broader surveys [29, 57, 62], the gap can be decomposed into five principal sources.
- Dynamics modeling errors. Inaccuracies in rigid-body parameters (masses, inertias, centers of mass), contact models (penetration depth, restitution, friction cones), and actuator models (torque limits, backlash, transmission losses, thermal effects) [77, 32, 6].
- Sensor modeling errors. Discrepancies in simulated sensor outputs relative to physical sensor behavior, including noise characteristics, latency, field-of-view artifacts, and calibration errors for cameras, depth sensors, LiDAR, force and torque sensors, and inertial measurement units [83, 21, 36].
- Visual appearance errors. Differences in rendering quality, lighting, texture, material properties, and scene composition between synthetic and real images, affecting vision-based perception [79, 69, 2, 7].
- Structural and computational errors. Mismatches arising from discretization (integration timestep, solver accuracy), action space abstractions (discrete versus continuous commands), communication latency, and control frequency limitations [3, 63].
- Unmodeled phenomena. Effects absent from simulation entirely, including air currents, cable forces, surface deformations, thermal drift, wear, and payload changes, that accumulate during real-world deployment [38, 30, 6].
This taxonomy is useful because different transfer strategies target different gap sources, and the relative importance of each source varies by task domain. Locomotion is dominated by dynamics and actuator modeling errors [77, 32, 6]; navigation by sensor and visual appearance errors [69, 39, 3]; manipulation by contact dynamics and unmodeled phenomena [4, 59, 89]; mobile manipulation inherits all of the above and compounds them across subsystems [76, 84, 45]. This task dependence recurs throughout the review.
2.2 Key Methodological Concepts
- Domain randomization (DR) trains policies over a distribution of simulation parameters, physical, visual, or both, so that the real world appears as merely another sample from the training distribution [79, 61, 80].
- Domain adaptation (DA) explicitly aligns representations or distributions across source (simulation) and target (reality) domains, typically using adversarial training or image translation [7, 36, 56, 67].
- System identification (SysID) estimates the physical parameters of the real system and configures the simulator to match, reducing the gap at its source [77, 66].
- Residual learning trains a learned component to correct or augment a classical controller, limiting the scope of what must transfer [73, 37, 89].
- Teacher-student distillation trains a privileged teacher with access to ground-truth simulation state, then distills it into a deployable student that uses only real-world-available observations [44, 15, 11].
- Real-to-sim-to-real transfer constructs a personalized simulation from real observations and trains policies within it before redeployment [82].
- Continual adaptation performs ongoing learning during deployment, tracking non-stationary real-world dynamics rather than treating transfer as a one-time event [38, 30, 43].
2.3 Scope and Boundaries
This review covers methods that involve a simulation training phase and a physical deployment phase, with the explicit goal of bridging the performance gap between them. We include zero-shot transfer (no real-world training data) [79, 69, 61], few-shot adaptation (limited real-world data for calibration or fine-tuning) [66, 78, 14], and continual adaptation (ongoing real-world learning after deployment) [38, 43, 30]. We exclude pure real-world learning that does not involve simulation, classical control without learned components, and sim-to-sim transfer used solely as a research methodology, except where it explicitly proxies the reality gap [47]. While we note connections to the foundation model literature where it intersects with sim-to-real methods, a comprehensive review of large-scale robot learning is beyond our scope.
3. Domain Randomization and Adaptation
Domain randomization has emerged as the most widely adopted family of techniques for sim-to-real transfer, owing to its conceptual simplicity, ease of implementation, and generality across task domains [79, 61, 4, 1, 57]. This section traces its evolution from an early heuristic to a theoretically grounded methodology, examines distinct considerations for visual versus dynamics randomization, surveys principled calibration methods, and reviews domain adaptation as an alternative alignment strategy.
3.1 Visual Domain Randomization
A foundational insight of visual domain randomization is that diversity of training appearances matters more than photorealistic rendering fidelity. Tobin et al. [79] demonstrated that training object detectors on non-photorealistic synthetic images with highly varied random textures, lighting, and camera placements enabled successful transfer to real visual environments, because broad distributional coverage causes the real world to appear as just another variation within the training distribution. This diversity-over-realism principle has been independently validated across manipulation [80, 35], navigation [69], and grasping perception [88, 22, 49].
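The diversity-over-realism recipe amounts to drawing a fresh rendering configuration for every training episode. The sketch below illustrates the idea; the parameter names and ranges are illustrative choices, not values taken from Tobin et al. [79].

```python
import random

def sample_visual_domain(rng=random):
    """Draw one randomized rendering configuration for a training episode.

    Broad coverage, not photorealism, is the goal: the real world should
    look like just another sample from this distribution. All ranges here
    are illustrative and would be tuned per task in practice.
    """
    return {
        # Random flat RGB textures instead of photorealistic materials.
        "object_texture_rgb": [rng.random() for _ in range(3)],
        "floor_texture_rgb": [rng.random() for _ in range(3)],
        # Lighting: random direction and intensity.
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "light_elevation_deg": rng.uniform(20.0, 80.0),
        "light_intensity": rng.uniform(0.3, 3.0),
        # Camera jitter around a nominal mounting pose.
        "camera_offset_m": [rng.uniform(-0.05, 0.05) for _ in range(3)],
        "camera_fov_deg": rng.uniform(50.0, 70.0),
    }

# One fresh visual domain per episode.
domain = sample_visual_domain()
```

In a full pipeline each sampled configuration would be handed to the renderer before the episode begins; only the sampling logic is shown here.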
The relationship between rendering quality and transfer performance is not monotonic. Alghonaim and Liarokapis [2] conducted a systematic benchmark of visual DR design choices and found a quality-quantity tradeoff: a small number of high-fidelity rendered images consistently outperformed a large number of low-quality images. This finding refines the diversity principle. Photorealism is not necessary, but rendering quality is not irrelevant, and computational budgets should prioritize fidelity over sheer dataset size when both cannot be maximized. Procedural environment generation at scale, as in ProcTHOR [20], offers a middle path by algorithmically generating diverse but structured environments that maintain geometric plausibility while varying broadly in layout, texture, and lighting.
Sadeghi and Levine [69] provided an early and influential demonstration that purely simulated visual data, with no real images whatsoever, could train collision-avoidance policies for indoor drone navigation that transferred zero-shot to real flight. Their CAD2RL result established the viability of the synthetic-only training paradigm for navigation well before the broader adoption of visual DR. The practical implication across this body of work is that visual DR design requires balancing coverage breadth against per-sample quality, a tradeoff that remains underspecified by theory and is typically resolved through empirical tuning [2, 20]. Object-aware GAN adaptation [67] further shows that pairing visual DR with targeted region-aware translation can preserve task-relevant appearance while randomizing nuisance factors.
3.2 Dynamics Domain Randomization
Dynamics randomization, varying physical simulation parameters during training, addresses the complementary problem of physics modeling inaccuracy. Peng et al. [61] established the modern dynamics DR paradigm by demonstrating that randomizing masses, friction, damping, and actuator gains produced pushing policies that transferred successfully to real robots, even when no single parameter setting matched reality. This work, together with the visual DR results of Tobin et al. [79], established the two pillars of the DR framework that subsequent work has refined [4, 1, 43, 28].
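The core mechanic is resampling physical parameters at every episode reset, typically as multiplicative perturbations of nominal values. The sketch below is in the spirit of Peng et al. [61]; the parameter names, nominal values, and ranges are illustrative.

```python
import random

# Nominal physical parameters with multiplicative randomization ranges.
# Names and values are illustrative, not taken from any specific robot.
NOMINAL = {"link_mass_kg": 1.2, "ground_friction": 0.8,
           "joint_damping": 0.05, "motor_gain": 1.0}
RANGES = {"link_mass_kg": (0.7, 1.3), "ground_friction": (0.5, 1.5),
          "joint_damping": (0.5, 2.0), "motor_gain": (0.8, 1.2)}

def sample_dynamics(rng=random):
    """Resample the physics of each training episode, so that no single
    parameter setting has to match the real robot exactly."""
    return {k: NOMINAL[k] * rng.uniform(*RANGES[k]) for k in NOMINAL}

# One draw per episode reset; the policy must succeed across all of them.
episode_params = sample_dynamics()
```

A training loop would pass each draw to the simulator's physics configuration before rolling out the episode.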
The question of which parameters to randomize has yielded surprising findings. Exarchos et al. [24] challenged the conventional practice of carefully measuring kinematic parameters while randomizing dynamic ones, showing that randomizing kinematic parameters (link lengths, joint offsets) yields more robust transfer than randomizing dynamic parameters alone. Geometric uncertainty may therefore be a more impactful gap source than dynamic uncertainty for many tasks. The early work of Jakobi [34] anticipated this selectivity by proposing minimal simulation with targeted noise injection: rather than exhaustively randomizing all parameters, identify only the sensorimotor features that can be reliably simulated and inject noise specifically into those features. This principle of surgical randomization has been independently rediscovered in several contemporary studies [22, 16, 6].
The question of how much to randomize received theoretical grounding from Chen et al. [16], who proved that history-dependent policies are not merely a practical heuristic but a theoretically necessary ingredient for DR to achieve tight performance bounds. Their analysis demonstrates that the sim-to-real gap under DR is provably reducible under mild conditions even without real-world samples, but only when the policy architecture has sufficient memory to implicitly identify the current domain from interaction history. This formal result retroactively explains the empirical success of recurrent and transformer-based policy architectures in DR-heavy training regimes and motivated the widespread adoption of observation histories as policy inputs [43, 44, 54].
3.3 Automatic Domain Randomization
Manual specification of randomization ranges scales poorly to complex environments. Akkaya et al. [1] introduced Automatic Domain Randomization (ADR), which progressively expands parameter ranges during training based on policy performance thresholds. When the policy achieves a target success rate under the current range, ADR automatically widens it; when performance drops, ADR narrows it. Applied to Rubik's cube solving with a Shadow Dexterous Hand, arguably the most complex sim-to-real manipulation result to date, ADR enabled transfer of a policy trained entirely in simulation to solve the cube on physical hardware. ADR eliminates the need for manual range engineering and naturally produces a curriculum from easy (narrow randomization) to hard (broad randomization), improving both training stability and final transfer performance.
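The expand-or-shrink rule can be sketched per parameter. The thresholds, step size, and midpoint guard below are illustrative choices, not the values used by Akkaya et al. [1].

```python
class ADRRange:
    """ADR-style range controller in the spirit of Akkaya et al. [1]:
    widen a parameter's range when the policy succeeds at the range
    boundary, narrow it when the policy fails there. All constants
    are illustrative."""

    def __init__(self, low, high, step=0.05, widen_at=0.8, narrow_at=0.5):
        self.low, self.high = low, high
        self.step = step
        self.widen_at = widen_at    # boundary success rate above which we expand
        self.narrow_at = narrow_at  # boundary success rate below which we shrink

    def update(self, boundary_success_rate):
        """Update the range given the success rate measured at its boundary."""
        if boundary_success_rate >= self.widen_at:
            self.low -= self.step
            self.high += self.step
        elif boundary_success_rate <= self.narrow_at:
            # Never collapse the range past its midpoint.
            mid = 0.5 * (self.low + self.high)
            self.low = min(self.low + self.step, mid)
            self.high = max(self.high - self.step, mid)
        return self.low, self.high

friction = ADRRange(low=0.9, high=1.1)
friction.update(0.95)  # policy is comfortable at the boundary: range widens
friction.update(0.30)  # policy struggles: range narrows again
```

In the full algorithm one such controller runs per randomized parameter, and boundary success rates are estimated from dedicated evaluation episodes pinned to the range edges.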
Mehta et al. [53] proposed Active Domain Randomization, which uses a discriminator network to identify simulation parameters where the policy is most likely to fail and concentrates randomization effort on those parameters. Rather than uniformly expanding all ranges as in ADR, active DR allocates randomization budget where it matters most, producing policies that are robust specifically where the gap is largest. Auto-tuned sim-to-real transfer [91] connects these curricular approaches with automated calibration, jointly searching over the randomization schedule and the policy parameters. Both strategies echo the broader principle that surgical randomization outperforms brute-force approaches [34, 24].
3.4 Calibrating Randomization Distributions
A critical limitation of standard DR is that randomization ranges are typically chosen ad hoc. Too narrow a distribution fails to cover the real world, while too broad a distribution produces an overly conservative policy. Several principled calibration strategies have emerged.
Bayesian system identification frames simulator calibration as posterior inference over physical parameters. Ramos et al. [66] introduced BayesSim, which uses likelihood-free inference to estimate posterior distributions over simulation parameters from small sets of real-world trajectories, providing both point estimates and calibrated uncertainty bounds. Tiboni et al. [78] pursued a complementary approach with DROPO, fitting simulation parameter distributions to maximize the likelihood of offline real trajectories and enabling zero-shot transfer by centering the randomization distribution on the true physical parameters. Both methods convert the manual range-engineering problem into a principled statistical inference problem, though they require some real-world data, typically between five and fifty trajectories.
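The shared mechanic of these calibration methods, weighting candidate simulator parameters by how well their rollouts reproduce real trajectories, can be sketched in a toy likelihood-free form. This is a simplified illustration of the idea behind BayesSim [66] and DROPO [78], not either method's actual inference machinery; the Gaussian kernel width and the grid of candidates are illustrative.

```python
import math

def fit_parameter_posterior(real_trajs, simulate, candidates, sigma=0.1):
    """Toy likelihood-free calibration: weight each candidate parameter
    by how closely its simulated rollout matches real trajectories,
    then summarize the weighted distribution.

    real_trajs: list of (actions, real_states) pairs from the real robot.
    simulate:   stand-in simulator call, simulate(theta, actions) -> states.
    candidates: grid of candidate parameter values to score.
    Returns a (mean, std) summary of the fitted distribution.
    """
    weights = []
    for theta in candidates:
        err = 0.0
        for actions, real_states in real_trajs:
            sim_states = simulate(theta, actions)
            err += sum((s - r) ** 2 for s, r in zip(sim_states, real_states))
        # Gaussian kernel: small trajectory error -> large weight.
        weights.append(math.exp(-err / (2 * sigma ** 2)))
    z = sum(weights) or 1.0
    w = [wi / z for wi in weights]
    mean = sum(wi * t for wi, t in zip(w, candidates))
    var = sum(wi * (t - mean) ** 2 for wi, t in zip(w, candidates))
    return mean, math.sqrt(var)
```

The fitted mean and spread would then center and scale the randomization distribution used for the next round of policy training.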
Closed-loop simulation-to-real methods iteratively alternate between policy training in simulation and parameter estimation from real-world trials. Chebotar et al. [14] demonstrated this approach with SimOpt, using real-world rollout data to update the distribution over simulation parameters, which in turn guides the next round of policy optimization. This iterative refinement converges to parameter distributions that are well-calibrated to the specific deployment environment, achieving better transfer than static randomization ranges while requiring only modest real-world interaction. Digital-twin approaches [92] extend this closed loop by maintaining a persistent real-robot calibration database that guides simulator configuration across tasks.
Online fine-tuning takes a different philosophy: train with broad DR to obtain a robust initialization, then fine-tune on limited real-world data to close the residual gap [55, 43, 82]. Josifovski et al. [38] extended this with safe continual adaptation, combining safe reinforcement learning constraints with continual learning during deployment to track non-stationary dynamics without catastrophic forgetting. The practical reality that deployed environments are never stationary motivates this shift from one-time calibration to continuous refinement.
3.5 Domain Adaptation Across Representations
While DR achieves robustness by training across a distribution of environments, domain adaptation methods explicitly align the statistical properties of simulated and real observations, typically through learned image transformations or feature-space alignment. Generative adversarial approaches have been particularly influential. Bousmalis et al. [7] demonstrated GraspGAN, which uses GAN-based pixel-level adaptation to translate simulated grasping images toward realistic appearance, enabling training of grasping policies on adapted synthetic data that transfer to real robots. James et al. [36] proposed Randomized-to-Canonical Adaptation Networks (RCAN), which invert the usual adaptation direction. Rather than making simulation look real, they train a network to map both randomized simulation images and real images to a shared canonical representation. This sim-to-sim-to-real approach elegantly combines DR (to provide input diversity for the adaptation network) with DA (to align the canonical space) and achieves sim-to-real transfer of visuomotor manipulation policies with no real-world training.
Müller et al. [56] applied a similar philosophy to autonomous driving, using domain adaptation to translate between simulated and real driving scenes. Rao et al. [67] combined reinforcement learning with cycle-consistent image translation in RL-CycleGAN, jointly optimizing the policy and the domain translator. RetinaGAN [93] added object-awareness to the pixel-level translator so that task-relevant instances remain perceptually consistent across the translated images. The consistent finding across these works is that explicit visual alignment can substantially reduce the visual sim-to-real gap, at the cost of additional architectural complexity and the risk that the adaptation network introduces artifacts that degrade policy performance.
3.6 Representation-Based Adaptation
A more recent trend moves beyond pixel-level alignment toward learning representations that are inherently domain invariant. Ma [50] proposed spectral skill representations in which spectral decomposition of the simulator Markov decision process yields basis functions (skills) spanning the Q-function space. Real-world deployment then discovers orthogonal complement skills from limited real data, which by construction capture only the dynamics discrepancy. This provides a principled adaptation mechanism that avoids the brute-force coverage of DR while requiring less real-world data than full fine-tuning.
Dense object descriptors offer a complementary approach for visual manipulation. Cao [12] demonstrated that learning a domain-invariant dense descriptor space with explicit cross-domain pixel consistency constraints enables zero-training generalization to unseen objects and visual environments, extending the Dense Object Nets line of work [94, 95] into the sim-to-real setting. Similarly, point cloud observations serve as a naturally domain-invariant input modality, because 3D geometric structure is more consistent between simulation and reality than visual appearance, so policies trained on point clouds transfer more readily than those trained on RGB images [64, 87]. R3M [96] and related pretrained visual representations provide another path by learning domain-invariant features from large real-world video corpora, then freezing those features as backbones for simulation-trained policies. These representation-centric strategies reflect a broader shift from making the data transferable (DR, DA) to making the representation transferable, connecting sim-to-real research to the broader transfer learning community.
A unifying view. DR, DA, and representation-based adaptation are three points on a single continuum. DR covers the real world in data space; DA aligns source and target in pixel space; representation-based methods align in a learned feature space. Each trades a different axis of cost (sample count, model complexity, representation expressiveness) for a different axis of robustness, and practical systems increasingly combine all three.
4. Sim-to-Real for Locomotion
Locomotion has served as one of the most active proving grounds for sim-to-real methods, driven by the convergence of capable reinforcement learning algorithms, accurate actuator models, and commercially available legged platforms. This section traces the progression from early system-identification-based approaches through the domain randomization revolution to the current state of the art in agile, terrain-adaptive locomotion.
4.1 System Identification and Actuator Modeling
Early sim-to-real work for locomotion emphasized minimizing the gap through careful system identification. Tan et al. [77] demonstrated that systematic motor modeling, specifically fitting an accurate actuator model from hardware data, combined with careful simulation tuning, enabled transfer of agile locomotion gaits to the Minitaur robot. Their approach of training in a well-calibrated simulator and then deploying with latency compensation achieved remarkably robust transfer, establishing system identification as a viable standalone strategy for platforms with well-characterized dynamics.
Hwangbo et al. [32] extended this approach by training a neural network actuator model (the "actuator net") from real motor data, which was then integrated into the simulator as a drop-in replacement for the idealized actuator model. When combined with reinforcement learning training, this produced agile and dynamic locomotion behaviors on the ANYmal quadruped that transferred reliably to hardware. The actuator net concept demonstrated that learning the sim-to-real gap, rather than engineering it away through physics equations, could be both more accurate and more general than analytical models.
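The actuator-net recipe is to log commands and measured torques from the real motor, fit a model from a short history of position errors and velocities to output torque, and drop that model into the simulator in place of the ideal torque equation. The sketch below uses a linear model and plain gradient descent as a stand-in for the small MLP of Hwangbo et al. [32]; the feature window and training setup are illustrative.

```python
import random

def actuator_features(pos_err_hist, vel_hist):
    """Features for an actuator-net-style model: a short history of
    position-tracking errors and joint velocities. The three-step window
    here is illustrative."""
    return pos_err_hist[-3:] + vel_hist[-3:]

class TinyActuatorModel:
    """Linear stand-in for a learned actuator net: maps features to output
    torque, trained on (features, torque) pairs logged from real hardware."""

    def __init__(self, n_features=6):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, data, lr=0.01, epochs=200):
        """Per-sample gradient descent on squared torque error."""
        for _ in range(epochs):
            for x, tau in data:
                err = self.predict(x) - tau
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err

# Smoke test of the training loop on a synthetic (noise-free) torque law.
random.seed(0)
TRUE_W = [2.0, -0.5, 0.3, 0.0, 1.0, -1.0]
DATA = []
for _ in range(60):
    x = [random.uniform(-1.0, 1.0) for _ in range(6)]
    DATA.append((x, sum(w * xi for w, xi in zip(TRUE_W, x)) + 0.1))

model = TinyActuatorModel()
model.fit(DATA, lr=0.05, epochs=300)
```

Once trained, `model.predict` would replace the simulator's analytical torque computation at every control step.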
Bjelonic et al. [6] identified actuator-specific energy losses, namely electrical and mechanical dissipation in permanent magnet synchronous motors, as a distinct, previously underappreciated gap component. Integrating a first-principles motor energy model into the reinforcement learning reward simultaneously closed this gap and improved real-world energy efficiency by 32 percent as measured by cost of transport. This illustrates a general principle: modeling a specific physical phenomenon not only improves transfer fidelity but can unlock performance benefits that generic robustness strategies cannot capture.
4.2 Domain Randomization in Locomotion
The breakthrough in scaling sim-to-real locomotion came from combining domain randomization with teacher-student training architectures. Lee et al. [44] trained a teacher policy with access to privileged terrain information in simulation, then distilled it into a student policy that used only proprioceptive observations and an observation history. The history implicitly encoded terrain properties through the policy's experience. Combined with extensive DR over terrain types and dynamics parameters, this teacher-student pipeline enabled a quadruped to traverse challenging terrains (stairs, slopes, gaps) that no single simulation setting could have prepared it for.
Kumar et al. [43] formalized the implicit system identification happening through observation histories with Rapid Motor Adaptation (RMA), which explicitly decomposes the policy into a base policy and an adaptation module. The adaptation module estimates a latent encoding of the current environment dynamics from a short history of proprioceptive observations, which modulates the base policy's behavior. Trained with DR in simulation, RMA enables rapid online adaptation to novel terrains, payloads, and motor degradation in the real world, all without explicit system identification or fine-tuning. This architecture has become arguably the most influential sim-to-real framework in locomotion, with numerous subsequent works adopting its core design [54, 17, 65].
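At deployment time the RMA decomposition reduces to a rolling observation window feeding an adaptation module whose latent output modulates the base policy. The sketch below shows that control flow; the callables, the history length, and the toy stand-in networks are illustrative, not the architecture of Kumar et al. [43].

```python
class RMAController:
    """Deployment-time decomposition in the style of RMA [43]: a base
    policy modulated by a latent extrinsics estimate, produced online by
    an adaptation module from a short proprioceptive history. Both
    components are passed in as callables; in the real system they are
    small neural networks trained in simulation."""

    def __init__(self, base_policy, adaptation_module, history_len=50):
        self.base_policy = base_policy              # pi(obs, z_hat) -> action
        self.adaptation_module = adaptation_module  # phi(history) -> z_hat
        self.history_len = history_len
        self.history = []

    def act(self, obs):
        # Maintain a rolling window of recent proprioceptive observations.
        self.history.append(obs)
        self.history = self.history[-self.history_len:]
        # Implicit system identification: z_hat summarizes the current
        # dynamics (payload, friction, motor state) from the history.
        z_hat = self.adaptation_module(self.history)
        return self.base_policy(obs, z_hat)

# Toy scalar stand-ins: the "latent" is the history mean, and the base
# policy subtracts it from the current observation.
controller = RMAController(
    base_policy=lambda obs, z: obs - z,
    adaptation_module=lambda hist: sum(hist) / len(hist),
    history_len=4,
)
actions = [controller.act(o) for o in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```

The key structural point is that no explicit system identification or fine-tuning happens online; adaptation is a single forward pass through the adaptation module at every control step.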
The combination of massively parallel simulation and DR enabled dramatic scaling. Rudin et al. [68] demonstrated that locomotion policies could be learned in minutes using Isaac Gym's GPU-parallel simulation, with the resulting policies transferring directly to hardware. Miki et al. [54] extended this to perceptive locomotion, combining exteroceptive sensing (elevation maps from depth cameras) with proprioceptive adaptation to enable robust navigation over diverse terrains including snow, mud, and urban environments. The key insight is that proprioceptive adaptation handles dynamics uncertainty, while exteroceptive perception handles geometric uncertainty. This decomposition has become standard practice. Recent benchmarks comparing sim-to-real quadruped locomotion on Isaac-family stacks [23, 97] reaffirm that the RMA-style teacher-student template transfers across hardware platforms with modest retuning.
4.3 Agile and Extreme Locomotion
Recent work has pushed sim-to-real locomotion toward increasingly athletic behaviors. Zhuang et al. [90] and Cheng et al. [17] demonstrated parkour-capable quadrupeds that can jump over obstacles, climb walls, and leap across gaps, with behaviors learned entirely in simulation through curriculum learning with DR and transferred zero-shot to hardware. These results suggest that the combination of large-scale DR, curriculum design, and sufficient policy capacity can produce even highly dynamic behaviors that transfer reliably, though the curriculum design itself remains a manual engineering challenge.
In the aerial domain, Loquercio et al. [48] demonstrated sim-to-real transfer of agile drone flight policies through gates and obstacle courses at speeds exceeding human pilot performance. Kaufmann et al. [40] achieved champion-level autonomous drone racing, with simulation-trained policies competing against and defeating world-class human pilots. Both works relied on privileged teacher-student training combined with DR, confirming that the architectural patterns developed for ground locomotion generalize to aerial platforms, albeit with domain-specific gap sources (aerodynamic effects, motor response curves). Dedicated quadrotor benchmarks such as Learning to Fly in Seconds [98] further compress the simulation time budget.
For humanoid and bipedal locomotion the gap remains more challenging due to higher dimensionality and more complex contact patterns. Radosavovic et al. [65] demonstrated that simulation-trained transformer policies can enable humanoid walking in the real world, though the behavioral repertoire remains more limited than for quadrupeds. Siekmann et al. [72] achieved sim-to-real transfer for the Cassie bipedal robot using reward shaping and DR. Bao [5] and Kim [41] provide diagnostic taxonomies identifying four primary gap sources for bipedal locomotion (dynamics modeling, contact modeling, state estimation, solver discrepancies), with Kim's controlled ablation study establishing a prioritized ranking of individual technique importance. Singh [74] trained a robust biped locomotion policy via systematic DR combined with reference-free reward shaping, and Nai et al. [58] recently proposed a robot-free humanoid manipulation interface that feeds IMU-captured human motion to the humanoid, bypassing both simulation and teleoperation for the upper body.
4.4 Real-World Learning Infrastructure
An emerging approach addresses the gap not through better simulation but through better real-world infrastructure. Hu et al. [30] proposed using a secondary robotic system (a robotic arm) as physical teacher infrastructure that provides safety support, perturbations, reward signals, and automatic resets for humanoid locomotion learning. This "robot trains robot" paradigm enables direct real-world policy training with minimal human intervention, complementing simulation-based approaches with a pathway to ground-truth dynamics. Smith et al. [75] similarly demonstrated that real-world reinforcement learning for quadruped locomotion is tractable with appropriate safety infrastructure and reset mechanisms. As robotic platforms become more reliable, the role of simulation may shift from training substrate to initialization source, with real-world learning closing the final gap after deployment.
5. Sim-to-Real for Navigation
Navigation presents distinctive sim-to-real challenges because it involves sustained interaction with large-scale, visually complex environments where sensor fidelity and scene diversity are the dominant gap sources, rather than the contact dynamics that challenge locomotion and manipulation. Early visual-servoing work on AR.Drone platforms [83] and task-level grounding studies in urban search and rescue [13] established the problem well before the current learning-centric formulations.
5.1 The Visual Navigation Gap
For visual navigation the primary gap source is the discrepancy between rendered and real visual observations. Sadeghi and Levine [69] demonstrated that this gap can be crossed with sufficient visual diversity. Their CAD2RL approach trained collision-avoidance policies entirely on synthetic images with no real-world data, transferring zero-shot to indoor drone flight. The key enabler was massive visual randomization of textures, objects, and lighting rather than photorealistic rendering.
Kadian et al. [39] systematically evaluated the sim-to-real predictivity of the Habitat simulator for PointGoal navigation, finding that relative performance rankings between methods in simulation were largely preserved in real-world deployment, even when absolute performance differed. This result, that simulation is a reliable ordinal predictor of real-world performance even if it is not an accurate cardinal predictor, has important implications for using simulation as a development tool. Researchers can trust that methods ranking better in simulation will rank better in reality, even if the precise transfer gap is unpredictable.
The sim-to-real gap for Vision-and-Language Navigation (VLN) introduces qualitatively different challenges. Anderson et al. [3] identified a discrete-to-continuous action space mismatch as a distinct gap source. Policies trained with high-level discrete navigation actions in simulation (turn left, go forward) cannot directly command real actuators, requiring an intermediate waypoint model. This structural gap is orthogonal to visual and dynamics gaps, and highlights that sim-to-real transfer involves not only physical fidelity but also architectural alignment between training and deployment interfaces.
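The waypoint bridge described above can be sketched as two small functions: one mapping a discrete VLN action to a metric waypoint, and one proportional controller tracking that waypoint. The step sizes follow common VLN conventions but are illustrative here, as is the controller; this is not the model proposed by Anderson et al. [3].

```python
import math

def discrete_to_waypoint(action, pose, step_m=0.25, turn_deg=15.0):
    """Map a high-level discrete navigation command to a metric waypoint
    (x, y, heading) that a continuous low-level controller can track."""
    x, y, heading = pose
    if action == "turn_left":
        return x, y, heading + math.radians(turn_deg)
    if action == "turn_right":
        return x, y, heading - math.radians(turn_deg)
    if action == "go_forward":
        return (x + step_m * math.cos(heading),
                y + step_m * math.sin(heading),
                heading)
    return pose  # "stop" or unknown action: hold position

def waypoint_to_velocity(pose, waypoint, k_lin=1.0, k_ang=2.0):
    """Simple proportional controller producing (v, omega) commands toward
    the waypoint; a deployed system would use the robot's own tracking
    stack instead."""
    x, y, heading = pose
    wx, wy, wheading = waypoint
    dist = math.hypot(wx - x, wy - y)
    bearing = math.atan2(wy - y, wx - x) if dist > 1e-6 else wheading
    # Wrap the angular error into (-pi, pi].
    ang_err = math.atan2(math.sin(bearing - heading),
                         math.cos(bearing - heading))
    return k_lin * dist, k_ang * ang_err
```

The structural point is that the discrete policy never commands actuators directly; the waypoint layer absorbs the action-space mismatch between the simulator's interface and the robot's.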
5.2 Simulation Platforms and Scene Diversity
The development of large-scale, photorealistic simulation platforms has been a major enabler for navigation research. Habitat [70] reconstructs real indoor environments from 3D scans, providing visually realistic navigation environments. AI2-THOR [42] and its extension RoboTHOR [19] offer interactive environments with manipulable objects. ProcTHOR [20] addresses scene diversity through procedural generation of thousands of house-scale environments, significantly expanding the distributional coverage of training environments. Earlier multimodal indoor simulators such as MINOS [99] and high-fidelity procedural pipelines for mobile robots [100] informed this progression.
These platforms have enabled a progression from simple PointGoal navigation toward semantically rich tasks. ObjectGoal navigation requires navigating to an instance of an object category, ImageGoal navigation requires reaching a location depicted in a goal image, and Vision-and-Language Navigation requires following natural-language instructions. Each task type introduces additional gap sources: ObjectGoal requires recognizing real instances of categories encountered only in simulation, ImageGoal requires visual matching across domain differences, and VLN requires grounding language in real perceptual experience. The trend is toward increasingly complex navigation tasks where the sim-to-real gap is multifaceted and irreducible to a single source.
5.3 Privileged Information and Teacher-Student Methods
A powerful strategy for bridging the navigation gap leverages simulation's unique advantage, access to ground-truth information unavailable in the real world. Cai [11] proposed NavDP, which trains a critic on contrastive trajectory samples supervised by privileged labels (ground-truth occupancy maps, full scene geometry) to distill spatial understanding into the policy, enabling zero-shot transfer with only RGB-D observations at inference time. This teacher-student approach, using simulation's perfect information as a training signal while deploying with partial observations, mirrors the architecture that transformed locomotion [44, 43] and has become a standard design pattern.
Chen et al. [15] demonstrated that training a privileged teacher for navigation with access to the full floor plan, then distilling into a student with only egocentric observations, substantially outperforms end-to-end training of the student alone. The distillation process effectively compresses global spatial understanding into local observation patterns, producing policies that behave as if they understand the global environment structure despite perceiving only locally. Gervet et al. [26] combined this privileged-information paradigm with classical mapping modules for real-world object navigation, delivering one of the first science-style demonstrations of modular sim-to-real navigation at home-robot scale.
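The distillation step reduces to supervised regression of the student's actions onto the privileged teacher's actions on teacher-visited states. A minimal linear sketch, in which the teacher's mapping, the observation split, and the learning rate are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def teacher_policy(full_state):
    """Privileged teacher: sees the full 4-dim simulation state
    (e.g. floor plan features). A fixed linear map, for illustration."""
    W_t = np.array([[1.0, -0.5, 0.3, 0.0]])
    return full_state @ W_t.T

# Student observes only the first two dims (egocentric observation).
W_s = np.zeros((1, 2))

for _ in range(500):                        # distillation loop
    full_state = rng.normal(size=(32, 4))   # states visited under the teacher
    obs = full_state[:, :2]                 # deployment-available observation
    target = teacher_policy(full_state)     # privileged action labels
    pred = obs @ W_s.T
    grad = 2 * (pred - target).T @ obs / len(obs)  # MSE gradient
    W_s -= 0.1 * grad

# W_s converges to the observable part of the teacher's map, [1.0, -0.5];
# the unobservable dimensions act as irreducible label noise.
```

The same structure holds with deep networks in place of the linear maps: the student cannot recover what its observations do not carry, but it compresses everything recoverable into deployment-available inputs.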
5.4 Deployment Extremes
Practical deployment introduces additional gap dimensions underrepresented in the academic literature. Pylypenko [63] demonstrated that INT8 quantization of reinforcement-learning-trained navigation policies enables deployment on bare microcontrollers (ESP32) with less than 2.5 percent accuracy loss and sub-millisecond inference, showing that sim-to-real transfer and severe compute constraints can be jointly satisfied. At the opposite extreme, modern visual navigation transformers such as ViNT [71] leverage large-scale pretraining on diverse navigation data to learn representations that generalize across environments, potentially reducing the sim-to-real gap through the same mechanisms that make foundation models domain robust.
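The quantization idea behind such microcontroller deployments can be sketched with symmetric per-tensor INT8 quantization of the policy weights; real toolchains additionally quantize activations, calibrate scales on data, and fuse operations:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization of a weight matrix."""
    scale = float(np.abs(w).max()) / 127.0   # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=(64, 32)).astype(np.float32)  # toy policy layer
q, scale = quantize_int8(w)
max_err = float(np.abs(dequantize(q, scale) - w).max())  # bounded by scale / 2
```

The 4x memory reduction and integer arithmetic are what make inference feasible on a microcontroller-class target; the bounded per-weight error explains why the accuracy loss can stay small.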
An alternative to gap reduction is gap avoidance. Bruce et al. [10] demonstrated that grounding the training world model in a single real-world sensor traversal and applying stochastic augmentation can provide sufficient coverage for zero-shot transfer, sidestepping the sim-to-real gap entirely by never constructing a synthetic environment. While limited in scalability (it requires at least one physical traversal per environment), this approach offers a valuable complement in settings where simulation fidelity is particularly poor or where rapid deployment in a specific environment is the priority. The historical rescue robotics literature [13, 101] anticipated this grounded approach by emphasizing that physically traversed maps beat generic indoor simulators for targeted field tasks.
6. Sim-to-Real for Manipulation
Manipulation tasks impose distinctive transfer challenges. Precise contact interactions, complex and variable object geometries, and the requirement for accurate perception of potentially novel objects jointly determine the transfer surface. The sim-to-real manipulation literature has developed specialized strategies spanning visual perception transfer, dexterous skill transfer, contact-rich assembly, and emerging paradigms that reshape the simulation-to-reality relationship [62].
6.1 Visual Perception Transfer
A large body of manipulation transfer work focuses on bridging the visual gap for perception components. The progression from sparse to dense visual representations illustrates the field's maturation. Keypoint-based approaches train detectors to localize semantically meaningful points on robots or objects. Lu et al. [49] demonstrated sim-to-real transfer of marker-less keypoint detectors for robot pose estimation, trained entirely on synthetic data, with the notable innovation of optimizing keypoint placement in simulation to maximize detection robustness. KOVIS [102] extended this to keypoint-based visual servoing with zero-shot transfer across manipulators.
Dense pixel-level descriptors represent a more expressive alternative. Cao [12] showed that domain-invariant dense descriptor spaces with cross-domain consistency constraints enable zero-real-training generalization to unseen objects, addressing the per-object limitation of keypoint methods. Dense Object Nets [94] established the feature-space paradigm, with subsequent work extending it to multi-step pick-and-place [95], multi-object grasping [103], and cluttered picking [104]. Self-supervised sim-to-real visual adaptation [105] offers an alternative that trains on unlabeled real data through reconstruction and contrastive objectives.
Ding et al. [21] extended the synthetic-only training paradigm from visual to tactile modalities, demonstrating that soft-body simulation of a deformable TacTip sensor can train contact geometry inference networks that transfer to real hardware with no real data. The wider tactile sensing literature, spanning GelSight [106], BioTac-style vibration sensing [107], FingerVision [108], and finite-element force supervision [109], collectively establishes tactile simulation as a viable training substrate. This cross-modal extension suggests that the DR framework is not fundamentally limited to vision. Any approximately simulable sensor modality is amenable to synthetic training, and recent surveys of sim-to-real manipulation [62] highlight tactile transfer as one of the fastest-growing subareas.
Rather than adapting visual representations, an alternative strategy selects input modalities that are inherently domain invariant. Qin et al. [64] proposed DexPoint, using point cloud inputs for dexterous manipulation. Because 3D geometric structure is more consistent across domains than visual appearance, point cloud policies transfer directly without adaptation. Ze et al. [87] extended this to 3D Diffusion Policy, combining point cloud representations with denoising diffusion architectures for more expressive and generalizable manipulation policies. The tradeoff is that point clouds require depth sensing and discard appearance information, but for geometric manipulation tasks the exchange is highly favorable.
6.2 Dexterous Manipulation at Scale
The most ambitious demonstrations of DR-based sim-to-real manipulation come from OpenAI's series of works. Andrychowicz et al. [4] achieved sim-to-real transfer of in-hand object reorientation on the Shadow Dexterous Hand through massive DR of physical and visual parameters combined with distributed reinforcement learning and no human demonstrations. Human-like manipulation strategies (finger gaiting, multi-finger coordination, gravity exploitation) emerged naturally from reward maximization, suggesting that sufficiently randomized simulation can discover contact-rich skills without imitation signals. Akkaya et al. [1] extended this to Rubik's cube solving with Automatic Domain Randomization, demonstrating that ADR's progressive range expansion can handle the extreme complexity of multi-step, multi-finger manipulation.
These results are important not only technically but for their implication about the sufficiency of DR at scale. With broad enough randomization and sufficient policy capacity, emergent dexterous behaviors can arise without explicit modeling of the specific phenomena that produce them. However, both required enormous computational resources, raising questions about accessibility and scalability. Handa et al. [28] (DeXtreme) advanced dexterous manipulation sim-to-real further using Isaac Gym's GPU parallelism, demonstrating that GPU-accelerated simulation can reduce the computational barrier while maintaining transfer quality for in-hand reorientation tasks.
Subsequent work extended DR-based manipulation transfer to broader domains. Matas et al. [52] demonstrated transfer for deformable object manipulation (cloth folding), where explicit physical modeling is impractical. Zhang L. [88] combined DR with task-specific data augmentation for model-free grasping of novel complex-shaped objects. Dong et al. [22] showed that geometric primitives (cylinders, ellipsoids) trained with targeted DR transfer cleanly to industrial picking. GraspNet-1Billion [110] and related large-scale grasping benchmarks [111] now provide standardized evaluation platforms for DR-trained grasping policies. These extensions confirm that the DR paradigm generalizes beyond the rigid-body, fixed-geometry settings where it was first validated.
6.3 Hybrid and Residual Approaches
End-to-end learned manipulation policies face particularly severe gaps in contact-rich tasks because contact dynamics are among the least accurately simulated phenomena. Residual policy learning addresses this by training a learned component to correct a classical baseline, limiting the scope of what must transfer. Johannink et al. [37] and Silver et al. [73] established residual reinforcement learning as a general paradigm, showing that learning a residual correction on top of a hand-designed controller can overcome both the limitations of the controller (inability to handle complex dynamics) and the limitations of pure reinforcement learning (sample inefficiency, poor transfer).
Yoneda et al. [86] demonstrated this in the Real Robot Challenge for dexterous manipulation with the TriFinger platform. Zhang X. [89] extended residual learning specifically to compliance control, showing that online residual learning of admittance parameters using force feedback bridges the sim-to-real gap for assembly tasks. Their two-phase approach (offline reinforcement learning with DR, then lightweight online residual adaptation) achieves robust transfer without extensive real-world data. Narang et al. [59] demonstrated this principle at industrial scale with Factory, using residual learning for contact-rich assembly in simulation with transfer to real peg-insertion and gear-meshing. The common principle is scope limitation. By constraining the learned component to correct a classical baseline rather than generate the full control signal, the magnitude of what must transfer is reduced and the classical component provides a safety floor.
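The scope-limitation principle can be sketched directly: the deployed command is a classical baseline plus a bounded learned correction. The controller gain, the `theta` placeholder for learned weights, and the `residual_scale` bound below are all illustrative:

```python
import numpy as np

def base_controller(state, target):
    """Classical proportional baseline: the safety floor that the
    learned component cannot override, only nudge."""
    return 2.0 * (target - state)

def residual_policy(state, target, theta):
    """Learned correction; `theta` stands in for trained weights."""
    return theta * np.tanh(target - state)

def act(state, target, theta, residual_scale=0.2):
    """Deployed command: baseline plus a bounded learned residual.
    Only the residual has to survive the sim-to-real gap."""
    return base_controller(state, target) \
        + residual_scale * residual_policy(state, target, theta)
```

Because `tanh` saturates and `residual_scale` is small, the learned term is bounded regardless of how badly it transfers, which is exactly the safety-floor argument made above.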
6.4 Imitation Learning and the Foundation Model Era
A parallel development has reshaped manipulation transfer through large-scale imitation learning. While not sim-to-real transfer in the classical sense, these methods intersect with the sim-to-real literature through their use of simulation for data augmentation, policy initialization, and evaluation.
Diffusion-based policies have emerged as a powerful architecture for manipulation. Chi et al. [18] introduced Diffusion Policy, which models action generation as iterative denoising, achieving strong performance on contact-rich manipulation. While the original work focused on real-world imitation, the diffusion architecture has been adopted for sim-to-real pipelines [87, 11] because its multimodal action distribution naturally accommodates the variability introduced by domain randomization. Generalist robot manipulation frameworks beyond action-labeled data [112] further blur the sim-to-real boundary by training action-free video policies and composing them with simulation-trained controllers.
The RT series from Google DeepMind [8, 9] demonstrated that scaling robot data and model capacity can produce manipulation policies with broad generalization. RT-2 [9] showed that vision-language models pretrained on internet-scale data can serve as manipulation policy backbones, potentially providing domain-invariant representations that reduce the visual sim-to-real gap. The Open X-Embodiment collaboration [60] extended this to cross-embodiment transfer, training on data from 22 different robot types, a form of "embodiment randomization" that echoes domain randomization at a much larger scale.
6.5 Real-to-Sim-to-Real
An emerging paradigm inverts the traditional transfer direction by constructing task-specific simulations from real-world observations. Torne et al. [82] demonstrated that building personalized digital twins from small amounts of real-world scan data, then using reinforcement learning fine-tuning in those digital twins, achieves manipulation robustness exceeding both pure imitation and generic DR. This approach collapses the sim-to-real gap by construction. The simulation is the real world, up to reconstruction accuracy. Industrial digital-twin pipelines [92] generalize this idea to structured factory environments.
Nai et al. [58] pushed gap avoidance further by bypassing simulation entirely for humanoid manipulation, retargeting human demonstrations directly to robot skills via IMU-based motion capture. Robot See Robot Do [113] demonstrated an analogous pipeline that imitates articulated object manipulation from monocular 4D reconstructions of a human demonstrator. While limited to demonstrable tasks, these approaches eliminate the sim-to-real gap by eliminating the simulation, representing a philosophically distinct alternative to simulation-centric approaches.
Manipulation gap decomposition. Visual gap reduction (DR, DA, descriptors, point clouds) targets perception. Residual learning and SysID target contact dynamics. Real-to-sim-to-real and human-demo retargeting bypass simulation for the remainder. Ambitious manipulation systems now combine all three axes, and the open question is how to allocate engineering effort across them for a given platform and task.
7. Sim-to-Real for Mobile Manipulation
Mobile manipulation, the class of tasks requiring coordinated locomotion and manipulation, such as navigating to an object and grasping it, represents the frontier of sim-to-real transfer. This domain inherits the gap sources of both navigation (visual appearance, scene diversity) and manipulation (contact dynamics, object variability), and introduces additional challenges from their interaction. Early aerial mobile manipulation work [114] and reactive mobile-manipulation planners [115] anticipated these compounding challenges before the current learning pipelines matured.
7.1 The Compounding Gap Problem
The fundamental challenge of mobile manipulation sim-to-real is that errors compound across subsystems. A visual navigation error that positions the robot imprecisely leads to an out-of-distribution manipulation scenario; a manipulation failure that changes the robot's payload affects subsequent navigation dynamics. This error compounding means that methods achieving adequate transfer for navigation alone and manipulation alone may fail when composed, because neither subsystem was trained to handle the distribution shifts introduced by the other's real-world imperfections.
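A back-of-envelope calculation illustrates why modular per-stage evaluation overstates end-to-end success (all numbers are invented for illustration):

```python
# Per-stage success rates measured in isolation:
p_nav = 0.90      # navigation reaches the object
p_grasp = 0.90    # grasping from a well-positioned base

independent = p_nav * p_grasp   # what modular evaluation predicts: 0.81

# Coupled reality: "successful" navigation still leaves base-pose
# error, which puts grasping out of distribution some of the time.
p_pose_ok = 0.70      # nav successes that leave a precise base pose
p_grasp_off = 0.50    # grasp success from an imprecise base pose
coupled = p_nav * (p_pose_ok * p_grasp + (1 - p_pose_ok) * p_grasp_off)
# coupled is about 0.70, well below the 0.81 modular estimate
```

The gap between the two numbers is entirely due to the interaction term, which is invisible when each subsystem is evaluated alone.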
Szot et al. [76] introduced Habitat 2.0 as the first large-scale simulation platform explicitly targeting mobile manipulation, providing a virtual home assistant that must navigate through realistic indoor environments, approach target objects, and manipulate them. This platform revealed that monolithic end-to-end policies for mobile manipulation are extremely difficult to train even in simulation, due to the enormous state-action space and the need to coordinate navigation and manipulation behaviors over long horizons.
7.2 Modular Versus End-to-End Architectures
The architectural choice between modular and end-to-end approaches has distinct sim-to-real implications. Modular systems decompose the task into navigation, approach, and manipulation stages, each with its own policy or controller, connected through handoff conditions. This decomposition enables each module to be trained, transferred, and debugged independently, and allows mixing simulation-trained learned modules with classical controllers [26]. The cost is that interfaces between modules must be carefully designed, and errors at handoff points can cascade.
Gu et al. [27] demonstrated a multi-skill mobile manipulation framework that uses a skill library of pretrained manipulation and navigation primitives, composed by a higher-level task planner. Each skill is trained with DR in simulation and transferred independently, with composition happening at the symbolic level. Yokoyama et al. [85] evaluated sim-to-real mobile manipulation for home robotics, finding that modular architectures with explicit state machines for mode switching (navigate, orient, grasp, place) achieved more reliable real-world transfer than end-to-end alternatives, though at the cost of reduced behavioral flexibility. Sim-to-real case studies on general-purpose platforms such as TIAGo [116] reinforce the observation that modular decompositions simplify debugging when gap sources are heterogeneous.
End-to-end approaches avoid the interface design problem but face a harder transfer challenge. Xia et al. [84] benchmarked interactive navigation (navigation requiring physical interaction with the environment, such as opening doors) as a stepping stone toward full mobile manipulation, finding that policies must learn coordinated base and arm motions that are highly sensitive to dynamics parameters. Li et al. [45] introduced BEHAVIOR-1K, a benchmark of 1,000 everyday activities requiring mobile manipulation, highlighting the enormous diversity of scenarios that a general mobile manipulation system must handle, and by extension the diversity of sim-to-real gaps it must bridge.
7.3 Learning from Human Demonstrations
Demonstration-based approaches have shown particular promise for mobile manipulation, where the reward engineering and curriculum design challenges of reinforcement learning are amplified by the long-horizon, multi-stage nature of the tasks. Fu et al. [25] introduced Mobile ALOHA, a system for bimanual mobile manipulation that learns from human teleoperation demonstrations. While Mobile ALOHA operates primarily in the real world (with co-training on diverse manipulation data for generalization), it highlights a trend toward collecting demonstrations that inherently bridge the reality gap. The data comes from the real robot in the real world, and simulation's role shifts from training substrate to data augmentation and evaluation.
The intersection of mobile manipulation with large language and vision-language models has opened new transfer pathways. High-level task planning through language models [31, 46] can decompose complex mobile manipulation instructions into sequences of primitive skills, each of which can be individually transferred from simulation. This hierarchical approach effectively limits the scope of sim-to-real transfer to short-horizon skill execution while delegating long-horizon coordination to foundation models that generalize through language rather than through physics.
7.4 State of the Art and Open Limits
Mobile manipulation sim-to-real remains substantially less mature than its constituent subfields. The literature is characterized by system-level demonstrations [85, 27] rather than the systematic ablation studies and theoretical analyses available for locomotion [41] and domain randomization [16]. Most published results use relatively simple manipulation primitives (pick, place, push) rather than the dexterous behaviors achieved in manipulation-only sim-to-real work [4, 1]. The gap between what simulation-trained policies can do when navigation and manipulation are separately optimized versus jointly executed remains large, and principled methods for joint transfer, rather than modular composition, are largely absent. Benchmark suites now emerging in this space [117] offer the first shared evaluation harness.
8. Cross-Cutting Analysis
8.1 The Fidelity and Diversity Tradeoff
Across all task domains a central tension recurs. Should the sim-to-real gap be closed by increasing simulation fidelity, making simulation more like reality, or by increasing training diversity, making the policy robust to any reality? This is not a binary choice but a continuum, and the optimal operating point varies by task, gap source, and available resources.
For visual transfer, the diversity principle [79, 69] and the quality-quantity tradeoff [2] bracket this continuum. For dynamics transfer, the parallel is between high-fidelity actuator modeling [77, 32, 6] and broad parameter randomization [61, 43]. The intentionally minimal simulation philosophy [33] occupies an extreme, arguing that less fidelity can improve transfer by preventing overfitting to simulation artifacts.
A resolution is emerging. Targeted fidelity improvements for specific, known gap sources combine productively with broad robustness training for poorly characterized parameters. Bjelonic et al. [6] model actuator losses precisely while randomizing terrain and payload. Kumar et al. [43] use an adaptation module to handle unmodeled dynamics while training with careful reward design for the modeled components. The field is converging on this hybrid strategy, though principled methods for deciding which parameters to model and which to randomize remain underdeveloped [24, 16, 34].
| Gap source | Primary strategy | Representative works |
|---|---|---|
| Actuator dynamics | High-fidelity SysID, actuator net, energy modeling | [77], [32], [6] |
| Contact dynamics | Residual learning, Factory-style contact simulation | [73], [37], [59], [89] |
| Visual appearance | Visual DR, GAN adaptation, canonical representations | [79], [7], [36], [67] |
| Terrain or payload | Dynamics DR plus adaptation module | [44], [43], [54] |
| Object novelty | Point clouds, dense descriptors, 3D diffusion | [64], [12], [87] |
| Action space mismatch | Waypoint models, skill decomposition | [3], [27], [85] |
| Non-stationary deployment | Continual adaptation, safe online learning | [38], [30], [43] |
| Full environment mismatch | Real-to-sim-to-real, grounded traversals, human demo retargeting | [82], [10], [58], [113] |
8.2 The Teacher-Student-Adaptation Template
Perhaps the most impactful methodological development of the review period is the crystallization of the teacher-student-adaptation architecture as a near-universal design pattern. The pattern has three components. A privileged teacher is trained with ground-truth simulation state [44, 15]. A sensorimotor student is distilled from the teacher using only deployment-available observations [44, 54]. An online adaptation module infers environment parameters from interaction history [43, 16].
This architecture has been adopted across locomotion [43, 54, 17], navigation [11, 15], and aerial robotics [48, 40]. Its success stems from a clean separation of concerns. The teacher encodes task knowledge without perceptual constraints, the distillation process handles the observation gap, and the adaptation module handles the dynamics gap. This decomposition makes each component independently improvable and diagnosable, a significant advantage over monolithic architectures where the source of transfer failure is difficult to localize.
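The pattern's third component can be sketched as supervised regression in simulation: the adaptation module learns to predict the privileged environment latent from state-action history alone. Everything below, including the linear encoder, the "history signature" matrix, and all dimensions, is illustrative rather than any specific published architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Phase-1 artifact: encoder from privileged env parameters e
# (e.g. friction, payload) to the latent z the policy consumes.
E = rng.normal(size=(2, 3))

def env_encoder(e):
    return e @ E.T

# How env parameters imprint on state-action history features
# (in reality this mapping is implicit in the dynamics):
H = rng.normal(size=(6, 3))

# Phase 2: regress the latent from history alone, which is all
# that exists at deployment time.
A = np.zeros((2, 6))
for _ in range(3000):
    e = rng.uniform(-1, 1, size=(16, 3))              # sampled env params
    hist = e @ H.T + 0.01 * rng.normal(size=(16, 6))  # simulated history
    z = env_encoder(e)                                # privileged target
    pred = hist @ A.T
    A -= 0.03 * (2 * (pred - z).T @ hist / len(hist)) # MSE gradient step
```

At deployment the privileged encoder is discarded; the adaptation module alone supplies the latent, which is why the policy can respond to friction or payload changes it never observes directly.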
8.3 From Zero-Shot to Continual Adaptation
The ambition level of sim-to-real transfer has evolved substantially. Early work focused on zero-shot transfer [79, 69, 61]. The recognition that residual gaps are inevitable led to few-shot adaptation methods using limited real data for calibration [78, 66] or fine-tuning [14]. The frontier has now moved to continual adaptation, meaning ongoing learning during deployment that tracks non-stationary conditions [38, 43, 89, 30].
This trajectory suggests the field is converging on a view of sim-to-real transfer not as a one-time event but as an ongoing process, with simulation providing initialization and the real world providing continuous refinement. The infrastructure for real-world learning includes safe exploration mechanisms [38], automated reset systems [30, 75], and lightweight adaptation modules [43]. This infrastructure is becoming as important as the simulation itself.
8.4 Simulation Infrastructure as a Research Instrument
The role of simulation platforms has evolved from passive training environments to active research tools. The progression from MuJoCo [81] and earlier CPU simulators to GPU-accelerated platforms like Isaac Gym [51] enabled the massively parallel training that made DR at scale computationally feasible [68]. For navigation, the development of Habitat [70], AI2-THOR [42], ProcTHOR [20], and MINOS [99] enabled training in visually rich, diverse environments. For mobile manipulation, Habitat 2.0 [76] and BEHAVIOR-1K [45] provide the first comprehensive benchmarks. Soft-body and soft-robot simulators [118, 119, 120] have extended the infrastructure envelope to deformable manipulation, while vehicle and industrial simulators [121, 100] address domain-specific deployment targets.
This infrastructure development has a compounding effect. Better simulators reduce the gap, which makes simulation-based methods more effective, which attracts more users, which drives further simulator development. The dual-simulator methodology of Ligot and Birattari [47] uses one simulator as a proxy for reality and provides a reproducible evaluation framework, but the field still lacks standardized physical benchmarks with documented gap characteristics.
8.5 Methodological Limitations
Several methodological weaknesses span the surveyed literature. First, ablation studies isolating individual technique contributions are rare. Kim [41] is a notable exception, but most papers evaluate complete pipelines holistically. Second, standardized benchmarks for sim-to-real transfer are lacking. Each paper uses different hardware, simulators, and task definitions, making direct comparison extremely difficult [117, 62]. Third, failure modes are underreported. Negative results would be highly informative but are seldom published. Fourth, the generalization claims of many works are tested on only one or two hardware platforms, making it unclear whether the methods or merely the specific sim-real pairs transfer. Fifth, the emerging foundation-model approaches [9, 60] blur the boundary between sim-to-real transfer and broad generalization, making it increasingly difficult to attribute performance improvements to transfer methods versus model scale.
9. Open Problems and Future Directions
9.1 Principled Parameter Partitioning
The finding that kinematic randomization outperforms dynamic randomization for some tasks [24], and that targeted actuator modeling outperforms generic robustness for others [6], suggests the need for principled methods to decide which parameters to model precisely and which to randomize. Future work should investigate information-theoretic or sensitivity-analysis-based approaches to automatically partition simulation parameters into "model accurately" and "randomize broadly" categories, conditioned on the specific task and available calibration data. The Bayesian system identification methods [66, 78] provide a starting point, but connecting posterior uncertainty directly to randomization range selection remains an open problem. Recent work on understanding DR [16] and on intentional noise injection [34] offers initial theoretical scaffolding for this partitioning question.
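One simple instantiation of such partitioning is finite-difference sensitivity analysis: perturb each simulation parameter, measure the induced performance change, and identify precisely only the high-sensitivity parameters. A toy sketch, with an invented performance model and made-up parameter names:

```python
def sensitivity(perf_fn, nominal, deltas):
    """Finite-difference sensitivity of task performance to each
    simulation parameter (one-sided, illustrative)."""
    base = perf_fn(nominal)
    scores = {}
    for name, d in deltas.items():
        p = dict(nominal)
        p[name] += d
        scores[name] = abs(perf_fn(p) - base) / abs(d)
    return scores

def partition(scores, threshold):
    """High-sensitivity params: identify precisely. Low: randomize."""
    identify = [k for k, s in scores.items() if s >= threshold]
    randomize = [k for k, s in scores.items() if s < threshold]
    return identify, randomize

# Toy performance model: success depends strongly on motor gain,
# weakly on surface friction (numbers invented for illustration).
def perf(p):
    return 1.0 - 4.0 * abs(p["motor_gain"] - 1.0) - 0.2 * abs(p["friction"] - 0.7)

scores = sensitivity(perf, {"motor_gain": 1.0, "friction": 0.7},
                     {"motor_gain": 0.05, "friction": 0.05})
identify, randomize = partition(scores, threshold=1.0)
# identify -> ["motor_gain"], randomize -> ["friction"]
```

In practice `perf_fn` would be expensive simulated rollouts and the threshold would depend on calibration budget, but the structure of the decision is the same.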
9.2 Unified Benchmarks
The absence of standardized sim-to-real benchmarks makes it impossible to rank methods or track field-wide progress. Future work should develop benchmark suites with standardized hardware platforms, precisely characterized gap sources, and established evaluation protocols covering zero-shot, few-shot, and continual adaptation settings. The dual-simulator methodology [47] provides one template, but physical benchmarks with open-source robot configurations, calibrated gap measurements, and shared result databases would be transformative. Recent efforts like the Real Robot Challenge [86] and emerging policy-evaluation benchmarks [117] demonstrate the value of standardized physical evaluation, but broader adoption is needed.
9.3 Safe and Sample-Efficient Continual Adaptation
The transition from one-time transfer to continual adaptation [38, 30, 43] raises critical safety and efficiency questions. Future work should develop formal safety guarantees that bound worst-case performance during adaptation, investigate catastrophic forgetting prevention mechanisms tailored to non-stationary deployment dynamics, and establish sample-efficiency benchmarks quantifying how quickly and safely different adaptation mechanisms respond to distribution shifts. The integration of safe reinforcement learning with continual learning [38] is a promising direction but remains at an early stage.
9.4 Foundation Models as Transfer Bridges
Large-scale visual and multimodal foundation models present an underexplored opportunity for sim-to-real transfer. Representations pretrained on internet-scale data may serve as domain-invariant bridges between simulation and reality, complementing or replacing task-specific representation learning [50, 12, 64]. Early results from RT-2 [9] and related work suggest that vision-language model representations do reduce the visual domain gap, but systematic evaluation of when and why foundation model representations help sim-to-real transfer, and when they introduce their own biases, is needed. Action-free video pretraining [112] and universal manipulation representations [96] point to the same frontier from different angles.
9.5 Joint Transfer for Mobile Manipulation
The most conspicuous gap in the current literature is the scarcity of principled methods for joint sim-to-real transfer in mobile manipulation. While navigation and manipulation transfer are each well studied, their joint transfer introduces compounding error dynamics, coordinated base-arm control challenges, and changing visual viewpoints that are not addressed by treating the problems independently. Future work should develop integrated transfer pipelines with joint randomization strategies that capture the interaction between locomotion dynamics and manipulation contact, rather than relying on modular composition of independently transferred components.
9.6 Quantifying and Predicting Transferability
Most current methods determine transfer success post hoc by deploying on real hardware. Developing predictive measures of transferability, meaning estimating expected real-world performance from simulation-only evaluation, would be transformative. The theoretical analysis of Chen et al. [16] provides initial tools for bounding the DR gap, and sim-to-real predictivity studies [39] show that ordinal predictions are often reliable. Extending these to practical cardinal predictions that account for specific gap sources, policy architectures, and randomization distributions remains open. Emerging benchmark-oriented work [117] offers a concrete evaluation substrate for this question.
9.7 Soft Bodies, Compliant Robots, and New Morphologies
Soft robot sim-to-real [118, 119, 120] is an especially unforgiving test bed because continuum mechanics, hyperelastic materials, and pneumatic dynamics are all imperfectly simulated. Analogous challenges arise in compliant mobile manipulation [122] and robotic fish or bioinspired platforms [123]. These domains motivate sim-to-real methods that fuse learned surrogate models with targeted randomization, and broader surveys of contact-rich manipulation [124] suggest that the same toolkit will be needed whenever rigid-body simulators fundamentally misspecify the physics.
10. Conclusion
This survey has reviewed the methods and frameworks proposed between 2018 and 2026 to bridge the simulation-to-reality gap in robotics for locomotion, navigation, manipulation, and mobile manipulation. The field has progressed from ad hoc domain randomization to a sophisticated methodological landscape encompassing principled randomization calibration [78, 66, 1], teacher-student adaptation architectures [44, 43], representation-based transfer [50, 64], real-to-sim-to-real paradigms [82], and continual adaptation [38]. The most effective approaches combine targeted simulation fidelity for well-understood gap sources with broad robustness training for poorly characterized ones, a hybrid philosophy that has proven remarkably consistent across task domains.
The single most important takeaway is that the sim-to-real gap is not a monolithic problem but a composite of distinct, diagnosable sources (dynamics, sensors, visual appearance, actuators, action spaces, computational constraints, unmodeled phenomena), each best addressed by a different methodological tool. The most effective transfer pipelines diagnose the dominant gap sources for their specific task and apply targeted solutions, rather than relying on any single technique as a universal remedy. As simulation infrastructure matures, adaptation architectures improve, and foundation models provide increasingly domain-invariant representations, the field is moving toward a future where the gap is not eliminated but managed: continuously diagnosed, targeted, and adapted to throughout a robot's operational lifetime.
Citation
If you find this survey useful, please cite it as:
@misc{sim2real_survey_2026,
  author    = {Hu Tianrun},
  title     = {Bridging the Simulation-to-Reality Gap},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://h-tr.github.io/blog/surveys/sim-to-real-gap.html}
}
References
- Akkaya, I., Andrychowicz, M., Chociej, M., et al. (2019). “Solving Rubik's Cube with a Robot Hand.” arXiv:1910.07113.
- Alghonaim, R., & Liarokapis, M. (2020). “Benchmarking Domain Randomisation for Visual Sim-to-Real Transfer.” arXiv:2011.07112.
- Anderson, P., Shrivastava, A., Truong, J., et al. (2020). “Sim-to-Real Transfer for Vision-and-Language Navigation.” Conference on Robot Learning.
- Andrychowicz, M., Baker, B., Chociej, M., et al. (2019). “Learning Dexterous In-Hand Manipulation.” The International Journal of Robotics Research, 39(1), 3–20.
- Bao, L. (2025). “Sim-to-Real Transfer in Deep Reinforcement Learning for Bipedal Locomotion.” arXiv preprint.
- Bjelonic, F., Lee, J., Arm, P., et al. (2025). “Towards Bridging the Gap. Systematic Sim-to-Real Transfer for Diverse Legged Robots.” arXiv:2509.06342.
- Bousmalis, K., Irpan, A., Wohlhart, P., et al. (2018). “Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping.” ICRA, 4243–4250.
- Brohan, A., Brown, N., Carbajal, J., et al. (2022). “RT-1. Robotics Transformer for Real-World Control at Scale.” arXiv:2212.06817.
- Brohan, A., Brown, N., Carbajal, J., et al. (2023). “RT-2. Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” arXiv:2307.15818.
- Bruce, J., Sünderhauf, N., Mirowski, P., et al. (2017). “One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay.” arXiv:1711.10137.
- Cai, W. (2025). “NavDP. Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance.” arXiv:2505.08712.
- Cao, H.-G. (2023). “Learning Sim-to-Real Dense Object Descriptors for Robotic Manipulation.” arXiv:2304.08703.
- Carpin, S. (2007). “Bridging the Gap Between Simulation and Reality in Urban Search and Rescue.” Lecture Notes in Computer Science.
- Chebotar, Y., Handa, A., Makoviychuk, V., et al. (2019). “Closing the Sim-to-Real Loop. Adapting Simulation Randomization with Real World Experience.” ICRA, 8973–8979.
- Chen, C., Gupta, S., Salakhutdinov, R., & Gupta, A. (2020). “Learning to Explore using Active Neural SLAM.” ICLR.
- Chen, X., Hu, J., Jin, C., et al. (2021). “Understanding Domain Randomization for Sim-to-real Transfer.” arXiv:2110.03239.
- Cheng, X., Shi, K., Agarwal, A., & Pathak, D. (2024). “Extreme Parkour with Legged Robots.” ICRA.
- Chi, C., Feng, S., Du, Y., et al. (2023). “Diffusion Policy. Visuomotor Policy Learning via Action Diffusion.” Robotics: Science and Systems.
- Deitke, M., Han, W., Herrasti, A., et al. (2020). “RoboTHOR. An Open Simulation-to-Real Embodied AI Platform.” CVPR, 3164–3174.
- Deitke, M., VanderBilt, E., Herrasti, A., et al. (2022). “ProcTHOR. Large-Scale Embodied AI Using Procedural Generation.” NeurIPS.
- Ding, Z., Lepora, N. F., & Johns, E. (2020). “Sim-to-Real Transfer for Optical Tactile Sensing.” arXiv:2004.00136.
- Dong, H., Chen, H., & Wang, W. (2022). “Robotic Manipulations of Cylinders and Ellipsoids by Ellipse Detection With Domain Randomization.” IEEE/ASME Transactions on Mechatronics, 27(5), 3467–3477.
- Dowdy, J. (2025). “Isaac Sim-to-Real. Reinforcement Learning based Locomotion for Quadrupeds.” Technical report.
- Exarchos, I., Jiang, Y., Yu, W., & Liu, C. K. (2020). “Policy Transfer via Kinematic Domain Randomization and Adaptation.” arXiv:2011.01891.
- Fu, Z., Zhao, T. Z., & Finn, C. (2024). “Mobile ALOHA. Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation.” arXiv:2401.02117.
- Gervet, T., Chintala, S., Batra, D., et al. (2023). “Navigating to Objects in the Real World.” Science Robotics, 8(79).
- Gu, J., Kirmani, S., Wohlhart, P., et al. (2023). “Multi-Skill Mobile Manipulation for Object Rearrangement.” arXiv preprint.
- Handa, A., Allshire, A., Makoviychuk, V., et al. (2023). “DeXtreme. Transfer of Agile In-Hand Manipulation from Simulation to Reality.” ICRA.
- Höfer, S., Bekris, K., Handa, A., et al. (2021). “Sim2Real in Robotics and Automation. Applications and Challenges.” IEEE Transactions on Automation Science and Engineering, 18(2), 398–400.
- Hu, K. (2025). “Robot Trains Robot. Automatic Real-World Policy Adaptation and Learning for Humanoids.” arXiv:2508.12252.
- Huang, W., Wang, C., Zhang, R., et al. (2023). “VoxPoser. Composable 3D Value Maps for Robotic Manipulation with Language Models.” Conference on Robot Learning.
- Hwangbo, J., Lee, J., Dosovitskiy, A., et al. (2019). “Learning Agile and Dynamic Motor Skills for Legged Robots.” Science Robotics, 4(26), eaau5872.
- Jakobi, N. (1998). “Running Across the Reality Gap. Octopod Locomotion Evolved in a Minimal Simulation.” European Conference on Artificial Life.
- Jakobi, N. (1995). “Noise and the Reality Gap. The Use of Simulation in Evolutionary Robotics.” Lecture Notes in Computer Science.
- James, S., Davison, A., & Johns, E. (2017). “Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task.” Conference on Robot Learning.
- James, S., Wohlhart, P., Kalakrishnan, M., et al. (2019). “Sim-to-Real via Sim-to-Sim. Data-Efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks.” CVPR, 12627–12637.
- Johannink, T., Bahl, S., Nair, A., et al. (2019). “Residual Reinforcement Learning for Robot Control.” ICRA, 6023–6029.
- Josifovski, J., Urakami, Y., Kerzel, M., & Knoll, A. (2025). “Safe Continual Domain Adaptation after Sim2Real Transfer of Reinforcement Learning Policies in Robotics.” arXiv:2503.10949.
- Kadian, A., Truong, J., Gokaslan, A., et al. (2020). “Sim2Real Predictivity. Does Evaluation in Simulation Predict Real-World Performance?” IEEE Robotics and Automation Letters, 5(4), 6670–6677.
- Kaufmann, E., Bauersfeld, L., Loquercio, A., et al. (2023). “Champion-level Drone Racing using Deep Reinforcement Learning.” Nature, 620, 982–987.
- Kim, D. (2024). “Bridging the Reality Gap. Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion.” IEEE Robotics & Automation Magazine.
- Kolve, E., Mottaghi, R., Han, W., et al. (2017). “AI2-THOR. An Interactive 3D Environment for Visual AI.” arXiv:1712.05474.
- Kumar, A., Fu, Z., Pathak, D., & Malik, J. (2021). “RMA. Rapid Motor Adaptation for Legged Robots.” Robotics: Science and Systems.
- Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., & Hutter, M. (2020). “Learning Quadrupedal Locomotion over Challenging Terrain.” Science Robotics, 5(47), eabc5986.
- Li, C., Gokmen, C., Srivastava, S., et al. (2023). “BEHAVIOR-1K. A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation.” Conference on Robot Learning.
- Liang, J., Huang, W., Xia, F., et al. (2023). “Code as Policies. Language Model Programs for Embodied Control.” ICRA.
- Ligot, A., & Birattari, M. (2019). “Simulation-only Experiments to Mimic the Effects of the Reality Gap in the Automatic Design of Robot Swarms.” Swarm Intelligence, 14, 1–24.
- Loquercio, A., Kaufmann, E., Ranftl, R., Mueller, M., Koltun, V., & Scaramuzza, D. (2021). “Learning High-Speed Flight in the Wild.” Science Robotics, 6(59), eabg5810.
- Lu, J., Richter, F., & Yip, M. C. (2020). “Pose Estimation for Robot Manipulators via Keypoint Optimization and Sim-to-Real Transfer.” arXiv:2010.08054.
- Ma, H. (2024). “Skill Transfer and Discovery for Sim-to-Real Learning. A Representation-Based Viewpoint.” arXiv:2404.05051.
- Makoviychuk, V., Wawrzyniak, L., Guo, Y., et al. (2021). “Isaac Gym. High Performance GPU-Based Physics Simulation for Robot Learning.” NeurIPS Datasets and Benchmarks.
- Matas, J., James, S., & Davison, A. J. (2018). “Sim-to-Real Reinforcement Learning for Deformable Object Manipulation.” arXiv:1806.07851.
- Mehta, B., Diaz, M., Golemo, F., Pal, C. J., & Paull, L. (2020). “Active Domain Randomization.” Conference on Robot Learning.
- Miki, T., Lee, J., Hwangbo, J., et al. (2022). “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild.” Science Robotics, 7(62), eabk2822.
- Moiz, M. J. (2026). “Bridging the Sim-to-Real Gap in Robotic Manipulation Using Domain Randomization and Policy Fine-Tuning.” Zenodo.
- Müller, M., Dosovitskiy, A., Ghanem, B., & Koltun, V. (2018). “Driving Policy Transfer via Modularity and Abstraction.” Conference on Robot Learning.
- Muratore, F., Ramos, F., Turk, G., et al. (2022). “Robot Learning from Randomized Simulations. A Review.” Frontiers in Robotics and AI, 9, 799893.
- Nai, R. (2026). “Humanoid Manipulation Interface. Humanoid Whole-Body Manipulation from Robot-Free Demonstrations.” arXiv:2602.06643.
- Narang, Y., Sundaralingam, B., Macklin, M., et al. (2022). “Factory. Fast Contact for Robotic Assembly.” Robotics: Science and Systems.
- Open X-Embodiment Collaboration. (2024). “Open X-Embodiment. Robotic Learning Datasets and RT-X Models.” ICRA.
- Peng, X. B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018). “Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.” ICRA, 3803–3810.
- Pitkevich, A. V. (2024). “A Survey on Sim-to-Real Transfer Methods for Robotic Manipulation.” Technical report.
- Pylypenko, V. (2025). “Reinforcement Learning for Autonomous Navigation of Robotic Platforms Under Uncertainty. Domain Randomization and Sim-to-Real Transfer.” Technologies and Engineering.
- Qin, Y., Huang, B., Yin, Z.-H., Su, H., & Wang, X. (2022). “DexPoint. Generalizable Point Cloud Reinforcement Learning for Sim-to-Real Dexterous Manipulation.” arXiv:2211.09423.
- Radosavovic, I., Xiao, T., Zhang, B., Darrell, T., Malik, J., & Sreenath, K. (2024). “Real-World Humanoid Locomotion with Reinforcement Learning.” Science Robotics, 9(89).
- Ramos, F., Possas, R. C., & Fox, D. (2019). “BayesSim. Adaptive Domain Randomization via Probabilistic Inference for Robotics Simulators.” Robotics: Science and Systems.
- Rao, K., Harris, C., Irpan, A., et al. (2020). “RL-CycleGAN. Reinforcement Learning Aware Simulation-to-Real.” CVPR, 11157–11166.
- Rudin, N., Hoeller, D., Reist, P., & Hutter, M. (2022). “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.” Conference on Robot Learning.
- Sadeghi, F., & Levine, S. (2017). “CAD2RL. Real Single-Image Flight without a Single Real Image.” Robotics: Science and Systems.
- Savva, M., Kadian, A., Maksymets, O., et al. (2019). “Habitat. A Platform for Embodied AI Research.” ICCV, 9339–9347.
- Shah, D., Sridhar, A., Dashora, N., et al. (2023). “ViNT. A Foundation Model for Visual Navigation.” Conference on Robot Learning.
- Siekmann, J., Green, K., Warila, J., Fern, A., & Hurst, J. (2021). “Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition.” ICRA, 7309–7315.
- Silver, T., Allen, K., Tenenbaum, J., & Kaelbling, L. (2018). “Residual Policy Learning.” arXiv:1812.06298.
- Singh, R. (2024). “Robust Biped Locomotion Through Sim-to-Real Reinforcement Learning.” IRDB.
- Smith, L., Kostrikov, I., & Levine, S. (2022). “A Walk in the Park. Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning.” Robotics: Science and Systems.
- Szot, A., Clegg, A., Undersander, E., et al. (2021). “Habitat 2.0. Training Home Assistants to Rearrange their Habitat.” NeurIPS.
- Tan, J., Zhang, T., Coumans, E., et al. (2018). “Sim-to-Real. Learning Agile Locomotion For Four-Legged Robots.” Robotics: Science and Systems.
- Tiboni, G., Arndt, K., & Kyrki, V. (2022). “DROPO. Sim-to-Real Transfer with Offline Domain Randomization.” arXiv:2201.08434.
- Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.” IROS, 23–30.
- Tobin, J., Biewald, L., Duan, R., et al. (2018). “Domain Randomization and Generative Models for Robotic Grasping.” IROS, 3482–3489.
- Todorov, E., Erez, T., & Tassa, Y. (2012). “MuJoCo. A Physics Engine for Model-Based Control.” IROS, 5026–5033.
- Torne, M., Simeonov, A., Li, Z., et al. (2024). “Reconciling Reality through Simulation. A Real-to-Sim-to-Real Approach for Robust Manipulation.” arXiv:2403.03949.
- Visser, A., et al. (2012). “Closing the Gap Between Simulation and Reality in the Sensor and Motion Models of an Autonomous AR.Drone.” UvA-DARE.
- Xia, F., Li, C., Chen, K., et al. (2020). “Interactive Gibson Benchmark. A Benchmark for Interactive Navigation in Cluttered Environments.” IEEE Robotics and Automation Letters, 5(2), 713–720.
- Yokoyama, N., Ha, S., Batra, D., Wang, J., & Bucher, B. (2024). “Adaptive Skill Coordination for Robotic Mobile Manipulation.” arXiv preprint.
- Yoneda, T., Sun, C., & Walter, M. R. (2021). “Grasp and Motion Planning for Dexterous Manipulation for the Real Robot Challenge.” arXiv:2101.02842.
- Ze, Y., Zhang, G., Zhang, K., et al. (2024). “3D Diffusion Policy. Generalizable Visuomotor Policy Learning via Simple 3D Representations.” Robotics: Science and Systems.
- Zhang, L. (2023). “Towards Precise Model-free Robotic Grasping with Sim-to-Real Transfer Learning.” arXiv:2301.12249.
- Zhang, X. (2023). “Efficient Sim-to-real Transfer of Contact-Rich Manipulation Skills with Online Admittance Residual Learning.” arXiv:2310.10509.
- Zhuang, Z., Fu, Z., Wang, J., Atkeson, C., Schwertfeger, S., Finn, C., & Zhao, H. (2023). “Robot Parkour Learning.” Conference on Robot Learning.
- Du, Y., Watkins, O., Darrell, T., Abbeel, P., & Pathak, D. (2021). “Auto-Tuned Sim-to-Real Transfer.” ICRA.
- Wang, Y., et al. (2022). “A Digital Twin-Based Sim-to-Real Transfer for Deep Reinforcement Learning-Enabled Industrial Robot Grasping.” Robotics and Computer-Integrated Manufacturing.
- Ho, D., Rao, K., Xu, Z., Jang, E., Khansari, M., & Bai, Y. (2021). “RetinaGAN. An Object-aware Approach to Sim-to-Real Transfer.” ICRA.
- Florence, P. R., Manuelli, L., & Tedrake, R. (2018). “Dense Object Nets. Learning Dense Visual Object Descriptors By and For Robotic Manipulation.” Conference on Robot Learning.
- Yen-Chen, L., Zeng, A., Song, S., Isola, P., & Lin, T.-Y. (2019). “Multi-step Pick-and-Place Tasks Using Object-centric Dense Correspondences.” IROS.
- Nair, S., Rajeswaran, A., Kumar, V., Finn, C., & Gupta, A. (2022). “R3M. A Universal Visual Representation for Robot Manipulation.” arXiv:2203.12601.
- Prakash, A. (2024). “Sim-to-Real Gap in RL. Use Case with TIAGo and Isaac Sim/Gym.” arXiv:2403.07091.
- Eschmann, J., Albani, D., & Loianno, G. (2024). “Learning to Fly in Seconds.” IEEE Robotics and Automation Letters.
- Savva, M., Chang, A. X., Dosovitskiy, A., Funkhouser, T., & Koltun, V. (2017). “MINOS. Multimodal Indoor Simulator for Navigation in Complex Environments.” arXiv:1712.03931.
- Malinov, M. (2024). “Advancing Behavior Generation in Mobile Robotics through High-Fidelity Procedural Simulations.” arXiv:2405.16818.
- Balakirsky, S., & Messina, E. (2006). “High Fidelity Tools for Rescue Robotics. Results and Perspectives.” Lecture Notes in Computer Science.
- Puang, E. Y., Tee, K. P., & Jing, W. (2020). “KOVIS. Keypoint-based Visual Servoing with Zero-Shot Sim-to-Real Transfer for Robotics Manipulation.” IROS.
- Chen, Y., et al. (2021). “Learning Multi-Object Dense Descriptor for Autonomous Goal-Conditioned Grasping.” IEEE Robotics and Automation Letters.
- Hu, H., et al. (2022). “Reinforcement Learning for Picking Cluttered General Objects with Dense Object Descriptors.” IEEE Transactions on Cognitive and Developmental Systems.
- Zhang, X., et al. (2020). “Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation.” ICRA.
- Dong, S., Yuan, W., & Adelson, E. H. (2017). “Improved GelSight Tactile Sensor for Measuring Geometry and Slip.” IROS.
- Fishel, J. A., & Loeb, G. E. (2012). “Sensing Tactile Microvibrations with the BioTac. Comparison with Human Sensitivity.” BioRob.
- Yamaguchi, A., & Atkeson, C. G. (2018). “FingerVision Tactile Sensor Design and Slip Detection Using Convolutional LSTM Network.” arXiv preprint.
- Narang, Y., et al. (2019). “Ground Truth Force Distribution for Learning-Based Tactile Sensing. A Finite Element Approach.” IEEE Access.
- Fang, H.-S., Wang, C., Gou, M., & Lu, C. (2020). “GraspNet-1Billion. A Large-Scale Benchmark for General Object Grasping.” CVPR.
- Xiang, Y., et al. (2020). “Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation.” Conference on Robot Learning.
- Liu, Z. (2025). “Generalist Robot Manipulation beyond Action Labeled Data.” arXiv:2509.19958.
- Kerr, J., Kim, C. M., Wu, M., et al. (2024). “Robot See Robot Do. Imitating Articulated Object Manipulation with Monocular 4D Reconstruction.” arXiv:2409.18121.
- Kim, S., Choi, S., & Kim, H. J. (2020). “Aerial Mobile Manipulator System to Enable Dexterous Manipulations with Increased Precision.” arXiv:2010.09618.
- Vasilopoulos, V., et al. (2020). “Reactive Planning for Mobile Manipulation Tasks in Unexplored Semantic Environments.” arXiv:2011.00642.
- Jiang, X. (2025). “Robot Policy Evaluation for Sim-to-Real Transfer. A Benchmarking Perspective.” arXiv:2508.11117.
- Collins, J., et al. (2020). “Scalable Sim-to-Real Transfer of Soft Robot Designs.” IEEE International Conference on Soft Robotics.
- Ranzani, T., et al. (2022). “Efficient Jacobian-Based Inverse Kinematics With Sim-to-Real Transfer of Soft Robots by Learning.” IEEE/ASME Transactions on Mechatronics.
- Fang, Z. (2025). “Bridging High-Fidelity Simulations and Physics-Based Learning using a Surrogate Model for Soft Robots.” Advanced Intelligent Systems.
- Reway, F., et al. (2020). “Test Method for Measuring the Simulation-to-Reality Gap of Camera-based Object Detection Algorithms for Autonomous Driving.” IEEE Intelligent Vehicles Symposium.
- Wang, L. (2025). “Leveraging Passive Compliance of Soft Robotics for Physical Human-Robot Collaborative Manipulation.” arXiv:2504.08184.
- Zhang, Y., et al. (2021). “Learning for Attitude Holding of a Robotic Fish. An End-to-End Approach With Sim-to-Real Transfer.” IEEE Transactions on Robotics.
- Suomalainen, M., Karayiannidis, Y., & Kyrki, V. (2021). “A Survey of Robot Manipulation in Contact.” arXiv:2112.01942.